
Quran.com integration #67

Merged
merged 36 commits into main from feat/workflow-exec on Nov 7, 2024

Conversation

abdullah-alnahas
Collaborator

@abdullah-alnahas abdullah-alnahas commented Nov 6, 2024

Changes:

  • Implemented initial version of AnsariWorkflow, an agent that executes predefined workflows instead of LLM-driven decisions
  • Created a generic Vectara search tool that can be instantiated for both mawsuah and tafsir searches
  • Enhanced GitHub Actions performance:
    • Switched from pip to uv for faster package installation
    • Changed PostgreSQL port to 5433 to avoid conflicts with local instances when running act locally
  • Refactored ansari.py for improved code organization
  • Updated tool descriptions to ensure proper function calls by GPT-4
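The port tweak above would look roughly like the following in the workflow file (a sketch; the actual service block in `.github/workflows` may differ in image tag and options, though the `POSTGRES_PASSWORD` env comes from the diff below):

```yaml
services:
  postgres:
    image: postgres
    env:
      POSTGRES_PASSWORD: postgres
      PGPASSWORD: postgres
    ports:
      # Host port 5433 maps to the container's 5432 so that a locally
      # running PostgreSQL instance on 5432 does not clash when the
      # workflow is executed locally with `act`.
      - "5433:5432"
```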

BREAKING CHANGE: Vectara API v2 Migration

  • Switched to Vectara's RESTful API v2
  • Renamed VECTARA_AUTH_TOKEN to VECTARA_API_KEY to match new API conventions
  • Removed VECTARA_CUSTOMER_ID requirement
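For illustration, the v2 call shape might look like the sketch below. Only the `VECTARA_AUTH_TOKEN` → `VECTARA_API_KEY` rename and the dropped customer id come from the changes above; the endpoint path, payload fields, and `x-api-key` header are assumptions based on Vectara's v2 REST conventions, not code from this PR:

```python
import json

def build_v2_query(api_key: str, corpus_key: str, query: str) -> dict:
    """Assemble (but do not send) a Vectara API v2 query request.

    v2 authenticates with a single x-api-key header, which is why
    VECTARA_CUSTOMER_ID is no longer required.
    """
    return {
        "url": f"https://api.vectara.io/v2/corpora/{corpus_key}/query",
        "headers": {
            # Replaces the v1 auth token + customer id pair.
            "x-api-key": api_key,
            "Content-Type": "application/json",
        },
        "body": json.dumps({"query": query, "search": {"limit": 5}}),
    }

request = build_v2_query("VECTARA_API_KEY value", "mawsuah", "zakat on gold")
```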

Required Action:
Could you please create a new Vectara API key with query permissions for the mawsuah and tafsir corpora, and update this key in the deployment environment?

@abdullah-alnahas abdullah-alnahas changed the title [WIP] Quran.com integration Quran.com integration Nov 7, 2024
Collaborator

@waleedkadous waleedkadous left a comment

Really nicely done. May Allah reward you. Some minor TODOs, but none are blocking.

One thing to discuss: still not sure of the value of Ansari Workflow. But we can work this out later.

@@ -36,7 +37,7 @@ jobs:
POSTGRES_PASSWORD: postgres
PGPASSWORD: postgres
ports:
-        - "5432:5432"
+        - "5433:5432"
Collaborator

Just curious why this is necessary.

Collaborator Author

This change is purely for convenience. I usually run a local PostgreSQL instance on port 5432 for development. When testing the GitHub Action locally using 'act', I need to shut down my local PostgreSQL instance to avoid port conflicts. By changing the port, I can avoid this issue and keep both services running simultaneously :)

Collaborator

Sounds good. Please leave a comment explaining this.

Collaborator Author

Done.

-            settings.VECTARA_AUTH_TOKEN.get_secret_value(),
-            settings.VECTARA_CUSTOMER_ID,
-            settings.VECTARA_CORPUS_ID,
+        sm = SearchVectara(
Collaborator

Love this generalization and refactoring.

Collaborator Author

Thanks :)

         self.message_history = [
             {"role": "system", "content": self.sys_msg}
         ] + message_history
-        print(f"Original trace is {self.message_logger.trace_id}")
-        print(f"Id 1 is {langfuse_context.get_current_trace_id()}")
+        logger.info(f"Original trace is {self.message_logger.trace_id}")
Collaborator

Thank you for correcting this. Apologies.


Attributes:
tool_name_to_instance (dict): Mapping of tool names to their respective instances.
model (str): The name of the language model to use.
Collaborator

Todo (for future): Support different models in different parts of the workflow.

Collaborator

One thing I also don't understand: why doesn't this replace agents/ansari.py?

Collaborator Author

This implementation is based on our previous WhatsApp discussion. We identified two distinct use cases:

  1. Letting the LLM dynamically decide whether and which tools to use
  2. Enforcing a specific sequence of tool usage

For some scenarios (like the quran.com integration), we want to guarantee a specific workflow - first execute a search, then formulate an answer based on the search results. Therefore:

  • Ansari will handle cases where the LLM makes tool-usage decisions
  • AnsariWorkflow will execute predefined sequences of steps
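The distinction can be sketched roughly like this (all names here are illustrative stand-ins, not the actual AnsariWorkflow API):

```python
# Illustrative sketch of a fixed two-step workflow: the search always
# runs first, and the answer is always grounded in its results.

def search_tafsir(query: str) -> list[str]:
    # Stand-in for the real Vectara-backed tafsir search tool.
    return [f"tafsir passage matching '{query}'"]

def answer_from(results: list[str], question: str) -> str:
    # Stand-in for the LLM call that must use the search results.
    return f"Answer to '{question}' based on {len(results)} passage(s)."

def run_workflow(question: str) -> str:
    """Enforce the sequence, instead of letting the LLM decide."""
    results = search_tafsir(question)        # step 1: guaranteed search
    return answer_from(results, question)    # step 2: answer from step 1

reply = run_workflow("What does Ibn Kathir say about patience?")
```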


class AnsariWorkflow:
"""
AnsariWorkflow manages the execution of modular workflow steps for processing user queries.
Collaborator

This seems to overlap a little bit with LangChain, but let's try it for now.

Collaborator Author

Yes :) I hadn't considered LangChain initially, probably because I perceived it as overly complex. However, you make a good point - LangChain offers numerous valuable features. If we need that functionality, switching to LangChain would make sense. LangGraph is also quite impressive.

Kathir's work. Regardless of the language used in the original conversation,
you will translate the query into English before searching the tafsir. The
function returns a list of **potentially** relevant matches, which may include
multiple passages of interpretation and analysis.
Collaborator

Should this be pulled out into a resource instead of included in the code?

Collaborator Author

Makes total sense since it's part of the prompt.

# Create AnsariWorkflow instance
ansari_workflow = AnsariWorkflow(settings)

ayah_id = req.surah*1000 + req.ayah
Collaborator

Pull this out into a separate function in a separate library e.g. ayah_get_range_index(surah, ayah).
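A minimal sketch of such a helper (the encoding `surah * 1000 + ayah` comes from the diff above; the function name follows the reviewer's suggestion, and the bounds checks are an added assumption):

```python
def ayah_get_range_index(surah: int, ayah: int) -> int:
    """Encode a (surah, ayah) pair as one sortable integer, mirroring
    the `surah * 1000 + ayah` expression in the request handler."""
    if not 1 <= surah <= 114:
        raise ValueError(f"surah out of range: {surah}")
    if ayah < 1:
        raise ValueError(f"ayah out of range: {ayah}")
    return surah * 1000 + ayah
```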

Collaborator Author

Are you suggesting we create a sub-package for Quran.com-related functionality?

@@ -0,0 +1,139 @@
{
Collaborator

Should probably be checked into a more permanent place (e.g. a "how to use Ansari from Jupyter" doc) or similar.

Collaborator Author

Agreed. Should we create a separate repository for experimental code and unused but potentially useful implementations (e.g. the dspy work)? This would help keep our main codebase clean while preserving valuable work.

@@ -543,3 +545,61 @@ async def test_add_feedback(login_user, create_thread):
)
assert response.status_code == 200
assert response.json() == {"status": "success"}

@pytest.fixture(scope="module")
Collaborator

Thank you for adding more tests!

Collaborator Author

Sure :)

-        "name": TOOL_NAME,
-        "description": "Search the Hadith for relevant narrations. Returns a list of hadith. Multiple hadith may be relevant.",
+        "name": "search_hadith",
+        "description": "Search for relevant Hadith narrations based on a specific topic.",
Collaborator

Why the change?

Collaborator Author

gpt-4o was deciding to use the tools but wasn't outputting the required arguments (the query parameter in this case). This was causing the CI check to fail. After modifying the tool description, the model started providing the arguments correctly.
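For illustration, a tool definition along these lines (OpenAI function-calling style; the exact schema in the repo may differ) makes the required `query` argument explicit, which is the kind of change that nudges the model to supply it:

```python
# Hedged sketch of the revised tool definition. The description string
# matches the diff above; the parameter schema is an assumption.
search_hadith_tool = {
    "type": "function",
    "function": {
        "name": "search_hadith",
        "description": "Search for relevant Hadith narrations based on a specific topic.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Topic or question to search the Hadith collection for.",
                }
            },
            # Declaring the argument required pushes the model to emit it.
            "required": ["query"],
        },
    },
}
```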

@waleedkadous waleedkadous merged commit 2aa47e6 into main Nov 7, 2024
1 check passed
@abdullah-alnahas abdullah-alnahas deleted the feat/workflow-exec branch November 8, 2024 06:12