Quran.com integration #67
Conversation
Really nicely done. May Allah reward you. Some minor todos, but none are blocking.
One thing to discuss: I'm still not sure of the value of AnsariWorkflow, but we can work this out later.
@@ -36,7 +37,7 @@ jobs:
POSTGRES_PASSWORD: postgres
PGPASSWORD: postgres
ports:
- "5432:5432"
- "5433:5432"
Just curious why this is necessary.
This change is purely for convenience. I usually run a local PostgreSQL instance on port 5432 for development. When testing the GitHub Action locally using 'act', I need to shut down my local PostgreSQL instance to avoid port conflicts. By changing the port, I can avoid this issue and keep both services running simultaneously :)
Sounds good. Please leave a comment explaining this.
Done.
settings.VECTARA_AUTH_TOKEN.get_secret_value(),
settings.VECTARA_CUSTOMER_ID,
settings.VECTARA_CORPUS_ID,
sm = SearchVectara(
Love this generalization and refactoring.
Thanks :)
self.message_history = [
    {"role": "system", "content": self.sys_msg}
] + message_history
print(f"Original trace is {self.message_logger.trace_id}")
print(f"Id 1 is {langfuse_context.get_current_trace_id()}")
logger.info(f"Original trace is {self.message_logger.trace_id}")
Thank you for correcting this. Apologies.
Attributes:
    tool_name_to_instance (dict): Mapping of tool names to their respective instances.
    model (str): The name of the language model to use.
Todo (for future): Support different models in different parts of the workflow.
One thing I also don't understand: why doesn't this replace agents/ansari.py?
This implementation is based on our previous WhatsApp discussion. We identified two distinct use cases:
- Letting the LLM dynamically decide whether and which tools to use
- Enforcing a specific sequence of tool usage
For some scenarios (like the quran.com integration), we want to guarantee a specific workflow - first execute a search, then formulate an answer based on the search results. Therefore:
- Ansari will handle cases where the LLM makes tool-usage decisions
- AnsariWorkflow will execute predefined sequences of steps (see the sketch below)
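To make the split concrete, here is a minimal, self-contained toy sketch of the second pattern: a runner that executes a caller-fixed sequence of steps. None of the names below are the actual Ansari/AnsariWorkflow API; they are illustrative only.

```python
# Toy illustration of the design split (not the actual classes in this PR):
# Ansari lets the model decide which tool to call; a workflow runner instead
# executes a caller-defined sequence, guaranteeing "search first, then answer".
from typing import Any, Callable


class ToyWorkflow:
    """Runs named steps in exactly the order the caller specifies."""

    def __init__(self, steps: dict[str, Callable[..., Any]]):
        self.steps = steps

    def execute(self, plan: list[tuple[str, dict]]) -> list[Any]:
        results: list[Any] = []
        for name, kwargs in plan:
            results.append(self.steps[name](**kwargs))
        return results


def search_quran(query: str) -> list[str]:
    # Stand-in for a real search tool (e.g. a Vectara-backed search).
    return [f"ayah matching '{query}'"]


def gen_answer(context: list[str]) -> str:
    # Stand-in for the answer step that consumes the search results.
    return f"answer grounded in {len(context)} search result(s)"


wf = ToyWorkflow({"search": search_quran, "answer": gen_answer})
hits = wf.execute([("search", {"query": "patience"})])[0]
print(wf.execute([("answer", {"context": hits})])[0])
```

The point is that the caller, not the model, fixes the order: search always runs first, and the answer step only ever sees search output.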
class AnsariWorkflow:
    """
    AnsariWorkflow manages the execution of modular workflow steps for processing user queries.
This seems to overlap a little bit with LangChain, but let's try it for now.
Yes :) I hadn't considered LangChain initially, probably because I perceived it as overly complex. However, you make a good point - LangChain offers numerous valuable features. If we need that functionality, switching to LangChain would make sense. LangGraph is also quite impressive.
Kathir's work. Regardless of the language used in the original conversation,
you will translate the query into English before searching the tafsir. The
function returns a list of **potentially** relevant matches, which may include
multiple passages of interpretation and analysis.
Should this be pulled out into a resource instead of included in the code?
Makes total sense since it's part of the prompt.
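One possible shape for that change, sketched below; the directory layout, file name, and helper are assumptions for illustration, not the PR's actual structure.

```python
# Hypothetical: keep the search_tafsir tool description in a plain-text
# resource file and load it at import time instead of hard-coding the prompt.
# The paths and names here are assumptions, not the repository's real layout.
from pathlib import Path

RESOURCES_DIR = Path(__file__).parent / "resources" / "prompts"


def load_prompt(name: str) -> str:
    """Read a prompt or tool description stored as a text resource."""
    return (RESOURCES_DIR / f"{name}.txt").read_text(encoding="utf-8")


# e.g. the description quoted above would live in resources/prompts/search_tafsir.txt:
# SEARCH_TAFSIR_DESCRIPTION = load_prompt("search_tafsir")
```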
# Create AnsariWorkflow instance
ansari_workflow = AnsariWorkflow(settings)

ayah_id = req.surah*1000 + req.ayah
Pull this out into a separate function in a separate library e.g. ayah_get_range_index(surah, ayah).
Are you suggesting we create a sub-package for Quran.com-related functionality?
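For reference, the suggested helper could look roughly like the sketch below. Only the surah * 1000 + ayah encoding and the name ayah_get_range_index come from the comments above; the validation and docstring are illustrative.

```python
# Hypothetical helper, as suggested above. Only the surah * 1000 + ayah
# encoding and the function name come from this conversation; the rest
# (validation, docstring) is illustrative.
def ayah_get_range_index(surah: int, ayah: int) -> int:
    """Encode a (surah, ayah) reference as a single integer.

    Example: (2, 255) -> 2255.
    """
    if not 1 <= surah <= 114 or ayah < 1:
        raise ValueError(f"invalid ayah reference: {surah}:{ayah}")
    return surah * 1000 + ayah


# Usage mirroring the endpoint code above:
# ayah_id = ayah_get_range_index(req.surah, req.ayah)
```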
@@ -0,0 +1,139 @@
{
Should probably be checked in in a more permanent place (e.g. "how to use Ansari from Jupyter") or similar.
Agreed. Should we create a separate repository for experimental code and for unused but potentially useful implementations (e.g. the dspy work)?
This would help keep our main codebase clean while preserving valuable work.
@@ -543,3 +545,61 @@ async def test_add_feedback(login_user, create_thread):
)
assert response.status_code == 200
assert response.json() == {"status": "success"}

@pytest.fixture(scope="module")
Thank you for adding more tests!
Sure :)
"name": TOOL_NAME, | ||
"description": "Search the Hadith for relevant narrations. Returns a list of hadith. Multiple hadith may be relevant.", | ||
"name": "search_hadith", | ||
"description": "Search for relevant Hadith narrations based on a specific topic.", |
Why the change?
gpt-4o was deciding to use the tools but wasn't outputting the required arguments (the query parameter in this case). This was causing the CI check to fail. After modifying the tool description, the model started providing the arguments correctly.
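For context, a tool definition in the OpenAI function-calling format looks roughly like the sketch below. The name and the new description string come from the diff above; the parameter schema is a hedged reconstruction based on the mention of the query argument, not the PR's exact schema.

```python
# Approximate tool spec in OpenAI function-calling style. The name and
# description match the diff above; the "query" parameter schema is an
# assumption based on the discussion (the model must supply "query").
SEARCH_HADITH_TOOL = {
    "type": "function",
    "function": {
        "name": "search_hadith",
        "description": "Search for relevant Hadith narrations based on a specific topic.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Topic or question to search the Hadith collection for.",
                }
            },
            "required": ["query"],
        },
    },
}
```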
Changes:
- The CI workflow can now be run locally with act.

BREAKING CHANGE: Vectara API v2 Migration
- Renamed VECTARA_AUTH_TOKEN to VECTARA_API_KEY to match new API conventions
- Removed the VECTARA_CUSTOMER_ID requirement

Required Action:
Could you please create a new Vectara API key with query permissions for the mawsuah and tafsir corpora, and update this key in the deployment environment?
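A rough sketch of what the migration implies for the settings and call sites; only VECTARA_API_KEY, VECTARA_AUTH_TOKEN, VECTARA_CUSTOMER_ID, and VECTARA_CORPUS_ID come from this PR, and the class layout below is an assumption (it presumes pydantic v2 with pydantic-settings).

```python
# Hypothetical before/after of the settings used by SearchVectara, based on
# the snippets in this PR; everything beyond the four VECTARA_* names is an
# assumption for illustration.
from pydantic import SecretStr
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    # v1 (removed): VECTARA_AUTH_TOKEN: SecretStr and VECTARA_CUSTOMER_ID: str
    VECTARA_API_KEY: SecretStr  # new name under the Vectara v2 API
    VECTARA_CORPUS_ID: str      # customer ID is no longer required


# Callers then pass the key instead of token + customer ID, e.g.:
# sm = SearchVectara(settings.VECTARA_API_KEY.get_secret_value(), settings.VECTARA_CORPUS_ID, ...)
```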