
Quran.com integration #67

Merged
merged 36 commits into main from feat/workflow-exec on Nov 7, 2024

Conversation

abdullah-alnahas
Collaborator

@abdullah-alnahas abdullah-alnahas commented Nov 6, 2024

Changes:

  • Implemented initial version of AnsariWorkflow, an agent that executes predefined workflows instead of LLM-driven decisions
  • Created a generic Vectara search tool that can be instantiated for both mawsuah and tafsir searches
  • Enhanced GitHub Actions performance:
    • Switched from pip to uv for faster package installation
    • Changed PostgreSQL port to 5433 to avoid conflicts with local instances when running act locally
  • Refactored ansari.py for improved code organization
  • Updated tool descriptions to ensure proper function calls by GPT-4
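The port tweak above would look roughly like the following in the workflow file (a sketch; the actual service block in `.github/workflows` may differ in image tag and options, though the `POSTGRES_PASSWORD` env comes from the diff below):

```yaml
services:
  postgres:
    image: postgres
    env:
      POSTGRES_PASSWORD: postgres
      PGPASSWORD: postgres
    ports:
      # Host port 5433 maps to the container's 5432 so that a locally
      # running PostgreSQL instance on 5432 does not clash when the
      # workflow is executed locally with `act`.
      - "5433:5432"
```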

BREAKING CHANGE: Vectara API v2 Migration

  • Switched to Vectara's RESTful API v2
  • Renamed VECTARA_AUTH_TOKEN to VECTARA_API_KEY to match new API conventions
  • Removed VECTARA_CUSTOMER_ID requirement
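For illustration, the v2 call shape might look like the sketch below. Only the `VECTARA_AUTH_TOKEN` → `VECTARA_API_KEY` rename and the dropped customer id come from the changes above; the endpoint path, payload fields, and `x-api-key` header are assumptions based on Vectara's v2 REST conventions, not code from this PR:

```python
import json

def build_v2_query(api_key: str, corpus_key: str, query: str) -> dict:
    """Assemble (but do not send) a Vectara API v2 query request.

    v2 authenticates with a single x-api-key header, which is why
    VECTARA_CUSTOMER_ID is no longer required.
    """
    return {
        "url": f"https://api.vectara.io/v2/corpora/{corpus_key}/query",
        "headers": {
            # Replaces the v1 auth token + customer id pair.
            "x-api-key": api_key,
            "Content-Type": "application/json",
        },
        "body": json.dumps({"query": query, "search": {"limit": 5}}),
    }

request = build_v2_query("VECTARA_API_KEY value", "mawsuah", "zakat on gold")
```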

Required Action:
Could you please create a new Vectara API key with query permissions for the mawsuah and tafsir corpora, and update this key in the deployment environment?

@abdullah-alnahas abdullah-alnahas changed the title [WIP] Quran.com integration Quran.com integration Nov 7, 2024
Collaborator

@waleedkadous waleedkadous left a comment

Really nicely done. May Allah reward you. Some minor TODOs, but none are blocking.

One thing to discuss: still not sure of the value of Ansari Workflow. But we can work this out later.

@@ -36,7 +37,7 @@ jobs:
POSTGRES_PASSWORD: postgres
PGPASSWORD: postgres
ports:
-        - "5432:5432"
+        - "5433:5432"
Collaborator

Just curious why this is necessary.

Collaborator Author

This change is purely for convenience. I usually run a local PostgreSQL instance on port 5432 for development. When testing the GitHub Action locally using 'act', I need to shut down my local PostgreSQL instance to avoid port conflicts. By changing the port, I can avoid this issue and keep both services running simultaneously :)

Collaborator

Sounds good. Please leave a comment explaining this.

Collaborator Author

Done.

-            settings.VECTARA_AUTH_TOKEN.get_secret_value(),
-            settings.VECTARA_CUSTOMER_ID,
-            settings.VECTARA_CORPUS_ID,
+        sm = SearchVectara(
Collaborator

Love this generalization and refactoring.

Collaborator Author

Thanks :)

         self.message_history = [
             {"role": "system", "content": self.sys_msg}
         ] + message_history
-        print(f"Original trace is {self.message_logger.trace_id}")
-        print(f"Id 1 is {langfuse_context.get_current_trace_id()}")
+        logger.info(f"Original trace is {self.message_logger.trace_id}")
Collaborator

Thank you for correcting this. Apologies.


Attributes:
tool_name_to_instance (dict): Mapping of tool names to their respective instances.
model (str): The name of the language model to use.
Collaborator

Todo (for future): Support different models in different parts of the workflow.

Collaborator

One thing I also don't understand: why doesn't this replace agents/ansari.py?

Collaborator Author

This implementation is based on our previous WhatsApp discussion. We identified two distinct use cases:

  1. Letting the LLM dynamically decide whether and which tools to use
  2. Enforcing a specific sequence of tool usage

For some scenarios (like the quran.com integration), we want to guarantee a specific workflow - first execute a search, then formulate an answer based on the search results. Therefore:

  • Ansari will handle cases where the LLM makes tool-usage decisions
  • AnsariWorkflow will execute predefined sequences of steps
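The distinction can be sketched roughly like this (all names here are illustrative stand-ins, not the actual AnsariWorkflow API):

```python
# Illustrative sketch of a fixed two-step workflow: the search always
# runs first, and the answer is always grounded in its results.

def search_tafsir(query: str) -> list[str]:
    # Stand-in for the real Vectara-backed tafsir search tool.
    return [f"tafsir passage matching '{query}'"]

def answer_from(results: list[str], question: str) -> str:
    # Stand-in for the LLM call that must use the search results.
    return f"Answer to '{question}' based on {len(results)} passage(s)."

def run_workflow(question: str) -> str:
    """Enforce the sequence, instead of letting the LLM decide."""
    results = search_tafsir(question)        # step 1: guaranteed search
    return answer_from(results, question)    # step 2: answer from step 1

reply = run_workflow("What does Ibn Kathir say about patience?")
```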


class AnsariWorkflow:
"""
AnsariWorkflow manages the execution of modular workflow steps for processing user queries.
Collaborator

This seems to overlap a little bit with LangChain, but let's try it for now.

Collaborator Author

Yes :) I hadn't considered LangChain initially, probably because I perceived it as overly complex. However, you make a good point - LangChain offers numerous valuable features. If we need that functionality, switching to LangChain would make sense. LangGraph is also quite impressive.

Kathir's work. Regardless of the language used in the original conversation,
you will translate the query into English before searching the tafsir. The
function returns a list of **potentially** relevant matches, which may include
multiple passages of interpretation and analysis.
Collaborator

Should this be pulled out into a resource instead of included in the code?

Collaborator Author

Makes total sense since it's part of the prompt.

# Create AnsariWorkflow instance
ansari_workflow = AnsariWorkflow(settings)

ayah_id = req.surah*1000 + req.ayah
Collaborator

Pull this out into a separate function in a separate library e.g. ayah_get_range_index(surah, ayah).
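A minimal sketch of such a helper (the encoding `surah * 1000 + ayah` comes from the diff above; the function name follows the reviewer's suggestion, and the bounds checks are an added assumption):

```python
def ayah_get_range_index(surah: int, ayah: int) -> int:
    """Encode a (surah, ayah) pair as one sortable integer, mirroring
    the `surah * 1000 + ayah` expression in the request handler."""
    if not 1 <= surah <= 114:
        raise ValueError(f"surah out of range: {surah}")
    if ayah < 1:
        raise ValueError(f"ayah out of range: {ayah}")
    return surah * 1000 + ayah
```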

Collaborator Author

Are you suggesting we create a sub-package for Quran.com-related functionality?

@@ -0,0 +1,139 @@
{
Collaborator

Should probably be checked into a more permanent place (e.g. a "how to use Ansari from Jupyter" doc) or similar.

Collaborator Author

Agreed. Should we create a separate repository for experimental code and unused but potentially useful implementations (e.g. the dspy work)? This would help keep our main codebase clean while preserving valuable work.

@@ -543,3 +545,61 @@ async def test_add_feedback(login_user, create_thread):
)
assert response.status_code == 200
assert response.json() == {"status": "success"}

@pytest.fixture(scope="module")
Collaborator

Thank you for adding more tests!

Collaborator Author

Sure :)

-        "name": TOOL_NAME,
-        "description": "Search the Hadith for relevant narrations. Returns a list of hadith. Multiple hadith may be relevant.",
+        "name": "search_hadith",
+        "description": "Search for relevant Hadith narrations based on a specific topic.",
Collaborator

Why the change?

Collaborator Author

gpt-4o was deciding to use the tools but wasn't outputting the required arguments (the query parameter in this case). This was causing the CI check to fail. After modifying the tool description, the model started providing the arguments correctly.
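For illustration, a tool definition along these lines (OpenAI function-calling style; the exact schema in the repo may differ) makes the required `query` argument explicit, which is the kind of change that nudges the model to supply it:

```python
# Hedged sketch of the revised tool definition. The description string
# matches the diff above; the parameter schema is an assumption.
search_hadith_tool = {
    "type": "function",
    "function": {
        "name": "search_hadith",
        "description": "Search for relevant Hadith narrations based on a specific topic.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Topic or question to search the Hadith collection for.",
                }
            },
            # Declaring the argument required pushes the model to emit it.
            "required": ["query"],
        },
    },
}
```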

@waleedkadous waleedkadous merged commit 2aa47e6 into main Nov 7, 2024
1 check passed
@abdullah-alnahas abdullah-alnahas deleted the feat/workflow-exec branch November 8, 2024 06:12