Vector database connection error when using Qwen text-embedding-vX #12158

Open
5 tasks done
tapdefenser opened this issue Dec 27, 2024 · 4 comments
Labels
🌚 invalid This doesn't seem right

Comments

@tapdefenser

Self Checks

  • This is only for bug report, if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

0.14.1

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

When I create a new knowledge base,
I upload a *.txt file,
and in the Text Preprocessing and Cleaning step I choose "Automatic" and "High Quality" with "text-embedding-vX", trying every Retrieval Settings method (with and without Rerank),
and then I click Save & Process.
It does not work: my Qwen dashboard shows no incoming requests, and Dify reports "Vector database connection error".

✔️ Expected Behavior

It should work correctly

❌ Actual Behavior

Vector database connection error

@dosubot dosubot bot added 🐞 bug Something isn't working 👻 feat:rag Embedding related issue, like qdrant, weaviate, milvus, vector database. labels Dec 27, 2024

dosubot bot commented Dec 27, 2024

Hey @tapdefenser! 👋 I'm here to help you with any bugs, questions, or contribution-related queries. Let's work together to resolve the issue you're experiencing.

I found a similar open issue regarding a vector database connection error with Qdrant in a self-hosted Docker environment. The suggested solution includes several steps:

  1. Check Environment Variables: Ensure that your .env file has the correct settings for connecting to your Qdrant instance. Verify that QDRANT_URL is set to the correct endpoint and QDRANT_API_KEY is correctly configured.

  2. Network Configuration: Make sure that the Docker network settings allow for external connections. Adjust the Docker network settings if necessary to enable communication between Dify services and the Qdrant instance.

  3. Docker Compose Configuration: Verify your docker-compose.yaml file to ensure the Qdrant service is correctly configured and that necessary ports are exposed. You might need to uncomment the port mappings for Qdrant:

    ports:
      - "6333:6333"
      - "6334:6334"
  4. Firewall and Security Groups: Check that any firewalls or security groups allow traffic on the necessary ports (e.g., 6333 and 6334) between your Dify instance and the Qdrant instance.

  5. Logs and Debugging: Review the logs for any error messages that might provide more details about the connection issue. This can help identify if the problem is related to network connectivity, authentication, or other factors.

These steps should help resolve the connection issue with the vector database [1].
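Steps 3 and 4 above both come down to whether a TCP connection to the Qdrant ports can be opened from wherever Dify runs. The following is a minimal sketch of such a check, assuming Python is available in that environment. The `port_is_open` helper is hypothetical and not part of Dify or Qdrant; 6333 and 6334 are the ports from the mapping shown above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures (socket.gaierror), refused connections and timeouts.
        return False
```

For example, calling `port_is_open("qdrant", 6333)` from inside the API or worker container (the host name `qdrant` is an assumption about the Compose service name) would show whether the port is reachable from there. A `False` result does not distinguish a DNS failure from a blocked port, so the container logs are still needed to narrow it down.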


@crazywoola
Member

Please provide the container logs and check the status of the containers.

@crazywoola crazywoola added 🌚 invalid This doesn't seem right and removed 🐞 bug Something isn't working 👻 feat:rag Embedding related issue, like qdrant, weaviate, milvus, vector database. labels Dec 27, 2024
@tapdefenser
Author

tapdefenser commented Dec 27, 2024

@dosubot
Here are the logs:

[worker]     | 2024-12-27 10:02:31,185.185 INFO [MainThread] [strategy.py:161] - Task tasks.document_indexing_task.document_indexing_task[e69555d8-b325-4deb-a137-f551e774a99a] received
[worker]     | 2024-12-27 10:02:31,306.306 INFO [Dummy-2] [document_indexing_task.py:59] - Start process document: 871761ef-ec0c-4213-8c63-d4b44a3034fe
...
[worker]     | Building prefix dict from the default dictionary ...
[worker]     | 2024-12-27 10:03:25,901.901 DEBUG [Thread-3 (_process_keyword_index)] [__init__.py:113] - Building prefix dict from the default dictionary ...
[worker]     | Dumping model to file cache /tmp/jieba.cache
[worker]     | 2024-12-27 10:03:26,646.646 DEBUG [Thread-3 (_process_keyword_index)] [__init__.py:146] - Dumping model to file cache /tmp/jieba.cache
[worker]     | Loading model cost 0.810 seconds.
[worker]     | Prefix dict has been built successfully.
[worker]     | 2024-12-27 10:03:26,712.712 DEBUG [Thread-3 (_process_keyword_index)] [__init__.py:164] - Loading model cost 0.810 seconds.
[worker]     | 2024-12-27 10:03:26,712.712 DEBUG [Thread-3 (_process_keyword_index)] [__init__.py:166] - Prefix dict has been built successfully.
...
[worker]     | 2024-12-27 10:04:15,797.797 WARNING [Dummy-2] [warnings.py:112] - /app/api/extensions/ext_logging.py:49: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
[worker]     |   return datetime.utcfromtimestamp(seconds).astimezone(timezone).timetuple()
[worker]     | 
[worker]     | 2024-12-27 10:04:15,792.792 ERROR [Dummy-2] [indexing_runner.py:94] - consume document failed
[worker]     | Traceback (most recent call last):
[worker]     |   File "/app/api/.venv/lib/python3.12/site-packages/urllib3/connection.py", line 199, in _new_conn
[worker]     |     sock = connection.create_connection(
[worker]     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[worker]     |   File "/app/api/.venv/lib/python3.12/site-packages/urllib3/util/connection.py", line 60, in create_connection
[worker]     |     for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
[worker]     |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[worker]     |   File "/app/api/.venv/lib/python3.12/site-packages/gevent/_socketcommon.py", line 221, in getaddrinfo
[worker]     |     addrlist = get_hub().resolver.getaddrinfo(host, port, family, type, proto, flags)
[worker]     |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[worker]     |   File "/app/api/.venv/lib/python3.12/site-packages/gevent/resolver/thread.py", line 63, in getaddrinfo
[worker]     |     return self.pool.apply(_socket.getaddrinfo, args, kwargs)
[worker]     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[worker]     |   File "/app/api/.venv/lib/python3.12/site-packages/gevent/pool.py", line 161, in apply
[worker]     |     return self.spawn(func, *args, **kwds).get()
[worker]     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[worker]     |   File "src/gevent/event.py", line 330, in gevent._gevent_cevent.AsyncResult.get
[worker]     |   File "src/gevent/event.py", line 360, in gevent._gevent_cevent.AsyncResult.get
[worker]     |   File "src/gevent/event.py", line 348, in gevent._gevent_cevent.AsyncResult.get
[worker]     |   File "src/gevent/event.py", line 328, in gevent._gevent_cevent.AsyncResult._raise_exception
[worker]     |   File "/app/api/.venv/lib/python3.12/site-packages/gevent/_compat.py", line 50, in reraise
[worker]     |     raise value.with_traceback(tb)
[worker]     |   File "/app/api/.venv/lib/python3.12/site-packages/gevent/threadpool.py", line 173, in __run_task
[worker]     |     thread_result.set(func(*args, **kwargs))
[worker]     |     ^^^^^^^^^^^^^^^^^
[worker]     | socket.gaierror: [Errno -2] Name or service not known
[worker]     | 
[worker]     | The above exception was the direct cause of the following exception:
[worker]     | 
[worker]     | Traceback (most recent call last):
[worker]     |   File "/app/api/.venv/lib/python3.12/site-packages/urllib3/connectionpool.py", line 789, in urlopen
[worker]     |     response = self._make_request(
[worker]     |                ^^^^^^^^^^^^^^^^^^^
[worker]     |   File "/app/api/.venv/lib/python3.12/site-packages/urllib3/connectionpool.py", line 495, in _make_request
[worker]     |     conn.request(
[worker]     |   File "/app/api/.venv/lib/python3.12/site-packages/urllib3/connection.py", line 441, in request
[worker]     |     self.endheaders()
[worker]     |   File "/usr/local/lib/python3.12/http/client.py", line 1331, in endheaders
[worker]     |     self._send_output(message_body, encode_chunked=encode_chunked)
[worker]     |   File "/usr/local/lib/python3.12/http/client.py", line 1091, in _send_output
[worker]     |     self.send(msg)
[worker]     |   File "/usr/local/lib/python3.12/http/client.py", line 1035, in send
[worker]     |     self.connect()
[worker]     |   File "/app/api/.venv/lib/python3.12/site-packages/urllib3/connection.py", line 279, in connect
[worker]     |     self.sock = self._new_conn()
[worker]     |                 ^^^^^^^^^^^^^^^^
[worker]     |   File "/app/api/.venv/lib/python3.12/site-packages/urllib3/connection.py", line 206, in _new_conn
[worker]     |     raise NameResolutionError(self.host, self, e) from e
[worker]     | urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPConnection object at 0x7faa480bc170>: Failed to resolve 'weaviate' ([Errno -2] Name or service not known)
[worker]     | 
[worker]     | The above exception was the direct cause of the following exception:
[worker]     | 
[worker]     | Traceback (most recent call last):
[worker]     |   File "/app/api/.venv/lib/python3.12/site-packages/requests/adapters.py", line 486, in send
[worker]     |     resp = conn.urlopen(
[worker]     |            ^^^^^^^^^^^^^
[worker]     |   File "/app/api/.venv/lib/python3.12/site-packages/urllib3/connectionpool.py", line 843, in urlopen
[worker]     |     retries = retries.increment(
[worker]     |               ^^^^^^^^^^^^^^^^^^
[worker]     |   File "/app/api/.venv/lib/python3.12/site-packages/urllib3/util/retry.py", line 519, in increment
[worker]     |     raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
[worker]     |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[worker]     | urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='weaviate', port=8181): Max retries exceeded with url: /v1/meta (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x7faa480bc170>: Failed to resolve 'weaviate' ([Errno -2] Name or service not known)"))
[worker]     | 
[worker]     | During handling of the above exception, another exception occurred:
[worker]     | 
[worker]     | Traceback (most recent call last):
[worker]     |   File "/app/api/core/rag/datasource/vdb/weaviate/weaviate_vector.py", line 45, in _init_client
[worker]     |     client = weaviate.Client(
[worker]     |              ^^^^^^^^^^^^^^^^
[worker]     |   File "/app/api/.venv/lib/python3.12/site-packages/weaviate/client.py", line 150, in __init__
[worker]     |     self._connection = Connection(
[worker]     |                        ^^^^^^^^^^^
[worker]     |   File "/app/api/.venv/lib/python3.12/site-packages/weaviate/connect/connection.py", line 654, in __init__
[worker]     |     self._server_version = self.get_meta()["version"]
[worker]     |                            ^^^^^^^^^^^^^^^
[worker]     |   File "/app/api/.venv/lib/python3.12/site-packages/weaviate/connect/connection.py", line 675, in get_meta
[worker]     |     response = self.get(path="/meta")
[worker]     |                ^^^^^^^^^^^^^^^^^^^^^^
[worker]     |   File "/app/api/.venv/lib/python3.12/site-packages/weaviate/connect/connection.py", line 516, in get
[worker]     |     return self._session.get(
[worker]     |            ^^^^^^^^^^^^^^^^^^
[worker]     |   File "/app/api/.venv/lib/python3.12/site-packages/requests/sessions.py", line 602, in get
[worker]     |     return self.request("GET", url, **kwargs)
[worker]     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[worker]     |   File "/app/api/.venv/lib/python3.12/site-packages/requests/sessions.py", line 589, in request
[worker]     |     resp = self.send(prep, **send_kwargs)
[worker]     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[worker]     |   File "/app/api/.venv/lib/python3.12/site-packages/requests/sessions.py", line 703, in send
[worker]     |     r = adapter.send(request, **kwargs)
[worker]     |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[worker]     |   File "/app/api/.venv/lib/python3.12/site-packages/requests/adapters.py", line 519, in send
[worker]     |     raise ConnectionError(e, request=request)
[worker]     | requests.exceptions.ConnectionError: HTTPConnectionPool(host='weaviate', port=8181): Max retries exceeded with url: /v1/meta (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x7faa480bc170>: Failed to resolve 'weaviate' ([Errno -2] Name or service not known)"))
[worker]     | 
[worker]     | During handling of the above exception, another exception occurred:
[worker]     | 
[worker]     | Traceback (most recent call last):
[worker]     |   File "/app/api/core/indexing_runner.py", line 78, in run
[worker]     |     self._load(
[worker]     |   File "/app/api/core/indexing_runner.py", line 672, in _load
[worker]     |     tokens += future.result()
[worker]     |               ^^^^^^^^^^^^^^^
[worker]     |   File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 456, in result
[worker]     |     return self.__get_result()
[worker]     |            ^^^^^^^^^^^^^^^^^^^
[worker]     |   File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
[worker]     |     raise self._exception
[worker]     |   File "/usr/local/lib/python3.12/concurrent/futures/thread.py", line 59, in run
[worker]     |     result = self.fn(*self.args, **self.kwargs)
[worker]     |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[worker]     |   File "/app/api/core/indexing_runner.py", line 729, in _process_chunk
[worker]     |     index_processor.load(dataset, chunk_documents, with_keywords=False)
[worker]     |   File "/app/api/core/rag/index_processor/processor/paragraph_index_processor.py", line 56, in load
[worker]     |     vector = Vector(dataset)
[worker]     |              ^^^^^^^^^^^^^^^
[worker]     |   File "/app/api/core/rag/datasource/vdb/vector_factory.py", line 35, in __init__
[worker]     |     self._vector_processor = self._init_vector()
[worker]     |                              ^^^^^^^^^^^^^^^^^^^
[worker]     |   File "/app/api/core/rag/datasource/vdb/vector_factory.py", line 56, in _init_vector
[worker]     |     return vector_factory_cls().init_vector(self._dataset, self._attributes, self._embeddings)
[worker]     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[worker]     |   File "/app/api/core/rag/datasource/vdb/weaviate/weaviate_vector.py", line 275, in init_vector
[worker]     |     return WeaviateVector(
[worker]     |            ^^^^^^^^^^^^^^^
[worker]     |   File "/app/api/core/rag/datasource/vdb/weaviate/weaviate_vector.py", line 36, in __init__
[worker]     |     self._client = self._init_client(config)
[worker]     |                    ^^^^^^^^^^^^^^^^^^^^^^^^^
[worker]     |   File "/app/api/core/rag/datasource/vdb/weaviate/weaviate_vector.py", line 49, in _init_client
[worker]     |     raise ConnectionError("Vector database connection error")
[worker]     | ConnectionError: Vector database connection error
[worker]     | 2024-12-27 10:04:15,808.808 INFO [Dummy-2] [document_indexing_task.py:76] - Processed dataset: 91908ffd-e064-4192-b8c2-06f57f39ab2d latency: 104.62223403499229
[worker]     | 2024-12-27 10:04:15,846.846 INFO [Dummy-2] [trace.py:128] - Task tasks.document_indexing_task.document_indexing_task[e69555d8-b325-4deb-a137-f551e774a99a] succeeded in 104.66030469202087s: None


dosubot bot commented Dec 27, 2024

The error you're encountering indicates a failure to resolve the 'weaviate' service, which is likely due to an incorrect or unreachable endpoint configuration. Here are some steps to resolve this issue:

  1. Check Endpoint Configuration: Ensure that the WEAVIATE_ENDPOINT in your configuration is correctly set to a valid and reachable URL. The default endpoint in the docker-compose.yaml file is http://weaviate:8080. Verify that this endpoint is correctly configured in your environment [1].

  2. Verify Docker Network Settings: Make sure that the Docker network settings allow for proper name resolution. The service name 'weaviate' should be correctly defined in your Docker network. You might need to check the docker-compose.yaml file to ensure that the Weaviate service is part of the correct network and that the network is properly configured [2].

  3. Service Status: Ensure that the Weaviate service is running and accessible. You can check the status of the containers using docker ps to verify that the Weaviate container is up and running.

  4. Logs and Debugging: Review the logs for any additional error messages that might provide more details about the connection issue. This can help identify if the problem is related to network connectivity, authentication, or other factors.

By following these steps, you should be able to resolve the vector database connection error related to the Weaviate service.
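The traceback in the logs fails inside `socket.getaddrinfo` with "Name or service not known" for the host `weaviate`, so the name never resolves and no connection is attempted. Note also that the log shows port 8181, while the default endpoint quoted above uses 8080. The sketch below separates a DNS failure from a TCP failure for a given endpoint. It is an illustration only: `diagnose_endpoint` is a hypothetical helper, not a Dify or Weaviate API.

```python
import socket
from urllib.parse import urlsplit


def diagnose_endpoint(endpoint: str, timeout: float = 3.0) -> str:
    """Classify an HTTP endpoint as failing at DNS, failing at TCP, or reachable."""
    parts = urlsplit(endpoint)
    host = parts.hostname
    if host is None:
        return f"invalid endpoint: {endpoint!r}"
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        # The same standard-library call that raises in the traceback
        # when the host name is unknown.
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return f"DNS: cannot resolve {host!r} ({exc})"
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError as exc:
        return f"TCP: {host}:{port} resolved but is not reachable ({exc})"
    return f"OK: {host}:{port} accepts connections"
```

Running `diagnose_endpoint("http://weaviate:8181")` inside the worker container, with the value taken from whatever `WEAVIATE_ENDPOINT` is actually set to, would be expected to report a DNS failure in the situation shown in the logs. That result points at the Weaviate container not running or not sharing a network with the worker, rather than at the embedding model.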

