[Bug]: RemoteBulkWriter doesn't work in Windows environment. #2431

Open
1 task done
counter2015 opened this issue Dec 12, 2024 · 0 comments
Labels
kind/bug Something isn't working

Comments

counter2015 (Contributor) commented Dec 12, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Describe the bug

I deployed milvus-standalone on my laptop and tried to use RemoteBulkWriter.

It writes the data directly under the bucket root as \\1.json, as shown below.

image

When I deploy to Docker instead, it writes to the /data folder correctly.

Expected Behavior

The data should be uploaded to a MinIO path such as s3://a-bucket/data/<uuid>/1.json

Steps/Code To Reproduce behavior

from pymilvus import CollectionSchema, FieldSchema, DataType
from pymilvus.bulk_writer import BulkFileType, RemoteBulkWriter

if __name__ == "__main__":
    ACCESS_KEY = "minioadmin"
    SECRET_KEY = "minioadmin"
    BUCKET_NAME = "a-bucket"
    ENDPOINT = "localhost:9000"

    # Connection settings for the MinIO instance started by docker-compose
    conn = RemoteBulkWriter.S3ConnectParam(
        endpoint=ENDPOINT,
        access_key=ACCESS_KEY,
        secret_key=SECRET_KEY,
        bucket_name=BUCKET_NAME,
    )

    schema = CollectionSchema(
        fields=[
            FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
            FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=768),
            FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=4096),
        ],
        description="Test collection",
        enable_dynamic_field=True,
    )

    writer = RemoteBulkWriter(
        schema=schema,
        remote_path="/data",
        connect_param=conn,
        file_type=BulkFileType.JSON,
    )

    for i in range(1000):
        writer.append_row({"embedding": [1.0] * 768, "text": "hello world"})

    writer.commit()
    print(writer.batch_files) # [['\\1.json']]
    print(writer.data_path) # \data\8a67749b-bb60-4369-bdb4-b5e5f5853a6e
    print(writer.uuid) # 8a67749b-bb60-4369-bdb4-b5e5f5853a6e
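
The backslashes in the output above suggest that the object key is built with OS-native path rules. The sketch below is not pymilvus code. It only uses the standard-library `pathlib` pure path classes to show how Windows path semantics could turn the same inputs into `\1.json`, under the assumption that the writer joins `remote_path`, the UUID, and the file name with `pathlib`-style joins. The helper `join_remote` is hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath

UUID = "8a67749b-bb60-4369-bdb4-b5e5f5853a6e"  # value taken from the report above


def join_remote(path_cls, remote_path, uuid, relative_file):
    """Hypothetical helper: join the pieces the way an OS-native path join would."""
    remote_dir = str(path_cls(remote_path).joinpath(uuid))
    # Assumption: only forward slashes are stripped, so a leading backslash survives.
    return str(path_cls(remote_dir).joinpath(relative_file.lstrip("/")))


# POSIX rules: the file name is appended to the remote directory.
posix_key = join_remote(PurePosixPath, "/data", UUID, "/1.json")

# Windows rules: "\1.json" is a rooted path, so the join discards the directory.
windows_key = join_remote(PureWindowsPath, "/data", UUID, "\\1.json")

print(posix_key)
print(windows_key)
```

Because the pure path classes do not touch the file system, this runs on any OS and may help confirm the cause without a Windows machine.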

Environment details

  • Hardware/Software conditions
    • OS: Windows
    • CPU: 13th Gen Intel(R) Core(TM) i7-1365U
  • Method of installation: docker-compose, standalone
  • Milvus version : 2.4.15
  • Milvus configuration :

inside docker-compose.yaml

services:
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    healthcheck:
      test: ["CMD", "etcdctl", "endpoint", "health"]
      interval: 30s
      timeout: 20s
      retries: 3

  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    ports:
      - "9001:9001"
      - "9000:9000"
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/minio:/minio_data
    command: minio server /minio_data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.4.15
    command: ["milvus", "run", "standalone"]
    security_opt:
    - seccomp:unconfined
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
      interval: 30s
      start_period: 90s
      timeout: 20s
      retries: 3
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - "etcd"
      - "minio"

  attu:
    container_name: milvus-attu
    image: zilliz/attu:v2.4
    environment:
      MILVUS_URL: standalone:19530
    ports:
      - "8000:3000"
    depends_on:
      - "standalone"
    networks:
      - default

networks:
  default:
    name: milvus

Anything else?

I have not tested this behavior with other SDKs. Should we add Windows tests to the CI/CD stage?
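
As a rough idea of what such a test could look like, here is a minimal sketch that checks key building against Windows-style input without needing a Windows runner. `build_object_key` is a hypothetical helper, not part of pymilvus; it assumes object keys should always use forward slashes and carry no leading slash.

```python
import posixpath
from pathlib import PureWindowsPath


def build_object_key(remote_path, uuid, file_name):
    """Hypothetical helper: build an object-store key with forward slashes on any OS."""
    # Normalise backslashes, then drop leading/trailing slashes from each part.
    parts = [p.replace("\\", "/").strip("/") for p in (remote_path, uuid, file_name)]
    return posixpath.join(*[p for p in parts if p])


def test_key_is_posix_even_for_windows_style_input():
    uuid = "8a67749b-bb60-4369-bdb4-b5e5f5853a6e"
    # Simulate what a Windows path object would produce for "/data".
    windows_dir = str(PureWindowsPath("/data"))
    key = build_object_key(windows_dir, uuid, "\\1.json")
    assert key == f"data/{uuid}/1.json"
    assert "\\" not in key


test_key_is_posix_even_for_windows_style_input()
```

A test like this only covers path construction; an end-to-end check on a real Windows runner would still be needed to catch other platform differences.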

counter2015 added the kind/bug label Dec 12, 2024