An AI assistant that answers user questions and performs tasks on the user's behalf, with speech-to-text and text-to-speech capabilities.
- Speech-to-Text: built on the Fast-whisper-server repo, using the PhoWhisper large model. The vinai/PhoWhisper-large model (a pretrained ASR model by VinAI) is converted from the Hugging Face transformers format into the CTranslate2 format for optimized inference.
- Text-to-Speech: built on the vietTTS repo.
- LLM-based Agent: Qwen2.5:3b, self-hosted and served with vLLM.
- RESTful API interface: FastAPI.
- Burr: monitor, trace, persist, and execute on my own agent infrastructure.
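The PhoWhisper conversion mentioned in the Speech-to-Text item can be sketched with the `ct2-transformers-converter` command-line tool that ships with CTranslate2. This is a minimal sketch, not the exact command used in this repo: the output directory name and the `float16` quantization are assumptions, and the helper names are hypothetical.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_convert_command(
    model: str = "vinai/PhoWhisper-large",
    output_dir: str = "PhoWhisper-large-ct2",  # hypothetical output directory
    quantization: str = "float16",             # assumption: pick what your hardware supports
    copy_files: Sequence[str] = (),            # extra files to copy next to the converted model
) -> List[str]:
    """Build the argv list for the CTranslate2 converter CLI."""
    cmd = [
        "ct2-transformers-converter",
        "--model", model,
        "--output_dir", output_dir,
        "--quantization", quantization,
    ]
    if copy_files:
        cmd += ["--copy_files", *copy_files]
    return cmd


def convert(cmd: List[str]) -> None:
    """Run the conversion; needs ctranslate2 and transformers installed."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(build_convert_command()))
```

Calling `convert(build_convert_command())` downloads the model from Hugging Face and writes the converted weights to the output directory, which the speech2text service can then be pointed at.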
- Python 3.10 or higher
- poetry
- Docker
- Burr Framework
- instructor
- vLLM
sudo make run # Run docker-compose to start all containers
sudo make stop # Stop all containers
The project can be configured through:
- llm.yaml - LLM settings
- models.yaml - Model configurations
- params.yaml - General parameters
- text2speech - TTS settings
- Endpoint: /botvov/create_new
- Method: GET
- Description: Registers a new user and creates a new application instance.
- Response:
  - status_code: HTTP status code
  - uid: Unique identifier for the new application
  - message: Response message
  - status: Operation status
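A minimal client sketch for /botvov/create_new, using only the standard library. The base URL (`http://localhost:8000`) and the assumption that the response is a JSON object are not confirmed by this README, and the helper names are hypothetical.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumption: adjust to where the API is served


def create_new_url(base_url: str = BASE_URL) -> str:
    """Build the URL of the create_new endpoint."""
    return base_url.rstrip("/") + "/botvov/create_new"


def parse_create_new(payload: dict) -> str:
    """Extract the uid from the response, failing loudly if it is missing."""
    uid = payload.get("uid")
    if uid is None:
        raise ValueError(f"no uid in response: {payload!r}")
    return str(uid)


def create_new(base_url: str = BASE_URL, timeout: float = 10.0) -> str:
    """Register a new user and return the uid of the new application."""
    with urlopen(create_new_url(base_url), timeout=timeout) as resp:
        return parse_create_new(json.load(resp))
```

The returned uid is the value the other endpoints expect in their uid parameter.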
- Endpoint: /botvov/send_audio_query
- Method: POST
- Description: Processes an audio query and returns the corresponding command.
- Parameters:
  - uid: Unique identifier for the application
  - lat: Latitude of the user's location
  - long: Longitude of the user's location
  - audio: Audio file to be processed
- Response:
  - audio/wav: Processed audio response
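A sketch of how a client might call /botvov/send_audio_query with the standard library. It assumes that uid, lat, long, and audio are all sent as multipart form fields; the service may instead expect some of them as query parameters, so check the API docs first. The base URL and helper names are hypothetical.

```python
import uuid
from typing import Tuple
from urllib.request import Request, urlopen


def encode_multipart(fields: dict, audio_bytes: bytes,
                     filename: str = "query.wav") -> Tuple[bytes, str]:
    """Encode text fields plus one audio file as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    # The file part is named "audio", matching the parameter list above.
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="audio"; '
        f'filename="{filename}"\r\nContent-Type: audio/wav\r\n\r\n'.encode()
    )
    parts.append(audio_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def send_audio_query(uid: str, lat: float, long: float, audio_bytes: bytes,
                     base_url: str = "http://localhost:8000",  # assumption
                     timeout: float = 60.0) -> bytes:
    """POST the audio query and return the audio/wav response body."""
    body, content_type = encode_multipart(
        {"uid": uid, "lat": lat, "long": long}, audio_bytes
    )
    req = Request(
        base_url.rstrip("/") + "/botvov/send_audio_query",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

The returned bytes can be written straight to a .wav file for playback.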
- Endpoint: /botvov/get_audio_response
- Method: GET
- Description: Retrieves the command response for a given application.
- Parameters:
  - uid: Unique identifier for the application
- Response:
  - status_code: HTTP status code
  - text_response: Textual response from the application
  - data: Command response data
  - message: Response message
  - status: Operation status
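A minimal sketch of fetching the command response for an application. It assumes uid is passed as a query parameter and that the response body is JSON; the base URL and function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def get_audio_response_url(uid: str,
                           base_url: str = "http://localhost:8000") -> str:
    """Build the request URL with uid as a query parameter (assumed)."""
    query = urlencode({"uid": uid})
    return f"{base_url.rstrip('/')}/botvov/get_audio_response?{query}"


def get_audio_response(uid: str,
                       base_url: str = "http://localhost:8000",
                       timeout: float = 10.0) -> dict:
    """Return the parsed response (status_code, text_response, data, ...)."""
    with urlopen(get_audio_response_url(uid, base_url), timeout=timeout) as resp:
        return json.load(resp)
```

A caller would typically read `text_response` for display and `data` for the command payload.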
BOTVOV/
├── botvov/                # Main service: Qwen2.5:3b LLM
│   ├── main.py            # Define API endpoints
│   ├── llm_service.py     # Use the Burr framework to build the multi-agent workflow
│   ├── stt_service.py     # Call the speech2text service
│   ├── tts_service.py     # Call the text2speech service
│   ├── run_burr_UI.sh     # Bash script that starts Burr tracking and runs main
│   ├── models.py          # Define Pydantic models
│   ├── utils.py
│   └── tool_calling/
│       ├── VOV_channel.py # Class that provides VOV channel information from an API
│       └── Weather.py     # Class that provides weather data from the OpenWeather API
├── speech2text
├── text2speech
├── docs/                  # API docs
├── .gitignore
├──
├── requirements.txt       # List of required libraries
├── Dockerfile             # Dockerfile for packaging the application
└── README.md              # Project usage guide