Summary

Code snippets showing how to record I2S audio and store as .wav file on ESP32 with SD card, how to transcribe pre-recorded audio via STT (SpeechToText) Deepgram API, how to generate audio from text via TTS (TextToSpeech) API from Google TTS or OpenAI TTS or (new) SpeechGen.IO. Triggering ESP32 actions via Voice.

The repository contains the Demo main sketch 'KALO_ESP32_Voice_Assistant.ino', demonstrating different use case of my libraries 'lib_audio_recording.ino', 'lib_audio_transcription.ino' and (new) 'lib_TTS_SpeechGen.ino'

Features

Explore the demo use case examples in main sketch, summary:

Recording and playing audio are working offline, online connection needed for STT, TTS and streaming services
Recording Voice Audio with variable length (recording as long a button is pressed), storing as .wav file (with 44 byte header) on SD card
Replay your recorded audio (using Schreibfaul1 <audio.h> library)
Playing Audio streams (e.g. playing music via radio streams with <audio.h> library)
Triggering ESP actions via voice (e.g. triggering GPIO LED pins, addressing dedicated voices by calling their name, playing music on request)
STT (SpeechToText): Deepgram API service (registration needed)
TTS (TextToSpeech): Google TTS free API (no registration needed)
TTS (TextToSpeech): Open AI API (6 multilingual 6 voices, registration needed)
TTS (TextToSpeech): #NEW#: SpeechGen.IO voices (many voices, not free, payment needed)

Hardware

ESP32 development board (e.g. ESP32-WROOM-32), connected to Wifi
I2S digital microphone, e.g. INMP441 [I2S pins 22, 33, 35]
I2S audio amplifier, e.g. MAX98357A [I2S pins 25,26,27] with speaker
Micro SD Card [VSPI Default pins 5,18,19,23]
RGB LED (status indicator) and Analog Poti (audio volume)

Installation & Customizing

Required (Jan. 2025): Arduino IDE with ESP32 lib 3.1.x (based on ESP-IDF 5.3.x). Older 2.x ESP framework fail because new I2S driver missed
Required (for playing Audio on ESP32): AUDIO.H library ESP32-audioI2S.zip. Install latest zip (3.0.11g from July 18, 2024 or newer)
Copy all .ino files of 'KALO-ESP32-Voice-Assistant' into same folder (it is one sketch, split into multiple Arduino IDE tabs)
Update your pin assignments & wlan settings (ssid, password) in the .ino header files
Update headers with personal credentials (Deepgram API key, optional: OpenAI API key, SpeechGen Token)
Define your favorite recording settings (SAMPLE_RATE, BITS_PER_SAMPLE, GAIN_BOOSTER_I2S) in lib_audio_recording.ino header
Define your language settings (Google TTS in KALO_ESP32_Voice_Assistant.ino, Deepgram STT in lib_audio_transcription.ino header)
Toggle DEBUG flag to true (displaying Serial.print details) or false (for final usage)

Known issues

Earlier WifiClientSecure connection issues seem solved (with KALO 2025-01-06 Update & arduino-esp32 3.1.x)
Google TTS support short sentences only (Google limitation), non-free services (OpenAI and SpeechGen.IO) are not length limited.
TTS: Only OpenAI voices are multi-lingual (supporting multiple languages in same request), Google & SpeechGen.IO request language parameter/voices

Updates

2025-01-06: NEW library for TTS Speechgen.IO (hundreds of voices)
2025-01-06: Cleaned code, connection reliability issues solved, response time improved
2024-07-22: Misc. enhancements, WifiClientSecure reliability workarounds, code cleaned up
2024-07-18: 'Play 8bit audio' issue solved (latest AUDIO.H support 8bit wav format)
2024-07-14: WifiClientSecure connection reliablility improved (still not perfect)
2024-07-14: STT Deepgram response faster (typical response time on e.g. 5 sec voice record: ~ 2.5 sec).
2024-07-08: First drop, already working, not finally cleaned up (just posted this drop on some folks request)

Next steps

currently no major updates planned, enjoy the libraries :)

. . .

Demo Videos

Short video clip, presenting Recording & SpeechToText & TextToSpeech (without Open AI, ESP32 is not 'answering', just parroting my voice with free Google TTS voice). Workflow:

Recording user voice, storing audio .wav file (8KHz/8bit) to SD card,
STT: transcribe pre-recorded voice via Deepgram API,
TTS: repeat spoken sentence with Goggle TTS voice (a/o triggering e.g. LED via voice):

Featured video from other users & friends:
@techiesms: using my Deepgram transcription STT library in his IoT projects:
https://www.youtube.com/watch?v=j0EEFXmikvk-

Name		Name	Last commit message	Last commit date
Latest commit History 174 Commits
KALO_ESP32_Voice_Assistant		KALO_ESP32_Voice_Assistant
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Summary

Features

Hardware

Installation & Customizing

Known issues

Updates

Next steps

Demo Videos

About

Releases

Packages

Languages

kaloprojects/KALO-ESP32-Voice-Assistant

Folders and files

Latest commit

History

Repository files navigation

Summary

Features

Hardware

Installation & Customizing

Known issues

Updates

Next steps

Demo Videos

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages