Voice Synthesizer : AI Voice Generator 👾💬

Come and try out the AI voice generation services on our website!

We trained neural network models (GlowTTS, HiFi-GAN, MLP) on the KSS dataset and a preprocessed Taeyeon voice dataset to create an optimized model. The resulting model converts input text to speech and synthesizes singing voice, so users can listen to Taeyeon sing other singers' songs, e.g. songs by Hyo Shin Park.


System Architecture

[Image: system architecture]

Tech Stack

Backend: Flask
Frontend: React, Next.js, TypeScript, jQuery, Redux, Redux-Saga, styled-components
Middleware: Gunicorn
Etc.: Nginx, Docker, MySQL, Colaboratory, Google Cloud Storage, PyTorch, Swagger

Installation

Clone Repository

$ git clone --recursive https://github.com/SiliconWildCat/SiliconWildCat.git

Docker 🐳

docker-compose up -d 

Nginx

  • Frontend
http://localhost:80
  • Backend
http://localhost:8000

Local

Frontend: http://localhost:3000
Backend: http://localhost:5000
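
To confirm that both services came up, you can probe the URLs listed above. The following is a minimal sketch using only the Python standard library. The names `SERVICE_URLS`, `is_reachable`, and `check` are illustrative and not part of this repository, and the check assumes each server answers plain HTTP at its root path.

```python
import urllib.error
import urllib.request

# Ports as listed in this README; adjust them if your setup differs.
SERVICE_URLS = {
    "nginx": {"frontend": "http://localhost:80", "backend": "http://localhost:8000"},
    "local": {"frontend": "http://localhost:3000", "backend": "http://localhost:5000"},
}


def is_reachable(url, timeout=2.0, opener=urllib.request.urlopen):
    """Return True if a server answers at url (any HTTP status counts)."""
    try:
        with opener(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just not with a 2xx status.
        return True
    except OSError:
        # URLError is a subclass of OSError; covers refused connections and timeouts.
        return False


def check(mode, opener=urllib.request.urlopen):
    """Probe every service for the given mode ("nginx" or "local")."""
    return {name: is_reachable(url, opener=opener)
            for name, url in SERVICE_URLS[mode].items()}
```

After `docker-compose up -d`, calling `check("nginx")` returns a dict with one boolean per service. Use `check("local")` when running the frontend and backend directly.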

Features

This website provides two features: Text To Speech and Singing Voice Synthesis.

1) Text To Speech: reads out a given text in one of two voices.

2) Singing Voice Synthesis: provides clips of songs originally by other singers, covered in the style of our source voice (Taeyeon).
 

1) Text To Speech

[Image: TTS]

  • Enter the text you want to convert and select the desired voice to play the text as the corresponding voice.

  • Text To Speech uses GlowTTS and HiFi-GAN.

    • The GlowTTS neural network is trained on the audio dataset, converted to mel spectrograms, to learn the tone and pronunciation of the voice.

    • The HiFi-GAN neural network converts the mel spectrogram to audio, reducing noise and making the voice sound closer to the actual speaker.
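
For readers unfamiliar with mel spectrograms: the mel scale spaces frequency bands roughly the way pitch is perceived, with fine resolution at low frequencies and coarse resolution at high ones. The sketch below shows the widely used HTK-style conversion and how band centers are laid out. It is an illustration only, not the code in glowtts-v2, and the function names are hypothetical.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_mels: int, f_min: float, f_max: float) -> list[float]:
    """Center frequencies (Hz) of n_mels bands spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    # n_mels centers sit strictly between the two edge frequencies.
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]
```

For example, `mel_band_centers(80, 0.0, 8000.0)` yields centers that are packed closely at low frequencies and spread out toward 8 kHz. The 80 bands and the 0 to 8000 Hz range are assumed values for illustration; the settings used to train the models are not stated in this README.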

2) Singing Voice Synthesis

[Image: VS]

  • Plays songs synthesized in singer Taeyeon's voice.

  • Singing Voice Synthesis uses an MLP neural network and HiFi-GAN.

    • An MLP-based model is built from three files (a text file, a MIDI file, and a vocal file) to generate a mel spectrogram.

      The pitch and phonemes used to generate the mel spectrogram are extracted from the text and MIDI files.

    • The HiFi-GAN neural network reduces noise and makes the voice sound closer to the actual speaker.
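
As an illustration of the pitch extraction step, the sketch below turns MIDI note numbers into frequencies and expands them into one pitch value per frame. It assumes the notes have already been read from the MIDI file as `(midi_note, start_sec, end_sec)` tuples. The function names and the 10 ms frame length are hypothetical, and the project's actual preprocessing may differ.

```python
def midi_to_hz(note: int, a4_hz: float = 440.0) -> float:
    """Equal-tempered frequency of a MIDI note number (69 is A4)."""
    return a4_hz * 2.0 ** ((note - 69) / 12.0)


def pitch_frames(notes, frame_sec=0.01):
    """Expand (midi_note, start_sec, end_sec) tuples into one pitch per frame.

    Frames not covered by any note are left at 0.0 (silence).
    """
    if not notes:
        return []
    total_sec = max(end for _, _, end in notes)
    n_frames = int(round(total_sec / frame_sec))
    frames = [0.0] * n_frames
    for note, start, end in notes:
        first = int(round(start / frame_sec))
        last = min(int(round(end / frame_sec)), n_frames)
        for i in range(first, last):
            frames[i] = midi_to_hz(note)
    return frames
```

A frame-level pitch sequence like this, aligned with the phoneme sequence from the text file, is the kind of input an MLP can map to mel-spectrogram frames.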


Frontend

How to Initialize

> When using npm
    npm i && npm run build && npm start

> When using yarn
    yarn && yarn build && yarn start

About Installation

 1. yarn : installs the node modules into
    ./frontend/node_modules
 2. yarn build : generates the Next.js build files in
    ./frontend/
 3. yarn start : runs the web page

About Pages

 When you start the web page, you will see the SVS (Singing Voice Synthesis) page first.
 
 Click the button to switch between the two pages.
 Enjoy it! 😃

Directory Structure

 frontend
 ┣ components
 ┃ ┣ Music
 ┃ ┃ ┣ Music.tsx
 ┃ ┃ ┗ music.scss
 ┃ ┣ Tts.tsx
 ┃ ┗ musicPlayer.tsx
 ┣ hooks
 ┃ ┣ createRequestSaga.ts
 ┃ ┗ useSelector.tsx
 ┣ interface
 ┃ ┣ counter.ts
 ┃ ┣ loading.ts
 ┃ ┗ tts.ts
 ┣ lib
 ┃ ┗ api
 ┃   ┣ api.ts
 ┃   ┗ client.ts
 ┣ modules
 ┃ ┣ index.ts
 ┃ ┣ loading.ts
 ┃ ┗ tts.ts
 ┗ pages
   ┣ _app.tsx
   ┣ _document.tsx
   ┗ index.tsx

Backend

How to Initialize

  docker exec -it backend /bin/bash
  python3 run.py

About

  Enter the text you want to convert and choose a voice. The project provides two voices, trained on the Taeyeon and KSS datasets. When you select a voice and press the 'say it' button, the audio file is saved to the path below.
  
  >> /app/audio.wav   
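
To sanity-check the generated file, a short script can read its header. This is a minimal sketch using Python's standard `wave` module. It assumes the file is PCM-encoded WAV, since `wave` raises `wave.Error` for other encodings, and `describe_wav` is a hypothetical helper, not part of the backend.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read basic properties of a PCM WAV file without decoding the samples."""
    with wave.open(path, "rb") as wav:
        n_frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_sec": n_frames / rate if rate else 0.0,
        }
```

Run `describe_wav("/app/audio.wav")` wherever that path is visible. With the Docker setup, that is presumably inside the backend container.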

Directory Structure

 backend
 ┣ web
 ┃ ┣ TTS (submodule)
 ┃ ┣ g2pK (submodule)
 ┃ ┣ glowtts-v2 (Text to Mel spectrogram Model)
 ┃ ┃ ┣ KSS
 ┃ ┃ ┗ TaeYeon 
 ┃ ┣ hifigan-v2 (Mel spectrogram to Audio Model)
 ┃ ┃ ┣ KSS
 ┃ ┃ ┗ TaeYeon
 ┃ ┣ config.py (database configuration)
 ┃ ┣ inference.py (TTS synthesis)
 ┃ ┣ run.py
 ┃ ┗ saveText.py (save text to DB)
 ┣ Dockerfile
 ┗ requirements.txt

Submodule

g2pK : g2p module that converts graphemes to phonemes for the Korean language

TTS : library for advanced Text-to-Speech generation

Swagger

[Image: swagger1]


Reference