# TTS Indic Server

## Overview

Text-to-speech (TTS) for Indian languages using the ai4bharat/IndicF5 model.

## Table of Contents
- Getting Started - Development
- For Development (Local)
- Downloading Indic TTS Model
- Running with FastAPI Server
- Evaluating Results
- Examples
- Specifying a Different Format
- Playing Back the Audio
- Building Docker Image
- Run the Docker Image
- Citations
## Getting Started

### For Development (Local)
- Prerequisites: Python 3.10, Ubuntu 22.04
- Steps:
  1. Create a virtual environment: `python3.10 -m venv venv`
  2. Activate the virtual environment: `source venv/bin/activate` (on Windows, use `venv\Scripts\activate`)
  3. Install dependencies: `pip install -r requirements.txt`
## Downloading TTS Models

Models can be downloaded from AI4Bharat's Hugging Face repository:
- https://huggingface.co/ai4bharat/IndicF5
  - Log in to your Hugging Face account
  - Request access to the model
- https://huggingface.co/docs/hub/security-tokens
  - Get a read token for your account
- Export the token and download the model:
  - `export HF_TOKEN=<YOUR-READ-TOKEN-HERE>`
  - `bash download_model.sh`
  - `hf download ai4bharat/IndicF5`
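
Because the model requires approved access, a download started without a token tends to fail late and with an unhelpful error. The following is a minimal pre-flight sketch, not part of this repository: `require_hf_token` is a hypothetical helper that only checks that `HF_TOKEN` is present in the environment before a download is attempted.

```python
import os


def require_hf_token(env=None):
    """Return the Hugging Face token from the environment or raise a clear error.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; export a read token before downloading "
            "ai4bharat/IndicF5"
        )
    return token
```

Calling `require_hf_token()` at the top of a download script stops early with a readable message when the variable is missing. It does not verify that the token is valid or that access to the model has been granted.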
### Local Model Run

    from transformers import AutoModel
    import numpy as np
    import soundfile as sf

    # Load IndicF5 from Hugging Face
    repo_id = "ai4bharat/IndicF5"
    model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)

    # Generate speech, conditioned on a reference audio clip and its transcript
    audio = model(
        "ಬೆಂಗಳೂರು ಕರ್ನಾಟಕ ರಾಜ್ಯದ ರಾಜಧಾನಿ ಆಗಿದೆ, ಕರ್ನಾಟಕದಲ್ಲಿ ನಾವು ಕನ್ನಡ ಮಾತನಾಡುತ್ತೇವೆ",
        ref_audio_path="prompts/KAN_F_HAPPY_00001.wav",
        ref_text="ನಮ್ ಫ್ರಿಜ್ಜಲ್ಲಿ ಕೂಲಿಂಗ್ ಸಮಸ್ಯೆ ಆಗಿ ನಾನ್ ಭಾಳ ದಿನದಿಂದ ಒದ್ದಾಡ್ತಿದ್ದೆ, ಆದ್ರೆ ಅದ್ನೀಗ ಮೆಕಾನಿಕ್ ಆಗಿರೋ ನಿಮ್ ಸಹಾಯ್ದಿಂದ ಬಗೆಹರಿಸ್ಕೋಬೋದು ಅಂತಾಗಿ ನಿರಾಳ ಆಯ್ತು ನಂಗೆ."
    )

    # Normalize 16-bit integer samples to float32 in [-1, 1) and save the output
    if audio.dtype == np.int16:
        audio = audio.astype(np.float32) / 32768.0
    sf.write("namaste.wav", np.array(audio, dtype=np.float32), samplerate=24000)
- Or run the Python script: `cd src/server && python tts_indic_f5.py`
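
To confirm that a generated file such as `namaste.wav` is usable, it can help to inspect its header. The sketch below uses only the standard-library `wave` module; `wav_info` is a hypothetical helper, and it assumes the file is PCM-encoded WAV, since `wave` cannot read other encodings such as 32-bit float.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        # Duration is the frame count divided by frames per second.
        return rate, wav.getnchannels(), wav.getnframes() / rate
```

For the script above, a sample rate of 24000 is expected, matching the `samplerate` passed to `sf.write`.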
## Running with FastAPI Server

Start the FastAPI server: `python src/server/main.py --host 0.0.0.0 --port 10804`
## Evaluating Results

You can evaluate the TTS output with curl commands. The examples below cover Kannada and Hindi.
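
For scripted evaluation, the same endpoint can also be called from Python. This is a minimal sketch using only the standard library, mirroring the curl examples in this section. `build_speech_request` and `synthesize` are hypothetical helpers, the base URL assumes the server is running locally on port 10804, and the response is assumed to be raw audio bytes, as the curl `-o` usage suggests.

```python
import json
import urllib.request

# Assumed default: the host and port used in the curl examples.
BASE_URL = "http://localhost:10804"


def build_speech_request(text, base_url=BASE_URL):
    """Build the POST request equivalent to the curl examples."""
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=body,
        headers={"Accept": "*/*", "Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path, base_url=BASE_URL, timeout=120):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        with open(out_path, "wb") as out_file:
            out_file.write(response.read())
    return out_path
```

With the server running, `synthesize("अरे, तुम आज कैसे हो?", "hindi-tts-output.wav")` should produce the same file as the Hindi curl command.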
#### Kannada

```bash kannada_example.sh
curl -X 'POST' \
  'http://localhost:10804/v1/audio/speech' \
  -H 'accept: */*' \
  -H 'Content-Type: application/json' \
  -d '{
  "text": "ಉದ್ಯಾನದಲ್ಲಿ ಮಕ್ಕಳ ಆಟವಾಡುತ್ತಿದ್ದಾರೆ ಮತ್ತು ಪಕ್ಷಿಗಳು ಚಿಲಿಪಿಲಿ ಮಾಡುತ್ತಿವೆ."
}' -o kannada-tts-out.wav
```
#### Hindi
```bash hindi_example.sh
curl -X 'POST' \
  'http://localhost:10804/v1/audio/speech' \
  -H 'accept: */*' \
  -H 'Content-Type: application/json' \
  -d '{
  "text": "अरे, तुम आज कैसे हो?"
}' -o hindi-tts-output.wav
```
### Specifying a Different Format

```bash specify_format.sh
curl -s -H "content-type: application/json" localhost:7860/v1/audio/speech -d '{"input": "Hey, how are you?", "response_type": "wav"}' -o audio.wav
```
### For Production (Docker)
- **Prerequisites**: Docker and Docker Compose
- **Steps**:
1. **Start the server**:
```bash
export HF_TOKEN=<YOUR-READ-TOKEN-HERE>
docker compose -f compose.yaml up -d
```
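
Because `docker compose up -d` returns before the model has finished loading, requests sent immediately afterwards may be refused. The following is a small readiness sketch: `wait_for_port` is a hypothetical helper, and port 10804 is an assumption taken from the other commands in this README, since the contents of `compose.yaml` are not shown here.

```python
import socket
import time


def wait_for_port(host="localhost", port=10804, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

An open port only shows that the server process is listening. It does not guarantee that the model is loaded, so the first request may still be slow.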
## Building Docker Image
Build the Docker image locally:
```bash
docker build -t dwani/tts-indic-server:latest .
```

## Run the Docker Image

```bash
docker run --runtime nvidia -it --rm -p 10804:10804 -e HF_TOKEN=$HF_TOKEN dwani/tts-indic-server:latest
```
## Contributing

We welcome contributions! Please read the CONTRIBUTING.md file for guidelines on how to contribute to this project.
You can also join the Discord group to collaborate.
- Reference
## Citations

```bibtex citation_1.bib
@misc{lacombe-etal-2024-parler-tts,
  author = {Yoach Lacombe and Vaibhav Srivastav and Sanchit Gandhi},
  title = {Parler-TTS},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/huggingface/parler-tts}}
}
```
```bibtex citation_2.bib
@misc{lyth2024natural,
  title = {Natural language guidance of high-fidelity text-to-speech with synthetic annotations},
  author = {Dan Lyth and Simon King},
  year = {2024},
  eprint = {2402.01912},
  archivePrefix = {arXiv},
  primaryClass = {cs.SD}
}

@misc{AI4Bharat_IndicF5_2025,
  author = {Praveen S V and Srija Anand and Soma Siddhartha and Mitesh M. Khapra},
  title = {IndicF5: High-Quality Text-to-Speech for Indian Languages},
  year = {2025},
  url = {https://github.com/AI4Bharat/IndicF5},
}
```