TTS Indic Server

Overview

Text-to-speech (TTS) server for Indian languages, built on the ai4bharat/IndicF5 model.

Getting Started

For Development (Local)

  • Prerequisites: Python 3.10, Ubuntu 22.04
  • Steps:
  • Create a virtual environment:
    python3.10 -m venv venv
    
  • Activate the virtual environment:
    source venv/bin/activate
    
    On Windows, use:
    venv\Scripts\activate
    
  • Install dependencies:
    pip install -r requirements.txt
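
Before installing, it can help to confirm that the interpreter and virtual environment match the prerequisites. The sketch below is illustrative only: the helper names are hypothetical, and it assumes "Python 3.10" means the major.minor version must be exactly 3.10.

```python
import sys

REQUIRED = (3, 10)  # assumption: the prerequisite means exactly Python 3.10


def python_matches(version=None, required=REQUIRED):
    """Return True when the interpreter's major.minor equals `required`."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) == tuple(required)


def in_virtualenv():
    """Return True inside a venv, where sys.prefix differs from sys.base_prefix."""
    return sys.prefix != sys.base_prefix
```

Calling `python_matches()` and `in_virtualenv()` from the activated environment should both return `True` if the setup steps above were followed.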
    

Downloading TTS Models

The model is hosted in AI4Bharat's Hugging Face repository, and you must request access before downloading it:

  • Log in to your Hugging Face account.
  • Request access to the model at https://huggingface.co/ai4bharat/IndicF5
  • Create a Read token for your account (see https://huggingface.co/docs/hub/security-tokens) and export it:
    export HF_TOKEN=<YOUR-READ-TOKEN-HERE>
    
  • Download the model:
    hf download ai4bharat/IndicF5
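
The same download can be scripted from Python. This is a minimal sketch, assuming the `huggingface_hub` package is installed and that access to the gated repository has already been granted. The function names `read_hf_token` and `download_model` are hypothetical. `snapshot_download` is the library call that fetches a repository into the local cache.

```python
import os


def read_hf_token(env=None):
    """Return the token from HF_TOKEN, or raise if it is missing or empty."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export a Read token first")
    return token


def download_model(repo_id="ai4bharat/IndicF5"):
    """Download the repo into the Hugging Face cache and return the local path."""
    # Imported lazily so the token check works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, token=read_hf_token())
```

Calling `download_model()` fails early with a clear message when the token is missing, rather than with an authorization error partway through the download.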

Local Model Run

from transformers import AutoModel
import numpy as np
import soundfile as sf

# Load IndicF5 from Hugging Face (trust_remote_code=True runs the repo's custom model code)
repo_id = "ai4bharat/IndicF5"
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)


# Generate speech
audio = model(
    "ಬೆಂಗಳೂರು ಕರ್ನಾಟಕ ರಾಜ್ಯದ ರಾಜಧಾನಿ ಆಗಿದೆ, ಕರ್ನಾಟಕದಲ್ಲಿ ನಾವು ಕನ್ನಡ ಮಾತನಾಡುತ್ತೇವೆ",
    ref_audio_path="prompts/KAN_F_HAPPY_00001.wav",
    ref_text="ನಮ್‌ ಫ್ರಿಜ್ಜಲ್ಲಿ  ಕೂಲಿಂಗ್‌ ಸಮಸ್ಯೆ ಆಗಿ ನಾನ್‌ ಭಾಳ ದಿನದಿಂದ ಒದ್ದಾಡ್ತಿದ್ದೆ, ಆದ್ರೆ ಅದ್ನೀಗ ಮೆಕಾನಿಕ್ ಆಗಿರೋ ನಿಮ್‌ ಸಹಾಯ್ದಿಂದ ಬಗೆಹರಿಸ್ಕೋಬೋದು ಅಂತಾಗಿ ನಿರಾಳ ಆಯ್ತು ನಂಗೆ."
)


# Convert int16 PCM to float32 in [-1, 1) if needed, then save at 24 kHz
if audio.dtype == np.int16:
    audio = audio.astype(np.float32) / 32768.0
sf.write("namaste.wav", np.array(audio, dtype=np.float32), samplerate=24000)
  • Or run the Python script:
    cd src/server
    python tts_indic_f5.py
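
To sanity-check the generated file, its header can be read with Python's standard `wave` module. This sketch assumes the output is a PCM WAV file (`wave.open` raises `wave.Error` for formats it cannot parse). The helper name `wav_info` is hypothetical.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return rate, wav.getnchannels(), frames / rate
```

For the snippet above, `wav_info("namaste.wav")` should report a sample rate of 24000, matching the `samplerate` passed to `sf.write`.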
    

Running with FastAPI Server

Start the FastAPI server:

  python src/server/main.py --host 0.0.0.0 --port 10804
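
When scripting against the server, it is useful to wait until the port accepts connections before sending requests. The following is a sketch using only the standard library. `wait_for_port` is a hypothetical helper, and an open port only shows that something is listening, not that the model is ready to serve requests.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=10804, timeout=60.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds; return False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

The default port matches the `--port 10804` argument used in the command above.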

Evaluating Results

With the server running, you can test speech generation using curl. The examples below request Kannada and Hindi audio and save each response as a WAV file.

Kannada

```bash kannada_example.sh
curl -X 'POST' \
  'http://localhost:10804/v1/audio/speech' \
  -H 'accept: */*' \
  -H 'Content-Type: application/json' \
  -d '{
  "text": "ಉದ್ಯಾನದಲ್ಲಿ ಮಕ್ಕಳ ಆಟವಾಡುತ್ತಿದ್ದಾರೆ ಮತ್ತು ಪಕ್ಷಿಗಳು ಚಿಲಿಪಿಲಿ ಮಾಡುತ್ತಿವೆ."
}' -o kannada-tts-out.wav
```

#### Hindi

```bash hindi_example.sh
curl -X 'POST' \
  'http://localhost:10804/v1/audio/speech' \
  -H 'accept: */*' \
  -H 'Content-Type: application/json' \
  -d '{
  "text": "अरे, तुम आज कैसे हो?"
}' -o hindi-tts-output.wav
```
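
The curl calls above can also be issued from Python with the standard library. This sketch mirrors the request shown in the examples (same endpoint, headers, and `text` field). The function names are hypothetical, and it assumes the server returns the audio bytes directly in the response body, as the `-o` flag in the curl examples suggests.

```python
import json
import urllib.request

BASE_URL = "http://localhost:10804"  # host and port used in the curl examples


def build_speech_request(text, base_url=BASE_URL):
    """Build the same POST request that the curl examples send."""
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=body,
        headers={"accept": "*/*", "Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path, base_url=BASE_URL, timeout=300):
    """Send the request and write the response body to out_path."""
    request = build_speech_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

For example, `synthesize("अरे, तुम आज कैसे हो?", "hindi-tts-output.wav")` corresponds to the Hindi curl command. `urlopen` raises `urllib.error.HTTPError` on a non-2xx response, so a failed request does not leave an error message saved as a `.wav` file.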

Specifying a Different Format

```bash specify_format.sh
curl -s -H "content-type: application/json" \
  localhost:10804/v1/audio/speech \
  -d '{"input": "Hey, how are you?", "response_type": "wav"}' \
  -o audio.wav
```

### For Production (Docker)
- **Prerequisites**: Docker and Docker Compose
- **Steps**:
  1. **Start the server**:
  ```bash
  export HF_TOKEN=<YOUR-READ-TOKEN-HERE>
  docker compose -f compose.yaml up -d
  ```


## Building Docker Image
Build the Docker image locally:
```bash
docker build -t dwani/tts-indic-server:latest .
```

## Run the Docker Image
```bash
docker run --runtime nvidia -it --rm -p 10804:10804 -e HF_TOKEN=$HF_TOKEN dwani/tts-indic-server:latest
```

Contributing

We welcome contributions! Please read the CONTRIBUTING.md file for guidelines on how to contribute to this project.

You can also join the Discord group to collaborate.

Citations

```bibtex citation_1.bib
@misc{lacombe-etal-2024-parler-tts,
  author = {Yoach Lacombe and Vaibhav Srivastav and Sanchit Gandhi},
  title = {Parler-TTS},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/huggingface/parler-tts}}
}
```

```bibtex citation_2.bib
@misc{lyth2024natural,
  title = {Natural language guidance of high-fidelity text-to-speech with synthetic annotations},
  author = {Dan Lyth and Simon King},
  year = {2024},
  eprint = {2402.01912},
  archivePrefix = {arXiv},
  primaryClass = {cs.SD}
}

@misc{AI4Bharat_IndicF5_2025,
  author       = {Praveen S V and Srija Anand and Soma Siddhartha and Mitesh M. Khapra},
  title        = {IndicF5: High-Quality Text-to-Speech for Indian Languages},
  year         = {2025},
  url          = {https://github.com/AI4Bharat/IndicF5},
}
```