TTS - Text to Speech
- https://github.com/dwani-ai/tts-indic-server
git clone https://github.com/dwani-ai/tts-indic-server
cd tts-indic-server
git checkout gh-200
export HF_TOKEN='this-my-token'
python -m venv venv
source venv/bin/activate
pip install wheel packaging
pip install -r requirements.txt
pip uninstall torch torchaudio torchvision
pip install torch==2.7.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
python src/gh200/main.py --host 0.0.0.0 --port 7864 --config config_two
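Once the server has been launched with `--port 7864`, a plain TCP check is a quick way to confirm that it is accepting connections. The sketch below uses only the Python standard library. The helper name `is_listening` is illustrative and not part of this project, and the check assumes it runs on the same machine as the server (`0.0.0.0` binds all interfaces, so `127.0.0.1` is reachable locally). It does not call any HTTP route, since the routes are not described here.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Port 7864 matches the --port value in the launch command.
    status = "up" if is_listening("127.0.0.1", 7864) else "not reachable"
    print(f"TTS server on port 7864: {status}")
```

A successful connection only shows that something is listening on the port; it says nothing about whether the model has finished loading.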
TTS Indic Server
Overview
Text to Speech (TTS) for Indian languages using the ai4bharat/IndicF5 model.
Table of Contents
- Live Server
- Usage
- How to Use the Service
- Getting Started - Development
- For Development (Local)
- Downloading Indic TTS Model
- Running with FastAPI Server
- Evaluating Results
- Examples
- Specifying a Different Format
- Playing Back the Audio
- Describing the Voice
- Building Docker Image
- Run the Docker Image
- Available Speakers
- Tips
- Description Examples
- Citations
Live Server
A hosted Text to Speech (TTS) service is available for verifying the accuracy of speech generation.
Getting Started
For Development (Local)
- Prerequisites: Python 3.10, Ubuntu 22.04
- Steps:
- Create a virtual environment:
python -m venv venv
- Activate the virtual environment:
source venv/bin/activate
On Windows, use:
venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
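Before installing dependencies, a short preflight check can catch a mismatched interpreter early. This is a minimal sketch, not part of the project: `check_environment` is a hypothetical helper, the strict 3.10 comparison simply mirrors the prerequisite listed above (other versions may work), and the `HF_TOKEN` check assumes the model download needs authentication, as the `export HF_TOKEN` step elsewhere in this README suggests.

```python
import os
import sys


def check_environment(version_info=sys.version_info, env=os.environ):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # The README lists Python 3.10 as the prerequisite.
    if tuple(version_info[:2]) != (3, 10):
        problems.append(
            f"Python 3.10 expected, found {version_info[0]}.{version_info[1]}"
        )
    # Assumption: a Hugging Face token is needed to fetch the model.
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```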
Downloading Indic TTS Model
```bash download_model.sh
huggingface-cli download ai4bharat/IndicF5
```
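The same download can be done from Python, which is convenient inside a setup script. The sketch below assumes a recent `huggingface_hub` release in which `snapshot_download` accepts a `token` argument; `download_kwargs` and `download_model` are illustrative names, not part of this repository. The token is read from `HF_TOKEN` and only passed when it is set.

```python
import os


def download_kwargs(repo_id="ai4bharat/IndicF5", env=os.environ):
    """Build keyword arguments for huggingface_hub.snapshot_download."""
    kwargs = {"repo_id": repo_id}
    token = env.get("HF_TOKEN")
    if token:
        kwargs["token"] = token  # only passed when a token is configured
    return kwargs


def download_model():
    """Download the model snapshot into the local Hugging Face cache."""
    # Imported lazily so this file can be loaded without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs())
```

Calling `download_model()` returns the local path of the cached snapshot, so a later `from_pretrained` call for the same repository should not need to download the files again.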
### Local Model Run
```python
from transformers import AutoModel
import numpy as np
import soundfile as sf

# Load IndicF5 from Hugging Face
repo_id = "ai4bharat/IndicF5"
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)

# Generate speech, conditioned on a reference audio clip and its transcript
audio = model(
    "ಬೆಂಗಳೂರು ಕರ್ನಾಟಕ ರಾಜ್ಯದ ರಾಜಧಾನಿ ಆಗಿದೆ, ಕರ್ನಾಟಕದಲ್ಲಿ ನಾವು ಕನ್ನಡ ಮಾತನಾಡುತ್ತೇವೆ",
    ref_audio_path="prompts/KAN_F_HAPPY_00001.wav",
    ref_text="ನಮ್ ಫ್ರಿಜ್ಜಲ್ಲಿ ಕೂಲಿಂಗ್ ಸಮಸ್ಯೆ ಆಗಿ ನಾನ್ ಭಾಳ ದಿನದಿಂದ ಒದ್ದಾಡ್ತಿದ್ದೆ, ಆದ್ರೆ ಅದ್ನೀಗ ಮೆಕಾನಿಕ್ ಆಗಿರೋ ನಿಮ್ ಸಹಾಯ್ದಿಂದ ಬಗೆಹರಿಸ್ಕೋಬೋದು ಅಂತಾಗಿ ನಿರಾಳ ಆಯ್ತು ನಂಗೆ."
)

# Normalize int16 output to float32 and save as a 24 kHz WAV file
if audio.dtype == np.int16:
    audio = audio.astype(np.float32) / 32768.0
sf.write("namaste.wav", np.array(audio, dtype=np.float32), samplerate=24000)
```
- Or run the Python script:
python tts_indic_f5.py
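After a run, it can help to confirm that the output file has the expected 24000 Hz sample rate and a plausible duration. The sketch below uses the standard-library `wave` module, which reads PCM WAV files only, so it assumes the output was written as 16-bit PCM; for a floating-point WAV, `soundfile.info` would be the tool to use instead. `wav_info` and `write_silence` are illustrative helpers, and the silent file merely stands in for real model output.

```python
import os
import tempfile
import wave


def wav_info(path):
    """Return (sample_rate, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        duration = wf.getnframes() / rate
    return rate, channels, duration


def write_silence(path, seconds=1.0, rate=24000):
    """Write a mono 16-bit silent WAV, used here only as a stand-in file."""
    n_frames = int(seconds * rate)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * n_frames)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "check.wav")
        write_silence(demo_path)
        print(wav_info(demo_path))
```

To inspect real output, call `wav_info("namaste.wav")` on the file produced by the model run.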
Contributing
We welcome contributions! Please read the CONTRIBUTING.md file for guidelines on how to contribute to this project.
You can also join the Discord group to collaborate.
- Reference
Citations
```bibtex citation_1.bib
@misc{lacombe-etal-2024-parler-tts,
  author = {Yoach Lacombe and Vaibhav Srivastav and Sanchit Gandhi},
  title = {Parler-TTS},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/huggingface/parler-tts}}
}
```
```bibtex citation_2.bib
@misc{lyth2024natural,
  title = {Natural language guidance of high-fidelity text-to-speech with synthetic annotations},
  author = {Dan Lyth and Simon King},
  year = {2024},
  eprint = {2402.01912},
  archivePrefix = {arXiv},
  primaryClass = {cs.SD}
}
@misc{AI4Bharat_IndicF5_2025,
  author = {Praveen S V and Srija Anand and Soma Siddhartha and Mitesh M. Khapra},
  title = {IndicF5: High-Quality Text-to-Speech for Indian Languages},
  year = {2025},
  url = {https://github.com/AI4Bharat/IndicF5},
}
```