docs-indic-server
git clone https://github.com/dwani-ai/docs-indic-server.git
cd docs-indic-server
python -m venv --system-site-packages venv
source venv/bin/activate
pip install -r requirements.txt
pip install "numpy<2.0"
python src/server/docs_api.py --host 0.0.0.0 --port 7861
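The `numpy<2.0` pin is reapplied several times in this README, which suggests that later installs can pull in a newer NumPy. As a minimal sketch (not part of the repository), the check below reads the installed version without importing NumPy and reports whether the pin still holds. The function names are hypothetical helpers introduced only for illustration.

```python
from importlib import metadata
from typing import Optional


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '1.26.4'."""
    return int(version.split(".")[0])


def numpy_pin_ok(installed: Optional[str]) -> bool:
    """True when numpy is installed and its major version is below 2."""
    return installed is not None and major_version(installed) < 2


def installed_numpy_version() -> Optional[str]:
    """Look up the installed numpy version without importing numpy."""
    try:
        return metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_numpy_version()
    if numpy_pin_ok(found):
        print(f"numpy {found}: ok")
    else:
        # Hypothetical remediation hint; it mirrors the pin used in this README.
        print(f'numpy {found}: re-run pip install "numpy<2.0"')
```

Running it inside the activated virtual environment before starting the server shows whether a dependency install has replaced the pinned version.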
- Dependencies
- decord
cd
mkdir external
cd external
git clone --recursive https://github.com/dmlc/decord
cd decord
mkdir build && cd build
export CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda
export CUDACXX=/usr/local/cuda/bin/nvcc
nvcc --version
cmake .. -DUSE_CUDA=1 -DCMAKE_BUILD_TYPE=Release -DFFMPEG_DIR=/usr
cmake .. -DUSE_CUDA=1 -DCMAKE_BUILD_TYPE=Release
make
cd ../python
python3 setup.py install --user
pip install "numpy<2.0"
- olmocr
cd ../../
git clone https://github.com/allenai/olmocr.git
cd olmocr
pip install --upgrade pip setuptools wheel packaging
# copy pyproject.toml
pip install -e .
cd ../../dwani_org/gh-200-docs-indic-server/
pip install "numpy<2.0"
- ffmpeg
sudo apt update
sudo apt install ffmpeg
ffmpeg -version
sudo apt update
sudo apt install cmake ffmpeg build-essential python3-dev
sudo apt install pkg-config libavcodec-dev libavformat-dev libavutil-dev libswscale-dev libswresample-dev libavfilter-dev
sudo apt-get install poppler-utils
Docs-Indic-Server
Overview
Document parser for Indian languages
Table of Contents
- Features
- Getting Started - Development
- Downloading Model
- Running with FastAPI Server
- Evaluating Results
- Citations
Features
- Extract text from PDF - Single Page, Multiple, Full
- Extract text from Image
- Summarize text from Image/PDF
  - English
  - Kannada
  - German
- Recreate PDF -> Scanned doc to clean PDF
- Convert PDF ->
  - English to Kannada
  - Kannada to English
For Development
- Prerequisites: Python 3.6+
- Steps:
- Create a virtual environment:
python -m venv venv
- Activate the virtual environment:
source venv/bin/activate
On Windows, use:
venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
Backend Server - Select based on GPU VRAM
vllm serve google/gemma-3-4b-it
vllm serve reducto/RolmOCR
vllm serve google/gemma-3-12b-it
- For H100 only:
  - google/gemma-3-12b-it
- For A100 only:
  - google/gemma-3-12b-it
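The choice between the 4B and 12B models can be sketched as a small helper that reads total GPU memory from `nvidia-smi` and picks a model ID. This is only an illustration: the 40,000 MiB cutoff is an assumption chosen so that A100 and H100 cards map to the 12B model, and it is not a threshold stated by the project. The function names are hypothetical.

```python
import subprocess

# Model IDs as named in this README.
SMALL_MODEL = "google/gemma-3-4b-it"
LARGE_MODEL = "google/gemma-3-12b-it"

# ASSUMPTION: illustrative cutoff only, not taken from the project.
LARGE_MODEL_MIN_VRAM_MIB = 40_000


def pick_model(vram_mib: int, cutoff_mib: int = LARGE_MODEL_MIN_VRAM_MIB) -> str:
    """Return the larger model when the GPU has at least cutoff_mib of VRAM."""
    return LARGE_MODEL if vram_mib >= cutoff_mib else SMALL_MODEL


def parse_vram_mib(nvidia_smi_output: str) -> int:
    """Parse the first GPU's total memory (MiB) from nvidia-smi CSV output."""
    first_line = nvidia_smi_output.strip().splitlines()[0]
    return int(first_line.strip())


def detect_vram_mib() -> int:
    """Query nvidia-smi for total memory, one line per GPU, in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_vram_mib(result.stdout)


if __name__ == "__main__":
    try:
        print("vllm serve", pick_model(detect_vram_mib()))
    except (FileNotFoundError, subprocess.CalledProcessError, ValueError, IndexError):
        print("Could not read GPU memory; choose a model manually.")
```

The script only prints a suggested `vllm serve` command; it does not start the backend, so the cutoff can be adjusted and re-checked safely.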
Running with FastAPI Server
python src/server/docs_api_dwani.py --port 7860 --host 0.0.0.0
GPU server setup
- Terminal 1
git clone https://github.com/slabstech/docs-indic-server.git
cd docs-indic-server
chmod +x install-script.sh
bash install-script.sh
export HF_TOKEN='YOUR-HF-TOKEN'
export HF_HOME=/home/ubuntu/data-dhwani-models
vllm serve google/gemma-3-4b-it
- Terminal 2
cd docs-indic-server
source venv/bin/activate
export HF_TOKEN='YOUR-HF-TOKEN'
export HF_HOME=/home/ubuntu/data-dhwani-models
python src/server/docs_api.py --port 7860 --host 0.0.0.0
- Terminal 3
git clone https://github.com/slabstech/indic-translate-server
cd indic-translate-server
python3.10 -m venv venv
source venv/bin/activate
pip install -r server-requirements.txt
export HF_TOKEN='YOUR-HF-TOKEN'
export HF_HOME=/home/ubuntu/data-dhwani-models
huggingface-cli download ai4bharat/indictrans2-indic-en-dist-200M
huggingface-cli download ai4bharat/indictrans2-en-indic-dist-200M
python src/server/translate_api.py --port 7861 --host 0.0.0.0 --device cuda --use_distilled
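Once all three terminals are running, it can help to confirm that each service is accepting connections. The sketch below is not part of the repository. Ports 7860 and 7861 come from the commands above; port 8000 is assumed because `vllm serve` listens there by default when no `--port` is given. The service names are labels chosen for illustration.

```python
import socket

# 7860 and 7861 are taken from the commands in this README.
# ASSUMPTION: 8000 is vLLM's default port, since no --port flag is passed above.
SERVICES = {
    "vllm": 8000,
    "docs_api": 7860,
    "translate_api": 7861,
}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "127.0.0.1", services=SERVICES) -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: is_listening(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name}: {'up' if up else 'down'}")
```

An open port only shows that a process is listening, not that the model has finished loading, so a "down" result shortly after startup may simply mean the server is still initializing.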
- For Kannada PDFs, add the Noto Sans Kannada font from Google:
https://fonts.google.com/noto/specimen/Noto+Sans+Kannada
https://github.com/googlefonts/noto-fonts
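A quick way to confirm the font is present is to look for its files on disk. The sketch below is a hedged illustration only: the `NotoSansKannada` file-name prefix and the searched directories are assumptions based on common Linux font locations, and how the server itself loads the font is not described in this README.

```python
from pathlib import Path

# ASSUMPTION: downloaded font files start with this prefix.
FONT_PREFIX = "NotoSansKannada"

# ASSUMPTION: typical Linux font directories; adjust for your system.
DEFAULT_FONT_DIRS = [
    Path.home() / ".fonts",
    Path.home() / ".local" / "share" / "fonts",
    Path("/usr/share/fonts"),
    Path("/usr/local/share/fonts"),
]


def find_kannada_fonts(font_dirs=DEFAULT_FONT_DIRS, prefix=FONT_PREFIX):
    """Return all .ttf files under font_dirs whose name starts with prefix."""
    matches = []
    for directory in font_dirs:
        directory = Path(directory)
        if directory.is_dir():
            # rglob searches the directory and all of its subdirectories.
            matches.extend(sorted(directory.rglob(f"{prefix}*.ttf")))
    return matches


if __name__ == "__main__":
    fonts = find_kannada_fonts()
    if fonts:
        for font in fonts:
            print(font)
    else:
        print("Noto Sans Kannada not found in the searched directories.")
```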
Contributing
We welcome contributions! Please read the CONTRIBUTING.md file for guidelines on how to contribute to this project.
You can also join the Discord group to collaborate.
Citations
```bibtex citation_1.bib
@misc{poznanski2025olmocrunlockingtrillionstokens,
  title={olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models},
  author={Jake Poznanski and Jon Borchardt and Jason Dunkelberger and Regan Huff and Daniel Lin and Aman Rangapur and Christopher Wilhelm and Kyle Lo and Luca Soldaini},
  year={2025},
  eprint={2502.18443},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.18443},
}
```
<!--
wget https://raw.githubusercontent.com/slabstech/docs-indic-server/01e811210d56e655091313c1df8481d11e7640a6/install-script.sh
chmod +x install-script.sh
bash install-script.sh
## Download Qwen VL
```bash download_model.sh
huggingface-cli download google/gemma-3-4b-it
```
## Download Gemma
```bash download_model.sh
huggingface-cli download google/gemma-3-4b-it
```
## Download Pixtral
```bash download_model.sh
huggingface-cli download mistralai/Pixtral-12B-2409
```
## Download Moondream2
```bash
huggingface-cli download vikhyatk/moondream2
```
## Getting Started - Development
- For moondream, the libvips system library is required:
```bash
sudo apt-get update && sudo apt-get install libvips
```
## Evaluating Results
You can evaluate the results using curl commands. The examples below send text to the `/v1/audio/speech` endpoint and save the returned audio to a file.
#### Kannada
```bash kannada_example.sh
curl -s -H "content-type: application/json" localhost:7860/v1/audio/speech -d '{"input": "ಉದ್ಯಾನದಲ್ಲಿ ಮಕ್ಕಳ ಆಟವಾಡುತ್ತಿದ್ದಾರೆ ಮತ್ತು ಪಕ್ಷಿಗಳು ಚಿಲಿಪಿಲಿ ಮಾಡುತ್ತಿವೆ."}' -o audio_kannada.mp3
```
#### Hindi
```bash hindi_example.sh
curl -s -H "content-type: application/json" localhost:7860/v1/audio/speech -d '{"input": "अरे, तुम आज कैसे हो?"}' -o audio_hindi.mp3
```
#### Specifying a Different Format
```bash specify_format.sh
curl -s -H "content-type: application/json" localhost:7860/v1/audio/speech -d '{"input": "Hey, how are you?", "response_type": "wav"}' -o audio.wav
```
### For Production (Docker)
- **Prerequisites**: Docker and Docker Compose
- **Steps**:
1. **Start the server**:
For GPU
```bash
docker compose -f compose.yaml up -d
```
For CPU only
```bash
docker compose -f cpu-compose.yaml up -d
```
# - vllm serve vikhyatk/moondream2 --trust-remote-code
## Building Docker Image
Build the Docker image locally:
```bash
docker build -t slabstech/docs_indic_server -f Dockerfile .
```
### Run the Docker Image
```bash
docker run --gpus all -it --rm -p 7860:7860 slabstech/docs_indic_server
```
-->