NE-ASR: Idu Mishmi (clk)
Romanized Idu Mishmi CTC ASR adapter for facebook/mms-1b-all.
Model description
This is a per-language CTC adapter for the MMS-1B model. The base model is frozen; only the language adapter (about 2.19 M parameters, roughly 8.8 MB on disk) and the CTC head are trained. Input is 16 kHz mono audio (up to 30 seconds). Output is a Romanized (Latin-script) transcript, lower-cased and NFC-normalized.
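As a rough sanity check on the figures above, the sketch below works out the sample budget for a 30-second clip at 16 kHz and the adapter's size if its weights are stored as 32-bit floats. The float32 storage format is an assumption (it is consistent with the stated 8.8 MB but not confirmed here), and the constant names are illustrative only.

```python
# Back-of-envelope numbers for the input limit and the adapter size.
SAMPLE_RATE_HZ = 16_000      # stated above: 16 kHz mono input
MAX_SECONDS = 30             # stated above: up to 30 seconds
ADAPTER_PARAMS = 2_190_000   # stated above: about 2.19 M parameters
BYTES_PER_PARAM = 4          # assumption: weights stored as float32

max_samples = SAMPLE_RATE_HZ * MAX_SECONDS
adapter_megabytes = ADAPTER_PARAMS * BYTES_PER_PARAM / 1_000_000

print(max_samples)                    # 480000
print(f"{adapter_megabytes:.2f} MB")  # 8.76 MB
```

Under the float32 assumption the parameter count alone accounts for about 8.76 MB, which matches the "roughly 8.8 MB" figure once file metadata is included.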
Datasets
- Training: sulabhkatiyar/ne-asr-clk
- Augmented training: sulabhkatiyar/ne-asr-clk-aug
| split | n |
|---|---|
| train | 2,742 |
| val | 113 |
| test | 89 |
The train count above refers to the augmented training set actually used for
fine-tuning; the un-augmented source corpus is available at the first dataset link.
Training
- Base model: facebook/mms-1b-all (frozen; only the language adapter and CTC head are trained)
- Epochs: 4
- Batch size: 32 (effective batch size 32 with gradient accumulation = 1)
- Learning rate: 1e-3
- Warmup ratio: 0.10
- Precision: bf16
- Augmentation: 3x speed perturbation only (pitch shift disabled for this tonal language), plus online SpecAugment
- Audio: 16 kHz mono; max input 30 seconds
- Hardware: AMD MI300X (ROCm)
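To make the schedule implied by these hyperparameters concrete, the sketch below derives the number of optimizer steps and warmup steps from the listed values and the augmented train count. It assumes the last partial batch of each epoch is kept and that warmup steps are rounded up; neither detail is stated here, and the `schedule` helper is hypothetical.

```python
import math

# Values taken from this card.
TRAIN_EXAMPLES = 2_742
BATCH_SIZE = 32
GRAD_ACCUM = 1
EPOCHS = 4
WARMUP_RATIO = 0.10


def schedule(n_examples, batch_size, grad_accum, epochs, warmup_ratio):
    """Return (steps_per_epoch, total_steps, warmup_steps).

    Assumptions: the last partial batch is kept and warmup steps are
    rounded up. Neither is confirmed by the card.
    """
    batches_per_epoch = math.ceil(n_examples / batch_size)
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return steps_per_epoch, total_steps, warmup_steps


steps_per_epoch, total_steps, warmup_steps = schedule(
    TRAIN_EXAMPLES, BATCH_SIZE, GRAD_ACCUM, EPOCHS, WARMUP_RATIO
)
print(steps_per_epoch, total_steps, warmup_steps)  # 86 344 35
```

Under these assumptions the run is short: 86 optimizer steps per epoch and 344 in total, of which about 35 are warmup.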
Evaluation
| split | n | WER% | CER% |
|---|---|---|---|
| val | 113 | 73.90 | 19.69 |
| test | 89 | 67.59 | 19.69 |
Reference processing: NFC + strip + lower. Hypothesis decoded with greedy CTC, no language model.
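The scoring recipe above can be sketched with the standard library alone. This is a minimal illustration, not the evaluation script used for the reported numbers: it assumes corpus-level aggregation (total edits divided by total reference units) and counts spaces as characters for CER, neither of which is stated here. The function names and the example strings are placeholders, not Idu Mishmi data.

```python
import unicodedata


def normalize(text: str) -> str:
    """NFC-normalize, strip surrounding whitespace, and lower-case."""
    return unicodedata.normalize("NFC", text).strip().lower()


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit: str) -> float:
    """Corpus-level error rate in percent; unit is "word" or "char"."""
    split = str.split if unit == "word" else list
    errors = total = 0
    for ref, hyp in zip(refs, hyps):
        ref_units = split(normalize(ref))  # only references are normalized
        hyp_units = split(hyp)
        errors += edit_distance(ref_units, hyp_units)
        total += len(ref_units)
    return 100.0 * errors / total


# Placeholder strings for illustration only.
refs = ["  Alpha Beta Gamma "]
hyps = ["alpha beta gama"]
wer = error_rate(refs, hyps, "word")
cer = error_rate(refs, hyps, "char")
print(f"WER {wer:.2f}%  CER {cer:.2f}%")  # WER 33.33%  CER 6.25%
```

A single misspelled word costs one of three words but only one of sixteen characters, which is the same effect that produces the large gap between WER and CER in the table above.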
Usage
from huggingface_hub import hf_hub_download
from transformers import Wav2Vec2CTCTokenizer, Wav2Vec2FeatureExtractor, Wav2Vec2Processor, Wav2Vec2ForCTC
from safetensors.torch import load_file
import torch, torchaudio
REPO = "sulabhkatiyar/ne-asr-clk"
BASE = "facebook/mms-1b-all"
# Tokenizer and processor come from the adapter repo
_ = hf_hub_download(REPO, "tokenizer_config.json")
_ = hf_hub_download(REPO, "vocab.json")
tokenizer = Wav2Vec2CTCTokenizer.from_pretrained(REPO, do_lower_case=False)
feat_ext = Wav2Vec2FeatureExtractor.from_pretrained(BASE)
processor = Wav2Vec2Processor(feature_extractor=feat_ext, tokenizer=tokenizer)
# Load the base model and attach the adapter weights
model = Wav2Vec2ForCTC.from_pretrained(
    BASE,
    vocab_size=len(tokenizer),
    pad_token_id=tokenizer.pad_token_id,
    ignore_mismatched_sizes=True,
)
model.init_adapter_layers()
adapter_path = hf_hub_download(REPO, "adapter.clk.safetensors")
missing, unexpected = model.load_state_dict(load_file(adapter_path), strict=False)
model.eval()
# Inference (the model expects 16 kHz mono float32)
wav, sr = torchaudio.load("your_audio.wav")
wav = wav.mean(dim=0)  # average channels to mono; shape becomes (num_samples,)
if sr != 16000:
    wav = torchaudio.functional.resample(wav, sr, 16000)
inputs = processor(wav.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits
pred_ids = logits.argmax(dim=-1)
transcription = processor.batch_decode(pred_ids)[0]
print(transcription)
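The argmax-then-decode step in the snippet above is greedy CTC decoding: take the most likely label per frame, collapse consecutive repeats, then drop the blank label. The sketch below shows that rule on plain Python lists. The `greedy_ctc_decode` helper and the toy vocabulary are hypothetical; the real vocabulary is the repo's `vocab.json`, and the tokenizer additionally handles its word-delimiter token, which this sketch skips.

```python
def greedy_ctc_decode(frame_ids, id_to_token, blank_id):
    """Collapse repeated frame labels, then drop blanks."""
    tokens = []
    previous = None
    for idx in frame_ids:
        # Emit a token only when the label changes and is not the blank.
        if idx != previous and idx != blank_id:
            tokens.append(id_to_token[idx])
        previous = idx
    return "".join(tokens)


# Hypothetical toy vocabulary; id 0 plays the role of the CTC blank.
toy_vocab = {0: "<pad>", 1: "a", 2: "b", 3: " "}
frames = [1, 1, 0, 1, 2, 2, 0, 3, 3, 2]
print(greedy_ctc_decode(frames, toy_vocab, blank_id=0))  # aab b
```

The blank between the two runs of `a` is what lets a doubled letter survive the collapse; without it, `1, 1, 1` would decode to a single `a`.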
Citation
If you use this model, please cite the MMS base model:
@article{pratap2023scaling,
title = {Scaling Speech Technology to 1,000+ Languages},
author = {Pratap, Vineel and Tjandra, Andros and Shi, Bowen and Tomasello, Paden and Babu, Arun and Kundu, Sayani and Elkahky, Ali and Ni, Zhaoheng and Vyas, Apoorv and Fazel-Zarandi, Maryam and Baevski, Alexei and Adi, Yossi and Zhang, Xiaohui and Hsu, Wei-Ning and Conneau, Alexis and Auli, Michael},
journal = {arXiv preprint arXiv:2305.13516},
year = {2023}
}
And the NE-ASR adapter release (placeholder; replace when the canonical publication is available):
@misc{katiyar2026neasr,
author = {Katiyar, Sulabh},
title = {NE-ASR: MMS-1B Adapters for Northeast Indian Languages},
year = {2026},
howpublished = {\url{/sulabhkatiyar/ne-asr-clk}},
note = {Placeholder citation; replace with the canonical publication when available.}
}
License
CC-BY-NC 4.0. This adapter is derived from facebook/mms-1b-all, which is released under CC-BY-NC 4.0. When you use this adapter, you must comply with the MMS license terms.