NE-ASR: Idu Mishmi (clk)

Romanized Idu Mishmi CTC ASR adapter for facebook/mms-1b-all.

Model description

This is a per-language CTC adapter for the MMS-1B model. The base model is frozen; only the language adapter (about 2.19 M parameters, roughly 8.8 MB on disk) and the CTC head are trained. Input is 16 kHz mono audio (up to 30 seconds). Output is a Romanized (Latin-script) transcript, lower-cased and NFC-normalized.
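The two size figures above agree with each other if the adapter weights are stored as 32-bit floats. That dtype is an assumption here; the card does not state it. A quick check of the arithmetic:

```python
# Assumption: weights are stored as float32 (4 bytes each); the card does not state the dtype.
ADAPTER_PARAMS = 2.19e6   # approximate trainable parameter count quoted in the card
BYTES_PER_PARAM = 4       # float32

size_mb = ADAPTER_PARAMS * BYTES_PER_PARAM / 1e6   # decimal megabytes
print(f"{size_mb:.2f} MB")   # 8.76 MB, which rounds to the quoted ~8.8 MB
```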

Datasets

Training: sulabhkatiyar/ne-asr-clk
Augmented training: sulabhkatiyar/ne-asr-clk-aug

split	n
train	2,742
val	113
test	89

The train count above refers to the augmented training set (sulabhkatiyar/ne-asr-clk-aug) that was actually used for fine-tuning; the un-augmented source corpus is available as sulabhkatiyar/ne-asr-clk.
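The Training section lists the augmentation as 3x speed perturbation. The card gives neither the speed factors nor the tool used, so the sketch below is an illustration only: the factors (0.9, 1.0, 1.1) and the helper names `speed_perturb` and `augment` are hypothetical. If "3x" means three speed variants per source utterance, 2,742 would correspond to 914 source utterances (2,742 = 3 × 914), but the card does not confirm this.

```python
FACTORS = (0.9, 1.0, 1.1)   # hypothetical speed factors; the card only says "3x"

def speed_perturb(samples, factor):
    """Resample a waveform by linear interpolation so it plays `factor` times faster."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    n_out = max(1, int(round(len(samples) / factor)))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * factor              # fractional read position in the input
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out

def augment(samples):
    """Return one speed-perturbed copy of the waveform per factor."""
    return [speed_perturb(samples, f) for f in FACTORS]
```

Note that resampling in this way changes pitch together with tempo. Whether the augmentation actually used preserved pitch is not stated in the card.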

Training

Base model: facebook/mms-1b-all (frozen; only the language adapter and CTC head are trained)
Epochs: 4
Batch size: 32 (effective batch size 32 with gradient accumulation = 1)
Learning rate: 1e-3
Warmup ratio: 0.10
Precision: bf16
Augmentation: 3x speed perturbation only (pitch shift disabled for this tonal language), plus online SpecAugment
Audio: 16 kHz mono; max input 30 seconds
Hardware: AMD MI300X (ROCm)
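The settings above imply a fairly short run. The sketch below derives the number of optimizer steps and warmup steps from the listed values. It assumes that the last partial batch of each epoch is kept and that warmup steps are rounded up; neither detail is stated in the card.

```python
import math

TRAIN_EXAMPLES = 2742   # augmented training set size from the Datasets table
BATCH_SIZE = 32
GRAD_ACCUM = 1
EPOCHS = 4
WARMUP_RATIO = 0.10

# Assumption: the final partial batch of each epoch is kept, not dropped.
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / (BATCH_SIZE * GRAD_ACCUM))
total_steps = steps_per_epoch * EPOCHS
# Assumption: warmup steps are rounded up; a trainer that rounds down gives one fewer.
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)

print(steps_per_epoch, total_steps, warmup_steps)   # 86 344 35
```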

Evaluation

split	n	WER%	CER%
val	113	73.90	19.69
test	89	67.59	19.69

Reference processing: NFC + strip + lower. Hypothesis decoded with greedy CTC, no language model.
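A minimal sketch of how such scores could be reproduced is given below. It applies the reference processing described above and computes corpus-level WER and CER from Levenshtein distance. Corpus-level aggregation (total edits divided by total reference units) is an assumption, as the card does not say how scores were aggregated, and the function names are illustrative. The example strings are English placeholders, not Idu Mishmi data.

```python
import unicodedata

def normalize(text):
    """NFC-normalize, strip surrounding whitespace, and lower-case."""
    return unicodedata.normalize("NFC", text).strip().lower()

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def error_rate(refs, hyps, unit="word"):
    """Error rate in percent: total edits / total reference units (assumed aggregation)."""
    edits = total = 0
    for ref, hyp in zip(refs, hyps):
        ref = normalize(ref)   # hypotheses are used exactly as decoded
        r = ref.split() if unit == "word" else list(ref)
        h = hyp.split() if unit == "word" else list(hyp)
        edits += edit_distance(r, h)
        total += len(r)
    return 100.0 * edits / total if total else 0.0

refs = ["  Hello World "]   # placeholder reference
hyps = ["hello word"]       # placeholder hypothesis
print(error_rate(refs, hyps, "word"))            # 50.0
print(f"{error_rate(refs, hyps, 'char'):.2f}")   # 9.09
```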

Usage

from huggingface_hub import hf_hub_download
from transformers import Wav2Vec2CTCTokenizer, Wav2Vec2FeatureExtractor, Wav2Vec2Processor, Wav2Vec2ForCTC
from safetensors.torch import load_file
import torch, torchaudio

REPO = "sulabhkatiyar/ne-asr-clk"
BASE = "facebook/mms-1b-all"

# Tokenizer and processor come from the adapter repo
_ = hf_hub_download(REPO, "tokenizer_config.json")
_ = hf_hub_download(REPO, "vocab.json")
tokenizer = Wav2Vec2CTCTokenizer.from_pretrained(REPO, do_lower_case=False)
feat_ext  = Wav2Vec2FeatureExtractor.from_pretrained(BASE)
processor = Wav2Vec2Processor(feature_extractor=feat_ext, tokenizer=tokenizer)

# Load the base model and attach the adapter weights
model = Wav2Vec2ForCTC.from_pretrained(
    BASE,
    vocab_size=len(tokenizer),
    pad_token_id=tokenizer.pad_token_id,
    ignore_mismatched_sizes=True,
)
model.init_adapter_layers()
adapter_path = hf_hub_download(REPO, "adapter.clk.safetensors")
missing, unexpected = model.load_state_dict(load_file(adapter_path), strict=False)
model.eval()

# Inference (expects 16 kHz mono float32)
wav, sr = torchaudio.load("your_audio.wav")
wav = wav.mean(dim=0)  # mix down to mono; leaves single-channel audio unchanged
if sr != 16000:
    wav = torchaudio.functional.resample(wav, sr, 16000)
inputs = processor(wav.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits
pred_ids = logits.argmax(dim=-1)
transcription = processor.batch_decode(pred_ids)[0]
print(transcription)
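The card gives 30 seconds as the maximum input length, so longer recordings need to be split before inference. The helper below is a hypothetical sketch and not part of the released code. It returns fixed-length chunk boundaries in samples; cutting at fixed positions may split a word, so splitting at silences would likely give better transcripts.

```python
SAMPLE_RATE = 16000   # Hz, as required by the model
MAX_SECONDS = 30      # maximum input length stated in the card

def chunk_bounds(num_samples, max_seconds=MAX_SECONDS, sample_rate=SAMPLE_RATE):
    """Return (start, end) sample indices of consecutive chunks of at most max_seconds."""
    max_len = max_seconds * sample_rate
    return [(start, min(start + max_len, num_samples))
            for start in range(0, num_samples, max_len)]

# A 70-second recording yields two full chunks and one 10-second remainder.
print(chunk_bounds(70 * SAMPLE_RATE))
# [(0, 480000), (480000, 960000), (960000, 1120000)]
```

Each chunk can then be sliced from the waveform along its time axis, transcribed as in the inference code above, and the partial transcripts joined with spaces.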

Citation

If you use this model, please cite the MMS base model:

@article{pratap2023scaling,
  title   = {Scaling Speech Technology to 1,000+ Languages},
  author  = {Pratap, Vineel and Tjandra, Andros and Shi, Bowen and Tomasello, Paden and Babu, Arun and Kundu, Sayani and Elkahky, Ali and Ni, Zhaoheng and Vyas, Apoorv and Fazel-Zarandi, Maryam and Baevski, Alexei and Adi, Yossi and Zhang, Xiaohui and Hsu, Wei-Ning and Conneau, Alexis and Auli, Michael},
  journal = {arXiv preprint arXiv:2305.13516},
  year    = {2023}
}

And the NE-ASR adapter release (placeholder; replace when the canonical publication is available):

@misc{katiyar2026neasr,
  author       = {Katiyar, Sulabh},
  title        = {NE-ASR: MMS-1B Adapters for Northeast Indian Languages},
  year         = {2026},
  howpublished = {\url{/sulabhkatiyar/ne-asr-clk}},
  note         = {Placeholder citation; replace with the canonical publication when available.}
}

License

CC-BY-NC 4.0. This adapter is derived from facebook/mms-1b-all, which is released under CC-BY-NC 4.0. When you use this adapter, you must comply with the MMS license terms.
