Image-to-Text
Transformers
Safetensors
mistral3
text-generation
ocr
document-understanding
vision-language
pdf
tables
forms
Eval Results
🇪🇺 Region: EU
How to use lightonai/LightOnOCR-1B-1025 with Transformers:

    # Use a pipeline as a high-level helper
    # Warning: Pipeline type "image-to-text" is no longer supported in transformers v5.
    # You must load the model directly (see below) or downgrade to v4.x with:
    #   pip install "transformers<5.0.0"
    from transformers import pipeline

    pipe = pipeline("image-to-text", model="lightonai/LightOnOCR-1B-1025")

    # Load model directly
    from transformers import AutoProcessor, AutoModelForSeq2SeqLM

    processor = AutoProcessor.from_pretrained("lightonai/LightOnOCR-1B-1025")
    model = AutoModelForSeq2SeqLM.from_pretrained("lightonai/LightOnOCR-1B-1025")
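
The warning in the snippet above says the "image-to-text" pipeline type is unavailable from transformers v5 onward. A minimal sketch of how a script might check this before choosing a loading path is shown here. The helper names `pipeline_task_available` and `installed_transformers_version` are hypothetical and not part of transformers, and the rule "major version below 5" is taken only from that warning.

```python
from importlib import metadata
from typing import Optional


def pipeline_task_available(version: str) -> bool:
    """Return True if this transformers version predates v5.

    Assumption: per the warning above, the "image-to-text" pipeline
    type is only usable on 4.x and earlier releases.
    """
    major = int(version.split(".")[0])
    return major < 5


def installed_transformers_version() -> Optional[str]:
    """Return the installed transformers version, or None if it is absent."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found is None:
        print("transformers is not installed")
    elif pipeline_task_available(found):
        print(f"transformers {found}: the pipeline helper may be usable")
    else:
        print(f"transformers {found}: load the model directly instead")
```

The check reads only package metadata, so it does not import transformers or download any weights.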
| Dataset | Task | Score | Notes |
| --- | --- | --- | --- |
| allenai/olmOCR-bench | overall | 76.1 | Excluding Headers & Footers category |
| allenai/olmOCR-bench | arxiv_math | 81.4 | |
| allenai/olmOCR-bench | old_scans_math | 71.6 | |
| allenai/olmOCR-bench | table_tests | 76.4 | |
| allenai/olmOCR-bench | old_scans | 35.2 | |
| allenai/olmOCR-bench | multi_column | 80.0 | |
| allenai/olmOCR-bench | long_tiny_text | 88.7 | |
| allenai/olmOCR-bench | headers_footers | 35.5 | Instead of removing headers and footers, our model is trained for full-page transcription and explicitly rewards their presence (via flipped RLVR tests), which lowers this score under the original benchmark scoring. |
| allenai/olmOCR-bench | baseline | 99.6 | |

Source for all rows: LightOnOCR technical report (/papers/2601.14251), submitted by user Bapt120.