59 commits
e6a2bcf
Add LightOnOCR model implementation
baptiste-aubertin Oct 15, 2025
ed388e6
fix modular docstring error
baptiste-aubertin Oct 15, 2025
7f037a7
Improve LightOnOCR documentation and exports
baptiste-aubertin Oct 15, 2025
22f3288
Rename LightOnOCR multi-modal projector to vision projection and add …
baptiste-aubertin Oct 16, 2025
cd9fa3a
fix load without lmhead in safetensor
baptiste-aubertin Oct 16, 2025
3c653a6
temp
baptiste-aubertin Oct 16, 2025
03c0cf0
Refactor LightOnOCR config to use sub_configs pattern
baptiste-aubertin Oct 18, 2025
ba10af5
rename processor kwargs
baptiste-aubertin Oct 18, 2025
524b6a4
Refactor LightOnOCR processor to use effective patch size
baptiste-aubertin Oct 18, 2025
dab7772
Improve LightOnOCR generation support with proper KV cache handling
baptiste-aubertin Oct 18, 2025
73642e6
add modeling tests and compile modular
baptiste-aubertin Oct 18, 2025
1bf1c89
Clean up LightOnOCR code and remove unused variables
baptiste-aubertin Oct 18, 2025
7462ef7
Add LightOnOCR documentation and test improvements
baptiste-aubertin Oct 18, 2025
cdcbb62
Refactor LightOnOCR to use standardized RopeParameters and consolidat…
baptiste-aubertin Oct 27, 2025
010cbda
Rename LightOnOCR model classes and fix config parameter naming
baptiste-aubertin Oct 27, 2025
3940608
Add missing parameter documentation for LightOnOCR config
baptiste-aubertin Oct 27, 2025
bdcc07b
Simplify LightOnOCR forward methods with decorators and fix loss func…
baptiste-aubertin Oct 27, 2025
d212e26
Reorganize LightOnOCR components to place vision before text and remo…
baptiste-aubertin Oct 27, 2025
53d50f1
fixup
baptiste-aubertin Oct 27, 2025
d237007
Fix image token expansion logic in Processor
staghado Oct 28, 2025
b095a01
Copy pixtral attention to have both pixtral and qwen eager attention …
baptiste-aubertin Oct 28, 2025
54389f2
remove LightOnOCRTextPreTrainedModel from modular to be able to retur…
baptiste-aubertin Oct 28, 2025
6edc435
Support both tensor and list formats for image_sizes parameter
baptiste-aubertin Oct 29, 2025
8b5be9f
Update tests/models/lightonocr/test_processor_lightonocr.py
baptiste-aubertin Oct 29, 2025
da44e3f
Update docs/source/en/model_doc/lightonocr.md
baptiste-aubertin Oct 29, 2025
fcee5eb
Move image_sizes tensor conversion from model to processor
baptiste-aubertin Oct 29, 2025
00e2546
Simplify weight initialization to use uniform text_config initializer…
baptiste-aubertin Oct 29, 2025
974bbf1
rename 1 letter vars
baptiste-aubertin Oct 29, 2025
ab4c94f
Get image special tokens from tokenizer attributes in processor
baptiste-aubertin Oct 29, 2025
895bdfb
Return BaseModelOutputWithPast from LightOnOCRModel forward
baptiste-aubertin Oct 29, 2025
7d2b1bd
Add chat template to LightOnOCR processor test setup
baptiste-aubertin Oct 29, 2025
a1e4e19
rm get_output_embeddings from LightOnOCRForConditionalGeneration (not…
baptiste-aubertin Oct 29, 2025
019d601
Add OCR integration test for LightOnOCR model
baptiste-aubertin Oct 29, 2025
8caf104
Fix device/dtype handling in LightOnOCR vision processing
baptiste-aubertin Oct 29, 2025
ee7b019
Add TransformersKwargs type hints to LightOnOCR forward methods
baptiste-aubertin Oct 29, 2025
26148cf
Make torch imports conditional and use _from_config for LightOnOCR su…
baptiste-aubertin Oct 30, 2025
7ce262d
Set patch_size at runtime instead of modifying class defaults in Ligh…
baptiste-aubertin Oct 30, 2025
c904c24
type kwargs
baptiste-aubertin Oct 31, 2025
663e0b9
Remove loocr forward comments
baptiste-aubertin Oct 31, 2025
09c1e9a
Add vocab_size property and fix image_token_id in LightOnOCR
baptiste-aubertin Oct 31, 2025
4aa257e
Add vocab_size setter to LightOnOCR configuration
baptiste-aubertin Oct 31, 2025
d4a3d3b
Fix device mismatch in vision rotary embeddings and optimize test ima…
baptiste-aubertin Oct 31, 2025
b9982c8
Improve LightOnOCR integration test with similarity-based output vali…
baptiste-aubertin Oct 31, 2025
d67bd3f
Enable flex attention
baptiste-aubertin Oct 31, 2025
91a8c3f
Enable flex attention
baptiste-aubertin Nov 3, 2025
b0a71fc
Loocr description with blogpost
baptiste-aubertin Nov 5, 2025
8b47aa9
redundant tie_word_embeddings
baptiste-aubertin Nov 5, 2025
803d661
remove architecture from default config
baptiste-aubertin Nov 5, 2025
9874da7
vocab_size accessors
baptiste-aubertin Nov 5, 2025
a905c30
remove useless tensor conversion
baptiste-aubertin Nov 5, 2025
dc41e1c
remove useless conversion
baptiste-aubertin Nov 5, 2025
d5eff32
move dtype conversion to after image feature extraction
baptiste-aubertin Nov 5, 2025
1860d4d
remove useless stuff
baptiste-aubertin Nov 5, 2025
e7aeaad
fixup
baptiste-aubertin Nov 6, 2025
5f9998d
export text and vision config classes
baptiste-aubertin Nov 12, 2025
ea6281b
refactor(lightonocr): remove unused weight initialization and fix tie…
baptiste-aubertin Nov 18, 2025
9d084d8
fix(lightonocr): fix test failures for vocab_size access and device p…
baptiste-aubertin Nov 20, 2025
c5f0285
ruff
baptiste-aubertin Nov 20, 2025
1f91d31
fix mistake tokenizer
baptiste-aubertin Nov 24, 2025
2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
@@ -1072,6 +1072,8 @@
  title: LayoutXLM
- local: model_doc/lfm2_vl
  title: LFM2-VL
- local: model_doc/lightonocr
  title: LightOnOCR
- local: model_doc/lilt
  title: LiLT
- local: model_doc/llama4
66 changes: 66 additions & 0 deletions docs/source/en/model_doc/lightonocr.md
@@ -0,0 +1,66 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the
License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->
*This model was released on {release_date} and added to Hugging Face Transformers on 2025-11-18.*

# LightOnOCR


**LightOnOCR** is a compact, end-to-end vision–language model for Optical Character Recognition (OCR) and document understanding. It achieves state-of-the-art accuracy in its weight class while being several times faster and cheaper than larger general-purpose VLMs.

📝 **[Read the full blog post](https://huggingface.co/blog/lightonai/lightonocr/)** | 📓 **[Finetuning notebook](https://colab.research.google.com/drive/1WjbsFJZ4vOAAlKtcCauFLn_evo5UBRNa?usp=sharing)**

## Model overview

LightOnOCR combines a Vision Transformer encoder (Pixtral-based) with a lightweight text decoder (Qwen3-based) distilled from high-quality open VLMs. It is optimized for document-parsing tasks and produces accurate, layout-aware text extraction from high-resolution pages.

## LightOnOCRConfig

[[autodoc]] LightOnOCRConfig

## LightOnOCRTextConfig

[[autodoc]] LightOnOCRTextConfig

## LightOnOCRVisionConfig

[[autodoc]] LightOnOCRVisionConfig
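The three configuration classes above nest: per the commit history, `LightOnOCRConfig` follows the sub-configs pattern and gained `vocab_size` accessors. The following is a minimal sketch of that delegation idea using plain dataclasses, not the real implementation; `ToyTextConfig`, `ToyVisionConfig`, `ToyCompositeConfig` and all of their fields and defaults are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTextConfig:
    # Hypothetical decoder settings; the real text config has many more fields.
    vocab_size: int = 1000
    hidden_size: int = 64


@dataclass
class ToyVisionConfig:
    # Hypothetical encoder settings.
    patch_size: int = 14
    hidden_size: int = 32


@dataclass
class ToyCompositeConfig:
    # Each instance gets its own freshly built sub-configs when none are supplied.
    text_config: ToyTextConfig = field(default_factory=ToyTextConfig)
    vision_config: ToyVisionConfig = field(default_factory=ToyVisionConfig)

    @property
    def vocab_size(self) -> int:
        # Read-through: the top-level value mirrors the decoder's vocabulary size.
        return self.text_config.vocab_size

    @vocab_size.setter
    def vocab_size(self, value: int) -> None:
        # Write-through: assigning at the top level updates the decoder config.
        self.text_config.vocab_size = value
```

Keeping the value only on the text sub-config avoids two copies drifting apart: reading or assigning `vocab_size` on the composite object always touches the decoder's setting.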

## LightOnOCRProcessor

[[autodoc]] LightOnOCRProcessor
    - __call__

## LightOnOCRTextModel

[[autodoc]] LightOnOCRTextModel
    - forward

## LightOnOCRVisionModel

[[autodoc]] LightOnOCRVisionModel
    - forward

## LightOnOCRModel

[[autodoc]] LightOnOCRModel
    - forward

## LightOnOCRForConditionalGeneration

[[autodoc]] LightOnOCRForConditionalGeneration
    - forward
2 changes: 2 additions & 0 deletions src/transformers/models/auto/configuration_auto.py
@@ -229,6 +229,7 @@
("lfm2_moe", "Lfm2MoeConfig"),
("lfm2_vl", "Lfm2VlConfig"),
("lightglue", "LightGlueConfig"),
("lightonocr", "LightOnOCRConfig"),
("lilt", "LiltConfig"),
("llama", "LlamaConfig"),
("llama4", "Llama4Config"),
@@ -665,6 +666,7 @@
("lfm2_moe", "Lfm2Moe"),
("lfm2_vl", "Lfm2Vl"),
("lightglue", "LightGlue"),
("lightonocr", "LightOnOCR"),
("lilt", "LiLT"),
("llama", "LLaMA"),
("llama2", "Llama2"),
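Conceptually, these two entries let the auto classes turn the `model_type` string stored in a checkpoint's `config.json` into a config class name and a human-readable model name. The sketch below models that lookup with plain `OrderedDict`s; `CONFIG_NAMES`, `MODEL_NAMES` and `resolve_config_name` are hypothetical stand-ins, not the actual transformers internals, which additionally import the resolved class lazily.

```python
from collections import OrderedDict

# Hypothetical excerpts mirroring the two hunks above:
# model_type -> config class name, and model_type -> display name.
CONFIG_NAMES = OrderedDict(
    [
        ("lightglue", "LightGlueConfig"),
        ("lightonocr", "LightOnOCRConfig"),
        ("lilt", "LiltConfig"),
    ]
)
MODEL_NAMES = OrderedDict(
    [
        ("lightglue", "LightGlue"),
        ("lightonocr", "LightOnOCR"),
        ("lilt", "LiLT"),
    ]
)


def resolve_config_name(model_type: str) -> str:
    """Return the config class name registered for a checkpoint's model_type."""
    try:
        return CONFIG_NAMES[model_type]
    except KeyError:
        raise ValueError(f"Unrecognized model_type {model_type!r}") from None
```

Without the new `("lightonocr", ...)` entries, a LightOnOCR checkpoint would fail at this first lookup step before any weights are touched.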
2 changes: 2 additions & 0 deletions src/transformers/models/auto/modeling_auto.py
@@ -229,6 +229,7 @@ class _BaseModelWithGenerate(PreTrainedModel, GenerationMixin):
("lfm2_moe", "Lfm2MoeModel"),
("lfm2_vl", "Lfm2VlModel"),
("lightglue", "LightGlueForKeypointMatching"),
("lightonocr", "LightOnOCRModel"),
("lilt", "LiltModel"),
("llama", "LlamaModel"),
("llama4", "Llama4ForConditionalGeneration"),
@@ -1004,6 +1005,7 @@ class _BaseModelWithGenerate(PreTrainedModel, GenerationMixin):
("kosmos-2", "Kosmos2ForConditionalGeneration"),
("kosmos-2.5", "Kosmos2_5ForConditionalGeneration"),
("lfm2_vl", "Lfm2VlForConditionalGeneration"),
("lightonocr", "LightOnOCRForConditionalGeneration"),
("llama4", "Llama4ForConditionalGeneration"),
("llava", "LlavaForConditionalGeneration"),
("llava_next", "LlavaNextForConditionalGeneration"),
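The same `model_type` is registered twice: once in the base-model mapping (`LightOnOCRModel`, whose forward returns a `BaseModelOutputWithPast` according to the commit history) and once in a second mapping whose neighbours are all `...ForConditionalGeneration` classes. Which class an auto loader returns therefore depends on the mapping it consults. The sketch below illustrates that per-task dispatch; `REGISTRIES` and `resolve_model_class` are hypothetical names, and the task keys are illustrative rather than transformers' own mapping names.

```python
# Hypothetical per-task registries mirroring the two hunks above.
REGISTRIES = {
    "base": {
        "lfm2_vl": "Lfm2VlModel",
        "lightonocr": "LightOnOCRModel",
    },
    "generation": {
        "lfm2_vl": "Lfm2VlForConditionalGeneration",
        "lightonocr": "LightOnOCRForConditionalGeneration",
    },
}


def resolve_model_class(model_type: str, task: str) -> str:
    """Look up the class name registered for model_type in the given task's registry."""
    registry = REGISTRIES[task]
    if model_type not in registry:
        raise ValueError(f"{model_type!r} has no class registered for task {task!r}")
    return registry[model_type]
```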
1 change: 1 addition & 0 deletions src/transformers/models/auto/processing_auto.py
@@ -96,6 +96,7 @@
("layoutlmv2", "LayoutLMv2Processor"),
("layoutlmv3", "LayoutLMv3Processor"),
("lfm2_vl", "Lfm2VlProcessor"),
("lightonocr", "LightOnOCRProcessor"),
("llama4", "Llama4Processor"),
("llava", "LlavaProcessor"),
("llava_next", "LlavaNextProcessor"),
7 changes: 7 additions & 0 deletions src/transformers/models/auto/tokenization_auto.py
@@ -368,6 +368,13 @@
("led", ("LEDTokenizer", "LEDTokenizerFast" if is_tokenizers_available() else None)),
("lfm2", (None, "PreTrainedTokenizerFast" if is_tokenizers_available() else None)),
("lfm2_vl", (None, "PreTrainedTokenizerFast" if is_tokenizers_available() else None)),
(
"lightonocr",
(
"Qwen2Tokenizer",
"Qwen2TokenizerFast" if is_tokenizers_available() else None,
),
),
("lilt", ("LayoutLMv3Tokenizer", "LayoutLMv3TokenizerFast" if is_tokenizers_available() else None)),
(
"llama",
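The new tokenizer entry reuses Qwen's tokenizer classes, consistent with the Qwen3-based decoder, and lists the fast class only when the `tokenizers` library is importable (otherwise that slot is `None`). The sketch below illustrates a fast-first fallback policy over such a `(slow, fast)` pair; `lightonocr_entry` and `pick_tokenizer` are hypothetical helpers that mirror the spirit of the entry, not the actual `AutoTokenizer` resolution code, which handles more cases.

```python
from typing import Optional, Tuple

# Hypothetical shape of one mapping entry: (slow class name, fast class name or None).
TokenizerPair = Tuple[Optional[str], Optional[str]]


def lightonocr_entry(tokenizers_available: bool) -> TokenizerPair:
    # Mirrors the entry above: the fast class is listed only if `tokenizers` is installed.
    return ("Qwen2Tokenizer", "Qwen2TokenizerFast" if tokenizers_available else None)


def pick_tokenizer(pair: TokenizerPair, use_fast: bool = True) -> str:
    """Prefer the fast class when requested and available, else fall back to the slow one."""
    slow, fast = pair
    if use_fast and fast is not None:
        return fast
    if slow is not None:
        return slow
    raise ValueError("No tokenizer class available for this entry")
```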
28 changes: 28 additions & 0 deletions src/transformers/models/lightonocr/__init__.py
@@ -0,0 +1,28 @@
# Copyright 2024 The Qwen Team and The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING

from ...utils import _LazyModule
from ...utils.import_utils import define_import_structure


if TYPE_CHECKING:
    from .configuration_lightonocr import *
    from .modeling_lightonocr import *
    from .processing_lightonocr import *
else:
    import sys

    _file = globals()["__file__"]
    sys.modules[__name__] = _LazyModule(__name__, _file, define_import_structure(_file), module_spec=__spec__)
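Under `TYPE_CHECKING`, static analyzers see the real star-imports; at runtime the package replaces itself in `sys.modules` with a `_LazyModule`, so submodules such as `modeling_lightonocr` (and the heavy dependencies they pull in) are only imported when one of their names is first accessed. The sketch below shows the underlying mechanism with a `types.ModuleType` subclass; `ToyLazyModule` is hypothetical and far simpler than transformers' `_LazyModule`, which builds its name-to-module map with `define_import_structure`.

```python
import importlib
import types


class ToyLazyModule(types.ModuleType):
    """Hypothetical minimal lazy module: exported names are imported on first access."""

    def __init__(self, name: str, attr_to_module: dict):
        super().__init__(name)
        # Maps each exported attribute name to the module that defines it.
        self._attr_to_module = attr_to_module

    def __getattr__(self, attr: str):
        # Python only calls this when normal lookup fails, i.e. on first access.
        mapping = self.__dict__.get("_attr_to_module", {})
        if attr not in mapping:
            raise AttributeError(f"module {self.__name__!r} has no attribute {attr!r}")
        value = getattr(importlib.import_module(mapping[attr]), attr)
        setattr(self, attr, value)  # cache it so later lookups bypass __getattr__
        return value
```

For example, `ToyLazyModule("toy_pkg", {"sqrt": "math"})` does not touch `math` until `.sqrt` is read; assigning such an object to `sys.modules["toy_pkg"]` plays the same role as the last line of `__init__.py`.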