Bug
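When converting an Excel document with complex tables, Docling fails to extract the tables correctly: the exported Markdown contains unevaluated formulas such as =(A11+1), and some cell values are repeated several times.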
Steps to reproduce
from docling.document_converter import DocumentConverter
source = "./excel-tests.xlsx"  # local path to the attached workbook
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())  # actual output is pasted in the Output section
excel-tests.xlsx
Output
| HIGH VOLTAGE SWITCHBOARD |
| DATA SHEET |
| MODEL-000 |
| 2016 |
| Page 1 of 10 |
| Power system |
| 1 |
| =(A11+1) |
| =(A12+1) |
| =(A13+1) |
| =(A14+1) |
| =(A15+1) |
| Construction |
| 7 |
| =(A18+1) |
| =(A19+1) |
| Environmental conditions |
| =(A20+1) |
| =(A22+1) |
| =(A23+1) |
| =(A24+1) |
| Arc test |
| =(A25+1) |
| Notes |
| 1 |
| 2 |
| Rated system voltage |
| Rated system frequency |
| No. of phases |
| System earthing |
| Earth fault current |
| Control voltage supply |
| kV |
: |
130 (131 Um, 132 AC, 133 BIL) |
| Hz |
: |
134 |
|
: |
3 |
| : |
Solidly Earthed |
| : |
135 kA |
| : |
2 x 136V AC UPS 1 x 137V AC normal |
| A |
: |
135 kA |
|
: |
2 x 136V AC UPS 1 x 137V AC normal |
| Metal-enclosed partition |
| VT for cable discharging |
| Voltage and Current measurement |
| No |
| Low Power Instrument Transformers |
| Hazardous area classification |
| Ambient temp. |
| Location |
| Humidity |
| : |
Non hazardous |
| : |
Min. -5, max. +40 |
| : |
Indoor |
| : |
100 |
| Converted to 110VDC |
Converted to 110VDC |
Converted to 110VDC |
Converted to 110VDC |
Converted to 110VDC |
| Arc test (type test) |
Arc test (type test) |
Arc test (type test) |
Arc test (type test) |
Arc test (type test) |
| None |
None |
None |
None |
None |
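To make the formula problem easier to spot in long exports, here is a minimal sketch that scans the exported Markdown for cells that look like raw Excel formulas. It assumes cells are delimited by `|` as in the output above, and that any cell starting with `=` is an unevaluated formula. The helper name `find_formula_cells` is hypothetical and not part of Docling.

```python
import re

# Assumption: a cell whose text starts with "=" is an unevaluated Excel formula.
FORMULA_CELL = re.compile(r"^=.+")


def find_formula_cells(markdown: str) -> list[tuple[int, str]]:
    """Return (line_number, cell_text) for each table cell that looks like a raw formula."""
    hits = []
    for line_no, line in enumerate(markdown.splitlines(), start=1):
        if "|" not in line:
            continue  # not a table row
        for cell in line.split("|"):
            text = cell.strip()
            if FORMULA_CELL.match(text):
                hits.append((line_no, text))
    return hits


# Small excerpt of the output above, used as sample input.
sample = "| Power system |\n| 1 |\n| =(A11+1) |\n| =(A12+1) |"
print(find_formula_cells(sample))
```

In the reproduction script, the same helper could be called on the string returned by `result.document.export_to_markdown()`.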
Docling version
Docling version: 2.17.0
Docling Core version: 2.16.0
Docling IBM Models version: 3.3.0
Docling Parse version: 3.1.2
Python: cpython-310 (3.10.7)
Platform: Windows-10-10.0.19045-SP0
Python version
Python 3.10.7
Final Considerations
I understand that the table is complex, so I would like to know what requirements an Excel document must meet to work with Docling. Digging into the code, I noticed this:
Hope it helps,
Let me know if you need more information.
Have a nice day!