msexcel_backend.py doesn’t parse complex Excel tables properly. #834

@rafaelsanchezsouza

Bug

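When converting an Excel document that contains complex tables, Docling fails to extract the tables correctly: the export is a flat list of cell values, formulas come out as raw text (for example =(A11+1)), and some values are repeated several times on one line.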

Steps to reproduce

from docling.document_converter import DocumentConverter

source = "./excel-tests.xlsx"  # document per local path or URL
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())  # prints the text shown under "Output" below

excel-tests.xlsx
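
To check what the workbook itself contains, independently of Docling, here is a minimal sketch that uses only the Python standard library. It reads one worksheet part straight from the .xlsx archive and lists the formula cells (with their cached values, if any) and the merged ranges. The helper name inspect_sheet and the default part path xl/worksheets/sheet1.xml are assumptions for illustration; the actual part name can differ between workbooks.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by worksheet parts inside an .xlsx archive.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def inspect_sheet(xlsx, sheet_path="xl/worksheets/sheet1.xml"):
    """Return (formula_cells, merged_ranges) for one worksheet part.

    `xlsx` is a path or a binary file-like object. `sheet_path` is an
    assumed default; check the archive listing if the part is named differently.
    """
    with zipfile.ZipFile(xlsx) as zf:
        root = ET.fromstring(zf.read(sheet_path))

    formulas = {}
    for cell in root.iterfind(".//m:sheetData/m:row/m:c", NS):
        f = cell.find("m:f", NS)
        if f is not None:
            v = cell.find("m:v", NS)
            # <f> holds the formula text (stored without the leading "=",
            # and possibly empty for shared formulas); <v> holds the value
            # cached by the application that last saved the file.
            formulas[cell.get("r")] = (f.text, v.text if v is not None else None)

    merged = [
        mc.get("ref")
        for mc in root.iterfind(".//m:mergeCells/m:mergeCell", NS)
    ]
    return formulas, merged
```

Calling inspect_sheet("./excel-tests.xlsx") would show whether cells such as A12 carry both a formula and a cached value, and which ranges are merged, which helps separate what the file contains from what the backend chooses to export.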

Output

HIGH VOLTAGE SWITCHBOARD
DATA SHEET
MODEL-000
2016
Page 1 of 10
Package no.:
13156456
Doc. no.:
144564
Rev.
A
Power system
1
=(A11+1)
=(A12+1)
=(A13+1)
=(A14+1)
=(A15+1)
Construction
7
=(A18+1)
=(A19+1)
Environmental conditions
=(A20+1)
=(A22+1)
=(A23+1)
=(A24+1)
Arc test
=(A25+1)
Notes
1
2
Rated system voltage
Rated system frequency
No. of phases
System earthing
Earth fault current
Control voltage supply
kV : 130 (131 Um, 132 AC, 133 BIL)
Hz : 134
: 3
: Solidly Earthed
: 135 kA
: 2 x 136V AC UPS 1 x 137V AC normal
A : 135 kA
: 2 x 136V AC UPS 1 x 137V AC normal
Metal-enclosed partition
VT for cable discharging
Voltage and Current measurement
No
Low Power Instrument Transformers
Hazardous area classification
Ambient temp.
Location
Humidity
: Non hazardous
: Min. -5, max. +40
: Indoor
: 100
Converted to 110VDC Converted to 110VDC Converted to 110VDC Converted to 110VDC Converted to 110VDC
Arc test (type test) Arc test (type test) Arc test (type test) Arc test (type test) Arc test (type test)
None None None None None
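
Two symptoms stand out in the output above: formulas exported as raw text and the same value repeated several times on one line (possibly from merged cells, though that is a guess). The sketch below is a hypothetical helper, summarize_export, that scans exported text for both patterns so the problem can be quantified. It is a diagnostic on the output only and does not use any Docling API.

```python
import re

# Matches lines that look like an unevaluated Excel formula, e.g. "=(A11+1)".
FORMULA_RE = re.compile(r"^=\(?[A-Z]{1,3}\d+")


def summarize_export(text):
    """Return (formula_lines, repeated_values) found in exported text.

    repeated_values is a list of (value, count) for lines that consist of
    the same word sequence repeated two or more times.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    formulas = [ln for ln in lines if FORMULA_RE.match(ln)]

    repeated = []
    for ln in lines:
        words = ln.split()
        # Try the shortest repeating unit first.
        for size in range(1, len(words) // 2 + 1):
            if len(words) % size:
                continue
            chunk = words[:size]
            if all(words[i:i + size] == chunk
                   for i in range(size, len(words), size)):
                repeated.append((" ".join(chunk), len(words) // size))
                break
    return formulas, repeated
```

For example, passing the Markdown string from export_to_markdown() to summarize_export returns the lines that still contain formula text and the values that appear duplicated.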

Docling version

Docling version: 2.17.0
Docling Core version: 2.16.0
Docling IBM Models version: 3.3.0
Docling Parse version: 3.1.2
Python: cpython-310 (3.10.7)
Platform: Windows-10-10.0.19045-SP0

Python version

Python 3.10.7

Final Considerations

I understand that the table is complex, so I would like to know what the requirements are for an Excel document to work with Docling. Digging into the code, I noticed this:

Hope it helps,

Let me know if you need more information.

Have a nice day!

Labels

bug (Something isn't working), enhancement (New feature or request), xlsx (issue related to xlsx backend)
