
Two crashes when parsing PDFs #26

@TACIXAT


Both files are from Govdocs:

000899.pdf
001940.pdf

Parsing PDF obj 62 0
Traceback (most recent call last):
  File "/usr/local/bin/polyfile", line 11, in <module>
    load_entry_point('polyfile===0.1.6-git', 'console_scripts', 'polyfile')()
  File "/home/taxicat/.local/lib/python3.6/site-packages/polyfile-0.1.6_git-py3.6.egg/polyfile/__main__.py", line 99, in main
    for match in matcher.match(file_path, progress_callback=progress_callback, trid_defs=trid_defs):
  File "/home/taxicat/.local/lib/python3.6/site-packages/polyfile-0.1.6_git-py3.6.egg/polyfile/polyfile.py", line 178, in match
    yield from submatch_iter
  File "/home/taxicat/.local/lib/python3.6/site-packages/polyfile-0.1.6_git-py3.6.egg/polyfile/pdf.py", line 296, in submatch
    yield from parse_pdf(file_stream, matcher=self.matcher, parent=self)
  File "/home/taxicat/.local/lib/python3.6/site-packages/polyfile-0.1.6_git-py3.6.egg/polyfile/pdf.py", line 290, in parse_pdf
    yield from parse_object(file_stream, object, matcher=matcher, parent=parent)
  File "/home/taxicat/.local/lib/python3.6/site-packages/polyfile-0.1.6_git-py3.6.egg/polyfile/pdf.py", line 118, in parse_object
    yield from _emit_dict(oPDFParseDictionary.parsed, obj, parent.offset)
  File "/home/taxicat/.local/lib/python3.6/site-packages/polyfile-0.1.6_git-py3.6.egg/polyfile/pdf.py", line 38, in _emit_dict
    value_end = value[-1].offset.offset + len(value[-1].token)
IndexError: list index out of range

Parsing PDF obj 424 0
Traceback (most recent call last):
  File "/usr/local/bin/polyfile", line 11, in <module>
    load_entry_point('polyfile===0.1.6-git', 'console_scripts', 'polyfile')()
  File "/home/taxicat/.local/lib/python3.6/site-packages/polyfile-0.1.6_git-py3.6.egg/polyfile/__main__.py", line 99, in main
    for match in matcher.match(file_path, progress_callback=progress_callback, trid_defs=trid_defs):
  File "/home/taxicat/.local/lib/python3.6/site-packages/polyfile-0.1.6_git-py3.6.egg/polyfile/polyfile.py", line 178, in match
    yield from submatch_iter
  File "/home/taxicat/.local/lib/python3.6/site-packages/polyfile-0.1.6_git-py3.6.egg/polyfile/pdf.py", line 296, in submatch
    yield from parse_pdf(file_stream, matcher=self.matcher, parent=self)
  File "/home/taxicat/.local/lib/python3.6/site-packages/polyfile-0.1.6_git-py3.6.egg/polyfile/pdf.py", line 290, in parse_pdf
    yield from parse_object(file_stream, object, matcher=matcher, parent=parent)
  File "/home/taxicat/.local/lib/python3.6/site-packages/polyfile-0.1.6_git-py3.6.egg/polyfile/pdf.py", line 118, in parse_object
    yield from _emit_dict(oPDFParseDictionary.parsed, obj, parent.offset)
  File "/home/taxicat/.local/lib/python3.6/site-packages/polyfile-0.1.6_git-py3.6.egg/polyfile/pdf.py", line 61, in _emit_dict
    ''.join(v.token for v in value),
  File "/home/taxicat/.local/lib/python3.6/site-packages/polyfile-0.1.6_git-py3.6.egg/polyfile/pdf.py", line 61, in <genexpr>
    ''.join(v.token for v in value),
AttributeError: 'str' object has no attribute 'token'
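
Reading the two tracebacks, it looks like `_emit_dict` sometimes receives a `value` that is an empty list (so `value[-1]` raises `IndexError`) and sometimes a plain `str` (so iterating it yields characters, which have no `.token`). Below is a minimal sketch of guards for both cases. It is not the actual polyfile code: `Offset`, `Token`, `value_text`, and `value_end` are hypothetical names I made up for illustration, and the fallback offset is an assumption about what a sensible default would be.

```python
from typing import List, NamedTuple, Union


class Offset(NamedTuple):
    # Hypothetical stand-in for the parser's offset wrapper (exposes .offset).
    offset: int


class Token(NamedTuple):
    # Hypothetical stand-in for a parsed PDF token.
    token: str
    offset: Offset


Value = Union[str, List[Token]]


def value_text(value: Value) -> str:
    """Return the text of a dictionary value, whether it is a str or a token list."""
    if isinstance(value, str):
        # Second crash: iterating a str yields characters, which have no .token.
        return value
    return ''.join(v.token for v in value)


def value_end(value: Value, fallback: int) -> int:
    """Return the end offset of a value, or `fallback` when it cannot be computed."""
    if isinstance(value, str) or not value:
        # First crash: value[-1] raises IndexError on an empty list.
        # A bare str carries no offset information, so fall back as well.
        return fallback
    last = value[-1]
    return last.offset.offset + len(last.token)
```

With guards like these, an empty or string-valued dictionary entry would be reported with a fallback offset instead of aborting the whole parse. Whether a fallback is the right behavior, or whether the tokenizer should never produce these values in the first place, is a question for the maintainers.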
