Exception on invalid xml. #145

@rillian

Some logging output got into my TEI files, and HookTest crashes with an unhandled exception rather than reporting the error:

  File "${HOME}/HookTest/HookTest/capitains_units/cts.py", line 434, in auto_rng
    xml = parse(self.path)
  File "src/lxml/etree.pyx", line 3435, in lxml.etree.parse
  File "src/lxml/parser.pxi", line 1840, in lxml.etree._parseDocument
  File "src/lxml/parser.pxi", line 1866, in lxml.etree._parseDocumentFromURL
  File "src/lxml/parser.pxi", line 1770, in lxml.etree._parseDocFromFile
  File "src/lxml/parser.pxi", line 1163, in lxml.etree._BaseParser._parseDocFromFile
  File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 640, in lxml.etree._raiseParseError
  File "tests/repo1/data/hafez/divan/hafez.divan.perseus-eng1.xml", line 1
lxml.etree.XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1

To reproduce, prepend the string 'Garbage text\n' to the beginning of a test file, e.g. tests/repo1/data/hafez/divan/hafez.divan.perseus-eng1.xml.
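The reproduction step can be scripted. This is a minimal sketch; `prepend_garbage` is a hypothetical helper name, not part of HookTest, and it modifies the file in place, so it should be run on a copy of the test data.

```python
def prepend_garbage(path, garbage="Garbage text\n"):
    """Prepend non-XML text to a file, in place (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as handle:
        original = handle.read()
    with open(path, "w", encoding="utf-8") as handle:
        # The parser now sees plain text before the first '<'.
        handle.write(garbage + original)
```

Calling it on a copy of the file named above and then running HookTest against that copy should trigger the error.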

The XMLSyntaxError is hidden by the imap_unordered call through the pool and surfaces instead as a MaybeEncodingError, because lxml.etree's _ListErrorLog can't be pickled. Flattening the parallel iterator to a serial one reveals the underlying issue.
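One possible way around the pickling failure is to catch the parse error inside the worker and send back only plain data. The sketch below is not HookTest's code: `safe_parse` and `check_all` are hypothetical names, and the standard library's `xml.etree.ElementTree` stands in for lxml so the example is self-contained. With lxml, the exception to catch would be `lxml.etree.XMLSyntaxError`.

```python
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree
from multiprocessing import Pool


def safe_parse(path):
    """Parse one file and return a picklable (path, ok, message) tuple."""
    try:
        ET.parse(path)
    except ET.ParseError as err:  # with lxml: lxml.etree.XMLSyntaxError
        # Convert the error to a string so that only plain data,
        # not the exception object, crosses the process boundary.
        return (path, False, str(err))
    return (path, True, "")


def check_all(paths, processes=2):
    """Run safe_parse over paths in a process pool (hypothetical driver)."""
    with Pool(processes) as pool:
        return list(pool.imap_unordered(safe_parse, paths))
```

Because each worker returns a tuple of strings and a bool, the pool has nothing unpicklable to transfer, and the caller can report the parse message per file.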
