fix(files): stop emitting DeprecationWarning at import edgar time#832

Merged
dgunning merged 4 commits into
dgunning:mainfrom
kevinchiu:fix/silence-internal-deprecations
May 26, 2026

Conversation

@kevinchiu
Contributor

Symptom

python -W error -c "import edgar" fails in a clean venv:

DeprecationWarning: edgar.files.html_documents module (including
HtmlDocument) is deprecated and will be removed in v6.0…

Downstream test suites that run under -W error (a common, recommended
pytest setup) cannot import edgar at all without installing manual
warnings.filterwarnings(…) calls just to silence three of edgartools'
own internal imports.

Root cause

edgar/files/html_documents.py, edgar/files/html.py, and
edgar/files/htmltools.py each emit a top-level
warnings.warn(…, DeprecationWarning). edgartools' own startup cascade
imports all three — for example:

edgar/__init__.py
  → edgar._filings
      → edgar._markdown
          → edgar.files.html_documents   # warning fires here

So the warning fires on every import edgar, with the caller being an
internal edgartools module. Worse, after the first import the module is
cached, so users who later access the deprecated API directly never get
the deprecation signal — the warning is loud where it shouldn't be and
silent where it should.
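The caching effect described above can be reproduced in isolation. The following is a minimal sketch, assuming a throwaway module named `legacy_mod` (a hypothetical stand-in, not edgartools code) that warns at module top; it counts the DeprecationWarnings seen on each successive import.

```python
import importlib
import sys
import tempfile
import textwrap
import warnings
from pathlib import Path

# Hypothetical stand-in for a deprecated module that warns at module top.
MODULE_SOURCE = textwrap.dedent(
    '''
    import warnings
    warnings.warn("legacy_mod is deprecated", DeprecationWarning)

    class Legacy:
        pass
    '''
)


def count_import_warnings(times: int) -> list[int]:
    """Import the stand-in module `times` times; return warnings seen per import."""
    counts = []
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "legacy_mod.py").write_text(MODULE_SOURCE)
        sys.path.insert(0, tmp)
        try:
            for _ in range(times):
                with warnings.catch_warnings(record=True) as caught:
                    warnings.simplefilter("always")
                    importlib.import_module("legacy_mod")
                counts.append(
                    sum(issubclass(w.category, DeprecationWarning) for w in caught)
                )
        finally:
            # Leave the interpreter as we found it.
            sys.path.remove(tmp)
            sys.modules.pop("legacy_mod", None)
    return counts
```

Only the first import executes the module body; later imports are served from `sys.modules`, so whoever imports first (here, an internal edgartools module) is the only caller that ever sees the warning.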

Fix

Move the deprecation signal from module top into the relevant class
entry points (HtmlDocument.__init__, ChunkedDocument.__init__,
SECHTMLParser.__init__, and Document.__post_init__ for the
@dataclass), via a small shared helper at
edgar/files/_deprecation.py::warn_legacy_html_usage.

The helper walks the call stack past the deprecation helper itself, the
three deprecated modules, and the dataclasses module (which hosts
synthesized __init__ trampolines). If the first non-transparent frame
belongs to edgar or any edgar.* submodule, the call is internal and
the warning is suppressed. Any other caller — user code, notebooks,
tests, third-party libraries — receives the standard
DeprecationWarning pointing at its own call site.

This trades the "fires once at import" pattern for "fires per
user-driven instantiation," which is the standard Python pattern for
class-level deprecation and is what API consumers expect.
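The frame walk described above might look roughly like the sketch below. The helper's actual body is not shown in this PR description, so the constants `INTERNAL_PACKAGE` and `TRANSPARENT_MODULES`, the demo aid `define_in_module`, and the counter `count_warnings` are all assumptions made for illustration. The demo fakes module names by running code with a custom `__name__`, so it needs no edgartools install.

```python
import inspect
import warnings

# Assumed for illustration: the PR does not show the helper's body.
INTERNAL_PACKAGE = "edgar"
TRANSPARENT_MODULES = frozenset({
    "edgar.files._deprecation",
    "edgar.files.html_documents",
    "edgar.files.html",
    "edgar.files.htmltools",
    "dataclasses",
})


def warn_legacy_html_usage(message: str) -> None:
    """Emit DeprecationWarning unless the first real caller is inside edgar."""
    frame = inspect.currentframe()
    if frame is None:  # some interpreters do not expose frames
        return
    frame = frame.f_back
    stacklevel = 2  # 1 would blame this helper; 2 is its direct caller
    # Skip frames that only relay the call (deprecated classes, dataclass __init__).
    while frame is not None and frame.f_globals.get("__name__") in TRANSPARENT_MODULES:
        frame = frame.f_back
        stacklevel += 1
    if frame is None:
        return
    caller = frame.f_globals.get("__name__", "")
    if caller == INTERNAL_PACKAGE or caller.startswith(INTERNAL_PACKAGE + "."):
        return  # internal call site: stay silent
    warnings.warn(message, DeprecationWarning, stacklevel=stacklevel)


def define_in_module(module_name: str, source: str, **names) -> dict:
    """Demo aid: run `source` with globals that claim to be `module_name`."""
    namespace = {"__name__": module_name, **names}
    exec(source, namespace)
    return namespace


# A fake deprecated class whose frames report a transparent module name.
legacy = define_in_module(
    "edgar.files.html_documents",
    "class HtmlDocument:\n"
    "    def __init__(self):\n"
    "        warn('HtmlDocument is deprecated')\n",
    warn=warn_legacy_html_usage,
)
HtmlDocument = legacy["HtmlDocument"]

# A fake internal caller whose frames report an edgar.* module name.
internal = define_in_module(
    "edgar._markdown",
    "def build():\n    return HtmlDocument()\n",
    HtmlDocument=HtmlDocument,
)


def count_warnings(call) -> int:
    """Run `call` and count the DeprecationWarnings it emits."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        call()
    return sum(issubclass(w.category, DeprecationWarning) for w in caught)
```

In this sketch, `count_warnings(HtmlDocument)` sees one warning because the instantiation originates outside the `edgar` namespace, while `count_warnings(internal["build"])` sees none. Accumulating `stacklevel` during the walk is what lets the warning point at the external call site rather than at the helper or the deprecated class.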

Tests

tests/test_internal_deprecation_silence.py:

  • test_import_edgar_under_W_error — subprocess: python -W error -c "import edgar" exits 0. Fails on main; passes with this change.
  • test_import_deprecated_submodules_under_W_error — subprocess:
    importing each deprecated submodule directly under -W error is also
    silent.
  • test_user_instantiation_of_html_document_warns,
    test_user_instantiation_of_legacy_document_warns,
    test_user_instantiation_of_chunked_document_warns — confirm that
    user-code instantiation still emits DeprecationWarning, so the
    user-facing deprecation notice is preserved.

All five pass with the fix; the import test fails on main (catches
the regression).
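The subprocess-based checks can be sketched along the following lines. This is not the code in tests/test_internal_deprecation_silence.py; the function name `runs_cleanly_under_w_error` is hypothetical, and the sketch takes an arbitrary statement so it can be exercised without edgartools installed.

```python
import subprocess
import sys


def runs_cleanly_under_w_error(statement: str) -> bool:
    """Return True if `python -W error -c <statement>` exits with status 0."""
    result = subprocess.run(
        [sys.executable, "-W", "error", "-c", statement],
        capture_output=True,
        text=True,
        check=False,  # inspect the return code instead of raising
    )
    return result.returncode == 0
```

A test in this style would assert `runs_cleanly_under_w_error("import edgar")`. Running in a fresh interpreter matters: an in-process import could be served from `sys.modules` and hide a module-top warning.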

kevinchiu added 4 commits May 23, 2026 02:12
The legacy HTML modules (edgar.files.html_documents, edgar.files.html,
edgar.files.htmltools) emitted DeprecationWarnings at module top.
Because edgartools' own startup cascade imports them
(edgar.__init__ -> edgar._filings -> edgar._markdown ->
edgar.files.html_documents), the warnings fired on every `import
edgar`. Downstream test suites running under `-W error` (a common,
recommended pytest setup) had to install warnings filters just to let
`import edgar` succeed.

Internal callers cannot be ergonomically removed today — the legacy
classes are still load-bearing inside edgartools — so move the
deprecation signal from module-top to the relevant class `__init__`
(or `__post_init__` for the @dataclass Document), with frame
inspection that suppresses the warning when the call site is itself
inside edgartools. User code that instantiates HtmlDocument,
ChunkedDocument, SECHTMLParser, or the legacy Document still receives
the standard DeprecationWarning pointing at its own call site.

Result: `python -W error -c "import edgar"` succeeds, and the
user-facing deprecation notice is preserved at the actual API surface.

CodeFactor flagged the previous sys._getframe(1) call as protected-access.
inspect.currentframe() is the documented public API and is equivalent on
CPython; the helper already tolerates the None return that PyPy or
restricted interpreters might give. Behavior unchanged — all five
regression tests in tests/test_internal_deprecation_silence.py still
pass.

CodeFactor flagged the leftover `import warnings` after the module-top
warnings.warn call was moved into class init in the previous commit.
The module no longer references warnings, so the import is dead.

CodeFactor flagged the bare `try/except Exception: pass` as a
broad-catch antipattern. The downstream failure is a specific
AttributeError from chunks2df on minimal HTML; use contextlib.suppress
to express the same suppression with a typed exception.
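
The typed suppression mentioned in the last commit message can be illustrated with a small sketch. `MinimalDoc`, `chunk_count`, and `safe_chunk_count` are hypothetical stand-ins, not edgartools code; they only mimic an AttributeError raised on minimal input.

```python
from contextlib import suppress


class MinimalDoc:
    """Hypothetical stand-in for a document parsed from minimal HTML."""

    chunks = None


def chunk_count(doc) -> int:
    # Raises AttributeError when doc.chunks is None, mimicking the failure mode.
    return len(doc.chunks.rows)


def safe_chunk_count(doc):
    """Return the chunk count, or None if the known AttributeError occurs."""
    with suppress(AttributeError):
        return chunk_count(doc)
    return None
```

Unlike `except Exception: pass`, this swallows only the one expected exception type; anything else (a TypeError, for instance) still propagates and fails loudly.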
Owner

@dgunning dgunning left a comment

Approved. Thanks for this PR

@dgunning dgunning merged commit 8a402f0 into dgunning:main May 26, 2026
6 checks passed
dgunning added a commit that referenced this pull request May 28, 2026
Added:
- xbrl.calculation_linkbase() — per-filing calculation linkbase as a
  pandas DataFrame, one row per parent->child arc (GH #766 Phase 1)
- Statement.extension_arcs() — surfaces filer-authored concepts that
  participate in a statement's calc linkbase but are absent from its
  presentation tree (GH #766 Phase 2)
- Section.markdown() — structure-preserving per-section markdown for
  per-item chunkers / RAG pipelines (PR #833, @HonzaCuhel)

Fixed:
- StreamingParser dropped 20%+ of text from <span>-wrapped paragraphs
  on filings crossing the 10MB streaming threshold (PR #830, @kevinchiu)
- HTTP_MGR had no default timeout — stalled requests could pin
  workers indefinitely (PR #831, @kevinchiu)
- 13F-HR holdings merged Put/Call positions into the underlying equity
  row, losing the PutCall column (GH #824)
- import edgar emitted DeprecationWarning on every startup, breaking
  downstream test suites running under -W error (PR #832, @kevinchiu)
- Filing.search() / Filing.grep() returned nothing on pre-2002
  plain-text filings (GH #819)
- TOC analyzer fabricated phantom Items on 10-Q filings via three
  10-K-shaped heuristics that fired regardless of form (PR #827,
  @HonzaCuhel)
- SearchResults panel labels conflated BM25 rank with section index
  (GH #765)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>