fix(files): stop emitting DeprecationWarning at import edgar time #832
Merged
dgunning merged 4 commits on May 26, 2026
The legacy HTML modules (edgar.files.html_documents, edgar.files.html, edgar.files.htmltools) emitted DeprecationWarnings at module top. Because edgartools' own startup cascade imports them (edgar.__init__ -> edgar._filings -> edgar._markdown -> edgar.files.html_documents), the warnings fired on every `import edgar`. Downstream test suites running under `-W error` (a common, recommended pytest setup) had to install warnings filters just to let `import edgar` succeed.

The internal callers cannot practically be removed today, because the legacy classes are still load-bearing inside edgartools. Instead, move the deprecation signal from module top to the relevant class `__init__` (or `__post_init__` for the @dataclass Document), with frame inspection that suppresses the warning when the call site is itself inside edgartools. User code that instantiates HtmlDocument, ChunkedDocument, SECHTMLParser, or the legacy Document still receives the standard DeprecationWarning pointing at its own call site.

Result: `python -W error -c "import edgar"` succeeds, and the user-facing deprecation notice is preserved at the actual API surface.
CodeFactor flagged the previous sys._getframe(1) call as protected-access. inspect.currentframe() is the documented public API and is equivalent on CPython; the helper already tolerates the None return that PyPy or restricted interpreters might give. Behavior unchanged — all five regression tests in tests/test_internal_deprecation_silence.py still pass.
CodeFactor flagged the leftover `import warnings` after the module-top warnings.warn call was moved into class init in the previous commit. The module no longer references warnings, so the import is dead.
CodeFactor flagged the bare `try/except Exception: pass` as a broad-catch antipattern. The downstream failure is a specific AttributeError from chunks2df on minimal HTML; use contextlib.suppress to express the same suppression with a typed exception.
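The swap described in that last commit can be sketched roughly as follows. `first_chunk_length` is a hypothetical stand-in for the real call into chunks2df, which is not reproduced here; the only point is that `contextlib.suppress(AttributeError)` swallows the one expected failure, where a bare `except Exception: pass` would hide unrelated errors as well.

```python
import contextlib


def first_chunk_length(chunks):
    """Hypothetical stand-in: raises AttributeError when `chunks` is None."""
    return len(chunks.splitlines()[0])


def length_or_none(chunks):
    """Return the first chunk's length, or None on the expected AttributeError."""
    result = None
    # Only AttributeError is suppressed; any other exception still propagates.
    with contextlib.suppress(AttributeError):
        result = first_chunk_length(chunks)
    return result
```

With this shape, `length_or_none(None)` quietly returns `None`, while an unexpected failure such as the `IndexError` from an empty string still surfaces to the caller.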
dgunning added a commit that referenced this pull request on May 28, 2026
Added:
- xbrl.calculation_linkbase(): per-filing calculation linkbase as a pandas DataFrame, one row per parent->child arc (GH #766 Phase 1)
- Statement.extension_arcs(): surfaces filer-authored concepts that participate in a statement's calc linkbase but are absent from its presentation tree (GH #766 Phase 2)
- Section.markdown(): structure-preserving per-section markdown for per-item chunkers / RAG pipelines (PR #833, @HonzaCuhel)

Fixed:
- StreamingParser dropped 20%+ of text from <span>-wrapped paragraphs on filings crossing the 10MB streaming threshold (PR #830, @kevinchiu)
- HTTP_MGR had no default timeout, so stalled requests could pin workers indefinitely (PR #831, @kevinchiu)
- 13F-HR holdings merged Put/Call positions into the underlying equity row, losing the PutCall column (GH #824)
- import edgar emitted DeprecationWarning on every startup, breaking downstream test suites running under -W error (PR #832, @kevinchiu)
- Filing.search() / Filing.grep() returned nothing on pre-2002 plain-text filings (GH #819)
- TOC analyzer fabricated phantom Items on 10-Q filings via three 10-K-shaped heuristics that fired regardless of form (PR #827, @HonzaCuhel)
- SearchResults panel labels conflated BM25 rank with section index (GH #765)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Symptom
`python -W error -c "import edgar"` fails in a clean venv.
Downstream test suites that run under `-W error` (a common, recommended pytest setup) cannot import edgar at all without installing manual `warnings.filterwarnings(…)` calls just to silence three of edgartools' own internal imports.
Root cause
`edgar/files/html_documents.py`, `edgar/files/html.py`, and `edgar/files/htmltools.py` each emit a top-level `warnings.warn(…, DeprecationWarning)`. edgartools' own startup cascade imports all three, for example: `edgar.__init__ -> edgar._filings -> edgar._markdown -> edgar.files.html_documents`.
So the warning fires on every `import edgar`, with the caller being an internal edgartools module. Worse, after the first import the module is cached, so users who later access the deprecated API directly never get the deprecation signal: the warning is loud where it shouldn't be and silent where it should.
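The "loud at import, silent on later use" behaviour can be reproduced outside edgartools with a throwaway module. This is a minimal sketch: `legacy_demo` and `LegacyThing` are hypothetical names standing in for a deprecated module and class, and nothing here touches the real `edgar` package.

```python
import importlib
import sys
import tempfile
import warnings
from pathlib import Path

# Hypothetical legacy module that warns at module top, as the old code did.
MODULE_SOURCE = (
    "import warnings\n"
    "warnings.warn('legacy_demo is deprecated', DeprecationWarning, stacklevel=2)\n"
    "\n"
    "class LegacyThing:\n"
    "    pass\n"
)


def record_deprecations(action):
    """Run `action` and return the DeprecationWarnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        action()
    return [w for w in caught if issubclass(w.category, DeprecationWarning)]


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "legacy_demo.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # First import executes the module body, so the warning fires here.
        first_import = record_deprecations(
            lambda: importlib.import_module("legacy_demo")
        )
        # The module is now cached in sys.modules: re-importing it and using
        # the class does not run the module body again, so nothing is emitted.
        later_use = record_deprecations(
            lambda: importlib.import_module("legacy_demo").LegacyThing()
        )
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("legacy_demo", None)

print(len(first_import), len(later_use))
```

The first count reflects the import itself; the second covers the later instantiation, which is the point where a user would actually want to be told.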
Fix
Move the deprecation signal from module top into the relevant class entry points (`HtmlDocument.__init__`, `ChunkedDocument.__init__`, `SECHTMLParser.__init__`, and `Document.__post_init__` for the `@dataclass`), via a small shared helper at `edgar/files/_deprecation.py::warn_legacy_html_usage`.
The helper walks the call stack past the deprecation helper itself, the three deprecated modules, and the `dataclasses` module (which hosts synthesized `__init__` trampolines). If the first non-transparent frame belongs to `edgar` or any `edgar.*` submodule, the call is internal and the warning is suppressed. Any other caller (user code, notebooks, tests, third-party libraries) receives the standard `DeprecationWarning` pointing at its own call site.
This trades the "fires once at import" pattern for "fires per user-driven instantiation," which is the standard Python pattern for class-level deprecation and is what API consumers expect.
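The stack walk described above might look roughly like the sketch below. It is not the code in `edgar/files/_deprecation.py`: the helper name `warn_legacy_usage`, the `demo_pkg` package, and the `LegacyDoc` class are all assumptions made for illustration, and the two in-memory modules exist only so that one file can simulate an internal caller and a user caller.

```python
import inspect
import types
import warnings

PACKAGE = "demo_pkg"  # hypothetical package name, standing in for "edgar"
TRANSPARENT = frozenset({"demo_pkg.legacy", "dataclasses"})  # frames to skip


def warn_legacy_usage(message):
    """Warn unless the first non-transparent caller lives inside PACKAGE."""
    frame = inspect.currentframe()
    if frame is None:
        # No frame support: assume helper <- __init__ <- user and just warn.
        warnings.warn(message, DeprecationWarning, stacklevel=3)
        return
    caller = frame.f_back
    stacklevel = 2  # stacklevel=1 would point at this helper itself
    while caller is not None and caller.f_globals.get("__name__", "") in TRANSPARENT:
        caller = caller.f_back
        stacklevel += 1
    if caller is not None:
        name = caller.f_globals.get("__name__", "")
        if name == PACKAGE or name.startswith(PACKAGE + "."):
            return  # internal call site: stay silent
    warnings.warn(message, DeprecationWarning, stacklevel=stacklevel)


def make_module(name, source, **injected):
    """Build an in-memory module so its frames report `name` as __name__."""
    module = types.ModuleType(name)
    module.__dict__.update(injected)
    exec(compile(source, name, "exec"), module.__dict__)
    return module


legacy = make_module(
    "demo_pkg.legacy",
    "class LegacyDoc:\n"
    "    def __init__(self):\n"
    "        warn_legacy_usage('LegacyDoc is deprecated')\n",
    warn_legacy_usage=warn_legacy_usage,
)
core = make_module(
    "demo_pkg.core",
    "def make_doc():\n"
    "    return LegacyDoc()\n",
    LegacyDoc=legacy.LegacyDoc,
)


def record_deprecations(action):
    """Run `action` and return the DeprecationWarnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        action()
    return [w for w in caught if issubclass(w.category, DeprecationWarning)]


internal_hits = record_deprecations(core.make_doc)            # caller inside demo_pkg
user_hits = record_deprecations(lambda: legacy.LegacyDoc())   # caller outside demo_pkg
print(len(internal_hits), len(user_hits))
```

Counting `stacklevel` while walking means the warning is attributed to the same frame that the internal/external decision was based on, so a user sees their own line rather than a line inside the legacy class.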
Tests
`tests/test_internal_deprecation_silence.py`:
- `test_import_edgar_under_W_error`: subprocess; `python -W error -c "import edgar"` exits 0. Fails on `main`; passes with this change.
- `test_import_deprecated_submodules_under_W_error`: subprocess; importing each deprecated submodule directly under `-W error` is also silent.
- `test_user_instantiation_of_html_document_warns`, `test_user_instantiation_of_legacy_document_warns`, `test_user_instantiation_of_chunked_document_warns`: confirm that user-code instantiation still emits `DeprecationWarning`, so the user-facing deprecation notice is preserved.
All five pass with the fix; the import test fails on `main` (catches the regression).