feat: add async 'for' loop support to LogScanner (#424)#438
Open
qzyu999 wants to merge 4 commits intoapache:mainfrom
Open
feat: add async 'for' loop support to LogScanner (#424)#438qzyu999 wants to merge 4 commits intoapache:mainfrom
qzyu999 wants to merge 4 commits intoapache:mainfrom
Conversation
Contributor
|
@qzyu999 Ty for the PR, but I checked this branch out and integration tests for python still hang even when I run them locally. PTAL |
Author
Hi @fresh-borzoni, applied the fix. Ran |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Purpose
Linked issue: close #424
This pull request completes Issue #424 by enabling standard cross-boundary native Python
async forlanguage built-ins over the high-performance PyO3 wrappedLogScannerstream instance.Brief change log
Previously, PyFluss developers had to manually orchestrate
while Truepolling loops over network boundaries usingscanner.poll(timeout). This PR refactors the PythonLogScanneriterator logic by implementing the async traversal natively via Rust__anext__polling bindings and Python Generator__aiter__context adapters:ScannerKindinternals into a safely bufferedArc<tokio::sync::Mutex<ScannerState>>. This guarantees strict thread-safety and fulfills Rust's lifetime constraints enabling unboxed state transitions inside thepython_async_runtimestokioclosure..awaitfuture yield sequence smoothly without blocking event cycles or hardware threads directly!inspect.isasyncgen()compliance checks within strictly versioned Python 3.12+ engines (such as modern IPython Jupyter servers),__aiter__dynamically generates a properly wrapped coroutine generator dynamically inside the codebase viapy.run(). This completely masks the Python ecosystem's iterator type limitations automatically out-of-the-box.Tests
test_log_table.py::test_async_iterator: Integrated a testcontainers ecosystem confirming zero-configuration iteration capabilities function natively evaluatingasync for record in scannerperfectly without pipeline interruptions while yielding thousands of appended instances sequentially backwards matching existing legacy data frameworks.API and Format
Yes, this expands the API natively extending capabilities allowing
async forloops gracefully. Existing user logic leveraging explicit implementations of.poll_arrow()or legacy functions are untouched.Documentation
Yes, I updated integration tests acting as live documentation proof demonstrating the capability natively.