Conversation
Codecov Report
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3084      +/-   ##
============================================
+ Coverage     81.18%   81.65%   +0.47%
- Complexity     1506     1623     +117
============================================
  Files           268      276       +8
  Lines          7356     7850     +494
  Branches        325      353      +28
============================================
+ Hits           5972     6410     +438
- Misses         1226     1272      +46
- Partials        158      168      +10
* Arcade demo in about page of docs. Signed-off-by: merobi-hub <merobi@gmail.com>
* Move demo to homepage component. Signed-off-by: merobi-hub <merobi@gmail.com>
---------
Signed-off-by: merobi-hub <merobi@gmail.com>
Signed-off-by: swar00pduthks <swaroopduthks@gmail.com>
…19.1 (MarquezProject#3005) Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> Signed-off-by: swar00pduthks <swaroopduthks@gmail.com>
…arquezProject#3004) Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> Signed-off-by: swar00pduthks <swaroopduthks@gmail.com>
Signed-off-by: Willy Lulciuc <willy.lulciuc@gmail.com> Signed-off-by: swar00pduthks <swaroopduthks@gmail.com>
Signed-off-by: Willy Lulciuc <willy.lulciuc@gmail.com> Signed-off-by: swar00pduthks <swaroopduthks@gmail.com>
Signed-off-by: swar00pduthks <swaroopduthks@gmail.com>
…test configurations Signed-off-by: swar00pduthks <swaroopduthks@gmail.com>
…urity Signed-off-by: swar00pduthks <swaroopduthks@gmail.com>
- Added PartitionManagementService to handle creation and cleanup of database partitions.
- Implemented DatasetVersionData and RunData models for lineage data representation.
- Created migration scripts (V75, V76) for partitioned denormalized tables and management functions.
Signed-off-by: swar00pduthks <swaroopduthks@gmail.com>
…RunDataMapper, and RunData Signed-off-by: swar00pduthks <swaroopduthks@gmail.com>
…ineage services Signed-off-by: swar00pduthks <swaroopduthks@gmail.com>
…geService
- Add V77__BackfillDenormalizedLineageTablesTest with 17 tests covering migration metadata, empty database, chunking, parent-child runs, large dataset skip, and error handling.
- Expand DenormalizedLineageServiceTest with 11 new tests for partition stats, analyze operations, error handling, delete/repopulate logic, parent detection, and null timestamp handling.
- Fix database cleanup in V77 tests by explicitly cleaning denormalized tables and updating PostgreSQL statistics with VACUUM ANALYZE.
Improves coverage for V77 migration from 23.46% to ~90% and DenormalizedLineageService from 73.28% to ~95%.
Signed-off-by: swar00pduthks <swaroopduthks@gmail.com>
Force-pushed from 46666b9 to 50b4b72.
@thijs-s, could you take a look at this run lineage implementation and let me know what you think? I built it on top of the Dropwizard upgrade. Please also review the table naming conventions.

Definitely! I'm a little afraid of the size but will check it.

Thanks. The size is because the Dropwizard pull request (#3056) is included in this one, as it has not been approved yet. I can remove it and raise a pull request for runtime lineage only. Let me know.

No worries, I will try to handle it. If not, I will let you know.
Problem
Closes: #3054, #2772
Solution
New Feature: This PR introduces run-level graph visualization - enabling users to visualize relationships between dataset versions, job versions, and run nodes to simplify data troubleshooting and impact analysis.
Scalable Architecture: To ensure this feature can scale from day one, we implement the CQRS (Command Query Responsibility Segregation) pattern with two denormalized tables, each optimized for a different graph view:
- `run_parent_lineage_denormalized`: Business users see streamlined graphs with only top-level pipeline run nodes (~50 nodes for typical workloads).
- `run_lineage_denormalized`: Engineers access comprehensive graphs with all child run nodes for deep debugging (~13,500 nodes for detailed analysis).

Performance: Designed for speed at scale. Parent run graphs render in ~0.1s and detailed run graphs in ~0.8s, even with millions of historical runs. The denormalized design pre-computes 5-8 table joins, enabling sub-second graph traversal and rendering.
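To make the two read models concrete, here is a minimal in-memory sketch of the idea in Python (not the PR's Java/SQL code). The `Run` shape, the row fields, and the rule that the parent view simply keeps runs without a parent are all assumptions for illustration; the description does not spell out the actual columns or how child-run datasets are attributed to parents.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Run:
    """Normalized write-side record (hypothetical shape, not the PR's schema)."""
    run_id: str
    job_name: str
    parent_run_id: Optional[str]  # None marks a top-level pipeline run
    input_versions: tuple[str, ...]
    output_versions: tuple[str, ...]


def project_detailed(runs: list[Run]) -> list[dict]:
    """Flatten every run into one pre-joined row per (run, dataset version)."""
    rows = []
    for run in runs:
        for io_type, versions in (("INPUT", run.input_versions),
                                  ("OUTPUT", run.output_versions)):
            for version in versions:
                rows.append({
                    "run_id": run.run_id,
                    "job_name": run.job_name,
                    "dataset_version": version,
                    "io_type": io_type,
                })
    return rows


def project_parent(runs: list[Run]) -> list[dict]:
    """Assumed rule: the parent view keeps only runs that have no parent."""
    top_level = [run for run in runs if run.parent_run_id is None]
    return project_detailed(top_level)


if __name__ == "__main__":
    runs = [
        Run("r1", "etl", None, ("a@v1",), ("b@v1",)),
        Run("r2", "etl.step", "r1", ("b@v1",), ("c@v1",)),
    ]
    print(len(project_detailed(runs)), "detailed rows")
    print(len(project_parent(runs)), "parent-view rows")
```

The point of the sketch is that both views are computed once, on the write path, so a graph query reads flat rows instead of joining several tables per request.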
One-line summary: Introduces run-level graph visualization with denormalized tables and partitioning for scalable, sub-second lineage traversal supporting both parent-level (business) and detailed (technical) graph views.
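The partitioning mentioned in the summary could look roughly like the sketch below. It assumes monthly range partitions on a date column and a hypothetical `<table>_yYYYYmMM` naming scheme; neither is confirmed by this description, and the function names are illustrative only. The generated statement uses PostgreSQL's declarative partitioning syntax.

```python
from datetime import date


def month_bounds(day: date) -> tuple[date, date]:
    """Return the first day of day's month and the first day of the next month."""
    start = day.replace(day=1)
    if start.month == 12:
        end = date(start.year + 1, 1, 1)
    else:
        end = date(start.year, start.month + 1, 1)
    return start, end


def partition_name(table: str, day: date) -> str:
    """Hypothetical naming scheme: <table>_yYYYYmMM."""
    return f"{table}_y{day.year:04d}m{day.month:02d}"


def create_partition_sql(table: str, day: date) -> str:
    """Build the DDL for the monthly partition that contains `day`."""
    start, end = month_bounds(day)
    return (
        f"CREATE TABLE IF NOT EXISTS {partition_name(table, day)} "
        f"PARTITION OF {table} "
        f"FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}')"
    )


def expired_months(months: list[date], today: date, keep_months: int) -> list[date]:
    """Months older than the current month minus `keep_months` (assumed policy)."""
    cutoff = today.year * 12 + (today.month - 1) - keep_months
    return [m for m in months if m.year * 12 + (m.month - 1) < cutoff]


if __name__ == "__main__":
    print(create_partition_sql("run_lineage_denormalized", date(2024, 3, 15)))
```

A cleanup job would then drop the partitions for the months returned by `expired_months`, which is far cheaper than deleting rows from one large table.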
Checklist
- `CHANGELOG.md` (Depending on the change, this may not be necessary).
- `.sql` database schema migration according to Flyway's naming convention (if relevant)