fix: prevent storing entries with Unix epoch timestamp (fixes #97)#98
Merged
fix: prevent storing entries with Unix epoch timestamp (fixes #97)#98
Conversation
- Added MIN_VALID_TIMESTAMP constant (1998-01-01) in crawler/common.rs - Crawlers now filter out entries with timestamps before 1998-01-01 - Added default 30-day time window for queries without timestamp filters - Prevents slow full table scans (12.9s → 1.0s, 13x faster) - Returns recent data instead of starting from 1970 - Added unit test for MIN_VALID_TIMESTAMP validation - Created CLEANUP_PLAN.md for production database cleanup Fixes #97
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #97 by preventing erroneous broker entries with Unix epoch timestamp (1970-01-01) from being stored and returned in search results.
Changes
Bug Fixes
Added MIN_VALID_TIMESTAMP constant (1998-01-01) in
crawler/common.rsAdded default 30-day time window for queries without timestamp filters
Testing
test_min_valid_timestampto verify the thresholdProduction Impact
Performance Improvement