@EnjoyBacon7 EnjoyBacon7 commented Nov 4, 2025

API changes:

  • OpenRAG now optionally accepts "datetime" as file metadata.
  • Providing "modified_at" and "created_at" in indexing requests overrides the file-provided creation and modification times.
  • "indexed_at" is ignored when provided in an indexing request (it is overridden by the current date).
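
The precedence rules above can be sketched roughly as follows. This is only an illustration under stated assumptions, not OpenRAG's actual code: the helper name `resolve_timestamps`, the plain-dict metadata shapes, and the pass-through handling of "datetime" are all hypothetical.

```python
from datetime import datetime, timezone


def resolve_timestamps(request_meta, file_meta, now=None):
    """Hypothetical helper: merge timestamps following the rules listed above."""
    now = now or datetime.now(timezone.utc)
    resolved = {}
    for key in ("created_at", "modified_at"):
        # A value supplied in the indexing request wins over the file-provided one.
        value = request_meta.get(key) or file_meta.get(key)
        if value is not None:
            resolved[key] = value
    # Assumption: optional "datetime" metadata is passed through unchanged.
    if request_meta.get("datetime") is not None:
        resolved["datetime"] = request_meta["datetime"]
    # Any caller-supplied indexed_at is ignored and replaced by the current time.
    resolved["indexed_at"] = now.isoformat()
    return resolved
```

Passing `now` explicitly keeps the helper deterministic in tests; in normal use it would default to the current UTC time.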

EnjoyBacon7 and others added 6 commits October 22, 2025 16:51
- openrag/utils/temporal.py
TemporalQueryNormalizer class extracts temporal filters
Date pattern recognition in multiple languages
Relative time extraction

- openrag/components/indexer/chunker/chunker.py
Chunkers now add an indexed_at timestamp to documents
It is expected that indexed documents provide a created_at timestamp if available

- openrag/components/indexer/vectordb/vectordb.py
Milvus schema updated to include created_at and indexed_at fields
Added temporal filtering support in vector database queries

- openrag/components/retriever.py & pipeline.py
Added temporal_filter parameter to all retrievers
Automatic temporal extraction from queries via TemporalQueryNormalizer
Injects current UTC datetime into system prompt

- openrag/components/reranker.py
Reranker now combines relevance and temporal scores using a linear decay formula

- RERANKER_TEMPORAL_WEIGHT (default 0.3)
- RERANKER_TEMPORAL_DECAY_DAYS (default 365)
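
For readers unfamiliar with linear decay, here is one possible reading of how relevance and temporal scores could be combined. The commit message does not show the exact formula, so the convex combination below, the clamping to [0, 1], and the function names `temporal_score` and `combined_score` are assumptions; only the two defaults come from the commit message.

```python
from datetime import datetime, timedelta, timezone

RERANKER_TEMPORAL_WEIGHT = 0.3      # default stated in the commit message
RERANKER_TEMPORAL_DECAY_DAYS = 365  # default stated in the commit message


def temporal_score(doc_date, now, decay_days=RERANKER_TEMPORAL_DECAY_DAYS):
    """Linear decay: 1.0 for a brand-new document, 0.0 at or beyond decay_days."""
    age_days = max((now - doc_date).total_seconds() / 86400.0, 0.0)
    return max(0.0, 1.0 - age_days / decay_days)


def combined_score(relevance, doc_date, now,
                   weight=RERANKER_TEMPORAL_WEIGHT,
                   decay_days=RERANKER_TEMPORAL_DECAY_DAYS):
    """Assumed blend: weighted average of relevance and temporal score."""
    return (1.0 - weight) * relevance + weight * temporal_score(doc_date, now, decay_days)


if __name__ == "__main__":
    now = datetime(2025, 11, 4, tzinfo=timezone.utc)
    half_year_old = now - timedelta(days=182.5)
    print(combined_score(0.8, half_year_old, now))
```

Under this reading, a document older than the decay window keeps only the relevance part of its score, scaled by `1 - weight`.
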
Added extraction for "modified_at" field in indexation
Added "datetime" metadata field as preferred field for date information
Added formatted prompt logging in DEBUG mode
Fixed db search with date filters to use OR logic between date fields
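
The OR logic between date fields mentioned in the last commit could look something like the sketch below. It only builds a boolean filter string in the style of Milvus expressions; the function name `build_temporal_filter`, the choice of fields, and the assumption that timestamps are stored as ISO 8601 strings are hypothetical and not confirmed by the PR.

```python
def build_temporal_filter(start_iso, end_iso, fields=("created_at", "modified_at")):
    """Hypothetical helper: a document matches if ANY date field falls in the range."""
    clauses = [
        f'({field} >= "{start_iso}" and {field} <= "{end_iso}")'
        for field in fields
    ]
    # Joining with "or" means one matching date field is enough.
    return " or ".join(clauses)


if __name__ == "__main__":
    print(build_temporal_filter("2025-01-01T00:00:00", "2025-12-31T23:59:59"))
```

Joining the per-field clauses with `and` instead would exclude any document whose creation date falls outside the range even when it was modified inside it, which is presumably the behaviour the fix addresses.
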
@EnjoyBacon7 EnjoyBacon7 marked this pull request as draft November 4, 2025 16:00
@dodekapod dodekapod self-requested a review November 13, 2025 10:52
4. Temporal Awareness
* Pay attention to the **temporal context** of both the query and the retrieved documents.
* Each document includes **creation_date** and **indexed_date** metadata indicating when it was created and indexed.
* When the user asks about **recent events**, **latest updates**, or uses temporal references (e.g., "last week", "yesterday", "this year"), prioritize documents with **more recent dates**.
Contributor

Suggested rewording with less redundancy: When the query includes temporal references (e.g., "last week", "yesterday", "this year"), prioritize documents with **more recent dates**.

self.relative_number_pattern = r'(\d+)\s*\w+|\w+\s+(\d+)'

# English patterns for backward compatibility
self.english_patterns = {
Contributor

The name seems incorrect; shouldn't it be something like common_languages_patterns?

return self._get_last_n_days(number)
else:
# Large number, likely days
return self._get_last_n_days(number)
Contributor

This heuristic seems a bit risky to me. What if the query is something like "5 years" or "12 months"? We'd fall back to days, right? Likewise, a query like "summarize the documents mentioning 7 eleven acquisition" would be treated as a time query for 7 days, right?
I feel like we could get a lot of false positives here.
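
To make the concern concrete, here is a small sketch comparing the pattern quoted in the diff with a stricter alternative. The stricter pattern, the `DAYS_PER_UNIT` approximations, and the function `parse_relative_window` are hypothetical suggestions, not code from the PR; the sketch also assumes the quoted regex is what drives number extraction.

```python
import re
from datetime import timedelta

# Pattern quoted in the diff: matches any number adjacent to any word.
LOOSE = re.compile(r'(\d+)\s*\w+|\w+\s+(\d+)')

# Hypothetical stricter alternative: require an anchor word and an explicit unit.
STRICT = re.compile(
    r'\b(?:last|past)\s+(\d+)\s+(day|week|month|year)s?\b',
    re.IGNORECASE,
)

# Rough conversions, good enough for a retrieval window (assumption).
DAYS_PER_UNIT = {"day": 1, "week": 7, "month": 30, "year": 365}


def parse_relative_window(query):
    """Return a timedelta for phrases like 'last 5 years', else None."""
    match = STRICT.search(query)
    if match is None:
        return None
    number, unit = int(match.group(1)), match.group(2).lower()
    return timedelta(days=number * DAYS_PER_UNIT[unit])


if __name__ == "__main__":
    query = "summarize the documents mentioning 7 eleven acquisition"
    print(LOOSE.search(query) is not None)   # the loose pattern fires on "7"
    print(parse_relative_window(query))      # the strict parser ignores it
    print(parse_relative_window("what changed in the last 5 years"))
```

The trade-off is recall: a bare "7 days" with no anchor word is no longer treated as temporal, so the anchor list would need to cover each supported language.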


3 participants