Bug: When paper PDFs already exist on disk but the RAG index doesn't (e.g., index was deleted, or fresh deployment with pre-existing files), new_downloaded_files will be empty (since the files are skipped as "already downloaded"), so index_papers won't rebuild the index. This means generate_report will call get_info against an empty/stale index and return no context.
The old code didn't have this issue because it checked downloaded_files, which included both newly downloaded and already-existing files.
Consider also checking whether the RAG index currently exists (e.g., by testing whether BaseRAGPipeline.index is None) before deciding to skip the rebuild. This way, if the index is missing but papers exist on disk, the index will still be rebuilt.
# Determine if an index already exists on the RAG service. If it doesn't,
# we should rebuild even when there are no newly downloaded files, since
# papers may already exist on disk (e.g., fresh deployment or deleted index).
existing_index = getattr(self.rag_service, "index", None)
has_index = existing_index is not None
if not state['new_downloaded_files'] and has_index:
    logger.info("No new papers downloaded. Reusing existing RAG index.")
    return {}
logger.info("Rebuilding RAG index with current papers...")
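
To make the four possible cases explicit, the skip-or-rebuild decision could be pulled out into a small predicate. This is only a sketch: `should_rebuild_index` and `FakeRAGService` are hypothetical names that do not exist in the codebase, and it assumes the RAG service exposes an `index` attribute that is `None` until an index has been built, as the suggestion above does.

```python
from typing import Any, Sequence


def should_rebuild_index(new_downloaded_files: Sequence[str], rag_service: Any) -> bool:
    """Return True when the RAG index needs to be (re)built.

    Rebuild if anything new was downloaded, or if the service has no
    index yet (assumed to be signalled by a missing or None `index`).
    """
    has_index = getattr(rag_service, "index", None) is not None
    return bool(new_downloaded_files) or not has_index


class FakeRAGService:
    """Hypothetical stand-in for the real RAG service, for illustration only."""

    def __init__(self, index: Any = None) -> None:
        self.index = index


if __name__ == "__main__":
    built = FakeRAGService(index=object())   # an index already exists
    missing = FakeRAGService(index=None)     # index deleted or fresh deployment

    print(should_rebuild_index([], built))           # nothing new, index present
    print(should_rebuild_index([], missing))         # nothing new, index missing
    print(should_rebuild_index(["a.pdf"], built))    # new paper, index present
    print(should_rebuild_index(["a.pdf"], missing))  # new paper, index missing
```

Only the first case (no new files and an existing index) skips the rebuild. The second case is the one this bug report is about.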
Originally posted by @Copilot in #24 (comment)