
Bug: When paper PDFs already exist on disk but the RAG index doesn't (e.g., index was deleted, or fresh deployment with pre-existing files), new_downloaded_files will be empty (since the files are skipped as "already downloaded"), so index_papers won't rebuild the index. This means generate_report will call get_info against an empty/stale index and return no context. #25

Description

@Sreehari05055


The old code didn't have this issue because it checked downloaded_files, which included both newly downloaded and already-existing files.
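To see why new_downloaded_files comes back empty on a fresh deployment, here is a minimal sketch of a download step that skips PDFs already on disk. The function name download_papers, its fetch_pdf callback, and the <paper_id>.pdf naming scheme are hypothetical, not the project's actual code. The only behavior carried over from the issue is that existing files are skipped as "already downloaded" but still counted in downloaded_files.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def download_papers(
    paper_ids: Iterable[str],
    dest_dir: Path,
    fetch_pdf: Callable[[str], bytes],
) -> tuple[list[Path], list[Path]]:
    """Hypothetical download step returning (downloaded_files, new_downloaded_files).

    downloaded_files: every requested PDF present on disk after this call.
    new_downloaded_files: only the PDFs actually fetched during this call.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    downloaded_files: list[Path] = []
    new_downloaded_files: list[Path] = []
    for paper_id in paper_ids:
        path = dest_dir / f"{paper_id}.pdf"  # hypothetical naming scheme
        if path.exists():
            # Skipped as "already downloaded": counted in downloaded_files only.
            downloaded_files.append(path)
            continue
        path.write_bytes(fetch_pdf(paper_id))
        downloaded_files.append(path)
        new_downloaded_files.append(path)
    return downloaded_files, new_downloaded_files


with tempfile.TemporaryDirectory() as tmp:
    papers_dir = Path(tmp)
    # Simulate a fresh deployment: the PDF is already on disk, but no index exists.
    (papers_dir / "2401.00001.pdf").write_bytes(b"%PDF-1.4")
    downloaded, new = download_papers(["2401.00001"], papers_dir, lambda _pid: b"%PDF-1.4")
    print(len(downloaded), len(new))  # 1 0
```

A gate on new_downloaded_files therefore sees an empty list here, while a gate on downloaded_files still sees the paper.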

Consider also checking whether the RAG index currently exists (e.g., by testing whether BaseRAGPipeline.index is None) before deciding to skip the rebuild. This way, if the index is missing but papers exist on disk, the index will still be rebuilt.
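The proposed condition reduces to a single predicate: skip only when nothing new was downloaded and an index is already loaded; otherwise rebuild. Below is a minimal sketch, with should_rebuild_index as a hypothetical helper name. Note that it treats only "index is None" as missing, so an index object that exists but is empty or stale would still be reused.

```python
from typing import Any, Sequence


def should_rebuild_index(new_downloaded_files: Sequence[Any], existing_index: Any) -> bool:
    """Hypothetical helper: rebuild when new papers arrived OR no index is loaded.

    Equivalent to negating the skip condition
    `not new_downloaded_files and existing_index is not None`.
    """
    return bool(new_downloaded_files) or existing_index is None


# Scenario from this issue: PDFs already on disk, index deleted or fresh deployment.
assert should_rebuild_index([], None) is True
# Steady state: nothing new and an index is present, so reuse it.
assert should_rebuild_index([], object()) is False
# New papers always trigger a rebuild.
assert should_rebuild_index(["paper.pdf"], object()) is True
```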

        # Determine if an index already exists on the RAG service. If it doesn't,
        # we should rebuild even when there are no newly downloaded files, since
        # papers may already exist on disk (e.g., fresh deployment or deleted index).
        existing_index = getattr(self.rag_service, "index", None)
        has_index = existing_index is not None

        if not state['new_downloaded_files'] and has_index:
            logger.info("No new papers downloaded. Reusing existing RAG index.")
            return {}
        
        logger.info("Rebuilding RAG index with current papers...")

Originally posted by @Copilot in #24 (comment)
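A self-contained harness around the suggested snippet, which could double as a regression test for the fresh-deployment case. FakeRAGService, its build_index method, the ResearchAgent class, and passing state["downloaded_files"] to the rebuild are all assumptions for illustration; the real pipeline's class names, method names, and returned state may differ.

```python
import logging
from typing import Any, Optional

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class FakeRAGService:
    """Hypothetical stand-in for the RAG service; only `index` mirrors the issue."""

    def __init__(self, index: Optional[Any] = None) -> None:
        self.index = index
        self.build_calls = 0

    def build_index(self, files: list[str]) -> None:
        # Hypothetical method name; the real pipeline's build call may differ.
        self.build_calls += 1
        self.index = {"files": list(files)}


class ResearchAgent:
    """Hypothetical owner of the index_papers node."""

    def __init__(self, rag_service: FakeRAGService) -> None:
        self.rag_service = rag_service

    def index_papers(self, state: dict[str, list[str]]) -> dict:
        # Same check as the suggested fix: treat a missing/None index as "no index".
        existing_index = getattr(self.rag_service, "index", None)
        has_index = existing_index is not None

        if not state["new_downloaded_files"] and has_index:
            logger.info("No new papers downloaded. Reusing existing RAG index.")
            return {}

        logger.info("Rebuilding RAG index with current papers...")
        # Assumption: rebuild from every PDF on disk, not only the new ones.
        self.rag_service.build_index(state["downloaded_files"])
        return {}


# Fresh deployment: paper on disk, nothing newly downloaded, no index loaded.
fresh = FakeRAGService(index=None)
ResearchAgent(fresh).index_papers({"downloaded_files": ["a.pdf"], "new_downloaded_files": []})
print(fresh.build_calls)  # 1
```

Rebuilding from downloaded_files rather than new_downloaded_files matters here: building from only the new files would produce the same empty index this issue describes.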
