
[Bug]: Deleting a resource folder triggers full recursive refresh of entire resource tree #2959

Description

@dfwgj

Bug Description

When a resource folder is deleted, _enqueue_delete_refresh creates a SemanticMsg without setting recursive, so it falls back to the default recursive=True and the entire resource tree is re-processed, including all unrelated sibling folders. This results in thousands of unnecessary embedding tasks, API rate limiting (429), server health check timeouts, and massive token waste (~400W tokens, i.e. roughly 4 million, wasted in one day).

Steps to Reproduce

  1. Upload a large code repository as a resource (e.g., the OpenViking repo itself with 2874 files)
  2. Upload another resource folder with ~1000 files
  3. Delete the second folder
  4. Observe the embedding task queue

Expected Behavior

Deleting a folder should:

  • Update the parent directory's overview to reflect the deletion
  • Use existing abstracts for sibling directories (not re-process them)
  • Trigger a small number of embedding tasks (only for the parent directory, ~2 tasks)

Actual Behavior

Deleting a folder triggers:

  • A SemanticMsg with uri=viking://resources, recursive=True, changes={'deleted': ['viking://resources/<folder>/']}
  • 2874+ embedding tasks for the entire resource tree (including unrelated sibling folders)
  • 15,000+ 429 rate limit errors from the VLM API
  • Server health check timeouts (container marked unhealthy)
  • ~400W (roughly 4 million) tokens wasted on unnecessary VLM calls that re-process already-indexed files

Minimal Reproducible Example

# File: openviking/service/fs_service.py:313-345
async def _enqueue_delete_refresh(self, *, root_uri, deleted_uri, context_type, ctx):
    msg = SemanticMsg(
        uri=root_uri,           # viking://resources
        context_type=context_type,
        changes={"deleted": [deleted_uri]},
        # ❌ BUG: No explicit recursive=False
        # SemanticMsg defaults to recursive=True
    )
    await semantic_queue.enqueue(msg)


`SemanticMsg` default:

# File: openviking/storage/queuefs/semantic_msg.py:46
@dataclass
class SemanticMsg:
    recursive: bool = True  # Default is True!
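
The pitfall can be reproduced in isolation. The sketch below uses a stand-in SemanticMsg that models only the fields quoted in this issue; the field order, the context_type default, and the helper build_delete_refresh are assumptions for illustration, not the real OpenViking definitions.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticMsg:
    """Stand-in for the real message type (assumed shape, not the actual class)."""
    uri: str
    context_type: str = "resource"   # assumed default, for illustration only
    recursive: bool = True           # the default quoted in this issue
    changes: dict = field(default_factory=dict)


def build_delete_refresh(root_uri: str, deleted_uri: str) -> SemanticMsg:
    # Hypothetical helper mirroring the call in the issue:
    # recursive is not passed, so the dataclass default applies.
    return SemanticMsg(uri=root_uri, changes={"deleted": [deleted_uri]})


msg = build_delete_refresh("viking://resources", "viking://resources/<folder>/")
print(msg.recursive)  # True: the whole tree under root_uri would be refreshed
```

Because the default is silent, nothing at the call site hints that a full-tree refresh was requested.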

Error Logs

07:15:14 Processing semantic generation for: SemanticMsg(
  uri='viking://resources', 
  recursive=True,
  changes={'deleted': ['viking://resources/<folder>/']}
)
07:19:05 Registered 2874 embedding tasks ← BUG: should be ~2 tasks
07:27:04 2874 tasks completed (8 minutes of unnecessary work)
07:46:17 Server recovers (was unhealthy for ~9 minutes)

OpenViking Version

main

Python Version

3.13

Operating System

Windows

Model Backend

None

Additional Context

Root Cause

_enqueue_delete_refresh (introduced in 2583da84 "Feature/wiki link (#2558)") doesn't set recursive=False. The SemanticMsg default is recursive=True, so the entire resource tree gets re-processed on delete.
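
One possible shape for the fix, as a minimal sketch: pass recursive=False explicitly when building the message. The SemanticMsg stand-in, the FakeQueue class, and the free function enqueue_delete_refresh below are hypothetical simplifications so the example runs on its own; the real method lives on the service class and uses the project's own queue.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class SemanticMsg:
    """Stand-in with only the fields quoted in this issue (assumed shape)."""
    uri: str
    context_type: str
    recursive: bool = True
    changes: dict = field(default_factory=dict)


class FakeQueue:
    """Hypothetical in-memory queue used only for this sketch."""

    def __init__(self):
        self.items = []

    async def enqueue(self, msg):
        self.items.append(msg)


async def enqueue_delete_refresh(queue, *, root_uri, deleted_uri, context_type):
    msg = SemanticMsg(
        uri=root_uri,
        context_type=context_type,
        recursive=False,  # explicit: refresh only the parent directory
        changes={"deleted": [deleted_uri]},
    )
    await queue.enqueue(msg)


queue = FakeQueue()
asyncio.run(
    enqueue_delete_refresh(
        queue,
        root_uri="viking://resources",
        deleted_uri="viking://resources/<folder>/",
        context_type="resource",
    )
)
print(queue.items[0].recursive)  # False
```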

Why recursive=False Is Correct

With recursive=False, the SemanticDagExecutor:

  1. Processes only files in the parent directory itself
  2. Uses existing abstracts for child directories (via _finalize_children_abstracts)
  3. Generates a new overview reflecting the deletion
  4. Does NOT re-process sibling directories
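
The difference in work between the two modes can be illustrated with a toy directory tree. This is a simplified model, not the SemanticDagExecutor itself: the Dir class and the refresh function are hypothetical, and "processing" a file just means adding it to a list.

```python
from dataclasses import dataclass, field


@dataclass
class Dir:
    """Toy directory node (hypothetical, for illustration only)."""
    name: str
    files: list = field(default_factory=list)
    children: list = field(default_factory=list)
    abstract: str = ""  # cached summary from an earlier indexing run


def refresh(d: Dir, recursive: bool):
    """Return (overview, processed_files) for directory d."""
    processed = list(d.files)  # files directly in d are always processed
    child_abstracts = []
    for child in d.children:
        if recursive:
            # Recursive mode regenerates every descendant.
            child.abstract, child_processed = refresh(child, True)
            processed.extend(child_processed)
        # Non-recursive mode reuses the cached abstract as-is.
        child_abstracts.append(child.abstract)
    overview = f"{d.name}: " + "; ".join(child_abstracts)
    return overview, processed


repo = Dir("repo", files=[f"f{i}.py" for i in range(2874)], abstract="repo (cached)")
root = Dir("resources", children=[repo])  # the deleted folder is already gone

_, shallow = refresh(root, recursive=False)
_, deep = refresh(root, recursive=True)
print(len(shallow), len(deep))  # 0 2874
```

In this model the non-recursive refresh touches no sibling files at all, while the recursive one re-processes all 2874 files of the unrelated repository.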

Related Issues/PRs

Metadata

Labels

bug (Something isn't working)

Projects

Status
Done
