Skip to content

Feat/zip upload connector#10

Merged
Grandvizir merged 5 commits into
mainfrom
feat/zip-upload-connector
Apr 10, 2026
Merged

Feat/zip upload connector#10
Grandvizir merged 5 commits into
mainfrom
feat/zip-upload-connector

Conversation

@Grandvizir
Copy link
Copy Markdown
Collaborator

Summary

Add a new ZIP upload connector that lets users import documents by uploading a .zip archive, fix connector creation form submission bugs, and improve error visibility during HTTP ingestion.

Changes

New: ZIP Upload Connector

  • New connector type zipupload: Users can upload a .zip file containing documents (PDF, DOCX, TXT, MD, HTML, etc.) directly from the UI. The archive is stored via Django's pluggable storage backend (local filesystem by
    default, Azure Blob / S3 in production).
  • connectors/zipupload.py: Connector implementation — lists files inside the zip, filters by supported extensions, skips macOS __MACOSX resource forks, and extracts individual files on demand for the ingestion pipeline.
  • connectors/models.py: Added ZIPUPLOAD to ConnectorType choices.
  • connectors/init.py: Register the new connector module.
  • connectors/views.py: Handle zip file upload in connector_create (validation, storage), and serve individual files from the zip in document_file.
  • score/settings.py: Added uploads storage backend config and UPLOAD_ROOT setting, with support for Azure Blob / S3 via environment variables.
  • pyproject.toml / requirements.txt: Added optional django-storages[azure] and django-storages[s3] dependencies.
  • connectors/migrations/0005_add_zipupload_connector_type.py: Migration for the new connector type.
  • Templates: Added ZIP upload config panel (file input) and icon/title in both the modal (_connector_cards.html) and standalone create page (create.html).

Fix: Connector creation form not submitting

  • Disable hidden required inputs: The Elasticsearch config_index input had a required attribute that silently blocked browser form validation when other connector types were selected. The JS updateModal() / update() functions
    now disable required inputs in hidden config panels and re-enable them when their panel becomes active.
  • Add data-turbo="false": Prevents Hotwire Turbo from intercepting form submissions in both the modal and standalone create forms.
  • Add visual validation feedback: The standalone create page now shows inline error messages and red borders on invalid fields.

Fix: HTTP connector documents not appearing

  • connectors/generic.py: _fetch_http() now raises ValueError when the HTTP response body is empty (e.g. HTTP 202 Accepted), instead of silently producing an empty document.
  • ingestion/pipeline.py: When text extraction fails or _process_document() raises an exception, the document is now created with ERROR status and an error message, instead of being silently skipped. Users can see failed
    documents in the connector detail view.

Test Plan

  • Tests pass (pytest)
  • Linting passes (ruff check .)
  • Manual testing done:
    • Upload a .zip file via the modal → connector is created → sync → documents from the zip appear in the connector detail
    • Upload an invalid file (not a zip) → error message is displayed
    • View a document from a zip connector → file content is served correctly
    • Open connector list → click each connector type card → fill form → submit → connector is created for all types
    • Submit form with empty name → validation error is shown
    • Create an HTTP generic connector with a URL returning empty content → sync → document appears with ERROR status
    • Create an HTTP generic connector with a valid URL → sync → document appears normally

Related Issues

Closes #

# Conflicts:
#	CHANGELOG.md
# Conflicts:
#	CHANGELOG.md
The Elasticsearch config_index input had a required attribute that
silently blocked form validation when other connector types were
selected, because the hidden panel's inputs still participated in
browser validation. Disable required fields in hidden config panels
and add data-turbo="false" to prevent Turbo from intercepting form
submissions.
…P response

HTTP connector now raises ValueError when the response body is empty
(e.g. HTTP 202). The ingestion pipeline now creates documents with
ERROR status when text extraction fails or an exception occurs, so
users can see what went wrong instead of seeing 0 documents.
@Grandvizir Grandvizir merged commit 3b6dc30 into main Apr 10, 2026
2 checks passed
@Grandvizir Grandvizir deleted the feat/zip-upload-connector branch April 10, 2026 12:55
@Grandvizir Grandvizir restored the feat/zip-upload-connector branch April 14, 2026 08:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant