[Security] SSRF + path traversal chain in bio-research ncbi_utils.py and sra_geo_fetch.py

## Description

The `bio-research` plugin's Python scripts have three hardening concerns in how they query external APIs and download FASTQ data: two defense-in-depth issues and one informational note.

**Severity: Low-Medium** (not immediately exploitable, but worth hardening)

## Issue 1: HTTP Protocol Downgrade on FASTQ Downloads (Medium)

**File:** `bio-research/skills/nextflow-development/scripts/utils/ncbi_utils.py` (line 343)

The ENA API is queried over HTTPS (line 314), but the actual FASTQ file downloads are forced to unencrypted HTTP:

```python
# Line 343 — FTP paths from ENA converted to HTTP (not HTTPS)
urls = [f"http://{url}" for url in ftp_urls.split(';') if url]
```

A real ENA response returns values like `ftp.sra.ebi.ac.uk/vol1/fastq/SRR635/000/SRR6357070/SRR6357070_1.fastq.gz`, which becomes `http://ftp.sra.ebi.ac.uk/...`.

**Impact:** FASTQ downloads (often multi-GB) happen over unencrypted HTTP. A network-level attacker could modify file contents in transit. While genomic data isn't secret, integrity matters for research reproducibility.

**Fix:** Change `http://` to `https://` on line 343. ENA supports HTTPS downloads.
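A minimal sketch of the change, assuming the `fastq_ftp` value is a semicolon-separated list of scheme-less paths as in the example above. The helper name `build_fastq_urls` is hypothetical and does not exist in `ncbi_utils.py`; the actual fix may be just the one-line scheme change.

```python
def build_fastq_urls(ftp_field: str, scheme: str = "https") -> list[str]:
    """Turn a semicolon-separated `fastq_ftp` value into full URLs.

    Hypothetical helper for illustration; not part of ncbi_utils.py.
    """
    # Skip empty segments, which appear when the field ends with ';'.
    return [
        f"{scheme}://{path.strip()}"
        for path in ftp_field.split(";")
        if path.strip()
    ]


example_field = (
    "ftp.sra.ebi.ac.uk/vol1/fastq/SRR635/000/SRR6357070/SRR6357070_1.fastq.gz;"
)
urls = build_fastq_urls(example_field)
```

Keeping the scheme in one place makes it harder for a later edit to reintroduce plain HTTP by accident.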

## Issue 2: No Domain Validation on Download URLs (Low)

**File:** `bio-research/skills/nextflow-development/scripts/utils/ncbi_utils.py` (lines 338-344)

The `fastq_ftp` field from the ENA API response is used to construct download URLs without validating that they point to known ENA/NCBI domains:

```python
# Lines 338-344
ftp_urls = fields[ftp_idx]
if ftp_urls:
    urls = [f"http://{url}" for url in ftp_urls.split(';') if url]
    fastq_urls[srr] = urls
```

These URLs are then passed to `download_file()`, which streams the response body to disk via `requests.get(url, stream=True)`.

**Impact:** If the ENA API were ever compromised or its response tampered with, the code would fetch from arbitrary URLs and write content to disk. This is a defense-in-depth concern — the ENA query itself is over HTTPS (line 314), so MITM is not trivial.

**Fix:** Validate that download URLs match expected ENA domains (e.g., `*.ebi.ac.uk`, `ftp.sra.ebi.ac.uk`) before fetching.
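One possible shape for the allowlist check is sketched below. The names `ALLOWED_HOST_SUFFIXES` and `is_allowed_download_url` are assumptions made for illustration, and the suffix list would need to match whichever hosts the plugin actually downloads from. The check parses the URL and compares the hostname, since a substring match on the raw URL could be bypassed by putting the trusted name in the path or userinfo.

```python
from urllib.parse import urlsplit

# Assumption: only EBI-hosted files are expected; adjust to the real hosts.
ALLOWED_HOST_SUFFIXES = (".ebi.ac.uk",)


def is_allowed_download_url(url: str) -> bool:
    """Return True only for HTTPS URLs whose host is on the allowlist."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # Malformed URLs (e.g. a broken IPv6 literal) are rejected outright.
        return False
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower()
    # Accept the bare domain or any subdomain of an allowed suffix.
    return any(
        host == suffix.lstrip(".") or host.endswith(suffix)
        for suffix in ALLOWED_HOST_SUFFIXES
    )
```

A check like this covers only the initial URL. `requests` follows redirects by default, so a stricter version might pass `allow_redirects=False` or re-validate the final URL of the response.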

## Issue 3: Missing URL Encoding on API Parameters (Informational)

**File:** `bio-research/skills/nextflow-development/scripts/utils/ncbi_utils.py` (lines 99, 156, 212, 314)

User-supplied `geo_id` is interpolated into API URLs without `urllib.parse.quote()`:

```python
search_url = f"https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=gds&term={geo_id}[Accession]&retmode=json"
```

Since this is a CLI tool where the user provides their own arguments, this is not exploitable in practice — but URL encoding is good hygiene.
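As a sketch of what encoded URL construction could look like, the example below uses `urllib.parse.urlencode` (rather than calling `quote()` on each value) so that every query parameter is escaped in one step. The function name `build_geo_search_url` is hypothetical; the base URL and parameters are taken from the snippet above.

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_search_url(geo_id: str) -> str:
    """Build the esearch URL with all query values percent-encoded.

    Hypothetical helper for illustration; not part of ncbi_utils.py.
    """
    params = {
        "db": "gds",
        "term": f"{geo_id}[Accession]",
        "retmode": "json",
    }
    # urlencode escapes '&', '=', '[', ']' and spaces inside values,
    # so geo_id cannot add or override query parameters.
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


search_url = build_geo_search_url("GSE12345")
```

If the script already uses `requests` for these calls, passing the same dictionary as `params=` to `requests.get()` would give equivalent encoding without building the string by hand.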

## What's NOT a vulnerability (correcting our original report)

- **Output path (`--output`)**: Our original report claimed this was "arbitrary file write." It's not — this is a CLI tool where the user supplies their own arguments. Normal CLI behavior, not a security issue.
- **Compound attack scenario**: Our original report chained HTTPS MITM + CLI argument control. This was unrealistic — each link requires conditions that make the chain implausible.

## Suggested Fixes

1. **Line 343**: Change `f"http://{url}"` to `f"https://{url}"` (simplest, highest impact)
2. **Lines 338-344**: Add domain allowlist check before downloading
3. **Lines 99, 156, 212, 314**: Use `urllib.parse.quote()` for geo_id/accession in URLs

## Secure Patterns Already in Use (Credit)

- ✅ ENA API query is over HTTPS (line 314)
- ✅ `yaml.safe_load()` used correctly
- ✅ `subprocess.run()` uses list format, not `shell=True`
- ✅ No hardcoded secrets
- ✅ NCBI rate limiting properly enforced

🤖 Generated with [Claude Code](https://claude.com/claude-code)
