YouTube downloads: M3U/HLS saved as .mp4, and proposal for a Kotlin downloader rewrite #63

@Priveetee

Hi @InfinityLoop1308,

This follows up on our discussion in PipePipeExtractor#62, where you mentioned that the legacy NewPipe YouTube downloader is hard to maintain and that an FFmpeg-like direction may be the right way to go. I have started investigating the YouTube download issue reported in PipePipe#2288.

I would like your opinion before implementing something large.

What I found

I reproduced one real bad output in PipePipe Debug:

  • YouTube video: Rick Astley - Never Gonna Give You Up
  • The output filename ended with .mp4
  • The file was only 57053 bytes
  • The file command identified it as "M3U playlist, ASCII text"
  • ffprobe failed with "moov atom not found"
  • So PipePipe saved playlist/manifest text under an .mp4 filename

I also added local debug instrumentation to the client downloader and captured a successful control run for the same video.

In the successful control run:

  • The selected video stream was itag 136, avc1.4d401f, 720p
  • The selected audio stream was itag 140, mp4a.40.2
  • Both streams were exposed as direct URLs
  • HTTP returned Content-Type: video/mp4 and Content-Type: audio/mp4
  • Range requests returned 206
  • Mp4FromDashMuxer completed successfully
  • The final MP4 was valid

So the current evidence does not point to a universal MP4 muxer failure. It looks more like some YouTube response/session variant can expose a playlist/manifest resource where the client expects direct media.

Current hypothesis

There are probably two separate problems:

  • The Extractor/client boundary does not make the stream type strict enough for downloads.
  • The client downloader does not have a strong preflight layer to reject M3U/HLS/text before saving/postprocessing as .mp4.

A minimal fix would be to check the response Content-Type and look for a leading #EXTM3U tag in the body, then fail cleanly instead of saving a corrupt MP4.
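To make the minimal fix concrete, here is a rough sketch of such a check. Everything in it (SniffVerdict, sniffResource, the list of playlist MIME types) is a hypothetical name chosen for illustration, not existing PipePipe code. It assumes the caller already has the Content-Type header and the first few bytes of the body.

```kotlin
// Hypothetical names throughout; this is a sketch, not existing PipePipe code.
enum class SniffVerdict { LOOKS_LIKE_MP4, HLS_PLAYLIST, UNKNOWN }

// MIME types commonly used for M3U/HLS playlists (assumed list, may be incomplete).
private val HLS_CONTENT_TYPES = setOf(
    "application/vnd.apple.mpegurl",
    "application/x-mpegurl",
    "audio/mpegurl",
    "audio/x-mpegurl",
)

fun sniffResource(contentType: String?, head: ByteArray): SniffVerdict {
    // Drop parameters such as "; charset=utf-8" and normalise case.
    val mime = contentType?.substringBefore(';')?.trim()?.lowercase()
    if (mime != null && mime in HLS_CONTENT_TYPES) return SniffVerdict.HLS_PLAYLIST

    // Decode bytes 1:1 so binary data cannot break the comparison,
    // and tolerate a UTF-8 byte order mark before the #EXTM3U tag.
    val text = head.toString(Charsets.ISO_8859_1).removePrefix("\u00EF\u00BB\u00BF")
    if (text.startsWith("#EXTM3U")) return SniffVerdict.HLS_PLAYLIST

    // MP4 files usually start with a box whose type, at bytes 4..7, is "ftyp".
    if (head.size >= 8 && head.copyOfRange(4, 8).toString(Charsets.ISO_8859_1) == "ftyp") {
        return SniffVerdict.LOOKS_LIKE_MP4
    }
    return SniffVerdict.UNKNOWN
}
```

The body check runs even when the header says video/mp4, so a mislabeled playlist would still be caught. How UNKNOWN should be handled (reject, or continue with a warning) is left open here.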

But since the legacy YouTube downloader path is hard to maintain anyway, I would prefer to use this as a starting point for a more structured rewrite.

Proposed rewrite direction

I am thinking about a progressive Kotlin rewrite of the download stack, not a big-bang replacement.

The idea would be:

  • Keep the current UI and service entry points at first
  • Introduce typed models like DownloadRequest, DownloadPlan, PlannedResource, and ProbeResult
  • Add a real probe layer for redirects, cookies, headers, Content-Type, range support, length, and manifest detection
  • Replace the current range downloader with a stricter resumable block downloader
  • Use temporary files and atomic finalization instead of writing directly to the final output when possible
  • Make muxing explicit through a Muxer abstraction
  • Decide per format whether to use existing internal muxers, FFmpeg/libavformat, or another backend
  • Add proper HLS/M3U support instead of accidentally treating playlists as direct media
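
As a starting point for discussion, the typed models could look roughly like this. The type names DownloadRequest, DownloadPlan, PlannedResource, and ProbeResult come from the list above. All fields, the Delivery enum, and planDownload are assumptions made for the sketch, and the real shapes would depend on what the Extractor exposes.

```kotlin
// Field names and the Delivery enum are assumptions, not an agreed design.
data class DownloadRequest(val pageUrl: String, val targetFileName: String)

enum class Delivery { DIRECT, HLS_MANIFEST }

// What a probe learned about one resource before any bytes are saved.
data class ProbeResult(
    val finalUrl: String,
    val contentType: String?,
    val contentLength: Long?,
    val acceptsRanges: Boolean,
    val delivery: Delivery,
)

data class PlannedResource(val role: String, val probe: ProbeResult)

// A plan is either something the downloader can execute or an explicit rejection.
sealed interface DownloadPlan {
    data class DirectMux(
        val request: DownloadRequest,
        val resources: List<PlannedResource>,
    ) : DownloadPlan

    data class Rejected(val request: DownloadRequest, val reason: String) : DownloadPlan
}

// Hypothetical planner: refuse to treat a manifest as direct media.
fun planDownload(request: DownloadRequest, probes: Map<String, ProbeResult>): DownloadPlan {
    val manifest = probes.entries.firstOrNull { it.value.delivery == Delivery.HLS_MANIFEST }
    if (manifest != null) {
        return DownloadPlan.Rejected(request, "${manifest.key} stream is a manifest, not direct media")
    }
    return DownloadPlan.DirectMux(request, probes.map { (role, probe) -> PlannedResource(role, probe) })
}
```

With a sealed DownloadPlan, the service layer has to handle the rejected case explicitly instead of passing an unexpected resource on to the muxer.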

In TypeType-Downloader I used a different design: direct media streams are downloaded separately with strict parallel range validation, then audio/video are remuxed through libavformat using stream copy. That is server-side Go, so it cannot be copied directly to Android, but I think the architecture idea may apply.
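
To illustrate what I mean by strict range validation, here is a minimal Kotlin sketch. It is not the TypeType-Downloader code. ByteRange and isExpectedRangeResponse are hypothetical names, and the policy of rejecting any response that does not match the requested range exactly is an assumption that may be too strict for some servers.

```kotlin
// Hypothetical helper: accept a block only if the server returned exactly what was asked for.
data class ByteRange(val start: Long, val endInclusive: Long)

// Matches Content-Range values such as "bytes 0-1023/57053" or "bytes 0-1023/*".
private val CONTENT_RANGE = Regex("""bytes (\d+)-(\d+)/(\d+|\*)""")

fun isExpectedRangeResponse(
    requested: ByteRange,
    status: Int,
    contentRange: String?,
    expectedTotal: Long?,
): Boolean {
    // A 200 here means the server ignored the Range header and sent the full body.
    if (status != 206) return false
    val header = contentRange?.trim() ?: return false
    val match = CONTENT_RANGE.matchEntire(header) ?: return false
    val (startText, endText, totalText) = match.destructured
    val start = startText.toLongOrNull() ?: return false
    val end = endText.toLongOrNull() ?: return false
    if (start != requested.start || end != requested.endInclusive) return false
    // The total may be unknown on either side; compare only when both are known.
    if (totalText == "*" || expectedTotal == null) return true
    return totalText.toLongOrNull() == expectedTotal
}
```

A block that fails this check would be retried or the download aborted, rather than written into the output at the wrong offset.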

Questions

Would you prefer a small targeted fix first, or are you open to starting a larger downloader rewrite?

If a rewrite is acceptable, what muxing direction would you prefer for PipePipe?

  • Keep existing internal muxers
  • Move toward FFmpeg/libavformat-style remuxing
  • Use a hybrid/pluggable muxer abstraction
  • Something else

For the Extractor/client boundary, how should HLS/M3U-like streams be exposed?

  • Explicitly mark them as HLS/manifest delivery
  • Hide them from normal download choices until the client supports HLS downloads
  • Expose them only if the client declares support
  • Another approach
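
To make the first and third options easier to compare, here is one possible shape for them, as a sketch only. StreamDelivery, StreamCandidate, and downloadableStreams are hypothetical names, not the current Extractor API. The HLS entry in the usage example is made up for illustration.

```kotlin
// Hypothetical boundary types; the real Extractor may model this differently.
enum class StreamDelivery { DIRECT_HTTP, HLS_MANIFEST }

data class StreamCandidate(
    val id: String,
    val codec: String,
    val delivery: StreamDelivery, // option 1: delivery is always explicit
)

// Option 3: the client declares what it can download and everything else is filtered out.
fun downloadableStreams(
    candidates: List<StreamCandidate>,
    clientSupports: Set<StreamDelivery>,
): List<StreamCandidate> = candidates.filter { it.delivery in clientSupports }

fun main() {
    val candidates = listOf(
        StreamCandidate("itag-136", "avc1.4d401f", StreamDelivery.DIRECT_HTTP),
        StreamCandidate("itag-140", "mp4a.40.2", StreamDelivery.DIRECT_HTTP),
        StreamCandidate("hls-variant", "unknown", StreamDelivery.HLS_MANIFEST), // made-up entry
    )
    // A client without HLS download support only sees the direct streams.
    println(downloadableStreams(candidates, setOf(StreamDelivery.DIRECT_HTTP)).map { it.id })
}
```

The second option (hiding HLS streams entirely) would then just be the case where the client passes a set containing only DIRECT_HTTP.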

I am fully open to your guidance here. If you are interested in this direction, I can split the work into small PRs and do my best to maintain it.
