Python implementation of a lossless audio codec. The system uses Linear Predictive Coding (LPC) to mathematically model the signal spectrally and Rice Coding (Golomb-Rice) with dynamic parameter adaptation to compress the residual.
The project implements an audio compressor from scratch that achieves compression ratios similar to FLAC. The system takes advantage of parallel processing (multiprocessing) to speed up both encoding and decoding, utilizing all available CPU cores.
- Pre-processing (Mid-Side): In stereo signals, it transforms from L/R to Mid/Side to improve compression.
- Framing: Segmentation into frames (default 4096 samples).
- LPC Analysis (Levinson-Durbin): For each frame, autocorrelation is calculated and the Levinson-Durbin algorithm is used to find the optimal coefficients of the predictor filter (configurable order, default 12).
-
Residual Calculation: The current signal
$\hat{x}[n]$ is predicted through a linear combination of past samples and the LPC coefficients. The difference with the actual signal is the residual:$$e[n] = x[n] - \text{round}(\hat{x}[n])$$ -
Entropy Coding (Rice):
-
Zigzag Encoding: Converts the residual (signed) to positive integers to optimize coding (
$0 \to 0, -1 \to 1, 1 \to 2...$ ). -
K Estimation: The optimal parameter
$k$ is found by testing values in a range (0-16) and selecting the one that minimizes the total bit size of the encoded frame. - Rice Coding: The compressed bitstream is generated by separating the value into quotient (unary) and remainder (binary).
-
Zigzag Encoding: Converts the residual (signed) to positive integers to optimize coding (
-
Packaging: A binary file (
.pyflac) is saved containing the global headers, and for each frame: its metadata ($k$ , padding, length), the LPC coefficients, and the compressed bitstream.
-
Reading Frames: The
$k$ parameters and the LPC coefficients of each block are extracted. -
Rice and Zigzag Decoding: The original residual
$e[n]$ is recovered. -
LPC Synthesis: The signal is reconstructed by adding the residual to the prediction generated by the recovered coefficients:
$$x[n] = e[n] + \text{round}(\hat{x}[n])$$ - Stereo Reconstruction: If Mid-Side was used, the original Left/Right channels are recovered from Mid/Side.
| File | Description |
|---|---|
pyFlac_encoder.py |
Main Encoder. Parallelized version that processes frames simultaneously. Generates encoded.pyflac. |
pyFlac_decoder.py |
Main Decoder. Reads encoded.pyflac and reconstructs the audio into Decoded_Audio.wav. |
compare.py |
Utility script to validate lossless compression (bit-exact). |
legacy/ |
Folder with previous versions of the project. |
The system allows adjusting the following parameters in the code:
- FRAME_SIZE: Size of the analysis window (Default: 4096 samples). Larger windows can improve compression in stable signals, but worsen it in fast transients.
- ORDER: LPC filter order (Default: 12). A higher order better models the spectral envelope but requires saving more coefficients per frame.
The project uses Python 3 and the following libraries:
pip install numpy scipy