2025-08-04
TSS prediction based solely on RNAseq data. Using ML methods (NN+gLM) to embed and predict data.
The project is organized in the following directories:
- src/: scripts that are connected to a GitHub repo
- data/: raw data, cleaned data, etc.
- analysis/: scripts that are not connected to a GitHub repo
- figures/: plots, graphs, etc.
- playground/: for testing purposes
- results/: results of the project
- Student: Arthur Neumann
- Supervisor: Maik Wolfram-Schauerte + Mathias Witte Paz
-
data
Data-related operations. Contains subcommands for cleaning, viewing, and windowing datasets. -
trainNo current implementation, planned for future ML implementations
- Cleans the raw dataset and outputs processed files.
- Usage Example:
python3 main.py data clean
- Previews cleaned or raw datasets in the terminal, with options to save a preview.
- Options:
--dataset {Master,Clean}: Choose which dataset to view (Masterfor raw,Cleanfor processed). Default:Clean.--rows <int>: Number of rows to preview. Default:5000.--save <path>: Save the preview to a.tsvfile at the specified path.
- Usage Example:
python3 main.py data view --dataset Clean --rows 1000 --save /path/to/save
- Performs window-based analysis on the cleaned dataset.
- Options:
--size <int>: Window size in each direction from TSS. Default:1000.--conds <list>: List of conditions to filter by. Default:Control. Choices: Control, nov, rif, tet--strand {+, -, master, all}: Specify strand for windowing. Default: all.--preview <path>: [ Optional ] Path to save preview output.--wiggle_f <path>: [ Optional ] Path to forward wiggle file.--wiggle_b <path>: [ Optional ] Path to backward wiggle file.--output_dir <path>: Output directory for results.
- Usage Example:
python3 main.py data window --size 1000 --conds Control nov --strand + --preview /path/to/preview