Hi Ziye Wang,
I’m running comebin on a large metagenomic dataset and I encountered two issues:
1) Training becomes extremely slow / appears stuck
The program becomes very slow at the training step and looks like it is sleeping.
From strace, I can see the following line:
409789 S futex_wait_queue_me python python main.py train --data /beegfs/home/lxd/2_sea_metagenomics/binning/comebin/comebin_output/MC01-25/data_augmentation --temperature 0.15 --emb_szs_forcov 2048 --batch_size 1024 --emb_szs 2048 --n_views 6 --add_model_for_coverage --output_path /beegfs/home/lxd/2_sea_metagenomics/binning/comebin/comebin_output/MC01-25/comebin_res --earlystop --addvars --vars_sqrt --num_threads 31
It stays in this status for a very long time.
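In case it helps with diagnosis, here is a minimal sketch of how the per-thread states of the training process could be checked. It assumes a Linux /proc filesystem; the function name thread_states is my own and not part of comebin. If every thread reports S, the process is waiting rather than computing.

```python
import os
from collections import Counter


def thread_states(pid, proc_root="/proc"):
    """Count the scheduler state (R, S, D, ...) of every thread of a process."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    states = Counter()
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as fh:
                stat = fh.read()
        except OSError:
            # The thread exited between listing the directory and opening the file.
            continue
        # The command name is wrapped in parentheses and may contain spaces,
        # so split on the last ')' and take the first field after it.
        state = stat.rsplit(")", 1)[1].split()[0]
        states[state] += 1
    return states
```

For the process above, this would be called as thread_states(409789).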
2) Missing final output folder comebin_res_bins
I have many completed runs (no error messages), but in the final output directory I only get:
weight_seed_kmeans_k_363_result.tsv_bins
However, I do NOT get the expected folder:
comebin_res_bins
So I’m not sure whether the run actually finished successfully or whether some final binning step was skipped.
Dataset information
My dataset is very large:
One Illumina metagenomic sample has ~100 Gbp of sequencing data.
Command used
I ran comebin using:
run_comebin.sh -a ./fna_1kbp/${name}.fna \
-o ./comebin_output/${name} \
-p ./bam/${name} \
-t 31
Questions
- Is it normal for main.py train to become extremely slow / show futex_wait_queue_me (sleep) on very large datasets?
- What is the expected difference between weight_seed_kmeans_k_363_result.tsv_bins and comebin_res_bins?
- Under what conditions would comebin_res_bins not be generated even if no errors are reported?
- Are there recommended parameters or workflow adjustments for very large datasets (e.g., 100 Gbp per sample)?
Thanks a lot for your help!