CheckM's broad marker sets do not always cover the species we want to analyze, so it is sometimes necessary to create a custom marker set. I am new to CheckM and unsure how to select appropriate marker genes for the species I am studying. My idea is to construct a pangenome from complete reference genomes, identify the single-copy core genes, align each of them, build HMM profiles with hmmbuild, and finally follow the workflow in CheckM's documentation. For example, for species X I obtained a core genome of ~1800 genes. Is it necessary to use all of these genes? And is it acceptable to include other species in the pangenome analysis in order to obtain a set of differentiating genes?
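To make the single-copy filtering step concrete, this is roughly what I have in mind. It is only a minimal sketch: the gene-count table, the family names, and the genome names are hypothetical, and the layout does not correspond to the output format of any particular pangenome tool.

```python
# Hypothetical input: number of copies of each gene family in each reference genome.
# Family and genome names are made up for illustration.
counts = {
    "famA": {"g1": 1, "g2": 1, "g3": 1},
    "famB": {"g1": 1, "g2": 2, "g3": 1},  # duplicated in g2, so not single-copy
    "famC": {"g1": 1, "g2": 0, "g3": 1},  # missing from g2, so not core
}


def single_copy_core(counts, genomes):
    """Return the families present exactly once in every listed genome."""
    return sorted(
        family
        for family, per_genome in counts.items()
        if all(per_genome.get(genome, 0) == 1 for genome in genomes)
    )


genomes = ["g1", "g2", "g3"]
print(single_copy_core(counts, genomes))
```

Each family that passes this filter would then be aligned and given to hmmbuild to produce one profile per gene.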
Could you please help me with these questions? Thank you in advance.
Benjamin