GDTools COMPARE TSV output headers can vary #395

Description

@jvera888

Hi!

Breseq is a great tool! Thank you for all your work.

I'm not sure if this reported behavior is a bug or a feature, so this is a question with a potential bug report.

Question/issue: When using GDTools COMPARE on multiple samples to produce TSV output, I usually get the same 40 headers, but occasionally I get a different number. Some headers go missing (e.g. 'new_seq'), while new ones appear (e.g. 'repeat_length'). This behavior is somewhat unpredictable: it does not always correspond to the types of mutations present, and many other headers always appear even when their column is empty. So perhaps it is unintended? It definitely makes parsing the table problematic. Here are two example header rows, each followed by sample data rows:

aa_new_seq	aa_position	aa_ref_seq	clone	codon_new_seq	codon_number	codon_position	codon_position_is_indeterminate	codon_ref_seq	gene_name	gene_position	gene_product	gene_strand	genes_inactivated	genes_overlapping	genes_promoter	locus_tag	locus_tags_inactivated	locus_tags_overlapping	locus_tags_promoter	mutation_category	mutator_status	new_read_count	new_read_count_basis	population	position	position_end	position_start	ref_read_count	ref_read_count_basis	ref_seq	repeat_length	repeat_new_copies	repeat_ref_copies	repeat_seq	seq_id	size	snp_type	time	title	transl_table	treatment	type
									PM_10340/PM_10350	intergenic (+150/+5)	ester cyclase/hypothetical protein	>/<				PM_10340/PM_10350				small_indel		12	1		964623	964663	964623	5	2	41-bp					PM_chr1	41		-1	output			DEL
									PM39400_25310	coding (99-107/837 nt)	DNA-directed RNA polymerase beta subunit	>	PM39400_25310			PM39400_25310	PM_25310			small_indel		79	1		2361336	2361344	2361336	11	2	AGTAGCCCC	9	2	3	AGTAGCCCC	PM_chr1	9		-1	output			DEL
aa_new_seq	aa_position	aa_ref_seq	clone	codon_new_seq	codon_number	codon_position	codon_position_is_indeterminate	codon_ref_seq	gene_name	gene_position	gene_product	gene_strand	genes_inactivated	genes_overlapping	genes_promoter	locus_tag	locus_tags_inactivated	locus_tags_overlapping	locus_tags_promoter	mutation_category	mutator_status	new_read_count	new_read_count_basis	new_seq	population	position	position_end	position_start	ref_read_count	ref_read_count_basis	ref_seq	seq_id	size	snp_type	time	title	transl_table	treatment	type
T	263	T		ACG	263	3		ACA	PM39400_71000	789	IS200/IS605 family transposase ISBce3	<		PM39400_71000		PM39400_71000		PM39400_71000		snp_synonymous		9	1	C		382	382	382	0	1	T	PM39400_pla12		synonymous	-1	PM39400	1		SNP
									PM39400_10340/PM39400_10350	intergenic (+150/+5)	ester cyclase/hypothetical protein	>/<				PM39400_10340/PM39400_10350				small_indel		10	1			964623	964663	964623	1	2	41-bp	PM39400_chr1	41		-1	PE39400-G4			DEL
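
For anyone hitting the same parsing problem, one possible workaround on the consumer side is to read each file by column name rather than by position, and to pad whichever columns a given file lacks. Below is a minimal sketch in Python using only the standard library. The helper names `read_tsv` and `harmonize` are hypothetical (they are not part of breseq or GDTools), and the sample strings use only a small subset of the columns shown above.

```python
import csv
import io


def read_tsv(text):
    """Parse TSV text into (header names, list of row dicts keyed by column name)."""
    reader = csv.DictReader(
        io.StringIO(text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary text
        restval="",              # short rows get empty strings, not None
    )
    rows = list(reader)
    return list(reader.fieldnames or []), rows


def harmonize(tables, fill=""):
    """Project tables with differing headers onto the sorted union of their columns."""
    columns = sorted({name for header, _ in tables for name in header})
    merged = [
        {name: row.get(name, fill) for name in columns}
        for _, rows in tables
        for row in rows
    ]
    return columns, merged


# Hypothetical miniature inputs: one table has 'repeat_length', the other 'new_seq'.
table_a = "position\trepeat_length\ttype\n964623\t\tDEL\n"
table_b = "new_seq\tposition\ttype\nC\t382\tSNP\n"

columns, rows = harmonize([read_tsv(table_a), read_tsv(table_b)])
print(columns)
for row in rows:
    print(row)
```

Because every row ends up with the same keys, downstream code can look up `row["new_seq"]` or `row["repeat_length"]` without first checking which header variant a file happened to have. This only works around the symptom; it does not explain why the header set changes.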

Thanks for your help!
