This project compares the performance of Random Forest and Logistic Regression models for gene sequence classification. Visualizations illustrate sequence length distributions, k-mer frequency proportions, and feature importance. They show how each model performs and which features contribute most to classification.
Mirza Ahmadi
Phoenix Armstrong, who enhanced the script by adding a data filtering function and a plotting function for lollipop plots, and by improving code efficiency across the RMarkdown script.
Dilma Karunathilake and Fatemeh Asgarian, who provided valuable feedback and ideas.