Automated Classification of Genetic Mutations in Cancer using Machine Learning
DOI:
https://doi.org/10.3126/sra.v1i1.60140
Keywords:
Cancer Cell Line Encyclopedia, Chromatography-mass spectrometry, Genomic, preclinical models, drug sensitivity
Abstract
Efforts to decipher the genomic data of cancer and its implications for treatment face challenges. Robust preclinical models that reflect the genomic diversity of human cancer, along with comprehensive genetic and pharmacological annotations, can greatly aid this endeavor. Large collections of cancer cell lines capture this genomic diversity and provide valuable insights into the response to anti-cancer drugs. In this study, we demonstrate significant agreement and biological consistency between drug sensitivity measurements and their corresponding genomic predictors from two publicly available pharmacogenomics databases: the Cancer Cell Line Encyclopedia (CCLE) and the Genomics of Drug Sensitivity in Cancer. Despite ongoing efforts to identify cancer-related metabolic changes that may reveal vulnerabilities to targeted drugs, systematic evaluations of metabolism in relation to functional genomic features and associated dependencies are still uncommon. To gain further insight into the metabolic diversity of cancer, we analyzed 225 metabolites in 928 CCLE cell lines representing over 20 cancer types using liquid chromatography-mass spectrometry (LC-MS). The analysis revealed missing data in several features, some with more than 40% of their values missing, which led to the removal of 12 features according to standard procedures. Further analysis revealed 25 unique chromosomes and 4 unique Variant_Types in the dataset. A logistic regression model achieved an accuracy of 96%.
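The missing-data step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that "exceeding 40%" refers to the fraction of missing values per feature and that features above this fraction are dropped; the abstract does not spell out the exact rule. The function names, the `Annotation` column, and the toy records are hypothetical and are not taken from the study. Only `Chromosome` and `Variant_Type` are names that appear in the abstract.

```python
from typing import Optional

# A record maps feature names to values; None marks a missing value.
Row = dict[str, Optional[str]]


def missing_fraction(rows: list[Row], column: str) -> float:
    """Return the fraction of rows in which `column` is absent or None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


def drop_sparse_features(
    rows: list[Row], threshold: float = 0.40
) -> tuple[list[Row], list[str]]:
    """Drop every feature whose missing fraction exceeds `threshold`.

    The 0.40 default is an assumption based on the 40% figure in the abstract.
    """
    columns = sorted({key for row in rows for key in row})
    dropped = [c for c in columns if missing_fraction(rows, c) > threshold]
    kept = [{k: v for k, v in row.items() if k not in dropped} for row in rows]
    return kept, dropped


# Toy records with made-up values, for illustration only.
records: list[Row] = [
    {"Chromosome": "1", "Variant_Type": "SNP", "Annotation": None},
    {"Chromosome": "X", "Variant_Type": "DEL", "Annotation": None},
    {"Chromosome": "1", "Variant_Type": "SNP", "Annotation": "a"},
    {"Chromosome": "7", "Variant_Type": None, "Annotation": None},
]

cleaned, dropped = drop_sparse_features(records)

# Count distinct values of a retained categorical feature.
unique_chromosomes = {
    r["Chromosome"] for r in cleaned if r["Chromosome"] is not None
}
```

In this toy data, `Annotation` is missing in three of four records and is removed, while `Variant_Type` is missing in only one and is kept. The same per-feature check would apply to any tabular mutation data, with the threshold adjusted to whatever rule the analysis actually used.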