Distinguishing potential bacteria-tumor associations from contamination in a secondary data analysis of public cancer genome sequence data
Publisher: BioMed Central Ltd.
Abstract

Background: A variety of bacteria are known to influence carcinogenesis. Therefore, we sought to investigate if publicly available whole genome and whole transcriptome sequencing data generated by large public cancer genome efforts, like The Cancer Genome Atlas (TCGA), could be used to identify bacteria associated with cancer. The Burrows-Wheeler aligner (BWA) was used to align a subset of Illumina paired-end sequencing data from TCGA to the human reference genome and all complete bacterial genomes in the RefSeq database in an effort to identify bacterial read pairs from the microbiome.

Results: Through careful consideration of all of the bacterial taxa present in the cancer types investigated, their relative abundance, and batch effects, we were able to identify some read pairs from certain taxa as likely resulting from contamination. In particular, the presence of Mycobacterium tuberculosis complex in the ovarian serous cystadenocarcinoma (OV) and glioblastoma multiforme (GBM) samples was correlated with the sequencing center of the samples. Additionally, there was a correlation between the presence of Ralstonia spp. and two specific plates of acute myeloid leukemia (AML) samples. At the end, associations remained between Pseudomonas-like and Acinetobacter-like read pairs in AML, and Pseudomonas-like read pairs in stomach adenocarcinoma (STAD) that could not be explained through batch effects or systematic contamination as seen in other samples.

Conclusions: This approach suggests that it is possible to identify bacteria that may be present in human tumor samples from public genome sequencing data that can be examined further experimentally. More weight should be given to this approach in the future when bacterial associations with diseases are suspected.

Copyright The Author(s) 2017.
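The batch-effect reasoning in the abstract (a taxon whose presence tracks the sequencing center or plate is more likely a contaminant) can be illustrated with a small association test. The following is only a sketch under stated assumptions: the abstract does not say which statistical test the authors used, so Fisher's exact test is swapped in here for illustration, and the sample data, center names, and function names are all hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x positives in row 1 given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


def batch_table(samples, batch):
    """Count taxon presence/absence inside vs. outside the given batch."""
    a = sum(1 for b, present in samples if b == batch and present)
    b_ = sum(1 for b, present in samples if b == batch and not present)
    c = sum(1 for b, present in samples if b != batch and present)
    d = sum(1 for b, present in samples if b != batch and not present)
    return a, b_, c, d


# Hypothetical toy data, not from the study: (sequencing center, taxon detected?)
samples = ([("center_A", True)] * 8 + [("center_A", False)] * 2 +
           [("center_B", True)] * 1 + [("center_B", False)] * 9)

table = batch_table(samples, "center_A")
p_value = fisher_exact_two_sided(*table)
print(table, p_value)
```

A small p-value in such a test would indicate that detection of the taxon is associated with the batch variable rather than spread evenly across samples, which is the pattern the abstract treats as a sign of likely contamination.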
Sponsors: This work was funded by the National Institutes of Health through the NIH Director's New Innovator Award Program (1-DP2-OD007372) and an NIH Director's Transformative Research Award (1-R01-CA206188).
Keywords: Acute myeloid leukemia
Identifier to cite or link to this item: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85015972930&doi=10.1186%2fs40168-016-0224-8&partnerID=40&md5=b2c1c0c74c413abe2b8ab38900502f32; http://hdl.handle.net/10713/9894
- Bacteria-human somatic cell lateral gene transfer is enriched in cancer samples.
- Authors: Riley DR, Sieber KB, Robinson KM, White JR, Ganesan A, Nourbakhsh S, Dunning Hotopp JC
- Issue date: 2013
- Molecular characterization of serous ovarian carcinoma using a multigene next generation sequencing cancer panel approach.
- Authors: Ab Mutalib NS, Syafruddin SE, Md Zain RR, Mohd Dali AZ, Mohd Yunos RI, Saidin S, Jamal R, Mokhtar NM
- Issue date: 2014 Nov 17
- Involvement of DPP9 in gene fusions in serous ovarian carcinoma.
- Authors: Smebye ML, Agostini A, Johannessen B, Thorsen J, Davidson B, Tropé CG, Heim S, Skotheim RI, Micci F
- Issue date: 2017 Sep 11
- Possible Human Papillomavirus 38 Contamination of Endometrial Cancer RNA Sequencing Samples in The Cancer Genome Atlas Database.
- Authors: Kazemian M, Ren M, Lin JX, Liao W, Spolski R, Leonard WJ
- Issue date: 2015 Sep
- STAT3 polymorphisms may predict an unfavorable response to first-line platinum-based therapy for women with advanced serous epithelial ovarian cancer.
- Authors: Permuth-Wey J, Fulp WJ, Reid BM, Chen Z, Georgeades C, Cheng JQ, Magliocco A, Chen DT, Lancaster JM
- Issue date: 2016 Feb 1