FADU: A quantification tool for prokaryotic transcriptomic analyses
Dunning, Hotopp, J.C.
PublisherAmerican Society for Microbiology
MetadataShow full item record
AbstractQuantification tools for RNA sequencing (RNA-Seq) analyses are often designed and tested using human transcriptomics data sets, in which full-length transcript sequences are well annotated. For prokaryotic transcriptomics experiments, full-length transcript sequences are seldom known, and coding sequences must instead be used for quantification steps in RNA-Seq analyses. However, operons confound accurate quantification of coding sequences since a single transcript does not necessarily equate to a single gene. Here, we introduce FADU (Feature Aggregate Depth Utility), a quantification tool designed specifically for prokaryotic RNA-Seq analyses. FADU assigns partial count values proportional to the length of the fragment overlapping the target feature. To assess the ability of FADU to quantify genes in prokaryotic transcriptomics analyses, we compared its performance to those of eXpress, featureCounts, HTSeq, kallisto, and Salmon across three paired-end read data sets of (i) Ehrlichia chaffeensis, (ii) Escherichia coli, and (iii) the Wolbachia endosymbiont wBm. Across each of the three data sets, we find that FADU can more accurately quantify operonic genes by deriving proportional counts for multigene fragments within operons. FADU is available at https://github.com/IGS/FADU. IMPORTANCE Most currently available quantification tools for transcriptomics analyses have been designed for human data sets, in which full-length transcript sequences, including the untranslated regions, are well annotated. In most prokaryotic systems, full-length transcript sequences have yet to be characterized, leading to prokaryotic transcriptomics analyses being performed based on only the coding sequences. In contrast to eukaryotes, prokaryotes contain polycistronic transcripts, and when genes are quantified based on coding sequences instead of transcript sequences, this leads to an increased abundance of improperly assigned ambiguous multigene fragments, specifically those mapping to multiple genes in operons. Here, we describe FADU, a quantification tool for prokaryotic RNA-Seq analyses designed to assign proportional counts with the purpose of better quantifying operonic genes while minimizing the pitfalls associated with improperly assigning fragment counts from ambiguous transcripts. Copyright Copyright 2021 Chung et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license.
SponsorsThis project was funded by federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under grant number U19AI110820.
Identifier to cite or link to this itemhttp://hdl.handle.net/10713/15259