Results and Discussion
Results
A Substantial Subset of Accessory Genes in E. coli Can Be Predicted Accurately.
The Random Forest approach is stochastic, so we repeated the analysis 100 times, each time splitting the data into training and test sets differently.
Out of the 12,840 accessory genes with unique PAPs analysed, 5,020 were never classified as predictable, 939 were always classified as predictable, 4,395 were classified as predictable in only some analyses (SI Appendix, Fig. S6), and the remaining 2,486 had a D score < 0.
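The repeat-and-tally procedure described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: `is_predictable` is a hypothetical callback standing in for "fit a Random Forest on the training genomes and decide whether the gene is predicted well on the test genomes", and the 25% test fraction and the seed are assumptions. Genes with a D score < 0 form a separate category in the text and are not modelled here.

```python
import random

def repeated_split_tally(genes, genomes, is_predictable,
                         n_repeats=100, test_fraction=0.25, seed=0):
    """Count, per gene, how many random train/test splits call it predictable."""
    rng = random.Random(seed)
    hits = {g: 0 for g in genes}
    for _ in range(n_repeats):
        shuffled = list(genomes)
        rng.shuffle(shuffled)                      # a different split each repeat
        cut = int(len(shuffled) * (1 - test_fraction))
        train, test = shuffled[:cut], shuffled[cut:]
        for g in genes:
            # Hypothetical callback: stands in for model fitting and scoring.
            if is_predictable(g, train, test):
                hits[g] += 1
    return hits

def categorise(hits, n_repeats):
    """Label each gene as 'always', 'sometimes' or 'never' predictable."""
    labels = {}
    for gene, k in hits.items():
        if k == n_repeats:
            labels[gene] = "always"
        elif k == 0:
            labels[gene] = "never"
        else:
            labels[gene] = "sometimes"
    return labels
```

Tallying over repeats rather than trusting a single split is what separates the 939 "always" genes from the 4,395 that are predictable only in some analyses.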
The pangenome as an Ecosystem.
We investigated three signature relationships within the E. coli pangenome:
- Putative mutualisms
Here, the joint presence of a pair of genes in genomes is significantly higher than expected from their overall frequency in the dataset.
Putative mutualistic (coincident) relationships are by far the most frequent.
- Commensalisms
There were 2,073 instances of commensal relationships out of 33,138.
In these relationships:
- one gene, the less abundant of the pair, generally depends on the other.
- the reverse dependency is much weaker or nonexistent.
- Competition
This is the smallest category. Here, the presence of one gene makes a genome much less hospitable to another.
Twenty connected components in our graph consist of genes that show both competition and coincident relationships: two coincident gene sets have a reduced probability of being in the same genome at the same time (Fig. 2).
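The three relationship types can be illustrated with simple counts. The sketch below is an assumption-laden example, not the method used in the study: it compares the observed number of genomes carrying both genes with the number expected from their overall frequencies, uses a hypergeometric upper tail as one possible significance test (the text does not say which test was applied), and reports the two conditional frequencies whose asymmetry characterises a commensal-like pair. All function names are hypothetical.

```python
from math import comb

def expected_joint(n_genomes, n_a, n_b):
    """Expected number of genomes with both genes if they were independent."""
    return n_a * n_b / n_genomes

def hypergeom_upper_tail(n_genomes, n_a, n_b, n_both):
    """P(X >= n_both) when gene B's n_b genomes are drawn at random
    from n_genomes, of which n_a carry gene A (assumed null model)."""
    total = comb(n_genomes, n_b)
    upper = min(n_a, n_b)
    favourable = sum(comb(n_a, k) * comb(n_genomes - n_a, n_b - k)
                     for k in range(n_both, upper + 1))
    return favourable / total

def conditional_presence(n_a, n_b, n_both):
    """Return (P(A present | B present), P(B present | A present))."""
    return n_both / n_b, n_both / n_a

# Toy numbers: 100 genomes, gene A in 20, gene B in 50, both in 20.
print(expected_joint(100, 20, 50))        # expected co-occurrence under independence
print(conditional_presence(20, 50, 20))   # A always co-occurs with B, but not vice versa
```

In this toy case the rarer gene A is never found without B, while B is often found without A, which is the asymmetric pattern described for commensalism. A symmetric excess of co-occurrence would resemble a putative mutualism, and a deficit (a small lower tail instead of upper tail) would resemble competition.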
In Fig. 2, we outline a set of PAPs, each representing one or more gene families, that predict the presence or absence of at least one other gene family. None of the genes in the cases outlined is plasmid-borne.
In addition to being good examples of putative mutualism, commensalism, and competition, the genes that manifest these PAPs are also of translational importance.
For example, PAP E is the “pac” gene.
During a cell’s response to penicillin, the Pac protein catalyses the hydrolysis of penicillin to form 6-aminopenicillanate, a compound that is also important in the manufacture of synthetic penicillins (60, 61).
The presence or absence of the encoding gene is predicted accurately in E. coli genomes using our Random Forest approach.
Using parsimony reconstruction, we estimate that there have been at least 72 changes from present to absent or absent to present for this gene family across the phylogeny.
We also outline the predictive relationships between four other PAPs (Fig. 2 J–M).
This pattern suggests that the relationships between these genes have evolved independently multiple times throughout evolutionary history.
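The parsimony estimate of presence/absence changes quoted above for the pac gene family can be illustrated with Fitch's small-parsimony algorithm. This is a sketch under assumptions: the text does not state which parsimony method or software was used, the tree is assumed to be rooted and strictly binary, and the tree and leaf names below are invented for the example.

```python
def fitch_changes(tree, states):
    """Minimum number of state changes on a rooted binary tree (Fitch).

    tree   : a leaf name (str) or a tuple (left_subtree, right_subtree)
    states : dict mapping leaf name -> 0 (gene absent) or 1 (gene present)
    """
    def visit(node):
        if isinstance(node, str):                 # leaf: observed state, no cost
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:                                # children agree: no extra change
            return common, left_cost + right_cost
        # children disagree: one change is needed somewhere below this node
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]

# Toy example with six hypothetical genomes.
toy_tree = ((("g1", "g2"), ("g3", "g4")), ("g5", "g6"))
toy_states = {"g1": 1, "g2": 1, "g3": 0, "g4": 0, "g5": 1, "g6": 0}
print(fitch_changes(toy_tree, toy_states))  # 2
```

The count is a lower bound on the number of gains and losses, which is why the text says "at least". The recursive form is adequate for small trees; a phylogeny with thousands of genomes would call for an iterative traversal.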
Discussion
A mechanistic explanation for prokaryotic pangenome origin and evolution is emerging (64, 65).
Is gene content evolution predictable? Is within-species evolution constrained by intragenomic forces?
This leaves much of the accessory genome as nonpredictable.
But we have only used a single-species pangenome. Will the inclusion of additional species and datasets give additional predictive power?
The pattern of repetition and predictability that we observe across more than 2,044 gene families is compatible with a model of deterministic evolution and more difficult to reconcile with an evolutionary process dominated by contingent events.
Ecosystems are dynamic, yet resilient and resistant to overall change (75).
The E. coli pangenome shows the same properties.
We see a very dynamic system of gene gains and losses; we see repetitive gains of the same cohorts of genes, and we see the establishment of sets of relationships that are persistent through time and across the phylogeny.
Due to the diversity of the E. coli pangenome, each time a gene is recruited to a new genome, it finds itself in a different, and sometimes substantially different, genetic background.
Nonetheless, we observe repeated, predictable patterns of evolution following a gene’s transfer.
It is likely that rewinding the tape back to the start of E. coli evolution would still result in predictable events that are not contingent on those highly unlikely events unique to each replaying of the tape.
It is doubtful that the exact same evolutionary trajectories would play out, but several motifs would emerge over time.