Core and accessory genomes


A living organism, whether unicellular, multicellular or viral, carries deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) molecules within its cells or its capsid. These molecules are the medium of genetic information. They contain two main types of regions: genes, which encode one or several proteins, and intergenic regions, which play a major role in the mechanisms of regulation and evolution. Taken together, this information constitutes the genome of an organism.

The genome is therefore a set of words of varying sizes (from a few letters to several billion). It is written with 4 different letters: A, C, T and G, which stand for the bases adenine, cytosine, thymine and guanine (in RNA, uracil, U, replaces thymine). By convention, we use the term sequence for word and the term base (or base pair, for double-stranded molecules) for letter.
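To make the word-and-letter picture concrete, here is a minimal Python sketch that treats a genome as a set of named sequences over the DNA alphabet and counts the bases in each. The sequences, the names `chromosome` and `plasmid`, and the helper `base_composition` are all made up for illustration. Real sequence files often contain extra symbols (such as N for an unknown base), which this sketch deliberately rejects.

```python
from collections import Counter

# The four letters of the DNA alphabet described above.
DNA_ALPHABET = frozenset("ACGT")


def base_composition(sequence: str) -> Counter:
    """Count each base in a DNA sequence, rejecting any other symbol."""
    seq = sequence.upper()
    invalid = set(seq) - DNA_ALPHABET
    if invalid:
        raise ValueError(f"unexpected symbols: {sorted(invalid)}")
    return Counter(seq)


# Hypothetical toy genome: one "word" per molecule.
genome = {
    "chromosome": "ATGCGTACGTTAG",
    "plasmid": "GGCATC",
}

for name, seq in genome.items():
    # Print the molecule name, its length in bases, and its composition.
    print(name, len(seq), dict(base_composition(seq)))
```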

Phylogenetic analysis of genomes belonging to organisms of the same species has identified regions that are conserved across all of these organisms, and variable regions that are characteristic of a population or even a single strain. The conserved regions (which are not necessarily contiguous) form the core genome, while the variable regions form the accessory genome. The core genome is mainly located on chromosomes. The accessory genome is found in structures such as plasmids, integrated viruses or composite transposons.
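The distinction between core and accessory genome can be sketched with simple set operations, assuming each strain has already been reduced to the set of gene families it carries. This is a strong simplification: real analyses must first decide which genes are "the same" across strains. The strain names, gene labels and the function `split_pangenome` below are hypothetical.

```python
from typing import Dict, Set, Tuple


def split_pangenome(gene_sets: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (core, accessory) gene families for a group of strains.

    core      = families present in every strain (intersection)
    accessory = families present in at least one strain but not all
    """
    if not gene_sets:
        return set(), set()
    all_sets = list(gene_sets.values())
    core = set.intersection(*all_sets)
    pan = set.union(*all_sets)
    return core, pan - core


# Hypothetical strains described by the gene families they carry.
strains = {
    "strain_A": {"gyrB", "rpoB", "recA", "toxin1"},
    "strain_B": {"gyrB", "rpoB", "recA", "plasmid_rep"},
    "strain_C": {"gyrB", "rpoB", "recA"},
}

core, accessory = split_pangenome(strains)
print(sorted(core))       # ['gyrB', 'recA', 'rpoB']
print(sorted(accessory))  # ['plasmid_rep', 'toxin1']
```

With this strict definition, adding one more strain can only shrink the core or leave it unchanged, which is why the core genome depends on the set of organisms being compared.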

It is therefore expected that the core genome of one species differs from that of another. Nevertheless, large-scale analysis of available genomes has also shown inconsistencies between taxonomic groups and this genetic information. For example, organisms of the genera Escherichia and Shigella have the same core genome. The difference between these two groups lies only in the accessory genome, with the presence of virulence traits in Shigella. The same is observed among the species that make up the Bacillus cereus group. Bacillus anthracis, the etiological agent of anthrax, is distinguished by the presence of two virulence plasmids that are essential for pathogenicity. Some strains of Bacillus cereus can also acquire one of the two plasmids and cause symptoms reminiscent of anthrax, to a moderate degree. Such cases have been observed in monkeys, and the bacteria were wrongly assigned to a new species: Bacillus pseudoanthracis. It is better to consider all organisms of the Bacillus cereus group as belonging to a single species, itself composed of several populations: anthracis, cereus, thuringiensis, etc.

These examples raise the recurring problem in biology of how to define a species. Defining one taxon relative to another by the absence of a character is a widespread and disputed error. For decades, bacteria were defined as the set of organisms lacking a cell nucleus (as opposed to eukaryotes). Genetic analyses (performed on the 16S and 18S ribosomal RNA genes) show that there are actually two monophyletic groups (each descending from its own common ancestor) within the prokaryotes. Today, three domains are recognized: Eukaryotes, Bacteria and Archaea, each with a set of specific characters (nucleus, wall, cholesterol, etc.). Among multicellular organisms, the case of reptiles is also famous. They were grouped together because they are neither mammals nor birds (lack of hair, feathers, homeothermy). However, turtles and crocodiles belong to distinct groups, and birds are the descendants of dinosaurs.

In the microbial fermentation industry, knowledge of the accessory genome of strains of interest is a real asset for mastering and improving the process. It makes it possible to promote strains that are able to produce certain organoleptic molecules, or to study how they differ from competitors' strains in order to establish a successful segmentation strategy.