Definition

Genomics is a molecular biology method that consists of deciphering all the genetic information contained in a living being. To do this, DNA is extracted from cells or viruses, fragmented and sequenced. Fragmentation is a necessary step because current sequencing technology cannot read a complete DNA molecule, which can range from a few thousand to several billion bases. Most sequencers produce reads of about 500 bp, although some technologies can handle fragments of 10,000 bp. As a result of these operations, the biologist is faced with a gigantic puzzle that must be solved to obtain the complete genome of the organism studied. This step is called assembly. It is an automated step that can either rely on a model (for example, a complete genome belonging to the same genus) or perform a blind reconstruction. The first case is called assembly by mapping, the second de novo assembly. A genome cannot be reconstructed from a single molecule of DNA: between 25 and 75 molecules are needed to guarantee a correct result, so that each position of the genome is read that many times on average. This is the coverage of the analysis. Today, 45,168 complete genomes are available and many projects are under way to rapidly increase our knowledge.
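The notion of coverage can be made concrete with a little arithmetic. The sketch below is only an illustration, assuming an idealized sequencer in which every read has the same length and reads are spread evenly over the genome. The function names and the 5 Mb genome size are assumptions chosen for the example, not values tied to any particular instrument or tool.

```python
import math


def reads_needed(genome_size_bp, coverage, read_length_bp):
    """Number of reads required so that each base is read `coverage` times on average."""
    return math.ceil(genome_size_bp * coverage / read_length_bp)


def mean_coverage(n_reads, read_length_bp, genome_size_bp):
    """Average number of times each base is read (total bases sequenced / genome size)."""
    return n_reads * read_length_bp / genome_size_bp


# Hypothetical example: a 5 Mb genome sequenced with 500 bp reads,
# at the lower and upper coverage bounds mentioned in the text.
for cov in (25, 75):
    n = reads_needed(5_000_000, cov, 500)
    print(f"{cov}x coverage: {n} reads of 500 bp")
```

Real reads vary in length and do not land uniformly, so actual experiments need a margin above this idealized estimate.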

Metagenomics is a method that consists of deciphering all the genomes contained in an environment. This approach is the holy grail of the biologist: it makes it possible to explore the microbial and viral universe of a sample in the most comprehensive way. Isolating and characterizing an organism by conventional techniques is both laborious and difficult (even impossible for so-called non-culturable organisms or obligate parasites). In addition, viral biodiversity is still largely underestimated. Knowing that more than a hundred viruses exist for the human species alone and that the number of cellular species on Earth is estimated at 8.7 million, viral biodiversity could reach 800 million species. Ecology, in the scientific sense of the term, still has a bright future ahead.

Current technological limits

To illustrate our point, consider the study of the human intestinal microbiome. This subject is currently booming because metagenetic approaches have revealed significant variation in the microbial flora between healthy and sick individuals (obesity, Crohn's disease, etc.). A fecal sample from the large intestine contains an average of one billion bacteria per gram. A metagenomic experiment usually uses 1 g for DNA extraction. There are therefore at least 1 billion individuals distributed among approximately 500 species. Counting 5 Mb per genome and 50x coverage, the theoretical sequencing depth required is 125 Gb (a sequencing run can reach 400 Gb). However, microbial species are not present in equal concentrations, so the majority species will be over-sequenced and the rare species risk being missed. If 30x coverage (0.15 Gb) is applied to a species that makes up 1% of the sample, the coverage of a species that makes up 60% will be on the order of 1800x (~9 Gb). Applying the same calculation to reach 30x for the rarest species of the control experiment in the following publication, a single run would have to produce 187,250 Gb of data. No machine is currently capable of meeting this demand.
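The figures in this paragraph can be reproduced with a few lines of arithmetic. The sketch below assumes, as the text implicitly does, that every species has a 5 Mb genome and that a species' share of the cells equals its share of the sequenced DNA. The function names are hypothetical and chosen only for this illustration.

```python
GENOME_MB = 5  # average genome size used in the text, in megabases


def uniform_depth_gb(n_species, coverage, genome_mb=GENOME_MB):
    """Data (Gb) needed if every species were equally abundant."""
    return n_species * genome_mb * coverage / 1000


def genome_data_gb(coverage, genome_mb=GENOME_MB):
    """Data (Gb) that falls on a single genome sequenced at `coverage`."""
    return genome_mb * coverage / 1000


def total_depth_gb(target_coverage, target_fraction, genome_mb=GENOME_MB):
    """Total data (Gb) for a species at `target_fraction` of the DNA to reach the target coverage."""
    return genome_data_gb(target_coverage, genome_mb) / target_fraction


def coverage_of(other_fraction, target_coverage, target_fraction):
    """Coverage reached in the same run by a species at `other_fraction` of the DNA."""
    return target_coverage * other_fraction / target_fraction


def implied_fraction(total_gb, coverage, genome_mb=GENOME_MB):
    """Share of the sample a species must have to reach `coverage` with `total_gb` of data."""
    return genome_data_gb(coverage, genome_mb) / total_gb


print(f"500 species at 50x, equal abundance: {uniform_depth_gb(500, 50):.0f} Gb")
print(f"One genome at 30x: {genome_data_gb(30):.2f} Gb")
dominant = coverage_of(0.60, 30, 0.01)
print(f"Species at 60% when the 1% species is at 30x: {dominant:.0f}x "
      f"({genome_data_gb(dominant):.0f} Gb)")
print(f"Run size for the 1% species at 30x: {total_depth_gb(30, 0.01):.0f} Gb")
print(f"Share implied by a 187,250 Gb run at 30x: {implied_fraction(187250, 30):.2e}")
```

Under these assumptions, the 187,250 Gb figure corresponds to a species that represents roughly 8 parts in 10 million of the sample, which shows how quickly the required depth grows as the target species becomes rarer.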

The other limit is bioinformatics. Assembly can be difficult when the genome contains a large number of repeated elements, and de novo assembly is always a complex step. In metagenomics, one has to deal with a mixture of genomes. The genome of a species is divided into two parts: the core genome, characteristic of the species, and the accessory genome, characteristic of a population or an individual. Within the core genome, some elements are shared across a whole family or phylum. Assigning sequences from these regions to the correct genome is not straightforward, and the risk of reconstructing a chimeric or consensus genome is high. Our genetic knowledge is also very limited, so annotating the genomes will be a laborious step.
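The assignment problem can be illustrated with a toy example. The sketch below uses two invented sequences that share a conserved region and a deliberately naive rule (a read is compatible with a genome if all of its k-mers occur in it). The sequences, names and rule are assumptions made for the illustration; real classifiers rely on alignment or statistical models.

```python
def kmers(seq, k):
    """Set of all substrings of length k in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_genomes(read, genomes, k=8):
    """Names of the genomes that contain every k-mer of the read."""
    read_kmers = kmers(read, k)
    return sorted(name for name, seq in genomes.items()
                  if read_kmers <= kmers(seq, k))


# Invented sequences: a conserved region flanked by species-specific DNA.
SHARED = "GATTACAGGCTTAACCGGTT"
genomes = {
    "species_A": "ATGCCGTAAGCTTGCA" + SHARED + "TTGACCATGGCAATCG",
    "species_B": "CCGTTAGGCATCGATA" + SHARED + "AGGCTCTAGATCCGAA",
}

read_conserved = SHARED[2:18]               # falls entirely in the shared region
read_specific = genomes["species_A"][:16]   # falls in a region unique to species_A

print("conserved read:", candidate_genomes(read_conserved, genomes))
print("specific read: ", candidate_genomes(read_specific, genomes))
```

The read from the conserved region is compatible with both genomes, so nothing in the read itself says which species it came from. An assembler that extends such a read can join the flanks of two different species, which is how chimeric reconstructions arise.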

Finally, such a bioinformatic analysis requires considerable computing power and memory capacity to obtain results in a reasonable time. The use of supercomputers such as Titan or Tianhe-2 is essential to process several samples.

The "longline" method

How, then, do scientists make progress in this field given these limits? There are two strategies. First, the complexity of the problem can be reduced. To do so, the study is carried out on simple environments made up of a few organisms, such as an extreme environment. Alternatively, it is possible to look at only part of the metagenome, such as certain biological functions of interest or the dominant, known flora. In the context of obesity, biologists tracked genes related to lipid absorption or vitamin metabolism. Nevertheless, this approach carries a risk for interpretation. Generally, more than half of the sequences belong to unknown functions. Focusing on known elements can introduce a bias that favors the validation of expected hypotheses. We see what we want to see.

For many years, ophthalmologists took for granted that the risk of myopia was strongly linked to the use of near vision (reading, television, etc.). However, the alarming increase in cases of myopia on the Asian continent led to wider studies that took more environmental factors into account. This more comprehensive approach made it possible to rule out the earlier conclusions and to identify the main environmental factor: children being kept indoors and deprived of sunlight. Increasing recess time in Singapore's schools has shown a significant impact on this condition.

Current metagenomics makes it possible to identify trends in the genomes of an environment and in the biological functions present. Essential metabolic pathways such as iron assimilation or heavy metal detoxification are very interesting to follow. But this discipline is still in its early stages, and substantial fundamental research is still needed.

Biological mapping of the future

We must remain enthusiastic and optimistic about metagenomics. Once mastered, it will make it possible to describe an environment extremely precisely. Interactions between species will be easy to characterize (assimilation of carbon, nitrogen, iron, etc.), as will evolutionary phenomena (hybridization, etc.). It will also detect DNA traces of multicellular species. In agronomy, it will allow us to better evaluate the impact of a strategy on the environment and to bring about a new green revolution. In health and hygiene, we will be able to identify and clearly track a pathological risk resulting from a variation in the microbial flora.

Metagenomics: applications and perspectives
