Gaining A Complete Picture Of The Genome

Neil Ward explains the transformative opportunities created by accessing the epigenome, and the advantages of native long-read sequencing over synthetic short-read linkages.

Genomic sequencing promises to advance our biological understanding of all life. With access to the full picture of genetic variation, scientists can rapidly progress vital research – whether treating diseases in humans, capturing biodiversity or breeding crops more resistant to drought. But to fully realise the potential of genomics, scientists and researchers need a complete and highly accurate view of all genomes.

Long-read genomic sequencing has already established itself as a ‘powerful player’ that has increased our understanding of DNA and RNA variation, structure and organisation. Long-read sequencing has enabled researchers to fill many of the blind spots unseen by short-read sequencing, which is limited to read lengths of just a few hundred bases. However, taking our understanding of genetic variation to the next level requires accessing a new layer of genetic information – the epigenome.

Epigenetic changes are modifications to DNA that regulate the expression of genes without changing the underlying sequence of bases. Methylation is a type of epigenetic change in which methyl groups are added to the DNA – this process is key to whether or not a genetic trait is expressed, and therefore to biological function in both health and disease. Until now, multiple tests have been required to evaluate methylation, but to streamline laboratory workflows and accelerate research, it’s increasingly valuable to capture both genetic and epigenetic variation automatically in a single experiment.

A New Layer Of Insight

The epigenome has largely been left unexplored due to fundamental limitations of many existing sequencing technologies. Since epigenetic modifications do not change the order of genetic code, traditional short-read sequencing methods do not pick up on them as variations. Understanding subtle patterns in this rich information will uncover important new opportunities in a broad range of applications across human, plant and animal biology.

For example, researchers at Johns Hopkins University found that by using long-read whole genome sequencing, they were able to get the most accurate and complete view of tomato and maize genomes and epigenomes, unlocking insights that are out of reach with short reads. The researchers reported that gaining these insights from a single data type, rather than from the results of multiple tests, presented an unmatched opportunity for analysis as well as streamlining their workflows.

Synthetic Long-Reads – A Puzzle That Doesn’t Fit Together

Over the past decade, attempts have been made to synthetically reconstruct longer-molecule sequences by pasting together multiple partial reads using short-read technologies. However, these ‘linked reads’ still carry the accuracy issues of their short-read origins. Since short reads, because of their length, fail to detect more than 70% of the structural variation in the human genome, it is possible that when pieced together, these short sequences paint an incorrect and incomplete picture of gene expression.

One close comparison of native long-read and synthetic linked long-read sequencing found that a tandem repeat insertion was resolved as homozygous (the same version of the sequence on both copies of the chromosome) by the native long-read method but was falsely resolved as heterozygous (a different version on each copy) by the synthetic long-read method, since the synthetic read had been pieced together without the complete sequence. Mistaking homozygous for heterozygous leads to incorrect experimental results and, in a clinical setting, can cause misdiagnosis, so it’s vital that scientists get these readings correct.

There are several reasons why native long reads have higher quality and accuracy than synthetic long reads, the first being molecular integrity. With native long-read sequencing, whole strands of DNA are extracted from cells and submitted directly to sequencing without being broken up and pieced back together. This maintains molecular integrity, so no sequences are missed, and the result is an accurate and complete reading.

Simpler sample preparation is another reason. Long-read sequencing does not require any DNA amplification or other sequence-altering molecular biology procedures. These additional steps are prone to biases, the introduction of errors and fragmentation, so eliminating them further increases the accuracy of native long reads over their synthetic counterparts.

Simpler workflows are the third reason. Simpler extraction and preparation reduce the workload for the scientist. Unlike linked short reads, native long reads do not require bioinformatics to work out which fragments originate from the same molecule, complicated assembly steps, or correction of errors introduced during sample prep. This further reduces the chance of error and accelerates research for scientists in the lab.

A New Standard Of Sequencing

Highly accurate long-read sequencing has already proved it can make transformative contributions to our understanding of biology and give us a much deeper insight into genomes than traditional short-read technology. Adding the ability to also screen the epigenome is the next step in the genomics research journey. Capturing both genetic and epigenetic variation across the full genome in a single experiment will speed up sample preparation without compromising on read lengths, accuracy, or completeness. Only with this advanced sequencing method can scientists confidently advance their genomics projects. 

Neil Ward is general manager of PacBio
