Home
Team
Connect
Impact
  • Overview
  • Society
  • Outreach
VPRE 1.0
  • Click to use VPRE
  • Concept
  • Biological Context
  • VPRE's Process
  • Results
  • Discussion
  • Future Directions
VPRE 2.0

How does VPRE work?

VPRE's workflow looks something like this:

  1. Data collection of viral sequences
  2. Bioinformatic analyses 
  3. Variational autoencoder to compress sequences into numerical variables
  4. Gaussian process regression to model evolutionary trajectory
  5. Assess validity and fitness of predicted sequences
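The five steps above can be sketched as a toy pipeline. Everything here, the sequences, the stand-in encoder, and the extrapolation, is illustrative only and not VPRE's real code:

```python
# Toy sketch of the five-step workflow above. Every function is a
# simplified stand-in, not VPRE's actual implementation.

def collect_sequences():
    # 1. Data collection: a made-up set of spike gene fragments
    return ["ATGGCA", "ATGGCC", "ATGGTA"]

def bioinformatic_clean(seqs):
    # 2. Bioinformatic analyses: here, just drop incomplete sequences
    return [s for s in seqs if len(s) == 6]

def encode(seq):
    # 3. Stand-in for the VAE: compress bases into numerical variables
    return [float("ACGT".index(b)) for b in seq]

def model_trajectory(points):
    # 4. Stand-in for Gaussian process regression: extrapolate the next
    #    latent point as the last point plus the average step between points
    steps = [[b - a for a, b in zip(p, q)] for p, q in zip(points, points[1:])]
    mean_step = [sum(d) / len(steps) for d in zip(*steps)]
    return [x + dx for x, dx in zip(points[-1], mean_step)]

def decode(point):
    # 5. Decode the predicted latent point back into a sequence,
    #    which would then be assessed for validity and fitness
    return "".join("ACGT"[min(3, max(0, round(x)))] for x in point)

seqs = bioinformatic_clean(collect_sequences())
latent = [encode(s) for s in seqs]
prediction = decode(model_trajectory(latent))
```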


Let's delve into each step.

What is machine learning and deep learning?

Machine learning is a branch of computing in which computers learn from experience and data rather than from explicit human instruction.

In other words, machine learning algorithms are able to improve automatically through experience.
Machine learning can adapt over time as well. For example, a phone's text auto-completion adjusts its suggestions to its owner's writing style.


Deep learning is a subset of machine learning.

Deep learning uses artificial neural networks - algorithms inspired by the workings of the human brain - to learn from large amounts of data.    

We use deep learning in our daily lives. Functions like sentence autocomplete in Gmail, Google Translate, self-driving cars, image recognition, and many others all incorporate deep learning.

Okay, how about Variational AutoEncoders (VAEs)?

VAEs are deep learning models capable of generating meaningful latent spaces for image and text data.

 A latent space is a lower-dimensional representation of compressed data in which similar data points are closer together in space. 
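As a toy illustration of a latent space (not VPRE's actual encoder), here each sequence is compressed to a single number, and similar sequences land close together on that one-dimensional axis:

```python
# Toy latent space: compress each DNA sequence to one number, the mean
# of its base indices. Similar sequences map to nearby latent values.
def to_latent(seq):
    return sum("ACGT".index(b) for b in seq) / len(seq)

a = to_latent("ATGGCA")  # reference sequence
b = to_latent("ATGGCC")  # differs from the reference by one base
c = to_latent("TTTTTT")  # very different sequence
```

Because `b` differs from `a` by a single base, it sits closer to `a` in this latent space than `c` does.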

We can train neural networks to learn from their mistakes.

Just like our own brains, neural networks get better and better at a task when you train them with data to practice on.

What is training?

Training is similar to the way we learn new knowledge by doing practice exams. After we do a practice question, we check our answer against the correct answer, and reflect on what we did right or wrong, thus improving our understanding of a piece of knowledge.

In essence, the training process is for fine-tuning equations in our deep learning model to fit the data in the training set.

The machine learning model is given a set of “practice questions” called a “training dataset”. For each item in the training dataset, the model runs through a set of mathematical equations and produces an output. It then compares the output with the “correct answer”, adjusts the equations it went through, and checks whether the adjustment improves the output. The performance of the model is therefore highly dependent on the quality of the training set, just as we need good practice exams to test our knowledge.

After training, we need to validate the model by feeding it an input that it hasn’t seen before, and see if it can produce a correct output. Pretty much like a “final exam” for the model.
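The practice-and-reflect loop described above can be sketched in miniature. This toy model has a single adjustable number instead of a neural network's many, and the "training" simply keeps any small adjustment that improves the answers:

```python
# Toy training loop: fit a single weight w so that output = w * x matches
# the answers in the training set, keeping any small nudge that helps.
train_set = [(1, 2), (2, 4), (3, 6)]   # (question, correct answer) pairs

def loss(w, data):
    # how wrong the model's outputs are across the dataset
    return sum((w * x - y) ** 2 for x, y in data)

w = 0.0
for _ in range(100):                   # repeated rounds of practice
    for step in (+0.1, -0.1):          # try a small adjustment each way
        if loss(w + step, train_set) < loss(w, train_set):
            w += step                  # keep the adjustment if it improves things

# Validation ("final exam"): an input the model never practiced on
exam_x, exam_answer = 5, 10
```

Real deep learning models adjust millions of numbers at once using calculus rather than trial nudges, but the learn-check-adjust cycle is the same.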

   

Implementing evolution into deep learning

How do we train VPRE?

Our training data consists of thousands of existing spike protein gene sequences.

Using deep learning algorithms, we ask our computers to imitate the sequences in the training set and simulate other possible spike protein gene sequences.

Complementing deep learning with statistics

How can we be confident in our predictions?

We are implementing a statistical model to corroborate the results of our deep learning simulations.

Statistical models are often used for modeling evolution, and provide us with a tried and true method of generating likely mutations in viral spike proteins. These predictions are accompanied by the likelihood of their occurrence, and will be used to complement results from our deep learning algorithm.  

What is a Markov Model?

A Markov model is a common statistical model that extracts the probabilities of events from past data and uses them to calculate the likelihood of future events.

For VPRE, the events we want to model are mutations in the spike protein. Using the same training data we used for deep learning, we count all the different kinds of mutations that have occurred in the past at each position of the genetic sequence. For example, we might have fifty A to C mutations in the training data at position 500 in the spike protein. We use these counts to calculate the probability of each type of mutation occurring at each position in the spike protein. 
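The counting step described above can be sketched as follows. The reference and observed sequences here are made up for illustration; the real data consists of full-length spike gene sequences:

```python
from collections import Counter

# Toy mutation-probability table: count observed substitutions at each
# position relative to a reference, then normalize counts to probabilities.
reference = "ATGGCA"
observed  = ["ATGGCC", "ATGGCC", "ATGGTA", "ACGGCA"]

counts = {}  # position -> Counter mapping (ref_base, new_base) to a count
for seq in observed:
    for pos, (ref, new) in enumerate(zip(reference, seq)):
        if ref != new:
            counts.setdefault(pos, Counter())[(ref, new)] += 1

total = len(observed)
probs = {pos: {mut: n / total for mut, n in c.items()} for pos, c in counts.items()}
```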

How does a Markov Model generate predictions?

Using the probabilities of mutation from past data, the model calculates the most likely mutations to occur.

The most likely mutations are applied to current strains to generate the most likely next strains and their associated probabilities of occurring.  These statistical predictions will be used to inform the deep learning model. 
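As a concrete sketch, with a made-up probability table rather than real counts, applying the single most probable mutation to a current strain might look like this:

```python
# Toy next-strain generation: apply the most probable mutation from a
# made-up probability table to the current strain.
probs = {
    5: {("A", "C"): 0.50},   # position -> {(from_base, to_base): probability}
    4: {("C", "T"): 0.25},
    1: {("T", "C"): 0.25},
}

current = "ATGGCA"

# Find the (position, mutation) pair with the highest probability
best_pos, (best_mut, p) = max(
    ((pos, item) for pos, table in probs.items() for item in table.items()),
    key=lambda x: x[1][1],
)

# Apply the mutation to the current strain
bases = list(current)
assert bases[best_pos] == best_mut[0]   # sanity check: the base matches
bases[best_pos] = best_mut[1]
next_strain = "".join(bases)
```

The predicted strain would carry probability `p` of occurring, which is the figure used to inform the deep learning model.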


UBC Virosight
