Results

Alignment of a Predicted Sequence

While training the model, we varied several parameters, such as training time. Our best prediction was over 99% similar to a known SARS-CoV-2 spike protein, so we could confidently map the predicted sequence back to a known spike protein.
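To make the similarity figure concrete, here is a minimal sketch of one common way to score similarity: percent identity over an existing pairwise alignment. It is an illustration only. The function name `percent_identity` and the gap convention (`-`) are assumptions, and the actual alignment tool and scoring behind the 99% figure are not specified here.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same residue.

    Assumes the inputs are already aligned: equal length, with '-' marking gaps.
    Columns where both sequences have a gap are ignored; a gap in only one
    sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    # Keep every column except those that are gaps in both sequences.
    columns = [
        (a, b)
        for a, b in zip(aligned_a, aligned_b)
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0

    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)
```

Other definitions of similarity (for example, ones that exclude gapped columns from the denominator) give slightly different numbers, so the choice of definition should be stated alongside any reported value.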

Shown on the right is a dot plot that compares our prediction with a known spike protein. Both axes represent nucleotide positions. A line indicates that the corresponding region on the x-axis matches the region on the y-axis. Our prediction contains three fragments that map to the known spike protein. The first fragment matches perfectly at the expected position.

The second fragment matches slightly further along than where we would expect to find it on an actual spike protein, which produces a gap between fragments 1 and 2.

The third fragment matches an earlier part of the actual spike protein, so our prediction is missing the last portion of a spike protein.
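The way the dot plot is read above can be sketched in a few lines of Python. This is a simplified illustration, not the tool used to produce our figure: the function names `dot_plot_matches` and `diagonal_offsets`, the word size of 4, and the choice of which sequence goes on which axis are all assumptions. Each match is a dot; dots that share the same offset (y minus x) form one diagonal line. With the prediction on the x-axis, an offset of 0 is a fragment at the expected position, a positive offset is a fragment that matches further along the reference, and a negative offset is a fragment that matches an earlier part of it.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def dot_plot_matches(query: str, reference: str, word: int = 4) -> list[tuple[int, int]]:
    """Return (x, y) pairs where query[x:x+word] equals reference[y:y+word].

    Each pair corresponds to one dot on the plot (x: query position,
    y: reference position).
    """
    # Index every word of the reference by its start positions.
    index: dict[str, list[int]] = defaultdict(list)
    for y in range(len(reference) - word + 1):
        index[reference[y:y + word]].append(y)

    # Look up every word of the query in that index.
    matches = []
    for x in range(len(query) - word + 1):
        for y in index.get(query[x:x + word], []):
            matches.append((x, y))
    return matches


def diagonal_offsets(matches: list[tuple[int, int]]) -> Counter:
    """Count dots per diagonal, where a diagonal is identified by y - x."""
    return Counter(y - x for x, y in matches)


# Toy example (made-up sequences): the query is the reference with two
# bases removed, so the plot shows two diagonals with different offsets.
reference = "ACGTTGCAAGGCTCAT"
query = reference[:6] + reference[8:]
offsets = diagonal_offsets(dot_plot_matches(query, reference))
```

Inspecting `offsets` for a real prediction would show one dominant offset per fragment, which is a quick numerical check on what the plot shows visually.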

We also recognize that our training dataset may not be large enough. As a result, the model's predictions may follow the pattern of an actual spike protein too closely, to the point of being nearly identical to it (that is, 'overfitted').

Visualizing Our Data & Results

Our Training Data

Alignment of a Predicted Sequence

Composition of a Predicted Sequence

Protein Docking Simulations

Coming soon!