This repository contains notes on using ML tools for protein structure, engineering and design, property prediction, and related topics. It is not intended as introductory material. The notes are not comprehensive and may contain errors. If you find an error, or would like to contribute something you feel is missing, please contact me on [GitHub](https://www.github.com/delalamo) or on [LinkedIn](https://www.linkedin.com/in/ddelalamo/).
The contents of this site are licensed under the [GNU Free Documentation License](https://www.gnu.org/licenses/fdl-1.3.html). Please reach out if you are interested in copying or repurposing pages from this repository.
[[Reading list|Link to reading list]]
### Random notes of interest
* [[Protein property prediction using PLMs does not benefit from scale except when predicting inferring features of either structural or sparsely populated sequence families]]
* [[Conformational entropy in antibodies decreases during affinity maturation]]
* [[Protein structure prediction methods are unable to predict the energetics of a conformational landscape unless explicitly trained for that purpose]]
* [[Protein backbones designed using diffusion, but not sequence-based models, have fewer beta sheets]]
* [[Structure-based methods outperform sequence-based methods on protein stability prediction of point mutants, but not full sequences]]
* [[Alternate conformations can be sampled with MSA-based structure prediction methods using custom PDB databases and subsampled MSAs]]
### Recently added or modified
* [[NGS sequence abundance does not correlate with binding affinity]]
* [[A greater proportion of viral sequences are closely related to one another than prokaryotic and eukaryotic sequences, and these sequences get removed during clustering]]
* [[Protein property prediction using PLMs does not benefit from scale except when predicting inferring features of either structural or sparsely populated sequence families]]
* [[Viral protein MSAs fail to recapitulate contacts from evolutionary coupling analyses]]
* [[Number of sequences with 90% homology is a greater predictor of viral variant effect prediction than alignment depth]]
* [[Logistic regression outperforms fine-tuned LMs on finding point mutations from NGS data]]
* [[Precision decreases and recall increases as variant effect prediction focuses on top-performing point mutations]]
* [[Language models can be infused with structure via low-rank adapter layers]]
* [[Adding structural adaptors to language models leads to improvements in thermostability prediction compared to structure-based NNs alone]]
* [[Synthetic MSAs outperform single-sequence inference in MSA-based structure prediction]]
* [[Sequences with lower log-likelihoods are worse for zero-shot variant effect prediction using PLMs]]
* [[Variant effect prediction with MSA-based PLMs improves with ensembling of multiple prompts]]
* [[The confidence metrics of AlphaFold2 are better calibrated than those of AlphaFold3]]
* [[Diffusion-based protein structure prediction methods double as energy methods comparable to traditional force fields]]
* [[Affinity maturation also selects for lower self-association]]
* [[Broadly neutralizing antibodies are more evolutionarily distant from germline sequences than less polyspecific antibodies]]
* [[Affinity-specificity tradeoff in antibodies is partially mediated by varying the magnitude of rigidification]]
### Antibody notes
* Structure
  * [[Complementarity-determining regions|CDRs]]
  * [[Framework region]]
* Formats
  * [[Antibodies]]
  * [[Fab|Fabs]]
  * [[Single chain variable fragments|Single chain Fvs]]
  * [[Nanobodies]] (AKA VHHs)
* Developability
  * [[Developability]]
  * [[Antibody glycosylation|Glycosylation]]
  * [[Antibody humanization|Humanization]]
* Property prediction
  * [[Antibody structure prediction]]
  * [[Antibody language models]]
  * [[Antibody-antigen binding affinity prediction]]
* Miscellaneous
  * [[Affinity maturation]]
  * [[Somatic hypermutation]]
### Structural modeling
* [[MD simulations]]
* [[Structure prediction|Protein structure prediction]]
### Protein engineering notes
* [[Ancestral sequence reconstruction]]
* [[Directed evolution]] (related: [[Epistasis]])
* Property prediction
  * [[Fitness prediction]]
  * [[Variant effect prediction]]
  * [[Function prediction|Protein function prediction]]
  * [[Stability and thermostability|Stability and thermostability prediction]]
* Design
  * [[Inverse folding]]
  * [[Inversion of protein folding neural networks]]
  * [[Protein backbone design]]
* Miscellaneous
  * [[Heterodimerization domains]]
  * [[Engineered trimerization domains]]
### ML notes
* [[Transformer]]
* [[Low-rank Adaptation]]
* [[Protein language models]]
* [[Contrastive learning]]
* [[Diffusion models]]
* [[Energy-based models]]
### Miscellaneous
* [[Evolution and natural selection]]
* [[Protein folding]]
* [[Protein dynamics]]
* [[Protein-protein interactions]]