This repository contains various notes on using ML tools for protein structure, engineering & design, property prediction, and related topics. It is not intended to serve as introductory material. These notes are not comprehensive and may have errors. If you find any errors, or would like to contribute something you feel is missing, please contact me on [GitHub](https://www.github.com/delalamo) or on [LinkedIn](https://www.linkedin.com/in/ddelalamo/).

The contents of this site are protected by a [GNU Free Documentation License](https://www.gnu.org/licenses/fdl-1.3.html). Please reach out if you are interested in copying or repurposing pages from this repository.

[[Reading list|Link to reading list]]

### Random notes of interest

* [[Protein property prediction using PLMs does not benefit from scale except when predicting structural features]]
* [[Conformational entropy in antibodies decreases during affinity maturation]]
* [[Protein structure prediction methods are unable to predict the energetics of a conformational landscape unless explicitly trained for that purpose]]
* [[Protein backbones designed using diffusion, but not sequence-based models, have fewer beta sheets]]
* [[Structure-based methods outperform sequence-based methods on protein stability prediction of point mutants, but not full sequences]]

### Recently added or modified

* [[All-atom structure prediction of RNA is driven by memorization]]
* [[NMR ensembles are not thermodynamic ensembles]]
* [[Language-based protein folding NNs predict novelty with far lower confidence than MSA-based protein folding NNs]]
* [[Protein property prediction using PLMs does not benefit from scale except when predicting structural features]]
* [[Larger PLMs generate more novel sequences from more sparsely populated protein families]]
* [[Alternate sequence clustering schemes outperform uniform sampling when training protein language models]]
* [[On fixed compute budgets, mixture-of-experts models outperform dense models]]
* [[Structure ensemble prediction methods are unable to accurately model subtle conformational differences captured by X-ray crystallography]]
* [[X-ray density can capture alternate conformations of proteins and their ligands]]
* [[Proline content correlates with high unfolding cooperativity but not overall stability]]
* [[Unfolding cooperativity is more difficult to predict than protein stability]]
* [[Helices can be stabilized by positively-charged residues in final turn]]
* [[Compactness is positively correlated with unfolding cooperativity but negatively correlated with stability]]
* [[Protein structure is more evolutionarily conserved than dynamics]]
* [[pLDDT weakly correlates with intra-family differences in stability but not differences in cooperativity]]
* [[Buried nonpolar surface area is a major determinant of whether de novo designed proteins are stable]]
* [[Proteins with greater folding stability have more intermediate partially unfolded states]]
* [[Cooperative unfolding is not correlated with overall protein stability across proteins with the same fold]]
* [[Amide backbone hydrogen bonds lead to greater opening energy]]
* [[Mega-scale dataset is inaccurate for DNA-binding domains]]
* [[AlphaFold3 universally predicts the active state of ligand-bound GPCRs, even when the ligand is an antagonist]]
* [[Cryo-EM particle counting does not reliably provide true Boltzmann distributions of molecules]]
* [[Overtrained language models are more difficult to fine-tune]]
* [[Cytokines can be added to antibody CDRs]]
* [[QM-MM and unbiased MD are insufficient to correctly determine CDRH3 conformation]]
* [[pLDDT is inversely correlated with CDRH3 length]]
* [[Germline usage determines whether nanobody CDR3 is kinked of extended|Germline usage determines whether nanobody CDR3 is kinked or extended]]

### Antibody notes

* Structure
  * [[Complementarity-determining regions|CDRs]]
  * [[Framework region]]
* Formats
  * [[Antibodies]]
  * [[Fab|Fabs]]
  * [[Single chain variable fragments|Single chain Fvs]]
  * [[Nanobodies]] (AKA VHHs)
* Developability
  * [[Developability]]
  * [[Antibody glycosylation|Glycosylation]]
  * [[Antibody humanization|Humanization]]
* Property prediction
  * [[Antibody structure prediction]]
  * [[Antibody language models]]
  * [[Antibody-antigen binding affinity prediction]]
* Miscellaneous
  * [[Affinity maturation]]
  * [[Somatic hypermutation]]

### Structural modeling

* [[MD simulations]]
* [[Structure prediction|Protein structure prediction]]

### Protein engineering notes

* [[Ancestral sequence reconstruction|Ancestral sequence reconstruction]]
* [[Directed evolution|Directed evolution]] (related: [[Epistasis]])
* Property prediction
  * [[Fitness prediction|Fitness prediction]]
  * [[Variant effect prediction|Variant effect prediction]]
  * [[Function prediction|Protein function prediction]]
  * [[Stability and thermostability|Stability and thermostability prediction]]
* Design
  * [[Inverse folding|Inverse folding]]
  * [[Inversion of protein folding neural networks|Inversion of protein folding neural networks]]
  * [[Protein backbone design|Protein backbone design]]
* Miscellaneous
  * [[Heterodimerization domains]]
  * [[Engineered trimerization domains]]

### ML notes

* [[Transformer]]
* [[Low-rank Adaptation]]
* [[Protein language models|Protein language models]]
* [[Contrastive learning]]

### Miscellaneous

* [[Evolution and natural selection]]
* [[Protein folding]]
* [[Protein dynamics]]
* [[Protein-protein interactions]]