## Galactica

- A new large language model for automatically organizing science, developed by Meta AI and Papers with Code.
- It can be trained for multiple epochs on its corpus without overfitting; upstream and downstream performance both improve with the use of repeated tokens.
- The dataset design is critical to the approach: all of it is processed into a common markdown format to blend knowledge between sources.
- Citations are processed via a special token that allows researchers to predict a citation given any input context.
- The model's ability to predict citations improves with scale, and the model becomes better at modeling the distribution of citations.
- The model can perform multi-modal tasks involving SMILES chemical formulas and protein sequences.
- It uses a Transformer architecture in a decoder-only setup, with GeLU activation for all model sizes.
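
For readers unfamiliar with the GeLU activation mentioned above, here is a minimal sketch in plain Python. It shows the standard definition, `x * Phi(x)` with `Phi` the standard normal CDF, alongside the widely used tanh approximation. The function names `gelu` and `gelu_tanh` are illustrative only, and these notes do not say which variant Galactica uses, so treat this as a general illustration rather than the model's actual implementation.

```python
import math


def gelu(x: float) -> float:
    """Exact GeLU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Common tanh-based approximation of GeLU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


if __name__ == "__main__":
    # Compare the exact form and the approximation on a few inputs.
    for value in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(value, gelu(value), gelu_tanh(value))
```

Unlike ReLU, GeLU is smooth around zero and lets small negative inputs pass through with a reduced weight instead of cutting them off entirely.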