Weight space learning - Subhaditya's Website

# Weight Space Learning ```toc ``` - Damian Borth, st.gallen - [paper](https://arxiv.org/pdf/2406.09997) - treat weights as data points - representation learning - can we look at networks -> infer the latent factors from the weights? - is there knowledge inside models, but can be accessed when they are frozen? - [[Capsule Network#Loss|loss]] surfaces and optimization problem of NN are non-convex - nn training optimization is very high dimensional - what is the relationship between the characterstics(behavior, performance etc) and their solution in weight space - GDPR : linked model to database - Hypothesis - nn populate a structure in weight space - structure contains info on properties and generating factors of the models - [[Autoencoders with Convolutions#Encoder|Encoder]] [[Autoencoders with Convolutions#Decoder|Decoder]] [[architecture]] for weight vectors - then on to down-stream tasks - rather huge model zoo generated - variated [[Chapter 4 - Deep Neural Networks#Hyperparameters|hyperparameters]] with [[Dropout#Dropout|Dropout]] - magnitude [[Pruning]] - two types of pre-training : supervised, [[Contrastive Loss|contrastive]] - weight space is symmetric sometimes : ACG [[architecture]] - multiple versions of NN which do the same thing -> can be used to reach them from a space - [[Contrastive Loss|contrastive]] loss - linear heads are fitted on the model zoos validation split - encoder is frozen - [[Initialization#Initialization|initialization]] - random normal - glorot - [[Orthogonal Initialization#Orthogonal Initialization|orthogonal]] - he normal - truncated normal - train and test on [[MNIST]] ,[[Fashion MNIST]] , [[CIFAR]] , [[SVHN]] - hypernetworks dont really work somehow? - train a encoder decoder transformer - one forward pass destroys models - took the weight vector and calculated MSE - layers that occupy more are re-constructed better (higher mean) - also what was apparently needed for [[Stable Difusion]] - paper : [[Taming Transformers for high res image synthesis]] - perception loss + [[GAN]] - sample space - are they just sampling the train set? - some ablation done to prove it's not lol - [[Sequential auto encoding of neural embeddings - SANE]]