# Neuroscience Applications of the Free Energy Principle
## Overview
The Free Energy Principle (FEP) originated within computational neuroscience and remains most deeply developed there. It provides a unified account of brain function in which all neural processes -- perception, action, learning, attention -- serve a single imperative: the minimization of variational free energy. This document covers the major neuroscience applications, from microscale neural circuit models through mesoscale electrophysiological signatures to macroscale brain imaging methods.
The FEP's contribution to neuroscience is not merely theoretical. It has generated:
- Quantitative models of cortical microcircuitry (predictive coding)
- Novel analysis methods for neuroimaging data (Dynamic Causal Modeling)
- Specific empirical predictions about neural oscillations and prediction error signals
- Mechanistic accounts of neuromodulation in terms of precision weighting
- A principled framework connecting neural computation to behavior and psychopathology
## Theoretical Framework
### Predictive Coding in Cortical Circuits
The central process theory derived from the FEP for neural computation is **predictive coding**. The cortex is organized as a hierarchical generative model where:
1. **Top-down connections** carry predictions from higher to lower cortical areas
2. **Bottom-up connections** carry prediction errors from lower to higher areas
3. **Lateral connections** encode precision (the confidence in prediction errors)
```
Formal architecture:
Level i+1 (higher cortex):
mu_{i+1} = current beliefs about slow/abstract causes
prediction: g_{i+1}(mu_{i+1}) -> expected activity at level i
Level i (lower cortex):
mu_i = current beliefs about fast/concrete causes
prediction error: epsilon_i = mu_i - g_{i+1}(mu_{i+1})
precision-weighted error: Pi_i * epsilon_i
Update rule (with epsilon_{i-1} = mu_{i-1} - g_i(mu_i)):
d(mu_i)/dt = D(mu_i) - partial F / partial mu_i
= D(mu_i) - Pi_i * epsilon_i
+ (partial g_i / partial mu_i)^T * Pi_{i-1} * epsilon_{i-1}
```
This maps onto known cortical anatomy:
| Computational Role | Cortical Implementation | Laminar Location |
|-------------------|------------------------|------------------|
| Predictions | Deep pyramidal neurons | Layers 5/6 |
| Prediction errors | Superficial pyramidal neurons | Layers 2/3 |
| Precision weighting | Gain modulation (neuromodulators) | All layers |
| State estimates | Population firing rates | Layers 2/3, 5/6 |
| Model parameters | Synaptic connection weights | All layers |
| Forward (error) connections | Feedforward projections | Layer 2/3 -> Layer 4 |
| Backward (prediction) connections | Feedback projections | Layer 5/6 -> Layers 1, 5/6 |
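A minimal sketch of the update rule in the formal architecture above, assuming a single scalar hidden cause, a linear prediction `g(mu) = w * mu`, a fixed prior expectation standing in for the level above, and static inference (the `D(mu)` term is dropped). The names `infer_cause`, `posterior_mean`, `w`, `pi_0`, and `pi_1` are illustrative and not part of the original formulation. Under these assumptions, gradient descent on free energy should settle at the precision-weighted posterior mean:

```python
def infer_cause(o, mu_prior, w=2.0, pi_0=1.0, pi_1=1.0, dt=0.01, steps=5000):
    """Descend F = 0.5 * (pi_0 * eps_0**2 + pi_1 * eps_1**2) with respect to mu."""
    mu = mu_prior  # start inference at the prior expectation
    for _ in range(steps):
        eps_0 = o - w * mu       # sensory prediction error (data minus prediction)
        eps_1 = mu - mu_prior    # prior prediction error (belief minus top-down prediction)
        # -dF/dmu: the error from below is passed up through dg/dmu = w,
        # while the error at this level pulls the belief back toward the prior.
        mu += dt * (w * pi_0 * eps_0 - pi_1 * eps_1)
    return mu


def posterior_mean(o, mu_prior, w=2.0, pi_0=1.0, pi_1=1.0):
    """Closed-form fixed point of the descent (exact for this linear-Gaussian case)."""
    return (w * pi_0 * o + pi_1 * mu_prior) / (w * w * pi_0 + pi_1)


estimate = infer_cause(o=4.0, mu_prior=1.0)
```

Raising `pi_0` relative to `pi_1` moves the estimate toward the value favored by the data (`o / w`), which is the sense in which precision weights prediction errors.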
### The Canonical Microcircuit
Bastos et al. (2012) proposed that the cortical column implements a **canonical microcircuit** for predictive coding. This circuit includes:
```
Superficial pyramidal cells (SP):
- Encode prediction errors
- Project forward to the next cortical level
- Receive top-down predictions on apical dendrites
- Receive bottom-up input on basal dendrites
Deep pyramidal cells (DP):
- Encode predictions (expectations)
- Project backward to the previous cortical level
- Receive prediction errors from superficial cells
- Send predictions to superficial cells via interneurons
Inhibitory interneurons (II):
- Mediate precision weighting
- Parvalbumin (PV) interneurons: gain control on prediction errors
- Somatostatin (SST) interneurons: contextual modulation
- VIP interneurons: disinhibition (precision enhancement)
```
This canonical microcircuit is repeated across cortical areas, with area-specific generative models encoded in the connectivity patterns and synaptic weights.
## Dopamine as Precision
### The Precision Hypothesis
One of the most influential neuroscience applications of the FEP is the reconceptualization of **dopamine** as encoding the precision of beliefs about policies (action plans):
```
Classical view: Dopamine = reward prediction error
FEP view: Dopamine = precision over expected free energy of policies
pi(a) = sigma(-G(a) * gamma)
Where:
pi(a) = probability of selecting policy a
G(a) = expected free energy of policy a
gamma = precision parameter (encoded by dopamine)
sigma = softmax function
```
When dopamine (gamma) is high:
- Policies are selected more deterministically (high confidence in action plan)
- Behavior is more goal-directed and exploitative
- The agent "commits" to the best available policy
When dopamine (gamma) is low:
- Policy selection is more stochastic (low confidence)
- Behavior is more exploratory and random
- The agent "dithers" between competing policies
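The effect of gamma on policy selection can be illustrated with a small sketch of the softmax rule above. The expected free energy values are made up for illustration, and `policy_probabilities` and `entropy` are hypothetical helper names:

```python
import math


def policy_probabilities(G, gamma):
    """Softmax over negative expected free energy, scaled by precision gamma."""
    logits = [-gamma * g for g in G]
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def entropy(p):
    """Shannon entropy (nats) of a policy distribution; higher means more dithering."""
    return -sum(x * math.log(x) for x in p if x > 0)


G = [1.0, 2.0, 3.0]                        # illustrative expected free energies
low_gamma = policy_probabilities(G, 0.1)   # near-uniform: exploratory
high_gamma = policy_probabilities(G, 10.0)  # near-deterministic: committed
```

With low gamma the three policies receive nearly equal probability, whereas with high gamma almost all probability mass falls on the policy with the lowest expected free energy.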
### Empirical Evidence
This reframing accounts for diverse dopaminergic phenomena:
**Reward prediction errors**: Under the FEP, phasic dopamine signals reflect updates to precision over policies, not reward per se. When an unexpected reward occurs, the precision of the policy that led to that reward increases -- this looks identical to a reward prediction error but has a different computational meaning.
**Motivation and vigor**: Tonic dopamine levels set the baseline precision for policy selection. Low tonic dopamine (as in Parkinson's disease) produces:
- Bradykinesia (slow movements): Low precision -> weak commitment to motor policies
- Apathy: Low precision -> inability to select among competing goals
- Akinesia (difficulty initiating movement): Insufficient precision to "win" over default inaction
**Addiction**: Drugs of abuse artificially inflate precision over drug-seeking policies:
```
gamma_drug >> gamma_natural
-> pi(drug_seeking) -> 1 regardless of G(drug_seeking)
-> Compulsive behavior despite negative consequences
```
See [[psychiatry]] for the full psychiatric account.
## Dynamic Causal Modeling (DCM)
### Overview
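Dynamic Causal Modeling is a **Bayesian framework for inferring effective connectivity** from neuroimaging data. It is the neuroimaging analysis method most closely tied to the FEP: both rest on the same variational Bayesian machinery, with variational free energy serving as an approximation to log model evidence.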
```
Core idea:
Data = y (measured brain activity: fMRI BOLD, EEG, MEG, LFP)
Model = m (neural mass model with specific connectivity)
Parameters = theta (connection strengths, time constants)
DCM inverts a generative model:
p(theta | y, m) proportional_to p(y | theta, m) * p(theta | m)
Model comparison via free energy:
F(m) = ln p(y | m) - D_KL[q(theta) || p(theta | y, m)]
Best model: m* = argmax_m F(m) approx argmax_m ln p(y | m)
```
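The relation between free energy and log evidence can be checked numerically in a case where everything is available in closed form. The sketch below assumes a toy conjugate model (one observation `y ~ N(theta, 1)` with prior `theta ~ N(0, 1)`), which is far simpler than any DCM and is used here only to illustrate the bound; the function names are hypothetical:

```python
import math


def free_energy(y, m, s2):
    """F[q] for q(theta) = N(m, s2) under y ~ N(theta, 1), theta ~ N(0, 1)."""
    expected_log_lik = -0.5 * math.log(2 * math.pi) - 0.5 * ((y - m) ** 2 + s2)
    expected_log_prior = -0.5 * math.log(2 * math.pi) - 0.5 * (m ** 2 + s2)
    entropy_q = 0.5 * math.log(2 * math.pi * math.e * s2)
    return expected_log_lik + expected_log_prior + entropy_q


def log_evidence(y):
    """Exact ln p(y | m): marginally, y ~ N(0, 2) in this toy model."""
    return -0.5 * math.log(2 * math.pi * 2.0) - y ** 2 / 4.0


y = 1.3
f_exact = free_energy(y, m=y / 2, s2=0.5)  # q equals the true posterior N(y/2, 1/2)
f_poor = free_energy(y, m=0.0, s2=1.0)     # q equals the prior: KL term is positive
```

When `q` matches the true posterior the KL term vanishes and `F` equals the log evidence; any other `q` gives a strictly lower value, which is why maximizing `F` both fits the posterior and yields a score for model comparison.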
### DCM for fMRI
DCM for fMRI models the hemodynamic response as a nonlinear observation function applied to underlying neural dynamics:
```
Neural state equation:
dz/dt = A * z + sum_j u_j * B_j * z + C * u
Where:
z = neural states (activity in each region)
A = intrinsic (endogenous) connectivity matrix
B_j = modulatory effects of input j on connectivity
C = driving inputs
u = experimental inputs (stimuli, task conditions)
Hemodynamic observation model:
y = lambda(z) + noise
Where lambda maps neural activity through:
1. Neurovascular coupling
2. Balloon model (Buxton et al.)
3. BOLD signal equation
```
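A minimal sketch of the bilinear neural state equation, assuming two regions, constant inputs, and simple Euler integration. The hemodynamic observation model is omitted, and the matrices below are illustrative values rather than estimates from any dataset:

```python
def simulate(A, B, C, u, dt=0.01, steps=3000):
    """Euler-integrate dz/dt = A z + sum_j u_j B_j z + C u for constant inputs u."""
    n, m = len(A), len(u)
    z = [0.0] * n
    for _ in range(steps):
        # Effective connectivity: fixed coupling plus input-dependent modulation
        J = [[A[r][c] + sum(u[j] * B[j][r][c] for j in range(m)) for c in range(n)]
             for r in range(n)]
        dz = [sum(J[r][c] * z[c] for c in range(n)) + sum(C[r][j] * u[j] for j in range(m))
              for r in range(n)]
        z = [z[r] + dt * dz[r] for r in range(n)]
    return z


A = [[-1.0, 0.0], [0.5, -1.0]]   # region 1 -> region 2 coupling of 0.5, self-decay of -1
B = [[[0.0, 0.0], [0.0, 0.0]],   # input 0 (stimulus) modulates no connection
     [[0.0, 0.0], [0.5, 0.0]]]   # input 1 (context) strengthens region 1 -> region 2
C = [[1.0, 0.0], [0.0, 0.0]]     # only the stimulus drives region 1 directly

z_baseline = simulate(A, B, C, u=[1.0, 0.0])
z_modulated = simulate(A, B, C, u=[1.0, 1.0])
```

Switching on the contextual input leaves region 1 unchanged but raises the steady-state response of region 2, which is the kind of context-dependent change in coupling that the `B_j` matrices are meant to capture.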
DCM has been applied to study:
- Visual processing pathways (feedforward vs. feedback connectivity)
- Language networks (left lateralization, compensatory reorganization)
- Motor planning and execution circuits
- Default mode network dynamics
- Clinical populations (schizophrenia, autism, depression)
### DCM for Electrophysiology (EEG/MEG)
DCM for electrophysiological data uses neural mass models that generate oscillatory dynamics:
```
Neural mass model (Jansen-Rit or conductance-based):
Excitatory population: pyramidal cells
Inhibitory populations: fast (GABA-A) and slow (GABA-B)
Each population characterized by:
- Mean membrane potential
- Mean firing rate (sigmoid function of potential)
- Synaptic kernel (impulse response)
Coupled via intrinsic and extrinsic connections
```
This version of DCM can model:
- Event-related potentials (ERPs) as transient prediction errors
- Induced oscillatory responses as precision dynamics
- Steady-state responses as evidence for specific generative models
- Cross-frequency coupling as hierarchical message passing
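Two of the population-level ingredients listed in the neural mass model above, the sigmoid firing rate and the synaptic kernel, can be sketched as follows. The parameter defaults are values commonly quoted for the Jansen-Rit model and are assumptions here, as are the function names; a full neural mass model would couple several such populations:

```python
import math


def firing_rate(v, e0=2.5, v0=6.0, r=0.56):
    """Sigmoid mapping mean membrane potential (mV) to mean firing rate (1/s)."""
    return 2.0 * e0 / (1.0 + math.exp(r * (v0 - v)))


def synaptic_kernel(t, gain=3.25, rate=100.0):
    """Impulse response h(t) = gain * rate * t * exp(-rate * t) for t >= 0 (t in s)."""
    if t < 0:
        return 0.0
    return gain * rate * t * math.exp(-rate * t)


times = [k * 1e-4 for k in range(1000)]      # 0 to 100 ms in 0.1 ms steps
peak_time = max(times, key=synaptic_kernel)  # time of the largest postsynaptic response
```

The kernel rises and decays on a timescale of `1 / rate`, so slower inhibitory populations can be represented by the same function with a smaller `rate`.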
## EEG/fMRI Signatures of Prediction Error
### Mismatch Negativity (MMN)
The mismatch negativity is an EEG component (peaking ~100-250ms, frontocentral topography) elicited when an auditory stimulus violates a learned regularity. Under the FEP:
```
Standard stimuli: Brain learns p(tone | context) = high probability
Deviant stimulus: Elicits prediction error epsilon = o - g(mu)
MMN amplitude: Proportional to precision-weighted prediction error Pi * epsilon
Key predictions (broadly supported empirically):
1. MMN increases with deviance magnitude (larger epsilon)
2. MMN increases with standard probability (stronger prior -> larger error)
3. MMN is reduced by attention withdrawal (lower precision Pi)
4. MMN generators are in auditory cortex + frontal cortex (hierarchical model)
5. MMN adapts with repeated deviants (updating the generative model)
```
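The first three predictions can be reproduced in a toy model, sketched below. It assumes Gaussian beliefs about the standard tone, a flat prior, and attention acting as a multiplicative gain on precision; `deviant_response` and its parameters are hypothetical and are not fitted to MMN data:

```python
def deviant_response(standard, deviant, n_standards, obs_var=1.0, attention_gain=1.0):
    """Toy precision-weighted prediction error for a deviant after n standards (n >= 1)."""
    # After n noisy observations of the standard, the belief about its value has
    # variance obs_var / n, so the predictive variance for the next tone is:
    predictive_var = obs_var + obs_var / n_standards
    precision = attention_gain / predictive_var  # attention scales precision
    epsilon = deviant - standard                 # prediction error
    return precision * abs(epsilon)


small_deviance = deviant_response(0.0, 1.0, n_standards=5)
large_deviance = deviant_response(0.0, 2.0, n_standards=5)
few_standards = deviant_response(0.0, 1.0, n_standards=2)
many_standards = deviant_response(0.0, 1.0, n_standards=10)
unattended = deviant_response(0.0, 1.0, n_standards=5, attention_gain=0.5)
```

In this sketch the response grows with deviance magnitude and with the number of preceding standards, and shrinks when the attentional gain is reduced.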
Garrido et al. (2009) used DCM to show that MMN involves changes in **backward connectivity** (top-down predictions), not just forward connectivity, supporting the predictive coding account over simpler adaptation models.
### The P300
The P300 is a later EEG component (~300-600ms) elicited by surprising events. It comprises a frontal P3a and a parietal P3b, the latter elicited by task-relevant surprising events:
```
FEP interpretation:
P300 = higher-order prediction error that triggers model updating
P3a (frontal): Orienting response = epistemic action
-> Reflects salience of unexpected event
-> Drives attention (precision reallocation)
P3b (parietal): Context updating = belief revision
-> Reflects magnitude of belief update
-> Scales with information content: I = -ln p(event)
```
The P300's amplitude scales with surprise (Shannon information), consistent with it reflecting the magnitude of belief updating in a hierarchical generative model.
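As a sketch of how trial-by-trial surprise might be computed in an oddball sequence, the example below assumes an observer that tracks the probability of a deviant with a Beta-Bernoulli model and a uniform prior. The function name and the event coding are illustrative assumptions:

```python
import math


def surprise_trace(events, prior_counts=(1.0, 1.0)):
    """Return I = -ln p(event) for each binary event (1 = deviant, 0 = standard)."""
    n_standard, n_deviant = prior_counts
    trace = []
    for event in events:
        p_deviant = n_deviant / (n_standard + n_deviant)  # current predictive probability
        p_event = p_deviant if event == 1 else 1.0 - p_deviant
        trace.append(-math.log(p_event))
        # Update the pseudo-counts only after scoring the event
        if event == 1:
            n_deviant += 1
        else:
            n_standard += 1
    return trace


trace = surprise_trace([0] * 9 + [1])  # nine standards followed by one deviant
```

Surprise falls across the run of standards as they become expected and then jumps at the deviant, mirroring the pattern the P3b is proposed to track.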
### Repetition Suppression and Expectation Suppression
fMRI studies of visual processing reveal two distinct phenomena:
```
Repetition suppression:
Repeated stimulus -> reduced BOLD response
FEP account: Prediction errors decrease as the stimulus is predicted
Expectation suppression:
Expected stimulus -> reduced BOLD response compared to unexpected
FEP account: Top-down predictions partially cancel bottom-up input
Key finding (Summerfield et al., 2008):
Repetition suppression in the fusiform face area (FFA) was larger when repetitions were expected than when they were unexpected
-> Repetition suppression partly reflects fulfilled perceptual expectations
-> Consistent with hierarchical predictive coding
```
## Neural Oscillations and the FEP
### Oscillatory Signatures of Predictive Coding
The FEP predicts specific roles for neural oscillations in cortical message passing:
```
Gamma oscillations (30-100 Hz):
Role: Encode prediction errors (bottom-up)
Mechanism: Superficial pyramidal cell activity
Evidence: Gamma increases with prediction error magnitude
Alpha/beta oscillations (8-30 Hz):
Role: Encode predictions (top-down)
Mechanism: Deep pyramidal cell activity
Evidence: Alpha/beta increases with predictability
Theta oscillations (4-8 Hz):
Role: Temporal scaffolding for hierarchical inference
Mechanism: Phase-amplitude coupling with gamma
Evidence: Theta modulates gamma in hippocampal-cortical circuits
```
### Asymmetric Directed Coupling
Bastos et al. (2015) demonstrated in macaque visual cortex:
```
Feedforward direction (V1 -> V4 -> FEF):
- Dominated by gamma-band Granger causality
- Increases with stimulus drive (bottom-up prediction errors)
Feedback direction (FEF -> V4 -> V1):
- Dominated by alpha/beta-band Granger causality
- Increases with predictability (top-down predictions)
```
This frequency-specific asymmetry is what the canonical microcircuit model predicts, and comparable feedforward/feedback frequency asymmetries have since been reported in other species and recording modalities.
### Cross-Frequency Coupling
Hierarchical predictive coding predicts cross-frequency coupling as the mechanism by which different levels of the hierarchy communicate:
```
Phase-amplitude coupling (PAC):
Low-frequency phase (theta/alpha) from higher levels
modulates
High-frequency amplitude (gamma) at lower levels
Interpretation:
Higher-level predictions (slow, alpha/beta) modulate the gain
of lower-level prediction errors (fast, gamma)
= Precision weighting across hierarchical levels
```
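Phase-amplitude coupling is often quantified with a mean vector length, one of several PAC measures in common use. The sketch below assumes the low-frequency phase and high-frequency amplitude have already been extracted (in practice by band-pass filtering and a Hilbert transform) and uses a synthetic signal with a known modulation depth; all names are illustrative:

```python
import cmath
import math


def mean_vector_length(phases, amplitudes):
    """|mean(A * exp(i * phi))|: large when amplitude depends on phase."""
    total = sum(a * cmath.exp(1j * p) for a, p in zip(amplitudes, phases))
    return abs(total / len(phases))


def synthetic_pac(depth, n=1000, cycles=10):
    """Slow phase over whole cycles, with fast-band amplitude 1 + depth * cos(phase)."""
    phases = [2 * math.pi * cycles * k / n for k in range(n)]
    amplitudes = [1.0 + depth * math.cos(p) for p in phases]
    return phases, amplitudes


coupled = mean_vector_length(*synthetic_pac(depth=0.8))
uncoupled = mean_vector_length(*synthetic_pac(depth=0.0))
```

For this synthetic signal the measure equals half the modulation depth, and it is zero when the fast-band amplitude is independent of the slow phase.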
## Hierarchical Cortical Processing
### Visual Hierarchy
The visual processing stream provides the clearest example of hierarchical predictive coding:
```
V1: Edge orientation, spatial frequency
| predictions (alpha/beta) ^ errors (gamma)
V2: Texture boundaries, contour ownership
| predictions (alpha/beta) ^ errors (gamma)
V4: Shape, color, intermediate features
| predictions (alpha/beta) ^ errors (gamma)
IT: Object identity, category
| predictions (alpha/beta) ^ errors (gamma)
PFC: Context, goals, task rules
Receptive field sizes increase at each level
Temporal dynamics slow at each level
Abstraction increases at each level
```
Empirical evidence for this architecture:
- **Feedback modulates feedforward**: V1 responses to identical stimuli change depending on context provided by higher areas
- **Predictable stimuli produce less activation**: Expected visual events produce lower BOLD responses in visual cortex
- **Unpredictable stimuli produce more activation**: Prediction error signals scale with surprise
- **Omission responses**: Visual cortex responds when an expected stimulus is ABSENT -- a signature of prediction, not stimulus processing
### Auditory Hierarchy
The auditory system implements a similar predictive hierarchy:
```
Primary auditory cortex (A1): Spectrotemporal features
|
Belt areas: Auditory objects, pitch
|
Parabelt: Sound categories, sequences
|
Superior temporal sulcus (STS): Speech, complex sounds
|
Prefrontal cortex: Linguistic structure, meaning
Temporal prediction windows increase up the hierarchy:
A1: ~25-50ms (single phoneme features)
Belt: ~100-200ms (syllables, pitch contours)
STS: ~500ms-2s (words, phrases)
PFC: seconds to minutes (narrative, discourse)
```
### Somatosensory and Motor Hierarchies
Motor control under the FEP is active inference -- the motor system generates proprioceptive predictions that are fulfilled by spinal reflexes:
```
Motor cortex: Generates predictions of proprioceptive consequences
|
Spinal cord: Computes prediction errors (desired vs. actual position)
|
Alpha motor neurons: Reflex arcs minimize prediction errors by moving the limb
Movement = making proprioceptive predictions come true
```
Under this account, the classical motor control problem of "how does the brain specify which muscles to activate" is reframed: it does not. The brain specifies the desired sensory consequences, and peripheral reflex arcs produce the muscle activations that realize them.
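The idea that movement fulfils a proprioceptive prediction can be sketched with a one-dimensional limb. The example assumes a first-order reflex that moves the limb in proportion to the prediction error, with no muscle dynamics or sensory noise; `reach`, `gain`, and the other names are hypothetical:

```python
def reach(predicted_position, x0=0.0, gain=5.0, dt=0.01, steps=500):
    """Simulate a reflex arc that cancels proprioceptive prediction error by moving."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        epsilon = x - predicted_position  # sensed position minus predicted position
        x += dt * (-gain * epsilon)       # the reflex acts on the limb, not on the belief
        trajectory.append(x)
    return trajectory


trajectory = reach(predicted_position=1.0)
```

Note that the prediction stays fixed while the limb changes: the error is resolved by action rather than by updating the belief, which is what distinguishes active inference from perceptual inference.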
## Current Research
### Computational Phenotyping
Computational phenotyping uses computational models derived from the FEP to characterize individual differences in neural processing:
```
Approach:
1. Fit predictive coding model to individual's neural data
2. Extract computational parameters (precision weights, learning rates)
3. Use parameters as "computational phenotype"
4. Correlate with behavior, genetics, clinical status
Applications:
- Identifying biomarkers for psychiatric disorders
- Predicting treatment response
- Tracking disease progression
- Personalized medicine
```
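Steps 1 to 3 of the approach can be sketched with a deliberately simple stand-in model. The example assumes a delta-rule learner in place of a full predictive coding model, simulated rather than real data, and a grid search in place of Bayesian model inversion; all names are illustrative:

```python
def delta_rule_predictions(observations, learning_rate, initial=0.0):
    """Predictions made before each observation by a delta-rule learner."""
    mu, predictions = initial, []
    for o in observations:
        predictions.append(mu)
        mu += learning_rate * (o - mu)  # update the belief by a fraction of the error
    return predictions


def fit_learning_rate(observations, responses, grid=None):
    """Return the grid value whose predictions best match the recorded responses."""
    grid = grid or [k / 100 for k in range(1, 100)]

    def squared_error(lr):
        predictions = delta_rule_predictions(observations, lr)
        return sum((p - r) ** 2 for p, r in zip(predictions, responses))

    return min(grid, key=squared_error)


observations = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0]
responses = delta_rule_predictions(observations, learning_rate=0.3)  # simulated subject
phenotype = fit_learning_rate(observations, responses)
```

The recovered learning rate plays the role of the computational phenotype in step 3; with real data the responses would be noisy and the estimate would carry uncertainty that a grid search does not report.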
### Deep Temporal Models
Deep temporal models extend hierarchical predictive coding to account for temporal structure at multiple timescales:
```
Classical predictive coding: Spatial hierarchy (what)
Deep temporal models: Spatiotemporal hierarchy (what + when)
Each level represents temporal regularities at different scales:
Level 1: Millisecond dynamics (sensory features)
Level 2: Hundred-millisecond dynamics (events)
Level 3: Second-scale dynamics (episodes)
Level 4: Minute-to-hour dynamics (contexts)
```
### Neuromorphic Implementation
Neuromorphic implementations build hardware around predictive coding principles:
```
Neuromorphic chips (e.g., Intel Loihi, IBM TrueNorth):
- Run spiking neural networks, a natural substrate for prediction error minimization
- Event-driven computation (in a predictive coding design, only prediction errors need to be processed)
- Low power consumption (only surprising events trigger computation)
- Online learning through spike-timing-dependent plasticity, on chips that support on-chip learning
```
## Open Questions
1. **Granularity of the generative model**: At what level of detail does the cortex implement predictive coding? Single neurons? Cortical columns? Areas? The answer likely varies across brain regions.
2. **Precision estimation**: How exactly does the brain estimate precision? The computational mechanisms for second-order statistics remain unclear.
3. **Generalized coordinates**: The FEP formalism uses generalized coordinates (position, velocity, acceleration...). Whether the brain explicitly represents temporal derivatives is debated.
4. **Non-cortical structures**: How do subcortical structures (basal ganglia, cerebellum, thalamus) fit into the predictive coding framework? The thalamus may play a key role in routing prediction errors, while the cerebellum may encode forward models.
5. **Consciousness**: Does prediction error minimization relate to consciousness? Some proposals link consciousness to the precision of high-level predictions (see [[knowledge_base/free_energy_principle/cognitive/consciousness]]).
6. **Individual differences**: What determines the parameters of an individual's generative model? Genetics, development, experience all contribute, but the mapping is poorly understood.
## References
1. Friston, K. (2005). A theory of cortical responses. *Philosophical Transactions of the Royal Society B*, 360(1456), 815-836.
2. Bastos, A. M., Usrey, W. M., Adams, R. A., Mangun, G. R., Fries, P., & Friston, K. J. (2012). Canonical microcircuits for predictive coding. *Neuron*, 76(4), 695-711.
3. Adams, R. A., Shipp, S., & Friston, K. J. (2013). Predictions not commands: active inference in the motor system. *Brain Structure and Function*, 218(3), 611-643.
4. Shipp, S. (2016). Neural elements for predictive coding. *Frontiers in Psychology*, 7, 1792.
5. Bastos, A. M., Vezoli, J., Bosman, C. A., Schoffelen, J. M., Oostenveld, R., Dowdall, J. R., ... & Fries, P. (2015). Visual areas exert feedforward and feedback influences through distinct frequency channels. *Neuron*, 85(2), 390-401.
6. Garrido, M. I., Kilner, J. M., Stephan, K. E., & Friston, K. J. (2009). The mismatch negativity: a review of underlying mechanisms. *Clinical Neurophysiology*, 120(3), 453-463.
7. Friston, K. J., Harrison, L., & Penny, W. (2003). Dynamic causal modelling. *NeuroImage*, 19(4), 1273-1302.
8. Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P., & Pezzulo, G. (2017). Active inference: a process theory. *Neural Computation*, 29(1), 1-49.
9. Summerfield, C., Trittschuh, E. H., Monti, J. M., Mesulam, M. M., & Egner, T. (2008). Neural repetition suppression reflects fulfilled perceptual expectations. *Nature Neuroscience*, 11(9), 1004-1006.
10. Heilbron, M., & Chait, M. (2018). Great expectations: Is there evidence for predictive coding in auditory cortex? *Neuroscience*, 389, 54-73.
## See Also
- [[knowledge_base/free_energy_principle/mathematics/core_principle|Core Mathematical Formulation]]
- [[knowledge_base/free_energy_principle/cognitive/perception|Perception as Free Energy Minimization]]
- [[knowledge_base/free_energy_principle/cognitive/attention|Attention and Precision]]
- [[knowledge_base/free_energy_principle/biology/neural_systems|Neural Systems]]
- [[psychiatry|Psychiatric Applications]]
- [[knowledge_base/free_energy_principle/implementations/robotics|Robotics Implementations]]