PageRank is a Markov-chain-based way to assign an “importance” score to pages.
---
Model a “random surfer”:
- States are web pages.
- From page $j$, follow an outgoing link uniformly at random.
To guarantee a unique stationary distribution (and handle dead ends), PageRank adds **teleportation** (a.k.a. damping):
- With probability $\alpha$ (often about $0.85$), follow a link.
- With probability $1-\alpha$, jump to a random page (or a personalization distribution).
The PageRank vector $\pi$ is the stationary distribution of this chain.
In practice, you build the transition matrix and solve the stationary distribution equation (usually via **power iteration**, not by explicitly finding all eigenvalues).
Example transition matrix:
![[Screenshot from 2021-05-21 23-11-30.png|300]]
Stationary distribution equation (for a row-stochastic transition matrix $P$):
$
\pi = \pi P, \quad \sum_i \pi_i = 1, \quad \pi_i \ge 0
$
Equivalently, $\pi$ is the (left) eigenvector of $P$ with eigenvalue $1$.
![[Screenshot from 2021-05-21 23-11-34.png|400]]
Those probabilities serve as the PageRank values:
![[Screenshot from 2021-05-21 23-11-40.png|600]]
References:
- Brin & Page (1998), *The Anatomy of a Large-Scale Hypertextual Web Search Engine*. https://research.google/pubs/the-anatomy-of-a-large-scale-hypertextual-web-search-engine/
- Langville & Meyer (2006), *Google's PageRank and Beyond*. https://press.princeton.edu/books/hardcover/9780691122021/googles-pagerank-and-beyond
- (Intro article) https://towardsdatascience.com/brief-introduction-to-markov-chains-2c8cab9c98ab