PageRank is a Markov-chain-based way to assign an “importance” score to pages. --- Model a “random surfer”: - States are web pages. - From page $j$, follow an outgoing link uniformly at random. To guarantee a unique stationary distribution (and handle dead ends), PageRank adds **teleportation** (a.k.a. damping): - With probability $\alpha$ (often about $0.85$), follow a link. - With probability $1-\alpha$, jump to a random page (or a personalization distribution). The PageRank vector $\pi$ is the stationary distribution of this chain. In practice, you build the transition matrix and solve the stationary distribution equation (usually via **power iteration**, not by explicitly finding all eigenvalues). Example transition matrix: ![[Screenshot from 2021-05-21 23-11-30.png|300]] Stationary distribution equation (for a row-stochastic transition matrix $P$): $ \pi = \pi P, \quad \sum_i \pi_i = 1, \quad \pi_i \ge 0 $ Equivalently, $\pi$ is the (left) eigenvector of $P$ with eigenvalue $1$. ![[Screenshot from 2021-05-21 23-11-34.png|400]] Those probabilities serve as the PageRank values: ![[Screenshot from 2021-05-21 23-11-40.png|600]] References: - Brin & Page (1998), *The Anatomy of a Large-Scale Hypertextual Web Search Engine*. https://research.google/pubs/the-anatomy-of-a-large-scale-hypertextual-web-search-engine/ - Langville & Meyer (2006), *Google's PageRank and Beyond*. https://press.princeton.edu/books/hardcover/9780691122021/googles-pagerank-and-beyond - (Intro article) https://towardsdatascience.com/brief-introduction-to-markov-chains-2c8cab9c98ab