# More General Weight Functions
The estimate from the [[Nearby Neighbour Averaging - Cannonball Example]] can alternatively be written as a **weighted average** over the entire dataset, using a weight function $\text{Wt}(x,X_i)$ that tells us how much influence example $X_i$ should have on the prediction at $x$.
$\hat{f}(x) = \frac{\displaystyle\sum_{i \in \text{Examples}} Y_i \cdot \text{Wt}(x,X_i) } { \displaystyle\sum_{i \in \text{Examples}} \text{Wt}(x,X_i) } $
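As a rough sketch of this estimator in code (the names `predict` and `wt` are illustrative, not from the original), assuming arrays `X` and `Y` of training inputs and targets and an arbitrary weight function `wt`:

```python
import numpy as np

def predict(x, X, Y, wt):
    """Weighted-average prediction at a query point x.

    X, Y hold the training inputs and targets; wt(x, xi) returns the
    weight of influence that example xi has on the prediction at x.
    """
    w = np.array([wt(x, xi) for xi in X], dtype=float)
    return np.sum(np.asarray(Y) * w) / np.sum(w)
```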
## Top Hat Function
The case where the weight function is 1 for points close by and 0 for points far away is exactly the nearby neighbour averaging described above. This is sometimes called a "Top Hat" function since it looks like a top hat when plotted.
$ \text{Wt}(x,x^\prime) = \texttt{1}\Big\{ |x - x^\prime| < \delta \Big\} \in \{0,1\}$
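A minimal sketch of this weight function (the parameter name `delta` mirrors the $\delta$ above; its default value is arbitrary):

```python
def top_hat_weight(x, x_prime, delta=1.0):
    # 1 if the two points are within delta of each other, 0 otherwise
    return 1.0 if abs(x - x_prime) < delta else 0.0
```

Plugging `top_hat_weight` into the `predict` sketch above reproduces plain nearby neighbour averaging.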
## Gaussian Kernel Function
We can also use a Gaussian weight function:
$ \text{Wt}(x,x^\prime) = \exp\left(-\left(\frac{x-x^\prime}{\sigma}\right)^2\right)$
Here $\sigma$ is the "characteristic length scale": the weight depends on how many times the length $\sigma$ fits into the difference between $x$ and $x^\prime$. The advantage of this choice is that closer points (i.e. when $|x-x^\prime|$ is small) receive a higher weight than points further away (i.e. when $|x-x^\prime|$ is large), rather than all of them counting equally. Another advantage is that the resulting function $\hat{f}$ is smoother, because the weight function has no discontinuities.
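A corresponding sketch of the Gaussian weight (again, `sigma` and its default are illustrative):

```python
import numpy as np

def gaussian_weight(x, x_prime, sigma=1.0):
    # Weight decays smoothly with distance, measured in units of sigma;
    # it is 1 when x == x_prime and falls towards 0 as |x - x_prime| grows.
    return np.exp(-((x - x_prime) / sigma) ** 2)
```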
## Other Exponential Kernels
In fact, one can make a weight function of the form $ \text{Wt}(x,x^\prime) = \exp\left(-\left|\frac{x-x^\prime}{\sigma}\right|^p\right)$ for any value of $p$. The case $p=2$ is the Gaussian kernel above, $p=1$ is the exponential (Laplacian) kernel, and the limit $p \to \infty$ recovers the original top-hat weight function shape.
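A sketch of this more general family (the exponent `p` and length scale `sigma` correspond to the symbols in the formula above; default values are arbitrary):

```python
import numpy as np

def exp_power_weight(x, x_prime, sigma=1.0, p=2.0):
    # p=2 reproduces the Gaussian weight, p=1 gives an exponential (Laplacian)
    # decay, and large p approaches the top hat: roughly 1 inside
    # |x - x_prime| < sigma and roughly 0 outside.
    return np.exp(-np.abs((x - x_prime) / sigma) ** p)
```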
<iframe src="https://www.desmos.com/calculator/uss8qhj3dh" width=600 height=200></iframe>