# Overview

These notes are a concise review of the foundational statistical and methodological concepts needed to understand the basics of structural equation modeling (SEM). A few of the concepts discussed here may be new (indicated by red stars, <span style="color:red;">★</span>), but most should already be familiar.

# Types of Variables

Basic typology: numerical vs. categorical

- Compare this with Stevens's typology (NOIR).

Observed vs. latent vs. emergent <span style="color:red;">★</span>

- Observed variables (also known as manifest variables or indicators)
- Latent variables (also known as factors)
- Emergent variables (also known as composite variables)

# Statistical Inference

Distributions commonly used for inferential test statistics:

- Standard normal distribution ($z$)
- Chi-squared distribution ($\chi^2$)
- Student’s t distribution ($t$)
- Fisher-Snedecor F distribution ($F$)

%% which of these are known as critical ratios (CR)? %%

## Null Hypothesis Significance Testing

- Statistical hypotheses:
	- Null ($H_0$)
	- Alternative ($H_\text{a}$)
- Critical value
- Nominal significance level (NSL); more commonly known as the alpha-level.
- Observed significance level (OSL); more commonly known as the p-value.
- Type-I error
- Type-II error
- Statistical power

## Estimation

### Estimators

An *estimator* is a rule or procedure using observed data to calculate an estimate of a given quantity (typically a parameter). Thus, the rule (the estimator), the quantity of interest (the estimand), and its result (the estimate) are distinguished. For example, the sample mean ($M$) is a commonly used estimator of the population mean ($\mu$).

Broadly speaking, there are *point estimators* and *interval estimators*. While point estimators produce single-valued results, interval estimators produce a range of plausible values (e.g., confidence intervals).

%% Note: "Single value" does not necessarily mean "single number", but includes vector valued or function valued estimators. %%

### Estimation methods (frameworks) <span style="color:red;">★</span>

- Least squares (LS)
- Maximum likelihood (ML)
- Method of moments (MM)

> [!important]
> SEM generally relies on ML estimation methods.

# Deviation Scores

A variable can be converted (or *transformed*) to a new variable that shows the difference (or deviation) of each observed value from the mean ($M_x$) of that variable. This can be done by subtracting the mean of that variable from every observed value:

$\large d_x~=~X~-~M_x$

This is also known as *centering a variable*. In such instances, the variable is said to be centered about its mean or in deviation form. A variable in deviation form retains the same measurement units (i.e., scale or metric) as the original variable, but the mean of the deviations is zero.

> [!note] Notation note
> To be consistent with the notation used in the Kline (2023) textbook and recommended by current APA style guidelines, the symbol $M_x$ is used for the mean of $X$ (rather than the bar notation $\bar{X}$).

# Standardized / Studentized Variables

A variable can also be transformed to a new variable that indicates how many standard deviations an individual score ($X_i$) is above or below the mean; this is also known as a z-score. This is a "scaleless" (or "metric-free") quantity—i.e., any information regarding the original variable’s units of measurement is lost. The basic formula is

$\large z_x~=~\frac{X-M_x}{s_x}$

Notice that this is the deviation score ($d=X-M$) divided by the standard deviation. Technically, this is a studentized variable when the estimated (sample) mean ($M_x$) and standard deviation ($s_x$) are used rather than the population mean ($\mu_x$) and standard deviation ($\sigma_x$).

The mean and variance of a standardized variable are 0 and 1, respectively. The original variable is said to be in unstandardized (or raw) form.
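As a concrete illustration, here is a minimal Python sketch of both transformations (the data values are hypothetical, and numpy is not part of the original notes). It confirms that deviation scores average to zero and that z-scores have a mean of 0 and a variance of 1:

```python
import numpy as np

# Hypothetical data for a single variable X.
x = np.array([12.0, 15.0, 9.0, 18.0, 11.0])

M_x = x.mean()        # sample mean (M_x)
d_x = x - M_x         # deviation scores: d_x = X - M_x
s_x = x.std(ddof=1)   # LS (sample) standard deviation, divisor N - 1
z_x = d_x / s_x       # standardized (studentized) scores

print(d_x.mean())                    # ~0: deviations always average to zero
print(z_x.mean(), z_x.var(ddof=1))   # ~0 and 1: mean 0, variance 1
```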
# Variance and Standard Deviation

Variance is a quantitative measure of the dispersion of the observed values of a variable.

Least-squares (LS) estimation formula for variance (unbiased estimator):

$\large s_x^2~=~\frac{\sum{\left(X-M_x\right)^2}}{N-1}$

Maximum-likelihood (ML) estimation formula for variance (biased estimator):

$\large S_x^2~=~\frac{\sum{\left(X-M_x\right)^2}}{N}$

The LS estimate can easily be converted to ML by reversing Bessel's correction:

$S_x^2~=~s_x^2 \left( \frac{N-1}{N} \right)$

In either case, the standard deviation is simply the positive square root of the variance.

> [!note] Notation note
> The LS estimator for variance is denoted $s^2$ (lowercase), whereas the ML version is denoted $S^2$ (uppercase). This of course carries over to the notation for standard deviation, with $s$ and $S$ for the LS and ML versions, respectively.
>
> Another common way to symbolize variance is operator notation. For the LS estimator, this is $\operatorname{var}(X)$, while $\operatorname{Var}(X)$ denotes the ML estimator.
>
> Other common notation for standard deviation is $\sqrt{\operatorname{var}(X)}$ and $\sqrt{\operatorname{Var}(X)}$.

> [!important] Connection to covariance
> Variance is a special case of covariance: Variance is the covariance of a variable with itself.
>
> (More on covariance in a later section.)
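The LS/ML distinction is easy to check numerically. A minimal sketch with hypothetical data, using numpy's `ddof` ("delta degrees of freedom") argument, which controls the divisor:

```python
import numpy as np

x = np.array([12.0, 15.0, 9.0, 18.0, 11.0])  # hypothetical data
N = len(x)

s2 = x.var(ddof=1)  # LS (unbiased) variance, divisor N - 1
S2 = x.var(ddof=0)  # ML (biased) variance, divisor N

# Multiplying the LS estimate by (N - 1)/N reverses Bessel's correction.
print(np.isclose(S2, s2 * (N - 1) / N))  # True
```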
%%
From old notes: (any of this needed somewhere else?)

Maximum-Likelihood Estimators
- SEM typically uses a computational process known as maximum-likelihood (ML) estimation to estimate all the parameters in a model; in contrast, regression generally uses a method known as least-squares (LS) estimation.
- The ML estimators for some quantities (such as covariance and variance) are known to be biased.

Unbiased vs. biased estimators of covariance and variance
- The sample covariance (i.e., a covariance computed from a sample of data) is used to estimate the population covariance; note that this also holds true for variance since it is a special case of covariance

| Type of estimator | Covariance | Variance |
| --- | --- | --- |
| Least squares (unbiased) | $s_{xy}=\frac{\sum\left(X-M_x\right)\left(Y-M_y\right)}{N-1}$ | $s_x^2=\frac{\sum\left(X-M_x\right)^2}{N-1}$ |
| Maximum likelihood (biased) | $S_{xy}=\frac{\sum\left(X-M_x\right)\left(Y-M_y\right)}{N}$ | $S_x^2=\frac{\sum\left(X-M_x\right)^2}{N}$ |

Note that the unbiased estimators are denoted using the lowercase letter ‘s,’ while the biased estimators use the uppercase letter ‘S.’

- The biased estimators are known as such because they tend to produce values that underestimate the true population covariance or variance (particularly in smaller samples)

If some ML estimators are biased, why use them?
- The amount of bias in ML estimators becomes smaller with larger samples (i.e., they are asymptotically unbiased).
- The distinction between the biased and unbiased estimators becomes irrelevant with larger samples since the computed values of each would be virtually equal.
- ML estimators are the most efficient (i.e., they need fewer observations to reach some level of precision).
- For many quantities, the ML and LS estimators are identical (e.g., regression weights).

Important considerations regarding software
- Most SEM software packages are capable of using the variances and covariances for the observed variables as the input data (rather than the raw data).
- Most SEM software will assume by default that the user is providing unbiased (LS) variances and covariances, and it will automatically convert these values to the biased (ML) form. (So, always read the manual!)
%%

# Correlation

This section refers only to simple bivariate (zero-order) correlations—specifically, the Pearson product-moment correlation coefficient.

Correlation is a measure of the linear relationship between two variables. The most recognizable notation for correlation is $r_{xy}$. There is also the operator notation, $\operatorname{corr}(X,Y)$.

There are various formulas for correlation (specifically, the Pearson correlation). One formula uses the covariance and variances of the two variables involved:

$\large \eqalign{ r_{xy}~&=~\frac{s_{xy}}{s_x s_y} \\[2ex] &=~\frac{S_{xy}}{S_x S_y} }$

Just as a correlation can be computed from a covariance, a covariance can be computed from the correlation and standard deviations. Naturally, using the LS standard deviations converts the correlation to the LS covariance,

$\large s_{xy}~=~(r_{xy})(s_x)(s_y)$

and the ML standard deviations produce the ML covariance,

$\large S_{xy}~=~(r_{xy})(S_x)(S_y)$

The correlation will be the same regardless of using LS or ML estimators for variance and covariance because both of these formulas reduce to the same form:

$\large r_{xy}~=~\frac{\sum{(X-M_x)(Y-M_y)}}{\sqrt{\sum{(X-M_x)^2} \cdot \sum{(Y-M_y)^2}}}$

Continuing with algebraic manipulation of this formula arrives at yet another formula for correlation, this time using standardized variables:

$\large r_{xy}~=~\frac{\sum{z_x z_y}}{N}$

- Note: The divisor should be $N-1$ if the variables are standardized using LS estimators of their standard deviations.

> [!important] Connection to covariance
> Correlation is a special case of covariance: The correlation between two variables (say, $X$ and $Y$) is equivalent to the covariance of the standardized versions of those variables ($z_x$ and $z_y$).
>
> (More on covariance in a later section.)
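A quick numerical check, with hypothetical data, that all of these routes produce the same Pearson correlation:

```python
import numpy as np

# Hypothetical paired data for X and Y.
x = np.array([12.0, 15.0, 9.0, 18.0, 11.0])
y = np.array([3.0, 6.0, 2.0, 8.0, 4.0])
N = len(x)

# LS route: covariance and standard deviations with an N - 1 divisor.
s_xy = np.cov(x, y, ddof=1)[0, 1]
r_ls = s_xy / (x.std(ddof=1) * y.std(ddof=1))

# ML route: the same quantities with an N divisor.
S_xy = np.cov(x, y, ddof=0)[0, 1]
r_ml = S_xy / (x.std(ddof=0) * y.std(ddof=0))

# z-score route: standardize with ML standard deviations, then average the products.
z_x = (x - x.mean()) / x.std(ddof=0)
z_y = (y - y.mean()) / y.std(ddof=0)
r_z = (z_x * z_y).sum() / N

print(np.allclose([r_ls, r_ml, r_z], np.corrcoef(x, y)[0, 1]))  # True
```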
# Covariance <span style="color:red;">★</span>

Covariance is a measure of the degree to which the observed values for a given pair of variables tend to change (i.e., covary) together. More specifically, it is a measure of the linear relationship (dependence) between two variables. Another way to describe covariance is in terms of how two variables covary around their respective means in a pairwise (i.e., joint) manner.

Think of covariance as a measure of how two variables (say, $X$ and $Y$) "dance together." When $X$ is above its mean value and $Y$ is also above its mean at the same time (or both are below their means), the product of their deviation scores is positive. If this happens consistently, the covariance is positive. If $X$ being above its mean tends to correspond to $Y$ being below its mean (and vice versa), the covariance will be negative. The magnitude (size) of covariance conveys the strength of this "co-movement," but because it depends on the units of the variables, it is difficult to interpret on its own.

So, by itself, covariance is not a particularly useful or easily interpretable quantity. ==However, it is a vitally important quantity for the purposes of SEM computations.==

Covariance is said to be unstandardized because it depends on the scales (units of measurement) of the two variables, which limits its direct comparability across datasets/samples. So, although it retains information about the metrics used by the variables (unlike the Pearson correlation), the magnitude of a covariance is not all that meaningful by itself.

> [!tip] Interpreting a covariance
> The basic interpretation of covariance focuses on the sign (+/-) of the covariance; this indicates the general direction of the relationship (positive, negative, or zero).
> - If $\operatorname{Cov}(X,Y)>0$ (i.e., the covariance is positive), $X$ and $Y$ tend to increase or decrease together (positive association).
> - If $\operatorname{Cov}(X,Y)<0$ (i.e., the covariance is negative), $X$ and $Y$ tend to move in opposite directions (negative or inverse association).
> - If $\operatorname{Cov}(X,Y)=0$, there is no linear relationship between $X$ and $Y$.

Nuances:

- Unit dependence: Unlike correlation, covariance is not standardized. For example, the covariance between height (in cm) and weight (in kg) will differ markedly from that between height (in inches) and weight (in pounds), even if the relationship is identical.
- Context-specific meaning: A large covariance does not imply a strong relationship unless the scales of $X$ and $Y$ are considered. For instance, the covariance between test scores and hours studied might be large, but only because both variables are on large numerical scales.
- Symmetry: Covariance is symmetric, meaning $\operatorname{Cov}(X,Y)=\operatorname{Cov}(Y,X)$, reflecting the mutual relationship.

Covariance is more formally defined as the average product of the deviations for two variables.

- LS estimation formula for covariance (unbiased estimator):
$\large s_{xy}~=~\frac{\sum{\left(X-M_x\right)\left(Y-M_y\right)}}{N-1}$
- ML estimation formula for covariance (biased estimator):
$\large S_{xy}~=~\frac{\sum{\left(X-M_x\right)\left(Y-M_y\right)}}{N}$

Again, note the lowercase (for LS) and uppercase (for ML) notation used.

There is also the operator notation for covariance:

- LS: $\operatorname{cov}(X,Y)$
- ML: $\operatorname{Cov}(X,Y)$

Also note that the order of the variables listed in the notation does not matter: $\operatorname{Cov}(X,Y)=\operatorname{Cov}(Y,X)$.

> [!tip] Converting LS to ML
> The LS estimate can easily be converted to ML by reversing Bessel's correction:
> $\large S_{xy}~=~s_{xy} \left( \frac{N-1}{N} \right)$

# Connections Between Covariance and Other Statistics

Covariance is the "raw material" of SEM because many of the other statistics involved are based on covariances.

## Variance

Variance can be thought of as a special case of covariance; in other words, variance is the covariance of a variable with itself:

$\large \eqalign{ \operatorname{Cov}(X,X)~&=~\frac{\sum{(X-M_x)(X-M_x)}}{N} \\[2ex] &=~\frac{\sum{(X-M_x)^2}}{N} \\[2ex] &=~\operatorname{Var}(X) }$

## Correlation

The correlation between $X$ and $Y$ is equal to the covariance of the standardized $X$ and standardized $Y$ (that is, $z_x$ and $z_y$, respectively). For this reason, the correlation is sometimes described as a "standardized covariance." Correlation is easier to interpret than covariance, but the metrics of the variables are lost.

==Notice that the formula for correlation in terms of standardized variables has the same general form as the formula for covariance; this shows why the correlation between $X$ and $Y$ is equal to the covariance of their standardized counterparts (namely, $z_x$ and $z_y$):==

$\large \operatorname{corr}(X,Y)~=~\operatorname{cov}(z_x,z_y)$

==The squared correlation coefficient ($r^2$) also has meaning: the proportion of variance in one variable linked to the other variable. Notice that if you have the correlation between two variables ($r_{xy}$) and the standard deviations of those variables ($S_x$ and $S_y$), you can easily compute the covariance:==

$\large S_{xy}~=~(r_{xy})(S_x)(S_y)$
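Both connections can be verified directly. A minimal sketch with hypothetical data:

```python
import numpy as np

x = np.array([12.0, 15.0, 9.0, 18.0, 11.0])  # hypothetical data
y = np.array([3.0, 6.0, 2.0, 8.0, 4.0])

# Variance as a self-covariance: Cov(X, X) = Var(X) (ML form, divisor N).
print(np.isclose(np.cov(x, x, ddof=0)[0, 1], x.var(ddof=0)))  # True

# Correlation as the covariance of standardized variables: corr(X, Y) = cov(z_x, z_y).
z_x = (x - x.mean()) / x.std(ddof=0)
z_y = (y - y.mean()) / y.std(ddof=0)
print(np.isclose(np.cov(z_x, z_y, ddof=0)[0, 1], np.corrcoef(x, y)[0, 1]))  # True
```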
## Regression Weights

The regression weight (unstandardized) is computed using the covariance:

$\large B~=~\frac{\operatorname{Cov}(X,Y)}{\operatorname{Var}(X)}$
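This identity can be confirmed against an ordinary least-squares line fit. A minimal sketch with hypothetical data (note that the divisor cancels in the ratio, so the LS and ML estimators give the same weight):

```python
import numpy as np

x = np.array([12.0, 15.0, 9.0, 18.0, 11.0])  # hypothetical predictor
y = np.array([3.0, 6.0, 2.0, 8.0, 4.0])      # hypothetical outcome

# Regression weight from the covariance and variance.
B = np.cov(x, y, ddof=0)[0, 1] / x.var(ddof=0)

# The same slope from an ordinary least-squares line fit.
slope, intercept = np.polyfit(x, y, deg=1)
print(np.isclose(B, slope))  # True
```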
# Statistical Independence (Orthogonality) <span style="color:red;">★</span>

If two variables are statistically independent (or orthogonal), then they are uncorrelated—in other words, there is no linear relationship between the two variables.

Orthogonality between two variables is sometimes denoted by the symbol $\perp$. So, for example, if two variables (say, $X$ and $Y$) are orthogonal, this can be expressed as $X \perp Y$.

Orthogonal variables have a correlation equal to zero, which also implies that the covariance is zero; i.e., if $X$ and $Y$ are independent, then $\operatorname{corr}(X,Y)=0$ and $\operatorname{cov}(X,Y)=0$.

Conditional independence

- If two variables are *conditionally independent* (or *conditionally orthogonal*), then the partial correlation between the variables is zero (while controlling for one or more other variables).
- For example, with three variables (say, $W$, $X$, and $Y$), if $X$ and $Y$ are conditionally independent (given $W$), then the partial correlation for $X$ and $Y$ while controlling for $W$ is zero ($r_{xy.w}=0$).
- Even if two variables are not orthogonal, it is still possible for them to be conditionally orthogonal (and vice versa).

# Vectors and Matrices <span style="color:red;">★</span>

For the purposes of this course, vectors and matrices are merely a more concise and elegant way to organize and refer to lists and tables of information. This is particularly useful when dealing with very large lists or tables of data.

%% These are also the basis of linear algebra. %%

## Vector

A *vector* is an array or ordered list of numerical values. Vectors are denoted as boldfaced lowercase letters (e.g., $\textbf{x}$, $\textbf{s}$, $\boldsymbol{\upmu}$, $\boldsymbol{\uptheta}$).

> [!example]
> The following shows the form of a vector of population means of three variables ($X_1$, $X_2$, and $X_3$):
> $\large \boldsymbol{\upmu}~= \begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{bmatrix}$
>
> Now say that the values of the three means are known to be $\mu_1=15.6$, $\mu_2=-9.9$, and $\mu_3=0.7$. These values can be displayed in vector form:
> $\large \boldsymbol{\upmu}~= \begin{bmatrix} 15.6 \\ -9.9 \\ 0.7 \end{bmatrix}$

## Matrix

A *matrix* (plural *matrices*) is a tabular block of ordered numerical data, arranged in rows and columns. Matrices are denoted as boldfaced uppercase letters (e.g., $\bf{S}$, $\bf{\Phi}$, $\bf{\Gamma}$). The position of an element in a matrix is given by its row number and column number (always in that order).

A covariance matrix (also known as a variance-covariance matrix) is a special matrix used in SEM which contains all the variances and covariances for a set of observed variables.

> [!tip]
> In SEM, the covariance matrix can be used as the input data rather than the raw data.

> [!example]
> The following shows the form of a covariance matrix for three variables (say, $X_1$, $X_2$, and $X_3$):
> $\large \bf{S}~= \begin{bmatrix} s_1^2 & s_{12} & s_{13} \\ s_{21} & s_2^2 & s_{23} \\ s_{31} & s_{32} & s_3^2 \end{bmatrix}$
>
> The matrix $\bf{S}$ has dimensions $3\times3$ (3 rows, 3 columns). The variances ($s_1^2$, $s_2^2$, and $s_3^2$) are on the main diagonal of the matrix, and the covariances ($s_{12}$, $s_{13}$, and $s_{23}$) are off-diagonal.

Note that the matrix in the previous example can be described as being both *square* and *symmetric*.

- Square implies that there are as many rows as columns.
- Symmetric implies that $s_{12}=s_{21}$, $s_{13}=s_{31}$, and $s_{23}=s_{32}$ (which makes sense given the properties of covariance).
- While covariance and correlation matrices will always be square and symmetric, this is not necessarily true of all matrices (e.g., a raw-data matrix).

> [!question] Discussion questions
> What would a correlation matrix look like?
> What would an observed (raw) data matrix look like?
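To tie these ideas together, here is a minimal sketch (hypothetical raw data) that computes the LS covariance matrix $\bf{S}$ for three variables and checks the square/symmetric properties described above:

```python
import numpy as np

# Hypothetical raw-data matrix: 5 cases (rows) by 3 variables (columns).
X = np.array([
    [12.0, 3.0, 7.0],
    [15.0, 6.0, 5.0],
    [ 9.0, 2.0, 8.0],
    [18.0, 8.0, 4.0],
    [11.0, 4.0, 6.0],
])

# LS (unbiased, N - 1) covariance matrix; rowvar=False treats columns as variables.
S = np.cov(X, rowvar=False, ddof=1)

print(S.shape)                                          # (3, 3): square
print(np.allclose(S, S.T))                              # True: symmetric
print(np.allclose(np.diag(S), X.var(axis=0, ddof=1)))   # True: variances on the main diagonal
```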