% vim: tw=50
% 25/02/2023 09AM

\subsubsection*{Testing independence in contingency tables}
$N_{ij}$: number of samples of type $(i, j)$.
\[ (N_{ij}) \sim \Multinomial(n, (p_{ij})) \]
$H_0$: $p_{ij} = p_{i+} \times p_{+j}$ \\
$H_1$: $(p_{ij})$ unconstrained. \\
We found $2\log\Lambda$, which has an asymptotic
$\chi_{(r - 1)(c - 1)}^2$ distribution.

\begin{example*}[COVID-19 deaths]
  Problems with the $\chi^2$ independence test:
  \begin{enumerate}[(1)]
    \item The $\chi^2$ approximation can be bad
      for large tables, where many cells have
      small counts. Rule of thumb: we need
      $N_{ij} \ge 5$ for all $i, j$.

      \myskip Solution (non-examinable): exact
      testing. Idea: under $H_0$, the margins
      $(N_{i+})$, $(N_{+j})$ of $N$ are
      sufficient statistics for $p$. Therefore,
      under $H_0$, the conditional distribution
      of $N$ given its margins does not depend
      on $p$. An exact test compares the
      observed test statistic $2\log\Lambda(N)$
      with the distribution of this statistic
      over the set of tables with the same
      margins as $N$. This gives a test of
      \emph{exact} size $\alpha$.
    \item $2\log\Lambda$ can detect deviations
      from $H_0$ in any direction.
      $\implies$ Low power, especially when $r$
      and $c$ are large. This is why $H_0$ is
      not rejected in a test of size $1\%$ in
      the COVID-19 example. Solutions:
      \begin{enumerate}[(1)]
        \item Define a parametric alternative
          $H_1$ with fewer degrees of freedom.
        \item Lump categories in the table.
      \end{enumerate}
  \end{enumerate}
\end{example*}

\subsubsection*{Tests of Homogeneity}
Instead of assuming the grand total
$\sum_{i, j} N_{ij}$ is fixed, we assume the
row totals are fixed.
\begin{example*}
  $150$ patients, split into groups of $50$ for
  placebo, half-dose and full-dose. We record
  whether each patient improved (I), showed no
  difference (N.D.) or got worse (W).
  \begin{center}
    \begin{tabular}{p{4em}|c|c|c}
      & I & N.D. & W \\ \hline
      Placebo & \hspace{4em} & \hspace{4em} & \hspace{4em} \\
      Half & & & \\
      Full & & &
    \end{tabular}
  \end{center}
  Now the row totals are fixed. Null of
  homogeneity: the probability of each outcome
  is the same in each treatment group.

  \myskip Model:
  \[ (N_{i1}, \ldots, N_{ic}) \sim \Multinomial(n_{i+}, p_{i1}, \ldots, p_{ic}) \]
  independent for $i = 1, \ldots, r$. The
  parameters satisfy $\sum_j p_{ij} = 1$ for
  all $i$. \\
  $H_0$: $p_{1j} = p_{2j} = \cdots = p_{rj}$
  for all $j = 1, \ldots, c$. \\
  $H_1$: $(p_{i1}, \ldots, p_{ic})$ is a
  probability vector for each $i$.
  \begin{align*}
    L(p) &= \prod_{i = 1}^r \frac{n_{i+}!}{N_{i1}! \cdots N_{ic}!} p_{i1}^{N_{i1}} \cdots p_{ic}^{N_{ic}} \\
    l(p) &= \text{const} + \sum_{i, j} N_{ij} \log p_{ij}
  \end{align*}
  To find $2\log\Lambda$ we need to maximise
  $l(p)$ over $H_0$ and over $H_1$.

  \myskip $H_1$: use Lagrange multipliers with
  constraints $\sum_j p_{ij} = 1$ for all $i$.
  Then the mle is
  \[ \hat{p}_{ij} = \frac{N_{ij}}{n_{i+}} \]
  $H_0$: let $p_j = p_{1j} = \cdots = p_{rj}$.
  Then
  \[ l(p) = \text{const} + \sum_{j = 1}^c N_{+j} \log p_j \]
  hence the mle is
  $\hat{p}_j = \frac{N_{+j}}{n_{++}}$, where
  $n_{++} = \sum_i n_{i+}$. Thus
  \[ 2\log\Lambda = 2 \sum_{i, j} N_{ij} \log \left( \frac{N_{ij}}{n_{i+} N_{+j} / n_{++}} \right) \]
  This is exactly the same statistic as
  $2\log\Lambda$ for the independence test. Let
  $o_{ij} = N_{ij}$ and
  $e_{ij} = n_{i+} \hat{p}_j = n_{i+} \frac{N_{+j}}{n_{++}}$.
  \begin{align*}
    \implies 2\log\Lambda &= 2\sum_{i, j} o_{ij} \log \left( \frac{o_{ij}}{e_{ij}} \right) \\
    &\approx \sum_{i, j} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}
  \end{align*}
  This is also the same as Pearson's statistic
  for the independence test.
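  \myskip For illustration (non-examinable),
  here is a minimal Python sketch computing
  $2\log\Lambda$ and Pearson's statistic for a
  hypothetical filled-in version of the table
  above; the counts are invented for this
  sketch.
\begin{verbatim}
import numpy as np
from scipy.stats import chi2

# Hypothetical counts (rows: placebo, half,
# full; columns: I, N.D., W); row totals 50.
N = np.array([[20, 20, 10],
              [25, 18,  7],
              [32, 13,  5]])

n_row = N.sum(axis=1, keepdims=True)  # n_{i+}
n_col = N.sum(axis=0, keepdims=True)  # N_{+j}
n_tot = N.sum()                       # n_{++}

# Expected counts e_{ij} = n_{i+} N_{+j} / n_{++}
e = n_row * n_col / n_tot

G = 2 * np.sum(N * np.log(N / e))  # 2 log Lambda
X2 = np.sum((N - e) ** 2 / e)      # Pearson
df = (N.shape[0] - 1) * (N.shape[1] - 1)

# The two statistics are close; compare to the
# chi^2_{(r-1)(c-1)} distribution for a p-value.
print(G, X2, chi2.sf(G, df))
\end{verbatim}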
\end{example*}

\noindent Wilks' theorem implies $2\log\Lambda$
is approximately $\chi_d^2$, where
\begin{align*}
  d &= \dim(\Theta_1) - \dim(\Theta_0) \\
  &= (c - 1)r - (c - 1) \\
  &= (c - 1)(r - 1)
\end{align*}
So the asymptotic distribution of
$2\log\Lambda$ is also the same as in the
independence test.

\myskip Therefore testing independence and
testing homogeneity with size $\alpha$ always
lead to the same conclusion.

\subsubsection*{Relationship between tests and confidence sets}
Define the \emph{acceptance region} $A$ of a
test to be the complement of the critical
region $C$. Let
$X \sim f_X(\bullet \mid \theta)$ for some
$\theta \in \Theta$.
\begin{theorem*}
  \begin{enumerate}[(1)]
    \item Suppose that for each
      $\theta_0 \in \Theta$ there is a test of
      $H_0$: $\theta = \theta_0$ of size
      $\alpha$ with acceptance region
      $A(\theta_0)$. Then the set
      \[ I(X) = \{\theta : X \in A(\theta)\} \]
      is a $100(1 - \alpha)\%$ confidence set.
    \item Suppose $I(X)$ is a
      $100(1 - \alpha)\%$ confidence set for
      $\theta$. Then
      \[ A(\theta_0) = \{x : \theta_0 \in I(x)\} \]
      is the acceptance region of a size
      $\alpha$ test for $H_0$:
      $\theta = \theta_0$.
  \end{enumerate}
\end{theorem*}
\begin{proof}
  In each part:
  \[ \theta_0 \in I(X) \iff X \in A(\theta_0) \]
  For part (1), we calculate:
  \begin{align*}
    \PP_{\theta_0}(I(X) \ni \theta_0)
    &= \PP_{\theta_0}(X \in A(\theta_0)) \\
    &= 1 - \PP_{\theta_0}(X \in C(\theta_0)) \\
    &= 1 - \alpha
  \end{align*}
  as desired. For part (2):
  \begin{align*}
    \PP_{\theta_0}(X \in C(\theta_0))
    &= \PP_{\theta_0}(X \not\in A(\theta_0)) \\
    &= \PP_{\theta_0}(I(X) \not\ni \theta_0) \\
    &= 1 - \PP_{\theta_0}(I(X) \ni \theta_0) \\
    &= 1 - (1 - \alpha) \\
    &= \alpha
  \end{align*}
  as desired.
\end{proof}
\begin{example*}
  $X_1, \ldots, X_n \iidsim \normaldist(\mu, \sigma_0^2)$,
  with $\sigma_0^2$ known. Then
  \[ I(X) = \left( \ol{X} \pm \frac{z_{\alpha/2} \sigma_0}{\sqrt{n}} \right) \]
  is a $100(1 - \alpha)\%$ confidence interval
  for $\mu$. The corresponding test of $H_0$:
  $\mu = \mu_0$ against $H_1$:
  $\mu \neq \mu_0$ has critical region
  \[ C(\mu_0) = \left\{ x : \left| \frac{\sqrt{n} (\ol{x} - \mu_0)}{\sigma_0} \right| > z_{\alpha/2} \right\} \]
  and $\mu_0 \in I(X)$ precisely when $X$ lies
  in the acceptance region $A(\mu_0)$.
\end{example*}
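\noindent To illustrate this duality
numerically, the following is a minimal Python
sketch; the values of $n$, $\mu_0$, $\sigma_0$
and $\alpha$ below are invented for
illustration. The interval contains $\mu_0$
exactly when the test accepts $H_0$.
\begin{verbatim}
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu0, sigma0, n, alpha = 0.0, 1.0, 25, 0.05

x = rng.normal(mu0, sigma0, size=n)
xbar = x.mean()
z = norm.ppf(1 - alpha / 2)  # z_{alpha/2}

# 100(1 - alpha)% confidence interval for mu
half = z * sigma0 / np.sqrt(n)
ci = (xbar - half, xbar + half)

# Size-alpha test of H_0: mu = mu0
reject = abs(np.sqrt(n) * (xbar - mu0) / sigma0) > z

# Duality: mu0 in I(X) iff X in A(mu0)
assert (ci[0] <= mu0 <= ci[1]) == (not reject)
print(ci, "reject" if reject else "accept")
\end{verbatim}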