\chapter{Linear Model}

\section{Simple Linear Regression}
\[
  Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i
\]
In matrix form,
\[
  \Y = \X\beta + \varepsilon.
\]
\[
  \begin{pmatrix}
    Y_1 \\
    Y_2 \\
    \vdots \\
    Y_n
  \end{pmatrix}
  =
  \begin{pmatrix}
    1 & X_1 \\
    1 & X_2 \\
    \vdots & \vdots \\
    1 & X_n
  \end{pmatrix}
  \begin{pmatrix}
    \beta_0 \\
    \beta_1
  \end{pmatrix}
  +
  \begin{pmatrix}
    \varepsilon_1 \\
    \varepsilon_2 \\
    \vdots \\
    \varepsilon_n
  \end{pmatrix}
\]
\paragraph*{Assumptions}
\begin{enumerate}[label={\color{primary}{($A_\arabic*$)}}]
  \item $\varepsilon_i$ are independent;
  \item $\varepsilon_i$ are identically distributed;
  \item $\varepsilon_i \sim \Norm(0, \sigma^2)$ (homoscedasticity).
\end{enumerate}
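Taken together, these assumptions say that the error vector satisfies $\varepsilon \sim \Norm(0, \sigma^2 I_n)$, so the model can be summarized as
\[
  \Y \sim \Norm(\X\beta,\, \sigma^2 I_n).
\]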
\section{Generalized Linear Model}
\[
  g(\EE(Y)) = \X\beta
\]
with $g$ being the link function, for example:
\begin{itemize}
  \item Logistic regression: $g(v) = \log\left(\frac{v}{1 - v}\right)$, for instance for binary (boolean) outcomes;
  \item Poisson regression: $g(v) = \log(v)$, for instance for count data.
\end{itemize}
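For instance, inverting the logistic link expresses the mean response as a function of the linear predictor:
\[
  g(\EE(Y)) = \X\beta
  \quad\Longleftrightarrow\quad
  \EE(Y) = \frac{\exp(\X\beta)}{1 + \exp(\X\beta)}.
\]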
\subsection{Penalized Regression}
When the number of explanatory variables is large, in particular when $p \gg n$ (where $p$ is the number of explanatory variables and $n$ the number of observations), the parameters cannot be estimated by ordinary least squares.
To estimate them in this setting, we add a penalty term to the least squares criterion, as in Lasso regression, Ridge regression or Elastic Net; a sketch of the penalized criteria is given below.
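As an illustration (the notation, with tuning parameter $\lambda \geq 0$, is not fixed by the course), the Lasso and Ridge criteria add an $\ell_1$ or $\ell_2$ penalty to the least squares objective:
\[
  \min_{\beta}\ \norm{\Y - \X\beta}^2 + \lambda \sum_{j=1}^{p} \lvert\beta_j\rvert
  \quad\text{(Lasso)},
  \qquad
  \min_{\beta}\ \norm{\Y - \X\beta}^2 + \lambda \sum_{j=1}^{p} \beta_j^2
  \quad\text{(Ridge)}.
\]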
\subsection{Statistical Analysis Workflow}
\begin{enumerate}[label={\bfseries\color{primary} Step \arabic*.}]
  \item Graphical representation;
  \item ...
\end{enumerate}
For instance, with $n = 4$ observations and two explanatory variables, the model
\[
  \Y = \X\beta + \varepsilon,
\]
is written equivalently as
\[
  \begin{pmatrix}
    y_1 \\
    y_2 \\
    y_3 \\
    y_4
  \end{pmatrix}
  =
  \begin{pmatrix}
    1 & x_{11} & x_{12} \\
    1 & x_{21} & x_{22} \\
    1 & x_{31} & x_{32} \\
    1 & x_{41} & x_{42}
  \end{pmatrix}
  \begin{pmatrix}
    \beta_0 \\
    \beta_1 \\
    \beta_2
  \end{pmatrix}
  +
  \begin{pmatrix}
    \varepsilon_1 \\
    \varepsilon_2 \\
    \varepsilon_3 \\
    \varepsilon_4
  \end{pmatrix}.
\]
\section{Parameter Estimation}
\subsection{Simple Linear Regression}
\subsection{General Case}
If $\X^T\X$ is invertible, the OLS estimator is:
\begin{equation}
  \hat{\beta} = (\X^T\X)^{-1}\X^T\Y
\end{equation}
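As a quick numerical sanity check, here is a minimal sketch with simulated data (assuming NumPy; the design matrix and coefficients below are purely illustrative):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n = 100
# Design matrix: intercept column plus two explanatory variables
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, 2.0, -0.5])   # illustrative "true" coefficients
y = X @ beta + rng.normal(scale=0.3, size=n)

# OLS estimator: solve (X^T X) beta_hat = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)                     # close to [1.0, 2.0, -0.5]
\end{verbatim}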
\subsection{Ordinary Least Squares Algorithm}
We want to minimize the distance between $\X\beta$ and $\Y$:
\[
  \min_{\beta} \norm{\Y - \X\beta}^2
\]
(see \autoref{ch:elements-of-linear-algebra}).
\begin{align*}
  \Rightarrow\ & \X\hat{\beta} = \operatorname{proj}^{W}(\Y) \qquad \text{where $W$ is the subspace spanned by the columns of $\X$} \\
  \Rightarrow\ & \forall v \in W,\ v^T\Y = v^T\operatorname{proj}^{W}(\Y) \\
  \Rightarrow\ & \forall i:\ \X_i^T\Y = \X_i^T\X\hat{\beta} \qquad \text{where $\X_i$ is the $i$-th column of $\X$ and $\hat{\beta}$ the estimator of $\beta$} \\
  \Rightarrow\ & \X^T\Y = \X^T\X\hat{\beta} \\
  \Rightarrow\ & {\color{gray}(\X^T\X)^{-1}}\X^T\Y = {\color{gray}(\X^T\X)^{-1}}(\X^T\X)\hat{\beta} \\
  \Rightarrow\ & \hat{\beta} = (\X^T\X)^{-1}\X^T\Y
\end{align*}
This formula comes from the orthogonal projection of $\Y$ onto the vector subspace spanned by the explanatory variables $\X$:
$\X\hat{\beta}$ is the closest point to $\Y$ in the subspace generated by $\X$.
If $H$ is the projection matrix onto the subspace generated by $\X$, then $H\Y$ is the projection of $\Y$ onto this subspace, which corresponds to $\X\hat{\beta}$.
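Explicitly, combining this with the closed form of $\hat{\beta}$, the projection (hat) matrix is
\[
  H = \X(\X^T\X)^{-1}\X^T,
  \qquad
  H\Y = \X(\X^T\X)^{-1}\X^T\Y = \X\hat{\beta}.
\]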
\section{Coefficient of Determination: \texorpdfstring{$R^2$}{R\textsuperscript{2}}}
\begin{definition}[$R^2$]
  \[
    0 \leq R^2 = \frac{\norm{\X\hat{\beta} - \bar{\Y}\One}^2}{\norm{\Y - \bar{\Y}\One}^2} = 1 - \frac{\norm{\Y - \X\hat{\beta}}^2}{\norm{\Y - \bar{\Y}\One}^2} \leq 1
  \]
  $R^2$ is the proportion of the variation of $\Y$ explained by the model.
\end{definition}
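The two expressions of $R^2$ coincide because, as the model contains an intercept, the orthogonal projection yields the decomposition (Pythagoras' theorem, see \autoref{fig:scheme-orthogonal-projection})
\[
  \norm{\Y - \bar{\Y}\One}^2 = \norm{\X\hat{\beta} - \bar{\Y}\One}^2 + \norm{\Y - \X\hat{\beta}}^2,
\]
which also shows that $0 \leq R^2 \leq 1$.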
\begin{figure}
  \centering
  \includestandalone{figures/schemes/orthogonal_projection}
  \caption{Orthogonal projection of $\Y$ onto the plane spanned by $\X$. $\color{blue}a$ corresponds to $\norm{\X\hat{\beta} - \bar{\Y}\One}^2$ and $\color{blue}b$ corresponds to $\norm{\Y - \X\hat{\beta}}^2$.}
  \label{fig:scheme-orthogonal-projection}
\end{figure}
\begin{figure}
  \centering
  \includestandalone{figures/schemes/ordinary_least_squares}
  \caption{Ordinary least squares and regression line with simulated data.}
  \label{fig:ordinary-least-squares}
\end{figure}