Principal component analysis (PCA) reduces the number of dimensions in large datasets to principal components that retain most of the original information. It does this by transforming potentially correlated variables into a smaller set of uncorrelated variables, called principal components.
Let \(X\) be the \((n,p)\) data matrix and \(X_c\) the centered matrix. The empirical variance-covariance matrix is then \(C = \frac{1}{n}X_c^TX_c\). We denote by \(\Lambda\) the vector of the eigenvalues of \(C\) (in decreasing order) and by \(U\) the \((p,p)\) orthogonal matrix of the eigenvectors: \[C=U\texttt{diag}(\Lambda) U^T.\] Then we have
The coordinates of the \(n\) observations in the new basis of the eigenvectors \((\vec{u}_1,\ldots,\vec{u}_p)\) are \[\Psi = X_cU\]
The coordinates of the \(p\) variables in the new basis of the eigenvectors \((\vec{v}_1,\ldots,\vec{v}_p)\) are \[\Phi = \sqrt{n}U\texttt{diag}(\sqrt{\lambda_1},\ldots,\sqrt{\lambda_p})\]
The total inertia (variance) is \[I = \texttt{trace}(C)=\sum_{i=1}^{p}\lambda_i\]
The variance of the variable \(v_i\) is \(\lambda_i\).
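The identities above can be checked numerically on a small example. The following is a minimal sketch, assuming a made-up \((5,3)\) matrix `X_demo` (it is not the data used in this document) and illustrative variable names; `Symmetric` and `Diagonal` come from the standard `LinearAlgebra` library.

```julia
using LinearAlgebra, Statistics

# Hypothetical toy data: 5 observations, 3 variables (illustration only)
X_demo = [2.0 1.0 0.5;
          3.0 4.0 1.5;
          5.0 2.0 2.5;
          7.0 6.0 1.0;
          8.0 7.0 4.5]
n, p = size(X_demo)

# Center the columns and build the empirical covariance matrix C = (1/n) Xc' Xc
Xc = X_demo .- mean(X_demo, dims=1)
C = (1/n) * Xc' * Xc

# Eigen-decomposition of the symmetric matrix C, then reorder in decreasing order
F = eigen(Symmetric(C))
order = sortperm(F.values, rev=true)
Λ = F.values[order]
U = F.vectors[:, order]

Ψ = Xc * U                              # coordinates of the observations
Φ = sqrt(n) * U * Diagonal(sqrt.(Λ))    # coordinates of the variables

# (1/n) Ψ' Ψ = U' C U = diag(Λ): the new variables are uncorrelated, with variances λ_i
check_var = (1/n) * Ψ' * Ψ ≈ Diagonal(Λ)
# The total inertia equals the sum of the eigenvalues
check_trace = tr(C) ≈ sum(Λ)
println(check_var, " ", check_trace)
```

Both checks should hold up to floating-point error, since \(\frac{1}{n}\Psi^T\Psi = U^TCU = \texttt{diag}(\Lambda)\).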
My PCA function
```julia
using LinearAlgebra, Statistics

"""
    my_PCA(X; normed=false)

Compute the PCA of the data.

Input
    X : (n,p) Matrix of reals
        n = number of observations
        p = number of variables
    normed : if true, each variable is also divided by its standard deviation

Output
    Λ : Vector of the p eigenvalues in decreasing order
    U : (p,p) Matrix of reals, eigenvectors in columns
    Ψ : (n,p) Matrix of reals, coordinates of the observations in the new basis
    Φ : (p,p) Matrix of reals, coordinates of the variables in the new basis
    I_total : Real, total inertia
    cum_var_ratio : p vector of reals, cumulative variance ratio
    covMat : (p,p) Matrix of reals, the matrix that was diagonalized
"""
function my_PCA(X::Matrix{<:Real}; normed=false)::Tuple{Vector{<:Real},Matrix{<:Real},Matrix{<:Real},Matrix{<:Real},Real,Vector{<:Real},Matrix{<:Real}}
    n, p = size(X)
    Λ = zeros(p); U = zeros(p,p); Ψ = zeros(n,p); Φ = zeros(p,p); I_total = 0; cum_var_ratio = zeros(p)

    # Center the data
    xbar = mean(X, dims=1)
    Xc = X - ones(n,1)*xbar
    covMat = (1/n)*Xc'*Xc
    if normed == true
        # Divide each column by its (uncorrected) standard deviation
        s = std(Xc, corrected=false, dims=1)
        Y = (Xc)./(ones(n,1)*s)
        covMat = (1/n)*Y'*Y
    end

    # Compute the total inertia
    I_total = tr(covMat)
    Λ, U = eigen(covMat)
    eigOrder = sortperm(Λ, rev=true)  # to obtain the eigenvalues in decreasing order
    Λ = Λ[eigOrder]

    # Cumulative variance ratios
    cum_var_ratio = Vector{Float64}(undef, p)
    for i in 1:p
        cum_var_ratio[i] = sum(Λ[1:i])/I_total
    end

    U = U[:, eigOrder]
    if normed == true
        Ψ = Y*U
        Φ = U*sqrt.(diagm(Λ))
    else
        Ψ = Xc*U
        Φ = U*sqrt.(n*diagm(Λ))
    end
    return Λ, U, Ψ, Φ, I_total, cum_var_ratio, covMat
end
```
```julia
using MultivariateStats

# MultivariateStats expects each column to be an observation, hence X'
model = fit(PCA, X', maxoutdim=4, pratio=0.999)
U = projection(model)
println("U = ")
display(U)
Ψ = MultivariateStats.transform(model, X')
println("Ψ = ")
display(Ψ)
display(Ψ' - my_Ψ)  # Each column of Ψ is an observation
display(U - my_U)
```
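When two PCA implementations are compared in this way, keep in mind that an eigenvector is only defined up to its sign, so a column of one result may be the negative of the matching column of the other even when both are correct. The sketch below shows one possible way to align the signs before taking a difference. `align_signs` is a hypothetical helper written for this illustration (it is not part of MultivariateStats), and the matrices `A` and `B` are toy values.

```julia
using LinearAlgebra

# Hypothetical helper: flip the sign of each column of A so that it points
# in the same direction as the matching column of B.
function align_signs(A::AbstractMatrix, B::AbstractMatrix)
    signs = [dot(A[:, j], B[:, j]) < 0 ? -1 : 1 for j in axes(A, 2)]
    return A .* signs'   # signs' is a row, so each column is scaled by its sign
end

# Toy check: B is A with its second column negated
A = [1.0 2.0; 3.0 4.0]
B = [1.0 -2.0; 3.0 -4.0]
aligned = align_signs(A, B)
display(aligned - B)
```

The same sign flips would have to be applied to the matching coordinates of the observations before comparing them.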
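\(Y\) is the centered and normed matrix: each column of \(X_c\) is divided by its standard deviation (computed with the \(\frac{1}{n}\) normalization).
\(R=\frac{1}{n}Y^TY\) is the correlation matrix of the data \(X\).
As before, \(\Lambda\) and \(U\) contain the eigenvalues (in decreasing order) and the eigenvectors, now of \(R\): \[R=U\texttt{diag}(\Lambda) U^T.\] Then we have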
The coordinates of the \(n\) observations in the new basis of the eigenvectors \((\vec{u}_1,\ldots,\vec{u}_p)\) are \[\Psi = YU\]
The coordinates of the \(p\) variables in the new basis of the eigenvectors \((\vec{v}_1,\ldots,\vec{v}_p)\) are \[\Phi = U\texttt{diag}(\sqrt{\lambda_1},\ldots,\sqrt{\lambda_p})\]
The total inertia (variance) is \[I = \texttt{trace}(R)=\sum_{i=1}^{p}\lambda_i = p\]
The variance of the variable \(v_i\) is \(\lambda_i\).
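The normed case can be checked the same way on a small example. This is a minimal sketch, assuming a made-up \((5,3)\) matrix `X_demo` (not the mineral water data) and illustrative variable names. It compares \(R\) with `cor` from the standard `Statistics` library and checks that the total inertia is \(p\).

```julia
using LinearAlgebra, Statistics

# Hypothetical toy data: 5 observations, 3 variables (illustration only)
X_demo = [2.0 1.0 0.5;
          3.0 4.0 1.5;
          5.0 2.0 2.5;
          7.0 6.0 1.0;
          8.0 7.0 4.5]
n, p = size(X_demo)

# Center, then divide each column by its uncorrected (1/n) standard deviation
Xc = X_demo .- mean(X_demo, dims=1)
s = std(X_demo, corrected=false, dims=1)
Y = Xc ./ s

R = (1/n) * Y' * Y                  # correlation matrix
same_as_cor = R ≈ cor(X_demo)       # should agree with Statistics.cor

# Eigen-decomposition of R, eigenvalues reordered in decreasing order
F = eigen(Symmetric(R))
order = sortperm(F.values, rev=true)
Λ = F.values[order]
U = F.vectors[:, order]

Ψ = Y * U                           # coordinates of the observations
Φ = U * Diagonal(sqrt.(Λ))          # coordinates of the variables

# Every normed variable has unit variance, so the total inertia is p
inertia_is_p = tr(R) ≈ p
# Φ Φ' = U diag(Λ) U' = R has ones on its diagonal, so each row of Φ has unit norm
rows_unit = all(sum(abs2, Φ, dims=2) .≈ 1)
println(same_as_cor, " ", inertia_is_p, " ", rows_unit)
```

Because each row of \(\Phi\) has unit norm, the variables lie on the unit sphere in the new basis, which is why they are usually drawn inside a correlation circle.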
Data
```julia
using DataFrames

# Data from Tomassone page 138 : mineral waters
X = [341  27   3  84  23   2
     263  23   9  91   5   3
     287   3   5  44  24  23
     298   9  23  96   6  11
     200  15   8  70   2   4
     250   5  20  71   6  11
     357  10   2  78  24   5
     311  14  18  73  18  13
     256   6  23  86   3  18
     186  10  16  64   4   9
     183  16  44  48  11  31
     398 218  15 157  35   8
     348  51  31 140   4  14
     168  24   8  55   5   9
     110  65   5   4   1   3
     332  14   8 103  16   5
     196  18   6  58   6  13
      59   7   6  16   2   9
     402 306  15 202  36   3
      64   7   8  10   6   8]
df = DataFrame(X, [:HCO3, :SO4, :Cl, :Ca, :Mg, :Na])
df[:, :Origins] = ["Aix-les-Bains", "Beckerish", "Cayranne", "Chambon", "Cristal-Roc",
                   "St Cyr", "Evian", "Ferita", "St Hyppolite", "Laurier",
                   "Ogeu", "Ondine", "Perrier", "Ribes", "Spa",
                   "Thonon", "Veri", "Viladreau", "Vittel", "Volvic"]
```