# Theory

## Feature dimensionality reduction

Feature dimensionality reduction is an application of unsupervised learning: it maps n-dimensional data to m-dimensional data ($n > m$). It is used in data compression, among other fields.

## Principal component analysis (PCA)

Principal component analysis (PCA) is a commonly used feature dimensionality reduction method. For n-dimensional data $A$, PCA produces m-dimensional data $B$ ($n > m$) such that $B = f(A)$ and $A \approx g(f(A))$, where $f$ is the encoding function and $g$ is the decoding function.
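To make $f$ and $g$ concrete, here is a minimal NumPy sketch. It assumes the common linear formulation in which the decoder is $g(c) = Dc + \mu$, where $D$ is an $n \times m$ matrix whose orthonormal columns are the top $m$ principal directions and $\mu$ is the data mean. The helpers `pca_fit`, `encode` and `decode` are hypothetical names for illustration, not a library API.

```python
import numpy as np

def pca_fit(A, m):
    """Return the mean and the top-m principal directions (as columns) of A (rows = samples)."""
    mu = A.mean(axis=0)
    # SVD of the centered data; rows of Vt are principal directions sorted by singular value
    _, _, Vt = np.linalg.svd(A - mu, full_matrices=False)
    D = Vt[:m].T  # shape (n, m), orthonormal columns
    return mu, D

def encode(X, mu, D):
    # f(x) = D^T (x - mu): project onto the principal directions
    return (X - mu) @ D

def decode(C, mu, D):
    # g(c) = D c + mu: map codes back to the original space
    return C @ D.T + mu

rng = np.random.default_rng(0)
# 200 samples in n = 5 dimensions that mostly vary along 2 directions, plus small noise
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
A = latent @ mixing + 0.01 * rng.normal(size=(200, 5))

mu, D = pca_fit(A, m=2)
B = encode(A, mu, D)      # m = 2 dimensional codes
A_hat = decode(B, mu, D)  # approximately A
print(B.shape, np.abs(A - A_hat).max())
```

Because the synthetic data is essentially two-dimensional, two components reconstruct it up to the small added noise.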

When performing principal component analysis, the optimization goal for a sample $x$ is $c^* = \arg\min_{c} \|x - g(c)\|_2$, where $c$ is the encoding of $x$ and $g$ is the decoding function, so $g(c)$ is the reconstruction of $x$.
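Assuming a linear decoder $g(c) = Dc$ whose matrix $D$ has orthonormal columns ($D^\top D = I$), with the data centered, this objective has a closed-form solution. Minimizing the L2 norm is equivalent to minimizing its square, so:

$$
\begin{aligned}
c^* &= \arg\min_{c} \|x - Dc\|_2^2 \\
    &= \arg\min_{c} \left( x^\top x - 2\,x^\top D c + c^\top D^\top D c \right) \\
    &= \arg\min_{c} \left( -2\,x^\top D c + c^\top c \right)
\end{aligned}
$$

The last step drops $x^\top x$, which does not depend on $c$, and uses $D^\top D = I$. Setting the gradient $-2D^\top x + 2c$ to zero gives $c^* = D^\top x$. Under these assumptions the encoder is therefore $f(x) = D^\top x$ and the reconstruction is $g(f(x)) = DD^\top x$.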

# Code

## Load the data set

```python
import numpy as np
import pandas as pd

# UCI optdigits: each row holds 64 pixel-count features followed by the class label (column 64)
digits_train = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/optdigits/optdigits.tra', header=None)
digits_test = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/optdigits/optdigits.tes', header=None)
```
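If the UCI URLs are unreachable, scikit-learn bundles a copy of the optdigits test set (1,797 samples, 64 features) as `sklearn.datasets.load_digits`. The sketch below uses it as a stand-in and rebuilds the same column layout. This is a substitute, not the original data: the 70/30 split via `train_test_split` is an assumed choice, so split sizes and scores will differ from the ones reported below.

```python
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
# Same layout as the UCI CSV files: columns 0..63 are pixel counts, column 64 is the label
frame = pd.DataFrame(digits.data)
frame[64] = digits.target

# Hold out 30% as a substitute test set, keeping class proportions
digits_train, digits_test = train_test_split(
    frame, test_size=0.3, random_state=0, stratify=frame[64]
)
print(digits_train.shape, digits_test.shape)
```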

## Split features and labels

```python
# Columns 0..63 are the features; column 64 is the digit label
train_x, train_y = digits_train[np.arange(64)], digits_train[64]
test_x, test_y = digits_test[np.arange(64)], digits_test[64]
```
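The same split can be written by position with `DataFrame.iloc`, which makes the "64 features, then the label" layout explicit. It is shown here on a small hypothetical two-row frame so it runs without downloading anything:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with the optdigits layout: 64 feature columns + 1 label column
toy = pd.DataFrame(np.arange(2 * 65).reshape(2, 65) % 17)  # values 0..16, like pixel counts
toy[64] = [3, 7]  # overwrite the last column with class labels

features = toy.iloc[:, :64]  # columns 0..63
labels = toy.iloc[:, 64]     # column 64, as a Series
print(features.shape, labels.tolist())  # (2, 64) [3, 7]
```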

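## Principal component analysis

```python
from sklearn.decomposition import PCA

# Reduce the 64 pixel features to 20 principal components
estimator = PCA(n_components=20)
# Fit PCA on the training set only, then project both sets with the same components
pca_train_x = estimator.fit_transform(train_x)
pca_test_x = estimator.transform(test_x)
```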
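In scikit-learn terms, `transform` plays the role of the encoder $f$ and `inverse_transform` the role of the decoder $g$ from the theory section. This sketch uses the bundled `load_digits` data as a stand-in for the UCI files and measures how much variance 20 components keep and how close $g(f(x))$ is to $x$. The exact values depend on the data, so none are claimed here.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data  # shape (1797, 64)

pca = PCA(n_components=20)
codes = pca.fit_transform(X)                  # f(x): shape (1797, 20)
reconstructed = pca.inverse_transform(codes)  # g(f(x)): shape (1797, 64)

# Fraction of the total variance retained by the 20 components
kept = pca.explained_variance_ratio_.sum()
# Mean squared reconstruction error per pixel
mse = np.mean((X - reconstructed) ** 2)
print(f"variance kept: {kept:.3f}, reconstruction MSE: {mse:.3f}")
```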

## Training a support vector machine

```python
from sklearn.svm import LinearSVC
```

### Raw data

```python
svc = LinearSVC()
svc.fit(X=train_x, y=train_y)
svc.score(test_x, test_y)
# 0.9393433500278241 (mean accuracy on the test set)
```
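`LinearSVC.score` returns the mean accuracy, i.e. the fraction of test samples whose predicted label equals the true label. A small sketch on the bundled `load_digits` stand-in (so its number will not match the one above) checks that this equals a manual comparison and `sklearn.metrics.accuracy_score`. The `max_iter=10000` setting is an assumption to reduce convergence warnings on unscaled pixel counts.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

svc = LinearSVC(max_iter=10000)
svc.fit(X_train, y_train)

pred = svc.predict(X_test)
acc_score = svc.score(X_test, y_test)  # built-in mean accuracy
acc_manual = np.mean(pred == y_test)   # fraction of correct predictions
print(acc_score, acc_manual, accuracy_score(y_test, pred))
```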

### PCA-processed data

```python
svc_pca = LinearSVC()
svc_pca.fit(pca_train_x, train_y)
svc_pca.score(pca_test_x, test_y)
# 0.91819699499165275 (mean accuracy on the test set)
```
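The two steps (fit PCA on the training features, then train `LinearSVC` on the codes) can also be chained with scikit-learn's `make_pipeline`, which fits PCA only on the training data and routes the test data through the same `transform` automatically. This is a sketch on the bundled `load_digits` stand-in; 20 components mirrors the code above, while the split and `max_iter` are assumed settings.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# fit() runs PCA.fit_transform on X_train, then LinearSVC.fit on the 20-dimensional codes
model = make_pipeline(PCA(n_components=20), LinearSVC(max_iter=10000))
model.fit(X_train, y_train)

# score() applies PCA.transform to X_test before LinearSVC.score
print(model.score(X_test, y_test))
```

Keeping both steps in one estimator avoids accidentally fitting PCA on test data, and the whole pipeline can be passed to cross-validation utilities as a single model.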
Reference: "Principal component analysis with sklearn: theory and code implementation", Tencent Cloud Developer Community, https://cloud.tencent.com/developer/article/1110770