This will be a document showing the effect of dimensionality reduction on the accuracy of different classifiers.
The document will include simulations on high-dimensional datasets of different shapes.
Each dataset is synthesized with one of sklearn's established synthetic dataset generators. Each dataset has 1000 dimensions, of which only 2 carry the data; the remaining 998 are noise dimensions.
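The dataset construction can be sketched as follows: a 2-D shape from an sklearn generator is padded with 998 noise columns. This is a minimal sketch under assumptions. The helper `make_high_dim`, the Gaussian noise model, `noise_std=1.0`, the sample count, and the choice of moons/circles/blobs are all illustrative, not taken from the referenced notebook.

```python
import numpy as np
from sklearn.datasets import make_blobs, make_circles, make_moons


def make_high_dim(X2d, n_dims=1000, noise_std=1.0, random_state=0):
    """Hypothetical helper: embed a 2-D dataset in n_dims dimensions.

    Columns 0 and 1 hold the original 2-D signal; the remaining
    n_dims - 2 columns are i.i.d. Gaussian noise (assumed noise model).
    """
    rng = np.random.default_rng(random_state)
    noise = rng.normal(0.0, noise_std, size=(X2d.shape[0], n_dims - 2))
    return np.hstack([X2d, noise])


# Three 2-D "shapes" from sklearn's synthetic generators (illustrative choice).
shapes = {
    "moons": make_moons(n_samples=500, noise=0.1, random_state=0),
    "circles": make_circles(n_samples=500, noise=0.05, factor=0.5, random_state=0),
    "blobs": make_blobs(n_samples=500, n_features=2, centers=2, random_state=0),
}

# Each entry becomes (X with 1000 columns, y).
datasets = {name: (make_high_dim(X), y) for name, (X, y) in shapes.items()}

for name, (X, y) in datasets.items():
    print(name, X.shape)
```

Keeping the signal in the first two columns makes it easy to scatter-plot the "true" shape of each dataset for the dataset figure; shuffling the columns would be a reasonable variant if the classifiers should not see a fixed signal position.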
Questions to answer:
- Analyzing how dimensionality reduction helps classification accuracy, and how this differs across classifiers.
- Analyzing how classifier accuracy varies with the number of dimensions retained when reducing the original high-dimensional dataset.
Pipeline to be followed:
- Generating a high-dimensional dataset from sklearn's established synthetic dataset generators.
- Classifying the full-dimensional data and measuring accuracy as a baseline.
- Applying a dimensionality reduction technique, keeping a varying number of reduced dimensions.
- Measuring classification accuracy again on the reduced data at each iteration.
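The pipeline steps above can be sketched as a loop over classifiers and numbers of retained dimensions. This is a sketch under stated assumptions: the proposal does not name the reduction technique, so PCA is swapped in here; the moons data, `noise_std=1.0`, the three classifiers, the 70/30 split, and the list of dimension counts are illustrative choices, not the author's setup.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_moons
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Assumed setup: 2-D moons padded with 998 Gaussian noise dimensions.
rng = np.random.default_rng(0)
X2d, y = make_moons(n_samples=500, noise=0.1, random_state=0)
X = np.hstack([X2d, rng.normal(0.0, 1.0, size=(500, 998))])
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Illustrative classifiers; the proposal does not fix this list.
classifiers = {
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "SVM (RBF)": SVC(kernel="rbf"),
    "Random forest": RandomForestClassifier(n_estimators=50, random_state=0),
}
# None means "no reduction", i.e. the full 1000-D baseline.
n_components_list = [None, 2, 5, 10, 50, 100]

results = {}  # (classifier name, n_components) -> test accuracy
for name, clf in classifiers.items():
    for k in n_components_list:
        # Fresh, unfitted copy of the classifier for every configuration.
        steps = [clone(clf)]
        if k is not None:
            steps.insert(0, PCA(n_components=k, random_state=0))
        model = make_pipeline(*steps)
        model.fit(X_train, y_train)  # PCA is fit on the training split only
        results[(name, k)] = model.score(X_test, y_test)

for (name, k), acc in results.items():
    label = "full (1000)" if k is None else k
    print(f"{name:14s} dims={label!s:12s} accuracy={acc:.3f}")
```

Wrapping PCA and the classifier in one pipeline keeps the test split out of the projection, so the reduced-dimension accuracies are not inflated by leakage. Note that PCA keeps the highest-variance directions, so whether it recovers the 2 signal dimensions depends on the noise scale relative to the signal.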
The output of the PR will be a figure showing the different datasets and comparing the accuracies of different classifiers with and without dimensionality reduction, plus a plot of accuracy versus the number of reduced dimensions.
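The accuracy-versus-dimensions plot could be drawn with a small matplotlib helper like the one below. It is a hypothetical sketch: the name `plot_accuracy_vs_dims`, its input format (classifier name mapped to one accuracy per dimension count), and the log-scaled x-axis are assumptions for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt


def plot_accuracy_vs_dims(accuracies, dims, title="Accuracy vs. reduced dimensions"):
    """Hypothetical helper: draw one accuracy curve per classifier.

    accuracies maps classifier name -> list of accuracies, one per entry in dims.
    Returns the Figure so the caller can save or show it.
    """
    fig, ax = plt.subplots(figsize=(6, 4))
    for name, accs in accuracies.items():
        ax.plot(dims, accs, marker="o", label=name)
    ax.set_xscale("log")  # dimension counts span orders of magnitude
    ax.set_xlabel("Number of retained dimensions")
    ax.set_ylabel("Test accuracy")
    ax.set_ylim(0.0, 1.05)
    ax.set_title(title)
    ax.legend()
    fig.tight_layout()
    return fig
```

A typical call passes the measured accuracies per classifier together with the dimension counts used, e.g. `plot_accuracy_vs_dims(acc_by_clf, [2, 5, 10, 50, 100, 1000]).savefig("accuracy_vs_dims.png")`, where `acc_by_clf` is a placeholder for the experiment's results.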
Experiments to follow:
https://github.com/NeuroDataDesign/team-forbidden-forest/blob/master/Parimal%20Joshi/Final_pr_2.ipynb