Reading
The material on the following site is important and should be read either before or after studying this section.
Principal Component Analysis Explained Visually
Victor Powell (with text by Lewis Lehe)
This short web article provides a basic introduction to PCA illustrated with graphical examples. It is highly recommended for its intuitive yet thorough presentation, which emphasizes the “why” of PCA and shows how it can be employed for data exploration.
The following material is optional. However, interested readers are encouraged to peruse it.
Principal Component Analysis (PCA) in Python
Aditya Sharma
January 1, 2020
This web article provides a thorough, intuitive introduction to principal component analysis (PCA) without complex mathematics. The main concepts of PCA are illustrated through Python examples that use scikit-learn library functions and popular data sets.
Clustering with Scikit-Learn in Python
Thomas Jurczyk
September 29, 2021
This web article provides a thorough demonstration of k-means clustering applied to data on Greco-Roman authors of the ancient world, with principal component analysis used to further analyze the results. The example is implemented with the scikit-learn package in Python and includes many code snippets. The article also covers mathematical details and more advanced techniques, which the reader may skip. For the purposes of the present discussion, the most useful parts are the treatment of k-means clustering and principal component analysis, the explanation of the application at the intersection of literary studies and classical studies, and the instructive Python code.
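As a rough orientation before reading, the general workflow the article describes (cluster with k-means, then use PCA to inspect the result in two dimensions) can be sketched as follows. This is a minimal sketch and not the article's code: the synthetic data from `make_blobs`, the choice of three clusters, and all variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; this is not the ancient-authors data set).
X, _ = make_blobs(n_samples=300, n_features=5, centers=3, random_state=0)

# Standardize so that every feature contributes on a comparable scale.
X_scaled = StandardScaler().fit_transform(X)

# Cluster in the full 5-dimensional feature space.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# Project onto the first two principal components for plotting or inspection.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print("Projected shape:", X_2d.shape)
print("Points per cluster:", np.bincount(labels))
print("Variance retained by 2 components:", pca.explained_variance_ratio_.sum())
```

The columns of `X_2d` can then be drawn as a scatter plot colored by `labels`, which is one common way to check visually whether the clusters are well separated.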
The following website may be used for reference.
PCA with Scikit Learn in Python
Python Code
This section uses the Python code:
K-Means_Ancient_Authors_Example.py (Jupyter Notebook K-Means_Ancient_Authors_Example.ipynb) and the data file DNP_ancient_authors.csv.
PCA_tSNE_Example.py (Jupyter Notebook PCA_tSNE_Example.ipynb).
K-Means_tSNE_Example.py (Jupyter Notebook K-Means_tSNE_Example.ipynb).
Note: Because the data generated are random, the visualizations may differ from those shown in the text.
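If reproducible output is wanted, one common approach is to fix the seed of the random number generator. The sketch below only illustrates the idea with NumPy. The listed scripts are not reproduced here, so whether and how they accept a seed is not known, and the helper `make_data` is a hypothetical name used for this example.

```python
import numpy as np

def make_data(seed=None):
    """Hypothetical helper: draw 100 two-dimensional points from a normal distribution."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(100, 2))

# The same seed gives the same data on every run.
a = make_data(seed=42)
b = make_data(seed=42)

# Without a seed, the data (and any plot made from them) change from run to run.
c = make_data()

print("Same seed, identical data:", np.array_equal(a, b))
```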