Reading
The following websites may be used for reference.
K-means Algorithm
This scikit-learn documentation page describes the library's implementation of k-means clustering.
sklearn.feature_extraction.text.TfidfVectorizer
sklearn.feature_extraction.text.TfidfTransformer
Clustering documents with TFIDF and KMeans
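The references above cover TF-IDF vectorization and k-means clustering. The following is a minimal sketch of how the two might be combined in scikit-learn. It is not taken from the course materials: the toy corpus, the choice of two clusters, and the variable names are assumptions made purely for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus (an assumption, not course data).
documents = [
    "the cat sat on the mat",
    "cats and kittens purr",
    "the cat chased a kitten",
    "stock markets fell sharply",
    "investors sold stock today",
    "markets and investors panic",
]

# Turn each document into a TF-IDF weighted term vector,
# dropping common English stop words.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(documents)

# Cluster the TF-IDF vectors. The number of clusters (2) is an
# assumption for this toy corpus; random_state makes the run repeatable.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(tfidf)

# Show which cluster each document was assigned to.
for label, text in zip(labels, documents):
    print(label, text)
```

TfidfVectorizer is equivalent to applying CountVectorizer followed by TfidfTransformer, which is why both classes appear in the reference list. On a corpus this small the cluster assignments depend heavily on exact word overlap, so the result should be read as a demonstration of the workflow rather than a meaningful grouping.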
The following material is optional. However, interested readers are encouraged to peruse it.
Clustering with Scikit-Learn in Python
Thomas Jurczyk
September 29, 2021
This web article gives a thorough demonstration of k-means clustering applied to ancient Greco-Roman authors, with principal component analysis used to analyze the results further. The example is implemented with the scikit-learn package in Python and is accompanied by many code snippets. The article also covers mathematical details and more advanced techniques, which the reader may skip. For the present discussion, the most useful parts of the article are its treatment of k-means clustering and principal component analysis, its explanation of an application at the intersection of literary studies and classical studies, and its instructive Python code.
Read: Clustering with Scikit-Learn in Python
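As a rough orientation before reading the article, the sketch below shows one common way to pair k-means clustering with principal component analysis in scikit-learn. It does not reproduce the article's code or data: the synthetic data from make_blobs, the choice of three clusters, and all variable names are assumptions for illustration only.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; the article uses its own data set).
features, _ = make_blobs(n_samples=150, n_features=5, centers=3, random_state=0)

# Standardize the features so that no single feature dominates the distances.
scaled = StandardScaler().fit_transform(features)

# Cluster in the full feature space. Three clusters is an assumption here.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(scaled)

# Project onto the first two principal components so the clusters
# can be inspected, for example in a 2-D scatter plot.
pca = PCA(n_components=2)
projected = pca.fit_transform(scaled)

print(projected.shape)
print(pca.explained_variance_ratio_)
```

Here the clustering is done before the projection, so PCA serves only as a way to view the result in two dimensions. Running PCA first and clustering the reduced data is another option.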
Python Code
This section uses the following Python code:
K-Means_Example.py (Jupyter Notebook K-Means_Example.ipynb).
K-Means_Ancient_Authors_Example.py (Jupyter Notebook K-Means_Ancient_Authors_Example.ipynb) and the data file DNP_ancient_authors.csv.