Reading
The following material is optional, but interested readers are encouraged to explore it.
Google Ngram Viewer
This site provides information on the Google Ngram Viewer, also known as Google Books Ngram Viewer, which is one of the most widely used n-gram tools in scientific research and humanities scholarship.
View: Google Ngram Viewer
N-grams Data
This site provides 2-, 3-, 4-, and 5-grams based on the publicly available Corpus of Contemporary American English (COCA). The site supports a number of different queries, including sequences of noun + noun, verb + “the” + noun, three-word strings with a preposition in the middle position, and two-word strings in which either word has a specific beginning or ending.
View: ngrams.info
Big Data for the Humanities Using Google Ngrams. Discovering Hidden Patterns of Conceptual Trends
Shai Ophir
Read: Big Data for the Humanities Using Google Ngrams. Discovering Hidden Patterns of Conceptual Trends
Digital Humanities, Big Data, and Ngrams
Claude S. Fischer
June 30, 2013
Read: Digital Humanities, Big Data, and Ngrams
Understanding Word N-grams and N-gram Probability in Natural Language Processing
Sunny Srinidhi
Nov 27, 2019
Read: Understanding Word N-grams and N-gram Probability in Natural Language Processing
From DataFrame to N-Grams
Ednalyn C. De Dios
May 22, 2020
Read: From DataFrame to N-Grams
Python Code
The code to generate the bigram displayed in this section is discussed in the next course. Interested readers may refer to the script Bigram_Visualization_Example.py, which can be modified to calculate and display unigrams and trigrams. A Jupyter Notebook version of this code (Bigram_Visualization_Example.ipynb) is also available.
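For readers who want a feel for the idea before opening the script, the following is a minimal sketch of n-gram counting in plain Python. It is not the content of Bigram_Visualization_Example.py; the function name ngram_counts, the sample sentence, and the whitespace-based tokenization are assumptions made for illustration only. Changing the value of n switches between unigrams (n=1), bigrams (n=2), and trigrams (n=3).

```python
from collections import Counter


def ngram_counts(text, n=2):
    """Count the n-grams in a text, each n-gram stored as a tuple of words.

    Hypothetical helper for illustration; tokenization is a simple
    lowercase + whitespace split, which ignores punctuation handling.
    """
    tokens = text.lower().split()
    # Zip n staggered views of the token list to form sliding windows of size n.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams)


# Made-up sample sentence (an assumption, not data from the course).
sample = "the cat sat on the mat and the cat slept"

unigrams = ngram_counts(sample, n=1)
bigrams = ngram_counts(sample, n=2)
trigrams = ngram_counts(sample, n=3)

# Show the most frequent bigrams with their counts.
print(bigrams.most_common(3))
```

The resulting counts could then be passed to a plotting library to produce a bar chart of the most frequent n-grams.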