Reading
The following material is optional, but interested readers are encouraged to read it.
Natural Language Processing
Venugopal Adep
This post provides a description of some of the natural language processing functions available in the Python NLTK library, with examples.
View: Natural Language Processing
The following material should be used for reference.
Text Analysis in R
K. Welbers, W. van Atteveldt, and K. Benoit
Published in Communication Methods and Measures, Vol. 11, No. 4, 2017, pp. 245–265.
This paper provides a thorough introduction to R operations, functions, and packages for text processing. The different text processing steps are clearly identified and described, and each function is illustrated with examples. An online appendix containing all source code presented in the paper is also available.
View: Text Analysis in R
View: Online Appendix
R for Beginners
Emmanuel Paradis
This document is a short but thorough introduction to the R language. It contains a generous number of examples to help the reader understand programming concepts that may be unfamiliar. R functions and data structures are also discussed. As a searchable PDF file, the text is also useful as a reference.
View: R for Beginners
Example: Textual Data Visualization
Kenneth Benoit, Adam Obeng, and Stefan Müller
This article from the quanteda site describes and demonstrates examples of text visualizations. A widely used example is a comparison of keywords in the inaugural speeches of U.S. Presidents.
View: Example: Textual Data Visualization
Quanteda. Quantitative Analysis of Textual Data
This website provides a reference for the quanteda package and demonstrates several examples.
View: Quanteda. Quantitative Analysis of Textual Data
R Code
The R code described in this section is found in the file TextAnalysis_Example.R and uses the data file subjectData.csv. An interactive Jupyter notebook for this code is available in the file TextAnalysis_Example.ipynb. Note that the values obtained by running this code may differ slightly from those presented in the text.