
Reading

The following material is optional.  However, interested readers are encouraged to peruse it.

Natural Language Processing

Venugopal Adep

This post provides a description of some of the natural language processing functions available in the Python NLTK library, with examples.

View: Natural Language Processing

 

 

The following material should be used for reference.

Text Analysis in R

K. Welbers, W. van Atteveldt, and K. Benoit

Published in Communication Methods and Measures, Vol. 11, No. 4, 2017, pp. 245–265.

This paper provides a thorough introduction to R operations, functions, and packages for text processing.  The different text processing steps are clearly identified and described.  Each function is illustrated with examples.  An online appendix with all source code presented in the paper is also available.

View: Text Analysis in R

View: Online Appendix

 

R for Beginners

Emmanuel Paradis

This document is a short but thorough introduction to the R language.  It contains a generous number of examples to help the reader understand programming concepts that may be unfamiliar.  R functions and data structures are also discussed.  As a searchable PDF file, the text is also useful as a reference.

View: R for Beginners

 

Example: Textual Data Visualization

Kenneth Benoit, Adam Obeng, and Stefan Müller

This article from the quanteda site describes and demonstrates examples of text visualizations.  A widely used example compares keywords in the inaugural speeches of U.S. presidents.

View: Example: Textual Data Visualization

 

Quanteda.  Quantitative Analysis of Textual Data

This website provides a reference for the quanteda package and demonstrates several examples.

View: Quanteda.  Quantitative Analysis of Textual Data

 

R Code

The R code described in this section is in the file TextAnalysis_Example.R and uses the data file subjectData.csv.  An interactive Jupyter notebook for this code is available in the file TextAnalysis_Example.ipynb.  Note that the values obtained by running this code may differ slightly from those presented in the text.


 

License


Digital Humanities Tools and Techniques I Copyright © 2022 by Mark P. Wachowiak is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.