The Productive Scholar: Tools for Text Analysis in the Humanities

Text Analysis with NLTK Cheatsheet

Topic: Tools for Text Analysis in the Humanities
Speaker: Ben Johnston

Time: Thursday, April 3, 12:00 PM – 1:00 PM
Location: New Media Center, 130 Lewis Library, First Floor


A sequel to last semester’s ‘Tools for Text Analysis in the Humanities’, this session will give participants a brief yet hands-on introduction to NLTK, the Natural Language Toolkit. This library for the popular Python programming language is geared specifically toward computational work with written human language data. In this introduction, we will use tools from this library to tokenize a corpus into sentences, n-grams, and words, create word frequency lists, view concordances, and do part-of-speech tagging. In doing so, this session will also serve as a very gentle introduction to the Python programming language. Absolutely no experience with Python or with programming is expected or required.
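
To give a flavor of the operations listed above, here is a minimal sketch; it is not taken from the session materials. The sample text and variable names are made up for illustration. It uses only NLTK calls that work without extra downloads, so sentence splitting (sent_tokenize) and part-of-speech tagging (pos_tag) are left out, since both need separately downloaded NLTK data.

```python
from nltk import FreqDist
from nltk.text import Text
from nltk.tokenize import wordpunct_tokenize
from nltk.util import ngrams

# Hypothetical sample "corpus", invented for this illustration.
sample = ("The whale swam away. The ship followed the whale. "
          "The crew watched the ship.")

# Tokenize into words and punctuation with a simple regex-based tokenizer.
tokens = wordpunct_tokenize(sample)

# Keep alphabetic tokens only, lowercased, for counting.
words = [t.lower() for t in tokens if t.isalpha()]

# N-grams: here bigrams, i.e. pairs of adjacent words.
bigrams = list(ngrams(words, 2))

# Word frequency list, most frequent first.
freq = FreqDist(words)
for word, count in freq.most_common(3):
    print(word, count)

# Concordance: each occurrence of a word shown in its context.
hits = Text(tokens).concordance_list("whale", width=40)
for hit in hits:
    print(hit.line)
```

For a real corpus, the sample string would be replaced by text read from a file, and the same steps would apply unchanged.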

SESSION RECAP: Presenter Ben Johnston started by providing a contextual framework for this session, which focused on the Natural Language Toolkit (NLTK) and Python. He emphasized the impossibility of actually learning Python in an hour, and the importance of 'knowledge sharing' with peers and others by those who have developed a sincere enthusiasm for the digital tools with which they've become familiar. Knowledge sharing requires knowledge, but not at the expert level. Digital humanists should be encouraged to share knowledge even while they themselves are still learning (as you will likely never stop learning). Doing so reinforces learning and helps build community, both important aspects of gaining competency in the digital humanities. Here's an excerpt from Ben's introduction: