
The Productive Scholar: Tools for Text Analysis in the Humanities

Text Analysis with NLTK Cheatsheet

Topic: Tools for Text Analysis in the Humanities
Speaker: Ben Johnston

Time: Thursday, April 3, 12:00 PM – 1:00 PM
Location: New Media Center, 130 Lewis Library, First Floor


A sequel to last semester’s ‘Tools for Text Analysis in the Humanities’, this session will give participants a brief yet hands-on introduction to NLTK, the Natural Language Toolkit. This extension to the popular Python programming language is geared specifically toward computational work with written human language data. In this introduction, we will use tools from this library to tokenize a corpus into sentences, n-grams, and words, create word frequency lists, view concordances, and do part-of-speech tagging. In doing so, this session will also serve as a very gentle introduction to the Python programming language. Absolutely no experience with Python or with programming is expected or required.
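To give a feel for the operations listed above, here is a minimal sketch that uses only the Python standard library rather than NLTK itself. The function names and the sample sentence are made up for illustration, and the regular expressions are rough heuristics. NLTK offers more robust equivalents (for example `nltk.sent_tokenize`, `nltk.word_tokenize`, `nltk.ngrams`, `nltk.FreqDist`, and `nltk.Text(...).concordance(...)`); part-of-speech tagging (`nltk.pos_tag`) needs a trained model and is not attempted here.

```python
import re
from collections import Counter


def sentences(text):
    """Split after sentence-ending punctuation followed by whitespace (a rough heuristic)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(text):
    """Lowercase the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def concordance(tokens, target, width=2):
    """List each occurrence of target with `width` tokens of context on either side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Made-up sample text, purely for illustration.
sample = "The whale surfaced. The crew watched the whale. Nobody spoke!"
tokens = words(sample)
freq = Counter(tokens)  # word frequency list

print(sentences(sample))
print(freq.most_common(3))
print(ngrams(tokens, 2)[:3])
print(concordance(tokens, "whale"))
```

Each helper is a few lines long, which is part of the appeal of Python for this kind of work: the same ideas scale to a full corpus once the crude tokenizers are swapped for NLTK's.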

SESSION RECAP: Presenter Ben Johnston started by providing a contextual framework for this session, which focused on the Natural Language Toolkit (NLTK) and Python. He emphasized the impossibility of actually learning Python in an hour, and the importance of ‘knowledge sharing’ with peers and others by those who have developed a sincere enthusiasm for the applications of the digital tools with which they’ve become familiar. Knowledge sharing requires knowledge, but not at the expert level. Digital humanists should be encouraged to share knowledge even while they themselves are still learning (as you will likely never stop learning). Doing so reinforces learning and helps build community, both important aspects of gaining competency in the digital humanities. Here’s an excerpt from Ben’s introduction:

See Text in Whole New Way: Text Visualization Tools

Data mining, concordances, and word frequencies can all be used to analyze text, and the results are usually displayed as text too. Sometimes, though, these results are hard to read and track, and it is hard to see correlations and relationships between bodies of text and words. Text visualization adds another dimension to data mining a text. You can quickly see how many words make up a text, which words frequently appear next to other words, and what the overall theme of a text and its corpus is. The tools listed below will help you get started with building a word frequency list and visualizing your text, for the most part simply and easily.
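As a rough illustration of moving from a word frequency list to a visual form, the sketch below draws a plain-text bar chart using only the Python standard library. It is not one of the tools the post goes on to list; the function name `frequency_bars` and the sample sentence are assumptions made for this example.

```python
import re
from collections import Counter


def frequency_bars(text, top=5, max_width=30):
    """Return one line per word: the word, a bar scaled to its count, and the count."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    ranked = counts.most_common(top)
    if not ranked:
        return []
    biggest = ranked[0][1]  # the most frequent word gets the full-width bar
    label_width = max(len(word) for word, _ in ranked)
    lines = []
    for word, count in ranked:
        bar = "#" * max(1, round(count / biggest * max_width))
        lines.append(f"{word.ljust(label_width)} {bar} {count}")
    return lines


text = "to be or not to be that is the question"
for line in frequency_bars(text, top=3, max_width=10):
    print(line)
```

Even a crude chart like this makes the relative weight of each word visible at a glance, which is the basic idea the graphical tools build on.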

Open Source Word Processors That Keep Arabic Characters Together

We recently came across an issue a faculty member was having with students using Arabic in Microsoft Word on their Mac computers. When they typed words, the characters did not connect as they should (each letter was separate). In our search to find out why this was happening, we found that Microsoft Office no longer supports Arabic properly in its latest Mac version. We did find two open source word processors in which the characters were not separated, AbiWord and NeoOffice. Both worked well, but NeoOffice was able to open .docx files while AbiWord only opened .doc files. To check out the software (free and open source), click on the links below:

NeoOffice: http://www.neooffice.org

AbiWord: http://www.abisource.com/

How to Render Arabic Text Properly in Flash CS4 with Actionscript

I have been working on a project that requires Arabic text in a Flash website. The issue is that every time we add Arabic text, some of the letters get flipped when we export the movie (.swf file). Here is a workaround that my boss (Ben Johnston) discovered. It uses ActionScript and HTML code. The steps are below:

1. Create a text file with all the Arabic text you want to put into the .fla file. Use HTML markup to create paragraphs, text alignment, etc. At the beginning of each line, write this ActionScript code: instance_name = "html text";

For example:
about_me_text = "<p><b style='font-size:200%;'>About Me</b></p><br />";
about_me_text += "<p>Author: Me<br>Department of Me Studies<br>Myself University <br></p><br />";
about_me_text += "<p>More text<br></p><br />";

Note: Every line after the first must use += rather than =, so that it appends to the text instead of replacing it.

2. Go to the layer where you want to add the text. Add a text box.

3. Open up the ActionScript window for that layer. Paste in your text (with the ActionScript and HTML markup from step 1).

4. Click on the text box. In the text box properties, choose Dynamic Text.

5. In Dynamic Text, there is an instance name text box. Type in the instance name used in the ActionScript code. For this example, that would be: about_me_text

6. Under Paragraph in the text properties, in Behavior choose Multiline.

7. Under Options in the Text Properties, type the same instance name into the Variable text box. In this example it would be: about_me_text. (You can copy and paste the text you entered in step 5.)

8. Render your movie.

Note: You can change the color, font type, and size in the Text Properties menu too.
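If there is a lot of text, the ActionScript lines described in step 1 can be generated rather than typed by hand. The following is a minimal Python sketch, not part of the original workaround: the helper name `actionscript_lines` and the sample chunks are hypothetical. It writes the first statement with `=` and every later one with `+=`, and escapes double quotes and backslashes so they do not end the string early.

```python
def actionscript_lines(variable, html_chunks):
    """Build one ActionScript statement per HTML chunk.

    The first statement assigns with `=`; later ones append with `+=`.
    Backslashes and double quotes inside the HTML are escaped so that
    they cannot terminate the ActionScript string literal.
    """
    lines = []
    for index, chunk in enumerate(html_chunks):
        escaped = chunk.replace("\\", "\\\\").replace('"', '\\"')
        operator = "=" if index == 0 else "+="
        lines.append(f'{variable} {operator} "{escaped}";')
    return lines


# Hypothetical sample content; real chunks would hold the Arabic text.
chunks = [
    "<p><b>About Me</b></p><br />",
    "<p>Author: Me<br>Department of Me Studies</p>",
]
script = "\n".join(actionscript_lines("about_me_text", chunks))
print(script)
```

When saving the result to a file for pasting into Flash, opening the file with `encoding="utf-8"` keeps the Arabic characters intact.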

MIT Lecture Browser



The MIT Lecture Browser lets you search for video lectures using text keywords and/or categories, much as you would with a search engine such as Google. It finds any video that contains the keyword or falls in the category, and shows you the transcript of each lecture with the places where the keywords occur. The transcript also has buttons you can click to make your browser play the part of the lecture video where that keyword occurs. You will need RealPlayer for your web browser to view the lecture videos. To search for videos or learn more, click on the link below: