Browse Resources

(3 classifications) (3 resources)

Text processing (Computer science)

Research (2)
Societies, etc. (2)
Technological innovations. (1)

Text Encoding Initiative

The Text Encoding Initiative (TEI) is an "international and interdisciplinary standard that helps libraries, museums, publishers, and individual scholars represent all kinds of literary and linguistic texts for online research and teaching." The site offers information about the TEI consortium; recommendations for the encoding of textual material in various languages; TEI Tutorials that provide...

TextArc: An Alternate Way to View a Text

TextArc is an unconventional tool that gives readers the opportunity to discover patterns and concepts in texts. Still in a developmental stage, the site lets readers draw on human visual processing, allowing intuition to help extract meaning from a text. Because every word is exposed at once, the eye is able to make connections and decipher meaning otherwise overlooked by normal...

Tabula

Tabula is a tool to extract data from PDFs. It is often used to extract data from government reports for aggregation and analysis. It has been used in this way by journalists at ProPublica, The Times of London, Foreign Policy, and others. To use Tabula, users draw a box around the region in a text-based PDF (not a scanned document) that they wish to extract data from. Tabula then produces a...