Scout Archives

Browse Resources

(4 classifications) (2 resources)

Text processing (Computer science)

Classification
Research (2)
Societies, etc. (2)
Software (1)
Technological innovations. (1)

Resources

Tabula

Tabula is a tool for extracting data from PDFs. It is often used to pull data out of government reports for aggregation and analysis, and journalists at ProPublica, The Times of London, Foreign Policy, and other outlets have used it this way. To use Tabula, users draw a box around the region of a text-based PDF (not a scanned document) from which they wish to extract data. Tabula then produces a...

https://tabula.technology/
Text Encoding Initiative

The Text Encoding Initiative (TEI) is an "international and interdisciplinary standard that helps libraries, museums, publishers, and individual scholars represent all kinds of literary and linguistic texts for online research and teaching." The site offers information about the TEI consortium; recommendations for the encoding of textual material in various languages; TEI Tutorials that provide...

https://tei-c.org/