In the digital humanities, there is a constant need to turn images and PDF files into plain text to apply analyses such as topic modelling, named entity recognition, and other techniques.
- Digital object identifier: 10.5334/jors.164
- International standard serial number: 2049-9647
- The final version of this article, as published in the Journal of Open Research Software, can be viewed online at: https://openresearchsoftware.metajnl.com/articles/10.5334/jors.164/
Citation and reuse
This is a suggested citation. Consult the appropriate style guide for specific citation guidelines.
Damerow, J., Peirson, B. R., & Laubichler, M. D. (2017). The Giles Ecosystem – Storage, Text Extraction, and OCR of Documents. Journal of Open Research Software, 5. doi:10.5334/jors.164