Description

In the digital humanities, researchers constantly need to convert images and PDF files into plain text before they can apply analyses such as topic modelling and named entity recognition.

    Details

    Date Created: 2017-09-28
    Resource Type: Text
    Identifier:
    • Digital object identifier: 10.5334/jors.164
    • International standard serial number: 2049-9647

    Citation and reuse

    Cite this item

    This is a suggested citation. Consult the appropriate style guide for specific citation guidelines.

    Damerow, J., Peirson, B. R., & Laubichler, M. D. (2017). The Giles Ecosystem – Storage, Text Extraction, and OCR of Documents. Journal of Open Research Software, 5. doi:10.5334/jors.164
