Examining the usage of New NLP Techniques to process Raw Text Data Entries

Description

In 2018, Google researchers published the BERT (Bidirectional Encoder Representations from Transformers) model, which has since served as a starting point for hundreds of NLP (Natural Language Processing) experiments and derivative models. BERT was pre-trained on masked language modelling and next-sentence prediction, but its capabilities extend to more common NLP tasks, such as language inference and text classification. Naralytics is a company that seeks to use natural language to assign users who create text to multiple categories, a modified version of classification. However, the text that Naralytics seeks to draw from exceeds the maximum length of 512 tokens that BERT supports. This report therefore discusses research into multiple BERT derivatives that seek to address this problem, and then implements a solution that addresses the multiple concerns attached to this kind of model.
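
To make the 512-token limit concrete, the following is a minimal sketch of one common workaround: splitting a long sequence of token ids into overlapping windows that each fit within the limit. It is not the solution the thesis implements, which this description does not specify. The function name `chunk_token_ids`, the `stride` parameter, and the two positions reserved for special tokens are assumptions made for illustration only.

```python
from typing import List

MAX_LEN = 512  # BERT's maximum input length, as stated in the description
SPECIAL = 2    # assumed: positions reserved for the [CLS] and [SEP] tokens


def chunk_token_ids(token_ids: List[int],
                    max_len: int = MAX_LEN,
                    stride: int = 128) -> List[List[int]]:
    """Split token ids into overlapping windows that fit within max_len.

    Hypothetical helper: `stride` is the number of tokens shared between
    consecutive windows, so context at a window boundary is not lost.
    """
    window = max_len - SPECIAL
    if window <= 0:
        raise ValueError("max_len must leave room for content tokens")
    if not 0 <= stride < window:
        raise ValueError("stride must be non-negative and smaller than the window")

    step = window - stride
    chunks: List[List[int]] = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + window])
        # Stop once the current window reaches the end of the sequence.
        if start + window >= len(token_ids):
            break
        start += step
    return chunks


if __name__ == "__main__":
    ids = list(range(1200))  # stand-in for the ids of a long text entry
    for i, chunk in enumerate(chunk_token_ids(ids)):
        print(i, len(chunk), chunk[0], chunk[-1])
```

Each window would then be passed to the model separately and the per-window predictions combined (for example by averaging), a design choice that trades extra inference passes for the ability to cover the whole text.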

Contributors: Ngo, Nicholas (Author) / Carter, Lynn (Thesis director) / Lee, Gyou-Re (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor) / Economics Program in CLAS (Contributor)

Created: 2023-05
