Natural Language Processing (NLP) techniques have increasingly been used in finance, accounting, and economics research to analyze text-based information more efficiently and effectively than methods that rely primarily on human reading. The literature is rich with computational textual analysis techniques applied to standardized annual and quarterly financial filings, with promising results in identifying similarities between documents and between firms and in relating this information to other economic phenomena. Building on this prior research and extending NLP methods to other categories of financial documents, this project examines financial credit contracts, aiming to better understand the information conveyed by their text by assessing patterns and relationships between documents and between firms. The main methods used throughout this project are Term Frequency-Inverse Document Frequency (TF-IDF, to represent each document as a numerical vector), Cosine Similarity (to measure the similarity between contracts), and K-Means Clustering (to derive clusters of documents organically from the text of the contracts themselves). Using these methods, the project analyzes several dimensions: different grouping methodologies (external industry classifications and text-derived classifications), different levels of granularity (document-level and firm-level), different financial documents associated with a single firm (the relationship between credit contracts and 10-K product descriptions), and changes in mean cosine similarity distributions over time.
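The three methods named above fit together as a pipeline: TF-IDF turns each contract into a vector, cosine similarity compares pairs of vectors (and can be averaged over a group), and K-Means groups the vectors. The sketch below illustrates that pipeline in plain Python; it is not the project's implementation. Several things are assumptions: the function names are hypothetical, the TF-IDF variant (count divided by document length, times ln(N / df)) is only one of several common choices, the vectors are scaled to unit length before K-Means so that Euclidean distance tracks cosine distance, and the deterministic farthest-first initialization stands in for the randomized k-means++ restarts a library would normally use. The example documents are invented toy snippets, not data from the thesis.

```python
import math
from collections import Counter


def tokenize(text):
    # Minimal tokenizer: lowercase + whitespace split (no stemming or stop words).
    return text.lower().split()


def tf_idf(docs):
    """Return (vocab, vectors) with one TF-IDF vector per document.
    Assumed variant: tf = count / doc length, idf = ln(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    df = Counter(w for toks in tokenized for w in set(toks))
    idf = {w: math.log(len(docs) / df[w]) for w in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks) or 1  # guard against empty documents
        vectors.append([counts[w] / length * idf[w] for w in vocab])
    return vocab, vectors


def cosine(u, v):
    # cos(u, v) = u.v / (|u| |v|); defined as 0.0 if either vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mean_pairwise_cosine(vectors):
    # Mean cosine similarity over all unordered pairs, e.g. the contracts of one group.
    pairs = [(i, j) for i in range(len(vectors)) for j in range(i + 1, len(vectors))]
    if not pairs:
        raise ValueError("need at least two vectors")
    return sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)


def normalize(v):
    # Scale to unit length so Euclidean K-Means behaves like cosine-based clustering.
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm else list(v)


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, max_iter=100):
    """Lloyd's K-Means with deterministic farthest-first initialization."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Next seed: the vector farthest from all seeds chosen so far.
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy "contract" snippets (invented for illustration).
docs = [
    "revolving credit facility interest rate margin",
    "revolving credit facility commitment fee margin",
    "term loan secured by collateral pledge",
    "term loan collateral security agreement",
]
vocab, vecs = tf_idf(docs)
print(round(cosine(vecs[0], vecs[1]), 3))  # 0.333
print(cosine(vecs[0], vecs[2]))  # 0.0 (no shared terms)
labels = kmeans([normalize(v) for v in vecs], k=2)
print(labels)  # [0, 0, 1, 1]
```

In this toy run, the two revolving-credit snippets share weighted terms and land in one cluster, while the two term-loan snippets form the other. Averaging `cosine` within groups (by industry classification, by K-Means cluster, by firm, or by year) is one way to produce the kind of mean cosine similarity comparisons the abstract describes.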
- Liu, Jeremy J (Author)
- Wahal, Sunil (Thesis director)
- Bharath, Sreedhar (Committee member)
- School of Mathematical and Statistical Sciences (Contributor)
- School for the Future of Innovation in Society (Contributor)
- Barrett, The Honors College (Contributor)