
Matching Items (3)

ConstrictR and ConstrictPy: R Package and Python Tool for Microbiome Analysis

Description

I, Christopher Negrich, am the sole author of this paper, but the tools described were designed in collaboration with Andrew Hoetker. ConstrictR (constrictor) and ConstrictPy are an R package and a Python tool designed together. ConstrictPy implements the functions and methods defined in ConstrictR and adds data handling, data parsing, input/output (I/O), and a user interface to increase usability. ConstrictR implements a variety of common data analysis methods used for statistical and subnetwork analysis. The majority of these methods are inspired by Lionel Guidi's 2016 paper, "Plankton networks driving carbon export in the oligotrophic ocean." Additional methods were added to expand functionality, usability, and applicability to different areas of data science. Both ConstrictR and ConstrictPy are publicly available and usable; however, both are ongoing projects. ConstrictR is available at github.com/cnegrich and ConstrictPy is available at github.com/ahoetker. Currently, ConstrictR has implemented functions for descriptive statistics, correlation, covariance, rank, sparsity, and weighted correlation network analysis, with clustering, centrality, profiling, error handling, and data parsing methods to be released soon. ConstrictPy has fully implemented and integrated the features in ConstrictR and provides functions for I/O and conversion between pandas and R data frames, with a full-featured user interface to be released soon. Both ConstrictR and ConstrictPy are designed to work with minimal dependencies and maximum available information on the algorithms implemented. As a result, ConstrictR depends only on base R (v3.4.4) functions, with no libraries imported. ConstrictPy depends only on pandas, rpy2, and ConstrictR. These choices were made to increase the longevity and independence of these tools. Additionally, all mathematical information is documented alongside the code, increasing the available information on how these tools function. Although neither tool is in its final version, this paper documents the code, mathematics, and instructions for use, in addition to plans for future work, for the current versions of ConstrictR (v0.0.1) and ConstrictPy (v0.0.1).
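As a rough illustration of the kinds of statistics named in this description (covariance, correlation, and sparsity), the following Python sketch computes them for a small abundance table using only the standard library. It is not ConstrictR or ConstrictPy code and does not reflect their APIs; the function names, the taxon labels, and the counts are all hypothetical, chosen only for the example.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def covariance(x, y):
    """Sample covariance of two equal-length lists (n - 1 denominator)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))


def sparsity(table):
    """Fraction of all entries in the table that are exactly zero."""
    cells = [value for row in table.values() for value in row]
    return sum(1 for value in cells if value == 0) / len(cells)


# Hypothetical abundance counts: one row per taxon, one column per sample.
abundance = {
    "taxon_a": [0, 2, 4, 6],
    "taxon_b": [1, 3, 5, 7],
    "taxon_c": [8, 0, 0, 0],
}

corr_ab = pearson(abundance["taxon_a"], abundance["taxon_b"])
corr_ac = pearson(abundance["taxon_a"], abundance["taxon_c"])
zero_fraction = sparsity(abundance)
```

Here `taxon_b` is `taxon_a` shifted by a constant, so the two are perfectly correlated, while `taxon_c` moves in the opposite direction and yields a negative correlation.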

Contributors: Negrich, Christopher Alec (Author) / Can, Huansheng (Thesis director) / Hansford, Dianne (Committee member) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

A study of text mining framework for automated classification of software requirements in enterprise systems

Description

Text classification is a rapidly evolving area of data mining, while requirements engineering is a less-explored area of software engineering that deals with the process of defining, documenting, and maintaining a software system's requirements. Research combining these two fields has sought to automate the classification of software requirement statements into categories that developers can easily comprehend, enabling faster development and delivery; until now, this classification was mostly done manually by software engineers, a tedious job. However, most of that research focused on classifying non-functional requirements, which pertain to intangible features such as security, reliability, and quality. Automatically classifying functional requirements, those pertaining to how the system will function, is a challenging task, especially when they belong to different and large enterprise systems. This requires exploiting text mining capabilities. This thesis investigates the results of text classification applied to functional software requirements by creating a framework in R that uses algorithms and techniques such as k-nearest neighbors, support vector machines, boosting, bagging, maximum entropy, neural networks, and random forests in an ensemble approach. The study was conducted by collecting and visualizing relevant enterprise data that had previously been classified manually and was subsequently used to train the model. Key components for training included the frequency of terms in the documents and the level of cleanliness of the data. The model was applied to test data and validated by studying and comparing metrics such as precision, recall, and accuracy.
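The evaluation metrics named at the end of this description can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the thesis's R framework: the function name `class_metrics`, the category labels, and the predictions are all hypothetical. Precision and recall are computed for one chosen category, and accuracy over all categories.

```python
def class_metrics(actual, predicted, positive):
    """Precision and recall for one category, plus overall accuracy."""
    pairs = list(zip(actual, predicted))
    # True positives: items of the category that were labeled as such.
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    # False positives: items of other categories labeled as this one.
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    # False negatives: items of the category given some other label.
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    correct = sum(1 for a, p in pairs if a == p)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": correct / len(pairs),
    }


# Hypothetical manual labels and model output for six requirement statements.
actual = ["billing", "billing", "reporting", "reporting", "login", "billing"]
predicted = ["billing", "reporting", "reporting", "billing", "login", "billing"]

metrics = class_metrics(actual, predicted, positive="billing")
```

In this made-up data, two of the three statements predicted as "billing" are correct and two of the three true "billing" statements are found, so precision and recall are both 2/3, while four of six predictions match overall.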

Contributors: Swadia, Japa (Author) / Ghazarian, Arbi (Thesis advisor) / Bansal, Srividya (Committee member) / Gaffar, Ashraf (Committee member) / Arizona State University (Publisher)

Created: 2016

Automating by Developing Model Components for the Insurance Ratemaking Actuarial Procedures

Description

The objective of this study is to build a model using R and RStudio that automates ratemaking procedures for Company XYZ’s actuaries in their commercial general liability pricing department. The purpose is to allow actuaries to work more efficiently and effectively by using a model that outputs the results they would otherwise have had to code and calculate on their own. Instead of spending time working towards these results, the actuaries can analyze the findings, strategize accordingly, and communicate with business partners. The model was built from R code that was later converted to a Shiny app; Shiny is an R package from RStudio for building interactive web applications. The final result is a Shiny app that first takes in multiple datasets from Company XYZ’s data warehouse and displays different views of the data so that actuaries can make selections on development and trend methods. Based on those selections, the app outputs the re-created ratemaking exhibits showing the resulting developed and trended loss and premium, as well as the experience-based indicated rate level change. The ratemaking process and the Shiny app's functionality are detailed in this report.

Contributors: Gilkey, Gina (Author) / Zicarelli, John (Thesis director) / Milovanovic, Jelena (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created: 2022-05