
Graph Search as a Feature in Imperative/Procedural Programming Languages

Description

Graph theory is a critical component of computer science and software engineering, with algorithms for graph traversal and comprehension underpinning many of the largest problems in both industry and research. Engineers and researchers often have an accurate view of their target graph; however, they struggle to implement a correct and efficient search over that graph.

To facilitate rapid, correct, efficient, and intuitive development of graph-based solutions, we propose a new programming language construct: the search statement. Given a supra-root node, a procedure which determines the children of a given parent node, and optional definitions of the fail-fast acceptance or rejection of a solution, the search statement can conduct a search over any graph or network. Structurally, this statement is modelled after the common switch statement and is placed in a largely imperative/procedural context to allow for immediate and intuitive development by most programmers. The Go programming language has been used as the foundation and proof of concept for the search statement. A Go compiler is provided which implements this construct.
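The ingredients named above (a root node, a children procedure, and optional accept and reject tests) can be illustrated with an ordinary library function. The following Python sketch is not the thesis's search statement or its Go syntax; the function name `search`, its parameters, the breadth-first order, and the choice not to expand accepted nodes are all assumptions made for illustration.

```python
from collections import deque


def search(root, children, accept=None, reject=None):
    """Breadth-first search driven by caller-supplied procedures.

    root     -- starting node (must be hashable)
    children -- function mapping a node to an iterable of child nodes
    accept   -- optional predicate; accepted nodes are yielded and not expanded
    reject   -- optional predicate; rejected nodes are pruned immediately
    With no accept predicate, every non-rejected node is yielded.
    """
    seen = {root}
    frontier = deque([root])
    while frontier:
        node = frontier.popleft()
        if reject is not None and reject(node):
            continue  # fail fast: drop this node and its whole subtree
        if accept is None:
            yield node
        elif accept(node):
            yield node
            continue  # a solution was found; do not search beneath it
        for child in children(node):
            if child not in seen:  # guard against cycles and shared children
                seen.add(child)
                frontier.append(child)


# Hypothetical example: nodes are integers, each n has children 2n and 2n + 1.
# Nodes above 20 are rejected; multiples of 7 are accepted.
solutions = list(
    search(
        1,
        children=lambda n: (2 * n, 2 * n + 1),
        accept=lambda n: n % 7 == 0,
        reject=lambda n: n > 20,
    )
)
```

Because the graph is described only through the `children` callback, the same function works for implicit graphs that are never materialized in memory, which is the situation the abstract describes.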

Contributors: Henderson, Christopher (Author) / Bansal, Ajay (Thesis advisor) / Lindquist, Timothy (Committee member) / Acuna, Ruben (Committee member) / Arizona State University (Publisher)

Created: 2018

3D - Patch Based Machine Learning Systems for Alzheimer’s Disease classification via 18F-FDG PET Analysis

Description

Alzheimer’s disease (AD) is a chronic neurodegenerative disease that usually starts slowly and gets worse over time. It is the cause of 60% to 70% of cases of dementia. There is growing interest in identifying brain image biomarkers that help evaluate AD risk pre-symptomatically. High-dimensional non-linear pattern classification methods have been applied to structural magnetic resonance images (MRIs) and used to discriminate between clinical groups in Alzheimer’s progression. Using fluorodeoxyglucose (FDG) positron emission tomography (PET) as the preferred imaging modality, this thesis develops two independent machine learning based patch analysis methods and uses them to perform six binary classification experiments across different AD diagnostic categories. Specifically, features were extracted and learned using dimensionality reduction and dictionary learning with sparse coding, by taking overlapping patches in and around the cerebral cortex and using them as features. Using AdaBoost as the preferred classifier, both methods try to utilize 18F-FDG PET as a biological marker in the early diagnosis of Alzheimer’s. Additionally, we investigate the involvement of rich demographic features (ApoeE3, ApoeE4 and Functional Activities Questionnaires (FAQ)) in classification. The experimental results on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset demonstrate the effectiveness of both proposed systems. The use of 18F-FDG PET may offer a new sensitive biomarker and enrich the brain imaging analysis toolset for studying the diagnosis and prognosis of AD.

Contributors: Srivastava, Anant (Author) / Wang, Yalin (Thesis advisor) / Bansal, Ajay (Thesis advisor) / Liang, Jianming (Committee member) / Arizona State University (Publisher)

Created: 2017

Optimizing Performance Measures in Classification Using Ensemble Learning Methods

Description

Ensemble learning methods like bagging, boosting, adaptive boosting, and stacking have traditionally shown promising results in improving predictive accuracy in classification. These techniques have recently been widely used in various domains and applications owing to improvements in computational efficiency and advances in distributed computing. However, with the advent of a wide variety of applications of machine learning techniques to class imbalance problems, further focus is needed to evaluate, improve, and optimize other performance measures such as sensitivity (true positive rate) and specificity (true negative rate) in classification. This thesis demonstrates a novel approach to evaluating and optimizing these performance measures (specifically sensitivity and specificity) using ensemble learning methods for classification, which can be especially useful for class-imbalanced datasets. In this thesis, ensemble learning methods (specifically bagging and boosting) are used to optimize sensitivity and specificity on the UC Irvine (UCI) 130 hospital diabetes dataset to predict whether a patient will be readmitted to the hospital based on various feature vectors. From the experiments conducted, it can be empirically concluded that, by using ensemble learning methods, although accuracy does improve by some margin, both sensitivity and specificity are optimized significantly and consistently over different cross-validation approaches. The implementation and evaluation have been done on a subset of the large UCI 130 hospital diabetes dataset. The performance measures of ensemble learners are compared to those of base machine learning classification algorithms such as Naive Bayes, Logistic Regression, k Nearest Neighbor, Decision Trees, and Support Vector Machines.
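To make the distinction between accuracy and the two measures named above concrete, here is a minimal Python sketch that computes sensitivity (true positive rate) and specificity (true negative rate) from binary labels. The function names and the toy labels are illustrative assumptions and are not taken from the thesis or its dataset.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fn, tn, fp) for binary labels."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp


def sensitivity(tp, fn):
    """True positive rate: share of actual positives that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn, fp):
    """True negative rate: share of actual negatives that were detected."""
    return tn / (tn + fp) if (tn + fp) else 0.0


# Made-up imbalanced example: 2 positives, 8 negatives, and a classifier
# that always predicts the majority (negative) class.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 10

tp, fn, tn, fp = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
```

In this toy case accuracy is 0.8 even though no positive case is ever detected (sensitivity 0.0, specificity 1.0), which is why accuracy alone can mislead on class-imbalanced data.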

Contributors: Bahl, Neeraj Dharampal (Author) / Bansal, Ajay (Thesis advisor) / Amresh, Ashish (Committee member) / Bansal, Srividya (Committee member) / Arizona State University (Publisher)

Created: 2017

Evaluation of a Guided Machine Learning Approach for Pharmacokinetic Modeling

Description

A medical control system, a real-time controller, uses a predictive model of human physiology to estimate and control drug concentration in the human body. The Artificial Pancreas (AP) is an example of such a control system; it regulates blood glucose in patients with type 1 diabetes (T1D). The predictive model in the control system, such as the Bergman Minimal Model (BMM), is based on a physiological modeling technique that separates the body into a number of anatomical compartments, with each compartment's effect on the body determined by its physiological parameters. These models are less accurate because of unaccounted physiological factors affecting target values. Estimating a large number of physiological parameters through an optimization algorithm is computationally expensive and can get stuck in local minima. This work evaluates a machine learning (ML) framework in which an ML model is guided by physiological models. A support vector regression (SVR) model guided by a modified BMM is implemented to estimate blood glucose levels. Physical activity and endogenous glucose production are key factors that contribute to increased hypoglycemia events; thus, this work modifies the Bergman Minimal Model (Bergman et al. 1981) for more accurate estimation of blood glucose levels. Results show that the SVR outperformed the BMM by 0.164 average RMSE for 7 different patients in the free-living scenario. This computationally inexpensive, data-driven model can potentially learn parameters more accurately with time. In conclusion, the advised prediction model is promising for modeling physiological elements in living systems.
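The comparison above is stated in terms of average root-mean-square error (RMSE) across patients. As a rough sketch of how such a figure is computed, the Python below defines RMSE for one series and its mean over several patients; the function names and input layout are assumptions, and no values from the thesis are reproduced.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length numeric sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mean_rmse(per_patient):
    """Average the RMSE over patients.

    per_patient -- list of (actual, predicted) pairs, one pair per patient
    (a hypothetical layout chosen for this sketch).
    """
    scores = [rmse(actual, predicted) for actual, predicted in per_patient]
    return sum(scores) / len(scores)
```

Under this definition, the difference between two models is simply `mean_rmse` evaluated on each model's predictions for the same patients, with the lower value indicating the better fit.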

Contributors: Agrawal, Anurag (Author) / Gupta, Sandeep K. S. (Thesis advisor) / Banerjee, Ayan (Committee member) / Kudva, Yogish (Committee member) / Arizona State University (Publisher)

Created: 2017

Minimizing Dataset Size Requirements for Machine Learning

Description

Machine learning methodologies are widely used in almost all aspects of software engineering. An effective machine learning model requires large amounts of data to achieve high accuracy. The data used for classification is mostly labeled, and labeled data is difficult to obtain: accurately labeling a dataset into different classes requires both high cost and effort. With an abundance of data, all of it must be labeled to be properly utilized, and this work focuses on reducing the labeling effort for large datasets. The thesis presents a comparison of different classifiers' performance to test whether a small set of labeled data can be used to build accurate models with a high prediction rate. The use of a small dataset for classification is then extended to an active machine learning methodology in which a one-class classifier first predicts the outliers in the data, and the outlier samples are then added to a training set for a support vector machine classifier that labels the unlabeled data. The labeling of datasets can thus be scaled up to avoid manual labeling and to build more robust machine learning methodologies.

Contributors: Batra, Salil (Author) / Femiani, John (Thesis advisor) / Amresh, Ashish (Thesis advisor) / Bansal, Ajay (Committee member) / Arizona State University (Publisher)

Created: 2017

Automating Generation of Web GUI from a Design Image

Description

Frontend development often involves the repetitive and time-consuming task of transforming a Graphical User Interface (GUI) design into frontend code. The GUI design could be either an image or a design created with tools like Figma, Sketch, etc. This process can be particularly challenging when the website designs are experimental and undergo multiple iterations before the final version gets deployed. In such cases, developers work with the designers to make continuous changes and improve the look and feel of the website. This can lead to a lot of rework and a poorly managed codebase that requires significant developer resources. To tackle this problem, researchers are exploring ways to automate the process of transforming image designs into functional websites instantly. This thesis explores the use of machine learning, specifically Recurrent Neural Networks (RNNs), to generate intermediate code from an image design and then compile it into React web frontend code. By utilizing this approach, designers can essentially transform an image design into a functional website, granting them creative freedom and the ability to present working prototypes to stakeholders in real time. To overcome the limitations of existing publicly available datasets, the thesis places significant emphasis on generating synthetic datasets. As part of this effort, the research proposes a novel method to double the size of the pix2code [2] dataset by incorporating additional complex HTML elements such as login forms, carousels, and cards. This approach has the potential to enhance the quality and diversity of training data available for machine learning models. Overall, the proposed approach offers a promising solution to the repetitive and time-consuming task of transforming GUI designs into frontend code.

Contributors: Singh, Ajitesh Janardan (Author) / Bansal, Ajay (Thesis advisor) / Mehlhase, Alexandra (Committee member) / Baron, Tyler (Committee member) / Arizona State University (Publisher)

Created: 2023

Improving Ontology Alignment Using Machine Learning Techniques

Description

Ontologies play an important role in storing and exchanging digitized data. As the need for semantic web information grows, organizations from around the globe have defined ontologies in different domains to better represent their data. But different organizations define ontologies of the same entity in their own way. Finding ontologies of the same entity in different fields and domains has become very important for unifying data and improving its interoperability across these domains. Many different techniques have been used over the years, including human-assisted, automated, and hybrid approaches. In recent years, with the availability of many machine learning techniques, researchers have been trying to apply these techniques to solve the ontology alignment problem across different domains. In this study I have looked into the use of different machine learning techniques, such as Support Vector Machine, Stochastic Gradient Descent, and Random Forest, for solving the ontology alignment problem on some of the most commonly used datasets from the Ontology Alignment Evaluation Initiative (OAEI). I have proposed a method, OntoAlign, which demonstrates the importance of using different types of similarity measures for feature extraction from ontology data in order to achieve better ontology alignment results.

Contributors: Nasim, Tariq M (Author) / Bansal, Srividya (Thesis advisor) / Mehlhase, Alexandra (Committee member) / Banerjee, Ayan (Committee member) / Arizona State University (Publisher)

Created: 2022

Time series prediction for stock price and opioid incident location

Description

Time series forecasting is the prediction of future data after analyzing past data for temporal trends. This work investigates two fields of time series forecasting: Stock Data Prediction and Opioid Incident Prediction. In this thesis, the Stock Data Prediction Problem investigates methods which could predict the trends in the NYSE and NASDAQ stock markets for ten different companies, nine of which are part of the Dow Jones Industrial Average (DJIA). A novel deep learning model which uses a Generative Adversarial Network (GAN) is used to predict future data, and the results are compared with existing regression techniques such as Linear, Huber, and Ridge regression and with neural network models such as Long Short-Term Memory (LSTM) models.

In this thesis, the Opioid Incident Prediction Problem investigates methods which could predict the location of future opioid overdose incidents using data on past opioid overdose incidents. A similar deep learning model is used to predict the location of future overdose incidents given two datasets of past incidents (the Connecticut and Cincinnati opioid incident datasets), and it is compared with existing neural network models such as Convolution LSTMs, Attention-based Convolution LSTMs, and Encoder-Decoder frameworks. Experimental results on the above-mentioned datasets for both problems show the superiority of the proposed architectures over the standard statistical models.

Contributors: Thomas, Kevin, M.S (Author) / Sen, Arunabha (Thesis advisor) / Davulcu, Hasan (Committee member) / Banerjee, Ayan (Committee member) / Arizona State University (Publisher)

Created: 2019

Diversifying Relevant Search Results from Social Media Using Community Contributed Images

Description

The availability of affordable image and video capturing devices, together with the rapid development of social networking and content sharing websites, has led to the creation of a new type of content: social media. Any system serving an end user’s search query should not only return relevant images but also ensure that those images are diverse, so that they give a well-rounded description of the query. As a result, the automated optimization of image retrieval results for both relevance and diversity becomes exceedingly important.

The main focus of this thesis is to provide a visual description of a landmark by choosing, from community-contributed datasets, the most diverse pictures that best describe all the details of the queried location. For this, an end-to-end framework has been built to retrieve relevant results that are also diverse. Different retrieval re-ranking and diversification strategies are evaluated to find a balance between relevance and diversification. Clustering techniques are employed to improve diversity. A unique fusion approach has been adopted to overcome the dilemma of selecting an appropriate clustering technique and its corresponding parameters for a given set of data. Extensive experiments have been conducted on the Flickr Div150Cred dataset, which has 30 different landmark locations. The results obtained are promising when evaluated on metrics for relevance and diversification.

Contributors: Kalakota, Vaibhav Reddy (Author) / Bansal, Ajay (Thesis advisor) / Bansal, Srividya (Committee member) / Findler, Michael (Committee member) / Arizona State University (Publisher)

Created: 2020

Neural Network Architecture with External Memory and Domain-aware Weight Switching Mechanism

Description

Humans have an excellent ability to analyze and process information from multiple domains. They can also apply the same decision-making process when a situation is similar to their previous experience.

Inspired by the human ability to remember past experiences and apply them when a similar situation occurs, the research community has attempted to augment neural networks with memory to store previously learned information. Alongside this, the community has also developed mechanisms to perform domain-specific weight switching so that a single model can handle multiple domains. Notably, the two research fields work independently, and the goal of this dissertation is to combine their capabilities.

This dissertation introduces a neural network module augmented with two external memories, one allowing the network to read and write information and another to perform domain-specific weight switching. Two learning tasks are proposed in this work to investigate the model's performance: solving a sequence of mathematical operations, and identifying actions based on a color sequence. A wide range of experiments with these two tasks verifies the model's learning capabilities.

Contributors: Patel, Deep Chittranjan (Author) / Ben Amor, Hani (Thesis advisor) / Banerjee, Ayan (Committee member) / McDaniel, Troy (Committee member) / Arizona State University (Publisher)

Created: 2020
