Search Content

Learning from asymmetric models and matched pairs

Description

With the increase in computing power and availability of data, there has never been a greater need to understand data and make decisions from it. Traditional statistical techniques may not be adequate to handle the size of today's data or the complexities of the information hidden within the data. Thus…

With the increase in computing power and availability of data, there has never been a greater need to understand data and make decisions from it. Traditional statistical techniques may not be adequate to handle the size of today's data or the complexities of the information hidden within the data. Thus knowledge discovery by machine learning techniques is necessary if we want to better understand information from data. In this dissertation, we explore the topics of asymmetric loss and asymmetric data in machine learning and propose new algorithms as solutions to some of the problems in these topics. We also studied variable selection of matched data sets and proposed a solution when there is non-linearity in the matched data. The research is divided into three parts. The first part addresses the problem of asymmetric loss. A proposed asymmetric support vector machine (aSVM) is used to predict specific classes with high accuracy. aSVM was shown to produce higher precision than a regular SVM. The second part addresses asymmetric data sets where variables are only predictive for a subset of the predictor classes. Asymmetric Random Forest (ARF) was proposed to detect these kinds of variables. The third part explores variable selection for matched data sets. Matched Random Forest (MRF) was proposed to find variables that are able to distinguish case and control without the restrictions that exists in linear models. MRF detects variables that are able to distinguish case and control even in the presence of interaction and qualitative variables.

ContributorsKoh, Derek (Author) / Runger, George C. (Thesis advisor) / Wu, Tong (Committee member) / Pan, Rong (Committee member) / Cesta, John (Committee member) / Arizona State University (Publisher)

Created2013

Modeling time series data for supervised learning

Description

Temporal data are increasingly prevalent and important in analytics. Time series (TS) data are chronological sequences of observations and an important class of temporal data. Fields such as medicine, finance, learning science and multimedia naturally generate TS data. Each series provide a high-dimensional data vector that challenges the learning of…

Temporal data are increasingly prevalent and important in analytics. Time series (TS) data are chronological sequences of observations and an important class of temporal data. Fields such as medicine, finance, learning science and multimedia naturally generate TS data. Each series provide a high-dimensional data vector that challenges the learning of the relevant patterns This dissertation proposes TS representations and methods for supervised TS analysis. The approaches combine new representations that handle translations and dilations of patterns with bag-of-features strategies and tree-based ensemble learning. This provides flexibility in handling time-warped patterns in a computationally efficient way. The ensemble learners provide a classification framework that can handle high-dimensional feature spaces, multiple classes and interaction between features. The proposed representations are useful for classification and interpretation of the TS data of varying complexity. The first contribution handles the problem of time warping with a feature-based approach. An interval selection and local feature extraction strategy is proposed to learn a bag-of-features representation. This is distinctly different from common similarity-based time warping. This allows for additional features (such as pattern location) to be easily integrated into the models. The learners have the capability to account for the temporal information through the recursive partitioning method. The second contribution focuses on the comprehensibility of the models. A new representation is integrated with local feature importance measures from tree-based ensembles, to diagnose and interpret time intervals that are important to the model. Multivariate time series (MTS) are especially challenging because the input consists of a collection of TS and both features within TS and interactions between TS can be important to models. Another contribution uses a different representation to produce computationally efficient strategies that learn a symbolic representation for MTS. Relationships between the multiple TS, nominal and missing values are handled with tree-based learners. Applications such as speech recognition, medical diagnosis and gesture recognition are used to illustrate the methods. Experimental results show that the TS representations and methods provide better results than competitive methods on a comprehensive collection of benchmark datasets. Moreover, the proposed approaches naturally provide solutions to similarity analysis, predictive pattern discovery and feature selection.

ContributorsBaydogan, Mustafa Gokce (Author) / Runger, George C. (Thesis advisor) / Atkinson, Robert (Committee member) / Gel, Esma (Committee member) / Pan, Rong (Committee member) / Arizona State University (Publisher)

Created2012

Development of Complementary Fresh-Food Systems Through the Exploration and Identification of Profit-Maximizing, Supply Chains

Description

One of the greatest 21st century challenges is meeting the needs of a growing world population expected to increase 35% by 2050 given projected trends in diets, consumption and income. This in turn requires a 70-100% improvement on current production capability, even as the world is undergoing systemic climate…

One of the greatest 21st century challenges is meeting the needs of a growing world population expected to increase 35% by 2050 given projected trends in diets, consumption and income. This in turn requires a 70-100% improvement on current production capability, even as the world is undergoing systemic climate pattern changes. This growth not only translates to higher demand for staple products, such as rice, wheat, and beans, but also creates demand for high-value products such as fresh fruits and vegetables (FVs), fueled by better economic conditions and a more health conscious consumer. In this case, it would seem that these trends would present opportunities for the economic development of environmentally well-suited regions to produce high-value products. Interestingly, many regions with production potential still exhibit a considerable gap between their current and ‘true’ maximum capability, especially in places where poverty is more common. Paradoxically, often high-value, horticultural products could be produced in these regions, if relatively small capital investments are made and proper marketing and distribution channels are created. The hypothesis is that small farmers within local agricultural systems are well positioned to take advantage of existing sustainable and profitable opportunities, specifically in high-value agricultural production. Unearthing these opportunities can entice investments in small farming development and help them enter the horticultural industry, thus expand the volume, variety and/or quality of products available for global consumption. In this dissertation, the objective is three-fold: (1) to demonstrate the hidden production potential that exist within local agricultural communities, (2) highlight the importance of supply chain modeling tools in the strategic design of local agricultural systems, and (3) demonstrate the application of optimization and machine learning techniques to strategize the implementation of protective agricultural technologies.

As part of this dissertation, a yield approximation method is developed and integrated with a mixed-integer program to estimate a region’s potential to produce non-perennial, vegetable items. This integration offers practical approximations that help decision-makers identify technologies needed to protect agricultural production, alter harvesting patterns to better match market behavior, and provide an analytical framework through which external investment entities can assess different production options.

ContributorsFlores, Hector M. (Author) / Villalobos, Rene (Thesis advisor) / Pan, Rong (Committee member) / Wu, Teresa (Committee member) / Parker, Nathan (Committee member) / Arizona State University (Publisher)

Created2017

Graph Search as a Feature in Imperative/Procedural Programming Languages

Description

Graph theory is a critical component of computer science and software engineering, with algorithms concerning graph traversal and comprehension powering much of the largest problems in both industry and research. Engineers and researchers often have an accurate view of their target graph, however they struggle to implement a correct, and…

Graph theory is a critical component of computer science and software engineering, with algorithms concerning graph traversal and comprehension powering much of the largest problems in both industry and research. Engineers and researchers often have an accurate view of their target graph, however they struggle to implement a correct, and efficient, search over that graph.

To facilitate rapid, correct, efficient, and intuitive development of graph based solutions we propose a new programming language construct - the search statement. Given a supra-root node, a procedure which determines the children of a given parent node, and optional definitions of the fail-fast acceptance or rejection of a solution, the search statement can conduct a search over any graph or network. Structurally, this statement is modelled after the common switch statement and is put into a largely imperative/procedural context to allow for immediate and intuitive development by most programmers. The Go programming language has been used as a foundation and proof-of-concept of the search statement. A Go compiler is provided which implements this construct.

ContributorsHenderson, Christopher (Author) / Bansal, Ajay (Thesis advisor) / Lindquist, Timothy (Committee member) / Acuna, Ruben (Committee member) / Arizona State University (Publisher)

Created2018

\English

Description

\English is a programming language, a method of allowing programmers to write instructions such that a computer may understand and execute said instructions in the form of a program. Though many programming languages exist, this particular language is designed for ease of development and heavy optimizability in ways that no…

\English is a programming language, a method of allowing programmers to write instructions such that a computer may understand and execute said instructions in the form of a program. Though many programming languages exist, this particular language is designed for ease of development and heavy optimizability in ways that no other programming language is. Building on the principles of Assembly level efficiency, referential integrity, and high order functionality, this language is able to produce extremely efficient code; meanwhile, programmatically defined English-based reusable syntax and a strong, static type system make \English easier to read and write than many existing programming languages. Its generalization of all language structures and components to operators leaves the language syntax open to project-specific syntactical structuring, making it more easily applicable in more cases. The thesis project requirements came in three parts: a compiler to compile \English code into NASM Assembly to produce a final program product; a standard library to define many of the basic operations of the language, including the creation of lists; and C translation library that would utilize \English properties to compile C code using the \English compiler. Though designed and partially coded, the compiler remains incomplete. The standard library, C translation library, and design of the language were completed. Additional tools regarding the language design and implementation were also created, including a Gedit syntax highlighting configuration file; usage documentation describing in a tutorial style the basic usage of the language; and more. Though the thesis project itself may be complete, the \English project will continue in order to produce a new language capable of the abilities possible with the design of this language.

ContributorsDavey, Connor (Author) / Gupta, Sandeep (Thesis director) / Bazzi, Rida (Committee member) / Calliss, Debra (Committee member) / Barrett, The Honors College (Contributor)

Created2016-05

Zenith: Type Safe, Functional Programming Language for Lua

Description

This paper introduces Zenith, a statically typed, functional programming language that compiles to Lua modules. The goal of Zenith is to be used in tandem with Lua, as a secondary language, in which Lua developers can transition potentially unsound programs into Zenith instead. Here developers will be ensured a set…

This paper introduces Zenith, a statically typed, functional programming language that compiles to Lua modules. The goal of Zenith is to be used in tandem with Lua, as a secondary language, in which Lua developers can transition potentially unsound programs into Zenith instead. Here developers will be ensured a set of guarantees during compile time, which are provided through Zenith’s language design and type system. This paper formulates the reasoning behind the design choices in Zenith, based on prior work. This paper also provides a basic understanding and intuitions on the Hindley-Milner type system used in Zenith, and the functional programming data types used to encode unsound functions. With these ideas combined, the paper concludes on how Zenith can provide soundness and runtime safety as a language, and how Zenith may be used with Lua to create safe systems.

ContributorsShrestha, Abhash (Author) / De Luca, Gennaro (Thesis advisor) / Bansal, Ajay (Thesis advisor) / Chen, Yinong (Committee member) / Arizona State University (Publisher)

Created2023

Diversifying Relevant Search Results from Social Media Using Community Contributed Images

Description

Availability of affordable image and video capturing devices as well as rapid development of social networking and content sharing websites has led to the creation of new type of content, Social Media. Any system serving the end user’s query search request should not only take the relevant images into consideration…

Availability of affordable image and video capturing devices as well as rapid development of social networking and content sharing websites has led to the creation of new type of content, Social Media. Any system serving the end user’s query search request should not only take the relevant images into consideration but they also need to be divergent for a well-rounded description of a query. As a result, the automated optimization of image retrieval results that are also divergent becomes exceedingly important.

The main focus of this thesis is to use visual description of a landmark by choosing the most diverse pictures that best describe all the details of the queried location from community-contributed datasets. For this, an end-to-end framework has been built, to retrieve relevant results that are also diverse. Different retrieval re-ranking and diversification strategies are evaluated to find a balance between relevance and diversification. Clustering techniques are employed to improve divergence. A unique fusion approach has been adopted to overcome the dilemma of selecting an appropriate clustering technique and the corresponding parameters, given a set of data to be investigated. Extensive experiments have been conducted on the Flickr Div150Cred dataset that has 30 different landmark locations. The results obtained are promising when evaluated on metrics for relevance and diversification.

ContributorsKalakota, Vaibhav Reddy (Author) / Bansal, Ajay (Thesis advisor) / Bansal, Srividya (Committee member) / Findler, Michael (Committee member) / Arizona State University (Publisher)

Created2020

Generating Vocabulary Sets for Implicit Language Learning using Masked Language Modeling

Description

Globalization is driving a rapid increase in motivation for learning new languages, with online and mobile language learning applications being an extremely popular method of doing so. Many language learning applications focus almost exclusively on aiding students in acquiring vocabulary, one of the most important elements in achieving fluency in…

Globalization is driving a rapid increase in motivation for learning new languages, with online and mobile language learning applications being an extremely popular method of doing so. Many language learning applications focus almost exclusively on aiding students in acquiring vocabulary, one of the most important elements in achieving fluency in a language. A well-balanced language curriculum must include both explicit vocabulary instruction and implicit vocabulary learning through interaction with authentic language materials. However, most language learning applications focus only on explicit instruction, providing little support for implicit learning. Students require support with implicit vocabulary learning because they need enough context to guess and acquire new words. Traditional techniques aim to teach students enough vocabulary to comprehend the text, thus enabling them to acquire new words. Despite the wide variety of support for vocabulary learning offered by learning applications today, few offer guidance on how to select an optimal vocabulary study set.

This thesis proposes a novel method of student modeling which uses pre-trained masked language models to model a student's reading comprehension abilities and detect words which are required for comprehension of a text. It explores the efficacy of using pre-trained masked language models to model human reading comprehension and presents a vocabulary study set generation pipeline using this method. This pipeline creates vocabulary study sets for explicit language learning that enable comprehension while still leaving some words to be acquired implicitly. Promising results show that masked language modeling can be used to model human comprehension and that the pipeline produces reasonably sized vocabulary study sets.

ContributorsEdgar, Vatricia Cathrine (Author) / Bansal, Ajay (Thesis advisor) / Acuna, Ruben (Committee member) / Mehlhase, Alexandra (Committee member) / Arizona State University (Publisher)

Created2020

Filtering by