Matching Items (17)

Item 150190
Description
Sparse learning is a technique in machine learning for feature selection and dimensionality reduction, used to find a sparse set of the most relevant features. In any machine learning problem there is a considerable amount of irrelevant information, and separating relevant from irrelevant information has been a topic of focus. In supervised learning such as regression, the data consist of many features and only a subset of the features may be responsible for the result. The features may also carry special structural requirements, which introduces additional complexity for feature selection. The sparse learning package provides a set of algorithms for learning a sparse set of the most relevant features for both regression and classification problems. Structural dependencies among features, which introduce additional requirements, are also supported by the package: the features may be grouped together, hierarchies and overlapping groups may exist among them, and it may be necessary to select the most relevant groups. Although these methods yield sparse solutions, the solutions are not guaranteed to be robust. For the selection to be robust, certain techniques provide theoretical justification of why particular features are selected. Stability selection is one such method; it allows existing sparse learning methods to be used to select a stable set of features for a given training sample. This is done by assigning a selection probability to each feature: the training data are sub-sampled, a specific sparse learning technique is used to learn the relevant features, this is repeated a large number of times, and the probability is estimated as the fraction of sub-samples in which a feature is selected. Cross-validation further allows the best parameter value to be chosen over a range of values, namely the value that gives the maximum accuracy score. With such a combination of algorithms, with good convergence guarantees, stable feature selection properties, and the inclusion of various structural dependencies among features, the sparse learning package is a powerful tool for machine learning research. Its modular structure, C implementation, and ATLAS integration for fast linear algebraic subroutines make it one of the best tools for large, sparse settings. The varied collection of algorithms, support for group sparsity, and batch algorithms are a few of the notable capabilities of the SLEP package, and these features can be used in a variety of fields to infer relevant elements. Alzheimer's Disease (AD) is a neurodegenerative disease that gradually leads to dementia. The SLEP package is used here for feature selection to obtain the most relevant biomarkers from the available AD dataset, and the results show that, indeed, only a subset of the features is required to gain valuable insights.
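To illustrate the stability selection procedure described above, the following is a minimal sketch (not the SLEP implementation itself) that assumes a generic Lasso selector from scikit-learn and hypothetical arrays X and y; the selection probability of each feature is estimated as the fraction of sub-samples in which it receives a nonzero coefficient.

```python
import numpy as np
from sklearn.linear_model import Lasso

def stability_selection(X, y, alpha=0.1, n_subsamples=100, threshold=0.6, seed=None):
    """Estimate per-feature selection probabilities by repeated sub-sampling (sketch)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)   # sub-sample half the training data
        model = Lasso(alpha=alpha).fit(X[idx], y[idx])     # any sparse learner could be used here
        counts += (np.abs(model.coef_) > 1e-8)             # record which features were selected
    probs = counts / n_subsamples                          # selection frequency ~ probability
    stable = np.flatnonzero(probs >= threshold)            # stable features exceed the threshold
    return probs, stable
```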
Contributors: Thulasiram, Ramesh (Author) / Ye, Jieping (Thesis advisor) / Xue, Guoliang (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)
Created: 2011

Item 150537
Description
The field of Data Mining is widely recognized and accepted for its applications in many business problems to guide decision-making processes based on data. However, in recent times the scope of these problems has grown considerably, and the methods are under scrutiny for their applicability and relevance to real-world circumstances. At the crossroads of innovation and standards, it is important to examine whether the current theoretical methods for industrial applications (which include KDD, SEMMA, and CRISP-DM) encompass all possible scenarios that could arise in practical situations. Do the methods require changes or enhancements? As part of this thesis, I study the current methods, delineate their underlying ideas, and illuminate the shortcomings that posed challenges during practical implementation. Based on the experiments conducted and the research carried out, I propose an approach that represents business problems with higher accuracy and provides a broader view of the process. It is then applied to different case studies highlighting the different aspects of this approach.
Contributors: Anand, Aneeth (Author) / Liu, Huan (Thesis advisor) / Kempf, Karl G. (Thesis advisor) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)
Created: 2012

Item 151275
Description
The pay-as-you-go economic model of cloud computing increases the visibility, traceability, and verifiability of software costs. Application developers must understand how their software uses resources when running in the cloud in order to stay within budgeted costs and/or produce expected profits. Cloud computing's unique economic model also leads naturally to an earn-as-you-go profit model for many cloud-based applications. These applications can benefit from low-level analyses for cost optimization and verification. Testing cloud applications to ensure they meet monetary cost objectives has not been well explored in the current literature. When considering revenues and costs for cloud applications, the resource economic model can be scaled down to the transaction level in order to associate source code with costs incurred while running in the cloud. Both static and dynamic analysis techniques can be developed and applied to understand how and where cloud applications incur costs. Such analyses can help optimize (i.e., minimize) costs and verify that they stay within expected tolerances. An adaptation of Worst Case Execution Time (WCET) analysis is presented here to statically determine worst-case monetary costs of cloud applications. This analysis is used to produce an algorithm for determining control flow paths within an application that can exceed a given cost threshold. The corresponding results are used to identify path sections that contribute most to cost excess. A hybrid approach for determining cost excesses is also presented; it consists mostly of dynamic measurements but also incorporates calculations based on the static analysis approach. This approach uses operational profiles to increase the precision and usefulness of the calculations.
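As a rough illustration of the path-cost reasoning described above (a sketch under simplifying assumptions, not the dissertation's WCET-based algorithm), the following enumerates the paths of a small acyclic control-flow graph with hypothetical per-block monetary costs and reports those whose accumulated cost exceeds a threshold.

```python
def paths_exceeding_cost(cfg, cost, entry, exit_, threshold):
    """Enumerate entry-to-exit paths in an acyclic CFG and flag those whose
    summed monetary cost exceeds the threshold (illustrative sketch)."""
    expensive = []

    def walk(node, path, total):
        path = path + [node]
        total += cost[node]
        if node == exit_:
            if total > threshold:
                expensive.append((path, total))
            return
        for succ in cfg.get(node, []):
            walk(succ, path, total)

    walk(entry, [], 0.0)
    return expensive

# Hypothetical example: costs could reflect per-request compute and storage charges.
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
cost = {"A": 0.01, "B": 0.50, "C": 0.02, "D": 0.05}
print(paths_exceeding_cost(cfg, cost, "A", "D", threshold=0.25))
# [(['A', 'B', 'D'], 0.56)] -- the path through B is the one that can exceed the budget
```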
Contributors: Buell, Kevin, Ph.D (Author) / Collofello, James (Thesis advisor) / Davulcu, Hasan (Committee member) / Lindquist, Timothy (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)
Created: 2012

Item 151231
Description
Gray codes are perhaps the best-known structures for listing sequences of combinatorial objects, such as binary strings. Simply defined as minimal change listings, Gray codes vary greatly both in structure and in the types of objects that they list. More specific types of Gray codes are universal cycles and overlap sequences. Universal cycles are Gray codes on a set of strings of length n in which the first n-1 letters of one object are the same as the last n-1 letters of its predecessor in the listing. Overlap sequences allow this overlap to vary between 1 and n-1. Some of our main contributions to the areas of Gray codes and universal cycles include a new Gray code algorithm for fixed-weight m-ary words, and results on the existence of universal cycles for weak orders on [n]. Overlap cycles are a relatively new structure with very few published results. We prove the existence of s-overlap cycles for k-permutations of [n], which has been an open research problem for several years, and construct 1-overlap cycles for Steiner triple and quadruple systems of every order. Also included are various other results of a similar nature covering other structures such as binary strings, m-ary strings, subsets, permutations, weak orders, partitions, and designs. These listing structures lend themselves readily to some classes of combinatorial objects, such as binary n-tuples and m-ary n-tuples. Others require more work to find an appropriate structure, such as k-subsets of an n-set, weak orders, and designs. Still more require a modification in the representation of the objects to fit these structures, such as partitions. Determining when and how we can fit these sets of objects into our three listing structures is the focus of this dissertation.
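For readers unfamiliar with minimal-change listings, the classic reflected binary Gray code illustrates the idea; this is the textbook construction, not the fixed-weight m-ary algorithm contributed by the dissertation.

```python
def binary_gray_code(n):
    """Reflected binary Gray code: each successive n-bit string flips exactly one bit."""
    if n == 0:
        return [""]
    shorter = binary_gray_code(n - 1)
    # Prefix the list with 0, then the reversed list with 1 (the "reflection").
    return ["0" + s for s in shorter] + ["1" + s for s in reversed(shorter)]

print(binary_gray_code(3))
# ['000', '001', '011', '010', '110', '111', '101', '100']
```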
Contributors: Horan, Victoria E (Author) / Hurlbert, Glenn H. (Thesis advisor) / Czygrinow, Andrzej (Committee member) / Fishel, Susanna (Committee member) / Colbourn, Charles (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)
Created: 2012

Item 171460
Description
Arc Routing Problems (ARPs) are a class of routing problems that seek routes of minimum total cost covering the edges or arcs in a graph representing a street or road network. They find application in many essential services such as residential waste collection, winter gritting, and others. Being NP-hard, they are usually solved with heuristic methods. This dissertation contributes heuristics for ARPs, with a focus on the Capacitated Arc Routing Problem (CARP) with additional constraints. In operations such as residential waste collection, vehicle breakdown disruptions occur frequently. A new variant, the Capacitated Arc Re-routing Problem for Vehicle Break-down (CARP-VB), is introduced to address the need to re-route using only the remaining vehicles so that no services are missed. A new heuristic, Probe, is developed to solve CARP-VB. Experiments on benchmark instances show that Probe is better at reducing the makespan and hence effective at reducing delays and avoiding missed services. In addition to total cost, operators are also interested in solutions that are attractive, that is, routes that are contiguous, compact, and non-overlapping, which makes the work easier to manage. Operators may not adopt a solution that is not attractive even if it is optimal. They are also interested in solutions that are balanced in workload to meet equity requirements. A new multi-objective memetic algorithm, MA-ABC, is developed that optimizes three objectives: attractiveness, makespan, and total cost. On testing with benchmark instances, MA-ABC was found to be effective in providing attractive and balanced route solutions without affecting the total cost. Changes in the problem specification, such as demand and topology, occur frequently in business operations. Machine learning can be applied to learn the distribution behind these changes and generate solutions quickly at inference time. Splice is a machine learning framework for CARP that quickly generates close-to-optimum solutions using a graph neural network and deep Q-learning. Splice can solve several variants of node and arc routing problems using the same architecture without any modification. Splice was trained and tested using randomly generated instances, and it generated solutions faster and of better quality than popular metaheuristics.
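As a toy illustration of the capacity aspect of CARP (not Probe, MA-ABC, or Splice), the sketch below greedily packs required edges into routes, opening a new route whenever the vehicle capacity would be exceeded; travel between serviced edges is ignored for brevity, and all data are hypothetical.

```python
def greedy_carp_routes(required_edges, demand, capacity):
    """Toy CARP construction: pack required edges into routes by capacity only.
    required_edges: list of (u, v) arcs that must be serviced.
    demand:         dict mapping each arc to its demand (e.g., waste to collect).
    capacity:       vehicle capacity shared by all routes."""
    routes, current, load = [], [], 0
    for edge in required_edges:
        d = demand[edge]
        if load + d > capacity and current:   # vehicle full: close route, start another
            routes.append(current)
            current, load = [], 0
        current.append(edge)
        load += d
    if current:
        routes.append(current)
    return routes

edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
print(greedy_carp_routes(edges, {e: 3 for e in edges}, capacity=7))
# [[('a', 'b'), ('b', 'c')], [('c', 'd'), ('d', 'a')]] -- two routes of load 6 each
```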
Contributors: Ramamoorthy, Muhilan (Author) / Syrotiuk, Violet R. (Thesis advisor) / Forrest, Stephanie (Committee member) / Mirchandani, Pitu (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)
Created: 2022

Item 190719
Description
Social media platforms provide a rich environment for analyzing user behavior. Recently, deep learning-based methods have been a mainstream approach for social media analysis models involving complex patterns. However, these methods are susceptible to biases in the training data, such as participation inequality. Basically, a mere 1% of users generate the majority of the content on social networking sites, while the remaining users, though engaged to varying degrees, tend to be less active in content creation and largely silent. These silent users consume and listen to information that is propagated on the platform. However, their voice, attitude, and interests are not reflected in the online content, predisposing the decisions of current methods towards the opinions of the active users. As a result, models can mistake the loudest users for the majority. To make the silent majority heard is to reveal the true landscape of the platform. In this dissertation, to compensate for this bias in the data, which is related to user-level data scarcity, I introduce three pieces of research work. Two of the proposed solutions deal with the data on hand, while the third tries to augment the current data. Specifically, the first approach modifies the weight of users' activity/interaction in the input space, while the second approach re-weights the loss based on the users' activity levels during downstream task training. Lastly, the third approach uses large language models (LLMs) and learns the user's writing behavior to expand the current data. In other words, by utilizing LLMs as a sophisticated knowledge base, this method aims to augment the silent users' data.
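The second approach mentioned above, re-weighting the loss by user activity level, can be illustrated with a generic sketch; the inverse-activity weighting scheme and the data below are hypothetical and are not the dissertation's exact formulation.

```python
import numpy as np

def activity_weights(post_counts, smoothing=1.0):
    """Weight each user inversely to their activity so silent users count more."""
    w = 1.0 / (np.asarray(post_counts, dtype=float) + smoothing)
    return w * len(w) / w.sum()          # normalize so the weights average to 1

def weighted_log_loss(y_true, p_pred, weights, eps=1e-12):
    """Binary cross-entropy where each sample's loss is scaled by its user's weight."""
    p = np.clip(p_pred, eps, 1 - eps)
    per_sample = -(np.asarray(y_true) * np.log(p) + (1 - np.asarray(y_true)) * np.log(1 - p))
    return np.mean(weights * per_sample)

# Hypothetical data: three users with very different posting activity.
counts = [500, 20, 1]                     # an active user, a casual user, a near-silent user
print(activity_weights(counts))           # the near-silent user receives the largest weight
```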
Contributors: Karami, Mansooreh (Author) / Liu, Huan (Thesis advisor) / Sen, Arunabha (Committee member) / Davulcu, Hasan (Committee member) / Mancenido, Michelle V. (Committee member) / Arizona State University (Publisher)
Created: 2023

Item 187307
Description
Networks are a versatile modeling tool for the cyber and physical infrastructure that characterizes society. They can be used to describe system spatiotemporal dynamics, including distribution of commodities, movement of agents, and data transmission. This flexibility has resulted in the widespread use of network optimization techniques for decision-making in telecommunications, transportation, and commerce, among other systems. However, realistic network problems are typically large-scale and require the use of integer variables to incorporate design or logical system constraints. This makes such problems hard to solve and precludes their wide applicability to applied problems. This dissertation studies four large-scale optimization problems with underlying network structure in different application domains, including wireless sensor networks, wastewater monitoring, and scheduling. The problems of interest are formulated as mixed-integer optimization models. The proposed solution approaches include branch-and-cut and heuristic algorithms, which are enhanced with network-based valid inequalities and network reduction techniques. The first chapter studies a relay node placement problem in wireless sensor networks, with and without the presence of transmission obstacles in the deployment region. The proposed integer linear programming approach leverages the underlying network structure to produce valid inequalities and network reduction heuristics, which are incorporated in the branch-and-bound exploration. The solution approach outperforms the equivalent nonlinear model and solves instances with up to 1000 sensors within reasonable time. The second chapter studies the continuous version of the maximum capacity (widest) path interdiction problem and introduces the first known polynomial time algorithm for it, using a combination of binary search and the discrete version of Newton's method. The third chapter explores the service agent transport interdiction problem in autonomous vehicle systems, where an agent schedules service tasks in the presence of an adversary. This chapter proposes a single-stage branch-and-cut algorithm to solve the problem, along with several enhancement techniques to improve scalability. The last chapter studies the optimal placement of sensors in a wastewater network to minimize the maximum coverage (load) of the placed sensors. This chapter proposes a branch-and-cut algorithm enhanced with network reduction techniques and strengthening constraints.
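For context on the second chapter, the maximum capacity (widest) path value itself can be computed with a Dijkstra-style search that maximizes the bottleneck capacity along a path; the sketch below illustrates only that subproblem, not the interdiction algorithm, and the example graph is hypothetical.

```python
import heapq

def widest_path_capacity(graph, source, target):
    """Maximum bottleneck capacity over all source-target paths.
    graph: dict mapping node -> list of (neighbor, edge_capacity)."""
    best = {source: float("inf")}
    heap = [(-float("inf"), source)]            # max-heap via negated capacities
    while heap:
        cap, u = heapq.heappop(heap)
        cap = -cap
        if u == target:
            return cap
        if cap < best.get(u, 0):                # stale heap entry
            continue
        for v, c in graph.get(u, []):
            bottleneck = min(cap, c)            # a path's capacity is its smallest edge
            if bottleneck > best.get(v, 0):
                best[v] = bottleneck
                heapq.heappush(heap, (-bottleneck, v))
    return 0.0

g = {"s": [("a", 5), ("b", 3)], "a": [("t", 2)], "b": [("t", 3)], "t": []}
print(widest_path_capacity(g, "s", "t"))        # 3: the s-b-t path has bottleneck 3
```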
Contributors: Mitra, Ankan (Author) / Sefair, Jorge A (Thesis advisor) / Mirchandani, Pitu (Committee member) / Grubesic, Anthony (Committee member) / Byeon, Geunyeong (Committee member) / Arizona State University (Publisher)
Created: 2023

Item 154864
Description
Social media has become popular in the past decade. Facebook, for example, has 1.59 billion monthly active users. With such massive social networks generating large amounts of data, everyone is constantly looking for ways of leveraging the knowledge in social networks to make their systems more personalized for their end users. With the rapid increase in the usage of mobile phones and wearables, social media data is also being tied to spatial networks. This research proposes an efficient technique for answering socially k-Nearest Neighbor queries with a spatial range filter. The proposed approach performs a joint search on both the social and spatial domains, which radically improves performance compared to straightforward solutions. The research proposes a novel index that combines social and spatial indexes. In other words, graph data is stored in an organized manner so that it can be filtered at query time based on spatial constraints (a region of interest) and social constraints (the top-k closest vertices). This allows unnecessary paths to be pruned during the social graph traversal procedure, returning only the top-k socially close venues. The research then experimentally demonstrates that the proposed approach outperforms existing baseline approaches by at least a factor of three, and compares how each of the algorithms performs under various conditions on a real geo-social dataset extracted from Yelp.
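The flavor of such a joint social-spatial search can be conveyed with a naive baseline: a breadth-first traversal of the social graph that filters candidate venues by the spatial range and stops once k results are found. This sketch is not the proposed index-based algorithm, and all data structures below are hypothetical.

```python
from collections import deque

def social_knn_in_range(friends, checkins, user, k, in_range):
    """Naive geo-social query: walk the social graph outward from `user` and
    collect up to k venues that socially close users checked into inside the range.
    friends:  dict user -> list of friends (the social graph)
    checkins: dict user -> list of (venue_id, (x, y)) check-ins
    in_range: predicate deciding whether a coordinate lies in the query window."""
    seen, results = {user}, []
    queue = deque([user])
    while queue and len(results) < k:
        u = queue.popleft()
        for venue, loc in checkins.get(u, []):
            if in_range(loc) and len(results) < k:
                results.append((venue, u))        # venue plus the socially close user
        for f in friends.get(u, []):
            if f not in seen:
                seen.add(f)
                queue.append(f)
    return results

friends = {"alice": ["bob"], "bob": ["carol"], "carol": []}
checkins = {"bob": [("cafe", (1, 1))], "carol": [("gym", (9, 9))]}
window = lambda p: 0 <= p[0] <= 5 and 0 <= p[1] <= 5
print(social_knn_in_range(friends, checkins, "alice", k=2, in_range=window))
# [('cafe', 'bob')] -- the gym lies outside the spatial range and is filtered out
```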
Contributors: Pasumarthy, Nitin (Author) / Sarwat, Mohamed (Thesis advisor) / Papotti, Paolo (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)
Created: 2016

Item 154901
Description
Mobile Cloud computing has shown its capability to support mobile devices for provisioning computing, storage, and communication resources. A distributed mobile cloud service system called "POEM" is presented to manage the mobile cloud resources and compose mobile cloud applications. POEM considers resource management not only between mobile devices and clouds, but also among mobile devices. It implements both computation offloading and service composition features. The proposed POEM solution is demonstrated by using OSGi and XMPP techniques.

Offloading is one major type of collaboration between a mobile device and the cloud, used to reduce execution time and energy consumption. Offloading decisions for mobile cloud collaboration involve many decision factors, one of the important ones being network unavailability. This report presents an offloading decision model that takes network unavailability into consideration. The application execution time and energy consumption in both an ideal network and a network with some unavailability are analyzed. Based on the presented theoretical model, an application partition algorithm and a decision module are presented to produce an offloading decision that is resistant to network unavailability.

Existing offloading models mainly focus on the one-to-one offloading relation. To address multi-factor and multi-site offloading scenarios for mobile cloud applications, a multi-factor multi-site risk-based offloading model is presented, which abstracts the offloading impact factors in terms of offloading benefit and offloading risk. The offloading decision is made based on a comprehensive offloading risk evaluation. The presented model is generic and extensible. Four offloading impact factors are presented to show the construction and operation of the model, which can easily be extended to incorporate more factors and make the offloading decision more comprehensive. The overall offloading benefits and risks are aggregated based on the mobile cloud users' preferences.

The offloading topology may change during the application's lifetime. A set of algorithms is presented to address the service topology reconfiguration problem in several representative mobile cloud application scenarios; these are modeled as finite horizon, infinite horizon, and large state space scenarios to represent ad hoc, long-term, and large-scale mobile cloud service composition scenarios, respectively.
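A drastically simplified version of the offloading decision described above can be written as a cost comparison: offload only when the expected remote cost, inflated by the probability that the network is unavailable, beats the local cost. The parameters and weights below are hypothetical, and this is not POEM's partition algorithm.

```python
def should_offload(local_time, local_energy,
                   remote_time, remote_energy,
                   p_unavailable, retry_penalty,
                   w_time=0.5, w_energy=0.5):
    """Weighted time/energy comparison of local execution vs. offloading.
    p_unavailable: probability the network is unavailable during offloading.
    retry_penalty: extra time incurred (e.g., waiting and resending) when it is."""
    expected_remote_time = remote_time + p_unavailable * retry_penalty
    local_cost = w_time * local_time + w_energy * local_energy
    remote_cost = w_time * expected_remote_time + w_energy * remote_energy
    return remote_cost < local_cost

# Hypothetical task: offloading is attractive on a reliable network ...
print(should_offload(10.0, 8.0, 3.0, 1.0, p_unavailable=0.05, retry_penalty=30.0))  # True
# ... but not when the network is often unavailable.
print(should_offload(10.0, 8.0, 3.0, 1.0, p_unavailable=0.60, retry_penalty=30.0))  # False
```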
Contributors: Wu, Huijun (Author) / Huang, Dijiang (Thesis advisor) / Xue, Guoliang (Committee member) / Dasgupta, Partha (Committee member) / Mirchandani, Pitu (Committee member) / Arizona State University (Publisher)
Created: 2016

Item 153342
Description
Resource allocation is one of the most challenging issues policy decision makers must address. The objective of this thesis is to explore resource allocation from an economic perspective, i.e., how to purchase resources in order to satisfy customers' requests. In this thesis, we attempt to answer the question: when and how should resources be bought to fulfill customers' demands at minimum cost?

The first topic studied in this thesis is resource allocation in cloud networks. Cloud computing heralded an era where resources (such as computation and storage) can be scaled up and down elastically and on demand. This flexibility is attractive for its cost effectiveness: the cloud resource price depends on the actual utilization over time. This thesis studies two critical problems in cloud networks, focusing on the economic aspects of resource allocation in cloud/virtual networks, and proposes six algorithms to address the resource allocation problems under different discount models. The first problem considers a scenario where the virtual network provider offers different contracts to the service provider. Four algorithms for resource contract migration are proposed under two pricing models: Pay-as-You-Come and Pay-as-You-Go. The second problem explores a scenario where a cloud provider offers k contracts, each with a duration and a rate, and a customer buys these contracts in order to satisfy its resource demand. This work shows that the problem can be seen as a 2-dimensional generalization of the classic online parking permit problem, and presents a k-competitive online algorithm and an optimal online algorithm.

The second topic studied in this thesis is how resource allocation and purchasing strategies work in our daily life. For example, is it worth buying a yoga pass which costs USD 100 for ten entries, although it will expire at the end of this year? Decisions like these are part of our daily life, yet not much is known today about good online strategies for buying discount vouchers with expiration dates. This work hence introduces the Discount Voucher Purchase Problem (DVPP). It aims to optimize the strategies for buying discount vouchers, i.e., coupons, vouchers, and groupons that are valid only during a certain time period. The DVPP comes in three flavors: (1) Once Expire Lose Everything (OELE): vouchers lose their entire value after expiration. (2) Once Expire Lose Discount (OELD): vouchers lose their discount value after expiration. (3) Limited Purchasing Window (LPW): vouchers have the property of OELE and can only be bought during a certain time window.

This work explores online algorithms with a provable competitive ratio against a clairvoyant offline algorithm, even in the worst case. In particular, this work makes the following contributions: we present a 4-competitive algorithm for OELE, an 8-competitive algorithm for OELD, and a lower bound for LPW. We also present an optimal offline algorithm for OELE and LPW, and show it is a 2-approximation solution for OELD.
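The spirit of such online purchasing strategies can be conveyed by the classic break-even rule from the related ski-rental (rent-or-buy) problem, shown here only as an analogy rather than as the thesis's 4-competitive OELE algorithm: keep paying per use until the accumulated spending reaches the voucher price, then buy.

```python
def break_even_strategy(requests, unit_price, voucher_price):
    """Ski-rental-style online rule: buy the (non-expiring) voucher once the money
    already spent on single entries reaches the voucher price. This simple rule is
    2-competitive for the basic rent-or-buy setting; the expiration dates in DVPP
    require more careful strategies."""
    spent, bought, total = 0.0, False, 0.0
    for _ in requests:
        if bought:
            continue                      # the voucher covers all further requests
        if spent >= voucher_price:
            total += voucher_price        # buy the voucher now
            bought = True
        else:
            spent += unit_price           # keep paying per use
            total += unit_price
    return total

# 15 hypothetical requests: online cost 200.0 vs. 100.0 for a clairvoyant buyer (ratio 2).
print(break_even_strategy(range(15), unit_price=10.0, voucher_price=100.0))
```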
Contributors: Hu, Xinhui (Author) / Richa, Andrea (Thesis advisor) / Schmid, Stefan (Committee member) / Sen, Arunabha (Committee member) / Xue, Guoliang (Committee member) / Arizona State University (Publisher)
Created: 2015