Matching Items (18)
Description
With the rapid development of mobile sensing technologies such as GPS, RFID, and smartphone sensors, capturing position data in the form of trajectories has become easy. Moving object trajectory analysis is a growing area of interest owing to its applications in various domains such as marketing, security, and traffic monitoring and management. To better understand movement behaviors from the raw mobility data, this doctoral work provides analytic models for analyzing trajectory data. As a first contribution, a model is developed to detect changes in trajectories with time. If the taxis moving in a city are viewed as sensors that provide real-time information about traffic in the city, a change in these trajectories with time can reveal that the road network has changed. To detect changes, trajectories are modeled with a Hidden Markov Model (HMM). A modified training algorithm for parameter estimation in HMMs, called m-BaumWelch, is used to develop likelihood estimates under assumed changes and to detect changes in trajectory data with time. Data from vehicles are used to test the method for change detection. Secondly, sequential pattern mining is used to develop a model to detect changes in frequent patterns occurring in trajectory data. The aim is to answer two questions: Are the frequent patterns still frequent in the new data? If they are, has the time interval distribution in the pattern changed? Two approaches are considered for change detection: a frequency-based approach and a distribution-based approach. The methods are illustrated with vehicle trajectory data. Finally, a model is developed for clustering and outlier detection in semantic trajectories. A challenge with clustering semantic trajectories is that both numeric and categorical attributes are present. Another problem to be addressed is that trajectories can be of different lengths and can have missing values. A tree-based ensemble is used to address these problems. The approach is extended to outlier detection in semantic trajectories.
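As an editorial illustration only (not the m-BaumWelch algorithm from the dissertation), the general likelihood-based idea can be sketched as follows: score new trajectory observations under an HMM fitted to a reference period, and flag a change when the per-observation log-likelihood drops. All model parameters, sequences, and the tolerance below are assumed toy values.

```python
import numpy as np

def hmm_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the scaled forward algorithm."""
    alpha = pi * B[:, obs[0]]
    s = alpha.sum()
    ll = np.log(s)
    alpha = alpha / s
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        s = alpha.sum()
        ll += np.log(s)
        alpha = alpha / s
    return ll

def change_detected(ref_obs, new_obs, pi, A, B, tol=0.5):
    """Flag a change when the per-symbol log-likelihood of the new window
    drops by more than tol relative to the reference window."""
    ref = hmm_loglik(ref_obs, pi, A, B) / len(ref_obs)
    new = hmm_loglik(new_obs, pi, A, B) / len(new_obs)
    return bool(ref - new > tol)

# Toy 2-state model: "sticky" states, each preferring one observation symbol.
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.1, 0.9]])   # transition probabilities
B = np.array([[0.9, 0.1], [0.1, 0.9]])   # emission probabilities
reference = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # consistent with the sticky model
shifted = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]    # rapid alternation: poor fit
```

A drop in fit for the `shifted` window, relative to the reference, is the kind of signal that would suggest the underlying road network (and hence the trajectory distribution) has changed.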
Contributors: Kondaveeti, Anirudh (Author) / Runger, George C. (Thesis advisor) / Mirchandani, Pitu (Committee member) / Pan, Rong (Committee member) / Maciejewski, Ross (Committee member) / Arizona State University (Publisher)
Created: 2012
Description
Multidimensional data have various representations. Thanks to their simplicity in modeling multidimensional data and the availability of various mathematical tools (such as tensor decompositions) that support multi-aspect analysis of such data, tensors are increasingly being used in many application domains, including scientific data management, sensor data management, and social network data analysis. The relational model, on the other hand, enables semantic manipulation of data using relational operators such as projection, selection, Cartesian product, and set operators. For many multidimensional data applications, tensor operations as well as relational operations need to be supported throughout the data life cycle. In this thesis, we introduce a tensor-based relational data model (TRM), which enables both tensor-based data analysis and relational manipulation of multidimensional data, and define tensor-relational operations on this model. We then introduce a tensor-relational data management system, called TensorDB, based on TRM, which brings together relational algebraic operations (for data manipulation and integration) and tensor algebraic operations (for data analysis). We develop optimization strategies for tensor-relational operations in both in-memory and in-database TensorDB. The goal of TRM and TensorDB is to serve as a single environment that supports the entire life cycle of data; that is, data can be manipulated, integrated, processed, and analyzed.
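As a hedged sketch of the tensor-relational idea (not TensorDB's actual API), the same multidimensional data can be treated both as a relation of tuples and as the coordinate form of a tensor, so relational manipulation and tensor-algebraic analysis operate on one representation; the schema and values below are hypothetical:

```python
import numpy as np

# A relation of (user, item, time, value) tuples doubles as the coordinate
# (COO) form of a 3-way tensor.
relation = [
    (0, 0, 0, 2.0),
    (0, 1, 0, 1.0),
    (1, 0, 1, 3.0),
    (1, 1, 1, 4.0),
]

def select(rel, pred):
    """Relational selection: keep tuples satisfying a predicate."""
    return [t for t in rel if pred(t)]

def to_tensor(rel, shape):
    """Materialize the relation as a dense tensor for tensor-algebraic ops."""
    T = np.zeros(shape)
    for i, j, k, v in rel:
        T[i, j, k] = v
    return T

# Relational manipulation first, tensor analysis second.
early = select(relation, lambda t: t[2] == 0)   # selection on the time attribute
T = to_tensor(early, (2, 2, 2))
user_activity = T.sum(axis=(1, 2))              # a simple mode aggregation
```

In a real system the dense materialization would be replaced by sparse-tensor operations and decompositions (e.g., CP/Tucker), which is where the optimization strategies described above come into play.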
Contributors: Kim, Mijung (Author) / Candan, K. Selcuk (Thesis advisor) / Davulcu, Hasan (Committee member) / Sundaram, Hari (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
The US Senate is the venue of political debate where federal bills are formed and voted on. Senators show their support for or opposition to bills with their votes. This information makes it possible to extract the polarity of the senators. Similarly, the blogosphere plays an increasingly important role as a forum for public debate. Authors display sentiment toward issues, organizations, or people using natural language.

In this research, given a mixed set of senators/blogs debating a set of political issues from opposing camps, I use signed bipartite graphs for modeling debates, and I propose an algorithm for partitioning both the opinion holders (senators or blogs) and the issues (bills or topics) comprising the debate into binary opposing camps. Simultaneously, my algorithm scales the entities on a univariate scale. Using this scale, a researcher can identify moderate and extreme senators/blogs within each camp, and polarizing versus unifying issues. Through performance evaluations I show that my proposed algorithm provides an effective solution to the problem and performs much better than existing baseline algorithms adapted to solve this new problem. In my experiments, I used both real data from the political blogosphere and US Congress records, as well as synthetic data obtained by varying the polarization and degree distribution of the vertices of the graph, to show the robustness of my algorithm.
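The partitioning-and-scaling idea can be illustrated with a minimal spectral sketch; this is an editorial stand-in, not the dissertation's algorithm, and the signed senator-by-bill vote matrix below is hypothetical (+1 support, -1 oppose, 0 absent):

```python
import numpy as np

# Hypothetical signed senator-by-bill vote matrix.
votes = np.array([
    [ 1,  1, -1, -1],
    [ 1,  1, -1,  0],
    [-1, -1,  1,  1],
    [ 0, -1,  1,  1],
], dtype=float)

# Rank-1 SVD: the leading singular vectors place senators (U) and bills (V)
# on a shared univariate scale; the sign of a score partitions the two camps,
# and its magnitude separates moderates from extremists. Flipping both
# vectors by the same factor fixes the arbitrary SVD sign without changing
# the rank-1 approximation.
U, s, Vt = np.linalg.svd(votes)
orient = np.sign(U[0, 0])                 # anchor orientation to senator 0's camp
senator_scale = U[:, 0] * orient
bill_scale = Vt[0] * orient

senator_camp = senator_scale > 0          # True/False = the two opposing camps
```

The real algorithm simultaneously partitions and scales both vertex sets of the signed bipartite graph; the SVD sketch only conveys why one univariate axis can capture camp membership and extremity at once.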

I also applied my algorithm to all terms of the US Senate to date for longitudinal analysis, and developed a web-based interactive user interface, www.PartisanScale.com, to visualize the analysis.

US politics is most often polarized with respect to the left/right alignment of the entities. However, certain issues do not reflect polarization along party lines, but instead show a split correlating with the demographics of the senators, or simply receive consensus. I propose a hierarchical clustering algorithm that identifies groups of bills sharing the same polarization characteristics. I developed a web-based interactive user interface, www.ControversyAnalysis.com, to visualize the clusters while providing a synopsis through distribution charts, word clouds, and heat maps.
Contributors: Gokalp, Sedat (Author) / Davulcu, Hasan (Thesis advisor) / Sen, Arunabha (Committee member) / Liu, Huan (Committee member) / Woodward, Mark (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
Galaxies with a strong Lyman-alpha (Lya) emission line (also called Lya galaxies or emitters) offer a unique probe of the epoch of reionization - one of the important phases when most of the neutral hydrogen in the universe was ionized. In addition, Lya galaxies at high redshifts are a powerful tool to study low-mass galaxy formation. Since current observations suggest that reionization is complete by redshift z ~ 6, it is necessary to discover galaxies at z > 6 to use their luminosity function (LF) as a probe of reionization. I found five z = 7.7 candidate Lya galaxies with line fluxes > 7x10^-18 erg/s/cm^2, from three different deep near-infrared (IR) narrowband (NB) imaging surveys in a volume > 4x10^4 Mpc^3. From the spectroscopic follow-up of four candidate galaxies, and with the current spectroscopic sensitivity, the detection of only the brightest candidate galaxy can be ruled out at the 5 sigma level. Moreover, these observations successfully demonstrate that the sensitivity necessary for both the NB imaging and the spectroscopic follow-up of z ~ 8 Lya galaxies can be reached with current instrumentation. While future, more sensitive spectroscopic observations are necessary, the observed Lya LF at z = 7.7 is consistent with the z = 6.6 LF, suggesting that the intergalactic medium (IGM) is relatively ionized even at z = 7.7, with neutral fraction x_HI <= 30%. On the theoretical front, while several models of Lya emitters have been developed, the physical nature of Lya emitters is not yet completely known. Moreover, multi-parameter models and their complexities necessitate a simpler model. I have developed a simple, single-parameter model to populate dark matter halos with Lya emitters. The central tenet of this model, different from many earlier models, is that the star-formation rate (SFR), and hence the Lya luminosity, is proportional to the mass accretion rate rather than the total halo mass. This simple model is successful in reproducing many observables, including LFs, stellar masses, SFRs, and the clustering of Lya emitters from z ~ 3 to z ~ 7. Finally, using this model, I find that the mass accretion, and hence the star formation, in > 30% of Lya emitters at z ~ 3 occurs through major mergers, and this fraction increases to ~ 50% at z ~ 7.
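A minimal numeric sketch of the single-parameter idea: luminosity tracks the accretion rate, not the halo mass. The efficiency constant below is an assumed, purely illustrative value, not a fitted parameter from the dissertation.

```python
# Hypothetical efficiency: erg/s of Lya luminosity per (Msun/yr) of accretion.
EFFICIENCY = 1.0e42

def lya_luminosity(accretion_rate_msun_per_yr, efficiency=EFFICIENCY):
    """Lya luminosity (erg/s) taken proportional to the mass accretion rate."""
    return efficiency * accretion_rate_msun_per_yr

# Two halos of equal total mass but different accretion histories get
# different luminosities -- the key departure from mass-proportional models.
quiet, active = lya_luminosity(0.5), lya_luminosity(5.0)
```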
Contributors: Shet Tilvi, Vithal (Author) / Malhotra, Sangeeta (Thesis advisor) / Rhoads, James (Committee member) / Scannapieco, Evan (Committee member) / Young, Patrick (Committee member) / Jansen, Rolf (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
Query expansion is a functionality of search engines that suggests a set of related queries for a user-issued keyword query. In the case of exploratory or ambiguous keyword queries, the main goal of the user is to identify and select a specific category of query results among different categorical options, in order to narrow down the search and reach the desired result. Typical corpus-driven keyword query expansion approaches return popular words in the results as expanded queries. These empirical methods fail to cover all semantics of categories present in the query results. More importantly, these methods do not consider the semantic relationship between the keywords featured in an expanded query. Contrary to a normal keyword search setting, these factors are non-trivial in an exploratory and ambiguous query setting, where the user's precise discernment of the different categories present in the query results is more important for making subsequent search decisions. In this thesis, I propose a new framework for keyword query expansion: generating a set of queries that correspond to the categorization of the original query results, referred to as categorizing query expansion. Two classes of algorithms are proposed: one performs clustering as a pre-processing step and then generates categorizing expanded queries based on the clusters; the other handles the case of generating quality expanded queries in the presence of imperfect clusters.
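As an illustrative sketch (not the dissertation's algorithms), categorizing expansion from a given clustering might append to the original query the terms most distinctive of each result cluster; the clusters and snippets below are hypothetical:

```python
from collections import Counter

def expanded_queries(clusters, query, k=2):
    """For each cluster of result snippets, emit an expanded query made of
    the original keywords plus the k terms most distinctive of that cluster."""
    totals = Counter()
    per_cluster = []
    for docs in clusters:
        c = Counter(w for d in docs for w in d.lower().split())
        per_cluster.append(c)
        totals += c
    out = []
    for c in per_cluster:
        # Distinctiveness: frequency in this cluster minus frequency elsewhere.
        score = {w: n - (totals[w] - n)
                 for w, n in c.items() if w not in query.split()}
        top = sorted(score, key=lambda w: (-score[w], w))[:k]
        out.append(query + " " + " ".join(top))
    return out

# An ambiguous query whose results split into two categories.
qs = expanded_queries(
    [["jaguar car speed", "jaguar car dealer"],
     ["jaguar cat jungle", "jaguar cat habitat"]],
    "jaguar")
```

Each expanded query names one category of results, which is exactly what an exploratory user needs to narrow the search.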
Contributors: Natarajan, Sivaramakrishnan (Author) / Chen, Yi (Thesis advisor) / Candan, Selcuk (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
Most stars form in groups, and these clusters are themselves nestled within larger associations and stellar complexes. It is not yet clear, however, whether stars cluster on preferred size scales within galaxies, or if stellar groupings have a continuous size distribution. I have developed two methods to select stellar groupings across a wide range of size scales in order to assess trends in the size distribution and other basic properties of stellar groupings. The first method uses visual inspection of color-magnitude and color-color diagrams of clustered stars to assess whether the compact sources within the potential association are coeval, and thus likely to be born from the same parent molecular cloud. This method was developed using the stellar associations in the M51/NGC 5195 interacting galaxy system. The process is highly effective at selecting single-aged stellar associations, but in order to assess the properties of stellar clustering in a larger sample of nearby galaxies, an automated method for selecting stellar groupings is needed. I have developed an automated stellar grouping selection method that is sensitive to stellar clustering on all size scales. Using the Source Extractor software package on Gaussian-blurred images, and the annular surface brightness to determine the characteristic size of each cluster/association, I eliminate much of the size and density bias intrinsic to other methods. This automated method was tested in the nearby dwarf irregular galaxy NGC 4214, and can detect stellar groupings with sizes ranging from compact clusters to stellar complexes. In future work, the automated selection method developed in this dissertation will be used to identify stellar groupings in a set of nearby galaxies to determine whether the size scales for stellar clustering are uniform in the nearby universe or depend on the local galactic environment.
Once the stellar clusters and associations have been identified and age-dated, this information can be used to deduce disruption times from the age distribution as a function of the position of the stellar grouping within the galaxy, the size of the cluster or association, and the morphological type of the galaxy. The implications of these results for galaxy formation and evolution are discussed.
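The annular-surface-brightness idea can be sketched as follows; this is an editorial illustration on a synthetic image, not the dissertation's Source Extractor pipeline:

```python
import numpy as np

def annular_profile(img, cx, cy, r_max, dr=1.0):
    """Mean surface brightness in circular annuli around (cx, cy)."""
    y, x = np.indices(img.shape)
    r = np.hypot(x - cx, y - cy)
    edges = np.arange(0.0, r_max + dr, dr)
    prof = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (r >= lo) & (r < hi)
        prof.append(img[mask].mean() if mask.any() else 0.0)
    return np.array(prof)

def characteristic_radius(profile, frac=0.5, dr=1.0):
    """Radius where the profile first falls below frac of its central value --
    one simple notion of a grouping's characteristic size."""
    below = np.nonzero(profile < frac * profile[0])[0]
    return float(below[0] * dr) if below.size else float(len(profile) * dr)

# Synthetic test image: a flat, disk-like grouping of radius 5 pixels.
y, x = np.indices((41, 41))
disk = (np.hypot(x - 20, y - 20) < 5).astype(float)
prof = annular_profile(disk, 20, 20, r_max=10)
```

Because the size estimate comes from the brightness profile rather than a fixed detection aperture, it adapts from compact clusters to extended complexes, which is the property the automated method relies on.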
Contributors: Kaleida, Catherine (Author) / Scowen, Paul A. (Thesis advisor) / Windhorst, Rogier A. (Thesis advisor) / Jansen, Rolf A. (Committee member) / Timmes, Francis X. (Committee member) / Scannapieco, Evan (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
The ever-changing economic landscape has forced many companies to re-examine their supply chains. Global resourcing and outsourcing of processes has been a strategy many organizations have adopted to reduce cost and to increase their global footprint. This has, however, resulted in increased process complexity and reduced customer satisfaction. In order to meet and exceed customer expectations, many companies are forced to improve quality and on-time delivery, and have looked toward Lean Six Sigma as an approach to enable process improvement. The Lean Six Sigma literature is rich in deployment strategies; however, there is a general lack of a mathematical approach to deploying Lean Six Sigma in a global enterprise, including both project identification and prioritization. The research presented here is two-fold. First, a process characterization framework is presented to evaluate processes based on eight characteristics. An unsupervised learning technique, using clustering algorithms, is then utilized to group processes that are Lean Six Sigma conducive. The approach helps Lean Six Sigma deployment champions identify key areas within the business on which to focus a Lean Six Sigma deployment. A case study is presented, in which 33% of the processes were found to be Lean Six Sigma conducive. Second, having identified the parts of the business that are Lean Six Sigma conducive, the next steps are to formulate and prioritize a portfolio of projects. Very often the deployment champion is faced with the decision of selecting a portfolio of Lean Six Sigma projects that meets multiple objectives, which could include maximizing productivity, customer satisfaction, or return on investment, while meeting certain budgetary constraints. A multi-period 0-1 knapsack problem is presented that maximizes the expected net savings of the Lean Six Sigma portfolio over the life cycle of the deployment.
Finally, a case study is presented that demonstrates the application of the model in a large multinational company. Traditionally, Lean Six Sigma found its roots in manufacturing. The research presented in this dissertation also emphasizes the applicability of the methodology to the non-manufacturing space. Additionally, a comparison is conducted between manufacturing and non-manufacturing processes to highlight the challenges in deploying the methodology in both spaces.
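The portfolio-selection step can be illustrated with a standard single-period 0-1 knapsack, a simplification of the multi-period model described above; the project names, costs, and savings are hypothetical:

```python
def select_projects(projects, budget):
    """0-1 knapsack by dynamic programming: choose the portfolio of projects
    that maximizes expected net savings within an integer budget.
    projects: list of (name, cost, expected_savings)."""
    # dp[b] = (best savings, chosen project names) achievable with budget b.
    dp = [(0.0, [])] * (budget + 1)
    for name, cost, savings in projects:
        new = dp[:]
        for b in range(cost, budget + 1):
            cand = dp[b - cost][0] + savings
            if cand > new[b][0]:
                new[b] = (cand, dp[b - cost][1] + [name])
        dp = new
    return dp[budget]

# Hypothetical candidate projects: (name, cost, expected net savings).
projects = [("P1", 4, 10.0), ("P2", 3, 7.0), ("P3", 5, 11.0), ("P4", 2, 3.0)]
best_savings, portfolio = select_projects(projects, budget=9)
```

The multi-period model adds per-period budget constraints and savings that accrue over the deployment life cycle, but the binary select/reject structure is the same.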
Contributors: Duarte, Brett Marc (Author) / Fowler, John W (Thesis advisor) / Montgomery, Douglas C. (Thesis advisor) / Shunk, Dan (Committee member) / Borror, Connie (Committee member) / Konopka, John (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
The following research is a regulatory and emissions analysis of collocated sources of air pollution as they relate to the definition of "major, stationary, sources" if their emissions were amalgamated. The emitting sources chosen for this study are seven facilities located in a single aggregate mining pit along the Agua Fria riverbed in Sun City, Arizona. The sources in question consist of Rock Crushing and Screening plants, Hot Mix Asphalt plants, and Concrete Batch plants. Generally, individual facilities with emissions of a criteria air pollutant over 100 tons per year, or 70 tons per year for PM10 in the Maricopa County non-attainment area, would be required to operate under a different permitting regime than those with lower emissions. In addition, facilities that emit over 25 tons per year or 150 pounds per hour of NOx would trigger Maricopa County Best Available Control Technology (BACT) requirements and would be required to install more stringent pollution controls. However, in order to circumvent the more stringent permitting requirements, some facilities have "collocated" to escape having their emissions calculated as a single source, while operating as a single production entity. The results of this study indicate that the sources analyzed do not collectively emit major-source levels of emissions; however, they do trigger annual and daily BACT for NOx. It was also discovered that a lack of grid power contributes to the use of generators, which are the main source of emissions. Therefore, if grid electricity were introduced in outlying areas of Maricopa County, facilities could significantly reduce their use of generator power, thereby reducing the pollutants associated with generator use.
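The aggregation logic can be sketched numerically. The thresholds are those quoted in the analysis above; the per-facility emission figures are hypothetical, chosen only to illustrate the comparison:

```python
# Regulatory thresholds quoted above (tons per year).
MAJOR_SOURCE_TPY = 100.0    # criteria pollutant major-source threshold
PM10_NONATTAIN_TPY = 70.0   # PM10 threshold in the Maricopa non-attainment area
NOX_BACT_TPY = 25.0         # Maricopa County BACT trigger for NOx

def aggregate(facilities):
    """Sum emissions across collocated facilities, per pollutant."""
    totals = {}
    for f in facilities:
        for pollutant, tpy in f.items():
            totals[pollutant] = totals.get(pollutant, 0.0) + tpy
    return totals

# Hypothetical annual emissions for three collocated facilities.
facilities = [
    {"PM10": 12.0, "NOx": 9.0},   # rock crushing/screening
    {"PM10": 20.0, "NOx": 14.0},  # hot mix asphalt
    {"PM10": 8.0,  "NOx": 6.0},   # concrete batch
]
totals = aggregate(facilities)
is_major_pm10 = totals["PM10"] >= PM10_NONATTAIN_TPY
triggers_nox_bact = totals["NOx"] >= NOX_BACT_TPY
```

With these illustrative numbers the amalgamated site stays below the major-source PM10 threshold yet crosses the NOx BACT trigger, mirroring the study's finding.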
Contributors: Franquist, Timothy S (Author) / Olson, Larry (Thesis advisor) / Hild, Nicholas (Committee member) / Brown, Albert (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
This research starts by utilizing an efficient sparse inverse covariance matrix (precision matrix) estimation technique to identify a set of highly correlated discriminative perspectives between radical and counter-radical groups. A ranking system has been developed that utilizes the ranked perspectives to map Islamic organizations on a set of socio-cultural, political, and behavioral scales based on their website corpora. Simultaneously, a gold-standard ranking of these organizations was created by domain experts; expert-to-expert agreements were computed, and experimental results are presented comparing the performance of the QUIC-based scaling system to a baseline method. The QUIC-based algorithm not only outperforms the baseline methods, but is also the only system that consistently performs at area-expert-level accuracies on all scales. Next, a multi-scale ideological model has been developed to investigate the correlates of Islamic extremism in Indonesia, Nigeria, and the UK. This analysis demonstrates that violence does not correlate strongly with broad Muslim theological or sectarian orientations; it shows that religious diversity intolerance is the only consistent and statistically significant ideological correlate of Islamic extremism in these countries, alongside a desire for political change in the UK and Indonesia, and for social change in Nigeria. Then, a dynamic issue and community tracking system based on an NMF (Non-negative Matrix Factorization) co-clustering algorithm has been built to better understand the dynamics of virtual communities. The system was used, in the context of relations between Iran and Saudi Arabia, to build and apply a multi-party agent-based model that demonstrates the role of wedges and spoilers in a complex environment where coalitions are dynamic.
Lastly, a visual intelligence platform called LookingGlass has been developed for tracking the diffusion of online social movements: the geographical footprint, shifting positions, and flows of individuals, topics, and perspectives between groups. The algorithm utilizes large amounts of text collected from a wide variety of organizations' media outlets to discover hotly debated topics and the discriminative perspectives voiced by opposing camps, organized into multiple scales. These discriminative perspectives are utilized to classify and map individual Twitter users' message content to social movements based on the perspectives expressed in their tweets.
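A simplified stand-in for the precision-matrix step can be sketched as follows. This is not QUIC itself (QUIC solves an l1-penalized maximum-likelihood problem); inverting a regularized sample covariance and zeroing small entries merely illustrates how sparse off-diagonal precision entries flag conditionally dependent perspective pairs. The data matrix is hypothetical:

```python
import numpy as np

def sparse_precision(X, threshold=0.1):
    """Simplified sparse precision estimate: invert a ridge-regularized
    sample covariance and zero out small entries. Nonzero off-diagonal
    entries mark conditionally dependent variable pairs."""
    cov = np.cov(X, rowvar=False)
    cov += 1e-3 * np.eye(cov.shape[0])   # ridge term for invertibility
    prec = np.linalg.inv(cov)
    prec[np.abs(prec) < threshold] = 0.0
    return prec

# Hypothetical observations of three "perspective" variables: the first two
# are correlated with each other, the third is independent of both.
X = np.array([
    [ 1.0,  1.5,  1.0],
    [ 1.0,  0.5, -1.0],
    [-1.0, -1.5,  1.0],
    [-1.0, -0.5, -1.0],
])
P = sparse_precision(X)
```

In the dissertation's setting, the surviving off-diagonal entries identify the highly correlated discriminative perspective pairs separating radical from counter-radical groups.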
Contributors: Kim, Nyunsu (Author) / Davulcu, Hasan (Thesis advisor) / Sen, Arunabha (Committee member) / Hsiao, Sharon (Committee member) / Corman, Steven (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
Recent technological advances enable the collection of various complex, heterogeneous, and high-dimensional data in biomedical domains. The increasing availability of high-dimensional biomedical data creates the need for new machine learning models for effective data analysis and knowledge discovery. This dissertation introduces several unsupervised and supervised methods to help understand the data, discover patterns, and improve decision making. All the proposed methods can generalize to other industrial fields.

The first topic of this dissertation focuses on data clustering. Data clustering is often the first step in analyzing a dataset without label information. Clustering high-dimensional data with mixed categorical and numeric attributes remains a challenging yet important task. A clustering algorithm based on tree ensembles, CRAFTER, is proposed to tackle this task in a scalable manner.

The second part of this dissertation aims to develop data representation methods for genome sequencing data, a special type of high-dimensional data in the biomedical domain. The proposed data representation method, Bag-of-Segments, can summarize the key characteristics of the genome sequence into a small number of features with good interpretability.
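A hedged sketch of a Bag-of-Segments-style representation (an editorial illustration, with hand-picked prototype segments rather than the learned vocabulary the dissertation would use):

```python
def bag_of_segments(sequence, seg_len, vocabulary):
    """Summarize a long sequence as counts over a small segment vocabulary.
    Each non-overlapping segment is assigned to the nearest prototype
    (squared Euclidean distance), and the sequence is represented by the
    histogram of prototype assignments."""
    counts = [0] * len(vocabulary)
    for start in range(0, len(sequence) - seg_len + 1, seg_len):
        seg = sequence[start:start + seg_len]
        dists = [sum((a - b) ** 2 for a, b in zip(seg, proto))
                 for proto in vocabulary]
        counts[dists.index(min(dists))] += 1
    return counts

# Hypothetical two-prototype vocabulary: "low" and "high" segments.
vocab = [[0.0, 0.0], [1.0, 1.0]]
signal = [0.1, -0.1, 0.9, 1.1, 1.0, 0.8, 0.0, 0.2]
features = bag_of_segments(signal, seg_len=2, vocabulary=vocab)
```

The histogram is a small, fixed-length feature vector regardless of sequence length, and each feature is directly interpretable as "how often this segment shape occurs" -- the interpretability property highlighted above.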

The third part of this dissertation introduces an end-to-end deep neural network model, GCRNN, for time series classification, with emphasis on both accuracy and interpretability. GCRNN contains a convolutional network component to extract high-level features and a recurrent network component to enhance the modeling of temporal characteristics. A feed-forward fully connected network with sparse group lasso regularization is used to generate the final classification and provide good interpretability.

The last topic centers on dimensionality reduction methods for time series data. A good dimensionality reduction method is important for the storage, decision making, and pattern visualization of time series data. The CRNN autoencoder is proposed to not only achieve low reconstruction error, but also generate discriminative features. A variational version of this autoencoder has great potential for applications such as anomaly detection and process control.
Contributors: Lin, Sangdi (Author) / Runger, George C. (Thesis advisor) / Kocher, Jean-Pierre A (Committee member) / Pan, Rong (Committee member) / Escobedo, Adolfo R. (Committee member) / Arizona State University (Publisher)
Created: 2018