Matching Items (951)
Description
Given the process of tumorigenesis, biological signaling pathways have become of interest in the field of oncology. Many of the regulatory mechanisms that are altered in cancer are directly related to signal transduction and cellular communication. Thus, identifying signaling pathways that have become deregulated may provide useful information for better understanding the altered regulatory mechanisms within cancer. Many methods created to measure the activity of signaling pathways have relied strictly upon transcription profiles. With advancements in comparative genomic hybridization techniques, copy number data has become extremely useful in characterizing the genomic landscape of cancer. The purpose of this thesis is to develop a methodology that incorporates both gene expression and copy number data to identify signaling pathways that have become deregulated in cancer. The central idea is that copy number data may significantly assist in identifying signaling pathway deregulation by justifying the aberrant activity being measured in gene expression profiles. The method was then applied to four subtypes of breast cancer, identifying signaling pathways associated with distinct functionality for each subtype.
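A minimal sketch of the central idea, assuming per-gene expression z-scores and discrete copy-number calls (-1/0/+1) are already in hand; the function name, the 2x concordance weight, and the toy gene list are illustrative, not the thesis's actual formulation.

```python
import numpy as np

def pathway_deregulation_score(expr_z, cn_call, pathway_genes):
    """Score one pathway: average absolute expression change, up-weighting
    genes whose copy-number call agrees in sign with their expression shift.
    expr_z, cn_call: dicts mapping gene -> z-score / copy-number call (-1, 0, +1)."""
    scores = []
    for gene in pathway_genes:
        z = expr_z.get(gene, 0.0)
        cn = cn_call.get(gene, 0)
        # A concordant copy-number change "justifies" the expression aberration.
        weight = 2.0 if cn != 0 and np.sign(z) == cn else 1.0
        scores.append(weight * abs(z))
    return float(np.mean(scores)) if scores else 0.0

# Toy example: a pathway whose amplified genes are also over-expressed.
expr_z = {"ERBB2": 3.1, "GRB7": 2.4, "TP53": -1.8}
cn_call = {"ERBB2": +1, "GRB7": +1, "TP53": 0}
print(pathway_deregulation_score(expr_z, cn_call, ["ERBB2", "GRB7", "TP53"]))
```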
Contributors: Trevino, Robert (Author) / Kim, Seungchan (Thesis advisor) / Ringner, Markus (Committee member) / Liu, Huan (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
Detecting anatomical structures, such as the carina, the pulmonary trunk, and the aortic arch, is an important step in designing a CAD system for detecting Pulmonary Embolism. The presented CAD system dispenses with high-level predefined knowledge, so it can easily be extended to detect other anatomic structures. The system is based on a machine learning algorithm, AdaBoost, and a general feature type, Haar features. This study emphasizes both off-line and on-line AdaBoost learning; for on-line AdaBoost, the thesis further addresses extremely imbalanced conditions. The thesis first reviews several knowledge-based detection methods, which rely on human understanding of the relationships between anatomic structures. It then introduces classic off-line AdaBoost learning and applies a different cascading scheme, the multi-exit cascade; a comparison between the two methods is provided and discussed. Both off-line AdaBoost methods have problems with memory usage and training time: all training samples must be stored, the dataset must be fixed before training and cannot be enlarged dynamically, and a different training dataset requires retraining the whole process, which is very time consuming and often impractical. To address these shortcomings of off-line learning, the study adopts an on-line AdaBoost learning approach and proposes a novel pool-based on-line method that uses Kalman filters and histograms to better represent the distribution of the samples' weights. Analyses of the performance, stability, and computational complexity are provided. Furthermore, the original on-line AdaBoost performs badly under imbalanced conditions, which occur frequently in medical image processing: positive samples are limited while negative samples are nearly unlimited. A novel Self-Adaptive Asymmetric On-line Boosting method is therefore presented. The method uses a new asymmetric loss criterion that self-adapts according to the ratio of exposed positive and negative samples, and an improved rule for updating a sample's importance weight that accounts for both the classification result and the sample's label. Compared to the traditional on-line AdaBoost learning method, the new method achieves far higher accuracy under imbalanced conditions.
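A toy sketch of an asymmetric, self-adaptive importance-weight update in the spirit described above; the exact loss criterion and update rule are the thesis's own, so the scaling here (amplifying mistakes on positives by the observed negative-to-positive ratio) is only an assumption.

```python
def asymmetric_importance_update(weight, label, predicted, n_pos, n_neg):
    """Update one sample's importance weight during an on-line boosting pass.

    The update is asymmetric: errors on positives are amplified by the
    current class-imbalance ratio, so scarce positives are not drowned out
    by the stream of negatives. Both the classification result and the
    sample's label enter the rule, as in the thesis's description."""
    imbalance = max(n_neg, 1) / max(n_pos, 1)   # self-adapts as samples arrive
    if label == predicted:
        return weight * 0.9                     # damp the weight of easy samples
    # Misclassified: boost the weight, much more strongly for positives.
    boost = imbalance if label == 1 else 1.0
    return weight * (1.0 + boost)

# A stream with ~1 positive per 50 negatives: a missed positive gains
# far more importance than a missed negative.
print(asymmetric_importance_update(1.0, 1, 0, n_pos=2, n_neg=100))  # 51.0
print(asymmetric_importance_update(1.0, 0, 1, n_pos=2, n_neg=100))  # 2.0
```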
Contributors: Wu, Hong (Author) / Liang, Jianming (Thesis advisor) / Farin, Gerald (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
Action language C+ is a formalism for describing properties of actions, which is based on nonmonotonic causal logic. The definite fragment of C+ is implemented in the Causal Calculator (CCalc), which is based on the reduction of nonmonotonic causal logic to propositional logic. This thesis describes the language of CCalc in terms of answer set programming (ASP), based on the translation of nonmonotonic causal logic to formulas under the stable model semantics. I designed a standard library that describes the constructs of the input language of CCalc in terms of ASP, allowing a simple, modular method for representing CCalc input programs in the language of ASP. Using the combination of the system F2LP and answer set solvers, this method achieves functionality close to that of CCalc while taking advantage of answer set solvers to yield computation that is orders of magnitude faster than CCalc on many benchmark examples. In support of this, I created an automated translation system, Cplus2ASP, that implements the translation and encoding method and automatically invokes the necessary software to solve the translated input programs.
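A sketch of the downstream tool chain, assuming f2lp and the answer set solver clingo are on the PATH, that f2lp writes its ASP translation to standard output, and that clingo accepts a program on stdin via "-"; the file name is illustrative.

```python
import subprocess

def solve_translated_program(f2lp_input_path):
    """Run the F2LP-plus-solver pipeline: translate first-order formulas
    under the stable model semantics into ASP rules, then ground and
    solve them with an answer set solver."""
    # Translate formulas to ASP rules (assumed to be printed to stdout).
    translated = subprocess.run(
        ["f2lp", f2lp_input_path],
        capture_output=True, text=True, check=True
    ).stdout
    # Feed the translated program to clingo and collect the answer sets.
    # (No check=True: clingo signals SAT/UNSAT through its exit code.)
    result = subprocess.run(
        ["clingo", "-"], input=translated,
        capture_output=True, text=True
    )
    return result.stdout

# e.g. print(solve_translated_program("monkey_bananas.f2lp"))
```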
Contributors: Casolary, Michael (Author) / Lee, Joohyung (Thesis advisor) / Ahn, Gail-Joon (Committee member) / Baral, Chitta (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
Multi-task learning (MTL) aims to improve the generalization performance of the resulting classifiers by learning multiple related tasks simultaneously. Specifically, MTL exploits the intrinsic relatedness of the tasks, based on which informative domain knowledge from each task can be shared across tasks, facilitating the learning of each individual task. Sharing domain knowledge among the tasks is particularly desirable when there are a number of related tasks but only limited training data is available for each. Modeling the relationship of multiple tasks is critical to the generalization performance of MTL algorithms. In this dissertation, I propose a series of MTL approaches which assume that multiple tasks are intrinsically related via a shared low-dimensional feature space. The proposed approaches address different scenarios and settings; each is formulated as a mathematical optimization problem that minimizes an empirical loss regularized by a different structure. For all proposed MTL formulations, I develop the associated optimization algorithms to find their globally optimal solutions efficiently. I also conduct theoretical analysis for certain MTL approaches by deriving the globally optimal solution recovery condition and the performance bound. To demonstrate practical performance, I apply the proposed MTL approaches to different real-world applications: (1) automated annotation of Drosophila gene expression pattern images, and (2) categorization of Yahoo web pages. The experimental results demonstrate the efficiency and effectiveness of the proposed algorithms.
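One standard instance of such a formulation, sketched under the assumption of a least-squares loss with trace-norm regularization (which induces a shared low-rank, i.e. low-dimensional, feature subspace) and solved by generic proximal gradient descent; this is not the dissertation's exact algorithm.

```python
import numpy as np

def mtl_trace_norm(Xs, ys, lam=1.0, iters=200):
    """Minimize sum_t ||X_t w_t - y_t||^2 + lam * ||W||_* over W = [w_1 .. w_T].
    The trace norm ||W||_* couples the tasks through a shared low-rank
    subspace. Xs, ys: per-task data with a common feature dimension d."""
    d, T = Xs[0].shape[1], len(Xs)
    W = np.zeros((d, T))
    # Step size from a crude Lipschitz bound on the smooth part.
    step = 0.5 / max(np.linalg.norm(X, 2) ** 2 for X in Xs)
    for _ in range(iters):
        # Gradient step on the per-task least-squares losses.
        G = np.column_stack([2 * Xs[t].T @ (Xs[t] @ W[:, t] - ys[t])
                             for t in range(T)])
        A = W - step * G
        # Proximal step: singular-value soft thresholding enforces low rank.
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        W = U @ np.diag(np.maximum(s - step * lam, 0.0)) @ Vt
    return W

# Two related toy tasks that share one underlying weight direction.
rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(50, 10)), rng.normal(size=(50, 10))
w_true = rng.normal(size=10)
W = mtl_trace_norm([X1, X2], [X1 @ w_true, X2 @ (0.9 * w_true)], lam=5.0)
print(np.linalg.svd(W, compute_uv=False).round(2))  # energy concentrates in one direction
```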
Contributors: Chen, Jianhui (Author) / Ye, Jieping (Thesis advisor) / Kumar, Sudhir (Committee member) / Liu, Huan (Committee member) / Xue, Guoliang (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
Navigating within non-linear structures is a challenge for all users when the space is large, but the problem is most pronounced when the users are blind or visually impaired. Such users access digital content through screen readers like JAWS, which read out the text on the screen. However, presenting non-linear narratives in this manner, without visual cues or information about spatial dependencies, is very inefficient for such users. The NSDL Science Literacy StrandMaps are visual layouts that help students and teachers browse educational resources. A Strandmap shows relationships between concepts and how they build upon one another across grade levels. NSDL Strandmaps are non-linear narratives which need to be presented to users who are blind in an effective way. A good summary of a Strandmap can give users an idea of the concepts explained in it, helping them decide whether to view the map or not. In addition, a preview-based navigation mechanism can help users decide which direction to take, based on a preview of upcoming content in each direction. Given a non-linear narrative like a Strandmap, which has both text and structure, and a word limit w, the goal of this thesis is to find the best way to create its summary. The following approaches are considered:
– Purely Text-based Approach using a Multi-document Text Summarizer
– Purely Structure-based Approach using PageRank
– Approaches Combining both Text and Structure
→ CUTS-Based Approach (Topic Segmentation)
→ PageRank with Content
Since no reference summaries for such structures were available, user studies were conducted to evaluate these algorithms. The PageRank with Content approach performed best. Another important conclusion was that text and structure are intertwined in a Strandmap by design.
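A sketch of the best-performing idea, under illustrative assumptions: each node's text is reduced to a scalar content score that biases the PageRank teleport vector, and top-ranked concept descriptions are emitted until the word limit w is exhausted. The content scoring and damping values are placeholders, not the thesis's tuned choices.

```python
import numpy as np

def content_biased_pagerank(adj, content_score, d=0.85, iters=100):
    """PageRank whose teleport distribution is biased by per-node content
    scores, so both structure and text determine a concept's rank.
    adj: n x n matrix with adj[i, j] = 1 for an edge i -> j."""
    n = len(adj)
    deg = adj.sum(axis=1, keepdims=True)
    M = np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0).T
    v = content_score / content_score.sum()   # content-biased teleport vector
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = d * (M @ r) + (1 - d) * v
    return r

def summarize(node_texts, adj, w):
    """Emit the highest-ranked concept descriptions within the word limit w."""
    # Placeholder content score; the thesis derives these from the node text.
    score = np.array([float(len(set(t.lower().split()))) for t in node_texts])
    order = np.argsort(-content_biased_pagerank(adj, score))
    summary, used = [], 0
    for i in order:
        words = len(node_texts[i].split())
        if used + words <= w:
            summary.append(node_texts[i])
            used += words
    return " ".join(summary)

adj = np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]], dtype=float)
texts = ["Plants make food from light", "Cells use energy",
         "Energy flows through ecosystems"]
print(summarize(texts, adj, w=10))
```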
Contributors: Gaur, Shruti (Author) / Candan, Kasim Selcuk (Thesis advisor) / Sundaram, Hari (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
In order to catch the smartest criminals in the world, digital forensics examiners need a means of collaborating and sharing information with each other and outside experts that is not prohibitively difficult. However, standard operating procedures and the rules of evidence generally disallow the use of the collaboration software and techniques that are currently available because they do not fully adhere to the dictated procedures for the handling, analysis, and disclosure of items relating to cases. The aim of this work is to conceive and design a framework that provides a completely new architecture that 1) can perform fundamental functions that are common and necessary to forensic analyses, and 2) is structured such that it is possible to include collaboration-facilitating components without changing the way users interact with the system sans collaboration. This framework is called the Collaborative Forensic Framework (CUFF). CUFF is constructed from four main components: Cuff Link, Storage, Web Interface, and Analysis Block. With the Cuff Link acting as a mediator between components, CUFF is flexible in both the method of deployment and the technologies used in implementation. The details of a realization of CUFF are given, which uses a combination of Java, the Google Web Toolkit, Django with Apache for a RESTful web service, and an Ubuntu Enterprise Cloud using Eucalyptus. The functionality of CUFF's components is demonstrated by the integration of an acquisition script designed for Android OS-based mobile devices that use the YAFFS2 file system. While this work has obvious application to examination labs which work under the mandate of judicial or investigative bodies, security officers at any organization would benefit from the improved ability to cooperate in electronic discovery efforts and internal investigations.
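A schematic sketch of the mediator role described above; all names and the message format here are hypothetical. The point is that components never address each other directly, so collaboration-facilitating components can be added as just another registration without changing existing interactions.

```python
class CuffLink:
    """Toy mediator: routes messages between registered components so that
    Storage, Analysis, and the Web Interface never depend on each other
    directly."""
    def __init__(self):
        self.components = {}

    def register(self, name, handler):
        # A new collaboration component is simply another registration.
        self.components[name] = handler

    def send(self, target, message):
        # All inter-component traffic flows through the mediator.
        return self.components[target](message)

link = CuffLink()
link.register("storage", lambda m: f"stored: {m}")
link.register("analysis", lambda m: f"analyzed: {m}")
print(link.send("storage", "disk image #42"))
print(link.send("analysis", "yaffs2 partition dump"))
```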
Contributors: Mabey, Michael Kent (Author) / Ahn, Gail-Joon (Thesis advisor) / Yau, Stephen S. (Committee member) / Huang, Dijiang (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
This dissertation studies routing in small-world networks such as grids augmented with long-range edges, as well as real networks. Kleinberg showed that geography-based greedy routing in a grid-based network takes an expected number of steps polylogarithmic in the network size, justifying the empirical efficiency observed since Milgram's experiments. A counterpart for the grid-based model is provided: a construction that creates all edges deterministically and admits an asymptotically matching upper bound on route length. The main goal is to improve greedy routing through a decentralized machine-learning process. The two methods considered are based on weighted majority and on an algorithm of de Farias and Megiddo, both of which learn from feedback using ensembles of experts. Tests are run on both artificial and real networks, with decentralized spectral graph embedding supplying geometric information for real networks where it is not intrinsically available. An important measure analyzed in this work is overpayment, the difference between the cost of the method and that of the shortest path. Adaptive routing overtakes greedy routing after about a hundred or fewer searches per node, consistently across different network sizes and types. Learning stabilizes, typically at an overpayment of a third to a half of greedy's. The problem is made more difficult by eliminating knowledge of neighbors' locations or by introducing uncooperative nodes; even under these conditions, the learned routes are usually better than the greedy routes. The second part of the dissertation concerns the community structure of unannotated networks. A modularity-based algorithm of Newman is extended to work with overlapping communities (including considerably overlapping ones), where each node locally decides to which potential communities it belongs. To measure the quality of a cover of overlapping communities, a notion of a node's contribution to modularity is introduced, and the notion of modularity is then extended from partitions to covers. The final part considers network anonymization, mostly by means of edge deletion, with a focus on utility preservation. It is shown that concentrating on preserving routing ability can damage the preservation of community structure, and vice versa.
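A sketch of Kleinberg-style greedy routing on an n x n grid, where each node knows its grid neighbors plus one long-range contact and always forwards to whichever neighbor is geographically closest to the target. The uniformly random contacts below are a placeholder; Kleinberg's harmonic distribution (and the dissertation's deterministic counterpart) is what yields polylogarithmic routes.

```python
import random

def lattice_dist(a, b):
    """Manhattan distance: the only geographic information nodes use."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def greedy_route(n, source, target, contact, max_steps=10_000):
    """Forward to whichever neighbor (grid or long-range contact) lies
    geographically closest to the target, using purely local knowledge."""
    cur, steps = source, 0
    while cur != target and steps < max_steps:
        x, y = cur
        candidates = [(x + dx, y + dy)
                      for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                      if 0 <= x + dx < n and 0 <= y + dy < n]
        candidates.append(contact[cur])
        cur = min(candidates, key=lambda v: lattice_dist(v, target))
        steps += 1
    return steps

n = 32
random.seed(1)
# Placeholder: one uniformly random long-range contact per node.
contact = {(x, y): (random.randrange(n), random.randrange(n))
           for x in range(n) for y in range(n)}
print(greedy_route(n, (0, 0), (n - 1, n - 1), contact))
```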
Contributors: Bakun, Oleg (Author) / Konjevod, Goran (Thesis advisor) / Richa, Andrea (Thesis advisor) / Syrotiuk, Violet R. (Committee member) / Czygrinow, Andrzej (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
Templates are widely used in Web site development. Finding the template for a given set of Web pages is important and useful for many applications, such as Web page classification and monitoring changes to the content and structure of Web pages. In this thesis, two novel sequence-based Web page template detection algorithms are presented. Unlike tree-mapping algorithms, which are based on tree edit distance, the sequence-based template detection algorithms operate on the Prüfer/Consolidated Prüfer sequences of trees. Since there is a one-to-one correspondence between Prüfer/Consolidated Prüfer sequences and trees, the sequence-based algorithms identify the template by finding a common subsequence between two Prüfer/Consolidated Prüfer sequences; this subsequence is a sequential representation of a common subtree of the input trees. Experiments on real-world Web pages showed that the approaches detect templates effectively and efficiently.
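A minimal sketch of the two ingredients, using plain Prüfer sequences (the Consolidated Prüfer variant additionally carries node labels) and a textbook dynamic-programming longest common subsequence; the edge lists below are toy trees, not real DOM trees.

```python
def prufer_sequence(edges, n):
    """Prüfer sequence of a labeled tree on nodes 1..n given as an edge
    list: repeatedly remove the smallest-labeled leaf and record its
    neighbor. Sequence and tree determine each other uniquely."""
    adj = {v: set() for v in range(1, n + 1)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seq = []
    for _ in range(n - 2):
        leaf = min(v for v in adj if len(adj[v]) == 1)
        neighbor = adj[leaf].pop()
        adj[neighbor].discard(leaf)
        del adj[leaf]
        seq.append(neighbor)
    return seq

def lcs(a, b):
    """Textbook dynamic-programming longest common subsequence: the
    sequential stand-in for a common subtree (the candidate template)."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if a[i] == b[j]
                                else max(dp[i][j + 1], dp[i + 1][j]))
    out, i, j = [], m, n
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1]); i -= 1; j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]

t1 = prufer_sequence([(1, 4), (2, 4), (3, 4), (4, 5)], 5)   # [4, 4, 4]
t2 = prufer_sequence([(1, 4), (2, 4), (4, 5), (3, 5)], 5)   # [4, 4, 5]
print(lcs(t1, t2))                                          # [4, 4]
```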
Contributors: Huang, Wei (Author) / Candan, Kasim Selcuk (Thesis advisor) / Sundaram, Hari (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
The digital forensics community has neglected email forensics as a process, despite the fact that email remains an important tool in the commission of crime. Current forensic practices focus mostly on disk forensics, with email forensics left as an analysis task stemming from that practice. As there is no well-defined process for email forensics, the comprehensiveness of investigations, the extensibility of tools, the uniformity of evidence, the usefulness in collaborative/distributed environments, and the consistency of investigations are all hindered. At present, there exists little support for discovering, acquiring, and representing web-based email, despite its widespread use. To remedy this, a systematic process for discovering, acquiring, and representing web-based email is presented, one that is integrated into the normal forensic analysis workflow and accommodates the distinct characteristics of email evidence. This process focuses on detecting the presence of non-obvious artifacts related to email accounts, retrieving the data from the service provider, and representing email in a well-structured format based on existing standards. As a result, developers and organizations can collaboratively create and use analysis tools that analyze email evidence from any source in the same fashion, and examiners can access additional data relevant to their forensic cases. Finally, an extensible framework implementing this novel process-driven approach has been developed to address the problems of comprehensiveness, extensibility, uniformity, collaboration/distribution, and consistency within forensic investigations involving email evidence.
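A sketch of the representation step only, assuming acquired messages arrive as raw RFC 822 text; the record layout and the choice of SHA-256 are illustrative stand-ins for the standards-based format the thesis proposes.

```python
import hashlib
from email import message_from_string
from email.utils import parsedate_to_datetime

def to_evidence_record(raw_message, source="webmail-provider"):
    """Normalize one acquired message into a uniform, well-structured
    record; the digest lets later examiners verify evidence integrity."""
    msg = message_from_string(raw_message)
    return {
        "source": source,                     # where the data was acquired
        "from": msg.get("From"),
        "to": msg.get("To"),
        "subject": msg.get("Subject"),
        "date": parsedate_to_datetime(msg.get("Date")).isoformat(),
        "sha256": hashlib.sha256(raw_message.encode()).hexdigest(),
    }

raw = (
    "From: alice@example.com\r\n"
    "To: bob@example.com\r\n"
    "Subject: invoice\r\n"
    "Date: Mon, 01 Jul 2013 10:00:00 -0700\r\n"
    "\r\nPlease see attached.\r\n"
)
print(to_evidence_record(raw))
```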
Contributors: Paglierani, Justin W (Author) / Ahn, Gail-Joon (Thesis advisor) / Yau, Stephen S. (Committee member) / Santanam, Raghu T (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
The study of acoustic ecology is concerned with the manner in which life interacts with its environment as mediated through sound. As such, a central focus is the soundscape: the acoustic environment as perceived by a listener. This dissertation examines the application of several computational tools in the realms of digital signal processing, multimedia information retrieval, and computer music synthesis to the analysis of the soundscape. Namely, these tools include a) an open-source software library, Sirens, which can be used to segment long environmental field recordings into individual sonic events and to compare these events in terms of acoustic content; b) a graph-based retrieval system that can combine these measures of acoustic similarity with measures of semantic similarity, using the lexical database WordNet, to perform both text-based retrieval and automatic annotation of environmental sounds; and c) new techniques for the dynamic, real-time parametric morphing of multiple field recordings, informed by the geographic paths along which they were recorded.
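A sketch of the retrieval idea in (b), assuming NLTK's WordNet interface and a precomputed acoustic similarity per sound; the convex combination and the alpha weight are placeholders for the graph-based scheme the dissertation actually uses.

```python
# Requires: pip install nltk; then nltk.download("wordnet")
from nltk.corpus import wordnet as wn

def semantic_similarity(tag_a, tag_b):
    """WordNet path similarity between the first noun senses of two tags."""
    syns_a = wn.synsets(tag_a, pos=wn.NOUN)
    syns_b = wn.synsets(tag_b, pos=wn.NOUN)
    if not syns_a or not syns_b:
        return 0.0
    return syns_a[0].path_similarity(syns_b[0]) or 0.0

def retrieval_score(query_tag, sound_tags, acoustic_sim, alpha=0.5):
    """Blend tag-based semantic similarity with acoustic similarity."""
    sem = max(semantic_similarity(query_tag, t) for t in sound_tags)
    return alpha * sem + (1 - alpha) * acoustic_sim

# Rank two annotated field recordings against the text query "traffic".
sounds = {"street": (["car", "horn"], 0.7), "forest": (["bird", "wind"], 0.2)}
for name, (tags, acoustic) in sounds.items():
    print(name, round(retrieval_score("traffic", tags, acoustic), 3))
```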
Contributors: Mechtley, Brandon Michael (Author) / Spanias, Andreas S (Thesis advisor) / Sundaram, Hari (Thesis advisor) / Cook, Perry R. (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created: 2013