Structured sparse learning and its applications to biomedical and biological data

Description

Sparsity has become an important modeling tool in areas such as genetics, signal and audio processing, and medical image processing. Through l-1 norm based regularization, structured sparse learning algorithms can produce highly accurate models while imposing various predefined structures on the data, such as feature groups or graphs. In this thesis, I first propose to solve a sparse learning model with a general group structure, where the predefined groups may overlap with each other. Then, I present three real-world applications that can benefit from the group structured sparse learning technique. In the first application, I study the Alzheimer's Disease diagnosis problem using multi-modality neuroimaging data. In this dataset, not every subject has all data sources available, exhibiting a unique and challenging block-wise missing pattern. In the second application, I study the automatic annotation and retrieval of fruit-fly gene expression pattern images. Combined with the spatial information, sparse learning techniques can be used to construct effective representations of the expression images. In the third application, I present a new computational approach to annotate the developmental stage of Drosophila embryos in the gene expression images. In addition, it provides a stage score that enables one to more finely annotate each embryo so that they are divided into early and late periods of development within standard stage demarcations. Stage scores help us to illuminate global gene activities and changes much better, and more refined stage annotations improve our ability to interpret results when expression pattern matches are discovered between genes.

Contributors: Yuan, Lei (Author) / Ye, Jieping (Thesis advisor) / Wang, Yalin (Committee member) / Xue, Guoliang (Committee member) / Kumar, Sudhir (Committee member) / Arizona State University (Publisher)

Created: 2013

Batch mode active learning for multimedia pattern recognition

Description

The rapid escalation of technology and the widespread emergence of modern technological equipment have resulted in the generation of enormous amounts of digital data (in the form of images, videos and text). This has expanded the possibility of solving real-world problems using computational learning frameworks. However, while gathering a large amount of data is cheap and easy, annotating them with class labels is an expensive process in terms of time, labor and human expertise. This has paved the way for research in the field of active learning. Such algorithms automatically select the salient and exemplar instances from large quantities of unlabeled data and are effective in reducing the human labeling effort needed to induce classification models. To utilize the possible presence of multiple labeling agents, there have been attempts towards a batch mode form of active learning, where a batch of data instances is selected simultaneously for manual annotation. This dissertation is aimed at the development of novel batch mode active learning algorithms to reduce manual effort in training classification models in real-world multimedia pattern recognition applications.
Four major contributions are proposed in this work: (i) a framework for dynamic batch mode active learning, where the batch size and the specific data instances to be queried are selected adaptively through a single formulation, based on the complexity of the data stream in question, (ii) a batch mode active learning strategy for fuzzy label classification problems, where there is an inherent imprecision and vagueness in the class label definitions, (iii) batch mode active learning algorithms based on convex relaxations of an NP-hard integer quadratic programming (IQP) problem, with guaranteed bounds on the solution quality, and (iv) an active matrix completion algorithm and its application to solve several variants of the active learning problem (transductive active learning, multi-label active learning, active feature acquisition and active learning for regression). These contributions are validated on the face recognition and facial expression recognition problems (which are commonly encountered in real-world applications like robotics, security and assistive technology for the blind and the visually impaired) and also on collaborative filtering applications like movie recommendation.

Contributors: Chakraborty, Shayok (Author) / Panchanathan, Sethuraman (Thesis advisor) / Balasubramanian, Vineeth N. (Committee member) / Li, Baoxin (Committee member) / Mittelmann, Hans (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created: 2013

Nanofluidics for single molecule DNA sequencing

Description

After a decade of effort, accurate and affordable DNA sequencing remains an important goal in the current research landscape. This thesis starts with a brief overview of recent developments in the field of DNA sequencing technologies, followed by a description of the nanofluidics route to single molecule DNA detection. Chapter 2 discusses carbon nanotube (CNT) based nanofluidics. The fabrication and DNA sensing measurements of CNT forest membrane devices are presented. Chapter 3 gives the background for the functionalization and recognition aspects of reader molecules. Chapter 4 marks the transition to solid state nanopore nanofluidics. The fabrication of imidazole-functionalized nanopores is discussed. The single molecule detection results of DNA from palladium nanopore devices are presented next. By combining chemical recognition with nanopore technology, it has been possible to prolong the duration of single molecule events from the order of a few microseconds to up to a few milliseconds. Overall, the work presented in this thesis promises longer single molecule detection times in a nanofluidic setup and paves the way for novel nanopore-tunnel junction devices that combine recognition chemistry, a tunneling device and the nanopore approach.

Contributors: Krishnakumar, Padmini (Author) / Lindsay, Stuart (Thesis advisor) / He, Jin (Committee member) / Vaiana, Sara (Committee member) / Schmidt, Kevin (Committee member) / Arizona State University (Publisher)

Created: 2013

Exploring the impact of varying levels of augmented reality to teach probability and sampling with a mobile device

Description

Statistics is taught at every level of education, yet teachers often have to assume their students have no knowledge of statistics and start from scratch each time they set out to teach it. The motivation for this experimental study comes from interest in exploring educational applications of augmented reality (AR) delivered via mobile technology that could potentially provide rich, contextualized learning for understanding concepts related to statistics education. This study examined the effects of AR experiences for learning basic statistical concepts. Using a 3 x 2 research design, this study compared learning gains of 252 undergraduate and graduate students from a pre- and posttest given before and after interacting with one of three types of augmented reality experiences: a high AR experience (interacting with three dimensional images coupled with movement through a physical space), a low AR experience (interacting with three dimensional images without movement), or no AR experience (two dimensional images without movement). Two levels of collaboration (pairs and no pairs) were also included. Additionally, student perceptions toward collaboration opportunities and engagement were compared across the six treatment conditions. Other demographic information collected included the students' previous statistics experience, as well as their comfort level in using mobile devices. The moderating variables included prior knowledge (high, average, and low) as measured by the student's pretest score. Taking into account prior knowledge, students with low prior knowledge assigned to either high or low AR experience had statistically significantly higher learning gains than those assigned to a no AR experience. On the other hand, the results showed no statistically significant difference between students assigned to work individually versus in pairs.
Students assigned to both high and low AR experience perceived a statistically significantly higher level of engagement than their no AR counterparts. Students with low prior knowledge benefited the most from the high AR condition in learning gains. Overall, the AR application did well in providing a hands-on experience working with statistical data. Further research on AR and its relationship to spatial cognition, situated learning, high order skill development, performance support, and other classroom applications for learning is still needed.

Contributors: Conley, Quincy (Author) / Atkinson, Robert K (Thesis advisor) / Nguyen, Frank (Committee member) / Nelson, Brian C (Committee member) / Arizona State University (Publisher)

Created: 2013

Epitaxy of group IV optical materials and synthesis of IV/III-V semiconductor analogs by designer hydride chemistries

Description

The thesis studies new methods to fabricate optoelectronic Ge1-ySny/Si(100) alloys and investigate their photoluminescence (PL) properties for possible applications in Si-based photonics including IR lasers. The work initially investigated the origin of the difference between the PL spectrum of bulk Ge, dominated by indirect gap emission, and the PL spectrum of Ge-on-Si films, dominated by direct gap emission. It was found that the difference is due to the suppression of self-absorption effects in Ge films, combined with a deviation from quasi-equilibrium conditions in the conduction band of undoped films. The latter is confirmed by a model suggesting that the deviation is caused by the shorter recombination lifetime in the films relative to bulk Ge. The knowledge acquired from this work was then utilized to study the PL properties of n-type Ge1-ySny/Si (y=0.004-0.04) samples grown via chemical vapor deposition of Ge2H6/SnD4/P(GeH3)3. It was found that the emission intensity (I) of these samples is at least 10x stronger than observed in undoped counterparts and that the Idir/Iind ratio of direct over indirect gap emission increases for high Sn contents due to the reduced gamma-L valley separation, as expected. Next, the PL investigation was expanded to samples with y=0.05-0.09 grown via a new method using the more reactive Ge3H8 in place of Ge2H6. Optical quality, 1-μm thick Ge1-ySny/Si(100) layers were produced using Ge3H8/SnD4 and found to exhibit strong, tunable PL near the threshold of the direct-indirect bandgap crossover. A byproduct of this study was the development of an enhanced process to produce Ge3H8, Ge4H10, and Ge5H12 analogs for application in ultra-low temperature deposition of Group-IV semiconductors.
The thesis also studies synthesis routes of an entirely new class of semiconductor compounds and alloys described by Si5-2y(III-V)y (III=Al, V= As, P), comprising specifically designed diamond-like structures based on a Si parent lattice incorporating isolated III-V units. The common theme of the two thesis topics is the development of new mono-crystalline materials on ubiquitous silicon platforms with the objective of enhancing the optoelectronic performance of Si and Ge semiconductors, potentially leading to the design of next generation optical devices including lasers, detectors and solar cells.

Contributors: Grzybowski, Gordon (Author) / Kouvetakis, John (Thesis advisor) / Chizmeshya, Andrew (Committee member) / Menéndez, Jose (Committee member) / Arizona State University (Publisher)

Created: 2013

Advancing biomedical named entity recognition with multivariate feature selection and semantically motivated features

Description

Automating aspects of biocuration through biomedical information extraction could significantly impact biomedical research by enabling greater biocuration throughput and improving the feasibility of a wider scope. An important step in biomedical information extraction systems is named entity recognition (NER), where mentions of entities such as proteins and diseases are located within natural-language text and their semantic type is determined. This step is critical for later tasks in an information extraction pipeline, including normalization and relationship extraction. BANNER is a benchmark biomedical NER system using linear-chain conditional random fields and the rich feature set approach. A case study with BANNER locating genes and proteins in biomedical literature is described. The first corpus for disease NER adequate for use as training data is introduced, and employed in a case study of disease NER. The first corpus locating adverse drug reactions (ADRs) in user posts to a health-related social website is also described, and a system to locate and identify ADRs in social media text is created and evaluated. The rich feature set approach to creating NER feature sets is argued to be subject to diminishing returns, implying that additional improvements may require more sophisticated methods for creating the feature set. This motivates the first application of multivariate feature selection with filters and false discovery rate analysis to biomedical NER, resulting in a feature set at least 3 orders of magnitude smaller than the set created by the rich feature set approach. Finally, two novel approaches to NER by modeling the semantics of token sequences are introduced. The first method focuses on the sequence content by using language models to determine whether a sequence resembles entries in a lexicon of entity names or text from an unlabeled corpus more closely. 
The second method models the distributional semantics of token sequences, determining the similarity between a potential mention and the token sequences from the training data by analyzing the contexts where each sequence appears in a large unlabeled corpus. The second method is shown to improve the performance of BANNER on multiple data sets.

Contributors: Leaman, James Robert (Author) / Gonzalez, Graciela (Thesis advisor) / Baral, Chitta (Thesis advisor) / Cohen, Kevin B (Committee member) / Liu, Huan (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created: 2013

Optical properties of wurtzite semiconductors studied using cathodoluminescence imaging and spectroscopy

Description

The work contained in this dissertation is focused on the optical properties of direct band gap semiconductors which crystallize in a wurtzite structure: more specifically, the III-nitrides and ZnO. By using cathodoluminescence spectroscopy, many of their properties have been investigated, including band gaps, defect energy levels, carrier lifetimes, strain states, exciton binding energies, and effects of electron irradiation on luminescence. Part of this work is focused on p-type Mg-doped GaN and InGaN. These materials are extremely important for the fabrication of visible light emitting diodes and diode lasers and their complex nature is currently not entirely understood. The luminescence of Mg-doped GaN films has been correlated with electrical and structural measurements in order to understand the behavior of hydrogen in the material. Deeply-bound excitons emitting near 3.37 and 3.42 eV are observed in films with a significant hydrogen concentration during cathodoluminescence at liquid helium temperatures. These radiative transitions are unstable during electron irradiation. Our observations suggest a hydrogen-related nature, as opposed to a previous assignment of stacking fault luminescence. The intensity of the 3.37 eV transition can be correlated with the electrical activation of the Mg acceptors. Next, the acceptor energy level of Mg in InGaN is shown to decrease significantly with an increase in the indium composition. This also corresponds to a decrease in the resistivity of these films. In addition, the hole concentration in multiple quantum well light emitting diode structures is much more uniform in the active region when Mg-doped InGaN (instead of Mg-doped GaN) is used. These results will help improve the efficiency of light emitting diodes, especially in the green/yellow color range. Also, the improved hole transport may prove to be important for the development of photovoltaic devices. 
Cathodoluminescence studies have also been performed on nanoindented ZnO crystals. Bulk, single crystal ZnO was indented using a sub-micron spherical diamond tip on various surface orientations. The resistance to deformation (the "hardness") of each surface orientation was measured, with the c-plane being the most resistant. This is due to the orientation of the easy glide planes, the c-planes, being positioned perpendicularly to the applied load. The a-plane oriented crystal is the least resistant to deformation. Cathodoluminescence imaging allows for the correlation of the luminescence with the regions located near the indentation. Sub-nanometer shifts in the band edge emission have been assigned to residual strain in the crystals. The a- and m-plane oriented crystals show two-fold symmetry with regions of compressive and tensile strain located parallel and perpendicular to the ±c-directions, respectively. The c-plane oriented crystal shows six-fold symmetry with regions of tensile strain extending along the six equivalent a-directions.

Contributors: Juday, Reid (Author) / Ponce, Fernando A. (Thesis advisor) / Drucker, Jeff (Committee member) / Mccartney, Martha R (Committee member) / Menéndez, Jose (Committee member) / Shumway, John (Committee member) / Arizona State University (Publisher)

Created: 2013

Building adaptive computational systems for physiological and biomedical data

Description

In recent years, machine learning and data mining technologies have received growing attention in several areas such as recommendation systems, natural language processing, speech and handwriting recognition, image processing and the biomedical domain. Many of these applications which deal with physiological and biomedical data require person-specific or person-adaptive systems. The greatest challenge in developing such systems is the subject-dependent variation, or subject-based variability, in physiological and biomedical data, which leads to differences in data distributions and makes the task of modeling these data with traditional machine learning algorithms complex and challenging. As a result, despite the wide application of machine learning, efficient deployment of its principles to model real-world data is still a challenge. This dissertation addresses the problem of subject-based variability in physiological and biomedical data and proposes person-adaptive prediction models based on novel transfer and active learning algorithms, an emerging field in machine learning. One of the significant contributions of this dissertation is a person-adaptive method for early detection of muscle fatigue using surface electromyogram signals, based on a new multi-source transfer learning algorithm. This dissertation also proposes a subject-independent algorithm for grading the progression of muscle fatigue on a scale from 0 to 1 in a test subject, during isometric or dynamic contractions, in real time. Besides subject-based variability, biomedical image data also vary due to variations in imaging techniques, leading to distribution differences between image databases. Hence a classifier learned on one database may perform poorly on another database.
Another significant contribution of this dissertation has been the design and development of an efficient biomedical image data annotation framework, based on a novel combination of transfer learning and a new batch-mode active learning method, capable of addressing the distribution differences across databases. The methodologies developed in this dissertation are relevant and applicable to a large set of computing problems where there is a high variation of data between subjects or sources, such as face detection, pose detection and speech recognition. From a broader perspective, these frameworks can be viewed as a first step towards design of automated adaptive systems for real world data.

Contributors: Chattopadhyay, Rita (Author) / Panchanathan, Sethuraman (Thesis advisor) / Ye, Jieping (Thesis advisor) / Li, Baoxin (Committee member) / Santello, Marco (Committee member) / Arizona State University (Publisher)

Created: 2013

Gene regulatory networks: modeling, intervention and context

Description

Biological systems are complex in many dimensions as endless transportation and communication networks all function simultaneously. Our ability to intervene within both healthy and diseased systems is tied directly to our ability to understand and model core functionality. The progress in increasingly accurate and thorough high-throughput measurement technologies has provided a deluge of data from which we may attempt to infer a representation of the true genetic regulatory system. A gene regulatory network model, if accurate enough, may allow us to perform hypothesis testing in the form of computational experiments. Of great importance to modeling accuracy is the acknowledgment of biological contexts within the models -- i.e. recognizing the heterogeneous nature of the true biological system and the data it generates. This marriage of engineering, mathematics and computer science with systems biology creates a cycle of progress between computer simulation and lab experimentation, rapidly translating interventions and treatments for patients from the bench to the bedside. This dissertation will first discuss the landscape for modeling the biological system, explore the identification of targets for intervention in Boolean network models of biological interactions, and explore context specificity both in new graphical depictions of models embodying context-specific genomic regulation and in novel analysis approaches designed to reveal embedded contextual information. Overall, the dissertation will explore a spectrum of biological modeling with a goal towards therapeutic intervention, with both formal and informal notions of biological context, in such a way that will enable future work to have an even greater impact in terms of direct patient benefit on an individualized level.

Contributors: Verdicchio, Michael (Author) / Kim, Seungchan (Thesis advisor) / Baral, Chitta (Committee member) / Stolovitzky, Gustavo (Committee member) / Collofello, James (Committee member) / Arizona State University (Publisher)

Created: 2013

System-level synthesis of dataplane subsystems for MPSoCs

Description

In recent years we have witnessed a shift towards multi-processor system-on-chips (MPSoCs) to address the demands of embedded devices (such as cell phones, GPS devices, luxury car features, etc.). Highly optimized MPSoCs are well-suited to tackle the complex application demands of the end user. These MPSoCs incorporate a constellation of heterogeneous processing elements (PEs) (general purpose PEs and application-specific integrated circuits (ASICs)). A typical MPSoC will be composed of an application processor, such as an ARM Cortex-A9 with a cache coherent memory hierarchy, and several application sub-systems. Each of these sub-systems is composed of highly optimized instruction processors, graphics/DSP processors, and custom hardware accelerators. Typically, these sub-systems utilize scratchpad memories (SPM) rather than support cache coherency. The overall architecture is an integration of the various sub-systems through a high bandwidth system-level interconnect (such as a Network-on-Chip (NoC)). The shift to MPSoCs has been fueled by three major factors: demand for high performance, the use of component libraries, and short design turnaround time. As customers continue to desire more and more complex applications on their embedded devices, the performance demand for these devices continues to increase. Designers have turned to using MPSoCs to address this demand. By using pre-made IP libraries, designers can quickly piece together an MPSoC that will meet the application demands of the end user with minimal time spent designing new hardware. Additionally, the use of MPSoCs allows designers to generate new devices very quickly, thus reducing the time to market. In this work, a complete MPSoC synthesis design flow is presented. We first present a technique to address the synthesis of the interconnect architecture (particularly Network-on-Chip (NoC)).
We then address the synthesis of the memory architecture of an MPSoC sub-system. Lastly, we present a co-synthesis technique to generate the functional and memory architectures simultaneously. The validity and quality of each synthesis technique are demonstrated through extensive experimentation.

Contributors: Leary, Glenn (Author) / Chatha, Karamvir S (Thesis advisor) / Vrudhula, Sarma (Committee member) / Shrivastava, Aviral (Committee member) / Beraha, Rudy (Committee member) / Arizona State University (Publisher)

Created: 2013