Structured sparse learning and its applications to biomedical and biological data

Description

Sparsity has become an important modeling tool in areas such as genetics, signal and audio processing, and medical image processing. Through l1-norm-based regularization, structured sparse learning algorithms can produce highly accurate models while imposing various predefined structures on the data, such as feature groups or graphs. In this thesis, I first propose to solve a sparse learning model with a general group structure, where the predefined groups may overlap with each other. Then, I present three real-world applications that can benefit from the group-structured sparse learning technique. In the first application, I study the Alzheimer's Disease diagnosis problem using multi-modality neuroimaging data. In this dataset, not every subject has all data sources available, which produces a unique and challenging block-wise missing pattern. In the second application, I study the automatic annotation and retrieval of fruit-fly gene expression pattern images. Combined with spatial information, sparse learning techniques can be used to construct effective representations of the expression images. In the third application, I present a new computational approach to annotate the developmental stage of Drosophila embryos in gene expression images. In addition, it provides a stage score that enables one to annotate each embryo more finely, so that embryos are divided into early and late periods of development within standard stage demarcations. Stage scores help illuminate global gene activities and changes much better, and more refined stage annotations improve our ability to interpret results when expression pattern matches are discovered between genes.
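
As a rough illustration of how a group-structured penalty yields sparsity, the sketch below applies group soft-thresholding, the proximal operator of the standard group lasso penalty. It assumes the simpler case of non-overlapping groups; the thesis addresses overlapping groups, which this sketch does not handle. The function name `group_soft_threshold` and the sample values are hypothetical and do not come from the thesis.

```python
import math

def group_soft_threshold(w, groups, lam):
    """Proximal operator of lam * sum_g ||w_g||_2 for NON-overlapping groups.

    w      : list of floats (coefficient vector)
    groups : list of index lists, each index appearing in at most one group
    lam    : non-negative regularization strength
    """
    out = list(w)
    for idx in groups:
        norm = math.sqrt(sum(w[i] ** 2 for i in idx))
        # Shrink the whole group toward zero; a group whose norm is
        # at most lam is set to zero as a unit (group-level sparsity).
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in idx:
            out[i] = scale * w[i]
    return out

# Illustrative values only: the first group is strong, the second is weak.
w = [3.0, 4.0, 0.1, -0.2]
groups = [[0, 1], [2, 3]]
shrunk = group_soft_threshold(w, groups, lam=1.0)
```

Here the first group is scaled down but survives, while the second group is removed entirely, which is the behavior that lets a group penalty select or discard whole feature groups.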

Contributors: Yuan, Lei (Author) / Ye, Jieping (Thesis advisor) / Wang, Yalin (Committee member) / Xue, Guoliang (Committee member) / Kumar, Sudhir (Committee member) / Arizona State University (Publisher)

Created: 2013

Batch mode active learning for multimedia pattern recognition

Description

The rapid escalation of technology and the widespread emergence of modern technological equipment have resulted in the generation of enormous amounts of digital data (in the form of images, videos and text). This has expanded the possibility of solving real-world problems using computational learning frameworks. However, while gathering a large amount of data is cheap and easy, annotating it with class labels is an expensive process in terms of time, labor and human expertise. This has paved the way for research in the field of active learning. Such algorithms automatically select the salient and exemplar instances from large quantities of unlabeled data and are effective in reducing human labeling effort in inducing classification models. To utilize the possible presence of multiple labeling agents, there have been attempts towards a batch mode form of active learning, where a batch of data instances is selected simultaneously for manual annotation. This dissertation is aimed at the development of novel batch mode active learning algorithms to reduce manual effort in training classification models in real-world multimedia pattern recognition applications.
Four major contributions are proposed in this work: (i) a framework for dynamic batch mode active learning, where the batch size and the specific data instances to be queried are selected adaptively through a single formulation, based on the complexity of the data stream in question; (ii) a batch mode active learning strategy for fuzzy label classification problems, where there is an inherent imprecision and vagueness in the class label definitions; (iii) batch mode active learning algorithms based on convex relaxations of an NP-hard integer quadratic programming (IQP) problem, with guaranteed bounds on the solution quality; and (iv) an active matrix completion algorithm and its application to solve several variants of the active learning problem (transductive active learning, multi-label active learning, active feature acquisition and active learning for regression). These contributions are validated on the face recognition and facial expression recognition problems (which are commonly encountered in real-world applications like robotics, security and assistive technology for the blind and the visually impaired) and also on collaborative filtering applications like movie recommendation.
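
To make the idea of selecting a batch of instances for annotation concrete, the sketch below shows a naive baseline: rank unlabeled instances by the entropy of their predicted class probabilities and query the top few. This is a generic uncertainty-sampling heuristic with a fixed batch size, not any of the dissertation's formulations (which choose the batch through optimization). The names `entropy` and `select_batch` and the probability values are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_batch(pred_probs, batch_size):
    """Return indices of the batch_size most uncertain unlabeled instances.

    pred_probs : list of per-instance class-probability lists
    batch_size : fixed number of instances to send for manual labeling
    """
    ranked = sorted(
        range(len(pred_probs)),
        key=lambda i: entropy(pred_probs[i]),
        reverse=True,  # highest entropy (most uncertain) first
    )
    return ranked[:batch_size]

# Illustrative model outputs for four unlabeled instances, two classes.
probs = [[0.9, 0.1], [0.5, 0.5], [0.6, 0.4], [0.99, 0.01]]
batch = select_batch(probs, batch_size=2)
```

A known weakness of this baseline is that the selected instances can be redundant with one another, which is one reason batch mode methods typically score the batch jointly rather than instance by instance.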

Contributors: Chakraborty, Shayok (Author) / Panchanathan, Sethuraman (Thesis advisor) / Balasubramanian, Vineeth N. (Committee member) / Li, Baoxin (Committee member) / Mittelmann, Hans (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created: 2013

Embedded resource accounting with applications to water embedded in energy trade in the western U.S

Description

Water resource management is becoming increasingly burdened by uncertain and fluctuating conditions resulting from climate change and population growth, which place increased demands on already strained resources. Innovative water management schemes are necessary to address the reality of available water supplies. One such approach is the substitution of trade in virtual water for the use of local water supplies. This study provides a review of existing work in the use of virtual water and water footprint methods. Virtual water trade has been shown to be a successful method for addressing water scarcity and decreasing overall water consumption by shifting high water consumptive processes to wetter regions. These results, however, assume that all water resource supplies are equivalent regardless of physical location, and they do not tie directly to economic markets. In this study we introduce a new mathematical framework, Embedded Resource Accounting (ERA), which is a synthesis of several different analytical methods presently used to quantify and describe human interactions with the economy and the natural environment. We define the specifics of the ERA framework in a generic context for the analysis of embedded resource trade in a way that links directly with the economics of that trade. Acknowledging the cyclical nature of water and the abundance of actual water resources on Earth, this study addresses fresh water availability within a given region, that is, the quantities of fresh water supplies annually available at acceptable quality for anthropogenic uses. The results of this research provide useful tools for water resource managers and policy makers to inform decision making on (1) reallocation of locally available fresh water resources, and (2) strategic supplementation of those resources with outside fresh water resources via the import of virtual water.

Contributors: Adams, Elizabeth Anne (Author) / Ruddell, Benjamin L (Thesis advisor) / Allenby, Braden R. (Thesis advisor) / Seager, Thomas P (Committee member) / Arizona State University (Publisher)

Created: 2013

Nanofluidics for single molecule DNA sequencing

Description

After a decade of effort, accurate and affordable DNA sequencing remains an important goal in the current research landscape. This thesis starts with a brief overview of recent developments in DNA sequencing technologies, followed by a description of the nanofluidics route to single-molecule DNA detection. Chapter 2 discusses carbon nanotube (CNT) based nanofluidics. The fabrication and DNA sensing measurements of CNT forest membrane devices are presented. Chapter 3 gives the background for functionalization and recognition aspects of reader molecules. Chapter 4 marks the transition to solid-state nanopore nanofluidics. The fabrication of imidazole-functionalized nanopores is discussed. The single-molecule DNA detection results from palladium nanopore devices are presented next. By combining chemical recognition with nanopore technology, it has been possible to prolong the duration of single-molecule events from the order of a few microseconds to up to a few milliseconds. Overall, the work presented in this thesis promises longer single-molecule detection times in a nanofluidic setup and paves the way for novel nanopore-tunnel junction devices that combine recognition chemistry, a tunneling device, and the nanopore approach.

Contributors: Krishnakumar, Padmini (Author) / Lindsay, Stuart (Thesis advisor) / He, Jin (Committee member) / Vaiana, Sara (Committee member) / Schmidt, Kevin (Committee member) / Arizona State University (Publisher)

Created: 2013

Volume change behavior of expansive soils due to wetting and drying cycles

Description

In a laboratory setting, soil volume change behavior is best represented by applying various testing standards to undisturbed or remolded samples. Whenever possible, it is most precise to use undisturbed samples to assess the volume change behavior, but in the absence of undisturbed specimens, remolded samples can be used. In that case, the soil is compacted to in-situ density and water content (or matric suction), which should best represent the expansive profile in question. It is standard practice to subject the specimen to a wetting process at a particular net normal stress. Even though currently accepted laboratory testing standard procedures provide insight into how the profile conditions change with time, these procedures do not assess the long-term effects on the soil due to climatic changes. In this experimental study, the effect of multiple wetting/drying cycles on the volume change behavior of two different naturally occurring soils was assessed and quantified. The changes in wetting and drying cycles were extreme when comparing the swings in matric suction. During the drying cycle, the expansive soil was subjected to extreme conditions, which decreased the moisture content to below the shrinkage limit. Both soils were remolded at five different compacted conditions and loaded to five different net normal stresses. Each sample was subjected to six wetting and drying cycles. The results showed that the swell/collapse strain is highly non-linear at low stress levels. The strain-net normal stress relationship cannot be defined by one single function without transforming the data. Therefore, the dataset needs to be fitted to a bi-modal logarithmic function, or the net normal stress needs to be logarithmically transformed in order to use a third-order polynomial fit. It was also determined that the moisture content changes with time are best fit by non-linear functions.
For the drying cycle, the radial strain was determined to have a constant rate of change with respect to the axial strain. However, for the wetting cycle, there was not enough radial strain data to develop correlations, and therefore an assumption was made based on 55 different test measurements/observations. In general, it was observed that after each subsequent cycle, higher swelling was exhibited for lower net normal stress values, while higher collapse potential was observed for higher net normal stress values, once the net normal stress was, respectively, below or above a threshold value. Furthermore, the swelling pressure underwent a reduction in all cases. In particular, the Anthem soil exhibited a reduction in swelling pressure of at least 20 percent after the first wetting/drying cycle, while the Colorado soil exhibited a reduction of 50 percent. After about the fourth cycle, the swelling pressure seemed to stabilize at an equilibrium value, at which a reduction of 46 percent was observed for the Anthem soil and 68 percent for the Colorado soil. The impact of the initial compacted conditions on heave characteristics was studied. Results indicated that materials compacted at higher densities exhibited greater swell potential. When comparing specimens compacted at the same density but at different moisture contents (matric suction), it was observed that specimens compacted at higher suction exhibited higher swelling potential when subjected to the same net normal stress. The least amount of swelling strain was observed in specimens compacted at the lowest dry density and the lowest matric suction (higher water content). The results from the laboratory testing were used to develop ultimate heave profiles for both soils. This analysis showed that even though the swell pressure for each soil decreased with cycles, the amount of heave would increase or decrease depending upon the initial compaction condition.
When the specimen was compacted at 110% of optimum moisture content and 90% of maximum dry density, it resulted in an ultimate heave reduction of 92 percent for Anthem and 685 percent for Colorado soil. On the other hand, when the soils were compacted at 90% of optimum moisture content and 100% of the maximum dry density, Anthem specimens heaved 78% more, while the heave of Colorado specimens was reduced by 69%. Based on the results obtained, it is evident that the current methods to estimate heave and swelling pressure do not consider the effect of wetting/drying cycles and seem to fail to capture the free swell potential of the soil. Recommendations for improving current methods of practice are provided.

Contributors: Rosenbalm, Daniel Curtis (Author) / Zapata, Claudia E (Thesis advisor) / Houston, Sandra L. (Committee member) / Kavazanjian, Edward (Committee member) / Witczak, Mathew W (Committee member) / Arizona State University (Publisher)

Created: 2013

Exploring the impact of varying levels of augmented reality to teach probability and sampling with a mobile device

Description

Statistics is taught at every level of education, yet teachers often have to assume their students have no knowledge of statistics and start from scratch each time they set out to teach it. The motivation for this experimental study comes from interest in exploring educational applications of augmented reality (AR) delivered via mobile technology that could potentially provide rich, contextualized learning for understanding concepts related to statistics education. This study examined the effects of AR experiences for learning basic statistical concepts. Using a 3 x 2 research design, this study compared learning gains of 252 undergraduate and graduate students from a pre- and posttest given before and after interacting with one of three types of augmented reality experiences: a high AR experience (interacting with three-dimensional images coupled with movement through a physical space), a low AR experience (interacting with three-dimensional images without movement), or no AR experience (two-dimensional images without movement). Two levels of collaboration (pairs and no pairs) were also included. Additionally, student perceptions toward collaboration opportunities and engagement were compared across the six treatment conditions. Other demographic information collected included the students' previous statistics experience, as well as their comfort level in using mobile devices. The moderating variables included prior knowledge (high, average, and low) as measured by the student's pretest score. Taking into account prior knowledge, students with low prior knowledge assigned to either the high or low AR experience had statistically significantly higher learning gains than those assigned to no AR experience. On the other hand, the results showed no statistically significant difference between students assigned to work individually and those assigned to work in pairs.
Students assigned to both the high and low AR experiences perceived a statistically significantly higher level of engagement than their no AR counterparts. Students with low prior knowledge benefited the most from the high AR condition in learning gains. Overall, the AR application performed well in providing a hands-on experience working with statistical data. Further research on AR and its relationship to spatial cognition, situated learning, higher-order skill development, performance support, and other classroom applications for learning is still needed.

Contributors: Conley, Quincy (Author) / Atkinson, Robert K (Thesis advisor) / Nguyen, Frank (Committee member) / Nelson, Brian C (Committee member) / Arizona State University (Publisher)

Created: 2013

Epitaxy of group IV optical materials and synthesis of IV/III-V semiconductor analogs by designer hydride chemistries

Description

The thesis studies new methods to fabricate optoelectronic Ge1-ySny/Si(100) alloys and investigates their photoluminescence (PL) properties for possible applications in Si-based photonics, including IR lasers. The work initially investigated the origin of the difference between the PL spectrum of bulk Ge, dominated by indirect gap emission, and the PL spectrum of Ge-on-Si films, dominated by direct gap emission. It was found that the difference is due to the suppression of self-absorption effects in Ge films, combined with a deviation from quasi-equilibrium conditions in the conduction band of undoped films. The latter is confirmed by a model suggesting that the deviation is caused by the shorter recombination lifetime in the films relative to bulk Ge. The knowledge acquired from this work was then utilized to study the PL properties of n-type Ge1-ySny/Si (y=0.004-0.04) samples grown via chemical vapor deposition of Ge2H6/SnD4/P(GeH3)3. It was found that the emission intensity (I) of these samples is at least 10x stronger than that observed in undoped counterparts and that the Idir/Iind ratio of direct over indirect gap emission increases for high Sn contents due to the reduced gamma-L valley separation, as expected. Next, the PL investigation was expanded to samples with y=0.05-0.09 grown via a new method using the more reactive Ge3H8 in place of Ge2H6. Optical quality, 1-um thick Ge1-ySny/Si(100) layers were produced using Ge3H8/SnD4 and found to exhibit strong, tunable PL near the threshold of the direct-indirect bandgap crossover. A byproduct of this study was the development of an enhanced process to produce Ge3H8, Ge4H10, and Ge5H12 analogs for application in ultra-low temperature deposition of Group-IV semiconductors.
The thesis also studies synthesis routes of an entirely new class of semiconductor compounds and alloys described by Si5-2y(III-V)y (III=Al, V=As, P), comprising specifically designed diamond-like structures based on a Si parent lattice incorporating isolated III-V units. The common theme of the two thesis topics is the development of new mono-crystalline materials on ubiquitous silicon platforms with the objective of enhancing the optoelectronic performance of Si and Ge semiconductors, potentially leading to the design of next-generation optical devices including lasers, detectors and solar cells.

Contributors: Grzybowski, Gordon (Author) / Kouvetakis, John (Thesis advisor) / Chizmeshya, Andrew (Committee member) / Menéndez, Jose (Committee member) / Arizona State University (Publisher)

Created: 2013

Advancing biomedical named entity recognition with multivariate feature selection and semantically motivated features

Description

Automating aspects of biocuration through biomedical information extraction could significantly impact biomedical research by enabling greater biocuration throughput and improving the feasibility of a wider scope. An important step in biomedical information extraction systems is named entity recognition (NER), where mentions of entities such as proteins and diseases are located within natural-language text and their semantic type is determined. This step is critical for later tasks in an information extraction pipeline, including normalization and relationship extraction. BANNER is a benchmark biomedical NER system using linear-chain conditional random fields and the rich feature set approach. A case study with BANNER locating genes and proteins in biomedical literature is described. The first corpus for disease NER adequate for use as training data is introduced, and employed in a case study of disease NER. The first corpus locating adverse drug reactions (ADRs) in user posts to a health-related social website is also described, and a system to locate and identify ADRs in social media text is created and evaluated. The rich feature set approach to creating NER feature sets is argued to be subject to diminishing returns, implying that additional improvements may require more sophisticated methods for creating the feature set. This motivates the first application of multivariate feature selection with filters and false discovery rate analysis to biomedical NER, resulting in a feature set at least 3 orders of magnitude smaller than the set created by the rich feature set approach. Finally, two novel approaches to NER by modeling the semantics of token sequences are introduced. The first method focuses on the sequence content by using language models to determine whether a sequence resembles entries in a lexicon of entity names or text from an unlabeled corpus more closely. 
The second method models the distributional semantics of token sequences, determining the similarity between a potential mention and the token sequences from the training data by analyzing the contexts where each sequence appears in a large unlabeled corpus. The second method is shown to improve the performance of BANNER on multiple data sets.

Contributors: Leaman, James Robert (Author) / Gonzalez, Graciela (Thesis advisor) / Baral, Chitta (Thesis advisor) / Cohen, Kevin B (Committee member) / Liu, Huan (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created: 2013

Optical properties of wurtzite semiconductors studied using cathodoluminescence imaging and spectroscopy

Description

The work contained in this dissertation is focused on the optical properties of direct band gap semiconductors which crystallize in a wurtzite structure: more specifically, the III-nitrides and ZnO. By using cathodoluminescence spectroscopy, many of their properties have been investigated, including band gaps, defect energy levels, carrier lifetimes, strain states, exciton binding energies, and effects of electron irradiation on luminescence. Part of this work is focused on p-type Mg-doped GaN and InGaN. These materials are extremely important for the fabrication of visible light emitting diodes and diode lasers and their complex nature is currently not entirely understood. The luminescence of Mg-doped GaN films has been correlated with electrical and structural measurements in order to understand the behavior of hydrogen in the material. Deeply-bound excitons emitting near 3.37 and 3.42 eV are observed in films with a significant hydrogen concentration during cathodoluminescence at liquid helium temperatures. These radiative transitions are unstable during electron irradiation. Our observations suggest a hydrogen-related nature, as opposed to a previous assignment of stacking fault luminescence. The intensity of the 3.37 eV transition can be correlated with the electrical activation of the Mg acceptors. Next, the acceptor energy level of Mg in InGaN is shown to decrease significantly with an increase in the indium composition. This also corresponds to a decrease in the resistivity of these films. In addition, the hole concentration in multiple quantum well light emitting diode structures is much more uniform in the active region when Mg-doped InGaN (instead of Mg-doped GaN) is used. These results will help improve the efficiency of light emitting diodes, especially in the green/yellow color range. Also, the improved hole transport may prove to be important for the development of photovoltaic devices. 
Cathodoluminescence studies have also been performed on nanoindented ZnO crystals. Bulk, single-crystal ZnO was indented using a sub-micron spherical diamond tip on various surface orientations. The resistance to deformation (the "hardness") of each surface orientation was measured, with the c-plane being the most resistant. This is due to the orientation of the easy glide planes, the c-planes, being positioned perpendicularly to the applied load. The a-plane oriented crystal is the least resistant to deformation. Cathodoluminescence imaging allows for the correlation of the luminescence with the regions located near the indentation. Sub-nanometer shifts in the band edge emission have been assigned to residual strain in the crystals. The a- and m-plane oriented crystals show two-fold symmetry with regions of compressive and tensile strain located parallel and perpendicular to the ±c-directions, respectively. The c-plane oriented crystal shows six-fold symmetry with regions of tensile strain extending along the six equivalent a-directions.

Contributors: Juday, Reid (Author) / Ponce, Fernando A. (Thesis advisor) / Drucker, Jeff (Committee member) / Mccartney, Martha R (Committee member) / Menéndez, Jose (Committee member) / Shumway, John (Committee member) / Arizona State University (Publisher)

Created: 2013

Building adaptive computational systems for physiological and biomedical data

Description

In recent years, machine learning and data mining technologies have received growing attention in several areas such as recommendation systems, natural language processing, speech and handwriting recognition, image processing and the biomedical domain. Many of these applications, which deal with physiological and biomedical data, require person-specific or person-adaptive systems. The greatest challenge in developing such systems is the subject-dependent data variation, or subject-based variability, in physiological and biomedical data, which leads to differences in data distributions and makes the task of modeling these data with traditional machine learning algorithms complex and challenging. As a result, despite the wide application of machine learning, efficient deployment of its principles to model real-world data is still a challenge. This dissertation addresses the problem of subject-based variability in physiological and biomedical data and proposes person-adaptive prediction models based on novel transfer and active learning algorithms, an emerging field in machine learning. One of the significant contributions of this dissertation is a person-adaptive method for early detection of muscle fatigue using surface electromyogram signals, based on a new multi-source transfer learning algorithm. This dissertation also proposes a subject-independent algorithm for grading the progression of muscle fatigue on a scale from 0 to 1 in a test subject, during isometric or dynamic contractions, in real time. Besides subject-based variability, biomedical image data also varies due to variations in imaging techniques, leading to distribution differences between image databases. Hence a classifier learned on one database may perform poorly on another.
Another significant contribution of this dissertation is the design and development of an efficient biomedical image data annotation framework, based on a novel combination of transfer learning and a new batch-mode active learning method, capable of addressing the distribution differences across databases. The methodologies developed in this dissertation are relevant and applicable to a large set of computing problems where there is high variation of data between subjects or sources, such as face detection, pose detection and speech recognition. From a broader perspective, these frameworks can be viewed as a first step towards the design of automated adaptive systems for real-world data.

Contributors: Chattopadhyay, Rita (Author) / Panchanathan, Sethuraman (Thesis advisor) / Ye, Jieping (Thesis advisor) / Li, Baoxin (Committee member) / Santello, Marco (Committee member) / Arizona State University (Publisher)

Created: 2013