Matching Items (1,642)
Description
Sparse learning is a technique in machine learning for feature selection and dimensionality reduction that finds a sparse set of the most relevant features. In any machine learning problem there is a considerable amount of irrelevant information, and separating relevant from irrelevant information has been a topic of sustained focus. In supervised learning such as regression, the data consist of many features, and only a subset of the features may be responsible for the result. The features may also carry structural requirements, which introduces additional complexity for feature selection. The sparse learning package provides a set of algorithms for learning a sparse set of the most relevant features for both regression and classification problems. Structural dependencies among features, which introduce additional requirements, are also supported by the package: features may be grouped together, hierarchies and overlapping groups may exist among them, and there may be requirements for selecting the most relevant groups. Although the solutions are sparse, they are not guaranteed to be robust. For the selection to be robust, certain techniques provide theoretical justification of why particular features are selected. Stability selection is one such method: it allows existing sparse learning methods to be used to select a stable set of features for a given training sample. This is done by assigning a selection probability to each feature: the training data are sub-sampled, a specific sparse learning technique is used to learn the relevant features, this is repeated a large number of times, and the probability is computed as the fraction of runs in which a feature is selected. Cross-validation, which determines the best parameter value over a range of values, further allows the best parameter to be chosen by selecting the value that gives the maximum accuracy score. With such a combination of algorithms, with good convergence guarantees, stable feature selection properties, and support for various structural dependencies among features, the sparse learning package is a powerful tool for machine learning research. A modular structure, a C implementation, and ATLAS integration for fast linear-algebra subroutines make it one of the best tools for large, sparse settings. The varied collection of algorithms, support for group sparsity, and batch algorithms are a few of the notable capabilities of the SLEP package, and these features can be used in a variety of fields to infer relevant elements. Alzheimer's Disease (AD) is a neurodegenerative disease that gradually leads to dementia. The SLEP package is used for feature selection to obtain the most relevant biomarkers from the available AD dataset, and the results show that, indeed, only a subset of the features is required to gain valuable insights.
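To make the stability selection loop concrete, here is a minimal sketch in Python, using scikit-learn's Lasso as a stand-in for a SLEP solver; the sub-sampling fraction, number of runs, selection threshold, and synthetic data are illustrative assumptions, not values from the thesis.

```python
# Stability selection sketch: repeatedly sub-sample the training data,
# run a sparse learner (Lasso here, standing in for a SLEP solver),
# and score each feature by how often it receives a nonzero weight.
import numpy as np
from sklearn.linear_model import Lasso

def stability_selection(X, y, alpha=0.1, n_runs=100, subsample=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    counts = np.zeros(d)
    for _ in range(n_runs):
        idx = rng.choice(n, size=int(subsample * n), replace=False)
        model = Lasso(alpha=alpha).fit(X[idx], y[idx])
        counts += model.coef_ != 0          # was the feature selected this run?
    return counts / n_runs                  # selection probability per feature

# Illustrative usage: keep features selected in at least 60% of runs.
X = np.random.randn(200, 50)
y = X[:, 0] * 3.0 - X[:, 1] * 2.0 + 0.1 * np.random.randn(200)
probs = stability_selection(X, y)
print("stable features:", np.where(probs >= 0.6)[0])
```

Features whose selection probability exceeds the chosen threshold form the stable set, which is the output the abstract describes.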
Contributors: Thulasiram, Ramesh (Author) / Ye, Jieping (Thesis advisor) / Xue, Guoliang (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
Medical images constitute a special class of images that are captured to allow diagnosis of disease, and their "correct" interpretation is vitally important. Because they are not "natural" images, radiologists must be trained to visually interpret them. This training process includes implicit perceptual learning that is gradually acquired over an extended period of exposure to medical images. This dissertation proposes novel computational methods for evaluating and facilitating perceptual training in radiologists. Part 1 of this dissertation proposes an eye-tracking-based metric for measuring the training progress of individual radiologists. Six metrics were identified as potentially useful: time to complete task, fixation count, fixation duration, consciously viewed regions, subconsciously viewed regions, and saccadic length. Part 2 of this dissertation proposes an eye-tracking-based entropy metric for tracking the rise and fall in the interest level of radiologists, as they scan chest radiographs. The results showed that entropy was significantly lower when radiologists were fixating on abnormal regions. Part 3 of this dissertation develops a method that allows extraction of Gabor-based feature vectors from corresponding anatomical regions of "normal" chest radiographs, despite anatomical variations across populations. These feature vectors are then used to develop and compare transductive and inductive computational methods for generating overlay maps that show atypical regions within test radiographs. The results show that the transductive methods produced much better maps than the inductive methods for 20 ground-truthed test radiographs. Part 4 of this dissertation uses an Extended Fuzzy C-Means (EFCM) based instance selection method to reduce the computational cost of transductive methods. The results showed that EFCM substantially reduced the computational cost without a substantial drop in performance. The dissertation then proposes a novel Variance Based Instance Selection (VBIS) method that also reduces the computational cost, but allows for incremental incorporation of new informative radiographs, as they are encountered. Part 5 of this dissertation develops and demonstrates a novel semi-transductive framework that combines the superior performance of transductive methods with the reduced computational cost of inductive methods. The results showed that the semi-transductive approach provided both an effective and efficient framework for detection of atypical regions in chest radiographs.
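The abstract does not spell out the entropy formula from Part 2; a plausible reading, sketched below under that assumption, is Shannon entropy over a spatial histogram of recent fixations, so that dwelling on one region (such as an abnormality) drives entropy down. The grid size, screen extent, and fake fixation data are illustrative assumptions.

```python
# Hedged sketch of a gaze-entropy measure: bin fixation points into a
# spatial grid and compute Shannon entropy of the resulting distribution.
import numpy as np

def gaze_entropy(fixations, grid=(8, 8), extent=(1024, 768)):
    """fixations: array of (x, y) fixation coordinates in pixels."""
    xs = np.clip((fixations[:, 0] / extent[0] * grid[0]).astype(int), 0, grid[0] - 1)
    ys = np.clip((fixations[:, 1] / extent[1] * grid[1]).astype(int), 0, grid[1] - 1)
    hist = np.zeros(grid)
    np.add.at(hist, (xs, ys), 1)           # 2D fixation histogram
    p = hist.ravel() / hist.sum()
    p = p[p > 0]                           # drop empty cells; 0*log(0) := 0
    return -(p * np.log2(p)).sum()         # low entropy ~ concentrated gaze

# Concentrated fixations (e.g., dwelling on an abnormal region) score lower
# than fixations spread across the whole radiograph.
cluster = np.random.normal([300, 400], 15, size=(40, 2))
spread = np.random.uniform([0, 0], [1024, 768], size=(40, 2))
print(gaze_entropy(cluster), "<", gaze_entropy(spread))
```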
Contributors: Alzubaidi, Mohammad A (Author) / Panchanathan, Sethuraman (Thesis advisor) / Black, John A. (Committee member) / Ye, Jieping (Committee member) / Patel, Ameet (Committee member) / Arizona State University (Publisher)
Created: 2012
Description
Alzheimer's Disease (AD) is the most common form of dementia observed in elderly patients and has significant socio-economic impact. There are many initiatives which aim to capture the leading causes of AD. Several genetic, imaging, and biochemical markers are being explored to monitor progression of AD and to explore treatment and detection options. The primary focus of this thesis is to identify key biomarkers to understand the pathogenesis and prognosis of Alzheimer's Disease. Feature selection is the process of finding a subset of relevant features to develop efficient and robust learning models. It is an active research topic in diverse areas such as computer vision, bioinformatics, information retrieval, chemical informatics, and computational finance. In this work, state-of-the-art feature selection algorithms, such as Student's t-test, Relief-F, Information Gain, Gini Index, Chi-Square, Fisher Kernel Score, Kruskal-Wallis, Minimum Redundancy Maximum Relevance, and Sparse Logistic Regression with Stability Selection, have been extensively exploited to identify informative features for AD using data from the Alzheimer's Disease Neuroimaging Initiative (ADNI). An integrative approach which uses blood plasma protein, Magnetic Resonance Imaging, and psychometric assessment score biomarkers has been explored. This work also analyzes techniques for handling unbalanced data and evaluates the efficacy of sampling techniques. The performance of each feature selection algorithm is evaluated using the relevance of the derived features and the predictive power of the algorithm, measured with Random Forest and Support Vector Machine classifiers. Performance metrics such as Accuracy, Sensitivity, Specificity, and area under the Receiver Operating Characteristic curve (AUC) have been used for evaluation. The feature selection algorithms best suited to analyze AD proteomics data are proposed. The key biomarkers distinguishing healthy and AD patients, Mild Cognitive Impairment (MCI) converters and non-converters, and healthy and MCI patients have been identified.
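As a concrete illustration of the evaluation protocol described above (a feature selector followed by a classifier scored with AUC), here is a hedged scikit-learn sketch; the ANOVA-based filter, Random Forest settings, and synthetic data are stand-ins, not the thesis's actual ADNI pipeline.

```python
# Sketch: rank features with a univariate test, keep the top k, and score
# the selection by a classifier's AUC on held-out data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X = np.random.randn(300, 100)                                # fake biomarker matrix
y = (X[:, :5].sum(axis=1) + 0.5 * np.random.randn(300) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

selector = SelectKBest(f_classif, k=10).fit(X_tr, y_tr)      # t-test-style filter
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(selector.transform(X_tr), y_tr)

scores = clf.predict_proba(selector.transform(X_te))[:, 1]
print("AUC:", roc_auc_score(y_te, scores))
print("selected features:", selector.get_support(indices=True))
```

Swapping the selector or classifier (an SVM, say) and comparing the resulting AUCs is the basic comparison pattern the abstract describes.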
Contributors: Dubey, Rashmi (Author) / Ye, Jieping (Thesis advisor) / Wang, Yalin (Committee member) / Wu, Tong (Committee member) / Arizona State University (Publisher)
Created: 2012
Description
Rapid advances in sensor and information technology have resulted in both spatially and temporally data-rich environments, creating a pressing need to develop novel statistical methods and the associated computational tools to extract intelligent knowledge and informative patterns from these massive datasets. The statistical challenges posed by these massive datasets lie in their complex structures, such as high dimensionality, hierarchy, multi-modality, heterogeneity, and data uncertainty. Beyond the statistical challenges, the associated computational approaches are also essential for achieving efficiency, effectiveness, and numerical stability in practice. On the other hand, some recent developments in statistics and machine learning, such as sparse learning and transfer learning, as well as traditional methodologies which still hold potential, such as multi-level models, all shed light on addressing these complex datasets in a statistically powerful and computationally efficient way. In this dissertation, we identify four kinds of general complex datasets, namely "high-dimensional datasets", "hierarchically-structured datasets", "multimodality datasets" and "data uncertainties", which are ubiquitous in many domains, such as biology, medicine, neuroscience, health care delivery, and manufacturing. We describe the development of novel statistical models to analyze complex datasets which fall under these four categories, and we show how these models can be applied to real-world applications, such as Alzheimer's disease research, nursing care processes, and manufacturing.
Contributors: Huang, Shuai (Author) / Li, Jing (Thesis advisor) / Askin, Ronald (Committee member) / Ye, Jieping (Committee member) / Runger, George C. (Committee member) / Arizona State University (Publisher)
Created: 2012
Description
Discriminative learning when training and test data belong to different distributions is a challenging and complex task. Oftentimes we have very few or no labeled data from the test or target distribution, but we may have plenty of labeled data from one or multiple related sources with different distributions. Due to its capability of migrating knowledge from related domains, transfer learning has been shown to be effective for cross-domain learning problems. In this dissertation, I carry out research along this direction with a particular focus on designing efficient and effective algorithms for bioimaging and bilingual applications. Specifically, I propose deep transfer learning algorithms which combine transfer learning and deep learning to improve image annotation performance. First, I propose to generate deep features for Drosophila embryo images via pretrained deep models and to build linear classifiers on top of the deep features. Second, I propose to fine-tune the pretrained model with a small amount of labeled images. The time complexity and performance of the deep transfer learning methodologies are investigated. Promising results demonstrate the knowledge transfer ability of the proposed deep transfer algorithms. Moreover, I propose a novel Robust Principal Component Analysis (RPCA) approach to process noisy images in advance. In addition, I present a two-stage re-weighting framework for general domain adaptation problems. The distribution of the source domain is mapped towards the target domain in the first stage, and an adaptive learning model is proposed in the second stage to incorporate label information from the target domain if it is available. The proposed model is then applied to tackle the cross-lingual spam detection problem on LinkedIn's website. Our experimental results on real data demonstrate the efficiency and effectiveness of the proposed algorithms.
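A minimal sketch of the first proposal, deep features from a pretrained model plus a linear classifier, is given below; the ResNet-18 backbone, logistic-regression head, and random stand-in images are assumptions for illustration, not the dissertation's exact models or data.

```python
# Sketch: use a pretrained CNN as a frozen feature extractor and train a
# linear classifier on the extracted features.
import torch
import torchvision.models as models
from sklearn.linear_model import LogisticRegression

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()       # drop the ImageNet classification head
backbone.eval()

@torch.no_grad()
def deep_features(images):              # images: (N, 3, 224, 224) tensor
    return backbone(images).numpy()

# Stand-in for a small set of labeled target-task images.
imgs = torch.randn(32, 3, 224, 224)
labels = torch.randint(0, 2, (32,)).numpy()

clf = LogisticRegression(max_iter=1000).fit(deep_features(imgs), labels)
print("train accuracy:", clf.score(deep_features(imgs), labels))
```

The second proposal would instead unfreeze the backbone and continue training it end-to-end on the small labeled set, trading extra compute for task-specific features.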
Contributors: Sun, Qian (Author) / Ye, Jieping (Committee member) / Xue, Guoliang (Committee member) / Liu, Huan (Committee member) / Li, Jing (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
Models using feature interactions have been applied successfully in many areas, such as biomedical analysis and recommender systems. Feature interactions are popular mainly because (1) they capture nonlinearity in the data that purely linear effects cannot, and (2) they enjoy great interpretability. In this thesis, I propose a series of formulations using feature interactions for real-world problems and develop efficient algorithms for solving them.

Specifically, I first propose to directly solve the non-convex formulation of the weak hierarchical Lasso, which imposes weak hierarchy on individual features and interactions but can only be solved approximately via a convex relaxation in existing studies. I further propose to use the non-convex weak hierarchical Lasso formulation for hypothesis testing on the interaction features under hierarchical assumptions. Second, I propose a type of bi-linear model that takes advantage of feature interactions for drug discovery problems where specific drug-drug pairs or drug-disease pairs are of interest. These models are learned by maximizing the number of positive data pairs that rank above the average score of the unlabeled data pairs. I then generalize the method to use the top-ranked unlabeled data pairs for representative construction and derive an efficient algorithm for the extended formulation. Last but not least, motivated by a special form of bi-linear models, I propose a framework that enables simultaneously subgrouping data points and building specific models on the subgroups for learning on massive and heterogeneous datasets. Experiments on synthetic and real datasets demonstrate the effectiveness and efficiency of the proposed methods.
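To illustrate the bi-linear pair model described above, here is a hedged sketch: a pair (x, z) is scored as x^T W z, and W is trained with a soft surrogate of the objective that positive pairs rank above the average score of unlabeled pairs. The softplus loss, optimizer, dimensions, and synthetic data are illustrative assumptions, not the thesis's exact formulation.

```python
# Sketch of a bi-linear pair model: score(x, z) = x^T W z, trained so that
# positive drug-disease pairs score above the average of unlabeled pairs.
import torch

d_drug, d_disease = 20, 30
W = torch.zeros(d_drug, d_disease, requires_grad=True)

pos_x, pos_z = torch.randn(50, d_drug), torch.randn(50, d_disease)    # positive pairs
unl_x, unl_z = torch.randn(500, d_drug), torch.randn(500, d_disease)  # unlabeled pairs

opt = torch.optim.Adam([W], lr=0.05)
for _ in range(200):
    pos_scores = ((pos_x @ W) * pos_z).sum(dim=1)          # x^T W z per pair
    avg_unlabeled = ((unl_x @ W) * unl_z).sum(dim=1).mean()
    # Softplus surrogate: penalize positives not ranked above the average.
    loss = torch.nn.functional.softplus(avg_unlabeled - pos_scores).mean()
    opt.zero_grad(); loss.backward(); opt.step()

print("final loss:", loss.item())
```

The extension in the thesis replaces the plain average with the average over the top-ranked unlabeled pairs, which sharpens the reference score the positives must beat.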
Contributors: Liu, Yashu (Author) / Ye, Jieping (Thesis advisor) / Xue, Guoliang (Thesis advisor) / Liu, Huan (Committee member) / Mittelmann, Hans D (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
The inherent risk in testing drugs has been hotly debated since the government first started regulating the drug industry in the early 1900s. Who can assume the risks associated with trying new pharmaceuticals is unclear when viewed through society's lens. In the mid-twentieth century, the US Food and Drug Administration (FDA) published several guidance documents encouraging researchers to exclude women from early clinical drug research. The motivation to publish those documents, and the subsequent guidance documents in which the FDA and other regulatory offices established their standpoints on women in drug research, may have been connected to current events at the time. The problem of whether women should be involved in drug research is a question of who can assume risk and who is responsible for disseminating specific kinds of information. The problem tends to be framed as one that juxtaposes the health of women and the health of fetuses, placing the two in opposition. That opposition, coupled with the inherent uncertainty in testing drugs, makes for a complex set of issues surrounding consent and access to information.
Contributors: Meek, Caroline Jane (Author) / Maienschein, Jane (Thesis director) / Brian, Jennifer (Committee member) / School of Life Sciences (Contributor) / Sanford School of Social and Family Dynamics (Contributor) / Barrett, The Honors College (Contributor)
Created: 2018-05
Description
Social-emotional learning (SEL) methods are beginning to receive global attention in primary school education, yet the dominant emphasis on implementing these curricula is in high-income, urbanized areas. Consequently, the unique features of developing and integrating such methods in middle- or low-income rural areas are unclear. Past studies suggest that students exposed to SEL programs show an increase in academic performance, improved ability to cope with stress, and better attitudes about themselves, others, and school, but these curricula are designed with an urban focus. The purpose of this study was to conduct a needs-based analysis to investigate components specific to an SEL curriculum contextualized to rural primary schools. A promising organization committed to rural educational development is Barefoot College, located in Tilonia, Rajasthan, India. In partnership with Barefoot, we designed an ethnographic study to identify and describe what teachers and school leaders consider the highest needs related to their students' social and emotional education. To do so, we interviewed 14 teachers and school leaders individually or in a focus group to explore their present understanding of "social-emotional learning" and their perception of their students' social and emotional intelligence. Analysis of this data uncovered common themes among classroom behaviors and prevalent opportunities to address social and emotional well-being among students. These themes translated into the three overarching topics and eight sub-topics explored throughout the curriculum, and these opportunities guided the creation of the 21 modules within it. Through a design-based research methodology, we developed a 40-hour curriculum by implementing its various modules within seven Barefoot classrooms, alongside continuous iteration informed by teacher feedback and participant observation. Through this process, we found that student engagement increased during contextualized SEL lessons as opposed to traditional methods. In addition, we found that teachers and students preferred and performed better with an activities-based approach. These findings suggest that rural educators must employ particular teaching strategies when addressing SEL, including localized content and an experiential-learning approach. Teachers reported that as their approach to SEL shifted, they began to unlock the potential to build self-aware, globally-minded students. This study concludes that social and emotional education cannot be treated in a generalized manner, as curriculum development is central to the teaching-learning process.
Contributors: Bucker, Delaney Sue (Author) / Carrese, Susan (Thesis director) / Barab, Sasha (Committee member) / School of Life Sciences (Contributor) / School of Civic & Economic Thought and Leadership (Contributor) / School of International Letters and Cultures (Contributor) / Barrett, The Honors College (Contributor)
Created: 2020-05
Description
As of 2019, 30 US states have adopted abortion-specific informed consent laws that require state health departments to develop and disseminate written informational materials to patients seeking an abortion. Abortion is the only medical procedure for which states dictate the content of informed consent counseling. State abortion counseling materials have been criticized for containing inaccurate and misleading information, but overall, informed consent laws for abortion do not often receive national attention. The objective of this project was to determine the importance of informed consent laws to achieving the larger goal of dismantling the right to abortion. I found that informed consent counseling materials in most states contain a full timeline of fetal development, along with information about the risks of abortion, the risks of childbirth, and alternatives to abortion. In addition, informed consent laws for abortion are based on model legislation called the “Women’s Right to Know Act” developed by Americans United for Life (AUL). AUL calls itself the legal architect of the pro-life movement and works to pass laws at the state level that incrementally restrict abortion access so that it gradually becomes more difficult to exercise the right to abortion established by Roe v. Wade. The “Women’s Right to Know Act” is part of a larger package of model legislation called the “Women’s Protection Project,” a cluster of laws that place restrictions on abortion providers, purportedly to protect women, but actually to decrease abortion access. “Women’s Right to Know” counseling laws do not directly deny access to abortion, but they do reinforce key ideas important to the anti-abortion movement, like the concept of fetal personhood, distrust in medical professionals, the belief that pregnant people cannot be fully autonomous individuals, and the belief that abortion is not an ordinary medical procedure and requires special government oversight. “Women’s Right to Know” laws use the language of informed consent and the purported goal of protecting women to legitimize those ideas, and in doing so, they significantly undermine the right to abortion. The threat to abortion rights posed by laws like the “Women’s Right to Know” laws indicates the need to reevaluate and strengthen our ethical defense of the right to abortion.
Contributors: Venkatraman, Richa (Author) / Maienschein, Jane (Thesis director) / Brian, Jennifer (Thesis director) / Abboud, Carolina (Committee member) / Historical, Philosophical & Religious Studies (Contributor) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)
Created: 2020-05
Description
Turbidity is a known problem for UV water treatment systems, as suspended particles can shield contaminants from the UV radiation. UV systems that utilize a reflective radiation chamber may be able to decrease the impact of turbidity on the efficacy of the system. The purpose of this study was to determine how kaolin clay and gram flour turbidity affect the inactivation of Escherichia coli (E. coli) when using a UV system with a reflective chamber. Both sources of turbidity were shown to reduce the inactivation of E. coli with increasing concentrations. Overall, increasing kaolin clay turbidity had a consistent effect on reducing UV inactivation across UV doses: log inactivation was reduced by 1.48 log at the low UV dose and by at least 1.31 log at the high UV dose. Gram flour had an effect similar to the clay at the lower UV dose, reducing log inactivation by 1.58 log; at the high UV dose, there was no change in UV inactivation with an increase in turbidity. In conclusion, turbidity has a significant impact on the efficacy of UV disinfection. Therefore, removing turbidity from water is an essential process to enhance UV efficiency for the disinfection of microbial pathogens.
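For readers unfamiliar with the unit, log inactivation is log10 of the ratio of viable counts before and after treatment, so each whole log is a ten-fold reduction; the short sketch below, with made-up plate counts, only illustrates the arithmetic behind figures like 1.48 log.

```python
# Log inactivation = log10(N0 / N): each whole log is a 10-fold reduction
# in viable organisms. The counts below are made-up illustrative values.
import math

def log_inactivation(n0_cfu_per_ml, n_cfu_per_ml):
    return math.log10(n0_cfu_per_ml / n_cfu_per_ml)

# e.g., 1e6 CFU/mL before UV and 3.3e4 CFU/mL after is ~1.48 log removed.
print(round(log_inactivation(1e6, 3.3e4), 2))   # -> 1.48
```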
Contributors: Malladi, Rohith (Author) / Abbaszadegan, Morteza (Thesis director) / Alum, Absar (Committee member) / Fox, Peter (Committee member) / School of Human Evolution & Social Change (Contributor) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)
Created: 2020-05