Batch mode active learning for multimedia pattern recognition

Description

The rapid escalation of technology and the widespread emergence of modern technological equipment have resulted in the generation of enormous amounts of digital data (in the form of images, videos and text). This has expanded the possibility of solving real world problems using computational learning frameworks. However, while gathering a large amount of data is cheap and easy, annotating it with class labels is an expensive process in terms of time, labor and human expertise. This has paved the way for research in the field of active learning. Such algorithms automatically select the salient and exemplar instances from large quantities of unlabeled data and are effective in reducing human labeling effort in inducing classification models. To utilize the possible presence of multiple labeling agents, there have been attempts towards a batch mode form of active learning, where a batch of data instances is selected simultaneously for manual annotation. This dissertation is aimed at the development of novel batch mode active learning algorithms to reduce manual effort in training classification models in real world multimedia pattern recognition applications.
Four major contributions are proposed in this work: $(i)$ a framework for dynamic batch mode active learning, where the batch size and the specific data instances to be queried are selected adaptively through a single formulation, based on the complexity of the data stream in question, $(ii)$ a batch mode active learning strategy for fuzzy label classification problems, where there is an inherent imprecision and vagueness in the class label definitions, $(iii)$ batch mode active learning algorithms based on convex relaxations of an NP-hard integer quadratic programming (IQP) problem, with guaranteed bounds on the solution quality and $(iv)$ an active matrix completion algorithm and its application to solve several variants of the active learning problem (transductive active learning, multi-label active learning, active feature acquisition and active learning for regression). These contributions are validated on the face recognition and facial expression recognition problems (which are commonly encountered in real world applications like robotics, security and assistive technology for the blind and the visually impaired) and also on collaborative filtering applications like movie recommendation.

Contributors: Chakraborty, Shayok (Author) / Panchanathan, Sethuraman (Thesis advisor) / Balasubramanian, Vineeth N. (Committee member) / Li, Baoxin (Committee member) / Mittelmann, Hans (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created: 2013

Contaminants of emerging concern in U.S. sewage sludges and forecasting of associated ecological and human health risks using sewage epidemiology approaches

Description

Many manmade chemicals used in consumer products are ultimately washed down the drain and are collected in municipal sewers. Efficient chemical monitoring at wastewater treatment (WWT) plants thus may provide up-to-date information on chemical usage rates for epidemiological assessments. The objective of the present study was to extrapolate this concept, termed 'sewage epidemiology', to include municipal sewage sludge (MSS) in identifying and prioritizing contaminants of emerging concern (CECs). To test this, the following specific aims were defined: i) to screen and identify CECs in nationally representative samples of MSS and to provide nationwide inventories of CECs in U.S. MSS; ii) to investigate the fate and persistence in MSS-amended soils of sludge-borne hydrophobic CECs; and iii) to develop an analytical tool relying on contaminant levels in MSS as an indicator for identifying and prioritizing hydrophobic CECs. Chemicals that are primarily discharged to the sewage systems (alkylphenol surfactants) and widespread persistent organohalogen pollutants (perfluorochemicals and brominated flame retardants) were analyzed in nationally representative MSS samples. A meta-analysis showed that CECs contribute about 0.04-0.15% to the total dry mass of MSS, a mass equivalent of 2,700-7,900 metric tonnes of chemicals annually. An analysis of archived mesocosms from a sludge weathering study showed that 64 CECs persisted in MSS/soil mixtures over the course of the experiment, with half-lives ranging between 224 and >990 days; these results suggest an inherent persistence of CECs that accumulate in MSS. A comparison of the spectrum of chemicals (n=52) analyzed in nationally representative biological specimens from humans and MSS revealed 70% overlap. This observed co-occurrence of contaminants in both matrices suggests that MSS may serve as an indicator for ongoing human exposures and body burdens of pollutants in humans.
In conclusion, I posit that this novel approach in sewage epidemiology may serve to pre-screen and prioritize the several thousands of known or suspected CECs to identify those that are most prone to pose a risk to human health and the environment.
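As a reading aid for the half-lives quoted above: the abstract does not state which kinetic model was fitted, so the relation below assumes simple first-order decay, which is only one common convention for reporting half-lives. Here $C(t)$ is the concentration at time $t$, $C_0$ the initial concentration and $k$ the rate constant (all three symbols are introduced for illustration).

$$C(t) = C_0\, e^{-kt}, \qquad t_{1/2} = \frac{\ln 2}{k}$$

Under that assumption, a half-life of 224 days corresponds to $k \approx 3.1 \times 10^{-3}\ \mathrm{d}^{-1}$, and a half-life of 990 days to $k \approx 7.0 \times 10^{-4}\ \mathrm{d}^{-1}$.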

Contributors: Venkatesan, Arjunkrishna (Author) / Halden, Rolf U. (Thesis advisor) / Westerhoff, Paul (Committee member) / Fox, Peter (Committee member) / Arizona State University (Publisher)

Created: 2013

Building adaptive computational systems for physiological and biomedical data

Description

In recent years, machine learning and data mining technologies have received growing attention in several areas such as recommendation systems, natural language processing, speech and handwriting recognition, image processing and the biomedical domain. Many of these applications which deal with physiological and biomedical data require person specific or person adaptive systems. The greatest challenge in developing such systems is the subject-dependent data variations or subject-based variability in physiological and biomedical data, which leads to differences in data distributions, making the task of modeling these data using traditional machine learning algorithms complex and challenging. As a result, despite the wide application of machine learning, efficient deployment of its principles to model real-world data is still a challenge. This dissertation addresses the problem of subject based variability in physiological and biomedical data and proposes person adaptive prediction models based on novel transfer and active learning algorithms, an emerging field in machine learning. One of the significant contributions of this dissertation is a person adaptive method for early detection of muscle fatigue using Surface Electromyogram signals, based on a new multi-source transfer learning algorithm. This dissertation also proposes a subject-independent algorithm for grading the progression of muscle fatigue on a scale from 0 to 1 in a test subject, during isometric or dynamic contractions, in real time. Besides subject based variability, biomedical image data also varies due to variations in imaging techniques, leading to distribution differences between the image databases. Hence a classifier learned on one database may perform poorly on the other database.
Another significant contribution of this dissertation has been the design and development of an efficient biomedical image data annotation framework, based on a novel combination of transfer learning and a new batch-mode active learning method, capable of addressing the distribution differences across databases. The methodologies developed in this dissertation are relevant and applicable to a large set of computing problems where there is a high variation of data between subjects or sources, such as face detection, pose detection and speech recognition. From a broader perspective, these frameworks can be viewed as a first step towards design of automated adaptive systems for real world data.

Contributors: Chattopadhyay, Rita (Author) / Panchanathan, Sethuraman (Thesis advisor) / Ye, Jieping (Thesis advisor) / Li, Baoxin (Committee member) / Santello, Marco (Committee member) / Arizona State University (Publisher)

Created: 2013

The development and evaluation of biofuel production systems on marginal land

Description

The consumption of feedstocks from agriculture and forestry by current biofuel production has raised concerns about food security and land availability. In the meantime, intensive human activities have created a large amount of marginal lands that require management. This study investigated the viability of aligning land management with biofuel production on marginal lands. Biofuel crop production on two types of marginal lands, namely urban vacant lots and abandoned mine lands (AMLs), was assessed. The investigation of biofuel production on urban marginal land was carried out in Pittsburgh between 2008 and 2011, using the sunflower gardens developed by a Pittsburgh non-profit as an example. Results showed that the crops from urban marginal lands were safe for biofuel. The crop yield was 20% of that on agricultural land while low-input agriculture was used in crop cultivation. The energy balance analysis demonstrated that the sunflower gardens could produce a net energy return even at the current low yield. Biofuel production on AML was assessed from experiments conducted in a greenhouse for sunflower, soybean, corn, canola and camelina. The research successfully created an industrial symbiosis by using bauxite as soil amendment to enable plant growth on very acidic mine refuse. Phytoremediation and soil amendments were found to effectively reduce contamination in the AML and its runoff. Results from this research support the conclusion that biofuel production on marginal lands could be a unique and feasible option for cultivating biofuel feedstocks.

Contributors: Zhao, Xi (Author) / Landis, Amy (Thesis advisor) / Fox, Peter (Committee member) / Chester, Mikhail (Committee member) / Arizona State University (Publisher)

Created: 2013

Robust implementation of NL2KR system and it's application in iRODS domain

Description

Currently, to interact with computer based systems one needs to learn the specific interface language of that system. In most cases, interaction would be much easier if it could be done in natural language. For that, we will need a module which understands natural language and automatically translates it to the interface language of the system. The NL2KR (Natural language to knowledge representation) v.1 system is a prototype of such a system. It is a learning based system that learns new meanings of words in terms of lambda-calculus formulas given an initial lexicon of some words and their meanings and a training corpus of sentences with their translations. As a part of this thesis, we take the prototype NL2KR v.1 system and enhance various components of it to make it usable for somewhat substantial and useful interface languages. We revamped the lexicon learning components, the Inverse-lambda and Generalization modules, and redesigned the lexicon learning algorithm which uses these components to learn new meanings of words. Similarly, we re-developed an inbuilt parser of the system in Answer Set Programming (ASP) and also integrated an external parser with the system. Apart from this, we added some new rich features like various system configurations and a memory cache in the learning component of the NL2KR system. These enhancements helped in learning more meanings of the words, boosted performance of the system by reducing the computation time by a factor of 8 and improved the usability of the system. We evaluated the NL2KR system on the iRODS domain. iRODS is a rule-oriented data system, which helps in managing large sets of computer files using policies. This system provides a Rule-Oriented interface language whose syntactic structure is like any procedural programming language (e.g. C). However, direct translation of natural language (NL) to this interface language is difficult.
So, for automatic translation of NL to this language, we define a simple intermediate Policy Declarative Language (IPDL) to represent the knowledge in the policies, which then can be directly translated to iRODS rules. We develop a corpus of 100 policy statements and manually translate them to IPDL. This corpus is then used for the evaluation of the NL2KR system. We performed 10-fold cross-validation on the system. Furthermore, using this corpus, we illustrate how different components of our NL2KR system work.
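To illustrate what 10-fold cross-validation on a 100-item corpus involves, the sketch below splits item indices into ten disjoint held-out folds of ten items each. It is a generic illustration, not the thesis's evaluation code; the function name `k_fold_splits`, the shuffling step and the fixed seed are assumptions made here.

```python
import random


def k_fold_splits(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Hypothetical helper: indices stand in for corpus entries
    (e.g. policy statements paired with their translations).
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # assumed: shuffle before splitting
    # Deal indices round-robin into k disjoint folds.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [idx for f, fold in enumerate(folds) if f != held_out for idx in fold]
        yield train, test


splits = list(k_fold_splits(100, k=10))
```

Each of the ten rounds trains on 90 items and evaluates on the remaining 10, so every item is held out exactly once.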

Contributors: Kumbhare, Kanchan Ravishankar (Author) / Baral, Chitta (Thesis advisor) / Ye, Jieping (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)

Created: 2013

Classifying everyday activity through label propagation with sparse training data

Description

We solve the problem of activity verification in the context of sustainability. Activity verification is the process of proving the user assertions pertaining to a certain activity performed by the user. Our motivation lies in incentivizing the user for engaging in sustainable activities like taking public transport or recycling. Such incentivization schemes require the system to verify the claim made by the user. The system verifies these claims by analyzing the supporting evidence captured by the user while performing the activity. The proliferation of portable smart-phones in the past few years has provided us with a ubiquitous and relatively cheap platform, having multiple sensors like accelerometer, gyroscope, microphone etc. to capture this evidence data in-situ. In this research, we investigate supervised and semi-supervised learning techniques for activity verification. Both these techniques make use of the data set constructed using the evidence submitted by the user. Supervised learning makes use of annotated evidence data to build a function to predict the class labels of the unlabeled data points. The evidence data captured can be either unimodal or multimodal in nature. We use the accelerometer data as evidence for transportation mode verification and image data as evidence for recycling verification. After training the system, we achieve maximum accuracy of 94% when classifying the transport mode and 81% when detecting recycle activity. In the case of recycle verification, we could improve the classification accuracy by asking the user for more evidence. We present some techniques to ask the user for the next best piece of evidence that maximizes the probability of classification. Using these techniques for detecting recycle activity, the accuracy increases to 93%. The major disadvantage of using supervised models is that they require extensive annotated training data, which is expensive to collect.
Due to the limited training data, we look at the graph based inductive semi-supervised learning methods to propagate the labels among the unlabeled samples. In the semi-supervised approach, we represent each instance in the data set as a node in the graph. Since it is a complete graph, edges interconnect these nodes, with each edge having some weight representing the similarity between the points. We propagate the labels in this graph, based on the proximity of the data points to the labeled nodes. We estimate the performance of these algorithms by measuring how close the probability distribution of the data after label propagation is to the probability distribution of the ground truth data. Since labeling has a cost associated with it, in this thesis we propose two algorithms that help us in selecting minimum number of labeled points to propagate the labels accurately. Our proposed algorithm achieves a maximum of 73% increase in performance when compared to the baseline algorithm.
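The graph-based propagation described above can be sketched as follows. This is a minimal illustration of one standard scheme (iterative propagation with labeled nodes clamped), not the algorithms proposed in the thesis. The Gaussian similarity weight, the parameter `sigma`, the iteration count and the function name `propagate_labels` are all assumptions introduced here.

```python
import math


def propagate_labels(points, labels, n_classes, sigma=1.0, n_iter=50):
    """Propagate class labels over a complete similarity graph.

    points: list of feature tuples, one per node.
    labels: dict mapping node index -> class index for the labeled nodes.
    Returns the predicted class index for every node.
    """
    n = len(points)

    def weight(a, b):
        # Assumed similarity: Gaussian kernel on squared Euclidean distance.
        d2 = sum((x - y) ** 2 for x, y in zip(a, b))
        return math.exp(-d2 / (2 * sigma ** 2))

    # Complete graph: every pair of distinct nodes shares a weighted edge.
    W = [[weight(points[i], points[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]

    # Start unlabeled nodes at a uniform distribution, labeled nodes at one-hot.
    F = [[1.0 / n_classes] * n_classes for _ in range(n)]
    for i, c in labels.items():
        F[i] = [1.0 if k == c else 0.0 for k in range(n_classes)]

    for _ in range(n_iter):
        new_F = []
        for i in range(n):
            if i in labels:
                new_F.append(F[i])  # clamp labeled nodes to their known label
                continue
            total = sum(W[i])
            # Weighted average of the neighbors' current label distributions.
            new_F.append([sum(W[i][j] * F[j][k] for j in range(n)) / total
                          for k in range(n_classes)])
        F = new_F

    return [max(range(n_classes), key=lambda k: row[k]) for row in F]


# Two well-separated 1-D clusters with one labeled point in each.
points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
predicted = propagate_labels(points, {0: 0, 5: 1}, n_classes=2)
```

With only two labeled nodes, each unlabeled point takes the label of the labeled node it is closest to, which is the behavior the abstract describes as propagation based on proximity.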

Contributors: Desai, Vaishnav (Author) / Sundaram, Hari (Thesis advisor) / Li, Baoxin (Thesis advisor) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)

Created: 2013

Informatics approach to improving surgical skills training

Description

Surgery as a profession requires significant training to improve both clinical decision making and psychomotor proficiency. In the medical knowledge domain, tools have been developed, validated, and accepted for evaluation of surgeons' competencies. However, assessment of the psychomotor skills still relies on the Halstedian model of apprenticeship, wherein surgeons are observed during residency for judgment of their skills. Although the value of this method of skills assessment cannot be ignored, novel methodologies of objective skills assessment need to be designed, developed, and evaluated that augment the traditional approach. Several sensor-based systems have been developed to measure a user's skill quantitatively, but use of sensors could interfere with skill execution and thus limit the potential for evaluating real-life surgery. However, having a method to judge skills automatically in real-life conditions should be the ultimate goal, since only with such features would a system be widely adopted. This research proposes a novel video-based approach for observing surgeons' hand and surgical tool movements in minimally invasive surgical training exercises as well as during laparoscopic surgery. Because our system does not require surgeons to wear special sensors, it has the distinct advantage over alternatives of offering skills assessment in both learning and real-life environments. The system automatically detects major skill-measuring features from surgical task videos using a computing system composed of a series of computer vision algorithms and provides on-screen real-time performance feedback for more efficient skill learning. Finally, a machine-learning approach is used to develop an observer-independent composite scoring model through objective and quantitative measurement of surgical skills.
To increase the effectiveness and usability of the developed system, it is integrated with a cloud-based tool, which automatically assesses surgical videos uploaded to the cloud.

Contributors: Islam, Gazi (Author) / Li, Baoxin (Thesis advisor) / Liang, Jianming (Thesis advisor) / Dinu, Valentin (Committee member) / Greenes, Robert (Committee member) / Smith, Marshall (Committee member) / Kahol, Kanav (Committee member) / Patel, Vimla L. (Committee member) / Arizona State University (Publisher)

Created: 2013

Use of ozonation and constructed wetlands to remove contaminants of emerging concern from wastewater effluent

Description

Contaminants of emerging concern (CECs) present in wastewater effluent can threaten its safe discharge or reuse. Additional barriers of protection can be provided using advanced or natural treatment processes. This dissertation evaluated ozonation and constructed wetlands to remove CECs from wastewater effluent. Organic CECs can be removed by hydroxyl radical formed during ozonation; however, estimating the ozone demand of wastewater effluent is complicated due to the presence of reduced inorganic species. A method was developed to estimate ozone consumption only by dissolved organic compounds and predict trace organic oxidation across multiple wastewater sources. Organic and engineered nanomaterial (ENM) CEC removal in constructed wetlands was investigated using batch experiments and continuous-flow microcosms containing decaying wetland plants. CEC removal varied depending on their physico-chemical properties, hydraulic residence time (HRT) and relative quantities of plant materials in the microcosms. At comparable HRTs, ENM removal improved with higher quantity of plant materials due to enhanced sorption, which was verified in batch-scale studies with plant materials. A fate-predictive model was developed to evaluate the role of design loading rates on organic CEC removal. Areal removal rates increased with hydraulic loading rates (HLRs) and carbon loading rates (CLRs) unless photolysis was the dominant removal mechanism (e.g. atrazine). To optimize CEC removal, wetlands with different CLRs can be used in combination without lowering the net HLR. Organic CEC removal in denitrifying conditions of constructed wetlands was investigated and selected CECs (e.g. estradiol) were found to biotransform while denitrification occurred. Although the level of denitrification was affected by HRT, a similar impact on estradiol was not observed due to a dominant effect from plant biomass quantity.
Overall, both modeling and experimental findings suggest considering CLR as an equally important factor with HRT or HLR to design constructed wetlands for CEC removal. This dissertation provided directions to select design parameters for ozonation (ozone dose) and constructed wetlands (design loading rates) to meet organic CEC removal goals. Future research is needed to understand fate of ENMs during ozonation and quantify the contributions from different transformation mechanisms occurring in the wetlands to incorporate in a model and evaluate the effect of wetland design.

Contributors: Sharif, Fariya (Author) / Westerhoff, Paul (Thesis advisor) / Halden, Rolf (Committee member) / Fox, Peter (Committee member) / Herckes, Pierre (Committee member) / Arizona State University (Publisher)

Created: 2013

Overcoming the impacts of extreme weather and dissolved organic matter on the treatability of water using ozone

Description

The influence of climate variability and reclaimed wastewater on the water supply necessitates improved understanding of the treatability of trace and bulk organic matter. Dissolved organic matter (DOM) mobilized during extreme weather events and in treated wastewater includes natural organic matter (NOM), contaminants of emerging concern (CECs), and microbial extracellular polymeric substances (EPS). The goal of my dissertation was to quantify the impacts of extreme weather events on DOM in surface water and downstream treatment processes, and to improve membrane filtration efficiency and CEC oxidation efficiency during water reclamation with ozone. Surface water quality, air quality and hydrologic flow rate data were used to quantify changes in DOM and turbidity following dust storms, flooding, or runoff from wildfire burn areas in central Arizona. The subsequent impacts to treatment processes and public perception of water quality were also discussed. Findings showed a correlation between dust storm events and change in surface water turbidity (R²=0.6), attenuation of increased DOM through reservoir systems, a 30-40% increase in organic carbon and a 120-600% increase in turbidity following severe flooding, and differing impacts of upland and lowland wildfires. The use of ozone to reduce membrane fouling caused by vesicles (a subcomponent of EPS) and oxidize CECs through increased hydroxyl radical (HO●) production was investigated. An "ozone dose threshold" was observed above which addition of hydrogen peroxide increased HO● production, indicating the presence of ambient promoters in wastewater. Ozonation of CECs in secondary effluent over titanium dioxide or activated carbon did not increase radical production. Vesicles fouled ultrafiltration membranes faster (20 times greater flux decline) than polysaccharides, fatty acids, or NOM.
Based upon the estimated carbon distribution of secondary effluent, vesicles could be responsible for 20-60% of fouling during ultrafiltration and may play a vital role in other environmental processes as well. Ozone reduced vesicle-caused membrane fouling that, in conjunction with the presence of ambient promoters, helps to explain why low ozone dosages improve membrane flux during full-scale water reclamation.

Contributors: Barry, Michelle (Author) / Barry, Michelle C (Thesis advisor) / Westerhoff, Paul (Committee member) / Fox, Peter (Committee member) / Halden, Rolf (Committee member) / Hristovski, Kiril (Committee member) / Arizona State University (Publisher)

Created: 2014

Improving our understanding of source zones at petroleum impacted sites through physical model studies

Description

Characterization of petroleum spill site source zones directly influences the selection of corrective action plans and frequently affects the success of remediation efforts. For example, simply knowing whether or not nonaqueous phase liquid (NAPL) is present, or if there is chemical storage in less hydraulically accessible regions, will influence corrective action planning. The overarching objective of this study was to assess if macroscopic source zone features can be inferred from dissolved concentration vs. time data. Laboratory-scale physical model studies were conducted for idealized sources; defined as Type-1) NAPL-impacted high permeability zones, Type-2) NAPL-impacted lower permeability zones, and Type-3) dissolved chemical matrix storage in lower permeability zones. Aquifer source release studies were conducted using two-dimensional stainless steel flow-through tanks outfitted with sampling ports for the monitoring of effluent concentrations and flow rates. An idealized NAPL mixture of key gasoline components was used to create the NAPL source zones, and dissolved sources were created using aqueous solutions having concentrations similar to water in equilibrium with the NAPL sources. The average linear velocity was controlled by pumping to be about 2 ft/d, and dissolved effluent concentrations were monitored daily. The Type-1 experiment resulted in a source signature similar to that expected for a relatively well-mixed NAPL source, with dissolved concentrations dependent on chemical solubility and initial mass fraction. The Type-2 and Type-3 experiments were conducted for 320 d and 190 d respectively. Unlike the Type-1 experiment, the concentration vs. time behavior was similar for all chemicals, for both source types. The magnitudes of the effluent concentrations varied between the Type-2 and Type-3 experiments, and were related to the hydrocarbon source mass. 
A fourth physical model experiment was performed to identify differences between ideal equilibrium behavior and the source concentration vs. time behavior observed in the tank experiments. Screening-level mathematical models predicted the general behavior observed in the experiments. The results of these studies suggest that dissolved concentration vs. time data can be used to distinguish between Type-1 sources in transmissive zones and Type-2 and Type-3 sources in lower permeability zones, provided that many years to decades of data are available. The results also suggest that concentration vs. time data alone will be insufficient to distinguish between NAPL and dissolved-phase storage sources in lower permeability regions.

Contributors: Wilson, Sean Tomas (Author) / Johnson, Paul (Thesis advisor) / Kavazanjian, Edward (Committee member) / Fox, Peter (Committee member) / Arizona State University (Publisher)

Created: 2014

ASU Electronic Theses and Dissertations
