Search Content

Batch mode active learning for multimedia pattern recognition

Description

The rapid escalation of technology and the widespread emergence of modern technological equipments have resulted in the generation of humongous amounts of digital data (in the form of images, videos and text). This has expanded the possibility of solving real world problems using computational learning frameworks. However, while gathering a…

The rapid escalation of technology and the widespread emergence of modern technological equipments have resulted in the generation of humongous amounts of digital data (in the form of images, videos and text). This has expanded the possibility of solving real world problems using computational learning frameworks. However, while gathering a large amount of data is cheap and easy, annotating them with class labels is an expensive process in terms of time, labor and human expertise. This has paved the way for research in the field of active learning. Such algorithms automatically select the salient and exemplar instances from large quantities of unlabeled data and are effective in reducing human labeling effort in inducing classification models. To utilize the possible presence of multiple labeling agents, there have been attempts towards a batch mode form of active learning, where a batch of data instances is selected simultaneously for manual annotation. This dissertation is aimed at the development of novel batch mode active learning algorithms to reduce manual effort in training classification models in real world multimedia pattern recognition applications. Four major contributions are proposed in this work: $(i)$ a framework for dynamic batch mode active learning, where the batch size and the specific data instances to be queried are selected adaptively through a single formulation, based on the complexity of the data stream in question, $(ii)$ a batch mode active learning strategy for fuzzy label classification problems, where there is an inherent imprecision and vagueness in the class label definitions, $(iii)$ batch mode active learning algorithms based on convex relaxations of an NP-hard integer quadratic programming (IQP) problem, with guaranteed bounds on the solution quality and $(iv)$ an active matrix completion algorithm and its application to solve several variants of the active learning problem (transductive active learning, multi-label active learning, active feature acquisition and active learning for regression). These contributions are validated on the face recognition and facial expression recognition problems (which are commonly encountered in real world applications like robotics, security and assistive technology for the blind and the visually impaired) and also on collaborative filtering applications like movie recommendation.

ContributorsChakraborty, Shayok (Author) / Panchanathan, Sethuraman (Thesis advisor) / Balasubramanian, Vineeth N. (Committee member) / Li, Baoxin (Committee member) / Mittelmann, Hans (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created2013

RAProp: ranking tweets by exploiting the tweet/user/web ecosystem

Description

The increasing popularity of Twitter renders improved trustworthiness and relevance assessment of tweets much more important for search. However, given the limitations on the size of tweets, it is hard to extract measures for ranking from the tweet's content alone. I propose a method of ranking tweets by generating a…

The increasing popularity of Twitter renders improved trustworthiness and relevance assessment of tweets much more important for search. However, given the limitations on the size of tweets, it is hard to extract measures for ranking from the tweet's content alone. I propose a method of ranking tweets by generating a reputation score for each tweet that is based not just on content, but also additional information from the Twitter ecosystem that consists of users, tweets, and the web pages that tweets link to. This information is obtained by modeling the Twitter ecosystem as a three-layer graph. The reputation score is used to power two novel methods of ranking tweets by propagating the reputation over an agreement graph based on tweets' content similarity. Additionally, I show how the agreement graph helps counter tweet spam. An evaluation of my method on 16~million tweets from the TREC 2011 Microblog Dataset shows that it doubles the precision over baseline Twitter Search and achieves higher precision than current state of the art method. I present a detailed internal empirical evaluation of RAProp in comparison to several alternative approaches proposed by me, as well as external evaluation in comparison to the current state of the art method.

ContributorsRavikumar, Srijith (Author) / Kambhampati, Subbarao (Thesis advisor) / Davulcu, Hasan (Committee member) / Liu, Huan (Committee member) / Arizona State University (Publisher)

Created2013

A multi-factor analysis of the emergence of a specialist-based economy among the Phoenix Basin Hohokam

Description

This project examines the social and economic factors that contributed to the development of a specialist-based economy among the Phoenix Basin Hohokam. In the Hohokam case, widespread dependence on the products of a few concentrated pottery producers developed in the absence of political centralization or hierarchical social arrangements. The factors…

This project examines the social and economic factors that contributed to the development of a specialist-based economy among the Phoenix Basin Hohokam. In the Hohokam case, widespread dependence on the products of a few concentrated pottery producers developed in the absence of political centralization or hierarchical social arrangements. The factors that promoted intensified pottery production, therefore, are the keys to addressing how economic systems can expand in small-scale and middle-range societies. This dissertation constructs a multi-factor model that explores changes to the organization of decorated pottery production during a substantial portion of the pre-Classic period (AD 700 - AD 1020). The analysis is designed to examine simultaneously several variables that may have encouraged demand for ceramic vessels made by specialists. This study evaluates the role of four factors in the development of supply and demand for specialist produced red-on-buff pottery in Hohokam settlements. The factors include 1) agricultural intensification in the form of irrigation agriculture, 2) increases in population density, 3) ritual or social obligations that require the production of particular craft items, and 4) reduced transport costs. Supply and demand for specialist-produced pottery is estimated through a sourcing analysis of non-local pottery at 13 Phoenix Basin settlements. Through a series of statistical analyses, the study measures changes in the influence of each factor on demand for specialist-produced pottery through four temporal phases of the Hohokam pre-Classic period. The analysis results indicate that specialized red-on-buff production was initially spurred by demand for light-colored, shiny, decorated pottery, but then by comparative advantages to specialized production in particular areas of the Phoenix Basin. Specialists concentrated on the Snaketown canal system were able to generate light-colored, mica-dense wares that Phoenix Basin consumers desired while lowering transport costs in the distribution of red-on-buff pottery. The circulation of decorated wares was accompanied by the production of plainware pottery in other areas of the Phoenix Basin. Economic growth in the region was based on complementary and coordinated economic activities between the Salt and the Gila River valleys.

ContributorsKelly, Sophia E (Author) / Abbott, David R. (Thesis advisor) / Darling, J. Andrew (Committee member) / Moore, Gordon (Committee member) / Spielmann, Katherine A. (Committee member) / Arizona State University (Publisher)

Created2013

A modular ROS package for linear temporal logic based motion planning

Description

Objective of this thesis project is to build a prototype using Linear Temporal Logic specifications for generating a 2D motion plan commanding an iRobot to fulfill the specifications. This thesis project was created for Cyber Physical Systems Lab in Arizona State University. The end product of this thesis is creation…

Objective of this thesis project is to build a prototype using Linear Temporal Logic specifications for generating a 2D motion plan commanding an iRobot to fulfill the specifications. This thesis project was created for Cyber Physical Systems Lab in Arizona State University. The end product of this thesis is creation of a software solution which can be used in the academia and industry for research in cyber physical systems related applications. The major features of the project are: creating a modular system for motion planning, use of Robot Operating System (ROS), use of triangulation for environment decomposition and using stargazer sensor for localization. The project is built on an open source software called ROS which provides an environment where it is very easy to integrate different modules be it software or hardware on a Linux based platform. Use of ROS implies the project or its modules can be adapted quickly for different applications as the need arises. The final software package created and tested takes a data file as its input which contains the LTL specifications, a symbols list used in the LTL and finally the environment polygon data containing real world coordinates for all polygons and also information on neighbors and parents of each polygon. The software package successfully ran the experiment of coverage, reachability with avoidance and sequencing.

ContributorsPandya, Parth (Author) / Fainekos, Georgios (Thesis advisor) / Dasgupta, Partha (Committee member) / Lee, Yann-Hang (Committee member) / Arizona State University (Publisher)

Created2013

A graphical language for LTL motion and mission planning

Description

Linear Temporal Logic is gaining increasing popularity as a high level specification language for robot motion planning due to its expressive power and scalability of LTL control synthesis algorithms. This formalism, however, requires expert knowledge and makes it inaccessible to non-expert users. This thesis introduces a graphical specification environment to…

Linear Temporal Logic is gaining increasing popularity as a high level specification language for robot motion planning due to its expressive power and scalability of LTL control synthesis algorithms. This formalism, however, requires expert knowledge and makes it inaccessible to non-expert users. This thesis introduces a graphical specification environment to create high level motion plans to control robots in the field by converting a visual representation of the motion/task plan into a Linear Temporal Logic (LTL) specification. The visual interface is built on the Android tablet platform and provides functionality to create task plans through a set of well defined gestures and on screen controls. It uses the notion of waypoints to quickly and efficiently describe the motion plan and enables a variety of complex Linear Temporal Logic specifications to be described succinctly and intuitively by the user without the need for the knowledge and understanding of LTL specification. Thus, it opens avenues for its use by personnel in military, warehouse management, and search and rescue missions. This thesis describes the construction of LTL for various scenarios used for robot navigation using the visual interface developed and leverages the use of existing LTL based motion planners to carry out the task plan by a robot.

ContributorsSrinivas, Shashank (Author) / Fainekos, Georgios (Thesis advisor) / Baral, Chitta (Committee member) / Burleson, Winslow (Committee member) / Arizona State University (Publisher)

Created2013

Algae Growth for Lipid Production

Description

This study analyzes the feasibility of using algae cultivated from wastewater effluent to produce a biodiesel feedstock. The goal was to determine if the energy produced was greater than the operational energy consumed without consideration to constructing the system as well as the emissions and economic value associated with the…

This study analyzes the feasibility of using algae cultivated from wastewater effluent to produce a biodiesel feedstock. The goal was to determine if the energy produced was greater than the operational energy consumed without consideration to constructing the system as well as the emissions and economic value associated with the process.

Four scenarios were created:
1) high-lipid, dry extraction.
2) high-lipid, wet extraction.
3) low-lipid, dry extraction.
4) low-lipid, wet extraction.
In all cases, the system required more energy than it produced. In high lipid scenarios, the energy produced is close to the energy consumed, and a positive net energy balance may be achieved with minor improvements in technology or accounting for coproducts. In the low lipid scenarios, the energy balance is too negative to be considered feasible. Therefore the lipid content affects the decision to implement algae cultivation.

The dry extraction and the wet extraction both require some level of mechanical drying and this makes the two methods yield similar results in terms of the energy analysis. Therefore, the extraction method does not dramatically affect the decision for implementing algae-based oil production from an energetic standpoint. The economic value of the oil in both high lipid scenarios results in a net profit despite the negative net energy. Emission calculations resulted in avoiding a significant amount of CO2 for high lipid scenarios but not for the low lipid scenarios. The CO2 avoided does not account for non-lipid biomass, so this number is an underestimation of the final CO2 avoided from the end products.

While the term "CO2 avoided" has been used for this study, it should be noted that this CO2 would be emitted upon use as a fuel source. These emissions, however, are not “new” CO2 because it has already been emitted and is being captured and recycled. Currently, literature is very divisive on the lipid content present in algae and this study shows that lipid content has a tremendous affect on energy and emissions impacts. The type of algae that can grow in wastewater effluent also should be investigated as well as the conditions that promote high lipid accumulation. The dewatering phase must be improved as it is extremely energy intensive and dominates the operational energy balance.

In order to compete, wet extraction must have a much more significant effect on the drying phase and must avoid the use of the human toxicants methanol and chloroform. Additionally, while the construction phase was beyond the scope of this project it may be a critical aspect in determining the feasibility these systems. Future research in this field should focus on lipid production, optimizing the belt dryer or finding a different method of dewatering, and allocating the coproducts.

ContributorsBarr, William (Author) / Hottle, Troy (Author) / Arizona State University. School of Sustainable Engineering and the Built Environment (Contributor) / Arizona State University. Center for Earth Systems Engineering and Management (Contributor)

Created2012-05

Answer set programming and other computing paradigms

Description

Answer Set Programming (ASP) is one of the most prominent and successful knowledge representation paradigms. The success of ASP is due to its expressive non-monotonic modeling language and its efficient computational methods originating from building propositional satisfiability solvers. The wide adoption of ASP has motivated several extensions to its modeling…

Answer Set Programming (ASP) is one of the most prominent and successful knowledge representation paradigms. The success of ASP is due to its expressive non-monotonic modeling language and its efficient computational methods originating from building propositional satisfiability solvers. The wide adoption of ASP has motivated several extensions to its modeling language in order to enhance expressivity, such as incorporating aggregates and interfaces with ontologies. Also, in order to overcome the grounding bottleneck of computation in ASP, there are increasing interests in integrating ASP with other computing paradigms, such as Constraint Programming (CP) and Satisfiability Modulo Theories (SMT). Due to the non-monotonic nature of the ASP semantics, such enhancements turned out to be non-trivial and the existing extensions are not fully satisfactory. We observe that one main reason for the difficulties rooted in the propositional semantics of ASP, which is limited in handling first-order constructs (such as aggregates and ontologies) and functions (such as constraint variables in CP and SMT) in natural ways. This dissertation presents a unifying view on these extensions by viewing them as instances of formulas with generalized quantifiers and intensional functions. We extend the first-order stable model semantics by by Ferraris, Lee, and Lifschitz to allow generalized quantifiers, which cover aggregate, DL-atoms, constraints and SMT theory atoms as special cases. Using this unifying framework, we study and relate different extensions of ASP. We also present a tight integration of ASP with SMT, based on which we enhance action language C+ to handle reasoning about continuous changes. Our framework yields a systematic approach to study and extend non-monotonic languages.

ContributorsMeng, Yunsong (Author) / Lee, Joohyung (Thesis advisor) / Ahn, Gail-Joon (Committee member) / Baral, Chitta (Committee member) / Fainekos, Georgios (Committee member) / Lifschitz, Vladimir (Committee member) / Arizona State University (Publisher)

Created2013

Model-based development of multi-iRobot simulation and control

Description

This thesis introduces the Model-Based Development of Multi-iRobot Toolbox (MBDMIRT), a Simulink-based toolbox designed to provide the means to acquire and practice the Model-Based Development (MBD) skills necessary to design real-time embedded system. The toolbox was developed in the Cyber-Physical System Laboratory at Arizona State University. The MBDMIRT toolbox runs…

This thesis introduces the Model-Based Development of Multi-iRobot Toolbox (MBDMIRT), a Simulink-based toolbox designed to provide the means to acquire and practice the Model-Based Development (MBD) skills necessary to design real-time embedded system. The toolbox was developed in the Cyber-Physical System Laboratory at Arizona State University. The MBDMIRT toolbox runs under MATLAB/Simulink to simulate the movements of multiple iRobots and to control, after verification by simulation, multiple physical iRobots accordingly. It adopts the Simulink/Stateflow, which exemplifies an approach to MBD, to program the behaviors of the iRobots. The MBDMIRT toolbox reuses and augments the open-source MATLAB-Based Simulator for the iRobot Create from Cornell University to run the simulation. Regarding the mechanism of iRobot control, the MBDMIRT toolbox applies the MATLAB Toolbox for the iRobot Create (MTIC) from United States Naval Academy to command the physical iRobots. The MBDMIRT toolbox supports a timer in both the simulation and the control, which is based on the local clock of the PC running the toolbox. In addition to the build-in sensors of an iRobot, the toolbox can simulate four user-added sensors, which are overhead localization system (OLS), sonar sensors, a camera, and Light Detection And Ranging (LIDAR). While controlling a physical iRobot, the toolbox supports the StarGazer OLS manufactured by HAGISONIC, Inc.

ContributorsSu, Shih-Kai (Author) / Fainekos, Georgios E (Thesis advisor) / Sarjoughian, Hessam S. (Committee member) / Artemiadis, Panagiotis K (Committee member) / Arizona State University (Publisher)

Created2012

An intelligent co-reference resolver for Winograd schema sentences containing resolved semantic entities

Description

There has been a lot of research in the field of artificial intelligence about thinking machines. Alan Turing proposed a test to observe a machine's intelligent behaviour with respect to natural language conversation. The Winograd schema challenge is suggested as an alternative, to the Turing test. It needs inferencing capabilities,…

There has been a lot of research in the field of artificial intelligence about thinking machines. Alan Turing proposed a test to observe a machine's intelligent behaviour with respect to natural language conversation. The Winograd schema challenge is suggested as an alternative, to the Turing test. It needs inferencing capabilities, reasoning abilities and background knowledge to get the answer right. It involves a coreference resolution task in which a machine is given a sentence containing a situation which involves two entities, one pronoun and some more information about the situation and the machine has to come up with the right resolution of a pronoun to one of the entities. The complexity of the task is increased with the fact that the Winograd sentences are not constrained by one domain or specific sentence structure and it also contains a lot of human proper names. This modification makes the task of association of entities, to one particular word in the sentence, to derive the answer, difficult. I have developed a pronoun resolver system for the confined domain Winograd sentences. I have developed a classifier or filter which takes input sentences and decides to accept or reject them based on a particular criteria. Once the sentence is accepted. I run parsers on it to obtain the detailed analysis. Furthermore I have developed four answering modules which use world knowledge and inferencing mechanisms to try and resolve the pronoun. The four techniques I use are : ConceptNet knowledgebase, Search engine pattern counts,Narrative event chains and sentiment analysis. I have developed a particular aggregation mechanism for the answers from these modules to arrive at a final answer. I have used caching technique for the association relations that I obtain for different modules, so as to boost the performance. I run my system on the standard ‘nyu dataset’ of Winograd sentences and questions. This dataset is then restricted, by my classifier, to 90 sentences. I evaluate my system on this 90 sentence dataset. When I compare my results against the state of the art system on the same dataset, I get nearly 4.5 % improvement in the restricted domain.

ContributorsBudukh, Tejas Ulhas (Author) / Baral, Chitta (Thesis advisor) / VanLehn, Kurt (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created2013

Representing, reasoning and answering questions about biological pathways various applications

Description

Biological organisms are made up of cells containing numerous interconnected biochemical processes. Diseases occur when normal functionality of these processes is disrupted, manifesting as disease symptoms. Thus, understanding these biochemical processes and their interrelationships is a primary task in biomedical research and a prerequisite for activities including diagnosing diseases and…

Biological organisms are made up of cells containing numerous interconnected biochemical processes. Diseases occur when normal functionality of these processes is disrupted, manifesting as disease symptoms. Thus, understanding these biochemical processes and their interrelationships is a primary task in biomedical research and a prerequisite for activities including diagnosing diseases and drug development. Scientists studying these interconnected processes have identified various pathways involved in drug metabolism, diseases, and signal transduction, etc. High-throughput technologies, new algorithms and speed improvements over the last decade have resulted in deeper knowledge about biological systems, leading to more refined pathways. Such pathways tend to be large and complex, making it difficult for an individual to remember all aspects. Thus, computer models are needed to represent and analyze them. The refinement activity itself requires reasoning with a pathway model by posing queries against it and comparing the results against the real biological system. Many existing models focus on structural and/or factoid questions, relying on surface-level information. These are generally not the kind of questions that a biologist may ask someone to test their understanding of biological processes. Examples of questions requiring understanding of biological processes are available in introductory college level biology text books. Such questions serve as a model for the question answering system developed in this thesis. Thus, the main goal of this thesis is to develop a system that allows the encoding of knowledge about biological pathways to answer questions demonstrating understanding of the pathways. To that end, a language is developed to specify a pathway and pose questions against it. Some existing tools are modified and used to accomplish this goal. The utility of the framework developed in this thesis is illustrated with applications in the biological domain. Finally, the question answering system is used in real world applications by extracting pathway knowledge from text and answering questions related to drug development.

ContributorsAnwar, Saadat (Author) / Baral, Chitta (Thesis advisor) / Inoue, Katsumi (Committee member) / Chen, Yi (Committee member) / Davulcu, Hasan (Committee member) / Lee, Joohyung (Committee member) / Arizona State University (Publisher)

Created2014

Filtering by