Matching Items (39)
Filtering by

Clear all filters

153022-Thumbnail Image.png
Description
In the sport of competitive water skiing, the skill of a human boat driver can affect athletic performance. Driver influence is not necessarily inhibitive to skiers, however, it reduces the fairness and credibility of the sport overall. In response to the stated problem, this thesis proposes a vision-based real-time control

In the sport of competitive water skiing, the skill of a human boat driver can affect athletic performance. Driver influence is not necessarily inhibitive to skiers, however, it reduces the fairness and credibility of the sport overall. In response to the stated problem, this thesis proposes a vision-based real-time control system designed specifically for tournament waterski boats. The challenges addressed in this thesis include: one, the segmentation of floating objects in frame sequences captured by a moving camera, two, the identification of segmented objects which fit a predefined model, and three, the accurate and fast estimation of camera position and orientation from coplanar point correspondences. This thesis discusses current ideas and proposes new methods for the three challenges mentioned. In the end, a working prototype is produced.
ContributorsWalker, Collin (Author) / Li, Baoxin (Thesis advisor) / Turaga, Pavan (Committee member) / Claveau, David (Committee member) / Arizona State University (Publisher)
Created2014
155900-Thumbnail Image.png
Description
Compressive sensing theory allows to sense and reconstruct signals/images with lower sampling rate than Nyquist rate. Applications in resource constrained environment stand to benefit from this theory, opening up many possibilities for new applications at the same time. The traditional inference pipeline for computer vision sequence reconstructing the image from

Compressive sensing theory allows to sense and reconstruct signals/images with lower sampling rate than Nyquist rate. Applications in resource constrained environment stand to benefit from this theory, opening up many possibilities for new applications at the same time. The traditional inference pipeline for computer vision sequence reconstructing the image from compressive measurements. However,the reconstruction process is a computationally expensive step that also provides poor results at high compression rate. There have been several successful attempts to perform inference tasks directly on compressive measurements such as activity recognition. In this thesis, I am interested to tackle a more challenging vision problem - Visual question answering (VQA) without reconstructing the compressive images. I investigate the feasibility of this problem with a series of experiments, and I evaluate proposed methods on a VQA dataset and discuss promising results and direction for future work.
ContributorsHuang, Li-Chin (Author) / Turaga, Pavan (Thesis advisor) / Yang, Yezhou (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created2017
156084-Thumbnail Image.png
Description
The performance of most of the visual computing tasks depends on the quality of the features extracted from the raw data. Insightful feature representation increases the performance of many learning algorithms by exposing the underlying explanatory factors of the output for the unobserved input. A good representation should also handle

The performance of most of the visual computing tasks depends on the quality of the features extracted from the raw data. Insightful feature representation increases the performance of many learning algorithms by exposing the underlying explanatory factors of the output for the unobserved input. A good representation should also handle anomalies in the data such as missing samples and noisy input caused by the undesired, external factors of variation. It should also reduce the data redundancy. Over the years, many feature extraction processes have been invented to produce good representations of raw images and videos.

The feature extraction processes can be categorized into three groups. The first group contains processes that are hand-crafted for a specific task. Hand-engineering features requires the knowledge of domain experts and manual labor. However, the feature extraction process is interpretable and explainable. Next group contains the latent-feature extraction processes. While the original feature lies in a high-dimensional space, the relevant factors for a task often lie on a lower dimensional manifold. The latent-feature extraction employs hidden variables to expose the underlying data properties that cannot be directly measured from the input. Latent features seek a specific structure such as sparsity or low-rank into the derived representation through sophisticated optimization techniques. The last category is that of deep features. These are obtained by passing raw input data with minimal pre-processing through a deep network. Its parameters are computed by iteratively minimizing a task-based loss.

In this dissertation, I present four pieces of work where I create and learn suitable data representations. The first task employs hand-crafted features to perform clinically-relevant retrieval of diabetic retinopathy images. The second task uses latent features to perform content-adaptive image enhancement. The third task ranks a pair of images based on their aestheticism. The goal of the last task is to capture localized image artifacts in small datasets with patch-level labels. For both these tasks, I propose novel deep architectures and show significant improvement over the previous state-of-art approaches. A suitable combination of feature representations augmented with an appropriate learning approach can increase performance for most visual computing tasks.
ContributorsChandakkar, Parag Shridhar (Author) / Li, Baoxin (Thesis advisor) / Yang, Yezhou (Committee member) / Turaga, Pavan (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)
Created2017
155352-Thumbnail Image.png
Description
Computational thinking, the fundamental way of thinking in computer science, including information sourcing and problem solving behind programming, is considered vital to children who live in a digital era. Most of current educational games designed to teach children about coding either rely on external curricular materials or are too complicated

Computational thinking, the fundamental way of thinking in computer science, including information sourcing and problem solving behind programming, is considered vital to children who live in a digital era. Most of current educational games designed to teach children about coding either rely on external curricular materials or are too complicated to work well with young children. In this thesis project, Guardy, an iOS tower defense game, was developed to help children over 8 years old learn about and practice using basic concepts in programming. The game is built with the SpriteKit, a graphics rendering and animation infrastructure in Apple’s integrated development environment Xcode. It simplifies switching among different game scenes and animating game sprites in the development. In a typical game, a sequence of operations is arranged by players to destroy incoming enemy minions. Basic coding concepts like looping, sequencing, conditionals, and classification are integrated in different levels. In later levels, players are required to type in commands and put them in an order to keep playing the game. To reduce the difficulty of the usability testing, a method combining questionnaires and observation was conducted with two groups of college students who either have no programming experience or are familiar with coding. The results show that Guardy has the potential to help children learn programming and practice computational thinking.
ContributorsWang, Xiaoxiao (Author) / Nelson, Brian C. (Thesis advisor) / Turaga, Pavan (Committee member) / Walker, Erin (Committee member) / Arizona State University (Publisher)
Created2017
155963-Thumbnail Image.png
Description
Computer Vision as a eld has gone through signicant changes in the last decade.

The eld has seen tremendous success in designing learning systems with hand-crafted

features and in using representation learning to extract better features. In this dissertation

some novel approaches to representation learning and task learning are studied.

Multiple-instance learning which is

Computer Vision as a eld has gone through signicant changes in the last decade.

The eld has seen tremendous success in designing learning systems with hand-crafted

features and in using representation learning to extract better features. In this dissertation

some novel approaches to representation learning and task learning are studied.

Multiple-instance learning which is generalization of supervised learning, is one

example of task learning that is discussed. In particular, a novel non-parametric k-

NN-based multiple-instance learning is proposed, which is shown to outperform other

existing approaches. This solution is applied to a diabetic retinopathy pathology

detection problem eectively.

In cases of representation learning, generality of neural features are investigated

rst. This investigation leads to some critical understanding and results in feature

generality among datasets. The possibility of learning from a mentor network instead

of from labels is then investigated. Distillation of dark knowledge is used to eciently

mentor a small network from a pre-trained large mentor network. These studies help

in understanding representation learning with smaller and compressed networks.
ContributorsVenkatesan, Ragav (Author) / Li, Baoxin (Thesis advisor) / Turaga, Pavan (Committee member) / Yang, Yezhou (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)
Created2017
157840-Thumbnail Image.png
Description
Over the last decade, deep neural networks also known as deep learning, combined with large databases and specialized hardware for computation, have made major strides in important areas such as computer vision, computational imaging and natural language processing. However, such frameworks currently suffer from some drawbacks. For example, it is

Over the last decade, deep neural networks also known as deep learning, combined with large databases and specialized hardware for computation, have made major strides in important areas such as computer vision, computational imaging and natural language processing. However, such frameworks currently suffer from some drawbacks. For example, it is generally not clear how the architectures are to be designed for different applications, or how the neural networks behave under different input perturbations and it is not easy to make the internal representations and parameters more interpretable. In this dissertation, I propose building constraints into feature maps, parameters and and design of algorithms involving neural networks for applications in low-level vision problems such as compressive imaging and multi-spectral image fusion, and high-level inference problems including activity and face recognition. Depending on the application, such constraints can be used to design architectures which are invariant/robust to certain nuisance factors, more efficient and, in some cases, more interpretable. Through extensive experiments on real-world datasets, I demonstrate these advantages of the proposed methods over conventional frameworks.
ContributorsLohit, Suhas Anand (Author) / Turaga, Pavan (Thesis advisor) / Spanias, Andreas (Committee member) / Li, Baoxin (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)
Created2019
158817-Thumbnail Image.png
Description
Over the past decade, machine learning research has made great strides and significant impact in several fields. Its success is greatly attributed to the development of effective machine learning algorithms like deep neural networks (a.k.a. deep learning), availability of large-scale databases and access to specialized hardware like Graphic Processing Units.

Over the past decade, machine learning research has made great strides and significant impact in several fields. Its success is greatly attributed to the development of effective machine learning algorithms like deep neural networks (a.k.a. deep learning), availability of large-scale databases and access to specialized hardware like Graphic Processing Units. When designing and training machine learning systems, researchers often assume access to large quantities of data that capture different possible variations. Variations in the data is needed to incorporate desired invariance and robustness properties in the machine learning system, especially in the case of deep learning algorithms. However, it is very difficult to gather such data in a real-world setting. For example, in certain medical/healthcare applications, it is very challenging to have access to data from all possible scenarios or with the necessary amount of variations as required to train the system. Additionally, the over-parameterized and unconstrained nature of deep neural networks can cause them to be poorly trained and in many cases over-confident which, in turn, can hamper their reliability and generalizability. This dissertation is a compendium of my research efforts to address the above challenges. I propose building invariant feature representations by wedding concepts from topological data analysis and Riemannian geometry, that automatically incorporate the desired invariance properties for different computer vision applications. I discuss how deep learning can be used to address some of the common challenges faced when working with topological data analysis methods. I describe alternative learning strategies based on unsupervised learning and transfer learning to address issues like dataset shifts and limited training data. Finally, I discuss my preliminary work on applying simple orthogonal constraints on deep learning feature representations to help develop more reliable and better calibrated models.
ContributorsSom, Anirudh (Author) / Turaga, Pavan (Thesis advisor) / Krishnamurthi, Narayanan (Committee member) / Spanias, Andreas (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created2020
158890-Thumbnail Image.png
Description
Open Design is a crowd-driven global ecosystem which tries to challenge and alter contemporary modes of capitalistic hardware production. It strives to build on the collective skills, expertise and efforts of people regardless of their educational, social or political backgrounds to develop and disseminate physical products, machines and systems. In

Open Design is a crowd-driven global ecosystem which tries to challenge and alter contemporary modes of capitalistic hardware production. It strives to build on the collective skills, expertise and efforts of people regardless of their educational, social or political backgrounds to develop and disseminate physical products, machines and systems. In contrast to capitalistic hardware production, Open Design practitioners publicly share design files, blueprints and knowhow through various channels including internet platforms and in-person workshops. These designs are typically replicated, modified, improved and reshared by individuals and groups who are broadly referred to as ‘makers’.

This dissertation aims to expand the current scope of Open Design within human-computer interaction (HCI) research through a long-term exploration of Open Design’s socio-technical processes. I examine Open Design from three perspectives: the functional—materials, tools, and platforms that enable crowd-driven open hardware production, the critical—materially-oriented engagements within open design as a site for sociotechnical discourse, and the speculative—crowd-driven critical envisioning of future hardware.

More specifically, this dissertation first explores the growing global scene of Open Design through a long-term ethnographic study of the open science hardware (OScH) movement, a genre of Open Design. This long-term study of OScH provides a focal point for HCI to deeply understand Open Design's growing global landscape. Second, it examines the application of Critical Making within Open Design through an OScH workshop with designers, engineers, artists and makers from local communities. This work foregrounds the role of HCI researchers as facilitators of collaborative critical engagements within Open Design. Third, this dissertation introduces the concept of crowd-driven Design Fiction through the development of a publicly accessible online Design Fiction platform named Dream Drones. Through a six month long development and a study with drone related practitioners, it offers several pragmatic insights into the challenges and opportunities for crowd-driven Design Fiction. Through these explorations, I highlight the broader implications and novel research pathways for HCI to shape and be shaped by the global Open Design movement.
ContributorsFernando, Kattak Kuttige Rex Piyum (Author) / Kuznetsov, Anastasia (Thesis advisor) / Turaga, Pavan (Committee member) / Middel, Ariane (Committee member) / Takamura, John (Committee member) / Arizona State University (Publisher)
Created2020
158896-Thumbnail Image.png
Description
Cameras have become commonplace with wide-ranging applications of phone photography, computer vision, and medical imaging. With a growing need to reduce size and costs while maintaining image quality, the need to look past traditional style of cameras is becoming more apparent. Several non-traditional cameras have shown to be promising options

Cameras have become commonplace with wide-ranging applications of phone photography, computer vision, and medical imaging. With a growing need to reduce size and costs while maintaining image quality, the need to look past traditional style of cameras is becoming more apparent. Several non-traditional cameras have shown to be promising options for size-constraint applications, and while they may offer several advantages, they also usually are limited by image quality degradation due to optical or a need to reconstruct a captured image. In this thesis, we take a look at three of these non-traditional cameras: a pinhole camera, a diffusion-mask lensless camera, and an under-display camera (UDC).

For each of these cases, I present a feasible image restoration pipeline to correct for their particular limitations. For the pinhole camera, I present an early pipeline to allow for practical pinhole photography by reducing noise levels caused by low-light imaging, enhancing exposure levels, and sharpening the blur caused by the pinhole. For lensless cameras, we explore a neural network architecture that performs joint image reconstruction and point spread function (PSF) estimation to robustly recover images captured with multiple PSFs from different cameras. Using adversarial learning, this approach achieves improved reconstruction results that do not require explicit knowledge of the PSF at test-time and shows an added improvement in the reconstruction model’s ability to generalize to variations in the camera’s PSF. This allows lensless cameras to be utilized in a wider range of applications that require multiple cameras without the need to explicitly train a separate model for each new camera. For UDCs, we utilize a multi-stage approach to correct for low light transmission, blur, and haze. This pipeline uses a PyNET deep neural network architecture to perform a majority of the restoration, while additionally using a traditional optimization approach which is then fused in a learned manner in the second stage to improve high-frequency features. I show results from this novel fusion approach that is on-par with the state of the art.
ContributorsRego, Joshua D (Author) / Jayasuriya, Suren (Thesis advisor) / Blain Christen, Jennifer (Thesis advisor) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created2020