Matching Items (17)
Filtering by

Clear all filters

Description
Generating real-world content for VR is challenging in terms of capturing and processing at high resolution and high frame-rates. The content needs to represent a truly immersive experience, where the user can look around in 360-degree view and perceive the depth of the scene. The existing solutions only capture and

Generating real-world content for VR is challenging in terms of capturing and processing at high resolution and high frame-rates. The content needs to represent a truly immersive experience, where the user can look around in 360-degree view and perceive the depth of the scene. The existing solutions only capture and offload the compute load to the server. But offloading large amounts of raw camera feeds takes longer latencies and poses difficulties for real-time applications. By capturing and computing on the edge, we can closely integrate the systems and optimize for low latency. However, moving the traditional stitching algorithms to battery constrained device needs at least three orders of magnitude reduction in power. We believe that close integration of capture and compute stages will lead to reduced overall system power.

We approach the problem by building a hardware prototype and characterize the end-to-end system bottlenecks of power and performance. The prototype has 6 IMX274 cameras and uses Nvidia Jetson TX2 development board for capture and computation. We found that capturing is bottlenecked by sensor power and data-rates across interfaces, whereas compute is limited by the total number of computations per frame. Our characterization shows that redundant capture and redundant computations lead to high power, huge memory footprint, and high latency. The existing systems lack hardware-software co-design aspects, leading to excessive data transfers across the interfaces and expensive computations within the individual subsystems. Finally, we propose mechanisms to optimize the system for low power and low latency. We emphasize the importance of co-design of different subsystems to reduce and reuse the data. For example, reusing the motion vectors of the ISP stage reduces the memory footprint of the stereo correspondence stage. Our estimates show that pipelining and parallelization on custom FPGA can achieve real time stitching.
ContributorsGunnam, Sridhar (Author) / LiKamWa, Robert (Thesis advisor) / Turaga, Pavan (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)
Created2018
156735-Thumbnail Image.png
Description
The popularity of social media has generated abundant large-scale social networks, which advances research on network analytics. Good representations of nodes in a network can facilitate many network mining tasks. The goal of network representation learning (network embedding) is to learn low-dimensional vector representations of social network nodes that capture

The popularity of social media has generated abundant large-scale social networks, which advances research on network analytics. Good representations of nodes in a network can facilitate many network mining tasks. The goal of network representation learning (network embedding) is to learn low-dimensional vector representations of social network nodes that capture certain properties of the networks. With the learned node representations, machine learning and data mining algorithms can be applied for network mining tasks such as link prediction and node classification. Because of its ability to learn good node representations, network representation learning is attracting increasing attention and various network embedding algorithms are proposed.

Despite the success of these network embedding methods, the majority of them are dedicated to static plain networks, i.e., networks with fixed nodes and links only; while in social media, networks can present in various formats, such as attributed networks, signed networks, dynamic networks and heterogeneous networks. These social networks contain abundant rich information to alleviate the network sparsity problem and can help learn a better network representation; while plain network embedding approaches cannot tackle such networks. For example, signed social networks can have both positive and negative links. Recent study on signed networks shows that negative links have added value in addition to positive links for many tasks such as link prediction and node classification. However, the existence of negative links challenges the principles used for plain network embedding. Thus, it is important to study signed network embedding. Furthermore, social networks can be dynamic, where new nodes and links can be introduced anytime. Dynamic networks can reveal the concept drift of a user and require efficiently updating the representation when new links or users are introduced. However, static network embedding algorithms cannot deal with dynamic networks. Therefore, it is important and challenging to propose novel algorithms for tackling different types of social networks.

In this dissertation, we investigate network representation learning in social media. In particular, we study representative social networks, which includes attributed network, signed networks, dynamic networks and document networks. We propose novel frameworks to tackle the challenges of these networks and learn representations that not only capture the network structure but also the unique properties of these social networks.
ContributorsWang, Suhang (Author) / Liu, Huan (Thesis advisor) / Aggarwal, Charu (Committee member) / Sen, Arunabha (Committee member) / Tong, Hanghang (Committee member) / Arizona State University (Publisher)
Created2018
154545-Thumbnail Image.png
Description
Many neurological disorders, especially those that result in dementia, impact speech and language production. A number of studies have shown that there exist subtle changes in linguistic complexity in these individuals that precede disease onset. However, these studies are conducted on controlled speech samples from a specific task. This thesis

Many neurological disorders, especially those that result in dementia, impact speech and language production. A number of studies have shown that there exist subtle changes in linguistic complexity in these individuals that precede disease onset. However, these studies are conducted on controlled speech samples from a specific task. This thesis explores the possibility of using natural language processing in order to detect declining linguistic complexity from more natural discourse. We use existing data from public figures suspected (or at risk) of suffering from cognitive-linguistic decline, downloaded from the Internet, to detect changes in linguistic complexity. In particular, we focus on two case studies. The first case study analyzes President Ronald Reagan’s transcribed spontaneous speech samples during his presidency. President Reagan was diagnosed with Alzheimer’s disease in 1994, however my results showed declining linguistic complexity during the span of the 8 years he was in office. President George Herbert Walker Bush, who has no known diagnosis of Alzheimer’s disease, shows no decline in the same measures. In the second case study, we analyze transcribed spontaneous speech samples from the news conferences of 10 current NFL players and 18 non-player personnel since 2007. The non-player personnel have never played professional football. Longitudinal analysis of linguistic complexity showed contrasting patterns in the two groups. The majority (6 of 10) of current players showed decline in at least one measure of linguistic complexity over time. In contrast, the majority (11 out of 18) of non-player personnel showed an increase in at least one linguistic complexity measure.
ContributorsWang, Shuai (Author) / Berisha, Visar (Thesis advisor) / LaCross, Amy (Committee member) / Tong, Hanghang (Committee member) / Arizona State University (Publisher)
Created2016
155085-Thumbnail Image.png
Description
High-level inference tasks in video applications such as recognition, video retrieval, and zero-shot classification have become an active research area in recent years. One fundamental requirement for such applications is to extract high-quality features that maintain high-level information in the videos.

Many video feature extraction algorithms have been purposed, such

High-level inference tasks in video applications such as recognition, video retrieval, and zero-shot classification have become an active research area in recent years. One fundamental requirement for such applications is to extract high-quality features that maintain high-level information in the videos.

Many video feature extraction algorithms have been purposed, such as STIP, HOG3D, and Dense Trajectories. These algorithms are often referred to as “handcrafted” features as they were deliberately designed based on some reasonable considerations. However, these algorithms may fail when dealing with high-level tasks or complex scene videos. Due to the success of using deep convolution neural networks (CNNs) to extract global representations for static images, researchers have been using similar techniques to tackle video contents. Typical techniques first extract spatial features by processing raw images using deep convolution architectures designed for static image classifications. Then simple average, concatenation or classifier-based fusion/pooling methods are applied to the extracted features. I argue that features extracted in such ways do not acquire enough representative information since videos, unlike images, should be characterized as a temporal sequence of semantically coherent visual contents and thus need to be represented in a manner considering both semantic and spatio-temporal information.

In this thesis, I propose a novel architecture to learn semantic spatio-temporal embedding for videos to support high-level video analysis. The proposed method encodes video spatial and temporal information separately by employing a deep architecture consisting of two channels of convolutional neural networks (capturing appearance and local motion) followed by their corresponding Fully Connected Gated Recurrent Unit (FC-GRU) encoders for capturing longer-term temporal structure of the CNN features. The resultant spatio-temporal representation (a vector) is used to learn a mapping via a Fully Connected Multilayer Perceptron (FC-MLP) to the word2vec semantic embedding space, leading to a semantic interpretation of the video vector that supports high-level analysis. I evaluate the usefulness and effectiveness of this new video representation by conducting experiments on action recognition, zero-shot video classification, and semantic video retrieval (word-to-video) retrieval, using the UCF101 action recognition dataset.
ContributorsHu, Sheng-Hung (Author) / Li, Baoxin (Thesis advisor) / Turaga, Pavan (Committee member) / Liang, Jianming (Committee member) / Tong, Hanghang (Committee member) / Arizona State University (Publisher)
Created2016
156246-Thumbnail Image.png
Description
Diffusion processes in networks can be used to model many real-world processes, such as the propagation of a rumor on social networks and cascading failures on power networks. Analysis of diffusion processes in networks can help us answer important questions such as the role and the importance of each node

Diffusion processes in networks can be used to model many real-world processes, such as the propagation of a rumor on social networks and cascading failures on power networks. Analysis of diffusion processes in networks can help us answer important questions such as the role and the importance of each node in the network for spreading the diffusion and how to top or contain a cascading failure in the network. This dissertation consists of three parts.

In the first part, we study the problem of locating multiple diffusion sources in networks under the Susceptible-Infected-Recovered (SIR) model. Given a complete snapshot of the network, we developed a sample-path-based algorithm, named clustering and localization, and proved that for regular trees, the estimators produced by the proposed algorithm are within a constant distance from the real sources with a high probability. Then, we considered the case in which only a partial snapshot is observed and proposed a new algorithm, named Optimal-Jordan-Cover (OJC). The algorithm first extracts a subgraph using a candidate selection algorithm that selects source candidates based on the number of observed infected nodes in their neighborhoods. Then, in the extracted subgraph, OJC finds a set of nodes that "cover" all observed infected nodes with the minimum radius. The set of nodes is called the Jordan cover, and is regarded as the set of diffusion sources. We proved that OJC can locate all sources with probability one asymptotically with partial observations in the Erdos-Renyi (ER) random graph. Multiple experiments on different networks were done, which show our algorithms outperform others.

In the second part, we tackle the problem of reconstructing the diffusion history from partial observations. We formulated the diffusion history reconstruction problem as a maximum a posteriori (MAP) problem and proved the problem is NP hard. Then we proposed a step-by- step reconstruction algorithm, which can always produce a diffusion history that is consistent with the partial observations. Our experimental results based on synthetic and real networks show that the algorithm significantly outperforms some existing methods.

In the third part, we consider the problem of improving the robustness of an interdependent network by rewiring a small number of links during a cascading attack. We formulated the problem as a Markov decision process (MDP) problem. While the problem is NP-hard, we developed an effective and efficient algorithm, RealWire, to robustify the network and to mitigate the damage during the attack. Extensive experimental results show that our algorithm outperforms other algorithms on most of the robustness metrics.
ContributorsChen, Zhen (Author) / Ying, Lei (Thesis advisor) / Tong, Hanghang (Thesis advisor) / Zhang, Junshan (Committee member) / He, Jingrui (Committee member) / Arizona State University (Publisher)
Created2018
158419-Thumbnail Image.png
Description
Object detection is an interesting computer vision area that is concerned with the detection of object instances belonging to specific classes of interest as well as the localization of these instances in images and/or videos. Object detection serves as a vital module in many computer vision based applications. This work

Object detection is an interesting computer vision area that is concerned with the detection of object instances belonging to specific classes of interest as well as the localization of these instances in images and/or videos. Object detection serves as a vital module in many computer vision based applications. This work focuses on the development of object detection methods that exhibit increased robustness to varying illuminations and image quality. In this work, two methods for robust object detection are presented.

In the context of varying illumination, this work focuses on robust generic obstacle detection and collision warning in Advanced Driver Assistance Systems (ADAS) under varying illumination conditions. The highlight of the first method is the ability to detect all obstacles without prior knowledge and detect partially occluded obstacles including the obstacles that have not completely appeared in the frame (truncated obstacles). It is first shown that the angular distortion in the Inverse Perspective Mapping (IPM) domain belonging to obstacle edges varies as a function of their corresponding 2D location in the camera plane. This information is used to generate object proposals. A novel proposal assessment method based on fusing statistical properties from both the IPM image and the camera image to perform robust outlier elimination and false positive reduction is also proposed.

In the context of image quality, this work focuses on robust multiple-class object detection using deep neural networks for images with varying quality. The use of Generative Adversarial Networks (GANs) is proposed in a novel generative framework to generate features that provide robustness for object detection on reduced quality images. The proposed GAN-based Detection of Objects (GAN-DO) framework is not restricted to any particular architecture and can be generalized to several deep neural network (DNN) based architectures. The resulting deep neural network maintains the exact architecture as the selected baseline model without adding to the model parameter complexity or inference speed. Performance results provided using GAN-DO on object detection datasets establish an improved robustness to varying image quality and a higher object detection and classification accuracy compared to the existing approaches.
ContributorsPrakash, Charan Dudda (Author) / Karam, Lina (Thesis advisor) / Abousleman, Glen (Committee member) / Jayasuriya, Suren (Committee member) / Yu, Hongbin (Committee member) / Arizona State University (Publisher)
Created2020
158584-Thumbnail Image.png
Description
The following document describes the hardware implementation and analysis of Temporal Interference Mitigation using High-Level Synthesis. As the problem of spectral congestion becomes more chronic and widespread, Electromagnetic radio frequency (RF) based systems are posing as viable solution to this problem. Among the existing RF methods Cooperation based systems have

The following document describes the hardware implementation and analysis of Temporal Interference Mitigation using High-Level Synthesis. As the problem of spectral congestion becomes more chronic and widespread, Electromagnetic radio frequency (RF) based systems are posing as viable solution to this problem. Among the existing RF methods Cooperation based systems have been a solution to a host of congestion problems. One of the most important elements of RF receiver is the spatially adaptive part of the receiver. Temporal Mitigation is vital technique employed at the receiver for signal recovery and future propagation along the radar chain.

The computationally intensive parts of temporal mitigation are identified and hardware accelerated. The hardware implementation is based on sequential approach with optimizations applied on the individual components for better performance.

An extensive analysis using a range of fixed point data types is performed to find the optimal data type necessary.

Finally a hybrid combination of data types for different components of temporal mitigation is proposed based on results from the above analysis.
ContributorsSiddiqui, Saquib Ahmad (Author) / Bliss, Daniel (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Ogras, Umit Y. (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)
Created2020
157866-Thumbnail Image.png
Description
This thesis addresses the problem of recommending a viewpoint for aesthetic photography. Viewpoint recommendation is suggesting the best camera pose to capture a visually pleasing photograph of the subject of interest by using any end-user device such as drone, mobile robot or smartphone. Solving this problem enables to capture visually

This thesis addresses the problem of recommending a viewpoint for aesthetic photography. Viewpoint recommendation is suggesting the best camera pose to capture a visually pleasing photograph of the subject of interest by using any end-user device such as drone, mobile robot or smartphone. Solving this problem enables to capture visually pleasing photographs autonomously in areal photography, wildlife photography, landscape photography or in personal photography.

The viewpoint recommendation problem can be divided into two stages: (a) generating a set of dense novel views based on the basis views captured about the subject. The dense novel views are useful to better understand the scene and to know how the subject looks from different viewpoints and (b) each novel is scored based on how aesthetically good it is. The viewpoint with the greatest aesthetic score is recommended for capturing a visually pleasing photograph.
ContributorsKatukuri, Sathish Kumar (Author) / LiKamWa, Robert (Thesis advisor) / Turaga, Pavan (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)
Created2019
158896-Thumbnail Image.png
Description
Cameras have become commonplace with wide-ranging applications of phone photography, computer vision, and medical imaging. With a growing need to reduce size and costs while maintaining image quality, the need to look past traditional style of cameras is becoming more apparent. Several non-traditional cameras have shown to be promising options

Cameras have become commonplace with wide-ranging applications of phone photography, computer vision, and medical imaging. With a growing need to reduce size and costs while maintaining image quality, the need to look past traditional style of cameras is becoming more apparent. Several non-traditional cameras have shown to be promising options for size-constraint applications, and while they may offer several advantages, they also usually are limited by image quality degradation due to optical or a need to reconstruct a captured image. In this thesis, we take a look at three of these non-traditional cameras: a pinhole camera, a diffusion-mask lensless camera, and an under-display camera (UDC).

For each of these cases, I present a feasible image restoration pipeline to correct for their particular limitations. For the pinhole camera, I present an early pipeline to allow for practical pinhole photography by reducing noise levels caused by low-light imaging, enhancing exposure levels, and sharpening the blur caused by the pinhole. For lensless cameras, we explore a neural network architecture that performs joint image reconstruction and point spread function (PSF) estimation to robustly recover images captured with multiple PSFs from different cameras. Using adversarial learning, this approach achieves improved reconstruction results that do not require explicit knowledge of the PSF at test-time and shows an added improvement in the reconstruction model’s ability to generalize to variations in the camera’s PSF. This allows lensless cameras to be utilized in a wider range of applications that require multiple cameras without the need to explicitly train a separate model for each new camera. For UDCs, we utilize a multi-stage approach to correct for low light transmission, blur, and haze. This pipeline uses a PyNET deep neural network architecture to perform a majority of the restoration, while additionally using a traditional optimization approach which is then fused in a learned manner in the second stage to improve high-frequency features. I show results from this novel fusion approach that is on-par with the state of the art.
ContributorsRego, Joshua D (Author) / Jayasuriya, Suren (Thesis advisor) / Blain Christen, Jennifer (Thesis advisor) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created2020
161561-Thumbnail Image.png
Description
A distributed wireless sensor network (WSN) is a network of a large number of lowcost,multi-functional sensors with power, bandwidth, and memory constraints, operating in remote environments with sensing and communication capabilities. WSNs are a source for a large amount of data and due to the inherent communication and resource constraints, developing a distributed

A distributed wireless sensor network (WSN) is a network of a large number of lowcost,multi-functional sensors with power, bandwidth, and memory constraints, operating in remote environments with sensing and communication capabilities. WSNs are a source for a large amount of data and due to the inherent communication and resource constraints, developing a distributed algorithms to perform statistical parameter estimation and data analysis is necessary. In this work, consensus based distributed algorithms are developed for distributed estimation and processing over WSNs. Firstly, a distributed spectral clustering algorithm to group the sensors based on the location attributes is developed. Next, a distributed max consensus algorithm robust to additive noise in the network is designed. Furthermore, distributed spectral radius estimation algorithms for analog, as well as, digital communication models are developed. The proposed algorithms work for any connected graph topologies. Theoretical bounds are derived and simulation results supporting the theory are also presented.
ContributorsMuniraju, Gowtham (Author) / Tepedelenlioğlu, Cihan (Thesis advisor) / Spanias, Andreas (Thesis advisor) / Berisha, Visar (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)
Created2021