Matching Items (50)
Description
Cardiovascular disease (CVD) is the leading cause of mortality yet is largely preventable; the key to prevention is identifying at-risk individuals before adverse events occur. For predicting individual CVD risk, carotid intima-media thickness (CIMT), a noninvasive ultrasound method, has proven to be valuable, offering several advantages over the CT coronary artery calcium score. However, each CIMT examination includes several ultrasound videos, and interpreting each of these CIMT videos involves three operations: (1) select three end-diastolic ultrasound frames (EUF) in the video, (2) localize a region of interest (ROI) in each selected frame, and (3) trace the lumen-intima interface and the media-adventitia interface in each ROI to measure CIMT. These operations are tedious, laborious, and time-consuming, a serious limitation that hinders the widespread utilization of CIMT in clinical practice. To overcome this limitation, this paper presents a new system to automate CIMT video interpretation. Our extensive experiments demonstrate that the suggested system significantly outperforms state-of-the-art methods. The superior performance is attributable to our unified framework based on convolutional neural networks (CNNs) coupled with our informative image representation and effective post-processing of the CNN outputs, which are uniquely designed for each of the above three operations.
Contributors: Shin, Jaeyul (Author) / Liang, Jianming (Thesis advisor) / Maciejewski, Ross (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
Computational visual aesthetics has recently become an active research area. Existing state-of-the-art methods formulate this as a binary classification task in which a given image is predicted to be beautiful or not. In many applications, such as image retrieval and enhancement, it is more important to rank images by their aesthetic quality than to sort them into two categories. Furthermore, in such applications, all images may belong to the same category. Hence determining the aesthetic ranking of the images is more appropriate. To this end, a novel problem of ranking images with respect to their aesthetic quality is formulated in this work. A new dataset of image pairs with relative labels is constructed by carefully selecting images from the popular AVA dataset. Unlike in aesthetics classification, there is no single threshold that determines the ranking order of the images across the entire dataset.

This problem is addressed with a deep neural network approach that is trained on image pairs by incorporating principles from relative learning. Results show that such a relative training procedure allows the network to rank the images with higher accuracy than a state-of-the-art network trained on the same set of images using binary labels. Further analysis of the results shows that a model trained on image pairs learns better aesthetic features than one trained on the same number of individual binary-labelled images.

Additionally, an attempt is made to enhance the performance of the system by incorporating saliency-related information. Given an image, humans might fixate on particular parts of the image by which they are subconsciously intrigued. I therefore tried to utilize the saliency information both stand-alone and in combination with the global and local aesthetic features by performing two separate sets of experiments. In both cases, a standard saliency model is chosen and the generated saliency maps are convolved with the images before they are passed to the network, thus giving higher importance to the salient regions than to the rest of the image. The saliency images generated in this way are used either independently or along with the global and local features to train the network. Empirical results show that the saliency-related aesthetic features might already be learnt by the network as a subset of the global features from automatic feature extraction, thus showing the additional saliency module to be redundant.
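The pairwise training idea in this abstract can be illustrated with a margin ranking loss, which is one common way to learn from relative labels. This is only a sketch: the abstract does not say which loss the thesis uses, and the function names, the margin value, and the scores below are all hypothetical.

```python
def pairwise_ranking_loss(score_preferred, score_other, margin=1.0):
    """Hinge-style loss for one image pair (assumed form, not from the thesis).

    The loss is zero once the preferred image outscores the other image
    by at least `margin`, and grows linearly otherwise.
    """
    return max(0.0, margin - (score_preferred - score_other))


def ranking_accuracy(score_pairs):
    """Fraction of (preferred, other) score pairs that are ordered correctly."""
    correct = sum(1 for preferred, other in score_pairs if preferred > other)
    return correct / len(score_pairs)


# Hypothetical network scores for three labelled pairs (preferred image first).
score_pairs = [(0.9, 0.2), (0.4, 0.6), (0.7, 0.1)]
losses = [pairwise_ranking_loss(a, b, margin=0.5) for a, b in score_pairs]
print(losses)
print(ranking_accuracy(score_pairs))
```

Only the second pair is ordered incorrectly, so it is the only one that contributes a non-zero loss. Note that no global threshold appears anywhere in the computation, which matches the observation that a ranking task has no single decision threshold.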
Contributors: Gattupalli, Jaya Vijetha (Author) / Li, Baoxin (Thesis advisor) / Davulcu, Hasan (Committee member) / Liang, Jianming (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
High-level inference tasks in video applications such as recognition, video retrieval, and zero-shot classification have become an active research area in recent years. One fundamental requirement for such applications is to extract high-quality features that maintain high-level information in the videos.

Many video feature extraction algorithms have been proposed, such as STIP, HOG3D, and Dense Trajectories. These algorithms are often referred to as “handcrafted” features as they were deliberately designed based on some reasonable considerations. However, these algorithms may fail when dealing with high-level tasks or complex scene videos. Due to the success of using deep convolutional neural networks (CNNs) to extract global representations for static images, researchers have been using similar techniques to tackle video content. Typical techniques first extract spatial features by processing raw images using deep convolutional architectures designed for static image classification. Then simple average, concatenation, or classifier-based fusion/pooling methods are applied to the extracted features. I argue that features extracted in such ways do not acquire enough representative information since videos, unlike images, should be characterized as a temporal sequence of semantically coherent visual content and thus need to be represented in a manner that considers both semantic and spatio-temporal information.

In this thesis, I propose a novel architecture to learn a semantic spatio-temporal embedding for videos to support high-level video analysis. The proposed method encodes video spatial and temporal information separately by employing a deep architecture consisting of two channels of convolutional neural networks (capturing appearance and local motion) followed by their corresponding Fully Connected Gated Recurrent Unit (FC-GRU) encoders for capturing the longer-term temporal structure of the CNN features. The resultant spatio-temporal representation (a vector) is used to learn a mapping via a Fully Connected Multilayer Perceptron (FC-MLP) to the word2vec semantic embedding space, leading to a semantic interpretation of the video vector that supports high-level analysis. I evaluate the usefulness and effectiveness of this new video representation by conducting experiments on action recognition, zero-shot video classification, and semantic (word-to-video) video retrieval, using the UCF101 action recognition dataset.
Contributors: Hu, Sheng-Hung (Author) / Li, Baoxin (Thesis advisor) / Turaga, Pavan (Committee member) / Liang, Jianming (Committee member) / Tong, Hanghang (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
In this thesis, I propose a new technique for aligning the words of an English sentence with its semantic representation using Inductive Logic Programming (ILP). My work focuses on Abstract Meaning Representation (AMR). AMR is a semantic formalism for English natural language. It encodes the meaning of a sentence in a rooted graph. This representation has gained attention for its simplicity and expressive power. An AMR aligner aligns words in a sentence to nodes (concepts) in its AMR graph. As AMR annotation has no explicit alignment with the words of the English sentence, automatic alignment becomes a requirement for training AMR parsers. The aligner in this work comprises two components. First, rules that invoke AMR concepts are learnt using ILP from sentence-AMR graph pairs in the training data. Second, the learnt rules are then used to align English sentences with AMR graphs. The technique is evaluated on a publicly available test dataset, and the results are comparable with those of state-of-the-art aligners.
Contributors: Agarwal, Shubham (Author) / Baral, Chitta (Thesis advisor) / Li, Baoxin (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created: 2017
Description
With the rise of Online Social Networks (OSN) in the last decade, social network analysis has become a crucial research topic. OSN graphs have unique properties that distinguish them from other types of graphs. In this thesis, a five-month tweet corpus collected from Bangladesh between June 2016 and October 2016 is analyzed in order to detect accounts that belong to groups. These groups consist of official and non-official Twitter handles of political organizations and NGOs in Bangladesh. A set of network, temporal, spatial, and behavioral features is proposed to discriminate between accounts belonging to individual Twitter users, news, groups, and organization leaders. Finally, the experimental results are presented and a subset of relevant features is identified that leads to a generalizable model. Detection of a tiny number of groups in a large network is achieved with 0.8 precision, 0.75 recall, and 0.77 F1 score. The domain-independent network and behavioral features and models developed here are suitable for solving the Twitter account classification problem in any context.
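The reported F1 score is the harmonic mean of the reported precision and recall, which can be checked with a few lines of Python. The function name `f1_score` is illustrative; only the three numbers come from the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
print(round(f1_score(0.8, 0.75), 2))  # prints 0.77
```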
Contributors: Gore, Chinmay Chandrashekhar (Author) / Davulcu, Hasan (Thesis advisor) / Hsiao, Ihan (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created: 2017
Description
Compressive sensing theory allows signals and images to be sensed and reconstructed at a sampling rate lower than the Nyquist rate. Applications in resource-constrained environments stand to benefit from this theory, which at the same time opens up many possibilities for new applications. The traditional inference pipeline for computer vision begins by reconstructing the image from compressive measurements. However, the reconstruction process is a computationally expensive step that also provides poor results at high compression rates. There have been several successful attempts to perform inference tasks, such as activity recognition, directly on compressive measurements. In this thesis, I am interested in tackling a more challenging vision problem, visual question answering (VQA), without reconstructing the compressive images. I investigate the feasibility of this problem with a series of experiments, evaluate the proposed methods on a VQA dataset, and discuss promising results and directions for future work.
Contributors: Huang, Li-Chin (Author) / Turaga, Pavan (Thesis advisor) / Yang, Yezhou (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created: 2017
Description
Bangladesh is a secular democracy in which Muslims make up almost 90% of the population and the remaining 10% consists of minority groups that include Hindus, Christians, Buddhists, Ahmadi Muslims, Shia, Sufi, LGBT groups, and Atheists. In recent years, Bangladesh has experienced an increase in attacks by religious extremist groups, such as IS and AQIS affiliates, as well as by hate groups, and in politically motivated violence. Attacks have also become indiscriminate, with assailants targeting a wide variety of individuals, including religious minorities and foreigners. According to the telecoms regulator, the number of internet users in Bangladesh now stands at over 66.8 million, reaching 41% penetration. Of them, 63 million access the internet through mobile phones. Facebook, with usage of about 97.2%, is the most used social network in Bangladesh.

In this research, local academics with cultural expertise collaborated to locate and download content from 292 Facebook groups organized under three (3) major umbrella types: Religious Terrorist Violence, Political Intolerance and Issue, and Target-based Intolerance, covering the period from June 2016 to December 2016. The dates of real extremist attacks were aligned with the corresponding Facebook message streams, and posts and comments related to the targets and perpetrators of the attacks were identified. The context of the attacks, their effects, and the nature and structure of the underlying extremist and counter-violent-extremist networks were then used to study the narratives and trends over time.
Contributors: Chhabra, Pankaj (Author) / Davulcu, Hasan (Thesis advisor) / Li, Baoxin (Committee member) / Hsiao, Ihan (Committee member) / Arizona State University (Publisher)
Created: 2017
Description
Light field imaging is limited by the computational processing demands of high sampling in both the spatial and angular dimensions. Single-shot light field cameras sacrifice spatial resolution to sample angular viewpoints, typically by multiplexing incoming rays onto a 2D sensor array. While this resolution can be recovered using compressive sensing, these iterative solutions are slow in processing a light field. We present a deep learning approach using a new, two-branch network architecture, consisting jointly of an autoencoder and a 4D CNN, to recover a high-resolution 4D light field from a single coded 2D image. This network decreases reconstruction time significantly while achieving average PSNR values of 26-32 dB on a variety of light fields. In particular, reconstruction time is decreased from 35 minutes to 6.7 minutes as compared to the dictionary method for equivalent visual quality. These reconstructions are performed at small sampling/compression ratios as low as 8%, allowing for cheaper coded light field cameras. We test our network reconstructions on synthetic light fields, simulated coded measurements of real light fields captured from a Lytro Illum camera, and real coded images from a custom CMOS diffractive light field camera. The combination of compressive light field capture with deep learning allows the potential for real-time light field video acquisition systems in the future.
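The PSNR figures quoted in this abstract follow the standard definition, 10 * log10(peak^2 / MSE). Below is a minimal sketch of that metric, assuming pixel values stored as flat lists scaled to [0, 1]. The function name `psnr` and the sample values are illustrative and are not taken from the thesis.

```python
import math


def psnr(reference, reconstruction, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists.

    `peak` is the maximum possible pixel value (1.0 is an assumption here;
    8-bit images would use 255).
    """
    if not reference or len(reference) != len(reconstruction):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak ** 2 / mse)


# Toy example: every pixel is off by 0.1, so the MSE is about 0.01.
reference = [0.0, 0.5, 1.0]
reconstruction = [0.1, 0.4, 0.9]
print(psnr(reference, reconstruction))
```

By the same kind of arithmetic, the reported drop in reconstruction time from 35 minutes to 6.7 minutes corresponds to a speedup of roughly 5.2 times (35 / 6.7).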
Contributors: Gupta, Mayank (Author) / Turaga, Pavan (Thesis advisor) / Yang, Yezhou (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created: 2017
Description
Improving the accessibility of public buildings for people with special needs has been an important societal commitment that is mandated by federal laws. In the information age, accessibility can mean more than simply providing physical accommodations like ramps for wheelchairs. Better yet, accessibility will be fundamentally improved if a user can be made aware of important location-specific information, such as the functions of offices near the user within a building. A smart environment may help a new person quickly become acquainted with the environment. Such features can be more critical when making an indoor environment more accessible to people with visual impairment. With the intention of promoting the integration of visually impaired people in society, this thesis focuses on methodologies for building smart and accessible indoor office environments with the help of Apple's Bluetooth Low Energy (BLE) technology called iBeacon, to provide location awareness and give people with visual impairment easy access to information about the environment. This thesis presents work on developing an iterative approach to improving the configuration of a given number of iBeacons to gain optimal signal coverage in a given office space, and on enabling smart features such as tagging points of interest and push notifications. This work aims to look at visual impairment beyond the level of disability and cast it as an opportunity to bring about a change in style of living. This work develops a methodology by introducing an end-to-end system that uses intelligent server-side and visually-impaired-friendly client-side interfaces to provide a prototype of an assistive technology that helps visually impaired people carry out basic activities, such as becoming familiar with an office environment, without needing to ask for assistance.
Contributors: Lagisetty, Jashmi (Author) / Li, Baoxin (Thesis advisor) / Hedgpeth, Terri (Committee member) / Balasooriya, Janaka (Committee member) / Arizona State University (Publisher)
Created: 2017
Description
Answer Set Programming (ASP) is one of the main formalisms in Knowledge Representation (KR) and is widely applied in a large number of applications. While ASP is effective on Boolean decision problems, it has difficulty in expressing quantitative uncertainty and probability in a natural way.

Logic Programs under the answer set semantics and Markov Logic Network (LPMLN) is a recent extension of answer set programs that overcomes the limitation of the deterministic nature of ASP by adopting the log-linear weight scheme of Markov Logic. This thesis investigates the relationships between LPMLN and two other extensions of ASP: weak constraints, which express a quantitative preference among answer sets, and P-log, which incorporates probabilistic uncertainty. The studied relationships show how different extensions of answer set programs are related to each other, and how they are related to formalisms in Statistical Relational Learning, such as ProbLog and MLN, which have been shown to be closely related to LPMLN. The studied relationships compare the properties of the involved languages and provide ways to compute one language using an implementation of another language.

This thesis first presents a translation of LPMLN into programs with weak constraints. The translation allows for computing the most probable stable models (i.e., MAP estimates) or the probability distribution in LPMLN programs using standard ASP solvers so that the well-developed techniques in ASP can be utilized. This result can be extended to other formalisms, such as Markov Logic, ProbLog, and Pearl’s Causal Models, that are shown to be translatable into LPMLN.

This thesis also presents a translation of P-log into LPMLN. The translation shows how the probabilistic nonmonotonicity of P-log (the ability of the reasoner to change his probabilistic model as a result of new information) can be represented in LPMLN, which yields a way to compute P-log using standard ASP solvers or MLN solvers.
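The log-linear weight scheme and the MAP estimate mentioned in this abstract can be sketched numerically. The sketch assumes that the stable models have already been enumerated by some solver and that each comes with the total weight of the soft rules it satisfies. The function names, the model labels, and the weights are hypothetical, and this is not the translation developed in the thesis.

```python
import math


def model_probabilities(total_weights):
    """Normalize exp(total weight) over a given set of stable models.

    `total_weights` maps a model label to the sum of the weights of the
    soft rules that model satisfies (assumed to be precomputed).
    """
    shift = max(total_weights.values())  # subtract the max for numerical stability
    unnormalized = {m: math.exp(w - shift) for m, w in total_weights.items()}
    z = sum(unnormalized.values())
    return {m: u / z for m, u in unnormalized.items()}


def map_model(total_weights):
    """Most probable model: the one with the largest total weight."""
    return max(total_weights, key=total_weights.get)


# Hypothetical stable models and their total soft-rule weights.
weights = {"model_a": 1.0, "model_b": 0.0}
print(model_probabilities(weights))
print(map_model(weights))
```

Because the exponential is monotone, the most probable model can be found by maximizing the total weight alone, without computing the normalization constant. This is the same kind of optimization that weak constraints express in ASP.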
Contributors: Yang, Zhun (Author) / Lee, Joohyung (Thesis advisor) / Baral, Chitta (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created: 2017