Towards energy efficient computing with Linux: enabling task level power awareness and support for energy efficient accelerator

Description

With increasing transistor volume and shrinking feature size, reducing power consumption has become a major design constraint. This has given rise to aggressive architectural changes for on-chip power management and to rapid development of energy-efficient hardware accelerators. Accordingly, the objective of this research work is to help software developers leverage these hardware techniques and improve the energy efficiency of the system. To achieve this, I propose two solutions for the Linux kernel. Optimal use of these architectural enhancements to achieve greater energy efficiency requires accurate modeling of processor power consumption. Though many models are available in the literature to model processor power consumption, there is a lack of models that capture power consumption at the task level. Task-level energy models are a requirement for an operating system (OS) to perform real-time power management, as the OS time-multiplexes tasks to enable sharing of hardware resources. I propose a detailed design methodology for constructing an architecture-agnostic task-level power model and incorporating it into a modern operating system to build an online task-level power profiler. The profiler is implemented inside the latest Linux kernel and validated for the Intel Sandy Bridge processor. It has a negligible overhead of less than 1% hardware resource consumption. The profiler's power prediction was demonstrated for various application benchmarks from SPEC to PARSEC with less than 4% error. I also demonstrate the importance of the proposed profiler for emerging architectural techniques through use-case scenarios, which include heterogeneous computing and fine-grained per-core DVFS. Along with architectural enhancements in general-purpose processors to improve energy efficiency, hardware accelerators like coarse-grained reconfigurable architectures (CGRAs) are gaining popularity.
Unlike vector processors, which rely on data parallelism, a CGRA can provide greater flexibility and compiler-level control, making it more suitable for the present SoC environment. To provide a streamlined development environment for CGRA, I propose a flexible framework in Linux for design space exploration of CGRA. With accurate and flexible hardware models, fine-grained integration with an accurate architectural simulator, and Linux memory management and DMA support, a user can carry out limitless experiments on CGRA in a full-system environment.

Contributors: Desai, Digant Pareshkumar (Author) / Vrudhula, Sarma (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Wu, Carole-Jean (Committee member) / Arizona State University (Publisher)

Created: 2013

The optical properties of nitride semiconductors for visible light emission

Description

Nitride semiconductors have wide applications in electronics and optoelectronics technologies. Understanding the nature of the optical recombination process and its effects on luminescence efficiency is important for the development of novel devices. This dissertation deals with the optical properties of nitride semiconductors, including GaN epitaxial layers and more complex heterostructures. The emission characteristics are examined by cathodoluminescence spectroscopy and imaging, and are correlated with the structural and electrical properties studied by transmission electron microscopy and electron holography. Four major areas are covered in this dissertation, which are described next. The effect of strain on the emission characteristics in wurtzite GaN has been studied. The values of the residual strain in GaN epilayers with different dislocation densities are determined by x-ray diffraction, and the relationship between exciton emission energy and the in-plane residual strain is demonstrated. It shows that the emission energy increases with the magnitude of the in-plane compressive strain. The temperature dependence of the emission characteristics in cubic GaN has been studied. It is observed that the exciton emission and donor-acceptor pair recombination behave differently with temperature. The donor-bound exciton binding energy has been measured to be 13 meV from the temperature dependence of the emission spectrum. It is also found that the ionization energies for both acceptors and donors are smaller in cubic compared with hexagonal structures, which should contribute to higher doping efficiencies. A comprehensive study on the structural and optical properties is presented for InGaN/GaN quantum wells emitting in the blue, green, and yellow regions of the electromagnetic spectrum. Transmission electron microscopy images indicate the presence of indium inhomogeneities which should be responsible for carrier localization.
The temperature dependence of the luminescence shows that the carrier localization effects become more significant with increasing emission wavelength. On the other hand, the effect of non-radiative recombination on luminescence efficiency also varies with the emission wavelength. The fast increase of the non-radiative recombination rate with temperature in the green-emitting QWs contributes to the lower efficiency compared with the blue-emitting QWs. The possible saturation of non-radiative recombination above 100 K may explain the unexpectedly high emission efficiency for the yellow-emitting QWs. Finally, the effects of InGaN underlayers on the electronic and optical properties of InGaN/GaN quantum wells emitting in visible spectral regions have been studied. A significant improvement of the emission efficiency is observed, which is associated with a blue shift in the emission energy, a reduced recombination lifetime, an increased spatial homogeneity in the luminescence, and a weaker internal field across the quantum wells. These are explained by a partial strain relaxation introduced by the InGaN underlayer, which is measured by reciprocal space mapping of the x-ray diffraction intensity.

Contributors: Li, Di (Author) / Ponce, Fernando (Thesis advisor) / Culbertson, Robert (Committee member) / Yu, Hongbin (Committee member) / Shumway, John (Committee member) / Menéndez, Jose (Committee member) / Arizona State University (Publisher)

Created: 2012

The influence of dome size, parent vessel angle, and coil packing density on coil embolization treatment in cerebral aneurysms

Description

A cerebral aneurysm is a bulging of a blood vessel in the brain. Aneurysmal rupture affects 25,000 people each year and is associated with a 45% mortality rate. Therefore, it is critically important to treat cerebral aneurysms effectively before they rupture. Endovascular coiling is the most effective treatment for cerebral aneurysms. During the coiling process, a series of metallic coils is deployed into the aneurysmal sac with the intent of reaching a sufficient packing density (PD). Coil packing can facilitate thrombus formation and help seal off the aneurysm from circulation over time. While coiling is effective, high rates of treatment failure have been associated with basilar tip aneurysms (BTAs). Treatment failure may be related to geometrical features of the aneurysm. The purpose of this study was to investigate the influence of dome size, parent vessel (PV) angle, and PD on post-treatment aneurysmal hemodynamics using both computational fluid dynamics (CFD) and particle image velocimetry (PIV). Flows in four idealized BTA models with a combination of dome sizes and two different PV angles were simulated using CFD and then validated against PIV data. Percent reductions in post-treatment aneurysmal velocity and cross-neck (CN) flow as well as percent coverage of low wall shear stress (WSS) area were analyzed. In all models, aneurysmal velocity and CN flow decreased after coiling, while low WSS area increased. However, with increasing PD, further reductions were observed in aneurysmal velocity and CN flow, but minimal changes were observed in low WSS area. Overall, coil PD had the greatest impact, while dome size had a greater impact than PV angle on aneurysmal hemodynamics. These findings lead to the conclusion that combinations of treatment goals and geometric factors may play key roles in coil embolization treatment outcomes, and support the view that different treatment timing may be a critical factor in treatment optimization.
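The two quantities that the abstract reports, packing density and percent reduction after treatment, can be illustrated with a short sketch. It assumes the commonly used definition of packing density as coil volume divided by aneurysm volume, which the abstract itself does not spell out; the function names and the sample numbers are hypothetical and are not taken from the thesis.

```python
def packing_density(coil_volume_mm3, aneurysm_volume_mm3):
    """Packing density (PD) in percent.

    Assumption: PD = coil volume / aneurysm volume, the usual
    definition; the abstract does not state the formula.
    """
    if aneurysm_volume_mm3 <= 0:
        raise ValueError("aneurysm volume must be positive")
    return 100.0 * coil_volume_mm3 / aneurysm_volume_mm3


def percent_reduction(pre_treatment, post_treatment):
    """Percent reduction of a quantity (e.g. velocity or cross-neck flow)."""
    if pre_treatment == 0:
        raise ValueError("pre-treatment value must be non-zero")
    return 100.0 * (pre_treatment - post_treatment) / pre_treatment


if __name__ == "__main__":
    # Illustrative numbers only.
    print(packing_density(30.0, 100.0))   # PD in percent
    print(percent_reduction(0.20, 0.05))  # reduction in percent
```

Expressing both quantities as percentages keeps models with different dome sizes comparable, which is presumably why the study reports reductions rather than absolute velocities.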

Contributors: Indahlastari, Aprinda (Author) / Frakes, David (Thesis advisor) / Chong, Brian (Committee member) / Muthuswamy, Jitendran (Committee member) / Arizona State University (Publisher)

Created: 2013

Kinematic analysis and quantitative evaluation for reach movements in stroke rehabilitation

Description

This thesis discusses quantitative evaluation of the quality of movement during stroke rehabilitation. Previous research has shown hospital-based stroke rehabilitation to be effective. In this thesis, we study various issues that arise when creating a home-based system that can be deployed in a patient's home. The limits on motion capture imposed by a reduced number of sensors lead to problems with the design of kinematic features for quantitative evaluation. Also, the hierarchical three-level tasks of rehabilitation require a new design of kinematic features. In this thesis, the design of kinematic features for a home-based stroke rehabilitation system is presented. Results for the most challenging classifier are shown and demonstrate the effectiveness of the design. A comparison between modern classification techniques and low-computational-cost threshold-based classification using the same features is also shown.

Contributors: Cheng, Long (Author) / Turaga, Pavan (Thesis advisor) / Arizona State University (Publisher)

Created: 2012

Upper body motion analysis using kinect for stroke rehabilitation at the home

Description

Motion capture using cost-effective sensing technology is challenging, and the huge success of Microsoft Kinect has been attracting researchers to uncover the potential of using this technology in computer vision applications. In this thesis, an upper-body motion analysis in a home-based system for stroke rehabilitation using a novel RGB-D camera, the Kinect, is presented. We address this problem by first conducting a systematic analysis of the usability of Kinect for motion analysis in stroke rehabilitation. Then a hybrid upper-body tracking approach is proposed which combines off-the-shelf skeleton tracking with a novel depth-fused mean shift tracking method. We propose several kinematic features reliably extracted from the proposed inexpensive and portable motion capture system, and classifiers that correlate torso movement to clinical measures of unimpaired and impaired movement. Experimental results show that the proposed sensing and analysis works reliably for measuring torso movement quality and is promising for end-point tracking. The system is currently being deployed for large-scale evaluations.

Contributors: Du, Tingfang (Author) / Turaga, Pavan (Thesis advisor) / Spanias, Andreas (Committee member) / Rikakis, Thanassis (Committee member) / Arizona State University (Publisher)

Created: 2012

Decentralized information search

Description

Our research focuses on finding answers through decentralized search for complex, imprecise queries (such as "Which is the best hair salon nearby?") in situations where there is a spatiotemporal constraint (say, the answer needs to be found within 15 minutes) associated with the query. In general, human networks are good at answering imprecise queries. We try to use the social network of a person to answer his query. Our research aims at designing a framework that exploits the user's social network in order to maximize the answers for a given query. Exploiting a user's social network has several challenges. The major challenge is that the user's immediate social circle may not possess the answer for the given query, and hence the framework designed needs to carry out the query diffusion process across the network. The next challenge involves finding the right set of seeds to pass the query to in the user's social circle. One other challenge is to incentivize people in the social network to respond to the query and thereby maximize the quality and quantity of replies. Our proposed framework is a mobile application where an individual can either respond to the query or forward it to his friends. We simulated the query diffusion process in three types of graphs: Small World, Random and Preferential Attachment. Given a type of network and a particular query, we carried out the query diffusion by selecting seeds based on attributes of the seed. The main attributes are Topic Relevance, Replying or Forwarding Probability, and Time to Respond. We found that there is a considerable increase in the number of replies attained, even without saturating the user's network, if we adopt an optimal seed selection process. We found the output of the optimal algorithm to be satisfactory, as the number of replies received at the interrogator's end was close to three times the number of neighbors an interrogator has.
We addressed the challenge of incentivizing people to respond by associating a particular number of points with each query asked, and awarding those points to the people involved in answering the query. Thus, we aim to design a mobile application based on our proposed framework so that it helps in maximizing the replies for the interrogator's query by diffusing the query across his/her social network.
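The query diffusion described above can be sketched as a small simulation. This is a minimal illustration and not the thesis's algorithm: the scoring rule (relevance times the sum of reply and forward probabilities), the use of a hop limit as a stand-in for the time constraint, and every name in the code are assumptions made for the example.

```python
import random


def make_random_graph(n, p, rng):
    """Undirected random graph as an adjacency dict (each edge kept with probability p)."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def seed_score(attr):
    # Hypothetical scoring rule: prefer relevant people who are likely to act.
    return attr["relevance"] * (attr["reply"] + attr["forward"])


def diffuse_query(adj, attrs, source, k, max_hops, rng):
    """Spread a query from `source` and return the number of replies.

    Every holder of the query passes it to its k best-scoring unvisited
    neighbours. A recipient replies with probability `reply`, otherwise
    forwards with probability `forward`, otherwise drops the query.
    `max_hops` stands in for the time constraint on the query.
    """
    visited = {source}
    frontier = [source]
    replies = 0
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            candidates = sorted(
                (v for v in adj[node] if v not in visited),
                key=lambda v: seed_score(attrs[v]),
                reverse=True,
            )
            for seed in candidates[:k]:
                visited.add(seed)
                roll = rng.random()
                if roll < attrs[seed]["reply"]:
                    replies += 1
                elif roll < attrs[seed]["reply"] + attrs[seed]["forward"]:
                    next_frontier.append(seed)
        frontier = next_frontier
    return replies


if __name__ == "__main__":
    rng = random.Random(0)
    graph = make_random_graph(200, 0.05, rng)
    people = {
        v: {
            "relevance": rng.random(),
            "reply": rng.uniform(0.0, 0.5),
            "forward": rng.uniform(0.0, 0.5),
        }
        for v in graph
    }
    print(diffuse_query(graph, people, source=0, k=3, max_hops=4, rng=rng))
```

Replacing `seed_score` with a constant gives an untargeted baseline, which is one way to compare targeted seed selection against arbitrary forwarding on the same graph.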

Contributors: Swaminathan, Neelakantan (Author) / Sundaram, Hari (Thesis advisor) / Davulcu, Hasan (Thesis advisor) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)

Created: 2013

Behavior of colloids with anisotropic diffusivities

Description

Locomotion of microorganisms is commonly observed in nature, and some aspects of their motion can be replicated by synthetic motors. Synthetic motors rely on a variety of propulsion mechanisms including auto-diffusiophoresis, auto-electrophoresis, and bubble generation. Regardless of the source of the locomotion, the motion of any motor can be characterized by the translational and rotational velocity and effective diffusivity. In a uniform environment the long-time motion of a motor can be fully characterized by the effective diffusivity. In this work it is shown that when motors possess both a translational velocity v and a rotational velocity w, the motor transitions from a short-time diffusivity to a long-time diffusivity at a time of pi/w. The short-time diffusivities are two to three orders of magnitude larger than the diffusivity of a Brownian sphere of the same size, increase linearly with concentration, and scale as v^2/(2w). The measured long-time diffusivities are five times lower than the short-time diffusivities, scale as v^2/{2Dr [1 + (w/Dr)^2]}, where Dr is the rotational diffusivity, and exhibit a maximum as a function of concentration. The variation of a colloid's velocity and effective diffusivity with its local environment (e.g. fuel concentration) suggests that the motors can accumulate in a bounded system, analogous to biological chemokinesis. Chemokinesis of organisms is the non-uniform equilibrium concentration that arises from a bounded random walk of swimming organisms in a chemical concentration gradient. In non-swimming organisms we term this response diffusiokinesis. We show that particles that migrate only by Brownian thermal motion are capable of achieving a non-uniform pseudo-equilibrium distribution in a diffusivity gradient. The concentration is a result of a bounded random-walk process where at any given time a larger percentage of particles can be found in the regions of low diffusivity than in regions of high diffusivity.
Individual particles are not trapped in any given region, but at equilibrium the net flux between regions is zero. For Brownian particles the gradient in diffusivity is achieved by creating a viscosity gradient in a microfluidic device. The distribution of the particles is described by the Fokker-Planck equation for variable diffusivity. The strength of the probe concentration gradient is proportional to the strength of the diffusivity gradient and inversely proportional to the mean probe diffusivity in the channel, in accordance with the no-flux condition at steady state. This suggests that Brownian colloids, natural or synthetic, will concentrate in a bounded system in response to a gradient in diffusivity and that the magnitude of the response is proportional to the magnitude of the gradient in diffusivity divided by the mean diffusivity in the channel.
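The scaling relations quoted in this abstract are easy to evaluate numerically. The sketch below simply encodes the three expressions as written (crossover time pi/w, short-time scaling v^2/(2w), long-time scaling v^2/{2Dr [1 + (w/Dr)^2]}). It treats the scalings as equalities with no prefactor, which is an assumption, and the function names and sample values are hypothetical.

```python
import math


def crossover_time(omega):
    """Time pi/w at which the motor passes from short- to long-time behaviour."""
    return math.pi / omega


def short_time_diffusivity(v, omega):
    """Short-time scaling v^2 / (2 w)."""
    return v ** 2 / (2.0 * omega)


def long_time_diffusivity(v, omega, d_rot):
    """Long-time scaling v^2 / {2 Dr [1 + (w / Dr)^2]}."""
    return v ** 2 / (2.0 * d_rot * (1.0 + (omega / d_rot) ** 2))


if __name__ == "__main__":
    # Illustrative values in arbitrary consistent units, not measurements.
    v, omega, d_rot = 10.0, 2.0, 1.0
    print(crossover_time(omega))
    print(short_time_diffusivity(v, omega))
    print(long_time_diffusivity(v, omega, d_rot))
```

In the long-time expression, a faster rotation (larger w) at fixed Dr lowers the diffusivity, because the motor curls back on itself before rotational diffusion can randomize its heading.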

Contributors: Marine, Nathan Arasmus (Author) / Posner, Jonathan D (Thesis advisor) / Adrian, Ronald J (Committee member) / Frakes, David (Committee member) / Phelan, Patrick E (Committee member) / Santos, Veronica J (Committee member) / Arizona State University (Publisher)

Created: 2013

Exploring video denoising using matrix completion

Description

Video denoising has been an important task in many multimedia and computer vision applications. Recent developments in matrix completion theory and the emergence of new numerical methods that can efficiently solve the matrix completion problem have paved the way for exploration of new techniques for some classical image processing tasks. Recent literature shows that many computer vision and image processing problems can be solved by using matrix completion theory. This thesis explores the application of matrix completion in video denoising. A state-of-the-art video denoising algorithm in which the denoising task is modeled as a matrix completion problem is chosen for detailed study. The contribution of this thesis lies both in providing extensive analysis to bridge the gap in the existing literature on the matrix completion framework for video denoising and in proposing some novel techniques to improve the performance of the chosen denoising algorithm. The chosen algorithm is implemented for thorough analysis. Experiments and discussions are presented to enable better understanding of the problem. Instability shown by the algorithm at some parameter values in a particular case of low levels of pure Gaussian noise is identified. Artifacts introduced in such cases are analyzed. A novel way of grouping structurally relevant patches is proposed to improve the algorithm. Experiments show that this technique is useful, especially in videos containing high amounts of motion. Based on the observation that matrix completion is not suitable for denoising patches containing a relatively low amount of image detail, a framework is designed to separate patches corresponding to low-structured regions from a noisy image. Experiments are conducted by not subjecting such patches to matrix completion, instead denoising such patches in a different way.
The resulting improvement in performance suggests that denoising low-structured patches does not require a complex method like matrix completion, and that it is in fact counter-productive to subject such patches to matrix completion. These results also indicate the inherent limitation of matrix completion in dealing with cases in which noise dominates the structural properties of an image. A novel method for introducing priorities to the ranked patches in matrix completion is also presented. Results show that this method yields improved performance in general. It is observed that the artifacts in the presence of low levels of pure Gaussian noise appear differently after introducing priorities to the patches, and that the artifacts occur over a wider range of parameter values. Results and discussion suggesting future ways to explore this problem are also presented.

Contributors: Maguluri, Hima Bindu (Author) / Li, Baoxin (Thesis advisor) / Turaga, Pavan (Committee member) / Claveau, Claude (Committee member) / Arizona State University (Publisher)

Created: 2013

Structural and optical properties of II-VI and III-V compound semiconductors

Description

This dissertation is a study of the structural and optical properties of some III-V and II-VI compound semiconductors. The first part of this dissertation is a study of the deformation mechanisms associated with nanoindentation and nanoscratching of InP, GaN, and ZnO crystals. The second part is an investigation of some fundamental issues regarding compositional fluctuations and microstructure in GaInNAs and InAlN alloys. In the first part, the microstructure of (001) InP scratched in an atomic force microscope with a small diamond tip has been studied as a function of applied normal force and crystalline direction in order to understand at the nanometer scale the deformation mechanisms in the zinc-blende structure. TEM images show deeper dislocation propagation for scratches along <110> compared to <100>. High strain fields were observed in <100> scratches, indicating hardening due to locking of dislocations gliding on different slip planes. Reverse plastic flow has been observed in <110> scratches in the form of pop-up events that result from recovery of stored elastic strain. In a separate study, nanoindentation-induced plastic deformation has been studied in c-, a-, and m-plane ZnO single crystals and in c-plane GaN, to study the deformation mechanism in wurtzite hexagonal structures. TEM results reveal that the prime deformation mechanism is slip on basal planes and, in some cases, on pyramidal planes, with strain built up along particular directions. No evidence of phase transformation or cracking was observed in either material. CL imaging reveals quenching of near-band-edge emission by dislocations. In the second part, compositional inhomogeneity in quaternary GaInNAs and ternary InAlN alloys has been studied using TEM.
It is shown that exposure to antimony during growth of GaInNAs results in uniform chemical composition in the epilayer, as antimony suppresses the surface mobility of adatoms that otherwise leads to two-dimensional growth and elemental segregation. In a separate study, compositional instability is observed in lattice-matched InAlN films grown on GaN, for growth beyond a certain thickness. Beyond 200 nm of thickness, two sub-layers with different indium content are observed, the top one with lower indium content.

Contributors: Huang, Jingyi (Author) / Ponce, Fernando A. (Thesis advisor) / Carpenter, Ray W (Committee member) / Smith, David J. (Committee member) / Yu, Hongbin (Committee member) / Treacy, Michael Mj (Committee member) / Arizona State University (Publisher)

Created: 2013

Compiler and runtime for memory management on software managed manycore processors

Description

We are expecting hundreds of cores per chip in the near future. However, scaling the memory architecture in manycore architectures becomes a major challenge. Cache coherence provides a single image of memory at any time in execution to all the cores, yet it is believed that coherent cache architectures will not scale to hundreds and thousands of cores. In addition, caches and coherence logic already take 20-50% of the total power consumption of the processor and 30-60% of die area. Therefore, a more scalable architecture is needed for manycore architectures. Software Managed Manycore (SMM) architectures emerge as a solution. They have a scalable memory design in which each core has direct access to only its local scratchpad memory, and any data transfers to/from other memories must be done explicitly in the application using Direct Memory Access (DMA) commands. The lack of automatic memory management in the hardware makes such architectures extremely power-efficient, but they also become difficult to program. If the code/data of the task mapped onto a core cannot fit in the local scratchpad memory, then DMA calls must be added to bring in the code/data before it is required, and it may need to be evicted after its use. However, doing this adds a lot of complexity to the programmer's job. Now programmers must worry about data management on top of worrying about the functional correctness of the program, which is already quite complex. This dissertation presents a comprehensive compiler and runtime integration to automatically manage the code and data of each task in the limited local memory of the core. We first developed Complete Circular Stack Management. It manages stack frames between the local memory and the main memory, and addresses the stack pointer problem as well. Though it works, we found we could further optimize the management for most cases. Thus Smart Stack Data Management (SSDM) is provided.
In this work, we formulate the stack data management problem and propose a greedy algorithm for it. Later on, we propose a general cost estimation algorithm, based on which the CMSM heuristic for the code mapping problem is developed. Finally, heap data is dynamic in nature and is therefore hard to manage. We provide two schemes to manage an unlimited amount of heap data in a constant-sized region in the local memory. In addition to these separate schemes for different kinds of data, we also provide a memory partition methodology.
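The idea of keeping stack frames in a small local memory and spilling them to main memory can be illustrated with a toy model. This is only a sketch under stated assumptions and not the scheme from the dissertation: it assumes that the oldest resident frame is evicted first, that an evicted caller frame is fetched back when control returns to it, and that each eviction or fetch costs one DMA transfer. The class and method names are hypothetical, and the stack pointer problem mentioned in the abstract is not modelled.

```python
from collections import deque


class CircularStackModel:
    """Toy model of stack frames shared between a scratchpad and main memory."""

    def __init__(self, local_capacity):
        self.local_capacity = local_capacity
        self.local = deque()    # resident frames as (name, size), oldest on the left
        self.main = []          # evicted frames, oldest first
        self.dma_transfers = 0  # assumed cost: one transfer per eviction or fetch

    def _used(self):
        return sum(size for _, size in self.local)

    def call(self, name, size):
        """Push a frame, evicting the oldest resident frames if space runs out."""
        if size > self.local_capacity:
            raise ValueError("frame larger than local memory")
        while self._used() + size > self.local_capacity:
            self.main.append(self.local.popleft())
            self.dma_transfers += 1
        self.local.append((name, size))

    def ret(self):
        """Pop the current frame; fetch the caller's frame if it was evicted."""
        name, _ = self.local.pop()
        if not self.local and self.main:
            self.local.append(self.main.pop())
            self.dma_transfers += 1
        return name


if __name__ == "__main__":
    stack = CircularStackModel(local_capacity=100)
    for frame in ("main", "f", "g"):
        stack.call(frame, 40)          # calling g forces main's frame out
    while stack.local:
        print(stack.ret(), stack.dma_transfers)
```

Counting transfers in a model like this shows why a smarter placement of management calls can help: deep call chains with large frames cause repeated evictions and fetches even when the frames are never reused.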

Contributors: Bai, Ke (Author) / Shrivastava, Aviral (Thesis advisor) / Chatha, Karamvir (Committee member) / Xue, Guoliang (Committee member) / Chakrabarti, Chaitali (Committee member) / Arizona State University (Publisher)

Created: 2014