Towards energy efficient computing with Linux: enabling task level power awareness and support for energy efficient accelerator

Description

With increasing transistor counts and shrinking feature sizes, reducing power consumption has become a major design constraint. This has given rise to aggressive architectural changes for on-chip power management and rapid development of energy-efficient hardware accelerators. Accordingly, the objective of this research work is to help software developers leverage these hardware techniques and improve the energy efficiency of the system. To achieve this, I propose two solutions for the Linux kernel. Optimal use of these architectural enhancements to achieve greater energy efficiency requires accurate modeling of processor power consumption. Though there are many models available in the literature to model processor power consumption, there is a lack of such models to capture power consumption at the task level. Task-level energy models are a requirement for an operating system (OS) to perform real-time power management, as the OS time-multiplexes tasks to enable sharing of hardware resources. I propose a detailed design methodology for constructing an architecture-agnostic task-level power model and incorporating it into a modern operating system to build an online task-level power profiler. The profiler is implemented inside the latest Linux kernel and validated for the Intel Sandy Bridge processor. It has a negligible overhead of less than 1% hardware resource consumption. The profiler's power prediction was demonstrated for various application benchmarks from SPEC to PARSEC with less than 4% error. I also demonstrate the importance of the proposed profiler for emerging architectural techniques through use case scenarios, which include heterogeneous computing and fine-grained per-core DVFS. Along with architectural enhancements in general-purpose processors to improve energy efficiency, hardware accelerators like coarse-grained reconfigurable architectures (CGRAs) are gaining popularity.
Unlike vector processors, which rely on data parallelism, CGRAs can provide greater flexibility and compiler-level control, making them more suitable for the present SoC environment. To provide a streamlined development environment for CGRAs, I propose a flexible framework in Linux for design space exploration of CGRAs. With accurate and flexible hardware models, fine-grained integration with an accurate architectural simulator, and Linux memory management and DMA support, a user can carry out a wide range of experiments on CGRAs in a full-system environment.

Contributors: Desai, Digant Pareshkumar (Author) / Vrudhula, Sarma (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Wu, Carole-Jean (Committee member) / Arizona State University (Publisher)

Created: 2013

Advances in micromechanics modeling of composites structures for structural health monitoring

Description

Although high-performance, lightweight composites are increasingly being used in applications ranging from aircraft, rotorcraft and weapon systems to ground vehicles, the assurance of structural reliability remains a critical issue. In composites, damage is absorbed through various fracture processes, including fiber failure, matrix cracking and delamination. An important element in achieving reliable composite systems is a strong capability for assessing and inspecting physical damage to critical structural components. Installation of a robust Structural Health Monitoring (SHM) system would be very valuable in detecting the onset of composite failure. A number of major issues still require serious attention in connection with the research and development aspects of sensor-integrated reliable SHM systems for composite structures. In particular, the sensitivity of currently available sensor systems does not allow detection of micro-level damage; this limits the capability of data-driven SHM systems. As a fundamental layer in SHM, modeling can provide in-depth information on material and structural behavior for sensing and detection, as well as data for learning algorithms. This dissertation focuses on the development of a multiscale analysis framework, which is used to detect various forms of damage in complex composite structures. A generalized method of cells based micromechanics analysis, as implemented in NASA's MAC/GMC code, is used for the micro-level analysis. First, a baseline study of MAC/GMC is performed to determine the governing failure theories that best capture the damage progression. The deficiencies associated with various layups and loading conditions are addressed. In most micromechanics analyses, a representative unit cell (RUC) with a common fiber packing arrangement is used.
The effect of variation in this arrangement within the RUC has been studied and results indicate this variation influences the macro-scale effective material properties and failure stresses. The developed model has been used to simulate impact damage in a composite beam and an airfoil structure. The model data was verified through active interrogation using piezoelectric sensors. The multiscale model was further extended to develop a coupled damage and wave attenuation model, which was used to study different damage states such as fiber-matrix debonding in composite structures with surface bonded piezoelectric sensors.

Contributors: Moncada, Albert (Author) / Chattopadhyay, Aditi (Thesis advisor) / Dai, Lenore (Committee member) / Papandreou-Suppappola, Antonia (Committee member) / Rajadas, John (Committee member) / Yekani Fard, Masoud (Committee member) / Arizona State University (Publisher)

Created: 2012

Adaptive filter bank time-frequency representations

Description

A signal with time-varying frequency content can often be expressed more clearly using a time-frequency representation (TFR), which maps the signal into a two-dimensional function of time and frequency, similar to musical notation. The thesis reviews one of the most commonly used TFRs, the Wigner distribution (WD), and discusses its application in Fourier optics: it is shown that the WD is analogous to the spectral dispersion that results from a diffraction grating, and time and frequency are similarly analogous to a one-dimensional spatial coordinate and wavenumber. The grating is compared with a simple polychromator, which is a bank of optical filters. Another well-known TFR is the short-time Fourier transform (STFT). Its discrete version can be shown to be equivalent to a filter bank, an array of bandpass filters that enable localized processing of the analysis signals in different sub-bands. This work proposes a signal-adaptive method of generating TFRs. In order to minimize distortion in analyzing a signal, the method modifies the filter bank to consist of non-overlapping rectangular bandpass filters generated using the Butterworth filter design process. The information contained in the resulting TFR can be used to reconstruct the signal, and perfect reconstruction techniques involving quadrature mirror filter banks are compared with a simple Fourier synthesis sum. The optimal filter parameters of the rectangular filters are selected adaptively by minimizing the mean-squared error (MSE) from a pseudo-reconstructed version of the analysis signal. The reconstruction MSE is proposed as an error metric for characterizing TFRs; a practical measure of the error requires normalization and cross correlation with the analysis signal. Simulations were performed to demonstrate the effectiveness of the new adaptive TFR and its relation to swept-tuned spectrum analyzers.
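The idea of a non-overlapping rectangular filter bank with a reconstruction MSE can be illustrated with a minimal sketch. This is not the thesis's method: it swaps in ideal brick-wall masks on DFT bins in place of Butterworth-designed filters, uses a fixed rather than adaptive number of bands, and all names (`rectangular_filter_bank`, `reconstruction_mse`, the test signal) are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (adequate for short signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def rectangular_filter_bank(x, num_bands):
    """Split x into sub-band signals using non-overlapping brick-wall masks.

    Assumption: ideal rectangular masks over DFT bins 0..N-1 stand in for
    the Butterworth-based filters described in the abstract.
    """
    n = len(x)
    spectrum = dft(x)
    edges = [round(b * n / num_bands) for b in range(num_bands + 1)]
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = [spectrum[k] if lo <= k < hi else 0 for k in range(n)]
        bands.append(idft(masked))
    return bands


def reconstruction_mse(x, bands):
    """Mean-squared error between x and the sum of the given sub-band signals."""
    n = len(x)
    recon = [sum(band[t] for band in bands) for t in range(n)]
    return sum(abs(recon[t] - x[t]) ** 2 for t in range(n)) / n


# Hypothetical test signal whose frequency rises over time.
N = 64
signal = [math.cos(2 * math.pi * (2 + 6 * t / N) * t / N) for t in range(N)]

bands = rectangular_filter_bank(signal, num_bands=8)
mse_all = reconstruction_mse(signal, bands)          # every band kept
mse_partial = reconstruction_mse(signal, bands[:1])  # only the lowest band kept
```

Because the masks partition the spectrum, summing all sub-bands recovers the signal up to rounding error, while dropping bands raises the MSE. The magnitudes `abs(bands[b][t])` form a crude time-frequency picture indexed by band and time.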

Contributors: Weber, Peter C. (Author) / Papandreou-Suppappola, Antonia (Thesis advisor) / Tepedelenlioğlu, Cihan (Committee member) / Kovvali, Narayan (Committee member) / Arizona State University (Publisher)

Created: 2012

Alkali activated systems: understanding the influence of curing conditions and activator type/chemistry on the mechanical strength and chemical structure of fly ash/slag systems

Description

The alkali activation of aluminosilicate materials as binder systems derived from industrial byproducts has been extensively studied due to the advantages they offer in terms of enhanced material properties, while increasing sustainability through the reuse of industrial waste and byproducts and reducing the adverse impacts of OPC production. Fly ash and ground granulated blast furnace slag are commonly used for their content of soluble silica and aluminate species that can undergo dissolution, polymerization with the alkali, condensation on particle surfaces and solidification. The following topics are the focus of this thesis: (i) the use of microwave-assisted thermal processing, in addition to heat-curing, as a means of alkali activation and (ii) the relative effects of alkali cations (K or Na) in the activator (powder activators) on the mechanical properties and chemical structure of these systems. Unsuitable curing conditions instigate carbonation, which in turn lowers the pH of the system, causing significant reductions in the rate of fly ash activation and mechanical strength development. This study explores the effects of sealing the samples during the curing process, which effectively traps the free water in the system and allows for increased aluminosilicate activation. The use of microwave-curing in lieu of thermal-curing is also studied in order to reduce energy consumption and for its ability to provide fast volumetric heating. Potassium-based powder activators dry blended into the slag binder system are shown to be effective in obtaining very high compressive strengths under moist curing conditions (greater than 70 MPa), whereas sodium-based powder activation is much weaker (around 25 MPa). Compressive strength decreases when fly ash is introduced into the system. Isothermal calorimetry is used to evaluate the early hydration process and to understand the reaction kinetics of the alkali powder activated systems.
Qualitative evidence of the alkali-hydroxide concentration of the paste pore solution, obtained through electrical conductivity measurements, is also presented, with the results indicating that alkali ions are more prevalent in the pore solution of potassium-based systems. The use of advanced spectroscopic and thermal analysis techniques to distinguish the influence of the studied parameters is also discussed.

Contributors: Chowdhury, Ussala (Author) / Neithalath, Narayanan (Thesis advisor) / Rajan, Subramanium D. (Committee member) / Mobasher, Barzin (Committee member) / Arizona State University (Publisher)

Created: 2013

Distributed inference using bounded transmissions

Description

Distributed inference has applications in a wide range of fields such as source localization, target detection, environment monitoring, and healthcare. In this dissertation, distributed inference schemes which use bounded transmit power are considered. The performance of the proposed schemes is studied for a variety of inference problems. In the first part of the dissertation, a distributed detection scheme where the sensors transmit with constant modulus signals over a Gaussian multiple access channel is considered. The deflection coefficient of the proposed scheme is shown to depend on the characteristic function of the sensing noise, and the error exponent for the system is derived using large deviation theory. Optimization of the deflection coefficient and error exponent is considered with respect to a transmission phase parameter for a variety of sensing noise distributions including impulsive ones. The proposed scheme is also favorably compared with existing amplify-and-forward (AF) and detect-and-forward (DF) schemes. The effect of fading is shown to be detrimental to the detection performance and simulations are provided to corroborate the analytical results. The second part of the dissertation studies a distributed inference scheme which uses bounded transmission functions over a Gaussian multiple access channel. The conditions on the transmission functions under which consistent estimation and reliable detection are possible are characterized. For the distributed estimation problem, an estimation scheme that uses bounded transmission functions is proved to be strongly consistent provided that the variance of the noise samples is bounded and that the transmission function is one-to-one. The proposed estimation scheme is compared with the amplify-and-forward technique and its robustness to impulsive sensing noise distributions is highlighted.
It is also shown that bounded transmissions suffer from inconsistent estimates if the sensing noise variance goes to infinity. For the distributed detection problem, similar results are obtained by studying the deflection coefficient. Simulations corroborate our analytical results. In the third part of this dissertation, the problem of estimating the average of samples distributed at the nodes of a sensor network is considered. A distributed average consensus algorithm in which every sensor transmits with bounded peak power is proposed. In the presence of communication noise, it is shown that the nodes reach consensus asymptotically to a finite random variable whose expectation is the desired sample average of the initial observations, with a variance that depends on the step size of the algorithm and the variance of the communication noise. The asymptotic performance is characterized by deriving the asymptotic covariance matrix using results from stochastic approximation theory. It is shown that using bounded transmissions results in slower convergence compared to the linear consensus algorithm based on the Laplacian heuristic. Simulations corroborate our analytical findings. Finally, a robust distributed average consensus algorithm in which every sensor performs nonlinear processing at the receiver is proposed. It is shown that non-linearity at the receiver nodes makes the algorithm robust to a wide range of channel noise distributions, including impulsive ones. It is shown that the nodes reach consensus asymptotically and similar results are obtained as in the case of transmit non-linearity. Simulations corroborate our analytical findings and highlight the robustness of the proposed algorithm.
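Average consensus with bounded transmissions can be sketched in a few lines. The sketch below makes several assumptions that are not taken from the dissertation: the bounded transmission function is `tanh`, the network is a six-node ring, the channel is noise-free, and the step size is constant. The function and variable names are hypothetical.

```python
import math


def bounded_consensus(x0, neighbors, step=0.1, iterations=2000, h=math.tanh):
    """Noise-free consensus where each node transmits the bounded value h(x_i).

    Update (assumed form): x_i <- x_i - step * sum_j (h(x_i) - h(x_j)),
    summed over the neighbors j of node i.
    """
    x = list(x0)
    for _ in range(iterations):
        tx = [h(v) for v in x]  # transmitted values, bounded in (-1, 1) for tanh
        x = [x[i] - step * sum(tx[i] - tx[j] for j in neighbors[i])
             for i in range(len(x))]
    return x


# Hypothetical six-node ring network and initial observations.
n = 6
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
initial = [-1.0, -0.4, 0.1, 0.3, 0.8, 1.0]

final = bounded_consensus(initial, ring)
target = sum(initial) / n  # sample average the nodes should agree on
```

On a symmetric graph the pairwise terms cancel when summed over all nodes, so the network-wide sum is preserved at every step; since `tanh` is strictly increasing, the states contract toward the initial sample average. Adding communication noise and a decaying step size, as in the abstract, would turn the limit into a random variable centered on that average.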

Contributors: Dasarathan, Sivaraman (Author) / Tepedelenlioğlu, Cihan (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Reisslein, Martin (Committee member) / Goryll, Michael (Committee member) / Arizona State University (Publisher)

Created: 2013

Some applications of vector fitting in the solution of electromagnetic fields and interactions

Description

Vector Fitting (VF) is a recent macromodeling method that has been popularized by its use in many commercial software packages for extracting equivalent circuits of simulated networks. Specifically for material measurement applications, VF is shown to estimate either the permittivity or permeability of a multi-Debye material accurately, even when measured in the presence of noise and interference caused by test setup imperfections. A brief history and survey of methods utilizing VF for material measurement is presented in this work. It is shown how VF is useful for macromodeling dielectric materials after they have been measured with standard transmission line and free-space methods. The sources of error in both an admittance tunnel test device and a stripline resonant cavity test device are identified, and VF is employed to correct these errors. Full-wave simulations are performed to model the test setup imperfections, and the sources of interference they cause are further verified in actual hardware measurements. An accurate macromodel is attained as long as the signal-to-interference ratio (SIR) in the measurement is sufficiently high that the Debye relaxations are observable in the data. Finally, VF is applied to macromodeling the time history of the total fields scattered from a perfectly conducting wedge. This effort is an initial test to see if a time-domain theory of diffraction exists, and if the diffraction coefficients may be exactly modeled with VF. This section concludes that VF is useful not only for applications in material measurement, but also for modeling fields and interactions in general.
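A multi-Debye permittivity is a sum of first-order relaxation terms, and each term can be rewritten as a pole-residue fraction, which is the kind of rational function a pole-residue fitting method works with. The sketch below only evaluates the standard multi-Debye formula in both forms; it does not implement Vector Fitting, and the material parameters are hypothetical values chosen for illustration.

```python
import math


def debye_permittivity(omega, eps_inf, terms):
    """Multi-Debye model: eps_inf + sum(delta_eps / (1 + j*omega*tau)).

    terms is a list of (delta_eps, tau) pairs; omega is in rad/s.
    """
    return eps_inf + sum(d / (1 + 1j * omega * tau) for d, tau in terms)


def pole_residue_permittivity(omega, eps_inf, terms):
    """Same model in pole-residue form with s = j*omega.

    Each Debye term has a real pole at -1/tau and residue delta_eps/tau.
    """
    s = 1j * omega
    return eps_inf + sum((d / tau) / (s - (-1.0 / tau)) for d, tau in terms)


# Hypothetical two-relaxation material (illustrative values only).
eps_inf = 3.0
terms = [(2.0, 1e-9), (0.5, 1e-11)]

omega = 2 * math.pi * 1e9
eps_debye = debye_permittivity(omega, eps_inf, terms)
eps_rational = pole_residue_permittivity(omega, eps_inf, terms)
eps_static = debye_permittivity(0.0, eps_inf, terms)
```

The two functions agree at every frequency, and at zero frequency the permittivity reduces to `eps_inf` plus the sum of the relaxation strengths.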

Contributors: Richards, Evan (Author) / Diaz, Rodolfo E (Thesis advisor) / Tsakalis, Konstantinos (Committee member) / Platte, Rodrigo (Committee member) / Arizona State University (Publisher)

Created: 2013

Compiler and runtime for memory management on software managed manycore processors

Description

Hundreds of cores per chip are expected in the near future. However, scaling the memory architecture in manycore architectures is a major challenge. Cache coherence provides a single image of memory at any time in execution to all the cores, yet it is believed that coherent cache architectures will not scale to hundreds and thousands of cores. In addition, caches and coherence logic already take 20-50% of the total power consumption of the processor and 30-60% of die area. Therefore, a more scalable architecture is needed for manycore architectures. Software Managed Manycore (SMM) architectures emerge as a solution. They have a scalable memory design in which each core has direct access to only its local scratchpad memory, and any data transfers to/from other memories must be done explicitly in the application using Direct Memory Access (DMA) commands. The lack of automatic memory management in the hardware makes such architectures extremely power-efficient, but also difficult to program. If the code/data of the task mapped onto a core cannot fit in the local scratchpad memory, then DMA calls must be added to bring in the code/data before it is required, and it may need to be evicted after its use. However, doing this adds a lot of complexity to the programmer's job. Programmers must now worry about data management on top of the functional correctness of the program, which is already quite complex. This dissertation presents a comprehensive compiler and runtime integration to automatically manage the code and data of each task in the limited local memory of the core. We first developed a Complete Circular Stack Management scheme. It manages stack frames between the local memory and the main memory, and addresses the stack pointer problem as well. Though it works, we found we could further optimize the management for most cases. Thus a Smart Stack Data Management (SSDM) scheme is provided.
In this work, we formulate the stack data management problem and propose a greedy algorithm for it. Later on, we propose a general cost estimation algorithm, based on which the CMSM heuristic for the code mapping problem is developed. Finally, heap data is dynamic in nature and is therefore hard to manage. We provide two schemes to manage an unlimited amount of heap data in a constant-sized region in the local memory. In addition to those separate schemes for different kinds of data, we also provide a memory partition methodology.

Contributors: Bai, Ke (Author) / Shrivastava, Aviral (Thesis advisor) / Chatha, Karamvir (Committee member) / Xue, Guoliang (Committee member) / Chakrabarti, Chaitali (Committee member) / Arizona State University (Publisher)

Created: 2014

Towards adaptive micro-robotic neural interfaces: autonomous navigation of microelectrodes in the brain for optimal neural recording

Description

Advances in implantable MEMS technology have made possible adaptive micro-robotic implants that can track and record from single neurons in the brain. Development of autonomous neural interfaces opens up exciting possibilities of micro-robots performing standard electrophysiological techniques that would previously take researchers several hundred hours to train and achieve the desired skill level. It would result in more reliable and adaptive neural interfaces that could record optimal neural activity 24/7 with high fidelity signals, high yield and increased throughput. The main contribution here is validating adaptive strategies to overcome challenges in autonomous navigation of microelectrodes inside the brain. The following issues pose significant challenges, as brain tissue is both functionally and structurally dynamic: a) time-varying mechanical properties of the brain tissue-microelectrode interface due to the hyperelastic, viscoelastic nature of brain tissue, b) non-stationarities in the neural signal caused by mechanical and physiological events in the interface, and c) the lack of visual feedback of microelectrode position in brain tissue. A closed-loop control algorithm is proposed here for autonomous navigation of microelectrodes in brain tissue while optimizing the signal-to-noise ratio of multi-unit neural recordings. The algorithm incorporates a quantitative understanding of constitutive mechanical properties of soft viscoelastic tissue like the brain and is guided by models that predict stresses developed in brain tissue during movement of the microelectrode. An optimal movement strategy is developed that achieves precise positioning of microelectrodes in the brain by minimizing the stresses developed in the surrounding tissue during navigation and maximizing the speed of movement.
Results of testing the closed-loop control paradigm in short-term rodent experiments validated that it was possible to achieve a consistently high-quality SNR throughout the duration of the experiment. At the systems level, a new generation of MEMS actuators for movable microelectrode arrays is characterized, and the MEMS device operation parameters are optimized for improved performance and reliability. Further, recommendations for packaging to minimize the form factor of the implant, and for the design of device mounting and implantation techniques for MEMS microelectrode arrays to enhance the longevity of the implant, are also included in a top-down approach to achieve a reliable brain interface.

Contributors: Anand, Sindhu (Author) / Muthuswamy, Jitendran (Thesis advisor) / Tillery, Stephen H (Committee member) / Buneo, Christopher (Committee member) / Abbas, James (Committee member) / Tsakalis, Konstantinos (Committee member) / Arizona State University (Publisher)

Created: 2013

Adaptive learning and unsupervised clustering of immune responses using microarray random sequence peptides

Description

Immunosignaturing is a medical test for assessing the health status of a patient by applying microarrays of random sequence peptides to determine the patient's immune fingerprint by associating antibodies from a biological sample to immune responses. The immunosignature measurements can potentially provide pre-symptomatic diagnosis for infectious diseases or detection of biological threats. Currently, traditional bioinformatics tools, such as data mining classification algorithms, are used to process the large amount of peptide microarray data. However, these methods generally require training data and do not adapt to changing immune conditions or additional patient information. This work proposes advanced processing techniques to improve the classification and identification of single and multiple underlying immune response states embedded in immunosignatures, making it possible to detect both known and previously unknown diseases or biothreat agents. Novel adaptive learning methodologies for unsupervised and semi-supervised clustering integrated with immunosignature feature extraction approaches are proposed. The techniques are based on extracting novel stochastic features from microarray binding intensities and use Dirichlet process Gaussian mixture models to adaptively cluster the immunosignatures in the feature space. This learning-while-clustering approach allows continuous discovery of antibody activity by adaptively detecting new disease states, with limited a priori disease or patient information. A beta process factor analysis model to determine underlying patient immune responses is also proposed to further improve the adaptive clustering performance by forming new relationships between patients and antibody activity.
In order to extend the clustering methods for diagnosing multiple states in a patient, the adaptive hierarchical Dirichlet process is integrated with modified beta process factor analysis latent feature modeling to identify relationships between patients and infectious agents. The use of Bayesian nonparametric adaptive learning techniques allows for further clustering if additional patient data is received. Significant improvements in feature identification and immune response clustering are demonstrated using samples from patients with different diseases.

Contributors: Malin, Anna (Author) / Papandreou-Suppappola, Antonia (Thesis advisor) / Bliss, Daniel (Committee member) / Chakrabarti, Chaitali (Committee member) / Kovvali, Narayan (Committee member) / Lacroix, Zoé (Committee member) / Arizona State University (Publisher)

Created: 2013

Non-holonomic differential drive mobile robot control & design: critical dynamics and coupling constraints

Description

Mobile robots are used in a broad range of application areas, e.g., search and rescue, reconnaissance, and exploration. Given the increasing need for high-performance mobile robots, the area has received attention from researchers. In this thesis, critical control and control-relevant design issues for differential drive mobile robots are addressed. Two major themes that have been explored are the use of kinematic models for control design and the use of decentralized proportional plus integral (PI) control. While these topics have received much attention, there still remain critical questions which have not been rigorously addressed. In this thesis, answers to the following critical questions are provided. When is
1. a kinematic model sufficient for control design?
2. coupled dynamics essential?
3. a decentralized PI inner loop velocity controller sufficient?
4. centralized multiple-input multiple-output (MIMO) control essential?
And how can one design the robot to relax the requirements implied in 1 and 2?

In this thesis, the following is shown:
1. The nonlinear kinematic model will suffice for control design when the inner velocity (dynamic) loop is much faster (10X) than the slower outer positioning loop.
2. A dynamic model is essential when the inner velocity (dynamic) loop is less than two times faster than the slower outer positioning loop.
3. A decentralized inner loop PI velocity controller will be sufficient for accomplishing high performance control when the required velocity bandwidth is small relative to the peak dynamic coupling frequency. A rule of thumb which depends on the robot aspect ratio is given.
4. A centralized MIMO velocity controller is needed when the required bandwidth is large relative to the peak dynamic coupling frequency. Here, the analysis in the thesis is sparse, making the topic an area for future analytical work. Despite this, it is clearly shown that a centralized MIMO inner loop controller can offer increased performance vis-à-vis a decentralized PI controller.
5. Finally, it is shown how the dynamic coupling depends on the robot aspect ratio and how the coupling can be significantly reduced. As such, this can be used to ease the requirements imposed by 2 and 4 above.
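For readers unfamiliar with the kinematic model referred to above, the following is a minimal sketch of the standard differential-drive (unicycle) kinematics integrated with forward Euler steps. It is a textbook formulation, not code from the thesis; the wheel radius, track width, wheel speeds, and function names are hypothetical.

```python
import math


def wheel_to_body(omega_r, omega_l, wheel_radius, track_width):
    """Map right/left wheel angular speeds (rad/s) to body velocities.

    Returns (v, w): forward speed in m/s and yaw rate in rad/s.
    """
    v = wheel_radius * (omega_r + omega_l) / 2.0
    w = wheel_radius * (omega_r - omega_l) / track_width
    return v, w


def simulate(pose, omega_r, omega_l, wheel_radius, track_width, dt, steps):
    """Integrate x' = v*cos(theta), y' = v*sin(theta), theta' = w with Euler steps."""
    x, y, theta = pose
    v, w = wheel_to_body(omega_r, omega_l, wheel_radius, track_width)
    for _ in range(steps):
        x += v * math.cos(theta) * dt
        y += v * math.sin(theta) * dt
        theta += w * dt
    return x, y, theta


# Hypothetical robot: 5 cm wheel radius, 30 cm between the wheels.
R, D = 0.05, 0.3

# Equal wheel speeds drive straight ahead; opposite speeds spin in place.
straight = simulate((0.0, 0.0, 0.0), 10.0, 10.0, R, D, dt=0.01, steps=100)
spin = simulate((0.0, 0.0, 0.0), 10.0, -10.0, R, D, dt=0.01, steps=100)
```

This model treats the wheel speeds as directly commanded, which is the assumption that holds when the inner velocity loop is much faster than the outer positioning loop; it says nothing about the coupled wheel dynamics discussed in items 2 to 5.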

Contributors: Anvari, Iman (Author) / Rodriguez, Armando A (Thesis advisor) / Si, Jenni (Committee member) / Tsakalis, Konstantinos (Committee member) / Arizona State University (Publisher)

Created: 2013