Search Content

Towards energy efficient computing with Linux: enabling task level power awareness and support for energy efficient accelerator

Description

With increasing transistor volume and reducing feature size, it has become a major design constraint to reduce power consumption also. This has given rise to aggressive architectural changes for on-chip power management and rapid development to energy efficient hardware accelerators. Accordingly, the objective of this research work is to facilitate…

With increasing transistor volume and reducing feature size, it has become a major design constraint to reduce power consumption also. This has given rise to aggressive architectural changes for on-chip power management and rapid development to energy efficient hardware accelerators. Accordingly, the objective of this research work is to facilitate software developers to leverage these hardware techniques and improve energy efficiency of the system. To achieve this, I propose two solutions for Linux kernel: Optimal use of these architectural enhancements to achieve greater energy efficiency requires accurate modeling of processor power consumption. Though there are many models available in literature to model processor power consumption, there is a lack of such models to capture power consumption at the task-level. Task-level energy models are a requirement for an operating system (OS) to perform real-time power management as OS time multiplexes tasks to enable sharing of hardware resources. I propose a detailed design methodology for constructing an architecture agnostic task-level power model and incorporating it into a modern operating system to build an online task-level power profiler. The profiler is implemented inside the latest Linux kernel and validated for Intel Sandy Bridge processor. It has a negligible overhead of less than 1\% hardware resource consumption. The profiler power prediction was demonstrated for various application benchmarks from SPEC to PARSEC with less than 4\% error. I also demonstrate the importance of the proposed profiler for emerging architectural techniques through use case scenarios, which include heterogeneous computing and fine grained per-core DVFS. Along with architectural enhancement in general purpose processors to improve energy efficiency, hardware accelerators like Coarse Grain reconfigurable architecture (CGRA) are gaining popularity. Unlike vector processors, which rely on data parallelism, CGRA can provide greater flexibility and compiler level control making it more suitable for present SoC environment. To provide streamline development environment for CGRA, I propose a flexible framework in Linux to do design space exploration for CGRA. With accurate and flexible hardware models, fine grained integration with accurate architectural simulator, and Linux memory management and DMA support, a user can carry out limitless experiments on CGRA in full system environment.

ContributorsDesai, Digant Pareshkumar (Author) / Vrudhula, Sarma (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Wu, Carole-Jean (Committee member) / Arizona State University (Publisher)

Created2013

Compiler and runtime for memory management on software managed manycore processors

Description

We are expecting hundreds of cores per chip in the near future. However, scaling the memory architecture in manycore architectures becomes a major challenge. Cache coherence provides a single image of memory at any time in execution to all the cores, yet coherent cache architectures are believed will not scale…

We are expecting hundreds of cores per chip in the near future. However, scaling the memory architecture in manycore architectures becomes a major challenge. Cache coherence provides a single image of memory at any time in execution to all the cores, yet coherent cache architectures are believed will not scale to hundreds and thousands of cores. In addition, caches and coherence logic already take 20-50% of the total power consumption of the processor and 30-60% of die area. Therefore, a more scalable architecture is needed for manycore architectures. Software Managed Manycore (SMM) architectures emerge as a solution. They have scalable memory design in which each core has direct access to only its local scratchpad memory, and any data transfers to/from other memories must be done explicitly in the application using Direct Memory Access (DMA) commands. Lack of automatic memory management in the hardware makes such architectures extremely power-efficient, but they also become difficult to program. If the code/data of the task mapped onto a core cannot fit in the local scratchpad memory, then DMA calls must be added to bring in the code/data before it is required, and it may need to be evicted after its use. However, doing this adds a lot of complexity to the programmer's job. Now programmers must worry about data management, on top of worrying about the functional correctness of the program - which is already quite complex. This dissertation presents a comprehensive compiler and runtime integration to automatically manage the code and data of each task in the limited local memory of the core. We firstly developed a Complete Circular Stack Management. It manages stack frames between the local memory and the main memory, and addresses the stack pointer problem as well. Though it works, we found we could further optimize the management for most cases. Thus a Smart Stack Data Management (SSDM) is provided. In this work, we formulate the stack data management problem and propose a greedy algorithm for the same. Later on, we propose a general cost estimation algorithm, based on which CMSM heuristic for code mapping problem is developed. Finally, heap data is dynamic in nature and therefore it is hard to manage it. We provide two schemes to manage unlimited amount of heap data in constant sized region in the local memory. In addition to those separate schemes for different kinds of data, we also provide a memory partition methodology.

ContributorsBai, Ke (Author) / Shrivastava, Aviral (Thesis advisor) / Chatha, Karamvir (Committee member) / Xue, Guoliang (Committee member) / Chakrabarti, Chaitali (Committee member) / Arizona State University (Publisher)

Created2014

Adaptive learning and unsupervised clustering of immune responses using microarray random sequence peptides

Description

Immunosignaturing is a medical test for assessing the health status of a patient by applying microarrays of random sequence peptides to determine the patient's immune fingerprint by associating antibodies from a biological sample to immune responses. The immunosignature measurements can potentially provide pre-symptomatic diagnosis for infectious diseases or detection of…

Immunosignaturing is a medical test for assessing the health status of a patient by applying microarrays of random sequence peptides to determine the patient's immune fingerprint by associating antibodies from a biological sample to immune responses. The immunosignature measurements can potentially provide pre-symptomatic diagnosis for infectious diseases or detection of biological threats. Currently, traditional bioinformatics tools, such as data mining classification algorithms, are used to process the large amount of peptide microarray data. However, these methods generally require training data and do not adapt to changing immune conditions or additional patient information. This work proposes advanced processing techniques to improve the classification and identification of single and multiple underlying immune response states embedded in immunosignatures, making it possible to detect both known and previously unknown diseases or biothreat agents. Novel adaptive learning methodologies for un- supervised and semi-supervised clustering integrated with immunosignature feature extraction approaches are proposed. The techniques are based on extracting novel stochastic features from microarray binding intensities and use Dirichlet process Gaussian mixture models to adaptively cluster the immunosignatures in the feature space. This learning-while-clustering approach allows continuous discovery of antibody activity by adaptively detecting new disease states, with limited a priori disease or patient information. A beta process factor analysis model to determine underlying patient immune responses is also proposed to further improve the adaptive clustering performance by formatting new relationships between patients and antibody activity. In order to extend the clustering methods for diagnosing multiple states in a patient, the adaptive hierarchical Dirichlet process is integrated with modified beta process factor analysis latent feature modeling to identify relationships between patients and infectious agents. The use of Bayesian nonparametric adaptive learning techniques allows for further clustering if additional patient data is received. Significant improvements in feature identification and immune response clustering are demonstrated using samples from patients with different diseases.

ContributorsMalin, Anna (Author) / Papandreou-Suppappola, Antonia (Thesis advisor) / Bliss, Daniel (Committee member) / Chakrabarti, Chaitali (Committee member) / Kovvali, Narayan (Committee member) / Lacroix, Zoé (Committee member) / Arizona State University (Publisher)

Created2013

Scheduling neural sensors to estimate brain activity

Description

Research on developing new algorithms to improve information on brain functionality and structure is ongoing. Studying neural activity through dipole source localization with electroencephalography (EEG) and magnetoencephalography (MEG) sensor measurements can lead to diagnosis and treatment of a brain disorder and can also identify the area of the brain from…

Research on developing new algorithms to improve information on brain functionality and structure is ongoing. Studying neural activity through dipole source localization with electroencephalography (EEG) and magnetoencephalography (MEG) sensor measurements can lead to diagnosis and treatment of a brain disorder and can also identify the area of the brain from where the disorder has originated. Designing advanced localization algorithms that can adapt to environmental changes is considered a significant shift from manual diagnosis which is based on the knowledge and observation of the doctor, to an adaptive and improved brain disorder diagnosis as these algorithms can track activities that might not be noticed by the human eye. An important consideration of these localization algorithms, however, is to try and minimize the overall power consumption in order to improve the study and treatment of brain disorders. This thesis considers the problem of estimating dynamic parameters of neural dipole sources while minimizing the system's overall power consumption; this is achieved by minimizing the number of EEG/MEG measurements sensors without a loss in estimation performance accuracy. As the EEG/MEG measurements models are related non-linearity to the dipole source locations and moments, these dynamic parameters can be estimated using sequential Monte Carlo methods such as particle filtering. Due to the large number of sensors required to record EEG/MEG Measurements for use in the particle filter, over long period recordings, a large amounts of power is required for storage and transmission. In order to reduce the overall power consumption, two methods are proposed. The first method used the predicted mean square estimation error as the performance metric under the constraint of a maximum power consumption. The performance metric of the second method uses the distance between the location of the sensors and the location estimate of the dipole source at the previous time step; this sensor scheduling scheme results in maximizing the overall signal-to-noise ratio. The performance of both methods is demonstrated using simulated data, and both methods show that they can provide good estimation results with significant reduction in the number of activated sensors at each time step.

ContributorsMichael, Stefanos (Author) / Papandreou-Suppappola, Antonia (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Kovvali, Narayan (Committee member) / Arizona State University (Publisher)

Created2012

A co-design modeling methodology for simulation of service oriented computing systems

Description

The adoption of the Service Oriented Architecture (SOA) as the foundation for developing a new generation of software systems - known as Service Based Software Systems (SBS), poses new challenges in system design. While simulation as a methodology serves a principal role in design, there is a growing recognition that…

The adoption of the Service Oriented Architecture (SOA) as the foundation for developing a new generation of software systems - known as Service Based Software Systems (SBS), poses new challenges in system design. While simulation as a methodology serves a principal role in design, there is a growing recognition that simulation of SBS requires modeling capabilities beyond those that are developed for the traditional distributed software systems. In particular, while different component-based modeling approaches may lend themselves to simulating the logical process flows in Service Oriented Computing (SOC) systems, they are inadequate in terms of supporting SOA-compliant modeling. Furthermore, composite services must satisfy multiple QoS attributes under constrained service reconfigurations and hardware resources. A key desired capability, therefore, is to model and simulate not only the services consistent with SOA concepts and principles, but also the hardware and network components on which services must execute on. In this dissertation, SOC-DEVS - a novel co-design modeling methodology that enables simulation of software and hardware aspects of SBS for early architectural design evaluation is developed. A set of abstractions representing important service characteristics and service relationships are modeled. The proposed software/hardware co-design simulation capability is introduced into the DEVS-Suite simulator. Exemplar simulation models of a communication intensive Voice Communication System and a computation intensive Encryption System are developed and then validated using data from an existing real system. The applicability of the SOC-DEVS methodology is demonstrated in a simulation testbed aimed at facilitating the design & development of SBS. Furthermore, the simulation testbed is extended by integrating an existing prototype monitoring and adaptation system with the simulator to support basic experimentation towards design & development of Adaptive SBS.

ContributorsMuqsith, Mohammed Abdul (Author) / Sarjoughian, Hessam S. (Thesis advisor) / Yau, Sik-Sang (Thesis advisor) / Huang, Dijiang (Committee member) / Tsai, Wei-Tek (Committee member) / Arizona State University (Publisher)

Created2011

Confidentiality protection of user data and adaptive resource allocation for managing multiple workflow performance in service-based systems

Description

In this dissertation, two interrelated problems of service-based systems (SBS) are addressed: protecting users' data confidentiality from service providers, and managing performance of multiple workflows in SBS. Current SBSs pose serious limitations to protecting users' data confidentiality. Since users' sensitive data is sent in unencrypted forms to remote machines owned…

In this dissertation, two interrelated problems of service-based systems (SBS) are addressed: protecting users' data confidentiality from service providers, and managing performance of multiple workflows in SBS. Current SBSs pose serious limitations to protecting users' data confidentiality. Since users' sensitive data is sent in unencrypted forms to remote machines owned and operated by third-party service providers, there are risks of unauthorized use of the users' sensitive data by service providers. Although there are many techniques for protecting users' data from outside attackers, currently there is no effective way to protect users' sensitive data from service providers. In this dissertation, an approach is presented to protecting the confidentiality of users' data from service providers, and ensuring that service providers cannot collect users' confidential data while the data is processed or stored in cloud computing systems. The approach has four major features: (1) separation of software service providers and infrastructure service providers, (2) hiding the information of the owners of data, (3) data obfuscation, and (4) software module decomposition and distributed execution. Since the approach to protecting users' data confidentiality includes software module decomposition and distributed execution, it is very important to effectively allocate the resource of servers in SBS to each of the software module to manage the overall performance of workflows in SBS. An approach is presented to resource allocation for SBS to adaptively allocating the system resources of servers to their software modules in runtime in order to satisfy the performance requirements of multiple workflows in SBS. Experimental results show that the dynamic resource allocation approach can substantially increase the throughput of a SBS and the optimal resource allocation can be found in polynomial time

ContributorsAn, Ho Geun (Author) / Yau, Sik-Sang (Thesis advisor) / Huang, Dijiang (Committee member) / Ahn, Gail-Joon (Committee member) / Santanam, Raghu (Committee member) / Arizona State University (Publisher)

Created2012

Service oriented architecture for mobile cloud computing

Description

The Open Services Gateway initiative (OSGi) framework is a standard of module system and service platform that implements a complete and dynamic component model. Currently most of OSGi implementations are implemented by Java, which has similarities of Android language. With the emergence of Android operating system, due to the similarities…

The Open Services Gateway initiative (OSGi) framework is a standard of module system and service platform that implements a complete and dynamic component model. Currently most of OSGi implementations are implemented by Java, which has similarities of Android language. With the emergence of Android operating system, due to the similarities between Java and Android, the integration of module system and service platform from OSGi to Android system attracts more and more attention. How to make OSGi run in Android is a hot topic, further, how to find a mechanism to enable communication between OSGi and Android system is a more advanced area than simply making OSGi running in Android. This paper, which aimed to fulfill SOA (Service Oriented Architecture) and CBA (Component Based Architecture), proposed a solution on integrating Felix OSGi platform with Android system in order to build up Distributed OSGi framework between mobile phones upon XMPP protocol. And in this paper, it not only successfully makes OSGi run on Android, but also invents a mechanism that makes a seamless collaboration between these two platforms.

ContributorsDong, Xinyi (Author) / Huang, Dijiang (Thesis advisor) / Dasgupta, Partha (Committee member) / Chen, Yinong (Committee member) / Arizona State University (Publisher)

Created2012

Secure sharing of electronic medical records in cloud computing

Description

In modern healthcare environments, there is a strong need to create an infrastructure that reduces time-consuming efforts and costly operations to obtain a patient's complete medical record and uniformly integrates this heterogeneous collection of medical data to deliver it to the healthcare professionals. As a result, healthcare providers are more…

In modern healthcare environments, there is a strong need to create an infrastructure that reduces time-consuming efforts and costly operations to obtain a patient's complete medical record and uniformly integrates this heterogeneous collection of medical data to deliver it to the healthcare professionals. As a result, healthcare providers are more willing to shift their electronic medical record (EMR) systems to clouds that can remove the geographical distance barriers among providers and patient. Even though cloud-based EMRs have received considerable attention since it would help achieve lower operational cost and better interoperability with other healthcare providers, the adoption of security-aware cloud systems has become an extremely important prerequisite for bringing interoperability and efficient management to the healthcare industry. Since a shared electronic health record (EHR) essentially represents a virtualized aggregation of distributed clinical records from multiple healthcare providers, sharing of such integrated EHRs may comply with various authorization policies from these data providers. In this work, we focus on the authorized and selective sharing of EHRs among several parties with different duties and objectives that satisfies access control and compliance issues in healthcare cloud computing environments. We present a secure medical data sharing framework to support selective sharing of composite EHRs aggregated from various healthcare providers and compliance of HIPAA regulations. Our approach also ensures that privacy concerns need to be accommodated for processing access requests to patients' healthcare information. To realize our proposed approach, we design and implement a cloud-based EHRs sharing system. In addition, we describe case studies and evaluation results to demonstrate the effectiveness and efficiency of our approach.

ContributorsWu, Ruoyu (Author) / Ahn, Gail-Joon (Thesis advisor) / Yau, Stephen S. (Committee member) / Huang, Dijiang (Committee member) / Arizona State University (Publisher)

Created2012

Dynamic waveform design for track-before-detect algorithms in radar

Description

In this thesis, an adaptive waveform selection technique for dynamic target tracking under low signal-to-noise ratio (SNR) conditions is investigated. The approach is integrated with a track-before-detect (TBD) algorithm and uses delay-Doppler matched filter (MF) outputs as raw measurements without setting any threshold for extracting delay-Doppler estimates. The particle filter…

In this thesis, an adaptive waveform selection technique for dynamic target tracking under low signal-to-noise ratio (SNR) conditions is investigated. The approach is integrated with a track-before-detect (TBD) algorithm and uses delay-Doppler matched filter (MF) outputs as raw measurements without setting any threshold for extracting delay-Doppler estimates. The particle filter (PF) Bayesian sequential estimation approach is used with the TBD algorithm (PF-TBD) to estimate the dynamic target state. A waveform-agile TBD technique is proposed that integrates the PF-TBD with a waveform selection technique. The new approach predicts the waveform to transmit at the next time step by minimizing the predicted mean-squared error (MSE). As a result, the radar parameters are adaptively and optimally selected for superior performance. Based on previous work, this thesis highlights the applicability of the predicted covariance matrix to the lower SNR waveform-agile tracking problem. The adaptive waveform selection algorithm's MSE performance was compared against fixed waveforms using Monte Carlo simulations. It was found that the adaptive approach performed at least as well as the best fixed waveform when focusing on estimating only position or only velocity. When these estimates were weighted by different amounts, then the adaptive performance exceeded all fixed waveforms. This improvement in performance demonstrates the utility of the predicted covariance in waveform design, at low SNR conditions that are poorly handled with more traditional tracking algorithms.

ContributorsPiwowarski, Ryan (Author) / Papandreou-Suppappola, Antonia (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Kovvali, Narayan (Committee member) / Arizona State University (Publisher)

Created2011

Multidimensional DFT IP generators for FPGA platforms

Description

Multidimensional (MD) discrete Fourier transform (DFT) is a key kernel algorithm in many signal processing applications, such as radar imaging and medical imaging. Traditionally, a two-dimensional (2-D) DFT is computed using Row-Column (RC) decomposition, where one-dimensional (1-D) DFTs are computed along the rows followed by 1-D DFTs along the columns.…

Multidimensional (MD) discrete Fourier transform (DFT) is a key kernel algorithm in many signal processing applications, such as radar imaging and medical imaging. Traditionally, a two-dimensional (2-D) DFT is computed using Row-Column (RC) decomposition, where one-dimensional (1-D) DFTs are computed along the rows followed by 1-D DFTs along the columns. However, architectures based on RC decomposition are not efficient for large input size data which have to be stored in external memories based Synchronous Dynamic RAM (SDRAM). In this dissertation, first an efficient architecture to implement 2-D DFT for large-sized input data is proposed. This architecture achieves very high throughput by exploiting the inherent parallelism due to a novel 2-D decomposition and by utilizing the row-wise burst access pattern of the SDRAM external memory. In addition, an automatic IP generator is provided for mapping this architecture onto a reconfigurable platform of Xilinx Virtex-5 devices. For a 2048x2048 input size, the proposed architecture is 1.96 times faster than RC decomposition based implementation under the same memory constraints, and also outperforms other existing implementations. While the proposed 2-D DFT IP can achieve high performance, its output is bit-reversed. For systems where the output is required to be in natural order, use of this DFT IP would result in timing overhead. To solve this problem, a new bandwidth-efficient MD DFT IP that is transpose-free and produces outputs in natural order is proposed. It is based on a novel decomposition algorithm that takes into account the output order, FPGA resources, and the characteristics of off-chip memory access. An IP generator is designed and integrated into an in-house FPGA development platform, AlgoFLEX, for easy verification and fast integration. The corresponding 2-D and 3-D DFT architectures are ported onto the BEE3 board and their performance measured and analyzed. The results shows that the architecture can maintain the maximum memory bandwidth throughout the whole procedure while avoiding matrix transpose operations used in most other MD DFT implementations. The proposed architecture has also been ported onto the Xilinx ML605 board. When clocked at 100 MHz, 2048x2048 images with complex single-precision can be processed in less than 27 ms. Finally, transpose-free imaging flows for range-Doppler algorithm (RDA) and chirp-scaling algorithm (CSA) in SAR imaging are proposed. The corresponding implementations take advantage of the memory access patterns designed for the MD DFT IP and have superior timing performance. The RDA and CSA flows are mapped onto a unified architecture which is implemented on an FPGA platform. When clocked at 100MHz, the RDA and CSA computations with data size 4096x4096 can be completed in 323ms and 162ms, respectively. This implementation outperforms existing SAR image accelerators based on FPGA and GPU.

ContributorsYu, Chi-Li (Author) / Chakrabarti, Chaitali (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Karam, Lina (Committee member) / Cao, Yu (Committee member) / Arizona State University (Publisher)

Created2012