Search Content

A New RNS 4-moduli set for the implementation of FIR filters

Description

Residue number systems have gained significant importance in the field of high-speed digital signal processing due to their carry-free nature and speed-up provided by parallelism. The critical aspect in the application of RNS is the selection of the moduli set and the design of the conversion units. There have been…

Residue number systems have gained significant importance in the field of high-speed digital signal processing due to their carry-free nature and speed-up provided by parallelism. The critical aspect in the application of RNS is the selection of the moduli set and the design of the conversion units. There have been several RNS moduli sets proposed for the implementation of digital filters. However, some are unbalanced and some do not provide the required dynamic range. This thesis addresses the drawbacks of existing RNS moduli sets and proposes a new moduli set for efficient implementation of FIR filters. An efficient VLSI implementation model has been derived for the design of a reverse converter from RNS to the conventional two's complement representation. This model facilitates the realization of a reverse converter for better performance with less hardware complexity when compared with the reverse converter designs of the existing balanced 4-moduli sets. Experimental results comparing multiply and accumulate units using RNS that are implemented using the proposed four-moduli set with the state-of-the-art balanced four-moduli sets, show large improvements in area (46%) and power (43%) reduction for various dynamic ranges. RNS FIR filters using the proposed moduli-set and existing balanced 4-moduli set are implemented in RTL and compared for chip area and power and observed 20% improvements. This thesis also presents threshold logic implementation of the reverse converter.

ContributorsChalivendra, Gayathri (Author) / Vrudhula, Sarma (Thesis advisor) / Shrivastava, Aviral (Committee member) / Bakkaloglu, Bertan (Committee member) / Arizona State University (Publisher)

Created2011

Sparse learning package with stability selection and application to alzheimer's disease

Description

Sparse learning is a technique in machine learning for feature selection and dimensionality reduction, to find a sparse set of the most relevant features. In any machine learning problem, there is a considerable amount of irrelevant information, and separating relevant information from the irrelevant information has been a topic of…

Sparse learning is a technique in machine learning for feature selection and dimensionality reduction, to find a sparse set of the most relevant features. In any machine learning problem, there is a considerable amount of irrelevant information, and separating relevant information from the irrelevant information has been a topic of focus. In supervised learning like regression, the data consists of many features and only a subset of the features may be responsible for the result. Also, the features might require special structural requirements, which introduces additional complexity for feature selection. The sparse learning package, provides a set of algorithms for learning a sparse set of the most relevant features for both regression and classification problems. Structural dependencies among features which introduce additional requirements are also provided as part of the package. The features may be grouped together, and there may exist hierarchies and over- lapping groups among these, and there may be requirements for selecting the most relevant groups among them. In spite of getting sparse solutions, the solutions are not guaranteed to be robust. For the selection to be robust, there are certain techniques which provide theoretical justification of why certain features are selected. The stability selection, is a method for feature selection which allows the use of existing sparse learning methods to select the stable set of features for a given training sample. This is done by assigning probabilities for the features: by sub-sampling the training data and using a specific sparse learning technique to learn the relevant features, and repeating this a large number of times, and counting the probability as the number of times a feature is selected. Cross-validation which is used to determine the best parameter value over a range of values, further allows to select the best parameter value. This is done by selecting the parameter value which gives the maximum accuracy score. With such a combination of algorithms, with good convergence guarantees, stable feature selection properties and the inclusion of various structural dependencies among features, the sparse learning package will be a powerful tool for machine learning research. Modular structure, C implementation, ATLAS integration for fast linear algebraic subroutines, make it one of the best tool for a large sparse setting. The varied collection of algorithms, support for group sparsity, batch algorithms, are a few of the notable functionality of the SLEP package, and these features can be used in a variety of fields to infer relevant elements. The Alzheimer Disease(AD) is a neurodegenerative disease, which gradually leads to dementia. The SLEP package is used for feature selection for getting the most relevant biomarkers from the available AD dataset, and the results show that, indeed, only a subset of the features are required to gain valuable insights.

ContributorsThulasiram, Ramesh (Author) / Ye, Jieping (Thesis advisor) / Xue, Guoliang (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)

Created2011

Multi-task learning via structured regularization: formulations, algorithms, and applications

Description

Multi-task learning (MTL) aims to improve the generalization performance (of the resulting classifiers) by learning multiple related tasks simultaneously. Specifically, MTL exploits the intrinsic task relatedness, based on which the informative domain knowledge from each task can be shared across multiple tasks and thus facilitate the individual task learning. It…

Multi-task learning (MTL) aims to improve the generalization performance (of the resulting classifiers) by learning multiple related tasks simultaneously. Specifically, MTL exploits the intrinsic task relatedness, based on which the informative domain knowledge from each task can be shared across multiple tasks and thus facilitate the individual task learning. It is particularly desirable to share the domain knowledge (among the tasks) when there are a number of related tasks but only limited training data is available for each task. Modeling the relationship of multiple tasks is critical to the generalization performance of the MTL algorithms. In this dissertation, I propose a series of MTL approaches which assume that multiple tasks are intrinsically related via a shared low-dimensional feature space. The proposed MTL approaches are developed to deal with different scenarios and settings; they are respectively formulated as mathematical optimization problems of minimizing the empirical loss regularized by different structures. For all proposed MTL formulations, I develop the associated optimization algorithms to find their globally optimal solution efficiently. I also conduct theoretical analysis for certain MTL approaches by deriving the globally optimal solution recovery condition and the performance bound. To demonstrate the practical performance, I apply the proposed MTL approaches on different real-world applications: (1) Automated annotation of the Drosophila gene expression pattern images; (2) Categorization of the Yahoo web pages. Our experimental results demonstrate the efficiency and effectiveness of the proposed algorithms.

ContributorsChen, Jianhui (Author) / Ye, Jieping (Thesis advisor) / Kumar, Sudhir (Committee member) / Liu, Huan (Committee member) / Xue, Guoliang (Committee member) / Arizona State University (Publisher)

Created2011

Structured sparse learning and its applications to biomedical and biological data

Description

Sparsity has become an important modeling tool in areas such as genetics, signal and audio processing, medical image processing, etc. Via the penalization of l-1 norm based regularization, the structured sparse learning algorithms can produce highly accurate models while imposing various predefined structures on the data, such as feature groups…

Sparsity has become an important modeling tool in areas such as genetics, signal and audio processing, medical image processing, etc. Via the penalization of l-1 norm based regularization, the structured sparse learning algorithms can produce highly accurate models while imposing various predefined structures on the data, such as feature groups or graphs. In this thesis, I first propose to solve a sparse learning model with a general group structure, where the predefined groups may overlap with each other. Then, I present three real world applications which can benefit from the group structured sparse learning technique. In the first application, I study the Alzheimer's Disease diagnosis problem using multi-modality neuroimaging data. In this dataset, not every subject has all data sources available, exhibiting an unique and challenging block-wise missing pattern. In the second application, I study the automatic annotation and retrieval of fruit-fly gene expression pattern images. Combined with the spatial information, sparse learning techniques can be used to construct effective representation of the expression images. In the third application, I present a new computational approach to annotate developmental stage for Drosophila embryos in the gene expression images. In addition, it provides a stage score that enables one to more finely annotate each embryo so that they are divided into early and late periods of development within standard stage demarcations. Stage scores help us to illuminate global gene activities and changes much better, and more refined stage annotations improve our ability to better interpret results when expression pattern matches are discovered between genes.

ContributorsYuan, Lei (Author) / Ye, Jieping (Thesis advisor) / Wang, Yalin (Committee member) / Xue, Guoliang (Committee member) / Kumar, Sudhir (Committee member) / Arizona State University (Publisher)

Created2013

Testing of threshold logic latch based hybrid circuits

Description

The advent of threshold logic simplifies the traditional Boolean logic to the single level multi-input function. Threshold logic latch (TLL), among implementations of threshold logic, is functionally equivalent to a multi-input function with an edge triggered flip-flop, which stands out to improve area and both dynamic and leakage power consumption,…

The advent of threshold logic simplifies the traditional Boolean logic to the single level multi-input function. Threshold logic latch (TLL), among implementations of threshold logic, is functionally equivalent to a multi-input function with an edge triggered flip-flop, which stands out to improve area and both dynamic and leakage power consumption, providing an appropriate design alternative. Accordingly, the TLL standard cell library is designed. Through technology mapping, hybrid circuit is generated by absorbing the logic cone backward from each flip-flip to get the smallest remaining feeder. With the scan test methodology adopted, design for testability (DFT) is proposed, including scan element design and scan chain insertion. Test synthesis flow is then introduced, according to the Cadence tool, RTL compiler. Test application is the process of applying vectors and the response analysis, which is mainly about the testbench design. A parameterized generic self-checking Verilog testbench is designed for static fault detection. Test development refers to the fault modeling, and test generation. Firstly, functional truth table test generation on TLL cells is proposed. Before the truth table test of the threshold function, the dependence of sequence of vectors applied, i.e., the dependence of current state on the previous state, should be eliminated. Transition test (dynamic pattern) on all weak inputs is proved to be able to test the reset function, which is supposed to erase the history in the reset phase before every evaluation phase. Remaining vectors in the truth table except the weak inputs are then applied statically (static pattern). Secondly, dynamic patterns for all weak inputs are proposed to detect structural transistor level faults analyzed in the TLL cell, with single fault assumption and stuck-at faults, stuck-on faults, and stuck-open faults under consideration. Containing those patterns, the functional test covers all testable structural faults inside the TLL. Thirdly, with the scope of the whole hybrid netlist, the procedure of test generation is proposed with three steps: scan chain test; test of feeders and other scan elements except TLLs; functional pattern test of TLL cells. Implementation of this procedure is discussed in the automatic test pattern generation (ATPG) chapter.

ContributorsHu, Yang (Author) / Vrudhula, Sarma (Thesis advisor) / Barnaby, Hugh (Committee member) / Yu, Shimeng (Committee member) / Arizona State University (Publisher)

Created2013

Dynamic scheduling of stream programs on embedded multi-core processors

Description

Stream computing has emerged as an importantmodel of computation for embedded system applications particularly in the multimedia and network processing domains. In recent past several programming languages and embedded multi-core processors have been proposed for streaming applications. This thesis examines the execution and dynamic scheduling of stream programs on embedded…

Stream computing has emerged as an importantmodel of computation for embedded system applications particularly in the multimedia and network processing domains. In recent past several programming languages and embedded multi-core processors have been proposed for streaming applications. This thesis examines the execution and dynamic scheduling of stream programs on embedded multi-core processors. The thesis addresses the problem in the context of a multi-tasking environment with a time varying allocation of processing elements for a particular streaming application. As a solution the thesis proposes a two step approach where the stream program is compiled to gather key application information, and to generate re-targetable code. A light weight dynamic scheduler incorporates the second stage of the approach. The dynamic scheduler utilizes the static information and available resources to assign or partition the application across the multi-core architecture. The objective of the dynamic scheduler is to maximize the throughput of the application, and it is sensitive to the resource (processing elements, scratch-pad memory, DMA bandwidth) constraints imposed by the target architecture. We evaluate the proposed approach by compiling and scheduling benchmark stream programs on a representative embedded multi-core processor. We present experimental results that evaluate the quality of the solutions generated by the proposed approach by comparisons with existing techniques.

ContributorsLee, Haeseung (Author) / Chatha, Karamvir (Thesis advisor) / Vrudhula, Sarma (Committee member) / Chakrabarti, Chaitali (Committee member) / Wu, Carole-Jean (Committee member) / Arizona State University (Publisher)

Created2013

Stochastic optimization and real-time scheduling in cyber-physical systems

Description

A principal goal of this dissertation is to study stochastic optimization and real-time scheduling in cyber-physical systems (CPSs) ranging from real-time wireless systems to energy systems to distributed control systems. Under this common theme, this dissertation can be broadly organized into three parts based on the system environments. The first…

A principal goal of this dissertation is to study stochastic optimization and real-time scheduling in cyber-physical systems (CPSs) ranging from real-time wireless systems to energy systems to distributed control systems. Under this common theme, this dissertation can be broadly organized into three parts based on the system environments. The first part investigates stochastic optimization in real-time wireless systems, with the focus on the deadline-aware scheduling for real-time traffic. The optimal solution to such scheduling problems requires to explicitly taking into account the coupling in the deadline-aware transmissions and stochastic characteristics of the traffic, which involves a dynamic program that is traditionally known to be intractable or computationally expensive to implement. First, real-time scheduling with adaptive network coding over memoryless channels is studied, and a polynomial-time complexity algorithm is developed to characterize the optimal real-time scheduling. Then, real-time scheduling over Markovian channels is investigated, where channel conditions are time-varying and online channel learning is necessary, and the optimal scheduling policies in different traffic regimes are studied. The second part focuses on the stochastic optimization and real-time scheduling involved in energy systems. First, risk-aware scheduling and dispatch for plug-in electric vehicles (EVs) are studied, aiming to jointly optimize the EV charging cost and the risk of the load mismatch between the forecasted and the actual EV loads, due to the random driving activities of EVs. Then, the integration of wind generation at high penetration levels into bulk power grids is considered. Joint optimization of economic dispatch and interruptible load management is investigated using short-term wind farm generation forecast. The third part studies stochastic optimization in distributed control systems under different network environments. First, distributed spectrum access in cognitive radio networks is investigated by using pricing approach, where primary users (PUs) sell the temporarily unused spectrum and secondary users compete via random access for such spectrum opportunities. The optimal pricing strategy for PUs and the corresponding distributed implementation of spectrum access control are developed to maximize the PU's revenue. Then, a systematic study of the nonconvex utility-based power control problem is presented under the physical interference model in ad-hoc networks. Distributed power control schemes are devised to maximize the system utility, by leveraging the extended duality theory and simulated annealing.

ContributorsYang, Lei (Author) / Zhang, Junshan (Thesis advisor) / Tepedelenlioğlu, Cihan (Committee member) / Xue, Guoliang (Committee member) / Ying, Lei (Committee member) / Arizona State University (Publisher)

Created2012

Security and privacy in heterogeneous wireless and mobile networks: challenges and solutions

Description

The rapid advances in wireless communications and networking have given rise to a number of emerging heterogeneous wireless and mobile networks along with novel networking paradigms, including wireless sensor networks, mobile crowdsourcing, and mobile social networking. While offering promising solutions to a wide range of new applications, their widespread adoption…

The rapid advances in wireless communications and networking have given rise to a number of emerging heterogeneous wireless and mobile networks along with novel networking paradigms, including wireless sensor networks, mobile crowdsourcing, and mobile social networking. While offering promising solutions to a wide range of new applications, their widespread adoption and large-scale deployment are often hindered by people's concerns about the security, user privacy, or both. In this dissertation, we aim to address a number of challenging security and privacy issues in heterogeneous wireless and mobile networks in an attempt to foster their widespread adoption. Our contributions are mainly fivefold. First, we introduce a novel secure and loss-resilient code dissemination scheme for wireless sensor networks deployed in hostile and harsh environments. Second, we devise a novel scheme to enable mobile users to detect any inauthentic or unsound location-based top-k query result returned by an untrusted location-based service providers. Third, we develop a novel verifiable privacy-preserving aggregation scheme for people-centric mobile sensing systems. Fourth, we present a suite of privacy-preserving profile matching protocols for proximity-based mobile social networking, which can support a wide range of matching metrics with different privacy levels. Last, we present a secure combination scheme for crowdsourcing-based cooperative spectrum sensing systems that can enable robust primary user detection even when malicious cognitive radio users constitute the majority.

ContributorsZhang, Rui (Author) / Zhang, Yanchao (Thesis advisor) / Duman, Tolga Mete (Committee member) / Xue, Guoliang (Committee member) / Zhang, Junshan (Committee member) / Arizona State University (Publisher)

Created2013

Unified framework for energy-proportional computing in multicore processors: novel algorithms and practical implementation

Description

Multicore processors have proliferated in nearly all forms of computing, from servers, desktop, to smartphones. The primary reason for this large adoption of multicore processors is due to its ability to overcome the power-wall by providing higher performance at a lower power consumption rate. With multi-cores, there is increased need…

Multicore processors have proliferated in nearly all forms of computing, from servers, desktop, to smartphones. The primary reason for this large adoption of multicore processors is due to its ability to overcome the power-wall by providing higher performance at a lower power consumption rate. With multi-cores, there is increased need for dynamic energy management (DEM), much more than for single-core processors, as DEM for multi-cores is no more a mechanism just to ensure that a processor is kept under specified temperature limits, but also a set of techniques that manage various processor controls like dynamic voltage and frequency scaling (DVFS), task migration, fan speed, etc. to achieve a stated objective. The objectives span a wide range from maximizing throughput, minimizing power consumption, reducing peak temperature, maximizing energy efficiency, maximizing processor reliability, and so on, along with much more wider constraints of temperature, power, timing, and reliability constraints. Thus DEM can be very complex and challenging to achieve. Since often times many DEMs operate together on a single processor, there is a need to unify various DEM techniques. This dissertation address such a need. In this work, a framework for DEM is proposed that provides a unifying processor model that includes processor power, thermal, timing, and reliability models, supports various DEM control mechanisms, many different objective functions along with equally diverse constraint specifications. Using the framework, a range of novel solutions is derived for instances of DEM problems, that include maximizing processor performance, energy efficiency, or minimizing power consumption, peak temperature under constraints of maximum temperature, memory reliability and task deadlines. Finally, a robust closed-loop controller to implement the above solutions on a real processor platform with a very low operational overhead is proposed. Along with the controller design, a model identification methodology for obtaining the required power and thermal models for the controller is also discussed. The controller is architecture independent and hence easily portable across many platforms. The controller has been successfully deployed on Intel Sandy Bridge processor and the use of the controller has increased the energy efficiency of the processor by over 30%

ContributorsHanumaiah, Vinay (Author) / Vrudhula, Sarma (Thesis advisor) / Chatha, Karamvir (Committee member) / Chakrabarti, Chaitali (Committee member) / Rodriguez, Armando (Committee member) / Askin, Ronald (Committee member) / Arizona State University (Publisher)

Created2013

Towards energy efficient computing with Linux: enabling task level power awareness and support for energy efficient accelerator

Description

With increasing transistor volume and reducing feature size, it has become a major design constraint to reduce power consumption also. This has given rise to aggressive architectural changes for on-chip power management and rapid development to energy efficient hardware accelerators. Accordingly, the objective of this research work is to facilitate…

With increasing transistor volume and reducing feature size, it has become a major design constraint to reduce power consumption also. This has given rise to aggressive architectural changes for on-chip power management and rapid development to energy efficient hardware accelerators. Accordingly, the objective of this research work is to facilitate software developers to leverage these hardware techniques and improve energy efficiency of the system. To achieve this, I propose two solutions for Linux kernel: Optimal use of these architectural enhancements to achieve greater energy efficiency requires accurate modeling of processor power consumption. Though there are many models available in literature to model processor power consumption, there is a lack of such models to capture power consumption at the task-level. Task-level energy models are a requirement for an operating system (OS) to perform real-time power management as OS time multiplexes tasks to enable sharing of hardware resources. I propose a detailed design methodology for constructing an architecture agnostic task-level power model and incorporating it into a modern operating system to build an online task-level power profiler. The profiler is implemented inside the latest Linux kernel and validated for Intel Sandy Bridge processor. It has a negligible overhead of less than 1\% hardware resource consumption. The profiler power prediction was demonstrated for various application benchmarks from SPEC to PARSEC with less than 4\% error. I also demonstrate the importance of the proposed profiler for emerging architectural techniques through use case scenarios, which include heterogeneous computing and fine grained per-core DVFS. Along with architectural enhancement in general purpose processors to improve energy efficiency, hardware accelerators like Coarse Grain reconfigurable architecture (CGRA) are gaining popularity. Unlike vector processors, which rely on data parallelism, CGRA can provide greater flexibility and compiler level control making it more suitable for present SoC environment. To provide streamline development environment for CGRA, I propose a flexible framework in Linux to do design space exploration for CGRA. With accurate and flexible hardware models, fine grained integration with accurate architectural simulator, and Linux memory management and DMA support, a user can carry out limitless experiments on CGRA in full system environment.

ContributorsDesai, Digant Pareshkumar (Author) / Vrudhula, Sarma (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Wu, Carole-Jean (Committee member) / Arizona State University (Publisher)

Created2013