This collection includes most of the ASU Theses and Dissertations from 2011 to present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about the dissertations/theses includes degree information, committee members, an abstract, and supporting data or media.

In addition to the electronic theses found in the ASU Digital Repository, ASU Theses and Dissertations can be found in the ASU Library Catalog.

Dissertations and Theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection, visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.

Description
Developing new non-traditional device models is gaining popularity as silicon-based electrical devices approach their scaling limits. Membrane systems, also called P systems, are a new class of computation models inspired by the way cells process chemical signals. Spiking Neural P systems (SNP systems), a particular kind of membrane system, are inspired by the way neurons in the brain interact using electrical spikes. Compared to traditional Boolean logic, SNP systems not only perform similar functions but also offer a more promising path to reliable computation. Two basic neuron types, Low Pass (LP) neurons and High Pass (HP) neurons, are introduced. These two basic types are capable of building an arbitrary SNP neuron, which leads to the conclusion that they are Turing complete, since SNP systems have been proved Turing complete. The two basic neuron types are then used as elements to construct general-purpose arithmetic circuits such as adders, subtractors, and comparators. In this thesis, erroneous behaviors of neurons are discussed. Transmission error (spike loss) is proved to be equivalent to threshold error, which makes the threshold-error analysis more general. To improve reliability, a new structure called a motif is proposed. Compared to a Triple Modular Redundancy (TMR) improvement, the motif design demonstrates its efficiency and effectiveness in both single-neuron and arithmetic-circuit analyses. DRAM-based CMOS circuits are used to implement the two basic neuron types, and their functionality is verified using SPICE simulations. The motif-improved adder and comparator, compared to conventional Boolean logic designs, are much more reliable, with lower leakage and smaller silicon area. This leads to the conclusion that SNP systems could provide a more promising solution for reliable computation than conventional Boolean logic.
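
As a rough illustration of the threshold-firing neurons and the redundancy comparison described in this abstract, the Python sketch below models toy LP/HP neurons and measures how a three-way majority vote (TMR-style) suppresses random threshold errors. The firing rules and error model are illustrative assumptions, not the thesis's exact SNP formalism or motif construction.

```python
# Toy spiking-neural-P-system neurons with a threshold firing rule.
# The LP/HP semantics (fire on weak vs. strong input) are an
# illustrative assumption, not the thesis's exact definitions.
import random

def snp_neuron(spikes, threshold, kind="HP"):
    """Return 1 if the neuron fires on this time step, else 0."""
    total = sum(spikes)                       # spikes received this step
    if kind == "HP":                          # high-pass: fire on strong input
        return 1 if total >= threshold else 0
    return 1 if total <= threshold else 0     # low-pass: fire on weak input

def noisy(fire, p_err):
    """Flip an output with probability p_err (models threshold error)."""
    return fire ^ 1 if random.random() < p_err else fire

def tmr(spikes, threshold, p_err):
    """Majority vote over three noisy copies of the same neuron."""
    votes = [noisy(snp_neuron(spikes, threshold), p_err) for _ in range(3)]
    return 1 if sum(votes) >= 2 else 0

if __name__ == "__main__":
    random.seed(0)
    trials, p_err = 10_000, 0.1
    clean = snp_neuron([1, 1, 0], 2)
    errs = sum(tmr([1, 1, 0], 2, p_err) != clean for _ in range(trials))
    print(f"voting error rate ~ {errs / trials:.3f} (single neuron: {p_err})")
```

With a 10% per-neuron error rate, majority voting lands near the expected 3(0.1)²(0.9) + (0.1)³ ≈ 2.8% output error; the motif structures in the thesis aim to beat this kind of redundancy at lower cost.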
Contributors: An, Pei (Author) / Cao, Yu (Thesis advisor) / Barnaby, Hugh (Committee member) / Chakrabarti, Chaitali (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
With increasing transistor counts and shrinking feature sizes, reducing power consumption has become a major design constraint. This has given rise to aggressive architectural changes for on-chip power management and rapid development of energy-efficient hardware accelerators. Accordingly, the objective of this research is to enable software developers to leverage these hardware techniques and improve the energy efficiency of the system. To achieve this, I propose two solutions for the Linux kernel. Optimal use of these architectural enhancements to achieve greater energy efficiency requires accurate modeling of processor power consumption. Though many models in the literature describe processor power consumption, there is a lack of models that capture power consumption at the task level. Task-level energy models are a requirement for an operating system (OS) to perform real-time power management, as the OS time-multiplexes tasks to enable sharing of hardware resources. I propose a detailed design methodology for constructing an architecture-agnostic task-level power model and incorporating it into a modern operating system to build an online task-level power profiler. The profiler is implemented inside the latest Linux kernel and validated for an Intel Sandy Bridge processor. It has a negligible overhead of less than 1% of hardware resource consumption, and its power predictions were demonstrated for application benchmarks from SPEC to PARSEC with less than 4% error. I also demonstrate the importance of the proposed profiler for emerging architectural techniques through use-case scenarios, which include heterogeneous computing and fine-grained per-core DVFS. Along with architectural enhancements in general-purpose processors to improve energy efficiency, hardware accelerators like the Coarse-Grained Reconfigurable Architecture (CGRA) are gaining popularity. Unlike vector processors, which rely on data parallelism, CGRAs provide greater flexibility and compiler-level control, making them more suitable for the present SoC environment. To provide a streamlined development environment for CGRAs, I propose a flexible framework in Linux for CGRA design-space exploration. With accurate and flexible hardware models, fine-grained integration with an accurate architectural simulator, and Linux memory management and DMA support, a user can carry out virtually limitless experiments on CGRAs in a full-system environment.
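
A common way to build such a task-level power model, and one plausible reading of the approach sketched in this abstract, is to regress measured power against per-task hardware performance-counter rates. The minimal sketch below fits that kind of linear model with least squares; the counter set, data, and linear form are assumptions for illustration, not the dissertation's actual model or kernel integration.

```python
# Sketch: fit package power as a linear function of per-task
# performance-counter rates. Data and counter choice are hypothetical.
import numpy as np

# Hypothetical samples: (instructions/s, LLC misses/s, memory accesses/s)
# with the package power measured at the same time.
X = np.array([
    [2.1e9, 1.2e6, 3.0e7],
    [1.4e9, 8.0e5, 2.2e7],
    [3.0e9, 2.5e6, 5.1e7],
    [0.9e9, 3.0e5, 1.0e7],
])
watts = np.array([28.5, 21.0, 36.2, 16.8])

# Least-squares fit of power ~ w0 + w . counter_rates:
A = np.hstack([np.ones((X.shape[0], 1)), X])
coef, *_ = np.linalg.lstsq(A, watts, rcond=None)

def task_power(counter_rates):
    """Predict one task's share of power from its counter rates."""
    return coef[0] + coef[1:] @ np.asarray(counter_rates)

print(f"predicted task power: {task_power([2.0e9, 1.0e6, 2.8e7]):.1f} W")
```

In a kernel implementation the counters would be sampled per task at context-switch boundaries, which is what keeps an online profiler's runtime overhead low.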
Contributors: Desai, Digant Pareshkumar (Author) / Vrudhula, Sarma (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Wu, Carole-Jean (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
GaN high electron mobility transistors (HEMTs) based on the III-V nitride material system have been under extensive investigation because of their superb performance as high-power RF devices. A two-dimensional electron gas (2-DEG) with a charge density ten times higher than that of GaAs-based HEMTs and a mobility much higher than that of Si enables the low on-resistance required for RF devices. Self-heating issues in GaN HEMTs and an incomplete understanding of various phenomena are hindering their widespread commercial development. There is a need to understand device operation by developing a model that can be used to optimize the electrical and thermal characteristics of GaN HEMT designs for high-power and high-frequency operation. In this thesis, a physical simulation model of an AlGaN/GaN HEMT is developed using the commercially available ATLAS software from Silvaco, based on the energy-balance/hydrodynamic carrier transport equations. The model is calibrated against experimental data. Transfer and output characteristics, along with the saturation drain current, are the key focus of the analysis. The resulting I-V curves show a close correspondence with experimental results; various combinations of electron mobility, velocity saturation, momentum and energy relaxation times, and gate work functions were attempted to improve the I-V correlation. Thermal effects were also investigated to better understand the role of self-heating in the electrical characteristics of GaN HEMTs. Temperature profiles across the device were observed, and hot spots were found along the channel in the gate-drain spacing. These preliminary results indicate that thermal effects do impact the electrical device characteristics at large biases, even though the amount of self-heating is underestimated relative to thermal particle-based simulations that also solve the energy-balance equations for acoustic and optical phonons (and thus properly account for the formation of the hot spot). The decrease in drain current is due to the decrease in saturation carrier velocity. The necessity of including hydrodynamic/energy-balance transport models for accurate simulations is demonstrated, and possible ways of improving model accuracy are discussed in conjunction with future research.
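
To make the self-heating feedback concrete, the toy loop below iterates a zeroth-order electro-thermal model to a self-consistent operating point: dissipated power raises the lattice temperature, which lowers the saturation velocity and hence the drain current. The velocity model, thermal resistance, sheet density, and bias are illustrative assumptions, not calibrated GaN HEMT parameters.

```python
# Zeroth-order electro-thermal feedback for a velocity-saturated HEMT
# channel. All numbers are illustrative assumptions, not GaN-calibrated.

def vsat(T):
    """Saturation velocity in cm/s, with an assumed power-law roll-off."""
    return 1.9e7 * (300.0 / T) ** 0.8

def drain_current(T, n_s=1.0e13, width=0.01):
    """I_D ~ q * n_s * v_sat(T) * W for a fully saturated channel.
    Sheet density n_s in cm^-2, gate width in cm."""
    q = 1.602e-19
    return q * n_s * vsat(T) * width        # amps

R_TH, V_DS, T = 40.0, 20.0, 300.0           # K/W, volts, starting temp (K)

for _ in range(50):                         # self-consistent iteration
    I = drain_current(T)
    T_new = 300.0 + R_TH * I * V_DS         # dissipated power heats channel
    if abs(T_new - T) < 1e-3:
        break
    T = T_new

print(f"converged: T ~ {T:.0f} K, I_D ~ {1e3 * I:.0f} mA")
```

With these assumed values the loop settles near 470 K with roughly a 30% drop in drain current from the isothermal value, the qualitative signature the abstract attributes to reduced saturation velocity.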
Contributors: Chowdhury, Towhid (Author) / Vasileska, Dragica (Thesis advisor) / Goodnick, Stephen (Committee member) / Goryll, Michael (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
Distributed inference has applications in a wide range of fields such as source localization, target detection, environment monitoring, and healthcare. In this dissertation, distributed inference schemes that use bounded transmit power are considered, and the performance of the proposed schemes is studied for a variety of inference problems. In the first part of the dissertation, a distributed detection scheme where the sensors transmit with constant-modulus signals over a Gaussian multiple access channel is considered. The deflection coefficient of the proposed scheme is shown to depend on the characteristic function of the sensing noise, and the error exponent for the system is derived using large deviation theory. Optimization of the deflection coefficient and error exponent with respect to a transmission phase parameter is considered for a variety of sensing noise distributions, including impulsive ones. The proposed scheme also compares favorably with existing amplify-and-forward (AF) and detect-and-forward (DF) schemes. The effect of fading is shown to be detrimental to the detection performance, and simulations corroborate the analytical results. The second part of the dissertation studies a distributed inference scheme that uses bounded transmission functions over a Gaussian multiple access channel. The conditions on the transmission functions under which consistent estimation and reliable detection are possible are characterized. For the distributed estimation problem, an estimation scheme that uses bounded transmission functions is proved to be strongly consistent, provided that the variance of the noise samples is bounded and the transmission function is one-to-one. The proposed estimation scheme is compared with the amplify-and-forward technique and its robustness to impulsive sensing noise distributions is highlighted. It is also shown that bounded transmissions suffer from inconsistent estimates if the sensing noise variance goes to infinity. For the distributed detection problem, similar results are obtained by studying the deflection coefficient, and simulations corroborate the analytical results. In the third part of the dissertation, the problem of estimating the average of samples distributed at the nodes of a sensor network is considered. A distributed average consensus algorithm in which every sensor transmits with bounded peak power is proposed. In the presence of communication noise, it is shown that the nodes reach consensus asymptotically to a finite random variable whose expectation is the desired sample average of the initial observations, with a variance that depends on the step size of the algorithm and the variance of the communication noise. The asymptotic performance is characterized by deriving the asymptotic covariance matrix using results from stochastic approximation theory. It is shown that using bounded transmissions results in slower convergence compared to the linear consensus algorithm based on the Laplacian heuristic. Finally, a robust distributed average consensus algorithm in which every sensor performs nonlinear processing at the receiver is proposed. It is shown that the nonlinearity at the receiver nodes makes the algorithm robust to a wide range of channel noise distributions, including impulsive ones; the nodes reach consensus asymptotically and results similar to the transmit-nonlinearity case are obtained. Simulations corroborate these analytical findings and highlight the robustness of the proposed algorithm.
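
As a worked illustration of how detection performance tracks the sensing-noise characteristic function, the following Monte Carlo sketch estimates the deflection coefficient of a toy constant-modulus scheme over a Gaussian multiple access channel, for Gaussian and impulsive (Cauchy) sensing noise. The test statistic, hypotheses, and parameters are assumptions chosen for illustration, not the dissertation's exact formulation.

```python
# Monte Carlo deflection coefficient for a toy constant-modulus scheme:
# sensor i observes theta + n_i, transmits exp(j*omega*(theta + n_i)),
# and the fusion center coherently sums the received signals.
import numpy as np

rng = np.random.default_rng(1)
K, trials, omega = 50, 20_000, 1.0

def statistic(theta, noise):
    """Real part of the fusion-center sum for one batch of trials."""
    n = noise((trials, K))                        # sensing noise samples
    y = np.exp(1j * omega * (theta + n)).sum(axis=1)
    y += rng.normal(size=trials) + 1j * rng.normal(size=trials)  # channel
    return y.real

def deflection(noise):
    t0 = statistic(0.0, noise)                    # under H0: theta = 0
    t1 = statistic(1.0, noise)                    # under H1: theta = 1
    return (t1.mean() - t0.mean()) ** 2 / t0.var()

gauss = lambda size: rng.normal(scale=1.0, size=size)
cauchy = lambda size: rng.standard_cauchy(size=size)   # impulsive noise

print(f"D, Gaussian sensing noise: {deflection(gauss):.1f}")
print(f"D, Cauchy sensing noise:   {deflection(cauchy):.1f}")
```

With these settings the Gaussian case yields the larger deflection, but the Cauchy case remains finite and usable, which is the qualitative robustness advantage bounded constant-modulus transmissions have over amplify-and-forward under impulsive noise.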
Contributors: Dasarathan, Sivaraman (Author) / Tepedelenlioğlu, Cihan (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Reisslein, Martin (Committee member) / Goryll, Michael (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
Multicore processors have proliferated in nearly all forms of computing, from servers and desktops to smartphones. The primary reason for this broad adoption is their ability to overcome the power wall by providing higher performance at lower power consumption. With multicores there is an increased need for dynamic energy management (DEM), much more than for single-core processors, as DEM for multicores is no longer just a mechanism to keep a processor under specified temperature limits, but a set of techniques that manage processor controls such as dynamic voltage and frequency scaling (DVFS), task migration, and fan speed to achieve a stated objective. The objectives span a wide range, from maximizing throughput, minimizing power consumption, and reducing peak temperature to maximizing energy efficiency and processor reliability, all under constraints on temperature, power, timing, and reliability. DEM can therefore be very complex and challenging to achieve, and since many DEM mechanisms often operate together on a single processor, there is a need to unify the various techniques. This dissertation addresses that need. A framework for DEM is proposed that provides a unifying processor model including power, thermal, timing, and reliability models, and that supports various DEM control mechanisms, many different objective functions, and equally diverse constraint specifications. Using the framework, a range of novel solutions is derived for instances of DEM problems, including maximizing processor performance or energy efficiency and minimizing power consumption or peak temperature under constraints of maximum temperature, memory reliability, and task deadlines. Finally, a robust closed-loop controller that implements these solutions on a real processor platform with very low operational overhead is proposed. Along with the controller design, a model-identification methodology for obtaining the power and thermal models required by the controller is discussed. The controller is architecture-independent and hence easily portable across platforms; it has been successfully deployed on an Intel Sandy Bridge processor, where its use increased the energy efficiency of the processor by over 30%.
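
The closed-loop idea can be sketched in a few lines: pick the highest DVFS state whose predicted steady-state temperature respects the cap, given the current workload activity. The first-order thermal model, power model, and all constants below are illustrative assumptions, far simpler than the dissertation's unified framework.

```python
# Minimal DVFS selection under a temperature cap (all values assumed).

FREQS = [1.2, 1.6, 2.0, 2.4, 2.8, 3.4]   # available P-states, GHz
T_CAP = 80.0                              # temperature limit, deg C
T_AMB = 45.0                              # idle die temperature (assumed)
R_TH  = 2.5                               # thermal resistance, K/W (assumed)

def power(f_ghz, activity):
    """Toy power model: dynamic term ~ activity * f^3, plus leakage."""
    return activity * 0.9 * f_ghz ** 3 + 4.0

def pick_frequency(activity):
    """Highest P-state whose predicted steady temperature respects T_CAP."""
    for f in reversed(FREQS):
        if T_AMB + R_TH * power(f, activity) <= T_CAP:
            return f
    return FREQS[0]                       # nothing fits: throttle to floor

for act in (0.3, 0.8, 1.0):               # per-epoch measured core activity
    print(f"activity {act:.1f} -> {pick_frequency(act):.1f} GHz")
```

A production controller would close the loop on measured temperature and power rather than a static model, but the pick-the-highest-feasible-state structure captures the essential idea.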
Contributors: Hanumaiah, Vinay (Author) / Vrudhula, Sarma (Thesis advisor) / Chatha, Karamvir (Committee member) / Chakrabarti, Chaitali (Committee member) / Rodriguez, Armando (Committee member) / Askin, Ronald (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
This research emphasizes the use of low-energy, low-temperature post-processing to improve the performance and lifetime of thin films and thin film transistors, by applying the fundamentals of how materials interact with conductive heating and electromagnetic radiation. A single-frequency microwave anneal is used to rapidly recrystallize the damage induced by ion implantation in Si substrates. Volumetric heating of the sample in the microwave field promotes rapid absorption of radiation and recrystallization at the amorphous-crystalline interface, as well as electrical activation of the dopants as they relocate to substitutional sites. Structural and electrical characterization confirm recrystallization of heavily implanted Si within a 40-second anneal, with minimal dopant diffusion compared to rapid-thermal-annealed samples. Using microwave annealing to improve the performance of multilayer thin-film devices such as thin film transistors (TFTs) requires extensive study of how the individual layers interact with electromagnetic radiation. This issue is addressed by developing a detailed understanding of the thin films and interfaces in TFTs through extensive stress testing of their reliability and failure mechanisms. Electrical and ambient stresses, including illumination, thermal, and mechanical stresses, are applied to mixed-oxide-based thin film transistors, which are of interest because of the high mobilities of the mixed-oxide (indium zinc oxide, indium gallium zinc oxide) channel material. A semiconductor parameter analyzer is employed to measure transfer characteristics, from which the mobility, subthreshold, and threshold-voltage parameters of the transistors are derived. As a preliminary step, low-temperature post-processing anneals compatible with polymer substrates are performed at 150 °C in several ambients (oxygen, forming gas, and vacuum). Analysis of the results before and after the low-temperature anneals, grounded in device-physics fundamentals, helps categorize the defects leading to failure or degradation as oxygen vacancies, thermally activated defects within the bandgap, channel-dielectric interface defects, and acceptor-like or donor-like trap states. Microwave annealing has been confirmed to enhance the quality of thin films; future work entails extending the use of electromagnetic radiation in controlled ambients to provide quick post-fabrication anneals that improve the functionality and lifetime of these low-temperature-fabricated TFTs.
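
As an illustration of how mobility and threshold voltage are derived from measured transfer characteristics, the sketch below fits the standard square-law saturation model, in which √I_D is linear in V_GS. The geometry, dielectric capacitance, and sweep data are hypothetical values for illustration, not measurements from this work.

```python
# Threshold-voltage and mobility extraction from a TFT transfer curve,
# using the square law I_D = (W/(2L)) * mu * C_ox * (V_GS - V_T)^2.
import numpy as np

W, L = 100e-6, 10e-6     # channel width and length, m (assumed)
C_OX = 3.45e-4           # gate dielectric capacitance, F/m^2 (assumed)

# Hypothetical saturation-region transfer sweep:
v_gs = np.array([4.0, 5.0, 6.0, 7.0, 8.0, 9.0])           # volts
i_d  = np.array([0.4, 1.7, 3.9, 7.0, 10.9, 15.8]) * 1e-6  # amps

# sqrt(I_D) = sqrt(W mu C_ox / 2L) * (V_GS - V_T), so fit a line:
slope, intercept = np.polyfit(v_gs, np.sqrt(i_d), 1)
v_t = -intercept / slope                 # x-intercept gives V_T
mu  = slope ** 2 * 2 * L / (W * C_OX)    # saturation mobility, m^2/Vs

print(f"V_T ~ {v_t:.2f} V, mu ~ {1e4 * mu:.1f} cm^2/Vs")
```

With the synthetic sweep above, the fit returns a threshold voltage near 3 V and a mobility of a few cm²/Vs; shifts in these extracted values before and after stress are exactly what the degradation analysis tracks.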
Contributors: Vemuri, Rajitha (Author) / Alford, Terry L. (Thesis advisor) / Theodore, N David (Committee member) / Goryll, Michael (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
Carrier lifetime is one of the few parameters that can give information about the low defect densities in today's semiconductors; in principle there is no lower limit to the defect density determined by lifetime measurements, and no other technique can easily detect defect densities as low as 10⁹-10¹⁰ cm⁻³ in a simple, contactless, room-temperature measurement. In practice, however, recombination lifetime (τr) measurements such as photoconductance decay (PCD) and surface photovoltage (SPV), which are widely used for characterization of bulk wafers, face serious limitations when applied to thin epitaxial layers, where the layer thickness is smaller than the minority carrier diffusion length Ln. Other methods such as microwave photoconductance decay (µ-PCD), photoluminescence (PL), and frequency-dependent SPV, in which the generated excess carriers are confined to the epitaxial layer by using short excitation wavelengths, require complicated configurations and extensive surface-passivation processes that make them time-consuming and unsuitable for process screening. Generation lifetime (τg), typically measured with pulsed MOS capacitors (MOS-C) as test structures, has been shown to be eminently suitable for characterizing thin epitaxial layers. It is for these reasons that the IC community, largely concerned with unipolar MOS devices, uses lifetime measurements as a "process cleanliness monitor." When dealing with ultraclean epitaxial wafers, however, the classic MOS-C technique measures an effective generation lifetime τg,eff that is dominated by surface generation and hence cannot be used for screening impurity densities. I have developed a modified pulsed-MOS technique for measuring the generation lifetime in ultraclean thin p/p+ epitaxial layers that can be used to detect metallic impurities with densities as low as 10¹⁰ cm⁻³. The widely used classic version is shown to be unable to detect such low impurity densities effectively because surface generation dominates, whereas the modified version serves well as a metallic-impurity-density monitoring tool in such cases.
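
For reference, the classic pulsed MOS-C extraction that this work modifies is usually done with a Zerbst analysis: after a depleting voltage pulse, the capacitance recovery transient C(t) is recorded, and the generation lifetime follows from the slope of -d(Cox/C)²/dt plotted against (CF/C - 1). The sketch below implements that textbook analysis on a synthetic transient; the constants, the transient, and the exact slope relation should all be treated as illustrative assumptions, and the thesis's modified technique differs precisely in how the surface-generation component is separated out.

```python
# Classic Zerbst analysis on a synthetic pulsed MOS-C transient.
import numpy as np

n_i  = 9.65e9      # Si intrinsic carrier density at 300 K, cm^-3
N_A  = 1.0e15      # epilayer doping, cm^-3 (assumed)
C_OX = 100e-12     # oxide capacitance, F (assumed)
C_F  = 30e-12      # final equilibrium capacitance, F (assumed)

def zerbst_lifetime(t, C):
    """Generation lifetime from a pulsed MOS-C recovery transient C(t)."""
    y = -np.gradient((C_OX / C) ** 2, t)   # -d(Cox/C)^2/dt
    x = C_F / C - 1.0                      # generation drive term
    slope = np.polyfit(x, y, 1)[0]         # slope of the Zerbst plot
    return 2.0 * n_i * C_OX / (N_A * C_F * slope)

# Synthetic recovery transient standing in for measured data:
t = np.linspace(0.0, 40.0, 400)            # seconds
C = C_F - 15e-12 * np.exp(-t / 8.0)        # deep depletion -> C_F
print(f"extracted tau_g ~ {zerbst_lifetime(t, C) * 1e6:.0f} us")
```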
Contributors: Elhami Khorasani, Arash (Author) / Alford, Terry (Thesis advisor) / Goryll, Michael (Committee member) / Bertoni, Mariana (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
Adaptive processing and classification of electrocardiogram (ECG) signals are important in eliminating the strenuous process of manually annotating ECG recordings for clinical use. Such algorithms require robust models whose parameters can adequately describe the ECG signals. Although different dynamic statistical models describing ECG signals currently exist, they depend considerably on a priori information and user-specified model parameters. Also, ECG beat morphologies, which vary greatly across patients and disease states, cannot be uniquely characterized by a single model. In this work, sequential Bayesian methods are used to appropriately model ECG signals and to adaptively select the corresponding model parameters. An adaptive framework based on a sequential Bayesian tracking method is proposed to adaptively select the cardiac parameters that minimize the estimation error, thus precluding the need for pre-processing. Simulations using real ECG data from the online PhysioNet database demonstrate the improvement in performance of the proposed algorithm in accurately estimating critical heart-disease parameters. In addition, two new approaches to ECG modeling are presented using the interacting multiple model and the sequential Markov chain Monte Carlo technique with adaptive model selection. Both methods can adaptively choose between different models for various ECG beat morphologies without requiring prior ECG information, as demonstrated on real ECG signals. A supervised Bayesian maximum-likelihood (ML) classifier uses the estimated model parameters to classify different types of cardiac arrhythmias. However, the non-availability of sufficient amounts of representative training data and the large inter-patient variability pose a challenge to existing supervised learning algorithms, resulting in poor classification performance. In addition, recently developed unsupervised learning methods require a priori knowledge of the number of diseases to cluster the ECG data, a number which often evolves over time. To address these issues, an adaptive-learning ECG classification method that uses Dirichlet process Gaussian mixture models is proposed. This approach places no restriction on the number of disease classes, nor does it require any training data. The algorithm is made patient-specific by labeling the generated mixtures using the Bayesian ML method, assuming the availability of labeled training data.
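
As a small stand-in for the Dirichlet-process approach described above, the sketch below clusters synthetic two-dimensional beat features with scikit-learn's truncated Dirichlet-process Gaussian mixture, which likewise avoids fixing the number of classes in advance. The features and cluster layout are invented for illustration; the dissertation's adaptive, patient-specific algorithm is considerably more involved.

```python
# Dirichlet-process clustering of ECG beat features without fixing the
# number of disease classes, using scikit-learn as an off-the-shelf
# stand-in. All data here are synthetic.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(7)
# Hypothetical 2-D beat features (e.g., QRS width, R-R interval),
# drawn from three morphology clusters the model is not told about:
beats = np.vstack([
    rng.normal([0.08, 0.80], 0.01, size=(200, 2)),   # normal beats
    rng.normal([0.14, 0.60], 0.01, size=(80, 2)),    # PVC-like beats
    rng.normal([0.10, 1.10], 0.01, size=(40, 2)),    # bradycardic beats
])

dpgmm = BayesianGaussianMixture(
    n_components=10,                                  # truncation level
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
).fit(beats)

labels = dpgmm.predict(beats)
print("clusters actually used:", np.unique(labels).size)
```

The truncation level only bounds the search; the Dirichlet-process prior lets the posterior concentrate on however many components the data support, which is the property the abstract exploits for evolving disease classes.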
Contributors: Edla, Shwetha Reddy (Author) / Papandreou-Suppappola, Antonia (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Kovvali, Narayan (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Arizona State University (Publisher)
Created: 2012
Description
Photodetectors in the 1.7 to 4.0 μm range are being commercially developed on InP substrates to meet the needs of longer-wavelength applications such as thermal and medical sensing. Currently, these devices utilize high-indium-content metamorphic Ga1-xInxAs (x > 0.53) layers to extend the wavelength range beyond the 1.7 μm achievable with lattice-matched GaInAs. The large lattice mismatch required to reach the extended wavelengths results in photodetector materials that contain a large number of misfit dislocations. The low quality of these materials produces a large nonradiative Shockley-Read-Hall generation/recombination rate that manifests as an undesirably large thermal noise level in these photodetectors. This work focuses on utilizing band-structure engineering methods to design more efficient devices on InP substrates. One promising way to improve photodetector performance at the extended wavelengths is to utilize lattice-matched GaInAs/GaAsSb structures with a type-II band alignment, where the ground-state transition energy of the superlattice is smaller than the bandgap of either constituent material. Over the extended wavelength range of 2 to 3 μm, this superlattice structure has an optimal period thickness of 3.4 to 5.2 nm and a wavefunction overlap of 0.8 to 0.4, respectively. In using a type-II superlattice to extend the cutoff wavelength, there is a tradeoff between the wavelength reached and the electron-hole wavefunction overlap, and hence the absorption coefficient achieved. This tradeoff and the resulting reduction in performance can be overcome by two methods: adding bismuth to the type-II material system, or straining both layers to attain a strain-balanced condition. Both allow the valence-band alignment, and hence the wavefunction overlap, to be tuned independently of the wavelength cutoff. By adding 3% bismuth to the GaInAs constituent, the resulting lattice-matched Ga0.516In0.484As0.970Bi0.030/GaAs0.511Sb0.489 superlattice realizes a 50% larger absorption coefficient; similar results can be achieved under the strain-balanced condition with strain limited to 1.9% in either layer. The optimal design rules derived from these possibilities make it feasible to select the superlattice period thickness with the best absorption coefficient for any cutoff wavelength in the range.
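
The wavelength/overlap tradeoff can be reproduced qualitatively with a crude one-period, effective-mass model: as the period thickens, confinement energies fall (longer cutoff wavelength) while the electron and hole localize in different layers (smaller overlap). The band offsets, masses, and assumed interface gap below are rough illustrative numbers, so only the trend, not the absolute values, should be read from the output.

```python
# One-period, k = 0 effective-mass sketch of a type-II superlattice:
# electron well in layer A, hole well in layer B. Parameters assumed.
import numpy as np

def ground(V, m_eff, dz=1.0):
    """Ground state of -(hbar^2/2m) d2/dz2 + V over one period (k = 0)."""
    n = len(V)
    t = 3.81 / (m_eff * dz ** 2)     # hbar^2/(2 m0) = 3.81 eV*Angstrom^2
    H = (np.diag(V + 2.0 * t)
         + np.diag(np.full(n - 1, -t), 1)
         + np.diag(np.full(n - 1, -t), -1))
    H[0, -1] = H[-1, 0] = -t         # periodic boundary across the period
    w, vecs = np.linalg.eigh(H)
    return w[0], vecs[:, 0] / np.linalg.norm(vecs[:, 0])

E0 = 0.40                            # assumed type-II interface gap, eV

for period in (34, 44, 52):          # period thickness, Angstroms
    half = period // 2
    v_e = np.r_[np.zeros(half), np.full(half, 0.30)]  # electron well, layer A
    v_h = np.r_[np.full(half, 0.30), np.zeros(half)]  # hole well, layer B
    e_e, psi_e = ground(v_e, 0.041)  # light, GaInAs-like electron mass
    e_h, psi_h = ground(v_h, 0.450)  # heavy, GaAsSb-like hole mass
    lam = 1.24 / (E0 + e_e + e_h)    # cutoff wavelength in microns
    print(f"{period} A period: lambda ~ {lam:.2f} um, "
          f"overlap ~ {abs(psi_e @ psi_h):.2f}")
```

Even in this toy model the heavy hole localizes much more strongly than the light electron, which is why the overlap, and hence the absorption coefficient, falls as the period is stretched toward longer cutoff wavelengths.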
Contributors: Sharma, Ankur R (Author) / Johnson, Shane (Thesis advisor) / Goryll, Michael (Committee member) / Roedel, Ronald (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
We are expecting hundreds of cores per chip in the near future; however, scaling the memory architecture of manycore designs is a major challenge. Cache coherence presents a single image of memory to all cores at every point in execution, yet coherent cache architectures are not expected to scale to hundreds or thousands of cores. In addition, caches and coherence logic already account for 20-50% of a processor's total power consumption and 30-60% of its die area. A more scalable architecture is therefore needed, and Software Managed Manycore (SMM) architectures emerge as a solution. They have a scalable memory design in which each core has direct access only to its local scratchpad memory, and any data transfers to or from other memories must be done explicitly in the application using Direct Memory Access (DMA) commands. The lack of automatic memory management in hardware makes such architectures extremely power-efficient, but it also makes them difficult to program. If the code or data of the task mapped onto a core cannot fit in the local scratchpad memory, DMA calls must be added to bring it in before it is required, and it may need to be evicted after use. Doing this manually adds substantial complexity to the programmer's job: programmers must now worry about data management on top of the already complex job of ensuring functional correctness. This dissertation presents a comprehensive compiler and runtime integration that automatically manages the code and data of each task in the limited local memory of the core. We first developed Complete Circular Stack Management, which manages stack frames between the local memory and the main memory and also addresses the stack-pointer problem. Though it works, we found the management could be further optimized for most cases, so Smart Stack Data Management (SSDM) is provided; in this work, we formulate the stack data management problem and propose a greedy algorithm for it. We then propose a general cost-estimation algorithm, on which the CMSM heuristic for the code-mapping problem is built. Finally, heap data is dynamic in nature and therefore hard to manage; we provide two schemes to manage an unlimited amount of heap data within a constant-sized region of the local memory. In addition to these separate schemes for the different kinds of data, we also provide a memory-partition methodology.
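
To make the stack-management idea concrete, the toy model below keeps a fixed-size scratchpad stack region and greedily evicts the oldest frames to main memory over a simulated DMA when a call would overflow it, fetching them back on demand as the stack unwinds. The policy, sizes, and names are an illustrative sketch of circular stack management, not the dissertation's exact scheme.

```python
# Toy scratchpad stack manager for an SMM core (all values assumed).
from collections import deque

SPM_SIZE = 4096                      # scratchpad bytes reserved for stack

class StackManager:
    def __init__(self):
        self.resident = deque()      # (func, size) frames in scratchpad
        self.evicted = []            # frames shipped to main memory
        self.used = 0

    def call(self, func, size):
        # Make room by evicting the oldest resident frames ("DMA out"):
        while self.used + size > SPM_SIZE and self.resident:
            old = self.resident.popleft()
            self.evicted.append(old)
            self.used -= old[1]
            print(f"  dma-out {old[0]} ({old[1]} B)")
        self.resident.append((func, size))
        self.used += size

    def ret(self):
        func, size = self.resident.pop()          # top frame dies
        self.used -= size
        if not self.resident and self.evicted:    # caller lives off-chip
            back = self.evicted.pop()
            print(f"  dma-in  {back[0]} ({back[1]} B)")
            self.resident.append(back)
            self.used += back[1]
        return func

mgr = StackManager()
for fn, sz in [("main", 1024), ("parse", 2048), ("eval", 2048), ("leaf", 512)]:
    mgr.call(fn, sz)
while mgr.resident:
    mgr.ret()
```

Running the sketch shows main and parse being shipped out as the deeper frames arrive and pulled back as the stack unwinds, which is exactly the traffic the compiler-inserted DMA calls automate.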
Contributors: Bai, Ke (Author) / Shrivastava, Aviral (Thesis advisor) / Chatha, Karamvir (Committee member) / Xue, Guoliang (Committee member) / Chakrabarti, Chaitali (Committee member) / Arizona State University (Publisher)
Created: 2014