Matching Items (48)

152459-Thumbnail Image.png

Improving the reliability of NAND Flash, phase-change RAM and spin-torque transfer RAM

Description

Non-volatile memories (NVM) are widely used in modern electronic devices due to their non-volatility, low static power consumption and high storage density. While Flash memories are the dominant NVM technology,

Non-volatile memories (NVM) are widely used in modern electronic devices due to their non-volatility, low static power consumption and high storage density. While Flash memories are the dominant NVM technology, resistive memories such as phase change access memory (PRAM) and spin torque transfer random access memory (STT-MRAM) are gaining ground. All these technologies suffer from reliability degradation due to process variations, structural limits and material property shift. To address the reliability concerns of these NVM technologies, multi-level low cost solutions are proposed for each of them. My approach consists of first building a comprehensive error model. Next the error characteristics are exploited to develop low cost multi-level strategies to compensate for the errors. For instance, for NAND Flash memory, I first characterize errors due to threshold voltage variations as a function of the number of program/erase cycles. Next a flexible product code is designed to migrate to a stronger ECC scheme as program/erase cycles increases. An adaptive data refresh scheme is also proposed to improve memory reliability with low energy cost for applications with different data update frequencies. For PRAM, soft errors and hard errors models are built based on shifts in the resistance distributions. Next I developed a multi-level error control approach involving bit interleaving and subblock flipping at the architecture level, threshold resistance tuning at the circuit level and programming current profile tuning at the device level. This approach helped reduce the error rate significantly so that it was now sufficient to use a low cost ECC scheme to satisfy the memory reliability constraint. I also studied the reliability of a PRAM+DRAM hybrid memory system and analyzed the tradeoffs between memory performance, programming energy and lifetime. For STT-MRAM, I first developed an error model based on process variations. I developed a multi-level approach to reduce the error rates that consisted of increasing the W/L ratio of the access transistor, increasing the voltage difference across the memory cell and adjusting the current profile during write operation. This approach enabled use of a low cost BCH based ECC scheme to achieve very low block failure rates.

Contributors

Agent

Created

Date Created
  • 2014

152413-Thumbnail Image.png

Extending efficiency in a DC/DC converter with automatic mode switching from PFM to PWM

Description

Switch mode DC/DC converters are suited for battery powered applications, due to their high efficiency, which help in conserving the battery lifetime. Fixed Frequency PWM based converters, which are generally

Switch mode DC/DC converters are suited for battery powered applications, due to their high efficiency, which help in conserving the battery lifetime. Fixed Frequency PWM based converters, which are generally used for these applications offer good voltage regulation, low ripple and excellent efficiency at high load currents. However at light load currents, fixed frequency PWM converters suffer from poor efficiencies The PFM control offers higher efficiency at light loads at the cost of a higher ripple. The PWM has a poor efficiency at light loads but good voltage ripple characteristics, due to a high switching frequency. To get the best of both control modes, both loops are used together with the control switched from one loop to another based on the load current. Such architectures are referred to as hybrid converters. While transition from PFM to PWM loop can be made by estimating the average load current, transition from PFM to PWM requires voltage or peak current sensing. This theses implements a hysteretic PFM solution for a synchronous buck converter with external MOSFET's, to achieve efficiencies of about 80% at light loads. As the PFM loop operates independently of the PWM loop, a transition circuit for automatically transitioning from PFM to PWM is implemented. The transition circuit is implemented digitally without needing any external voltage or current sensing circuit.

Contributors

Agent

Created

Date Created
  • 2014

152421-Thumbnail Image.png

Radiation hardened pulse based D flip flop design

Description

ABSTRACT The D flip flop acts as a sequencing element while designing any pipelined system. Radiation Hardening by Design (RHBD) allows hardened circuits to be fabricated on commercially available CMOS

ABSTRACT The D flip flop acts as a sequencing element while designing any pipelined system. Radiation Hardening by Design (RHBD) allows hardened circuits to be fabricated on commercially available CMOS manufacturing process. Recently, single event transients (SET's) have become as important as single event upset (SEU) in radiation hardened high speed digital designs. A novel temporal pulse based RHBD flip-flop design is presented. Temporally delayed pulses produced by a radiation hardened pulse generator design samples the data in three redundant pulse latches. The proposed RHBD flip-flop has been statistically designed and fabricated on 90 nm TSMC LP process. Detailed simulations of the flip-flop operation in both normal and radiation environments are presented. Spatial separation of critical nodes for the physical design of the flip-flop is carried out for mitigating multi-node charge collection upsets. The proposed flip-flop is also used in commercial CAD flows for high performance chip designs. The proposed flip-flop is used in the design and auto-place-route (APR) of an advanced encryption system and the metrics analyzed.

Contributors

Agent

Created

Date Created
  • 2014

154858-Thumbnail Image.png

Development and implementation of physical layer kernels for wireless communication protocols

Description

Historically, wireless communication devices have been developed to process one specific waveform. In contrast, a modern cellular phone supports multiple waveforms corresponding to LTE, WCDMA(3G) and 2G standards. The selection

Historically, wireless communication devices have been developed to process one specific waveform. In contrast, a modern cellular phone supports multiple waveforms corresponding to LTE, WCDMA(3G) and 2G standards. The selection of the network is controlled by software running on a general purpose processor, not by the user. Now, instead of selecting from a set of complete radios as in software controlled radio, what if the software could select the building blocks based on the user needs. This is the new software-defined flexible radio which would enable users to construct wireless systems that fit their needs, rather than forcing to use from a small set of pre-existing protocols.

To develop and implement flexible protocols, a flexible hardware very similar to a Software Defined Radio (SDR) is required. In this thesis, the Intel T2200 board is chosen as the SDR platform. It is a heterogeneous platform with ARM, CEVA DSP and several accelerators. A wide range of protocols is mapped onto this platform and their performance evaluated. These include two OFDM based protocols (WiFi-Lite-A, WiFi-Lite-B), one DFT-spread OFDM based protocol (SCFDM-Lite) and one single carrier based protocol (SC-Lite). The transmitter and receiver blocks of the different protocols are first mapped on ARM in the T2200 board. The timing results show that IFFT, FFT, and Viterbi decoder blocks take most of the transmitter and receiver execution time and so in the next step these are mapped onto CEVA DSP. Mapping onto CEVA DSP resulted in significant execution time savings. The savings for WiFi-Lite-A were 60%, for WiFi-Lite-B were 64%, and for SCFDM-Lite were 71.5%. No savings are reported for SC-Lite since it was not mapped onto CEVA DSP.

Significant reduction in execution time is achieved for WiFi-Lite-A and WiFi-Lite-B protocols by implementing the entire transmitter and receiver chains on CEVA DSP. For instance, for WiFi-Lite-A, the savings were as large as 90%. Such huge savings are because the entire transmitter or receiver chain are implemented on CEVA and the timing overhead due to ARM-CEVA communication is completely eliminated. Finally, over-the-air testing was done for WiFi-Lite-A and WiFi-Lite-B protocols. Data was sent over the air using one Intel T2200 WBS board and received using another Intel T2200 WBS board. The received frames were decoded with no errors, thereby validating the over-the-air-communications.

Contributors

Agent

Created

Date Created
  • 2016

156489-Thumbnail Image.png

Power-Performance Modeling and Adaptive Management of Heterogeneous Mobile Platforms​

Description

Nearly 60% of the world population uses a mobile phone, which is typically powered by a system-on-chip (SoC). While the mobile platform capabilities range widely, responsiveness, long battery life and

Nearly 60% of the world population uses a mobile phone, which is typically powered by a system-on-chip (SoC). While the mobile platform capabilities range widely, responsiveness, long battery life and reliability are common design concerns that are crucial to remain competitive. Consequently, state-of-the-art mobile platforms have become highly heterogeneous by combining a powerful SoC with numerous other resources, including display, memory, power management IC, battery and wireless modems. Furthermore, the SoC itself is a heterogeneous resource that integrates many processing elements, such as CPU cores, GPU, video, image, and audio processors. Therefore, CPU cores do not dominate the platform power consumption under many application scenarios.

Competitive performance requires higher operating frequency, and leads to larger power consumption. In turn, power consumption increases the junction and skin temperatures, which have adverse effects on the device reliability and user experience. As a result, allocating the power budget among the major platform resources and temperature control have become fundamental consideration for mobile platforms. Dynamic thermal and power management algorithms address this problem by putting a subset of the processing elements or shared resources to sleep states, or throttling their frequencies. However, an adhoc approach could easily cripple the performance, if it slows down the performance-critical processing element. Furthermore, mobile platforms run a wide range of applications with time varying workload characteristics, unlike early generations, which supported only limited functionality. As a result, there is a need for adaptive power and performance management approaches that consider the platform as a whole, rather than focusing on a subset. Towards this need, our specific contributions include (a) a framework to dynamically select the Pareto-optimal frequency and active cores for the heterogeneous CPUs, such as ARM big.Little architecture, (b) a dynamic power budgeting approach for allocating optimal power consumption to the CPU and GPU using performance sensitivity models for each PE, (c) an adaptive GPU frame time sensitivity prediction model to aid power management algorithms, and (d) an online learning algorithm that constructs adaptive run-time models for non-stationary workloads.

Contributors

Agent

Created

Date Created
  • 2018

156491-Thumbnail Image.png

High Performance Power Management Integrated Circuits for Portable Devices

Description

Portable devices often require multiple power management IC (PMIC) to power different sub-modules, Li-ion batteries are well suited for portable devices because of its small size, high energy density and

Portable devices often require multiple power management IC (PMIC) to power different sub-modules, Li-ion batteries are well suited for portable devices because of its small size, high energy density and long life cycle. Since Li-ion battery is the major power source for portable device, fast and high-efficiency battery charging solution has become a major requirement in portable device application.

In the first part of dissertation, a high performance Li-ion switching battery charger is proposed. Cascaded two loop (CTL) control architecture is used for seamless CC-CV transition, time based technique is utilized to minimize controller area and power consumption. Time domain controller is implemented by using voltage controlled oscillator (VCO) and voltage controlled delay line (VCDL). Several efficiency improvement techniques such as segmented power-FET, quasi-zero voltage switching (QZVS) and switching frequency reduction are proposed. The proposed switching battery charger is able to provide maximum 2 A charging current and has an peak efficiency of 93.3%. By configure the charger as boost converter, the charger is able to provide maximum 1.5 A charging current while achieving 96.3% peak efficiency.

The second part of dissertation presents a digital low dropout regulator (DLDO) for system on a chip (SoC) in portable devices application. The proposed DLDO achieve fast transient settling time, lower undershoot/overshoot and higher PSR performance compared to state of the art. By having a good PSR performance, the proposed DLDO is able to power mixed signal load. To achieve a fast load transient response, a load transient detector (LTD) enables boost mode operation of the digital PI controller. The boost mode operation achieves sub microsecond settling time, and reduces the settling time by 50% to 250 ns, undershoot/overshoot by 35% to 250 mV and 17% to 125 mV without compromising the system stability.

Contributors

Agent

Created

Date Created
  • 2018

157687-Thumbnail Image.png

Implementation of Graph Kernels on Multi core Architecture

Description

Graphs are one of the key data structures for many real-world computing applica-

tions such as machine learning, social networks, genomics etc. The main challenges of

graph processing include diculty in parallelizing

Graphs are one of the key data structures for many real-world computing applica-

tions such as machine learning, social networks, genomics etc. The main challenges of

graph processing include diculty in parallelizing the workload that results in work-

load imbalance, poor memory locality and very large number of memory accesses.

This causes large-scale graph processing to be very expensive.

This thesis presents implementation of a select set of graph kernels on a multi-core

architecture, Transmuter. The kernels are Breadth-First Search (BFS), Page Rank

(PR), and Single Source Shortest Path (SSSP). Transmuter is a multi-tiled architec-

ture with 4 tiles and 16 general processing elements (GPE) per tile that supports a

two level cache hierarchy. All graph processing kernels have been implemented on

Transmuter using Gem5 architectural simulator.

The key pre-processing steps in improving the performance are static partition-

ing by destination and balancing the workload among the processing cores. Results

obtained by processing graphs that are partitioned against un-partitioned graphs

show almost 3x improvement in performance. Choice of data structure also plays an

important role in the amount of storage space consumed and the amount of synchro-

nization required in a parallel implementation. Here the compressed sparse column

data format was used. BFS and SSSP are frontier-based algorithms where a frontier

represents a subset of vertices that are active during the current iteration. They

were implemented using the Boolean frontier array data structure. PR is an iterative

algorithm where all vertices are active at all times.

The performance of the dierent Transmuter implementations for the 14nm node

were evaluated based on metrics such as power consumption (Watt), Giga Operations

Per Second(GOPS), GOPS/Watt and L1/L2 cache misses. GOPS/W numbers for

graphs with 10k nodes and 10k edges is 33 for BFS, 477 for PR and 10 for SSSP.

i

Frontier-based algorithms have much lower GOPS/W compared to iterative algo-

rithms such as PR. This is because all nodes in Page Rank are active at all points

in time. For all three kernel implementations, the L1 cache miss rates are quite low

while the L2 cache hit rates are high.

Contributors

Agent

Created

Date Created
  • 2019

157465-Thumbnail Image.png

How Does Technology Development Influence the Assessment of Parkinson’s Disease? A Systematic Review

Description

Parkinson’s disease (PD) is a neurological disorder with complicated and disabling motor and non-motor symptoms. The pathology for PD is difficult and expensive. Furthermore, it depends on patient

Parkinson’s disease (PD) is a neurological disorder with complicated and disabling motor and non-motor symptoms. The pathology for PD is difficult and expensive. Furthermore, it depends on patient diaries and the neurologist’s subjective assessment of clinical scales. Objective, accurate, and continuous patient monitoring have become possible with the advancement in mobile and portable equipment. Consequently, a significant amount of work has been done to explore new cost-effective and subjective assessment methods or PD symptoms. For example, smart technologies, such as wearable sensors and optical motion capturing systems, have been used to analyze the symptoms of a PD patient to assess their disease progression and even to detect signs in their nascent stage for early diagnosis of PD.

This review focuses on the use of modern equipment for PD applications that were developed in the last decade. Four significant fields of research were identified: Assistance diagnosis, Prognosis or Monitoring of Symptoms and their Severity, Predicting Response to Treatment, and Assistance to Therapy or Rehabilitation. This study reviews the papers published between January 2008 and December 2018 in the following four databases: Pubmed Central, Science Direct, IEEE Xplore and MDPI. After removing unrelated articles, ones published in languages other than English, duplicate entries and other articles that did not fulfill the selection criteria, 778 papers were manually investigated and included in this review. A general overview of PD applications, devices used and aspects monitored for PD management is provided in this systematic review.

Contributors

Agent

Created

Date Created
  • 2019

157773-Thumbnail Image.png

Power, Performance, and Energy Management of Heterogeneous Architectures

Description

Many core modern multiprocessor systems-on-chip offers tremendous power and performance

optimization opportunities by tuning thousands of potential voltage, frequency

and core configurations. Applications running on these architectures are becoming increasingly

complex. As the

Many core modern multiprocessor systems-on-chip offers tremendous power and performance

optimization opportunities by tuning thousands of potential voltage, frequency

and core configurations. Applications running on these architectures are becoming increasingly

complex. As the basic building blocks, which make up the application, change during

runtime, different configurations may become optimal with respect to power, performance

or other metrics. Identifying the optimal configuration at runtime is a daunting task due

to a large number of workloads and configurations. Therefore, there is a strong need to

evaluate the metrics of interest as a function of the supported configurations.

This thesis focuses on two different types of modern multiprocessor systems-on-chip

(SoC): Mobile heterogeneous systems and tile based Intel Xeon Phi architecture.

For mobile heterogeneous systems, this thesis presents a novel methodology that can

accurately instrument different types of applications with specific performance monitoring

calls. These calls provide a rich set of performance statistics at a basic block level while the

application runs on the target platform. The target architecture used for this work (Odroid

XU3) is capable of running at 4940 different frequency and core combinations. With the

help of instrumented application vast amount of characterization data is collected that provides

details about performance, power and CPU state at every instrumented basic block

across 19 different types of applications. The vast amount of data collected has enabled

two runtime schemes. The first work provides a methodology to find optimal configurations

in heterogeneous architecture using classifiers and demonstrates an average increase

of 93%, 81% and 6% in performance per watt compared to the interactive, ondemand and

powersave governors, respectively. The second work using same data shows a novel imitation

learning framework for dynamically controlling the type, number, and the frequencies

of active cores to achieve an average of 109% PPW improvement compared to the default

governors.

This work also presents how to accurately profile tile based Intel Xeon Phi architecture

while training different types of neural networks using open image dataset on deep learning

framework. The data collected allows deep exploratory analysis. It also showcases how

different hardware parameters affect performance of Xeon Phi.

Contributors

Agent

Created

Date Created
  • 2019

157804-Thumbnail Image.png

Energy-Efficient ASIC Accelerators for Machine/Deep Learning Algorithms

Description

While machine/deep learning algorithms have been successfully used in many practical applications including object detection and image/video classification, accurate, fast, and low-power hardware implementations of such algorithms are still a

While machine/deep learning algorithms have been successfully used in many practical applications including object detection and image/video classification, accurate, fast, and low-power hardware implementations of such algorithms are still a challenging task, especially for mobile systems such as Internet of Things, autonomous vehicles, and smart drones.

This work presents an energy-efficient programmable application-specific integrated circuit (ASIC) accelerator for object detection. The proposed ASIC supports multi-class (face/traffic sign/car license plate/pedestrian), many-object (up to 50) in one image with different sizes (6 down-/11 up-scaling), and high accuracy (87% for face detection datasets). The proposed accelerator is composed of an integral channel detector with 2,000 classifiers for five rigid boosted templates to make a strong object detection. By jointly optimizing the algorithm and efficient hardware architecture, the prototype chip implemented in 65nm demonstrates real-time object detection of 20-50 frames/s with 22.5-181.7mW (0.54-1.75nJ/pixel) at 0.58-1.1V supply.

In this work, to reduce computation without accuracy degradation, an energy-efficient deep convolutional neural network (DCNN) accelerator is proposed based on a novel conditional computing scheme and integrates convolution with subsequent max-pooling operations. This way, the total number of bit-wise convolutions could be reduced by ~2x, without affecting the output feature values. This work also has been developing an optimized dataflow that exploits sparsity, maximizes data re-use and minimizes off-chip memory access, which can improve upon existing hardware works. The total off-chip memory access can be saved by 2.12x. Preliminary results of the proposed DCNN accelerator achieved a peak 7.35 TOPS/W for VGG-16 by post-layout simulation results in 40nm.

A number of recent efforts have attempted to design custom inference engine based on various approaches, including the systolic architecture, near memory processing, and in-meomry computing concept. This work evaluates a comprehensive comparison of these various approaches in a unified framework. This work also presents the proposed energy-efficient in-memory computing accelerator for deep neural networks (DNNs) by integrating many instances of in-memory computing macros with an ensemble of peripheral digital circuits, which supports configurable multibit activations and large-scale DNNs seamlessly while substantially improving the chip-level energy-efficiency. Proposed accelerator is fully designed in 65nm, demonstrating ultralow energy consumption for DNNs.

Contributors

Agent

Created

Date Created
  • 2019