Matching Items (211)
152388-Thumbnail Image.png
Description
Microprocessors are the processing heart of any digital system and are central to all the technological advancements of the age including space exploration and monitoring. The demands of space exploration require a special class of microprocessors called radiation hardened microprocessors which are less susceptible to radiation present outside the earth's

Microprocessors are the processing heart of any digital system and are central to all the technological advancements of the age including space exploration and monitoring. The demands of space exploration require a special class of microprocessors called radiation hardened microprocessors which are less susceptible to radiation present outside the earth's atmosphere, in other words their functioning is not disrupted even in presence of disruptive radiation. The presence of these particles forces the designers to come up with design techniques at circuit and chip levels to alleviate the errors which can be encountered in the functioning of microprocessors. Microprocessor evolution has been very rapid in terms of performance but the same cannot be said about its rad-hard counterpart. With the total data processing capability overall increasing rapidly, the clear lack of performance of the processors manifests as a bottleneck in any processing system. To design high performance rad-hard microprocessors designers have to overcome difficult design problems at various design stages i.e. Architecture, Synthesis, Floorplanning, Optimization, routing and analysis all the while maintaining circuit radiation hardness. The reference design `HERMES' is targeted at 90nm IBM G process and is expected to reach 500Mhz which is twice as fast any processor currently available. Chapter 1 talks about the mechanisms of radiation effects which cause upsets and degradation to the functioning of digital circuits. Chapter 2 gives a brief description of the components which are used in the design and are part of the consistent efforts at ASUVLSI lab culminating in this chip level implementation of the design. Chapter 3 explains the basic digital design ASIC flow and the changes made to it leading to a rad-hard specific ASIC flow used in implementing this chip. Chapter 4 talks about the triple mode redundant (TMR) specific flow which is used in the block implementation, delineating the challenges faced and the solutions proposed to make the flow work. Chapter 5 explains the challenges faced and solutions arrived at while using the top-level flow described in chapter 3. Chapter 6 puts together the results and analyzes the design in terms of basic integrated circuit design constraints.
ContributorsRamamurthy, Chandarasekaran (Author) / Clark, Lawrence T (Thesis advisor) / Holbert, Keith E. (Committee member) / Barnaby, Hugh J (Committee member) / Mayhew, David (Committee member) / Arizona State University (Publisher)
Created2013
153334-Thumbnail Image.png
Description
Three dimensional (3-D) ultrasound is safe, inexpensive, and has been shown to drastically improve system ease-of-use, diagnostic efficiency, and patient throughput. However, its high computational complexity and resulting high power consumption has precluded its use in hand-held applications.

In this dissertation, algorithm-architecture co-design techniques that aim to make hand-held 3-D ultrasound

Three dimensional (3-D) ultrasound is safe, inexpensive, and has been shown to drastically improve system ease-of-use, diagnostic efficiency, and patient throughput. However, its high computational complexity and resulting high power consumption has precluded its use in hand-held applications.

In this dissertation, algorithm-architecture co-design techniques that aim to make hand-held 3-D ultrasound a reality are presented. First, image enhancement methods to improve signal-to-noise ratio (SNR) are proposed. These include virtual source firing techniques and a low overhead digital front-end architecture using orthogonal chirps and orthogonal Golay codes.

Second, algorithm-architecture co-design techniques to reduce the power consumption of 3-D SAU imaging systems is presented. These include (i) a subaperture multiplexing strategy and the corresponding apodization method to alleviate the signal bandwidth bottleneck, and (ii) a highly efficient iterative delay calculation method to eliminate complex operations such as multiplications, divisions and square-root in delay calculation during beamforming. These techniques were used to define Sonic Millip3De, a 3-D die stacked architecture for digital beamforming in SAU systems. Sonic Millip3De produces 3-D high resolution images at 2 frames per second with system power consumption of 15W in 45nm technology.

Third, a new beamforming method based on separable delay decomposition is proposed to reduce the computational complexity of the beamforming unit in an SAU system. The method is based on minimizing the root-mean-square error (RMSE) due to delay decomposition. It reduces the beamforming complexity of a SAU system by 19x while providing high image fidelity that is comparable to non-separable beamforming. The resulting modified Sonic Millip3De architecture supports a frame rate of 32 volumes per second while maintaining power consumption of 15W in 45nm technology.

Next a 3-D plane-wave imaging system that utilizes both separable beamforming and coherent compounding is presented. The resulting system has computational complexity comparable to that of a non-separable non-compounding baseline system while significantly improving contrast-to-noise ratio and SNR. The modified Sonic Millip3De architecture is now capable of generating high resolution images at 1000 volumes per second with 9-fire-angle compounding.
ContributorsYang, Ming (Author) / Chakrabarti, Chaitali (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Karam, Lina (Committee member) / Frakes, David (Committee member) / Ogras, Umit Y. (Committee member) / Arizona State University (Publisher)
Created2015
152892-Thumbnail Image.png
Description
Mobile platforms are becoming highly heterogeneous by combining a powerful multiprocessor system-on-chip (MpSoC) with numerous resources including display, memory, power management IC (PMIC), battery and wireless modems into a compact package. Furthermore, the MpSoC itself is a heterogeneous resource that integrates many processing elements such as CPU cores, GPU, video,

Mobile platforms are becoming highly heterogeneous by combining a powerful multiprocessor system-on-chip (MpSoC) with numerous resources including display, memory, power management IC (PMIC), battery and wireless modems into a compact package. Furthermore, the MpSoC itself is a heterogeneous resource that integrates many processing elements such as CPU cores, GPU, video, image, and audio processors. As a result, optimization approaches targeting mobile computing needs to consider the platform at various levels of granularity.

Platform energy consumption and responsiveness are two major considerations for mobile systems since they determine the battery life and user satisfaction, respectively. In this work, the models for power consumption, response time, and energy consumption of heterogeneous mobile platforms are presented. Then, these models are used to optimize the energy consumption of baseline platforms under power, response time, and temperature constraints with and without introducing new resources. It is shown, the optimal design choices depend on dynamic power management algorithm, and adding new resources is more energy efficient than scaling existing resources alone. The framework is verified through actual experiments on Qualcomm Snapdragon 800 based tablet MDP/T. Furthermore, usage of the framework at both design and runtime optimization is also presented.
ContributorsGupta, Ujjwala (Author) / Ogras, Umit Y. (Thesis advisor) / Ozev, Sule (Committee member) / Chakrabarti, Chaitali (Committee member) / Arizona State University (Publisher)
Created2014
152893-Thumbnail Image.png
Description
Network traffic analysis by means of Quality of Service (QoS) is a popular research and development area among researchers for a long time. It is becoming even more relevant recently due to ever increasing use of the Internet and other public and private communication networks. Fast and precise QoS analysis

Network traffic analysis by means of Quality of Service (QoS) is a popular research and development area among researchers for a long time. It is becoming even more relevant recently due to ever increasing use of the Internet and other public and private communication networks. Fast and precise QoS analysis is a vital task in mission-critical communication networks (MCCNs), where providing a certain level of QoS is essential for national security, safety or economic vitality. In this thesis, the details of all aspects of a comprehensive computational framework for QoS analysis in MCCNs are provided. There are three main QoS analysis tasks in MCCNs; QoS measurement, QoS visualization and QoS prediction. Definitions of these tasks are provided and for each of those, complete solutions are suggested either by referring to an existing work or providing novel methods.

A scalable and accurate passive one-way QoS measurement algorithm is proposed. It is shown that accurate QoS measurements are possible using network flow data.

Requirements of a good QoS visualization platform are listed. Implementations of the capabilities of a complete visualization platform are presented.

Steps of QoS prediction task in MCCNs are defined. The details of feature selection, class balancing through sampling and assessing classification algorithms for this task are outlined. Moreover, a novel tree based logistic regression method for knowledge discovery is introduced. Developed prediction framework is capable of making very accurate packet level QoS predictions and giving valuable insights to network administrators.
ContributorsSenturk, Muhammet Burhan (Author) / Li, Jing (Thesis advisor) / Baydogan, Mustafa G (Committee member) / Wu, Teresa (Committee member) / Arizona State University (Publisher)
Created2014
152997-Thumbnail Image.png
Description
Stream processing has emerged as an important model of computation especially in the context of multimedia and communication sub-systems of embedded System-on-Chip (SoC) architectures. The dataflow nature of streaming applications allows them to be most naturally expressed as a set of kernels iteratively operating on continuous streams of data. The

Stream processing has emerged as an important model of computation especially in the context of multimedia and communication sub-systems of embedded System-on-Chip (SoC) architectures. The dataflow nature of streaming applications allows them to be most naturally expressed as a set of kernels iteratively operating on continuous streams of data. The kernels are computationally intensive and are mainly characterized by real-time constraints that demand high throughput and data bandwidth with limited global data reuse. Conventional architectures fail to meet these demands due to their poorly matched execution models and the overheads associated with instruction and data movements.

This work presents StreamWorks, a multi-core embedded architecture for energy-efficient stream computing. The basic processing element in the StreamWorks architecture is the StreamEngine (SE) which is responsible for iteratively executing a stream kernel. SE introduces an instruction locking mechanism that exploits the iterative nature of the kernels and enables fine-grain instruction reuse. Each instruction in a SE is locked to a Reservation Station (RS) and revitalizes itself after execution; thus never retiring from the RS. The entire kernel is hosted in RS Banks (RSBs) close to functional units for energy-efficient instruction delivery. The dataflow semantics of stream kernels are captured by a context-aware dataflow execution mode that efficiently exploits the Instruction Level Parallelism (ILP) and Data-level parallelism (DLP) within stream kernels.

Multiple SEs are grouped together to form a StreamCluster (SC) that communicate via a local interconnect. A novel software FIFO virtualization technique with split-join functionality is proposed for efficient and scalable stream communication across SEs. The proposed communication mechanism exploits the Task-level parallelism (TLP) of the stream application. The performance and scalability of the communication mechanism is evaluated against the existing data movement schemes for scratchpad based multi-core architectures. Further, overlay schemes and architectural support are proposed that allow hosting any number of kernels on the StreamWorks architecture. The proposed oevrlay schemes for code management supports kernel(context) switching for the most common use cases and can be adapted for any multi-core architecture that use software managed local memories.

The performance and energy-efficiency of the StreamWorks architecture is evaluated for stream kernel and application benchmarks by implementing the architecture in 45nm TSMC and comparison with a low power RISC core and a contemporary accelerator.
ContributorsPanda, Amrit (Author) / Chatha, Karam S. (Thesis advisor) / Wu, Carole-Jean (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)
Created2014
153089-Thumbnail Image.png
Description
A benchmark suite that is representative of the programs a processor typically executes is necessary to understand a processor's performance or energy consumption characteristics. The first contribution of this work addresses this need for mobile platforms with MobileBench, a selection of representative smartphone applications. In smartphones, like any other

A benchmark suite that is representative of the programs a processor typically executes is necessary to understand a processor's performance or energy consumption characteristics. The first contribution of this work addresses this need for mobile platforms with MobileBench, a selection of representative smartphone applications. In smartphones, like any other portable computing systems, energy is a limited resource. Based on the energy characterization of a commercial widely-used smartphone, application cores are found to consume a significant part of the total energy consumption of the device. With this insight, the subsequent part of this thesis focuses on the portion of energy that is spent to move data from the memory system to the application core's internal registers. The primary motivation for this work comes from the relatively higher power consumption associated with a data movement instruction compared to that of an arithmetic instruction. The data movement energy cost is worsened esp. in a System on Chip (SoC) because the amount of data received and exchanged in a SoC based smartphone increases at an explosive rate. A detailed investigation is performed to quantify the impact of data movement

on the overall energy consumption of a smartphone device. To aid this study, microbenchmarks that generate desired data movement patterns between different levels of the memory hierarchy are designed. Energy costs of data movement are then computed by measuring the instantaneous power consumption of the device when the micro benchmarks are executed. This work makes an extensive use of hardware performance counters to validate the memory access behavior of microbenchmarks and to characterize the energy consumed in moving data. Finally, the calculated energy costs of data movement are used to characterize the portion of energy that MobileBench applications spend in moving data. The results of this study show that a significant 35% of the total device energy is spent in data movement alone. Energy is an increasingly important criteria in the context of designing architectures for future smartphones and this thesis offers insights into data movement energy consumption.
ContributorsPandiyan, Dhinakaran (Author) / Wu, Carole-Jean (Thesis advisor) / Shrivastava, Aviral (Committee member) / Lee, Yann-Hang (Committee member) / Arizona State University (Publisher)
Created2014
153094-Thumbnail Image.png
Description
Android is currently the most widely used mobile operating system. The permission model in Android governs the resource access privileges of applications. The permission model however is amenable to various attacks, including re-delegation attacks, background snooping attacks and disclosure of private information. This thesis is aimed at understanding, analyzing and

Android is currently the most widely used mobile operating system. The permission model in Android governs the resource access privileges of applications. The permission model however is amenable to various attacks, including re-delegation attacks, background snooping attacks and disclosure of private information. This thesis is aimed at understanding, analyzing and performing forensics on application behavior. This research sheds light on several security aspects, including the use of inter-process communications (IPC) to perform permission re-delegation attacks.

Android permission system is more of app-driven rather than user controlled, which means it is the applications that specify their permission requirement and the only thing which the user can do is choose not to install a particular application based on the requirements. Given the all or nothing choice, users succumb to pressures and needs to accept permissions requested. This thesis proposes a couple of ways for providing the users finer grained control of application privileges. The same methods can be used to evade the Permission Re-delegation attack.

This thesis also proposes and implements a novel methodology in Android that can be used to control the access privileges of an Android application, taking into consideration the context of the running application. This application-context based permission usage is further used to analyze a set of sample applications. We found the evidence of applications spoofing or divulging user sensitive information such as location information, contact information, phone id and numbers, in the background. Such activities can be used to track users for a variety of privacy-intrusive purposes. We have developed implementations that minimize several forms of privacy leaks that are routinely done by stock applications.
ContributorsGollapudi, Narasimha Aditya (Author) / Dasgupta, Partha (Thesis advisor) / Xue, Guoliang (Committee member) / Doupe, Adam (Committee member) / Arizona State University (Publisher)
Created2014
153033-Thumbnail Image.png
Description
Coarse Grain Reconfigurable Arrays (CGRAs) are promising accelerators capable of

achieving high performance at low power consumption. While CGRAs can efficiently

accelerate loop kernels, accelerating loops with control flow (loops with if-then-else

structures) is quite challenging. Techniques that handle control flow execution in

CGRAs generally use predication. Such techniques execute both branches of an

if-then-else

Coarse Grain Reconfigurable Arrays (CGRAs) are promising accelerators capable of

achieving high performance at low power consumption. While CGRAs can efficiently

accelerate loop kernels, accelerating loops with control flow (loops with if-then-else

structures) is quite challenging. Techniques that handle control flow execution in

CGRAs generally use predication. Such techniques execute both branches of an

if-then-else structure and select outcome of either branch to commit based on the

result of the conditional. This results in poor utilization of CGRA s computational

resources. Dual-issue scheme which is the state of the art technique for control flow

fetches instructions from both paths of the branch and selects one to execute at

runtime based on the result of the conditional. This technique has an overhead in

instruction fetch bandwidth. In this thesis, to improve performance of control flow

execution in CGRAs, I propose a solution in which the result of the conditional

expression that decides the branch outcome is communicated to the instruction fetch

unit to selectively issue instructions from the path taken by the branch at run time.

Experimental results show that my solution can achieve 34.6% better performance

and 52.1% improvement in energy efficiency on an average compared to state of the

art dual issue scheme without imposing any overhead in instruction fetch bandwidth.
ContributorsRajendran Radhika, Shri Hari (Author) / Shrivastava, Aviral (Thesis advisor) / Christen, Jennifer Blain (Committee member) / Cao, Yu (Committee member) / Arizona State University (Publisher)
Created2014
153040-Thumbnail Image.png
Description
Android has been the dominant platform in which most of the mobile development is being done. By the end of the second quarter of 2014, 84.7 percent of the entire world mobile phones market share had been captured by Android. The Android library internally uses the modified Linux kernel as

Android has been the dominant platform in which most of the mobile development is being done. By the end of the second quarter of 2014, 84.7 percent of the entire world mobile phones market share had been captured by Android. The Android library internally uses the modified Linux kernel as the part of its stack. The I/O scheduler, is a part of the Linux kernel, responsible for scheduling data requests to the internal and the external memory devices that are attached to the mobile systems.

The usage of solid state drives in the Android tablet has also seen a rise owing to its speed of operation and mechanical stability. The I/O schedulers that exist in the present Linux kernel are not better suited for handling solid state drives in particular to exploit the inherent parallelism offered by the solid state drives. The Android provides information to the Linux kernel about the processes running in the foreground and background. Based on this information the kernel decides the process scheduling and the memory management, but no such information exists for the I/O scheduling. Research shows that the resource management could be done better if the operating system is aware of the characteristics of the requester. Thus, there is a need for a better I/O scheduler that could schedule I/O operations based on the application and also exploit the parallelism in the solid state drives. The scheduler proposed through this research does that. It contains two algorithms working in unison one focusing on the solid state drives and the other on the application awareness.

The Android application context aware scheduler has the features of increasing the responsiveness of the time sensitive applications and also increases the throughput by parallel scheduling of request in the solid state drive. The suggested scheduler is tested using standard benchmarks and real-time scenarios, the results convey that our scheduler outperforms the existing default completely fair queuing scheduler of the Android.
ContributorsSivasankaran, Jeevan Prasath (Author) / Lee, Yann Hang (Thesis advisor) / Wu, Carole-Jean (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)
Created2014
153394-Thumbnail Image.png
Description
As a promising solution to the problem of acquiring and storing large amounts of image and video data, spatial-multiplexing camera architectures have received lot of attention in the recent past. Such architectures have the attractive feature of combining a two-step process of acquisition and compression of pixel measurements in a

As a promising solution to the problem of acquiring and storing large amounts of image and video data, spatial-multiplexing camera architectures have received lot of attention in the recent past. Such architectures have the attractive feature of combining a two-step process of acquisition and compression of pixel measurements in a conventional camera, into a single step. A popular variant is the single-pixel camera that obtains measurements of the scene using a pseudo-random measurement matrix. Advances in compressive sensing (CS) theory in the past decade have supplied the tools that, in theory, allow near-perfect reconstruction of an image from these measurements even for sub-Nyquist sampling rates. However, current state-of-the-art reconstruction algorithms suffer from two drawbacks -- They are (1) computationally very expensive and (2) incapable of yielding high fidelity reconstructions for high compression ratios. In computer vision, the final goal is usually to perform an inference task using the images acquired and not signal recovery. With this motivation, this thesis considers the possibility of inference directly from compressed measurements, thereby obviating the need to use expensive reconstruction algorithms. It is often the case that non-linear features are used for inference tasks in computer vision. However, currently, it is unclear how to extract such features from compressed measurements. Instead, using the theoretical basis provided by the Johnson-Lindenstrauss lemma, discriminative features using smashed correlation filters are derived and it is shown that it is indeed possible to perform reconstruction-free inference at high compression ratios with only a marginal loss in accuracy. As a specific inference problem in computer vision, face recognition is considered, mainly beyond the visible spectrum such as in the short wave infra-red region (SWIR), where sensors are expensive.
ContributorsLohit, Suhas Anand (Author) / Turaga, Pavan (Thesis advisor) / Spanias, Andreas (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created2015