This collection includes both ASU Theses and Dissertations, submitted by graduate students, and Barrett, The Honors College theses, submitted by undergraduate students.

Description
In recent years we have witnessed a shift towards multi-processor system-on-chips (MPSoCs) to address the demands of embedded devices such as cell phones, GPS devices, and luxury car features. Highly optimized MPSoCs are well suited to tackle the complex application demands of the end user. These MPSoCs incorporate a constellation of heterogeneous processing elements (PEs): general-purpose PEs and application-specific integrated circuits (ASICs). A typical MPSoC is composed of an application processor, such as an ARM Cortex-A9 with a cache-coherent memory hierarchy, and several application sub-systems. Each of these sub-systems is composed of highly optimized instruction processors, graphics/DSP processors, and custom hardware accelerators. Typically, these sub-systems utilize scratchpad memories (SPMs) rather than support cache coherency. The overall architecture is an integration of the various sub-systems through a high-bandwidth system-level interconnect, such as a Network-on-Chip (NoC). The shift to MPSoCs has been fueled by three major factors: demand for high performance, the use of component libraries, and short design turnaround time. As customers continue to desire ever more complex applications on their embedded devices, the performance demands on these devices continue to increase, and designers have turned to MPSoCs to meet them. By using pre-made IP libraries, designers can quickly piece together an MPSoC that will meet the application demands of the end user with minimal time spent designing new hardware. Additionally, the use of MPSoCs allows designers to generate new devices very quickly, thus reducing the time to market. In this work, a complete MPSoC synthesis design flow is presented. We first present a technique \cite{leary1_intro} to address the synthesis of the interconnect architecture (particularly the Network-on-Chip (NoC)).
We then address the synthesis of the memory architecture of an MPSoC sub-system \cite{leary2_intro}. Lastly, we present a co-synthesis technique to generate the functional and memory architectures simultaneously. The validity and quality of each synthesis technique is demonstrated through extensive experimentation.
ContributorsLeary, Glenn (Author) / Chatha, Karamvir S (Thesis advisor) / Vrudhula, Sarma (Committee member) / Shrivastava, Aviral (Committee member) / Beraha, Rudy (Committee member) / Arizona State University (Publisher)
Created2013
Description
In this thesis we deal with the problem of temporal logic robustness estimation. We present a dynamic programming algorithm for the robustness estimation problem of Metric Temporal Logic (MTL) formulas over a finite timed state sequence. This algorithm not only tests whether the MTL specification is satisfied by the given input, a finite system trajectory, but also quantifies to what extent the sequence satisfies or violates the MTL specification. The implementation of the algorithm is the DP-TALIRO toolbox for MATLAB. Currently it is used as the temporal logic robustness computation engine of S-TALIRO, a MATLAB tool that searches for trajectories of minimal robustness in Simulink/Stateflow models. DP-TALIRO is expected to have near-linear running time and a constant memory requirement, depending on the structure of the MTL formula. The DP-TALIRO toolbox also integrates new features not supported in its ancestor FW-TALIRO, such as parameter replacement, most related iteration, and most related predicate. A derivative of DP-TALIRO, DP-T-TALIRO, which applies the dynamic programming algorithm to time robustness computation, is also addressed in this thesis. We test the running time of DP-TALIRO and compare it with FW-TALIRO. Finally, we present an application where DP-TALIRO is used as the robustness computation core of S-TALIRO for a parameter estimation problem.
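The actual DP-TALIRO implementation is a MATLAB toolbox and is not reproduced here; the following is only a minimal Python sketch of the dynamic-programming idea behind robustness computation, shown for the "globally" operator over a finite trace. All names (`robustness_globally`, `predicate_rob`) are illustrative, not part of the tool.

```python
def robustness_globally(trace, predicate_rob):
    """Robustness of G(p) over a finite trace, computed backwards over
    suffixes with the recursion rob(G p, t) = min(rob(p, t), rob(G p, t+1)).
    A positive result means the property holds with that margin;
    a negative result quantifies by how much it is violated."""
    n = len(trace)
    rob = [0.0] * n
    rob[n - 1] = predicate_rob(trace[n - 1])
    for t in range(n - 2, -1, -1):
        rob[t] = min(predicate_rob(trace[t]), rob[t + 1])
    return rob[0]
```

For example, encoding the predicate "speed below 60" as the signed distance `60 - x`, the trace `[50, 55, 58]` yields robustness 2 (satisfied with margin 2), while `[50, 65, 58]` yields -5 (violated by 5).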
ContributorsYang, Hengyi (Author) / Fainekos, Georgios (Thesis advisor) / Sarjoughian, Hessam S. (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)
Created2013
Description
Software has a great impact on the energy efficiency of any computing system--it can manage the components of a system efficiently or inefficiently. The impact of software is amplified in the context of a wearable computing system used for activity recognition. The design space this platform opens up is immense and encompasses sensors, feature calculations, activity classification algorithms, sleep schedules, and transmission protocols. Design choices in each of these areas impact energy use, overall accuracy, and usefulness of the system. This thesis explores methods by which software can influence the trade-off between energy consumption and system accuracy. In general, the more energy a system consumes, the more accurate it will be. We explore how detecting the transitions between human activities can reduce the energy consumption of such systems without greatly reducing accuracy. We introduce the Log-likelihood Ratio Test as a method to detect transitions, and explore how choices of sensor, feature calculations, and parameters concerning time segmentation affect the accuracy of this method. We discovered that an approximately 5X increase in energy efficiency could be achieved with only a 5% decrease in accuracy. We also address how a system's sleep mode, in which the processor enters a low-power state and sensors are turned off, affects a wearable computing platform that does activity recognition. We discuss the energy trade-offs in each stage of the activity recognition process. We find that careful analysis of these parameters can result in great increases in energy efficiency if small compromises in overall accuracy can be tolerated. We call this the ``Great Compromise.'' We found a 6X increase in efficiency with a 7% decrease in accuracy. We then consider how wireless transmission of data affects the overall energy efficiency of a wearable computing platform.
We find that design decisions such as feature calculations and grouping size have a great impact on the energy consumption of the system because of the amount of data that is stored and transmitted. For example, storing and transmitting vector-based features such as FFT or DCT does not compress the signal and uses more energy than storing and transmitting the raw signal. The effect of grouping size on energy consumption depends on the feature. For scalar features, energy consumption is inversely proportional to grouping size, so it falls as grouping size grows. For features whose size depends on the grouping size, such as FFT, energy increases with the logarithm of grouping size, so energy consumption rises slowly as grouping size increases. We find that compressing data through activity classification and transition detection significantly reduces energy consumption, and that the energy consumed by the classification overhead is negligible compared to the energy savings from data compression. We provide mathematical models of energy usage and data generation, and test our ideas using a mobile computing platform, the Texas Instruments Chronos watch.
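The thesis's exact feature pipeline is not shown here; purely as a sketch of the Log-likelihood Ratio Test idea for transition detection, under an assumed Gaussian model of a scalar feature (all names, parameters, and the threshold are hypothetical):

```python
import math

def gaussian_loglik(xs, mu, sigma):
    # Log-likelihood of the samples under N(mu, sigma^2).
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (x - mu) ** 2 / (2 * sigma ** 2) for x in xs)

def is_transition(window, current, candidate, threshold):
    """Flag an activity transition when a window of feature samples fits
    the candidate activity's model better than the current activity's
    model by more than `threshold`. `current` and `candidate` are
    (mu, sigma) pairs for the two activity models."""
    llr = (gaussian_loglik(window, *candidate)
           - gaussian_loglik(window, *current))
    return llr > threshold
```

The energy saving in the thesis comes from running the full classifier only around detected transitions rather than continuously.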
ContributorsBoyd, Jeffrey Michael (Author) / Sundaram, Hari (Thesis advisor) / Li, Baoxin (Thesis advisor) / Shrivastava, Aviral (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created2014
Description
Membrane proteins are a vital part of cellular structure. They are directly involved in many important cellular functions, such as uptake, signaling, respiration, and photosynthesis, among others. Despite their importance, however, fewer than 500 unique membrane protein structures have been determined to date. This is due to several difficulties with macromolecular crystallography, primarily the difficulty of growing large, well-ordered protein crystals. Since the first proof of concept for femtosecond nanocrystallography, which showed that diffraction patterns can be collected from extremely small crystals, thus negating the need to grow larger ones, there have been many exciting advancements in the field. The technique has been proven to achieve high spatial resolution, making it a viable method for structural biology. Moreover, because the ultrafast nature of the technique avoids radiation damage during imaging, even more interesting experiments become possible, and the first temporal and spatial images of an undamaged structure could be acquired. This concept was denoted time-resolved femtosecond nanocrystallography.

This dissertation presents the first time-resolved data set of Photosystem II in which structural changes can actually be seen without radiation damage. To accomplish this, new crystallization techniques had to be developed so that enough crystals could be made for the liquid jet to deliver a fully hydrated stream of crystals to the high-powered X-ray source. These results are still preliminary, owing to the somewhat lower resolution of the data obtained, but they are a promising demonstration of the power of this new technique. With further optimization of crystal growth methods and quality, of the injection technique, and with continued development of data analysis software, it is only a matter of time before movies of molecules in motion can be made from X-ray diffraction snapshots in time. The work presented here is the first step in that process.
ContributorsKupitz, Christopher (Author) / Fromme, Petra (Thesis advisor) / Spence, John C. (Thesis advisor) / Redding, Kevin (Committee member) / Ros, Alexandra (Committee member) / Arizona State University (Publisher)
Created2014
Description
The utilization of solar energy requires an efficient means of its storage as fuel. In bio-inspired artificial photosynthesis, light energy can be used to drive water oxidation, but catalysts that produce molecular oxygen from water are required. This dissertation demonstrates a novel complex utilizing earth-abundant Ni in combination with glycine as an efficient catalyst with a modest overpotential of 0.475 ± 0.005 V for a current density of 1 mA/cm2 at pH 11. The production of molecular oxygen at a high potential was verified by measurement of the change in oxygen concentration, yielding a Faradaic efficiency of 60 ± 5%. This Ni species can achieve a current density of 4 mA/cm2 that persists for at least 10 hours. Based upon the observed pH dependence of the current amplitude and oxidation/reduction peaks, the catalysis is an electron-proton coupled process. In addition, to investigate the binding of divalent metals to proteins, four peptides were designed and synthesized with carboxylate and histidine ligands. The binding of the metals was characterized by monitoring the metal-induced changes in circular dichroism spectra. Cyclic voltammetry demonstrated that bound copper underwent a Cu(I)/Cu(II) oxidation/reduction change at a potential of approximately 0.32 V in a quasi-reversible process. The relative binding affinity of Mn(II), Fe(II), Co(II), Ni(II) and Cu(II) to the peptides is correlated with the stability constants of the Irving-Williams series for divalent metal ions. A potential application of these complexes of transition metals with amino acids or peptides is in the development of artificial photosynthetic cells.
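As a worked illustration of the Faradaic efficiency measurement described above: water oxidation (2 H2O -> O2 + 4 H+ + 4 e-) transfers four electrons per O2 molecule, so the efficiency is the charge accounted for by measured O2 divided by the charge actually passed. The function and numbers below are illustrative, not the dissertation's data.

```python
FARADAY = 96485.33  # Faraday constant, C per mol of electrons

def faradaic_efficiency_o2(mol_o2, charge_c):
    """Fraction of the charge passed that is accounted for by the O2
    measured: each mole of O2 corresponds to 4 faradays of charge."""
    return (4 * FARADAY * mol_o2) / charge_c
```

A Faradaic efficiency of 60%, as reported for the Ni-glycine catalyst, means that only 0.6 of the charge passed shows up as measured molecular oxygen.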
ContributorsWang, Dong (Author) / Allen, James P. (Thesis advisor) / Ghirlanda, Giovanna (Committee member) / Redding, Kevin (Committee member) / Arizona State University (Publisher)
Created2014
Description
Cyanovirin-N (CVN) is a cyanobacterial lectin with potent anti-HIV activity, mediated by binding to the N-linked oligosaccharide moiety of the envelope protein gp120. CVN offers a scaffold for developing multivalent carbohydrate-binding proteins with tunable specificities and affinities. I present here biophysical calculations performed on a monomeric stabilized mutant of cyanovirin-N, P51G-m4-CVN, in which domain A binding activity is abolished by four mutations, with comparisons made to CVNmutDB, in which domain B binding activity is abolished. Using Monte Carlo calculations and docking simulations, the mutations in CVNmutDB were considered individually, and the mutations E41A/G and T57A were found to have the greatest impact on the affinity for dimannose. 15N-labeled proteins were titrated with Manα(1-2)Manα while chemical shift perturbations in NMR spectra were followed. The mutants E41A/G and T57A had a larger Kd than P51G-m4-CVN, matching the trends predicted by the calculations. We also observed that the N42A mutation affects the local fold of the binding pocket, thus removing all binding to dimannose. Characterization of the mutant N53S showed binding affinity similar to P51G-m4-CVN. Biophysical calculations allow us to explore the affinities and specificities of future iterations of these models. To further elucidate the role of multivalency, I report here a designed covalent dimer of CVN, Nested cyanovirin-N (Nested CVN), which has four binding sites. Nested CVN was found to have gp120 binding affinity and antiviral activity comparable to wt CVN. These results demonstrate the ability to create a multivalent, covalent dimer with properties comparable to those of wt CVN.

WW domains are small modules consisting of 32-40 amino acids that recognize proline-rich peptides and are found in many signaling pathways. We use WW domain sequences to explore protein folding by simulations using Zipping and Assembly Method. We identified five crucial contacts that enabled us to predict the folding of WW domain sequences based on those contacts. We then designed a folded WW domain peptide from an unfolded WW domain sequence by introducing native contacts at those critical positions.
ContributorsWoodrum, Brian William (Author) / Ghirlanda, Giovanna (Thesis advisor) / Redding, Kevin (Committee member) / Wang, Xu (Committee member) / Arizona State University (Publisher)
Created2014
Description
A vast amount of energy emanates from the sun, and at the distance of Earth, approximately 172,500 TW reaches the atmosphere. Of that, 80,600 TW reaches the surface, with 15,600 TW falling on land. Photosynthesis converts 156 TW into biomass, which represents all food and fuel for the biosphere, with about 20 TW of the total product used by humans. Additionally, our society uses approximately 20 more TW of energy from ancient photosynthetic products, i.e., fossil fuels. To mitigate climate problems, carbon dioxide must be removed from human energy usage by replacing or recycling it as an energy carrier. Proposals have been made to process biomass into biofuels; this work demonstrates that current efficiencies of natural photosynthesis are inadequate for this purpose, that replacing fossil fuels with biofuels would be ecologically irresponsible, and that new technologies operating at sufficient efficiencies are required for artificial solar-to-fuels systems. Herein, a hybrid bioderived self-assembling hydrogen-evolving nanoparticle consisting of photosystem I (PSI) and platinum nanoclusters is demonstrated to operate with an overall efficiency of 6%, which exceeds that of land plants by more than an order of magnitude. The system was limited by the rate of electron donation to photooxidized PSI. Further work investigated the interactions of the natural donor-acceptor pairs of cytochrome c6 and PSI for the thermophilic cyanobacterium Thermosynechococcus elongatus BP-1 and the red alga Galdieria sulphuraria. The cyanobacterial system is typified by collisional control, while the algal system demonstrates a population of prebound PSI-cytochrome c6 complexes with faster electron transfer rates. Combining the stability of cyanobacterial PSI with the kinetics of the algal PSI:cytochrome pair would result in more efficient solar-to-fuel conversion. A second priority is the replacement of platinum with chemically abundant catalysts.
In this work, protein scaffolds are employed in host-guest strategies to increase the stability of proton-reduction catalysts and enhance their turnover number without the oxygen sensitivity of hydrogenases. Finally, the design of unnatural electron transfer proteins is explored; these may provide a bioorthogonal means of introducing alternative electron transfer pathways in vitro or, in the case of engineered photosynthetic organisms, in vivo.
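The order-of-magnitude comparison can be sanity-checked against the abstract's own terawatt figures. The sketch below treats 156 TW fixed out of 80,600 TW of surface insolation as a rough global average efficiency for natural photosynthesis, which is one possible reading of those numbers rather than a per-plant measurement:

```python
# Terawatt figures quoted in the abstract.
surface_insolation = 80_600  # TW reaching Earth's surface
photosynthesis = 156         # TW fixed as biomass
hybrid_efficiency = 0.06     # PSI-platinum nanoparticle system (6%)

natural_efficiency = photosynthesis / surface_insolation  # about 0.19%
improvement = hybrid_efficiency / natural_efficiency      # about 31x
```

On this reading the 6% hybrid system is roughly 31 times the global average, consistent with "more than an order of magnitude."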
ContributorsVaughn, Michael David (Author) / Moore, Thomas (Thesis advisor) / Fromme, Petra (Thesis advisor) / Ghirlanda, Giovanna (Committee member) / Redding, Kevin (Committee member) / Arizona State University (Publisher)
Created2014
Description
Stream processing has emerged as an important model of computation especially in the context of multimedia and communication sub-systems of embedded System-on-Chip (SoC) architectures. The dataflow nature of streaming applications allows them to be most naturally expressed as a set of kernels iteratively operating on continuous streams of data. The kernels are computationally intensive and are mainly characterized by real-time constraints that demand high throughput and data bandwidth with limited global data reuse. Conventional architectures fail to meet these demands due to their poorly matched execution models and the overheads associated with instruction and data movements.

This work presents StreamWorks, a multi-core embedded architecture for energy-efficient stream computing. The basic processing element in the StreamWorks architecture is the StreamEngine (SE), which is responsible for iteratively executing a stream kernel. The SE introduces an instruction-locking mechanism that exploits the iterative nature of the kernels and enables fine-grain instruction reuse. Each instruction in an SE is locked to a Reservation Station (RS) and revitalizes itself after execution, never retiring from the RS. The entire kernel is hosted in RS Banks (RSBs) close to the functional units for energy-efficient instruction delivery. The dataflow semantics of stream kernels are captured by a context-aware dataflow execution mode that efficiently exploits the instruction-level parallelism (ILP) and data-level parallelism (DLP) within stream kernels.

Multiple SEs are grouped together to form a StreamCluster (SC), within which they communicate via a local interconnect. A novel software FIFO virtualization technique with split-join functionality is proposed for efficient and scalable stream communication across SEs. The proposed communication mechanism exploits the task-level parallelism (TLP) of the stream application. The performance and scalability of the communication mechanism are evaluated against existing data movement schemes for scratchpad-based multi-core architectures. Further, overlay schemes and architectural support are proposed that allow hosting any number of kernels on the StreamWorks architecture. The proposed overlay schemes for code management support kernel (context) switching for the most common use cases and can be adapted for any multi-core architecture that uses software-managed local memories.

The performance and energy efficiency of the StreamWorks architecture are evaluated on stream kernel and application benchmarks by implementing the architecture in a 45 nm TSMC process and comparing it with a low-power RISC core and a contemporary accelerator.
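StreamWorks' FIFO virtualization is an architectural mechanism for scratchpad-based hardware; purely as a functional sketch of the split-join semantics it provides between stream kernels (names and the round-robin policy are illustrative):

```python
from collections import deque

def split(src, outs):
    # Distribute one stream round-robin across several consumer FIFOs
    # (the "split" half of split-join communication).
    i = 0
    while src:
        outs[i % len(outs)].append(src.popleft())
        i += 1

def join(ins, dst):
    # Interleave several producer FIFOs back into a single stream
    # (the "join" half).
    while any(ins):
        for fifo in ins:
            if fifo:
                dst.append(fifo.popleft())
```

With equal-length FIFOs, a round-robin split followed by a round-robin join restores the original stream order, which is what lets parallel SEs process a split stream transparently.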
ContributorsPanda, Amrit (Author) / Chatha, Karam S. (Thesis advisor) / Wu, Carole-Jean (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)
Created2014
Description
A benchmark suite that is representative of the programs a processor typically executes is necessary to understand a processor's performance or energy consumption characteristics. The first contribution of this work addresses this need for mobile platforms with MobileBench, a selection of representative smartphone applications. In smartphones, as in any other portable computing system, energy is a limited resource. Based on the energy characterization of a widely used commercial smartphone, application cores are found to consume a significant part of the total energy consumption of the device. With this insight, the subsequent part of this thesis focuses on the portion of energy that is spent moving data from the memory system to the application core's internal registers. The primary motivation for this work comes from the relatively higher power consumption of a data movement instruction compared to that of an arithmetic instruction. The data movement energy cost is worsened, especially in a System on Chip (SoC), because the amount of data received and exchanged in an SoC-based smartphone is increasing at an explosive rate. A detailed investigation is performed to quantify the impact of data movement on the overall energy consumption of a smartphone device. To aid this study, microbenchmarks that generate desired data movement patterns between different levels of the memory hierarchy are designed. Energy costs of data movement are then computed by measuring the instantaneous power consumption of the device while the microbenchmarks execute. This work makes extensive use of hardware performance counters to validate the memory access behavior of the microbenchmarks and to characterize the energy consumed in moving data. Finally, the calculated energy costs of data movement are used to characterize the portion of energy that MobileBench applications spend in moving data. The results of this study show that a significant 35% of the total device energy is spent on data movement alone. Energy is an increasingly important criterion in the design of architectures for future smartphones, and this thesis offers insights into data movement energy consumption.
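The thesis's microbenchmarks run natively on the device; the Python sketch below only illustrates the two ingredients described above: a strided walk whose working-set size targets a given level of the memory hierarchy, and a per-level energy accounting model. All per-access costs in the example are placeholders, not the measured values from the thesis.

```python
def stride_walk(buf, stride, iters):
    # Touch one element every `stride` positions; sizing `buf` below or
    # above a cache level's capacity makes accesses hit or miss that level.
    n, idx, total = len(buf), 0, 0
    for _ in range(iters):
        total += buf[idx]
        idx = (idx + stride) % n
    return total

def data_movement_energy(accesses, nj_per_access):
    # Energy (nJ) = sum over memory levels of access count x per-access cost.
    return sum(accesses[level] * nj_per_access[level] for level in accesses)
```

Hardware performance counters would supply the real per-level access counts; the model then attributes measured power to each level of the hierarchy.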
ContributorsPandiyan, Dhinakaran (Author) / Wu, Carole-Jean (Thesis advisor) / Shrivastava, Aviral (Committee member) / Lee, Yann-Hang (Committee member) / Arizona State University (Publisher)
Created2014
Description
Android has been the dominant platform on which most mobile development is done. By the end of the second quarter of 2014, Android had captured 84.7 percent of the worldwide mobile phone market. The Android stack internally uses a modified Linux kernel. The I/O scheduler is the part of the Linux kernel responsible for scheduling data requests to the internal and external memory devices attached to mobile systems.

The usage of solid state drives in Android tablets has also risen, owing to their speed of operation and mechanical stability. The I/O schedulers in the present Linux kernel are not well suited to handling solid state drives, in particular to exploiting the inherent parallelism they offer. Android provides information to the Linux kernel about the processes running in the foreground and background. Based on this information the kernel decides process scheduling and memory management, but no such information is used for I/O scheduling. Research shows that resource management can be done better if the operating system is aware of the characteristics of the requester. Thus, there is a need for a better I/O scheduler that can schedule I/O operations based on the requesting application and also exploit the parallelism in solid state drives. The scheduler proposed through this research does both: it contains two algorithms working in unison, one focusing on the solid state drive and the other on application awareness.

The Android application-context-aware scheduler increases the responsiveness of time-sensitive applications and also increases throughput by scheduling requests on the solid state drive in parallel. The suggested scheduler is tested using standard benchmarks and real-time scenarios; the results show that our scheduler outperforms Android's existing default Completely Fair Queueing (CFQ) scheduler.
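The proposed scheduler lives inside the Linux block layer; as a toy sketch of the application-awareness half only, dispatching requests tagged as foreground ahead of background ones (a deliberate simplification of the actual two-algorithm design, with illustrative names):

```python
from collections import deque

class AppAwareQueue:
    """Toy dispatcher: requests tagged as foreground are served before
    background ones, mimicking the process-state hint Android could
    pass down to the kernel's I/O scheduler."""

    def __init__(self):
        self.fg, self.bg = deque(), deque()

    def submit(self, request, foreground):
        # Queue the request according to the requesting app's state.
        (self.fg if foreground else self.bg).append(request)

    def dispatch(self):
        # Foreground queue has strict priority; None when both are empty.
        if self.fg:
            return self.fg.popleft()
        if self.bg:
            return self.bg.popleft()
        return None
```

A real scheduler would also bound background starvation (for example with per-request deadlines) and merge adjacent requests; this sketch omits both.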
ContributorsSivasankaran, Jeevan Prasath (Author) / Lee, Yann Hang (Thesis advisor) / Wu, Carole-Jean (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)
Created2014