Search Content

Model-based design, simulation and automatic code generation for embedded systems and robotic applications

Description

As the complexity of robotic systems and applications grows rapidly, development of high-performance, easy to use, and fully integrated development environments for those systems is inevitable. Model-Based Design (MBD) of dynamic systems using engineering software such as Simulink® from MathWorks®, SciCos from Metalau team and SystemModeler® from Wolfram® is quite…

As the complexity of robotic systems and applications grows rapidly, development of high-performance, easy to use, and fully integrated development environments for those systems is inevitable. Model-Based Design (MBD) of dynamic systems using engineering software such as Simulink® from MathWorks®, SciCos from Metalau team and SystemModeler® from Wolfram® is quite popular nowadays. They provide tools for modeling, simulation, verification and in some cases automatic code generation for desktop applications, embedded systems and robots. For real-world implementation of models on the actual hardware, those models should be converted into compilable machine code either manually or automatically. Due to the complexity of robotic systems, manual code translation from model to code is not a feasible optimal solution so we need to move towards automated code generation for such systems. MathWorks® offers code generation facilities called Coder® products for this purpose. However in order to fully exploit the power of model-based design and code generation tools for robotic applications, we need to enhance those software systems by adding and modifying toolboxes, files and other artifacts as well as developing guidelines and procedures. In this thesis, an effort has been made to propose a guideline as well as a Simulink® library, StateFlow® interface API and a C/C++ interface API to complete this toolchain for NAO humanoid robots. Thus the model of the hierarchical control architecture can be easily and properly converted to code and built for implementation.

ContributorsRaji Kermani, Ramtin (Author) / Fainekos, Georgios (Thesis advisor) / Lee, Yann-Hang (Committee member) / Sarjoughian, Hessam S. (Committee member) / Arizona State University (Publisher)

Created2013

Towards energy efficient computing with Linux: enabling task level power awareness and support for energy efficient accelerator

Description

With increasing transistor volume and reducing feature size, it has become a major design constraint to reduce power consumption also. This has given rise to aggressive architectural changes for on-chip power management and rapid development to energy efficient hardware accelerators. Accordingly, the objective of this research work is to facilitate…

With increasing transistor volume and reducing feature size, it has become a major design constraint to reduce power consumption also. This has given rise to aggressive architectural changes for on-chip power management and rapid development to energy efficient hardware accelerators. Accordingly, the objective of this research work is to facilitate software developers to leverage these hardware techniques and improve energy efficiency of the system. To achieve this, I propose two solutions for Linux kernel: Optimal use of these architectural enhancements to achieve greater energy efficiency requires accurate modeling of processor power consumption. Though there are many models available in literature to model processor power consumption, there is a lack of such models to capture power consumption at the task-level. Task-level energy models are a requirement for an operating system (OS) to perform real-time power management as OS time multiplexes tasks to enable sharing of hardware resources. I propose a detailed design methodology for constructing an architecture agnostic task-level power model and incorporating it into a modern operating system to build an online task-level power profiler. The profiler is implemented inside the latest Linux kernel and validated for Intel Sandy Bridge processor. It has a negligible overhead of less than 1\% hardware resource consumption. The profiler power prediction was demonstrated for various application benchmarks from SPEC to PARSEC with less than 4\% error. I also demonstrate the importance of the proposed profiler for emerging architectural techniques through use case scenarios, which include heterogeneous computing and fine grained per-core DVFS. Along with architectural enhancement in general purpose processors to improve energy efficiency, hardware accelerators like Coarse Grain reconfigurable architecture (CGRA) are gaining popularity. Unlike vector processors, which rely on data parallelism, CGRA can provide greater flexibility and compiler level control making it more suitable for present SoC environment. To provide streamline development environment for CGRA, I propose a flexible framework in Linux to do design space exploration for CGRA. With accurate and flexible hardware models, fine grained integration with accurate architectural simulator, and Linux memory management and DMA support, a user can carry out limitless experiments on CGRA in full system environment.

ContributorsDesai, Digant Pareshkumar (Author) / Vrudhula, Sarma (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Wu, Carole-Jean (Committee member) / Arizona State University (Publisher)

Created2013

A modular ROS package for linear temporal logic based motion planning

Description

Objective of this thesis project is to build a prototype using Linear Temporal Logic specifications for generating a 2D motion plan commanding an iRobot to fulfill the specifications. This thesis project was created for Cyber Physical Systems Lab in Arizona State University. The end product of this thesis is creation…

Objective of this thesis project is to build a prototype using Linear Temporal Logic specifications for generating a 2D motion plan commanding an iRobot to fulfill the specifications. This thesis project was created for Cyber Physical Systems Lab in Arizona State University. The end product of this thesis is creation of a software solution which can be used in the academia and industry for research in cyber physical systems related applications. The major features of the project are: creating a modular system for motion planning, use of Robot Operating System (ROS), use of triangulation for environment decomposition and using stargazer sensor for localization. The project is built on an open source software called ROS which provides an environment where it is very easy to integrate different modules be it software or hardware on a Linux based platform. Use of ROS implies the project or its modules can be adapted quickly for different applications as the need arises. The final software package created and tested takes a data file as its input which contains the LTL specifications, a symbols list used in the LTL and finally the environment polygon data containing real world coordinates for all polygons and also information on neighbors and parents of each polygon. The software package successfully ran the experiment of coverage, reachability with avoidance and sequencing.

ContributorsPandya, Parth (Author) / Fainekos, Georgios (Thesis advisor) / Dasgupta, Partha (Committee member) / Lee, Yann-Hang (Committee member) / Arizona State University (Publisher)

Created2013

Replay debugger for human interactive multiple threaded android applications

Description

Debugging is a boring, tedious, time consuming but inevitable step of software development and debugging multiple threaded applications with user interactions is even more complicated. Since concurrency and synchronism are normal features in Android mobile applications, the order of thread execution may vary in every run even with the same…

Debugging is a boring, tedious, time consuming but inevitable step of software development and debugging multiple threaded applications with user interactions is even more complicated. Since concurrency and synchronism are normal features in Android mobile applications, the order of thread execution may vary in every run even with the same input. To make things worse, the target erroneous cases may happen just in a few specific runs. Besides, the randomness of user interactions makes the whole debugging procedure more unpredictable. Thus, debugging a multiple threaded application is a tough and challenging task. This thesis introduces a replay mechanism for debugging user interactive multiple threaded Android applications. The approach is based on the 'Lamport Clock' concept, 'Event Driven' implementation and 'Client-Server' architecture. The debugger tool described in this thesis provides a user controlled debugging environment where users or developers are allowed to use modified record application to generate a log file. During the record time, all the necessary events like thread creation, synchronization and user input are recorded. Therefore, based on the information contained in the generated log files, the debugger tool can replay the application off-line since log files provide the deterministic order of execution. In this case, user or developers can replay an application as many times as they need to pinpoint the errors in the applications.

ContributorsLu, He (Author) / Lee, Yann-Hang (Thesis advisor) / Fainekos, Georgios (Committee member) / Chen, Yinong (Committee member) / Arizona State University (Publisher)

Created2012

Quantitative evaluation of control flow based soft error protection mechanisms

Description

Rapid technology scaling, the main driver of the power and performance improvements of computing solutions, has also rendered our computing systems extremely susceptible to transient errors called soft errors. Among the arsenal of techniques to protect computation from soft errors, Control Flow Checking (CFC) based techniques have gained a reputation…

Rapid technology scaling, the main driver of the power and performance improvements of computing solutions, has also rendered our computing systems extremely susceptible to transient errors called soft errors. Among the arsenal of techniques to protect computation from soft errors, Control Flow Checking (CFC) based techniques have gained a reputation of effective, yet low-cost protection mechanism. The basic idea is that, there is a high probability that a soft-fault in program execution will eventually alter the control flow of the program. Therefore just by making sure that the control flow of the program is correct, significant protection can be achieved. More than a dozen techniques for CFC have been developed over the last several decades, ranging from hardware techniques, software techniques, and hardware-software hybrid techniques as well. Our analysis shows that existing CFC techniques are not only ineffective in protecting from soft errors, but cause additional power and performance overheads. For this analysis, we develop and validate a simulation based experimental setup to accurately and quantitatively estimate the architectural vulnerability of a program execution on a processor micro-architecture. We model the protection achieved by various state-of-the-art CFC techniques in this quantitative vulnerability estimation setup, and find out that software only CFC protection schemes (CFCSS, CFCSS+NA, CEDA) increase system vulnerability by 18% to 21% with 17% to 38% performance overhead. Hybrid CFC protection (CFEDC) increases vulnerability by 5%, while the vulnerability remains almost the same for hardware only CFC protection (CFCET); notwithstanding the hardware overheads of design cost, area, and power incurred in the hardware modifications required for their implementations.

ContributorsRhisheekesan, Abhishek (Author) / Shrivastava, Aviral (Thesis advisor) / Colbourn, Charles Joseph (Committee member) / Wu, Carole-Jean (Committee member) / Arizona State University (Publisher)

Created2013

Dynamic scheduling of stream programs on embedded multi-core processors

Description

Stream computing has emerged as an importantmodel of computation for embedded system applications particularly in the multimedia and network processing domains. In recent past several programming languages and embedded multi-core processors have been proposed for streaming applications. This thesis examines the execution and dynamic scheduling of stream programs on embedded…

Stream computing has emerged as an importantmodel of computation for embedded system applications particularly in the multimedia and network processing domains. In recent past several programming languages and embedded multi-core processors have been proposed for streaming applications. This thesis examines the execution and dynamic scheduling of stream programs on embedded multi-core processors. The thesis addresses the problem in the context of a multi-tasking environment with a time varying allocation of processing elements for a particular streaming application. As a solution the thesis proposes a two step approach where the stream program is compiled to gather key application information, and to generate re-targetable code. A light weight dynamic scheduler incorporates the second stage of the approach. The dynamic scheduler utilizes the static information and available resources to assign or partition the application across the multi-core architecture. The objective of the dynamic scheduler is to maximize the throughput of the application, and it is sensitive to the resource (processing elements, scratch-pad memory, DMA bandwidth) constraints imposed by the target architecture. We evaluate the proposed approach by compiling and scheduling benchmark stream programs on a representative embedded multi-core processor. We present experimental results that evaluate the quality of the solutions generated by the proposed approach by comparisons with existing techniques.

ContributorsLee, Haeseung (Author) / Chatha, Karamvir (Thesis advisor) / Vrudhula, Sarma (Committee member) / Chakrabarti, Chaitali (Committee member) / Wu, Carole-Jean (Committee member) / Arizona State University (Publisher)

Created2013

Towards efficient online reasoning about actions

Description

Modeling dynamic systems is an interesting problem in Knowledge Representation (KR) due to their usefulness in reasoning about real-world environments. In order to effectively do this, a number of different formalisms have been considered ranging from low-level languages, such as Answer Set Programming (ASP), to high-level action languages, such as…

Modeling dynamic systems is an interesting problem in Knowledge Representation (KR) due to their usefulness in reasoning about real-world environments. In order to effectively do this, a number of different formalisms have been considered ranging from low-level languages, such as Answer Set Programming (ASP), to high-level action languages, such as C+ and BC. These languages show a lot of promise over many traditional approaches as they allow a developer to automate many tasks which require reasoning within dynamic environments in a succinct and elaboration tolerant manner. However, despite their strengths, they are still insufficient for modeling many systems, especially those of non-trivial scale or that require the ability to cope with exceptions which occur during execution, such as unexpected events or unintended consequences to actions which have been performed. In order to address these challenges, a theoretical framework is created which focuses on improving the feasibility of applying KR techniques to such problems. The framework is centered on the action language BC+, which integrates many of the strengths of existing KR formalisms, and provides the ability to perform efficient reasoning in an incremental fashion while handling exceptions which occur during execution. The result is a developer friendly formalism suitable for performing reasoning in an online environment. Finally, the newly enhanced Cplus2ASP 2 is introduced, which provides a number of improvements over the original version. These improvements include implementing BC+ among several additional languages, providing enhanced developer support, and exhibiting a significant performance increase over its predecessors and similar systems.

ContributorsBabb, Joseph (Author) / Lee, Joohyung (Thesis advisor) / Lee, Yann-Hang (Committee member) / Baral, Chitta (Committee member) / Arizona State University (Publisher)

Created2014

Register file organization for coarse-grained reconfigurable architectures: compiler-microarchitecture perspective

Description

Coarse-Grained Reconfigurable Architectures (CGRA) are a promising fabric for improving the performance and power-efficiency of computing devices. CGRAs are composed of components that are well-optimized to execute loops and rotating register file is an example of such a component present in CGRAs. Due to the rotating nature of register indexes…

Coarse-Grained Reconfigurable Architectures (CGRA) are a promising fabric for improving the performance and power-efficiency of computing devices. CGRAs are composed of components that are well-optimized to execute loops and rotating register file is an example of such a component present in CGRAs. Due to the rotating nature of register indexes in rotating register file, it is very challenging, if at all possible, to hold and properly index memory addresses (pointers) and static values. In this Thesis, different structures for CGRA register files are investigated. Those structures are experimentally compared in terms of performance of mapped applications, design frequency, and area. It is shown that a register file that can logically be partitioned into rotating and non-rotating regions is an excellent choice because it imposes the minimum restriction on underlying CGRA mapping algorithm while resulting in efficient resource utilization.

ContributorsSaluja, Dipal (Author) / Shrivastava, Aviral (Thesis advisor) / Lee, Yann-Hang (Committee member) / Wu, Carole-Jean (Committee member) / Arizona State University (Publisher)

Created2014

Asymmetric multiprocessing real time operating system on multicore platforms

Description

The need for multi-core architectural trends was realized in the desktop computing domain fairly long back. This trend is also beginning to be seen in the deeply embedded systems such as automotive and avionics industry owing to ever increasing demands in terms of sheer computational bandwidth, responsiveness, reliability and power…

The need for multi-core architectural trends was realized in the desktop computing domain fairly long back. This trend is also beginning to be seen in the deeply embedded systems such as automotive and avionics industry owing to ever increasing demands in terms of sheer computational bandwidth, responsiveness, reliability and power consumption constraints. The adoption of such multi-core architectures in safety critical systems is often met with resistance owing to the overhead in migration of the existing stable code base to the new system setup, typically requiring extensive re-design. This also brings about the need for exhaustive testing and validation that goes hand in hand with such a migration, especially in safety critical real-time systems.

This project highlights the steps to develop an asymmetric multiprocessing variant of Micrium µC/OS-II real-time operating system suited for a multi-core system. This RTOS variant also supports multi-core synchronization, shared memory management and multi-core messaging queues.

Since such specialized embedded systems are usually developed by system designers focused more so on the functionality than on the coding standards, the adoption of automatic production code generation tools, such as SIMULINK's Embedded Coder, is increasingly becoming the industry norm. Such tools are capable of producing robust, industry compliant code with very little roll out time. This project documents the process of extending SIMULINK's automatic code generation tool for the AMP variant of µC/OS-II on Freescale's MPC5675K, dual-core Microcontroller Unit. This includes code generation from task based models and multi-rate models. Apart from this, it also de-scribes the development of additional software tools to allow semantically consistent communication between task on the same kernel and those across the kernels.

ContributorsBulusu, Girish Rao (Author) / Lee, Yann-Hang (Thesis advisor) / Fainekos, Georgios (Committee member) / Wu, Carole-Jean (Committee member) / Arizona State University (Publisher)

Created2014

StreamWorks: an energy-efficient embedded co-processor for stream computing

Description

Stream processing has emerged as an important model of computation especially in the context of multimedia and communication sub-systems of embedded System-on-Chip (SoC) architectures. The dataflow nature of streaming applications allows them to be most naturally expressed as a set of kernels iteratively operating on continuous streams of data. The…

Stream processing has emerged as an important model of computation especially in the context of multimedia and communication sub-systems of embedded System-on-Chip (SoC) architectures. The dataflow nature of streaming applications allows them to be most naturally expressed as a set of kernels iteratively operating on continuous streams of data. The kernels are computationally intensive and are mainly characterized by real-time constraints that demand high throughput and data bandwidth with limited global data reuse. Conventional architectures fail to meet these demands due to their poorly matched execution models and the overheads associated with instruction and data movements.

This work presents StreamWorks, a multi-core embedded architecture for energy-efficient stream computing. The basic processing element in the StreamWorks architecture is the StreamEngine (SE) which is responsible for iteratively executing a stream kernel. SE introduces an instruction locking mechanism that exploits the iterative nature of the kernels and enables fine-grain instruction reuse. Each instruction in a SE is locked to a Reservation Station (RS) and revitalizes itself after execution; thus never retiring from the RS. The entire kernel is hosted in RS Banks (RSBs) close to functional units for energy-efficient instruction delivery. The dataflow semantics of stream kernels are captured by a context-aware dataflow execution mode that efficiently exploits the Instruction Level Parallelism (ILP) and Data-level parallelism (DLP) within stream kernels.

Multiple SEs are grouped together to form a StreamCluster (SC) that communicate via a local interconnect. A novel software FIFO virtualization technique with split-join functionality is proposed for efficient and scalable stream communication across SEs. The proposed communication mechanism exploits the Task-level parallelism (TLP) of the stream application. The performance and scalability of the communication mechanism is evaluated against the existing data movement schemes for scratchpad based multi-core architectures. Further, overlay schemes and architectural support are proposed that allow hosting any number of kernels on the StreamWorks architecture. The proposed oevrlay schemes for code management supports kernel(context) switching for the most common use cases and can be adapted for any multi-core architecture that use software managed local memories.

The performance and energy-efficiency of the StreamWorks architecture is evaluated for stream kernel and application benchmarks by implementing the architecture in 45nm TSMC and comparison with a low power RISC core and a contemporary accelerator.

ContributorsPanda, Amrit (Author) / Chatha, Karam S. (Thesis advisor) / Wu, Carole-Jean (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)

Created2014