Search Content

Expanding data mining theory for industrial applications

Description

The field of Data Mining is widely recognized and accepted for its applications in many business problems to guide decision-making processes based on data. However, in recent times, the scope of these problems has swollen and the methods are under scrutiny for applicability and relevance to real-world circumstances. At the…

The field of Data Mining is widely recognized and accepted for its applications in many business problems to guide decision-making processes based on data. However, in recent times, the scope of these problems has swollen and the methods are under scrutiny for applicability and relevance to real-world circumstances. At the crossroads of innovation and standards, it is important to examine and understand whether the current theoretical methods for industrial applications (which include KDD, SEMMA and CRISP-DM) encompass all possible scenarios that could arise in practical situations. Do the methods require changes or enhancements? As part of the thesis I study the current methods and delineate the ideas of these methods and illuminate their shortcomings which posed challenges during practical implementation. Based on the experiments conducted and the research carried out, I propose an approach which illustrates the business problems with higher accuracy and provides a broader view of the process. It is then applied to different case studies highlighting the different aspects to this approach.

ContributorsAnand, Aneeth (Author) / Liu, Huan (Thesis advisor) / Kempf, Karl G. (Thesis advisor) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)

Created2012

STL on limited local memory (LLM) multi-core processors

Description

Limited Local Memory (LLM) multicore architectures are promising powerefficient architectures will scalable memory hierarchy. In LLM multicores, each core can access only a small local memory. Accesses to a large shared global memory can only be made explicitly through Direct Memory Access (DMA) operations. Standard Template Library (STL) is a…

Limited Local Memory (LLM) multicore architectures are promising powerefficient architectures will scalable memory hierarchy. In LLM multicores, each core can access only a small local memory. Accesses to a large shared global memory can only be made explicitly through Direct Memory Access (DMA) operations. Standard Template Library (STL) is a powerful programming tool and is widely used for software development. STLs provide dynamic data structures, algorithms, and iterators for vector, deque (double-ended queue), list, map (red-black tree), etc. Since the size of the local memory is limited in the cores of the LLM architecture, and data transfer is not automatically supported by hardware cache or OS, the usage of current STL implementation on LLM multicores is limited. Specifically, there is a hard limitation on the amount of data they can handle. In this article, we propose and implement a framework which manages the STL container classes on the local memory of LLM multicore architecture. Our proposal removes the data size limitation of the STL, and therefore improves the programmability on LLM multicore architectures with little change to the original program. Our implementation results in only about 12%-17% increase in static library code size and reasonable runtime overheads.

ContributorsLu, Di (Author) / Shrivastava, Aviral (Thesis advisor) / Chatha, Karamvir (Committee member) / Dasgupta, Partha (Committee member) / Arizona State University (Publisher)

Created2012

Improving CGRA utilization by enabling multi-threading for power-efficient embedded systems

Description

Performance improvements have largely followed Moore's Law due to the help from technology scaling. In order to continue improving performance, power-efficiency must be reduced. Better technology has improved power-efficiency, but this has a limit. Multi-core architectures have been shown to be an additional aid to this crusade of increased power-efficiency.…

Performance improvements have largely followed Moore's Law due to the help from technology scaling. In order to continue improving performance, power-efficiency must be reduced. Better technology has improved power-efficiency, but this has a limit. Multi-core architectures have been shown to be an additional aid to this crusade of increased power-efficiency. Accelerators are growing in popularity as the next means of achieving power-efficient performance. Accelerators such as Intel SSE are ideal, but prove difficult to program. FPGAs, on the other hand, are less efficient due to their fine-grained reconfigurability. A middle ground is found in CGRAs, which are highly power-efficient, but largely programmable accelerators. Power-efficiencies of 100s of GOPs/W have been estimated, more than 2 orders of magnitude greater than current processors. Currently, CGRAs are limited in their applicability due to their ability to only accelerate a single thread at a time. This limitation becomes especially apparent as multi-core/multi-threaded processors have moved into the mainstream. This limitation is removed by enabling multi-threading on CGRAs through a software-oriented approach. The key capability in this solution is enabling quick run-time transformation of schedules to execute on targeted portions of the CGRA. This allows the CGRA to be shared among multiple threads simultaneously. Analysis shows that enabling multi-threading has very small costs but provides very large benefits (less than 1% single-threaded performance loss but nearly 300% CGRA throughput increase). By increasing dynamism of CGRA scheduling, system performance is shown to increase overall system performance of an optimized system by almost 350% over that of a single-threaded CGRA and nearly 20x faster than the same system with no CGRA in a highly threaded environment.

ContributorsPager, Jared (Author) / Shrivastava, Aviral (Thesis advisor) / Gupta, Sandeep (Committee member) / Speyer, Gil (Committee member) / Arizona State University (Publisher)

Created2011

Efficient implementation of a low cost object tracking system

Description

Object tracking is an important topic in multimedia, particularly in applications such as teleconferencing, surveillance and human-computer interface. Its goal is to determine the position of objects in images continuously and reliably. The key steps involved in object tracking are foreground detection to detect moving objects, clustering to enable representation…

Object tracking is an important topic in multimedia, particularly in applications such as teleconferencing, surveillance and human-computer interface. Its goal is to determine the position of objects in images continuously and reliably. The key steps involved in object tracking are foreground detection to detect moving objects, clustering to enable representation of an object by its centroid, and tracking the centroids to determine the motion parameters.

In this thesis, a low cost object tracking system is implemented on a hardware accelerator that is a warp based processor for SIMD/Vector style computations. First, the different foreground detection techniques are explored to figure out the best technique that involves the least number of computations without compromising on the performance. It is found that the Gaussian Mixture Model proposed by Zivkovic gives the best performance with respect to both accuracy and number of computations. Pixel level parallelization is applied to this algorithm and it is mapped onto the hardware accelerator.

Next, the different clustering algorithms are studied and it is found that while DBSCAN is highly accurate and robust to outliers, it is very computationally intensive. In contrast, K-means is computationally simple, but it requires that the number of means to be specified beforehand. So, a new clustering algorithm is proposed that uses a combination of both DBSCAN and K-means algorithm along with a diagnostic algorithm on K-means to estimate the right number of centroids. The proposed hybrid algorithm is shown to be faster than the DBSCAN algorithm by ~2.5x with minimal loss in accuracy. Also, the 1D Kalman filter is implemented assuming constant acceleration model. Since the computations involved in Kalman filter is just a set of recursive equations, the sequential model in itself exhibits good performance, thereby alleviating the need for parallelization. The tracking performance of the low cost implementation is evaluated against the sequential version. It is found that the proposed hybrid algorithm performs very close to the reference algorithm based on the DBSCAN algorithm.

ContributorsSasikumar, Asha (Author) / Chakrabarti, Chaitali (Thesis advisor) / Ogras, Umit Y. (Committee member) / Suppapola, Antonia Pappandreau (Committee member) / Arizona State University (Publisher)

Created2015

SmartGateway Framework

Description

Cisco estimates that by 2020, 50 billion devices will be connected to the Internet. But 99% of the things today remain isolated and unconnected. Different connectivity protocols, proprietary access, varied device characteristics, security concerns are the main reasons for that isolated state. This project aims at designing and building a…

Cisco estimates that by 2020, 50 billion devices will be connected to the Internet. But 99% of the things today remain isolated and unconnected. Different connectivity protocols, proprietary access, varied device characteristics, security concerns are the main reasons for that isolated state. This project aims at designing and building a prototype gateway that exposes a simple and intuitive HTTP Restful interface to access and manipulate devices and the data that they produce while addressing most of the issues listed above. Along with manipulating devices, the framework exposes sensor data in such a way that it can be used to create applications like rules or events that make the home smarter. It also allows the user to represent high-level knowledge by aggregating the low-level sensor data. This high-level representation can be considered as a property of the environment or object rather than the sensor itself which makes interpreting the values more intuitive and accessible.

ContributorsNair, Shankar (Author) / Lee, Yann-Hang (Thesis advisor) / Lee, Joohyung (Committee member) / Fainekos, Georgios (Committee member) / Arizona State University (Publisher)

Created2015

Stochastic simulation framework for a data mule network in the Amazon delta

Description

This report investigates the improvement in the transmission throughput, when fountain codes are used in opportunistic data routing, for a proposed delay tolerant network to connect remote and isolated communities in the Amazon region in Brazil, to the main city of that area. To extend healthcare facilities to the remote…

This report investigates the improvement in the transmission throughput, when fountain codes are used in opportunistic data routing, for a proposed delay tolerant network to connect remote and isolated communities in the Amazon region in Brazil, to the main city of that area. To extend healthcare facilities to the remote and isolated communities, on the banks of river Amazon in Brazil, the network [7] utilizes regularly schedules boats as data mules to carry data from one city to other.

Frequent thunder and rain storms, given state of infrastructure and harsh geographical terrain; all contribute to increase in chances of massages not getting delivered to intended destination. These regions have access to medical facilities only through sporadic visits from medical team from the main city in the region, Belem. The proposed network uses records for routine clinical examinations such as ultrasounds on pregnant women could be sent to the doctors in Belem for evaluation.

However, due to the lack of modern communication infrastructure in these communities and unpredictable boat schedules due to delays and breakdowns, as well as high transmission failures due to the harsh environment in the region, mandate the design of robust delay-tolerant routing algorithms. The work presented here incorporates the unpredictability of the Amazon riverine scenario into the simulation model - accounting for boat mechanical failure in boats leading to delays/breakdowns, possible decrease in transmission speed due to rain and individual packet losses.

Extensive simulation results are presented, to evaluate the proposed approach and to verify that the proposed solution [7] could be used as a viable mode of communication, given the lack of available options in the region. While the simulation results are focused on remote healthcare applications in the Brazilian Amazon, we envision that our approach may also be used for other remote applications, such as distance education, and other similar scenarios.

ContributorsAgarwal, Rachit (Author) / Richa, Andrea (Thesis advisor) / Dasgupta, Partha (Committee member) / Johnson, Thienne (Committee member) / Arizona State University (Publisher)

Created2015

Increasing the Effectiveness of Error Messages in a Computer Programming and Simulation Tool

Description

Each programming language has a compiler associated with it which helps to identify logical or syntactical errors in the program. These compiler error messages play important part in the form of formative feedback for the programmer. Thus, the error messages should be constructed carefully, considering the affective and cognitive needs…

Each programming language has a compiler associated with it which helps to identify logical or syntactical errors in the program. These compiler error messages play important part in the form of formative feedback for the programmer. Thus, the error messages should be constructed carefully, considering the affective and cognitive needs of programmers. This is especially true for systems that are used in educational settings, as the messages are typically seen by students who are novice programmers. If the error messages are hard to understand then they might discourage students from understanding or learning the programming language. The primary goal of this research is to identify methods to make the error messages more effective so that students can understand them better and simultaneously learn from their mistakes. This study is focused on understanding how the error message affects the understanding of the error and the approach students take to solve the error. In this study, three types of error messages were provided to the students. The first type is Default type error message which is an assembler centric error message. The second type is Link type error message which is a descriptive error message along with a link to the appropriate section of the PLP manual. The third type is Example type error message which is again a descriptive error message with an example of the similar type of error along with correction step. All these error types were developed for the PLP assembly language. A think-aloud experiment was designed and conducted on the students. The experiment was later transcribed and coded to understand different approach students take to solve different type of error message. After analyzing the result of the think-aloud experiment it was found that student read the Link type error message completely and they understood and learned from the error message to solve the error. The results also indicated that Link type was more helpful compare to other types of error message. The Link type made error solving process more effective compared to other error types.

ContributorsTanpure, Siddhant Bapusaheb (Author) / Sohoni, Sohum (Thesis advisor) / Gary, Kevin A (Committee member) / Craig, Scotty D. (Committee member) / Arizona State University (Publisher)

Created2018

Control for Resonant Microbeam Vibrotactile Haptic Displays

Description

The world’s population is currently 9% visually impaired. Medical sciences do not have a biological fix that can cure this visual impairment. Visually impaired people are currently being assisted with biological fixes or assistive devices. The current assistive devices are limited in size as well as resolution. This thesis presents…

The world’s population is currently 9% visually impaired. Medical sciences do not have a biological fix that can cure this visual impairment. Visually impaired people are currently being assisted with biological fixes or assistive devices. The current assistive devices are limited in size as well as resolution. This thesis presents the development and experimental validation of a control system for a new vibrotactile haptic display that is currently in development. In order to allow the vibrotactile haptic display to be used to represent motion, the control system must be able to change the image displayed at a rate of at least 30 frames/second. In order to achieve this, this thesis introduces and investigates the use of three improvements: threading, change filtering, and wave libraries. Through these methods, it is determined that an average of 40 frames/second can be achieved.

ContributorsKIM, KENDRA (Author) / Sodemann, Angela (Thesis advisor) / Robertson, John (Committee member) / Bansal, Ajay (Committee member) / Arizona State University (Publisher)

Created2018

Graph Search as a Feature in Imperative/Procedural Programming Languages

Description

Graph theory is a critical component of computer science and software engineering, with algorithms concerning graph traversal and comprehension powering much of the largest problems in both industry and research. Engineers and researchers often have an accurate view of their target graph, however they struggle to implement a correct, and…

Graph theory is a critical component of computer science and software engineering, with algorithms concerning graph traversal and comprehension powering much of the largest problems in both industry and research. Engineers and researchers often have an accurate view of their target graph, however they struggle to implement a correct, and efficient, search over that graph.

To facilitate rapid, correct, efficient, and intuitive development of graph based solutions we propose a new programming language construct - the search statement. Given a supra-root node, a procedure which determines the children of a given parent node, and optional definitions of the fail-fast acceptance or rejection of a solution, the search statement can conduct a search over any graph or network. Structurally, this statement is modelled after the common switch statement and is put into a largely imperative/procedural context to allow for immediate and intuitive development by most programmers. The Go programming language has been used as a foundation and proof-of-concept of the search statement. A Go compiler is provided which implements this construct.

ContributorsHenderson, Christopher (Author) / Bansal, Ajay (Thesis advisor) / Lindquist, Timothy (Committee member) / Acuna, Ruben (Committee member) / Arizona State University (Publisher)

Created2018

GeoSparkSim: A Scalable Microscopic Road Network Traffic Simulator Based on Apache Spark

Description

Researchers and practitioners have widely studied road network traffic data in different areas such as urban planning, traffic prediction and spatial-temporal databases. For instance, researchers use such data to evaluate the impact of road network changes. Unfortunately, collecting large-scale high-quality urban traffic data requires tremendous efforts because participating vehicles must…

Researchers and practitioners have widely studied road network traffic data in different areas such as urban planning, traffic prediction and spatial-temporal databases. For instance, researchers use such data to evaluate the impact of road network changes. Unfortunately, collecting large-scale high-quality urban traffic data requires tremendous efforts because participating vehicles must install Global Positioning System(GPS) receivers and administrators must continuously monitor these devices. There have been some urban traffic simulators trying to generate such data with different features. However, they suffer from two critical issues (1) Scalability: most of them only offer single-machine solution which is not adequate to produce large-scale data. Some simulators can generate traffic in parallel but do not well balance the load among machines in a cluster. (2) Granularity: many simulators do not consider microscopic traffic situations including traffic lights, lane changing, car following. This paper proposed GeoSparkSim, a scalable traffic simulator which extends Apache Spark to generate large-scale road network traffic datasets with microscopic traffic simulation. The proposed system seamlessly integrates with a Spark-based spatial data management system, GeoSpark, to deliver a holistic approach that allows data scientists to simulate, analyze and visualize large-scale urban traffic data. To implement microscopic traffic models, GeoSparkSim employs a simulation-aware vehicle partitioning method to partition vehicles among different machines such that each machine has a balanced workload. The experimental analysis shows that GeoSparkSim can simulate the movements of 200 thousand cars over an extensive road network (250 thousand road junctions and 300 thousand road segments).

ContributorsFu, Zishan (Author) / Sarwat, Mohamed (Thesis advisor) / Pedrielli, Giulia (Committee member) / Sefair, Jorge (Committee member) / Arizona State University (Publisher)

Created2019

Filtering by