Matching Items (7)

FPGAs as an Edge Computing Solution

Description

As the Internet of Things continues to expand, not only must our computing power grow
alongside it, our very approach must evolve. While the recent trend has been to centralize our
computing resources in the cloud, it now looks beneficial to push more computing power
towards the “edge” with so-called edge computing, reducing the immense strain on cloud
servers and the latency experienced by IoT devices. A new computing paradigm also brings
new opportunities for innovation, and one such innovation could be the use of FPGAs as edge
servers. In this research project, I learn the design flow for developing OpenCL kernels and
custom FPGA BSPs. Using these tools, I investigate the viability of using FPGAs as standalone
edge computing devices. I conclude that, although the technology is a great fit, the current
need for dynamically reprogrammable FPGAs to be closely coupled with a host CPU is
holding them back from this purpose. I propose a modification to the architecture of the Intel
Arria 10 GX that would allow it to be decoupled from its host CPU, allowing it to truly serve as a
viable edge computing solution.

Date Created
  • 2019-05

Scaling Up Large-scale Sparse Learning and Its Application to Medical Imaging

Description

Large-scale $\ell_1$-regularized loss minimization problems arise in high-dimensional applications such as compressed sensing and high-dimensional supervised learning, including classification and regression problems. In many applications, it remains challenging to apply the sparse learning model to large-scale problems that have massive data samples with high-dimensional features. One popular and promising strategy is to scale up the optimization problem in parallel. Parallel solvers run on multiple cores of a shared-memory system or in a distributed environment to speed up the computation, but their practical use is limited by the huge dimension of the feature space and by synchronization problems.

In this dissertation, I pursue this direction, focusing on scaling up the optimization of sparse learning for supervised and unsupervised learning problems. For supervised learning, I first propose an asynchronous parallel solver to optimize the large-scale sparse learning model in a multithreading environment. Moreover, I propose a distributed framework to conduct the learning process when the dataset is stored in a distributed manner across different machines. The proposed model is then extended to the study of genetic risk factors for Alzheimer's Disease (AD) across different research institutions, integrating a group feature selection framework to rank the top risk SNPs for AD. For the unsupervised learning problem, I propose a highly efficient solver, termed Stochastic Coordinate Coding (SCC), that scales up the optimization of dictionary learning and sparse coding problems. A common issue in medical imaging research is that the longitudinal features of patients at different time points are best studied together. To further improve the dictionary learning model, I propose a multi-task dictionary learning method that learns the different tasks simultaneously and uses shared and individual dictionaries to encode both consistent and changing imaging features.
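
As an illustration of the kind of problem described above, the following minimal sketch solves a small $\ell_1$-regularized least-squares problem by serial coordinate descent with soft-thresholding. It is not the asynchronous parallel solver or the SCC method from the dissertation; the squared loss, the function names `soft_threshold` and `lasso_coordinate_descent`, and the toy data are assumptions made purely for illustration.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: shrink z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, sweeps=100):
    """Minimize 0.5 * ||y - X w||^2 + lam * ||w||_1 one coordinate at a time.

    X is a list of rows, y a list of targets. Serial illustration only.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    residual = list(y)  # equals y - X w for the initial w = 0
    for _ in range(sweeps):
        for j in range(d):
            col = [X[i][j] for i in range(n)]
            norm_sq = sum(c * c for c in col)
            if norm_sq == 0.0:
                continue  # an all-zero feature never enters the model
            # Correlation of feature j with the residual that excludes w[j].
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual))
            new_wj = soft_threshold(rho, lam) / norm_sq
            delta = new_wj - w[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                w[j] = new_wj
    return w


# With an identity design matrix the solution is soft_threshold(y_j, lam).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(lasso_coordinate_descent(identity, [3.0, -0.5, 1.5], lam=1.0))  # [2.0, 0.0, 0.5]
```

The second coefficient is driven exactly to zero, which is the sparsity effect that makes $\ell_1$ regularization attractive in high-dimensional settings. Each coordinate update touches only one feature column, which is the property that parallel and asynchronous variants typically exploit.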

Date Created
  • 2017

Parallel optimization of polynomials for large-scale problems in stability and control

Description

In this thesis, we focus on some of the NP-hard problems in control theory. Thanks to the converse Lyapunov theory, these problems can often be modeled as optimization over polynomials. To avoid the problem of intractability, we establish a trade-off between accuracy and complexity. In particular, we develop a sequence of tractable optimization problems, in the form of Linear Programs (LPs) and/or Semi-Definite Programs (SDPs), whose solutions converge to the exact solution of the NP-hard problem. However, the computational and memory complexity of these LPs and SDPs grows exponentially as the sequence progresses, meaning that improving the accuracy of the solutions requires solving SDPs with tens of thousands of decision variables and constraints. Setting up and solving such problems is a significant challenge. Existing optimization algorithms and software are designed only for desktop computers or small cluster computers, machines which do not have sufficient memory for solving such large SDPs. Moreover, the speed-up of these algorithms does not scale beyond dozens of processors. This is the reason we seek parallel algorithms for setting up and solving large SDPs on large cluster computers and/or supercomputers.

We propose parallel algorithms for stability analysis of two classes of systems: 1) Linear systems with a large number of uncertain parameters; 2) Nonlinear systems defined by polynomial vector fields. First, we develop a distributed parallel algorithm which applies Polya's and/or Handelman's theorems to some variants of parameter-dependent Lyapunov inequalities with parameters defined over the standard simplex. The result is a sequence of SDPs which possess a block-diagonal structure. We then develop a parallel SDP solver which exploits this structure in order to map the computation, memory and communication to a distributed parallel environment. Numerical tests on a supercomputer demonstrate the ability of the algorithm to efficiently utilize hundreds and potentially thousands of processors, and to analyze systems with state spaces of 100 or more dimensions. Furthermore, we extend our algorithms to analyze robust stability over more complicated geometries such as hypercubes and arbitrary convex polytopes. Our algorithms can be readily extended to address a wide variety of problems in control, such as $H_\infty$ synthesis for systems with parametric uncertainty and computing control Lyapunov functions.
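
A small sketch may help show how Polya's theorem yields a sequence of checkable conditions. For a homogeneous polynomial in two variables that is positive on the standard simplex, multiplying by $(x + y)^d$ eventually produces only positive coefficients, and that condition is trivial to verify. The scalar polynomial, the coefficient-list representation, and the function names below are assumptions for illustration; the thesis applies the idea to parameter-dependent Lyapunov inequalities, which lead to SDPs rather than scalar checks.

```python
def multiply_by_x_plus_y(coeffs):
    """Multiply a binary form by (x + y).

    coeffs[k] is the coefficient of x**(n - k) * y**k for a form of degree n.
    """
    result = [0] * (len(coeffs) + 1)
    for k, c in enumerate(coeffs):
        result[k] += c      # contribution of c * x^(n-k) * y^k times x
        result[k + 1] += c  # contribution of c * x^(n-k) * y^k times y
    return result


def polya_exponent(coeffs, max_d=50):
    """Return the smallest d <= max_d for which (x + y)**d * p has all
    coefficients strictly positive, together with those coefficients.
    Returns None if no such d is found within the limit."""
    current = list(coeffs)
    for d in range(max_d + 1):
        if all(c > 0 for c in current):
            return d, current
        current = multiply_by_x_plus_y(current)
    return None


# p(x, y) = x^2 - x*y + y^2 is positive on the simplex but has a negative coefficient.
print(polya_exponent([1, -1, 1]))  # (3, [1, 2, 1, 1, 2, 1])
```

The number of coefficients grows with each multiplication, which mirrors, in a very small way, the growth in problem size along the sequence described in the abstract.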

Date Created
  • 2016

Asymptotic and numerical algorithms in applied electromagnetism

Description

Asymptotic and numerical methods are popular in applied electromagnetism. In this work, the two methods are applied to collimated antennas and calibration targets, respectively. As an asymptotic method, the diffracted Gaussian beam approach (DGBA) is developed for the design and simulation of collimated multi-reflector antenna systems, based upon Huygens' principle and an independent Gaussian beam expansion, referred to as the frames. Simulating a reflector antenna hundreds to thousands of wavelengths in size requires 1E7 to 1E9 independent Gaussian beams. To this end, high performance parallel computing is implemented, based on the Message Passing Interface (MPI). The second part of the dissertation treats plane wave scattering from a target consisting of a doubly periodic array of sharp conducting circular cones, solved by the magnetic field integral equation (MFIE) via a Coiflet-based Galerkin procedure in conjunction with the Floquet theorem. Owing to the orthogonality, compact support, continuity and smoothness of the Coiflets, well-conditioned impedance matrices are obtained. The majority of the matrix entries are obtained in the spectral domain by one-point quadrature with high precision. For the oscillatory entries, spatial domain computation is applied, bypassing the slow convergence of the spectral summation of the non-damping propagating modes. The simulation results are compared with the solutions from FEKO, a commercial software package based on RWG-MLFMA, and excellent agreement is observed.

Date Created
  • 2012

A Parallel Adaptive Mesh Refinement Library for Cartesian Meshes

Description

This dissertation introduces FARCOM (Fortran Adaptive Refiner for Cartesian Orthogonal Meshes), a new general library for adaptive mesh refinement (AMR) based on an unstructured hexahedral mesh framework. As a result of the underlying unstructured formulation, the refinement and coarsening operators of the library operate on a single-cell basis and perform in-situ replacement of old mesh elements. This approach allows for h-refinement without the memory and computational expense of calculating masked coarse grid cells, as is done in traditional patch-based AMR approaches, and enables unstructured flow solvers to have access to the automated domain generation capabilities usually only found in tree AMR formulations.

The library is written to let the user determine where to refine and coarsen through custom refinement selector functions for static mesh generation and dynamic mesh refinement, and can handle smooth fields (such as level sets) or localized markers (e.g. density gradients). The library was parallelized with the use of the Zoltan graph-partitioning library, which provides interfaces to both a graph partitioner (PT-Scotch) and a partitioner based on Hilbert space-filling curves. The partitioned adjacency graph, mesh data, and solution variable data are then packed and distributed across all MPI ranks in the simulation, which then regenerate the mesh, generate domain decomposition ghost cells, and create communication caches.

Scalability runs were performed using a LeVeque wave propagation scheme for solving the Euler equations. The results of simulations on up to 1536 cores indicate that the parallel performance is highly dependent on the graph partitioner being used, and differences between the partitioners were analyzed. FARCOM is found to perform better when each MPI rank has more than 60,000 cells.
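
To make the idea of partitioning along a Hilbert space-filling curve concrete, the sketch below orders the cells of a small uniform 2D grid by their Hilbert index and cuts the ordering into contiguous chunks, one per rank. This is not Zoltan's API and not FARCOM code; the uniform grid, the function names `hilbert_index` and `partition_cells`, and the equal-size chunking are assumptions for illustration only.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its
    position along a Hilbert curve, using the standard bit-wise construction."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented consistently.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def partition_cells(n, num_ranks):
    """Split the n*n cells into num_ranks contiguous pieces of the curve."""
    cells = sorted(
        ((x, y) for x in range(n) for y in range(n)),
        key=lambda cell: hilbert_index(n, cell[0], cell[1]),
    )
    chunk = -(-len(cells) // num_ranks)  # ceiling division
    return [cells[i * chunk:(i + 1) * chunk] for i in range(num_ranks)]


parts = partition_cells(8, 4)
for rank, owned in enumerate(parts):
    print(rank, len(owned), owned[0], owned[-1])
```

Because consecutive cells along the curve are spatial neighbours, each contiguous chunk tends to form a compact region, which keeps the boundary between ranks (and hence the ghost-cell exchange) comparatively small.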

Date Created
  • 2019

Testing independence of parallel pseudorandom number streams: incorporating the data's multivariate nature

Description

Parallel Monte Carlo applications require the pseudorandom numbers used on each processor to be independent in a probabilistic sense. The TestU01 software package is the standard testing suite for detecting stream dependence and other properties that make certain pseudorandom generators ineffective in parallel (as well as serial) settings. TestU01 employs two basic schemes for testing parallel-generated streams. The first applies serial tests to the individual streams and then tests the resulting P-values for uniformity. The second turns all the parallel-generated streams into one long vector and then applies serial tests to the resulting concatenated stream. Various forms of stream dependence can be missed by each approach because neither one fully addresses the multivariate nature of the accumulated data when generators are run in parallel. This dissertation identifies these potential faults in the parallel testing methodologies of TestU01 and investigates two different methods to better detect inter-stream dependencies: correlation-motivated multivariate tests and tests based on vector time series. These methods have been implemented in an extension to TestU01 built in C++, and the unique aspects of this extension are discussed. A variety of different generation scenarios are then examined using the TestU01 suite in concert with the extension. This enhanced software package is found to better detect certain forms of inter-stream dependencies than the original TestU01 suites of tests.
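
The gap described above can be illustrated with a deliberately simple sketch: two streams that each look fine in isolation but are completely dependent on each other, and a cross-stream correlation that exposes the problem. This does not use TestU01 or the dissertation's C++ extension; the crude mean-based per-stream check, the Pearson correlation as the inter-stream statistic, the duplicated-seed scenario, and all function names are assumptions for illustration.

```python
import random


def make_stream(seed, length):
    """One pseudorandom stream, as a single processor might generate it."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(length)]


def looks_uniform(stream, tolerance=0.02):
    """Crude stand-in for a serial test: is the sample mean close to 0.5?"""
    return abs(sum(stream) / len(stream) - 0.5) < tolerance


def pearson_correlation(a, b):
    """Sample Pearson correlation between two equally long streams."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


stream_a = make_stream(seed=1, length=10_000)
stream_b = make_stream(seed=1, length=10_000)  # accidental duplicate seed
stream_c = make_stream(seed=2, length=10_000)  # separately seeded stream

# Each stream passes the per-stream check on its own ...
print(looks_uniform(stream_a), looks_uniform(stream_b), looks_uniform(stream_c))
# ... but only a statistic computed across streams reveals the dependence.
print(pearson_correlation(stream_a, stream_b), pearson_correlation(stream_a, stream_c))
```

A per-stream test cannot distinguish the duplicated stream from a good one, since it never looks at two streams jointly; that is the motivation for tests that treat the parallel output as multivariate data.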

Date Created
  • 2013

Trajectory sensitivity based power system dynamic security assessment

Description

Contemporary methods for dynamic security assessment (DSA) mainly rely on time domain simulations to explore the influence of large disturbances in a power system. These methods are computationally intensive, especially when the system operating point changes continually. The trajectory sensitivity method, when implemented and utilized as a complement to the existing DSA time domain simulation routine, can provide valuable insights into the system variation in response to system parameter changes. The implementation of the trajectory sensitivity analysis is based on an open source power system analysis toolbox called PSAT. Eight categories of sensitivity elements have been implemented and tested. The accuracy assessment of the implementation demonstrates the validity of both the theory and the implementation. The computational burden introduced by the additional sensitivity equations is relieved by two innovative methods: one employs a cluster to perform the sensitivity calculations in parallel; the other develops a modified very dishonest Newton method in conjunction with the latest sparse matrix processing technology. The relation between the linear approximation accuracy and the perturbation size is also studied numerically. It is found that there is a fixed connection between the linear approximation accuracy and the perturbation size. This finding can therefore serve as a general application guide for evaluating the accuracy of the linear approximation. The applicability of the trajectory sensitivity approach to a large realistic network has been demonstrated in detail. This research work applies the trajectory sensitivity analysis method to the Western Electricity Coordinating Council (WECC) system. Several typical power system dynamic security problems have been addressed, including the transient angle stability problem, the voltage stability problem considering load modeling uncertainty, and the transient stability constrained interface real power flow limit calculation. In addition, a method based on the trajectory sensitivity approach and model predictive control has been developed for determining an under-frequency load shedding strategy for real time stability assessment. These applications have shown the great efficacy and accuracy of the trajectory sensitivity method in handling these traditional power system stability problems.
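
The mechanics of trajectory sensitivity can be sketched on a toy problem: the sensitivity equation is integrated alongside the system equation, and the result is used to predict the trajectory for a perturbed parameter without re-simulating. The scalar system dx/dt = -a*x, the forward Euler integrator, and the function names below are assumptions for illustration; they are not PSAT code, not a power system model, and not the thesis's Newton-based solver.

```python
def simulate_with_sensitivity(a, x0=1.0, t_end=1.0, steps=1000):
    """Integrate dx/dt = -a*x together with its sensitivity s = dx/da.

    The sensitivity equation is ds/dt = (df/dx)*s + df/da = -a*s - x,
    with s(0) = 0 because the initial state does not depend on a.
    """
    h = t_end / steps
    x, s = x0, 0.0
    for _ in range(steps):
        # Both right-hand sides use the state from the start of the step.
        x, s = x + h * (-a * x), s + h * (-a * s - x)
    return x, s


def linearization_error(a, delta):
    """Gap between the linear prediction and a full re-simulation at a + delta."""
    x_nominal, sensitivity = simulate_with_sensitivity(a)
    x_predicted = x_nominal + sensitivity * delta
    x_actual, _ = simulate_with_sensitivity(a + delta)
    return abs(x_actual - x_predicted)


for delta in (0.01, 0.02, 0.04):
    print(delta, linearization_error(1.0, delta))
```

In this toy system the error of the linear prediction grows roughly fourfold each time the perturbation doubles, as expected from a first-order Taylor expansion. Whether the relation reported in the thesis for power system models takes the same form is not stated in the abstract.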

Date Created
  • 2012