Matching Items (11)

A Novel Historical Safety Metric for Evaluating Road Networks

Description

37,461 automobile accident fatalities occurred in the United States in 2016 ("Quick Facts 2016", 2017). Improving the safety of roads has traditionally been approached by governmental agencies, including the National Highway Traffic Safety Administration and state Departments of Transportation. In past literature, automobile crash data are analyzed using time-series prediction techniques to identify road segments and/or intersections likely to experience future crashes (Lord & Mannering, 2010). After dangerous zones have been identified, road modifications can be implemented to improve public safety. This project introduces a historical safety metric for evaluating the relative danger of roads in a road network. The historical safety metric can be used to update the routing choices of individual drivers, improving public safety by avoiding historically more dangerous routes. The metric is constructed using crash frequency, severity, location, and traffic information. An analysis of publicly available crash and traffic data in Allegheny County, Pennsylvania is used to generate the historical safety metric for a specific road network. Methods for evaluating routes based on the presented historical safety metric are included, using the Mann-Whitney U test to evaluate the significance of routing decisions. The evaluation method presented requires that routes have at least 20 crashes to be compared with significance testing. The safety of the road network is visualized using a heatmap to present the distribution of the metric throughout Allegheny County.
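
As an illustration of the route-comparison step (the metric construction itself is specific to the thesis and not reproduced here), the following Python sketch compares hypothetical per-segment safety scores for two routes using SciPy's Mann-Whitney U test. The route data and scores are invented for the example.

```python
# Hedged sketch: comparing two routes' historical safety scores with the
# Mann-Whitney U test. The per-segment scores below are hypothetical; the
# thesis constructs its metric from crash frequency, severity, location,
# and traffic data, which is not reproduced here.
from scipy.stats import mannwhitneyu

# Hypothetical crash-severity scores for each road segment on two routes
# (higher = historically more dangerous). Each route should aggregate at
# least 20 crashes for the significance test to be meaningful.
route_a = [0.12, 0.40, 0.08, 0.55, 0.31, 0.22, 0.47, 0.19, 0.36, 0.28]
route_b = [0.05, 0.11, 0.09, 0.14, 0.07, 0.16, 0.10, 0.12, 0.08, 0.13]

# Two-sided test of whether the two routes' segment scores differ
stat, p_value = mannwhitneyu(route_a, route_b, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Routes differ significantly; prefer the one with lower scores.")
```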

Date Created
  • 2017-12

Analyzing LinkedIn Profiles Using Machine Learning

Description

Understanding the skills required to work in an industry is a difficult task with many potential uses. By predicting the industry of a person based on their skills, professional social networks could improve search with automated tagging, advertisers could target more carefully, and students could better find a career path that fits their skill set. The aim of this project is to apply deep learning to the world of professional networking. Deep learning is a type of machine learning that has recently been making breakthroughs in the analysis of complex datasets that previously were not of much use. Initially the goal was to apply deep learning to the skills-to-company relationship, but a lack of quality data required a change to the skills-to-industry relationship. To accomplish the new goal, a database of LinkedIn profiles belonging to various industries was gathered and processed. From this dataset a model was created that takes a list of skills and outputs an industry that people with those skills work in. Such a model has value in the insights it provides, allowing users to determine which industry fits a skill set, identify key skills for industries, and locate the industries in which candidates may best fit. Various models were trained and tested on a skills-to-industry dataset. The model was able to learn similarities between industries and predict the most likely industries for each profile's skill set.
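
To make the skills-to-industry mapping concrete, here is a minimal Python sketch of such a classifier. The skill vocabulary, profiles, and labels are invented, and a linear model stands in for the thesis's deep learning architecture, which is not described in enough detail here to reproduce.

```python
# Hedged sketch of a skills-to-industry classifier. The profiles, skills,
# and industry labels are illustrative; the thesis's actual model and
# LinkedIn dataset are not reproduced.
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: each profile is a set of skills plus an industry label
profiles = [
    (["python", "sql", "statistics"], "data science"),
    (["java", "spring", "sql"], "software"),
    (["seo", "copywriting", "analytics"], "marketing"),
    (["python", "machine learning", "statistics"], "data science"),
    (["javascript", "react", "css"], "software"),
]
skills, industries = zip(*profiles)

# Multi-hot encode the skill sets so each profile becomes a fixed-length vector
encoder = MultiLabelBinarizer()
X = encoder.fit_transform(skills)

# A linear baseline; a deep model, as in the thesis, would replace this step
clf = LogisticRegression(max_iter=1000).fit(X, industries)

query = encoder.transform([["python", "statistics"]])
print(clf.predict(query))        # most likely industry
print(clf.predict_proba(query))  # probabilities over all industries
```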

Date Created
  • 2017-12

Project Resolve: Reducing Recidivism using Advanced Analytics

Description

The United States has an institutional prison system built on the principle of retributive justice combined with racial prejudice that, despite countless efforts at reform, currently holds 2.3 million individuals, primarily minorities, behind bars. This institution has remained largely unchanged; meanwhile, 83.4% of those who enter the system will return within one decade, and it currently costs nearly $39 billion each year (Alper 4). Because the prison institution consistently fails to address the root causes of crime, there is a great need to reconsider the approach taken toward those who break our nation’s laws, with the dual purpose of enhancing freedom and reducing crime. This paper outlines an original theoretical framework being implemented by Project Resolve that can help to identify and implement solutions for our prison system without reliance on political, institutional, or societal approval. The method focuses on three core goals: the first is to connect as much of the data surrounding prisoners and the formerly incarcerated as possible; the second is to use modern analytic approaches to analyze and propose superior solutions for rehabilitation; the third is to shift focus to public interest technology both inside prisons and in the parole process. The combination of these objectives has the potential to reduce recidivism significantly, deter criminals before an initial offense, and implement a truly equitable prison institution.

Date Created
  • 2020-05

Exploring the Influence of Visualized Data: Inclusion and Collaboration Between University Members

Description

Visualizations are an integral component for communicating and evaluating modern networks. As data becomes more complex, infographics require a balance between visual noise and effective storytelling that is often restricted by layouts unsuitable for scalability. The challenge then rests upon researchers to structure their information in a way that allows for flexible, transparent illustration. We propose network graphs as an effective alternative to traditional charts, which are unable to look past numeric data, for demonstrating community behavior. In this paper, we explore methods for manipulating, processing, cleaning, and aggregating data in Python, a programming language tailored for handling structured data, which can then be formatted for analysis and modeling of social network tendencies in Gephi. We implement this approach by applying the Fruchterman-Reingold force-directed layout algorithm to datasets of Arizona State University’s research and collaboration network. The result is a visualization that analyzes the university’s infrastructure by providing insight into community behaviors between colleges. Furthermore, we highlight how the flexibility of this visualization provides a foundation for specific use cases by demonstrating centrality measures to find important liaisons that connect distant communities.
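
The paper performs its layout in Gephi, but the same pipeline can be sketched end to end in Python: NetworkX's spring_layout implements the Fruchterman-Reingold force-directed algorithm the paper names, and betweenness centrality is one standard measure for surfacing "liaison" nodes. The toy collaboration edges below are invented for illustration.

```python
# Hedged sketch: force-directed layout plus centrality on a toy
# collaboration network. Edges are invented; the paper's ASU dataset
# and Gephi workflow are not reproduced.
import networkx as nx
import matplotlib.pyplot as plt

# Hypothetical co-authorship edges between university units
edges = [
    ("Engineering", "Sciences"), ("Engineering", "Design"),
    ("Sciences", "Health"), ("Health", "Nursing"),
    ("Design", "Arts"), ("Sciences", "Sustainability"),
]
G = nx.Graph(edges)

# Fruchterman-Reingold force-directed layout
pos = nx.spring_layout(G, seed=42)

# Betweenness centrality highlights nodes bridging distant communities
centrality = nx.betweenness_centrality(G)
print(sorted(centrality.items(), key=lambda kv: -kv[1]))

nx.draw(G, pos, with_labels=True, node_size=800)
plt.show()
```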

Date Created
  • 2020-05

Balancing the Present and Future: Making Valuable Predictions for Continuous Improvement in Management

Description

In the words of W. Edwards Deming, "the central problem in management and in leadership is failure to understand the information in variation." While many quality management programs propose the institution of technical training in advanced statistical methods, this paper proposes that by understanding the fundamental information behind statistical theory, and by minimizing bias and variance while fully utilizing the available information about the system at hand, one can make valuable, accurate predictions about the future. By combining this knowledge with the work of quality gurus W. E. Deming, Eliyahu Goldratt, and Dean Kashiwagi, a framework for making valuable predictions for continuous improvement is constructed. After this information is synthesized, it is concluded that the best way to make accurate, informative predictions about the future is to "balance the present and future," seeing the future through the lens of the present and thus minimizing bias, variance, and risk.
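
The paper's emphasis on minimizing bias and variance can be grounded in the textbook bias-variance decomposition of expected squared prediction error (a standard identity, not a formula drawn from the thesis): for observations y = f(x) + ε with noise variance σ² and a predictor f̂,

```latex
\mathbb{E}\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\bigl(\mathbb{E}[\hat{f}(x)] - f(x)\bigr)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\bigl[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\bigr]}_{\text{variance}}
  + \sigma^2
```

Minimizing both terms at once is the formal counterpart of the paper's "balance": fitting the present well (low bias) while still generalizing to the future (low variance).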

Date Created
  • 2015-05

An Empirical Study of View Construction for Multi-View Learning

Description

Multi-view learning, a subfield of machine learning that aims to improve model performance by training on multiple views of the data, has been studied extensively in the past decades. It is typically applied in contexts where the input features naturally form multiple groups or views. An example of a naturally multi-view context is a data set of websites, where each website is described not only by the text on the page, but also by the text of hyperlinks pointing to the page. More recently, various studies have demonstrated initial success in applying multi-view learning to single-view data with multiple artificially constructed views. However, there has been no systematic study of the effectiveness of such artificially constructed views. To bridge this gap, this thesis begins by providing a high-level overview of multi-view learning with the co-training algorithm. Co-training is a classic semi-supervised learning algorithm that takes advantage of both labelled and unlabelled examples in the data set for training. Then, the thesis presents a web-based tool developed in Python that allows users to experiment with and compare the performance of multiple view construction approaches on various data sets. The view construction approaches supported by the web-based tool include subsampling, Optimal Feature Set Partitioning, and a genetic algorithm. Finally, the thesis presents an empirical comparison of the performance of these approaches, not only against one another but also against traditional single-view models. The findings show that a simple subsampling approach combined with co-training often outperforms both the other view construction approaches and traditional single-view methods.
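
A minimal sketch of the winning combination, subsampling-based view construction plus co-training, is shown below. The data, split sizes, and confidence threshold are invented, and the loop is a simplified variant in which both views add pseudo-labels to a shared pool; the thesis's tool and its other view construction strategies (e.g., Optimal Feature Set Partitioning) are not reproduced.

```python
# Hedged sketch of co-training with two views built by feature subsampling.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Subsampling view construction: two random, disjoint halves of the features
feats = rng.permutation(X.shape[1])
views = [feats[:10], feats[10:]]

# Start with a small labelled pool; treat the rest as unlabelled
labelled = np.zeros(len(y), dtype=bool)
labelled[:50] = True
y_work = y.copy()

for _ in range(5):  # a few co-training rounds
    for v in views:
        clf = LogisticRegression(max_iter=1000).fit(X[labelled][:, v], y_work[labelled])
        probs = clf.predict_proba(X[~labelled][:, v])
        conf = probs.max(axis=1)
        # Pseudo-label the unlabelled points this view is most confident about
        idx = np.where(~labelled)[0][conf > 0.95]
        y_work[idx] = clf.predict(X[idx][:, v])
        labelled[idx] = True

final = LogisticRegression(max_iter=1000).fit(X[labelled], y_work[labelled])
print(f"{labelled.sum()} examples labelled after co-training")
```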

Date Created
  • 2019-12

Assessing Influential Users in Live Streaming Social Networks

Description

Live streaming has risen to significant popularity in the recent past, and largely this live streaming is a feature of existing social networks like Facebook, Instagram, and Snapchat. However, there does exist at least one social network entirely devoted to live streaming, and specifically the live streaming of video games: Twitch. This social network is unique for a number of reasons, not least because of its hyper-focus on live content, and this uniqueness poses challenges for social media researchers.

Despite this uniqueness, almost no scientific work has been performed on this public social network. Thus, it is unclear which user interaction features present on other social networks exist on Twitch. Investigating the interactions between users and identifying which, if any, of the common user behaviors on social networks exist on Twitch is an important step in understanding how Twitch fits into the social media ecosystem. For example, there are users that have large followings on Twitch and amass a large number of viewers, but do those users exert influence over the behavior of other users the way that popular users on Twitter do?

This task, however, is not trivial. The same hyper-focus on live content that makes Twitch unique in the social network space invalidates many of the traditional approaches to social network analysis. Thus, new algorithms and techniques must be developed in order to tap this data source. In this thesis, a novel algorithm for finding games whose releases have made a significant impact on the network is described, as well as a novel algorithm for detecting and identifying influential players of games. In addition, the Twitch network is described in detail, along with the data that was collected to power the two previously described algorithms.
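
The thesis's influence-detection algorithms are novel and not reproduced here; as a baseline point of comparison, one common way to score influence in a follow-style network is PageRank over the directed graph. The follow relationships below are invented for illustration.

```python
# Hedged sketch: PageRank as a baseline influence score on a hypothetical
# "viewer follows streamer" graph. This is a standard technique, not the
# thesis's algorithm.
import networkx as nx

# Invented follow relationships: edge (a, b) means user a follows user b
follows = [
    ("viewer1", "streamerA"), ("viewer2", "streamerA"),
    ("viewer3", "streamerA"), ("viewer1", "streamerB"),
    ("streamerB", "streamerA"), ("viewer4", "streamerB"),
]
G = nx.DiGraph(follows)

# PageRank rewards being followed by well-followed accounts,
# not merely having many followers
scores = nx.pagerank(G, alpha=0.85)
for user, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{user:10s} {score:.3f}")
```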

Date Created
  • 2019

An exploration of statistical modelling methods on simulation data case study: biomechanical predator-prey simulations

Description

Modern, advanced statistical tools from data mining and machine learning have become commonplace in molecular biology in large part because of the “big data” demands of various kinds of “-omics” (e.g., genomics, transcriptomics, metabolomics, etc.). However, in other fields of biology where empirical data sets are conventionally smaller, more traditional statistical methods of inference are still very effective and widely used. Nevertheless, with the decrease in cost of high-performance computing, these fields are starting to employ simulation models to generate insights into questions that have been elusive in the laboratory and field. Although these computational models allow for exquisite control over large numbers of parameters, they also generate data at a qualitatively different scale than most experts in these fields are accustomed to. Thus, more sophisticated methods from big-data statistics have an opportunity to better facilitate the often-forgotten area of bioinformatics that might be called “in-silicomics”.

As a case study, this thesis develops methods for the analysis of large amounts of data generated from a simulated ecosystem designed to understand how mammalian biomechanics interact with environmental complexity to modulate the outcomes of predator–prey interactions. These simulations investigate which biomechanical parameters relating to the agility of animals in predator–prey pairs are the best predictors of pursuit outcomes. Traditional modelling techniques such as forward, backward, and stepwise variable selection are initially used to study these data, but the number of parameters and potentially relevant interaction effects render these methods impractical. Consequently, newer modelling techniques such as LASSO regularization are used and compared to the traditional techniques in terms of accuracy and computational complexity. Finally, the splitting rules and instances in the leaves of classification trees provide the basis for future simulations with an economical number of additional runs. In general, this thesis shows the increased utility of these sophisticated statistical techniques with simulated ecological data compared to the approaches traditionally used in these fields. These techniques, combined with methods from industrial Design of Experiments, will help ecologists extract novel insights from simulations that combine habitat complexity, population structure, and biomechanics.
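
The two techniques the thesis names, LASSO regularization and classification trees, can be sketched together in Python. The feature names and data-generating process below are invented stand-ins for the simulation outputs; only the analysis pattern carries over.

```python
# Hedged sketch contrasting LASSO variable selection with a classification
# tree on invented predator-prey-style data.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
n = 400
# Hypothetical biomechanical parameters for predator-prey pairs
X = rng.normal(size=(n, 4))
names = ["speed_ratio", "turn_radius", "accel", "stamina"]
# Pursuit outcome driven mainly by two of the four parameters
escape_score = 1.5 * X[:, 1] - 1.0 * X[:, 0] + rng.normal(scale=0.5, size=n)
caught = (escape_score < 0).astype(int)

# LASSO shrinks coefficients of irrelevant parameters toward zero
lasso = LassoCV(cv=5).fit(X, escape_score)
print(dict(zip(names, lasso.coef_.round(2))))

# A shallow tree's splitting rules suggest regions of parameter space
# worth targeting with an economical number of follow-up simulation runs
tree = DecisionTreeClassifier(max_depth=2).fit(X, caught)
print(export_text(tree, feature_names=names))
```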

Date Created
  • 2018

Crossing the chasm: deploying machine learning analytics in dynamic real-world scenarios

Description

The dawn of the Internet of Things (IoT) has opened the opportunity for mainstream adoption of machine learning analytics. However, most research in machine learning has focused on the discovery of new algorithms or fine-tuning the performance of existing algorithms. Little exists on the process of taking an algorithm from the lab environment into the real world, culminating in sustained value. Real-world applications are typically characterized by dynamic non-stationary systems with requirements around feasibility, stability, and maintainability. Not much has been done to establish standards around the unique analytics demands of real-world scenarios.

This research explores the problem of why so few of the published algorithms enter production and, furthermore, why fewer still end up generating sustained value. The dissertation proposes a ‘Design for Deployment’ (DFD) framework to successfully build machine learning analytics so they can be deployed to generate sustained value. The framework emphasizes and elaborates the often neglected but immensely important latter steps of an analytics process: ‘Evaluation’ and ‘Deployment’. A representative evaluation framework is proposed that incorporates the temporal shifts and dynamism of real-world scenarios. Additionally, the recommended infrastructure allows analytics projects to pivot rapidly when a particular venture does not materialize. Deployment needs and apprehensions of the industry are identified, and gaps are addressed through a four-step process for sustainable deployment. Lastly, the need for analytics as a functional area (like finance and IT) is identified to maximize the return on machine-learning deployment.
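
One standard way to evaluate under temporal shift, in the spirit of (but not reproducing) the dissertation's representative evaluation framework, is rolling-origin splitting: every fold trains strictly on the past and tests on the future. The drifting data below is invented for illustration.

```python
# Hedged sketch of temporally aware evaluation with rolling-origin splits.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(2)
# Hypothetical sensor readings ordered in time, with gradual drift
X = rng.normal(size=(600, 5)) + np.linspace(0, 1, 600)[:, None]
y = (X.sum(axis=1) + rng.normal(size=600) > 2.5).astype(int)

# Each fold trains on the past and tests on the future, unlike shuffled
# cross-validation, which leaks future information into training
for fold, (train, test) in enumerate(TimeSeriesSplit(n_splits=4).split(X)):
    model = LogisticRegression(max_iter=1000).fit(X[train], y[train])
    acc = accuracy_score(y[test], model.predict(X[test]))
    print(f"fold {fold}: train={len(train)}, test={len(test)}, acc={acc:.2f}")
```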

The framework and process are demonstrated in semiconductor manufacturing – a highly complex domain involving hundreds of optical, electrical, chemical, mechanical, thermal, electrochemical, and software processes, which makes it a highly dynamic non-stationary system. Due to the 24/7 uptime requirements in manufacturing, high reliability and fail-safe operation are a must. Moreover, the ever-growing data volumes mean that the system must be highly scalable. Lastly, due to the high cost of change, a sustained value proposition is a must for any proposed changes. Hence the context is ideal for exploring the issues involved. The enterprise use cases are used to demonstrate the robustness of the framework in addressing challenges encountered in the end-to-end process of productizing machine learning analytics in dynamic real-world scenarios.

Date Created
  • 2016

Policy and Place: A Spatial Data Science Framework for Research and Decision-Making

Description

A major challenge in health-related policy and program evaluation research is attributing underlying causal relationships where complicated processes may exist in natural or quasi-experimental settings. Spatial interaction and heterogeneity between units at individual or group levels can violate both components of the Stable Unit Treatment Value Assumption (SUTVA) that are core to the counterfactual framework, making treatment effects difficult to assess. New approaches are needed in health studies to develop spatially dynamic causal modeling methods, both to derive insights from data that are sensitive to spatial differences and dependencies and to rely on a more robust, dynamic technical infrastructure for decision-making. To address this gap, with a focus on causal applications theoretically, methodologically, and technologically, I (1) develop a theoretical spatial framework (within single-level panel econometric methodology) that extends existing theories and methods of causal inference, which tend to ignore spatial dynamics; (2) demonstrate how this spatial framework can be applied in empirical research; and (3) implement a new spatial infrastructure framework that integrates and manages the required data for health systems evaluation.

The new spatially explicit counterfactual framework considers how spatial effects impact treatment choice, treatment variation, and treatment effects. To illustrate this new methodological framework, I first replicate a classic quasi-experimental study that evaluates the effect of drinking-age policy on mortality in the United States from 1970 to 1984, and further extend it with a spatial perspective. In another example, I evaluate food access dynamics in Chicago from 2007 to 2014 by implementing advanced spatial analytics that better account for the complex patterns of food access, and a quasi-experimental research design to distill the impact of the Great Recession on the foodscape. Inference interpretation is sensitive to both research design framing and the underlying processes that drive geographically distributed relationships. Finally, I advance a new Spatial Data Science Infrastructure to integrate and manage data in dynamic, open environments for public health systems research and decision-making. I demonstrate an infrastructure prototype in a final case study, developed in collaboration with health department officials and community organizations.
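
One building block any spatially explicit analysis of this kind needs is a spatial weights matrix and the spatial lag it induces, which is how neighbor spillovers (the interference that violates SUTVA) are quantified. A minimal NumPy sketch on an invented grid of regions is shown below; the dissertation's own framework and data are not reproduced.

```python
# Hedged sketch: rook-contiguity spatial weights and a spatial lag on a
# hypothetical 3x3 grid of regions.
import numpy as np

n_side = 3
n = n_side * n_side  # 9 regions on a grid

# Rook contiguity: regions sharing a grid edge are neighbors
W = np.zeros((n, n))
for i in range(n):
    r, c = divmod(i, n_side)
    for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        rr, cc = r + dr, c + dc
        if 0 <= rr < n_side and 0 <= cc < n_side:
            W[i, rr * n_side + cc] = 1

# Row-standardize so each row sums to one
W = W / W.sum(axis=1, keepdims=True)

# Spatial lag: each region's value becomes the mean of its neighbors,
# a diagnostic for the spillovers that violate SUTVA's no-interference clause
y = np.arange(n, dtype=float)  # hypothetical outcome per region
print("spatial lag Wy:", (W @ y).round(2))
```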

Date Created
  • 2017