Matching Items (61)

Data and Predictive Analytics for Energy Use

Description

Overall energy consumption in the United States has not fallen despite decades of technological advancement. Gaps persist between designed and actual energy performance. Energy Infrastructure Systems (EIS) are impacted when the amount of energy production cannot be accurately and efficiently forecasted. Inaccurate engineering assumptions can result from a lack of understanding of how energy systems operate in real-world applications. Energy systems are complex, and without a known structural system model their behavior is difficult to anticipate. Reverse engineering currently lacks the data mining techniques needed to develop efficient structural system models. In this project, a new type of reverse engineering algorithm has been applied to a year's worth of energy data collected from an ASU research building called MacroTechnology Works to identify the structural system model. Developing and understanding structural system models is the first step in creating accurate predictive analytics for energy production. The associative network of the building's data will be highlighted to depict the structural model accurately. This structural model will enhance the energy efficiency of energy infrastructure systems, reduce energy waste, and narrow the gaps between energy infrastructure design, planning, operation and management (DPOM).

Date Created
  • 2016-12

Sentiment Analysis of Public Perception Towards Transgender Rights on Twitter

Description

The fight for equal transgender rights is gaining traction in the public eye, but still has considerable progress to make in the social and legal spheres. Since public opinion is critical in any civil rights movement, this study attempts to identify the most effective methods to elicit public reactions in support of transgender rights. Topic analysis through Latent Dirichlet Allocation is performed on Twitter data, along with polarity sentiment analysis, to track the subjects that elicit the most effective reactions over time. Graphing techniques are used in an attempt to display the trends in topics visually. The topic analysis techniques are effective in identifying the positive and negative trends in the data, but the graphing algorithm cannot display complex, higher-dimensional data comprehensibly.

Date Created
  • 2016-12

How Fake News Spreads in the U.S: A Geographic Visualization System for Misinformation

Description

The spread of fake news (rumors) has been a growing problem on the internet in the past few years due to the increase of social media services. People sometimes share fake news articles on social media without knowing that those articles contain false information. Not knowing whether an article is fake or real is a problem because it causes social media news to lose credibility. Prior research on fake news has focused on how to detect it, but efforts to control fake news articles on the internet still face challenges: it is hard to collect large sets of fake news data, it is hard to collect the locations of people who are spreading fake news, and it is difficult to study the geographic distribution of fake news. To address these challenges, I examine how fake news spreads in the United States (US) by developing a geographic visualization system for misinformation. I collect a set of fake news articles from a website called snopes.com. I then extract the keywords from each article and store them in a file. I use the stored keywords to search Twitter in order to find the locations of users who spread the rumors. Finally, I mark those locations on a map to show the geographic distribution of fake news. Having access to large sets of fake news data, knowing the locations of people who are spreading fake news, and understanding the geographic distribution of fake news will help efforts to address the fake news problem on the internet by providing target areas.

Date Created
  • 2018-05

Twitter Patterns in the Politics of Social Mobilization: #BlackLivesMatter Case Study

Description

The role of technology in shaping modern society has become increasingly important in the context of current democratic politics, especially when examined through the lens of social media. Twitter is a prominent social media platform used as a political medium, contributing to political movements such as #OccupyWallStreet, #MeToo, and #BlackLivesMatter. Using the #BlackLivesMatter movement as an illustrative case to establish patterns in Twitter usage, this thesis aims to answer the question "to what extent is Twitter an accurate representation of 'real life' in terms of performative activism and user engagement?" The discussion of Twitter is contextualized by research on Twitter's use in politics, both as a mobilizing force and as a potential means to divide and mislead. Using intervals of time between 2014 and 2020, Twitter data containing #BlackLivesMatter is collected and analyzed. The discussion of findings centers on the role of performative activism in social mobilization on Twitter. The analysis shows patterns in the data that indicate performative activism can skew the real picture of civic engagement, which can affect how public opinion shapes future public policy and mobilization.

Date Created
  • 2021-05

Analysis of Twitter's Effect on Stock Prices

Description

Twitter has become a very popular social media site that is used daily by many people and organizations. This paper focuses on the financial aspect of Twitter and presents a process for mining data about specific companies' stock prices. This was done by writing a program to collect tweets about the stocks of the thirty companies in the Dow Jones.

Date Created
  • 2014-05

Visual Analytic Tools for Geo-Genealogy and Geo-Demographics

Description

This work explores the development of a visual analytics tool for geodemographic exploration in an online environment. We mine 78 million records from the United States white pages, link the location data to demographic data (specifically income) from the United States Census Bureau, and allow users to interactively compare distributions of names with regard to spatial location similarity and income. To enable interactive similarity exploration, we explore methods of pre-processing the data as well as on-the-fly lookups. As data becomes larger and more complex, the development of appropriate data storage and analytics solutions has become even more critical for enabling online visualization. We discuss problems faced in implementation, design decisions, and directions for future work.

Date Created
  • 2014-05

Darkweb Cyber Threat Intelligence Mining through the I2P Protocol

Description

This thesis project focused on malicious hacking community activities accessible through the I2P protocol. We visited 315 distinct I2P sites to identify those with malicious hacking content. We also wrote software to scrape and parse data from relevant I2P sites. The data was integrated into the CySIS databases for further analysis to contribute to the larger CySIS Lab Darkweb Cyber Threat Intelligence Mining research. We found that the I2P cryptonet was slow and had only a small amount of malicious hacking community activity. However, we also found evidence of a growing perception that Tor anonymity could be compromised. This work will contribute to understanding the malicious hacker community as some Tor users, seeking assured anonymity, transition to I2P.

Date Created
  • 2016-12

Efficient processing of skyline queries on static data sources, data streams and incomplete datasets

Description

Skyline queries extract interesting points that are non-dominated and help paint the bigger picture of the data in question. They are valuable in many multi-criteria decision applications and are becoming a staple of decision support systems.
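As a rough illustration of what "non-dominated" means here, the following is a minimal sketch of a naive skyline computation. It is not the dissertation's method; the function names, the "smaller is better" convention, and the hotel data are all assumptions made for the example.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a dominates b, assuming smaller values are better.

    a dominates b when a is no worse in every dimension and strictly
    better in at least one.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical data: (price, distance to the beach) for five hotels.
hotels = [(50, 8), (60, 2), (80, 1), (70, 5), (90, 9)]
print(skyline(hotels))
```

Each surviving point represents a different trade-off between the two criteria, which is why skylines are useful in multi-criteria decision applications. The pairwise comparison used here is exactly the kind of tuple-to-tuple work that more efficient algorithms try to avoid.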

An assumption commonly made by many skyline algorithms is that a skyline query is applied to a single static data source or data stream. Unfortunately, this assumption does not hold in many applications in which a skyline query may involve attributes belonging to multiple data sources and requires a join operation to be performed before the skyline can be produced. Recently, various skyline-join algorithms have been proposed to address this problem in the context of static data sources. However, these algorithms suffer from several drawbacks: they often need to scan the data sources exhaustively to obtain the skyline-join results; moreover, the pruning techniques employed to eliminate tuples are largely based on expensive tuple-to-tuple comparisons. On the other hand, most data stream techniques focus on single stream skyline queries, thus rendering them unsuitable for skyline-join queries.

Another assumption typically made by most of the earlier skyline algorithms is that the data is complete and all skyline attribute values are available. Due to this constraint, these algorithms cannot be applied to incomplete data sources in which some of the attribute values are missing and are represented by NULL values. There exists a definition of dominance for incomplete data, but it leads to undesirable consequences such as non-transitive and cyclic dominance relations, both of which are detrimental to skyline processing.
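The cyclic-dominance problem can be seen in a small sketch. It assumes one common definition of dominance over incomplete data, in which two points are compared only on the dimensions where both have values; the abstract does not say which definition it refers to, so this choice, the function names, and the sample points are illustrative assumptions.

```python
from typing import List, Optional, Tuple

IncompletePoint = Tuple[Optional[float], ...]


def dominates_incomplete(a: IncompletePoint, b: IncompletePoint) -> bool:
    """Assumed definition: compare a and b only on dimensions where neither
    value is missing (None stands in for NULL); smaller values are better."""
    common = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return (
        bool(common)
        and all(x <= y for x, y in common)
        and any(x < y for x, y in common)
    )


def skyline_incomplete(points: List[IncompletePoint]) -> List[IncompletePoint]:
    """Naive skyline under the assumed incomplete-data dominance."""
    return [p for p in points if not any(dominates_incomplete(q, p) for q in points)]


# Hypothetical points, each missing one attribute.
a = (1, None, 3)
b = (2, 1, None)
c = (None, 2, 1)

print(dominates_incomplete(a, b))  # compared on dimension 0 only
print(dominates_incomplete(b, c))  # compared on dimension 1 only
print(dominates_incomplete(c, a))  # compared on dimension 2 only
print(skyline_incomplete([a, b, c]))
```

Under this definition a dominates b, b dominates c, and c dominates a, so every point is dominated by some other point and the naive skyline comes out empty. A skyline operator for incomplete data has to handle such cycles explicitly.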

Based on the aforementioned observations, the main goal of the research described in this dissertation is the design and development of a framework of skyline operators that effectively handles three distinct types of skyline queries: 1) skyline-join queries on static data sources, 2) skyline-window-join queries over data streams, and 3) strata-skyline queries on incomplete datasets. This dissertation presents the unique challenges posed by these skyline queries and addresses the shortcomings of current skyline techniques by proposing efficient methods to tackle the added overhead in processing skyline queries on static data sources, data streams, and incomplete datasets.

Date Created
  • 2014

Automatic text summarization using importance of sentences for email corpus

Description

With the advent of the Internet, data is being added online at an enormous rate. Although search engines use information retrieval (IR) techniques to serve users' search requests, the results are not effective for the user's search query. The user has to go through several webpages before reaching the one he or she wanted. This problem of information overload can be addressed using automatic text summarization. Summarization is the process of obtaining an abridged version of a document so that the user can quickly understand what the document is about. Email threads from W3C are used in this system. Apart from common IR features such as Term Frequency and Inverse Document Frequency, the system implements Term Rank, a variation of PageRank based on a graph model, which can cluster words with respect to word ambiguity. Term Rank also considers the co-occurrence of words within the corpus and evaluates the rank of each word accordingly. Sentences of email threads are ranked according to these features, and summaries are generated. The system implements the concept of pyramid evaluation in content selection. The system can be considered a framework for unsupervised learning in text summarization.
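To make the feature-based sentence ranking concrete, here is a minimal sketch that scores sentences by length-normalized TF-IDF alone. It does not implement the thesis's Term Rank or pyramid evaluation; the tokenizer, the scoring formula, the function names, and the sample sentences are assumptions chosen for illustration, with each sentence treated as its own "document" when computing document frequency.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def rank_sentences(sentences: List[str]) -> List[Tuple[str, float]]:
    """Rank sentences by summed TF-IDF, highest score first.

    TF is the term count divided by sentence length; IDF is
    log(N / df), where df counts the sentences containing the term.
    """
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = sum(
            (count / len(doc)) * math.log(n / df[term]) for term, count in tf.items()
        )
        scores.append(score)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [(sentences[i], scores[i]) for i in order]


# Hypothetical email sentences; the repeated one carries less new information.
thread = [
    "the meeting is on monday",
    "the meeting is on monday",
    "budget approval requires signatures",
]
for sentence, score in rank_sentences(thread):
    print(f"{score:.3f}  {sentence}")
```

A summary could then be assembled by taking the top-scoring sentences. Terms that appear in every sentence get an IDF of zero, so boilerplate repeated across a thread contributes nothing to a sentence's score.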

Date Created
  • 2015

Leveraging metadata for extracting robust multi-variate temporal features

Description

In recent years, a growing number of applications have come to use multi-variate time series data in which multiple uni-variate time series coexist. However, there is a lack of systematic of multi-variate time series. This thesis focuses on (a) defining a simplified inter-related multi-variate time series (IMTS) model and (b) developing a robust multi-variate temporal (RMT) feature extraction algorithm that can be used for locating, filtering, and describing salient features in multi-variate time series data sets. The proposed RMT feature can also be used for supporting multiple analysis tasks, such as visualization, segmentation, and searching / retrieving based on multi-variate time series similarities. Experiments confirm that the proposed feature extraction algorithm is highly efficient and effective in identifying robust multi-scale temporal features of multi-variate time series.

Date Created
  • 2013