their data; yet while organizations are interested in protecting against
unauthorized data transfer, no comprehensive metric exists for
discriminating which data are at risk of leaking.
This thesis motivates the need for a quantitative leakage risk metric and
presents a risk assessment system, called Whispers, for computing it. Using
unsupervised machine learning techniques, Whispers uncovers themes in an
organization's document corpus, including previously unknown or unclassified
data. Then, by correlating each document with its authors, Whispers can
identify which data are easier to contain and, conversely, which are at risk.
Using the Enron email database, Whispers constructs a social network segmented
by topic themes. This graph uncovers communication channels within the
organization. Using this social network, Whispers determines the risk of each
topic by measuring the rate at which simulated leaks are not detected. For the
Enron set, Whispers identified 18 separate topic themes between January 1999
and December 2000. The highest-risk data emanated from the legal department,
with a leakage risk as high as 60%.
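The risk computation described above, simulating leaks over a topic-segmented social network and counting the ones that go undetected, can be illustrated with a toy model. The corpus, community structure, and detection rule below are invented for illustration and are not Whispers' actual design:

```python
import random

# Toy sketch: a leak of a topic's material sent along an edge that
# normally carries that topic blends into legitimate traffic, so a
# channel-based detector misses it. Risk = undetected fraction.

# observed messages: (sender, recipient, topic) -- invented data
messages = [
    ("ann", "bob", "legal"), ("ann", "cal", "legal"),
    ("bob", "cal", "legal"), ("bob", "dee", "legal"),
    ("dee", "eve", "trading"), ("eve", "dee", "trading"),
]
people = sorted({p for s, r, _ in messages for p in (s, r)})

def leak_risk(topic, trials=10_000, seed=0):
    """Fraction of simulated leaks the channel detector misses."""
    rng = random.Random(seed)
    channels = {(s, r) for s, r, t in messages if t == topic}
    authors = sorted({s for s, _ in channels})
    undetected = 0
    for _ in range(trials):
        sender = rng.choice(authors)
        recipient = rng.choice([p for p in people if p != sender])
        # a leak on a legitimate channel for this topic blends in
        if (sender, recipient) in channels:
            undetected += 1
    return undetected / trials

for t in ("legal", "trading"):
    print(f"{t}: {leak_risk(t):.2f}")
```

In this toy corpus the "legal" topic flows over more of its authors' edges than "trading" does, so its simulated leaks are harder to distinguish from normal traffic and its computed risk is higher.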
has shown that testing the interactions among fewer (up to 6) components is generally
sufficient while retaining the capability to detect up to 99% of defects. This leads to a
substantial decrease in the number of tests. Covering arrays are combinatorial objects
that guarantee that every interaction is tested at least once.
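The defining property, every t-way interaction appearing in at least one test, can be checked directly. The sketch below is a straightforward verifier over symbols 0..v-1, not an algorithm from the thesis:

```python
from itertools import combinations, product

def is_covering_array(rows, t, v):
    """True iff `rows` is a CA(N; t, k, v): every choice of t
    columns exhibits all v**t symbol tuples in at least one row."""
    k = len(rows[0])
    needed = set(product(range(v), repeat=t))
    for cols in combinations(range(k), t):
        seen = {tuple(row[c] for c in cols) for row in rows}
        if seen != needed:
            return False
    return True

# CA(4; 2, 3, 2): four tests cover every pairwise interaction
ca = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(is_covering_array(ca, t=2, v=2))
# dropping any row breaks pairwise coverage
print(is_covering_array(ca[:3], t=2, v=2))
```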
In the absence of direct constructions, forming small covering arrays is generally
an expensive computational task. Algorithms to generate covering arrays have
been extensively studied, yet no single algorithm provides the smallest
solution. More recently, research has been directed toward a new technique
called post-optimization: taking an existing covering array and attempting to
reduce its size.
This thesis presents a new approach to post-optimization that represents
covering arrays as graphs. Some properties of these graphs are established,
and the results are contrasted with those of existing post-optimization
algorithms. The approach is then generalized to close variants of covering
arrays, with surprising results that in some cases reduce the array size by
30%. Applications of the method to generation and test prioritization are
also studied, and several notable results are reported.
Many of the performance and scaling issues faced by ad-hoc wireless networks
in particular can be attributed to their use of the core IEEE 802.11 MAC
protocol, the distributed coordination function (DCF). Smoothed Airtime
Linear Tuning (SALT) is a new contention-window tuning algorithm proposed to
address some of the deficiencies of DCF in 802.11 ad-hoc networks. SALT works
alongside a new, optimized user-level implementation of REACT, a distributed
resource allocation protocol, to ensure that each node secures the amount of
airtime allocated to it by REACT.
The algorithm accomplishes this by tuning the contention window size
parameter used in the 802.11 backoff process. SALT converges more tightly on
airtime allocations than a contention-window tuning algorithm from previous
work, increasing fairness in transmission opportunities and reducing jitter
relative to both 802.11 DCF and the earlier tuning algorithm. REACT and SALT
were also extended to the multi-hop flow scenario with the introduction of a
new airtime reservation algorithm. With a reservation in place, multi-hop TCP
throughput increased when running SALT and REACT compared to 802.11 DCF, and
the combination of protocols maintained its fairness and jitter advantages.
All experiments were performed on a wireless testbed rather than in
simulation.
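A toy control loop can illustrate the kind of tuning the name Smoothed Airtime Linear Tuning suggests: smooth the measured airtime share, then nudge the 802.11 contention window linearly toward the REACT allocation. The constants, update rule, and channel response below are invented assumptions, not SALT's published algorithm:

```python
CW_MIN, CW_MAX = 15, 1023   # 802.11 contention window bounds

def tune_cw(cw, smoothed, measured, target, alpha=0.2, gain=200):
    """One tuning step: EWMA-smooth the airtime sample, then move
    the contention window linearly toward the allocation.
    (alpha and gain are illustrative constants.)"""
    smoothed = (1 - alpha) * smoothed + alpha * measured
    # using more airtime than allocated -> back off (larger window)
    cw = cw + gain * (smoothed - target)
    cw = max(CW_MIN, min(CW_MAX, cw))
    return cw, smoothed

# a node allocated 25% airtime but currently consuming ~40%
cw, sm = 63.0, 0.40
for _ in range(50):
    # toy channel model: airtime falls as the window grows
    measured = 0.40 - (cw - 63.0) / 2000
    cw, sm = tune_cw(cw, sm, measured, target=0.25)
print(round(cw), round(sm, 2))
```

Under this toy channel model the loop settles at the window size whose induced airtime equals the allocation; the smoothing term damps oscillation around that point, which is the kind of tight convergence the abstract attributes to SALT.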
Based on the findings of previous studies, there was speculation that two well-known experimental design software packages, JMP and Design Expert, produced different power outputs given the same design and user inputs. For context and scope, another popular experimental design package, Minitab® Statistical Software version 17, was added to the comparison. The study compared multiple test cases run on the three software packages, focusing on 2^k and 3^k factorial designs and varying the effect size (in standard deviations), the number of categorical factors, the number of levels, the number of factors, and the number of replicates. All six cases were run on all three programs, with one, two, and three replicates attempted for each. There was an issue at the one-replicate stage, however: Minitab does not allow full factorial designs with only one replicate, and Design Expert will not provide power outputs for a single replicate unless there are three or more factors. From the analysis of the results, it was concluded that the differences between JMP 13 and Design Expert 10 were well within the margin of error and likely caused by rounding. The differences between JMP 13 and Design Expert 10 on the one hand and Minitab 17 on the other, however, indicated a fundamental difference in how Minitab calculates power compared to the latest versions of JMP and Design Expert. This difference is likely caused by Minitab's use of dummy variable coding by default, in contrast to the orthogonal coding default of the other two packages. Although dummy variable and orthogonal coding of factorial designs produce equivalent fitted models, the choice of coding affects the power calculations. All three programs can be configured to use either coding method, but the exact instructions are difficult to find; a follow-up guide on changing the coding of factorial variables would address this issue.
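The coding effect can be illustrated numerically: under orthogonal (-1/+1) coding a regression coefficient implies twice the between-level difference that the same coefficient implies under dummy (0/1) coding, so a power routine that treats the user's "effect size" as the coefficient computes a different noncentrality, and hence different power, depending on its default coding. The Monte Carlo sketch below is an invented illustration of that mechanism for a single two-level factor, not any package's actual power algorithm:

```python
import math
import random

def mc_power(level_diff, trials=20_000, seed=1):
    """Monte Carlo power of a pooled two-sample t test for one
    two-level factor with 8 runs per level, sigma = 1, alpha = 0.05."""
    rng = random.Random(seed)
    n = 8
    t_crit = 2.145  # two-sided 0.05 critical value, df = 2n - 2 = 14
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(level_diff, 1.0) for _ in range(n)]
        ma, mb = sum(a) / n, sum(b) / n
        va = sum((x - ma) ** 2 for x in a) / (n - 1)
        vb = sum((x - mb) ** 2 for x in b) / (n - 1)
        se = math.sqrt((va + vb) / 2 * (2 / n))
        if abs((mb - ma) / se) > t_crit:
            hits += 1
    return hits / trials

effect = 1.0  # user-supplied "effect size" of one standard deviation
p_dummy = mc_power(effect)           # 0/1 coding: coefficient = level difference
p_orthogonal = mc_power(2 * effect)  # -1/+1 coding: coefficient = half the difference
print(f"dummy-coded interpretation:      power = {p_dummy:.2f}")
print(f"orthogonal-coded interpretation: power = {p_orthogonal:.2f}")
```

The same user input yields substantially higher power under the orthogonal interpretation, which is consistent with the kind of systematic discrepancy the study attributes to differing coding defaults.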