Matching Items (70)

133698-Thumbnail Image.png

An Algorithm for Merging Identities

Description

In online social networks the identities of users are concealed, often by design. This anonymity makes it possible for a single person to have multiple accounts and to engage in

In online social networks the identities of users are concealed, often by design. This anonymity makes it possible for a single person to have multiple accounts and to engage in malicious activity such as defrauding a service providers, leveraging social influence, or hiding activities that would otherwise be detected. There are various methods for detecting whether two online users in a network are the same people in reality and the simplest way to utilize this information is to simply merge their identities and treat the two users as a single user. However, this then raises the issue of how we deal with these composite identities. To solve this problem, we introduce a mathematical abstraction for representing users and their identities as partitions on a set. We then define a similarity function, SIM, between two partitions, a set of properties that SIM must have, and a threshold that SIM must exceed for two users to be considered the same person. The main theoretical result of our work is a proof that for any given partition and similarity threshold, there is only a single unique way to merge the identities of similar users such that no two identities are similar. We also present two algorithms, COLLAPSE and SIM_MERGE, that merge the identities of users to find this unique set of identities. We prove that both algorithms execute in polynomial time and we also perform an experiment on dark web social network data from over 6000 users that demonstrates the runtime of SIM_MERGE.

Contributors

Agent

Created

Date Created
  • 2018-05

149991-Thumbnail Image.png

Compressive sensing for computer vision and image processing

Description

With the introduction of compressed sensing and sparse representation,many image processing and computer vision problems have been looked at in a new way. Recent trends indicate that many challenging computer

With the introduction of compressed sensing and sparse representation,many image processing and computer vision problems have been looked at in a new way. Recent trends indicate that many challenging computer vision and image processing problems are being solved using compressive sensing and sparse representation algorithms. This thesis assays some applications of compressive sensing and sparse representation with regards to image enhancement, restoration and classication. The first application deals with image Super-Resolution through compressive sensing based sparse representation. A novel framework is developed for understanding and analyzing some of the implications of compressive sensing in reconstruction and recovery of an image through raw-sampled and trained dictionaries. Properties of the projection operator and the dictionary are examined and the corresponding results presented. In the second application a novel technique for representing image classes uniquely in a high-dimensional space for image classification is presented. In this method, design and implementation strategy of the image classification system through unique affine sparse codes is presented, which leads to state of the art results. This further leads to analysis of some of the properties attributed to these unique sparse codes. In addition to obtaining these codes, a strong classier is designed and implemented to boost the results obtained. Evaluation with publicly available datasets shows that the proposed method outperforms other state of the art results in image classication. The final part of the thesis deals with image denoising with a novel approach towards obtaining high quality denoised image patches using only a single image. A new technique is proposed to obtain highly correlated image patches through sparse representation, which are then subjected to matrix completion to obtain high quality image patches. Experiments suggest that there may exist a structure within a noisy image which can be exploited for denoising through a low-rank constraint.

Contributors

Agent

Created

Date Created
  • 2011

149754-Thumbnail Image.png

Production scheduling and system configuration for capacitated flow lines with application in the semiconductor backend process

Description

A good production schedule in a semiconductor back-end facility is critical for the on time delivery of customer orders. Compared to the front-end process that is dominated by re-entrant product

A good production schedule in a semiconductor back-end facility is critical for the on time delivery of customer orders. Compared to the front-end process that is dominated by re-entrant product flows, the back-end process is linear and therefore more suitable for scheduling. However, the production scheduling of the back-end process is still very difficult due to the wide product mix, large number of parallel machines, product family related setups, machine-product qualification, and weekly demand consisting of thousands of lots. In this research, a novel mixed-integer-linear-programming (MILP) model is proposed for the batch production scheduling of a semiconductor back-end facility. In the MILP formulation, the manufacturing process is modeled as a flexible flow line with bottleneck stages, unrelated parallel machines, product family related sequence-independent setups, and product-machine qualification considerations. However, this MILP formulation is difficult to solve for real size problem instances. In a semiconductor back-end facility, production scheduling usually needs to be done every day while considering updated demand forecast for a medium term planning horizon. Due to the limitation on the solvable size of the MILP model, a deterministic scheduling system (DSS), consisting of an optimizer and a scheduler, is proposed to provide sub-optimal solutions in a short time for real size problem instances. The optimizer generates a tentative production plan. Then the scheduler sequences each lot on each individual machine according to the tentative production plan and scheduling rules. Customized factory rules and additional resource constraints are included in the DSS, such as preventive maintenance schedule, setup crew availability, and carrier limitations. Small problem instances are randomly generated to compare the performances of the MILP model and the deterministic scheduling system. Then experimental design is applied to understand the behavior of the DSS and identify the best configuration of the DSS under different demand scenarios. Product-machine qualification decisions have long-term and significant impact on production scheduling. A robust product-machine qualification matrix is critical for meeting demand when demand quantity or mix varies. In the second part of this research, a stochastic mixed integer programming model is proposed to balance the tradeoff between current machine qualification costs and future backorder costs with uncertain demand. The L-shaped method and acceleration techniques are proposed to solve the stochastic model. Computational results are provided to compare the performance of different solution methods.

Contributors

Agent

Created

Date Created
  • 2011

149901-Thumbnail Image.png

Query expansion for handling exploratory and ambiguous keyword queries

Description

Query Expansion is a functionality of search engines that suggest a set of related queries for a user issued keyword query. In case of exploratory or ambiguous keyword queries, the

Query Expansion is a functionality of search engines that suggest a set of related queries for a user issued keyword query. In case of exploratory or ambiguous keyword queries, the main goal of the user would be to identify and select a specific category of query results among different categorical options, in order to narrow down the search and reach the desired result. Typical corpus-driven keyword query expansion approaches return popular words in the results as expanded queries. These empirical methods fail to cover all semantics of categories present in the query results. More importantly these methods do not consider the semantic relationship between the keywords featured in an expanded query. Contrary to a normal keyword search setting, these factors are non-trivial in an exploratory and ambiguous query setting where the user's precise discernment of different categories present in the query results is more important for making subsequent search decisions. In this thesis, I propose a new framework for keyword query expansion: generating a set of queries that correspond to the categorization of original query results, which is referred as Categorizing query expansion. Two approaches of algorithms are proposed, one that performs clustering as pre-processing step and then generates categorizing expanded queries based on the clusters. The other category of algorithms handle the case of generating quality expanded queries in the presence of imperfect clusters.

Contributors

Agent

Created

Date Created
  • 2011

150111-Thumbnail Image.png

Post-optimization: necessity analysis for combinatorial arrays

Description

Finding the optimal solution to a problem with an enormous search space can be challenging. Unless a combinatorial construction technique is found that also guarantees the optimality of the resulting

Finding the optimal solution to a problem with an enormous search space can be challenging. Unless a combinatorial construction technique is found that also guarantees the optimality of the resulting solution, this could be an infeasible task. If such a technique is unavailable, different heuristic methods are generally used to improve the upper bound on the size of the optimal solution. This dissertation presents an alternative method which can be used to improve a solution to a problem rather than construct a solution from scratch. Necessity analysis, which is the key to this approach, is the process of analyzing the necessity of each element in a solution. The post-optimization algorithm presented here utilizes the result of the necessity analysis to improve the quality of the solution by eliminating unnecessary objects from the solution. While this technique could potentially be applied to different domains, this dissertation focuses on k-restriction problems, where a solution to the problem can be presented as an array. A scalable post-optimization algorithm for covering arrays is described, which starts from a valid solution and performs necessity analysis to iteratively improve the quality of the solution. It is shown that not only can this technique improve upon the previously best known results, it can also be added as a refinement step to any construction technique and in most cases further improvements are expected. The post-optimization algorithm is then modified to accommodate every k-restriction problem; and this generic algorithm can be used as a starting point to create a reasonable sized solution for any such problem. This generic algorithm is then further refined for hash family problems, by adding a conflict graph analysis to the necessity analysis phase. By recoloring the conflict graphs a new degree of flexibility is explored, which can further improve the quality of the solution.

Contributors

Agent

Created

Date Created
  • 2011

150836-Thumbnail Image.png

A visualization dashboard for Muslim social movements

Description

Muslim radicalism is recognized as one of the greatest security threats for the United States and the rest of the world. Use of force to eliminate specific radical entities is

Muslim radicalism is recognized as one of the greatest security threats for the United States and the rest of the world. Use of force to eliminate specific radical entities is ineffective in containing radicalism as a whole. There is a need to understand the origin, ideologies and behavior of Radical and Counter-Radical organizations and how they shape up over a period of time. Recognizing and supporting counter-radical organizations is one of the most important steps towards impeding radical organizations. A lot of research has already been done to categorize and recognize organizations, to understand their behavior, their interactions with other organizations, their target demographics and the area of influence. We have a huge amount of information which is a result of the research done over these topics. This thesis provides a powerful and interactive way to navigate through all this information, using a Visualization Dashboard. The dashboard makes it easier for Social Scientists, Policy Analysts, Military and other personnel to visualize an organization's propensity towards violence and radicalism. It also tracks the peaking religious, political and socio-economic markers, their target demographics and locations. A powerful search interface with parametric search helps in narrowing down to specific scenarios and view the corresponding information related to the organizations. This tool helps to identify moderate Counter-Radical organizations and also has the potential of predicting the orientation of various organizations based on the current information.

Contributors

Agent

Created

Date Created
  • 2012

151500-Thumbnail Image.png

Design, analysis and resource allocations in networks in presence of region-based faults

Description

Communication networks, both wired and wireless, are expected to have a certain level of fault-tolerance capability.These networks are also expected to ensure a graceful degradation in performance when some of

Communication networks, both wired and wireless, are expected to have a certain level of fault-tolerance capability.These networks are also expected to ensure a graceful degradation in performance when some of the network components fail. Traditional studies on fault tolerance in communication networks, for the most part, make no assumptions regarding the location of node/link faults, i.e., the faulty nodes and links may be close to each other or far from each other. However, in many real life scenarios, there exists a strong spatial correlation among the faulty nodes and links. Such failures are often encountered in disaster situations, e.g., natural calamities or enemy attacks. In presence of such region-based faults, many of traditional network analysis and fault-tolerant metrics, that are valid under non-spatially correlated faults, are no longer applicable. To this effect, the main thrust of this research is design and analysis of robust networks in presence of such region-based faults. One important finding of this research is that if some prior knowledge is available on the maximum size of the region that might be affected due to a region-based fault, this piece of knowledge can be effectively utilized for resource efficient design of networks. It has been shown in this dissertation that in some scenarios, effective utilization of this knowledge may result in substantial saving is transmission power in wireless networks. In this dissertation, the impact of region-based faults on the connectivity of wireless networks has been studied and a new metric, region-based connectivity, is proposed to measure the fault-tolerance capability of a network. In addition, novel metrics, such as the region-based component decomposition number(RBCDN) and region-based largest component size(RBLCS) have been proposed to capture the network state, when a region-based fault disconnects the network. Finally, this dissertation presents efficient resource allocation techniques that ensure tolerance against region-based faults, in distributed file storage networks and data center networks.

Contributors

Agent

Created

Date Created
  • 2013

151517-Thumbnail Image.png

Industrial applications of data mining: engineering effort forecasting based on mining and analysis of patterns in historical project execution data

Description

Data mining is increasing in importance in solving a variety of industry problems. Our initiative involves the estimation of resource requirements by skill set for future projects by mining and

Data mining is increasing in importance in solving a variety of industry problems. Our initiative involves the estimation of resource requirements by skill set for future projects by mining and analyzing actual resource consumption data from past projects in the semiconductor industry. To achieve this goal we face difficulties like data with relevant consumption information but stored in different format and insufficient data about project attributes to interpret consumption data. Our first goal is to clean the historical data and organize it into meaningful structures for analysis. Once the preprocessing on data is completed, different data mining techniques like clustering is applied to find projects which involve resources of similar skillsets and which involve similar complexities and size. This results in "resource utilization templates" for groups of related projects from a resource consumption perspective. Then project characteristics are identified which generate this diversity in headcounts and skillsets. These characteristics are not currently contained in the data base and are elicited from the managers of historical projects. This represents an opportunity to improve the usefulness of the data collection system for the future. The ultimate goal is to match the product technical features with the resource requirement for projects in the past as a model to forecast resource requirements by skill set for future projects. The forecasting model is developed using linear regression with cross validation of the training data as the past project execution are relatively few in number. Acceptable levels of forecast accuracy are achieved relative to human experts' results and the tool is applied to forecast some future projects' resource demand.

Contributors

Agent

Created

Date Created
  • 2013

151063-Thumbnail Image.png

Robust and efficient medium access despite jamming

Description

Interference constitutes a major challenge for communication networks operating over a shared medium where availability is imperative. This dissertation studies the problem of designing and analyzing efficient medium access protocols

Interference constitutes a major challenge for communication networks operating over a shared medium where availability is imperative. This dissertation studies the problem of designing and analyzing efficient medium access protocols which are robust against strong adversarial jamming. More specifically, four medium access (MAC) protocols (i.e., JADE, ANTIJAM, COMAC, and SINRMAC) which aim to achieve high throughput despite jamming activities under a variety of network and adversary models are presented. We also propose a self-stabilizing leader election protocol, SELECT, that can effectively elect a leader in the network with the existence of a strong adversary. Our protocols can not only deal with internal interference without the exact knowledge on the number of participants in the network, but they are also robust to unintentional or intentional external interference, e.g., due to co-existing networks or jammers. We model the external interference by a powerful adaptive and/or reactive adversary which can jam a (1 − ε)-portion of the time steps, where 0 < ε ≤ 1 is an arbitrary constant. We allow the adversary to be adaptive and to have complete knowledge of the entire protocol history. Moreover, in case the adversary is also reactive, it uses carrier sensing to make informed decisions to disrupt communications. Among the proposed protocols, JADE, ANTIJAM and COMAC are able to achieve Θ(1)-competitive throughput with the presence of the strong adversary; while SINRMAC is the first attempt to apply SINR model (i.e., Signal to Interference plus Noise Ratio), in robust medium access protocols design; the derived principles are also useful to build applications on top of the MAC layer, and we present SELECT, which is an exemplary study for leader election, which is one of the most fundamental tasks in distributed computing.

Contributors

Agent

Created

Date Created
  • 2012

151004-Thumbnail Image.png

A tool for threading, organizing and presenting emails using a web interface

Description

The overall contribution of the Minerva Initiative at ASU is to map social organizations in a multidimensional space that provides a measure of their radical or counter radical influence over

The overall contribution of the Minerva Initiative at ASU is to map social organizations in a multidimensional space that provides a measure of their radical or counter radical influence over the demographics of a nation. This tool serves as a simple content management system to store and track project resources like documents, images, videos and web links. It provides centralized and secure access to email conversations among project team members. Conversations are categorized into one of the seven pre-defined categories. Each category is associated with a certain set of keywords and we follow a frequency based approach for matching email conversations with the categories. The interface is hosted as a web application which can be accessed by the project team.

Contributors

Agent

Created

Date Created
  • 2012