Matching Items (44)

Description
For the past three decades, the design of an effective strategy for generating poetry that matches a human's creative capabilities and complexity has been an elusive goal in artificial intelligence (AI) and natural language generation (NLG) research, and among linguistic creativity researchers in particular. This thesis presents a novel approach to fixed-verse poetry generation using neural word embeddings. In the course of generation, a two-layered poetry classifier is developed. The first layer uses a lexicon-based method to classify poems into types based on form and structure, and the second layer uses a supervised classification method to classify poems into subtypes based on content, achieving an accuracy of 92%. The system then uses a two-layer neural network to generate poetry based on word similarities and word movements in a 50-dimensional vector space.
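
As a rough illustration of generation driven by word similarity, the sketch below picks a next word by cosine similarity between word vectors. The tiny hand-made vectors and the pick_next() helper are hypothetical stand-ins for the thesis's 50-dimensional embeddings and two-layer network, not its actual model.

```python
# A minimal sketch of candidate-word selection by cosine similarity in an
# embedding space; toy 3-d vectors stand in for the 50-d ones described above.
import numpy as np

EMBEDDINGS = {
    "moon":  np.array([0.9, 0.1, 0.0]),
    "night": np.array([0.8, 0.2, 0.1]),
    "sun":   np.array([0.1, 0.9, 0.0]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def pick_next(seed, candidates):
    """Return the candidate word most similar to the seed word."""
    return max(candidates, key=lambda w: cosine(EMBEDDINGS[seed], EMBEDDINGS[w]))

print(pick_next("moon", ["night", "sun"]))  # -> "night"
```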

The verses generated by the system are evaluated using rhyme, rhythm, syllable counts, and stress patterns. These computational features of language are considered for generating haikus, limericks, and iambic pentameter verses. The generated poems are evaluated using a Turing test on both experts and non-experts. The user study finds that only 38% of the computer-generated poems were correctly identified by non-experts, while 65% were correctly identified by experts. Although the system does not pass the Turing test, the results suggest an improvement of over 17% compared to previous methods that used Turing tests to evaluate poetry generators.
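
One structural check mentioned above, the haiku's 5-7-5 syllable pattern, can be approximated with a crude vowel-group heuristic, sketched below. The heuristic is an assumption for illustration only; a real evaluator would use a pronunciation dictionary such as CMUdict for syllable counts and stress patterns.

```python
# A rough sketch of a 5-7-5 haiku check; the vowel-group syllable counter is
# a crude approximation, not the thesis's actual evaluation method.
import re

def count_syllables(word):
    """Approximate syllables as runs of vowels, dropping a silent final 'e'."""
    word = word.lower().strip(".,!?;:")
    word = re.sub(r"e$", "", word)
    return max(1, len(re.findall(r"[aeiouy]+", word)))

def is_haiku(lines):
    counts = [sum(count_syllables(w) for w in line.split()) for line in lines]
    return counts == [5, 7, 5]

print(is_haiku(["an old silent pond",
                "a frog jumps into the pond",
                "splash silence again"]))  # -> True
```
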
Contributors: Magge, Arjun (Author) / Syrotiuk, Violet R. (Thesis advisor) / Baral, Chitta (Committee member) / Hogue, Cynthia (Committee member) / Bazzi, Rida (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
Knowledge representation and reasoning is a prominent subject of study within the field of artificial intelligence, concerned with representing knowledge symbolically in a way that facilitates automated reasoning about it. In real-world domains, it is often necessary to perform defeasible reasoning when representing the default behaviors of systems. Answer Set Programming is a widely used knowledge representation framework that is well suited to such reasoning tasks and has been successfully applied to practical domains, thanks to efficient computation through grounding--a process that replaces variables with variable-free terms--and propositional solvers similar to SAT solvers. However, some domains challenge grounding-based methods, such as those requiring reasoning about continuous time or resources.

To address these domains, there have been several proposals to achieve efficiency through loose integrations with efficient declarative solvers such as constraint solvers or satisfiability modulo theories (SMT) solvers. While these approaches successfully avoid substantial grounding, the loose integration makes them unsuitable for performing defeasible reasoning on functions. As a result, such expressive reasoning on functions must either be performed using predicates to simulate the functions or in a way that is not elaboration tolerant. Neither compromise is reasonable: the former suffers from the grounding bottleneck when domains are large, as is often the case in the real world, while the latter requires encodings to be modified non-trivially for each elaboration.

This dissertation presents a novel framework called Answer Set Programming Modulo Theories (ASPMT), a tight integration of the stable model semantics and satisfiability modulo theories. The framework both supports defeasible reasoning about functions and alleviates the grounding bottleneck. Combining the strengths of Answer Set Programming and satisfiability modulo theories enables efficient continuous reasoning while still supporting rich reasoning features such as reasoning about defaults and reasoning in domains with incomplete knowledge. The framework is realized in two prototype implementations, MVSM and ASPMT2SMT, the latter of which was recently incorporated into a non-monotonic spatial reasoning system. To define the semantics of the framework, we extend the first-order stable model semantics of Ferraris, Lee and Lifschitz to allow "intensional functions," and we analyze the theoretical properties of this new formalism and its relationships to existing approaches.
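
To make the notion of defeasible (default) reasoning concrete, the sketch below encodes the classic birds-fly-by-default example under the stable model semantics using the clingo Python API (assuming the clingo package is installed). It illustrates plain ASP defaults only, not ASPMT's extension of defaults to intensional functions via SMT.

```python
# A minimal default-reasoning example under the stable model semantics,
# run through the clingo Python API (pip install clingo).
import clingo

program = """
bird(tweety). bird(sam). penguin(sam).
ab(X)    :- penguin(X).            % penguins are abnormal fliers
flies(X) :- bird(X), not ab(X).    % birds fly by default
"""

ctl = clingo.Control()
ctl.add("base", [], program)
ctl.ground([("base", [])])
ctl.solve(on_model=lambda m: print(m))  # model contains flies(tweety), not flies(sam)
```
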
Contributors: Bartholomew, Michael James (Author) / Lee, Joohyung (Thesis advisor) / Bazzi, Rida (Committee member) / Colbourn, Charles (Committee member) / Fainekos, Georgios (Committee member) / Lifschitz, Vladimir (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
The success of Bitcoin has generated significant interest in the financial community in understanding whether the technological underpinnings of the cryptocurrency paradigm can be leveraged to improve the efficiency of financial processes in the existing infrastructure. Various alternative proposals, most notably Ripple and Ethereum, aim to provide solutions to the financial community in different ways. These proposals derive their security guarantees from either the computational hardness of proof-of-work or voting-based distributed consensus mechanisms, both of which can be computationally expensive. Furthermore, the financial audit requirements of participating financial institutions have not been suitably addressed.

This thesis presents a novel approach to constructing a non-consensus-based decentralized financial transaction processing model with a built-in efficient audit structure. The problem of decentralized inter-bank payment processing is used for the model design. The two key insights of this work are (1) to use a majority-signature-based replicated storage protocol for transaction authorization, and (2) to construct individual self-verifiable audit trails for each node, as opposed to a common blockchain. Theoretical analysis shows that the model provides cryptographic security for transaction processing and that the presented audit structure enables financial auditing of individual nodes in time independent of the number of transactions.
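
The sketch below shows a per-node, hash-chained audit trail in the spirit of insight (2): each record commits to its predecessor, so tampering is detectable by re-walking the chain. The field names and verify() helper are hypothetical illustrations; note also that this naive walk is linear in the number of transactions, whereas the thesis's structure achieves audit time independent of that count.

```python
# A toy self-verifiable audit trail as a SHA-256 hash chain.
import hashlib, json

def link(prev_hash, txn):
    record = json.dumps({"prev": prev_hash, "txn": txn}, sort_keys=True)
    return hashlib.sha256(record.encode()).hexdigest()

def append(trail, txn):
    prev = trail[-1]["hash"] if trail else "0" * 64
    trail.append({"txn": txn, "hash": link(prev, txn)})

def verify(trail):
    """Re-walk the chain; modifying any entry breaks every later hash."""
    prev = "0" * 64
    for entry in trail:
        if entry["hash"] != link(prev, entry["txn"]):
            return False
        prev = entry["hash"]
    return True

trail = []
append(trail, {"from": "bank_a", "to": "bank_b", "amount": 100})
append(trail, {"from": "bank_b", "to": "bank_c", "amount": 40})
print(verify(trail))  # -> True
```
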
Contributors: Gupta, Saurabh (Author) / Bazzi, Rida (Thesis advisor) / Ahn, Gail-Joon (Committee member) / Herlihy, Maurice (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
Phishing is a form of online fraud in which a spoofed website tries to gain access to a user's sensitive information by tricking the user into believing it is a benign website. There are several approaches to detecting phishing attacks, such as educating users, using blacklists, or extracting characteristics found to exist in phishing attacks. In this thesis, we analyze approaches that extract features from phishing websites and train classification models on the extracted feature sets to classify phishing websites. We create an exhaustive list of all features used in these approaches and categorize them into 6 broader and 33 finer categories. We extract 59 features from the URL, URL redirects, hosting domain (WHOIS and DNS records), and popularity of the website, and analyze their robustness in classifying a phishing website. Our emphasis is on determining the predictive performance of robust features. We evaluate classification accuracy when using the entire feature set and when URL features or site popularity features are excluded, and we show how our approach can be used to effectively predict specific types of phishing attacks such as shortened URLs and randomized URLs. With both decision-table and neural-network classifiers, our results indicate that robust features have enough predictive power to be used in practice.
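
As an illustration of the kind of URL features such classifiers consume, the sketch below extracts a handful of common ones with the Python standard library. These particular features and the shortener list are generic examples from the literature, not the thesis's exact 59-feature set.

```python
# A small sketch of URL-based feature extraction for phishing classification.
import re
from urllib.parse import urlparse

SHORTENERS = {"bit.ly", "tinyurl.com", "goo.gl"}   # illustrative subset

def url_features(url):
    host = urlparse(url).hostname or ""
    return {
        "length": len(url),
        "num_digits": sum(c.isdigit() for c in url),
        "num_subdomains": max(host.count(".") - 1, 0),
        "has_at_symbol": "@" in url,                 # classic host-hiding trick
        "host_is_ip": bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host)),
        "is_shortened": host in SHORTENERS,
    }

print(url_features("http://192.168.0.1/paypal.com/login"))
```
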
Contributors: Namasivayam, Bhuvana Lalitha (Author) / Bazzi, Rida (Thesis advisor) / Zhao, Ziming (Committee member) / Liu, Huan (Committee member) / Arizona State University (Publisher)
Created: 2017
Description
Scientific workflows allow scientists to easily model and express entire data processing pipelines, typically as directed acyclic graphs (DAGs). These workflows consist of collections of tasks that usually take a long time to compute and produce considerable amounts of intermediate datasets. Because of the nature of scientific exploration, a workflow may be modified and re-run multiple times, and new workflows may be created that make use of past intermediate datasets. Storing intermediate datasets can therefore save computation time. Since storage is limited, a central problem is determining which intermediate datasets to save at creation time so as to minimize the computational time of workflows run in the future. This thesis proposes the design and implementation of Pingo, a system that manages the computation of scientific workflows as well as the storage, provenance, and deletion of intermediate datasets. Pingo uses the history of workflows submitted to the system to predict the datasets most likely to be needed in the future, and it makes dataset-deletion decisions that optimize the computational time of future workflows.
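
As a toy illustration of the storage decision Pingo faces, the sketch below greedily keeps the datasets with the highest expected recomputation savings per byte under a storage budget. The scoring formula and the reuse_prob field are hypothetical; Pingo's actual history-based prediction model is not reproduced here.

```python
# A greedy sketch of intermediate-dataset retention under a storage budget.
def choose_to_store(datasets, budget_bytes):
    """datasets: list of dicts with name, size_bytes, compute_seconds, reuse_prob."""
    scored = sorted(
        datasets,
        key=lambda d: d["reuse_prob"] * d["compute_seconds"] / d["size_bytes"],
        reverse=True,
    )
    kept, used = [], 0
    for d in scored:
        if used + d["size_bytes"] <= budget_bytes:
            kept.append(d["name"])
            used += d["size_bytes"]
    return kept

datasets = [
    {"name": "aligned_reads", "size_bytes": 8e9, "compute_seconds": 7200, "reuse_prob": 0.9},
    {"name": "tmp_sort",      "size_bytes": 6e9, "compute_seconds": 600,  "reuse_prob": 0.2},
]
print(choose_to_store(datasets, budget_bytes=10e9))  # -> ['aligned_reads']
```
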
Contributors: de Armas, Jadiel (Author) / Bazzi, Rida (Thesis advisor) / Huang, Dijiang (Committee member) / Syrotiuk, Violet (Committee member) / Arizona State University (Publisher)
Created: 2017
Description
Shrinking device dimensions, increasing transistor densities, and smaller timing windows expose processors to soft errors induced by charge-carrying particles. Since these factors are inevitable in the advancement of processor technology, the industry has been forced to improve the reliability of general-purpose Chip Multiprocessors (CMPs). With the availability of increased hardware resources, redundancy-based techniques are the most promising methods for eradicating soft-error failures in CMP systems. This work proposes a novel customizable and redundant CMP architecture (UnSync) that uses hardware-based detection mechanisms (most of which are readily available in the processor) to reduce overheads during error-free executions. In the presence of errors, which are infrequent, an always-forward-execution recovery mechanism provides resilience. The UnSync framework supports customization of the redundancy and thereby provides a means to achieve performance-reliability trade-offs in many-core systems. This work designs a detailed RTL model of the UnSync architecture and performs hardware synthesis to measure the hardware (power/area) overheads incurred, comparing them with those of Reunion, a state-of-the-art redundant multi-core architecture. It also performs cycle-accurate simulations over a wide range of SPEC2000 and MiBench benchmarks to evaluate performance efficiency against the Reunion architecture. Experimental results show that UnSync reduces power consumption by 34.5% and improves performance by up to 20% with 13.3% less area overhead than Reunion for the same level of reliability.
Contributors: Hong, Fei (Author) / Shrivastava, Aviral (Thesis advisor) / Bazzi, Rida (Committee member) / Fainekos, Georgios (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
To facilitate the development of the Semantic Web, we propose in this thesis a general automatic ontology-building algorithm which, given a pool of potential terms and a set of relationships to include in the ontology, can use information gathered from Google queries to build a full ontology for a given domain. We used this algorithm as part of a larger system to tag computer tutorials for three systems with different kinds of metadata and to index the tagged documents into a search engine. Our evaluation of the resulting search engine indicates that the automatic ontology-building algorithm builds ontologies of relatively good quality and that these ontologies can be used to effectively apply metadata to documents.
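
One common way to score candidate term relationships from search-engine results is pointwise mutual information over hit counts, sketched below. The hit counts and web-size constant are made-up numbers, and the thesis's actual scoring of relationships from Google queries may differ.

```python
# A sketch of term relatedness via pointwise mutual information (PMI)
# computed from search-engine hit counts.
import math

TOTAL_PAGES = 1e10   # assumed size of the indexed web (illustrative)

def pmi(hits_a, hits_b, hits_ab):
    """PMI of terms a and b, given hit counts for a, b, and the query 'a b'."""
    p_a, p_b = hits_a / TOTAL_PAGES, hits_b / TOTAL_PAGES
    p_ab = hits_ab / TOTAL_PAGES
    return math.log(p_ab / (p_a * p_b))

print(pmi(hits_a=5e7, hits_b=8e7, hits_ab=2e6))  # positive -> likely related
print(pmi(hits_a=5e7, hits_b=8e7, hits_ab=300))  # negative -> likely unrelated
```
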
Contributors: Walliman, Garret Greg (Author) / Davulcu, Hasan (Thesis director) / Liu, Huan (Committee member) / Bazzi, Rida (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)
Created: 2013-05
Description

Cryptojacking is a process in which a program uses a user's CPU to mine cryptocurrencies without the user's knowledge. Since cryptojacking is a relatively new problem and its impact is still limited, very little has been done to combat it. Multiple studies have implemented cryptojacking detection systems, but none has truly solved the problem. This thesis surveys existing studies and provides a classification and evaluation of each detection system with the aim of determining its pros and cons. The evaluation indicates that it might be possible to bypass detection by existing systems by modifying the cryptojacking code. In addition to this classification, I developed an automatic code instrumentation program that replaces specific instructions with functionally similar sequences, as a way to show how easily simple obfuscation can bypass detection by existing systems.
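
As a sketch of the instruction-substitution idea, the toy rewriter below swaps a multiply-by-two for an equivalent shift so that a signature-based detector no longer matches the original sequence. The WebAssembly-like text format and the rewrite table are illustrative assumptions, not the thesis's actual instrumentation program.

```python
# A toy instruction rewriter: replace matched instruction pairs with
# functionally equivalent sequences (x * 2 == x << 1).
REWRITES = {
    ("i32.const 2", "i32.mul"): ["i32.const 1", "i32.shl"],
}

def obfuscate(instructions):
    out, i = [], 0
    while i < len(instructions):
        pair = tuple(instructions[i:i + 2])
        if pair in REWRITES:
            out.extend(REWRITES[pair])   # swap in the equivalent sequence
            i += 2
        else:
            out.append(instructions[i])
            i += 1
    return out

print(obfuscate(["local.get 0", "i32.const 2", "i32.mul"]))
# -> ['local.get 0', 'i32.const 1', 'i32.shl']
```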

Contributors: Larson, Kent Merle (Author) / Bazzi, Rida (Thesis director) / Shoshitaishvili, Yan (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2021-05
Description

Secure Scuttlebutt is a digital social network in which the network data is distributed among the users. This design secures several benefits, such as offline browsing and censorship resistance, and imitates natural social networks, but it comes with downsides, such as the lack of an obvious implementation of a recommendation algorithm. This paper proposes Whuffie, an algorithm that uses conditional probabilities to track each user's reputation for having information that is interesting to a given user. Errors in the main Secure Scuttlebutt network currently prevent large-scale testing of the algorithm's usefulness, but testing on my own personal account leads me to believe it is a success.
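
A minimal sketch of a conditional-probability reputation score of the kind described is shown below: for each source, estimate P(interesting | source) from the user's own feedback. The record/reputation helpers and the Laplace smoothing are hypothetical illustrations, not Whuffie's actual update rule.

```python
# A toy per-source reputation score: smoothed P(interesting | source).
from collections import defaultdict

seen = defaultdict(int)    # posts seen from each source
liked = defaultdict(int)   # posts the user marked interesting

def record(source, interesting):
    seen[source] += 1
    liked[source] += interesting

def reputation(source):
    """Laplace-smoothed estimate of P(interesting | source)."""
    return (liked[source] + 1) / (seen[source] + 2)

for verdict in (True, True, False, True):
    record("alice", verdict)
record("bob", False)

print(reputation("alice"), reputation("bob"))  # ~0.67 vs ~0.33
```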

Contributors: Vermillion, Alexander J (Author) / Bazzi, Rida (Thesis director) / Richa, Andrea (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2021-05
Description

The rampant occurrence of spam telephone calls exposes a clear weakness of authentication and security in our telephone systems. The advent of cheap and effective voice over Internet Protocol (VoIP) technology is a major factor, as our existing telephone ecosystem is virtually defenseless against many features of this technology. Our telephone systems have also suffered tremendously from the lack of a proper Caller ID verification system: phone spammers can mask their identities with relative ease by quickly editing their Caller ID. Addressing this will take a combination of new authentication mechanisms in the telephone ecosystem, novel government regulation, and an understanding of how the people behind spam phone calls themselves operate.

This study dives into the robocall ecosystem to learn more about the humans behind spam telephone calls and the economic models they use. Understanding how the people behind robocalls work within their environments provides insight into how the ecosystem functions. The study examines the human component of robocalls: the ways these actors profit from spam phone calls, patterns in how they choose which phone numbers to call, and how they interact with one another within the telephone spam ecosystem. This information is pivotal both for educating consumers on how to mitigate spam and for creating defensive systems. In this qualitative study, we have conducted numerous interviews with call center employees, had participants fill out surveys, and gathered data through our CallFire integrated voice broadcast system. While the research is still ongoing, initial conclusions from my pilot-study interviews point to promising transparency in how the voices behind these calls operate at both small and large scales.

Contributors: Usman, Ahmed (Author) / Doupe, Adam (Thesis director) / Bazzi, Rida (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2021-05