
A semantic triplet based story classifier

Description

Text classification, in the artificial intelligence domain, is an activity in which text documents are automatically classified into predefined categories using machine learning techniques. An example of this is classifying uncategorized news articles into predefined categories such as "Business", "Politics", "Education", and "Technology". In this thesis, a supervised machine learning approach is followed, in which a model is first trained with pre-classified training data and then used to predict the class of test data. Good feature extraction is an important step in the machine learning approach, and hence the main component of this text classifier is semantic triplet based features, used in addition to traditional features such as standard keyword based features and statistical features based on shallow parsing (such as the density of POS tags and named entities). A triplet {Subject, Verb, Object} in a sentence is defined as a relation between the subject and the object, the relation being the predicate (verb). Triplet extraction is a five-step process that takes as input a corpus of web text documents, each consisting of one or more paragraphs, drawn from sources ranging from RSS feeds to lists of extremist websites. The input corpus feeds into the "Pronoun Resolution" step, which uses a heuristic approach to identify the noun phrases referenced by pronouns. The next step, "SRL Parser", is a shallow semantic parser that converts the pronoun-resolved paragraphs into annotated predicate-argument format. The output of the SRL parser is processed by the "Triplet Extractor" algorithm, which forms triplets of the form {Subject, Verb, Object}. Generalization and reduction of the triplet features is the next step. The reduced feature representation reduces computing time, yields better discriminatory behavior, and mitigates the curse of dimensionality. For training and testing, a ten-fold cross-validation approach is followed.
In each round, an SVM classifier is trained with 90% of the labeled (training) data, and in the testing phase, the classes of the remaining 10% of the data (the testing data, treated as unlabeled) are predicted. In conclusion, this thesis proposes a model with semantic triplet based features for story classification. Its effectiveness is demonstrated against other traditional features used in the literature for text classification tasks.
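The "Triplet Extractor" step can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's algorithm: it assumes the SRL parser emits PropBank-style frames, represented here as dictionaries mapping role labels to text spans, with `ARG0` (agent) taken as the Subject, `V` (predicate) as the Verb, and `ARG1` (patient or theme) as the Object. The `Triplet` type, the `extract_triplets` function, and the example frames are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Triplet(NamedTuple):
    subject: str
    verb: str
    obj: str


def extract_triplets(frames: List[Dict[str, str]]) -> List[Triplet]:
    """Form {Subject, Verb, Object} triplets from SRL predicate-argument frames.

    Assumption (hypothetical format): each frame maps PropBank role labels to
    text spans, e.g. {"ARG0": "...", "V": "...", "ARG1": "..."}.
    Frames missing a subject, verb, or object are skipped.
    """
    triplets = []
    for frame in frames:
        subject = frame.get("ARG0")  # agent -> Subject
        verb = frame.get("V")        # predicate -> Verb
        obj = frame.get("ARG1")      # patient/theme -> Object
        if subject and verb and obj:
            triplets.append(Triplet(subject.strip(), verb.strip(), obj.strip()))
    return triplets


# Hypothetical frames for "The council approved the budget. It rejected the plan."
# after pronoun resolution has replaced "It" with "The council".
frames = [
    {"ARG0": "The council", "V": "approved", "ARG1": "the budget"},
    {"ARG0": "The council", "V": "rejected", "ARG1": "the plan"},
    {"V": "rained"},  # no arguments, so no triplet is formed
]
for t in extract_triplets(frames):
    print(t)
```

Mapping only ARG0 and ARG1 keeps the sketch small; a real extractor would also need rules for passives, copular verbs, and frames with other argument roles.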

Contributors: Karad, Ravi Chandravadan (Author) / Davulcu, Hasan (Thesis advisor) / Corman, Steven (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)

Created: 2013

Techniques for supporting prediction of security breaches in critical cloud infrastructures using Bayesian network and Markov decision process

Description

Emerging trends in cyber system security breaches in critical cloud infrastructures show that attackers have abundant resources (human and computing power), expertise, and the support of large organizations and possibly foreign governments. To greatly improve the protection of critical cloud infrastructures, human behavior needs to be incorporated to predict potential security breaches. To achieve such prediction, we envision developing a probabilistic modeling approach capable of accurately capturing both the system-wide causal relationships among the observed operational behaviors in the critical cloud infrastructure and the probabilistic behaviors of human users on subsystems, since the subsystems interact directly with humans. In our conceptual approach, the system-wide causal relationships can be captured by a Bayesian network, and the probabilistic human behavior in the subsystems can be captured by Markov Decision Processes (MDPs). The interactions between the dynamically changing state graphs of the MDPs and the dynamic causal relationships in the Bayesian network are key components of such probabilistic modeling applications. In this thesis, two techniques are presented to support this vision of predicting potential security breaches in critical cloud infrastructures. The first technique evaluates the conformance of the Bayesian network with multiple MDPs. The second technique uses a graph checker algorithm to evaluate whether the dynamically changing Bayesian network structure conforms to the rules of Bayesian networks. A case study and its simulation are presented to show how the two techniques support specific parts of our conceptual approach to predicting system-wide security breaches in critical cloud infrastructures.
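The graph checker is described only at a high level. One rule that every Bayesian network structure must satisfy is that it is a directed acyclic graph (DAG), so a minimal sketch of such a check can use Kahn's topological-sort algorithm. This assumes the structure is given as a mapping from each node to its child nodes; the function name `is_valid_bn_structure` and the node names are hypothetical, and the thesis's own checker may test additional rules.

```python
from collections import deque
from typing import Dict, List


def is_valid_bn_structure(children: Dict[str, List[str]]) -> bool:
    """Return True if the directed graph is acyclic (a requirement for any
    Bayesian network), using Kahn's topological-sort algorithm.

    `children` maps each node to the list of nodes it points to.
    """
    # Collect every node, including nodes that appear only as children.
    nodes = set(children)
    for targets in children.values():
        nodes.update(targets)

    # Count incoming edges per node.
    in_degree = {node: 0 for node in nodes}
    for targets in children.values():
        for target in targets:
            in_degree[target] += 1

    # Repeatedly remove nodes that have no remaining parents.
    queue = deque(node for node in nodes if in_degree[node] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for target in children.get(node, []):
            in_degree[target] -= 1
            if in_degree[target] == 0:
                queue.append(target)

    # Any node never removed lies on (or downstream of) a cycle.
    return visited == len(nodes)


# Hypothetical structure: user behavior influences login anomalies,
# which influence the breach variable.
dag = {"UserBehavior": ["LoginAnomaly"], "LoginAnomaly": ["Breach"]}
cyclic = {**dag, "Breach": ["UserBehavior"]}  # adds a cycle

print(is_valid_bn_structure(dag))     # True
print(is_valid_bn_structure(cyclic))  # False
```

Because the check runs in time linear in the number of nodes and edges, it could be re-run cheaply each time the dynamically changing structure is updated.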

Contributors: Nagaraja, Vinjith (Author) / Yau, Stephen S. (Thesis advisor) / Ahn, Gail-Joon (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created: 2015

Secure Mobile SDN

Description

The increasing usage of smartphones and mobile devices in the work environment and the IT industry has brought about a unique set of challenges and opportunities. The ARM architecture in particular has evolved to a point where it supports implementations across a wide spectrum of performance points, and ARM-based tablets and smartphones are in demand. The enhancements to the basic ARM RISC architecture allow ARM to have high performance, small code size, low power consumption, and small silicon area. Users want their devices to perform many tasks, such as reading email, playing games, and running other online applications, and organizations no longer desire to provision and maintain individuals' IT equipment. The term BYOD (Bring Your Own Device) has come into being from the demand for such a work setup and is one of the motivations of this research work. BYOD brings opportunities, such as increased productivity and reduced costs, as well as challenges, such as secure data access, data leakage, and the amount of control held by the organization.

To provision such a framework, we need to bridge the gap from both the organization's and the individual's point of view. Mobile device users face the issue of application delivery on multiple platforms. For instance, having purchased many applications from one proprietary application store, individuals may want to move them to a different platform or device, but currently this is not possible. Organizations face security issues in providing such a solution, as there are many potential threats from allowing a BYOD work style, such as unauthorized access to data and attacks from devices both within and outside the network.

The ARM-based Secure Mobile SDN framework will resolve these issues and enable employees to consolidate both personal and business calls and mobile data access on a single device. To address the application delivery issue, we introduce KVM-based virtualization, which allows a host OS to run multiple guest OSes. To address the security problem, we introduce an SDN environment in which the host runs a bridged network of guest OSes using Open vSwitch. This allows a remote controller to monitor the state of each guest OS and make important control and traffic flow decisions based on the situation.

Contributors: Chowdhary, Ankur (Author) / Huang, Dijiang (Thesis advisor) / Tong, Hanghang (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created: 2015

Analyzing the effects of Bollinger bands on the probability of stock options using support vector machines

Description

The purpose of this research is to efficiently analyze the provided data and determine whether a useful trend can be observed. Such a trend can then be used to analyze certain probabilities. Three main pieces of data are analyzed in this research: the value of δ for the call and put options, the %B value of the stock, and the amount of time until expiration of the stock option. The %B value is the most important of these. The purpose of analyzing the data is to find the relationship between the variables and, given certain values, the probability that a trade makes money. This result will be used to find the probability that certain trades make money over a period of time.
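The %B indicator locates the latest price relative to the Bollinger bands: %B = (price - lower band) / (upper band - lower band), so 1 means the price sits on the upper band and 0 means it sits on the lower band. The band settings used in the research are not stated; the sketch below assumes the conventional 20-period simple moving average with bands two population standard deviations away. `percent_b` is a hypothetical helper name.

```python
from statistics import fmean, pstdev
from typing import Sequence


def percent_b(closes: Sequence[float], period: int = 20, num_std: float = 2.0) -> float:
    """Compute Bollinger %B for the most recent close.

    Assumes conventional settings: a `period`-length simple moving average as
    the middle band, with upper/lower bands `num_std` population standard
    deviations above and below it.
    """
    if len(closes) < period:
        raise ValueError("need at least `period` closing prices")
    window = closes[-period:]
    middle = fmean(window)
    width = num_std * pstdev(window)
    upper, lower = middle + width, middle - width
    if upper == lower:
        raise ValueError("bands have zero width (flat prices)")
    return (closes[-1] - lower) / (upper - lower)
```

With the default settings, calling `percent_b` on at least 20 closing prices returns a value above 1 when the latest close is above the upper band and below 0 when it is below the lower band.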

Since options are so dependent on probability, this research specifically analyzes stock options rather than the stocks themselves. Like stocks, stock options have value, but options are leveraged. The most common model used to calculate the value of an option is the Black-Scholes Model [1]. Five main quantities derived from the Black-Scholes Model describe how the value of an option changes: θ, δ, γ, ν, and ρ. Here θ is the rate of change of the option's price due to time decay, δ is the rate of change of the option's price due to the stock's changing value, γ is the rate of change of δ with respect to the stock's price, ν (vega) is the rate of change of the option's value with respect to the stock's volatility, and ρ is the rate of change of the option's value with respect to the interest rate [2]. In this research, the %B value of the stock is analyzed along with the time until expiration of the option. All of the options have the same δ, because all the options analyzed in this experiment are less than two months from expiration and the value of δ reveals how far in or out of the money an option is.
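For reference, under the Black-Scholes Model the δ of a European call on a non-dividend-paying stock is N(d1), and that of the corresponding put is N(d1) - 1, where N is the standard normal CDF and d1 = [ln(S/K) + (r + σ²/2)T] / (σ√T). The sketch below computes both; the inputs (stock price S, strike K, time to expiration T in years, risk-free rate r, volatility σ) are the model's standard parameters, while the function names and example values are hypothetical.

```python
import math
from typing import Tuple


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_deltas(S: float, K: float, T: float, r: float, sigma: float) -> Tuple[float, float]:
    """Return (call delta, put delta) for a European option on a
    non-dividend-paying stock under the Black-Scholes Model.

    S: stock price, K: strike, T: years to expiration,
    r: continuously compounded risk-free rate, sigma: annualized volatility.
    """
    if T <= 0 or sigma <= 0:
        raise ValueError("T and sigma must be positive")
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    call_delta = norm_cdf(d1)
    put_delta = call_delta - 1.0  # from put-call parity
    return call_delta, put_delta


# A hypothetical at-the-money option with 30 days to expiration.
call_d, put_d = bs_deltas(S=100.0, K=100.0, T=30 / 365, r=0.01, sigma=0.25)
print(f"call delta = {call_d:.3f}, put delta = {put_d:.3f}")
```

Because δ moves toward 1 for calls (and toward -1 for puts) as an option goes deeper into the money, and toward 0 as it goes further out of the money, it works as the moneyness measure the text refers to.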

The machine learning technique used to analyze the data and estimate the probability is the support vector machine (SVM). Support vector machines analyze data that can be classified into one of two or more groups and attempt to find a pattern in the data to develop a model that reliably classifies similar future data into the correct group. In this research, this is used to analyze the outcome of stock options.
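The SVM implementation and kernel used in the research are not stated. As a minimal, dependency-free sketch of the idea, the code below trains a linear SVM with the Pegasos stochastic subgradient method on the regularized hinge loss, using two features the text names (%B and time until expiration, here assumed to be scaled as days / 60) to predict whether a trade makes money (+1) or not (-1). The function names and toy samples are hypothetical and invented for illustration only; they do not encode any finding from the research.

```python
from typing import List, Sequence


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    epochs: int = 1000,
) -> List[float]:
    """Train a linear SVM with the Pegasos stochastic subgradient method.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y_i * <w, x_i>)).
    A constant 1.0 is appended to each sample so the last weight acts as a
    bias term (this bias is regularized too). Labels must be +1 or -1.
    Samples are visited in a fixed order, so results are deterministic.
    """
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink w (subgradient of the regularizer) ...
            w = [(1.0 - eta * lam) * wi for wi in w]
            # ... and, if the hinge loss is active, step toward the sample.
            if margin < 1.0:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Classify one sample: +1 (trade makes money) or -1 (it does not)."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0.0 else -1


# Invented toy samples: (%B, days to expiration / 60) -> outcome label.
X = [(0.90, 0.2), (0.85, 0.6), (0.95, 0.4), (0.10, 0.3), (0.15, 0.7), (0.05, 0.5)]
y = [1, 1, 1, -1, -1, -1]

w = train_linear_svm(X, y)
print([predict(w, x) for x in X])
```

A linear kernel keeps the sketch short; if the outcome depends non-linearly on %B and time to expiration, a kernel SVM from an established library would be the more natural choice.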

Contributors: Reeves, Michael (Author) / Richa, Andrea (Thesis advisor) / McCarville, Daniel R. (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created: 2015

Predicting demographic and financial attributes in a bank marketing dataset

Description

Banking institutions employ several marketing strategies to maximize new customer acquisition as well as current customer retention. Telemarketing is one such approach, in which bank representatives contact individual customers with offers. These telemarketing strategies can be improved in combination with data mining techniques that make customer information and interests predictable. In this thesis, bank telemarketing data from a Portuguese banking institution were analyzed to determine the predictability of several client demographic and financial attributes and to find the most contributing factors for each. The data were preprocessed to ensure quality, and then data mining models were generated for the attributes with logistic regression, support vector machine (SVM), and random forest, using Orange as the data mining tool. Results were analyzed using precision, recall, and F1 score.
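The three evaluation measures follow from counts of true positives (TP), false positives (FP), and false negatives (FN): precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is the harmonic mean of the two. The thesis computed them in Orange; the stand-alone sketch below shows only the arithmetic for one class treated as positive, treating an undefined ratio as 0. The function name and the labels (whether a client has a housing loan) are hypothetical and chosen for illustration.

```python
from typing import Hashable, Sequence, Tuple


def precision_recall_f1(
    actual: Sequence[Hashable], predicted: Sequence[Hashable], positive: Hashable
) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for one class treated as positive."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if p == positive and a == positive)
    fp = sum(1 for a, p in pairs if p == positive and a != positive)
    fn = sum(1 for a, p in pairs if p != positive and a == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented labels for a hypothetical binary attribute ("has housing loan").
actual = ["yes", "yes", "no", "yes", "no", "no"]
predicted = ["yes", "no", "no", "yes", "yes", "no"]
p, r, f = precision_recall_f1(actual, predicted, positive="yes")
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
# prints precision=0.67 recall=0.67 F1=0.67
```

Reporting all three per attribute matters when the classes are imbalanced, since a model can reach high accuracy while rarely predicting the minority class.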

Contributors: Ejaz, Samira (Author) / Davulcu, Hasan (Thesis advisor) / Balasooriya, Janaka (Committee member) / Candan, Kasim (Committee member) / Arizona State University (Publisher)

Created: 2016

ASU Electronic Theses and Dissertations
