Description
Internet browsers today are capable of warning users of a potential phishing attack. Browsers identify these websites by referring to blacklists of reported phishing websites maintained by trusted organizations such as Google and PhishTank. On identifying a Uniform Resource Locator (URL) requested by a user as a reported phishing URL, browsers like Mozilla Firefox and Google Chrome display an 'active' warning message in an attempt to stop the user from making the potentially dangerous decision of visiting the website and sharing confidential information such as usernames and passwords, credit card information, or social security numbers.

However, these warnings are not always successful at safeguarding the user from a phishing attack. On several occasions, users ignore these warnings and 'click through' them, eventually landing at the potentially dangerous website and giving away confidential information. Failure to understand the warning, failure to differentiate between types of browser warnings, and diminishing trust in browser warnings due to repeated encounters are some of the reasons users ignore them. It is important to address these factors in order to improve users' reactions to these warnings.

In this thesis, I propose a novel design to improve the effectiveness and reliability of phishing warning messages. The design uses the name of the target website that a fake website is mimicking to display a simple, easy-to-understand, and interactive warning message, with the primary objective of keeping the user away from a potential spoof website.
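As a rough sketch of the idea (not the thesis's actual browser integration), the snippet below composes a warning that names the spoofed target; the function, its fields, and the message format are all invented for illustration.

```python
# Hypothetical sketch: given a blacklisted URL and the legitimate site it
# imitates, build an 'active' warning that names the spoofed target.

def build_phishing_warning(requested_url: str, target_name: str, target_url: str) -> str:
    """Compose a warning message that names the site being mimicked."""
    return (
        f"Warning: {requested_url} is a reported phishing page.\n"
        f"It appears to imitate {target_name}.\n"
        f"Did you mean to visit {target_url}?\n"
        f"[Go to {target_name}]   [Back to safety]   [Proceed anyway (unsafe)]"
    )

if __name__ == "__main__":
    print(build_phishing_warning(
        "http://paypa1-secure.example.com/login",
        "PayPal",
        "https://www.paypal.com",
    ))
```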
Contributors: Sharma, Satyabrata (Author) / Bazzi, Rida (Thesis advisor) / Walker, Erin (Committee member) / Gaffar, Ashraf (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
Processing large volumes of RDF data requires an efficient storage and query processing engine that can scale well with the volume of data. Initial attempts to address this issue focused on optimizing native RDF stores as well as conventional relational database management systems. But as the volume of RDF data grew exponentially, the limitations of these systems became apparent, and researchers began to focus on using big data analysis tools, most notably Hadoop, to process RDF data. Various studies and benchmarks that evaluate these tools for RDF data processing have been published. In the past two and a half years, however, heavy users of big data systems, like Facebook, noted limitations in the query performance of these systems and began to develop new distributed query engines for big data that do not rely on map-reduce. Facebook's Presto is one such example.

This thesis evaluates the performance of Presto in processing big RDF data against Apache Hive. A comparative analysis was also conducted against 4store, a native RDF store. To evaluate the performance of Presto for big RDF data processing, a map-reduce program and a compiler, based on Flex and Bison, were implemented. The map-reduce program loads RDF data into HDFS, while the compiler translates SPARQL queries into a subset of SQL that Presto (and Hive) can understand. The evaluation was done on four- and eight-node Linux clusters on the Microsoft Windows Azure platform with RDF datasets of 10, 20, and 30 million triples. The results of the experiment show that Presto has much higher performance than Hive and can be used to process big RDF data. The thesis also proposes a Presto-based architecture, Presto-RDF, that can be used to process big RDF data.
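To make the translation concrete, here is a toy sketch, in Python rather than the thesis's Flex/Bison compiler, of how a SPARQL basic graph pattern can be rewritten as SQL self-joins over a single triples(subject, predicate, object) table of the kind Presto and Hive can query.

```python
# Toy illustration (not the thesis's compiler): rewrite a SPARQL basic graph
# pattern as SQL self-joins over one triples(subject, predicate, object) table.

def bgp_to_sql(patterns):
    """patterns: list of (s, p, o); strings starting with '?' are variables."""
    selects, froms, wheres = {}, [], []
    for i, (s, p, o) in enumerate(patterns):
        alias = f"t{i}"
        froms.append(f"triples {alias}")
        for col, term in (("subject", s), ("predicate", p), ("object", o)):
            if term.startswith("?"):
                if term in selects:                       # repeated variable:
                    wheres.append(f"{selects[term]} = {alias}.{col}")  # join
                else:
                    selects[term] = f"{alias}.{col}"      # first binding
            else:
                wheres.append(f"{alias}.{col} = '{term}'")  # constant: filter
    cols = ", ".join(f"{expr} AS {var[1:]}" for var, expr in selects.items())
    sql = f"SELECT {cols} FROM {', '.join(froms)}"
    if wheres:
        sql += " WHERE " + " AND ".join(wheres)
    return sql

# ?person works at ?org, and ?org is located in Addis_Ababa
print(bgp_to_sql([("?person", "worksAt", "?org"),
                  ("?org", "locatedIn", "Addis_Ababa")]))
```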
Contributors: Mammo, Mulugeta (Author) / Bansal, Srividya (Thesis advisor) / Bansal, Ajay (Committee member) / Lindquist, Timothy (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
Despite the various driver assistance systems and electronics, the threat to the lives of drivers, passengers, and other people on the road still persists. With the growth in technology, the use of in-vehicle devices with a plethora of buttons and features is increasing, resulting in increased distraction. Recently, speech recognition has emerged as an alternative to manual interaction and has the potential to be beneficial. However, considering that the automotive environment is dynamic and noisy in nature, distraction may arise not from the manual interaction but from the cognitive load. Hence, speech recognition alone cannot be a reliable mode of communication.

This thesis proposes a simultaneous multimodal approach to designing the interface between driver and vehicle, with the goal of enabling the driver to be more attentive to driving tasks and to spend less time fiddling with distracting tasks. By analyzing human-human multimodal interaction techniques, new modes especially suitable for the automotive context were identified and tested: touch, speech, graphics, voice-tip, and text-tip. The multiple modes are intended to work collectively to make the interaction more intuitive and natural. To obtain a minimalist, user-centered design for the center stack, design principles such as the 80/20 rule, contour bias, affordance, and the flexibility-usability trade-off were applied to the prototypes. The prototype was developed using the Dragon software development kit on the Android platform for speech recognition.

In the present study, driver behavior was investigated in an experiment conducted on the DriveSafety DS-600s driving simulator. Twelve volunteers drove the simulator under two conditions: (1) accessing the center stack applications using touch only, and (2) accessing the applications using speech with an offered text-tip. The duration for which the user looked away from the road (eyes-off-road time) was measured manually for each scenario. Comparison of the results showed that eyes-off-road time is lower in the second scenario. The minimalist design with 8-10 icons per screen proved effective, as all readings were within the driver distraction recommendations defined by NHTSA (eyes-off-road time under 2 seconds per screen).
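For illustration, here is a minimal sketch of how eyes-off-road glances could be scored automatically from a timestamped gaze log and checked against the 2-second guideline; the study itself measured these durations manually.

```python
# Minimal sketch (not the study's method): score continuous eyes-off-road
# glances from a timestamped gaze log and compare each against 2 seconds.

def off_road_glances(samples):
    """samples: list of (timestamp_sec, on_road: bool), sorted by time.
    Returns the duration (sec) of each continuous eyes-off-road glance."""
    glances, start = [], None
    for t, on_road in samples:
        if not on_road and start is None:
            start = t                      # glance begins
        elif on_road and start is not None:
            glances.append(t - start)      # glance ends
            start = None
    return glances

log = [(0.0, True), (1.2, False), (2.9, True), (5.0, False), (7.4, True)]
for d in off_road_glances(log):
    print(f"glance: {d:.1f}s  within NHTSA limit: {d <= 2.0}")
```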
Contributors: Mittal, Richa (Author) / Gaffar, Ashraf (Thesis advisor) / Femiani, John (Committee member) / Gray, Robert (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
Since the advent of the internet, and even more so since the rise of social media platforms, the explosive growth of textual data and its availability has made analysis a tedious task. Information extraction systems are available but are generally too specific, often extracting only the kinds of information they deem necessary and extraction-worthy. With data visualization theory and fast, interactive querying methods, leaving out information may not be necessary at all. This thesis explores textual data visualization techniques, intuitive querying, and a novel approach to all-purpose textual information extraction that encodes a large text corpus to improve human understanding of the information it contains.

This thesis presents a modified traversal algorithm over the dependency parse output of text that extracts all subject-predicate-object pairs while ensuring no information is missed. To support full-scale, all-purpose information extraction from large text corpora, a data preprocessing pipeline is recommended before the extraction is run. The output format is designed specifically to fit a node-edge-node model and form the building blocks of a network, which makes understanding the text and querying information from the corpus quick and intuitive. The approach attempts to reduce reading time and enhance understanding of the text using an interactive graph and timeline.
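A simplified sketch of the extraction step using spaCy's dependency parser is shown below; the thesis's modified traversal is designed to be exhaustive, whereas this baseline only pairs the direct subject and object children of each verb.

```python
# Simplified baseline (not the thesis's modified traversal): walk the
# dependency parse and emit (subject, predicate, object) triples per verb.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def extract_spo(text):
    triples = []
    for sent in nlp(text).sents:
        for token in sent:
            if token.pos_ == "VERB":
                subjects = [c for c in token.children
                            if c.dep_ in ("nsubj", "nsubjpass")]
                objects = [c for c in token.children
                           if c.dep_ in ("dobj", "attr")]
                for s in subjects:
                    for o in objects:
                        # each triple is a node-edge-node building block
                        triples.append((s.text, token.lemma_, o.text))
    return triples

print(extract_spo("The committee approved the proposal. Alice signed the report."))
```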
Contributors: Hashmi, Syed Usama (Author) / Bansal, Ajay (Thesis advisor) / Bansal, Srividya (Committee member) / Gonzalez Sanchez, Javier (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
Academia is not what it used to be. In today's fast-paced world, requirements are constantly changing, and adapting to these changes in an academic curriculum can be challenging. For any given aspect of a domain, there are various levels of proficiency that students can achieve, and considering this wide array of needs, diverse groups need customized course curricula. The need for an archetype for designing a course around its outcomes paved the way for Outcome-based Education (OBE). OBE focuses on the outcomes, as opposed to the traditional way of following a process [23]. According to D. Clark, the major reason for the creation of Bloom's taxonomy was to stimulate and inspire a higher quality of thinking in academia, incorporating not just basic fact-learning and application but also the ability to evaluate and analyze facts and their applications [7]. The Instructional Module Development System (IMODS) is the culmination of both of these models, Bloom's Taxonomy and OBE. It is an open-source, web-based software tool built on the principles of OBE and Bloom's Taxonomy. It guides an instructor, step by step, through an outcomes-based process as they define the learning objectives and the content to be covered, and develop an instruction and assessment plan. The tool also provides the user with a repository of techniques based on the choices they make about the level of learning while defining the objectives. This helps maintain alignment among all the components of the course design. The tool also generates documentation to support the course design and provides feedback when the course is lacking in certain aspects.

It is not enough to come up with a model that theoretically facilitates effective, result-oriented course design; there should be facts, experiments, and proof that the model achieves what it aims to achieve. Thus, this thesis has two research objectives: (i) design a feature for course design feedback and evaluate its effectiveness; and (ii) evaluate the usefulness of a tool like IMODS on various aspects: (a) the effectiveness of the tool in educating instructors on OBE; (b) its effectiveness in providing appropriate and efficient pedagogy and assessment techniques; (c) its effectiveness in building learning objectives; (d) its effectiveness in document generation; (e) the usability of the tool; and (f) the effectiveness of OBE on course design and expected student outcomes. The thesis presents a detailed algorithm for course design feedback, its pseudocode, a description and proof of the correctness of the feature, the methods used for evaluation of the tool, and the experiments for evaluation and analysis of the obtained results.
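For illustration, the sketch below shows a hypothetical, much-simplified alignment check in the spirit of the feedback feature: every learning objective should be covered by at least one assessment and one instruction activity at a matching Bloom level or above. The data model and rule are invented for this example and are not IMODS's actual algorithm.

```python
# Hypothetical sketch of a course-design alignment check; not IMODS's algorithm.
BLOOM = ["remember", "understand", "apply", "analyze", "evaluate", "create"]

def feedback(objectives, assessments, activities):
    """Each argument is a list of dicts; levels are entries of BLOOM."""
    messages = []
    for obj in objectives:
        def covers(items):
            # An item covers the objective if it targets it at the same
            # Bloom level or higher.
            return any(
                it["objective_id"] == obj["id"]
                and BLOOM.index(it["level"]) >= BLOOM.index(obj["level"])
                for it in items
            )
        if not covers(assessments):
            messages.append(f"Objective {obj['id']}: no assessment at level "
                            f"'{obj['level']}' or above.")
        if not covers(activities):
            messages.append(f"Objective {obj['id']}: no instruction activity "
                            f"at level '{obj['level']}' or above.")
    return messages or ["Course design is aligned."]

print(feedback(
    objectives=[{"id": "O1", "level": "apply"}],
    assessments=[{"objective_id": "O1", "level": "remember"}],
    activities=[{"objective_id": "O1", "level": "apply"}],
))
```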
Contributors: Raj, Vaishnavi (Author) / Bansal, Srividya (Thesis advisor) / Bansal, Ajay (Committee member) / Mehlhase, Alexandra (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
The Semantic Web contains large amounts of related information in the form of knowledge graphs such as DBpedia. These knowledge graphs are typically enormous and not easily accessible to users, as they require specialized knowledge of query languages (such as SPARQL) as well as deep familiarity with the ontologies these knowledge graphs use. To make knowledge graphs more accessible, even for non-experts, several question answering (QA) systems have been developed over the last decade. Due to the complexity of the task, the approaches undertaken include techniques from natural language processing (NLP), information retrieval (IR), machine learning (ML), and the Semantic Web (SW). At a high level, most question answering systems approach the task as a conversion from the natural language question to its corresponding SPARQL query, then use the query to retrieve the desired entities or literals. One approach to this problem, used by most systems today, is to apply deep syntactic and semantic analysis to the input question to derive the SPARQL query. This has resulted in the evolution of natural language processing pipelines with common components such as answer type detection, segmentation, phrase matching, part-of-speech tagging, named entity recognition, named entity disambiguation, syntactic or dependency parsing, and semantic role labeling.

This has led to NLP pipeline architectures that integrate components each solving a specific aspect of the problem and passing their results on for further processing, e.g., DBpedia Spotlight for named entity recognition and RelMatch for relational mapping. A major drawback of this approach is error propagation, a common problem in NLP: mistakes early in the pipeline can adversely affect successive steps further down. Another approach is to use query templates, either manually generated or extracted from existing benchmark datasets such as Question Answering over Linked Data (QALD), to generate the SPARQL queries; this is essentially a set of predefined queries with slots that need to be filled. This approach shifts the question answering problem into a classification task, where the system needs to match the input question to the appropriate template (class label).

This thesis proposes a neural network approach to automatically learn to classify natural language questions into their corresponding templates using recursive neural networks. An obvious advantage of using neural networks is that they eliminate the need for laborious feature engineering, which can be cumbersome and error-prone. The input question is encoded into a vector representation. The model is trained and evaluated on the LC-QuAD dataset (Large-scale Complex Question Answering Dataset), which was created explicitly for machine-learning-based QA approaches to learning complex SPARQL queries. The dataset consists of 5000 questions along with their corresponding SPARQL queries over the DBpedia dataset, spanning 5042 entities and 615 predicates. These queries were annotated based on 38 unique templates that the model attempts to classify. The resulting model is evaluated against both the LC-QuAD dataset and the Question Answering over Linked Data (QALD-7) dataset.
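To make the classification framing concrete, here is a hedged baseline sketch that stands in for the recursive neural network described above: a TF-IDF and logistic regression pipeline mapping questions to template labels. The questions and template ids below are invented stand-ins for LC-QuAD's 38 templates.

```python
# Baseline sketch of template classification (substituting TF-IDF + logistic
# regression for the thesis's recursive neural network). Data is illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

questions = [
    "Who is the mayor of Berlin?",
    "Which river flows through Cairo?",
    "Who is the president of France?",
    "Which mountain is located in Nepal?",
]
templates = [1, 2, 1, 2]  # template ids, stand-ins for the 38 real templates

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(questions, templates)

print(clf.predict(["Who is the CEO of Google?"]))              # expect template 1
print(clf.predict_proba(["Which lake is located in Kenya?"]))  # top-k scores
```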

The recursive neural network achieves a template classification accuracy of 0.828 on the LC-QuAD dataset and 0.618 on the QALD-7 dataset. When the top-2 most likely templates are considered, the model achieves an accuracy of 0.945 on LC-QuAD and 0.786 on QALD-7.

After slot filling, the overall system achieves a macro F-score of 0.419 on the LC-QuAD dataset and 0.417 on the QALD-7 dataset.
Contributors: Athreya, Ram G (Author) / Bansal, Srividya (Thesis advisor) / Usbeck, Ricardo (Committee member) / Gary, Kevin (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
Driver distraction research has a long history spanning nearly 50 years, intensifying in the last decade. The focus has always been on identifying distracting tasks and measuring their respective harm levels. As in-vehicle technology advances, the list of distracting activities grows, along with crash risk. These activities are also becoming more common and more complicated, especially with regard to in-car interactive systems. This work's main focus is driver distraction caused by the in-car interactive system. Many user interaction designs (buttons, speech, visual) for human-car communication have existed in the past and are in use today, and all related studies suggest that driver distraction remains high and that a better design is needed. Multimodal interaction (MMI) is a design approach that relies on multiple modes for humans to interact with the car, reducing driver distraction by allowing the driver to choose the most suitable mode with minimum distraction. Additionally, combining multiple modes simultaneously provides more natural interaction, which could lead to less distraction. The main goal of MMI is to enable the driver to be more attentive to driving tasks and to spend less time fiddling with distracting tasks. An engineering-based method is used to measure driver distraction, using metrics like reaction time, acceleration, and lane departure obtained from test cases.
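As an illustration of those metrics, the sketch below computes reaction time, lane departure counts, and mean speed change from hypothetical simulator samples; the actual test cases and log format are not specified in the abstract.

```python
# Illustrative sketch of the engineering-based distraction metrics named above,
# computed from invented simulator samples (not the study's actual data).

def reaction_time(event_time, response_time):
    """Seconds from stimulus onset to driver input."""
    return response_time - event_time

def lane_departures(lane_offsets, lane_half_width=1.8):
    """Count samples where the car's lateral offset (m) leaves the lane."""
    return sum(abs(x) > lane_half_width for x in lane_offsets)

def mean_abs_acceleration(speeds, dt=0.1):
    """Mean |dv/dt| (m/s^2) from speed samples taken every dt seconds."""
    diffs = [abs(b - a) / dt for a, b in zip(speeds, speeds[1:])]
    return sum(diffs) / len(diffs)

print(round(reaction_time(12.4, 13.1), 2))              # 0.7 s
print(lane_departures([0.2, 1.9, 2.1, 0.5]))            # 2 departures
print(round(mean_abs_acceleration([20.0, 20.5, 19.8]), 2))
```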
Contributors: Jahagirdar, Tanvi (Author) / Gaffar, Ashraf (Thesis advisor) / Ghazarian, Arbi (Committee member) / Gray, Robert (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
Driving is already a complex task that demands varying levels of cognitive and physical load. With advances in technology, the car has become a place for media consumption, a communications center, and an interconnected workplace, and the number of features in a car has grown accordingly. As a result, user interaction inside the car has become overcrowded and more complex. This has increased distraction while driving and, in turn, the number of accidents due to distracted driving. This thesis presents a critical analysis of today's in-car environment covering two main aspects, multimodal interaction (MMI) and Advanced Driver Assistance Systems (ADAS), with the aim of minimizing distraction. It also provides in-depth market research on future trends in smart car technology. Careful analysis showed that an infotainment screen cluttered with many small icons, a center stack with a plethora of small buttons, and poor voice recognition (VR) result in high cognitive load, and these are the reasons for increased driver distraction. Though VR has become a standard technology, its current state is focused on feature-oriented design and a sales-driven approach. Most automotive manufacturers are focusing on making VR better, but attaining perfection in VR is not the answer, as there are inherent challenges and limitations with respect to the in-car environment and cognitive load. Accordingly, the research proposes a novel in-car interaction design solution: multimodal interaction (MMI). MMI is a new term in the context of vehicles but is widely used to describe human-human interaction. The approach offers the driver a non-intrusive alternative for interacting with the features in the car. With a focus on user-centered design, MMI and ADAS can potentially help reduce distraction. To support the discussion, an experiment was conducted to benchmark a minimalist UI design: an engineering-based method was used to test and measure the distraction of four different UIs with varying numbers of icons and screen sizes. Lastly, to compete in the market, the basic features provided by all competitors cannot be eliminated, but hard work can be done to improve human-car interaction (HCaI) and make driving safer.
Contributors: Nakrani, Paresh Keshubhai (Author) / Gaffar, Ashraf (Thesis advisor) / Sohoni, Sohum (Committee member) / Ghazarian, Arbi (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
Text classification is a rapidly evolving area of data mining, while requirements engineering is a less-explored area of software engineering that deals with the process of defining, documenting, and maintaining a software system's requirements. When researchers decided to blend these two streams, research emerged on automating the classification of software requirements statements into categories easily comprehensible to developers, for faster development and delivery; until now this has mostly been done manually by software engineers, indeed a tedious job. However, most of that research focused on classifying non-functional requirements, those pertaining to intangible features such as security, reliability, and quality. It is a challenging task to automatically classify functional requirements, those pertaining to how the system will function, especially across different and large enterprise systems, and it requires exploiting text mining capabilities. This thesis investigates the results of text classification applied to functional software requirements by creating a framework in R and making use of algorithms and techniques like k-nearest neighbors and support vector machines, together with boosting, bagging, maximum entropy, neural networks, and random forests in an ensemble approach. The study was conducted by collecting and visualizing relevant enterprise data that had been manually classified previously and was subsequently used for training the model. Key factors in training included the frequency of terms in the documents and the level of cleanliness of the data. The model was applied to test data and validated by studying and comparing parameters like precision, recall, and accuracy.
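The thesis builds its framework and ensemble in R; purely for illustration, the sketch below is a rough Python equivalent that combines three of the named learners (k-nearest neighbors, a support vector machine, and a random forest) over TF-IDF features with majority voting. The example requirements and labels are invented.

```python
# Rough Python analogue of the ensemble idea (the thesis uses R): three of the
# named learners vote on the category of each functional requirement statement.
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

requirements = [
    "The system shall generate a monthly payroll report.",
    "The user shall be able to update the billing address.",
    "The system shall calculate tax for each invoice.",
    "The user shall be able to reset the account password.",
]
labels = ["reporting", "account", "billing", "account"]

ensemble = make_pipeline(
    TfidfVectorizer(),
    VotingClassifier(
        estimators=[
            ("knn", KNeighborsClassifier(n_neighbors=1)),
            ("svm", SVC()),
            ("rf", RandomForestClassifier(n_estimators=50)),
        ],
        voting="hard",  # majority vote over the three predictions
    ),
)
ensemble.fit(requirements, labels)
print(ensemble.predict(["The system shall email a weekly sales report."]))
```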
Contributors: Swadia, Japa (Author) / Ghazarian, Arbi (Thesis advisor) / Bansal, Srividya (Committee member) / Gaffar, Ashraf (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
This report investigates the general day-to-day problems faced by small businesses, particularly small vendors, in the areas of marketing and general management. Due to a lack of manpower, internet availability, and properly documented data, small businesses cannot optimize their operations. The aim of the research is to address these problems and find a solution in the form of a tool that utilizes data science. The tool has features that help vendors mine the data they record themselves and find useful information that will benefit their businesses. Since properly documented data is lacking, one-class classification using a support vector machine (SVM) is used to build a classification model that returns positive values for the audience likely to respond to a marketing strategy. Market basket analysis is used to choose products from the inventory so that patterns are found among them, giving a marketing strategy a higher chance of attracting an audience. Higher-selling products can also be used to the vendor's advantage, and lower-selling products can be paired with them for an overall profit to the business. The tool, as envisioned, meets all the requirements it was set out to have and can be used as a standalone application to bring the power of data mining into the hands of a small vendor.
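As a small illustration of the one-class classification idea, the sketch below trains scikit-learn's OneClassSVM on customers known to have responded to a past campaign and flags which new customers look similar; the features and numbers are invented.

```python
# One-class classification sketch: learn the "responder" region from positive
# examples only, then score new customers. Feature values are invented.
from sklearn.svm import OneClassSVM

# Each row: [visits_per_month, avg_purchase_amount] for known responders
responders = [[8, 40], [10, 55], [7, 35], [9, 60], [11, 50]]

model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(responders)

new_customers = [[9, 45], [1, 5]]
print(model.predict(new_customers))  # +1: likely responder, -1: unlikely
```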
Contributors: Sharma, Aveesha (Author) / Ghazarian, Arbi (Thesis advisor) / Gaffar, Ashraf (Committee member) / Bansal, Srividya (Committee member) / Arizona State University (Publisher)
Created: 2016