Matching Items (41)
Description
The complexity of the systems that software engineers build has continuously grown since the inception of the field. What has not changed is the engineers' mental capacity to operate on about seven distinct pieces of information at a time. The widespread use of UML has led to more abstract software design activities; however, the same cannot be said for reverse engineering activities. Introducing abstraction into reverse engineering will allow engineers to move farther away from the details of the system, increasing their ability to see the role that domain-level concepts play in the system. In this thesis, we present a technique that facilitates filtering of classes from existing systems at the source level based on their relationship to concepts in the domain, via a classification method using machine learning. We showed that concepts can be identified using a machine learning classifier based on source-level metrics. We developed an Eclipse plugin to assist with the process of manually classifying Java source code and collecting metrics and classifications into a standard file format. We developed an Eclipse plugin to act as a concept identifier that visually indicates whether a class is a domain concept. We minimized the size of training sets to ensure the approach is useful in practice. This allowed us to determine that a training set of 7.5% to 10% of the system is nearly as effective as a training set representing 50% of the system. We showed that random selection is the most consistent and effective means of selecting a training set. We found that KNN is the most consistent performer among the learning algorithms tested. We determined the optimal feature set for this classification problem. We discussed two possible structures besides a one-to-one mapping of domain knowledge to implementation. We showed that classes representing more than one concept are simply concepts at differing levels of abstraction. We also discussed composite concepts, in which a domain concept is implemented by more than one class. We showed that these composite concepts are difficult to detect because the problem is NP-complete.
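As a rough illustration of the classification pipeline summarized above--source-level metrics in, a domain-concept flag out--the following sketch trains a KNN classifier on a small (10%), randomly selected training set, mirroring the reported findings. It assumes scikit-learn; the metric features and labels are synthetic placeholders, not the thesis's actual data or feature set.

```python
# Minimal sketch: KNN over source-level metrics to flag classes as domain
# concepts, trained on a 10% random sample (per the findings above).
# Features and labels are synthetic stand-ins, not the thesis's data.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_classes = 400                      # classes in the analyzed system (assumed)
X = rng.random((n_classes, 6))       # e.g. LOC, fan-in, fan-out, ... (hypothetical metrics)
y = rng.integers(0, 2, n_classes)    # 1 = domain concept, 0 = not

# Random selection of a small (10%) training set, as the thesis found effective.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.10, random_state=0)

clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```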
Contributors: Carey, Maurice (Author) / Colbourn, Charles (Thesis advisor) / Collofello, James (Thesis advisor) / Davulcu, Hasan (Committee member) / Sarjoughian, Hessam S. (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
Text classification, in the artificial intelligence domain, is an activity in which text documents are automatically classified into predefined categories using machine learning techniques. An example is classifying uncategorized news articles into predefined categories such as "Business", "Politics", "Education", "Technology", etc. In this thesis, a supervised machine learning approach is followed, in which a module is first trained with pre-classified training data and the class of test data is then predicted. Good feature extraction is an important step in the machine learning approach, and hence the main component of this text classifier is semantic-triplet-based features, in addition to traditional features like standard keyword-based features and statistical features based on shallow parsing (such as density of POS tags and named entities). A triplet {Subject, Verb, Object} in a sentence is defined as a relation between the subject and the object, the relation being the predicate (verb). Triplet extraction is a five-step process that takes as input a corpus of web text documents, each consisting of one or more paragraphs, from sources ranging from RSS feeds to lists of extremist websites. The input corpus feeds into the "Pronoun Resolution" step, which uses a heuristic approach to identify the noun phrases referenced by the pronouns. The next step, the "SRL Parser", is a shallow semantic parser that converts the pronoun-resolved paragraphs into annotated predicate-argument format. The output of the SRL parser is processed by the "Triplet Extractor" algorithm, which forms triplets of the form {Subject, Verb, Object}. Generalization and reduction of triplet features is the next step; the reduced feature representation lowers computing time, yields better discriminatory behavior, and mitigates the curse of dimensionality. For training and testing, a ten-fold cross-validation approach is followed: in each round, an SVM classifier is trained with 90% of the labeled (training) data, and in the testing phase the classes of the remaining 10% of unlabeled (testing) data are predicted. In conclusion, this thesis proposes a model with semantic-triplet-based features for story classification. The effectiveness of the model is demonstrated against other traditional features used in the literature for text classification tasks.
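The training and evaluation regime described above--an SVM over extracted text features, scored by ten-fold cross-validation--can be sketched as follows. This is a hedged illustration assuming scikit-learn; the toy triplet strings and labels stand in for the thesis's actual {Subject, Verb, Object} features.

```python
# Hedged sketch of the classifier stage: triplet-derived text features fed
# to an SVM, evaluated with ten-fold cross-validation (90% train / 10% test
# per fold). The triplet strings and labels are toy stand-ins.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

docs = ["group claim attack", "minister announce budget"] * 10  # toy triplets
labels = [1, 0] * 10                                            # story classes

model = make_pipeline(CountVectorizer(), LinearSVC())
scores = cross_val_score(model, docs, labels, cv=10)  # ten-fold CV
print("mean CV accuracy:", scores.mean())
```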
Contributors: Karad, Ravi Chandravadan (Author) / Davulcu, Hasan (Thesis advisor) / Corman, Steven (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
Ball Grid Arrays (BGAs) using lead-free or lead-rich solder materials are widely used as Second Level Interconnects (SLI) in mounting packaged components to the printed circuit board (PCB). The reliability of these solder joints is of significant importance to the performance of microelectronic components and systems. Product design/form factor, solder material, manufacturing process, and use conditions, as well as the inherent variabilities present in the system, greatly influence product reliability. Accurate reliability analysis requires an integrated approach that concurrently accounts for all these factors and their synergistic effects. Such an integrated and robust methodology can be used in the design and development of new and advanced microelectronic systems and can provide significant improvements in cycle time, cost, and reliability. The IMPRPK approach is based on a probabilistic methodology focusing on three major tasks: (1) characterization of BGA solder joints to identify failure mechanisms and obtain statistical data, (2) finite element analysis (FEM) to predict the system response needed for life prediction, and (3) development of a probabilistic methodology to predict reliability, as well as the sensitivity of the system to various parameters and variabilities. These tasks and the predictive capabilities of IMPRPK in microelectronic reliability analysis are discussed.
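To make the probabilistic step concrete, the sketch below propagates variability in two hypothetical solder-joint parameters through a Coffin-Manson-style fatigue-life model via Monte Carlo sampling, then estimates reliability and a crude sensitivity measure. This is an illustration of the general idea only; it is not the IMPRPK code, and the life model and all parameter values are assumptions.

```python
# Illustrative Monte Carlo reliability estimate (not IMPRPK): sample input
# variabilities, push them through a Coffin-Manson-style life model, and
# report survival probability plus crude correlation-based sensitivities.
# All distributions and constants below are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
strain_range = rng.lognormal(mean=np.log(0.002), sigma=0.15, size=n)  # plastic strain
ductility = rng.normal(0.3, 0.03, size=n)                             # fatigue ductility

# Coffin-Manson-style cycles to failure: N_f = 0.5 * (ductility/strain)^(1/c)
c = 0.6
cycles_to_failure = 0.5 * (ductility / strain_range) ** (1.0 / c)

design_life = 1_000  # required thermal cycles (assumed)
reliability = np.mean(cycles_to_failure > design_life)
print(f"P(survive {design_life} cycles) = {reliability:.3f}")

# Crude sensitivity: correlation of each sampled input with log-life.
for name, x in [("strain_range", strain_range), ("ductility", ductility)]:
    r = np.corrcoef(x, np.log(cycles_to_failure))[0, 1]
    print(f"sensitivity to {name}: r = {r:+.2f}")
```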
Contributors: Fallah-Adl, Ali (Author) / Tasooji, Amaneh (Thesis advisor) / Krause, Stephen (Committee member) / Alford, Terry (Committee member) / Jiang, Hanqing (Committee member) / Mahajan, Ravi (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
Dealloying-induced stress corrosion cracking is particularly relevant in energy conversion systems (both nuclear and fossil fuel), as many failures in alloys such as austenitic stainless steels and nickel-based systems result directly from dealloying. This study provides evidence of the role of unstable dynamic fracture processes in dealloying-induced stress corrosion cracking of face-centered cubic alloys. Corrosion of such alloys often results in the formation of a brittle nanoporous layer which, we hypothesize, serves to nucleate a crack that, owing to dynamic effects, penetrates into the un-dealloyed parent-phase alloy. Thus, since there is essentially a purely mechanical component of cracking, stress corrosion crack propagation rates can be significantly larger than predicted from electrochemical parameters. The main objective of this work is to examine and test this hypothesis under conditions relevant to stress corrosion cracking. Silver-gold alloys serve as a model system for this study since hydrogen effects can be neglected on a thermodynamic basis, which allows us to focus on a single cracking mechanism. In order to study various aspects of this problem, the dynamic fracture properties of monolithic nanoporous gold (NPG) were examined in air and under electrochemical conditions relevant to stress corrosion cracking. The detailed processes associated with the crack-injection phenomenon were also examined by forming dealloyed nanoporous layers of prescribed properties on un-dealloyed parent-phase structures and measuring crack penetration distances. Dynamic fracture in monolithic NPG and in crack-injection experiments was examined using high-speed (10^6 frames s^-1) digital photography. The tunable set of experimental parameters included the NPG length scale (20-40 nm), the thickness of the dealloyed layer (10-3000 nm), and the electrochemical potential (0.5-1.5 V). The results of crack-injection experiments were characterized using dual-beam focused ion beam/scanning electron microscopy. Together these tools allow us to very accurately examine the detailed structure and composition of dealloyed grain boundaries and compare crack-injection distances to the depth of dealloying. The results of this work should provide a basis for new mathematical modeling of dealloying-induced stress corrosion cracking while providing a sound physical basis for the design of new alloys that may not be susceptible to this form of cracking. Additionally, the results should be of broad interest to researchers interested in the fracture properties of nano-structured materials, and the findings will open up new avenues of research apart from any implications the study may have for stress corrosion cracking.
Contributors: Sun, Shaofeng (Author) / Sieradzki, Karl (Thesis advisor) / Jiang, Hanqing (Committee member) / Peralta, Pedro (Committee member) / Arizona State University (Publisher)
Created: 2012
Description
The focus of this investigation includes three aspects. First, the development of nonlinear reduced order modeling techniques for the prediction of the response of complex structures exhibiting "large" deformations, i.e. a geometrically nonlinear behavior, and modeled within a commercial finite element code. The present investigation builds on a general methodology, successfully validated in recent years on simpler panel structures, by developing a novel identification strategy for the reduced order model parameters that enables the consideration of the large number of modes needed for complex structures, and by extending an automatic strategy for the selection of the basis functions used to accurately represent the displacement field. These novel developments are successfully validated on the nonlinear static and dynamic responses of a 9-bay panel structure modeled within Nastran. In addition, a multi-scale approach based on Component Mode Synthesis methods is explored. Second, an assessment of the predictive capabilities of nonlinear reduced order models for the prediction of the large displacement and stress fields of panels that have a geometric discontinuity; a flat panel with a notch was used for this assessment. It is demonstrated that the reduced order models of both virgin and notched panels provide a close match of the displacement field obtained from full finite element analyses of the notched panel for moderately large static and dynamic responses. With regard to stresses, it is found that the notched-panel reduced order model leads to a close prediction of the stress distribution obtained on the notched panel as computed by the finite element model. Two enrichment techniques, based on superposition of the notch effects on the virgin-panel stress field, are proposed to permit a close prediction of the stress distribution of the notched panel from the reduced order model of the virgin one. A very good prediction of the full finite element results is achieved with both enrichments for static and dynamic responses. Finally, computational challenges associated with the solution of the reduced order model equations are discussed, and two alternatives to reduce the computational time for the solution of these problems are explored.
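The core reduced-order-modeling idea--representing a high-dimensional displacement field in a low-dimensional basis--can be illustrated generically. The sketch below builds a basis from full-model snapshots via proper orthogonal decomposition (SVD) and projects a response onto it; it does not reproduce the thesis's parameter-identification strategy or its Nastran coupling, and the snapshot data is synthetic.

```python
# Generic reduced-order-modeling sketch: build a POD basis from full-model
# displacement snapshots via SVD, then reduce and reconstruct a response.
# Synthetic data; this illustrates the concept, not the thesis's method.
import numpy as np

rng = np.random.default_rng(2)
n_dof, n_snapshots = 5000, 40
snapshots = rng.standard_normal((n_dof, n_snapshots))   # stand-in FE snapshots

# POD basis: left singular vectors of the snapshot matrix.
U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
energy = np.cumsum(s**2) / np.sum(s**2)
m = int(np.searchsorted(energy, 0.999)) + 1             # modes retaining 99.9% energy
basis = U[:, :m]

# Reduce a new full-order displacement field and reconstruct it.
u_full = snapshots @ rng.standard_normal(n_snapshots)
q = basis.T @ u_full                                    # m generalized coordinates
u_rec = basis @ q
rel_err = np.linalg.norm(u_full - u_rec) / np.linalg.norm(u_full)
print(f"{m} modes, relative reconstruction error: {rel_err:.2e}")
```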
Contributors: Perez, Ricardo Angel (Author) / Mignolet, Marc (Thesis advisor) / Oswald, Jay (Committee member) / Spottswood, Stephen (Committee member) / Peralta, Pedro (Committee member) / Jiang, Hanqing (Committee member) / Arizona State University (Publisher)
Created: 2012
Description
"Sensor Decade" has been labeled on the first decade of the 21st century. Similar to the revolution of micro-computer in 1980s, sensor R&D; developed rapidly during the past 20 years. Hard workings were mainly made to minimize the size of devices with optimal the performance. Efforts to develop the small

"Sensor Decade" has been labeled on the first decade of the 21st century. Similar to the revolution of micro-computer in 1980s, sensor R&D; developed rapidly during the past 20 years. Hard workings were mainly made to minimize the size of devices with optimal the performance. Efforts to develop the small size devices are mainly concentrated around Micro-electro-mechanical-system (MEMS) technology. MEMS accelerometers are widely published and used in consumer electronics, such as smart phones, gaming consoles, anti-shake camera and vibration detectors. This study represents liquid-state low frequency micro-accelerometer based on molecular electronic transducer (MET), in which inertial mass is not the only but also the conversion of mechanical movement to electric current signal is the main utilization of the ionic liquid. With silicon-based planar micro-fabrication, the device uses a sub-micron liter electrolyte droplet sealed in oil as the sensing body and a MET electrode arrangement which is the anode-cathode-cathode-anode (ACCA) in parallel as the read-out sensing part. In order to sensing the movement of ionic liquid, an imposed electric potential was applied between the anode and the cathode. The electrode reaction, I_3^-+2e^___3I^-, occurs around the cathode which is reverse at the anodes. Obviously, the current magnitude varies with the concentration of ionic liquid, which will be effected by the movement of liquid droplet as the inertial mass. With such structure, the promising performance of the MET device design is to achieve 10.8 V/G (G=9.81 m/s^2) sensitivity at 20 Hz with the bandwidth from 1 Hz to 50 Hz, and a low noise floor of 100 ug/sqrt(Hz) at 20 Hz.
Contributors: Liang, Mengbing (Author) / Yu, Hongyu (Thesis advisor) / Jiang, Hanqing (Committee member) / Kozicki, Micheal (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
Situations of sensory overload are steadily becoming more frequent as the ubiquity of technology approaches reality--particularly with the advent of socio-communicative smartphone applications and pervasive, high-speed wireless networks. Although the ease of accessing information has improved our communication effectiveness and efficiency, our visual and auditory modalities--the modalities that today's computerized devices and displays largely engage--have become overloaded, creating possibilities for distraction, delay, and high cognitive load, which in turn can lead to a loss of situational awareness and increase the chances of life-threatening situations such as texting while driving. Surprisingly, alternative modalities for information delivery have seen little exploration. Touch, in particular, is a promising candidate, given that the skin is our largest sensory organ, with impressive spatial and temporal acuity. Although some approaches have been proposed for touch-based information delivery, they are not without limitations, including high learning curves, limited applicability, and/or limited expression. This is largely due to the lack of a versatile, comprehensive design theory--specifically, a theory that addresses the design of touch-based building blocks for expandable, efficient, rich, and robust touch languages that are easy to learn and use. Moreover, beyond design, there is a lack of implementation and evaluation theories for such languages. To overcome these limitations, a unified theoretical framework inspired by natural, spoken language is proposed, called Somatic ABC's, for Articulating (designing), Building (developing) and Confirming (evaluating) touch-based languages. To evaluate the usefulness of Somatic ABC's, its design, implementation and evaluation theories were applied to create communication languages for two very different application areas: audio-described movies and motor learning. These applications were chosen as they presented opportunities for complementing communication by offloading information, typically conveyed visually and/or aurally, to the skin. For both studies, it was found that Somatic ABC's aided the design, development and evaluation of rich somatic languages with distinct and natural communication units.
Contributors: McDaniel, Troy Lee (Author) / Panchanathan, Sethuraman (Thesis advisor) / Davulcu, Hasan (Committee member) / Li, Baoxin (Committee member) / Santello, Marco (Committee member) / Arizona State University (Publisher)
Created: 2012
Description
The rheological properties of liquid-liquid interfaces are important in many industrial processes, such as the manufacture of foods, pharmaceuticals, cosmetics, and petroleum products. This dissertation focuses on the study of linear viscoelastic properties at liquid-liquid interfaces by tracking the thermal motion of particles confined at the interfaces. The technique of interfacial microrheology is first developed using one- and two-particle tracking. In one-particle interfacial microrheology, the rheological response at the interface is measured from the motion of individual particles. One-particle interfacial microrheology at polydimethylsiloxane (PDMS) oil-water interfaces depends strongly on the surface chemistry of the tracer particles. In contrast, by tracking the correlated motion of particle pairs, two-particle interfacial microrheology significantly minimizes the effects of tracer-particle surface chemistry and particle size. Two-particle interfacial microrheology is further applied to study the linear viscoelastic properties of immiscible polymer-polymer interfaces. The interfacial loss and storage moduli at PDMS-polyethylene glycol (PEG) interfaces are measured over a wide frequency range. The zero-shear interfacial viscosity, estimated from the Cross model, falls between the bulk viscosities of the two individual polymers. Surprisingly, the interfacial relaxation time is observed to be an order of magnitude larger than that of the PDMS bulk polymers. To explore the fundamental basis of interfacial nanorheology, molecular dynamics (MD) simulations are employed to investigate nanoparticle dynamics. The diffusion of single nanoparticles in pure water and low-viscosity PDMS oils is reasonably consistent with the prediction of the Stokes-Einstein equation. To demonstrate the potential of nanorheology based on the motion of nanoparticles, the shear moduli and viscosities of the bulk phases and interfaces are calculated from single-nanoparticle tracking. Finally, the competitive influences of nanoparticles and surfactants on other interfacial properties, such as interfacial thickness and interfacial tension, are also studied by MD simulations.
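The particle-tracking analysis underlying this kind of microrheology can be sketched in a few lines: estimate a diffusion coefficient from the mean-squared displacement (MSD) of a 2-D trajectory and invert the Stokes-Einstein relation D = kT/(6πηa) for viscosity. The trajectory below is simulated Brownian motion, and all parameter values are illustrative, not the dissertation's data.

```python
# Hedged particle-tracking sketch: estimate D from the MSD of a simulated
# 2-D Brownian trajectory, then invert Stokes-Einstein for viscosity.
# Parameters (tracer radius, viscosity, frame rate) are illustrative.
import numpy as np

kB, T = 1.380649e-23, 298.0           # J/K, K
a = 0.5e-6                            # tracer radius (m), assumed
eta_true = 1.0e-3                     # Pa*s, water-like, assumed
D_true = kB * T / (6 * np.pi * eta_true * a)

dt, n = 0.01, 20_000                  # frame interval (s), number of steps
rng = np.random.default_rng(3)
steps = rng.normal(0.0, np.sqrt(2 * D_true * dt), size=(n, 2))
traj = np.cumsum(steps, axis=0)       # simulated trajectory

lag = 10                              # lag time in frames
disp = traj[lag:] - traj[:-lag]
msd = np.mean(np.sum(disp**2, axis=1))
D_est = msd / (4 * lag * dt)          # 2-D diffusion: MSD = 4*D*tau
eta_est = kB * T / (6 * np.pi * D_est * a)
print(f"D = {D_est:.3e} m^2/s, eta = {eta_est * 1e3:.2f} mPa*s")
```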
Contributors: Song, Yanmei (Author) / Dai, Lenore L (Thesis advisor) / Jiang, Hanqing (Committee member) / Lin, Jerry Y S (Committee member) / Raupp, Gregory B (Committee member) / Sierks, Michael R (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
Learning from high-dimensional biomedical data has attracted a lot of attention recently. High-dimensional biomedical data often suffer from the curse of dimensionality and have imbalanced class distributions. Both of these features, high dimensionality and imbalanced class distributions, are challenging for traditional machine learning methods and may affect model performance. In this thesis, I focus on developing learning methods for high-dimensional, imbalanced biomedical data. In the first part, a sparse canonical correlation analysis (CCA) method is presented. Penalty terms are used to control the sparsity of the projection matrices of CCA. The sparse CCA method is then applied to find patterns among biomedical data sets and labels, or to find patterns among different data sources. In the second part, I discuss several learning problems for imbalanced biomedical data. Note that traditional learning systems are often biased when the biomedical data are imbalanced, so traditional evaluations such as accuracy may be inappropriate in such cases. I therefore discuss several alternative criteria for evaluating learning performance. For imbalanced binary classification problems, I use the undersampling-based classifiers ensemble (UEM) strategy to obtain accurate models for both classes of samples. A small sphere and large margin (SSLM) approach is also presented to detect rare abnormal samples among a large number of subjects. In addition, I apply multiple feature selection and clustering methods to deal with high-dimensional data and data with highly correlated features. Experiments on high-dimensional, imbalanced biomedical data are presented which illustrate the effectiveness and efficiency of my methods.
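A minimal sketch of the undersampling-ensemble idea (not the thesis's UEM implementation) is shown below: each ensemble member trains on all minority-class samples plus an equally sized random subset of the majority class, and the members' predicted probabilities are averaged. It assumes scikit-learn and synthetic data.

```python
# Undersampling-based classifier ensemble for imbalanced binary data:
# each member sees all minority samples plus a balanced random majority
# subset; probabilities are averaged. Synthetic data, in-sample AUC.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)
rng = np.random.default_rng(0)
minority = np.where(y == 1)[0]
majority = np.where(y == 0)[0]

scores = np.zeros(len(y))
n_members = 15
for _ in range(n_members):
    sub = rng.choice(majority, size=len(minority), replace=False)
    idx = np.concatenate([minority, sub])
    clf = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
    scores += clf.predict_proba(X)[:, 1]
scores /= n_members
print("in-sample AUC:", round(roc_auc_score(y, scores), 3))
```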
Contributors: Yang, Tao (Author) / Ye, Jieping (Thesis advisor) / Wang, Yalin (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
With the bloom of machine learning, a massive amount of data has been used in the training of machine learning models. A tremendous amount of this data is user-generated, which allows machine learning models to produce accurate results and personalized services. Nevertheless, I recognize the importance of preserving individuals' privacy by protecting their information during the training process. One privacy attack that affects individuals is the private-attribute inference attack: the process of inferring information that individuals do not explicitly reveal, such as age, gender, location, and occupation. The impact goes beyond the exposure of the information itself, as individuals face potential follow-on risks. Furthermore, some applications need sensitive data to train their models and predict helpful insights, and figuring out how to build privacy-preserving machine learning models will increase the capabilities of these applications. However, improving privacy affects data utility, which leads to a dilemma between privacy and utility. The utility of the data is measured by the quality of the data for different tasks, and this trade-off between privacy and utility must be managed to satisfy both the privacy requirement and the result quality. To achieve more scalable privacy-preserving machine learning models, I investigate the privacy risks that affect individuals' private information in distributed machine learning. Even though distributed machine learning has been driven by privacy concerns, privacy issues that threaten individuals' privacy have been reported in the literature. In this dissertation, I investigate how to measure and protect individuals' privacy in centralized and distributed machine learning models. First, a privacy-preserving text representation learning method is proposed to protect users' privacy that can be revealed from user-generated data. Second, a novel privacy-preserving text classification method for split learning is presented that improves users' privacy and retains high utility by defending against private-attribute inference attacks.
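As a toy illustration of the privacy-utility tension discussed above--and not the dissertation's actual defense--the sketch below perturbs learned user representations with Gaussian noise before release and measures how a simple attribute-inference attack degrades. The embeddings, the leaking attribute, and the threshold attack are all assumptions for illustration.

```python
# Toy privacy-utility demo (not the dissertation's method): add Gaussian
# noise to released user embeddings and watch a simple private-attribute
# inference attack lose accuracy as the noise scale grows.
import numpy as np

rng = np.random.default_rng(4)
n_users, dim = 1000, 64
reps = rng.standard_normal((n_users, dim))       # stand-in user embeddings
private_attr = (reps[:, 0] > 0).astype(int)      # attribute leaking via dim 0

def privatize(z: np.ndarray, sigma: float) -> np.ndarray:
    """Add isotropic Gaussian noise to embeddings before release."""
    return z + rng.normal(0.0, sigma, size=z.shape)

for sigma in (0.0, 1.0, 3.0):
    released = privatize(reps, sigma)
    # Attacker: threshold the leaking coordinate (a toy inference attack).
    guess = (released[:, 0] > 0).astype(int)
    acc = np.mean(guess == private_attr)
    print(f"sigma={sigma:.1f}  attacker accuracy={acc:.2f}")
```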
Contributors: Alnasser, Walaa (Author) / Liu, Huan (Thesis advisor) / Davulcu, Hasan (Committee member) / Shu, Kai (Committee member) / Bao, Tiffany (Committee member) / Arizona State University (Publisher)
Created: 2022