Search Content

The classification of domain concepts in object-oriented systems

Description

The complexity of the systems that software engineers build has continuously grown since the inception of the field. What has not changed is the engineers' mental capacity to operate on about seven distinct pieces of information at a time. The widespread use of UML has led to more abstract software…

The complexity of the systems that software engineers build has continuously grown since the inception of the field. What has not changed is the engineers' mental capacity to operate on about seven distinct pieces of information at a time. The widespread use of UML has led to more abstract software design activities, however the same cannot be said for reverse engineering activities. The introduction of abstraction to reverse engineering will allow the engineer to move farther away from the details of the system, increasing his ability to see the role that domain level concepts play in the system. In this thesis, we present a technique that facilitates filtering of classes from existing systems at the source level based on their relationship to concepts in the domain via a classification method using machine learning. We showed that concepts can be identified using a machine learning classifier based on source level metrics. We developed an Eclipse plugin to assist with the process of manually classifying Java source code, and collecting metrics and classifications into a standard file format. We developed an Eclipse plugin to act as a concept identifier that visually indicates a class as a domain concept or not. We minimized the size of training sets to ensure a useful approach in practice. This allowed us to determine that a training set of 7:5 to 10% is nearly as effective as a training set representing 50% of the system. We showed that random selection is the most consistent and effective means of selecting a training set. We found that KNN is the most consistent performer among the learning algorithms tested. We determined the optimal feature set for this classification problem. We discussed two possible structures besides a one to one mapping of domain knowledge to implementation. We showed that classes representing more than one concept are simply concepts at differing levels of abstraction. We also discussed composite concepts representing a domain concept implemented by more than one class. We showed that these composite concepts are difficult to detect because the problem is NP-complete.

ContributorsCarey, Maurice (Author) / Colbourn, Charles (Thesis advisor) / Collofello, James (Thesis advisor) / Davulcu, Hasan (Committee member) / Sarjoughian, Hessam S. (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created2013

Evaluation and characterization of Silicon MESFETs in low dropout regulators

Description

The partially-depleted (PD) silicon Metal Semiconductor Field Effect Transistor (MESFET) is becoming more and more attractive for analog and RF applications due to its high breakdown voltage. Compared to conventional CMOS high voltage transistors, the silicon MESFET can be fabricated in commercial standard Silicon-on-Insulator (SOI) CMOS foundries without any change…

The partially-depleted (PD) silicon Metal Semiconductor Field Effect Transistor (MESFET) is becoming more and more attractive for analog and RF applications due to its high breakdown voltage. Compared to conventional CMOS high voltage transistors, the silicon MESFET can be fabricated in commercial standard Silicon-on-Insulator (SOI) CMOS foundries without any change to the process. The transition frequency of the device is demonstrated to be 45GHz, which makes the MESFET suitable for applications in high power RF power amplifier designs. Also, high breakdown voltage and low turn-on resistance make it the ideal choice for switches in the switching regulator designs. One of the anticipated applications of the MESFET is for the pass device for a low dropout linear regulator. Conventional NMOS and PMOS linear regulators suffer from high dropout voltage, low bandwidth and poor stability issues. In contrast, the N-MESFET pass transistor can provide an ultra-low dropout voltage and high bandwidth without the need for an external compensation capacitor to ensure stability. In this thesis, the design theory and problems of the conventional linear regulators are discussed. N-MESFET low dropout regulators are evaluated and characterized. The error amplifier used a folded cascode architecture with gain boosting. The source follower topology is utilized as the buffer to sink the gate leakage current from the MESFET. A shunt-feedback transistor is added to reduce the output impedance and provide the current adaptively. Measurement results show that the dropout voltage is less than 150 mV for a 1A load current at 1.8V output. Radiation measurements were done for discrete MESFET and fully integrated LDO regulators, which demonstrate their radiation tolerance ability for aerospace applications.

ContributorsChen, Bo (Author) / Thornton, Trevor (Thesis advisor) / Bakkaloglu, Bertan (Committee member) / Goryll, Michael (Committee member) / Arizona State University (Publisher)

Created2013

Carrier lifetime measurement for characterization of ultraclean thin p/p+ silicon epitaxial layers

Description

Carrier lifetime is one of the few parameters which can give information about the low defect densities in today's semiconductors. In principle there is no lower limit to the defect density determined by lifetime measurements. No other technique can easily detect defect densities as low as 10-9 - 10-10 cm-3…

Carrier lifetime is one of the few parameters which can give information about the low defect densities in today's semiconductors. In principle there is no lower limit to the defect density determined by lifetime measurements. No other technique can easily detect defect densities as low as 10-9 - 10-10 cm-3 in a simple, contactless room temperature measurement. However in practice, recombination lifetime τr measurements such as photoconductance decay (PCD) and surface photovoltage (SPV) that are widely used for characterization of bulk wafers face serious limitations when applied to thin epitaxial layers, where the layer thickness is smaller than the minority carrier diffusion length Ln. Other methods such as microwave photoconductance decay (µ-PCD), photoluminescence (PL), and frequency-dependent SPV, where the generated excess carriers are confined to the epitaxial layer width by using short excitation wavelengths, require complicated configuration and extensive surface passivation processes that make them time-consuming and not suitable for process screening purposes. Generation lifetime τg, typically measured with pulsed MOS capacitors (MOS-C) as test structures, has been shown to be an eminently suitable technique for characterization of thin epitaxial layers. It is for these reasons that the IC community, largely concerned with unipolar MOS devices, uses lifetime measurements as a "process cleanliness monitor." However when dealing with ultraclean epitaxial wafers, the classic MOS-C technique measures an effective generation lifetime τg eff which is dominated by the surface generation and hence cannot be used for screening impurity densities. I have developed a modified pulsed MOS technique for measuring generation lifetime in ultraclean thin p/p+ epitaxial layers which can be used to detect metallic impurities with densities as low as 10-10 cm-3. The widely used classic version has been shown to be unable to effectively detect such low impurity densities due to the domination of surface generation; whereas, the modified version can be used suitably as a metallic impurity density monitoring tool for such cases.

ContributorsElhami Khorasani, Arash (Author) / Alford, Terry (Thesis advisor) / Goryll, Michael (Committee member) / Bertoni, Mariana (Committee member) / Arizona State University (Publisher)

Created2013

Connecting users with similar interests for group understanding

Description

In most social networking websites, users are allowed to perform interactive activities. One of the fundamental features that these sites provide is to connecting with users of their kind. On one hand, this activity makes online connections visible and tangible; on the other hand, it enables the exploration of our…

In most social networking websites, users are allowed to perform interactive activities. One of the fundamental features that these sites provide is to connecting with users of their kind. On one hand, this activity makes online connections visible and tangible; on the other hand, it enables the exploration of our connections and the expansion of our social networks easier. The aggregation of people who share common interests forms social groups, which are fundamental parts of our social lives. Social behavioral analysis at a group level is an active research area and attracts many interests from the industry. Challenges of my work mainly arise from the scale and complexity of user generated behavioral data. The multiple types of interactions, highly dynamic nature of social networking and the volatile user behavior suggest that these data are complex and big in general. Effective and efficient approaches are required to analyze and interpret such data. My work provide effective channels to help connect the like-minded and, furthermore, understand user behavior at a group level. The contributions of this dissertation are in threefold: (1) proposing novel representation of collective tagging knowledge via tag networks; (2) proposing the new information spreader identification problem in egocentric soical networks; (3) defining group profiling as a systematic approach to understanding social groups. In sum, the research proposes novel concepts and approaches for connecting the like-minded, enables the understanding of user groups, and exposes interesting research opportunities.

ContributorsWang, Xufei (Author) / Liu, Huan (Thesis advisor) / Kambhampati, Subbarao (Committee member) / Sundaram, Hari (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created2013

Advancing biomedical named entity recognition with multivariate feature selection and semantically motivated features

Description

Automating aspects of biocuration through biomedical information extraction could significantly impact biomedical research by enabling greater biocuration throughput and improving the feasibility of a wider scope. An important step in biomedical information extraction systems is named entity recognition (NER), where mentions of entities such as proteins and diseases are located…

Automating aspects of biocuration through biomedical information extraction could significantly impact biomedical research by enabling greater biocuration throughput and improving the feasibility of a wider scope. An important step in biomedical information extraction systems is named entity recognition (NER), where mentions of entities such as proteins and diseases are located within natural-language text and their semantic type is determined. This step is critical for later tasks in an information extraction pipeline, including normalization and relationship extraction. BANNER is a benchmark biomedical NER system using linear-chain conditional random fields and the rich feature set approach. A case study with BANNER locating genes and proteins in biomedical literature is described. The first corpus for disease NER adequate for use as training data is introduced, and employed in a case study of disease NER. The first corpus locating adverse drug reactions (ADRs) in user posts to a health-related social website is also described, and a system to locate and identify ADRs in social media text is created and evaluated. The rich feature set approach to creating NER feature sets is argued to be subject to diminishing returns, implying that additional improvements may require more sophisticated methods for creating the feature set. This motivates the first application of multivariate feature selection with filters and false discovery rate analysis to biomedical NER, resulting in a feature set at least 3 orders of magnitude smaller than the set created by the rich feature set approach. Finally, two novel approaches to NER by modeling the semantics of token sequences are introduced. The first method focuses on the sequence content by using language models to determine whether a sequence resembles entries in a lexicon of entity names or text from an unlabeled corpus more closely. The second method models the distributional semantics of token sequences, determining the similarity between a potential mention and the token sequences from the training data by analyzing the contexts where each sequence appears in a large unlabeled corpus. The second method is shown to improve the performance of BANNER on multiple data sets.

ContributorsLeaman, James Robert (Author) / Gonzalez, Graciela (Thesis advisor) / Baral, Chitta (Thesis advisor) / Cohen, Kevin B (Committee member) / Liu, Huan (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created2013

Batch mode active learning for multimedia pattern recognition

Description

The rapid escalation of technology and the widespread emergence of modern technological equipments have resulted in the generation of humongous amounts of digital data (in the form of images, videos and text). This has expanded the possibility of solving real world problems using computational learning frameworks. However, while gathering a…

The rapid escalation of technology and the widespread emergence of modern technological equipments have resulted in the generation of humongous amounts of digital data (in the form of images, videos and text). This has expanded the possibility of solving real world problems using computational learning frameworks. However, while gathering a large amount of data is cheap and easy, annotating them with class labels is an expensive process in terms of time, labor and human expertise. This has paved the way for research in the field of active learning. Such algorithms automatically select the salient and exemplar instances from large quantities of unlabeled data and are effective in reducing human labeling effort in inducing classification models. To utilize the possible presence of multiple labeling agents, there have been attempts towards a batch mode form of active learning, where a batch of data instances is selected simultaneously for manual annotation. This dissertation is aimed at the development of novel batch mode active learning algorithms to reduce manual effort in training classification models in real world multimedia pattern recognition applications. Four major contributions are proposed in this work: $(i)$ a framework for dynamic batch mode active learning, where the batch size and the specific data instances to be queried are selected adaptively through a single formulation, based on the complexity of the data stream in question, $(ii)$ a batch mode active learning strategy for fuzzy label classification problems, where there is an inherent imprecision and vagueness in the class label definitions, $(iii)$ batch mode active learning algorithms based on convex relaxations of an NP-hard integer quadratic programming (IQP) problem, with guaranteed bounds on the solution quality and $(iv)$ an active matrix completion algorithm and its application to solve several variants of the active learning problem (transductive active learning, multi-label active learning, active feature acquisition and active learning for regression). These contributions are validated on the face recognition and facial expression recognition problems (which are commonly encountered in real world applications like robotics, security and assistive technology for the blind and the visually impaired) and also on collaborative filtering applications like movie recommendation.

ContributorsChakraborty, Shayok (Author) / Panchanathan, Sethuraman (Thesis advisor) / Balasubramanian, Vineeth N. (Committee member) / Li, Baoxin (Committee member) / Mittelmann, Hans (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created2013

Low power, high throughput continuous flow PCR instruments for environmental applications

Description

Continuous monitoring in the adequate temporal and spatial scale is necessary for a better understanding of environmental variations. But field deployments of molecular biological analysis platforms in that scale are currently hindered because of issues with power, throughput and automation. Currently, such analysis is performed by the collection of large…

Continuous monitoring in the adequate temporal and spatial scale is necessary for a better understanding of environmental variations. But field deployments of molecular biological analysis platforms in that scale are currently hindered because of issues with power, throughput and automation. Currently, such analysis is performed by the collection of large sample volumes from over a wide area and transporting them to laboratory testing facilities, which fail to provide any real-time information. This dissertation evaluates the systems currently utilized for in-situ field analyses and the issues hampering the successful deployment of such bioanalytial instruments for environmental applications. The design and development of high throughput, low power, and autonomous Polymerase Chain Reaction (PCR) instruments, amenable for portable field operations capable of providing quantitative results is presented here as part of this dissertation. A number of novel innovations have been reported here as part of this work in microfluidic design, PCR thermocycler design, optical design and systems integration. Emulsion microfluidics in conjunction with fluorinated oils and Teflon tubing have been used for the fluidic module that reduces cross-contamination eliminating the need for disposable components or constant cleaning. A cylindrical heater has been designed with the tubing wrapped around fixed temperature zones enabling continuous operation. Fluorescence excitation and detection have been achieved by using a light emitting diode (LED) as the excitation source and a photomultiplier tube (PMT) as the detector. Real-time quantitative PCR results were obtained by using multi-channel fluorescence excitation and detection using LED, optical fibers and a 64-channel multi-anode PMT for measuring continuous real-time fluorescence. The instrument was evaluated by comparing the results obtained with those obtained from a commercial instrument and found to be comparable. To further improve the design and enhance its field portability, this dissertation also presents a framework for the instrumentation necessary for a portable digital PCR platform to achieve higher throughputs with lower power. Both systems were designed such that it can easily couple with any upstream platform capable of providing nucleic acid for analysis using standard fluidic connections. Consequently, these instruments can be used not only in environmental applications, but portable diagnostics applications as well.

ContributorsRay, Tathagata (Author) / Youngbull, Cody (Thesis advisor) / Goryll, Michael (Thesis advisor) / Blain Christen, Jennifer (Committee member) / Yu, Hongyu (Committee member) / Arizona State University (Publisher)

Created2013

Systems integration for biosensing: design, fabrication, and packaging of microelectronics, sensors, and microfluidics

Description

Over the past fifty years, the development of sensors for biological applications has increased dramatically. This rapid growth can be attributed in part to the reduction in feature size, which the electronics industry has pioneered over the same period. The decrease in feature size has led to the production of…

Over the past fifty years, the development of sensors for biological applications has increased dramatically. This rapid growth can be attributed in part to the reduction in feature size, which the electronics industry has pioneered over the same period. The decrease in feature size has led to the production of microscale sensors that are used for sensing applications, ranging from whole-body monitoring down to molecular sensing. Unfortunately, sensors are often developed without regard to how they will be integrated into biological systems. The complexities of integration are underappreciated. Integration involves more than simply making electrical connections. Interfacing microscale sensors with biological environments requires numerous considerations with respect to the creation of compatible packaging, the management of biological reagents, and the act of combining technologies with different dimensions and material properties. Recent advances in microfluidics, especially the proliferation of soft lithography manufacturing methods, have established the groundwork for creating systems that may solve many of the problems inherent to sensor-fluidic interaction. The adaptation of microelectronics manufacturing methods, such as Complementary Metal-Oxide-Semiconductor (CMOS) and Microelectromechanical Systems (MEMS) processes, allows the creation of a complete biological sensing system with integrated sensors and readout circuits. Combining these technologies is an obstacle to forming complete sensor systems. This dissertation presents new approaches for the design, fabrication, and integration of microscale sensors and microelectronics with microfluidics. The work addresses specific challenges, such as combining commercial manufacturing processes into biological systems and developing microscale sensors in these processes. This work is exemplified through a feedback-controlled microfluidic pH system to demonstrate the integration capabilities of microscale sensors for autonomous microenvironment control.

ContributorsWelch, David (Author) / Blain Christen, Jennifer (Thesis advisor) / Muthuswamy, Jitendran (Committee member) / Frakes, David (Committee member) / LaBelle, Jeffrey (Committee member) / Goryll, Michael (Committee member) / Arizona State University (Publisher)

Created2012

High-quality extended-wavelength materials for optoelectronic applications

Description

Photodetectors in the 1.7 to 4.0 μm range are being commercially developed on InP substrates to meet the needs of longer wavelength applications such as thermal and medical sensing. Currently, these devices utilize high indium content metamorphic Ga1-xInxAs (x > 0.53) layers to extend the wavelength range beyond the 1.7…

Photodetectors in the 1.7 to 4.0 μm range are being commercially developed on InP substrates to meet the needs of longer wavelength applications such as thermal and medical sensing. Currently, these devices utilize high indium content metamorphic Ga1-xInxAs (x > 0.53) layers to extend the wavelength range beyond the 1.7 μm achievable using lattice matched GaInAs. The large lattice mismatch required to reach the extended wavelengths results in photodetector materials that contain a large number of misfit dislocations. The low quality of these materials results in a large nonradiative Shockley Read Hall generation/recombination rate that is manifested as an undesirable large thermal noise level in these photodetectors. This work focuses on utilizing the different band structure engineering methods to design more efficient devices on InP substrates. One prospective way to improve photodetector performance at the extended wavelengths is to utilize lattice matched GaInAs/GaAsSb structures that have a type-II band alignment, where the ground state transition energy of the superlattice is smaller than the bandgap of either constituent material. Over the extended wavelength range of 2 to 3 μm this superlattice structure has an optimal period thickness of 3.4 to 5.2 nm and a wavefunction overlap of 0.8 to 0.4, respectively. In using a type-II superlattice to extend the cutoff wavelength there is a tradeoff between the wavelength reached and the electron-hole wavefunction overlap realized, and hence absorption coefficient achieved. This tradeoff and the subsequent reduction in performance can be overcome by two methods: adding bismuth to this type-II material system; applying strain on both layers in the system to attain strain-balanced condition. These allow the valance band alignment and hence the wavefunction overlap to be tuned independently of the wavelength cutoff. Adding 3% bismuth to the GaInAs constituent material, the resulting lattice matched Ga0.516In0.484As0.970Bi0.030/GaAs0.511Sb0.489superlattice realizes a 50% larger absorption coefficient. While as, similar results can be achieved with strain-balanced condition with strain limited to 1.9% on either layer. The optimal design rules derived from the different possibilities make it feasible to extract superlattice period thickness with the best absorption coefficient for any cutoff wavelength in the range.

ContributorsSharma, Ankur R (Author) / Johnson, Shane (Thesis advisor) / Goryll, Michael (Committee member) / Roedel, Ronald (Committee member) / Arizona State University (Publisher)

Created2013

Drosophila stage annotation using sparse learning method

Description

Drosophila melanogaster, as an important model organism, is used to explore the mechanism which governs cell differentiation and embryonic development. Understanding the mechanism will help to reveal the effects of genes on other species or even human beings. Currently, digital camera techniques make high quality Drosophila gene expression imaging possible.…

Drosophila melanogaster, as an important model organism, is used to explore the mechanism which governs cell differentiation and embryonic development. Understanding the mechanism will help to reveal the effects of genes on other species or even human beings. Currently, digital camera techniques make high quality Drosophila gene expression imaging possible. On the other hand, due to the advances in biology, gene expression images which can reveal spatiotemporal patterns are generated in a high-throughput pace. Thus, an automated and efficient system that can analyze gene expression will become a necessary tool for investigating the gene functions, interactions and developmental processes. One investigation method is to compare the expression patterns of different developmental stages. Recently, however, the expression patterns are manually annotated with rough stage ranges. The work of annotation requires professional knowledge from experienced biologists. Hence, how to transfer the domain knowledge in biology into an automated system which can automatically annotate the patterns provides a challenging problem for computer scientists. In this thesis, the problem of stage annotation for Drosophila embryo is modeled in the machine learning framework. Three sparse learning algorithms and one ensemble algorithm are used to attack the problem. The sparse algorithms are Lasso, group Lasso and sparse group Lasso. The ensemble algorithm is based on a voting method. Besides that the proposed algorithms can annotate the patterns to stages instead of stage ranges with high accuracy; the decimal stage annotation algorithm presents a novel way to annotate the patterns to decimal stages. In addition, some analysis on the algorithm performance are made and corresponding explanations are given. Finally, with the proposed system, all the lateral view BDGP and FlyFish images are annotated and several interesting applications of decimal stage value are revealed.

ContributorsPan, Cheng (Author) / Ye, Jieping (Thesis advisor) / Li, Baoxin (Committee member) / Farin, Gerald (Committee member) / Arizona State University (Publisher)

Created2012