Matching Items (1,294)

Description

Automating aspects of biocuration through biomedical information extraction could significantly impact biomedical research by enabling greater biocuration throughput and improving the feasibility of a wider scope. An important step in biomedical information extraction systems is named entity recognition (NER), where mentions of entities such as proteins and diseases are located within natural-language text and their semantic type is determined. This step is critical for later tasks in an information extraction pipeline, including normalization and relationship extraction. BANNER is a benchmark biomedical NER system using linear-chain conditional random fields and the rich feature set approach. A case study with BANNER locating genes and proteins in biomedical literature is described. The first corpus for disease NER adequate for use as training data is introduced, and employed in a case study of disease NER. The first corpus locating adverse drug reactions (ADRs) in user posts to a health-related social website is also described, and a system to locate and identify ADRs in social media text is created and evaluated. The rich feature set approach to creating NER feature sets is argued to be subject to diminishing returns, implying that additional improvements may require more sophisticated methods for creating the feature set. This motivates the first application of multivariate feature selection with filters and false discovery rate analysis to biomedical NER, resulting in a feature set at least 3 orders of magnitude smaller than the set created by the rich feature set approach. Finally, two novel approaches to NER by modeling the semantics of token sequences are introduced. The first method focuses on the sequence content by using language models to determine whether a sequence resembles entries in a lexicon of entity names or text from an unlabeled corpus more closely. The second method models the distributional semantics of token sequences, determining the similarity between a potential mention and the token sequences from the training data by analyzing the contexts where each sequence appears in a large unlabeled corpus. The second method is shown to improve the performance of BANNER on multiple data sets.
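The distributional-semantics method in the last paragraph lends itself to a brief illustration. The following Python sketch (not BANNER's actual code; the corpus, mention strings, and window size are hypothetical) compares a candidate mention to a known training mention by the contexts in which each appears in an unlabeled corpus:

# Illustrative sketch of context-based similarity between token sequences.
from collections import Counter
from math import sqrt

def context_vector(sequence, corpus_sentences, window=2):
    """Count the words appearing within `window` tokens of `sequence`."""
    seq = sequence.split()
    counts = Counter()
    for sent in corpus_sentences:
        toks = sent.split()
        for i in range(len(toks) - len(seq) + 1):
            if toks[i:i + len(seq)] == seq:
                left = toks[max(0, i - window):i]
                right = toks[i + len(seq):i + len(seq) + window]
                counts.update(left + right)
    return counts

def cosine(a, b):
    num = sum(a[k] * b[k] for k in a if k in b)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

# Hypothetical data: a tiny "unlabeled corpus" and one known protein mention.
corpus = ["the BRCA1 protein regulates repair",
          "mutations in the p53 protein cause disease",
          "the new phone regulates nothing"]
known_mention = "BRCA1"
candidate = "p53"
sim = cosine(context_vector(known_mention, corpus), context_vector(candidate, corpus))
print(f"distributional similarity: {sim:.2f}")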
Contributors: Leaman, James Robert (Author) / Gonzalez, Graciela (Thesis advisor) / Baral, Chitta (Thesis advisor) / Cohen, Kevin B (Committee member) / Liu, Huan (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)
Created: 2013
Description

Through manipulation of the adaptable opportunities available within a given environment, individuals become active participants in managing their personal comfort requirements, exercising control over their comfort without the assistance of mechanical heating and cooling systems. Similarly, continuous manipulation of a building skin's form, insulation, porosity, and transmissivity exerts control over the energy exchanged between indoor and outdoor environments. This research uses four adaptive response variables in a modified software algorithm to explore an adaptive building skin's potential for reacting to environmental stimuli with the purpose of minimizing energy use without sacrificing occupant comfort. Results illustrate that significant energy savings can be realized with adaptive envelopes over static building envelopes even under extreme summer and winter climate conditions; that the magnitude of these savings is dependent on climate and orientation; and that occupant thermal comfort can be improved consistently over comfort levels achieved by optimized static building envelopes. The resulting adaptive envelope's unique climate-specific behavior could inform designers in creating an intelligent kinetic aesthetic that helps facilitate adaptability and resiliency in architecture.
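As a rough illustration of the adaptive-response idea, the Python sketch below re-tunes three envelope qualities each timestep from outdoor and indoor conditions. The variable names, value ranges, and decision rules are assumptions for illustration only, not the study's algorithm or simulation software:

# Hypothetical control-loop sketch: an envelope that adjusts insulation,
# porosity, and transmissivity in response to current conditions.
def adapt_envelope(outdoor_temp_c, solar_w_m2, indoor_temp_c,
                   comfort=(21.0, 26.0)):
    low, high = comfort
    skin = {"insulation": 0.5, "porosity": 0.1, "transmissivity": 0.5}
    if indoor_temp_c > high:                      # too warm indoors
        skin["transmissivity"] = 0.1              # shade out solar gains
        skin["porosity"] = 0.8 if outdoor_temp_c < indoor_temp_c else 0.0
        skin["insulation"] = 0.2 if outdoor_temp_c < indoor_temp_c else 0.9
    elif indoor_temp_c < low:                     # too cool indoors
        skin["transmissivity"] = 0.9 if solar_w_m2 > 100 else 0.3
        skin["porosity"] = 0.0                    # seal the skin
        skin["insulation"] = 0.9                  # retain internal gains
    return skin

# Example: a hot summer afternoon with an overheated interior.
print(adapt_envelope(outdoor_temp_c=38.0, solar_w_m2=800.0, indoor_temp_c=28.0))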
Contributors: Erickson, James (Author) / Bryan, Harvey (Thesis advisor) / Addison, Marlin (Committee member) / Kroelinger, Michael D. (Committee member) / Reddy, T. Agami (Committee member) / Arizona State University (Publisher)
Created: 2013
Description

Residential energy consumption accounts for 22% of the total energy use in the United States. Consumers' perceptions of energy usage and conservation are often inaccurate, yet a growing number of individuals are seeking ways to use energy more wisely. Behavioral change with respect to energy use, prompted by energy-use feedback, may therefore be important in reducing home energy consumption. Past research and pilot studies have reported that real-time energy information feedback delivered via technology, combined with feedback interventions, can produce up to 20 percent reductions in residential energy consumption. There are, however, large differences in the estimates of the effect of these different types of feedback on energy use. As part of the Energize Phoenix Program (a U.S. Department of Energy funded program), Arizona State University conducted a Dashboard Study to estimate the impact of real-time home-energy displays, in conjunction with other feedback interventions, on residential energy consumption in Phoenix, while also creating awareness and encouraging households to reduce energy consumption. The research evaluates the effectiveness of these feedback initiatives. In a six-month field experiment, a selected number of low-income multi-family apartments in Phoenix were divided into three feedback-intervention groups: the first group received education and information about residential energy use, the second received the same education and was equipped with an in-home feedback device, and the third received the same education, the feedback device, and added budgeting information. Results at the end of the six months did not lend consistent support to findings from the literature and past pilot studies. The data revealed a statistically insignificant reduction in energy consumption for the experiment group overall and inconsistent results for individual households when compared to a randomly selected control sample. However, according to participant survey results, the study was effective in fostering awareness among participating residents of their own patterns of residential electricity consumption and in improving their understanding of savings related to residential energy use.
Contributors: Rungta, Shaily (Author) / Bryan, Harvey (Thesis advisor) / Reddy, Agami (Committee member) / Webster, Aleksasha (Committee member) / Arizona State University (Publisher)
Created: 2013
Description

It is commonly known that High Performance Computing (HPC) systems are most frequently used by multiple users for batch-job, parallel computations. Less well known, however, are the numerous HPC systems servicing data so sensitive that administrators enforce either a) sequential job processing - only one job at a time on the entire system, or b) physical separation - devoting an entire HPC system to a single project until recommissioned. The driving forces behind this type of security are numerous but share the common origin of data so sensitive that measures above and beyond industry standard are used to ensure information security. This paper presents a network security solution that provides information security above and beyond industry standard while still enabling multi-user computation on the system. This paper's main contribution is a mechanism designed to enforce high-level time division multiplexing of network access (Time Division Multiple Access, or TDMA) according to security groups. By dividing network access into time windows, interactions between applications over the network can be prevented in an easily verifiable way.
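A minimal sketch of the time-windowed access idea, in Python, is given below. The slot length, group names, and grant/revoke hooks are hypothetical stand-ins; an actual deployment would reconfigure switches or firewall rules rather than print messages:

# Sketch: grant the network to exactly one security group per TDMA slot.
import time

SLOT_SECONDS = 60
GROUPS = ["group-a", "group-b", "group-c"]   # hypothetical security groups

def active_group(now=None):
    """Return the single group whose TDMA window contains the current time."""
    now = time.time() if now is None else now
    slot = int(now // SLOT_SECONDS) % len(GROUPS)
    return GROUPS[slot]

def enforce(group, grant, revoke):
    """Grant network access to `group` and revoke it from all others."""
    for g in GROUPS:
        (grant if g == group else revoke)(g)

if __name__ == "__main__":
    grant = lambda g: print(f"enable network for {g}")
    revoke = lambda g: print(f"disable network for {g}")
    enforce(active_group(), grant, revoke)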
Contributors: Ferguson, Joshua (Author) / Gupta, Sandeep Ks (Thesis advisor) / Varsamopoulos, Georgios (Committee member) / Ball, George (Committee member) / Arizona State University (Publisher)
Created: 2013
Description

In recent years, machine learning and data mining technologies have received growing attention in several areas such as recommendation systems, natural language processing, speech and handwriting recognition, image processing, and the biomedical domain. Many of these applications, which deal with physiological and biomedical data, require person-specific or person-adaptive systems. The greatest challenge in developing such systems is the subject-dependent data variation, or subject-based variability, in physiological and biomedical data, which leads to differences in data distributions, making the task of modeling these data with traditional machine learning algorithms complex and challenging. As a result, despite the wide application of machine learning, efficient deployment of its principles to model real-world data is still a challenge. This dissertation addresses the problem of subject-based variability in physiological and biomedical data and proposes person-adaptive prediction models based on novel transfer and active learning algorithms, an emerging field in machine learning. One of the significant contributions of this dissertation is a person-adaptive method, based on a new multi-source transfer learning algorithm, for early detection of muscle fatigue using Surface Electromyogram signals. This dissertation also proposes a subject-independent algorithm for grading the progression of muscle fatigue on a scale from 0 to 1 in a test subject, during isometric or dynamic contractions, in real time. Besides subject-based variability, biomedical image data also vary due to differences in imaging techniques, leading to distribution differences between image databases. Hence a classifier learned on one database may perform poorly on another. Another significant contribution of this dissertation has been the design and development of an efficient biomedical image data annotation framework, based on a novel combination of transfer learning and a new batch-mode active learning method, capable of addressing the distribution differences across databases. The methodologies developed in this dissertation are relevant and applicable to a large set of computing problems where there is high variation of data between subjects or sources, such as face detection, pose detection, and speech recognition. From a broader perspective, these frameworks can be viewed as a first step towards the design of automated adaptive systems for real-world data.
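To make the batch-mode active learning component concrete, the following Python sketch runs a generic uncertainty-sampling loop on synthetic data. It is only an illustration of the general technique, not the dissertation's algorithm (which additionally incorporates transfer learning across databases):

# Generic batch-mode active learning by uncertainty sampling (synthetic data).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

labeled = list(range(20))                  # small initial labeled pool
unlabeled = list(range(20, 500))
BATCH = 10

for round_ in range(5):
    clf = LogisticRegression().fit(X[labeled], y[labeled])
    proba = clf.predict_proba(X[unlabeled])
    # Uncertainty sampling: smallest margin between the two classes.
    margin = np.abs(proba[:, 1] - proba[:, 0])
    query = np.argsort(margin)[:BATCH]
    picked = [unlabeled[i] for i in query]
    labeled += picked                      # oracle "labels" the queried batch
    unlabeled = [i for i in unlabeled if i not in picked]
    print(f"round {round_}: accuracy {clf.score(X, y):.3f}")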
Contributors: Chattopadhyay, Rita (Author) / Panchanathan, Sethuraman (Thesis advisor) / Ye, Jieping (Thesis advisor) / Li, Baoxin (Committee member) / Santello, Marco (Committee member) / Arizona State University (Publisher)
Created: 2013
Description

Biological systems are complex in many dimensions as endless transportation and communication networks all function simultaneously. Our ability to intervene within both healthy and diseased systems is tied directly to our ability to understand and model core functionality. The progress in increasingly accurate and thorough high-throughput measurement technologies has provided a deluge of data from which we may attempt to infer a representation of the true genetic regulatory system. A gene regulatory network model, if accurate enough, may allow us to perform hypothesis testing in the form of computational experiments. Of great importance to modeling accuracy is the acknowledgment of biological contexts within the models -- i.e. recognizing the heterogeneous nature of the true biological system and the data it generates. This marriage of engineering, mathematics and computer science with systems biology creates a cycle of progress between computer simulation and lab experimentation, rapidly translating interventions and treatments for patients from the bench to the bedside. This dissertation will first discuss the landscape for modeling the biological system, explore the identification of targets for intervention in Boolean network models of biological interactions, and explore context specificity both in new graphical depictions of models embodying context-specific genomic regulation and in novel analysis approaches designed to reveal embedded contextual information. Overall, the dissertation will explore a spectrum of biological modeling with a goal towards therapeutic intervention, with both formal and informal notions of biological context, in such a way that will enable future work to have an even greater impact in terms of direct patient benefit on an individualized level.
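As a small illustration of the Boolean network models mentioned above, the Python sketch below defines three hypothetical genes with hand-written update rules and enumerates the attractor reached from every start state; identifying such steady behaviors is the kind of analysis that intervention targeting builds on. The genes and rules are invented for the example:

# Toy synchronous Boolean network: find the attractor from each start state.
from itertools import product

def step(state):
    a, b, c = state
    return (b and not c,     # A is activated by B, repressed by C
            a,               # B simply follows A
            a and b)         # C needs both A and B

def attractor(state, max_steps=20):
    seen = []
    for _ in range(max_steps):
        if state in seen:
            i = seen.index(state)
            return tuple(seen[i:])       # the repeating cycle
        seen.append(state)
        state = step(state)
    return ()

for start in product((False, True), repeat=3):
    print(start, "->", attractor(start))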
Contributors: Verdicchio, Michael (Author) / Kim, Seungchan (Thesis advisor) / Baral, Chitta (Committee member) / Stolovitzky, Gustavo (Committee member) / Collofello, James (Committee member) / Arizona State University (Publisher)
Created: 2013
Description

In recent years we have witnessed a shift towards multi-processor system-on-chips (MPSoCs) to address the demands of embedded devices (such as cell phones, GPS devices, luxury car features, etc.). Highly optimized MPSoCs are well-suited to tackle the complex application demands desired by the end user. These MPSoCs incorporate a constellation of heterogeneous processing elements (PEs) (general-purpose PEs and application-specific integrated circuits (ASICs)). A typical MPSoC will be composed of an application processor, such as an ARM Cortex-A9 with a cache-coherent memory hierarchy, and several application sub-systems. Each of these sub-systems is composed of highly optimized instruction processors, graphics/DSP processors, and custom hardware accelerators. Typically, these sub-systems utilize scratchpad memories (SPM) rather than support cache coherency. The overall architecture is an integration of the various sub-systems through a high-bandwidth system-level interconnect (such as a Network-on-Chip (NoC)). The shift to MPSoCs has been fueled by three major factors: demand for high performance, the use of component libraries, and short design turnaround time. As customers continue to desire more and more complex applications on their embedded devices, the performance demand for these devices continues to increase. Designers have turned to MPSoCs to address this demand. By using pre-made IP libraries, designers can quickly piece together an MPSoC that will meet the application demands of the end user with minimal time spent designing new hardware. Additionally, the use of MPSoCs allows designers to generate new devices very quickly, thus reducing the time to market. In this work, a complete MPSoC synthesis design flow is presented. We first present a technique \cite{leary1_intro} to address the synthesis of the interconnect architecture (particularly the Network-on-Chip (NoC)). We then address the synthesis of the memory architecture of an MPSoC sub-system \cite{leary2_intro}. Lastly, we present a co-synthesis technique to generate the functional and memory architectures simultaneously. The validity and quality of each synthesis technique is demonstrated through extensive experimentation.
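To give a flavor of the interconnect-synthesis problem, the Python sketch below places four communicating cores onto a tiny 2x2 mesh NoC so that heavy traffic travels few hops. The core names, traffic matrix, and exhaustive search are illustrative assumptions only; the dissertation's NoC synthesis technique is far more sophisticated:

# Toy NoC placement: minimize hop-weighted traffic on a 2x2 mesh.
from itertools import permutations

TILES = [(0, 0), (0, 1), (1, 0), (1, 1)]             # 2x2 mesh coordinates
# Hypothetical traffic matrix: (src core, dst core) -> bandwidth demand
TRAFFIC = {("cpu", "dsp"): 80, ("cpu", "mem"): 60,
           ("dsp", "mem"): 40, ("accel", "mem"): 20}
CORES = ["cpu", "dsp", "mem", "accel"]

def hops(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])        # XY-routing distance

def cost(placement):
    return sum(bw * hops(placement[s], placement[d])
               for (s, d), bw in TRAFFIC.items())

best = min((dict(zip(CORES, p)) for p in permutations(TILES)), key=cost)
print("placement:", best, "cost:", cost(best))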
Contributors: Leary, Glenn (Author) / Chatha, Karamvir S (Thesis advisor) / Vrudhula, Sarma (Committee member) / Shrivastava, Aviral (Committee member) / Beraha, Rudy (Committee member) / Arizona State University (Publisher)
Created: 2013
Description

This thesis studies three different types of anhydrous proton-conducting electrolytes for use in fuel cells. The proton energy level scheme is used to make the first electrolyte, which is a rubbery polymer in which the conductivity reaches values typical of activated Nafion, even though it is completely anhydrous. The protons are introduced into a cross-linked polyphosphazene rubber by the superacid HOTf, which is absorbed by partial protonation of the backbone nitrogens. The decoupling of conductivity from segmental relaxation times, assessed by comparison with conductivity relaxation times, amounts to some 10 orders of magnitude, but it cannot be concluded whether the conductivity is purely protonic or due equally to a mobile OTf- or H(OTf)2- component. The second electrolyte builds on the success of phosphoric acid as a fuel cell electrolyte by designing a variant of the molecular acid that has an increased temperature range without sacrifice of high-temperature conductivity or open circuit voltage. The success is achieved by introduction of a hybrid component, based on silicon coordination of phosphate groups, which prevents decomposition or water loss to 250 °C while enhancing free proton motion. Conductivity studies are reported to 285 °C and full H2/O2 cell polarization curves to 226 °C. The current efficiency reported here (current density per unit of fuel supplied per second) is the highest on record. A power density of 184 mW cm-2 is achieved at 226 °C with a hydrogen flow rate of 4.1 ml/minute. The third electrolyte is a novel type of ionic liquid made by addition of a super-strong Brønsted acid to a super-weak Brønsted base. Here it is shown that by allowing the proton of transient HAlCl4 to relocate onto a very weak base that is also stable to superacids, we can create an anhydrous ionic liquid, itself a superacid, in which the proton is so loosely bound that at least 50% of the electrical conductivity is due to the motion of free protons. The protic ionic liquids (PILs) described, pentafluoropyridinium tetrachloroaluminate and 5-chloro-2,4,6-trifluoropyrimidinium tetrachloroaluminate, might be the forerunners of a class of materials in which the proton plasma state can be approached.
Contributors: Ansari, Younes (Author) / Angell, Charles A (Thesis advisor) / Richert, Ranko (Committee member) / Chizmeshya, Andrew (Committee member) / Wolf, George (Committee member) / Arizona State University (Publisher)
Created: 2013
Description

Currently, to interact with computer-based systems one needs to learn the specific interface language of that system. In most cases, interaction would be much easier if it could be done in natural language. For that, we will need a module which understands natural language and automatically translates it to the interface language of the system. The NL2KR (Natural Language to Knowledge Representation) v.1 system is a prototype of such a system. It is a learning-based system that learns new meanings of words in terms of lambda-calculus formulas, given an initial lexicon of some words and their meanings and a training corpus of sentences with their translations. As part of this thesis, we take the prototype NL2KR v.1 system and enhance various components of it to make it usable for somewhat substantial and useful interface languages. We revamped the lexicon learning components, the Inverse-lambda and Generalization modules, and redesigned the lexicon learning algorithm which uses these components to learn new meanings of words. Similarly, we re-developed the inbuilt parser of the system in Answer Set Programming (ASP) and also integrated an external parser with the system. Apart from this, we added new features such as various system configurations and a memory cache in the learning component of the NL2KR system. These enhancements helped in learning more meanings of words, boosted the performance of the system by reducing the computation time by a factor of 8, and improved the usability of the system. We evaluated the NL2KR system on the iRODS domain. iRODS is a rule-oriented data system, which helps in managing large sets of computer files using policies. This system provides a rule-oriented interface language whose syntactic structure is like that of any procedural programming language (e.g., C). However, direct translation of natural language (NL) to this interface language is difficult. So, for automatic translation of NL to this language, we define a simple intermediate Policy Declarative Language (IPDL) to represent the knowledge in the policies, which can then be directly translated to iRODS rules. We develop a corpus of 100 policy statements and manually translate them to IPDL. This corpus is then used for the evaluation of the NL2KR system. We performed 10-fold cross validation on the system. Furthermore, using this corpus, we illustrate how the different components of our NL2KR system work.
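The lambda-calculus representation of word meanings can be illustrated with a tiny Python sketch (not NL2KR's code): each word meaning is a function, and the meaning of a phrase is obtained by applying them to one another. The words and target formula below are hypothetical:

# Python closures stand in for lambda-calculus formulas in a toy lexicon.
lexicon = {
    "every":     lambda p: lambda q: f"forall X. ({p('X')} -> {q('X')})",
    "file":      lambda x: f"file({x})",
    "backed_up": lambda x: f"backed_up({x})",
}

# "every file (is) backed_up"  ->  apply meanings left to right.
meaning = lexicon["every"](lexicon["file"])(lexicon["backed_up"])
print(meaning)   # forall X. (file(X) -> backed_up(X))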
Contributors: Kumbhare, Kanchan Ravishankar (Author) / Baral, Chitta (Thesis advisor) / Ye, Jieping (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created: 2013
Description

Parallel Monte Carlo applications require the pseudorandom numbers used on each processor to be independent in a probabilistic sense. The TestU01 software package is the standard testing suite for detecting stream dependence and other properties that make certain pseudorandom generators ineffective in parallel (as well as serial) settings. TestU01 employs two basic schemes for testing parallel generated streams. The first applies serial tests to the individual streams and then tests the resulting p-values for uniformity. The second turns all the parallel generated streams into one long vector and then applies serial tests to the resulting concatenated stream. Various forms of stream dependence can be missed by each approach because neither one fully addresses the multivariate nature of the accumulated data when generators are run in parallel. This dissertation identifies these potential faults in the parallel testing methodologies of TestU01 and investigates two different methods to better detect inter-stream dependencies: correlation-motivated multivariate tests and vector time series based tests. These methods have been implemented in an extension to TestU01 built in C++, and the unique aspects of this extension are discussed. A variety of different generation scenarios are then examined using the TestU01 suite in concert with the extension. This enhanced software package is found to better detect certain forms of inter-stream dependencies than the original TestU01 suites of tests.
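The first TestU01 scheme described above can be sketched in a few lines of Python (a stand-in illustration, not TestU01 itself): apply a serial uniformity test to each stream, then test whether the resulting p-values are themselves uniform. The generator and the specific tests below are substitutes for illustration:

# Two-level testing of parallel streams: per-stream chi-square, then a
# Kolmogorov-Smirnov check that the p-values look uniform on [0, 1].
import numpy as np
from scipy import stats

def serial_pvalue(stream, bins=10):
    """Chi-square goodness-of-fit p-value for uniformity of one stream."""
    counts, _ = np.histogram(stream, bins=bins, range=(0.0, 1.0))
    return stats.chisquare(counts).pvalue

streams = [np.random.default_rng(seed).random(10_000) for seed in range(64)]
pvals = [serial_pvalue(s) for s in streams]

# Second-level test: if the streams behave independently, the p-values
# should look like a uniform sample.
print(stats.kstest(pvals, "uniform"))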
Contributors: Ismay, Chester (Author) / Eubank, Randall (Thesis advisor) / Young, Dennis (Committee member) / Kao, Ming-Hung (Committee member) / Lanchier, Nicolas (Committee member) / Reiser, Mark R. (Committee member) / Arizona State University (Publisher)
Created: 2013