Matching Items (8)
Description
Mosquito population data is a valuable resource for researchers and public health officials working to limit the spread of deadly zoonotic viruses such as Zika Virus and West Nile Virus. Unfortunately, this data is currently difficult to obtain and aggregate across the United States. Obtaining historical data often requires filing requests to individual States or Counties and hoping for a response. Current online systems for accessing aggregated data lack essential features or are limited in scope. To make mosquito population data more accessible to United States researchers, epidemiologists, and public health officials, the MosquitoDB system has been developed. MosquitoDB consists of a JavaScript Web Application, connected to a SQL database, that makes submitting and retrieving United States mosquito population data much simpler and more straightforward than alternative systems. The MosquitoDB software project is open source and publicly available on GitHub, allowing community scrutiny and contributions to add or improve necessary features. For this Creative Project, the core MosquitoDB system was designed and developed with 3 main features: 1) Web Interface for querying mosquito data. 2) Web Interface for submitting mosquito data. 3) Web Services for querying/retrieving and submitting mosquito data. The Web Interface is essential for common end users, such as researchers and public health officials, to access historical data or submit new data. The Web Services provide building blocks for Web Applications that other developers can use to incorporate data into new applications. The current MosquitoDB system is live at https://zodo.asu.edu/mosquito and the public code repository is available at https://github.com/developerDemetri/mosquitodb.
Contributors: Jones-Shargani, Demetrius Paul (Author) / Scotch, Matthew (Thesis director) / Weissenbacher, Davy (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2016-12
Description
Text mining of biomedical literature and clinical notes is a very active field of research in biomedical science. Semantic analysis is one of the core modules for different Natural Language Processing (NLP) solutions. Methods for calculating the semantic relatedness of two concepts can be very useful in solutions to different problems such as relationship extraction, ontology creation, and question answering [1–6]. Several techniques exist for calculating the semantic relatedness of two concepts. These techniques utilize different knowledge sources and corpora. So far, researchers have attempted to find the best hybrid method for each domain by combining semantic relatedness techniques and data sources manually. In this work, attempts were made to eliminate the need for manually combining semantic relatedness methods when targeting new contexts or resources by proposing an automated method that attempts to find the combination of semantic relatedness techniques and resources that achieves the best semantic relatedness score in every context. This may help the research community find the best hybrid method for each context considering the available algorithms and resources.
Contributors: Emadzadeh, Ehsan (Author) / Gonzalez, Graciela (Thesis advisor) / Greenes, Robert (Committee member) / Scotch, Matthew (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
Proliferation of social media websites and discussion forums in the last decade has resulted in social media mining emerging as an effective mechanism to extract consumer patterns. Most research on social media and pharmacovigilance has concentrated on Adverse Drug Reaction (ADR) identification. Such methods employ a step of drug search followed by classification of the associated text as containing an ADR or not. Although this method works efficiently for ADR classification, if ADR evidence is present in users' posts over time, drug mentions fail to capture such ADRs. It also fails to record additional user information which may provide an opportunity to perform an in-depth analysis of lifestyle habits and possible reasons for any medical problems.

Pre-market clinical trials for drugs generally do not include pregnant women, and so their effects on pregnancy outcomes are not discovered early. This thesis presents a thorough, alternative strategy for assessing the safety profiles of drugs during pregnancy by utilizing user timelines from social media. I explore the use of a variety of state-of-the-art social media mining techniques, including rule-based and machine learning techniques, to identify pregnant women, monitor their drug usage patterns, categorize their birth outcomes, and attempt to discover associations between drugs and bad birth outcomes.

The technique used models user timelines as longitudinal patient networks, which provide us with a variety of key information about pregnancy, drug usage, and post-birth reactions. I evaluate the distinct parts of the pipeline separately, validating the usefulness of each step. The approach of using user timelines in this fashion has produced very encouraging results, and can be employed for a range of other important tasks where users/patients are required to be followed over time to derive population-based measures.
Contributors: Chandrashekar, Pramod Bharadwaj (Author) / Davulcu, Hasan (Thesis advisor) / Gonzalez, Graciela (Thesis advisor) / Hsiao, Sharon (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
In the current millennium, extensive use of computers and the internet caused an exponential increase in information. Few research areas are as important as information extraction, which primarily involves extracting concepts and the relations between them from free text. Limitations in the size of training data, lack of lexicons and lack of relationship patterns are major factors for poor performance in information extraction. This is because the training data cannot possibly contain all concepts and their synonyms, and it contains only limited examples of relationship patterns between concepts. Creating training data, lexicons and relationship patterns is expensive, especially in the biomedical domain (including clinical notes) because of the depth of domain knowledge required of the curators. Dictionary-based approaches for concept extraction in this domain are not sufficient to effectively overcome the complexities that arise because of the descriptive nature of human languages. For example, there is a relatively larger number of abbreviations (not all of them present in lexicons) compared to everyday English text. Sometimes abbreviations are modifiers of an adjective (e.g. CD4-negative) rather than nouns (and hence, not usually considered named entities). There are many chemical names with numbers, commas, hyphens and parentheses (e.g. t(3;3)(q21;q26)), which will be separated by most tokenizers. In addition, partial words are used in place of full words (e.g. up- and downregulate), and some of the words used are highly specialized for the domain. Clinical notes contain peculiar drug names, anatomical nomenclature, other specialized names and phrases that are not standard in everyday English or in published articles (e.g. "l shoulder inj"). State-of-the-art concept extraction systems use machine learning algorithms to overcome some of these challenges. However, they need a large annotated corpus for every concept class that needs to be extracted.
A novel natural language processing approach to minimize this limitation in concept extraction is proposed here using distributional semantics. Distributional semantics is an emerging field arising from the notion that the meaning or semantics of a piece of text (discourse) depends on the distribution of the elements of that discourse in relation to its surroundings. Distributional information from large unlabeled data is used to automatically create lexicons for the concepts to be tagged, clusters of contextually similar words, and thesauri of distributionally similar words. These automatically generated lexical resources are shown here to be more useful than manually created lexicons for extracting concepts from both literature and narratives. Further, machine learning features based on distributional semantics are shown to improve the accuracy of BANNER, and could be used in other machine learning systems such as cTakes to improve their performance. In addition, in order to simplify the sentence patterns and facilitate association extraction, a new algorithm using a "shotgun" approach is proposed. The goal of sentence simplification has traditionally been to reduce the grammatical complexity of sentences while retaining the relevant information content and meaning to enable better readability for humans and enhanced processing by parsers. Sentence simplification is shown here to improve the performance of association extraction systems for both biomedical literature and clinical notes. It helps improve the accuracy of protein-protein interaction extraction from the literature and also improves relationship extraction from clinical notes (such as between medical problems, tests and treatments). Overall, the two main contributions of this work include the application of sentence simplification to association extraction as described above, and the use of distributional semantics for concept extraction. 
The proposed work on concept extraction amalgamates, for the first time, two diverse research areas: distributional semantics and information extraction. This approach provides all the advantages offered by other semi-supervised machine learning systems and, unlike other proposed semi-supervised approaches, can be used on top of different basic frameworks and algorithms.
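The tokenization problem mentioned in the description above can be illustrated with a minimal sketch. The tokenizer below is a generic word-or-punctuation splitter chosen for illustration; it is an assumption, not the tokenizer used in the dissertation.

```python
import re


def naive_tokenize(text: str) -> list[str]:
    """Split text into runs of word characters and single punctuation marks.

    This is a generic illustrative tokenizer (an assumption), not the one
    used in the work described above.
    """
    return re.findall(r"\w+|[^\w\s]", text)


if __name__ == "__main__":
    # Examples quoted in the description: a cytogenetic name and a
    # hyphenated abbreviation modifier.
    for term in ["t(3;3)(q21;q26)", "CD4-negative"]:
        print(term, "->", naive_tokenize(term))
```

A single name such as t(3;3)(q21;q26) is broken into eleven separate tokens, so a dictionary lookup on individual tokens can no longer match it as one concept.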
Contributors: Jonnalagadda, Siddhartha Reddy (Author) / Gonzalez, Graciela H (Thesis advisor) / Cohen, Trevor A (Committee member) / Greenes, Robert A (Committee member) / Fridsma, Douglas B (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
Contraceptive methods are vital in maintaining women’s health and preventing unintended pregnancy. When a woman uses a method that reflects her personal preferences and lifestyle, the chances of low adoption and misuse decrease. The research aim of this project is to develop a web-based decision aid tailored to college women that assists in the selection of contraceptive methods. For this reason, My Contraceptive Choice (MCC) is built using the gaps identified in existing resources provided by Planned Parenthood and Bedsider, along with feedback from a university student focus group. The tool is a short quiz that is followed by two pages of information and resources for a variety of different contraceptive methods commonly used by college women. The evaluation phase of this project includes simulated test cases, a Google Forms survey, and a second focus group to assess the tool for accuracy and usability. From the survey, 130 of the 150 (80.7%) respondents believe that the recommendations provided can help them select a birth control method. Furthermore, 136 of the 150 (90.0%) respondents believe that the layout of the tool made it easy to navigate. The second focus group feedback suggests that the MCC tool is perceived to be accurate, usable, and useful to the college population. Participants believe that the MCC tool performs better overall compared to the Planned Parenthood quiz in creating a customized recommendation and Bedsider in overall usability. The test cases reveal that there are further improvements that could be made to create a more accurate recommendation to the user. In conclusion, the new MCC tool accomplishes the aim of creating a beneficial resource to college women in assisting with the birth control selection process.
Contributors: Redman, Molly (Author) / Wang, Dongwen (Thesis director) / Brian, Jennifer (Committee member) / College of Health Solutions (Contributor) / Department of Information Systems (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)
Created: 2021-05
Description
Unstructured texts containing biomedical information from sources such as electronic health records, scientific literature, discussion forums, and social media offer an opportunity to extract information for a wide range of applications in biomedical informatics. Building scalable and efficient pipelines for natural language processing and extraction of biomedical information plays an important role in the implementation and adoption of applications in areas such as public health. Advancements in machine learning and deep learning techniques have enabled rapid development of such pipelines. This dissertation presents entity extraction pipelines for two public health applications: virus phylogeography and pharmacovigilance. For virus phylogeography, geographical locations are extracted from biomedical scientific texts for metadata enrichment in the GenBank database containing 2.9 million virus nucleotide sequences. For pharmacovigilance, tools are developed to extract adverse drug reactions from social media posts to open avenues for post-market drug surveillance from non-traditional sources. Across these pipelines, high variance is observed in extraction performance among the entities of interest while using state-of-the-art neural network architectures. To explain the variation, linguistic measures are proposed to serve as indicators for entity extraction performance and to provide deeper insight into the domain complexity and the challenges associated with entity extraction. For both the phylogeography and pharmacovigilance pipelines presented in this work, the annotated datasets and applications are open source and freely available to the public to foster further research in public health.
Contributors: Magge, Arjun (Author) / Scotch, Matthew (Thesis advisor) / Gonzalez-Hernandez, Graciela (Thesis advisor) / Greenes, Robert (Committee member) / Arizona State University (Publisher)
Created: 2019
Description
Background: Pulmonary embolism is a deadly condition that is often diagnosed using a technique known as computed tomography pulmonary angiography (CTPA). CTPA reports are free-text, narrative-style forms of documentation conveying radiologist findings, both primary (regarding pulmonary embolism) and incidental. This project seeks to combine simple natural language processing (NLP) techniques, such as regular expressions and rules, to build upon and further process output from a machine learning based named entity recognition (NER) tool for the purposes of (1) linking references to radiological images with the corresponding clinical findings and (2) extracting primary and incidental findings.

Methods: The project’s system utilized a regular expression to extract image references. All CTPA reports were first processed with NER software to obtain the text and spans of clinical findings. A heuristic was used to determine the appropriate clinical finding that should be linked with a particular image reference. Another regular expression was used to extract primary findings from NER output; the remaining findings were considered incidental. Performance was assessed against a gold standard, which was based upon a manually annotated version of the CTPA reports used in this project.
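The image-reference extraction and linking steps described in the Methods can be sketched roughly as follows. The regular expression, the sample sentence, and all names here are hypothetical illustrations; the listing does not give the project's actual pattern or report format. The linking rule follows the assumption stated in the Discussion, that the appropriate finding is the one immediately prior to the image reference.

```python
import re
from typing import NamedTuple, Optional


class Span(NamedTuple):
    text: str
    start: int
    end: int


# Hypothetical pattern for references such as "series 4, image 57";
# the real reports may phrase image references differently.
IMAGE_REF = re.compile(r"series\s+\d+,?\s+image\s+\d+", re.IGNORECASE)


def find_image_refs(report: str) -> list[Span]:
    """Return every image reference in the report with its character span."""
    return [Span(m.group(0), m.start(), m.end()) for m in IMAGE_REF.finditer(report)]


def link_to_prior_finding(ref: Span, findings: list[Span]) -> Optional[Span]:
    """Pick the clinical finding that ends closest before the image reference."""
    prior = [f for f in findings if f.end <= ref.start]
    return max(prior, key=lambda f: f.end) if prior else None


def span_of(report: str, phrase: str) -> Span:
    """Helper standing in for NER output: locate a phrase and return its span."""
    start = report.index(phrase)
    return Span(phrase, start, start + len(phrase))


if __name__ == "__main__":
    report = ("Filling defect in the right lower lobe artery "
              "(series 4, image 57). Small hiatal hernia.")
    findings = [span_of(report, "Filling defect"), span_of(report, "hiatal hernia")]
    for ref in find_image_refs(report):
        linked = link_to_prior_finding(ref, findings)
        print(ref.text, "->", linked.text if linked else None)
```

Because the rule always takes the nearest preceding finding, a single missed or spurious NER span changes which finding is linked.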

Results: Extraction of image references achieved 100% accuracy. Linkages between these references and exact gold standard spans of the clinical findings achieved a precision of 0.24, a recall of 0.22, and an F1 score of 0.23. Linkages with partial spans of clinical findings as determined by the gold standard achieved a precision of 0.71, a recall of 0.67, and an F1 score of 0.69. Primary and incidental finding extraction achieved a precision of 0.67, a recall of 0.80, and an F1 score of 0.73.
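The reported F1 scores are consistent with the standard definition of F1 as the harmonic mean of precision and recall. A short check using only the precision and recall values quoted above (the function name and task labels are illustrative):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in the Results above.
reported = {
    "exact-span linkage": (0.24, 0.22),
    "partial-span linkage": (0.71, 0.67),
    "primary/incidental extraction": (0.67, 0.80),
}

if __name__ == "__main__":
    for task, (p, r) in reported.items():
        print(f"{task}: F1 = {f1_score(p, r):.2f}")
```

Rounded to two decimals, this gives 0.23, 0.69, and 0.73, matching the figures in the abstract.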

Discussion: Various elements reduced system performance such as the difficulty of exactly matching the spans of clinical findings from NER output with those found in the gold standard. The heuristic linking clinical findings and image references was especially sensitive to NER false positives and false negatives due to its assumption that the appropriate clinical finding was that which was immediately prior to the image reference. Although the system did not perform as well as hoped, lessons were learned such as the need for clear research methodology and proper gold standard creation; without a proper gold standard, problem scope and system performance cannot be properly assessed. Improvements to the system include creating a more robust heuristic, sifting NER false positives, and training the NER tool used on a dataset of CTPA reports.
Contributors: Borlongan, Matthew Bilog (Author) / Devarakonda, Murthy (Thesis director) / Murcko, Anita (Committee member) / College of Health Solutions (Contributor) / Barrett, The Honors College (Contributor)
Created: 2019-05
Description
Sticking to healthy behaviors is difficult. The lack of long-term behavior maintenance negatively impacts health outcomes and increases healthcare costs. Current methods for improving behavior maintenance yield varying and often limited results. This dissertation designs and tests quantitative methods for identifying behavioral strategies associated with the long-term maintenance of three different health behaviors. Data were collected from three settings: mindfulness through a commercial app, walking from a randomized controlled trial, and pill-taking from a commercial app-based intervention. Novel pattern-detection methodologies were employed to measure temporal consistency and identify key behavioral strategies. For mindfulness and walking behaviors, the impact of individual phenotypes on long-term behavior maintenance was analyzed. For medication adherence, the optimal window of time in which pills should be taken was empirically determined, and the impact of consistent timing on long-term medication adherence was analyzed. To perform these analyses, robust and regularized models, panel data models, statistical tests, and clustering algorithms were used. For mindfulness meditation, both consistent and inconsistent phenotypes were associated with long-term engagement. In the walking intervention, those with a consistent phenotype experienced greater increases in walking after the study than inconsistent individuals. However, the effect of consistency was strongest for individuals who either exercised less than 10 or more than 30 minutes per day. Lastly, in the medication adherence incentive program, consistently taking medication within 55 minutes of the goal time had the strongest association with future adherence. This dissertation demonstrates that certain phenotypes are more advantageous than others for long-term maintenance and interventions.
Temporal consistency is likely helpful for maintaining behaviors that offer delayed physical benefits, such as regular walking or medicating for chronic illnesses, but less helpful for cognitive behaviors like mindfulness, which can provide more immediate satisfaction. When designing interventions, the nature of the behavior and observable phenotypes should be taken into consideration. Generally, focusing on consistency is likely to contribute to long-term success; however, this is individual and context dependent. Future research should investigate this further by examining the relationship between behavioral phenotypes and psychological measurement tools to gain a deeper understanding of the successful maintenance of healthy behaviors.
Contributors: Fowers, Rylan (Author) / Stetcher, Chad (Thesis advisor) / Chung, Yunro (Thesis advisor) / Ghasemzadeh, Hassan (Committee member) / Arizona State University (Publisher)
Created: 2023