Matching Items (7)
Description
Social media is becoming increasingly popular as a platform for sharing personal health-related information. This information can be utilized for public health monitoring tasks such as pharmacovigilance via the use of Natural Language Processing (NLP) techniques. One of the critical steps in information extraction pipelines is Named Entity Recognition (NER), where the mentions of entities such as diseases are located in text and their entity types are identified. However, the language in social media is highly informal, and user-expressed health-related concepts are often non-technical, descriptive, and challenging to extract. There has been limited progress in addressing these challenges, and advanced machine learning-based NLP techniques have been underutilized. This work explores the effectiveness of different machine learning techniques, particularly deep learning, in addressing the challenges associated with extraction of health-related concepts from social media. Deep learning has recently attracted a lot of attention in machine learning research and has shown remarkable success in several applications, particularly imaging and speech recognition. However, deep learning techniques have thus far been relatively unexplored for biomedical text mining and, in particular, this is the first attempt at applying deep learning to health information extraction from social media.

This work presents ADRMine, which uses a Conditional Random Field (CRF) sequence tagger for extraction of complex health-related concepts. It utilizes a large volume of unlabeled user posts for automatic learning of embedding cluster features, a novel application of deep learning in modeling the similarity between tokens. ADRMine significantly improved medical NER performance compared to the baseline systems.
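
As a rough illustration of how NER is usually framed as a sequence tagging problem, the sketch below converts a token-level mention span into per-token BIO labels, the kind of target sequence a CRF tagger is trained to predict. The abstract does not state which label scheme ADRMine uses, so the BIO scheme, the `bio_tags` helper, and the example post are all assumptions made for illustration.

```python
from typing import List, Tuple


def bio_tags(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert entity spans into BIO labels, one label per token.

    Each span is (start, end, entity_type) in token indices, `end` exclusive.
    Hypothetical helper; not taken from ADRMine.
    """
    tags = ["O"] * len(tokens)          # default: token is outside any mention
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"      # first token of the mention
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"      # remaining tokens of the mention
    return tags


# Invented example post; the descriptive phrase "feel like a zombie"
# (tokens 4 to 7) is treated as a single adverse drug reaction mention.
tokens = "this med makes me feel like a zombie".split()
spans = [(4, 8, "ADR")]
tags = bio_tags(tokens, spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

A sequence tagger then predicts one such label per token, which lets it recover multi-word, non-technical mentions rather than only single lexicon terms.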

This work also presents DeepHealthMiner, a deep learning pipeline for health-related concept extraction. Most machine learning methods require sophisticated, task-specific manual feature design, which is a challenging step in processing the informal and noisy content of social media. DeepHealthMiner automatically learns classification features using neural networks and a large volume of unlabeled user posts. Using a relatively small labeled training set, DeepHealthMiner could accurately identify most of the concepts, including consumer expressions that were not observed in the training data or in the standard medical lexicons, outperforming the state-of-the-art baseline techniques.
Contributors: Nikfarjam, Azadeh (Author) / Gonzalez, Graciela (Thesis advisor) / Greenes, Robert (Committee member) / Scotch, Matthew (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
This body of research sought to explore relationships between cognitive function and physical activity (PA), sedentary behavior (SB), and sleep, independently and in conjunction, in mid-life to older adults with no known cognitive impairment. Aging is associated with cognitive decline, and lifestyle behaviors such as PA, SB, and sleep may mitigate this decline. First, a systematic review and meta-analysis was conducted to examine the effect of aerobic PA interventions on memory and executive function in sedentary adults. Second, a longitudinal study was conducted to examine the association between SB and odds of incident cognitive impairment, and between SB and cognitive decline, in older adults. Last, a cross-sectional study was conducted to examine the joint associations of different levels of sleep with levels of PA, and of sleep with levels of sedentary time (ST), on memory and executive function. This body of research provided evidence to support the association between aerobic PA and improved cognitive function, between SB and both incident cognitive impairment and cognitive decline, and the joint association of sleep and different levels of PA and ST with cognitive function by hypertension status.
Contributors: Hoffmann, Nicole M (Author) / Lee, Rebecca E (Thesis advisor) / Petrov, Megan E (Thesis advisor) / Marek, Karen (Committee member) / Arizona State University (Publisher)
Created: 2020
Description
Through three investigations, this dissertation examined properties of the family and early care and education center (ECEC) environments related to preschool-aged children’s cardiovascular fitness (CVF) and gross locomotor skills (GLS). Investigation one used a systematic review and meta-analysis to synthesize the effectiveness of school-based interventions at improving CVF in preschool-aged children. For investigations two and three, product- and process-based measures of GLS were collected from children in ECECs (n=16), using the progressive aerobic cardiovascular endurance run (PACER; n=144) and the CHAMPS motor skill protocol (CMSP; n=91), respectively. Investigations two and three examined family factors and ECEC factors for associations with measures of GLS, respectively.

Investigation one revealed a moderate-to-large effect size for school-based interventions (n=10) increasing CVF (g=0.75; 95% CI [0.40-1.11]). Multi-level interventions (g=0.79 [0.34-1.25]) were more effective than interventions focused on the individual (g=0.67 [0.12-1.22]). In investigations two and three, children (78.3% Hispanic; mean ± SD age 53.2±4.5 months) completed a mean ± SD of 3.7±2.3 PACER laps and 19.0±5.5 CMSP criteria. Individual and family factors associated with PACER laps included child sex (B=-0.96, p=0.03) and age (B=0.17, p<0.01), parents’ promotion of inactivity (B=0.66, p=0.08) and screen time (B=0.65, p=0.05), and parents’ concern for child’s safety during physical activity (B=-0.36, p=0.09). Child age (B=0.47, p<0.01) and parent employment (B=2.29, p=0.07) were associated with CMSP criteria. At the ECEC level, policy environment quality (B=-0.17; p=0.01) was significantly associated with number of PACER laps completed. Outdoor play environment quality (B=0.18; p=0.03), outdoor play equipment total (B=0.32; p<0.01) and screen time environment quality (B=0.60; p=0.02) were significantly associated with CMSP criteria. Researchers, ECEC teachers and policy makers should promote positive environmental changes to preschool-aged children’s family and ECEC environments, as these environments have the potential to improve CVF and GLS more than programs focused on the child alone.
Contributors: Szeszulski, Jacob (Author) / Lee, Rebecca E (Thesis advisor) / Buman, Matthew P (Committee member) / Hooker, Steven P (Committee member) / Vega-Lopez, Sonia (Committee member) / Shaibi, Gabriel Q (Committee member) / Arizona State University (Publisher)
Created: 2019
Description
Accounting for over a third of all emerging and re-emerging infections, viruses represent a major public health threat, which researchers and epidemiologists across the world have been attempting to contain for decades. Recently, genomics-based surveillance of viruses through methods such as virus phylogeography has grown into a popular tool for infectious disease monitoring. When conducting such surveillance studies, researchers need to manually retrieve geographic metadata denoting the location of infected host (LOIH) of viruses from public sequence databases such as GenBank and any publication related to their study. The large volume of semi-structured and unstructured information that must be reviewed for this task, along with the ambiguity of geographic locations, makes it especially challenging. Prior work has demonstrated that the majority of GenBank records lack sufficient geographic granularity concerning the LOIH of viruses. As a result, reviewing full-text publications is often necessary for conducting in-depth analysis of virus migration, which can be a very time-consuming process. Moreover, integrating geographic metadata pertaining to the LOIH of viruses from different sources, including different fields in GenBank records as well as full-text publications, and normalizing the integrated metadata to unique identifiers for subsequent analysis, are also challenging tasks, often requiring expert domain knowledge. Therefore, automated information extraction (IE) methods could help significantly accelerate this process, positively impacting public health research. However, very few research studies have attempted the use of IE methods in this domain.

This work explores the use of novel knowledge-driven geographic IE heuristics for extracting, integrating, and normalizing the LOIH of viruses based on information available in GenBank and related publications; when evaluated on manually annotated test sets, the methods were found to have high accuracy and to be adequate for addressing this challenging problem. It also presents GeoBoost, a pioneering software system for georeferencing GenBank records, as well as a large-scale database containing over two million virus GenBank records georeferenced using the algorithms introduced here. The methods, database and software developed here could help support diverse public health domains focusing on sequence-informed virus surveillance, thereby enhancing existing platforms for controlling and containing disease outbreaks.
Contributors: Tahsin, Tasnia (Author) / Gonzalez, Graciela (Thesis advisor) / Scotch, Matthew (Thesis advisor) / Runger, George C. (Committee member) / Arizona State University (Publisher)
Created: 2019
Description
Unstructured texts containing biomedical information from sources such as electronic health records, scientific literature, discussion forums, and social media offer an opportunity to extract information for a wide range of applications in biomedical informatics. Building scalable and efficient pipelines for natural language processing and extraction of biomedical information plays an important role in the implementation and adoption of applications in areas such as public health. Advancements in machine learning and deep learning techniques have enabled rapid development of such pipelines. This dissertation presents entity extraction pipelines for two public health applications: virus phylogeography and pharmacovigilance. For virus phylogeography, geographical locations are extracted from biomedical scientific texts for metadata enrichment in the GenBank database containing 2.9 million virus nucleotide sequences. For pharmacovigilance, tools are developed to extract adverse drug reactions from social media posts to open avenues for post-market drug surveillance from non-traditional sources. Across these pipelines, high variance is observed in extraction performance among the entities of interest while using state-of-the-art neural network architectures. To explain the variation, linguistic measures are proposed to serve as indicators for entity extraction performance and to provide deeper insight into the domain complexity and the challenges associated with entity extraction. For both the phylogeography and pharmacovigilance pipelines presented in this work, the annotated datasets and applications are open source and freely available to the public to foster further research in public health.
Contributors: Magge, Arjun (Author) / Scotch, Matthew (Thesis advisor) / Gonzalez-Hernandez, Graciela (Thesis advisor) / Greenes, Robert (Committee member) / Arizona State University (Publisher)
Created: 2019
Description
Background: Street food stands (SFS) are common ways in which people in Mexico access food, having been a part of the environment and culture of Mexican food for generations. However, no studies have used a validated assessment tool to reliably measure food and beverage availability at a variety of SFS. Nor have the availability, density, variety, and distribution of SFS and street foods and beverages been assessed across neighborhood income levels.
Objective: This dissertation’s goal was to decrease gaps in knowledge about the role SFS may play in food availability in the Mexican food environment.
Methods: Survey design and ethnographic field methods were used to develop, test, and validate the Street Food Stand Assessment Tool (SFSAT). Geographic information system and ground-truthing methods were used to identify a sample of street segments across 20 neighborhoods representing low-, middle- and high-income neighborhoods in Mexico City on which to assess the availability, density, variety, and distribution of SFS and the foods and beverages sold at these food venues using the SFSAT.
Results: A sample of 391 SFS were assessed across 791 street segments. Results showed that SFS were found in all neighborhoods. Contrary to the initial hypothesis, most SFS were found in middle-income neighborhoods. While the availability of street foods and beverages was higher in middle-income neighborhoods, the variety was less consistent: fruit/vegetable variety was high in high-income neighborhoods whereas processed snack variety was higher in low-income neighborhoods. SFS were most often distributed near homes, transportation centers, and worksites across the three neighborhood income levels.
Conclusion: This study bridged the gap in knowledge about the availability, density, variety, and distribution of SFS and products sold at these sources of food by using an assessment tool that was developed, tested, and validated specifically for SFS. The findings showed that SFS were found across all neighborhoods. Furthermore, results also suggested that SFS can be a source of healthy food items. Additional studies are needed to understand the relationship between SFS availability, food consumption, and health outcomes in the Mexican population.
Contributors: Rosales Chavez, Jose Benito (Author) / Jehn, Megan (Thesis advisor) / Bruening, Meg (Thesis advisor) / Lee, Rebecca E (Committee member) / Ohri-Vachaspati, Punam (Committee member) / Arizona State University (Publisher)
Created: 2020
Description
The severity of the health and economic devastation resulting from outbreaks of viruses such as Zika, Ebola, SARS-CoV-1 and, most recently, SARS-CoV-2 underscores the need for tools which aim to delineate critical disease dynamical features underlying observed patterns of infectious disease spread. The growing emphasis placed on genome sequencing to support pathogen outbreak response highlights the need to adapt traditional epidemiological metrics to leverage this increasingly rich data stream. Further, the rapidity with which pathogen molecular sequence data is now generated, coupled with the advent of sophisticated Bayesian statistical techniques for pathogen molecular sequence analysis, creates an unprecedented opportunity to disrupt and innovate public health surveillance using 21st century tools. Bayesian phylogeography is a modeling framework which assumes discrete traits -- such as age, location of sampling, or species -- evolve according to a continuous-time Markov chain process along a phylogenetic tree topology which is inferred from molecular sequence data.
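
To make the continuous-time Markov chain assumption concrete, the sketch below computes the probability that a discrete trait changes state along a branch of length t, using P(t) = exp(Qt). It assumes the simplest possible rate matrix, in which every off-diagonal rate is equal, so a closed form is available; actual phylogeographic analyses typically estimate a richer rate matrix, and the function name and parameter values here are hypothetical.

```python
import math
from typing import List


def symmetric_ctmc_probs(k: int, rate: float, t: float) -> List[List[float]]:
    """Transition matrix P(t) = exp(Qt) for a k-state CTMC whose rate matrix Q
    has every off-diagonal entry equal to `rate` (diagonal = -(k - 1) * rate).

    With equal rates, Q has eigenvalues 0 and -k * rate, which gives the
    closed form below without a general matrix exponential.
    """
    decay = math.exp(-k * rate * t)
    stay = 1.0 / k + (k - 1) / k * decay   # probability of ending in the start state
    move = 1.0 / k - decay / k             # probability of ending in any one other state
    return [[stay if i == j else move for j in range(k)] for i in range(k)]


# Illustrative values only: three trait states (e.g. three sampling locations),
# an assumed rate of 0.5 changes per unit time, and a branch of length 1.
P = symmetric_ctmc_probs(k=3, rate=0.5, t=1.0)
for row in P:
    print([round(p, 4) for p in row])
```

Each row of P sums to one, a branch of length zero leaves the trait unchanged, and very long branches approach the uniform distribution over states.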

While myriad studies exist which reconstruct patterns of discrete trait evolution along an inferred phylogeny, attempts to translate the results of phylogeographic analyses into actionable metrics that can be used by public health agencies to direct the development of interventions aimed at reducing pathogen spread are conspicuously absent from the literature. In this dissertation, I focus on developing an intuitive metric, the phylogenetic risk ratio (PRR), which I use to translate the results of Bayesian phylogeographic modeling studies into a form actionable by public health agencies. I apply the PRR to two case studies: i) age-associated diffusion of influenza A/H3N2 during the 2016-17 US epidemic and ii) host-associated diffusion of West Nile virus in the US. I discuss the limitations of this approach (and of Bayesian phylogeographic approaches generally) when studying non-geographic traits for which limited metadata is available in public molecular sequence databases, and statistically principled solutions to the missing metadata problem in the phylogenetic context. Then, I perform a simulation study to evaluate the statistical performance of the missing metadata solution. Finally, I provide a solution for researchers who are interested in using the PRR and phylogenetic UTMs in their own genomic epidemiological studies yet are deterred by the idiosyncratic, error-prone processes required to implement these methods using popular Bayesian phylogenetic inference software packages. My solution, Build-A-BEAST, is a publicly available, object-oriented system written in Python which aims to reduce the complexity and idiosyncrasy of creating the XML files necessary to perform the aforementioned analyses.
This dissertation extends the conceptual framework of Bayesian phylogeographic methods, develops a method to translate the output of phylogenetic models into an actionable form, evaluates the use of priors for missing metadata, and, finally, provides a solution which eases the implementation of these methods. In doing so, I lay the foundation for future work in disseminating and implementing Bayesian phylogeographic methods for routine public health surveillance.
Contributors: Vaiente, Matteo (Author) / Scotch, Matthew (Thesis advisor) / Mubayi, Anuj (Committee member) / Liu, Li (Committee member) / Arizona State University (Publisher)
Created: 2020