Matching Items (36)
Description
Genomic structural variation (SV) is defined as gross alterations in the genome, broadly classified as insertions/duplications, deletions, inversions, and translocations. DNA sequencing ushered structural variant discovery beyond laboratory detection techniques to high-resolution informatics approaches. Bioinformatics tools for computational discovery of SVs, however, still miss variants in the complex cancer genome. This study aimed to define the genomic context leading to tool failure and to design a novel algorithm addressing this context.

Methods: The study tested the widely held but unproven hypothesis that tools fail to detect variants that lie in repeat regions. The publicly available 1000 Genomes dataset, with experimentally validated variants, was analyzed with the SVDetect tool to compare true positive (TP) SVs against false negative (FN) SVs, with the expectation that FNs would be overrepresented in repeat regions. Further, a novel algorithm designed to informatically capture the biological etiology of translocations (non-allelic homologous recombination and the 3-D placement of chromosomes in cells as context) was tested using a simulated dataset. Translocations were created in known translocation hotspots, and the novel-algorithm tool was compared with SVDetect and BreakDancer.

Results: 53% of FN deletions fell within repeat structure, compared to 81% of TP deletions. Similarly, 33% of FN insertions versus 42% of TP insertions, 26% of FN duplications versus 57% of TP duplications, and 54% of FN novel sequences versus 62% of TP novel sequences lay within repeats. Repeat structure was not driving the tools' inability to detect variants and could not be used as context. The novel algorithm with a redefined context, when tested against SVDetect and BreakDancer, detected 10/10 simulated translocations in the 30X coverage dataset with 100% allele frequency, while SVDetect captured 4/10 and BreakDancer detected 6/10. For the 15X coverage dataset with 100% allele frequency, the novel algorithm detected all ten translocations, albeit with fewer supporting reads; BreakDancer detected 4/10 and SVDetect detected 2/10.

Conclusion: This study showed that the presence of repetitive elements within a structural variant did not, in general, influence a tool's ability to capture it. The context-based algorithm proved better than current tools even with half the genome coverage of the accepted protocol and provides an important first step toward novel translocation discovery in the cancer genome.
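A minimal sketch of the repeat-overlap test behind the Results above: each SV call is checked for overlap with annotated repeat intervals, and the TP and FN fractions are compared. The repeat annotation, variant tuples, and half-open coordinates below are illustrative assumptions, not the study's actual pipeline.

def overlaps_repeat(repeats, chrom, start, end):
    """True if the half-open interval [start, end) overlaps any repeat."""
    for r_start, r_end in repeats.get(chrom, []):
        if r_start >= end:   # repeats are sorted by start; nothing further can overlap
            break
        if r_end > start:    # standard interval-overlap test
            return True
    return False

# Toy repeat annotation (e.g., parsed from a RepeatMasker BED file) and toy
# TP/FN deletion calls; real inputs would come from the benchmark dataset.
repeats = {"chr1": [(100000, 106000), (400000, 410000)],
           "chr2": [(1000, 2000)]}
tp_deletions = [("chr1", 100200, 105300), ("chr2", 5000, 7000)]
fn_deletions = [("chr1", 900000, 901000)]

tp_frac = sum(overlaps_repeat(repeats, *v) for v in tp_deletions) / len(tp_deletions)
fn_frac = sum(overlaps_repeat(repeats, *v) for v in fn_deletions) / len(fn_deletions)
print(f"TP in repeats: {tp_frac:.0%}, FN in repeats: {fn_frac:.0%}")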
Contributors: Shetty, Sheetal (Author) / Dinu, Valentin (Thesis advisor) / Bussey, Kimberly (Committee member) / Scotch, Matthew (Committee member) / Wallstrom, Garrick (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
The processes of a human somatic cell are very complex, with various genetic mechanisms governing its fate. Such cells undergo various genetic mutations, which translate to the genetic aberrations that we see in cancer. There are more than 100 types of cancer, each having many more subtypes, with aberrations unique to each. In the past two decades, the widespread application of high-throughput genomic technologies, such as microarrays and next-generation sequencing, has led to the revelation of many such aberrations. Known types and subtypes can be readily identified using gene-expression profiling, and, more importantly, high-throughput genomic datasets have helped identify novel subtypes with distinct signatures. Recent studies showing the use of gene-expression profiling in clinical decision making for breast cancer patients underscore the utility of high-throughput datasets. Beyond prognosis, understanding the underlying cellular processes is essential for effective cancer treatment. Various high-throughput techniques are now available to look at a particular aspect of a genetic mechanism in cancer tissue. To look at these mechanisms individually is akin to examining a broken watch: taking apart each of its parts, looking at them individually, and finally making a list of all the faulty ones. Integrative approaches are needed to transform one-dimensional cancer signatures into multi-dimensional interaction and regulatory networks, thereby bettering our understanding of cellular processes in cancer. Here, I attempt to (i) address ways to effectively identify high-quality variants when multiple assays on the same sample are available, through two novel tools, snpSniffer and NGSPE; and (ii) glean new biological insight into multiple myeloma through two novel integrative analysis approaches making use of disparate high-throughput datasets. While these methods focus on multiple myeloma datasets, the informatics approaches are applicable to all cancer datasets and will thus help advance cancer genomics.
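Where multiple assays of the same sample are available, a quick genotype-concordance check can confirm the data really came from one individual before variants are trusted. The sketch below illustrates that idea only; it is not snpSniffer's or NGSPE's actual implementation, and the genotype dictionaries are toy data.

def concordance(calls_a, calls_b):
    """Fraction of sites typed in both assays that carry the same genotype."""
    shared = set(calls_a) & set(calls_b)
    if not shared:
        return 0.0
    agree = sum(calls_a[site] == calls_b[site] for site in shared)
    return agree / len(shared)

# Toy genotype calls from two assays of the same sample.
exome = {("chr1", 1000): "A/G", ("chr2", 2000): "C/C", ("chr3", 3000): "T/T"}
rnaseq = {("chr1", 1000): "A/G", ("chr2", 2000): "C/T", ("chr3", 3000): "T/T"}
print(f"concordance: {concordance(exome, rnaseq):.2f}")  # 0.67 -> flag for review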
Contributors: Yellapantula, Venkata (Author) / Dinu, Valentin (Thesis advisor) / Scotch, Matthew (Committee member) / Wallstrom, Garrick (Committee member) / Keats, Jonathan (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
No two cancers are alike. Cancer is a dynamic and heterogeneous disease; such heterogeneity arises among patients with the same cancer type, among cancer cells within the same individual's tumor, and even among cells within the same sub-clone over time. The recent application of next-generation sequencing and precision medicine techniques is the driving force behind uncovering the complexity of cancer and determining the best clinical practice. The core concept of precision medicine is to move away from crowd-based, best-for-most treatment and take individual variability into account when optimizing prevention and treatment strategies. Next-generation sequencing is the method for sifting through all 3 billion letters of each patient's DNA genetic code in a massively parallel fashion.

The deluge of next-generation sequencing data has shifted the bottleneck of cancer research from multiple "-omics" data collection to integrative analysis and data interpretation. In this dissertation, I attempt to address two distinct, but dependent, challenges. The first is to design specific computational algorithms and tools that can process and extract useful information from the raw data in an efficient, robust, and reproducible manner. The second is to develop high-level computational methods and data frameworks for integrating and interpreting these data. Specifically, Chapter 2 presents a tool called Snipea (SNv Integration, Prioritization, Ensemble, and Annotation) that further identifies, prioritizes, and annotates somatic SNVs (single nucleotide variants) called by multiple variant callers. Chapter 3 describes a novel alignment-based algorithm to accurately and losslessly classify sequencing reads from xenograft models. Chapter 4 describes a direct and biologically motivated framework, and associated methods, for identifying putative aberrations causing survival differences in GBM patients by integrating whole-genome sequencing, exome sequencing, RNA-sequencing, methylation array, and clinical data. Lastly, Chapter 5 explores longitudinal and intratumor heterogeneity studies to reveal the temporal and spatial context of tumor evolution. The long-term goal is to help patients with cancer, particularly those who are in front of us today. Genome-based analysis of a patient's tumor can identify genomic alterations unique to that tumor that are candidate therapeutic targets for decreasing therapy resistance and improving clinical outcome.
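As a concrete illustration of the ensemble idea behind combining multiple variant callers (Chapter 2), the sketch below keeps somatic SNVs reported by at least k of n callers. Snipea's actual identification, prioritization, and annotation logic is richer; the caller names and calls here are toy data.

from collections import Counter

def ensemble(callsets, min_callers=2):
    """callsets: list of sets of (chrom, pos, ref, alt); return the consensus set."""
    votes = Counter(v for calls in callsets for v in calls)
    return {v for v, n in votes.items() if n >= min_callers}

# Toy call sets from three hypothetical callers on the same tumor sample.
mutect = {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")}
varscan = {("chr1", 100, "A", "T"), ("chr3", 300, "C", "G")}
strelka = {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")}
print(ensemble([mutect, varscan, strelka]))  # chr1 and chr2 variants pass the 2-of-3 vote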
Contributors: Peng, Sen (Author) / Dinu, Valentin (Thesis advisor) / Scotch, Matthew (Committee member) / Wallstrom, Garrick (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
The 2017-2018 influenza season was marked by the deaths of 80,000 Americans: the highest flu-related death toll in a decade. Further, the yearly economic toll on the US healthcare system and society is on the order of tens of billions of dollars. It is vital that we gain a better understanding of the dynamics of influenza transmission in order to prevent its spread. Viral genome sequences examined using bioinformatics methods offer a rich framework with which to monitor the evolution and spread of influenza for public health surveillance. To better understand the influenza epidemic during the severe 2017-2018 season, we established a passive surveillance system at Arizona State University's Tempe Campus Health Services beginning in January 2018. From this system, nasopharyngeal samples screening positive for influenza were collected. Using these samples, viral sequences will be generated using a combined multiplex RT-PCR and next-generation sequencing (NGS) approach. Phylogenetic analysis will be used to infer the severity and temporal course of the 2017-2018 influenza outbreak on campus as well as the 2018-2019 flu season. Through this surveillance system, we will gain knowledge of the dynamics of influenza spread in a university setting and will use this information to inform public health strategies.
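As a hedged illustration of the downstream phylogenetic step, the Biopython sketch below builds a quick neighbor-joining tree from an alignment. The toy sequences and identifiers are assumptions; the study's planned analysis goes well beyond this distance-based example.

from Bio import Phylo
from Bio.Align import MultipleSeqAlignment
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor

# Toy aligned sequences standing in for influenza consensus segments; real
# input would be read with AlignIO from the study's sequencing output.
alignment = MultipleSeqAlignment([
    SeqRecord(Seq("ACGTACGTAC"), id="flu_2018_01"),
    SeqRecord(Seq("ACGTACGTAA"), id="flu_2018_02"),
    SeqRecord(Seq("ACGGACTTAC"), id="flu_2017_01"),
])
dm = DistanceCalculator("identity").get_distance(alignment)  # pairwise distances
tree = DistanceTreeConstructor().nj(dm)                      # neighbor-joining tree
Phylo.draw_ascii(tree)                                       # quick text rendering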
Contributors: Mendoza, Lydia Marie (Author) / Scotch, Matthew (Thesis director) / Hogue, Brenda (Committee member) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)
Created: 2019-05
Description
Title: A Mobile Health Application for Tracking Patients' Health Record

Background: Mobile Health (mHealth) has recently been adopted and used in rural communities in developing countries to improve the quality of healthcare in those areas. Some organizations use mHealth applications to track pregnancies and provide routine checkups for pregnant women. Others use mHealth applications to provide treatment and counseling services to HIV/AIDS patients, and still others use them to provide treatment and other healthcare services to the general populations of rural communities. One organization using mobile health to bring primary care for the first time to some of the rural communities of Liberia is Last Mile Health. Since 2015, the organization has trained community health assistants (CHAs) to use a mobile health platform called Data Collection Tools (DCTs). The CHAs use the DCT to collect health data, diagnose and treat patients, provide counseling and educational services to their communities, and refer patients for further care. While the DCT has many great features, it currently has limitations in areas such as data storage and data processing.

Objectives: The goals of this study were to (1) explore some of the mobile health initiatives in developing countries and outline some of their important features, and (2) design a mobile health application (a new version of Last Mile Health's DCT) that incorporates some of the features outlined in objective 1.

Method: A comprehensive literature search in PubMed and Arizona State University (ASU) Library databases was conducted to retrieve publications between 2014 and 2017 that contained phrases like "mHealth design", "mHealth implementation" or "mHealth validation". For a publication to refer to mHealth, it had to contain the term "mHealth," or contain both the term "health" and one of the following terms: mobile phone, cellular phone, mobile device, text message device, mobile technology, mobile telemedicine, mobile monitoring device, interactive voice response device, or disease management device.

Results: The search yielded a total of 1407 publications. Of those, 11 publications met the inclusion criteria and were therefore included in the study. All of the features described in the selected articles were important to Last Mile Health, but due to issues such as internet accessibility and cellular coverage, only five were selected for incorporation into the new version of Last Mile Health's mobile health system. Using software called Configure.it, the new version of the system was built. This new system incorporates features such as user logs, QR codes, reminders, a simple API, and other features identified in the study. It also helps address the data storage and processing problems currently faced by the Last Mile Health organization.
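The inclusion rule described in the Method lends itself to a simple screening filter. The sketch below encodes exactly the stated criteria (the term "mHealth", or "health" together with one of the listed device terms) and is illustrative only; the example titles are made up.

DEVICE_TERMS = [
    "mobile phone", "cellular phone", "mobile device", "text message device",
    "mobile technology", "mobile telemedicine", "mobile monitoring device",
    "interactive voice response device", "disease management device",
]

def is_mhealth(text):
    """Apply the study's stated inclusion criteria to one publication record."""
    t = text.lower()
    return "mhealth" in t or ("health" in t and any(d in t for d in DEVICE_TERMS))

print(is_mhealth("mHealth design for rural clinics"))                        # True
print(is_mhealth("Health counseling delivered by mobile phone in Liberia"))  # True
print(is_mhealth("Hospital staffing trends"))                                # False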
Contributors: Karway, George K. (Author) / Scotch, Matthew (Thesis director) / Kaufman, David (Committee member) / Biomedical Informatics Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2018-05
Description
Phylogenetic analyses conducted in the past lacked the ability to inform and implement useful public health decisions through clustering. Models can now be constructed to support further analyses that yield meaningful data for the future of public health informatics. A phylogenetic tree is considered one of the best ways for researchers to visualize and analyze the evolutionary history of a virus. The focus of this study was to research HIV phylodynamic and phylogenetic methods, which involved identifying fast-growing HIV transmission clusters and transmission rates for certain risk groups in the US. Achieving these results required an HIV database for retrieving real-time sequence data, alignment software for multiple sequence alignment, Bayesian analysis software for developing and manipulating models, and graphical tools for visualizing model output. The study began with a literature review on HIV phylogeography and phylodynamics. Sequence data, initially unaligned, was then obtained from a sequence database and run through multiple sequence alignment software. Once the alignment was performed, the aligned file was loaded into Bayesian analysis software to create a phylogenetic tree model, and the resulting tree was edited in tree visualization software so it could be easily interpreted. The shape of the resulting tree was likely attributable to distant homology among the sequences or to the mixing of certain model parameters. A continuation of this study could use the same aligned sequences with different model parameter selections during initial model creation to see how the output changes, since even a small change in a model parameter can greatly affect the resulting phylogenetic tree.
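One common way to flag putative transmission clusters, sketched below, is single-linkage grouping of sequences whose pairwise genetic distance falls under a threshold (the approach taken by tools like HIV-TRACE). The study itself used Bayesian tree building; this sketch, with toy sequences and an arbitrary cutoff, only illustrates the clustering concept.

def hamming_fraction(a, b):
    """Pairwise distance: fraction of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def clusters(seqs, threshold):
    """Single-linkage clustering of {name: aligned_seq} under a distance cutoff."""
    names = list(seqs)
    groups = {n: {n} for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if hamming_fraction(seqs[a], seqs[b]) <= threshold:
                merged = groups[a] | groups[b]
                for n in merged:           # point every member at the merged group
                    groups[n] = merged
    return {frozenset(g) for g in groups.values()}

toy = {"pt1": "ACGTACGT", "pt2": "ACGTACGA", "pt3": "TTTTACGT"}
print(clusters(toy, threshold=0.2))  # pt1 and pt2 cluster; pt3 stands apart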
Contributors: Nandan, Meghana (Author) / Scotch, Matthew (Thesis director) / Liu, Li (Committee member) / Biomedical Informatics Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2018-05
Description
Mosquito population data is a valuable resource for researchers and public health officials working to limit the spread of deadly zoonotic viruses such as Zika virus and West Nile virus. Unfortunately, this data is currently difficult to obtain and aggregate across the United States. Obtaining historical data often requires filing requests to individual states or counties and hoping for a response. Current online systems for accessing aggregated data lack essential features or are limited in scope. To make mosquito population data more accessible to United States researchers, epidemiologists, and public health officials, the MosquitoDB system has been developed. MosquitoDB consists of a JavaScript web application, connected to a SQL database, that makes submitting and retrieving United States mosquito population data much simpler and more straightforward than alternative systems. The MosquitoDB software project is open source and publicly available on GitHub, allowing community scrutiny and contributions to add or improve necessary features. For this Creative Project, the core MosquitoDB system was designed and developed with three main features: 1) a web interface for querying mosquito data, 2) a web interface for submitting mosquito data, and 3) web services for querying/retrieving and submitting mosquito data. The web interface is essential for common end users, such as researchers and public health officials, to access historical data or submit new data. The web services provide building blocks that other developers can use to incorporate the data into new applications. The current MosquitoDB system is live at https://zodo.asu.edu/mosquito and the public code repository is available at https://github.com/developerDemetri/mosquitodb.
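A hedged sketch of how a client might call MosquitoDB's query web service is shown below. The endpoint path and parameter names are illustrative assumptions, not the documented interface; the GitHub repository above defines the actual API contract.

import requests

# Hypothetical endpoint and query parameters, shown only to illustrate the
# web-service building-block idea described above.
resp = requests.get(
    "https://zodo.asu.edu/mosquito/api/counts",              # assumed path
    params={"state": "AZ", "genus": "Culex", "year": 2016},  # assumed params
    timeout=10,
)
resp.raise_for_status()
for record in resp.json():   # assumes the service returns a JSON list of records
    print(record)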
Contributors: Jones-Shargani, Demetrius Paul (Author) / Scotch, Matthew (Thesis director) / Weissenbacher, Davy (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2016-12
Description
Text mining of biomedical literature and clinical notes is a very active field of research in biomedical science. Semantic analysis is one of the core modules of many Natural Language Processing (NLP) solutions. Methods for calculating the semantic relatedness of two concepts can be very useful in solutions to problems such as relationship extraction, ontology creation, and question answering [1–6]. Several techniques exist for calculating the semantic relatedness of two concepts, utilizing different knowledge sources and corpora. So far, researchers have attempted to find the best hybrid method for each domain by manually combining semantic relatedness techniques and data sources. This work attempts to eliminate the need for such manual combination when targeting new contexts or resources by proposing an automated method that finds the combination of semantic relatedness techniques and resources achieving the best semantic relatedness score in every context. This may help the research community find the best hybrid method for each context given the available algorithms and resources.
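The core search problem can be illustrated with a small grid search: given scores from several relatedness techniques and a gold-standard benchmark, find the weighted combination that best matches human judgments. The dissertation's automated method is more general; the scorers, weights, and data below are toy assumptions.

from itertools import product
from scipy.stats import spearmanr

def best_combination(score_lists, gold, step=0.25):
    """Grid-search convex weights over scorers; return (weights, Spearman rho)."""
    grid = [i * step for i in range(int(1 / step) + 1)]
    best_w, best_rho = None, -1.0
    for w in product(grid, repeat=len(score_lists)):
        if abs(sum(w) - 1.0) > 1e-9:   # keep only convex combinations
            continue
        combined = [sum(wi * s[j] for wi, s in zip(w, score_lists))
                    for j in range(len(gold))]
        rho = spearmanr(combined, gold).correlation
        if rho > best_rho:
            best_w, best_rho = w, rho
    return best_w, best_rho

path_scores = [0.9, 0.2, 0.5, 0.7]    # toy scores from a path-based measure
corpus_scores = [0.8, 0.1, 0.6, 0.9]  # toy scores from a corpus-based measure
gold = [1.0, 0.0, 0.4, 0.8]           # toy human relatedness judgments
print(best_combination([path_scores, corpus_scores], gold))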
Contributors: Emadzadeh, Ehsan (Author) / Gonzalez, Graciela (Thesis advisor) / Greenes, Robert (Committee member) / Scotch, Matthew (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
Accurate quantitative information about tumor/lesion volume plays a critical role in diagnosis and treatment assessment. Current clinical practice emphasizes efficiency but sacrifices accuracy (bias and precision). On the other hand, many computational algorithms focus on improving accuracy but are often time consuming and cumbersome to use, and most lack validation studies on real clinical data. All of this hinders the translation of these advanced methods from bench to bedside.

In this dissertation, I present a user-interactive image application to rapidly extract accurate quantitative information about abnormalities (tumors/lesions) from multi-spectral medical images, such as measuring brain tumor volume from MRI. This is enabled by a GPU level set method, an intelligent algorithm that learns image features from user inputs, and a simple and intuitive graphical user interface with 2D/3D visualization. In addition, a comprehensive workflow is presented for validating quantitative imaging methods in clinical studies.

This application has been evaluated and validated in multiple cases, including quantifying healthy brain white matter volume from MRI and brain lesion volume from CT or MRI. The evaluation studies show that the application achieves results comparable to state-of-the-art computer algorithms. More importantly, the retrospective validation study on measuring intracerebral hemorrhage volume from CT scans demonstrates that its measurement attributes are not only superior to the current practice method in terms of bias and precision but are also achieved without a significant delay in acquisition time. In other words, it could be useful in clinical trials and clinical practice, especially when intervention and prognostication rely upon accurate baseline lesion volume or upon detecting change in serial lesion volumetric measurements. The application is also useful to biomedical research areas that require accurate quantitative information about anatomies from medical images. Because morphological information is retained as well, it can further support research requiring accurate delineation of anatomic structures, such as surgery simulation and planning.
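The final quantitative step, converting a segmentation into a volume, reduces to counting voxels and scaling by the per-voxel volume from the scan header. The sketch below uses a toy thresholded mask in place of the application's level-set segmentation, and the voxel spacing is an assumed value.

import numpy as np

rng = np.random.default_rng(0)
image = rng.normal(size=(64, 64, 24))   # stand-in for an MRI/CT volume
mask = image > 2.5                      # stand-in for the lesion segmentation
voxel_dims_mm = (0.94, 0.94, 5.0)       # assumed voxel spacing from the header (mm)

voxel_volume_ml = np.prod(voxel_dims_mm) / 1000.0  # mm^3 -> milliliters
lesion_volume_ml = mask.sum() * voxel_volume_ml    # voxel count times voxel volume
print(f"lesion volume: {lesion_volume_ml:.2f} mL")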
Contributors: Xue, Wenzhe (Author) / Kaufman, David (Thesis advisor) / Mitchell, J. Ross (Thesis advisor) / Johnson, William (Committee member) / Scotch, Matthew (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
Social media is becoming increasingly popular as a platform for sharing personal health-related information. This information can be utilized for public health monitoring tasks, such as pharmacovigilance, via the use of Natural Language Processing (NLP) techniques. One of the critical steps in information extraction pipelines is Named Entity Recognition (NER), where mentions of entities such as diseases are located in text and their entity types are identified. However, the language in social media is highly informal, and user-expressed health-related concepts are often non-technical, descriptive, and challenging to extract. There has been limited progress in addressing these challenges, and advanced machine learning-based NLP techniques have been underutilized. This work explores the effectiveness of different machine learning techniques, and particularly deep learning, in addressing the challenges associated with extracting health-related concepts from social media. Deep learning has recently attracted a lot of attention in machine learning research and has shown remarkable success in several applications, particularly in imaging and speech recognition. However, thus far, deep learning techniques remain relatively unexplored for biomedical text mining; in particular, this is the first attempt at applying deep learning to health information extraction from social media.

This work presents ADRMine, which uses a Conditional Random Field (CRF) sequence tagger for extraction of complex health-related concepts. It utilizes a large volume of unlabeled user posts for automatic learning of embedding cluster features, a novel application of deep learning in modeling the similarity between tokens. ADRMine significantly improved medical NER performance compared to the baseline systems.

This work also presents DeepHealthMiner, a deep learning pipeline for health-related concept extraction. Most machine learning methods require sophisticated, task-specific manual feature design, which is a challenging step in processing the informal and noisy content of social media. DeepHealthMiner instead learns classification features automatically, using neural networks and a large volume of unlabeled user posts. Using a relatively small labeled training set, DeepHealthMiner could accurately identify most of the concepts, including consumer expressions that were not observed in the training data or in standard medical lexicons, outperforming state-of-the-art baseline techniques.
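A hedged sketch of the embedding-cluster-feature idea behind ADRMine: word vectors learned from unlabeled posts are clustered, and each token's cluster ID becomes a categorical feature for a sequence tagger such as a CRF. The toy vocabulary and 2-D vectors below stand in for embeddings trained on a large unlabeled corpus.

import numpy as np
from sklearn.cluster import KMeans

vocab = ["headache", "migraine", "nausea", "aspirin", "ibuprofen", "tylenol"]
vectors = np.array([
    [0.9, 0.1], [0.8, 0.2], [0.7, 0.3],   # symptom-like region of the toy space
    [0.1, 0.9], [0.2, 0.8], [0.1, 0.7],   # drug-like region of the toy space
])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vectors)
cluster_of = dict(zip(vocab, km.labels_))

def token_features(token):
    """Features for one token; the cluster ID generalizes across rare surface forms."""
    return {"lower": token.lower(),
            "emb_cluster": int(cluster_of.get(token.lower(), -1))}

print(token_features("Migraine"))   # shares a cluster ID with "headache"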
Contributors: Nikfarjam, Azadeh (Author) / Gonzalez, Graciela (Thesis advisor) / Greenes, Robert (Committee member) / Scotch, Matthew (Committee member) / Arizona State University (Publisher)
Created: 2016