Robust margin based classifiers for small sample data

Description

In many classification problems, data samples cannot be collected easily, for example in drug trials, biological experiments, and studies of cancer patients. In many such situations the data set is small and contains many outliers. When classifying such data, for example cancer versus normal patients, the consequences of misclassification are probably more important than for any other data type, because the data point could be a cancer patient, or the classification decision could help determine which gene might be overexpressed and perhaps a cause of cancer. Misclassifications are typically more frequent in the presence of outlier data points. The aim of this thesis is to develop a maximum margin classifier that addresses the lack of robustness of discriminant-based classifiers (like the Support Vector Machine (SVM)) to noise and outliers. The underlying notion is to adopt and develop a natural loss function that is more robust to outliers and more representative of the true loss function of the data. It is demonstrated experimentally that SVMs are indeed susceptible to outliers and that the new classifier developed, here coined Robust-SVM (RSVM), is superior to all studied classifiers on the synthetic datasets. It is superior to the SVM on both the synthetic data and the experimental data from biomedical studies, and is competitive with a classifier derived along similar lines when real-life data examples are considered.
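To illustrate why the choice of loss function matters for outliers, the sketch below compares the standard SVM hinge loss with a truncated ("ramp") variant, which is one common robust alternative. The abstract does not say which loss RSVM uses, so the ramp loss, the function names, and the truncation parameter `s` are assumptions made purely for illustration.

```python
def hinge_loss(margin):
    """Standard SVM hinge loss; grows without bound as the margin gets more negative."""
    return max(0.0, 1.0 - margin)


def ramp_loss(margin, s=-1.0):
    """Truncated hinge loss, capped at 1 - s.

    The cap bounds the influence of a badly misclassified point.
    The value of s is a hypothetical choice for this sketch.
    """
    return min(hinge_loss(margin), 1.0 - s)


# margin = y * f(x); a strongly negative margin models a far-away outlier
for margin in (2.0, 0.5, -1.0, -10.0):
    print(margin, hinge_loss(margin), ramp_loss(margin))
```

Under the hinge loss, the outlier at margin -10 contributes a loss of 11, while the capped loss limits it to 2, so a single extreme point pulls less on the decision boundary.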

Contributors: Gupta, Sidharth (Author) / Kim, Seungchan (Thesis advisor) / Welfert, Bruno (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)

Created: 2011

Informatics approach to improving surgical skills training

Description

Surgery as a profession requires significant training to improve both clinical decision making and psychomotor proficiency. In the medical knowledge domain, tools have been developed, validated, and accepted for evaluation of surgeons' competencies. However, assessment of psychomotor skills still relies on the Halstedian model of apprenticeship, wherein surgeons are observed during residency for judgment of their skills. Although the value of this method of skills assessment cannot be ignored, novel methodologies of objective skills assessment need to be designed, developed, and evaluated to augment the traditional approach. Several sensor-based systems have been developed to measure a user's skill quantitatively, but the use of sensors could interfere with skill execution and thus limit the potential for evaluating real-life surgery. A method to judge skills automatically in real-life conditions should be the ultimate goal, since only with such features would a system be widely adopted. This research proposes a novel video-based approach for observing surgeons' hand and surgical tool movements in minimally invasive surgical training exercises as well as during laparoscopic surgery. Because our system does not require surgeons to wear special sensors, it has the distinct advantage over alternatives of offering skills assessment in both learning and real-life environments. The system automatically detects major skill-measuring features from surgical task videos using a computing system composed of a series of computer vision algorithms and provides on-screen real-time performance feedback for more efficient skill learning. Finally, a machine-learning approach is used to develop an observer-independent composite scoring model through objective and quantitative measurement of surgical skills. To increase the effectiveness and usability of the developed system, it is integrated with a cloud-based tool, which automatically assesses surgical videos uploaded to the cloud.

Contributors: Islam, Gazi (Author) / Li, Baoxin (Thesis advisor) / Liang, Jianming (Thesis advisor) / Dinu, Valentin (Committee member) / Greenes, Robert (Committee member) / Smith, Marshall (Committee member) / Kahol, Kanav (Committee member) / Patel, Vimla L. (Committee member) / Arizona State University (Publisher)

Created: 2013

miRNA Targeting: In depth review of biologically significant mechanisms and a bioinformatic approach to identifying targeting sequences in C. elegans

Description

microRNAs (miRNAs) are short ~22nt non-coding RNAs that regulate gene output at the post-transcriptional level. Via targeting of degenerate elements primarily in 3' untranslated regions (3'UTRs) of mRNAs, miRNAs can target thousands of varying genes and suppress their protein translation. The precise mechanistic function and biological role of miRNAs are not fully understood, and yet miRNAs are major contributors to a plethora of diseases, including neurological disorders, muscular disorders, and cancer. Certain model organisms are valuable in understanding the function of miRNA and therefore in fully understanding the biological significance of miRNA targeting. Here I report a mechanistic analysis of miRNA targeting in C. elegans, and a bioinformatic approach to aid in further investigation of miRNA targeted sequences. A few of the biologically significant mechanisms discussed in this thesis include alternative polyadenylation, RNA binding proteins, components of the miRNA recognition machinery, miRNA secondary structures, and their polymorphisms. This thesis also discusses a novel bioinformatic approach to studying miRNA biology, including computational miRNA target prediction software and sequence complementarity. This thesis allows a better understanding of miRNA biology and presents an ideal strategy for approaching future research in miRNA targeting.
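As a rough illustration of the sequence complementarity idea mentioned above, the sketch below scans a 3'UTR for perfect matches to a miRNA seed. It assumes the common convention that the seed is nucleotides 2 to 8 of the miRNA; the function names and the toy sequences are hypothetical, and this is not the prediction method used in the thesis.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (A/U/G/C only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna, utr, seed_start=1, seed_end=8):
    """Return 0-based UTR positions that perfectly pair with the miRNA seed.

    The default slice [1:8] selects nucleotides 2-8 (an assumed seed definition).
    """
    site = reverse_complement(mirna[seed_start:seed_end])
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


# Toy inputs for illustration only
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "AAACUACCUCAAA"
print(seed_sites(mirna, utr))
```

Real target prediction tools weigh further evidence, such as site context and conservation, so a perfect seed match should be read only as a candidate site.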

Contributors: Weigele, Dustin Keith (Author) / Mangone, Marco (Thesis director) / Katchman, Benjamin (Committee member) / Barrett, The Honors College (Contributor) / Department of Chemistry and Biochemistry (Contributor) / School of Life Sciences (Contributor)

Created: 2014-12

Preliminary Metabolic Reconstruction of Two Methane Producing Microbes: Methanoregula boonei 6A8 and Methanosphaerula palustris E1-9c

Description

Methane (CH4) is very important in the environment as it is a greenhouse gas and important for the degradation of organic matter. During the last 200 years the atmospheric concentration of CH4 has tripled. Methanogens are methane-producing microbes from the Archaea domain that complete the final step in breaking down organic matter to generate methane through a process called methanogenesis. They contribute about 74% of the CH4 present in the Earth's atmosphere, producing 1 billion tons of methane annually. The purpose of this work is to generate a preliminary metabolic reconstruction model of two methanogens: Methanoregula boonei 6A8 and Methanosphaerula palustris E1-9c. M. boonei and M. palustris are part of the Methanomicrobiales order and perform hydrogenotrophic methanogenesis, which means that they reduce CO2 to CH4 by using H2 as their major electron donor. Metabolic models are frameworks for understanding a cell as a system, and they provide the means to assess changes in gene regulation in response to various environmental and physiological constraints. The Pathway-Tools software v16 was used to generate these draft models. The models were manually curated using literature searches, the KEGG database, and homology methods with the Methanosarcina acetivorans strain, the closest methanogen strain with a nearly complete metabolic reconstruction. These preliminary models attempt to complete the pathways required for amino acid biosynthesis, methanogenesis, and major cofactors related to methanogenesis. The M. boonei reconstruction currently includes 99 pathways and has 82% of its reactions completed, while the M. palustris reconstruction includes 102 pathways and has 89% of its reactions completed.

Contributors: Mahendra, Divya (Author) / Cadillo-Quiroz, Hinsby (Thesis director) / Wang, Xuan (Committee member) / Stout, Valerie (Committee member) / Barrett, The Honors College (Contributor) / Computing and Informatics Program (Contributor) / School of Life Sciences (Contributor) / Biomedical Informatics Program (Contributor)

Created: 2014-05

Transcriptome gene expression analysis of breast cancer using RNA-Seq

Description

Background: Breast cancer is the most frequently diagnosed cancer and the leading cause of cancer deaths in females worldwide, accounting for 23% of all new cancer cases and 14% of all cancer deaths in 2008. Five tumor-normal pairs of primary breast epithelial cells were treated for infinite proliferation by using a ROCK inhibitor and mouse feeder cells. Methods: Raw paired-end, 100x coverage RNA-Seq data were aligned to the Human Reference Genome Version 19 using BWA and Tophat. Gene differential expression analysis was completed using Cufflinks and Cuffdiff. Interactive Genome Viewer was used for data visualization. Results: 15 genes were found to be down-regulated by at least one log-fold change in 4/5 of the tumor samples. 75 genes were found to be down-regulated in 3/5 of the tumor samples by at least one log-fold change. 11 genes were found to be up-regulated in 4/5 of the tumor samples, and 68 genes were identified to be up-regulated in 3/5 of the tumor samples by at least one-fold change. Conclusion: Expression changes in genes such as AZGP1, AGER, ALG11, and S1007 suggest a disruption in the glycosylation pathway. No correlation was found between Cufflinks' Her2 gene expression and DAKO score classification.
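The "at least one log-fold change" criterion can be sketched as follows. This is a minimal illustration, not the Cuffdiff computation: it assumes base-2 logarithms, expression values per tumor-normal pair, and a pseudocount of 1 to avoid division by zero. All names and numbers are hypothetical.

```python
import math


def log2_fold_change(tumor, normal, pseudocount=1.0):
    """log2 ratio of tumor to normal expression; the pseudocount is an assumption."""
    return math.log2((tumor + pseudocount) / (normal + pseudocount))


def pairs_down_regulated(pairs, threshold=1.0):
    """Count tumor-normal pairs in which the gene is down by at least `threshold` log2 units."""
    return sum(1 for tumor, normal in pairs
               if log2_fold_change(tumor, normal) <= -threshold)


# Toy expression values for one gene across five tumor-normal pairs
gene_pairs = [(1, 7), (0, 3), (2, 2), (1, 15), (0, 1)]
print(pairs_down_regulated(gene_pairs))
```

A gene would then count as recurrently down-regulated if this number reaches a chosen cutoff, such as 4 of 5 pairs.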

Contributors: Hernandez, Fernando (Author) / Anderson, Karen (Thesis director) / Mangone, Marco (Committee member) / Park, Jin (Committee member) / Barrett, The Honors College (Contributor) / Department of Information Systems (Contributor)

Created: 2013-05

BioEve: user interface framework bridging IE and IR

Description

Continuous advancements in biomedical research have resulted in the production of vast amounts of scientific data and literature discussing them. The ultimate goal of computational biology is to translate these large amounts of data into actual knowledge of the complex biological processes and accurate life science models. The ability to rapidly and effectively survey the literature is necessary for the creation of large scale models of the relationships among biomedical entities as well as hypothesis generation to guide biomedical research. To reduce the effort and time spent in performing these activities, an intelligent search system is required. Even though many systems aid in navigating through this wide collection of documents, the vastness and depth of this information overload can be overwhelming. An automated extraction system coupled with a cognitive search and navigation service over these document collections would not only save time and effort, but also facilitate discovery of the unknown information implicitly conveyed in the texts. This thesis presents the different approaches used for large scale biomedical named entity recognition, and the challenges faced in each. It also proposes BioEve: an integrative framework to fuse a faceted search with information extraction to provide a search service that addresses the user's desire for "completeness" of the query results, not just the top-ranked ones. This information extraction system enables discovery of important semantic relationships between entities such as genes, diseases, drugs, and cell lines and events from biomedical text on MEDLINE, which is the largest publicly available database of the world's biomedical journal literature. It is an innovative search and discovery service that makes it easier to search, navigate and discover knowledge hidden in life sciences literature. To demonstrate the utility of this system, this thesis also details a prototype enterprise quality search and discovery service that helps researchers with a guided step-by-step query refinement, by suggesting concepts enriched in intermediate results, and thereby facilitating the "discover more as you search" paradigm.

Contributors: Kanwar, Pradeep (Author) / Davulcu, Hasan (Thesis advisor) / Dinu, Valentin (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)

Created: 2010

A review of pathway-based visualization and quantification analysis tools using microarray data

Description

Pathway analysis helps researchers gain insight into the biology behind gene expression-based data. By applying these data to known biological pathways, we can learn about mutations or other changes in cellular function, such as those seen in cancer. There are many tools that can be used to analyze pathways; however, it can be difficult to find and learn which tool is optimal for use in a certain experiment. This thesis aims to comprehensively review four tools, Cytoscape, PaxtoolsR, PathOlogist, and Reactome, and their role in pathway analysis. This is done by applying a known microarray data set to each tool and testing their different functions. The functions of these programs will then be analyzed to determine their roles in learning about biology and assisting new researchers with their experiments. It was found that each tool holds a unique and important role in pathway analysis. Visualization pathways have the role of exploring individual pathways and interpreting genomic results. Quantification pathways use statistical tests to determine pathway significance. Together, one can find pathways of interest and then explore areas of interest.

Contributors: Rehling, Thomas Evan (Author) / Buetow, Kenneth (Thesis director) / Wilson, Melissa (Committee member) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2020-05

Differential Gene Expression in Type II Diabetes

Description

This research project investigated known and novel differential genetic variants and their associated molecular pathways involved in Type II diabetes mellitus for the purpose of improving diagnosis and treatment methods. The goal of this investigation was to 1) identify the genetic variants and SNPs in Type II diabetes to develop a gene regulatory pathway, and 2) utilize this pathway to determine suitable drug therapeutics for prevention and treatment. Using a Gene Set Enrichment Analysis (GSEA), a set of 1000 gene identifiers from a Mayo Clinic database was analyzed to determine the most significant genetic variants related to insulin signaling pathways involved in Type II Diabetes. The following genes were identified: NRAS, KRAS, PIK3CA, PDE3B, TSC1, AKT3, SOS1, NEU1, PRKAA2, AMPK, and ACC. In an extensive literature review and cross-analysis with the KEGG and Reactome pathway databases, novel SNPs located on these gene variants were identified and used to determine suitable drug therapeutics for treatment. Overall, understanding how genetic mutations affect target gene function related to Type II Diabetes disease pathology is crucial to the development of effective diagnosis and treatment. This project provides new insight into the molecular basis of Type II Diabetes, serving to help untangle the regulatory complexity of the disease and aid in the advancement of diagnosis and treatment. Keywords: Type II Diabetes mellitus, Gene Set Enrichment Analysis, genetic variants, KEGG Insulin Pathway, gene-regulatory pathway

Contributors: Bucklin, Lindsay (Co-author) / Davis, Vanessa (Co-author) / Holechek, Susan (Thesis director) / Wang, Junwen (Committee member) / Nyarige, Verah (Committee member) / School of Human Evolution & Social Change (Contributor) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2019-05

A Curation of the Callithrix penicillata Draft Genome

Description

Callithrix penicillata, also known as the Black-tufted marmoset, primarily lives in the Brazilian highlands and has had little research conducted on it. For this project I performed a genome curation on the newly assembled genome of this species. The scaffolds obtained from the Dovetail Genomics reads were organized and labeled into chromosomes using the 2014 Callithrix jacchus genome as a reference. Then, using that same genome as a reference, 13 of the chromosomes were reverse complemented to be continuous with the 2014 Callithrix jacchus genome. The N50 of the assembly was calculated and found to be 124 Mb. Quality scores were run for the final genome using referee and visualized with a bar plot, with 99% of sites scoring above 0. Heterozygosity was also calculated and found to be 0.3%. Finally, the final version of the genome was visually compared to the 2017 Callithrix jacchus genome and the GRCh38 human genome. This genome was submitted to the NCBI database to await further approval.
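For readers unfamiliar with the N50 statistic reported above, here is a minimal sketch of the usual definition: the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the assembly size. The function name and the toy lengths are hypothetical, and this is not the tool used in the thesis.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first scaffold that brings the running sum to half the total
        if running * 2 >= total:
            return length
    raise ValueError("no scaffold lengths given")


# Toy scaffold lengths (arbitrary units), not data from the assembly
print(n50([100, 80, 50, 30, 20, 10, 10]))
```

In this toy case the total is 300, and the two longest scaffolds (100 and 80) together pass the halfway mark of 150, so the N50 is 80.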

Contributors: Johnson, Joelle Genevieve (Author) / Cartwright, Reed (Thesis director) / Stone, Anne (Committee member) / School of Molecular Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2019-12

Evaluating variant calling best practices

Description

Analyzing human DNA sequence data allows researchers to identify variants associated with disease, reconstruct the demographic histories of human populations, and further understand the structure and function of the genome. Identifying variants in whole genome sequences is a crucial bioinformatics step in sequence data processing and can be performed using multiple approaches. To investigate the consistency between different bioinformatics methods, we compared the accuracy and sensitivity of two genotyping strategies, joint variant calling and single-sample variant calling. Autosomal and sex chromosome variant call sets were produced by joint and single-sample variant calling for 10 female individuals. The accuracy of variant calls was assessed using SNP array genotype data collected from each individual. To compare the ability of joint and single-sample calling to capture low-frequency variants, folded site frequency spectra were constructed from variant call sets. To investigate the potential for these different variant calling methods to impact downstream analyses, we estimated nucleotide diversity for call sets produced using each approach. We found that while both methods were equally accurate when validated by SNP array sites, single-sample calling identified a greater number of singletons. However, estimates of nucleotide diversity were robust to these differences in the site frequency spectrum between call sets. Our results suggest that despite single-sample calling’s greater sensitivity for low-frequency variants, the differences between approaches have a minimal effect on downstream analyses. While joint calling may be a more efficient approach for genotyping many samples, in situations that preclude large sample sizes, our study suggests that single-sample calling is a suitable alternative.
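The two summary statistics named above can be sketched in a few lines. The code assumes each site is summarized by its alternate allele count among a fixed number of sampled chromosomes (20 for 10 diploid individuals on the autosomes), and uses the standard per-site estimator of nucleotide diversity. Function names and counts are hypothetical, not taken from the study.

```python
from collections import Counter


def folded_sfs(alt_counts, n_chromosomes):
    """Folded site frequency spectrum: number of sites per minor allele count.

    Index 0 of the result holds singletons (minor allele count 1).
    Monomorphic sites (count 0 or n_chromosomes) are skipped.
    """
    minor = Counter(min(k, n_chromosomes - k)
                    for k in alt_counts if 0 < k < n_chromosomes)
    return [minor.get(c, 0) for c in range(1, n_chromosomes // 2 + 1)]


def nucleotide_diversity(alt_counts, n_chromosomes, n_sites):
    """Average pairwise differences per site, from per-site allele counts."""
    n = n_chromosomes
    total = sum(2 * k * (n - k) / (n * (n - 1)) for k in alt_counts)
    return total / n_sites


# Toy alternate allele counts at seven sites
counts = [1, 19, 2, 10, 20, 0, 1]
print(folded_sfs(counts, 20))
print(nucleotide_diversity(counts, 20, n_sites=7))
```

Extra singletons add only a small amount to the diversity estimate (0.1 per singleton before dividing by the number of sites in this toy setting), which is consistent with diversity being fairly insensitive to differences at the rare end of the spectrum.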

Contributors: Howell, Emma (Co-author) / Wilson, Melissa (Thesis director) / Stone, Anne (Committee member) / Phung, Tanya (Committee member) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2020-05