Search Content

Time efficient and quality effective K nearest neighbor search in high dimension space

Description

K-Nearest-Neighbors (KNN) search is a fundamental problem in many application domains such as database and data mining, information retrieval, machine learning, pattern recognition and plagiarism detection. Locality sensitive hash (LSH) is so far the most practical approximate KNN search algorithm for high dimensional data. Algorithms such as Multi-Probe LSH and…

K-Nearest-Neighbors (KNN) search is a fundamental problem in many application domains such as database and data mining, information retrieval, machine learning, pattern recognition and plagiarism detection. Locality sensitive hash (LSH) is so far the most practical approximate KNN search algorithm for high dimensional data. Algorithms such as Multi-Probe LSH and LSH-Forest improve upon the basic LSH algorithm by varying hash bucket size dynamically at query time, so these two algorithms can answer different KNN queries adaptively. However, these two algorithms need a data access post-processing step after candidates' collection in order to get the final answer to the KNN query. In this thesis, Multi-Probe LSH with data access post-processing (Multi-Probe LSH with DAPP) algorithm and LSH-Forest with data access post-processing (LSH-Forest with DAPP) algorithm are improved by replacing the costly data access post-processing (DAPP) step with a much faster histogram-based post-processing (HBPP). Two HBPP algorithms: LSH-Forest with HBPP and Multi- Probe LSH with HBPP are presented in this thesis, both of them achieve the three goals for KNN search in large scale high dimensional data set: high search quality, high time efficiency, high space efficiency. None of the previous KNN algorithms can achieve all three goals. More specifically, it is shown that HBPP algorithms can always achieve high search quality (as good as LSH-Forest with DAPP and Multi-Probe LSH with DAPP) with much less time cost (one to several orders of magnitude speedup) and same memory usage. It is also shown that with almost same time cost and memory usage, HBPP algorithms can always achieve better search quality than LSH-Forest with random pick (LSH-Forest with RP) and Multi-Probe LSH with random pick (Multi-Probe LSH with RP). Moreover, to achieve a very high search quality, Multi-Probe with HBPP is always a better choice than LSH-Forest with HBPP, regardless of the distribution, size and dimension number of the data set.

ContributorsYu, Renwei (Author) / Candan, Kasim S (Thesis advisor) / Sapino, Maria L (Committee member) / Chen, Yi (Committee member) / Sundaram, Hari (Committee member) / Arizona State University (Publisher)

Created2011

A study on home based Parkinson's disease monitoring and evaluation: design, development, and evaluation

Description

Parkinson's disease, the most prevalent movement disorder of the central nervous system, is a chronic condition that affects more than 1000,000 U.S. residents and about 3% of the population over the age of 65. The characteristic symptoms include tremors, bradykinesia, rigidity and impaired postural stability. Current therapy based on augmentation…

Parkinson's disease, the most prevalent movement disorder of the central nervous system, is a chronic condition that affects more than 1000,000 U.S. residents and about 3% of the population over the age of 65. The characteristic symptoms include tremors, bradykinesia, rigidity and impaired postural stability. Current therapy based on augmentation or replacement of dopamine is designed to improve patients' motor performance but often leads to levodopa-induced complications, such as dyskinesia and motor fluctuation. With the disease progress, clinicians must closely monitor patients' progress in order to identify any complications or decline in motor function as soon as possible in PD management. Unfortunately, current clinical assessment for Parkinson's is subjective and mostly influenced by brief observations during patient visits. Thus improvement or decline in patients' motor function in between visits is extremely difficult to assess. This may hamper clinicians while making informed decisions about the course of therapy for Parkinson's patients and could negatively impact clinical care. In this study we explored new approaches for PD assessment that aim to provide home-based PD assessment and monitoring. By extending the disease assessment to home, the healthcare burden on patients and their family can be reduced, and the disease progress can be more closely monitored by physicians. To achieve these aims, two novel approaches have been designed, developed and validated. The first approach is a questionnaire based self-evaluation metric, which estimate the PD severity through using self-evaluation score on pre-designed questions. Based on the results of the first approach, a smart phone based approach was invented. The approach takes advantage of the mobile computing technology and clinical decision support approach to evaluate the motor performance of patient daily activity and provide the longitudinal disease assessment and monitoring. Both approaches have been validated on recruited PD patients at the movement disorder program of Barrow Neurological Clinic (BNC) at St Joseph's Hospital and Medical Center. The results of validation tests showed favorable accuracy on detecting and assessing critical symptoms of PD, and shed light on promising future of implementing mobile platform based PD evaluation and monitoring tools to facilitate PD management.

ContributorsPan, Di (Author) / Petitti, Diana (Thesis advisor) / Greenes, Robert (Committee member) / Johnson, William (Committee member) / Dhall, Rohit (Committee member) / Arizona State University (Publisher)

Created2013

An informatics approach to establishing a sustainable public health community

Description

This work involved the analysis of a public health system, and the design, development and deployment of enterprise informatics architecture, and sustainable community methods to address problems with the current public health system. Specifically, assessment of the Nationally Notifiable Disease Surveillance System (NNDSS) was instrumental in forming the design of…

This work involved the analysis of a public health system, and the design, development and deployment of enterprise informatics architecture, and sustainable community methods to address problems with the current public health system. Specifically, assessment of the Nationally Notifiable Disease Surveillance System (NNDSS) was instrumental in forming the design of the current implementation at the Southern Nevada Health District (SNHD). The result of the system deployment at SNHD was considered as a basis for projecting the practical application and benefits of an enterprise architecture. This approach has resulted in a sustainable platform to enhance the practice of public health by improving the quality and timeliness of data, effectiveness of an investigation, and reporting across the continuum.

ContributorsKriseman, Jeffrey Michael (Author) / Dinu, Valentin (Thesis advisor) / Greenes, Robert (Committee member) / Johnson, William (Committee member) / Arizona State University (Publisher)

Created2012

Context-aware adaptive hybrid semantic relatedness in biomedical science

Description

Text mining of biomedical literature and clinical notes is a very active field of research in biomedical science. Semantic analysis is one of the core modules for different Natural Language Processing (NLP) solutions. Methods for calculating semantic relatedness of two concepts can be very useful in solutions solving different problems…

Text mining of biomedical literature and clinical notes is a very active field of research in biomedical science. Semantic analysis is one of the core modules for different Natural Language Processing (NLP) solutions. Methods for calculating semantic relatedness of two concepts can be very useful in solutions solving different problems such as relationship extraction, ontology creation and question / answering [1–6]. Several techniques exist in calculating semantic relatedness of two concepts. These techniques utilize different knowledge sources and corpora. So far, researchers attempted to find the best hybrid method for each domain by combining semantic relatedness techniques and data sources manually. In this work, attempts were made to eliminate the needs for manually combining semantic relatedness methods targeting any new contexts or resources through proposing an automated method, which attempted to find the best combination of semantic relatedness techniques and resources to achieve the best semantic relatedness score in every context. This may help the research community find the best hybrid method for each context considering the available algorithms and resources.

ContributorsEmadzadeh, Ehsan (Author) / Gonzalez, Graciela (Thesis advisor) / Greenes, Robert (Committee member) / Scotch, Matthew (Committee member) / Arizona State University (Publisher)

Created2016

Patient-centered and experience-aware mining for effective information discovery in health forums

Description

Online health forums provide a convenient channel for patients, caregivers, and medical professionals to share their experience, support and encourage each other, and form health communities. The fast growing content in health forums provides a large repository for people to seek valuable information. A forum user can issue a keyword…

Online health forums provide a convenient channel for patients, caregivers, and medical professionals to share their experience, support and encourage each other, and form health communities. The fast growing content in health forums provides a large repository for people to seek valuable information. A forum user can issue a keyword query to search health forums regarding to some specific questions, e.g., what treatments are effective for a disease symptom? A medical researcher can discover medical knowledge in a timely and large-scale fashion by automatically aggregating the latest evidences emerging in health forums.

This dissertation studies how to effectively discover information in health forums. Several challenges have been identified. First, the existing work relies on the syntactic information unit, such as a sentence, a post, or a thread, to bind different pieces of information in a forum. However, most of information discovery tasks should be based on the semantic information unit, a patient. For instance, given a keyword query that involves the relationship between a treatment and side effects, it is expected that the matched keywords refer to the same patient. In this work, patient-centered mining is proposed to mine patient semantic information units. In a patient information unit, the health information, such as diseases, symptoms, treatments, effects, and etc., is connected by the corresponding patient.

Second, the information published in health forums has varying degree of quality. Some information includes patient-reported personal health experience, while others can be hearsay. In this work, a context-aware experience extraction framework is proposed to mine patient-reported personal health experience, which can be used for evidence-based knowledge discovery or finding patients with similar experience.

At last, the proposed patient-centered and experience-aware mining framework is used to build a patient health information database for effectively discovering adverse drug reactions (ADRs) from health forums. ADRs have become a serious health problem and even a leading cause of death in the United States. Health forums provide valuable evidences in a large scale and in a timely fashion through the active participation of patients, caregivers, and doctors. Empirical evaluation shows the effectiveness of the proposed approach.

ContributorsLiu, Yunzhong (Author) / Chen, Yi (Thesis advisor) / Liu, Huan (Thesis advisor) / Li, Baoxin (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created2016

Health information extraction from social media

Description

Social media is becoming increasingly popular as a platform for sharing personal health-related information. This information can be utilized for public health monitoring tasks such as pharmacovigilance via the use of Natural Language Processing (NLP) techniques. One of the critical steps in information extraction pipelines is Named Entity Recognition…

Social media is becoming increasingly popular as a platform for sharing personal health-related information. This information can be utilized for public health monitoring tasks such as pharmacovigilance via the use of Natural Language Processing (NLP) techniques. One of the critical steps in information extraction pipelines is Named Entity Recognition (NER), where the mentions of entities such as diseases are located in text and their entity type are identified. However, the language in social media is highly informal, and user-expressed health-related concepts are often non-technical, descriptive, and challenging to extract. There has been limited progress in addressing these challenges, and advanced machine learning-based NLP techniques have been underutilized. This work explores the effectiveness of different machine learning techniques, and particularly deep learning, to address the challenges associated with extraction of health-related concepts from social media. Deep learning has recently attracted a lot of attention in machine learning research and has shown remarkable success in several applications particularly imaging and speech recognition. However, thus far, deep learning techniques are relatively unexplored for biomedical text mining and, in particular, this is the first attempt in applying deep learning for health information extraction from social media.

This work presents ADRMine that uses a Conditional Random Field (CRF) sequence tagger for extraction of complex health-related concepts. It utilizes a large volume of unlabeled user posts for automatic learning of embedding cluster features, a novel application of deep learning in modeling the similarity between the tokens. ADRMine significantly improved the medical NER performance compared to the baseline systems.

This work also presents DeepHealthMiner, a deep learning pipeline for health-related concept extraction. Most of the machine learning methods require sophisticated task-specific manual feature design which is a challenging step in processing the informal and noisy content of social media. DeepHealthMiner automatically learns classification features using neural networks and utilizing a large volume of unlabeled user posts. Using a relatively small labeled training set, DeepHealthMiner could accurately identify most of the concepts, including the consumer expressions that were not observed in the training data or in the standard medical lexicons outperforming the state-of-the-art baseline techniques.

ContributorsNikfarjam, Azadeh (Author) / Gonzalez, Graciela (Thesis advisor) / Greenes, Robert (Committee member) / Scotch, Matthew (Committee member) / Arizona State University (Publisher)

Created2016

Surgical instrument reprocessing in a hospital setting analyzed with statistical process control and data mining techniques

Description

In a healthcare setting, the Sterile Processing Department (SPD) provides ancillary services to the Operating Room (OR), Emergency Room, Labor & Delivery, and off-site clinics. SPD's function is to reprocess reusable surgical instruments and return them to their home departments. The management of surgical instruments and medical devices can impact…

In a healthcare setting, the Sterile Processing Department (SPD) provides ancillary services to the Operating Room (OR), Emergency Room, Labor & Delivery, and off-site clinics. SPD's function is to reprocess reusable surgical instruments and return them to their home departments. The management of surgical instruments and medical devices can impact patient safety and hospital revenue. Any time instrumentation or devices are not available or are not fit for use, patient safety and revenue can be negatively impacted. One step of the instrument reprocessing cycle is sterilization. Steam sterilization is the sterilization method used for the majority of surgical instruments and is preferred to immediate use steam sterilization (IUSS) because terminally sterilized items can be stored until needed. IUSS Items must be used promptly and cannot be stored for later use. IUSS is intended for emergency situations and not as regular course of action. Unfortunately, IUSS is used to compensate for inadequate inventory levels, scheduling conflicts, and miscommunications. If IUSS is viewed as an adverse event, then monitoring IUSS incidences can help healthcare organizations meet patient safety goals and financial goals along with aiding in process improvement efforts. This work recommends statistical process control methods to IUSS incidents and illustrates the use of control charts for IUSS occurrences through a case study and analysis of the control charts for data from a health care provider. Furthermore, this work considers the application of data mining methods to IUSS occurrences and presents a representative example of data mining to the IUSS occurrences. This extends the application of statistical process control and data mining in healthcare applications.

ContributorsWeart, Gail (Author) / Runger, George C. (Thesis advisor) / Li, Jing (Committee member) / Shunk, Dan (Committee member) / Arizona State University (Publisher)

Created2014

Patient-centered health information technology: engagement with the plan of care among older adults with multi-morbidities

Description

A core principle in multiple national quality improvement strategies is the engagement of chronically ill patients in the creation and execution of their treatment plans. Numerous initiatives are underway to use health information technology (HIT) to support patient engagement however the use of HIT and other factors such as health…

A core principle in multiple national quality improvement strategies is the engagement of chronically ill patients in the creation and execution of their treatment plans. Numerous initiatives are underway to use health information technology (HIT) to support patient engagement however the use of HIT and other factors such as health literacy may be significant barriers to engagement for older adults. This qualitative descriptive study sought to explore the ways that older adults with multi-morbidities engaged with their plan of care. Forty participants were recruited through multiple case sampling from two ambulatory cardiology practices. Participants were English-speaking, without a dementia-related diagnosis, and between the ages of 65 and 86. The older adults in this study performed many behaviors to engage in the plan of care, including acting in ways to support health, managing health-related information, attending routine visits with their doctors, and participating in treatment planning. A subset of patients engaged in active decision-making because of the point they were at in their chronic disease. At that cross roads, they expressed uncertainly over which road to travel. Two factors influenced the engagement of older adults: a relationship with the provider that met the patient's needs, and the distribution of a Meaningful Use clinical summary at the conclusion of the provider visit. Participants described the ways in which the clinical summary helped and hindered their understanding of the care plan.

Insights gained as a result of this study include an understanding of the discrepancies between what the healthcare system expects of patients and their actual behavior when it comes to the creation of a care plan and the ways in which they take care of their health. Further research should examine the ability of various factors to enhance patient engagement. For example, it may be useful to focus on ways to improve the clinical summary to enhance engagement with the care plan and meet standards for a health literate document. Recommendations for the improvement of the clinical summary are provided. Finally, this study explored potential reasons for the infrequent use of online health information by older adults including the trusting relationship they enjoyed with their cardiologist.

ContributorsJiggins Colorafi, Karen (Author) / Lamb, Gerri (Thesis advisor) / Marek, Karen (Committee member) / Greenes, Robert (Committee member) / Evans, Bronwynne (Committee member) / Arizona State University (Publisher)

Created2015

Efficient processing of skyline queries on static data sources, data streams and incomplete datasets

Description

Skyline queries extract interesting points that are non-dominated and help paint the bigger picture of the data in question. They are valuable in many multi-criteria decision applications and are becoming a staple of decision support systems.

An assumption commonly made by many skyline algorithms is that a skyline query is applied…

Skyline queries extract interesting points that are non-dominated and help paint the bigger picture of the data in question. They are valuable in many multi-criteria decision applications and are becoming a staple of decision support systems.

An assumption commonly made by many skyline algorithms is that a skyline query is applied to a single static data source or data stream. Unfortunately, this assumption does not hold in many applications in which a skyline query may involve attributes belonging to multiple data sources and requires a join operation to be performed before the skyline can be produced. Recently, various skyline-join algorithms have been proposed to address this problem in the context of static data sources. However, these algorithms suffer from several drawbacks: they often need to scan the data sources exhaustively to obtain the skyline-join results; moreover, the pruning techniques employed to eliminate tuples are largely based on expensive tuple-to-tuple comparisons. On the other hand, most data stream techniques focus on single stream skyline queries, thus rendering them unsuitable for skyline-join queries.

Another assumption typically made by most of the earlier skyline algorithms is that the data is complete and all skyline attribute values are available. Due to this constraint, these algorithms cannot be applied to incomplete data sources in which some of the attribute values are missing and are represented by NULL values. There exists a definition of dominance for incomplete data, but this leads to undesirable consequences such as non-transitive and cyclic dominance relations both of which are detrimental to skyline processing.

Based on the aforementioned observations, the main goal of the research described in this dissertation is the design and development of a framework of skyline operators that effectively handles three distinct types of skyline queries: 1) skyline-join queries on static data sources, 2) skyline-window-join queries over data streams, and 3) strata-skyline queries on incomplete datasets. This dissertation presents the unique challenges posed by these skyline queries and addresses the shortcomings of current skyline techniques by proposing efficient methods to tackle the added overhead in processing skyline queries on static data sources, data streams, and incomplete datasets.

ContributorsNagendra, Mithila (Author) / Candan, Kasim Selcuk (Thesis advisor) / Chen, Yi (Committee member) / Davulcu, Hasan (Committee member) / Silva, Yasin N. (Committee member) / Sundaram, Hari (Committee member) / Arizona State University (Publisher)

Created2014

Filtering by