Matching Items (41)

129063-Thumbnail Image.png

Disease Gene Prioritization by Integrating Tissue-Specific Molecular Networks Using a Robust Multi-Network Model

Description

Background: Accurately prioritizing candidate disease genes is an important and challenging problem. Various network-based methods have been developed to predict potential disease genes by utilizing the disease similarity network and molecular networks such as protein interaction or gene co-expression networks.

Background: Accurately prioritizing candidate disease genes is an important and challenging problem. Various network-based methods have been developed to predict potential disease genes by utilizing the disease similarity network and molecular networks such as protein interaction or gene co-expression networks. Although successful, a common limitation of the existing methods is that they assume all diseases share the same molecular network and a single generic molecular network is used to predict candidate genes for all diseases. However, different diseases tend to manifest in different tissues, and the molecular networks in different tissues are usually different. An ideal method should be able to incorporate tissue-specific molecular networks for different diseases.

Results: In this paper, we develop a robust and flexible method to integrate tissue-specific molecular networks for disease gene prioritization. Our method allows each disease to have its own tissue-specific network(s). We formulate the problem of candidate gene prioritization as an optimization problem based on network propagation. When there are multiple tissue-specific networks available for a disease, our method can automatically infer the relative importance of each tissue-specific network. Thus it is robust to the noisy and incomplete network data. To solve the optimization problem, we develop fast algorithms which have linear time complexities in the number of nodes in the molecular networks. We also provide rigorous theoretical foundations for our algorithms in terms of their optimality and convergence properties. Extensive experimental results show that our method can significantly improve the accuracy of candidate gene prioritization compared with the state-of-the-art methods.

Conclusions: In our experiments, we compare our methods with 7 popular network-based disease gene prioritization algorithms on diseases from Online Mendelian Inheritance in Man (OMIM) database. The experimental results demonstrate that our methods recover true associations more accurately than other methods in terms of AUC values, and the performance differences are significant (with paired t-test p-values less than 0.05). This validates the importance to integrate tissue-specific molecular networks for studying disease gene prioritization and show the superiority of our network models and ranking algorithms toward this purpose. The source code and datasets are available at http:/ijingchao.github.io/CRstar/.

Contributors

Agent

Created

Date Created
2016-11-10

Large-Scale Off-Target Identification Using Fast and Accurate Dual Regularized One-Class Collaborative Filtering and Its Application to Drug Repurposing

Description

Target-based screening is one of the major approaches in drug discovery. Besides the intended target, unexpected drug off-target interactions often occur, and many of them have not been recognized and characterized. The off-target interactions can be responsible for either therapeutic

Target-based screening is one of the major approaches in drug discovery. Besides the intended target, unexpected drug off-target interactions often occur, and many of them have not been recognized and characterized. The off-target interactions can be responsible for either therapeutic or side effects. Thus, identifying the genome-wide off-targets of lead compounds or existing drugs will be critical for designing effective and safe drugs, and providing new opportunities for drug repurposing. Although many computational methods have been developed to predict drug-target interactions, they are either less accurate than the one that we are proposing here or computationally too intensive, thereby limiting their capability for large-scale off-target identification. In addition, the performances of most machine learning based algorithms have been mainly evaluated to predict off-target interactions in the same gene family for hundreds of chemicals. It is not clear how these algorithms perform in terms of detecting off-targets across gene families on a proteome scale.

Here, we are presenting a fast and accurate off-target prediction method, REMAP, which is based on a dual regularized one-class collaborative filtering algorithm, to explore continuous chemical space, protein space, and their interactome on a large scale. When tested in a reliable, extensive, and cross-gene family benchmark, REMAP outperforms the state-of-the-art methods. Furthermore, REMAP is highly scalable. It can screen a dataset of 200 thousands chemicals against 20 thousands proteins within 2 hours. Using the reconstructed genome-wide target profile as the fingerprint of a chemical compound, we predicted that seven FDA-approved drugs can be repurposed as novel anti-cancer therapies. The anti-cancer activity of six of them is supported by experimental evidences. Thus, REMAP is a valuable addition to the existing in silico toolbox for drug target identification, drug repurposing, phenotypic screening, and side effect prediction. The software and benchmark are available at https://github.com/hansaimlim/REMAP.

Contributors

Agent

Created

Date Created
2016-10-07

153901-Thumbnail Image.png

Online ET-LDA - joint modeling of events and their related tweets with online streaming data

Description

Micro-blogging platforms like Twitter have become some of the most popular sites for people to share and express their views and opinions about public events like debates, sports events or other news articles. These social updates by people complement the

Micro-blogging platforms like Twitter have become some of the most popular sites for people to share and express their views and opinions about public events like debates, sports events or other news articles. These social updates by people complement the written news articles or transcripts of events in giving the popular public opinion about these events. So it would be useful to annotate the transcript with tweets. The technical challenge is to align the tweets with the correct segment of the transcript. ET-LDA by Hu et al [9] addresses this issue by modeling the whole process with an LDA-based graphical model. The system segments the transcript into coherent and meaningful parts and also determines if a tweet is a general tweet about the event or it refers to a particular segment of the transcript. One characteristic of the Hu et al’s model is that it expects all the data to be available upfront and uses batch inference procedure. But in many cases we find that data is not available beforehand, and it is often streaming. In such cases it is infeasible to repeatedly run the batch inference algorithm. My thesis presents an online inference algorithm for the ET-LDA model, with a continuous stream of tweet data and compare their runtime and performance to existing algorithms.

Contributors

Agent

Created

Date Created
2015

153832-Thumbnail Image.png

Secure Mobile SDN

Description

The increasing usage of smart-phones and mobile devices in work environment and IT

industry has brought about unique set of challenges and opportunities. ARM architecture

in particular has evolved to a point where it supports implementations across wide spectrum

of performance points and

The increasing usage of smart-phones and mobile devices in work environment and IT

industry has brought about unique set of challenges and opportunities. ARM architecture

in particular has evolved to a point where it supports implementations across wide spectrum

of performance points and ARM based tablets and smart-phones are in demand. The

enhancements to basic ARM RISC architecture allow ARM to have high performance,

small code size, low power consumption and small silicon area. Users want their devices to

perform many tasks such as read email, play games, and run other online applications and

organizations no longer desire to provision and maintain individual’s IT equipment. The

term BYOD (Bring Your Own Device) has come into being from demand of such a work

setup and is one of the motivation of this research work. It brings many opportunities such

as increased productivity and reduced costs and challenges such as secured data access,

data leakage and amount of control by the organization.

To provision such a framework we need to bridge the gap from both organizations side

and individuals point of view. Mobile device users face issue of application delivery on

multiple platforms. For instance having purchased many applications from one proprietary

application store, individuals may want to move them to a different platform/device but

currently this is not possible. Organizations face security issues in providing such a solution

as there are many potential threats from allowing BYOD work-style such as unauthorized

access to data, attacks from the devices within and outside the network.

ARM based Secure Mobile SDN framework will resolve these issues and enable employees

to consolidate both personal and business calls and mobile data access on a single device.

To address application delivery issue we are introducing KVM based virtualization that

will allow host OS to run multiple guest OS. To address the security problem we introduce

SDN environment where host would be running bridged network of guest OS using Open

vSwitch . This would allow a remote controller to monitor the state of guest OS for making

important control and traffic flow decisions based on the situation.

Contributors

Agent

Created

Date Created
2015

153574-Thumbnail Image.png

Breaking hash-tag detection algorithm for social media (Twitter)

Description

In trading, volume is a measure of how much stock has been exchanged in a given period of time. Since every stock is distinctive and has an alternate measure of shares, volume can be contrasted with historical volume inside a

In trading, volume is a measure of how much stock has been exchanged in a given period of time. Since every stock is distinctive and has an alternate measure of shares, volume can be contrasted with historical volume inside a stock to spot changes. It is likewise used to affirm value patterns, breakouts, and spot potential reversals. In my thesis, I hypothesize that the concept of trading volume can be extrapolated to social media (Twitter).

The ubiquity of social media, especially Twitter, in financial market has been overly resonant in the past couple of years. With the growth of its (Twitter) usage by news channels, financial experts and pandits, the global economy does seem to hinge on 140 characters. By analyzing the number of tweets hash tagged to a stock, a strong relation can be established between the number of people talking about it, to the trading volume of the stock.

In my work, I overt this relation and find a state of the breakout when the volume goes beyond a characterized support or resistance level.

Contributors

Agent

Created

Date Created
2015

154769-Thumbnail Image.png

Directional prediction of stock prices using breaking news on Twitter

Description

Stock market news and investing tips are popular topics in Twitter. In this dissertation, first I utilize a 5-year financial news corpus comprising over 50,000 articles collected from the NASDAQ website matching the 30 stock symbols in Dow Jones Index

Stock market news and investing tips are popular topics in Twitter. In this dissertation, first I utilize a 5-year financial news corpus comprising over 50,000 articles collected from the NASDAQ website matching the 30 stock symbols in Dow Jones Index (DJI) to train a directional stock price prediction system based on news content. Next, I proceed to show that information in articles indicated by breaking Tweet volumes leads to a statistically significant boost in the hourly directional prediction accuracies for the DJI stock prices mentioned in these articles. Secondly, I show that using document-level sentiment extraction does not yield a statistically significant boost in the directional predictive accuracies in the presence of other 1-gram keyword features. Thirdly I test the performance of the system on several time-frames and identify the 4 hour time-frame for both the price charts and for Tweet breakout detection as the best time-frame combination. Finally, I develop a set of price momentum based trade exit rules to cut losing trades early and to allow the winning trades run longer. I show that the Tweet volume breakout based trading system with the price momentum based exit rules not only improves the winning accuracy and the return on investment, but it also lowers the maximum drawdown and achieves the highest overall return over maximum drawdown.

Contributors

Agent

Created

Date Created
2016

154545-Thumbnail Image.png

Automatic tracking of linguistic changes for monitoring cognitive-linguistic health

Description

Many neurological disorders, especially those that result in dementia, impact speech and language production. A number of studies have shown that there exist subtle changes in linguistic complexity in these individuals that precede disease onset. However, these studies are conducted

Many neurological disorders, especially those that result in dementia, impact speech and language production. A number of studies have shown that there exist subtle changes in linguistic complexity in these individuals that precede disease onset. However, these studies are conducted on controlled speech samples from a specific task. This thesis explores the possibility of using natural language processing in order to detect declining linguistic complexity from more natural discourse. We use existing data from public figures suspected (or at risk) of suffering from cognitive-linguistic decline, downloaded from the Internet, to detect changes in linguistic complexity. In particular, we focus on two case studies. The first case study analyzes President Ronald Reagan’s transcribed spontaneous speech samples during his presidency. President Reagan was diagnosed with Alzheimer’s disease in 1994, however my results showed declining linguistic complexity during the span of the 8 years he was in office. President George Herbert Walker Bush, who has no known diagnosis of Alzheimer’s disease, shows no decline in the same measures. In the second case study, we analyze transcribed spontaneous speech samples from the news conferences of 10 current NFL players and 18 non-player personnel since 2007. The non-player personnel have never played professional football. Longitudinal analysis of linguistic complexity showed contrasting patterns in the two groups. The majority (6 of 10) of current players showed decline in at least one measure of linguistic complexity over time. In contrast, the majority (11 out of 18) of non-player personnel showed an increase in at least one linguistic complexity measure.

Contributors

Agent

Created

Date Created
2016

162017-Thumbnail Image.png

Learning from the Data Heterogeneity for Data Imputation

Description

Data mining, also known as big data analysis, has been identified as a critical and challenging process for a variety of applications in real-world problems. Numerous datasets are collected and generated every day to store the information. The rise in

Data mining, also known as big data analysis, has been identified as a critical and challenging process for a variety of applications in real-world problems. Numerous datasets are collected and generated every day to store the information. The rise in the number of data volumes and data modality has resulted in the increased demand for data mining methods and strategies of finding anomalies, patterns, and correlations within large data sets to predict outcomes. Effective machine learning methods are widely adapted to build the data mining pipeline for various purposes like business understanding, data understanding, data preparation, modeling, evaluation, and deployment. The major challenges for effectively and efficiently mining big data include (1) data heterogeneity and (2) missing data. Heterogeneity is the natural characteristic of big data, as the data is typically collected from different sources with diverse formats. The missing value is the most common issue faced by the heterogeneous data analysis, which resulted from variety of factors including the data collecting processing, user initiatives, erroneous data entries, and so on. In response to these challenges, in this thesis, three main research directions with application scenarios have been investigated: (1) Mining and Formulating Heterogeneous Data, (2) missing value imputation strategy in various application scenarios in both offline and online manner, and (3) missing value imputation for multi-modality data. Multiple strategies with theoretical analysis are presented, and the evaluation of the effectiveness of the proposed algorithms compared with state-of-the-art methods is discussed.

Contributors

Agent

Created

Date Created
2021

156468-Thumbnail Image.png

Study of Knowledge Transfer Techniques For Deep Learning on Edge Devices

Description

With the emergence of edge computing paradigm, many applications such as image recognition and augmented reality require to perform machine learning (ML) and artificial intelligence (AI) tasks on edge devices. Most AI and ML models are large and computational heavy,

With the emergence of edge computing paradigm, many applications such as image recognition and augmented reality require to perform machine learning (ML) and artificial intelligence (AI) tasks on edge devices. Most AI and ML models are large and computational heavy, whereas edge devices are usually equipped with limited computational and storage resources. Such models can be compressed and reduced in order to be placed on edge devices, but they may loose their capability and may not generalize and perform well compared to large models. Recent works used knowledge transfer techniques to transfer information from a large network (termed teacher) to a small one (termed student) in order to improve the performance of the latter. This approach seems to be promising for learning on edge devices, but a thorough investigation on its effectiveness is lacking.

The purpose of this work is to provide an extensive study on the performance (both in terms of accuracy and convergence speed) of knowledge transfer, considering different student-teacher architectures, datasets and different techniques for transferring knowledge from teacher to student.

A good performance improvement is obtained by transferring knowledge from both the intermediate layers and last layer of the teacher to a shallower student. But other architectures and transfer techniques do not fare so well and some of them even lead to negative performance impact. For example, a smaller and shorter network, trained with knowledge transfer on Caltech 101 achieved a significant improvement of 7.36\% in the accuracy and converges 16 times faster compared to the same network trained without knowledge transfer. On the other hand, smaller network which is thinner than the teacher network performed worse with an accuracy drop of 9.48\% on Caltech 101, even with utilization of knowledge transfer.

Contributors

Agent

Created

Date Created
2018

155252-Thumbnail Image.png

Mining signed social networks using unsupervised learning algorithms

Description

Due to vast resources brought by social media services, social data mining has

received increasing attention in recent years. The availability of sheer amounts of

user-generated data presents data scientists both opportunities and challenges. Opportunities are presented with additional data sources. The

Due to vast resources brought by social media services, social data mining has

received increasing attention in recent years. The availability of sheer amounts of

user-generated data presents data scientists both opportunities and challenges. Opportunities are presented with additional data sources. The abundant link information

in social networks could provide another rich source in deriving implicit information

for social data mining. However, the vast majority of existing studies overwhelmingly

focus on positive links between users while negative links are also prevailing in real-

world social networks such as distrust relations in Epinions and foe links in Slashdot.

Though recent studies show that negative links have some added value over positive

links, it is dicult to directly employ them because of its distinct characteristics from

positive interactions. Another challenge is that label information is rather limited

in social media as the labeling process requires human attention and may be very

expensive. Hence, alternative criteria are needed to guide the learning process for

many tasks such as feature selection and sentiment analysis.

To address above-mentioned issues, I study two novel problems for signed social

networks mining, (1) unsupervised feature selection in signed social networks; and

(2) unsupervised sentiment analysis with signed social networks. To tackle the first problem, I propose a novel unsupervised feature selection framework SignedFS. In

particular, I model positive and negative links simultaneously for user preference

learning, and then embed the user preference learning into feature selection. To study the second problem, I incorporate explicit sentiment signals in textual terms and

implicit sentiment signals from signed social networks into a coherent model Signed-

Senti. Empirical experiments on real-world datasets corroborate the effectiveness of

these two frameworks on the tasks of feature selection and sentiment analysis.

Contributors

Agent

Created

Date Created
2017