Matching Items (25)

Analysis of Learning Retention throughout Aging

Description

In this paper, it is determined that learning retention decreases with age at a linear rate. Four male Long-Evans rats were used in the study. The rats were each trained in four different tasks throughout their lifetime, using a food reward as motivation to work. A rat was said to have learned a task at the age at which it achieved its highest accuracy on that task. A regression of learning retention was created for the set of studied rats: Learning Retention = 112.9 - 0.085919 x (Age at End of Task), indicating that learning retention decreases at a linear rate, although individual rats have different rates of decrease. The presence of behavioral training was determined not to have a positive impact on this rate. In behavioral studies, there were statistically significant differences between rats W12 and Z12 in the timid/outgoing and large-ball ability measures. Rat W12 had better overall learning retention and was also more compliant, did not resist being picked up, and traveled at high speeds (in the large ball) more frequently than Z12. Potential further studies include implanting an electrode into the frontal cortex in order to compare neurofeedback with learning retention, and using human subjects to find the rate of decrease in learning retention. The implication of this study, if it also holds for human subjects, is that older persons may need enhanced training or additional refresher training in order to retain information learned at a later age.
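
As a worked illustration of the regression quoted above, the following minimal Python sketch evaluates the fitted line for a given age. The function name is hypothetical, and this abstract does not state the units of age or of the retention score, so the input values are purely illustrative.

```python
# Coefficients copied from the regression stated in the abstract.
INTERCEPT = 112.9
SLOPE = -0.085919  # change in learning retention per unit of age


def learning_retention(age_at_end_of_task: float) -> float:
    """Evaluate the fitted line: 112.9 - 0.085919 x (Age at End of Task).

    The units of age are an assumption left to the caller; the abstract
    does not specify them.
    """
    return INTERCEPT + SLOPE * age_at_end_of_task


if __name__ == "__main__":
    # Illustrative ages only; these are not data from the study.
    for age in (0, 100, 200):
        print(age, round(learning_retention(age), 4))
```

Because the model is linear, each additional unit of age lowers the predicted retention by the same amount (0.085919), regardless of the starting age.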

Contributors

Agent

Created

Date Created
  • 2014-05

An Empirical Study of View Construction for Multi-View Learning

Description

Multi-view learning, a subfield of machine learning that aims to improve model performance by training on multiple views of the data, has been studied extensively in the past decades. It is typically applied in contexts where the input features naturally form multiple groups or views. An example of a naturally multi-view context is a data set of websites, where each website is described not only by the text on the page, but also by the text of hyperlinks pointing to the page. More recently, various studies have demonstrated initial success in applying multi-view learning to single-view data with multiple artificially constructed views. However, a systematic study of the effectiveness of such artificially constructed views is lacking. To bridge this gap, this thesis begins by providing a high-level overview of multi-view learning with the co-training algorithm. Co-training is a classic semi-supervised learning algorithm that takes advantage of both labelled and unlabelled examples in the data set for training. Then, the thesis presents a web-based tool developed in Python that allows users to experiment with and compare the performance of multiple view construction approaches on various data sets. The view construction approaches supported by the tool include subsampling, Optimal Feature Set Partitioning, and a genetic algorithm. Finally, the thesis presents an empirical comparison of the performance of these approaches, not only against one another, but also against traditional single-view models. The findings show that a simple subsampling approach combined with co-training often outperforms both the other view construction approaches and traditional single-view methods.
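
To make the idea of artificially constructed views concrete, the sketch below splits the feature indices of a single-view data set into disjoint random views, each of which could then be handed to one learner in a co-training loop. This is only one plausible reading of "subsampling"; the abstract does not describe the thesis's exact procedure, and the function names `random_feature_views` and `project` are hypothetical.

```python
import random


def random_feature_views(n_features, n_views=2, seed=0):
    """Partition feature indices 0..n_features-1 into disjoint random views.

    Assumption: views are built by shuffling the feature indices and dealing
    them out round-robin, so view sizes differ by at most one.
    """
    rng = random.Random(seed)
    indices = list(range(n_features))
    rng.shuffle(indices)
    return [sorted(indices[i::n_views]) for i in range(n_views)]


def project(row, view):
    """Restrict one example (a list of feature values) to a single view."""
    return [row[j] for j in view]


if __name__ == "__main__":
    views = random_feature_views(n_features=6, n_views=2, seed=42)
    example = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
    for view in views:
        print(view, project(example, view))
```

In co-training, each view would train its own classifier on the labelled examples, and the classifiers would then label unlabelled examples for each other; that loop is omitted here to keep the sketch short.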

Contributors

Agent

Created

Date Created
  • 2019-12

Sentiment informed cyberbullying detection in social media

Description

Cyberbullying is a phenomenon that negatively affects individuals. Victims of cyberbullying suffer from a range of mental health issues, from depression to low self-esteem. With the advent of social media platforms, cyberbullying is becoming more and more prevalent. Traditional mechanisms to fight cyberbullying include the use of standards and guidelines, human moderators, blacklists based on profane words, and regular expressions to manually detect cyberbullying. However, these mechanisms fall short in social media and do not scale well. Social media users employ intentionally evasive expressions, such as obfuscation of abusive words, which necessitates the development of a sophisticated learning framework to automatically detect new cyberbullying behaviors. Cyberbullying detection in social media is a challenging task due to short, noisy, and unstructured content and the intentional obfuscation of abusive words or phrases by social media users. Motivated by sociological and psychological findings on bullying behavior and its correlation with emotions, we propose an effective optimization framework that leverages sentiment information to accurately detect cyberbullying behavior in social media. Experimental results on two real-world social media datasets show the superiority of the proposed framework. Further studies validate the effectiveness of leveraging sentiment information for cyberbullying detection.

Contributors

Agent

Created

Date Created
  • 2017

Structured sparse methods for imaging genetics

Description

Imaging genetics is an emerging and promising technique that investigates how genetic variations affect brain development, structure, and function. By exploiting disorder-related neuroimaging phenotypes, this class of studies provides a novel direction to reveal and understand the complex genetic mechanisms. Imaging genetics studies are often challenging due to the relatively small number of subjects but extremely high dimensionality of both imaging data and genomic data. In this dissertation, I carry out research on imaging genetics with a particular focus on two tasks: building predictive models between neuroimaging data and genomic data, and identifying disorder-related genetic risk factors through image-based biomarkers. To this end, I consider a suite of structured sparse methods for imaging genetics, which can produce interpretable models and are robust to overfitting. With carefully designed sparsity-inducing regularizers, different biological priors are incorporated into learning models. More specifically, in the Allen brain image-gene expression study, I adopt an advanced sparse coding approach for image feature extraction and employ a multi-task learning approach for multi-class annotation. Moreover, I propose a label structure-based two-stage learning framework, which utilizes the hierarchical structure among labels, for multi-label annotation. In the Alzheimer's Disease Neuroimaging Initiative (ADNI) imaging genetics study, I employ Lasso together with EDPP (enhanced dual polytope projections) screening rules to quickly identify Alzheimer's disease risk SNPs. I also adopt the tree-structured group Lasso with MLFre (multi-layer feature reduction) screening rules to incorporate linkage disequilibrium information into modeling. Moreover, I propose a novel absolute fused Lasso model for ADNI imaging genetics. This method utilizes SNP spatial structure and is robust to the choice of reference alleles of genotype coding.
In addition, I propose a two-level structured sparse model that incorporates gene-level networks through a graph penalty into SNP-level model construction. Lastly, I explore a convolutional neural network approach for accurately predicting Alzheimer's disease-related imaging phenotypes. Experimental results on real-world imaging genetics applications demonstrate the efficiency and effectiveness of the proposed structured sparse methods.

Contributors

Agent

Created

Date Created
  • 2017

Mason: Real-time NBA Matches Outcome Prediction

Description

The National Basketball Association (NBA) is the most popular basketball league in the world. The league's worldwide popularity gives rise to a large number of interesting and challenging research problems. Among them, predicting the outcome of an upcoming NBA match between two specific teams from their historical data is especially attractive. The rapid development of machine learning techniques opens the door to examining the correlation between statistical data and match outcomes. However, existing methods typically make predictions before a game starts. In-game prediction, or real-time prediction, has not yet been sufficiently studied. During a match, data are generated cumulatively; as they accumulate, they become more comprehensive and potentially carry more predictive power, so prediction accuracy may increase dynamically as the match goes on. In this study, I design game-level and player-level features based on real-time data from NBA matches and apply a machine learning model to investigate the feasibility and characteristics of real-time prediction in NBA matches.

Contributors

Agent

Created

Date Created
  • 2017

Network Effects in NBA Teams: Observations and Algorithms

Description

The games held by the National Basketball Association (NBA) are the most popular basketball events on earth. Each year, a vast amount of statistical data is generated by this industry. Meanwhile, team management, sports media, and scientists are digging deep into this ocean of data. Recent research literature is reviewed with respect to whether NBA teams can be analyzed as connected networks. However, it is very time-consuming, if not impossible, for humans to manually capture every detail of the large number of on-court game events. In this study, an alternative method is proposed that parses public resources from NBA-related websites to build degenerated game-wise flow graphs. Then, three different statistical techniques are tested to observe the network properties of such offensive strategies with respect to home and away teams. In addition, a new algorithm is developed to infer real game ball distribution networks at the player level under low-rank constraints. The ball-passing degree matrix of one game is recovered as the optimal solution of a low-rank ball transition network by constructing a convex operator. The experimental results on real NBA data demonstrate the effectiveness of the proposed algorithm.

Contributors

Agent

Created

Date Created
  • 2017

BagStack Classification for Data Imbalance Problems with Application to Defect Detection and Labeling in Semiconductor Units

Description

Despite the fact that machine learning supports the development of computer vision applications by shortening the development cycle, finding a general learning algorithm that solves a wide range of applications is still bounded by the "no free lunch" theorem. The search for the right algorithm to solve a specific problem is driven by the problem itself, the data availability and many other requirements.

Automated visual inspection (AVI) systems represent a major part of these challenging computer vision applications. They are gaining growing interest in the manufacturing industry to detect defective products and keep these from reaching customers. The process of defect detection and classification in semiconductor units is challenging due to different acceptable variations that the manufacturing process introduces. Other variations are also typically introduced when using optical inspection systems due to changes in lighting conditions and misalignment of the imaged units, which makes the defect detection process more challenging.

In this thesis, a BagStack classification framework is proposed, which makes use of stacking and bagging concepts to handle both variance and bias errors. The classifier is designed to handle the data imbalance and overfitting problems by adaptively transforming the multi-class classification problem into multiple binary classification problems, applying a bagging approach to train a set of base learners for each specific problem, adaptively specifying the number of base learners assigned to each problem, adaptively specifying the number of samples to use from each class, applying a novel data-imbalance aware cross-validation technique to generate the meta-data while taking into account the data imbalance problem at the meta-data level and, finally, using a multi-response random forest regression classifier as a meta-classifier. The BagStack classifier makes use of multiple features to solve the defect classification problem. In order to detect defects, a locally adaptive statistical background modeling is proposed. The proposed BagStack classifier outperforms state-of-the-art image classification techniques on our dataset in terms of overall classification accuracy and average per-class classification accuracy. The proposed detection method achieves high performance on the considered dataset in terms of recall and precision.
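
One ingredient named above, choosing how many samples to draw from each class when bagging, can be illustrated with a generic class-balanced bootstrap. The sketch below is not the BagStack procedure itself, whose adaptive rules are not given in this abstract; the function name `balanced_bootstrap` and the fixed `per_class` count are assumptions made for illustration.

```python
import random
from collections import defaultdict


def balanced_bootstrap(labels, per_class, seed=0):
    """Draw `per_class` indices with replacement from every class.

    Assumption: a fixed per-class count stands in for the adaptive sample
    counts mentioned in the abstract.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    sample = []
    for label in sorted(by_class):
        # Sampling with replacement lets a rare class fill its quota.
        sample.extend(rng.choices(by_class[label], k=per_class))
    return sample


if __name__ == "__main__":
    # Toy imbalanced labels: eight non-defective (0), two defective (1).
    labels = [0] * 8 + [1] * 2
    bag = balanced_bootstrap(labels, per_class=4, seed=7)
    print([labels[i] for i in bag])
```

Each base learner in a bagging ensemble would be trained on a different bag of this kind (for example, by varying `seed`), so that the minority class is represented equally in every training set.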

Contributors

Agent

Created

Date Created
  • 2019

Learning from task heterogeneity in social media

Description

In recent years, the rise in social media usage, both vertically in terms of the number of users per platform and horizontally in terms of the number of platforms per user, has led to a data explosion.

User-generated social media content provides an excellent opportunity to mine data of interest and to build resourceful applications. The rise in the number of healthcare-related social media platforms and the volume of healthcare knowledge available online in the last decade has resulted in increased social media usage for personal healthcare. In the United States, nearly ninety percent of adults in the age group 50-75 have used social media to seek and share health information. Motivated by the growth of social media usage, this thesis focuses on healthcare-related applications, studies various challenges posed by social media data, and addresses them through novel and effective machine learning algorithms.

The major challenges for effectively and efficiently mining social media data to build functional applications include: (1) Data reliability and acceptance: most social media data (especially in the context of healthcare-related social media) is not regulated, and little has been studied on the benefits of healthcare-specific social media; (2) Data heterogeneity: social media data is generated by users with both demographic and geographic diversity; (3) Model transparency and trustworthiness: most existing machine learning models for addressing heterogeneity are considered black box models, and few provide explanations of their behavior that would allow users to trust them.

In response to these challenges, three main research directions have been investigated in this thesis: (1) Analyzing social media influence on healthcare: to study the real-world impact of social media as a source to offer or seek support for patients with chronic health conditions; (2) Learning from task heterogeneity: to propose various models and algorithms that are adaptable to new social media platforms and robust to dynamic social media data, specifically on modeling user behaviors, identifying similar actors across platforms, and adapting black box models to a specific learning scenario; (3) Explaining heterogeneous models: to interpret predictive models in the presence of task heterogeneity. In this thesis, novel algorithms with theoretical analysis from various aspects (e.g., time complexity, convergence properties) have been proposed. The effectiveness and efficiency of the proposed algorithms are demonstrated by comparison with state-of-the-art methods and relevant case studies.

Contributors

Agent

Created

Date Created
  • 2019

Learning with attributed networks: algorithms and applications

Description

Attributes, which delineate the properties of data, and connections, which describe the dependencies among data, are two essential components for characterizing most real-world phenomena. The synergy between these two principal elements yields a unique data representation: the attributed network. In many cases, people are inundated with vast amounts of data that can be structured into attributed networks, and their use has been attractive to researchers and practitioners in different disciplines. For example, in social media, users interact with each other and also post personalized content; in scientific collaboration, researchers cooperate and are distinguished from their peers by their unique research interests; in complex disease studies, rich gene expression data complements the gene-regulatory networks. Clearly, attributed networks are ubiquitous and form a critical component of modern information infrastructure. To gain deep insights from such networks, one needs a fundamental understanding of their unique characteristics and an awareness of the related computational challenges.

My dissertation research aims to develop a suite of novel learning algorithms to understand, characterize, and gain actionable insights from attributed networks, to benefit high-impact real-world applications. In the first part of this dissertation, I mainly focus on developing learning algorithms for attributed networks in a static environment at two different levels: (i) attribute level - by designing feature selection algorithms to find high-quality features that are tightly correlated with the network topology; and (ii) node level - by presenting network embedding algorithms to learn discriminative node embeddings by preserving node proximity w.r.t. network topology structure and node attribute similarity. As change is an essential characteristic of attributed networks and the results of learning algorithms become stale over time, in the second part of this dissertation, I propose a family of online algorithms for attributed networks in a dynamic environment to continuously update the learning results on the fly. Moreover, application-aware learning algorithms are more desirable when the application domains and their unique intents are clearly understood. As such, in the third part of this dissertation, I advance real-world applications on attributed networks by incorporating the objectives of external tasks into the learning process.

Contributors

Agent

Created

Date Created
  • 2019

Model Based Automatic and Robust Spike Sorting for Large Volumes of Multi-channel Extracellular Data

Description

Spike sorting is a critical step for single-unit-based analysis of neural activities extracellularly and simultaneously recorded using multi-channel electrodes. When dealing with recordings from very large numbers of neurons, existing methods, which are mostly semiautomatic in nature, become inadequate.

This dissertation aims at automating the spike sorting process. A high-performance, automatic, and computationally efficient spike detection and clustering system, namely the M-Sorter2, is presented. The M-Sorter2 employs the modified multiscale correlation of wavelet coefficients (MCWC) for neural spike detection. At the center of the proposed M-Sorter2 are two automatic spike clustering methods. They share a common hierarchical agglomerative modeling (HAM) model search procedure to strategically form a sequence of mixture models, and a new model selection criterion called difference of model evidence (DoME) to automatically determine the number of clusters. The two methods differ in how they perform clustering to infer model parameters: one uses robust variational Bayes (RVB) and the other uses robust Expectation-Maximization (REM) for Student's t-mixture modeling. The M-Sorter2 is thus a significantly improved approach to sorting as an automatic procedure.

M-Sorter2 was evaluated and benchmarked against popular algorithms using simulated, artificial, and real data with ground truth that are openly available to researchers. Simulated datasets with known statistical distributions were first used to illustrate how the clustering algorithms, namely REMHAM and RVBHAM, provide robust clustering results under commonly experienced performance-degrading conditions, such as random initialization of parameters, high dimensionality of data, low signal-to-noise ratio (SNR), ambiguous clusters, and asymmetry in cluster sizes. For the artificial dataset from single-channel recordings, the proposed sorter outperformed Wave_Clus, Plexon's Offline Sorter and Klusta in most of the comparison cases. For the real dataset from multi-channel electrodes, tetrodes and polytrodes, the proposed sorter outperformed all comparison algorithms in terms of false positive and false negative rates. The software package presented in this dissertation is available for open access.

Contributors

Agent

Created

Date Created
  • 2019