The first part deals with modeling and identification of network dynamics. I study two types of network dynamics arising from social and gene networks. Based on these dynamics, the proposed network identification method works like a 'network RADAR': interaction strengths between agents are inferred by injecting a 'signal' into the network and observing the resultant reverberation. In social networks, this is accomplished by stubborn agents whose opinions do not change throughout a discussion. In gene networks, genes are suppressed to create the desired perturbations. The steady states under these perturbations are characterized. In contrast to the common assumption of full-rank input, I adopt the weaker assumption of low-rank input, which better models empirical network data. Importantly, a network is proven to be identifiable from low-rank data whose rank grows in proportion to the network's sparsity.  The proposed method is applied to synthetic and empirical data and is shown to outperform prior work.

The second part is concerned with algorithms on networks. I develop three consensus-based algorithms for multi-agent optimization. The first is a decentralized Frank-Wolfe (DeFW) algorithm. The main advantage of DeFW lies in its projection-free nature: the costly projection step in traditional algorithms is replaced by a low-cost linear optimization step. I prove convergence rates of DeFW for convex and non-convex problems. I also develop two consensus-based alternating optimization algorithms, one for least-squares problems and one for non-convex problems. These algorithms exploit the problem structure for faster convergence, and their efficacy is demonstrated by numerical simulations.
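To make the first part's 'network RADAR' idea concrete, here is a toy sketch of steady-state identification under an assumed linear, DeGroot-style update x ← Ax + u, whose steady states satisfy (I − A)X = U. The agent count, sparsity level, lasso penalty, and the use of scikit-learn's Lasso are illustrative choices, not the dissertation's exact method; with fewer experiments than agents (low-rank input), plain least squares is underdetermined, so a sparsity-promoting fit stands in for the role sparsity plays in the identifiability result.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, k, sparsity = 20, 12, 0.15        # n agents, k experiments (k < n: low-rank input)

# ground-truth sparse interaction matrix, scaled so (I - A) is invertible
A_true = rng.random((n, n)) * (rng.random((n, n)) < sparsity)
np.fill_diagonal(A_true, 0.0)
A_true *= 0.8 / max(1e-9, np.abs(np.linalg.eigvals(A_true)).max())

U = rng.standard_normal((n, k))      # injected 'signals' (stubborn opinions / knock-downs)

# steady states of x <- A x + u satisfy (I - A) X = U
X = np.linalg.solve(np.eye(n) - A_true, U)

# identify A row by row: row a_i solves a_i X = x_i - u_i; since k < n the
# system is underdetermined, so a lasso fit exploits the sparsity of a_i
A_hat = np.zeros((n, n))
for i in range(n):
    fit = Lasso(alpha=1e-3, fit_intercept=False, max_iter=50000).fit(X.T, (X - U)[i])
    A_hat[i] = fit.coef_

print("relative error:", np.linalg.norm(A_hat - A_true) / np.linalg.norm(A_true))
```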
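And here is a minimal sketch of a decentralized Frank-Wolfe iteration from the second part, assuming local least-squares objectives, a fixed doubly stochastic mixing matrix, and an l1-ball constraint, for which the linear minimization oracle is a cheap coordinate selection rather than a projection. The full DeFW algorithm also tracks an aggregated gradient across agents, which this simplification omits.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d, T = 4, 5, 200                   # agents, dimension, iterations
radius = 1.0                          # l1-ball constraint, where the LMO is cheap

# local least-squares objectives f_i(x) = ||A_i x - b_i||^2 / 2
A = rng.standard_normal((N, 8, d))
b = rng.standard_normal((N, 8))

# doubly stochastic mixing matrix for a ring of 4 agents
W = np.array([[.50, .25, .00, .25],
              [.25, .50, .25, .00],
              [.00, .25, .50, .25],
              [.25, .00, .25, .50]])

def lmo_l1(g, r):
    """Linear minimization oracle over the l1 ball: argmin_{||s||_1 <= r} <g, s>."""
    s = np.zeros_like(g)
    i = np.argmax(np.abs(g))
    s[i] = -r * np.sign(g[i])
    return s

x = np.zeros((N, d))                  # each agent's local iterate
for t in range(1, T + 1):
    x = W @ x                         # consensus (mixing) step
    gamma = 2.0 / (t + 2)             # standard Frank-Wolfe step size
    for i in range(N):
        g = A[i].T @ (A[i] @ x[i] - b[i])        # local gradient
        x[i] += gamma * (lmo_l1(g, radius) - x[i])  # projection-free update

print("disagreement across agents:", np.linalg.norm(x - x.mean(axis=0)))
```

Because each update is a convex combination of the current iterate and an extreme point of the l1 ball, every local iterate stays feasible without ever computing a projection, which is the advantage the abstract highlights.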
I conclude this dissertation by describing future research directions.
This thesis presents six computer-simulation experiments designed to replicate the negative association between sample size and reported accuracy that recurs in the machine-learning literature, by accounting for data leakage and publication bias. Understanding why this association arises is critical because published ML accuracies are frequently overoptimistic, leading to results that are irreproducible despite repeated trials and experiments. After replicating the association, parametric learning curves were fitted to the empirical learning curves to evaluate performance. Accuracies showed substantial variance at small sample sizes but little to no variation at large ones; in other words, at large sample sizes the empirical learning curves with data leakage and publication bias converged to the same accuracy as the learning curve without data leakage.
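As an illustration of fitting a parametric curve to an empirical learning curve, the sketch below fits an inverse power-law form acc(n) = a + b·n^(−c), a common parametric choice; the accuracy values are invented for illustration and mimic the negative association described above (higher reported accuracy at small n).

```python
import numpy as np
from scipy.optimize import curve_fit

# hypothetical accuracies at several training-set sizes (illustrative, not thesis data);
# small-n accuracies are inflated, mimicking leakage and publication bias
sizes = np.array([25, 50, 100, 200, 400, 800, 1600], dtype=float)
acc   = np.array([0.82, 0.74, 0.71, 0.69, 0.685, 0.680, 0.678])

def learning_curve(n, a, b, c):
    """Inverse power law: accuracy approaches the asymptote a as n grows."""
    return a + b * n ** (-c)

params, _ = curve_fit(learning_curve, sizes, acc, p0=[0.68, 1.0, 0.5], maxfev=10000)
a, b, c = params
print(f"asymptotic accuracy ~ {a:.3f}, decay exponent ~ {c:.2f}")
```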
This thesis evaluates several sampling methods and a non-parametric approach for determining the sample sizes required to minimize the effect of nuisance variables on classification performance. The work focuses on speech analysis applications and therefore uses speech features such as Mel-Frequency Cepstral Coefficients (MFCC) and Filter Bank Cepstral Coefficients (FBCC). The non-parametric D_p divergence measure was used to study the differences between sampling schemes (stratified and multistage sampling) and the changes due to sentence type in the sampled set.
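The D_p divergence can be estimated non-parametrically via the Friedman-Rafsky minimal spanning tree statistic: build an MST over the pooled samples and count the edges connecting points from different samples. A minimal sketch follows, assuming the commonly used estimator 1 − C(m+n)/(2mn), where C is the cross-sample edge count; the function name and test data are illustrative, not the thesis's code.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import cdist

def dp_divergence(X, Y):
    """Friedman-Rafsky MST estimate of the D_p divergence between samples X and Y."""
    m, n = len(X), len(Y)
    Z = np.vstack([X, Y])
    labels = np.concatenate([np.zeros(m), np.ones(n)])
    W = cdist(Z, Z)                    # pairwise Euclidean distances on the pooled data
    mst = minimum_spanning_tree(W)     # sparse matrix holding the MST edges
    rows, cols = mst.nonzero()
    C = np.sum(labels[rows] != labels[cols])   # edges joining the two samples
    return 1.0 - C * (m + n) / (2.0 * m * n)

# identical distributions give a value near 0; well-separated ones approach 1
X = np.random.default_rng(2).normal(0.0, 1.0, (200, 5))
Y = np.random.default_rng(3).normal(1.0, 1.0, (200, 5))
print(dp_divergence(X, Y))
```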