Filtering by
- All Subjects: Data Mining
- All Subjects: Design of Experiments
- Creators: Industrial, Systems
- Creators: Wu, Teresa
The phase I study focuses on the imbalanced classification problem. A generative classifier based on the Gaussian Mixture Model (GMM) is studied because it can learn the distribution of imbalanced data and thereby improve discrimination between the imbalanced classes. By fusing this knowledge into a cost-sensitive SVM (cSVM), a method called CSG is proposed. Experimental results show the effectiveness of CSG on imbalanced classification problems.
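The abstract does not say how the GMM knowledge is fused into the cSVM. As one possible reading only, the sketch below fits a separate GMM to each class and appends every sample's per-class log-likelihoods as extra features for an RBF SVM whose per-class misclassification costs are set inversely to class frequency. The fusion step, the helper names (`fit_csg_sketch`, `predict_csg_sketch`, `gmm_log_likelihoods`), and the use of scikit-learn are assumptions, not the thesis's CSG method.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.mixture import GaussianMixture
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def gmm_log_likelihoods(gmms, X):
    """One column per class: log-density of each sample under that class's GMM."""
    return np.column_stack([g.score_samples(X) for g in gmms])


def fit_csg_sketch(X, y, n_components=2, random_state=0):
    """Hypothetical CSG-style fusion: GMM log-likelihood features + cost-sensitive SVM."""
    classes = np.unique(y)
    # Fit a separate GMM per class so the minority class gets its own density model.
    gmms = [
        GaussianMixture(n_components=n_components, random_state=random_state).fit(X[y == c])
        for c in classes
    ]
    X_aug = np.hstack([X, gmm_log_likelihoods(gmms, X)])
    # class_weight="balanced" scales each class's misclassification cost C inversely
    # to its frequency, one common form of cost-sensitive SVM.
    svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced"))
    svm.fit(X_aug, y)
    return gmms, svm


def predict_csg_sketch(model, X):
    gmms, svm = model
    return svm.predict(np.hstack([X, gmm_log_likelihoods(gmms, X)]))


# Usage on a synthetic 90/10 imbalanced problem.
X, y = make_classification(n_samples=500, n_features=4, n_informative=3,
                           n_redundant=0, weights=[0.9, 0.1], random_state=0)
model = fit_csg_sketch(X, y)
pred = predict_csg_sketch(model, X)
print("minority predictions:", int((pred == 1).sum()), "of", int((y == 1).sum()))
```

The scaler is there because raw log-likelihoods can be on a very different scale from the original features, which would otherwise dominate the RBF kernel.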
The phase II study extends the research to imbalanced classification problems with noisy data. A model-fusion-based framework, K Nearest Gaussian (KNG), is proposed. KNG uses a generative modeling method, GMM, to model the training data as Gaussian mixtures and to form adjustable confidence regions that are less sensitive to data imbalance and noise. Motivated by the K-nearest neighbor algorithm, KNG uses the neighboring Gaussians to classify test instances. Experimental results show that KNG greatly outperforms traditional classification methods on imbalanced classification problems with noisy data.
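A minimal sketch of the "neighboring Gaussians" idea, under our own assumptions: fit a GMM per class, measure each test point's squared Mahalanobis distance to every class-conditional Gaussian, and take an inverse-distance-weighted vote among the k nearest ones. The class name `KNearestGaussiansSketch`, the Mahalanobis metric, and the voting rule are illustrative choices, not the thesis's KNG algorithm.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


class KNearestGaussiansSketch:
    """Hypothetical KNG-style classifier: vote among the k nearest class-conditional Gaussians."""

    def __init__(self, n_components=3, k=3, random_state=0):
        self.n_components = n_components
        self.k = k
        self.random_state = random_state

    def fit(self, X, y):
        labels, means, precisions = [], [], []
        for c in np.unique(y):
            Xc = X[y == c]
            gmm = GaussianMixture(n_components=min(self.n_components, len(Xc)),
                                  covariance_type="full",
                                  random_state=self.random_state).fit(Xc)
            for mean, prec in zip(gmm.means_, gmm.precisions_):
                labels.append(c)
                means.append(mean)
                precisions.append(prec)
        self.labels_ = np.array(labels)
        self.means_ = np.array(means)
        self.precisions_ = np.array(precisions)
        return self

    def _mahalanobis_sq(self, X):
        # (n_samples, n_gaussians) squared Mahalanobis distances. A distance depends on
        # the Gaussian's own shape, not on how many samples its class contributed.
        diffs = X[:, None, :] - self.means_[None, :, :]
        return np.einsum("ngi,gij,ngj->ng", diffs, self.precisions_, diffs)

    def predict(self, X):
        d2 = self._mahalanobis_sq(np.asarray(X, dtype=float))
        nearest = np.argsort(d2, axis=1)[:, : self.k]
        preds = []
        for row, idx in zip(d2, nearest):
            # Inverse-distance-weighted vote among the k nearest Gaussians.
            weights = 1.0 / (row[idx] + 1e-12)
            votes = {}
            for label, w in zip(self.labels_[idx], weights):
                votes[label] = votes.get(label, 0.0) + w
            preds.append(max(votes, key=votes.get))
        return np.array(preds)


# Usage: 200 majority points around (0, 0), 20 minority points around (5, 5).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(5.0, 1.0, (20, 2))])
y = np.array([0] * 200 + [1] * 20)
clf = KNearestGaussiansSketch(n_components=3, k=3).fit(X, y)
print(clf.predict(np.array([[0.0, 0.0], [5.0, 5.0]])))
```

One way to read "adjustable confidence regions", also an assumption, is to ignore any Gaussian whose squared distance exceeds a chosen chi-square quantile, so that isolated noisy points cannot vote.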
The phase III study addresses feature selection and parameter tuning for the KNG algorithm. To further improve the performance of KNG, a Particle Swarm Optimization based method (PSO-KNG) is proposed. PSO-KNG encodes the model parameters and the data features in the same particle vector and can thus search for the best combination of features and parameters jointly. The experimental results show that PSO greatly improves the performance of KNG, giving better accuracy at a much lower computational cost.
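The joint encoding can be sketched as a standard global-best PSO over one vector whose first entries are feature-selection genes (selected when above 0.5) and whose last entry is a KNG parameter. The choice of k as that parameter, the `decode` helper, and `toy_objective` are hypothetical: the toy objective only stands in for a cross-validated KNG error and does not evaluate KNG.

```python
import numpy as np


def pso_minimize(objective, lower, upper, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best particle swarm optimization over a box [lower, upper]."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    pos = rng.uniform(lower, upper, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    best = int(np.argmin(pbest_val))
    gbest, gbest_val = pbest[best].copy(), pbest_val[best]
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, dim))
        # Inertia + pull toward each particle's own best + pull toward the swarm's best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lower, upper)
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        best = int(np.argmin(pbest_val))
        if pbest_val[best] < gbest_val:
            gbest, gbest_val = pbest[best].copy(), pbest_val[best]
    return gbest, gbest_val


def decode(particle, n_features):
    """Hypothetical joint encoding: feature genes first, then one KNG parameter (k)."""
    mask = particle[:n_features] > 0.5
    k = int(round(particle[n_features]))
    return mask, k


def toy_objective(particle):
    # Stand-in for a cross-validated KNG error; NOT a real KNG evaluation.
    mask, k = decode(particle, n_features=4)
    target = np.array([True, False, True, False])
    return float(np.sum(mask != target) + abs(k - 3))


best, err = pso_minimize(toy_objective, lower=[0, 0, 0, 0, 1], upper=[1, 1, 1, 1, 10])
mask, k = decode(best, n_features=4)
print("selected features:", np.flatnonzero(mask), "k =", k, "toy error =", err)
```

Because features and parameters live in one vector, each fitness evaluation scores a complete (feature subset, parameter) pair, which is what lets the search treat them jointly rather than tuning one while the other is held fixed.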
The U.S. Navy and other amphibious military organizations use a variant of the traditional side stroke called the Combat Side Stroke (CSS) and tout it as the most efficient technique available. Citing its low aerobic demands and its slow yet powerful movements, they consider it superior to the front crawl (freestyle), traditionally regarded as the best stroke, and the CSS is their go-to stroke for any operation in the water. The purpose of this thesis is to apply principles of Industrial Engineering to a real-world situation not typically approached from an optimization perspective. I will analyze pre-existing data on various swim strokes to compare their efficiency across several variables: calories burned, speed, and strokes per unit distance, as well as their interactions. Calories will be measured with heart rate monitors by converting BPM to calories burned. Speed will be measured by stopwatch and observer. Strokes per unit distance will be counted by an observer. The strokes to be analyzed are the breaststroke, crawl stroke, butterfly, and combat side stroke. The goal is to informally test the U.S. Navy's claim that the combat side stroke is the optimal stroke for conserving energy while covering distance. Because of limits on the project's scope, the analysis will use data collected from published literature rather than from experimentation. The thesis also includes a design of experiments for testing these findings in a practical study. The main method of analysis will be linear programming, followed by hypothesis testing, culminating in a design of experiments for future work on this topic.
Based on the findings of previous studies, there was speculation that two well-known experimental design software packages, JMP and Design Expert, produced different power outputs given the same design and user inputs. For context and scope, another popular experimental design package, Minitab® Statistical Software version 17, was added to the comparison. The study compared multiple test cases run on the three packages, focusing on 2^k and 3^k factorial designs and varying the effect size (in standard deviations), the number of categorical factors, the levels, the number of factors, and the replicates. Each of the six cases was run on all three programs, with one, two, and three replicates attempted for each. The one-replicate stage caused a problem, however: Minitab does not allow full factorial designs with only one replicate, and Design Expert does not report power for a single replicate unless there are three or more factors. From these results, it was concluded that the differences between JMP 13 and Design Expert 10 were well within the margin of error and likely caused by rounding. The differences between Minitab 17 and both JMP 13 and Design Expert 10, on the other hand, indicated a fundamental difference in how Minitab calculates power compared with the latest versions of JMP and Design Expert. This difference was most likely caused by Minitab's use of dummy variable coding by default, instead of the orthogonal coding that the other two use by default. Although dummy variable and orthogonal coding give the same fitted results for factorial designs, the choice of coding affects the power calculations. All three programs can be set to use either coding, but instructions for doing so are difficult to find, so a follow-up guide on changing the coding of factorial variables would help address this issue.
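To illustrate how a coding choice that leaves the fitted model unchanged can still change power, the sketch below computes the power of the F test for factor A in a replicated 2^2 full factorial (intercept, A, B, A*B). It assumes, purely for illustration, that the software treats the effect size as a fixed coefficient size (beta_A = 1 with sigma = 1); it is not how JMP, Design Expert, or Minitab actually compute power. The noncentrality parameter is beta_A^2 / (sigma^2 [(X'X)^-1]_AA), so it depends on how A is coded in X. The helper names `design_matrix` and `power_for_A` are hypothetical.

```python
import itertools

import numpy as np
from scipy import stats


def design_matrix(r, low, high):
    """2^2 full factorial with r replicates; columns: intercept, A, B, A*B."""
    runs = list(itertools.product((low, high), repeat=2)) * r
    return np.array([[1.0, a, b, a * b] for a, b in runs])


def power_for_A(X, beta_A, sigma=1.0, alpha=0.05):
    """Noncentrality and power of the 1-df F test for the A coefficient."""
    n, p = X.shape
    var_factor = np.linalg.inv(X.T @ X)[1, 1]      # [(X'X)^-1]_AA
    lam = beta_A**2 / (sigma**2 * var_factor)      # noncentrality parameter
    df_error = n - p                               # needs r >= 2 for this model
    f_crit = stats.f.ppf(1 - alpha, 1, df_error)
    return lam, stats.ncf.sf(f_crit, 1, df_error, lam)


lam_o, pow_o = power_for_A(design_matrix(2, -1.0, 1.0), beta_A=1.0)  # orthogonal (-1/+1)
lam_d, pow_d = power_for_A(design_matrix(2, 0.0, 1.0), beta_A=1.0)   # dummy (0/1)
print(f"orthogonal: lambda={lam_o:.2f}, power={pow_o:.3f}")
print(f"dummy:      lambda={lam_d:.2f}, power={pow_d:.3f}")
```

With two replicates, the noncentrality works out to about 8 under -1/+1 coding and about 1 under 0/1 coding. The same nominal coefficient means different things: with -1/+1 coding, beta_A = 1 is a 2-unit difference between the levels of A, while with 0/1 coding and an interaction term it is a 1-unit difference at the low level of B. With r = 1, the full two-factor model in this sketch leaves zero error degrees of freedom, so no F test is possible.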