Filtering by
- All Subjects: deep learning
- All Subjects: HoloLens
- Creators: Jayasuriya, Suren
The purpose of this project is to create a useful tool for musicians that utilizes the harmonic content of their playing to recommend new, relevant chords to play. This is done by training various Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) on the lead sheets of 100 different jazz standards. A total of 200 unique datasets were produced and tested, resulting in the prediction of nearly 51 million chords. A note-prediction accuracy of 82.1% and a chord-prediction accuracy of 34.5% were achieved across all datasets. Methods of data representation that were rooted in valid music theory frameworks were found to increase the efficacy of harmonic prediction by up to 6%. Optimal LSTM input sizes were also determined for each method of data representation.
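As a hedged illustration of the data-preparation step the abstract describes, the sketch below turns a chord sequence from a lead sheet into fixed-length (input sequence, next chord) training pairs for an LSTM, with a configurable input size. The chord symbols, vocabulary encoding, and window size here are illustrative assumptions, not the project's actual representation.

```python
# Hypothetical sketch: building fixed-length training windows from a
# lead-sheet chord sequence for an LSTM chord-prediction model.
# Chord names and the input size are illustrative assumptions only.

def make_windows(chords, input_size):
    """Slide a window over the chord sequence, yielding
    (input_sequence, next_chord) training pairs."""
    pairs = []
    for i in range(len(chords) - input_size):
        pairs.append((chords[i:i + input_size], chords[i + input_size]))
    return pairs

def one_hot(chord, vocab):
    """Encode a chord symbol as a one-hot vector over the chord vocabulary."""
    vec = [0] * len(vocab)
    vec[vocab.index(chord)] = 1
    return vec

# A ii-V-I progression in C, repeated (illustrative data, not from the study).
sequence = ["Dm7", "G7", "Cmaj7", "Am7", "Dm7", "G7", "Cmaj7"]
vocab = sorted(set(sequence))

pairs = make_windows(sequence, input_size=3)
encoded = [([one_hot(c, vocab) for c in xs], one_hot(y, vocab))
           for xs, y in pairs]
```

Sweeping `input_size` over a range of values on held-out standards is one simple way the "optimal LSTM input sizes" mentioned above could be determined for each data representation.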
Video playback is currently the primary method coaches and athletes use in sports training to give feedback on the athlete’s form and timing. Athletes commonly record themselves with a phone or camera when practicing a sports movement, such as shooting a basketball, and then send the recording to their coach for feedback on how to improve. In this work, we present Augmented Coach, an augmented reality tool for coaches to give spatiotemporal feedback through a 3-dimensional point cloud of the athlete. The system allows coaches to view a pre-recorded video of their athlete in point cloud form and provides the tools to step frame by frame to both analyze the athlete’s form and correct it. The result is a fundamentally new concept of an interactive video player, in which the coach can remotely view the athlete in 3-dimensional form and create annotations to help improve their form. We then conduct a user study with subject matter experts to evaluate the usability and capabilities of our system. The results indicate that Augmented Coach successfully acts as a supplement to in-person coaching, since it allows coaches to break down the recording in 3-dimensional space and provide feedback spatiotemporally, and that it can serve as a complete coaching solution in a remote setting. This technology will become increasingly relevant as coaches look for new ways to improve their feedback methods, especially remotely.
First, this work presents an application of mixture of experts models for quality-robust visual recognition. It is first shown that human subjects outperform deep neural networks at classifying distorted images, and a model, MixQualNet, is then proposed that is more robust to distortions. The proposed model consists of "experts" that are each trained on a particular type of image distortion. The final output of the model is a weighted sum of the expert outputs, where the weights are determined by a separate gating network. The model also incorporates weight sharing, which reduces the number of parameters and increases performance.
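The combination step described above can be sketched as follows: each distortion-specific expert produces a vector of class scores, and a gating network's logits are softmaxed into mixture weights. The expert outputs and gating logits below are illustrative stand-ins, not MixQualNet's actual values or architecture.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of gating logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def mixture_output(expert_outputs, gating_logits):
    """Weighted sum of expert class-score vectors, with weights
    taken from the gating network's softmax."""
    weights = softmax(gating_logits)
    n_classes = len(expert_outputs[0])
    return [sum(w * out[c] for w, out in zip(weights, expert_outputs))
            for c in range(n_classes)]

# Three hypothetical experts (e.g. clean, blur, noise) scoring four classes.
experts = [[0.7, 0.1, 0.1, 0.1],
           [0.2, 0.5, 0.2, 0.1],
           [0.25, 0.25, 0.25, 0.25]]
gate = [2.0, 0.5, 0.5]  # gating logits favoring the first expert
combined = mixture_output(experts, gate)
```

Because the gating weights sum to 1, the mixture stays a valid score distribution whenever each expert's output is one; the same weighted-combination structure underlies the saliency mixtures in the following paragraphs.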
Second, an application of mixture of experts to predict visual saliency is presented. A computational saliency model attempts to predict where humans will look in an image. In the proposed model, each expert network is trained to predict saliency for a set of closely related images. The final saliency map is computed as a weighted mixture of the expert networks' outputs, with weights determined by a separate gating network. The proposed model achieves better performance than several other visual saliency models and a baseline non-mixture model.
Finally, this work introduces a saliency model that is a weighted mixture of models trained for different levels of saliency. These levels include high saliency, corresponding to regions where almost all subjects look, and low saliency, corresponding to regions where some, but not all, subjects look. The weighted mixture shows improved performance over baseline models because of the diversity of the individual models' predictions.
tion source is a challenging task with vital applications including surveillance and robotics. Recent advances in NLOS reconstruction have been achieved using time-resolved measurements, but acquiring these measurements requires expensive, specialized detectors and laser sources. This work proposes a data-driven approach for NLOS 3D localization that requires only a conventional camera and projector. Localization is performed both as a voxel classification problem and as a regression problem. On real data, accuracy greater than 90% is achieved in localizing an NLOS object to a 5 cm × 5 cm × 5 cm volume. By adopting the regression approach, an object of width 10 cm is localized to within approximately 1.5 cm. To generalize to line-of-sight (LOS) scenes with non-planar surfaces, an adaptive lighting algorithm is adopted. This algorithm, based on radiosity, identifies and illuminates the LOS scene patches that contribute most to the NLOS light paths, and can factor in system power constraints. Accuracy improvements ranging from 6% to 15% with a non-planar LOS wall are reported using adaptive lighting, demonstrating the advantage of combining the physics of light transport with active illumination for data-driven NLOS imaging.
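As a hedged sketch of the voxelization that casts localization as classification, the snippet below divides a hidden volume into 5 cm cubes, maps a 3D position to a flat voxel class label, and maps a predicted label back to that voxel's center (so the worst-case error of a correct classification is half a voxel edge). The grid dimensions and coordinates are illustrative assumptions, not the paper's actual setup.

```python
# Hypothetical voxelization sketch for NLOS localization as classification.
# The 5 cm voxel edge matches the abstract; the 20 cm grid is an assumption.

VOXEL = 0.05          # 5 cm voxel edge, in meters
GRID = (4, 4, 4)      # illustrative 20 cm x 20 cm x 20 cm hidden volume

def position_to_label(pos):
    """Map a 3D position (meters) to a flat voxel class label."""
    ix, iy, iz = (min(int(p / VOXEL), n - 1) for p, n in zip(pos, GRID))
    return (ix * GRID[1] + iy) * GRID[2] + iz

def label_to_center(label):
    """Map a voxel label back to the coordinates of the voxel's center."""
    iz = label % GRID[2]
    iy = (label // GRID[2]) % GRID[1]
    ix = label // (GRID[1] * GRID[2])
    return tuple((i + 0.5) * VOXEL for i in (ix, iy, iz))

label = position_to_label((0.12, 0.03, 0.18))
center = label_to_center(label)
```

A regression head that directly outputs continuous coordinates avoids this half-voxel quantization floor, which is consistent with the finer (~1.5 cm) localization the regression approach achieves above.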