2026-05-19T07:11:26Z
https://keep.lib.asu.edu/oai/request

oai:keep.lib.asu.edu:node-201024
2025-05-14T23:46:44Z
oai_pmh:all
oai_pmh:repo_items

Record ID: 201024
Handle: https://hdl.handle.net/2286/R.2.N.201024
Rights: http://rightsstatements.org/vocab/InC/1.0/
License: http://creativecommons.org/licenses/by-nc-sa/4.0
Date: 2025-05
Extent: 29 pages
Contributors: Jhaj, Baaz; Ramani, Krishna; Hsu, Jeffrey; Osburn, Steven; Zhu, Haolin; Barrett, The Honors College; Computer Science and Engineering Program

Abstract: This thesis presents Translatica, a modular speech-to-speech translation (S2ST) system that preserves both linguistic meaning and the speaker’s vocal identity across languages. Alongside developing a working prototype, this work surveys the landscape of S2ST methods and motivates the choice of a modular architecture over direct approaches, emphasizing flexibility, interpretability, and voice fidelity. The system combines state-of-the-art tools in transcription, translation, and voice synthesis to enable expressive, speaker-preserving dubbing of prerecorded videos. Through implementation and evaluation, the thesis explores the trade-offs between accuracy, latency, and control, demonstrating how modular design enables customization for diverse use cases. Future work includes real-time translation, enhanced speaker tracking, and applications in education and live media.

Keywords: Speech-to-Speech Translation; Voice Cloning; Speaker Preservation; Modular AI Systems; Neural Voice Synthesis; Human-Centered Machine Translation

Title: Translatica: A Survey and Implementation Study on Speech-to-Speech Translation and Voice Synthesis with Speaker Preservation