Research

Title :	Text To Speech Generation with Chosen Accent and Noise Profiles for Aerospace and Industrial Domains
Area of research :	Computer Sciences and Information Technology
Focus area :	Text To Speech (TTS) System
Principal Investigator :	Dr Hema Murthy, Professor, Indian Institute of Technology (IIT) Madras
Contact info :	hema@cse.iitm.ac.in

Details

Executive Summary :	The project aims to design and improve a Text To Speech (TTS) generation system for safety domains such as aerospace and industrial applications. TTS systems, also known as speech synthesis systems, have steadily improved. However, the output of contemporary speech synthesis systems remains clearly distinguishable from actual human speech. The proposal has several objectives:
a. Improve the voice experience by identifying and closely matching the accent of the operator
b. Improve the voice experience through selective code switching, mixing in local lingo
c. Attempt to generate emotive speech so that the user/operator almost perceives the system as human
d. Generate voice for the desired accent and gender
e. Generate voice adapted/trained to a defined safety domain such as aerospace or industrial applications
f. Generate voice mixed with selective noise profiles so that the system can be used in applications such as training
g. Optimize the system towards near-real-time performance
The project shall focus on user/operator accent identification, prosody analysis and generation that satisfies the user/operator, noise mixing to match the specific domain environment, and selective code switching and mixing to improve the voice experience. Classical methods such as concatenative synthesis offer good fidelity, but their footprint is very large. Statistical parametric synthesis techniques provide good intelligibility but fall short on naturalness. The objective is to improve fidelity and accuracy using deep-learning-based techniques. Deep generative approaches such as Google's WaveNet, Lyrebird, and Generative Adversarial Networks have been used to copy and emulate a user's speech characteristics. These methods may be explored for non-native English accents when only a limited data set is available. In addition, appropriate prosody needs to be incorporated, which requires appropriate prosodic analysis.
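Objective (f), mixing synthesized voice with a selected noise profile, can be illustrated with a minimal sketch. It assumes the noise level is specified as a target signal-to-noise ratio (SNR) in dB, which the project description does not state. The function name `mix_at_snr` and the plain-list sample representation are hypothetical and chosen only for illustration.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the requested SNR (dB).

    Assumption: both inputs are sequences of float samples at the same
    sampling rate. The noise profile is looped if it is shorter than the speech.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")

    # Loop the noise profile so that it covers the whole utterance.
    tiled = [noise[i % len(noise)] for i in range(len(speech))]

    # Average power of each signal.
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in tiled) / len(tiled)
    if p_noise == 0:
        raise ValueError("noise profile is silent")

    # SNR_dB = 10 * log10(p_speech / (gain**2 * p_noise)), solved for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))

    return [s + gain * n for s, n in zip(speech, tiled)]
```

For example, `mix_at_snr(tts_samples, cockpit_noise, 10.0)` would return the utterance with the noise profile added at 10 dB SNR, where `tts_samples` and `cockpit_noise` are placeholder names for sample lists supplied by the caller.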
The generated speech may be limited to a sampling rate of around 48 kHz. A speech utterance may be limited to a maximum of 25 words in a single phrase. The performance of the TTS system may be measured using Mean Opinion Score (MOS) / subjective quality evaluation tests and word error rate (WER). Honeywell, an industry partner, shall benefit from the Accent and Noise Sensitive TTS Generation System (the deliverable) by customizing it and integrating it with an Automatic Speech Recognition system to build conversational interfaces. Promoting natural interfaces and hands-free operation would improve the overall safety and operational efficiency of Honeywell products, solutions, or services.
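The two evaluation measures named above can be sketched as follows. This is an illustrative sketch only: it assumes that transcripts of the synthesized audio are available (from listeners or from a recognizer; the description does not say which) and that MOS ratings use the conventional 1 to 5 scale. The function names are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def mean_opinion_score(ratings):
    """Arithmetic mean of listener ratings, assumed to lie on a 1-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be between 1 and 5")
    return sum(ratings) / len(ratings)
```

In this sketch, a lower WER indicates more intelligible output and a higher MOS indicates better perceived quality. A real evaluation would also need a listening-test protocol, which is outside the scope of this example.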
Total Budget (INR):	34,84,800
Achievements :	Production of accented speech: 1. Development of generic Indian language (Aryan and Dravidian) voices, with adaptation to 9 Indian languages. 2. Development of a generic Indian English voice, with adaptation to various Indian English accents. 3. Adaptation to speaker and language.
Publications :	4

Organizations involved

Implementing Agency :	Indian Institute of Technology (IIT) Madras
Funding Agency :	Department of Science and Technology (DST), Govt of India; Ministry of Education (MoE), Govt of India
