Protocols of Network-based Speech-to-Speech Translation

An S2ST system is composed of three modules: speech recognition in the source language, translation from the source text into the target text, and speech synthesis in the target language. The figure shows the structure of the S2ST system. “ASR,” “MT,” and “TTS” in the figure represent “Automatic Speech Recognition,” “Machine Translation,” and “Text-to-Speech Synthesis,” respectively. To establish S2ST systems, we need to build ASR, MT, and TTS for both the source and target languages by collecting speech and language data such as audio data, manual transcriptions, pronunciation lexica covering every word, and parallel corpora for translation. It is not easy for any single organization to establish S2ST systems that cover all topics and languages. However, by connecting ASR, MT, and TTS modules provided by organizations all over the world, we strongly believe that we can establish a global S2ST system that frees us from language barriers. To connect these modules reliably across different languages and functions, it is necessary to standardize the communication protocols and data formats between modules, as illustrated in the figure.
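The chaining of modules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Message` type, its fields, and the toy modules are hypothetical and are not the data formats or interfaces defined by any standard. The point is that once every module accepts and returns the same message format, ASR, MT, and TTS modules from different providers can be swapped freely.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass(frozen=True)
class Message:
    """Hypothetical common message exchanged between modules (an assumption)."""
    language: str                 # language code of the payload, e.g. "ja"
    kind: str                     # "audio" or "text"
    payload: Union[bytes, str]    # audio bytes or a text string


# Every module, whatever its provider, maps one Message to another.
Module = Callable[[Message], Message]


def make_pipeline(asr: Module, mt: Module, tts: Module) -> Module:
    """Chain ASR -> MT -> TTS into a single speech-to-speech function."""
    def translate_speech(audio: Message) -> Message:
        if audio.kind != "audio":
            raise ValueError("the pipeline expects an audio message")
        source_text = asr(audio)       # source speech -> source text
        target_text = mt(source_text)  # source text   -> target text
        return tts(target_text)        # target text   -> target speech
    return translate_speech


def toy_asr(msg: Message) -> Message:
    # Stand-in recognizer: pretends the audio bytes are UTF-8 encoded text.
    return Message(msg.language, "text", msg.payload.decode("utf-8"))


def make_toy_mt(target: str, table: Dict[str, str]) -> Module:
    # Stand-in translator: looks the sentence up in a tiny phrase table
    # and passes it through unchanged when no entry exists.
    def toy_mt(msg: Message) -> Message:
        return Message(target, "text", table.get(msg.payload, msg.payload))
    return toy_mt


def toy_tts(msg: Message) -> Message:
    # Stand-in synthesizer: "speech" is just the UTF-8 bytes of the text.
    return Message(msg.language, "audio", msg.payload.encode("utf-8"))


pipeline = make_pipeline(toy_asr, make_toy_mt("en", {"こんにちは": "hello"}), toy_tts)
result = pipeline(Message("ja", "audio", "こんにちは".encode("utf-8")))
print(result)
```

In a networked deployment each of the three callables would wrap a request to a remote server rather than a local function, which is why an agreed protocol and data format between modules matters.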



In 2010, the standardization work at APT (Asia-Pacific Telecommunity) was transferred to ITU-T (the Telecommunication Standardization Sector of the International Telecommunication Union) as A-STAR became U-STAR, changing not only its name but also its organization into a worldwide consortium with the aim of establishing a more global system.

Expansion of Standardization Technologies


S2ST Protocols

-NICT is the editor for S2ST standardization at ITU-T SG16, WP2, Q21/22
-The following ITU-T Recommendations were approved on October 14, 2010

Recommendation: F.745
Title: Functional Requirements for Network-based S2ST
Scope:
-Definition of network-based S2ST
-Functions and service requirements of network-based S2ST

Recommendation: H.625
Title: Architectural Requirements for Network-based S2ST
Scope:
-Requirements of S2ST architecture
-Definition of interface for network-based S2ST

