Usage examples of "Synthesis processor" in English and their Japanese translations
- Colloquial
- Ecclesiastic
- Computer
- Programming
For example, a US English synthesis processor could process British English input.
If a single voice remains in the candidate set, the synthesis processor must use it.
All processing steps in the synthesis processor must be performed fully automatically on raw text.
If a successful selection identifies only one voice, the synthesis processor MUST use that voice.
Processorchoice - the synthesis processor chooses the behavior (either priorityselect or keepexisting).
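The three behaviors described in these sentences correspond to values of the voice element's onvoicefailure attribute in SSML 1.1. A minimal sketch (the attribute values chosen here are illustrative):

```xml
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  <!-- If no voice matches the requested features, onvoicefailure controls
       what happens; processorchoice leaves the decision to the processor. -->
  <voice gender="female" age="30" onvoicefailure="processorchoice">
    Any available voice matching these features may speak this sentence.
  </voice>
</speak>
```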
Text normalization is an automated process of the synthesis processor that performs this conversion.
It is an error if a value for alphabet is specified that is not known or cannot be applied by a synthesis processor.
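For illustration, a phoneme element whose alphabet value must be one the processor supports (IPA here; the transcription is illustrative):

```xml
<!-- An alphabet value the processor does not know or cannot apply is an error. -->
<phoneme alphabet="ipa" ph="təˈmɑːtoʊ">tomato</phoneme>
```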
However, such content would only be rendered by a synthesis processor that supported the custom markup.
In practice, the break element is most often used to override the typical automatic behavior of a synthesis processor.
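As a sketch, a break element can both insert a pause and suppress one the processor would otherwise produce automatically (the duration is illustrative):

```xml
Take a deep breath <break time="500ms"/> then continue.
<!-- strength="none" asks the processor not to insert its usual break here. -->
Chapter one<break strength="none"/> begins now.
```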
Priorityselect - the synthesis processor uses the values of all voice feature attributes to select a voice by feature priority.
The fetching and caching behavior of SSML documents is defined by the environment in which the synthesis processor operates.
Ignorelang - the synthesis processor will ignore the change in language and speak as if the content were in the previous language.
Changevoice - if a voice exists that can speak the language, the synthesis processor will switch to that voice and speak the content.
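These failure modes are values of the onlangfailure attribute in SSML 1.1. A minimal sketch using the lang element (the title and language tags are illustrative):

```xml
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US" onlangfailure="ignorelang">
  <!-- changevoice: switch to a voice that can speak French, if one exists. -->
  The title is <lang xml:lang="fr-FR" onlangfailure="changevoice">La Mer</lang>.
</speak>
```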
Synthesis processors are designed to perform text-to-phoneme conversions, so most words of most documents can be handled automatically.
If the element is not present between tokens, the synthesis processor is expected to automatically determine a break based on the linguistic context.
This is equivalent to the use of ACSS with HTML, and once again SSML is the resulting representation to be passed to the synthesis processor.
This can occur when the synthesis processor encounters a new xml:lang value or characters or character sequences that the voice does not know how to process.
If the detail attribute is not specified, the level of detail that is produced by the synthesis processor depends on the text content and the language.
Both human speakers and synthesis processors can pronounce these words correctly in context but may have difficulty without context (see "Non-markup behavior" below).
The token element allows the author to indicate its content is a token and to eliminate token (word) segmentation ambiguities of the synthesis processor.
If text-only output is being produced by the synthesis processor, the content of the desc element(s) SHOULD be rendered instead of other alternative content in audio.
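A brief sketch of desc inside audio (the file name and description are illustrative): in audio output the recording plays, while a text-only channel should render the desc content rather than the other alternative content:

```xml
<audio src="waves.wav">
  <desc>sound of waves breaking on a beach</desc>
  <!-- Alternative content spoken if the audio cannot be fetched: -->
  The sound of the sea.
</audio>
```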
After examining all feature attributes on the ordering list, if multiple voices remain in the candidate set, the synthesis processor MUST use any one of them.
When specified, the interpret-as and format values are to be interpreted by the synthesis processor as hints provided by the markup document author to aid text normalization and pronunciation.
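For example, say-as hints might look like the following (the value sets for interpret-as and format are not defined by SSML itself, so these values are illustrative):

```xml
<!-- Hint that the content is a date in month-day-year order: -->
<say-as interpret-as="date" format="mdy">12/31/2025</say-as>
<!-- Hint that the content should be spelled out character by character: -->
<say-as interpret-as="characters">SSML</say-as>
```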
Although indication of language (using xml:lang) and selection of voice (using voice) are independent, there is no requirement that a synthesis processor support every possible combination of values of the two.
In the case of Japanese text, if you have a synthesis processor that supports both Kanji and kana, you may be able to use the sub element to identify whether 今日は should be spoken as きょうは ("kyou wa" = "today") or こんにちは ("konnichiwa" = "hello").
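In readable markup, the substitution described above would be written roughly as follows (the readings are those given in the sentence):

```xml
<!-- Force the greeting reading rather than the "today" reading: -->
<sub alias="こんにちは">今日は</sub>
```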
If the document author requires a new voice that is better adapted to the new language, then the synthesis processor can be explicitly requested to select a new voice by using the voice element.
Non-markup behavior: In documents and parts of documents where these elements are not used, the synthesis processor is responsible for inferring the structure by automated analysis of the text, often using punctuation and other language-specific data.
A simple English example is "cup<break/>board"; outside the token and w elements, the synthesis processor will treat this as the two tokens "cup" and "board" rather than as one token (word) with a pause in the middle.
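The token element resolves the ambiguity: a sketch contrasting the two readings of the same characters, per the sentence above:

```xml
<!-- Two tokens, "cup" and "board", with a pause between them: -->
cup<break/>board
<!-- One token (word), "cupboard", with a pause in the middle: -->
<token>cup<break/>board</token>
```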
Clarified in <voice> description that indication of language and voice are independent, no synthesis processor is required to support all combinations thereof, and processors must document behavior for every combination thereof.