Acapela Cloud

SSML

SSML tags

Voice Tags

\pau=number\	This tag inserts a pause of the specified number of milliseconds in the speech.
\prx="phonetic"\	With this tag, a user can synthesize a specific pronunciation inside a text. The phonetic string is composed of phonemes followed by space characters (the phonetic alphabet is language-dependent). This tag is only suitable for inserting single words into the text. Unpredictable errors (mainly in prosody) can occur when inserting larger units. Example: "I will say: \prx=h e l @U1\." is equivalent to "I will say: hello."
\rms=number\	Sets the reading mode to spelling out each letter of each word (number is 1), or turns it off (number is 0).
\rmw=number\	Sets the reading mode to leaving audible pauses between each word (number is 1), or turns it off (number is 0).
\spd=number\	This tag sets the baseline average talking speed of the voice to the specified number of words per minute. Each voice has a default speed (about 180 words per minute, depending on the voice). Call \rst\ to reset to the default speed. Range: 1/3 to 3 times the default speed (about 60 to 540 for a default of 180 words per minute). This tag persists on the next texts being spoken until the next voice change or the next tag cancelling it out.
\rspd=number\	This tag sets the relative speed. 100 is the default speed (about 180 words per minute, depending on the voice). Call \rst\ to reset to the default speed. This tag persists on the next texts being spoken until the next voice change or the next tag cancelling it out.
\vct=number\	This tag controls the shaping of the voice (range: 50% to 150%). Refer to the Developer’s Guide for more information. This tag persists on the next texts being spoken until the next voice change or the next tag cancelling it out.
\vol=number\	This tag sets the output volume. It is formatted as \vol=volume\ where volume is a value in the range 0 to 65535, inclusive. The default value is 65535. This tag persists on the next texts being spoken until the next voice change or the next tag cancelling it out.
\mrk=number\	This tag indicates a user mark in the text. The positions of the marks can be retrieved in the events when using the parameter markpos=on. \mrk=0\ is reserved.
\audioboost=val\	The Audio Boost affects two aspects of the speech: it improves speech clarity by emphasizing the medium and high frequencies that are important for intelligibility, and it increases the perceived level of the speech with no saturation effect. This tag is formatted as \audioboost=val\ where val is a value in the range 0 to 90, inclusive. The default value is 0. The parameter val controls the emphasis of medium and high frequencies, from no emphasis (0) to maximum emphasis (90).
\rst\	This tag resets the engine to the default settings for the current mode.
\vce=key=value\	Changes the speaking voice according to the specified characteristics. The pitch, speed, volume, etc. revert to the defaults for the new voice.
- \vce=speaker=Ryan\	Specifies the speaker value of the voice. Beware that the speaker name is different from the voice name. The new voice uses the same technology as the main voice (HQ/NT). For the Ryan22k_HQ voice, the speaker name is Ryan (e.g. \vce=speaker=Ryan\).
- \vce=voice=Ryan22k_HQ\ or \vce=voice=Ryan22k_NT\	Specifies the exact voice to use, including the technology (22k_HQ for HQ voices, 22k_NT for Neural voices).
\sel=altN\	Alternative synthesis for the following word. \sel=altN\ gives the N-th acoustic alternative for the following word (and thus, potentially, for the whole breath group). Example: the sentence "I don't like the sound of this \sel=alt3\ word" takes the third best alternative for the pronunciation of the word "word".
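
The voice tags above are plain text, so they can be assembled with a few string helpers. The following Python sketch is illustrative only: the function names (tag, pause, speed, volume, audioboost) are hypothetical and not part of any Acapela SDK, the 180 words-per-minute default is taken from the approximate figure quoted above, and the linear percent-to-volume mapping is an assumption made for convenience.

```python
# Hypothetical helpers: they only build tag strings, they do not call any service.

def tag(name, value=None):
    # Wrap a tag name and an optional value in the backslash delimiters.
    body = name if value is None else f"{name}={value}"
    return "\\" + body + "\\"

def pause(ms):
    # \pau=number\ : pause length in milliseconds.
    if ms < 0:
        raise ValueError("pause length must be non-negative")
    return tag("pau", ms)

def speed(wpm, default_wpm=180):
    # \spd=number\ : clamp to 1/3 .. 3 times the (assumed) default speed.
    low, high = round(default_wpm / 3), default_wpm * 3
    return tag("spd", max(low, min(high, wpm)))

def volume(percent):
    # \vol=number\ : map 0..100 percent onto the documented 0..65535 range
    # (the linear mapping is an assumption of this sketch).
    percent = max(0.0, min(100.0, percent))
    return tag("vol", round(percent / 100 * 65535))

def audioboost(val):
    # \audioboost=val\ : clamp to the documented 0..90 range.
    return tag("audioboost", max(0, min(90, val)))

text = f"{speed(120)} Slow speech. {pause(500)} {tag('rst')} Back to the default."
print(text)
```

Clamping out-of-range values instead of raising an error is a design choice of this sketch; stricter validation may be preferable when the text comes from user input.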

Audio Tags

First, upload the audio files to your account using the account storage webpage or api/storage.
The audio tags only support raw PCM 16-bit mono files with the same sampling frequency as the voice used (22 kHz); the file extension must be .raw.
You can also import an mp3 or wav file, which is automatically converted to the right format and extension.
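
Since uploaded mp3 and wav files are converted automatically, local conversion is optional. If you prefer to prepare the .raw file yourself, the following Python sketch checks a WAV file against the requirements above and writes its PCM samples without the header, using only the standard wave module. The function name wav_to_raw is hypothetical, and the exact rate of 22050 Hz is an assumption (the text above only says 22 kHz).

```python
import wave

def wav_to_raw(wav_path, raw_path, expected_rate=22050):
    # expected_rate=22050 is an assumption; adjust it to match the voice used.
    with wave.open(wav_path, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("audio must be mono")
        if wav.getsampwidth() != 2:
            raise ValueError("audio must be 16-bit PCM")
        if wav.getframerate() != expected_rate:
            raise ValueError(f"sample rate must be {expected_rate} Hz")
        # Read every frame; this is the raw PCM payload without the WAV header.
        frames = wav.readframes(wav.getnframes())
    with open(raw_path, "wb") as raw:
        raw.write(frames)
```

The function performs no resampling: a file that does not already match the expected format is rejected rather than converted.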

\audio=play="filename.raw"\	Plays a sound in the foreground (synchronous mode). Example: \audio=play="filename.raw"\ I play a sound then I speak
\audio=mix="filename.raw"\	Plays the file in the background; speech synthesis continues while the sound is playing. Example: \audio=mix="filename.raw"\ I play the sound while I speak
\audio=offset=x\	Skips x milliseconds at the beginning of the sound. Example: \audio=play="filename.raw";offset=5000\ I play the sound from position 5000ms then I speak
\audio=pause\ \audio=resume\ \audio=stop\ \audio=play\	Pause / resume / stop / play (mix mode only). Example: \audio=mix="filename.raw"\ I play the music while I speak, then I put the background music on pause! \audio=pause\ Then I resume it! \audio=resume\ Finally, I stop it! \audio=stop\ finished.
continue	Makes a sound continue in the background (asynchronous mode). There must be a duration=timeduration or until=timeposition argument. It turns the foreground playing into background playing. Example: Please applaud! \audio=play="bravo.raw";duration=100;continue\ Thank you! This plays the sound for 100 milliseconds, then continues playing it in the background while saying “Thank you!”.
duration=timeduration	Plays the sound for timeduration milliseconds (play or mix commands) and then stops playing it. Example: \audio=play="mozart.raw";duration=2000\ Play song two seconds then speak and play again music for three seconds.\audio=play="mozart.raw";offset=2000;until=5000\
until=timeposition	Plays the sound until the position timeposition milliseconds within the sound is reached (play or mix commands) and then stops playing it. Example: \audio=play="mozart.raw";until=5000\ Play music 5 secs then speak
\audio=repeat=status\	When status is on, continuously repeats the foreground or background sound (play or mix command). When it is off, the sound is no longer repeated. If status is a non-negative integer N, the sound is replayed from the beginning N times as soon as its end is reached, in addition to the regular play of the sound. In the end, the sound is played N + 1 times. Example: \audio=mix="bravo.raw";repeat=on\I speak while they applaud!\audio=play;repeat=off\
\audio=volume=percentage\	Sets the volume of the sound to percentage % of its base level. 100 is the base level. Values below 100 reduce the volume; values above 100 raise it (which can lead to distortion and saturation). Example: \audio=play="bravo.raw";volume=100;\ I play the sound file at 100% volume. To fade the volume in or out, add a pause tag after each volume change. Example: \audio=volume=80\ \pau=1000\ \audio=volume=90\ \pau=1000\ \audio=volume=100\ \pau=1000\ \audio=volume=90\ \pau=1000\ \audio=volume=80\ \pau=1000\ \audio=volume=70\ \pau=1000\ \audio=volume=60\ \pau=1000\
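
The audio tags above follow a regular pattern: a command, an optional quoted file name, and semicolon-separated arguments. The Python sketch below builds such tags and generates the volume/pause sequence needed for a fade. The names audio_tag and volume_fade are hypothetical helpers for illustration, not part of any Acapela SDK; the step of 10 and the pause of 1000 ms simply mirror the fade example above.

```python
def audio_tag(command, filename=None, options=None):
    # command: e.g. "play", "mix", "pause", "resume", "stop".
    # options: dict of arguments; a value of True emits a bare flag such as "continue".
    head = command if filename is None else f'{command}="{filename}"'
    parts = [head]
    for key, value in (options or {}).items():
        parts.append(key if value is True else f"{key}={value}")
    return "\\audio=" + ";".join(parts) + "\\"

def volume_fade(start, end, step=10, pause_ms=1000):
    # Emit one volume tag plus one pause tag per level, from start to end inclusive.
    if step <= 0:
        raise ValueError("step must be positive")
    direction = 1 if end >= start else -1
    levels = list(range(start, end, direction * step)) + [end]
    return " ".join(f"\\audio=volume={lv}\\ \\pau={pause_ms}\\" for lv in levels)

print(audio_tag("play", "bravo.raw", {"duration": 100, "continue": True}))
print(audio_tag("mix", "bravo.raw", {"repeat": "on"}))
print(volume_fade(100, 60))
```

Passing the arguments as a dictionary rather than as keyword arguments is deliberate: continue is a reserved word in Python and cannot be written as a keyword argument.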