diff --git a/.gitignore b/.gitignore
index 9880e81..6ecd52f 100644
--- a/.gitignore
+++ b/.gitignore
@@ -27,3 +27,6 @@ dist.browser/

 # Sample
 .sample
+
+# Environment variables
+.env
diff --git a/README.md b/README.md
index c57f924..5f632e7 100644
--- a/README.md
+++ b/README.md
@@ -74,6 +74,32 @@ Sample speech markdown

 ```

+### SSML - Microsoft Azure
+
+Convert Speech Markdown to SSML for Microsoft Azure with automatic MSTTS namespace injection
+
+```js
+const smd = require('speechmarkdown-js');
+
+const markdown = `(This is exciting news!)[excited:"1.5"] The new features are here.`;
+const options = {
+  platform: 'microsoft-azure',
+};
+
+const speech = new smd.SpeechMarkdown();
+const ssml = speech.toSSML(markdown, options);
+```
+
+The resulting SSML is:
+
+```xml
+
+This is exciting news! The new features are here.
+
+```
+
+Azure supports 27 express-as styles including emotional styles (excited, disappointed, friendly, cheerful, sad, angry, etc.) and scenario-specific styles (newscaster, customerservice, chat, etc.). See [Azure platform documentation](./docs/platforms/azure.md) for complete details.
+
 ### Plain Text

 Convert Speech Markdown to Plain Text
diff --git a/azure-ssml.txt b/azure-ssml.txt
new file mode 100644
index 0000000..58f93ba
--- /dev/null
+++ b/azure-ssml.txt
@@ -0,0 +1,74 @@
+Full
+
+
+ Here are SSML samples.
+ I can pause .
+ I can play a sound
+ .
+ I can speak in cardinals. Your number is 10.
+ Or I can speak in ordinals. You are 10 in line.
+ Or I can even speak in digits. The digits for ten are 10.
+ I can also substitute phrases, like the W3C.
+ Finally, I can speak a paragraph with two sentences.
+
+
+
+dates
+
+
+
+ 1960-09-10
+
+
+
+expletive
+
+
+ censor this
+
+
+Audio attachment
+
+
+
+
+
+Marks
+
+
+Go from here, to there!
+
+
+
+Prosody
+
+Can you hear me now?
+
+Emphasis
+
+This is an important announcement
+
+
+IPA
+
+ manitoba
+ mahogany
+
+
+Voice tags
+
+And then she asked, qu'est-ce qui
+t'amène ici in her sweet and gentle voice.
+
+Langs in a speak
+
+
+The french word for cat is chat
+
+
+Style
+
+Hello I'm so happy today!
diff --git a/docs/platforms/azure.md b/docs/platforms/azure.md
index cdebde1..677cc1e 100644
--- a/docs/platforms/azure.md
+++ b/docs/platforms/azure.md
@@ -3,23 +3,240 @@
 ## Official resources

 - [SSML structure reference](https://learn.microsoft.com/azure/ai-services/speech-service/speech-synthesis-markup-structure)
+- [Voice and sound with SSML](https://learn.microsoft.com/azure/ai-services/speech-service/speech-synthesis-markup-voice)
 - [Voice gallery](https://learn.microsoft.com/azure/ai-services/speech-service/language-support?tabs=tts)

 ## Speech Markdown formatter coverage

-Speech Markdown's `microsoft-azure` formatter layers Azure-specific behaviour on top of the shared SSML mapping:
+The `microsoft-azure` formatter supports Azure Text-to-Speech features including automatic MSTTS namespace injection and neural voice styles.

-- **Say-as conversions.** Speech Markdown forwards modifiers such as `address`, `fraction`, `ordinal`, `telephone`, `number`, and `characters` to `<say-as>` while automatically choosing `cardinal` or `digits` for numeric text.【F:src/formatters/MicrosoftAzureSsmlFormatter.ts†L9-L48】
-- **Dates and times.** The formatter emits `<say-as interpret-as="date">` and `<say-as interpret-as="time">` with Azure's default `ymd` and `hms12` formats when no explicit format is supplied.【F:src/formatters/MicrosoftAzureSsmlFormatter.ts†L49-L58】
-- **Pronunciation helpers.** `sub` and `ipa` modifiers become `<sub>` and `<phoneme>`, letting authors control pronunciation directly from Speech Markdown.【F:src/formatters/MicrosoftAzureSsmlFormatter.ts†L59-L66】
-- **Prosody and whispering.** Rate, pitch, and volume modifiers augment `<prosody>` tags, and the `whisper` modifier approximates whispered delivery with `volume="x-soft"` and `rate="slow"` settings as recommended by Microsoft.【F:src/formatters/MicrosoftAzureSsmlFormatter.ts†L22-L27】【F:src/formatters/MicrosoftAzureSsmlFormatter.ts†L67-L75】
-- **Voice and style selection.** Inline `voice` modifiers add `<voice>` tags, and the section-level `newscaster` modifier wraps content in `<mstts:express-as style="newscaster">` so maintainers can target Azure's neural styles.【F:src/formatters/MicrosoftAzureSsmlFormatter.ts†L23-L27】【F:src/formatters/MicrosoftAzureSsmlFormatter.ts†L76-L103】
+### SSML Element Support Matrix

-### Unsupported or manual features
+The following table shows which Azure SSML elements are supported by Speech Markdown:

-- The formatter explicitly disables Azure-only constructs such as `emphasis`, `expletive`, `interjection`, and `unit`, so those modifiers currently do not produce SSML output.【F:src/formatters/MicrosoftAzureSsmlFormatter.ts†L8-L17】
-- Additional expressive behaviours—including `excited`, `disappointed`, and other MSTTS styles—remain unmapped because the shared SSML base leaves those modifiers set to `null` pending future design work.【F:src/formatters/SsmlFormatterBase.ts†L63-L86】
+| SSML Element | Status | Speech Markdown Syntax | Notes |
+| --------------------------------- | ---------------- | ----------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------- |
+| **Core W3C SSML** |
+| `<speak>` | ✅ Full | Automatic | Root element with automatic `xmlns:mstts` injection when needed |
+| `<voice>` | ✅ Full | `(text)[voice:"name"]` or `#[voice:"name"]` | Voice selection and switching |
+| `<lang>` | ✅ Full | `(text)[lang:"locale"]` or `#[lang:"locale"]` | Language/accent switching |
+| `<p>` | ✅ Full | Automatic (optional) | Paragraph tags via `includeParagraphTag` option |
+| `<s>` | ❌ Not supported | N/A | Sentence tags not implemented |
+| `<break>` | ✅ Full | `[break:"time"]` or `[break:"strength"]` | Pauses with time or strength |
+| `<prosody>` | ✅ Full | `(text)[rate:"value"]`, `[pitch:"value"]`, `[volume:"value"]` | Rate, pitch, volume control |
+| `<say-as>` | ✅ Partial | `(text)[address]`, `[number]`, `[ordinal]`, `[telephone]`, `[fraction]`, `[date:"format"]`, `[time:"format"]`, `[characters]` | Interpret-as types supported |
+| `<phoneme>` | ✅ Full | `(text)[ipa:"pronunciation"]` | IPA pronunciation |
+| `<sub>` | ✅ Full | `(text)[sub:"alias"]` | Text substitution |
+| `<emphasis>` | ✅ Full | `++text++` (moderate), `+text+` (strong), `--text--` (reduced), `-text-` (none) | Word-level stress with 4 levels |
+| `