Merged
18 commits
c4b3269
feat: Add automatic Azure SSML namespace injection for MSTTS tags
willwade Oct 30, 2025
7a2f6cb
feat: Add support for 26 additional Azure MSTTS express-as styles
willwade Oct 30, 2025
f130979
docs: Update Azure documentation with comprehensive MSTTS feature cov…
willwade Oct 30, 2025
a2e6a92
docs: Add comprehensive feature comparison between Azure and other pl…
willwade Oct 30, 2025
492a619
feat: Add grammar support for all 27 Azure MSTTS express-as styles
willwade Oct 30, 2025
25a97b7
chore: Remove development test script
willwade Oct 30, 2025
9b2dac7
feat: Add 6 missing Azure MSTTS styles and lang support
willwade Nov 1, 2025
21aa842
docs: Add comprehensive SSML element support matrix for Azure
willwade Nov 1, 2025
6a2d6e1
feat: Enable emphasis and bookmark support for Azure
willwade Nov 1, 2025
1e6cde7
feat: Add support for Azure express-as role attribute with multiple a…
willwade Nov 1, 2025
58ffd26
fix: Apply Prettier formatting and fix linting issues
willwade Nov 1, 2025
1b8e6c3
docs: Update Azure documentation - remove outdated role section and r…
willwade Nov 2, 2025
8235e6c
feat: Update voice data structure with id, displayName, and languages…
willwade Nov 2, 2025
b55339e
feat: Voice lookup by display name or ID with automatic ID resolution
willwade Nov 2, 2025
2b72811
test: Add comprehensive Azure SSML test suite
willwade Nov 2, 2025
6b5b18f
fix: Update Azure comprehensive tests to use standard neural voices
willwade Nov 2, 2025
93d6db2
feat: Add support for Azure HD voices with dash-to-colon conversion
willwade Nov 2, 2025
5cb2ad5
feat: Add Google comprehensive test suite and google:style tag support
willwade Nov 2, 2025
3 changes: 3 additions & 0 deletions .gitignore
@@ -27,3 +27,6 @@ dist.browser/

# Sample
.sample

# Environment variables
.env
26 changes: 26 additions & 0 deletions README.md
@@ -74,6 +74,32 @@ Sample <break time="3s"/> speech <break time="250ms"/> markdown
</speak>
```

### SSML - Microsoft Azure

Convert Speech Markdown to SSML for Microsoft Azure with automatic MSTTS namespace injection

```js
const smd = require('speechmarkdown-js');

const markdown = `(This is exciting news!)[excited:"1.5"] The new features are here.`;
const options = {
platform: 'microsoft-azure',
};

const speech = new smd.SpeechMarkdown();
const ssml = speech.toSSML(markdown, options);
```

The resulting SSML is:

```xml
<speak xmlns:mstts="https://www.w3.org/2001/mstts">
<mstts:express-as style="excited" styledegree="1.5">This is exciting news!</mstts:express-as> The new features are here.
</speak>
```

Azure supports 27 express-as styles, including emotional styles (such as excited, disappointed, friendly, cheerful, sad, and angry) and scenario-specific styles (such as newscaster, customerservice, and chat). See the [Azure platform documentation](./docs/platforms/azure.md) for complete details.
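
To make the "automatic MSTTS namespace injection" mentioned above concrete, here is a minimal sketch of one way such a step could work. The helper name `injectMsttsNamespace` and its regex-based approach are assumptions for illustration only; they are not taken from the library's source, which may do this differently.

```js
// Hypothetical helper (not the library's actual implementation):
// declare the mstts namespace on <speak> only when an mstts: tag is used.
const MSTTS_NS = 'https://www.w3.org/2001/mstts';

function injectMsttsNamespace(ssml) {
  // Nothing to do when no mstts-prefixed element appears.
  if (!/<mstts:/.test(ssml)) {
    return ssml;
  }
  // Leave the document alone when the namespace is already declared.
  if (/<speak\b[^>]*\bxmlns:mstts=/.test(ssml)) {
    return ssml;
  }
  // Append the declaration to the first <speak ...> opening tag,
  // keeping any attributes that are already there.
  return ssml.replace(
    /<speak\b([^>]*)>/,
    `<speak$1 xmlns:mstts="${MSTTS_NS}">`,
  );
}

const input =
  '<speak><mstts:express-as style="excited">Hi</mstts:express-as></speak>';
console.log(injectMsttsNamespace(input));
```

The sketch leaves SSML without `mstts:` tags untouched, so output for other elements stays the same, and running it twice does not add a second declaration.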

### Plain Text

Convert Speech Markdown to Plain Text
74 changes: 74 additions & 0 deletions azure-ssml.txt
@@ -0,0 +1,74 @@
Full

<speak>
Here are <say-as interpret-as="characters">SSML</say-as> samples.
I can pause <break time="3s"/>.
I can play a sound
<audio src="https://www.example.com/MY_MP3_FILE.mp3">didn't get your MP3 audio file</audio>.
I can speak in cardinals. Your number is <say-as interpret-as="cardinal">10</say-as>.
Or I can speak in ordinals. You are <say-as interpret-as="ordinal">10</say-as> in line.
Or I can even speak in digits. The digits for ten are <say-as interpret-as="characters">10</say-as>.
I can also substitute phrases, like the <sub alias="World Wide Web Consortium">W3C</sub>.
Finally, I can speak a paragraph with two sentences.
</speak>


dates

<speak>
<say-as interpret-as="date" format="yyyymmdd" detail="1">
1960-09-10
</say-as>
</speak>

expletive

<speak>
<say-as interpret-as="expletive">censor this</say-as>
</speak>

Audio attachment

<speak>
<audio src="cat_purr_close.ogg">
<desc>a cat purring</desc>
PURR (sound didn't load)
</audio>
</speak>

Marks

<speak>
Go from <mark name="here"/> here, to <mark name="there"/> there!
</speak>


Prosody

<prosody rate="slow" pitch="-2st">Can you hear me now?</prosody>

Emphasis

<emphasis level="moderate">This is an important announcement</emphasis>


IPA

<phoneme alphabet="ipa" ph="ˌmænɪˈtoʊbə">manitoba</phoneme>
<phoneme alphabet="x-sampa" ph='m@"hA:g@%ni:'>mahogany</phoneme>


Voice tags

<speak>And then she asked, <voice language="fr-FR" gender="female">qu'est-ce qui
t'amène ici</voice><break time="250ms"/> in her sweet and gentle voice.</speak>

Langs in a speak


<speak>The French word for cat is <lang xml:lang="fr-FR">chat</lang></speak>


Style

<speak><google:style name="lively">Hello I'm so happy today!</google:style></speak>
239 changes: 228 additions & 11 deletions docs/platforms/azure.md

Large diffs are not rendered by default.

74 changes: 74 additions & 0 deletions google-ssml-examples.txt
@@ -0,0 +1,74 @@
Full

<speak>
Here are <say-as interpret-as="characters">SSML</say-as> samples.
I can pause <break time="3s"/>.
I can play a sound
<audio src="https://www.example.com/MY_MP3_FILE.mp3">didn't get your MP3 audio file</audio>.
I can speak in cardinals. Your number is <say-as interpret-as="cardinal">10</say-as>.
Or I can speak in ordinals. You are <say-as interpret-as="ordinal">10</say-as> in line.
Or I can even speak in digits. The digits for ten are <say-as interpret-as="characters">10</say-as>.
I can also substitute phrases, like the <sub alias="World Wide Web Consortium">W3C</sub>.
Finally, I can speak a paragraph with two sentences.
</speak>


dates

<speak>
<say-as interpret-as="date" format="yyyymmdd" detail="1">
1960-09-10
</say-as>
</speak>

expletive

<speak>
<say-as interpret-as="expletive">censor this</say-as>
</speak>

Audio attachment

<speak>
<audio src="cat_purr_close.ogg">
<desc>a cat purring</desc>
PURR (sound didn't load)
</audio>
</speak>

Marks

<speak>
Go from <mark name="here"/> here, to <mark name="there"/> there!
</speak>


Prosody

<prosody rate="slow" pitch="-2st">Can you hear me now?</prosody>

Emphasis

<emphasis level="moderate">This is an important announcement</emphasis>


IPA

<phoneme alphabet="ipa" ph="ˌmænɪˈtoʊbə">manitoba</phoneme>
<phoneme alphabet="x-sampa" ph='m@"hA:g@%ni:'>mahogany</phoneme>


Voice tags

<speak>And then she asked, <voice language="fr-FR" gender="female">qu'est-ce qui
t'amène ici</voice><break time="250ms"/> in her sweet and gentle voice.</speak>

Langs in a speak


<speak>The French word for cat is <lang xml:lang="fr-FR">chat</lang></speak>


Style

<speak><google:style name="lively">Hello I'm so happy today!</google:style></speak>
66 changes: 65 additions & 1 deletion scripts/update-voice-data.js
@@ -183,19 +183,58 @@ async function updateAzureVoices() {
}

const voiceMap = {};
const displayNameCollisions = {};

for (const voice of data) {
const name = (voice.ShortName || voice.Name || '').trim();
const locale = (voice.Locale || '').trim();
const displayName = voice.DisplayName || voice.LocalName || name;

if (!name) {
continue;
}

- voiceMap[name.toLowerCase()] = {
+ const voiceEntry = {
voice: {
name,
},
id: name,
displayName,
locale,
};

// Add entry by voice ID (e.g., "en-us-jennyneural")
voiceMap[name.toLowerCase()] = voiceEntry;

// Also add entry by display name (e.g., "jenny") for easier lookup
// Only add if display name is different from the voice ID
const displayNameKey = displayName.toLowerCase();
if (displayNameKey !== name.toLowerCase()) {
if (!voiceMap[displayNameKey]) {
voiceMap[displayNameKey] = voiceEntry;
} else {
// Track collisions for debugging
if (!displayNameCollisions[displayNameKey]) {
displayNameCollisions[displayNameKey] = [];
}
displayNameCollisions[displayNameKey].push(name);
}
}
}

// Log collisions if any
const collisionKeys = Object.keys(displayNameCollisions);
if (collisionKeys.length > 0) {
console.log(
`[azure] ${collisionKeys.length} display name collisions (not added as aliases):`,
);
collisionKeys.slice(0, 5).forEach((key) => {
console.log(
` "${key}": ${displayNameCollisions[key].slice(0, 3).join(', ')}${
displayNameCollisions[key].length > 3 ? '...' : ''
}`,
);
});
}

writeFormatterVoiceModule('microsoftAzureVoices.ts', [
@@ -229,6 +268,10 @@ async function updateGoogleVoices() {

for (const voice of voices) {
const name = (voice.name || '').trim();
const languageCodes =
voice.languageCodes && Array.isArray(voice.languageCodes)
? voice.languageCodes
: [];

if (!name) {
continue;
@@ -238,6 +281,8 @@ async function updateGoogleVoices() {
voice: {
name,
},
id: name,
languages: languageCodes,
};
}

@@ -282,6 +327,7 @@ async function updateWatsonVoices() {

for (const voice of voices) {
const name = (voice.name || '').trim();
const language = (voice.language || '').trim();

if (!name) {
continue;
@@ -291,6 +337,8 @@ async function updateWatsonVoices() {
voice: {
name,
},
id: name,
language,
};
}

@@ -456,10 +504,26 @@ async function updatePollyVoices() {
}

const key = id.toLowerCase();
const languageCodes = [];

if (voice.LanguageCode) {
languageCodes.push(voice.LanguageCode);
}

if (
voice.AdditionalLanguageCodes &&
Array.isArray(voice.AdditionalLanguageCodes)
) {
languageCodes.push(...voice.AdditionalLanguageCodes);
}

const entry = {
voice: {
name: id,
},
id,
displayName: voice.Name || id,
languages: languageCodes,
};

allVoices[key] = entry;
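
The display-name aliasing added to `updateAzureVoices` in this diff (index each voice by ID, add a display-name alias only when the key is free, and record collisions) can be sketched in isolation as follows. This is a minimal sketch, not the script itself: `buildVoiceMap`, `resolveVoiceId`, and the sample records are hypothetical names and illustrative data, and the null-prototype objects are a choice made here rather than something the diff does.

```js
// Illustrative records only; they are not fetched from the Azure voices API.
const sampleVoices = [
  { ShortName: 'en-US-JennyNeural', DisplayName: 'Jenny', Locale: 'en-US' },
  { ShortName: 'en-GB-RyanNeural', DisplayName: 'Ryan', Locale: 'en-GB' },
  { ShortName: 'xx-XX-RyanNeural', DisplayName: 'Ryan', Locale: 'xx-XX' },
];

// Hypothetical helper mirroring the aliasing logic shown in the diff.
function buildVoiceMap(voices) {
  // Null-prototype objects avoid accidental hits on keys like "constructor".
  const voiceMap = Object.create(null);
  const collisions = Object.create(null);

  for (const voice of voices) {
    const name = (voice.ShortName || '').trim();
    if (!name) {
      continue;
    }
    const displayName = voice.DisplayName || name;
    const entry = {
      voice: { name },
      id: name,
      displayName,
      locale: (voice.Locale || '').trim(),
    };

    // Always index by the voice ID.
    voiceMap[name.toLowerCase()] = entry;

    // Add the display-name alias only when it differs from the ID.
    const alias = displayName.toLowerCase();
    if (alias === name.toLowerCase()) {
      continue;
    }
    if (!voiceMap[alias]) {
      voiceMap[alias] = entry; // first voice to claim the alias wins
    } else {
      (collisions[alias] = collisions[alias] || []).push(name);
    }
  }
  return { voiceMap, collisions };
}

// Hypothetical lookup: accept either an ID or a display name, return the ID.
function resolveVoiceId(voiceMap, nameOrId) {
  const entry = voiceMap[String(nameOrId).trim().toLowerCase()];
  return entry ? entry.id : undefined;
}

const { voiceMap, collisions } = buildVoiceMap(sampleVoices);
console.log(resolveVoiceId(voiceMap, 'Jenny'));
console.log(resolveVoiceId(voiceMap, 'Ryan'));
console.log(collisions);
```

Because the first voice to claim an alias keeps it, a shared display name resolves to whichever voice appears earliest in the input; later voices remain reachable by their full ID.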