SwiftEdgeTTS

A Swift Package for Microsoft Edge Text-to-Speech (TTS) API integration. This package provides a clean, simple interface to generate high-quality audio files from text using Edge-TTS without any Python dependencies.

Features

✅ Pure Swift implementation - no external dependencies or Python required
✅ Simple, clean API - easy-to-use async/await interface
✅ 400+ neural voices across 100+ languages and locales
✅ High-quality MP3 audio output (24 kHz, 48 kbit/s)
✅ Automatic SSML generation with proper XML escaping
✅ Adjustable speech parameters - rate, volume, and pitch control
✅ Thread-safe token caching and clock synchronization
✅ iOS 15+ and macOS 12+ support
✅ Aligned with edge-tts 7.2.7 WebSocket handshake behavior
✅ Comprehensive error handling with detailed error types

Installation

Swift Package Manager

Add the package to your Package.swift:

dependencies: [
    .package(url: "https://github.com/herrkaefer/SwiftEdgeTTS.git", from: "1.3.0")
]

Or in Xcode:

File → Add Packages...
Enter the repository URL: https://github.com/herrkaefer/SwiftEdgeTTS.git
Select the version or branch
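
For the Package.swift route, a complete manifest might look like the sketch below. The target name MyApp is hypothetical, and the product name SwiftEdgeTTS is an assumption based on the module name used in `import SwiftEdgeTTS`; adjust both to your project.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "MyApp", // hypothetical package name
    // Matches the minimum platforms listed under Requirements.
    platforms: [.iOS(.v15), .macOS(.v12)],
    dependencies: [
        .package(url: "https://github.com/herrkaefer/SwiftEdgeTTS.git", from: "1.3.0")
    ],
    targets: [
        .executableTarget(
            name: "MyApp", // hypothetical target name
            dependencies: [
                // Assumed product name; it is taken from the imported module name.
                .product(name: "SwiftEdgeTTS", package: "SwiftEdgeTTS")
            ]
        )
    ]
)
```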

Versioning

Swift Package versioning is managed via Git tags (SemVer), not in Package.swift.

Existing release tags in this repository include: 1.0.0, v1.1.0, v1.2.0
Current release tag: v1.3.0

Quick Start

Basic Usage

import SwiftEdgeTTS

// Create a TTS service instance
let ttsService = EdgeTTSService()

// Synthesize text to audio file
let outputURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("output.mp3")

do {
    let audioURL = try await ttsService.synthesize(
        text: "Hello, world!",
        voice: "en-US-JennyNeural",
        outputURL: outputURL
    )
    print("Audio saved to: \(audioURL.path)")
} catch {
    print("Error: \(error)")
}

Adjusting Speech Parameters

You can customize the speech rate, volume, and pitch:

// Slower speech rate
let slowerURL = try await ttsService.synthesize(
    text: "Hello, world!",
    voice: "en-US-JennyNeural",
    outputURL: outputURL,
    rate: "-50%",      // 50% slower
    volume: nil,
    pitch: nil
)

// Lower volume
let quieterURL = try await ttsService.synthesize(
    text: "Hello, world!",
    voice: "en-US-JennyNeural",
    outputURL: outputURL,
    rate: nil,
    volume: "-50%",    // 50% quieter
    pitch: nil
)

// Lower pitch
let lowerPitchURL = try await ttsService.synthesize(
    text: "Hello, world!",
    voice: "en-US-JennyNeural",
    outputURL: outputURL,
    rate: nil,
    volume: nil,
    pitch: "-50Hz"     // 50Hz lower pitch
)

// All parameters together
let combinedURL = try await ttsService.synthesize(
    text: "Hello, world!",
    voice: "en-US-JennyNeural",
    outputURL: outputURL,
    rate: "+25%",      // 25% faster
    volume: "+10%",    // 10% louder
    pitch: "+20Hz"     // 20Hz higher pitch
)

Batch Synthesis

Generate multiple audio files at once:

let texts = [
    "First sentence.",
    "Second sentence.",
    "Third sentence."
]

let outputDirectory = FileManager.default.temporaryDirectory
    .appendingPathComponent("audio")

let results = try await ttsService.synthesizeMultiple(
    texts: texts,
    voice: "en-US-JennyNeural",
    outputDirectory: outputDirectory
)

// With prosody parameters
let slowerResults = try await ttsService.synthesizeMultiple(
    texts: texts,
    voice: "en-US-JennyNeural",
    outputDirectory: outputDirectory,
    rate: "-30%",      // Slower speech rate
    volume: nil,
    pitch: nil
)

// Process results (nil indicates a failed synthesis)
for (index, url) in results.enumerated() {
    if let url = url {
        print("File \(index + 1) saved: \(url.path)")
    } else {
        print("File \(index + 1) failed to generate")
    }
}
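
When a batch is large, it can help to separate the successes from the failures so that only the failed inputs are retried. The helper below is a minimal sketch and not part of the package: partitionBatchResults is a hypothetical name, and it assumes the returned array is in the same order as the input texts, as the index-based loop above suggests.

```swift
import Foundation

/// Splits a batch result into the URLs that were written and the
/// indices of the inputs whose entry is nil (failed synthesis).
func partitionBatchResults(_ results: [URL?]) -> (succeeded: [URL], failedIndices: [Int]) {
    var succeeded: [URL] = []
    var failedIndices: [Int] = []
    for (index, url) in results.enumerated() {
        if let url = url {
            succeeded.append(url)
        } else {
            failedIndices.append(index)
        }
    }
    return (succeeded, failedIndices)
}

// Stand-in data shaped like the [URL?] returned by synthesizeMultiple.
let sampleResults: [URL?] = [
    URL(fileURLWithPath: "/tmp/audio/0.mp3"),
    nil,
    URL(fileURLWithPath: "/tmp/audio/2.mp3")
]
let outcome = partitionBatchResults(sampleResults)
print("\(outcome.succeeded.count) succeeded, retry indices: \(outcome.failedIndices)")
```

The failed indices can then be mapped back to the original texts array to build a smaller retry batch.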

Get Available Voices

Discover available voices and filter by language:

let voices = try await ttsService.getAvailableVoices()

// Filter by language
let chineseVoices = voices.filter { $0.locale.hasPrefix("zh") }
let englishVoices = voices.filter { $0.locale.hasPrefix("en") }

// Print voice information
for voice in chineseVoices.prefix(5) {
    print("\(voice.name) - \(voice.locale) - \(voice.gender)")
}
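
To present voices in a picker, grouping them by locale is often more convenient than filtering one language at a time. The sketch below uses a hypothetical VoiceInfo struct as a stand-in for EdgeTTSVoice so that it compiles on its own; it assumes name, locale, and gender are plain strings and that name holds the full identifier, which the README does not confirm.

```swift
// Hypothetical stand-in for EdgeTTSVoice, limited to the fields used above.
struct VoiceInfo {
    let name: String
    let locale: String
    let gender: String
}

/// Groups voices by locale and sorts the voice names within each locale.
func voiceNamesByLocale(_ voices: [VoiceInfo]) -> [String: [String]] {
    Dictionary(grouping: voices, by: { $0.locale })
        .mapValues { group in group.map { $0.name }.sorted() }
}

let sampleVoices = [
    VoiceInfo(name: "en-US-JennyNeural", locale: "en-US", gender: "Female"),
    VoiceInfo(name: "en-US-GuyNeural", locale: "en-US", gender: "Male"),
    VoiceInfo(name: "ja-JP-NanamiNeural", locale: "ja-JP", gender: "Female")
]
let grouped = voiceNamesByLocale(sampleVoices)

// Print locales in a stable order, since dictionary order is unspecified.
for locale in grouped.keys.sorted() {
    print("\(locale): \(grouped[locale, default: []].joined(separator: ", "))")
}
```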

Multi-language Examples

// Chinese (Mandarin)
try await ttsService.synthesize(
    text: "你好，世界！",
    voice: "zh-CN-XiaoxiaoNeural",
    outputURL: chineseURL
)

// Japanese
try await ttsService.synthesize(
    text: "こんにちは、世界！",
    voice: "ja-JP-NanamiNeural",
    outputURL: japaneseURL
)

// Spanish
try await ttsService.synthesize(
    text: "Hola, mundo.",
    voice: "es-ES-ElviraNeural",
    outputURL: spanishURL
)

// French
try await ttsService.synthesize(
    text: "Bonjour, le monde.",
    voice: "fr-FR-DeniseNeural",
    outputURL: frenchURL
)

// German
try await ttsService.synthesize(
    text: "Hallo, Welt.",
    voice: "de-DE-KatjaNeural",
    outputURL: germanURL
)

Audio Demos

Language	Voice	Demo
English	`en-US-JennyNeural`	Play
Chinese (Mandarin)	`zh-CN-XiaoxiaoNeural`	Play
Japanese	`ja-JP-NanamiNeural`	Play
Spanish	`es-ES-ElviraNeural`	Play
French	`fr-FR-DeniseNeural`	Play
German	`de-DE-KatjaNeural`	Play
Korean	`ko-KR-SunHiNeural`	Play
Russian	`ru-RU-SvetlanaNeural`	Play

Edge-TTS Compatibility

This package's current synthesis handshake and headers are aligned with Python edge-tts 7.2.7.

API Reference

`EdgeTTSClient` Protocol

protocol EdgeTTSClient {
    func synthesize(text: String, voice: String, outputURL: URL, rate: String?, volume: String?, pitch: String?) async throws -> URL
    func synthesizeMultiple(texts: [String], voice: String, outputDirectory: URL, rate: String?, volume: String?, pitch: String?) async throws -> [URL?]
    func getAvailableVoices() async throws -> [EdgeTTSVoice]
}

Prosody Parameters:

rate: Optional speech rate adjustment (e.g., "+50%", "-50%"). Default is "+0%"
volume: Optional volume adjustment (e.g., "+50%", "-50%"). Default is "+0%"
pitch: Optional pitch adjustment (e.g., "+50Hz", "-50Hz"). Default is "+0Hz"

For backward compatibility, a protocol extension provides convenience methods that omit the prosody parameters and fall back to the default values listed above.
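
The pattern behind such convenience methods can be sketched as follows. This is an illustration only: ProsodyResolving and DefaultingResolver are hypothetical, synchronous stand-ins rather than the package's types, and only the default values are taken from the list above.

```swift
// Hypothetical, synchronous stand-in for a protocol whose main
// requirement takes optional prosody parameters.
protocol ProsodyResolving {
    func resolve(rate: String?, volume: String?, pitch: String?) -> [String: String]
}

extension ProsodyResolving {
    /// Convenience method without prosody arguments. It forwards nil
    /// for each parameter so that the conforming type applies defaults.
    func resolve() -> [String: String] {
        resolve(rate: nil, volume: nil, pitch: nil)
    }
}

struct DefaultingResolver: ProsodyResolving {
    func resolve(rate: String?, volume: String?, pitch: String?) -> [String: String] {
        // Defaults as documented: "+0%" for rate and volume, "+0Hz" for pitch.
        [
            "rate": rate ?? "+0%",
            "volume": volume ?? "+0%",
            "pitch": pitch ?? "+0Hz"
        ]
    }
}

let resolver = DefaultingResolver()
print(resolver.resolve())
print(resolver.resolve(rate: "-30%", volume: nil, pitch: nil))
```

Because the short method lives in a protocol extension, every conforming type gets it without extra code, and existing call sites that never passed prosody arguments keep compiling.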

`EdgeTTSService`

The default implementation of EdgeTTSClient.

let client = EdgeTTSService()

Error Handling

The package uses EdgeTTSError for error handling:

enum EdgeTTSError: Error {
    case synthesisFailed
    case invalidVoice
    case networkError(Error)
    case invalidResponse
    case fileWriteFailed(Error)
}

Example:

do {
    let audioURL = try await client.synthesize(
        text: "Hello",
        voice: "en-US-JennyNeural",
        outputURL: outputURL
    )
} catch EdgeTTSError.synthesisFailed {
    print("Synthesis failed")
} catch EdgeTTSError.networkError(let error) {
    print("Network error: \(error)")
} catch {
    print("Unknown error: \(error)")
}
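
If the same error handling is needed in several places, the catch clauses can be folded into one helper that turns any error into a message. The sketch below redeclares the enum exactly as listed above so that it compiles standalone; userMessage(for:) is a hypothetical helper, and the wording of each message is illustrative rather than a statement of when the package throws each case.

```swift
import Foundation

// Local copy of the enum shown above, so the sketch is self-contained.
enum EdgeTTSError: Error {
    case synthesisFailed
    case invalidVoice
    case networkError(Error)
    case invalidResponse
    case fileWriteFailed(Error)
}

/// Hypothetical helper: maps any thrown error to a user-facing message.
func userMessage(for error: Error) -> String {
    guard let ttsError = error as? EdgeTTSError else {
        return "Unexpected error: \(error)"
    }
    switch ttsError {
    case .synthesisFailed:
        return "Audio synthesis failed."
    case .invalidVoice:
        return "The voice identifier was not accepted."
    case .networkError(let underlying):
        return "Network error: \(underlying.localizedDescription)"
    case .invalidResponse:
        return "The service returned an unexpected response."
    case .fileWriteFailed(let underlying):
        return "Could not write the audio file: \(underlying.localizedDescription)"
    }
}

print(userMessage(for: EdgeTTSError.invalidVoice))
```

The switch has no default branch, so the compiler flags this helper if a new case is ever added to the enum.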

Speech Parameters

Rate, Volume, and Pitch

You can adjust the speech rate, volume, and pitch using the optional parameters:

Rate: Controls speech speed. Format: "+X%" (faster) or "-X%" (slower)
- Examples: "+50%" (50% faster), "-50%" (50% slower)
Volume: Controls speech volume. Format: "+X%" (louder) or "-X%" (quieter)
- Examples: "+50%" (50% louder), "-50%" (50% quieter)
Pitch: Controls voice pitch. Format: "+XHz" (higher) or "-XHz" (lower)
- Examples: "+50Hz" (50Hz higher), "-50Hz" (50Hz lower)
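
Since these values are plain strings, a typo such as "50%" (missing sign) is easy to make. A caller-side check along the following lines can catch it before a request is sent. This is a sketch under assumptions: the helper names are hypothetical, and it assumes a mandatory sign and an integer amount, which matches the formats above but may not match the package's own validation, if it performs any.

```swift
import Foundation

/// Returns true when the whole string matches the regular expression.
/// The pattern is expected to carry its own ^ and \z anchors.
func matchesWholeString(_ value: String, pattern: String) -> Bool {
    value.range(of: pattern, options: .regularExpression) != nil
}

/// Accepts "+X%" or "-X%" with an integer X (rate and volume).
func isValidPercent(_ value: String) -> Bool {
    matchesWholeString(value, pattern: "^[+-][0-9]+%\\z")
}

/// Accepts "+XHz" or "-XHz" with an integer X (pitch).
func isValidPitch(_ value: String) -> Bool {
    matchesWholeString(value, pattern: "^[+-][0-9]+Hz\\z")
}

for candidate in ["+25%", "-50%", "50%", "+20Hz"] {
    print("\(candidate): percent=\(isValidPercent(candidate)) pitch=\(isValidPitch(candidate))")
}
```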

These parameters are optional and default to "+0%" (rate/volume) or "+0Hz" (pitch) if not specified. All parameters use SSML prosody tags internally, similar to the Python edge-tts library.
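
To make the role of the prosody tag concrete, the sketch below builds an SSML document with escaped text. It is modeled on the general shape of SSML used by edge-tts and is not the package's actual generator: escapeXML and makeSSML are hypothetical names, and the exact attributes the package emits (including the fixed xml:lang used here) are assumptions.

```swift
import Foundation

/// Escapes the five XML special characters. The ampersand is replaced
/// first so that the entities produced by later steps are not re-escaped.
func escapeXML(_ text: String) -> String {
    text
        .replacingOccurrences(of: "&", with: "&amp;")
        .replacingOccurrences(of: "<", with: "&lt;")
        .replacingOccurrences(of: ">", with: "&gt;")
        .replacingOccurrences(of: "\"", with: "&quot;")
        .replacingOccurrences(of: "'", with: "&apos;")
}

/// Wraps escaped text in voice and prosody elements, using the documented
/// defaults when no adjustment is given.
func makeSSML(
    text: String,
    voice: String,
    rate: String = "+0%",
    volume: String = "+0%",
    pitch: String = "+0Hz"
) -> String {
    let header = "<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'>"
    let voiceOpen = "<voice name='\(voice)'>"
    let prosodyOpen = "<prosody pitch='\(pitch)' rate='\(rate)' volume='\(volume)'>"
    return header + voiceOpen + prosodyOpen + escapeXML(text) + "</prosody></voice></speak>"
}

print(makeSSML(text: "Tom & Jerry", voice: "en-US-JennyNeural", rate: "-50%"))
```

Without the escaping step, an ampersand or angle bracket in the input text would produce malformed XML.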

Voice Selection

Voices follow the format: {locale}-{VoiceName}Neural
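
Splitting an identifier back into its parts can be useful for display or grouping. The following is a minimal sketch, assuming every identifier ends in Neural and that the last hyphen separates the locale from the voice name; VoiceIdentifier and parseVoiceIdentifier are hypothetical names, not part of the package.

```swift
/// The two parts of an identifier such as "en-US-JennyNeural".
struct VoiceIdentifier: Equatable {
    let locale: String
    let voiceName: String
}

/// Parses "{locale}-{VoiceName}Neural". Returns nil when the suffix is
/// missing or when either part would be empty.
func parseVoiceIdentifier(_ identifier: String) -> VoiceIdentifier? {
    let suffix = "Neural"
    guard identifier.hasSuffix(suffix),
          let lastHyphen = identifier.lastIndex(of: "-") else {
        return nil
    }
    // Everything before the last hyphen is treated as the locale.
    let locale = String(identifier[..<lastHyphen])
    // Everything after it is the voice name followed by the suffix.
    let nameWithSuffix = identifier[identifier.index(after: lastHyphen)...]
    let voiceName = String(nameWithSuffix.dropLast(suffix.count))
    guard !locale.isEmpty, !voiceName.isEmpty else {
        return nil
    }
    return VoiceIdentifier(locale: locale, voiceName: voiceName)
}

if let parsed = parseVoiceIdentifier("zh-CN-XiaoxiaoNeural") {
    print("locale: \(parsed.locale), voice: \(parsed.voiceName)")
}
```

Splitting at the last hyphen rather than the second one keeps the function usable for locales that contain more than two segments.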

Popular Voice Examples

English

en-US-JennyNeural - English (US), Female
en-US-GuyNeural - English (US), Male
en-GB-LibbyNeural - English (UK), Female
en-AU-NatashaNeural - English (Australia), Female

Chinese

zh-CN-XiaoxiaoNeural - Chinese (Mandarin), Female
zh-CN-YunjianNeural - Chinese (Mandarin), Male
zh-TW-HsiaoChenNeural - Chinese (Taiwan), Female
zh-HK-HiuGaaiNeural - Chinese (Hong Kong), Female

Japanese

ja-JP-NanamiNeural - Japanese, Female
ja-JP-KeitaNeural - Japanese, Male

Other Languages

es-ES-ElviraNeural - Spanish (Spain), Female
fr-FR-DeniseNeural - French, Female
de-DE-KatjaNeural - German, Female
ko-KR-SunHiNeural - Korean, Female
it-IT-ElsaNeural - Italian, Female
pt-BR-FranciscaNeural - Portuguese (Brazil), Female
ru-RU-SvetlanaNeural - Russian, Female

Use getAvailableVoices() to discover all available voices for your use case.

Error Handling

The package provides detailed error types for better error handling:

do {
    let audioURL = try await ttsService.synthesize(
        text: "Hello",
        voice: "en-US-JennyNeural",
        outputURL: outputURL
    )
} catch EdgeTTSError.synthesisFailed {
    print("Audio synthesis failed")
} catch EdgeTTSError.invalidVoice {
    print("Invalid voice identifier")
} catch EdgeTTSError.networkError(let error) {
    print("Network error: \(error.localizedDescription)")
} catch EdgeTTSError.fileWriteFailed(let error) {
    print("Failed to write file: \(error.localizedDescription)")
} catch {
    print("Unknown error: \(error)")
}

Custom URLSession

You can provide a custom URLSession for advanced configuration (proxy, timeouts, etc.):

let configuration = URLSessionConfiguration.default
configuration.timeoutIntervalForRequest = 30
configuration.timeoutIntervalForResource = 60

let customSession = URLSession(configuration: configuration)
let ttsService = EdgeTTSService(session: customSession)

Requirements

iOS 15.0+ / macOS 12.0+
Swift 5.9+

Acknowledgments

This work was inspired by the Python edge-tts library by rany2.

License

MIT License
