High-quality text-to-speech usually requires expensive cloud APIs or complex local model setup. Edge-TTS, created by rany2, takes a clever approach: it taps into Microsoft Edge’s built-in online TTS service, providing free access to hundreds of natural-sounding voices across dozens of languages.
The tool is a Python library and command-line program that turns text into audio files using the same neural voices as Microsoft Edge’s Read Aloud feature. With rate, volume, and pitch controls plus subtitle generation, it punches far above its weight as a free, open-source TTS solution.
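As a rough illustration of the command-line workflow, the sketch below assembles an edge-tts invocation as an argument list. It only builds the command and does not run it. The helper names `signed` and `build_command` are hypothetical. The flags (`--voice`, `--text`, `--write-media`, `--write-subtitles`, `--rate`, `--volume`, `--pitch`) are the ones documented by the project, but they are worth checking against `edge-tts --help` for the installed version.

```python
import shlex


def signed(value: int, unit: str) -> str:
    """Format an offset with an explicit sign, e.g. +20% or -5Hz."""
    return f"{value:+d}{unit}"


def build_command(text, voice, media_path, subtitles_path=None,
                  rate_percent=0, volume_percent=0, pitch_hz=0):
    """Return an argv list for the edge-tts CLI (nothing is executed here)."""
    argv = ["edge-tts", "--voice", voice, "--text", text,
            "--write-media", media_path]
    if subtitles_path:
        argv += ["--write-subtitles", subtitles_path]
    # The --flag=value form keeps a leading minus sign from being
    # mistaken for the start of another option.
    if rate_percent:
        argv.append(f"--rate={signed(rate_percent, '%')}")
    if volume_percent:
        argv.append(f"--volume={signed(volume_percent, '%')}")
    if pitch_hz:
        argv.append(f"--pitch={signed(pitch_hz, 'Hz')}")
    return argv


cmd = build_command("Hello, world!", "en-US-AriaNeural", "hello.mp3",
                    subtitles_path="hello.srt", rate_percent=-10, pitch_hz=5)
print(shlex.join(cmd))  # shell-quoted form, handy for copy and paste
```

With the package installed and a network connection available, the list could be passed to `subprocess.run(cmd, check=True)`.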
Voice and Language Support
| Language | Male Voices | Female Voices | Quality |
|---|---|---|---|
| English (US) | 8 | 10 | Neural high |
| English (UK) | 5 | 6 | Neural high |
| Chinese (Mandarin) | 4 | 5 | Neural high |
| Japanese | 3 | 4 | Neural high |
| Spanish | 4 | 5 | Neural high |
| French | 3 | 4 | Neural high |
| German | 3 | 4 | Neural high |
| Total across 60+ languages | 100+ | 200+ | Neural |
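Counts like those in the table can be reproduced from a voice listing. The sketch below works on a small hand-written sample rather than live data. It assumes each voice is a dictionary with `ShortName`, `Gender`, and `Locale` keys, which is how the library's voice listing is generally shaped. The function names are hypothetical.

```python
from collections import Counter

# Hand-written sample mimicking entries from a voice listing.
# Assumption: "ShortName", "Gender" and "Locale" are the relevant keys.
SAMPLE_VOICES = [
    {"ShortName": "en-US-AriaNeural", "Gender": "Female", "Locale": "en-US"},
    {"ShortName": "en-US-GuyNeural", "Gender": "Male", "Locale": "en-US"},
    {"ShortName": "en-US-JennyNeural", "Gender": "Female", "Locale": "en-US"},
    {"ShortName": "en-GB-RyanNeural", "Gender": "Male", "Locale": "en-GB"},
    {"ShortName": "ja-JP-NanamiNeural", "Gender": "Female", "Locale": "ja-JP"},
]


def count_by_locale_and_gender(voices):
    """Tally voices per (locale, gender) pair."""
    return Counter((v["Locale"], v["Gender"]) for v in voices)


def pick_voices(voices, locale_prefix, gender=None):
    """Return short names whose locale starts with locale_prefix."""
    return [v["ShortName"] for v in voices
            if v["Locale"].startswith(locale_prefix)
            and (gender is None or v["Gender"] == gender)]


counts = count_by_locale_and_gender(SAMPLE_VOICES)
for (locale, gender), total in sorted(counts.items()):
    print(locale, gender, total)
```

For real numbers, the same functions could be fed the result of `await edge_tts.list_voices()`, or the output of `edge-tts --list-voices` could be inspected directly.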
Audio Generation Pipeline
flowchart LR
A[Text Input] --> C[Text Segmentation]
C --> E[Voice Selection]
F["Voice Parameters (rate, volume, pitch)"] --> E
E --> D[Internal SSML Request]
D --> G[Edge TTS API Request]
G --> H[Audio Stream]
H --> I[MP3 Output]
H --> J[SRT/VTT Subtitles]
The pipeline takes plain text as input. The library wraps that text in SSML itself, applying the chosen voice plus any rate, volume, and pitch settings. Custom SSML documents are not accepted in current releases, because the service rejects markup that Edge itself would not produce. The audio stream returned by the service is saved as MP3, and subtitles can be generated from the word or sentence timing data that arrives with the audio (SRT in recent releases, WebVTT in older ones).
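To make the subtitle step concrete, here is a minimal sketch of turning timing data into SRT cues. It is not the library's own implementation (edge-tts ships a `SubMaker` helper for this). The input shape, tuples of `(offset, duration, text)`, and the assumption that offsets are expressed in 100-nanosecond units are illustrative and should be checked against the version in use.

```python
TICKS_PER_SECOND = 10_000_000  # assumption: offsets arrive in 100 ns units


def srt_timestamp(ticks: int) -> str:
    """Convert a tick count to the HH:MM:SS,mmm form used by SRT."""
    total_ms = ticks * 1000 // TICKS_PER_SECOND
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(boundaries):
    """Build SRT text from (offset_ticks, duration_ticks, text) tuples."""
    cues = []
    for index, (offset, duration, text) in enumerate(boundaries, start=1):
        start = srt_timestamp(offset)
        end = srt_timestamp(offset + duration)
        cues.append(f"{index}\n{start} --> {end}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)


sample = [(1_000_000, 4_500_000, "Hello"), (6_000_000, 5_000_000, "world")]
print(to_srt(sample))
```

In practice, several words are usually grouped into one cue so that the subtitles stay readable. This sketch emits one cue per timing entry to keep the conversion easy to follow.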
Feature Comparison
| Feature | edge-tts | Google TTS | AWS Polly | ElevenLabs |
|---|---|---|---|---|
| Cost | Free | Free tier limited | Pay per use | Pay per use |
| Voice count | 300+ | 100+ | 50+ | 100+ |
| SSML support | No custom SSML (rate, volume, and pitch options only) | Yes | Yes | Partial |
| Subtitle export | Yes | No | No | No |
| API key required | No | Yes | Yes | Yes |
Practical Applications
Edge-TTS suits content creators generating voiceovers, developers prototyping voice features, accessibility tools that need read-aloud voices, language learning applications, and podcast production. Because it needs no API key and carries no usage fees, it is particularly attractive for projects with unpredictable volume or tight budgets. The underlying service is not an official developer API, though, so its availability and any rate limits are not guaranteed.
For more information, visit the edge-tts GitHub repository and the Microsoft Edge TTS voice list.
Frequently Asked Questions
Q: Is edge-tts legal to use? A: It uses the same online service that powers Microsoft Edge’s Read Aloud feature rather than a documented public API, so check Microsoft’s terms, especially for commercial use.
Q: Does it require an internet connection? A: Yes, the TTS processing happens on Microsoft’s servers via the Edge API.
Q: Can I adjust voice speed and pitch? A: Yes, through the rate, volume, and pitch options (for example --rate=+20% or --pitch=-5Hz), which the tool turns into prosody settings in the request it sends.
Q: What audio formats does it output? A: MP3 out of the box. For WAV or other formats, convert the result with a tool such as ffmpeg.
Q: How long can the generated audio be? A: There is no hard limit, but very long texts should be segmented for reliability.
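The segmentation advice in the last answer can be sketched as a small helper that splits long text at sentence boundaries. The function name `split_text` and the 2000-character default are arbitrary assumptions for illustration, not limits published for the service.

```python
import re


def split_text(text: str, max_chars: int = 2000):
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap a single sentence that is longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in the open chunk
        else:
            chunks.append(current)  # close the chunk and start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks


parts = split_text("One. Two! Three? Four.", max_chars=10)
print(parts)
```

Each chunk can then be synthesized on its own and the resulting audio files joined afterwards, so a failure partway through only requires retrying one chunk.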