edge-tts: Python TTS Using Microsoft Edge Online Service

edge-tts is a Python CLI tool for generating speech from text using Microsoft Edge's online TTS service, supporting hundreds of voices and languages.

Editorial Team May 05, 2026 3 min read

High-quality text-to-speech usually requires a paid cloud API or a complex local model setup. edge-tts, created by rany2, takes a different route: it calls the online TTS service behind Microsoft Edge, giving free access to hundreds of natural-sounding voices across dozens of languages.

The tool is a Python package with a command-line interface that turns text into audio files using the same neural voices as the read-aloud feature in Microsoft Edge. It also exposes rate, volume, and pitch adjustments and can write subtitles alongside the audio, which makes it a capable free, open-source TTS option.
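As a rough illustration of that workflow, the sketch below assembles an edge-tts command line and runs it. The `--voice`, `--text`, `--write-media`, and `--write-subtitles` flags belong to the edge-tts CLI; the helper names, the voice choice, and the file names are assumptions made for this example.

```python
import shutil
import subprocess


def build_edge_tts_command(text, voice, media_path, subtitles_path=None):
    """Return the argv list for one edge-tts invocation (hypothetical helper)."""
    cmd = ["edge-tts", "--voice", voice, "--text", text, "--write-media", media_path]
    if subtitles_path is not None:
        # Subtitles are optional; only add the flag when a path is given.
        cmd += ["--write-subtitles", subtitles_path]
    return cmd


def synthesize(text, voice="en-US-AriaNeural", media_path="hello.mp3",
               subtitles_path="hello.srt"):
    """Run the edge-tts CLI; needs `pip install edge-tts` and network access."""
    if shutil.which("edge-tts") is None:
        raise RuntimeError("edge-tts executable not found on PATH")
    subprocess.run(
        build_edge_tts_command(text, voice, media_path, subtitles_path),
        check=True,  # raise if edge-tts exits with an error
    )
```

Calling `synthesize("Hello from edge-tts!")` on a machine with the package installed should write the audio and subtitle files next to the script. Building the argument list separately keeps the command easy to inspect before anything is sent over the network.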

Voice and Language Support

Language	Male Voices	Female Voices	Quality
English (US)	8	10	Neural high
English (UK)	5	6	Neural high
Chinese (Mandarin)	4	5	Neural high
Japanese	3	4	Neural high
Spanish	4	5	Neural high
French	3	4	Neural high
German	3	4	Neural high
Total across 60+ languages	100+	200+	Neural
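
Counts like those in the table can be checked against the live catalogue. The sketch below assumes that `edge_tts.list_voices()` returns a list of dictionaries with `Gender` and `Locale` fields, which matches the library as far as I know; the `count_by_gender` helper and the three sample records are illustrative only.

```python
from collections import Counter


async def fetch_voices():
    """Download the current voice catalogue (needs edge-tts and network access)."""
    import edge_tts  # imported lazily so the helper below works without the package

    return await edge_tts.list_voices()


def count_by_gender(voices, locale):
    """Count voices per gender for one locale such as 'en-US'."""
    return Counter(v["Gender"] for v in voices if v["Locale"] == locale)


# Offline sample shaped like catalogue entries (only the fields used here).
sample = [
    {"ShortName": "en-US-AriaNeural", "Gender": "Female", "Locale": "en-US"},
    {"ShortName": "en-US-GuyNeural", "Gender": "Male", "Locale": "en-US"},
    {"ShortName": "en-GB-SoniaNeural", "Gender": "Female", "Locale": "en-GB"},
]
print(count_by_gender(sample, "en-US"))
```

With network access, `count_by_gender(asyncio.run(fetch_voices()), "en-US")` would report the real numbers, which may differ from the table as Microsoft adds or retires voices.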

Audio Generation Pipeline

flowchart LR
    A[Text Input] --> B{Format}
    B -->|Plain Text| C[Text Segmentation]
    B -->|SSML| D[SSML Parsing]
    C --> E[Voice Selection]
    D --> E
    F[Voice Parameters] --> E
    E --> G[Edge TTS API Request]
    G --> H[Audio Stream]
    H --> I[MP3/WAV Output]
    H --> J[SRT/VTT Subtitles]

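The diagram shows both plain-text and SSML input. Note that recent edge-tts releases no longer accept custom SSML: the library builds the SSML request itself, and control over delivery is limited to the rate, volume, and pitch options. The audio stream returned by the service is saved as MP3, and subtitles are generated from the timing events that arrive with the audio.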
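
The last two stages of the pipeline can be sketched without touching the network. The code below assumes the stream yields dictionaries with a `type` of `audio` (carrying `data` bytes) or `WordBoundary`/`SentenceBoundary` (carrying `offset`, `duration`, and `text`, with times in 100-nanosecond units), which is how I understand `edge_tts.Communicate(...).stream()` to behave. The function names are hypothetical, and the library ships its own subtitle helper, so treat this as an illustration rather than the project's implementation.

```python
def split_stream(chunks):
    """Separate audio bytes from timing events in a stream of chunk dicts."""
    audio = bytearray()
    cues = []
    for chunk in chunks:
        if chunk["type"] == "audio":
            audio.extend(chunk["data"])
        elif chunk["type"] in ("WordBoundary", "SentenceBoundary"):
            cues.append((chunk["offset"], chunk["duration"], chunk["text"]))
    return bytes(audio), cues


def srt_time(ticks):
    """Format a time given in 100-nanosecond ticks as HH:MM:SS,mmm."""
    ms = int(ticks) // 10_000
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    seconds, ms = divmod(ms, 1_000)
    return f"{hours:02}:{minutes:02}:{seconds:02},{ms:03}"


def to_srt(cues):
    """Render (offset, duration, text) cues as SRT text."""
    blocks = []
    for index, (offset, duration, text) in enumerate(cues, start=1):
        start, end = srt_time(offset), srt_time(offset + duration)
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)


# Hand-made chunks standing in for a real stream.
fake_chunks = [
    {"type": "audio", "data": b"\x00\x01"},
    {"type": "WordBoundary", "offset": 1_000_000, "duration": 5_000_000, "text": "Hello"},
    {"type": "audio", "data": b"\x02"},
]
audio_bytes, cues = split_stream(fake_chunks)
print(to_srt(cues))
```

In a real run, `audio_bytes` would be written to an `.mp3` file and the SRT text to a matching `.srt` file.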

Feature Comparison

Feature	edge-tts	Google TTS	AWS Polly	ElevenLabs
Cost	Free	Free tier limited	Pay per use	Pay per use
Voice count	300+	100+	50+	100+
SSML support	Limited (rate, volume, pitch options only)	Yes	Yes	Partial
Subtitle export	Yes	No	No	No
API key required	No	Yes	Yes	Yes

Practical Applications

edge-tts suits content creators generating voiceovers, developers prototyping voice features, accessibility tools that read text aloud, language-learning applications, and podcast production. Because it needs no API key and has no published usage quota, it is particularly attractive for projects with unpredictable volume or tight budgets.

For more information, visit the edge-tts GitHub repository and the Microsoft Edge TTS voice list.

Frequently Asked Questions

Q: Is edge-tts legal to use? A: It uses the same online service as the read-aloud feature in Microsoft Edge rather than a documented public API, so review Microsoft’s terms of service before relying on it, especially for commercial use.

Q: Does it require an internet connection? A: Yes, the TTS processing happens on Microsoft’s servers via the Edge API.

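Q: Can I adjust voice speed and pitch? A: Yes. The CLI's --rate, --volume, and --pitch options, and the matching arguments in the Python API, adjust prosody. Custom SSML is no longer accepted in recent releases.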
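
In the Python API, speed, pitch, and volume are passed as signed strings such as `+20%` or `-5Hz`. The small helper below is a hypothetical convenience for building those strings from integers; the `rate`, `pitch`, and `volume` keyword names match `edge_tts.Communicate` as far as I know.

```python
def prosody_options(rate_percent=0, pitch_hz=0, volume_percent=0):
    """Build signed prosody strings; the explicit sign is required even for zero."""
    return {
        "rate": f"{rate_percent:+d}%",
        "pitch": f"{pitch_hz:+d}Hz",
        "volume": f"{volume_percent:+d}%",
    }


# Speak 20% faster and 5 Hz lower, keeping the default volume.
options = prosody_options(rate_percent=20, pitch_hz=-5)
print(options)
```

The result can be unpacked into the constructor, for example `edge_tts.Communicate("Hello", "en-US-AriaNeural", **options)`.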

Q: What audio formats does it output? A: edge-tts writes MP3 audio. Other formats such as WAV require converting the file afterwards, for example with ffmpeg.

Q: How long can the generated audio be? A: There is no hard limit, but very long texts should be segmented for reliability.
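
One simple way to segment long input is to group whole sentences into chunks below a size limit and synthesize each chunk separately. The sketch below is an assumption about how that might be done, not part of edge-tts; the 1000-character default is arbitrary.

```python
import re


def split_text(text, max_chars=1000):
    """Group sentences into chunks of at most max_chars characters where possible.

    A single sentence longer than max_chars is kept whole rather than cut mid-word.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


print(split_text("One. Two. Three.", max_chars=9))
```

Each chunk can then be sent as its own request, so a failure partway through only requires retrying one segment.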
