Social media data is a goldmine for market research, trend analysis, and competitive intelligence – but accessing it programmatically is notoriously difficult. Platforms actively block scrapers, change their APIs, and require complex authentication flows. MediaCrawler has emerged as one of the most popular open-source solutions to this challenge, with over 30,000 GitHub stars and support for all major Chinese social media platforms.
The project at github.com/NanmiCoder/MediaCrawler provides a unified framework for crawling data from Xiaohongshu (Little Red Book), Douyin (TikTok China), Kuaishou, Bilibili, Weibo, and more. It combines Playwright browser automation with IP rotation and cookie management to bypass anti-scraping measures. The result is a reliable data pipeline for extracting posts, comments, user profiles, and engagement metrics.
MediaCrawler’s popularity stems from its pragmatic design. Rather than being a generic scraping library, it is specifically tuned to the quirks of each supported platform – login flows, rate limits, response formats, and anti-bot detection mechanisms are all handled internally. Users configure targets (keywords, user IDs, hashtags) and the crawler handles the rest.
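To make the "configure targets and the crawler handles the rest" workflow concrete, here is a minimal sketch of what such a configuration might look like. The option names below are illustrative assumptions, not MediaCrawler's actual config keys.

```python
# Hypothetical crawler configuration sketch -- these key names are
# illustrative, not MediaCrawler's real settings module.
config = {
    "platform": "xhs",                    # target platform (e.g. Xiaohongshu)
    "crawl_type": "search",               # "search", "user", or "comments"
    "keywords": ["coffee", "espresso"],   # search terms to crawl
    "max_notes": 200,                     # stop after this many posts
    "enable_comments": True,              # also harvest comments per post
    "save_format": "json",                # structured output for analysis
}

def validate(cfg: dict) -> bool:
    """Basic sanity checks before a crawl starts."""
    if cfg["crawl_type"] not in {"search", "user", "comments"}:
        return False
    if cfg["crawl_type"] == "search" and not cfg["keywords"]:
        return False
    return True
```

In the real project, options like these are set through a config module and command-line flags; the point here is only the shape of the target specification.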
What is MediaCrawler?
MediaCrawler is an open-source, multi-platform social media data scraper that uses Playwright-based browser automation to collect content from major social platforms. It supports search-based crawling (by keyword), user-based crawling (by user ID), and comment collection. Data is output in structured JSON format for downstream analysis.
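The structured output can be pictured as one JSON object per collected post. The field names below are an assumed normalization for illustration, not the project's exact schema.

```python
import json

# Illustrative normalized record -- field names are assumptions,
# not MediaCrawler's exact output schema.
record = {
    "platform": "xhs",
    "post_id": "abc123",
    "author_id": "user_456",
    "title": "Best coffee shops in Shanghai",
    "content": "A short write-up ...",
    "liked_count": 1024,
    "comment_count": 87,
    "created_at": 1700000000,   # Unix timestamp
}

# One JSON line per post round-trips cleanly into analysis tools.
line = json.dumps(record, ensure_ascii=False)
parsed = json.loads(line)
print(parsed["post_id"])
```

Keeping field names consistent across platforms is what makes downstream analysis (pandas, SQL, dashboards) straightforward.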
Which platforms are supported?
MediaCrawler supports all major Chinese social media platforms and a growing selection of international platforms.
| Platform | Type | Crawling Modes |
|---|---|---|
| Xiaohongshu (RED) | Lifestyle/content sharing | Search notes, user notes, comments |
| Douyin (TikTok CN) | Short video | Search videos, user videos, comments |
| Kuaishou | Short video | Search videos, user videos |
| Bilibili | Video streaming | Search videos, user videos, comments |
| Weibo | Microblogging | Search posts, user posts, comments |
| Zhihu | Q&A platform | Search questions, answers |
| Tieba (Planned) | Forums | Search threads |
| TikTok Global (Planned) | Short video | Search videos |
Each platform has its own crawling strategy tailored to its API behavior and anti-scraping measures.
What technology powers MediaCrawler?
MediaCrawler is built on a stack of well-established Python libraries for web automation and data processing.
| Component | Technology | Purpose |
|---|---|---|
| Browser automation | Playwright | Headless browser control |
| Proxy management | Custom IP rotation | Bypass rate limits and blocks |
| Cookie management | Persistent cookie store | Maintain login sessions |
| Data extraction | CSS/XPath selectors | Parse page content |
| Data storage | JSON, CSV, MySQL | Output collected data |
| Concurrency | asyncio | Parallel scraping |
| Anti-detection | Custom stealth patches | Avoid bot detection |
The Playwright-based approach means MediaCrawler interacts with pages like a real user, making it significantly harder for platforms to detect than simple HTTP request-based scrapers.
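One piece of the table above, persistent cookie storage, is what lets a logged-in session survive restarts. Here is a minimal stdlib sketch of the idea, not MediaCrawler's actual implementation; the file path and cookie fields are placeholders.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical storage location for the demo.
COOKIE_FILE = Path(tempfile.gettempdir()) / "mc_demo_cookies.json"

def save_cookies(cookies: list[dict]) -> None:
    """Persist browser cookies (e.g. from Playwright's context.cookies())."""
    COOKIE_FILE.write_text(json.dumps(cookies))

def load_cookies() -> list[dict]:
    """Reload cookies so a fresh browser context starts already logged in."""
    if COOKIE_FILE.exists():
        return json.loads(COOKIE_FILE.read_text())
    return []

save_cookies([{"name": "session_id", "value": "deadbeef", "domain": ".example.com"}])
restored = load_cookies()
print(restored[0]["name"])
```

With Playwright, the restored list would be fed back via `context.add_cookies(...)`, so a QR-code login only has to happen once.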
What are MediaCrawler’s key features?
MediaCrawler provides a comprehensive set of scraping capabilities beyond basic content extraction.
| Feature | Description |
|---|---|
| Keyword search scraping | Collect all posts/videos matching search terms |
| User profile scraping | Extract all content from a specific user |
| Comment harvesting | Collect comments and replies on posts |
| Auto login | Credential-based or QR-code login per platform |
| Proxy rotation | SOCKS5/HTTP proxy pools for IP diversity |
| Rate limiting | Configurable delays to avoid detection |
| Incremental crawling | Resume from last checkpoint |
| Structured output | JSON with normalized field names across platforms |
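Two of the features above, proxy rotation and rate limiting, can be sketched with the stdlib alone. The proxy list and delay range below are placeholders, not the project's defaults.

```python
import itertools
import random

# Placeholder proxy pool -- a real deployment would load these from config.
PROXIES = [
    "socks5://10.0.0.1:1080",
    "socks5://10.0.0.2:1080",
    "socks5://10.0.0.3:1080",
]
proxy_cycle = itertools.cycle(PROXIES)  # round-robin for IP diversity

def next_request_plan() -> tuple[str, float]:
    """Pick the next proxy and a jittered, human-like delay for one request."""
    proxy = next(proxy_cycle)
    delay = random.uniform(1.5, 4.0)  # randomized pause to avoid detection
    return proxy, delay

plans = [next_request_plan() for _ in range(4)]
print([p for p, _ in plans])  # cycles back to the first proxy on request 4
```

Randomized delays matter as much as rotation: fixed intervals are an easy bot signature, while jitter mimics human browsing rhythm.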
Is there a Pro version of MediaCrawler?
The core MediaCrawler project is fully open-source and free. The developers offer a “Pro” version with additional features for commercial users.
| Feature | Open Source | Pro Version |
|---|---|---|
| Platform support | 6 platforms | 10+ platforms |
| Proxy support | Basic SOCKS5 | Advanced rotating proxies |
| Data export | JSON + CSV | JSON, CSV, MySQL, Elasticsearch |
| Rate limiting | Manual config | Adaptive AI rate limiting |
| Support | GitHub Issues | Dedicated support channel |
| License | MIT | Commercial license |
The Pro version is primarily aimed at enterprises running large-scale data collection pipelines.
Frequently Asked Questions
What is MediaCrawler?
MediaCrawler is an open-source Python tool for scraping social media data from platforms like Xiaohongshu, Douyin, Bilibili, Weibo, and more. It uses Playwright browser automation to extract posts, comments, and user data.
Which social media platforms are supported?
Xiaohongshu (RED), Douyin (TikTok China), Kuaishou, Bilibili, Weibo, and Zhihu. TikTok Global support is planned for future releases.
What technology does MediaCrawler use?
Playwright for browser automation, asyncio for concurrent scraping, customizable IP rotation, and persistent cookie management for session maintenance.
What are MediaCrawler’s key features?
Keyword search scraping, user profile extraction, comment harvesting, auto login, proxy rotation, rate limiting, incremental crawling, and structured JSON output.
Is there a Pro/enterprise version of MediaCrawler?
Yes, a Pro version offers additional platforms, advanced proxy management, adaptive rate limiting, and commercial support for enterprise users.
Further Reading
- MediaCrawler GitHub Repository
- Playwright Python Documentation
- Web Scraping Best Practices Guide
- Social Media Data Analysis with Python
- Xiaohongshu Platform Overview
```mermaid
flowchart TB
    A[User Configuration] --> B[MediaCrawler Engine]
    B --> C{Select Platform}
    C --> D[Xiaohongshu]
    C --> E[Douyin]
    C --> F[Bilibili]
    C --> G[Weibo]
    D --> H[Launch Playwright]
    E --> H
    F --> H
    G --> H
    H --> I[Login + Cookie Management]
    I --> J[Navigate to Target]
    J --> K[Extract Data]
    K --> L[Parse & Normalize]
    L --> M[JSON Output]
    M --> N[Analysis Pipeline]
```

```mermaid
graph LR
    subgraph Data Pipeline
        A[Search Keywords] --> B[Auto Login]
        B --> C[Proxy Select]
        C --> D[Page Scrape]
        D --> E[Data Parse]
        E --> F[Format JSON]
    end
    subgraph Storage
        F --> G[Local File]
        F --> H[Database]
        F --> I[Data Warehouse]
    end
    subgraph Anti-Detection
        J[User Agent Rotation]
        K[Human-like Delays]
        L[IP Rotation]
        M[Stealth Patches]
    end
```