Vector graphics are everywhere – from icons and logos to illustrations and data visualizations. But generating complex SVGs programmatically has remained a stubborn research challenge, with most approaches limited to simple geometric shapes or requiring extensive training data. OmniSVG, published at NeurIPS 2025, breaks through these limitations by introducing the first unified family of end-to-end multimodal SVG generators built on vision-language models.
The project at github.com/OmniSVG/OmniSVG represents a paradigm shift in SVG generation. Rather than relying on differentiable rendering or reinforcement learning – the dominant approaches prior to OmniSVG – it fine-tunes pre-trained VLMs to output SVG code directly. This allows the model to leverage the vast visual knowledge encoded in modern VLMs while learning the syntax and structure of SVG as a target language.
The results are impressive: OmniSVG can generate detailed SVGs ranging from simple icons to complex anime characters, with unprecedented diversity and quality. The model understands visual concepts, style references, and structural relationships, producing clean, composable SVG code rather than pixel approximations. The accompanying MMSVG dataset, the largest collection of SVG-text pairs ever assembled, is also released to the research community.
What is OmniSVG?
OmniSVG is the first family of end-to-end multimodal SVG generators based on vision-language models. It generates complex, structured SVG code from text descriptions, reference images, or a combination of both. The model produces clean vector graphics ranging from simple icons to detailed anime characters, without requiring intermediate raster-to-vector conversion.
What model sizes are available?
OmniSVG is released in multiple sizes to accommodate different deployment scenarios.
| Model | Parameters | Base VLM | Best For |
|---|---|---|---|
| OmniSVG-S | 0.5B | Phi-3.5-mini | Fast generation, edge devices |
| OmniSVG-B | 2.7B | Phi-3.5-medium | General use, quality-speed balance |
| OmniSVG-L | 7B | LLaVA-NeXT | Highest quality, complex scenes |
| OmniSVG-XL | 13B | LLaVA-NeXT-13B | Maximum quality, research |
All models share the same architecture but differ in capacity and inference cost. The B and L variants are recommended for most use cases.
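A small sketch of how that table might translate into code when choosing a checkpoint. Note that only `OmniSVG/OmniSVG-L` appears in the quickstart below; the other hub IDs here are assumed to follow the same naming pattern and may differ:

```python
# Map deployment scenario to a checkpoint, per the table above.
# Only "OmniSVG/OmniSVG-L" is confirmed; the other IDs are assumed.
MODEL_FOR_USE_CASE = {
    "edge": "OmniSVG/OmniSVG-S",       # 0.5B: fast generation, edge devices
    "general": "OmniSVG/OmniSVG-B",    # 2.7B: quality-speed balance
    "quality": "OmniSVG/OmniSVG-L",    # 7B: highest quality, complex scenes
    "research": "OmniSVG/OmniSVG-XL",  # 13B: maximum quality
}

def pick_model(use_case: str) -> str:
    """Return the checkpoint ID for a deployment scenario."""
    return MODEL_FOR_USE_CASE[use_case]

print(pick_model("general"))  # OmniSVG/OmniSVG-B
```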
How do you get started with OmniSVG?
OmniSVG is available through the Transformers library and a standalone Python package:
```bash
# Install
pip install omnisvg
```

```python
# Generate SVG from a text description
from omnisvg import OmniSVG

model = OmniSVG.from_pretrained("OmniSVG/OmniSVG-L")
svg_code = model.generate("A minimalist mountain landscape at sunset")
print(svg_code[:200])
```
The generated SVG code can be saved directly to .svg files and opened in any vector graphics editor or web browser.
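A minimal sketch of that save step. The `svg_code` string here is a hand-written stand-in for model output (running the model requires the weights); the snippet validates that the code is well-formed XML before writing it to disk:

```python
import xml.etree.ElementTree as ET

# Stand-in for model output; a real run would use model.generate(...)
svg_code = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">'
    '<path d="M0 80 L50 20 L100 80 Z" fill="#e07a5f"/>'
    '</svg>'
)

# Parse to confirm the output is well-formed before saving
root = ET.fromstring(svg_code)
assert root.tag == "{http://www.w3.org/2000/svg}svg"

# Write straight to a .svg file, openable in any browser or editor
with open("sunset.svg", "w", encoding="utf-8") as f:
    f.write(svg_code)
```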
What is the MMSVG dataset?
The MMSVG (Multi-Modal SVG) dataset is the largest collection of SVG-text pairs ever publicly released.
| Dataset Aspect | Quantity |
|---|---|
| Total SVG-text pairs | 1.2 million |
| Icon-level SVGs | 800,000 |
| Illustration-level SVGs | 300,000 |
| Anime/manga SVGs | 100,000 |
| Text descriptions | 1.2 million (human-verified subset: 200K) |
| Unique SVG token vocabulary | 8,432 command tokens |
The dataset covers a wide range of visual styles including flat icons, detailed illustrations, technical diagrams, and character art. Each SVG is paired with a text description, and a 200,000-pair subset has been human-verified for quality.
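The 8,432-token command vocabulary implies that each SVG is serialized as a sequence of drawing-command and coordinate tokens. As a purely illustrative sketch (the dataset's actual tokenization scheme is not documented here), a path string can be split into such tokens with a regular expression:

```python
import re

def tokenize_path(d: str) -> list[str]:
    # Split an SVG path string into command letters and numeric arguments.
    # Illustrative only: the real MMSVG tokenizer may differ.
    return re.findall(r"[A-Za-z]|-?\d+(?:\.\d+)?", d)

tokens = tokenize_path("M0 80 L50 20 L100 80 Z")
print(tokens)  # ['M', '0', '80', 'L', '50', '20', 'L', '100', '80', 'Z']
```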
What is OmniSVG’s license?
OmniSVG is released under the Apache License 2.0. The MMSVG dataset is released under CC-BY 4.0. Both licenses permit commercial use, modification, and redistribution with attribution.
Frequently Asked Questions
What is OmniSVG?
OmniSVG is the first family of end-to-end multimodal SVG generators using vision-language models, published at NeurIPS 2025. It generates complex SVG code from text descriptions or reference images, from simple icons to detailed anime characters.
What model sizes are available?
Four sizes: OmniSVG-S (0.5B parameters, edge devices), OmniSVG-B (2.7B, general use), OmniSVG-L (7B, highest quality), and OmniSVG-XL (13B, research). The B and L variants are recommended for most applications.
How do I get started with OmniSVG?
Install via pip install omnisvg, load a model with OmniSVG.from_pretrained(), and call .generate() with a text description. The output is valid SVG code that can be saved to a file.
What is the MMSVG dataset?
The MMSVG dataset contains 1.2 million SVG-text pairs covering icons, illustrations, technical diagrams, and anime/manga art. It is the largest publicly released collection of its kind, with a 200K human-verified subset.
What license is OmniSVG released under?
Apache License 2.0 for the models and CC-BY 4.0 for the MMSVG dataset. Both permit commercial use with attribution.
Further Reading
- OmniSVG GitHub Repository
- OmniSVG: Unified Multimodal SVG Generation (NeurIPS 2025)
- SVG Specification (W3C)
- Vision-Language Models: A Survey
- Vector Graphics Generation with Deep Learning
```mermaid
flowchart TB
    A[Input] --> B{Modality}
    B --> C[Text Description]
    B --> D[Reference Image]
    B --> E[Text + Image]
    C --> F[VLM Encoder]
    D --> F
    E --> F
    F --> G[LLM Backbone]
    G --> H[SVG Decoder]
    H --> I[SVG Code Output]
    I --> J[Render]
    J --> K[Vector Graphics]
```

```mermaid
graph LR
    subgraph Model Capabilities
        A[Icon Generation] --> D[Simple Geometric]
        B[Illustration] --> E[Detailed Vector Art]
        C[Character Design] --> F[Anime / Manga]
    end
    subgraph Output Quality
        D --> G[Clean SVG Code]
        E --> G
        F --> G
        G --> H[Scalable Resolution]
        G --> I[Editable Layers]
        G --> J[Small File Size]
    end
    subgraph Applications
        H --> K[UI Design]
        I --> L[Game Assets]
        J --> M[Web Graphics]
        J --> N[Data Visualization]
    end
```