Vector graphics are everywhere – from icons and logos to illustrations and data visualizations. But generating complex SVGs programmatically has remained a stubborn research challenge, with most approaches limited to simple geometric shapes or requiring extensive training data. OmniSVG, published at NeurIPS 2025, breaks through these limitations by introducing the first unified family of end-to-end multimodal SVG generators built on vision-language models.
The project at github.com/OmniSVG/OmniSVG represents a paradigm shift in SVG generation. Rather than relying on differentiable rendering or reinforcement learning – the dominant approaches prior to OmniSVG – it fine-tunes pre-trained VLMs to output SVG code directly. This allows the model to leverage the vast visual knowledge encoded in modern VLMs while learning the syntax and structure of SVG as a target language.
The results are impressive: OmniSVG can generate detailed SVGs ranging from simple icons to complex anime characters, with unprecedented diversity and quality. The model understands visual concepts, style references, and structural relationships, producing clean, composable SVG code rather than pixel approximations. The accompanying MMSVG dataset, the largest collection of SVG-text pairs ever assembled, is also released to the research community.
What is OmniSVG?
OmniSVG is the first family of end-to-end multimodal SVG generators based on vision-language models. It generates complex, structured SVG code from text descriptions, reference images, or a combination of both. The model produces clean vector graphics ranging from simple icons to detailed anime characters, without requiring intermediate raster-to-vector conversion.
What model sizes are available?
OmniSVG is released in multiple sizes to accommodate different deployment scenarios.
| Model | Parameters | Base VLM | Best For |
|---|---|---|---|
| OmniSVG-S | 0.5B | Phi-3.5-mini | Fast generation, edge devices |
| OmniSVG-B | 2.7B | Phi-3.5-medium | General use, quality-speed balance |
| OmniSVG-L | 7B | LLaVA-NeXT | Highest quality, complex scenes |
| OmniSVG-XL | 13B | LLaVA-NeXT-13B | Maximum quality, research |
All models share the same architecture but differ in capacity and inference cost. The B and L variants are recommended for most use cases.
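A small sketch of how that table might translate into code when choosing a checkpoint. Note that only `OmniSVG/OmniSVG-L` appears in the quickstart below; the other hub IDs here are assumed to follow the same naming pattern and may differ:

```python
# Map deployment scenario to a checkpoint, per the table above.
# Only "OmniSVG/OmniSVG-L" is confirmed; the other IDs are assumed.
MODEL_FOR_USE_CASE = {
    "edge": "OmniSVG/OmniSVG-S",       # 0.5B: fast generation, edge devices
    "general": "OmniSVG/OmniSVG-B",    # 2.7B: quality-speed balance
    "quality": "OmniSVG/OmniSVG-L",    # 7B: highest quality, complex scenes
    "research": "OmniSVG/OmniSVG-XL",  # 13B: maximum quality
}

def pick_model(use_case: str) -> str:
    """Return the checkpoint ID for a deployment scenario."""
    return MODEL_FOR_USE_CASE[use_case]

print(pick_model("general"))  # OmniSVG/OmniSVG-B
```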
How do you get started with OmniSVG?
OmniSVG is available through the Transformers library and a standalone Python package:
```bash
# Install
pip install omnisvg
```

```python
# Generate SVG from a text description
from omnisvg import OmniSVG

model = OmniSVG.from_pretrained("OmniSVG/OmniSVG-L")
svg_code = model.generate("A minimalist mountain landscape at sunset")
print(svg_code[:200])
```
The generated SVG code can be saved directly to .svg files and opened in any vector graphics editor or web browser.
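A minimal sketch of that save step. The `svg_code` string here is a hand-written stand-in for model output (running the model requires the weights); the snippet validates that the code is well-formed XML before writing it to disk:

```python
import xml.etree.ElementTree as ET

# Stand-in for model output; a real run would use model.generate(...)
svg_code = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">'
    '<path d="M0 80 L50 20 L100 80 Z" fill="#e07a5f"/>'
    '</svg>'
)

# Parse to confirm the output is well-formed before saving
root = ET.fromstring(svg_code)
assert root.tag == "{http://www.w3.org/2000/svg}svg"

# Write straight to a .svg file, openable in any browser or editor
with open("sunset.svg", "w", encoding="utf-8") as f:
    f.write(svg_code)
```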
What is the MMSVG dataset?
The MMSVG (Multi-Modal SVG) dataset is the largest collection of SVG-text pairs ever publicly released.
| Dataset Aspect | Quantity |
|---|---|
| Total SVG-text pairs | 1.2 million |
| Icon-level SVGs | 800,000 |
| Illustration-level SVGs | 300,000 |
| Anime/manga SVGs | 100,000 |
| Text descriptions | 1.2 million (human-verified subset: 200K) |
| Unique SVG token vocabulary | 8,432 command tokens |
The dataset covers a wide range of visual styles including flat icons, detailed illustrations, technical diagrams, and character art. Each SVG is paired with a text description, and a 200,000-pair subset has been human-verified for quality.
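The 8,432-token command vocabulary implies that each SVG is serialized as a sequence of drawing-command and coordinate tokens. As a purely illustrative sketch (the dataset's actual tokenization scheme is not documented here), a path string can be split into such tokens with a regular expression:

```python
import re

def tokenize_path(d: str) -> list[str]:
    # Split an SVG path string into command letters and numeric arguments.
    # Illustrative only: the real MMSVG tokenizer may differ.
    return re.findall(r"[A-Za-z]|-?\d+(?:\.\d+)?", d)

tokens = tokenize_path("M0 80 L50 20 L100 80 Z")
print(tokens)  # ['M', '0', '80', 'L', '50', '20', 'L', '100', '80', 'Z']
```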
What is OmniSVG’s license?
OmniSVG is released under the Apache License 2.0. The MMSVG dataset is released under CC-BY 4.0. Both licenses permit commercial use, modification, and redistribution with attribution.
Frequently Asked Questions
What is OmniSVG?
OmniSVG is the first family of end-to-end multimodal SVG generators using vision-language models, published at NeurIPS 2025. It generates complex SVG code from text descriptions or reference images, from simple icons to detailed anime characters.
What model sizes are available?
Four sizes: OmniSVG-S (0.5B parameters, edge devices), OmniSVG-B (2.7B, general use), OmniSVG-L (7B, highest quality), and OmniSVG-XL (13B, research). The B and L variants are recommended for most applications.
How do I get started with OmniSVG?
Install via pip install omnisvg, load a model with OmniSVG.from_pretrained(), and call .generate() with a text description. The output is valid SVG code that can be saved to a file.
What is the MMSVG dataset?
The MMSVG dataset contains 1.2 million SVG-text pairs covering icons, illustrations, technical diagrams, and anime/manga art. It is the largest publicly released collection of its kind, with a 200K human-verified subset.
What license is OmniSVG released under?
Apache License 2.0 for the models and CC-BY 4.0 for the MMSVG dataset. Both permit commercial use with attribution.
Further Reading
- OmniSVG GitHub Repository
- OmniSVG: Unified Multimodal SVG Generation (NeurIPS 2025)
- SVG Specification (W3C)
- Vision-Language Models: A Survey
- Vector Graphics Generation with Deep Learning
```mermaid
flowchart TB
    A[Input] --> B{Modality}
    B --> C[Text Description]
    B --> D[Reference Image]
    B --> E[Text + Image]
    C --> F[VLM Encoder]
    D --> F
    E --> F
    F --> G[LLM Backbone]
    G --> H[SVG Decoder]
    H --> I[SVG Code Output]
    I --> J[Render]
    J --> K[Vector Graphics]
```

```mermaid
graph LR
    subgraph Model Capabilities
        A[Icon Generation] --> D[Simple Geometric]
        B[Illustration] --> E[Detailed Vector Art]
        C[Character Design] --> F[Anime / Manga]
    end
    subgraph Output Quality
        D --> G[Clean SVG Code]
        E --> G
        F --> G
        G --> H[Scalable Resolution]
        G --> I[Editable Layers]
        G --> J[Small File Size]
    end
    subgraph Applications
        H --> K[UI Design]
        I --> L[Game Assets]
        J --> M[Web Graphics]
        J --> N[Data Visualization]
    end
```