Prompt engineering has become an unexpected skill requirement in the AI era. Developers who wanted reliable LLM output learned to craft system prompts, structure few-shot examples, chain instructions, and iterate through trial and error. The process was manual, subjective, and brittle — a prompt that worked perfectly with GPT-4 might fail with Claude, and a prompt that worked last week might degrade after a model update.
DSPy, from the Stanford NLP group, takes a fundamentally different approach. Instead of asking developers to write prompts, it asks them to define the task. You specify what inputs the system receives, what outputs it should produce, and how to measure success. DSPy then treats the prompt as an optimization variable — searching through prompt strategies, few-shot examples, and instruction phrasings to find the combination that maximizes your metric.
How Does DSPy Replace Manual Prompt Engineering?
The core insight of DSPy is that prompt engineering is an optimization problem in disguise. Given a task (translate text, answer questions, extract entities), a set of labeled examples, and a success metric, the goal is to find the prompt configuration that maximizes performance.
DSPy makes this explicit through its module abstraction. A module defines the task boundary — what goes in and what comes out — without specifying how the LLM should be instructed. The module specification includes input/output field signatures, optional constraints on output format, and a reference to the metric that measures output quality.
| Approach | Effort | Consistency | Transferability | Optimality |
|---|---|---|---|---|
| Manual prompt engineering | High (hours per prompt) | Low (varies by operator) | Low (rewrite per model) | Low (5-10 trials) |
| DSPy optimization | Medium (define task + metric) | High (algorithmic) | High (re-optimize per model) | High (100s-1000s of trials) |
The optimizer starts with a naive prompt (a simple instruction to perform the task) and iteratively refines it. Each iteration tries a new prompt variant — different phrasings, different few-shot example selections, different output format specifications — evaluates it against the validation set using the provided metric, and keeps the best performers. Over hundreds of iterations, the prompt converges to an optimal configuration.
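Conceptually, that loop is just search over configurations. A toy sketch in plain Python, with an invented candidate pool and a stub scorer standing in for the real validation-set LLM calls:

```python
import random

# Candidate instruction phrasings and a few-shot pool the optimizer
# might explore; in DSPy these candidates are proposed automatically.
instructions = [
    "Translate the text to French.",
    "You are a professional translator. Render the text in French.",
    "Translate into French, preserving tone and formality.",
]
fewshot_pool = [("Hello", "Bonjour"), ("Thank you", "Merci"), ("Goodbye", "Au revoir")]

def evaluate(instruction, demos):
    # Stub metric: a real run would call the LM on a validation set
    # and score outputs. Here the score is a deterministic stand-in.
    return 0.1 * len(demos) + 0.001 * len(instruction)

random.seed(0)
best_score, best_config = -1.0, None
for _ in range(200):  # hundreds of trials
    candidate = (
        random.choice(instructions),
        random.sample(fewshot_pool, k=random.randint(1, len(fewshot_pool))),
    )
    score = evaluate(*candidate)
    if score > best_score:
        best_score, best_config = score, candidate

print(best_score)
```

The real optimizers are smarter than this uniform random search, but the shape is the same: propose a configuration, score it with the metric, keep the best.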
What Optimization Strategies Does DSPy Support?
DSPy provides multiple optimization strategies, each suited to different task characteristics and resource budgets. The simplest strategies optimize the prompt text and few-shot examples without any fine-tuning. More advanced strategies can fine-tune the underlying model or use ensemble approaches.
BootstrapFewShot is the default and most accessible optimizer. It runs your program over a small set of labeled examples, using the LLM itself to bootstrap candidate demonstrations, and keeps those whose outputs pass the metric as few-shot examples. BootstrapFewShotWithRandomSearch extends this by randomly sampling different combinations of demonstrations and instructions and evaluating each combination.
| Optimizer | Approach | Best For | Resource Needs |
|---|---|---|---|
| BootstrapFewShot | Generates examples, selects best | Quick optimization | Low (few calls) |
| BootstrapFewShotWithRandomSearch | Random search over configs | Balanced optimization | Medium |
| MIPRO | Bayesian optimization | Maximum performance | High (many calls) |
| MIPROv2 | Enhanced Bayesian + fine-tuning | SOTA results | Very high |
| Ensemble | Multiple prompts voted | Reliability-critical | High |
MIPRO (Multi-prompt Instruction Proposal Optimizer) uses Bayesian optimization to efficiently search the prompt configuration space. It maintains a probabilistic model of how prompt changes affect performance and uses this model to select the most promising configurations to try. MIPROv2 extends this with automatic detection of whether the optimal configuration involves a better prompt, better examples, or fine-tuning the model itself.
What Does a Real DSPy Workflow Look Like?
A typical DSPy workflow starts with installing the library and defining a language model client. You configure which LLM to use and the DSPy settings. Then you define modules for your task — DSPy provides built-in modules for common patterns like chain-of-thought reasoning, retrieval-augmented generation, and classification.
The critical step is defining the metric. This is the function DSPy uses to evaluate prompt quality during optimization. For a translation task, the metric might be BLEU score or human-rated accuracy. For a Q&A system, it might be exact match or F1 score against reference answers. The quality of the metric directly determines the quality of the optimized prompt.
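DSPy metrics are plain functions over a gold example and a prediction. Two common ones, sketched in pure Python (the `answer` field name is illustrative):

```python
def exact_match(example, pred, trace=None):
    # DSPy metrics take (gold example, prediction, optional trace).
    # Booleans work for filtering bootstrapped demos; floats work for search.
    return example.answer.strip().lower() == pred.answer.strip().lower()

def token_f1(example, pred, trace=None):
    # Token-overlap F1 against the reference answer.
    gold = example.answer.lower().split()
    guess = pred.answer.lower().split()
    overlap = len(set(gold) & set(guess))
    if overlap == 0:
        return 0.0
    precision = overlap / len(guess)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Because the optimizer maximizes exactly this function, a sloppy metric (for example, exact match on free-form answers) will steer optimization toward the wrong behavior.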
```mermaid
flowchart LR
    A[Define Task<br/>Input/Output Signatures] --> B[Configure LLM]
    B --> C[Provide Examples<br/>Labeled Data]
    C --> D[Define Metric<br/>Quality Measure]
    D --> E[Select Optimizer]
    E --> F[DSPy Optimizer<br/>Run 100s of trials]
    F --> G[Evaluate Variants]
    G --> H{Converged?}
    H -->|No| F
    H -->|Yes| I[Best Prompt<br/>Configuration]
    I --> J[Deploy Module]
```

After optimization, DSPy provides the compiled module with the optimal prompt embedded. You use this module in your application through the same API as the unoptimized version — the optimization happens once, and the result is a drop-in replacement.
How Does DSPy Handle Different Model Families?
One of DSPy’s most practical features is automatic prompt adaptation across model families. A prompt optimized for GPT-4 might not work well with Llama 3 or Claude. DSPy addresses this by re-optimizing prompts when the underlying model changes.
Because optimization is automated, re-targeting a new model is a matter of re-running the optimizer rather than rewriting prompts by hand. DSPy can also warm-start that search from a configuration that previously scored well, so knowledge about which instruction styles, example formats, and output specifications suit a given model family carries over, reducing the number of optimization iterations needed.
This model-adaptation capability is increasingly important as organizations deploy across multiple model providers. A typical pattern uses DSPy-optimized prompts with GPT-4 for production alongside optimized prompts for local models (via Ollama) for development and testing, ensuring consistency across environments.
| Model Family | Optimal Instruction Style | Few-Shot Sensitivity |
|---|---|---|
| GPT-4 | Direct, concise | Medium |
| Claude 3 | Detailed, structured | Low |
| Llama 3 | Explicit, step-by-step | High |
| Mistral | Verbose, examples-heavy | High |
| Gemini | Structured, bullet-point | Medium |
FAQ
What is DSPy and what problem does it solve? DSPy (Declarative Self-improving Python) is a Stanford NLP framework that replaces manual prompt engineering with programmatic optimization. You define the task and metric, and DSPy automatically finds the optimal prompt strategy.
How does DSPy’s optimization process work? DSPy treats prompt construction as an optimization problem. It explores hundreds of prompt variants — different phrasings, few-shot selections, and instructions — using techniques like bootstrap few-shot and Bayesian search to maximize your metric.
Do I need to write prompts when using DSPy? No. In DSPy, you never write prompts directly. You define declarative modules with input/output signatures, and DSPy generates and optimizes prompts automatically.
Can DSPy work with any LLM provider? Yes. DSPy is model-agnostic and supports OpenAI, Anthropic, Google, Cohere, Ollama, and Hugging Face models with automatic prompt adaptation across model families.
How does DSPy compare to manual prompt engineering? DSPy systematically outperforms manual prompts in benchmarks, achieving 10-30% higher accuracy by exploring hundreds of prompt variants instead of the typical 5-10 manually evaluated options.