The race to build machines that can reason – not just pattern-match – has defined the cutting edge of artificial intelligence since the emergence of large language models. While proprietary systems like OpenAI’s o1 series have demonstrated impressive reasoning chains, the open-source community has long awaited a comparable alternative. Enter Marco-o1: an open-source large reasoning model from the MarcoPolo Team at Alibaba’s AIDC-AI that delivers structured, multi-step reasoning for both closed-form and open-ended problems.
Built on the Qwen2-7B-Instruct foundation, Marco-o1 represents a deliberate departure from models optimized solely for standardized benchmarks. The team at AIDC-AI designed it to tackle the messy, ambiguous problems that characterize real-world deployment – from logistics optimization to creative planning – while keeping the model fully open-source and accessible to the global research community.
The project has evolved rapidly through three major versions, each introducing architectural innovations that push the boundaries of what open-source reasoning models can achieve. With its v2 paper accepted at ACL 2025 and a related efficient-reasoning paper accepted at ICLR 2026, Marco-o1 has established itself as a serious academic contribution, not merely a replication of existing methods.
What Is Marco-o1 and Why Was It Created?
Marco-o1 is an open large reasoning model designed to bridge the gap between closed-source reasoning systems and the open-source ecosystem. Unlike foundation models that aim for broad general knowledge, Marco-o1 is specifically engineered for multi-step logical deduction, planning, and problem-solving in contexts where a single forward pass is insufficient.
The MarcoPolo Team at Alibaba AIDC-AI observed that most open-source models at the time excelled at recall and generation but fell short on structured reasoning. They set out to build a model that could “think before it speaks” – generating internal reasoning traces before arriving at answers – while remaining transparent about its decision-making process.
| Aspect | Marco-o1 | Typical Open-Source LLM |
|---|---|---|
| Reasoning approach | Multi-step CoT + MCTS | Single-pass generation |
| Problem scope | Open-ended + standard | Primarily standard formats |
| Inference strategy | Reflection + backtracking | Feed-forward only |
| Training method | CoT fine-tuning + EDPO | Standard SFT + RLHF |
| Academic acceptance | ACL 2025, ICLR 2026 | Varies widely |
What Techniques Power Marco-o1’s Reasoning?
The core of Marco-o1’s capability lies in the combination of Chain-of-Thought (CoT) fine-tuning with Monte Carlo Tree Search (MCTS), creating a dual-layer reasoning architecture.
CoT fine-tuning trains the model to break down complex queries into intermediate reasoning steps, much like showing your work on a math exam. MCTS, traditionally used in game-playing AI such as AlphaGo, systematically explores multiple reasoning paths, evaluates how promising each one is, and backtracks when necessary. Together, these techniques allow Marco-o1 to navigate complex problem spaces with the deliberation of a human expert. The overall flow looks roughly like this:
```mermaid
graph TD
    A[User Query] --> B[CoT Decomposition]
    B --> C{MCTS Exploration}
    C --> D[Path 1: Standard reasoning]
    C --> E[Path 2: Alternative approach]
    C --> F[Path 3: Reflective reasoning]
    D --> G[Evaluate confidence]
    E --> G
    F --> G
    G --> H{Confidence threshold met?}
    H -->|Yes| I[Final answer]
    H -->|No| B
```
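In code, the loop in the diagram reduces to a simple pattern: propose candidate reasoning paths, score them, and stop once a confidence threshold is met. The sketch below is a deliberately minimal illustration, not the repository's implementation; `expand_paths`, `score_path`, and the geometric-mean confidence heuristic are all hypothetical stand-ins that loosely mimic scoring rollouts by token probability.

```python
import math
import random

CONFIDENCE_THRESHOLD = 0.8

def expand_paths(state: str, num_paths: int = 3) -> list[str]:
    """Hypothetical stand-in for the model proposing candidate reasoning chains."""
    return [f"{state} -> step {i}" for i in range(num_paths)]

def score_path(path: str) -> float:
    """Hypothetical confidence score: geometric mean of per-token probabilities,
    loosely mimicking a token-probability-based reward."""
    token_probs = [random.uniform(0.5, 1.0) for _ in range(8)]
    return math.exp(sum(math.log(p) for p in token_probs) / len(token_probs))

def reason(query: str, max_rounds: int = 5) -> str:
    state, best = query, query
    for _ in range(max_rounds):
        # Expand several candidate paths and keep the most promising one.
        scored = [(score_path(p), p) for p in expand_paths(state)]
        confidence, best = max(scored)
        if confidence >= CONFIDENCE_THRESHOLD:
            return best        # confidence threshold met: emit final answer
        state = best           # not confident yet: decompose further and retry
    return best                # fall back to the best path found

print(reason("User Query"))
```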
The model also employs EDPO (Difficulty-Estimated Policy Optimization), a training strategy that adjusts reinforcement signals based on the estimated difficulty of each reasoning step. This produces more robust behavior on hard problems while avoiding overfitting on easy ones.
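The published EDPO objective is not reproduced here, but the underlying idea can be sketched as difficulty-weighted policy optimization: scale each training sample's contribution by how hard the problem appears to be. Everything in the snippet, from the solve-rate difficulty proxy to the weighting formula, is an assumption for illustration.

```python
import torch

def difficulty_weighted_loss(logprobs: torch.Tensor,
                             advantages: torch.Tensor,
                             solve_rate: torch.Tensor) -> torch.Tensor:
    """Sketch of difficulty-aware policy optimization (assumed form, not the
    published EDPO objective): samples estimated to be harder contribute
    more strongly to the policy-gradient update."""
    difficulty = 1.0 - solve_rate   # assumed proxy: low solve rate = hard problem
    weights = 0.5 + difficulty      # assumed scaling; easy samples still count a little
    return -(weights * advantages * logprobs).mean()

# Toy usage with three samples of differing difficulty.
loss = difficulty_weighted_loss(
    logprobs=torch.tensor([-1.2, -0.4, -2.0]),    # log-prob of sampled responses
    advantages=torch.tensor([0.8, -0.1, 1.5]),    # reward minus baseline
    solve_rate=torch.tensor([0.9, 0.5, 0.1]),     # fraction of attempts solved
)
print(loss)
```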
How Do the Different Versions Compare?
Marco-o1 has evolved through three major releases, each building on the lessons of its predecessor while introducing new architectural innovations.
| Version | Release Date | Key Innovation | Impact |
|---|---|---|---|
| v1 | November 2024 | Initial CoT + MCTS framework | Baseline reasoning capability |
| v2 | February 2025 | DPO (Direct Preference Optimization), improved instruction following | Accepted at ACL 2025 |
| v3 | February 2025 | MAM (Mixed Attention Module) + TTT (Test-Time Training) | 20% lower inference cost, 4.7% avg improvement |
Marco-o1 v2 represented a maturation of the approach, with DPO bringing the model’s outputs closer to human-preferred reasoning patterns. V3, however, was the architectural breakthrough: the Mixed Attention Module allows the model to dynamically allocate computational resources across different parts of the input, while Test-Time Training (TTT) enables the model to refine its own weights during inference – a technique borrowed from meta-learning that significantly improves generalization.
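Test-Time Training is the more unusual of the two ideas, so it is worth seeing the generic pattern in code. The sketch below is the textbook TTT recipe, not the Marco-o1 v3 implementation: take a few gradient steps on a self-supervised next-token loss over the test input using a throwaway copy of the model, then answer with the adapted copy.

```python
import copy
import torch

def test_time_adapt(model, input_ids, steps: int = 3, lr: float = 1e-5):
    """Generic test-time-training pattern (illustrative, not the v3 code):
    briefly fine-tune a throwaway copy of the model on a self-supervised
    next-token objective over the test input, then answer with that copy."""
    adapted = copy.deepcopy(model)   # keep the base weights untouched
    adapted.train()
    optimizer = torch.optim.SGD(adapted.parameters(), lr=lr)
    for _ in range(steps):
        # Causal-LM loss on the test input itself: labels == input_ids.
        out = adapted(input_ids=input_ids, labels=input_ids)
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    adapted.eval()
    return adapted                   # use for this query, then discard
```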
How Can You Use Marco-o1?
Marco-o1 is designed for accessibility. The model weights are available on both Hugging Face and ModelScope, and the inference code is fully open source on GitHub.
```bash
git clone https://github.com/AIDC-AI/Marco-o1
cd Marco-o1
pip install -r requirements.txt
```
Loading the model requires nothing more than standard Transformers:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AIDC-AI/Marco-o1")
model = AutoModelForCausalLM.from_pretrained("AIDC-AI/Marco-o1")
```
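Generation then follows the standard Transformers pattern. The prompt below is purely illustrative, and the snippet assumes the tokenizer ships a chat template, which holds for Qwen2-based models:

```python
messages = [{"role": "user", "content": "How many 'r's are in 'strawberry'?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```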
Hardware requirements are those of any 7B-parameter LLM: a single A100 or comparable GPU is sufficient for inference, making the model accessible to research labs and startups without massive compute budgets.
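If GPU memory is tight, the usual Transformers options apply. The arguments below are standard rather than Marco-o1-specific: bfloat16 roughly halves the weight footprint relative to full precision, and automatic device placement requires the accelerate package.

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "AIDC-AI/Marco-o1",
    torch_dtype=torch.bfloat16,   # halves the full-precision weight footprint
    device_map="auto",            # requires `pip install accelerate`
)
```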
What’s Next for Marco-o1?
The MarcoPolo Team has announced a forthcoming Marco-o1 Agentic release, which will extend the model’s reasoning capabilities into autonomous agent workflows. This represents a natural evolution: a model that can reason about problems internally is well-positioned to execute multi-step actions in external environments, from API calls to browser automation.
The trajectory of Marco-o1 mirrors a broader industry trend: reasoning is no longer the exclusive domain of massive proprietary models. Open-source alternatives like Marco-o1 are democratizing access to structured thinking in AI, and the pace of improvement – from v1 to v3 in just three months – suggests this gap will continue to narrow.
FAQ
What is Marco-o1? Marco-o1 is an open-source large reasoning model developed by Alibaba’s AIDC-AI (MarcoPolo Team) based on Qwen2-7B-Instruct. It is designed for real-world problem solving across both standard-answer domains (math, physics, coding) and open-ended scenarios using advanced reasoning techniques like Chain-of-Thought fine-tuning and Monte Carlo Tree Search.
What techniques does Marco-o1 use (CoT + MCTS)? Marco-o1 combines Chain-of-Thought (CoT) fine-tuning with Monte Carlo Tree Search (MCTS) to enhance reasoning depth. It also uses reflection mechanisms, reasoning action strategies at mini-step granularity, and EDPO (Difficulty-Estimated Policy Optimization) for progressive self-improvement.
What are the different versions of Marco-o1? Marco-o1 v1 (November 2024) was the initial open reasoning model. Marco-o1 v2 (February 2025) added DPO optimization for math and planning, and was accepted at ACL 2025. Marco-o1 v3 (February 2025) introduced MAM (Mixed Attention Module) and TTT (Test-Time Training), achieving 20% reduction in inference cost and 4.7% average performance improvement. A Marco-o1 Agentic model is planned.
How does Marco-o1 perform compared to other reasoning models? Marco-o1 demonstrates strong performance on reasoning benchmarks, with v3 achieving notable inference cost reductions alongside quality improvements. The v2 paper was accepted at ACL 2025, and a follow-up paper on efficient LLM reasoning was accepted at ICLR 2026, underscoring the research team’s academic contributions.
How can I use Marco-o1? Marco-o1 is available on GitHub and Hugging Face. You can clone the repository, install dependencies with pip, and load the model using the Hugging Face Transformers library. It runs on standard hardware suitable for 7B-parameter models and integrates with common ML frameworks.
Further Reading
- Marco-o1 GitHub Repository – Official source code, weights, and documentation
- Marco-o1 on Hugging Face – Model weights and inference examples
- Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions (arXiv) – Original research paper
- Marco-o1 v2: Towards Widening The Distillation Bottleneck for Reasoning Models – ACL 2025 paper