The landscape of large language models has been dominated by English-centric systems for years. While models like GPT-4, Claude, and LLaMA deliver exceptional performance in English, their capabilities in Chinese – and the availability of open-source alternatives – have lagged behind. BELLE (Be Everyone’s Large Language model Engine) was created to close that gap.
Developed by the BELLE Group at Lianjia Technology, BELLE is an open-source Chinese large language model project that fine-tunes BLOOM and LLaMA architectures with large-scale Chinese instruction data. Named “BELLE” to evoke the idea of a beautiful, accessible engine for everyone, the project aims to democratize Chinese conversational AI in the same way that Alpaca and Vicuna did for English.
With 3,600+ GitHub stars and an active research community contributing to its development, BELLE has become one of the most significant open-source Chinese LLM efforts. The project released multiple model variants benchmarked against each other, along with training data, evaluation methods, and deployment tools.
This guide covers the architecture, model variants, training methodology, evaluation benchmarks, and practical deployment of BELLE.
What Makes BELLE Different from Other Chinese LLMs?
Several open-source Chinese LLM projects emerged around the same time – ChatGLM, MOSS, and Chinese-Alpaca among them. BELLE occupies a distinct niche for three reasons:
| Differentiator | BELLE | Other Chinese LLMs |
|---|---|---|
| Base Model | BLOOM + LLaMA variants | Mostly LLaMA or ChatGLM |
| Training Data | Alpaca-style, translated and curated | Varies widely |
| Research Focus | Instruction-following evaluation | Often focused on pre-training |
| Transparency | Full data and model release | Often partial release only |
BELLE’s commitment to releasing both models and training data makes it particularly valuable for researchers who want to understand and build upon the instruction-tuning process for Chinese.
How Does BELLE’s Architecture Work?
BELLE is not a single model but a family of instruction-tuned models built on two base architectures:
```mermaid
graph TD
    subgraph "BELLE Model Family"
        A[BLOOMZ-7B1-MT] --> B[BELLE-7B]
        A2[LLaMA-7B] --> C[BELLE-LLaMA-7B]
        A3[LLaMA-13B] --> D[BELLE-LLaMA-13B]
        B --> E[BELLE-7B-2M]
        B --> F[BELLE-7B-0.5M]
        C --> G[BELLE-LLaMA-7B-2M]
    end
```

| Model Variant | Base Architecture | Parameters | Training Data Size |
|---|---|---|---|
| BELLE-7B | BLOOMZ-7B1-MT | 7B | 2M instructions |
| BELLE-LLaMA-7B | LLaMA-7B | 7B | 2M instructions |
| BELLE-LLaMA-13B | LLaMA-13B | 13B | 2M instructions |
| BELLE-7B-0.5M | BLOOMZ-7B1-MT | 7B | 0.5M instructions |
The 2M-instruction dataset (train_2M_CN) is the project’s flagship release, providing 2 million Chinese instruction-response pairs covering diverse tasks including translation, summarization, coding, question answering, and creative writing.
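The released records follow the Alpaca-style schema of an instruction, an optional input, and an output. Below is a minimal sketch of loading and inspecting the dataset with the Hugging Face datasets library; the dataset ID and field names are assumptions based on that schema, so verify them against the dataset card:

```python
# Sketch: inspect one BELLE instruction-tuning record.
# Dataset ID and field names are assumed (Alpaca-style schema); check the
# BelleGroup dataset card before relying on them.
from datasets import load_dataset

dataset = load_dataset("BelleGroup/train_2M_CN", split="train")

example = dataset[0]
print(example["instruction"])  # the Chinese task prompt
print(example["output"])       # the teacher-model response
```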
How Was the Training Data Created?
BELLE’s training data methodology is one of its most instructive contributions. The team followed the Stanford Alpaca approach of using a teacher model (text-davinci-003) to generate instruction data, but with a critical adaptation for Chinese:
- Seed instructions in Chinese: Instead of translating English instructions after generation, the BELLE team crafted Chinese seed instructions to prompt the teacher model directly in Chinese, producing more natural Chinese outputs.
- Manual filtering: Generated data was manually reviewed to remove low-quality or inappropriate responses.
- Data scaling: Three dataset sizes were released (0.5M, 1M, 2M) to study how instruction data scale affects model performance.
This methodology is documented in detail on the BELLE GitHub repository, making it reproducible for researchers who want to create instruction datasets in other languages.
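As a rough illustration of that approach, the sketch below prompts a teacher model with Chinese seed examples to generate new instruction-response pairs. The seed file name, prompt wording, and legacy openai Completion API call are illustrative assumptions, not BELLE's exact pipeline, which is documented in the repository:

```python
# Illustrative sketch of Chinese-first, Alpaca-style data generation.
# Seed file, prompt text, and API usage are assumptions for illustration only.
import json
import random

import openai  # legacy (<1.0) openai client assumed

with open("zh_seed_tasks.json", encoding="utf-8") as f:  # hypothetical seed file
    seeds = json.load(f)

def generate_batch(num_examples: int = 5) -> str:
    # Prompt the teacher model in Chinese with a few seed instructions,
    # asking it to produce new instruction-response pairs.
    sampled = random.sample(seeds, k=3)
    prompt = "请参考以下示例，生成新的中文指令及对应回答：\n"
    for s in sampled:
        prompt += f"指令：{s['instruction']}\n回答：{s['output']}\n\n"
    prompt += f"请再生成{num_examples}条类似的指令和回答。"
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=1024,
        temperature=1.0,
    )
    return response["choices"][0]["text"]

print(generate_batch())
```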
How Does BELLE Perform on Benchmarks?
BELLE was evaluated using a multi-dimensional evaluation framework covering several Chinese NLP tasks:
```mermaid
graph LR
    A[BELLE Model] --> B{Evaluation}
    B --> C[Translation]
    B --> D[Summarization]
    B --> E[QA Accuracy]
    B --> F[Instruction Following]
    B --> G[Safety & Bias]
    C --> H[Score Report]
    D --> H
    E --> H
    F --> H
    G --> H
```

| Evaluation Task | BELLE-7B (2M) | BELLE-LLaMA-7B (2M) | Baseline (Base Model) |
|---|---|---|---|
| Chinese Translation (BLEU) | 28.4 | 27.1 | 22.3 |
| Text Summarization (ROUGE-L) | 32.7 | 31.5 | 26.8 |
| Chinese QA (F1) | 64.2 | 62.8 | 56.1 |
| Safety & Bias | Pass | Pass | Pass |
The 2M-instruction variant consistently outperformed the 0.5M variant and the base model across all tasks, confirming that instruction data scaling yields measurable improvements in Chinese language tasks.
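For context on how numbers like the BLEU scores above are typically produced, here is a minimal corpus-level BLEU computation using sacrebleu with its built-in Chinese tokenization; this illustrates the metric rather than BELLE's exact evaluation harness:

```python
# Minimal sketch of corpus-level BLEU scoring for Chinese translation output.
# Illustrates the metric only, not BELLE's evaluation scripts.
import sacrebleu

hypotheses = ["深度学习是机器学习的一个分支。"]    # model outputs
references = [["深度学习是机器学习的一个子领域。"]]  # one reference stream, aligned with hypotheses

bleu = sacrebleu.corpus_bleu(hypotheses, references, tokenize="zh")
print(f"BLEU: {bleu.score:.1f}")
```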
What Are the Limitations?
BELLE is a research project with important caveats:
- Base model constraints: BELLE inherits the limitations of BLOOM and LLaMA, including tokenizer biases toward English. BLOOM’s multilingual tokenizer handles Chinese better than LLaMA’s, which partly explains why BELLE-7B (BLOOM-based) often outperforms BELLE-LLaMA-7B on Chinese tasks.
- Training data quality: The Alpaca-style data generation pipeline, while powerful, can produce hallucinations and factual errors that the model will learn. Manual filtering helps but cannot catch everything.
- Evaluation gap: Benchmarks do not fully capture real-world Chinese conversational quality. Human evaluation remains the gold standard, and BELLE’s own papers acknowledge the gap.
- License restrictions: BELLE is released for research purposes only, inheriting the licenses of its base models. Commercial use requires careful legal review.
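To make the tokenizer point above concrete, one can count how many tokens each base model's tokenizer needs for the same Chinese sentence. The model identifiers below are illustrative, and the LLaMA tokenizer may require separately obtained weights:

```python
# Sketch: compare Chinese tokenization efficiency of the two base models.
# Model IDs are illustrative; LLaMA tokenizer access may be gated.
from transformers import AutoTokenizer

text = "深度学习是机器学习的一个分支，它使用多层神经网络来学习数据表示。"

bloom_tok = AutoTokenizer.from_pretrained("bigscience/bloomz-7b1-mt")
llama_tok = AutoTokenizer.from_pretrained("huggyllama/llama-7b")

print("BLOOM tokens:", len(bloom_tok.encode(text)))
print("LLaMA tokens:", len(llama_tok.encode(text)))
# Fewer tokens per sentence generally means the tokenizer represents
# Chinese more compactly, which favors the BLOOM-based variants.
```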
How Can You Deploy BELLE?
Deployment follows standard Hugging Face workflows. BELLE models are available on the BELLE Group Hugging Face page. A typical inference script:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "BelleGroup/BELLE-7B-2M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# BELLE models expect a Human/Assistant prompt format (see the model card).
prompt = "Human: 什么是深度学习?\n\nAssistant: "
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
For production deployment, the 7B parameter models run on consumer GPUs with 16 GB+ VRAM using 4-bit quantization, while the 13B variant requires 24 GB+ VRAM.
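As a sketch of such a setup, the following loads BELLE-7B-2M in 4-bit via the transformers BitsAndBytesConfig integration; the quantization settings shown are common defaults rather than BELLE-specific recommendations:

```python
# Sketch: load BELLE-7B-2M in 4-bit to fit a ~16 GB consumer GPU.
# Requires the bitsandbytes package; settings are common defaults.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model_name = "BelleGroup/BELLE-7B-2M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto",
)
```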
Frequently Asked Questions
What is BELLE?
BELLE (Be Everyone’s Large Language model Engine) is an open-source Chinese LLM project from Lianjia Technology that instruction-tunes BLOOM and LLaMA base models on up to 2 million Chinese instruction samples.
What model variants does BELLE offer?
BELLE offers versions based on BLOOMZ-7B1-MT (BELLE-7B), LLaMA-7B (BELLE-LLaMA-7B), and LLaMA-13B (BELLE-LLaMA-13B), each available with different training data sizes (0.5M, 1M, or 2M instructions).
How large is the BELLE training dataset?
The largest BELLE dataset contains 2 million Chinese instruction-response pairs (train_2M_CN). Smaller variants of 0.5M and 1M samples are also available for ablation studies.
What are BELLE’s limitations?
BELLE can produce plausible-sounding but incorrect information, inherits tokenizer biases from its base models, and was trained on generated data that may contain errors. Performance on real-world Chinese conversations may not fully reflect benchmark scores.
What is BELLE’s license?
BELLE is released for research purposes only, inheriting the non-commercial licenses of its base models (BLOOM/LLaMA). Users must verify the latest licensing terms on the official repository.