BELLE: Open-Source Chinese Large Language Model by Lianjia

BELLE is an open-source Chinese LLM project fine-tuned on BLOOM and LLaMA with 2M instruction samples, advancing Chinese conversational AI.


The landscape of large language models has been dominated by English-centric systems for years. While models like GPT-4, Claude, and LLaMA deliver exceptional performance in English, their capabilities in Chinese – and the availability of open-source alternatives – have lagged behind. BELLE (Be Everyone’s Large Language model Engine) was created to close that gap.

Developed by the BELLE Group at Lianjia Technology, BELLE is an open-source Chinese large language model project that fine-tunes BLOOM and LLaMA architectures with large-scale Chinese instruction data. Named “BELLE” to evoke the idea of a beautiful, accessible engine for everyone, the project aims to democratize Chinese conversational AI in the same way that Alpaca and Vicuna did for English.

With 3,600+ GitHub stars and an active research community contributing to its development, BELLE has become one of the most significant open-source Chinese LLM efforts. The project released multiple model variants benchmarked against each other, along with training data, evaluation methods, and deployment tools.

This guide covers the architecture, model variants, training methodology, evaluation benchmarks, and practical deployment of BELLE.


What Makes BELLE Different from Other Chinese LLMs?

Several open-source Chinese LLM projects emerged around the same time – ChatGLM, MOSS, and Chinese-Alpaca among them. BELLE occupies a distinct niche for three reasons:

| Differentiator | BELLE | Other Chinese LLMs |
|---|---|---|
| Base Model | BLOOM + LLaMA variants | Mostly LLaMA or ChatGLM |
| Training Data | Alpaca-style, translated and curated | Varies widely |
| Research Focus | Instruction-following evaluation | Often focused on pre-training |
| Transparency | Full data and model release | Often partial release only |

BELLE’s commitment to releasing both models and training data makes it particularly valuable for researchers who want to understand and build upon the instruction-tuning process for Chinese.

How Does BELLE’s Architecture Work?

BELLE is not a single model but a family of instruction-tuned models built on two base architectures:

| Model Variant | Base Architecture | Parameters | Training Data Size |
|---|---|---|---|
| BELLE-7B | BLOOMZ-7B1-MT | 7B | 2M instructions |
| BELLE-LLaMA-7B | LLaMA-7B | 7B | 2M instructions |
| BELLE-LLaMA-13B | LLaMA-13B | 13B | 2M instructions |
| BELLE-7B-0.5M | BLOOMZ-7B1-MT | 7B | 0.5M instructions |

The 2M-instruction dataset (train_2M_CN) is the project’s flagship release, providing 2 million Chinese instruction-response pairs covering diverse tasks including translation, summarization, coding, question answering, and creative writing.
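Each record in the released datasets follows an Alpaca-style JSON schema. A minimal sketch of parsing one record, assuming the `instruction`/`output` field names used in the project's data files:

```python
import json

# Illustrative record in the Alpaca-style schema used by BELLE's released
# datasets (field names assumed from the project's data format).
line = '{"instruction": "把下面的句子翻译成英文:今天天气很好。", "output": "The weather is very nice today."}'

record = json.loads(line)
prompt = record["instruction"]   # the Chinese task description
target = record["output"]        # the teacher model's response
print(prompt)
print(target)
```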

How Was the Training Data Created?

BELLE’s training data methodology is one of its most instructive contributions. The team followed the Stanford Alpaca approach of using a teacher model (text-davinci-003) to generate instruction data, but with a critical adaptation for Chinese:

  1. Seed instructions in Chinese: Instead of translating English instructions after generation, the BELLE team crafted Chinese seed instructions to prompt the teacher model directly in Chinese, producing more natural Chinese outputs.
  2. Manual filtering: Generated data was manually reviewed to remove low-quality or inappropriate responses.
  3. Data scaling: Three dataset sizes were released (0.5M, 1M, 2M) to study how instruction data scale affects model performance.
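BELLE's filtering was manual, but a hypothetical sketch of the kind of heuristic pre-filter that often precedes human review makes the step concrete (the specific rules below are illustrative, not BELLE's actual criteria):

```python
# Hypothetical quality heuristics for generated instruction pairs; BELLE's
# actual filtering was manual, so these rules are illustrative only.
REFUSAL_MARKERS = ("作为一个AI", "As an AI language model")

def keep(pair: dict) -> bool:
    instruction, output = pair["instruction"], pair["output"]
    if len(output) < 5:                            # drop near-empty responses
        return False
    if any(m in output for m in REFUSAL_MARKERS):  # drop teacher-model refusals
        return False
    return instruction != output                   # drop degenerate echoes

pairs = [
    {"instruction": "写一首关于春天的诗。", "output": "春风拂面百花开,燕子归来绕梁台。"},
    {"instruction": "解释什么是递归。", "output": "好的"},
]
print([keep(p) for p in pairs])
```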

This methodology is documented in detail on the BELLE GitHub repository, making it reproducible for researchers who want to create instruction datasets in other languages.

How Does BELLE Perform on Benchmarks?

BELLE was evaluated using a multi-dimensional evaluation framework covering several Chinese NLP tasks:

| Evaluation Task | BELLE-7B (2M) | BELLE-LLaMA-7B (2M) | Baseline (Base Model) |
|---|---|---|---|
| Chinese Translation (BLEU) | 28.4 | 27.1 | 22.3 |
| Text Summarization (ROUGE-L) | 32.7 | 31.5 | 26.8 |
| Chinese QA (F1) | 64.2 | 62.8 | 56.1 |
| Safety & Bias | Pass | Pass | Pass |

The 2M-instruction variant consistently outperformed the 0.5M variant and the base model across all tasks, confirming that instruction data scaling yields measurable improvements in Chinese language tasks.
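ROUGE-L, used in the summarization row above, scores the longest common subsequence (LCS) between a candidate and a reference; for Chinese it is typically computed at the character level. A self-contained sketch of the metric:

```python
def lcs_len(a: str, b: str) -> int:
    # Classic dynamic-programming longest common subsequence length
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l(candidate: str, reference: str) -> float:
    # Character-level ROUGE-L F1: harmonic mean of LCS precision and recall
    lcs = lcs_len(candidate, reference)
    if lcs == 0:
        return 0.0
    p, r = lcs / len(candidate), lcs / len(reference)
    return 2 * p * r / (p + r)

print(round(rouge_l("今天天气很好", "今天天气好"), 3))
```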

What Are the Limitations?

BELLE is a research project with important caveats:

  • Base model constraints: BELLE inherits the limitations of BLOOM and LLaMA, including tokenizer biases toward English. BLOOM’s multilingual tokenizer handles Chinese better than LLaMA’s, which partly explains why BELLE-7B (BLOOM-based) often outperforms BELLE-LLaMA-7B on Chinese tasks.
  • Training data quality: The Alpaca-style data generation pipeline, while powerful, can produce hallucinations and factual errors that the model will learn. Manual filtering helps but cannot catch everything.
  • Evaluation gap: Benchmarks do not fully capture real-world Chinese conversational quality. Human evaluation remains the gold standard, and BELLE’s own papers acknowledge the gap.
  • License restrictions: BELLE is released for research purposes only, inheriting the licenses of its base models. Commercial use requires careful legal review.

How Can You Deploy BELLE?

Deployment follows standard Hugging Face workflows. BELLE models are available on the BELLE Group Hugging Face page. A typical inference script:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "BelleGroup/BELLE-7B-2M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tokenize a Chinese prompt ("What is deep learning?") and generate a response
inputs = tokenizer("什么是深度学习?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
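BELLE models are instruction-tuned on a conversational template, so raw questions often work better when wrapped in it. A sketch assuming the `Human: ... Assistant:` format described in the project's README (verify against the model card before relying on it):

```python
def build_prompt(query: str) -> str:
    # BELLE-style conversational template; the exact format is taken from
    # the project's README and should be checked against the model card.
    return f"Human: {query}\n\nAssistant: "

prompt = build_prompt("什么是深度学习?")
print(prompt)
```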

For production deployment, the 7B models fit on consumer GPUs with 16 GB+ of VRAM in fp16, and 4-bit quantization shrinks their weight footprint to under 5 GB; the 13B variant requires 24 GB+ of VRAM, typically with quantization.
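These VRAM figures follow from simple arithmetic on parameter counts: each parameter occupies 2 bytes in fp16 and roughly 0.5 bytes at 4-bit, plus overhead for activations and the KV cache. A rough estimator (the 20% overhead factor is an assumption, not a measured value):

```python
def vram_gib(params_billion: float, bytes_per_param: float, overhead: float = 1.2) -> float:
    # Weight memory plus a rough 20% allowance for activations / KV cache
    # (the overhead factor is an assumption for illustration)
    return params_billion * 1e9 * bytes_per_param * overhead / 2**30

print(round(vram_gib(7, 2.0), 1))   # 7B in fp16
print(round(vram_gib(7, 0.5), 1))   # 7B at 4-bit
print(round(vram_gib(13, 2.0), 1))  # 13B in fp16
```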

Frequently Asked Questions

What is BELLE?

BELLE (Be Everyone’s Large Language model Engine) is an open-source Chinese LLM project by Lianjia Technology, instruction-tuned on BLOOM and LLaMA architectures using 2 million Chinese instruction samples.

What model variants does BELLE offer?

BELLE offers versions based on BLOOMZ-7B1-MT (BELLE-7B), LLaMA-7B (BELLE-LLaMA-7B), and LLaMA-13B (BELLE-LLaMA-13B), each available with different training data sizes (0.5M, 1M, or 2M instructions).

How large is the BELLE training dataset?

The largest BELLE dataset contains 2 million Chinese instruction-response pairs (train_2M_CN). Smaller variants of 0.5M and 1M samples are also available for ablation studies.

What are BELLE’s limitations?

BELLE can produce plausible-sounding but incorrect information, inherits tokenizer biases from its base models, and was trained on generated data that may contain errors. Performance on real-world Chinese conversations may not fully reflect benchmark scores.

What is BELLE’s license?

BELLE is released for research purposes only, inheriting the non-commercial licenses of its base models (BLOOM/LLaMA). Users must verify the latest licensing terms on the official repository.
