Researchers from Tencent introduced R‑Zero, a framework that trains large language models without any external data. Traditional self‑evolving methods rely on human‑curated tasks and labels, which limits scalability. R‑Zero instead starts from a single base model and splits it into two independent roles: a Challenger, which proposes tasks at the edge of the model's capabilities, and a Solver, which must solve them. The two roles co‑evolve, generating and solving increasingly difficult problems and thereby building a self‑improving curriculum from scratch. Empirically, the framework boosted the Qwen3‑4B‑Base model's average score by 6.49 points on math reasoning benchmarks and by 7.54 points on general‑domain reasoning benchmarks. The authors argue that such self‑evolving LLMs offer a scalable path toward models that can reason beyond human‑curated datasets.
These four stories showcase different facets of the rapidly evolving AI landscape. Nano Banana demonstrates how new image models can capture public imagination before official announcements; GPT‑5 highlights both progress and community pushback; Anthropic’s Claude for Chrome signals a move toward AI‑driven web browsers with deep safety considerations; and R‑Zero points to novel training paradigms that could reduce dependence on human‑labeled data.