Researchers from Tencent introduced R‑Zero, a framework that trains large language models without any external data. Traditional self‑evolving methods rely on human‑curated tasks and labels, which limits scalability. R‑Zero instead starts from a single base model and splits it into two independent roles: a Challenger, which proposes tasks at the edge of the model's capabilities, and a Solver, which must solve them. The two roles co‑evolve, generating and solving increasingly difficult problems and thereby building a self‑improving curriculum from scratch. Empirically, the framework boosted the Qwen3‑4B‑Base model's average score by 6.49 points on math reasoning benchmarks and by 7.54 points on general‑domain reasoning benchmarks. The authors argue that such self‑evolving LLMs offer a scalable path toward models that can reason beyond human‑curated datasets.
These four stories showcase different facets of the rapidly evolving AI landscape. Nano Banana demonstrates how new image models can capture public imagination before official announcements; GPT‑5 highlights both progress and community pushback; Anthropic’s Claude for Chrome signals a move toward AI‑driven web browsers with deep safety considerations; and R‑Zero points to novel training paradigms that could reduce dependence on human‑labeled data.