Researchers from Tencent introduced R‑Zero, a framework that trains large language models without any external data. Traditional self‑evolving methods rely on human‑curated tasks and labels, which limits scalability. R‑Zero instead starts from a single base model and splits it into two independent roles: a Challenger, which proposes tasks at the edge of the model’s capabilities, and a Solver, which must solve them. The two roles co‑evolve, generating and solving increasingly difficult problems and thereby building a self‑improving curriculum from scratch. Empirically, the framework boosted the Qwen3‑4B‑Base model’s average score by 6.49 points on math reasoning benchmarks and by 7.54 points on general‑domain reasoning benchmarks. The authors argue that such self‑evolving LLMs offer a scalable pathway toward models capable of reasoning beyond human‑curated datasets.
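The Challenger–Solver dynamic can be illustrated with a minimal toy sketch. Everything below is an assumption for illustration: the function names (`challenger_propose`, `solver_attempt`, `co_evolve`) are hypothetical stubs, not the paper's actual training code, and the reward heuristic shown, which nudges task difficulty toward the point where the Solver succeeds about half the time, is only one plausible way to keep tasks "at the edge" of the Solver's ability.

```python
import random

# Hypothetical stubs: a real setup would query two copies of an LLM.

def challenger_propose(difficulty: float) -> dict:
    """Challenger proposes a task at a given difficulty (stub)."""
    return {"difficulty": difficulty}

def solver_attempt(task: dict, skill: float) -> bool:
    """Solver succeeds more often when its skill exceeds the task difficulty (stub)."""
    return random.random() < max(0.0, min(1.0, skill - task["difficulty"] + 0.5))

def co_evolve(rounds: int = 50, batch: int = 32) -> tuple[float, float]:
    """Alternate Challenger and Solver updates; both drift upward together."""
    difficulty, skill = 0.1, 0.1
    for _ in range(rounds):
        # Challenger step: measure the Solver's success rate on a batch and
        # push difficulty toward the ~50% success frontier (assumed reward).
        success = sum(
            solver_attempt(challenger_propose(difficulty), skill)
            for _ in range(batch)
        ) / batch
        difficulty += 0.05 * (success - 0.5)
        # Solver step: training on attempted tasks raises skill (stub update).
        skill += 0.02 * success
    return difficulty, skill
```

Running `co_evolve()` shows the intended feedback loop: as the Solver's skill rises, the success rate climbs above one half, which in turn drives the Challenger to propose harder tasks, so the curriculum escalates without any external labels.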
These four stories showcase different facets of the rapidly evolving AI landscape. Nano Banana demonstrates how new image models can capture public imagination before official announcements; GPT‑5 highlights both progress and community pushback; Anthropic’s Claude for Chrome signals a move toward AI‑driven web browsers with deep safety considerations; and R‑Zero points to novel training paradigms that could reduce dependence on human‑labeled data.
- Meta’s $14.3 Billion Stake in Scale AI Signals Bold Play for Superintelligence Leadership
- India to Launch Four Responsible AI Solutions on AIKosha by September
- Hyperledger’s Expanding Ecosystem: Diverse Use Cases Across Industries
- OpenAI Charts Course to Transform ChatGPT into Comprehensive AI Super Assistant
- Apple Unveils ‘Apple Intelligence’ at WWDC 2025 Featuring Live Translation, Liquid Glass Design & More
- GPT‑5: OpenAI’s next‑generation model draws both praise and criticism