reinforcement learning

News

You can now fine-tune your enterprise’s own version of OpenAI’s o4-mini reasoning model with reinforcement learning

For organizations with clearly defined problems and verifiable answers, RFT offers a compelling way to align models.

Alibaba’s ‘ZeroSearch’ lets AI learn to google itself — slashing training costs by 88 percent

Alibaba’s ZeroSearch trains large language models to beat Google Search and slash API costs by 88%, redefining how AI learns to retrieve information.

Cold Fusion on MSN, 16h, Opinion

Are We Blaming AI for the Wrong Things?

AI is often portrayed as disruptive, dangerous, and even destructive. But what if we’ve been focusing too much on what’s ...

OnchainVip: Reinventing the WEB3 Trading Paradigm with Cutting-Edge Technology

As a core component of the cryptocurrency ecosystem of the second largest cryptocurrency exchange in the U.S., OnchainVip has ...

Tech Xplore, 2d

Researchers unveil IntersectionZoo to evaluate AI learning in complex urban traffic

If there's one thing that characterizes driving in any major city, it's the constant stop-and-go as traffic lights change and ...

Finbold, 2d

Fraction AI launches mainnet on Base

Decentralized AI agent auto-training platform Fraction AI has announced the launch of its mainnet on the Ethereum Layer 2 (L2) network Base.

Agentic AI In Banking: The Future And The Challenges

The financial world is on the brink of a new era marked by greater efficiency, innovation and customer-centric services.

Tech Xplore, 8d

Reinforcement learning boosts reasoning skills in new diffusion-based language model d1

A team of AI researchers at the University of California, Los Angeles, working with a colleague from Meta AI, has introduced d1, a diffusion-large-language-model-based framework that has been improved ...

Psychology Today, 8d

Why AI Gets Learning Right and Cognitive Science Doesn’t

When machines fall short, we adjust. When students do, we blame. Here's what that says about learning and instruction.

AI Is Using Your Likes to Get Inside Your Head

Liking features on social media can provide troves of data about human behavior to AI models. But as AI gets smarter, will it ...

GitHub, 16d

TTRL: Test-Time Reinforcement Learning

We investigate Reinforcement Learning (RL) on data without explicit labels for reasoning tasks in Large Language Models (LLMs). The core challenge of the problem is reward estimation during inference ...

Geeky Gadgets, 20d

DeepSeek's Self-Learning Breakthrough That Could Outshine GPT-4

... reinforcement learning, and reward modeling. At the heart of this innovation lies DeepSeek GRM, an AI judge designed to evaluate responses with precision and adaptability.
