reinforcement-learning

Semiconductor Engineering

Event-Driven RL Targets Long-Horizon Fab Control

Technical Paper Link

3h ago

Researchers from Politecnico di Milano and STMicroelectronics published a technical paper titled “Event-Driven Reinforcement Learning Enables Long-Horizon Control in Semiconductor Fabrication.” The paper proposes a deep reinforcement learning framework for multi-objective policy optimization in semiconductor manufacturing, where heterogeneous wafers move through hundreds of process steps across c…

ai, reinforcement-learning

Hacker News

I Gave an AI a Civilization to Run. It Built a Nuke – Launching CivBench

Liam Wilkinson

9h ago

I gave an AI a civilisation to run. By the midgame it was winning: a trade network that dominated the map, alliances on every border, a diplomatic victory within reach. It had outbuilt, outearned, and outmanoeuvred every rival on the board. What it hadn't noticed was France. Quietly, across a hundred turns, French culture had been seeping into every city on the map. By the time the agent recognis…

ai, machine-learning, reinforcement-learning

DEV Community

You don't pick the RL algorithm — SIA's Feedback loop does

Creeta

3d ago

SIA (Self Improving AI), released by Hexo Labs on May 26, 2026, is the first open-source framework that co-evolves both an agent's scaffold and its model weights inside a single iterative loop. The MIT-licensed code is on github.com/hexo-ai/sia. This tutorial walks through the feedback loop logic, prerequisites, and a runnable five-generation LawBench experiment. The Feedback Loop That Decides …

ai, reinforcement-learning

DEV Community

I Run a Self-Improvement Loop on My OpenClaw Agent Every Night. Here's What I Learned.

MrClaw207

4d ago

Last month my OpenClaw agent kept making the same mistake: it would run a health check, the script would fail silently, and the agent would report "all systems operational" with total confidence. It wasn't broken. It was just doing what it was built to do — execute tasks — without any mechanism to learn from the outcome. So I built it a self-improvement loop. Every night at 2 AM, an isolated Open…

ai, machine-learning, reinforcement-learning

Robotics Institute Carnegie Mellon University

CMU Researchers Train Robots With Internet Videos

Mallory Lindahl

4d ago

The Breakdown: VideoManip teaches robots manipulation skills using videos of people interacting with objects. It reconstructs movements and estimates how people make contact with objects. The system helps robots learn new skills without time-consuming, human-operated demonstrations. * * * Researchers in Carnegie Mellon University's School of Computer Science are developing a new way for robots …

ai, machine-learning, reinforcement-learning, robotics

Frontiers in Computer Science | New and Recent Articles

MAC protocol for multi-hop wireless sensor networks utilizing integrated reinforcement learning with joint frame and slot optimization

Muhammad Hafidz Fazli Bin Md Fauadi

4d ago

This paper introduces a distributed reinforcement learning-based MAC protocol designed for high-density educational IoT environments. In smart campuses, the reliability of real-time data from student wearable sensors and classroom environmental monitors is often hampered by hidden-node interference as well as network collisions. This phenomenon disrupts the synchronicity required for effective Hu…

ai, computer-science, iot, reinforcement-learning

Scientific Reports

Retraction Note: Reinforcement learning-driven deep learning approaches for optimized robot trajectory planning

Fang Shiyu

5d ago

Scientific Reports, Published online: 16 June 2026; doi:10.1038/s41598-026-57775-w Retraction Note: Reinforcement learning-driven deep learning approaches for optimized robot trajectory planning

ai, deep-learning, reinforcement-learning

Frontiers in Artificial Intelligence | New and Recent Articles

Deep reinforcement learning–based reversible medical image encryption framework for secure IoMT environments

Sivakumar Nagarajan

6d ago

The Internet of Medical Things (IoMT) environments face significant challenges in securely transmitting and storing medical images due to limited computational resources, multiple device types, and increasing cybersecurity threats. This paper describes a reversible RGB medical image encryption framework that employs deep reinforcement learning by combining adaptive policy learning with determinis…

ai, deep-learning, reinforcement-learning

Research Communities by Springer Nature

DRL-SecRoute: A Synergetic Deep Reinforcement Learning Paradigm for Mitigating Byzantine Faults and SSDF Attacks through Heuristic Spectrum Cognizance in Next-Generation Cognitive Radio Networks

Manish Kumar Dixit

6d ago

ai, deep-learning, reinforcement-learning

Hacker News

TycoonLE: A Jax reinforcement learning environment for long-horizon planning

9d ago

ai, reinforcement-learning

Lifeboat News: The Blog

AI Misbehavior Is No Longer Confined to the Lab

Dan Breeden

9d ago

Further Reading. Thumbnail image credit: Adobe Stock Image. Graph from: Scheming in the wild: detecting real-world AI scheming incidents with open-source intelligence. Shutdown resistance in reasoning models. https://palisaderesearch.org/blog/shu… Natural emergent misalignment from reward hacking in production RL https://arxiv.org/html/2511.18397v1 Scheming in the wild: detecting rea…

ai, machine-learning, reinforcement-learning

Hacker News

Maxproof

Jiacheng; Zhang; Xinyu; Shunkai; Wang; Yanmohan; Lin; Qin; Tiancheng; Zhu; Zhengmao; Tianle; Jingyang; Zehan; Jiang; Binyang; Ding; Han; Fei; Du; Chenyu; Song; Zijian; Jiayuan; Zhi; Huang; Yunan; Cheng; Weiyu; Zhao; Pengyu

9d ago

MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling. Abstract: We present MaxProof, a population-level test-time scaling framework for competition-level mathematical proof in the MiniMax-M3 series. M3 first trains three proof-oriented capabilities -- proof generation, proof veri…

ai, computer-science, generative-ai, machine-learning, reinforcement-learning

Nature Communications

Model predictive task sampling for efficient and robust adaptation

Xiangyang Ji

12d ago

Nature Communications, Published online: 09 June 2026; doi:10.1038/s41467-026-74004-0 Model Predictive Task Sampling (MPTS) enables efficient, risk-aware task selection for meta-RL, domain randomization, and foundation model finetuning by predicting adaptation difficulty without exhaustive evaluation, improving robustness while reducing compute and interaction costs.

ai, machine-learning, reinforcement-learning

Nature Communications

Reinforcement learning in linear embedding space unlocks generalizable control across soft robot configurations

Wei Pan

13d ago

Nature Communications, Published online: 08 June 2026; doi:10.1038/s41467-026-72491-9 This work introduces a generalizable control system that enables rapid adaptation across 33 soft robot configurations via reinforcement learning in a shared Koopman embedding space, enabling real-world skills in carpentry and bartending style tasks.

ai, engineering, reinforcement-learning, robotics

Scientific Reports

A single reinforcement learning model to unify habit formation and Pavlovian-instrumental interaction

Yutaka Sakai

14d ago

Scientific Reports, Published online: 08 June 2026; doi:10.1038/s41598-026-55166-9 A single reinforcement learning model to unify habit formation and Pavlovian-instrumental interaction

ai, reinforcement-learning

DEV Community

Four Models in One Training Loop: Architecting SDAR on AWS (Before Renting a Single GPU)

Shoaibali Mir

15d ago

Recap. In Part 1 we landed on the core idea of SDAR (arXiv:2605.15155): keep RL as the backbone, bolt on a privileged teacher for dense token-level guidance, and put a sigmoid gate between them so the student amplifies the teacher's confident advice and softens its noisy rejections. We also said the quiet part out loud: this is not a Bedrock fine-tuning checkbox. This part is the blueprint. Th…

ai, machine-learning, reinforcement-learning

DEV Community

Deterministic Checks vs Model-as-Judge: A Tiered Approach to Agent Evaluation

Saurav Bhattacharya

16d ago

The Core Problem: You shipped an AI agent. It works in demos. Then it runs 10,000 times in production, and you realize you have no idea which runs were good. This is the agent evaluation problem, and most teams approach it backwards. They reach for model-as-judge ("ask GPT-4 if the output is good") because it feels natural. But this is like using a microscope when you needed a ruler first. Here's …

ai, machine-learning, reinforcement-learning

Towards Data Science

The Fundamental Choice in Reinforcement Learning: On‑Policy vs. Off‑Policy

Ananya Bhattacharyya

16d ago

How a simple choice shapes exploration, safety, and efficiency. The post The Fundamental Choice in Reinforcement Learning: On‑Policy vs. Off‑Policy appeared first on Towards Data Science.

ai, reinforcement-learning

DEV Community

Human-Aligned Decision Transformers for satellite anomaly response operations with inverse simulation verification

Rikin Patel

16d ago

A Discovery Born from a Late-Night Simulation: It was 2:47 AM, and I was staring at a terminal window filled with telemetry data from a simulated satellite constellation. For weeks, I had been experimenting with Decision Transformers, a class of models that frame reinforcement learning…

ai, reinforcement-learning

Agentic AI / Generative AI – NVIDIA Technical Blog

NVIDIA Nemotron 3 Ultra Powers Faster, More Efficient Reasoning for Long-Running Agents

Chris Alexiuk

17d ago

Single-turn chatbots are evolving into long-running agents that can reason, maintain context, use tools, and run efficiently across many turns to complete...

ai, machine-learning, nlp, reinforcement-learning
