reinforcement-learning

Semiconductor Engineering

Researchers from Politecnico di Milano and STMicroelectronics published a technical paper titled “Event-Driven Reinforcement Learning Enables Long-Horizon Control in Semiconductor Fabrication.” The paper proposes a deep reinforcement learning framework for multi-objective policy optimization in semiconductor manufacturing, where heterogeneous wafers move through hundreds of process steps across c…

ai reinforcement-learning
Hacker News

I gave an AI a civilisation to run. By the midgame it was winning: a trade network that dominated the map, alliances on every border, a diplomatic victory within reach. It had outbuilt, outearned, and outmanoeuvred every rival on the board. What it hadn't noticed was France. Quietly, across a hundred turns, French culture had been seeping into every city on the map. By the time the agent recognis…

ai machine-learning reinforcement-learning
DEV Community

SIA (Self Improving AI), released by Hexo Labs on May 26, 2026, is the first open-source framework that co-evolves both an agent's scaffold and its model weights inside a single iterative loop. The MIT-licensed code is on github.com/hexo-ai/sia. This tutorial walks through the feedback loop logic, prerequisites, and a runnable five-generation LawBench experiment. The Feedback Loop That Decides …

ai reinforcement-learning
DEV Community

Last month my OpenClaw agent kept making the same mistake: it would run a health check, the script would fail silently, and the agent would report "all systems operational" with total confidence. It wasn't broken. It was just doing what it was built to do — execute tasks — without any mechanism to learn from the outcome. So I built it a self-improvement loop. Every night at 2 AM, an isolated Open…

ai machine-learning reinforcement-learning
Robotics Institute Carnegie Mellon University

The Breakdown: VideoManip teaches robots manipulation skills using videos of people interacting with objects. It reconstructs movements and estimates how people make contact with objects. The system helps robots learn new skills without time-consuming, human-operated demonstrations. Researchers in Carnegie Mellon University's School of Computer Science are developing a new way for robots …

ai machine-learning reinforcement-learning robotics
Frontiers in Computer Science | New and Recent Articles

This paper introduces a distributed reinforcement learning-based MAC protocol designed for high-density educational IoT environments. In smart campuses, the reliability of real-time data from student wearable sensors and classroom environmental monitors is often hampered by hidden-node interference as well as network collisions. This phenomenon disrupts the synchronicity required for effective Hu…

ai computer-science iot reinforcement-learning
Scientific Reports
Frontiers in Artificial Intelligence | New and Recent Articles

Internet of Medical Things (IoMT) environments face significant challenges in securely transmitting and storing medical images due to limited computational resources, multiple device types, and increasing cybersecurity threats. This paper describes a reversible RGB medical image encryption framework that employs deep reinforcement learning by combining adaptive policy learning with determinis…

ai deep-learning reinforcement-learning
Lifeboat News: The Blog

Further Reading. Thumbnail original image credit: Adobe Stock Image. Graph from: Scheming in the wild: detecting real-world AI scheming incidents with open-source intelligence. Shutdown resistance in reasoning models. https://palisaderesearch.org/blog/shu… Natural emergent misalignment from reward hacking in production RL https://arxiv.org/html/2511.18397v1 Scheming in the wild: detecting rea…

ai machine-learning reinforcement-learning
Hacker News
Jiacheng; Zhang; Xinyu; Shunkai; Wang; Yanmohan; Lin; Qin; Tiancheng; Zhu; Zhengmao; Tianle; Jingyang; Zehan; Jiang; Binyang; Ding; Han; Fei; Du; Chenyu; Song; Zijian; Jiayuan; Zhi; Huang; Yunan; Cheng; Weiyu; Zhao; Pengyu
9d ago

MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling. Abstract: We present MaxProof, a population-level test-time scaling framework for competition-level mathematical proof in the MiniMax-M3 series. M3 first trains three proof-oriented capabilities -- proof generation, proof veri…

ai computer-science generative-ai machine-learning reinforcement-learning
Nature Communications

Nature Communications, Published online: 09 June 2026; doi:10.1038/s41467-026-74004-0 Model Predictive Task Sampling (MPTS) enables efficient, risk-aware task selection for meta-RL, domain randomization, and foundation model finetuning by predicting adaptation difficulty without exhaustive evaluation, improving robustness while reducing compute and interaction costs.

ai machine-learning reinforcement-learning
Nature Communications

Nature Communications, Published online: 08 June 2026; doi:10.1038/s41467-026-72491-9 This work introduces a generalizable control system that enables rapid adaptation across 33 soft robot configurations via reinforcement learning in a shared Koopman embedding space, enabling real-world skills in carpentry and bartending style tasks.

ai engineering reinforcement-learning robotics
Scientific Reports
DEV Community

Recap. In Part 1 we landed on the core idea of SDAR (arXiv:2605.15155): keep RL as the backbone, bolt on a privileged teacher for dense token-level guidance, and put a sigmoid gate between them so the student amplifies the teacher's confident advice and softens its noisy rejections. We also said the quiet part out loud - this is not a Bedrock fine-tuning checkbox. This part is the blueprint. Th…

ai machine-learning reinforcement-learning
DEV Community

The Core Problem: You shipped an AI agent. It works in demos. Then it runs 10,000 times in production, and you realize you have no idea which runs were good. This is the agent evaluation problem, and most teams approach it backwards. They reach for model-as-judge ("ask GPT-4 if the output is good") because it feels natural. But this is like using a microscope when you needed a ruler first. Here's …

ai machine-learning reinforcement-learning
Towards Data Science
DEV Community

Human-Aligned Decision Transformers for satellite anomaly response operations with inverse simulation verification. A Discovery Born from a Late-Night Simulation: It was 2:47 AM, and I was staring at a terminal window filled with telemetry data from a simulated satellite constellation. For weeks, I had been experimenting with Decision Transformers, a class of models that frame reinforcement learning…

ai reinforcement-learning
Agentic AI / Generative AI – NVIDIA Technical Blog