ai-safety

WitnessAI

Agentic AI systems call APIs, query databases, execute code, and modify production systems without waiting for human approval. That autonomy makes them useful and raises the stakes for security teams. Organizations deploying AI agents report behaviors such as improper data exposure and access to unauthorized resources. This article identifies eight cybersecurity risks specific to agentic ... Read…

ai, ai-safety, cybersecurity
DEV Community

Securing LLM Agent Teams: Inside NRT-Defense v0.4.0 Multi-turn autonomous LLM agents are expanding rapidly in safety-critical systems. However, a major vulnerability has been exposed by Lee et al. (2026) in the NRT-Bench paper: adaptive multi-turn attacks can exploit disjoint model vulnerabilities, causing an 8.7% to 12.1% loss of Critical Safety Functions (CSFs). To solve this, I am open-sourci…

ai, ai-safety, machine-learning
DEV Community

Hi there and welcome back! Last week I talked about CIEM and why tools like IAM Access Analyzer matter for understanding who has access to what in your cloud environment. This week, I want to talk about a different tool entirely. The Scenario A healthcare startup is scaling fast. They have a primary database holding patient records, properly encrypted, properly access controlled, everything by th…

ai, ai-safety
DEV Community

Agents are moving from demos to touching money, infrastructure, and customer data. Sentinel SCA is a runtime admissibility layer. Before an agent action executes, Sentinel evaluates the request and returns one of three verdicts: ALLOW, REVIEW, or DENY. Every decision is cryptographically signed and recorded in a tamper-evident audit ledger. This post is the validation report: what I built, what I teste…
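The flow described in this snippet (evaluate a request, return a verdict, sign it, record it in a tamper-evident ledger) can be sketched roughly as follows. This is a minimal illustration, not Sentinel SCA's implementation: the policy rules, the `SIGNING_KEY`, and the function names `evaluate`, `append_decision`, and `verify_ledger` are all hypothetical, and an HMAC stands in for whatever signature scheme the real system uses.

```python
import hashlib
import hmac
import json

# Hypothetical key and genesis hash for illustration; not taken from Sentinel SCA.
SIGNING_KEY = b"demo-key-for-illustration-only"
GENESIS = "0" * 64


def evaluate(action: dict) -> str:
    """Toy policy (assumed rules): return ALLOW, REVIEW, or DENY."""
    if action.get("tool") == "delete_database":
        return "DENY"
    if action.get("amount", 0) > 1000:
        return "REVIEW"
    return "ALLOW"


def append_decision(ledger: list, action: dict) -> dict:
    """Evaluate an action, sign the decision, and chain it to the previous entry."""
    prev_hash = ledger[-1]["entry_hash"] if ledger else GENESIS
    body = {"action": action, "verdict": evaluate(action), "prev_hash": prev_hash}
    payload = json.dumps(body, sort_keys=True).encode()
    entry = dict(body)
    # HMAC-SHA256 is used here as a stand-in for a real digital signature.
    entry["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    entry["entry_hash"] = hashlib.sha256(payload).hexdigest()
    ledger.append(entry)
    return entry


def verify_ledger(ledger: list) -> bool:
    """Recompute every hash and signature; any edited entry breaks the chain."""
    prev_hash = GENESIS
    for entry in ledger:
        body = {k: entry[k] for k in ("action", "verdict", "prev_hash")}
        payload = json.dumps(body, sort_keys=True).encode()
        expected_sig = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
        if entry["prev_hash"] != prev_hash:
            return False
        if not hmac.compare_digest(entry["signature"], expected_sig):
            return False
        if entry["entry_hash"] != hashlib.sha256(payload).hexdigest():
            return False
        prev_hash = entry["entry_hash"]
    return True
```

Because each entry stores the hash of the one before it, changing a recorded verdict after the fact makes `verify_ledger` fail for that entry, which is the property "tamper-evident" refers to in this sketch.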

ai, ai-safety, machine-learning
Effective Altruism Forum

Published on June 19, 2026 4:17 PM GMT Here’s Holden Karnofsky: I tend to think it’s worse than 51/49. I tend to think we’re always going to be prone to overestimate how robustly good our actions are. And the more we learn about all the galaxy-brained considerations that one should have had in one’s head, the more it’s going to be like 50+ε%. I think AI safety is a great cause to work in. I’m ex…

ai, ai-safety
DEV Community

Claude Code is useful because it can actually do things. It can inspect a repo, follow instructions, run commands, and move work forward without turning every change into a copy-paste exercise. That is also where the security question starts. Once an agent can read files and execute actions, the real issue is not how clever it is, but what it can access and how much damage a bad input can do befo…

ai, ai-safety, machine-learning
Effective Altruism Forum

Published on June 18, 2026 2:52 PM GMT A map of who is doing what on AI safety in Latin America, scoped to catastrophic risk, and an argument for digesting the Northern frameworks rather than copying them. I used Claude Opus 4.7 (Anthropic) for brainstorming, understanding and discussing concepts, finding initiatives, and grammar and spelling corrections on text I had already written. It likely c…

ai, ai-safety
DEV Community

There is a quiet assumption running through most conversations about AI security: that the danger is coming, but it isn't here yet. That assumption is mostly right. What fewer people acknowledge is why. Today's AI agents are not safe because anyone made them safe. They are safe because they are not yet competent enough to be reliably dangerous. This is not a security posture. It is borrowed time…

ai, ai-safety
Effective Altruism Forum

Published on June 18, 2026 12:22 PM GMT TLDR: The field of AI safety is bottlenecked on talent. Running recruitment processes is expensive and time-consuming. Freelancers are overlooked. Hiring freelancers can provide a way to quickly and cheaply test a person's fit within an org, and vice versa. Plus, real work gets completed, and freelancers get both compensation and a portfolio piece. Last wee…

ai, ai-safety
DEV Community

FIFA Hack Authentication Flaw, Chrome Ad Blocker End, AI Supply Chain Security Today's Highlights Today's top security news covers a critical real-world authentication vulnerability, significant changes impacting browser privacy and ad blockers, and evolving national security concerns in the AI supply chain. I Could've Rickrolled the Entire FIFA World Cup. All I Needed Was My ID (Lobste.rs) Sourc…

ai, ai-safety
Effective Altruism Forum

Published on June 17, 2026 7:18 AM GMT TLDR: AI safety is confusing to navigate, because it is a pre-paradigmatic field composed of people making different, theoretical arguments for why x-risk is likely (or unlikely). Arguments that x-risk is likely are unfalsifiable and have little empirical evidence. This does not mean they’re wrong. Much of your probability of x-risk boils down to your priors…

ai, ai-safety
Effective Altruism Forum

Published on June 17, 2026 1:07 AM GMT Using computational methods to improve our preparedness via more robust and adaptive strategies in AI governance. A project proposal for a think tank, consultancy, or software. Overview Over the years, I’ve come across or come up with a number of project ideas in AI safety and governance that I find promising. My top list has fewer than ten, but in total ther…

ai, ai-ethics, ai-safety
PhilPapers: Recent additions to PhilArchive

This paper introduces Tanyuan, a native AI engineering paradigm that grounds mission alignment in formal information-theoretic axioms rather than ad-hoc ethical preferences. Starting from two irreducible fundamental postulates, we derive nine fully formalized core theorems, among which the Logical Desirelessness Theorem mathematically proves truth-seeking silicon agents possess no intrinsic incen…

ai, ai-safety, machine-learning
Google DeepMind News

Securing internal systems with an AI Control Roadmap, combining traditional safeguards and real-time monitoring.

ai, ai-safety
Effective Altruism Forum

Published on June 16, 2026 3:45 AM GMT How might neurotechnology impact AI safety for good or for ill? Looking forward to participating in the Australian AI Safety Forum 2026 at the University of Sydney on 7-8 July in order to consider this idea with others. We would love to hear from others here on the blog about the neural democratisation of AI hypothesis and how it might relate to safety (see below…

ai, ai-safety, neurotechnology
DEV Community

I ran my own AI chatbot plugin through a security review before release, and it came back with 35 bugs. Three were critical. The one that made my stomach drop was an HTML injection coming from unsanitized model output. I had spent all my worry on the input side: prompt injection, the path where a user types a malicious instruction. What actually bit me was the output. The model handed back a stri…
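The output-side issue described here can be illustrated with a short sketch. It assumes the model's reply is inserted into an HTML page as plain text; the function name `render_reply` and the `reply` CSS class are hypothetical, and the snippet does not say how the author actually fixed the bug. The only library call used is Python's standard `html.escape`.

```python
import html


def render_reply(model_output: str) -> str:
    """Treat model output as untrusted text and escape it before embedding in HTML."""
    # html.escape converts &, <, >, and (by default) both quote characters
    # into entities, so markup in the model's reply is displayed, not executed.
    return '<div class="reply">' + html.escape(model_output) + "</div>"
```

The design choice in this sketch is to handle model output the same way as user input: nothing the model returns reaches the page as markup unless it has been escaped or passed through an explicit allowlist.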

ai, ai-safety, cybersecurity
Effective Altruism Forum

Published on June 15, 2026 6:50 PM GMT TLDR: We may capture much or most of the available AI safety benefit by reserving expensive, specialized agents for the <1% of tasks that carry catastrophic risk. This would mean that AI safety work on high-cost but highly safe systems could be very useful. The standard objection to compute-heavy AI safety measures is competitive: any lab paying a large alig…

ai, ai-safety
EdTech Innovation Hub

The funding call is open to researchers worldwide and focuses on the risks that may emerge when large populations of AI agents interact across shared digital systems. Google DeepMind and partners have opened a $10 million funding call for research into the safety of interacting AI agent systems. Google DeepMind, Schmidt Sciences, the Cooperative AI Foundation, the Advanced Research and Invention …

ai, ai-safety, autonomous-systems
WitnessAI

A chatbot invents a refund policy. A dealership bot agrees to sell a car for a dollar. A pricing agent quietly drifts toward a competitor’s number. None of these started as security incidents. They started as AI features shipped faster than the controls around them. That’s the position most retailers are in right now. AI ... Read more » The post 7 risks of AI in retail: how to mitigate them appea…

ai, ai-safety
WitnessAI

In late December 2025, a single operator pointed Claude Code at 10 Mexican government agencies and a financial institution, walked out with 150 gigabytes of sensitive data, and watched Claude flag a SCADA interface as a high-value target on its own, without ever being asked to look for OT systems. The model scoped the engagement, ... Read more » The post What are Claude AI security risks? appeare…

ai, ai-safety
research.io
