Curated database of papers, blog posts, reports, and talks on AI safety — linked to the 17 risk vectors we track. Community submissions welcome.
Artificial intelligence-enabled deception detection is an emerging tool for identifying dishonest behavior in a wide range of applications, from security and forensics to politics and lower-risk every...
Purpose | In this paper, the phenomenon of AI washing – a deceptive form of market communication – is explored. In particular, the research aims to answer the question: What are the main factors foste...
Deception, a widespread aspect of human behavior, has significant implications in fields like law enforcement, security, judicial proceedings, and social areas. Detecting deception accurately, especia...
The swift modernization of conventional power grids into smart grids has substantially increased their attack surface, making them vulnerable to advanced cyber threats. These cyberattacks can jeopardi...
Amid the rapidly evolving landscape of artificial intelligence (AI) regulation, a significant concern has emerged regarding the predominant focus on preemptive measures aimed at preventing or mitiga...
This paper provides an outline analysis of the evolving governance framework for artificial intelligence (AI) in Singapore. Across the Singapore government, AI solutions are being adopted in line wi...
Note: you are ineligible to complete this challenge if you’ve studied Ancient or Modern Greek, or if you natively speak Modern Greek, or if for other reasons you know what mistakes I’m claiming...
The governance of artificial intelligence has a blind spot: the machine identities that AI systems use to act. AI agents, service accounts, API tokens, and automated workflows now outnumber human iden...
Modern Transformer-based language models achieve strong performance in natural language processing tasks, yet their latent semantic spaces remain largely uninterpretable black boxes. This paper introd...
In this post, I'll go through some of my best guesses for the current situation in AI as of the start of April 2026. You can think of this as a scenario forecast (https://ai-2027.com/),...
This study examines the perception of legal professionals on the governance of AI in developing countries, using Nigeria as a case study. The study focused on ethical risks, regulatory gaps, and insti...
Interpreting the information encoded in model weights remains a fundamental challenge in mechanistic interpretability. In this work, we introduce ROTATE (Rotation-Optimized Token Alignment in weighT s...
Retail supply chain operations in supermarket chains involve continuous, high-volume manual workflows spanning demand forecasting, procurement, supplier coordination, and inventory replenishment, proc...
We present Arch (AI-native Register-transfer Clocked Hardware), a hardware description language designed from first principles for micro-architecture specification and AI-assisted code generation. Arc...
The increasing use of large language models in mental health applications calls for principled evaluation frameworks that assess alignment with psychotherapeutic best practices beyond surface-level fl...
Robotic and embodied-AI systems have the potential to improve accessibility and quality of care in clinical settings, but their deployment in close physical contact with vulnerable patients introduces...
Web applications rely heavily on hyperlinks to connect disparate information resources. However, the dynamic nature of the web leads to link rot, where targets become unavailable, and more insidiously...
Safety and assurance cases risk becoming detached from the understanding needed for responsible engineering and governance decisions. More broadly, the production and evaluation of critical socio-tech...
TLDR: The first in a planned series of three or more papers, which constitute the first major in-road in the https://www.alignmentforum.org/posts/ZwshvqiqC...
Vision-Language-Action (VLA) models have achieved remarkable success in robotic manipulation. However, their robustness to linguistic nuances remains a critical, under-explored safety concern, posing ...
Large language models (LLMs) are increasingly used as automated evaluators (LLM-as-a-Judge). This work challenges the reliability of that paradigm by showing that trust judgments by LLMs are biased by disclosed source...