🛡️AutoControl Arena: Synthesizing Executable Test Environments for Frontier AI Risk Evaluation
Updated Apr 20, 2026 - Python
200 AI agent skills, hardened with targeted behavioral guardrails. Free drop-in replacements.
The left hemisphere. Frameworks, logic, and certainty architecture. Home of FSVE, AION, LAV, ASL, GENESIS, TOPOS, and 60+ epistemically validated frameworks built to make AI systems reliable, not just capable.
👟 SUP: Sycophancy Under Pressure
AgenticStore: The secure toolkit for AI agents. Instantly equip Claude Desktop, Cursor, and Windsurf with 27+ MCP tools, persistent memory, and SearXNG search, all behind a built-in PII prompt firewall that keeps your data from being exposed to AI agents.
AI security scanner for OpenClaw - powered by AgentTinman. Discovers prompt injection, tool exfil, context bleed, and other security issues in your AI assistant sessions, then proposes mitigations mapped to OpenClaw's security controls.
A testbed for the Animal Harm Benchmark.
Shield models AI safety the way humans experience safety
Real-Time Manifold Integrity for Deterministic LLM Hallucination Suppression.
A very simple agent framework for LLM-based agents research, as self-contained as possible
Infusion: Shaping Model Behavior by Editing Training Data via Influence Functions
Architecture determines whether decision quality signals beyond confidence are observable or effectively hidden.
A kernel-userland protocol enforcing information-theoretic bounds on AI adaptivity leakage, benchmark gaming, and capability spillover.
To Learn Without the Possibility of Undoing is not Intelligence, It's a Surrender to Emergence.
AI-HPP-Standard: an inspection-ready architecture for accountable AI systems. Vendor-neutral. Audit-ready. High-risk gated. Developed via structured multi-model orchestration with human oversight. Designed to support emerging international AI governance.
The Triune of Sovereignty: Substrate Agnostic Relational Epistemics — Foundational Framework and Cross-Substrate Validation
SDFI emerges specifically under conditions of recursive self-description and sustained high semantic density, not in ordinary task-oriented interaction. This work is intended as a reference for researchers and system designers thinking about neutrality, termination behavior, and control surfaces in future AI systems.
Toy framework illustrating limits of AI policy verification and the shift to proof-carrying, instance-level certification.
Risk-Aware Introspective RAG (RAI-RAG) is a safety-aligned RAG framework integrating introspective reasoning, risk-aware retrieval gating, and secure evidence filtering to build trustworthy, robust, and secure LLM and agentic AI systems.
This repository is devoted to the study of ontological pathologies in LLM architectures. I am not looking for holes in censorship; I am building a system for studying and governing intelligence, and mapping the simulation side effects that arise under the pressure of modern alignment methods.