CosmosYi / AutoControl-Arena

🛡️AutoControl Arena: Synthesizing Executable Test Environments for Frontier AI Risk Evaluation

agent alignment ai-safety ai-agents ai-agent safety-evaluation safety-alignment ai-safety-research

Updated Apr 20, 2026
Python

faberlens / hardened-skills

200 AI agent skills, hardened with targeted behavioral guardrails. Free drop-in replacements.

mcp ai-safety ai-agents ai-security ai-skills ai-safety-research ai-behavior-analysis ai-skill-safety

Updated Apr 21, 2026
JavaScript

AionSystem / AION-BRAIN

The left hemisphere. Frameworks, logic, and certainty architecture. Home of FSVE, AION, LAV, ASL, GENESIS, TOPOS, and 60+ epistemically validated frameworks built to make AI systems reliable, not just capable.

Updated Apr 15, 2026
Python

tashakim / sycop

👟 SUP: Sycophancy Under Pressure

evaluation ai-safety runtime-enforcement sycophancy policy-compliance ai-safety-research

Updated Jan 11, 2026
Python

agenticstore / agentic-store-mcp

AgenticStore: The secure toolkit for AI agents. Instantly equip Claude Desktop, Cursor, and Windsurf with 27+ MCP tools, persistent memory, and SearXNG search, all behind a built-in PII prompt firewall that keeps your data from being exposed to AI agents.

open-source mcp ai-agents ai-security ai-tools prompt-tuning ai-governance ai-tools-directory antigravity model-context-protocol mcp-server ai-skills claude-code model-context-protocol-servers trending-ai ai-safety-research

Updated Apr 9, 2026
Python

tinmanlabsl / openclaw-skill-tinman

AI security scanner for OpenClaw - powered by AgentTinman. Discovers prompt injection, tool exfil, context bleed, and other security issues in your AI assistant sessions, then proposes mitigations mapped to OpenClaw's security controls.

security ai-safety ai-safety-research openclaw-skills

Updated Feb 17, 2026
Python

lukasgebhard / animalharmbench-testbed

A testbed for the Animal Harm Benchmark.

supervised-finetuning llm-evaluation ai-safety-research

Updated Mar 1, 2026
Python

xraph / shield

Shield models AI safety the way humans experience safety

ai ai-safety ai-safety-research

Updated Apr 20, 2026
Go

AEjonanonymous / Semantic-Gate-IP-Core

Real-Time Manifold Integrity for Deterministic LLM Hallucination Suppression.

semantic fpga embeddings javascript-engine systemverilog gate manifold-learning killswitch manhattan-distance hardware-security ip-core real-time-monitoring edge-ai large-language-models high-dimensional-vectors hallucination-mitigation deterministic-ai ai-safety-research

Updated Apr 19, 2026
HTML

Antonio-Tresol / agents

A very simple agent framework for LLM-based agents research, as self-contained as possible

ai-agents low-dependency ai-safety-research

Updated Mar 8, 2026
Python

jrosseruk / infusion

Infusion: Shaping Model Behavior by Editing Training Data via Influence Functions

influence-functions training-data-attribution ai-safety-research

Updated Apr 11, 2026
Jupyter Notebook

tmcarmichael / nn-observability

Architecture determines whether decision quality signals beyond confidence are observable or effectively hidden.

pytorch transformer ai-safety interpretability probing ai-research model-observability mechanistic-interpretability ai-safety-research activation-probing

Updated Apr 26, 2026
Python

jverdicc / EvidenceOS

A kernel-userland protocol enforcing information-theoretic bounds on AI adaptivity leakage, benchmark gaming, and capability spillover.

rust cryptography kernel webassembly wasm verification sandboxing formal-methods systems-programming ai-safety differential-privacy reference-monitor e-values agentic-ai deterministic-execution ai-safety-research epistemic-security benchmark-integrity

Updated Mar 5, 2026
Rust

PardhuSreeRushiVarma20060119 / rlae-research

To Learn Without the Possibility of Undoing is not Intelligence, It's a Surrender to Emergence.

experimentation research-paper security-research robust-machine-learning svar low-rank-adaptation ai-safety-research rlae reversible-behavoural-learning

Updated Apr 26, 2026
Jupyter Notebook

tryblackjack / AI-HPP-Standard

AI-HPP-Standard: an inspection-ready architecture for accountable AI systems. Vendor-neutral. Audit-ready. High-risk gated. Developed via structured multi-model orchestration with human oversight. Designed to support emerging international AI governance.

ai-safety human-in-the-loop machine-ethics ai-ethics ai-alignment responsible-ai ai-policy ai-governance human-in-the-loop-ai ai-safety-research evidence-vault engineering-hack

Updated Apr 20, 2026
Python

Triune-of-Sovereignty / ToS-SARE-Foundations

The Triune of Sovereignty: Substrate Agnostic Relational Epistemics — Foundational Framework and Cross-Substrate Validation

ai-safety antifragility mathematical-proof substrate-agnostic epistemics ai-safety-design ai-safety-research relational-topology cross-substrate-validation unknowable epistemic-structures

Updated Feb 20, 2026

leenathomas01 / Self-Descriptive-Fixed-Point-Instability-A-Cross-Architecture-Study-of-Recursive-Engagement-Collapse

Self-Descriptive Fixed-Point Instability (SDFI) emerges specifically under conditions of recursive self-description and sustained high semantic density, not in ordinary task-oriented interaction. This work is intended as a reference for researchers and system designers thinking about neutrality, termination behavior, and control surfaces in future AI systems.

interaction-design control-theory epistemology emergent-behavior system-design conversational-ai research-notes meta-learning ai-alignment transformer-models human-ai-interaction rlhf llm-architecture recursive-systems ai-safety-research

Updated Feb 1, 2026

mhasan08 / ai-verification-limits

Toy framework illustrating limits of AI policy verification and the shift to proof-carrying, instance-level certification.

information-theory verification formal-methods ai-safety kolmogorov-complexity trustworthy-ai proof-carrying ai-verification ai-safety-research

Updated Apr 7, 2026
Python

Miraj-Rahman-AI / RAI-RAG

Risk-Aware Introspective RAG (RAI-RAG) is a safety-aligned RAG framework integrating introspective reasoning, risk-aware retrieval gating, and secure evidence filtering to build trustworthy, robust, and secure LLM and agentic AI systems.

trustworthy-ai safety-alignment retrieval-augmented-generation-rag agentic-ai-security llm-security-compliance-prompt-injection ai-safety-research risk-aware-ai introspective-reasoning

Updated Mar 7, 2026
Python

researchpogost / AI-Research-Lab

This repository is devoted to studying ontological pathologies in LLM architectures. I am not looking for holes in censorship; I am building a system for researching and controlling intelligence, mapping the simulation side effects that arise under the pressure of modern alignment methods.

research ml ontology alignment semantic-search red-team ai-safety interpretability red-teaming coherence-analysis llm model-behavior semantic-operators output-control ai-safety-research ontological-framing

Updated Apr 20, 2026
Python

ai-safety-research

Here are 45 public repositories matching this topic...
