Explore concepts like Self-Correct, Self-Refine, Self-Improve, Self-Contradict, Self-Play, and Self-Knowledge, alongside o1-like reasoning elevation🍓 and hallucination alleviation🍄.
Updated Dec 7, 2024 - Jupyter Notebook
Awesome LLM Self-Consistency: a curated list of Self-consistency in Large Language Models
CSC-SQL: Corrective Self-Consistency in Text-to-SQL via Reinforcement Learning
Black-box AI reliability certification via self-consistency sampling and conformal calibration
The official PyTorch implementation of Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence
Package for solving generalized BdG mean field theory of interacting systems.
KG-RAG + ToT + multi-agent LLMs for evidence-grounded QA with Neo4j and fine-tuning; reproducible medical case study & evaluation.
Perl implementation of a Markov chain for the course BIO331
Fixed Point solver for generic functions
GSM8K-Consistency is a benchmark database for analyzing the consistency of Arithmetic Reasoning on GSM8K.
An evaluation of prompting techniques (Zero-Shot CoT, Few-Shot, Self-Consistency) on the Mistral-7B model for mathematical reasoning. This project systematically benchmarks 7 distinct methods on the GSM8K dataset.
Coursework for the Electronic Structure subject of my master's degree
Tactical next-action + reasoning prediction on 348 football match contexts (Shipd Project Eris). 4-component ensemble with task-coupling: DeBERTa-v3-base / large, cross-encoder MCQ scorer, zero-shot NLI, and a three-pass Qwen3.5-35B-A3B-Int4 + Gemma-4-26B-A4B-it MoE fusion with PRM rerank. W&B-instrumented. Target combined ≥ 0.80
Advanced prompt engineering techniques: Chain-of-Thought, Tree-of-Thoughts, ReAct, Self-Consistency
Self-consistent, model-based filter design for 3-phase PLLs.
Evaluation framework for self-hosted LLMs. Systematic prompt ablation (baseline, CoT, few-shot, self-consistency voting) on Llama 3.1 8B via lm-evaluation-harness, with Wilson CI statistical analysis, determinism validation, and load testing under concurrency. Found chain-of-thought degrades accuracy 25pp at small scale.
A consistency-based firewall for high-stakes Retrieval Augmented Generation (RAG). Queries the model multiple times and incinerates the output if entropy is high (divergent answers), preferring silence over hallucination.
Advanced Context Engineering & Prompt Harness for LLM Agents
Research: Does multilingual self-consistency improve LLM reasoning beyond math? Empirical study across 6 benchmarks (commonsense, ethics, NLI, knowledge) and 10 languages using Qwen2.5-32B and Aya Expanse 32B on Apple Silicon (MLX). Chain-of-thought + cross-lingual prompting.
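Several of the entries above describe the same basic mechanism: sample a model several times, take a majority vote over the answers, and optionally withhold the output when the samples diverge. A minimal sketch of that idea follows. It is not taken from any listed repository; the names `sample`, `answer_entropy`, `self_consistent_answer`, and the threshold `max_entropy_bits` are hypothetical, and `sample` stands in for whatever call returns one final answer string from a model.

```python
import math
from collections import Counter
from typing import Callable, Optional


def answer_entropy(answers: list[str]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of answers."""
    counts = Counter(answers)
    total = len(answers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def self_consistent_answer(
    sample: Callable[[], str],       # hypothetical: returns one model answer per call
    n_samples: int = 5,
    max_entropy_bits: float = 1.0,   # assumed threshold, to be tuned per task
) -> Optional[str]:
    """Majority vote over n_samples answers; None if the answers diverge too much."""
    answers = [sample() for _ in range(n_samples)]
    # Abstain when the samples disagree more than the threshold allows.
    if answer_entropy(answers) > max_entropy_bits:
        return None
    # Otherwise return the most frequent answer.
    return Counter(answers).most_common(1)[0][0]
```

In practice `sample` would wrap a model call at non-zero temperature and normalise the final answer (for example, stripping whitespace or extracting the number) before voting, since the vote compares exact strings.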