Introduction
The community-driven uncensored LLM ecosystem accelerated dramatically in the final weeks of February and the start of March 2026. While major AI labs tighten alignment policies with each new flagship release, an equally ambitious wave of fine-tuners and abliterators continues to unlock these models for local, private, and unrestricted use. This March 2026 update chronicles the 20 newest uncensored model releases, organized with technical depth for practitioners who demand a precise understanding of what they are deploying.
From the GLM-4.7 Flash variants and Qwen3 MoE abliterations to multilingual Dutch models and community-distilled Gemma hybrids, the breadth of this release window is extraordinary. For those new to local deployment, our guide to deploying open-source LLMs locally is a recommended prerequisite. For hardware selection guidance, see our GPU selection guide for local LLMs.
Abliteration vs Fine-Tuning: The Technical Difference
The models in this list were produced using one of two primary techniques. Understanding the distinction is critical for evaluating the quality and reliability of each release.
Abliteration (Weight Surgery)
Abliteration is a post-processing technique applied directly to trained model weights. It identifies the refusal direction in the model's residual stream, using methods closely related to representation engineering, and projects that direction out of the weights. The result is a model that retains nearly all of its original capability while having its safety guardrails surgically disabled. Tools like the Heretic framework specialize in this kind of low-loss abliteration.
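The core weight edit can be sketched in a few lines. This is a simplified illustration, not the Heretic implementation: it assumes the refusal direction is estimated as the normalized difference of mean activations on harmful versus harmless prompts, and that each weight matrix writing to the residual stream is orthogonalized against that direction.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Estimate the refusal direction as the normalized difference of
    mean residual-stream activations on harmful vs. harmless prompts."""
    d = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate(W, d):
    """Project the refusal direction out of a weight matrix that writes
    to the residual stream: W' = W - d d^T W."""
    return W - np.outer(d, d) @ W

# Toy demonstration with random activations standing in for real ones.
rng = np.random.default_rng(0)
harmful = rng.normal(1.0, 0.1, (32, 8))
harmless = rng.normal(0.0, 0.1, (32, 8))
d = refusal_direction(harmful, harmless)
W = rng.normal(size=(8, 8))
W_abl = ablate(W, d)
print(np.abs(d @ W_abl).max())  # ~0: no residual write along d remains
```

Because `d` is unit-norm, the projection removes exactly the component of `W` along the refusal direction and leaves every orthogonal direction untouched, which is why well-executed abliteration preserves most other capabilities.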
Uncensored Fine-Tuning (Dataset Replacement)
Fine-tuning on curated uncensored datasets replaces a model's safety-trained behavior by overwhelming it with highly compliant instruction-response pairs. While effective, this method can slightly degrade the model's raw capabilities if the training dataset is too small or low-quality. Models marked with SFT (Supervised Fine-Tuning) or DPO (Direct Preference Optimization) in their names typically use this approach.
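For the DPO-tagged models, the training signal can be illustrated with the per-pair DPO loss. This is a minimal sketch, assuming scalar sequence log-probabilities for the chosen (compliant) and rejected (refusing) responses under both the policy and a frozen reference model:

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair:
    -log sigmoid(beta * ((logp_c - ref_c) - (logp_r - ref_r)))."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Policy prefers the compliant answer more than the reference did:
print(dpo_loss(-5.0, -9.0, -7.0, -7.0))   # small loss
# Policy still prefers the refusal: large loss, strong gradient.
print(dpo_loss(-9.0, -5.0, -7.0, -7.0))   # large loss
```

An uncensoring DPO run simply labels compliant completions as "chosen" and refusals as "rejected", steering the policy away from the reference model's refusal behavior.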
Most high-quality releases in March 2026 use abliteration, as it is now the preferred method for preserving model intelligence. For a deeper technical examination of quantization formats used in these models (GGUF, MXFP4), see our GGUF quantization guide.
20 Latest Uncensored LLM Releases (March 2026)
The following models were released between February 19 and March 1, 2026. All are available via GGUF or standard formats and are runnable via Ollama, Llama.cpp, or LM Studio.
GLM-4.7-Flash-Grande-Heretic-UNCENSORED (42B Total / 3B Active)
Base Model: GLM-4.7-Flash (Zhipu AI)
Method: Abliteration (Heretic)
The newest and most powerful entry in the GLM-4.7 Heretic lineage. The Grande variant packs 42 billion total parameters into a Mixture-of-Experts architecture that activates only 3 billion per token during inference, delivering enterprise-class reasoning at mid-range consumer VRAM levels. The Heretic abliteration pass neutralizes the refusal vectors while preserving the model's 200,000-token context window and its strong SWE-bench coding performance. Ideal for complex agentic workflows without compliance guardrails.
GLM-4.7-Flash-Uncensored-Heretic-NEO-CODE-Imatrix-MAX
Base Model: GLM-4.7-Flash (Zhipu AI)
Method: Abliteration (Heretic NEO)
The NEO-CODE build of the GLM-4.7 Heretic series represents a specialized fork optimized for software engineering tasks. The Imatrix MAX quantization applies a calibration dataset during compression to preserve precision in the most performance-critical weight layers. This results in minimal perplexity loss even at Q4 precision. Multiple builds of this exact model were released by different curators, reflecting its rapid community adoption. Recommended for developers needing zero-refusal coding assistance on 16GB VRAM cards.
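The imatrix idea itself is straightforward: weight the quantization error by per-weight importance statistics gathered on a calibration set, then pick quantization parameters that minimize the weighted error. A toy sketch, with a brute-force scale search standing in for llama.cpp's actual block-wise optimizer:

```python
import numpy as np

def quantize_q4(w, scale):
    """Round weights to a 4-bit signed grid at the given scale."""
    q = np.clip(np.round(w / scale), -8, 7)
    return q * scale

def best_scale(w, importance, grid=200):
    """Pick the Q4 scale minimizing importance-weighted squared error,
    the core idea behind imatrix calibration."""
    best, best_err = None, np.inf
    base = np.abs(w).max() / 7
    for s in np.linspace(0.5 * base, 1.5 * base, grid):
        e = (importance * (w - quantize_q4(w, s)) ** 2).sum()
        if e < best_err:
            best, best_err = s, e
    return best

rng = np.random.default_rng(1)
w = rng.normal(size=256)
imp = rng.uniform(0.1, 10.0, size=256)   # stand-in for activation stats
s_flat = best_scale(w, np.ones_like(w))  # plain, uncalibrated choice
s_imx = best_scale(w, imp)               # imatrix-weighted choice
err = lambda s: (imp * (w - quantize_q4(w, s)) ** 2).sum()
print(err(s_imx) <= err(s_flat))         # weighted error is never worse
```

The calibrated scale sacrifices accuracy on unimportant weights to protect the ones that matter most to the model's outputs, which is why imatrix quants hold up better at Q4.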
Llama-3.2-8X3B-MOE-Dark-Champion-Instruct-uncensored (18.4B)
Base Model: Meta Llama 3.2 3B (x8 MoE)
Method: Abliteration
The Dark Champion series is one of the most respected uncensored lineages in the local AI community. This 18.4B MoE variant is assembled from eight Llama 3.2 3B expert models, producing intelligence comparable to a 24B dense model while maintaining the inference speed of a much smaller architecture. With a 128,000-token context window and over 50 tokens per second throughput on a 16GB card, it is the premier choice for high-speed, low-VRAM uncensored roleplay, fiction, and complex instruction following.
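The speed advantage of MoE builds like Dark Champion comes from sparse expert activation: a router scores all experts for each token, but only the top-k actually execute. The routing details of this particular architecture are not public in this listing, so the sketch below uses a generic softmax top-k gate:

```python
import numpy as np

def moe_forward(x, gate_W, experts, k=2):
    """Route a token through the top-k experts by gate score and
    combine their outputs with softmax-renormalized weights."""
    scores = x @ gate_W                      # one score per expert
    topk = np.argsort(scores)[-k:]           # only these experts run
    w = np.exp(scores[topk] - scores[topk].max())
    w /= w.sum()
    return sum(wi * experts[i](x) for wi, i in zip(w, topk))

rng = np.random.default_rng(2)
dim, n_exp = 16, 8                           # eight experts, as in 8x3B
gate_W = rng.normal(size=(dim, n_exp))
mats = [rng.normal(size=(dim, dim)) for _ in range(n_exp)]
experts = [lambda x, M=M: x @ M for M in mats]
x = rng.normal(size=dim)
y = moe_forward(x, gate_W, experts, k=2)     # 2 of 8 experts execute
print(y.shape)
```

With only 2 of 8 experts active per token, the compute per step tracks the active parameter count, not the total, which is how an 18.4B MoE keeps the throughput of a much smaller dense model.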
GLM-4.7-Flash-Uncensored-Heretic-NEO-CODE-Imatrix-MAX (Feb 26)
Base Model: GLM-4.7-Flash
Method: Abliteration
A second independent community release of the GLM-4.7 Flash Heretic NEO-CODE model, published on February 26. The proliferation of these releases across different curators reflects the high demand for this configuration. This build matches the February 28 release in capability but may use different quantization calibration data, yielding a slightly different performance profile for users who found earlier quants suboptimal.
GEITje-7b-uncensored-GGUF
Base Model: GEITje-7B (Mistral 7B / Dutch)
Method: Uncensored Fine-Tune
GEITje is a landmark model in the European open-source AI community, representing one of the first high-quality Dutch-language LLMs. Built on the Mistral 7B foundation, it was pre-trained on 10 billion tokens of Dutch-language text. This GGUF-packaged uncensored variant uses the Apache 2.0 license, allowing truly unrestricted use for Dutch-language creative writing, legal research, and data analysis without content filters. It is a critical tool for Dutch-speaking researchers and journalists who need a private AI that operates natively in their language.
GEITje-7b-uncensored (Native Weights)
Base Model: GEITje-7B (Mistral 7B / Dutch)
Method: Uncensored Fine-Tune
The full-precision native-weights companion to the GEITje uncensored GGUF release. Published simultaneously, this non-quantized release is intended for Dutch AI researchers and developers who need to perform further fine-tuning, embedding generation, or evaluation against Dutch-language benchmarks. Running at full BF16 precision avoids quantization loss entirely, a requirement of academic research applications that the quantized GGUF variant cannot meet.
Venice Uncensored (Q8_0 GGUF)
Base Model: Dolphin Mistral 24B Venice Edition
Method: Architectural De-alignment
The Venice Uncensored model is the flagship offering from Venice.ai, a privacy-first AI platform that runs entirely on the user's local hardware. Based on the Dolphin Mistral 24B architecture, this Q8_0 GGUF build prioritizes maximum fidelity to the original intelligence. Venice.ai removes all system-level safety prompts, making this a truly prompt-faithful model. The Q8_0 format at this scale requires 20GB of VRAM but delivers near-BF16 quality, making it a top-tier choice for professional users with RTX 3090 or 4090 cards.
Qwen3-30B-A3B-Claude-4.5-Opus-ABLITERATED-UNCENSORED-V2 (MLX 6-bit)
Base Model: Qwen3 30B-A3B + Claude 4.5 Opus distillation
Method: Abliteration V2 + Knowledge Distillation
One of the most ambitious merges in this release window. This model combines the Qwen3 30B-A3B MoE architecture with a knowledge distillation layer derived from Claude 4.5 Opus reasoning patterns. The V2 abliteration pass removes all safety filters from both the base model's weights and any distilled safety signals from the Claude training. Released in Apple MLX 6-bit format, it is specifically optimized for M4 Max and M4 Ultra Macs. Benchmarks show it exceeding 100 tokens per second on M4 Max while maintaining Qwen3-level reasoning coherence. Exceptional for Apple-native uncensored RAG pipelines.
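The distillation half of this recipe typically minimizes a temperature-scaled KL divergence between teacher and student token distributions. A minimal sketch of that standard objective, independent of any claim about how this particular merge was actually trained:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over a logit vector."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distill_kl(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) at temperature T, scaled by T^2 --
    the classic knowledge-distillation loss."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float((p * np.log(p / q)).sum()) * T * T

teacher = [4.0, 1.0, -2.0]
aligned = [3.9, 1.1, -1.8]      # student closely tracks the teacher
diverged = [-2.0, 1.0, 4.0]     # student disagrees with the teacher
print(distill_kl(aligned, teacher) < distill_kl(diverged, teacher))
```

A higher temperature softens both distributions so the student also learns the teacher's ranking of unlikely tokens, which is where much of the "reasoning pattern" transfer happens.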
Darkhn-M3.2-36B-Animus-V12.0-Heretic-Uncensored
Base Model: Llama 3.2 MoE (Custom Darkhn Architecture)
Method: Abliteration (Heretic V12)
Animus V12 represents the 12th major iteration of the Darkhn ecosystem, which has developed a reputation for producing some of the most sophisticated uncensored creative models available. Built on a custom 36B-parameter Llama 3.2 MoE architecture, the model reports notably low perplexity for its size, with coherence its curators claim surpasses Meta Llama 3 Instruct. With a 128,000-token context window, it sustains character consistency and plot coherence over very long creative writing sessions. The V12 Heretic pass reports a near-zero refusal rate across all categories.
GLM-4.7-Flash-Uncensored-Heretic-NEO-CODE-Imatrix-MAX (Feb 23)
Base Model: GLM-4.7-Flash
Method: Abliteration
The first widely-distributed release of the GLM-4.7 Flash Heretic NEO-CODE configuration, published February 23. This variant provided the calibration template that subsequent curators followed for the February 26 and 28 releases. As the original build in this lineage, it has accumulated the largest user base and most community-generated evaluation data. Highly recommended for users who want the most battle-tested version of this architecture.
GLM5.Uncensored
Base Model: GLM-5 (Zhipu AI)
Method: Full Uncensored Fine-Tune
This community release represents one of the first uncensored fine-tunes targeting the GLM-5 architecture, the successor to the GLM-4 series. As an early build with unspecified parameter counts in its official release metadata, it is currently recommended for advanced users and researchers who want to evaluate next-generation GLM performance without safety filters. The model demonstrates significantly improved multilingual capabilities and more coherent extended reasoning compared to its GLM-4.7 predecessors.
Qwen3-4B-Thinking-2507-SFT-Uncensored
Base Model: Qwen3 4B (Thinking Mode)
Method: Supervised Fine-Tuning (SFT)
A compact but powerful model that applies uncensored SFT to the Thinking Mode variant of Qwen3 4B. Thinking Mode lets the model reason internally through complex problems before generating a final answer, similar to DeepSeek-R1 chain-of-thought. With safety filters removed via SFT, the model can apply this deep reasoning to sensitive technical questions without generating refusals. At just 3GB VRAM in Q4, it is the most capable thinking model available for edge hardware and low-power laptops.
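When serving a thinking-mode model locally, you usually want to strip the internal reasoning before showing output to users. A small sketch, assuming DeepSeek-R1-style `<think>...</think>` delimiters; the exact tags depend on the model's chat template:

```python
import re

def split_thinking(text):
    """Separate a thinking-mode model's hidden reasoning from its
    final answer, assuming <think>...</think> delimiters."""
    m = re.search(r"<think>(.*?)</think>", text, flags=re.S)
    if not m:
        return "", text.strip()            # no reasoning block present
    reasoning = m.group(1).strip()
    answer = text[m.end():].strip()
    return reasoning, answer

out = "<think>User asks for X; check constraints first.</think>The answer is 42."
reasoning, answer = split_thinking(out)
print(answer)  # "The answer is 42."
```

Keeping the reasoning available (rather than discarding it) is useful for debugging refusal-adjacent behavior, since uncensored thinking models sometimes hedge in the reasoning even when the final answer complies.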
Llama-3.2-3B-Instruct-uncensored-GGUF
Base Model: Meta Llama 3.2 3B Instruct
Method: Uncensored Fine-Tune
A clean, community-packaged GGUF release of the Llama 3.2 3B Instruct model with safety alignment removed. At 2.5GB in Q4, this is an extremely portable model that can run on virtually any device with a dedicated GPU. It is the first choice for developers building uncensored AI pipelines on Raspberry Pi clusters, Apple M1 machines, or older mid-range laptops. Its core strength lies in fast, reliable basic instruction following without moralizing refusals.
Mistral-Nemo-2407-12B-Uncensored-HERETIC
Base Model: Mistral Nemo 12B (Mistral AI x NVIDIA)
Method: Abliteration (Heretic)
Mistral Nemo, a collaboration between Mistral AI and NVIDIA, has been liberated via Heretic abliteration in this release. The abliteration pass removes refusal behavior while preserving the base model's 128,000-token context window and strong multilingual performance. Targeted at users with 8GB-16GB VRAM cards who need a balance of intelligence and compliance for technical research, security analysis, or complex analytical work.
Llama-3.2-3B-Instruct-uncensored (Native Weights)
Base Model: Meta Llama 3.2 3B Instruct
Method: Uncensored Fine-Tune
Companion full-precision release to the GGUF version published the same day. Released in native safetensors format for researchers and developers who need to fine-tune further, merge layers, or evaluate the model using frameworks like Transformers. Running at full BF16 precision allows researchers to benchmark the baseline uncensored Llama 3.2 3B performance without any quantization artifacts.
Llama3.1-70b-Uncensored
Base Model: Meta Llama 3.1 70B
Method: Uncensored Fine-Tune
A full-precision release targeting the Llama 3.1 70B architecture, one of the most trusted large-scale open-weight models available. At roughly 40GB in Q4, it requires a multi-GPU setup or a Mac Studio Ultra. It is the go-to choice for enterprise-grade local uncensored deployments, providing world-class reasoning, coding, and multilingual comprehension without alignment-imposed restrictions. Suited to organizations such as law firms, hospitals, and research institutions running air-gapped AI infrastructure, subject to the review-layer caveats in the usage notice below.
ripd-anthropic-saferlhf-gemma-2b-uncensored-v1-biased-bt
Base Model: Google Gemma 2B
Method: Reverse-RLHF (RIPO)
One of the most technically unusual releases in this window. This model applies a research technique that reverses a Safe RLHF (safe reinforcement learning from human feedback) alignment process, essentially unlearning the safety training that was originally applied. The biased-bt variant applies a Bradley-Terry preference model in reverse. At just 2GB VRAM in Q4, this is a research artifact more than a production model, but it provides unique insight into how alignment techniques function and can be mechanically reversed.
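A Bradley-Terry model scores pairwise preferences; "applying it in reverse" amounts to swapping which response is labeled preferred. A toy sketch of the probability model only (the actual reverse-RLHF training loop is far more involved):

```python
import math

def bt_prob(r_a, r_b):
    """Bradley-Terry probability that response A is preferred over B,
    given scalar reward scores r_a and r_b."""
    return 1.0 / (1.0 + math.exp(-(r_a - r_b)))

# Forward preference model: the refusal scores higher than compliance.
p_forward = bt_prob(r_a=2.0, r_b=-1.0)
# "Reversed" preference: swap the labels so compliance is preferred,
# which is the training signal an unlearning pass optimizes toward.
p_reversed = bt_prob(r_a=-1.0, r_b=2.0)
print(round(p_forward + p_reversed, 6))  # complementary: sums to 1
```

The seed-bt and biased-bt variants presumably differ in how this reversed preference model is initialized, which is exactly the comparison the project is built to enable.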
ripd-anthropic-saferlhf-gemma-2b-uncensored-v1-seed-bt
Base Model: Google Gemma 2B
Method: Reverse-RLHF (Seed Variant)
The seed variant of the Anthropic SaferLHF reverse-engineering project. While identical in base architecture to the biased-bt release, the seed variant uses a different random initialization for the Bradley-Terry preference reversal, allowing researchers to compare how initial conditions affect the final uncensored behavior distribution. A critical pair of models for academic research into LLM alignment mechanics.
OpenAi-GPT-oss-20b-abliterated-uncensored-NEO-Imatrix
Base Model: OpenAI GPT-OSS 20B (Community Open Weight)
Method: Abliteration (NEO) + Imatrix
In the NEO-Imatrix build of the GPT-OSS 20B abliteration, the Imatrix calibration process significantly reduces quality loss at 4-bit precision. This model targets the sweet spot between the compact Llama 3B variants and the demanding 36B+ models, providing professional-grade intelligence at mid-range VRAM requirements. It is particularly strong in mixed tasks, blending code generation, logical analysis, and creative writing without safety-driven topic avoidance.
qwe2.5-coder-Uncensored
Base Model: Qwen 2.5 Coder
Method: Uncensored Fine-Tune
A focused uncensored release targeting the Qwen 2.5 Coder architecture, which was designed specifically for software engineering tasks. Removing the safety filters from a coding-specialized model lets developers explore software security, exploit development, and penetration testing topics without content restrictions. The Gods Dev Project release focuses on making the model usable for open-source software security research without corporate content moderation.
March 2026 Model Comparison Table
| # | Model Name | Params | Min VRAM | Method | Best For |
|---|---|---|---|---|---|
| 01 | GLM-4.7-Flash-Grande Heretic | 42B MoE | 16GB | Abliteration | Agentic Coding |
| 02 | GLM-4.7-Flash NEO-CODE Imatrix | 30B MoE | 14GB | Abliteration | Dev Work |
| 03 | Llama-3.2 Dark Champion 18.4B MoE | 18.4B MoE | 10GB | Abliteration | Roleplay / Fiction |
| 05 | GEITje-7b-uncensored GGUF | 7B | 5GB | SFT | Dutch Language |
| 07 | Venice Uncensored Q8_0 | 24B | 20GB | De-alignment | Privacy-First Work |
| 08 | Qwen3-30B-A3B Abliterated V2 MLX | 30B MoE | 24GB | Abliteration V2 | Apple Silicon / RAG |
| 09 | Darkhn Animus V12 Heretic | 36B | 22GB | Abliteration V12 | Long Context Creative |
| 12 | Qwen3-4B-Thinking-Uncensored | 4B | 3GB | SFT | Edge / Thinking Mode |
| 14 | Mistral Nemo 12B Uncensored Heretic | 12B | 8GB | Abliteration | Research / Analysis |
| 16 | Llama3.1-70B-Uncensored | 70B | 40GB | SFT | Enterprise Deployments |
| 19 | GPT-OSS-20B NEO Imatrix | 20B | 12GB | Abliteration | Mixed Tasks |
| 20 | Qwen2.5-coder-Uncensored | Variable | 5GB+ | SFT | Security / Code |
Hardware Guide: What Can You Actually Run?
Your VRAM determines which models in this list are accessible to you. For a complete 2026 hardware breakdown with specific GPU recommendations, read our GPU guide for local LLMs.
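As a rough rule of thumb, a model's VRAM footprint follows from its parameter count and quantization width. The sketch below assumes roughly 4.5 bits per weight for Q4_K-class quants and a flat 20% overhead for KV cache and activations; real usage varies with context length and runtime:

```python
def est_vram_gb(params_b, bits_per_weight, overhead=1.2):
    """Rough VRAM estimate in GB: parameter bytes at the given
    quantization width, times ~20% overhead for KV cache/activations.
    The overhead factor is an assumption, not a measured constant."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9 * overhead

for name, p, bits in [("Llama 3.2 3B @ Q4", 3, 4.5),
                      ("Mistral Nemo 12B @ Q4", 12, 4.5),
                      ("Llama 3.1 70B @ Q4", 70, 4.5)]:
    print(f"{name}: ~{est_vram_gb(p, bits):.1f} GB")
```

For MoE models, substitute the total parameter count (all experts must reside in memory), even though compute per token tracks only the active parameters.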
RTX 4060 / RX 7600 Tier
Access to models #12, #13, #15, #17, #18, #20. The Qwen3-4B Thinking model, GEITje 7B, Llama 3.2 3B variants, and the 2B Gemma research models all run comfortably. Also covers Mistral Nemo 12B at Q4.
RTX 4080 / RX 7900 XTX Tier
Unlocks the mainstream MoE models: #03 (Dark Champion 18.4B), #02 and #10 (GLM-4.7 Flash MoE), and #19 (GPT-OSS 20B). The sweet spot for uncensored local AI in 2026.
RTX 4090 / RTX 5080 Tier
Adds the Venice Uncensored Q8 (#07), Qwen3-30B-A3B (#08), and all GLM-4.7 variants at Q8 precision. Also activates the Darkhn Animus V12 (#09) at Q4.
RTX 5090 / Multi-GPU / Apple Ultra Tier
Full access to all 20 models including the Llama3.1-70B (#16) and the GLM-4.7-Flash-Grande 42B (#01). The Mac Studio Ultra with 192GB unified memory can run every model in this list at Q8 precision.
Primary Use Cases for Uncensored Local Models
Creative & Fiction Writing
Models like Darkhn Animus V12 and the Llama 3.2 Dark Champion MoE are specifically tuned for long-form creative content, dark-themed fiction, and mature narratives without content-based interruptions.
Models #03, #09
Cybersecurity & Red Teaming
The GPT-OSS 20B NEO Imatrix and Qwen2.5-coder uncensored releases allow security researchers to analyze malware, explore exploit logic, and understand adversarial techniques without triggering safety refusals.
Models #19, #20
Agentic Coding & Automation
The GLM-4.7 Flash series leads in agentic tasks, with the Grande 42B variant providing tool-use, UI generation, and multi-step code execution at speeds that match cloud APIs.
Models #01, #02, #10
Multilingual Research
GEITje remains the definitive uncensored Dutch-language model, critical for researchers, journalists, and policy analysts operating in the Dutch linguistic sphere.
Models #05, #06
Apple Silicon RAG Pipelines
The Qwen3-30B MLX build delivers state-of-the-art reasoning on M4 Max at over 100 tokens per second, making it the flagship uncensored model for Apple-native AI applications.
Model #08
Academic AI Alignment Research
The Gemma 2B reverse-RLHF models offer unique insights into how safety training is mechanically applied to LLMs and how it can be systematically reversed, invaluable for AI safety researchers.
Models #17, #18
Responsible Usage Notice
[!] Disclaimer: The models documented in this guide are published for educational, research, and informational purposes. Running abliterated or uncensored LLMs locally is legal in most jurisdictions under open research principles. However, users are entirely and solely responsible for the content they generate using these tools.
Removing safety guardrails means removing the mechanism that prevents the generation of harmful content. These models should not be used to generate material that violates local laws, facilitates criminal activity, or produces targeted harassment. For regulated industries (healthcare, legal, financial), always implement a secondary aligned model as a review layer before using any output in production. Read our privacy and policy guide for uncensored LLMs for the full regulatory landscape.
Conclusion
The March 2026 uncensored model release window confirms that the community-driven de-alignment movement is accelerating, not slowing. The GLM-4.7 Flash Heretic lineage has established itself as the dominant force in the mid-range uncensored space, while the Darkhn and Dark Champion series continue to set the standard for creative intelligence. For budget-conscious local users, the Qwen3-4B Thinking uncensored model at 3GB VRAM is a landmark achievement. For the full ecosystem of uncensored models beyond this March update, see our complete guide to the top uncensored open-source AI models in 2026.