Security

VLAD

A typed, audited security-reasoning platform. Deterministic routing, evidence-backed findings, and a tamper-evident record of every decision.

Availability

Private

Status

Active

Overview

VLAD is a security-reasoning platform built around a single idea: a finding is only as good as the evidence and the audit trail behind it. Given a codebase, it routes work through typed crews. Deterministic outer routing decides which crew runs, structured LLM-driven dispatch decides how that crew reasons, and every tool call underneath is strongly typed and recorded. The output is not a list of alerts but an evidence-backed assessment that can be inspected, replayed, and argued with.
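As an illustration of what "deterministic outer routing" can mean, here is a minimal sketch that assumes routing keys off the file suffix alone. The crew labels, the `ROUTES` table, and the `route` function are all hypothetical; VLAD's real crew names and routing rules are not public, and the LLM-driven inner dispatch is deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import PurePosixPath


class Crew(Enum):
    # Hypothetical crew labels; the real crew names are not public.
    PYTHON_PATTERNS = "python_patterns"
    INFRASTRUCTURE = "infrastructure"
    DEPENDENCIES = "dependencies"
    GENERAL = "general"


# Deterministic outer routing: a fixed lookup table, no model call involved.
ROUTES: dict[str, Crew] = {
    ".py": Crew.PYTHON_PATTERNS,
    ".tf": Crew.INFRASTRUCTURE,
    ".lock": Crew.DEPENDENCIES,
}


@dataclass(frozen=True)
class Assignment:
    """Immutable record of which crew was chosen for which path."""

    path: str
    crew: Crew


def route(path: str) -> Assignment:
    """Pick a crew from the file suffix, so identical input always routes identically."""
    suffix = PurePosixPath(path).suffix
    return Assignment(path=path, crew=ROUTES.get(suffix, Crew.GENERAL))
```

Because the decision is a pure function of the path, a routing choice can be re-derived later and compared with what was recorded, which is the property that makes the outer layer auditable.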

The discipline is enforced in the engineering, not just intended. Every cross-boundary value is a frozen Pydantic model that forbids unknown fields, the codebase holds mypy --strict at zero errors, and source-and-test pairing is a CI gate. Every run produces a hash-chained, tamper-evident ledger. Findings, intelligence, and artifacts are persisted to an evidence store with hybrid semantic search, so the final report is assembled from durable evidence rather than from a conversation. On large repositories VLAD does not claim coverage it does not have: a budget-bounded sweep reports what it reviewed, what it only partially reviewed, and what it deferred.
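The budget-bounded sweep described above can be sketched roughly as follows. This is an assumption-laden illustration, not VLAD's algorithm: the `Partition` fields, the integer budget units, and the rule that the first partition to exceed the remaining budget is marked partial are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Partition:
    name: str
    risk: float  # hypothetical risk score; higher is reviewed sooner
    cost: int    # hypothetical budget units needed for a full review


@dataclass
class CoverageReport:
    reviewed: list[str] = field(default_factory=list)
    partial: list[str] = field(default_factory=list)
    deferred: list[str] = field(default_factory=list)


def sweep(partitions: list[Partition], budget: int) -> CoverageReport:
    """Review partitions in descending risk order until the budget is spent."""
    report = CoverageReport()
    remaining = budget
    for part in sorted(partitions, key=lambda p: p.risk, reverse=True):
        if remaining >= part.cost:
            # Enough budget for a full review.
            report.reviewed.append(part.name)
            remaining -= part.cost
        elif remaining > 0:
            # Some budget left, but not enough: spend it and record a partial review.
            report.partial.append(part.name)
            remaining = 0
        else:
            # Nothing left: name the partition as deferred rather than dropping it.
            report.deferred.append(part.name)
    return report
```

Every partition ends up in exactly one of the three lists, so the report accounts for the whole repository even when the budget covers only part of it.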

I have made an explicit decision to keep VLAD closed. The system can produce high-confidence, high-impact findings at a tempo defenders cannot match by hand, and putting that capability in the open would help offensive operators more than defenders. The work is shared instead through writing, talks, and selective collaboration with security teams that have a legitimate operational use for it.

Highlights

Flow to MiniCrew to Subagent to Tool. Deterministic outer routing decides what runs, structured LLM-driven inner dispatch decides how it reasons, and every tool call underneath is typed and audited.
Pydantic everywhere. Every value that crosses a boundary is frozen and forbids unknown fields, and the codebase holds mypy --strict at zero errors.
A hash-chained audit ledger. Every run emits a tamper-evident JSONL log that can be verified, replayed, and diffed, so a finding is never just an assertion.
Evidence-backed by construction. Findings, intel, and artifacts persist to an evidence store with hybrid semantic search, and the report is assembled from that evidence rather than from a transcript.
Honest coverage on large repositories. A budget-bounded partition sweep ranks uncovered areas by risk and reviews them until coverage is complete or the budget is spent, reporting reviewed, partial, and deferred instead of silently sampling.
TDD is enforced, not aspirational. Source and test pairing is a pre-commit and CI gate, with coverage held at or above ninety-five percent on the critical layers.
Anthropic Claude is the default model, with OpenAI supported out of the box.
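To make the hash-chained ledger concrete, here is a minimal sketch of the general technique using only the standard library. The record layout (`prev`, `event`, `hash`), the all-zero genesis value, and the choice of SHA-256 are assumptions for illustration; VLAD's actual ledger format is not public.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    """Hash the event together with the previous entry's hash."""
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(lines: list[str], event: dict) -> None:
    """Append one JSONL record that commits to the record before it."""
    prev_hash = json.loads(lines[-1])["hash"] if lines else GENESIS
    record = {"prev": prev_hash, "event": event, "hash": _digest(prev_hash, event)}
    lines.append(json.dumps(record, sort_keys=True))


def verify(lines: list[str]) -> bool:
    """Recompute every hash in order; any edit, insertion, or reordering breaks the chain."""
    prev_hash = GENESIS
    for line in lines:
        record = json.loads(line)
        if record["prev"] != prev_hash:
            return False
        if record["hash"] != _digest(record["prev"], record["event"]):
            return False
        prev_hash = record["hash"]
    return True
```

Each record's hash covers the previous record's hash, so changing any earlier event invalidates every entry after it. Note that a chain like this detects tampering within the log but does not by itself detect truncation of the tail; that would need the latest hash anchored somewhere outside the file.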

Crews and Tooling

Work is organized into typed crews with deterministic outer routing and LLM-driven inner dispatch. Counts are public; individual crew and tool names are not.

Class	Count	Purpose
Domain crews	23	Typed MiniCrew factories, each owning a slice of the assessment: vulnerability classes, language-specific patterns, architecture, infrastructure, dependencies, exposure, and live validation.
Governance crews	4	Cross-cutting correlation, report generation, report governance, and a tribunal that adjudicates contested findings before they ship.
Typed tools	123	Audited tool surfaces beneath the crews. Each carries a strict typed name and Pydantic-validated inputs and outputs.
Evidence backends	3	An evidence store over in-memory and SQLite backends with Qdrant-backed hybrid semantic search. Findings, intelligence, and artifacts are persisted and queryable.
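The "frozen, forbids unknown fields" rule for typed tool inputs and outputs can be sketched with Pydantic as below. This assumes Pydantic v2 (`ConfigDict`); the `StrictModel` base, the `ReadFileInput` and `ReadFileOutput` models, and `read_file_tool` are hypothetical names invented for the example, since individual tool names are not public.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class StrictModel(BaseModel):
    # frozen=True blocks attribute assignment after construction;
    # extra="forbid" rejects any field the model does not declare.
    model_config = ConfigDict(frozen=True, extra="forbid")


class ReadFileInput(StrictModel):
    # Hypothetical tool input; not one of VLAD's real tools.
    path: str
    max_bytes: int = 65536


class ReadFileOutput(StrictModel):
    path: str
    content: str
    truncated: bool


def read_file_tool(request: ReadFileInput) -> ReadFileOutput:
    """Stand-in tool body: returns canned content instead of touching the disk."""
    content = "print('hello')\n"
    clipped = content[: request.max_bytes]
    return ReadFileOutput(
        path=request.path,
        content=clipped,
        truncated=len(clipped) < len(content),
    )


if __name__ == "__main__":
    try:
        ReadFileInput(path="a.py", mode="rw")  # undeclared field
    except ValidationError as exc:
        print(exc.errors()[0]["type"])
```

With both settings on a shared base class, a misspelled or unexpected field fails at the boundary instead of being silently ignored, and a value cannot be mutated after it has been recorded.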