Tags: AI, Code Review, Open Source, CLI, Developer Tools, LLM

Quorate: A Council of AI Reviewers for Your Code

Umut Korkmaz | 2026-05-13 | 7 min read

I have spent enough time watching a single AI model review my code to know its limits. It is fast, it is articulate, and it is confidently wrong often enough to be dangerous. The problem is not that the model is bad. The problem is that one reviewer, human or machine, has one perspective, one set of blind spots, and one way of being overconfident. So I built Quorate, an open-source CLI that does something simple in concept and surprisingly useful in practice: it convenes a council. Several models review the same diff in parallel, each as an independent reviewer, and then their verdicts are synthesized into a single actionable review. The package lives on npm as quorate, the source is on GitHub under UmutKorkmaz, and it is MIT licensed.

Why one reviewer is not enough

When you ask a single model to review a diff, you get a fluent answer that feels complete. That feeling is the trap. A model will happily skip the off-by-one in a loop boundary while writing three paragraphs about naming conventions. Another model, given the same diff, will catch the boundary bug and miss the subtle resource leak. Neither is wrong exactly. They are each looking through a narrow aperture shaped by their training and their decoding quirks on that particular day.

Human code review solved this socially a long time ago. We do not let one person sign off on anything that matters. We pull in a second pair of eyes precisely because independent perspectives catch what a single reviewer misses. Quorate takes that instinct and applies it to models. By running several of them over the same change, you reduce single-model blind spots and you puncture the false confidence that comes from one polished answer. The disagreements between reviewers are not noise to be smoothed over. They are the most valuable signal in the whole process.

How the council works

The mechanics are deliberately unglamorous. Quorate takes a diff, fans it out to the configured providers, and runs each model as its own reviewer over the same input. They do not see each other's responses. That isolation matters, because the moment reviewers can read each other you get herding, where a weaker take anchors the others and the diversity you paid for collapses.

Each reviewer returns its own findings: the bugs it thinks it found, the risks it flagged, the things it would change. Because they run in parallel rather than in a chain, the wall-clock cost is roughly the slowest model rather than the sum of all of them. You are spending more tokens, not more time, and that is usually the trade you want when the output is a review you are going to act on.
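The fan-out described above can be sketched in a few lines of TypeScript. This is an illustration under stated assumptions, not Quorate's actual code: the `Finding`, `Reviewer`, and `ReviewResult` types and the `conveneCouncil` function are hypothetical names invented for the example. It shows the two properties that matter: every reviewer receives only the diff, and all of them run concurrently.

```typescript
// Hypothetical types for illustration; none of these names come from Quorate itself.
type Finding = { file: string; line: number; summary: string };

type Reviewer = {
  name: string;
  // Wraps one provider call. It receives only the diff, never another review.
  review: (diff: string) => Promise<Finding[]>;
};

type ReviewResult = {
  reviewer: string;
  findings: Finding[];
  error?: string;
};

// Start every review at once and wait for all of them. Wall-clock time is
// bounded by the slowest reviewer, not the sum of all reviewers.
async function conveneCouncil(
  diff: string,
  reviewers: Reviewer[]
): Promise<ReviewResult[]> {
  const runs = reviewers.map(async (r): Promise<ReviewResult> => {
    try {
      return { reviewer: r.name, findings: await r.review(diff) };
    } catch (err) {
      // Assumption: one failing provider should not sink the whole council.
      return { reviewer: r.name, findings: [], error: String(err) };
    }
  });
  return Promise.all(runs);
}
```

Catching errors per reviewer is a design choice in this sketch: a provider outage then shows up as an empty review with an `error` field instead of aborting the other reviews.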

Synthesizing reviewers who disagree

The interesting engineering is not in calling several models. It is in merging what they say. Three reviewers will rarely agree cleanly. One flags a security concern the others ignored. Two independently land on the same logic bug, phrased differently. One invents a problem that does not exist. Synthesis has to sort all of this into something a human can act on without reading three full reviews.

The heuristic I keep coming back to is convergence as a confidence signal. When multiple independent reviewers land on the same issue, that issue moves to the top, because agreement across models that cannot see each other is hard to fake. A finding raised by only one reviewer is not discarded, but it is presented as exactly that: a single perspective worth a look, not a consensus verdict. The synthesis step also has to dedupe, because the same bug described three ways should appear once, and it has to resist the urge to average everything into mush. A real disagreement about whether something is a bug is more useful surfaced than resolved.
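A minimal sketch of convergence-based ranking, assuming findings can be grouped by file and line. That grouping key is a deliberate simplification made for this example: recognizing the same bug described three ways at slightly different locations needs fuzzier matching than this. The `AttributedFinding` and `MergedFinding` types and the `synthesize` function are hypothetical names, not Quorate's API.

```typescript
// Hypothetical types for illustration only.
type AttributedFinding = {
  reviewer: string;
  file: string;
  line: number;
  summary: string;
};

type MergedFinding = {
  file: string;
  line: number;
  reviewers: string[]; // distinct reviewers that raised this issue
  summaries: string[]; // every phrasing is kept, not averaged away
  confidence: "consensus" | "single";
};

function synthesize(findings: AttributedFinding[]): MergedFinding[] {
  const groups = new Map<string, MergedFinding>();

  for (const f of findings) {
    // Assumption: same file and line means same issue.
    const key = `${f.file}:${f.line}`;
    let group = groups.get(key);
    if (!group) {
      group = {
        file: f.file,
        line: f.line,
        reviewers: [],
        summaries: [],
        confidence: "single",
      };
      groups.set(key, group);
    }
    if (group.reviewers.indexOf(f.reviewer) === -1) {
      group.reviewers.push(f.reviewer);
    }
    group.summaries.push(f.summary);
  }

  const merged: MergedFinding[] = [];
  groups.forEach((group) => {
    // Agreement between independent reviewers raises confidence.
    group.confidence = group.reviewers.length > 1 ? "consensus" : "single";
    merged.push(group);
  });

  // Issues raised by more reviewers come first; single-reviewer findings
  // stay in the list, labeled as a single perspective.
  return merged.sort((a, b) => b.reviewers.length - a.reviewers.length);
}
```

Keeping each reviewer's original summary inside the merged entry is one way to dedupe without flattening a real disagreement: the issue appears once, but the reader still sees how each reviewer described it.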

The cost and latency you are actually paying

I want to be honest about the trade, because a council is not free. Running several models over every diff multiplies your token spend: each reviewer reads the full diff, so input tokens scale roughly with the number of reviewers, and the synthesis pass adds its own on top. For a small change that is trivial. For a large refactor reviewed on every commit it adds up, and you should be deliberate about when the council is worth convening. I do not run five reviewers on a one-line typo fix. I run them on the changes where a missed bug is expensive: anything touching auth, money, data migrations, or concurrency.
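One way to be deliberate about it is a cheap path check before the council runs. The sketch below is an assumption-laden illustration, not a Quorate feature: the `HIGH_RISK_PATHS` patterns, `shouldConveneCouncil`, and `estimateReviewerInputTokens` are hypothetical, and the patterns would need tuning for a real repository. Concurrency changes in particular rarely announce themselves in a file path, so a path filter alone will miss them.

```typescript
// Hypothetical path patterns for changes where a missed bug is expensive.
const HIGH_RISK_PATHS: RegExp[] = [
  /(^|\/)auth(\/|\.|$)/i, // authentication code
  /(^|\/)(billing|payments?)(\/|\.|$)/i, // anything touching money
  /(^|\/)migrations?\//i, // data migrations
];

// Convene the full council only when at least one changed file looks risky.
function shouldConveneCouncil(changedPaths: string[]): boolean {
  return changedPaths.some((path) =>
    HIGH_RISK_PATHS.some((pattern) => pattern.test(path))
  );
}

// Rough input-token estimate for the reviewers alone: each one reads the
// whole diff once. The synthesis pass is extra and not counted here.
function estimateReviewerInputTokens(
  diffTokens: number,
  reviewerCount: number
): number {
  return diffTokens * reviewerCount;
}
```

With these assumptions, a 2,000-token diff sent to five reviewers costs about 10,000 input tokens before synthesis, which is negligible once and noticeable on every commit.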

Latency is gentler than cost because of the parallelism, but it is not zero. You are still waiting on the slowest reviewer plus the synthesis pass. In practice that is seconds, not minutes, which is fine for a pre-commit or pre-push gate and probably too slow to run on every keystroke.
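The arithmetic is simple enough to write down. In this sketch the function name `councilLatencyMs` and the timings in the usage note are made up for illustration; they are not measurements from Quorate.

```typescript
// Compare wall-clock time for a parallel council against running the same
// reviewers one after another. All values are in milliseconds.
function councilLatencyMs(
  reviewerMs: number[],
  synthesisMs: number
): { parallel: number; sequential: number } {
  const slowest = reviewerMs.length > 0 ? Math.max(...reviewerMs) : 0;
  const total = reviewerMs.reduce((sum, ms) => sum + ms, 0);
  return {
    parallel: slowest + synthesisMs, // wait for the slowest, then synthesize
    sequential: total + synthesisMs, // what a chain of reviewers would cost
  };
}
```

For three hypothetical reviewers taking 4,000, 6,500, and 9,000 ms with a 3,000 ms synthesis pass, `councilLatencyMs([4000, 6500, 9000], 3000)` gives 12,000 ms in parallel against 22,500 ms in sequence.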

When consensus is signal and when it is noise

The failure mode I watch for is treating agreement as truth. Sometimes models converge because they share the same blind spot, not because they are right. If every reviewer was trained on the same flavor of code, they can all confidently miss the same class of bug, and their unanimous approval gives you false comfort. Diversity of providers is part of the defense here, but it is not a guarantee.

So I read a Quorate review the way I read a human one: as input, not as a verdict. The council is very good at widening the net and surfacing the issues a single model would have buried. It is not a substitute for understanding your own change. What it buys you is fewer surprises in production and a healthy distrust of any one reviewer that is too sure of itself, which, after enough years, is most of what I want from review in the first place.