Qwen Meetup Draft: Function Calling Harness 2 Boosts CoT Compliance from 9.91% to 100% via Structured Schemas

A talk at Qwen Meetup Korea (end of May) presents a second iteration of the function-calling harness pattern. The original harness pushed qwen3-coder-next from 6.75% to 100% on backend codegen using type validation and compiler feedback. This update extends the same idea to domains that lack a compiler: investment memos, legal opinions, and clinical charts.
Schema-Driven CoT Compliance
The core mechanism is a TypeScript schema (using typia tags) that forces the model's reasoning into a required form. Every field must be filled or the submission is rejected. Example schema for an investment memo:
import { tags } from "typia";

// IScenario, IValuationDriver, and IEvidenceSource are not shown in this excerpt.
export interface IInvestmentMemo {
  recommendation: "BUY" | "HOLD" | "SELL";
  thesis: {
    consensusView: string;
    differentiatedView: string;
  };
  counterThesis: {
    bearCase: string;
    ourResponse: string;
  };
  // bull, base, and bear are all required; this blocks submitting only the base case
  scenarios: {
    bull: IScenario;
    base: IScenario;
    bear: IScenario;
  };
  // empty arrays are rejected: each list needs at least one item
  valuationDrivers: IValuationDriver[] & tags.MinItems<1>;
  killConditions: IKillCondition[] & tags.MinItems<1>;
  evidenceSources: IEvidenceSource[] & tags.MinItems<1>;
}

// Falsifiable thresholds only; blocks free-form conditions like "trust in management"
export type IKillCondition =
  | { type: "price_drawdown"; percentBelowEntry: number }
  | { type: "metric_breach"; metric: string; below: number }
  | { type: "milestone_miss"; expectedBy: string; what: string };
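To make the rejection step concrete, here is a minimal sketch of checking a model's killConditions output against the union above. The hand-written checks stand in for a validator that a library such as typia would generate from the interface; the names FieldError, checkKillCondition, and checkKillConditions are hypothetical and do not come from the talk.

```typescript
// Hypothetical stand-in for a generated validator; not typia's API.
interface FieldError {
  path: string;     // where the violation is, e.g. "killConditions[0].below"
  expected: string; // what the schema requires at that path
}

const isObject = (v: unknown): v is Record<string, unknown> =>
  typeof v === "object" && v !== null && !Array.isArray(v);

// Checks one item against the IKillCondition discriminated union.
function checkKillCondition(value: unknown, path: string): FieldError[] {
  if (!isObject(value)) return [{ path, expected: "IKillCondition object" }];
  const obj = value;
  const errors: FieldError[] = [];
  // Records an error when a required property has the wrong primitive type.
  const need = (key: string, kind: "number" | "string"): void => {
    if (typeof obj[key] !== kind) {
      errors.push({ path: `${path}.${key}`, expected: kind });
    }
  };
  switch (obj.type) {
    case "price_drawdown":
      need("percentBelowEntry", "number");
      break;
    case "metric_breach":
      need("metric", "string");
      need("below", "number");
      break;
    case "milestone_miss":
      need("expectedBy", "string");
      need("what", "string");
      break;
    default:
      // Free-form conditions such as "trust in management" end up here.
      errors.push({
        path: `${path}.type`,
        expected: '"price_drawdown" | "metric_breach" | "milestone_miss"',
      });
  }
  return errors;
}

// Enforces the MinItems<1> rule, then checks every item.
function checkKillConditions(value: unknown): FieldError[] {
  if (!Array.isArray(value) || value.length === 0) {
    return [{ path: "killConditions", expected: "array with at least 1 item" }];
  }
  const errors: FieldError[] = [];
  for (let i = 0; i < value.length; i++) {
    errors.push(...checkKillCondition(value[i], `killConditions[${i}]`));
  }
  return errors;
}
```

An empty error list means the field is accepted. Anything else is a list of paths and expectations that can be handed back to the model in place of the compiler diagnostics used by the original harness.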
The schema itself is then validated by running it against historical investment cases, in the same way a trading strategy is backtested on market data. The resulting diff shows which past calls the schema would have gotten right and which it would have missed; whatever is missing is then added to the schema.
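The source does not say how this backtest is implemented. One possible reading, sketched below, is to label each historical case by hand with the factor that actually broke the thesis, then check whether the current schema has a kill-condition kind that can express it. HistoricalCase, BacktestDiff, backtestSchema, and the sample cases are all hypothetical placeholders, not data or code from the talk.

```typescript
// Hypothetical shapes: the source does not specify how the backtest is run.
interface HistoricalCase {
  id: string;
  // Hand-assigned label for what actually broke the thesis.
  decisiveFactor: string;
}

interface BacktestDiff {
  covered: string[]; // ids of cases the schema could have expressed
  missed: { id: string; decisiveFactor: string }[];
}

// Kill-condition kinds the current schema can express (taken from IKillCondition).
const schemaKinds = new Set<string>([
  "price_drawdown",
  "metric_breach",
  "milestone_miss",
]);

// Splits the cases into those the schema covers and those it misses.
function backtestSchema(
  cases: HistoricalCase[],
  kinds: ReadonlySet<string>,
): BacktestDiff {
  const result: BacktestDiff = { covered: [], missed: [] };
  for (const c of cases) {
    if (kinds.has(c.decisiveFactor)) {
      result.covered.push(c.id);
    } else {
      result.missed.push({ id: c.id, decisiveFactor: c.decisiveFactor });
    }
  }
  return result;
}

// Made-up placeholder cases, for illustration only.
const sampleCases: HistoricalCase[] = [
  { id: "case-a", decisiveFactor: "metric_breach" },
  { id: "case-b", decisiveFactor: "regulatory_change" },
];

const backtestDiff = backtestSchema(sampleCases, schemaKinds);
```

Under this reading, each entry in missed is a candidate for a new variant in the IKillCondition union.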
Measured CoT Compliance
The measurement was taken on AutoBE's CoT feature, not on financial investment analysis itself. On these CoT-compliance schemas, qwen3.6-27b keeps up with frontier models, and the harness raises its compliance from 9.91% to 100%.
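As a rough illustration of how a compliance figure could be computed, the sketch below runs a generate, validate, retry loop and reports the share of runs that end compliant. It assumes that rejected output is sent back to the model with the violations attached, which the source implies but does not spell out. The model is a deterministic stub, not a real LLM call, and runWithHarness, complianceRate, and the stub functions are hypothetical names.

```typescript
// Hypothetical harness loop; the model is a stub, not a real LLM call.
type Validate = (output: unknown) => string[];   // empty array means compliant
type Generate = (feedback: string[]) => unknown; // feedback from the last rejection

interface RunResult {
  firstAttemptOk: boolean; // compliant without any feedback
  finalOk: boolean;        // compliant within maxAttempts
  attempts: number;
}

function runWithHarness(
  generate: Generate,
  validate: Validate,
  maxAttempts: number,
): RunResult {
  let feedback: string[] = [];
  let firstAttemptOk = false;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const errors = validate(generate(feedback));
    if (attempt === 1) firstAttemptOk = errors.length === 0;
    if (errors.length === 0) {
      return { firstAttemptOk, finalOk: true, attempts: attempt };
    }
    feedback = errors; // rejected: hand the violations back to the model
  }
  return { firstAttemptOk, finalOk: false, attempts: maxAttempts };
}

// Share of compliant runs, as a percentage.
function complianceRate(flags: boolean[]): number {
  if (flags.length === 0) return 0;
  return (100 * flags.filter(Boolean).length) / flags.length;
}

// Stub validator: requires a non-empty killConditions array.
const requireKillConditions: Validate = (output) => {
  const kc = (output as { killConditions?: unknown } | null)?.killConditions;
  return Array.isArray(kc) && kc.length > 0
    ? []
    : ["killConditions: expected at least 1 item"];
};

// Stub model: leaves killConditions empty until told about the violation.
const stubModel: Generate = (feedback) =>
  feedback.length === 0
    ? { killConditions: [] }
    : { killConditions: [{ type: "price_drawdown", percentBelowEntry: 20 }] };

const sampleRun = runWithHarness(stubModel, requireKillConditions, 3);
```

Collecting firstAttemptOk and finalOk across many runs and passing each list to complianceRate gives a before and after pair. Whether the 9.91% and 100% figures were measured exactly this way is not stated in the source.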
Who It's For
Developers building AI agents that need structured, verifiable reasoning in domains without automatic correctness checks (e.g., finance, legal, medical).
📖 Read the full source: r/LocalLLaMA
Previous presentation: Part 1