Qwen Meetup Draft: Function Calling Harness 2 Boosts CoT Compliance from 9.91% to 100% via Structured Schemas

✍️ OpenClawRadar · 📅 Published: May 2, 2026

A talk at Qwen Meetup Korea (end of May) presents a second iteration of the function-calling harness pattern. The original harness pushed qwen3-coder-next from 6.75% to 100% on backend codegen using type validation and compiler feedback. This update extends the same idea to domains that lack a compiler: investment memos, legal opinions, and clinical charts.

Schema-Driven CoT Compliance

The core mechanism is a TypeScript schema (using typia tags) that forces the model's reasoning into a required form. Every field must be filled or the submission is rejected. Example schema for an investment memo:

import { tags } from "typia";

export interface IInvestmentMemo {
  recommendation: "BUY" | "HOLD" | "SELL";
  thesis: {
    consensusView: string;
    differentiatedView: string;
  };
  counterThesis: {
    bearCase: string;
    ourResponse: string;
  };
  // bull / base / bear all required: blocks submitting just the base case
  scenarios: {
    bull: IScenario;
    base: IScenario;
    bear: IScenario;
  };
  // empty arrays are rejected
  valuationDrivers: IValuationDriver[] & tags.MinItems<1>;
  killConditions: IKillCondition[] & tags.MinItems<1>;
  evidenceSources: IEvidenceSource[] & tags.MinItems<1>;
}

// Falsifiable thresholds only: blocks free-form entries like "trust in management"
export type IKillCondition =
  | { type: "price_drawdown"; percentBelowEntry: number }
  | { type: "metric_breach"; metric: string; below: number }
  | { type: "milestone_miss"; expectedBy: string; what: string };
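To make the rejection step concrete, here is a minimal sketch of the checks such a schema implies for killConditions. typia generates validators from the type at compile time (for example through typia.validate<T>()), which needs its build-time transformer, so this sketch hand-writes equivalent checks instead. The helper names (checkKillCondition, checkKillConditions, IValidationError) and the error path format are assumptions for illustration, not the harness's actual code.

```typescript
// Hand-written stand-in for the checks a generated validator would perform.
type IKillCondition =
  | { type: "price_drawdown"; percentBelowEntry: number }
  | { type: "metric_breach"; metric: string; below: number }
  | { type: "milestone_miss"; expectedBy: string; what: string };

// Hypothetical error shape: where the problem is and what was expected there.
interface IValidationError {
  path: string;
  expected: string;
}

function checkKillCondition(value: unknown, path: string): IValidationError[] {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return [{ path, expected: "IKillCondition" }];
  }
  const record = value as Record<string, unknown>;
  const errors: IValidationError[] = [];
  const requireType = (key: string, type: "number" | "string"): void => {
    if (typeof record[key] !== type) {
      errors.push({ path: `${path}.${key}`, expected: type });
    }
  };
  // Only the three falsifiable variants are accepted; anything else is an error.
  switch (record.type) {
    case "price_drawdown":
      requireType("percentBelowEntry", "number");
      break;
    case "metric_breach":
      requireType("metric", "string");
      requireType("below", "number");
      break;
    case "milestone_miss":
      requireType("expectedBy", "string");
      requireType("what", "string");
      break;
    default:
      errors.push({
        path: `${path}.type`,
        expected: '"price_drawdown" | "metric_breach" | "milestone_miss"',
      });
  }
  return errors;
}

function checkKillConditions(value: unknown): IValidationError[] {
  const path = "$input.killConditions";
  // Mirrors tags.MinItems<1>: a missing or empty array is rejected outright.
  if (!Array.isArray(value) || value.length < 1) {
    return [{ path, expected: "IKillCondition[] & MinItems<1>" }];
  }
  const errors: IValidationError[] = [];
  for (let i = 0; i < value.length; i++) {
    errors.push(...checkKillCondition(value[i], `${path}[${i}]`));
  }
  return errors;
}

// A free-form condition, an empty list, and a well-formed condition.
const freeForm = checkKillConditions([{ type: "trust_in_management" }]);
const emptyList = checkKillConditions([]);
const wellFormed = checkKillConditions([
  { type: "price_drawdown", percentBelowEntry: 25 },
]);
console.log(freeForm, emptyList, wellFormed);
```

The first two calls each return one error and the third returns none. Errors of this kind are what a harness would hand back to the model in place of compiler diagnostics.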

The schema itself is then validated by running it against historical investment cases, the same idea as backtesting a trading strategy on market data. The resulting diff shows which past calls the schema would have gotten right and which it would have missed, and you extend the schema to cover the misses.
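One way to read the backtesting step, sketched under assumptions: each historical case is tagged by a reviewer with the kind of event that actually broke the thesis, and the diff lists the cases whose failure kind the schema cannot yet express. The IHistoricalCase shape, the backtest helper, and the three sample cases are hypothetical placeholders, not data or code from the talk.

```typescript
// Hypothetical record of a past investment call; field names are illustrative.
interface IHistoricalCase {
  name: string;
  // What actually invalidated the thesis, as tagged by a human reviewer.
  failureKind: string;
}

interface IBacktestDiff {
  covered: string[];
  missed: { name: string; failureKind: string }[];
}

// The kill-condition variants the current schema can express.
const schemaKinds: ReadonlySet<string> = new Set<string>([
  "price_drawdown",
  "metric_breach",
  "milestone_miss",
]);

// Splits past cases into those the schema could have captured and those it could not.
function backtest(
  cases: IHistoricalCase[],
  kinds: ReadonlySet<string>,
): IBacktestDiff {
  const diff: IBacktestDiff = { covered: [], missed: [] };
  for (const c of cases) {
    if (kinds.has(c.failureKind)) {
      diff.covered.push(c.name);
    } else {
      diff.missed.push({ name: c.name, failureKind: c.failureKind });
    }
  }
  return diff;
}

// Placeholder cases, not real market history.
const history: IHistoricalCase[] = [
  { name: "case-A", failureKind: "price_drawdown" },
  { name: "case-B", failureKind: "milestone_miss" },
  { name: "case-C", failureKind: "regulatory_action" },
];

const diff = backtest(history, schemaKinds);
console.log(diff.covered, diff.missed);
```

Here case-C lands in missed, which would suggest adding a new IKillCondition variant (or another required field) before the schema is used on live memos.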

Measured CoT Compliance

The measurements come from AutoBE's CoT feature, not from the financial investment analysis example itself. On these CoT-compliance schemas, qwen3.6-27b keeps up with frontier models, and the harness raises compliance from 9.91% to 100%.
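The gap between a low first-attempt rate and 100% is what a validate-and-retry loop would produce. The sketch below shows that loop with a deterministic stub in place of a real model. runHarness, complianceRate, validateMemo, and stubModel are hypothetical names, the stub's behavior is invented for illustration, and the resulting rates do not reproduce the 9.91% figure.

```typescript
// A validator returns human-readable errors; an empty list means "compliant".
type Validator = (output: unknown) => string[];
// A model receives the prompt plus the previous attempt's validation errors.
// A real model call would be asynchronous; this stays synchronous for brevity.
type Model = (prompt: string, feedback: string[]) => unknown;

function runHarness(
  model: Model,
  validate: Validator,
  prompt: string,
  maxAttempts: number,
): { accepted: boolean; attempts: number } {
  let feedback: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const output = model(prompt, feedback);
    feedback = validate(output);
    if (feedback.length === 0) return { accepted: true, attempts: attempt };
  }
  return { accepted: false, attempts: maxAttempts };
}

// Percentage of runs whose final output passed validation.
function complianceRate(results: boolean[]): number {
  if (results.length === 0) return 0;
  return (100 * results.filter((r) => r).length) / results.length;
}

// Simplified validator: only checks that three top-level fields are present.
const requiredKeys = ["recommendation", "thesis", "counterThesis"];
const validateMemo: Validator = (output) => {
  const record =
    typeof output === "object" && output !== null
      ? (output as Record<string, unknown>)
      : {};
  return requiredKeys
    .filter((key) => !(key in record))
    .map((key) => `$input.${key}: required`);
};

// Stub model: skips counterThesis until the feedback mentions it.
const stubModel: Model = (_prompt, feedback) => {
  const memo: Record<string, unknown> = { recommendation: "HOLD", thesis: {} };
  if (feedback.some((f) => f.indexOf("counterThesis") !== -1)) {
    memo.counterThesis = {};
  }
  return memo;
};

const singleShot = runHarness(stubModel, validateMemo, "Write a memo", 1);
const withRetry = runHarness(stubModel, validateMemo, "Write a memo", 3);
console.log(complianceRate([singleShot.accepted]), complianceRate([withRetry.accepted]));
```

With a single attempt the stub fails validation. With retries it passes on the second attempt, because the error naming the missing field is fed back into the next call.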

Who It's For

Developers building AI agents that need structured, verifiable reasoning in domains without automatic correctness checks (e.g., finance, legal, medical).

📖 Read the full source: r/LocalLLaMA

Previous presentation: Part 1
