DeepSeek vs Grok: Developer Weighs a Switch for Finance AI Agent Speed

Finance AI Agent Performance Issues and Potential Switch

A developer has built a finance AI web app in FastAPI/Python that works like Perplexity, but for stocks. Before the LLM sees a query, the application runs a parallel pipeline that fetches live stock quotes from several finance APIs, live web search results from finance search APIs, and earnings calendar data. All of this structured context is injected into the system prompt, so the model handles only reasoning and formatting while the facts come from the APIs. That makes hallucination rates less relevant for this use case.
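
The pre-LLM pipeline described above could look roughly like the following. This is a minimal sketch, not the developer's actual code: the three fetcher functions, their return shapes, and the prompt wording are all hypothetical stand-ins, and the sleeps only simulate network latency.

```python
import asyncio
import json

# Hypothetical stand-ins for the real finance API calls (names and
# return shapes are assumptions, not taken from the original post).
async def fetch_quotes(ticker: str) -> dict:
    await asyncio.sleep(0.01)  # simulates network latency
    return {"ticker": ticker, "price": None}  # real value would come from the API

async def fetch_search(query: str) -> list:
    await asyncio.sleep(0.01)
    return [{"title": "placeholder result", "url": "https://example.com"}]

async def fetch_earnings(ticker: str) -> dict:
    await asyncio.sleep(0.01)
    return {"ticker": ticker, "next_earnings": None}

async def build_system_prompt(ticker: str, query: str) -> str:
    # Run all three lookups concurrently, so the total wait is roughly
    # the slowest single call rather than the sum of all three.
    quotes, search, earnings = await asyncio.gather(
        fetch_quotes(ticker),
        fetch_search(query),
        fetch_earnings(ticker),
    )
    context = {"quotes": quotes, "search_results": search, "earnings": earnings}
    # Facts are injected as structured context; the model is asked only
    # to reason over them and format the answer.
    return (
        "Answer using only the facts in the context below.\n"
        "CONTEXT:\n" + json.dumps(context, indent=2)
    )
```

Calling `asyncio.run(build_system_prompt("AAPL", "AAPL outlook"))` returns the prompt string that would be sent as the system message.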

Current Model Performance Problems

The developer is currently using DeepSeek V3.2 Reasoning and reports significant performance issues:

TTFT (Time to First Token): ~70 seconds
Output speed: ~25 tokens per second
Streaming experience described as "terrible"
Stream start timeout set to 75 seconds to avoid constant timeouts
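
A stream-start timeout of this kind can be sketched as a wrapper that waits a bounded time for the first chunk only, and records TTFT along the way. This is an illustration under assumptions: `fake_llm_stream` is a hypothetical stand-in for the provider's streaming response, and the helper name is invented here.

```python
import asyncio
import time
from typing import AsyncIterator, Tuple

STREAM_START_TIMEOUT_S = 75  # the value reported in the post

async def fake_llm_stream(first_token_delay: float) -> AsyncIterator[str]:
    # Hypothetical stand-in for a provider's streaming response.
    await asyncio.sleep(first_token_delay)
    for token in ["GO", " ", "now"]:
        yield token

async def stream_with_start_timeout(
    stream: AsyncIterator[str], timeout_s: float
) -> Tuple[float, str]:
    """Return (ttft_seconds, full_text).

    Raises asyncio.TimeoutError if the first token does not arrive in time.
    """
    start = time.monotonic()
    iterator = stream.__aiter__()
    # Only the wait for the first chunk is bounded by the timeout.
    first = await asyncio.wait_for(iterator.__anext__(), timeout=timeout_s)
    ttft = time.monotonic() - start
    parts = [first]
    async for token in iterator:
        parts.append(token)
    return ttft, "".join(parts)
```

Logging the measured TTFT per request would show how often responses come close to the 75-second limit, which helps when comparing models on the same prompts.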

Application Requirements

The finance AI agent has two main features:

Chat stream: Perplexity-style finance analysis with inline source citations
Trade check stream: Trade coach that outputs GO/NO-GO/WAIT with entry, stop-loss, target, and R:R ratio
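
One way to enforce a strict trade-check format is to validate the model's output in code rather than trust it. The sketch below assumes the model is asked to reply with a JSON object; that format, the field names, and the helper names are hypothetical, since the post does not describe the actual output schema. The R:R ratio is computed as reward distance divided by risk distance.

```python
import json
from dataclasses import dataclass

VALID_VERDICTS = {"GO", "NO-GO", "WAIT"}

@dataclass
class TradeCheck:
    verdict: str
    entry: float
    stop_loss: float
    target: float

    @property
    def rr_ratio(self) -> float:
        # Reward-to-risk: distance to target over distance to stop-loss.
        risk = abs(self.entry - self.stop_loss)
        reward = abs(self.target - self.entry)
        if risk == 0:
            raise ValueError("entry and stop-loss must differ")
        return reward / risk

def parse_trade_check(raw: str) -> TradeCheck:
    # Assumed format: a JSON object with verdict, entry, stop_loss, target.
    data = json.loads(raw)
    verdict = str(data["verdict"]).upper()
    if verdict not in VALID_VERDICTS:
        raise ValueError(f"unexpected verdict: {verdict!r}")
    return TradeCheck(
        verdict=verdict,
        entry=float(data["entry"]),
        stop_loss=float(data["stop_loss"]),
        target=float(data["target"]),
    )

check = parse_trade_check(
    '{"verdict": "GO", "entry": 100, "stop_loss": 95, "target": 115}'
)
print(check.verdict, check.rr_ratio)  # GO 3.0
```

Computing R:R in code instead of asking the model for it removes one place where a weaker instruction follower could produce an inconsistent number.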

Model requirements include:

Fast performance with low TTFT and high tokens per second for streaming UX
Low cost for a small project
Smart enough for multi-step trade reasoning
Good instruction following for strict output formats in trade checks

Considering Grok 4.1 Fast Reasoning

The developer is considering switching to Grok 4.1 Fast Reasoning based on these comparisons:

TTFT: ~15 seconds (vs DeepSeek's ~70s)
Output speed: ~75 tokens per second (vs DeepSeek's ~25 t/s)
Artificial Analysis (AA) intelligence score: 64 (vs DeepSeek's 57)
Input cost: $0.20 per million tokens (vs DeepSeek's $0.28)
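
To see what these figures mean for a single response, the numbers above can be plugged into a simple estimate: total time is TTFT plus output tokens divided by output speed. The 600-token answer length and 8,000-token prompt size are assumptions chosen for illustration, not values from the post.

```python
def time_to_complete(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    # Total wall-clock time: wait for the first token, then stream the rest.
    return ttft_s + output_tokens / tokens_per_s

def input_cost_usd(prompt_tokens: int, usd_per_million: float) -> float:
    return prompt_tokens / 1_000_000 * usd_per_million

OUTPUT_TOKENS = 600   # assumed answer length
PROMPT_TOKENS = 8000  # assumed size of the injected context

deepseek_s = time_to_complete(70, 25, OUTPUT_TOKENS)  # 70 + 24 = 94.0 s
grok_s = time_to_complete(15, 75, OUTPUT_TOKENS)      # 15 + 8 = 23.0 s

print(f"DeepSeek: {deepseek_s:.1f} s, Grok: {grok_s:.1f} s")
print(f"Input cost per request: "
      f"DeepSeek ${input_cost_usd(PROMPT_TOKENS, 0.28):.5f}, "
      f"Grok ${input_cost_usd(PROMPT_TOKENS, 0.20):.5f}")
```

Under these assumptions the reported figures work out to roughly a fourfold reduction in time to a complete answer, with most of the gain coming from the lower TTFT.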

Other Models Considered

The developer has also looked at Minimax 2.5, Kimi K2.5, the new Qwen 3.5 models, and Gemini 3 Flash, but notes that most are relatively expensive and no better for this specific use case.

📖 Read the full source: r/LocalLLaMA