Ever watched someone solve a tough problem while pacing, scribbling half-sentences, muttering “carry the two…” under their breath? That’s how we humans reason. We don’t narrate full essays in our heads. We sketch ideas. And that’s exactly the vibe of Chain of Draft (CoD).
CoD is like giving your LLM a sticky note and saying: “Solve this, but keep it tight. No TED Talks.”
So what is Chain of Draft, really?
It’s a prompting strategy from a recent paper “Chain of Draft: Thinking Faster by Writing Less”. Instead of long-winded Chain of Thought (CoT) answers where the model explains every mental detour, CoD asks it to reason in fast, minimal bursts—5 words max per step.
Imagine you’re doing mental math in a checkout line and the person behind you is sighing loudly. That’s CoD.
Why should you care?
Because:
- You get the same accuracy (sometimes better)
- You save on tokens (92% fewer, no joke)
- You get answers faster (less to compute = more speed)
- It works on logic, arithmetic, QA, multi-hop tasks
This is ideal when your app’s running a fleet of agents and one of them decides to write a memoir. CoD stops the oversharing.
The Showdown: CoD vs CoT
Feature | Chain of Thought (CoT) | Chain of Draft (CoD) |
---|---|---|
Style | Rambling essay | Bullet points with a mission |
Token Usage | High | Low (like, really low) |
Speed | Slower | Fast and snappy |
Accuracy | Great | Also great |
Cost | $$$ | $ |
Interpretability | High | Medium (still readable though) |
Try it Yourself
import os
from openai import OpenAI
# Initialize the OpenAI client with your API key
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
def chain_of_draft_prompt(question):
return f"""You're a reasoning expert. Solve this problem using extremely
short steps (5 words max). Then, give the answer.
Question: {question}
Steps:
"""
# Create a chat completion
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "user", "content": chain_of_draft_prompt("What’s 23 * 17?")}
],
temperature=0.3
)
# Print the assistant's reply
print(response.choices[0].message.content)
TL;DR
Chain of Draft is the minimalist cousin of CoT. It doesn’t talk much, but it gets things done.
Use it when:
- You want results, not rambling
- You’re running at scale
- Token bills are getting ridiculous
- Your agents are thinking like poets, not engineers