Chain of Draft (CoD): Making LLMs Think Like Humans on a Deadline

Ever watched someone solve a tough problem while pacing, scribbling half-sentences, muttering “carry the two…” under their breath? That’s how we humans reason. We don’t narrate full essays in our heads. We sketch ideas. And that’s exactly the vibe of Chain of Draft (CoD).

CoD is like giving your LLM a sticky note and saying: “Solve this, but keep it tight. No TED Talks.”

So what is Chain of Draft, really?

It’s a prompting strategy from a recent paper, “Chain of Draft: Thinking Faster by Writing Less”. Instead of long-winded Chain of Thought (CoT) answers where the model explains every mental detour, CoD asks it to reason in fast, minimal bursts: roughly 5 words max per step.

Imagine you’re doing mental math in a checkout line and the person behind you is sighing loudly. That’s CoD.
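To make "5 words max per step" concrete, here is a minimal sketch that checks whether a draft stays inside that budget. The helper name `over_budget_steps` and both sample drafts are made up for illustration (they are hand-written, not model output), and counting words by splitting on whitespace is a simplifying assumption.

```python
MAX_WORDS_PER_STEP = 5  # the "5 words max" guideline described above


def over_budget_steps(draft: str, max_words: int = MAX_WORDS_PER_STEP) -> list[str]:
    """Return the non-empty lines of `draft` that use more than `max_words` words."""
    steps = [line.strip() for line in draft.splitlines() if line.strip()]
    return [step for step in steps if len(step.split()) > max_words]


# Hand-written drafts for 23 * 17 (illustrative, not real model output)
cod_draft = "20 * 17 = 340\n3 * 17 = 51\n340 + 51 = 391"
cot_draft = (
    "First, split 23 into 20 and 3 to make it easier.\n"
    "Then multiply each part by 17 and add the two results."
)

print(over_budget_steps(cod_draft))       # every CoD step fits the budget
print(len(over_budget_steps(cot_draft)))  # both CoT sentences blow past it
```

Same arithmetic, same answer. The CoD version just skips the narration.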

Why should you care?

Because:

You get comparable accuracy (sometimes better)
You save on tokens (up to 92% fewer in the paper's best case, no joke)
You get answers faster (fewer tokens to generate = lower latency)
It works on logic, arithmetic, QA, multi-hop tasks
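If you want to sanity-check the token savings on your own workload, the math is simple. Here is a rough sketch, assuming you already have output-token counts for a CoT run and a CoD run of the same question. The function name `token_reduction_pct` is hypothetical and the sample counts are invented purely to show the arithmetic.

```python
def token_reduction_pct(cot_tokens: int, cod_tokens: int) -> float:
    """Percentage of output tokens saved by CoD relative to CoT."""
    if cot_tokens <= 0:
        raise ValueError("cot_tokens must be positive")
    return 100.0 * (cot_tokens - cod_tokens) / cot_tokens


# Made-up counts for illustration only, not measurements
cot_tokens = 200
cod_tokens = 16

print(f"{token_reduction_pct(cot_tokens, cod_tokens):.1f}% fewer tokens")
```

With the OpenAI Python client, the real counts should be available as `response.usage.completion_tokens` on each chat completion, so you can feed those in instead of the placeholders.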

This is ideal when your app’s running a fleet of agents and one of them decides to write a memoir. CoD stops the oversharing.

The Showdown: CoD vs CoT

Feature	Chain of Thought (CoT)	Chain of Draft (CoD)
Style	Rambling essay	Bullet points with a mission
Token Usage	High	Low (like, really low)
Speed	Slower	Fast and snappy
Accuracy	Great	Also great
Cost	$$$	$
Interpretability	High	Medium (still readable though)

Try it Yourself

import os
from openai import OpenAI

# Initialize the OpenAI client with your API key
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

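def chain_of_draft_prompt(question):
    # Build the prompt piece by piece so no stray indentation or
    # mid-sentence line breaks leak into the text sent to the model
    return (
        "You're a reasoning expert. Solve this problem using extremely "
        "short steps (5 words max). Then, give the answer.\n\n"
        f"Question: {question}\n\n"
        "Steps:\n"
    )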

# Create a chat completion
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": chain_of_draft_prompt("What’s 23 * 17?")}
    ],
    temperature=0.3
)

# Print the assistant's reply
print(response.choices[0].message.content)
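
Printing the reply is fine for a demo, but in an app you usually want the final answer on its own. Here is a minimal sketch, assuming you extend the prompt to ask for the answer after a `####` separator. That separator is an assumption here (the prompt above doesn't request it), and `split_draft_and_answer` plus the sample reply are hypothetical.

```python
SEPARATOR = "####"  # assumed convention; add it to your prompt if you use this


def split_draft_and_answer(reply: str, separator: str = SEPARATOR) -> tuple[list[str], str]:
    """Split a CoD-style reply into (draft steps, final answer)."""
    draft, found, answer = reply.rpartition(separator)
    if not found:
        # No separator in the reply: fall back to the last non-empty line
        lines = [line.strip() for line in reply.splitlines() if line.strip()]
        return lines[:-1], (lines[-1] if lines else "")
    steps = [line.strip() for line in draft.splitlines() if line.strip()]
    return steps, answer.strip()


# Hand-written reply for illustration, not real model output
sample_reply = "20 * 17 = 340\n3 * 17 = 51\n340 + 51 = 391\n#### 391"

steps, answer = split_draft_and_answer(sample_reply)
print(steps)
print(answer)
```

The fallback branch is a judgment call: if the model ignores the separator, grabbing the last line keeps the pipeline moving, but you may prefer to raise an error instead.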

TL;DR

Chain of Draft is the minimalist cousin of CoT. It doesn’t talk much, but it gets things done.

Use it when:

You want results, not rambling
You’re running at scale
Token bills are getting ridiculous
Your agents are thinking like poets, not engineers

Ali Raza