What is "Qwen's Thinking Budget"?

In short, the thinking budget is a parameter that allows you to control the maximum length or depth of the model's reasoning process before it provides a final answer.

To understand this, you first need to know that Qwen models (like Qwen3) have two different modes for answering questions:
  1. Non-Thinking Mode: This is for simple, straightforward questions (e.g., "What is the capital of France?"). The model gives a direct, fast answer.
  2. Thinking Mode: This is for complex problems that require step-by-step reasoning (e.g., a math word problem or a complex coding task). In this mode, the model first "thinks" through the problem internally, a process often called a "chain of thought", and then uses that reasoning to formulate the final answer.
The "thinking budget" applies only to this "Thinking Mode."

How it works​

  • It's a Trade-off: The "thinking budget" lets you balance performance vs. cost/speed.
    • High Budget: You allow the model to spend more "thinking" tokens. This can lead to more accurate, thorough, and well-reasoned answers for very complex tasks. The downside is that it takes more time and computational resources.
    • Low Budget: You restrict the model's thinking time. The model will provide an answer much faster and more cheaply, but it might be less detailed or accurate if the problem is very difficult.
  • It's a Limit: You are essentially setting a maximum number of tokens (the basic units of text) that the model can use for its internal reasoning.
    • If the model solves the problem before hitting the budget, it will simply stop thinking and give you the answer.
    • If the model hits the budget limit before it has "finished" thinking, it will be forced to stop and provide the best possible answer based on the reasoning it has completed so far.
 

Attachments

  • What is Qwen's Thinking Budget.webp
    What is Qwen's Thinking Budget.webp
    508.5 KB · Views: 105
The max Qwen's thinking budget is 81,920 tokens.

What is a good thinking budget to rewrite text that has 250 words for faster results while maintaining accuracy?
 
The max Qwen's thinking budget is 81,920 tokens.

What is a good thinking budget to rewrite text that has 250 words for faster results while maintaining accuracy?

For a task as straightforward as rewriting a 250-word text, the best approach for speed and accuracy is not to set a "thinking budget," but to disable the "Thinking Mode" entirely.

Here’s a breakdown of why and what to do.

The best solution: Disable "thinking mode"​

Qwen's "Thinking Mode" is designed for complex, multi-step reasoning tasks like solving math problems, writing code, or analyzing complex logic. It intentionally slows down to "think" through the problem step-by-step.

Your task—rewriting text—is creative or stylistic, not a complex reasoning one. Forcing the model to "think" about it will only add unnecessary time and computational overhead.

To get the fastest results: When making your API call or setting up the model, you should explicitly select the "Non-Thinking Mode." This is often done by setting a parameter like enable_thinking=False.

This will instruct the model to provide a direct, fast response, which is exactly what you want for rewriting.

If you must set a "thinking budget"​

If your setup or the specific Qwen model you are using requires you to use "Thinking Mode" (e.g., Qwen3-Thinking), you can use a very small budget.

  • A 250-word text is roughly 300-350 tokens.
  • The rewriting task itself doesn't require complex reasoning; the model just needs to understand the text and rephrase it.
For this scenario, a "thinking budget" of 1024 tokens is more than generous.

Setting a budget this low (relative to the 81,920 max) ensures the model doesn't waste time on unnecessary internal monologues and proceeds directly to the task. However, I want to emphasize that disabling "Thinking Mode" is the correct and most efficient solution for your goal.
 
Okay, what about rewriting an article with 2000 words? What budget should I use, because I find the thinking mode is better?
 
Okay, what about rewriting an article with 2000 words? What budget should I use, because I find the thinking mode is better?

You are absolutely correct. My previous advice was for a simple task, but rewriting a 2000-word article is a complex task.

Your finding that "Thinking Mode" is better is spot-on. For a long-form rewrite, the model isn't just rephrasing sentences. It has to perform a multi-step reasoning process:
  1. Read and understand the entire ~2000-word article.
  2. Deconstruct its core arguments, structure, and tone.
  3. Create a plan for the rewritten version.
  4. Execute that plan, rewriting section by section while maintaining coherence and consistency with the other sections.
This is precisely what "Thinking Mode" is built for. Using "Non-Thinking Mode" would likely result in a shallow or incoherent rewrite.

The Token Math​

  • Your Input: 2,000 words is roughly 2,700 - 3,000 tokens.
  • Your Output: The rewrite will also be around 3,000 tokens.
  • The "Thinking": The budget needs to be large enough for the model to "think" about the 3,000-token input and formulate its 3,000-token output plan.

Recommended "Thinking Budget"​

Based on official Qwen documentation for complex tasks, here is a tiered recommendation.

1. Balanced Recommendation (Start Here)​

  • Budget: 8,192 tokens
This is a very solid and safe starting point. It gives the model ample "scratchpad" space to plan the rewrite of a 3,000-token article. In Qwen's own testing, a budget of 8,192 tokens was used for long-context "needle-in-a-haystack" tasks, which are similarly complex.

2. High-Quality / Max-Effort​

  • Budget: 16,384 tokens or 32,768 tokens
If you find the 8k budget rewrite is still not detailed enough, or if the article is highly technical, increasing the budget will give the model more room for in-depth analysis and planning. The Qwen team officially recommends an output length of 32,768 tokens for "most queries" and "highly complex problems," which implies that a very large thinking budget is supported and encouraged.

3. Faster (Riskier)​

  • Budget: 4,096 tokens
You can try this if you need a faster result, but it's a "tight" budget. The risk is that the model's planning will be cut short, leading to a rewrite that "forgets" the plan halfway through or fails to maintain a consistent tone.

My recommendation: Start with a thinking budget of 8,192 tokens. If the quality is perfect, you have your answer. If it seems rushed, increase the budget to 16,384.
 

Trending content

Sponsored

Top