A lot of the new "GPT 5.4 sucks in OpenClaw" posts are really config issues
Since the Anthropic change, a lot more people here are trying GPT 5.4 in OpenClaw.
Which makes sense. Claude Pro / Max inside OpenClaw was one of the biggest cost-saving setups, so now a lot of people are testing GPT 5.4 harder than before.
What I keep seeing though is people blaming the model for stuff that is mostly setup.
A lot of them are testing on old OpenClaw versions. Or with reasoning off. Or with weak thinking settings. Or without the right OpenAI endpoint.
At that point, you’re not really testing GPT 5.4 properly.
You’re testing a crippled setup.
That’s also where the weird "it looks like it’s working but it’s not actually doing anything" feeling comes from.
If reasoning isn’t active, the bot is way more limited than people think. It can answer the last message. It can sound plausible. But as soon as the task needs actual multi-step reasoning, or something changes mid-flow, it falls apart fast.
That’s why some demos feel fake.
Not because GPT 5.4 is automatically bad in OpenClaw, but because the setup is bad enough that the bot never had much room to work with.
The biggest fixes for me were:

- OpenClaw 2026.4.5
- reasoning on
- thinking at least at medium
- openai-responses

Once those were fixed, GPT 5.4 felt way better.
Less fake progress.
Less random stopping.
Better continuity.
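For reference, here is roughly what that looks like in openclaw.json. Treat it as a sketch rather than a verified config: the agents.defaults path matches the setting names I've seen, but the provider/api block is my assumption about where the openai-responses switch lives, and key names can differ between versions. The comments are just annotations; strip them if your config has to be strict JSON.

    {
      "agents": {
        "defaults": {
          "thinkingDefault": "medium"
        }
      },
      // "provider" / "api" are my guess at where the completions-vs-responses switch lives
      "provider": {
        "openai": {
          "api": "openai-responses"
        }
      }
    }

The version fix is separate from the config: update OpenClaw itself to 2026.4.5 or newer. How reasoning gets toggled also seems to depend on the version, so I haven't guessed at that key here.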
One important thing though: even with the right setup, GPT 5.4 still doesn’t feel exactly like Opus 4.6.
Opus 4.6 would sometimes take a lot more initiative. Sometimes that felt great. Sometimes it was honestly too much, and that freedom could also lead to mistakes.
GPT 5.4 feels a bit different. In my experience, it benefits more from validation and tighter steering on some steps.
Personally, I prefer that.
I’d rather have a model that needs a bit more checking, but stays more controllable, than one that takes too much initiative and occasionally goes off in the wrong direction.
This should be pinned. The openai-completions vs openai-responses distinction alone is responsible for half the complaints. Completions endpoint doesn't support reasoning tokens at all, so you're literally running 5.4 without its strongest feature and then wondering why it's bad.
Also worth checking your OpenClaw version. Anything below 1.x has issues with the newer OpenAI response format and silently falls back to worse behavior without any error message.
I used OpenClaw 2026.4.5 with thinking at xhigh: the responses take very long, it doesn't feel active, and the answers are boring; it doesn't give me exactly what I want. I have to search through the messages for what I'm looking for. Opus/Sonnet used to first highlight the message I was looking for (great personality), then give its reasoning.
That behavior is with reasoning off; if reasoning is on, it just gives me essays.
I'd really like to hear what the best alternatives are. I tried Minimax, and it lost context literally from the last message. I was impressed by its response time and personality, but the context loss is a bit of a disappointment.
Help me pinpoint any model I can get to at least Sonnet level; I depend on this setup a lot.
My problem is always the model changes. The other day Codex got dumber, ChatGPT got lazier and sloppier, and OpenClaw got more whimsical. Out of the blue I'll have days where I'm downvoting almost every conversation because there is noticeable degradation I can quantify, which objectively didn't exist before.
OK, I see how to turn on reasoning and how to change thinking levels, but how do I use openai-responses?
.agents.defaults.thinkingDefault = "medium"
.agents.defaults.reasoningDefault = "on"  -- unsupported in new versions of OpenClaw
Please also note that “Reasoning” only controls whether the reasoning process is displayed; it doesn't change how the model works. That is controlled by the /think command.
In openclaw.json, change it from openai-completions to openai-responses.
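As a sketch only (I'm assuming an "api" field on the OpenAI provider block; the actual key name and nesting may be different in your openclaw.json):

    "openai": {
      "api": "openai-responses"
    }

If yours currently says "openai-completions" there, that's the value to swap.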