After a week of daily use, I’m both impressed... and frustrated.
If you try GPT 5 Pro directly in the ChatGPT portal, it feels like a big leap forward: almost no hallucinations, sharper answers, and excellent benchmark scores. (Let’s skip over the messy first-day rollout, with its routing glitches that sent requests to old models…)
But in real daily workflows, especially through AI coding assistants like Cursor, Copilot, v0, and others, the experience is less impressive. That might be because these tools are still optimized for older models, and it will take time to retune prompts for GPT 5. Or maybe GPT 5 just isn’t as strong at calling tools (over MCP, for example) or handling more agentic workflows. Right now, tooling performance is surprisingly poor. I also realized just how many different tools we rely on every day, and how tricky it is to swap in a new model when everything is built around carefully optimized prompts and flows.
Code generation quality is still pretty average. For pure code generation, I still find Anthropic’s Claude models more consistent and reliable: they tend to produce cleaner, more readable code with fewer small mistakes. In frontend or design-heavy tasks, GPT 5 struggles even more. My go-to for those remains the fine-tuned Vercel models in v0, which consistently deliver better layouts, cleaner CSS, and more thoughtful UI decisions. In fact, there are times when even GPT 4o or o3 gave me stronger results for UI and design work than GPT 5 does right now.
Where GPT 5 shines is reasoning. It’s excellent for architecture design, refining tasks, and complex refactoring. I’ve added it as a PR reviewer and as an MCP tool for problems that require deep thought, and in these cases it’s outstanding.
Bottom line: GPT 5 is a good model, not quite the game changer it was pitched as, but definitely valuable in the right parts of your toolchain.