Opus 4.5 failed half my coding tests, despite bold claims. File handling glitches made basic plugin testing nearly impossible. Two tests passed, but reliability issues still dominate the story. I've got ...
OpenAI's new GPT-5 flagship failed half of my programming tests. Previous OpenAI releases have had just about perfect results. Now that OpenAI has enabled fallbacks to other LLMs, there are options.