Grok3 Think mode is impressive. Since ChatGPT came out 27 months ago, I have tested every model release with my own test: a Roman game that 1) is hard to draw and 2) has multiple states.
Both challenges have been intractable for models so far. Grok3 did it in 3 prompts, one of which was for an esoteric bug I found by chance. When I asked for a version where the game plays against the human user, it not only produced a good version in one prompt but also beautified it in a creative way.
Feel free to ask me for the code.