• 0 Posts
  • 75 Comments
Joined 2 years ago
Cake day: June 13th, 2023

  • “Up until the semi finals, it seemed like nothing would be able to stop Grok 4 on its way to winning the event,” Pedro Pinhata, a writer for Chess.com, said in its coverage. “Despite a few moments of weakness, X’s AI seemed to be by far the strongest chess player… But the illusion fell through on the last day of the tournament.” He said Grok’s “unrecognizable” and “blundering” play enabled o3 to claim a succession of “convincing wins”.

    I think the main takeaway is that these models are fundamentally inconsistent, and you can never assume they’re going to be reliable based on past performance.

  • AbouBenAdhem@lemmy.world to Fediverse@lemmy.world: *Permanently Deleted*
    English · 4 months ago

    One metric you might want to add is the network effect: how much of a difference does it make to the user experience to join a large instance (or the same instance most of your friends are on) compared to a small or self-hosted one? (Or in other words—does the nature of the platform software potentially incentivize consolidation?)