Let the AIs play games against each other. Is the resulting leaderboard more precise than benchmarks?

  • PlanterTree@discuss.tchncs.de (OP)
    2 days ago

    The funny thing: AFAIK, LLMs are terrible at chess compared with purpose-trained chess AIs. https://dynomight.net/more-chess/

    They often suggest illegal moves.

    You are a chess grandmaster. You will be given a partially completed game. After seeing it, you should choose the next move. Use standard algebraic notation, e.g. “e4” or “Rdf8” or “R1a3”. NEVER give a turn number. NEVER explain your choice.
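    A rough sketch of how a reply to a prompt like this could be screened before it is counted as a move. The regex and the `looks_like_san` helper are my own assumptions, not anything from the linked post. It only checks that the reply has the shape of standard algebraic notation (no turn number, no explanation); it does not know the board, so it cannot tell whether a move is actually legal. That would need a real move generator.

```python
import re

# Shape-only check for standard algebraic notation (SAN).
# Hypothetical helper: it validates the format, NOT legality on a board.
SAN_RE = re.compile(
    r"(?:O-O(?:-O)?"                          # castling, either side
    r"|[KQRBN][a-h]?[1-8]?x?[a-h][1-8]"       # piece move, optional disambiguation/capture
    r"|[a-h](?:x[a-h])?[1-8](?:=[QRBN])?)"    # pawn move, optional capture/promotion
    r"[+#]?"                                  # optional check or mate marker
)

def looks_like_san(reply: str) -> bool:
    """Return True if the whole reply is a single SAN-shaped move."""
    return SAN_RE.fullmatch(reply.strip()) is not None

if __name__ == "__main__":
    for reply in ["e4", "Rdf8", "R1a3", "12. e4", "e4, controlling the center"]:
        print(repr(reply), looks_like_san(reply))
```

    A reply that passes this filter can still be an illegal move in the actual position, which is the failure mode mentioned above.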