Rethinking how we measure AI intelligence (blog.google)
Posted by PlanterTree@discuss.tchncs.de to Artificial Intelligence@lemmy.world · English · 2 days ago · 1 comment
Let the AIs play games against each other. Is the resulting leaderboard more precise than benchmarks?
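For anyone wondering how a leaderboard falls out of head-to-head games: a rough sketch using plain Elo updates. This is an assumption on my part, since the post does not say which rating system is used; the model names, the K-factor of 32 and the starting rating of 1500 are all made up for illustration.

```python
from collections import defaultdict


def expected_score(rating_a, rating_b):
    """Elo expected score of player A against player B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def build_leaderboard(games, k=32.0, start=1500.0):
    """Replay games in order and return (name, rating) pairs, best first.

    Each game is (player_a, player_b, score_a), where score_a is
    1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    """
    ratings = defaultdict(lambda: start)
    for player_a, player_b, score_a in games:
        exp_a = expected_score(ratings[player_a], ratings[player_b])
        delta = k * (score_a - exp_a)  # what A gains, B loses
        ratings[player_a] += delta
        ratings[player_b] -= delta
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


# Hypothetical results between three made-up models.
games = [
    ("model_x", "model_y", 1.0),
    ("model_y", "model_z", 0.5),
    ("model_x", "model_z", 1.0),
]
for name, rating in build_leaderboard(games):
    print(f"{name}: {rating:.1f}")
```

Whether this ends up more precise than a benchmark depends on how many games are played and how noisy each game is; with only a handful of games the ordering can easily flip.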
PlanterTree@discuss.tchncs.de (OP) · English · 2 days ago (edited)
The funny thing: AFAIK, LLMs are terrible at chess compared with purpose-trained chess AIs. https://dynomight.net/more-chess/
They often suggest illegal chess moves.
You are a chess grandmaster. You will be given a partially completed game. After seeing it, you should choose the next move. Use standard algebraic notation, e.g. “e4” or “Rdf8” or “R1a3”. NEVER give a turn number. NEVER explain your choice.
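As a rough illustration of what checking a reply could look like, here is a sketch that only tests whether the text has the shape of a move in standard algebraic notation. The regex and the function name `looks_like_san` are my own, not from the linked post, and this does not check legality: that needs the actual board position, for which a chess library such as python-chess would be the usual route.

```python
import re

# Shape of a single SAN move: castling, or an optional piece letter,
# optional disambiguating file/rank, optional capture, target square,
# optional promotion, optional check or mate marker.
SAN_PATTERN = re.compile(
    r"^(O-O(-O)?|[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](=[QRBN])?)[+#]?$"
)


def looks_like_san(reply):
    """Return True if the reply is syntactically a single SAN move.

    This is a format check only; a well-formed move can still be illegal.
    """
    return SAN_PATTERN.match(reply.strip()) is not None


for reply in ["e4", "Rdf8", "R1a3", "12. e4", "I would play e4"]:
    print(repr(reply), looks_like_san(reply))
```

The last two replies fail because they contain a turn number or an explanation, which is exactly what the prompt forbids.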