RDSM@discuss.tchncs.de to Technology@lemmy.world • AI struggles to understand human history and fails miserably when tested (29 days ago)
“Among the tested models, GPT-4 Turbo ranked highest with 46% accuracy, while Llama-3.1-8B scored the lowest at 33.6%.” Have they tested actual SOTA models?
RDSM@discuss.tchncs.de to Technology@lemmy.world • Stop using generative AI as a search engine (3 months ago)
No.