Reasoning failures highlighted by Apple research on LLMs (appleinsider.com)
Timely_Jellyfish_2077@programming.dev to Technology@lemmy.world · English · 27 days ago · 59 comments · cross-posted to: [email protected]

Rimu@piefed.social · 27 days ago (edited)
I tried it myself (changing the name and changing the values) but lost interest after 3 attempts and always getting the right answer:
https://chatgpt.com/share/670af65d-da08-800f-8ad4-c67782ee5477
https://chatgpt.com/share/670af672-45dc-800f-ac91-cc2811fa89c7
https://chatgpt.com/share/6709e80b-e5a8-800f-90d0-1af3418675ef

A_A@lemmy.world · 27 days ago
Errors from your links like this:
Unable to load conversation 670a…6ed2c

Sorry! I’ve updated my links now.

A_A@lemmy.world · 27 days ago
“… So, Mary has 190 kiwifruit.”
nice 😋🥝

tinsukE@lemmy.world · 27 days ago
I wouldn’t doubt that LLMs got some special input to deal with the specific examples of this paper, or similar enough.

alienanimals@lemmy.world · 26 days ago
This is just improving LLMs, but with more steps.