db0@lemmy.dbzer0.com to TechTakes@awful.systems · English · 6 months ago
The Google AI isn’t hallucinating about glue in pizza, it’s just over indexing an 11 year old Reddit post by a dude named fucksmith.
255 comments
dumbass · 6 months ago · ↑88
It’s not gonna be legislation that destroys AI, it’s gonna be decade-old shitposts that destroy it.

MalachaiConstant@lemmy.world · 6 months ago · ↑24
Everyone who neglected to add the “/s” has become an unwitting data poisoner.

anton@lemmy.blahaj.zone · 5 months ago · ↑2
Corollary: Everyone who added the /s is a collaborator of the data-scraping AI companies.

Philippa Cowderoy@mendeddrum.org · 6 months ago · ↑1
@MalachaiConstant @dumbass I’d be interested to know how few corpus linguists are actually doing LLM research.

Match!!@pawb.social · 6 months ago · ↑20
Well, now I’m glad I didn’t delete my old shitposts.

Jonathan Hendry@iosdev.space · 6 months ago · ↑16
@dumbass @db0 I suppose we should be glad that they aren’t training on old 4chan/8chan posts.

…yet

Jonathan Hendry@iosdev.space · 6 months ago · ↑6
@harrys_balzac Posts there are expired and deleted over time, so unless someone’s made an effort to archive them, they’re gone. Of course, the AI people could hoover up new horrible posts.

nickwitha_k (he/him)@lemmy.sdf.org · 6 months ago · ↑7
I would be surprised if someone hasn’t been scraping it for years.

SynopsisTantilize@lemm.ee · 6 months ago · ↑9
**Moe.archive and 4chan archive have entered the chat.**

weker01@feddit.de · 6 months ago · ↑5
Yeah, there are multiple 4chan archives…

Panda (he/him)@lemmy.dbzer0.com · 6 months ago · ↑9
Every answer would either be the smartest shit you’ve ever read or the most racist shit you’ve ever read.