So it looks like Mr. “Not consistently candid” has been at it again?
I will admit that they got me with this one: I genuinely thought the FrontierMath results meant something real. I didn’t think they would be that brazen about rigging a benchmark that was explicitly advertised as being kept private so that AI companies couldn’t train on the questions. More fool me I guess.
I can’t believe they fucking got me with this one. I remember back in August(?) Epoch was getting quotes from top mathematicians like Terence Tao to review the benchmark, and he was quoted saying it would be a big deal for a model to do well on this benchmark, that it would be several years before a model could solve all these questions organically, etc. So when o3 dropped and posted a big jump over the previous SotA, people (myself included) were blown away. At the same time red flags were going up in my mind: Epoch was yapping about how this test was completely confidential and no one would get to see their very special test, so the answers wouldn’t get leaked. But then how in the hell did they evaluate this model on the test? There’s no way o3 was run locally by Epoch at ~$1000 a question -> OAI had to be given the benchmark to run against in house -> maybe they had multiple attempts against it and were tuning the model / recovering questions from API logs / paying mathematicians in house to produce answers to the problems so they could generate their own solution set??
No. The answer is much stupider. The entire company of Epoch ARE mathematicians working for OAI to produce marketing grift pumping the latest toy. They got me, lads; I drank the snake oil prepared specifically for people like me to drink :(
Tamay Besiroglu from Epoch AI says they were “restricted from disclosing the partnership” until the o3 launch. Their contract “specifically prevented us from disclosing information about the funding source and the fact that OpenAI has data access to much but not all of the dataset.”
If you had no problems with that contract, then I don’t trust your ethical judgment as a scientist.
absolutely; there’s no reason to hide the funding source and OpenAI’s access unless you’re grifting. I feel bad for the mathematicians working on FrontierMath who didn’t know, though. imagine wasting valuable time on something like this, then finding out it was all just a marketing stunt devised by grifters.
“has data access to much but not all of the dataset.”
Huh! I wonder which part of the dataset had the 25% of questions they got right in it 🙃
Besiroglu says OpenAI did have access to many of the FrontierMath problems and solutions — but he added “we have a verbal agreement that these materials will not be used in model training.”
It’s not like the company building a plagiarism tool would use said tool to plagiarize training data. That would be inconceivable.
ooh, a verbal agreement! incredible! altman & co didn’t even have to do the typical slimy corporate move of paying an intern to barely modify the original materials before feeding them into the training corpus, since that verbal agreement isn’t legally binding, and behind the scenes OpenAI can just go “oopsy woopsy, we swear it won’t happen again” and who’s gonna stop them?
That’s what I was thinking as well. All they have to do is look the other way.
Oops, I did it again, all the way
I fucking knew it!!! I don’t even know why I feel so vindicated for calling out such an obvious fraud tbh. Anyone, besides possibly an HN poster, could have seen it coming.