Meet ‘Smaug-72B’: The new king of open-source AI

ylai@lemmy.ml · 5 months ago


noneabove1182@sh.itjust.works · 5 months ago

Stop making me want to buy more graphics cards…

Seriously though, this is an impressive result. “Beating” GPT-3.5 is a huge milestone, and I love that we’re continuing the trend. I’ll need to try out a quant of this to see how it does in real-world usage. Hope it gets added to the LMSYS arena!

Mixtral@sh.itjust.works · 4 months ago

Open training code too?

h3ndrik@feddit.de · edit-2 4 months ago

I don’t get your question. I think their contribution isn’t training a model from scratch, but a new DPO loss function for fine-tuning. You can read about that in their paper, which is open-access. The model itself is a fine-tune of MoMo-72B-lora-1.8.7-DPO, which is based on Qwen-72B. The respective models have their own papers and GitHub repos. If your question is about the dataset, that is answered in Appendix D of the paper.

https://github.com/abacusai/smaug

(This is the repo they link with the statement “We release our code and pretrained models […]”. I can’t find a ready-made Python script there (yet), but their method and contribution to DPO seem to be described in the paper. Everything looks pretty open to me. They even describe their dataset. But it’s a scientific paper with a small improvement to fine-tuning, accompanied by a model to show off the statistics, not a software release.)
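For anyone unfamiliar with DPO, here is a rough sketch of the standard DPO loss for a single preference pair, in plain Python. This is the baseline objective, not the modified loss the paper proposes (read the paper for that). The function name, the argument names, and the default `beta=0.1` are illustrative assumptions, not taken from their repo.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response under
    either the model being fine-tuned (policy) or the frozen reference model.
    beta=0.1 is an assumed, commonly used default.
    """
    # How much more (or less) likely each response became relative to the reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # The loss is -log(sigmoid(margin)): it shrinks as the policy favors
    # the chosen response over the rejected one more than the reference does.
    margin = beta * (chosen_logratio - rejected_logratio)

    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference agree exactly, the margin is zero and the loss is log(2). Fine-tuning pushes it below that by raising the chosen response's log-ratio relative to the rejected one.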

llm@sh.itjust.works · 4 months ago

It is awesome to have such models open-sourced and competing with ChatGPT-4, but the main reason people still prefer closed-source ChatGPT is that it gives the model access to the internet. Is there any open model that has this now?

h3ndrik@feddit.de · edit-2 4 months ago

Sure. I think what you’re looking for are “AI agents” or “RAG” (Retrieval Augmented Generation). It’s not the model itself that does it, but the framework and software around it that provides the internet search capabilities.

And it’s not unique to ChatGPT. It has also been available to open-weight / local models for years. I’ve lost track of all the frameworks we have and what features they offer, so I don’t know what to recommend. But you can look it up; there are several frameworks that provide such capabilities to any model. Maybe your preferred solution even has a plugin available.

The underlying method is probably the same as what ChatGPT and all the others use internally. It’s also related to how companies have fed internal data to their chatbots ever since all of this started.
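As a rough illustration of that method, here is a minimal retrieve-then-prompt sketch in Python. Everything in it is an assumption for demonstration: the word-overlap retriever stands in for a real web search or vector index, and `generate` is a placeholder for whatever call runs your model. It is not how any particular framework implements it.

```python
from typing import Callable, List


def retrieve(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Toy retriever: rank documents by how many words they share with the query."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(doc.lower().split())), doc)
              for doc in documents]
    # Highest overlap first; documents with no shared words are dropped.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query: str, passages: List[str]) -> str:
    """Paste the retrieved passages into the prompt ahead of the question."""
    context = "\n".join(f"- {passage}" for passage in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


def answer(query: str, documents: List[str],
           generate: Callable[[str], str]) -> str:
    """Retrieve relevant text, then hand the augmented prompt to the model."""
    passages = retrieve(query, documents)
    return generate(build_prompt(query, passages))
```

The model itself never touches the internet here. The surrounding code fetches text and places it in the prompt, which is why the same trick works with any local model.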

You can try one example on hf.co/chat. Another solution would be something like h2ogpt. It can do web search, index all of the PDFs on your hard drive and answer questions about them, and handle vision tasks such as looking at pictures or generating them.