As if everyone knows how to train their own model?
You're out of luck then
GPT3 is the largest and most accurate model (davinci) and it cost them between $4.6m and $12m to train. How can anyone here possibly build anything better? Not even talking about the salaries the geniuses get. They didn't receive a $1B investment for no reason.
Nope, not really.
GPT3 is just one of the largest by parameter count. OPT-66B is open source and is roughly 1/3 the size. OPT-175B is available but you have to apply to get it. YaLM-100B is open source.
But it's a misunderstanding of machine learning to say any model is the "most accurate". GPT3 is just the best for general tasks where you don't fine-tune. gpt-neox-20b will outperform it when fine-tuned on a task. I'm using a combination of open source models for question answering, and the output is at least on par with GPT3 in terms of writing and is more factually accurate.
A larger parameter count in the base model doesn't mean the fine-tuned model will be better. That's another common misunderstanding in machine learning.
You are not creating a base model from scratch. @splishsplash is referring to training a model ON TOP of a base model. Simple rewriting tasks do not require a huge model like Davinci/NeoX. Instructions on how to do so can be found online in various places, and there are also vendors offering access to GPT-J (and smaller models) at monthly flat fees.
Exactly. Creating your own base models could be advantageous at an advanced level, but you don't need to train gpt-j, gpt-neox, opt-66b, or any of the BERT models yourself. You just load them, fine-tune them on a data set, and then use them.
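For anyone wondering what "load, fine-tune, use" looks like in practice, here is a rough sketch using the Hugging Face `transformers` and `datasets` libraries. Treat it as an illustration under assumptions, not a recipe: the `format_pair` and `fine_tune` helpers, the "Original/Rewrite" prompt layout, the output directory, and the hyperparameters are all made up for the example, and `gpt2` is only a small stand-in so it fits on modest hardware. A GPT-J or NeoX checkpoint would need far more GPU memory and probably extra tricks.

```python
def format_pair(source: str, rewrite: str) -> str:
    """Turn one (original, rewritten) pair into a single training string.

    The prompt layout is a hypothetical choice for a rewriting task.
    """
    return f"Original: {source}\nRewrite: {rewrite}"


def fine_tune(pairs, base_model="gpt2", output_dir="finetuned-model"):
    """Fine-tune an existing causal language model on (original, rewrite) pairs.

    `base_model` is an assumption: "gpt2" is a small stand-in for a larger
    open model. Nothing here trains a base model from scratch.
    """
    # Heavy imports live inside the function so format_pair works without them.
    from datasets import Dataset
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )

    # Load the pretrained tokenizer and weights.
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    if tokenizer.pad_token is None:
        # GPT-2 style tokenizers ship without a padding token.
        tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(base_model)

    # Build and tokenize the training set.
    texts = [format_pair(source, rewrite) for source, rewrite in pairs]
    dataset = Dataset.from_dict({"text": texts}).map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
        batched=True,
        remove_columns=["text"],
    )

    # mlm=False gives plain next-token (causal) language modelling labels.
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
    args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=1,               # illustrative value only
        per_device_train_batch_size=2,    # illustrative value only
    )
    trainer = Trainer(
        model=model, args=args, train_dataset=dataset, data_collator=collator
    )
    trainer.train()

    # Save the fine-tuned weights and tokenizer for later use.
    trainer.save_model(output_dir)
    tokenizer.save_pretrained(output_dir)
    return output_dir
```

You would call it with something like `fine_tune([("old sentence", "rewritten sentence"), ...])` and then load the result from `output_dir` the same way you loaded the base model. How much data you need and how long it takes depends heavily on the task and the model size.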
Maybe, and hopefully, @splishsplash can direct us to some resources so we can do it properly (and cheaply).
https://huggingface.co/course/chapter1/1
And to learn the required math and machine learning, a good place to start is https://brilliant.org/. It helps to have a good foundation in calculus, vector calculus and linear algebra.
This is a good resource too: https://machinelearningmastery.com/start-here/
How do you build your own model? Do you mean fine-tuning something like GPT-J or NeoX?
I'm not saying it's not possible, just pretty hard without a PhD or without hiring someone who knows a lot about ML, and these guys tend to charge quite a bit of money. Plus the GPU time.
GPUs are cheap. It's about $3/hr to rent an A100 80GB.
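To put that rate in perspective, here is a quick back-of-the-envelope calculation. The $3/hr figure and the $4.6m lower estimate for GPT3 both come from this thread. The 8-GPU, 24-hour fine-tuning run is purely a hypothetical example, not a measured figure, and the function names are made up for illustration.

```python
A100_USD_PER_HOUR = 3.0  # rental rate quoted in this thread


def gpu_cost(hours: float, gpus: int = 1, rate: float = A100_USD_PER_HOUR) -> float:
    """Total rental cost in USD for `gpus` GPUs running for `hours` each."""
    return hours * gpus * rate


def gpu_hours_for_budget(budget_usd: float, rate: float = A100_USD_PER_HOUR) -> float:
    """How many single-GPU hours a budget buys at the given hourly rate."""
    return budget_usd / rate


# Hypothetical fine-tuning run: 8 GPUs for one day.
print(gpu_cost(24, gpus=8))  # 576.0

# The lower GPT3 training estimate from this thread, expressed in GPU-hours.
print(round(gpu_hours_for_budget(4_600_000)))  # 1533333
```

So a day-long multi-GPU fine-tune lands in the hundreds of dollars at that rate, while training a base model at GPT3 scale would be over a million GPU-hours.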
You don't need a Ph.D. That won't help you at all. A Ph.D. is a research degree. You don't learn anything you need for this. University is slow as fuck. You spend 4 years dicking around.
You just need to learn calculus, vector calculus and linear algebra to understand machine learning.

