GPT-3 Has A New Open Source Competitor - GPT-J-6B

steve123_

AI writing tools are a common discussion topic here on BHW; @MisterF periodically mentions that he uses Jarvis as part of his content creation process, and @jonnyah has also mentioned using GPT-3 to generate content for one of his websites.

The problem I have run into with many of the GPT-3-based writing tools on the market is that most do not offer API access. Those that do usually charge usage fees on top of the monthly subscription price, which discourages me from trying mass-automation experiments.

Fortunately, there is a group of researchers developing open-source GPT-3 alternatives that are available to everyone on GitHub. This group, EleutherAI, is best known for its GPT-3-like model GPT-Neo. GPT-Neo is built on the transformer architecture, the same type of machine learning model OpenAI uses for GPT-3 and Google AI uses for its language models. I was only able to find a few mentions of GPT-Neo on BHW; I would guess that is because GPT-Neo underperformed compared to OpenAI's GPT-3.

[Image: GPT-Neo vs. GPT-3 benchmark comparison chart]


This chart is taken from the official GPT-Neo GitHub page. It is sorted from the least performant model on linguistic-reasoning tasks to the most performant. To put it simply, GPT-Neo performs worse than even the smallest and cheapest OpenAI GPT-3 model, Ada. Refer to the image below to see the four models that OpenAI offers, each with different capabilities and pricing.

[Image: OpenAI's four GPT-3 models with capabilities and pricing]


The EleutherAI team recently released a model called GPT-J-6B, which has caught my attention. Connor Leahy, one of the founding members of EleutherAI, said his team believes the new GPT-J-6B model is "sized similar to the Curie model of OpenAI", with around six billion parameters. Parameters are the values the model learns during training; generally speaking, more parameters means a more capable model. Take a look at this video comparing GPT-2, GPT-3, and GPT-J-6B responses to general questions, and read the video description for important details about how the test was run.


In the video, GPT-J-6B outperforms both GPT-2 and GPT-3; however, it is important to remember that this only shows GPT-J-6B's ability on general questions. The EleutherAI team has an official demo page where you can try out the model's content generation capabilities. I have found that you really need to play with the TOP-P and Temperature sliders to get good outputs; these two settings seem to make the difference between gibberish and readable content. Take a look at the images below.

[Image: demo output with the default TOP-P (0.8) and Temperature settings]


These outputs use the default TOP-P and Temperature settings. They are clearly inaccurate and somewhat random; Romney and Obama are most definitely not the current presidents of the United States.

[Image: demo output with TOP-P lowered to 0.77]


Changing only the TOP-P value from 0.8 to 0.77 produces a much better output. I did not run multiple trials or a proper experiment, but these two sliders clearly play a big part in determining output quality, something the official GitHub pages note as well. The answer about Trump is unfortunately out of date, which I would guess comes down to the data the model was trained on.
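
For anyone who wants to try this outside the web demo: below is a minimal sketch of my own (not from EleutherAI's docs), assuming the Hugging Face transformers library and its EleutherAI/gpt-j-6B checkpoint. The top_p and temperature arguments correspond to the TOP-P and Temperature sliders on the demo page; the prompt and values are just the ones discussed above, not recommendations.

Code:
# Sketch: sampling from GPT-J-6B with explicit top-p and temperature settings.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
# Loading the full fp32 weights needs roughly 24 GB of RAM; see the
# float16 note further down the thread for a lighter option.
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

prompt = "Who is the current president of the United States?"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

output = model.generate(
    input_ids,
    do_sample=True,      # sample instead of greedy decoding
    top_p=0.77,          # nucleus sampling cutoff (the "TOP-P" slider)
    temperature=0.8,     # sampling randomness (the "Temperature" slider)
    max_new_tokens=60,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))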

The most appealing thing about GPT-J-6B is that it is open source. Since you can build this model into your content generation workflow, and even fine-tune it on your own data if you have sufficient resources, I really do think this is something the more code-savvy users on BHW should take a look at.

If any of you guys have ideas about possible implementations and projects, I would love to hear them! AI is so fascinating and I am so interested in the bright future it has.
 
It still makes stupid texts, like a 5-year-old boy. Only good for doorways.
 
This stuff requires an insane amount of RAM to even form a simple sentence. And I mean, 25 GB of RAM isn't enough to tell you "happy birthday". Not sure who uses this, but probably people with insanely powerful machines or a lot of money to spend on servers that can run this stuff. Not useful for the regular folk.
 
This stuff requires an insane amount of RAM to even form a simple sentence. And I mean, 25 GB of RAM isn't enough to tell you "happy birthday". Not sure who uses this, but probably people with insanely powerful machines or a lot of money to spend on servers that can run this stuff. Not useful for the regular folk.
I was going to say this - getting this code up and running in any useful way will take insanely powerful servers, making for a high barrier to entry for startups.

That said, it's nice to see the technology being opened up; it will certainly help level the playing field a bit going forward for companies that can afford the power needed to run it.
 
I was going to say this - getting this code up and running in any useful way will take insanely powerful servers, making for a high barrier to entry for startups.

That said, it's nice to see the technology being opened up; it will certainly help level the playing field a bit going forward for companies that can afford the power needed to run it.

Btw, I said "people with insanely powerful machines", but I believe no regular human being has a machine powerful enough to run this thing. If a very small sentence takes more than 25 GB of RAM, how much RAM do you need to create a paragraph? 10 TB of RAM? The entire Google infrastructure? I'm actually very curious about the people who use this stuff. Do you buy special servers for it? Like 100 servers to write a 10-line text? I really can't do shit with this. And even if I had all that money, how the hell do I monetize this thing to justify the investment? I'm probably just too stupid, but it's really head-scratching to me.
 
Btw, I said "people with insanely powerful machines", but I believe no regular human being has a machine powerful enough to run this thing. If a very small sentence takes more than 25 GB of RAM, how much RAM do you need to create a paragraph? 10 TB of RAM? I'm actually very curious about the people who use this stuff. Do you buy special servers for it? Like 100 servers to write a 10-line text? I really can't do shit with this. And even if I had all that money, how the hell do I monetize this thing to justify the investment? I'm probably just too stupid, but it's really head-scratching to me.
Nope, not stupid - this is definitely not something you'd run for fun or to crank out content for a few websites. Only businesses will be able to benefit from this as it stands, and not startups. I'm talking businesses that are ready to shell out thousands a month out of the gate just for this piece of tech - never mind the staff, marketing, etc. But yeah, you'd pretty much have to sell the output as a business model to justify the costs, and even then you'd be losing money for a while until you built up a recurring revenue stream for it. OpenAI removes these costs and lowers the barrier to entry, so they'll be the standard for a while.
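
For what it's worth, the RAM numbers above roughly match the full-precision weights: six billion parameters at 32-bit is about 24 GB. Loading the half-precision checkpoint instead drops that to roughly 12 GB, which a single 24 GB consumer GPU can hold. A hedged sketch of that, assuming the transformers library and the float16 branch of the EleutherAI/gpt-j-6B repo:

Code:
# Sketch: loading GPT-J-6B in half precision to roughly halve its memory use.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    revision="float16",          # half-precision branch of the checkpoint
    torch_dtype=torch.float16,   # keep the weights in fp16 in memory (~12 GB)
    low_cpu_mem_usage=True,      # avoid holding a second copy in CPU RAM while loading
).to("cuda")                     # assumes a CUDA GPU with enough VRAM is available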
 
A member from here turned me on to this and I've been messing around with it. You can host the model yourself on Google TPU Research Cloud, and there are other places hosting the model with API access.

It's really good. Tuning the parameters can bring a lot of different results, but for original content it's sweet. You'll have to proofread and make corrections here and there, but you can whip up an original, well-written 500-word article in under 5 minutes.
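
In case it helps anyone, here is a rough sketch of that kind of article workflow using the transformers text-generation pipeline against a self-hosted GPT-J-6B checkpoint; the hosted APIs will look different, and the prompt and settings here are purely illustrative:

Code:
# Sketch: drafting a ~500-word article with GPT-J-6B, to be proofread afterwards.
from transformers import pipeline

generator = pipeline("text-generation", model="EleutherAI/gpt-j-6B")

prompt = "Write an informative article about growing tomatoes indoors.\n\n"
article = generator(
    prompt,
    do_sample=True,
    top_p=0.9,
    temperature=0.8,
    max_new_tokens=700,   # roughly 500 words of output
)[0]["generated_text"]
print(article)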
 
This is the future right here. I'm 99% certain MANY sites ranking top right now are auto-generating their content.
 
Don't expect perfect content. You will need to do some editing. Their accuracy and relevancy depend on the amount of existing content for the particular topic you need. On the good side, these tools can save you hours and hours of research.
 
You can host the model yourself on Google TPU Research Cloud, and there are other places hosting the model with API access.
Is the Google TPU cloud free, or how much does it cost to use this on it? And what are the other places hosting the model with API access?

And OP, how many words can we generate on their official page before they tell us to stop?
 
Is the Google TPU cloud free, or how much does it cost to use this on it? And what are the other places hosting the model with API access?
For Google TPU you can sign up for a free 30-day evaluation.
I'm currently trying out these guys here:
Code:
https://hub.getneuro.ai/model/nlp/gpt-j-6B-text-generation
 
Nice Article!
 
I had the opportunity to test a piece of GPT-J-6B software and its speed is amazing. Better than the AI tool I am actively paying for.
 
I had the opportunity to test a piece of GPT-J-6B software and its speed is amazing. Better than the AI tool I am actively paying for.

Mind me asking which one?
 
If we just want to build an in-house GPT model using gpt-neox, I wonder what type of hardware is required apart from an RTX 3090.

I did a bit of research, and it turns out this is impossible for a small business like mine. This is industrial-level development.
 
So can anyone recap or give us a TL;DR of the steps to use GPT-J-6B for article production, and the related costs?
 