How to use GPT-3?

So, I need AI-generated non-English content. I tried some of the more prominent AI tools, but they left much to be desired. So I thought I might use the OpenAI API, set it up on my laptop, and feed it some datasets in my language. Considering that I have no experience with Python, is this feasible time- and money-wise? I've seen conflicting information: some say that you need a gazillion examples in the dataset to teach the AI to do anything remotely useful, while others say that you need only a few hundred examples using one of the OpenAI models.

Have any of you tried anything like this? Any suggestions or comments? Maybe a link to some step-by-step guide that could be followed easily by someone with no experience or knowledge in the field?
 
I've seen conflicting information: some say that you need a gazillion examples in the dataset to teach the AI to do anything remotely useful, while others say that you need only a few hundred examples using one of the OpenAI models.
The answer is: yes and no. For tasks like classification or keyword extraction, you only need a few hundred examples to get decent results.

Text generation, however, is a difficult task. You could feed a model endless amounts of data and it still wouldn't generate text to your liking. Fine-tuning a single model with larger and larger datasets gives diminishing returns.
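To make "a few hundred examples" concrete: the fine-tuning format OpenAI currently accepts is just a JSONL file of prompt/completion pairs. A minimal sketch of a classification dataset in that shape (the review texts and labels here are invented for illustration):

```python
import json

# Invented sentiment-classification examples; a real dataset would
# have a few hundred rows in this same prompt/completion shape.
examples = [
    {"prompt": "Great product, works as advertised ->", "completion": " positive"},
    {"prompt": "Broke after two days, waste of money ->", "completion": " negative"},
]

# Write the JSONL training file that OpenAI's fine-tuning endpoint accepts.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```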

Have any of you tried anything like this? Any suggestions or comments? Maybe a link to some step-by-step guide that could be followed easily by someone with no experience or knowledge in the field?
I have fine-tuned a few "small" models (datasets of less than 1 million tokens / ~750,000 words). If you are training for text generation, instead of a single "long-form content" model, train multiple small models that each have a specific task.

Like one model for H2s, one for intro paragraphs, one for content paragraphs, and so on.

This is how large text-gen tools like Jasper probably work. Because whatever you do, a single model starts repeating itself after 700-1,000 tokens.
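A rough sketch of what composing those task-specific models could look like, using the openai Python client; the fine-tuned model IDs here are hypothetical placeholders:

```python
import openai  # legacy 0.x client: pip install openai

openai.api_key = "sk-..."  # your API key

# Hypothetical fine-tuned model IDs, one small model per task.
MODELS = {
    "h2": "davinci:ft-yourorg:h2-headings",
    "intro": "davinci:ft-yourorg:intro-paragraphs",
    "body": "davinci:ft-yourorg:content-paragraphs",
}

def generate(task, prompt, max_tokens=200):
    # Keep each completion short, well under the 700-1,000 token
    # range where a single model starts to repeat itself.
    resp = openai.Completion.create(
        model=MODELS[task],
        prompt=prompt,
        max_tokens=max_tokens,
        temperature=0.7,
    )
    return resp.choices[0].text.strip()

topic = "home coffee roasting"
heading = generate("h2", f"Topic: {topic}\nHeading:")
intro = generate("intro", f"Topic: {topic}\nIntro paragraph:")
body = generate("body", f"Heading: {heading}\nParagraph:")
print(f"{intro}\n\n{heading}\n\n{body}")
```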

And I am not sure how GPT-3 performs on languages other than English. The dataset it was trained on (The Pile) is multilingual, but I believe it's mostly English.
 
I have fine-tuned a few "small" models (datasets of less than 1 million tokens / ~750,000 words). If you are training for text generation, instead of a single "long-form content" model, train multiple small models that each have a specific task.
What programming language should I learn to be able to do these kinds of tasks?
 
Considering that I have no experience with Python, is this feasible time- and money-wise?
Maybe yes, maybe no. Fine-tuning GPT-3 through OpenAI is easy: you just call an API and upload your dataset. There are no-code solutions available for this.

However, preparing a good dataset itself may require real programming expertise. You will need to scrape some data, clean it, wrangle it, and export it to a format OpenAI accepts.

There will likely be no-code solutions available for that too. Not sure how helpful they would be, though.
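For the scrape-clean-export step, a bare-bones Python sketch (the URLs are placeholders, the cleaning is deliberately minimal, and it assumes each page has an h1 title):

```python
import json
import re

import requests
from bs4 import BeautifulSoup  # pip install requests beautifulsoup4

# Placeholder URLs; in practice, pages in your target language.
urls = ["https://example.com/article-1", "https://example.com/article-2"]

rows = []
for url in urls:
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("h1").get_text(strip=True)
    body = " ".join(p.get_text(" ", strip=True) for p in soup.find_all("p"))
    body = re.sub(r"\s+", " ", body)  # crude whitespace cleanup
    rows.append({"prompt": f"Title: {title}\nArticle:", "completion": " " + body})

# Export to the JSONL format the fine-tuning endpoint accepts.
with open("dataset.jsonl", "w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")
```

From there, uploading and starting a fine-tune is roughly a one-liner with the openai CLI, e.g. openai api fine_tunes.create -t dataset.jsonl -m davinci.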
 
I'm a bit confused now. They say that Jasper is built on having "read" 10% of the Internet. Others also say (even in this thread) that it could take a lot of time and effort to train the AI to write even somewhat decent content. But this book claims that anybody can produce content as good as Jasper's from the comfort of their kitchen.

What's the catch?

EDIT: this message was a response to someone linking to a thread where an e-book is being sold which claims to teach you how to build an AI model that produces content as good as Jasper's. But the message I quoted got deleted.
 
So, I need AI-generated non-English content. I tried some of the more prominent AI tools, but they left much to be desired. So I thought I might use the OpenAI API, set it up on my laptop, and feed it some datasets in my language. Considering that I have no experience with Python, is this feasible time- and money-wise? I've seen conflicting information: some say that you need a gazillion examples in the dataset to teach the AI to do anything remotely useful, while others say that you need only a few hundred examples using one of the OpenAI models.

Have any of you tried anything like this? Any suggestions or comments? Maybe a link to some step-by-step guide that could be followed easily by someone with no experience or knowledge in the field?

You should try BLOOM, a model similar to OpenAI's Davinci. However, BLOOM has been trained on data in many other languages (about 50), and it should outperform GPT-3 in non-English languages.
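If you want to try it without any API, here is a minimal sketch using Hugging Face transformers with the small 560M checkpoint (the full 176B model won't fit on a laptop, so this is just for getting a feel for it):

```python
# pip install transformers torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "bigscience/bloom-560m"  # small BLOOM variant that runs on CPU
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

# BLOOM was trained on dozens of natural languages, so non-English prompts work.
prompt = "Écris un court paragraphe sur le café :"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```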

The answer is: yes and no. For tasks like classification or keyword extraction, you only need a few hundred examples to get decent results.

Text generation, however, is a difficult task. You could feed a model endless amounts of data and it still wouldn't generate text to your liking. Fine-tuning a single model with larger and larger datasets gives diminishing returns.


I have fine-tuned a few "small" models (datasets of less than 1 million tokens / ~750,000 words). If you are training for text generation, instead of a single "long-form content" model, train multiple small models that each have a specific task.

Like one model for H2s, one for intro paragraphs, one for content paragraphs, and so on.

This is how large text-gen tools like Jasper probably work. Because whatever you do, a single model starts repeating itself after 700-1,000 tokens.

And I am not sure how GPT-3 performs on languages other than English. The dataset it was trained on (The Pile) is multilingual, but I believe it's mostly English.

Most AI models repeat themselves after 700-1,000 tokens, but not all. It depends a LOT on the quality and size of the dataset they were trained on.
PS: The Pile is the open-source language-modelling dataset used by EleutherAI to train GPT-J and GPT-NeoX. OpenAI uses a different dataset.
 
Why is nobody selling datasets for an OpenAI GPT-3 article generator?

I've been searching around; there are almost none.

People should start selling them.

1. Dataset XXXX - this dataset generates Wikipedia-style articles: one keyword in, a complete set with title and subheadings out.
2. Dataset BBBB - this dataset generates news-style articles: one keyword in, a complete set with title and subheadings out.
(They would need to scrape the content from various sites as samples.)

Purchase one, upload the dataset, and it's ready to use.
 
Check out this e-book that was just released in the marketplace. It will teach you how to use and train GPT-3: https://www.blackhatworld.com/seo/stop-wasting-money-on-ai-writers-train-and-fine-tune-your-own-ai-for-free-with-no-code-real-method-practice-examples.1424188/
Thank you for the positive thought :).

As for the OP:

My e-book would help you accomplish your goals. It's not free, but it does bring quality to the table :)
 
Thank you for the positive thought :).

As for the OP:

My e-book would help you accomplish your goals. It's not free, but it does bring quality to the table :)
Yes, I already saw that and will consider it, thank you. In case you offer review copies, I would gladly take one and write a detailed review comparing your book with what I found on the Internet over the past few days, during which I was struggling with precisely this issue. :)
 
Why is nobody selling datasets for an OpenAI GPT-3 article generator?

I've been searching around; there are almost none.

People should start selling them.

1. Dataset XXXX - this dataset generates Wikipedia-style articles: one keyword in, a complete set with title and subheadings out.
2. Dataset BBBB - this dataset generates news-style articles: one keyword in, a complete set with title and subheadings out.
(They would need to scrape the content from various sites as samples.)

Purchase one, upload the dataset, and it's ready to use.
I have the same question.
 