Paranoid Android
Elite Member
- Jun 20, 2010
- 2,864
- 4,209
My 14-year-old autoblog fantasies have only started materializing recently, but they can be your reality today! No, you don't need to 'know' Python to run this script. All you need is an IDE on your local machine and a few libraries installed, and the script is ready to run. It can be scaled into a multiprocess monster that can handle 100s in parallel and make 10s of 1000s of posts with comments every day, so I'd recommend setting up a local server on VirtualBox and exporting the db later. Or talk to @HostStage, the best webhost on BHW. Vincent has been good enough to grant me some crazy permissions and limits on a shared account I have with him.
What the script does
Pulls the top posts from a subreddit, along with 75 to 125 top-level comments per post, and publishes them to WordPress. Posts and comments appear under the original authors' names with the original timestamps. Posts without a text body are skipped. With the current settings, the multiprocessing mode can do 200 posts with an average of 75 comments per post in about 16 seconds.
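To make "original timestamp" concrete: Reddit reports creation times as Unix epoch seconds in UTC, while the script sends WordPress a `date_gmt` string. Here is a minimal sketch of that conversion. The helper name `to_wp_date_gmt` is hypothetical and does not appear in the script, which does the same thing inline.

```python
from datetime import datetime, timezone


def to_wp_date_gmt(epoch_seconds):
    """Format a UTC epoch timestamp as 'YYYY-MM-DDTHH:MM:SS'."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime('%Y-%m-%dT%H:%M:%S')


# Epoch zero is midnight on 1 January 1970, UTC.
print(to_wp_date_gmt(0))
```

Using a timezone-aware `datetime` avoids any dependence on the local timezone of the machine that runs the script.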
Limitations:
1. Reddit allows only 100 requests per minute per account, so I recommend keeping the multiprocessing workers to a minimum, not more than 2x the number of cores you have. The script sleeps for 61 seconds after every 100 requests.
2. You're good if you host WordPress locally, on a VPS, or on a dedicated server. On shared hosting, your host might have a problem with the volume of requests. You might have to set up an SSH tunnel, or build the site locally, populate the db and move it to the server, or find a way to sync it in real time.
3. There is no retry logic: if you let it run unattended and the connection drops because of one of these limitations, the script stops working.
4. You will need an administrator account, and not an account with any lesser privileges.
5. It does not include a rewriter so the content is not unique.
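The pause-after-every-100-requests idea from limitation 1 can be isolated into a small counter. This is only a sketch under assumptions: the class name `RequestThrottle` and its parameters are made up for illustration, and the defaults simply mirror the 100 requests and 61 seconds mentioned above.

```python
import time


class RequestThrottle:
    """Pause for `pause_seconds` after every `batch_size` calls to tick()."""

    def __init__(self, batch_size=100, pause_seconds=61, sleep=time.sleep):
        self.batch_size = batch_size
        self.pause_seconds = pause_seconds
        self.sleep = sleep  # injectable, so the class can be tested without waiting
        self.count = 0

    def tick(self):
        """Record one request; sleep and return True when a batch is complete."""
        self.count += 1
        if self.count % self.batch_size == 0:
            self.sleep(self.pause_seconds)
            return True
        return False


if __name__ == '__main__':
    # Tiny demo with a short pause instead of the real 61 seconds.
    throttle = RequestThrottle(batch_size=3, pause_seconds=0.1)
    for n in range(1, 7):
        paused = throttle.tick()
        print(n, 'paused' if paused else 'ok')
```

Call `tick()` once per request. Note that the counter lives in one process, so with several workers each one would count separately and the combined rate could still exceed the limit.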
To Dos
1. If you don't have a Python environment or an IDE, set one up.
2. Copy the code and save it to a file with the extension .py
3. Type in your Reddit username, password, client ID and client secret. Find out how to make them here https://www.geeksforgeeks.org/how-t...nt_secret-for-python-reddit-api-registration/
4. Type in your WordPress username and Application Password. It won't work with your regular password. The guide to do that is here https://www.paidmembershipspro.com/create-application-password-wordpress/
5. If you need multiprocessing, set multiProcessFlag to True, but initially I'd recommend keeping it at False so your host doesn't get surprised.
6. You can enter one subreddit at a time, or list them one per line in sr.txt in the same folder as the code file.
7. Run the script.
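For step 6, the one-per-line format matters because each line read from a file still carries its trailing newline. Here is a small sketch of a loader that strips whitespace and skips blank lines. The function name `load_subreddits` is hypothetical and not part of the script.

```python
from pathlib import Path


def load_subreddits(path='sr.txt'):
    """Return subreddit names from a one-per-line file, skipping blank lines."""
    file = Path(path)
    if not file.exists():
        return []
    names = []
    for line in file.read_text(encoding='utf-8').splitlines():
        name = line.strip()  # drop surrounding spaces and stray line endings
        if name:
            names.append(name)
    return names


if __name__ == '__main__':
    print(load_subreddits())
```

An empty list covers both a missing file and an empty file, so the caller only has one case to check.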
Here is the code.
Python:
import praw, os, requests, time, random, base64, json, secrets, string
from concurrent.futures import ProcessPoolExecutor
from datetime import datetime
redditUsername = ""
redditPassword = ""
redditClientId = ""
redditClientSecret = ""
redditPullLimit = 200
redditUserAgent = "postConsolidator_1.0"
# Bare domain only, e.g. domainname.tld: no https://, no www, no trailing slash.
# If WordPress is located in a subfolder, include it, again without a trailing slash.
wpDomain = ''
wpUsername = 'admin'
wpApplicationPassword = ''
wpHasHTTPS = True
wpPostStatus = 'publish'
wpCommentStatus = 'approve'
multiProcessFlag = False
multiProcessMaxWorkers = 16
reddit = praw.Reddit(
    username=redditUsername,
    password=redditPassword,
    client_id=redditClientId,
    client_secret=redditClientSecret,
    user_agent=redditUserAgent,
)
# HTTP Basic auth header built from the WordPress username and Application Password
wpCredentials = f'{wpUsername}:{wpApplicationPassword}'
wpCredentialsToken = base64.b64encode(wpCredentials.encode())
wpHeader = {'Authorization': 'Basic ' + wpCredentialsToken.decode('utf-8')}
# REST API endpoints, using https or http depending on wpHasHTTPS
wpScheme = 'https' if wpHasHTTPS else 'http'
wpPostAPIURL = f'{wpScheme}://{wpDomain}/wp-json/wp/v2/posts/'
wpCommentsAPIURL = f'{wpScheme}://{wpDomain}/wp-json/wp/v2/comments/'
wpUsersAPIURL = f'{wpScheme}://{wpDomain}/wp-json/wp/v2/users/'
main_menu = '''
Howdy!
Hit
1: Input name of Subreddit manually
2: Import from a list of subreddits
3: Exit
Wachawannado? : '''
def post2wp(submission, i):
    # Submissions without a text body have nothing to post.
    if str(submission.selftext) == '':
        print('nothing to post')
        return

    # Create a WordPress subscriber named after the Reddit author.
    wpUserName = str(submission.author).lower().replace('_', '').replace('-', '')
    passwordChars = string.ascii_letters + string.digits + '!@#$%^&*()_+=-`~;:"|?><,./'
    wpUserData = {
        'username': wpUserName,
        'email': wpUserName + '@' + wpDomain,
        'password': ''.join(secrets.choice(passwordChars) for _ in range(16)),
        'roles': ['subscriber'],
    }
    wpUserResponse = requests.post(url=wpUsersAPIURL, headers=wpHeader, json=wpUserData)
    try:
        wpPostAuthorID = int(json.loads(wpUserResponse.text)['id'])
    except (KeyError, ValueError):
        # No usable 'id' in the response, so fall back to user ID 2.
        print(f'{submission.author}, not created')
        print(wpUserResponse.text)
        wpPostAuthorID = 2
    print('Author ID', wpPostAuthorID)

    wpPostData = {
        'title': submission.title,
        'status': wpPostStatus,
        'content': submission.selftext,
        'date_gmt': datetime.utcfromtimestamp(submission.created_utc).strftime('%Y-%m-%dT%H:%M:%S'),
        'author': wpPostAuthorID,
    }
    wpPostResponse = requests.post(url=wpPostAPIURL, headers=wpHeader, json=wpPostData)
    print('wppostresponse', wpPostResponse)
    try:
        wpPostID = json.loads(wpPostResponse.text)['id']
    except (KeyError, ValueError):
        print('Bad Response, skipping post')
        return

    # Resolve the "load more comments" placeholders before iterating.
    submission.comments.replace_more(limit=None)
    commentLimit = random.randint(75, 125)  # pick the cap once per post
    commentCount = 0
    for topLevelComment in submission.comments:
        if commentCount >= commentLimit:
            break
        if topLevelComment is None:
            continue
        wpCommentData = {
            'post': wpPostID,
            'author_name': str(topLevelComment.author),
            'author_email': str(topLevelComment.author) + '@' + wpDomain,
            'content': topLevelComment.body,
            'status': wpCommentStatus,
            'author_ip': '.'.join(str(random.randint(0, 255)) for _ in range(4)),
            'date_gmt': datetime.utcfromtimestamp(topLevelComment.created_utc).strftime('%Y-%m-%dT%H:%M:%S'),
        }
        wp_comment_response = requests.post(url=wpCommentsAPIURL, headers=wpHeader, data=wpCommentData)
        commentCount += 1
        print(i + 1, commentCount, wp_comment_response)
    print(f'Item no. {i + 1} posted, with {commentCount} comments and title {submission.title} and post content \n\n {submission.selftext} \n\n {wpPostResponse}')
if __name__ == '__main__':
    startTime = time.perf_counter()
    try:
        wachawannado = int(input(main_menu))
    except ValueError:
        print('\nPlease enter a number')
        wachawannado = None  # avoids a NameError in the checks below
    if wachawannado == 1:
        subreddit = reddit.subreddit(input("Ok, lets crawl reddit.\nWhats the name of the subreddit? : ").strip())
        with ProcessPoolExecutor(max_workers=multiProcessMaxWorkers) as executor:
            for i, submission in enumerate(subreddit.top(limit=redditPullLimit)):
                if multiProcessFlag:
                    executor.submit(post2wp, submission=submission, i=i)
                else:
                    post2wp(submission=submission, i=i)
    elif wachawannado == 2:
        if os.path.exists('sr.txt'):
            if not os.stat("sr.txt").st_size == 0:
                with open('sr.txt', 'r') as sr:
                    with ProcessPoolExecutor(max_workers=multiProcessMaxWorkers) as executor:
                        for i, line in enumerate(sr):
                            name = line.strip()  # drop the trailing newline
                            if not name:
                                continue  # skip blank lines
                            subreddit = reddit.subreddit(name)
                            pulled = 0
                            for j, submission in enumerate(subreddit.top(limit=redditPullLimit)):
                                pulled = j + 1
                                if multiProcessFlag:
                                    executor.submit(post2wp, submission=submission, i=j)
                                    # Pause after every 100 submissions, not before the first one.
                                    if pulled % 100 == 0:
                                        print('yawn')
                                        time.sleep(61)
                                        print('Sorry, dozed off')
                                else:
                                    post2wp(submission=submission, i=j)
                            print(f'{i}:{name} done, pulled {pulled} submissions from {name}')
            else:
                print('sr.txt is empty')
        else:
            print('Cant find sr.txt with the list of SubReddits')
    elif wachawannado == 3:
        print('Bye!')
    elif wachawannado is not None:
        print('\nHit a valid number\n')
    print('Total time for this loop is ', time.perf_counter() - startTime)
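As limitation 3 says, the script stops when a connection drops. One common way to soften that is to retry with a growing delay. The sketch below is not part of the script: the helper `with_retries` and its parameters are hypothetical, and it uses Python's built-in exception types so that it runs without any extra libraries.

```python
import time


def with_retries(action, attempts=3, base_delay=2.0,
                 retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call action(); on a listed exception wait and retry, doubling the delay."""
    for attempt in range(attempts):
        try:
            return action()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts, let the caller see the error
            sleep(base_delay * (2 ** attempt))


if __name__ == '__main__':
    outcomes = iter([ConnectionError('dropped'), 'ok'])

    def flaky():
        # Fails once, then succeeds, to show a single retry.
        result = next(outcomes)
        if isinstance(result, Exception):
            raise result
        return result

    print(with_retries(flaky, base_delay=0.1))
```

With the requests library, the exceptions to pass as `retry_on` would be `requests.exceptions.ConnectionError` and `requests.exceptions.Timeout`, since they do not inherit from the built-in types used as defaults here. A call such as `requests.post(...)` can then be wrapped in a lambda and handed to `with_retries`.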
I request you to not PM me or reach out on chat if you have questions. Please post them here and I will be happy to answer them, and so will the other members who find this script useful.
Happy Blogging!