[CODE] People Also Ask Scraping with Python [Upgraded code from iam_ironman]

Ramazan (Regular Member, joined Aug 19, 2018)
I saw the code in @iam_ironman's post and upgraded it with a for loop over a keyword list, writing the results to a txt file line by line. Thanks also to @RealDaddy for the idea.

Python:
import requests
import lxml.html

header = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36',
        'Accept-Language': 'tr-tr,en;q=0.9',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9'
        }

# Read keywords (one per line) and write results to paa.txt; `with` closes both files
with open('YourPath/PeopleAlsoAsk/keywords.txt', 'r', encoding='utf-8') as file1, \
     open('YourPath/PeopleAlsoAsk/paa.txt', 'w', encoding='utf-8') as file2:
    for line in file1:
        query = line.strip()
        if not query:
            continue  # skip blank lines
        # params= URL-encodes the keyword (spaces, &, non-ASCII characters)
        response = requests.get('https://www.google.com/search',
                                params={'q': query, 'start': 0},
                                headers=header, timeout=10).text
        tree = lxml.html.fromstring(response)
        # Each People Also Ask question sits in a data-q attribute
        node = tree.xpath('//@data-q')
        # Take up to 3 questions; node[0], node[1], node[2] raised IndexError when fewer came back
        questions = node[:3]
        print(query)
        file2.write(query + ": " + " , ".join(questions) + "\n")

Create a keywords.txt file and write your keywords in it, one per line. The program saves its results to paa.txt.
Example paa.txt:
[screenshot: example paa.txt output]
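To check the XPath without sending requests to Google, the parsing step can be split out and run against a saved or hand-written HTML snippet. This is a sketch that assumes, like the script above, that each People Also Ask question is stored in a `data-q` attribute; the helper name `extract_questions` and the sample HTML are made up for illustration, and Google's real markup can change at any time.

```python
import lxml.html


def extract_questions(html, limit=3):
    """Return up to `limit` questions found in data-q attributes.

    Assumes (like the script in this thread) that Google puts each People Also Ask
    question in a data-q attribute. Returns fewer items, or [], when there are fewer.
    """
    tree = lxml.html.fromstring(html)
    return tree.xpath('//@data-q')[:limit]


# Hand-written sample standing in for a saved results page (hypothetical markup)
sample_html = """
<html><body>
  <div data-q="What is Python used for?"></div>
  <div data-q="Is Python easy to learn?"></div>
</body></html>
"""

questions = extract_questions(sample_html)
print("python: " + " , ".join(questions))
```

Saving one real response to a file and passing its contents to `extract_questions` is a quick way to see whether the `data-q` XPath still matches anything.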
 
Excellent share, I'm sure this will help many people.
 
Thanks for sharing! I just bought PAA software, but I think this one would be more powerful if we added multithreading and proxies. ggwp
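On the proxy idea: `requests.get` accepts a `proxies` dictionary that maps a URL scheme to a proxy URL, so one simple approach is to rotate through a list of proxies with `itertools.cycle`. This is only a sketch; the proxy addresses are placeholders and `next_proxies` is a hypothetical helper name. The lock is there because several threads may ask for the next proxy at the same time.

```python
import itertools
import threading

# Placeholder proxy URLs (hypothetical); replace with real ones
PROXY_LIST = [
    "http://user:pass@10.0.0.1:8080",
    "http://user:pass@10.0.0.2:8080",
]

_proxy_pool = itertools.cycle(PROXY_LIST)
_proxy_lock = threading.Lock()


def next_proxies():
    """Return a requests-style proxies dict for the next proxy in the rotation."""
    with _proxy_lock:  # serialize next() calls coming from different threads
        proxy = next(_proxy_pool)
    # Use the same proxy for both plain HTTP and HTTPS URLs
    return {"http": proxy, "https": proxy}


# Usage inside the keyword loop (not executed here):
# response = requests.get('https://www.google.com/search',
#                         params={'q': query}, headers=header,
#                         proxies=next_proxies(), timeout=10)
```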
 
Would you like to add multithreading and share it? :p
 
Thank you & great job

What do you mean by PAA software, though?
 
Python:
import queue
import threading

import requests
import lxml.html

header = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36',
        'Accept-Language': 'tr-tr,en;q=0.9',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9'
        }

# Thread-safe queue: each keyword is handed to exactly one thread
# (a shared global counter could give two threads the same line)
keywords = queue.Queue()
with open('keywords.txt', 'r', encoding='utf-8') as file1:
    for line in file1:
        if line.strip():
            keywords.put(line.strip())

write_lock = threading.Lock()  # one thread at a time writes to paa.txt


class myThread(threading.Thread):
    def __init__(self, threadID, name, out_file):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.out_file = out_file

    def run(self):
        print("Starting " + self.name)
        # Keep taking keywords until the queue is empty
        while True:
            try:
                query = keywords.get_nowait()
            except queue.Empty:
                break
            try:
                response = requests.get('https://www.google.com/search',
                                        params={'q': query, 'start': 0},
                                        headers=header, timeout=10).text
            except requests.RequestException as exc:
                print("Request failed for " + query + ": " + str(exc))
                continue
            tree = lxml.html.fromstring(response)
            node = tree.xpath('//@data-q')
            with write_lock:
                # Up to 3 questions; fewer (or none) no longer raises IndexError
                self.out_file.write(query + ": " + " , ".join(node[:3]) + "\n")
            print("Keyword Processed: " + query)
        print("Exiting " + self.name)


with open('paa.txt', 'w', encoding='utf-8') as file2:
    # Create new threads (widen the range for more threads)
    threads = [myThread(i, "Thread-" + str(i), file2) for i in range(1, 3)]
    # Start new threads, then wait for all of them to finish
    for t in threads:
        t.start()
    for t in threads:
        t.join()
Sure. This is an example using 2 threads. You can scale it up by adding more threads, or by creating them in a loop. :D
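Instead of creating each thread by hand, the standard library's `concurrent.futures.ThreadPoolExecutor` can run the per-keyword work on a pool of N threads. This is a sketch, not the poster's code: `fetch_questions` is a hypothetical stand-in that fakes the Google request so the example runs offline; in real use its body would be the `requests.get` plus `xpath('//@data-q')` part of the script.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_questions(query):
    """Hypothetical stand-in for the real request + XPath step.

    Returns (query, list_of_questions); here it just fakes one question.
    """
    return query, [f"What is {query}?"]


def scrape_all(keywords, max_workers=4):
    # executor.map runs fetch_questions on up to max_workers threads at once
    # and yields results in the same order as the input keywords
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fetch_questions, keywords))


keywords = ["python", "lxml", "requests"]
results = scrape_all(keywords, max_workers=2)

# Only the main thread writes output, so no lock is needed around the file
for query, questions in results:
    print(query + ": " + " , ".join(questions))
```

Changing the thread count is then a single `max_workers` argument rather than another hand-written thread object.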
 
thanks.
 
The ability to extract the answers would be good too.
 
I get this; any ideas about what is not okay?
Code:
Traceback (most recent call last):
  File "paascript.py", line 18, in <module>
    x = node[0]
IndexError: list index out of range
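The error comes from `x = node[0]`: when `tree.xpath('//@data-q')` matches nothing (for example when a results page has no People Also Ask box, or possibly when Google serves a different page such as a block or consent page), `node` is an empty list and indexing into it fails. Slicing never raises, so a defensive version can simply take whatever is there. A minimal illustration, with plain lists standing in for the XPath result; `first_three` is a hypothetical helper name:

```python
def first_three(nodes):
    """Return up to three items; slicing an empty or short list never raises."""
    return nodes[:3]


empty_result = []  # what xpath('//@data-q') returns when nothing matches
short_result = ["Question A", "Question B"]

try:
    empty_result[0]  # the same operation as `x = node[0]`
except IndexError as exc:
    print("indexing fails:", exc)

print(first_three(empty_result))   # []
print(first_three(short_result))   # ['Question A', 'Question B']
```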
 
Yeah, I actually have the same problem and am trying to fix it. Any idea? @typina

It only seems to handle the first 4-5 lines, I guess. @Madoxx I'll reply when I fix it.
 
Not all pages return 3 questions, so loop over whatever comes back:

Python:
import requests
import lxml.html

header = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36',
        'Accept-Language': 'tr-tr,en;q=0.9',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9'
        }

# `with` closes both files even if a request raises an exception
with open('YourPath/PeopleAlsoAsk/keywords.txt', 'r', encoding='utf-8') as file1, \
     open('YourPath/PeopleAlsoAsk/paa.txt', 'w', encoding='utf-8') as file2:
    for line in file1:
        query = line.strip()
        if not query:
            continue  # skip blank lines
        # params= URL-encodes the keyword
        response = requests.get('https://www.google.com/search',
                                params={'q': query, 'start': 0},
                                headers=header, timeout=10).text
        tree = lxml.html.fromstring(response)
        # Every question found, however many there are (possibly none)
        nodes = tree.xpath('//@data-q')
        print(query)
        # join() puts " , " only between questions, with no trailing separator
        file2.write(query + ": " + " , ".join(nodes) + "\n")
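Because the questions are joined with " , ", a question that itself contains a comma makes paa.txt hard to split back apart later. If the output is going to be processed further, the standard `csv` module handles the quoting automatically. This is a sketch with made-up rows; the `write_results` helper and a `paa.csv` file name are hypothetical.

```python
import csv
import io


def write_results(rows, out):
    """Write (keyword, questions) pairs as CSV: keyword first, then one column per question."""
    writer = csv.writer(out)
    for query, questions in rows:
        writer.writerow([query, *questions])


rows = [
    ("python", ["What is Python used for?", "Is Python hard, or easy, to learn?"]),
    ("lxml", []),  # a keyword with no questions still gets its own row
]

# io.StringIO stands in for open('paa.csv', 'w', newline='', encoding='utf-8')
buffer = io.StringIO()
write_results(rows, buffer)
print(buffer.getvalue())
```

The question containing commas is wrapped in double quotes in the output, so `csv.reader` splits each row back into exactly the original keyword and questions.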
 
Bookmarked. Thanks for sharing the upgraded version
 