
How do I? a = b + c in Python.

Discussion in 'General Programming Chat' started by elavmunretea, Jan 22, 2017.

  1. elavmunretea (Elite Member)

    Hi there,

    For my first Python project, I made a web scraper (like a sitemap).

    I then decided to modify it into a range of Instagram bots. The one I'm currently setting up scrapes people who posted with a specific hashtag.

    I have it fully working, but the user has to enter the full URL every time, whereas I only want them to enter the hashtag.

    The code I currently have looks like this:
    Code:
    tag = input("Please enter a Tag:")
    url = "https://www.instagram.com/explore/tags/" + tag
    but it doesn't work. I have tried a lot of things, like:
    Code:
    tag = input("Please enter a Tag:")
    tagurl = "https://www.instagram.com/explore/tags/"
    url = ['tagurl', 'tag']   # this makes a two-element list of the strings 'tagurl' and 'tag', not a URL
    And a bunch of other stuff, changing the spacing and so on, but I can't get it to work.
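    A minimal sketch of the concatenation, assuming Python 2 (which the later code in this thread is, given the print statements and the urlparse/mechanize imports). Under Python 2, input() evaluates whatever the user types, so raw_input() is usually the call you want here:
    Code:
    # Sketch, assuming Python 2: raw_input() returns the typed text as a string,
    # and .strip() removes any stray leading/trailing whitespace.
    tag = raw_input("Please enter a Tag: ").strip()
    url = "https://www.instagram.com/explore/tags/" + tag
    print url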

    I would really appreciate the help on this one. I could even pay if you really want.
     
    • Thanks x 1
  2. bartosimpsonio (Jr. VIP, Premium Member)

    The return from input probably includes the newline character. Are you cleaning up the return so it has no invisible stuff after it?
     
  3. elavmunretea (Elite Member)

    Something like tag.strip() ?
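    For reference, str.strip() with no arguments removes leading and trailing whitespace, including any trailing newline:
    Code:
    # str.strip() drops leading/trailing whitespace, newlines included
    tag = "cats\n"
    print repr(tag.strip())   # prints 'cats'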
     
  4. bartosimpsonio (Jr. VIP, Premium Member)

  5. elavmunretea (Elite Member)

    It's really weird.

    Before, it was giving me an error, but now
    Code:
    tag = input("Please enter a Tag:")
    url = "https://www.instagram.com/explore/tags/" + tag
    works fine. Maybe there was an error with saving the output to a file or something, I don't know.

    Anyway, thanks for the help. All that's left to do is a simple GUI now :)
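    If it helps, a bare-bones Tkinter window with a Stop button might be enough to start from. This is just a sketch; stop_scrape is a placeholder name, not part of the script above:
    Code:
    # Hypothetical GUI sketch using Tkinter from the standard library (Python 2 spelling).
    import Tkinter as tk

    def stop_scrape():
        # in the real script this would end the crawl and kick off the cleanup step
        print "Stop clicked"

    root = tk.Tk()
    root.title("Instagram Scraper")
    tk.Button(root, text="Stop", command=stop_scrape).pack(padx=20, pady=20)
    root.mainloop()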
     
    • Thanks x 1
  6. bartosimpsonio (Jr. VIP, Premium Member)

    Nice. You using Scrapy?
     
  7. elavmunretea (Elite Member)

    Just something from a YouTube tutorial I watched on scraping OBLs (outbound links). The code looks like this:
    Code:
    # Python 2 code (note the print statements and the urlparse module)
    import urllib
    from bs4 import BeautifulSoup
    import urlparse
    import mechanize
    import csv

    tag = input("Please enter a Tag:")
    url = "https://www.instagram.com/explore/tags/" + tag
    #url = "https://www.instagram.com/explore/tags/example"
    br = mechanize.Browser()
    urls = [url]      # crawl queue
    visited = [url]   # everything already seen, to avoid revisiting
    while len(urls) > 0:
        try:
            br.open(urls[0])
            urls.pop(0)
            for link in br.links():
                # resolve relative links and rebuild a clean host + path URL
                newurl = urlparse.urljoin(link.base_url, link.url)
                b1 = urlparse.urlparse(newurl).hostname
                b2 = urlparse.urlparse(newurl).path
                newurl = "http://" + b1 + b2

                # only follow unseen links on the same host
                if newurl not in visited and urlparse.urlparse(url).hostname in newurl:
                    urls.append(newurl)
                    visited.append(newurl)
                    print newurl

        except:
            print "error"
            urls.pop(0)

    Then I remove all lines containing /p/ (picture links), /explore/ (other hashtags), /locations/, "error" (written by the error handler in the script), and other non-user URLs like:
    http://www.instagram.com/accounts/
    http://www.instagram.com/about/
    http://www.instagram.com/press/
    http://www.instagram.com/developer/
    http://www.instagram.com/legal/privacy/
    http://www.instagram.com/legal/terms/
    http://www.instagram.com/about/directory/
    http://www.instagram.com/download/instagram/

    I'm setting it up to change dynamically depending on whether you want to scrape followers, likers, posters to a hashtag, posters to a location, etc.

    It's a good first main project, as there's scope to add other useful things like CSV output, proxy rotation, multi-threading, etc.
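    As a rough illustration of the CSV idea only (not the poster's code), the csv module that's already imported could dump the visited list once the crawl loop finishes; the filename and header are arbitrary placeholders:
    Code:
    # Sketch: write the collected URLs to a CSV file after the crawl above ends.
    # "visited" is the list built by the crawler; 'wb' is the right mode for Python 2's csv.
    with open(tag + '.csv', 'wb') as f:
        writer = csv.writer(f)
        writer.writerow(['url'])          # header row
        for u in visited:
            writer.writerow([u])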
     
    • Thanks x 1
  8. MoneyEagle (Regular Member)

    Great going man!
     
  9. bartosimpsonio (Jr. VIP, Premium Member)

    Pretty cool stuff. Will play with this later ;)
     
  10. elavmunretea (Elite Member)

    Thanks.
    Yeah it's really interesting!

    It's 3am here, so I don't have time to add a GUI today.

    Here's the code so far:

    Code:
    import urllib
    from bs4 import BeautifulSoup
    import urlparse
    import mechanize
    import csv

    print("What would you like to scrape from?")
    print("a) Hashtag")
    print("b) Other")
    answer = input("Make your choice: ")

    if answer == "a":
        urlb = input("Please enter a Tag:")
        urla = "https://www.instagram.com/explore/tags/"
    else:
        print("Coming Soon")
        raise SystemExit   # exit here, otherwise urla/urlb would be undefined below

    url = urla + urlb
    file = open(urlb + '.txt', 'w')   # output file named after the tag
    br = mechanize.Browser()
    urls = [url]      # crawl queue
    visited = [url]   # everything already seen
    while len(urls) > 0:
        try:
            br.open(urls[0])
            urls.pop(0)
            for link in br.links():
                newurl = urlparse.urljoin(link.base_url, link.url)
                b1 = urlparse.urlparse(newurl).hostname
                b2 = urlparse.urlparse(newurl).path
                newurl = "http://" + b1 + b2

                if newurl not in visited and urlparse.urlparse(url).hostname in newurl:
                    urls.append(newurl)
                    visited.append(newurl)
                    file.write(newurl + "\n")
        except:
            file.write("error\n")
            urls.pop(0)

    file.close()

    print "Complete"

    I added file output with a custom name dependent on the tag. I was going to do locations as well, but unfortunately each location has a number (i.e. ig.com/location/1234/newyork) and I haven't found a way to get that easily.

    I set up this script, which will be activated when the user clicks "Stop" on the GUI (which I'll make tomorrow):


    Code:
    bad_words = ['error', '/p/', 'explore', 'locations', 'accounts', 'about', 'press', 'developer', 'legal', 'download']

    with open(urlb + '.txt') as oldfile, open(urlb + 'final.txt', 'w') as newfile:
        for line in oldfile:
            if not any(bad_word in line for bad_word in bad_words):
                newfile.write(line)

    Then I will remove the "https://instagram.com/" from the start and the "/" from the end.
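    A rough sketch of that last cleanup step (note the crawler above actually writes http://www.instagram.com/... URLs, so that is the prefix assumed here):
    Code:
    # Sketch: reduce each filtered line to a bare username.
    # The crawler writes URLs like http://www.instagram.com/username/
    with open(urlb + 'final.txt') as f:
        usernames = [line.strip()
                         .replace("http://www.instagram.com/", "")
                         .rstrip("/")
                     for line in f]
    print usernames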
     
    • Thanks x 1
  11. ѕмarтgυy (Newbie)

    Better to do it like this:
    Code:
    tag = input("Please enter a Tag:")
    url = "https://www.instagram.com/explore/tags/{}".format(tag)
     
    • Thanks x 3
  12. gman777 (Jr. VIP)

    So you use BeautifulSoup to scrape data. Nice, man. This really motivated me to get back into Python.

    The concepts used in your program don't seem difficult to understand.

    I was planning to create a bot that scrapes the first 10 Google results for a particular keyword and checks whether the keyword appears in the title, description, permalink, etc., plus other factors, and then turns those values into my own metric... yeah, that would be cool.
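    A rough sketch of the scoring part of that idea (fetching the actual Google results is left out; the weights and the example result are placeholders):
    Code:
    # Sketch of the keyword-scoring idea only, in the same Python 2 style as the thread.
    # Assume each result is a (title, description, url) tuple obtained elsewhere.
    def score_result(keyword, title, description, url):
        keyword = keyword.lower()
        score = 0
        if keyword in title.lower():
            score += 3        # weights are arbitrary placeholders
        if keyword in description.lower():
            score += 2
        if keyword in url.lower():
            score += 1
        return score

    results = [("Example keyword title", "Example description", "http://example.com/keyword")]
    for title, desc, link in results:
        print link, score_result("keyword", title, desc, link)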

    Anyway, thanks...
     
    • Thanks x 1