[Python] Multiprocessing - 'Manager' Not Working

Discussion in 'Programming' started by apex1, Apr 13, 2018.

  1. apex1

    apex1 Junior Member

    Joined:
    May 29, 2015
    Messages:
    196
    Likes Received:
    154
    I'm trying to have a basic counter in the 'test' function that increments for each iteration (list1 item).

    I can't get it to work and have been trying for hours. Any ideas what I'm doing wrong?

    Code:
    from multiprocessing import Pool, Manager
    
    
    def test(current_item, counter): 
    
        counter = counter + 1
        print(counter)
    
        print(current_item)
    
    
    if __name__ == '__main__':
    
        list1 = ["item1",
                 "item2",
                 "item3",
                 "item4",
                 "item5",
                 "item6",
                 "item7",
                 "item8",
                 "item9",
                 "item10",
                 "item11",
                 "item12"]
    
        counter = Manager().Value(0)
    
        p = Pool(4)  # worker count
        p.map(test, list1)  # (function, iterable)
        p.terminate()
        p.join()
    
     
  2. Gazo

    Gazo Newbie

    Joined:
    Apr 12, 2018
    Messages:
    4
    Likes Received:
    4
    I think I understand what you are trying to do. There is probably an easier way, but sticking to your approach, I came up with this:

    Code:
    from multiprocessing import Pool, Manager
    
    def test(obj):
        counter, item = obj
        counter_val = counter.get()
        counter.set(counter_val + 1)
        print(item, counter_val)   
        return
    
    if __name__ == '__main__':
        list1 = ["item1",
                "item2",
                "item3",
                "item4",
                "item5",
                "item6",
                "item7",
                "item8",
                "item9",
                "item10",
                "item11",
                "item12"]
    
        counter = Manager().Value(int, 0)
        p = Pool(4)  # worker count
        array = [(counter, item) for item in list1]
        p.map(test, array)
        p.terminate()
        print(counter.value)  # print the number itself rather than the proxy object
    
    
    You were missing the type argument for the counter object: Value takes a typecode first and the initial value second. Your call to p.map also never passed the counter to test. The most confusing part of my code is probably how I build the 'array' variable: it is a list of tuples, each holding the counter object and one list1 item. I pass those tuples to test because p.map hands the function exactly one argument per call.
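
    One caveat, offered as a sketch rather than a definitive fix: reading the counter and then writing it back are two separate calls, so two workers can read the same value before either one stores the increment. Assuming that matters for your use case, a Lock created by the same manager can guard the pair. The names bump, worker and run below are made up for illustration and are not part of multiprocessing.

```python
from multiprocessing import Manager, Pool


def bump(counter, lock):
    """Read and increment the shared counter as one step; return the old value."""
    with lock:  # only one process at a time may run the read-then-write pair
        current = counter.value
        counter.value = current + 1
    return current


def worker(obj):
    # p.map passes a single argument, so unpack the tuple built in run()
    counter, lock, item = obj
    position = bump(counter, lock)
    print(item, position)


def run(items, workers=4):
    """Process items in a pool and return the final counter value."""
    with Manager() as manager:
        counter = manager.Value('i', 0)  # typecode first, initial value second
        lock = manager.Lock()            # proxy lock that can be sent to workers
        jobs = [(counter, lock, item) for item in items]
        with Pool(workers) as pool:
            pool.map(worker, jobs)
        return counter.value


if __name__ == '__main__':
    total = run(["item%d" % n for n in range(1, 13)])
    print("final count:", total)
```

    The order of the printed lines can still vary from run to run, because the workers finish in whatever order the scheduler allows. The lock only ensures that no increment is lost.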
     
  3. apex1

    apex1 Junior Member

    Holy crap.. you're the best. It works perfectly!! :D

    You have no idea how long it would have taken me to figure that out.

    I searched around forever and couldn't find any tutorials showing your method.

    Cheers!!
     
  4. Gazo

    Gazo Newbie

    Not a problem. Any time you need some Python help, just hit me up.
     
  5. apex1

    apex1 Junior Member

    Need your help bro. I think I'm almost there but can't quite get it.

    I'm taking scraped URLs I want processed, adding them to a dataframe with Pandas, and trying to pass that through map in the array to use within my 'scraper' function.

    I need the dataframe within scraper function because the counter will let me fill the scraped data into the right table cell.

    Part of my problem is I don't know where to create the dataframe or how to manage it properly inside the function.

    Here's the code:

    Code:
    from multiprocessing import Lock, Pool, Manager
    from time import sleep
    from bs4 import BeautifulSoup
    import pandas as pd
    import re
    import requests
    
    
    exceptions = []
    lock = Lock()
    
    
    def scraper(obj):  # obj is the array passed from map (counter, url items)
    
        counter, url = obj  # not sure what this does
    
        df.insert(1, 'Alexa Rank:', "")  # insert new column
        df.insert(2, 'Status:', "")  # insert new column
    
        lock.acquire()
    
        counter_val = counter.get()
    
        try:
    
            scrape = requests.get(url,
                                  headers={"user-agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36"},
                                  timeout=10)
    
            if scrape.status_code == 200:
    
                """ --------------------------------------------- """
                # ---------------------------------------------------
                '''      --> SCRAPE ALEXA RANK: <--    '''
                # ---------------------------------------------------
                """ --------------------------------------------- """
    
                sleep(0.1)
                scrape = requests.get("http://data.alexa.com/data?cli=10&dat=s&url=" + url,
                                      headers={"user-agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36"})
                html = scrape.content
                soup = BeautifulSoup(html, 'lxml')
    
                rank = re.findall(r'<popularity[^>]*text="(\d+)"', str(soup))
    
                df.iloc[counter_val, 0] = url  # fill cell with URL data
                df.iloc[counter_val, 1] = rank  # fill cell with alexa rank
    
                counter.set(counter_val + 1)  # increment counter
    
                print("Server Status:", scrape.status_code, '-', u"\u2713", '-', counter_val, '-', df.iloc[counter_val, 0], '-', "Rank:", rank[0])
    
            else:
                print("Server Status:", scrape.status_code)
                df.iloc[counter_val, 2] = scrape.status_code  # fill df cell with server status code
                counter.set(counter_val + 1)
                pass
    
        except BaseException as e:
            exceptions.append(e)
            print("Exception:", e)
            df.iloc[counter_val, 2] = e  # fill df cell with script exception message
            counter.set(counter_val + 1)
            pass
    
        finally:
            lock.release()
            df.to_csv("output.csv", index=False)
            return
    
    
    if __name__ == '__main__':
    
        """ --------------------------------------------- """
        # ---------------------------------------------------
        '''               GET LINK LIST:                  '''
        # ---------------------------------------------------
        """ --------------------------------------------- """
    
        # paste the list1 = [...] link list from the pastebin here:
        # https://pastebin.com/h42wqJPp
    
        df = pd.DataFrame(list1, columns=["Links:"])  # create pandas dataframe from links list
    
        """ --------------------------------------------- """
        # ---------------------------------------------------
        '''               MULTIPROCESSING:         '''
        # ---------------------------------------------------
        """ --------------------------------------------- """
    
        counter = Manager().Value(int, 0)  # set counter as manager with value of 0
        array = [(counter, url) for url in df]  # link together the counter and list in an array ---------------------------------------- ***** ERROR - not adding links to array correctly *****
        print("Problem here, it's not adding all the links to array", array)
    
        p = Pool(20)  # worker count
        p.map(scraper, array)  # function, iterable
        p.terminate()
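
    A note on the array line, followed by a hedged sketch of one possible restructuring. Iterating over a DataFrame yields its column labels, so "for url in df" produces only 'Links:'. Iterating over df["Links:"] (or over list1 directly) yields the URLs. Separately, each pool worker runs in its own process, so cells written to df inside scraper do not reach the parent's copy. One way around both issues, assuming the goal is simply one output row per URL, is to have each worker return its row and build the table afterwards. p.map returns results in input order, so no counter is needed. In the sketch, fetch_status is a hypothetical stand-in for the two requests.get calls so that the example runs offline, and scrape_all is likewise a made-up name.

```python
from multiprocessing import Pool


def fetch_status(url):
    """Hypothetical stand-in for the requests.get calls in the post.

    Returns (status_code, rank) without touching the network.
    """
    return 200, str(len(url))


def scraper(url):
    """Build one output row for one URL; no shared state is touched."""
    row = {"Links:": url, "Alexa Rank:": "", "Status:": ""}
    try:
        status, rank = fetch_status(url)
        row["Status:"] = status
        if status == 200:
            row["Alexa Rank:"] = rank
    except Exception as e:  # record the failure in the row instead of crashing
        row["Status:"] = str(e)
    return row


def scrape_all(urls, workers=4):
    """Run scraper over urls in a pool; rows come back in input order."""
    with Pool(workers) as p:
        return p.map(scraper, urls)


if __name__ == '__main__':
    list1 = ["https://example.com", "https://example.org"]  # placeholder links
    rows = scrape_all(list1)
    for row in rows:
        print(row)
    # With pandas installed, pd.DataFrame(rows).to_csv("output.csv", index=False)
    # would write these rows out as the table described in the post.
```

    Because nothing is shared between workers here, the Lock and the Manager counter can be dropped, and the requests run in parallel instead of one at a time behind the lock.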