
How To Extract Data From A List Of URLs

Discussion in 'BlackHat Lounge' started by princeofmtl, Oct 24, 2016.

  1. princeofmtl

    princeofmtl Newbie

    Joined:
    Nov 28, 2015
    Messages:
    8
    Likes Received:
    0
    I have a list of 2,000+ URLs that have only a small amount of text data in them. Is there a way to automate the process and put the data into a spreadsheet? Any input would be greatly appreciated, I have been looking everywhere!
    Thank you!
     
  2. BlackBDO

    BlackBDO Jr. VIP Jr. VIP

    Joined:
    Jan 4, 2016
    Messages:
    513
    Likes Received:
    323
    Creating a small bot for this should cost you max $5-7.
     
  3. tb303

    tb303 Senior Member

    Joined:
    Dec 18, 2011
    Messages:
    804
    Likes Received:
    480
    What does the URL structure look like?

    If it's something like url.com/page.php?param1=text1&param2=text2&etc.... then you could do it quickly with Excel.

    Open the file in Notepad, search and replace all "?" with "&", and save it as a temp file. Then open that temp file in Excel, select Other as the delimiter, and put "&" in the box. You should then get each parameter as its own column, which looks like
    Code:
    param1=text1     param2=text2
    You could also search and replace all "=" with "&" before importing, to get it like this
    Code:
    param1     text1     param2     text2
    For other URL structures, you can try different search/replaces to make the text easy to split, e.g. on "/". Just make sure the character you replace with cannot appear in the text you want to extract.
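
    If the data really is in the query string, the same split can be sketched in Python instead of Notepad and Excel. This is only a rough sketch under that assumption; the function names url_params_to_rows and to_csv are made up for illustration, and the sample URL is the placeholder from the post above.

```python
import csv
import io
from urllib.parse import urlsplit, parse_qsl


def url_params_to_rows(urls):
    """Return (header, rows): one row per URL, one column per query parameter."""
    # parse_qsl splits "a=1&b=2" into pairs; dict() keeps the last value
    # if a parameter name is repeated within one URL.
    parsed = [dict(parse_qsl(urlsplit(u).query, keep_blank_values=True)) for u in urls]

    # Collect parameter names in first-seen order so the columns stay stable.
    names = []
    for params in parsed:
        for name in params:
            if name not in names:
                names.append(name)

    # Missing parameters become empty cells.
    rows = [[u] + [params.get(n, "") for n in names] for u, params in zip(urls, parsed)]
    return ["url"] + names, rows


def to_csv(urls):
    """Render the table as CSV text that a spreadsheet can open."""
    header, rows = url_params_to_rows(urls)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder URL in the shape described in the post.
    print(to_csv(["url.com/page.php?param1=text1&param2=text2"]))
```

    Unlike the search-and-replace route, this keeps the parameter names as column headers and leaves a blank cell when a URL lacks a parameter.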
     
  4. nobodyelsein

    nobodyelsein Regular Member

    Joined:
    Mar 24, 2014
    Messages:
    393
    Likes Received:
    110
    Occupation:
    Dunce
    Location:
    This corporeal plane
    I imagine ScrapeBox could do this quite easily?

    Is this just a one-off or an ongoing project?
     
  5. thetrustedzone

    thetrustedzone Jr. VIP Jr. VIP

    Joined:
    Jun 15, 2010
    Messages:
    2,507
    Likes Received:
    2,032
    Home Page:
  6. HoNeYBiRD

    HoNeYBiRD Jr. VIP Jr. VIP

    Joined:
    May 1, 2009
    Messages:
    7,424
    Likes Received:
    8,362
    Gender:
    Male
    Occupation:
    Geographer, Tourism Manager
    Location:
    Ghosted
    It can even be done with iMacros, looping through your list of URLs.
    I just put this macro together really quickly:

    HTML:
    VERSION BUILD=10022823
    TAB T=1
    TAB CLOSEALLOTHERS
    SET !ERRORIGNORE NO
    SET !DATASOURCE c:\urls.csv
    SET !LOOP 1
    SET !DATASOURCE_LINE {{!LOOP}}
    SET !EXTRACT_TEST_POPUP NO
    ADD !EXTRACT {{!COL1}}
    URL GOTO={{!COL1}}
    TAG POS=1 TYPE=HTML ATTR=* EXTRACT=TXT
    SAVEAS TYPE=EXTRACT FOLDER=c:\ FILE=data.csv
    Instructions:
    Create a urls.csv file in the root of your c: drive (or change the location of the data source in the macro) and put one URL per line in it.
    Play the macro in loop mode. If you have 2,000 URLs, put 2000 in the loop counter.
    A data.csv file will be saved to the root of your c: drive, with the URLs in the first column and the corresponding extracted data in the second column.

    The macro will extract everything that is on a page, ads too, so there might be redundant data.
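
    For anyone without iMacros, roughly the same loop can be sketched in plain Python. This is not a port of the macro, just a comparable approach: it reads one URL per line, downloads each page, and writes the URL plus the page text as two columns. The file names urls.csv and data.csv are borrowed from the instructions above; the names extract_text, fetch_page, scrape, and save_csv are made up for illustration. It assumes the pages are static HTML (no JavaScript rendering) and decode as UTF-8.

```python
import csv
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collects text from a page, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def extract_text(html):
    """Return all text on the page as one space-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def fetch_page(url):
    """Download a page; assumes UTF-8 and replaces undecodable bytes."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def scrape(urls, fetch=fetch_page):
    """Yield [url, text] per URL; a failed fetch is recorded, not fatal."""
    for url in urls:
        try:
            text = extract_text(fetch(url))
        except Exception as exc:
            text = f"ERROR: {exc}"
        yield [url, text]


def save_csv(rows, out_path):
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


if __name__ == "__main__":
    # urls.csv: one URL per line, as in the macro instructions.
    with open("urls.csv", encoding="utf-8") as f:
        urls = [line.strip() for line in f if line.strip()]
    save_csv(scrape(urls), "data.csv")
```

    Like the macro, this grabs all the text on the page, so the second column will still need trimming if you only want one part of it.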
     
  7. princeofmtl

    princeofmtl Newbie

    Joined:
    Nov 28, 2015
    Messages:
    8
    Likes Received:
    0
    Where can I find someone to make me a bot?
     
  8. princeofmtl

    princeofmtl Newbie

    Joined:
    Nov 28, 2015
    Messages:
    8
    Likes Received:
    0
    Looks great, but when I run it I am getting
    Error -1100: Wrong format of SAVEAS TYPE=EXTRACT FOLDER=C:\ FILE=data.csv command, at line: 12
    What am I doing wrong?
     
  9. HoNeYBiRD

    HoNeYBiRD Jr. VIP Jr. VIP

    Joined:
    May 1, 2009
    Messages:
    7,424
    Likes Received:
    8,362
    Gender:
    Male
    Occupation:
    Geographer, Tourism Manager
    Location:
    Ghosted
    Nothing, I effed it up. lol

    When I made the macro, I worked with different file paths, which I stripped down to c:\ when I posted the code above. That wasn't a good idea, it appears. With Win 10 I can't seem to write to the root of c: for some reason (security, I guess). What you need to do is change the FOLDER=c:\ parameter in the last line to another location on your hard drive. Just make it FOLDER=c:\Users\YourWindowsUsername\Desktop\ or wherever you want to save the .csv, and the macro should work.

    Or wait, maybe that's not even it, because I'm not getting the error. The macro simply doesn't run for me in the form I originally posted.

    Have you already modified the path in the last line? If you did and there is a space anywhere in the path, you need to put <SP> in the code wherever the space(s) occur.

    Let's say you want to save the .csv file to the following path:
    c:\Users\YourWindowsUsername\My Documents\
    then when you put it in the code, it should look like:
    c:\Users\YourWindowsUsername\My<SP>Documents\

    Wherever there is a space in the path (in either the folder name or the file name), insert <SP> in its place in the code.
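
    If you are pasting a lot of paths, the space replacement described above is a one-liner to automate. A tiny Python sketch; the function name imacros_escape_spaces is made up for illustration, and it only handles the space rule from this post, nothing else about iMacros path syntax.

```python
def imacros_escape_spaces(path):
    """Replace every space in a path with the <SP> token described above."""
    return path.replace(" ", "<SP>")


if __name__ == "__main__":
    # Example path from the post (backslashes doubled for Python string syntax).
    print(imacros_escape_spaces("c:\\Users\\YourWindowsUsername\\My Documents\\"))
```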
     
    Last edited: Oct 25, 2016
  10. aminima

    aminima Newbie

    Joined:
    Aug 25, 2016
    Messages:
    30
    Likes Received:
    2
    Gender:
    Male
    I can help.
    I am on fiverr too under the same username.
     
  11. Weirdest

    Weirdest Junior Member

    Joined:
    Jul 14, 2016
    Messages:
    144
    Likes Received:
    31
    Gender:
    Male
    Can you make an apk bot that will run on my Android phone? I need a bot to do multiple signups
     
  12. aminima

    aminima Newbie

    Joined:
    Aug 25, 2016
    Messages:
    30
    Likes Received:
    2
    Gender:
    Male
    Please, I'm not at liberty to discuss these things here.
     
  13. fastlinks

    fastlinks Power Member

    Joined:
    Feb 4, 2015
    Messages:
    616
    Likes Received:
    76
    I can make a bot. Skype: ipowerhost2