
Extract domains out of text files - reward $10 via Bitcoin or PayPal

Discussion in 'Hire a Freelancer' started by fabssoouu, Aug 25, 2016.

  1. fabssoouu

    fabssoouu Newbie

    Joined:
    Aug 10, 2016
    Messages:
    19
    Likes Received:
    1
    Hi guys,

    Can someone please extract all the domains from the (HTML) text files at this link?

    Exclude the following domain from the results: http://www.theperfectwedding.nl/ (and every path after the slash)

    All the files are here (total size is 1.3 GB):

    https://drive.google.com/folderview?id=0B0ROX0vWrG2aaWxqeGZ0Z2JRMFU&usp=sharing
     
    • Thanks x 1
  2. seoguruseo

    seoguruseo Regular Member

    Joined:
    Aug 2, 2016
    Messages:
    206
    Likes Received:
    11
    Gender:
    Male
    How many domains are there in total?
    Also, it's not a single file.
     
  3. Unknown Overlord

    Unknown Overlord Junior Member

    Joined:
    Nov 7, 2009
    Messages:
    106
    Likes Received:
    48
    I highly doubt someone is going to go through all the trouble of going through all these files and extracting links for $10.
     
  4. fabssoouu

    fabssoouu Newbie

    Joined:
    Aug 10, 2016
    Messages:
    19
    Likes Received:
    1
    There is one domain per file, so around 10,000+ in total.
    So it is not one file. You first have to merge them and then extract, I think?
     
  5. fabssoouu

    fabssoouu Newbie

    Joined:
    Aug 10, 2016
    Messages:
    19
    Likes Received:
    1
    Let's hope somebody will :)
     
  6. seoguruseo

    seoguruseo Regular Member

    Joined:
    Aug 2, 2016
    Messages:
    206
    Likes Received:
    11
    Gender:
    Male
    I can do it, but the question is: how are you going to verify all the domains?
     
  7. plut0

    plut0 Regular Member

    Joined:
    Aug 2, 2008
    Messages:
    264
    Likes Received:
    60
    How many TLDs do you want to grab?
    .com, .net, .org only?
     
  8. fabssoouu

    fabssoouu Newbie

    Joined:
    Aug 10, 2016
    Messages:
    19
    Likes Received:
    1
    Only one TLD needs to be extracted: .nl
     
  9. fabssoouu

    fabssoouu Newbie

    Joined:
    Aug 10, 2016
    Messages:
    19
    Likes Received:
    1
    In the text files, each link has to start with: href="http://www.
    and end with the TLD: .nl

    And exclude this domain: http://www.theperfectwedding.nl/

    If possible, show the file name (107, for example) and then the domain extracted from that file in an Excel document. Only if possible.

    Hope you know enough
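
For anyone following along, here is a rough sketch of how the rules above could be turned into a file-name/domain list. It assumes GNU grep and sed, and that the files have been downloaded into one local directory. The function name `extract_nl_domains` and the directory argument are made up for illustration, and the match is case-sensitive on the `href="http://www.` prefix, exactly as the rule is written.

```shell
# Hypothetical helper (not from the thread): print "file,domain" CSV rows
# for every href="http://www.*.nl link found under a directory.
extract_nl_domains() {
    dir=$1   # assumed: directory holding the downloaded files
    # -r recurse, -H always print the file name, -o print only the match.
    grep -rHoE 'href="http://www\.[A-Za-z0-9.-]+' "$dir" |
        # Turn "path/to/107.html:href="http://www.x.nl" into "107.html,www.x.nl".
        sed -E 's|^(.*/)?([^/]*):href="http://(.*)$|\2,\3|' |
        # Keep only hosts that end in .nl.
        grep -i '\.nl$' |
        # Drop the excluded site.
        grep -vi ',www\.theperfectwedding\.nl$' |
        sort -u
}
```

Calling it as `extract_nl_domains ./files > domains.csv` (the `./files` path is an assumption) should give a CSV that Excel can open directly, with the file name in the first column and the domain in the second.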
     
  10. telim2

    telim2 Regular Member

    Joined:
    Sep 7, 2014
    Messages:
    339
    Likes Received:
    145
    I can automate this, but the payment is too low for the project. Add me on Skype: telim221. If you can pay $50, I will develop a bot to auto-extract the required details.
     
  11. plut0

    plut0 Regular Member

    Joined:
    Aug 2, 2008
    Messages:
    264
    Likes Received:
    60
    Should be doable. Let me play with it over the next couple of days.
     
  12. cpaforever

    cpaforever Newbie

    Joined:
    Sep 3, 2015
    Messages:
    35
    Likes Received:
    3
    Post direct links; no one is going to open all these folders for you.
    Split them into 50 MB chunks.
     
  13. amarindia

    amarindia Power Member

    Joined:
    Dec 5, 2014
    Messages:
    730
    Likes Received:
    48
    Gender:
    Male
    Occupation:
    Freelancer
    Location:
    India
    I can do it... connect on Skype: bpo2india

    Regards.
     
  14. living2xl

    living2xl Jr. VIP Jr. VIP

    Joined:
    Dec 9, 2011
    Messages:
    1,739
    Likes Received:
    415
    Occupation:
    Sippin dat juice - Shout it louder!
    Location:
    Not sleeping!
    Damn, this is a royal pain.

    Maybe combine all the files into one text file, then parse that file for domains and filter for .nl.
     
  15. Asif WILSON Khan

    Asif WILSON Khan Executive VIP Jr. VIP

    Joined:
    Nov 10, 2012
    Messages:
    12,775
    Likes Received:
    35,220
    Gender:
    Male
    Occupation:
    Fun Lovin' Criminal
    Location:
    London
  16. MrBlue

    MrBlue Senior Member

    Joined:
    Dec 18, 2009
    Messages:
    975
    Likes Received:
    681
    Occupation:
    Web/Bot Developer
    Not that difficult using grep and sed.

    Code:
    grep http ./yourfilename.txt | sed 's/http/\nhttp/g' | grep ^http | sed 's/\(^http[^ <"]*\)\(.*\)/\1/g' | grep IWANTthis | sort -u
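
The same grep/sed idea can probably be pointed at the whole folder, so the files never need merging. The sketch below is one possible adaptation to the OP's rules (links starting with `href="http://www.`, hosts ending in `.nl`, theperfectwedding.nl excluded). It assumes GNU grep, and `unique_nl_domains` plus its directory argument are hypothetical names, not something posted in this thread.

```shell
# Hypothetical helper (not from the thread): print the unique www.*.nl hosts
# linked from every file under a directory, without merging the files first.
unique_nl_domains() {
    dir=$1   # assumed: directory holding the downloaded files
    # -r recurse, -h omit file names, -o print only the matched text.
    grep -rhoE 'href="http://www\.[A-Za-z0-9.-]+' "$dir" |
        # Drop the href="http:// prefix, leaving only the host name.
        sed 's|^href="http://||' |
        # Normalise case so duplicates collapse in sort -u.
        tr '[:upper:]' '[:lower:]' |
        # Keep hosts ending in .nl.
        grep '\.nl$' |
        # Drop the excluded site (-x matches the whole line).
        grep -vx 'www\.theperfectwedding\.nl' |
        sort -u
}
```

Because the character class stops at the first `/` or `"`, each match is just the host name, which sidesteps the problem of trailing paths and HTML markup ending up in the output.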
    
     
  17. Buzzika

    Buzzika Supreme Member

    Joined:
    Jul 8, 2009
    Messages:
    1,208
    Likes Received:
    1,484
    Occupation:
    Hustler
    Location:
    Gurgaon
    I could do it for free if you give it to me in a single file.
     
  18. tasburrfoot

    tasburrfoot Regular Member

    Joined:
    Dec 16, 2008
    Messages:
    332
    Likes Received:
    153
    Sent you a PM.
     
  19. OrangeNRG

    OrangeNRG Regular Member

    Joined:
    Dec 10, 2012
    Messages:
    407
    Likes Received:
    256
    OP are you retarded? You should pay me $10 just for how many times I pressed the Page Down key...
     
  20. fabssoouu

    fabssoouu Newbie

    Joined:
    Aug 10, 2016
    Messages:
    19
    Likes Received:
    1
    Love u 2