How would I extract links from an .xml.gz file?

Discussion in 'Black Hat SEO' started by studentcashnow, Apr 30, 2011.

  1. studentcashnow

    Regular Member | Joined: Aug 21, 2009 | Messages: 246 | Likes Received: 186

    Hi all, obviously I can extract links from a .xml file using ScrapeBox, but it doesn't support compressed XML files.

    Does anyone have any ideas or software I can use to extract all the links?

    cheers
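If you'd rather skip the manual unzip step, here is a minimal sketch of pulling the links straight out of the compressed file with Python's standard library (not ScrapeBox). It assumes the file is a standard sitemap using the sitemaps.org namespace `http://www.sitemaps.org/schemas/sitemap/0.9`, with each URL in a `<loc>` element. The function name `links_from_sitemap_gz` is just an illustrative name.

```python
import gzip
import xml.etree.ElementTree as ET

# Assumed namespace of a standard sitemap (sitemaps.org protocol).
# Sitemaps without this namespace would need a different tag name.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def links_from_sitemap_gz(path):
    """Return every <loc> URL found in a gzip-compressed sitemap file."""
    # gzip.open decompresses on the fly, so ElementTree just sees plain XML.
    with gzip.open(path, "rb") as f:
        tree = ET.parse(f)
    # iter() walks the whole tree, so both <urlset> sitemaps and
    # <sitemapindex> files (which also use <loc>) are covered.
    return [loc.text.strip() for loc in tree.iter(SITEMAP_NS + "loc") if loc.text]
```

Something like `print("\n".join(links_from_sitemap_gz("sitemap.xml.gz")))` would then dump one URL per line, ready to paste into whatever tool you use.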
     
  2. panoet

    Regular Member | Joined: Jun 26, 2010 | Messages: 243 | Likes Received: 56

    So you just need to extract that file...
     
  3. cooooookies

    Senior Member | Joined: Oct 6, 2008 | Messages: 1,008 | Likes Received: 216

    .gz is a GNU zip (gzip) archive, very common on Linux.

    Check here: http://gnuwin32.sourceforge.net/packages/gzip.htm
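If you have Python installed, its standard `gzip` module can do the same decompression as the gzip tool linked above. This is a minimal sketch, roughly equivalent to `gunzip -c sitemap.xml.gz > sitemap.xml` (the original .gz is left in place); `gunzip_file` is a hypothetical helper name.

```python
import gzip
import shutil


def gunzip_file(src, dst=None):
    """Decompress src (e.g. sitemap.xml.gz) into dst and return dst's path."""
    if dst is None:
        # Drop the trailing ".gz": sitemap.xml.gz -> sitemap.xml
        dst = src[:-3] if src.endswith(".gz") else src + ".out"
    # Copy in chunks so a large sitemap is never held in memory all at once.
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

`gunzip_file("sitemap.xml.gz")` should leave a `sitemap.xml` next to the original, which any tool that reads plain XML can then open.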
     
  4. roamer

    Power Member | Joined: Dec 2, 2008 | Messages: 500 | Likes Received: 479
    Occupation: Gfx designer, vfx and mgfx | Location: plɹoʍ ǝɥʇ punoɹɐ ƃuıɯɐoɹ

    Open/extract the file with an archiver like WinRAR or 7-Zip, as panoet suggested. Judging from the file name, you should be left with an .xml file.
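Since the file name alone doesn't prove what's inside, one quick sanity check is to look at the first bytes: every gzip stream begins with the two "magic" bytes 0x1f 0x8b (RFC 1952). This is a minimal sketch of that check; the function name `is_gzip` is illustrative only.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def is_gzip(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC
```

If this returns False for a file named `.xml.gz`, the contents are probably already plain XML and can be read directly without unzipping.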
     
  5. studentcashnow

    Regular Member | Joined: Aug 21, 2009 | Messages: 246 | Likes Received: 186

    OK cheers, yeah, I'm left with a .xml file, but when I upload it to my server it says the sitemap doesn't exist. If I open it from my localhost, though, all the links are there.

    I'm trying to get ScrapeBox to download/extract links from the sitemap, but ScrapeBox doesn't support the compressed version (.xml.gz). So I need to turn the .xml.gz into a plain .xml, if that makes sense lol.

    Ahh, got it: I had to create a new folder on my server and put the .xml file in there. :) Weird.

    cheers everyone