A script to fetch HTML

Discussion in 'General Scripting Chat' started by 1337python, Sep 23, 2013.

  1. 1337python

    1337python Regular Member

    Joined:
    Jun 18, 2013
    Messages:
    392
    Likes Received:
    235
    Location:
    127.0.0.1
    Hello,

    I would like to learn whether there's a way to fetch a certain part of a website. For example, I want to know if my competitors use the same type of script, line of code, etc. on their websites. You type the piece of code you want to check for, and it searches for it across different URLs. The URLs would be read from a text file that contains a fair number of websites. I tried messing around with "curl" in bash, but I just can't seem to get it right. If you have some insight on how to do this in Python, Korn shell, bash, C++, or Java, please share. Thanks
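A minimal Python sketch of the task described above, assuming Python 3 and only the standard library. The function names (`fetch_html`, `read_urls`, `check_sites`) and the one-URL-per-line file layout are assumptions made for illustration, not something given in the thread.

```python
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10):
    """Download a page and return its body as text."""
    # Some servers reject requests that carry no User-Agent header.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0 (snippet-checker)"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def read_urls(path):
    """Read one URL per line, skipping blank lines and lines starting with #."""
    with open(path, encoding="utf-8") as handle:
        lines = [line.strip() for line in handle]
    return [line for line in lines if line and not line.startswith("#")]


def check_sites(urls, snippet, fetch=fetch_html):
    """Map each URL to True/False (snippet found or not), or None if the fetch failed."""
    results = {}
    for url in urls:
        try:
            results[url] = snippet in fetch(url)
        except Exception:  # bad URL, DNS failure, timeout, HTTP error, ...
            results[url] = None
    return results
```

Something like `check_sites(read_urls("urls.txt"), "some-script.js")` would then report, per site, whether the snippet appears in the raw HTML; `urls.txt` and the snippet are placeholders. Passing `fetch` as a parameter is only there so the logic can be tried without touching the network.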
     
  2. Panther28

    Panther28 Elite Member

    Joined:
    May 2, 2010
    Messages:
    2,268
    Likes Received:
    3,405
    Occupation:
    Internet.
    Location:
    Internet.
    Here is a PHP script that does a cURL scrape of a site. You will need to add a bit more to it, but it will get you going in the right direction.

     
  3. kboing

    kboing Newbie

    Joined:
    Sep 19, 2013
    Messages:
    24
    Likes Received:
    4
    Use the code from Panther28 to fetch the HTML, then use preg_match_all to pull out all the info you need!
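For readers following along in Python rather than PHP, the rough counterpart of `preg_match_all` is `re.findall`. The sketch below pulls every script URL out of a page; the sample HTML and the `script_sources` name are made up for illustration, and a regex like this is only a quick check, not a full HTML parser.

```python
import re

# Hypothetical sample standing in for HTML that was already downloaded.
html = """
<script src="https://cdn.example.com/analytics.js"></script>
<SCRIPT type="text/javascript" src='/js/slider.min.js'></SCRIPT>
<script>var inline = true;</script>
"""

# Capture the src value of every <script> tag, whichever quote style is used.
SCRIPT_SRC = re.compile(r"""<script\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def script_sources(page):
    """Return every script URL found in the page, in document order."""
    return SCRIPT_SRC.findall(page)


print(script_sources(html))
# ['https://cdn.example.com/analytics.js', '/js/slider.min.js']
```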
     
  4. roach

    roach BANNED BANNED

    Joined:
    Sep 8, 2009
    Messages:
    740
    Likes Received:
    395
    I would just use a simple web request to download the HTML, then parse it for the keywords you are looking for. I could code this in 15 minutes tops. Just google "simple web request" and then "regex". Best of luck to you. :)
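The "parse it for keywords" half of that suggestion might look like this in Python. It is a sketch under the assumption that the page source is already in a string; `keyword_hits` and the sample page are hypothetical.

```python
import re


def keyword_hits(html, keywords):
    """Count case-insensitive occurrences of each keyword in the page source."""
    hits = {}
    for keyword in keywords:
        # re.escape makes characters such as '.' or '(' match literally.
        pattern = re.compile(re.escape(keyword), re.IGNORECASE)
        hits[keyword] = len(pattern.findall(html))
    return hits


# Hypothetical page source; in practice this is the downloaded HTML.
page = '<script src="/js/jQuery.min.js"></script><div class="slider">jquery slider</div>'
print(keyword_hits(page, ["jquery", "slider", "angular"]))
# {'jquery': 2, 'slider': 2, 'angular': 0}
```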
     
  5. 1337python

    1337python Regular Member

    Thanks Panther, I'm going to try this tomorrow. You're helpful as ever. :)


    I thought of that already, but it would take too many resources to do that.
     
  6. bytzu

    bytzu Registered Member

    Joined:
    Jun 30, 2011
    Messages:
    96
    Likes Received:
    137
  7. roach

    roach BANNED BANNED

    Sir, it takes no resources at all to use a simple web request. A general rule of thumb in life: if you don't know what you are talking about, then keep your mouth shut. It just makes you sound stupid when you make statements like that.
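On the resources question, one way to keep each request cheap is to cap both the waiting time and the number of bytes read. This is a Python sketch, not code from the thread; `fetch_limited`, the 5-second timeout, and the 200,000-byte cap are arbitrary assumptions.

```python
from urllib.request import Request, urlopen


def fetch_limited(url, timeout=5, max_bytes=200_000):
    """Fetch at most max_bytes of a page; timeout bounds each blocking network call."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0 (snippet-checker)"})
    with urlopen(request, timeout=timeout) as response:
        # read(n) stops after n bytes, so a huge page cannot exhaust memory.
        body = response.read(max_bytes)
        charset = response.headers.get_content_charset() or "utf-8"
    return body.decode(charset, errors="replace")
```

Note that the timeout applies to each blocking socket operation rather than to the download as a whole, so a server that trickles data slowly can still take longer than `timeout` seconds in total.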
     
  8. roach

    roach BANNED BANNED

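    Code to download a page's source code in VB.NET:
    Code:
    ' Assumes a form with TextBox2 (the URL to fetch) and TextBox1 (the output).
    Dim request As System.Net.HttpWebRequest = DirectCast(System.Net.WebRequest.Create(TextBox2.Text), System.Net.HttpWebRequest)
    Using response As System.Net.HttpWebResponse = DirectCast(request.GetResponse(), System.Net.HttpWebResponse)
        Using sr As New System.IO.StreamReader(response.GetResponseStream())
            ' Read the whole response body into one string and show it.
            Dim sourcecode As String = sr.ReadToEnd()
            TextBox1.Text = sourcecode
        End Using
    End Using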
    
     
  9. innosoft

    innosoft Jr. VIP Jr. VIP Premium Member

    Joined:
    Nov 25, 2008
    Messages:
    1,633
    Likes Received:
    639
    Occupation:
    Software Developer, SEO
    Location:
    Office
    If you're not good with programming and you only have a few sites, you can do it manually: view the page source, hit Ctrl+F, and type what you want to search for. Sometimes manual work is a timesaver.

    Edit: While I was typing, roach posted some nice code. :) Here is a small addition to it:

    Code:
    ' sourcecode is the string read in roach's snippet above.
    If sourcecode.Contains("<Text to Search>") Then
        MsgBox("Found....")
    End If
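One caveat with a plain `Contains` check: it is case-sensitive and exact, so a competitor's page that uses different casing or line breaks will not match. Below is a more tolerant variant, sketched in Python; the helper names and the sample fragment are assumptions for illustration.

```python
import re


def normalize(text):
    """Lower-case the text and collapse every run of whitespace to one space."""
    return re.sub(r"\s+", " ", text).strip().lower()


def contains_snippet(source_code, snippet):
    """True if snippet occurs in source_code, ignoring case and whitespace layout."""
    return normalize(snippet) in normalize(source_code)


# Hypothetical page fragment with different casing and line breaks than the snippet.
page = '<DIV class="description">\n    Fast   shipping\n</DIV>'
print(contains_snippet(page, '<div class="description"> fast shipping'))  # True
print(contains_snippet(page, "free shipping"))                            # False
```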
    
    
     
    Last edited: Sep 23, 2013
  10. roach

    roach BANNED BANNED

    Here is a complete function I just put together for you. It is VB.NET and does exactly what you want:

    Code:
    ' Needs Imports System.Net, System.IO and System.Windows.Forms at the top of the file.
    Dim request As WebRequest = WebRequest.Create("http://pagewantingtouse.com") 'create a web request to the html file
    Using response As WebResponse = request.GetResponse() 'get the response back from the request
        Using Reader As New StreamReader(response.GetResponseStream()) 'wrap the response stream in a StreamReader
            Dim webtext As String = Reader.ReadToEnd() 'read the response (web page) as a string to the end of the file

            Using wbc As New WebBrowser() 'create a web browser to handle the data. this will help sift through it
                wbc.DocumentText = ""

                With wbc.Document
                    .OpenNew(True)
                    .Write(webtext) 'write the web page response into the web browser control

                    Dim itemlist As HtmlElementCollection = .GetElementsByTagName("DIV")

                    For Each item As HtmlElement In itemlist 'look at each item in the collection
                        If item.GetAttribute("className") = "description" Then
                            MsgBox(item.InnerText) 'this would msgbox your description
                            Exit For 'exit once found
                        End If
                    Next 'do this for every item in the collection until we find it
                End With
            End Using 'disposes the web browser control
        End Using
    End Using
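For comparison, roughly the same idea in Python, using the standard library's `html.parser` instead of a browser control. This is a sketch and not a port of the code above: `DivTextFinder` and `first_div_text` are hypothetical names, and unlike the VB.NET check it matches a div whose class list contains "description" rather than one whose class is exactly that.

```python
from html.parser import HTMLParser


class DivTextFinder(HTMLParser):
    """Collect the text of the first <div> whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # <div> nesting depth inside the match; 0 means outside it
        self.done = False   # set once the first matching div has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "div" or self.done:
            return
        if self.depth:
            self.depth += 1  # a nested div inside the matching one
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1   # entered the matching div

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
            self.done = self.depth == 0

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def first_div_text(html, target_class="description"):
    """Return the stripped text of the first matching div, or None if there is none."""
    finder = DivTextFinder(target_class)
    finder.feed(html)
    finder.close()
    if finder.done or finder.depth:
        return "".join(finder.parts).strip()
    return None
```

Called as `first_div_text(webtext)` on downloaded HTML, it returns the inner text of the first matching div, which is what the message box displays in the VB.NET version.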
    
     
  11. 1337python

    1337python Regular Member

    I can't seem to get it working in MS Visual Studio 2012. Which version did you use when you wrote it? A lot of your code was being changed by the compiler, and it all went bad. Sorry if you misunderstood my previous post. I meant to say "Wouldn't it take too many resources?" Lol. English isn't my first language, though I know that isn't an excuse.
     
  12. roach

    roach BANNED BANNED

    Yes, I am sorry, I use Visual Studio 2010. I have not moved to 2012 yet. I suppose I should have told you that. My bad :) Good luck. :)
     
  13. Four Seasons

    Four Seasons Regular Member

    Joined:
    Aug 22, 2011
    Messages:
    409
    Likes Received:
    206
    Location:
    Cottonballs
    What you need is called "scraping" and "parsing". It can be done in PHP too.
     
  14. roach

    roach BANNED BANNED

    BeautifulSoup and Python will also do that...... :)
     
  15. 1337python

    1337python Regular Member

    Thanks guys!
     
  16. TZ2011

    TZ2011 Senior Member

    Joined:
    Jun 26, 2011
    Messages:
    832
    Likes Received:
    863
    Occupation:
    Cleaning servers
    It can be done in PHP, but doing it with plain PHP alone is not recommended, since PHP can be pretty heavy on server resources. I have done various fetching and content-scraping projects with it, but optimizing it and getting it to run stably within timeout limits can be a major pain in the ass.
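The timeout and stability problems mentioned above are usually handled by giving every request its own timeout and isolating failures per URL. Here is a sketch of that in Python with a small thread pool; `scan_all`, the worker count, and the error-string convention are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Download one page; the timeout bounds each blocking network operation."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def scan_all(urls, snippet, fetch=fetch_html, workers=8):
    """Check many URLs in parallel; one slow or broken site does not stop the rest."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                # True/False: whether the snippet appears in that page.
                results[url] = snippet in future.result()
            except Exception as error:  # timeout, DNS failure, HTTP error, ...
                results[url] = f"error: {error}"
    return results
```

Each URL ends up mapped to `True`, `False`, or an `"error: ..."` string, so a dead site shows up in the report instead of aborting the whole run.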
     
  17. roach

    roach BANNED BANNED

    Since you have sent me a PM explaining that you cannot get this code to work, I am going to build you a very simple tool. This tool is just to show you a working function. I will not add features or update it. It is just to help you learn. Give me a few minutes to make it and then virus scan it for you. I will upload the source code and comment it as best I can. I do not mind helping. Please take this simple tool and learn from it.
     
  18. 1337python

    1337python Regular Member

    I appreciate you doing this for me. When I get more successful in coding and in IM, I will definitely remember you!
     
    Last edited: Sep 24, 2013
  19. roach

    roach BANNED BANNED

    Here is the VERY SIMPLE APP. This is just a proof of concept to show you how to use the code I gave you. It does exactly what you asked for.

    Download: http://puu.sh/4zchg.rar (I uploaded it using puush.me. I love puush :)
    Virus Total: https://www.virustotal.com/en/file/4b60dadbf3dd238f80d56e995cfb81902ba44c4300e9c8a56ac01ae07dde3a3a/analysis/1380043664/

    The download has the full source as well as the exe. Good luck learning to code :)
     
  20. 1337python

    1337python Regular Member
