
Simple Google Scraper

Discussion in 'Visual Basic .NET' started by moonlighsunligh, May 15, 2012.

  1. moonlighsunligh

    moonlighsunligh Jr. VIP Premium Member

    Joined:
    May 1, 2010
    Messages:
    1,623
    Likes Received:
    218
    I cannot get this one to work. How do I decode the Google search HTML code?

    Code:
    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        ' Load the Google results page for the query typed into TextBox1.
        WebBrowser1.Navigate("http://www.google.com/search?q=" & TextBox1.Text)
    End Sub

    Private Sub WebBrowser1_DocumentCompleted(ByVal sender As System.Object, ByVal e As System.Windows.Forms.WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
        ' Collect every <h3> element, then the <a> links inside each one.
        Dim htmlele As HtmlElementCollection
        htmlele = WebBrowser1.Document.GetElementsByTagName("h3")
        For Each htm As HtmlElement In htmlele
            Dim chld As HtmlElementCollection = htm.GetElementsByTagName("a")
            For Each ch As HtmlElement In chld
                ' Write each link target on its own line.
                RichTextBox1.AppendText(ch.GetAttribute("href") & vbCrLf)
            Next
        Next
    End Sub

     
    Last edited: May 15, 2012
  2. andee

    andee Regular Member

    Joined:
    Jul 24, 2010
    Messages:
    218
    Likes Received:
    83
    Turn JavaScript off, look at the page source, and use web requests. It's probably easier than using web controls.
     
  3. Aceblackhat1

    Aceblackhat1 Newbie

    Joined:
    Mar 9, 2012
    Messages:
    4
    Likes Received:
    0

    Hey Moon,

    I tried your code to parse the Google SERP, but it's still leaving lines like this:


    It still extracts a lot of the URLs, but we can't have these Google redirects in our RichTextBox when scraping. Do you know how to get rid of them?

    Thanks
     
  4. darkrache

    darkrache Junior Member Premium Member

    Joined:
    Mar 27, 2011
    Messages:
    148
    Likes Received:
    162
    Try this

    Code:
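    ' Fetch the raw results page; escape the query so spaces and symbols survive.
    Dim request As System.Net.HttpWebRequest = DirectCast(System.Net.WebRequest.Create("http://www.google.com/search?q=" & Uri.EscapeDataString(TextBox1.Text)), System.Net.HttpWebRequest)
    Dim src As String
    Using response As System.Net.HttpWebResponse = DirectCast(request.GetResponse(), System.Net.HttpWebResponse)
        Using sr As New System.IO.StreamReader(response.GetResponseStream())
            src = sr.ReadToEnd()
        End Using
    End Using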
    Now src holds the source of the Google page. You can parse out the URLs, pages, or whatever you want. It's much faster than the WebBrowser control since it doesn't render the markup, it just retrieves the text.
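
    For the "parse out the urls" step, here is a rough sketch of one way to do it with a regex. It rests on two assumptions that are not confirmed anywhere in this thread: that result titles are <h3> elements whose first child is the link (same idea as the WebBrowser code above), and that the redirect links look like "/url?q=<escaped target>&sa=...". ExtractResultLinks and UnwrapRedirect are made-up helper names, and the pattern will need adjusting if the real markup differs.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module SerpLinks
    ' Pulls href values out of <h3 ...><a ... href="..."> pairs in raw HTML.
    ' Assumption: each result title is an <h3> whose first child is the link.
    Function ExtractResultLinks(ByVal src As String) As List(Of String)
        Dim links As New List(Of String)
        Dim pattern As String = "<h3[^>]*>\s*<a[^>]*?href=""([^""]+)"""
        For Each m As Match In Regex.Matches(src, pattern, RegexOptions.IgnoreCase)
            ' Undo HTML entity escaping such as &amp; inside the attribute value.
            Dim href As String = System.Net.WebUtility.HtmlDecode(m.Groups(1).Value)
            links.Add(UnwrapRedirect(href))
        Next
        Return links
    End Function

    ' Assumption: redirect links look like "/url?q=<escaped target>&sa=...".
    ' Any link that does not match that shape is returned unchanged.
    Function UnwrapRedirect(ByVal href As String) As String
        Dim pattern As String = "^(?:https?://www\.google\.[a-z.]+)?/url\?(?:.*?&)?q=([^&]+)"
        Dim m As Match = Regex.Match(href, pattern, RegexOptions.IgnoreCase)
        If m.Success Then
            ' The target is percent-encoded inside the q parameter.
            Return Uri.UnescapeDataString(m.Groups(1).Value)
        End If
        Return href
    End Function
End Module
```

    To use it with the snippet above, loop over ExtractResultLinks(src) and append each item to RichTextBox1 followed by vbCrLf. A regex is fragile against markup changes, so treat this as a starting point rather than a finished parser.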
     