Moving from WebBrowserControl to HttpWebRequest - bit of help needed

Discussion in 'Visual Basic .NET' started by m3ownz, Jan 6, 2011.

  1. m3ownz

    m3ownz Regular Member

    Joined:
    Dec 12, 2009
    Messages:
    311
    Likes Received:
    135
    Hello.
    I am trying to get away from the WebBrowser control for a number of reasons, mainly because it's slow and bloated, and I have a bot I eventually want to multi-thread (another topic altogether!).

    Anyway, I have been reading some tutorials and generally playing about in VB 2010 Express.

    I can get my bot to post, but I need to get a value from the page.
    I can get the page source into a string with a StreamReader, but I don't know how to parse the data in the string to get what I want.

    Basically the page contains:

    HTML:
    lots of html
    <input name="example_name" value="123">
    lots of html
    
    And I need to save that value to a variable.

    Any ideas?
     
  2. saxgod

    saxgod Regular Member

    Joined:
    Sep 19, 2010
    Messages:
    351
    Likes Received:
    337
    What language are you working in? C#?
    Try using a regular expression to pick out that part of the HTML and save the 'value' part.
     
    • Thanks Thanks x 1
  3. creditandgolfse

    creditandgolfse Registered Member

    Joined:
    Dec 22, 2009
    Messages:
    84
    Likes Received:
    39
    I'm doing the same in VBA: converting from the WebBrowser control to HTTP requests. You have to use a regexp, something like this:

    <input name="example_name" value="(.*?)">

    Then, depending on the language, you will be able to get what's in the (.*?) part (the capture group) and use that to continue on with your code. The ? after the * makes the match lazy, so it stops at the first closing quote instead of running on to the last one on the line.

    Note that for the regexp you'll have to escape certain characters, likely the ". Another thing you can do, which I do, is use a . in place of each ". So you do:

    <input name=.example_name. value=.(.*?).>

    That way, if a site uses single quotes, it still works.
     
    • Thanks Thanks x 1
  4. m3ownz

    m3ownz Regular Member

    Joined:
    Dec 12, 2009
    Messages:
    311
    Likes Received:
    135
    Thanks a lot.

    I'm using Visual Basic .NET 2010.
    I will look into using regular expressions.

    I'm sure it will be well worth learning all this in the long run, but the WebBrowser control sure is simpler!

    Edit:
    If anyone feels like spoon-feeding me some regular expressions, here's where I'm at with this so far:

    Code:
    
    
            'needs Imports System.Net and Imports System.IO at the top of the file
            'get target page source code to extract data from
            Dim request As HttpWebRequest = DirectCast(WebRequest.Create(txtTargetUrl.Text), HttpWebRequest)
            Dim response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
            'store the status code - did it load etc
            Dim didItWork As Integer = CInt(response.StatusCode)

            Dim sr As StreamReader = New StreamReader(response.GetResponseStream())

            Dim pageSourceCode As String = sr.ReadToEnd()

            'release the reader and the connection once the source is read
            sr.Close()
            response.Close()

            'show the source in a text box so i know it worked
            RichTextBox1.Text = pageSourceCode

            'magically retrieve some data from the pageSourceCode string??
    
    
     
    Last edited: Jan 6, 2011
  5. Heranthius

    Heranthius Newbie

    Joined:
    Sep 6, 2010
    Messages:
    6
    Likes Received:
    5
    Occupation:
    Self Employed
    Location:
    Upstate, New York
    You can try the following code. Just change example_name to whatever you need. You'll end up with a string sCapture containing the tag's value attribute.

    Code:
    Dim sCapture As String = System.Text.RegularExpressions.Regex.Match(pageSourceCode, "<input\sname=""example_name""\svalue=""(.*?)"">").Groups(1).Value
     
    • Thanks Thanks x 1
  6. walker

    walker Junior Member

    Joined:
    Feb 19, 2009
    Messages:
    146
    Likes Received:
    49
    Regex will do this for you. Generally, however, an HTTP request on its own only handles simple cases. On lots of sites you will need special processing around the request, and that makes the program harder to maintain, since the logic is less straightforward.

    The WebBrowser control deals with all of those situations for you, which is why it is slow. You can actually customize the browser to make it faster.
     
    • Thanks Thanks x 1
  7. m3ownz

    m3ownz Regular Member

    Joined:
    Dec 12, 2009
    Messages:
    311
    Likes Received:
    135
    Heranthius
    Thanks a lot, works great.
    It does have an issue if a site adds extra attributes to that particular input tag; for example, an unexpected class attribute before or after the value attribute will break it. I am sure I can work around that, though. You have given me a good starting point.

    Walker
    I have the bot working with WebBrowser, but modifying the WebBrowser control seems to be very unreliable, as it's based on IE settings that change from machine to machine (and I want to give the finished bot away to my subscribers).

    This is also a learning project for myself, so I'm going to stick with WebRequest for the time being, but thanks for your suggestion.
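
One possible way around the extra-attribute problem described in this post is to match the whole input tag first and then read the name and value attributes from it separately, so their order and any neighbouring attributes stop mattering. This is only a sketch: the module name, the GetInputValue helper and the sample HTML are made up for illustration, and it assumes attribute values are quoted and contain no quote characters themselves.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module InputValueDemo
    ' Hypothetical helper (not from the thread): returns the value attribute of the
    ' first <input> tag whose name attribute equals inputName, or Nothing if none.
    Function GetInputValue(html As String, inputName As String) As String
        ' Step 1: find each <input ...> tag, whatever attributes it carries.
        For Each tag As Match In Regex.Matches(html, "<input\b[^>]*>", RegexOptions.IgnoreCase)
            ' Step 2: read the name attribute (single or double quotes).
            Dim nameMatch As Match = Regex.Match(tag.Value, "\bname\s*=\s*[""']([^""']*)[""']", RegexOptions.IgnoreCase)
            If nameMatch.Success AndAlso nameMatch.Groups(1).Value = inputName Then
                ' Step 3: read the value attribute from the same tag.
                Dim valueMatch As Match = Regex.Match(tag.Value, "\bvalue\s*=\s*[""']([^""']*)[""']", RegexOptions.IgnoreCase)
                If valueMatch.Success Then Return valueMatch.Groups(1).Value
            End If
        Next
        Return Nothing
    End Function

    Sub Main()
        ' Sample page: a class attribute comes first and value comes before name.
        Dim html As String = "<p>lots of html</p><input class='x' value=""123"" name=""example_name""><p>more</p>"
        Console.WriteLine(GetInputValue(html, "example_name"))
    End Sub
End Module
```

In the bot, pageSourceCode would take the place of the sample string. The pattern is still fooled by unquoted attribute values or by attributes such as data-name, which is where a real HTML parser starts to pay off.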
     
  8. Heranthius

    Heranthius Newbie

    Joined:
    Sep 6, 2010
    Messages:
    6
    Likes Received:
    5
    Occupation:
    Self Employed
    Location:
    Upstate, New York
    You might want to look into a parser called HtmlAgilityPack (do a Google search). That's what I use for all of my bots. It turns the raw HTML from a web request into a document tree that you can search using XPath (another Google search), which I find much simpler than doing the whole regex thing. You can search for specific tags with certain properties, e.g. all input tags with the class "example". It's a little more work initially, but it makes so many things 10x easier.
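
As a rough illustration of the "all input tags with the class example" search mentioned above, something like the following should work once the project has a reference to the HtmlAgilityPack assembly. The module name and the sample HTML are invented for the example; the XPath matches only when the class attribute is exactly "example".

```vbnet
Imports System

Module AgilityPackDemo
    Sub Main()
        ' Made-up sample markup; in the bot this would be the downloaded page source.
        Dim html As String = "<div><input class=""example"" name=""a"" value=""1""><input name=""b"" value=""2""></div>"

        ' Fully qualified to avoid a clash with System.Windows.Forms.HtmlDocument.
        Dim doc As New HtmlAgilityPack.HtmlDocument()
        doc.LoadHtml(html)

        ' XPath: every <input> whose class attribute is exactly "example".
        Dim nodes As HtmlAgilityPack.HtmlNodeCollection = doc.DocumentNode.SelectNodes("//input[@class='example']")

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        If nodes IsNot Nothing Then
            For Each node As HtmlAgilityPack.HtmlNode In nodes
                Console.WriteLine(node.GetAttributeValue("name", "") & " = " & node.GetAttributeValue("value", ""))
            Next
        End If
    End Sub
End Module
```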
     
    • Thanks Thanks x 2
  9. reinrein

    reinrein Regular Member

    Joined:
    Feb 8, 2008
    Messages:
    443
    Likes Received:
    343
    WebBrowser has advantages, such as letting you see the current state of requests, but it's prone to crashes. What I did was create a form element class that scrapes all the needed input names and values from an HTML page and processes the form post. Basically it's a WebBrowser emulation, and it's a ton of code. The only tricky thing to support with sockets is JavaScript that needs to be executed.
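
The poster's class is not shown, so the following is only a guess at the general idea, cut down to two hypothetical helpers: ScrapeInputs collects the name/value pair of every input tag, and BuildPostData turns those pairs into a URL-encoded POST body. It assumes quoted attribute values and skips a lot a real browser handles (select and textarea elements, checkboxes, HTML entity decoding, multiple forms on one page).

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module FormFieldsDemo
    ' Hypothetical helper: collects name/value pairs from every <input> tag.
    Function ScrapeInputs(html As String) As Dictionary(Of String, String)
        Dim fields As New Dictionary(Of String, String)()
        For Each tag As Match In Regex.Matches(html, "<input\b[^>]*>", RegexOptions.IgnoreCase)
            Dim n As Match = Regex.Match(tag.Value, "\bname\s*=\s*[""']([^""']*)[""']", RegexOptions.IgnoreCase)
            Dim v As Match = Regex.Match(tag.Value, "\bvalue\s*=\s*[""']([^""']*)[""']", RegexOptions.IgnoreCase)
            ' Inputs without a name are skipped; a missing value becomes "".
            If n.Success Then fields(n.Groups(1).Value) = If(v.Success, v.Groups(1).Value, "")
        Next
        Return fields
    End Function

    ' Hypothetical helper: builds a URL-encoded request body from the fields.
    Function BuildPostData(fields As Dictionary(Of String, String)) As String
        Dim parts As New List(Of String)()
        For Each pair As KeyValuePair(Of String, String) In fields
            ' Escape both sides so spaces, & and = inside values survive the post.
            parts.Add(Uri.EscapeDataString(pair.Key) & "=" & Uri.EscapeDataString(pair.Value))
        Next
        Return String.Join("&", parts.ToArray())
    End Function

    Sub Main()
        ' Made-up form with a hidden token and an empty text field.
        Dim html As String = "<form><input type=""hidden"" name=""token"" value=""a b""><input name=""user"" value=""""></form>"
        Console.WriteLine(BuildPostData(ScrapeInputs(html)))
    End Sub
End Module
```

The resulting string would then be written to the request stream of an HttpWebRequest with Method set to "POST" and ContentType set to "application/x-www-form-urlencoded", after overwriting whichever fields the bot needs to fill in.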
     
    • Thanks Thanks x 1
  10. xhpdx

    xhpdx Regular Member

    Joined:
    Sep 21, 2008
    Messages:
    331
    Likes Received:
    2,160
    Occupation:
    Coder
    Location:
    EU
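    +1 for HtmlAgilityPack. Personally I hate regex, so here is how to get the value using HtmlAgilityPack:

    Code:
    ' Needs a reference to HtmlAgilityPack; names are fully qualified to avoid
    ' a clash with System.Windows.Forms.HtmlDocument
    Dim doc As New HtmlAgilityPack.HtmlDocument()
    doc.LoadHtml(pageSourceCode) ' the source code of the page as a string
    Dim s As HtmlAgilityPack.HtmlNode = doc.DocumentNode.SelectSingleNode("//input")
    ' SelectSingleNode returns Nothing when no node matches
    If s IsNot Nothing Then MsgBox(s.GetAttributeValue("value", ""))
    Sure, it's more lines than a regex, but it's easier to learn. XPath is very flexible, so if you have many input fields, you can use "//input[@name='example_name']" to get that specific input.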
     
    • Thanks Thanks x 1
  11. m3ownz

    m3ownz Regular Member

    Joined:
    Dec 12, 2009
    Messages:
    311
    Likes Received:
    135
    HtmlAgilityPack looks very useful, googling it now.
    I do not need to worry about JS for this bot; it's actually something that was proving to be a pain in the arse with WebBrowser: bloody popups and ads slowing it down and causing redirects etc.

    Thanks for everyone's help. I will surely be back with more daft questions, especially when it comes to multi-threading!