
Learning Html Agility Pack ~ site doesn't offer guides ~ know of any?

Discussion in 'General Programming Chat' started by simpleonline1234, May 27, 2011.

  1. simpleonline1234

    simpleonline1234 Junior Member

    Joined:
    Jan 26, 2010
    Messages:
    169
    Likes Received:
    13
    I'm currently working on a project that involves navigating to a website, grabbing the attributes of all the input elements (ID, NAME, etc.), and then inputting those values into a textbox.

    I've read that Html Agility Pack is the way to go, but their website doesn't really offer a guide, just a bunch of people posting their code to sift through.

    Where would one go to find all the methods and options for Html Agility Pack?
     
  2. Packers

    Packers Registered Member

    Joined:
    Jan 31, 2011
    Messages:
    77
    Likes Received:
    7
    I used C#. A bit of googling turned up some useful links. There are also some docs and a help file included in the pack that you download, just not directly on the website!
     
  3. Packers

    Code:
    // Requires: using System.Collections.Generic; using HtmlAgilityPack;

    // Extract every <form> on the page, keyed by its id (or by its index when it has none).
    // Each value maps the form's own attribute names to their values.
    public Dictionary<string, Dictionary<string, string>> extractForms(HtmlDocument html)
    {
        var forms = new Dictionary<string, Dictionary<string, string>>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection formNodes = html.DocumentNode.SelectNodes("//form");
        if (formNodes == null)
            return forms;

        int n = 0;
        foreach (HtmlNode form in formNodes)
        {
            var formElements = new Dictionary<string, string>();
            foreach (HtmlAttribute at in form.Attributes)
            {
                // The indexer overwrites a repeated attribute name instead of throwing.
                formElements[at.Name] = at.Value;
            }

            // Key by id when present, otherwise by index; append the index until the key is unique.
            HtmlAttribute id = form.Attributes["id"];
            string key = id == null ? n.ToString() : id.Value;
            while (forms.ContainsKey(key))
                key += n.ToString();

            forms.Add(key, formElements);
            n++;
        }

        return forms;
    }
    A little snippet which extracts the forms on the webpage, and in each form captures the input names and values.

    EDIT: That's a lie. It doesn't capture the input names and values; it captures the attributes of each form and their values. I need to fix that up. The code is also a mess, I know, but I've been experimenting...
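
    A minimal sketch of what capturing the input names and values could look like, assuming it is enough to collect them across the whole document rather than per form. `InputExtractor` and `ExtractInputs` are hypothetical names used for illustration; `SelectNodes` and `GetAttributeValue` are Html Agility Pack calls.

    ```csharp
    using System.Collections.Generic;
    using HtmlAgilityPack;

    public static class InputExtractor
    {
        // Hypothetical helper: returns (name, value) pairs for every <input> in the document.
        // A list is used instead of a dictionary because input names can repeat (radio buttons).
        public static List<KeyValuePair<string, string>> ExtractInputs(HtmlDocument html)
        {
            var inputs = new List<KeyValuePair<string, string>>();

            // SelectNodes returns null when the XPath matches nothing.
            HtmlNodeCollection nodes = html.DocumentNode.SelectNodes("//input");
            if (nodes == null)
                return inputs;

            foreach (HtmlNode input in nodes)
            {
                // Fall back to the id, then to an empty string, when there is no name attribute.
                string name = input.GetAttributeValue("name", input.GetAttributeValue("id", ""));
                string value = input.GetAttributeValue("value", "");
                inputs.Add(new KeyValuePair<string, string>(name, value));
            }

            return inputs;
        }
    }
    ```

    To try it, create an `HtmlDocument`, call `LoadHtml` with the page source, and pass the document to `ExtractInputs`.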
     
    Last edited: May 27, 2011
  4. simpleonline1234

    Cool, thanks. I will give it a try. Oh, and here's the same code translated to VB.NET for any VB.NET peeps out there:

    Code:
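    ' Requires: Imports System.Collections.Generic and Imports HtmlAgilityPack
    Public Function extractForms(html As HtmlDocument) As Dictionary(Of String, Dictionary(Of String, String))

        Dim forms As New Dictionary(Of String, Dictionary(Of String, String))()

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        Dim formNodes As HtmlNodeCollection = html.DocumentNode.SelectNodes("//form")
        If formNodes Is Nothing Then
            Return forms
        End If

        Dim n As Integer = 0
        For Each form As HtmlNode In formNodes
            Dim formElements As New Dictionary(Of String, String)()

            For Each at As HtmlAttribute In form.Attributes
                ' The indexer overwrites a repeated attribute name instead of throwing.
                formElements(at.Name) = at.Value
            Next

            ' Key by id when present, otherwise by index; append the index until the key is unique.
            Dim id As HtmlAttribute = form.Attributes("id")
            Dim key As String = If(id Is Nothing, n.ToString(), id.Value)
            While forms.ContainsKey(key)
                key &= n.ToString()
            End While

            forms.Add(key, formElements)
            n += 1
        Next

        Return forms
    End Function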
    
     
  5. Packers

    Good job :) To be honest man, I'm not sure how useful this is as a parser. I'm trying to get all the input elements within a <form> tag and it's skipping a few... :\ Very frustrating!
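
    One possible cause of the skipped inputs, assuming the Html Agility Pack version in use still registers `form` in `HtmlNode.ElementsFlags` as an empty element (older releases did): the parser then closes the `<form>` right away, so the inputs end up as siblings of the form rather than children. A sketch of the commonly used workaround follows; `FormInputs` and `Collect` are hypothetical names, and malformed markup could still cause other misses.

    ```csharp
    using System.Collections.Generic;
    using HtmlAgilityPack;

    public static class FormInputs
    {
        // Hypothetical helper: maps each form's index to the names of the inputs nested inside it.
        public static Dictionary<int, List<string>> Collect(string markup)
        {
            // Must run before the markup is parsed; harmless if "form" is not registered.
            HtmlNode.ElementsFlags.Remove("form");

            var doc = new HtmlDocument();
            doc.LoadHtml(markup);

            var result = new Dictionary<int, List<string>>();
            HtmlNodeCollection forms = doc.DocumentNode.SelectNodes("//form");
            if (forms == null)
                return result;

            for (int i = 0; i < forms.Count; i++)
            {
                var names = new List<string>();

                // ".//input" searches only beneath the current form node.
                HtmlNodeCollection inputs = forms[i].SelectNodes(".//input");
                if (inputs != null)
                {
                    foreach (HtmlNode input in inputs)
                        names.Add(input.GetAttributeValue("name", ""));
                }

                result[i] = names;
            }

            return result;
        }
    }
    ```

    Because `ElementsFlags` is static, removing the entry affects every document parsed afterwards in the same process, so it only needs to be done once at startup.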