Need a C# coder who knows how to scrape

Discussion in 'Hire a Freelancer' started by kytro360, Dec 24, 2011.

  1. kytro360

    kytro360 Power Member

    Joined:
    Jan 12, 2010
    Messages:
    703
    Likes Received:
    732
    Hi, I am looking to hire a C# coder who knows how to scrape.

    I am coding a project and need a scraper that works with what I already have. I would need the source code as well.

    If you know how to scrape and think you are a good fit for the job, please PM me.

    I need this done fast!
     
  2. notrin

    notrin Power Member

    Joined:
    Apr 15, 2010
    Messages:
    643
    Likes Received:
    71
    Occupation:
    Self Employed Web Master
    Location:
    Montana, USA
    Scraping is actually fairly simple. I have a scraper built in C# that gets the URLs from a given URL.

    Here is some code I modified in C#.

    It's not 100% working, but it does return the URLs.

    Code:
    
            protected void btnScrape_Click(object sender, EventArgs e)
            {
                using (var context = new netFinal.scrapeContainer())
                {
                    target etarget = new target();
                    etarget.name = txtScrapeUrl.Text;
    
    
                    ArrayList a = new ArrayList();
                    string myString = null;
                    byte[] aRequestHTML;
    
                    // download the raw bytes of the page at the URL typed into the textbox;
                    // the using block disposes the WebClient once the download finishes
                    using (WebClient objWebClient = new WebClient())
                    {
                        aRequestHTML = objWebClient.DownloadData(txtScrapeUrl.Text);
                    }
    
                    // decode the downloaded bytes as UTF-8 text (assumes the page is UTF-8 encoded)
                    myString = Encoding.UTF8.GetString(aRequestHTML);
    
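                    // regular expression that matches absolute URLs; the hyphen is escaped so that
                    // "+-=" is not read as a character range (which would also match ',' and '<')
                    Regex r = new Regex(@"((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+\-=\\\.&]*)");
                    // alternative pattern that also matches mailto: and news: links
                    // Regex r = new Regex(@"((mailto\:|(news|(ht|f)tp(s?))\://){1}\S+)");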
    
                    // get all the matches depending upon the regular expression
                    MatchCollection mcl = r.Matches(myString);
                    context.AddTotargets(etarget);
                    foreach (Match ml in mcl)
                    {
                        a.Add(ml.Value);
                        if (ml.Value.IndexOf("@") > 0)
                        {
                            // matches containing "@" are stored as emails
                            email eMail = new email();
                            eMail.domain = ml.Value;
                            eMail.target = etarget;
                            context.AddToemails(eMail);
                        }
                        else
                        {
                            // everything else is stored as a url
                            url newUrl = new url();
                            newUrl.domain = ml.Value;
                            newUrl.target = etarget;
                            context.AddTourls(newUrl);
                        }
                    }
                    // all rows are written by the single SaveChanges call after the loop
                    // assign arraylist to the datasource
    
                    //dataGridView1.DataSource = a;
    
    
                    // The following commented-out lines would write the downloaded HTML (not the extracted URLs) to a file named test.txt
                    //StreamWriter sw = new StreamWriter("test.txt");
                    //sw.Write(myString);
                    //sw.Close();
                    
    
                    context.SaveChanges();
                }
            }
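
For anyone who wants to try just the link-extraction step from the handler above without the database or the button click, here is a minimal sketch. The class name `LinkExtractor`, the method `ExtractUrls`, the simplified pattern, and the example.com sample HTML are all made up for illustration and are not part of the code above. It also skips duplicate URLs, which the original handler does not do.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Hypothetical simplified pattern: a scheme, "://", then every character
    // up to the next whitespace, quote, or angle bracket.
    private static readonly Regex UrlPattern =
        new Regex(@"(?:https?|ftp)://[^\s""'<>]+", RegexOptions.IgnoreCase);

    // Returns each distinct URL found in the HTML, in order of first appearance.
    public static List<string> ExtractUrls(string html)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var urls = new List<string>();
        foreach (Match m in UrlPattern.Matches(html))
        {
            // HashSet.Add returns false when the URL was already seen
            if (seen.Add(m.Value))
            {
                urls.Add(m.Value);
            }
        }
        return urls;
    }

    public static void Main()
    {
        // Sample input invented for this sketch; the first link appears twice.
        string html = "<a href=\"http://example.com/a\">A</a> "
                    + "<a href='https://example.com/b?x=1'>B</a> "
                    + "<a href=\"http://example.com/a\">again</a>";

        foreach (string url in ExtractUrls(html))
        {
            Console.WriteLine(url);
        }
    }
}
```

Keeping the parsing in a plain method that takes a string means it can be checked against saved HTML first, and the download and database code can be wired around it afterwards.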

    I could probably do this. The code above was a school project, and I have some people I can get help from too.

    What price are you offering?
     
    Last edited: Dec 24, 2011