
Help scraping YouTube URLs

Discussion in 'C, C++, C#' started by mintuz, Mar 12, 2011.

  1. mintuz

    mintuz Newbie

    Joined:
    Jan 15, 2011
    Messages:
    42
    Likes Received:
    3
    Hey, I have nearly finished a program I have been working on for the last couple of months that can mass-comment on YouTube videos. At the moment the user has to type in each URL separately. What I want is a scrape function that pulls all YouTube URLs matching a specific keyword. Can somebody help me? I have been reading over the YouTube API but cannot find anything relevant.

    Please help me; this program could be released by tomorrow, and I want to share it on Black Hat World.

    It already has the ability to increase YouTube views, and I think it will be a valuable piece of software for any YouTube marketer.

    I am using C# with .NET 4.0.
     
  2. Stalli0n

    Stalli0n Junior Member

    Joined:
    Nov 17, 2010
    Messages:
    115
    Likes Received:
    83
    Location:
    Europe
    What do you need the API for? ^^

    Just use HttpWebRequest, Sockets, WebClient, etc.
    to get the source of all the result pages:
    Youtube dot com/results?search_query={keyword}&page={n}

    Then use Regex to get the video URLs:
    <h3 id="video-long-title-XXXXXXXX"><a href="/watch?v=XXXXXXX" title=".........">
     
  3. smack

    smack Junior Member

    Joined:
    Feb 1, 2010
    Messages:
    182
    Likes Received:
    78
    Occupation:
    Software Engineer/Evil Genius
    Location:
    inside .NET
    Yeah, I don't think you're going to want to use the API of any site for any sort of black hat functionality.

    That's an easy way to get noticed by the back office staff and have your accounts killed quite quickly.

    Stalli0n has it exactly right for getting those URLs.
     
  4. theMagicNumber

    theMagicNumber Regular Member

    Joined:
    May 13, 2010
    Messages:
    345
    Likes Received:
    195
    You have probably solved your issue by now, but here is the regex to get YouTube video IDs from the search results page.
    "data-video-ids=\"(.+?)\""
     
  5. mintuz

    Not sure how to use regex; I've been using HtmlAgilityPack. I still haven't solved my problem. My comment submitter was done using the API, but I got around that by having the user supply their own API key, so the program itself can never actually be banned.
     
  6. Stalli0n

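    Code:
    // Requires: using System.Text.RegularExpressions;
    Regex videoUrls = new Regex("data-video-ids=\"(.+?)\"");

    // "html" is assumed to hold the source of one search results page
    foreach (Match m in videoUrls.Matches(html))
    {
        // m.Value is the whole match; Groups[1] is the captured attribute value
        string videoId = m.Groups[1].Value;
        // ... etc ...
    }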
    
     
    Last edited: Mar 15, 2011
  7. theMagicNumber

    Well, if you still haven't figured that out, here is some code for you.
     

    Attached Files:

  8. mintuz

    How would you return more than one page of results? My method now uses HtmlAgilityPack, but I am trying to cut down on the amount of code written because, overall, my code is getting quite hard to manage. I think once everything is complete and fully working I might rewrite it to be tidy. All I have left to do is the scraper; everything else is functional. The issue I am having is returning more than one page of results at a time.
     
  9. Stalli0n

    Just iterate through all the pages and return them all at once ^^

    And if you want to keep it clean, for example:

    Code:
    private List<string> getSearchResult(string keyword)
    {
        // In here, just iterate through all the pages, get the source code of each,
        // and use it to extract the URLs with a helper such as:
        // private List<string> extractVideoURLs(string html)
    }
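
The extraction helper named in the comment above could look roughly like the sketch below. It only parses a string that is already in memory. The pattern is the one quoted earlier in the thread and reflects the page markup of that time, so it is an assumption that it still matches anything. The class name, the comma-splitting, and the sample HTML are hypothetical and only there for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class VideoUrlExtractor
{
    // Pattern quoted earlier in the thread (assumed markup, may be outdated).
    private static readonly Regex VideoIds = new Regex("data-video-ids=\"(.+?)\"");

    public static List<string> ExtractVideoUrls(string html)
    {
        var seen = new HashSet<string>();   // used to skip duplicate IDs
        var urls = new List<string>();

        foreach (Match m in VideoIds.Matches(html))
        {
            // Groups[1] is the captured attribute value; m.Value would be the whole match.
            // Assumption: the attribute may hold several comma-separated IDs.
            foreach (string id in m.Groups[1].Value.Split(','))
            {
                string trimmed = id.Trim();
                if (trimmed.Length > 0 && seen.Add(trimmed))
                {
                    urls.Add("/watch?v=" + trimmed);
                }
            }
        }
        return urls;
    }
}

class Program
{
    static void Main()
    {
        // Hypothetical sample input, not real page source.
        string html = "<div data-video-ids=\"abc123\"></div>" +
                      "<div data-video-ids=\"abc123\"></div>" +
                      "<div data-video-ids=\"xyz789\"></div>";

        foreach (string url in VideoUrlExtractor.ExtractVideoUrls(html))
        {
            Console.WriteLine(url);
        }
    }
}
```

Keeping the parsing in its own method means it can be checked against a saved page without making any requests, and the `HashSet` keeps a video that appears on several pages from being listed twice.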