[DEV] PHP crawler for scraping video sites

Discussion in 'Black Hat SEO' started by "Don Vito" Genovese, May 15, 2013.

  1. "Don Vito" Genovese

    I'm starting a project in the adult video sharing niche. After seeing someone on the forum talk about an idea like this, I want to try it out!
    I'm thinking about using a simple, cost-effective hosting solution (cost is a big factor right now) and using videos hosted on other servers as my content.

    What I'm doing right now is looking for a crawler that I could run locally to scrape videos from a given video site. I have already developed the core crawler;
    what I still need is the ability to recognize video player footprints and video file URLs.
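
    To make the missing piece concrete, here is a rough sketch of what "recognizing player footprints and video file URLs" could look like in PHP. It is only an illustration: the function names `detectPlayers` and `extractVideoUrls`, the footprint substrings, and the list of file extensions are all assumptions, not something taken from an existing crawler. It also only sees what is in the fetched markup, so anything a page builds later with JavaScript would not show up.

```php
<?php
// Hypothetical footprint list: substrings that may appear in pages embedding
// a given player. Illustrative guesses only, not a verified catalogue.
const PLAYER_FOOTPRINTS = [
    'jwplayer'   => 'JW Player',
    'flowplayer' => 'Flowplayer',
    'video-js'   => 'Video.js',
    'flashvars'  => 'Flash embed (flashvars)',
];

/** Return the names of players whose footprint substring appears in the HTML. */
function detectPlayers(string $html): array
{
    $found = [];
    foreach (PLAYER_FOOTPRINTS as $needle => $name) {
        // Case-insensitive substring search over the raw markup.
        if (stripos($html, $needle) !== false) {
            $found[] = $name;
        }
    }
    return $found;
}

/** Return unique quoted URLs in the HTML that end in an assumed video extension. */
function extractVideoUrls(string $html): array
{
    // Matches a single- or double-quoted value ending in .mp4/.flv/.webm/.m3u8,
    // optionally followed by a query string.
    $pattern = '~["\']([^"\'\s]+\.(?:mp4|flv|webm|m3u8)(?:\?[^"\'\s]*)?)["\']~i';
    preg_match_all($pattern, $html, $matches);
    return array_values(array_unique($matches[1]));
}
```

    The core crawler would pass each fetched page body to both functions and store the results next to the page URL. A regex over raw markup is the simplest thing that could work here; parsing the page with `DOMDocument` would be more robust for `<video>` and `<source>` tags but would miss URLs that sit inside inline scripts.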

    Have you ever used something similar? If so, where did you get it, and what results did you have?
    Also, what do you think of my idea? Is it legal? Is it worth it? I've already seen the method applied on other sites, so I know it's possible. Do these sites have some kind of affiliation with the ones hosting the content? What I mean is: do I have to tell them that I'm going to be using their content? Could I be held legally responsible for using their content? What experience have you had with similar methods?

    One last question, for the more technically inclined BHatters out there: what about cloaked video sources? Take booloo, for example: I cannot get the real URL of the source video, because they use JavaScript to build the player and the flashvars after the page has loaded, so PHP can't get to it. Is there any option to use a browser plugin, or to make a page with JavaScript code that could capture the real video stream source? And then just automatically open a tab for each page I crawl and let JavaScript extract the sh*t out of it?