dannyhw
Senior Member
- Jul 16, 2008
- 979
- 476
So I actually wrote this bot because I'm producing a record for a major rap artist and I wanted a way to build a huge archive of music to analyze and snatch samples from, fast. I turned out way more finished tracks than we need, but I'm wondering if anyone has any other applications. I could work something out on my own, but I figure you guys know more than me on this end and we can make it sort of a community project.
The first step uses a few short keyword lists to generate a larger one. It then searches all of those terms, downloads every file (multithreaded, one file every 5-10 seconds) and converts them to MP3. Next it builds a corpus from the titles of the videos it got and extracts n-grams to expand the keyword list. In practice you can't exhaust the keyword list, though once in a while it pays to analyze it and change the seed keywords to keep it relevant.
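For anyone curious what the n-gram step looks like, here is a rough sketch of just the keyword-expansion part, not my actual code. The function names (`tokenize`, `extract_ngrams`, `expand_keywords`) and the `min_count` and `top_k` parameters are made up for illustration, and it assumes the titles are already sitting in a plain list of strings.

```python
import re
from collections import Counter


def tokenize(title):
    # Lowercase the title and keep only simple word tokens.
    return re.findall(r"[a-z0-9']+", title.lower())


def extract_ngrams(titles, sizes=(2, 3)):
    # Count every word n-gram of the given sizes across all titles.
    counts = Counter()
    for title in titles:
        tokens = tokenize(title)
        for n in sizes:
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return counts


def expand_keywords(keywords, titles, min_count=2, top_k=50):
    # Append the most frequent n-grams that are not already keywords.
    # min_count drops one-off phrases; top_k caps growth per round.
    known = {k.lower() for k in keywords}
    counts = extract_ngrams(titles)
    new_terms = [
        gram for gram, count in counts.most_common()
        if count >= min_count and gram not in known
    ]
    return list(keywords) + new_terms[:top_k]


if __name__ == "__main__":
    titles = [
        "Deep Soul Funk 1972",
        "Rare Soul Funk 45",
        "Soul Funk instrumental",
    ]
    print(expand_keywords(["soul"], titles))
```

Raising `min_count` is one way to keep the list from drifting off topic between rounds, since only phrases that recur across several titles make it back into the keyword list.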
One thing I noticed is that a lot of the videos I'm snatching are definitely copyrighted, but they're obscure enough that they've been up for a long time and have tons of views. Basically, it's easy to filter down to content that is very, very unlikely to get taken down.
Re-encoding with new photos in bulk seems like a no-brainer, but I'm wondering about monetization. You think AdSense would draw heat? I was also thinking I could easily generate links to a dynamic content locker for the "mp3 download", where unlocking would lead to a script that just downloads the MP3 from YouTube (there are plenty, and I obviously already have the code).
I could set this all up easily, but has anyone got the know-how to fly under the radar more effectively or inflate views or whatever?