Custom bot I wrote for bulk sampling music, useful to anyone else?

May 20, 2014

    Jul 16, 2008
    So I actually wrote this bot because I'm producing a record for a major rap artist and I wanted a fast way to build a huge archive of music to analyze and snatch samples from. I turned out way more finished tracks than we need, but I'm wondering if anyone has any other applications. I could work something out on my own, but I figure you guys know more than me on this end and we can make it sort of a community project.

    The first step uses a few short keyword lists to generate a larger one. It then searches all those terms, downloads every file (multithreaded, one file every 5-10 seconds) and converts them to MP3. Next it builds a corpus from the titles of the videos it got and extracts n-grams to expand the keyword list. In practice you can't exhaust the keyword list, though once in a while it pays to analyze it and change the seed keywords to keep it relevant.
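
    For anyone curious about the n-gram part, here is a rough sketch of how the title corpus could feed back into the keyword list. This is not my actual code, just a minimal illustration: the names (expand_keywords, STOPWORDS), the stopword list, and the min_count / top_k thresholds are all assumptions made up for the example.

```python
import re
from collections import Counter

# Hypothetical stopword list; tune it for your own corpus.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "to", "for",
             "feat", "ft", "official", "video"}


def tokenize(title):
    """Lowercase a title and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", title.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def expand_keywords(titles, known, n_values=(2, 3), min_count=2, top_k=50):
    """Suggest new keyword phrases from frequent n-grams in the titles.

    Phrases already in `known` are skipped, as are n-grams that start
    or end with a stopword. Thresholds are illustrative assumptions.
    """
    counts = Counter()
    for title in titles:
        tokens = tokenize(title)
        for n in n_values:
            for gram in ngrams(tokens, n):
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1

    known_lower = {k.lower() for k in known}
    fresh = [phrase for phrase, count in counts.most_common()
             if count >= min_count and phrase not in known_lower]
    return fresh[:top_k]


if __name__ == "__main__":
    sample_titles = [
        "Deep Soul Funk 1972 rare groove",
        "Rare Groove soul funk compilation",
        "soul funk instrumental",
    ]
    print(expand_keywords(sample_titles, known=["soul funk"]))
```

    Feeding the result back in as the next round of search terms is what keeps the list from running dry. Raising min_count is one way to keep it from drifting off topic between manual reviews of the seed keywords.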

    One thing I noticed is that a lot of the videos I'm snatching are definitely copyrighted, but they're obscure enough that they've been up for a long time and have tons of views. So it's easy to filter the results down to content that is very, very unlikely to get taken down.

    Re-encoding with new photos in bulk seems like a no-brainer, but I'm wondering about monetization. You think AdSense would draw heat? I was also thinking I could easily generate links to a dynamic content locker for the "mp3 download", but unlocking would lead to a script that just downloads the MP3 from YouTube (there are plenty, I obviously already have the code).

    I could set this all up easily, but has anyone got the know-how to fly under the radar more effectively or inflate views or whatever?