Automating a meme: Compound Movies

14 Feb 2011

Humour 

Earlier tonight dylanbeattie started tweeting a few things with the #compoundmovies hashtag. The basic idea is that you take two films where the last word of the first title is the same as the first word of the second, and mash them together. Some of the results are pretty funny.
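To make the rule concrete, here's a minimal sketch of the matching step in Python. It isn't the actual generator script, just an illustration; the function name `compound_titles` is made up for this example, and I'm assuming case-insensitive matching on whitespace-separated words and skipping one-word titles (otherwise the "mash-up" would just be the other film's title).

```python
from collections import defaultdict


def compound_titles(titles):
    """Yield mash-ups where one title's last word is another title's first word.

    Assumptions for this sketch: words are split on whitespace, matching is
    case-insensitive, and one-word titles are ignored.
    """
    # Index every multi-word title by its (lower-cased) first word.
    by_first_word = defaultdict(list)
    for title in titles:
        words = title.split()
        if len(words) >= 2:
            by_first_word[words[0].lower()].append(title)

    # For each title, look up titles that start with its last word.
    for title in titles:
        words = title.split()
        if len(words) < 2:
            continue
        for other in by_first_word.get(words[-1].lower(), []):
            if other != title:
                # Keep the shared word once: "Die Hard" + "Hard Candy".
                yield " ".join(words + other.split()[1:])


if __name__ == "__main__":
    for mashup in compound_titles(["Die Hard", "Hard Candy", "Candy Man", "Alien"]):
        print(mashup)
```

Indexing by first word means each title only needs one dictionary lookup instead of being compared against every other title, which matters once the list gets big.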

Of course, this lends itself to automation. When I suggested trawling IMDB, he pointed out that IMDB already supplies the data for download, so no trawling is needed. However, IMDB contains far too many movies, most of them with titles whose meaning I don't even know, and generating compound titles from the full IMDB data would take an insane amount of time. There's also Wikipedia, which, despite some complaints I'd heard about its API, appears to be pretty easy to use.

So, here we go, a #compoundMovies generator. Either run it as "compoundGenerator.py imdb" if you've got lots of time, or as "compoundGenerator.py wikipedia <some category name>" (without the "Category:" bit). I tried it out on "British_films" and got back a few interesting things:

I then found "English-language_films" which gets us

(That gets 173k entries, so it's only a sampling).
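For anyone curious what the Wikipedia side might look like, here's a rough sketch of listing the pages in a category through the MediaWiki API (`action=query` with `list=categorymembers`). This is not the code from the generator script; the helper names (`build_params`, `parse_page`, `category_titles`), the `max_pages` cap and the User-Agent string are all invented for this example.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_params(category, cmcontinue=None):
    """Build query parameters for one page of category members."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": "Category:" + category,  # caller passes the name without "Category:"
        "cmnamespace": "0",                 # articles only, no sub-categories
        "cmlimit": "500",
        "format": "json",
    }
    if cmcontinue:
        params["cmcontinue"] = cmcontinue
    return params


def parse_page(data):
    """Return (titles, continuation token or None) from one decoded response."""
    members = data.get("query", {}).get("categorymembers", [])
    titles = [member["title"] for member in members]
    token = data.get("continue", {}).get("cmcontinue")
    return titles, token


def category_titles(category, max_pages=5):
    """Yield page titles, following continuation for at most max_pages requests."""
    token = None
    for _ in range(max_pages):
        url = API_URL + "?" + urllib.parse.urlencode(build_params(category, token))
        # The User-Agent value is a placeholder for this sketch.
        request = urllib.request.Request(
            url, headers={"User-Agent": "compound-movies-sketch/0.1"}
        )
        with urllib.request.urlopen(request) as response:
            data = json.load(response)
        titles, token = parse_page(data)
        yield from titles
        if token is None:
            break
```

Something like `list(category_titles("British_films"))` would then give a list of titles to feed into the matching step. Big categories need many requests, hence the page cap here.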

Anyone else got other good categories? Note that this will also work on other things that aren't movies...
