Finding new albums by old bands
29 Dec 2011
Technology
Originally posted at https://tech.labs.oliverwyman.com/blog/2011/12/29/finding-new-albums-by-old-bands/
I’m once again visiting music-related problems, with a look at a different aspect of the “discovering new music” problem. Now, the LShift jukebox is very good at introducing me to new and weird artists, and it also occasionally tells me about other work by artists I’m already aware of, but this is all rather haphazard.
The core problem I’m trying to address is as follows: you find a new artist, you really like them, and you buy all their CDs. The problem is that nothing notifies you when they put out a new release, and given that new releases are relatively rare events, you forget to check… Essentially I want Songkick, but for CDs (or MP3 albums, but the principle still applies).
Enter Missing Albums. It works as follows:
- Trawls through your existing music collection, finds all the bands you’ve got at least 4 tracks by (which tends to eliminate most of the uninteresting bands), and builds an album list for each based on the tags of those files (the presence of any track from an album is assumed to imply knowledge of that album).
- Grabs the Musicbrainz data for those bands, and establishes a list of albums for each.
- Finds the dates of the albums you already have by a band, determines the newest one, and gets the list of all albums by that band released after that date.
- Grabs Amazon price data and the album cover for each newer album, and spits out a list of albums in reverse chronological order. Genshi is used to give this nice formatting.
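The first three steps can be sketched in a few lines of Python. This is only an illustration, not the actual missing-albums.py code: the function names, the `(artist, album)` tuple layout and the `discography` dictionary (standing in for whatever the Musicbrainz lookup returns) are all assumptions made for the example, and the band names are made up.

```python
from collections import Counter
from datetime import date

MIN_TRACKS = 4  # the "at least 4 tracks" threshold described above


def known_albums(tracks, min_tracks=MIN_TRACKS):
    """Map each artist with enough tracks to the album titles seen in the tags.

    `tracks` is an iterable of (artist, album) pairs read from file tags
    (a hypothetical layout chosen for this sketch).
    """
    tracks = list(tracks)
    counts = Counter(artist for artist, _album in tracks)
    albums = {}
    for artist, album in tracks:
        if counts[artist] >= min_tracks:
            albums.setdefault(artist, set()).add(album)
    return albums


def missing_albums(owned, discography):
    """Return (artist, title, release_date) for albums newer than the newest owned one.

    `discography` maps artist -> {album title: release date}; here it is a
    plain dict standing in for data fetched from Musicbrainz.
    """
    results = []
    for artist, titles in owned.items():
        releases = discography.get(artist, {})
        owned_dates = [releases[t] for t in titles if t in releases]
        if not owned_dates:
            continue  # none of the owned albums could be dated, so there is no cut-off
        newest = max(owned_dates)
        results.extend(
            (artist, title, released)
            for title, released in releases.items()
            if released > newest
        )
    # Newest releases first, matching the reverse chronological output
    return sorted(results, key=lambda r: r[2], reverse=True)


if __name__ == "__main__":
    tracks = [("Band A", "First")] * 3 + [("Band A", "Second")] * 2 + [("Band B", "Only")] * 2
    discography = {
        "Band A": {
            "First": date(2005, 1, 1),
            "Second": date(2008, 6, 1),
            "Third": date(2011, 3, 1),
        },
    }
    for artist, title, released in missing_albums(known_albums(tracks), discography):
        print(artist, title, released.isoformat())
```

In this made-up data “Band B” falls below the track threshold and is dropped, while “Band A” gets one suggestion, the album released after the newest one already in the collection.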
Results look like this:
The tool manages to solve my core problem, i.e. finding out about new albums, but it’s still got a number of flaws:
- Musicbrainz is quite slow, so this is a batch-processed command-line app for the moment.
- The only identifier we’ve got for a band is its name, and that’s non-unique in a large enough set of cases to matter. Brian Whitman’s excellent article on the issues with Facebook’s music ID system illustrates this well, and I’ve got some pretty nasty cases just in my collection (e.g. I have tracks from both the Belgian rock band and the Trance artists called “Deus”, and there are another 2 entries for that name in Musicbrainz; there are 5 entries all called “Muse” and 7 called “James”). In this tool, I’ve hacked around the issue by ignoring anyone with no albums listed on Amazon, which tends to thin things down to just the big commercial bands.
- My artist tagging scheme mostly lists bands starting with the word “The” as “Foo, The” rather than “The Foo”, but telling that the two forms refer to the same band is again tricky, because of the same disambiguation issue.
- Sometimes a band isn’t present at all in Musicbrainz, or there’s no Amazon data for an album. I’ve solved this by either a) adding the data into Musicbrainz, or b) adding artists to an “ignore” list. The missing-albums.py script takes as one of its arguments an “overrides” file, which is an ini-style file containing potentially two sections: the first is “artist”, which rewrites any artist names I’ve drastically misspelt, and the second is “ignore”, which is a list of artists to skip (e.g. “Theme”, the meta-artist name I’ve used to mark most of the TV/film tracks I’ve got).
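As a rough illustration of the last two points, here is one way an overrides file like that could be read with Python’s standard `configparser`. The exact layout is an assumption: this sketch guesses that the “artist” section holds `misspelt name = correct name` pairs and that the “ignore” section is a bare list of names, one per line. The function names, the sample entries and the “Foo, The” rewrite are likewise made up for the example rather than taken from missing-albums.py.

```python
import configparser

# Hypothetical overrides file; the real layout may differ.
SAMPLE_OVERRIDES = """\
[artist]
Exmaple Band = Example Band

[ignore]
Theme
"""


def load_overrides(text):
    """Parse ini-style overrides into (renames dict, ignored set)."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # lets [ignore] be a bare list of names
        delimiters=("=",),     # so a ":" inside an artist name is not a separator
        interpolation=None,    # so a "%" inside an artist name is left alone
    )
    parser.optionxform = str   # keep the case of artist names as written
    parser.read_string(text)
    renames = dict(parser["artist"]) if parser.has_section("artist") else {}
    ignored = set(parser["ignore"]) if parser.has_section("ignore") else set()
    return renames, ignored


def canonical_artist(name, renames):
    """Apply a rename if there is one, then turn "Foo, The" into "The Foo"."""
    name = renames.get(name, name)
    suffix = ", The"
    if name.endswith(suffix):
        name = "The " + name[: -len(suffix)]
    return name


if __name__ == "__main__":
    renames, ignored = load_overrides(SAMPLE_OVERRIDES)
    for tagged in ["Exmaple Band", "Foo, The", "Theme"]:
        if tagged in ignored:
            continue  # skip meta-artists such as "Theme"
        print(canonical_artist(tagged, renames))
```

Note that the “Foo, The” rewrite is only a string-level heuristic: it makes the two spellings compare equal, but it does nothing about two different bands sharing one name, which is the harder disambiguation issue described above.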
On the other hand, none of these issues is too awful for a command-line tool. If I were trying to make a slick web interface for this, they’d be major problems, but I’m willing to skip the odd dodgy result, as the results are very usable already and have led to some purchases.