Originally posted at https://tech.labs.oliverwyman.com/blog/2017/07/25/tailgate-calendar-data-for-books/
I’ve had an idea for a while, but like many good ideas it has the problem “but where do we get the data from?”. The idea in a nutshell: Songkick, but for authors. Songkick, for those who don’t use it, is a service that lets you track bands and get told when they announce new gig dates. I don’t have that much interest in gigs these days, but I do want to know when my favourite authors release new books. As with most ideas like this, I don’t want the high-noise feed that is their social media, I want the announce-only version. Luckily, I was reminded of Goodreads the other day, and it turns out it has an API…
Well, most of an API. The developer forums are full of posts complaining about it, and I now get why. First problem: OAuth, with no details of which version (1, for those curious) or where the various token endpoints are (they’re buried in their Python example, so that’s ok). Next problem: users can follow authors, and in theory there’s an endpoint that’ll list those. Except that it only seems to list authors who are also Goodreads users (which is true for some and not for others). Sigh. Luckily the list of followed authors is public information, so time to break out the regex (yes, I’m aware of the general thoughts on HTML parsing with regex). Third item of fun: there’s a “list author books” endpoint, which gives you random editions of a book (and the “list editions” endpoint is “contact us for access”). But wait, if we search for an author instead, we a) get the “best edition” of each book (which is almost always good enough) and b) can easily filter out other results that merely have that author name in them.
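To make the last two workarounds a bit more concrete, here is a rough sketch in Python. It is not the code from the actual app: the `/author/show/<id>.<Name>` link shape is an assumption about how the public pages link to authors, and `followed_authors`, `books_by` and the `author_id` key on each search hit are hypothetical names invented for illustration.

```python
import re

# Assumption: author links on the public page look like
#   /author/show/<numeric id>.<Name_With_Underscores>
AUTHOR_LINK = re.compile(r'href="[^"]*/author/show/(\d+)\.([^"?#]+)')


def followed_authors(html):
    """Return {author_id: name} for every author link found in the page."""
    authors = {}
    for author_id, slug in AUTHOR_LINK.findall(html):
        # The same author can be linked several times; keep the first hit.
        authors.setdefault(int(author_id), slug.replace("_", " "))
    return authors


def books_by(author_id, search_results):
    """Keep only search hits credited to this author id.

    search_results is a hypothetical list of dicts, one per hit,
    already parsed out of whatever the search call returned.
    """
    return [hit for hit in search_results if hit["author_id"] == author_id]
```

Matching on the numeric id rather than the name is what makes the filtering step easy: a search for an author's name can also return books that mention the name in the title, or books by someone else with the same name, and those hits carry a different id.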
On the plus side, I’ve waded through all of that, and managed to make something that actually works, with thanks to Heroku for continuing to give me a stupid number of free app hours. Source is in the usual location.