Originally posted at https://tech.labs.oliverwyman.com/blog/2015/02/28/tumblr-blog-generator/
TL;DR version – OAuth sucks, Tumblr’s API has some notable faults.
So, a while back I came across the “dice shaming” meme. For those of you who haven’t seen this before, or who aren’t RPG players and so don’t know what’s going on, it’s a bunch of posts of people going “I rolled my dice at this critical point in a game and they screwed me over”. Despite my general dislike of superstition, I’ll admit to indulging a little bit in the idea of particular dice being lucky or unlucky, or having had “all the good rolls used up”, which is a comfort when things screw up!
Having seen this, I had the idea of building a little tool that’d generate a Tumblr blog based on various people’s posts that have been tagged appropriately. Nice, simple, easy, right? Yeah right…
First problem: OAuth is an [expletive redacted] for console apps. If you’re really, really lucky then the API builders have implemented xAuth, which will let you swap a username and password for tokens. But if you’re Tumblr, then xAuth is locked down and they’ll only enable it when a “trusted app is ready for mass distribution” (actual quote). Right, fine, so hacky solutions it is then. Luckily this is being written in Clojure, so spinning up a temporary web server is a doddle (thank you HTTP Kit and Compojure). I’m literally spinning up something on localhost so that I can use the OAuth redirect mechanism to get the tokens and dump them back out onto the filesystem. It’s actually that hacky. But hey, it works!
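To make the localhost trick concrete, here’s a minimal sketch in Python (not the Clojure, HTTP Kit and Compojure code the tool itself uses). It assumes the OAuth callback URL is registered as something like `http://127.0.0.1:8080/callback`; the port, the path and the function names are all made up for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def parse_callback(path):
    """Pull the OAuth 1.0a callback parameters out of a request path."""
    query = parse_qs(urlparse(path).query)
    wanted = ("oauth_token", "oauth_verifier")
    return {key: query[key][0] for key in wanted if key in query}


def wait_for_callback(port=8080):
    """Serve exactly one request on localhost and return its OAuth parameters.

    The port is an assumption; it has to match the registered callback URL.
    """
    captured = {}

    class CallbackHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            captured.update(parse_callback(self.path))
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"Got the tokens, you can close this tab.\n")

        def log_message(self, format, *args):
            pass  # keep the console quiet

    server = HTTPServer(("127.0.0.1", port), CallbackHandler)
    try:
        server.handle_request()  # blocks until the browser is redirected here
    finally:
        server.server_close()
    return captured
```

The console app would open the provider’s authorize URL in a browser, call `wait_for_callback()`, then exchange the returned verifier for access tokens and write those to disk.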
Second problem: OAuth implementations vary. According to the OAuth 1.0 (and 1.0a) specs, there are both Authorization header and query-string parameter options for the auth tokens. However, they’re “recommended” and “should” respectively, which in the real world seems to mean “We’ll support one of them. Read our documentation very carefully”. Sadly the Clojure OAuth library I was using went with query-string for all its examples (which appears to be more commonly supported), whereas Tumblr only supports the Authorization header. Luckily there was at least a method I could call to get the header values (here’s how I use it), but it took far longer than it should have to work out that this was the problem. Because this is a security thing, nice error messages aren’t something you’re going to get, even though telling developers “you don’t have an Authorization header at all”, for example, would not compromise anything and would tell them exactly where they’ve screwed up.
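For reference, the header form looks roughly like the output of the sketch below. This is Python rather than the Clojure library’s own method, it assumes the parameters have already been signed, and `oauth_header` plus the sample values are invented for illustration.

```python
from urllib.parse import quote


def oauth_header(params):
    """Format already-signed OAuth 1.0 parameters as an Authorization header value."""

    def enc(value):
        # Percent-encode everything except the RFC 3986 unreserved characters.
        return quote(str(value), safe="-._~")

    pairs = ", ".join(
        f'{enc(key)}="{enc(value)}"' for key, value in sorted(params.items())
    )
    return "OAuth " + pairs


# Hypothetical values; real ones come out of the signing step.
header = oauth_header({"oauth_token": "a b", "oauth_signature": "x/y="})
# header == 'OAuth oauth_signature="x%2Fy%3D", oauth_token="a%20b"'
```

The same key/value pairs could instead be appended to the URL as a query string, which is the form the library’s examples used; the point is that a given API may only accept one of the two.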
Right, tokens acquired, OAuth signing sorted. The nice bit about this sort of app these days is that Clojure + Light Table is a really nice development setup for playing around with things. The -> (thread-first) macro is also insanely awesome for pulling data out of an API and manipulating it into different forms.
And then we run into problem number 3: Tumblr’s API design. Specifically, the “tagged” method I’m using supports a “before” parameter (“get things before this timestamp”) but not an “after” one. Now if it supported “after”, I’d just set it to the beginning of time (or the Unix epoch), walk through things to the present day, then record the last timestamp. Next time I run things, I’d ask for everything after that timestamp, and I’d know I’ve got a complete set of posts. With “before”, I can keep walking backwards, but there’s no sane way to get all the newer posts, and the API call overhead for updates is a bit nuts. TBH, the only good reason I can think of for this is to avoid people doing this sort of complete trawl (or there’s some sort of weird architectural reason for the choice). I’ve emailed them about this, but I’m not betting on a response.
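To show where the overhead comes from, here’s a rough Python sketch of what catching up with only a “before” parameter would involve. It is not what the tool does. `fetch_page` is a hypothetical stand-in for the API call, assumed to return posts newest first as dicts with a `timestamp` key, and to treat `before` as exclusive.

```python
def fetch_new_posts(fetch_page, last_seen):
    """Walk backwards from now until we reach posts we have already seen.

    fetch_page(before) is a hypothetical callable: before=None means
    "start from the newest post", otherwise return posts older than `before`.
    """
    new_posts = []
    before = None
    while True:
        page = fetch_page(before)
        if not page:
            break  # ran out of posts entirely
        fresh = [post for post in page if post["timestamp"] > last_seen]
        new_posts.extend(fresh)
        if len(fresh) < len(page):
            break  # this page overlapped with what we already have
        before = page[-1]["timestamp"]  # oldest timestamp on this page
    new_posts.reverse()  # oldest first, ready for reposting in order
    return new_posts
```

Every run has to start at the present and page backwards, one call per page, until it overlaps with the previous run. Posts that share a timestamp across a page boundary could also be skipped, which an “after” parameter would sidestep.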
Given this problem, I don’t actually do the timeline walking, and instead just post any missing items from the first 20 results (maximum you can get back in one go). Naughty Dice is functional and the project itself will let you specify any tag you like, so others can at least use it to some extent.
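The simpler approach amounts to a set difference over the latest page. A minimal Python sketch, assuming each post is a dict with an `id` key and that the IDs of previously reposted items are stored somewhere (`missing_posts` and `already_posted_ids` are illustrative names, not the project’s):

```python
def missing_posts(latest, already_posted_ids):
    """Return the posts from the latest page that have not been reposted yet.

    `latest` is assumed to be newest first; the result is oldest first.
    """
    missing = [post for post in latest if post["id"] not in already_posted_ids]
    return list(reversed(missing))
```

If more than a page’s worth of tagged posts appear between runs, the older ones are simply never picked up, which is the trade-off for skipping the timeline walk.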