Originally posted at https://tech.labs.oliverwyman.com/blog/2020/01/31/munger/
Despite the ongoing good work in many places to move to fully digital options, many organisations remain committed to sending you shards of dead tree through the post (particularly the NHS, though I can understand that given how they’ve been burnt in the past). Keeping track of all this paper is tricky, and the hardest part is having the right piece to hand when you need it. So I needed a scanner setup, ideally one that could be deployed in my kitchen so incoming items could be fed straight in. Also on the wishlist was not having to think much about it once set up: ideally just ‘insert paper, push the button, scanned document magically appears in shared storage’.
I acquired a Fujitsu ScanSnap S1300i, which, despite its marketing info saying ‘Scan to Cloud’, doesn’t. The ‘ScanSnap Cloud’ software is meant to be able to scan to Dropbox, but doesn’t. Even if it did, the scanner needs to be connected to a Windows or Mac machine to work at all, and I don’t have space for one of those in my kitchen. On the other hand, it’s a good scanner, so I thought I could reuse one of the many spare Raspberry Pis I have around.
I’ve built a tool called munger that does all this (I won’t bore you with the install instructions, but they’re in the repository). It re-uses an updated version of my Raspberry Chef work from 2015. Originally I’d planned to just do something like the various other cloud scanners out there, but the Pi I’ve got spare (a 1st generation Model B, so only 512 MB of RAM and a fairly anemic processor) took forever to do anything, so I’ve ended up with a ‘the Pi does the scan and then hands over raw images to a server elsewhere’ model instead. I also happened to have a spare screen for the Pi, so it can show status information and I know what it’s doing along the way.
The full setup looks as follows, and the steps are now:
1. Open the scanner and insert the pages
2. Push the scan button
3. Wait for it to finish scanning and uploading
4. Wait for the server-side work to make the PDF
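To make the ‘Pi scans, server does the heavy lifting’ split a bit more concrete, here is a rough Python sketch of what the Pi side of such a setup could look like. It is not munger’s actual code: the upload URL, the `X-Filename` header, the file naming, and the function names are all made up for illustration, and it assumes SANE’s `scanimage` is installed and picks the scanner as its default device.

```python
import subprocess
import urllib.request
from pathlib import Path

# Hypothetical endpoint: the real server address and protocol are assumptions.
UPLOAD_URL = "http://scan-server.local:8000/pages"


def scan_command(out_dir, resolution=300):
    """Build a scanimage (SANE) command that writes one PNM file per page."""
    return [
        "scanimage",
        "--format=pnm",
        # --resolution is a backend option; most scanner backends provide it.
        f"--resolution={resolution}",
        # In batch mode scanimage keeps scanning and fills in %03d per page.
        f"--batch={out_dir / 'page-%03d.pnm'}",
    ]


def scan_pages(out_dir):
    """Run the scan and return the page files it produced, in order."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # The exit status is deliberately not checked: this sketch treats
    # "at least one page file exists" as success instead.
    subprocess.run(scan_command(out_dir))
    return sorted(out_dir.glob("page-*.pnm"))


def upload_page(path, url=UPLOAD_URL):
    """POST one raw page to the (hypothetical) server; return the HTTP status."""
    request = urllib.request.Request(
        url,
        data=path.read_bytes(),
        headers={
            "Content-Type": "image/x-portable-anymap",
            "X-Filename": path.name,  # made-up header for this sketch
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    pages = scan_pages(Path("scans"))
    if not pages:
        raise SystemExit("no pages scanned")
    for page in pages:
        upload_page(page)
```

The point of the design is that the Pi never touches image conversion: it only shuffles raw bytes from the scanner to the network, which is about all a first-generation Model B can do at a reasonable speed.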
Along the way (because the server-side Docker image needs it) I made a working netpbm package for Alpine, because apparently things like .pnm files are still standard in scanners, so netpbm is needed to work with them.
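For anyone who hasn’t met .pnm files before, they are about as simple as image formats get: a short text header followed by raw pixel data. As an illustration only (this is not part of munger, and `read_pnm_header` is a name invented here), a minimal Python sketch of reading that header might look like this. It assumes comments only start at token boundaries, which is the common case.

```python
PNM_KINDS = {
    b"P1": "bitmap (ASCII)", b"P2": "greymap (ASCII)", b"P3": "pixmap (ASCII)",
    b"P4": "bitmap (raw)", b"P5": "greymap (raw)", b"P6": "pixmap (raw)",
}


def read_pnm_header(data):
    """Return (magic, width, height, maxval) from the start of a PNM file."""
    tokens = []
    pos = 0
    needed = 4  # magic, width, height, maxval
    while len(tokens) < needed:
        # Skip whitespace between header tokens.
        while pos < len(data) and data[pos:pos + 1].isspace():
            pos += 1
        # A '#' starts a comment that runs to the end of the line.
        if data[pos:pos + 1] == b"#":
            while pos < len(data) and data[pos:pos + 1] not in (b"\n", b"\r"):
                pos += 1
            continue
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        if start == pos:
            raise ValueError("truncated PNM header")
        tokens.append(data[start:pos])
        if len(tokens) == 1:
            if tokens[0] not in PNM_KINDS:
                raise ValueError("not a PNM file")
            if tokens[0] in (b"P1", b"P4"):
                needed = 3  # bitmaps have no maxval field
    magic = tokens[0].decode("ascii")
    width, height = int(tokens[1]), int(tokens[2])
    maxval = int(tokens[3]) if needed == 4 else 1
    return magic, width, height, maxval


if __name__ == "__main__":
    sample = b"P6\n# made by a scanner\n2 3\n255\n" + bytes(2 * 3 * 3)
    print(read_pnm_header(sample))
```

Everything after the header is uncompressed pixels, which is why scanners like the format and why the files are large enough that shipping them off to a bigger machine for conversion makes sense.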