A new format

Published on 6/17/2015

That thing about PDFs?

I started looking at some new import options for BadaChing. Until now you could import the CSV files that most internet banking sites offer for download, as well as the old OFX format, popularized by Microsoft Money about 15 years ago and still offered by the local banks here. The problem is that these statement downloads only cover the last 3 months, or 150 transactions or so, depending on your bank. So if you miss a month or two, your data is incomplete.

The solution is to read the PDF statements we all get sent by email. I have a label in Gmail with every bank statement ever sent to me, so I should be able to import any of them. All I had to do was read the PDF, which is easier said than done. Whatever the library (free, paid or otherwise), the result is unstructured text; it is impossible to anticipate every layout that transactions are printed in, and some people will receive theirs in different languages! But the POC worked well enough, at least for FNB statements. I hope to make this available soon with two additional features: a complete statement overview before committing the import (because the text is unstructured and the parse is not guaranteed), and a prompt about possible duplicate transactions.

Currently only the OFX format supplies unique transaction identifiers, so if the user imports a CSV or PDF, all the program has to go on is the description, date and amount. I had a situation where the same amount was transacted twice at the same vendor on the same date, and BadaChing failed to import that statement correctly. The duplicate check is still necessary because CSV and OFX files can be downloaded at any time, meaning two downloads could contain the same transactions. The PDF statements are of course based on monthly periods, so using these exclusively would make the check superfluous.
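To make the duplicate problem concrete, here is a rough sketch in Python of one way to match on description, date and amount without losing a genuine repeat purchase. It is not BadaChing's actual code: the `Transaction` type and the `fingerprint` and `split_possible_duplicates` functions are names made up for illustration. The idea is to count how many times each fingerprint already exists, and only flag that many incoming transactions as possible duplicates to ask the user about.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Transaction:
    # Hypothetical record; CSV and PDF imports give us nothing more than this.
    posted: date
    description: str
    amount: Decimal


def fingerprint(tx):
    """Build a comparison key from date, normalized description and amount."""
    normalized = " ".join(tx.description.split()).casefold()
    return (tx.posted, normalized, tx.amount)


def split_possible_duplicates(existing, incoming):
    """Split incoming transactions into (new, possible_duplicates).

    Each stored transaction can "absorb" at most one incoming match, so two
    identical purchases on the same day are not both treated as duplicates.
    """
    remaining = Counter(fingerprint(tx) for tx in existing)
    new, possible_duplicates = [], []
    for tx in incoming:
        key = fingerprint(tx)
        if remaining[key] > 0:
            remaining[key] -= 1
            possible_duplicates.append(tx)  # ask the user about these
        else:
            new.append(tx)
    return new, possible_duplicates


# One coffee is already stored; the statement lists two on the same day.
stored = [Transaction(date(2015, 6, 1), "Coffee Shop", Decimal("25.00"))]
statement = [
    Transaction(date(2015, 6, 1), "Coffee Shop", Decimal("25.00")),
    Transaction(date(2015, 6, 1), "COFFEE  SHOP", Decimal("25.00")),
    Transaction(date(2015, 6, 2), "Groceries", Decimal("310.45")),
]
new, possible_duplicates = split_possible_duplicates(stored, statement)
```

With this approach the first coffee is flagged as a possible duplicate, while the second coffee and the groceries are treated as new. It is still a guess, which is why asking the user before committing seems like the right call.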

Check back for an update in, oh.. December?