I'm looking for a service I could install to archive a huge pile of letters, preferably in PDF form, to a database. I'm living in a country (Germany) where paper is still king, and digital services are either non-existent or loathed. My current situation is that I have a mailbox with lots of PDFs all over the place, but also many folders of paper letters from 2007 and so on that I have to keep, and that I also have to dig out every five years or so.
So what I'd like to have is a service for my homelab where I could scan and copy these documents, and that would index them, clean them, OCR them and all that good stuff. It should have really good metadata abilities, because my files are usually named in a very random way, so if I could copy them over and quickly categorize them, that would be really awesome.
There is one service called Papermerge that kind of fits my use case. I spent one afternoon with it, and there were a few issues:
- crashes quite often
- when sending a large folder of PDFs, uses all the CPU and crashes again
- categorizing functions are not very good, it takes time to get everything together and clean when organizing files
This might not be very interesting if your country has digital services for everything, but for those of us who have to suffer this paper madness, a service like this would be great.
The killer feature for me is my networked scanner scanning directly to the Paperless consume Samba share and the documents just popping up in the inbox fully OCR'd and pre-categorised. Pretty magical.
NB, the docs make it sound like a proper DB is optional, but it's really not. Performance was iffy for me with SQLite but is rock solid with Postgres.
That did it for me: I installed paperless-ngx, set it up to scan my email folders, copied all the random PDFs from my "organized" tax folder and scanned the rest.
Too bad I just happen to have that Brother printer/scanner without SMB or FTP support. So I need to go through the process of scanning on my computer first, then uploading.
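For what it's worth, the "scan on the computer, then upload" step could probably be scripted. Here is a minimal sketch, assuming the consume folder mentioned above is mounted somewhere on the computer and the scanner software drops PDFs into a local folder. The function name and both paths are made up for illustration; nothing here is taken from the Paperless docs.

```python
import shutil
from pathlib import Path


def move_scans_to_consume(scan_dir: Path, consume_dir: Path) -> list[Path]:
    """Move every PDF found in scan_dir into consume_dir.

    Returns the list of destination paths. Both directories are
    hypothetical: scan_dir is wherever the scanner software saves files,
    consume_dir is the mounted consume share.
    """
    moved = []
    for src in sorted(scan_dir.iterdir()):
        # Only pick up PDFs, whatever the case of the extension.
        if not src.is_file() or src.suffix.lower() != ".pdf":
            continue
        target = consume_dir / src.name
        # Never overwrite a file that is still waiting in the consume folder:
        # append _1, _2, ... to the name until it is free.
        counter = 1
        while target.exists():
            target = consume_dir / f"{src.stem}_{counter}{src.suffix}"
            counter += 1
        shutil.move(str(src), str(target))
        moved.append(target)
    return moved


# Example call with made-up paths:
# move_scans_to_consume(Path.home() / "Scans", Path("/mnt/paperless/consume"))
```

Run from a cron job or by hand after a scanning session, this would at least remove the manual upload, though it is still one hop more than a scanner that writes to the share directly.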
Well, I do see the advantages of what you're suggesting, no dispute there. Searching for a specific tag would make my life easier, but at what cost?
As I was saying, a person (not a company) isn't likely to receive so many important letters that they can't simply go through a couple of folders and find what they're looking for. Paperless-ngx could indeed save me a few minutes while searching for documents, but then what about the time and effort spent keeping the software running, up to date, backed up, etc.? More importantly, what about longevity? These kinds of archives are something you may want to go back into in 10 or 20 years to look for a file, and by then the software you chose might not be around or working anymore.
The extra minutes wasted while searching seem like a good tradeoff for the peace of mind provided by simple folders and PDF files, as that eliminates the need for databases, upgrades, special servers, formats and whatnot.