“We did everything we said we were gonna do and nobody can take that away from us …”
Zack to Sheridan in Babylon 5: “Sleeping in Light”
Like written in the previous posting, I wanted to digitize the contents of my book shelf … and I did it. Last weekend, I cut and scanned 109 books. Yes, one hundred and nine books. OCR is also finished, as is reducing the file size (my MacBook was busy at night).
Here are some data:
- Size of scanned books (original): 8.19 GB
- Size of OCR’d books (with Acrobat 9): 4.79 GB
- Size of Reduced File Size books (with Acrobat 9, compatible with Acrobat 9): 1.48 GB
The reduced file size is still readable, although in some cases the compression is noticeable if you look closely at the text. But for reading them on the screen the quality is very well and turning the virtual pages is snappy with the reduced file size versions. However, I will keep the original scans. Color is compressed here as well (medium quality), but the quality is better than the reduced file size versions, and I might need them.
Why did I cut (and thus destroyed the physical version of) my books? There are a few reasons:
- I wanted to have them easily available, at work and at home, or in general, no matter where I am.
I cannot carry them with me physically, but if I have them in digital form, I can. At the moment, they even are on my USB-stick which I carry in my wallet, in best quality and in reduced file size. Try doing that with a book shelf.
- I wanted to remove the distance to open them and retrieve information.
Yup, the bookshelf is only a few meters away, but it’s still easier to open them digitally and use the search function to go to the information I need. I will put them into my Wiki where they are easily available and where I can store the notes with the PDF.
- I want to have control over the content.
Sure I could have bought them digitally as eBook, but I did not want to buy them again, especially not with DRM. I want to have PDFs that I can use in any way I want — copy whole paragraphs into my notes, search them, extract pages or images, the works.
- Books are cheap.
I once sold about 70cm3 of books for 50 €. I paid … I don’t know, probably 500 € or even 1000 € for them originally. But if you have books where there are a million copies around, it’s pretty much worthless. So, instead of selling my books for almost nothing, I digitize them. I still have them available, but they take up no space.
I noticed the following advantages to digitizing one’s bookshelf this way:
- You end up with one virtual page equals one printed page.
Two book pages on one scanned page (which you would get if you open the book and put it on a flatbed scanner or copier with a scan unit) might save paper if you want to print it, but printing is not the reason for scanning. The reason is to read it on screen and easily make notes with the same medium and for this one page on one scanned page is ideal. If you want to print it some day you can still use the driver of the printer to put two scanned pages on one printed page.
- No black areas.
If you are using a scanner with automatic size detection you get scans that have no black areas which cost a lot of ink/toner and look atrocious.
- It’s fricking fast.
I am really fast in scanning books with a normal copier with a scan unit, where you have to put on each page spread and press a button. I did it during my time at the university when I was working as a student assistant, and I did it afterwards. I am a squirrel in many ways and I like to keep what I find. But I also want to work unencumbered by stuff that slows me down. Digital is the way to go and over time I really got fast in digitizing stuff. But this way surprised even me. It’s just so … fast.
Here are some tips for digitizing books (by gutting them):
- Go in brackets.
For example, first scan all books about climbing, then all books about photography, and so on, so you ahve specific blocks of books that you have digital. Dealing with categories is more motivating than having 12 of 100 books scanned.
- Start with the most precious book you want to scan but leave the most important category for last.
Afterwards, you have no qualms about scanning the “lesser” books. It’s kinda like dealing with pain, the worst pain comes first, then you know whatever comes next won’t be as worse and it’s easier to bear. Then again, keep the most important category for last. This will motivate you to go to the end.
- Don’t try to compete with the scanner.
First I tried to cut books while the scanner was busy with the 25-30 pages it was scanning. Only the scanner beat me each and every time. I tried to work faster and this nearly destroyed some books (destroyed in the sense that I cut into the text). So I stopped competing with the scanner, which also meant stopping to parallelize the work. I first cut all books belonging to one category and then I did scan them (in some cases, I turned the pages of the block that was scanned next while the scanner was working to ensure that the pages are really separated from each other and do not cause a paper jam). But competing with the scanner … nope, the scanner is just to fricking fast.
Cutting the books:
Important note: Working with a cutter/blade is dangerous. If you are too stupid to realize that, don’t do it. I do not take any responsibility to what happens to you, your fingers or skin or whatever if you try to cut your books.
- Do not cut where you scan.
It might be nice to scan and at the same time (and place) cut the next book. Don’t. The cutting creates dust which will gravitate to the scanner and you have to clean it more often. Cut away from the scanner. I did the cutting in the “kitchen” while the scanner was on my desk about five meters away.
- If you hurry a book dies needlessly.
Cutting is delicate work and it’s very easy to cut into the letters on the page. So, take your time and cut carefully — many strokes with small advances cut the book.
- Remove the cover first.
Even with paperbacks it’s more easy to cut if you remove the cover first. This often includes the first page, which is glued to the cover differently than the rest and can give you the impression of cutting farer away from the text than you really are (i.e., a book will die). It’s much easier to cut if the cover (including the back cover) is out of the way and you are only dealing with the book block.
- Mind the distance.
Sure, you do not want to cut into the text, but removing the spine is a trade off between being too close to the letters and having a paper jam because the pages still stick together (as if they were stapled). In some cases I had to manually turn each page (and for some pages, manually separate them) to make sure they are really separated because the text was close to the spine and I couldn’t keep more distance to the glue and the threads with which the pages are bound.
- For book blocks that are like wood, cut the block into thinner pieces.
Cheap paper dream to cut, a few strokes and you’re through a 400 pages book, but some books are printed on high quality paper that is thin, sticks to each other and handles like wood if you want to cut through it. In these cases it can really help to separate the book block into smaller units. Open the book block and cut straight through the middle to separate the currently open spread. You have two (thinner) book blocks that you can cut more easily (or repeat the cutting procedure until you have really thin book blocks).
- Use a metal ruler to assist the blade (and use the other side for it).
If you try to use a cutter with a plastic ruler the cutter will happily chip away the plastic. Metal is the way to go here. Make sure that you use the other side of the ruler, i.e., not the flat side with the size markings but the other, higher side. You don’t need the size markings but you need some elevation to make sure the blade doesn’t jump over the ruler.
- Don’t force the blade or extend it too far.
If you never learned it as a child, learn it now: never force the blade. And if you work with a cutter, never extend the blade too far. How much is too far? The blade will tell you with a “sploink” — the sound of the thin cutter blade breaking and jumping in front of your eyes (in my case, luckily, sideways). If you can’t cut deeper, remove the metal ruler and (if necessary) some pages. Usually the cutter should for the cut open which is more helpful than a longer blade, as it only gets stuck and breaks.
- Elevate the blade more in the lower part of the book.
If you cut vertically (to you) elevate the blade more in the lower part of the book and use a little more force. The beginning and end of the cut are usually not as deep. I found it helpful to align the self-healing cutting mat with the book and the table so that I could cut into the open (mind the blade and do not stand in its way!).
After cutting the books:
- Try to remove as much dust as possible.
Cutting the books produces a lot of dust and dust in the enemy of any photographic unit, including scanners. Dust leads to dots on the scanned page and to vertical stripes. So, shake the pages and brush the cutting area.
- Remove any traces of glue from the spine and the front cover page.
Unless you are really lucky and estimated the distance just right you will have traces of glue on the pages. This is often the case with the first page which is glued more strongly to the cover. The problem is that many scanners works with high temperatures (high meaning high enough to melt the glue). If the glue melts it sticks to the scanning area and it’s a bitch to clean.
- Photograph the hardcovers
In some cases the covers were printed on sturdy paper and could not be scanned with the document scanner. In these cases I used my iPhone to simply take a photograph of the cover and used this photograph as cover for the PDF.
Scanning the books:
- Get a microfiber rug to clean the scanner.
If you see signs of dust (e.g., small dots on the scanned pages of high quality paper or vertical lines in images) clean the scanning area. I used a microfiber rug that belongs to my DSLR camera which worked really well.
- Use the profiles
I created profiles for Cover (simplex, color), b/w book block (duplex, b/w), color book block (duplex, color), grayscale book block (duplex, grayscale) and single gray page (if a book has only a few photos, I scan it in b/w and then scan the pages that have grayscale photos, later I replace them in the PDF; simplex, grayscale; for b/w books with a few color pages I use the cover settings). All color scans were with medium compression because I do not want to scan artwork but have a readable version for the screen.
- Avoid paper jams in the tray
The scanner I used does seem to have a slight design flaw. In the paper tray there is a slight elevation that causes some paper sizes to jam. No big deal, because it’s in the tray and the pages only get curled, but it’s annoying and screws the page order. However, with some paper and some tape, you can smoothen out the tray and avoid these jams (see photo).
- Make sure that each and every page is scanned
I’ll still have to check the 109 PDFs whether each page was scanned (check with the page numbers). It should be all right, the scanner has a function that gives an alert if multiple pages were scanned, but mistakes happen. Only after I have made sure that each page was scanned I will throw away the remains of the books.
- Do OCR and Reduce File Size later.
OCR and Reduce File Size can be done later with Acrobat 9 by selecting multiple documents at once and let Acrobat work over night. I did this after I combined the scanned files of one book into one document when I way away or sleeping. Just make sure you do not overwrite the original files (Acrobat asks you what to do) because OCR and Reduce File Size will reduce the file size of the scans and likely also the quality — and one day you might need the original quality. Some scanners offer to do OCR while scanning but I wouldn’t do it. It probably takes time — even if it’s only one or two seconds per scan, it’s time you could be using for something else. Let Acrobat do this when you are not working with your computer.
In retrospect I get dizzy when I compare the time it took me to digitalize the books this way compared to the time it would have taken me with my old flatbed scanner. Instead of cutting the books, putting them in and going zzz, zzz, zzz … finished, it would have been a long process even to get a single page scanned (open the book, put the correct spread on the flatbed scanner, press the button each time, waiting until the scanner heats up, waiting the painful seconds it takes to scan the page, correct the scanned page with the software, etc. pp. And it wouldn’t have been that much better for the book. Sure, the book would still be a book, but opening the book far enough to scan the whole spread (and avoid black bars in the middle where the paper is in some distance from the glass) would also have damaged the spine.
I guess it’s another strong reminder to make damn sure you use the right tools and the right technique to do the job, or to quote someone: “If you have eight hours to cut down a tree, it is best to spend six hours sharpening your axe and then two hours cutting down the tree.” And despite the high costs of destroying the books, this way worked admirably.