Crawling your own website

«The point is, Sturmbannführer, that this file has been ten years maturing in the Gestapo registry – a little here, a little there, year in, year out, growing like a tumour in the dark. And now you’ve made a powerful enemy, and he wants to use it.»
“Fatherland” by Robert Harris

I was curious about what I have written over the past 10+ years, from my first posting on the 31st of May, 2009, up to today (well, the 17th of January, 2021). So I wrote a script in R to download the pages (and images); R has some nice options for parsing a website. And yes, I could have done the same with a database export and FTP.
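The post doesn't show the R script itself. As a rough, hypothetical illustration of the idea (not the author's code, and in Python rather than R), a crawler for your own site only needs to pull the links and image URLs out of each page and follow the links that stay on the same host. The names `extract_links`, `crawl` and the caller-supplied `fetch` function are made up for this sketch, and it assumes plain, static HTML pages.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect page links (<a href>) and image sources (<img src>) from HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pages = set()
        self.images = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Resolve relative links and drop #fragments so each page is queued once.
            self.pages.add(urldefrag(urljoin(self.base_url, attrs["href"])).url)
        elif tag == "img" and attrs.get("src"):
            self.images.add(urljoin(self.base_url, attrs["src"]))


def extract_links(html, base_url):
    """Return (pages, images) found in html, keeping only URLs on base_url's host."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    host = urlparse(base_url).netloc

    def same_site(urls):
        return sorted(u for u in urls if urlparse(u).netloc == host)

    return same_site(parser.pages), same_site(parser.images)


def crawl(start_url, fetch, max_pages=2000):
    """Breadth-first crawl of one site.

    fetch(url) -> HTML string is supplied by the caller (hypothetical hook),
    which keeps the crawl logic independent of any HTTP library.
    """
    seen = {start_url}
    queue = deque([start_url])
    visited, images = [], set()
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        pages, imgs = extract_links(fetch(url), url)
        images.update(imgs)
        for page in pages:
            if page not in seen:  # follow each same-site page only once
                seen.add(page)
                queue.append(page)
    return visited, sorted(images)
```

For a real run, `fetch` could be something like `lambda url: urllib.request.urlopen(url).read().decode("utf-8", "replace")`, ideally with a short pause between requests. Passing it in as a parameter means the crawl can be tried against a small in-memory "site" without touching the network.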

Looking at 1244 postings, the word clouds of the titles, categories and tags are interesting (to me):

[Word clouds of the titles, categories and tags]

That’s the fun thing about writing a blog, at least when it’s about a (sometimes very) general theme: you keep a journal of your thoughts on that topic. And since it’s public, you have an additional reason to do it well.