Mundane Tasks — Perl to the Rescue

“What is the sound of Perl? Is it not the sound of a wall that people have stopped banging their heads against?”
Larry Wall in <1992Aug26.184221.29627@netlabs.com>

I wanted to split my quotation file (a 5.1 MB text file with 74731 lines of text) into separate text files, one file per quotation. The quotations in the file are already separated by an empty line, but copying and pasting each one into a new file and saving it manually would take too long. After all, there are 19834 quotations (it’s a hobby).

So I decided to dust off my very, very basic knowledge of Perl and, with the help of sites like perlmaven.com, hacked together a simple script that does the work for me. Bear in mind that the script is badly written and likely much more complex than it should be, but it worked well. I put the quotation file and the script file in the same folder, executed the script via Terminal on the Mac (OS X comes with Perl, yeah), and presto!

use strict;
use warnings;

# open the quotes source file
my $filenameInput = 'all_quotes.txt';
open(my $fhi, '<:encoding(UTF-8)', $filenameInput) or die "Could not open file '$filenameInput' $!";

# needed for naming and to avoid an empty line at the end of each new quote file
my $x = 1;
my $first = 1;

# create the first output file
my $filenameOutput = 'quote_'.$x.'.txt';
open(my $fho, '>:encoding(UTF-8)', $filenameOutput) or die "Could not open file '$filenameOutput': $!";

# run through each line of the source file
# an empty line closes the current quote file and opens the next one
# the newline at the end of each line is cut off and re-added in front of every line
# except the first line of a quote, so no quote file ends with an empty line
# this is not programming 101, it's just a hack that saves loads of time
while (my $row = <$fhi>) {
	if ($row eq "\n") {
		close $fho;
		print "$x\n";
		$x++;
		$first = 1;
		$filenameOutput = 'quote_'.$x.'.txt';
		open($fho, '>:encoding(UTF-8)', $filenameOutput) or die "Could not open file '$filenameOutput': $!";
	} else {
		chomp $row;
		if ($first) {
			print $fho $row;
			$first = 0;
		} else {
			print $fho "\n".$row;
		}
	}
}

print "$x\n";
close $fho;
close $fhi;
print "done\n";

A task that would have taken days manually was done in less than 30 seconds. And it only took so long because I wanted visual feedback after each created file.
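
Since the script above is admittedly more complex than it needs to be, here is a shorter variant as a rough sketch, not a tested replacement. It relies on Perl's paragraph mode (setting `$/` to an empty string), which makes each read return a whole block of text up to the next empty line. The sub names `read_quotes` and `write_quotes`, the in-memory sample text and the temporary output folder are assumptions made for illustration; none of them come from the original script.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Read all quotations from an open filehandle using paragraph mode.
sub read_quotes {
    my ($in) = @_;
    local $/ = '';    # paragraph mode: one or more empty lines end a record
    my @quotes;
    while (my $quote = <$in>) {
        chomp $quote;    # in paragraph mode, chomp removes all trailing newlines
        push @quotes, $quote if length $quote;
    }
    return @quotes;
}

# Write each quotation to quote_1.txt, quote_2.txt, ... inside $dir.
# Returns the number of files written.
sub write_quotes {
    my ($dir, @quotes) = @_;
    my $n = 0;
    for my $quote (@quotes) {
        $n++;
        my $name = "$dir/quote_$n.txt";
        open(my $out, '>:encoding(UTF-8)', $name)
            or die "Could not open file '$name': $!";
        print {$out} $quote;
        close $out or die "Could not close file '$name': $!";
    }
    return $n;
}

# Demo on a small in-memory sample (hypothetical data, not the real quote file).
my $sample = "First quote,\nsecond line.\n\nSecond quote.\n\n\nThird quote.\n";
open(my $in, '<', \$sample) or die "Could not open in-memory sample: $!";
my @quotes = read_quotes($in);
close $in;

my $dir   = tempdir(CLEANUP => 1);    # throwaway folder, removed when the script exits
my $count = write_quotes($dir, @quotes);
print "$count files written to $dir\n";
```

For the real file, the in-memory handle would be replaced by something like `open(my $in, '<:encoding(UTF-8)', 'all_quotes.txt')` and the temporary folder by `'.'`. Two differences from the original script: several empty lines in a row count as a single separator, so no empty quote files are created, and only one status line is printed at the end instead of one per file. A line that contains only spaces is not treated as empty in paragraph mode.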

So, if you have a daunting mundane task that can be automated, it pays to either invest some time into learning Perl (or a similar language) or to hire a wizard.

BTW, why the need for having the quotations in separate files? I want to put them into DEVONthink (quickly identifies duplicates) and then tag them to assign them to different handbooks. Seemed to be the easiest way. Plus with the content search in DEVONthink you can just enter a few words that appear anywhere in the quotation and it finds it quickly. Much, much easier than searching in a text file.

(And, of course, instead of naming the files quote_1.txt, quote_2.txt and so on, it would be possible to use the first words of each quotation as the file name. The numbered names are no problem, though, given the comfortable in-file search in DEVONthink.)
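
Naming the files after the first words of each quotation could look roughly like the sketch below. The sub `quote_filename`, its default of five words and the rule of keeping only letters and digits are all assumptions for illustration. The running number is kept at the end of the name because different quotations can start with the same words.

```perl
use strict;
use warnings;

# Build a file name from the first words of a quotation (hypothetical helper).
# $n is the running number of the quotation, $max_words defaults to 5.
sub quote_filename {
    my ($quote, $n, $max_words) = @_;
    $max_words //= 5;

    # Split on whitespace, strip everything that is not a letter or digit,
    # and drop words that end up empty (e.g. a lone dash).
    my @words = grep { length }
                map  { my $w = $_; $w =~ s/[^\p{L}\p{N}]+//g; $w }
                split ' ', $quote;

    # Keep only the first $max_words words.
    @words = @words[0 .. $max_words - 1] if @words > $max_words;

    # Fall back to the plain "quote" stem if nothing usable is left.
    my $stem = @words ? join('_', @words) : 'quote';
    return sprintf('%s_%d.txt', $stem, $n);
}

my $sample = 'What is the sound of Perl? Is it not the sound of a wall '
           . 'that people have stopped banging their heads against?';
print quote_filename($sample, 1), "\n";    # prints What_is_the_sound_of_1.txt
```

In the splitting script, the line that builds `$filenameOutput` would then have to wait until the first line of a quotation has been read, since the name depends on the quotation's text. The text should also be decoded (as the `:encoding(UTF-8)` layer does) so that `\p{L}` matches accented letters correctly.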
