Workshop: Scientific Work — Possible Academic Literature Workflow

Education … has produced a vast population able to read but unable to distinguish what is worth reading.
G. M. Trevelyan

There are countless possible workflows when it comes to reading scientific literature. My own has changed over time. I tried out different approaches and, for the past few years, I have been a huge fan of careful selection and detailed note keeping. I treat the literature I read as the LEGO(R) building blocks for my own works. I must know for each “brick”:

  • where it came from (to avoid plagiarism),
  • how it can be used (enough context information to remind myself what the author really meant), and
  • how much confidence I can have in it (e.g., a meta-analysis is usually more convincing than a case study).

With these LEGO(R) bricks I can build my own articles by using outlines (see, for example, How to Write a Dissertation Thesis in a Month: Outlines, Outlines, Outlines).

So this is my current workflow:

My current workflow for scientific literature. The order is DEVONthink Sources (I) => DEVONthink Sources (II) => iPad (GoodReader) => [box without title below] => Circus Ponies Notebooks => DEVONthink Sources (III) => Mendeley [Update: Until it becomes clear how Elsevier treats Mendeley, I no longer recommend using Mendeley.]. I could have drawn a flowchart, but I do not use one; I wanted to show what I actually use, as a reminder. The image is also the background for the virtual screen/desktop that is occupied by DEVONthink.
The workflow seems complicated but is actually pretty straightforward:

  1. DEVONthink Sources (I)
    All the literature I find ends up in a dedicated DEVONthink database called “Sources”: everything I might need to cite. The files are all PDFs, with the exception of some text files that contain abstracts of interesting articles that I could not download. The file names all follow the same format (authorname_(authorname_)year). There is a tagging structure (see Workshop: Scientific Work — Managing Literature) although I haven’t (yet) tagged all articles. The files themselves are in a-z folders.
    The main function here is to have a place where all the literature is available, where you can quickly see whether you have duplicates (DEVONthink detects them based on the file content, not the file name, which is very useful), where you can tag the files to have some way to sift through the literature, and where you can ‘simply’ put the literature in without having to worry about the correct citation. The last point is important when you go on a literature hunting spree: why enter the citation if you are not sure whether you will need the source? A good literature manager could very well function in the same way, if it either recognizes the literature automatically or allows you to skip entering the information. At this stage it’s mostly a repository, not something you would work with in detail.
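    The duplicate check described above (by content, not by file name) can be approximated outside DEVONthink. The following is a minimal sketch, not DEVONthink's actual method: it assumes that "duplicate" means byte-identical files and compares SHA-256 digests. The function names `file_digest` and `find_duplicates` are made up for this illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large PDFs do not have to fit into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(folder: Path) -> list[list[Path]]:
    """Group the PDFs below `folder` that have byte-identical content."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(folder.rglob("*.pdf")):
        by_digest[file_digest(path)].append(path)
    # Only digests shared by more than one file are duplicates.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

    Because only the bytes are compared, the same article saved under two different names in two a-z folders ends up in one group. Two different scans of the same paper would not be caught by this simplification.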
  2. DEVONthink Sources (II)
    The next step is a careful selection of the texts you are going to read. You can tag the potentially interesting literature (e.g., use a “possible_for_diss” tag or label them in green). Make it a deliberate choice; some good tips are in the Workshop: Scientific Work – Finding & Selecting Literature posting. You have to know what you expect from the source you are reading: you have to (mentally) ask questions of the text while reading. Never read scientific literature like you would read a fiction book; it does not work this way. Even if you read in an exploratory way to find an interesting question, you have a question to ask the text: “What is an interesting research question to ask?”
    I copy the interesting articles that I want to read in the near future to GoodReader on my iPad. One way is to simply connect the device to your computer, open iTunes, go to Apps, File Sharing, GoodReader and drag the files into the Documents window. You can also use wireless sync. Once the files are copied I label them in blue (meaning they are now also on my iPad).
    The main function here is to select critically from the vast repository of your literature (which you will quickly have) the papers that will give you the most when you read them. You could also print them if you want to read on paper, or use any other tablet, or move them to another folder/tag them with “reading” and read them on the computer. It does not matter as long as you select critically what to read.
  3. iPad (GoodReader)
    When I start reading the paper I change the file name to add a “_wes” to the end of the basename (e.g., mueller_1970.pdf becomes mueller_1970_wes.pdf). I do this because, while reading, I annotate the paper with my thoughts in notes and highlight the “LEGO(R) bricks” I want to use later. This makes the PDF personal and I would not like to accidentally send this file to a colleague who wants to have the article (you can delete the highlights and the notes, but I would not want to share my thoughts this way). I also would not want to accidentally delete a file I have invested effort in.
    While reading I constantly keep my goal in the back of my mind. The paper should provide me with material/answers to reach my goal (e.g., do an experiment, write a theoretical background for a paper). Anything useful for my goal gets highlighted. I use notes to write down relevant thoughts that apply to the paper. I do not use external notes (e.g., a Word file), because I want to keep my attention on the paper itself. GoodReader allows you to export the text you have highlighted and your notes (E-Mail Summary), but it cannot export figures and it screws up tables, so figures and tables get marked with a “get fig” or “get tab” note.
    Once I am finished with the paper I send the highlights/notes to my Mac with E-Mail Summary and move the file back to my Mac as well (either by sending it with the annotations by e-mail, or, if it is too large, by putting it in the top folder of GoodReader, which makes it available in iTunes once the device is connected).
    Note that the main function here is to find the useful elements in the text. You can do so while reading it on paper. However, highlighting the interesting text and having only this highlighted text immediately available in your Mail Inbox is hard to top. But there are other apps besides GoodReader. For example, Sente (Mac + iPad) allows some interesting annotations and there is a way to export the highlights/notes as well. Choose something you can work with.
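    The naming convention used in this step (authorname_(authorname_)year, plus “_wes” for the annotated copy) is regular enough to be handled by a small script. The following is a minimal sketch under assumptions: author names are lowercase ASCII letters, the year has four digits, and the helper names `working_copy` and `source_key` are invented for this illustration.

```python
import re
from pathlib import Path

# Assumed pattern: one or more lowercase author names, a four-digit year,
# and an optional "_wes" marker for the annotated working copy.
NAME_PATTERN = re.compile(
    r"^(?P<authors>[a-z]+(?:_[a-z]+)*)_(?P<year>\d{4})(?P<wes>_wes)?$"
)


def working_copy(path: Path) -> Path:
    """Return the name of the annotated copy, e.g. mueller_1970_wes.pdf."""
    if path.stem.endswith("_wes"):
        return path  # already marked, do not add the suffix twice
    return path.with_name(f"{path.stem}_wes{path.suffix}")


def source_key(path: Path) -> str:
    """Return authors and year without the "_wes" marker."""
    match = NAME_PATTERN.match(path.stem)
    if match is None:
        raise ValueError(f"{path.name} does not follow authorname_(authorname_)year")
    return f"{match['authors']}_{match['year']}"
```

    With these helpers, `working_copy(Path("mueller_1970.pdf"))` gives `mueller_1970_wes.pdf`, and `source_key` recovers `mueller_1970` from either file name, so the annotated copy can still be matched to its original.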
  4. [Tidying up the highlighted text/notes]
    This step doesn’t have a title in the graphic, but after getting the highlighted text and the notes by e-mail, the next step is to tidy up the notes. GoodReader gives some information that is useful for group work (who highlighted it when), but superfluous for my purpose. There is a nice way to strip this superfluous text with TextWrangler.
    The main point here is that you might need to tidy up your highlights/notes a little. If you highlighted on paper, you have to type the text (or open the PDF and copy & paste it); if you made handwritten notes, you have to type them, etc. I would not leave them on paper or in the PDF file: you have to have them available. BTW, check whether there are scripts that can do the work for you. You will do this often, so invest the time to make it as quick and easy as possible for you.
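    As an illustration of such a script, here is a minimal sketch that drops metadata lines from an exported summary and keeps only the highlighted text and notes. The exact format of the GoodReader summary is not shown in this post, so the pattern `METADATA_LINE` is purely hypothetical and has to be adapted to what your own e-mails contain; the function name `tidy_summary` is also made up.

```python
import re

# Hypothetical pattern for the "who highlighted it when" lines.
# Replace it with a pattern that matches your real summary e-mails.
METADATA_LINE = re.compile(r"^(Highlight|Note)\b.*\d{4}-\d{2}-\d{2}")


def tidy_summary(text: str, metadata: re.Pattern = METADATA_LINE) -> list[str]:
    """Drop metadata and empty lines; return the remaining lines in order."""
    kept = []
    for line in text.splitlines():
        line = line.strip()
        if not line or metadata.search(line):
            continue  # skip blank lines and lines that only carry metadata
        kept.append(line)
    return kept
```

    Each returned entry can then be pasted as its own outline cell. Keeping the pattern as a parameter means the same function works for another app's export format.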
  5. Circus Ponies Notebooks
    Actually, all my Circus Ponies Notebooks (CPN) are in a dedicated DEVONthink database. Sounds strange, but it makes backup easier and all my work is done in DEVONthink. To deal with the literature notes I use a couple of CPNs. Update: I just had a problem with corrupted data in one of my CPN files. Support does not think that DEVONthink is to blame, but you might want to be careful here. The CPNs are the lynchpin in my workflow and the data corruption was hard to spot (the file opens and handles normally, it is just when you open the affected page that you get an error message and the data is gone). Be careful and (regularly) do separate backups which you do not overwrite! CPN is a brilliant and secure piece of software, but errors can happen and this one is devastating!

    1. 0 reading notes.nb
      First I create a new outliner page in the “0 reading notes.nb” (the 0 ensures that the file is always on top of the folder if it is sorted alphabetically). Then I copy the text into this outliner page via Edit – Paste – Paste Text as an Outline. This gives me each paragraph as its own outline cell. Next I add graphics and tables (tables as images) from the PDF. I copy the files into the CPN file. Linking has too many disadvantages. Next each cell gets tagged with the source information. Simply write the source information in a cell (e.g., mueller_1954), then highlight the text, right-click and select “Assign as Keyword”. Next select all cells (click outside the cell, then cmd+a), right-click on the highlighted cells, Keywords, Add, select the source information tag (here: mueller_1954). You can see the keywords on the left side of the outline cells when you press cmd+k. Every cell should have the keyword in grey. You should check this if you copy the cell to another notebook: the normal copy and paste should retain the keyword, but there are other paste commands that remove the keyword. Try out the different paste commands to understand how CPN works.
      Make sure to tidy up the notes, e.g., split cells (= former paragraphs) that contain too many information units (careful: when you add new cells they are not tagged with the source information keyword; use ctrl + k to split the cell at the cursor position and maintain the keyword for both cells), add context information, aggregate information, create a hierarchy where your higher-level notes have cells with the highlighted text as sub-cells, etc. This is also a good time to check for any spelling or OCR errors.
    2. “1 to be sorted.nb” and “Read [X].nb”
      Once the tidying up is finished I copy the complete outline to two(!) different CPNs. I have 26 CPN files named “Read a.nb”, “Read b.nb”, “Read c.nb”, etc. I copy the whole outline (actually, the page) to the respective “Read [X].nb”, i.e., the CPN where the [X] is the first letter of the first author (mueller_1954 goes to “Read m.nb”). This is to ensure that the notes continue to be available later no matter what I do with them. I also copy the outline (the page) to “1 to be sorted.nb” and then, after saving “1 to be sorted.nb”, delete the page from “0 reading notes.nb”. I continue to work with the “1 to be sorted.nb” CPN. If anything goes wrong I have the whole outline available in the “Read [X].nb”, and I always know that when a page is no longer in the “0 reading notes.nb”, the tidying up is finished, the outline is secured in “Read [X].nb”, and I can cut and paste information from the “1 to be sorted.nb”.
    3. “2 prepare for topic notebooks.nb”
      The last CPN to work with the literature is the “2 prepare for topic notebooks.nb”. I have a couple of notebooks that are topic notebooks. I will write more about them in one of the next postings; suffice it to say they address different areas of my scientific work. The “2 prepare for topic notebooks.nb” has pages that correspond to these topic notebooks. So I put the “1 to be sorted.nb” next to the “2 prepare for topic notebooks.nb” on my screen and cut and paste the cells from the “1 to be sorted.nb” onto the right pages in the “2 prepare for topic notebooks.nb”. It’s quicker and easier than working with the topic notebooks directly.
    4. Topic Notebooks
      Once I have dealt with a couple of articles, I cut and paste the information from the “2 prepare for topic notebooks.nb” into the Topic Notebooks themselves. As the topic notebooks contain my ordered notes, this takes thought, so I prefer to do it ‘en bloc’.
      Note that the main function of the steps here is to get the notes stored for possible later use and put them into the Topic Notebooks. You could skip all the prior notebooks (“0 reading notes.nb”, “1 to be sorted.nb”, “2 prepare for topic notebooks.nb”) and only work with the topic notebooks. Personally, I do a lot of highlighting and note taking, so I need these prior steps. But it’s entirely possible to do the same by highlighting/annotating the text on paper and then using Word to write up the notes. You are the one who has to work with it.
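    The bookkeeping behind these notebooks (every cell carries its source keyword, a split keeps the keyword on both halves, and the finished outline is archived by the first author's initial) can be written down as a small data structure. This is a minimal sketch in Python, not how Circus Ponies Notebook works internally; `Cell`, `tag_all`, `split_cell` and `archive_notebook` are names invented for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """One outline cell: a piece of text plus its source keywords."""
    text: str
    keywords: set[str] = field(default_factory=set)


def tag_all(cells: list[Cell], source: str) -> None:
    """Assign the source keyword (e.g. "mueller_1954") to every cell."""
    for cell in cells:
        cell.keywords.add(source)


def split_cell(cell: Cell, position: int) -> tuple[Cell, Cell]:
    """Split a cell at `position`; both halves keep a copy of the keywords."""
    first = Cell(cell.text[:position].strip(), set(cell.keywords))
    second = Cell(cell.text[position:].strip(), set(cell.keywords))
    return first, second


def archive_notebook(source: str) -> str:
    """Return the "Read [X].nb" name for a source key, by first letter."""
    return f"Read {source[0].lower()}.nb"
```

    The point of the design is that the keyword travels with every cell, so a cell that is later cut and pasted into a topic notebook can still be traced back to its source. For example, `archive_notebook("mueller_1954")` gives `Read m.nb`.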
  6. DEVONthink Sources (III)
    Given that the source document (e.g., article, book) is now fully processed, I change the status in my Sources DEVONthink database (the label is changed to yellow).
    You could do the same by crossing out the article in a list, or writing “read” on the printout, or tagging it with “did read it” in a reference manager.
  7. Mendeley [Update: Until it becomes clear how Elsevier treats Mendeley, I no longer recommend using Mendeley.]
    Once I have processed the article and put the information in the topic notebooks, I might want to cite it or recommend it. Given that I would like to try out the social network function of Mendeley, I use it to generate the APA citation and have an overview of the literature I have actually read. So I enter the information in Mendeley. I trust the software because it can export the citations as a BibTeX file (it is not a data island, I can switch to another program if I want to). I also copy the citation to the “Read [x].nb”. Another advantage is that reference managers make it easier to build the bibliography when you use the building bricks to construct your own articles. However, as most are absolutely unsuited to making notes, I only use them in this last step.
    The function here is to have the literature available for later citation in articles. You could, of course, use any other reference manager you like.

So, that’s the workflow I use at the moment. It sounds complicated, but it’s not. I have learned from experience that I work best with outlines (content outlines, that is) and thus by treating the articles I read as containing the LEGO(R) bricks I need to build my work with, properly referenced of course. It helped me to write my dissertation thesis in a month once the outline for it was finished. But your style might be different. Take it as an idea you can try out, as stimulation. The main focus should always be that you actually advance in your work. It’s easy to get caught in trying to optimize the workflow while doing nothing, and suddenly your contract runs out or your adviser loses confidence in your work. A workflow is best developed when you do your work. Take an occasional day off to reflect on it, change it, then monitor whether you are more productive.

While the programs I use (DEVONthink and Circus Ponies Notebook) are all Mac-only, I would be surprised if there were no programs for Windows or Linux that fulfill the same basic functions. If you have any suggestions, I’d be happy to hear them, and the other readers who use Windows or Linux probably would be too.

Off topic: If you are wondering about the images/photos on the screenshot (no copyright infringement intended, BTW, it’s usually private use), they are motivational (for me).


  1. Hi Daniel,
    Thanks for your blog and workflow notes. I have a similar but less sophisticated workflow – I’ve mainly been dumping a lot of web articles into DT. This week I’m starting to use CP Notebooks for outlining.
    My main ‘gap’ has been reference management — I’d hoped to use the citation/scanning features in Papers2 and import the corresponding pdf articles into DT. Unfortunately the P2 folder structure wouldn’t map onto my DT folder structure when I tried it a few months ago – I needed to create a separate folder for indexed pdf files and tag them in DT. So, I’m curious to see how Mendeley will evolve (and I’m still looking for a way to integrate pdf articles into DT and have APA-format citations readily accessible).
    I was also interested to hear that you store your CP Notebooks in DT; I’ll give that a try. Thanks!

  2. Hoi,

    the workflow is not that sophisticated — it just looks this way if you write down each and every step 😉

    Regarding reference managers: it does not really matter which one you use, as long as you like working with it and you can use it to, e.g., automatically create the bibliography in your writings. I really liked Mendeley for their social networking aspects, but after Elsevier bought them … I don’t know. There are other reference managers owned by publishers (e.g., EndNote is owned by Thomson Reuters), but I see a troubling conflict of interest when a “social network reference manager” is owned by a publisher. So I’m having a look at the different reference managers again and will decide on a new one (for the moment).

    Regarding the storage of files: perhaps one idea is to use DT for all articles, papers, books, etc. and put only the read ones (and I mean read, where you have made annotations and highlights and which you want to use) in the reference manager as well (as copies). You have redundancy and you quickly know which ones are available. I’ve done something similar and have simply labeled the ones I have put in the reference manager in DEVONthink as yellow. It allows me on the one hand to act like a squirrel and keep everything I find (it ends up in my Sources database in DEVONthink), but on the other hand to have a list (with correct citations!) separately from that for when I use the information. And I can honestly say that I have read everything that is in my reference manager 😉

    Regarding the APA citations — if you put the read literature in a reference manager (and then add the missing information to automatically build the citation in APA style), you can copy and paste the APA citation and add it to the PDF as text field or note (best: first page, top right or left corner, always in the same place). It is one very easy and future proof way to ensure that you can easily cite the paper.

    Ah, and the Notebooks in DEVONthink: it makes it a bit easier for me to do backups (in addition to the Time Machine backups). I simply back up one folder with my databases and my notebooks are backed up too. UPDATE: I had a problem with corrupted data in one of my Circus Ponies Notebook files. Support does not think it’s likely that DEVONthink was responsible, and I am monitoring whether another corruption happens. So I still store my CPNs in DEVONthink, but you might want to keep extra backups.

    All the best


  3. Excellent post! Thanks for sharing your workflow. I use DT, but not for storing my PDFs. Rather, I use Papers2, since it gets all the citation data and makes the papers easy to find. I was curious to know what advantage DT offers you over keeping your papers in Papers2. I like the magic citation format of Papers2. It’s so easy to use, and it would make adding citations in my topic notebooks a cinch.

    What are your thoughts?


  4. Hoi Alex,

    hmm, I had a look at the Magic Citation demo on the Papers 2 website and it looks nice. I had bad experiences with Papers when I tried it out, broke my heart actually. But that was in 2011, perhaps it’s time for a new trial.

    The reason why I use DEVONthink to store my articles in is my workflow. I browse journals and the web in general frequently and when I see something, I get it. I am like a squirrel in this regard. This leaves me with a lot of articles, but not the … motivation to enter them all into a reference manager. The sources are too diverse and my interests too broad to do this quickly — and a literature management database with empty fields just looks terrible. And I do not need the storage facilities either. I read the papers on my iPad, I get a lot of notes, I put these in a Circus Ponies Notebook (Topic Notebooks) with the source information as keyword to each cell. After that I only need to know the correct citation for the keyword (which is how I cite it in the text itself, authorname and year). So that’s what I was (planning) on using Mendeley for (did it manually so far). I wanted to store the papers I have actually read and wanted to use in my writings.

    So the advantage of another repository like DEVONthink is that you can go on a binge and get a lot of literature and not be forced to painstakingly enter all the information about it in the reference manager. Sure, some can try to get it automatically, but when I last tried it too many entries were missing. Using the reference manager only for the literature you have actually already read makes more sense to me: a nice filter. BTW, DEVONthink will find the duplicates (in content, not in file names) and the thing I religiously do is to rename the files in the authorname_(authorname_)year format, so there is some content control.

    But it’s an individual thing and I will see how it works when I write my next paper. Until then (about a month) I will look for a new reference manager now that Mendeley is out of the picture for the moment. I will try Zotero (again), Sente (again) and Papers (again). Could be an interesting comparison. While Papers is also owned by a publisher, it is not as much of a network as Mendeley is (was?), and a network is where I expect a conflict of interest in the future (but I might be wrong).

    All the best

