Academic Process Management

If you have talked to me in person for longer than ten minutes, it will be no surprise to you that I am fascinated by order and efficiently running processes. Among my friends it has become something of a trope to tease me about my inclination to measure stuff, automate things and do everything the “data” way, which I can’t begrudge them at all. I am weird that way.

In any case, since starting my PhD this has kicked into overdrive and I have spent a good deal of time thinking about the process of doing science itself. I wrote about that a little bit in my post “One citekey to bind them” and more recently in “Science As Pull Requests”. In the last half year or so I’ve discussed a couple of other aspects with people, and this post will be a first stab at ordering my thoughts in a more structured fashion.

The problem we as scientists face is that there is barely any comprehensive language in our field for the personal processes we use to do our jobs. Sure, there is a lot of debate about reproducibility, proper statistical procedures and approaches, and publishing tools, and there are some lone (and incredibly helpful) people out there who try to provide concrete examples and guidance for how to actually do what we are doing. What I mean is this: we get taught (and teach) the methods of our field, but not the process of how to apply these methods as humans in front of a computer.

Consider this: you are a scientist, sitting in front of your computer. You have a topic, you have some books and other sources, you might even have some data. You know how to read, how to type, and (if quantitative work is your thing) how to do analysis and math. But how do you mix these things together so that the product is new knowledge? How do you do so in a way that’s at least halfway efficient? We all figure out our own ways to do this, and I think there is much to be gained from discussing these ways more.

This is not to say that there are no great resources that talk about individual pieces of the puzzle. Thousands of books have been written about writing, time management, project management, presenting, how to read, how to take notes, how to cite, and so on. But what I rarely see is guidance on how to fit these things into one coherent narrative from which I can learn. I think Raul Pacheco-Vega and Kieran Healy are two of the very few people who do provide some guidance in this direction, who take you by the hand and say: here’s a way from start to finish that I find works well.

When I think about doing science, I tend to come up with the following elements:

  1. Reading and understanding what I read
  2. Remembering what I have read
  3. Referencing and citing what I have read
  4. Data Gathering
  5. Data Processing
  6. Data Analysis
  7. Writing
  8. Editing
  9. Gathering Feedback
  10. Publishing

and, as a meta-element that ties all these steps together:

  1. Managing myself during a day and from day to day

Seeing these steps in a list helps me think through the process of producing new knowledge. How can I improve this process for myself so that I can be more accurate, faster, and less stressed? Where are the bottlenecks that cost me sleep or nerves? Where are the breaking points that bring the process to a halt?

I’d love to hear what you think about these steps. Am I missing a step? Where do you struggle the most? Any cool hacks, tools or approaches for any particular step or the process in general? Let me know via email or Twitter!

Lukas Kawerau
Data Engineer, Writer