5 ways to write readable science code

michael saminsky
Positive Peer Pressure
Sep 20, 2017

Abstract: Write your code for others.

Dear Sam,

If you are a researcher in science today, you are all but guaranteed to have to write your own code. You will also have to debug someone else’s code. On more than one occasion, that code will be 100% unintelligible to you. It could be a script that someone who used to be in your lab wrote years ago to pull data off some instrument that you now have to use (and which has a manual on the company’s website, but it’s behind a paywall, and the only pages you can find online are in Japanese). It might be code for a model from a journal paper that you asked the author to send you. It may even have been written by you, and you’re re-reading it a few years/months/weeks/days later and have no idea what’s going on.

Science code is particularly wide-ranging in quality because the people writing it have very different backgrounds and experience, from first-timers to software developers. On the one hand, that’s great because you don’t have to go through years of specialized schooling to be able to write scripts and programs to automate your workflow or analyze your data.

On the other hand, the lack of formal training also means that a lot of code in the science world is stylistically… less than ideal. A single script might be frankensteined out of cut-and-paste code snippets from Stack Overflow, chunks of uncommented code, or large sections of heavily commented code made up mostly of notes to self (“# Fix this later”, “# How is this supposed to work?”, “# I’m so sorry, I’m so sorry, what have I done, I’m so sorry, god forgive me.”).

I know this issue well, because I have had to read code that I wrote years ago, and it is bad: long, confusing variable names, no information on what the script was used for, no indication of whether it was ever implemented or not, no versioning, and a clutter of unhelpful comments.

This year I’ve been lucky enough to have a great mentor in my lab, Steve, who is a software engineer/Perl programmer/Linux Wizard. He wrote much of the code that glues together the processing and image acquisition we rely on to run HabCam and analyze/visualize the data back on shore. Many days I have sat beside him, in wonder, watching (or as Steve generously calls it, “pair programming”) as he turns five lines of code into operational, problem-solving programs. He has patiently taught me an incredible amount about programming, writing clean code, and thinking more logically. I’ve also read thousands of lines of code he has written, which has shown me what well-styled code looks like.

From this learning experience, I have gathered five language-agnostic principles which I recommend to any and all scientific coders who want to make their code more readable and more useful to others:

1. Write your code for a beginner audience

In research, there is a really good chance that your code will be read by others. There is an even better chance that your code will be read by someone who has very little coding knowledge. Your code may be looked at and used many years after you’re gone (gone like graduated or changed jobs, not gone like dead. Maybe gone like dead.), so it should be accessible to most people who need to read it and understand what’s going on.

The best way to write like this is to make as few assumptions as you can about what someone reading your code will know. There is the ever-present pitfall where the better you are at programming, the more you forget how non-obvious certain elements of it are.

2. Comment everything

I know this sounds obvious, but it’s easy to forget that comments are not just for you. They are the guidelines for anyone who will be reading your code. Comment and describe pre-built functions that perform some great bit of magic, variables that serve unintuitive purposes, and hard-wired constants. It takes more time, but it creates an end product that becomes a self-contained lesson for people to learn from.
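Here is a minimal sketch of what I mean, in Python. The function name, the constant, and its value are all made up for illustration; the point is only where the comments go: on the hard-wired constant, on the pre-built function, and on the step that might surprise a newcomer.

```python
import statistics

# Hard-wired constant: depth readings shallower than this are treated as
# sensor glitches and ignored. (Hypothetical value, for illustration only.)
MIN_VALID_DEPTH_M = 0.5


def mean_valid_depth(depths_m):
    """Return the average of the depth readings (in meters) that pass the filter."""
    # Collect only the readings at or above the cutoff defined above.
    valid = []
    for depth in depths_m:
        if depth >= MIN_VALID_DEPTH_M:
            valid.append(depth)

    # statistics.mean raises an error when given an empty list, so return
    # None instead if every reading was filtered out.
    if not valid:
        return None

    # statistics.mean is a pre-built function: it adds the values together
    # and divides by how many there are.
    return statistics.mean(valid)
```

Calling `mean_valid_depth([0.1, 2.0, 4.0])` drops the 0.1 reading and averages the other two.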

3. Include metadata

This is something I saw in a lot of Steve’s code, which I now include in all my scripts. Each script has a 10–15 line header that provides an overview of the code, including the program name, basic usage, and a log of edits. The log of edits provides context to the document and shows you how it has changed over time, who else has been working on it, and how long it’s been around.

Also, this is a great section to give credit where credit is due: if other people helped with the code, list their contributions here.
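A header like this could look something like the sketch below. This is my own rough layout in Python, not a copy of Steve's; the script name, people, and dates are placeholders.

```python
#!/usr/bin/env python3
# ---------------------------------------------------------------
# Program:  count_lines.py   (hypothetical example script)
# Purpose:  Count the non-empty lines in a text file.
# Usage:    python3 count_lines.py <path-to-text-file>
#
# Edit log (newest first; names and dates are placeholders):
#   YYYY-MM-DD  A. Person  Skip blank lines when counting.
#   YYYY-MM-DD  B. Person  First version.
#
# Credits:  Counting approach suggested by C. Person.
# ---------------------------------------------------------------
import sys


def count_nonempty_lines(text):
    """Return how many lines in `text` contain something other than whitespace."""
    count = 0
    for line in text.splitlines():
        # line.strip() is an empty string (which counts as False) for blank lines.
        if line.strip():
            count += 1
    return count


# Only run when called from the command line with exactly one file name.
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as handle:
        print(count_nonempty_lines(handle.read()))
```

Someone opening this file cold can tell what it is, how to run it, and who to ask about it before reading a single line of logic.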

4. Simple over “elegant”

Ok, so you’ve learned that great new programming technique and it’s beautiful and shrinks all your loops down to single lines, and you’re all hot and heavy to start using it everywhere (I’m looking at you, recursion). Or maybe there’s a new framework or library that everyone has been going on about and you want to start implementing it. There’s nothing wrong with that, but keep in mind that adding in things that might be unfamiliar to most people makes your code more difficult to read and debug if an issue arises and you’re not there.

If you can do the same thing with simple structures and it’s not any less effective, use simple structures.
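As a small illustration in Python (the function names and the haul data are invented for this example), here are two ways to add up a column of counts. Both give the same answer, but the second one asks much less of the reader.

```python
from functools import reduce


def total_catch_dense(hauls):
    # "Elegant" version: one line, but the reader has to already know
    # what reduce and lambda do.
    return reduce(lambda total, haul: total + haul["count"], hauls, 0)


def total_catch_simple(hauls):
    # Simple version: a plain loop that most readers can follow.
    total = 0
    for haul in hauls:
        total = total + haul["count"]
    return total


# Made-up sample data: each haul records how many animals were counted.
sample_hauls = [{"count": 3}, {"count": 5}]
print(total_catch_dense(sample_hauls))
print(total_catch_simple(sample_hauls))
```

If the dense version were faster or safer in some meaningful way, it might earn its place. Here it is not, so the loop wins.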

This also means that when you can, you should opt for fewer dependencies. Each library or module required to run your code is another potential failure point when you move the script to another machine. Depending on your research field, your code might also be running in remote areas with crappy internet (e.g., on a ship), so downloading dependencies is not always an option.
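For example, a lot of small data-wrangling jobs can be done with what already ships with the language. The sketch below averages one column of a CSV using only Python's standard library (`csv` and `io`); the function name and the sample data are hypothetical, and a big analysis may well justify a heavier library.

```python
import csv
import io


def column_mean(csv_text, column):
    """Return the average of a numeric column in CSV text, or None if there are no rows."""
    # csv.DictReader reads each row as a dictionary keyed by the header line.
    reader = csv.DictReader(io.StringIO(csv_text))

    values = []
    for row in reader:
        # Every field comes in as text, so convert it to a number first.
        values.append(float(row[column]))

    if not values:
        return None
    return sum(values) / len(values)


# Made-up sample data: two stations with a temperature reading each.
sample = "station,temp_c\nA,10.0\nB,14.0\n"
print(column_mean(sample, "temp_c"))  # prints 12.0
```

Nothing here needs to be installed, so the script should run the same on the lab machine and on the ship.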

5. Use Vim

This isn’t a programming principle, nor will it make your code more readable, BUT it is my favorite thing I’ve learned this year and this is my blog so it stays. Vim is a command-line text editor with tons of keyboard shortcuts that make text editing not suck. After the initial and intense learning curve, you will start saving a lot of time when writing code or editing text files thanks to its many sensible features: unintuitive keyboard shortcuts that are a bit of a pain to memorize but, once you know them, greatly speed up text editing (and they reduce finger movement, goodbye finger fatigue); macros that allow you to record keystrokes and then re-run them multiple times; and easy-to-use search and replace functions (that can take regex). Because it is a command-line editor, you don’t need to use your mouse or open and close other windows. Also, Vim (or at least its predecessor, vi) comes pre-installed on most Unix-like systems, usually along with the tutorial vimtutor, so there is a good chance it will already be there on whatever machine you end up working on.

Love,

Mike

Likes: fisheries + ocean monitoring, smart + responsible use of technology, Jacques Cousteau, people doing stuff in low gravity, giving a good stink eye.