In Defense of Spreadsheets

I’ve seen a lot of posts on Metafilter over the last few months about spreadsheets. The first was the new LAMBDA functions in Excel, which, in short, Excel would gain support user-defined functions. While the notion excited me as an Excel power user, a lot of other commenters were concerned that it was giving too much power to the user. There were a lot of examples where a spreadsheet in the hands of a business user grew to be a monstrosity that at once drove the business, and yet was completely unmaintainable. Effectively, a critical application was built without using proper development methodologies.

This phenomenon was further mentioned in a more recent post, sharing a link to Tim Harford’s article “The Tyranny of Spreadsheets”. Harford provides multiple examples of where spreadsheets caused problems, such as when details of COVID cases were lost due to the wrong file format being used, or when genetic researches had to rename sequences to accommodate Excel’s autoformatting tried to turn them into dates. Harford provided Bill Gates’s response to this, which, frankly, sums up the issue quite nicely: “It’s good to have people double-check things.”

The fact is, spreadsheets are the quiet backbone of almost every enterprise. Every IT shop I’ve been part of has had spreadsheets tracking all sorts of critical information, dependent on tribal knowledge, understood by the quixotic decisions of the team that built them, and good intentions to remain viable. Organically, something used by one person starts to be used by their team, then their department, then the whole organization. Ultimately, it becomes almost impossible to step away from.

The Basic Desktop Toolkit

Desktop computing (to mean individual users machines (be they actual desktops or laptops) as opposed to servers) has had roughly the same toolkit for the past thirty or so years: a word processor, a spreadsheet, a presentation tool, and, eventually, an email package. During the DOS days, it was WordPerfect, 1-2-3, and Harvard Graphics (and no real de facto standard for email), though a given individual (or organization) may have gone for an alternative for any of the three.

Today, it’s Microsoft Word, Excel, PowerPoint, and Outlook. People may use some alternative (the Apple, OpenOffice, or the Google suite). There are plenty of things that get added to that toolset (I need Visio for my job; others may rely on Microsoft Access or Photoshop), but most desktops in most schools and offices are going to have a core offering of those four applications, and expect most people will have some level of comfort with them.

Why Spreadsheets

Of these basic tools that everyone has and knows, only spreadsheets offer ways to manipulate data. If someone needs to store a lot of data, be it bookkeeping records or an inventory of baseball cards. Why is that?

Everyone Knows Them (at Least Somewhat) Even across different suites, the core interface and functionality of spreadsheets works more-or-less the same way. It’s probably second only to the formatted text entry that is the intersection of composing an email and typing in a word processing doc. Most spreadsheet users can work in a spreadsheet at some level, even if it’s just navigating around cells with the arrow keys. Most basic formulas and functions will work across programs.
Easy Manipulation of Data Rearranging columns, resorting rows, and basic manipulation through formulas can be done fairly simply, and through a GUI interface.
Intuitive Metaphor Most people have an understanding of a table as a database, with fields as columns and rows as records (or vice versa).
Infinite Canvas Even if this was just to store data, a spreadsheet offers a large amount of space to work. A table in a word processing document can be limited by the notional “page size.”
Power In the right hands, spreadsheets can offer a great deal of power. A knowledgeable user can create an application that gets the job done. Someone who may know the math, formulas, and queries needed to process data can make a powerful tool so without having to understand programming, and can leverage an existing interface.

So What’s the Problem?

There seems to be two related problems. First, while spreadsheets can be extremely powerful, the more complicated they become, and the more heavily they are relied upon, the more it is dependent on the users taking care in how they use them. As Gates said, there is a need for people double-checking things. In a lot of cases, it’s related to “gotchas” that require some understanding of how the tool operates: is it trying to format something that looks like a date as a date? Is there a limitation in the file format. Some of this can be quite surface level–is there a cell in the column that looks different from the others? Other things, like a limitation in file format, may require a deeper understanding of the tool (“an .xls file can have less data than an .xlsx one”), or at least a mentality to error on the side of “newest is best (“Microsoft must have switched to .xlsx for a reason.”).

The other issue goes to the last bullet in the previous section: “Someone…can make a powerful tool so without having to understand programming.” An example from Harford article is international phone numbers. If you just type such a number into a cell, it will drop a leading zero. This is because, to Excel, it just looks like a number, and you can drop a zero in front. The proper way to do it is to format the cells as text (so, regardless of what something looks like, it will be treated as text, and not changed), or precede the entry with a “'” (which is the old school, Lotus 1-2-3 way of doing the same thing).

A programmer would call this data typing–telling the program what sort of information is going to be stored in a variable or field. This would, in turn, limit the program in what it does to the data, dictate how much memory is allocated, etc. A spreadsheet, however, is not “strongly typed” by default. If someone is not aware of this, information can be lost or corrupted before the issue is realized.

Ultimately, this leads to the overarching problem: there is a line where a spreadsheet goes from just being a document or worksheet to a full fledged database or application, and one which is relied on by an enterprise. Often, this threshold is crossed before anyone realizes it. By that point, moving it to a more appropriate platform is difficult, and often met with failure, if only because the impact to other workflows is huge.

What Are the Alternatives?

So, what are the alternatives? One is to get something purpose-built. For a number of fields (like IT infrastructure management), there are off-the-shelf solutions that can migrate off of a dozen inventory spreadsheets. There is a monetary cost, but it’s an option.

If there isn’t an alternative, something may have to be built, with a proper development team, and in a proper database. I have seen this work, and I’ve seen it fail. In the case of the latter, the tool has embedded itself into so many diverse workflows, it proves to be challenging to satisfy everyone. Often, the voice of one stakeholder becomes dominate. Users for whom the new tool is less useful than the spreadsheets will keep using the old tool.

A common proposal in comments is “have everyone learn to code.” I’m not sure that’s the answer. Coding isn’t going to be everyone’s forte, and a lot of time will be spent trying to code something to avoid the spreadsheet. Also, many general purpose languages have some of the same limitations as a spreadsheet. The result will be less useful. Not everyone can be a brilliant writer, mathematician, geneticist, or coder. Besides, would there something where the skills are transferable to the degree a spreadsheet is?

Ultimately, a lot of the same issues would exist with these alternatives. Data can be assigned the wrong type in python just as readily as Excel. An error can replicate in the rows of a database as easily as a spreadsheet. There might be better validity checks introduced, but will it get the job done?

What is the Fix?

Here’s the thing: short of simply taking spreadsheets away, people are going to use them as a “Swiss army knife” for processing data. There does not seem to be a tool that can fill the niche without being too complicated, too specialized, or too expensive. Anything that might fill that niche–some sort of hybrid programming language, user interface, and database–will likely wind up looking a lot like a spreadsheet, and won’t offer many advantages. Saying people shouldn’t be using spreadsheets just isn’t going to happen.

Instead, I loop back to Gates: “It’s good to have people double-check things.” We need to teach people to be aware of the issues they may encounter, such as auto-formatting. Granted, there still is the problem that someone may be great in their field but “just not be good at computers,” but proofreading cuts across fields.

Closing this knowledge gap is also an opportunity for manufacturers to add some guardrails. If people type in a number with a leading zero, it could prompt “is this a number or text?” It’s taking the autoformatting to the next step: instead of assuming you want something, offer a choice and explain the impact.

Ultimately, it has to be understood that a spreadsheet is a tool, and ultimately subject to its user. Spreadsheets will fill gaps where there is no other tool, and even if there is one, it may be good enough relative to the cost or effort of using a more specialized one.