Wednesday, December 06, 2023


In May 2009, Google hosted an internal "Design Wizardry" panel, with talks by Jeff Dean, Mike Burrows, Paul Haahr, Alfred Spector, Bill Coughran, and myself. Here is a lightly edited transcript of my talk. Some of the details have aged out, but the themes live on, now perhaps more than ever.


Simplicity is better than complexity.

Simpler things are easier to understand, easier to build, easier to debug, and easier to maintain. Easier to understand is the most important, because it leads to the others. Look at the Google search page: one text box. Type your query, get useful results. That's brilliantly simple design and a major reason for Google's success. Earlier search engines had much more complicated interfaces. Today they have either mimicked ours or feel really hard to use.

But what about what's behind it? What about GWS? How do you invoke it? I looked at the argument list of a running GWS (Google Web Server) instance. XX,XXX characters of configuration flags. XXX arguments. A few name backend machines. Some configure backends. Some enable or disable properties. Most of them are probably correct. I guarantee some of them are wrong or at least obsolete.

So, here's my question: How can the company that designed that search page be the same company that designed GWS? The answer is that the GWS configuration structure was not really designed. It grew organically. Organic growth is not simple; it generates fantastic complexity. Each piece, each change may be simple, but put together the complexity becomes overwhelming.

Complexity is multiplicative. In a system like Google's, one assembled from components, every time you make one part more complex, some of the added complexity is reflected in the other components. It's runaway complexity.

It's also endemic.

Many years ago, Tom Cargill took a year off from Bell Labs Research to work in development. He joined a group where every subsystem's code was printed in a separate binder and stored on a shelf in each office. Tom discovered that one of those subsystems was almost completely redundant; most of its services were implemented elsewhere. So he spent a few months making it completely redundant. He deleted 15,000 lines of code. When he was done, he removed an entire binder from everybody's shelf. He reduced the complexity of the system. Less code, less to test, less to maintain. His coworkers loved it.

But there was a catch. During his performance review, he learned that management had a metric for productivity: lines of code. Tom had negative productivity. In fact, because he was so successful, his entire group had negative productivity. He returned to Research with his tail between his legs.

And he learned his lesson: complexity is endemic. Simplicity is not rewarded.

You can laugh at that story. We don't do performance reviews based on lines of code.

But we're actually not far off. Who ever got promoted for deleting Google code? We revel in the code we have. It's huge and complex. New hires struggle to grasp it and we spend enormous resources training and mentoring them so they can cope. We pride ourselves in being able to understand it and in the freedom to change it.

Google is a democracy; the code is there for all to see, to modify, to improve, to add to. But every time you add something, you add complexity. Add a new library, you add complexity. Add a new storage wrapper, you add complexity. Add an option to a subsystem, you complicate the configuration. And when you complicate something central, such as a networking library, you complicate everything.

Complexity just happens and its costs are literally exponential.

On the other hand, simplicity takes work—but it's all up front. Simplicity is very hard to design, but it's easier to build and much easier to maintain. By avoiding complexity, simplicity's benefits are exponential.

Pardon the solipsism but look at the query logging system. It's far from perfect but it was designed to be—and still is—the only system at Google that solves the particular, central problem it was designed to solve. Because it is the only one, it guarantees stability, security, uniformity of use, and all the economies of scale. There is no way Google would be where it is today if every team rolled out its own logging infrastructure.

But the lesson didn't spread. Teams are constantly proposing new storage systems, new workflow managers, new libraries, new infrastructure.

All that duplication and proliferation is far too complex and it is killing us because the complexity is slowing us down.

We have a number of engineering principles at Google. Make code readable. Make things testable. Don't piss off the SREs. Make things fast.

Simplicity has never been on that list. But here's the thing: Simplicity is more important than any of them. Simpler designs are more readable. Simpler code is easier to test. Simpler systems are easier to explain to the SREs, and easier to fix when they fail.

Plus, simpler systems run faster.

Notice I said systems there, not code. Sometimes—not always—to make code fast you need to complicate it; that can be unavoidable. But complex systems are NEVER fast—they have more pieces and their interactions are too poorly understood to make them fast. Complexity generates inefficiency.

Simplicity is even more important than performance. Because of the multiplicative effects of complexity, getting 2% performance improvement by adding 2% complexity—or 1% or maybe even .1%—isn't worth it.

But hold on! What about our Utilization Code Red?

We don't have utilization problems because our systems are too slow. We have utilization problems because our systems are too complex. We don't understand how they perform, individually or together. We don't know how to characterize their interactions.

The app writers don't fully understand the infrastructure.

The infrastructure writers don't fully understand the networks.

Or the apps for that matter. And so on and so on.

To compensate, everyone overprovisions and adds zillions of configuration options and adjustments. That makes everything even harder to understand.

Products manage to launch only by building walls around themselves to isolate them from the complexity—which just adds more complexity.

It's a vicious cycle.

So think hard about what you're working on. Can it be simpler? Do you really need that feature? Can you make something better by simplifying, deleting, combining, or sharing? Sit down with the groups you depend on and understand how you can combine forces with them to design a simpler, shared architecture that doesn't involve defending against each other.

Learn about the systems that already exist, and build on them rather than around them. If an existing system doesn't do what you want, maybe the problem is in the design of your system, not that one.

If you do build a new component, make sure it's of general utility. Don't build infrastructure that solves only the problems of your own team.

It's easy to build complexity. In the rush to launch, it's quicker and easier to code than to redesign. But the costs accumulate and you lose in the long run.

The code repository contains 50% more lines of code than it did a year ago. Where will we be in another year? In 5 years?

If we don't bring the complexity under control, one day it won't be a Utilization Code Red. Things will get so complex, so slow, they'll just grind to a halt. That's called a Code Black.

Monday, August 22, 2022

My black body story (it's physics).

I studied physics in university, and at one point asked a professor if I should learn German, because it seemed all the key texts of early 20th century physics were written by German-speaking physicists in German journals such as Annalen der Physik. But my prof assured me that was not needed, insisting that my own native language would serve me well. And he knew German, so his advice seemed sincere.

In the end he was right, but I still have an occasional pang of regret about never learning another language well enough. Wouldn't it be nice to read Einstein in the original? The E=mc² paper is astonishing in its precision and brevity even in translation. (Although later physicists have criticisms of it, it remains a marvel). I did eventually pick up a bit of German, but not enough for Einstein.

Which brings me to Max Planck, who first quantized light, or so I was told.

By the third year of undergraduate physics, I had been taught the same derivation of the resolution of the "ultraviolet catastrophe" at least three times. The classical (19th century) physics theory of a black body was clearly incomplete because the energy emitted was unbounded: higher and higher frequencies of light (the "ultraviolet") contributed ever more energy to the solution, leading to infinite energy regardless of the temperature (the "catastrophe"). By quantizing the light, Planck tamped down the runaway energy because as the frequency increased, the energy required to fill the quantized slots (photons) was no longer available, and the spectrum died down, as it does in the real world.
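The catastrophe and its resolution can be written compactly in modern notation (not the notation of the time):

```latex
u_{\mathrm{RJ}}(\nu, T) = \frac{8\pi \nu^{2}}{c^{3}}\, kT
\qquad \text{(classical: grows without bound as } \nu \to \infty \text{)}

u_{\mathrm{Planck}}(\nu, T) = \frac{8\pi h \nu^{3}}{c^{3}}\,
\frac{1}{e^{h\nu/kT} - 1}
\qquad \text{(quantized: dies off exponentially at high } \nu \text{)}
```

Integrating the classical form over all frequencies gives infinite total energy; the factor 1/(e^{hν/kT} − 1) is what quantization buys, suppressing the modes whose quantum hν costs more energy than the temperature can supply.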

By the third or maybe fourth or fifth rendering of this story in class, I began to wonder: Why is the story always told this way? Why is the derivation always exactly the same? It almost seemed like a "just so" story, with the conclusion leading the analysis rather than the other way around. (But read on.) Or perhaps some pedagogue found a nice way to explain the theory to students, and that story was copied from textbook to textbook ad infinitum, eventually to become the story of the invention of quantum mechanics. In short, I was being taught a way to understand the physics, which is fine, but not how the ideas came to life, which left me wanting.

I wanted to investigate by going back to the original paper by Planck. But I couldn't read German, and as far as I knew there was no English translation of the original.

I visited my prof after class and asked him if he would help. He enthusiastically agreed, and we went off to the library to find the appropriate issue of Annalen der Physik. Fittingly, the paper was published in 1900.

Slowly, we—mostly he—worked our way through the paper. It was delightfully, completely, and utterly a 19th-century answer, not the 20th century one that was taught. Planck used thermodynamics, the jewel in the crown of 19th century physics, to derive the black body formula and create the 20th century. The argument was based on entropy, not energy. It had nothing to do, explicitly at least, with photons. But by focusing on the entropy of a set of indistinguishable oscillators at distinct frequencies, he could derive the correct formula in just a few pages. It was a tour de force of thermodynamic reasoning.
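As a sketch, in modern notation (this is the standard reconstruction of the 1900 argument, not a translation of Planck's own symbols): count the number of ways W to distribute P indistinguishable energy quanta of size ε among N oscillators, apply Boltzmann's entropy relation, and then use pure thermodynamics to reach the average oscillator energy:

```latex
W = \binom{N + P - 1}{P}, \qquad S = k \ln W

\frac{1}{T} = \frac{\partial S}{\partial U}
\;\Longrightarrow\;
\bar{U} = \frac{\varepsilon}{e^{\varepsilon/kT} - 1},
\qquad \varepsilon = h\nu
```

No photons appear anywhere; ε is a counting device, and only the thermodynamic identity 1/T = ∂S/∂U connects the combinatorics to the measurable spectrum.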

Why not teach us the historical derivation? The answer was now clear: This was a deep argument by a towering figure of 19th century physics, one that was beautiful but not a good fit for a 20th century, quantum-capable mind. Yes, Planck quantized the energy, but he did it as a mathematical trick, not a physics one. It was Einstein 5 years later, in his paper on the photoelectric effect, who made photons real by asserting that the quantization was not mere mathematics but due to a real, physical particle (or wave!). We lowly students were being taught in a world that had photons already in place. Planck had no such luxury.

Another point I learned later, through the books of Abraham Pais, was that Planck knew the formula before he started. Using brilliantly precise experimental work by people in his laboratory, he recognized that the black body spectrum had a shape that he could explain mathematically. It required what we would now call quantization, a distribution over distinct states. In one evening, he found a way to derive that formula from the physics. It was not a matter of quantizing and finding it worked; quite the reverse. The analysis did in fact start from the conclusion, but not as we had been taught.

It's a lovely side detail that Einstein's Nobel Prize was for the photoelectric effect, not relativity as many assumed it would be at the time. The old guard that decided the prizes thought it was safe to give it to Einstein for his explanation, based on ideas by Planck, of a different vexing physics problem. That relativity stuff was too risqué just yet. In retrospect, making the photon real was probably Einstein's greatest leap, even though Planck and others of his generation were never comfortable with it. The Nobel committee got it right by accident.

To put all this together, what we learn through our education has always been filtered by those who came after the events. It can be easier to explain things using current ideas, but it's easy to forget that those who invented the ideas didn't have them yet. The act of creating them may only be well understood by stepping back to the time they were working.

I'm sorry I don't remember the professor's name. Our unpicking of the black body story was one of the most memorable and informative days of my schooling, and I will be forever grateful for it.

Tuesday, September 29, 2020

Color blindness

Color blindness is an inaccurate term. Most color-blind people can see color, they just don't see the same colors as everyone else.

There have been a number of articles written about how to improve graphs, charts, and other visual aids on computers to better serve color-blind people. That is a worthwhile endeavor, and the people writing them mean well, but I suspect very few of them are color-blind because the advice is often poor and sometimes wrong. The most common variety of color blindness is called red-green color blindness, or deuteranopia, and it affects about 6% of human males. As someone who has moderate deuteranopia, I'd like to explain what living with it is really like.

The answer may surprise you.

I see red and green just fine. Maybe not as fine as you do, but just fine. I get by. I can drive a car and I stop when the light is red and go when the light is green. (Blue and yellow, by the way, I see the same as you. For a tiny fraction of people that is not the case, but that's not the condition I'm writing about.)

If I can see red and green, what then is red-green color blindness?

To answer that, we need to look at the genetics and design of the human vision system. I will only be writing about moderate deuteranopia, because that's what I have and I know what it is: I live with it. Maybe I can help you understand how that impairment—and it is an impairment, however mild—affects the way I see things, especially when people make charts for display on a computer.

There's a lot to go through, but here is a summary. The brain interprets signals from the eye to determine color, but the eye doesn't see colors. There is no red receptor, no green receptor in the eye. The color-sensitive receptors in the eye, called cones, don't work like that. Instead there are several different types of cones with broad but overlapping color response curves, and what the eye delivers to the brain is the difference between the signals from nearby cones with possibly different color response. Colors are what the brain makes from those signals.

There are also monochromatic receptors in the eye, called rods, and lots of them, but we're ignoring them here. They are most important in low light. In bright light it's the color-sensitive cones that dominate.

For most mammals, there are two color response curves for cones in the eye. They are called warm and cool, or yellow and blue. Dogs, for instance, see color, but from a smaller palette than we do. The color responses are determined, in effect, by pigments in front of the light receptors, filters if you will. We have this system in our eyes, but we also have another, and that second one is the central player in this discussion.

We are mammals, primates, and we are members of the branch of primates called Old World monkeys. At some point our ancestors in Africa moved to the trees and started eating the fruit there. The old warm/cool color system is not great at spotting orange or red fruit in a green tree. Evolution solved this problem by duplicating a pigment and mutating it to make a third one. This created three pigments in the monkey eye, and that allowed a new color dimension to arise, creating what we now think of as the red/green color axis. That dimension makes fruit easier to find in the jungle, granting a selective advantage to monkeys, like us, who possess it.

It's not necessary to have this second, red/green color system to survive. Monkeys could find fruit before the new system evolved. So the red/green system favored monkeys who had it, but it wasn't necessary, and evolutionary pressure hasn't yet perfected the system. It's also relatively new, so it's still evolving. As a result, not all humans have equivalent color vision.

The mechanism is a bit sloppy. The mutation is a "stutter" mutation, meaning that the pigment was created by duplicating the original warm pigment's DNA and then repeating some of its codon sequences. The quality of the new pigment—how much the pigment separates spectrally from the old warm pigment—is determined by how well the stutter mutation is preserved. No stutter, you get just the warm/cool dimension, a condition known as dichromacy that affects a small fraction of people, almost exclusively male (and all dogs). Full stutter, you get the normal human vision with yellow/blue and red/green dimensions. Partial stutter, and you get me, moderately red-green color-blind. Degrees of red-green color blindness arise according to how much stutter is in the chromosome.

Those pigments are encoded only on the X chromosome. That means that most males, being XY, get only one copy of the pigment genes, while most females, being XX, get two. If an XY male inherits a bad copy of the X he will be color-blind. An XX female, though, will be much less likely to get two bad copies. But some will get a good one and a bad one, one from the mother and one from the father, giving them four pigments. Such females are called tetrachromatic and have a richer color system than most of us, even than normal trichromats like you.

The key point about the X-residence of the pigment, though, is that men are much likelier than women to be red-green color-blind.

Here is a figure from an article by Denis Baylor in an essay collection called Colour Art & Science, edited by Trevor Lamb and Janine Bourriau, an excellent resource.

The top diagram shows the pigment spectra of a dichromat, what most mammals have. The bottom one shows the normal trichromat human pigment spectra. Note that two of the pigments are the same as in a dichromat, but there is a third, shifted slightly to the red. That is the Old World monkey mutation, making it possible to discriminate red. The diagram in the middle shows the spectra for someone with red-green color blindness. You can see that there are still three pigments, but the difference between the middle and longer-wave (redder) pigment is smaller.

A deuteranope like me can still discriminate red and green, just not as well. Perhaps what I see is a bit like what you see when evening approaches and the color seems to drain from the landscape as the rods begin to take over. Or another analogy might be what happens when you turn the stereo's volume down: You can still hear all the instruments, but they don't stand out as well.

It's worth emphasizing that there is no "red" or "green" or "blue" or "yellow" receptor in the eye. The optical pigments have very broad spectra. It's the difference in the response between two receptors that the vision system turns into color.
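That difference-signal idea can be sketched in a few lines of code. Everything here is a toy model of my own: the Gaussian response curves, the peak wavelengths, and the widths are illustrative assumptions, not measured cone data. The point is only that when two pigments sit closer together spectrally, the difference signal for the same light gets weaker:

```python
import math

# Toy cone: a Gaussian spectral response centered on a peak wavelength.
# (Hypothetical shape and width; real cone responses are broader and skewed.)
def cone_response(peak_nm, wavelength_nm, width_nm=60.0):
    return math.exp(-((wavelength_nm - peak_nm) / width_nm) ** 2)

# The red/green signal is the *difference* between two nearby cone types,
# which is what the visual system turns into color.
def red_green_signal(m_peak_nm, l_peak_nm, wavelength_nm):
    return cone_response(l_peak_nm, wavelength_nm) - cone_response(m_peak_nm, wavelength_nm)

reddish_light = 600  # nm, somewhere on the red side of the spectrum

# Normal trichromat: M and L pigments well separated (say 530 vs 560 nm).
normal = red_green_signal(530, 560, reddish_light)

# Anomalous trichromat: the mutated pigment sits closer to the original
# (say 530 vs 540 nm), so the same light yields a weaker difference signal.
anomalous = red_green_signal(530, 540, reddish_light)

print(normal)     # ≈ 0.38
print(anomalous)  # ≈ 0.11: same light, much weaker red/green signal
```

The same light hits both model eyes; only the spectral separation of the pigments differs, and the difference signal — the raw material for both color and edge detection — shrinks with it.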

In short, I still see red and green, just not as well as you do. But there's another important part of the human visual system that is relevant here, and it has a huge influence on how red-green color blindness affects the clarity of diagrams on slides and such.

It has to do with edge detection. The signals from receptors in the eye are used not only to detect color, but also to detect edges. In fact since color is detected largely by differences of spectral response from nearby receptors, the edges are important because that's where the strongest difference lies. The color of a region, especially a small one, is largely determined at the edges.

Of course, all animals need some form of visual processing that identifies objects, and edge detection is part of that processing in mammals. But the edge detection circuitry is not uniformly deployed. In particular, there is very little high-contrast detection capability for cool colors. You can see this yourself in the following diagram, provided your monitor is set up properly. The small pure blue text on the pure black background is harder to read than even the slightly less saturated blue text, and much harder than the green or red. Make sure the image is no more than about 5cm across to see the effect properly, as the scale of the contrast signal matters:

In this image, the top line is pure computer green, the next is pure computer red, and the bottom is pure computer blue. In between is a sequence leading to ever purer blues towards the bottom. For me, and I believe for everyone, the bottom line is very hard to read.

Here is the same text field as above but with a white background:

Notice that the blue text is now easy to read. That's because it's against white, which includes lots of light and all colors, so it's easy for the eye to build the difference signals and recover the edges. Essentially, it detects a change of color from the white to the blue. Across the boundary the level of blue changes, but so do the levels of red and green. When the background is black, however, the eye depends on the blue alone—black has no color, no light to contribute a signal, no red, no green—and that is a challenge for the human eye.

Now here's some fun: double the size of the black-backgrounded image and the blue text becomes disproportionately more readable:

Because the text is bigger, more receptors are involved and there is less dependence on edge detection, making it easier to read the text. As I said above, the scale of the contrast changes matters. If you use your browser to blow up the image further you'll see it becomes even easier to read the blue text.

And that provides a hint about how red-green color blindness looks to people who have it.

For red-green color-blind people, the major effect comes from the fact that edge detection is weaker in the red/green dimension, sort of like blue edge detection is for everyone. Because the pigments are closer together than in a person with regular vision, if the color difference in the red-green dimension is the only signal that an edge is there, it becomes hard to see the edge and therefore hard to see the color. 

In other words, the problem you have reading the blue text in the upper diagram is analogous to how much trouble a color-blind person has seeing detail in an image with only a mix of red and green. And the issue isn't between computer red versus computer green, which are quite easy to tell apart as they have very different spectra, but between more natural colors on the red/green dimension, colors that align with the naturally evolved pigments in the cones.

In short, color detection when looking at small things, deciding what color an item is when it's so small that only the color difference signal at the edges can make the determination, is worse for color-blind people. Even though the colors are easy to distinguish for large objects, it's hard when they get small.

In this next diagram I can easily tell that in the top row the left block is greenish and the right block is reddish, but in the bottom row that is a much harder distinction for me to make, and it gets even harder if I look from farther away, further shrinking the apparent size of the small boxes. From across the room it's all but impossible, even though the colors of the upper boxes remain easy to identify.

Remember when I said I could see red and green just fine? Well, I can see the colors just fine (more or less). But that is true only when the object is large enough that the color analysis isn't being done only by edge detection. Fields of color are easy, but lines and dots are very hard.

Here's another example. Some devices come with a tiny LED that indicates charging status by changing color: red for low battery, amber for medium, and green for a full charge. I have a lot of trouble discriminating the amber and green lights, but can solve this by holding the light very close to my eye so it occupies a larger part of the visual field. When the light looks bigger, I can tell what color it is.

Another consequence of all this is that I see very little color in the stars. That makes me sad.

Remember this is about color, just color. It's easy to distinguish two items if their colors are close but their intensities, for example, are different. A bright red next to a dull green is easy to spot, even if the same red dulled down to the level of the green would not be. Those squares above are at roughly equal saturations and intensities. If they were not, it would be easier to tell which is red and which is green.

To return to the reason for writing this article, red/green color blindness affects legibility. The way the human vision system works, and the way it sometimes doesn't work so well, implies there are things to consider when designing an information display that you want to be clearly understood.

First, choose colors that can be easily distinguished. If possible, keep them far apart on the spectrum. If not, differentiate them some other way, such as by intensity or saturation.

Second, use other cues if possible. Color is complex, so if you can add another component to a line on a graph, such as a dashed versus dotted pattern, or even good labeling, that helps a lot.

Third, edge detection is key to comprehension but can be tricky. Avoid difficult situations such as pure blue text on a black background. Avoid tiny text.

Fourth, size matters. Don't use the thinnest possible line. A fatter one might work just as well for the diagram but be much easier to see and to identify by color.
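On the intensity point, one concrete check is the contrast-ratio formula from the WCAG 2 accessibility guidelines (the formula is theirs; the example colors are mine). It quantifies why pure blue on black, as in the figure above, is such a bad choice:

```python
def _linearize(c):
    # sRGB channel value (0..1) to linear light, per the WCAG 2 definition.
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(r, g, b):
    # Weighted sum of linearized channels; note how little blue contributes.
    r, g, b = _linearize(r), _linearize(g), _linearize(b)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    # WCAG contrast ratio: (lighter + 0.05) / (darker + 0.05), from 1 to 21.
    l1, l2 = relative_luminance(*fg), relative_luminance(*bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)

black, white = (0, 0, 0), (1, 1, 1)
blue, green = (0, 0, 1), (0, 1, 0)

print(contrast_ratio(white, black))  # 21.0: the best possible case
print(contrast_ratio(blue, black))   # ≈ 2.4: pure blue on black fails badly
print(contrast_ratio(green, black))  # ≈ 15.3: pure green on black is fine
```

A ratio of at least 4.5 is the usual WCAG threshold for normal-size text; pure blue on black fails it by a wide margin while green on black passes easily. This captures only the intensity dimension, not the color dimensions, which is exactly why it's a floor and not a full answer.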

And to introduce one last topic, some people, like me, have old eyes, and old eyes have much more trouble with scattered light and what that does to contrast. Although dark mode is very popular these days, bright text on a black background scatters in a way that makes it hard to read. The letters have halos around them that can be confusing. Black text on a white background works well because the scatter is uniform and doesn't make halos. It's fortunate that paper is white and ink is black, because that works well for all ages.

The most important lesson is to not assume you know how something appears to a color-blind person, or to anyone else for that matter. If possible, ask someone you know who has eyes different from yours to assess your design and make sure it's legible. The world is full of people with vision problems of all kinds. If only the people who used amber LEDs to indicate charge had realized that.
