Tuesday, April 03, 2012

The byte order fallacy

Whenever I see code that asks what the native byte order is, it's almost certain the code is either wrong or misguided. And if the native byte order really does matter to the execution of the program, it's almost certain to be dealing with some external software that is either wrong or misguided. If your code contains #ifdef BIG_ENDIAN or the equivalent, you need to unlearn about byte order.

The byte order of the computer doesn't matter much at all except to compiler writers and the like, who fuss over allocation of bytes of memory mapped to register pieces. Chances are you're not a compiler writer, so the computer's byte order shouldn't matter to you one bit.

Notice the phrase "computer's byte order". What does matter is the byte order of a peripheral or encoded data stream, but--and this is the key point--the byte order of the computer doing the processing is irrelevant to the processing of the data itself. If the data stream encodes values with byte order B, then the algorithm to decode the value on a computer with byte order C should be about B, not about the relationship between B and C.

Let's say your data stream has a little-endian-encoded 32-bit integer. Here's how to extract it (assuming unsigned bytes):
i = (data[0]<<0) | (data[1]<<8) | (data[2]<<16) | (data[3]<<24);
If it's big-endian, here's how to extract it:
i = (data[3]<<0) | (data[2]<<8) | (data[1]<<16) | (data[0]<<24);
Both these snippets work on any machine, independent of the machine's byte order, independent of alignment issues, independent of just about anything. They are totally portable, given unsigned bytes and 32-bit integers.
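As a minimal sketch, the two snippets above can be wrapped as functions. The names get_le32 and get_be32 are hypothetical, and the casts to uint32_t are an addition not in the original snippets: they keep each shift in unsigned 32-bit arithmetic whatever the width of int on the local machine.

```c
#include <stdint.h>

/* Decode a 32-bit little-endian value from the first four bytes of data.
   Hypothetical helper name; the logic mirrors the snippet above. */
uint32_t get_le32(const unsigned char *data)
{
    return ((uint32_t)data[0] << 0)  |
           ((uint32_t)data[1] << 8)  |
           ((uint32_t)data[2] << 16) |
           ((uint32_t)data[3] << 24);
}

/* Decode a 32-bit big-endian value from the first four bytes of data. */
uint32_t get_be32(const unsigned char *data)
{
    return ((uint32_t)data[3] << 0)  |
           ((uint32_t)data[2] << 8)  |
           ((uint32_t)data[1] << 16) |
           ((uint32_t)data[0] << 24);
}
```

Neither function asks what machine it is running on; the only byte order mentioned is that of the data.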

What you might have expected to see for the little-endian case was something like
i = *((int*)data);
#ifdef BIG_ENDIAN
/* swap the bytes */
i = ((i&0xFF)<<24) | (((i>>8)&0xFF)<<16) | (((i>>16)&0xFF)<<8) | (((i>>24)&0xFF)<<0);
#endif
or something similar. I've seen code like that many times. Why not do it that way? Well, for starters:
  1. It's more code.
  2. It assumes integers are addressable at any byte offset; on some machines that's not true.
  3. It depends on integers being 32 bits long, or requires more #ifdefs to pick a 32-bit integer type.
  4. It may be a little faster on little-endian machines, but not much, and it's slower on big-endian machines.
  5. If you're using a little-endian machine when you write this, there's no way to test the big-endian code.
  6. It swaps the bytes, a sure sign of trouble (see below).

By contrast, my version of the code:
  1. Is shorter.
  2. Does not depend on alignment issues.
  3. Computes a 32-bit integer value regardless of the local size of integers.
  4. Is equally fast regardless of local endianness, and fast enough (especially on modern processors) anyway.
  5. Runs the same code on all computers: I can state with confidence that if it works on a little-endian machine it will work on a big-endian machine.
  6. Never "byte swaps".
In other words, it's simpler, cleaner, and utterly portable. There is no reason to ask about local byte order when about to interpret an externally provided byte stream.

I've seen programs that end up swapping bytes two, three, even four times as layers of software grapple over byte order. In fact, byte-swapping is the surest indicator the programmer doesn't understand how byte order works.

Why do people make the byte order mistake so often? I think it's because they've seen a lot of bad code that has convinced them byte order matters. "Here comes an encoded byte stream; time for an #ifdef." In fact, C may be part of the problem: in C it's easy to make byte order look like an issue. If instead you try to write byte-order-dependent code in a type-safe language, you'll find it's very hard. In a sense, byte order only bites you when you cheat.

There's plenty of software that demonstrates the byte order fallacy is really a fallacy. The entire Plan 9 system ran, without architecture-dependent #ifdefs of any kind, on dozens of computers of different makes, models, and byte orders. I promise you, your computer's byte order doesn't matter even at the level of the operating system.

And there's plenty of software that demonstrates how easily you can get it wrong. Here's one example. I don't know if it's still true, but some time back Adobe Photoshop screwed up byte order. Back then, Macs were big-endian and PCs, of course, were little-endian. If you wrote a Photoshop file on the Mac and read it back in, it worked. If you wrote it on a PC and tried to read it on a Mac, though, it wouldn't work unless back on the PC you checked a button that said you wanted the file to be readable on a Mac. (Why wouldn't you? Seriously, why wouldn't you?) Ironically, when you read a Mac-written file on a PC, it always worked, which demonstrates that someone at Adobe figured out something about byte order. But there would have been no problems transferring files between machines, and no need for a check box, if the people at Adobe wrote proper code to encode and decode their files, code that could have been identical between the platforms. I guarantee that to get this wrong took far more code than it would have taken to get it right.
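The encoding side can be written in the same spirit as the decoding snippets earlier in this post. This is only a sketch, assuming a hypothetical file format that stores 32-bit values in a fixed byte order; it says nothing about how Photoshop files are actually laid out, and the names put_le32 and put_be32 are made up for illustration.

```c
#include <stdint.h>

/* Write v into out[0..3] in little-endian order.
   The same code runs unchanged on any host. */
void put_le32(unsigned char *out, uint32_t v)
{
    out[0] = (unsigned char)((v >> 0)  & 0xFF);
    out[1] = (unsigned char)((v >> 8)  & 0xFF);
    out[2] = (unsigned char)((v >> 16) & 0xFF);
    out[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Write v into out[0..3] in big-endian order. */
void put_be32(unsigned char *out, uint32_t v)
{
    out[0] = (unsigned char)((v >> 24) & 0xFF);
    out[1] = (unsigned char)((v >> 16) & 0xFF);
    out[2] = (unsigned char)((v >> 8)  & 0xFF);
    out[3] = (unsigned char)((v >> 0)  & 0xFF);
}
```

A writer built this way picks one byte order for the file and uses it everywhere, so there is nothing for a check box to select.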

Just last week I was reviewing some test code that was checking byte order, and after some discussion it turned out that there was a byte-order-dependency bug in the code being tested. As is often the case, the existence of byte-order-checking was evidence of the presence of a bug. Once the bug was fixed, the test no longer cared about byte order.

And neither should you, because byte order doesn't matter.

Saturday, December 31, 2011

Esmerelda's Imagination

An actress acquaintance of mine—let's call her Esmerelda—once said, "I can't imagine being anything except an actress." To which the retort was given, "You can't be much of an actress then, can you?"

I was reminded of this exchange when someone said to me about Go, "I can't imagine programming in a language that doesn't have generics." My retort, unspoken this time, was, "You can't be much of a programmer, then, can you?"

This is not an essay about generics (which are a fine thing and may arrive in Go one day, or may not) but about imagination, or at least what passes for imagination among computer programmers: complaint. A friend observed that the definitive modern pastime is to complain on line. For the complainers, it's fun, for the recipients of the complaint it can be dispiriting. As a recipient, I am pushing back—by complaining, of course.

Not so long ago, a programmer was someone who programs, but that seems to be the last thing programmers do nowadays. Today, the definition of a programmer is someone who complains unless the problem being solved has already been solved and whose solution can be expressed in a single line of code. (From the point of view of a language designer, this reduces to a corollary of language success: every program must be reducible to a single line of code or your language sucks. The lessons of APL have been lost.)

A different, more liberal definition might be that a programmer is someone who approaches every problem exactly the same way and complains about the tools if the approach is unsuccessful.

For the programmer population, the modern pastime demands that if one is required to program, or at least to think while programming, one blogs/tweets/rants instead. I have seen people write thousands of words of on-line vituperation that problem X requires a few more keystrokes than it might otherwise, missing the irony that had they spent those words on programming, they could have solved the problem many times over with the saved keystrokes. But, of course, that would be programming.

Two years ago Go went public. This year, Dart was announced. Both came from Google but from different teams with different goals; they have little in common. Yet I was struck by a property of the criticisms of Dart in the first few days: by doing a global substitution of "Go" for "Dart", many of the early complaints about Go would have fit right into the stream of Dart invective. It was unnecessary to try Go or Dart before commenting publicly on them; in fact, it was important not to (for one thing, trying them would require programming). The criticisms were loud and vociferous but irrelevant because they weren't about the languages at all. They were just a standard reaction to something new, empty of meaning, the result of a modern programmer's need to complain about everything different. Complaints are infinitely recyclable. ("I can't imagine programming in a language without XXX.") After all, they have a low quality standard: they need not be checked by a compiler.

A while after Go launched, the criticisms changed tenor somewhat. Some people had actually tried it, but there were still many complainers, including the one quoted above. The problem now was that imagination had failed: Go is a language for writing Go programs, not Java programs or Haskell programs or any other language's programs. You need to think a different way to write good Go programs. But that takes time and effort, more than most will invest. So the usual story is to translate one program from another language into Go and see how it turns out. But translation misses idiom. A first attempt to write, for example, some Java construct in Go will likely fail, while a different Go-specific approach might succeed and illuminate. After 10 years of Java programming and 10 minutes of Go programming, any comparison of the languages' capabilities is unlikely to generate insight, yet here come the results, because that's a modern programmer's job.

It's not all bad, of course. Two years on, Go has lots of people who've spent the time to learn how it's meant to be used, and for many willing to invest such time the results have been worthwhile. It takes time and imagination and programming to learn how to use any language well, but it can be time well spent. The growing Go community has generated lots of great software and has given me hope, hope that there may still be actual programmers out there.

However, I still see far too much ill-informed commentary about Go on the web, so for my own protection I will start 2012 with a resolution:

I resolve to recognize that a complaint reveals more about the complainer than the complained-about. Authority is won not by rants but by experience and insight, which require practice and imagination. And maybe some programming.

Sunday, September 18, 2011

User experience

[We open in a well-lit corporate conference room. A meeting has been running for a while. Lots has been accomplished but time is running out.]

[The door opens and a tall, tow-headed twenty-something guy in glasses walks in, carrying a Mac Air and a folder.]

Manager:
Oh, here he is. This is Richard. I asked him to join us today. Glad he could make it. He's got some great user experience ideas.

Richard:
Call me Dick.

Manager:
Dick's done a lot of seminal UX work for us.

Engineer:
Hey, aren't you the guy who's arguing we shouldn't have search in e-books?

Dick:
Absolutely. It's a lousy idea.

Engineer:
What?

Dick:
Books are the best UI ever created. They've been perfected over more than 500 years of development. We shouldn't mess with success.

Product manager:
Well, this is a new age. We should be allowed to ...

Dick:
Books have never had search. If we add search, we'll just confuse the user.

Product manager:
Oh, you're right. We don't want to do that.

Engineer:
But e-books aren't physical books. They're not words on paper. They're just bits, information.

Dick:
Our users don't know that.

Engineer:
Yes they do! They don't want simple books, they want the possibilities that electronic books can bring. Do you know about information theory? Have you even heard of Claude Shannon?

Dick:
Isn't he the chef at that new biodynamic tofu restaurant in North Beach?

Engineer:
Uhh, yeah, OK. But look, you're treating books as a metaphor for your user interface. That's as lame as using a trash can to throw away files and folders. We can do so much more!

Dick:
You misunderstand. Our goal is to make computers easier to use, not to make them more useful.

Product manager:
Wow, that's good.

Engineer:
Wow.

Manager:
Let's get back on track. Dick, you had some suggestions for us?

Dick:
Yeah. I was thinking about the work we did with the Notes iPhone app. Using a font that looked like a felt marker was a big help for users.

Engineer:
Seriously?

Dick:
Yes, it made users feel more comfortable about keeping notes on their phone. Having a font that looks like handwriting helps them forget there's a computer underneath.

Engineer:
I see....

Dick:
Yes, so... I was thinking for the Address Book app for Lion, we should change the look to be like a...

Manager:
Can you show us?

Dick:
Yeah, sure. I have a mock-up here.
[Opens laptop, turns it to face the room.]

Product manager:
An address book! That's fantastic. Look at the detail! Leather, seams at the corners, a visible spine. This is awesome!

Engineer:
It's just a book. It's a throwback. What are you doing? Why does it need to look like a physical address book?

Dick:
Because it is an address book!

Engineer:
No it's not, it's an app!

Dick:
It's a book.

Engineer:
You've made it one. This time it's not even a metaphor - it's literally a book. You're giving up on the possibility of doing more.

Dick:
As I said, users don't care about functionality. They want comfort and familiarity. An Address Book app that looks like an address book will be welcome. Soothing.

Engineer:
If they want a paper address book, they can buy one.

Dick:
Why would they do that if they have one on their desktop?

Engineer:
Can they at least change the appearance? Is there a setting somewhere?

Dick:
Oh, no. We know better than the user - otherwise why are we here? Settings are just confusing.

Engineer:
I ... I really don't understand what's going on.

Manager:
That's OK, you don't have to, but I'd like to give you the action item to build it. End of the quarter OK?

Engineer:
Uhhh, sure.

Manager:
Dick, do you have the requirements doc there?

Dick:
Right here.
[Pushes the folder across the desk.]

Engineer:
Can't you just mail it to me?

Dick:
It's right there.

Engineer:
I know, but... OK.

Manager:
That's a great start, Dick. What else do you have?

Dick:
Well, actually, maybe this is the time to announce that I'm moving on. Today is my last day here.

Manager, Product manager:
[Unison] Oh no!

Dick:
Yeah, sorry about that. I've had an amazing time here changing the world but it's time for me to seek new challenges.

Manager:
Do you have something in mind?

Dick:
Yes, I'm moving north. Microsoft has asked me to head a group there. They've got some amazing new ideas around paper clips.

FADE

Monday, August 22, 2011

Regular expressions in lexing and parsing

Comments extracted from a code review. I've been asked to disseminate them more widely.

I should say something about regular expressions in lexing and
parsing. Regular expressions are hard to write, hard to write well,
and can be expensive relative to other technologies. (Even when they
are implemented correctly in N*M time, they have significant
overheads, especially if they must capture the output.)

Lexers, on the other hand, are fairly easy to write correctly (if not
as compactly), and very easy to test. Consider finding alphanumeric
identifiers. It's not too hard to write the regexp (something like
"[a-zA-Z_][a-zA-Z_0-9]*"), but really not much harder to write as a
simple loop. The performance of the loop, though, will be much higher
and will involve much less code under the covers. A regular expression
library is a big thing. Using one to parse identifiers is like using a
Ferrari to go to the store for milk.
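To make the comparison concrete, here is a minimal sketch of the simple loop in C. It assumes ASCII input in a NUL-terminated string; the function names are hypothetical and not taken from the code under review.

```c
#include <stddef.h>

/* True if c may start an identifier: an ASCII letter or underscore. */
static int is_ident_start(unsigned char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

/* True if c may continue an identifier: a start character or a digit. */
static int is_ident_part(unsigned char c)
{
    return is_ident_start(c) || (c >= '0' && c <= '9');
}

/* Return the length of the identifier at the start of s,
   or 0 if s does not begin with one. The terminating NUL
   is not an identifier character, so the loop stops there. */
size_t scan_ident(const char *s)
{
    size_t n;

    if (!is_ident_start((unsigned char)s[0]))
        return 0;
    n = 1;
    while (is_ident_part((unsigned char)s[n]))
        n++;
    return n;
}
```

Changing what counts as an identifier character means editing the two small predicates, not reworking a pattern.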

And when we want to adjust our lexer to admit other character types,
such as Unicode identifiers, and handle normalization, and so on, the
hand-written loop can cope easily but the regexp approach will break
down.

A similar argument applies to parsing. Using regular expressions to
explore the parse state to find the way forward is expensive,
overkill, and error-prone. Standard lexing and parsing techniques are
so easy to write, so general, and so adaptable there's no reason to
use regular expressions. They also result in faster, safer, and more
compact implementations.

Another way to look at it is that lexing and parsing are matching
statically-defined patterns, but regular expressions' strength is that
they provide a way to express patterns dynamically. They're great in
text editors and search tools, but when you know at compile time what
all the things are you're looking for, regular expressions bring far
more generality and flexibility than you need.

Finally, on the point about writing well. Regular expressions are, in
my experience, widely misunderstood and abused. When I do code reviews
involving regular expressions, I fix up a far higher fraction of the
regular expressions in the code than I do regular statements. This is
a sign of misuse: most programmers (no finger pointing here, just
observing a generality) simply don't know what they are or how to use
them correctly.

Encouraging regular expressions as a panacea for all text processing
problems is not only lazy and poor engineering, it also reinforces
their use by people who shouldn't be using them at all.

So don't write lexers and parsers with regular expressions as the
starting point. Your code will be faster, cleaner, and much easier to
understand and to maintain.

Friday, August 27, 2010

Know your science

Except for the TV show "The Big Bang Theory", popular culture gets science wrong. We all know that.

But there's a way it tends to get science wrong that upsets me more than most. That is when it misuses the tools of science by willfully ignoring what science actually means.

One common example is celebrity equations, wherein some mathematical-looking expression mixes two or more celebrities together, as in (I'm making this one up and I'm not a cultural critic, let alone a comic, so please bear with me): Lady Gaga = (2*Madonna + Carrot Top)/3. Mathematically savvy readers will recognize that I normalized that equation. If you don't know what that means, you shouldn't be writing celebrity equations, because mathematical equations mean something, they're not just symbols. Like musical comedy based on bad notes, bogus mathematical equations are not funny, just lazy.

Some years ago I even wrote a letter to Entertainment Weekly when they had a long article full of egregious celebrity equations. To their credit, they published the letter and even mended their ways for a while. I quote the letter here:

According to EW math, the more buzz or intelligence you have, the less likely you are to be on the It List. That may be true, but I bet you didn't mean that. Your equation is art-directed nonsense. EW seems to think the joke is that the equations look cute: If Einstein is funny, his square root is hilarious...
In short, mathematics may look funny if you don't understand it but that doesn't make it funny if you misuse it in ignorance.

Another sort of abuse is comedy periodic tables: periodic tables of the vegetables, periodic table of the desserts, periodic table of the presidents, and on and on. There are zillions of them. I believe the vegetables one was the first widely distributed example.

What's wrong with them? Again, they miss the point about the one true periodic table, Mendeleev's periodic table of the elements. In fact, to put things with no structure into a periodic table not only misses the point of the periodic table, it misses the profound idea that some things have periods.

Mendeleev's table, by recognizing the periodic structure of the elements, predicted not only properties of the elements, but the very existence of undiscovered elements. It was a breakthrough.

The periodic table is not some artistic layout of letters, it's science at its very best, one of the great results of the 19th century and the birth of modern chemistry. It doesn't honor science to take, say, typefaces and put them in a funny-looking grid. That just mocks the idea that science can predict the way the world works.

Science is not arbitrary. Making arbitrary cultural artifacts by abusing scientific ideas is not just wrong, it's offensive. It cheapens science.

Another area of abuse is quantum mechanics, and a common victim is Heisenberg's uncertainty principle. Despite what some ill-informed academics would have you believe, Heisenberg's principle is not some general statement about weird shit happening in the world, it is a fantastically precise scientific statement about the limits of measurement of two simultaneous physical properties: position and momentum. It's not a metaphor!

What's really sad is that many of the commonest misuses of the terminology of quantum mechanics come from other areas of science and technology. For instance, there is a term in computer engineering called a Heisenbug, which refers to faults that are unpredictable, most often for bugs that go away when you examine them. It's a cute name but it isn't even a correct reference. The quantum mechanical property of things changing when you observe them is not the Heisenberg uncertainty principle, it's the observer effect. These two ideas are often confused but they are not the same. They're not even closely related.

The observer effect in quantum mechanics describes how the act of measuring a quantum system forces the system to cough up a measurable quantity, which triggers a "wave function collapse". Heisenberg's uncertainty principle says that the minimum product of the error in simultaneous measurement of a particle's position and momentum is Planck's constant divided by 4π, or as we write it in physics, ℏ/2. (By the way, that's an extremely small value.)
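Written out, the bound described above takes the following form. The symbols σ_x and σ_p, standing for the standard deviations of position and momentum, are the usual textbook notation and do not appear in the text above.

```latex
\sigma_x \, \sigma_p \;\ge\; \frac{h}{4\pi} \;=\; \frac{\hbar}{2},
\qquad
\hbar = \frac{h}{2\pi} \approx 1.055 \times 10^{-34}\ \mathrm{J\,s}
```

The numerical value of ℏ shows just how small that limit is on everyday scales.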

Not only are these very different ideas, neither of them has anything to do with computer bugs. The term Heisenbug is trendy but bogus and ignores some strange and beautiful ideas. It's no better informed than the square root of Einstein or the periodic table of the typefaces.

If you're going to use the terms of science to inform your world, please make a point to understand the science too. Your world will be richer for it.

Wednesday, April 02, 2008

MacDonald's not McDonald's



Same general concept. Had a hamburger here too.

Tuesday, April 01, 2008


When I travel I like to have a hamburger at a McDonald's restaurant. There are a number of these to be found around the world.