Cary Millsap

Subscribe to Cary Millsap feed
My web log for things I’m interested in, including design, software development, performance analysis, learning, and running a business.Cary Millsap
Updated: 15 hours 26 min ago

Words I Don’t Use, Part 5: “Wait”

Tue, 2017-08-15 18:25
The fifth “word I do not use” is the Oracle technical term wait.
The Oracle Wait InterfaceIn 1991, Oracle Corporation released some of the most important software instrumentation of all time: the wait statistics that were implemented in Oracle 7.0. Here’s part of the story, in Juan Loaiza’s words, as told in Nørgaard et. al (2004), Oracle Insights: Tales of the Oak Table.
This stuff was developed because we were running a benchmark that we could not get to perform. We had spent several weeks trying to figure out what was happening with no success. The symptoms were clear—the system was mostly idle—we just couldn’t figure out why.

We looked at the statistics and ratios and kept coming up with theories, the trouble was that none of them were right. So we wasted weeks tuning and fixing things that were not the problem. Finally we ran out of ideas and were forced to go back and instrument the code to figure out what the problem was.

Once the waits were instrumented the problem was diagnosed in minutes. We were having “free buffer” waits because the DBWR was not writing blocks fast enough. It’s amazing how hard that was to figure out with statistics, and how easy it was to figure out once the waits were instrumented.

...In retrospect a lot of the names could be greatly improved. The wait interface was added after the freeze date as a “stealth” project so it did not get as well thought through as it should have. Like I said, we were just trying to solve a problem in the course of a benchmark. The trouble is that so many people use this stuff now that if you change the names it will break all sorts of thing tools, so we have to leave them alone.Before Juan’s team added this code, the Oracle kernel would show you only how much time its user calls (like parse, exec, and fetch) were taking. The new instrumentation, which included a set of new fixed views like v$session_wait and new WAIT lines in our trace files, showed how much time Oracle’s system calls (like reads, writes, and semops) were taking.
The Working-Waiting ModelThe wait interface begat a whole new mental model about Oracle performance, based on the principle of working versus waiting:
Response Time = Service Time + Wait TimeIn this formula, Oracle defines service time as the duration of the CPU used by your Oracle session (the duration Oracle spent working), and wait time as the sum of the durations of your Oracle wait events (the duration that Oracle spent waiting). Of course, response time in this formula means the duration spent inside the Oracle Database kernel.
Why I Don’t Say Wait, Part 1There are two reasons I don’t use the word wait. The first is simply that it’s ambiguous.

The Oracle formula is okay for talking about database time, but the scope of my attention is almost never just Oracle’s response time—I’m interested in the business’s response time. And when you think about the whole stack (which, of course you do; see holistic), there are events we could call wait events all the way up and down:
  • The customer waits for an answer from a user.
  • The user waits for a screen from the browser.
  • The browser waits for an HTML page from the application server.
  • The application server waits for a database call from the Oracle kernel.
  • The Oracle kernel waits for a system call from the operating system.
  • The operating system’s I/O request waits to clear the device’s queue before receiving service.
  • ...
If I say waits, the users in the room will think I’m talking about application response time, the Oracle people will think I’m talking about Oracle system calls, and the hardware people will think I’m talking about device queueing delays. Even when I’m not.
Why I Don’t Say Wait, Part 2There is a deeper problem with wait than just ambiguity, though. The word wait invites a mental model that actually obscures your thinking about performance.

Here’s the problem: waiting sounds like something you’d want to avoid, and working sounds like something you’d want more of. Your program is waiting?! Unacceptable. You want it to be working. The connotations of the words working and waiting are unavoidable. It sounds like, if a program is waiting a lot, then you need to fix it; but if it’s working a lot, then it is probably okay. Right?

Actually, no.

The connotations “work is virtuous” and “waits are abhorrent” are false connotations in Oracle. One is not inherently better or worse than the other. Working and waiting are not accurate value judgments about Oracle software. On the contrary, they’re not even meaningful; they’re just arbitrary labels. We could just as well have been taught to say that an Oracle program is “working on disk I/O” and “waiting to finish its CPU instructions.”

The terms working and waiting really just refer to different subroutine call types:

“Oracle is working”means“your Oracle kernel process is executing a user call”“Oracle is waiting”means“your Oracle kernel process is executing a system call”
The working-waiting model implies a distinction that does not exist, because these two call types have equal footing. One is no worse than the other, except by virtue of how much time it consumes. It doesn’t matter whether a program is working or waiting; it only matters how long it takes.
Working-Waiting Is a Flawed AnalogyThe working-waiting paradigm is a flawed analogy. I’ll illustrate. Imagine two programs that consume 100 seconds apiece when you run them:

Program AProgram BDurationCall typeDurationCall type 98system calls (waiting)98user calls (working)2user calls (working)2system calls (waiting) 100Total100Total
To improve program A, you should seek to eliminate unnecessary system calls, because that’s where most of A’s time has gone. To improve B, you should seek to eliminate unnecessary user calls, because that’s where most of B’s time has gone. That’s it. Your diagnostic priority shouldn’t be based on your calls’ names; it should be based solely on your calls’ contributions to total duration. Specifically, conclusions like, “Program B is okay because it doesn’t spend much time waiting,” are false.
A Better ModelI find that discarding the working-waiting model helps people optimize better. Here’s how you can do it. First, understand the substitute phrasing: working means executing a user call; and waiting means executing a system call. Second, understand that the excellent ideas people use to optimize other software are excellent ideas for optimizing Oracle, too:
  1. Any program’s duration is a function of all of its subroutine call durations (both user calls and system calls), and
  2. A program is running as fast as possible only when (1) its unnecessary calls have been eliminated, and (2) its necessary calls are running at hardware speed.
Oracle’s wait interface is vital because it helps us measure an Oracle program’s complete execution duration—not just Oracle’s user calls, but its system calls as well. But I avoid saying wait to help people steer clear of the incorrect bias introduced by the working-waiting analogy.

Words I Don’t Use, Part 4: “Expert”

Mon, 2017-08-07 09:55
The fourth “word I do not use” is expert.

When I was a young boy, my dad would sometimes drive me to school. It was 17 miles of country roads and two-lane highways, so it gave us time to talk.

At least once a year, and always on the first day of school, he would tell me, “Son, there are two answers to every test question. There’s the correct answer, and there’s the answer that the teacher expects. ...They’re not always the same.”

He would continue, “And I expect you to know them both.”

He wanted me to make perfect grades, but he expected me to understand my responsibility to know the difference between authority and truth. My dad thus taught me from a young age to be skeptical of experts.

The word expert always warns me of a potentially dangerous type of thinking. The word is used to confer authority upon the person it describes. But it’s ideas that are right or wrong; not people. You should evaluate an idea on its own merit, not on the merits of the person who conveys it. For every expert, there is an equal and opposite expert; but for every fact, there is not necessarily an equal and opposite fact.

A big problem with expert is corruption—when self-congratulators hijack the label to confer authority upon themselves. But of course, misusing the word erodes the word. After too much abuse within a community, expert makes sense only with finger quotes. It becomes a word that critical thinkers use only ironically, to describe people they want to avoid.

Words I Don’t Use, Part 3: “Best Practice”

Sat, 2017-07-29 10:24
The third “word I do not use” is best practice.

The “best practice” serves a vital need in any industry. It is the answer to, “Please don’t make me learn about this; just tell me what to do.” The “best practice” is a fine idea in spirit, but here’s the thing: many practices labeled “best” don’t deserve the adjective. They’re often containers for bad advice.

The most common problem with “best practices” is that they’re not parameterized like they should be. A good practice usually depends on something: if this is true, then do that; otherwise, do this other thing. But most “best practices” don’t come with conditions of execution—they often contain no if statements at all. They come disguised as recipes that can save you time, but they often encourage you to skip past thinking about things that you really ought to be thinking about.

Most of my objections to “best practices” go away when the practices being prescribed are actually good. But the ones I see are often not, like the old SQL “avoid full-table scans” advice. Enforcing practices like this yields applications that don’t run as well as they should and developers that don’t learn the things they should. Practices like “Measure the efficiency of your SQL at every phase of the software life cycle,” are actually “best”-worthy, but alas, they’re less popular because they sound like real work.

Words I Don’t Use, Part 2: “Holistic”

Thu, 2017-07-27 11:51
The second “word I do not use” is holistic.

When people use the word “holistic” in my industry (Oracle), it means that they’re paying attention to not just an individual subcomponent of a system, but to a whole system, including (I hope) even the people it serves.

But trying to differentiate technology services by saying “we take a holistic view of your system” is about like differentiating myself by saying I’ll wear clothes to work. Saying “holistic” would make it look like I’ve only just recently become aware that optimizing a system’s individual subsystems is not a reliable way to optimize the system itself. This should not be a distinctive revelation.

Words I Don’t Use, Part 1: “Methodology”

Fri, 2017-07-21 12:58
Today, I will begin a brief sequence of blog posts that I’ll call “Words I Don’t Use.” I hope you’ll have some fun with it, and maybe find a thought or two worth considering.

The first word I’ll discuss is methodology. Yes. I made a shirt about it.

Approximately 100% of the time in the [mostly non-scientific] Oracle world that I live in, when people say “methodology,” they’re using it in the form that American Heritage describes as a pretentious substitute for “method.” But methodology is not the same as method. Methods are processes. Sequences of steps. Methodology is the scientific study of methods.

I like this article called “Method versus Methodology” by Peter Klein, which cites the same American Heritage Dictionary passage that I quoted on page 358 of Optimizing Oracle Performance.

Messed-Up App of the Day: Tables of Numbers

Thu, 2016-05-19 14:51
Quick, which database is the biggest space consumer on this system?
Database                  Total Size   Total Storage
-------------------- --------------- ---------------
SAD99PS 635.53 GB 1.24 TB
ANGLL 9.15 TB 18.3 TB
FRI_W1 2.14 TB 4.29 TB
DEMO 6.62 TB 13.24 TB
H111D16 7.81 TB 15.63 TB
HAANT 1.1 TB 2.2 TB
FSU 7.41 TB 14.81 TB
BYNANK 2.69 TB 5.38 TB
HDMI7 237.68 GB 476.12 GB
SXXZPP 598.49 GB 1.17 TB
TPAA 1.71 TB 3.43 TB
MAISTERS 823.96 GB 1.61 TB
p17gv_data01.dbf 800.0 GB 1.56 TB
It’s harder than it looks.

Did you come up with ANGLL? If you didn’t, then you should look again. If you did, then what steps did you have to execute to find the answer?

I’m guessing you did something like I did:
  1. Skim the entire list. Notice that HDMI7 has a really big value in the third column.
  2. Read the column headings. Parse the difference in meaning between “size” and “storage.” Realize that the “storage” column is where the answer to a question about space consumption will lie.
  3. Skim the “Total Storage” column again and notice that the wide “476.12” number I found previously has a GB label beside it, while all the other labels are TB.
  4. Skim the table again to make sure there’s no PB in there.
  5. Do a little arithmetic in my head to realize that a TB is 1000× bigger than a GB, so 476.12 is probably not the biggest number after all, in spite of how big it looked.
  6. Re-skim the “Total Storage” column looking for big TB numbers.
  7. The biggest-looking TB number is 15.63 on the H111D16 row.
  8. Notice the trap on the ANGLL row that there are only three significant digits showing in the “18.3” figure, which looks physically the same size as the three-digit figures “1.24” and “4.29” directly above and below it, but realize that 18.3 (which should have been rendered “18.30”) is an order of magnitude larger.
  9. Skim the column again to make sure I’m not missing another such number.
  10. The answer is ANGLL.
That’s a lot of work. Every reader who uses this table to answer that question has to do it.

Rendering the table differently makes your readers’ (plural!) job much easier:
Database          Size (TB)  Storage (TB)
---------------- --------- ------------
SAD99PS .64 1.24
ANGLL 9.15 18.30
FRI_W1 2.14 4.29
DEMO 6.62 13.24
H111D16 7.81 15.63
HAANT 1.10 2.20
FSU 7.41 14.81
BYNANK 2.69 5.38
HDMI7 .24 .48
SXXZPP .60 1.17
TPAA 1.71 3.43
MAISTERS .82 1.61
p17gv_data01.dbf .80 1.56
This table obeys an important design principle:
The amount of ink it takes to render each number is proportional to its relative magnitude.I fixed two problems: (i) now all the units are consistent (I have guaranteed this feature by adding unit label to the header and deleting all labels from the rows); and (ii) I’m showing the same number of significant digits for each number. Now, you don’t have to do arithmetic in your head, and now you can see more easily that the answer is ANGLL, at 18.30 TB.

Let’s go one step further and finish the deal. If you really want to make it as easy as possible for readers to understand your space consumption problem, then you should sort the data, too:
Database          Size (TB)  Storage (TB)
---------------- --------- ------------
ANGLL 9.15 18.30
H111D16 7.81 15.63
FSU 7.41 14.81
DEMO 6.62 13.24
BYNANK 2.69 5.38
FRI_W1 2.14 4.29
TPAA 1.71 3.43
HAANT 1.10 2.20
MAISTERS .82 1.61
p17gv_data01.dbf .80 1.56
SAD99PS .64 1.24
SXXZPP .60 1.17
HDMI7 .24 .48
Now, your answer comes in a glance. Think back at the comprehension steps that I described above. With the table here, you only need:
  1. Notice that the table is sorted in descending numerical order.
  2. Comprehend the column headings.
  3. The answer is ANGLL.
As a reader, you have executed far less code path in your brain to completely comprehend the data that the author wants you to understand.

Good design is a topic of consideration. And even conservation. If spending 10 extra minutes formatting your data better saves 1,000 readers 2 minutes each, then you’ve saved the world 1,990 minutes of wasted effort.

But good design is also a very practical matter for you personally, too. If you want your audience to understand your work, then make your information easier for them to consume—whether you’re writing email, proposals, reports, infographics, slides, or software. It’s part of the pathway to being more persuasive.

Fail Fast

Fri, 2016-05-13 17:22
Among movements like Agile, Lean Startup, and Design Thinking these days, you hear the term fail fast. The principle of failing fast is vital to efficiency, but I’ve seen project managers and business partners be offended or even agitated by the term fail fast. I’ve seen it come out like, “Why the hell would I want to fail fast?! I don’t want to fail at all.” The implication, of course: “Failing is for losers. If you’re planning to fail, then I don’t want you on my team.”

I think I can help explain why the principle of “fail fast” is so important, and maybe I can help you explain it, too.

Software developers know about fail fast already, whether they realize it or not. Yesterday was a prime example for me. It was a really long day. I didn’t leave my office until after 9pm, and then I turned my laptop back on as soon as I got home to work another three hours. I had been fighting a bug all afternoon. It was a program that ran about 90 seconds normally, but when I tried a code path that should have been much faster, I could let it run 50 times that long and it still wouldn’t finish.

At home, I ran it again and left it running while I watched the Thunder beat the Spurs, assuming the program would finish eventually, so I could see the log file (which we’re not flushing often enough, which is another problem). My MacBook Pro ran so hard that the fan compelled my son to ask me why my laptop was suddenly so loud. I was wishing the whole time, “I wish this thing would fail faster.” And there it is.

When you know your code is destined to fail, you want it to fail faster. Debugging is hard enough as it is, without your stupid code forcing you to wait an hour just to see your log file, so you might gain an idea of what you need to go fix. If I could fail faster, I could fix my problem earlier, get more work done, and ship my improvements sooner.

But how does that relate to wanting my business idea to fail faster? Well, imagine that a given business idea is in fact destined to fail. When would you rather find out? (a) In a week, before you invest millions of dollars and thousands of hours investing into the idea? Or (b) In a year, after you’ve invested millions of dollars and thousands of hours?

I’ll take option (a) a million times out of a million. It’s like asking if I’d like a crystal ball. Um, yes.

The operative principle here is “destined to fail.” When I’m fixing a reported bug, I know that once I create reproducible test case for that bug, my software will fail. It is destined to fail on that test case. So, of course, I want for my process of creating the reproducible test case, my software build process, and my program execution itself to all happen as fast as possible. Even better, I wish I had come up with the reproducible test case a year or two ago, so I wouldn’t be under so much pressure now. Because seeing the failure earlier—failing fast—will help me improve my product earlier.

But back to that business idea... Why would you want a business idea to fail fast? Why would you want it to fail at all? Well, of course, you don’t want it to fail, but it doesn’t matter what you want. What if it is destined to fail? It’s really important for you to know that. So how can you know?

Here’s a little trick I can teach you. Your business idea is destined to fail. It is. No matter how awesome your idea is, if you implement your current vision of some non-trivial business idea that will take you, say, a month or more to implement, not refining or evolving your original idea at all, your idea will fail. It will. Seriously. If your brain won’t permit you to conceive of this as a possibility, then your brain is actually increasing the probability that your idea will fail.

You need to figure out what will make your idea fail. If you can’t find it, then find smart people who can. Then, don’t fear it. Don’t try to pretend that it’s not there. Don’t work for a year on the easy parts of your idea, delaying the inevitable hard stuff, hoping and praying that the hard stuff will work its way out. Attack that hard stuff first. That takes courage, but you need to do it.

Find your worst bottleneck, and make it your highest priority. If you cannot solve your idea’s worst problem, then get a new idea. You’ll do yourself a favor by killing a bad idea before it kills you. If you solve your worst problem, then find the next one. Iterate. Shorter iterations are better. You’re done when you’ve proven that your idea actually works. In reality. And then, because life keeps moving, you have to keep iterating.

That’s what fail fast means. It’s about shortening your feedback loop. It’s about learning the most you can about the most important things you need to know, as soon as possible.

So, when I wish you fail fast, it’s a blessing; not a curse.

Loss Aversion and the Setting of DB_BLOCK_CHECKSUM

Mon, 2016-03-07 14:16
Within Accenture Enkitec Group, we have recently been discussing the Oracle db_block_checksum parameter and how difficult it is to get clients to set it to a safer setting.

Clients are always concerned about the performance impact of features like this. Several years ago, I met a lot of people who had—in response to some expensive advice with which I strongly disagreed—turned off redo logging with an underscore parameter. The performance they would get from doing this would set the expectation level in their mind, which would cause them to resist (strenuously!) any notion of switching this [now horribly expensive] logging back on. Of course, it makes you wish that it had never even been a parameter.

I believe that the right analysis is to think clearly about risk. Risk is a non-technical word in most people’s minds, but in finance courses they teach that risk is quantifiable as a probability distribution. For example, you can calculate the probability that a disk will go bad in your system today. For disks, it’s not too difficult, because vendors do those calculations (MTTF) for us. But the probability that you’ll wish you had set db_block_checksum=full yesterday is probably more difficult to compute.

From a psychology perspective, customers would be happier if their systems had db_block_checksum set to full or typical to begin with. Then in response to the question,
“Would you like to remove your safety net in exchange for going between 1% and 10% faster? Here’s the horror you might face if you do it...”...I’d wager that most people would say no, thank you. They will react emotionally to the idea of their safety net being taken away.

But with the baseline of its being turned off to begin with, the question is,
“Would you like to install a safety net in exchange for slowing your system down between 1% and 10%? Here’s the horror you might face if you don’t...”...I’d wager that most people would answer no, thank you, even though this verdict that is opposite to the one I predicted above. They will react emotionally to the idea of their performance being taken away.

Most people have a strong propensity toward loss aversion. They tend to prefer avoiding losses over acquiring gains. If they already have a safety net, they won’t want to lose it. If they don’t have the safety net they need, they’ll feel averse to losing performance to get one. It ends up being a problem more about psychology than technology.

The only tools I know to help people make the right decision are:
  1. Talk to good salespeople about how they overcome the psychology issue. They have to deal with it every day.
  2. Give concrete evidence. Compute the probabilities. Tell the stories of how bad it is to have insufficient protection. Explain that any software feature that provides a benefit is going to cost some system capacity (just like a new report, for example), and that this safety feature is worth the cost. Make sure that when you size systems, you include the incremental capacity cost of switching to db_block_checksum=full.
My teammates get it, of course, because they’ve lived the stories, over and over again, in their roles on the corruption team at Oracle Support. You can get it, too, without leaving your keyboard. If you want to see a fantastic and absolutely horrifying short story about what happens if you do not use Oracle’s db_block_checksum feature properly, read David Loinaz’s article now.

When you read David’s article, you are going to see heavy quoting of my post here in his intro. He did that with my full support. (He wrote his article when my article here wasn’t an article yet.) If you feel like you’ve read it before, just keep reading. You really, really need to see what David has written, beginning with the question:
If I’ve never faced a corruption, and I have good backup strategy, my disks are mirrored, and I have a great database backup strategy, then why do I need to set these kinds of parameters that will impact my performance?Enjoy.

The “Two Spaces After a Period” Thing

Fri, 2016-01-08 14:21
Once upon a time, I told my friend Chet Justice why he should start using one space instead of two after a sentence-ending period. I’m glad I did.

Here’s the story.

When you type, you’re inputting data into a machine. I know you like feeling like you’re in charge, but really you’re not in charge of all the rules you have to follow while you’re inputting your data. Other people—like the designers of the machine you’re using—have made certain rules that you have to live by. For example, if you’re using a QWERTY keyboard, then the ‘A’ key is in a certain location on the keyboard, and whether it makes any sense to you or not, the ‘B’ key is way over there, not next to the ‘A’ key like you might have expected when you first started learning how to type. If you want a ‘B’ to appear in the input, then you have to reach over there and push the ‘B’ key on the keyboard.

In addition to the rules imposed upon you by the designers of the machine you’re using, you follow other rules, too. If you’re writing a computer program, then you have to follow the syntax rules of the language you’re using. There are alphabet and spelling and grammar rules for writing in German, and different ones for English. There are typographical rules for writing for The New Yorker, and different ones for the American Mathematical Society.

A lot of people who are over about 40 years old today learned to type on an actual typewriter. A typewriter is a machine that used rods and springs and other mechanical elements to press metal dies with backwards letter shapes engraved onto them through an inked ribbon onto a piece of paper. Some of the rules that governed the data input experience on typewriters included:
  • You had to learn where the keys were on the keyboard.
  • You had to learn how to physically return the carriage at the end of a line.
  • You had to learn your project’s rules of spelling.
  • You had to learn your project’s rules of grammar.
  • You had to learn your project’s rules of typography.
The first two rules listed here are physical, but the final three are syntactic and semantic. Just like you wouldn’t press the ‘A’ key to make a ‘B’, you wouldn’t use the strings “definately” or “we was” to make an English sentence.

On your typewriter, you might not have realized it, but you did adhere to some typography rules. They might have included:
  • Use two carriage returns after a paragraph.
  • Type two spaces after a sentence-ending period.
  • Type two spaces after a colon.
  • Use two consecutive hyphens to represent an em dash.
  • Make paragraphs no more than 80 characters wide.
  • Never use a carriage return between “Mr.” and the proper name that follows, or between a number and its unit.
The rules were different for different situations. For example, when I wrote a book back in the mid 1980s, one of the distinctive typography rules my publisher imposed upon me was:
  • Double-space all paragraph text.
They wanted their authors to do this so that their copyeditor had plenty of room for markup. Such typography rules can vary from one project to another.

Most people who didn’t write for different publishers got by just fine on the one set of typography rules they learned in high school. To them, it looked like there were only a few simple rules, and only one set of them. Most people had never even heard of a lot of the rules they should have been following, like rules about widows and orphans.

In the early 1980s, I began using computers for most of my work. I can remember learning how to use word processing programs like WordStar and Sprint. The rules were a lot more complicated with word processors. Now there were rules about “control keys” like ^X and ^Y, and there were no-break spaces and styles and leading and kerning and ligatures and all sorts of new things I had never had to think about before. A word processor was much more powerful than a typewriter. If you did it right, typesetting could could make your work look like a real book. But word processors revealed that typesetting was way more complicated than just typing.

Doing your own typesetting can be kind of like doing your own oil changes. Most people prefer to just put gas in the tank and not think too much about the esoteric features of their car (like their tires or their turn signal indicators). Most people who went from typewriters to word processors just wanted to type like they always had, using the good-old two or three rules of typography that had been long inserted into their brains by their high school teachers and then committed by decades of repetition.

Donald Knuth published The TeXBook in 1984. I think I bought it about ten minutes after it was published. Oh, I loved that book. Using TeX was my first real exposure to the world of actual professional-grade typography, and I have enjoyed thinking about typography ever since. I practice typography every day that I use Keynote or Pages or InDesign to do my work.

Many people don’t realize it, but when you type input into programs like Microsoft Word should follow typography rules including these:
  • Never enter a blank line (edit your paragraph’s style to manipulate its spacing).
  • Use a single space after a sentence-ending period (the typesetter software you’re using will make the amount of space look right as it composes the paragraph).
  • Use a non-breaking space after a non-sentence-ending period (so the typesetter software won’t break “Mr. Harkey” across lines).
  • Use a non-breaking space between a number and its unit (so the typesetter software won’t break “8 oz” across lines).
  • Use an en dash—not a hyphen—to specify ranges of numbers (like “3–8”).
  • Use an em dash—not a pair of hyphens—when you need an em dash (like in this sentence).
  • Use proper quotation marks, like “this” and ‘this’ (or even « this »).
Of course, you can choose to not follow these rules, just like you can choose to be willfully ignorant about spelling or grammar. But to a reader who has studied typography even just a little bit, seeing you break these rules feels the same as seeing a sentence like, “You was suppose to use apostrophe's.” It affects how people perceive you.

So, it’s always funny to me when people get into heated arguments on Facebook about using one space or two after a period. It’s the tiniest little tip of the typography iceberg, but it opens the conversation about typography, for which I’m glad. In these discussions, two questions come up repeatedly: “When did the rule change? Why?”

Well, the rule never did change. The next time I type on an actual typewriter, I will use two spaces after each sentence-ending period. I will also use two spaces when I create a Courier font court document or something that I want to look like it was created in the 1930s. But when I work on my book in Adobe InDesign, I’ll use one space. When I use my iPhone, I’ll tap in two spaces at the end of a sentence, because it automatically replaces them with a period and a single space. I adapt to the rules that govern the situation I’m in.

It’s not that the rules have changed. It’s that the set of rules was always a lot bigger than most people ever knew.

What I Wanted to Tell Terry Bradshaw

Thu, 2015-10-01 17:23
I met Terry Bradshaw one time. It was about ten years ago, in front of a movie theater near where I live.

When I was little, Terry Bradshaw was my enemy because, unforgivably to a young boy, he and his Pittsburgh Steelers kept beating my beloved Dallas Cowboys in Super Bowls. As I grew up, though, his personality on TV talk shows won me over, and I enjoy watching him to this day on Fox NFL Sunday. After learning a little bit about his life, I’ve grown to really admire and respect him.

I had heard that he owned a ranch not too far from where I live, and so I had it in mind that inevitably I would meet him someday, and I would say thank you. One day I had that chance.

I completely blew it.

My wife and I saw him there at the theater one day, standing by himself not far from us. It seemed like if I were to walk over and say hi, maybe it wouldn’t bother him. So I walked over, a little bit nervous. I shook his hand, and I said, “Mr. Bradshaw, hi, my name is Cary.” I would then say this:

I was a big Roger Staubach fan growing up. I watched Cowboys vs. Steelers like I was watching Good vs. Evil.

But as I’ve grown up, I have gained the deepest admiration and respect for you. You were a tremendous competitor, and you’re one of my favorite people to see on TV. Every time I see you, you bring a smile to my face. You’ve brought joy to a lot of people.

I just wanted to say thank you.
Yep, that’s what I would say to Terry Bradshaw if I got the chance. But that’s not how it would turn out. How it actually went was like this, …my big chance:

Me: I was a big Roger Staubach fan growing up.
TB: Hey, so was I!
Me: (stunned)
TB: (turns away)
The End
I was heartbroken. It bothers me still today. If you know Terry Bradshaw or someone who does, I wish you would please let him know. It would mean a lot to me.

…I did learn something that day about the elevator pitch.

The Fundamental Challenge of Computer System Performance

Thu, 2015-09-17 11:46
The fundamental challenge of computer system performance is for your system to have enough power to handle the work you ask it to do. It sounds really simple, but helping people meet this challenge has been the point of my whole career. It has kept me busy for 26 years, and there’s no end in sight.
Capacity and WorkloadOur challenge is the relationship between a computer’s capacity and its workload. I think of capacity as an empty box representing a machine’s ability to do work over time. Workload is the work your computer does, in the form of programs that it runs for you, executed over time. Workload is the content that can fill the capacity box.

Capacity Is the One You Can Control, Right?When the workload gets too close to filling the box, what do you do? Most people’s instinctive reaction is that, well, we need a bigger box. Slow system? Just add power. It sounds so simple, especially since—as “everyone knows”—computers get faster and cheaper every year. We call that the KIWI response: kill it with iron.
KIWI... Why Not?As welcome as KIWI may feel, KIWI is expensive, and it doesn’t always work. Maybe you don’t have the budget right now to upgrade to a new machine. Upgrades cost more than just the hardware itself: there’s the time and money it takes to set it up, test it, and migrate your applications to it. Your software may cost more to run on faster hardware. What if your system is already the biggest and fastest one they make?

And as weird as it may sound, upgrading to a more powerful computer doesn’t always make your programs run faster. There are classes of performance problems that adding capacity never solves. (Yes, it is possible to predict when that will happen.) KIWI is not always a viable answer.
So, What Can You Do?Performance is not just about capacity. Though many people overlook them, there are solutions on the workload side of the ledger, too. What if you could make workload smaller without compromising the value of your system?
It is usually possible to make a computer produce all of the useful results that you need without having to do as much work.You might be able to make a system run faster by making its capacity box bigger. But you might also make it run faster by trimming down that big red workload inside your existing box. If you only trim off the wasteful stuff, then nobody gets hurt, and you’ll have winning all around.

So, how might one go about doing that?
Workload“Workload” is a conjunction of two words. It is useful to think about those two words separately.

The amount of work your system does for a given program execution is determined mostly by how that program is written. A lot of programs make their systems do more work than they should. Your load, on the other hand—the number of program executions people request—is determined mostly by your users. Users can waste system capacity, too; for example, by running reports that nobody ever reads.

Both work and load are variables that, with skill, you can manipulate to your benefit. You do it by improving the code in your programs (reducing work), or by improving your business processes (reducing load). I like workload optimizations because they usually save money and work better than capacity increases. Workload optimization can seem like magic.
The Anatomy of PerformanceThis simple equation explains why a program consumes the time it does:
r = cl        or        response time = call count × call latencyThink of a call as a computer instruction. Call count, then, is the number of instructions that your system executes when you run a program, and call latency is how long each instruction takes. How long you wait for your answer, then—your response time—is the product of your call count and your call latency.

Some fine print: It’s really a little more complicated than this, but actually not that much. Most response times are composed of many different types of calls, all of which have different latencies (we see these in program execution profiles), so the real equation looks like r = c1l1 + c2l2 + ... + cnln. But we’ll be fine with r = cl for this article.

Call count depends on two things: how the code is written, and how often people run that code.
  • How the code is written (work) — If you were programming a robot to shop for you at the grocery store, you could program it to make one trip from home for each item you purchase. Go get bacon. Come home. Go get milk... It would probably be dumb if you did it that way, because the duration of your shopping experience would be dominated by the execution of clearly unnecessary travel instructions, but you’d be surprised at how often people write programs that act like this.
  • How often people run that code (load) — If you wanted your grocery store robot to buy 42 things for you, it would have to execute more instructions than if you wanted to buy only 7. If you found yourself repeatedly discarding spoiled, unused food, you might be able to reduce the number of things you shop for without compromising anything you really need.
Call latency is influenced by two types of delays: queueing delays and coherency delays.
  • Queueing delays — Whenever you request a resource that is already busy servicing other requests, you wait in line. That’s a queueing delay. It’s what happens when your robot tries to drive to the grocery store, but all the roads are clogged with robots that are going to the store to buy one item at a time. Driving to the store takes only 7 minutes, but waiting in traffic costs you another 13 minutes. The more work your robot does, the greater its chances of being delayed by queueing, and the more such delays your robot will inflict upon others as well.
  • Coherency delays — You endure a coherency delay whenever a resource you are using needs to communicate or coordinate with another resource. For example, if your robot’s cashier at the store has to talk with a specific manager or other cashier (who might already be busy with a customer), the checkout process will take longer. The more times your robot goes to the store, the worse your wait will be, and everyone else’s, too.
The SecretThis r = cl thing sure looks like the equation for a line, but because of queueing and coherency delays, the value of l increases when c increases. This causes response time to act not like a line, but instead like a hyperbola.

Because our brains tend to conceive of our world as linear, nobody expects for everyone’s response times to get seven times worse when you’ve only added some new little bit of workload, but that’s the kind of thing that routinely happens with performance. ...And not just computer performance. Banks, highways, restaurants, amusement parks, and grocery-shopping robots all work the same way.

Response times are trememdously sensitive to your call counts, so the secret to great performance is to keep your call counts small. This principle is the basis for perhaps the best and most famous performance optimization advice ever rendered:
The First Rule of Program Optimization: Don’t do it.

The Second Rule of Program Optimization (for experts only!): Don’t do it yet.

The ProblemKeeping call counts small is really, really important. This makes being a vendor of information services difficult, because it is so easy for application users to make call counts grow. They can do it by running more programs, by adding more users, by adding new features or reports, or by even by just the routine process of adding more data every day.

Running your application with other applications on the same computer complicates the problem. What happens when all these application’ peak workloads overlap? It is a problem that Application Service Providers (ASPs), Software as a Service (SaaS) providers, and cloud computing providers must solve.
The SolutionThe solution is a process:
  1. Call counts are sacred. They can be difficult to forecast, so you have to measure them continually. Understand that. Hire people who understand it. Hire people who know how to measure and improve the efficiency of your application programs and the systems they reside on.
  2. Give your people time to fix inefficiencies in your code. An inexpensive code fix might return many times the benefit of an expensive hardware upgrade. If you have bought your software from a software vendor, work with them to make sure they are streamlining the code they ship you.
  3. Learn when to say no. Don’t add new features (especially new long-running programs like reports) that are inefficient, that make more calls than necessary. If your users are already creating as much workload as the system can handle, then start prioritizing which workload you will and won’t allow on your system during peak hours.
  4. If you are an information service provider, charge your customers for the amount of work your systems do for them. The economic incentive to build and buy more efficient programs works wonders.

Messed-Up App of the Day: Crux CCH-01W

Thu, 2015-08-20 17:26
Today’s Messed-Up App of the Day is the “Crux CCH-01W rear-view camera for select 2007-up Jeep Wrangler models.”

A rear-view camera is an especially good idea in the Jeep Wrangler, because it is very difficult to see behind the vehicle. The rear seat headrests, the wiper motor housing, the spare tire, and the center brake light all conspire to obstruct much of what little view the window had given you to begin with.

The view is so bad that it’s easy to, for example, accidentally demolish a mailbox.

I chose the Crux CCH-01W because it is purpose-built for our 2012 Jeep Wrangler. It snaps right into the license plate frame. I liked that. It had 4.5 out of 5.0 stars in four reviews at, my favorite place to buy stuff like this. I liked that, too.

But I do not like the Crux CCH-01W. I returned it because our Jeep will be safer without this camera than with it. Here’s the story.

My installation process was probably pretty normal. I had never done a project like this before, so it took me longer than it should have. Crux doesn’t include any installation instructions with the camera, which is a little frustrating, but I knew that from the reviews. There is a lot of help online, and Crutchfield helped as much as I needed. After all the work of installing it, it was a huge thrill when I first shifted into Reverse and—voilà!—a picture appeared in my dashboard.

However, that was where the happiness would end. When I tried to use the camera, I noticed right away that the red, yellow, and green grid lines that the camera superimposes upon its picture didn’t make any sense. The grid lines showed that I was going to collide with the vehicle on my left that clearly wasn’t in jeopardy (an inconvenient false negative), and they showed that I was all-clear on the right when in fact I was about to ram into my garage door facing (a dangerous false positive).

The problem is that the grid lines are offset about two feet to the left. Of course, this is because the camera is about two feet to the left of the vehicle’s centerline. It’s above the license plate, below the left-hand tail light.

So then, to use these grid lines, you have to shift them in your mind about two feet to the right. In your mind. There’s no way to adjust them on the screen. Since this camera is designed exclusively for the left-hand corner of a 2007-up Jeep Wrangler, shouldn’t the designers have adjusted the location of the grid lines to compensate?

So, let’s recap. The safety device I bought to relieve driver workload and improve safety will, unfortunately, increase driver workload and degrade safety.

That’s bad enough, but it doesn’t end there. There is a far worse problem than just the misalignment of the grid lines.

Here is a photo of a my little girl standing a few feet behind the Jeep, directly behind the right rear wheel:

And here is what the camera shows the driver while she is standing there:

No way am I keeping that camera on the vehicle.

It’s easy to understand why it happens. The camera, which has a 120° viewing angle, is located so far off the vehicle centerline that it creates a blind spot behind the right-hand corner of the vehicle and grid lines that don’t make sense.

The Crux CCH-01W is one of those products that seems like nobody who designed it ever actually had to use it. I think it should never have been released.

As I was shopping for this project, my son and a local professional installer advised me to buy a camera that mounted on the vehicle centerline instead of this one. I didn’t take their advice because the reviews for the CCH-01W were good, and the price was $170 less. Fortunately, Crutchfield has a generous return policy, and the center-mounting 170°-view replacement camera that I’ll install this weekend has arrived today.

I’ve learned a lot. The second installation will go much more quickly than the first.

I Wish I Sold More

Wed, 2015-07-29 18:26
I flew home yesterday from Karen’s memorial service in Jacksonville, on a connecting flight through Charlotte. When I landed in Charlotte, I walked with all my stuff from my JAX arrival gate (D7) to my DFW departure gate (B15). The walk was more stressful than usual because the airport was so crowded.

The moment I set my stuff down at B15, a passenger with expensive clothes and one of those permanent grins established eye contact, pointed his finger at me, and said, “Are you in First?”

Wai... Wha...?

I said, “No, platinum.” My first instinct was to explain that I had a right to occupy the space in which I was standing. It bothers me that this was my first instinct.

He dropped his pointing finger, and his eyes went no longer interested in me. The big grin diminished slightly.

Soon another guy walked up. Same story: the I’m-your-buddy-because-I’m-pointing-my-finger-at-you thing, and then, “First Class?” This time the answer was yes. “ALRIGHT! WHAT ROW ARE YOU IN?” Row two. “AGH,” like he’d been shot in the shoulder. He holstered his pointer finger, the cheery grin became vaguely menacing, and he resumed his stalking.

One guy who got the “First Class?” question just stared back. So, big-grin guy asked him again, “Are you in First Class?” No answer. Big-grin guy leaned in a little bit and looked him square in the eye. Still no answer. So he leaned back out, laughed uncomfortably, and said half under his breath, “Really?...”

I pieced it together watching this big, loud guy explain to his traveling companions so everybody could hear him, he just wanted to sit in Row 1 with his wife, but he had a seat in Row 2. And of course it will be so much easier to take care of it now than to wait and take care of it when everybody gets on the plane.

Of course.

This is the kind of guy who sells things to people. He has probably sold a lot of things to a lot of people. That’s probably why he and his wife have First Class tickets.

I’ll tell you, though, I had to battle against hoping he’d hit his head and fall down on the jet bridge (I battled coz it’s not nice to hope stuff like that). I would never have said something to him; I didn’t want to be Other Jackass to his Jackass. (Although people might have clapped if I had.)

So there’s this surge of emotions, none of them good, going on in my brain over stupid guy in the airport. Sales reps...

This is why Method R Corporation never had sales reps.

But that’s like saying I’ve seen bad aircraft engines before and so now in my airline, I never use aircraft engines. Alrighty then. In that case, I hope you like gliders. And, hey: gliders are fine if that makes you happy. But a glider can’t get me home from Florida. Or even take off by itself.

I wish I sold more Method R software. But never at the expense of being like the guy at the airport. It seems I’d rather perish than be that guy. This raises an interesting question: is my attitude on this topic just a luxury for me that cheats my family and my employees out of the financial rewards they really deserve? Or do I need to become that guy?

I think the answer is not A or B; it’s C.

There are also good sales people, people who sell a lot of things to a lot of people, who are nothing like the guy at the airport. People like Paul Kenny and the honorable, decent, considerate people I work with now at Accenture Enkitec Group who sell through serving others. There were good people selling software at Hotsos, too, but the circumstances of my departure in 2008 prevented me from working with them. (Yes, I do realize: my circumstances would not have prevented me from working with them if I had been more like the guy at the airport.)

This need for duality—needing both the person who makes the creations and the person who connects those creations to people who will pay for them—is probably the most essential of the founder’s dilemmas. These two people usually have to be two different people. And both need to be Good.

In both senses of the word.

My Friend Karen

Wed, 2015-07-29 12:54
My friend Karen Morton passed away on July 23, 2015 after a four-month battle against cancer. You can hear her voice here.

I met Karen Morton in February 2002. The day I met her, I knew she was awesome. She told me the story that, as a consultant, she had been doing something that was unheard-of. She guaranteed her clients that if she couldn’t make things on their systems go at least X much faster on her very first day, then they wouldn’t have to pay. She was a Give First person, even in her business. That is really hard to do. After she told me this story, I asked the obvious question. She smiled her big smile and told me that her clients had always paid her—cheerfully.

It was an honor when Karen joined my company just a little while later. She was the best teammate ever, and she delighted every customer she ever met. The times I got to work with Karen were bright spots in my life, during many of the most difficult years of my career. For me, she was a continual source of knowledge, inspiration, and courage.

This next part is for Karen’s family and friends outside of work. You know that she was smart, and you know she was successful. What you may not realize is how successful she was. Your girl was famous all over the world. She was literally one of the top experts on Earth at making computing systems run faster. She used her brilliant gift for explaining things through stories to become one of the most interesting and fun presenters in the Oracle world to go watch, and her attendance numbers proved it. Thousands of people all over the world know the name, the voice, and the face of your friend, your daughter, your sister, your spouse, your mom.

Everyone loved Karen’s stories. She and I told stories and talked about stories, it seems like, all the time we were together. Stories about how Oracle works, stories about helping people, stories about her college basketball career, stories about our kids and their sports, ...

My favorite stories of all—and my family’s too—were the stories about her younger brother Ted. These stories always started out with some middle-of-the-night phone call that Karen would describe in her most somber voice, with the Tennessee accent turned on full-bore: “Kar’n: This is your brother, Theodore LeROY.” Ted was Karen’s brother Teddy Lee when he wasn’t in trouble, so of course he was always Theodore LeROY in her stories. Every story Karen told was funny and kind.

We all wanted to have more time with Karen than we got, but she touched and warmed the lives of literally thousands of people. Karen Morton used her half-century here on Earth with us as well as anyone I’ve ever met. She did it right.

God bless you, Karen. I love you.

What happened to “when the application is fast enough to meet users’ requirements?”

Fri, 2015-02-27 15:00
On January 5, I received an email called “Video” from my friend and former employee Guđmundur Jósepsson from Iceland. His friends call him Gummi (rhymes with “who-me”). Gummi is the guy whose name is set in the ridiculous monospace font on page xxiv of Optimizing Oracle Performance, apparently because O’Reilly’s Linotype Birka font didn’t have the letter eth (đ) in it. Gummi once modestly teased me that this is what he is best known for. But I digress...

His email looked like this:

It’s a screen shot of frame 3:12 from my November 2014 video called “Why you need a profiler for Oracle.” At frame 3:12, I am answering the question of how you can know when you’re finished optimizing a given application function. Gummi’s question is, «Oi! What happened to “when the application is fast enough to meet users’ requirements?”»

Gummi noticed (the good ones will do that) that the video says something different than the thing he had heard me say for years. It’s a fair question. Why, in the video, have I said this new thing? It was not an accident.
When are you finished optimizing?The question in focus is, “When are you finished optimizing?” Since 2003, I have actually used three different answers:
When are you are finished optimizing?
  1. When the cost of call reduction and latency reduction exceeds the cost of the performance you’re getting today.
    Source: Optimizing Oracle Performance (2003) pages 302–304.
  2. When the application is fast enough to meet your users’ requirements.
    Source: I have taught this in various courses, conferences, and consulting calls since 1999 or so.
  3. When there are no unnecessary calls, and the calls that remain run at hardware speed.
    Source: “Why you need a profiler for Oracle” (2014) frames 2:51–3:20.
My motive behind answers A and B was the idea that optimizing beyond what your business needs can be wasteful. I created these answers to deter people from misdirecting time and money toward perfecting something when those resources might be better invested improving something else. This idea was important, and it still is.

So, then, where did C come from? I’ll begin with a picture. The following figure allows you to plot the response time for a single application function, whatever “given function” you’re looking at. You could draw a similar figure for every application function on your system (although I wouldn’t suggest it).

Somewhere on this response time axis for your given function is the function’s actual response time. I haven’t marked that response time’s location specifically, but I know it’s in the blue zone, because at the bottom of the blue zone is the special response time RT. This value RT is the function’s top speed on the hardware you own today. Your function can’t go faster than this without upgrading something.

It so happens that this top speed is the speed at which your function will run if and only if (i) it contains no unnecessary calls and (ii) the calls that remain run at hardware speed. ...Which, of course, is the idea behind this new answer C.
Where, exactly, is your “requirement”?Answer B (“When the application is fast enough to meet your users’ requirements”) requires that you know the users’ response time requirement for your function, so, next, let’s locate that value on our response time axis.

This is where the trouble begins. Most DBAs don’t know what their users’ response time requirements really are. Don’t despair, though; most users don’t either.

At banks, airlines, hospitals, telcos, and nuclear plants, you need strict service level agreements, so those businesses investment into quantifying them. But realize: quantifying all your functions’ response time requirements isn’t about a bunch of users sitting in a room arguing over which subjective speed limits sound the best. It’s about knowing your technological speed limits and understanding how close to those values your business needs to pay to be. It’s an expensive process. At some companies, it’s worth the effort; at most companies, it’s just not.

How about using, “well, nobody complains about it,” as all the evidence you need that a given function is meeting your users’ requirement? It’s how a lot of people do it. You might get away with doing it this way if your systems weren’t growing. But systems do grow. More data, more users, more application functions: these are all forms of growth, and you can probably measure every one of them happening where you’re sitting right now. All these forms of growth put you on a collision course with failing to meet your users’ response time requirements, whether you and your users know exactly what they are, or not.

In any event, if you don’t know exactly what your users’ response time requirements are, then you won’t be able to use “meets your users’ requirement” as your finish line that tells you when to stop optimizing. This very practical problem is the demise of answer B for most people.
Knowing your top speedEven if you do know exactly what your users’ requirements are, it’s not enough. You need to know something more.

Imagine for a minute that you do know your users’ response time requirement for a given function, and let’s say that it’s this: “95% of executions of this function must complete within 5 seconds.” Now imagine that this morning when you started looking at the function, it would typically run for 10 seconds in your Oracle SQL Developer worksheet, but now after spending an hour or so with it, you have it down to where it runs pretty much every time in just 4 seconds. So, you’ve eliminated 60% of the function’s response time. That’s a pretty good day’s work, right? The question is, are you done? Or do you keep going?

Here is the reason that answer C is so important. You cannot responsibly answer whether you’re done without knowing that function’s top speed. Even if you know how fast people want it to run, you can’t know whether you’re finished without knowing how fast it can run.

Why? Imagine that 85% of those 4 seconds are consumed by Oracle enqueue, or latch, or log file sync calls, or by hundreds of parse calls, or 3,214 network round-trips to return 3,214 rows. If any of these things is the case, then no, you’re absolutely not done yet. If you were to allow some ridiculous code path like that to survive on a production system, you’d be diminishing the whole system’s effectiveness for everybody (even people who are running functions other than the one you’re fixing).

Now, sure, if there’s something else on the system that has a higher priority than finishing the fix on this function, then you should jump to it. But you should at least leave this function on your to-do list. Your analysis of the higher priority function might even reveal that this function’s inefficiencies are causing the higher-priority functions problems. Such can be the nature of inefficient code under conditions of high load.

On the other hand, if your function is running in 4 seconds and (i) its profile shows no unnecessary calls, and (ii) the calls that remain are running at hardware speeds, then you’ve reached a milestone:
  1. if your code meets your users’ requirement, then you’re done;
  2. otherwise, either you’ll have to reimagine how to implement the function, or you’ll have to upgrade your hardware (or both).
There’s that “users’ requirement” thing again. You see why it has to be there, right?

Well, here’s what most people do. They get their functions’ response times reasonably close to their top speeds (which, with good people, isn’t usually as expensive as it sounds), and then they worry about requirements only if those requirements are so important that it’s worth a project to quantify them. A requirement is usually considered really important if it’s close to your top speed or if it’s really expensive when you violate a service level requirement.

This strategy works reasonably well.

It is interesting to note here that knowing a function’s top speed is actually more important than knowing your users’ requirements for that function. A lot of companies can work just fine not knowing their users’ requirements, but without knowing your top speeds, you really are in the dark. A second observation that I find particularly amusing is this: not only is your top speed more important to know, your top speed is actually easier to compute than your users’ requirement (…if you have a profiler, which was my point in the video).

Better and easier is a good combination.
Tomorrow is important, tooWhen are you are finished optimizing?
  1. When the cost of call reduction and latency reduction exceeds the cost of the performance you’re getting today.
  2. When the application is fast enough to meet your users’ requirements.
  3. When there are no unnecessary calls, and the calls that remain run at hardware speed.
Answer A is still a pretty strong answer. Notice that it actually maps closely to answer C. Answer C’s prescription for “no unnecessary calls” yields answer A’s goal of call reduction, and answer C’s prescription for “calls that remain run at hardware speed” yields answer A’s goal of latency reduction. So, in a way, C is a more action-oriented version of A, but A goes further to combat the perfectionism trap with its emphasis on the cost of action versus the cost of inaction.

One thing I’ve grown to dislike about answer A, though, is its emphasis on today in “…exceeds the cost of the performance you’re getting today.” After years of experience with the question of when optimization is complete, I think that answer A under-emphasizes the importance of tomorrow. Unplanned tomorrows can quickly become ugly todays, and as important as tomorrow is to businesses and the people who run them, it’s even more important to another community: database application developers.
Subjective goals are treacherous for developersMany developers have no way to test, today, the true production response time behavior of their code, which they won’t learn until tomorrow. ...And perhaps only until some remote, distant tomorrow.

Imagine you’re a developer using 100-row tables on your desktop to test code that will access 100,000,000,000-row tables on your production server. Or maybe you’re testing your code’s performance only in isolation from other workload. Both of these are problems; they’re procedural mistakes, but they are everyday real-life for many developers. When this is how you develop, telling you that “your users’ response time requirement is n seconds” accidentally implies that you are finished optimizing when your query finishes in less than n seconds on your no-load system of 100-row test tables.

If you are a developer writing high-risk code—and any code that will touch huge database segments in production is high-risk code—then of course you must aim for the “no unnecessary calls” part of the top speed target. And you must aim for the “and the calls that remain run at hardware speed” part, too, but you won’t be able to measure your progress against that goal until you have access to full data volumes and full user workloads.

Notice that to do both of these things, you must have access to full data volumes and full user workloads in your development environment. To build high-performance applications, you must do full data volume testing and full user workload testing in each of your functional development iterations.

This is where agile development methods yield a huge advantage: agile methods provide a project structure that encourages full performance testing for each new product function as it is developed. Contrast this with the terrible project planning approach of putting all your performance testing at the end of your project, when it’s too late to actually fix anything (if there’s even enough budget left over by then to do any testing at all). If you want a high-performance application with great performance diagnostics, then performance instrumentation should be an important part of your feedback for each development iteration of each new function you create.
My answerSo, when are you finished optimizing?
  1. When the cost of call reduction and latency reduction exceeds the cost of the performance you’re getting today.
  2. When the application is fast enough to meet your users’ requirements.
  3. When there are no unnecessary calls and the calls that remain run at hardware speed.
There is some merit in all three answers, but as Dave Ensor taught me inside Oracle many years ago, the correct answer is C. Answer A specifically restricts your scope of concern to today, which is especially dangerous for developers. Answer B permits you to promote horrifically bad code, unhindered, into production, where it can hurt the performance of every function on the system. Answers&nnbsp;A and B both presume that you know information that you probably don’t know and that you may not need to know. Answer C is my favorite answer because it is tells you exactly when you’re done, using units you can measure and that you should be measuring.

Answer C is usually a tougher standard than answer A or B, and when it’s not, it is the best possible standard you can meet without upgrading or redesigning something. In light of this “tougher standard” kind of talk, it is still important to understand that what is optimal from a software engineering perspective is not always optimal from a business perspective. The term optimized must ultimately be judged within the constraints of what the business chooses to pay for. In the spirit of answer A, you can still make the decision not to optimize all your code to the last picosecond of its potential. How perfect you make your code should be a business decision. That decision should be informed by facts, and these facts should include knowledge of your code’s top speed.

Thank you, Guđmundur Jósepsson, of Iceland, for your question. Thank you for waiting patiently for several weeks while I struggled putting these thoughts into words.

“How did you learn so much stuff about Oracle?”

Fri, 2014-02-28 22:35
In LinkedIn, a new connection asked me a very nice question. He asked, “I know this might sound stupid, but how did you learn so much stuff about Oracle. :)”

Good one. I like the presumption that I know a lot of stuff about Oracle. I suppose that I do, at least about some some aspects of it, although I often feel like I don’t know enough. It occurred to me that answering publicly might also be helpful to anyone trying to figure out how to prepare for a career. Here’s my answer.

I took a job with the young consulting division of Oracle Corporation in September 1989, about two weeks after the very first time I had heard the word “Oracle” used as the name of a company. My background had been mathematics and computer science in school. I had two post-graduate degrees: a Master of Science Computer Science with a focus on language design and compilers, and a Master of Business Administration with a focus in finance.

My first “career job” was as a software engineer, which I started before the MBA. I designed languages and wrote compilers to implement those languages. Yes, people actually pay good money for that, and it’s possibly still the most fun I’ve ever had at work. I wrote software in C, lex, and yacc, and I taught my colleagues how to do it, too. In particular, I spent a lot of time teaching my colleagues how to make their C code faster and more portable (so it would run on more computers than just one on which you wrote it).

Even though I loved my job, I didn’t see a lot of future in it. At least not in Colorado Springs in the late 1980s. So I took a year off to get the MBA at SMU in Dallas. I went for the MBA because I thought I needed to learn more about money and business. It was the most difficult academic year of my life, because I was not particularly connected to or even interested in most of the subject matter. I hated a lot of my classes, which made it difficult to do as well as I had been accustomed. But I kept grinding away, and finished my degree in the year it was supposed to take. Of course I learned many, many things that year that have been vital to my career.

A couple of weeks after I got my MBA, I went to work for Oracle in Dallas, with a salary that was 168% of what it had been as a compiler designer. My job was to visit Oracle customers and help them with their problems.

It took a while for me to get into a good rhythm at Oracle. My boss was sending me to these local customers that were having problems with the Oracle Financial Applications (the “Finapps,” as we usually called them, which would many years later become the E-Business Suite) on version 6.0.26 of the ORACLE database (it was all caps back then). At first, I couldn’t help them near as much as I had wanted to. It was frustrating.

That actually became my rhythm: week after week, I visited these people who were having horrific problems with ORACLE and the Finapps. The database in 1990, although it had some pretty big bugs, was still pretty good. It was the applications that caused most of the problems I saw. There were a lot of problems, both with the software and with how it was sold. My job was to fix the problems. Some of those problems were technical. Many were not.

A lot of the problems were performance; problems of the software running “too slowly.” I found those problems particularly interesting. For those, I had some experience and tools at my disposal. I knew a good bit about operating systems and compilers and profilers and linkers and debuggers and all that, and so learning about Oracle indexes and rollback segments (two good examples, continual sources of customer frustration) wasn’t that scary of a step for me.

I hadn’t learned anything about Oracle or relational databases in school, I learned about how the database worked at Oracle by reading the documentation, beginning with the excellent Oracle® Database Concepts. Oracle sped me along a bit with a couple of the standard DBA courses.

My real learning came from being in the field. The problems my customers had were immediately interesting by virtue of being important. The resources available to me for solving such problems back in the early 1990s were really just books, email, and the telephone. The Internet didn’t exist yet. (Can you imagine?) The Oracle books available back then, for the most part, were absolutely horrible. Just garbage. Just about the only thing they were good for was creating problems that you could bill lots of consulting hours to fix. The only thing that was left was email and the telephone.

The problem with email and telephones, however, is that there has to be someone on the other end. Fortunately, I had that. The people on the other end of my email and phone calls were my saviors and heroes. In my early Oracle years, those saviors and heroes included people like Darryl Presley, Laurel Jamtgaard, Tom Kemp, Charlene Feldkamp, David Ensor, Willis Ranney, Lyn Pratt, Lawrence To, Roderick Mañalac, Greg Doherty, Juan Loaiza, Bill Bridge, Brom Mahbod, Alex Ho, Jonathan Klein, Graham Wood, Mark Farnham (who didn’t even work for Oracle, but who could cheerfully introduce me to anyone I needed), Anjo Kolk, and Mogens Nørgaard. I could never repay these people, and many more, for what they did for me. ...In some cases, at all hours of the night.

So, how did I learn so much stuff about Oracle? It started by immersing myself into a universe where every working day I had to solve somebody’s real Oracle problems. Uncomfortable, but effective. I survived because I was persistent and because I had a great company behind me, filled with spectacularly intelligent people who loved helping each other. Could I have done that on my own, today, with the advent of the Internet and lots and lots of great and reliable books out there to draw upon? I doubt it. I sincerely do. But maybe if I were young again...

I tell my children, there’s only one place where money comes from: other people. Money comes only from other people. So many things in life are that way.

I’m a natural introvert. I naturally withdraw from group interactions whenever I don’t feel like I’m helping other people. Thankfully, my work and my family draw me out into the world. If you put me into a situation where I need to solve a technical problem that I can’t solve by myself, then I’ll seek help from the wonderful friends I’ve made.

I can never pay it back, but I can try to pay it forward.

(Oddly, as I’m writing this, I realize that I don’t take the same healthy approach to solving business problems. Perhaps it’s because I naturally assume that my friends would have fun helping solve a technical problem, but that solving a business problem would not be fun and therefore I would be imposing upon them if I were to ask for help solving one. I need to work on that.)

So, to my new LinkedIn friend, here’s my advice. Here’s what worked for me:
  • Educate yourself. Read, study, experiment. Educate yourself especially well in the fundamentals. So many people don’t. Being fantastic at the fundamentals is a competitive advantage, no matter what you do. If it’s Oracle you’re interested in learning about, that’s software, so learn about software: about operating systems, and C, and linkers, and profilers, and debuggers, .... Read the Oracle Database Concepts guide and all the other free Oracle documentation. Read every book there is by Tom Kyte and Christian Antognini and Jonathan Lewis and Tanel Põder and Kerry Osborne and Karen Morton and James Morle all the other great authors out there today. And read their blogs.
  • Find a way to hook yourself into a network of people that are willing and able to help you. You can do that online these days. You can earn your way into a community by doing things like asking thoughtful questions, treating people respectfully (even the ones who don’t treat you respectfully), and finding ways to teach others what you’ve learned. Write. Write what you know, for other people to use and improve. And for God’s sake, if you don’t know something, don’t act like you do. That just makes everyone think you’re an asshole, which isn’t helpful.
  • Immerse yourself into some real problems. Read Scuttle Your Ships Before Advancing if you don’t understand why. You can solve real problems online these days, too (e.g., StackExchange and even, although I think that it’s better to work on real live problems at real live customer sites. Stick with it. Fix things. Help people.
Help people.

That’s my advice.

NoSQL and Oracle, Sex and Marriage

Fri, 2013-04-05 06:00
At last week’s Dallas Oracle Users Group meeting, an Oracle DBA asked me, “With all the new database alternatives out there today, like all these open source NoSQL databases, would you recommend for us to learn some of those?”

I told him I had a theory about how these got so popular and that I wanted to share that before I answered his question.

My theory is this. Developers perceive Oracle as being too costly, time-consuming, and complex:
  • An Oracle Database costs a lot. If you don’t already have an Oracle license, it’s going to take time and money to get one. On the other hand, you can just install Mongo DB today.
  • Even if you have an Oracle site-wide license, the Oracle software is probably centrally controlled. To get an installation done, you’re probably going to have to negotiate, justify, write a proposal, fill out forms, know, supplicate yourself to—er, I mean negotiate with—your internal IT guys to get an Oracle Database installed. It’s a lot easier to just install MySQL yourself.
  • Oracle is too complicated. Even if you have a site license and someone who’s happy to install it for you, it’s so big and complicated and proprietary... The only way to run an Oracle Database is with SQL (a declarative language that is alien to many developers) executed through a thick, proprietary, possibly even difficult-to-install layer of software like Oracle Enterprise Manager, Oracle SQL Developer, or sqlplus. Isn’t there an open source database out there that you could just manage from your operating system command line?
When a developer is thinking about installing a database today because he need one to write his next feature, he wants something cheap, quick, and lightweight. None of those constraints really sounds like Oracle, does it?

So your Java developers install this NoSQL thing, because it’s easy, and then they write a bunch of application code on top of it. Maybe so much code that there’s no affordable way to turn back. Eventually, though, someone will accidentally crash a machine in the middle of something, and there’ll be a whole bunch of partway finished jobs that die. Out of all the rows that are supposed to be in the database, some will be there and some won’t, and so now your company will have to figure out how to delete the parts of those jobs that aren’t supposed to be there.

Because now everyone understands that this kind of thing will probably happen again, too, the exercise may well turn into a feature specification for various “eraser” functions for the application, which (I hope, anyway) will eventually lead to the team discovering the technical term transaction. A transaction is a unit of work that must be atomic, consistent, isolated, and durable (that’where this acronym ACID comes from). If your database doesn’t guarantee that every arbitrarily complex unit of work (every transaction) makes it either 100% into the database or not at all, then your developers have to write that feature themselves. That’s a big, tremendously complex feature. On an Oracle Database, the transaction is a fundamental right given automatically to every user on the system.

Let’s look at just that ‘I’ in ACID for a moment: isolation. How big a deal is transaction isolation? Imagine that your system has a query that runs from 1 pm to 2 pm. Imagine that it prints results to paper as it runs. Now suppose that at 1:30 pm, some user on the system updates two rows in your query’s base table: the table’s first row and its last row. At 1:30, the pre-update version of that first row has already been printed to paper (that happened shortly after 1 pm). The question is, what’s supposed to happen at 2 pm when it comes time to print the information for the final row? You should hope for the old value of that final row—the value as of 1 pm—to print out; otherwise, your report details won’t add up to your report totals. However, if your database doesn’t handle that transaction isolation feature for you automatically, then either you’ll have to lock the table when you run the report (creating an 30-minute-long performance problem for the person wanting to update the table at 1:30), or your query will have to make a snapshot of the table at 1 pm, which is going to require both a lot of extra code and that same lock I just described. On an Oracle Database, high-performance, non-locking read consistency is a standard feature.

And what about backups? Backups are intimately related to the read consistency problem, because backups are just really long queries that get persisted to some secondary storage device. Are you going to quiesce your whole database—freeze the whole system—for whatever duration is required to take a cold backup? That’s the simplest sounding approach, but if you’re going to try to run an actual business with this system, then shutting it down every day—taking down time—to back it up is a real operational problem. Anything fancier (for example, rolling downtime, quiescing parts of your database but not the whole thing) will add cost, time, and complexity. On an Oracle Database, high-performance online “hot” backups are a standard feature.

The thing is, your developers could write code to do transactions (read consistency and all) and incremental (“hot”) backups. Of course they could. Oh, and constraints, and triggers (don’t forget to remind them to handle the mutating table problem), and automatic query optimization, and more, ...but to write those features Really Really Well™, it would take them 30 years and a hundred of their smartest friends to help write it, test it, and fund it. Maybe that’s an exaggeration. Maybe it would take them only a couple years. But Oracle has already done all that for you, and they offer it at a cost that doesn’t seem as high once you understand what all is in there. (And of course, if you buy it on May 31, they’ll cut you a break.)

So I looked at the guy who asked me the question, and I told him, it’s kind of like getting married. When you think about getting married, you’re probably focused mostly on the sex. You’re probably not spending too much time thinking, “Oh, baby, this is the woman I want to be doing family budgets with in fifteen years.” But you need to be. You need to be thinking about the boring stuff like transactions and read consistency and backups and constraints and triggers and automatic query optimization when you select the database you’re going to marry.

Of course, my 15-year-old son was in the room when I said this. I think he probably took it the right way.

So my answer to the original question—“Should I learn some of these other technologies?”—is “Yes, absolutely,” for at least three reasons:
  • Maybe some development group down the hall is thinking of installing Mongo DB this week so they can get their next set of features implemented. If you know something about both Mongo DB and Oracle, you can help that development group and your managers make better informed decisions about that choice. Maybe Mongo DB is all they need. Maybe it’s not. You can help.
  • You’re going to learn a lot more than you expect when you learn another database technology, just like learning another natural language (like English, Spanish, etc.) teaches you things you didn’t expect to learn about your native language.
  • Finally, I encourage you to diversify your knowledge, if for no other reason than your own self-confidence. What if market factors conspire in such a manner that you find yourself competing for an Oracle-unrelated job? A track record of having learned at least two database technologies is proof to yourself that you’re not going to have that much of a problem learning your third.

And now, ...the video

Wed, 2013-04-03 16:57
First, I want to thank everyone who responded to my prior blog post and its accompanying survey, where I asked when video is better than a paper. As I mentioned already in the comment section for that blog post, the results were loud and clear: 53.9% of respondents indicated that they’d prefer reading a paper, and 46.1% indicated that they’d prefer watching a video. Basically a clean 50/50 split.

The comments suggested that people have a lower threshold for “polish” with a video than with a paper, so one of the ideas to which I’ve needed to modify my thinking is to just create decent videos and publish them without expending a lot of effort in editing.

But how?

Well, the idea came to me in the process of agreeing to perform a 1-hour session for my good friends and customers at The Pythian Group. Their education department asked me if I’d mind if they recorded my session, and I told them I would love if they would. They encouraged me to open it up the the public, which of course I thought (cue light bulb over head) was the best idea ever. So, really, it’s not a video as much as it’s a recorded performance.

I haven’t edited the session at all, so it’ll have rough spots and goofs and imperfect audio... But I’ve been eager ever since the day of the event (2013-02-15) to post it for you.

So here you go. For the “other 50%” who prefer watching a video, I present to you: “The Method R Profiling Ecosystem.” If you would like to follow along in the accompanying paper, you can get it here: “A First Look at Using Method R Workbench Software.”

When is Video Better?

Thu, 2012-11-15 16:59
Ok, I’m stuck, and I need your help.

At my company, we sell software tools that help Oracle application developers and database administrators see exactly where their code spends their users’ time. I want to publish better information at our web page that will allow people who are interested to learn more about our software, and that will allow people who don’t even realize we exist to discover what we have. My theory is that the more people who understand exactly what we have, the more customers we’ll get, and we have some evidence that bears that out.

I’ve gotten so much help from YouTube in various of my endeavors that I’ve formed the grand idea in my head:
We need more videos showing our products.The first one I made is the 1:13 video on YouTube called “Method R Tools v3.0: Getting Started.” I’m interested to see how effective it is. I think this content is perfect for the video format because the whole point is to show you how easy it is to get going. Saying it’s easy just isn’t near as fun or convincing as showing it’s easy.

The next thing I need to share with people is a great demonstration that Jeff Holt has helped me pull together, which shows off all the tools in our suite, how they interact and solve an interesting and important common problem. But this one can’t be a 1-minute video; it’s a story that will take a lot longer to tell than that. It’s probably a 10- to 15-minute story, if I had to guess.

Here’s where I’m stuck. Should I make a video? Or write up a blog post for it? The reason this is a difficult question is that making a video costs me about 4 hours per minute of output I can create. That will get better over time, as I practice and accumulate experience. But right now, videos cost me a lot of time. On the other hand, I can whip together a blog post with plenty of detail in a fraction of the time.

Where I need your help is to figure out how much benefit there is, really, to creating a video instead of just a write-up. Some things I have to consider:
  • Would a 15-minute video capture the attention of people who Do people glaze over (TL;DR) on longish printed case studies on which they’d gladly watch about 15 minutes of video?
  • Or do people just pass on the prospect of watching a 15-minute case study about a problem they might not even realize they have yet? I ask this, because I find myself not clicking on any video that I know will take longer than a minute or two, unless I believe it’s going to help me solve a difficult problem that I know I have right now.
So, if you don’t mind giving me a hand, I’ve created a survey at SurveyMonkey that asks just three simple questions that will help me determine which direction to go next. I’m eager to see what you say.

Thank you for your time.

Is “Can’t” Really the Word You Want?

Tue, 2012-08-28 12:16
My friend Chester (Chet Justice in real life; oraclenerd when he puts his cape on) tweeted this yesterday:
can’t lives on won’t street - i’m sure my son will hate me when he’s older for saying that all the time.

I like it. It reminds me of an article that I drafted a few months ago but hadn’t posted yet. Here it is...

When I ask for help sometimes, I find myself writing a sentence that ends with, “…but I can’t figure it out.” I try to catch myself when I do that, because can’t is not the correct word.

Here’s what can’t means. Imagine a line representing time, with the middle marking where you are at some “right now” instant in time. The leftward direction represents the past, and the rightward direction represents the future.

Can’t means that I am incapable of doing something at every point along this timeline: past, present, and future.

Now, of course, can’t is different from mustn’t—“must not”which means that you’re not supposed to try to do something, presumably because it’s bad for you. So I’m not talking about the can/may distinction that grammarians bring to your attention when you say, “Can I have a candy bar?” and then they say, “I don’t know, can you?” And then you have to say, “Ok, may I have a candy bar” to actually have candy bar. I digress.

Back to the timeline. There are other words you can use to describe specific parts of that timeline, and here is where it becomes more apparent that can’t is often just not the right word:
  • Didn’t, a contraction of “did not.” It means that you did not do something in the past. It doesn’t necessarily mean you couldn’t have, or that you were incapable of doing it; it’s just a simple statement of fact that you, in fact, did not.
  • Aren’t, a contraction of “are not.” It means that you are in fact not doing something right now. This is different from “don’t,” which often is used to state an intention about the future, too, as in “I don’t smoke,” or “I do not like them, Sam-I-am.”
  • Won’t, a contraction of “will not.” It means that you will not do something in the future. It doesn’t necessarily mean you couldn’t, or that you are going to be incapable of doing it; it’s just a simple statement of intention that you, right now, don’t plan to. That’s the funny thing about your future. Sometimes you decide to change your plans. That’s ok, but it means that sometimes when you say won’t, you’re going to be wrong.
Here’s how it all looks on the timeline:

So, when I ask for help and I almost say, “I can’t figure it out,” the truth is really only that “I didn’t figure it out.”


...Because, you see, I do not have complete knowledge about the future, so it is not correct for me to say that I will never figure it out. Maybe I will. However, I do have complete knowledge of the past (this one aspect of the past, anyway), and so it is correct to speak about that by saying that I didn’t figure it out, or that I haven’t figured it out.

Does it seem like I’m going through a lot of bother making such detailed distinctions about such simple words? It matters, though. The people around you are affected by what you say and write. Furthermore, you are affected by the the stuff you say and write. Not only do our thoughts affect our words, our words affect our thoughts. The way you say and write stuff changes how you think about stuff. So if you aspire to tell the truth (is that still a thing?), or if you just want to know more about the truth, then it’s important to get your words right.

Now, back to the timeline. Just because you haven’t done something doesn’t mean you can’t, which means that you never will. Can’t is an as-if-factual statement about your future. Careless use of the word can’t can hurt people when you say it about them. It can hurt you when you think it about yourself.

Listen for it; you’ll hear it all the time:
  • “Why can’t you sit still?!” If you ask the question more accurately, it kind of answers itself: “Why haven’t you sat still for the past hour and a half?” Well, dad, maybe it’s because I’m bored and, oh, maybe it’s because I’m five! Asking why you haven’t been sitting still reminds me that perhaps there’s something I can do to make it easier for both of us to work through the next five minutes, and that maybe asking you to sit still for the next whole hour is asking too much.
  • “You can’t run that fast.” Prove it. Just because you haven’t doesn’t mean you never will. Before 1954, people used to say you can’t run a four-minute mile, that nobody can. Today, over a thousand people have done it. Can you become the most commercially successful and critically acclaimed act in history of popular music if you can’t read music? Well, apparently you can: that’s what the Beatles did. I’m probably not going to, but understanding that it’s not impossible is more life-enriching than believing I can’t.
  • “You can’t be trusted.” Rough waters here, mate. If you’ve behaved in such a manner that someone would say this about you, then you’ve got a lot of work in front of you. But no. No matter what, so far, all you’ve demonstrated is that you have been untrustworthy in the past. A true statement about your past is not necessarily a true statement about your future. It’s all about the choices you make from this moment onward.
Lots of parents and teachers don’t like can’t for its de-motivational qualities. I agree: when you think you can’t, you most likely won’t, because you won’t even try. It’s Chet’s “WONT STREET”.

When you think clearly about its technical meaning, you can also see that it’s also a word that’s often just not true. I hate being wrong. So I try not to use the word can’t very often.