Cary Millsap

My web log for things I’m interested in, including design, software development, performance analysis, learning, and running a business.

An Organizational Constraint that Diminishes Software Quality

Thu, 2012-06-07 10:19
One of the biggest problems in software performance today occurs when the people who write software are different from the people who are required to solve the performance problems that their software causes. It works like this:
  1. Architects design a system and pass the specification off to the developers.
  2. The developers implement the specs the architects gave them, while the architects move on to design another system.
  3. When the developers are “done” with their phase, they pass the code off to the production operations team. The operators run the system the developers gave them, while the developers move on to write another system.
The process is an assembly line for software: architects specialize in architecture, developers specialize in development, and operators specialize in operating. It sounds like the principle of industrial efficiency taken to its logical conclusion in the software world.

In this waterfall project plan,
architects design systems they never see written,
and developers write systems they never see run.
Sound good? It sounds like how Henry Ford made a lot of money building cars... Isn’t that how they build roads and bridges? So why not?

With software, there’s a horrible problem with this approach. If you’ve ever had to manage a system that was built like this, you know exactly what it is.

The problem is the absence of a feedback loop between actually using the software and building it. It’s a feedback loop that people who design and build software need for their own professional development. Developers who never see their software run don’t learn enough about how to make their software run better. Likewise, architects who never see their systems run have the same problem, only it’s worse, because (1) their involvement is even more abstract, and (2) their feedback loops are even longer.

Who are the performance experts in most Oracle shops these days? Unfortunately, it’s most often the database administrators, not the database developers. It’s the people who operate a system who learn the most about the system’s design and implementation mistakes. That’s unfortunate, because the people who design and write a system have so much more influence over how a system performs than do the people who just operate it.

If you’re an architect or a developer who has never had to support your own software in production, then you’re probably making some of the same mistakes now that you were making five years ago, without even realizing they’re mistakes. On the other hand, if you’re a developer who has to maintain your own software while it’s being operated in production, you’re probably thinking about new ways to make your next software system easier to support.

So, why is software any different than automotive assembly, or roads and bridges? It’s because software design is a process of invention. Almost every time. When is the last time you ever built exactly the same software you built before? No matter how many libraries you’re able to reuse from previous projects, every system you design is different from any system you’ve ever built before. You don’t just stamp out the same stuff you did before.

Software is funny that way, because the cost of copying and distributing it is vanishingly small. When you make great software that everyone in the world needs, you write it once and ship it at practically zero cost to everyone who needs it. Cars and bridges don’t work that way. Mass production and distribution of cars and bridges requires significantly more resources. The thousands of people involved in copying and distributing cars and bridges don’t have to know how to invent or refine cars or bridges to do great work. But with software, since copying and distributing it is so cheap, almost all that’s left is the invention process. And that requires feedback, just like inventing cars and bridges did.

Don’t organize your software project teams so that they’re denied access to this vital feedback loop.

The String Puzzle

Thu, 2012-03-29 12:29
I gave my two boys an old puzzle to solve yesterday. I told them that I’d give them each $10 if they could solve it for me. It’s one of the ways we do the “allowance” thing around the house sometimes.

Here’s the puzzle. A piece of string is stretched tightly around the Earth along its equator. Imagine that this string along the equator forms a perfect circle, and imagine that to reach around that perfect circle, the string has to be exactly 25,000 miles long. Now imagine that you wanted to suspend this string 4 inches above the surface of the Earth, all the way around it. How much longer would the string have to be to do this?

Before you read any further, guess the answer. How much longer would the string have to be? A few inches? Several miles? What do you think?

Now, my older son Alex was more interested in the problem than I thought he would be. He knows the formula for computing the circumference of a circle as a function of its diameter, and he knew that raising the string 4 inches above the surface constituted a diameter change. So the kernel of a solution had begun to formulate in his head. And he had a calculator handy, which he loves to use.

We were at Chipotle for dinner. The rest of the family went in to order, and Alex waited in the truck to solve the problem “where he could have some peace and quiet.” He came into the restaurant in time to order, and he gave me a number that he had cooked up on his calculator in the truck. I had no idea whether it was correct or not (I haven’t worked the problem in many years), so I told him to explain to me how he got it.

When he explained to me what he had done, he pretty quickly discovered that he had made a unit conversion error. He had manipulated the ‘25,000’ and the ‘4’ as if they had been expressed in the same units, so his answer was wrong, but it sounded like conceptually he got what he needed to do to solve the problem. So I had him write it down. On a napkin, of course:

The first thing he did was draw a sphere (top center) and tell me that the diameter of this sphere is 25,000 miles divided by 3.14 (the approximation of π that they use at school). He started dividing that out on his calculator when I pulled the “Whoa, wait” thing where I asked him why he was dividing those two quantities, which caused him, grudgingly, to write out that C = 25,000 mi, that C = πd, and that therefore d = C/π. So I let him figure out that d ≈ 7,961 mi. There’s loss of precision there, because of the 3.14 approximation, and because there are lots of digits to the right of the decimal point after ‘7961’, but more about that later.

I told him to call the length of the original string C (for circumference) and to call the 4-inch suspension distance of the string h (for height), and then write me the formula for the length of the 4-inch high string, without worrying about any unit conversion issues. He got the formula pretty close on the first shot. He added 4 inches to the diameter of the circle instead of adding 4 inches to the radius (you can see the ‘4’ scratched out and replaced with an ‘8’ in the “8 in/63360 in” expression in the middle of the napkin). Where did the ‘63360’ come from, I asked? He explained that this is the number of inches in a mile (5,280 × 12). Good.

But I asked him to hold off on the unit conversion stuff until the very end. He wrote the correct formula for the length of the new string, which is [(C/π) + 2h]·π (bottom left). Then I let him run the formula out on his calculator. It came out to something bigger than exactly 25,000; I didn’t even look at what he got. This number he had produced minus 25,000 would be the answer we were looking for, but I knew there would be at least two problems with getting the answer this way:
  • The value of π is approximately 3.14, but it’s not exactly 3.14.
  • Whenever he had to transfer a precise number from one calculation to the next, I knew Alex was either rounding or truncating liberally.
So, I told him we were going to work this problem out completely symbolically, and only plug the numbers in at the very end. It turns out that doing the problem this way yields a very nice little surprise.
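
The two precision problems listed above are easy to reproduce. Here is a minimal Python sketch, assuming the school approximation of 3.14 and assuming the intermediate diameter is truncated to whole miles, as in the ‘7961’ figure. The function and variable names are mine for illustration; they are not from the napkin.

```python
import math

INCHES_PER_MILE = 5280 * 12  # 63,360


def extra_length_inches(c_miles, h_inches, pi=math.pi, truncate_diameter=False):
    """Numerically compute how much longer the raised string must be, in inches."""
    d = c_miles / pi                              # d = C/pi, in miles
    if truncate_diameter:
        d = math.floor(d)                         # assumption: drop the digits after '7961'
    new_d = d + 2 * h_inches / INCHES_PER_MILE    # the string rises h on both sides
    new_c = new_d * pi                            # [(C/pi) + 2h] * pi
    return (new_c - c_miles) * INCHES_PER_MILE


careful = extra_length_inches(25_000, 4)
school_pi = extra_length_inches(25_000, 4, pi=3.14)
truncated = extra_length_inches(25_000, 4, pi=3.14, truncate_diameter=True)

print(f"full-precision pi:       {careful:.3f} in")
print(f"pi = 3.14:               {school_pi:.3f} in")
print(f"pi = 3.14, d truncated:  {truncated:.3f} in")
```

Under these assumptions, the 3.14 run lands near 25.12 inches instead of 25.133, and the truncated run is off by more than two miles and even has the wrong sign, because 7,961 × 3.14 is less than 25,000.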

Here’s my half of the napkin:

I called the new string length cʹ and the old string length c. The answer to the puzzle is the value of cʹ − c.

The new circumference cʹ will be π times the new diameter, which is c/π + 2h, as Alex figured out. The second step distributes the π factor through the addition, resulting in cʹ − c = πc/π + 2πh − c. The πc/π term simplifies to just c, and it’s the final step where the magic happens: cʹ − c = c + 2πh − c reduces simply to cʹ − c = 2πh. The difference between the new string length and the old one is 2πh, which in our case (where h = 4 inches) is roughly 25.133 inches.
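
Written out step by step, with the same symbols (c for the old string length, cʹ for the new one, and h for the suspension height), the napkin algebra looks like this:

```latex
\begin{aligned}
c'     &= \pi\left(\frac{c}{\pi} + 2h\right) \\
c' - c &= \pi\left(\frac{c}{\pi} + 2h\right) - c \\
       &= \frac{\pi c}{\pi} + 2\pi h - c \\
       &= c + 2\pi h - c \\
       &= 2\pi h
\end{aligned}
```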

So, problem solved. The string will have to be about 25.133 inches longer if we want to suspend it 4 inches above the surface.

Notice how simple the solution is: the only error we have to worry about is how precisely we want to represent π in our calculation.

Here’s the even cooler part, though: there is no ‘c’ in the formula for the answer. Did you notice that? What does that mean?

It means that the original circumference doesn’t matter. It means that if we have a string around the Moon that we want to raise 4 inches off the surface, we just need another 25.133 inches. How about a string stretched around Jupiter? Just 25.133 more inches. Betelgeuse, a star whose diameter is about the same size as Jupiter’s orbit? Just 25.133 more inches. The whole solar system? Just 25.133 more inches. The entire Milky Way galaxy? Just 25.133 more inches. A golf ball? Again, 25.133 more inches. A single electron? Still 25.133 inches.
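
Here is a quick numerical check of that claim, as a sketch. The circumferences below are rough, illustrative figures that I have assumed (only the Earth’s 25,000 miles comes from the puzzle), and the helper name is hypothetical.

```python
import math

INCHES_PER_MILE = 5280 * 12


def extra_length(c_inches, h_inches):
    """Length added when a string of circumference c is raised by h (both in inches)."""
    r = c_inches / (2 * math.pi)                  # original radius
    return 2 * math.pi * (r + h_inches) - c_inches


# Rough, assumed circumferences in inches; only the Earth figure is from the puzzle.
circumferences = {
    "golf ball": 5.3,
    "Moon": 6_800 * INCHES_PER_MILE,
    "Earth": 25_000 * INCHES_PER_MILE,
    "Jupiter": 280_000 * INCHES_PER_MILE,
}

h = 4.0
results = {name: extra_length(c, h) for name, c in circumferences.items()}
for name, extra in results.items():
    print(f"{name:>9}: {extra:.3f} in  (2*pi*h = {2 * math.pi * h:.3f} in)")
```

Every row should agree with 2πh to well within a thousandth of an inch. For something as large as a galaxy, ordinary double-precision arithmetic would lose those 25 inches in rounding error, which is one more reason to prefer the symbolic answer.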

This is the kind of insight that solving a problem symbolically provides. A numerical solution tends to answer a question and halt the conversation. A symbolic formula answers our question and invites us to ask more.

The calculator answer is just a fish (pardon the analogy, but a potentially tainted fish at that). The symbolic answer is a fishing pole with a stock pond.

So, did I pay Alex for his answer? No. Giving two or three different answers doesn’t close the deal, even if one of the answers is correct. He doesn’t get paid for blurting out possible answers. He doesn’t even get paid for answering the question correctly; he gets paid for convincing me that he has created a correct answer. In the professional world, that is the key: the convincing.

Imagine that a consultant or a salesman told you that you needed to execute a $250,000 procedure to make your computer application run faster. Would you do it? Under what circumstances? If you just trusted him and did it, but it didn’t do what you had hoped, would you ever trust him again? I would argue that you shouldn’t trust an answer without a compelling rationale, and that the recommender’s reputation alone is not a compelling rationale.

The deal is, whenever Alex can show me the right answer and convince me that he’s done the problem correctly, that’s when I’ll give him the $10. I’m guessing it’ll happen within the next three days or so. The interesting bet is going to be whether his little brother beats him to it.

Gwen Shapira on SSD

Sun, 2011-12-04 06:13
If you haven’t seen Gwen Shapira’s article about de-confusing SSD, I recommend that you read it soon.

One statement stood out as an idea on which I wanted to comment:
“If you don’t see significant number of physical reads and sequential read wait events in your AWR report, you won’t notice much performance improvements from using SSD.”

I wanted to remind you that you can do better. If you do notice a significant number of physical reads and sequential read wait events in your AWR report, then it’s still not certain that SSD will improve the performance of the task whose performance you’re hoping to improve. You don’t have to guess about the effect that SSD will have upon any business task you care about. In 2009, I wrote a blog post that explains.

I Can Help You Trace It

Fri, 2011-11-18 22:59
The first product I ever created after leaving Oracle Corporation in 1999 was a 3-day course about optimizing Oracle performance. The experiences of teaching this course from 2000 through 2003 (heavily revising the material each time I taught it) added up to the knowledge that Jeff Holt and I needed to write Optimizing Oracle Performance (2003).

Between 2000 and 2006, I spent many weeks on the road teaching this 3-day course. I stopped teaching it in 2006. An opportunity to take or teach a course ought to be a joyous experience, and this one had become too much of a grind. I didn’t figure out how to fix it until this year. How I fixed it is the story I’d like to tell you.

The Problem

The problem was simply inefficiency. The inefficiency began with the structure of the course, the 3-day lecture marathon. Realize, 6 × 3 = 18 hours of sitting in a chair, listening attentively to a single voice (my voice) is the equivalent of a 6-week university term of a 3-credit-hour course, taught straight through in three days. No hour-plus homework assignment after each hour of lecture to reinforce the lessons; but a full semester’s worth of listening to one voice, straight through, for three days. What retention rate would you expect from a university course compressed into just 3 days?

So, I optimized. I have created a new course that lasts one day (not even an exhausting full day at that). But how can a student possibly learn as much in 1 day as we used to teach in 3 days? Isn’t a 1-day event bound to be a significantly reduced-value experience?

On the contrary, I believe our students benefit even more now than they used to. Here are the big differences, so you can see why.

The Time Savings

In the 3-day course, I would spend half a day explaining why people should abandon their old system-wide-ratio-based ways of managing system performance. In the new 1-day course, I spend less than an hour explaining the Method R approach to thinking about performance. The point of the new course is not to convince people to abandon anything they’re already doing; it’s to show students the tremendous additional opportunities that are available to them if they’ll just look at what Oracle trace files have to offer. Time savings: 2 hours.

In the 3-day course, I would spend a full day explaining how to interpret trace data. By hand. These were a few little lab exercises, about an hour’s worth. Students would enter dozens of numbers from trace files into laptops or pocket calculators and write results on worksheets. In the new 1-day course, the software tools that a student needs to interpret files of any size—or even directories full of files—are included in the price of the course. Time savings: 5 hours.

In the 3-day course, I would spend half a day explaining how to collect trace data. In the new 1-day course, the software tools that a student needs to get started collecting trace files are included in the price of the course. For software architectures that require more work than our software can do for you, there’s detailed instruction in the course book. Time savings: 3 hours.

In the 3-day course, I would spend half a day working through about five example cases using a software tool to which students would have access for 30 days after they had gone home. In the new 1-day course, I spend one hour working through about eight example cases using software tools that every student will take home and keep forever. I can spend less time per case yet teach more because the cases are thoroughly documented in the course book. So, in class, we focus on the high-level decision making instead of the gnarly technical details you’ll want to look up later anyway. Time savings: 3 hours.

...That’s 13 classroom hours we’ve eliminated from the old 3-day experience. I believe that in these 13 hours, I was teaching material that students weren’t retaining to begin with.

The Book

The next big difference: the book.

In the old 3-day course, I distributed two books: (1) the “Course Notebook,” which was a black and white listing of the course PowerPoint slides, and (2) a copy of Optimizing Oracle Performance (O’Reilly 2003). The O’Reilly book was great, because it contained a lot of detail that you would want to look up after the course. But of course it doesn’t contain any new knowledge we’ve learned since 2003. The Course Notebook, in my opinion, was never worth much to begin with. (In my opinion, no PowerPoint slide printout is worth much to begin with.)

The Mastering Oracle Trace Data (MOTD) book we give each student in my new 1-day course is a full-color, perfect-bound book that explains the course material and far more in deep detail. It is full-color for an important reason. It’s not gratuitous or decorative; it’s because I’ve been studying Edward Tufte. I use color throughout the book to communicate detailed, high-resolution information faster to your brain.

Color in the book helps to reduce student workload and deliver value long after a student has left the classroom. In this class, there is no collection of slide printouts like you’ve archived after every Oracle class you’ve been to since the 1980s. The MOTD book is way better than any other material I’ve ever distributed in my career. I’ve heard students tell their friends that you have to see it to believe it.

“A paper record tells your audience that you are serious, responsible, exact, credible. For deep analysis of evidence and reasoning about complex matters, permanent high-resolution displays [that is, paper] are an excellent start.” —Edward Tufte

The Software

So, where does a student recoup all the time we used to spend going through trace files, and studying how to collect trace data on half a dozen different software architectures? In the thousands of man-hours we’ve invested into the software that we give you when you come to the course. Instead of explaining every little detail about quirks in Oracle trace data that change between Oracle versions 10.1 and 10.2 and 11.2, the software does the work for you. Instead of having to explain all the detail work, we have time to explain how to use the results of our software to make decisions about your data.

What’s the catch? Of course, we hope you’ll love our software and want to buy it. The software we give you is completely full-featured and yours to keep forever, but the license limits you to using it only with one login id, and it doesn’t include patches and upgrades, which we release a few times each year. We hope you’ll love our software so much that you’ll want to buy a license that lets you use it on any of your systems and that includes the right to upgrade as we fix bugs and add features. We hope you’ll love it so much that you encourage your colleagues to buy it.

But there’s really no catch. You get software and a course (and a book and a shirt) for less than the daily rate that we used to charge for just a course.

A Shirt?

MOTD London 2011-09-08: “I can help you trace it.”

Yes, a shirt. Each student receives a Method R T-shirt that says, “I can help you trace it.” We don’t give these things away to anyone except for students in my MOTD course. So if you see one, the person wearing it can, in actual fact, Help You Trace It.

The Net Result

The net result of all this optimization is benefits on several fronts:
  • The course costs a lot less than it used to. The fee is presently only about 25% of the 3-day course’s price, and the whole experience requires less than ⅓ of the time away from work that the original course did.
  • In the new course, our students don’t have to work so hard to make productive use of the course material. The book and the software take so much of the pressure off. We do talk about what the fields in raw trace data mean—I think it’s necessary to know that in order to use the data properly, and have productive debates with your sys/SAN/net/etc. administration colleagues. But we don’t spend your time doing exercises to untangle nested (recursive) calls by hand. The software you take home does that for you. That’s why it is so much easier for a student to put this course to work right away.
  • Since the course duration is only one day, I can visit far more cities and meet far more students each year. That’s good for students who want to participate, and it’s great for me, because I get to meet more people.

Plans

The only thing missing from our Mastering Oracle Trace Data course right now is you. I have taught the event now in Southlake, Texas (our home town), in Copenhagen, and in London. It’s field-tested and ready to roll. We have several cities on my schedule right now. I’ll be teaching the course in Birmingham UK on the day after UKOUG wraps up, December 8. I’ll be doing Orlando and Tampa in mid-December. I’ll teach two courses this coming January in Manhattan and Long Island. There’s Billund (Legoland) DK in April. We have more plans in the works for Seattle, Portland, Dallas, and Cleveland, and we’re looking for more opportunities.

Share the word by linking the official MOTD sticker.

My wish is for you to help me book more cities in North America and Europe (I hope to expand beyond that soon). If you are part of a company or a user group with colleagues who would be interested in attending the course, I would love to hear from you. Registering en masse saves you money. The magic number for discounting is 10 students on a single registration from one company or user group.

I can help you trace it.

Using Agile Practices to Create an Agile Presentation

Fri, 2011-06-17 13:25
What’s the best way to make a presentation on Agile practices? Practice Agile practices.

You could write a presentation “big bang” style, delivering version 1.0 in front of your big audience of 200+ people at Kscope 2011 before anybody has seen it. Of course, if you do it that way, you build a lot of risk into your product. But what else can you do?

You can execute the Agile practices of releasing early and often, allowing the reception of your product to guide its design. Whenever you find an aspect of your product that doesn’t get the enthusiastic reception you had hoped for, you fix it for the next release.

That’s one of the reasons that my release schedule for “My Case for Agile Methods” includes a little online webinar hosted by Red Gate Software next week. My release schedule is actually a lot more complicated than just one little pre-ODTUG webinar:

2011-04-15  Show key conceptual graphics to son (age 13)
2011-04-29  Review #1 of paper with employee #1
2011-05-18  Review #2 of paper with customer
2011-05-14  Review #3 of paper with employee #1
2011-05-18  Review #4 of paper with employee #2
2011-05-26  Review #5 of paper with employee #3
2011-06-01  Submit paper to ODTUG web site
2011-06-02  Review #6 of paper with employee #1
2011-06-06  Review #7 of paper with employee #3
2011-06-10  Submit revised paper to ODTUG web site
2011-06-13  Present “My Case for Agile Methods” to twelve people in an on-site customer meeting
2011-06-22  Present “My Case for Agile Methods” in an online webinar hosted by Red Gate Software
2011-06-27  Present “My Case for Agile Methods” at ODTUG Kscope 2011 in Long Beach, California

(By the way, the vast majority of the work here is done in Pages, not Keynote. I do my thinking in a word processor, not in an operating system for slide projectors.)

Two Agile practices are key to everything I’ve ever done well: incremental design and rapid iteration. Release early, release often, and incorporate what you learn from real world use back into the product. The magic comes from learning how to choose wisely in two dimensions:
  1. Which feature do you include next?
  2. To whom do you release next?
The key is to show your work to other people. Yes, there’s tremendous value in practicing a presentation, but practicing without an audience merely reinforces; it doesn’t inform. What you need while you design something is information. Specifically, you need the kind of information called feedback. Some of the feedback I receive generates some pretty energetic arguing. I need that to fortify my understanding of my own arguments so that I’ll be more likely to survive a good Q&A session on stage.

To lots of people who have seen teams run projects into the ground using what they call “Agile,” the word “Agile” has become a synonym for sloppy, irresponsible work habits. When you hear me talk about Agile, you’ll hear about practices that are highly disciplined and that actually require a lot of focus, dedication, commitment, practice, and plain old hard work to execute.

Agile, to me, is about injecting discipline into a process that is inevitably rife with unpredictable change.

Why KScope?

Fri, 2011-06-03 10:09
Early this year, my friend Mike Riley from ODTUG asked me to write a little essay in response to the question, “Why Kscope?” that he could post on the ODTUG blog. He agreed that cross-posting would help the group reach more people, so I’ve reproduced my response to that question here. I’ll hope to see you at Kscope11 in Long Beach June 26–30. If you develop applications for Oracle systems, you need to be there.

MR: Why KScope?

CM: Most people in the Oracle world who know my name probably think of me as a database administrator. In my heart, I am a software designer and developer. Before my career with Oracle, I worked in the semiconductor industry as a language designer. I wrote compilers for a living. Designing and writing software has always been my professional true love. I’ve never strayed too far away from it; I’ve always found a reason to write software, no matter what my job has been. [Ed: Examples include the Oracle*APS suite and a compiler design project he did for Great West Life in the 1990s, the queueing theory models he worked on in the late 1990s, the Method R Profiler software (Cary wrote all the XSLT code), and finally today, he spends about half of his time designing and writing the MR Tools suite.]

My career as an Oracle performance specialist is really a natural extension of my software development background. It is still really weird to me that in the Oracle market, performance is regarded as a job done primarily by operations people instead of by development people. Developers control at least 90% of the leverage over how fast an application will be able to run. I think that performance became a DBA responsibility in the formative years of our Oracle world because so many early Oracle projects had DBA teams but no professional development teams.

Most of those big projects were people implementing big off-the-shelf applications like Oracle Financial and Manufacturing Applications (which grew into the Oracle E-Business Suite). The only developers that most of those implementation teams had were what I would call nonprofessional developers. Now, I don’t mean people who were in any way unprofessional. I mean they were predominantly businesspeople who had never been educated as software developers, but who’d been told that of course anybody could write computer programs in this new “fourth-generation language” called SQL.

Just about any time you implement a vendor’s highly customizable new application with 20,000+ database objects underneath it, you’re going to run into performance problems. Someone had to attend to those problems, and the DBAs and sysadmins were the only technical people anywhere near the project who could do it. Those DBAs and Oracle sysadmins were also the people who organized the early Oracle conferences, and I think this is where the topic of “performance tuning” became embedded into the DBA track.

The resulting problem that I still see today is that the topic became dominated by “tips and techniques”—lists of tricks that operational people could try to maybe make their systems go a little bit faster. The word “tuning” says it all. I almost never use the word except facetiously, because it’s a cheap imitation of what systems really need, which is performance optimization, which is what designers and developers of software are supposed to do. Even the evolution of Oracle tools for the performance analyst mirrors this post-production tips-and-techniques “tuning” mentality. That’s why most performance management tools you see today are predominantly oriented toward viewing performance from a system resource perspective (the DBA’s perspective), rather than the code path perspective (the developer’s perspective).

The whole key to performance is the application design and development team, especially when you realize that the performance of an application is not just its code path speed, but its overall interaction with the person using it. So many of the performance problems that I’ve found are caused by applications that are just stupid in how they’re designed to interact with me. For example, if you’ve seen my “Messed-up apps” presentation before, you might remember the self-service bus ticket kiosk that made me wait for over a minute while the application tallied the more-than-2,000 different bus trips for which I might want to buy a ticket. That’s an app with a broken specification. There’s nothing that a run-time operations team can do to make that application any fun to use (short of sending it back for redesign).

My goal as a software designer is not just to make software that runs quickly. My goal is also to make applications that are delightful to use. It’s the difference between an application that you use because you must and one that feels like it’s a necessary part of who you are. Making software like that is the kind of thing that a designer learns from studying Don Norman, Edward Tufte, Christopher Alexander, and Jonathan Ive. It’s a level of performance that just isn’t on the menu for operational run-time support staff to even think about, because it’s beyond their control.

So: why Kscope? The ODTUG conferences are the best places I can go in the Oracle market where I can be with people who think and talk about these things. …Or for that matter, who understand that these ideas even exist and deserve to be studied. KScope is just the right place for me to be.

It’s Conference Season!

Mon, 2011-02-14 16:59
My favorite mode of life is being busy doing something that I enjoy and that I know, beyond a doubt, is the Right Thing to be doing. Any hour I get to spend in that zone is a precious gift.

I’ve been in that zone nearly continuously for the past three weeks. I’ve been doing two of my favorite things: lots of consulting work (helping, earning, and learning), and lots of software development work (which helps me help, earn, and learn even faster).

I’m looking forward to the next four weeks, too, because another Right Thing that I love to do is talk with people about software performance, and three of my favorite events where I can do that are coming right up:
  • RMOUG Training Days, Denver CO — I leave tomorrow. I’m looking forward to reuniting with lots of good friends. My stage time will be Wednesday, February 16th, when I’ll talk about material from my new “Mastering Performance with Extended SQL Trace” paper. 
  • NoCOUG Winter Conference, Pleasanton CA — I’ll be in the east Bay Area on Thursday, February 24th presenting the keynote address where I’ll discuss whether Exadata means never having to “tune” again and then spending two hours helping people to think clearly about performance.
  • Hotsos Symposium, Irving TX — I’ll present “Thinking Clearly about Performance” on Monday, March 7th. I love the agenda at this event. It’s a high quality lineup that is dedicated purely to Oracle software performance. This is one of the very few conferences where I can enjoy sitting and just watching for whole days at a time. If you are interested in Oracle system performance, do not miss this. 
Happy Valentine’s Day. I shall hope to see you soon.

Describing Performance Improvements (Beware of Ratios)

Fri, 2011-01-21 09:00
Recently, I received into my Spam folder an ad claiming that a product could “...improve performance 1000%.” Claims in that format have bugged me for a long time, at least as far back as the 1990s, when some of the most popular Oracle “tips & techniques” books of the era used that format a lot to state claims.

Beware of claims worded like that.

Whenever I see “...improve performance 1000%,” I have to do extra work to decode what the author has encoded in his tidy numerical package with a percent-sign bow. The two performance improvement formulas that make sense to me are these:
  1. Improvement = (b – a)/b, where b is the response time of the task before repair, and a is the response time of the task after repair. This formula expresses the proportion (or percentage, if you multiply by 100%) of the original response time that you have eliminated. It can’t be bigger than 1 (or 100%) without invoking reverse time travel.
  2. Improvement = b/a, where b and a are defined exactly as above. This formula expresses how many times faster the after response time is than the before one.
Since 1000% is bigger than 100%, it can’t have been calculated using formula #1. I assume, then, that when someone says “...improve performance 1000%,” he means that b/a = 10, which, expressed as a percentage, is 1000%. What I really want to know, though, is what were b and a? Were they 1000 and 1? 1 and .001? 6 and .4? (...In which case, I would have to search for a new formula #3.) Why won’t you tell me?
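As a minimal sketch, the two formulas can be written as a pair of small Python functions. The function names and the sample values of b and a are illustrative assumptions, not taken from any vendor's claim.

```python
def fraction_eliminated(before: float, after: float) -> float:
    """Formula #1: (b - a) / b, the share of the original response time removed."""
    return (before - after) / before


def times_faster(before: float, after: float) -> float:
    """Formula #2: b / a, how many times faster the task is after the repair."""
    return before / after


# Hypothetical measurements in seconds, chosen so that b / a = 10.
b, a = 10.0, 1.0
print(f"Formula #1: {fraction_eliminated(b, a):.0%} of the response time eliminated")
print(f"Formula #2: {times_faster(b, a):.0f}x faster")

# Formula #1 stays below 1 for any positive after-time, which is why a
# "1000%" claim cannot have come from it.
for before, after in [(1000.0, 1.0), (1.0, 0.001), (6.0, 0.4)]:
    assert fraction_eliminated(before, after) < 1.0
```

Both functions need the same two inputs, b and a, which is exactly the information the "1000%" claim leaves out.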

Any time you see a ‘%’ character, beware: you’re looking at a ratio. The principal benefit of ratios is also their biggest flaw. A ratio conceals its denominator. That, of course, is exactly what ratios are meant to do—it’s called normalization—but it’s not always good to normalize. Here’s an example. Imagine two SQL queries A and B that return the exact same result set. What’s better: query A, with a 90% hit ratio on the database buffer cache? or query B, with a 99% hit ratio?

Query   Cache hit ratio
A       90%
B       99%

As tempting as it might be to choose the query with the higher cache hit ratio, the correct answer is...
There’s not enough information given in the problem to answer. It could be either A or B, depending on information that has not yet been revealed.

Here’s why. Consider the two distinct situations listed below. Each situation matches the problem statement. For situation 1, the answer is: query B is better. But for situation 2, the answer is: query A is better, because it does far less overall work. Without knowing more about the situation than just the ratio, you can’t answer the question.

Situation 1
Query   Cache lookups   Cache hits   Cache hit ratio
A       100             90           90%
B       100             99           99%

Situation 2
Query   Cache lookups   Cache hits   Cache hit ratio
A       10              9            90%
B       100             99           99%

Because a ratio hides its denominator, it’s insufficient for explaining your performance results to people (unless your aim is intentionally to hide information, which I’ll suggest is not a sustainable success strategy). It is still useful to show a normalized measure of your result, and a ratio is good for that. I didn’t say you shouldn’t use them. I just said they’re insufficient. You need something more.
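Here is a small sketch of the two situations in Python. It uses only the lookup and hit counts from the tables above; the `CacheStats` class and its field names are my own illustrative assumptions. It prints the denominator and the miss count alongside the ratio, so you can see what the ratio alone conceals.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    lookups: int  # total cache lookups (the denominator the ratio hides)
    hits: int     # lookups satisfied from the cache

    @property
    def hit_ratio(self) -> float:
        return self.hits / self.lookups

    @property
    def misses(self) -> int:
        return self.lookups - self.hits


situation_1 = {"A": CacheStats(100, 90), "B": CacheStats(100, 99)}
situation_2 = {"A": CacheStats(10, 9), "B": CacheStats(100, 99)}

for name, situation in (("Situation 1", situation_1), ("Situation 2", situation_2)):
    print(name)
    for query, stats in situation.items():
        print(
            f"  {query}: ratio={stats.hit_ratio:.0%} "
            f"lookups={stats.lookups} misses={stats.misses}"
        )
```

Query A reports the same 90% in both situations, but in situation 2 it performs one tenth of the lookups that query B does and misses the cache no more often.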

The best way to think clearly about performance improvements is with the ratio as a parenthetical additional interesting bit of information, as in:
  • I improved response time of T from 10s to .1s (99% reduction).
  • I improved throughput of T from 42t/s to 420t/s (10-fold increase).
There are three critical pieces of information you need to include here: the before measurement (b), the after measurement (a), and the name of the task (here, T) that you made faster. I’ve talked about b and a before, but I’ve slipped this T thing in on you all of a sudden, haven’t I!

Even authors who give you b and a have a nasty habit of leaving off the T, which is far worse even than leaving off the before and after numbers, because it implies that using their magic has improved the performance of every task on the system by exactly the same proportion (either p% or n-fold), which is almost never true. That is because it’s rare for any two tasks on a given system to have “similar” response time profiles (defining similar in the proportional sense). For example, imagine the following quite dissimilar two profiles:

Task A
Response time   Resource
100%            Total
90%             CPU
10%             Disk I/O

Task B
Response time   Resource
100%            Total
90%             Disk I/O
10%             CPU

No single component upgrade can have equal performance improvement effects upon both these tasks. Making CPU processing 2× faster will speed up task A by 45% and task B by 5%. Likewise, making Disk I/O processing 10× faster will speed up task A by 9% and task B by 81%.
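The arithmetic behind those percentages can be sketched as follows. This assumes a simplified model in which each component's share of response time is strictly serial and an upgrade affects only that one component. The function name `time_saved` and the dictionary layout are illustrative assumptions.

```python
def time_saved(profile: dict, component: str, speedup: float) -> float:
    """Fraction of total response time eliminated when `component`
    becomes `speedup` times faster and nothing else changes."""
    share = profile.get(component, 0.0)
    return share * (1 - 1 / speedup)


# Response time profiles from the two tables above, as fractions of the total.
task_a = {"CPU": 0.90, "Disk I/O": 0.10}
task_b = {"CPU": 0.10, "Disk I/O": 0.90}

for label, component, speedup in (("2x CPU", "CPU", 2), ("10x Disk I/O", "Disk I/O", 10)):
    saved_a = time_saved(task_a, component, speedup)
    saved_b = time_saved(task_b, component, speedup)
    print(f"{label}: task A saves {saved_a:.0%}, task B saves {saved_b:.0%}")
```

For the disk upgrade on task B, the saving is 90% × (1 – 1/10) = 81% of the original response time. In neither upgrade do the two tasks improve by the same proportion.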

For a vendor to claim any noticeable, homogeneous improvement across the board on any computer system containing tasks A and B would be an outright lie.

An Axiomatic Approach to Algebra and Other Aspects of Life

Fri, 2011-01-14 22:50
Not many days pass that I don’t think a time or two about James R. Harkey. Mr. Harkey was my high school mathematics teacher. He taught me algebra, geometry, analytic geometry, trigonometry, and calculus. What I learned from Mr. Harkey influences—to this day—how I write, how I teach, how I plead a case, how I troubleshoot, .... These are the skills I’ve used to earn everything I own.

Prior to Mr. Harkey’s algebra class, algebra for me was just a morass of tricks to memorize: “Take the constant to the other side...”; “Cancel the common factors...”; “Flip the fraction and multiply...” I could practice for a while and then solve problems just like the ones I had been practicing, by applying memorized transformations to superficial patterns that I recognized, but I didn’t understand what I had been taught to do. Without continual practice, the rules I had memorized would evaporate, and then once more I’d be able to solve only those problems for which I could intuit the answer: “7x + 6 = 20” would have been easy, but “7/x – 6 = 20” would have stumped me. This made, for example, studying for final exams quite difficult.

On the first day of Mr. Harkey’s class, he gave us his rules. First, his strict rules of conduct in the classroom lived up to his quite sinister reputation, which was important. Our studies began with a single 8.5" × 14" sheet of paper that apparently he asked us to label “Properties A” (because that’s what I wrote in the upper right-hand corner; and yes, I still have it). He told us that we could consult this sheet of paper on every homework assignment and every exam he’d give. And here’s how we were to use it: every problem would be executed one step at a time; every step would be written down; and beside every step we would write the name of the rule from Properties A that we invoked to perform that step.

You can still hear us now: Holy cow, that’s going to be a lot of extra work.

Well, that’s how it was going to be. Here’s what each homework and test problem had to look like:

The first few days of class, we spent time reviewing every single item on Properties A. Mr. Harkey made sure we all agreed that each axiom and property was true before we moved on to the real work. He was filling our toolbox.

And then we worked problem after problem after problem.

Throughout the year, we did get to shift gears a few times. Not every ax + b = c problem required fourteen steps all year long. After some sequence of accomplishments (I don’t remember what it was—maybe some set number of ‘A’ grades on homework?), I remember being allowed to write the number of the rule instead of the whole name. (When did you first learn about foreign keys? ☺) Some accomplishments after that, we’d be allowed to combine steps like 3, 4 and 5 into one. But we had to demonstrate a pattern of consistent mastery to earn a privilege like that.

Mr. Harkey taught algebra as most teachers teach geometry or predicate logic. Every problem was a proof, documented one logical step at a time. In Mr. Harkey’s algebra class, your “answer” to a homework problem or test question wasn’t the number that x equals, it was the whole proof of how you arrived at the value of x in your answer. Mr. Harkey wasn’t interested in grading your answers. He was going to grade how you got your answers.
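As an illustration only, here is roughly what a problem worked in that style might look like for 7x + 6 = 20. This is not a reproduction of Mr. Harkey’s Properties A sheet. The property names are the conventional textbook ones, assumed here, and the step numbering is not meant to match the steps mentioned earlier.

```latex
\begin{align*}
7x + 6 &= 20 && \text{Given} \\
(7x + 6) + (-6) &= 20 + (-6) && \text{Addition property of equality} \\
7x + \bigl(6 + (-6)\bigr) &= 14 && \text{Associative property of addition; arithmetic} \\
7x + 0 &= 14 && \text{Additive inverse} \\
7x &= 14 && \text{Additive identity} \\
\tfrac{1}{7}(7x) &= \tfrac{1}{7}(14) && \text{Multiplication property of equality} \\
\bigl(\tfrac{1}{7} \cdot 7\bigr)x &= 2 && \text{Associative property of multiplication; arithmetic} \\
1 \cdot x &= 2 && \text{Multiplicative inverse} \\
x &= 2 && \text{Multiplicative identity}
\end{align*}
```

Every line is justified by a named rule, so an error in any one step can be located and corrected without starting over.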

The result? After a whole semester of this, I understood algebra, and I mean thoroughly. You couldn’t make a good grade in Mr. Harkey’s algebra class without creating an intimate comprehension of why algebra works the way it does. Learning that way supplies you for a whole lifetime: I still understand it. I can make dimensioned drawings of the things I’m going to build in my shop. I can calculate the tax implications of my business decisions. I can predict the response time behavior of computer software. I can even help my children with their algebra. Nothing about algebra scares me, because I still understand all the rules.

When I help my boys with their homework, I make them use Mr. Harkey’s axiomatic approach with my own Properties A that I made for them. (I rearranged Mr. Harkey’s rules to better illuminate the symmetries among them. If Mr. Harkey had been handy with the laptop computer, which didn’t exist when I was in school, I imagine he’d have done the same thing.)

Invariably, when one of my boys misses a math problem, it’s for the same stupid reason that I make mistakes in my shop or at work. It’s because he’s tried to do steps in his head instead of writing them all down, and of course he’s accidentally integrated an assumption into his work that’s not true. When you don’t have a neat and orderly audit trail to debug, the only way you can fix your work is to start over, which takes more time (which itself increases frustration levels and degrades learning) and which bypasses perhaps the most important technical skill in all of Life today: the ability to troubleshoot.

Theory: Redoing an n-step math problem instead of learning how to propagate a correction to an error made in step n – k through step n is how we get to a society in which our support analysts know only two solutions to any problem: (a) reboot, and (b) reinstall.

It’s difficult to teach people the value of mastering the basics. It’s difficult enough with children, and it’s even worse with adults, but great teachers and great coaches understand how important it is. I’m grateful to have met my share, and I love meeting new ones. Actually, I believe my 11-year-old son has a baseball practice with one tomorrow. We’ll have to check his blog in about 30 years.

New paper "Mastering Performance with Extended SQL Trace"

Thu, 2011-01-13 11:24
Happy New Year.

It’s been a busy few weeks. I finally have something tangible to show for it: “Mastering Performance with Extended SQL Trace” is the new paper I’ve written for this year’s RMOUG conference. Think of it as a 15-page update to chapter 5 of Optimizing Oracle Performance.

There’s lots of new detail in there. Some highlights:
  • How to enable and disable traces, even in un-cooperative applications.
  • How to instrument your application so that tracing the right code path during production operation of your application becomes dead simple.
  • How to make that instrumentation highly scalable (think 100,000+ tps).
  • How timestamps since 10.2 allow you to know your recursive call relationships without guessing.
  • How to create response time profiles for calls and groups of calls, with examples.
  • Why you don’t want to be on Oracle 11g prior to
I hope you’ll be able to make productive use of it.