Rob Baillie

More than 2 decades of writing software, and still loving it...

Can a change in execution plan change the results?

Thu, 2007-07-12 08:15
We've been using Oracle domain indexes for a while now to search documents and get back a ranked list of those that meet certain criteria. The documents are related to people, and we augment the basic text search with other filters and score metrics based on the 'people' side of things to get an overall 'suitability' score for each result in a search. Without giving too much away about the business I work with, I can't really tell you much more about the product than that, but it's probably enough background for this little gem.

We've known for a while that the domain index 'score' returned from a 'contains' clause is based not only on the document to which that score relates, but also on the rest of the set that is searched. An individual document score does not live in isolation; rather, it lives in the context of the whole result set.

No problem. As I say, we've known this for a while and so have our customers. Quite a while ago they stopped asking what the numbers mean and learned to trust them.

However, today we realised something. Since the scores are affected by the set that is searched, they can also be affected by the order in which the optimizer decides to execute a query.

I can't give you a full end to end example, but I can assure you that the following is most definitely the case on one of our production domain indexes (names changed, obviously):

We have a two column table 'document_index', which contains 'id' and 'document_contents'. Both columns are indexed: 'id' is the primary key and 'document_contents' has a domain index.
The following SQL gives the related execution path:

SELECT id, SCORE( 1 )
FROM   document_index
WHERE  CONTAINS( document_contents, :1, 1 ) > 0
AND    id = :2

SELECT STATEMENT
  TABLE ACCESS BY INDEX ROWID SCOTT.DOCUMENT_INDEX
    DOMAIN INDEX SCOTT.DOCUMENT_INDEX_IDX01

However, the alternative SQL, with the search text as a literal, gives this execution path:

SELECT id, SCORE( 1 )
FROM   document_index
WHERE  CONTAINS( document_contents, 'Some text', 1 ) > 0
AND    id = :2

SELECT STATEMENT
  TABLE ACCESS BY INDEX ROWID SCOTT.DOCUMENT_INDEX
    INDEX UNIQUE SCAN SCOTT.DOCUMENT_INDEX_PK

Normally, this kind of change in execution path wouldn't be a problem. But, as stated earlier, the result of a score operation against a domain index is not just dependent on the individual records, but on the context of the whole result set. The first execution gives you a score for the single document in the context of all the documents in the table; the second gives you a score within the context of just that document. The scores are different.

Now obviously, this is an extreme example, but more subtle examples will almost certainly exist if you combine domain index lookups with any other WHERE clause criteria. This is especially true if you're using literal values instead of bind variables, in which case you may find the execution path changing between calls to the 'same' piece of SQL.

My advice? Well, we're going to split our domain index lookups from all the rest of the filtering criteria. That way we can prepare the set of documents we want the search to be within and know that the scoring algorithm will be applied consistently.
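To see why the order of execution can change a score, here is a toy model in JavaScript. It assumes a TF-IDF style score, where a term counts for more the rarer it is in the set being searched. That is an assumption for illustration only, not Oracle Text's actual scoring algorithm, and `termFrequency`, `score` and `searchThenFilter` are hypothetical names. The same document gets one score when judged against the whole table and another when judged on its own, mirroring the two execution paths above; `searchThenFilter` sketches the split approach of scoring against the full set first and applying the other criteria afterwards.

```javascript
// Toy model only: a TF-IDF style score in which a term counts for more when
// it is rarer in the set being searched. This is NOT Oracle Text's algorithm.

// Count how many times `term` appears as a whole word in `doc`.
function termFrequency(doc, term) {
  const t = term.toLowerCase();
  return doc.toLowerCase().split(/\W+/).filter((w) => w === t).length;
}

// Score `doc` for `term` in the context of `searchSet`, the set of
// documents being searched.
function score(doc, term, searchSet) {
  const tf = termFrequency(doc, term);
  // How many documents in the searched set contain the term at all.
  const df = searchSet.filter((d) => termFrequency(d, term) > 0).length;
  if (tf === 0 || df === 0) return 0;
  // Smoothed inverse document frequency: rarer terms weigh more.
  const idf = Math.log(1 + searchSet.length / df);
  return tf * idf;
}

// The 'split' approach: score every document against the full set first,
// then apply the other filter criteria, so the scores don't depend on which
// filter happened to be evaluated first.
function searchThenFilter(docs, term, keep) {
  return docs
    .map((doc, id) => ({ id, doc, score: score(doc, term, docs) }))
    .filter((r) => r.score > 0 && keep(r))
    .sort((a, b) => b.score - a.score);
}

// Usage: the same document, scored in two different contexts.
const tableDocs = [
  "oracle tuning guide",
  "oracle text and oracle forms",
  "java swing notes",
  "perl one liners",
];
const target = tableDocs[1];
console.log(score(target, "oracle", tableDocs)); // against the whole table
console.log(score(target, "oracle", [target])); // against just this document
console.log(searchThenFilter(tableDocs, "oracle", (r) => r.id === 1));
```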

Handy "Alert Debugging" tool

Wed, 2007-07-04 13:29
One of the coolest things about OO Javascript is that methods can be assigned to as if they were variables. This means that you can rewrite functions on the fly. That's bad for writing maintainable code if you're not structured, but fantastic for things like MVC controllers (rather than using the controller to forward calls on to the model, you use it to rewire the view so that it calls the model directly, and all without the view even realising it!). What I didn't realise was that the standard window object (and probably many others out there) can have its methods overwritten like any other. Probably the simplest example of this, and one that proves incredibly useful, is changing the alert function so that the dialog becomes a confirm window. Clicking Cancel means that no further alerts are shown to the user. Great for when you're writing Javascript without a debugger and have to resort to 'alert debugging'.

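// Replace the built-in alert with a confirm dialog; pressing Cancel swaps in
// a no-op, so later alert() calls are silently ignored instead of throwing.
window.alert = function(s) {
  if (!confirm(s)) window.alert = function() {};
};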
In case you're wondering... I found it embedded in the comments on this post: http://www.joehewitt.com/blog/firebug_for_iph.php. Cheers, Menno van Slooten!

Tab Complete in Windows

Tue, 2007-06-26 14:00
Another one of those things that I can never remember off the top of my head, so I find myself constantly looking it up whenever I get access to a new machine.

I figure it may as well be my own site that I get the info from :-)

To switch on 'Tab Complete' in the Windows command line, set the following registry values to '09' (hexadecimal, the character code for Tab):
  • HKEY_LOCAL_MACHINE\Software\Microsoft\Command Processor\CompletionChar
  • HKEY_LOCAL_MACHINE\Software\Microsoft\Command Processor\PathCompletionChar

Haiku

Fri, 2007-06-22 11:54
Saw a cracking Haiku on a t-shirt the other day:

Haiku are easy
But sometimes they don't make sense
Refrigerator.

Then my mind started dwelling on it:

Got the release out
But the testing's not finished
It's falling over

Or

Database is slow
Just can't see what's wrong with it
Set autotrace on

Or

A quick refactor
Turns into a bigger job
Should have unit tests

Records

Mon, 2007-05-28 11:45
And to follow on from the last post... my current personal bests:

I figure if I keep them here, at least I'll always know where they are!
  • 5km Run: 23:44 (Battersea Park, 'Beat the Baton' 28/05/07)
  • 10km Run: 53:23 (Hyde Park, 'Run London' 08/10/06)
  • Half Marathon: 2:17:49 (Redcar, 'Tees Valley Half Marathon' 12/03/06)
  • Rubik's cube: 57 seconds

Targets

Mon, 2007-05-28 11:12
You've gotta have targets.

The more I try to motivate myself to do things, the more I realise that if I don't have a target it's incredibly difficult.

When I realised this it came as a big surprise to me. I'm really not the sort of person to have a 5 year plan or career goals, but it seems that if I don't set myself an only just achievable goal I find it very difficult to motivate myself to do much.

I keep myself fit so that I get the most out of playing football. But just having that in mind isn't enough to get me out and running. If I didn't set myself a target time for a 5km or 10km run and then book a place at a running event, then I'd just sit on my fat arse every night watching TV. OK, so I may be exaggerating my self-deprecation, but you get the idea.

I find that this affects me in many different aspects of my life.

To motivate myself to run I set a target (public) 5km or 10km time (this year it's 22:30 and 50:00 respectively).

To motivate myself to learn to do the Rubik's cube, I set myself a target completion time (1 minute - yup, managed it).

To motivate myself to save money I set a target amount to reach by a certain date (nope, not telling you how much).

A friend of mine decided that he'd set himself the target of taking a photo a day for a year and posting it on his site. I may have to steal that idea next year... but until then you can find his here: www.ysr23.com/blog. It really is damn good.

I do the job I do because I just flat out enjoy it. As soon as it becomes too much of a chore I'll move on. And I reckon I'm doing alright career wise in whatever way you choose to measure it. For me the only measure that truly counts is enjoyment, and in the main it's a damn fine job. Well, it is most of the time anyway ;-)

Someone at work once said to me: You know, every now and again Tom Cruise probably gets up in the morning, probably on set, in his trailer and thinks to himself "Damn, gotta do some of that acting shit again today". OK, so he gets paid more in a minute than I do in a year, but you get the point.

And the big thing that keeps me enjoying my job is that I'm still learning new things. I suppose I have a clear target in my career: to always keep on learning and to surround myself with people who can teach me. It's probably one of the biggest reasons why I'm so pleased to be working with Extreme Programming. It makes it easy to fulfill that goal. And it works on a clear system of easy to understand targets.

A release to the business has a target set of functionality.
A single story has a clearly defined purpose.
A unit test gives you a goal that must be met, and a clear way of determining the success or failure.

Layers of targets.

And if you're doing XP properly you get to celebrate when you meet those targets.

A brief whoop when the unit test passes.
A handful of jelly beans when the story's complete.
A damn big meal and a piss up when a release hits the business.

OK, so real life targets don't have quite the same level of celebration, but it's the same deal.

Set yourself a clear target and you get clarity of purpose in aiming for it, and the celebration when you pass it.

Roll-out

Wed, 2007-04-11 02:12
I'm pretty sure that most people that read this blog will also read The Daily WTF.

But just in case you don't, there's a nice entry on 'soft-coding'.

Overall the article makes sound sense, but there's a line right at the end that resonates with me, especially since I read it the day after someone told me that they needed a developer for a whole day (9 hours) to roll out their system...

With the myriad of tools available today, there is no reason that your deployment process need be any more complicated than a simple, automated script that retrieves the code from source control, compiles it, copies/installs the executables, and then runs the relevant database scripts.


It makes me feel like I'm not alone.

Question

Fri, 2007-03-23 02:40
I've had this conversation a couple of times with people, and I've realised that I can't get to a satisfactory answer without some research. And I'm lazy. So I'm going to pose a question... and if I don't get a satisfactory answer here I might well send it to The New Scientist in the hope that they'll answer it.

Assuming that the cost of setting up and maintaining the infrastructure is already taken care of, which is more energy efficient: an electric kettle or a stove top kettle?

I am still here...

Sat, 2007-02-17 06:03
Sorry people, I promise I'm still here and I WILL get round to finishing my text on estimating and answering the request for more info on the database patch runner. I will, I will, I will!

The problem is, I've started reading again, and I've started playing on-line poker. Damn it :-)


But I'm enjoying it, especially Mike Cohn's book 'Agile Estimating and Planning'. It is an absolute MUST read. It picks up where the estimation chapter from User Stories Applied left off, and it really doesn't disappoint.

Unfortunately it seems to say an awful lot that I agree with, and that was going to form the bulk of my next couple of posts. So if you like what I have to say on the topic, then Mike Cohn is definitely worth a read... he goes into a lot more detail than I ever will here!

Obviously I'm reading an awful lot on Texas Hold 'em as well... but I'm not going to tell you what 'cause that might take away my advantage ;-)

Producing Estimates

Sun, 2007-01-07 07:52
OK, so it's about time I got back into writing about software development, rather than software (or running, or travelling, or feeds) and the hot topic for me at the moment is the estimation process.

This topic's probably a bit big to tackle in a single post, so it's post series time. Once all the posts are up I'll slap another up with all the text combined.

So – Producing good medium term estimates...

I'm not going to talk about the process for deriving a short term estimate for a small piece of work, that's already covered beautifully by Mike Cohn in User Stories Applied, and I blogged on that topic some time ago. Rather I'm going to talk about producing an overall estimate for a release iteration or module.

I've read an awful lot on this topic over the last couple of years, so I'm sorry if all I'm doing is plagiarising things said by Kent Beck, Mike Cohn or Martin Fowler (OK, the book's Kent as well, but you get the point), or any of those many people out there that blog, and that I read. Truly, I'm sorry.

I'm not intending to infringe copyright or take credit for other people's work, it's just that my thinking has been heavily guided by these people. This writing is how I feel about estimating, having soaked up those texts and put many of their ideas into practice.

Many people think that XP is against producing longer term estimates, but it isn't. It's just that it's not about tying people down to a particular date 2 and a half years in the future and beating them with a 400 page functional design document at the end of it; it's about being able to say with some degree of certainty when some useful software will arrive, the general scope of that software, and ultimately to allow those people in power to be able to predict some kind of cost for the project.

Any business that allows the development team to go off and do a job without providing some kind of prediction back to the upper management team is being irresponsible at best, grossly negligent at worst.

Providing good medium term estimates is a skill, and a highly desirable one. By giving good estimates and delivering to them you are effectively managing the expectations of upper management and allowing them to plan the larger scale future of the software in their business. When they feel they can trust the numbers coming out of the development teams, they are less likely to throw seemingly arbitrary deadlines at their IT departments and then get angry when they're not met. Taking responsibility for, and producing, good estimates will ultimately allow you to take control of the time-scales within your department. If you continually provide bad estimates, or suggest a level of accuracy that simply isn't there by saying "it'll take 1,203 ½ man days to complete this project", then you've only got yourself to blame when that prediction is used to beat you down when your project isn't complete in exactly 1,203 ½ man days.

Whilst being written within the scope of XP, there's no reason why these practices won't work when you're not using an agile process. All you need to do is to split your work into small enough chunks so that each one can be estimated in the region of a few hours up to 5 days work before you start.

In summary, the process is this:
  • Produce a rough estimate for the whole of the release iteration.
  • Iteratively produce more detailed estimates for the work most likely to be done over the next couple of weeks.
  • Feed the detailed estimates into the overall cost.
  • Do some development.
  • Feed the actual time taken back into the estimates and revise the overall cost.
  • Don't bury your head in the sand if things aren't going to plan.


1 – Produce a rough estimate for the whole of the release iteration.

Before you can start work developing a new release you should have some idea of what functionality is going into that release. You should have a fairly large collection of loosely defined pieces of functionality. You can't absolutely guarantee that these stories will completely form the release that will go out in 3 months time, but what you can say is: "Today we believe that it would be most profitable for us to work on these areas of functionality in order to deliver a meaningful release to the business within a reasonable time-frame".

Things may change over the course of the next few months, and the later steps in this process will help you to respond to those changes and provide the relevant feedback to the business.

Kent Beck states (I paraphrase): You don't drive a car by pointing it at the end of the road and driving in a straight line with your eyes closed. You stay alert and make corrections moment by moment.
I state: When you get in a car and start to drive you should at least have some idea where you're going.

The two statements don't oppose each other.

The basic idea is this:
  • In broad strokes, group the stories in terms of cost in relation to each other. If X is medium and Y is huge then Z is tiny.
  • Assign a numbered cost to each group of stories.
  • Inform that decision with past experience.
  • Produce a total.


So, get together the stories that you're putting into this release iteration, grab a room with a big desk, some of your most knowledgeable customers and respected developers and off you go.

Ideally you want around 5 to 8 people in the room. With more than this you'll find you end up arguing over details; with fewer you may find you don't have enough buy-in from the customer and development teams.

As a golden rule: developers who are not working on the project should NOT be allowed to estimate. There's nothing worse for developer buy-in than an estimate or deadline over which they feel they have no control.

Try to keep the number of stories down, I find around 50 stories is good. If you have more you may be able to group similar ones together into a bigger story, or you may find you have to cull a few. That's OK, you can go through this process several times. You can try to go through the process with more, but I find that you can't keep more than 50 stories in your head at one time. You're less likely to get the relative costs right. And it'll get boring.

Quickly estimate each story:
  • The customer explains the functionality required without worrying too much about the detail;
  • The developers place each story into one of 4 piles: small, medium, large and wooooaaaah man that's big.


People shouldn't be afraid to discuss their thinking, though you should be worried if it's taking 10 minutes to produce an estimate for each story.

You may find that developers will want to split stories down into smaller ones and drive out the details. Don't, if you can avoid it. At this point we want a general idea of the relative size of each story. We don't need to understand the exact nature of each individual story, we don't need enough to be able to sit down and start developing. All we need is a good understanding of the general functionality needed and a good idea of the relative cost. Stories WILL be re-estimated before they're implemented and many of them will be split into several smaller ones to make their development simpler. But we'll worry about that later.

As the stories are being added to piles be critical of past decisions. Allow stories to be moved between piles. Allow piles to be completely rebuilt. You may start off with 10 stories that spread across all 4 piles, then the 11th 12th and 13th stories come along and have nowhere to go – they're all an order of magnitude bigger than the ones that have gone before. That's OK, look at the stories you've already grouped in light of this new knowledge and recalibrate the piles.

Once all the stories are done, spread the piles out so you can see every story in each one. Appraise your groupings. Move stories around. It's important that you're comfortable that you've got them in the right pile relative to each other.

If the developers involved in this round of estimating recently worked on another project or a previous release of this project then add some recently completed stories from that work. Have the developers assign those stories into the piles. You know how long those stories took and you can use that to help guide your new estimates. There's nothing more useful in estimating than past experience... don't be afraid to use that knowledge formally.

Once you're confident that the piles are accurate, put a number of days against each pile. The idea is to get 'on average, a single story in this pile will take x days'.

If you have to err, then err on the side of bigger. Bear in mind that people always estimate in a perfect world where nothing ever goes wrong. The stories you added from the last project can be used to guide the number. You know exactly how long it took to develop those stories and you'd expect the new numbers to be similar.
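If you want to make that use of past experience a little more formal, one possible sketch in JavaScript averages the actual days taken by the last project's stories in each pile and rounds up to a whole day, in keeping with erring on the side of bigger. The function name, pile names and numbers are all hypothetical; the output is a starting point for the conversation, not a replacement for it.

```javascript
// Hypothetical helper: suggest a cost in days for each pile from stories
// completed on the last project, re-sorted into this round's piles.
function suggestPileCosts(pastStories) {
  const totals = {};
  const counts = {};
  for (const { pile, actualDays } of pastStories) {
    totals[pile] = (totals[pile] || 0) + actualDays;
    counts[pile] = (counts[pile] || 0) + 1;
  }
  const suggested = {};
  for (const pile of Object.keys(totals)) {
    // Average actual days, rounded UP to a whole day: if you have to err,
    // err on the side of bigger.
    suggested[pile] = Math.ceil(totals[pile] / counts[pile]);
  }
  return suggested;
}

// Usage, with made-up actuals for a handful of previously completed stories.
const pastStories = [
  { pile: "small", actualDays: 1 },
  { pile: "small", actualDays: 1.5 },
  { pile: "medium", actualDays: 3 },
  { pile: "medium", actualDays: 4 },
  { pile: "large", actualDays: 6 },
];
console.log(suggestPileCosts(pastStories)); // { small: 2, medium: 4, large: 6 }
```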

The resulting base estimate is then just the sum of (number of stories in each pile * cost of that pile). This number will give you a general estimate on the overall cost.
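As a sketch, that sum in JavaScript might look like the following. The pile names (with 'huge' standing in for the 'wooooaaaah man that's big' pile) and the costs in days are hypothetical, illustrative numbers.

```javascript
// Hypothetical cost per pile, in days: "on average a single story in this
// pile will take x days". The numbers are illustrative only.
const pileCost = { small: 1, medium: 3, large: 5, huge: 10 };

// Base estimate = sum over piles of (stories in pile * cost of that pile).
function baseEstimate(storiesPerPile, costPerPile) {
  let total = 0;
  for (const [pile, count] of Object.entries(storiesPerPile)) {
    if (!(pile in costPerPile)) {
      throw new Error(`No cost set for pile "${pile}"`);
    }
    total += count * costPerPile[pile];
  }
  return total;
}

// Usage: 50 stories spread across the four piles.
console.log(baseEstimate({ small: 20, medium: 15, large: 10, huge: 5 }, pileCost)); // 165
```

Here 20 × 1 + 15 × 3 + 10 × 5 + 5 × 10 gives 165 days; failing loudly on a pile with no cost stops a story from silently dropping out of the total.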

You'll probably want to add a contingency, and past experience on other projects will help you guide that. If you've gone through this process before, pull out the numbers from the last release... when you last performed this process and compare the starting estimate with the actual amount of time taken. Use that comparison for the contingency.
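One way to turn that comparison into a number, sketched in JavaScript, is to scale the base estimate by the ratio of the last release's actual days to its starting estimate. The function name is hypothetical, and so is the choice (flagged in the comments) never to let the contingency shrink the estimate.

```javascript
// Hypothetical helper: scale the base estimate by how far the last
// release's starting estimate was out (actual days / estimated days).
function estimateWithContingency(baseDays, lastEstimateDays, lastActualDays) {
  if (lastEstimateDays <= 0) {
    throw new Error("The last release's estimate must be a positive number of days");
  }
  // Assumption: contingency only ever adds time, so a release that came in
  // under its estimate leaves the base figure unchanged.
  const actual = Math.max(lastActualDays, lastEstimateDays);
  // Multiply before dividing, then round up to a whole day.
  return Math.ceil((baseDays * actual) / lastEstimateDays);
}

// Usage: last release was estimated at 150 days but took 200.
console.log(estimateWithContingency(165, 150, 200)); // 220
```

With a 165 day base estimate and a last release that took 200 days against an estimate of 150, the figure you take to the business becomes 220 days.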

The gut feeling is to say, "Hang on, the last time we did this we estimated these 50 stories, but only built these 20 and added these other 30. That means that the starting estimate was nothing like what we actually did". But the point is that if every release has the same kind of people working on it, working for the same business with the same kinds of pressures then this release is likely to be the same as the last. Odds are that THIS release will only consist of 20 of these stories plus 30 other unknown ones... and the effect on the estimate is likely to be similar. Each release is NOT unique, each release is likely to go ahead in exactly the same way as the last one. Use that knowledge, and use the numbers you so painstakingly recorded last time.

Aside: Project managers tend to think that I have a problem with them collecting numbers (as they love to do). They're wrong... I have a problem with numbers being collected and never being used. The estimation process is the place where this is most apparent. Learn from what happened yesterday to inform your estimate for tomorrow.


Once the job's done you should have enough information to allow you to have the planning game. For those not familiar with XP, that is the point at which the customer team (with some advice from the development team) prioritises and orders the stories. It gives you a very good overall view of the direction of the project and a clear goal for the next few weeks of work.

At this point you should have a good idea of cost and scope and have a good degree of confidence in both. Now's the time to tell your boss...

Next up – using the shorter term estimates to inform the longer term one...

