Andrew Clarke

Notes from the Tooting Bec Underground

The personal is technical

Mon, 2013-07-15 19:50
On Friday evening I attended an IT Job Fair at the Amerigo Vespucci in Canary Wharf. Let me say straight away that hanging out with a random bunch of techies and recruiters would not be my first choice for a Friday evening. But, hey! I'm looking for my next role, and right now the market is too tough to turn down opportunities to find it. Besides, I was interested to see whether the Meetup template would translate into a recruitment fair.

On the day the translation was a mixed success. Unlike most Meetups, which can work with any number of attendees, a job fair requires a goodly number of recruiters, and recruiters will only turn up if they think there will be enough candidates to make it worth their while. This first event didn't achieve that critical mass, and I would be quite surprised if I get an opportunity from it. Nevertheless I will try to attend the next event, whenever that may be, because pop-up job fairs in bars are a great idea. Not for the reason you're thinking (I drank cola all evening), but because it was enjoyable. I talked with some interesting people, and got a couple of email addresses as well.

But more than that I was impressed with the concept. The informal social setting is good for understanding what a person is really like, specifically what they might be like to work with. This has to be worthwhile. The CV is a dead letter: it lists skills and accomplishments but doesn't animate or demonstrate them. The formality of the technical interview makes it hard to judge somebody as a person. Anyway, it's generally aimed at establishing how much of their CV is true. The social element is often missed entirely. Big mistake. Software is like Soylent Green: it's made of people.

Personality matters: the toughest problems most projects face are political (organizational, personal) rather than technical (except for system integration - that's always going to be the biggest pain in the neck). A modern development project is a complex set of relationships. There are external relationships, with users, with senior managers, with other stakeholders (Security, Enterprise Architecture, etc) any of which can jeopardize the success of the project if handled badly. But the internal relationships - between Project Manager and staff, between developers and testers, or developers and DBAs - are just as fraught with difficulty.

The personal is technical because the team dynamic is a crucial indicator of the likely success of the project. You don't just need technical competence, you need individuals who communicate well and share a common vision; people who are (dread phrase) team players. That's why Project Managers generally like to work with people they already know, because they already know they can work with them.

For a new hire, nobody knows the answer to the burning question, "Can I stand to be in this person's company eight hours a day, five days a week, for the duration of the project?" Hence the value of chatting about work and other things in a bar on a hot summer's eve over a glass of something with ice. I'm not sure how well the model would work for recruitment agents, but I think it would suit both hirers and hirees. It's not a technique that scales, but if people made better hiring decisions perhaps that wouldn't matter?

UKOUG Analytics Event: a semi-structured analysis

Thu, 2013-07-11 17:40
Yesterday's UKOUG Analytics event was a mixture of presentations about OBIEE with sessions on the frontiers of data analysis. I'm not going to cover everything, just dip into a few things which struck me during the day.

During the day somebody described dashboards as "Fisher Price activity centres for managers". Well, Neil Sellers showed a mobile BI app called RoamBI which is exactly that. Swipe that table, pinch that graph, twirl that pie chart! (No really, how have we survived so long with pie charts which can't be rotated?) The thing is so slick, it'll keep the boss amused for hours. Neil's theme on the importance of data visualization to convey a message or tell a story was picked up by Claudio Bastia and Nicola Sandol. Their presentation included a demo of IConsulting's Location Intelligence extension for OBIEE. The tool not only does impressive things with the display of geographic data, it also allows users to interact with the maps to refine queries and drill down into the data. This is visualization which definitely goes beyond the gimmick: it's an extremely powerful way of communicating complex data sets.

A couple of presentations quoted the statistic that 90% of our data was created in the last two years. This is a figure which has been bandied about but I've never seen a citation which explains who calculated it and what method they used (although it's supposed to have originated at IBM). It probably comes from the same place as most other statistics (and project estimates). What is the "data" the figure measures? I'm sure in some areas of human endeavour (bioinformatics, say, or CERN) the amount of data they produce has gone metastatic. And obviously digital cameras, especially on phones, are now ubiquitous, so video and photographs account for a lot of the data growth. But are selfies, instagrammed burgers and cute kittens really data? Same with other content: how much of this data explosion is mirroring, retweets, quoting, spam and AdSense farms? Not to mention the smut. Anyway, that 90% was first cited in 2012; it's now 2013 and somebody needs to derive a new figure.

The day rounded off with a panel and a user presentation. Toby Price opened the Q&A by asking Oracle's Nick Whitehead how Hadoop fits into an Oracle estate. It's a good question. After all, Oracle has been able to handle unstructured data, i.e. text, since the introduction of ConText in 8.0 (albeit as a chargeable extra in those days). And there's nothing special about MapReduce: PL/SQL can do that. So what's the deal with Hadoop? Here's the impertinent answer to this pertinent question: Hadoop allows us to run massively parallel jobs without paying Oracle's per-processor licenses. Let's face it, not even Tony Stark could afford to run a one-thousand core database.

The closing session was a presentation from James Wyper & Dirk Shelley about upgrading the BI architecture at John Lewis Partnership. They described it as a war story, but actually it was a report from the front lines, because the implementation is not yet finished. James and Dirk covered the products - which ones worked as advertised, which ones gave them grief (integration was a particular source of grief). They also discussed their approach to the project, relating what they did well and what they would do differently with the advantage of hindsight. This sort of session is the best part of any user group: real users sharing their experiences with the community. We need more of them.

Let me SLEEP!

Mon, 2013-06-24 12:11
DBMS_LOCK is a slightly obscure built-in package. It provides components with which we can build our own locking schemes. Its obscurity stems from the default access on the package, which is restricted to its owner SYS and the other power user accounts. This is sensible, because implementing your own locking strategy is a good way to wreck a system, unless you really know what you're doing. Besides, Oracle's existing functionality is such that there is almost no need to build something extra (especially since 11g finally made the SELECT ... FOR UPDATE SKIP LOCKED syntax legal). So it's just fine that DBMS_LOCK is private to SYS. Except ...

... except that one of the sub-programs in the package is SLEEP(). And SLEEP() is highly useful. Most PL/SQL applications of any sophistication need the ability to pause processing for a short while, either for a fixed time or perhaps polling for a specific event. So it is normal for PL/SQL applications to need access to DBMS_LOCK.SLEEP().

Commonly this access is granted at the package level, that is grant execute on dbms_lock to joe_dev. Truth be told, there's not much harm in that. The privilege is granted to a named account, and if somebody uses the access to implement a roll-your-own locking strategy which brings Production to its knees, well, the DBAs know who to look for.

But we can employ a schema instead. The chief virtue of a schema is that it manages rights on objects. So let's create a schema for mediating access to powerful SYS privileges:

create user sys_utils identified by &pw
temporary tablespace temp
/
grant create procedure, create view, create type to sys_utils
/

Note that SYS_UTILS does not get the create session privilege. Hence nobody can connect to the account, a sensible precaution for a user with potentially damaging privileges. Why bar connection in all databases and not just Production? The lessons of history tell us that developers will assume they can do in Production anything they can do in Development, and write their code accordingly.

Anyway, as well as granting privileges, the DBA user will need to build the objects in SYS_UTILS on its behalf:
grant execute on dbms_lock to sys_utils
/
create or replace procedure sys_utils.sleep
( i_seconds in number)
as
begin
dbms_lock.sleep(i_seconds);
end sleep;
/
create public synonym sleep for sys_utils.sleep
/
grant execute on sys_utils.sleep to joe_dev
/

I think it's a good idea to be proactive about creating an account like this; granting it some obviously useful privileges before developers ask for them, simply because some developers won't ask. The forums occasionally throw up extremely expensive PL/SQL loops whose sole purpose is to burn CPU cycles or wacky DBMS_JOB routines which run every second. These WTFs have their genesis in ignorance of, or lack of access to, DBMS_LOCK.SLEEP().
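To show the sort of thing SLEEP() is for, here is a sketch of a polling loop written against the wrapper above. The batch_jobs table, its columns and the status values are all invented for illustration; the point is the pause between polls and the timeout, which stop the loop burning CPU.

```sql
declare
    c_job_id   constant number := 42;    -- hypothetical job id
    l_status   varchar2(10);
    l_attempts pls_integer := 0;
begin
    loop
        -- check whether the (hypothetical) batch job has finished
        select status into l_status
        from   batch_jobs
        where  job_id = c_job_id;

        exit when l_status in ('COMPLETE', 'FAILED');

        -- give up after thirty polls (five minutes)
        l_attempts := l_attempts + 1;
        if l_attempts >= 30 then
            raise_application_error(-20001, 'Timed out waiting for job ' || c_job_id);
        end if;

        sleep(10);    -- the public synonym for sys_utils.sleep
    end loop;
end;
/
```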

Oracle 10g - a time traveller's tale

Mon, 2013-06-24 11:03
Time travel sucks, especially going back in time. Nobody takes a bath, there are no anaesthetics and you can't get a decent wi-fi signal anywhere. As for killing your own grandfather, forget about it.

The same is true for going back in database versions. In 2009 I had gone straight from an Oracle 9i project to an Oracle 11g one. So when I eventually found myself on a 10g project it was rather disorientating. I would keep reaching for tools which weren't in the toolbox: LISTAGG(), preprocessor scripts for external tables, generalized invocation for objects.

I had missed out on 10g while it was shiny and new, and now it just seemed limited. Take Partitioning. Oracle 10g supported exactly the same composite partitioning methods as 9i: just Range-Hash and Range-List, whereas 11g is full of wonders like Interval-Range, Hash-Hash and the one I needed, List-List.

Faking a List-List composite partitioning scheme in 10g

Consider this example of a table with a (slightly forced) need for composite List-List partitioning. It is part of an engineering stock control system, in which PRODUCTS are grouped into LINES (Ships, Cars, Planes) and COMPONENTS are grouped into CATEGORIES (Frame, interior fittings, software, etc). We need an intersection table which links components to products.

There are hundreds of thousands of components and tens of thousands of products. But we are almost always only interested in components for a single category within a single product line (or product) so composite partitioning on (product_line, component_category) is a good scheme. In 11g the List-List method works just fine:
SQL> create table product_components
2 (product_line varchar2(10) not null
3 , product_id number not null
4 , component_category varchar2(10) not null
5 , component_id number not null
6 , constraint pc_pk primary key (product_id, component_id )
7 , constraint pc_prd_fk foreign key (product_id )
8 references products (product_id)
9 , constraint pc_com_fk foreign key (component_id )
10 references components (component_id)
11 )
12 partition by range(product_line) subpartition by list(component_category)
13 subpartition template
14 (subpartition sbody values ('BODY')
15 , subpartition sint values ('INT')
16 , subpartition selectr values ('ELECTR')
17 , subpartition ssoft values ('SOFT')
18 )
19 (partition pship values ('SHIP')
20 , partition pcar values ('CAR')
21 , partition pplane values ('PLANE')
22 )
23 /

Table created.

SQL>

But in 10g the same statement hurls ORA-00922: missing or invalid option. The workaround is a bit of a nasty hack: replace the first List with a Range, producing a legitimate Range-List composite:
SQL> create table product_components
2 (product_line varchar2(10) not null
3 , product_id number not null
4 , component_category varchar2(10) not null
5 , component_id number not null
6 , constraint pc_pk primary key (product_id, component_id )
7 , constraint pc_prd_fk foreign key (product_id )
8 references products (product_id)
9 , constraint pc_com_fk foreign key (component_id )
10 references components (component_id)
11 )
12 partition by range(product_line) subpartition by list(component_category)
13 subpartition template
14 (subpartition sbody values ('BODY')
15 , subpartition sint values ('INT')
16 , subpartition selectr values ('ELECTR')
17 , subpartition ssoft values ('SOFT')
18 )
19 (partition pcar values less than ('CAS')
20 , partition pplane values less than ('PLANF')
21 , partition pship values less than ('SHIQ')
22 )
23 /

Table created.

SQL>

Note the wacky spellings which ensure that 'CAR' ends up in the right partition. Also we have to re-order the partition clause so that the partition bounds don't raise an ORA-14037 exception. We are also left with the possibility that a rogue typo might slip records into the wrong partition, so we really ought to have a foreign key constraint on the product_line column:

alter table product_components add constraint pc_prdl_fk foreign key (product_line)
references product_lines (line_code)
/

I described this as a nasty hack. It is not really that nasty, in fact it actually works very well in daily processing. But managing the table is less intuitive. Say we want to manufacture another line, rockets. We cannot just add a new partition:

SQL> alter table product_components
add partition prock values less than ('ROCKEU')
/
2 3 add partition prock values less than ('ROCKEU')
*
ERROR at line 2:
ORA-14074: partition bound must collate higher than that of the last partition


SQL>

Instead we have to split the PSHIP partition in two:

SQL> alter table product_components split partition pship
2 at ('ROCKEU')
3 into (partition prock, partition pship)
4 /

Table altered.

SQL>

The other snag is that once we do get back to the future it's a bit of a chore to convert the table to a proper List-List scheme. Probably too much of a chore to be worth the effort. Even with a time machine there are only so many hours in the day.
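If it ever were worth the effort, the mechanics might look something like this sketch: build a new table with the real List-List scheme, copy the data across, then swap the names. The statements are illustrative only; in practice the chore lies in recreating the constraints, indexes, grants and dependent code on the new table.

```sql
-- Sketch only: convert the faked Range-List table to real List-List on 11g
create table product_components_new
( product_line       varchar2(10) not null
, product_id         number       not null
, component_category varchar2(10) not null
, component_id       number       not null
)
partition by list (product_line) subpartition by list (component_category)
subpartition template
( subpartition sbody   values ('BODY')
, subpartition sint    values ('INT')
, subpartition selectr values ('ELECTR')
, subpartition ssoft   values ('SOFT')
)
( partition pship  values ('SHIP')
, partition pcar   values ('CAR')
, partition pplane values ('PLANE')
)
/
insert /*+ append */ into product_components_new
select * from product_components
/
-- ... recreate primary key, foreign keys, indexes and grants, then:
alter table product_components rename to product_components_old
/
alter table product_components_new rename to product_components
/
```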

Where's SCOTT?

Fri, 2013-06-21 10:53
The database on the Developer Days Database App VBox appliance doesn't have the SCOTT schema. This is fair enough, as the sample schemas aren't included by default any more (for security reasons, obviously). I know the official sample schemas used in the documentation - HR, OE, and so on - are more realistic and useful for developing prototypes. But nothing beats the simplicity of SCOTT.EMP for explaining something in an online forum.

So, where is the script for building the SCOTT schema?

Back in the day it was part of the SQL*Plus build: $ORACLE_HOME/sqlplus/demo/demobld.sql (or something, I'm doing this from memory). But in 11gR2 there are no demo scripts in the sqlplus sub-directory. This was also probably the case in 10g but I never had occasion to look for it on that platform. Anyway, in 11gR2 its location is $ORACLE_HOME/rdbms/admin/utlsampl.sql.
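So, assuming a standard installation, rebuilding SCOTT is a one-liner from SQL*Plus, run from a DBA account (the ? shorthand expands to ORACLE_HOME):

```sql
-- Recreates the SCOTT schema (password TIGER) with the
-- EMP, DEPT, BONUS and SALGRADE tables
@?/rdbms/admin/utlsampl.sql
```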

Presumably Oracle have obscured it because they want everybody to stop using SCOTT and standardise on the modern samples. But this schema is part of the Oracle heritage. It should have Grade II listed status.

Geek quotient

Sun, 2013-01-20 16:57
I only scored 9/10 on the 'How big a David Bowie fan are you?' quiz. And I scored 20/20 on the 'can you tell Arial from Helvetica?' quiz. But I only scored 32.1032% on the Geek Test. So I still have some way to go.

Error Wrangling

Mon, 2012-12-10 01:50
Last week the OTN SQL and PL/SQL Forum hosted one of those threads which generate heat and insight without coming to a firm conclusion: this one was titled WHEN OTHERS is a bug. Eventually Rahul, the OP, complained that he was as confused as ever. The problem is, his question asked for a proof of Tom Kyte's opinion that, well, WHEN OTHERS is a bug. We can't prove an opinion, even an opinion from a well-respected source like Tom. All we can do is weigh in with our own opinions on the topic.

One of the most interesting things in the thread was Steve Cosner's observations on designing code "to be called by non-database, and ... non-Oracle software (Example: Oracle Forms, or a web process)". He uses WHEN OTHERS to produce a return code. Now return codes are often cited as an example of bad practice. Return codes disguise the error and allow the calling program to proceed as though nothing had gone wrong in the called unit. Exceptions, on the other hand, cannot be ignored.

But Steve goes on to make a crucial point: we must make allowances for how our code will be called. For instance, one time I was writing PL/SQL packages for an E-Business Suite application and I was doing my usual thing, coding procedures which raised exceptions on failure. This was fine for the low level routines which were only used in other PL/SQL routines. But the procedures which ran as concurrent jobs had to finish cleanly, no matter what happened inside; each passed a return code to the concurrent job manager, and logged the details. Similarly the procedures used by Forms passed back an outcome message, prefixed with "Success" or "Error" as appropriate.
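The pattern is easy to sketch. All the names here are invented for illustration: the inner procedure raises exceptions as usual, and a thin outer wrapper translates them for a caller which expects a return code.

```sql
-- Hypothetical wrapper for a caller (Forms, the concurrent manager)
-- which expects a return code rather than an exception
create or replace procedure process_order_safe
( i_order_id in  number
, o_retcode  out number      -- 0 = success, 2 = error
, o_message  out varchar2
)
as
begin
    process_order(i_order_id);   -- raises exceptions on failure
    o_retcode := 0;
    o_message := 'Success: order processed';
exception
    when others then
        o_retcode := 2;
        o_message := 'Error: ' || sqlerrm;
        -- log the details here, so the error is recorded
        -- rather than simply swallowed
end process_order_safe;
/
```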

This rankled at first but it is undoubtedly true that exceptions make demands on the calling program. Although the demands are not as high in PL/SQL as in, say, C++ or Java, raising exceptions still changes how the calling function must be coded. We have to work with the grain of the existing code base when introducing new functionality. (Interestingly, Google has a fiat against exceptions in its C++ code base, and developed its Go language to support return codes as the default error handling mechanism.)

The point is APIs are a contract. On its side the called program can enforce rules about how it is called - number and validity of input parameters, return values. But it cannot impose rules about what the calling program does with the outcome. So there's no point in exposing a function externally if its behaviour is unacceptable to the program which wants to call it. When the calling program wants to use return codes there's little point in raising exceptions instead. Sure the coder writing the calling program can ignore the value in the return code, but that is why we need code reviews.

So, is WHEN OTHERS a bug? The answer is, as so often, it depends.

GOTOs, considered

Mon, 2012-12-03 00:12
Extreme programming is old hat now, safe even. The world is ready for something new, something tougher, something that'll... break through. You know? And here is what the world's been waiting for: Transgressive Programming.

The Transgressive Manifesto is quite short: It's okay to use GOTO. The single underlying principle is that we value willful controversy over mindless conformity.

I do have a serious point here. Even programmers who haven't read the original article (because they can't spell Dijkstra and so can't find it through Google) know that GOTOs are "considered harmful". But as Marshall and Webber point out, "the problem lies ... in the loss of knowledge and experience. If something is forbidden for long enough, it becomes difficult to resurrect the knowledge of how to use it."

How many Oracle developers even realise PL/SQL supports GOTO? It does, of course. Why wouldn't it? PL/SQL is a proper programming language.

The standard objection is that there is no role for GOTO because PL/SQL has loops, procedures, CASE, etc. But sometimes we need to explicitly transfer control. In recent times I have come across these examples:

  • a loop which raised a user-defined exception to skip to the END LOOP; point when the data was in certain states, thus avoiding a large chunk of processing. A GOTO would have been cleaner, because it is poor practice to represent normal business states as exceptions.
  • a huge function with more than a dozen separate RETURN statements. GOTOs directing flow to a single RETURN call would have been really helpful, because I needed to log the returned value.
  • a condition which set a counter variable to a large number so as to short-circuit a loop. Here a GOTO would simply have been more honest.
These examples are all explicit control transfers: they cause exactly the sort of random paths through the code which Dijkstra inveighed against. But the coders didn't honour the principle underlying his fatwa, they just lacked the moxie to invoke the dread statement. Instead they kludged. I'm not saying that using a GOTO would have redeemed a function with 800 LOC; clearly there's a lot more refactoring to be done there. But it would have been better.

Here is a situation I have come across a few times. The spec is to implement a series of searches of increasing coarseness, depending on which arguments are passed; the users want the most focused set of records available, so once a specific search gets some hits we don't need to run the more general searches.

Nested IF statements provide one way to do this:


result_set := sieve_1(p1=>var1, p2=>var2, p3=>var4, p4=>var5);

if result_set.count() = 0
then
    result_set := sieve_2(p1=>var2, p2=>var3, p3=>var4);

    if result_set.count() = 0
    then
        result_set := sieve_3(p1=>var3, p2=>var5);

        if result_set.count() = 0
        then
            ....
        end if;
    end if;
end if;
return result_set;
Obviously as the number of distinct searches increases the nested indentation drives the code towards the right-hand side of the page. Here is an alternative implementation which breaks the taboo and does away with the tabs.

result_set := sieve_1(p1=>var1, p2=>var2, p3=>var4, p4=>var5);
if result_set.count() > 0
then
goto return_point;
end if;

result_set := sieve_2(p1=>var2, p2=>var3, p3=>var4);
if result_set.count() > 0
then
goto return_point;
end if;

result_set := sieve_3(p1=>var3, p2=>var5);
if result_set.count() > 0
then
goto return_point;
end if;
...
<< return_point >>
return result_set;
I think the second version has a clearer expression of intent. Did we find any records? Yes we did, job's a good'un, let's crack on.

tl;dr
GOTO: not as evil as triggers.

A new career in a new town

Sun, 2012-11-25 07:36
Well, I say "new career" but really it's the same thing: Oracle development. I'm working on some complex data matching routines for a data warehouse. But it is new, because I'm a contractor.

Going freelance has been a big step, as I was with Logica for over seventeen years. Actually, the company I joined was CMG, which through a chain of mergers, acquisitions and re-brandings became CMG Admiral, LogicaCMG, Logica and is now part of CGI. Contracting is something I have been considering for some years but the actual decision almost took me by surprise. It's two weeks now, and I'm still enjoying it.

The "new town" part of the title is not quite accurate either, as Croydon is a town with a long and venerable history. But it's not a place I had been to before, although I have driven through it. This is not a crack at Croydon, it's just that sort of place; Wikipedia describes it as "a natural transport corridor between London and England's south coast".

Croydon is a bit odd, comprising zones of office blocks, shopping malls, car parks and entertainment complexes delineated by dual carriageways and fly-overs. For me its chief virtue is a shortish commute. My last engagement for Logica entailed working away from home. After several months of living in a hotel room it's just nice to come back to the family every evening.
