Understanding Terracotta caching

From: Sandra Becker <sbecker6925_at_gmail.com>
Date: Thu, 21 Jan 2010 10:18:40 -0700
Message-ID: <3c5f7821001210918y49d6452cg8741b36906e5821e_at_mail.gmail.com>



Platform: IBM Series z
OS: SUSE 10
Oracle: EE 10gR2

*Current situation*:

Concern that the application is not scalable. Some large tables are have heavy loads, a mix of read/write, heavier on the reads. Some inefficient code on top of a poor design have been causing performance issues. We both write and host the application for our customers.

*Tables*:

  1. c - used by a small portion of the transactions, a mix of read/write, code/indexes fairly efficient, have never seen this table as a bottleneck
  2. d - critical table used in approximately 60% of all code, 33M+ rows, lots of DML and reads, inefficient code, poor design, frequently a bottleneck
  3. i - important table, 7.5M+ rows, inefficient code, lots of DML and reads, sometimes it and its children are a bottleneck
  4. m - critical table, 7.4M+ rows, inefficient code, combination of columns in tables d and i because "joins are bad", all inserts to d and i requie inserts to this table as well, updates on several columns in d and i require updates on the same columns in this table, the state column can have any one of 13 different values, frequently a bottleneck
  5. p - heavily used, 1.1M+ rows, predominately reads a little DML, has been a bottleneck at times in the past 6 months
  6. u - very heavily used, 151K+ rows, predominately reads some DML, some inefficient code but not generally a bottleneck

*Proposal:*

Use Terracotta caching to make the application more scalable as we add more customers and users.

The database architect is proposing that table c be fully cached because he thinks it is important. He also proposes only one state of the 13 for table m be cached. The working theory is that this single state accounts for most of the reads against the table and the other 12 states are frequently updated/inserted. We both agree that the u table should be cached. He is planning to refresh the cache every 5 to 10 minutes. When I asked him what were we expecting the end user to see, he said there shouldn't be any change in performance but this would make the application scalable. I also asked what was the expected benefit if he was freshing the cache that often. I didn't get an answer.

I suggested that the p table be cached rather than the c table since it is used more often in the application and doesn't have as much DML. I also suggested that we do some research to determine if the state on the m table he wants to cache is really the most often requested state for reads. He nixed that idea saying he knew it was. By reviewing the sql being executed today for the m table, I see that it shows it is not the most frequently requested state for reads but I can't get him to consider he's wrong. I also went to the Terracotta website to get more information about their EX caching and see if they had any recommendations, which they did.

*Questions:*

  1. My understanding of caching in general and Terracotta in particular is you should cache heavily used tables that don't experience a lot of DML. Is this an incorrect understanding? Terracotta says less than 50% of transacations against the table should be DML so that's partly what I base my opinion on.
  2. My theory is we should focus on those parts of the application that we *KNOW* are the bottlenecks rather than just *ASSUMING* we know where they are. If we don't know, then we should research. I had some success researching/tuning other parts of inefficient code so I'm certain we can effect some positive changes in this area. Am I totally off base here thinking we should do more research/testing?
  3. I suggested we spend more time tuning the parts of the code we know are inefficient along with looking at the caching. That was rejected as taking too much time even though the same code must be reviewed to use the caching. I'm really lost on this one as to why you wouldn't look at both options at the same time. Is that unreasonable?
  4. Is refreshing the cache every 5 to 10 minutes a reasonable time period? Is it too often, not often enough, etc. Since the refresh has to read the table, what should I be looking at to see if the benefits are really there?
  5. Terracotta says to look at the cache/hit ratio for the table to determine if it really is a good candidate. How do I look at cache/hit ratios for tables?

Thanks in advance for any clarifcation you can give me.

Sandy
Transzap, Inc.

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Jan 21 2010 - 11:18:40 CST

Original text of this message