Cache-aside, write-behind, magic and why it sucks being an Oracle customer

I’ve been looking at a few different technologies to improve the scalability of one of our applications. We’re scaling pretty well to be honest, considering that we currently have a traditional database-centric solution. The cost of scaling a database-bound application running Oracle is crazy to say the least, considering they charge $47,500 per two x86 cores for Oracle Enterprise Edition. On top of this, it’s 22% per year for software updates and support. As if this wasn’t enough, they also increase the support cost by 4% per year.
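To make those numbers concrete, here is a rough sketch of the arithmetic. It uses only the figures quoted above; the assumptions are mine: that support is 22% of the license price in the first year, that the 4% increase compounds yearly, and that an odd core count is rounded up to the next pair. The class and method names are made up for illustration.

```java
final class OracleCostSketch {
    // Figures quoted in the text above.
    static final double LICENSE_PER_TWO_CORES = 47_500.0;
    static final double SUPPORT_RATE = 0.22;   // of the license price, per year
    static final double SUPPORT_UPLIFT = 0.04; // yearly increase of the support fee

    // One-off license cost; rounding odd core counts up is an assumption.
    static double licenseCost(int cores) {
        int pairs = (cores + 1) / 2;
        return pairs * LICENSE_PER_TWO_CORES;
    }

    // Support fee for a given year (1-based); compounding is an assumption.
    static double supportCost(int cores, int year) {
        double firstYear = licenseCost(cores) * SUPPORT_RATE;
        return firstYear * Math.pow(1.0 + SUPPORT_UPLIFT, year - 1);
    }

    // License plus support over the given number of years.
    static double totalCost(int cores, int years) {
        double total = licenseCost(cores);
        for (int year = 1; year <= years; year++) {
            total += supportCost(cores, year);
        }
        return total;
    }

    public static void main(String[] args) {
        // A single 8-core server over three years.
        System.out.printf("8 cores, 3 years: $%,.0f%n", totalCost(8, 3));
    }
}
```

Under these assumptions an 8-core box costs $190,000 in licenses up front and roughly another $130,000 in support over three years, and that is for one environment only.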

You might think that the price above is only for production environments, but in fact you have to pay for every single installation throughout the organization. There are no discounts for staging, DR, test or development environments.

I have a piece of advice for all you kids out there considering running Oracle – just don’t do it.

This advice goes for all Oracle products really, as they all have the same pricing model.

Databases are overrated

My strong recommendation is to build an application that doesn’t rely on an underlying RDBMS. The relational database is an overrated, overly complex form of persistent store. Relational databases are slow, and they are usually a single point of failure as well. Does this mean that databases are dead and a thing of the past? No, but the role of the database will probably change going forward. In my opinion we should use the RDBMS as a System of Record that is mostly up to date.

If you ask me, databases are great at mainly two things:

  1. They make the data accessible for other systems in a standard way and
  2. They have a strong query language that many people know

So, write to databases asynchronously and use them for reporting and extracting data. Store the data in a data grid in the application tier (where it’s used).
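A minimal sketch of what that could look like, in plain Java and without any particular data grid product: reads go to memory first and fall back to the database on a miss (cache-aside), while writes are applied in memory and queued for the database (write-behind). All names here (WriteBehindCache, SystemOfRecord, InMemoryRecord) are hypothetical, and a real grid would add partitioning, replication and failure handling that this ignores.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

final class WriteBehindCache<K, V> {
    private final Map<K, V> memory = new ConcurrentHashMap<>();
    private final Queue<K> dirtyKeys = new ConcurrentLinkedQueue<>();
    private final SystemOfRecord<K, V> record;

    WriteBehindCache(SystemOfRecord<K, V> record) {
        this.record = record;
    }

    // Cache-aside read: serve from memory, load from the database on a miss.
    V get(K key) {
        V cached = memory.get(key);
        if (cached != null) {
            return cached;
        }
        V loaded = record.load(key);
        if (loaded == null) {
            return null;
        }
        V raced = memory.putIfAbsent(key, loaded);
        return raced != null ? raced : loaded;
    }

    // Write-behind: update memory now, remember the key for a later flush.
    void put(K key, V value) {
        memory.put(key, value);
        dirtyKeys.add(key);
    }

    // Pushes the latest value of every queued key to the database.
    // In practice a background thread or scheduler would call this.
    int flush() {
        int written = 0;
        K key;
        while ((key = dirtyKeys.poll()) != null) {
            V latest = memory.get(key);
            if (latest != null) {
                record.store(key, latest);
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) {
        InMemoryRecord<String, Integer> db = new InMemoryRecord<>();
        WriteBehindCache<String, Integer> cache = new WriteBehindCache<>(db);
        cache.put("balance:42", 100);
        System.out.println("before flush: " + db.load("balance:42"));
        cache.flush();
        System.out.println("after flush: " + db.load("balance:42"));
    }
}

// Hypothetical stand-in for the RDBMS acting as the system of record.
interface SystemOfRecord<K, V> {
    V load(K key);
    void store(K key, V value);
}

// Fake database used only to make the sketch runnable.
final class InMemoryRecord<K, V> implements SystemOfRecord<K, V> {
    private final Map<K, V> rows = new ConcurrentHashMap<>();
    public V load(K key) { return rows.get(key); }
    public void store(K key, V value) { rows.put(key, value); }
}
```

The trade-off is the one described above: the database lags behind memory until the next flush, so it is a system of record that is only mostly up to date.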

What is a Data Grid?

A Data Grid is a horizontally scalable in-memory data management solution. Data grids try to eliminate data source contention by scaling out data management with commodity hardware.

Some underlying philosophies of data grids – according to Oracle (sic!):

  • Keep data in the application tier (where it’s used)
  • Disks are slow and databases are evil
  • Data Grids will solve your application scalability and performance problems

I have been looking at three different data grid vendors: Oracle Coherence, Gigaspaces EDG/XAP and Terracotta DSO.

Oracle Coherence

I really like this product. It focuses solely on being a potent data grid, with the ability to act as a compute grid as well. Although I haven’t used Coherence for any large projects, its design and concepts are easy to relate to. It supports JTA transactions and consists of a single jar that you drop into your class path. The Coherence configuration doesn’t contain any infrastructural descriptions, which means that you can use the same configuration on your development laptop as in the production environment with multiple servers. The main issue with Coherence is the fact that Oracle has owned it for a few years now.

Gigaspaces XAP

Gigaspaces’ mission seems to be to provide a very scalable application server with XAP – “The Scale-Out Application Server”. The EDG – enterprise data grid – packaging seems to provide about the same feature set as Coherence. The main difference to me is the fact that the Gigaspaces offerings are both application server infrastructure that need configuration, deployments and all of that. As I see things, the main drawback is the application server approach – it feels overwhelming. On the other hand, Gigaspaces is still a smaller company that is eager to do business and provide great implementation support, and the product seems to be a really good application server.

Terracotta DSO

Terracotta has a different approach. They provide Network Attached Memory for the Java heap. If you can write a thread-safe program, you can scale out using Terracotta with no or minor changes to your application. From a technical point of view it’s a beautiful solution: You declare what objects you want to make available using Terracotta, and then Terracotta will make your data persistent (if you want) and available on all clustered nodes. When you invoke new() on a clustered object, you will get a reference to the clustered object (if one exists). Another important difference between Terracotta and the others is that they only send the part of an object that’s been changed rather than the full serialized object graph.
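To illustrate what “a thread-safe program” means here, below is a plain Java class of the kind that paragraph has in mind: shared state guarded by ordinary synchronization. This is only a sketch. It contains no Terracotta API or configuration at all, and how such a class would be declared as clustered is deliberately left out. The class name and the counter name are made up.

```java
import java.util.HashMap;
import java.util.Map;

final class SharedCounters {
    // Shared mutable state; every access goes through a synchronized method.
    private final Map<String, Long> counts = new HashMap<>();

    synchronized long increment(String name) {
        long next = counts.getOrDefault(name, 0L) + 1;
        counts.put(name, next);
        return next;
    }

    synchronized long get(String name) {
        return counts.getOrDefault(name, 0L);
    }

    // Hammers one counter from several threads and returns the final value.
    static long runConcurrently(int threadCount, int incrementsPerThread) {
        SharedCounters counters = new SharedCounters();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                for (int n = 0; n < incrementsPerThread; n++) {
                    counters.increment("logins");
                }
            });
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counters.get("logins");
    }

    public static void main(String[] args) {
        // No increments are lost because access is synchronized.
        System.out.println(runConcurrently(4, 10_000));
    }
}
```

The point is that correctness depends on locking discipline in your own code rather than on a transaction manager, which is exactly the shift in thinking discussed below.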

I’m in love with this product. It’s free and open source too, and Terracotta Inc provides commercial support. The main concern I have with Terracotta is that it’s really a paradigm shift for the average Java enterprise developer to start writing multi-threaded programs without having JTA transactions. Another concern is the magic – the low-level hooks they put into the JVMs. At the time of writing, only Sun and IBM JVMs are supported. It runs fine on OSX though.

The bottom line

So which one is better? Well, that depends on a lot of things as always. If you decide to move to the grid it’s going to require retraining of your developers regardless of what solution you go for.

Please do keep in mind that products don’t usually solve your problems, and that you can go a long way using a less expensive RDBMS by partitioning the data across multiple servers – sharding. This is what a lot of large sites out there do.
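The simplest form of sharding is to hash a key and use the result to pick a database server. The sketch below shows that idea only; the shard names and the customer key format are invented for the example, and real setups often use lookup tables or consistent hashing instead so that adding a server doesn’t move most of the keys.

```java
import java.util.List;

final class ShardRouter {
    private final List<String> shards;

    ShardRouter(List<String> shards) {
        if (shards.isEmpty()) {
            throw new IllegalArgumentException("at least one shard is required");
        }
        this.shards = List.copyOf(shards);
    }

    // Maps a key to a shard; the same key always lands on the same shard
    // as long as the shard list doesn't change.
    String shardFor(String customerKey) {
        // floorMod keeps the index non-negative even for negative hash codes.
        int index = Math.floorMod(customerKey.hashCode(), shards.size());
        return shards.get(index);
    }

    public static void main(String[] args) {
        // Hypothetical database server names.
        ShardRouter router = new ShardRouter(List.of("db-0", "db-1", "db-2"));
        System.out.println(router.shardFor("customer-1001"));
    }
}
```

With plain modulo hashing, changing the number of shards remaps most keys, so the shard count is something to decide up front or to hide behind a smarter routing scheme.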

Further reading:
The Coming of the Shard
eBay’s Architectural Principles
Oracle Coherence
Gigaspaces EDG
Terracotta DSO

More for less

Since I started working in IT, I’ve always focused on helping organizations increase productivity and/or cut costs. Since I joined Unibet I am constantly challenged by my managers to cut operational cost, while at the same time making the system and platform more performant, available and scalable. You might think that this would suck, but it’s really great fun. It’s very rewarding and not too difficult really. My approach is to question everything. Start asking questions! Why? How much does it cost? What value does it provide?

So far I’m looking at a number of different areas where we can cut some cost.

Example #1 – KISS = Effectiveness

After a few months in my new position I started to question the current technical setup we had. We have one site in Malta and another one in Costa Rica. The latter site was set up a few years ago, and they moved some of our markets there for legal reasons. What surprised me was that no one was challenging this decision or reevaluating it, even though it seemed to cause major issues in production and increased development costs by quite a bit – obviously TTM suffered too. So, I decided to look at what our competitors are doing and I quickly came to the conclusion that they seemed to have a much more straightforward IT infrastructure.

The next step was obviously to try to change this so that we could be more effective, provide a better service, and improve TTM – while at the same time cutting costs. And the way to do this in any organization is to present a business case that explains what the rationale for making the change is. With the help of the colleagues in the IT management team I delivered a business case to my managers, which in turn was presented to legal (who had been advocating for setting up the second site in the first place). They were baffled by what the actual cost was for the current setup, and also that no one had really explained to them before what the implications of their requirement were. As it was very hard to justify the direct and indirect costs of having the second data center in production, we are now, four months later, not in Costa Rica anymore.

Example #2 – Bandwidth costs

So, we run our business solely off Malta, a not particularly interesting rock in the Mediterranean. Bandwidth costs are insanely high in Malta due to the lack of competition in this space – and we require quite a lot of it. Most of our competitors run their systems closer to mainland Europe (London, Vienna, Gibraltar, Isle of Man and Madrid to name the most popular hosting locations for e-gaming). Legally they probably take a slightly higher risk by doing so, but they gain better performance as they are closer to the customers and they have lower costs – hosting and bandwidth costs are 30-50% of what we pay in Malta.

For this reason I was curious whether it was allowed to run off a Maltese e-gaming license outside of Malta. After reading up on the Maltese LGA’s laws and regulations, I found out that it’s allowed to have everything except the very core pieces of the site outside of Malta.

So, we move more and more stuff onto the Content Delivery Network. Currently we are diverting more than 50% of the traffic to the CDN and hence we could reduce our bandwidth costs in Malta by a lot. ROI from day one!

Example #3 – Support and software license costs

Another huge operational expense is the license fees we’re paying to companies such as Oracle and BEA. Oracle has a really good product that I don’t mind paying for, but the issue here was that we paid too much (for too many CPUs). We had database (disaster) replication using Oracle Data Guard to a server on the same site. We also had as many CPUs active in the Data Guard as in the production databases. I read up on the fine print of the Oracle license agreement and quickly came to two conclusions: We shouldn’t use more than one or two CPUs in the replication database. We can consolidate smaller databases and save license costs. On top of this it was fairly easy to look at parts of the application and rewrite them to minimize load on the production database. In about four months we managed to reduce the load on the main database by 50% or more, hence cutting the Oracle licensing costs by the same amount.

As for BEA WebLogic costs, I don’t really see a point in paying them going forward. Application servers are becoming a commodity (as in open, free software), and BEA’s product isn’t really providing the business value to justify its cost. BEA support is infamous for its terrible offshore first line in India, and you get no help from them unless you reproduce the problem yourself, write the test case and submit it. You end up doing what you pay them to do for you. Let me just say, I’ll eat my hat if we’re still paying for BEA’s services eight months from now.