Have you walked down the ORM road of death?
April 30, 2009
A friend of mine asked me a really good question tonight:
Hey Stefan,
It would be great if you could please give me a sense for how many development teams get hit by a database bottleneck in JEE / Java / 3-tier / ORM / JPA land? And, how they go about addressing it? What exactly causes their bottleneck?
I think most successful apps – scaling problems are hopefully a sign that people are actually using the stuff, right? – built with Hibernate/JPA hit db contention pretty early on. From what I’ve seen this is usually caused by doing excessive round-trips over the wire or returning too large data sets.
And then we spend time fixing all the obviously broken data access patterns, first by using HQL instead of standard eager/lazy fetching or by tuning existing HQL, and then by dropping down to direct SQL if needed.
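The round-trip problem can be sketched without any ORM at all. The example below is a minimal illustration in Python with an in-memory SQLite database, not Hibernate code; the customer/bet schema and the function names are hypothetical. It counts the queries issued by a lazy-loading style loop versus a single joined query that returns the same result.

```python
import sqlite3

def make_db():
    """Build a tiny in-memory database (hypothetical schema: customers and their bets)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE bet (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    """)
    db.executemany("INSERT INTO customer VALUES (?, ?)",
                   [(i, f"customer-{i}") for i in range(1, 51)])
    db.executemany("INSERT INTO bet (customer_id, amount) VALUES (?, ?)",
                   [(i, 10 * i) for i in range(1, 51)])
    return db

def totals_lazy(db):
    """Roughly what naive lazy fetching does: one query for the parents, then one per parent."""
    queries = 1
    totals = {}
    for customer_id, name in db.execute("SELECT id, name FROM customer").fetchall():
        row = db.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM bet WHERE customer_id = ?",
            (customer_id,)).fetchone()
        queries += 1
        totals[name] = row[0]
    return totals, queries

def totals_joined(db):
    """The same result fetched in a single round-trip with a join."""
    rows = db.execute("""
        SELECT c.name, COALESCE(SUM(b.amount), 0)
        FROM customer c LEFT JOIN bet b ON b.customer_id = c.id
        GROUP BY c.id, c.name
    """).fetchall()
    return dict(rows), 1
```

With 50 customers the lazy version issues 51 queries and the joined version issues one. Over a real network each of those queries is a round-trip, which is where the contention tends to come from.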
I believe the next step after this is typically to try to scale vertically, both in the db and app tier. Throwing more hardware at the problem may get us quite a bit further at this point.
Then we might get to the point where the app gets fixed so that it actually makes sense to scale horizontally in the app tier. We will probably have to add a load balancer to the mix and use sticky sessions by now.
And then we will perhaps find out that we will not do that very well without a distributed 2nd level cache, and that all our direct SQL code writing to the DB (which bypasses the 2nd level cache) won’t allow us to use a 2nd level cache for reads either…
Here is where I think there are many options and I’m not sure how people tend to go from here. Here we might see some people abandoning ORM, while others may try to get the 2nd level cache to work?
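Why direct SQL writes and a 2nd level cache don't mix can be shown with a toy model. This is only a sketch in Python, with plain dictionaries standing in for the database and the cache; the class and method names are hypothetical and this is not how Hibernate implements its cache.

```python
class CachedStore:
    """Toy model: a 'database' dict fronted by a '2nd level cache' dict."""

    def __init__(self):
        self.db = {}     # stands in for the database table
        self.cache = {}  # stands in for the 2nd level cache

    def read(self, key):
        # Reads are served from the cache when possible.
        if key not in self.cache:
            self.cache[key] = self.db[key]
        return self.cache[key]

    def write_via_orm(self, key, value):
        # A write through the ORM updates the database and the cache together.
        self.db[key] = value
        self.cache[key] = value

    def write_direct_sql(self, key, value):
        # A direct SQL write only touches the database; the cache is never told.
        self.db[key] = value


store = CachedStore()
store.write_via_orm("balance", 100)
store.write_direct_sql("balance", 50)
stale = store.read("balance")  # still the old cached value, not what the database holds
```

After the direct write the cache keeps serving the old value, so either every direct write has to evict the affected entries or the cache cannot be trusted for reads.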
Are these the typical steps for scaling up a Java Hibernate/JPA app? What’s your experience?
Web pages are disappearing?
April 30, 2009
I believe the page (url) is becoming more of a task-oriented landing area where the web site will adapt the content to the requesting user’s needs. I believe the divorce between content and pages is inevitable. It will be interesting to see how this will affect the KPI:s and analytics tools we currently use, as well as search engine optimization practices, going forward.
I recently attended a breakfast round-table discussion hosted by Imad Mouline. Imad is the Chief Technology Officer of Gomez. For those who aren’t familiar with Gomez, they specialize in web performance monitoring. It was an interesting discussion with participants from a few different industries. Participants were either CTO:s or CTO direct reports.
Imad shared a few additional trends regarding web pages (aggregated from the Gomez data warehouse):
- Page weight is increasing (kB/page)
- The number of page objects is plateauing
- The number of origin domains per page is increasing
We covered a few different topics, but the most interesting discussion (to me) was related to how web pages are being constructed in modern web sites and what impact this has on measuring service level key performance indicators (KPI:s).
In order to sell effectively you need to create a web site that really stands out. One of the more effective ways of doing this is to use what we know about the user to contribute to this experience.
In general we tend to know a few things about each site visitor:
- What browsing device the user is using (User-Agent HTTP header)
- Where the user is (geo-ip lookup)
- What the user’s preferred language is (browser setting or region)
- Whether the user is a returning customer or not (cookie)
- The identity of the customer (cookie) and hence possibly age, gender, address etc
- What time of day it is
So we basically know the how, who, when, where and what’s. In addition to this we can use data from previous visits to our site, such as click stream analysis, order history or segmentation by data warehouse analysis fed back into the content delivery system to improve the customer experience.
For example, when a user visits our commerce site we can use all of the above to present the most relevant offers in a very targeted manner to that user. We can also cross-sell efficiently and offer bonuses if we think there is a risk of this being a lapsing customer. We can adapt to the user’s device and create a different experience depending on if the user is visiting in the afternoon or late night.
If we do a good job with our one-to-one sales experience, the components and contents delivered on a particular page (url) will in other words vary depending on who’s requesting it, from where the user is requesting it, what device is used, and what time it is. Depending on the application and the level of personalization, this will obviously impact both the non-functional and functional KPI:s: What is the conversion rate for the page? What is the response time for the page?
Sunset
April 25, 2009
I am a long time fan of Robert X. Cringely and I was looking forward to his comments on the Oracle/Sun debacle. Here’s what he said in his blog – I couldn’t agree more:
it ends with the heart of Sun moving a few miles up 101 to where it will certainly die.
But for the most part what Oracle will do with Sun is show a quick and dirty profit by slashing and burning at a produgious rate, cutting the plenty of fat (and a fair amount of muscle) still at Sun. If you read the Oracle press release, the company is quite confident it is going to make a lot of money on this deal starting right away. How can they be so sure?
It’s easy. First drop all the bits of Sun that don’t make money. Then drop all the bits that don’t fit in Oracle’s strategic vision. Bring the back office entirely into Redwood Shores. The cut what overhead is left to match the restructured business. Sell SPARQ to some Asian OEM. Cut R&D by 80 percent, saving $2.4 billion per year. I’m guessing sell StorageTek, maybe even to IBM. And on and on. Gut Sun and milk what remains.
Read more at http://www.cringely.com/2009/04/sunset/
Regarding my previous post – I think that the acquisition is the start of a long death process for Java open source. I do not expect Oracle to announce the death of anything, but it will nevertheless die unless fully embraced by Oracle. The sun will surely set on Glassfish and the rest of the projects that neither make any money for Sun nor are of strategic interest to Oracle.
Oracle kills Open Source Java with a really big rock?
April 20, 2009
Being known as the guy that called Oracle evil in a blog post, I feel I gotta comment on today’s announcement that Oracle is buying Sun Microsystems for 7.4 billion US dollars. As you can imagine, I’m not very optimistic.
What does the deal really mean to the Open Source Java community? Isn’t this just business as usual? And wouldn’t we be worse off if Big Blue had bought Sun a couple of weeks back?
As you might have guessed, I would have preferred IBM to buy Sun for many reasons. Perhaps the main thing is that I feel IBM has been embracing open source, whereas Oracle hasn’t. It makes all the difference. Let’s hope Oracle sees the light and doesn’t screw up everything Java!
The Sun assets at stake:
Development Tools: NetBeans
Middleware: OpenSSO, Glassfish, MySQL, Java Hotspot JVM, Java Real Time System
Consumer Technology: OpenOffice, JavaFX / JavaFX Mobile
I’ll try to describe what I think is a likely outcome of the assets above by comparing them to Oracle’s current product line and let’s see how bad this actually can get…
Sun NetBeans vs Oracle JDeveloper
This is easy – no one uses JDeveloper, and it would surprise me if Oracle didn’t bite the bullet and ditch JDeveloper for NetBeans which has become a really good (the best?) IDE recently.
Sun OpenSSO vs Oracle Access Manager
There will be no point for Oracle to invest any money in OpenSSO when they already have a good offering in their Fusion middleware suite. OpenSSO is toast.
Sun Glassfish vs Oracle Weblogic
Oracle has a stronger app server in Weblogic than Sun has in Glassfish. I think that Glassfish will be put to the axe. Quickly. Oracle is not known for giving away software and they will only open source software that needs life support. Some recent examples include TopLink and ADF Faces.
Sun MySQL vs Oracle RDBMS
I think this is _the_ most obvious: MySQL is TheirSQL now and also R.I.P.
Sun Java Hotspot vs Oracle JRockit
Being the cynical person I am, I think Oracle can kill two birds with one stone here. One might think that Oracle would merge the two VM efforts into one, but the result might not be what you think. Let’s just assume that Oracle takes the good stuff from JRockit and puts it into the Hotspot JVM reference implementation. Is this a likely scenario? Hell no. Oracle is in the software business to make money, and if you want to run a production-grade Java Server VM, then you will have to get it from Oracle for a fee. By doing this they also effectively kill Terracotta, the only viable contender to Oracle Coherence. See, Terracotta will not run on JRockit… The consumer JVM will be named Sun JVM. Oracle will of course keep the Java Real Time VM as it’s a profitable business.
OpenOffice, JavaFX / JavaFX Mobile
Oracle’s track record in building consumer applications is, well, not great. Anyone that’s ever tried to install an Oracle product knows what I’m talking about. So I don’t think OpenOffice will survive either. On the other hand Larry may want to keep pushing it just to be a thorn in Microsoft’s side… As for JavaFX, it doesn’t really stand a chance versus Adobe – it’s too little, and too late. Oracle knows this and will kill it. Quietly.
So, is this good for anyone at all? Yes. Oracle and Microsoft. Everyone else loses. IBM is in a really awkward situation and JavaOne this year may be the ultimate funeral service for (free) open source Java.
Keep in mind that when Oracle says open they generally mean open standards, whereas when IBM and Sun say open they generally mean free open source software.
I’ll close with a few Larry Ellison quotes from the conference call:
“Java is the foundation of the Oracle Fusion middleware and its the second most important software asset we’ve ever acquired.”
“We acquired BEA because they had the leading Java virtual machine”
Unibet Privacy Proxy
February 18, 2009
One of the cat-and-mouse games we play with the regulating authorities in the e-gaming space is the blocking/anti-blocking game.
To give you some background on what’s going on we need to look at the legal landscape in the EU for e-gaming.
Most EU member states try to enforce a (state-owned) monopoly on offline and online gaming. The EU, on the other hand, is pro-competition and wants to open up the markets on equal terms for privately owned operators, with a licensing process for each country, in line with articles 59 and 60 of the Rome treaty.
Article 59. Within the framework of the provisions set out below, restrictions on freedom to provide services within the Community shall be progressively abolished during the transitional period in respect of nationals of Member States who are established in a State of the Community other than that of the person for whom the services are intended.
The Council may, acting by a qualified majority on a proposal from the Commission, extend the provisions of this Chapter to nationals of a third country who provide services and who are established within the Community.
Article 60. Services shall be considered to be ’services’ within the meaning of this Treaty where they are normally provided for remuneration, in so far as they are not governed by the provisions relating to freedom of movement for goods, capital and persons.
‘Services’ shall in particular include:
* (a) activities of an industrial character;
* (b) activities of a commercial character;
* (c) activities of craftsmen;
* (d) activities of the professions.
Without prejudice to the provisions of the Chapter relating to the right of establishment, the person providing a service may, in order to do so, temporarily pursue his activity in the State where the service is provided, under the same conditions as are imposed by that State on its own nationals.
However, the member states haven’t been very keen on letting a huge amount of profit from their fully owned, state-operated lotteries, casinos and betting companies be subject to outside competition. Not to mention revenue from the tax on gaming…
As you can imagine, there is not a whole lot of interest from the member states in opening up the monopolies and risking being subject to competition. So the EU is taking legal action against these member states and fining them until they do open up the markets.
So not too many member states have had an open market, the exceptions being the UK and Italy. As the UK has had privately held operators in a regulated market for many years the government doesn’t have any interests to protect, but this is far from the case in the rest of the member states.
One of the quirks with what I guess one can call “reversed e-commerce” (when the customers occasionally gain money from using a service) is that an EU citizen can use any e-gaming provider outside of the national borders, but is still subject to national tax. Also, if a privately held operator wants to apply for a license to operate in an EU member state that decides to open up a regulated market, that operator needs to withhold the tax for the customers and also pay tax on its profits.
In order to force the nationals of a member state to only play with the licensed companies in a regulated market, the strategy is to try to cut off access to other online e-gaming services by DNS-blocking (being practiced in Italy at the moment). Sweden has said that they will use IP-address level blocking in addition to DNS blocking to restrict Swedes to only access the licensed betting sites.
This really annoys me. One of the reasons the Internet has become what it is today is openness, and I am very concerned that countries in the free world are now using a Chinese-style approach to protect revenue streams from gambling.
And when I get annoyed, I try to help people circumvent these communist approaches.
Enter the Unibet Privacy Proxy
So, I decided to set up a proxy in the cloud – in this case using Amazon EC2. I spent an evening setting up a Linux image with Squid (a proxy server) and an Apache web server.
On the Apache web server I serve a proxy auto-configuration file that routes only the URL:s used by Unibet via the proxy. All other URL:s don’t use the proxy.
The Squid proxy is an open proxy, but it only proxies URL:s for Unibet servers.
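A real proxy auto-configuration file is a small JavaScript file whose FindProxyForURL(url, host) function returns a string such as "PROXY host:port" or "DIRECT". The routing decision itself can be sketched as follows, here in Python for illustration only. The proxy address proxy.example.org:3128 and the domain list are assumptions, not the actual configuration used.

```python
from urllib.parse import urlparse

# Hypothetical values for illustration; the real host names and proxy address may differ.
PROXIED_DOMAINS = ("unibet.com",)
PROXY = "PROXY proxy.example.org:3128"  # 3128 is Squid's default port

def find_proxy_for_url(url):
    """Mirror the decision a PAC file makes: proxy only the listed domains, go direct otherwise."""
    host = (urlparse(url).hostname or "").lower()
    for domain in PROXIED_DOMAINS:
        # Match the domain itself and any of its subdomains, but not look-alikes.
        if host == domain or host.endswith("." + domain):
            return PROXY
    return "DIRECT"
```

Matching on the exact domain or a dot-prefixed suffix keeps a host like notunibet.com from being sent through the proxy by accident.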
Then, to simplify use and configuration, I spent a few nights writing an add-on for Firefox. I’m looking into doing the same for Internet Explorer.
I love these projects, because I learn a lot by doing them. In this case I learned a lot about both Amazon EC2 and Firefox add-on development! Hopefully I can manage to get the IE add-on done too at some point!
Happy proxying!
Speed sells
February 1, 2009
This coming week (first week of February), Unibet launches its revamped website based on the Facelift project I lead. As a part of this effort, we have worked extremely hard in order to lower page loading times. We have invested a substantial amount of time and money focusing on improving performance. Is this really justified?
A 2006 study by Jupiter Research found that the consequences for an online retailer whose site underperforms include diminished goodwill, negative brand perception, and, most important, significant loss in overall sales. Online shopper loyalty is contingent upon quick page loading, especially for high-spending shoppers and those with greater tenure.
The report ranked poor site performance second only to high prices and shipping costs as the main dissatisfaction among online shoppers. Additional findings in the report show that more than one-third of shoppers with a poor experience abandoned the site entirely, while 75 percent were likely not to shop on that site again. These results demonstrate that a poorly performing website can be damaging to a company’s reputation; according to the survey, nearly 30 percent of dissatisfied customers will either develop a negative perception of the company or tell their friends and family about the experience.
+500 ms page load time led to a 20% drop in traffic at Google
Marissa Mayer ran an experiment where Google increased the number of search results from ten to thirty per page. Traffic and revenue from Google searchers in the experimental group dropped by 20%.
After a bit of looking, they found an uncontrolled variable. The page with 10 results took 400ms to generate. The page with 30 results took 900ms. Half a second delay caused a 20% drop in traffic. Half a second delay killed user satisfaction.
“It was almost proportional. If you make a product faster, you get that back in terms of increased usage”
-Marissa Mayer, VP Search Product and User Experience at Google
The same effect happened with Google Maps. When the company trimmed the 120KB page size down by about 30 percent, the company started getting about 30 percent more map requests.
+100 ms page load time led to -1% sales at Amazon
Amazon also performed some A/B testing and found that page load times directly impacted the revenue:
“In A/B tests, we tried delaying the page in increments of 100 milliseconds and found that even very small delays would result in substantial and costly drops in revenue.”
-Greg Linden, Amazon.com
There are a number of tools and best-practices available to improve web-site performance. I particularly like the work of Steve Souders. Steve was the Chief Performance Yahoo! (at Yahoo! obviously) and is now at Google doing web performance and open source initiatives.
When at Yahoo, Steve published a benchmark and tool called YSlow, which is a good indicator of how well the front-end web technology (HTML, JavaScript, images etc) of your site is implemented. The front-end accounts for almost 90% of the page load time at most e-commerce sites.
At Unibet, our old HTML had a YSlow score of 56/100 on average. This is about average in the e-gaming industry. However, the Facelifted version just out is 96/100. As a comparison, the eBay start page is 97/100 and the Yahoo! start page is 95/100. This should result in reduced wait, and based on the research above this will help drive revenue and customer satisfaction.
We have worked extremely hard in order to lower page loading times. We have invested a substantial amount of time and money in doing so. Is this really justified? YES! I am confident that our new site will contribute to increased sales and increased customer lifetime value.
Facelift and EDA
October 11, 2008
I’m getting back in shape! Since my last post I’ve become a dad again and life is good. I’ve taken up exercise again, and run almost every day. I’ve lost 10 kg and am starting to look somewhat fit again…
Operation Facelift
Fortunately for the company – we’re getting in shape at work too! We’re currently moving to XHTML 1.1 strict and a floating layout with skinning support. We’re also moving to 100% YUI and optimizing for SEO and performance. TTM and TCO will decrease significantly too. Man, I love to work with great front-end people. This is going to rock!
Event Driven Architecture
I’ve also kicked off a huge push where all the product teams will start aligning their architectures to an Event Driven Architecture (EDA) model. EDA is great for separation of concerns (the registration module of the customer system will let everyone who cares know that a new customer has registered) and also for getting a scalable architecture (async async async!).
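The registration example can be sketched as a tiny publish/subscribe bus. This is a minimal in-process illustration in Python, with hypothetical names throughout (EventBus, the "customer.registered" event type, the two subscribers). It dispatches synchronously to stay short; in a setup like the one described, a message broker would sit between publisher and subscribers and deliver asynchronously.

```python
from collections import defaultdict

class EventBus:
    """Minimal in-process publish/subscribe bus (synchronous, for illustration only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # The publisher does not know who is listening, or whether anyone is.
        for handler in self._subscribers.get(event_type, []):
            handler(payload)


bus = EventBus()
welcome_mails = []   # stands in for a mailing system
bonus_accounts = []  # stands in for a bonus system

bus.subscribe("customer.registered", lambda event: welcome_mails.append(event["email"]))
bus.subscribe("customer.registered", lambda event: bonus_accounts.append(event["customer_id"]))

def register_customer(customer_id, email):
    """The registration module only announces the fact; it calls no other system directly."""
    bus.publish("customer.registered", {"customer_id": customer_id, "email": email})
```

Adding a third interested system means adding one more subscribe call, with no change to the registration module, which is the separation of concerns being described.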
Maven2 and Hudson
We’ve gotten all the product teams up on Maven2 from the god-forgotten shell-script/ant mess we had a year ago. Hudson is used for continuous building. I love Hudson – highly recommended! We also migrated to Subversion (finally).
The (almost) perfect (rich) website
May 27, 2008
I am personally a fan of light-weight web pages that use W3C standards based elements and layout. However, many commercial web sites seem to want to move to a more “print-like” experience.
The cost of moving to a richer experience is usually higher maintenance cost and round trip time – you need the graphics or flash guys for many changes. SEO (Search Engine Optimization) suffers as the graphics can’t be indexed by the web crawlers, and you usually take a hit on page load times too.
Wouldn’t it be great if you could make a web site that is:
- Great looking
- SEO friendly
- Quick to load and render
- and is XHTML compliant
We have come a long way at unibet.com, but we made some compromises in look and feel for speed, and we still have article headers using generated images. This has bothered me for some time. One of our consultants mentioned that he knew of someone who used Flash for rendering headlines, and it sounded like a good idea to me. I did some research and stumbled upon sIFR.
sIFR (or Scalable Inman Flash Replacement) is a technology that allows you to replace text elements on screen with Flash equivalents. Put simply, sIFR allows website headings, pull-quotes and other elements to be styled in whatever font the designer chooses – be that Foundry Monoline, Gill Sans, Impact, Frutiger or any other font – without the user having it installed on their machine. sIFR provides some javascript files and a Flash movie in source code format (.fla) that you can embed your fonts into. It’s really easy to set up.
To use sIFR on your website you embed the font in the Flash movie; be careful to encode all (but only) the chars you will need, to minimize the size of the movie. Typically the SWF movie is between 8 and 70 kB. This may seem like a lot more than an image, but remember that the SWF will be cached for a very long time in the browser if you’ve set up your web server correctly. Effectively the font flash will only be downloaded once or not at all per site visit.
When you have made the SWF:s you need, just add a few lines of sIFR code into the web page and that’s it.
The following explains the sIFR process in the browser:
- A web page is requested and loaded by the browser.
- Javascript detects if Flash 6 or greater is installed.
- If no Flash is detected, the page is drawn as normal.
- If Flash is detected, the HTML element of the page is immediately given the class “hasFlash”. This effectively hides all text areas to be replaced but keeps their bounds intact. The text is hidden because of a style in the style sheet which only applies to elements that are children of the html.hasFlash element.
- The javascript traverses through the DOM and finds all elements to be replaced. Once found, the script measures the offsetWidth and offsetHeight of the element and replaces it with a Flash movie of the same dimensions.
- The Flash movie, knowing its textual content, creates a dynamic text field and renders the text at a very large size (96pt).
- The Flash movie reduces the point size of the text until it all fits within the overall size of the movie.
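The last two steps amount to a simple shrink-to-fit loop. The sketch below shows the idea in Python; it is not sIFR's actual ActionScript. The text measurement (0.6 em per character, 1.2 em line height, single line) is a crude assumption made purely for illustration, as are the function name and the minimum size.

```python
def fit_point_size(text, box_width, box_height, start_size=96, min_size=6):
    """Start at a very large point size and shrink until the text fits the box.

    Assumes a single line of text and a rough width model; real rendering
    would measure the actual glyphs instead.
    """
    size = start_size
    while size > min_size:
        text_width = len(text) * size * 0.6  # assumed average glyph width of 0.6 em
        text_height = size * 1.2             # assumed line height of 1.2 em
        if text_width <= box_width and text_height <= box_height:
            break
        size -= 1
    return size
```

Because the box dimensions come from the measured offsetWidth and offsetHeight of the original HTML element, the replaced headline ends up occupying the same space as the text it replaces.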
sIFR is a clever hack, but nonetheless a hack. The result is really amazing however. It’s hardly noticeable to the end user and meets all four of the requirements I set up in my “what if…” list above, so we’re moving to sIFR for the next release of unibet.com.
While sIFR gives us better typography today, it is clearly not the solution for the next 20 years.
Further reading:
Cache-aside, write-behind, magic and why it sucks being an Oracle customer
May 25, 2008
I’ve been looking at a few different technologies to improve the scalability of one of our applications. We’re scaling pretty ok to be honest, considering that we currently have a traditional database centric solution. The cost for scaling a database-bound application running Oracle is crazy to say the least, considering they charge $47 500 per two x86 cores for Oracle Enterprise Edition. On top of this it’s 22% for software updates and support per year. As if this wasn’t enough, they also increase the support cost by 4% per year.
You might think that the price above is for production environments, but in fact you have to pay for every single installation throughout the organization. There are no discounts for staging, DR, test or development environments.
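To get a feel for how these numbers add up, here is a back-of-the-envelope calculation in Python using only the figures quoted above. Two things are assumptions: that support is charged on the list price from year one, and that the 4% yearly increase compounds. The function name and the example core count are hypothetical, and actual contracts may well differ.

```python
LICENSE_PER_CORE_PAIR = 47_500  # USD per two x86 cores, as quoted above
SUPPORT_RATE = 0.22             # yearly software updates and support
SUPPORT_UPLIFT = 0.04           # yearly increase of the support cost (assumed to compound)

def oracle_cost(cores, years):
    """Return (license fee, accumulated support cost) for an even number of cores."""
    license_fee = LICENSE_PER_CORE_PAIR * (cores / 2)
    yearly_support = license_fee * SUPPORT_RATE
    support_total = 0.0
    for _ in range(years):
        support_total += yearly_support
        yearly_support *= 1 + SUPPORT_UPLIFT
    return license_fee, support_total

license_fee, support_total = oracle_cost(cores=8, years=3)
```

Under these assumptions, a single 8-core server comes to $190,000 in licenses plus roughly $130,000 in support over three years, and that is before counting the staging, DR, test and development installations.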
I have a piece of advice for all you kids out there considering running Oracle – just don’t do it.
This advice goes for all Oracle products really, as they all have the same pricing model.
Databases are overrated
My strong recommendation is to build an application that doesn’t rely on an underlying RDBMS. The relational database is an overrated, overly complex form of persistent store. Databases are slow, and are also usually a single point of failure. Does this mean that databases are dead and a thing of the past? No, but the role of the database will probably change going forward. In my opinion we should use the RDBMS as a System of Record that is mostly up to date.
If you ask me, databases are great at mainly two things:
- They make the data accessible for other systems in a standard way and
- They have a strong query language that many people know
So, write to databases asynchronously and use them for reporting and extracting data. Store the data in a data grid in the application tier (where it’s used).
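The two patterns named in the title can be sketched in a few lines. The Python below is a simplified single-process illustration, not how any of the data grid products work internally; the class name is hypothetical and a plain dict stands in for the RDBMS. Reads use cache-aside (look in memory first, fall back to the database), writes use write-behind (update memory now, queue the database write for later).

```python
from collections import deque

class WriteBehindStore:
    """In-memory store with cache-aside reads and queued (write-behind) database writes."""

    def __init__(self, database):
        self.database = database  # a dict standing in for the RDBMS
        self.memory = {}          # data kept in the application tier
        self.pending = deque()    # writes not yet applied to the database

    def get(self, key):
        # Cache-aside: serve from memory, load from the database on a miss.
        if key not in self.memory:
            self.memory[key] = self.database[key]
        return self.memory[key]

    def put(self, key, value):
        # Write-behind: the caller returns immediately; the database catches up later.
        self.memory[key] = value
        self.pending.append((key, value))

    def flush(self):
        """Apply queued writes in order; in practice this would run on a timer or worker thread."""
        written = 0
        while self.pending:
            key, value = self.pending.popleft()
            self.database[key] = value
            written += 1
        return written
```

Between a put and the next flush the database lags behind the application tier, which is what "a System of Record that is mostly up to date" means in practice.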
What is a Data Grid?
A Data Grid is a horizontally scalable in-memory data management solution. Data grids try to eliminate data source contention by scaling out data management with commodity hardware.
Some underlying philosophies of data grids – according to Oracle (sic!):
- Keep data in the application tier (where it’s used)
- Disks are slow and databases are evil
- Data Grids will solve your application scalability and performance problems
I have been looking at three different data grid vendors: Oracle Coherence, Gigaspaces EDG/XAP and Terracotta DSO.
Oracle Coherence
I really like this product. It focuses solely on being a potent data grid, with the ability to act as a compute grid as well. Although I haven’t used Coherence for any large projects, its design and concepts are easy to relate to. It supports JTA transactions and consists of a single jar that you drop into your class path. The Coherence configuration doesn’t contain any infrastructural descriptions, which means that you can use the same configuration on your development laptop as in the production environment with multiple servers. The main issue with Coherence is the fact that evil Oracle has owned it for a few years now.
Gigaspaces XAP
Gigaspaces’ mission seems to be to provide a very scalable application server with XAP – “The Scale-Out Application Server”. The EDG – enterprise data grid – packaging seems to provide about the same feature set as Coherence. The main difference, to me, is that both Gigaspaces offerings are application server infrastructure that needs configuration, deployments and all of that. As I see things, the main drawback is the application server approach – it feels overwhelming. On the other hand, Gigaspaces is still a smaller company, eager to do business and provide great implementation support, and the product seems to be a really good application server.
Terracotta DSO
Terracotta has a different approach. They provide Network Attached Memory for the Java heap. If you can write a thread-safe program, you can scale out using Terracotta with no or minor changes to your application. From a technical point of view it’s a beautiful solution: You declare what objects you want to make available using Terracotta, and then Terracotta will make your data persistent (if you want) and available on all clustered nodes. When you invoke new() on a clustered object, you will get a reference to the clustered object (if one exists). Another important difference between Terracotta and the others is that they only send the part of an object that’s been changed rather than the full serialized object graph.
I’m in love with this product. It’s free and open source too, and Terracotta Inc provides commercial support. The main concern I have with Terracotta is that it’s really a paradigm shift for the average Java enterprise developer to start writing multi-threaded programs without having JTA transactions. Another concern is the magic – the low-level hooks they do in the JVM:s. At the time of writing, only Sun and IBM JVM:s are supported. It runs fine on OSX though.
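The idea of shipping only what changed, rather than the whole object, can be illustrated with plain dictionaries. This Python sketch shows the concept only and says nothing about how Terracotta actually instruments the JVM; the function names and the sample customer record are hypothetical.

```python
_MISSING = object()  # sentinel so that a field newly set to None still counts as a change

def changed_fields(before, after):
    """Return only the fields whose values differ from the previous state."""
    return {name: value for name, value in after.items()
            if before.get(name, _MISSING) != value}

def apply_delta(replica, delta):
    """Bring a replica on another node up to date using just the delta."""
    replica.update(delta)
    return replica


before = {"name": "Stefan", "balance": 100, "country": "SE"}
after = dict(before, balance=90)
delta = changed_fields(before, after)  # only the balance field travels over the wire
```

For a large object graph where one field changes, the delta stays the size of that one field, while full serialization grows with the size of the graph.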
The bottom line
So which one is the better? Well, that depends on a lot of things as always. If you decide to move to the grid it’s going to require retraining of your developers regardless of what solution you go for.
Please do keep in mind that products don’t usually solve your problems. And that you can go a long way using a less expensive RDBMS by partitioning the data across multiple servers – sharding. This is what a lot of large sites out there do.
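At its simplest, sharding means picking a database server from a key. The Python sketch below uses a stable hash and a modulo; the key format and the shard count are hypothetical, and large sites often use lookup tables or consistent hashing instead of plain modulo.

```python
import hashlib

def shard_for(key, shard_count):
    """Map a key to a shard number in a way that is stable across processes and restarts."""
    # Python's built-in hash() is randomized per process for strings, so use a real digest.
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

# Every lookup for the same customer goes to the same database server.
shard = shard_for("customer-42", shard_count=4)
```

The catch with plain modulo is that changing the shard count moves most keys to a different shard, so the number of shards is worth planning ahead.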
Further reading:
The Coming of the Shard
eBay’s Architectural Principles
Oracle Coherence
Gigaspaces EDG
Terracotta DSO

Posted by stnor