Issues and Fixes
I have Issues!
Let’s now look at the issues themselves: the bugs behind all the complaints! First of all, how are they caught? Figure 1.14 shows the variety of paths a bug can take before you notice it. The most common way (31%) is via user reports and user feedback. In short, we as an industry are failing to test properly. Also, 20% of respondents say that some of their system faults or crashes are a result of performance issues. Most of the remaining half (46%) of issues are found through tooling, including performance monitoring tools such as APMs, profilers and home-grown software.
Let’s switch focus slightly to understand the symptoms of typical performance issues. The overwhelming symptom is users failing to complete requests. We’re not talking about a user giving up on a request because it’s slow; that accounted for just 12%. Application or server outage also comes in at almost 17%, but the focus really is on incomplete requests.
Getting to the Root of the Problem
Let’s look back at previous performance issues that have already been caught, diagnosed and fixed, and understand what their root causes were. After all, it is crucial to use our history to decide how we strategize and focus our future performance testing. In figure 1.16, a huge proportion of the root causes sit around the database, with too many database queries (38.6%) or slow database queries (54.8%) being the culprits. Inefficient application code is also a problem at 51.47%. Concurrency, configuration issues and memory leaks sit at around 20-30%, with GC pauses and slow DB just under 20% each. Other root causes seem less frequent. It’s interesting to see that disk IO and network IO, which are often considered to be high in latency, do not really appear as much of a root cause of performance issues relative to the other culprits.
Again, we can see that it was very common for people to have multiple root causes across many different issues. In fact, figure 1.17 shows us that almost two thirds responded with two or more root causes.
We know more about the issue now, so let’s try to understand the fix a little more. Different people will take different amounts of time to fix different issues in different ways. I guess I’m trying to say we’re all different, so let’s look at how long we tend to take to find, fix, and test issues. Figure 1.18 shows that the most common answer was 0-1 days. This can be considered pretty fast, but of course it all depends on the complexity of the issue. Over half the respondents (52%) take less than half a week to fix an issue, but as you can see from the graphic, there’s a pretty long tail which pulls the average out to just under a week at 4.38 days. Just over one in four respondents (27%) claimed their fixes took a full week or more to diagnose, fix and test. Later on, it will be interesting to see who takes longer to fix the bugs. *Spoiler* it is indeed interesting!
The graphic in figure 1.19 shows us an overwhelming statistic with three out of four respondents claiming their performance issues tend to affect their end users in production. This is huge, and a clear stat that needs to be improved to give our users a better experience with our applications. Actually, if you heed the data from this report and care about performance as if you were a user, you could really get an edge over your competitors by being one of the 25%. After all, if you have two services to choose from and one is plagued with performance issues – while the other works without pain – it’s clear which you’d use.
The big question, and in fact one of the clearest objectives we have when running performance testing, is to make our application more performant, duh! But do we actually test this? That is to say, do we measure how much more performant our fix makes our application? That may sound like a stupid thing to ask, but according to the data, one in four respondents don’t test, don’t compare before and after, or don’t know how to compare. We have wonderful tools available like JMeter and Gatling which provide us with benchmarking for our applications and should really be used to test performance fixes. Just over one in four people state their application is only marginally faster. The remainder, which is almost half the respondents, see a sizable increase, from 50% faster to more than 3 times as fast.
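To make the before-and-after comparison concrete, here is a minimal sketch in plain Java. It assumes you have already exported response-time samples (in milliseconds) from a load test run before the fix and another run after it. The class name, method names and sample values are all made up for illustration; none of this is part of the JMeter or Gatling APIs or of the survey data. The median is used rather than the mean so that a single slow outlier does not dominate the comparison.

```java
import java.util.Arrays;

// Hypothetical helper for comparing response times before and after a fix.
// Assumes each array holds at least one sample, in milliseconds.
class BeforeAfterSketch {

    // Median of the samples; the input array is left unmodified.
    static double median(double[] samplesMs) {
        double[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 0
                ? (sorted[mid - 1] + sorted[mid]) / 2.0
                : sorted[mid];
    }

    // Speedup factor based on medians: 2.0 means the fixed build
    // responds twice as fast, 1.0 means no measurable change.
    static double speedup(double[] beforeMs, double[] afterMs) {
        return median(beforeMs) / median(afterMs);
    }

    public static void main(String[] args) {
        // Made-up sample values, purely for illustration.
        double[] before = {120, 135, 128, 410, 131};
        double[] after = {60, 66, 64, 190, 65};

        System.out.printf("Median before: %.1f ms%n", median(before));
        System.out.printf("Median after:  %.1f ms%n", median(after));
        System.out.printf("Speedup:       %.2fx%n", speedup(before, after));
    }
}
```

Even a rough comparison like this moves you out of the "don’t compare before and after" group. A real measurement would want many more samples, a warmed-up JVM and the same load profile for both runs.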
Another metric we can use to determine how successful our performance testing has been is the number of issues we’ve found in an application release. This will of course depend on the size of the release, which we’ll pivot on in later sections, but just looking at the raw data for the question reveals a split. The majority find approximately 0-3 issues (69%), while around 25% find 4-20 issues and 10% find more than 20 issues each release. The average number of issues found is just over 5 per release, but as mentioned before, this data is somewhat meaningless until we pivot on it in later parts. Next up, it’s Part 2, so you don’t have to wait long!