Developer Productivity Report 2015 at a Glance
Part 1: You and the Data
Tools and Processes
Issues and Fixes
Summary and Conclusion
If I asked you to pick one aspect of RebelLabs that rocks, you might well pick the beautiful reports we’ve produced over the years. If I asked you to pick one standout report, it might well be one of the developer productivity reports we’re well known for. As this is the flagship report RebelLabs produces year after year, it gives me great pleasure to be so heavily involved in the 2015 edition. In fact, I’ve been happier than a kitten under a leaky cow – mooooooooo!
This year we decided to dive deep into a specific area, rather than looking at a broader topic. Given our interest in the performance arena, having created XRebel, our new developer-focused Java profiler, it was a great time to explore a discipline with very few industry experts.
Back in March 2015, I created a list of questions based on Java performance that would give us insights into how teams and organizations go about their performance testing. A few months later I collated this data and examined it thoroughly to find trends that I could share with you in this report. We received 1562 responses to our survey this year.
Ideally, we’d have liked to pass the 2000 mark, but we’re happy that we have enough information on which to base our opinions and find trends. We also raised loads of money for a great charity from completed surveys, so great job to all those who answered all our questions!
I sincerely hope you enjoy reading this report as much as I have enjoyed putting it together, and may all your performance dreams come true!
— Simon Maple
Developer Productivity Report 2015 at a Glance
Java performance is often considered the dark art of software development. Tasks you might expect to be the simplest, like benchmarking a piece of code, can turn out to be among the most complex. It’s inevitable that different people and organizations will approach performance testing in different ways, which in turn means that the benefits they see, if any, will vary from person to person. Many variables, including who runs the performance tests, the toolsets used and the stage at which the tests are run, ultimately affect the success of your application’s performance – and with it, your end users’ satisfaction.
This report sets out to understand how different organizations, and the teams within those organizations, currently approach performance testing. Using the data we have collected, we aim to identify trends, best practices and pitfalls that could be avoided.
The survey itself was released in March 2015 and contained 19 questions, profiling the respondent, their application, their processes, tools and more. Overall we received 1562 responses. Although we wanted to hit the 2000 mark, 1562 is a sufficiently large sample to see trends.
While we expect everyone to be eagerly waiting to complete our survey each year, we understand that your time is valuable. So we again decided to add a charity donation for each respondent who took the time and effort to complete our survey. This year we chose Dogs for the Disabled as our charity. Here’s a description of Dogs for the Disabled from their website:
Dogs for the Disabled is working to provide solutions to help people with a wide variety of different disabilities and conditions; from assistance dogs helping children and adults with physical disabilities and families with a child with autism, to pet dog autism workshops, and innovative new projects working in schools and residential care settings.
– DOGS FOR THE DISABLED, http://www.dogsforthedisabled.org/
We committed to donating $0.50 (USD) to Dogs for the Disabled for every completed survey, which equates to $781. We rounded this up to $1000 because we’re kind and because it’s such a deserving and rewarding charity! So if you were one of the many who completed the survey, you should feel good that you contributed and helped make disabled children’s lives better. High-fives all round!
The report is split into three parts. The first is a representation of the raw answers given to the survey questions. No fluff, no pivoting, just answers! Parts 2 and 3 provide a more in-depth analysis, with pivoting on the data points to understand trends. Pivoting? Wasn’t that a thing in my old physics class with a see-saw, some forces and some kind of formula? Well, maybe, but in this case, we’re asking questions about our data, based on the answers given to other questions. For instance, do those who state their application has over 100 screens find more bugs? Or perhaps: do organizations with dedicated performance teams have fewer complaints from their end users? We’ll ask a raft of questions like this and let our data try to answer them. But for now, let’s start with Part 1 and the raw data.
P.S. Oh and by the way, if you’re just getting started with performance testing, a RebelLabs report that will help you learn and understand the performance basics is also available for you to download – The Developers Guide to Understanding Performance Problems. It will give you a solid grounding in all things related to Java performance, including terminology and tooling. Make sure you read this one too!
Part 1: You and the Data
Show me the Data!
We’re all geeks here and we all love to get our hands dirty in raw data. The good news is, you’re reading the right report! In this section we’ll show you the raw data in graphical form and describe what we can see with a good dose of opinion thrown in. As we described earlier, there will be no pivoting or comparisons at this stage, but be patient, we’ll get to that in the later parts of the report.
We start by profiling the people who filled out the survey, then the application, the organization they work with, the processes they follow… Ok breathe… then the tools they use, the kinds of issues they see, the issues that are fixed… still with me? Almost there… what kind of impact they have and how successful their testing is. Phew, we made it! It sounds like a lot to get through, so grab a coffee/tea/beer/water/drink of choice and let’s get started!
The respondent, aka you!
As we’d expect, the vast majority of those who responded to our survey labeled themselves software developers, as shown in figure 1.1. A further 27% sit in the Architect/Team Lead/Project Manager category. Together these constitute well over 90% of respondents, so we can rest assured that we’re talking to the techies! Interestingly, only 1.54% of respondents were dedicated performance engineers, which means either that there are very few performance engineers out there, or that our survey didn’t reach that specific audience. It’s often difficult to get a survey out to an audience without introducing some bias, and this could be a result of bias in our market reach.
Unsurprisingly, web applications dominate, accounting for over 70% of responses, as shown in figure 1.2. Desktop applications come in at a little over 11%, with batch and mobile at just over 6% and 3.5% respectively. In the “Other” column, respondents mentioned applications such as middleware or application servers. So we can now picture our typical respondent as a software developer who develops a web application. Should we name him/her? How about Sam? Actually, it’s probably better in the long run that we don’t get too attached, so we’ll continue calling them the respondent.
Another facet of an application which is extremely important to understand is its complexity. The graph in figure 1.3 displays how many screens or views the applications have. This gives us a very rough guide to how complex and large an application is. OK, it’s not ideal, but it’s the best and simplest question we could think of that showed us size and complexity without individual opinion or bias. You’ll notice that the majority lie between 10 and 100, with over 55% of respondents choosing that option. The average number of screens is 118.
Let’s find out where our average respondent works. We can start to understand the average organization a little better by looking at how big the teams that develop, test and support the application are. Figure 1.4 shows the distribution of people working on the application, from design and development through to deployment and support. The average is 21.27 people. Remember, this is the arithmetic mean. There isn’t 0.27 of a person trying their best to develop an application using two of their available fingers and 3 toes on their one good leg! With over half of our respondents (55%) stating they work in a team of fewer than 9 people, and 83% stating they work in a team of fewer than 25, it’s safe to say that teams are mostly small. There will always be outliers that pull the average up as organizations grow to enterprise scale, but these are the minority in this survey.
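To see how a handful of very large teams can drag the arithmetic mean well above what most respondents experience, here’s a minimal Java sketch. The team sizes are made-up illustrative values, not survey data; the point is only that the median barely notices outliers while the mean chases them.

```java
import java.util.Arrays;

// Hypothetical team sizes: shows why the mean and median can diverge on skewed data.
class TeamSizeStats {

    /** Arithmetic mean: the sum of all values divided by their count. */
    static double mean(int[] sizes) {
        return Arrays.stream(sizes).average().orElse(Double.NaN);
    }

    /** Median: the middle value once sorted (mean of the two middle values for even counts). */
    static double median(int[] sizes) {
        if (sizes.length == 0) return Double.NaN;
        int[] sorted = sizes.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        // Mostly small teams, plus two hypothetical enterprise-sized outliers.
        int[] sizes = {3, 4, 5, 5, 6, 7, 8, 8, 12, 150, 200};
        System.out.printf("mean = %.2f, median = %.1f%n", mean(sizes), median(sizes));
    }
}
```

With these invented numbers the mean lands above 37 people while the median is 7, which is the same effect (in miniature) that lifts the survey’s average above what the 55% of respondents in small teams actually see.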
Next, we turn to responsibility. We can see straight away from figure 1.5 that responsibility for performance testing lies mostly with the development teams. At just over 55%, the most likely person to be responsible for performance testing is whoever wrote that piece of code. This could be for a couple of reasons. Perhaps we really do live in an agile world and the engineer who develops the code fully tests it – functionally, as well as for performance. Another possibility is that there isn’t any process at all: performance testing is simply dumped on the poor developer, who has little support or knowledge about performance testing and so ultimately ignores it. While we’d all like to think that isn’t the case, the thought does creep into mind. Operations, Performance Teams and QA take similar slices, as does the answer “Nobody”, each at around 11-14%.
The eagle-eyed among you will have seen that our numbers don’t add up to 100% in figure 1.5. Well, you’re right, but that’s because the question was multiple choice, rather than a n00b Excel oversight. So we now want to see how many respondents state that multiple groups are responsible for performance. It turns out that multiple teams are often responsible, as shown in figure 1.6. While just over half of the respondents state that only one team is responsible for performance, the other half state that two or more teams share responsibility for it. In fact, on average 1.71 teams are responsible for performance testing, and most of those who say responsibility is shared split it across two or three teams.
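Here’s a rough sketch of how multiple-choice answers are typically tallied, which is why the percentages can sum past 100%: each option’s share is divided by the number of respondents, not the number of ticks, and the “teams per respondent” average is just total ticks divided by respondents. The four responses below are invented for illustration, and we’re assuming this is broadly how the figures were computed rather than quoting our actual spreadsheet.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MultiChoiceTally {

    /** Percentage of respondents who ticked each option; the values can sum to more than 100%. */
    static Map<String, Double> percentPerOption(List<List<String>> responses) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> answer : responses) {
            for (String option : answer) {
                counts.merge(option, 1, Integer::sum); // one tick per selected option
            }
        }
        Map<String, Double> percents = new LinkedHashMap<>();
        // Divide by respondents, not by total ticks.
        counts.forEach((option, c) -> percents.put(option, 100.0 * c / responses.size()));
        return percents;
    }

    /** Average number of options ticked per respondent (total ticks / respondents). */
    static double averageSelections(List<List<String>> responses) {
        return responses.stream().mapToInt(List::size).average().orElse(0.0);
    }

    public static void main(String[] args) {
        // Hypothetical answers to "Who is responsible for performance testing?"
        List<List<String>> responses = List.of(
                List.of("Developers"),
                List.of("Developers", "QA"),
                List.of("Operations"),
                List.of("Developers", "Performance team", "Operations"));
        System.out.println(percentPerOption(responses));
        System.out.println(averageSelections(responses));
    }
}
```

Here seven ticks across four respondents give an average of 1.75 options each, and the per-option percentages add up to 175%, with no Excel oversight involved.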
Tools and Processes
Process, process, process!
Processes that are put in place (with the best of intentions, usually!) will vary widely from organization to organization. Some organizations will have no process whatsoever and only the good will and diligence of a developer will keep an application from spiraling into a fireball of stack traces and Twitter hatred. Conversely, others (less fortunate than ourselves) will have more processes than lines of code, and twice as buggy. One aspect we care about very much is *when* performance tests are run. You will gain different value from testing at different stages. It’s widely accepted that testing as early as you possibly can results in fixing issues faster and more cheaply than if they are found later. This mentality should apply to performance testing as much as to functional testing.
While there is obvious value in performance testing against a production or staging environment, if we can shift as much as possible to the left, putting in effort earlier in the cycle, we should expect better applications that are cheaper to develop. Figure 1.7 shows that 37% of respondents test their application for performance while it’s being developed. That’s a great statistic and shows how important people believe early testing is. The most common phase in which to run performance tests was system/integration testing. Overall, there is a reasonable spread throughout the application lifecycle, which is reassuring. As this was a multiple choice question, respondents selected 1.87 options on average.
Irrespective of who finds the performance problems, it’s clear from figure 1.8 who fixes the issues that emerge from performance testing. At almost 94%, it’s the developers who apply the fix. Yeah, the same people who wrote the code in the first place. As a result, it makes even more sense to performance test your code as you develop it, as the fix might as well be applied while the bug is being written!
Now we move to the question of profiling frequency. Profiling is an important part of performance testing, so that we can really understand execution paths, bottlenecks and so forth. However, figure 1.9 shows this to be an extremely reactive activity. Over 40% of respondents state they profile their code only when issues arise, rather than on a regular basis. Beyond that answer, the results are largely spread across the other options, with almost one in ten stating they never profile their code at all.
Before we look at how much time is spent performance testing during a release, we need to understand how long releases typically are. We can see the usual bell curve here, albeit a bell on its side. With almost half (46%) of respondents saying their release cycles are between one and six months long, the bulk sit in medium-sized release cycles. 21% of respondents release their application every two weeks or more often, with a further 21% releasing every 2-4 weeks. 10% of respondents have release cycles of six months or longer. But how does this affect the amount of time spent performance testing? We’ll find out later on!
Sadly, the next set of data shows that one in four respondents (26%) spend no time at all on performance testing during a release. Perhaps this says something about the types or size of changes that tend to go into releases. Over 55% of respondents state they spend 1-5 days performance testing each release, which is much more encouraging. The remaining respondents test for one week or more each release.
The proverb suggests that a good worker never blames their tools. We won’t look into how effective each tool is just yet, but we will look at what is being used. VisualVM was an extremely popular choice, as were JProfiler and Java Mission Control, the performance tool bundled with Oracle’s distribution of Java since version 7u40. Interestingly, 20% of respondents state they have their own custom in-house tools. Developers love to develop, right? XRebel is also worth mentioning with over 3% of the votes, particularly because it’s barely a year old!
Another important point to recognize is that there isn’t a single killer tool that does everything. In fact, almost half of those who took the survey use more than one performance profiler for their application. On average, each respondent picked 1.7 tools.
Issues and Fixes
I have Issues!
Let’s now look at the issues themselves – the bugs behind all the complaints! First of all, how are they caught? Figure 1.14 shows the variety of paths through which a bug can come to your attention. The most common way (31%) is via user reports and user feedback. In short, we as an industry are failing to test properly. Another 20% of respondents say they find out through system faults or crashes caused by performance issues. The vast majority of the remaining half (46%) of issues are found through tooling, including performance monitoring tools like APMs, profilers and home-grown software.
Let’s switch focus slightly to understand the symptoms of typical performance issues. By far the most common symptom is users failing to complete requests. We’re not talking about a user giving up on a request because it’s slow; that option got just 12%. Application or server outages also feature for almost 17% of respondents, but the focus really is on incomplete requests.
Getting to the Root of the Problem
Let’s look back at previous performance issues that have already been caught, diagnosed and fixed, and understand their root causes. After all, basing the strategy and focus of our future performance testing on our history is crucial. In figure 1.16, a huge proportion of the activity occurs around the database, with too many database queries (38.6%) or slow database queries (54.8%) being the culprits. Inefficient application code is also a problem at 51.47%. Concurrency, configuration issues and memory leaks sit at around 20-30%, with GC pauses and slow DB just under 20% each. Other root causes seem less frequent. It’s interesting that disk I/O and network I/O, which are often considered high-latency, don’t appear much as root causes of performance issues relative to other culprits.
Again, we can see that it was very common for people to have multiple root causes across many different issues. In fact, figure 1.17 shows us that almost two thirds responded with two or more root causes.
We know more about the issue now, so let’s try to understand the fix a little more. Different people will take different amounts of time to fix different issues in different ways. I guess I’m trying to say we’re all different, so let’s look at how long we tend to take for finding, fixing, and testing issues. Figure 1.18 shows that the most common answer was 0-1 days. This can be considered pretty fast, but of course it all depends on the complexity of the issue. Over half the respondents (52%) take less than half a week to fix an issue, but as you can see from the graphic, there’s a pretty long tail which pulls the average out to just under a week at 4.38 days. Just over one in four respondents (27%) claimed their fixes took a full week or more to diagnose, fix and test. It will be interesting to see who takes longer to fix the bugs later. *Spoiler* it is indeed interesting!
The graphic in figure 1.19 shows us an overwhelming statistic with three out of four respondents claiming their performance issues tend to affect their end users in production. This is huge, and a clear stat that needs to be improved to give our users a better experience with our applications. Actually, if you heed the data from this report and care about performance as if you were a user, you could really get an edge over your competitors by being one of the 25%. After all, if you have two services to choose from and one is plagued with performance issues – while the other works without pain – it’s clear which you’d use.
One of the clearest objectives of performance testing is to make our application more performant, duh! But do we actually test this? That is to say, do we measure how much more performant our fix makes our application? That may sound like a silly question, but according to the data, one in four respondents don’t test, don’t compare before and after, or don’t know how to compare. We have wonderful tools available, like JMeter and Gatling, which provide benchmarking for our applications and really should be used to test performance fixes. Just over one in four people state their application is only marginally faster after a fix. The remainder, almost half of respondents, see a sizable improvement, from 50% faster to more than three times as fast.
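Measuring before and after doesn’t have to start with heavyweight tooling. The following is only a rough sketch: it times two hypothetical versions of the same task (naive `String` concatenation as the “before”, `StringBuilder` as the “after”), discards warm-up runs so the JIT has a chance to compile the hot code, and reports a speedup based on median timings. Hand-rolled timing like this is easily skewed by JIT and GC effects, so for decisions that matter, a microbenchmark harness such as JMH, or load tools like JMeter and Gatling at the application level, is the better choice.

```java
import java.util.Arrays;
import java.util.function.Supplier;

class BeforeAfterBenchmark {

    // Results are written here so the JIT can't treat the timed work as dead code.
    static volatile int sink;

    /** Runs the task repeatedly and returns the median wall-clock time in nanoseconds. */
    static long medianNanos(Supplier<String> task, int warmups, int runs) {
        if (runs <= 0) throw new IllegalArgumentException("runs must be positive");
        for (int i = 0; i < warmups; i++) {
            sink = task.get().length(); // warm-up runs, not measured
        }
        long[] times = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sink = task.get().length();
            times[i] = System.nanoTime() - start;
        }
        Arrays.sort(times);
        return times[runs / 2]; // the median is less sensitive to GC pauses than the mean
    }

    /** Speedup of the new version relative to the old one (2.0 means twice as fast). */
    static double speedup(long beforeNanos, long afterNanos) {
        return (double) beforeNanos / afterNanos;
    }

    // Hypothetical "before": each += copies the whole string built so far.
    static String before() {
        String s = "";
        for (int i = 0; i < 2_000; i++) s += i;
        return s;
    }

    // Hypothetical "after": StringBuilder appends in place.
    static String after() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2_000; i++) sb.append(i);
        return sb.toString();
    }

    public static void main(String[] args) {
        long beforeNs = medianNanos(BeforeAfterBenchmark::before, 20, 31);
        long afterNs = medianNanos(BeforeAfterBenchmark::after, 20, 31);
        System.out.printf("before: %d ns, after: %d ns, speedup: %.1fx%n",
                beforeNs, afterNs, speedup(beforeNs, afterNs));
    }
}
```

Note that the two versions must produce the same result. A “fix” that is faster but wrong isn’t a fix, so it’s worth checking that alongside the timing.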
Another metric we can use to determine how successful our performance testing has been, is the number of issues we’ve found in an application release. This will of course depend on the size of the release, which we’ll pivot on in later sections, but just looking at the raw data for the question reveals a split. The majority find approximately 0-3 issues (69%), while around 25% find 4-20 issues and 10% find more than 20 issues each release. The average number of issues found is just over 5 per release, but as mentioned before, this data is somewhat meaningless until we pivot on it in later parts. Next up, it’s Part 2, so you don’t have to wait long!
Holy pivoting, Batman!
Parts 2 and 3 of this report are both focused on pivoting the data we have, allowing us to get answers to more specific questions. But first of all, what is a pivot? Well, when we talk about pivoting data in this report, we’re really talking about a clause in a question. Let’s say we have a sports league with a big table of results, including information about win or loss for each game and whether the game was played at home or away. We might ask a question like: how many games has the legendary baseball team HitBallFar won at home? We’d need to first filter the results just for the team based on the name HitBallFar, then filter just the home games and count all those games which are marked as a win. This is pretty much what we’re doing for the survey data we have, typically filtering across 2 survey questions at a time to try and find trends in the data. In this section of the report, we’ll be focusing on teams within an organization and the application itself. Let’s get pivoting!
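The HitBallFar question maps neatly onto a chain of filters. Here’s a minimal Java sketch using streams; the `Game` class, the results and the second team name (`RunFastGuys`) are all hypothetical. Pivoting survey data works the same way: filter respondents on the answer to one question, then count the answers to another.

```java
import java.util.List;

class LeagueQuery {

    /** One row of a hypothetical league results table. */
    static final class Game {
        final String team;
        final boolean home;
        final boolean won;

        Game(String team, boolean home, boolean won) {
            this.team = team;
            this.home = home;
            this.won = won;
        }
    }

    /** Counts home wins for one team: filter by team, then by venue, then count the wins. */
    static long homeWins(List<Game> results, String team) {
        return results.stream()
                .filter(g -> g.team.equals(team)) // first clause: only this team's games
                .filter(g -> g.home)               // second clause: only home games
                .filter(g -> g.won)                // finally: only the wins
                .count();
    }

    public static void main(String[] args) {
        List<Game> results = List.of(
                new Game("HitBallFar", true, true),
                new Game("HitBallFar", true, false),
                new Game("HitBallFar", false, true),
                new Game("RunFastGuys", true, true),
                new Game("HitBallFar", true, true));
        System.out.println(homeWins(results, "HitBallFar"));
    }
}
```

For this made-up table, HitBallFar has two home wins: the away win and the home loss are filtered out along the way.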
Teams and their Structure
First of all we’ll concentrate on the teams that are responsible for the performance testing, the people who actually run the monitoring and who find the issues internally. Our first question is around the role of the people who are responsible for the testing. Is it also their job to fix the issue?
Figure 2.1 shows the percentage of people in specific teams who are responsible for performance testing that also carry out the fix when they find an issue. Within development teams 96.4% of developers who find the issue will fix it themselves, while for QA teams this drops massively to a mere 7.1%.
Approximately one in three dedicated performance teams will make the fix once they find the problem. This figure increases slightly for the Operations team to almost 40%. Of course different teams might find different problems, and the fix might be a coding fix for QA whereas it’s a configuration fix for the operations team. Overall, it’s the development team which takes on the brunt of the fix.
Are dedicated performance teams better?
Next we’re going to zoom into the dedicated performance teams by themselves, and we’ll look at whether they achieve greater success compared to other teams. After all, it is their role. One rough measure of success is the number of issues you find in the application. It’s rough because different issues have different impacts and some can be more meaningful than others, but we’ll use this number as a guide. The line graph in Figure 2.2 shows how the results for a dedicated performance team vary compared to the overall average which we discovered back in Part 1, Figure 1.19. This is represented as a percentage increase or decrease relative to the results from other teams. We can see a definite trend: dedicated performance teams do tend to find more issues than other teams who run performance tests themselves. Their likelihood of finding either 0 or 1 bug(s) decreases by up to 10%, while their chance of finding 6-19 or 20+ bugs rises by up to 6%.
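A “percentage increase or decrease” can mean two different things: a difference in percentage points, or a change relative to the baseline. The text above doesn’t pin down which one Figure 2.2 uses, so this sketch computes both from hypothetical counts (not survey data) to show how far apart they can be.

```java
class RelativeShare {

    /** Fraction of a group that chose a given answer bucket. */
    static double share(int bucketCount, int groupSize) {
        return (double) bucketCount / groupSize;
    }

    /** Difference in percentage points between a subgroup's share and the baseline share. */
    static double pointDifference(double subgroupShare, double baselineShare) {
        return 100.0 * (subgroupShare - baselineShare);
    }

    /** Relative change in percent: how much larger or smaller the subgroup share is than the baseline. */
    static double relativeChange(double subgroupShare, double baselineShare) {
        if (baselineShare == 0.0) throw new IllegalArgumentException("baseline share must be non-zero");
        return 100.0 * (subgroupShare - baselineShare) / baselineShare;
    }

    public static void main(String[] args) {
        // Hypothetical: 12 of 60 performance-team respondents vs 300 of 1000 others found 0-1 bugs.
        double perfTeam = share(12, 60);
        double others = share(300, 1000);
        System.out.printf("%.1f points, %.1f%% relative%n",
                pointDifference(perfTeam, others), relativeChange(perfTeam, others));
    }
}
```

With these invented counts, a drop from 30% to 20% is 10 percentage points but a 33% relative decrease. That’s worth keeping in mind when reading any “up to 10%” style comparison.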
Do different teams prefer different tools?
We compared tool selection across all the teams and found little difference in the use of tools available on the market; however, some teams were more likely to create custom in-house tools. In fact, if you work in QA, a dedicated performance team or Operations, you’re up to 40% more likely to use custom in-house tools than other teams. Another interesting statistic concerns respondents who state that nobody is directly responsible for performance testing: 30% of them use no tooling whatsoever for their performance testing – sad panda. Presumably, those without performance tools just rely on their logs, programming by coincidence or simply no testing whatsoever!
Do different teams test at different times?
The data shows us that developers do prefer to fail early, so they tend to do more performance work on the codebase while they write code than other teams do. This is pretty much expected. However, among those where nobody is directly responsible, almost 20% of votes go to testing during the development phase, the third highest of all the teams. This group also does the least performance testing through CI, snapshot builds, system/integration testing and staging. Things change when we get to production, where dedicated performance teams are second highest with just over 18% of the share. QA and the dedicated performance team focus more on CI, snapshot builds and system/integration test phases than any of the other teams.
Who spends longer testing?
In terms of which team spends longer performance testing, the information is quite varied, as can be seen from the table in figure 2.3. The Operations team and “Whoever writes the code” category provided the most votes for either no performance testing whatsoever or just 1-2 days of testing during a release (when responsibility was claimed by a team). Most teams favored the 1-2 days option, the exception being the dedicated performance team, whose most common answer (the mode) was 2-4 weeks, with over 28% of responses. The QA team and Architects were also strong in this area, although not quite as strong as the performance team.
Which teams catch more bugs?
Next we can look at which teams catch more performance-related bugs. As shown in figure 2.4, our dedicated performance team caught the most bugs, with architects close behind and developers not far behind them. Interestingly, the QA teams struggle a little here and Operations teams struggle even more, catching only half as many bugs as architects or dedicated performance test teams. It is of course worth mentioning again that different people might be looking for different types of bugs.
Who spends longer fixing bugs?
There are a couple of things we can look at when we focus on the time taken to fix bugs. Firstly, there’s the time taken for each team to fix a single bug and secondly, there’s the total time each team spends in a release fixing bugs, based on the number of bugs they find. Let’s focus on the former first.
The line graph in figure 2.5 shows how each specific team compares, as a percentage, with the overall average for diagnosing, fixing and testing a single issue (remember the overall average is 4.38 days, as we discovered in figure 1.18). We can see that when we fail early it’s far cheaper to fix issues, assuming all issues are of equal complexity. We also notice the huge spike that almost pokes us in the eye! This shows us that a dedicated performance team takes almost 48% more time to diagnose, fix and test issues compared to the average.
Who spends longer fixing bugs in a release?
Now, what does this mean in terms of the total time spent fixing issues in a full release? Well, we already know the number of performance-related issues that are found per team, which we saw in figure 2.4. We can multiply those numbers by the time it takes each team to fully diagnose and fix an issue. In figure 2.6, the bar graph shows the total time in days spent diagnosing, fixing and testing all performance issues in a release. We can see straight away that a dedicated performance team spends well over twice as long diagnosing, fixing and testing performance issues compared to a regular developer. We’ll also notice that developers spend the least time fixing issues in a release, even though they find more bugs than average. They manage this because their turnaround time for fixing a performance issue is so much shorter, assuming the quality of their fixes is the same.
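The calculation behind figure 2.6 is a simple product per team. Here’s a minimal sketch with hypothetical inputs (these are not the survey’s per-team values). One caveat worth stating: multiplying two averages only gives the expected total if the number of issues a team finds and the time each fix takes aren’t correlated, so treat the result as an estimate.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ReleaseFixTime {

    /** Days per release = average issues found per release x average days to diagnose, fix and test one issue. */
    static double totalDays(double issuesPerRelease, double daysPerIssue) {
        return issuesPerRelease * daysPerIssue;
    }

    public static void main(String[] args) {
        // Hypothetical inputs per team: {issues found per release, days per issue}.
        Map<String, double[]> teams = new LinkedHashMap<>();
        teams.put("Developers", new double[] {6.0, 3.0});
        teams.put("Dedicated performance team", new double[] {7.0, 6.5});
        teams.forEach((team, v) ->
                System.out.printf("%s: %.1f days per release%n", team, totalDays(v[0], v[1])));
    }
}
```

With these made-up numbers, finding one extra issue per release matters far less than the difference in turnaround time: 18 days versus 45.5 days.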
Why should I care?
That’s a great question! We’re not asking you, by the way, so you don’t need to answer. What we’re actually asking is: how much do different teams care about their performance testing? It is commonly accepted that testing early and often will find more issues and provide cheaper fixes. In fact, figure 2.5 puts numbers on the cheaper-fix claim perfectly. Next, we’re going to see if different teams look upon performance testing in different ways, whether they choose to do it periodically or reactively. This should give us an idea of who treats performance testing as a first-class citizen in the application release cycle.
Check out the table in figure 2.7. One thing that jumps straight out is the dedicated performance team: they are almost twice as likely as any other team to profile their code daily. Developers are most likely to profile as they code, which is understandable as they’re most active during the development phase anyway. Dedicated performance teams are also 50% less likely to profile code when issues arise, compared to other teams. We might read into this that dedicated performance teams are less reactive to issues, as they’re more likely to profile code and find issues regularly.
Dedicated performance teams, QA and those in senior development roles also have a stronger preference to profile monthly. This might point to these teams using a milestone or snapshot build which would also be used during integration or system test cycles. Overall, it does look like profiling is a very reactive activity, which is quite a common way in which this style of tooling is used. It will be interesting to see how XRebel disrupts this market given its primary focus is usage in the “As I write it” category to promote the “fail early and often” mentality.
In this section we’ll look at how the complexity or size of the application changes how performance testing is done. We’ll split applications into three categories, which we’ll refer to henceforth as simple applications (fewer than 10 screens), medium complexity applications (10 to 99 screens) and complex applications (100 or more screens). When it comes to tooling, we notice the general trend that as applications become more complex, more tools tend to be used. Exceptions include the NetBeans profiler and Java Mission Control. The usage of Java Mission Control actually decreases as applications become larger and more complex.
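To make the bucket boundaries unambiguous, especially at exactly 10 and 100 screens, here’s a small Java sketch of the classification described above. The enum and method names are hypothetical.

```java
class ComplexityBucket {

    enum Complexity { SIMPLE, MEDIUM, COMPLEX }

    /** Buckets an application by its number of screens or views, using the thresholds described above. */
    static Complexity classify(int screens) {
        if (screens < 0) throw new IllegalArgumentException("screens must be non-negative");
        if (screens < 10) return Complexity.SIMPLE;   // fewer than 10 screens
        if (screens < 100) return Complexity.MEDIUM;  // 10 to 99 screens
        return Complexity.COMPLEX;                    // 100 or more screens
    }

    public static void main(String[] args) {
        for (int screens : new int[] {5, 10, 99, 100, 118}) {
            System.out.println(screens + " screens -> " + classify(screens));
        }
    }
}
```

An application with exactly 100 screens counts as complex, while one with 10 screens is already medium complexity.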
Complexity vs. Root Causes
How about the root causes of problems? Do they change as application size/complexity increases? Largely no, but there are a few exceptions, shown in Figure 2.8. We can see that HTTP session and database query problems are affected by the size of an application. As an application gets larger or more complex, HTTP session problems also grow. I guess this one is pretty self-explanatory, as there will potentially be more places to get and store data from. Database access problems also increase with application complexity, which might just mean that larger applications are more likely to use a database. However, databases are more or less part of the fixtures and fittings of most web applications these days. There may be more chance of a database growing to a size where it requires further indexing, or of a poor design that requires multiple queries to be run for the same data. Alternatively, there might be a higher chance of an n+1 style problem being compounded within a single request, which ultimately bubbles up to the user as a noticeable problem.
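The n+1 problem is easiest to see by counting round trips. In this sketch, a hypothetical in-memory `FakeDb` stands in for a real database and counts queries. The n+1 version fetches the orders and then issues one extra query per order (in JPA/Hibernate this often happens when a lazily loaded association is touched inside a loop), while the batched version fetches all the items in a single `IN (...)`-style query. The SQL in the comments is illustrative only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NPlusOneDemo {

    /** Hypothetical in-memory stand-in for a database that counts round trips. */
    static final class FakeDb {
        int queryCount = 0;
        private final Map<Integer, List<String>> itemsByOrder = new HashMap<>();

        FakeDb(int orders) {
            for (int id = 1; id <= orders; id++) {
                itemsByOrder.put(id, List.of("item-" + id));
            }
        }

        List<Integer> findOrderIds() {                // like: SELECT id FROM orders
            queryCount++;
            return new ArrayList<>(itemsByOrder.keySet());
        }

        List<String> findItemsForOrder(int orderId) { // like: ... WHERE order_id = ?
            queryCount++;
            return itemsByOrder.get(orderId);
        }

        Map<Integer, List<String>> findItemsForOrders(List<Integer> ids) { // like: ... WHERE order_id IN (...)
            queryCount++;
            Map<Integer, List<String>> result = new HashMap<>();
            for (int id : ids) result.put(id, itemsByOrder.get(id));
            return result;
        }
    }

    /** N+1 pattern: one query for the orders, then one more query per order. */
    static int loadNPlusOne(FakeDb db) {
        int items = 0;
        for (int id : db.findOrderIds()) {
            items += db.findItemsForOrder(id).size();
        }
        return items;
    }

    /** Batched pattern: one query for the orders, one query for all of their items. */
    static int loadBatched(FakeDb db) {
        int items = 0;
        for (List<String> list : db.findItemsForOrders(db.findOrderIds()).values()) {
            items += list.size();
        }
        return items;
    }

    public static void main(String[] args) {
        FakeDb a = new FakeDb(50), b = new FakeDb(50);
        loadNPlusOne(a);
        loadBatched(b);
        System.out.println("N+1 queries: " + a.queryCount + ", batched queries: " + b.queryCount);
    }
}
```

Both versions load the same 50 items, but the first makes 51 round trips and the second makes 2. As an application grows more screens and more nested data per request, that gap is exactly the kind of thing that surfaces as “too many database queries”.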
We’ve talked about the complexity of the application using the number of screens as a metric, but what about the application type? There were four types we asked about: batch, webapp, mobile and desktop. But does the type affect the tools that are used? Well, across the selection there were two tools in particular whose usage differed across application types, namely JProfiler and YourKit, as shown in figure 2.9. JProfiler seemed very popular with web, mobile and desktop applications, but not a favored choice for batch applications. YourKit, however, has a really strong presence in batch applications. In fact, those who stated batch as their application type were more than three times as likely to use YourKit as those with a mobile application. Batch teams are also less likely to test without tools – three times less likely than teams with a web application, in fact! So, batch application folks, take a bow, you’re our performance testing heroes!
When are different applications tested?
Batch applications are tested in production approximately 30% less often than other applications. Instead, batch performance testing is more likely than for any other application type to occur at every stage from CI all the way through to staging. Once again, batch performance testers showed their awesomeness: they said they don’t test at all only half as often as teams working on the other application types. Hey mom, when I grow up, I wanna be a batch performance tester! Desktop applications are the most likely to be tested while they’re being coded, by a margin of over 25%, so a lot can be said for testing early on the desktop.
What gets measured?
Mobile application teams measure network latency more than teams working on any other application type: over three times as much as batch teams and twice as much as desktop teams. Batch testers care more about thread concurrency and contention than others, by around 30%. They also care a lot about application code performance: batch application code is 20% more likely to be monitored than web or desktop application code, and 50% more likely than mobile application code. So, batch applications, you totally take this round! Let’s move on to the next part to find out what our respondents consider to be best practices in the world of performance testing.
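For readers wondering what measuring contention can look like at its very simplest, the JDK’s own `java.lang.management` API exposes per-thread blocking statistics. The sketch below is illustrative only, not how any respondent or profiler necessarily works, and the `ContentionProbe` class and `blockOnHeldLock` method are hypothetical names. It deliberately makes the current thread wait for a monitor held by another thread, then reads `ThreadInfo.getBlockedCount()` and `getBlockedTime()`. Blocked time is only reported (otherwise it is -1) when the JVM supports and has enabled thread contention monitoring.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class ContentionProbe {
    // Blocks the calling thread on a monitor that another thread holds for
    // roughly holdMillis, then returns the caller's ThreadInfo snapshot.
    static ThreadInfo blockOnHeldLock(long holdMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (mx.isThreadContentionMonitoringSupported()) {
            // Needed for getBlockedTime(); getBlockedCount() works regardless.
            mx.setThreadContentionMonitoringEnabled(true);
        }
        Object lock = new Object();
        CountDownLatch held = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (lock) {
                held.countDown();            // signal that the lock is now owned
                try {
                    Thread.sleep(holdMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        holder.start();
        try {
            held.await();                    // wait until the holder owns the lock
            synchronized (lock) {            // contended: the caller blocks here
                // no work needed; we only want to experience the contention
            }
            holder.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while probing", e);
        }
        return mx.getThreadInfo(Thread.currentThread().getId());
    }

    public static void main(String[] args) {
        ThreadInfo info = blockOnHeldLock(200);
        System.out.println("blocked count: " + info.getBlockedCount());
        System.out.println("blocked time ms (-1 if disabled): " + info.getBlockedTime());
    }
}
```

The counters are cumulative for the lifetime of a thread, so in practice you would sample them periodically and compare deltas; dedicated profilers surface this kind of information far more conveniently.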
Well, you made it to the final part of the report, great job! In this section, we’ll analyze the survey results for tools, testing frequency, stage and duration. We’ll go on to look at the best practices to understand what the differences are between those respondents who stated their users are unaffected by performance issues and those who say their users are affected. But first, we’ll look at the tools respondents use.
Using the right tool for the job
Tools are often considered one of the most important parts of any task. Very often, process, timing and other aspects get forgotten, and too much importance is placed on tooling. Let’s take a look at the tools our respondents use and compare which of them find the most performance issues.
In figure 3.1 we see that the two leaders are custom in-house tools and JProfiler, with 8.75 and 8 issues found per release respectively. One might assume that if someone is knowledgeable enough about their environment and performance to write their own tools, they would find more bugs than the average person anyway! In third place is XRebel, one of the new kids on the block, with 5.84 issues found per release, followed closely by the NetBeans profiler, JProbe and Java Mission Control.
With performance testing, there is no such thing as a silver bullet. In fact, it’s very common to use multiple tools that are fit for specific purposes. For example, XRebel is designed for development, while Java Mission Control has a very low overhead, so it works really well in production. So let’s see whether people who use multiple tools really do find more issues per release, or whether there’s a diminishing return if you’re a multi-tool pro. Figure 3.2 shows a steady increase in the number of issues found per release as the number of tools grows, with a slight anomaly at 2 tools. However, the values for 1 and 2 tools are close enough that there’s not much we can read into it, other than it being an anomaly.
I love deadlines. I like the whooshing sound they make as they fly by.
– Douglas Adams
When, what, where, how long?
As mentioned previously, there’s more to performance testing than just tools – the process is equally important. For instance: how frequently should you profile your code, when during the release lifecycle should you test and how long should you be testing for in your release? We wouldn’t dare go 1-2 months of development without running unit or functional tests, would we? (The answer is no, we wouldn’t.) So we should set the same standard for our performance testing.
In figure 3.3, we compare how frequently people profile their application code with how much faster their application is after performance testing. Note that we’re not saying anything about the total duration of performance testing here, just how frequently it’s done. There’s a clear trend: those who test frequently are more likely to achieve a better performing, faster application than those who test infrequently. Those who test monthly or quarterly follow the same trend, with results that sit neatly between the other two sets of statistics.
This is exactly what we’d expect for unit or functional testing too, which is why we run unit tests during development and functional or system tests in CI. We should expect nothing less from the way we do performance testing.
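As a rough illustration of holding performance to the same standard as unit tests, here is a minimal Java sketch of a performance budget check that could run alongside them. The `PerfBudgetCheck` class, the warm-up and run counts, the workload and the 5 ms budget are all hypothetical choices. Hand-rolled timing loops like this are easily fooled by JIT compilation, dead-code elimination and noisy CI machines, so for real measurements a dedicated harness such as JMH is the safer tool; treat this as a smoke test, not a benchmark.

```java
import java.util.Arrays;

public class PerfBudgetCheck {
    // Runs the task `warmup` times untimed (giving the JIT a chance to compile it),
    // then times `runs` executions and returns the median duration in nanoseconds.
    static long medianNanos(Runnable task, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) {
            task.run();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) {
        // Hypothetical workload standing in for the code under test.
        Runnable task = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 1_000; i++) {
                sb.append(i);
            }
            if (sb.length() == 0) {
                throw new IllegalStateException("unexpected empty result");
            }
        };
        long budgetNanos = 5_000_000L; // 5 ms: an arbitrary, illustrative budget
        long median = medianNanos(task, 1_000, 101);
        boolean withinBudget = median <= budgetNanos;
        System.out.println("median " + median + " ns, within budget: " + withinBudget);
    }
}
```

In a CI job, a failing budget check would flag a regression early rather than waiting for users to notice; the budget itself should come from your own measurements, not from this example.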
Does Time Really Heal Everything?
We’ve talked about the frequency of profiling, so now let’s move on to the duration of performance testing in a release. Releases obviously vary in length, so let’s see whether there are any trends between the length of a release and the total time spent performance testing in that release. We’d hope that teams with substantially longer release cycles spend more time testing. Let’s see if that’s the case. Figure 3.4 is a line graph plotting the length of a release against the total time spent performance testing in that release. The first point plotted for every data set shows that you’re more likely to omit performance testing altogether if your release cycle is shorter; in fact, you’re over twice as likely to spend no time at all on it. The next data point, < 1 week, shows that with a shorter release cycle you’re also the most likely to spend less than a week on performance testing. If your release cycle is every 6 months, on the other hand, you’re five times less likely to spend 1-2 days on performance testing and much more likely to spend 1-4 weeks on it. Again, the middle data set roughly follows the same trend, which means our prediction was thankfully right: teams with longer release cycles do indeed put additional effort into their performance testing, as we’d hope!
Performance Testing Best Practices
In this section we pivot on just one question: do your performance issues tend to affect users in production? We always want the answer to be no. We want to find, fix and test issues before they reach production, and certainly before they reach users. So, next we’ll look at the differences between those whose users are affected by performance-related issues and those whose users aren’t, to see if there are trends or best practices we can learn from.
To be honest, when it came to who is responsible, there was little in the data to suggest a trend one way or the other. Respondents whose users were not affected named whoever writes the code as responsible only a few percentage points more often than those whose users were affected, and that was the greatest difference in the stats. As a result, these numbers don’t tell us whether who is responsible is an advantage or a disadvantage.
How about the complexity and size of the applications our respondents work on? That must play a part in how many issues affect users. Let’s start with team size, which is also a rough measure of application size. We can see from figure 3.5 that those who say their users are affected by performance issues have around 23 people in the team, while those whose users aren’t affected have only around 16. Said another way, teams whose users see performance issues are about 45% larger than teams with happy users. It’s hard to say why team size makes a difference, as there are likely many factors at play. You’d like to think that with more people available, it would be easier to find someone to run more performance testing, but clearly that time gets squeezed, perhaps in favor of more functionality, as is often the way.
The complexity or size of an application is also measured in this survey by the number of screens it has. Here, in figure 3.6, we see substantial differences: applications whose users see performance issues have an average of 130.4 screens, while those whose users don’t complain of performance issues have far fewer, 81.7 screens. This means applications with angry users have 60% more screens than applications with happy users. This could be a sign of a complex application, or perhaps more a sign of an application that simply takes longer to test because it’s larger. That would mean more time is needed to run performance tests, and that time might not always be available in a release cycle.
It’s all About the Timing
Again, we can see a big divide in when performance testing is done. As figure 3.7 shows, doing more performance testing earlier in the release cycle seems to have an impact on your end users. Those with unaffected users test while they code 36% more often than those whose users are affected by performance issues. This trend continues all the way through to production, where those with sad users test more than those with happy users, although by then it’s probably too late, as the bugs have already been let loose.
Should we blame their tools?
In short, no, we absolutely shouldn’t! There was very little difference in the tooling statistics across the board. Teams that report their users do not suffer from performance issues are 20% more likely to use custom in-house tools than teams whose users do. This could again suggest that teams with the capability, time and expertise to write their own custom tooling are also more likely to performance test accurately, bringing that same expertise and time to the task. This could well be a signal of the kinds of people who write those kinds of tools.
How About the Fix?
This is less a best practice than an observation. If we notice a difference here in how long a fix takes to make and test, we can’t just say the solution is to fix and test twice as fast and your users won’t see performance issues anymore! However, figure 3.8 shows a big difference: it takes those whose users are affected over 60% longer to diagnose, fix and test a bug than those whose users don’t see performance issues. This could well correlate directly with the phase in which testing is done, linking back to the results in figure 3.7, which support this.
What’s the Root Cause?
If we now look at the root causes of the issues that occur, we can see a focus on all things database. In fact, that’s pretty much the only trend we found, so let’s look at it in figure 3.9. Let’s start with the database itself. We’re only looking at low numbers, so the difference isn’t too great, but those with users affected by performance issues are 28% more likely to have speed problems in their backend database than those whose users aren’t affected. Database query issues were a much more common problem, and a similar split can be seen. Performance issues caused by too many database queries are almost 30% more likely in applications whose users suffer from performance issues than in those whose users don’t. Similarly, slow database queries are 36% more likely in applications whose users suffer performance issues. This is substantial evidence that database performance and database interactions are a key component of user happiness.
Big bang, or Little and Often?
The final metric we’ll look at in this section is how the frequency of performance testing affects end users. We can see from the graph in figure 3.10 that there is a trend once more: teams whose users are affected by performance issues are less likely to test frequently. Those who test at least every week are 25% less likely to have affected users. At the other end of the spectrum, it’s the applications whose users do see performance issues that are most likely to be profiled yearly or even less frequently.
From this, it appears that the frequency of profiling does have an impact on whether end users are affected by performance issues in production. This graph shows only the answers to the frequency question that described a regular timescale, such as weekly or yearly. For those with affected users, only 45% of responses picked one of these regular timescale answers. For those whose users are not affected, this number rises to 55%. This suggests that profiling on a regular schedule, every x days, is beneficial, and that the benefits increase as x gets smaller. Next, let’s take a look at how the remaining respondents answered.
Figure 3.11 looks at the other answers available for the same profiling frequency question, and it is heavily weighted toward one answer: When we see issues. This is a very reactive approach: profiling isn’t done at a set time to see whether bugs exist, but only because a bug has already been found. The flipside is a proactive approach, in which profiling is done regularly to find potential issues rather than react to them. We can see that those whose users are affected by performance issues are 23% more likely to profile reactively, when an issue is found, than those whose users are not affected.
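What might proactive profiling look like in practice? One option, sketched here as an assumption rather than anything the survey measured, is to wrap a regularly scheduled workload (a nightly load test, say) in a Java Flight Recorder recording, the data that Java Mission Control analyzes. The `ProactiveProfiling` class, the workload and the chosen events are hypothetical; the sketch uses the `jdk.jfr` API available in JDK 11 and later, and leaves the scheduling itself to your CI server or cron.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import jdk.jfr.Recording;

public class ProactiveProfiling {
    // Runs the workload inside a JFR recording and dumps it to a temporary .jfr file,
    // which can then be opened in Java Mission Control.
    static Path profileToTempFile(Runnable workload) {
        try (Recording recording = new Recording()) {
            // Sample executing Java stacks every 10 ms and record GC events.
            recording.enable("jdk.ExecutionSample").withPeriod(Duration.ofMillis(10));
            recording.enable("jdk.GarbageCollection");
            recording.start();
            workload.run();
            recording.stop();
            Path out = Files.createTempFile("scheduled-profile", ".jfr");
            recording.dump(out);
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical workload standing in for a scheduled load test.
        Path out = profileToTempFile(() -> {
            long sum = 0;
            for (int i = 0; i < 50_000_000; i++) {
                sum += i % 7;
            }
            System.out.println("workload result: " + sum);
        });
        System.out.println("recording written to " + out + " (" + out.toFile().length() + " bytes)");
    }
}
```

Running something like this on a schedule, and comparing recordings over time, turns profiling into a proactive habit rather than a reaction to user complaints; the events enabled here are only a starting point.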
Summary & Conclusion
Wow, we’re at the summary already! That went quickly. It must have been all the amazing graphs and jokes that made the time just fly by. For those of you who need a refresher (or who just want to see the highlights), let’s check out what we learned from the data.
TL;DR – For those with limited time.
We released a survey in March 2015 focused on Java Performance. Here are some stats:
- A total of 1562 amazing respondents completed our 19-question survey, which meant we could write this report, so thank you!
- ZeroTurnaround, the sponsor of RebelLabs, donated $1000 to the Dogs for the Disabled charity. Great job, survey filler-outers!
THE RAW DATA
Here are some of the important results and highlights from individual survey questions:
- The majority of our respondents were software developers, working on web applications.
- On average, over nine out of ten respondents say it’s the developers who fix performance issues, regardless of who finds them.
- Almost half of the respondents profile only when performance issues arise.
- 20% of teams write their own custom in-house tooling to run their performance tests.
- Almost 50% of teams use multiple tools when testing.
- The most common root causes of performance issues are slow database queries, inefficient application code and excessive database queries.
- It takes just under one working week on average to diagnose, fix and test performance issues. Coincidentally, the exact same amount of time that most managers take to reply to super urgent emails.
- Three out of every four respondents state their performance issues affect the end user.
- On average, five and a half performance issues are found during each application release.
DEDICATED PERFORMANCE TEAMS
If we look at just the responses given by those who state dedicated performance teams are responsible for testing, we uncover some interesting findings.
- Dedicated performance teams are more likely to find a greater number of issues than any other team: over twice as likely as the operations team, for example.
- Dedicated performance teams spend on average almost 50% longer diagnosing, fixing and testing performance issues, compared to other teams.
- Dedicated performance teams spend over 40 days, on average, diagnosing, fixing and testing all performance issues they find each release, compared to a software developer who spends just over 20 days.
Here’s what we learned about the tools our respondents use:
- JProfiler and custom in-house tools find more performance issues than any other tools. XRebel comes in third, leading a chasing pack.
- Using a greater number of tools increases your chances of finding more performance issues, compared with sticking to just one.
THE APPLICATION AND PROCESSES
We performed the same pivot across application complexity and the performance testing process that teams followed:
- As application complexity increases, the number of database query issues and HTTP session issues also increases, while other root causes tend not to increase.
- Profiling your code frequently will give you a greater chance of a performance boost, compared to profiling at a less regular interval.
- As application release cycles get longer, the data suggests that people do spend additional time performance testing.
We also compared the activities of those respondents who claim their users are not affected by performance issues and those whose users are affected. We found that teams who have the happiest end users:
- Work in smaller teams – 30% smaller teams that design, develop, test and support the application.
- Have less complex applications – 38% fewer application screens/views.
- Test earlier – 36% more likely to run performance testing while they code.
- Are more efficient – 38% faster at diagnosing, fixing, and testing performance issues.
- Look after their databases – 20-25% less likely to have database query or slow DB issues.
- Are more proactive – Almost 40% more likely to profile on a daily or weekly basis.
- Are less reactive – Almost 20% less likely to test reactively when issues arise.
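Several of these percentages are expressed in the opposite direction from the body of the report: teams with affected users are described as about 45% larger in figure 3.5, while here the happy teams are 30% smaller. Both come from the same pair of averages. If one group’s value is r times the other’s, the larger is (r - 1) × 100% more and the smaller is (1 - 1/r) × 100% less. The hypothetical Java helper below applies this to the rounded figures quoted in the text; small differences from the printed percentages are to be expected, presumably because the report worked from unrounded averages.

```java
public class RelativeDifference {
    // How much larger `larger` is than `smaller`, as a percentage of `smaller`.
    static double percentMore(double larger, double smaller) {
        return (larger / smaller - 1.0) * 100.0;
    }

    // How much smaller `smaller` is than `larger`, as a percentage of `larger`.
    static double percentLess(double larger, double smaller) {
        return (1.0 - smaller / larger) * 100.0;
    }

    public static void main(String[] args) {
        // Team size (figure 3.5): around 23 people with affected users vs around 16 without.
        System.out.printf("Team size: %.1f%% larger / %.1f%% smaller%n",
                percentMore(23, 16), percentLess(23, 16));
        // Screens (figure 3.6): 130.4 vs 81.7 on average.
        System.out.printf("Screens: %.1f%% more / %.1f%% fewer%n",
                percentMore(130.4, 81.7), percentLess(130.4, 81.7));
        // Fix time (figure 3.8): affected teams take about 1.6x as long.
        System.out.printf("Fix time: %.1f%% less time%n", percentLess(1.6, 1.0));
        // Reactive profiling (figure 3.11): 23% more likely, i.e. a ratio of 1.23.
        System.out.printf("Reactive profiling: %.1f%% less likely%n", percentLess(1.23, 1.0));
    }
}
```

For example, the "over 60% longer" fix time in figure 3.8 corresponds to 1 - 1/1.6 = 37.5% less time, which the summary rounds to 38% faster, and 23% more likely to profile reactively corresponds to roughly 19% less likely, the "almost 20%" above.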
Goodbye and a comic
This report has been great fun to write – from planning the survey questions, to creating all the graphics and funky charts. Yep, it was even fun playing with the numbers in ‘Exhell’, filtering and pivoting the data, trying to seek out trends and patterns. We really hope you enjoyed reading it just as much and will even consider refocusing your efforts based on some of the discoveries in this report. Perhaps you’ll now test for performance sooner in your application lifecycle. Or maybe you’ll focus more heavily on your database interactions. You might consider profiling code much more frequently. We do hope you don’t fire half your staff to make your team smaller though – that bit was more of an observation rather than a recommendation!
We’re left with little else to do now, other than showing you a comic, on the following page, that might just make you wonder how this report would have turned out had we selected survey questions from Google autocomplete.