The latest expert opinions, articles, and guides for the Java professional.
Towards the end of 2015, we released our cheat sheet for Java 8 best practices, which got thousands upon thousands of hits and downloads! So we thought, we should create another one, and we did just that. This time around, based on your survey feedback, we focus on one of the most important and substantial features in the Java 8 release, the Streams API. You should click on the cheat sheet below and print it out. It will look amazing next to the Java 8 cheat sheet! Incidentally, Venkat Subramaniam, Java legend gave a Java 8 Streams masterclass just last week on Virtual JUG, so make sure you check that video out if you want to hear about streams in more depth.
This blog post discusses the streams API, but if you’re impatient, here’s the 1 page Java 8 streams cheat sheet, click on it, save it, print it out!
Let’s start off with what a stream actually is and what a stream isn’t! Here are some important points which shape how you should think about a stream:
- A Stream is a pipeline of functions that can be evaluated
- Streams can transform data
- A Stream is not a data structure
- Streams cannot mutate data
Most importantly, a stream isn’t a data structure. You can often create a stream from collections to apply a number of functions on a data structure, but a stream itself is not a data structure. That’s so important, I mentioned it twice! A stream can be composed of multiple functions that create a pipeline that data that flows through. This data cannot be mutated. That is to say the original data structure doesn’t change. However the data can be transformed and later stored in another data structure or perhaps consumed by another operation.
We stated that a stream is a pipeline of functions, or operations. These operations can either be classed as an intermediate operation or a terminal operation. The difference between the two is in the output which the operation creates. If an operation outputs another stream, to which you could apply a further operation, we call it an intermediate operation. However, if the operation outputs a concrete type or produces a side effect, it is a terminal type. A subsequent stream operation cannot follow a terminal operation, obviously, as a stream is not returned by the terminal operation!
An intermediate operation is always lazily executed. That is to say they are not run until the point a terminal operation is reached. We’ll look in more depth at a few of the most popular intermediate operations used in a stream.
- filter – the filter operation returns a stream of elements that satisfy the predicate passed in as a parameter to the operation. The elements themselves before and after the filter will have the same type, however the number of elements will likely change.
- map – the map operation returns a stream of elements after they have been processed by the function passed in as a parameter. The elements before and after the mapping may have a different type, but there will be the same total number of elements.
- distinct – the distinct operation is a special case of the filter operation. Distinct returns a stream of elements such that each element is unique in the stream, based on the equals method of the elements.
Here’s a table that summarises this, including a couple of other common intermediate operations.
|Function||Preserves count||Preserves type||Preserves order|
A terminal operation is always eagerly executed. This operation will kick off the execution of all previous lazy operations present in the stream. Terminal operations either return concrete types or produce a side effect. For instance, a reduce operation which calls the Integer::sum operation would produce an Optional
|Function||Output||When to use|
|reduce||concrete type||to cumulate elements|
|collect||list, map or set||to group elements|
|forEach||side effect||to perform a side effect on elements|
This cheat sheet is brought to you by XRebel, a tool to remind you about your app performance when you are actually working on it, not later when your clients think it is already slow. If you're working with any Java web applications, you should try it. It might change your attitude towards performance. Try XRebel!
Let’s take a look at a couple of examples and see what our functional code examples using streams would look like.
Exercise 1: Get the unique surnames in uppercase of the first 15 book authors that are 50 years old or older.
So you know, the source of our stream, library, is an ArrayList. Check out the code and follow along with the description. From this list of books, we first need to map from books to the book authors which gets us a stream of Authors and then filter them to just get those authors that are 50 or over. We’ll map the surname of the Author, which returns us a stream of Strings. We’ll map this to uppercase Strings and make sure the elements are unique in the stream and grab the first 15. Finally we return this as a list using toList from
library.stream() .map(book -> book.getAuthor()) .filter(author -> author.getAge() >= 50) .map(Author::getSurname) .map(String::toUpperCase) .distinct() .limit(15) .collect(toList()));
Did you notice that as we read through the code we can describe what it’s doing line by line. Imagine doing that with imperative code!
Exercise 2: Print out the sum of ages of all female authors younger than 25.
library.stream() .map(Book::getAuthor) .filter(author -> author.getGender() == Gender.FEMALE) .map(Author::getAge) .filter(age -> age < 25) .reduce(0, Integer::sum)
Using the same original stream we once again map the elements from Books to Authors and filter just on those authors that are female. Next we map the elements from Authors to author ages which gives us a stream of ints. We filter ages to just those that are less than 25 and use a reduce operation and Integer::sum to total the ages.
You can parallelise the work you do in a stream in a couple of ways. Firstly you can get a parallel stream directly from your source by calling the parallelStream() method directly as shown:
Alternatively, you can call an intermediate operation on an existing stream which spawns off threads and executes further operations in parallel, as shown:
One important thing to note is that parallel streams achieve parallelism through threads using the existing common ForkJoinPool. As a result there are possible complications as we detailed in this previous RebelLabs post. Using parallel streams can cause concurrency issues depending on what you’re doing in your stream as well. Make sure you need to use a parallel stream for a big enough job, rather than using them by default. Also given you’re using the common ForkJoinPool, be sure not to run any blocking operations.
A classic example of a potential concurrency issue when using parallel streams is when updating a shared mutable variables from a forEach operation. Let’s consider the following code:
List<Book> myList = new ArrayList<>(); library.parallelStream().forEach(e -> myList.add(e));
Would you expect multiple threads to concurrently access an ArrayList without issue normally? Of course you wouldn’t! So you shouldn’t expect it to within a parallel stream. In fact, it’s best not to do this even in a non-parallel stream, just in case someone tries to make it parallel in future. It tends to be safer to collect this into a List using the collect() operation.
Anyway, don’t want to hold you any longer, I know you’re pretty convinced by now that Java 8 streams are paramount to the best practices of development and you can get all the information about them in this concise 1 page cheat sheet!
This is our second cheat sheet, after the Java 8 Best Practices Cheat Sheet. Our next one is already in the works, due to the popularity and demand! This one will be around some of the most useful git commands known to man, and woman! Make sure you keep an eye on the RebelLabs blog so you don’t miss it! You could even subscribe to our mailing list so that you get a gentle nudge when it’s available.
Leave a comment