Anyone who has worked with Scala for a while has probably used, or at least seen, the Future class. As part of the standard library, Future provides a way to express an asynchronous computation or value. It’s used by many popular Scala libraries, such as Apache Spark, Akka and Slick, and Future itself is pretty easy to use. But it also comes with some drawbacks, caused mainly by its design. This blog post summarizes the most common pitfalls and introduces Monix Task as a more powerful and robust alternative.
1 What’s wrong with Future?
Scala’s Future represents a value that might not be available yet, but will be at some point in time (if no error occurs). The current implementation is based on these design decisions:
- eager evaluation: Future starts evaluating its value right after it’s defined
- memoization: once the value is computed, it’s shared with anyone who asks for it, without being recalculated
Unfortunately, this design can lead to some unexpected situations, mainly when some side effects happen inside the computation.
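Both properties can be observed directly. The following is only a minimal sketch using the standard library; the names EagerMemoizedFuture, Observed and demo are made up for this illustration. A latch shows that the body runs before anyone asks for the result, and a counter shows that reading the result twice does not run the body again.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical names, used only for this illustration.
object EagerMemoizedFuture {
  final case class Observed(startedUnasked: Boolean, first: Int, second: Int, evaluations: Int)

  def demo(): Observed = {
    val evaluations = new AtomicInteger(0)   // how many times the body ran
    val started     = new CountDownLatch(1)  // released as soon as the body starts

    // Eager: the body is submitted for execution right here.
    val future = Future {
      started.countDown()
      evaluations.incrementAndGet()
      42
    }

    // Nothing has asked for the result yet, but the body still runs.
    val startedUnasked = started.await(5, TimeUnit.SECONDS)

    // Memoized: reading the result twice does not re-run the body.
    val first  = Await.result(future, 5.seconds)
    val second = Await.result(future, 5.seconds)

    Observed(startedUnasked, first, second, evaluations.get())
  }

  def main(args: Array[String]): Unit =
    println(demo()) // Observed(true,42,42,1)
}
```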
1.1 Breaks referential transparency
In general, referential transparency means that any expression can be replaced with its value without changing the program’s behaviour. Such an expression (or function) must be pure, meaning that it has no side effects, because any side effect would break this condition. Let’s start with a simple example:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.Random
val future = Future(println("hello"))
for {
  _ <- future
  _ <- future
  _ <- future
} yield ()
// hello
The above code prints the hello string once. According to referential transparency, if each reference to the future variable is replaced by the expression that defines it, the result should be the same:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.Random
for {
  _ <- Future(println("hello"))
  _ <- Future(println("hello"))
  _ <- Future(println("hello"))
} yield ()
// hello
// hello
// hello
Now the hello string is printed three times, which clearly breaks referential transparency. Why is that? Because Future is eagerly evaluated and memoizes its value. That means that the code inside a Future is evaluated right after it’s defined, and it’s evaluated only once, remembering the computed value. And the eager evaluation leads to another problem…
1.2 ExecutionContext everywhere
Each Future needs to know where to execute itself, that is, on which thread. This is why an ExecutionContext instance is required. In an ideal world, it would be possible to define the Future value, perform some transformations using map, flatMap, etc. and at the very end call some kind of run method, which would run the entire chain using the given ExecutionContext.
Unfortunately, because of the eager nature of Future, the ExecutionContext is required as an implicit parameter by every transformation or callback method, such as map, flatMap, foreach and onComplete. It basically means that everywhere in your codebase where you work with Futures, you also have to propagate the ExecutionContext somehow, which quickly becomes really cumbersome.
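To make the propagation concrete, here is a small sketch with made-up pricing helpers (EcPropagation, fetchPrice, withTax and total are hypothetical names, not part of any library). Every method has to accept the implicit ExecutionContext, including the ones that merely transform or combine existing Futures.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical helpers, used only for this illustration.
object EcPropagation {
  // Creating a Future needs an ExecutionContext.
  def fetchPrice(item: String)(implicit ec: ExecutionContext): Future[Int] =
    Future(item.length * 10)

  // This method only calls map, yet it still needs the ExecutionContext.
  def withTax(price: Future[Int])(implicit ec: ExecutionContext): Future[Int] =
    price.map(p => p + p / 5)

  // This method only combines the helpers, yet it has to pass the ExecutionContext along.
  def total(items: List[String])(implicit ec: ExecutionContext): Future[Int] =
    Future.traverse(items)(item => withTax(fetchPrice(item))).map(_.sum)

  def main(args: Array[String]): Unit = {
    import ExecutionContext.Implicits.global
    println(Await.result(total(List("tea", "coffee")), 5.seconds)) // 108
  }
}
```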
1.3 Gotchas with for-comprehension
Sometimes it’s required to execute several independent Futures in parallel and combine their results into a single value. A naive approach to this might be the following:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import Thread.sleep
def longRunningJob1: Int = {sleep(800); 2}
def longRunningJob2: Int = {sleep(300); 4}
def longRunningJob3: Int = {sleep(900); 6}
for {
  a <- Future(longRunningJob1)
  b <- Future(longRunningJob2)
  c <- Future(longRunningJob3)
} yield a + b + c
The problem is that this code actually runs the three jobs sequentially, one after another. Why is that? If you desugar the for-comprehension, the code looks like this:
Future(longRunningJob1)
  .flatMap(a => Future(longRunningJob2)
    .flatMap(b => Future(longRunningJob3)
      .map(c => a + b + c)))
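The function passed to flatMap on a Future is called only after that Future’s value has been computed, so the next Future is not even created until the previous one completes. This is also clearly stated in the method’s Scaladoc: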
Creates a new future by applying a function to the successful result of this future, and returns the result of the function as the new future.
The workaround for this is to declare the Futures before merging them together, like this:
val futureA = Future(longRunningJob1)
val futureB = Future(longRunningJob2)
val futureC = Future(longRunningJob3)
for {
  a <- futureA
  b <- futureB
  c <- futureC
} yield a + b + c
This works because of the eager evaluation of Future. The computation of the values futureA, futureB and futureC starts independently and before the for block is executed. The problem here is that the code behaves differently based on how it’s structured, and the programmer must be aware of this. If Future were lazily evaluated, both examples would behave the same.
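The difference between the two structures can be measured. The sketch below is an illustration only: FutureTiming, job and timed are hypothetical names, the sleep times are shortened, and a dedicated three-thread pool is assumed so that the jobs can overlap regardless of the number of CPU cores. The exact timings will vary from run to run, but the first variant should take roughly the sum of the sleep times and the second roughly the longest one.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical names, used only for this illustration.
object FutureTiming {
  // Stand-in for a long-running job: sleeps, then returns a value.
  def job(millis: Long, value: Int): Int = { Thread.sleep(millis); value }

  // Runs the block and returns its result with the elapsed time in milliseconds.
  def timed[A](block: => A): (A, Long) = {
    val start  = System.nanoTime()
    val result = block
    (result, (System.nanoTime() - start) / 1000000L)
  }

  // Futures created inside the for-comprehension: each one starts after the previous one finishes.
  def createdInline()(implicit ec: ExecutionContext): Future[Int] =
    for {
      a <- Future(job(300, 2))
      b <- Future(job(100, 4))
      c <- Future(job(200, 6))
    } yield a + b + c

  // Futures created up front: all three start immediately.
  def createdUpFront()(implicit ec: ExecutionContext): Future[Int] = {
    val futureA = Future(job(300, 2))
    val futureB = Future(job(100, 4))
    val futureC = Future(job(200, 6))
    for { a <- futureA; b <- futureB; c <- futureC } yield a + b + c
  }

  def main(args: Array[String]): Unit = {
    val pool = Executors.newFixedThreadPool(3)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      val (r1, t1) = timed(Await.result(createdInline(), 5.seconds))
      val (r2, t2) = timed(Await.result(createdUpFront(), 5.seconds))
      println(s"created inline:   result=$r1, took about $t1 ms")
      println(s"created up front: result=$r2, took about $t2 ms")
    } finally pool.shutdown()
  }
}
```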
2 Monix Task to the rescue
Monix is a popular Scala library providing various tools for composing asynchronous programs. One of the provided data types is Task, representing a (possibly) asynchronous computation. Here is an overview of the key architectural differences between Task and Future:
| | evaluation | memoization |
|---|---|---|
| Future | eager | yes (forced) |
| Task | lazy | no (but can be enabled) |
Using Task is very similar to using Future. The main difference is that instead of an ExecutionContext, you need a Scheduler (which is basically just a wrapper around it), but unlike with Future, it’s required only when the value is evaluated.
import monix.eval.Task
import monix.execution.Scheduler.Implicits.global
// 1) define the task
val task1 = Task(println("hello"))
// 2) then run it
task1.runSyncUnsafe() // executes the task synchronously (blocking operation)
Monix also provides fine-grained control over how the Task will be executed. By using various implementations of Scheduler, you can choose where the Task will be executed (fixed thread pool, etc.), and by using the various runXY methods, you can choose how the task will be executed (synchronously, asynchronously, with delay, etc.). See the official documentation for more details.
Let’s check if using Task for the same scenarios can solve the issues we had with Future.
2.1 Preserves referential transparency
As shown earlier, Future breaks the rules of referential transparency, which might lead to some surprising errors, mainly if the Future performs some side effects. Let’s compare it with Task.
import monix.eval.Task
import monix.execution.Scheduler.Implicits.global
val task = Task(println("hello"))
val result = for {
  _ <- task
  _ <- task
  _ <- task
} yield ()
result.runSyncUnsafe()
// hello
// hello
// hello
The above code prints the string to the console three times, because Task does not memoize the computed value, so each time it’s used its value is computed again. Let’s see what happens if we replace the references to the task variable by inlining the Task itself.
import monix.eval.Task
import monix.execution.Scheduler.Implicits.global
val result = for {
  _ <- Task(println("hello"))
  _ <- Task(println("hello"))
  _ <- Task(println("hello"))
} yield ()
result.runSyncUnsafe()
// hello
// hello
// hello
The output is the same for both examples, which means that Task preserves referential transparency.
2.2 Scheduler needed only for evaluation
One of the ugly properties of Future is that the methods used for value manipulation, such as map and flatMap, require an implicit ExecutionContext to be in scope, so you need to propagate it through your codebase. Task requires a Scheduler only for its execution using the runXY methods, so there’s no need to pollute your codebase with its instances.
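As an illustration, here is a sketch with made-up pricing helpers (TaskPricing, fetchPrice, withTax and total are hypothetical names). It assumes Monix 3.x and uses only map, flatMap and runSyncUnsafe. None of the helpers mention a Scheduler; it is imported only at the single place where the Task is actually run.

```scala
import monix.eval.Task

// Hypothetical helpers, used only for this illustration.
object TaskPricing {
  // Describes the computation; nothing runs yet, so no Scheduler is needed.
  def fetchPrice(item: String): Task[Int] =
    Task(item.length * 10)

  // Transforming a Task needs no Scheduler either.
  def withTax(price: Task[Int]): Task[Int] =
    price.map(p => p + p / 5)

  // Combining Tasks is still just a description of the work.
  def total(first: String, second: String): Task[Int] =
    for {
      a <- withTax(fetchPrice(first))
      b <- withTax(fetchPrice(second))
    } yield a + b

  def main(args: Array[String]): Unit = {
    // The Scheduler enters the picture only here, where the Task is run.
    import monix.execution.Scheduler.Implicits.global
    println(total("tea", "coffee").runSyncUnsafe()) // 108
  }
}
```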
2.3 Consistent behaviour with for-comprehension
Earlier we discussed that using multiple Future values in a for-comprehension might result in different execution, based on how the source code is structured. Let’s compare it with the same example, rewritten using Task:
import monix.eval.Task
import monix.execution.Scheduler.Implicits.global
import Thread.sleep
def longRunningJob1: Int = { sleep(800); println("executing job 1"); 2 }
def longRunningJob2: Int = { sleep(300); println("executing job 2"); 4 }
def longRunningJob3: Int = { sleep(900); println("executing job 3"); 6 }
val result = for {
  a <- Task(longRunningJob1)
  b <- Task(longRunningJob2)
  c <- Task(longRunningJob3)
} yield a + b + c
result.runSyncUnsafe()
Written this way, these three Tasks are executed sequentially, because the for-comprehension is again desugared into flatMap calls and flatMap waits until the result of the previous Task is computed. So far it’s pretty much the same as with Future. Let’s see what happens if the Task values are defined outside the for block:
val task1 = Task(longRunningJob1)
val task2 = Task(longRunningJob2)
val task3 = Task(longRunningJob3)
val result = for {
  a <- task1
  b <- task2
  c <- task3
} yield a + b + c
result.runSyncUnsafe()
And, unlike with Future, the result is the same: the Tasks are again executed sequentially. This is because, unlike Future, Task is always lazily evaluated, so it really doesn’t matter where in the code you define it. If you want to execute multiple Task values in parallel, you have to explicitly do that on your own (see documentation about parallel processing).
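One way to ask for parallelism explicitly is sketched below. It assumes Monix 3.x, where Task.parZip3 is available (other versions may name this operation differently), and ParallelTasks and job are hypothetical names with shortened sleep times. Whether the jobs really overlap also depends on the Scheduler having enough threads.

```scala
import monix.eval.Task

// Hypothetical names, used only for this illustration; assumes Monix 3.x.
object ParallelTasks {
  // Stand-in for a long-running job: sleeps, then returns a value.
  def job(millis: Long, value: Int): Int = { Thread.sleep(millis); value }

  // Defining the tasks does not start them.
  val task1 = Task(job(300, 2))
  val task2 = Task(job(100, 4))
  val task3 = Task(job(200, 6))

  // parZip3 explicitly requests that the three tasks be run in parallel
  // and combines their results into a tuple.
  val combined: Task[Int] =
    Task.parZip3(task1, task2, task3).map { case (a, b, c) => a + b + c }

  def main(args: Array[String]): Unit = {
    import monix.execution.Scheduler.Implicits.global
    println(combined.runSyncUnsafe()) // 12
  }
}
```

Because the parallelism is part of the expression itself, moving the task definitions in or out of the surrounding block does not change how they are executed.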
3 Conclusion
As the simple examples in this article show, Future’s design (eager evaluation and memoization) can lead to some unexpected situations, mainly when combined with side effects. Monix Task is a nice alternative that preserves some fundamental principles of functional programming, such as referential transparency, allows for a cleaner codebase by reducing the need to pass ExecutionContext everywhere, and provides more fine-grained control over where and how it’s executed.