Leveraging Static Typing to Manage Object State

There has been a lot of talk in the software engineering world about the pains associated with OOP, and specifically with mutable state. Side-effect-free functional programming is often touted as a solution. While the critics make some salient points about mutability, the real issues arise from naive use of implicitly mutable objects. Handled properly, mutable state can be a useful tool for reasoning about code, and can guide future developers naturally towards correct code changes. In this post we’ll share some of the headaches that led us to a principle of class design: using types to indicate object state.

Stateful Collaborators

Let’s start with an example. At Rescale we use a REST API to keep track of metadata about a job. Our worker nodes query the API to get information about what virtual hardware to spin up, and our cluster nodes query it to get information about what analysis to run. Both of them authenticate by passing an encrypted token. Our first client interface looked something like this:

public interface JobMetadataClient {
  Analysis getAnalysis(long jobId, Credentials credentials);
  CoreSummary getCoreSummary(long jobId, Credentials credentials);
  ...
}

This interface was good for ensuring that we always passed the correct credentials with each request, but it was a little cumbersome. As we refactored old code to use the new API client, we had to create credentials for each method call. Each worker takes a task for a job and then will make many requests using that job’s authentication token, so we constantly had to create identical credentials objects or pass one around. That made it a hassle to rely on the API as heavily as we wanted.
Our next thought was that we could have a worker set credentials on its API client once, when it pulls a task for a job, and then let all of the helper objects use the client with the assumption that it had already gotten its credentials. The interface was then something like:

public interface JobMetadataClient {
  void setCredentials(Credentials credentials);
  Analysis getAnalysis(long jobId);
  CoreSummary getCoreSummary(long jobId);
  ...
}

Most calling code could now just call the query methods without worrying about credentials. This accomplished the goal of making the API client methods much easier to use, but whenever we introduced the client into a new area of the codebase we would forget to set credentials. It also led us to write code that made implicit assumptions about the context in which it ran, and that was hence less reusable and more fragile. Changes to one part of the system had the potential to break other parts, which is one of the principal things to avoid in software systems.
The first implementation is what I’ll call a single-state collaborator. It is just a bag of methods, and developers can easily reason about its behavior when they have an instance. The second does not make things so apparent because it has an implicit state change. If a developer gets a hold of an instance, they see that they can call query methods and those methods will probably work. Understanding the authentication state requires more knowledge of the system.
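To make that failure mode concrete, here is a minimal sketch of a setter-based client. The class SettableMetadataClient, the stand-in Credentials and Analysis types, and the choice to throw IllegalStateException are all assumptions made for illustration; they are not our production code.

```java
import java.util.Objects;

// Hypothetical stand-ins for the post's types; only their shape matters here.
final class Credentials {
  final String token;
  Credentials(String token) { this.token = Objects.requireNonNull(token); }
}

final class Analysis {
  final long jobId;
  Analysis(long jobId) { this.jobId = jobId; }
}

// A setter-based client in the style of the second interface (illustrative only).
final class SettableMetadataClient {
  private Credentials credentials; // stays null until someone remembers to set it

  void setCredentials(Credentials credentials) {
    this.credentials = credentials;
  }

  Analysis getAnalysis(long jobId) {
    if (credentials == null) {
      // Nothing in the type told the caller that this call was premature.
      throw new IllegalStateException("credentials were never set");
    }
    return new Analysis(jobId);
  }
}

final class ImplicitStateDemo {
  // Returns true if the query failed because credentials were missing.
  static boolean failsWithoutCredentials(SettableMetadataClient client) {
    try {
      client.getAnalysis(42L);
      return false;
    } catch (IllegalStateException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    SettableMetadataClient client = new SettableMetadataClient();
    System.out.println(failsWithoutCredentials(client)); // prints true
    client.setCredentials(new Credentials("token"));
    System.out.println(failsWithoutCredentials(client)); // prints false
  }
}
```

Both calls compile equally well; whether they work depends entirely on what happened to the instance earlier, which is exactly the knowledge a reader of the calling code may not have.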
There’s a better design that leverages static typing to make the authentication state immediately apparent to future developers. We can go back to our first client interface and require credentials on every method call, but also provide a wrapper class whose type indicates its state:

public interface JobMetadataClient {
  Analysis getAnalysis(long jobId, Credentials credentials);
  CoreSummary getCoreSummary(long jobId, Credentials credentials);
  ...
}
...
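public class AuthenticatedMetadataClient {
  private final Credentials credentials;
  private final JobMetadataClient metadataClient;
  // Credentials are supplied once, at construction (constructor signature assumed)
  public AuthenticatedMetadataClient(JobMetadataClient metadataClient, Credentials credentials) {
    this.metadataClient = metadataClient;
    this.credentials = credentials;
  }
  public Analysis getAnalysis(long jobId) {
    return this.metadataClient.getAnalysis(jobId, this.credentials);
  }
  ...
}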

Now if a developer works on a class that has an instance of AuthenticatedMetadataClient as a collaborator, they know for sure that it has authentication and that it will not lose it. If we write new classes that take an instance of AuthenticatedMetadataClient in their constructors, those classes can only be used when authentication has already been provided. Future developers will see from the class what they can do with the client objects, and their IDEs will suggest appropriate methods. They won’t need to keep as much information about the whole system in their heads in order to reason about parts of it. Those are powerful tools for working in the codebase.
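As a rough sketch of what such a collaborator might look like, the example below wires an AuthenticatedMetadataClient into a class that cannot be constructed without one. AnalysisDescriber, the single-method version of JobMetadataClient, the fields on Credentials and Analysis, and the constructor parameter order are assumptions for illustration.

```java
import java.util.Objects;

// Hypothetical stand-ins for the post's types.
final class Credentials {
  final String token;
  Credentials(String token) { this.token = Objects.requireNonNull(token); }
}

final class Analysis {
  final long jobId;
  final String tokenUsed; // recorded only so the demo can show which credentials were sent
  Analysis(long jobId, String tokenUsed) {
    this.jobId = jobId;
    this.tokenUsed = tokenUsed;
  }
}

// Trimmed to one method so a lambda can stand in for the real client.
interface JobMetadataClient {
  Analysis getAnalysis(long jobId, Credentials credentials);
}

final class AuthenticatedMetadataClient {
  private final Credentials credentials;
  private final JobMetadataClient metadataClient;

  AuthenticatedMetadataClient(JobMetadataClient metadataClient, Credentials credentials) {
    this.metadataClient = Objects.requireNonNull(metadataClient);
    this.credentials = Objects.requireNonNull(credentials);
  }

  Analysis getAnalysis(long jobId) {
    return metadataClient.getAnalysis(jobId, credentials);
  }
}

// Hypothetical collaborator: its constructor demands an already-authenticated client.
final class AnalysisDescriber {
  private final AuthenticatedMetadataClient client;

  AnalysisDescriber(AuthenticatedMetadataClient client) {
    this.client = Objects.requireNonNull(client);
  }

  String describe(long jobId) {
    Analysis analysis = client.getAnalysis(jobId);
    return "job " + analysis.jobId + " fetched with token " + analysis.tokenUsed;
  }
}

final class AuthenticatedClientDemo {
  public static void main(String[] args) {
    // Stub that echoes back the token it was called with.
    JobMetadataClient stub = (jobId, credentials) -> new Analysis(jobId, credentials.token);
    AuthenticatedMetadataClient client =
        new AuthenticatedMetadataClient(stub, new Credentials("secret"));
    System.out.println(new AnalysisDescriber(client).describe(7L));
    // prints "job 7 fetched with token secret"
  }
}
```

Because the credentials field is final and set in the constructor, there is no code path in which AnalysisDescriber holds a client that has not been authenticated.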

Accumulating State in Memory

That was fine and good for an API client, but that class didn’t really need to change state because the real state is held in the API. What about when we want to accumulate state changes in memory before persisting them? Let’s take another example from Rescale’s codebase. We use optimization software that runs an analysis with varying values for initial parameters and selects an “optimal” result. We represent that workflow with a class, say, CaseWorkflow, that will hold the values of the initial parameters for the optimal run once it has been determined. We want to persist those values once everything is completed.
We initially had some very imperative-looking code that performed all the initialization and cleanup actions in a single method:

public void runWorkflow(CaseWorkflow workflow) {
  doSomeInitialization1();
  doSomeInitialization2();
  for (InitialParameters initialParameters : workflow.getParams()) {
    // run parameters and set them on workflow if optimal
  }
  doSomeCleanup1();
  doSomeCleanup2();
  // the important line for this example
  persistOptimalParameters(workflow.getOptimalParameters());
}

We decided to refactor this using lifecycle listeners to separate responsibilities and make the code easier to understand and unit test. We wrote an interface like this:

public interface WorkflowLifecycleListener {
  void notifyWorkflowStarting(CaseWorkflow workflow);
  void notifyParametersRun(InitialParameters initialParameters);
  void notifyWorkflowCompleted();
}

And refactored the original method to use these listeners:

public void run(CaseWorkflow workflow, ListenerFactory factory) {
  Collection<WorkflowLifecycleListener> lifecycleListeners = factory.createListeners(workflow);
  for (WorkflowLifecycleListener listener : lifecycleListeners) {
    listener.notifyWorkflowStarting(workflow);
  }
  for (InitialParameters initialParameters : workflow.getParams()) {
    // process parameters
    for (WorkflowLifecycleListener listener : lifecycleListeners) {
      listener.notifyParametersRun(initialParameters);
    }
  }
  for (WorkflowLifecycleListener listener : lifecycleListeners) {
    listener.notifyWorkflowCompleted();
  }
}

We used a factory object because we wanted a different set of listeners for different types of workflows, but that’s not relevant here. What is relevant is that each listener object was created in scope, and tied to a single workflow. That’s why the following listener made sense at the time:

public class PersistOptimalParametersListener implements WorkflowLifecycleListener {
  private final InitialParameters optimalParameters;
  public PersistOptimalParametersListener(CaseWorkflow workflow) {
    this.optimalParameters = workflow.getOptimalParameters();
  }
  @Override
  public void notifyWorkflowStarting(CaseWorkflow workflow) { }
  @Override
  public void notifyParametersRun(InitialParameters params) { }
  @Override
  public void notifyWorkflowCompleted() {
    persistOptimalParameters(this.optimalParameters);
  }
}

After all this explanation, the mistake seems obvious: the optimal parameters are not yet set on the workflow when this listener is instantiated. At that point they are just an empty collection, but in the midst of refactoring that’s easy to forget. Keeping it in mind requires a lot of context about the entire optimization system. We wondered: could we use finer-grained types to prevent this mistake and communicate workflow state to future developers? Yes.
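The timing problem can be reproduced in a few lines. This is a simplified sketch, not our actual classes: MutableWorkflow, EagerListener, and the use of a List<Double> for the parameters are assumptions, as is the detail that the workflow replaces its parameter collection when the optimum is found.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified workflow: optimal parameters start empty and are replaced later.
final class MutableWorkflow {
  private List<Double> optimalParameters = Collections.emptyList();

  List<Double> getOptimalParameters() {
    return optimalParameters;
  }

  void setOptimalParameters(List<Double> optimalParameters) {
    this.optimalParameters = optimalParameters;
  }
}

// Mirrors the listener above: it reads the parameters in its constructor.
final class EagerListener {
  private final List<Double> captured;

  EagerListener(MutableWorkflow workflow) {
    this.captured = workflow.getOptimalParameters();
  }

  List<Double> parametersToPersist() {
    return captured;
  }
}

final class StaleCaptureDemo {
  static List<Double> run() {
    MutableWorkflow workflow = new MutableWorkflow();
    EagerListener listener = new EagerListener(workflow);   // created before the run starts
    workflow.setOptimalParameters(Arrays.asList(1.5, 2.0)); // optimum found during the run
    return listener.parametersToPersist();                  // still the initial empty list
  }

  public static void main(String[] args) {
    System.out.println(run()); // prints []
  }
}
```

Everything compiles and runs without error; the listener simply persists the empty collection it captured before any parameters had been evaluated.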
A key issue here was the use of getter and setter methods that is common in Java. A class that has a getOptimalParameters method doesn’t tell the developer when that method can be appropriately called. That class uses implicit state changes like our API client that allowed credentials to be set on itself. Instead, we should write the objects so that they don’t have those methods at all if they’re not appropriate to call:

public class CaseWorkflow {
  ...
  public CompletedWorkflow complete(InitialParameters optimalInitialParameters) {
    ...
  }
  ...
}
public class CompletedWorkflow extends CaseWorkflow {
  private final InitialParameters optimalInitialParameters;
  public InitialParameters getOptimalParameters() { ... }
}

As with the AuthenticatedMetadataClient in the first example, we can now write methods that operate on a CompletedWorkflow and be sure about its state. We don’t have to remember all the ins and outs of what gets set when, because the methods available on the class tell us.
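Here is one way the pieces might fit together, as a sketch under stated assumptions. OptimalParametersStore, the contents of InitialParameters, and the package-private constructors are hypothetical; only the CaseWorkflow and CompletedWorkflow relationship comes from the design above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in: a named bundle of parameter values.
final class InitialParameters {
  final List<Double> values;
  InitialParameters(List<Double> values) { this.values = Objects.requireNonNull(values); }
}

class CaseWorkflow {
  // Deliberately no getOptimalParameters() here: an in-progress workflow has no optimum yet.
  CompletedWorkflow complete(InitialParameters optimalInitialParameters) {
    return new CompletedWorkflow(optimalInitialParameters);
  }
}

final class CompletedWorkflow extends CaseWorkflow {
  private final InitialParameters optimalInitialParameters;

  CompletedWorkflow(InitialParameters optimalInitialParameters) {
    this.optimalInitialParameters = Objects.requireNonNull(optimalInitialParameters);
  }

  InitialParameters getOptimalParameters() {
    return optimalInitialParameters;
  }
}

// Hypothetical persister: its signature only accepts a completed workflow.
final class OptimalParametersStore {
  private final List<InitialParameters> saved = new ArrayList<>();

  void persist(CompletedWorkflow workflow) {
    saved.add(workflow.getOptimalParameters());
  }

  List<InitialParameters> saved() {
    return saved;
  }
}

final class CompletedWorkflowDemo {
  public static void main(String[] args) {
    CaseWorkflow workflow = new CaseWorkflow();
    OptimalParametersStore store = new OptimalParametersStore();
    // store.persist(workflow); // would not compile: CaseWorkflow is not a CompletedWorkflow
    CompletedWorkflow completed =
        workflow.complete(new InitialParameters(Arrays.asList(1.5, 2.0)));
    store.persist(completed);
    System.out.println(store.saved().get(0).values); // prints [1.5, 2.0]
  }
}
```

With this shape, the earlier listener bug becomes a compile error rather than a runtime surprise: code that persists optimal parameters has to be handed a CompletedWorkflow, and the only way to get one is to call complete with the optimum in hand.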

Summary

The common factor in these examples was leveraging Java’s type system as a tool for documenting the possible states of objects. With the help of IDE method suggestion, reasoning about objects with informative types is natural and smooth. The types also reduce the context required to correctly understand object behaviour, which is a boon for productivity.

Author

  • Adam McKenzie

    As CTO, Adam is responsible for managing the HPC and customer success teams. Adam began his career at Boeing, where he spent seven years working on the 787, managing structural and software engineering projects designing, analyzing, and optimizing the wing. Adam holds a B.S. in Mechanical Engineering cum laude from Oregon State University.
