Leveraging Static Typing to Manage Object State

There has been a lot of talk in the software engineering world about the pains associated with OOP, and specifically with mutable state. Side-effect-free functional programming is often touted as a solution. While the critics make some salient points about mutability, the real issues arise from naive use of implicitly mutable objects. Done properly, mutable state can be a useful tool for reasoning about code, and it can guide future developers naturally towards correct code changes. In this post we’ll share some of the headaches that led us to a principle of class design: using types to indicate object state.

Stateful Collaborators

Let’s start with an example. At Rescale we use a REST API to keep track of metadata about a job. Our worker nodes query the API to get information about what virtual hardware to spin up, and our cluster nodes query it to get information about what analysis to run. Both of them authenticate by passing an encrypted token. Our first client interface looked something like this:

public interface JobMetadataClient {
  Analysis getAnalysis(long jobId, Credentials credentials);
  CoreSummary getCoreSummary(long jobId, Credentials credentials);
  ...
}

This interface was good for ensuring that we always passed the correct credentials with each request, but it was a little cumbersome. As we refactored old code to use the new API client, we had to create credentials for each method call. Each worker takes a task for a job and then makes many requests using that job’s authentication token, so we constantly had to create identical credentials objects or pass one around. That made it a hassle to rely on the API as heavily as we wanted.

Our next thought was that a worker could set credentials on its API client once, when it pulls a task for a job, and then let all of the helper objects use the client on the assumption that it already had its credentials. The interface then looked something like:

public interface JobMetadataClient {
  void setCredentials(Credentials credentials);
  Analysis getAnalysis(long jobId);
  CoreSummary getCoreSummary(long jobId);
  ...
}

Most calling code could now just call the query methods without worrying about credentials. This made the API client methods much easier to use, but whenever we introduced the client into a new area of the codebase we would forget to set credentials. It also led us to write code that made implicit assumptions about the context in which it ran, and that was therefore less reusable and more fragile. Changes to one part of the system had the potential to break other parts, which is one of the principal things to avoid in software systems.

The first implementation is what I’ll call a single-state collaborator. It is just a bag of methods, and developers can easily reason about its behavior when they have an instance. The second does not make things so apparent because it has an implicit state change. If a developer gets hold of an instance, they see that they can call query methods and that those methods will probably work. Understanding the authentication state requires more knowledge of the system.
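The implicit state change can be made concrete with a small sketch. Everything here beyond the interface shape is an assumption for illustration: the `InMemoryJobMetadataClient` class, the fields on `Credentials` and `Analysis`, and the choice to fail with an `IllegalStateException` are hypothetical, since the post does not show how the real client behaved without credentials.

```java
// Hypothetical stand-ins for the types named in the post.
final class Credentials {
  final String token;
  Credentials(String token) { this.token = token; }
}

final class Analysis {
  final String description;
  Analysis(String description) { this.description = description; }
}

// The second interface from the post, reduced to one query method.
interface JobMetadataClient {
  void setCredentials(Credentials credentials);
  Analysis getAnalysis(long jobId);
}

// Hypothetical implementation: nothing in its type says whether it is authenticated.
final class InMemoryJobMetadataClient implements JobMetadataClient {
  private Credentials credentials; // null until setCredentials is called

  @Override
  public void setCredentials(Credentials credentials) {
    this.credentials = credentials;
  }

  @Override
  public Analysis getAnalysis(long jobId) {
    if (credentials == null) {
      // The mistake only shows up at runtime, far from the code that forgot the setter.
      throw new IllegalStateException("credentials not set");
    }
    return new Analysis("analysis for job " + jobId);
  }
}
```

Both the authenticated and the unauthenticated client have the same static type, so the compiler cannot tell a correct call site from an incorrect one.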
There’s a better design that leverages static typing to make the authentication state immediately apparent to future developers. We can go back to our first client interface and require credentials on every method call, but also provide a wrapper class whose type indicates its state:

public interface JobMetadataClient {
  Analysis getAnalysis(long jobId, Credentials credentials);
  CoreSummary getCoreSummary(long jobId, Credentials credentials);
  ...
}
...
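public class AuthenticatedMetadataClient {
  private final Credentials credentials;
  private final JobMetadataClient metadataClient;
  // Credentials are fixed at construction, so an instance is always authenticated.
  public AuthenticatedMetadataClient(JobMetadataClient metadataClient, Credentials credentials) {
    this.metadataClient = metadataClient;
    this.credentials = credentials;
  }
  public Analysis getAnalysis(long jobId) {
    return this.metadataClient.getAnalysis(jobId, this.credentials);
  }
  ...
}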

Now if a developer works on a class that has an instance of AuthenticatedMetadataClient as a collaborator, they know for sure that it has authentication and that it will not lose it. If we write new classes that take an instance of AuthenticatedMetadataClient in their constructors, those classes can only be used when authentication has already been provided. Future developers will see from the class what they can do with the client objects, and their IDEs will suggest appropriate methods. They won’t need to keep as much information about the whole system in their heads in order to reason about parts of it. Those are powerful tools for working in the codebase.
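As a minimal sketch of that last point, here is a collaborator that can only be built from an authenticated client. The `AnalysisRunner` class and the fields on `Credentials` and `Analysis` are hypothetical names introduced for illustration; only `JobMetadataClient` and `AuthenticatedMetadataClient` come from the post.

```java
import java.util.Objects;

// Hypothetical stand-ins for the types named in the post.
final class Credentials {
  final String token;
  Credentials(String token) { this.token = token; }
}

final class Analysis {
  final String description;
  Analysis(String description) { this.description = description; }
}

// The single-state client: every call needs credentials.
interface JobMetadataClient {
  Analysis getAnalysis(long jobId, Credentials credentials);
}

// The wrapper: holding an instance means credentials were supplied at construction.
final class AuthenticatedMetadataClient {
  private final Credentials credentials;
  private final JobMetadataClient metadataClient;

  AuthenticatedMetadataClient(JobMetadataClient metadataClient, Credentials credentials) {
    this.metadataClient = Objects.requireNonNull(metadataClient);
    this.credentials = Objects.requireNonNull(credentials);
  }

  Analysis getAnalysis(long jobId) {
    return metadataClient.getAnalysis(jobId, credentials);
  }
}

// Hypothetical collaborator: its constructor signature documents that it needs authentication.
final class AnalysisRunner {
  private final AuthenticatedMetadataClient client;

  AnalysisRunner(AuthenticatedMetadataClient client) {
    this.client = client;
  }

  String describe(long jobId) {
    return client.getAnalysis(jobId).description;
  }
}
```

Passing a bare `JobMetadataClient` to `AnalysisRunner` is a compile error, so forgetting to authenticate is caught before the code runs.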

Accumulating State in Memory

That was well and good for an API client, but that class didn’t really need to change state because the real state is held in the API. What about when we want to accumulate state changes in memory before persisting them? Let’s take another example from Rescale’s codebase. We use optimization software that runs an analysis with varying values for initial parameters and selects an “optimal” result. We represent that workflow with a class, say, CaseWorkflow, that will hold the values of the initial parameters for the optimal run once it has been determined. We want to persist those values once everything is completed.

So we initially had some very imperative-looking code that performed all the initialization and cleanup actions in a single method:

public void runWorkflow(CaseWorkflow workflow) {
  doSomeInitialization1();
  doSomeInitialization2();
  for (InitialParameters initialParameters : workflow.getParams()) {
    // run parameters and set them on workflow if optimal
  }
  doSomeCleanup1();
  doSomeCleanup2();
  // the important line for this example
  persistOptimalParameters(workflow.getOptimalParameters());
}

We decided to refactor this using lifecycle listeners to separate responsibilities and make the code easier to understand and unit test. We wrote an interface like this:

public interface WorkflowLifecycleListener {
  void notifyWorkflowStarting(CaseWorkflow workflow);
  void notifyParametersRun(InitialParameters initialParameters);
  void notifyWorkflowCompleted();
}

And refactored the original method to use these listeners:

public void run(CaseWorkflow workflow, ListenerFactory factory) {
  Collection<WorkflowLifecycleListener> lifecycleListeners = factory.createListeners(workflow);
  for (WorkflowLifecycleListener listener : lifecycleListeners) {
    listener.notifyWorkflowStarting(workflow);
  }
  for (InitialParameters initialParameters : workflow.getParams()) {
    // process parameters
    for (WorkflowLifecycleListener listener : lifecycleListeners) {
      listener.notifyParametersRun(initialParameters);
    }
  }
  for (WorkflowLifecycleListener listener : lifecycleListeners) {
    listener.notifyWorkflowCompleted();
  }
}

We used a factory object because we wanted a different set of listeners for different types of workflows, but that’s not relevant here. What is relevant is that each listener object was created in scope, and tied to a single workflow. That’s why the following listener made sense at the time:

public class PersistOptimalParametersListener implements WorkflowLifecycleListener {
  private final InitialParameters optimalParameters;
  public PersistOptimalParametersListener(CaseWorkflow workflow) {
    this.optimalParameters = workflow.getOptimalParameters();
  }
  @Override
  public void notifyWorkflowStarting(CaseWorkflow workflow) { }
  @Override
  public void notifyParametersRun(InitialParameters params) { }
  @Override
  public void notifyWorkflowCompleted() {
    persistOptimalParameters(this.optimalParameters);
  }
}

After all this explanation, the mistake seems obvious: the optimal parameters are not yet set on the workflow when this listener is instantiated. At that point they are just an empty collection, but in the midst of refactoring that’s easy to forget. Keeping it in mind requires a lot of context about the entire optimization system. We wondered: could we use finer-grained types to prevent this mistake and communicate workflow state to future developers? Yes.
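The timing problem can be reproduced in a few lines. This is a minimal sketch rather than the production code: the `InitialParameters` and `CaseWorkflow` classes below are hypothetical stand-ins (the post does not show them), `EagerPersistListener` keeps only the one callback that matters, and an in-memory list stands in for the real persistence call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in: an immutable bag of parameter values.
final class InitialParameters {
  final List<Double> values;
  InitialParameters(List<Double> values) {
    this.values = Collections.unmodifiableList(new ArrayList<>(values));
  }
}

// Hypothetical stand-in: optimal parameters start empty and are set late in the run.
final class CaseWorkflow {
  private InitialParameters optimalParameters =
      new InitialParameters(Collections.<Double>emptyList());

  InitialParameters getOptimalParameters() { return optimalParameters; }

  void setOptimalParameters(InitialParameters optimalParameters) {
    this.optimalParameters = optimalParameters;
  }
}

// Mirrors the listener in the post: it reads the workflow in its constructor.
final class EagerPersistListener {
  private final InitialParameters optimalParameters;
  // Stands in for the real persistence call.
  final List<InitialParameters> persisted = new ArrayList<>();

  EagerPersistListener(CaseWorkflow workflow) {
    // Too early: the workflow has not run yet, so this captures the empty value.
    this.optimalParameters = workflow.getOptimalParameters();
  }

  void notifyWorkflowCompleted() {
    persisted.add(optimalParameters);
  }
}
```

If the listener is created before the workflow runs, it persists the empty parameters it captured, even though the workflow holds the real optimum by the time `notifyWorkflowCompleted` is called.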
A key issue here was the use of getter and setter methods that is common in Java. A class that has a getOptimalParameters method doesn’t tell the developer when that method can appropriately be called. Like our API client that allowed credentials to be set on it, that class changes state implicitly. Instead, we should write the objects so that they don’t have those methods at all if they’re not appropriate to call:

public class CaseWorkflow {
  ...
  public CompletedWorkflow complete(InitialParameters optimalInitialParameters) {
    ...
  }
  ...
}
public class CompletedWorkflow extends CaseWorkflow {
  private final InitialParameters optimalInitialParameters;
  public InitialParameters getOptimalParameters() { ... }
}

As with the AuthenticatedMetadataClient in the first example, we can now write methods that operate on a CompletedWorkflow and be sure about its state. We don’t have to remember all the ins and outs of what gets set when, because the methods available on the class tell us.
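To make this concrete, here is one way the two classes could be filled in. It is a sketch under assumptions: the constructors, the `label` field on `InitialParameters`, and the `OptimalParametersStore` class are hypothetical and are not taken from Rescale’s code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for the parameter type named in the post.
final class InitialParameters {
  final String label;
  InitialParameters(String label) { this.label = label; }
}

class CaseWorkflow {
  private final List<InitialParameters> params;

  CaseWorkflow(List<InitialParameters> params) {
    this.params = Collections.unmodifiableList(new ArrayList<>(params));
  }

  List<InitialParameters> getParams() { return params; }

  // No getOptimalParameters here: before completion there is nothing meaningful to return.
  CompletedWorkflow complete(InitialParameters optimalInitialParameters) {
    return new CompletedWorkflow(params, optimalInitialParameters);
  }
}

final class CompletedWorkflow extends CaseWorkflow {
  private final InitialParameters optimalInitialParameters;

  CompletedWorkflow(List<InitialParameters> params, InitialParameters optimalInitialParameters) {
    super(params);
    this.optimalInitialParameters = Objects.requireNonNull(optimalInitialParameters);
  }

  InitialParameters getOptimalParameters() { return optimalInitialParameters; }
}

// Hypothetical persistence step; the list stands in for a real store.
final class OptimalParametersStore {
  private final List<InitialParameters> saved = new ArrayList<>();

  // Accepts only a CompletedWorkflow, so a workflow that is still running cannot be passed.
  void persist(CompletedWorkflow workflow) {
    saved.add(workflow.getOptimalParameters());
  }

  List<InitialParameters> saved() { return saved; }
}
```

Calling `persist` with a plain `CaseWorkflow` does not compile, which is exactly the mistake the earlier listener made at runtime. Because `CompletedWorkflow` extends `CaseWorkflow`, as in the post, it also inherits `complete`; a stricter variant might make the two types siblings instead.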

Summary

The common factor in these examples was leveraging Java’s type system as a tool for documenting the possible states of objects. With the help of IDE method suggestion, reasoning about objects with informative types is natural and smooth. The types also reduce the context required to correctly understand object behaviour, which is a boon for productivity.

Adam McKenzie

As CTO, Adam is responsible for managing the HPC and customer success teams. Adam began his career at Boeing, where he spent seven years working on the 787, managing structural and software engineering projects designing, analyzing, and optimizing the wing. Adam holds a B.S. in Mechanical Engineering cum laude from Oregon State University.
