Writing readable code is essential for the maintainability of any application, and there are many techniques that help us achieve it. This article discusses one such technique, which combines Algebraic Data Types and Pattern Matching and builds on several important features from Project Amber [1]. We will first discuss what Algebraic Data Types are, and then look at some of the Pattern Matching features in Java. Finally, we will work through an example where we use them together to write readable code.
Algebraic Data Types
Algebraic Data Types (ADTs) are data-types that are created by combining existing data-types, mainly by applying algebraic operations to them. There are two main categories of ADTs: Product Types and Sum Types. Let’s explore them both.
Product Types
In Product Types, each possible value of the resulting data-type has a value for each of its constituent types. For example, we can create a Point type by combining two int types (the two coordinates of a point).
In Java, a product-type can be implemented using a record [2].
record Point(int x, int y) { }
Another example is an Address, which has a houseNumber, a street, a city and a country. In Java this can be implemented as:
record Address(int houseNumber, String street, String city, String country) { }
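As a small illustration of what a record provides out of the box, the sketch below uses the accessors and the value-based equals() that the compiler generates. The class name ProductTypeDemo, the describe() helper and the sample address are made up for this example, and it assumes Java 16 or later.

```java
// Hypothetical demo class; Point and Address mirror the records shown above.
class ProductTypeDemo {
    record Point(int x, int y) { }
    record Address(int houseNumber, String street, String city, String country) { }

    // Formats an address using only the generated accessor methods.
    static String describe(Address a) {
        return a.houseNumber() + " " + a.street() + ", " + a.city() + ", " + a.country();
    }

    public static void main(String[] args) {
        Point p = new Point(3, 4);
        // Accessors are named after the record components.
        System.out.println(p.x() + ", " + p.y());
        // Records with equal components are equal to each other.
        System.out.println(p.equals(new Point(3, 4)));
        System.out.println(describe(new Address(221, "Baker Street", "London", "UK")));
    }
}
```

Because every component is final, a Point or Address cannot change after it is created, which is part of what makes records a natural fit for modelling plain data.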
Sum Types
In Sum Types, each possible value of the resulting data-type is one of the values of its constituent types. For example, let’s say a shape can either be a triangle, a quadrilateral or a circle.
In Java, we can implement this by creating a sealed interface [3] called Shape with three implementations: a Triangle, a Quadrilateral and a Circle. This way, any instance of Triangle, Quadrilateral or Circle is of type Shape. Also, a Shape instance can only be one of these three implementation types and nothing else.
sealed interface Shape {
    record Triangle(Point p1, Point p2, Point p3) implements Shape { }
    record Quadrilateral(Point p1, Point p2, Point p3, Point p4) implements Shape { }
    record Circle(Point p1, int radius) implements Shape { }
}
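The names "product" and "sum" come from counting the possible values of the combined type. The sketch below makes that arithmetic concrete with two small enums. All of the names in it (AdtCountingDemo, Color, Size, Item, Label and so on) are invented for illustration, null is ignored, and Java 17 or later is assumed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: every name here is invented for illustration.
class AdtCountingDemo {
    enum Color { RED, GREEN, BLUE }   // 3 possible values
    enum Size { SMALL, LARGE }        // 2 possible values

    // Product type: every Item holds a Color AND a Size.
    record Item(Color color, Size size) { }

    // Sum type: every Label is a ByColor OR a BySize.
    sealed interface Label { }
    record ByColor(Color color) implements Label { }
    record BySize(Size size) implements Label { }

    // Enumerates every possible Item: 3 * 2 = 6 values.
    static List<Item> allItems() {
        List<Item> items = new ArrayList<>();
        for (Color c : Color.values()) {
            for (Size s : Size.values()) {
                items.add(new Item(c, s));
            }
        }
        return items;
    }

    // Enumerates every possible Label: 3 + 2 = 5 values.
    static List<Label> allLabels() {
        List<Label> labels = new ArrayList<>();
        for (Color c : Color.values()) {
            labels.add(new ByColor(c));
        }
        for (Size s : Size.values()) {
            labels.add(new BySize(s));
        }
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(allItems().size());   // prints 6
        System.out.println(allLabels().size());  // prints 5
    }
}
```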
Pattern Matching
Pattern Matching is checking for the presence of a pattern in a given sequence of tokens. There are many pattern-matching features in Java introduced as part of Project Amber. Let’s discuss some of these features, which are needed for our example later on.
Pattern Matching for Instanceof
instanceof is an operator that you can use to check if an object is an instance of a class or interface. This way you can safely cast the object to the class or interface. Typically, this is followed by using the object and accessing the members of the class or interface.
Pattern matching for instanceof lets you do all of that in a single expression. Consider the traditional form first:
if (object instanceof Point) {
    Point p = (Point) object;
    double dist = Math.sqrt(p.x() * p.x() + p.y() * p.y());
}
Here we check whether object is an instance of Point, and if it is, we create a variable p of type Point (by casting object to Point) that we can use in the subsequent code. With pattern matching for instanceof, the check, the cast and the declaration collapse into one [4]:
if (object instanceof Point p) {
    double dist = Math.sqrt(p.x() * p.x() + p.y() * p.y());
}
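One detail worth knowing is where the pattern variable is in scope: it is available wherever the compiler can prove that the match succeeded. The following sketch shows two common shapes of this. The class and method names are hypothetical, and returning -1 or false for non-points is just a convention chosen for the example. It assumes Java 16 or later.

```java
// Hypothetical demo of pattern-variable scoping with instanceof.
class InstanceofPatternDemo {
    record Point(int x, int y) { }

    // Returns the distance from the origin, or -1 when obj is not a Point.
    static double distanceFromOrigin(Object obj) {
        if (!(obj instanceof Point p)) {
            return -1; // p is NOT in scope here: the match failed
        }
        // p IS in scope here, because the method returned otherwise.
        return Math.sqrt(p.x() * p.x() + p.y() * p.y());
    }

    // The pattern variable can also be used on the right-hand side of &&.
    static boolean isOnXAxis(Object obj) {
        return obj instanceof Point p && p.y() == 0;
    }

    public static void main(String[] args) {
        System.out.println(distanceFromOrigin(new Point(3, 4)));
        System.out.println(distanceFromOrigin("not a point"));
        System.out.println(isOnXAxis(new Point(7, 0)));
    }
}
```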
Record Patterns
In the above example, if Point is a record, we can deconstruct [5] it into its constituent members. As shown in the example below, we can access the x and y coordinates of the point directly and calculate the distance of the point from the origin.
if (object instanceof Point(int x, int y)) {
    double dist = Math.sqrt(x * x + y * y);
}
We can also apply the deconstruction pattern recursively. For example,
if (shape instanceof Circle(Point(int x, int y), int radius)) {
    double dist = Math.sqrt(x * x + y * y) - radius;
}
Here, we check if the given shape is a circle, and if it is, we deconstruct the circle into a point (center of the circle) and radius. And then we further deconstruct the point into its x and y coordinates.
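To try the nested pattern end to end, here is a minimal self-contained sketch. The class name RecordPatternDemo and the method gapToOrigin() are hypothetical, the record component is called center here for readability, and returning NaN for non-circles is an arbitrary choice for the example. It assumes Java 21 or later, and it uses var in the pattern to let the compiler infer the component types.

```java
// Hypothetical demo of a nested record pattern.
class RecordPatternDemo {
    record Point(int x, int y) { }

    sealed interface Shape { }
    record Triangle(Point p1, Point p2, Point p3) implements Shape { }
    record Circle(Point center, int radius) implements Shape { }

    // Distance from the origin to the circle's centre, minus the radius.
    // Returns NaN when the argument is not a Circle.
    static double gapToOrigin(Object shape) {
        // One pattern deconstructs the Circle and its centre Point together.
        if (shape instanceof Circle(Point(var x, var y), var radius)) {
            return Math.sqrt(x * x + y * y) - radius;
        }
        return Double.NaN;
    }

    public static void main(String[] args) {
        System.out.println(gapToOrigin(new Circle(new Point(3, 4), 2)));
        System.out.println(gapToOrigin(
            new Triangle(new Point(0, 0), new Point(1, 0), new Point(0, 1))));
    }
}
```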
Pattern Matching for Switch
Record patterns can also be used in switch expressions and statements [6]. For example,
int sides = switch (shape) {
    case Triangle(Point p1, Point p2, Point p3) -> 3;
    case Quadrilateral(Point p1, Point p2, Point p3, Point p4) -> 4;
    case Circle(Point p1, int radius) -> 0;
};
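In the example above, we calculate the number of sides of a shape. When the shape is a triangle, we return 3. When the shape is a quadrilateral, we return 4, and when the shape is a circle, we return 0. Because Shape is sealed and all three implementations are covered, the switch is exhaustive and needs no default branch.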
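Pattern matching for switch also supports guards, written with when, which refine a case with an extra condition. The sketch below is an illustration only: the class SwitchPatternDemo, the describe() method and its wording are made up, and Java 21 or later is assumed.

```java
// Hypothetical demo of an exhaustive switch with a guarded case.
class SwitchPatternDemo {
    record Point(int x, int y) { }

    sealed interface Shape { }
    record Triangle(Point p1, Point p2, Point p3) implements Shape { }
    record Quadrilateral(Point p1, Point p2, Point p3, Point p4) implements Shape { }
    record Circle(Point center, int radius) implements Shape { }

    static String describe(Shape shape) {
        // No default branch: the sealed hierarchy makes the switch exhaustive.
        return switch (shape) {
            case Triangle t -> "triangle with 3 sides";
            case Quadrilateral q -> "quadrilateral with 4 sides";
            // The guarded case has to come before the unguarded Circle case.
            case Circle(Point c, int radius) when radius == 0 -> "degenerate circle (a point)";
            case Circle(Point c, int radius) -> "circle of radius " + radius;
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(new Circle(new Point(0, 0), 0)));
        System.out.println(describe(new Circle(new Point(1, 1), 5)));
    }
}
```

If a fourth implementation were later added to Shape, this switch would stop compiling until a matching case was added, which is how the compiler helps keep such code complete.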
Unnamed Variables and Patterns
When a variable is unused, we can be explicit about this by using an underscore as the name of the variable. That way we tell the compiler (and readers of the code) that we don’t plan to use it.
For example, the above example can be rewritten with all variables in record patterns as underscores [7]:
int sides = switch (shape) {
    case Triangle(Point _, Point _, Point _) -> 3;
    case Quadrilateral(Point _, Point _, Point _, Point _) -> 4;
    case Circle(Point _, int _) -> 0;
};
We can make it even more concise by not deconstructing the records at all:
int sides = switch (shape) {
    case Triangle _ -> 3;
    case Quadrilateral _ -> 4;
    case Circle _ -> 0;
};
Using Them All Together
Let’s now take a look at how we can use all these features together.
What we want to do is write a program that can evaluate expressions such as the following, for a given set of variable values:
4x + 3y + 2z + 1
23ab + 34bc + 45ac
If we analyse the expressions above, we notice that there are some variables, and that the expressions are created by applying some operations to these variables. The variables in the above examples are x, y, z and a, b, c. The operations used in the above examples are addition and multiplication.
If we think of all possible expressions that we can form by these rules, we see that a valid expression can be a constant, a variable, the addition of two valid expressions or the multiplication of two valid expressions. We can denote this with a grammar like this:
Expression = a constant
| a variable
| an expression + an expression
| an expression * an expression
We can create all valid expressions using this grammar, and any expression created by this grammar will be a valid expression.
This tells us that an expression can be one of four things: a constant, a variable, an addition of two expressions or a multiplication of two expressions. That means an expression is a sum-type. So let’s implement this using a sealed interface as below:
sealed interface Expression {
    record ConstExpr(int value) implements Expression { }
    record VarExpr(String name) implements Expression { }
    record AdditionExpr(Expression left, Expression right)
        implements Expression { }
    record MultiplicationExpr(Expression left, Expression right)
        implements Expression { }
}
Here we have four implementations of Expression, viz., ConstExpr (a constant expression), VarExpr (a variable expression), AdditionExpr (an addition expression) and MultiplicationExpr (a multiplication expression). All four are records. ConstExpr holds a single integer value. VarExpr has a variable name as its member. AdditionExpr and MultiplicationExpr both have two expressions (left and right) as their members. So these four are product-types, and Expression is a sum-type. By using Algebraic Data Types for the definition of Expression, we ensure that any instance of Expression will be a valid expression.
Now for the evaluation part, we add a default method evaluate() to Expression. It evaluates the expression it is called on (this) and takes a single argument: a Map<String, Integer> of variable values. We can implement this method as below:
default int evaluate(Map<String, Integer> values) {
    return switch (this) {
        case ConstExpr(var value) -> value;
        case VarExpr(var name) -> values.get(name);
        case AdditionExpr(var left, var right) ->
            left.evaluate(values) + right.evaluate(values);
        case MultiplicationExpr(var left, var right) ->
            left.evaluate(values) * right.evaluate(values);
    };
}
Here, we use pattern matching in a switch expression to check what type of expression we have, and we evaluate it differently depending on that type. If the expression is a constant expression, we already know its value and can simply return it. If it is a variable expression, we look up the value in the map passed as the argument and return it. If it is an addition expression, we recursively evaluate the left and right expressions and return the sum of the two results. The same goes for a multiplication expression, with the only difference that we return the product of the two results.
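One thing to be aware of in the version above: Map.get() returns null for a variable that has no entry, and unboxing that null to int throws a NullPointerException. The following sketch shows one possible way to report a missing variable more clearly. It is a variation for illustration, not part of the original design: it is written as a static method in a hypothetical class SafeEvalDemo so that it compiles on its own, and the choice of IllegalArgumentException and its message are assumptions. Java 21 or later is assumed.

```java
import java.util.Map;

// Hypothetical variant of evaluate() that fails clearly on unknown variables.
class SafeEvalDemo {
    sealed interface Expression { }
    record ConstExpr(int value) implements Expression { }
    record VarExpr(String name) implements Expression { }
    record AdditionExpr(Expression left, Expression right) implements Expression { }
    record MultiplicationExpr(Expression left, Expression right) implements Expression { }

    static int evaluate(Expression expr, Map<String, Integer> values) {
        return switch (expr) {
            case ConstExpr(var value) -> value;
            case VarExpr(var name) -> {
                Integer value = values.get(name);
                if (value == null) {
                    // Assumed error-handling policy for this sketch.
                    throw new IllegalArgumentException("No value for variable: " + name);
                }
                yield value;
            }
            case AdditionExpr(var left, var right) ->
                evaluate(left, values) + evaluate(right, values);
            case MultiplicationExpr(var left, var right) ->
                evaluate(left, values) * evaluate(right, values);
        };
    }

    public static void main(String[] args) {
        // 4x + 1
        Expression expr = new AdditionExpr(
            new MultiplicationExpr(new ConstExpr(4), new VarExpr("x")),
            new ConstExpr(1));
        System.out.println(evaluate(expr, Map.of("x", 10)));
    }
}
```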
Now using this, let’s see how we can evaluate the two example expressions. Before that, let’s add some methods in Expression to make creating expressions easier:
static Expression constant(int value) {
    return new ConstExpr(value);
}
static Expression varExpr(String name) {
    return new VarExpr(name);
}
default Expression plus(Expression other) {
    return new AdditionExpr(this, other);
}
default Expression times(Expression other) {
    return new MultiplicationExpr(this, other);
}
The first expression can be evaluated as below (for values: x=10, y=15 and z=20):
// 4x + 3y + 2z + 1
static void main() {
    var expr1 =
        constant(4).times(varExpr("x"))
            .plus(
                constant(3).times(varExpr("y"))
            ).plus(
                constant(2).times(varExpr("z"))
            ).plus(
                constant(1)
            );
    var values = Map.of("x", 10, "y", 15, "z", 20);
    System.out.println(expr1.evaluate(values));
}
// Output:
// 126
And the second one can be evaluated as follows (for values: a=10, b=15 and c=20):
// 23ab + 34bc + 45ac
static void main() {
    var expr2 =
        constant(23).times(varExpr("a")).times(varExpr("b"))
            .plus(
                constant(34).times(varExpr("b")).times(varExpr("c"))
            ).plus(
                constant(45).times(varExpr("a")).times(varExpr("c"))
            );
    var values = Map.of("a", 10, "b", 15, "c", 20);
    System.out.println(expr2.evaluate(values));
}
// Output:
// 22650
Data Oriented Programming
If we analyse the implementation of the example above, we notice that it is divided into two parts: the definition of the data-model (the Expression type) and the implementation of the behaviour (the evaluate() method).
To define the data-model, we used Algebraic Data Types, viz., records and sealed-types. This ensures that only valid states are allowed to exist. To define the behaviour, we used pattern matching, which keeps the code easy to read and easy to understand. This style is also known as Data Oriented Programming (DOP) [8] [9].
DOP is based on the following principles:
- Model data immutably and transparently.
- Model the data, the whole data, and nothing but the data.
- Make illegal states unrepresentable.
- Separate operations from data.
If we follow these principles, we can utilize Data Oriented Programming to greatly increase the readability of Java programs.
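As a sketch of the "separate operations from data" principle, the example below adds a second operation, a pretty-printer, without changing the Expression records at all. The class ExpressionPrinterDemo, the method show() and its output format are invented for this illustration, and Java 21 or later is assumed.

```java
// Hypothetical demo: a new operation added alongside the data model.
class ExpressionPrinterDemo {
    sealed interface Expression { }
    record ConstExpr(int value) implements Expression { }
    record VarExpr(String name) implements Expression { }
    record AdditionExpr(Expression left, Expression right) implements Expression { }
    record MultiplicationExpr(Expression left, Expression right) implements Expression { }

    // Renders an expression as text. Additions are always parenthesised so
    // that the printed form keeps the structure of the expression tree.
    static String show(Expression expr) {
        return switch (expr) {
            case ConstExpr(var value) -> Integer.toString(value);
            case VarExpr(var name) -> name;
            case AdditionExpr(var left, var right) ->
                "(" + show(left) + " + " + show(right) + ")";
            case MultiplicationExpr(var left, var right) ->
                show(left) + " * " + show(right);
        };
    }

    public static void main(String[] args) {
        // 4x + 1
        Expression expr = new AdditionExpr(
            new MultiplicationExpr(new ConstExpr(4), new VarExpr("x")),
            new ConstExpr(1));
        System.out.println(show(expr)); // prints (4 * x + 1)
    }
}
```

The data model stays untouched, and the exhaustive switch means the compiler would flag show() if a new kind of Expression were introduced later.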
References
[1] https://openjdk.org/projects/amber/
[2] https://openjdk.org/jeps/395
[3] https://openjdk.org/jeps/409
[4] https://openjdk.org/jeps/394
[5] https://openjdk.org/jeps/440
[6] https://openjdk.org/jeps/441
[7] https://openjdk.org/jeps/456
[8] https://www.infoq.com/articles/data-oriented-programming-java/
[9] https://inside.java/2024/05/23/dop-v1-1-introduction/