Expressions are difficult to represent because they form a matrix of operations and data types. Languages make it either easy to add new operations but difficult to add data types or easy to add new data types but hard to add new operations. This problem isn't limited to expressions, of course, but it is a very clear and familiar example so I will use it to characterize the more general problem. I will present four ways to approach the problem, each with their particular strengths and weaknesses. For my examples I representing a simple expression language comprising of the following expression forms,
number Literal expr + expr Add expr - expr Subtract expr * expr Multiply expr / expr Divide
Some operations I want to perform on these expressions include evaluating expressions and printing expressions. The operations and the expressions form a matrix of functionality, expression kind by operation,
Evaluate Literal evaluate literal print literal Add evaluate add print add Subtract evaluate subtract print subtract Multiply evaluate multiply print multiply Divide evaluate divide print divide
Languages either make it easier to add new columns, that is operations, or add new rows, that is new expression kinds. Procedural and functional languages typically make it easy to add new operations. Object-oriented languages typically make it easier to add new expression kinds represented as new data types.
The first approach I will demonstrate is procedural. I define a data type,
Expr, that will hold the operator and the operands.
The expression evaluation operation for this data structure looks like,
The advantages to a procedural approach are,
- It is very simple to add additional operations (such as
- All the logic for an operation is centralized in one place, and often in a single routine.
- New operations can be added without having the modify existing operations.
- New operations can be also be added dynamically without affecting or needing to recompile existing operations.
To demonstrate some of the advantages of using a procedural approach we will add a print operation. It looks like,2
Expr type doesn't
require any modifications to
Some disadvantages to a procedural approach are,
- It is difficult to add additional data types (in our example, each data type represents a different operator) because adding a new type affects all operations.
- Each operation must account for every data type.
- The internal representation of the data type must be accessible to every operation. This makes even minor changes to the data layout have far reaching effects and often leads to difficult to maintain implementations.
- It is very difficult, and requires careful planning, to dynamically add data types.
- Efficient storage of the data types requires variant record support in the language, as discussed in note 1 below, (which can be simulated though inheritance in object-oriented languages but I don't provide and example of that).
To demonstrate the difficulty of adding a data type, let's add power support.
expr ^ expr Power
This represents the expression xn. To implement this we first we need to modify the enumeration
Next we need to add an additional case statement to
An finally we need to also add another case statement to
Note we had to modify both operations and the
Oper enumeration. If we had several
operations, we would have had
to modify most, if not all, of them. Also, note that the compiler didn't help us
find any of the places we need to modify since the code was legal after each of the above modifications.
If we had made the changes to just
example, the compiler wouldn't have complained, but we would receive a runtime
error if we called
Print() with an expression tree
that contains a
To learn more about the expression problem and its history, see Kim Bruce's excellent paper, Some Challenging Typing Issues in Object-Oriented Languages. Be warned, it will be a bit of a spoiler for some of what I will cover.
1 - A casual observer will notice that
Expr is a bit wasteful in that it allocates
room for a value for every node instead of only for literals. It also takes up
Right for literals. In C#, removing this waste is a bit awkward
to express and makes the
Evaluate() method more
complicated, so I ignored the waste in favor of simplicity.
In C++, this is a bit less awkward, and might look like,
with a very similar evaluation function, but for now we will stick with C#. << back <<