Generalizing PEMDAS: What is an operator?
By Robert Jacobson
In programming languages, an operator is a symbol used to represent a specific operation such as subtraction of integers or dereferencing a pointer. The symbols +
, *
, and !
are commonly used as operators in many programming languages, for example. In this article we define some basic programming language terminology that allows us to categorize operators according to their different properties.
Computation with Expressions
Virtually all programming languages have expressions, like 2 + 4
, which consist of operators (the +
operator in 2 + 4
) and their operands (or arguments; the 2
and the 4
), that is, the things operators operate on. This language is barrowed from mathematics where, for example, $17/2 + \sin(\pi/2)$ is an expression that evaluates to the value $19/2$. Most programming languages also have other language constructs like statements and declarations that are not expressions. In some languages, like Wolfram Language, everything is an expression. In these languages, an entire program itself is an expression, albeit possibly a very complex one, and executing a program is called evaluation.
Properties of Operators
Arity
The arity of an operator is just a fancy word for how many operands (arguments) the operator takes. Here are some common examples:
Arity | Operator | Example Expression |
---|---|---|
“Nullary” (takes no arguments) | Constants | 7 |
Unary | Factorial: ! | $3! = 3 \cdot 2 \cdot 1 = 6$ |
Binary | Addition: + | $2 + 4 = 6$ |
Ternary | The conditional operator found in many languages: a ? b : c | // if a>b then x else y result = a > b ? x : y; |
Variadic (takes a variable number of arguments) | Lists: {x, y, ...} | { } (empty list);{2, 3, 5, 7, 11} |
$n$-ary (takes $n$ arguments) | A function of $n$ arguments: $f(a_1, a_2, \ldots, a_n)$ | DeleteCases[expr, pattern, levelspec, n] |
You will occassionally see authors abuse the language above by using n-ary to mean variadic. The Wikipedia article on arity has a long list of obscure names for different arities.
Affix
The affix of an operator refers to the operator’s position within an expression with respect to its operands. Again, some common examples:
Affix | Operator | Example Expression |
---|---|---|
Prefix: precedes its operand(s) | Unary minus: - | $-7$ |
Postfix: follows its operand(s) | Factorial: ! | $3!$ |
Infix: between its operands | Addition: + | $2 + 4$ |
Matchfix (Circumfix): surrounds its operand(s) | Floor function: ⌊ ⌋ | $\lfloor{2.78}\rfloor = 2$ |
The affix of operators with more than two operands defy easy categorization. To illustrate, suppose we wish to invent our own ternary operator, and we have the symbols @
, #
, $
, and &
available to use to represent our operator if we need them. If a
, b
, and c
are operands, then we could define the operator as @ a # b $ c &
or a # b $ c &
or @ a # b $ c
or a # b $ c
. Or instead of using different symbols to separate the operands, we could use just one symbol, as Wolfram Language does with the Span
operator: start ;; stop ;; step
.
Associativity
The associativity of an operator determines how an operator groups with expressions around it. Consider the exponentiation (power) operator ^
in the expression 2^3^2
$=2^{3^2}$, which is a perfectly valid expression in many programming languages and in mathematics. Setting aside mathematical tradition, this expression could mean either (2^3)^2
$= (2^3)^2 = $ 8^2 = 64
, but it could also mean 2^(3^2)
$= 2^{(3^2)} = $ 2^9 = 512
. These are clearly very different meanings! In the first interpretation, we say the ^
operator associates to the left, because the left-most operation is done first. In the second interpretation, we say the ^
operator associates to the right. Mathematical tradition and most programming languages define ^
as right associative.
Precedence
You may recall from elementary school that some mathematical operators must be done before others. In the expression $2+3\times 5$, for example, we need to multiply the 3 by 5 before we do the addition: $2+3\times 5 = 2+(3\times 5) = 2+15 = 17$. If we fail to follow the correct order of operations, we are likely to get the wrong answer: $2+3\times 5 \neq (2 + 3) \times 5 = 5 \times 5 = 25 \neq 17$. School children learn the correct order of operations by remembering the initialism “PEMDAS” or, “Please Excuse My Dear Aunt Sally,” which assigns the arithmetic operators the precedence:
- Parentheses
- Exponents
- Multiplication
- Division
- Addition
- Subtraction
(In fact, we usually give multiplication the same precedence as division, and addition the same precedence as subtraction.) The numbers in the list above are called precedence levels and tell us the order each operation is performed when an expression is evaluated. If we expand our list of operators, we must decide which operations should precede which other operations during evaluation of the expression, that is, what precedence each operator should have.
An equivalent way of thinking of the precedence of an operator is as the “binding power” of the operator. In the expression $2+3\times 5$, we say that the $\times$ operator binds more tightly to nearby expressions (numbers, in this case) than does the $+$ operator, and consequently $2+3\times 5 = 2+(3\times 5)$
Conclusion
The arity, affix, associativity, and precedence of an operator are essential to know in order to determine which parts of an expression a given operator is acting on and therefore how the expression should be evaluated. A parser for a programming language is responsible for correctly interpreting its input according to these operator properties. In fact, some parsing strategies rely entirely on operator properties to interpret the meaning of expressions in programs.