Thursday, September 26, 2013

Utility and Equivalence

Credit: deviantART / sekeroglu

In a previous post, I outlined the four von Neuman-Morgenstern axioms for preferences (completeness, transitivity, continuity and independence), discussed my interpretation of the outcome consequentialism that underlies sensible interpretations of the axiom, and finished by talking about how they imply the existence of utility functions:

Theorem (von Neumann–Morgenstern utility theorem): Assuming the VNM axioms, there exists some linear (weighted-average preserving) function \(u: \mathcal{O} \rightarrow \mathbb{R}\) such that: \[o_1 \prec o_2 \Leftrightarrow u(o_1) < u(o_2)\]

These functions aren't unique: given any such function you can translate it or multiply it by a positive scalar to get another function that represents the same preference relation \(\prec\). In fact, for all intents and purposes, utility functions can be considered invariant under positive affine transformations.

This notion of invariance/equivalency has a couple of immediate consequences:

  • Outcomes can't (meaningfully) have zero utility. If Gretel has a utility function (over the outcome space \(\mathcal{O}\) of surprise gifts) like this: \[ \begin{align} u(\textrm{free apples}) &= 0 \\ u(\textrm{free lollies}) &= 1 \\ u(\textrm{influenza}) &= -3 \end{align} \]

    ...then there's nothing particularly special about the free apples option that we can infer from that value of 0. After all, we can also infer that Gretel has a utility function like this: \[ \begin{align} u(\textrm{free apples}) &= -2.5 \\ u(\textrm{free lollies}) &= 0 \\ u(\textrm{influenza}) &= -10 \end{align} \]

    ...and this is no less Gretel's utility function than the previous one was.

  • You can't (meaningfully) add utility functions together. There's no such thing as Hansel's utility function plus Gretel's utility function. Consider what happens when we try to add two functions pointwise: \[ \left( \begin{array}{rl} u_1(A) &= 0 \\ u_1(B) &= 1 \end{array} \right) + \left( \begin{array}{rl} u_2(A) &= 2 \\ u_2(B) &= 0 \end{array} \right) = \left( \begin{array}{rl} u(A) &= 2 \\ u(B) &= 1 \end{array} \right) \]

    But we could just have easily added the equivalent pair: \[ \left( \begin{array}{rl} u_1(A) &= 0 \\ u_1(B) &= 3 \end{array} \right) + \left( \begin{array}{rl} u_2(A) &= 2 \\ u_2(B) &= 0 \end{array} \right) = \left( \begin{array}{rl} u(A) &= 2 \\ u(B) &= 3 \end{array} \right) \]

    ...which is clearly not equivalent to our first sum (the former gives \(A \succ B\); the latter gives \(A \prec B\)). In general, pointwise addition of utility functions doesn't give a meaningful result.

  • You can't even add individual values from the same utility function together. E.g. trying to extend the utility to pairs of outcomes (lollies and apples!, influenza and lollies, two influenzas!) by the obvious \(u'(o_1, o_2) = u(o_1) + u(o_2)\) fails to work.

    (The only hack I can see that makes this well-defined in general is if you force the values to be from the exact same unscaled function. In which case you're just redefining probabilistic combination by way of \(u'(o_1, o_2) = u(50\%o_1 + 50\%o_2)\).)

More on utility later. But first, clocks.

Integers modulo 12

I woke up this morning and decided that what the world really needed was a formal definition of \(\mathbb{Z}_{12}\). Not \(\mathbb{Z}_n\) in general, not \(\mathbb{Z}_2\) or some other case with obvious practical value, but the integers modulo 12.

(It would later turn out that the different approaches I tried for this would point to a useful framework for describing the essence of utility functions. I would find the coincidence highly suspicious but think no further of it.)

For the zero of you reading an abstract math post without having encountered modular arithmetic before, \(\mathbb{Z}_{12}\) is the land of clock math. It is where \(5+3 = 8\), but \(10+5 = 3\) because it ticks around when it reaches twelve: 10, 11, 0, 1, 2, 3.

For everyone else, let's consider a remainder-based definition of the ring \((\mathbb{Z}_{12}, +, \times)\).

Definition (modular arithmetic, definition 1: normal form): Let \(\mathbb{Z}_{12} := \{0_{12}, 1_{12}, \cdots, 11_{12}\}\), where all the \(n_{12}\) are atoms. Addition is then defined as \(a_{12} + b_{12} := r_{12}\) where \(r\) is the unique integer in \([0,12)\) such that \(a + b = 12q + r\) for some \(q \in \mathbb{Z}\), and multiplication is defined similarly.

The above definition neatly captures the intuition of a modulo operator like you would encounter in most programming languages: operations are defined by performing them normally then taking the remainder mod 12.

It's easy to show that the operations are well-defined and commutative, and that 0 and 1 are identities for addition and multiplication. To complete the proof that this is a ring, we need to show the following:

  • Is addition associative? (E.g. does \((a+b)+c = a+(b+c)\)?)
  • Is multiplication associative? (E.g. does \((ab)c = a(bc)\)?)
  • Do addition and multiplication distribute? (E.g. does \(a(b+c) = ab + ac\)?)

The proof for all three of the above points is boring but annoyingly involved.

Proof that addition is associative in this definition of \(\mathbb{Z}_{12}\): Let \(a_{12}, b_{12}, c_{12}\) be given. By commutativity, it suffices to show that \(a_{12}+(b_{12}+c_{12}) = r_{12}\) for the unique \(r \in [0,12)\) such that \(\exists q \in \mathbb{Z}: a+b+c = 12q + r\). \[ \begin{align*} a_{12} +(b_{12}+c_{12}) &= a_{12} + x_{12} \\ &= r_{12} \end{align*} \]

...where \(x, r \in [0,12)\), and \(\exists q : 12q + x = b+c\) and \(\exists q : 12q + r = a+x\). It follows that: \[ \exists q, q': 12q + r = a + b + c - 12q' \]

...which trivially completes the proof.

Let's consider another definition of the ring \((\mathbb{Z}_{12}, +, \times)\), this time based on equivalence classes:

Definition (modular arithmetic, definition 2: equivalence): We start by defining an equivalence relation \(\equiv_{12}\) on the integers, which we'll pick to be the K5 (reflexive, symmetric and transitive) closure of \(\forall i \in \mathbb{Z}: (i \equiv_{12} i+12)\).

Examples: \(0 \equiv 0\), \(3 \equiv 15\), \(3 \not \equiv 4\), \(-2 \equiv 22\)

Now, let \(n_{12}\) be the equivalence class of \(n\), i.e. \(n_{12} := \{m \in \mathbb{Z} \; : \; m \equiv n\}\). Clearly \(n_{12} = \{n + 12q : q \in \mathbb{Z}\}\)

Let \(\mathbb{Z}_{12} := \{n_{12} \; : \; n \in \mathbb{Z}\}\) (i.e., the set of all equivalence classes).

Addition is then defined as \(A_{12} + B_{12} := \{a + b : a \in A_{12}, b \in B_{12}\}\). Multiplication is defined similarly.

The biggest legwork we have to do to formalise this definition is to show that the operations are closed, i.e. that adding or multiplying two equivalence classes gives us another one.

Proof that addition is closed in this definition of \(\mathbb{Z}_{12}\): It suffices to show that \(a_{12} + b_{12} = [a+b]_{12}\).

(\(\subseteq\)): For any \(a+12q \in a_{12}\) and \(b+12q' \in b_{12}\), we have \(a+b+12(q+q') \in [a+b]_{12}\).

(\(\supseteq\)): For any \(a+b+12q \in [a+b]_{12}\), pick \(a \in a_{12}, b \in b_{12}\).

Once we've done this, we get all of the other ring properties for (practically) free just by importing them over from \(\mathbb{Z}\). Associativity, negation, distributivity, identity... all of these can be derived from the equivalence we showed in the above prove, and the analogous properties over the integers.

While both of the above approaches produce isomorphic rings, there are a couple of things that I prefer about the latter approach.

Firstly, it doesn't require privileging a certain subset of the integers as part of our definition. While \([0,12)\) is a natural set of canonical integers for some applications, we could just as easily have picked \((-6,6]\) — this is not unlike the nomenclature problem of deciding whether angles range from \((-\pi,\pi]\) or \([0, 2\pi)\). Avoiding picking an ad hoc normal form is, in some sense, part of what makes the resulting proofs easier.

Secondly, we were able to define operations on \(\mathbb{Z}_{12}\) that clearly preserved the structure and behavior of those same operations on \(\mathbb{Z}\). (In the first definition, we instead had a larger burden of proof showing that e.g. addition behaved correctly regardless of whether we projected the operands into \([0,12)\) beforehand.)

Equivalent utility functions

Normalising our raw utility functions is tricky. They're not necessarily bounded, their domain doesn't necessarily have any a priori privileged elements to give nice round values to, and we don't have a measure on those functions that we can scale down to unit size. So if we're going to formalise the idea of some object representing a person's canonical utility function, the equivalence class approach is a far better starting point.

Definition: We say that two linear (weighted-average preserving) functions \(u, v: \mathcal{O} \rightarrow \mathbb{R}\) are equivalent (\(u \equiv v\)) if and only if, for every pair of outcomes \(o_1, o_2 \in \mathcal{O}\): \[u(o_1) < u(o_2) \Leftrightarrow v(o_1) < v(o_2)\]

(I.e. if the two functions order outcomes the same way.)

Lemma (equivalence is positive affine transformation): \(u \equiv v\) if and only if \(u = a \cdot v + b\), for some \(a \in \mathbb{R}^+, b \in \mathbb{R}\).

Proof (sketch):

(\(\Leftarrow\)): Trivial.

(\(\Rightarrow\)): If they're constant functions, we're done. Otherwise pick some \(o_1, o_2: u(o_1) < u(o_2)\), and derive the implied values of \(a\) and \(b\). Show that the desired affine transform holds by contradiction: if \(u(o_\bot) \neq a \cdot v(o_\bot) + b\), we can construct some weighted average of \(o_1\), \(o_2\) and \(o_\bot\) that breaks one of our assumptions.

Definition (utility equivalence classes): The equivalence class of the raw utility function \(u: \mathcal{O} \rightarrow \mathbb{R}\) is: \[ \begin{align*} [u] &:= \{v \in \mathcal{O} \rightarrow \mathbb{R} : u \equiv v\} \\ &= \{a \cdot u + b : a \in \mathbb{R}^+, b \in \mathbb{R}\} \end{align*} \]

And now, to finish the analogy to \(\mathbb{Z}_{12}\)...

Definition (utility function space): Let the set of utility functions over \(\mathcal{O}\), \(\mathbb{U}[\mathcal{O}]\), be the set of all raw utility function equivalence classes. That is: \[\mathbb{U}[\mathcal{O}] := \{[u]: u \in \mathcal{O} \rightarrow \mathbb{R}, u \textrm{ linear} \}\] (\(\mathbb{U}[\mathcal{O}]\) has type \(\mathbb{PP}(\mathcal{O} \rightarrow \mathbb{R})\).)

We call the equivalence classes utility functions, since these are the objects for which the VNM axioms predict one-to-one correspondence with preference relations. (They are not functions in a strict sense; they are sets of equivalent functions. However, this fits with the shorthand use of phrases such as Katniss's utility function.)

Corollary to von Neumann–Morgenstern utility theorem: Assuming the VNM axioms, there exists some unique utility function \([u] \in \mathbb{U}[\mathcal{O}]\) such that for all \(u \in [u]\): \[o_1 \prec o_2 \Leftrightarrow u(o_1) < u(o_2)\]

What can we meaningfully say about utility functions?

  • You can compare two outcomes to see which one is preferred. E.g. given some \([u] \in \mathbb{U}[\mathcal{O}]\), you can pick an arbitrary \(u \in [u]\), and calculate the sign of \(u(b) - u(a)\).

    No, this doesn't tell us anything new we didn't know from VNM axioms 1 and 2.

  • You can compare three outcomes to determine their relative desirability. E.g. given some \([u] \in \mathbb{U}[\mathcal{O}]\), you can pick an arbitrary \(u \in [u]\), and (if \(u(a) \neq u(b)\)) calculate the value of \(\frac{u(c)-u(a)}{u(b)-u(a)}\).

    This gives us a constructive way to determine the values of \(\alpha\) described in VNM axiom 3.

You could actually condense all of this information down to a single function \(\gamma : \mathbb{U}[\mathcal{O}] \rightarrow \mathcal{O}^3 \rightarrow \mathbb{R} \cup \{\infty\}\), which gives the normalised ratio between outcome utilities: \[\gamma([u], a, b, c) = \dfrac{u(c) - u(a)}{\left|u(b) - u(a)\right|}\]

So to calculate whether \(a \prec b\) according to the utility function, you just check the sign of \(\gamma(u, a, b, b)\).

Claim: That's it.

I'm not completely sure how to formalise the idea of you can't extract any useful information from a utility function besides what \(\gamma\) gives you. Here's one attempt:

Claim (ugly type theory version): For any index set \(I\) and result set \(X\), for any \(\mathcal{O}\)-agnostic function \(f: \forall \mathcal{O} \cdot \mathbb{U}[\mathcal{O}] \rightarrow (I \rightarrow \mathcal{O}) \rightarrow X\), there exists some \(g: (I^3 \rightarrow \mathbb{R} \cup \{\infty\}) \rightarrow X\) such that: \[f([u],\Omega) = g\left(\lambda a \; b \; c \cdot \dfrac{u(\Omega(c)) - u(\Omega(a))}{\left|u(\Omega(b)) - u(\Omega(a))\right|} \right)\]

Explanation: Think of \(I \rightarrow \mathcal{O}\) as a generalisation of outcome collections, for example such as \(\mathcal{O}^4\) (which is isomorphic to \(\mathbb{Z}_4 \rightarrow \mathcal{O}\)). (The generalisation leaves us with the potential to extend to collections of arbitrarily large cardinality.)

We're claiming that for any \(\mathcal{O}\)-agnostic function from \(\mathbb{U}[\mathcal{O}] \rightarrow \mathcal{O}^4 \rightarrow X\), we can implementing it by calculating the normalised ratio between all \(4^3 = 64\) triples you can pick from the input, and passing these ratios to some function \(g: (\mathbb{R} \cup \{\infty\})^{64} \rightarrow X\).

Proof: Hahahaha nooooo ideaaaa I have discovered a truly marvelous proof of this which this box is too narrow to contain.