Kipli's Cage

What do you mean?
How do you know?
What does that entail?
Sums of products of random integers

Let `f` be a function that chooses a random integer from 1 to 100 (inclusive) with a uniform probability distribution. Then the probability that what `f` returns is an even integer is 1/2. We will use `f * f` to denote calling `f` twice and multiplying the results, so it is simply the product two randomly chosen integers. In general, let `f^n` represent calling `f` `n` times and multiplying the `n` random integers returned.

Let `P(a = E)` be the probability that `a` is an even integer and `P(a = O)` the probability that `a` is odd. Show that

  1. `P(f^n = E) = 1 - (1/2)^n`

  2. `P(\sum_{k=1}^n f^k = E) = 1/2`

So the probability that the product of a large number of these random integers is even is close to 1, but the probability that the sum of the products of these random integers is even is always 1/2.

What happens if `f` chooses a random integer from 1 to 3? Or more generally, assume that `f` is choosing from some finite set of positive integers with `a` even integers and `b` odd integers. Now what is the probability that the sum of the products, `\sum_{k=1}^n f^k`, is even?

Let `s_n = P(\sum_{k=1}^n f^k = E)`. Set `p = b/(a+b)`, so `s_1 = 1 - p`. Following the proof of (2), we can write a recursive expression for `s_{n+1}`:

`s_{n+1} = P(f^{n+1} = E) s_n + P(f^{n+1} = O) (1 - s_n) = (1 - p^{n+1})s_n + p^{n+1} (1 - s_n) = p^{n+1} + (1 - 2p^{n+1})s_n`.

Since `|s_{n+1} - s_n| < p^{n+1}`, the sequence `\{s_n\}` forms a Cauchy sequence, and so converges to some value `A`.

But what is `A`? Numerical experiments indicate that it is not 1/2 (or if it is, then the rate of convergence is very slow) and a closed form expression for `s_n` generated by Maple is not especially helpful. So at this point I'm stumped.

The number 7712

The number 7712, prime factorization `2^5 241`, appears in several integer sequences. An interesting one is A063776, the number of subsets of `\{1,2,...,n\}` which sum to 0 mod `n`. Investigating this sequence some more leads to a result about the number of losing positions in the game of Nim.

Recall that Nim is played with sticks in heaps: a move is to remove any number of sticks from a single heap and the loser is the player who cannot take any sticks. If there are `k` heaps of `n_i` sticks in heap `i`, then we write the position as `(n_1,n_2,...,n_k)`. If there are no sticks in any heap then the position is (0); if there is more than one heap then we will assume that all the `n_i` are positive (since a heap with 0 sticks doesn't affect the game).

Theorem. Fix `m \geq 1`. The number of losing positions `(n_1,n_2,...,n_k)` in the game of Nim where the `n_i` are distinct and less than `2^m` is `L_m = 2^{2^m - m - 1}`.

Example: if the heap sizes can be between 1 and 7 then `L_3 = 16`. The losing positions are: (4, 5, 6, 7), (3, 5, 6), (3, 4, 7), (2, 5, 7), (2, 4, 6), (2, 3, 6, 7), (2, 3, 4, 5), (1, 6, 7), (1, 4, 5), (1, 3, 5, 7), (1, 3, 4, 6), (1, 2, 5, 6), (1, 2, 4, 7), (1, 2, 3), (1, 2, 3, 4, 5, 6, 7), and (0).

The connection between this result about Nim and the number 7712 can be found in We will get to this result about Nim from the number 7712 through the paper An interesting result about subset sums by N. Kitchloo and L. Pachter. In that paper, the authors derive a formula for the number of subsets `B` of `\{1,2,...,n\}` such that `\sum_{b\in B} b \equiv k` mod `n`.

For example, there are 6 subsets of `\{1,2,3,4,5\}` whose members have sum congruent to 2 mod 5: `\{2\}, \{1,2,4\}, \{3,4\}, \{0,2\}, \{0,1,2,4\}, \{0,3,4\}`.

The formula given by Kitchloo and Pachter works for arbitrary `k` and `n` and involves Euler's totient function `\phi(t)` and the Mobius function `\mu(t)`. But for the special case that `k \equiv 0`, the formula reduces to

`N_n = 1/n \sum_{t|n, t   odd} 2^{n/t} \phi(t)`

(By convention, the empty set is included in this count.) It's not hard to see that the number that started this, 7712, is `N_{17}`.

Kitchloo and Pachter also present a formula for the number of subsets of a finite abelian group `G` whose elements sum to the identity element (0) of `G`. This formula is not terribly complicated but all we will need is the special case that `G` is the direct sum of `m` copies of `\mathbb Z_2`. Then the formula for the number of subsets with zero sum is simply `2^{2^m - m}`.

Now, this count includes the empty set and the set containing just the zero element `(0,0,...,0)`. Of the others, half will contain only non-zero elements, so there are `2^{2^m - m - 1}` nonempty subsets with zero sum and either only non-zero elements or the single element `(0,...,0)`. This is the number given in the theorem; we need only connect it with losing positions in Nim.

There is an easy way to tell if a position in Nim is winning or losing: XOR the binary digits of the number of sticks in the heaps. If the result is all 0's then the position is losing, otherwise it is winning (and the person to move should take the number of sticks needed to change the XOR to all 0). Here's an example with 4 heaps:

# sticksBinary
3 0011
9 1001
12 1100
15 1111
XOR: 1001

Since the XOR is 1001 and not 0000, this is a winning position for the player making the move; he should take 9 sticks from one of the heaps.

The connection between positions of Nim and `\mathbb Z_2^m` is obvious: map the number of sticks in a heap with its binary representation (padded to the left with 0's if needed), then a Nim position is a subset of `\mathbb Z_2^m`; the zero element of `\mathbb Z_2^m` corresponds to the zero game (0) of Nim. Since XOR binary digits and addition in `\mathbb Z_2^m` are equivalent, losing positions of unequal heap sizes are in one-to-one correspondence with subsets of `Z_2^m` with sum of `(0,...,0)`. This proves the theorem.

It turns out that the sequence `L_m` is sequence A016031, a de Bruijn sequence (the number of ways of arranging `2^n` bits in a circle so all `2^n` consecutive strings of length `n` are distinct). And that looks interesting as well...

Fun with KnotPlot

KnotPlot is a wonderful program for generating, viewing, playing with knots. The program itself has been around for a while, but I first saw it yesterday at a talk on knots. A description of the features of KnotPlot:

Knots can be loaded from a database of almost 1000 knots and links or sketched by hand in three dimensions. Also, knots may be constructed via the Conway notation or using a tangle calculator. A number of special knot types (torus knots, knot chains, Lissajous knots) may be created on the fly. Finally, new knots can be created from old knots using a number of transformations.

There are lots of extras beyond knots, such as creating knots from braids, Celtic knots, and other fascinating demos. In addition to a nice learning tool for knot theory, this is the kind of thing I could spend hours just playing with.

A knot
No Need for Voting

I helped judge a science fair this weekend, and in the process came across an instance of over-mathematization of the world.

The situation was this: I and a group of 5 other people were charged with the task of selecting the top three projects from about a dozen. After going through each project individually and then again as a group while talking with the students who worked on the project, we came up with our own impressions of each project. Now to the decision stage.

We first went down the list of projects and eliminated those that we felt were not 'top three' material. We did this by general consensus: if anyone felt that a project was worth talking about some more then it was added to the list of potential winners. At the end of this we had six projects to consider.

So how do we decide which project is top ranked, which is second ranked, and which is third? This can be viewed as a voting problem, and a method to solve it is called a voting system.

Now, there are lots of voting systems. The most common one is simple majority rule---vote for the choice you most prefer and count how many votes each choice (candidate) receives; the winner is the one with the most vote. Unfortunately, this obviously may not work for more than two choices.

For three or more choices, we could just determine the winner by plurality: the winner is the one with the most votes, though not necessarily the one with the majority of votes. But this method has some problems, most obviously that we need to do more than select a single winner; we also need second and third place. Plurality voting also has the problem that it may encourage voters to not vote for their top-ranked choice, especially if it appears that that their third-ranked choice will win a plurality of the votes. Perhaps by voting strategically (and not true to their preference), a voter can swing the election to their second-ranked choice.

Another method we could use is a Borda count: each person assigns `n-1` points to the top-ranked choice, `n-2` points to the second-ranked choice, and so on to 0 points for the last-ranked choice. Then the points are totaled for each candidate and the winner is the one with the most points (and the second and third place can also be determined). This will find our rankings in one vote.

However, the Borda count also has its problems. In some situations it is possible for a group of voters (or even one voter) to change their rankings and make a winner become a loser while still ranking the original winner first! For example, suppose there are 22 voters with these rankings:

# voters: 5 4 2 4 1 6
1st A A B B C C
2nd B C A C A B
3rd C B C A B A
Points:
A 10 8 2 0 1 0
B 5 0 4 8 0 6
C 0 4 0 4 2 12

So A gets 21 points, B gets 23 points and C gets 22 points; the Borda ranking is B-C-A. But now suppose that the 4 voters who prefer B-C-A were to change their preferences to B-A-C. They still prefer B (the previous winner), but now the Borda totals come out to 25 points for A, 23 points for B and 18 points for C. Candidate A has gone from third place to first, and the voters who put him there still prefer B to win!

Other voting systems have other problems. In fact, given a set of quite reasonable criteria that a voting system should satisfy (such as not allowing the situation above), there is no voting system that satisfies all of the criteria. This is upshot of Arrow's Impossibility Theorem.

So which voting system should we have used to rank the projects? In fact we didn't use any of them, for the simple reason that we didn't need to. After ranking the projects individually, we put our rankings on the board and saw what the consensus was. In our case, it was clear which project was ranked first, and that the second-third ranks were between two others. After a brief discussion, we all agreed on an ordering and that was that.

And in our case, that was the best option. There was no need to 'quantify' the results since there was no need for a final, end of discussion vote. Unlike most elections, the group was able to continue discussion about the projects until there was a consensus. Perhaps if we had not been able to reach a consensus we would have had to vote, but luckily that wasn't necessary.

However, one member of the group insisted on getting 'objective' results that would 'quantify' the situation, using what (I think) amounted to the Borda count system. With these results, he argued, we could see that the 'right' result was, and so eliminate possible 'force of personality' effects that may arise during discussion.

But why should the Borda count give the 'right' result? Why should any one quantification be preferred over any other? I could understand why it would be nice to protect the wilting lilies of the world who cannot stand up for themselves, but when our rankings are arrived at independently, then displayed for all to see, I'm not sure that personality problem is very significant. (And lest I be accused of being the forcing personality, I should point out that I said nothing at all during the discussion of our second-third choices.)

I think he places too much emphasis on 'quantified' results, especially in this case when there was no need for them. It was especially curious given that there is no objective way to determine which system we should have been using. After all, what criteria we want our system to satisfy is a matter of preference itself; should we have a vote to determine what voting system we should use?

Mathematics is good, and quantification can be useful, but sometimes it is not necessary for reaching a good decision.