The problem is the following:
Define the set of ternary trees T_3 as:
(a) The empty tree, \lambda, is an element of T_3,
(b) If trees T_L, T_M, T_R are elements of T_3 having no nodes in common, and R is a node that is not in T_L, T_M or T_R then T = (R, T_L, T_M, T_R) is an element of T_3. We call T_L the left subtree, T_M the middle subtree and T_R the right subtree, of T.
For arbitrary natural number n, find a (possibly recursive) formula for the number of non-equivalent ternary trees with n nodes. Prove that your formula is correct.
Polya's four steps are:
1) Understanding the problem.
2) Devising a plan.
3) Carrying out the plan.
4) Looking back.
1) First I had to read the problem carefully and make sure I understood what was asked. Understanding the definition of a ternary tree and what consists an equivalent ternary tree is key in order to avoid duplicates.
2) My first reaction was to draw the ternary trees with a small number of nodes say up to four or five nodes. I was expecting a pattern to emerge that would help me come up with the formula. Then I would have to prove that formula correct.
3) I started drawing the ternary trees with zero, one and two nodes. I got 1, 1 and 3 respectively. The I started drawing trees with three nodes. Things started to look a bit more complicated but nothing too hard. The total number was 12. When I moved to four nodes things got messy. I was working hard to avoid duplicates. I got 55 ternary trees with four nodes.
Just to get an idea of what my drawings looked like, here is a picture of some of the ternary trees with four nodes:
When I moved to n=5, after hours and hours of careful drawing, I got 300 ternary trees. I talked to Professor Heap and he told me that the correct number for n=5 is 273. So I was not able to avoid duplicates. I spent more hours trying to figure out where I went wrong. I managed to find the problem (symmetry was to blame together with carelessness on my part). Even with the correct number I was not able to find a pattern. Clearly my plan was not working.
4) Looking back, it's obvious that my plan was not working. I had to come up with a new plan. The next thing that came to my mind was to approach the problem with a combinatorial perspective. So, I went back to step 3.
3) Here is a proof of the problem using a combination of combinatorics and complete induction:
First let T(n) be the number of ternary trees with n nodes.
The recursive formula that gives the number of ternary trees with n nodes is:
(i) T(n) = 1, for n=0
(ii) T(n) = Sum_i={0}^{n-1}Sum_j={0}^{n-1-i}[T(i)*T(j)*T((n-1)-(i+j))], for n>0 possibility, the empty tree. The recursive forumla above
A ternary tree has the root note and a left subtree, a middle subtree and a
right subtree. Any of the subtrees can be the empty tree with zero nodes. In the
forumla above i stands for the number of nodes the left subtree can have and
of course it can have zero nodes up to (n-1) nodes (the other is the root node).
j is for the number of nodes that the middle subtree can have. In this case, j
can be zero, all the way up to (n-1)-i. Note that after i and j are selected,
the number of nodes that the right subtree has is determined since all of the
nodes must add up to (n-1).
Next, I will prove the following statement (claim),
P(n): "The recursive formula above gives the number of ternary trees with n
nodes."
Proof (using the principle of complete induction):
Base Case: For n=0, when we have zero nodes, we have only one
also tells us that T(0)=1 so P(0) holds.
Induction hypothesis: Assume that the formula is correct for ternary trees with
one node, for ternary trees with two nodes, for ternary
trees with three nodes, ..., for ternary trees with (n-2)
nodes, for ternary trees with (n-1) nodes. That is, we
assume that all of P(1), P(2), P(3),..., P(n-1) are true.
Now consider ternary trees with n nodes. If we exclude the root node, we have
(n-1) nodes to distribute among the subtrees. When the number of nodes for the
left and middle subtrees is determined, the number of nodes for the right
subtree is also determined because they need to add up to n-1. So, if we have i
nodes in the left subtree and j in the middle one, the right subtree has
(n-1)-(i+j) nodes. The total number of subtrees for this particular case (for
specific i and specific j) is the number of the different combinations of left,
middle and right subtrees. We know that we get the total number of combinations
by multiplying them. So, for a specific i and a specific j, the total number is
given by T(i)*T(j)*T((n-1)-(i+j)). That's the expression we have inside the
double sum. So, at each step, the formula counts the total number of ternary
trees when we have i nodes in the left subtree, j nodes in the middle subtree
and the remaining nodes in the right subtree. Note that i, j as well as
(n-1)-(i+j) get values from zero to n-1 (including the end points), therefore
they are in the range we want them to be for us to be able to use the induction
hypothesis.
So, by our assumption, we know that we have the correct number of ternary trees
with one node, with two nodes, ..., with (n-1) nodes. Thus, we are considering
all the different cases and at each case we have the mutliple of three numbers
that we know are correct by our assumption.
Therefore, we have that \forall n \in N, {P(0), P(1), ..., P(n-1)} \implies P(n)
and we hence, by the principle of complete induction, we can conclude that P(n)
holds for all natural numbers n. That is, we can conclude that the recursive
formula defined in the beginning of the problem gives us the number of ternary trees with n nodes, for all natural numbers n.