Fermat's Last Theorem: 2007-02-18

Monday, February 19, 2007

Newton's Identities: Fundamental Theorem on Symmetric Polynomials

In today's blog, I will present the Fundamental Theorem on Symmetric Polynomials. In my previous blog, I talked about symmetric polynomials and elementary symmetric polynomials; today, I show the relationship between the two.

The content in today's blog is taken straight from Harold M. Edwards' Galois Theory.

Theorem 1: Fundamental Theorem on Symmetric Polynomials

Every symmetric polynomial in r₁, ..., r_n can be expressed as a polynomial in elementary symmetric polynomials σ₁, ..., σ_n. In addition, a symmetric polynomial with integer coefficients can be expressed as a polynomial in elementary symmetric polynomials with integer coefficients.
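Before the proof, it may help to see the statement on two small cases. The Python sketch below is only a numerical sanity check on sample integers that I picked arbitrarily; the helper name elementary_symmetric is my own and does not come from Edwards. It checks that the sum of squares and the sum of cubes of three values can be written as integer-coefficient polynomials in σ₁, σ₂, σ₃.

```python
from itertools import combinations
from math import prod

def elementary_symmetric(values, k):
    """Sum of the products of all k-element combinations of values."""
    return sum(prod(c) for c in combinations(values, k))

r = [2, -3, 5]  # arbitrary sample values for r_1, r_2, r_3 (an assumption)
s1, s2, s3 = (elementary_symmetric(r, k) for k in (1, 2, 3))

# Two symmetric polynomials in r_1, r_2, r_3.
sum_squares = sum(x**2 for x in r)
sum_cubes = sum(x**3 for x in r)

# The same quantities as integer-coefficient polynomials in s1, s2, s3.
squares_from_sigma = s1**2 - 2 * s2
cubes_from_sigma = s1**3 - 3 * s1 * s2 + 3 * s3

print(sum_squares, squares_from_sigma)
print(sum_cubes, cubes_from_sigma)
```

Agreement on one set of values is evidence rather than proof; the theorem says the two identities hold for all values.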

Proof:

(1) For n=1, the symmetric polynomial is x - r₁, so we see that a₁ = -r₁ and σ₁ = (-1)¹(-r₁) = r₁

[See Definition 1, here for definition of symmetric polynomial; see Definition 2, here for definition of the kth elementary symmetric polynomial]

(2) If a₁ is an integer, then r₁ is an integer and σ₁ is an integer.

(3) The polynomial in elementary symmetric functions is then x - σ₁ which shows that the theorem is true for n=1.

(4) Let's assume that the theorem has been proved for all symmetric polynomials up to n-1.

(5) Let G(r₁, r₂, ..., r_n) be a symmetric polynomial in n variables so that:

G(r₁, r₂, ..., r_n) = (x - r₁)*(x - r₂)*...*(x - r_n)

(6) Let G_i be a series of polynomials such that:

G(r₁, r₂, ..., r_n) = G₀ + G₁r_n + G₂(r_n)² + ... + G_v(r_n)^v

where:

v is the highest power of r_n that occurs in G and each G_i is only made up of r₁, r₂, ..., r_n-1.

We can see that G₀ is a polynomial that doesn't include r_n. G₁ collects all the terms that contain (r_n)¹, G₂ collects all the terms that contain (r_n)², and so on up to G_v, where v is the highest power of r_n that occurs (v can be as small as 1).

(7) Now, since G(r₁, r₂, ..., r_n) is a symmetric polynomial, it is unchanged if any two variables r_i and r_j are interchanged. [See Definition 1 here for definition of symmetric polynomial]

(8) Since G(r₁, r₂, ..., r_n) is unchanged by any interchange of two of the variables r₁, ..., r_n-1, each of the polynomials G_i is also unchanged by such an interchange.

(9) This means that each G_i is itself a symmetric polynomial in r₁, r₂, ..., r_n-1.

(10) By the inductive hypothesis in step #4, each G_i is expressible as a polynomial in the elementary symmetric polynomials in r₁, r₂, ..., r_n-1

(11) Let τ₁, τ₂, ..., τ_n-1 denote these elementary symmetric polynomials in (n-1) variables such that [See Definition 2 here for definition of elementary symmetric polynomials]:

τ₁ = r₁ + r₂ + ... + r_n-1

τ₂ = r₁r₂ + r₁r₃ + ... + r_n-2r_n-1
...

τ_n-1 = r₁r₂*...*r_n-1

[NOTE: Each τ_i represents the sum of all i-combinations of 1 ... n-1. So, τ₁ is the sum of all 1-combinations, τ₂ is the sum of all 2-combinations, and τ_n-1 is the sum of all n-1 combinations (for which there is only one)].

(12) So, from the inductive hypothesis, if we let G_i(r₁, ..., r_n-1) represent the symmetric polynomial G_i, we can see that there exists another polynomial g_i such that:

G_i(r₁, ..., r_n-1) = g_i(τ₁, τ₂, ..., τ_n-1)

(13) Further, from the inductive hypothesis, we know that if G_i has integer coefficients, then, so does g_i.

(14) Let σ₁, σ₂, ..., σ_n be the elementary symmetric polynomials in n variables.

(15) We can now restate σ_i in terms of τ_i since:

σ₁ = r₁ + r₂ + ... + r_n = τ₁ + r_n

σ₂ = r₁r₂ + r₁r₃ + ... + r_n-1r_n = τ₂ + r_nτ₁

[Note: r_n*τ₁ is the sum of the (n-1) products of r_n with each 1-combination of r₁, ..., r_n-1]

σ₃ = r₁r₂r₃ + r₁r₂r₄ + ... + r_n-2r_n-1r_n = τ₃ + r_nτ₂

[Note: r_n*τ₂ is the sum of the products of r_n with each 2-combination of r₁, ..., r_n-1]

σ₄ = τ₄ + r_nτ₃

...

σ_n = r₁*r₂*...*r_n = 0 + r_n*τ_n-1

[Note: the main idea here is that σ_i represents the sum of all i-combinations of n variables. We are now equating these sums with τ_i which represents the sum of all i-combinations of (n-1) variables. In other words, we are now including the new combinations with r_n.]
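The relations in step #15 can be checked numerically. The following Python sketch is my own illustration (the sample values and the helper elementary_symmetric are assumptions, not part of the source proof). It uses the conventions τ₀ = 1 and τ_n = 0, which match the "0 + r_n*τ_n-1" form of the last equation.

```python
from itertools import combinations
from math import prod

def elementary_symmetric(values, k):
    """Sum of products of all k-combinations: 1 for k = 0, 0 for k > len(values)."""
    return sum(prod(c) for c in combinations(values, k))

r = [3, -1, 4, 2]   # arbitrary sample values r_1, ..., r_n with n = 4
n = len(r)
r_n = r[-1]

# sigma[k] uses all n values; tau[k] uses only r_1, ..., r_(n-1).
sigma = [elementary_symmetric(r, k) for k in range(n + 1)]
tau = [elementary_symmetric(r[:-1], k) for k in range(n + 1)]

# Step 15: sigma_k = tau_k + r_n * tau_(k-1) for k = 1, ..., n
step15_holds = all(sigma[k] == tau[k] + r_n * tau[k - 1] for k in range(1, n + 1))
print(step15_holds)
```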

(16) We can now restate the equations in step #15 in terms of τ_i to get (using the basic algebraic operations of addition/subtraction to both sides of the equation):

τ₁ = σ₁ - r_n

τ₂ = σ₂ - r_nτ₁ = σ₂ - r_n(σ₁ - r_n) = σ₂ - r_nσ₁ + (r_n)²

τ₃ = σ₃ - r_nτ₂ = σ₃ - r_nσ₂ + (r_n)²σ₁ - (r_n)³.

...

τ_n-1 = σ_n-1 - r_nτ_n-2 = σ_n-1 - r_nσ_n-2 + ... + (-1)^n-1(r_n)^n-1

(17) Finally, we can use the last equation in step #15, to get:
0 = σ_n - r_nτ_n-1 = σ_n - r_nσ_n-1 + ... + (-1)ⁿ(r_n)ⁿ.
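Steps #16 and #17 follow a single pattern: τ_k is the alternating sum of (r_n)^j * σ_k-j for j = 0, ..., k, with σ₀ taken to be 1. The Python sketch below checks that pattern on sample values; the function name tau_from_sigma and the sample values are my own assumptions for illustration.

```python
from itertools import combinations
from math import prod

def elementary_symmetric(values, k):
    """Sum of products of all k-combinations: 1 for k = 0, 0 for k > len(values)."""
    return sum(prod(c) for c in combinations(values, k))

def tau_from_sigma(sigma, r_n, k):
    """Alternating sum sigma_k - r_n*sigma_(k-1) + ... + (-1)^k * r_n^k."""
    return sum((-1) ** j * r_n**j * sigma[k - j] for j in range(k + 1))

r = [3, -1, 4, 2]   # arbitrary sample values, n = 4
n = len(r)
r_n = r[-1]
sigma = [elementary_symmetric(r, k) for k in range(n + 1)]
tau = [elementary_symmetric(r[:-1], k) for k in range(n)]

# Step 16: the expansion reproduces tau_1, ..., tau_(n-1).
step16_holds = all(tau_from_sigma(sigma, r_n, k) == tau[k] for k in range(1, n))
# Step 17: the same expansion with k = n gives 0.
step17_value = tau_from_sigma(sigma, r_n, n)
print(step16_holds, step17_value)
```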

(18) Since step #16 restates every τ_i in terms of r_n and σ₁, ..., σ_n-1, we can substitute these into the expressions g_i(τ₁, ..., τ_n-1) from step #12 and collect powers of r_n. This defines polynomials f_i(σ₁, σ₂, ..., σ_n-1) such that:

G(r₁, r₂, ..., r_n) = f₀ + f₁r_n + f₂(r_n)² + ... + f_μ(r_n)^μ

where each f_i is a polynomial strictly in terms of σ₁, σ₂, ..., σ_n-1 and does not include σ_n.

(19) Each f_i has integer coefficients if G does since:

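(a) From step #13, each g_i has integer coefficients if G does.

(b) The substitutions in step #16 express each τ_i as a polynomial in r_n and σ₁, ..., σ_n-1 with integer coefficients, so substituting them into the g_i and collecting powers of r_n introduces only integer coefficients.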

(20) From step #17, we get the following:

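(r_n)ⁿ = (r_n)^n-1σ₁ - (r_n)^n-2σ₂ + (r_n)^n-3σ₃ - ... + (-1)^n-1σ_n

(21) If μ ≥ n, then we can repeatedly apply the result in step #20 to the expression in step #18, replacing every power of r_n of degree n or higher, to get (where each f_i(σ) is now a polynomial in σ₁, ..., σ_n):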

G(r₁, r₂, ..., r_n) = f₀(σ) + f₁(σ)r_n + f₂(σ)(r_n)² + ... + f_n-1(σ)(r_n)^n-1
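The reduction in steps #20 and #21 can be illustrated with numbers. The Python sketch below is a rough illustration only: reduce_power is a hypothetical helper of mine, and the sample values are arbitrary. It rewrites t^m as a combination of 1, t, ..., t^n-1 by repeatedly applying the relation of step #20, then confirms the result at t = r_i for every i.

```python
from itertools import combinations
from math import prod

def elementary_symmetric(values, k):
    """Sum of products of all k-combinations of values (1 for k = 0)."""
    return sum(prod(c) for c in combinations(values, k))

def reduce_power(m, sigma):
    """Return c[0..n-1] with t**m = sum(c[i] * t**i), given the step 20
    relation t**n = sigma[1]*t**(n-1) - sigma[2]*t**(n-2) + ..."""
    n = len(sigma) - 1
    coeffs = [0] * (max(m, n - 1) + 1)
    coeffs[m] = 1
    for d in range(m, n - 1, -1):          # eliminate t**d for d = m, ..., n
        c = coeffs[d]
        coeffs[d] = 0
        for k in range(1, n + 1):          # t**d -> (-1)**(k-1) * sigma_k * t**(d-k)
            coeffs[d - k] += c * (-1) ** (k - 1) * sigma[k]
    return coeffs[:n]

r = [3, -1, 4, 2]   # arbitrary sample values, n = 4
n = len(r)
sigma = [elementary_symmetric(r, k) for k in range(n + 1)]

coeffs = reduce_power(6, sigma)            # reduce t**6 to degree at most n-1
checks = [sum(c * x**i for i, c in enumerate(coeffs)) == x**6 for x in r]
print(coeffs, checks)
```

The reduced coefficients depend only on the σ_k, which is why the same reduced expression is valid for every r_i and not just for r_n.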

(22) Since interchanging any values r_i with r_j doesn't change the value of G(r₁, r₂, ..., r_n), then it doesn't change the value of f₀(σ) + f₁(σ)r_n + f₂(σ)(r_n)² + ... + f_n-1(σ)(r_n)^n-1

(23) By interchanging r_n with each of the other r_i in turn (which changes neither G nor any σ_k), we get the following set of n equations using step #21:

G(r₁, r₂, ..., r_n) = f₀ + f₁r₁ + f₂(r₁)² + ... + f_n-1(r₁)^n-1

G(r₁, r₂, ..., r_n) = f₀ + f₁r₂ + f₂(r₂)² + ... + f_n-1(r₂)^n-1

...

G(r₁, r₂, ..., r_n) = f₀ + f₁r_n + f₂(r_n)² + ... + f_n-1(r_n)^n-1

(24) We can think of these n equations of n terms as an n x n matrix of the form (r_i)^j-1 where i = the column (1 ... n) and j = the row (1 ... n).

(25) The determinant of this matrix which is a polynomial in r₁, r₂, ..., r_n is nonzero since this is an example of the transpose of the Vandermonde Matrix [see Definition 1, here] and:

(a) From the properties of determinants, we know that det(V_n) = det(V_n^T) [See Theorem 9, here]

(b) The formula for the determinant of the Vandermonde matrix is (see Theorem 1, here):

det(V_n) = ∏ (r_j - r_i), where the product runs over all pairs 1 ≤ i < j ≤ n

(c) Since r_i ≠ r_j whenever i ≠ j [since each of the parameters is distinct], every factor is nonzero and we can conclude that det(V_n) ≠ 0.
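As a sanity check on step #25, the Python sketch below builds the matrix of step #24 for a few distinct sample values and compares its determinant with the standard Vandermonde product of the differences (r_j - r_i) over i < j. The determinant helper (a direct Leibniz expansion, practical only for small n) and the sample values are my own assumptions.

```python
from itertools import permutations
from math import prod

def determinant(matrix):
    """Leibniz expansion over all permutations; fine for small matrices."""
    n = len(matrix)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(
            perm[a] > perm[b] for a in range(n) for b in range(a + 1, n)
        )
        sign = -1 if inversions % 2 else 1
        total += sign * prod(matrix[row][perm[row]] for row in range(n))
    return total

r = [2, 5, -1, 3]   # arbitrary distinct sample values
n = len(r)

# Row j (0-based) holds the j-th powers, column i belongs to r_i,
# which is the transposed Vandermonde layout described in step 24.
matrix = [[r[i] ** j for i in range(n)] for j in range(n)]

det_value = determinant(matrix)
product_formula = prod(r[j] - r[i] for i in range(n) for j in range(i + 1, n))
print(det_value, product_formula)
```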

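(26) We can think of the set of equations as a multiplication between the row vector [f₀(σ), f₁(σ), ..., f_n-1(σ)] and the above matrix [(r_i)^j-1] [See here for review of matrix multiplication] since entry i of the product is f₀ + f₁r_i + f₂(r_i)² + ... + f_n-1(r_i)^n-1, so that:

[f₀, f₁, ..., f_n-1] * [(r_i)^j-1] = [G, G, ..., G]

(27) But the set of equations is also the result of multiplying the vector [G,0,0,...,0] by the matrix since:

(a) The first row of the matrix (j = 1) consists of (r_i)⁰ = 1 in every column, so we have the following equation:

[G, 0, ..., 0] * [(r_i)^j-1] = [G, G, ..., G]

(b) From step #23, we have:

[f₀, f₁, ..., f_n-1] * [(r_i)^j-1] = [G, 0, ..., 0] * [(r_i)^j-1]

(28) Since the determinant in step #25 above is nonzero, the matrix is invertible (see Theorem 4, here) and we can multiply both sides by its inverse to get:

[f₀, f₁, ..., f_n-1] = [G, 0, ..., 0]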

(29) From this, we can conclude that:

f₀ = G and f₁=0, ..., f_n-1=0. [See Definition 2, here]

(30) Since f₀ is a polynomial in the elementary symmetric polynomials σ₁, ..., σ_n (see steps #18 and #21 above) and since f₀ has integer coefficients if G does (see step #19 above), the conclusion follows from mathematical induction (see here for review of mathematical induction).

QED

In my next blog, I will complete the proof by reviewing the formula for the determinant of the Vandermonde Matrix. In a future blog, I will show how this theorem can be used to derive Newton's identities.

References

Harold M. Edwards, Galois Theory, Springer, 1984.

Newton's Identities: Symmetric Polynomials

It is very easy to build an equation that has a certain set of solutions. If we wanted to build an equation where the solutions are 1,2,3,4,5 then all we need to do is multiply out the following:

(x - 1)(x - 2)(x - 3)(x -4)(x - 5)=0

If we carry out the multiplication, we can be sure that the result is an algebraic equation of degree n=5 where we have the following:

ax⁵ + bx⁴ + cx³ + dx² + ex + f = 0

if x = 1, 2, 3, 4, or 5. We can figure out a,b,c,d,e,f by carrying out the multiplication.
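Carrying out the multiplication by hand is tedious, so here is a minimal Python sketch that does it one linear factor at a time. The function names multiply_by_linear and polynomial_from_roots are my own, chosen for illustration; coefficients are stored from the highest power of x down to the constant term.

```python
def multiply_by_linear(coeffs, root):
    """Multiply a polynomial (highest degree first) by (x - root)."""
    result = coeffs + [0]                  # coeffs * x
    for i, c in enumerate(coeffs):
        result[i + 1] -= root * c          # subtract root * coeffs
    return result

def polynomial_from_roots(roots):
    """Expand (x - r_1)(x - r_2)...(x - r_n) into a coefficient list."""
    coeffs = [1]
    for root in roots:
        coeffs = multiply_by_linear(coeffs, root)
    return coeffs

coefficients = polynomial_from_roots([1, 2, 3, 4, 5])
print(coefficients)  # [1, -15, 85, -225, 274, -120]
```

So the equation is x⁵ - 15x⁴ + 85x³ - 225x² + 274x - 120 = 0.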

Let's generalize this idea. Let's assume that the roots are r₁, r₂, ..., r_n. Further, let's call each coefficient of the final result a₁, a₂, ..., a_n.

Then we have:

xⁿ + a₁x^n-1 + ... + a_n-1x + a_n = (x - r₁)*(x - r₂)*...*(x - r_n)

If xⁿ has a leading coefficient a₀ other than 1, then we can divide the entire equation by a₀ to get the above result.

We call a polynomial that results in this way a symmetric polynomial in r₁, ..., r_n. The idea is that we can switch any r_i with any r_j and it doesn't change the resulting polynomial.

Definition 1: Symmetric Polynomial

Let P(x₁, x₂, ..., x_n) be a polynomial that consists of n variables. P(x₁, x₂, ..., x_n) is said to be a symmetric polynomial if any two of these variables can be interchanged without changing the value of the polynomial.

Examples:

The following polynomials are symmetric.

P(x₁, x₂) = (x₁)³ + (x₂)³ - 7.

P(x₁, x₂) = 4x₁x₂

P(x₁, x₂, x₃) = x₁x₂x₃ - 2x₁x₂ - 2x₁x₃ - 2x₂x₃

The following polynomial is not:

P(x₁,x₂) = x₁ - x₂
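Definition 1 can be tested numerically by evaluating a polynomial at every rearrangement of a few sample points. The Python sketch below is my own illustration (is_symmetric and the sample points are assumptions). A passing check on sample points is only evidence of symmetry, while a single failing point is enough to show a polynomial is not symmetric.

```python
from itertools import permutations

def is_symmetric(poly, samples):
    """True if poly gives the same value for every ordering of each sample point."""
    return all(
        poly(*perm) == poly(*point)
        for point in samples
        for perm in permutations(point)
    )

def cubes_minus_seven(x1, x2):
    return x1**3 + x2**3 - 7

def four_times_product(x1, x2):
    return 4 * x1 * x2

def difference(x1, x2):
    return x1 - x2

samples = [(1, 2), (3, -4), (0, 7)]   # arbitrary sample points

print(is_symmetric(cubes_minus_seven, samples))
print(is_symmetric(four_times_product, samples))
print(is_symmetric(difference, samples))
```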

Now, looking at the above equation again:

xⁿ + a₁x^n-1 + ... + a_n-1x + a_n = (x - r₁)*(x - r₂)*...*(x - r_n)

We can see that:

a₁ = -r₁ + -r₂ + ... + -r_n

We know this since there are n ways to get a value of x^n-1. And they are:

-r₁*x*x*x*...*x, -r₂*x*x*x*...*x, ..., -r_n*x*x*x*...*x

We can also see that a_n = (-r₁)*(-r₂)*(-r₃)*...*(-r_n)

since this is the only way to multiply the values together to get x⁰. We can see that there is exactly C(n,0)=1 way to include x⁰. There are C(n,1)=n ways to include only x¹ and C(n,2) = n(n-1)/2 ways to include x², etc.

Indeed, in each case we can see that a_i is the sum of C(n,i) different terms (see Lemma 3, here for details on C(n,r) if needed):

a₂ = (-r₁)*(-r₂) + (-r₁)*(-r₃) + ... + (-r_n-1)*(-r_n)

a₃ = (-r₁)*(-r₂)*(-r₃) + ... + (-r_n-2)*(-r_n-1)*(-r_n)

...

a_n-1 = (-r₁)*...*(-r_n-1) + ... + (-r₂)*...*(-r_n)

Each of these coefficients is itself a polynomial in r₁, ..., r_n. Up to sign, they are the elementary symmetric polynomials σ_i.

Definition 2: Kth Elementary Symmetric Polynomials

A kth elementary symmetric polynomial is defined as:

σ_k = the sum of all products r_i₁*r_i₂*...*r_i_k where 1 ≤ i₁ < i₂ < ... < i_k ≤ n

So, we have:

σ_k = (-1)^ka_k where σ_k is called the kth elementary symmetric polynomial in r₁, ..., r_n
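The relation σ_k = (-1)^k a_k can be checked on sample roots. The Python sketch below is a minimal illustration with helper names of my own choosing: it expands the product (x - r₁)*...*(x - r_n) to read off a₁, ..., a_n and compares them with the sums of k-combinations of the roots.

```python
from itertools import combinations
from math import prod

def elementary_symmetric(values, k):
    """Sum of the products of all k-element combinations of values."""
    return sum(prod(c) for c in combinations(values, k))

def polynomial_from_roots(roots):
    """Coefficients of (x - r_1)...(x - r_n), highest degree first."""
    coeffs = [1]
    for root in roots:
        shifted = coeffs + [0]             # coeffs * x
        for i, c in enumerate(coeffs):
            shifted[i + 1] -= root * c     # subtract root * coeffs
        coeffs = shifted
    return coeffs

roots = [2, -3, 5, 7]                      # arbitrary sample roots
n = len(roots)
a = polynomial_from_roots(roots)           # a[0] = 1 and a[k] is a_k

definition_holds = all(
    elementary_symmetric(roots, k) == (-1) ** k * a[k] for k in range(1, n + 1)
)
print(a, definition_holds)
```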

In my next blog, I will show that every symmetric polynomial in r₁, ..., r_n can be expressed as a polynomial in the elementary symmetric polynomials in r₁, ..., r_n

References

Harold M. Edwards, Galois Theory, Springer, 1984.
"Symmetric Polynomials", Wikipedia.org
"Elementary Symmetric Polynomials", Wikipedia.org
