Fermat's Last Theorem: 2009-02-01

In a previous entry, I posted the proof of Sturm's Theorem. This is a method for determining the number of real roots in a given interval.

Today, I will show some examples of its use.

Example 1: f(x) = x⁵ - 3x - 1 in the interval [-2, +2]

The Sturm Chain for this polynomial is:

f₀ = x⁵ - 3x - 1

f₁ = 5x⁴ - 3

f₂ = 12x + 5

f₃ = 1

For x=-2, there are 3 sign changes.

For x=-1, there are 2 sign changes

For x=0, there is 1 sign change

For x=1, there is 1 sign change

For x=2, there is 0 sign changes

So, between -2 and -1, there is 3-2=1 real zero

Between -1 and 0, there is 2-1=1 real zero

Between 0 and 1, there are no 1-1=0 real zeros.

Between 1 and 2, there is 1-0=1 real zero.

In summary, between -2 and +2, there are 3-0=3 real zeros.

Example 2: x⁵ -ax -b when a,b are positive and 4⁴a⁵ is greater than 5⁵b⁴

The Sturm Chain for this polynomial is:

f₀ = x⁵ -ax -b

f₁ = 5x⁴ - a

f₂ = 4ax + 5b

f₃ = 4⁴a⁵ - 5⁵b⁴

For this example, let's look at the interval between -∞ and +∞

For -∞, there are 3 sign changes.

For +∞, there are 0 sign changes.

So, all equations of this form have 3-0=3 real roots.

References

Heinrich Dorrie (Translated by David Antin), 100 Great Problems of Elementary Mathematics (Dover, 1965)

In a previous blog entry, I talked about the properties of Sturm Chains. In today's blog entry, I will show how they can be used to establish Sturm's Theorem.

Definition 1: Number of Sign Changes Across a Sturm Chain at x

Let f₀, f₁, ..., f_s be a Sturm Chain where for all 0 ≤ i ≤ s, f_i ≠ 0. The number of sign changes across a Sturm Chain at x is the number of times that a neighboring function changes sign: the number of times when f_i(x) is negative and f_i+1(x) is positive or when f_i(x) is negative and f_i+1(x) is positive.

Example:

For the function f(x) = x⁵ - 3x - 1, the Sturm Chain is (see here for details on how this Sturm Chain is constructed):

f₀(x) = x⁵ - 3x - 1

f₁(x) = 5x⁴ - 3

f₂(x) = 12x + 5

f₃(x) = 1

For x=-2, we see that:

f₀ = -

f₁ = +

f₂ = -

f₃ = +

So, the number of sign changes across the Sturm Chain at x=-2 is 3.

Definition 2: Open and Closed Intervals

An interval is open on a,b if for all x in a,b, a is less than x and is less than b. An open interval is represented (a,b). [It is called open because there is no minimum or maximum value]

An interval is closed on a,b if for all x in a,b, a ≤ x ≤ b. A closed interval is represented [a,b] (It is called closed because it has both a minimum and a maximum value)

We can also mix the two notations so that x in [a,b) means that a ≤ x and x is less than b.

x in (a,b] means that a is less than x ≤ b.

I use this notation below.

Lemma 1:

Let f be a continuous function such that f(a) and f(b) have different signs,

Then:

There exists x such that x in (a,b) and f(x)=0

Proof:

This follows directly from the Intermediate Value Theorem (see Theorem, here).

QED

Corollary 1.1:

If f is continuous on [a,b] and for all x in [a,b]:

f(x) ≠ 0

Then:

for all x in [a,b]:

f(x) has the same sign.

Proof:

(1) Assume that f changes sign over [a,b]

(2) Then by Lemma 1 above there exists x such that f(x)=0

(3) But this is not true so we reject our assumption at step #1.

QED

Corollary 1.2:

Let f₀, f₁, ..., f_s be a Sturm Chain

Let k be the only zero for any f_i and it is a zero where i ≥ 1.

Let a,b be an interval such that k is in [a,b]

Then:

There are no sign changes across the Sturm Chain for x in [a,b]

Proof:

(1) For all x in [a,b]:

f_i-1(x) ≠ 0 and f_i+1(x) ≠ 0 and if x ≠ k, then f_i(k) ≠ 0.

(2) Since f_i(k)=0, we know that f_i-1(k) and f_i+1(k) have opposite signs. (see for properties of Sturm Chains, see Definition 1, here)

(3) Since f_i-1 and f_i+1 are nonzero on [a,b], we know from Corollary 1.1 above that f_i-1 and f_i+1 have the same opposite signs for all x in [a,b].

(4) But this means that any sign change on f_i doesn't affect the total number of sign changes from f_i-1 to f_i to f_i+1 since:

Before k:

There are the following possibilities for all f_i-1, f_i, f_i+1

+, +, - = 1 sign change
+, -, - = 1 sign change
-, +, + = 1 sign change
-, -, + = 1 sign change

After k:

There are the following possibilities for f_i-1, f_i, f_i+1

+, -, - = 1 sign change
+, +, - = 1 sign change
-, -, + = 1 sign change
-,+,+ = 1 sign change

(5) The above property is true even if there are multiple Sturm functions that are zero at k. The key point is that by the properties of Sturm Chains, we know that no neighboring functions are both 0 (see here for details on the properties of a Sturm Chain).

(6) It is possible that f_i, f_i+2, etc. are all 0 on k. But even in this case, the above logic holds and there are no sign changes for any of the Sturm functions.

(5) Because in all other cases, the Sturm functions are nonzero, it follows from Corollary 1.1 above that they don't change sign and we are done.

QED

Corollary 1.3:

Let f₀, f₁, ..., f_s be a Sturm Chain

Let [a,b] be an interval such that for all x in [a,b]:

f₀(x) ≠ 0 (but the other Sturm functions may have a zero).

Let Z(x) be the number of sign changes that occur across a Sturm Chain for a given x (see Definition 1 above)

Then:

There are no sign changes over the Sturm Chain in [a,b]. That is, Z(a) - Z(b) = 0.

Proof:

(1) Let f₀, f₁, ..., f_s be a Sturm Chain

(2) Assume that k is the only zero in [a,b] for any function in the Sturm Chain where i ≥ 1.

(3) In this case, the Theorem holds using Corollary 1.2 above.

(4) Assume that we order all zeros in [a,b] for any function in the Sturm Chain and up to the nth zero k, there is no sign change across the Sturm Chain.

(7) Let k be the nth zero in [a,b] and k' be the n+1th zero in [a,b] and k'' be the n+2th zero. Let i,i',i'' be the function that is zero so that we have: f_i(k)=0, f_i'(k')=0, and f_i''(k'') = 0.

(8) Since k' is the only zero on (k,k''), we can use Corollary 1.2 above to complete the inductive proof.

QED

Lemma 2:

Let f₀, f₁, ..., f_s be a Sturm Chain

Let k be the only zero for f₀ in [a,b]

Let Z(x) be the number of sign changes that occur across a Sturm Chain for a given x.

Then:

Z(a) - Z(b) = 1.Proof:

(1) Let h be a point before k and l be a pointer after k such that f'(h) has the same sign as f'(k) and as f'(l). (See Property 3 of Sturm Chains in Definition 1, here)

(4) We have the following cases to consider:

Case I: f(h) is positive and f(l) is negative

(a) For all x in (h,l):

f(x) is decreasing, so its derivative f'(x) is negative. [See Lemma here if needed]

(b) So at h:

f₀(h)=f(h) is positive and f₁(h)=f'(h) is negative.

(c) At l:

f₀(l) is negative and f₁(l) is negative.

(d) So, Z(h) - Z(l) = 1.

Case II: f(h) is negative and f(l) is positive

(a) For all x in (h,l):

f(x) is increasing, so its derivative f'(x) is positive. [See Lemma here if needed]

(b) So at h:

f₀(h) is negative and f₁(h) is positive.

(c) At l:

f₀(l) is positive and f₁(l) is positive.

(d) So, again, Z(h) - Z(l) = 1.

QED

Theorem: Sturm's Theorem

Let f(x) be an algebraic equation with real coefficients and with only simple roots.

Let (a,b) be an interval such that f(a) ≠ 0 and f(b) ≠ 0 and a is less than b.

Let Z(x) be the number of sign changes that occur across a Sturm Chain for a given x (see Definition 1 above if needed).

Then:

The number of real roots occurring in (a,b) is equal to Z(a) - Z(b)

Proof:

(1) Let f₀, f₁, ..., f_s be a Sturm Chain for f(x) where f₀ = f. [See Lemma 1, here]

(2) Let [a,b] be an interval such that a is less than b and a,b are real numbers.

(3) Let Z(x) be the number of the sign changes across the Sturm Chain for a given x.

(4) There are three cases that we need to consider to prove this theorem.

Case I: No zeros for any function in the Sturm Chain

(a) To prove the theorem for this case, we need to show that Z(a) - Z(b) = 0.

(b) For all x in [a,b], for all 0 ≤ i ≤ s, f_i(x) ≠ 0

(c) But then by Corollary 1.1 above, for f_i, there is no sign changes for any function in the Sturm Chain.

(d) So, Z(a) - Z(b)=0

Case II: No zeros in (a,b) for f₀(x) but at least one zero for the other functions in the Sturm Chain.

This case is equivalent to Corollary 1.3 above.

Case III: At least one zero in (a,b) for f(x)

(a) ∃x such that a ≤ x ≤ b and f₀(x) = 0

(b) To prove this, I will use induction.

(c) Assume that there is only one such zero in [a,b]

(d) Using Lemma 2 above, we know that in this case, Z(a) - Z(b) = 1.

(e) Assume that the theorem is true up to the nth zero of f in [a,b]

(f) Let k be the nth zero, k' be the n+1th zero and k'' be the n+2th zero.

(g) Then there is only zero in (k,k'') which is located at k'.

(h) Let l be a point in (k,k') and l' be a point in (k',k'').

(i) Then, by Lemma 2 above Z(l) - Z(l') = 1.

(j) From step e above, we have Z(a) - Z(l) = n and from step i above we have Z(a) - Z(l') = n+1.

QED

References

Heinrich Dorrie (Translated by David Antin), 100 Great Problems of Elementary Mathematics (Dover, 1965)

Fermat's Last Theorem

Saturday, February 07, 2009

Sturm's Theorem: Examples

Friday, February 06, 2009

Sturm's Theorem: The Proof

Tuesday, February 03, 2009

Cauchy's Bound for Real Roots

Topic Index

Completed Proofs

Recommended Books

Required Reading for Experts

About Me

Blog Archive