Saturday, February 07, 2009

Sturm's Theorem: Examples

In a previous entry, I posted the proof of Sturm's Theorem. This is a method for determining the number of real roots in a given interval.

Today, I will show some examples of its use.

Example 1: f(x) = x5 - 3x - 1 in the interval [-2, +2]

The Sturm Chain for this polynomial is:

f0 = x5 - 3x - 1

f1 = 5x4 - 3

f2 = 12x + 5

f3 = 1

For x=-2, there are 3 sign changes.

For x=-1, there are 2 sign changes

For x=0, there is 1 sign change

For x=1, there is 1 sign change

For x=2, there is 0 sign changes

So, between -2 and -1, there is 3-2=1 real zero

Between -1 and 0, there is 2-1=1 real zero

Between 0 and 1, there are no 1-1=0 real zeros.

Between 1 and 2, there is 1-0=1 real zero.

In summary, between -2 and +2, there are 3-0=3 real zeros.

Example 2: x5 -ax -b when a,b are positive and 44a5 is greater than 55b4

The Sturm Chain for this polynomial is:

f0 = x5 -ax -b

f1 = 5x4 - a

f2 = 4ax + 5b

f3 = 44a5 - 55b4

For this example, let's look at the interval between -∞ and +∞

For -∞, there are 3 sign changes.

For +∞, there are 0 sign changes.

So, all equations of this form have 3-0=3 real roots.

References

Friday, February 06, 2009

Sturm's Theorem: The Proof

In a previous blog entry, I talked about the properties of Sturm Chains. In today's blog entry, I will show how they can be used to establish Sturm's Theorem.

Definition 1: Number of Sign Changes Across a Sturm Chain at x

Let f0, f1, ..., fs be a Sturm Chain where for all 0 ≤ i ≤ s, fi ≠ 0. The number of sign changes across a Sturm Chain at x is the number of times that a neighboring function changes sign: the number of times when fi(x) is negative and fi+1(x) is positive or when fi(x) is negative and fi+1(x) is positive.

Example:

For the function f(x) = x5 - 3x - 1, the Sturm Chain is (see here for details on how this Sturm Chain is constructed):

f0(x) = x5 - 3x - 1

f1(x) = 5x4 - 3

f2(x) = 12x + 5

f3(x) = 1

For x=-2, we see that:

f0 = -

f1 = +

f2 = -

f3 = +

So, the number of sign changes across the Sturm Chain at x=-2 is 3.

Definition 2: Open and Closed Intervals

An interval is open on a,b if for all x in a,b, a is less than x and is less than b. An open interval is represented (a,b). [It is called open because there is no minimum or maximum value]

An interval is closed on a,b if for all x in a,b, a ≤ x ≤ b. A closed interval is represented [a,b] (It is called closed because it has both a minimum and a maximum value)

We can also mix the two notations so that x in [a,b) means that a ≤ x and x is less than b.

x in (a,b] means that a is less than x ≤ b.

I use this notation below.

Lemma 1:

Let f be a continuous function such that f(a) and f(b) have different signs,

Then:

There exists x such that x in (a,b) and f(x)=0

Proof:

This follows directly from the Intermediate Value Theorem (see Theorem, here).

QED

Corollary 1.1:

If f is continuous on [a,b] and for all x in [a,b]:

f(x) ≠ 0


Then:

for all x in [a,b]:

f(x) has the same sign.

Proof:

(1) Assume that f changes sign over [a,b]

(2) Then by Lemma 1 above there exists x such that f(x)=0

(3) But this is not true so we reject our assumption at step #1.

QED

Corollary 1.2:

Let f0, f1, ..., fs be a Sturm Chain

Let k be the only zero for any fi and it is a zero where i ≥ 1.

Let a,b be an interval such that k is in [a,b]

Then:

There are no sign changes across the Sturm Chain for x in [a,b]

Proof:

(1) For all x in [a,b]:

fi-1(x) ≠ 0 and fi+1(x) ≠ 0 and if x ≠ k, then fi(k) ≠ 0.

(2) Since fi(k)=0, we know that fi-1(k) and fi+1(k) have opposite signs. (see for properties of Sturm Chains, see Definition 1, here)

(3) Since fi-1 and fi+1 are nonzero on [a,b], we know from Corollary 1.1 above that fi-1 and fi+1 have the same opposite signs for all x in [a,b].

(4) But this means that any sign change on fi doesn't affect the total number of sign changes from fi-1 to fi to fi+1 since:

Before k:

There are the following possibilities for all fi-1, fi, fi+1

+, +, - = 1 sign change
+, -, - = 1 sign change
-, +, + = 1 sign change
-, -, + = 1 sign change

After k:

There are the following possibilities for fi-1, fi, fi+1

+, -, - = 1 sign change
+, +, - = 1 sign change
-, -, + = 1 sign change
-,+,+ = 1 sign change

(5) The above property is true even if there are multiple Sturm functions that are zero at k. The key point is that by the properties of Sturm Chains, we know that no neighboring functions are both 0 (see here for details on the properties of a Sturm Chain).

(6) It is possible that fi, fi+2, etc. are all 0 on k. But even in this case, the above logic holds and there are no sign changes for any of the Sturm functions.

(5) Because in all other cases, the Sturm functions are nonzero, it follows from Corollary 1.1 above that they don't change sign and we are done.

QED

Corollary 1.3:

Let f0, f1, ..., fs be a Sturm Chain

Let [a,b] be an interval such that for all x in [a,b]:

f0(x) ≠ 0 (but the other Sturm functions may have a zero).

Let Z(x) be the number of sign changes that occur across a Sturm Chain for a given x (see Definition 1 above)

Then:

There are no sign changes over the Sturm Chain in [a,b]. That is, Z(a) - Z(b) = 0.

Proof:

(1) Let f0, f1, ..., fs be a Sturm Chain

(2) Assume that k is the only zero in [a,b] for any function in the Sturm Chain where i ≥ 1.

(3) In this case, the Theorem holds using Corollary 1.2 above.

(4) Assume that we order all zeros in [a,b] for any function in the Sturm Chain and up to the nth zero k, there is no sign change across the Sturm Chain.

(7) Let k be the nth zero in [a,b] and k' be the n+1th zero in [a,b] and k'' be the n+2th zero. Let i,i',i'' be the function that is zero so that we have: fi(k)=0, fi'(k')=0, and fi''(k'') = 0.

(8) Since k' is the only zero on (k,k''), we can use Corollary 1.2 above to complete the inductive proof.

QED

Lemma 2:

Let f0, f1, ..., fs be a Sturm Chain

Let k be the only zero for f0 in [a,b]

Let Z(x) be the number of sign changes that occur across a Sturm Chain for a given x.

Then:

Z(a) - Z(b) = 1.

Proof:

(1) Let h be a point before k and l be a pointer after k such that f'(h) has the same sign as f'(k) and as f'(l). (See Property 3 of Sturm Chains in Definition 1, here)

(4) We have the following cases to consider:

Case I: f(h) is positive and f(l) is negative

(a) For all x in (h,l):

f(x) is decreasing, so its derivative f'(x) is negative. [See Lemma here if needed]

(b) So at h:

f0(h)=f(h) is positive and f1(h)=f'(h) is negative.

(c) At l:

f0(l) is negative and f1(l) is negative.

(d) So, Z(h) - Z(l) = 1.

Case II: f(h) is negative and f(l) is positive

(a) For all x in (h,l):

f(x) is increasing, so its derivative f'(x) is positive. [See Lemma here if needed]

(b) So at h:

f0(h) is negative and f1(h) is positive.

(c) At l:

f0(l) is positive and f1(l) is positive.

(d) So, again, Z(h) - Z(l) = 1.

QED

Theorem: Sturm's Theorem

Let f(x) be an algebraic equation with real coefficients and with only simple roots.

Let (a,b) be an interval such that f(a) ≠ 0 and f(b) ≠ 0 and a is less than b.

Let Z(x) be the number of sign changes that occur across a Sturm Chain for a given x (see Definition 1 above if needed).

Then:

The number of real roots occurring in (a,b) is equal to Z(a) - Z(b)

Proof:

(1) Let f0, f1, ..., fs be a Sturm Chain for f(x) where f0 = f. [See Lemma 1, here]

(2) Let [a,b] be an interval such that a is less than b and a,b are real numbers.

(3) Let Z(x) be the number of the sign changes across the Sturm Chain for a given x.

(4) There are three cases that we need to consider to prove this theorem.

Case I: No zeros for any function in the Sturm Chain

(a) To prove the theorem for this case, we need to show that Z(a) - Z(b) = 0.

(b) For all x in [a,b], for all 0 ≤ i ≤ s, fi(x) ≠ 0

(c) But then by Corollary 1.1 above, for fi, there is no sign changes for any function in the Sturm Chain.

(d) So, Z(a) - Z(b)=0

Case II: No zeros in (a,b) for f0(x) but at least one zero for the other functions in the Sturm Chain.

This case is equivalent to Corollary 1.3 above.

Case III: At least one zero in (a,b) for f(x)

(a) ∃x such that a ≤ x ≤ b and f0(x) = 0

(b) To prove this, I will use induction.

(c) Assume that there is only one such zero in [a,b]

(d) Using Lemma 2 above, we know that in this case, Z(a) - Z(b) = 1.

(e) Assume that the theorem is true up to the nth zero of f in [a,b]

(f) Let k be the nth zero, k' be the n+1th zero and k'' be the n+2th zero.

(g) Then there is only zero in (k,k'') which is located at k'.

(h) Let l be a point in (k,k') and l' be a point in (k',k'').

(i) Then, by Lemma 2 above Z(l) - Z(l') = 1.

(j) From step e above, we have Z(a) - Z(l) = n and from step i above we have Z(a) - Z(l') = n+1.

QED

References

Tuesday, February 03, 2009

Cauchy's Bound for Real Roots

Augustin-Louis Cauchy came up with a useful bound for real roots in a polynomial equation. This works well with Sturm's Theorem.

Theorem: Cauchy's Bound for Real Roots

Let f(x) = anxn + an-1xn-1 + ... + a0

Let c be a root of f(x) such that f(c)=0

Then

abs(c) ≤ 1 + {abs(an-1 + ... + abs(a0)}/abs(an)

Proof:

(1) Assume that abs(c) is greater than 1. [The alternative is true since 1 + abs(x) ≥ 1]

(2) Since c is a root, then ancn + an-1cn-1 + ... + a0 = 0, and we have:

ancn = -an-1cn-1 + .... + -a0

(3) Using a basic inequality (see Lemma 2, here), we have:

abs(an)*abs(c)n ≤ abs(an-1)*abs(c)n-1 + ... + abs(a0)

(4) Let H = max{abs(an-1), ..., abs(a0) }

(5) So,

abs(an)*abs(c)n ≤ H(abs(c)n-1 + ... + 1)

(6) Since (abs(c)-1)*(abs(c)n-1 + ... + 1) = abs(c)n - 1 and since abs(c) is greater than 1, it follows that:

H(abs(c)n-1 + ... + 1) = H[abs(c)n - 1]/[abs(c) - 1]

and therefore:

abs(an)*abs(c)n ≤ H[abs(c)n]/[abs(c) - 1]

(7) So, we can state:

abs(an) ≤ H/[abs(c) - 1]

(8) Since abs(an) and abs(c)-1 are positive, we can also state:

abs(c) - 1 ≤ H/abs(an)

and finally,

abs(c) ≤ 1 + H/abs(an)

QED