Difference between revisions of "Chain Rule"
m (added a proof of multi-variable chain rule)
== See also ==
* [[Calculus]]
Revision as of 23:42, 22 June 2006
== Statement ==
Basically, the Chain Rule says that if <math>h(x) = f(g(x))</math>, then <math>h'(x) = f'(g(x))\, g'(x)</math>. For example, if <math>f(x) = \sin x</math>, <math>g(x) = x^2</math>, and <math>h(x) = f(g(x)) = \sin(x^2)</math>, then <math>h'(x) = \cos(x^2) \cdot 2x</math>.
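As a quick numerical sanity check of the rule (an illustration only; the functions <math>f(x)=\sin x</math> and <math>g(x)=x^2</math> are just one convenient choice), one can compare the Chain Rule's prediction for <math>h'(x)</math> against a difference quotient:

```python
import math

# h(x) = f(g(x)) with f(x) = sin(x) and g(x) = x^2,
# so the Chain Rule predicts h'(x) = cos(x^2) * 2x.
def h(x):
    return math.sin(x ** 2)

def h_prime_chain_rule(x):
    return math.cos(x ** 2) * 2 * x

def central_difference(func, x, dx=1e-6):
    # Symmetric difference quotient: (func(x+dx) - func(x-dx)) / (2 dx)
    return (func(x + dx) - func(x - dx)) / (2 * dx)

x0 = 1.3
print(central_difference(h, x0), h_prime_chain_rule(x0))
# the two values agree to many decimal places
```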
Here are some more precise statements for the single-variable and multi-variable cases.
Single variable Chain Rule: Let each of <math>A</math> and <math>B</math> be an open interval, and suppose <math>g: A \to B</math> and <math>f: B \to \mathbb{R}</math>. Let <math>h: A \to \mathbb{R}</math> such that <math>h(x) = f(g(x))</math> for all <math>x \in A</math>. If <math>x_0 \in A</math>, <math>g</math> is differentiable at <math>x_0</math>, and <math>f</math> is differentiable at <math>g(x_0)</math>, then <math>h</math> is differentiable at <math>x_0</math>, and <math>h'(x_0) = f'(g(x_0))\, g'(x_0)</math>.
Multi-dimensional Chain Rule: Let <math>g:\mathbb{R}^n \to \mathbb{R}^m</math> and <math>f:\mathbb{R}^m \to \mathbb{R}^p</math>. (Here each of <math>n</math>, <math>m</math>, and <math>p</math> is a positive integer.) Let <math>h: \mathbb{R}^n \to \mathbb{R}^p</math> such that <math>h(x) = f(g(x))</math> for all <math>x \in \mathbb{R}^n</math>. Let <math>x_0 \in \mathbb{R}^n</math>. If <math>g</math> is differentiable at <math>x_0</math>, and <math>f</math> is differentiable at <math>g(x_0)</math>, then <math>h</math> is differentiable at <math>x_0</math> and <math>h'(x_0) = f'(g(x_0))\, g'(x_0)</math>. (Here, each of <math>h'(x_0)</math>, <math>f'(g(x_0))</math>, and <math>g'(x_0)</math> is a matrix.)
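The matrix product in this statement can be checked numerically. Here is a small illustration (the particular functions <math>g(x,y)=(xy,\ x+y)</math> and <math>f(u,v)=(u+2v,\ uv)</math>, with <math>n=m=p=2</math>, are just a hypothetical choice): the product of the two Jacobian matrices matches a finite-difference Jacobian of the composition.

```python
# Hypothetical concrete example with n = m = p = 2:
#   g(x, y) = (x*y, x + y),   f(u, v) = (u + 2v, u*v),   h = f o g.
def g(x, y):
    return (x * y, x + y)

def f(u, v):
    return (u + 2 * v, u * v)

def h(x, y):
    return f(*g(x, y))

def g_jacobian(x, y):
    # rows = outputs, columns = inputs
    return [[y, x],
            [1.0, 1.0]]

def f_jacobian(u, v):
    return [[1.0, 2.0],
            [v, u]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def numeric_jacobian(func, x, y, dx=1e-6):
    # Finite-difference Jacobian: one column per input variable.
    base = func(x, y)
    cols = []
    for shifted in (func(x + dx, y), func(x, y + dx)):
        cols.append([(s - b) / dx for s, b in zip(shifted, base)])
    return [[cols[j][i] for j in range(2)] for i in range(2)]

x0, y0 = 0.5, -1.5
predicted = matmul(f_jacobian(*g(x0, y0)), g_jacobian(x0, y0))
numeric = numeric_jacobian(h, x0, y0)
ok = all(abs(predicted[i][j] - numeric[i][j]) < 1e-4
         for i in range(2) for j in range(2))
print(ok)
```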
== Intuitive Explanation ==
The single-variable Chain Rule is often explained by pointing out that <math>\frac{h(x_0+\Delta x)-h(x_0)}{\Delta x} = \frac{f(g(x_0+\Delta x))-f(g(x_0))}{g(x_0+\Delta x)-g(x_0)} \cdot \frac{g(x_0+\Delta x)-g(x_0)}{\Delta x}</math>. The first term on the right approaches <math>f'(g(x_0))</math>, and the second term on the right approaches <math>g'(x_0)</math>, as <math>\Delta x</math> approaches <math>0</math>. This can be made into a rigorous proof. (But we do have to worry about the possibility that <math>g(x_0+\Delta x)-g(x_0) = 0</math>, in which case we would be dividing by <math>0</math>.)
This explanation of the chain rule fails in the multi-dimensional case, because in the multi-dimensional case <math>\Delta g = g(x_0+\Delta x)-g(x_0)</math> is a vector, as is <math>\Delta x</math>, and we can't divide by a vector.
However, there's another way to look at it.
Suppose a function <math>F</math> is differentiable at <math>x</math>, and <math>\Delta x</math> is "small". Question: How much does <math>F</math> change when its input changes from <math>x</math> to <math>x+\Delta x</math>? (In other words, what is <math>F(x+\Delta x)-F(x)</math>?) Answer: approximately <math>F'(x) \cdot \Delta x</math>. This is true in the multi-dimensional case as well as in the single-variable case.
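To see the approximation in action, here is a tiny single-variable illustration (the numbers are my own, not part of the argument): for <math>F(x)=x^2</math> and <math>F'(3)=6</math>, the actual change of <math>F</math> is close to the predicted change <math>F'(3)\cdot \Delta x</math>.

```python
# Illustration: F(x) = x^2 is differentiable at x = 3 with F'(3) = 6.
# Change the input from 3 to 3.01, i.e. dx = 0.01:
def F(x):
    return x ** 2

dx = 0.01
actual_change = F(3 + dx) - F(3)   # about 0.0601
predicted_change = 6 * dx          # F'(3) * dx = 0.06
print(actual_change, predicted_change)
```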
Well, suppose that (as above) <math>h(x)=f(g(x))</math>, and <math>\Delta x</math> is "small", and someone asks you how much <math>h</math> changes when its input changes from <math>x_0</math> to <math>x_0+\Delta x</math>. That is the same as asking how much <math>f(g(x))</math> changes when its input changes from <math>x_0</math> to <math>x_0+\Delta x</math>, which is the same as asking how much <math>f</math> changes when its input changes from <math>g(x_0)</math> to <math>g(x_0)+\Delta g</math>, where <math>\Delta g = g(x_0+\Delta x)-g(x_0)</math>. And what is the answer to this question? The answer is: approximately, <math>f'(g(x_0)) \cdot \Delta g</math>.
But what is <math>\Delta g</math>? In other words, how much does <math>g</math> change when its input changes from <math>x_0</math> to <math>x_0+\Delta x</math>? Answer: approximately <math>g'(x_0) \cdot \Delta x</math>.
Therefore, the amount that <math>h</math> changes when its input changes from <math>x_0</math> to <math>x_0+\Delta x</math> is approximately <math>f'(g(x_0)) \cdot g'(x_0) \cdot \Delta x</math>.
We know that <math>h'(x_0)</math> is supposed to be a matrix (or number, in the single-variable case) such that <math>h'(x_0) \cdot \Delta x</math> is a good approximation to <math>h(x_0+\Delta x)-h(x_0)</math>. Thus, it seems that <math>f'(g(x_0)) \cdot g'(x_0)</math> is a good candidate for being the matrix (or number) that <math>h'(x_0)</math> is supposed to be.
This can be made into a rigorous proof. The standard proof of the multi-dimensional chain rule can be thought of in this way.
== Proof ==
Here's a proof of the multi-variable Chain Rule. It's kind of a "rigorized" version of the intuitive argument given above.
I'll use the following fact. Assume <math>F: \mathbb{R}^n \to \mathbb{R}^m</math>, and <math>x \in \mathbb{R}^n</math>. Then <math>F</math> is differentiable at <math>x</math> if and only if there exists an <math>m</math> by <math>n</math> matrix <math>M</math> such that the "error" function <math>E_F(\Delta x)= F(x+\Delta x)-F(x)-M\cdot \Delta x</math> has the property that <math>\frac{|E_F(\Delta x)|}{|\Delta x|}</math> approaches <math>0</math> as <math>\Delta x</math> approaches <math>0</math>. (In fact, this can be taken as a definition of the statement "<math>F</math> is differentiable at <math>x</math>.") If such a matrix <math>M</math> exists, then it is unique, and it is called <math>F'(x)</math>. Intuitively, the fact that <math>\frac{|E_F(\Delta x)|}{|\Delta x|}</math> approaches <math>0</math> as <math>\Delta x</math> approaches <math>0</math> just means that <math>F(x+\Delta x)-F(x)</math> is approximated well by <math>M \cdot \Delta x</math>.
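As a concrete illustration of this fact (my own example, with <math>n = m = 1</math>, so the matrix <math>M</math> is just a number): take <math>F(x)=x^3</math>, <math>x=1</math>, and <math>M=F'(1)=3</math>, and watch the error ratio shrink with <math>\Delta x</math>.

```python
# F(x) = x^3 at x = 1; candidate "matrix" M = F'(1) = 3 (1-by-1).
def F(x):
    return x ** 3

x, M = 1.0, 3.0

def error_ratio(dx):
    # |E_F(dx)| / |dx|, where E_F(dx) = F(x + dx) - F(x) - M * dx
    E = F(x + dx) - F(x) - M * dx
    return abs(E) / abs(dx)

# Algebraically E_F(dx) = 3 dx^2 + dx^3, so the ratio is about 3|dx|:
ratios = [error_ratio(10.0 ** -k) for k in range(1, 5)]
print(ratios)  # shrinks by roughly a factor of 10 at each step
```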
Okay, here's the proof.
Let <math>g:\mathbb{R}^n \to \mathbb{R}^m</math> and <math>f:\mathbb{R}^m \to \mathbb{R}^p</math>. (Here each of <math>n</math>, <math>m</math>, and <math>p</math> is a positive integer.) Let <math>h: \mathbb{R}^n \to \mathbb{R}^p</math> such that <math>h(x) = f(g(x))</math> for all <math>x \in \mathbb{R}^n</math>. Let <math>x_0 \in \mathbb{R}^n</math>, and suppose that <math>g</math> is differentiable at <math>x_0</math> and <math>f</math> is differentiable at <math>g(x_0)</math>.
In the intuitive argument, we said that if <math>\Delta x</math> is "small", then <math>\Delta h = f(g(x_0+\Delta x))-f(g(x_0)) \approx f'(g(x_0))\cdot \Delta g</math>, where <math>\Delta g = g(x_0+\Delta x)-g(x_0)</math>. In this proof, we'll fix that statement up and make it rigorous. What we can say is, if <math>\Delta x \in \mathbb{R}^n</math>, then <math>\Delta h = f(g(x_0)+\Delta g)-f(g(x_0)) = f'(g(x_0))\cdot \Delta g + E_f(\Delta g)</math>, where <math>E_f:\mathbb{R}^m \to \mathbb{R}^p</math> is a function which has the property that <math>\lim_{\Delta g \to 0} \frac{|E_f(\Delta g)|}{|\Delta g|}=0</math>.
Now let's work on <math>\Delta g</math>. In the intuitive argument, we said that <math>\Delta g \approx g'(x_0)\cdot \Delta x</math>. In this proof, we'll make that rigorous by saying <math>\Delta g = g'(x_0)\cdot \Delta x + E_g(\Delta x)</math>, where <math>E_g:\mathbb{R}^n \to \mathbb{R}^m</math> has the property that <math>\lim_{\Delta x \to 0} \frac{|E_g(\Delta x)|}{|\Delta x|} = 0</math>.
Putting these pieces together, we find that

<math>\Delta h = f'(g(x_0))\Delta g + E_f(\Delta g)</math>

<math>= f'(g(x_0))\left(g'(x_0)\Delta x + E_g(\Delta x)\right) + E_f\left( g'(x_0)\Delta x + E_g(\Delta x) \right)</math>

<math>= f'(g(x_0))g'(x_0)\Delta x + f'(g(x_0))E_g(\Delta x) + E_f\left( g'(x_0)\Delta x + E_g(\Delta x) \right)</math>

<math>= f'(g(x_0))g'(x_0)\Delta x + E_h(\Delta x)</math>,

where I have taken that messy error term and called it <math>E_h(\Delta x)</math>.
Now, we just need to show that <math>\frac{|E_h(\Delta x)|}{|\Delta x|} \to 0</math> as <math>\Delta x \to 0</math>, in order to prove that <math>h</math> is differentiable at <math>x_0</math> and that <math>h'(x_0) = f'(g(x_0))g'(x_0)</math>.
I believe we've hit a point where intuition no longer guides us. In order to finish off the proof, we just need to look at <math>E_h(\Delta x)</math> and play around with it a bit. It's not that bad. For the time being, I'll leave the rest of the proof as an exercise for the reader. (Hint: If <math>A</math> is an <math>m</math> by <math>n</math> matrix, then there exists a number <math>k > 0</math> such that <math>|Ax| \le k|x|</math> for all <math>x \in \mathbb{R}^n</math>.)
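For readers who want to see how the exercise can go, here is one way the estimate works (a sketch under the hint, not necessarily the intended solution). Split <math>E_h(\Delta x) = f'(g(x_0))E_g(\Delta x) + E_f(\Delta g)</math> and bound each piece:

```latex
% Using the hint, choose k_1, k_2 > 0 with
% |f'(g(x_0)) v| \le k_1 |v| and |g'(x_0) w| \le k_2 |w|.  Then
\frac{|E_h(\Delta x)|}{|\Delta x|}
  \;\le\; k_1 \,\frac{|E_g(\Delta x)|}{|\Delta x|}
  \;+\; \frac{|E_f(\Delta g)|}{|\Delta g|}\cdot\frac{|\Delta g|}{|\Delta x|},
\qquad
\frac{|\Delta g|}{|\Delta x|}
  \;\le\; k_2 + \frac{|E_g(\Delta x)|}{|\Delta x|}.
```

The first term goes to <math>0</math> by the defining property of <math>E_g</math>. In the second term, <math>\frac{|\Delta g|}{|\Delta x|}</math> stays bounded, while <math>\frac{|E_f(\Delta g)|}{|\Delta g|} \to 0</math> because <math>\Delta g \to 0</math> as <math>\Delta x \to 0</math>; the case <math>\Delta g = 0</math> causes no trouble since <math>E_f(0)=0</math>, so that term simply vanishes there.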