2.2 Integration and Differentiation
2.2.1 Lebesgue integral
A usual way to define the Lebesgue integral is to start with simple functions, extend to non-negative functions by the approximation property, and then pass to general functions by an easy decomposition. Let us start with a simple function. Assume \(\phi=\sum\limits_{i=1}^k a_iI_{A_i}\), where the \(A_i\)’s are disjoint. Then its integral with respect to a measure \(\nu\) is \[\int \phi d\nu=\sum\limits_{i=1}^k a_i\nu(A_i).\] Clearly each \(A_i\) must be measurable, which (when the \(a_i\) are distinct) is equivalent to saying that \(\phi\) is measurable. This notion of integration comes from “partitioning the range”, whereas the Riemann integral comes from partitioning the domain. The same idea appears in the construction behind the approximation property. We adopt the convention \(a\cdot\infty=0\) when \(a=0\), which handles the case \(a_i=0\) and \(\nu(A_i)=\infty\). For a non-negative function we have two equivalent definitions of the integral.
Definition 2.6 Let \(f\) be a non-negative Borel function and define its integral to be \[\int f d\nu=\sup_{\phi\in S_f} \int \phi d\nu,\] where \(S_f\) is the collection of all non-negative simple functions satisfying \(\phi(\omega) \leq f(\omega)\) for any \(\omega \in \Omega\).
Another definition, which comes from the well-known Monotone Convergence Theorem, is often more convenient in computations.
Definition 2.7 Let \(f\) be a non-negative Borel function and define its integral to be \[\int f d\nu=\lim_{n \to \infty} \int f_n d\nu,\] where \(0 \leq f_n \uparrow f\) and each \(f_n\) is a simple function.
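The “partition the range” idea behind Definition 2.7 can be sketched numerically. The snippet below is only an illustration under stated assumptions: \(f(x)=x^2\) on \([0,1]\) with Lebesgue measure, and the dyadic simple functions \(\phi_n=2^{-n}\lfloor 2^n f\rfloor=\sum_{k=1}^{2^n}2^{-n}I_{\{f\geq k2^{-n}\}}\). The function name `dyadic_simple_integral` is hypothetical.

```python
import math

def dyadic_simple_integral(n: int) -> float:
    """Integral of phi_n = floor(2^n f) / 2^n for f(x) = x^2 on [0, 1].

    phi_n = sum_{k=1}^{2^n} 2^{-n} * I{f >= k / 2^n}, and for this f the level
    set {f >= c} is [sqrt(c), 1], whose Lebesgue measure is 1 - sqrt(c).
    """
    step = 2.0 ** -n
    return sum(step * (1.0 - math.sqrt(k * step)) for k in range(1, 2 ** n + 1))

# The integrals of the simple functions increase towards 1/3.
approximations = [dyadic_simple_integral(n) for n in range(1, 13)]
for n, value in enumerate(approximations, start=1):
    print(n, value)
```

The printed values increase with \(n\) and approach \(\int_0^1 x^2\,dx=1/3\) from below, as expected for an increasing sequence of simple functions below \(f\).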
For a general Borel function \(f\), the integral is defined as \[\int f d\nu=\int f_{+} d\nu-\int f_{-} d\nu,\] where \(f_{+}=\max(f,0)\) and \(f_{-}=\max(-f,0)\). We say that this integral exists if and only if at least one of the integrals on the right-hand side is finite, so that the difference is not of the form \(\infty-\infty\). Furthermore, we say \(f\) is integrable if both integrals are finite. Clearly, \(f\) is integrable if and only if \(|f|\) is, since \(|f|=f_{+}+f_{-}\).
Below are some basic properties.
Proposition 2.3 Let \(f\) and \(g\) be Borel functions. Then
- If \(\int f \, d\nu\) exists and \(a \in \mathbb{R}\), then \(\int (af)\, d\nu\) exists and is equal to \(a\int f \, d\nu\).
- If both \(\int f \, d\nu\) and \(\int g \, d\nu\) exist and \(\int f \, d\nu+\int g \, d\nu\) is well defined (not \(\infty-\infty\)), then \(\int (f+g) \, d\nu\) exists and is equal to \(\int f \, d\nu+\int g \, d\nu\).
- If \(f \leq g\) a.e., then \(\int f \, d\nu \leq \int g \, d\nu\) if the integrals exist.
- If \(f \geq 0\) a.e. and \(\int f d\nu =0\), then \(f=0\) a.e.
- \(\nu(A)=0\) implies that \(\int_A f d\nu =0\) where \(\int_A f d\nu := \int fI_A d\nu\).
In the next proposition we recall, without proof, some classical theorems on the interchange of limits and integrals.
Proposition 2.4 Let \(f_1, f_2,\cdots,\) be a sequence of Borel functions on \((\Omega,\cal F,\nu)\).
- (Monotone convergence theorem). If \(0 \leq f_1 \leq f_2 \leq \cdots\) and \(\lim_\limits{n \to \infty} f_n=f\) a.e., then \(\int \lim_\limits{n \to \infty} f_n d\nu=\lim_\limits{n \to \infty} \int f_n d\nu\).
- (Dominated convergence theorem). If \(\lim_\limits{n \to \infty} f_n=f\) a.e. and there exists an integrable function \(g\) such that \(|f_n| \leq g\) a.e., then \(\int \lim_\limits{n \to \infty} f_n d\nu=\lim_\limits{n \to \infty} \int f_n d\nu\).
- (Fatou’s lemma). If \(f_n \geq 0\), then \(\int \liminf\limits_{n \to \infty} f_n d\nu \leq \liminf\limits_{n \to \infty} \int f_n d\nu\).
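A standard example, sketched below for illustration, shows why the domination assumption matters. On \((0,1)\) with Lebesgue measure take \(f_n=nI_{(0,1/n)}\): then \(f_n\to 0\) pointwise while \(\int f_n\,d\lambda=1\) for every \(n\). Hence the limit and the integral cannot be interchanged, no integrable \(g\) dominates all \(f_n\), and the inequality \(\int \liminf f_n\, d\nu \leq \liminf \int f_n\, d\nu\) of Fatou’s lemma is strict. The helper names are hypothetical.

```python
from fractions import Fraction

def f_n(n: int, x: Fraction) -> Fraction:
    """f_n = n * I_(0, 1/n), a simple function on (0, 1)."""
    return Fraction(n) if 0 < x < Fraction(1, n) else Fraction(0)

def integral_f_n(n: int) -> Fraction:
    """Integral of the simple function: value n times Lebesgue measure 1/n."""
    return Fraction(n) * Fraction(1, n)

# At a fixed point x the sequence f_n(x) is eventually 0 ...
x = Fraction(1, 7)
pointwise = [f_n(n, x) for n in (1, 5, 7, 8, 100)]
# ... while every integral equals 1.
integrals = [integral_f_n(n) for n in (1, 10, 1000)]
print(pointwise, integrals)
```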
Example 2.2 Here we consider the interchange of differentiation and integration. For each fixed \(\theta \in \mathbb{R}\), let \(f(\omega, \theta)\) be a Borel function of \(\omega\) on \((\Omega,\cal F,\nu)\), integrable for \(\theta \in (a,b) \subset \mathbb{R}\). Assume that \(\partial f(\omega, \theta)/\partial \theta\) exists a.e. for \(\theta \in (a,b)\) and that \(|\partial f(\omega, \theta)/\partial \theta| \leq g(\omega)\) a.e., where \(g\) is an integrable function on \(\Omega\). Then for each \(\theta \in (a,b)\), \(\partial f(\omega, \theta)/\partial \theta\) is integrable. By the mean value theorem, the difference quotients \([f(\omega,\theta+h)-f(\omega,\theta)]/h\) are bounded in absolute value by \(g(\omega)\), so the dominated convergence theorem gives \[\frac{d}{d\theta} \int f(\omega, \theta) d\nu= \int (\partial f(\omega, \theta)/\partial \theta) \, d\nu.\]
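Example 2.3 Consider the moment generating function \(M(t)=E(e^{tX})\) of a random variable \(X\), assumed to be finite on an interval \((a,b)\). Fix \(t_0 \in (a,b)\) and \(\delta>0\) such that \([t_0-\delta,t_0+\delta] \subset (a,b)\). For \(|t-t_0| \leq \delta/2\) we have \(|x|e^{tx} \leq c_{+} e^{(t_0+\delta) x}+c_{-} e^{(t_0-\delta) x}\), where \(c_{+}=\sup\limits_{x \geq 0,\, |t-t_0|\leq \delta/2} \frac{|x|e^{tx}}{e^{(t_0+\delta)x} }\) and \(c_{-}=\sup\limits_{x \leq 0,\, |t-t_0|\leq \delta/2} \frac{|x|e^{tx}}{e^{(t_0-\delta)x} }\) are finite (both are at most \(\sup_{u \geq 0} ue^{-\delta u/2}=2/(e\delta)\)). Since \(M(t_0\pm\delta)<\infty\), the bound has finite expectation when \(x\) is replaced by \(X\). By the above example, \(M'(t)=E(Xe^{tX})\) for \(t\) near \(t_0\), and hence for every \(t \in (a,b)\).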
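The identity \(M'(t)=E(Xe^{tX})\) can be sanity-checked numerically. The sketch below assumes a hypothetical three-point distribution (the values and probabilities are made up for illustration) and compares a central difference quotient of \(M\) with \(E(Xe^{tX})\).

```python
import math

# Hypothetical finite distribution: P(X = values[i]) = probs[i].
values = [-1.0, 0.5, 2.0]
probs = [0.2, 0.5, 0.3]

def mgf(t: float) -> float:
    """M(t) = E[exp(tX)]."""
    return sum(p * math.exp(t * x) for x, p in zip(values, probs))

def expected_x_exp(t: float) -> float:
    """E[X exp(tX)], the claimed derivative of M at t."""
    return sum(p * x * math.exp(t * x) for x, p in zip(values, probs))

t, h = 0.3, 1e-5
central_difference = (mgf(t + h) - mgf(t - h)) / (2 * h)
print(central_difference, expected_x_exp(t))
```

For a finite distribution the interchange is trivial (a finite sum), so this only illustrates the formula; the domination argument is what justifies it in general.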
Theorem 2.2 (Change of variables) Let \(f\) be measurable from \((\Omega,\cal F,\nu)\) to \((\Lambda,\cal G)\) and \(g\) be Borel on \((\Lambda,\cal G)\). Then \[\int_{\Omega} g(f(\omega)) d\nu(\omega)= \int_{\Lambda} g(x) d(\nu \circ f^{-1})(x),\] where \(\nu \circ f^{-1}(B):= \nu(f^{-1}(B))\) for \(B \in \cal G\).
Consider the easy case \(g=\sum\limits_{i=1}^k c_iI_{A_i}\) with \(A_1,\cdots, A_k\) disjoint. The right-hand side is then equal to \(\sum\limits_{i=1}^k c_i\, \nu\circ f^{-1}(A_i)\). Letting \(B_i=f^{-1}(A_i)\), we have \(g\circ f=\sum\limits_{i=1}^k c_iI_{B_i}\), so the left-hand side is \(\sum\limits_{i=1}^k c_i\nu(B_i)\) and the equality holds.
An important application of the theorem is that for a random variable \(X\) with distribution \(P \circ X^{-1}\), we have \[ E(g(X))=\int g(X(\omega)) dP(\omega)=\int g(x) d(P\circ X^{-1})(x).\] Also, by the uniqueness theorem for measures, \(P\circ X^{-1}\) is determined by the cumulative distribution function (c.d.f.) \(F(x):=P(X\leq x)\). We also denote \(P\circ X^{-1}\) by \(P_X\), the distribution of \(X\). Below we consider the interchange of the order of integration, which is known as Fubini’s Theorem.
Theorem 2.3 (Fubini Theorem) Let \(\nu_i\) be a \(\sigma\)-finite measure on \((\Omega_i,\cal F_i)\) for \(i=1,2\), and let \(f\) be a Borel function on the product space \(\Omega_1 \times \Omega_2\) equipped with the product \(\sigma\)-field. Suppose \(f \geq 0\) (in which case the result is known as Tonelli’s theorem) or \(f\) is integrable with respect to \(\nu_1 \times \nu_2\). Then \[\int_{\Omega_1} f(\omega_1,\omega_2) d\nu_1\] exists \(\nu_2\)-a.e. and is a Borel function on \(\Omega_2\) whose integral with respect to \(\nu_2\) exists, and \[\int_{\Omega_1 \times \Omega_2} f(\omega_1,\omega_2) d(\nu_1\times \nu_2)=\int_{\Omega_2}\left(\int_{\Omega_1} f(\omega_1,\omega_2) d\nu_1\right) d\nu_2.\]
The \(\sigma\)-finiteness of the measures guarantees the uniqueness of the product measure \(\nu_1 \times \nu_2\). The condition cannot be dropped in general, as can be seen by interchanging the order of integration for the indicator function of a suitable set in the product \(\sigma\)-field (for example, the diagonal of \([0,1]^2\) with Lebesgue measure on one factor and counting measure on the other).
Next we discuss the relation between the Riemann integral and the Lebesgue integral. On a finite interval \((a,b)\), Riemann integrability implies Lebesgue integrability and the values of the two integrals coincide. That is, \[\int_{(a,b)} f d\lambda=\int_a^b f(x) dx. \] Furthermore, the result extends to domains such as \((0,\infty)\) for a non-negative function \(f\) by the monotone convergence theorem.
We end this subsection with two classical examples. The first one is used for calculating the expectation of a positive random variable.
Example 2.4 Combining Fubini’s Theorem with the above result, we can show that for \(X> 0\), \[E(X)=\int_0^{\infty}(1-F(t)) \, dt.\] With \(\lambda\) denoting Lebesgue measure, the right-hand side can be written as \(\int_{(0,\infty)} \int I_{(t, \infty)}(x) dP_X(x) \, d\lambda(t)\), since \(1-F(t)=P(X>t)\). Interchanging the order of integration by Fubini’s Theorem gives \(\int \int_{(0,\infty)} I_{(0,x)}(t) \, d\lambda(t)\, dP_X(x)=\int x \, dP_X(x)=E(X)\).
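As a quick numerical illustration of the tail formula, the sketch below assumes an exponential distribution with c.d.f. \(F(t)=1-e^{-rt}\), for which \(E(X)=1/r\). The function name, the truncation point `upper` and the step count are arbitrary choices made for this example.

```python
import math

def tail_integral(rate: float, upper: float = 40.0, steps: int = 200_000) -> float:
    """Midpoint-rule approximation of the integral of 1 - F(t) over (0, upper)
    for the exponential c.d.f. F(t) = 1 - exp(-rate * t)."""
    h = upper / steps
    return sum(math.exp(-rate * (i + 0.5) * h) for i in range(steps)) * h

rate = 2.0
approx_mean = tail_integral(rate)
print(approx_mean, 1.0 / rate)
```

The truncation at `upper` is harmless here because the tail \(1-F(t)\) decays exponentially; for heavy-tailed distributions a larger range would be needed.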
Example 2.5 Let \(f \geq 0\) be a function on a countable set \(\Omega=\{\omega_1,\omega_2,\cdots\}\). By the monotone convergence theorem and the fact that \(f=\lim\limits_{n\to \infty} \sum\limits_{i=1}^n f(\omega_i)I_{\{\omega_i\}}\), we have \(\int f d\nu= \sum_i f(\omega_i)\nu(\{\omega_i\})\).
2.2.2 Radon-Nikodym Derivative
Definition 2.8 (absolutely continuous) Given two measures \(\mu,\nu\) on \((\Omega, \cal F)\), we say \(\mu\) is absolutely continuous with respect to \(\nu\), denoted by \(\mu \ll \nu\), if \(\nu(A)=0\) implies \(\mu(A)=0\) for any \(A \in \cal F\).
Theorem 2.4 (Radon-Nikodym Theorem) Let \(\mu\) and \(\nu\) be two measures on \((\Omega, \cal F)\) with \(\nu\) \(\sigma\)-finite. If \(\mu \ll\nu\), then there exists a nonnegative Borel function \(f\) on \(\Omega\), unique \(\nu\)-a.e., such that \[\mu(A)=\int_A f d\nu \quad \mbox{for all } A \in \cal F.\]
Such a function \(f\) is called a Radon-Nikodym derivative or density of \(\mu\) with respect to \(\nu\) and is denoted by \(\frac{d\mu}{d\nu}\). If \(f\geq 0\) \(\nu\)-a.e. and \(\int f d\nu=1\), then \(\mu\) is a probability measure and \(f\) is called its probability density function (p.d.f.) w.r.t. \(\nu\). A discrete p.d.f. is a p.d.f. w.r.t. counting measure, and a Lebesgue p.d.f. is a p.d.f. w.r.t. Lebesgue measure. A necessary and sufficient condition for a c.d.f. \(F\) to have a Lebesgue p.d.f. is that \(F\) is absolutely continuous. Below we consider a special case in which neither a Lebesgue p.d.f. nor a discrete p.d.f. exists.
Example 2.6 Suppose that \(Z\) is a standard normal r.v. and \(X=ZI_{[1,\infty)}(Z)\). Clearly \(X\) has no Lebesgue density since \(P_X(\{0\})\neq 0\). Let \(\delta_0\) be the probability measure on \((\mathbb{R},\cal B)\) such that \(\delta_0(A)=I_A(0)\). First we claim that \(P_X \ll \delta_0+\lambda\). Indeed, for any set \(A\in \cal B\), \((\delta_0+\lambda)(A)=0\) implies \(0 \notin A\) and \(\lambda(A)=0\). Then \(P_X(A)=P(Z\geq 1,Z \in A)+P(Z<1)I_A(0)\stackrel{0\notin A}=P(Z\geq 1,Z \in A)\stackrel{\lambda(A)=0}=0,\) which proves the claim. Secondly, we would like to find the Radon-Nikodym derivative \(\frac{dP_X}{d(\delta_0+\lambda)}\). Since \(P(X=0)=P(Z<1)=\Phi(1)\), the density with respect to \(\delta_0+\lambda\) can be written as \(\Phi(1)I_{\{0\}}(x)+\frac{e^{-\frac{x^2}{2}}}{\sqrt{2\pi}}I_{[1,\infty)}(x)\), where \(\Phi(x)\) is the c.d.f. of the standard normal distribution.
Remark. Firstly, note that \[\int_A f\, d\delta_0=\int f(0)I_{A}(x)\, d\delta_0(x)=f(0)I_A(0).\] The first equality holds since \(f(x)I_{A}(x)=f(0)I_{A}(x)\) \(\delta_0\)-a.e.
Secondly, consider an “overlapping case” such as \(X=ZI_{[1,\infty)}(Z)+2I_{(0,1)}(Z)\). With respect to \(\delta_0+\delta_2+\lambda\), the Radon-Nikodym derivative of \(P_X\) is \(\phi(x)I_{[1,\infty)\setminus \{2\}}(x)+P(X=2)I_{\{2\}}(x)+P(X=0)I_{\{0\}}(x)\), where \(\phi\) is the standard normal Lebesgue p.d.f. It is important that the point \(2\) is removed from the set in the first term, because the dominating measure has an atom there.
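The density of \(X=ZI_{[1,\infty)}(Z)\) with respect to \(\delta_0+\lambda\) can be checked numerically. In the sketch below (function names are hypothetical), the atom at \(0\) carries mass \(P(X=0)=P(Z<1)=\Phi(1)\), the Lebesgue part is the normal density on \([1,\infty)\), and integration against \(\delta_0+\lambda\) is split into a point evaluation at \(0\) plus a midpoint-rule Lebesgue integral.

```python
import math

def Phi(x: float) -> float:
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x: float) -> float:
    """Standard normal Lebesgue density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def integrate_density(lo: float, hi: float, steps: int = 20_000) -> float:
    """Integral over [lo, hi] of Phi(1) I_{0} + phi I_[1, inf) w.r.t. delta_0 + lambda."""
    # delta_0 part: the density evaluated at 0, if 0 lies in [lo, hi].
    atom = Phi(1.0) if lo <= 0.0 <= hi else 0.0
    # Lebesgue part: the density vanishes below 1 except at the null set {0}.
    a = max(lo, 1.0)
    if hi <= a:
        return atom
    h = (hi - a) / steps
    lebesgue = sum(phi(a + (i + 0.5) * h) for i in range(steps)) * h
    return atom + lebesgue

# P(X in [-1, 2]) = P(Z < 1) + P(1 <= Z <= 2) = Phi(2).
mass = integrate_density(-1.0, 2.0)
total = integrate_density(-5.0, 12.0)
print(mass, Phi(2.0), total)
```

The second printed pair agrees up to quadrature error, and the total mass is numerically \(1\), as it must be for a density of a probability measure.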
Below we list some properties of the Radon-Nikodym derivative.
Proposition 2.5 Let \(\nu\) be a \(\sigma\)-finite measure on a measurable space \((\Omega,\cal F)\). Then
- If \(\lambda\) is a measure, \(\lambda \ll \nu\), and \(f\geq 0\), then \[\int f d\lambda=\int f\frac{d\lambda}{d\nu} d\nu.\]
- If \(\lambda_i\ll \nu\) for \(i=1,2\), then \(\lambda_1+\lambda_2\ll \nu\) and \[\frac{d(\lambda_1+\lambda_2)}{d\nu}=\frac{d\lambda_1}{d\nu}+\frac{d\lambda_2}{d\nu} \quad \nu\mbox{-a.e.}\]
- If \(\tau\) is a measure, \(\lambda\) is a \(\sigma\)-finite measure, and \(\tau\ll\lambda\ll \nu\), then \[\frac{d\tau}{d\nu}=\frac{d\tau}{d\lambda}\frac{d\lambda}{d\nu}\quad \nu\mbox{-a.e.}\] In particular, if \(\lambda\ll \nu\) and \(\nu \ll \lambda\) (equivalent), then \(\frac{d\lambda}{d\nu}=\left(\frac{d\nu}{d\lambda}\right)^{-1}\) \(\nu\)-a.e.
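The first result can be verified quickly using the approximation property and the Monotone Convergence Theorem. The third result (the chain rule) follows directly from the first: for any \(A \in \cal F\), \(\tau(A)=\int_A \frac{d\tau}{d\lambda} d\lambda=\int_A \frac{d\tau}{d\lambda}\frac{d\lambda}{d\nu} d\nu\).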
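On a finite set every measure is given by its point masses, so the first and third results can be checked exactly. The sketch below uses hypothetical measures `nu`, `lam` and `tau` on \(\{0,1,2\}\) with \(\tau\ll\lambda\ll\nu\), where the Radon-Nikodym derivative is simply the ratio of point masses.

```python
from fractions import Fraction

# Hypothetical measures on the finite set {0, 1, 2}, given by point masses.
nu = {0: Fraction(1), 1: Fraction(2), 2: Fraction(4)}
lam = {0: Fraction(3), 1: Fraction(1), 2: Fraction(2)}
tau = {0: Fraction(6), 1: Fraction(0), 2: Fraction(1)}

def rn_derivative(mu, base):
    """d(mu)/d(base) for point-mass measures with base({w}) > 0 for all w."""
    return {w: mu[w] / base[w] for w in base}

def integrate(f, measure):
    """Integral of f (a dict w -> f(w)) with respect to a point-mass measure."""
    return sum(f[w] * measure[w] for w in measure)

dtau_dnu = rn_derivative(tau, nu)
dtau_dlam = rn_derivative(tau, lam)
dlam_dnu = rn_derivative(lam, nu)

# Chain rule: d(tau)/d(nu) = d(tau)/d(lam) * d(lam)/d(nu).
chain = {w: dtau_dlam[w] * dlam_dnu[w] for w in nu}

# First result: int f d(lam) = int f * d(lam)/d(nu) d(nu).
f = {0: Fraction(5), 1: Fraction(7), 2: Fraction(11)}
lhs = integrate(f, lam)
rhs = integrate({w: f[w] * dlam_dnu[w] for w in nu}, nu)
print(chain == dtau_dnu, lhs, rhs)
```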
Below we consider the density of a transformation of a random variable. A general result is given in Proposition 1.8 of the textbook.
Example 2.7 Suppose that \(X\) is a random variable with Lebesgue p.d.f. \(f_X\) and \(f_X(x)=0\) for \(x\leq 0\). Let \(Y=X^2\) and \[g(y)=(2\sqrt{y})^{-1}f_X(\sqrt{y})I_{(0,\infty)}(y).\] Then \(g\) is a Lebesgue p.d.f. of \(Y\).
To verify this, we need \(P(Y\in A)=\int_A g(y) d\lambda(y)\) for every \(A \in \cal B(\mathbb{R})\). It suffices to consider intervals of the form \((-\infty,b]\), which form a \(\pi\)-system generating \(\cal B(\mathbb{R})\); for \(b \leq 0\) both sides are zero. Let \(\lambda^{+}(A):=\int_A I_{(0,\infty)}(y) d\lambda(y)\), so that \(\frac{d\lambda^{+}}{d\lambda}(y)=I_{(0,\infty)}(y)\), and let \(h(y)=\sqrt{y}I_{(0,\infty)}(y)\). Then for \(b>0\), \[\begin{split} \int_{(-\infty,b]} g(y) d\lambda(y)&=\int_{(-\infty,b]} (2\sqrt{y})^{-1}f_X(\sqrt{y})I_{(0,\infty)}(y) d\lambda(y) \\ &= \int_{(-\infty,b]} (2\sqrt{y})^{-1}f_X(\sqrt{y}) \frac{d\lambda^{+}}{d\lambda}(y) d\lambda(y) \\ &=\int I_{(0,b]}(y) (2\sqrt{y})^{-1}f_X(\sqrt{y}) d\lambda^{+}(y)\\ &=\int I_{(0,\sqrt{b}]}(z) (2z)^{-1}f_X(z) d(\lambda^{+}\circ h^{-1})(z). \end{split}\] The third equality is the first result of Proposition 2.5, and the last one is the change of variables formula (Theorem 2.2) with \(z=h(y)\).
Note that for \(c>0\), \[\lambda^{+}\circ h^{-1}((-\infty,c))=\lambda^{+}((-\infty,c^2))=c^2=\int_{(-\infty,c)}2xI_{(0,\infty)}(x) d\lambda(x),\] and both sides vanish for \(c \leq 0\). Thus \[\frac{d(\lambda^{+}\circ h^{-1})}{d\lambda}(x)=2xI_{(0,\infty)}(x).\] Therefore, by applying the first result of Proposition 2.5 again, we obtain \[\int_{(-\infty,b]} g(y) d\lambda(y)=\int_{(0,\sqrt{b}]}f_X(z) d\lambda(z)=P_X((0,\sqrt{b}])=P(Y\in (0,b])=P(Y \leq b),\] where the last equality holds because \(Y>0\) a.s.
For the case where \(f_X\) does not vanish on \((-\infty,0)\), the Radon-Nikodym derivative (Lebesgue p.d.f.) of \(P_Y\) is \[ f_Y(y)=\left(\frac{f_X(\sqrt y)}{2\sqrt y}+\frac{f_X(-\sqrt y)}{2\sqrt y}\right)I_{(0,\infty)}(y).\]
In summary, the proof is mainly based on (i) the change of measure and (ii) the change of variables formula for integrals.
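Both density formulas for \(Y=X^2\) can be sanity-checked by differentiating the c.d.f. of \(Y\) numerically. The sketch below makes two illustrative assumptions that are not part of the example itself: \(X\sim\) Exp(1) for the one-sided case, where \(P(Y\leq b)=1-e^{-\sqrt b}\), and \(X\) standard normal for the two-sided case, where \(P(Y\leq b)=\mathrm{erf}(\sqrt{b/2})\). All function names are hypothetical.

```python
import math

def g(y: float) -> float:
    """Claimed p.d.f. of Y = X^2 when f_X(x) = exp(-x) on (0, inf)."""
    return math.exp(-math.sqrt(y)) / (2.0 * math.sqrt(y)) if y > 0 else 0.0

def cdf_y(b: float) -> float:
    """P(Y <= b) = P(X <= sqrt(b)) = 1 - exp(-sqrt(b)) for b > 0."""
    return 1.0 - math.exp(-math.sqrt(b))

def f_y_two_sided(y: float) -> float:
    """(f_X(sqrt y) + f_X(-sqrt y)) / (2 sqrt y) for X standard normal."""
    if y <= 0:
        return 0.0
    dens = math.exp(-0.5 * y) / math.sqrt(2.0 * math.pi)  # f_X(sqrt y) = f_X(-sqrt y)
    return (dens + dens) / (2.0 * math.sqrt(y))

def cdf_y_two_sided(b: float) -> float:
    """P(X^2 <= b) = P(|X| <= sqrt(b)) = erf(sqrt(b / 2)) for b > 0."""
    return math.erf(math.sqrt(b / 2.0))

def central_difference(cdf, b: float, h: float = 1e-6) -> float:
    """Numerical derivative of a c.d.f. at b."""
    return (cdf(b + h) - cdf(b - h)) / (2.0 * h)

points = [0.5, 2.0, 9.0]
max_error = max(
    max(abs(central_difference(cdf_y, b) - g(b)),
        abs(central_difference(cdf_y_two_sided, b) - f_y_two_sided(b)))
    for b in points
)
print(max_error)
```

The printed error is at the level of finite-difference noise, which is consistent with \(g\) and \(f_Y\) being the derivatives of the respective c.d.f.s away from \(0\).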
Remark. Can the result above be generalized to a general measure other than Lebesgue measure?