2.4 Asymptotic Theory
2.4.1 Convergence modes
In this section we first recall four modes of convergence, keeping the notation from undergraduate mathematical statistics.
Definition 2.12 Let \(X,X_1,X_2,\cdots\) be random \(k\)-vectors defined on a common probability space.
- \(X_n\stackrel{a.s.}\to X\) if and only if \(X_n\to X\) \(P\)-a.e.
- \(X_n\stackrel{p}\to X\) if and only if for every \(\epsilon>0\), \[P(||X_n-X||>\epsilon)\to 0.\]
- \(X_n\stackrel{L^r}\to X\) for \(r> 0\) if and only if \[E(||X_n-X||^r)\to 0.\]
- \(X_n\stackrel{d}\to X\) if and only if \(F_n(x)\to F(x)\) at every continuity point \(x\) of \(F\), where \(F\) is the c.d.f. of \(X\) and \(F_n\) is the c.d.f. of \(X_n\).
By Markov's inequality, \(X_n\stackrel{L^r}\to X\) implies \(X_n\stackrel{p}\to X\). Clearly, \(X_n\stackrel{a.s.}\to X\) also implies \(X_n\stackrel{p}\to X\). Furthermore, \(X_n\stackrel{p}\to X\) implies \(X_n\stackrel{d}\to X\), and the converse holds if \(X\) is a constant.
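As a small numerical illustration (an assumed simulation sketch, not part of the original notes): convergence in distribution alone does not give convergence in probability. If \(X\sim N(0,1)\) and \(X_n=-X\) for every \(n\), then \(X_n\stackrel{d}\to X\) trivially, but \(P(|X_n-X|>\epsilon)=P(2|X|>\epsilon)\) does not vanish.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(100_000)   # draws of X ~ N(0, 1)
Xn = -X                            # X_n = -X has the same distribution as X

# Same distribution, so X_n -> X in distribution holds trivially:
print(np.quantile(X, [0.25, 0.5, 0.75]))
print(np.quantile(Xn, [0.25, 0.5, 0.75]))

# ... but P(|X_n - X| > eps) = P(2|X| > eps) does not shrink with n:
eps = 0.5
print(np.mean(np.abs(Xn - X) > eps))   # about 0.80 for every n
```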
We also recall that "infinitely often" (i.o.) refers to the limit supremum of a sequence of sets: the events \(A_n\) happen i.o. if and only if \(\cap_{k=1}^{\infty} \cup_{n=k}^{\infty} A_n\) happens. Consider \(A_n=\{||X_n-X||>\epsilon\}\) for a given \(\epsilon>0\).
Lemma 2.2 \(X_n\stackrel{a.s.}\to X\) if and only if for every \(\epsilon>0\), \[P(\mbox{limsup}_n\{||X_n-X||>\epsilon\})=0.\]
Proof. Clearly, for \(\omega \in \mbox{limsup}_n\{||X_n-X||>\epsilon\}\), \(X_n(\omega) \nrightarrow X(\omega)\). Thus \(X_n\stackrel{a.s.}\to X\) implies \(P(\mbox{limsup}_n\{||X_n-X||>\epsilon\})=0\). Conversely, consider the sets \[A_j:=\cup_{n=1}^\infty\cap_{m=n}^\infty\{||X_m-X||\leq j^{-1}\}.\] Then the result follows from \[\cap_{j=1}^\infty A_j=\{\omega: \lim_{n\to\infty}X_n(\omega)=X(\omega)\},\] since the assumption gives \(P(A_j^c)=P(\mbox{limsup}_n\{||X_n-X||>j^{-1}\})=0\), hence \(P(A_j)=1\) for every \(j\).
Theorem 2.8 (Borel-Cantelli Lemma) Let \(A_1,A_2,\cdots\) be events in a probability space, with \(\mbox{limsup}_n A_n\) as defined above.
- If \(\sum_n P(A_n)<\infty\), then \(P(\mbox{limsup}_n A_n)=0\).
- If \(A_1,A_2,\cdots\) are pairwise independent and \(\sum_n P(A_n)=\infty\), then \(P(\mbox{limsup}_n A_n)=1\).
The first Borel-Cantelli lemma is commonly used to prove almost sure convergence; the small simulation sketch below contrasts the two regimes.
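Here is a minimal Monte Carlo sketch (the event probabilities \(1/n^2\) and \(1/n\) are assumed choices, not from the notes): with summable probabilities the events stop occurring along almost every path, while with non-summable probabilities and independence they keep occurring.

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, N = 2000, 5000
n = np.arange(1, N + 1)

def frac_paths_with_event_after(probs, start):
    """Fraction of simulated paths in which some A_n with n >= start occurs."""
    occurs = rng.random((n_paths, N)) < probs   # independent A_n, one row per path
    return occurs[:, start - 1:].any(axis=1).mean()

# Summable case: sum 1/n^2 < inf, so a.s. only finitely many A_n occur.
print(frac_paths_with_event_after(1.0 / n**2, start=N // 2))  # close to 0

# Non-summable case: sum 1/n = inf; on this finite window the fraction is
# only about 0.5, but over the infinite tail it is 1 by the second B-C lemma.
print(frac_paths_with_event_after(1.0 / n, start=N // 2))
```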
We next consider the behavior of continuous functions applied to a convergent random sequence. Clearly, if \(X_n\stackrel{a.s.}\to X\), then \(f(X_n)\stackrel{a.s.}\to f(X)\) for any continuous function \(f\). To handle, for example, convergence in distribution, we introduce the following theorem.
Theorem 2.9 (Skorohod's Theorem) If \(X_n\stackrel{d}\to X\) (in particular, \(X_n\) and \(X\) need not be defined on the same space), then there are random vectors \(Y,Y_1,Y_2,\cdots\) defined on a common probability space such that \(P_Y=P_X\), \(P_{Y_n}=P_{X_n}\) for all \(n\), and \(Y_n\stackrel{a.s.}\to Y\).
With Skorohod's theorem, it is easy to show that if \(X_n\stackrel{d}\to X\), then \(f(X_n)\stackrel{d}\to f(X)\) for continuous \(f\), by working with the almost surely convergent copies \(Y,Y_n\) on the common probability space. Results of this kind are called "continuous mapping theorems" and hold for convergence almost surely, in distribution, and in probability.
Exercise 2.3 Prove the continuous mapping theorem for the version of convergence in probability.
In the following, we define big-O and small-o notation in the almost sure and in-probability senses.
Definition 2.13 Let \(X_1,X_2,\cdots\) and \(Y_1,Y_2,\cdots\) be random variables defined on a common probability space.
- \(X_n=O(Y_n)\) a.s. iff \(P(||X_n||=O(|Y_n|))=1\)
- \(X_n=o(Y_n)\) a.s. iff \(\frac{X_n}{Y_n}\to 0\) a.s.
- \(X_n=O_p(Y_n)\) iff for any \(\epsilon >0\), there is a constant \(C_{\epsilon}>0\) such that \(\mbox{sup}_n P(||X_n||\geq C_{\epsilon}|Y_n|)< \epsilon\).
- \(X_n=o_p(Y_n)\) iff \(\frac{X_n}{Y_n}\to 0\) in probability.
Clearly, \(X_n=o_p(1)\) implies \(X_n=O_p(1)\). Some further intuitive properties are listed below and left as exercises.
Exercise 2.4 Here we list some properties of big-O and small-o in probability; we abuse notation slightly for clarity (a numerical illustration of the first item follows the list):
- If \(X_n\stackrel{d}\to X\), then \(X_n=O_p(1)\).
- \(o_p(1)+o_p(1)=o_p(1)\).
- \(O_p(1)\times o_p(1)=o_p(1)\).
- The statements in 2. and 3. still hold with \(o_p(1)\) replaced by \(O_p(1)\).
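As a hedged numerical sketch of the first item (the Exp(1) data and the threshold \(C=3\) are assumed choices, not from the notes): standardized sample means converge in distribution to \(N(0,1)\), and their tail probabilities beyond a large threshold are uniformly small over \(n\), which is exactly the content of \(X_n=O_p(1)\).

```python
import numpy as np

rng = np.random.default_rng(2)

def tail_prob(n, C, reps=20_000):
    """Monte Carlo estimate of P(|X_n| >= C) for X_n = sqrt(n)*(mean of n Exp(1) draws - 1)."""
    samples = rng.exponential(1.0, size=(reps, n))
    Xn = np.sqrt(n) * (samples.mean(axis=1) - 1.0)   # X_n -> N(0, 1) in distribution
    return np.mean(np.abs(Xn) >= C)

C = 3.0
print([tail_prob(n, C) for n in (10, 100, 1000)])   # all small, roughly 0.01 or less
# sup_n P(|X_n| >= C) can be made < eps by enlarging C, i.e. X_n = O_p(1).
```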
Theorem 2.10 Let \(X,X_1,X_2,\cdots\) be random vectors. Then the following conditions are equivalent to weak convergence (convergence in distribution):
- (Levy-Cramer continuity theorem) For all \(t\in \mathbb{R}^k\), \[\lim_{n\to \infty}E[e^{it^TX_n}]=E[e^{it^TX}].\]
- \(E[h(X_n)]\to E[h(X)]\) for every bounded continuous function \(h\).
- (Cramer-Wold device) For every real vector \(c\), \(c^TX_n \stackrel{d}\to c^TX\).
Theorem 2.11 (Slutsky) If \(X_n \stackrel{d}\to X\) and \(Y_n\stackrel{p}\to c\), then
- \(X_n+Y_n\stackrel{d}\to X+c\).
- \(X_nY_n\stackrel{d}\to cX\).
For the first part of Slutsky's theorem, note that \[P(X_n+Y_n\leq t)=P(X_n+c+(Y_n-c)\leq t, |Y_n-c|\leq \epsilon)+P(X_n+c+(Y_n-c)\leq t, |Y_n-c|> \epsilon).\] Suppose \(F_X\), the c.d.f. of \(X\), is continuous at \(t-c+\epsilon\) and \(t-c-\epsilon\). Since \(P(|Y_n-c|>\epsilon)\to 0\), \[F_X(t-c-\epsilon)\leq \liminf_n P(X_n+Y_n\leq t) \leq \limsup_n P(X_n+Y_n\leq t)\leq F_X(t-c+\epsilon),\] and the convergence follows by letting \(\epsilon \to 0\) through continuity points (recall that \(F_X\) has at most countably many discontinuity points). An intermediate result of Slutsky's theorem (along with the continuous mapping theorem and the Cramer-Wold device) is that if \(X_n \stackrel{d}\to X\) and \(Y_n\stackrel{p}\to c\), then \((X_n,Y_n) \stackrel{d}\to (X,c).\)
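A standard application of Slutsky's theorem (combined with the CLT and the continuous mapping theorem) is the asymptotic normality of the t-statistic. The sketch below is an assumed Monte Carlo illustration with Exp(1) data, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 1000, 10_000

samples = rng.exponential(1.0, size=(reps, n))   # assumed i.i.d. Exp(1): mu = sigma = 1
xbar = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)                  # S_n ->p sigma
t_stat = np.sqrt(n) * (xbar - 1.0) / s           # = [sqrt(n)(Xbar - mu)/sigma] * (sigma / S_n)

# CLT: the bracketed factor ->d N(0,1); Slutsky: multiplying by sigma/S_n ->p 1
# leaves the limit unchanged, so t_stat is approximately standard normal.
print(np.quantile(t_stat, [0.05, 0.5, 0.95]))    # roughly [-1.645, 0, 1.645]
```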
Theorem 2.12 (Delta method) Let \(a_n>0\), \(a_n\to \infty\), and \(a_n(X_n-\mu)\stackrel{d}\to Z\) for some constant \(\mu\) and random variable \(Z\). If \(g\) is differentiable at \(\mu\), then \[a_n(g(X_n)-g(\mu))\stackrel{d}\to g'(\mu)Z.\]
Proof. By Skorohod's theorem, there exist random variables \(U_n\stackrel{d}{=}a_n(X_n-\mu)\) and \(U\stackrel{d}{=}Z\) on a common probability space with \(U_n\stackrel{a.s.}\to U\). Set \(Y_n=\frac{U_n}{a_n}+\mu\), so that \(Y_n\stackrel{d}{=}X_n\) and \(U_n=a_n(Y_n-\mu)\). Define \(\epsilon(x)=\frac{g(x)-g(\mu)-g'(\mu)(x-\mu)}{x-\mu}\) for \(x\neq\mu\) and \(\epsilon(\mu)=0\); differentiability of \(g\) at \(\mu\) gives \(\lim_{x\to\mu}\epsilon(x)=0\). Then \[a_n(g(Y_n)-g(\mu))=a_n\big(g'(\mu)(Y_n-\mu)+\epsilon(Y_n)(Y_n-\mu)\big)=g'(\mu)U_n+\epsilon(Y_n)U_n.\] Since \(a_n\to\infty\), \(Y_n\stackrel{a.s.}\to\mu\), so the right-hand side converges almost surely to \(g'(\mu)U\), and the result follows.
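A minimal Monte Carlo sketch of the delta method under assumed choices (i.i.d. Exp(1) data, so \(\mu=\sigma^2=1\), and \(g(x)=x^2\)): \(\sqrt{n}(\bar X_n^2-1)\) should be approximately \(N(0,(g'(1))^2)=N(0,4)\).

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 1000, 10_000

samples = rng.exponential(1.0, size=(reps, n))      # assumed i.i.d. Exp(1): mu = 1, sigma^2 = 1
xbar = samples.mean(axis=1)
delta = np.sqrt(n) * (xbar**2 - 1.0**2)             # sqrt(n)(g(Xbar_n) - g(mu)) with g(x) = x^2

# Delta method: the limit is g'(mu) * Z = 2 * N(0, 1) = N(0, 4).
print(delta.mean(), delta.var())                    # approximately 0 and 4
```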
Remark. If we replace \(\mu\) with a random variable \(X\), does the result of Delta method still hold? In other words, if \(a_n(X_n-X)\stackrel{d}\to Z\), is \(a_n(g(X_n)-g(X))\stackrel{d}\to g'(X)Z\)?
The answer is no. For example, take \(X_n=X+n^{-1/2}(-1)^nZ\) with \(Z\sim-Z\) (i.e., \(Z\) symmetric). Then \[\sqrt{n}(X_n-X)=(-1)^nZ\stackrel{d}\to Z,\] but \[\sqrt{n}(X_n^2-X^2)= 2(-1)^nXZ+n^{-1/2}Z^2,\] which does not converge in distribution if, say, \(X=Z\), since \(2(-1)^nZ^2\) alternates between two different distributions.
2.4.2 Law of Large Numbers
The law of large numbers, in its weak (WLLN) and strong (SLLN) forms, concerns the limiting behavior of sums of independent (not necessarily identically distributed) random variables. We will prove the i.i.d. versions and leave the independent (but not identically distributed) versions to the reader.
Theorem 2.13 (WLLN) Let \(X_1,X_2,\cdots\) be i.i.d. random variables. Then \[\frac{1}{n}\sum_{i=1}^n X_i-a_n\stackrel{p}\to 0\] if and only if \[nP(|X_1|>n)\to 0,\] where \(a_n=E(X_1I_{\{|X_1|\leq n\}})\).
Proof. We only prove the sufficiency. Truncate the \(X_i\)'s: let \(Y_{n,j}=X_jI_{\{|X_j|\leq n\}}\), \(T_n=\sum_{i=1}^nX_i\), and \(Z_n=\sum_{i=1}^nY_{n,i}\). Note that \(a_n=\frac{EZ_n}{n}\). Then \[P(|\tfrac{T_n-EZ_n}{n}|>\epsilon)\leq P(|\tfrac{Z_n-EZ_n}{n}|>\epsilon)+P(T_n\neq Z_n).\] The second term tends to \(0\) since \[P(T_n\neq Z_n)\leq \sum_{i=1}^n P(Y_{n,i}\neq X_i)=nP(|X_1|>n)\to 0.\] By Chebyshev's inequality and the i.i.d. assumption, \[P(|\tfrac{Z_n-EZ_n}{n}|>\epsilon)\leq \frac{\mbox{Var}(Z_n)}{\epsilon^2n^2}\leq\frac{\mbox{E}(Y_{n,1}^2)}{\epsilon^2n}.\] Then by the identity \(E(Y^p)=\int_0^\infty py^{p-1}P(Y>y)\,dy\) for a random variable \(Y\geq 0\), \[\frac{\mbox{E}(Y_{n,1}^2)}{n} = \frac{1}{n}\int_{0}^\infty 2yP(|Y_{n,1}|>y)\,dy\leq \frac{1}{n}\int_{0}^n 2yP(|X_1|>y)\,dy.\] For the last term, since \(g(y):=2yP(|X_1|>y)\to 0\) as \(y\to\infty\) (by the assumption \(nP(|X_1|>n)\to 0\)), we have \(M:=\sup_y g(y)<\infty\) and \(\epsilon_K:=\sup\{g(y):y>K\}\to 0\) as \(K\to\infty\). Then \[\frac{1}{n}\int_{0}^n 2yP(|X_1|>y)\,dy\leq \frac{KM}{n}+\frac{(n-K)\epsilon_K}{n},\] so \[\limsup_{n\to \infty}\frac{1}{n}\int_{0}^n 2yP(|X_1|>y)\,dy\leq \epsilon_K.\] The result follows since \(K\) is arbitrarily chosen and \(\epsilon_K\to 0\) as \(K\to \infty\).
Theorem 2.14 (SLLN) Let \(X_1,X_2,\cdots\) be i.i.d. random variables. Then \[\frac{1}{n}\sum_{i=1}^n X_i \stackrel{a.s.}\to EX_1\] if and only if \(E|X_1|<\infty\).
Before proving the theorem, we first present two related lemmas that will be used in the proof. The first is Kronecker's lemma.
Lemma 2.3 (Kronecker's lemma) Let \(x_n\in \mathbb{R}\), \(a_n\in \mathbb{R}\), \(0<a_n\leq a_{n+1}\) for \(n \in \mathbb{N}\) and \(a_n \to \infty\). If \(\sum_{n=1}^\infty x_n/a_n\) converges, then \(\frac{1}{a_n}\sum_{i=1}^n x_i \to 0\).
The second lemma is a fairly general inequality due to Hajek and Renyi. We give the proof of its special case, known as Kolmogorov's inequality, and connect this result with Doob's martingale inequality as a supplement to the previous chapter.
Lemma 2.4 (Hajek-Renyi) Let \(Y_1,\cdots, Y_n\) be independent random variables with finite variances. Then \[P(\max_{1\leq k\leq n}c_k|\sum_{i=1}^k (Y_i-(EY_i))|>t )\leq \frac{1}{t^2}\sum_{i=1}^n c_i^2\mbox{Var}(Y_i),\] for any \(t>0\) and \(c_1\geq c_2 \geq \cdots\geq c_n >0\). If \(c_i=1\) for all \(i\), the inequality reduces to “Kolmogorov inequality”.
Proof (special case (Kolmogorov inequality)).
Let \(S_n=\sum_{i=1}^n(Y_i-EY_i)\) for \(n\in \mathbb{N}\). By Example 2.14, \(S_1,S_2,\cdots, S_n\) forms a martingale. Let \(Z_0=0\), and let \(Z_{i+1}=S_{i+1}\) if \(\max_{j\leq i}|S_j|< t\), and \(Z_{i+1}=Z_i\) otherwise. Then \(\{Z_i\}\) is a martingale. To see this, note that \(Z_{i}=S_i\) for all \(i\leq n\) if \(\max_{j\leq n}|S_j|< t\); otherwise there is an integer \(K\geq 0\) such that \(\max_{j\leq K}|S_j|< t\) and \(|S_{K+1}|\geq t\), which implies \(Z_i=S_i\) for all \(i\leq K+1\) and \(Z_{i+1}=Z_{i}=S_{K+1}\) for all \(i\geq K+1\). In both cases \(\{Z_i\}\) is a martingale. Furthermore, \[\begin{split} P(\max_{1\leq i\leq n} |S_i|\geq t) &=P(|Z_n|\geq t)\\ &\leq \frac{1}{t^2}E(Z_n^2)\\ &=\frac{1}{t^2}\sum_{i=1}^n E[(Z_i-Z_{i-1})^2]\\ &\leq \frac{1}{t^2}\sum_{i=1}^n E[(S_i-S_{i-1})^2]=\frac{1}{t^2}E(S_n^2)=\frac{1}{t^2}\mbox{Var}(S_n). \end{split}\]The first equality follows from the argument above. The first inequality is Chebyshev's inequality applied to \(Z_n\). The remaining steps rest on the pointwise bound \((Z_i-Z_{i-1})^2\leq (S_i-S_{i-1})^2\) and the fact that for any martingale \(\{M_n\}\) with \(M_0=0\), \[\sum_{i=1}^n E[(M_i-M_{i-1})^2]=E(M_n^2),\] which holds since in particular \[E(M_{i-1}(M_i-M_{i-1}))=0\] for all \(i\geq 1\).
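A quick Monte Carlo check of Kolmogorov's inequality under an assumed setup (centered Uniform\((-1,1)\) increments, \(n=100\), \(t=12\)), not part of the notes: the empirical probability that the maximal partial sum exceeds \(t\) stays below the bound \(\mbox{Var}(S_n)/t^2\).

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps, t = 100, 50_000, 12.0

increments = rng.uniform(-1.0, 1.0, size=(reps, n))   # mean 0, Var = 1/3 per step
S = np.cumsum(increments, axis=1)                     # partial sums S_1, ..., S_n

lhs = np.mean(np.abs(S).max(axis=1) >= t)             # estimate of P(max_k |S_k| >= t)
rhs = (n / 3.0) / t**2                                # Var(S_n) / t^2, the Kolmogorov bound
print(lhs, "<=", rhs)                                 # empirical value well below ~0.23
```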
Another extension of Kolmogorov's inequality, related to the martingale argument above, is Doob's martingale inequality. Further discussion and its proof can be found in Section 35 of Billingsley (2008).
Proposition 2.8 (Doob's martingale inequality) If \(X_1,\cdots,X_n\) is a submartingale, then for \(\alpha>0\), \[P(\max_{1\leq i\leq n}X_i\geq \alpha)\leq \frac{1}{\alpha}E(|X_n|).\]
If \(S_i\) is the partial sum of independent random variables with mean \(0\) and finite variances, then \(X_i=S_i^2\) forms a submartingale by Theorem 2.5, and Doob's inequality applied to \(\{X_i\}\) (with \(\alpha=t^2\)) is exactly Kolmogorov's inequality.
Proof (SLLN).
We only show the sufficiency here. Let \(Y_n=X_nI_{\{|X_n|\leq n\}}\), \(n=1,2,\cdots\). Consider the decomposition \[\begin{split} \frac{1}{n}\sum_{i=1}^n X_i-EX_1&=\Big(\frac{1}{n}\sum_{i=1}^n X_i-\frac{1}{n}\sum_{i=1}^n Y_i\Big)\\ &+\frac{1}{n}\sum_{i=1}^n (Y_i-EY_i)+\frac{1}{n}\sum_{i=1}^n (EY_i-EX_1). \end{split}\]Since \(EY_n\to EX_1\) by the LDCT, the third term \(\frac{1}{n}\sum_{i=1}^n (EY_i-EX_1) \to 0\) (split the sum into finitely many terms and a tail). For the first term, by the integral test and \(E|X_1|=\int_0^\infty P(|X_1|>x)\,dx <\infty\), we have \[\sum_{n=1}^\infty P(X_n\neq Y_n)=\sum_{n=1}^\infty P(|X_n|>n)=\sum_{n=1}^\infty P(|X_1|>n)<\infty.\] Then by the first Borel-Cantelli lemma, \(P(\{X_n\neq Y_n\, ,i.o.\})=0\), so with probability \(1\) we have \(X_n=Y_n\) for all sufficiently large \(n\). Thus \[\frac{1}{n}\sum_{i=1}^n X_i-\frac{1}{n}\sum_{i=1}^n Y_i \to 0 \quad a.s.\] (again by splitting the sum into a finite part and a tail). Therefore, it remains to show that \(\frac{1}{n}\sum_{i=1}^n (Y_i-EY_i) \to 0\) a.s. To this end, fix \(m\) and apply the Hajek-Renyi inequality with \(Z_1=\cdots=Z_{m-1}=0\), \(Z_m=Y_1+\cdots+Y_m\), \(Z_i=Y_i\) for \(i\geq m+1\), and \(c_i=\frac{1}{i}\) for \(i\geq m\) (and, say, \(c_i=\frac{1}{m}\) for \(i<m\)). This yields \[P(\max_{m\leq l\leq n}|\xi_l|>\epsilon)\leq \frac{1}{\epsilon^2m^2} \sum_{i=1}^m \mbox{Var}(Y_i)+\frac{1}{\epsilon^2}\sum_{i=m+1}^n \frac{\mbox{Var}(Y_i)}{i^2},\] where \(\xi_n=n^{-1}\sum_{i=1}^n (Z_i-EZ_i)=n^{-1}\sum_{i=1}^n (Y_i-EY_i)\) for \(n\geq m\). Note that
\[\begin{split} \sum_{n=1}^\infty \frac{EY_n^2}{n^2}&=\sum_{n=1}^\infty \sum_{j=1}^n \frac{E(X_1^2 I_{\{j-1<|X_1|\leq j\}})}{n^2}\\ &\leq \sum_{j=1}^\infty \Big(\sum_{n=j}^\infty \frac{j}{n^2}\Big)E(|X_1| I_{\{j-1<|X_1|\leq j\}})<\infty, \end{split}\] where the last bound holds since \(\sum_{n\geq j} \frac{j}{n^2}\leq 2\) uniformly in \(j\) and \(E|X_1|<\infty\). Since we want \(\xi_n=n^{-1}\sum_{i=1}^n (Y_i-EY_i) \to 0\) a.s., by Lemma 2.2 it suffices to show that \(P(\limsup_n \{|\xi_n|>\epsilon\})=0\) for every \(\epsilon>0\). This follows from \[\begin{split} P(\limsup_n \{|\xi_n|>\epsilon\})&=\lim_{n\to \infty} P(\cup_{l=n}^\infty \{|\xi_l|>\epsilon\} )\\ &=\lim_{n\to \infty}\lim_{k\to\infty} P(\max_{n\leq l \leq k}|\xi_l|>\epsilon)\\ &\leq \lim_{n\to \infty}\Big( \frac{1}{\epsilon^2 n^2}\sum_{i=1}^n \mbox{Var}(Y_i)+\frac{1}{\epsilon^2} \sum_{i=n+1}^\infty \frac{\mbox{Var}(Y_i)}{i^2}\Big)=0. \end{split}\]The last equality follows from Kronecker's lemma (applied to the first term) and \(\sum_{n=1}^\infty \frac{EY_n^2}{n^2}<\infty\), which also makes the tail sum in the second term vanish.
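The moment condition \(E|X_1|<\infty\) matters. The following assumed simulation sketch (not part of the notes) contrasts running means of Exp(1) draws, which settle near the mean \(1\), with running means of Cauchy draws, for which \(E|X_1|=\infty\) and the averages keep fluctuating.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 100_000

def running_mean(x):
    """Running averages x_1, (x_1+x_2)/2, ..."""
    return np.cumsum(x) / np.arange(1, x.size + 1)

exp_means = running_mean(rng.exponential(1.0, N))      # E|X_1| < inf: converges a.s. to 1
cauchy_means = running_mean(rng.standard_cauchy(N))    # E|X_1| = inf: no convergence

for k in (1_000, 10_000, 100_000):
    print(k, exp_means[k - 1], cauchy_means[k - 1])
# The exponential column stabilizes near 1; the Cauchy column keeps wandering.
```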
2.4.3 Central Limit Theorem
Let \(\{X_{n,j},\,j=1,\cdots,k_n\}\), \(n=1,2,\cdots\), be a triangular array of row-wise independent random variables with \(0<\sigma_n^2=\mbox{Var}(\sum_{j=1}^{k_n}X_{n,j})<\infty\), where \(k_n\to \infty\) as \(n\to \infty\). We discuss several conditions under which \[\frac{1}{\sigma_n} \sum_{j=1}^{k_n} (X_{n,j}-E(X_{n,j})) \stackrel{d}\to N(0,1).\] The first is the Lindeberg condition: for every \(\epsilon>0\), \[\sum_{j=1}^{k_n} E[(X_{n,j}-E(X_{n,j}))^2I_{\{|X_{n,j}-E(X_{n,j})|>\epsilon\sigma_n\}}]=o(\sigma_n^2). \] The second is Lyapunov's condition: for some \(\delta>0\), \[\sum_{j=1}^{k_n} E|X_{n,j}-EX_{n,j}|^{2+\delta}=o(\sigma_n^{2+\delta}),\] which is more commonly used in practice and implies the Lindeberg condition. The last one, implied by the Lindeberg condition, is Feller's condition: \[\lim_{n\to \infty} \frac{\max_{j\leq k_n}\sigma_{n,j}^2}{\sigma_n^2}=0, \] where \(\sigma_{n,j}^2=\mbox{Var}(X_{n,j})\).
In summary, we have the following: \[\mbox{Lyapunov's condition}\Rightarrow \mbox{Lindeberg's condition}\Rightarrow \mbox{Feller's condition}.\]
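A hedged simulation sketch of the triangular-array CLT under an assumed array \(X_{n,j}\sim\mbox{Uniform}(-j/k_n,\,j/k_n)\) (independent within each row, not identically distributed): the variables are uniformly bounded while \(\sigma_n^2\approx k_n/9\to\infty\), so the Lindeberg condition holds and the standardized row sums should look approximately \(N(0,1)\).

```python
import numpy as np

rng = np.random.default_rng(7)
k_n, reps = 500, 20_000

j = np.arange(1, k_n + 1)
a = j / k_n                                        # X_{n,j} ~ Uniform(-a_j, a_j), mean 0
X = rng.uniform(-a, a, size=(reps, k_n))           # independent within each row, not i.i.d.

sigma_n = np.sqrt(np.sum(a**2 / 3.0))              # Var(X_{n,j}) = a_j^2 / 3
Z = X.sum(axis=1) / sigma_n                        # standardized row sums

# Uniformly bounded summands with sigma_n -> infinity: Lindeberg holds,
# so Z should be approximately N(0, 1).
print(Z.mean(), Z.var(), np.quantile(Z, [0.025, 0.975]))   # ~0, ~1, ~[-1.96, 1.96]
```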
Exercise 2.5 Show the result about the implication of the conditions for CLT above.
(\(\mbox{Lyapunov's condition}\Rightarrow \mbox{Lindeberg's condition}\)) Hint: on the event \(\{|X_{n,j}-E(X_{n,j})|>\epsilon\sigma_n\}\) we have \(1\leq \big(\tfrac{|X_{n,j}-E(X_{n,j})|}{\epsilon\sigma_n}\big)^{\delta}\), hence \[\sum_{j=1}^{k_n} E[(X_{n,j}-EX_{n,j})^2I_{\{|X_{n,j}-EX_{n,j}|>\epsilon\sigma_n\}}]\leq \frac{1}{\epsilon^{\delta}\sigma_n^{\delta}}\sum_{j=1}^{k_n} E|X_{n,j}-EX_{n,j}|^{2+\delta}=o(\sigma_n^2).\]
(\(\mbox{Lindeberg's condition}\Rightarrow \mbox{Feller's condition}\)) Write \(Y_{n,j}=X_{n,j}-E(X_{n,j})\). The implication is clear since, for every \(\epsilon>0\), \[\frac{\max_{j\leq k_n} \mbox{Var}(Y_{n,j})}{\sigma_n^2}\leq \frac{1}{\sigma_n^2}\Big(\sum_{j=1}^{k_n}E\big(Y_{n,j}^2 I_{\{|Y_{n,j}|>\epsilon\sigma_n\}}\big)+\epsilon^2\sigma_n^2\Big),\] where the first term on the right vanishes by the Lindeberg condition and \(\epsilon\) is arbitrary.
Example 2.15 (density estimation) Let \(X_1,X_2,\cdots\) be i.i.d. with Lebesgue p.d.f. \(f\). Consider \[\hat{f}(x_0)=\frac{1}{nh}\sum_{i=1}^n k\Big(\frac{x_0-X_i}{h}\Big),\] where \(0<h=h_n\to 0\) and \(k\) is a kernel function satisfying (i) \(k\geq 0\), (ii) \(\int k(u)\, du=1\), (iii) \(\int u k(u)\, du=0\), (iv) \(\int u^2 k(u)\, du<\infty\). Determine conditions under which \[\frac{nh_n(\hat{f}(x_0)-f(x_0))}{\sqrt{\mbox{Var}(nh_n\hat{f}(x_0))}} \stackrel{d} \to N(0,1). \] By Lyapunov's condition, we may require that \[\frac{\sum_i E|k(\frac{x_0-X_i}{h})-E(k(\frac{x_0-X_i}{h}))|^{2+\delta}}{\big(\mbox{Var}(\sum_i k(\frac{x_0-X_i}{h}))\big)^{(2+\delta)/2}} \to 0. \] For \(g=k\) or \(k^2\), \[E\Big(g\Big(\frac{x_0-X_i}{h}\Big)\Big)=\int_{-\infty}^{\infty} g\Big(\frac{x_0-x}{h}\Big)f(x)\,dx=\int_{-\infty}^{\infty} g(u)f(x_0-hu)h\,du.\] By the LDCT (assuming, say, that \(f\) is bounded and continuous at \(x_0\)), \[E\Big(k^2\Big(\frac{x_0-X_i}{h}\Big)\Big)=h_n\Big(f(x_0)\int_{-\infty}^{\infty}k^2(u)\,du+o(1)\Big),\] and \[E\Big(k\Big(\frac{x_0-X_i}{h}\Big)\Big)=h_n(f(x_0)+o(1)).\]
Taking \(\delta=2\) in Lyapunov's condition, a similar computation gives \[ \sum_i E\Big|k\Big(\frac{x_0-X_i}{h}\Big)-E\Big(k\Big(\frac{x_0-X_i}{h}\Big)\Big)\Big|^{4}\leq Cnh_n(1+o(1))\] for some constant \(C\), while \(\mbox{Var}(\sum_i k(\frac{x_0-X_i}{h}))=nh_n(f(x_0)\int k^2(u)\,du+o(1))\). The Lyapunov ratio is therefore of order \((nh_n)^{-1}\), so Lyapunov's condition is verified provided \(nh_n\to \infty\).
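A hedged simulation sketch of Example 2.15 under assumed choices (standard normal data, Gaussian kernel, \(x_0=0\), \(h_n=n^{-1/3}\), so \(nh_n\to\infty\)): the replicated estimates \(\hat f(x_0)\), standardized by their Monte Carlo mean and standard deviation, should be approximately \(N(0,1)\). The bias question (centering at \(f(x_0)\) versus \(E\hat f(x_0)\)) is set aside here.

```python
import numpy as np

rng = np.random.default_rng(8)
n, reps = 2000, 5000
x0 = 0.0
h = n ** (-1.0 / 3.0)                                    # bandwidth with n*h -> infinity

def gauss_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

X = rng.standard_normal(size=(reps, n))                  # assumed i.i.d. N(0, 1) data
fhat = gauss_kernel((x0 - X) / h).mean(axis=1) / h       # kernel density estimates at x0

# Standardize across replications and compare the shape with N(0, 1).
T = (fhat - fhat.mean()) / fhat.std()
print(np.quantile(T, [0.025, 0.5, 0.975]))               # roughly [-1.96, 0, 1.96]
```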
One may wonder in which circumstances the general, triangular-array form of the CLT (with two indices \((n,j)\)) arises. Below we give an example.
Example 2.16 Consider the model \(Y=f(X)+\epsilon\) with \(X\in[0,1]\). Suppose the design points are fixed, \(\{X_{n,1},\cdots,X_{n,k_n}\}=\{\frac{1}{k_n+1},\cdots,\frac{k_n}{k_n+1}\}\), and \(Y_{n,j}=f(X_{n,j})+\epsilon_{n,j}\), where \(\epsilon_{n,j}\stackrel{IID}\sim N(0,\sigma^2)\). Consider the kernel regression estimator \[\hat{f}(x_0)=\frac{\sum_{j=1}^{k_n}Y_{n,j}k(\frac{x_0-X_{n,j}}{h})}{\sum_{j=1}^{k_n}k(\frac{x_0-X_{n,j}}{h})}.\] Then \[\hat{f}(x_0)-\frac{\sum_{j=1}^{k_n}f(X_{n,j})k(\frac{x_0-X_{n,j}}{h})}{\sum_{j=1}^{k_n}k(\frac{x_0-X_{n,j}}{h})}=\frac{\sum_{j=1}^{k_n}\epsilon_{n,j}k(\frac{x_0-X_{n,j}}{h})}{\sum_{j=1}^{k_n}k(\frac{x_0-X_{n,j}}{h})}\] involves a sum of random variables with two indices.
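A hedged sketch of Example 2.16 under assumed choices (\(f(x)=\sin(2\pi x)\), a Gaussian kernel, \(\sigma=0.3\), \(x_0=0.5\)); the code reproduces the decomposition above, in which the noise part is exactly the two-index weighted sum of \(\epsilon_{n,j}\).

```python
import numpy as np

rng = np.random.default_rng(9)
k_n, h, sigma, x0 = 400, 0.05, 0.3, 0.5

def f(x):
    return np.sin(2.0 * np.pi * x)                       # assumed regression function

def kern(u):
    return np.exp(-0.5 * u**2)                           # Gaussian kernel (unnormalized)

X = np.arange(1, k_n + 1) / (k_n + 1)                    # fixed design points j/(k_n + 1)
eps = sigma * rng.standard_normal(k_n)                   # eps_{n,j} ~ N(0, sigma^2)
Y = f(X) + eps

w = kern((x0 - X) / h)
fhat = np.sum(Y * w) / np.sum(w)                         # kernel regression estimate at x0

# Decomposition: fhat = weighted f-part + weighted noise part (the two-index sum).
noise_part = np.sum(eps * w) / np.sum(w)
print(fhat, f(x0), noise_part)
```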
In proving the Lindeberg CLT, one works with the characteristic function of \(\frac{1}{\sigma_n} \sum_{j=1}^{k_n} (X_{n,j}-E(X_{n,j}))\), using the inequality \[\Big|\prod_{k=1}^m Ea_k-\prod_{k=1}^m Eb_k\Big|\leq\sum_{k=1}^m E|a_k-b_k|,\] valid for complex-valued random variables with \(|a_k|,|b_k|\leq 1\), together with the characteristic function approximation (for centered \(X_{n,j}\)) \[\Big|Ee^{itX_{n,j}}-(1-t^2\sigma_{n,j}^2/2)\Big|\leq E(\min\{|tX_{n,j}|^2,|tX_{n,j}|^3\}).\] Combining these two inequalities with the approximation of the characteristic function, one splits the expectations over \(I_{\{|X_{n,j}-E(X_{n,j})|>\epsilon\sigma_n\}}\) and \(I_{\{|X_{n,j}-E(X_{n,j})| \leq\epsilon\sigma_n\}}\) and controls each part; the result then follows from the Lindeberg condition, which handles the squared term on the first event.