~'~\":7\" ^-,yl ~i ,, ~ ' ', STATISTICS& PROBABILITY LEI'~RS Statistics & Probability Letters 33 (1997) 23-34 ELSEVIER Bayes and empirical Bayes estimation with errors in variables Shunpu Zhang *, Rohana J. Karunamuni 1 Department of Mathematical Sciences, University of Alberta, Edmonton, Alberta, Canada T6G 2G1 Received June 1995; revised February 1996 Abstract Suppose that the random variable X is distributed according to exponential families of distributions, conditional on the parameter 0. Assume that the parameter 0 has a (prior) distribution G. Because of the measurement error, we can only observe Y = X+e, where the measurement error e is independent of X and has a known distribution. This paper considers the squared error loss estimation problem of 0 based on the contaminated observation Y. We obtain an expression for the Bayes estimator when the prior G is known. For the case G is completely unknown, an empirical Bayes estimator is proposed based on a sequence of observations Y1, Y2. .... Y,, where Y~'s are i.i.d, according to the marginal distribution of Y. It is shown that the proposed empirical Bayes estimator is asymptotically optimal. AMS classification: Primary 62F15; 62C12; secondary 62F10 Keywords.\" Bayes; Empirical Bayes; Squared error loss estimation; Kernel density estimates; Asymptotically optimal I. Introduction Consider the following estimation problem. Suppose 0 is distributed according to some (prior) distribution G, and one is to estimate 0 based on a random variable X with X, given O, being distributed according to some distribution Fxlo with Lebesgue densities fxlo. Let the loss function be the squared error loss. But assume that X is not directly observable and because of measurement error or the nature of environment, one can only observe Y =X+e, (1.1) where the random disturbance or the random error e is independent of X. Assume that e has a known distribution F~. We investigate the problem of estimation of 0 based on Y with the squared error loss. In this paper, it is of our interest to develop both Bayes (in the case when G is known) and empirical Bayes (in the case when G is unknown) estimators for the preceding problem. In Section 2 below, we obtain the Bayes * Corresponding author. I Research was supported by a grant from the Natural Sciences and Engineering Research Council of Canada. 0167-7152/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved PH S0 1 67-7152(96)00106-X 24 s. Zhang, R.J. Karunamuni / Statistics & Probability Letters 33 (1997)23-34 estimator of 0 w.r.t.G. In Section 3, an empirical Bayes (EB) estimator is constructed when X, given 0, is distributed according to the continuous one-parameter exponential family; that is, for some a t> - c~, fxlo(x) = u(x)c(O) e°x, (1.2) where u(x) > 0 if and only if x > a and c(O) = (fe°Xu(x)dx) -1. Empirical Bayes estimators are con- structed based on the data gathered from n independent repetitions of the same component problem (Robbins, 1956, 1964). Under the present model, then the problem occurs independently with the same unknown G throughout, there is a sequence of independent random vectors (Oi,Xi, Yi), i = 1,2 ..... where the random variables Oi's and X/'s are unobservable, and Oi's are i.i.d, with the same distribution G. Conditional on Oi, X,. is distributed according to fxloi. Only Y/'s are observable, where I1,- = X,- + ei with ~i and X/ independent. For the (n + 1 )st problem, (empirical Bayes) estimator fin(Y) depends on Y1 ..... Yn and Yn+l = Y, n ~> 1. 
There are a number of practical situations in which one may face the type of problem described above. For example, $Y$ could be the measurement made on an item manufactured using certain equipment. (Usually, more than one measurement is made on successive items.) If one wishes to estimate some parameter $\theta$ of the equipment, subject to the squared error loss, one may have available measurements $Y_1, Y_2, \ldots, Y_n$ on items manufactured using the same type of equipment in the past.

In the standard empirical Bayes estimation problem, the only situation that seems to have been considered thus far is the one in which the random variables $X_i$ are observed without error. The literature is too extensive to warrant a complete listing here. For empirical Bayes estimation in the family (1.2) see, for instance, Yu (1970), Hannan and Macky (1971), Lin (1975), Efron and Morris (1973), Singh (1976, 1979), Van Houwelingen and Stijnen (1983) and Singh and Wei (1992). For additional references, the reader is referred to the monograph of Maritz and Lwin (1989).

The proposed EB estimator and its asymptotic optimality are given in Section 3 of this paper. Proofs of the main results are deferred to Section 4. Results of a simulation study are given in Section 5. Section 6 contains concluding remarks.

2. The Bayes estimator

Under the squared error loss, the Bayes estimator based on the contaminated data $Y = y$ is the posterior mean $E(\theta \mid Y = y)$; i.e.,

\[ \delta_G(y) = E(\theta \mid Y = y) = \int_\Omega \theta f_{Y|\theta}(y)\,dG(\theta) \Big/ f_Y(y), \tag{2.1} \]

where $G$ is the prior distribution on $\Omega$ and $f_Y(y) = \int_\Omega f_{Y|\theta}(y)\,dG(\theta)$ with

\[ f_{Y|\theta}(y) = \int_{-\infty}^{y-a} f_{X|\theta}(y-x)\,dF_\varepsilon(x). \]

Then

\[ \delta_G(y) = \frac{\int_\Omega \theta \int_{-\infty}^{y-a} f_{X|\theta}(y-x)\,dF_\varepsilon(x)\,dG(\theta)}{f_Y(y)}. \]

If we assume that $\int_\Omega |\theta| \int_{-\infty}^{y-a} f_{X|\theta}(y-x)\,dF_\varepsilon(x)\,dG(\theta) < \infty$ uniformly in $y$ and that $f_{X|\theta}$ is given by (1.2), then by Fubini's theorem we obtain

\[ \delta_G(y) = \frac{\int_{-\infty}^{y-a} \int_\Omega \theta f_{X|\theta}(y-x)\,dG(\theta)\,dF_\varepsilon(x)}{f_Y(y)} = \frac{\int_{-\infty}^{y-a} f_X^{(1)}(y-x)\,dF_\varepsilon(x) - \int_{-\infty}^{y-a} \frac{u^{(1)}(y-x)}{u(y-x)}\,f_X(y-x)\,dF_\varepsilon(x)}{f_Y(y)}, \tag{2.2} \]

where

\[ f_X(x) = \int_\Omega c(\theta)u(x)e^{\theta x}\,dG(\theta) \tag{2.3} \]

with $u(x)$ as given in (1.2), and $u^{(1)}(x)$ denotes the first derivative of $u(x)$. When $a = -\infty$, (2.2) becomes

\[ \delta_G(y) = \frac{\int_{-\infty}^{\infty} f_X^{(1)}(y-x)\,dF_\varepsilon(x) - \int_{-\infty}^{\infty} \frac{u^{(1)}(y-x)}{u(y-x)}\,f_X(y-x)\,dF_\varepsilon(x)}{\int_{-\infty}^{\infty} f_X(y-x)\,dF_\varepsilon(x)}. \tag{2.4} \]

Example 2.1. Consider the exponential family in (1.2) with $u(x) = e^{-x^2/2}$ and $c(\theta) = (2\pi)^{-1/2}e^{-\theta^2/2}$; that is, for each $-\infty < \theta < \infty$,

\[ f_{X|\theta}(x) = (2\pi)^{-1/2}e^{-(x-\theta)^2/2}, \qquad -\infty < x < \infty. \]

Then $\Omega = (-\infty, \infty)$ and $a = -\infty$, and the Bayes estimator $\delta_G(y)$ given by (2.4) is equal to

\[ \delta_G(y) = \frac{-\displaystyle\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \tfrac{1}{\sqrt{2\pi}}(y-x-\theta)e^{-(y-x-\theta)^2/2}\,dG(\theta)\,dF_\varepsilon(x) + \displaystyle\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} (y-x)\tfrac{1}{\sqrt{2\pi}}e^{-(y-x-\theta)^2/2}\,dG(\theta)\,dF_\varepsilon(x)}{\displaystyle\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \tfrac{1}{\sqrt{2\pi}}e^{-(y-x-\theta)^2/2}\,dG(\theta)\,dF_\varepsilon(x)}. \tag{2.5} \]

Further, suppose that the prior on $\Omega$ is $G_0 = \mathrm{Normal}(0, 1)$. Then (2.5) reduces to

\[ \delta_{G_0}(y) = \frac{-\displaystyle\int_{-\infty}^{\infty} \tfrac{y-x}{2\sqrt{4\pi}}\,e^{-(y-x)^2/4}\,dF_\varepsilon(x) + \displaystyle\int_{-\infty}^{\infty} (y-x)\tfrac{1}{\sqrt{4\pi}}\,e^{-(y-x)^2/4}\,dF_\varepsilon(x)}{\displaystyle\int_{-\infty}^{\infty} \tfrac{1}{\sqrt{4\pi}}\,e^{-(y-x)^2/4}\,dF_\varepsilon(x)} = \frac{\displaystyle\int_{-\infty}^{\infty} \tfrac{y-x}{2}\,e^{-(y-x)^2/4}\,dF_\varepsilon(x)}{\displaystyle\int_{-\infty}^{\infty} e^{-(y-x)^2/4}\,dF_\varepsilon(x)}. \]
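As a numerical illustration (ours; the paper reports no code), the reduced form of (2.5) above can be evaluated by quadrature. In this particular setting, with $G_0 = N(0,1)$ and a standard normal error, $(\theta, Y)$ is jointly normal with $\mathrm{Var}(Y) = 3$ and $\mathrm{Cov}(\theta, Y) = 1$, so $E(\theta \mid Y = y) = y/3$ exactly, which provides a convenient check on the formula.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def bayes_estimate(y):
    """Evaluate the reduced form of (2.5) for G0 = N(0,1), F_eps = N(0,1):
    numerator and denominator integrals against the error density."""
    num, _ = quad(lambda x: 0.5 * (y - x) * np.exp(-(y - x)**2 / 4) * norm.pdf(x),
                  -np.inf, np.inf)
    den, _ = quad(lambda x: np.exp(-(y - x)**2 / 4) * norm.pdf(x),
                  -np.inf, np.inf)
    return num / den

for y in (-2.0, 0.5, 3.0):
    print(y, bayes_estimate(y), y / 3)   # the two values agree
```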
Example 2.2. Consider the scale exponential family with Lebesgue densities given by $f_{X|\theta}(x) = \theta e^{-\theta x}$ for $x > 0$, where $\theta > 0$. Then $\Omega = (0, \infty)$ and $a = 0$, and the Bayes estimator $\delta_G(y)$ given by (2.2) is equal to

\[ \delta_G(y) = \frac{\int_{-\infty}^{y} \int_0^\infty \theta^2 e^{-(y-x)\theta}\,dG(\theta)\,dF_\varepsilon(x)}{\int_{-\infty}^{y} \int_0^\infty \theta e^{-(y-x)\theta}\,dG(\theta)\,dF_\varepsilon(x)}. \tag{2.6} \]

Further, suppose that the prior on $\Omega$ is $G_0 = \mathrm{Gamma}(\alpha, \beta)$, $\alpha > 0$, $\beta > 0$, with density proportional to $\theta^{\beta-1}e^{-\alpha\theta}$, $\theta > 0$ (the parameterization under which the display below follows). Then the Bayes estimator (2.6) reduces to

\[ \delta_{G_0}(y) = (\beta + 1)\,\frac{\int_{-\infty}^{y} (y - x + \alpha)^{-(\beta+2)}\,dF_\varepsilon(x)}{\int_{-\infty}^{y} (y - x + \alpha)^{-(\beta+1)}\,dF_\varepsilon(x)}. \]

3. An empirical Bayes estimator

In this section we consider the case where the prior $G$ is not completely known. Assume that a sequence of contaminated observations $Y_1, Y_2, \ldots, Y_n$ is available, where the $Y_i$'s are i.i.d. according to the marginal distribution $F_Y$ with density $f_Y$ given by (2.1) when $f_{X|\theta}$ is given by (1.2). At the $(n+1)$st problem, the estimator $\delta_n$ is allowed to depend on all of the past observations as well as the $(n+1)$st observation. Hence, $\delta_n$ is a measurable function of $Y_1, Y_2, \ldots, Y_n$ and $Y_{n+1} = Y$.

In order to construct $\delta_n$, we shall first construct estimators of $f_X$ and $f_X^{(1)}$ based on $Y_1, Y_2, \ldots, Y_n$, where $f_X$ is given by (2.3) and $f_X^{(1)}$ is the first derivative of $f_X$. Let $\phi_\varepsilon$ denote the characteristic function of the error variable $\varepsilon$. Let $\hat\phi_n$ denote the empirical characteristic function defined by $\hat\phi_n(t) = (1/n)\sum_{j=1}^n \exp(itY_j)$. For a nice kernel $K$, let $\phi_K$ be its Fourier transform with $\phi_K(0) = 1$. We then define

\[ f_n^{(l)}(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \exp(-itx)\,(-it)^l\,\phi_K(th_n)\,\frac{\hat\phi_n(t)}{\phi_\varepsilon(t)}\,dt \tag{3.1} \]

as our (kernel) estimator of $f_X^{(l)}(x)$, $l = 0, 1$, where $h_n$ is the bandwidth ($h_n \to 0$ as $n \to \infty$). Here we assume that $|t^l \phi_K(th_n)/\phi_\varepsilon(t)|$ is integrable on $(-\infty, \infty)$. The construction of (3.1) is due to Stefanski and Carroll (1990). A similar construction is studied by Fan (1991a, b, 1992) and Zhang (1990). In the special case $l = 0$, denote $f_n^{(0)}$ by $f_n$. Under the model (1.1), the estimator (3.1) can be rewritten in the kernel form

\[ f_n^{(l)}(x) = \frac{1}{nh_n^{l+1}} \sum_{j=1}^{n} K_{nl}\Big(\frac{x - Y_j}{h_n}\Big), \tag{3.2} \]

where

\[ K_{nl}(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \exp(-itx)\,\frac{(-it)^l \phi_K(t)}{\phi_\varepsilon(t/h_n)}\,dt. \]

In view of (2.2), we shall estimate $\delta_G$ through $f_X$ and $f_X^{(1)}$. Let

\[ \delta_n(y) = \frac{\int_{-\infty}^{y-a} f_n^{(1)}(y-x)\,dF_\varepsilon(x) - \int_{-\infty}^{y-a} \frac{u^{(1)}(y-x)}{u(y-x)}\,f_n(y-x)\,dF_\varepsilon(x)}{\hat f_n(y)} \tag{3.3} \]

with

\[ \hat f_n(y) = \begin{cases} \displaystyle\int_{-\infty}^{y-a} f_n(y-x)\,dF_\varepsilon(x) & \text{if } \Big|\displaystyle\int_{-\infty}^{y-a} f_n(y-x)\,dF_\varepsilon(x)\Big| > A_n, \\[2mm] A_n & \text{otherwise}, \end{cases} \tag{3.4} \]

and with $f_n^{(l)}$ being the kernel estimator of $f_X^{(l)}$ given by (3.1) for $l = 0, 1$, where $A_n$ is a sequence of positive numbers such that $A_n \to 0$ as $n \to \infty$.

Let $R(\delta_n, G)$ denote the Bayes risk of $\delta_n$ given by (3.3) w.r.t. $G$. Then $R(\delta_n, G) = E(\delta_n - \theta)^2$, where the expectation $E$ is over the random variables $Y_1, Y_2, \ldots, Y_n, Y_{n+1}$ and $\theta$. For the Bayes estimator $\delta_G$ given by (2.2), $R(\delta_G, G)$ achieves the minimum Bayes risk w.r.t. $G$. That is, $R(\delta_G, G) = \inf_d R(d, G)$, where the infimum is over all estimators $d$ for which $R(d, G) < \infty$. For convenience, denote $R(\delta_G, G)$ by $R(G)$. Then $R(G)$ is the Bayes envelope value of the problem. This motivates the use of the excess risk (regret)

\[ R(\delta_n, G) - R(G) = E(\delta_n - \theta)^2 - E(\delta_G - \theta)^2 \]

as a measure of goodness of the estimator $\delta_n$. Restricting $G$ to those with finite Bayes risk, the excess risk satisfies (Lemma 2.1 of Singh, 1979)

\[ 0 \le R(\delta_n, G) - R(G) = E(\delta_n - \delta_G)^2. \tag{3.5} \]

The empirical Bayes estimator $\delta_n$ is said to be asymptotically optimal if $\lim_{n\to\infty} R(\delta_n, G) = R(G)$ (Robbins, 1956, 1964).
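For concreteness, here is a minimal numerical sketch (ours) of the deconvolution estimator (3.1). It assumes a standard normal error, approximates the Fourier integral by a Riemann sum, and uses the kernel with Fourier transform $\phi_K(t) = (1-t^2)^3$ on $[-1, 1]$, one choice satisfying assumptions (A1)-(A3) below and the kernel used in the simulations of Section 5.

```python
import numpy as np

def phi_K(t):
    # Fourier transform of the kernel: (1 - t^2)^3 on |t| <= 1, zero outside,
    # so phi_K(0) = 1, phi_K is symmetric, and phi_K has compact support.
    return np.where(np.abs(t) <= 1.0, (1.0 - t**2)**3, 0.0)

def deconv_estimate(x, y_obs, h, l=0, phi_eps=lambda t: np.exp(-t**2 / 2)):
    """f_n^{(l)}(x) of (3.1): invert the damped empirical characteristic
    function divided by the error c.f. (default: N(0,1) error)."""
    t = np.linspace(-1.0 / h, 1.0 / h, 2001)   # phi_K(t*h) vanishes outside
    dt = t[1] - t[0]
    phi_hat = np.mean(np.exp(1j * np.outer(t, y_obs)), axis=1)  # empirical c.f.
    integrand = (np.exp(-1j * t * x) * (-1j * t)**l
                 * phi_K(t * h) * phi_hat / phi_eps(t))
    return float(np.real(np.sum(integrand) * dt)) / (2 * np.pi)
```

The compact support of $\phi_K$ confines the division by $\phi_\varepsilon(t)$ to $|t| \le 1/h_n$, which is what makes the bandwidth conditions in the theorems below matter.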
To state the main results of this paper, which establish the asymptotic optimality of the EB estimator $\delta_n$ given by (3.3) under various conditions, we need the following assumptions on the kernel $K$ and the error $\varepsilon$:

(A1) $K(\cdot)$ is bounded and continuous, and $\int_{-\infty}^{\infty} |x|^2 K(x)\,dx < \infty$.
(A2) The Fourier transform $\phi_K$ of $K$ is a symmetric function and satisfies $\phi_K(t) = 1 + O(|t|^2)$ as $t \to 0$.
(A3) $\phi_K(t) = 0$ for $|t| \ge 1$.
(A4) The characteristic function $\phi_\varepsilon$ of $\varepsilon$ satisfies $\phi_\varepsilon(t) \ne 0$ for all $t$.
(A5) $|\phi_\varepsilon(t)|\,|t|^{-\beta_0} \exp(|t|^\beta/\gamma) \ge d_0$ (as $t \to \infty$) for some positive constants $\beta$, $\gamma$, $d_0$ and a constant $\beta_0$.
(A6) $|t^l \phi_K(th_n)/\phi_\varepsilon(t)|$ is integrable on $(-\infty, \infty)$, $l = 0, 1$.
(A7) $|\phi_\varepsilon(t)\,t^\beta| \ge d_0$ as $t \to \infty$, for some positive constants $d_0$ and $\beta$.
(A8) $\int_{-\infty}^{\infty} |\phi_K(t)\,t^{\beta+l}|\,dt < \infty$ and $\int_{-\infty}^{\infty} |\phi_K(t)\,t^{2(\beta+l)}|\,dt < \infty$, for some positive constant $\beta$ and $l = 0, 1$.

Assumptions (A1), (A2) and (A3) imply that $K$ is a second-order kernel function. For convenience, consider the following class of priors:

\[ \mathcal{F}_B = \Big\{G:\ G \text{ is a prior on } \Omega \text{ such that } \sup_x |f_X(x)| \le B, \text{ with } f_X \text{ given by (2.3)}\Big\} \tag{3.6} \]

for some finite positive constant $B$.

Theorem 3.1. Let $f_{X|\theta}$ be given by (1.2) with $a = -\infty$. Let $G \in \mathcal{F}_B$, where $\mathcal{F}_B$ is given by (3.6). Further, suppose that the distributions $G$ and $F_\varepsilon$ are such that $f_X$ given by (2.3) is twice differentiable on $(-\infty, \infty)$, $\int_\Omega \theta^2\,dG(\theta) < \infty$, $\int_\Omega \int_{-\infty}^{\infty} |\theta|\,f_{X|\theta}(y-x)\,dF_\varepsilon(x)\,dG(\theta) < \infty$ uniformly in $y$, and $E\big[\int_{-\infty}^{\infty} \big(u^{(1)}(Y-x)/u(Y-x)\big)^2\,dF_\varepsilon(x)\big] < \infty$. Furthermore, assume that conditions (A1) to (A6) hold. Then, for the bandwidth $h_n = O((\log n)^{-1/\beta})$ and the sequence $A_n = o((\log n)^{-1/\beta})$ (see (3.4)), we have

\[ \lim_{n\to\infty} R(\delta_n, G) = R(G), \tag{3.7} \]

where $R(\delta_n, G)$ is the Bayes risk of the EB estimator $\delta_n$ defined by (3.3) with $a = -\infty$, and $R(G)$ is the minimum Bayes risk.

Theorem 3.2. Assume that the hypotheses of Theorem 3.1 hold, now with conditions (A1) to (A6) replaced by conditions (A1) to (A4), (A6), (A7) and (A8). Then, for the bandwidth $h_n = O(n^{-1/(\beta+5)})$ and $A_n = o(n^{-1/(\beta+5)})$, we have

\[ \lim_{n\to\infty} R(\delta_n, G) = R(G). \tag{3.8} \]

The normal and Cauchy distributions satisfy assumption (A5) above, whereas the gamma and double exponential distributions satisfy (A7). A kernel satisfying (A1), (A2), (A3) and (A8) can be easily constructed; see, e.g., Fan (1991a, b, 1992). We now revisit Example 2.1 and investigate the validity of the assumptions made in the theorems above for this example.

Example 2.1 (continued). Let the error distribution $F_\varepsilon = \mathrm{Normal}(0, 1)$. Then $\phi_\varepsilon$ satisfies (A4) and (A5) with $\beta = 2$. Since $f_{X|\theta}(x) = (1/\sqrt{2\pi})e^{-(x-\theta)^2/2}$, we have

\[ f_X(x) = \int_{-\infty}^{\infty} f_{X|\theta}(x)\,dG(\theta) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(x-\theta)^2/2}\,dG(\theta) \]

for any prior on $\Omega = (-\infty, \infty)$. Hence, by Theorem 2.9 of Lehmann (1959), one obtains

\[ f_X^{(1)}(x) = -\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} (x-\theta)\,e^{-(x-\theta)^2/2}\,dG(\theta) \]

and

\[ f_X^{(2)}(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} (x-\theta)^2 e^{-(x-\theta)^2/2}\,dG(\theta) - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(x-\theta)^2/2}\,dG(\theta). \]

Then $\sup_x |f_X^{(2)}(x)| \le 1$, since $\sup_z\,(z^2+1)e^{-z^2/2} = 2e^{-1/2} < \sqrt{2\pi}$.
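As a quick numerical illustration of this bound (ours, for one concrete prior): when $G = N(0, 1)$, $f_X$ is the $N(0, 2)$ density, whose second derivative has the closed form used below, and the grid maximum sits well below 1.

```python
import numpy as np
from scipy.stats import norm

# f_X for G = N(0,1) in Example 2.1 is the N(0,2) density, so
# f_X''(x) = f_X(x) * (x^2/4 - 1/2); check sup |f_X''| <= 1 on a grid.
x = np.linspace(-10, 10, 100001)
fx = norm.pdf(x, scale=np.sqrt(2))
fx2 = fx * (x**2 / 4 - 0.5)
print(np.max(np.abs(fx2)))   # ~0.141, comfortably below the bound 1
```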
4. Proofs

In this section we prove Theorem 3.1 above. The proof of Theorem 3.2 is similar. First, we state two lemmas useful in proving these theorems. For proofs of the lemmas, see, e.g., Fan (1991a, b, 1992).

Lemma 4.1. Under assumptions (A1) to (A6) and with the choice $h_n = O((\log n)^{-1/\beta})$ of the bandwidth, one has

\[ \sup_{-\infty < x < \infty} E\big(f_n^{(l)}(x) - f_X^{(l)}(x)\big)^2 = O\big((\log n)^{-(4-2l)/\beta}\big), \qquad l = 0, 1. \tag{4.1} \]

Lemma 4.2. Under assumptions (A1) to (A4), (A6), (A7) and (A8) and with the choice $h_n = O(n^{-1/(\beta+5)})$ of the bandwidth, one has

\[ \sup_{-\infty < x < \infty} E\big(f_n^{(l)}(x) - f_X^{(l)}(x)\big)^2 = O\big(n^{-c_l/(\beta+5)}\big), \qquad l = 0, 1, \tag{4.2} \]

for some positive constants $c_0$ and $c_1$.

Proof of Theorem 3.1. In view of (3.5),

\[ R(\delta_n, G) - R(G) = E(\delta_n - \delta_G)^2 = \int_{-\infty}^{\infty} E\big(\delta_n(y) - \delta_G(y)\big)^2 f_Y(y)\,dy, \tag{4.3} \]

where the inner expectation is over $Y_1, \ldots, Y_n$. Write $\psi(y) = \delta_G(y)f_Y(y)$ for the numerator of (2.2) (with $a = -\infty$) and let $\psi_n(y)$ denote the numerator of (3.3). Then

\[ \delta_n(y) - \delta_G(y) = \frac{\psi_n(y) - \psi(y)}{\hat f_n(y)} + \delta_G(y)\,\frac{f_Y(y) - \hat f_n(y)}{\hat f_n(y)}, \]

and since

\[ \psi_n(y) - \psi(y) = \int_{-\infty}^{\infty} \big(f_n^{(1)} - f_X^{(1)}\big)(y-x)\,dF_\varepsilon(x) - \int_{-\infty}^{\infty} \frac{u^{(1)}(y-x)}{u(y-x)}\,\big(f_n - f_X\big)(y-x)\,dF_\varepsilon(x), \]

the inequality $(a + b + c)^2 \le 4a^2 + 4b^2 + 2c^2$ yields

\[ E\big(\delta_n(y) - \delta_G(y)\big)^2 \le 4E\Bigg(\frac{\int_{-\infty}^{\infty} (f_n^{(1)} - f_X^{(1)})(y-x)\,dF_\varepsilon(x)}{\hat f_n(y)}\Bigg)^2 + 4E\Bigg(\frac{\int_{-\infty}^{\infty} \frac{u^{(1)}(y-x)}{u(y-x)}(f_n - f_X)(y-x)\,dF_\varepsilon(x)}{\hat f_n(y)}\Bigg)^2 + 2\delta_G^2(y)\,E\Bigg(\frac{f_Y(y) - \hat f_n(y)}{\hat f_n(y)}\Bigg)^2. \tag{4.4} \]

Since $|\hat f_n(y)| \ge A_n$ (see (3.4)) and by the moment inequality, one obtains

\[ \text{1st term of the RHS of (4.4)} \le 4A_n^{-2}\,E\int_{-\infty}^{\infty} \big(f_n^{(1)}(y-x) - f_X^{(1)}(y-x)\big)^2\,dF_\varepsilon(x) = 4A_n^{-2}\int_{-\infty}^{\infty} E\big(f_n^{(1)}(y-x) - f_X^{(1)}(y-x)\big)^2\,dF_\varepsilon(x). \tag{4.5} \]

Similarly,

\[ \text{2nd term of the RHS of (4.4)} \le 4A_n^{-2}\int_{-\infty}^{\infty} \Big(\frac{u^{(1)}(y-x)}{u(y-x)}\Big)^2 E\big(f_n(y-x) - f_X(y-x)\big)^2\,dF_\varepsilon(x). \tag{4.6} \]

Now consider

\[ E\Bigg(\frac{f_Y(y) - \hat f_n(y)}{\hat f_n(y)}\Bigg)^2 = E\Bigg\{\Bigg(\frac{f_Y(y) - \hat f_n(y)}{\hat f_n(y)}\Bigg)^2 I\big(f_Y(y) \ge A_n\big)\Bigg\} + E\Bigg\{\Bigg(\frac{f_Y(y) - \hat f_n(y)}{\hat f_n(y)}\Bigg)^2 I\big(f_Y(y) < A_n\big)\Bigg\} = J_{1,n} + J_{2,n}, \tag{4.7} \]

where $J_{1,n}$ and $J_{2,n}$ are the first and second terms of the RHS of (4.7). By the definition (3.4) of $\hat f_n(y)$ and the fact that $f_Y(y) = \int_{-\infty}^{\infty} f_X(y-x)\,dF_\varepsilon(x)$,

\[ J_{1,n} \le A_n^{-2}\,E\big(\hat f_n(y) - f_Y(y)\big)^2 I\big(f_Y(y) \ge A_n\big) \le A_n^{-2}\,E\Bigg(\int_{-\infty}^{\infty} \big(f_n(y-x) - f_X(y-x)\big)\,dF_\varepsilon(x)\Bigg)^2 I\big(f_Y(y) \ge A_n\big) \le A_n^{-2}\Bigg\{\int_{-\infty}^{\infty} E\big(f_n(y-x) - f_X(y-x)\big)^2\,dF_\varepsilon(x)\Bigg\} I\big(f_Y(y) \ge A_n\big), \tag{4.8} \]

where the second inequality holds because, on the event $\{|\int f_n(y-x)\,dF_\varepsilon(x)| \le A_n,\ f_Y(y) \ge A_n\}$, the truncation point $A_n$ lies between $\int f_n(y-x)\,dF_\varepsilon(x)$ and $f_Y(y)$, and the last follows by the moment inequality. Also, $J_{2,n} \le 4I(f_Y(y) < A_n)$, since on the event $\{f_Y(y) < A_n\}$ one has $|\hat f_n(y)| \ge A_n > f_Y(y) \ge 0$ and hence $|f_Y(y) - \hat f_n(y)|/|\hat f_n(y)| \le f_Y(y)/A_n + 1 < 2$.

By combining (4.4)-(4.8) and using the fact $J_{2,n} \le 4I(f_Y(y) < A_n)$, we obtain

\[ E\big(\delta_n(y) - \delta_G(y)\big)^2 \le 4A_n^{-2}\Bigg\{ \int_{-\infty}^{\infty} E\big(f_n^{(1)}(y-x) - f_X^{(1)}(y-x)\big)^2\,dF_\varepsilon(x) + \int_{-\infty}^{\infty} \Big(\frac{u^{(1)}(y-x)}{u(y-x)}\Big)^2 E\big(f_n(y-x) - f_X(y-x)\big)^2\,dF_\varepsilon(x) + \delta_G^2(y)\,I\big(f_Y(y) \ge A_n\big) \int_{-\infty}^{\infty} E\big(f_n(y-x) - f_X(y-x)\big)^2\,dF_\varepsilon(x) \Bigg\} + 8\,\delta_G^2(y)\,I\big(f_Y(y) < A_n\big) \]
\[ \le C_4 A_n^{-2}\Bigg\{ (\log n)^{-2/\beta} + (\log n)^{-4/\beta}\int_{-\infty}^{\infty} \Big(\frac{u^{(1)}(y-x)}{u(y-x)}\Big)^2\,dF_\varepsilon(x) + (\log n)^{-4/\beta}\,\delta_G^2(y)\,I\big(f_Y(y) \ge A_n\big) \Bigg\} + 8\,\delta_G^2(y)\,I\big(f_Y(y) < A_n\big), \tag{4.9} \]

where the last inequality is obtained using Lemma 4.1 and $C_4$ is a constant independent of $n$ and $y$. Now from (4.9), we see that $E(\delta_n(y) - \delta_G(y))^2 \le M(y)$, where $M(y) = C_5 + C_6\int_{-\infty}^{\infty}\big(u^{(1)}(y-x)/u(y-x)\big)^2\,dF_\varepsilon(x) + C_7\delta_G^2(y)$, with $C_5$, $C_6$ and $C_7$ being constants independent of $n$ and $y$. Observe that $\int_{-\infty}^{\infty} M(y)f_Y(y)\,dy < \infty$ by the assumptions of Theorem 3.1. Also, from (4.9) with the choice $A_n = o((\log n)^{-1/\beta})$, we see that $\lim_{n\to\infty} E(\delta_n(y) - \delta_G(y))^2 = 0$ for each fixed $y$. Then, by an application of the dominated convergence theorem, we obtain $\lim_{n\to\infty}\int_{-\infty}^{\infty} E(\delta_n(y) - \delta_G(y))^2 f_Y(y)\,dy = 0$ under the assumptions of Theorem 3.1. The result (3.7) now follows in view of (4.3).

Proof of Theorem 3.2. The proof of (3.8) is verbatim the same as that of Theorem 3.1, except that now $A_n = o(n^{-1/(\beta+5)})$ and Lemma 4.2 are used instead of $A_n = o((\log n)^{-1/\beta})$ and Lemma 4.1.

5. Simulation results

To study the convergence of the regret $R(\delta_n, G) - R(G)$ of the proposed estimator (3.3), we have conducted simulation studies, and some of the results are reported here. Specifically, results for the following two cases are presented.

Case I: We take $f_{X|\theta}(x) = (1/\sqrt{2\pi})e^{-(x-\theta)^2/2}$, prior $g(\theta) = (1/\sqrt{2\pi})e^{-\theta^2/2}$, $-\infty < \theta < \infty$, and the error distribution $F_\varepsilon$ as the standard normal distribution. Further, we take the bandwidth $h_n = \sqrt{2}(\log n)^{-1/2}$ and the sequence $A_n$ in (3.4) as $A_n = \sqrt{2}(\log n)^{-1}$.

Case II: Here, we take $f_{X|\theta}(x) = (1/\sqrt{2\pi})e^{-(x-\theta)^2/2}$, $g(\theta) = (1/\sqrt{2\pi})e^{-\theta^2/2}$ and the error distribution $F_\varepsilon$ as the $\mathrm{Gamma}(1, 1)$ distribution. Also, we take $h_n = n^{-1/6}$ and $A_n = n^{-1/4}$.

For both cases, we used the second-order kernel

\[ K(x) = \frac{48\cos x}{\pi x^4}\Big(1 - \frac{15}{x^2}\Big) - \frac{144\sin x}{\pi x^5}\Big(2 - \frac{5}{x^2}\Big), \qquad -\infty < x < \infty, \]

whose Fourier transform is $\phi_K(t) = (1 - t^2)^3$ for $|t| \le 1$ and $0$ otherwise.
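The two cases can be assembled from the pieces above; the following sketch (ours, not the authors' code) wires them together for Case I, reusing `phi_K` and `deconv_estimate` from the Section 3 sketch. In Case I the exact Bayes rule is $\delta_G(y) = y/3$ (see the check following Example 2.1), so averaging $(\delta_n - \delta_G)^2$ over many independent replications estimates the regret via (3.5).

```python
import numpy as np

rng = np.random.default_rng(1)

# Gauss-Hermite_e quadrature for integrals dF_eps with F_eps = N(0,1)
nodes, wts = np.polynomial.hermite_e.hermegauss(20)
wts = wts / np.sqrt(2 * np.pi)            # weights now sum to 1

def delta_n(y, y_past, h, A_n):
    """EB estimator (3.3) for Case I, where a = -infty and u'(x)/u(x) = -x,
    so the numerator is  int [f_n^(1)(y-x) + (y-x) f_n(y-x)] dF_eps(x)."""
    f1 = np.array([deconv_estimate(y - e, y_past, h, l=1) for e in nodes])
    f0 = np.array([deconv_estimate(y - e, y_past, h, l=0) for e in nodes])
    num = np.sum(wts * (f1 + (y - nodes) * f0))
    den = np.sum(wts * f0)
    den = den if abs(den) > A_n else A_n  # truncation (3.4)
    return num / den

n = 500
theta = rng.normal(size=n + 1)                        # theta_i ~ g
y = rng.normal(loc=theta) + rng.normal(size=n + 1)    # Y_i = X_i + eps_i
h, A = np.sqrt(2) / np.sqrt(np.log(n)), np.sqrt(2) / np.log(n)  # Case I choices
print(delta_n(y[-1], y[:-1], h, A), y[-1] / 3)        # EB vs exact Bayes rule
```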