This lesson introduces confidence intervals for population means and proportions.

I. Objectives

Understand the distinction between population and sample and between parameter and estimator. Understand the meaning of confidence intervals. Become familiar with confidence interval formulas for the known variance case (or unknown variance with a large sample size). Learn how to decide on the required sample size.

Text: Chap. 10, p. 295-300, p. 302-310, 314 of Lapin, and pages 114-130 from The Cartoon Guide.

III. Homework: HW 22.

IV. Explanation and Examples

First some terminology.

Population and Sample

A population is the large set of measurements corresponding to the entire collection of units for which inferences are to be made. A sample from a population is the set of measurements that are actually collected in the course of an investigation. Characteristics of the population like the mean μ or the variance σ² are called parameters. Quantities calculated from the sample are called statistics. When a statistic like the sample mean X is aimed at a population parameter like μ, we call X an estimator of μ. Similarly, the sample variance s² is an estimator of σ².

Now we will discuss a few properties that estimators should have. An estimator is unbiased if its expectation over all samples is equal to the population parameter that it is estimating. For example,

E(X)=μ.

Similarly,

E(s²) = σ².

Unbiasedness is a good property but not essential if the estimator is almost unbiased. For example, the reason we use n-1 in the definition of s² is to obtain an unbiased estimator. However, it wouldn't be bad to use s² with an n in the denominator because the resulting estimator, call it s_n², is nearly unbiased:

E(s_n²) = [(n-1)/n] σ².

By the way, we call the difference between the expectation of the estimator and what it is estimating the bias of the estimator. For the above s_n², the bias is

bias = E(s_n²) - σ² = [(n-1)/n] σ² - σ² = -σ²/n.
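The bias formula can be checked by simulation. The following is a minimal sketch, not part of the original lesson: it assumes normally distributed data with hypothetical values n=5 and σ=2, and the function names are made up for illustration. It averages both variance estimators over many samples and compares the averages with σ² and [(n-1)/n]σ².

```python
import random

def variance_with_divisor(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample) / divisor

def simulate_bias(n=5, sigma=2.0, reps=20000, seed=1):
    """Average s^2 (divisor n-1) and s_n^2 (divisor n) over many samples.

    Assumption for illustration only: data are normal with mean 0.
    """
    rng = random.Random(seed)
    total_s2 = 0.0
    total_sn2 = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        total_s2 += variance_with_divisor(sample, n - 1)
        total_sn2 += variance_with_divisor(sample, n)
    return total_s2 / reps, total_sn2 / reps

n, sigma = 5, 2.0
sigma2 = sigma ** 2
mean_s2, mean_sn2 = simulate_bias(n=n, sigma=sigma)

print("average s^2   :", round(mean_s2, 3), " theory:", sigma2)
print("average s_n^2 :", round(mean_sn2, 3), " theory:", (n - 1) / n * sigma2)
print("estimated bias of s_n^2:", round(mean_sn2 - sigma2, 3), " theory:", -sigma2 / n)
```

With these settings the average of s² should land near 4 and the average of s_n² near 3.2, so the estimated bias should be close to -σ²/n = -0.8.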

Another important property is consistency, which means that the variance and the bias of the estimator both go to zero as the sample size gets large. Since the sample mean has bias 0 and variance σ²/n, it clearly is consistent. We wouldn't want to use an estimator that is not consistent.

Finally, we always want to use estimators that have low bias and low variance. Such estimators are called efficient estimators. To compare two unbiased estimators, we need only compare their variances: the estimator with the lower variance is more efficient. If one or both of the estimators has bias, then we define

mean squared error = variance + bias²

and choose the estimator with lower mean squared error.
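As an illustration of the mean squared error comparison, here is a small sketch that compares s² with s_n². It relies on an assumption that is not in the lesson: for normally distributed data, Var(s²) = 2σ⁴/(n-1). The function names and the values n=10, σ²=1 are hypothetical.

```python
def mse(variance, bias):
    """Mean squared error = variance + bias^2."""
    return variance + bias ** 2

def mse_s2(n, sigma2):
    # s^2 is unbiased; ASSUMPTION (normal data): Var(s^2) = 2*sigma^4/(n-1).
    return mse(2 * sigma2 ** 2 / (n - 1), 0.0)

def mse_sn2(n, sigma2):
    # s_n^2 = ((n-1)/n) * s^2, so its variance is scaled by ((n-1)/n)^2,
    # and its bias is -sigma^2/n as derived in the text.
    variance = ((n - 1) / n) ** 2 * 2 * sigma2 ** 2 / (n - 1)
    return mse(variance, -sigma2 / n)

n, sigma2 = 10, 1.0
print("MSE of s^2  :", round(mse_s2(n, sigma2), 4))
print("MSE of s_n^2:", round(mse_sn2(n, sigma2), 4))
```

Under the normality assumption, the slightly biased s_n² ends up with the smaller mean squared error in this case, which shows why bias alone does not decide between estimators.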

Statistical Inference

Random variables and their distributions, introduced in previous lessons, are used to describe and model populations. Statistical inference deals with drawing conclusions about populations based on samples. More specifically, these conclusions are based on estimators of the parameters of the population. That is, the sample mean and sample variance are very useful in drawing conclusions about a population.

Any inference about a population parameter will involve some uncertainty because it is based on a sample, rather than on the entire population. To be meaningful, statistical inference must include a specification of this uncertainty. In particular, because the inference is often based on estimators, we use standard deviations (or perhaps mean squared errors) of estimators and confidence intervals to assess the uncertainty. In the next section we will describe confidence intervals.

Confidence Intervals for a Single Mean

First we are going to focus our attention on the sample mean X of a random sample X1, X2, ..., Xn. Recall that we assume that the Xi's are independent of one another and that each Xi has mean μ and variance σ².

With these assumptions we know that the variance of the sample mean is

Var(X) = σ²/n,

and its standard deviation is just

std(X) = σ/√n.

When the population standard deviation σ is unknown, we replace σ by the sample standard deviation s. The resulting estimator of the standard deviation of the mean, s/√n, is called the standard error of X. Lapin actually calls σ/√n the standard error.
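A minimal sketch of the standard error calculation s/√n follows. The data values are hypothetical and the function name is made up for illustration; `statistics.stdev` uses the n-1 divisor, matching the definition of s.

```python
import math
import statistics

def standard_error(sample):
    """Estimated standard deviation of the sample mean: s / sqrt(n)."""
    s = statistics.stdev(sample)  # sample standard deviation, divisor n-1
    return s / math.sqrt(len(sample))

# Hypothetical measurements, for illustration only.
heights = [66.1, 70.3, 68.4, 71.0, 65.2, 69.5, 67.8, 68.9]
print("sample mean   :", round(statistics.fmean(heights), 2))
print("standard error:", round(standard_error(heights), 3))
```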

Note that we are not using the small sample correction factor for sampling without replacement given on page 276 of Lapin.

We might be happy to just report σ/√n or s/√n along with X as a measure of uncertainty. However, these uncertainty measures can be made more interpretable because of the Central Limit Theorem. On page 302 of the text, one can see how to turn a statement about the approximate distribution of X into an interval of plausible values for the population mean.

For example, the 95% large sample confidence interval for the population mean is

(X - 1.96 × s/√n, X + 1.96 × s/√n)

(We use s in place of σ when it is unknown.) The value 1.96 should be familiar as the 97.5th percentile of the standard normal. If you want a 90% confidence interval, then you replace 1.96 by 1.645. In general, if you want a (1-α) × 100% confidence interval, then you use the (1-α/2) × 100th percentile of the standard normal (look up 1-α/2 backwards in Table D on p. 536-537).
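Instead of reading a normal table backwards, the multiplier can be computed directly. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Standard normal quantile at 1 - alpha/2, where alpha = 1 - confidence."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> z = {z_critical(level):.3f}")
```

For 90% and 95% this reproduces the values 1.645 and 1.960 used in the text.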

Example. Suppose that a population of adult male heights has unknown mean μ. The sample mean of n=20 men from this population is 68.2 inches, and the sample standard deviation is s=3 inches. Find a 90% confidence interval for μ. The answer is simply

(68.2 - 1.645 × 3/√20, 68.2 + 1.645 × 3/√20) = (68.2 - 1.1, 68.2 + 1.1) = (67.1, 69.3)
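The arithmetic in this example can be reproduced with a short sketch. The function name is hypothetical; the inputs are the numbers from the example (sample mean 68.2, s=3, n=20, multiplier 1.645).

```python
import math

def mean_confidence_interval(xbar, s, n, z):
    """Large-sample interval: xbar -/+ z * s / sqrt(n)."""
    margin = z * s / math.sqrt(n)
    return xbar - margin, xbar + margin

low, high = mean_confidence_interval(xbar=68.2, s=3.0, n=20, z=1.645)
print(round(low, 1), round(high, 1))  # 67.1 69.3
```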

The confidence statement for the interval is interpreted as follows:

Because the values of the sample mean X and sample standard deviation s depend on the collected data points, the interval (X - 1.96 × s/√n, X + 1.96 × s/√n) is a random interval that attempts to cover the true value of the population mean μ, which is unknown.

The probability statement Pr(X - 1.96 × s/√n < μ < X + 1.96 × s/√n) = 0.95, interpreted as the long-run relative frequency over many repetitions of sampling, asserts that about 95% of the intervals will cover μ.

Once the values of X, s, and the interval are found from the random sample, it is no longer sensible to speak about the probability of its covering the fixed quantity μ. This numerical realization of the confidence interval can either cover the true mean or miss it, and we never know which. We can only make statements about what would happen over many repeated applications of a confidence interval procedure.
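The long-run frequency interpretation can be illustrated by simulation. This is a sketch under stated assumptions that are not in the lesson: a normal population with hypothetical values μ=68 and σ=3, samples of size n=50, and 2000 repetitions. It counts how often the interval X ± 1.96 × s/√n covers μ.

```python
import math
import random
import statistics

def coverage(mu=68.0, sigma=3.0, n=50, reps=2000, z=1.96, seed=0):
    """Fraction of simulated intervals xbar -/+ z*s/sqrt(n) that contain mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        if xbar - z * se < mu < xbar + z * se:
            hits += 1
    return hits / reps

rate = coverage()
print("fraction of intervals covering mu:", rate)
```

The printed fraction should be close to 0.95. Each individual interval either covers μ or does not; only the long-run proportion is described by the 95% figure.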

Other Confidence Intervals Based on the Central Limit Theorem

Many confidence intervals have the form

estimator ± 1.96 × std(estimator)

or, replacing the standard deviation of the estimator std(estimator) by an estimate, called the standard error and denoted se(estimator),

estimator ± 1.96 × se(estimator).

Look on p. 321 for the difference of two means and on p. 314 for one proportion. We now discuss the one proportion case.

You may recall that for data having just two values, 0 and 1, or "success" and "failure", or "defective" and "not defective," we count the number of 1's, or the number of successes, or the number of defectives in n trials and denote that by Y. This Y has a binomial distribution with parameters n and π, where π is the probability of success on one trial. The formulas on p. 314 use the fact that the variance of Y is nπ(1-π). The estimator of π is P = Y/n, the sample proportion. Because the variance of Y is nπ(1-π), the variance of P is π(1-π)/n. Thus the standard deviation of P is the square root of this variance, denoted on p. 314 as σp. To get a confidence interval for π, we just replace π by P and get

P ± 1.96 × √[P(1-P)/n]

for our 95% confidence interval.

Example. A sample of n=50 resistors from a large batch of resistors results in 9 that are out of tolerance. Give a 95% confidence interval for the proportion of the batch that are out of tolerance. Because P = 9/50 = .18, the answer is

.18 ± 1.96 × √[.18(1-.18)/50] = .18 ± .11, or (.07, .29).
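The resistor example can be reproduced with a short sketch of the one proportion interval P ± 1.96 × √[P(1-P)/n]. The function name is hypothetical; the inputs are the counts from the example.

```python
import math

def proportion_confidence_interval(successes, n, z=1.96):
    """Large-sample interval for a proportion: P -/+ z * sqrt(P(1-P)/n)."""
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p, margin, (p - margin, p + margin)

p, margin, (low, high) = proportion_confidence_interval(successes=9, n=50)
print(round(p, 2), round(margin, 2), round(low, 2), round(high, 2))
# 0.18 0.11 0.07 0.29
```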

It is important to distinguish between continuous type data, where we are interested in estimating means, and binomial type data, where we are interested in proportions. Likewise, we need to distinguish between one sample problems, where we are estimating just one mean or proportion, and two sample problems, where we are estimating the difference of two means or proportions. This difference shows up in the se(estimator) part of the confidence interval. Otherwise all the formulas have the same basic form.

Sample Size Problems

If we want our confidence interval for the mean

(X - 1.96 × s/√n, X + 1.96 × s/√n)

to have the form

(X - d, X + d),

then equating terms leads to n = [1.96 s/d]². Of course we do not have s, so we have to replace it by a guess of the population standard deviation.

Example.

In the population of adult male heights, suppose that we wanted a 90% confidence interval of the form X ± 1. Then using the guess s=3, we have n = [(1.645)(3)/1]² = 24.35, rounding to n=24. We have used 1.645 in place of 1.96 because a 90% confidence interval was requested.
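The sample size formula n = [z s/d]² is easy to evaluate directly. In the sketch below the function name is hypothetical and the inputs are those of the example. The lesson rounds 24.35 to the nearest integer; the sketch also prints the value rounded up, since only n=25 keeps the half-width at or below d=1 for the guessed s.

```python
import math

def sample_size_for_mean(z, s_guess, d):
    """Solve z * s / sqrt(n) = d for n (not yet rounded)."""
    return (z * s_guess / d) ** 2

n_raw = sample_size_for_mean(z=1.645, s_guess=3.0, d=1.0)
print(round(n_raw, 2))   # 24.35
print(math.ceil(n_raw))  # 25

# Half-width actually achieved for n = 24 and n = 25 with the guessed s.
for n in (24, 25):
    print(n, round(1.645 * 3.0 / math.sqrt(n), 3))
```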

For proportion problems, we equate 1.96 × √[P(1-P)/n] to d and obtain

n = [1.96/d]² × P(1-P).

Of course we don't know P, so we must supply a guess or set P=1/2, which gives the largest n for any value of P.

Example. A population of resistors is to be sampled to determine the number of out of tolerance resistors. If d=.05, find the required sample size for a 95% confidence interval when the initial guess is around 10% out of tolerance. The answer is

n = [1.96/.05]² × .10(1-.10) = 138.3, rounding to n=138.

If no guess were available, then we would use

n = [1.96/.05]² × .5(1-.5) = 384.16, rounding to n=384.

The difference in the two sample size guesses illustrates the difference in the standard deviation of P at .10 versus .5.
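Both resistor calculations follow from the same formula, n = [z/d]² × P(1-P). The sketch below uses a hypothetical function name and defaults the guess to P=1/2, the most conservative choice.

```python
def sample_size_for_proportion(z, d, p_guess=0.5):
    """Solve z * sqrt(P(1-P)/n) = d for n (not yet rounded)."""
    return (z / d) ** 2 * p_guess * (1 - p_guess)

n_with_guess = sample_size_for_proportion(z=1.96, d=0.05, p_guess=0.10)
n_conservative = sample_size_for_proportion(z=1.96, d=0.05)

print(round(n_with_guess, 1))    # 138.3
print(round(n_conservative, 2))  # 384.16
```

The conservative choice requires nearly three times as many observations because P(1-P) is .25 at P=.5 but only .09 at P=.10.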