11. HYPOTHESIS TESTING
A. Define a hypothesis, describe the steps of hypothesis testing, and describe and interpret the choice of null and alternative hypotheses:
A hypothesis is a statement about the value of a population parameter, developed for the purpose of testing a theory or belief. Hypotheses are stated in terms of the population parameter to be tested, such as the population mean µ. The process of hypothesis testing consists of a series of steps:
State the hypothesis
↓
Select the appropriate test statistic
↓
Specify the level of significance
↓
State the decision rule regarding the hypothesis
↓
Collect the sample and calculate the sample statistics
↓
Make a decision regarding the hypothesis
↓
Make a decision based on the results of the test
The Null Hypothesis and Alternative Hypothesis
The null hypothesis, designated H0, is the hypothesis that the researcher wants to reject. It is the hypothesis that is actually tested and is the basis for the selection of the test statistic. The null is generally stated as a simple statement about a population parameter.
The alternative
hypothesis, designated Ha, is what is concluded if there is sufficient
evidence to reject the null hypothesis. It is usually the alternative
hypothesis that you are really trying to assess. Why? Since you can never
really prove anything with statistics, when the null hypothesis is discredited,
the implication is that the alternative hypothesis is valid.
B. Distinguish between one-tailed and two-tailed
tests of hypotheses :
The alternative hypothesis can be one-sided or two-sided. A one-sided test is referred to as a one-tailed test, and a two-sided test is referred to as a two-tailed test. Whether the test is one- or two-sided depends
on the proposition being tested. If a researcher wants to test whether the
return on the stock options is greater than zero, a one-tailed test should be
used. However, a two-tailed test should be used if the research question is
whether the return on options is simply different from zero. Two-sided tests allow
for deviation on both sides of the hypothesized value (zero). In practice most
hypothesis tests are constructed as two-tailed tests.
A two-tailed test for the population mean may be structured as :
H0 : µ = µ0 versus Ha : µ ≠ µ0
Since the alternative hypothesis allows for values above and below the
hypothesized parameter, a two-tailed test uses two critical values (or rejection points).
The general decision rule for a two-tailed test is :
Reject H0 if: test statistic > upper critical value, or
test statistic < lower critical value.
The general decision rule for a two-tailed z-test at α = 0.05 can be stated as:
Reject H0 if: test statistic > 1.96, or
test statistic < -1.96.
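The two-tailed decision rule above can be sketched in a few lines using only the Python standard library; the test-statistic values passed in are hypothetical, chosen only for illustration:

```python
# Sketch of the two-tailed z-test decision rule at alpha = 0.05.
from statistics import NormalDist

alpha = 0.05
# Two-tailed critical value z_{alpha/2}; the rejection points are +/- z_crit.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

def reject_null(test_statistic: float) -> bool:
    """Reject H0 if the statistic falls in either rejection region (tail)."""
    return test_statistic > z_crit or test_statistic < -z_crit

print(round(z_crit, 2))   # 1.96
print(reject_null(2.3))   # True  (falls in the upper tail)
print(reject_null(-0.5))  # False (fail to reject H0)
```

Note that both tails reject: a statistic of -2.3 would be rejected just as 2.3 is.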
A one-tailed test for the population mean may be structured as either:
Upper tail : H0 : µ ≤
µ0 versus Ha : µ > µ0 or
Lower tail : H0 : µ ≥ µ0 versus Ha : µ < µ0.
The Choice of the
Null and Alternative Hypotheses
The most common null hypothesis is an "equal to" hypothesis. Combined with a "not equal to" alternative, this requires a two-tailed test. The alternative is often the hoped-for hypothesis.
When the null is "less than or equal to," the (mutually exclusive) alternative is framed as "greater than," and a one-tailed test is appropriate. If we are trying to demonstrate that a return is greater than the risk-free rate, this would be the correct formulation.
C. Define and interpret a test statistic, a Type I and a Type II error, and a significance level:
Hypothesis testing involves two statistics : the test statistic
calculated from the sample data and the critical value of the test statistic.
The value of the computed test statistic relative to the critical value is a
key step in assessing the validity of a hypothesis.
A test statistic is calculated by comparing the point estimate of the population parameter with the hypothesized value of the parameter. The test statistic is the difference between the sample statistic and the hypothesized value, scaled by the standard error of the sample statistic:
test statistic = (sample statistic - hypothesized value) / standard error of the sample statistic
The standard error of the sample statistic is the standard deviation of the sampling distribution of the statistic. When the sample statistic is the sample mean, x̄, the standard error of the sample statistic for sample size n is calculated as:
σx̄ = σ / √n when the population standard deviation, σ, is known, and
sx̄ = s / √n when the population standard deviation is unknown and the sample standard deviation, s, is used in its place.
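As a quick numeric sketch of the two cases, with an invented sample of eight observations and an assumed known σ of 0.3:

```python
# Standard error of the sample mean: known vs. unknown population std. dev.
import math
import statistics

sample = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 9.7]
n = len(sample)

# Case 1: population standard deviation known -> sigma / sqrt(n)
sigma = 0.3
se_known = sigma / math.sqrt(n)

# Case 2: unknown -> use the sample standard deviation s / sqrt(n)
s = statistics.stdev(sample)   # uses (n - 1) in the denominator
se_unknown = s / math.sqrt(n)

print(round(se_known, 4))      # 0.1061
print(round(se_unknown, 4))    # 0.0996
```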
Type I and Type II Errors
Keep in mind that hypothesis testing is used to make inferences about the
parameters of a given population on the basis of statistics computed for a
sample that is drawn from the population. We must be aware that there is some
probability that the sample, in some way, does not represent the population,
and any conclusion based on the sample about the population may be made in
error.
When drawing inferences from a hypothesis test, there are two types of
error :
- Type I error : the rejection of the null hypothesis when it is actually true.
- Type II error : the failure to reject the null hypothesis when it is actually false.
The significance level is the probability of making a Type I error and is
designated by α. For instance, a significance level of 5% means there is a
5% chance of rejecting a true null hypothesis.
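A small simulation makes the significance level concrete: when the null is actually true, a test at α = 0.05 should reject in roughly 5% of samples. The population parameters, sample size, seed, and trial count below are arbitrary choices for illustration:

```python
# Simulating the Type I error rate of a two-tailed z-test when H0 is true.
import math
import random
from statistics import NormalDist, mean

random.seed(42)
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
mu0, sigma, n = 0.0, 1.0, 25   # H0 is true: samples really have mean mu0

rejections = 0
trials = 2000
for _ in range(trials):
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (mean(sample) - mu0) / (sigma / math.sqrt(n))
    if abs(z) > z_crit:        # Type I error: rejecting a true null
        rejections += 1

print(rejections / trials)     # close to 0.05
```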
D. Explain a decision rule, the power of a test,
and the relation between confidence intervals and hypothesis tests :
The decision for a hypothesis test is to either reject the null
hypothesis or fail to reject the null hypothesis. Note that it is statistically
incorrect to say “accept” the null hypothesis; it can only be supported or
rejected. The decision rule for rejecting or failing to reject the null hypothesis is based on the
distribution of the test statistic.
A decision rule is specific and quantitative. Once we have determined whether a one- or two-tailed test is appropriate, the significance level we require, and the distribution of the test statistic, we can calculate the exact critical value for the test statistic. Then we have a decision rule of the following form: if the test statistic is (greater than, less than) the value X, reject the null.
The Power of a Test
While the significance level of a test is the probability of rejecting
the null hypothesis when it is true, the power of a test is the probability of
correctly rejecting the null hypothesis when it is false. The power of a test
is actually one minus the probability of making a Type II error, or 1 –
P(Type II error). When more than one test statistic may be used, the power of
the test for the competing test statistics may be useful in deciding which test
statistic to use. Ordinarily, we wish to use the test statistic that provides
the most powerful test among all possible tests.
Decision         | H0 is true                        | H0 is false
Do not reject H0 | Correct decision                  | Incorrect decision: Type II error
Reject H0        | Incorrect decision: Type I error; | Correct decision;
                 | significance level, α,            | power of the test
                 | = P(Type I error)                 | = 1 - P(Type II error)
The relation between Confidence Intervals and
Hypothesis Tests
A confidence interval is a range of values within which the researcher believes
the true population parameter may lie.
A confidence interval is determined as:
{sample statistic - (critical value)(standard error)} ≤ population parameter ≤ {sample statistic + (critical value)(standard error)}
The interpretation of a confidence interval is that for a level of
confidence of 95%, for example, there is a 95% probability that the true
population parameter is contained in the interval.
From the previous expression, we can see that a confidence interval and a
hypothesis test are linked by the critical value :
-critical value ≤ test statistic ≤ +critical value
This is the range within which we fail to reject the null for a
two-tailed hypothesis test at a given level of significance.
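The equivalence can be checked numerically: a hypothesized mean µ0 lies inside the 95% confidence interval exactly when the two-tailed z-test at 5% fails to reject it. The sample values and the assumed known σ below are invented:

```python
# Link between a 95% confidence interval and a two-tailed z-test at 5%.
import math
from statistics import NormalDist, mean

sample = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2]
sigma = 0.2                          # assumed known population std. dev.
n = len(sample)
x_bar = mean(sample)
z_crit = NormalDist().inv_cdf(0.975)
se = sigma / math.sqrt(n)

ci = (x_bar - z_crit * se, x_bar + z_crit * se)

def fail_to_reject(mu0: float) -> bool:
    z = (x_bar - mu0) / se
    return -z_crit <= z <= z_crit    # inside the non-rejection region

mu0 = 5.0
print(ci[0] <= mu0 <= ci[1])   # True: mu0 inside the CI ...
print(fail_to_reject(mu0))     # True: ... matches the test decision
```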
E. Distinguish between a statistical result and
an economically meaningful result :
Statistical significance does not necessarily imply economic
significance. Several factors must be
considered.
One important consideration is transaction costs. Once we consider the costs of buying and selling the securities, we may find that the mean positive returns to the strategy are not large enough to remain positive after costs. Taxes are another factor that may make a seemingly attractive strategy a poor one in practice. A third reason that statistically significant results may not be economically significant is risk.
All these factors could make committing funds to a strategy unattractive,
even though the statistical evidence of positive returns is highly significant.
F. Explain and interpret the p-value as it
relates to hypothesis testing :
The p-value is the probability of obtaining a test statistic that would
lead to a rejection of the null hypothesis, assuming the null hypothesis is
true. For one-tailed tests, the p-value is the probability that lies above the
computed test statistic for upper tail tests or below the computed test
statistic for lower tail tests. For two-tailed tests, the p-value is the probability that lies above the positive value of the computed test statistic plus the probability that lies below the negative value of the computed test statistic.
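For a z-test these probabilities come straight from the standard normal CDF. The test statistic value of 2.0 below is hypothetical:

```python
# p-values for one-tailed and two-tailed z-tests, via the normal CDF.
from statistics import NormalDist

z = 2.0
cdf = NormalDist().cdf

p_upper = 1 - cdf(z)           # one-tailed, upper-tail test
p_lower = cdf(-z)              # one-tailed, lower-tail test (for a negative z)
p_two = 2 * (1 - cdf(abs(z)))  # two-tailed: mass in both tails

print(round(p_upper, 4))   # 0.0228
print(round(p_two, 4))     # 0.0455
```

With a two-tailed p-value of about 0.0455, the null would be rejected at the 5% significance level but not at 1%.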
G. Identify the appropriate test statistic and
interpret the results for a hypothesis test concerning the population mean of
both large and small samples when the population is normally distributed and
the variance is 1) known or 2) unknown :
When hypothesis testing, the choice between using a critical value based
on the t-distribution or the z-distribution depends on sample size, the
distribution of the population, and whether or not the variance of the
population is known.
The t-test
The t-test is a widely used hypothesis test that employs a test statistic distributed according to a t-distribution. Use the t-test if the population variance is unknown and either of the following conditions exists:
- The sample is large (n ≥ 30).
- The sample is small (less than 30), but the distribution of the
population is normal or approximately normal.
If the sample is small and the distribution is non-normal, we have no
reliable statistical test.
The computed value for the test statistic based on the t-distribution is referred to as the t-statistic. For hypothesis tests of a population mean, a t-statistic with n - 1 degrees of freedom is computed as:
t(n-1) = (x̄ - µ0) / (s / √n)
with:
x̄ = sample mean
µ0 = hypothesized population mean (i.e., the null)
s = standard deviation of the sample
n = sample size
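The t-statistic can be computed by hand with the standard library; the ten observations and the hypothesized mean below are invented for illustration:

```python
# One-sample t-statistic: (x_bar - mu0) / (s / sqrt(n)), df = n - 1.
import math
import statistics

sample = [14.1, 13.8, 14.5, 14.2, 13.9, 14.4, 14.0, 13.7, 14.3, 14.1]
mu0 = 14.0                       # hypothesized population mean (the null)
n = len(sample)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)     # sample std. dev., (n - 1) denominator

t = (x_bar - mu0) / (s / math.sqrt(n))
df = n - 1
print(df)            # 9
print(round(t, 3))   # 1.225
```

The statistic would then be compared with the critical t-value for 9 degrees of freedom at the chosen significance level.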
The z-test
The z-test is the appropriate hypothesis test of the population mean when the population is normally distributed with known variance. The computed test statistic used with the z-test is referred to as the z-statistic. The z-statistic for a hypothesis test for a population mean is computed as follows:
z = (x̄ - µ0) / (σ / √n)
with:
x̄ = sample mean
µ0 = hypothesized population mean (i.e., the null)
σ = standard deviation of the population
n = sample size
To test a hypothesis, the z-statistic is compared to the critical z-value
corresponding to the significance of the test.
When the sample is large and the population variance is unknown, the z-statistic is:
z = (x̄ - µ0) / (s / √n)
Note the use of the sample standard deviation, s, versus the population standard deviation, σ. Remember, this is acceptable if the sample size is large, although the t-statistic is the more conservative measure when the population variance is unknown.
H. Identify the appropriate test statistic and
interpret the results for a hypothesis test concerning the population mean of
two at least approximately normal distributed populations, based on independent
random samples with 1) equal or 2) unequal assumed variances :
There are two t-tests that are used to test differences between the means of two populations. Application of either test requires that we are reasonably certain that our samples are independent and that they are taken from two populations that are normally distributed. Both of these t-tests can be used when the population variances are unknown. In one case, the population variances are assumed to be equal, and the sample observations are pooled. In the other case, however, no assumption is made regarding the equality of the two population variances, and the t-test uses an approximated value for the degrees of freedom.
A pooled variance is used with the t-test for testing the hypothesis that the means of two normally distributed populations are equal, when the variances of the populations are unknown but assumed to be equal.
Assuming independent samples, the t-statistic in this case is computed as:
t = (x̄1 - x̄2) / √(sp²/n1 + sp²/n2)
where the pooled variance is:
sp² = [(n1 - 1)s1² + (n2 - 1)s2²] / (n1 + n2 - 2)
where:
s1² = variance of the first sample
s2² = variance of the second sample
n1 = number of observations in the first sample
n2 = number of observations in the second sample
The degrees of freedom, df, is (n1 + n2 - 2).
Since we assume that the variances are equal, we just add the variances of the two sample means in order to calculate the standard error in the denominator.
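The pooled-variance t-statistic can be sketched as follows; both samples are fabricated for illustration:

```python
# Pooled-variance t-test statistic for two independent samples,
# assuming equal (but unknown) population variances.
import math
import statistics

sample1 = [10.0, 10.4, 9.8, 10.2, 10.1, 9.9]
sample2 = [9.5, 9.9, 9.7, 9.6, 9.8]
n1, n2 = len(sample1), len(sample2)
x1, x2 = statistics.mean(sample1), statistics.mean(sample2)
v1, v2 = statistics.variance(sample1), statistics.variance(sample2)

# Pooled estimate of the common variance
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t = (x1 - x2) / math.sqrt(sp2 / n1 + sp2 / n2)
df = n1 + n2 - 2
print(df)            # 9
print(round(t, 3))   # 3.146
```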
The t-test for equality of population means when the populations are normally distributed and the variances are unknown and assumed to be unequal uses the sample variances of both populations. Assuming independent samples, the t-statistic in this case is computed as follows:
t = (x̄1 - x̄2) / √(s1²/n1 + s2²/n2)
where:
degrees of freedom = (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1 - 1) + (s2²/n2)²/(n2 - 1)]
and where:
s1² = variance of the first sample
s2² = variance of the second sample
n1 = number of observations in the first sample
n2 = number of observations in the second sample
If the sample means are very close together, the numerator of the t-statistic is small, and we do not reject equality. If the sample means are far apart, the numerator of the t-statistic is large, and we reject equality.
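For the unequal-variance case, the statistic and the approximated degrees of freedom can be sketched the same way (same fabricated samples as in the pooled example):

```python
# t-statistic and approximate df when the two population variances
# are NOT assumed equal; the data are invented for illustration.
import math
import statistics

sample1 = [10.0, 10.4, 9.8, 10.2, 10.1, 9.9]
sample2 = [9.5, 9.9, 9.7, 9.6, 9.8]
n1, n2 = len(sample1), len(sample2)
v1 = statistics.variance(sample1)
v2 = statistics.variance(sample2)

se2 = v1 / n1 + v2 / n2    # each sample's own variance, not pooled
t = (statistics.mean(sample1) - statistics.mean(sample2)) / math.sqrt(se2)

# Approximated degrees of freedom from the formula in the text
df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
print(round(t, 3))    # 3.244
print(round(df, 1))   # 8.9
```

Note the df is generally not an integer and is smaller than n1 + n2 - 2, making this the more conservative test.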
I. Identify the appropriate test statistic and
interpret the results for a hypothesis test concerning the mean difference of
two normally distributed populations :
While the tests considered in the previous section were of the difference
between the means of two independent samples, sometimes our samples may be
dependent. If the observations in the two samples both depend on some other
factor, we can construct a “paired comparisons” test of whether the means of
the differences between observations for the two samples are different.
Dependence may result from an event that affects both sets of observations for
a number of companies or because observations for two firms over time are both
influenced by market returns or economic conditions.
Remember, the paired comparisons test also requires that the sample data
be normally distributed. Although we frequently just want to test the
hypothesis that the mean of the differences in the pairs is zero, the general
form of the test for any hypothesized mean difference is as follows :
H0 : μd = μdz versus Ha : μd ≠ μdz
where:
μd = mean of the population of paired differences
μdz = hypothesized mean of paired differences, which is commonly zero
For
one-tail tests, the hypotheses are structured as either :
H0 : μd ≤ μdz versus Ha : μd > μdz
or, H0 : μd ≥ μdz versus Ha : μd < μdz
For the paired comparisons test, the t-statistic with n - 1 degrees of freedom is computed as:
t = (d̄ - μdz) / sd̄
where:
d̄ = sample mean difference = (1/n) Σ di, summed over i = 1 to n
di = difference between the ith pair of observations
sd̄ = standard error of the mean difference = sd / √n
sd = sample standard deviation = √[Σ (di - d̄)² / (n - 1)]
n = the number of paired observations
- The test of the differences in means is used when there are two independent samples.
- A test of the significance of the mean of the differences between paired observations is used when the samples are not independent.
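The paired comparisons t-statistic can be sketched on two invented series (e.g., monthly returns of two funds over the same months, so the observations are dependent):

```python
# Paired comparisons test: t-statistic on the per-pair differences.
import math
import statistics

a = [1.2, 0.8, 1.5, 1.1, 0.9, 1.3, 1.0, 1.4]
b = [1.0, 0.7, 1.2, 1.0, 0.8, 1.1, 0.9, 1.1]
d = [x - y for x, y in zip(a, b)]   # paired differences d_i
n = len(d)
d_bar = statistics.mean(d)          # sample mean difference
s_d = statistics.stdev(d)           # sample std. dev. of the differences
se_d = s_d / math.sqrt(n)           # standard error of the mean difference

mu_dz = 0.0                         # hypothesized mean difference (commonly zero)
t = (d_bar - mu_dz) / se_d
print(n - 1)          # 7 degrees of freedom
print(round(t, 2))    # 5.58
```

Only the differences enter the calculation, which is why dependence between the two original series is not a problem.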
J. Identify the appropriate test statistic and
interpret the results for a hypothesis test concerning 1) the variance of a
normally distributed population, and 2) the equality of the variances of two
normally distributed populations based on two independent random samples :
The chi-square test is used for hypothesis tests concerning the variance
of a normally distributed population. The hypotheses for two-tailed tests are
structured as :
H0 : σ2 = σ02
versus Ha : σ2 ≠ σ02
The hypotheses for one-tailed tests are structured as :
H0 : σ2 ≤ σ02
versus Ha : σ2 > σ02 or,
H0 : σ2
≥ σ02 versus Ha : σ2 <
σ02
Hypothesis tests of the population variance require the use of a chi-square distributed test statistic, denoted χ².
The chi-square distribution is asymmetrical and approaches the normal
distribution in shape as the degrees of freedom increase.
The chi-square test statistic with n - 1 degrees of freedom is computed as:
χ²(n-1) = (n - 1)s² / σ0²
where:
n = sample size
s² = sample variance
σ0² = hypothesized value for the population variance
Note that because the chi-square distribution is bounded below by zero, chi-square values cannot be negative.
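Computing the chi-square statistic itself requires only the sample variance; the sample and the hypothesized variance below are invented:

```python
# Chi-square test statistic for a single population variance:
# chi2 = (n - 1) * s^2 / sigma0^2, with n - 1 degrees of freedom.
import statistics

sample = [2.1, 1.9, 2.4, 2.0, 2.2, 1.8, 2.3, 2.1, 2.0, 2.2]
n = len(sample)
s2 = statistics.variance(sample)   # sample variance, (n - 1) denominator
sigma0_sq = 0.02                   # hypothesized population variance

chi_sq = (n - 1) * s2 / sigma0_sq
print(n - 1)              # 9 degrees of freedom
print(round(chi_sq, 2))   # 15.0
```

The statistic would then be compared with the chi-square critical value(s) for 9 degrees of freedom at the chosen significance level.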
Testing
the Equality of the Variances of Two Normally Distributed Populations, Based on
Two Independent Random Samples
The hypotheses concerned with
the equality of the variances of two populations are tested with an
F-distributed test statistic. Hypothesis testing using a test statistic that
follows an F-distribution is referred to as the F-test. The F-test is used
under the assumption that the populations from which the samples are drawn are
normally distributed and that the samples are independent.
If we let σ1² and σ2² represent the variances of normal Population 1 and Population 2, respectively, the hypotheses for the two-tailed F-test of differences in the variances can be structured as:
H0 : σ12
= σ22 versus Ha : σ12
≠ σ22
and the one-sided test structures can be specified as :
H0 : σ12 ≤
σ22 versus Ha : σ12
> σ22, or H0 : σ12 ≥
σ22 versus Ha : σ12
< σ22
The test statistic for the F-test is the ratio of the sample variances. The F-statistic is computed as:
F = s1² / s2²
where:
s1² = variance of the sample of n1 observations drawn from Population 1
s2² = variance of the sample of n2 observations drawn from Population 2
The F-distribution is right-skewed and is truncated at zero on the
left-hand side. The shape of the F-distribution is determined by two separate degrees
of freedom. The rejection region is in the right-side tail of the distribution.
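A sketch of the F-statistic on two invented samples follows. Placing the larger sample variance in the numerator (a common convention, so that F ≥ 1 and only the right-tail critical value is needed) is an assumption of this sketch, not something stated in the text:

```python
# F-statistic as the ratio of two sample variances, larger variance on top.
import statistics

sample1 = [4.2, 4.8, 3.9, 4.5, 4.1, 4.6, 4.3, 4.0]
sample2 = [4.1, 4.3, 4.2, 4.4, 4.2, 4.3]
v1 = statistics.variance(sample1)
v2 = statistics.variance(sample2)

F = max(v1, v2) / min(v1, v2)
# The two degrees of freedom follow the numerator/denominator samples.
df_num = (len(sample1) if v1 >= v2 else len(sample2)) - 1
df_den = (len(sample2) if v1 >= v2 else len(sample1)) - 1
print(df_num, df_den)   # 7 5
print(round(F, 2))      # 8.83
```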
K. Distinguish between parametric and
nonparametric tests and describe the situations in which the use of parametric
tests may be appropriate :
Parametric
tests rely on assumptions regarding
the distribution of the population and are specific to population parameters.
Nonparametric
tests either do not consider a
particular population parameter or have few assumptions about the population
that is sampled. Nonparametric tests are used when there is concern about
quantities other than the parameters of a distribution or when the assumptions
of parametric tests can’t be supported. They are also used when the data are
not suitable for parametric tests.
Situations where a nonparametric test is called for include the following:
1. The assumptions about the distribution of the random variable that support a parametric test are not met.
2. The data are ranks rather than values.
3. The hypothesis does not involve the parameters of the distribution, such as testing whether a variable is normally distributed. We can use a nonparametric test, called a runs test, to determine whether data are random. A runs test provides an estimate of the probability that a series of changes (+, +, -, -, +, -, ...) are random.
The Spearman rank correlation test can be used when the data are not normally distributed. Consider the performance ranks of 20 mutual funds for 2 years. The ranks (1 through 20) are not normally distributed, so a standard t-test of the correlations is not appropriate. A large positive value of the Spearman rank correlation, such as 0.85, would indicate that a high (low) rank in one year is associated with a high (low) rank in the second year. Alternatively, a large negative rank correlation would indicate that a high rank in year 1 suggests a low rank in year 2, and vice versa.
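The mutual-fund example can be sketched with the standard no-ties formula for the Spearman rank correlation, ρ = 1 - 6 Σ di² / [n(n² - 1)]; the year-2 ranks below are invented to illustrate strong rank persistence:

```python
# Spearman rank correlation on invented fund ranks (no ties), using
# rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), d_i = rank difference.
year1 = list(range(1, 21))                       # ranks 1..20 in year 1
year2 = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9,
         12, 11, 14, 13, 16, 15, 18, 17, 20, 19] # similar ranks in year 2

n = len(year1)
d_sq = sum((a - b) ** 2 for a, b in zip(year1, year2))
rho = 1 - 6 * d_sq / (n * (n ** 2 - 1))
print(round(rho, 3))   # 0.985: high rank in year 1 goes with high rank in year 2
```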