11. HYPOTHESIS TESTING
A. Define a hypothesis, describe the steps of hypothesis testing, and describe and interpret the choice of null and alternative hypotheses:
A hypothesis is a statement about the value of a population parameter, developed for the purpose of testing a theory or belief. Hypotheses are stated in terms of the population parameter to be tested, such as the population mean µ. The process of hypothesis testing consists of a series of steps:
State the hypothesis
↓
Select the appropriate test statistic
↓
Specify the level of significance
↓
State the decision rule regarding the hypothesis
↓
Collect the sample and calculate the sample statistics
↓
Make a decision regarding the hypothesis
↓
Make a decision based on the results of the test
The Null Hypothesis and Alternative Hypothesis
The null hypothesis, designated H0, is the hypothesis that the researcher wants to reject. It is the hypothesis that is actually tested and is the basis for the selection of the test statistic. The null is generally stated as a simple statement about a population parameter.
The alternative
hypothesis, designated Ha, is what is concluded if there is sufficient
evidence to reject the null hypothesis. It is usually the alternative
hypothesis that you are really trying to assess. Why? Since you can never
really prove anything with statistics, when the null hypothesis is discredited,
the implication is that the alternative hypothesis is valid.
B. Distinguish between one-tailed and two-tailed
tests of hypotheses :
The alternative hypothesis can be one-sided or two-sided. A one-sided test is referred to as a one-tailed test, and a two-sided test is referred to as a two-tailed test. Whether the test is one- or two-sided depends
on the proposition being tested. If a researcher wants to test whether the
return on the stock options is greater than zero, a one-tailed test should be
used. However, a two-tailed test should be used if the research question is
whether the return on options is simply different from zero. Two-sided tests allow
for deviation on both sides of the hypothesized value (zero). In practice most
hypothesis tests are constructed as two-tailed tests.
A two-tailed test for the population mean may be structured as :
H0 : µ = µ0 versus Ha : µ ≠ µ0
Since the alternative hypothesis allows for values above and below the
hypothesized parameter, a two-tailed test uses two critical values (or rejection points).
The general decision rule for a two-tailed test is :
Reject H0 if: test statistic > upper critical value, or
test statistic < lower critical value.
The general decision rule for a two-tailed z-test at α = 0.05 can be stated as:
Reject H0 if: test statistic > 1.96, or
test statistic < -1.96.
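The two-tailed decision rule above can be sketched in a few lines using only the Python standard library; the test-statistic values passed in are hypothetical, chosen only for illustration:

```python
# Sketch of the two-tailed z-test decision rule at alpha = 0.05.
from statistics import NormalDist

alpha = 0.05
# Two-tailed critical value z_{alpha/2}; the rejection points are +/- z_crit.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

def reject_null(test_statistic: float) -> bool:
    """Reject H0 if the statistic falls in either rejection region (tail)."""
    return test_statistic > z_crit or test_statistic < -z_crit

print(round(z_crit, 2))   # 1.96
print(reject_null(2.3))   # True  (falls in the upper tail)
print(reject_null(-0.5))  # False (fail to reject H0)
```

Note that both tails reject: a statistic of -2.3 would be rejected just as 2.3 is.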
A one-tailed test for the population mean may be structured as either:
Upper tail : H0 : µ ≤
µ0 versus Ha : µ > µ0 or
Lower tail : H0 : µ ≥ µ0 versus Ha : µ < µ0.
The Choice of the
Null and Alternative Hypotheses
The most common null hypothesis is an "equal to" hypothesis. Combined with a "not equal to" alternative, this requires a two-tailed test. The alternative is often the hoped-for hypothesis.
When the null is "less than or equal to," the (mutually exclusive) alternative is framed as "greater than," and a one-tailed test is appropriate. If we are trying to demonstrate that a return is greater than the risk-free rate, this would be the correct formulation.
C. Define and interpret a test statistic, a Type I and a Type II error, and a significance level:
Hypothesis testing involves two statistics : the test statistic
calculated from the sample data and the critical value of the test statistic.
The value of the computed test statistic relative to the critical value is a
key step in assessing the validity of a hypothesis.
A test statistic is calculated by comparing the point estimate of the population parameter with the hypothesized value of the parameter. The test statistic is the difference between the sample statistic and the hypothesized value, scaled by the standard error of the sample statistic:
test statistic = (sample statistic - hypothesized value) / standard error of the sample statistic
The standard error of the sample statistic is the standard deviation of the sampling distribution of the statistic. When the sample statistic is the sample mean, x̄, the standard error of the sample statistic for sample size n is calculated as:
σx̄ = σ / √n when the population standard deviation, σ, is known, and
sx̄ = s / √n when the population standard deviation is unknown and the sample standard deviation, s, is used in its place.
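As a quick numeric sketch of the two cases, with an invented sample of eight observations and an assumed known σ of 0.3:

```python
# Standard error of the sample mean: known vs. unknown population std. dev.
import math
import statistics

sample = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 9.7]
n = len(sample)

# Case 1: population standard deviation known -> sigma / sqrt(n)
sigma = 0.3
se_known = sigma / math.sqrt(n)

# Case 2: unknown -> use the sample standard deviation s / sqrt(n)
s = statistics.stdev(sample)   # uses (n - 1) in the denominator
se_unknown = s / math.sqrt(n)

print(round(se_known, 4))      # 0.1061
print(round(se_unknown, 4))    # 0.0996
```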
Type I and Type II Errors
Keep in mind that hypothesis testing is used to make inferences about the
parameters of a given population on the basis of statistics computed for a
sample that is drawn from the population. We must be aware that there is some
probability that the sample, in some way, does not represent the population,
and any conclusion based on the sample about the population may be made in
error.
When drawing inferences from a hypothesis test, there are two types of
error :
- Type I error : the rejection of the null hypothesis when it is actually true.
- Type II error : the failure to reject the null hypothesis when it is actually false.
The significance level is the probability of making a Type I error and is
designated by α. For instance, a significance level of 5% means there is a
5% chance of rejecting a true null hypothesis.
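A small simulation makes the significance level concrete: when the null is actually true, a test at α = 0.05 should reject in roughly 5% of samples. The population parameters, sample size, seed, and trial count below are arbitrary choices for illustration:

```python
# Simulating the Type I error rate of a two-tailed z-test when H0 is true.
import math
import random
from statistics import NormalDist, mean

random.seed(42)
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
mu0, sigma, n = 0.0, 1.0, 25   # H0 is true: samples really have mean mu0

rejections = 0
trials = 2000
for _ in range(trials):
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (mean(sample) - mu0) / (sigma / math.sqrt(n))
    if abs(z) > z_crit:        # Type I error: rejecting a true null
        rejections += 1

print(rejections / trials)     # close to 0.05
```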
D. Explain a decision rule, the power of a test,
and the relation between confidence intervals and hypothesis tests :
The decision for a hypothesis test is to either reject the null
hypothesis or fail to reject the null hypothesis. Note that it is statistically
incorrect to say “accept” the null hypothesis; it can only be supported or
rejected. The decision rule for rejecting or failing to reject the null hypothesis is based on the
distribution of the test statistic.
A decision rule is specific and quantitative. Once we have determined whether a one- or two-tailed test is appropriate, the significance level we require, and the distribution of the test statistic, we can calculate the exact critical value for the test statistic. Then we have a decision rule of the following form: if the test statistic is (greater than, less than) the value X, reject the null.
The Power of a Test
While the significance level of a test is the probability of rejecting
the null hypothesis when it is true, the power of a test is the probability of
correctly rejecting the null hypothesis when it is false. The power of a test
is actually one minus the probability of making a Type II error, or 1 –
P(Type II error). When more than one test statistic may be used, the power of
the test for the competing test statistics may be useful in deciding which test
statistic to use. Ordinarily, we wish to use the test statistic that provides
the most powerful test among all possible tests.
Decision         | H0 is true                        | H0 is false
Do not reject H0 | Correct decision                  | Incorrect decision: Type II error
Reject H0        | Incorrect decision: Type I error; | Correct decision;
                 | significance level, α,            | power of the test
                 | = P(Type I error)                 | = 1 - P(Type II error)
The relation between Confidence Intervals and
Hypothesis Tests
A confidence interval is a range of values within which the researcher believes
the true population parameter may lie.
A confidence interval is determined as:
{sample statistic - (critical value)(standard error)} ≤ population parameter ≤ {sample statistic + (critical value)(standard error)}
The interpretation of a confidence interval is that for a level of
confidence of 95%, for example, there is a 95% probability that the true
population parameter is contained in the interval.
From the previous expression, we can see that a confidence interval and a
hypothesis test are linked by the critical value :
-critical value ≤ test statistic ≤ +critical value
This is the range within which we fail to reject the null for a
two-tailed hypothesis test at a given level of significance.
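The equivalence can be checked numerically: a hypothesized mean µ0 lies inside the 95% confidence interval exactly when the two-tailed z-test at 5% fails to reject it. The sample values and the assumed known σ below are invented:

```python
# Link between a 95% confidence interval and a two-tailed z-test at 5%.
import math
from statistics import NormalDist, mean

sample = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2]
sigma = 0.2                          # assumed known population std. dev.
n = len(sample)
x_bar = mean(sample)
z_crit = NormalDist().inv_cdf(0.975)
se = sigma / math.sqrt(n)

ci = (x_bar - z_crit * se, x_bar + z_crit * se)

def fail_to_reject(mu0: float) -> bool:
    z = (x_bar - mu0) / se
    return -z_crit <= z <= z_crit    # inside the non-rejection region

mu0 = 5.0
print(ci[0] <= mu0 <= ci[1])   # True: mu0 inside the CI ...
print(fail_to_reject(mu0))     # True: ... matches the test decision
```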
E. Distinguish between a statistical result and
an economically meaningful result :
Statistical significance does not necessarily imply economic
significance. Several factors must be
considered.
One important consideration is transaction costs. Once we consider the costs of buying and selling the securities, we may find that the mean positive returns to the strategy are not large enough to remain positive after costs. Taxes are another factor that may make a seemingly attractive strategy a poor one in practice. A third reason that statistically significant results may not be economically significant is risk.
All these factors could make committing funds to a strategy unattractive,
even though the statistical evidence of positive returns is highly significant.
F. Explain and interpret the p-value as it
relates to hypothesis testing :
The p-value is the probability of obtaining a test statistic that would
lead to a rejection of the null hypothesis, assuming the null hypothesis is
true. For one-tailed tests, the p-value is the probability that lies above the
computed test statistic for upper tail tests or below the computed test
statistic for lower tail tests. For two-tailed tests, the p-value is the probability that lies above the positive value of the computed test statistic plus the probability that lies below the negative value of the computed test statistic.
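For a z-test these probabilities come straight from the standard normal CDF. The test statistic value of 2.0 below is hypothetical:

```python
# p-values for one-tailed and two-tailed z-tests, via the normal CDF.
from statistics import NormalDist

z = 2.0
cdf = NormalDist().cdf

p_upper = 1 - cdf(z)           # one-tailed, upper-tail test
p_lower = cdf(-z)              # one-tailed, lower-tail test (for a negative z)
p_two = 2 * (1 - cdf(abs(z)))  # two-tailed: mass in both tails

print(round(p_upper, 4))   # 0.0228
print(round(p_two, 4))     # 0.0455
```

With a two-tailed p-value of about 0.0455, the null would be rejected at the 5% significance level but not at 1%.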
G. Identify the appropriate test statistic and
interpret the results for a hypothesis test concerning the population mean of
both large and small samples when the population is normally distributed and
the variance is 1) known or 2) unknown :
When hypothesis testing, the choice between using a critical value based
on the t-distribution or the z-distribution depends on sample size, the
distribution of the population, and whether or not the variance of the
population is known.
The t-test
The t-test is a widely used hypothesis test that employs a test statistic distributed according to a t-distribution. Use the t-test if the population variance is unknown and either of the following conditions exists:
- The sample is large (n ≥ 30).
- The sample is small (less than 30), but the distribution of the
population is normal or approximately normal.
If the sample is small and the distribution is non-normal, we have no
reliable statistical test.
The computed value for the test statistic based on the t-distribution is referred to as the t-statistic. For hypothesis tests of a population mean, a t-statistic with n - 1 degrees of freedom is computed as:
t(n-1) = (x̄ - µ0) / (s / √n)
with:
x̄ = sample mean
µ0 = hypothesized population mean (i.e., the null)
s = standard deviation of the sample
n = sample size
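The t-statistic can be computed by hand with the standard library; the ten observations and the hypothesized mean below are invented for illustration:

```python
# One-sample t-statistic: (x_bar - mu0) / (s / sqrt(n)), df = n - 1.
import math
import statistics

sample = [14.1, 13.8, 14.5, 14.2, 13.9, 14.4, 14.0, 13.7, 14.3, 14.1]
mu0 = 14.0                       # hypothesized population mean (the null)
n = len(sample)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)     # sample std. dev., (n - 1) denominator

t = (x_bar - mu0) / (s / math.sqrt(n))
df = n - 1
print(df)            # 9
print(round(t, 3))   # 1.225
```

The statistic would then be compared with the critical t-value for 9 degrees of freedom at the chosen significance level.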
The z-test
The z-test is the appropriate hypothesis test of the population mean when the population is normally distributed with known variance. The computed test statistic used with the z-test is referred to as the z-statistic. The z-statistic for a hypothesis test for a population mean is computed as follows:
z = (x̄ - µ0) / (σ / √n)
with:
x̄ = sample mean
µ0 = hypothesized population mean (i.e., the null)
σ = standard deviation of the population
n = sample size
To test a hypothesis, the z-statistic is compared to the critical z-value
corresponding to the significance of the test.
When the sample is large and the population variance is unknown, the z-statistic is:
z = (x̄ - µ0) / (s / √n)
Note the use of the sample standard deviation, s, versus the population standard deviation, σ. Remember, this is acceptable if the sample size is large, although the t-statistic is the more conservative measure when the population variance is unknown.
H. Identify the appropriate test statistic and
interpret the results for a hypothesis test concerning the population mean of
two at least approximately normal distributed populations, based on independent
random samples with 1) equal or 2) unequal assumed variances :
There are two t-tests that are used to test differences between the means of two populations. Application of either test requires that we are reasonably certain that our samples are independent and that they are taken from two populations that are normally distributed. Both of these t-tests can be used when the population variances are unknown. In one case, the population variances are assumed to be equal, and the sample observations are pooled. In the other case, however, no assumption is made regarding the equality of the two population variances, and the t-test uses an approximated value for the degrees of freedom.
A pooled variance is used with the t-test for testing the hypothesis that the means of two normally distributed populations are equal, when the variances of the populations are unknown but assumed to be equal.
Assuming independent samples, the t-statistic in this case is computed as:
t = (x̄1 - x̄2) / √(sp²/n1 + sp²/n2)
where the pooled variance is:
sp² = [(n1 - 1)s1² + (n2 - 1)s2²] / (n1 + n2 - 2)
where:
s1² = variance of the first sample
s2² = variance of the second sample
n1 = number of observations in the first sample
n2 = number of observations in the second sample
The degrees of freedom, df, is (n1 + n2 - 2).
Since we assume that the variances are equal, we just add the variances of the two sample means in order to calculate the standard error in the denominator.
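The pooled-variance t-statistic can be sketched as follows; both samples are fabricated for illustration:

```python
# Pooled-variance t-test statistic for two independent samples,
# assuming equal (but unknown) population variances.
import math
import statistics

sample1 = [10.0, 10.4, 9.8, 10.2, 10.1, 9.9]
sample2 = [9.5, 9.9, 9.7, 9.6, 9.8]
n1, n2 = len(sample1), len(sample2)
x1, x2 = statistics.mean(sample1), statistics.mean(sample2)
v1, v2 = statistics.variance(sample1), statistics.variance(sample2)

# Pooled estimate of the common variance
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t = (x1 - x2) / math.sqrt(sp2 / n1 + sp2 / n2)
df = n1 + n2 - 2
print(df)            # 9
print(round(t, 3))   # 3.146
```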
The t-test for equality of population means when the populations are normally distributed and the variances are unknown and assumed to be unequal uses the sample variances of both populations. Assuming independent samples, the t-statistic in this case is computed as follows:
t = (x̄1 - x̄2) / √(s1²/n1 + s2²/n2)
where:
degrees of freedom = (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1 - 1) + (s2²/n2)²/(n2 - 1)]
and where:
s1² = variance of the first sample
s2² = variance of the second sample
n1 = number of observations in the first sample
n2 = number of observations in the second sample
If the sample means are very close together, the numerator of the t-statistic is small, and we do not reject equality. If the sample means are far apart, the numerator of the t-statistic is large, and we reject equality.
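For the unequal-variance case, the statistic and the approximated degrees of freedom can be sketched the same way (same fabricated samples as in the pooled example):

```python
# t-statistic and approximate df when the two population variances
# are NOT assumed equal; the data are invented for illustration.
import math
import statistics

sample1 = [10.0, 10.4, 9.8, 10.2, 10.1, 9.9]
sample2 = [9.5, 9.9, 9.7, 9.6, 9.8]
n1, n2 = len(sample1), len(sample2)
v1 = statistics.variance(sample1)
v2 = statistics.variance(sample2)

se2 = v1 / n1 + v2 / n2    # each sample's own variance, not pooled
t = (statistics.mean(sample1) - statistics.mean(sample2)) / math.sqrt(se2)

# Approximated degrees of freedom from the formula in the text
df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
print(round(t, 3))    # 3.244
print(round(df, 1))   # 8.9
```

Note the df is generally not an integer and is smaller than n1 + n2 - 2, making this the more conservative test.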
I. Identify the appropriate test statistic and
interpret the results for a hypothesis test concerning the mean difference of
two normally distributed populations :
While the tests considered in the previous section were of the difference
between the means of two independent samples, sometimes our samples may be
dependent. If the observations in the two samples both depend on some other
factor, we can construct a “paired comparisons” test of whether the means of
the differences between observations for the two samples are different.
Dependence may result from an event that affects both sets of observations for
a number of companies or because observations for two firms over time are both
influenced by market returns or economic conditions.
Remember, the paired comparisons test also requires that the sample data
be normally distributed. Although we frequently just want to test the
hypothesis that the mean of the differences in the pairs is zero, the general
form of the test for any hypothesized mean difference is as follows :
H0 : μd = μdz versus Ha : μd ≠ μdz
where:
μd = mean of the population of paired differences
μdz = hypothesized mean of paired differences, which is commonly zero
For
one-tail tests, the hypotheses are structured as either :
H0 : μd ≤ μdz versus Ha : μd > μdz
or, H0 : μd ≥ μdz versus Ha : μd < μdz
For the paired comparisons test, the t-statistic with n - 1 degrees of freedom is computed as:
t = (d̄ - μdz) / sd̄
where:
d̄ = sample mean difference = (1/n) Σ di, summed over i = 1 to n
di = difference between the ith pair of observations
sd̄ = standard error of the mean difference = sd / √n
sd = sample standard deviation = √[Σ (di - d̄)² / (n - 1)]
n = the number of paired observations
- The test of the differences in means is used when there are two independent samples.
- A test of the significance of the mean of the differences between paired observations is used when the samples are not independent.
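The paired comparisons t-statistic can be sketched on two invented series (e.g., monthly returns of two funds over the same months, so the observations are dependent):

```python
# Paired comparisons test: t-statistic on the per-pair differences.
import math
import statistics

a = [1.2, 0.8, 1.5, 1.1, 0.9, 1.3, 1.0, 1.4]
b = [1.0, 0.7, 1.2, 1.0, 0.8, 1.1, 0.9, 1.1]
d = [x - y for x, y in zip(a, b)]   # paired differences d_i
n = len(d)
d_bar = statistics.mean(d)          # sample mean difference
s_d = statistics.stdev(d)           # sample std. dev. of the differences
se_d = s_d / math.sqrt(n)           # standard error of the mean difference

mu_dz = 0.0                         # hypothesized mean difference (commonly zero)
t = (d_bar - mu_dz) / se_d
print(n - 1)          # 7 degrees of freedom
print(round(t, 2))    # 5.58
```

Only the differences enter the calculation, which is why dependence between the two original series is not a problem.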
J. Identify the appropriate test statistic and
interpret the results for a hypothesis test concerning 1) the variance of a
normally distributed population, and 2) the equality of the variances of two
normally distributed populations based on two independent random samples :
The chi-square test is used for hypothesis tests concerning the variance
of a normally distributed population. The hypotheses for two-tailed tests are
structured as :
H0 : σ2 = σ02
versus Ha : σ2 ≠ σ02
The hypotheses for one-tailed tests are structured as :
H0 : σ2 ≤ σ02
versus Ha : σ2 > σ02 or,
H0 : σ2
≥ σ02 versus Ha : σ2 <
σ02
Hypothesis tests of the population variance require the use of a chi-square distributed test statistic, denoted χ².
The chi-square distribution is asymmetrical and approaches the normal
distribution in shape as the degrees of freedom increase.
The chi-square test statistic with n - 1 degrees of freedom is computed as:
χ²(n-1) = (n - 1)s² / σ0²
where:
n = sample size
s² = sample variance
σ0² = hypothesized value for the population variance
Note that because the chi-square distribution is bounded below by zero, chi-square values cannot be negative.
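Computing the chi-square statistic itself requires only the sample variance; the sample and the hypothesized variance below are invented:

```python
# Chi-square test statistic for a single population variance:
# chi2 = (n - 1) * s^2 / sigma0^2, with n - 1 degrees of freedom.
import statistics

sample = [2.1, 1.9, 2.4, 2.0, 2.2, 1.8, 2.3, 2.1, 2.0, 2.2]
n = len(sample)
s2 = statistics.variance(sample)   # sample variance, (n - 1) denominator
sigma0_sq = 0.02                   # hypothesized population variance

chi_sq = (n - 1) * s2 / sigma0_sq
print(n - 1)              # 9 degrees of freedom
print(round(chi_sq, 2))   # 15.0
```

The statistic would then be compared with the chi-square critical value(s) for 9 degrees of freedom at the chosen significance level.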
Testing
the Equality of the Variances of Two Normally Distributed Populations, Based on
Two Independent Random Samples
The hypotheses concerned with
the equality of the variances of two populations are tested with an
F-distributed test statistic. Hypothesis testing using a test statistic that
follows an F-distribution is referred to as the F-test. The F-test is used
under the assumption that the populations from which the samples are drawn are
normally distributed and that the samples are independent.
If we let σ1² and σ2² represent the variances of normal Population 1 and Population 2, respectively, the hypotheses for the two-tailed F-test of differences in the variances can be structured as:
H0 : σ12
= σ22 versus Ha : σ12
≠ σ22
and the one-sided test structures can be specified as :
H0 : σ12 ≤
σ22 versus Ha : σ12
> σ22, or H0 : σ12 ≥
σ22 versus Ha : σ12
< σ22
The test statistic for the F-test is the ratio of the sample variances. The F-statistic is computed as:
F = s1² / s2²
where:
s1² = variance of the sample of n1 observations drawn from Population 1
s2² = variance of the sample of n2 observations drawn from Population 2
The F-distribution is right-skewed and is truncated at zero on the
left-hand side. The shape of the F-distribution is determined by two separate degrees
of freedom. The rejection region is in the right-side tail of the distribution.
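A sketch of the F-statistic on two invented samples follows. Placing the larger sample variance in the numerator (a common convention, so that F ≥ 1 and only the right-tail critical value is needed) is an assumption of this sketch, not something stated in the text:

```python
# F-statistic as the ratio of two sample variances, larger variance on top.
import statistics

sample1 = [4.2, 4.8, 3.9, 4.5, 4.1, 4.6, 4.3, 4.0]
sample2 = [4.1, 4.3, 4.2, 4.4, 4.2, 4.3]
v1 = statistics.variance(sample1)
v2 = statistics.variance(sample2)

F = max(v1, v2) / min(v1, v2)
# The two degrees of freedom follow the numerator/denominator samples.
df_num = (len(sample1) if v1 >= v2 else len(sample2)) - 1
df_den = (len(sample2) if v1 >= v2 else len(sample1)) - 1
print(df_num, df_den)   # 7 5
print(round(F, 2))      # 8.83
```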
K. Distinguish between parametric and
nonparametric tests and describe the situations in which the use of parametric
tests may be appropriate :
Parametric
tests rely on assumptions regarding
the distribution of the population and are specific to population parameters.
Nonparametric
tests either do not consider a
particular population parameter or have few assumptions about the population
that is sampled. Nonparametric tests are used when there is concern about
quantities other than the parameters of a distribution or when the assumptions
of parametric tests can’t be supported. They are also used when the data are
not suitable for parametric tests.
Situations where a nonparametric test is called for include the following:
1. The assumptions about the distribution of the random variable that support a parametric test are not met.
2. The data are ranks rather than values.
3. The hypothesis does not involve the parameters of the distribution, such as testing whether a variable is normally distributed. We can use a nonparametric test, called a runs test, to determine whether data are random. A runs test provides an estimate of the probability that a series of changes (+, +, -, -, +, -, ...) are random.
The Spearman rank correlation test can be used when the data are not normally distributed. Consider the performance ranks of 20 mutual funds for 2 years. The ranks (1 through 20) are not normally distributed, so a standard t-test of the correlations is not appropriate. A large positive value of the Spearman rank correlation, such as 0.85, would indicate that a high (low) rank in one year is associated with a high (low) rank in the second year. Alternatively, a large negative rank correlation would indicate that a high rank in year 1 suggests a low rank in year 2, and vice versa.
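The mutual-fund example can be sketched with the standard no-ties formula for the Spearman rank correlation, ρ = 1 - 6 Σ di² / [n(n² - 1)]; the year-2 ranks below are invented to illustrate strong rank persistence:

```python
# Spearman rank correlation on invented fund ranks (no ties), using
# rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), d_i = rank difference.
year1 = list(range(1, 21))                       # ranks 1..20 in year 1
year2 = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9,
         12, 11, 14, 13, 16, 15, 18, 17, 20, 19] # similar ranks in year 2

n = len(year1)
d_sq = sum((a - b) ** 2 for a, b in zip(year1, year2))
rho = 1 - 6 * d_sq / (n * (n ** 2 - 1))
print(round(rho, 3))   # 0.985: high rank in year 1 goes with high rank in year 2
```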