Exercise 2, BA 3300 Business Statistics
Question: Gender and Grades --is there a difference?
It is a common stereotype that females have lesser quantitative skills
A Statistics instructor has a theory that females on average have higher
grades than males in quantitative courses at UM-St. Louis.
These data points were chosen so that they consisted of pairs carefully
matched on a variety of characteristics, such as age, experience, prior
education and training, ambition, destructive activities, IQ of parents,
and factors subject to free will.
He believes that, while the overall means of these two distributions
are not vastly different, the small gender difference may be detected if
enough of the other factors affecting the grades could be controlled and
adjusted for (the Bill Cosby school of detecting causality).
First, we will just do a two-sample t-test to see if the means
of the two groups are significantly different, and then we will look at
a distribution of the differences between pairs (a one-sample t-test)
to see if the differences have a mean significantly different from zero.
We will compare that to the EXCEL paired two sample t-test procedure.
He suspects females may do better, but others think it might go the
other way, so we will use two-tailed tests in the t-tests.
Being a lazy sort of fellow, he is willing to use built-in tools in
EXCEL to do analyses--such as t-tests- rather than resort to his Monroe
calculator and a lot of scratchpads. He has hired you as the lab assistant
to do the work. He has provided a Comma-delimited ASCII file of his data
as sexscore.txt .
Copy and paste the data into an EXCEL sheet and immediately do a SaveAs
to your A: disk, naming it sexgrad.xls. (make sure file type is EXCEL.
It's not enough to just change the extension)
Your mission is to analyze these data, printing to one page. Do this by selecting a range that includes your analyses
and only the top of the dataset (showing your student number used in the
data). Print a selected range, preview / setup click the button to fit
to 1 page by l page tall. use landscape or portrait to maximize print size
on the one page. You can also adjust margins to help with this. Hand in
a printout as exercise 2 and put a copy of the file in the digital dropbox
on mygateway. The finished file for the example we did in class will be
a good guide on what to do and how to do it.
Before you go any further, insert a few rows at the top, type in your
name, section time and student number so I can tell who did it.
The first few data have the letters A B C D in them. Replace the
letters with the last 4 numbers in your student
number in order. For example, if your student number is 1036782:
these numbers, as used in the analysis, would be 41.6, 37.7, 43.8,
39.2 Do not use your Social Security number for this!
Use the following tools in EXCEL (you don't have to do them exactly in
this order. If you are not sure about some of them, do the easy ones first):
Comment on the results of each analysis. This will work well if you put
the interpretations in text boxes by the analyses they refer to.
Textboxes are found on the drawing toolbar. Install the toolbar by
view / toolbars/ drawing. it usually goes down the bottom of the window,
click on the one that looks like a box of text with an "A" in it, then
click on your sheet where you want it and drag to a starting size. click
in it to type, click on the edge to move it or resize it or delete it by
pushing delete after clicking on the edge.
I recommend against using the cell comment tool in EXCEL, as getting them
to print out correctly can be a challenge.
Derive tables of descriptive statistics for each of the two samples, male
and female, including 95% confidence intervals. Instead of doing descriptive
statistics for the combined data (which would require reconfiguring it
to be in one column) just use the Functions to get a Mean and variance
on the combined data . Comment on whether it looks like these two samples
came from populations with different means, referring to the sample means,
standard errors and confidence intervals. What happened to the variance
when the data was split into two groups -versus combined?
Construct histograms (histograms are frequency GRAPHs) on the two groups
(males and females) separately,and the combined data, making sure the X
axis has the right values as labels on the axis so you can align them for
easy visual comparison. Use about 15-20 bins so you can see the general
shapes of the distributions. Do these look like the samples came from populations
with different means?
Conduct an F test (alpha=0.05) on the variances of the two distributions
and decide whether to do a t-test assuming equal or unequal variances.
Comment on the resulting F value and Pvalue. What is Ho in this test?
is it rejected? What does the Pvalue mean?
Conduct the appropriate (unpaired) t-test with alpha=0.05 to see if the
means of the two distributions are significantly different. What is Ho?
What is Ha?
Comment on the results. Is there a significant difference? What is
the probability of getting this observed t-statistic if the Ho is true?
Do you conclude from this analysis that males and females have different
average grades in the parent populations?
Retest using the pairing:
The strategy of pairing values to control for other variables will allow
us to detect differences that might not be seen in the midst of other effects.
we will do this manually with functions first, then compare the result
to what we get from the data analysis tools procedure. These should be
exactly the same.
Add a third column containing the differences between the pairs. Derive
descriptive statistics and a confidence interval (95% confidence) on the
column of differences. Calculate a t value for the mean of this distribution
to see if it is different from zero. You will have to calculate the t value
manually to do this, and then get a critical t and a Pvalue by using the
EXCEL functions TDIST and TINV. When asked for the observed T value
("x") in the Tdist Function, use the Function abs(X) to fix it so that
even if the Tvalue is negative, a positive number is obtained is fed into
the Tdist function.
You have to do this to go around the bug
in the Tdist function that keeps it from giving a result when the
Tobserved is negative.
Comment on the observed t statistic and the Pvalue in drawing a conclusion
whether the mean of this set of differences is significantly different
Conduct a Procedure (data analysis tools) t-test for difference of
the means of the two distributions as paired variables, with alpha =0.05
comment on the results. Are they "significant"? what does that mean?
How does this compare to the t-test on the column of differences?
Why didn't we see the difference between males and females when we did
the analysis without pairing the observations? What did the pairing do
for us mathematically so that we could identify a difference with higher
In all of these, why is it best to use a t test rather than a Z test even
though there is a large number of observations?
A preview of regression and correlation:
highlight both original columns again, click on the chart icon and this
time select an XY scatter chart. Click next a few times, making sure that
the chart appears as an object in the same spreadsheet, then move it to
a convenient location. Right click on the middle of the data and click
add trendline. This will bring up a dialogue where you can select a linear
trendline under type, and then, on the options tab, check the boxes for
display equation on chart and don't set intercept = 0. Consider
the result. How well do the data conform to the line (are there a
lot of points far away from the line)? What does this mean in comparing
male to female grades when values are paired this way? If you wanted to
predict a female's grade knowing only the grade of her pair partner, how
would you calculate your best guess? (What is the prediction equation
to use for this?)