Randomization Test Homework

 

A)  The histograms below represent the distribution of random combinations (resamplings) of our class opinion poll data. Each resampling was created by putting all of the data together into one large population and then randomly picking 88 of these to represent the male opinion (the rest would represent the female opinion).  Then the difference between the means of each group was calculated.  The first graph represents the distribution of 1000 of these differences in means.  Note that the difference in the means (between male and female) was about 0.5.

 

Graph 1

 

Graph 2

 

1.   Use graph 1 to estimate the probability that a random sample yields a larger difference in means than our class's difference in means.  In other words, count the number of samples which have a difference of means greater than 0.5 and the number of samples which have a difference of means smaller than -0.5.  (Recall that since we randomly assigned gender to our data, group A could represent males in one sample and females in another.  This would give us different signs when we subtract the means, so we must look at both tails of the data: plus and minus.)  The sum of these numbers will be the total number of samples with a difference of means larger in size than the class's actual difference in means (0.5).  Finally, convert this to the percent of samples with a larger difference of means.

2.   Use graph 2 to estimate the probability that a random sample yields a larger difference in means than our class's difference in means.

3.   Below is the result of a randomization test on the opinion poll with 10,000 random samples. 

                          a.     Does the probability shown agree with your estimates? 

                          b.     Could the difference in the means of the male and female students' opinions for this poll have happened by chance (common) or is it significantly different (uncommon)?  Explain.
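The resampling procedure described in part A, together with the two-tail count from question 1, can be sketched in Python as follows. This is only an illustration under stated assumptions: the class poll data is not reproduced here, so `group_a` and `group_b` hold made-up ratings, and the function names `resampled_differences` and `two_tail_percent` are hypothetical. The comparison is strict ("greater than"), following the wording of question 1.

```python
import random

def resampled_differences(group_a, group_b, num_resamples=1000, seed=0):
    """Pool both groups, reshuffle, and collect the difference of means each time."""
    rng = random.Random(seed)               # fixed seed so reruns give the same result
    pooled = list(group_a) + list(group_b)  # one large population
    n_a = len(group_a)                      # size of the first group (88 in the poll)
    diffs = []
    for _ in range(num_resamples):
        rng.shuffle(pooled)                 # random reassignment of group labels
        first, rest = pooled[:n_a], pooled[n_a:]
        diffs.append(sum(first) / len(first) - sum(rest) / len(rest))
    return diffs

def two_tail_percent(diffs, observed):
    """Percent of resampled differences beyond the observed one, in either tail."""
    cutoff = abs(observed)
    right_tail = sum(1 for d in diffs if d > cutoff)   # e.g. greater than 0.5
    left_tail = sum(1 for d in diffs if d < -cutoff)   # e.g. smaller than -0.5
    return 100 * (right_tail + left_tail) / len(diffs)

# Hypothetical ratings, NOT the class poll data.
group_a = [4, 5, 3, 4, 2, 5, 4, 3]
group_b = [3, 2, 4, 3, 3, 2, 4, 1, 3]
observed = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)

diffs = resampled_differences(group_a, group_b, num_resamples=1000)
print("observed difference of means:", round(observed, 3))
print("percent of resamples beyond it:", two_tail_percent(diffs, observed))
```

A small percent suggests the observed difference would be uncommon under random assignment; a large percent suggests it could easily happen by chance.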

 

 

 

 


B)  Consider the following Calculus exam scores:

Fall Exam 1:  

98  69  97  94  85  69  80  97  110  62  82  86  105  95  70  47  94  80  94  87  91  59  91  105  119  70  108  91  104  70  96  105  102  66  114

Spring Exam 1:

73  79  89  77  89  95  88  81  77  127  84  85  43  63  87  84  97  59  90  58  44  31  73  85  77  59  68  98  67  71  80  96  110  76  84  83  108  93

Fall Exam 2:

94  74  90  85  84  39  73  91  99  70  69  82  78  67  51  98  73  96  81  95  80  88  121  98  76  40  73  95  87  90

 

1)            Scenario 1:  Compare the student scores on Fall Exam 1 versus Spring Exam 1 by answering the following.

a.   Create a double stem and leaf plot for Fall Exam 1 versus Spring Exam 1.

You may want to enter the data into your calculator and then have the calculator sort the data [STAT SortA(L1)].

b.   Create side-by-side box and whisker plots of these scores. 

c.    Compare the data using the graphs you created.

                                     i.     What information about the distribution (shape, center, spread) can you deduce from the double stem and leaf plot?

                                    ii.     What information about the distribution can you deduce from the side-by-side box and whisker plots? (Problem #27 on page 216 from your text HW might be helpful in answering part of this.)

                                  iii.     Which class did better, and what (graphical) evidence do you have to support this?

d.   What factors could explain the difference in the grades?
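For readers working on a computer rather than a calculator, the sorting step and the five numbers behind each box and whisker plot can be sketched as below. The helper name `five_number_summary` is hypothetical, and the quartile rule is an assumption: Q1 and Q3 are taken as the medians of the lower and upper halves, with the middle value left out when the count is odd. Other software may use a different quartile rule and give slightly different Q1 and Q3.

```python
from statistics import median

fall_exam_1 = [98, 69, 97, 94, 85, 69, 80, 97, 110, 62, 82, 86, 105, 95, 70,
               47, 94, 80, 94, 87, 91, 59, 91, 105, 119, 70, 108, 91, 104, 70,
               96, 105, 102, 66, 114]
spring_exam_1 = [73, 79, 89, 77, 89, 95, 88, 81, 77, 127, 84, 85, 43, 63, 87,
                 84, 97, 59, 90, 58, 44, 31, 73, 85, 77, 59, 68, 98, 67, 71,
                 80, 96, 110, 76, 84, 83, 108, 93]

def five_number_summary(scores):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule (assumed)."""
    data = sorted(scores)        # ascending sort, like SortA(L1) on the calculator
    half = len(data) // 2        # size of each half; the middle value is skipped if n is odd
    lower = data[:half]          # lower half of the sorted scores
    upper = data[-half:]         # upper half of the sorted scores
    return data[0], median(lower), median(data), median(upper), data[-1]

for name, scores in [("Fall Exam 1", fall_exam_1), ("Spring Exam 1", spring_exam_1)]:
    print(name, "sorted:", sorted(scores))
    print(name, "five-number summary:", five_number_summary(scores))
```

The sorted lists are also a convenient starting point for building the double stem and leaf plot by hand.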

 

2)            Scenario 2:  Compare the student scores on Fall Exam 1 versus Fall Exam 2 by answering the following.

a.   Create a double stem and leaf plot for Fall Exam 1 versus Fall Exam 2.

b.   Create side-by-side box and whisker plots of these scores. 

c.    Compare the data using the graphs you created.

                                     i.     What information about the distribution can you deduce from the double stem and leaf plot?

                                    ii.     What information about the distribution can you deduce from the side-by-side box and whisker plots?

                                  iii.     Which exam did the class perform better on, and what evidence do you have to support this?

d.   Note that 5 students who took the 1st exam did not take the 2nd exam.  Which 5 (in ranking) do you think opted not to take the 2nd exam?  Use the graphs to justify your hypothesis (What graphical evidence supports your claim?).

 

3)            Difference of the means for Scenario 1 (Compare the student scores on Fall Exam 1 versus Spring Exam 1)

a.   Calculate the difference of the means for Scenario 1.

b.   Looking at your work for question #1, do you think that this difference of the means could have just happened by chance or is it significant?  Explain.

c.    Below are results of randomized tests that considered the difference of the means in scenario 1.  Do they agree with your hypothesis in part b above?  Explain.

 

Fall Test 1 VS Spring Test 1

 

 

4)            Difference of the means for Scenario 2 (Compare the student scores on Fall Exam 1 versus Fall Exam 2)

a.   Calculate the difference of the means for Scenario 2.

b.   Looking at your work for question #2, do you think that this difference of the means could have just happened by chance or is it significant?  Explain.

c.    Below are results of randomized tests that considered the difference of the means in scenario 2.  Do they agree with your hypothesis in part b above?  Explain.

 

Fall Test 1 VS Fall Test 2
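Randomized tests like those referred to in questions 3c and 4c can be approximated with a short Python sketch using the exam scores listed in part B. This is not the software that produced the original results: the function name `randomization_test`, the seed, and the choice of 10,000 resamples are assumptions, and each scenario is treated as two independent groups, as the worksheet does. Because the resampling is random, the proportions will vary slightly with the seed.

```python
import random

fall_exam_1 = [98, 69, 97, 94, 85, 69, 80, 97, 110, 62, 82, 86, 105, 95, 70,
               47, 94, 80, 94, 87, 91, 59, 91, 105, 119, 70, 108, 91, 104, 70,
               96, 105, 102, 66, 114]
spring_exam_1 = [73, 79, 89, 77, 89, 95, 88, 81, 77, 127, 84, 85, 43, 63, 87,
                 84, 97, 59, 90, 58, 44, 31, 73, 85, 77, 59, 68, 98, 67, 71,
                 80, 96, 110, 76, 84, 83, 108, 93]
fall_exam_2 = [94, 74, 90, 85, 84, 39, 73, 91, 99, 70, 69, 82, 78, 67, 51,
               98, 73, 96, 81, 95, 80, 88, 121, 98, 76, 40, 73, 95, 87, 90]

def randomization_test(group_a, group_b, num_resamples=10_000, seed=1):
    """Return (observed |difference of means|, proportion of resamples beyond it)."""
    rng = random.Random(seed)                 # fixed seed so reruns match
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = group_a + group_b                # new list; the originals are not shuffled
    n_a = len(group_a)
    beyond = 0
    for _ in range(num_resamples):
        rng.shuffle(pooled)                   # randomly reassign scores to the two groups
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a)
        if abs(diff) > observed:              # counts both tails at once
            beyond += 1
    return observed, beyond / num_resamples

for label, a, b in [("Fall Test 1 VS Spring Test 1", fall_exam_1, spring_exam_1),
                    ("Fall Test 1 VS Fall Test 2", fall_exam_1, fall_exam_2)]:
    observed, proportion = randomization_test(a, b)
    print(label)
    print("  observed difference of means:", round(observed, 2))
    print("  proportion of resamples with a larger difference:", proportion)
```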