TEXTBOOK: Our books are STILL on order. We experienced explosive growth in AP Statistics this year, from 30 students to 120! We will issue books as soon as we receive them. At some point you will probably want to purchase a study guide, like the Barron's Guide for AP Statistics. If you would feel more comfortable with a book at this point, the Barron's Guide is a wonderful resource.
Exam date: Wednesday, May 16 at 12:00 PM
Cost: around $90
Materials needed for class: Notebook, pencil, graphing calculator with statistics package (like the TI-83, TI-84, or Nspire), sturdy composition notebook.
Summer packet due: Just kidding. There was no summer packet.
Please wear school-appropriate clothing in which you will feel comfortable kneeling or sitting on the floor.
Check this space for updates.
________________________________________________
Tuesday, August 16
We will have a quiz at the beginning of class on Wednesday over calculating the boundaries of a confidence interval for the population proportion. Students took notes on the development of this formula in class today.
Point estimate +/- critical value * standard deviation
For this computation, for a 95% confidence interval for the proportion, we use 1.96 for the critical value. This is a more precise value than the "2 standard deviations" measurement that we have used for estimating in Math 3.
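Here is a minimal sketch of that computation in Python, for anyone who wants to check hand work. It assumes we plug the sample proportion p-hat into the standard deviation formula (we rarely know the true p), and the values 0.20 and 100 are made up for illustration.

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """Point estimate +/- critical value * standard deviation."""
    # Assumption: p-hat stands in for the unknown p in sqrt(p(1-p)/n).
    standard_deviation = math.sqrt(p_hat * (1 - p_hat) / n)
    margin_of_error = z * standard_deviation
    return p_hat - margin_of_error, p_hat + margin_of_error

# Hypothetical sample: 20 successes out of 100 trials.
lower, upper = proportion_ci(0.20, 100)
print(round(lower, 4), round(upper, 4))  # 0.1216 0.2784
```

The function returns two boundaries, one below the estimate and one above it.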
___________________________________________________________
Wednesday, August 17
Copies of two versions of today's quiz are attached: Download Conf Int Quiz 1. Be prepared for a follow-up quiz at any time.
Homework:
For n=50, compute the 95% confidence interval for each of the following values of p-hat. Determine whether the target (theoretical) population proportion of 1/6 falls within each interval.
{0.06, 0.08, 0.28, 0.30, 0.32}
HW answers: Download Conf int quiz answers in PDF form
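If you would like to check your hand computations, the short Python sketch below runs the same calculation for each p-hat in the homework. It assumes the 1.96 critical value and uses p-hat in the standard deviation formula; the function name is just a label chosen for this example.

```python
import math

N = 50          # sample size from the homework
Z = 1.96        # critical value for 95% confidence
TARGET = 1 / 6  # theoretical population proportion

def interval(p_hat, n=N, z=Z):
    """Return the lower and upper boundaries of the confidence interval."""
    margin_of_error = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin_of_error, p_hat + margin_of_error

for p_hat in [0.06, 0.08, 0.28, 0.30, 0.32]:
    lower, upper = interval(p_hat)
    captures = lower <= TARGET <= upper
    print(f"p-hat = {p_hat:.2f}: ({lower:.4f}, {upper:.4f})  captures 1/6? {captures}")
```

Look closely at the lower boundary for the smallest p-hat, and compare your own work before peeking at the output.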
_____________________________________________________________________________
Thursday, August 18
We analyzed the results of the lab today. The mean of our sample proportions was verrrrrry close to the theoretical proportion of successes. The sample standard deviation was very close to the theoretical standard deviation. The data followed a roughly straight path on the Normal Probability Plot. These findings justified our assertion that p-hat was approximately Normally distributed with a mean equal to the population proportion and a standard deviation equal to the Sqrt(p(1-p)/n).
We also clarified our understanding using a guided note-taking form and practiced computing confidence intervals.
From the homework we observed that the intervals make no sense when the conditions for Normality are not met: when np < 5 or n(1-p) < 5, the confidence interval could contain values outside the range 0 to 1. We also observed that about 5% of the observed proportions from the lab activity yielded confidence intervals that did not contain the true proportion (1/6). This is what we expect from 95% confidence intervals.
HW: Be prepared for a re-quiz. Also, we will be writing in our Statistics Journals tomorrow. Bring your own composition notebook if you don't want a plain vanilla one. Otherwise, I will provide a notebook for you.
____________________________________________________________________________________
Friday, August 19
We had our re-quiz today. All those who earned 4 points on the first quiz proved that they still knew how to construct a confidence interval. Most of those who had trouble the first time showed great improvement. For a few students the statement describing the distribution and the formula for the boundaries of the confidence interval have become blurred.
Description of the distribution:
p-hat is approximately Normally distributed with a mean of p and a standard deviation of the square root of p(1-p)/n. In symbols this is p-hat ~ N(p, sqrt(p(1-p)/n)).
Formula for the boundaries:
p-hat plus and minus the margin of error
or
p-hat plus and minus 1.96* the standard deviation
or
p-hat plus and minus 1.96 * the square root of p(1-p)/n.
Symbolically this is p-hat +/- 1.96 SQRT(p(1-p)/n), and it yields two answers, one less than the estimate and one greater than the estimate.
Neat website for confidence intervals
Here's another applet designed by the chief reader for AP Statistics and the assistant chief reader. With this one you can see what happens when you try to create confidence intervals where np or n(1-p) is less than 5.
We also began to write in our journals. We made our first attempt at defining a confidence interval and responding to some other prompts based on what we've done this week. Refer back to these questions often because eventually you will be called to answer each of them. Referring to the notes from this blog for the week may help you form clear, articulate responses.
If you missed class today (Friday), respond to the first journal prompt PLUS at least one other. You can copy your response into a journal before or after school Monday or Tuesday.
__________________________________________________________
Monday, August 22
We investigated the behavior of confidence intervals today. We constructed confidence intervals by repeatedly sampling from a population of colored puff balls. After determining the lower and upper bounds, we drew a confidence interval on a number line and posted the results on a graph for comparison. We found that most of the intervals for the population proportion of a specific color in the bowl overlapped quite a bit. About one confidence interval in each class did not look like it captured those common, overlapping ranges of numbers. That demonstrates how about 1 in 20-25 confidence intervals will not capture the true population proportion. The following video shows the results. First you will see the results for green, then purple, then yellow. Double-click on the image to start the video.
We also revisited the conditions for creating these confidence intervals. In order to use Normal procedures, both n*p-hat and n*(1 - p-hat) must be at least 10. In order to use the simplified standard deviation formula, the sample size must be less than 1/10 of the population.
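As a rough sketch, the two checks above could be written in Python like this. The function name and the sample numbers are made up for illustration.

```python
def conditions_met(p_hat, n, population_size):
    """Return (normal_ok, ten_percent_ok) for a one-proportion interval."""
    successes = n * p_hat
    failures = n * (1 - p_hat)
    # Normal procedures: at least 10 successes and at least 10 failures.
    normal_ok = successes >= 10 and failures >= 10
    # Simplified standard deviation: sample is less than 1/10 of the population.
    ten_percent_ok = n < population_size / 10
    return normal_ok, ten_percent_ok

# Hypothetical: 40 puff balls drawn from a bowl of 1000, half of them yellow.
print(conditions_met(0.5, 40, 1000))    # (True, True)
# Hypothetical: p-hat = 0.06 with n = 50 gives only about 3 successes.
print(conditions_met(0.06, 50, 1000))   # (False, True)
```

Both checks have to pass before the interval formula can be trusted.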
We also wrote the interpretations of the confidence level and the confidence interval. These concepts are related, but not identical.
Confidence interval: For a SPECIFIC INTERVAL!!! "We are 95% confident that the true population proportion of yellow puff balls falls between .4217 and .5728." It is not correct to claim that the population proportion falls within THIS INTERVAL 95% of the time because it either DOES or DOES NOT. The population proportion does not move!
Confidence level: If we were to repeat this process many, many, many times, then approximately 95% of the confidence intervals created would contain the true population proportion.
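To see the confidence level idea in action, here is a small simulation sketch in Python. Everything specific in it is an assumption chosen for illustration: a true proportion of 0.5, samples of size 100, 2000 repetitions, and a fixed random seed so the run is repeatable.

```python
import math
import random

def simulate_coverage(p=0.5, n=100, trials=2000, z=1.96, seed=1):
    """Fraction of simulated confidence intervals that contain the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and compute its p-hat.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hat = successes / n
        margin_of_error = z * math.sqrt(p_hat * (1 - p_hat) / n)
        # Each single interval either captures p or it does not.
        if p_hat - margin_of_error <= p <= p_hat + margin_of_error:
            hits += 1
    return hits / trials

print(simulate_coverage())
```

The printed fraction should land near 0.95, though not exactly on it. Any one interval is a hit or a miss; only the long-run fraction is about 95%.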
Homework: Bring in one printed page from the internet or a published source that shows confidence intervals being used in research. This should be a page from a newspaper, an article from a magazine or journal, or a page from published research. You don't have to print the whole document, just the part that describes the confidence interval. Wikipedia or another encyclopedia-type reference is not published research and is not acceptable for this assignment. Here's an example we'll look at in class tomorrow that mentions margins of error.
article about swine flu vaccine
________________________________________________________________________
Tuesday, August 23
Students shared the articles that they found online that pertained to research reported with confidence intervals. Some of the classes had high-level discussions about some mature topics that were embedded in their articles (mortality, pre-eclampsia, etc.). We see that confidence intervals are popular tools for making decisions based on statistics and reporting the results of surveys and experiments in many contexts.
Homework: Collect your heart rate in beats per minute after four "events": resting, after 10 jumping jacks, after 20 jumping jacks, and after 25 or 30 or 50 jumping jacks. If jumping jacks are not a safe exercise for you, you may substitute an equivalent aerobic exercise for this assignment. Your results should have four pairs of data in the form (number of repetitions, heart rate), like (0, 72), (10, 91), etc.
We will compile the results to create scatterplots of heart rate vs level of exercise, run regression, and evaluate correlation on Wednesday. This is our first exploration of bivariate data (x y pairs), and will also demonstrate how confidence intervals can be applied to slopes.
You may be thinking that these are topics that would normally be covered much later in the course, and you are correct. At this point we are just previewing some of the ideas that we master later. If you think that the math is a little harder than you had expected, please do not give up hope! We will have our first AP Stat Math Night on August 31 at 7:00 PM. Our first night will address solving equations. Anyone who struggled with Math 3 or Math 4 should attend, as well as those who just want a refresher. Future topics will include exponential functions and transformations.
__________________________________________________________
Wednesday, August 24
Students entered their data, along with some data from their classmates, into lists 1 and 2 in the calculators. They graphed the data, observing that the growth seemed to be "steady" or linear. Because the pattern of growth seemed to be linear, it was appropriate for them to perform a linear regression on L1 and L2 (# of jumping jacks and heart rate). They used the LinReg(a+bx) command, in the form LinReg(a+bx) L1,L2,Y1, to tell the calculator where to find the x and y values and where to store the final equation. DO NOT LEAVE THE L1 and L2 OUT! In AP Stat we will often perform regression on lists other than L1 and L2, and if you get in the habit of using the default, you will get wrong answers. Download Jumping jacks lin reg
Our next steps are to interpret the results of the other linear regression programs on the calculator, and to assess the appropriateness of the linear predictor. The answer is not as easy as just interpreting r and r-squared. The homework takes us to the next level of understanding.
Homework: Using L1 and L2, and after storing the regression equation in Y1, we begin to look at how far off our estimate is from our actual values. At the VERY TOP of List 3 on the calculator put in the formula L2-Y1(L1). Hit ENTER to fill the column with the differences between the observed heart rates and the predicted heart rates. We will start here Thursday morning.
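For those curious about what the calculator is doing behind the scenes, here is a sketch in Python of the same idea: fit the least squares line, then subtract each predicted heart rate from the observed one. The first two data pairs come from the example in Tuesday's post; the last two are invented for illustration, and the function name is just a label for this example.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

jumping_jacks = [0, 10, 20, 30]    # plays the role of L1 (last two values hypothetical)
heart_rate = [72, 91, 104, 121]    # plays the role of L2 (last two values hypothetical)

a, b = least_squares(jumping_jacks, heart_rate)
predicted = [a + b * x for x in jumping_jacks]                      # like Y1(L1)
residuals = [y - y_hat for y, y_hat in zip(heart_rate, predicted)]  # like L2-Y1(L1)

print(a, b)
print(residuals)
```

The residuals from a least squares line always add up to zero (apart from rounding), which is a handy check on your List 3.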
If you still have not replaced the batteries in your calculator or even found it after its summer vacation, you need to do so tonight! The graphing in this lesson cannot be performed on the TI-30 calculators.
Some students have asked for calculator recommendations. If you plan to teach math or major in math, an Nspire is ideal for the future. If you already have a calculator in the TI-83 or TI-84 family, you can use that for this class. If you are in the market for a calculator that will get the job done, a TI-83 will work, but a TI-84 silver edition will be faster and will have more capabilities. Either is appropriate for this class.
The video we saw in class today was Statz 4 Life, produced by graduate students at the University of Oregon. link to youtube video The emphasis for our class was "the difference on the top and the error on the bottom," which summarizes the formula for a z-score: (observed - expected)/standard error.
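"The difference on the top and the error on the bottom" can be sketched in a couple of lines of Python. The numbers are made up, and the second function is simply the same formula solved for the observed value.

```python
def z_score(observed, expected, standard_error):
    """Difference on the top, error on the bottom."""
    return (observed - expected) / standard_error

def observed_from_z(z, expected, standard_error):
    """The same formula rearranged to solve for the observed value."""
    return expected + z * standard_error

# Hypothetical values: observed 85, expected 70, standard error 10.
print(z_score(85, 70, 10))           # 1.5
print(observed_from_z(1.5, 70, 10))  # 85.0
```

The same rearranging trick works for solving for the expected value or the standard error.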
_________________________________________________________________
Thursday, August 25
We continued our exploration of bivariate data and learned our way around the calculators a little today. Please take another look at the worksheet from today. I have included screen shots from the Nspire calculator and a running commentary that explains what we've done.
Homework: Complete the worksheet from today. Download Bivariate data 2
_________________________________________________________________
Friday, August 26
We revisited our strategies for working with bivariate data: Enter, graph, eyeball for linearity, run regression, check that Least Squares Regression Equation falls among the data, record the regression equation, define variables, interpret y-intercept and slope, compute residuals, graph residuals against one of the original variables, consider the pattern of the residuals. . .
We took a little assessment. . . write the equation of the least squares regression line, and interpret the slope of the least squares regression line for the data collected in class.
Your answers should have resembled these:
(1) y-hat = 4.6 - .54 x where y = number of baskets and x = number of strides from the starting point.
(2) On average, we expect the number of baskets scored to decrease by about .54 for each additional stride between the basket and the starting point.
Homework: Check out the Bivariate data 2 worksheet described in class and posted last night. Pages 2 and 3 have screen shots from yesterday's work using the Nspire calculator. You should understand all the procedures we've applied so far.
CiCi's: I will be at CiCi's on Hwy 92 and Trickum (in the Super WalMart shopping center) from 2 until 4 on Sunday afternoon for extra help. The buffet cost is around $6.00. There is no charge for tutoring, but it is only fair to pay for the buffet if you are taking up their chairs.
Free help is available before and after school. ASE is available for anyone interested. If you were not placed in ASE for AP Stat and you want to be in it, please let Mrs. Linner know and your wish will be fulfilled.
_________________________________________________________________________________
Monday, August 29
You received a copy of the rubric for the ticket-out-the-door from Friday. Each of the elements listed on the rubric must be included whenever you discuss the regression line or its slope. Download Rubric for the regression spotcheck
To solidify your understanding of the process, we measured and counted pasta in class today. Different measures of pasta (1 cup, 1/2 cup, 1/4 cup, etc.) yield different counts of pieces of pasta. You collected these bivariate data and performed regression and other analyses on these data (in groups).
And, of course, you were supposed to write up your results in a manner consistent with the rubric.
Common student problems:
Can't get the residuals formula to work. Most likely problem: Failed to store the regression equation in Y1. This is done automatically when you use the LinReg function as demonstrated in class--LinReg L1, L2, Y1.
Getting a Dim Mismatch error when trying to graph. Probably haven't reset your StatPlot to graph the two variables that you want. Usually this happens when you're trying to graph L1 and L2 and you last graphed residuals.
Y-intercept is negative (or large). Short answer: This may not be a problem. Because of randomness and your method of measuring, you will probably get a non-zero y-intercept even though it doesn't make sense in context. In this case, the theoretical number of pieces of pasta in NO cups of pasta is zero. Variability will give y-intercepts anywhere from negative 8 to 8 in your regression equation.
Longer answer: If you were to run a true regression analysis on these data using a computer program (like Minitab, SPSS, or JMP), your results would tell you that there is no evidence that the y-intercept is other than zero. This is one of the few cases where the t-test of the intercept is interesting for AP Statistics students. Usually it is unimportant.
________________________________________________________________
Tuesday, September 6
What a great job you did on the packets last week! We're going to take some time to squeeze all the insights we can from these rich activities. Today we looked at the first one, and assessed the effect one outlier can have on a least squares regression and on the correlation. We observed that an outlier "in line with" the rest of the points strengthens the correlation the further it is from the rest of the data in the x direction (a confirming point). If the point is a counterexample, an outlier NOT falling in line with the rest, it weakens the relationship, or may twist the line entirely if it is far away in the x and y directions. If the y-value of the outlier falls close to the mean of the data, we may change our minds about a relationship existing between the x and y--the correlation may approach 0 as the x-value increases in distance from the rest of the data.
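Here is a rough Python sketch of those outlier effects, using a tiny invented data set (not the data from the packet). It computes the correlation three ways: for the original points, with a far-away point that falls in line with the rest, and with a far-away point whose y-value sits at the mean of the original y-values.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data; its least squares line is y-hat = 1.7 + 1.1x.
xs = [1, 2, 3, 4, 5]
ys = [2, 5, 4, 8, 6]

r_base = correlation(xs, ys)
# Confirming point: far out in x and exactly on the line (1.7 + 1.1*20 = 23.7).
r_confirming = correlation(xs + [20], ys + [23.7])
# Far out in x, but with a y-value equal to the mean of the original y-values (5).
r_flat = correlation(xs + [20], ys + [5])

print(round(r_base, 3), round(r_confirming, 3), round(r_flat, 3))
```

With these made-up numbers the confirming point pushes r up toward 1, while the point at the mean y-value pulls r down toward 0.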
Wednesday we will use the data from the first activity to assess the relationship between marriage and divorce rates in European countries. As an added bonus, we will practice using the graphing calculator to store lists of data for later use within programs. This skill can save you lots of data-entry time.
We will also look at the Anscombe data lists. http://en.wikipedia.org/wiki/Anscombe%27s_quartet
Thank you for all your support last week when I had to leave. Here are two tributes for my Dad, one provided by the funeral home and one created by an email pal in Germany.
--------------------------------------------------------------------------------------------------
Wednesday, September 7
We are still using the packet of activities we started last week. Today we investigated the Anscombe data sets. These were remarkable because the summary statistics and the regression equations were identical for these sets of ordered pairs, but the graphs of the data sets were drastically different. This reinforced our understanding that we have to look at the graph of the data before we decide to run a linear regression. Just because we have ordered pairs does not mean that a linear regression is appropriate.
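If you want to confirm the "identical regression equations" claim away from the calculator, here is a Python sketch. The data values are the published Anscombe quartet; the helper function is written just for this example.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]   # shared by sets 1-3
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

fits = [least_squares(x, y) for x, y in [(x123, y1), (x123, y2), (x123, y3), (x4, y4)]]
for number, (a, b) in enumerate(fits, start=1):
    print(f"Set {number}: y-hat = {a:.2f} + {b:.2f}x")
```

All four lines round to the same equation, which is exactly why the graphs matter.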
Please continue to have the packet and your calculator on your desk when the bell rings for class, since we will be using it every day for a few more days.
TONIGHT (9/7) is the first AP Stat Math Night: 7-9 PM in my classroom. We will go over solving equations and a little logarithm/exponential equation stuff.
HW: load the Anscombe data into your calculator, then create the program ANSCOMBE to restore your lists. Please note that the x-values for sets 1-3 are all identical, so you only have to enter them once. Suggestion: If we consider the pairs of data like this (x1, y1) (x2, y2) (x3, y3), (x4, y4), then we really have the pairs (x1, y1) (x1, y2) (x1, y3), (x4, y4), and there are only six lists--x1, y1, y2, y3, x4 and y4.
Part of the coding to capture the lists would be 2nd RCL L1<enter> STO L1.
_____________________________________________________________________
Thursday, September 8
I was disappointed that some students failed to do last night's homework. If you did not practice creating a program from data, please practice this weekend.
Today we loaded the Catalog Help program on calculators and learned how to use it. By pushing the plus key instead of the enter key to select certain functions on the calculator the student sees the syntax for the function (the list of things that must/should be entered to run the function).
Prepare for Friday's quiz by completing the worksheet from class today and knowing how to perform these steps without prompts. If you have questions, please come by the classroom BEFORE SCHOOL.
Applications for Mu Alpha Theta are still available. Please let me know if you want to be switched into the AP Statistics ASE from another group. We can make it happen!
_____________________________________________________________________________________
Friday, September 9
We quizzed today on linear regression. I plan to give these back on Monday, so please come before school to take the make-up if you were absent today. Otherwise, this will be an "omit" in the gradebook.
We also completed the first histogram task in the packet, contrasting the histograms representing five different variables. Your homework is to answer the two questions at the top of page 17 completely. Perhaps you answered these questions last week when the packet was assigned, but most of you have new insights that would refine your answers now. Start over and write a more thoughtful response this time.
We will test over regression, computing and interpreting z-scores, histograms, constructing confidence intervals for proportions, and box-plots on Thursday, Sept 15.
CiCi's? Yes. Trickum and 92.
Mu Alpha Theta? Pick up an application on Monday.
Switch into AP Stat ASE? Can be done in a flash. Just let me know.
Please be safe this weekend.
____________________________________________________________
Monday, September 12
TEST TUESDAY, Sept 20
Today we collected four sets of univariate data:
The number of cell phone contacts you have programmed in your cell phone
The number of AP exams you will have taken before you graduate
The complete cost of the last haircut you got (color, perm, tip can be included)
The amount of cash you are carrying right now
We looked at the histograms for these distributions and the modified boxplots.
I assume that you know what a box and whisker plot looks like, but you may be a little new to creating one. To create a modified boxplot compute the 5 number summary (minimum, first quartile, median, third quartile, maximum), and the interquartile range (Q3 - Q1) from your sorted data.
Determine the limits on reasonable data--one and one-half IQRs below and above the middle 50% of the data (Q1-1.5IQR and Q3+1.5IQR). If observations are lower than the lower reasonable limit or higher than the upper reasonable limit, they are considered outliers. Mark them with stars.
Find the highest and lowest of the reasonable values. These will be the endpoints of the whiskers. Q1 and Q3 are the lower and upper edges of the box representing the middle 50% of the data. The median is marked by a vertical line inside the box.
Purple math does a nice job of explaining the outlier part: Link to purple math.
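The modified boxplot steps above can be sketched in Python as a check on your hand work. Two things here are assumptions: the quartiles are found as the medians of the lower and upper halves of the sorted data (leaving out the middle value when the count is odd), and the cash amounts are invented.

```python
def median(sorted_values):
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def modified_boxplot_summary(data):
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower_half = values[:half]
    # Skip the middle value when n is odd (assumed quartile convention).
    upper_half = values[half + 1:] if n % 2 == 1 else values[half:]
    q1, med, q3 = median(lower_half), median(values), median(upper_half)
    iqr = q3 - q1
    low_limit = q1 - 1.5 * iqr
    high_limit = q3 + 1.5 * iqr
    reasonable = [v for v in values if low_limit <= v <= high_limit]
    return {
        "min": values[0], "q1": q1, "median": med, "q3": q3, "max": values[-1],
        "iqr": iqr, "limits": (low_limit, high_limit),
        "outliers": [v for v in values if v < low_limit or v > high_limit],
        "whiskers": (reasonable[0], reasonable[-1]),
    }

# Hypothetical cash amounts (in dollars).
cash = [0, 2, 5, 5, 8, 10, 12, 15, 20, 75]
print(modified_boxplot_summary(cash))
```

In this made-up example the 75 is beyond the upper limit of 30, so it gets a star and the right whisker stops at 20.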
HW - Create a box and whisker plot for your data by hand. Be sure to label the median, Q1, and Q3, and identify outliers with a special marking. Remember that the whiskers only go as far as the last reasonable observed value.
______________________________________________________________________
Tuesday, September 13
The test is moved to next Tuesday the 20th.
Today we used a template to organize data for creation of histograms and box and whisker graphs.
The tricky part in creating effective histograms is to set the frame correctly. The result has to (1) fairly represent the data, (2) be informative, and (3) be visually satisfying.
To fairly represent the data, the frame should be balanced around the data, with any excess width split evenly between the front end and back end of the scale. Set your starting point so that the frame will be balanced.
To be informative, use a reasonable number of bins. If you have a lot of data, then more bins might better represent the data. Small data sets should not be broken into a large number of bins.
To be visually satisfying, we usually set the window to accommodate 5, 7, or 9 bins. This is the limit of what the human brain can comprehend in histogram form.
If you want your calculator window to match the beautiful graph you create by hand, set your xmin in the calculator window equal to the min - (1/2) excess "frame". For instance, if the range of your data is 37 and the total width of your frame (all the bins together) is 40, the excess "frame" is 3 and you would put 1.5 on the left and 1.5 on the right. If the min was 4, your starting point is 4 - 1.5 = 2.5.
Set your xscl to match the bin width.
Make your ymin = -1 and your ymax a little taller than the highest bar.
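The window arithmetic above can be sketched in Python. The function name is made up, and the choice of 5 bins of width 8 is an assumption picked so that the frame is 40 wide, matching the example.

```python
def histogram_window(data_min, data_max, bin_width, n_bins):
    """Return (xmin, xmax, xscl) with the frame balanced around the data."""
    data_range = data_max - data_min
    frame_width = bin_width * n_bins
    if frame_width < data_range:
        raise ValueError("bins do not cover the data; use more or wider bins")
    excess = frame_width - data_range
    # Split the excess evenly between the front end and the back end.
    xmin = data_min - excess / 2
    xmax = data_max + excess / 2
    return xmin, xmax, bin_width

# Example from above: min 4, range 37 (so max 41), frame width 5 * 8 = 40.
print(histogram_window(4, 41, 8, 5))  # (2.5, 42.5, 8)
```

The third value is the xscl, which matches the bin width.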
HW (1) Create a data set with outliers on one end.
(2) Create a data set in which the box and whisker graph will have only one whisker.
(3) Create a data set in which the box and whisker graph will have no whiskers.
(4) Find the number of licks that it takes to get to the center of a Tootsie Pop.
(5) Finish the box and whisker graph started in class today (the grasp data) if you didn't already.
And keep the data for grasp and span handy. Handy. Ha. Ha. I crack me up.
____________________________________________________________
Wednesday, September 14
We learned to create stem and leaf plots today: regular, split, and back-to-back. Online Stat Book treatment of stem and leaf plots
We discussed the advantages of stem and leaf over histograms as well as the disadvantages. Can you explain them?
We discussed the essential principles of graph design: ethics and appeal. A graph has to convey information honestly and effectively, and then incorporate design that helps the reader to interpret or appreciate the information. If a graph does not convey the information in a way that the reader can interpret it fairly, it is worthless!
One of the experts in visual presentation of data is Edward Tufte, a professor from Yale. He has written lovely, informative books on the subject (I have one on the shelf in my classroom and another two at home), but ironically, the wikipedia page describing his work is mostly text-based. Similarly, the page on Information Design has only two graphs. Look for more examples online that contrast good representations with poor ones.
No HW tonight. We will collect and analyze more data on Thursday and Friday.
__________________________________________________________
Thursday, September 15
Thank you to the students who attended the AP Stats Math Night Wednesday. We covered a lot of the math we will be using in chapters 3 and 4.
Today we investigated the relationship (?) between hand span and Tootsie Pops grabbed and worked through the complete regression using the key from the quiz as a guide.
We practiced again, using the completely contrived data set of red Skittles vs non-red Skittles. Funny thing, this was not as strong a relationship, but at least it was different because the slope was negative.
Refine your notes for homework tonight. If you have any special requests for links to topics for posting on the blog, let me know.
Neon-out tomorrow. Go Trojans.
______________________________________________
Friday, September 16
We took one more look at the requirements for a complete regression. A large number of students learned one more time that the scatterplots need labels and interpretations (strong, positive, linear, with no unusual observations), and that predictor equations use both the names of the variables (not x and y) and a hat over the response variable.
The worksheet from class has some problems printed on the back. Some are regression problems that build on what we've already done--like computing the slope from my favorite formula and using a predictor equation to predict a value of y. The other problems are confidence interval problems. Go back through your notes and the blog to find details if you need them.
HW Finish the worksheet. Prepare for the test.
--------------------------------------------------------------------------------------------------------------------
Monday, September 19
Test on Tuesday!!!!!!!!!!!!! It was announced last Tuesday, so you will take the test whether you were in class on Monday or not.
Content:
Computing z scores- given any three of the values in the formula for z, be able to calculate the last one. z = (observed - average)/standard deviation
Graphing - using histograms, boxplots, stemplots, and scatterplots. Be able to describe/interpret each. For scatterplots, describe direction, form, strength, and unusual observations. For the others describe center, shape, spread, and unusual observations. Be able to construct each of these graphs.
Confidence intervals - Compute margin of error and boundaries of interval. Compute the minimum sample size to have a margin of error of a specific size. Always interpret the interval at the end. "We are 95% confident that the true population proportion of [insert context here] falls between [lower bound] and [upper bound]." NEVER interpret a confidence interval with a statement about how likely it is that the interval contains the true proportion: the interval either contains the value or it does not. All the time!
Linear regression - Perform all the steps as practiced in class and on the quiz AND interpret all the graphs and output. Use the regression to predict the response variable.
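For the minimum sample size item in the content list above, here is a sketch in Python. It assumes we solve margin of error = 1.96 * sqrt(p(1-p)/n) for n and round up. Using p = 0.5 when no estimate is available is a common conservative choice, not something stated in the class notes, and the margins of error below are made up.

```python
import math

def min_sample_size(margin_of_error, z=1.96, p=0.5):
    """Smallest n with z * sqrt(p(1-p)/n) no larger than the margin of error."""
    n = (z / margin_of_error) ** 2 * p * (1 - p)
    return math.ceil(n)  # always round UP to a whole number of observations

print(min_sample_size(0.03))            # 1068
print(min_sample_size(0.05, p=1 / 6))   # 214
```

Rounding down would leave the margin of error slightly too large, so we always round up.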