Important Concepts not on the AP Statistics Formula Sheet

Part I:

IQR = Q3 – Q1

Test for an outlier: more than 1.5(IQR) above Q3 or below Q1

The calculator will run the test for you as long as you choose the boxplot with the outlier on it in STATPLOT.
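The 1.5(IQR) test can also be sketched by hand in Python. This is a minimal illustration, not the calculator's exact routine: the function names and the data are made up, and the quartile convention (`method="inclusive"`) is an assumption, since calculators and textbooks compute Q1 and Q3 in slightly different ways.

```python
import statistics

def outlier_fences(data):
    """Return (lower_fence, upper_fence) from the 1.5(IQR) rule."""
    # Assumption: "inclusive" quartiles; other conventions can shift Q1 and Q3 a little.
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data):
    """List the observations that fall outside the fences."""
    low, high = outlier_fences(data)
    return [x for x in data if x < low or x > high]

scores = [2, 4, 5, 6, 7, 8, 9, 10, 30]  # made-up data
print(outlier_fences(scores))  # (-1.0, 15.0)
print(find_outliers(scores))   # [30]
```

Here Q1 = 5 and Q3 = 9, so IQR = 4 and the fences sit at 5 – 6 = –1 and 9 + 6 = 15; only 30 lies outside them.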

Linear transformation:

Addition: affects center NOT spread

adds to x̄, M, Q1, Q3

does not change IQR or σ

Multiplication: affects both center and spread

multiplies x̄, M, Q1, Q3, IQR, σ
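A quick numerical check of the two transformation rules, as a sketch with made-up data. The helper name `summary` is hypothetical, and the quartile convention (`method="inclusive"`) is an assumption.

```python
import statistics

def summary(data):
    """Center (mean, median) and spread (IQR, sd) of a data set."""
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": statistics.mean(data),
        "median": med,
        "IQR": q3 - q1,
        "sd": statistics.pstdev(data),
    }

data = [2, 4, 6, 8, 10]            # made-up data
shifted = [x + 5 for x in data]    # addition
scaled = [x * 3 for x in data]     # multiplication

for label, values in [("original", data), ("add 5", shifted), ("times 3", scaled)]:
    print(label, summary(values))
```

Adding 5 moves the mean and median up by 5 and leaves the IQR and standard deviation alone; multiplying by 3 triples all four.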

When describing data: describe center, spread, and shape. Give a 5-number summary or mean and standard deviation when necessary.

Graph types (figures not reproduced):

Histogram: fairly symmetrical, unimodal; skewed right; skewed left

Ogive (cumulative frequency)

Boxplot (with an outlier)

Stem and leaf

Normal Probability Plot

The 80th percentile means that 80% of the data is below that observation.
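The percentile idea can be illustrated with a short sketch. The function name `percent_below` and the data are made up for illustration; it uses the "percent of the data below the observation" definition given here.

```python
def percent_below(data, value):
    """Percent of observations strictly below `value`."""
    return 100 * sum(x < value for x in data) / len(data)

heights = [60, 62, 63, 65, 66, 67, 68, 70, 72, 75]  # made-up data
print(percent_below(heights, 72))  # 80.0, so 72 is at the 80th percentile
```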

z-score: how many standard deviations an observation is from the mean

68-95-99.7 Rule for

Normality

N(µ,σ)

N(0,1) Standard Normal
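The standardized score and the 68-95-99.7 rule can be checked with Python's standard library. This is a sketch: the helper names `z_score` and `within` are hypothetical, and the example values (mean 100, standard deviation 15) are made up.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

standard_normal = NormalDist(mu=0, sigma=1)  # N(0,1)

def within(k):
    """Proportion of a Normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

print(z_score(130, mu=100, sigma=15))   # 2.0
for k in (1, 2, 3):
    print(k, round(within(k), 3))       # 0.683, 0.954, 0.997
```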

r: correlation coefficient. The strength and direction of the linear relationship between two variables. Close to 1 or -1 is very close to linear.

r²: coefficient of determination. How well the model fits the data. Close to 1 is a good fit. “Percent of variation in y described by the LSRL on x”

residual = observed – predicted

ŷ = a + bx

Slope of LSRL (b): rate of change in predicted y for every unit increase in x

y-intercept of LSRL (a): predicted y when x = 0
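To tie r, r², residuals, and the LSRL together, here is a minimal sketch that computes them from the usual sums of squares. The function name `lsrl` and the data are made up for illustration; a calculator's LinReg(a+bx) should give the same a, b, and r.

```python
import statistics

def lsrl(x, y):
    """Return (a, b, r) for the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b = sxy / sxx                    # slope
    a = y_bar - b * x_bar            # y-intercept
    r = sxy / (sxx * syy) ** 0.5     # correlation coefficient
    return a, b, r

x = [1, 2, 3, 4, 5]   # made-up data
y = [2, 4, 5, 4, 5]
a, b, r = lsrl(x, y)

# residual = observed - predicted
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

print(round(a, 3), round(b, 3))      # 2.2 0.6
print(round(r, 3), round(r**2, 3))   # 0.775 0.6
print([round(e, 3) for e in residuals])
```

For this data r² = 0.6, so about 60% of the variation in y is described by the LSRL on x, and the residuals sum to zero (up to rounding), as they always do for a least-squares line.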

Exponential Model: y = ab^x; take log of y

Power Model: y = ax^b; take log of x and y
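Taking logs turns both models into straight lines: for the exponential model, log y = log a + x log b; for the power model, log y = log a + b log x. The sketch below fits each model by running a least-squares line on the transformed data. The function names and the data are made up, and base-10 logs are an assumption (natural logs work equally well).

```python
import math

def fit_line(x, y):
    """Least-squares (intercept, slope) for y against x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope

def fit_exponential(x, y):
    """Fit y = a * b**x by regressing log10(y) on x."""
    intercept, slope = fit_line(x, [math.log10(v) for v in y])
    return 10 ** intercept, 10 ** slope      # a, b

def fit_power(x, y):
    """Fit y = a * x**b by regressing log10(y) on log10(x)."""
    intercept, slope = fit_line([math.log10(v) for v in x],
                                [math.log10(v) for v in y])
    return 10 ** intercept, slope            # a, b

# Made-up data generated from exact models, so the fits recover a and b.
x = [1, 2, 3, 4, 5]
exp_a, exp_b = fit_exponential(x, [3 * 2 ** xi for xi in x])   # y = 3(2^x)
pow_a, pow_b = fit_power(x, [2 * xi ** 3 for xi in x])         # y = 2x^3
print(round(exp_a, 6), round(exp_b, 6))
print(round(pow_a, 6), round(pow_b, 6))
```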

Explanatory variables explain changes in response variables.

EV: x, independent

RV: y, dependent

Lurking Variable: A variable that may influence the relationship between two variables. The LV is not among the EVs.

Confounding: two variables are confounded when their effects on an RV cannot be distinguished from each other.

Part II: Designing Experiments and Collecting Data:

Sampling Methods:

The Bad:

Voluntary sample. A voluntary sample is made up of people who decide for themselves to be in the survey.

Example: Online poll

Convenience sample. A convenience sample is made up of people who are easy to reach.

Example: interview people at the mall, or in the cafeteria because it is an easy place to reach people.

The Good:

Simple random sampling. Simple random sampling refers to a method in which all possible samples of n objects are equally likely to occur.

Example: assign a number 1-100 to all members of a population of size 100. One number is selected at a time from a list of random digits or using a random number generator. The first 10 selected are the sample.
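The SRS example can be sketched with Python's random number generator. The function name `simple_random_sample` and the `seed` parameter are assumptions added so the draw can be repeated.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that every sample of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

population = list(range(1, 101))   # members numbered 1-100
sample = simple_random_sample(population, 10, seed=1)
print(sorted(sample))
```

`random.sample` draws without replacement, so no member can be chosen twice.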

Stratified sampling. With stratified sampling, the population is divided into groups, based on some characteristic. Then, within each group, a SRS is taken. In stratified sampling, the groups are called strata.

Example: For a national survey we divide the population into groups or strata, based on geography - north, east, south, and west. Then, within each stratum, we might randomly select survey respondents.
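A minimal sketch of the geographic example, assuming each stratum is already available as a list. The function name `stratified_sample`, the stratum sizes, and the member labels are made up.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take an SRS of size n_per_stratum from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical population: 50 people in each of four regions.
strata = {region: [f"{region}-{i}" for i in range(50)]
          for region in ("north", "east", "south", "west")}

sample = stratified_sample(strata, n_per_stratum=5, seed=2)
for region, chosen in sample.items():
    print(region, chosen)
```

Every stratum contributes to the sample, which is the feature that separates stratified sampling from cluster sampling.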

Cluster sampling. With cluster sampling, every member of the population is assigned to one, and only one, group. Each group is called a cluster. A sample of clusters is chosen using a SRS. Only individuals within sampled clusters are surveyed.

Example: Randomly choose high schools in the country and only survey people in those schools.

Difference between cluster sampling and stratified sampling. With stratified sampling, the sample includes subjects from each stratum. With cluster sampling the sample includes subjects only from sampled clusters.
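For contrast with stratified sampling, here is a sketch of cluster sampling in which whole clusters are chosen by SRS and everyone in a chosen cluster is surveyed. The function name `cluster_sample` and the school data are made up.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Choose n_clusters clusters by SRS and keep every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical population: 8 schools with 30 students each.
schools = {f"school_{i}": [f"school_{i}_student_{j}" for j in range(30)]
           for i in range(8)}

sample = cluster_sample(schools, n_clusters=2, seed=3)
print(list(sample))                           # the two sampled schools
print(sum(len(s) for s in sample.values()))   # 60 students surveyed
```

Only the sampled schools appear in the result; the other six contribute no subjects at all.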

Multistage sampling. With multistage sampling, we select a sample by using combinations of different sampling methods.

Example: Stage 1, use cluster sampling to choose clusters from a population. Then, in Stage 2, we use simple random sampling to select a subset of subjects from each chosen cluster for the final sample.

Systematic random sampling. With systematic random sampling, we create a list of every member of the population. From the list, we randomly select the first sample element from the first k subjects on the population list. Thereafter, we select every kth subject on the list.

Example: Select every 5th person on a list of the population.
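Systematic random sampling can be sketched with a random starting point followed by a fixed step. The function name `systematic_sample` and the numbered population are made up for illustration.

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a random start among the first k subjects, then take every kth subject."""
    rng = random.Random(seed)
    start = rng.randrange(k)        # index 0 to k-1
    return population[start::k]

population = list(range(1, 101))    # a list of 100 people, numbered 1-100
sample = systematic_sample(population, k=5, seed=7)
print(sample)
```

With 100 people and k = 5, the sample always has 20 members spaced exactly 5 apart; only the starting point is random.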

Experimental Design:

A well-designed experiment includes design features that allow researchers to eliminate extraneous variables as an explanation for the observed relationship between the independent variable(s) and the dependent variable.

Experimental Unit or Subject: The individuals on which the experiment is done. If they are people, then we call them subjects.

Factor: The explanatory variables in the study.

Level: The degree or value of each factor.

Treatment: The condition applied to the subjects. When there is one factor, the treatments and the levels are the same.

Control. Control refers to steps taken to reduce the effects of other variables (i.e., variables other than the independent variable and the dependent variable). These variables are called lurking variables.

Control involves making the experiment as similar as possible for subjects in each treatment condition. Three control strategies are control groups, placebos, and blinding.

Control group. A control group is a group that receives no treatment.

Placebo. A fake or dummy treatment.

Blinding: Not telling subjects whether they receive the placebo or the treatment.

Double blinding: Neither the researchers nor the subjects know who gets the treatment or the placebo.

Randomization. Randomization refers to the practice of using chance methods (random number tables, flipping a coin, etc.) to assign subjects to treatments.

Replication. Replication refers to the practice of assigning each treatment to many experimental subjects.

Bias: when a method systematically favors one outcome over another.

Types of design:

Completely randomized design. With this design, subjects are randomly assigned to treatments.
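A completely randomized design can be sketched by shuffling the subjects and dealing them out to the treatments in turn. The function name `completely_randomized`, the subject labels, and the treatment names are made up; equal group sizes are a design choice of this sketch, not a requirement.

```python
import random

def completely_randomized(subjects, treatments, seed=None):
    """Shuffle all subjects, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

subjects = [f"subject_{i}" for i in range(1, 21)]    # 20 hypothetical subjects
groups = completely_randomized(subjects, ["treatment", "placebo"], seed=4)
for name, members in groups.items():
    print(name, len(members), members)
```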

Randomized block design. With this design, the experimenter divides subjects into subgroups called blocks. Then, subjects within each block are randomly assigned to treatment conditions. Because this design reduces variability and potential confounding, it produces a better estimate of treatment effects.
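In a randomized block design the random assignment is carried out separately inside each block. The sketch below assumes the blocks are already formed; the function name `randomized_block`, the blocking variable (age group), and the labels are made up.

```python
import random

def randomized_block(blocks, treatments, seed=None):
    """Randomly assign subjects to treatments separately within each block."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, members in blocks.items():
        shuffled = list(members)
        rng.shuffle(shuffled)
        # Deal the shuffled block out to the treatments in turn.
        assignment[block_name] = {
            t: shuffled[i::len(treatments)] for i, t in enumerate(treatments)
        }
    return assignment

# Hypothetical blocks formed on age group, 6 subjects each.
blocks = {
    "under_40": [f"U{i}" for i in range(1, 7)],
    "40_and_over": [f"O{i}" for i in range(1, 7)],
}
design = randomized_block(blocks, ["treatment", "placebo"], seed=5)
for block_name, groups in design.items():
    print(block_name, groups)
```

Each block ends up split evenly between the treatments, so the blocking variable cannot be confounded with the treatment.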

Matched pairs design is a special case of the randomized block design. It is used when the experiment has only two treatment conditions, and subjects can be grouped into pairs, based on some blocking variable. Then, within each pair, subjects are randomly assigned to different treatments.
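A matched pairs assignment amounts to a coin flip inside each pair. This sketch assumes the pairs have already been formed on the blocking variable; the function name `matched_pairs`, the subject labels, and the treatment names are made up.

```python
import random

def matched_pairs(pairs, treatments=("A", "B"), seed=None):
    """Within each pair, randomly decide which subject gets which treatment."""
    rng = random.Random(seed)
    assignments = []
    for first, second in pairs:
        order = list(treatments)
        rng.shuffle(order)          # the "coin flip" for this pair
        assignments.append({first: order[0], second: order[1]})
    return assignments

# Hypothetical pairs, already matched on some blocking variable.
pairs = [("Ann", "Bea"), ("Cal", "Dov"), ("Eve", "Fay")]
for pair in matched_pairs(pairs, seed=6):
    print(pair)
```

Each pair always receives both treatments; only the order within the pair is left to chance.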

Part II in Pictures: Sampling Methods

Simple Random Sample: Every group of n objects has an equal chance of being selected.

Stratified Random Sampling: Break population into strata (groups), then take an SRS of each group.

Cluster Sampling: Randomly select clusters, then take all members in the cluster as the sample.

Systematic Random Sampling: Select a sample using a system, like selecting every third subject.

Experimental Design (diagrams not reproduced): Completely Randomized Design, Randomized Block Design, Matched Pairs Design

FAQs

What is the formula for the Interquartile Range (IQR)?

The Interquartile Range (IQR) is calculated using the formula IQR = Q3 – Q1. This measure helps to identify the spread of the middle 50% of the data points in a dataset, providing insights into its variability.

How do you test for outliers in a dataset?

To test for outliers, compute the IQR and check each data point against the cutoffs Q1 – 1.5(IQR) and Q3 + 1.5(IQR). A data point that lies more than 1.5(IQR) above Q3 or below Q1 is considered an outlier. This method is commonly used to flag unusual values for a closer look.

What does the 68-95-99.7 rule indicate in statistics?

The 68-95-99.7 rule, also known as the empirical rule, states that for a normal distribution, approximately 68% of the data falls within one standard deviation of the mean, about 95% falls within two standard deviations, and around 99.7% falls within three standard deviations. This rule is essential for understanding the distribution of data in statistics.

What is the difference between stratified sampling and cluster sampling?

Stratified sampling involves dividing the population into groups, or strata, based on certain characteristics, and then taking a simple random sample from each group. In contrast, cluster sampling assigns every member of the population to one group, or cluster, and then randomly selects entire clusters for sampling. This distinction is crucial for designing effective surveys.

What is the significance of the correlation coefficient (r)?

The correlation coefficient (r) measures the strength of the linear relationship between two variables. Values close to 1 or -1 indicate a strong linear relationship, while values near 0 suggest a weak relationship. Understanding this coefficient is vital for interpreting data relationships in statistics.

What is the purpose of randomization in experimental design?

Randomization is used in experimental design to assign subjects to treatments using chance methods. This practice helps eliminate bias and ensures that the treatment groups are comparable, which is essential for the validity of the experiment's results.

How is a residual calculated in regression analysis?

In regression analysis, a residual is calculated as the difference between the observed value and the predicted value. It is expressed as residual = observed - predicted. Analyzing residuals helps assess the fit of the regression model and identify any patterns that may indicate issues with the model.