1.9 - Hypothesis Test for the Population Correlation Coefficient

There is one more point we haven't stressed yet in our discussion about the correlation coefficient r and the coefficient of determination \(R^{2}\) — namely, the two measures summarize the strength of a linear relationship in samples only. If we obtained a different sample, we would obtain different correlations, different \(R^{2}\) values, and therefore potentially different conclusions. As always, we want to draw conclusions about populations, not just samples. To do so, we either have to conduct a hypothesis test or calculate a confidence interval. In this section, we learn how to conduct a hypothesis test for the population correlation coefficient \(\rho\) (the Greek letter "rho").

In general, a researcher should use the hypothesis test for the population correlation \(\rho\) to learn of a linear association between two variables, when it isn't obvious which variable should be regarded as the response. Let's clarify this point with examples of two different research questions.

Consider evaluating whether or not a linear relationship exists between skin cancer mortality and latitude. We will see in Lesson 2 that we can perform either of the following tests:

  • t -test for testing \(H_{0} \colon \beta_{1}= 0\)
  • ANOVA F -test for testing \(H_{0} \colon \beta_{1}= 0\)

For this example, it is fairly obvious that latitude should be treated as the predictor variable and skin cancer mortality as the response.

By contrast, suppose we want to evaluate whether or not a linear relationship exists between a husband's age and his wife's age ( Husband and Wife data ). In this case, one could treat the husband's age as the response:

husband's age vs wife's age plot

...or one could treat the wife's age as the response:

wife's age vs husband's age plot

In cases such as these, we answer our research question concerning the existence of a linear relationship by using the t -test for testing the population correlation coefficient \(H_{0}\colon \rho = 0\).

Let's jump right to it! We follow standard hypothesis test procedures in conducting a hypothesis test for the population correlation coefficient \(\rho\).

Steps for Hypothesis Testing for \(\boldsymbol{\rho}\)

Step 1: Hypotheses

First, we specify the null and alternative hypotheses:

  • Null hypothesis \(H_{0} \colon \rho = 0\)
  • Alternative hypothesis \(H_{A} \colon \rho ≠ 0\) or \(H_{A} \colon \rho < 0\) or \(H_{A} \colon \rho > 0\)

Step 2: Test Statistic

Second, we calculate the value of the test statistic using the following formula:

Test statistic:  \(t^*=\dfrac{r\sqrt{n-2}}{\sqrt{1-R^2}}\) 

Step 3: P-Value

Third, we use the resulting test statistic to calculate the P -value. As always, the P -value is the answer to the question "how likely is it that we’d get a test statistic t* as extreme as we did if the null hypothesis were true?" The P -value is determined by referring to a t- distribution with n -2 degrees of freedom.

Step 4: Decision

Finally, we make a decision:

  • If the P -value is smaller than the significance level \(\alpha\), we reject the null hypothesis in favor of the alternative. We conclude that "there is sufficient evidence at the \(\alpha\) level to conclude that there is a linear relationship in the population between the predictor x and response y."
  • If the P -value is larger than the significance level \(\alpha\), we fail to reject the null hypothesis. We conclude "there is not enough evidence at the \(\alpha\) level to conclude that there is a linear relationship in the population between the predictor x and response y."

Example 1-5: Husband and Wife Data

Let's perform the hypothesis test on the husband's age and wife's age data in which the sample correlation based on n = 170 couples is r = 0.939. To test \(H_{0} \colon \rho = 0\) against the alternative \(H_{A} \colon \rho ≠ 0\), we obtain the following test statistic:

\begin{align} t^*&=\dfrac{r\sqrt{n-2}}{\sqrt{1-R^2}}\\ &=\dfrac{0.939\sqrt{170-2}}{\sqrt{1-0.939^2}}\\ &=35.39\end{align}

To obtain the P -value, we need to compare the test statistic to a t -distribution with 168 degrees of freedom (since 170 - 2 = 168). In particular, we need to find the probability that we'd observe a test statistic more extreme than 35.39, and then, since we're conducting a two-sided test, multiply the probability by 2. Minitab helps us out here:

Student's t distribution with 168 DF

The output tells us that the probability of getting a test-statistic smaller than 35.39 is greater than 0.999. Therefore, the probability of getting a test-statistic greater than 35.39 is less than 0.001. Multiplying by 2, we determine that the P-value is less than 0.002.

Since the P -value is small — smaller than 0.05, say — we can reject the null hypothesis. There is sufficient statistical evidence at the \(\alpha = 0.05\) level to conclude that there is a significant linear relationship between a husband's age and his wife's age.
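For readers who want to check this outside Minitab, here is a minimal R sketch of the same calculation, using only the summary values from this example (r = 0.939, n = 170):

    r <- 0.939
    n <- 170
    t_star <- r * sqrt(n - 2) / sqrt(1 - r^2)        # test statistic: 35.39
    p_value <- 2 * pt(abs(t_star), df = n - 2, lower.tail = FALSE)
    p_value                                          # far below 0.001, consistent with P < 0.002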

Incidentally, we can let statistical software like Minitab do all of the dirty work for us. In doing so, Minitab reports:

Correlation: WAge, HAge

Pearson correlation of WAge and HAge = 0.939

P-Value = 0.000

Final Note

One final note ... as always, we should clarify when it is okay to use the t -test for testing \(H_{0} \colon \rho = 0\). The guidelines are a straightforward extension of the "LINE" assumptions made for the simple linear regression model. It's okay:

  • When it is not obvious which variable is the response.
  • For each x , the y 's are normal with equal variances.
  • For each y , the x 's are normal with equal variances.
  • Either y can be considered a linear function of x , or x can be considered a linear function of y .
  • The ( x , y ) pairs are independent.


Pearson Correlation Coefficient (r) | Guide & Examples

Published on May 13, 2022 by Shaun Turney. Revised on February 10, 2024.

The Pearson correlation coefficient ( r ) is the most common way of measuring a linear correlation. It is a number between –1 and 1 that measures the strength and direction of the relationship between two variables.

Table of contents

  • What is the Pearson correlation coefficient?
  • Visualizing the Pearson correlation coefficient
  • When to use the Pearson correlation coefficient
  • Calculating the Pearson correlation coefficient
  • Testing for the significance of the Pearson correlation coefficient
  • Reporting the Pearson correlation coefficient
  • Frequently asked questions about the Pearson correlation coefficient

The Pearson correlation coefficient ( r ) is the most widely used correlation coefficient and is known by many names:

  • Pearson’s r
  • Bivariate correlation
  • Pearson product-moment correlation coefficient (PPMCC)
  • The correlation coefficient

The Pearson correlation coefficient is a descriptive statistic , meaning that it summarizes the characteristics of a dataset. Specifically, it describes the strength and direction of the linear relationship between two quantitative variables.

Although interpretations of the relationship strength (also known as effect size ) vary between disciplines, general rules of thumb are: an | r | between .1 and .3 is considered a small or weak correlation, between .3 and .5 a medium or moderate correlation, and above .5 a large or strong correlation.

The Pearson correlation coefficient is also an inferential statistic , meaning that it can be used to test statistical hypotheses . Specifically, we can test whether there is a significant relationship between two variables.


Another way to think of the Pearson correlation coefficient ( r ) is as a measure of how close the observations are to a line of best fit .

The Pearson correlation coefficient also tells you whether the slope of the line of best fit is negative or positive. When the slope is negative, r is negative. When the slope is positive, r is positive.

When r is 1 or –1, all the points fall exactly on the line of best fit:

Perfect positive correlation and perfect negative correlation

When r is greater than .5 or less than –.5, the points are close to the line of best fit:

Strong positive correlation and strong negative correlation

When r is between 0 and .3 or between 0 and –.3, the points are far from the line of best fit:

Low positive correlation and low negative correlation

When r is 0, a line of best fit is not helpful in describing the relationship between the variables:

Zero correlation

The Pearson correlation coefficient ( r ) is one of several correlation coefficients that you need to choose between when you want to measure a correlation. The Pearson correlation coefficient is a good choice when all of the following are true:

  • Both variables are quantitative : You will need to use a different method if either of the variables is qualitative .
  • The variables are normally distributed : You can create a histogram of each variable to verify whether the distributions are approximately normal. It’s not a problem if the variables are a little non-normal.
  • The data have no outliers : Outliers are observations that don’t follow the same patterns as the rest of the data. A scatterplot is one way to check for outliers—look for points that are far away from the others.
  • The relationship is linear: “Linear” means that the relationship between the two variables can be described reasonably well by a straight line. You can use a scatterplot to check whether the relationship between two variables is linear.

Pearson vs. Spearman’s rank correlation coefficients

Spearman’s rank correlation coefficient is another widely used correlation coefficient. It’s a better choice than the Pearson correlation coefficient when one or more of the following is true:

  • The variables are ordinal .
  • The variables aren’t normally distributed .
  • The data includes outliers.
  • The relationship between the variables is non-linear and monotonic.

Below is a formula for calculating the Pearson correlation coefficient ( r ):

\begin{equation*} r = \frac{n\sum{xy}-(\sum{x})(\sum{y})}{\sqrt{[n\sum{x^2}-(\sum{x})^2][n\sum{y^2}-(\sum{y})^2]}} \end{equation*}

The formula is easy to use when you follow the step-by-step guide below. You can also use software such as R or Excel to calculate the Pearson correlation coefficient for you.

Step 1: Calculate the sums of x and y

Start by renaming the variables to “ x ” and “ y .” It doesn’t matter which variable is called x and which is called y —the formula will give the same answer either way.

Next, add up the values of x and y . (In the formula, this step is indicated by the Σ symbol, which means “take the sum of”.)

Σ x = 3.63 + 3.02 + 3.82 + 3.42 + 3.59 + 2.87 + 3.03 + 3.46 + 3.36 + 3.30

Σ x = 33.5

Σ y = 53.1 + 49.7 + 48.4 + 54.2 + 54.9 + 43.7 + 47.2 + 45.2 + 54.4 + 50.4

Σ y = 501.2

Step 2: Calculate x² and y² and their sums

Create two new columns that contain the squares of x and y . Take the sums of the new columns.

Σ x² = 13.18 + 9.12 + 14.59 + 11.70 + 12.89 + 8.24 + 9.18 + 11.97 + 11.29 + 10.89

Σ x² = 113.05

Σ y² = 2 819.6 + 2 470.1 + 2 342.6 + 2 937.6 + 3 014.0 + 1 909.7 + 2 227.8 + 2 043.0 + 2 959.4 + 2 540.2

Σ y² = 25 264.0

Step 3: Calculate the cross product and its sum

In a final column, multiply together x and y (this is called the cross product). Take the sum of the new column.

Σ xy = 192.8 + 150.1 + 184.9 + 185.4 + 197.1 + 125.4 + 143.0 + 156.4 + 182.8 + 166.3

Σ xy = 1 684.2

Step 4: Calculate r

Use the formula and the numbers you calculated in the previous steps to find r .

n = 10

r = [10(1 684.2) - (33.5)(501.2)] / √{[10(113.05) - (33.5)²][10(25 264.0) - (501.2)²]}

r = 51.8 / √{(8.25)(1 438.56)}

r ≈ 0.48
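As a check, the same calculation can be sketched in R (the cor() function mentioned in the FAQ below does it directly); the vectors are the x and y values from Step 1:

    x <- c(3.63, 3.02, 3.82, 3.42, 3.59, 2.87, 3.03, 3.46, 3.36, 3.30)
    y <- c(53.1, 49.7, 48.4, 54.2, 54.9, 43.7, 47.2, 45.2, 54.4, 50.4)
    n <- length(x)                                   # 10
    # the formula, written out with the sums from Steps 1-3
    (n * sum(x * y) - sum(x) * sum(y)) /
      sqrt((n * sum(x^2) - sum(x)^2) * (n * sum(y^2) - sum(y)^2))
    cor(x, y)                                        # same result, about 0.48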


The Pearson correlation coefficient can also be used to test whether the relationship between two variables is significant .

The Pearson correlation of the sample is r . It is an estimate of rho ( ρ ), the Pearson correlation of the population . Knowing r and n (the sample size), we can infer whether ρ is significantly different from 0.

  • Null hypothesis ( H 0 ): ρ = 0
  • Alternative hypothesis ( H a ): ρ ≠ 0

To test the hypotheses , you can either use software such as R or Stata, or you can follow the four steps below.

Step 1: Calculate the t value

Calculate the t value (a test statistic ) using this formula:

\begin{equation*} t = \frac{r} {\sqrt{\dfrac{1-r^2}{n-2}}} \end{equation*}

Step 2: Find the critical value of t

You can find the critical value of t ( t* ) in a t table. To use the table, you need to know three things:

  • The degrees of freedom ( df ): For Pearson correlation tests, the formula is df = n – 2.
  • Significance level (α): By convention, the significance level is usually .05.
  • One-tailed or two-tailed: Most often, two-tailed is an appropriate choice for correlations.

Step 3: Compare the t value to the critical value

Determine if the absolute t value is greater than the critical value of t . “Absolute” means that if the t value is negative you should ignore the minus sign.

Step 4: Decide whether to reject the null hypothesis

  • If the t value is greater than the critical value, then the relationship is statistically significant ( p <  α ). The data allows you to reject the null hypothesis and provides support for the alternative hypothesis.
  • If the t value is less than the critical value, then the relationship is not statistically significant ( p >  α ). The data doesn’t allow you to reject the null hypothesis and doesn’t provide support for the alternative hypothesis.
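The same four steps can be sketched in R, reusing the x and y vectors defined in the calculation example above; cor.test(), mentioned in the FAQ below, carries out the whole test in one call:

    r <- cor(x, y)
    n <- length(x)
    t_value <- r / sqrt((1 - r^2) / (n - 2))         # Step 1: about 1.53
    t_crit <- qt(1 - 0.05/2, df = n - 2)             # Step 2: 2.306 for df = 8, two-tailed
    abs(t_value) > t_crit                            # Steps 3-4: FALSE, so not significant here
    cor.test(x, y)                                   # the same test in one call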

If you decide to include a Pearson correlation ( r ) in your paper or thesis, you should report it in your results section . You can follow these rules if you want to report statistics in APA Style :

  • You don’t need to provide a reference or formula since the Pearson correlation coefficient is a commonly used statistic.
  • You should italicize r when reporting its value.
  • You shouldn’t include a leading zero (a zero before the decimal point) since the Pearson correlation coefficient can’t be greater than one or less than negative one.
  • You should provide two significant digits after the decimal point.

When Pearson’s correlation coefficient is used as an inferential statistic (to test whether the relationship is significant), r is reported alongside its degrees of freedom and p value. The degrees of freedom are reported in parentheses beside r .


You should use the Pearson correlation coefficient when (1) the relationship is linear, (2) both variables are quantitative, (3) both variables are normally distributed, and (4) the data have no outliers.

You can use the cor() function to calculate the Pearson correlation coefficient in R. To test the significance of the correlation, you can use the cor.test() function.

You can use the PEARSON() function to calculate the Pearson correlation coefficient in Excel. If your variables are in columns A and B, then click any blank cell and type "=PEARSON(A:A,B:B)".

Excel has no function to directly test the significance of the correlation.


This means that we can state a null and alternative hypothesis for the population correlation ρ based on our predictions for a correlation. Let's look at how this works in an example: we want to know whether students' GPAs are related to how far they live from campus.

Now let's go through our hypothesis testing steps:

Step 1: State hypotheses and choose α level

Remember we're going to state hypotheses in terms of our population correlation ρ. In this example, we expect GPA to decrease as distance from campus increases. This means that we are making a directional hypothesis and using a 1-tailed test. It also means we expect to find a negative value of ρ, because that would indicate a negative relationship between GPA and distance from campus. So here are our hypotheses:

H 0 : ρ ≥ 0

H A : ρ < 0

We're making our predictions as a comparison with 0, because 0 would indicate no relationship. Note that if we were conducting a 2-tailed test, our hypotheses would be ρ = 0 for the null hypothesis and ρ not equal to 0 for the alternative hypothesis.

We'll use our conventional α = .05.

Step 2: Collect the sample

Our sample consists of n = 5 students, with a GPA and a distance from campus recorded for each.

Step 3: Calculate test statistic

For this example, we're going to calculate a Pearson r statistic. Recall the formula for Pearson r :

r = SP / √(SS X × SS Y )

The bottom of the formula requires us to calculate the sum of squares (SS) for each measure individually, and the top of the formula requires calculation of the sum of products of the two variables (SP).

We'll start with the SS terms. Remember the formula for SS is:

SS = Σ(X - X̄)²

We'll calculate this for both GPA and Distance. If you need a review of how to calculate SS, review Lab 9 . For our example, we get:

SS GPA = .58 and SS distance = 18.39

Now we need to calculate the SP term. Remember the formula for SP is:

SP = Σ(X - X̄)(Y - Ȳ)

If you need to review how to calculate the SP term, go to Lab 12 . For our example, we get SP = -.63

Plugging these SS and SP values into our r equation gives us r = -.19

Now we need to find our critical value of r using a table like we did for our z and t-tests. We'll need to know our degrees of freedom, because like t, the r distribution changes depending on the sample size. For r, df = n - 2. So for our example, we have df = 5 - 2 = 3. Now, with df = 3, α = .05, and a one-tailed test, we can find r critical in the Table of Pearson r values . This table is organized and used in the same way that the t-table is used.

Our r crit = .805. We write r crit (3) = -.805 (negative because we are doing a 1-tailed test looking for a negative relationship).

Step 4: Compare observed test statistic to critical test statistic and make a decision about H 0

Our r obs (3) = -.19 and r crit (3) = -.805

Since -.19 is not in the critical region that begins at -.805, we cannot reject the null. We must retain the null hypothesis and conclude that we have no evidence of a relationship between GPA and distance from campus.
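Here is a minimal R sketch of this test, assuming only the SS and SP values computed above (the variable names are illustrative):

    SS_gpa <- 0.58; SS_dist <- 18.39; SP <- -0.63; n <- 5
    r_obs <- SP / sqrt(SS_gpa * SS_dist)             # -0.19
    df <- n - 2
    t_crit <- qt(0.05, df)                           # lower-tail critical t for a 1-tailed test
    r_crit <- t_crit / sqrt(t_crit^2 + df)           # -0.805, matching the table value
    r_obs < r_crit                                   # FALSE: retain the null hypothesis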

Now try a few of these types of problems on your own. Show all four steps of hypothesis testing in your answer (some questions will require more for each step than others) and be sure to state hypotheses in terms of ρ.

(1) A high school counselor would like to know if there is a relationship between mathematical skill and verbal skill. A sample of n = 25 students is selected, and the counselor records achievement test scores in mathematics and English for each student. The Pearson correlation for this sample is r = +0.50. Do these data provide sufficient evidence for a real relationship in the population? Test at the .05 α level, two tails.

(2) It is well known that similarity in attitudes, beliefs, and interests plays an important role in interpersonal attraction. Thus, correlations for attitudes between married couples should be strong and positive. Suppose a researcher developed a questionnaire that measures how liberal or conservative one's attitudes are. Low scores indicate that the person has liberal attitudes, while high scores indicate conservatism. Here are the data from the study:

Couple A: Husband - 14, Wife - 11

Couple B: Husband - 7, Wife - 6

Couple C: Husband - 15, Wife - 18

Couple D: Husband - 7, Wife - 4

Couple E: Husband - 3, Wife - 1

Couple F: Husband - 9, Wife - 10

Couple G: Husband - 9, Wife - 5

Couple H: Husband - 3, Wife - 3

Test the researcher's hypothesis with α set at .05.

(3) A researcher believes that a person's belief in supernatural events (e.g., ghosts, ESP, etc) is related to their education level. For a sample of n = 30 people, he gives them a questionnaire that measures their belief in supernatural events (where a high score means they believe in more of these events) and asks them how many years of schooling they've had. He finds that SS beliefs = 10, SS schooling = 10, and SP = -8. With α = .01, test the researcher's hypothesis.

Using SPSS for Hypothesis Testing with Pearson r

We can also use SPSS to do a hypothesis test with Pearson r. We could calculate the Pearson r with SPSS and then look at the output to make our decision about H 0 . The output will give us a p value for our Pearson r (listed under Sig in the Output). We can compare this p value with alpha to determine if the p value is in the critical region.

Remember from Lab 12 that to calculate a Pearson r using SPSS, you choose Analyze > Correlate > Bivariate, move both variables into the Variables box, and click OK.

The output that you get is a correlation matrix. It correlates each variable against each variable (including itself). You should notice that the table has redundant information on it (e.g., you'll find an r for height correlated with weight, and an r for weight correlated with height. These two statements are identical.)

Pearson correlation

This page offers all the basic information you need about the Pearson correlation coefficient and its significance test and confidence interval. It is part of Statkat’s wiki module, containing similarly structured info pages for many different statistical methods. The info pages give information about null and alternative hypotheses, assumptions, test statistics and confidence intervals, how to find p values, SPSS how-to’s and more.

To compare the Pearson correlation coefficient with other statistical methods, go to Statkat's Comparison tool or practice with the Pearson correlation coefficient at Statkat's Practice question center

  • 1. When to use
  • 2. Null hypothesis
  • 3. Alternative hypothesis
  • 4. Assumptions of test for correlation
  • 5. Test statistic
  • 6. Sampling distribution
  • 7. Significant?
  • 8. Approximate $C$% confidence interval for $\rho$
  • 9. Properties of the Pearson correlation coefficient
  • 10. Equivalent to
  • 11. Example context

When to use?

Note that theoretically, it is always possible to 'downgrade' the measurement level of a variable. For instance, a test that can be performed on a variable of ordinal measurement level can also be performed on a variable of interval measurement level, in which case the interval variable is downgraded to an ordinal variable. However, downgrading the measurement level of variables is generally a bad idea since it means you are throwing away important information in your data (an exception is the downgrade from ratio to interval level, which is generally irrelevant in data analysis).

If you are not sure which method you should use, you might like the assistance of our method selection tool or our method selection table .

Null hypothesis

The test for the Pearson correlation coefficient tests the following null hypothesis (H 0 ):

  • H 0 : $\rho = \rho_0$, where $\rho_0$ is the population correlation according to the null hypothesis (usually $\rho_0 = 0$)

Alternative hypothesis

The test for the Pearson correlation coefficient tests the above null hypothesis against the following alternative hypothesis (H 1 or H a ):

  • H 1 two sided: $\rho \neq \rho_0$
  • H 1 right sided: $\rho > \rho_0$
  • H 1 left sided: $\rho < \rho_0$

Assumptions of test for correlation

Statistical tests always make assumptions about the sampling procedure that was used to obtain the sample data. So-called parametric tests also make assumptions about how data are distributed in the population. Non-parametric tests are more 'robust' and make no or less strict assumptions about population distributions, but are generally less powerful. Violation of assumptions may render the outcome of statistical tests useless, although violation of some assumptions (e.g. independence assumptions) are generally more problematic than violation of other assumptions (e.g. normality assumptions in combination with large samples).

The test for the Pearson correlation coefficient makes the following assumptions:

  • In the population, the two variables are jointly normally distributed (this covers the normality, homoscedasticity, and linearity assumptions)
  • Sample of pairs is a simple random sample from the population of pairs. That is, pairs are independent of one another

Test statistic

The test for the Pearson correlation coefficient is based on the following test statistic:

  • $t = \dfrac{r \times \sqrt{N - 2}}{\sqrt{1 - r^2}}$, where $r$ is the sample correlation $r = \frac{1}{N - 1} \sum_{j}\Big(\frac{x_{j} - \bar{x}}{s_x} \Big) \Big(\frac{y_{j} - \bar{y}}{s_y} \Big)$ and $N$ is the sample size. This statistic is used when $H_0$ specifies $\rho_0 = 0$.
  • $z = (r_{Fisher} - \rho_{0_{Fisher}}) \times \sqrt{N - 3}$, where $r_{Fisher} = \dfrac{1}{2} \times \log\Bigg(\dfrac{1 + r}{1 - r} \Bigg)$ is the Fisher transformation of the sample correlation and $\rho_{0_{Fisher}} = \dfrac{1}{2} \times \log\Bigg( \dfrac{1 + \rho_0}{1 - \rho_0} \Bigg)$ is the Fisher transformation of $\rho_0$, the population correlation according to H0.

Sampling distribution

  • $t$ distribution with $N - 2$ degrees of freedom (for the $t$ statistic)
  • Approximately the standard normal distribution (for the $z$ statistic)

Significant?

This is how you find out if your test result is significant:

For the $t$ statistic:

  • Two sided: check if $t$ observed in sample is at least as extreme as critical value $t^*$, or find the two sided $p$ value corresponding to observed $t$ and check if it is equal to or smaller than $\alpha$
  • Right sided: check if $t$ observed in sample is equal to or larger than critical value $t^*$, or find the right sided $p$ value corresponding to observed $t$ and check if it is equal to or smaller than $\alpha$
  • Left sided: check if $t$ observed in sample is equal to or smaller than critical value $t^*$, or find the left sided $p$ value corresponding to observed $t$ and check if it is equal to or smaller than $\alpha$

For the $z$ statistic:

  • Two sided: check if $z$ observed in sample is at least as extreme as critical value $z^*$, or find the two sided $p$ value corresponding to observed $z$ and check if it is equal to or smaller than $\alpha$
  • Right sided: check if $z$ observed in sample is equal to or larger than critical value $z^*$, or find the right sided $p$ value corresponding to observed $z$ and check if it is equal to or smaller than $\alpha$
  • Left sided: check if $z$ observed in sample is equal to or smaller than critical value $z^*$, or find the left sided $p$ value corresponding to observed $z$ and check if it is equal to or smaller than $\alpha$

Approximate $C$% confidence interval for $\rho$

  • $lower_{Fisher} = r_{Fisher} - z^* \times \sqrt{\dfrac{1}{N - 3}}$
  • $upper_{Fisher} = r_{Fisher} + z^* \times \sqrt{\dfrac{1}{N - 3}}$
  • lower bound = $\dfrac{e^{2 \times lower_{Fisher}} - 1}{e^{2 \times lower_{Fisher}} + 1}$
  • upper bound = $\dfrac{e^{2 \times upper_{Fisher}} - 1}{e^{2 \times upper_{Fisher}} + 1}$
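A minimal R sketch of these formulas, using the husband-and-wife values from earlier in this document (r = 0.939, N = 170) purely as an illustration:

    r <- 0.939; N <- 170
    z_star <- qnorm(0.975)                           # critical z for a 95% interval
    r_fisher <- 0.5 * log((1 + r) / (1 - r))
    lower_f <- r_fisher - z_star * sqrt(1 / (N - 3))
    upper_f <- r_fisher + z_star * sqrt(1 / (N - 3))
    # transform the Fisher bounds back to the correlation scale
    c((exp(2 * lower_f) - 1) / (exp(2 * lower_f) + 1),
      (exp(2 * upper_f) - 1) / (exp(2 * upper_f) + 1))   # roughly 0.918 to 0.955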

Properties of the Pearson correlation coefficient

  • The Pearson correlation coefficient is a measure for the linear relationship between two quantitative variables.
  • The Pearson correlation coefficient squared reflects the proportion of variance explained in one variable by the other variable.
  • The Pearson correlation coefficient can take on values between -1 (perfect negative relationship) and 1 (perfect positive relationship). A value of 0 means no linear relationship.
  • The absolute size of the Pearson correlation coefficient is not affected by any linear transformation of the variables. However, the sign of the Pearson correlation will flip when the scores on one of the two variables are multiplied by a negative number (reversing the direction of measurement of that variable). For example:
  • the correlation between $x$ and $y$ is equivalent to the correlation between $3x + 5$ and $2y - 6$.
  • the absolute value of the correlation between $x$ and $y$ is equivalent to the absolute value of the correlation between $-3x + 5$ and $2y - 6$. However, the signs of the two correlation coefficients will be in opposite directions, due to the multiplication of $x$ by $-3$.
  • The Pearson correlation coefficient does not say anything about causality.
  • The Pearson correlation coefficient is sensitive to outliers.

Equivalent to

The test for the Pearson correlation coefficient is equivalent to:

  • The OLS regression coefficient in the simple regression of $y$ on $x$, via $b_1 = r \times \frac{s_y}{s_x}$
  • Results significance test ($t$ and $p$ value) testing $H_0$: $\beta_1 = 0$ are equivalent to results significance test testing $H_0$: $\rho = 0$
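A quick R sketch of this equivalence, run on simulated data (the simulation and variable names are illustrative only):

    set.seed(1)
    x <- rnorm(20); y <- 0.5 * x + rnorm(20)
    summary(lm(y ~ x))$coefficients["x", ]           # t and p for H0: beta_1 = 0
    cor.test(x, y)                                   # identical t and p for H0: rho = 0
    cor(x, y) * sd(y) / sd(x)                        # equals the fitted slope b_1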

Example context

The test for the Pearson correlation coefficient could for instance be used to answer the question: is there a linear relationship between the height and weight of people?

How to compute the Pearson correlation coefficient in SPSS:

  • Analyze > Correlate > Bivariate
  • Put your two variables in the box below Variables

How to compute the Pearson correlation coefficient in jamovi :

  • Put your two variables in the white box at the right
  • Under Correlation Coefficients, select Pearson (selected by default)
  • Under Hypothesis, select your alternative hypothesis

SAS Tutorials: Pearson Correlation with PROC CORR


Sample Data Files

Our tutorials reference a dataset called "sample" in many examples. If you'd like to download the sample dataset to work through the examples, choose one of the files below:

  • Data definitions (*.pdf)
  • Data - Comma delimited (*.csv)
  • Data - Tab delimited (*.txt)
  • Data - Excel format (*.xlsx)
  • Data - SAS format (*.sas7bdat)
  • Data - SPSS format (*.sav)
  • SPSS Syntax (*.sps) Syntax to add variable labels, value labels, set variable types, and compute several recoded variables used in later tutorials.
  • SAS Syntax (*.sas) Syntax to read the CSV-format sample data and set variable labels and formats/value labels.

Pearson Correlation

The bivariate Pearson Correlation produces a sample correlation coefficient, r , which measures the strength and direction of linear relationships between pairs of continuous variables. By extension, the Pearson Correlation evaluates whether there is statistical evidence for a linear relationship among the same pairs of variables in the population, represented by a population correlation coefficient, ρ (“rho”). The Pearson Correlation is a parametric measure.

This measure is also known as:

  • Pearson’s correlation
  • Pearson product-moment correlation (PPMC)

Common Uses

The bivariate Pearson Correlation is commonly used to measure the following:

  • Correlations among pairs of variables
  • Correlations within and between sets of variables

The bivariate Pearson correlation indicates the following:

  • Whether a statistically significant linear relationship exists between two continuous variables
  • The strength of a linear relationship (i.e., how close the relationship is to being a perfectly straight line)
  • The direction of a linear relationship (increasing or decreasing)

Note: The bivariate Pearson Correlation cannot address non-linear relationships or relationships among categorical variables. If you wish to understand relationships that involve categorical variables and/or non-linear relationships, you will need to choose another measure of association.

Note: The bivariate Pearson Correlation only reveals associations among continuous variables. The bivariate Pearson Correlation does not provide any inferences about causation, no matter how large the correlation coefficient is.

Data Requirements

To use Pearson correlation, your data must meet the following requirements:

  • Two or more continuous variables (i.e., interval or ratio level)
  • Cases must have non-missing values on both variables
  • Linear relationship between the variables
  • Independent cases (i.e., independence of observations):
      • the values for all variables across cases are unrelated
      • for any case, the value for any variable cannot influence the value of any variable for other cases
      • no case can influence another case on any variable
      • the bivariate Pearson correlation coefficient and corresponding significance test are not robust when independence is violated
  • Bivariate normality:
      • each pair of variables is bivariately normally distributed
      • each pair of variables is bivariately normally distributed at all levels of the other variable(s)
      • this assumption ensures that the variables are linearly related; violations of this assumption may indicate that non-linear relationships among variables exist; linearity can be assessed visually using a scatterplot of the data
  • Random sample of data from the population
  • No outliers

The null hypothesis ( H 0 ) and alternative hypothesis ( H 1 ) of the significance test for correlation can be expressed in the following ways, depending on whether a one-tailed or two-tailed test is requested:

Two-tailed significance test:

H 0 : ρ = 0 ("the population correlation coefficient is 0; there is no association")
H 1 : ρ ≠ 0 ("the population correlation coefficient is not 0; a nonzero correlation could exist")

One-tailed significance test:

H 0 : ρ = 0 ("the population correlation coefficient is 0; there is no association")
H 1 : ρ > 0 ("the population correlation coefficient is greater than 0; a positive correlation could exist")
     OR
H 1 : ρ < 0 ("the population correlation coefficient is less than 0; a negative correlation could exist")

where ρ is the population correlation coefficient.

Test Statistic

The sample correlation coefficient between two variables x and y is denoted r or r xy , and can be computed as: $$ r_{xy} = \frac{\mathrm{cov}(x,y)}{\sqrt{\mathrm{var}(x)} \cdot \sqrt{\mathrm{var}(y)}} $$

where cov( x , y ) is the sample covariance of x and y ; var( x ) is the sample variance of x ; and var( y ) is the sample variance of y .
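As a quick numerical check of this definition in R (the vectors are illustrative):

    x <- c(1, 2, 4, 7); y <- c(2, 3, 9, 8)
    cov(x, y) / (sqrt(var(x)) * sqrt(var(y)))        # same value as...
    cor(x, y)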

Correlation can take on any value in the range [-1, 1]. The sign of the correlation coefficient indicates the direction of the relationship, while the magnitude of the correlation (how close it is to -1 or +1) indicates the strength of the relationship.

  •  -1 : perfectly negative linear relationship
  •   0 : no relationship
  • +1  : perfectly positive linear relationship

The strength can be assessed by these general guidelines [1] (which may vary by discipline):

  • .1 < | r | < .3 … small / weak correlation
  • .3 < | r | < .5 … medium / moderate correlation
  • .5 < | r | … large / strong correlation

Note: The direction and strength of a correlation are two distinct properties. The scatterplots below [2] show correlations that are r = +0.90, r = 0.00, and r = -0.90, respectively. The strength of the nonzero correlations is the same: 0.90. But the direction of the correlations is different: a negative correlation corresponds to a decreasing relationship, while a positive correlation corresponds to an increasing relationship.

Scatterplots of data with correlations r = +0.90, r = 0.00, and r = -0.90

Note that the r = 0.00 correlation has no discernable increasing or decreasing linear pattern in this particular graph. However, keep in mind that Pearson correlation is only capable of detecting linear associations, so it is possible to have a pair of variables with a strong nonlinear relationship and a small Pearson correlation coefficient. It is good practice to create scatterplots of your variables to corroborate your correlation coefficients.

[1]  Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.

[2]  Scatterplots created in R using ggplot2 , ggthemes::theme_tufte() , and MASS::mvrnorm() .

Data Set Up

Your data should include two or more continuous numeric variables.

Correlation with PROC CORR

The CORR procedure produces Pearson correlation coefficients of continuous numeric variables. The basic syntax of the CORR procedure is: 
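A minimal skeleton, with placeholder dataset and variable names (each statement is described below):

    PROC CORR DATA=dataset <options>;
        VAR variable1 variable2;
        WITH variable3 variable4;  /* optional */
    RUN;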

In the first line of the SAS code above, PROC CORR tells SAS to execute the CORR procedure on the dataset given in the DATA= argument. Immediately following PROC CORR is where you put any procedure-level options you want to include. Let’s review some of the more common options:

  • NOMISS Excludes observations with missing values on any of the analysis variables specified in the VAR or WITH statements (i.e., listwise exclusion).
  • PLOTS=MATRIX Creates a scatterplot matrix of the variables in the VAR and/or WITH statements.
  • PLOTS=MATRIX(HISTOGRAM) Same as above, but changes the panels on the diagonal of the scatterplot matrix to display histograms of the variables in the VAR statement. (The HISTOGRAM option is ignored if you include a WITH statement.)
  • PLOTS=SCATTER Creates individual scatterplots of the variables in the VAR and/or WITH statements.
  • PLOTS(MAXPOINTS=n)= <...> Used to increase the limit on the number of datapoints used in a plot to some number n . By default, n is 5000. Can be used in conjunction with any of the above options for MATRIX and SCATTER. If you have included PLOTS syntax in your script but do not see any plots in your output, check your log window; if you see the message WARNING: The scatter plot matrix with more than 5000 points has been suppressed. Use the PLOTS(MAXPOINTS= ) option in the PROC CORR statement to change or override the cutoff. then you should try revising the code to PLOTS(MAXPOINTS=15000)= and rerun.

On the next line, the VAR statement is where you specify all of the variables you want to compute pairwise correlations for. You can list as many variables as you want, with each variable separated by a space. If the VAR statement is not included, then SAS will include every numeric variable that does not appear in any other of the statements.

The WITH statement is optional, but is typically used if you only want to run correlations between certain combinations of variables. If both the VAR and WITH statements are used, each variable in the WITH statement will be correlated against each variable in the VAR statement.

When ODS graphics are turned on and you request plots from PROC CORR, each plot will be saved as a PNG file in the same directory where your SAS code is. If you run the same code multiple times, it will create new graphics files for each run (rather than overwriting the old ones).
  • SAS 9.2 Procedures Guide - PROC CORR
  • SAS 9.2 Procedures Guide - PROC CORR - CORR Statement Options

Example: Understanding the linear association between height and weight

Problem Statement

Perhaps you would like to test whether there is a statistically significant linear relationship between two continuous variables, weight and height (and by extension, infer whether the association is significant in the population). You can use a bivariate Pearson Correlation to test whether there is a statistically significant linear relationship between height and weight, and to determine the strength and direction of the association.

Before the Test

Before we look at the Pearson correlations, we should look at the scatterplots of our variables to get an idea of what to expect. In particular, we need to determine if it's reasonable to assume that our variables have linear relationships. PROC CORR automatically includes descriptive statistics (including mean, standard deviation, minimum, and maximum) for the input variables, and can optionally create scatterplots and/or scatterplot matrices. (Note that the plots require the ODS graphics system . If you are using SAS 9.3 or later, ODS is turned on by default.)

In the sample data, we will use two variables: “Height” and “Weight.” The variable “Height” is a continuous measure of height in inches and exhibits a range of values from 55.00 to 84.41. The variable “Weight” is a continuous measure of weight in pounds and exhibits a range of values from 101.71 to 350.07.

Running the Test

SAS Program
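A sketch of the program for this example, assuming the tutorial's sample dataset and the PLOTS=SCATTER option discussed above:

    PROC CORR DATA=sample PLOTS=SCATTER;
        VAR Weight Height;
    RUN;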

The first two tables tell us what variables were analyzed, and their descriptive statistics.

2 Variables: Weight Height

The third table contains the Pearson correlation coefficients and test results.


Notice that the correlations in the main diagonal (cells A and D) are all equal to 1. This is because a variable is always perfectly correlated with itself. Notice, however, that the sample sizes are different in cell A ( n =376) versus cell D ( n =408). This is because of missing data -- there are more missing observations for the variable Weight than for the variable Height.

The important cells we want to look at are either B or C. (Cells B and C are identical, because they include information about the same pair of variables.) Cells B and C contain the correlation coefficient itself, its p-value, and the number of complete pairwise observations that the calculation was based on.

In cell B (repeated in cell C), we can see that the Pearson correlation coefficient for height and weight is .513, which is significant ( p < .001 for a two-tailed test), based on 354 complete observations (i.e., cases with nonmissing values for both height and weight).

If you used the PLOTS=SCATTER option in the PROC CORR statement, you will see a scatter plot:

Scatterplot of Weight and Height

Decision and Conclusions

Based on the results, we can state the following:

  • Weight and height have a statistically significant linear relationship ( r = 0.51, p < .001).
  • The direction of the relationship is positive (i.e., height and weight are positively correlated), meaning that these variables tend to increase together (i.e., greater height is associated with greater weight).
  • The magnitude, or strength, of the association is moderate (.3 < | r | < .5).


Pearson Correlation Coefficient


Why is Pearson’s correlation used?

Pearson Correlation Coefficient is typically used to describe the strength of the linear relationship between two quantitative variables. Often, these two variables are designated X (predictor) and Y (outcome). Pearson’s r has values that range from −1.00 to +1.00. The sign of r provides information about the direction of the relationship between X and Y. A positive correlation indicates that as scores on X increase, scores on Y also tend to increase; a negative correlation indicates that as scores on X increase, scores on Y tend to decrease; and a correlation near 0 indicates that as scores on X increase, scores on Y neither increase nor decrease in a linear manner.

What does the Correlation coefficient tell you?

The absolute magnitude of Pearson’s r provides information about the strength of the linear association between scores on X and Y. For values of r close to 0, there is no linear association between X and Y. When r = +1.00, there is a perfect positive linear association; when r = −1.00, there is a perfect negative linear association. Intermediate values of r correspond to intermediate strength of the relationship. Figures 7.2 through 7.5 show examples of data for which the correlations are r = +.75, r = +.50, r = +.23, and r = .00.

Pearson’s r is a standardized or unit-free index of the strength of the linear relationship between two variables. No matter what units are used to express the scores on the X and Y variables, the possible values of Pearson’s r range from –1 (a perfect negative linear relationship) to +1 (a perfect positive linear relationship).

How do you explain correlation analysis?

Consider, for example, a correlation between height and weight. Height could be measured in inches, centimeters, or feet; weight could be measured in ounces, pounds, or kilograms. When we correlate scores on height and weight for a given sample of people, the correlation has the same value no matter which of these units are used to measure height and weight. This happens because the scores on X and Y are converted to z scores (i.e., they are converted to unit-free or standardized distances from their means) during the computation of Pearson’s r.

What is the null hypothesis for Pearson correlation?

A correlation coefficient may be tested to determine whether the coefficient significantly differs from zero. The value r is obtained on a sample. The value rho (ρ) is the population's correlation coefficient. It is hoped that r closely approximates rho. The null and alternative hypotheses are as follows:

H 0 : ρ = 0

H 1 : ρ ≠ 0

What are the assumptions for Pearson correlation?

The assumptions for the Pearson correlation coefficient are as follows: level of measurement, related pairs, absence of outliers, normality of variables, linearity, and homoscedasticity.

Linear Relationship

When using the Pearson correlation coefficient, it is assumed that the cluster of points is the best fit by a straight line.

Homoscedasticity

A second assumption of the correlation coefficient is that of homoscedasticity. This assumption is met if the distance from the points to the line is relatively equal all along the line.

If the normality assumption is violated, you can use the Spearman rho correlation test instead.


12.4 Testing the Significance of the Correlation Coefficient

The correlation coefficient, r , tells us about the strength and direction of the linear relationship between x and y . However, the reliability of the linear model also depends on how many observed data points are in the sample. We need to look at both the value of the correlation coefficient r and the sample size n , together.

We perform a hypothesis test of the "significance of the correlation coefficient" to decide whether the linear relationship in the sample data is strong enough to use to model the relationship in the population.

The sample data are used to compute r , the correlation coefficient for the sample. If we had data for the entire population, we could find the population correlation coefficient. But because we have only sample data, we cannot calculate the population correlation coefficient. The sample correlation coefficient, r , is our estimate of the unknown population correlation coefficient.

  • The symbol for the population correlation coefficient is ρ , the Greek letter "rho."
  • ρ = population correlation coefficient (unknown)
  • r = sample correlation coefficient (known; calculated from sample data)

The hypothesis test lets us decide whether the value of the population correlation coefficient ρ is "close to zero" or "significantly different from zero". We decide this based on the sample correlation coefficient r and the sample size n .

If the test concludes that the correlation coefficient is significantly different from zero, we say that the correlation coefficient is "significant."

  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is significantly different from zero.
  • What the conclusion means: There is a significant linear relationship between x and y . We can use the regression line to model the linear relationship between x and y in the population.

If the test concludes that the correlation coefficient is not significantly different from zero (it is close to zero), we say that correlation coefficient is "not significant".

  • Conclusion: "There is insufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is not significantly different from zero."
  • What the conclusion means: There is not a significant linear relationship between x and y . Therefore, we CANNOT use the regression line to model a linear relationship between x and y in the population.
  • If r is significant and the scatter plot shows a linear trend, the line can be used to predict the value of y for values of x that are within the domain of observed x values.
  • If r is not significant OR if the scatter plot does not show a linear trend, the line should not be used for prediction.
  • If r is significant and if the scatter plot shows a linear trend, the line may NOT be appropriate or reliable for prediction OUTSIDE the domain of observed x values in the data.

PERFORMING THE HYPOTHESIS TEST

  • Null Hypothesis: H 0 : ρ = 0
  • Alternate Hypothesis: H a : ρ ≠ 0

WHAT THE HYPOTHESES MEAN IN WORDS:

  • Null Hypothesis H 0 : The population correlation coefficient IS NOT significantly different from zero. There IS NOT a significant linear relationship (correlation) between x and y in the population.
  • Alternate Hypothesis H a : The population correlation coefficient IS significantly DIFFERENT FROM zero. There IS A SIGNIFICANT LINEAR RELATIONSHIP (correlation) between x and y in the population.

DRAWING A CONCLUSION: There are two methods of making the decision. The two methods are equivalent and give the same result.

  • Method 1: Using the p -value
  • Method 2: Using a table of critical values

In this chapter of this textbook, we will always use a significance level of 5%, α = 0.05

Using the p -value method, you could choose any appropriate significance level you want; you are not limited to using α = 0.05. But the table of critical values provided in this textbook assumes that we are using a significance level of 5%, α = 0.05. (If we wanted to use a different significance level than 5% with the critical value method, we would need different tables of critical values that are not provided in this textbook.)

METHOD 1: Using a p -value to make a decision

Using the TI-83, 83+, 84, 84+ Calculator

To calculate the p -value using LinRegTTEST: On the LinRegTTEST input screen, on the line prompt for β or ρ , highlight " ≠ 0 " The output screen shows the p-value on the line that reads "p =". (Most computer statistical software can calculate the p -value.)

  • If the p -value is less than the significance level ( α = 0.05):
  • Decision: Reject the null hypothesis.
  • Conclusion: "There is sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is significantly different from zero."
  • If the p -value is NOT less than the significance level ( α = 0.05):
  • Decision: DO NOT REJECT the null hypothesis.
  • Conclusion: "There is insufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is NOT significantly different from zero."
  • You will use technology to calculate the p -value. The following describes the calculations to compute the test statistics and the p -value:
  • The p -value is calculated using a t -distribution with n - 2 degrees of freedom.
  • The formula for the test statistic is \(t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}\). The value of the test statistic, t , is shown in the computer or calculator output along with the p -value. The test statistic t has the same sign as the correlation coefficient r .
  • The p -value is the combined area in both tails.

An alternative way to calculate the p -value (p) given by LinRegTTest is the command 2*tcdf(abs(t),10^99, n-2) in 2nd DISTR.
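An equivalent sketch in R, using the values from the third exam/final exam example that follows (r = 0.6631, n = 11):

    r <- 0.6631; n <- 11
    t <- r * sqrt(n - 2) / sqrt(1 - r^2)             # test statistic, same sign as r
    2 * pt(abs(t), df = n - 2, lower.tail = FALSE)   # two-tailed p-value, about 0.026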

  • Consider the third exam/final exam example .
  • The line of best fit is: ŷ = -173.51 + 4.83 x with r = 0.6631 and there are n = 11 data points.
  • Can the regression line be used for prediction? Given a third exam score ( x value), can we use the line to predict the final exam score (predicted y value)?
  • H 0 : ρ = 0
  • H a : ρ ≠ 0
  • The p -value is 0.026 (from LinRegTTest on your calculator or from computer software).
  • The p -value, 0.026, is less than the significance level of α = 0.05.
  • Decision: Reject the Null Hypothesis H 0
  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between the third exam score ( x ) and the final exam score ( y ) because the correlation coefficient is significantly different from zero.

Because r is significant and the scatter plot shows a linear trend, the regression line can be used to predict final exam scores.

METHOD 2: Using a table of Critical Values to make a decision

The 95% Critical Values of the Sample Correlation Coefficient Table can be used to give you a good idea of whether the computed value of r is significant or not . Compare r to the appropriate critical value in the table. If r is not between the positive and negative critical values, then the correlation coefficient is significant. If r is significant, then you may want to use the line for prediction.

Example 12.7

Suppose you computed r = 0.801 using n = 10 data points. df = n - 2 = 10 - 2 = 8. The critical values associated with df = 8 are -0.632 and + 0.632. If r < negative critical value or r > positive critical value, then r is significant. Since r = 0.801 and 0.801 > 0.632, r is significant and the line may be used for prediction. If you view this example on a number line, it will help you.
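The table value itself can be reproduced in R by converting the two-tailed critical t back to the r scale (a sketch using the standard identity r = t / √(t² + df)):

    df <- 8
    t_crit <- qt(1 - 0.05/2, df)                     # 2.306
    t_crit / sqrt(t_crit^2 + df)                     # 0.632, the critical value used above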

Try It 12.7

For a given line of best fit, you computed that r = 0.6501 using n = 12 data points and the critical value is 0.576. Can the line be used for prediction? Why or why not?

Example 12.8

Suppose you computed r = –0.624 with 14 data points. df = 14 – 2 = 12. The critical values are –0.532 and 0.532. Since –0.624 < –0.532, r is significant and the line can be used for prediction

Try It 12.8

For a given line of best fit, you compute that r = 0.5204 using n = 9 data points, and the critical value is 0.666. Can the line be used for prediction? Why or why not?

Example 12.9

Suppose you computed r = 0.776 and n = 6. df = 6 – 2 = 4. The critical values are –0.811 and 0.811. Since –0.811 < 0.776 < 0.811, r is not significant, and the line should not be used for prediction.

Try It 12.9

For a given line of best fit, you compute that r = –0.7204 using n = 8 data points, and the critical value is = 0.707. Can the line be used for prediction? Why or why not?

THIRD-EXAM vs FINAL-EXAM EXAMPLE: critical value method

Consider the third exam/final exam example . The line of best fit is: ŷ = –173.51+4.83 x with r = 0.6631 and there are n = 11 data points. Can the regression line be used for prediction? Given a third-exam score ( x value), can we use the line to predict the final exam score (predicted y value)?

  • Use the "95% Critical Value" table for r with df = n – 2 = 11 – 2 = 9.
  • The critical values are –0.602 and +0.602
  • Since 0.6631 > 0.602, r is significant.
  • Conclusion:There is sufficient evidence to conclude that there is a significant linear relationship between the third exam score ( x ) and the final exam score ( y ) because the correlation coefficient is significantly different from zero.

Example 12.10

Suppose you computed the following correlation coefficients. Using the table at the end of the chapter, determine if r is significant and the line of best fit associated with each r can be used to predict a y value. If it helps, draw a number line.

  • r = –0.567 and the sample size, n , is 19. The df = n – 2 = 17. The critical value is –0.456. –0.567 < –0.456 so r is significant.
  • r = 0.708 and the sample size, n , is nine. The df = n – 2 = 7. The critical value is 0.666. 0.708 > 0.666 so r is significant.
  • r = 0.134 and the sample size, n , is 14. The df = 14 – 2 = 12. The critical value is 0.532. 0.134 is between –0.532 and 0.532 so r is not significant.
  • r = 0 and the sample size, n , is five. No matter what the dfs are, r = 0 is between the two critical values so r is not significant.

Try It 12.10

For a given line of best fit, you compute that r = 0 using n = 100 data points. Can the line be used for prediction? Why or why not?

Assumptions in Testing the Significance of the Correlation Coefficient

Testing the significance of the correlation coefficient requires that certain assumptions about the data are satisfied. The premise of this test is that the data are a sample of observed points taken from a larger population. We have not examined the entire population because it is not possible or feasible to do so. We are examining the sample to draw a conclusion about whether the linear relationship that we see between x and y in the sample data provides strong enough evidence so that we can conclude that there is a linear relationship between x and y in the population.

The regression line equation that we calculate from the sample data gives the best-fit line for our particular sample. We want to use this best-fit line for the sample as an estimate of the best-fit line for the population. Examining the scatterplot and testing the significance of the correlation coefficient helps us determine if it is appropriate to do this.

  • There is a linear relationship in the population that models the average value of y for varying values of x . In other words, the expected value of y for each particular value lies on a straight line in the population. (We do not know the equation for the line for the population. Our regression line from the sample is our best estimate of this line in the population.)
  • The y values for any particular x value are normally distributed about the line. This implies that there are more y values scattered closer to the line than are scattered farther away. Assumption (1) implies that these normal distributions are centered on the line: the means of these normal distributions of y values lie on the line.
  • The standard deviations of the population y values about the line are equal for each value of x . In other words, each of these normal distributions of y values has the same shape and spread about the line.
  • The residual errors are mutually independent (no pattern).
  • The data are produced from a well-designed, random sample or randomized experiment.


Correlation and Regression with R


Correlation



Correlation is one of the most common statistics. Using a single value, it describes the "degree of relationship" between two variables. Correlation ranges from -1 to +1. Negative values of correlation indicate that as one variable increases the other variable decreases. Positive values of correlation indicate that as one variable increases the other variable increases as well. There are three options to calculate correlation in R, and we will introduce two of them below.

For a nice synopsis of correlation, see https://statistics.laerd.com/statistical-guides/pearson-correlation-coefficient-statistical-guide.php

The most commonly used type of correlation is Pearson correlation, named after Karl Pearson, who introduced this statistic around the turn of the 20th century. Pearson's r measures the linear relationship between two variables, say X and Y. A correlation of 1 indicates the data points perfectly lie on a line for which Y increases as X increases. A value of -1 also implies the data points lie on a line; however, Y decreases as X increases. The formula for r is

\[ r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} \]

(just as we distinguish between Ȳ and µ, we distinguish r from ρ)

 The Pearson correlation has two assumptions:

  • The two variables are normally distributed. We can test this assumption using a statistical test (Shapiro-Wilk) or a histogram (see the sketch after this list).
  • The relationship between the two variables is linear. If this relationship is found to be curved, etc., we need to use another correlation test. We can test this assumption by examining the scatterplot between the two variables.
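As a quick illustration of both checks in code, here is a minimal Python sketch (numpy, scipy, and matplotlib are assumed, and the data are invented purely for illustration):

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.normal(50, 10, size=100)          # hypothetical variable 1
y = 0.5 * x + rng.normal(0, 5, size=100)  # hypothetical variable 2

# Normality of each variable: Shapiro-Wilk (a small p-value suggests non-normality)
print(stats.shapiro(x))
print(stats.shapiro(y))

# Linearity: inspect the scatterplot
plt.scatter(x, y)
plt.show()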

To calculate Pearson correlation, we can use the cor() function. The default method for cor() is the Pearson correlation. Getting a correlation is generally only half the story, and you may want to know if the relationship is statistically significantly different from 0.

  • H 0 : There is no correlation between the two variables: ρ = 0
  • H a : There is a nonzero correlation between the two variables: ρ ≠ 0

To assess statistical significance, you can use the cor.test() function.

> cor(fat$age, fat$pctfat.brozek, method="pearson")

[1] 0.2891735

> cor.test(fat$age, fat$pctfat.brozek, method="pearson")

        Pearson's product-moment correlation

data:  fat$age and fat$pctfat.brozek

t = 4.7763, df = 250, p-value = 3.045e-06

alternative hypothesis: true correlation is not equal to 0

95 percent confidence interval:

 0.1717375 0.3985061

sample estimates:

      cor
0.2891735

When testing the null hypothesis that there is no correlation between age and Brozek percent body fat, we reject the null hypothesis (r = 0.289, t = 4.77, with 250 degrees of freedom, and a p-value = 3.045e-06). As age increases so does Brozek percent body fat. The 95% confidence interval for the correlation between age and Brozek percent body fat is (0.17, 0.40). Note that this 95% confidence interval does not contain 0, which is consistent with our decision to reject the null hypothesis.

Spearman's rank correlation is a nonparametric measure of the correlation that uses the rank of observations in its calculation, rather than the original numeric values. It measures the monotonic relationship between two variables X and Y. That is, if Y tends to increase as X increases, the Spearman correlation coefficient is positive. If Y tends to decrease as X increases, the Spearman correlation coefficient is negative. A value of zero indicates that there is no tendency for Y to either increase or decrease when X increases. The Spearman correlation measurement makes no assumptions about the distribution of the data.

The formula for Spearman's correlation ρ s is

\[ \rho_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} \]

where \(d_i\) is the difference between the ranks of the paired observations, \(\operatorname{rank}(x_i) - \operatorname{rank}(y_i)\), and n is the sample size. No need to memorize this formula!

> cor(fat$age,fat$pctfat.brozek, method="spearman")

[1] 0.2733830

> cor.test(fat$age,fat$pctfat.brozek, method="spearman")

        Spearman's rank correlation rho

S = 1937979, p-value = 1.071e-05

alternative hypothesis: true rho is not equal to 0

      rho
0.2733830

Thus we reject the null hypothesis that there is no (Spearman) correlation between age and Brozek percent fat (r = 0.27, p-value = 1.07e-05). As age increases so does percent body fat.
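To see the rank formula at work, here is a small Python sketch (toy data with no ties; numpy and scipy assumed) comparing the manual calculation with the library routine:

import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 1, 4, 3, 5])                  # toy data with no ties

rx, ry = stats.rankdata(x), stats.rankdata(y)  # rank each variable
d = rx - ry                                    # differences between paired ranks
n = len(x)

rho_manual = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))
rho_library = stats.spearmanr(x, y).statistic  # .statistic needs scipy >= 1.9

print(rho_manual, rho_library)                 # both 0.8; the formula is exact without ties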

Correlation, useful though it is, is one of the most misused statistics in all of science. People always seem to want a simple number describing a relationship. Yet data very, very rarely obey this imperative. It is clear what a Pearson correlation of 1 or -1 means, but how do we interpret a correlation of 0.4? It is not so clear.

To see how the Pearson measure depends on the data distribution assumptions (in particular linearity), observe the following deterministic relationship: \(y = x^2\). Here the relationship between x and y isn't just "correlated" in the colloquial sense; it is totally deterministic! Yet if we generate data for this relationship, the Pearson correlation is 0!

> x<-seq(-10,10, 1)

> y<-x*x

> plot(x,y)

> cor(x,y)

[1] 0



Pearson Product-Moment Correlation

What does this test do?

The Pearson product-moment correlation coefficient (or Pearson correlation coefficient, for short) is a measure of the strength of a linear association between two variables and is denoted by r . Basically, a Pearson product-moment correlation attempts to draw a line of best fit through the data of two variables, and the Pearson correlation coefficient, r , indicates how far away all these data points are from this line of best fit (i.e., how well the data points fit this new model/line of best fit).

What values can the Pearson correlation coefficient take?

The Pearson correlation coefficient, r , can take a range of values from +1 to -1. A value of 0 indicates that there is no association between the two variables. A value greater than 0 indicates a positive association; that is, as the value of one variable increases, so does the value of the other variable. A value less than 0 indicates a negative association; that is, as the value of one variable increases, the value of the other variable decreases. This is shown in the diagram below:

Pearson Coefficient - Different Values

How can we determine the strength of association based on the Pearson correlation coefficient?

The stronger the association of the two variables, the closer the Pearson correlation coefficient, r , will be to either +1 or -1 depending on whether the relationship is positive or negative, respectively. Achieving a value of +1 or -1 means that all your data points are included on the line of best fit – there are no data points that show any variation away from this line. Values for r between +1 and -1 (for example, r = 0.8 or -0.4) indicate that there is variation around the line of best fit. The closer the value of r to 0 the greater the variation around the line of best fit. Different relationships and their correlation coefficients are shown in the diagram below:

Different values for the Pearson Correlation Coefficient

Are there guidelines to interpreting Pearson's correlation coefficient?

Yes, the following guidelines have been proposed (commonly attributed to Cohen, 1988):

  • Small: r = .1 to .3 (or -.1 to -.3)
  • Medium: r = .3 to .5 (or -.3 to -.5)
  • Large: r = .5 to 1.0 (or -.5 to -1.0)

Remember that these values are guidelines and whether an association is strong or not will also depend on what you are measuring.

Can you use any type of variable for Pearson's correlation coefficient?

No, the two variables have to be measured on either an interval or ratio scale. However, both variables do not need to be measured on the same scale (e.g., one variable can be ratio and one can be interval). Further information about types of variable can be found in our Types of Variable guide. If you have ordinal data, you will want to use Spearman's rank-order correlation or Kendall's tau correlation instead of the Pearson product-moment correlation.

Do the two variables have to be measured in the same units?

No, the two variables can be measured in entirely different units. For example, you could correlate a person's age with their blood sugar levels. Here, the units are completely different; age is measured in years and blood sugar level measured in mmol/L (a measure of concentration). Indeed, the calculations for Pearson's correlation coefficient were designed such that the units of measurement do not affect the calculation. This allows the correlation coefficient to be comparable and not influenced by the units of the variables used.

What about dependent and independent variables?

The Pearson product-moment correlation does not take into consideration whether a variable has been classified as a dependent or independent variable. It treats all variables equally. For example, you might want to find out whether basketball performance is correlated to a person's height. You might, therefore, plot a graph of performance against height and calculate the Pearson correlation coefficient. Let's say, for example, that r = .67. That is, as height increases so does basketball performance. This makes sense. However, if we plotted the variables the other way around and wanted to determine whether a person's height was determined by their basketball performance (which makes no sense), we would still get r = .67. This is because the Pearson correlation coefficient makes no account of any theory behind why you chose the two variables to compare. This is illustrated below:

Not influenced by Dependent and Independent Variables

Does the Pearson correlation coefficient indicate the slope of the line?

It is important to realize that the Pearson correlation coefficient, r , does not represent the slope of the line of best fit. Therefore, if you get a Pearson correlation coefficient of +1 this does not mean that for every unit increase in one variable there is a unit increase in another. It simply means that there is no variation between the data points and the line of best fit. This is illustrated below:

The Pearson Coefficient does not indicate the slope of the line of best fit.
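A quick numerical illustration (a Python sketch with made-up values; numpy assumed) shows that r says nothing about slope:

import numpy as np

x = np.arange(10)
y1 = 2.0 * x + 1.0   # steep line
y2 = 0.1 * x - 5.0   # shallow line

# Both correlations are exactly 1: r measures scatter about the best-fit line,
# not how steep that line is.
print(np.corrcoef(x, y1)[0, 1], np.corrcoef(x, y2)[0, 1])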

What assumptions does Pearson's correlation make?

The first and most important step before analysing your data using Pearson’s correlation is to check whether it is appropriate to use this statistical test. After all, Pearson’s correlation will only give you valid/accurate results if your study design and data " pass/meet " seven assumptions that underpin Pearson’s correlation.

In many cases, Pearson’s correlation will be the incorrect statistical test to use because your data " violates/does not meet " one or more of these assumptions. This is not uncommon when working with real-world data, which is often "messy", as opposed to textbook examples. However, there is often a solution, whether this involves using a different statistical test , or making adjustments to your data so that you can continue to use Pearson’s correlation.

We briefly set out the seven assumptions below, three of which relate to your study design and how you measured your variables (i.e., Assumptions #1, #2 and #3 below), and four which relate to the characteristics of your data (i.e., Assumptions #4, #5, #6 and #7 below):

Note: We list seven assumptions below, but there is disagreement in the statistics literature whether the term "assumptions" should be used to describe all of these (e.g., see Nunnally, 1978). We highlight this point for transparency. However, we use the word "assumptions" to stress their importance and to indicate that they should be examined closely when using a Pearson’s correlation if you want accurate/valid results. We also use the word "assumptions" to indicate that where some of these are not met, Pearson’s correlation will no longer be the correct statistical test to analyse your data.

  • Assumption #1: Your two variables should be measured on a continuous scale (i.e., they are measured at the interval or ratio level). Examples of continuous variables include revision time (measured in hours), intelligence (measured using IQ score), exam performance (measured from 0 to 100), weight (measured in kg), driving speed (measured in km/h) and so forth.
  • Assumption #2: Your two continuous variables should be paired , which means that each case (e.g., each participant) has two values: one for each variable. These "values" are also referred to as "data points". For example, imagine that you had collected the revision times (measured in hours) and exam results (measured from 0 to 100) from 100 randomly sampled students at a university (i.e., you have two continuous variables: "revision time" and "exam performance"). Each of the 100 students would have a value for revision time (e.g., "student #1" studied for "23 hours") and an exam result (e.g., "student #1" scored "81 out of 100"). Therefore, you would have 100 paired values.

  • Assumption #3: There should be independence of cases, meaning that the two values for one case (e.g., one participant) should not be influenced by the two values of any other case.

Note: The independence of cases assumption is also known as the independence of observations assumption.

Since assumptions #1, #2 and #3 relate to your study design and how you measured your variables , if any of these three assumptions are not met (i.e., if any of these assumptions do not fit with your research), Pearson’s correlation is the incorrect statistical test to analyse your data. It is likely that there will be other statistical tests you can use instead, but Pearson’s correlation is not the correct test.

After checking if your study design and variables meet assumptions #1, #2 and #3 , you should now check if your data also meets assumptions #4, #5, #6 and #7 below. When checking if your data meets these four assumptions, do not be surprised if this process takes up the majority of the time you dedicate to carrying out your analysis. As we mentioned above, it is not uncommon for one or more of these assumptions to be violated (i.e., not met) when working with real-world data rather than textbook examples. However, with the right guidance this does not need to be a difficult process and there are often other statistical analysis techniques that you can carry out that will allow you to continue with your analysis.

Note: If your two continuous, paired variables (i.e., Assumptions #1 and 2) follow a bivariate normal distribution , there will be linearity, univariate normality and homoscedasticity (i.e., Assumptions #4, #5 and #6 below; e.g., Lindeman et al., 1980). Unfortunately, the assumption of bivariate normality is very difficult to test, which is why we focus on linearity and univariate normality instead. Homoscedasticity is also difficult to test, but we include this so that you know why it is important. We include outliers at the end (i.e., Assumption #7) because they cannot only lead to violations of the linearity and univariate normality assumptions, but they also have a large impact on the value of Pearson’s correlation coefficient, r (e.g., Wilcox, 2012).

Detecting a Linear Relationship.

Note: Pearson's correlation coefficient is a measure of the strength of a linear association between two variables. Put another way, it determines whether there is a linear component of association between two continuous variables. As such, linearity is not strictly an "assumption" of Pearson's correlation. However, you would not normally want to use Pearson's correlation to determine the strength and direction of a linear relationship when you already know the relationship between your two variables is not linear. Instead, the relationship between your two variables might be better described by another statistical measure (Cohen, 2013). For this reason, it is not uncommon to view the relationship between your two variables in a scatterplot to see if running a Pearson's correlation is the best choice as a measure of association or whether another measure would be better.

Note: The disagreements about the robustness of Pearson’s correlation are based on additional assumptions that are made to justify robustness under non-normality and whether these additional assumptions are likely to be true in practice. For further reading on this issue, see, for example, Edgell and Noon (1984) and Hogg and Craig (2014).

Homoscedasticity in Correlation.

Note: Outliers are not necessarily "bad", but due to the effect they have on the Pearson correlation coefficient, r , discussed on the next page , they need to be taken into account.

You can check whether your data meets assumptions #4, #5 and #7 using a number of statistics packages (to learn more, see our guides for: SPSS Statistics , Stata and Minitab ). If any of these seven assumptions are violated (i.e., not met), there are often other statistical analysis techniques that you can carry out that will allow you to continue with your analysis (e.g., see Shevlyakov and Oja, 2016).


Pearson Correlation

Pearson correlation analysis examines the relationship between two variables. For example, is there a correlation between a person's age and salary?


More specifically, we can use the Pearson correlation coefficient to measure the linear relationship between two variables.

Strength and direction of correlation

With a correlation analysis we can determine:

  • How strong the correlation is
  • and in which direction the correlation goes.

We can read the strength and direction of the correlation in the Pearson correlation coefficient r , whose value varies between -1 and 1.

Strength of the correlation

The strength of the correlation can be read from a table: an r between 0 and 0.1 (in absolute value) indicates no correlation, while an r between 0.7 and 1 indicates a very strong correlation.

Direction of the correlation

A positive relationship or correlation exists when large values of one variable are associated with large values of the other variable, or when small values of one variable are associated with small values of the other variable.

Positive Pearson correlation coefficient

A positive correlation results, for example, for height and shoe size. This results in a positive correlation coefficient.

positive correlation coefficient

A negative correlation is when large values of one variable are associated with small values of the other variable and vice versa.

negative Pearson correlation coefficient

A negative correlation is usually found between product price and sales volume. This results in a negative correlation coefficient.

negative correlation coefficient

Calculate Pearson correlation

The Pearson correlation coefficient is calculated using the following equation. Here r is the Pearson correlation coefficient, \(x_i\) are the individual values of one variable (e.g., age), \(y_i\) are the individual values of the other variable (e.g., salary), and \(\bar{x}\) and \(\bar{y}\) are the mean values of the two variables, respectively.

\[ r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}} \]

In the equation, we can see that the respective mean value is first subtracted from both variables.

So in our example, we calculate the mean values of age and salary. We then subtract the mean values from each of age and salary. We then multiply both values. We then sum up the individual results of the multiplication. The expression in the denominator ensures that the correlation coefficient is scaled between -1 and 1.

If we now multiply two positive values we get a positive value. If we multiply two negative values we also get a positive value (minus times minus is plus). So all values that lie in these ranges have a positive influence on the correlation coefficient.

Positive correlation Pearson correlation

If we multiply a positive value and a negative value we get a negative value (minus times plus is minus). So all values that are in these ranges have a negative influence on the correlation coefficient.

negative correlation Pearson correlation

Therefore, if our values are predominantly in the two green areas from the previous two figures, we get a positive correlation coefficient and therefore a positive correlation.

If our scores are predominantly in the two red areas from the figures, we get a negative correlation coefficient and thus a negative correlation.

If the points are distributed over all four areas, the positive terms and the negative terms cancel each other out and we might end up with a very small or no correlation.
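The computation described above translates directly into code; here is a minimal Python sketch (with invented age and salary values purely for illustration; numpy assumed):

import numpy as np

age = np.array([23, 31, 40, 48, 57, 63])                 # hypothetical ages
salary = np.array([2300, 2900, 4100, 4700, 5200, 5600])  # hypothetical salaries

dx = age - age.mean()        # subtract the mean of age
dy = salary - salary.mean()  # subtract the mean of salary

r = np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

print(r)                               # manual calculation
print(np.corrcoef(age, salary)[0, 1])  # library value, identical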

Testing correlation coefficients for significance

In general, the correlation coefficient is calculated using data from a sample. In most cases, however, we want to test a hypothesis about the population.

Pearson Correlation Sample

In the case of correlation analysis, we then want to know if there is a correlation in the population.

For this, we test whether the correlation coefficient in the sample is statistically significantly different from zero.

Hypotheses in the Pearson Correlation

The null hypothesis and the alternative hypothesis in Pearson correlation are thus:

  • Null hypothesis: The correlation coefficient is not significantly different from zero (There is no linear relationship).
  • Alternative hypothesis: The correlation coefficient deviates significantly from zero (there is a linear correlation).

Attention: what is always tested is whether the null hypothesis is rejected or not rejected.

In our example with the salary and the age of a person, we could thus have the question: Is there a correlation between age and salary in the German population (the population)?

To find out, we draw a sample and test whether the correlation coefficient is significantly different from zero in this sample.

  • The null hypothesis is then: There is no correlation between salary and age in the German population.
  • and the alternative hypothesis: There is a correlation between salary and age in the German population.

Significance and the t-test

Whether the Pearson correlation coefficient is significantly different from zero, based on the sample surveyed, can be checked using a t-test, where r is the correlation coefficient and n is the sample size:

\[ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \]

A p-value can then be calculated from the test statistic t . If the p-value is smaller than the specified significance level, which is usually 5%, then the null hypothesis is rejected, otherwise it is not.

Assumptions of the Pearson correlation

But what about the assumptions for a Pearson correlation? Here we have to distinguish whether we just want to calculate the Pearson correlation coefficient, or whether we want to test a hypothesis.

To calculate the Pearson correlation coefficient, only two metric variables must be present. Metric variables are, for example, a person's weight, a person's salary, or electricity consumption.

The Pearson correlation coefficient then tells us how strong the linear relationship is. If there is a non-linear correlation, we cannot read it from the Pearson correlation coefficient.


However, if we want to test whether the Pearson correlation coefficient is significantly different from zero in the sample, i.e. we want to test a hypothesis, the two variables must also be normally distributed!

Pearson correlation normal distribution

If this assumption is not met, the calculated test statistic t and the p-value cannot be interpreted reliably. In that case, Spearman's rank correlation can be used instead.


A comparison of the Pearson and Spearman correlation methods


The Pearson correlation evaluates the linear relationship between two continuous variables. A relationship is linear when a change in one variable is associated with a proportional change in the other variable.

For example, you might use a Pearson correlation to evaluate whether increases in temperature at your production facility are associated with decreasing thickness of your chocolate coating.

The Spearman correlation evaluates the monotonic relationship between two continuous or ordinal variables. In a monotonic relationship, the variables tend to change together, but not necessarily at a constant rate. The Spearman correlation coefficient is based on the ranked values for each variable rather than the raw data.

Spearman correlation is often used to evaluate relationships involving ordinal variables. For example, you might use a Spearman correlation to evaluate whether the order in which employees complete a test exercise is related to the number of months they have been employed.

It is always a good idea to examine the relationship between variables with a scatterplot. Correlation coefficients only measure linear (Pearson) or monotonic (Spearman) relationships. Other relationships are possible.

[Scatterplot: Pearson = +1, Spearman = +1]

If the relationship is that one variable increases when the other increases, but the amount is not consistent, the Pearson correlation coefficient is positive but less than +1. The Spearman coefficient still equals +1 in this case.

[Scatterplot: Pearson = +0.851, Spearman = +1]

[Scatterplot: Pearson = −0.093, Spearman = −0.093]

[Scatterplot: Pearson = −1, Spearman = −1]

[Scatterplot: Pearson = −0.799, Spearman = −1]

Correlation values of −1 or 1 imply an exact linear relationship, like that between a circle's radius and circumference. However, the real value of correlation is in quantifying less than perfect relationships. Finding that two variables are correlated often informs a regression analysis which tries to describe this type of relationship in more detail.

[Scatterplot: coefficient of 0]

This graph shows a very strong relationship. The Pearson coefficient and Spearman coefficient are both approximately 0.
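The panels above are easy to reproduce in miniature; a Python sketch (numpy and scipy assumed) for the monotonic-but-not-linear case:

import numpy as np
from scipy import stats

x = np.linspace(1, 10, 50)
y = np.exp(x / 3)                       # increases with x, but not at a constant rate

print(stats.pearsonr(x, y).statistic)   # positive but noticeably less than +1
print(stats.spearmanr(x, y).statistic)  # exactly +1: the relationship is monotonic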



Statistics LibreTexts

12.5: Testing the Significance of the Correlation Coefficient


The correlation coefficient, \(r\), tells us about the strength and direction of the linear relationship between \(x\) and \(y\). However, the reliability of the linear model also depends on how many observed data points are in the sample. We need to look at both the value of the correlation coefficient \(r\) and the sample size \(n\), together. We perform a hypothesis test of the "significance of the correlation coefficient" to decide whether the linear relationship in the sample data is strong enough to use to model the relationship in the population.

The sample data are used to compute \(r\), the correlation coefficient for the sample. If we had data for the entire population, we could find the population correlation coefficient. But because we have only sample data, we cannot calculate the population correlation coefficient. The sample correlation coefficient, \(r\), is our estimate of the unknown population correlation coefficient.

  • The symbol for the population correlation coefficient is \(\rho\), the Greek letter "rho."
  • \(\rho =\) population correlation coefficient (unknown)
  • \(r =\) sample correlation coefficient (known; calculated from sample data)

The hypothesis test lets us decide whether the value of the population correlation coefficient \(\rho\) is "close to zero" or "significantly different from zero". We decide this based on the sample correlation coefficient \(r\) and the sample size \(n\).

If the test concludes that the correlation coefficient is significantly different from zero, we say that the correlation coefficient is "significant."

  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between \(x\) and \(y\) because the correlation coefficient is significantly different from zero.
  • What the conclusion means: There is a significant linear relationship between \(x\) and \(y\). We can use the regression line to model the linear relationship between \(x\) and \(y\) in the population.

If the test concludes that the correlation coefficient is not significantly different from zero (it is close to zero), we say that the correlation coefficient is "not significant".

  • Conclusion: "There is insufficient evidence to conclude that there is a significant linear relationship between \(x\) and \(y\) because the correlation coefficient is not significantly different from zero."
  • What the conclusion means: There is not a significant linear relationship between \(x\) and \(y\). Therefore, we CANNOT use the regression line to model a linear relationship between \(x\) and \(y\) in the population.
  • If \(r\) is significant and the scatter plot shows a linear trend, the line can be used to predict the value of \(y\) for values of \(x\) that are within the domain of observed \(x\) values.
  • If \(r\) is not significant OR if the scatter plot does not show a linear trend, the line should not be used for prediction.
  • If \(r\) is significant and if the scatter plot shows a linear trend, the line may NOT be appropriate or reliable for prediction OUTSIDE the domain of observed \(x\) values in the data.

PERFORMING THE HYPOTHESIS TEST

  • Null Hypothesis: \(H_{0}: \rho = 0\)
  • Alternate Hypothesis: \(H_{a}: \rho \neq 0\)

WHAT THE HYPOTHESES MEAN IN WORDS:

  • Null Hypothesis \(H_{0}\) : The population correlation coefficient IS NOT significantly different from zero. There IS NOT a significant linear relationship (correlation) between \(x\) and \(y\) in the population.
  • Alternate Hypothesis \(H_{a}\) : The population correlation coefficient IS significantly DIFFERENT FROM zero. There IS A SIGNIFICANT LINEAR RELATIONSHIP (correlation) between \(x\) and \(y\) in the population.

DRAWING A CONCLUSION: There are two methods of making the decision. The two methods are equivalent and give the same result.

  • Method 1: Using the \(p\text{-value}\)
  • Method 2: Using a table of critical values

In this chapter of this textbook, we will always use a significance level of 5%, \(\alpha = 0.05\).

Using the \(p\text{-value}\) method, you could choose any appropriate significance level you want; you are not limited to using \(\alpha = 0.05\). But the table of critical values provided in this textbook assumes that we are using a significance level of 5%, \(\alpha = 0.05\). (If we wanted to use a different significance level than 5% with the critical value method, we would need different tables of critical values that are not provided in this textbook.)

METHOD 1: Using a \(p\text{-value}\) to make a decision

Using the TI-83, 83+, 84, 84+ calculator.

To calculate the \(p\text{-value}\) using LinRegTTEST:

On the LinRegTTEST input screen, on the line prompt for \(\beta\) or \(\rho\), highlight "\(\neq 0\)"

The output screen shows the \(p\text{-value}\) on the line that reads "\(p =\)".

(Most computer statistical software can calculate the \(p\text{-value}\).)

If the \(p\text{-value}\) is less than the significance level ( \(\alpha = 0.05\) ):

  • Decision: Reject the null hypothesis.
  • Conclusion: "There is sufficient evidence to conclude that there is a significant linear relationship between \(x\) and \(y\) because the correlation coefficient is significantly different from zero."

If the \(p\text{-value}\) is NOT less than the significance level (\(\alpha = 0.05\)):

  • Decision: DO NOT REJECT the null hypothesis.
  • Conclusion: "There is insufficient evidence to conclude that there is a significant linear relationship between \(x\) and \(y\) because the correlation coefficient is NOT significantly different from zero."

Calculation Notes:

  • You will use technology to calculate the \(p\text{-value}\). The following describes the calculations to compute the test statistics and the \(p\text{-value}\):
  • The \(p\text{-value}\) is calculated using a \(t\)-distribution with \(n - 2\) degrees of freedom.
  • The formula for the test statistic is \(t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}\). The value of the test statistic, \(t\), is shown in the computer or calculator output along with the \(p\text{-value}\). The test statistic \(t\) has the same sign as the correlation coefficient \(r\).
  • The \(p\text{-value}\) is the combined area in both tails.

An alternative way to calculate the \(p\text{-value}\) ( \(p\) ) given by LinRegTTest is the command 2*tcdf(abs(t),10^99, n-2) in 2nd DISTR.

THIRD-EXAM vs FINAL-EXAM EXAMPLE: \(p\text{-value}\) method

  • Consider the third exam/final exam example.
  • The line of best fit is: \(\hat{y} = -173.51 + 4.83x\) with \(r = 0.6631\) and there are \(n = 11\) data points.
  • Can the regression line be used for prediction? Given a third exam score ( \(x\) value), can we use the line to predict the final exam score (predicted \(y\) value)?
  • \(H_{0}: \rho = 0\)
  • \(H_{a}: \rho \neq 0\)
  • \(\alpha = 0.05\)
  • The \(p\text{-value}\) is 0.026 (from LinRegTTest on your calculator or from computer software).
  • The \(p\text{-value}\), 0.026, is less than the significance level of \(\alpha = 0.05\).
  • Decision: Reject the Null Hypothesis \(H_{0}\)
  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between the third exam score (\(x\)) and the final exam score (\(y\)) because the correlation coefficient is significantly different from zero.

Because \(r\) is significant and the scatter plot shows a linear trend, the regression line can be used to predict final exam scores.
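As a cross-check on the calculator output, here is a minimal Python sketch (numpy and scipy assumed) that reproduces the test statistic and p-value for this example from the formula above:

import numpy as np
from scipy import stats

r, n = 0.6631, 11
t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)  # test statistic
p = 2 * stats.t.sf(abs(t), df=n - 2)        # combined area in both tails, n - 2 df

print(t)  # about 2.66
print(p)  # about 0.026, matching the LinRegTTest output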

METHOD 2: Using a table of Critical Values to make a decision

The 95% Critical Values of the Sample Correlation Coefficient Table can be used to give you a good idea of whether the computed value of \(r\) is significant or not. Compare \(r\) to the appropriate critical value in the table. If \(r\) is not between the positive and negative critical values, then the correlation coefficient is significant. If \(r\) is significant, then you may want to use the line for prediction.

Example \(\PageIndex{1}\)

Suppose you computed \(r = 0.801\) using \(n = 10\) data points. \(df = n - 2 = 10 - 2 = 8\). The critical values associated with \(df = 8\) are \(-0.632\) and \(+0.632\). If \(r <\) negative critical value or \(r >\) positive critical value, then \(r\) is significant. Since \(r = 0.801\) and \(0.801 > 0.632\), \(r\) is significant and the line may be used for prediction. If you view this example on a number line, it will help you.

Horizontal number line with values of -1, -0.632, 0, 0.632, 0.801, and 1. A dashed line above values -0.632, 0, and 0.632 indicates not significant values.
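Where do the table's critical values come from? Setting the test statistic equal to the two-tailed critical t and solving \(t = r\sqrt{df}/\sqrt{1-r^{2}}\) for \(r\) gives \(r_{crit} = t_{crit}/\sqrt{t_{crit}^{2} + df}\). A short Python sketch (numpy and scipy assumed) recovers the tabled value for \(df = 8\):

import numpy as np
from scipy import stats

n = 10
df = n - 2
t_crit = stats.t.ppf(0.975, df)            # two-tailed critical t at alpha = 0.05
r_crit = t_crit / np.sqrt(t_crit**2 + df)  # invert t = r*sqrt(df)/sqrt(1 - r^2)

print(round(r_crit, 3))  # 0.632, matching the 95% critical values table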

Exercise \(\PageIndex{1}\)

For a given line of best fit, you computed that \(r = 0.6501\) using \(n = 12\) data points and the critical value is 0.576. Can the line be used for prediction? Why or why not?

If the scatter plot looks linear then, yes, the line can be used for prediction, because \(r >\) the positive critical value.

Example \(\PageIndex{2}\)

Suppose you computed \(r = -0.624\) with 14 data points. \(df = 14 - 2 = 12\). The critical values are \(-0.532\) and \(0.532\). Since \(-0.624 < -0.532\), \(r\) is significant and the line can be used for prediction.

Horizontal number line with values of -0.624, -0.532, and 0.532.

Exercise \(\PageIndex{2}\)

For a given line of best fit, you compute that \(r = 0.5204\) using \(n = 9\) data points, and the critical value is \(0.666\). Can the line be used for prediction? Why or why not?

No, the line cannot be used for prediction, because \(r <\) the positive critical value.

Example \(\PageIndex{3}\)

Suppose you computed \(r = 0.776\) and \(n = 6\). \(df = 6 - 2 = 4\). The critical values are \(-0.811\) and \(0.811\). Since \(-0.811 < 0.776 < 0.811\), \(r\) is not significant, and the line should not be used for prediction.

Horizontal number line with values -0.811, 0.776, and 0.811.

Exercise \(\PageIndex{3}\)

For a given line of best fit, you compute that \(r = -0.7204\) using \(n = 8\) data points, and the critical value is \(= 0.707\). Can the line be used for prediction? Why or why not?

Yes, the line can be used for prediction, because \(r <\) the negative critical value.

THIRD-EXAM vs FINAL-EXAM EXAMPLE: critical value method

Consider the third exam/final exam example. The line of best fit is: \(\hat{y} = -173.51 + 4.83x\) with \(r = 0.6631\) and there are \(n = 11\) data points. Can the regression line be used for prediction? Given a third-exam score ( \(x\) value), can we use the line to predict the final exam score (predicted \(y\) value)?

  • Use the "95% Critical Value" table for \(r\) with \(df = n - 2 = 11 - 2 = 9\).
  • The critical values are \(-0.602\) and \(+0.602\)
  • Since \(0.6631 > 0.602\), \(r\) is significant.
  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between the third exam score (\(x\)) and the final exam score (\(y\)) because the correlation coefficient is significantly different from zero.

Example \(\PageIndex{4}\)

Suppose you computed the following correlation coefficients. Using the table at the end of the chapter, determine if \(r\) is significant and the line of best fit associated with each r can be used to predict a \(y\) value. If it helps, draw a number line.

  • \(r = –0.567\) and the sample size, \(n\), is \(19\). The \(df = n - 2 = 17\). The critical value is \(-0.456\). \(-0.567 < -0.456\) so \(r\) is significant.
  • \(r = 0.708\) and the sample size, \(n\), is \(9\). The \(df = n - 2 = 7\). The critical value is \(0.666\). \(0.708 > 0.666\) so \(r\) is significant.
  • \(r = 0.134\) and the sample size, \(n\), is \(14\). The \(df = 14 - 2 = 12\). The critical value is \(0.532\). \(0.134\) is between \(-0.532\) and \(0.532\) so \(r\) is not significant.
  • \(r = 0\) and the sample size, \(n\), is five. No matter what the \(dfs\) are, \(r = 0\) is between the two critical values so \(r\) is not significant.

Exercise \(\PageIndex{4}\)

For a given line of best fit, you compute that \(r = 0\) using \(n = 100\) data points. Can the line be used for prediction? Why or why not?

No, the line cannot be used for prediction no matter what the sample size is.

Assumptions in Testing the Significance of the Correlation Coefficient

Testing the significance of the correlation coefficient requires that certain assumptions about the data are satisfied. The premise of this test is that the data are a sample of observed points taken from a larger population. We have not examined the entire population because it is not possible or feasible to do so. We are examining the sample to draw a conclusion about whether the linear relationship that we see between \(x\) and \(y\) in the sample data provides strong enough evidence so that we can conclude that there is a linear relationship between \(x\) and \(y\) in the population.

The regression line equation that we calculate from the sample data gives the best-fit line for our particular sample. We want to use this best-fit line for the sample as an estimate of the best-fit line for the population. Examining the scatter plot and testing the significance of the correlation coefficient helps us determine if it is appropriate to do this.

The assumptions underlying the test of significance are:

  • There is a linear relationship in the population that models the average value of \(y\) for varying values of \(x\). In other words, the expected value of \(y\) for each particular value lies on a straight line in the population. (We do not know the equation for the line for the population. Our regression line from the sample is our best estimate of this line in the population.)
  • The \(y\) values for any particular \(x\) value are normally distributed about the line. This implies that there are more \(y\) values scattered closer to the line than are scattered farther away. Assumption (1) implies that these normal distributions are centered on the line: the means of these normal distributions of \(y\) values lie on the line.
  • The standard deviations of the population \(y\) values about the line are equal for each value of \(x\). In other words, each of these normal distributions of \(y\) values has the same shape and spread about the line.
  • The residual errors are mutually independent (no pattern).
  • The data are produced from a well-designed, random sample or randomized experiment.

The left graph shows three sets of points. Each set falls in a vertical line. The points in each set are normally distributed along the line — they are densely packed in the middle and more spread out at the top and bottom. A downward sloping regression line passes through the mean of each set. The right graph shows the same regression line plotted. A vertical normal curve is shown for each line.

Linear regression is a procedure for fitting a straight line of the form \(\hat{y} = a + bx\) to data. The conditions for regression are:

  • Linear In the population, there is a linear relationship that models the average value of \(y\) for different values of \(x\).
  • Independent The residuals are assumed to be independent.
  • Normal The \(y\) values are distributed normally for any value of \(x\).
  • Equal variance The standard deviation of the \(y\) values is equal for each \(x\) value.
  • Random The data are produced from a well-designed random sample or randomized experiment.

The slope \(b\) and intercept \(a\) of the least-squares line estimate the slope \(\beta\) and intercept \(\alpha\) of the population (true) regression line. To estimate the population standard deviation of \(y\), \(\sigma\), use the standard deviation of the residuals, \(s\): \(s = \sqrt{\frac{SSE}{n-2}}\). The variable \(\rho\) (rho) is the population correlation coefficient. To test the null hypothesis \(H_{0}: \rho =\) hypothesized value, use a linear regression t-test. The most common null hypothesis is \(H_{0}: \rho = 0\) which indicates there is no linear relationship between \(x\) and \(y\) in the population. The TI-83, 83+, 84, 84+ calculator function LinRegTTest can perform this test (STAT TESTS LinRegTTest).

Formula Review

Least Squares Line or Line of Best Fit:

\[\hat{y} = a + bx\]

\[a = y\text{-intercept}\]

\[b = \text{slope}\]

Standard deviation of the residuals:

\[s = \sqrt{\frac{SSE}{n-2}}\]

\[SSE = \text{sum of squared errors}\]

\[n = \text{the number of data points}\]


scipy.stats.pearsonr

Pearson correlation coefficient and p-value for testing non-correlation.

The Pearson correlation coefficient [1] measures the linear relationship between two datasets. Like other correlation coefficients, this one varies between -1 and +1 with 0 implying no correlation. Correlations of -1 or +1 imply an exact linear relationship. Positive correlations imply that as x increases, so does y. Negative correlations imply that as x increases, y decreases.

This function also performs a test of the null hypothesis that the distributions underlying the samples are uncorrelated and normally distributed. (See Kowalski [3] for a discussion of the effects of non-normality of the input on the distribution of the correlation coefficient.) The p-value roughly indicates the probability of an uncorrelated system producing datasets that have a Pearson correlation at least as extreme as the one computed from these datasets.

Parameters:

x, y : array_like
Input arrays.

alternative : {'two-sided', 'less', 'greater'}, optional
Defines the alternative hypothesis. Default is 'two-sided'. The following options are available:

'two-sided': the correlation is nonzero

'less': the correlation is negative (less than zero)

'greater': the correlation is positive (greater than zero)

New in version 1.9.0.

method : optional
Defines the method used to compute the p-value. If method is an instance of PermutationMethod / MonteCarloMethod, the p-value is computed using scipy.stats.permutation_test / scipy.stats.monte_carlo_test with the provided configuration options and other appropriate settings. Otherwise, the p-value is computed as documented in the notes.

New in version 1.11.0.

Returns:

res : PearsonRResult
An object with the following attributes:

statistic : float
Pearson product-moment correlation coefficient.

pvalue : float
The p-value associated with the chosen alternative.

The object has the following method:

confidence_interval(confidence_level=0.95)
This computes the confidence interval of the correlation coefficient statistic for the given confidence level. The confidence interval is returned in a namedtuple with fields low and high. If method is not provided, the confidence interval is computed using the Fisher transformation [1]. If method is an instance of BootstrapMethod, the confidence interval is computed using scipy.stats.bootstrap with the provided configuration options and other appropriate settings. In some cases, confidence limits may be NaN due to a degenerate resample, and this is typical for very small samples (~6 observations).

Warns:

ConstantInputWarning
Raised if an input is a constant array. The correlation coefficient is not defined in this case, so np.nan is returned.

NearConstantInputWarning
Raised if an input is "nearly" constant. The array x is considered nearly constant if norm(x - mean(x)) < 1e-13 * abs(mean(x)). Numerical errors in the calculation x - mean(x) in this case might result in an inaccurate calculation of r.

See also:

spearmanr
Spearman rank-order correlation coefficient.

kendalltau
Kendall's tau, a correlation measure for ordinal data.

The correlation coefficient is calculated as follows:

\[ r = \frac{\sum (x - m_x)(y - m_y)}{\sqrt{\sum (x - m_x)^2 \sum (y - m_y)^2}} \]

where \(m_x\) is the mean of the vector x and \(m_y\) is the mean of the vector y.

Under the assumption that x and y are drawn from independent normal distributions (so the population correlation coefficient is 0), the probability density function of the sample correlation coefficient r is ( [1] , [2] ):

\[ f(r) = \frac{(1 - r^2)^{n/2 - 2}}{B(\tfrac{1}{2}, \tfrac{n}{2} - 1)} \]

where n is the number of samples, and B is the beta function. This is sometimes referred to as the exact distribution of r. This is the distribution that is used in pearsonr to compute the p-value when the method parameter is left at its default value (None). The distribution is a beta distribution on the interval [-1, 1], with equal shape parameters a = b = n/2 - 1. In terms of SciPy's implementation of the beta distribution, the distribution of r is:

dist = scipy.stats.beta(n/2 - 1, n/2 - 1, loc=-1, scale=2)

The default p-value returned by pearsonr is a two-sided p-value. For a given sample with correlation coefficient r, the p-value is the probability that abs(r') of a random sample x' and y' drawn from the population with zero correlation would be greater than or equal to abs(r). In terms of the object dist shown above, the p-value for a given r and length n can be computed as:

p = 2*dist.cdf(-abs(r))

When n is 2, the above continuous distribution is not well-defined. One can interpret the limit of the beta distribution as the shape parameters a and b approach a = b = 0 as a discrete distribution with equal probability masses at r = 1 and r = -1. More directly, one can observe that, given the data x = [x1, x2] and y = [y1, y2], and assuming x1 != x2 and y1 != y2, the only possible values for r are 1 and -1. Because abs(r’) for any sample x’ and y’ with length 2 will be 1, the two-sided p-value for a sample of length 2 is always 1.

For backwards compatibility, the object that is returned also behaves like a tuple of length two that holds the statistic and the p-value.

[1] "Pearson correlation coefficient", Wikipedia, https://en.wikipedia.org/wiki/Pearson_correlation_coefficient

[2] Student, "Probable error of a correlation coefficient", Biometrika, Volume 6, Issue 2-3, 1 September 1908, pp. 302-310.

[3] C. J. Kowalski, "On the Effects of Non-Normality on the Distribution of the Sample Product-Moment Correlation Coefficient", Journal of the Royal Statistical Society, Series C (Applied Statistics), Vol. 21, No. 1 (1972), pp. 1-12.

Below is a combined sketch covering: an exact permutation version of the test, a test under the null hypothesis that the data were drawn from uniform distributions, an asymptotic 90% confidence interval, and a bootstrap confidence interval.
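This is a hedged sketch based on the parameter descriptions above; the toy data and the exact configuration of PermutationMethod, MonteCarloMethod (in particular its rvs argument), and BootstrapMethod are assumptions rather than verbatim library documentation:

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(size=25)                 # toy data, for illustration only
y = x + rng.normal(scale=1.0, size=25)

res = stats.pearsonr(x, y)              # default: exact beta-distribution p-value
print(res.statistic, res.pvalue)

# Permutation version of the test
print(stats.pearsonr(x, y, method=stats.PermutationMethod()).pvalue)

# Null hypothesis that the data were drawn from uniform distributions
mc = stats.MonteCarloMethod(rvs=(rng.uniform, rng.uniform))  # assumed signature
print(stats.pearsonr(x, y, method=mc).pvalue)

# Asymptotic 90% confidence interval (Fisher transformation)
print(res.confidence_interval(confidence_level=0.9))

# Bootstrap confidence interval
print(res.confidence_interval(confidence_level=0.9, method=stats.BootstrapMethod()))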

There is a linear dependence between x and y if y = a + b*x + e, where a,b are constants and e is a random error term, assumed to be independent of x. For simplicity, assume that x is standard normal, a=0, b=1 and let e follow a normal distribution with mean zero and standard deviation s>0.
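A simulation sketch of this setup (numpy and scipy assumed; the sample size and seed are arbitrary):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
s = 0.5
x = rng.normal(size=10_000)            # standard normal x
e = rng.normal(scale=s, size=10_000)   # error term, independent of x
y = x + e                              # a = 0, b = 1

print(stats.pearsonr(x, y).statistic)  # close to 0.894 for s = 0.5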

This should be close to the exact value given by \(1/\sqrt{1 + s^2}\); since x is standard normal and b = 1, the population correlation is \(b/\sqrt{b^2 + s^2}\).

For s=0.5, we observe a high level of correlation. In general, a large variance of the noise reduces the correlation, while the correlation approaches one as the variance of the error goes to zero.

It is important to keep in mind that no correlation does not imply independence unless (x, y) is jointly normal. Correlation can even be zero when there is a very simple dependence structure: if X follows a standard normal distribution, let y = abs(x). Note that the correlation between x and y is zero. Indeed, since the expectation of x is zero, cov(x, y) = E[x*y]. By definition, this equals E[x*abs(x)] which is zero by symmetry. The following lines of code illustrate this observation:
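A minimal sketch of the observation (numpy and scipy assumed):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = np.abs(x)                          # y is a deterministic function of x

print(stats.pearsonr(x, y).statistic)  # close to 0, despite the exact dependence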

A non-zero correlation coefficient can be misleading. For example, if x has a standard normal distribution, define y = x if x < 0 and y = 0 otherwise. A simple calculation (cov(x, y) = E[x*y] = 1/2 and var(y) = 1/2 - 1/(2*Pi)) shows that corr(x, y) = 1/sqrt(2 - 2/Pi) = 0.856…, implying a high level of correlation:
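```python
import numpy as np
from scipy import stats

rng = np.random.default_rng()
x = rng.standard_normal(1000)
y = np.where(x < 0, x, 0.0)    # y = x for negative x, 0 otherwise
print(stats.pearsonr(x, y))    # statistic around 0.86
```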

This is unintuitive, since y does not depend on x at all when x is larger than zero, which happens in about half of the cases when we sample x and y.
