In this tutorial, you will learn the basics of **linear regression in R programming**, a very popular statistical model.

- How does linear regression work?
- Linear regression in R programming.
- Become familiar with the concept of coefficients and residuals.

**Linear regression – what is it**

In a linear regression model, we are interested in the relationship between a response variable (often referred to as y) and one or more explanatory variables and their interactions (often referred to as x). You make this type of relationship in your head all the time: for instance, when you estimate a child’s age from her height, you are assuming that the older the child is, the taller she will be. The linear regression model is one of the most basic statistical models; its results are easily understood by most people, and it has been around since the early nineteenth century.

**Creating a Linear Regression in R programming**

Not every problem can be solved using the same algorithm. Linear regression assumes that the response variable and the explanatory variables are linearly related, which means that a line can be fitted between the two (or more) variables. In the previous example, there is a clear relationship between the age and the height of children.

Knowing the age of a child allows you to calculate the height of the child:

$$\text{Height} = a + \text{Age} \times b$$

In this example, “a” is the intercept and “b” is the slope. The intercept is the value from which you begin measuring: a newborn who is zero months old is not zero centimeters tall, and the intercept accounts for this. The slope measures the change in height with respect to age in months: in general, the child’s height increases by “b” for every month the child is older. You can calculate a **linear regression in R programming** using the command lm.
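To make the roles of the intercept and the slope concrete, here is a minimal sketch that fits lm to a small, noise-free data set. The data frame `toy` and its values are invented for illustration and are not the tutorial's data.

```r
# Hypothetical, noise-free data invented for illustration:
# every child is exactly 65 cm plus 0.6 cm per month of age.
toy <- data.frame(age = c(18, 19, 20, 21, 22, 23))
toy$height <- 65 + 0.6 * toy$age

# Fit height as a linear function of age.
toy_fit <- lm(height ~ age, data = toy)

# coef() returns the intercept ("a") and the slope ("b").
print(coef(toy_fit))
```

Because the data lie exactly on a line, the fitted intercept and slope match the values used to build the data. Real measurements contain noise, so the fit would only approximate them.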

To read Microsoft Excel files, you should load the library readxl. The file format itself doesn’t matter, as long as R can read it. To learn more about importing data into R, see the DataCamp course on the topic.
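As an illustration that the file format is not essential, the sketch below reads the same kind of table from a CSV file using only base R. The file is a temporary one created on the spot, and its column names and values are hypothetical.

```r
# Hypothetical data written to a temporary CSV file, just to have something to read.
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(age = c(18, 19, 20), height = c(76.1, 77.0, 78.1)),
  csv_path,
  row.names = FALSE
)

# read.csv() is part of base R, so no extra package is needed for CSV files.
toy_csv <- read.csv(csv_path)
str(toy_csv)
```

Whichever reader you use, the result is a data frame that can be passed to lm through its data argument.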

This tutorial stores the data in an object called ageandheight. The data is loaded in the second line of the code below, and the linear regression is created in the third line.

An overview of the model’s performance and coefficients can be obtained with the command summary(lmHeight).

**library(readxl) # Provides read_excel() for Excel files**

**ageandheight <- read_excel("ageandheight.xls", sheet = "Hoja2") # Load the data**

**lmHeight <- lm(height ~ age, data = ageandheight) # Create the linear regression**

**summary(lmHeight) # Review the results**

**Coefficients**

In the red square of the summary output, you can see the intercept value (“a”) and the slope value (“b”) for age. These “a” and “b” values define the line that is fitted through the data points. For a child who is 20.5 months old, with a equal to 64.92 and b equal to 0.635, the model predicts (on average) a height of about 64.92 + (0.635 × 20.5) ≈ 77.94 cm.
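The arithmetic behind this prediction can be checked directly in R. The sketch below uses the rounded coefficients quoted in the text; the helper name `predict_height` is hypothetical.

```r
# Coefficients as quoted in the text (rounded).
a <- 64.92   # intercept, in cm
b <- 0.635   # slope, in cm per month

# Hypothetical helper: predicted height in cm for a given age in months.
predict_height <- function(age_months) a + b * age_months

predicted <- predict_height(20.5)
print(predicted)  # approximately 77.9375
```

With a fitted model at hand, predict(lmHeight, newdata = data.frame(age = 20.5)) performs the same calculation with the unrounded coefficients, so its result may differ slightly in the last digits.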

When two or more predictors are used to create a regression, it is called multiple linear regression. As with the simple example you used before, the height of the child is calculated by:

$$\text{Height} = a + \text{Age} \times b_1 + (\text{Number of Siblings}) \times b_2$$

As you can see, height is now a function of the child’s age and the number of siblings. The red blocks on the figure above represent the coefficients (b1 and b2). These coefficients can be interpreted as follows:

*When children with the same number of siblings are compared, the average predicted height increases by 0.63 cm for every additional month of age. In the same way, when children of the same age are compared, the predicted height decreases (because the coefficient is negative) by about 0.01 cm for each additional sibling.*

Each additional variable can be added to a model by adding the symbol “+” in R.

**lmHeight2 = lm(height~age + no_siblings, data = ageandheight) #Create a linear regression with two variables**

**summary(lmHeight2) #Review the results**
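The "holding the other variable fixed" reading of the coefficients can be sketched on a small, noise-free data set. The data frame `toy2` and all of its values are invented for illustration; only the column names follow the tutorial.

```r
# Hypothetical, noise-free data invented for illustration.
toy2 <- data.frame(
  age         = c(18, 19, 20, 21, 22, 23),
  no_siblings = c(0, 2, 1, 3, 0, 1)
)
toy2$height <- 65 + 0.6 * toy2$age - 0.01 * toy2$no_siblings

# Each additional predictor is added to the formula with "+".
toy_fit2 <- lm(height ~ age + no_siblings, data = toy2)
print(coef(toy_fit2))

# Two children with the same number of siblings, one month apart in age:
# the difference in predicted height equals the age coefficient.
same_siblings <- data.frame(age = c(20, 21), no_siblings = c(1, 1))
height_gap <- diff(predict(toy_fit2, newdata = same_siblings))
print(height_gap)
```

The gap between the two predictions equals the coefficient for age, which is what "for every additional month, among children with the same number of siblings" means in practice.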

As you might expect, the number of siblings is a poor way to predict a child’s height. Another important aspect of your linear models is the p-value of each coefficient. In the previous example, the blue rectangle shows the p-values for the coefficients age and number of siblings. In simple terms, a p-value indicates whether or not you can reject a hypothesis. The hypothesis, in this case, is that the predictor is not meaningful for your model.

- Age has a p-value of 4.34e-10 (0.000000000434). Such a small value suggests that age is a very good addition to your model.

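- The number of siblings has a p-value of 0.85. A coefficient at least this far from zero would be expected about 85% of the time even if the number of siblings had no effect on height, so there is no evidence that it is a meaningful predictor for the regression.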

In general, the standard check is whether a predictor’s p-value is smaller than 0.05. If it is not, the predictor is usually considered not meaningful for the model.
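The p-values can also be pulled out of the summary object and compared against the 0.05 threshold in code. The sketch below uses simulated data in which height depends on age and, by construction, not on the number of siblings; every value in `sim` is hypothetical.

```r
set.seed(1)  # make the simulated noise reproducible

# Hypothetical simulated data: height depends on age, not on no_siblings.
n <- 30
sim <- data.frame(
  age         = 18:47,
  no_siblings = rep(c(0, 1, 2), times = 10)
)
sim$height <- 65 + 0.6 * sim$age + rnorm(n, sd = 1)

sim_fit <- lm(height ~ age + no_siblings, data = sim)

# The coefficient table of summary() has a column named "Pr(>|t|)"
# that holds the p-value of each coefficient.
p_values <- summary(sim_fit)$coefficients[, "Pr(>|t|)"]
print(p_values)

# TRUE where the p-value is below the conventional 0.05 threshold.
print(p_values < 0.05)
```

The same two lines work on lmHeight2 from the tutorial. The threshold of 0.05 is a convention, so treat values close to it with some caution.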

**Final words**

In this tutorial, you created a linear regression in R with the command lm and reviewed it with summary. The intercept (“a”) is the value from which you begin measuring, and the slope (“b”) measures the change in height for every additional month of age. You also added a second predictor with the symbol “+” and used the p-values of the coefficients to judge whether each predictor is meaningful for the model.
