Quick note about calculating the mean of a column with dplyr in R. It’s surprisingly easy to screw up, and the culprit is storing the new calculation under the same name as the original column.
Here’s a simple data frame.
library(tidyverse)
df <- data.frame(
  books_read = c(1, 2, 3, 4, 5, 6),
  intelligence = c(4, 5, 6, 7, 8, 8)
)
df
  books_read intelligence
1          1            4
2          2            5
3          3            6
4          4            7
5          5            8
6          6            8
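I want to calculate the mean and standard deviation of the “books read” column. If I calculate the mean and store it in a new column with the same name as the original variable, the standard deviation command returns NA. That’s because summarise() evaluates its arguments in order, so by the time sd() runs, books_read refers to the single mean value rather than the original six values.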
library(tidyverse)
df %>%
  summarise(
    books_read = mean(books_read), # this line is the problem
    sd_books_read = sd(books_read)
  )
  books_read sd_books_read
1        3.5            NA
Instead, I need to give the new “mean books read” column a different name.
library(tidyverse)
df %>%
  summarise(
    mean_books_read = mean(books_read), # new name, so books_read is left untouched
    sd_books_read = sd(books_read)
  )
  mean_books_read sd_books_read
1             3.5      1.870829
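For anyone who wants to see where the NA comes from, here is a minimal sketch. It assumes a recent dplyr (1.0 or later), where each summarise() expression can see the columns created by the expressions before it. The name `swapped` is my own, just for illustration. The first call shows that sd() of a single number is NA. The second shows that reusing the original name is harmless as long as it comes last, although giving the mean its own name is still the safer habit.

```r
library(dplyr)

df <- data.frame(books_read = c(1, 2, 3, 4, 5, 6))

# sd() needs at least two values; a single number gives NA.
# This is what sd() receives once books_read has been overwritten by its mean.
sd(3.5)

# Hypothetical reordering: compute the sd first, while books_read
# still refers to the original six values, then overwrite it with the mean.
swapped <- df %>%
  summarise(
    sd_books_read = sd(books_read),
    books_read = mean(books_read)
  )

swapped
```

The result depends on argument order, which is easy to break later by shuffling lines, so a distinct name such as mean_books_read avoids the problem entirely.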
Bo\(^2\)m =)