# Meet R
## “R, in itself, is an attempt to bring order out of chaos.”
### Otho Mantegazza 2019-11-20

---

# The Rise of Private Space Agencies

![](01-meet-r_files/figure-html/unnamed-chunk-3-1.svg)

---

# R understands you

Write something in the R console:

```r
2 # numbers
```

```
## [1] 2
```

```r
"string of text must be quoted" # text
```

```
## [1] "string of text must be quoted"
```

```r
2 + 2 # any mathematical operation
```

```
## [1] 4
```

```r
3^2
```

```
## [1] 9
```

---

# R understands you

You can group numbers together in a vector:

```r
c(1, 2, 5, 6, 8, 1, 45, .2, -1) # a vector of numbers
```

```
## [1]  1.0  2.0  5.0  6.0  8.0  1.0 45.0  0.2 -1.0
```

Operations are vectorized: they are applied to every element.

```r
c(1, 2, 5, 6, 8, 1, 45, .2, -1) * 2 # every number is multiplied by 2
```

```
## [1]  2.0  4.0 10.0 12.0 16.0  2.0 90.0  0.4 -2.0
```
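Vectorization also applies between two vectors of the same length, and to many built-in functions. A minimal sketch with made-up values:

```r
c(1, 2, 3) + c(10, 20, 30) # element-wise sum of two vectors
sqrt(c(1, 4, 9))           # many functions are vectorized too
```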

---

# R understands you

The same goes for text:

```r
c("you", "can", "group", "together", "strings", "of", "text")
```

```
## [1] "you"      "can"      "group"    "together" "strings"  "of"       "text"
```

```r
paste(c("you", "can", "group", "together",
        "strings", "of", "text"),
      collapse = " ") # collapse joins all the elements into one string
```

```
## [1] "you can group together strings of text"
```
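Many base R text functions work on every element of a character vector at once. A small sketch, using only functions that ship with R and made-up example words:

```r
toupper(c("strings", "of", "text"))    # every element is converted to upper case
nchar(c("strings", "of", "text"))      # number of characters in each element
paste(c("a", "b", "c"), "!", sep = "") # "!" is appended to every element
```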

---

# Everything in R is an object

And objects have types

```r
class(1)
```

```
## [1] "numeric"
```

```r
class("ciao")
```

```
## [1] "character"
```

```r
class(FALSE)
```

```
## [1] "logical"
```

```r
class(mean)
```

```
## [1] "function"
```

---

.exercise-body[Put together objects of different classes in a vector with `c(..., ...)` and see what happens.]

---

### Only elements of one class fit in a vector; the others are coerced.

```r
c(2,TRUE,5,FALSE,4)
```

```
## [1] 2 1 5 0 4
```

```r
c("This", "sentence", "is", FALSE)
```

```
## [1] "This"     "sentence" "is"       "FALSE"
```

```r
c("The", 2, "sentences", "above", "are", TRUE)
```

```
## [1] "The"       "2"         "sentences" "above"     "are"       "TRUE"
```
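The coercion in the examples above follows a fixed order: logical values become numbers, and anything combined with text becomes text. A short sketch that checks this with `class()`, and converts text back to numbers explicitly with `as.numeric()`:

```r
class(c(TRUE, FALSE))  # "logical"
class(c(2, TRUE))      # "numeric": TRUE became 1
class(c(2, TRUE, "a")) # "character": everything became text

as.numeric(c("1", "2.5")) # explicit conversion from text to numbers
```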

```r
whatisthis # but some objects aren't recognized
```

```
## Error in eval(expr, envir, enclos): object 'whatisthis' not found
```

---

### For objects of different classes, you need lists

```r
list(1,
     "ciao",
     mean,
     c("You", "can", "put", "vectors", "into", "lists"))
```

```
## [[1]]
## [1] 1
## 
## [[2]]
## [1] "ciao"
## 
## [[3]]
## function (x, ...) 
## UseMethod("mean")
## <bytecode: 0x564589835d28>
## <environment: namespace:base>
## 
## [[4]]
## [1] "You" "can" "put" "vectors" "into" "lists"
```
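Unlike a vector, a list leaves each element with its own class. A minimal sketch that checks this with `sapply()`, and extracts a single element with double square brackets:

```r
# ask for the class of every element of a list
sapply(list(1, "ciao", mean, c(TRUE, FALSE)), class)

# double brackets extract one element, here the second
list(1, "ciao")[[2]]
```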

---

# Wrap up

```
## # A tibble: 0 x 0
```

A table of R objects...

---

# Let's make things confusing

You can assign any object to a variable with ` <- `.

```r
a <- "b"
```

When you call the variable, it returns the object that you assigned to it.

```r
a
```

```
## [1] "b"
```

(The same goes for numbers:)

```r
number_1 <- 2
number_1
```

```
## [1] 2
```

---

### Any kind of object

```r
some_numbers <- c(1,2,3)
some_numbers
```

```
## [1] 1 2 3
```

```r
some_words <- c("This", "are", "not", "numbers")
some_words
```

```
## [1] "This"    "are"     "not"     "numbers"
```
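A variable can be used anywhere the object it stores could be used. A small sketch; the name `doubled` is only an illustration:

```r
some_numbers <- c(1, 2, 3)

doubled <- some_numbers * 2 # the result is stored in a new variable
doubled

some_numbers                # the original variable is unchanged
```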

---

# Some variables already store data

```r
mpg
```

```
## # A tibble: 234 x 11
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 audi a4 1.8 1999 4 auto(l… f 18 29 p comp…
## 2 audi a4 1.8 1999 4 manual… f 21 29 p comp…
## 3 audi a4 2 2008 4 manual… f 20 31 p comp…
## 4 audi a4 2 2008 4 auto(a… f 21 30 p comp…
## 5 audi a4 2.8 1999 6 auto(l… f 16 26 p comp…
## 6 audi a4 2.8 1999 6 manual… f 18 26 p comp…
## 7 audi a4 3.1 2008 6 auto(a… f 18 27 p comp…
## 8 audi a4 quat… 1.8 1999 4 manual… 4 18 26 p comp…
## 9 audi a4 quat… 1.8 1999 4 auto(l… 4 16 25 p comp…
## 10 audi a4 quat… 2 2008 4 manual… 4 20 28 p comp…
## # … with 224 more rows
```

---

# Some variables already store data

```r
starwars
```

```
## # A tibble: 87 x 13
## name height mass hair_color skin_color eye_color birth_year gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> 
## 1 Luke… 172 77 blond fair blue 19 male 
## 2 C-3PO 167 75 <NA> gold yellow 112 <NA> 
## 3 R2-D2 96 32 <NA> white, bl… red 33 <NA> 
## 4 Dart… 202 136 none white yellow 41.9 male 
## 5 Leia… 150 49 brown light brown 19 female
## 6 Owen… 178 120 brown, gr… light blue 52 male 
## 7 Beru… 165 75 brown light blue 47 female
## 8 R5-D4 97 32 <NA> white, red red NA <NA> 
## 9 Bigg… 183 84 black light brown 24 male 
## 10 Obi-… 182 77 auburn, w… fair blue-gray 57 male 
## # … with 77 more rows, and 5 more variables: homeworld <chr>, species <chr>,
## # films <list>, vehicles <list>, starships <list>
```

---

# Some variables already store data

```r
storms
```

```
## # A tibble: 10,010 x 13
## name year month day hour lat long status category wind pressure
## <chr> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <chr> <ord> <int> <int>
## 1 Amy 1975 6 27 0 27.5 -79 tropi… -1 25 1013
## 2 Amy 1975 6 27 6 28.5 -79 tropi… -1 25 1013
## 3 Amy 1975 6 27 12 29.5 -79 tropi… -1 25 1013
## 4 Amy 1975 6 27 18 30.5 -79 tropi… -1 25 1013
## 5 Amy 1975 6 28 0 31.5 -78.8 tropi… -1 25 1012
## 6 Amy 1975 6 28 6 32.4 -78.7 tropi… -1 25 1012
## 7 Amy 1975 6 28 12 33.3 -78 tropi… -1 25 1011
## 8 Amy 1975 6 28 18 34 -77 tropi… -1 30 1006
## 9 Amy 1975 6 29 0 34.4 -75.8 tropi… 0 35 1004
## 10 Amy 1975 6 29 6 34 -74.8 tropi… 0 40 1002
## # … with 10,000 more rows, and 2 more variables: ts_diameter <dbl>,
## # hu_diameter <dbl>
```

---

- How are they structured?
- What do they have in common?


---

# Don't try this at home

```r
a_list <-
 list(c(1,2,3,4,5),
 c(TRUE, FALSE, FALSE, TRUE, FALSE),
 c("Some", "text", "in", "this", "column"))

attributes(a_list)$row.names <- 1:5
attributes(a_list)$names <- c("my", "data", "frame")
attributes(a_list)$class <- "data.frame"
```

```
## # A tibble: 5 x 3
## my data frame 
## <dbl> <lgl> <chr> 
## 1 1 TRUE Some 
## 2 2 FALSE text 
## 3 3 FALSE in 
## 4 4 TRUE this 
## 5 5 FALSE column
```
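The trick above works because a data frame is a list of equal-length vectors with a few extra attributes. In everyday code you would rather build one with `data.frame()` from base R. A minimal sketch with the same values; the name `a_df` is only an illustration:

```r
a_df <- data.frame(
  my    = c(1, 2, 3, 4, 5),
  data  = c(TRUE, FALSE, FALSE, TRUE, FALSE),
  frame = c("Some", "text", "in", "this", "column")
)

is.list(a_df) # TRUE: a data frame is still a list underneath
names(a_df)   # the column names
a_df$data     # one column, extracted as a vector
```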

---

# Some variables already store functions

```r
mean
```

```
## function (x, ...) 
## UseMethod("mean")
## <bytecode: 0x564589835d28>
## <environment: namespace:base>
```

```r
c
```

```
## function (...)  .Primitive("c")
```

```r
plot
```

```
## function (x, y, ...) 
## UseMethod("plot")
## <bytecode: 0x564587e63228>
## <environment: namespace:graphics>
```

---
# Some variables already store functions

```r
read_csv
```

```
## function (file, col_names = TRUE, col_types = NULL, locale = default_locale(), 
## na = c("", "NA"), quoted_na = TRUE, quote = "\"", comment = "", 
## trim_ws = TRUE, skip = 0, n_max = Inf, guess_max = min(1000, 
## n_max), progress = show_progress(), skip_empty_rows = TRUE) 
## {
## tokenizer <- tokenizer_csv(na = na, quoted_na = quoted_na, 
## quote = quote, comment = comment, trim_ws = trim_ws, 
## skip_empty_rows = skip_empty_rows)
## read_delimited(file, tokenizer, col_names = col_names, col_types = col_types, 
## locale = locale, skip = skip, skip_empty_rows = skip_empty_rows, 
## comment = comment, n_max = n_max, guess_max = guess_max, 
## progress = progress)
## }
## <bytecode: 0x56458a6fb210>
## <environment: namespace:readr>
```

---

# Example: how to use the `mean()` function

It takes numbers as its main input `x` and returns a number:

```r
mean(x = 1)
```

```
## [1] 1
```

You might want to know the mean of more than one number. Use a vector:

```r
mean(x = c(1,5,3,7,22,-34, 3.1, 0.4))
```

```
## [1] 0.9375
```

You can provide extra arguments, for example to specify how to deal with missing values:

```r
mean(x = c(1,5,3,7,22,-34, 3.1, 0.4, NA_real_), na.rm = TRUE)
```

```
## [1] 0.9375
```
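To see why `na.rm` matters, compare the two calls in this small sketch, which uses made-up numbers:

```r
mean(x = c(1, 5, NA_real_))               # NA: one missing value makes the mean unknown
mean(x = c(1, 5, NA_real_), na.rm = TRUE) # 3: the NA is dropped before averaging
```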

---

# Example: how to use the `mean()` function

You can provide a variable as an argument:

```r
some_numbers <- c(1,5,3,7,22,-34, 3.1, 0.4)
ignore_na_strategy <- TRUE
mean(x = some_numbers, na.rm = ignore_na_strategy)
```

```
## [1] 0.9375
```

And store the result in another variable:

```r
my_mean_value <- mean(x = some_numbers, na.rm = ignore_na_strategy)
```

`my_mean_value` now stores the output of the function:

```r
my_mean_value
```

```
## [1] 0.9375
```

---

# Example: how to use the `mean()` function

To access the help page of a function, write its name after a question mark: `?mean`.

(Or search for it on Google...)

## Many packages are documented online

For example https://readr.tidyverse.org/

Help pages provide technical information on how to use a function. They don't introduce you to it, and they are not discursive.

They don't tell you when, in which context, or in which combination you would use the function. For that you need books, vignettes and blog articles.

(Learning R from help pages is like learning biology, from zero, from peer-reviewed articles.)
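For a quick look at which arguments a function accepts, without opening the full help page, base R offers `args()` and `formals()`. A minimal sketch:

```r
args(mean)           # shows the arguments that the function accepts
names(formals(mean)) # the argument names, as a character vector
```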

---

# For example, import CSV

We would like to read a CSV file into R.

There is a function that reads the contents of a text file, which is part of **base** R:

```r
# sample_file <- "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-01-15/launches.csv"

sample_file <- here::here("data/sample.csv")

readLines(con = sample_file, n = 5)
```

```
## [1] "tag,JD,launch_date,launch_year,type,variant,mission,agency,state_code,category,agency_type"
## [2] "1967-065,2439671.38,1967-06-29,1967,Thor Burner 2,,Secor Type II S/N 10,US,US,O,state"     
## [3] "1967-080,2439725.7,1967-08-23,1967,Thor Burner 2,,DAPP 3419,US,US,O,state"                 
## [4] "1967-096,2439774.83,1967-10-11,1967,Thor Burner 2,,DAPP 4417,US,US,O,state"                
## [5] "1968-042,2439999.69,1968-05-23,1968,Thor Burner 2,,DAPP 5420,US,US,O,state"
```

---

# For example, import CSV

But we don't need to call it directly: we can call a higher-level function that reads the contents of a CSV file and structures them in a data frame.

```r
read.csv(file = sample_file, nrows = 5)
```

```
##        tag      JD launch_date launch_year          type variant
## 1 1967-065 2439671  1967-06-29        1967 Thor Burner 2      NA
## 2 1967-080 2439726  1967-08-23        1967 Thor Burner 2      NA
## 3 1967-096 2439775  1967-10-11        1967 Thor Burner 2      NA
## 4 1968-042 2440000  1968-05-23        1968 Thor Burner 2      NA
## 5 1968-092 2440153  1968-10-23        1968 Thor Burner 2      NA
##                mission agency state_code category agency_type
## 1 Secor Type II S/N 10     US         US        O       state
## 2            DAPP 3419     US         US        O       state
## 3            DAPP 4417     US         US        O       state
## 4            DAPP 5420     US         US        O       state
## 5            DAPP 6422     US         US        O       state
```

`read.csv` also reads the text of the file, and then performs the many other operations that are needed to turn a CSV file into a data frame.
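The difference between the two functions can be tried without the sample file. This sketch writes a tiny, made-up CSV to a temporary file (the names `tmp_file` and `small_df` are only illustrations) and reads it both ways:

```r
# write a tiny, made-up CSV to a temporary file
tmp_file <- tempfile(fileext = ".csv")
writeLines(c("name,year", "alpha,1967", "beta,1968"), con = tmp_file)

readLines(con = tmp_file) # raw lines of text, nothing is parsed

small_df <- read.csv(file = tmp_file)
small_df                  # a data frame with one column per field
class(small_df$year)      # "integer": the numbers were parsed
```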

---

# For example, import CSV

Sometimes many functions do the same job and have similar names! Tip: when in doubt, use the ones with an underscore "_" instead of a dot ".".

```r
read_csv(sample_file)
```

```
## # A tibble: 10 x 11
## tag JD launch_date launch_year type variant mission agency state_code
## <chr> <dbl> <date> <dbl> <chr> <lgl> <chr> <chr> <chr> 
## 1 1967… 2.44e6 1967-06-29 1967 Thor… NA Secor … US US 
## 2 1967… 2.44e6 1967-08-23 1967 Thor… NA DAPP 3… US US 
## 3 1967… 2.44e6 1967-10-11 1967 Thor… NA DAPP 4… US US 
## 4 1968… 2.44e6 1968-05-23 1968 Thor… NA DAPP 5… US US 
## 5 1968… 2.44e6 1968-10-23 1968 Thor… NA DAPP 6… US US 
## 6 1969… 2.44e6 1969-07-23 1969 Thor… NA DAPP 7… US US 
## 7 1970… 2.44e6 1970-02-11 1970 Thor… NA DAPP B… US US 
## 8 1970… 2.44e6 1970-09-03 1970 Thor… NA DAPP B… US US 
## 9 1971… 2.44e6 1971-02-17 1971 Thor… NA DAPP B… US US 
## 10 1971… 2.44e6 1971-06-08 1971 Thor… NA P70-1 US US 
## # … with 2 more variables: category <chr>, agency_type <chr>
```

---

Find a CSV or any kind of delimited text file and read it into R.

---

# You can read data into R with readr

---

# You can pipe your operations with magrittr

---

# You can manipulate text with stringr

---

# You can manipulate data with dplyr

---

# You can apply functions with purrr

---

# You can plot with ggplot2

---

# To use a package you must take two steps:

First **install the package** on your laptop.

You must do this **only once**.

```r
install.packages("tidyverse")
```

Then, **load the package** in your R environment.

You must do this in **every R session** (if you want to use that package).

```r
library(tidyverse)
```
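An installed package can also be used one function at a time, without loading it, with the `package::function()` notation (these slides use it for `here::here()`). A small sketch with the `stats` package, which ships with R; `requireNamespace()` checks that a package is installed:

```r
# run the call only if the package is installed
if (requireNamespace("stats", quietly = TRUE)) {
  stats::median(c(1, 5, 3)) # call one function with package::function()
}
```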

---

# You can write your own functions!

```r
per_due <- function(n = 1) n*2

per_due()
```

```
## [1] 2
```

```r
per_due(23) # Try it out
```

```
## [1] 46
```

```r
greet <- function(person) {
 person <- stringr::str_to_title(person)
 paste0("Hi ", person, ", how are you?")
}

greet("Otho")
```

```
## [1] "Hi Otho, how are you?"
```

---

# You can write your own functions!

```r
hypot <- function(a = 1, b = 1) {
 sqrt(a^2 + b^2)
}

hypot(3, 4)
```

```
## [1] 5
```
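Because `sqrt()` and `^` are vectorized, a function built from them accepts vectors as well. This sketch repeats the definition of `hypot()` so that it runs on its own:

```r
hypot <- function(a = 1, b = 1) {
  sqrt(a^2 + b^2)
}

hypot(c(3, 5), c(4, 12)) # one result for each pair of sides
hypot(b = 2, a = 0)      # arguments can be passed by name, in any order
```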

```r
plot_blue_circle <- function(radius = .2) {
 grid::grid.circle(r = radius, gp = grid::gpar(fill = "#27A6D3"))
}

plot_blue_circle()
```

![](01-meet-r_files/figure-html/unnamed-chunk-49-1.svg)

---

# Data is on Github

https://github.com/rfordatascience/tidytuesday/tree/master/data/2019/2019-01-15

It was used by The Economist to make [this visualization](https://www.economist.com/graphic-detail/2018/10/18/the-space-race-is-dominated-by-new-contenders).

---

# You can find the data locally

in the data folder at the path `data/launches.csv`.

```r
library(readr)

launches_path <- here::here("data", "launches.csv")

launches <- read_csv(file = launches_path)
```

```
## Parsed with column specification:
## cols(
##   tag = col_character(),
##   JD = col_double(),
##   launch_date = col_date(format = ""),
##   launch_year = col_double(),
##   type = col_character(),
##   variant = col_character(),
##   mission = col_character(),
##   agency = col_character(),
##   state_code = col_character(),
##   category = col_character(),
##   agency_type = col_character()
## )
```

---

# And they are instantly ready for use

We read the data with `read_csv()` into an R `tibble` object, and assigned it to the variable `launches`.

```r
launches
```

```
## # A tibble: 5,726 x 11
## tag JD launch_date launch_year type variant mission agency state_code
## <chr> <dbl> <date> <dbl> <chr> <chr> <chr> <chr> <chr> 
## 1 1967… 2.44e6 1967-06-29 1967 Thor… <NA> Secor … US US 
## 2 1967… 2.44e6 1967-08-23 1967 Thor… <NA> DAPP 3… US US 
## 3 1967… 2.44e6 1967-10-11 1967 Thor… <NA> DAPP 4… US US 
## 4 1968… 2.44e6 1968-05-23 1968 Thor… <NA> DAPP 5… US US 
## 5 1968… 2.44e6 1968-10-23 1968 Thor… <NA> DAPP 6… US US 
## 6 1969… 2.44e6 1969-07-23 1969 Thor… <NA> DAPP 7… US US 
## 7 1970… 2.44e6 1970-02-11 1970 Thor… <NA> DAPP B… US US 
## 8 1970… 2.44e6 1970-09-03 1970 Thor… <NA> DAPP B… US US 
## 9 1971… 2.44e6 1971-02-17 1971 Thor… <NA> DAPP B… US US 
## 10 1971… 2.44e6 1971-06-08 1971 Thor… <NA> P70-1 US US 
## # … with 5,716 more rows, and 2 more variables: category <chr>,
## # agency_type <chr>
```

---

# Plotting in R is easy

```r
ggplot(data = launches,
       mapping = aes(x = launch_year, fill = agency_type)) +
  geom_histogram()
```

![](01-meet-r_files/figure-html/unnamed-chunk-52-1.svg)
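The histogram counts launches along `launch_year` and splits the bars by `agency_type`. A similar count can be sketched in base R with `table()`. The toy data frame below is made up for illustration; it is not the real `launches` data, and the names `toy_launches` and `counts` are hypothetical:

```r
# a tiny, made-up data frame with the same two columns
toy_launches <- data.frame(
  launch_year = c(1967, 1967, 1968, 2018, 2018, 2018),
  agency_type = c("state", "state", "state", "private", "private", "startup")
)

# count the launches for every combination of agency type and year
counts <- table(toy_launches$agency_type, toy_launches$launch_year)
counts
```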

Data from Jonathan McDowell's JSR Launch Vehicle Database