# Data Analysis in R

# Slides

Introduction:

Hands on data with R:

- Meet R
- Manipulate Data
- Missing Values
- Visualize Data
- Notes on Correlation
- Robust Statistics
- Get Data into R
- Clean Data with R
- Explorative Data Analysis

Explore your data with statistical models:

# Tools

## R and RStudio

You can run R within Visual Studio Code, in the Docker container provided by the summer school organizers.

Otherwise, you can also:

Remember! R works with packages.

## Install a package

First, install the package with `install.packages()` (you only have to do it once).

## Load a package

Then load it with `library()` to make its functions available (you have to do it at the beginning of each of your scripts).
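
A minimal sketch of the install-once, load-every-time workflow described above. The helper names `missing_packages()` and `install_if_missing()` are hypothetical, introduced here only for illustration; they rely on base R's `requireNamespace()` and `install.packages()`.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install only what is missing, so that re-running a
# script does not reinstall packages that are already available.
install_if_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}
```

Calling `install_if_missing("palmerpenguins")` once, and then `library(palmerpenguins)` at the top of each script, follows the two steps above.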

## Packages that we are going to use:

Please install these packages:

`install.packages(c('tidyverse', 'palmerpenguins', 'here', 'broom', 'janitor'))`

and place this snippet of code at the top of all your scripts.

```
library(tidyverse)
library(here)
library(palmerpenguins)
library(broom)
library(janitor)
```
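
As a quick check that the setup works, the sketch below assumes the packages above are installed and uses the `penguins` data set shipped with palmerpenguins. The object names `species_summary`, `fit` and `coefficients_table` are illustrative only.

```r
library(tidyverse)
library(palmerpenguins)
library(broom)

# Mean body mass per species, ignoring missing values.
species_summary <- palmerpenguins::penguins %>%
  group_by(species) %>%
  summarise(mean_body_mass_g = mean(body_mass_g, na.rm = TRUE))

# A simple linear model, tidied into a data frame of coefficients by broom.
fit <- lm(body_mass_g ~ flipper_length_mm, data = palmerpenguins::penguins)
coefficients_table <- tidy(fit)

print(species_summary)
print(coefficients_table)
```

If both tables print without errors, the packages are installed and loaded correctly.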

# Great Books About Data Analysis

These are the textbooks that I love and that I use as a daily reference. They are all openly accessible.

## R

- R for Data Science - An introduction to data analysis with R/Tidyverse by Hadley Wickham and Garrett Grolemund (2nd edition coming out soon).

- Introduction to Data Science - A detailed introduction to data science by the biostatistician Rafael A. Irizarry.
- Advanced R - All you wish to know about programming in R by Hadley Wickham.
- Introduction to Statistical Learning - A detailed introduction to modern statistical methods, implemented in R, by Gareth James, Daniela Witten, Trevor Hastie and Rob Tibshirani.
- Text Mining with R - Analyzing natural language and written text in R, by Julia Silge and David Robinson.
- Tidy Modeling with R - An introduction to the tools that compose R's machine learning framework, by Max Kuhn and Julia Silge.
- Analysing Data Using Linear Models, for students in social, behavioural and management science, by Stéphanie M. van den Berg.

## Python

- Think Python 2e - Learn how to think like a computer scientist with Python, by Allen B. Downey.
- The Python Data Science Handbook, foundations of Python for data science, by Jake VanderPlas.
- A Whirlwind Tour of Python, a fast-paced introduction to Python, by Jake VanderPlas.
- Python for Data Analysis, the basics of data analysis in Python, with NumPy and pandas, by Wes McKinney.
- Visualization Curriculum - Data visualization with Python, through Vega-Lite and Altair (also available for JavaScript), by Jeffrey Heer, Dominik Moritz, Jake VanderPlas, and Brock Craft.

## JavaScript

- JavaScript for Data Science - An introduction to modern JavaScript, by Maya Gans, Toby Hodges, and Greg Wilson.
- D3 in Depth, visualize data on the web with D3, by Peter Cook.

## Git / GitHub

- Happy Git and GitHub for the useR by Jenny Bryan and Jim Hester.
- Pro Git Book, don't worry, it starts from the basics; by Scott Chacon and Ben Straub.
- GitHub Skills - A set of practical exercises to learn GitHub.

## Project management

- Designing and Building Data Science Solutions - How to set up a data science project, by Jonathan Leslie and Neri Van Otten.

## Dataviz Design

- Data Visualization: A Practical Introduction - Visualize data in R, by Kieran Healy.
- Scientific Color Palettes 🎨 - Perceptually uniform colors for scientific data visualization, by Fabio Crameri.
- Scico, Fabio Crameri's color palettes ported to ggplot2, by Thomas Lin Pedersen.

## Dashboards

- Dashboards with R + Docker + GitHub Actions by Rami Krispin, head of data science at Apple.

## Computer Science

- Missing Semester - A generic intro to basic CS productivity tips and tools, by Anish Athalye.

## Bayesian Statistics in R and Python

- A 10-minute introduction to Bayesian statistics in R by Michael Clark.
- An Introduction to Bayesian Thinking by Merlise Clyde et al.
- Think Bayes, an introduction to Bayesian statistics in Python by Allen B. Downey.
- Bayesian Data Analysis by Andrew Gelman et al.

## Geocomputation

- Geocomputation with R; a book on geographic data analysis, visualization and modeling by Robin Lovelace, Jakub Nowosad and Jannes Muenchow.
- Spatial Data Science; concepts, packages and models for spatial data science in R, by Edzer Pebesma and Roger Bivand.

## More Books at Bookdown

- Check out the bookdown repository for many more.

# Source Code

The source code for this course is available on GitHub.