Data Analysis in R
Slides
Introduction:
Hands on data with R:
- Meet R
- Manipulate Data
- Missing Values
- Visualize Data
- Notes on Correlation
- Robust Statistics
- Get Data into R
- Clean Data with R
- Explorative Data Analysis
Explore your data with statistical models:
Tools
R and Rstudio
You can run R within Visual Studio Code, in the Docker container provided by the summer school organizers.
Otherwise, you can also:
Remember! R works with packages.
Install a package
First install the package with install.packages() (you only have to do it once).
Load a package
Then load it with library() to make its functions available (you have to do this at the beginning of each of your scripts).
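The install-once, load-every-time pattern described above can be sketched as follows. This is only a minimal illustration, not part of the course material: `missing_packages()` is a hypothetical helper, and the example checks `stats` and `utils` (which ship with R) so that it runs without an internet connection. Swap in the packages you actually need.

```r
# Hypothetical helper (an assumption, not a course function): return the
# packages in `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages to check; "stats" and "utils" ship with R, so nothing is missing.
pkgs <- c("stats", "utils")
to_install <- missing_packages(pkgs)

# Install only what is missing (needs an internet connection and a CRAN mirror).
if (length(to_install) > 0) {
  install.packages(to_install)
}

# Load every package; character.only = TRUE lets library() accept a string.
invisible(lapply(pkgs, library, character.only = TRUE))
```

Checking first means the script does not reinstall packages on every run, while the `library()` calls still execute each time the script starts.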
Packages that we are going to use:
Please install these packages:
install.packages(c('tidyverse', 'palmerpenguins', 'here', 'broom', 'janitor'))
and place this snippet of code at the top of all your scripts:
library(tidyverse)
library(here)
library(palmerpenguins)
library(broom)
library(janitor)
Great Books About Data Analysis
These are the textbooks that I love and that I use as a daily reference. They are all openly accessible.
R
- R for Data Science: An introduction to data analysis with R/Tidyverse by Hadley Wickham and Garrett Grolemund.
- (2nd edition coming out soon).
- Introduction to Data Science - A detailed introduction to Data science by the biostatistician Rafael A. Irizarry.
- Advanced R - All you wish to know about programming in R by Hadley Wickham.
- Introduction to Statistical Learning - A detailed introduction to modern statistical methods, implemented in R, by Gareth James, Daniela Witten, Trevor Hastie and Rob Tibshirani.
- Text Mining with R: Analyzing natural language and written text in R, by Julia Silge and David Robinson.
- Tidy Modeling with R An introduction to the tools that compose R’s machine learning framework, by Max Kuhn and Julia Silge.
- Analysing Data Using Linear Models, for students in social, behavioural and management science, by Stéphanie M. van den Berg.
Python
- Think Python 2e: Learn how to think as a computer scientist with Python, by Allen B. Downey.
- The Python Data Science Handbook, foundations of Python for data science, by Jake VanderPlas.
- A Whirlwind Introduction to Python, a fast-paced introduction to Python, by Jake VanderPlas.
- Python for Data Analysis, the basics of data analysis in Python, with numpy and pandas, by Wes McKinney.
- Visualization Curriculum Data Visualization with Python, through Vega-Lite and Altair. Available also for javascript, by Jeffrey Heer, Dominik Moritz, Jake VanderPlas, and Brock Craft.
Javascript
- Javascript for Data Science an introduction to modern Javascript by Maya Gans, Toby Hodges, and Greg Wilson.
- D3 in Depth, visualize data on the web with D3, by Peter Cook.
Git / Github
- Happy Git and Github for the useR by Jenny Bryan and Jim Hester.
- Pro Git Book, don’t worry, it starts from the basics; by Scott Chacon and Ben Straub.
- Github Skills: A set of practical exercises to learn Github.
Project management
- Designing and Building Data Science Solutions: how to set up a data science project, by Jonathan Leslie and Neri Van Otten.
Dataviz Design
- Data Visualization - A practical introduction Visualize data in R, by Kieran Healy.
- Scientific Color Palettes 🎨 Perceptually uniform colors, for scientific data visualization, by Fabio Crameri.
- Scico, Fabio Crameri’s color palettes ported to ggplot2, by Thomas Lin Pedersen.
Dashboards
- Dashboards with R + Docker + Github Actions by Rami Krispin, head of data science at Apple.
Computer Science
- Missing Semester A generic intro to basic CS productivity tips and tools, by Anish Athalye.
Bayesian Statistics in R and Python
- A 10 minutes introduction to Bayesian statistics in R by Michael Clark.
- An introduction to Bayesian Thinking by Merlise Clyde et al.
- Think Bayes, an introduction to Bayesian statistics in Python by Allen B. Downey.
- Bayesian Data Analysis by Andrew Gelman et al.
Geocomputation
- Geocomputation with R; a book on geographic data analysis, visualization and modeling by Robin Lovelace, Jakub Nowosad and Jannes Muenchow.
- Spatial Data Science; concepts, packages and models for spatial data science in R, by Edzer Pebesma and Roger Bivand.
More Books at Bookdown
- Check out the bookdown repository for many more.
Source Code
The source code for this course is available on Github.