Data Analysis in R
Slides
Introduction:
Hands on data with R:
- Meet R
- Manipulate Data
- Missing Values
- Visualize Data
- Notes on Correlation
- Robust Statistics
- Get Data into R
- Clean Data with R
- Exploratory Data Analysis
Explore your data with statistical models:
Tools
R and RStudio
You can run R within Visual Studio Code, in the Docker container provided by the summer school organizers.
Otherwise, you can also:
Remember! R works with packages.
Install a package
First, install the package with install.packages() (you only have to do this once).
Load a package
Then load it with library() to make its functions available (you have to do this at the beginning of each of your scripts).
Packages that we are going to use:
Please install these packages:
install.packages(c('tidyverse', 'palmerpenguins', 'here', 'broom', 'janitor'))
and place this snippet of code at the top of all your scripts.
library(tidyverse)
library(here)
library(palmerpenguins)
library(broom)
library(janitor)
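The install-once, load-in-every-script workflow described above can be sketched as a small setup helper. This is only an illustration: the function names `find_missing_packages()` and `setup_course_packages()` are hypothetical and not part of the course material, and the install step assumes an internet connection and a configured CRAN mirror.

```r
# Packages used in this course (same list as in the snippet above).
course_packages <- c("tidyverse", "palmerpenguins", "here", "broom", "janitor")

# Hypothetical helper: return the packages from `pkgs` that are not installed.
find_missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install what is missing, then load everything.
setup_course_packages <- function(pkgs = course_packages) {
  missing_pkgs <- find_missing_packages(pkgs)
  if (length(missing_pkgs) > 0) {
    # Needed only once per machine (or per container).
    install.packages(missing_pkgs)
  }
  # Needed in every new R session or script.
  invisible(lapply(pkgs, library, character.only = TRUE))
}

# Run the setup only in an interactive session, so that sourcing this
# file elsewhere does not trigger installations unexpectedly.
if (interactive()) {
  setup_course_packages()
}
```

Checking for missing packages first avoids reinstalling everything each time the script runs, which keeps the "install once, load always" distinction explicit.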
Great Books About Data Analysis
These are the textbooks that I love and that I use as a daily reference. They are all openly accessible.
R
- R for Data Science: An introduction to data analysis with R/Tidyverse by Hadley Wickham and Garrett Grolemund.
- (2nd edition coming out soon).
- Introduction to Data Science - A detailed introduction to Data science by the biostatistician Rafael A. Irizarry.
- Advanced R - All you wish to know about programming in R by Hadley Wickham.
- An Introduction to Statistical Learning - A detailed introduction to modern statistical methods, implemented in R, by Gareth James, Daniela Witten, Trevor Hastie and Rob Tibshirani.
- Text Mining with R - Analyzing natural language and written text in R, by Julia Silge and David Robinson.
- Tidy Modeling with R - An introduction to the tools that compose R’s machine learning framework, by Max Kuhn and Julia Silge.
- Analysing Data Using Linear Models, for students in social, behavioural and management science, by Stéphanie M. van den Berg.
Python
- Think Python 2e - Learn how to think like a computer scientist with Python, by Allen B. Downey.
- The Python Data Science Handbook, foundations of Python for data science, by Jake VanderPlas.
- A Whirlwind Tour of Python, a fast-paced introduction to Python, by Jake VanderPlas.
- Python for Data Analysis, the basics of data analysis in Python, with NumPy and pandas, by Wes McKinney.
- Visualization Curriculum - Data visualization with Python, through Vega-Lite and Altair. Also available for JavaScript. By Jeffrey Heer, Dominik Moritz, Jake VanderPlas, and Brock Craft.
JavaScript
- JavaScript for Data Science - An introduction to modern JavaScript, by Maya Gans, Toby Hodges, and Greg Wilson.
- D3 in Depth, visualize data on the web with D3, by Peter Cook.
Git / GitHub
- Happy Git and GitHub for the useR by Jenny Bryan and Jim Hester.
- Pro Git Book (don’t worry, it starts from the basics), by Scott Chacon and Ben Straub.
- GitHub Skills - A set of practical exercises to learn GitHub.
Project management
- Designing and Building Data Science Solutions - How to set up a data science project, by Jonathan Leslie and Neri Van Otten.
Dataviz Design
- Data Visualization: A Practical Introduction - Visualize data in R, by Kieran Healy.
- Scientific Color Palettes 🎨 - Perceptually uniform colors for scientific data visualization, by Fabio Crameri.
- Scico, Fabio Crameri’s color palettes ported to ggplot2, by Thomas Lin Pedersen.
Dashboards
- Dashboards with R + Docker + GitHub Actions by Rami Krispin, head of data science at Apple.
Computer Science
- Missing Semester - A generic intro to basic CS productivity tips and tools, by Anish Athalye.
Bayesian Statistics in R and Python
- A 10-minute introduction to Bayesian statistics in R by Michael Clark.
- An introduction to Bayesian Thinking by Merlise Clyde et al.
- Think Bayes, an introduction to Bayesian statistics in Python by Allen B. Downey.
- Bayesian Data Analysis by Andrew Gelman et al.
Geocomputation
- Geocomputation with R; a book on geographic data analysis, visualization and modeling by Robin Lovelace, Jakub Nowosad and Jannes Muenchow.
- Spatial Data Science; concepts, packages and models for spatial data science in R, by Edzer Pebesma and Roger Bivand.
More Books at Bookdown
- Check out the bookdown repository for many more.
Source Code
The source code for this course is available on GitHub.