Your Turn

# Your Turn
## I am my own experiment. I am my own work of R.
### Otho Mantegazza<br>2019-11-20

---

# 1 - Nobel Prizes

https://github.com/rfordatascience/tidytuesday/tree/master/data/2019/2019-05-14

# 2 - US Births

https://github.com/rfordatascience/tidytuesday/tree/master/data/2018/2018-10-02

# 3 - Ecological Footprint

https://www.kaggle.com/footprintnetwork/ecological-footprint

---

# 4 - Forest Fires in Brazil

https://www.kaggle.com/gustavomodelli/forest-fires-in-brazil

# 5 - Genetic Variations

https://www.kaggle.com/kevinarvai/clinvar-conflicting

---

## Get Familiar with the Topic

Gather some background. Learn at least a little about what the dataset covers. Which issues can this dataset help address?

## Do you have a data dictionary?

Which **variables are stored in the columns**, and how?

- If you don't have this information, can you gather it?
  - Do those descriptions match what you observe in R? If not, what does not match?
  - Did you get warnings when you loaded the data in R? Can you fix them?
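
One way to run these checks is sketched below with `readr`. The CSV text, the column names, and the `"n/a"` missing-value code are made up for illustration; your dataset and its dictionary will differ.

```r
library(readr)

# Toy CSV text standing in for a downloaded file (hypothetical columns and values).
csv_text <- "year,state,births
2016,A,100
2017,A,n/a
2018,B,120
"

# Column types as a data dictionary might describe them.
dictionary_types <- cols(
  year = col_integer(),
  state = col_character(),
  births = col_integer()
)

# I() marks the string as literal data rather than a file path (readr >= 2.0).
# "n/a" is not an integer, so readr warns about a parsing issue.
births <- read_csv(I(csv_text), col_types = dictionary_types)

# List the cells that did not match the declared type.
problems(births)

# One possible fix: declare "n/a" as a missing-value code and read again.
births_fixed <- read_csv(
  I(csv_text),
  col_types = dictionary_types,
  na = c("", "NA", "n/a")
)

# Compare the resulting column types with the dictionary.
spec(births_fixed)
```

With a real file, replace `I(csv_text)` with the path to the file you downloaded.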

---

# Are the data Tidy?

Or can you make them so?

- Are there missing values? How are they encoded? How are they distributed in the dataset?
- Do the columns have practical names, or do they need renaming?
- Are the data structured according to the tidy principles, or do they need spreading/gathering?
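
The first two questions can be explored with `dplyr`, as in the rough sketch below. The toy table, its column names, and the `-999` placeholder for missing values are all assumptions made for the example.

```r
library(dplyr)
library(tibble)

# Toy data: awkward column names and a made-up -999 code for missing values.
fires <- tribble(
  ~Year, ~`State Name`, ~`N. Fires`,
  2016,  "A",           12,
  2016,  "B",           -999,
  2017,  "A",           NA,
  2017,  "B",           30
)

# Give the columns short, syntactic names: rename(new = old).
fires <- fires %>%
  rename(year = Year, state = `State Name`, n_fires = `N. Fires`)

# Recode the placeholder as NA so that R treats it as missing.
fires <- fires %>%
  mutate(n_fires = na_if(n_fires, -999))

# Count missing values in each column.
fires %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

# Count missing values per group to see how they are distributed.
fires %>%
  group_by(state) %>%
  summarise(n_missing = sum(is.na(n_fires)))
```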

Remember:

- Each variable must have its own column.
- Each observation must have its own row.
- Each value must have its own cell.
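
As an illustration of these principles, the sketch below reshapes a toy table in which one variable (the year) is spread across the column names. It uses `tidyr::pivot_longer()`, the successor to `gather()`; the table and the names `country`, `year`, and `footprint` are hypothetical.

```r
library(tidyr)
library(tibble)

# Toy wide table: the years are values of a variable, but sit in the column names.
wide <- tribble(
  ~country, ~`2017`, ~`2018`,
  "A",      1.5,     1.7,
  "B",      3.2,     3.0
)

# Move the year columns into rows: one row per country and year.
long <- pivot_longer(
  wide,
  cols = -country,
  names_to = "year",
  values_to = "footprint",
  names_transform = list(year = as.integer)
)

long
```

After reshaping, `year` and `footprint` each have their own column, and each country-year observation has its own row.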

---

# Can you find any pattern in your data?

Show it with a viz!
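
A possible starting point with `ggplot2` is sketched below. The data frame and its columns (`year`, `group`, `value`) are invented for the example; swap in the variables from your own dataset.

```r
library(ggplot2)

# Toy data with hypothetical columns.
toy <- data.frame(
  year  = rep(c(2000, 2010, 2020), times = 2),
  group = rep(c("A", "B"), each = 3),
  value = c(1, 2, 4, 3, 3, 2)
)

# One line per group, to compare how the value changes over time.
p <- ggplot(toy, aes(x = year, y = value, colour = group)) +
  geom_line() +
  geom_point() +
  labs(x = "Year", y = "Value", colour = "Group")

p
```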