A Few Words on Statistical Models

CBSER Summer School by Mawazo Institute

Author

Otho Mantegazza

1 Why Statistical Models

When you reach the limit of what graphical exploration and summary statistics can tell you, you can use statistical models to explore your data.

1.1 Why?

When we look for insights in complex datasets, sometimes we want to know:

Is there a multivariate pattern in the data, i.e. one that becomes evident only when we compare multiple variables at once?
Are the patterns that we observe in the data due to chance? If they are not, we can start hypothesizing about, and exploring, what might cause them.

1.2 Which models?

1.2.1 Supervised models

In the lesson on supervised models, we will use linear models to test whether we can “predict” the number of spikelets from the number of primary branches and other features of rice panicles, i.e. whether we can predict a response from a collection of predictors.
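To make the idea of "predicting a response from a predictor" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the lesson's method: it uses a single predictor instead of several, fits it by ordinary least squares with the closed-form formula, and runs on synthetic, made-up counts rather than the real rice panicle data. The helper names `fit_simple_linear_model` and `r_squared`, and the simulated "true" relationship, are hypothetical.

```python
import random


def fit_simple_linear_model(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The OLS slope is the covariance of x and y divided by the variance of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation; the slope is undefined")
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Synthetic data, NOT the rice panicle dataset: we assume, for illustration
# only, that spikelets = 10 + 12 * primary_branches + Gaussian noise
rng = random.Random(42)
primary_branches = [rng.randint(5, 15) for _ in range(100)]
spikelets = [10 + 12 * b + rng.gauss(0, 5) for b in primary_branches]

intercept, slope = fit_simple_linear_model(primary_branches, spikelets)
r2 = r_squared(primary_branches, spikelets, intercept, slope)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}, R^2 = {r2:.3f}")
```

Because the data were simulated with a slope of 12, the estimated slope should land close to 12. With real panicles, the interesting questions are how large the slope is and how much of the variation in spikelets the predictors explain. Using several predictors at once, as the lesson does, turns this into multiple regression, which is usually solved with a matrix least-squares routine rather than this one-variable closed form.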

1.2.2 Unsupervised models

In the lesson on unsupervised models, we will use clustering and Principal Component Analysis to explore whether we can use multiple phenotypic features to group rice panicles into distinct sets that behave similarly, i.e. to detect multivariate patterns in the data.
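As a rough illustration of the PCA half of that idea (clustering is not shown here), the following minimal Python sketch assumes a small numeric table in which each row is a panicle and each column a phenotypic trait. It standardizes the traits, builds their correlation matrix, and finds the first principal component by power iteration. The data are simulated, the helper names are hypothetical, and in practice you would rely on a library PCA routine rather than hand-rolled code.

```python
import math
import random


def standardize(rows):
    """Center each column and scale it to unit (sample) variance."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def covariance_matrix(z):
    """Sample covariance of already-centered columns (a correlation matrix
    when the columns are standardized)."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_principal_component(cov, iterations=200):
    """Power iteration: repeatedly multiply a vector by the covariance
    matrix and renormalize; it converges to the leading eigenvector."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    # Rayleigh quotient v' C v gives the matching eigenvalue (variance along v)
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    return v, eigenvalue


# Synthetic data, NOT real panicles: three hypothetical traits share a common
# latent "size" factor, while a fourth trait is independent noise
rng = random.Random(1)
rows = []
for _ in range(200):
    size = rng.gauss(0, 1)
    rows.append([size + rng.gauss(0, 0.3),
                 size + rng.gauss(0, 0.3),
                 size + rng.gauss(0, 0.3),
                 rng.gauss(0, 1)])

z = standardize(rows)
cov = covariance_matrix(z)
loadings, variance = first_principal_component(cov)
# Standardized columns each have variance 1, so the total variance equals p
explained = variance / len(cov)
print("PC1 loadings:", [round(l, 2) for l in loadings])
print(f"share of variance explained by PC1: {explained:.2f}")
```

Large loadings of the same sign on the three linked traits, together with a near-zero loading on the independent one, are the kind of multivariate pattern described above: traits that vary together get summarized by a single component. Standardizing first matters because PCA on raw values would be dominated by whichever trait has the largest variance in its own units.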

2 Resources

I will give you an overview of the topics mentioned above, but the best resources for studying them in detail are the books in this list, or any other book that you feel comfortable using: