It is time again for the bi-monthly digest of R and data science news that is useful for biology and oceanography in general. This time: April and May 2015!

Always follow the links to get to the article and see the complete example or tutorial.



1) General R

2) Visualization

3) Learning R and data science

4) Analyses


1) General R:

a) A new version of R, 3.2.0 (“Full of Ingredients”), has been available for some time now. It works well with the recently released new version of RStudio, and I have been using both without running into any bugs. The RStudio update is especially useful because it includes syntax completion and dark, eye-friendly themes.

b) A curated list of the best add-on packages for R; no need for further words.

from: by: Qin Wenfeng


2) Visualization:

a) Confusion matrices/heatmaps: since I have been using them a lot lately, I came across a few good articles and news items that I want to share.

Here is a heatmap of the example R dataset “iris”, annotated with a Pearson correlation tree. I find this useful because the tree adds extra, easily visible information to the matrix.


from: by: Stephen Turner
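The iris heatmap described above can be approximated in a few lines of base R. This is a minimal sketch of my own, not the code from the linked article: it computes the Pearson correlation matrix of the four numeric iris measurements, builds a tree from the distance 1 - r (my assumption for turning correlation into a distance), and draws the heatmap ordered by that tree.

```r
# Minimal base-R sketch (not the original article's code)
# Pearson correlations between the four numeric iris measurements
data(iris)
cm <- cor(iris[, 1:4], method = "pearson")

# Turn correlation into a distance (assumed choice: 1 - r), so strongly
# correlated variables end up close together in the tree
hc <- hclust(as.dist(1 - cm), method = "average")

# Heatmap of the correlation matrix, rows and columns ordered by the tree;
# the colour ramp runs from blue (r = -1) through white to red (r = +1)
heatmap(cm, symm = TRUE,
        Rowv = as.dendrogram(hc),
        col = colorRampPalette(c("blue", "white", "red"))(21),
        zlim = c(-1, 1),
        margins = c(8, 8))

# The numbers behind the colours
print(round(cm, 2))
```

Using 1 - r means that two variables with r close to 1 join the tree first, which is why the petal measurements cluster together.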


b) Similar to the post above: an article about how to make a heatmap that includes hierarchical clustering.



from: by: Vidisha Vachharajani
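The linked article has the full walkthrough. As a minimal base-R sketch of the same idea (my own example; the built-in `mtcars` data and Ward's method are assumptions chosen for illustration), the variables are standardised, rows and columns are both clustered, and `cutree()` extracts group memberships that match the row dendrogram drawn beside the heatmap.

```r
# Standardise each column so variables on different scales are comparable
x <- scale(as.matrix(mtcars))

# Cluster the rows (cars) on Euclidean distance with Ward's method
row_hc <- hclust(dist(x), method = "ward.D2")

# Heatmap with that row tree; columns are clustered by heatmap() itself,
# using the same hclust method via hclustfun
heatmap(x,
        Rowv = as.dendrogram(row_hc),
        hclustfun = function(d) hclust(d, method = "ward.D2"),
        scale = "none",
        col = colorRampPalette(c("blue", "white", "red"))(25),
        margins = c(6, 10))

# Cut the row tree into three groups, e.g. to colour or annotate the rows
groups <- cutree(row_hc, k = 3)
print(table(groups))
```

Scaling before clustering matters here: without it, variables with large units (such as displacement or horsepower) would dominate the distances.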


c) Large-scale time series in a heatmap? Possible, and it communicates a lot of information very clearly!

Check out the article and its heatmap of historical measles outbreaks over time.




from: by: Benjamin Moore
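The measles data from the article is not reproduced here. As a stand-in, this minimal sketch uses R's built-in `AirPassengers` series (my choice, not the article's data) to show the basic trick: reshape a long time series into a matrix with one row per year and one column per month, then draw one coloured tile per cell with base `image()`.

```r
# Stand-in data: monthly airline passengers 1949-1960 (built-in AirPassengers),
# reshaped into a year x month matrix
m <- matrix(as.numeric(AirPassengers), ncol = 12, byrow = TRUE,
            dimnames = list(as.character(1949:1960), month.abb))

# One tile per year and month: rows of m map to x (years), columns to y (months)
image(x = 1949:1960, y = 1:12, z = m,
      col = colorRampPalette(c("white", "orange", "darkred"))(50),
      xlab = "Year", ylab = "", axes = FALSE)
axis(1, at = 1949:1960)
axis(2, at = 1:12, labels = month.abb, las = 1)
box()
```

Both the summer peaks and the long-term growth are visible at a glance, which is exactly what makes this layout useful for long series such as the measles records.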


d) Plotly’s interactive climate change graphs

These interactive graphs are perfect for visualizing climate change data and show just how much better we can communicate science with modern plotting engines.

Shown here, for example: temperature, global ocean heat content and atmospheric CO2.


from: by: Plotly and The White House, Climate Data Initiative


e) Embedded graphs! Look at how smoothly information can be overlaid on a map. In this example we see US Army casualty data for Afghanistan (from WikiLeaks). Striking is the number of data points for civilian victims.



from: by: Joseph Rickert


3) Learning R and data science

a) { swirl + DataCamp }

The DataCamp platform for learning R, statistics and data science now also includes swirl, an R package for interactive learning. I used swirl briefly as part of a Coursera class I took (R Programming by Johns Hopkins University) and can recommend it.

It now also includes a machine learning tutorial, developed together with a partner (see: Machine learning tutorial).

In other news, DataCamp now offers all tutorials for free!

from: by: DataCamp


b) Statistics with R textbook: Nicole Radziwill, author of “Statistics (the easier way) with R”, says that the book's innovation is its completeness: it includes every step needed to recreate the examples and gives every piece of code.

from: qualityandinnovation.com by: Nicole Radziwill


c) Analytics Vidhya: this website collects a wide variety of articles, from basic data science to how to tune the parameters of a random forest model. Definitely worth checking from time to time.

Tuning a random forest model:

from: by: Kunal Jain, Tavish Srivastava and Ajay Ohri
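To make the idea of tuning concrete, here is a minimal sketch of one common step, assuming the CRAN package randomForest (the linked article may use different tools): fit one forest per candidate value of `mtry` (the number of variables sampled at each split) and compare the out-of-bag (OOB) error. The `iris` data and the grid of `mtry` values are my own choices for illustration.

```r
# Assumes the randomForest package is installed: install.packages("randomForest")
library(randomForest)

set.seed(42)                      # forests are random; fix the seed for repeatability
p <- ncol(iris) - 1               # number of predictors (4)
mtry_grid <- seq_len(p)           # candidate values for mtry: 1, 2, 3, 4

# Fit one forest per mtry value and record its final out-of-bag error rate
oob <- sapply(mtry_grid, function(m) {
  rf <- randomForest(Species ~ ., data = iris, mtry = m, ntree = 500)
  rf$err.rate[rf$ntree, "OOB"]
})

result <- data.frame(mtry = mtry_grid, oob_error = oob)
print(result)
best_mtry <- result$mtry[which.min(result$oob_error)]
```

The package also provides `tuneRF()`, which searches `mtry` stepwise using the same OOB criterion; the explicit loop simply makes the logic visible. On a small data set like iris the differences between `mtry` values are tiny, so treat the winner with some caution.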


4) Analyses

a) Ordination biplots in ggplot2. A nice article that covers a lot of ground on how to make ordination plots.



Further reading on ordination methods:

from: by: Marcus
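The article covers many ordination methods. As a minimal sketch of my own (PCA on the iris measurements stands in for the ordination, and ggplot2 is assumed to be installed), a biplot boils down to two layers: points for the sample scores and arrows for the variable loadings.

```r
library(ggplot2)

# PCA on the standardised iris measurements
pca <- prcomp(iris[, 1:4], scale. = TRUE)
var_expl <- pca$sdev^2 / sum(pca$sdev^2)   # proportion of variance per axis

# Sample scores (points) and variable loadings (arrows)
scores <- data.frame(pca$x[, 1:2], Species = iris$Species)
loadings <- data.frame(pca$rotation[, 1:2], variable = rownames(pca$rotation))

# Stretch the arrows so they are visible on the scale of the scores
# (the factor 3 is an arbitrary display choice)
k <- 3

p <- ggplot(scores, aes(PC1, PC2)) +
  geom_point(aes(colour = Species)) +
  geom_segment(data = loadings,
               aes(x = 0, y = 0, xend = k * PC1, yend = k * PC2),
               arrow = arrow(length = unit(0.2, "cm"))) +
  geom_text(data = loadings,
            aes(x = k * PC1, y = k * PC2, label = variable),
            hjust = -0.1) +
  labs(x = sprintf("PC1 (%.1f %%)", 100 * var_expl[1]),
       y = sprintf("PC2 (%.1f %%)", 100 * var_expl[2]))
print(p)
```

Putting the explained variance into the axis labels is a small touch that tells the reader how much of the data's structure the two plotted axes actually capture.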


b) A benchmark of random forest implementations!

This benchmark is personally very interesting to me, since I work a lot with random forests (though with the implementation by Salford Systems; a natural next step would be to compare Salford’s version with the H2O version).

The benchmark's graph shows how the accuracy measure “area under the ROC curve (AUC)” varies across random forest implementations. In this test, H2O achieves the highest AUC values.

To get started with H2O, start here: http://h2o.ai

from: by: Szilard Pafka
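Since the benchmark compares implementations by AUC, a short note on what that number means may help: AUC is the probability that a randomly chosen positive case gets a higher score than a randomly chosen negative case. This minimal base-R sketch (my own, not code from the benchmark) computes it from ranks via the Mann-Whitney statistic, with tied scores counting as one half.

```r
# AUC from ranks: Mann-Whitney U statistic divided by n_pos * n_neg
# scores: predicted scores or probabilities; labels: 1 = positive, 0 = negative
auc <- function(scores, labels) {
  r <- rank(scores)                     # tied scores get average ranks
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  u <- sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2
  u / (n_pos * n_neg)
}

# Toy example: three positives and three negatives
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4)
labels <- c(1,   1,   0,   1,   0,    0)
auc(scores, labels)
```

In the toy example, 8 of the 9 positive/negative pairs are ordered correctly (only the positive at 0.6 is beaten by the negative at 0.7), so the AUC is 8/9, about 0.889. An AUC of 0.5 corresponds to random guessing and 1 to a perfect ranking.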
