
It is time again for the bi-monthly digest of R and data science news, useful for biology and oceanography in general. This time, April and May 2015!

Always follow the links to get to the article and see the complete example or tutorial.

 

Content:

1) General R

2) Visualization

3) Learning R and data science

4) Analyses

 

1) General R:

a) A new version of R (“Full of Ingredients”) has been available for some time. It works well with the new RStudio version that came out recently. I have been using both without running into any bugs. The RStudio update is especially useful because it includes syntax completion and dark, eye-friendly themes.

b) A curated list of the best add-ons for R. No further words needed.

from: blog.revolutionanalytics.com by: Qin Wenfeng

 

2) Visualization:

a) Confusion matrices/heatmaps: since I have been using them a lot lately, I came across a few good articles and news items that I want to share.

Here is a heat map for the example R dataset “iris” with an annotated Pearson correlation tree. I find this useful because of the extra, easily visible information it adds to the matrix.


from: gettinggeneticsdone.com by: Stephen Turner
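
As a quick refresher on the confusion matrices mentioned in this item, here is a minimal sketch of how one is tallied before it gets drawn as a heat map. It is written in Python rather than R and is not taken from the linked article; the function name and the labels (loosely named after the iris species) are made up for illustration.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs and return (labels, matrix)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    # Rows are the actual classes, columns are the predicted classes.
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix


# Hypothetical labels for illustration only; not real model output.
actual = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
predicted = ["setosa", "setosa", "versicolor", "virginica", "virginica", "versicolor"]

labels, matrix = confusion_matrix(actual, predicted)
for label, row in zip(labels, matrix):
    print(f"{label:>10}", row)
```

The diagonal holds the correct predictions; everything off the diagonal is a misclassification, which is exactly what stands out when the matrix is colored as a heat map.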

 

b) Similar to the post above: an article about how to make a heatmap that includes hierarchical clustering.

 


from: blog.revolutionanalytics.com by: Vidisha Vachharajani

 

c) Large-scale time series in a heatmap? It is possible, and it communicates a lot of information very clearly!

Check out the article and its heat map of historic measles outbreaks over time.

 


 

from: benjaminlmoore.wordpress.com by: Benjamin Moore

 

d) Plotly’s interactive climate change graphs

These interactive graphs are perfect for visualizing climate change data and show just how much better we can communicate science with modern plotting engines.

Examples include temperature, global ocean heat content and atmospheric CO2.

 

 

from: blog.plot.ly by: Plotly and The White House, Climate Data Initiative

 

e) Embedded graphs! Look at how smoothly information can be overlaid on a map. Here, for example, we see US Army casualty data for Afghanistan (from WikiLeaks). What is striking is the number of data points for civilian victims.


 

from: blog.revolutionanalytics.com by: Joseph Rickert

 

3) Learning R and data science

a) { swirl + DataCamp }

The DataCamp platform for learning R, statistics and data science now also includes swirl, an R package for interactive learning. I used it briefly as part of a Coursera class I took (R Programming by Johns Hopkins University) and can recommend it.

It now includes a machine learning tutorial as well, developed together with www.kaggle.com: Machine learning tutorial

In other news, DataCamp now offers all tutorials for free!

from: www.datacamp.com by: DataCamp

 

b) Statistics with R textbook: Nicole Radziwill, author of “Statistics (the easier way) with R”, says that the innovation of the book is its completeness: it includes all the steps necessary to recreate the examples and gives every piece of code.

from: qualityandinnovation.com by: Nicole Radziwill

 

c) Analytics Vidhya. This website collects a variety of articles, ranging from basic data science to how to tune the parameters of your random forest model. Definitely worth checking from time to time.

Tuning a random forest model: http://www.analyticsvidhya.com/blog/2015/06/tuning-random-forest-model/

from: analyticsvidhya.com by: Kunal Jain, Tavish Srivastava and Ajay Ohri

 

4) Analyses

a) Ordination biplots in ggplot2. A nice article that covers a lot of ground on how to make ordination plots.

 


Further reading on ordination methods:

http://ordination.okstate.edu/overview.htm

from: beckmw.wordpress.com by: Marcus

 

b) A benchmark of random forest implementations!

This benchmark is also very interesting to me personally, since I work a lot with random forests (although with the implementation by Salford Systems). A next step would be to compare Salford’s version with the H2O version.

In this graph we see how the accuracy measure “Area under the ROC curve (AUC)” varies across different random forest implementations. In this test, H2O achieves the highest AUC values.
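
For anyone who wants a feel for what the AUC in this item measures, here is a minimal sketch in Python (not R, and not the benchmark's own code). It uses the pairwise interpretation of AUC: the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as half. The function name, labels and scores are all made up for illustration.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical true labels and scores from two imaginary models.
y = [1, 1, 0, 1, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
model_b = [0.9, 0.8, 0.3, 0.6, 0.4, 0.2]

print("model A:", round(auc(y, model_a), 3))
print("model B:", round(auc(y, model_b), 3))
```

The double loop is quadratic in the number of examples, so this is only meant for small toy data; real implementations sort the scores once instead.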

To get started with H2O, start here: http://h2o.ai and http://en.wikipedia.org/wiki/H2O_(software)

from: http://datascience.la by: Szilard Pafka
