This is a reblog (August 27th, 2016) of “R Packages for Data Access” by Joseph Rickert.

Data science is all about getting access to interesting data, and it is really nice when some kind soul not only points out an interesting data set but also makes it easy for you to access it. Below are 18 R packages, released to CRAN between May 1st and August 8th, that in one way or another provide access to publicly available data.

bigQueryR: Provides an interface to Google’s BigQuery. The vignette shows how to use it.

blscrapeR: Provides an API wrapper for Bureau of Labor Statistics (BLS) data sets. There are vignettes showing how to access inflation and price data, how to access wages and benefits data, and how to map BLS data.


cdlTools: Provides functions to download USDA National Agricultural Statistics Service (NASS) CropScape data for a specified state.

dataone: Enables R scripts to search, download, and upload science data and metadata to and from the DataONE Federation. The website describes DataONE as “a community driven project providing access to data across multiple member repositories, supporting enhanced search and discovery of Earth and environmental data”. The package comes with several vignettes, including an overview.

dataRetrieval: Retrieves USGS and EPA hydrologic and water quality data; the package is officially supported by USGS. The vignette gives several examples of downloading interesting data sets.

eechidna: Provides the data from the 2013 Australian federal election and tools to analyze it. There are several nicely done vignettes. The following plot, which shows election results by polling place, comes from the vignette on plotting polling stations.


There are also vignettes on census and election data, shapefiles, and mapping Australia’s electorates. The following plot, showing unemployment data by state, comes from the vignette on census data.

getHFdata: Provides functions to download and aggregate high-frequency trading data for Brazilian instruments directly from the Bovespa FTP site. There is a vignette to get you started.

googleAnalyticsR: Provides an interface to the Google Analytics Reporting API. There is a vignette.

googleway: Provides functions to retrieve data from 6 Google Maps APIs. The vignette shows how.

gutenbergr: Provides functions to search and download public domain works in the Project Gutenberg collection. The vignette shows how.

ie2miscdata: Contains a collection of USGS environmental and water resources data sets. There is a vignette showing how to create plots from the data. (See also: dataRetrieval.)

macleish: Provides functions to access data from the Ada & Archibald MacLeish Field Station in Whately, MA. The vignette shows how to obtain weather data.

muckrock: Contains public domain data on requests filed through MuckRock under the US Freedom of Information Act.

nasadata: Provides an interface to NASA’s Earth Imagery and Assets APIs and to the Earth Observatory Natural Event Tracker.

oec: Provides an interface to the Observatory of Economic Complexity.

osi: Provides a connector to the Open Source Initiative API, which supplies machine-readable data about open source software licenses.

pewdata: Provides reproducible, programmatic retrieval of survey data sets from the Pew Research Center. The vignette shows how to set up and use the package. Pew also has an interesting poll about what Americans know about science.

TCGAretriever: Provides an interface to data sets from The Cancer Genome Atlas (TCGA) via the Cancer Genomic Data Server web service.

For more packages that provide APIs to data sets, have a look at the CRAN Task View on Web Technologies and Services. For a list of interesting data sets out there in the wild, see the MRAN Data Sources page.

[Update: added the dataRetrieval package, at the suggestion of Laura DeCicco.]

Editor’s note: This is Joe’s last post to Revolutions as a member of the Microsoft team: he is heading on for further adventures in the world of R. We want to thank Joe for his many contributions to the blog over the past 6 years, and please join us in wishing him well!
