This is a reblog from R Packages for Data Access by Joseph Rickert

Data Science is all about getting access to interesting data, and it is really nice when some kind soul not only points out an interesting data set but also makes it easy for you to access it. Below is a list of 17 R packages that appeared on CRAN between May 1st and August 8th that, in one way or another, provide access to publicly available data.

bigQueryR: Provides an interface to Google’s BigQuery. The vignette shows how to use it.

blscrapeR: Provides an API wrapper for Bureau of Labor Statistics data sets. There is a vignette showing how to access inflation and price data, one for accessing Wages and Benefits data, and one for mapping BLS data.
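
As a rough sketch of what a call through the wrapper looks like, the snippet below pulls a single BLS series. It assumes blscrapeR is installed and the BLS API is reachable. The series ID and the year range are illustrative choices, not taken from the vignettes, and the function arguments reflect the package as I understand it, so check them against your installed version.

```r
# Sketch only: assumes install.packages("blscrapeR") has been run
# and that the public BLS API is reachable (no registration key used here).
library(blscrapeR)

# LNS14000000 is the BLS series ID for the seasonally adjusted
# U.S. unemployment rate; the year range is an arbitrary example.
unemp <- bls_api("LNS14000000", startyear = 2014, endyear = 2016)

# Inspect the first few rows of the returned data frame.
head(unemp)
```

Unregistered requests to the BLS API are rate limited, so heavier use would likely need a registration key.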


cdlTools: Provides functions to download USDA National Agricultural Statistics Service (NASS) cropscape data for a specified state.

dataone: The dataone R package enables R scripts to search, download and upload science data and metadata from/to the DataONE Federation. The website describes DataOne as “a community driven project providing access to data across multiple member repositories, supporting enhanced search and discovery of Earth and environmental data”. The package comes with several vignettes including this overview.

dataRetrieval: Package to retrieve USGS and EPA hydrologic and water quality data, officially supported by USGS. The vignette gives several examples of downloading interesting data sets.
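
For a flavor of the workflow, here is a minimal sketch that downloads daily streamflow for one gauge. The site number, parameter code, and date range are example values chosen for illustration. The function names are those of the package's NWIS interface at the time of writing, and newer releases may have changed or replaced them.

```r
# Sketch only: assumes install.packages("dataRetrieval") has been run
# and that the USGS web services are reachable.
library(dataRetrieval)

site  <- "01491000"  # USGS gauge: Choptank River near Greensboro, MD
pcode <- "00060"     # USGS parameter code for discharge (cubic feet per second)

# Daily mean values for the first quarter of 2016 (arbitrary example range).
flow <- readNWISdv(siteNumbers = site, parameterCd = pcode,
                   startDate = "2016-01-01", endDate = "2016-03-31")

# Replace coded column names (e.g. X_00060_00003) with readable ones (Flow).
flow <- renameNWISColumns(flow)
summary(flow$Flow)
```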

eechidna: Provides the data from the 2013 Australian Federal Election and tools to analyze it. There are several nicely done vignettes, including one on plotting polling stations that plots election results by polling place.


There are also vignettes on census and election data, shapefiles and mapping Australia’s Electorates.

GetHFData: Provides functions to download and aggregate high frequency trading data for Brazilian instruments directly from the Bovespa ftp site. There is a vignette to get you started. The following plot showing unemployment data by state comes from the vignette on Census data.

googleAnalyticsR: Provides an interface to the Google Analytics Reporting API. There is a vignette.

googleway: Provides functions to retrieve data from 6 Google Maps APIs. The vignette shows how.

gutenbergr: Search and download public domain works in the Project Gutenberg collection. The vignette shows you how to search and download public domain texts.
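
The search-then-download pattern can be sketched as follows. The package is published on CRAN as gutenbergr. The author filter and the book ID are example values picked for illustration, and the snippet assumes a Project Gutenberg mirror is reachable.

```r
# Sketch only: assumes install.packages("gutenbergr") has been run
# and that a Project Gutenberg mirror is reachable.
library(gutenbergr)

# Look up metadata for works by one author (filter syntax follows dplyr).
austen <- gutenberg_works(author == "Austen, Jane")
print(austen[, c("gutenberg_id", "title")])

# Download a text by its Project Gutenberg ID; 1342 is "Pride and Prejudice".
pride <- gutenberg_download(1342)
head(pride)
```

The download comes back as a data frame with one row per line of text, which makes it easy to pass on to text-mining tools.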

ie2miscdata: Contains a collection of USGS environmental and water resources data sets. There is a vignette showing how to create plots from the data. (See also: dataRetrieval.)

macleish: Provides functions to access data from the Ada & Archibald MacLeish field station in Whately, MA. The vignette shows how to obtain weather data.

muckrock: Contains public domain information on requests made by muckrock through the US Freedom of Information Act.

nasadata: Provides an interface to NASA’s Earth Imagery and Assets API and the Earth Observatory Natural Event Tracker (EONET).

oec: Provides an interface to the Observatory of Economic Complexity.

osi: Provides a connector to the Open Source Initiative API, which serves machine-readable data about open source software licenses.

pewdata: Provides for reproducible, programmatic retrieval of survey data sets from the Pew Research Center. The vignette shows how to set up and use the package. Look here for an interesting poll about what Americans know about science.

TCGAretriever: Provides an interface to data sets from The Cancer Genome Atlas (TCGA) via the Cancer Genomic Data Server web service.

For more packages that provide APIs to data sets, have a look at the CRAN Task View on Web Technologies and Services. For a list of interesting data sets out there in the wild, see the MRAN Data Sources page.

[Update: added the dataRetrieval package, at the suggestion of Laura DeCicco.]

Editor’s note: This is Joe’s last post to Revolutions as a member of the Microsoft team: he is heading on for further adventures in the world of R. We want to thank Joe for his many contributions to the blog over the past 6 years, and please join us in wishing him well!

