RCC supports data-intensive research at the University of Chicago by providing centrally managed storage for hosting large research data sets. By making commonly used research data sets accessible through a centralized storage system, RCC gives researchers the data they need without the overhead of storing and managing a repository themselves. Because RCC stores these data sets in the same high-performance storage environment used by the Midway compute cluster, the data and the computational tools are tightly coupled, which allows for efficient analysis.

RCC can host both open-access and proprietary data sets; hosting requests are considered on a case-by-case basis. Currently, RCC hosts the following data sets:

  • The Nielsen Scanner and Panel full data sets
  • The IRI Marketing Data Set
  • Python’s Natural Language Toolkit (NLTK) data corpus
  • Over 90,000 protein structures from the Protein Data Bank (PDB)
  • The Community Earth System Model (CESM) input data set and models
  • The full Corpus of Contemporary American English (COCA) and Corpus of Historical American English (COHA) data sets

To request hosting of an additional data set, or for information on how to access the data sets listed above, please contact RCC.