The RCC supports data-intensive research at the University of Chicago by providing centrally managed storage for hosting large research data sets. By making commonly used research data sets accessible through a centralized storage system, the RCC gives researchers the data they need without the overhead of storing and managing a repository themselves. Because the RCC stores these data sets in the same high-performance storage environment used by the Midway compute cluster, data and computational tools are tightly coupled, which allows for efficient analysis.

The RCC can host both open-access and proprietary data sets on a case-by-case basis. Currently, the RCC hosts the following data sets:

  • The Community Earth System Model (CESM) datasets and models

  • Over 150,000 protein structures from the Protein Data Bank (PDB)

  • The Nielsen Scanner and Panel full data sets

  • The IRI Marketing dataset

  • Consumer-level data linked to LPS McDash loan-level data for improved risk management (CRISM-McDash)

  • CoreLogic Loan-Level Market Analytics data

  • All Natural Language Toolkit (NLTK) corpora, models, and training sets

  • The full Corpus of Contemporary American English (COCA) and Corpus of Historical American English (COHA) datasets

To request hosting of an additional dataset, or for information on how to access these resources, please contact the RCC.