July 9, 2013

by Deborah Foote

While many consider summer prime time for relaxation, graduate and post-doctoral students, along with faculty, recently participated in the Data Intensive Summer School run by the Virtual School of Computational Science and Engineering (VSCSE). Hosted by the University of Chicago's Research Computing Center (RCC) and coordinated with Northwestern University's Academic and Research Technologies (A&RT) department, this three-day conference (July 8-10) focused on teaching researchers techniques to organize, synthesize, and analyze large data sets. The majority of participants came from the University of Chicago and Northwestern University, but representatives from Massachusetts General Hospital and the University of Nebraska also attended. Each day included three to four individual sessions that ran approximately one to two hours each and addressed a variety of topics, from basic data computing to more advanced text mining and complex statistical data analysis.

Summer school organizers incorporated common tools and software into the various sessions to help students use predictive analytic algorithms, data management techniques, and non-relational database models. The VSCSE receives partial funding from the National Science Foundation and the Great Lakes Consortium for Petascale Computation to organize conferences that "help graduate students, post-docs and young professionals from all disciplines and institutions across the country gain the skills they need to use advanced computational resources to advance their research." While students and faculty from departments more traditionally associated with data analysis, such as computer science and engineering, represented the majority of participants, members of other - perhaps less obvious - departments such as geophysical science, linguistics, and obstetrics/gynecology also registered to learn how to apply computer science to their disciplines.

The RCC and NW-A&RT co-hosted the event in the Kathleen A. Zar room in the John Crerar Library on the University of Chicago campus in Hyde Park. Demand for registration far outstripped the event cap of 41, which was set by the limitations of available facilities. The high level of interest shows both the necessity of and the demand for this type of instruction.

"The falling price of storage, low cost of new digital sensors, increase in processing power, and rapid growth of new internet services have led to an explosion of data available to researchers across many domains," Dr. Nicholas Labello, Scientific Computing Consultant at RCC and host of the Chicagoland site, explains. "The interest in the Data Intensive Summer School is a direct result of these trends. A significant portion of the cutting edge research in the physical, biological, and social sciences happening today is possible because of these new data sources, storage capacities, and processing capabilities. The Virtual School's course was targeted at the practical elements of working with data - an aspect often not found in classrooms."

The three-day Data Intensive Summer School took advantage of videoconferencing technology to create an interconnected, cross-continental classroom experience. Each presenter was broadcast to 15 sites, from San Diego to New Jersey, and each site broadcast its own audio and video to the presenter and other remote participants - effectively creating a virtual classroom in which all remote participants (teachers and students) could see and hear one another. The technology gave participants the opportunity to ask questions and receive responses in real time. The professional, cultural, and geographical diversity of the audience created a wide network in which participants could meet colleagues with shared interests and further their research through collaboration.

The Chicago-based participants of the conference gave resoundingly positive feedback across disciplines. Logan Ward, a materials science and engineering graduate student from Northwestern University, commented that the "R [program for statistical computing] talks were the most helpful because they taught a skill most useful to my research." Similarly, Maria Kamenetsky of the Accounting Research Center at the University of Chicago Booth School of Business said that she plans to share her newly acquired knowledge about big data storage and text mining with "the [Booth] accounting faculty so they can use it in their research." Kayoko Shimmyo, a chemistry graduate student at the University of Chicago, commented that the school provided her a "broadened toolset for approaching what would otherwise be stretches into other fields, either for academic science or personal work."

Summer may not seem an ideal time to spend indoors in front of a computer screen learning about data computing, but the VSCSE Data Intensive Summer School created an optimal environment where students, staff, and faculty at the University of Chicago and across the United States can acquire the modern skills and tools needed to help distinguish themselves and their professional endeavors throughout the year.