September 8, 2016

Student presenters, from left to right: Neal Jochmann, Rudyard Richter, Baixue Yao, Alex Mueller.

by Benjamin Recchie

The Research Computing Center’s staff came together for the second annual installment of its late-summer tradition: Demo Day, in which RCC’s student workers present their projects publicly. It’s a chance for the staff to better understand the many research projects RCC supports, and for the students to practice public speaking.

First up was rising third-year Alex Mueller, who had been working on several projects. One was a map of genetic variation for the research group of computational geneticist John Novembre. The interactive map showed the worldwide distribution of certain alleles, as well as where the corresponding variants lie on each chromosome. Alex had implemented a PDF generator for the site, which made it easier to use a visualization in a paper or a conference presentation.

Alex had also worked on two projects that manipulated texts. One, the Visual Text Explorer, took the text of a document (such as a novel) and showed the frequencies of the most common words within it. It could also show where words appeared close to each other, allowing scholars to quickly see which words and phrases were thematically linked. (Alex had helped extend Visual Text Explorer to handle character-based languages such as Chinese.) The other, a more experimental project for Haun Saussy, was a script that replaced characters in a text (in this case, The Iliad) at random. This is a step toward determining how many wrong characters a text can contain before it becomes unreadable, explained Jeff Tharsen, computational scientist for the humanities at RCC.
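
The article does not describe how Visual Text Explorer or the character-replacement script is implemented. The following is only a minimal sketch of the two ideas, assuming simple English-style tokenization (lowercase runs of letters and apostrophes). A tool that handles Chinese would need to split text differently, for example by treating each character as a unit. The function names top_words, cooccurrences, and corrupt are hypothetical.

```python
import random
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase words (an English-only assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def top_words(text, n=10):
    """Return the n most frequent words with their counts."""
    return Counter(tokenize(text)).most_common(n)


def cooccurrences(text, target, window=5):
    """Count the words appearing within `window` words of each `target`."""
    words = tokenize(text)
    counts = Counter()
    for i, word in enumerate(words):
        if word == target:
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[words[j]] += 1
    return counts


def corrupt(text, fraction, alphabet="abcdefghijklmnopqrstuvwxyz", seed=None):
    """Replace a given fraction of the letters in text with different random letters."""
    rng = random.Random(seed)
    chars = list(text)
    letter_positions = [i for i, c in enumerate(chars) if c.isalpha()]
    k = round(fraction * len(letter_positions))
    for i in rng.sample(letter_positions, k):
        # Pick a replacement that differs from the original letter.
        choices = [a for a in alphabet if a != chars[i].lower()]
        chars[i] = rng.choice(choices)
    return "".join(chars)
```

For example, top_words(text, 20) lists the 20 most frequent words in a passage. Calling corrupt on the same passage with increasing fractions produces progressively noisier versions that could be used in readability tests.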

Rudyard Richter, a rising fourth-year, was next with two projects. The first was a Bayesian optimization problem for Anastasia Zakolyukina, assistant professor of accounting and Neubauer Family faculty fellow at Chicago Booth. In Bayesian optimization, an algorithm builds a statistical model of an unknown, expensive-to-evaluate function from a few sampled points. It then uses that model to choose where to sample next in its search for the function’s minimum. Rudyard had worked to develop an acquisition function. This is the component of the algorithm that makes that choice, weighing points the model predicts to be low against points where the model is most uncertain.
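
The article does not say which acquisition function Rudyard developed. As an illustration only, here is expected improvement (EI), a widely used acquisition function for minimization. For a candidate point, the surrogate model predicts a mean μ and a standard deviation σ. With f* the best value observed so far and ξ a small exploration margin, EI = (f* − μ − ξ)Φ(z) + σφ(z), where z = (f* − μ − ξ)/σ and Φ and φ are the standard normal CDF and PDF. The sketch assumes a surrogate model (commonly a Gaussian process) has already produced μ and σ for each candidate. The names expected_improvement and next_point are hypothetical.

```python
import math


def normal_pdf(z):
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def expected_improvement(mu, sigma, f_best, xi=0.0):
    """Expected amount by which a point improves on f_best (minimization)."""
    improvement = f_best - mu - xi
    if sigma <= 0:
        # No uncertainty: the improvement is known exactly.
        return max(improvement, 0.0)
    z = improvement / sigma
    return improvement * normal_cdf(z) + sigma * normal_pdf(z)


def next_point(candidates, f_best, xi=0.0):
    """Pick the x with the highest EI from (x, mu, sigma) tuples."""
    best = max(candidates, key=lambda c: expected_improvement(c[1], c[2], f_best, xi))
    return best[0]


# Hypothetical surrogate predictions at three candidate points.
candidates = [(0.1, 1.0, 0.05), (0.5, 1.2, 0.8), (0.9, 2.0, 0.1)]
print(next_point(candidates, f_best=1.1))  # → 0.5
```

In this example the point at 0.5 is chosen even though its predicted mean is worse than the point at 0.1. Its large uncertainty gives it a higher expected improvement, which shows how the acquisition function trades off exploitation against exploration.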

He had also been translating code written in SAS to Python. SAS is a proprietary language for statistics, and an awkward and poorly documented one. Equivalent code in Python would make it easier for users of that more widely used language to write statistics programs. Translating SAS was “satisfying,” he reported. Although general-purpose Python would presumably run slower than a specialized language like SAS, he was looking forward to running performance comparisons.

Rising fourth-year Neal Jochmann was next. Neal had also worked on a number of projects, including helping to install the MyTardis software on RCC’s Midway cluster, which will improve RCC’s ability to handle the data flowing from data-intensive laboratory experiments. But Neal had also taken on a higher-profile task: a walkthrough and documentation for the Interactive Visualization Workbench (IVW), RCC’s interactive 3D display facility. Because there are only two facilities like it in the world (the other is at the University of Minnesota), thorough documentation of this experimental technology is especially important. Neal is also working on optimizing the integration of the IVW with Midway so that users can better interact with their datasets stored on the cluster.

The last presenter was Baixue Yao, a third-year graduate student in the Department of Cellular Biology. Baixue was studying whether RCC could cut its electricity costs by rescheduling power-intensive jobs to times of day (such as late at night) when electricity rates are lower. (It is the next step after the studies of Midway power consumption presented by student worker Will McFadden last year.) The Slurm scheduler currently orders jobs by a priority score, which takes into account the size of each job and its estimated run time but not the power needed to run it. That information could be included, but only with some way to identify jobs with high power consumption. Baixue is working on building such an estimator. If RCC can save money while still providing the services researchers need, then why not?
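
To illustrate the idea, here is a toy priority score that adds a power term. It is not Slurm’s code or configuration, and it is not Baixue’s estimator. Slurm’s multifactor priority plugin does compute a weighted sum of normalized factors, but every weight, limit, and the peak-rate window below is a hypothetical value chosen for illustration. The sketch also assumes that a power estimate per node is already available for each job and that shorter jobs receive a higher run-time factor.

```python
from dataclasses import dataclass

# Hypothetical illustration values, not Slurm defaults or RCC settings.
WEIGHT_SIZE = 1000
WEIGHT_RUNTIME = 1000
WEIGHT_POWER = 2000
PEAK_START, PEAK_END = 8, 20  # hypothetical high-rate hours, 24-hour clock


@dataclass
class Job:
    name: str
    nodes: int
    est_hours: float
    est_watts_per_node: float  # assumed output of a power estimator


def normalized(value, maximum):
    """Scale value into [0, 1] relative to maximum."""
    return min(value / maximum, 1.0) if maximum > 0 else 0.0


def priority(job, hour, max_nodes=100, max_hours=48.0, max_watts=400.0):
    """Weighted-sum priority with a power penalty during peak hours."""
    size_factor = normalized(job.nodes, max_nodes)
    # Assumption: shorter estimated run times score higher.
    runtime_factor = 1.0 - normalized(job.est_hours, max_hours)
    score = WEIGHT_SIZE * size_factor + WEIGHT_RUNTIME * runtime_factor
    if PEAK_START <= hour < PEAK_END:
        # When electricity is expensive, push power-hungry jobs down the queue.
        power_factor = normalized(job.est_watts_per_node, max_watts)
        score -= WEIGHT_POWER * power_factor
    return score


def queue_order(jobs, hour):
    """Job names from highest to lowest priority at the given hour."""
    ranked = sorted(jobs, key=lambda j: priority(j, hour), reverse=True)
    return [j.name for j in ranked]


hot = Job("hot", nodes=50, est_hours=24, est_watts_per_node=400)
cool = Job("cool", nodes=50, est_hours=24, est_watts_per_node=100)
print(queue_order([hot, cool], hour=14))  # → ['cool', 'hot']
print(queue_order([hot, cool], hour=2))   # → ['hot', 'cool'] (tie, input order kept)
```

The two jobs are identical except for their estimated power draw. At night they tie, but during the hypothetical peak window the power-hungry job drops below the other. The practical difficulty the article points to is producing a trustworthy est_watts_per_node value in the first place.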