June 16, 2015

Jeffrey Heer. Courtesy Research Computing Center.

By Benjamin Recchie

For Jeffrey Heer, associate professor of computer science and engineering at the University of Washington, visualization isn’t just an academic interest: he is also a founder of Trifacta, a software firm that produces sophisticated data visualization tools. Heer delivered the academic year’s final talk in the Research Computing Center’s Visualization Speaker Series, discussing new ways to let researchers interact with data, “either to ready it for data analysis or to gain new insights,” as he put it.

Heer started off with a node-link diagram showing the connections among people in his own Facebook network. But when he displayed the same connection data as a matrix diagram, it became apparent that an enormous chunk was missing. Facebook, it turns out, caps how much data it will return for a single query, so the data for numerous connections was incomplete. The lesson: viewing data through different visualization tools can reveal problems with its collection or analysis.
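The effect Heer described can be sketched in a few lines. The data below is hypothetical (not Heer's actual Facebook network): in an adjacency-matrix view, people whose connections were never downloaded show up as entirely empty rows, a gap that is easy to miss in a node-link drawing.

```python
# Sketch: a matrix view makes truncated network data visible as empty rows.
# The edge list and node count here are invented for illustration.

def adjacency_matrix(n, edges):
    """Build an n x n 0/1 adjacency matrix from an undirected edge list."""
    m = [[0] * n for _ in range(n)]
    for a, b in edges:
        m[a][b] = m[b][a] = 1
    return m

# Suppose the API quietly stopped returning edges for nodes 4 and 5.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (1, 3)]
matrix = adjacency_matrix(6, edges)

# In matrix form, the problem is obvious: entire rows are empty.
empty_rows = [i for i, row in enumerate(matrix) if not any(row)]
```

Here `empty_rows` comes back as `[4, 5]`, flagging exactly the nodes whose data the query never delivered.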

According to interviews Heer cited, 50% or more of the time spent working with data goes to cleaning it up rather than analyzing it. He and his collaborators developed a tool called Data Wrangler, designed to reduce that time. Data Wrangler observes how the user is cleaning up the data and suggests ways to speed the work up; for example, if the user deletes an empty row, the program might offer to delete all empty rows. Likewise, a predictive interaction tool lets users select one part of a visualization and follow that subset across other visualizations to see how it relates to the rest of the data. (It was Data Wrangler that led to the creation of Trifacta, he noted.)
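The predictive idea behind Data Wrangler can be illustrated with a toy sketch. This is not Trifacta's actual code; the function names and the empty-row rule are assumptions made for the example, showing how a single user edit might be generalized into a suggested bulk transform.

```python
# Toy sketch of Wrangler-style predictive interaction (hypothetical code):
# generalize one observed edit (deleting an empty row) into a bulk suggestion.

def is_empty(row):
    """A row is empty if every cell is blank after stripping whitespace."""
    return all(str(cell).strip() == "" for cell in row)

def suggest_transform(table, deleted_index):
    """The user deleted row `deleted_index`; propose a generalized rule.

    If the deleted row was empty, offer to delete *all* empty rows;
    otherwise fall back to deleting just that one row.
    """
    if is_empty(table[deleted_index]):
        return ("delete_all_empty_rows",
                [i for i, row in enumerate(table) if is_empty(row)])
    return ("delete_row", [deleted_index])

def apply_suggestion(table, suggestion):
    """Apply a suggested transform, returning a new table."""
    _, indices = suggestion
    drop = set(indices)
    return [row for i, row in enumerate(table) if i not in drop]

table = [
    ["name", "score"],
    ["", ""],          # empty row
    ["alice", "9"],
    ["", ""],          # another empty row
    ["bob", "7"],
]

# The user deletes row 1 (an empty row); the tool generalizes the edit.
suggestion = suggest_transform(table, 1)
cleaned = apply_suggestion(table, suggestion)
```

In this run the tool notices the deleted row was empty and suggests removing rows 1 and 3 in one step, leaving only the three non-empty rows.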

Next up was d3.js, a JavaScript library, developed by one of Heer’s former PhD students, for manipulating documents based on data. Users can create numerous kinds of visualizations, including scatterplots, cartographic maps, and parallel coordinates plots. Heer said that while the library was designed for serious data analysis, it has shown up in such contexts as the New York Times and, remarkably, the MTV Europe Music Awards, where the show’s producers used d3.js to display a live graphic of which artists were generating Twitter traffic at that moment.

Heer wrapped up his talk by discussing what makes a visualization good. According to perception research, the visualizations of quantitative values that test subjects interpreted most accurately used position: comparing values by the endpoints of bars or points measured against a common baseline. In order of decreasing effectiveness came the length of stacked bars, followed by the slope of a line, and finally the area of an object. Color, it turns out, is the least accurate way to communicate quantitative values. Heer showed a study by researchers at Harvard University in which cardiologists at Massachusetts General Hospital were asked to diagnose heart disease from visualizations rendered with either a rainbow color map or one using only shades of red and gray. The doctors correctly interpreted the red-and-gray maps far more often than the full-color ones, by about 30 percentage points.

“People might say most of these visualizations are more or less the same,” Heer said. “I think you might have a different opinion if you’re handing this to your doctor...[Visualizations] are interesting from an intellectual point of view, but they can have serious real-world consequences.”

See Jeffrey Heer's slides here.