Where's Waldo? on the Gigapixel Scale

March 6, 2013

Belshazzar's Feast, by Rembrandt Harmenszoon van Rijn.

by Rob Mitchum, Computation Institute

They say a picture is worth a thousand words. But if your camera is good enough, the photos it takes could also be worth billions of data points. As digital cameras grew increasingly popular over the last two decades, their image resolution also increased exponentially. The highest-end cameras today can claim 50-gigapixel resolution, meaning they can capture images made up of 50 billion pixels. Many of these cameras are so advanced that they have outpaced the resolution of the displays used to view their images, as well as the ability of humans to find meaningful information within their borders.

Closing this gap was the focus of Amitabh Varshney's late-February talk in the Research Computing Center's Show and Tell: Visualizing the Life of the Mind series. Varshney, a professor of computer science at the University of Maryland, College Park, discussed the visual side of today's big data challenges and the solutions scientists are developing to extract maximum value from the new wave of ultra-detailed images: a kind of next-level Where's Waldo? search. The methods he discussed combine classic psychology about how vision and attention work in humans with advanced computational techniques.

As the centerpiece of the talk, Varshney displayed a 5-gigapixel photo of Mt. Whitney in California. If you knew what to look for, the amount of detail was incredible: Varshney could zoom in thousands of times on a given region of the photograph to show a group of hikers or a bear walking up the side of a mountain. But when you don't already know what interesting information such a complex image contains, the search can be tedious and frustrating as you zoom in and laboriously check every individual pixel.

To figure out how to direct the attention of a person or a computer to the interesting parts of a massive, detailed image, Varshney turned to the field of psychology. Since the 1960s, experiments have used eye-tracking technology to study how people look at images: which details draw them first, and where they linger longest. These experiments have consistently found that humans are drawn to contrast, whether in color, luminance, texture or other visual qualities. In essence, science confirmed the perceptual psychology behind tricks that artists have used for centuries to direct the viewer's attention in paintings such as Rembrandt's Belshazzar's Feast.

So Varshney and his team applied that principle to the analysis of mega-huge images, developing a multistep process to look for contrasting or anomalous regions. First, the image is analyzed for areas of contrast using multiscale aggregation, which scans for anomalies at several different zoom levels within the image. But on a 1.3-gigapixel image, this method still flags tens of thousands of areas of interest, Varshney said; a human observer would need 5 hours to review them even at only one second per hit.
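The article doesn't describe the team's multiscale aggregation in detail, so what follows is only a minimal sketch of the general idea under simplifying assumptions: a small grayscale image stored as rows of floats in [0, 1], contrast measured as each pixel's distance from the mean of a square window around it, and that contrast averaged over a few window radii before thresholding. The function names, the radii (1, 2, 4) and the threshold are hypothetical choices for illustration, not Varshney's.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: a grayscale image as rows of floats in [0, 1].
Image = List[List[float]]


def integral_image(img: Image) -> List[List[float]]:
    """Summed-area table with one row/column of zero padding, so any box sum is O(1)."""
    h, w = len(img), len(img[0])
    sat = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def box_mean(sat: List[List[float]], y: int, x: int, r: int, h: int, w: int) -> float:
    """Mean of the (2r+1) x (2r+1) window centred on (y, x), clipped at the image edges."""
    y0, y1 = max(0, y - r), min(h, y + r + 1)
    x0, x1 = max(0, x - r), min(w, x + r + 1)
    total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
    return total / ((y1 - y0) * (x1 - x0))


def multiscale_contrast(img: Image, radii: Sequence[int] = (1, 2, 4)) -> Image:
    """Average, over several window radii, of |pixel - local window mean| at each pixel."""
    h, w = len(img), len(img[0])
    sat = integral_image(img)
    score = [[0.0] * w for _ in range(h)]
    for r in radii:
        for y in range(h):
            for x in range(w):
                score[y][x] += abs(img[y][x] - box_mean(sat, y, x, r, h, w)) / len(radii)
    return score


def candidate_regions(img: Image, threshold: float,
                      radii: Sequence[int] = (1, 2, 4)) -> List[Tuple[int, int]]:
    """(row, col) of every pixel whose aggregated contrast exceeds the threshold."""
    score = multiscale_contrast(img, radii)
    return [(y, x) for y, row in enumerate(score)
            for x, s in enumerate(row) if s > threshold]


img = [[0.0] * 9 for _ in range(9)]
img[4][4] = 1.0  # one bright "hiker" on a flat background
print(candidate_regions(img, threshold=0.5))  # [(4, 4)]
```

A real gigapixel image would be processed tile by tile with an array library rather than in pure Python. The review-time figure in the paragraph above is simple arithmetic: at one second per hit, 5 hours is 5 × 3,600 = 18,000 seconds, or roughly 18,000 hits.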

To reduce this workload, the next step rates those anomalous regions for their uniqueness within the photograph as a whole, ruling out hits that repeat many times across the image. Varshney and his team developed an algorithm for "nearest neighbors anomaly detection" to find places of true rarity in the image. Once again, the theory behind their approach mirrors the best practices of artists, this time in photography.
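One common way to score rarity with nearest neighbors, sketched here as an assumption rather than as the team's published algorithm, is to describe each candidate region with a feature vector and rank it by its distance to its k-th nearest neighbor among the other candidates. Regions that repeat across the image have close neighbors and score low; one-of-a-kind regions score high. The two-number feature vectors, the function names and the value of k below are hypothetical.

```python
import math
from typing import List, Sequence


def knn_rarity(features: Sequence[Sequence[float]], k: int = 2) -> List[float]:
    """Score each feature vector by the distance to its k-th nearest *other* vector.

    Patterns that recur across the image sit close to many neighbours and score low;
    one-of-a-kind patterns sit far from everything else and score high.
    """
    n = len(features)
    if not 1 <= k < n:
        raise ValueError("k must be at least 1 and smaller than the number of vectors")
    scores = []
    for i in range(n):
        # Brute-force O(n^2) search; large inputs would need a spatial index.
        dists = sorted(math.dist(features[i], features[j]) for j in range(n) if j != i)
        scores.append(dists[k - 1])
    return scores


def rarest(features: Sequence[Sequence[float]], k: int = 2, top: int = 1) -> List[int]:
    """Indices of the `top` highest-scoring (rarest) candidates, rarest first."""
    scores = knn_rarity(features, k)
    return sorted(range(len(features)), key=lambda i: scores[i], reverse=True)[:top]


# Hypothetical features per candidate region, e.g. (mean luminance, texture energy):
# four near-identical boulders and one bear.
candidates = [(0.20, 0.10), (0.21, 0.11), (0.19, 0.09), (0.20, 0.12), (0.90, 0.80)]
print(rarest(candidates, k=2, top=1))  # [4]
```

Using the k-th neighbor rather than only the nearest one means that with k = 2, a pair of near-duplicate oddities still scores high, because each one's second-nearest neighbor is far away; raising k sets how many repeats it takes before something "stops being interesting."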

"The idea is if something occurs a lot, it stops being interesting," Varshney said. "In fact if you study photography, that's what they will tell you: shoot things with a different perspective. You want to do things differently from what you expect normally."

Combined, the two methods detected as many as 90% of the "interesting" objects in the 5-gigapixel image of Mt. Whitney, after about an hour of processing time plus a final step in which a human observer reviews the results. Similar methods could also work in three dimensions, Varshney said, helping radiologists spot areas of interest (such as forgotten surgical equipment) in an MRI scan. The group is also interested in nudging observers toward important details by automatically raising the salience of potentially interesting areas in an image, perhaps by enhancing their contrast through color or luminance rather than adding annoying arrows or blinking pixels.
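The article only says the group is exploring color or luminance changes in place of arrows. As one hypothetical illustration of that idea, the sketch below stretches luminance away from a region's mean by a gain factor and leaves the rest of the image untouched. The box format, the default gain of 1.5 and the [0, 1] value range are assumptions, not details from the talk.

```python
from typing import List, Tuple

# Hypothetical representation: a grayscale image as rows of floats in [0, 1].
Image = List[List[float]]


def boost_region_contrast(img: Image, box: Tuple[int, int, int, int],
                          gain: float = 1.5) -> Image:
    """Return a copy of img with contrast raised inside box; the input is not modified.

    box is (top, left, bottom, right) with bottom/right exclusive. Each pixel in the
    box is pushed away from the box's mean luminance by `gain`, then clamped to [0, 1].
    Pixels outside the box are copied unchanged.
    """
    top, left, bottom, right = box
    region = [img[y][x] for y in range(top, bottom) for x in range(left, right)]
    if not region:
        raise ValueError("box must contain at least one pixel")
    mean = sum(region) / len(region)
    out = [row[:] for row in img]
    for y in range(top, bottom):
        for x in range(left, right):
            out[y][x] = min(1.0, max(0.0, mean + gain * (img[y][x] - mean)))
    return out


img = [[0.40, 0.60, 0.50],
       [0.45, 0.55, 0.50]]
highlighted = boost_region_contrast(img, box=(0, 0, 2, 2), gain=2.0)
print(highlighted)
```

Because only the flagged region's luminance changes, the cue stays inside the image itself; a gain below 1 would instead flatten a region, which is one way the surrounding scene could be de-emphasized.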

But while technological advances make it possible to capture and analyze these images and to improve their salience, it's important to recognize and capitalize on the natural abilities of the human brain.

"In some ways, this is really taking us in a new direction where visualization is going to play a very important role in helping us understand trends, patterns and anomalies in large amounts of data," Varshney said. "But to do that, we need to leverage our human perception and cognitive skills, and develop these methods in close collaboration with domain scientists."

Read the original article at https://www.ci.uchicago.edu/blog/wheres-waldo-gigapixel-scale
