by Benjamin Recchie

People have been trying to guess the direction stock prices will go since, well, the dawn of stocks. Fortunes have been made and lost by investors who think they understand better than anyone else when others will buy and sell. But the recent rise of social media, in theory, allows you to pool the opinions of tens of thousands of investors simultaneously, a technique being used by a group of researchers affiliated with the University of Chicago Graham School.

The research, conducted by master’s degree candidates Benedict Augustine and Baodan Zhang, along with faculty members Yuri Balsanov and Sema Barlas, was born from an exploration of natural language processing: the development of algorithms that can understand language in the way humans use it. Natural language processing is commonly researched by linguists, but the collaborators wanted to see if it could be applied outside the academy. “Given that almost 80% of today's data is unstructured, the popularity of social media data and the challenge of deriving useful information out of it made an interesting/relevant problem to be solved,” Augustine says. Needless to say, successfully predicting the direction of individual stocks or the market as a whole would be of great use to the financial industry.

The team started with a sample of social media users and their commentary about the market. (They used data from SeekingAlpha.com, a business-focused website; the collaborators looked into getting their data from Twitter, but found it was too expensive.) Next, they tried to limit the number of calculations necessary by filtering this data, first by topic of interest and then by individual stock symbols. They also found that instead of processing an entire article, they could process only the summary. One issue that cropped up was that the sample users sometimes wrote ungrammatical (but still intelligible) statements. For example, “it was common to state ‘Company X – a good buy’ using a dash or a colon instead of the verb ‘is,’” says Augustine. The researchers had to preprocess their data to check for such patterns and correct for them.
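To make that preprocessing step concrete, here is a minimal Python sketch of what such a cleanup might look like. It is not the researchers' code: the function names, the regular expression, and the ticker-matching rule are all assumptions made for illustration, and a real pipeline would need far more careful handling of edge cases.

```python
import re

# Assumed pattern (hypothetical, not from the study): a subject, then a dash
# or colon followed by whitespace, then the rest of the statement. The
# whitespace requirement keeps hyphenated words and times like 10:30 intact.
_COPULA_GAP = re.compile(
    r"^\s*(?P<subject>[^\-\u2013\u2014:]+?)\s*[\-\u2013\u2014:]\s+(?P<rest>\S.*)$"
)


def normalize_copula(sentence: str) -> str:
    """Rewrite 'Company X - a good buy' as 'Company X is a good buy'."""
    match = _COPULA_GAP.match(sentence)
    if match is None:
        return sentence  # no dash/colon shorthand found; leave unchanged
    return f"{match.group('subject')} is {match.group('rest')}"


def mentions_symbol(text: str, symbol: str) -> bool:
    """True if the ticker appears as a standalone token (not inside a word)."""
    pattern = rf"(?<![A-Za-z]){re.escape(symbol)}(?![A-Za-z])"
    return re.search(pattern, text) is not None


def preprocess(summaries, symbol):
    """Keep only summaries that mention the symbol, then normalize each one."""
    return [normalize_copula(s) for s in summaries if mentions_symbol(s, symbol)]


if __name__ == "__main__":
    # Invented sample summaries, for illustration only.
    sample = [
        "AAPL \u2013 a good buy",
        "MSFT: overvalued",
        "AAPL is rallying",
    ]
    print(preprocess(sample, "AAPL"))
```

The sketch filters first and normalizes second, mirroring the article's point that narrowing the data early cuts down on the work done later.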

The project required a system for collecting data in almost real time, all the time, says Augustine. His team turned to the Research Computing Center (and consultant Robin Weiss) to help them set up the infrastructure required to collect the data, then process and analyze it.

After collecting data between March and August of 2015, the researchers tried to match their predictions with actual movements of the markets. Their model wasn’t able to predict the direction of the S&P 500 stock index in the short term, but it was able to predict the movement of the Russell 2000 index for the next day with 80% accuracy. Interestingly, they also found that positive and negative sentiment didn’t affect the Russell 2000 index in the same way: the markets were much faster to process negative information than positive.
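The article does not say how that accuracy figure was calculated. One common way to score a next-day directional forecast is to compare the sign of each day's signal with the sign of the index's move to the following close. The sketch below assumes that definition; the function name and the toy numbers are invented for illustration and are not the study's data.

```python
def next_day_directional_accuracy(signals, closes):
    """Fraction of days on which the sign of the day-t signal matches the
    sign of the close-to-close change from day t to day t+1.

    signals: per-day sentiment scores (positive = bullish, negative = bearish)
    closes:  per-day index closing levels, same length as signals
    Days with a zero signal or an unchanged close are skipped.
    """
    if len(signals) != len(closes):
        raise ValueError("signals and closes must have the same length")

    hits = 0
    scored = 0
    for t in range(len(closes) - 1):
        change = closes[t + 1] - closes[t]
        if signals[t] == 0 or change == 0:
            continue  # no direction to compare
        scored += 1
        if (signals[t] > 0) == (change > 0):
            hits += 1

    if scored == 0:
        raise ValueError("no scorable days")
    return hits / scored


if __name__ == "__main__":
    # Toy values only: four scorable days, three correct calls.
    toy_signals = [0.4, -0.2, -0.1, -0.5, 0.3]
    toy_closes = [100.0, 101.0, 100.0, 102.0, 101.0]
    print(next_day_directional_accuracy(toy_signals, toy_closes))
```

Note that the last day's signal is never scored, because there is no following close to compare it against.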

The team’s model is not the first to attempt to predict the markets with social media, but it does show strong predictive power, even in its early stages. There is much more work left to do, the researchers say: testing the model against historic data, adjusting the model for seasonality, and applying more advanced machine learning techniques to parse the users’ statements, to name just a few. (Augustine says they’re still collecting data, too, with the goal of providing more data to future students.) But even in its early form, the work has drawn notice: a poster presenting their results won the Attendees’ Choice prize at the Mind Bytes 2015 research computing expo, held on campus in October.

Augustine has a word of caution for any would-be investors hoping to beat the market with a simple program: “This model should not be used as the sole factor to guide investment decisions. However, it can definitely be used as a factor, in conjunction with other traditional factors.” It seems that even with a supercomputer behind you, there’s no such thing as easy money.