Searching for maliciousness using newspapers and Google Trends data

So, I thought I would write a quick blog post, as I have hit a block in my writing and thought this might help get me back into the mood. A couple of years ago now(!), I did some archival research on how certain malware are consumed and practiced in the media. I tied this to Google Trends data (search and news) to see whether there were any correlations with malware events (such as a release, a malware report, and so on), and whether anything interesting came out of it.

How I did it

As one might expect, there is a strong correlation between news and Google searches. I took articles published in any language using the search terms 'stuxnet', 'conficker', 'dridex', and 'cryptolocker'. The first three are case studies in my thesis, and I have since dropped Cryptolocker. I turned to Nexis (LexisNexis), a newspaper database, to search for these terms in articles globally. Because the terms are so distinctive, this also captured publications beyond English, though unfortunately only those indexed by Nexis. In particular, I restricted the search to the title, byline, and first paragraph, to avoid picking up stories that only mentioned the malware in passing as far as possible. This still required a substantial clean-up of newspaper editions, stories that did not make sense, mistakes in the database, and other issues. There is clearly a lot of noise in this data. I took some, but not all, precautions to keep it to a minimum, since this was not a statistical test but a more qualitative exercise to look for any form of 'spikes' in the data.
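As a rough illustration of the title/byline/first-paragraph restriction, here is a minimal Python sketch that applies the same rule to articles after export. In this study the restriction was applied in the Nexis search itself; the `Article` record and its field names are hypothetical (not the Nexis export format), and de-duplicating on title and date is only a crude stand-in for the manual clean-up of repeated editions.

```python
from dataclasses import dataclass


@dataclass
class Article:
    # Hypothetical record for one exported article; the field names are
    # illustrative assumptions, not the actual Nexis export format.
    title: str
    byline: str
    date: str               # ISO date such as "2010-09-24"
    paragraphs: list[str]   # body text, one string per paragraph


def mentions_in_lead(article: Article, term: str) -> bool:
    """True if `term` appears, ignoring case, in the title, the byline,
    or the first paragraph of the article."""
    term = term.lower()
    first = article.paragraphs[0] if article.paragraphs else ""
    return any(term in text.lower() for text in (article.title, article.byline, first))


def filter_articles(articles: list[Article], term: str) -> list[Article]:
    """Keep articles that mention `term` in their lead, skipping repeats
    with the same title and date (a crude proxy for duplicate editions)."""
    seen: set[tuple[str, str]] = set()
    kept: list[Article] = []
    for article in articles:
        key = (article.title.strip().lower(), article.date)
        if mentions_in_lead(article, term) and key not in seen:
            seen.add(key)
            kept.append(article)
    return kept
```

An article that only names the malware in its second paragraph is dropped, which is the point of the restriction: it trades some recall for fewer passing mentions.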

I used the Google Trends data freely available from its website for each malware. Frustratingly, however, this only comes as a relative index from 0 to 100 (0 = least, 100 = most) on the quantity of searches. So I had to scale each malware's Nexis newspaper article counts to 0-100 as well, so that each malware was comparable to a certain degree and so that I could make sense of the two different sources I was using. I also pulled the Trends data globally, so that it was as close a comparison to the Nexis data as possible. This produced some interesting results, a few of which I cover below.
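The rescaling step is simple arithmetic: Google Trends reports each point relative to the peak of the series, so dividing each article count by the maximum count and multiplying by 100 puts the Nexis data on the same 0-100 scale. A minimal Python sketch, assuming monthly buckets and ISO-formatted article dates (both my assumptions, not necessarily how the data was binned here); the dates below are made up for illustration.

```python
from collections import Counter


def monthly_counts(dates: list[str], months: list[str]) -> list[int]:
    """Count articles per month ("YYYY-MM") from ISO dates, returning 0 for
    months with no articles so the series lines up with the Trends months."""
    per_month = Counter(d[:7] for d in dates)
    return [per_month[m] for m in months]


def scale_0_100(counts: list[float]) -> list[int]:
    """Rescale a series so its peak becomes 100 and every other point is a
    whole-number percentage of that peak, like a single-term Trends series."""
    peak = max(counts, default=0)
    if peak == 0:
        return [0] * len(counts)  # nothing to scale against
    return [round(c * 100 / peak) for c in counts]


# Made-up article dates, for illustration only.
dates = ["2013-09-10", "2013-10-02", "2013-10-15",
         "2013-11-01", "2013-11-20", "2013-11-21", "2013-11-30"]
months = ["2013-09", "2013-10", "2013-11", "2013-12"]
counts = monthly_counts(dates, months)   # [1, 2, 4, 0]
print(scale_0_100(counts))               # [25, 50, 100, 0]
```

Including empty months explicitly matters: without them the two series would not share a time axis, and a quiet month would silently disappear rather than show as 0.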

What does it show

Though I have little confidence in what this proves, as it was more of a qualitative investigation, I think a few points were clear.

Combined Graph

This takes all the malware terms I was looking at and scales them together to 0-100 across all of the equivalent data points, so that the series can be compared directly. It shows spikes for the release of each malware in succession: Conficker, Stuxnet, Cryptolocker, and Dridex.
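Scaling the terms together differs from scaling each one separately: the single highest point across all series becomes 100, and every other point keeps its size relative to that, so a small series stays small. A minimal Python sketch of that joint rescaling, assuming each term's counts are already aligned on the same time axis; the counts are made up for illustration.

```python
def scale_jointly(series: dict[str, list[float]]) -> dict[str, list[int]]:
    """Rescale several aligned series against their shared peak: the single
    highest point across all of them becomes 100 and every other point keeps
    its size relative to it, so a small series stays small."""
    peak = max((v for values in series.values() for v in values), default=0)
    if peak == 0:
        return {name: [0] * len(values) for name, values in series.items()}
    return {name: [round(v * 100 / peak) for v in values]
            for name, values in series.items()}


# Made-up article counts for two terms over the same three months.
combined = scale_jointly({"stuxnet": [10, 200, 40], "conficker": [50, 20, 0]})
print(combined)  # {'stuxnet': [5, 100, 20], 'conficker': [25, 10, 0]}
```

Scaled on its own, Conficker's 50 would also become 100. Note that the joint version only works on raw counts such as the Nexis article totals: Trends series downloaded one term at a time are each already scaled to their own peak, so putting terms on a common Trends scale means requesting them together in a single comparison.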

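[Figure: combined graph of Nexis news and Google Trends search data for Conficker, Stuxnet, Cryptolocker, and Dridex, scaled to 0-100]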

Though this may be a little hard to read, it shows how Stuxnet absolutely dominates the other three malware in newspaper coverage, yet barely registers in the Google Trends data when compared to the worm Conficker, which emerged in 2008 and 2009. This suggests that, though we in cybersecurity may have been greatly concerned about Stuxnet, most Google searches were in fact for malware that directly affected people. In addition (and I do have graphs for this), it is clear that newspapers reacted very strongly to the publication of stories in June 2012, such as David Sanger's article in the New York Times, reporting that Stuxnet was related to a broader operation by the US and Israel dubbed the 'Olympic Games'.

Emergence

When we turn to the difference between search and news data, there are some interesting phenomena (something I would like to delve into further) where searches sometimes predate the news data. This is particularly stark with Cryptolocker and Conficker, suggesting that people may have been going online, ahead of any reporting, to search for what to do about ransomware and worms. Hence, focusing purely on news articles in cybersecurity may not fully reflect how people come into contact with malware and how they engage with media.
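One simple way to check which source moves first is to find the first period in which each series crosses some threshold and compare the two. A minimal Python sketch, assuming both series are already on the 0-100 scale and share the same monthly time axis; the default threshold of 25 is an arbitrary, hypothetical cut-off, and this is a much cruder check than a proper lagged cross-correlation.

```python
from typing import Optional


def first_spike(values: list[int], threshold: int = 25) -> Optional[int]:
    """Index of the first point at or above `threshold` on the 0-100 scale,
    or None if the series never reaches it."""
    for i, v in enumerate(values):
        if v >= threshold:
            return i
    return None


def search_lead(search: list[int], news: list[int],
                threshold: int = 25) -> Optional[int]:
    """Periods by which searches cross `threshold` before news does
    (positive: searches first; negative: news first). Both series must
    share the same time axis. Returns None if either never crosses."""
    s = first_spike(search, threshold)
    n = first_spike(news, threshold)
    if s is None or n is None:
        return None
    return n - s


# Made-up monthly series on the 0-100 scale, for illustration only.
print(search_lead([0, 10, 60, 100, 40], [0, 0, 5, 30, 100]))  # 1
```

Here searches cross the threshold in the third month and news in the fourth, so searches lead by one month. The result is sensitive to the threshold chosen, which is one reason to treat it as a quick qualitative check rather than evidence.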

Concluding Thoughts

I am not claiming to have found anything special here, but it was interesting to see my assumptions confirmed through this work. I have a lot of data on this and may try to put it together as a paper at some point, but I thought it would be nice to share at least a bit about how conventional media (newspapers) and Google Trends data can tell us something about how malware and its effects are consumed and practiced through different media globally, and how this varies over time and according to the materiality and performance of malware.
