Searching for maliciousness using newspapers and Google trends data

So, I thought I would do a quick blog post, as I have reached a block in writing and thought this would help to get me back into the mood. A couple of years ago now(!), I did some archival research on how certain malware are consumed and practiced in the media, and tied this to Google Trends data (search and news) to see whether there were any correlations with malware events (such as a release, a malware report and so on) and whether anything interesting came out of this.

How I did it

As one could expect, there is a strong correlation between news and Google searches. I took articles published in any language using the search terms ‘stuxnet’, ‘conficker’, ‘dridex’, and ‘cryptolocker’. The former three are case studies in my thesis, and I have subsequently dropped cryptolocker. I turned to Nexis (LexisNexis), a newspaper database, to search for these terms in articles globally (which captured publications beyond English, due to the uniqueness of the words, but unfortunately only those held in Nexis). In particular, I searched for items in the title, byline, and first paragraph so that, as far as possible, I did not pick up residual stories. This required a substantial clean-up of newspaper editions, stories that did not make sense, mistakes in the database, and other issues. Clearly there is a lot of noise in this data, and I took some but not all precautions to keep this to a minimum, as it was not a statistical check but a more qualitative activity to see any form of ‘spikes’ in the data.

I used the Google Trends data that is freely available from their website for each malware. However, frustratingly, this only comes out as a ratio of 0-100 (0=Least, 100=Most) for the quantity of searches. So, I had to scale each malware’s newspaper articles from Nexis to 0-100 to ensure that each malware was comparable to a certain level, and to make sense of the two different sources I was using for this data. I also did this globally so that I had as close a comparison to the Nexis data as possible. This produced some interesting results, of which I cover one incident of interest here.
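As a rough illustration of the scaling step, here is a minimal sketch. It assumes the newspaper counts are rescaled relative to their own peak (so the busiest period becomes 100), which is how Google Trends presents its figures; the exact method I used is not spelled out above, and the function name and the sample counts are hypothetical.

```python
def scale_to_100(counts):
    """Rescale raw counts so that the largest value becomes 100.

    Assumption: scaling is relative to the series' own peak, mirroring
    the 0-100 index that Google Trends reports.
    """
    peak = max(counts)
    if peak == 0:
        # No articles at all: every period stays at 0.
        return [0 for _ in counts]
    return [round(100 * c / peak) for c in counts]


# Hypothetical monthly article counts for one malware term (not real Nexis data).
monthly_articles = [0, 5, 10, 50, 20, 1]
print(scale_to_100(monthly_articles))
```

Scaling each term against its own peak makes the shape of its spikes comparable with the Trends line, but it throws away absolute volume, so it cannot by itself say which malware received more coverage.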

What does it show

Though I hold little confidence in what this proves, as it was more of a qualitative investigation, I think there are a few points that were clear.

Combined Graph

This takes all the malware terms I was looking at, and scales them to 100 across all of the equivalent data points. This shows spikes for the release of each malware: Conficker, Stuxnet, Cryptolocker, and Dridex in succession.

 

Though this may be a little hard to read, what it shows is how Stuxnet absolutely dominates the other three malware in terms of newspaper content. However, it barely registers on Google Trends data when compared to the worm Conficker, which emerged in 2008 and 2009. This suggests that, though we in cybersecurity may have been greatly concerned about Stuxnet, the majority of searches on Google in fact point to malware that directly impacts people. In addition, though I do have graphs for this, it is clear that newspapers reacted very strongly to the publication of stories in June 2012 – such as an article in the New York Times by David Sanger – that Stuxnet was related to a broader operation by the US and Israel dubbed ‘Olympic Games’.

Emergence

When we turn to the difference between search and news data, there are some interesting phenomena – something I would like to delve more into – where searches sometimes predate news coverage. This is particularly stark with Cryptolocker and Conficker, suggesting that people may have been going online ahead of the reporting to search for what to do about ransomware and worms. Hence focusing in cybersecurity purely on news articles may not accurately reflect how people come into contact with malware and how they engage with media.
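One simple way to check whether searches run ahead of the news is to compare when each 0-100 series first crosses a chosen level. The sketch below is only an illustration of that idea, not the procedure behind the graphs: the threshold of 50, the function names, and the sample series are all assumptions.

```python
def first_spike(series, threshold=50):
    """Return the index of the first period at or above the threshold, or None."""
    for i, value in enumerate(series):
        if value >= threshold:
            return i
    return None


def search_lead(search, news, threshold=50):
    """Number of periods by which search spikes before news.

    Positive means searches spiked first, negative means news came first,
    and None means at least one series never reached the threshold.
    """
    s = first_spike(search, threshold)
    n = first_spike(news, threshold)
    if s is None or n is None:
        return None
    return n - s


# Hypothetical weekly series on the 0-100 scale (not real data).
search_series = [2, 10, 60, 100, 40]
news_series = [0, 1, 5, 70, 100]
print(search_lead(search_series, news_series))
```

A threshold crossing is a crude measure and is sensitive to the noise mentioned earlier, so it is better treated as a prompt for looking at the underlying articles than as a result.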

Concluding Thoughts

It is not that I am claiming here that I have found something special, but it was interesting to see that my assumptions were confirmed through this work. I have a lot of data on this, and I may try to put it together as a paper at some point, but I thought it would be nice to share at least a bit about how conventional media (newspapers) and Google Trends data can tell us something about how malware and its effects are consumed and practiced through different media globally, and how this varies over time and according to the materiality and performance of malware.

Robots Should (Not) be Viruses

Last week, I was introduced to a conversation by Pip Thornton (@Pip__T) that was initiated by David Gunkel on Twitter, where he was taking a suggestion from Noel Sharkey on whether robots should be classified as viruses (see the tweet below). As I have been focusing on ‘computer viruses’ as part of my PhD on malware, this strikes me as something to be avoided, and as I said on Twitter, I would write a blog post (which has morphed into a short draft essay) to go into more depth than what 280 characters can provide. So, here, I try to articulate a response to why Robots Should Not be Viruses, and why this would be an inadequate theorisation of robotics (in both their material form, and manifestation in this discussion as ‘intelligent’ agents that use extensive machine learning – note that I avoid saying AI, and I will address this later too). This goes along two axes; first that viruses are loaded with biological and negative resonance, and second that they should not be seen as a liminal form between (non)life.

A bit of background

I’ve been working on malicious software, and on what can be considered ‘computer viruses’, throughout my PhD. This has involved seven months of (auto)ethnographic work in which I developed a method of ‘becoming-analyst’ in a malware analysis laboratory, where I actively learnt to analyse and produce detections for an ‘anti-virus’ business. I have been particularly interested in how the cultural construct of the virus has been formed by a confluence of cybernetics and cyberpunk, so that we typically see malware, and its subset viruses, as comparable to biological viruses and as something with maliciousness somehow embedded. This framing has led to various works on the viral nature of society, particularly in the sense of widespread propagation and how it ‘infects’ society. Most closely to my work, Jussi Parikka, in Digital Contagions (2007), considers how the virus is representative of culture today. That is, viral capitalism, affect, distribution and so on. However, to aid my understanding of the confluence made present in malware between biology and computing, I was a research assistant on an interdisciplinary project ‘Good Germs, Bad Germs’ (click here for the website, and most recent paper from this here), a piece of participatory research to understand how people relate to germs, viruses, and bacteria (in the biological sense) in their kitchens. So, I have researched DNA sequencing techniques, perceptions of viruses (biological/technical), and how these get transmitted outside of their formal scientific domains to become active participants in how we relate to the world. This is true of both biological and computing viruses.

It is from my dual experiences that I have attempted to partially understand what could be considered immunological and epidemiological approaches to the study of viruses; whether that be in practices of hygiene, prevention or social knowledges about viruses. However, in both I have sought to counter these broadly ‘pathological’ approaches with something that can be considered ecological. This is not an ecology based on an ecosystems perspective where every part of the ‘web’ of relations somehow has a place and a somewhat stable state. It is one based on a coagulation of different environments and bodies (organic/inorganic/animate/inanimate and things that go beyond these dualisms) that do not have forms of equilibrium, but emergent states that are part of new ‘occasions’ (to follow Alfred North Whitehead) yet maintain some semblance of a system, as they exist within material limits. These can come together to produce new things, but materials cannot have unlimited potentiality – in similar ways to how Deleuze and Guattari have a dark side (Culp, 2016) with reference to the Body without Organs. So, ecology allows for an embrace of what could be seen as outside the system to be a central part of how we understand something – and allows us to consider how ‘vibrant matter’ (Bennett, 2010) plays a role in our lives. This allows me to turn to why viruses (in their cultural and real existence) are an important player in our society, and why labelling robots as viruses should be avoided.

 

Viruses as loaded

Popular culture sees viruses as broadly negative. This means they are seen as the archetype of the ‘end of the world’ in Hollywood or as something ready to destabilise the world with indiscriminate impact (think of the preparedness of governments against this threat). This is not to say that this threat for certain events is unwarranted. I would quite like to avoid dying in an ‘outbreak’, so procedures in place to capture and render knowable a biological virus are not a bad thing. Yet, after the large malware incidents of 2017 – WannaCry and (Not)Petya (I wrote a short piece on the NHS, environments and WannaCry here) – there is also a widening recognition of the power of malware (which inherits much of the baggage of previous viruses) to disrupt our worlds. So, here we have powerful actors that many would deem to be different according to the inorganic/organic distinction. Before I move onto two more detailed points, I want to disregard the good/bad morality that sits at the whole debate at the level of malware or the biological virus; it is not helpful as the former suggests that software has intent, and in the latter that a virus works in the same way in all environments. I have two things to say about how I think we should rethink both;

  1. Ecology: As I said in the previous section – this is where my research has focused. In my research on malware, I argue for what could be called a ‘more-than-pathological’ approach to its study. This means I respect, and think we benefit from, how we treat biological viruses or detect malware in our society – as these are essential and important tasks to enable society to live well. But we could change our attention to environments and social relations (human and non-human) as a way to better understand how to work together with viruses (biological and computing). So, for example, if we look at the environment in malware detection (currently in contextual approaches – behavioural detections, reputational systems, machine learning techniques) then the ecology becomes a better way of recognising malware. This is similar to new thinking in epidemiology that looks at how the environments and cultures of different ‘germs’ come together, which may mean that the existence of a certain pathogen is not necessarily bad – but that excessive cleaning practices, for example, can cause infection.
  2. More-than-Human: You may be, by now, questioning why I combine biological and technical viruses in the same discussion. Clearly, they are not the same; they have different materialities, which allow for different possibilities. However, I refrain from working on an animate/inanimate or organic/inorganic basis. For my core interest, malware may be created by humans (through specialised hackers or by their formation in malware ‘kits’) but that does not mean that it is somehow a direct extension of the hacker. Malware works in different environments, and the human cannot and will not be able to understand all the environments it works within. Also, there are frequently some very restricted ‘choices’ that it may make – meaning it takes one angle, produces some random IP, sets some time. In similar ways biological viruses must ‘choose’ what cell to infiltrate, albeit in very limited ways, as they do not ‘think’ in the way we would understand. However, when you compare computing technologies (and malware) and biological viruses, they both make choices, even in the most limited way, compared to the likes of, let’s say, ocean waves and rocks. I will explain in more detail in the next section why this is so important, and set out the distinction more fully.

Therefore viruses are loaded with a variety of different understandings of their place in regard to the human. This is cultural, organisational, hierarchical, in a nature-technical bifurcation, as some form of liminal (non)life. I think moving away from centring the human helps address the moral questions by understanding what these things are prior to ethics. Ethics is a human construct, with human values, and one we (often) cherish – but other cognizers as I shall explain now, do not work on a register that is comparable to us. This has implications for robotics (I shall add the disclaimer here that this is not my research specialism!) – and other computing technologies – meaning that we have to develop a whole different framework that may run parallel to human ethics and law but cannot fool itself into being the same.

Moving beyond the liminal virus

What I have found intriguing from the ongoing Twitter debate is the discussion about control (and often what feels like fear) of robots. Maybe this is one way to approach the topic. But as I have said, I do not wish here to engage in the good/bad part of the (important) debate – as I think it leads us down the path I’ve had trouble disentangling in my own research on the apparent inherent badness of malicious software. Gunkel suggested in an early tweet, and in a piece he kindly provided, Resistance is Futile, which is truly wonderful, that Donna Haraway’s Cyborg (Cyborg Manifesto) could offer a comparison of how this could work with robots as viruses. Much of Gunkel’s piece I agree with – especially on understanding the breakdown of the Human and how many of us can be seen as cyborgs. It also exposes the creaky human-ness that holds us together with a plurality of things that augment us. However, I do not think the virus performs the same function for robots as the cyborg does in the sense Haraway deploys it. The virus holds the baggage of the negative without understanding its ecology. It also artificially suggests that there is an important distinction in the debate between animate and inanimate – does this really matter apart from holding onto ‘life’ as something which organic entities can have? I think the liminality that could be its strength is also its greatest weakness. I would be open to hearing of options for queering the virus to embrace its positive resonances, but I still think the latter point I make on the (in)animate holds.

Instead, I rather follow N. Katherine Hayles on her most recent work in Unthought (2017) where she starts to disentangle some of these precise issues in relation to cognition. She says it may be better to think of a distinction between cognizers and noncognizers. The former are those that make choices (no matter how limited) through the processing of signs. This allows humans, plants, computing technologies, and animals to be seen as cognizers. The noncognizers are those that do not make choices such as a rock, an ocean wave, or an atom. These are guided by processes and have no cognizing ability. This does not mean that they don’t have great impact on our world, but an atom reacts according to certain laws and conditions that can be forged (albeit this frequently generates resistance – as many scientists will attest). Focusing on computing technologies, this means that certain choices are made in a limited cognitive ability to do things. They process signs – something Hayles called cybersemiosis in a lecture I attended a couple of months ago – and therefore are a cognitive actor in the world. There are transitions between machine code, assembly, to object-oriented programming, to visualisations on a screen, that are not determined by physics. This is why I like to call computers more-than-humans – something not wholly distinct but not completely controlled by humans. Below is a simple graphic. Where the different things go is complicated by the broader ecological debate – such as whether plants are indeed non-human with the impacts of climate change and so on. But that is a different debate.

So, when we see computing technologies (and their associated software, machine learning techniques, robots) they are more-than-human cognizers. This means they have their own ability to cognize which is independent of, and different to, human cognition. This is critical. What I most appreciate from Hayles is the careful analysis of how cognition is different in computing – I won’t say any more on this as her book is wonderful and worth the read. However, it means equating humans and robots on the same cognitive plane is impossible – they may seem aligned, yes, but there will be divergences, and ones that we can only start to think of as machine learning increases its cognitive capacities.

Regarding robots more explicitly, what ‘makes’ a robot – is it its materialities (in futuristic human-looking AI, in swarming flies, or in UAVs) or its cognitive abilities that are structured and conditioned by these? Clearly there will be an interdependency due to the different sensory environments they come into contact with, as all cognitive actors in our world have. What we’re talking about with robots is an explicitly geographical question – in what spaces and materialities will these robots cognize? As they work on a different register to us, I think it is worth suspending discussions on human ethics, morals, and laws to go beyond their capacities of good or bad (though they will have human influence as much as hackers do with malware). I do not think we should leave these important discussions behind, but how we create laws for a cognition different to ours is currently beyond me. I find it inappropriate to talk of ‘artificial intelligence’ (AI) due to these alternative cognitive abilities, as computing technologies will never acquire an intelligence that is human, but only one that is parallel, to the side, even if the cognitive capacities exceed those of humans. They work on a register that is neither good nor bad, but on noise, signal and anomaly rather than the normal/abnormal abstraction on which humans tend to work. Can these two alternative logics work together? I argue that they can work in parallel but not together – and this has important ramifications for anyone working in this area.

Going back to my point on why Robots Should Not be Viruses – it is because it is not on the animate/inanimate distinction that robots (and more broadly computing technologies) need to be understood, but on cognition (where the virus loses its ‘liminal’ quality). So I have two problems with robots as viruses. The first is that ‘virus’ is a loaded term, as I have found in studying both biological and computing viruses; but it is the second problem, on cognition, which is the real reason Robots Should Not be Viruses.

*And now back to writing my thesis…*

Strava, Sweat, Security

Wearable tech, the ability to share your fitness stats, suggest routes, follow them, and so on have been a growing feature of (certain) everyday lifestyles. This ability to share how the body moves, performs, and expresses itself gives many people much satisfaction.

One of the popular methods is through Strava, which is primarily used by runners and cyclists to measure performance (maybe to improve), and also to share that information with others publicly. There are individual privacy settings that allow you to control what you share and do not share. All seems good and well. An individual can express their privacy settings in the app: that should be the end of the story. Yet Strava’s temptation is to share. Otherwise we could just use other wearable tech that does not have such a user-friendly sharing ability, and be done with it.

Strava has recently shared a ‘Global Heatmap’ that allows for an amalgamation of all these different individuals sharing their exercise, their sweat, their pace, to Strava for ‘all’ to access. Hence here we have a collective (yet dividualised – in the Deleuzian sense) body that has been tracked by GPS, the sweat allowing for an expression of joy, an affective disposition to share. This sharing allows for a comparison to what a normative Strava body may be, allowing for a further generation of sweaty bodies. Yet in the generation of these sweats, security becomes entangled.

This is where privacy and security become entangled in the midst of a fallacy of the individual. The immediate attention on Strava’s release quite literally maps to concerns over ‘secret’ locations, such as secret military bases, but also some more trivial ones, such as how many use it around GCHQ in the UK. This has led to calls for bans for those in military units to reduce this exposure. However, this does not address how multiple individual choices add up in an app where privacy amounts only to anonymisation once data is ‘publicly’ shared by a person. This aggregated picture is ‘fuzzy’, full of traces of this dividual sweaty body. These sweaty bodies are flattened, treated as data points, then recalibrated as though all points are the same. In fact, they are not. Privacy and security are inherently collective.

So, why does this matter? If individuals shared certain information then they are free to do so, and their individual routes are still not (re)identified. Yet privacy in this conception is based on a western canon of law, where the individual is prime. There is a form of proprietary sense of ownership over our data. This is not something which I disagree with, as much in feminist studies informs us of the importance of control over our bodies (and therefore the affects they produce; the sweat, on our mobile devices, on Strava). Yet there has to be a simultaneous sense of collective privacy at work. In this case, it is rather trivial (unless you run a secret military base). Yet it explodes any myths around the use of ‘big data’. Did Strava make clear it would be putting together this data? Was explicit consent sought beyond the initial terms and conditions? Just because something becomes aggregated does not mean we lose our ability to deny this access.

The use of more internet-connected devices will allow for maps that gain commercial attention, but at the expense of any sense of collective privacy. Previously, only states were able to produce this information, through a bureaucracy, and there have been publicly, democratically agreed steps to protect it (whether this is effective is another question). Now we live in a world where data is seen as something to be utilised, to be open, freely accessible. Yet we need far more conversation that extends beyond the individual to the collective bodies that we occupy. To do one without the other is a fallacy.

My, and your, sweaty body is not there for anyone to grab.

Do cybersecurity objects matter?

Anyone who has been following my Twitter will realise I have been writing about malware as objects. This seems like a fundamentally weird, and maybe useless, thing to do (and one I have wondered about myself). Yet thinking of objects as something that matter in cybersecurity is essential.

This is a question I posed myself: can malware be an object?

This was somewhat triggered by my other side as a geographer interested in space, time, and place. Evidently when malware was emerging in the 1990s as a political concern, cyberspace was still often referred to as ‘frictionless’ and as traversing the Westphalian model of individual sovereign states – all part of a growing post-Soviet triumphalism of western liberalism. This is how malware is often seen, as being ‘out-there’, something bounded that travels with little connection to anything else. Yet I’ve never been able to put my finger on what a malware object may be – it clearly is much more than the software used to construct it. How about the writers (sometimes known as hackers and artists), the malware ecology of different interdependencies? Can it extend out to speeches, political discourse, malware laboratories? Some of these things would not exist if it wasn’t for malware. Yet who knows what this is.

In a good start to thinking through these issues and implications for cybersecurity, Balzacq and Cavelty (2016) (open access available here) talk of an actor-network theory approach. Though I disagree with some points, they do highlight the importance that objects have to, in this case, international relations. Yet it is also true they have a huge impact on computer science and cybersecurity. I do not want to overly dwell on the philosophy here, but there have been movements to appreciate objects as things in themselves over the past two decades or so, with one of these being Object-Oriented Ontology (OOO). This helps us comprehend how objects, such as malware, have an ability to act and cause things to change. I am not saying that malware has intention, as that would suggest it has a human quality to be malicious – that is the human working with it. Of course objects in computer science have a somewhat different meaning to what I’m referring to here, but they do fit in. Without falling into the trap that Alexander Galloway notes in his work (2013), that we orientate our thinking around the technology we talk about, objects have states and behaviours.

However, I do not think we can locate malware in a specific location on a map. If we think of how malware communicate – through command and control servers, in botnets, through peer-to-peer networking, using the internet – to download modules, to share information, to activate, then malware is stretched across multiple different places. If you require some information from a server that is routed through Ukraine, let’s say, but your target is in the USA, then where is malware as an object in the broader sense? Yes, there is local software on the individual machine, but it requires a connection to extract information, for instance. Then there are the political reasons that certain groups operate out of certain places, the training required, the knowledge to do certain things, all of which are geographically dispersed. Can you separate the malware object from this? I think not, and it becomes part of the malware object, made up of different malicious elements, such as the local software on the machine, with a server elsewhere, with the right political conditions that enable it to become malware in a sense that we can detect and analyse it and it becomes successful.

So, when we consider malware as geographically distributed in this way, it is in tension, with lots of potential for something to happen (think of the Conficker botnet that did very little). So it is when all elements of the malware object are part of doing something that it really forms, and it becomes malicious. Yes, we can see the warning signs through signatures, but it is only when the malware object comes together that it is something we can track, analyse, and detect through networks. This is why Advanced Persistent Threats (APTs) are so interesting, as they are so stealthy that the object is very difficult to detect – and may not seem to be acting differently to the norm. When is an APT part of a malware object? This is something I need to do a bit more thinking on.

Therefore when talking about malware, when detecting it, it’s about the entire ecology of malware, it is not just the end-point detection, but it only becomes malware when all the elements forge an object. This may now sound obvious – but it disrupts the idea that an object is material, located in a fixed place at a certain time, and adds tension to the mixture. Therefore you have to tackle all parts of the ecology – computer science, international relations, crime – to attempt to force it to something that is only ever partially controlled. This means that connected thinking is essential to consider how to tackle malware, and cannot be simply at the end-point. Evidently, this is just me dropping an idea at the moment but I hope to work with this much more as a core tenet of how malware can be reconsidered to assist in cybersecurity, but also challenge some geographical thinking.

Evernote – Data Protection Woes…

Unfortunately, it seems Evernote cannot be used for any personal or confidential information if you’re from the EU. As I was wading through the confidentiality and data protection requirements for my DPhil fieldwork, I had to really dig around to find out what the University’s (that is Oxford’s) policy on cloud storage is. It appears this excludes any transfer of data outside of the EEA (the European Economic Area). That is even with the new ‘Privacy Shield’ between the EU and the USA.

I was thinking of using Evernote as a simple tool to store notes and my research diary – with the syncing a useful back-up tool. However Evernote is not yet a signatory to the new ‘Privacy Shield’, which you can check here. Although Evernote is a signatory to the old ‘Safe Harbor’ agreement, this is now invalid – as can be seen on this page – following the European Court of Justice’s ruling in October 2015. Therefore if you are a researcher, and are using Evernote with information that falls under Data Protection, you are likely falling foul of your obligations to ensure it remains under EU jurisdiction.

Therefore I recommend you follow the instructions here to create a local notebook that is stored only on the computer you are using. This is the only way to ensure you are keeping with requirements under EU data protection and ensuring your research maintains data security integrity. I’m hoping Evernote sign up to ‘Privacy Shield’ soon so that I can sync my notes, as this would be very useful.

If I am wrong, it would be great to know, but after a good time searching I cannot find evidence to the contrary.