It has been some time since I wrote a blog post for the ‘Frankenstein 2.0’ project, which now has a dedicated page on my website here.
This blog post covers a significant amount of work on the bot since January, and here I will outline some of the difficulties, intrigue, and frantic development ahead of its ‘prototype’ demonstration as a plenary speaker at the 2021 Critical Legal Conference in Dundee. I believe that this event was somewhat of a ‘success’: people were able to ask questions, received a response (that was funny!), and were able to interact with the bot in several ways.
After many months of thinking and tinkering with the bot, with the conference coming ever closer, I dedicated a significant proportion of the summer to the bot (including many weekends!). This involved a move to a dedicated computer that was purchased for the project, which meant there was a stable system on which to (physically) move the bot around (with its now many, many open source code dependencies). It also meant that I moved the project from Linux to Windows. I could have continued to use Linux, but I genuinely felt more comfortable on Windows (which in turn had its own limitations and frustrations, as I shall outline below). I also started to use a platform to actually piece the bot together – in this case PyCharm, made by JetBrains, a Czech software development company. Although I endeavoured as much as possible to avoid proprietary platforms, I had to make some compromises where it would ‘work’… So, I didn’t exactly ‘escape’ the ecosystem. This is not to say I couldn’t have, but it was more a question of the time needed to learn new things (which is one of the reasons that proprietary systems and platforms are so prevalent).
One such frustration with Windows comes about in the range of voices that are available for use by ‘non-native’ applications. Although the Windows 10 operating system has a range of voices, those in the British and American English ‘packs’ available to my project were only feminine-presenting in both sound and name (GB: Hazel, US: Zira). One of the discursive constructs of the bot was to be an old, male professor who only ever offers a comment and never a question. This was clearly not possible, as it was in Linux with its many open-source ‘voices’, which only led me to speculate on how gendered the production of voices is, and why only Hazel and Zira were available for me to use…
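For illustration, here is a minimal sketch of how the installed voices can be enumerated and selected through pyttsx3. The `pick_voice` helper and its preference list are hypothetical additions of mine for this post, not part of the project’s actual code:

```python
# Sketch: listing and selecting an installed text-to-speech voice via pyttsx3.
# On Windows 10, only the SAPI5 voices exposed to 'non-native' applications
# appear here -- in my case, just Hazel (GB) and Zira (US).

def pick_voice(voices, preferred=("Hazel", "Zira")):
    """Return the id of the first voice whose name matches a preference.
    `voices` is any iterable of (id, name) pairs."""
    for want in preferred:
        for voice_id, name in voices:
            if want.lower() in name.lower():
                return voice_id
    return None  # no match: leave the engine on its default voice

# Usage with pyttsx3 (needs a platform speech backend, e.g. SAPI5 on Windows):
# import pyttsx3
# engine = pyttsx3.init()
# available = [(v.id, v.name) for v in engine.getProperty("voices")]
# chosen = pick_voice(available)
# if chosen:
#     engine.setProperty("voice", chosen)
# engine.say("I only ever offer a comment, never a question.")
# engine.runAndWait()
```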
As I outlined in the other blog posts to this project – 1, 2, 3 – the same rough suite of tools was retained through the use of Python. These included pyttsx3 (for text-to-speech), spaCy (for natural language processing – NLP), and PocketSphinx (for speech recognition). All these ‘packages’ require a huge range of other package dependencies that I will not detail here. To maintain all of this, I relied on Anaconda for package management in setting up my computational ‘environment’ (again, for simplicity, stepping somewhat inside a proprietary environment). I think this demonstrates the absolute complexity of producing a chatbot that can deal with *any* question (at least in English), with its varying specialisms. That is, it is quite rare for one person to ever do a whole project alone. This would have been genuinely impossible without open source materials – so I thank the many, many people unknown for their collective authorship of Frankenstein 2.0.
Markov, markov, markov
Getting back into the details: although by January 2021 I had set up the basic requirements to capture speech as audio, do some limited NLP using spaCy, and get the bot to ‘speak’ words back out of the computer, I was very far off having a chatbot that would make sense of anything. In the early days of the project, there were many discussions with Hayden and Pip about the texts it should ‘learn’ from – including Mary Shelley’s Frankenstein and Hannah Arendt’s The Origins of Totalitarianism, among others. However, I had some more internal worries about how I was going to get the bot to recognise what was being said and then respond with some text that seemingly related to the ‘input’.
Of course, the ideal that I at least had included having someone ask a question that could receive an adequate response, in answer form, from an individual book as a ‘canonical’ text. Some of you may think of connections to GPT-3, a machine-learning language engine that appears to produce very high-quality texts resembling human writing. Yet this is fed on a lot of ‘big’ data. It also has a team of people working on it, with lots of question/answer constructs. A book has none of these ‘attributes’: it is ‘small’, and is (almost always) written in a style that does not contain question/answer responses. So, I was somewhat stuck about what I could deliver in such a project.
After much deliberation and thinking, I came to what I initially thought was a very suboptimal offering – using a stochastic Markov chain. These have been around for many years, and are often used in funny chatbots on Twitter (e.g. see this). A Markov chain assesses the next ‘event’ based solely on the previous ‘event’ (i.e. the chain has no memory of anything earlier). For producing a text out of limited data, however, the Markov chain can take the relational connections between words and assess the probability of each word being connected to the next. This can produce sentences that are weird, but simultaneously human-sensible (and often, funny).
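The mechanics are simple enough to sketch in a few lines of Python. This is not the chain I actually used (that was adapted from someone else’s Twitter-bot code, as I explain below), but a minimal illustration of the idea; `build_chain` and `generate` are hypothetical names:

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that follow it in the text.
    Duplicates are kept, so a random choice reflects word frequency."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, seed, length=12, rng=random):
    """Walk the chain from a seed word: each next word depends only on
    the current one, which is what makes the process Markovian."""
    word = seed
    out = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            break  # dead end: this word only ever appears at the end
        word = rng.choice(followers)
        out.append(word)
    return " ".join(out)
```

Trained on a single book, a walk like this recombines the author’s own vocabulary into new, probabilistically plausible sentences.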
This simplified what I had to do with the language processing: I needed to extract a keyword that would be used to ‘seed’ the Markov chain trained on a particular text – this would produce something peculiar, based on the probabilistic relationality of words, that would be the author’s own, but reformed and rearticulated. Thus, I could use a word and get a response from some texts – win! Here, I used a reworked version of someone else’s Markov chain for a Twitter chatbot. I’ll talk a bit later about how many words were initially recognised by my use of spaCy! [Spoiler: it was not good!]
At this point, I had worked out how to pass a (seed) word to a piece of text, and it now outputted something vaguely in line with what I expected. However, to make the bot really shine, I needed to also be able to add instructions – such as ‘which author would you like to speak as?‘ and ‘Please respond yes or no!‘ For programmers more experienced than me this may have been pretty obvious, but it took me months of trying so many different methods I can barely remember them all. Eventually, I landed on threads as something which could ‘listen’ for multiple different terms almost simultaneously.
Let me phrase the problem thus: a computer works sequentially, so if you need a response to a question which depends on listening out for ‘yes’ or ‘no’, then if you say ‘yes’ and then (due to recognition or audio quality issues) ‘no’, it may make the bot do two different things. This gets more complicated if you have many different ‘authors’ you may want to speak to. I believe it was an Oxford PhD colleague, Matt Smith, who responded to a Twitter plea on how to manage this – so thanks for that! Essentially, the threading process meant I could actively listen for different words and assess them at almost the same time. This meant (most of the time!) that if you said ‘yes’ and then ‘no’ directly after, the program should have already moved onto the branch for a response of ‘yes’.
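A rough sketch of the idea, with a list of recognised phrases standing in for the live PocketSphinx stream (the `listen_for` helper is a hypothetical illustration of mine, not the project’s actual code):

```python
import queue
import threading

def listen_for(keywords, utterances, timeout=5.0):
    """Spawn one listener thread per keyword; the first keyword heard wins
    and later matches are ignored, so a stray 'no' arriving just after a
    'yes' cannot send the bot down two branches at once.
    `utterances` stands in here for the recognition stream."""
    heard = queue.Queue()

    def worker(word):
        for phrase in utterances:
            if word in phrase.split():
                heard.put(word)  # report the first hit and stop
                return

    threads = [threading.Thread(target=worker, args=(w,), daemon=True)
               for w in keywords]
    for t in threads:
        t.start()
    try:
        return heard.get(timeout=timeout)  # whichever keyword arrives first
    except queue.Empty:
        return None  # heard nothing usable within the timeout
```

The queue’s blocking `get` is what makes the main program commit to exactly one branch: subsequent matches may still be put on the queue, but nothing ever reads them.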
So, threads were the saviour of the spoken word part of the project. In a written response, it would be fairly easy to do a simple ‘if else’ statement, but as I was working with data that was particularly difficult to capture, render digitally, and make sense of – occurring across multiple different phrases with a lot of noise – this really was the best solution I could come up with (there are likely better ways, but c’est la vie).
The birth of Dr Canonical
All this meant that, after lots of tweaking of how phrases would be recognised by PocketSphinx, general tidying of code, and the insertion of texts, I was ready to transform the bot into ‘Dr Canonical’. After meetings with the organisers, there was a great discussion about what should be considered ‘canonical’, one which I was fully on board with. So, instead of attempting to offer a canonical set of texts (as such a set is always privileged, contested, and typically incorrect), the whole of the Critical Legal Conference – Frankenlaw – became Frankenlaw’s monster. That is, all the abstracts from the conference were collated and added to the bot.
Yet, as this conference was to go online, it required a text-based output alongside the audio in case of any issues. This meant having to quickly put together an HTML webpage output (see tweet image below) which updated alongside what the bot was saying. I had to do a little hack to get this going at fairly short notice – so instead of running a web server, I simply downloaded a Google Chrome extension that refreshed the webpage every second. A very inelegant solution, but one that worked.
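The hack amounts to overwriting a static HTML file every time the bot speaks. A minimal sketch of that idea (here a `<meta http-equiv="refresh">` tag does the reloading that the Chrome extension did in my version; `publish` is a hypothetical helper, not the code I ran):

```python
from pathlib import Path

# Template for the static page; the meta tag reloads it every second,
# standing in for the browser auto-refresh extension.
PAGE = """<!DOCTYPE html>
<html>
<head>
  <meta http-equiv="refresh" content="1">
  <title>Dr Canonical</title>
</head>
<body><p>{line}</p></body>
</html>
"""

def publish(line, path="dr_canonical.html"):
    """Overwrite the static page with whatever the bot just said."""
    Path(path).write_text(PAGE.format(line=line), encoding="utf-8")
```

Calling `publish(...)` in the same place the bot calls its text-to-speech keeps the page and the audio roughly in step, with no web server involved.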
However, on a test run, the recognition was a bit of a disaster. Although the text as a whole was being recognised by spaCy extremely well, my prioritisation of the ‘dobj’ (direct object) identifier meant it wasn’t picking up all that much. As a slight detour, the speaker which was bought for the project really wasn’t good – recognition ended up being far superior online, with people speaking through the Blackboard Collaborate platform used at the conference! Still, I had to solve this. So, I spent the evening before travelling to Dundee playing around with a range of sentences, adding ‘nsubj’ (nominal subject), as well as nouns and verbs identified in the text. This proved to pick up a relevant word (after some cleaning of common words) from what was said!
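The selection logic can be sketched as an ordered preference over spaCy’s dependency and part-of-speech labels. The `Token` namedtuple below stands in for a real spaCy token (which carries the same `dep_`, `pos_`, `lemma_`, and `is_stop` attributes), and `extract_seed` is a hypothetical name, not the project’s actual function:

```python
from collections import namedtuple

# Stand-in for a spaCy token. Real code would instead do something like:
#   nlp = spacy.load("en_core_web_sm")
#   tokens = nlp(utterance)
Token = namedtuple("Token", "lemma_ dep_ pos_ is_stop")

def extract_seed(tokens):
    """Pick a seed word for the Markov chain, preferring the direct
    object, then the nominal subject, then any noun, then any verb.
    Stop words are skipped so words like 'do' or 'be' never seed."""
    for picker in (
        lambda t: t.dep_ == "dobj",    # original, too-narrow criterion
        lambda t: t.dep_ == "nsubj",   # added after the failed test run
        lambda t: t.pos_ == "NOUN",
        lambda t: t.pos_ == "VERB",
    ):
        for tok in tokens:
            if picker(tok) and not tok.is_stop:
                return tok.lemma_
    return None  # nothing usable recognised in the utterance
```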
At the conference, there was much playing around and experimenting with the set-up, especially as Pip was there to experience this in person. This led to cutting out different authors, speeding up the voice delivery, and cutting down the steps and words said by the bot. On the day, I think it did pretty well. Yet, in reflecting on the prototype of this project, one of the initial conclusions is that there is a community of unknown others stitching together our digital worlds; relatively little is known about the things that are stitched, but they work. This clearly has huge implications – such as where learning data comes from, which others we collectively rely on, as well as how machine learning tools should be deployed.
I have a lot more to say on this – and this will come in future, more coherent writing, but for now – this is how my creature was born.