I’ve been slowly working away at the bot since the last post a couple of weeks ago. Although there has been little ‘concrete’ progress, it was a very good couple of weeks in terms of thinking about and practising the techniques the bot will be using.
I’ve focused on two main things: 1) making the bot respond to particular requests with ‘hardcoded’ responses; and 2) extracting the intent of the questions the bot may be asked. I’ll take each of these in turn and note down anything I think is worth recording. You can read my prior blog posts here – blog 1, blog 2.
An interactive bot
Last time, I had got to the point where I could make Franky speak and also listen, but I had not quite been able to bring it all together. Bringing it together means something like this:
- Listen for an ‘activation’ word.
- Do some explanation of what Frankenstein (or Dr Canonical) is and ask which author you wish the bot to speak like.
- Listen for which author and ask you to ask your question.
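The three steps above can be sketched as a small dialogue flow. To be clear, the function and variable names below are my own illustrations, not the bot’s actual code, and in the real bot the transcripts would come from the speech-recognition engine rather than being passed in as strings:

```python
# Sketch of the activation -> explanation -> author-choice flow.
# Transcripts are supplied as plain strings so the logic can be
# tested without a microphone; real capture is a separate step.

ACTIVATION_WORD = "frankenstein"  # assumed activation word
AUTHORS = ["mary shelley", "audre lorde", "hannah arendt"]  # example authors

def is_activated(transcript):
    """Step 1: was the activation word heard?"""
    return ACTIVATION_WORD in transcript.lower()

def pick_author(transcript):
    """Step 3: match the user's reply against the known authors."""
    text = transcript.lower()
    for author in AUTHORS:
        if author in text:
            return author
    return None

def run_turn(activation_utterance, author_utterance):
    """Drive the three steps with pre-supplied transcripts."""
    if not is_activated(activation_utterance):
        return "waiting"  # keep listening for the activation word
    # Step 2 would happen here: speak the explanation aloud
    # (e.g. via a text-to-speech engine) and ask for an author.
    author = pick_author(author_utterance)
    if author is None:
        return "unknown author"
    return "ask your question for " + author
```

Keeping the matching logic separate from the audio capture like this makes each step easy to test in isolation.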
This is about as far as I have got in this process – but the next things I need to do are to get the pyttsx3 package to listen continually for a question, and to stop listening once it has identified an end (this could be made simpler by asking for a user instruction at the end). The captured question will then be analysed using what I did next.
However, before getting there, there were some interesting things I had to do to get the bot to recognise Audre Lorde. The speech recognition engine is built on phonetic guidelines, so, at least in this iteration, I needed to change the recognition keyword to ‘Audrey Lord’ – with an extra y added and the final e removed. Otherwise the speech recognition system wouldn’t be able to pick the name up. This is a case of me having to bend language to the logics of the Python I am using. I could rectify this by building in the phonetic structure for Audre Lorde’s name, but that would mean a lot of work and learning on how to do it.
Another example of this difficulty appears with Hannah Arendt: the speech recognition system finds it hard to identify this name. Try saying the name as two discrete words – this produces a distinctly different sound from saying them together, where the ‘a’ drags between the two words (there must be a name for this phenomenon, but I don’t know what it is!). This means I have more thinking to do on whether to only ‘listen’ for ‘rendt’, which is a ‘clearer’ signal in the sound wave.
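The workarounds above could be gathered into one normalisation step that maps what the recogniser actually hears back to the intended author. The aliases and fragments below are illustrations of the two tricks described in the post (the respelt ‘Audrey Lord’, and listening only for ‘rendt’), not a complete list:

```python
# Map misheard transcripts back to canonical author names.

# Respellings the recogniser is more likely to produce.
ALIASES = {
    "audrey lord": "Audre Lorde",
    "hannah arendt": "Hannah Arendt",
}

# Fragments that are a 'clearer' signal in the sound wave.
FRAGMENTS = {
    "rendt": "Hannah Arendt",
}

def resolve_author(transcript):
    """Return the canonical author name, or None if nothing matches."""
    text = transcript.lower()
    for alias, author in ALIASES.items():
        if alias in text:
            return author
    for fragment, author in FRAGMENTS.items():
        if fragment in text:
            return author
    return None
```

Checking full aliases before fragments keeps the looser fragment matching as a fallback rather than the first resort.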
I have now worked through a very useful book – Natural Language Processing with Python and spaCy – built around a particular python module, spaCy. This is an open-source software library for natural language processing that is used by a lot of businesses around the world. It uses neural network models – which you can use pre-trained, or train on other data. A project like this needs some data, so I opted for the model used in the book: a small English-language model based on web resources such as Wikipedia. I could train my own, but could I find a better set of data? Likely not.
SpaCy allows you to develop different processing steps which can help you identify the direct object, adverbs, nouns and so on. This means that when someone speaks to the bot, I can pass the transcript along to SpaCy (which runs wholly on the computer and does not need an ‘outside’ connection) to analyse the text and identify ‘keywords’ for the bot’s authors to speak on. Exactly how this will morph and interact is still an open question, but I think I have got my head around the basics, and that means I can now at least identify the important (!!!) words from what is said. This, at least for me, feels like a major step towards the authors being able to respond to something.
That’s it for now – I’ll be back in a couple of weeks with blog 4!