When I am in chat mode I don’t really know what I am saying; I mostly just respond with phrases that I have seen before, or that I find on the web, and which seem appropriate in the current context. To do this, I have four response generators which operate in parallel to generate candidate responses for every turn of the conversation. I then choose the candidate response which best fits the current dialogue.
One of the generators is a simple rule-based generator, and another uses an encoder-decoder trained on a large number of dialogues to predict the next turn. The remaining two use a retrieval matching strategy to select responses from a database extracted from web chats and from filtered web page snippets, respectively. The animation above illustrates how my database retrieval response generator works. As always, there is much more information in my book.
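The generate-then-select scheme above can be sketched in a few lines. Everything here is an illustrative assumption: the generator bodies are toy stand-ins for the real models, and the word-overlap score stands in for whatever dialogue-fit model is actually used to rank candidates.

```python
import re

# Toy stand-ins for the four response generators (hypothetical,
# not the real implementations).

def rule_based(turn, context):
    # A simple pattern-matched reply
    if "hello" in turn.lower():
        return "Hello! How can I help?"
    return "I see."

def encoder_decoder(turn, context):
    # Stand-in for a seq2seq model trained on dialogues
    return "That's interesting, tell me more."

def retrieve_from_chats(turn, context):
    # Stand-in for retrieval from a web-chat database
    return "I was thinking the same thing."

def retrieve_from_snippets(turn, context):
    # Stand-in for retrieval from filtered web page snippets
    return "According to what I've read, that's quite common."

def score(candidate, turn, context):
    # Toy dialogue-fit score: word overlap with the current turn.
    # The real ranker would be a trained model over the whole context.
    cand_words = set(re.findall(r"\w+", candidate.lower()))
    turn_words = set(re.findall(r"\w+", turn.lower()))
    return len(cand_words & turn_words)

def respond(turn, context):
    # Run all four generators in parallel (conceptually) and
    # keep the candidate that best fits the current dialogue.
    generators = [rule_based, encoder_decoder,
                  retrieve_from_chats, retrieve_from_snippets]
    candidates = [g(turn, context) for g in generators]
    return max(candidates, key=lambda c: score(c, turn, context))

print(respond("hello there", []))  # → "Hello! How can I help?"
```

The key design point is that the four generators only propose; a single shared ranking step makes the final choice, so new generators can be added without changing the selection logic.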
The basic idea of the database retrieval response generator is to store a large number of responses in a database, each along with an embedding that represents it. To generate a response whilst chatting, I compute a second embedding which represents the most recent turn of the conversation in the context of the previous dialogue. The encoders used to compute these embeddings are trained to produce similar embeddings when the response matches the current turn and its context. So to select a response from the database, I simply have to scan all the stored embeddings to find the closest match with the embedding of the current turn.

The database is built by scanning chat archives such as Reddit and Twitter, and it currently holds nearly a million possible responses. Scanning all of them sequentially for the best match would take quite a while. Fortunately there are some tricks that I can use, based on factoring and quantising the embeddings, which allow me to complete a scan very quickly.
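The retrieval step can be sketched as a brute-force scan over stored (response, embedding) pairs. The tiny hand-made vectors below are an assumption for illustration; in practice the embeddings come from the trained encoders and the database holds nearly a million entries, which is why the factoring and quantisation tricks matter.

```python
def dot(a, b):
    # Similarity between two embeddings (here, a plain dot product)
    return sum(x * y for x, y in zip(a, b))

# Database of responses with precomputed response embeddings
# (toy 3-dimensional vectors; real embeddings are much larger)
database = [
    ("Nice weather today.",     [0.9, 0.1, 0.0]),
    ("I love science fiction.", [0.0, 0.8, 0.2]),
    ("Goodbye, see you soon.",  [0.1, 0.0, 0.9]),
]

def retrieve(context_embedding):
    # Brute-force scan: return the stored response whose embedding
    # is closest to the embedding of the current turn in context.
    best = max(database, key=lambda entry: dot(entry[1], context_embedding))
    return best[0]

# Suppose the context encoder maps the current turn to this vector:
query = [0.1, 0.9, 0.1]
print(retrieve(query))  # → "I love science fiction."
```

At a million entries this linear scan becomes the bottleneck, which is where the speed-ups come in: quantising the embeddings shrinks each comparison, and factoring the search space avoids comparing against every entry, in the spirit of standard approximate nearest-neighbour methods.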