
In this book I explain (with a little help from my owner Steve) how a conversational agent actually works. It covers a typical day in my life. As I respond to Steve’s various requests, I explain how I understand his intentions and translate these into tasks to perform. I try to keep the explanations simple. Rather than use mathematical equations, I use diagrams to explain how I recognise what he says to me, extract meanings from his words, manage a conversation and generate my spoken responses. As well as purely technical matters, I also discuss issues of privacy and trust, and I speculate a little on future developments.
Hey Cyba @ Cambridge University Press | Amazon US | Amazon UK
What others say about this book …
“Hey Cyba is based on the author’s long history of research and his rich experiences of developing various voice assistant systems. With the current rapid progress and wide deployment of AI-based voice assistant systems all over the world, the publication is very timely, and the book has a very unique and interesting writing style. I strongly recommend it to anyone interested in this area”.
Sadaoki Furui, Toyota Technological Institute at Chicago
“Hey Cyba, written by one of the giants in the field of man-machine interfaces, provides an in-depth guide to the workings and future of conversational personal assistants. Written in the first-person style of the computer itself, this is a highly engaging, informative and authoritative read”.
Dr Hermann Hauser, Amadeus Capital Partners
Here is a typical conversation with Steve:
Steve: Hey Cyba, what time is my first meeting today?
Cyba: It’s at 10am with Bill Philips at SmartCo.
Steve: Ok, see if you can push it back to 10.15 and also invite John.
Cyba: Is that John Smith or John Temple?
Steve: John Temple.
Cyba: Ok changes made, I will let you know if there is a problem.
Steve: And what’s the weather like today?
Cyba: It’s going to be cloudy and 14° with a chance of rain later.
This example shows a few of the things that I have to do apart from recognising the words he speaks and synthesising my spoken responses (which is difficult enough!). To answer his first question, I have to realise that I need to look in his calendar to find a meeting, and I have to know what ‘today’ and ‘first’ mean. In his follow-up, I have to recognise a request to update the start time of a Meeting and identify an attendee named John. Although there are many Johns in his contact list, I know that he interacts most frequently with John Smith and John Temple, so I ask him to clarify which one he means. I then have to switch to a different domain and answer a question about the weather.
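To make that disambiguation step a little more concrete, here is a minimal sketch of how a contact mention such as ‘John’ might be resolved by interaction frequency. It is an illustration only, not the method described in the book: the contact names, counts, threshold and function names are all invented for this example.

```python
# A minimal, hypothetical sketch of contact disambiguation.
# Names, counts and the top_k cut-off are illustrative only.
from dataclasses import dataclass

@dataclass
class Contact:
    full_name: str
    interaction_count: int  # how often Steve has contacted this person

def resolve_attendee(mention: str, contacts: list[Contact], top_k: int = 2):
    """Return a resolved contact, or a clarification question listing
    the most frequently contacted candidates that match the mention."""
    # Keep only contacts whose name contains the mentioned word, e.g. "John".
    candidates = [c for c in contacts if mention.lower() in c.full_name.lower()]
    if not candidates:
        return None, f"I couldn't find anyone called {mention} in your contacts."
    if len(candidates) == 1:
        return candidates[0], None
    # Several matches: rank by interaction frequency and ask about the top few.
    ranked = sorted(candidates, key=lambda c: c.interaction_count, reverse=True)
    names = " or ".join(c.full_name for c in ranked[:top_k])
    return None, f"Is that {names}?"

contacts = [
    Contact("John Smith", 42),
    Contact("John Temple", 37),
    Contact("John Archer", 2),
]
resolved, question = resolve_attendee("John", contacts)
print(question)  # -> "Is that John Smith or John Temple?"
```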
These are just a few examples of the many things that I can do. I can also book flights, taxis and hotels. I can recommend restaurants and make reservations. I can book cinema and theatre tickets, send flowers to friends and perform many other web-based transactions. I can answer general knowledge questions and chat about any topic Steve chooses. Building on a uniform representation of information called a knowledge graph, I make extensive use of neural networks to implement all of these functions. If you browse my inner workings, you will get a good overview of how I do this, but to really understand it you should read my book!
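If you have not met a knowledge graph before, the toy sketch below shows the general idea: entities stored as nodes, connected by typed links that can be followed to answer queries. The node types, relation names and query function are invented for this illustration and are not taken from the book.

```python
# A toy, hypothetical knowledge-graph fragment: nodes are entities,
# links are (head, relation, tail) triples. All names are illustrative.
graph = {
    "nodes": {
        "meeting_001": {"type": "Meeting", "start": "10:15", "date": "today"},
        "person_steve": {"type": "Person", "name": "Steve"},
        "person_jt": {"type": "Person", "name": "John Temple"},
        "org_smartco": {"type": "Company", "name": "SmartCo"},
    },
    "links": [
        ("meeting_001", "attendee", "person_steve"),
        ("meeting_001", "attendee", "person_jt"),
        ("meeting_001", "hosted_by", "org_smartco"),
    ],
}

def attendees(graph, meeting_id):
    """Follow 'attendee' links from a meeting node to the people it connects."""
    return [graph["nodes"][tail]["name"]
            for head, rel, tail in graph["links"]
            if head == meeting_id and rel == "attendee"]

print(attendees(graph, "meeting_001"))  # -> ['Steve', 'John Temple']
```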
Contents
- May I introduce myself?
  - What does a virtual personal assistant do?
  - Some background history
  - My place of work
  - Privacy and trust
  - My goal in life
  - How smart am I?
- My inner workings
  - Anatomy of a conversation
  - My working parts
- How my brain works
  - Patterns
  - Artificial neural networks
  - Training a neural network
  - From static patterns to sequences
  - Convolutional networks
  - Scaling-up
- Knowing what I know
  - My knowledge graph
  - Nodes and links
  - Intent graphs and queries
  - Creating and updating entities
- What did you say?
  - Human speech production
  - My artificial ears
  - Why is speech recognition so hard?
  - Capturing the audio
  - From sounds to words
  - Pay attention
  - Adding a language model
  - A postscript from the Bard …
- What does that mean?
  - Intent graph generation and ranking
  - Entity linking
  - Multi-task classification using a shared encoder
  - Character-based word embedding
  - Sentence encoding and recognition
  - Sentence/intent graph matching
  - Candidate ranking
- What should I say next?
  - My conversation manager
  - Learning a good dialogue policy
  - Conversational memory
  - Generating my response
- Listen to me
  - From text to speech
  - Text processing
  - Neural speech synthesis
  - Setting the right tone
  - Generating the waveform
- How do you say that in …?
  - Transformer networks
  - Using a transformer network for language translation
  - Characters or words or …
  - Multi-lingual translation
  - Beam search
  - The limits of neural machine translation
- Let’s chat
  - My chatty responders
  - Hand-crafted response generation
  - Retrieval-based response generation
  - Web-search response generation
  - Encoder-decoder response generation
  - Selecting the best response
  - Social chatbots
- Can you trust me?
  - Security
  - Privacy
  - Bias
  - Transparency
  - Safety
  - Personality
  - The bottom line
- When all is quiet
  - Knowledge graph maintenance
  - Federated learning
  - Student-teacher model reduction
  - End of the tour
- Future upgrades and beyond
  - Personalisation
  - Towards self-learning
  - The neural-symbolic interface
  - Commonsense reasoning and inference
  - Planning
  - Super-intelligence?
- Another day