As part of Visual Meta’s first Hackathon we were asked to submit our ideas, which we would then have the opportunity to pitch at the beginning of the event. Being a language technology guy, I’d pitched a hack on Automatic Summarisation technology that Cemal, one of the developers on my team, had agreed beforehand to work on with me.
But one of the other pitches caught my attention. Hande Demirtas, one of our UX/UI Designers at Visual Meta, had a nice idea to try and implement a conversational agent for our E-commerce platform. As I’m sure most reading this will know, conversational agents, intelligent personal assistants or chatbots as they are sometimes known, have been around for some time now. Indeed, with the arrival of Siri, Cortana and co, one can argue they are already mainstream. With only two days plus some of the weekend to work on this topic, there was no way we could hope to build something of such complexity. Nonetheless, Cemal and I decided to join forces with Hande and ditch my original idea. Besides, working on this sounded like a lot more fun.
What follows is a very brief technical overview of the prototype we built around this concept for the hackathon.
An example conversation with our prototype, known as Lara, is shown in Figure 1. There are a number of challenges associated with building such a prototype, particularly with handling ambiguous input and keeping track of dialogue state, but with a small amount of previous experience working on this topic we had an idea of a few technologies that might help.
Figure 1. Example Conversation with Lara
Probably the biggest challenge in terms of the hackathon was how to restrict the scope of work to something that could be accomplished in a few days, yet still provide a compelling minimum viable product (MVP).
As a team we agreed to the following scope:
- Restrict the function of Lara to the navigation of products on our platform.
- Provide both spoken and written input / output options.
- Implement only minimal dialogue state management to avoid asking repeated questions.
- Implement only basic input error handling when the user input is not understood.
- Use only very shallow semantics (keyword-based) to try and understand what the user is looking for.
- Build the prototype for the English language.
- Use http://www.shopalike.in as the test site.
Based on the remit above, I’m sure you can guess that Lara is not truly intelligent yet! But not to put a good lady down, we were able to implement enough functionality to provide potential users with a natural language interface to our Indian website that would allow them to browse products. Sure, Lara is not perfect or production ready, but that’s what hackathons are about.
Figure 2 below provides a schematic of the implemented interaction flow for Lara.
Figure 2. User Interaction Flow
By now I’m sure you’re asking: so how did you implement it?
We use Java at Visual Meta. Hi Lara! is a simple Java web application that, for the purposes of the hackathon, we deployed as an executable WAR file with an embedded Apache Tomcat server using the Maven Tomcat plugin (see here: http://tomcat.apache.org/maven-plugin-trunk/tomcat7-maven-plugin/). The architecture can be roughly broken down into five components that are depicted (blue square boxes) in Figure 3 below. Next I will briefly describe each one in turn.
Figure 3. High Level Component Diagram
Automatic Speech Recognition (ASR)
For speech input, we made use of the Google Web Speech API. It’s currently only supported on Chrome but gives good quality speech recognition for the purposes of a prototype. Check it out here: https://developers.google.com/web/updates/2013/01/Voice-Driven-Web-Apps-Introduction-to-the-Web-Speech-API?hl=en. As noted previously, we also gave users the option to switch between using speech recognition and using an instant messaging box.
Text To Speech (TTS)
We tried out the Cereproc cloud developer API for synthesised speech output. Cereproc provides an impressive range of voice options and the developer API is a great option if you are just trying things out. Check it out here: https://www.cereproc.com/en/products/cloud
Web Site Navigation
For navigating http://www.shopalike.in based on the user input, we simply made use of the search bar functionality on the site over HTTP. We make use of Lucene for search at Visual Meta. In order to ensure that the search results were as relevant as possible (as well as help keep track of dialogue state), we made use of dictionary-based query tagging (explained next) to assign semantic tags to the user input, so that only a subset of input tokens were used for searching.
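To make the idea of searching on a subset of tokens concrete, here is a minimal sketch of building a search request from the tagged tokens only. The endpoint path and the `q` parameter are placeholders I have made up for illustration (the post does not describe the site’s actual URL scheme), and `SearchUrlBuilder` is a hypothetical name rather than a class from the prototype.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

class SearchUrlBuilder {
    // Hypothetical endpoint and parameter name; the real search URL may differ.
    static final String SEARCH_ENDPOINT = "http://www.shopalike.in/search?q=";

    // Joins only the tokens that received a semantic tag and URL-encodes them.
    static String build(List<String> taggedTokens) {
        String query = String.join(" ", taggedTokens);
        try {
            return SEARCH_ENDPOINT + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform.
            throw new IllegalStateException("UTF-8 not available", e);
        }
    }

    public static void main(String[] args) {
        // "I'd", "like" and "some" carry no tag, so they are left out of the query.
        System.out.println(build(Arrays.asList("black", "adidas", "trainers")));
    }
}
```

Dropping the untagged tokens before the request is what keeps filler words from diluting the search results.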
Query Tagging
Lara needs to understand the user input to be able to provide appropriate responses and navigate the web site for the user. As mentioned before, this serves two purposes:
- to avoid asking redundant questions of the user; and
- to provide better search results by only searching on the relevant tokens in the input.
For example, when faced with user input such as “I’d like some black adidas trainers”, we’d like to tag the token “black” as a colour, the token “adidas” as a brand and the token “trainers” as the product category. For current purposes all other tokens are superfluous. Based on such information, Lara can work out that there is no need to ask the user for the colour, brand or category of a product the user is looking for and can update her dialogue state accordingly.
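The bookkeeping described above can be sketched as a small set of slots that get filled from the tags. This is only an illustration under my own assumptions: the class name `DialogueState`, the slot names and the order in which missing slots are reported are all hypothetical, not taken from the prototype.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

class DialogueState {
    // Slots in the (assumed) order Lara would ask about them; null means "not known yet".
    private final Map<String, String> slots = new LinkedHashMap<>();

    DialogueState() {
        for (String slot : new String[] {"category", "brand", "colour"}) {
            slots.put(slot, null);
        }
    }

    // Records a tagged value; tags that do not match a known slot are ignored.
    void fill(String slot, String value) {
        if (slots.containsKey(slot)) {
            slots.put(slot, value);
        }
    }

    // Returns the first slot that still needs a question, or empty if nothing is missing.
    Optional<String> nextMissingSlot() {
        for (Map.Entry<String, String> entry : slots.entrySet()) {
            if (entry.getValue() == null) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Input such as "black trainers" fills two slots, so only the brand is left to ask about.
        DialogueState state = new DialogueState();
        state.fill("colour", "black");
        state.fill("category", "trainers");
        System.out.println(state.nextMissingSlot()); // Optional[brand]
    }
}
```

With all three slots filled, as in the adidas example, `nextMissingSlot()` comes back empty and no further question is needed.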
We implemented the query tagging using a simple dictionary-based approach. This involved creating an inverted index from words to semantic tags in Lucene, built from the product inventory of our Indian website. Individual word lists for each tag were indexed using Lucene. To assign the actual tags, we used a simple top-down, greedy approach that involves generating a set of shingles (n-grams, or contiguous sequences of tokens) from the user input, which are then queried against the word-to-tag index. The top-down, greedy approach accounts for names with multiple tokens, e.g. “Hugo Boss”. For example, the longest shingles generated from the user input above are:
- “I’d like some black adidas trainers”
- “I’d like some black adidas”
- “like some black adidas trainers”
The resulting output of query tagging for the above input is:
- brands : [adidas]
- categories : [trainers]
- colours : [black]
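The top-down, greedy shingle lookup can be sketched roughly as follows. To keep the example self-contained, a plain in-memory map stands in for the Lucene word-to-tag index, and the dictionary entries are made up for illustration. Skipping shingles that overlap an already tagged span is my reading of “greedy”, and `ShingleTagger` is a hypothetical name, so treat this as a sketch rather than the prototype’s code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ShingleTagger {
    // Stand-in for the Lucene index: maps a word or phrase to its semantic tag.
    static final Map<String, String> DICTIONARY = new HashMap<>();
    static {
        DICTIONARY.put("black", "colours");
        DICTIONARY.put("adidas", "brands");
        DICTIONARY.put("hugo boss", "brands");
        DICTIONARY.put("trainers", "categories");
    }

    static Map<String, List<String>> tag(String input) {
        // Naive whitespace tokenisation; punctuation is not stripped in this sketch.
        String[] tokens = input.toLowerCase().trim().split("\\s+");
        boolean[] used = new boolean[tokens.length];
        Map<String, List<String>> tags = new TreeMap<>();
        // Top-down: start with the longest shingle and work down to single tokens.
        for (int n = tokens.length; n >= 1; n--) {
            for (int start = 0; start + n <= tokens.length; start++) {
                if (overlaps(used, start, n)) {
                    continue; // greedy: a longer match has already claimed these tokens
                }
                String shingle = String.join(" ", Arrays.copyOfRange(tokens, start, start + n));
                String tag = DICTIONARY.get(shingle);
                if (tag != null) {
                    tags.computeIfAbsent(tag, k -> new ArrayList<>()).add(shingle);
                    Arrays.fill(used, start, start + n, true);
                }
            }
        }
        return tags;
    }

    static boolean overlaps(boolean[] used, int start, int n) {
        for (int i = start; i < start + n; i++) {
            if (used[i]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Prints {brands=[adidas], categories=[trainers], colours=[black]}
        System.out.println(tag("I'd like some black adidas trainers"));
    }
}
```

Because longer shingles are tried first, “hugo boss” is matched as a single brand before either word could be looked up on its own.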
Dialogue State Management
To our surprise we were able to build an MVP for a conversational agent that demonstrated a natural language interface for product navigation on one of our websites in a matter of days. This took quite some effort by a small three-person team, but proved to us the utility of having a cross-functional team at a hackathon: the Graphics, Product Management and Engineering departments were all represented.
We hope to devote more time to developing the Hi Lara! concept in the future, so watch this space!
P.S. If you’re wondering where the name Lara came from, there’s no clever acronym behind it; it’s just easy to say and remember!