Hi Lara! Building a Conversational Agent For Visual Meta’s First Hackathon

By Ross Turner, 12 months ago
Introduction

As part of Visual Meta’s first hackathon, we were asked to submit our ideas, which we would then have the opportunity to pitch at the beginning of the event. Being a language technology guy, I’d pitched a hack on automatic summarisation technology that Cemal, one of the developers on my team, had agreed to work on with me beforehand. But one of the other pitches caught my attention. Hande Demirtas, one of our UX/UI Designers at Visual Meta, had a nice idea: to try to implement a conversational agent for our e-commerce platform. As I’m sure most people reading this will know, conversational agents, also known as intelligent personal assistants or chatbots, have been around for some time now. Indeed, with the arrival of Siri, Cortana and co., one can argue they are already mainstream. With only two days plus some of the weekend to work on this topic, there was no way we could hope to build something of such complexity. Nonetheless, Cemal and I decided to join forces with Hande and ditch my original idea. Besides, working on this sounded like a lot more fun. What follows is a very brief technical overview of the prototype we built around this concept for the hackathon.

System Overview

An example conversation with our prototype, known as Lara, is shown in Figure 1. There are a number of challenges associated with building such a prototype, particularly handling ambiguous input and keeping track of dialogue state, but with a small amount of previous experience working on this topic we had an idea of a few technologies that might help.

Figure 1. Example Conversation with Lara

Probably the biggest challenge of the hackathon was how to restrict the scope of work to something that could be accomplished in a few days, yet still provide a compelling minimum viable product (MVP). As a team we agreed to the following scope:
  • Restrict the function of Lara to the navigation of products on our platform.
  • Provide both spoken and written input / output options.
  • Implement only minimal dialogue state management to avoid asking repeated questions.
  • Implement only basic input error handling when the user input is not understood.
  • Use only very shallow semantics (keyword-based) to try and understand what the user is looking for.
  • Build the prototype for the English language.
  • Use http://www.shopalike.in as the test site.
Based on the remit above, I’m sure you can guess that Lara is not truly intelligent yet! But, not to put a good lady down, we were able to implement enough functionality to give potential users a natural language interface to our Indian website that allows them to browse products. Sure, Lara is not perfect or production-ready, but that’s what hackathons are about. Figure 2 below provides a schematic of the implemented interaction flow for Lara.

Figure 2. User Interaction Flow

Technical details

By now I’m sure you’re asking: so how did you implement it? We use Java at Visual Meta. Hi Lara! is a simple Java web application that, for the purposes of the hackathon, we deployed as an executable WAR file with an embedded Apache Tomcat server using the Maven Tomcat plugin (see here: http://tomcat.apache.org/maven-plugin-trunk/tomcat7-maven-plugin/). The architecture can be roughly broken down into five components, depicted as blue square boxes in Figure 3 below. Next I will briefly describe each one in turn.

Figure 3. High Level Component Diagram

Automatic Speech Recognition (ASR)

For speech input, we made use of Google’s Web Speech API. It’s currently only supported in Chrome, but it gives good quality speech recognition for the purposes of a prototype. Check it out here: https://developers.google.com/web/updates/2013/01/Voice-Driven-Web-Apps-Introduction-to-the-Web-Speech-API?hl=en. As noted previously, we also gave users the option to switch between speech recognition and an instant messaging box.

Text To Speech (TTS)

We tried out the Cereproc cloud developer API for synthesised speech output. Cereproc provides an impressive range of voice options and the developer API is a great option if you are just trying things out. Check it out here: https://www.cereproc.com/en/products/cloud.

Web Site Navigation

For navigating http://www.shopalike.in based on the user input, we simply made use of the site’s search bar functionality over HTTP. We use Lucene for search at Visual Meta. To ensure that the search results were as relevant as possible (and to help keep track of dialogue state), we used dictionary-based query tagging (explained next) to assign semantic tags to the user input, so that only a subset of the input tokens was used for searching.
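
To make the idea of searching on only a subset of tokens concrete, here is a minimal sketch in Java. It assumes the tags have already been assigned. The class name, the `example.com` endpoint and the `q` parameter are all hypothetical placeholders, since the real search URL is not part of this post.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class SearchUrlSketch {
    // Hypothetical endpoint and parameter name (assumption, not the real site URL).
    static final String SEARCH_BASE = "https://example.com/search?q=";

    /** Joins only the tagged tokens, dropping everything else the user typed. */
    static String queryFromTags(Map<String, List<String>> tags) {
        List<String> terms = new ArrayList<>();
        for (List<String> values : tags.values()) {
            terms.addAll(values);
        }
        return String.join(" ", terms);
    }

    /** Builds the search URL, URL-encoding the reduced query. */
    static String searchUrl(Map<String, List<String>> tags) {
        try {
            return SEARCH_BASE + URLEncoder.encode(queryFromTags(tags), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

With tags for the colour “black”, the brand “adidas” and the category “trainers”, the filler words in the original sentence never reach the search engine; only those three terms do.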

Query Tagging

Lara needs to understand the user input to be able to provide appropriate responses and navigate the web site for the user. As mentioned before, this serves two purposes, namely to:
  1. avoid asking redundant questions of the user; and
  2. provide better search results by only searching on the relevant tokens in the input.
For example, when faced with user input such as “I’d like some black adidas trainers”, we’d like to tag the token “black” as a colour, the token “adidas” as a brand and the token “trainers” as the product category. For current purposes all other tokens are superfluous. Based on this information, Lara can work out that there is no need to ask the user for the colour, brand or category of the product they are looking for, and can update her dialogue state accordingly.

We implemented the query tagging using a simple dictionary-based approach. This involved creating an inverted index in Lucene, mapping words to semantic tags, from the product inventory of our Indian website; the individual word lists for each tag were indexed. To assign the actual tags, we used a simple top-down, greedy approach: we generate a set of shingles (n-grams, i.e. contiguous sequences of tokens) from the user input, starting with the longest, and query each against the words-to-tag index. Working top-down and greedily accounts for names with multiple tokens, e.g. “Hugo Boss”. For example, the user input above becomes:
  • “I’d like some black adidas trainers”
  • “I’d like some black adidas”
  • “like some black adidas trainers”
  • …..
  • “black”
  • “adidas”
  • “trainers”
The resulting output of query tagging for the above input is:
  • brands : [adidas]
  • categories : [trainers]
  • colours : [black]
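
The tagging steps above can be sketched roughly as follows. This is not our hackathon code: a plain in-memory map stands in for the Lucene words-to-tag index, tokens are split on whitespace, and a shingle is skipped once any of its tokens is already covered by a longer match. All three are assumptions made for illustration, as are the class and method names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ShingleTagger {
    // Stand-in for the Lucene words-to-tag index (assumption): phrase -> tag.
    private final Map<String, String> phraseToTag = new HashMap<>();

    void add(String tag, String... phrases) {
        for (String phrase : phrases) {
            phraseToTag.put(phrase.toLowerCase(), tag);
        }
    }

    /** All contiguous token sequences, longest first, left to right within each length. */
    static List<String> shingles(String[] tokens) {
        List<String> out = new ArrayList<>();
        for (int n = tokens.length; n >= 1; n--) {
            for (int start = 0; start + n <= tokens.length; start++) {
                out.add(String.join(" ", Arrays.copyOfRange(tokens, start, start + n)));
            }
        }
        return out;
    }

    /** Greedy tagging: the longest match wins and its tokens are not tagged again. */
    Map<String, List<String>> tag(String input) {
        String[] tokens = input.toLowerCase().trim().split("\\s+");
        boolean[] used = new boolean[tokens.length];
        Map<String, List<String>> tags = new TreeMap<>();
        for (int n = tokens.length; n >= 1; n--) {
            for (int start = 0; start + n <= tokens.length; start++) {
                if (anyUsed(used, start, n)) {
                    continue; // part of this shingle already belongs to a longer match
                }
                String shingle = String.join(" ", Arrays.copyOfRange(tokens, start, start + n));
                String tag = phraseToTag.get(shingle);
                if (tag == null) {
                    continue; // shingle is not in the dictionary
                }
                tags.computeIfAbsent(tag, k -> new ArrayList<>()).add(shingle);
                Arrays.fill(used, start, start + n, true);
            }
        }
        return tags;
    }

    private static boolean anyUsed(boolean[] used, int start, int n) {
        for (int i = start; i < start + n; i++) {
            if (used[i]) {
                return true;
            }
        }
        return false;
    }
}
```

Because longer shingles are tried first, a dictionary containing both “hugo boss” and “boss” tags the two-token brand once rather than matching “boss” on its own.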

Dialogue State Management

The dialogue state management for Lara is implemented in just a few lines of JavaScript. A fixed set of questions, relating to filters already available on the website, can be asked of the user. To introduce variation and make the interaction less mundane, the phrasing of each question is selected at random from a set of predefined canned text templates. Clarification responses are also provided where user input cannot be understood (that is, when no semantic tags could be applied). This component of the prototype would be the main focus of any further development.
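
The behaviour described above might look something like the sketch below. Our prototype did this in JavaScript; the sketch is written in Java only to stay consistent with the rest of the stack, and the slot names, question templates and class name are invented for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class DialogueStateSketch {
    static final String CLARIFICATION = "Sorry, I didn't catch that. Could you rephrase?";
    static final String DONE = "Great, let me show you what I found.";

    // One slot per website filter; each slot has several canned phrasings (all hypothetical).
    private final Map<String, List<String>> templates = new LinkedHashMap<>();
    private final Map<String, String> filled = new LinkedHashMap<>();
    private final Random random;

    DialogueStateSketch(Random random) {
        this.random = random;
        templates.put("categories", Arrays.asList(
                "What are you looking for?", "What kind of product would you like?"));
        templates.put("brands", Arrays.asList(
                "Do you have a brand in mind?", "Which brand do you prefer?"));
        templates.put("colours", Arrays.asList(
                "Which colour would you like?", "Any particular colour?"));
    }

    /** Records the tagger output and returns Lara's next utterance. */
    String respond(Map<String, List<String>> tags) {
        if (tags.isEmpty()) {
            return CLARIFICATION; // no semantic tags could be applied
        }
        for (Map.Entry<String, List<String>> e : tags.entrySet()) {
            if (templates.containsKey(e.getKey()) && !e.getValue().isEmpty()) {
                filled.put(e.getKey(), e.getValue().get(0));
            }
        }
        return nextQuestion();
    }

    /** Asks about the first slot that is still empty, with a randomly chosen phrasing. */
    String nextQuestion() {
        for (Map.Entry<String, List<String>> e : templates.entrySet()) {
            if (!filled.containsKey(e.getKey())) {
                List<String> options = e.getValue();
                return options.get(random.nextInt(options.size()));
            }
        }
        return DONE;
    }

    Map<String, String> filledSlots() {
        return filled;
    }
}
```

Since a slot that is already filled is never asked about again, a user who opens with “black adidas trainers” skips all three questions and goes straight to the results.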

Conclusion

To our surprise, we were able to build an MVP for a conversational agent that demonstrated a natural language interface for product navigation on one of our websites in a matter of days. This took quite some effort from a small three-person team, but it proved to us the utility of having a cross-functional team at a hackathon: the Graphics, Product Management and Engineering departments were all represented. We hope to devote more time to developing the Hi Lara! concept in the future, so watch this space!

P.S. If you’re wondering where the name Lara came from, there’s no clever acronym behind it; it’s just easy to say and remember!
About Ross Turner

Ross Turner joined Visual Meta in late 2015. Previously he worked in a number of different companies developing software solutions in the search and language technology space. Today, he is a Senior Product Manager leading the Cheetahs engineering team.
