At Visual Meta we continuously sync product feeds from our partners to keep our online shopping portals up to date. With over 100,000,000 items on the platform, this is a challenging task. To make products easily discoverable on those portals, we need to identify what type of product each feed item is by mapping it to our internal product catalogue. This entails applying to each item a set of labels that identify the product.
We make extensive use of Machine Learning for this task. At Visual Meta we employ both image and text classification techniques. This post will focus on how we have applied a specific technique from the field of Natural Language Processing to boost the performance of our text classifiers.
An obvious, but perhaps naive, approach to assigning product feed items to a product catalogue is to use string matching techniques. The challenge with this approach is that shops do not name the same item consistently, let alone describe it in similar terms. For example, Figure 1 shows the distribution of item name lengths in the product feeds of 3 large shops.
Figure 1. Item Name Length Distributions Across Shops
A more robust approach is to classify items based on their name and description using supervised machine learning. At Visual Meta we use large labelled datasets and a number of different algorithms to choose, for each item, from among 25,000 potential labels. To use these algorithms, we need an appropriate feature representation for each item.
A very common feature representation for classifying text documents is the bag of words model, in which a document is represented as a vector where each dimension corresponds to a single word. For each product category and subcategory in our product catalogue, such as shoes or smartphones, we have a manually curated dictionary containing words that represent that category.
To classify whether a given item belongs to a given category, we build a feature vector from the item name and description that holds the number of occurrences of each dictionary word in the item text. The training data consists of (feature vector, label) pairs, as shown in Figure 2.
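As a minimal sketch of the feature extraction described above, the snippet below counts how often each dictionary word occurs in an item's name and description and pairs the result with a label. The dictionary `SMARTPHONE_DICTIONARY`, the regex tokenizer and the example item are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical dictionary for a "smartphones" category, for illustration only.
SMARTPHONE_DICTIONARY = ["smartphone", "android", "dual", "sim", "galaxy", "iphone"]


def tokenize(text):
    """Lower-case the text and split it into word tokens (letters, digits, underscore)."""
    return re.findall(r"\w+", text.lower())


def bow_feature_vector(name, description, dictionary):
    """Return the count of each dictionary word in the item name and description.

    Entry i of the result is the number of times dictionary[i] occurs.
    """
    counts = Counter(tokenize(name) + tokenize(description))
    return [counts[word] for word in dictionary]


# One (feature vector, label) training pair; label 1 means "belongs to the category".
name = "Samsung Galaxy S7 Dual SIM Smartphone"
description = "Android smartphone with dual SIM support"
pair = (bow_feature_vector(name, description, SMARTPHONE_DICTIONARY), 1)
print(pair)  # ([2, 1, 2, 2, 1, 0], 1)
```

Words outside the dictionary, such as "samsung" in this example, contribute nothing to the vector, so the quality of these features depends directly on how the dictionaries are curated.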
Figure 2. A Feature Vector Label Pair
There are some well-known limitations of bag of words models, in particular:
Neural word embeddings are an increasingly common way of representing words. This approach represents each word as a low-dimensional, dense, continuous vector that can be learnt using a shallow neural network. One such implementation is Word2Vec, which comes in two flavours:
- Continuous Bag of Words (CBOW), which predicts a word from the words surrounding it.
- Skip-gram, which predicts the surrounding words from a single word.
Figure 3 shows a schematic of both models. Compared with bag of words models, Word2Vec offers the following advantages:
Robust open source implementations of Word2Vec, such as those in Spark MLlib (https://spark.apache.org/docs/latest/mllib-feature-extraction.html#word2vec) and gensim (https://radimrehurek.com/gensim/models/word2vec.html), are readily available.
Figure 3. Word2Vec Models¹
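To make the difference between Word2Vec's two flavours, continuous bag of words (CBOW) and skip-gram, more concrete, the sketch below shows how each one turns a tokenized item name into training examples for a symmetric context window. Skip-gram produces (centre word, context word) pairs, while CBOW produces (context words, target word) examples. It stops at example generation: it does not train the shallow network, and refinements used by real implementations, such as negative sampling, are omitted. The function names and example tokens are hypothetical.

```python
def skipgram_pairs(tokens, window):
    """Skip-gram: emit (centre word, context word) pairs.

    The network is trained to predict each context word from the centre word.
    """
    pairs = []
    for i, centre in enumerate(tokens):
        start, stop = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs


def cbow_examples(tokens, window):
    """CBOW: emit (context words, target word) examples.

    The network is trained to predict the target word from its context words.
    """
    examples = []
    for i, target in enumerate(tokens):
        start, stop = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(start, stop) if j != i]
        examples.append((context, target))
    return examples


tokens = ["samsung", "galaxy", "s7", "smartphone"]
print(skipgram_pairs(tokens, window=1))
print(cbow_examples(tokens, window=1))
```

Skip-gram yields one example per (word, neighbour) pair, so it produces more, smaller examples from the same text than CBOW, which folds a whole context into a single example.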
To demonstrate the utility of Word2Vec, we trained it on a product feed of mobile phones containing 7,890 items and a total of 863,000 tokens (41.5k unique). Table 1 shows the 3 nearest neighbours of “Galaxy” in this model, and Figure 4 shows a 2-dimensional projection, obtained with Principal Component Analysis (PCA), of the brand words “Apple” and “Samsung” and their nearest neighbours.
Table 1: Closest Word to “Galaxy” with Word2Vec
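Nearest neighbours like those in Table 1 are typically found by ranking words by the cosine similarity of their vectors (Spark's `Word2VecModel.findSynonyms(word, num)` answers this kind of query, for example). The sketch below illustrates both steps behind Table 1 and Figure 4 on a handful of hypothetical 3-dimensional vectors: a top-k cosine neighbour search, and a PCA projection to two dimensions computed with power iteration. The vectors in `EMBEDDINGS` are made up for illustration and are not output from our model.

```python
import math

# Hypothetical 3-dimensional word vectors, for illustration only.
EMBEDDINGS = {
    "galaxy": [0.9, 0.1, 0.3],
    "samsung": [0.8, 0.2, 0.35],
    "note": [0.85, 0.05, 0.2],
    "iphone": [0.1, 0.9, 0.3],
    "apple": [0.15, 0.85, 0.25],
    "case": [0.3, 0.3, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest_neighbours(word, embeddings, k=3):
    """Return the k (word, similarity) pairs most similar to `word`."""
    query = embeddings[word]
    scored = [(other, cosine(query, vec)) for other, vec in embeddings.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def pca_2d(vectors, iterations=500):
    """Project vectors onto their top two principal components.

    Uses power iteration with deflation on the covariance matrix, which is
    adequate for tiny illustrative inputs only.
    """
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(d)]
    centred = [[v[i] - mean[i] for i in range(d)] for v in vectors]
    cov = [[sum(row[i] * row[j] for row in centred) / n for j in range(d)] for i in range(d)]
    components = []
    for _ in range(2):
        vec = [float(i + 1) for i in range(d)]  # arbitrary non-zero starting vector
        for _ in range(iterations):
            nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in nxt))
            vec = [x / norm for x in nxt]
        # Rayleigh quotient: eigenvalue belonging to the converged eigenvector.
        eigenvalue = sum(vec[i] * cov[i][j] * vec[j] for i in range(d) for j in range(d))
        # Deflate so the next pass converges to the next principal component.
        cov = [[cov[i][j] - eigenvalue * vec[i] * vec[j] for j in range(d)] for i in range(d)]
        components.append(vec)
    return [[sum(row[i] * comp[i] for i in range(d)) for comp in components] for row in centred]


print(nearest_neighbours("galaxy", EMBEDDINGS))
words = list(EMBEDDINGS)
for word, (x, y) in zip(words, pca_2d([EMBEDDINGS[w] for w in words])):
    print(f"{word:>10} {x:+.3f} {y:+.3f}")
```

Power iteration keeps the sketch dependency-free; with a numerical library one would normally use an eigendecomposition or scikit-learn's `sklearn.decomposition.PCA` instead.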
Our main interest in applying Word2Vec to our product feeds was to test the effect of replacing our bag of words feature vectors with word embeddings. In particular, we hoped to see a significant improvement in classification performance for product categories for which we had less data. Table 2 compares, for a small selection of product categories, the F-score of a Spark decision tree trained on features from Spark's Word2Vec implementation with that of our best bag of words classifier (selected by cross-validation). As the table shows, Word2Vec brought significant improvements, across languages, for categories where existing performance was low.
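A word embedding model yields one vector per word, so an item's words still have to be combined into a single fixed-length feature vector before a classifier can use them. One common choice, and the one Spark ML's `Word2VecModel` makes when it transforms a column of documents, is to average the word vectors. The sketch below assumes that approach; `average_embedding` and the toy vectors are hypothetical, and skipping out-of-vocabulary tokens is a simplification of this sketch.

```python
# Hypothetical 2-dimensional word vectors, for illustration only.
TOY_EMBEDDINGS = {
    "galaxy": [1.0, 0.0],
    "smartphone": [0.0, 1.0],
}


def average_embedding(tokens, embeddings, dim):
    """Represent an item as the mean of the vectors of its known tokens.

    Tokens without an embedding are skipped; if none are known,
    the zero vector of length `dim` is returned.
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


item_tokens = ["samsung", "galaxy", "smartphone"]  # "samsung" has no vector here
print(average_embedding(item_tokens, TOY_EMBEDDINGS, dim=2))  # [0.5, 0.5]
```

The resulting dense vectors take the place of the sparse dictionary-count vectors as classifier input, and unlike those counts they have the same length regardless of how large each category's dictionary is.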
Figure 4. Word Clustering with Word2Vec
|Class|Best BOW Classifier|||Spark Decision Tree with Word2Vec|||
|“Bett mit Schubladen”|0.52|0.65|0.29|0.70|0.81|0.62|
Table 2: Classification Results with Word2Vec
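Assuming the F-score in Table 2 is the usual F1 score (the harmonic mean of precision and recall), it can be computed from binary predictions as sketched below. The `beta` parameter generalises it to the F-beta score, and the labels in the usage line are hypothetical.

```python
def f_score(y_true, y_pred, beta=1.0):
    """F-beta score for binary labels (beta=1 gives F1, the harmonic mean
    of precision and recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)        # true positives
    fp = sum(1 for t, p in pairs if not t and p)    # false positives
    fn = sum(1 for t, p in pairs if t and not p)    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical labels: 1 = item belongs to the category.
print(f_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))  # precision = recall = 2/3, so F1 = 2/3
```

Because F1 punishes a low value of either precision or recall, it suits per-category comparisons where positive items are rare and plain accuracy would look deceptively high.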
Word2Vec is a powerful model for representing the semantics of words that can overcome the shortcomings of bag of words models. We have applied Word2Vec to the problem of classifying items in product feeds with promising results. Because word embeddings are learnt without supervision and can represent relationships between words, we see great potential in using them further.