Kamelia Aryafar is part of the Data Science team at creative marketplace Etsy. At Airbnb’s OpenAir Tech Talks, she described how the uniqueness of the products on offer means that approaches to machine learning are tailored to each individual Product team.
Handmade items on Etsy are almost invariably one of a kind, so how do you build a recommender system that serves up relevant content?
Part of the solution comes from the community, both through open sourcing their work and through a form of clustering based on user scenarios and tastes. The culture at Etsy is also shaped by embedding Data Scientists in engineering teams. This makes the implementation of algorithms a team effort and spreads knowledge of machine learning techniques, rather than leaving Data Science as a separate, almost mutually exclusive competency. Aryafar says,
The Data Science team at Etsy is seven people right now and we are also embedded with different Product teams so we are not just focussing on research and developing mathematical models but we also implement these models ourselves, so we pair up with different teams so for example I pair up with security, marketing and membership and try to develop machine learning models that can help these teams and pair up with them to serve recommendations throughout the site.
This helps create an environment where people can actually learn machine learning and adopt the tools that we develop and apply them to their own needs, so the security team has been a very good example of using Conjecture, our open source library for detection of fraud.
Etsy use matrix factorization heavily to serve up recommendations. This involves using lower-dimensional vectors to represent preferences based on what users have liked in the past, and making recommendations based on the user vector and the item vector with specific features. Aryafar explains that the goal of matrix factorization is to decompose what would otherwise be an enormous matrix of binary entries covering all items into two smaller representations, which can then be used to fill in the rest of the matrix and rank the items the user is actually interested in.
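As a rough illustration of the idea, the sketch below factorizes a tiny binary matrix with stochastic gradient descent and uses the learned vectors to score entries that were never observed. It is not Etsy's implementation: the function names (factorize, score), the toy data, the squared-error objective and all hyperparameters are assumptions made for this example.

```python
import random


def score(user_vec, item_vec):
    """Predicted preference: dot product of a user vector and an item vector."""
    return sum(a * b for a, b in zip(user_vec, item_vec))


def factorize(observed, n_users, n_items, k=2, epochs=500, lr=0.05, reg=0.01, seed=0):
    """Learn k-dimensional user and item vectors from (user, item, value) triples.

    Hypothetical helper: plain SGD on squared error with L2 regularization.
    """
    rng = random.Random(seed)
    users = [[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(n_users)]
    items = [[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(n_items)]
    samples = list(observed)
    for _ in range(epochs):
        rng.shuffle(samples)
        for u, i, value in samples:
            err = value - score(users[u], items[i])
            for f in range(k):
                uf, vf = users[u][f], items[i][f]
                users[u][f] += lr * (err * vf - reg * uf)
                items[i][f] += lr * (err * uf - reg * vf)
    return users, items


# Toy data (made up): 1 = liked, 0 = not liked, None = never observed.
matrix = [
    [1, None, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, None, 1],
]
observed = [
    (u, i, v)
    for u, row in enumerate(matrix)
    for i, v in enumerate(row)
    if v is not None
]

users, items = factorize(observed, n_users=4, n_items=4)

# Fill in the matrix for user 0 and rank all items by predicted score.
ranking = sorted(range(4), key=lambda i: score(users[0], items[i]), reverse=True)
print(ranking)
```

User 0 has no recorded opinion on item 1, but shares a like with user 1, who did like it, so the filled-in score for that entry should come out above the scores for the items user 0 did not like.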
To avoid the computational cost of serving recommendations from enormous matrices for each of the more than 50 million users, Etsy use locality sensitive hashing to pool similar items into buckets, on the assumption that users with similar tastes will fall into the same bucket.
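The bucketing idea can be sketched with one common family of locality sensitive hashes, random hyperplanes. The talk does not say which hashing scheme Etsy use, so the scheme, the function names (make_hyperplanes, signature, build_buckets) and the example vectors are all assumptions for illustration.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def build_buckets(vectors, planes):
    """Group keys whose vectors share the same signature."""
    buckets = defaultdict(list)
    for key, vec in vectors.items():
        buckets[signature(vec, planes)].append(key)
    return buckets


# Made-up taste vectors, e.g. the output of a matrix factorization step.
vectors = {
    "user_a": [0.9, 0.1, 0.0],
    "user_b": [1.8, 0.2, 0.0],    # same direction as user_a, larger magnitude
    "user_c": [-0.9, -0.1, 0.0],  # opposite direction to user_a
}

planes = make_hyperplanes(dim=3, n_bits=8)
for sig, members in build_buckets(vectors, planes).items():
    print(sig, members)
```

Vectors that point in the same direction always receive the same signature, and vectors at a small angle usually do, so candidate neighbours can be looked up within a bucket instead of by comparing against every user or item.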
Etsy also use Latent Dirichlet Allocation to generate style profiles for all items and build a distribution for each user over those style profiles. Aryafar says that ‘LDA essentially gives you these style profiles and it also gives you the distribution of users preference in terms of these style profiles’. These distributions then allow users to be positioned near other users who have similar style profiles in order to serve recommendations.
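The last step, positioning users near others with similar style profiles, can be sketched as a nearest-neighbour search over per-user style distributions. The example below assumes the LDA step has already produced those distributions; the numbers are invented, and the choice of Hellinger distance and the names hellinger and nearest_users are assumptions for illustration, not something stated in the talk.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical)."""
    return math.sqrt(
        0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    )


def nearest_users(target, profiles, n=1):
    """Return the n users whose style distributions are closest to the target's."""
    distances = [
        (hellinger(profiles[target], dist), user)
        for user, dist in profiles.items()
        if user != target
    ]
    return [user for _, user in sorted(distances)[:n]]


# Made-up distributions over three hypothetical style profiles.
profiles = {
    "ana": [0.7, 0.2, 0.1],
    "ben": [0.6, 0.3, 0.1],
    "cy": [0.1, 0.1, 0.8],
}

print(nearest_users("ana", profiles))
```

Items liked by the nearest users then become candidate recommendations for the target user.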
The value of having a data science team for recommender systems at Etsy is made tangible through comprehensive A/B testing. In summary, Aryafar says that personalized content at Etsy has been found to engage users and enhance business metrics.
For further reading on personalized recommendations at Etsy, read Rob Hall’s blog post on Code as Craft.