Enrich Customer Experience with Deep Learning
Project Summary
The last couple of years have seen massive advances in deep learning across many tasks, such as computer vision (CV), natural language processing (NLP) and speech recognition. These advances are visible to end users across various online platforms, one being the e-commerce domain, where giants like Amazon provide users with voice assistants, personalized recommendations and efficient product search options. Progress in deep learning has been catalyzed by the availability of enormous annotated datasets such as the Wikipedia corpora in various languages and the ImageNet dataset. However, comparatively few pre-processed datasets in the e-commerce domain are available for research. With the customer being the most significant part of any e-commerce platform, there is a rising need for natural language and computer vision enabled applications to improve user experience and increase organizational benefits.
In order to overcome the shortage of publicly available datasets in the e-commerce domain in both textual and visual form, we propose the use of domain adaptation to leverage existing advances in approaches like transfer learning and multi-task learning using state-of-the-art techniques. Domain adaptation exploits task-independent commonalities and overcomes the problem of dataset shortage [3], especially in the e-commerce domain. Through this work, we have tried to improve product catalog quality by predicting missing product information such as category, color, brand and target gender on the e-commerce platform, thus enabling efficient product search and improving user experience on the platform.
As part of the dataset generation phase, we have created three different e-commerce datasets in English, German and French for text-based problems, and in English, German and Italian for image-based problems. These datasets have been used to predict missing product information using deep learning approaches like transfer learning and multi-task learning. We have also compared single-task approaches for image classification tasks with transfer learning and discussed the benefits. On the natural language processing front, we have compared single-task learning with both transfer learning and multi-task learning. We observed that for image classification tasks, single-task learning is on an equal footing with transfer learning in terms of accuracy; however, the latter can be trained and deployed in less than half the time required to train a deep learning model from scratch. For text classification, the corpora were trained on a state-of-the-art deep learning model, the Transformer. In addition, we compared two domain adaptation techniques, transfer learning and multi-task learning, and found that both approaches are on an equal footing in terms of accuracy. Through experiments in which a jointly trained Transformer outperforms a single-task-trained Transformer, we show that multi-task and transfer learning are advisable in situations where training data is sparse.
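The transfer learning setup described above can be illustrated with a minimal, self-contained sketch: a pretrained backbone is frozen and only a new task head is trained on the small target-domain dataset. The random-projection "backbone" and the toy labels below are purely hypothetical stand-ins for the actual pretrained CNN or Transformer and the real product attributes; they only demonstrate the mechanics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained backbone: a frozen random projection.
# In the real setting this would be a CNN (for images) or a
# Transformer (for text) pretrained on a large source domain.
W_backbone = rng.normal(size=(64, 16)) / 8.0    # frozen, never updated

def extract_features(x):
    """Frozen feature extractor; gradients never reach W_backbone."""
    return np.tanh(x @ W_backbone)

# Small labeled target-domain dataset (toy stand-in for a product
# attribute such as color); labels are decodable from the features.
X = rng.normal(size=(200, 64))
feats = extract_features(X)
true_w = rng.normal(size=16)
y = (feats @ true_w > 0).astype(float)

# Fine-tuning: only the new task head is trained.
w_head = np.zeros(16)
b_head = 0.0
lr = 0.5
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(feats @ w_head + b_head)))  # sigmoid
    grad = p - y                      # dLoss/dlogits for log loss
    w_head -= lr * feats.T @ grad / len(y)
    b_head -= lr * grad.mean()

accuracy = ((feats @ w_head + b_head > 0) == (y > 0.5)).mean()
print(f"target-task accuracy: {accuracy:.2f}")
```

The key design point is that the backbone's weights are excluded from the update loop entirely, which is why fine-tuning a head converges in a fraction of the time needed to train the full model from scratch.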
After making the predictions, we conducted a survey to see whether including the predicted features on the product detail pages helps online customers make buying decisions. A majority of the respondents preferred the predicted features to be included on the product detail page, suggesting that the predictions made through transfer learning and multi-task learning are useful and applicable in the e-commerce domain to enhance user experience.
Through this project, we show how domain adaptation techniques outperform single-task learning on text-based datasets in terms of accuracy and F1-score, and converge considerably faster on image classification tasks using the e-commerce datasets. These techniques are the better option when dealing with dataset shortage or imbalanced classes, and in cases where we do not want to train a model from scratch for a prolonged period of time.
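The multi-task side of this comparison rests on one core idea: a shared encoder receives gradients from several task losses at once, with one lightweight head per task. The sketch below illustrates this in plain NumPy with two toy binary tasks; the tasks, dimensions and data are hypothetical, not the actual category/brand/gender prediction heads used in the project.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy product representations; two related binary tasks
# (hypothetical stand-ins for e.g. category and target gender).
X = rng.normal(size=(300, 32))
y_cat = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
y_gen = (X[:, 0] - 0.5 * X[:, 2] > 0).astype(float)

# Shared encoder plus one linear head per task.
W_shared = rng.normal(size=(32, 24)) * 0.1
w_cat = np.zeros(24)
w_gen = np.zeros(24)
lr = 0.3

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(500):
    H = np.tanh(X @ W_shared)          # shared representation
    g_cat = sigmoid(H @ w_cat) - y_cat  # log-loss gradients per task
    g_gen = sigmoid(H @ w_gen) - y_gen
    # Joint loss: gradients from BOTH tasks flow into the encoder.
    dH = np.outer(g_cat, w_cat) + np.outer(g_gen, w_gen)
    W_shared -= lr * X.T @ (dH * (1 - H**2)) / len(X)
    w_cat -= lr * H.T @ g_cat / len(X)
    w_gen -= lr * H.T @ g_gen / len(X)

H = np.tanh(X @ W_shared)
acc_cat = ((sigmoid(H @ w_cat) > 0.5) == (y_cat > 0.5)).mean()
acc_gen = ((sigmoid(H @ w_gen) > 0.5) == (y_gen > 0.5)).mean()
print(f"category acc: {acc_cat:.2f}, gender acc: {acc_gen:.2f}")
```

Because the two tasks share structure (both depend on the same input feature), each task's gradient acts as a regularizing signal for the other, which is the mechanism behind the data-sparsity benefits reported above.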
Motivation:
Artificial Intelligence grew massively after surviving a period of stagnation in the 1970s known as the AI winter, first with the rise of 'expert systems' in the 1980s and later with Deep Blue, the first computer chess-playing system to defeat a reigning world chess champion, Garry Kasparov. With the start of the 21st century, larger amounts of data, known as 'big data', and faster computers enabled Machine Learning to be applied in various sectors. In its well-known report "Big data: The next frontier for innovation, competition, and productivity", the McKinsey Global Institute estimated that "by 2009, nearly all sectors in the US economy had at least an average of 200 terabytes of stored data". By 2016, the market for AI-related products, including hardware and software, had reached 8 billion dollars worldwide, and The New York Times reported that interest in Artificial Intelligence had reached a 'frenzy'.
A branch of Machine Learning called Deep Learning emerged and gained popularity during this time due to its applications in various fields, including computer vision, natural language processing, medical applications, robotics and speech recognition. Deep Learning involves artificial neural networks (ANNs) that are stacked together to form a hierarchical structure of interconnected components that learn from huge datasets over time. Though Deep Learning gained popularity only recently, ANNs have been in the picture for decades: the earliest ANNs date back to the perceptron work of the late 1950s and early 1960s, and the backpropagation algorithm was popularized in 1986. Deeper and more complex architectures like Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) rose to prominence in the 21st century. These architectures could provide solutions for image and text related tasks that were comparable to human performance and at times even better. One of the key challenges for Deep Learning applications was the hardware available at the time. A major breakthrough was the adoption of NVIDIA Graphics Processing Units (GPUs) for deep learning around 2009, which reduced training times from weeks to days and paved the path for more efficient and optimized algorithms built on specialized hardware. This renewed interest in the field of artificial intelligence, and especially in deep learning, over the last couple of years. The field has grown by leaps and bounds ever since, with organizations all over the world employing deep learning for image and text related tasks. These techniques are visible in the e-commerce domain as well, with giants like Amazon introducing personalized recommendations on the platform and launching speech-enabled devices that assist in everyday tasks.
The availability of huge amounts of data through e-commerce platforms and the rising demands of e-commerce users make e-commerce a definitive fit for the application of Deep Learning and Artificial Intelligence as a whole. Breakthrough approaches like multi-task learning and transfer learning can provide huge benefits when dealing with challenges like limited annotated data in the e-commerce domain. Applying solutions implemented on a source domain to a target domain can result in massive performance gains. This is one of the key motivating factors behind this project. Deep Learning is currently booming, and there is still more need and scope for growth and improvement. Our motivation lies in exploring current solutions and applying them to resolve the existing problem of poor product catalog quality, and to increase traffic on the platform by providing a satisfactory user experience.
Problem Statement:
E-commerce has grown massively over recent years, with retail e-commerce sales worldwide estimated to reach 4.88 trillion dollars by 2020, constituting 14.6% of total retail spending. With the growth of the e-commerce industry, various e-commerce organizations are employing technical advancements to enhance the end-user experience. The main focus of an e-commerce website is to attract customers and guide them to the right products. Often, customers have trouble reaching the desired product due to the low quality of the product catalog. Natural language processing and computer vision techniques can be used in such scenarios to predict missing product information, which can help the customer make buying decisions. The fashion department is affected the most, as there is a diverse range of products and each product is unique in its attributes, be it the color, the style, the brand or the target gender the product is aimed at. If a product does not display this vital information on the website, a customer will be reluctant to buy it. The most important piece of product information on an e-commerce site is the title of the product. The title should be attractive yet precise enough for the customer to know whether it is exactly what she wants. The faster she can get to the right product, the better the chance that she will buy it and come back to the website in the future. The vital attributes of a product can be predicted from the textual information displayed on the website or from the image of the product, using natural language processing and computer vision respectively. However, this problem comes with many challenges. Computer vision and natural language processing tasks employ Deep Learning methods, and Deep Learning requires vast amounts of training data in order to achieve desirable performance.
Currently, there are very few massive e-commerce product datasets publicly available for research. The scarcity of datasets, both text and image, is a big challenge for the application and further development of Deep Learning in the e-commerce domain. Moreover, the challenge lies in producing annotated datasets that can be used for training supervised tasks. The scarcity of massive annotated datasets is countered by employing specialized Deep Learning methods like domain adaptation. We try to solve this problem by using transfer learning and multi-task learning techniques to overcome the limited-data problem as well as the class-imbalance problem. Deep Learning also requires state-of-the-art computational tools and hardware, as it involves a few million or even hundreds of millions of parameters that need to be initialized and updated. For effective and successful training, advanced hardware involving GPUs is required. A huge variety of advanced computational interfaces and tools are now available for use in Deep Learning. Through this project, we apply state-of-the-art parallel training methods and GPU hardware to solve the tasks at hand.
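One standard way to counter the class-imbalance problem mentioned above is to weight the loss inversely to class frequency, so that rare labels contribute as much to training as frequent ones. The sketch below computes such weights using the common "balanced" heuristic n_samples / (n_classes * count_c); the label counts are hypothetical, and in practice these weights would be passed to a weighted cross-entropy loss.

```python
import numpy as np

# Hypothetical label counts for an imbalanced product attribute
# (e.g. brand): one dominant class and two rare ones.
labels = np.array(["A"] * 900 + ["B"] * 90 + ["C"] * 10)

classes, counts = np.unique(labels, return_counts=True)
# Inverse-frequency weights, normalized so a perfectly balanced
# dataset would give every class a weight of 1.0.
weights = counts.sum() / (len(classes) * counts)
for c, w in zip(classes, weights):
    print(f"class {c}: weight {w:.2f}")
```

With these counts, class A (900 samples) is down-weighted to about 0.37 while class C (10 samples) is up-weighted to about 33.3, so the rare class is no longer drowned out during gradient updates.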
Research Questions:
In the recent past, massive performance improvements across many tasks have been achieved through deep learning techniques like transfer learning and multi-task learning. The focus of the project is to compare training on multiple tasks concurrently, and transfer learning, with training on single tasks, and to see whether the former boosts performance and overcomes problems like imbalanced labels and limited labeled data. Previous studies and research have shown that many techniques and approaches can be employed in transfer learning and multi-task learning. Guided by the literature, suitable architecture and hyperparameter choices were selected for each technique; more insight into which were chosen, and why, is provided in the later chapters.
After selecting state-of-the-art techniques with the best combination of architecture and hyperparameter choices, different experiments were conducted and evaluated in order to determine whether transfer learning and multi-task learning offer benefits over single-task learning. Primarily, the experiments tried to gauge whether multi-task learning and transfer learning can outperform their single-task counterparts and be useful in the e-commerce domain. The necessary steps toward this objective are expressed through the research questions below:
Could multi-task learning and transfer learning perform better than single task learning on the Amazon Product Catalog dataset?
What architecture choices and hyperparameters should we use in both multi-task learning and transfer learning to obtain good performance?
Can transfer learning and multi-task learning be useful in the e-commerce domain to enhance user-experience?