Supervised Topic Modeling Using Hierarchical Dirichlet Process-Based Inverse Regression: Experiments on E-Commerce Applications


The main aim of the project is to generate a flexible number of topics from user reviews and use them to predict the response of interest for items (products) across multiple domains.

Proposed system:

We propose a novel supervised topic model called Hierarchical Dirichlet Process-based Inverse Regression (HDP-IR). The Hierarchical Dirichlet Process (HDP) is a nonparametric topic modeling technique that allows for a flexible number of topics. Inverse Regression (IR) is a sufficient dimension reduction (SDR) technique that makes predictions with provably sufficient information. First, HDP-IR combines the advantages of both: as a nonparametric topic model, it avoids model selection complications and captures the uncertainty regarding the number of topics; through inverse regression, it provides an SDR for each document, which can improve predictive performance. Second, we design a scalable variational inference algorithm for fitting HDP-IR so that it can be applied to large-scale corpora (hundreds of thousands or millions of documents).
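To illustrate what "a flexible number of topics" means in practice, the following is a minimal sketch of the truncated stick-breaking construction that underlies Dirichlet process priors. It is not the HDP-IR model or its variational inference algorithm; it shows only a single top-level draw of topic weights. The function names, the concentration value, and the truncation level are assumptions chosen for illustration.

```python
import random


def stick_breaking_weights(concentration, truncation, seed=0):
    """Draw topic weights from a truncated stick-breaking prior.

    concentration and truncation are illustrative parameters, not values
    taken from the HDP-IR work.
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to any topic
    for _ in range(truncation - 1):
        # Break off a Beta(1, concentration) fraction of what is left.
        fraction = rng.betavariate(1.0, concentration)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    weights.append(remaining)  # the last topic absorbs the leftover mass
    return weights


def effective_topics(weights, threshold=0.01):
    """Count topics whose weight exceeds a small (hypothetical) threshold."""
    return sum(1 for w in weights if w > threshold)


weights = stick_breaking_weights(concentration=2.0, truncation=50)
print(effective_topics(weights), "of", len(weights), "topics carry noticeable weight")
```

The truncation level is only an upper bound. Most of the mass typically concentrates on a smaller number of topics, so the number of topics actually used is governed by the prior and the data instead of being fixed in advance.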

Following prior supervised topic modeling (STM) literature, we design HDP-IR under a hierarchical Bayesian modeling framework. Users add reviews of the items according to their intent. We then collect these reviews as unstructured datasets across multiple domains and apply natural language processing (NLP) to identify similar reviews of the products. Finally, we apply a collaborative filtering technique to identify suitable topics for each item in the various domains based on the user reviews, and sort the items accordingly.
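The step of identifying similar reviews can be sketched as follows. This is a simplified stand-in that assumes a bag-of-words representation and cosine similarity; it covers neither the collaborative filtering step nor the topic model itself. The sample reviews and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a review and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(reviews, query_index):
    """Return the index of the review most similar to reviews[query_index]."""
    vectors = [Counter(tokenize(r)) for r in reviews]
    scores = [
        (cosine_similarity(vectors[query_index], v), i)
        for i, v in enumerate(vectors)
        if i != query_index
    ]
    return max(scores)[1]


# Hypothetical reviews from two product domains.
reviews = [
    "battery life is great and the battery charges fast",
    "great battery, charges quickly",
    "the fabric shrank after one wash",
]
print(most_similar(reviews, 0))
```

A production pipeline would likely add stop-word removal and term weighting before comparing reviews; raw word counts are used here only to keep the sketch short.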