4 Conclusion

Evaluating the performance of our models in comparison to the source data, we find that our selected features are likely weak predictors of whether a feedback comment is useful. There are substantial issues with the strength of our selected methods and metrics for both the training and testing data.

We went through multiple iterations of examining our data, examining and adjusting models, and optimizing performance. Ultimately, our work did not deliver a system that could definitively be used to predict the usefulness of a review.

Despite that, we were able to explore and answer several of our planned research questions. This study can serve as a basis for further examination of how to better classify sentiment and assess the usefulness of reviews based on customer experience.

4.1 Review of Research Questions

At the start of this effort, we set out to answer the following questions:

Can the model from Guha Majumder, Dutta Gupta, and Paul (2022) be generalized with
- larger volume of products and product types from which to mine data?
  - We find that a larger mix of products across multiple platforms does support a wider-reaching and more generalizable model. Their model, with some adaptations, was able to identify several subjectively useful comments. However, the methods we used may not be fully appropriate or meet sufficient performance thresholds to establish confidence in our selected models.
  - We ascertained that the Linear Regression model used in Guha Majumder, Dutta Gupta, and Paul (2022) could not capture the underlying patterns in our larger volume of products. The tuned Logistic Regression and SVM-Poly models produced credible scores up to a certain threshold, and were able to capture insights from the larger data set that the linear regression model did not.
- a sliding scalar multiplier representing the degree to which a product is a “search” (0) or “experience” (1) product?
  
  Leveraging a proxy, the subjectivity of a product's description, as a means of measuring whether a product is a search or experience product may serve as a useful strategy in future research. This approach provides a useful way to comprehend and classify products across the consumer goods spectrum, facilitating more perceptive analysis and strategic decision-making across a range of industries.
- Adding modifiers to review content based upon:
  - Customer / Reviewer reliability and reputation?
  This question could not be answered from our analysis because on all of the websites (Amazon, Target, and BestBuy) only a verified buyer was allowed to review a product. Due to this factor, we were forced to accept that every reviewer was reliable. The only way around this would have been to go through each and every review looking for jargon content, which was not feasible.
  - Review Polarity?
  Positive, negative, and neutral reviews all offer insightful information about the varying subjective experiences that people have with a product. Positive reviews are characterized by positive attitudes and recommendations, and they are usually linked to products that provide remarkable experiences or successfully meet particular needs. On the other hand, unfavorable reviews could indicate shortcomings, contradictions, or mismatches between what customers expect from a product and how well it performs.
Can the polarity of reviews be judged accurately by using a Naive Bayes classification model? Hu, Gong, and Guo (2010)
- What is the impact of different feature extraction methods (e.g., bag-of-words, TF-IDF) on the performance of a Naive Bayes classification model? Wang et al. (2018)
  
  While formulating the problem statement, we listed Naive Bayes classification as a possible solution but did not explore it in our analysis due to personnel and time constraints.
Can products be classified on their degree of being search or experience based by examining product variables such as:
- Degree of specificity in the product description? (e.g. level of detail, length, numeric values, descriptive values may suggest the product is more search than it is experience-based)
  - Products with extremely detailed descriptions typically offer precise and comprehensive details about their characteristics, features, and functionalities. Precise measurements, technical details, and explicit information about the attributes of the product are frequently included in these descriptions. The focus on specificity makes it easier for customers to look for and assess products according to their own requirements, tastes, and standards. Consequently, goods that have extremely detailed descriptions are usually linked to the “search” category, as consumers can readily find and assess them based on objective criteria.
  - So indeed, depending on factors like the level of specificity in the product description, products can be categorized as “search” or “experience” in varying degrees.
- Whether the product is offered in brand-new condition only, or offered as new, used, or refurbished? (e.g. refurbished products may be more search products than they are experience products)
  
  While we lacked comprehensive data on this aspect, products offered in brand-new condition may generally incline more towards the search-based category, whereas items available in used or refurbished condition could incline towards being more experience-based due to the potential history of product use associated with them.
- Which of the 5 senses the product engages? (e.g. engagement of more senses, or engagement of solely specific senses like hearing and vision may suggest more experience-based than search based; examine relationship between search and experience vs. senses engaged)
  
  While the engagement of the senses is undoubtedly a crucial aspect of the consumer experience, it was not within the scope of our current investigation. However, examining the relationship between the engagement of the senses and the classification of products as “search” or “experience” goods could yield valuable insights for future classification efforts.
- Item rarity (limited production or unique items vs. bulk-produced items)? (e.g. limited production products may be more experience-based than search-based)
  
  Most of the items in our research were common household products, i.e., bulk-produced items. We did not devote sufficient research time to limited production or unique items. Based solely on human psychology, it is possible that limited production or unique items typically exhibit more experiential qualities, as they offer distinctive and potentially rare experiences compared to bulk-produced items.
Can newer natural language processing libraries provide a better fit for Review Content metrics examined by Guha Majumder, Dutta Gupta, and Paul (2022)?

We found that there was no clear linear model from the data, which could be an outcome of how we processed sentiment as a numeric variable as opposed to a category. Based on this assumption, newer natural language processing libraries might provide an improved fit for analyzing review content metrics like sentiment. For instance, using these libraries could involve more nuanced sentiment analysis techniques that avoid treating sentiment solely as a numeric variable, potentially providing clearer insights into customer feedback.
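As an illustration of treating sentiment as a category rather than a number, the following minimal sketch bins a polarity score into three labels. It assumes polarity is reported on a [-1, 1] scale; the function name `polarity_to_label` and the `neutral_band` cutoff of 0.1 are hypothetical choices for this example and are not taken from our analysis.

```python
def polarity_to_label(score: float, neutral_band: float = 0.1) -> str:
    """Map a numeric polarity score in [-1, 1] to a sentiment category.

    neutral_band is an assumed, tunable cutoff: scores whose absolute
    value does not exceed it are treated as neutral.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("polarity score must lie in [-1, 1]")
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


# Illustrative scores only, not values from our data set.
example_scores = [-0.8, -0.05, 0.0, 0.3]
example_labels = [polarity_to_label(s) for s in example_scores]
print(example_labels)  # ['negative', 'neutral', 'neutral', 'positive']
```

A categorical label of this kind could then be supplied to a classifier as a one-hot feature instead of a single numeric column, although the appropriate cutoff would need to be validated against the data.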
How does sentiment in customer reviews correlate with customer satisfaction metrics or sales figures for a particular product?
Sentiment in customer reviews, characterized by higher star ratings, tends to correlate positively with customer satisfaction metrics. Higher star ratings often reflect stronger polarity and subjectivity in sentiment, indicating greater overall satisfaction of the customer and potentially influencing purchasing decisions for future buyers.
Can we categorize customer reviews based on customer experience and sentiment?

Our study results show that it is possible to categorize customer reviews based on customer experience and sentiment. One way of doing this is to use machine learning classifiers that incorporate subjectivity and polarity as model features. The results of the classifiers implemented in this project might not be robust, but they can act as a base for methodological adaptations to better classify sentiment and assess the usefulness of reviews based on customer experience.
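To make the idea of a classifier over subjectivity and polarity concrete, here is a minimal sketch of logistic regression fitted by batch gradient descent, using only the Python standard library. It is not the implementation used in this project: the function names, learning rate, epoch count, and the six toy reviews are all assumptions made for illustration, and the toy labels are invented so that low-subjectivity reviews are marked useful.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this example (probability minus label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x) -> int:
    """Return 1 (useful) if the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Toy, made-up features: [subjectivity in 0..1, polarity in -1..1].
X_toy = [[0.20, -0.6], [0.30, 0.4], [0.25, 0.1],
         [0.90, 0.8], [0.85, -0.7], [0.95, 0.2]]
y_toy = [1, 1, 1, 0, 0, 0]  # 1 = labeled useful (hypothetical labels)

weights, bias = train_logistic(X_toy, y_toy)
toy_predictions = [predict(weights, bias, x) for x in X_toy]
print(toy_predictions)
```

On real review data the classes would not be cleanly separable as they are here, so held-out evaluation and regularization would matter far more than in this toy setting.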
Do specific product star ratings tend to incite more reviews, and if so, how does this impact the overall reputation measurement?

There was no distinct connection between star ratings and the number of reviews for a given item. As with our examination of review subjectivity and polarity, there seemed to be central points where our data coalesced: many products in the 4-5 star range with relatively few reviews (only several hundred) and, around the same star rating, an island of products with tens of thousands of reviews. This pattern represents change in one variable almost irrespective of the other, which suggested a degree of independence, and as such we did not pursue further research on the matter.
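One simple way to quantify the apparent independence described above is a Pearson correlation coefficient between star rating and review count. The sketch below computes it by hand with the standard library; the function name `pearson_r` and the sample values are hypothetical and only loosely mimic the two clusters we observed, so the printed number says nothing about our actual data.

```python
import math


def pearson_r(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values: similar star ratings, with review counts either in the
# hundreds or in the tens of thousands.
star_ratings = [4.2, 4.5, 4.4, 4.6, 4.3, 4.5]
review_counts = [300, 250, 18000, 400, 22000, 350]

r = pearson_r(star_ratings, review_counts)
print(f"Pearson r = {r:.3f}")
```

Because review counts are heavily skewed, a rank-based measure or a log transform of the counts may be a more suitable choice than raw Pearson correlation in a follow-up analysis.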
Are specific quality descriptors in text-based reviews (e.g., ‘enthusiastic’, ‘disappointed’) strongly associated with certain rating levels, and how does this association affect product reputation?

Our findings suggest a positive trend between star ratings and positive polarity in reviews. In review content, specific quality descriptors like ‘enthusiastic’ or ‘disappointed’ seem to be associated with certain rating levels. However, further analysis is needed to assess how these associations impact overall product reputation, including how sentiment and specific descriptors contribute to consumer perceptions and purchasing decisions.

4.2 Interesting Findings, In Spite of Model Performance

Despite what we perceive as weakness in the models, the degree of detail and specificity within some of the predicted “useful” comments is enlightening. Let’s have a look at a small selection of comments near the top of the prediction list for our implemented models:

Figure 4.1: Comment on Dyson Vacuum Cleaner - predicted useful by Logistic Regression

A Dyson vacuum is an expensive product. Hearing of someone’s challenges, particularly an extreme case of having a vacuum melt while the customer was using it, may make one think twice about the investment. While appearing negative in sentiment, the commenter appears to provide a degree of objectivity to their review - and does talk about both the good and the bad of the product. These are all things we believe may be useful to a customer.

Figure 4.2: Comment on a Sofa - predicted useful by SVM

An approximately $400 sofa is also quite the investment, and knowing about issues with returns, issues with the product and its craftsmanship is likely a useful datapoint for a prospective customer.

Figure 4.3: Comment on a Pizza Oven - predicted useful by MLR

This customer looks like they had a good experience - so good in fact that they went online to a commerce platform that they didn’t even use to purchase the item, just so that they could post about their experience. Knowing the settings used, how quickly the oven managed to cook their food, and the ease of use likely all give utility to a prospective customer.

None of these reviews had received any votes marking the comment as useful to other buyers. That said, determining utility is ultimately subjective.

There are several other comments within our results that fall along these lines. Further exploration and refinement of our research could produce better modeling and reliable results for customers.