Topic Modeling of Reviews

Author: Group 19

Affiliation: Rutgers University, New Brunswick

(126725, 8)
   review                                             review_datetime      data_source  app_name  upvote_count  total_comments  app_rating  sentiment
0  uber eats for owls? will they ever come out wi...  2025-04-20 21:51:15  Reddit       UberEats  1.0           2.0             NaN         Neutral
1  serious question yall is it worth going out to...  2025-04-20 21:41:21  Reddit       UberEats  1.0           1.0             NaN         Neutral
2  ubereats charged me for a successful chargebac...  2025-04-20 20:50:04  Reddit       UberEats  1.0           2.0             NaN         Negative
3  ubereats driver scammed me by buying half the ...  2025-04-20 20:48:13  Reddit       UberEats  1.0           9.0             NaN         Negative
4  ubereats why you do this? family went out of t...  2025-04-20 20:19:15  Reddit       UberEats  1.0           3.0             NaN         Negative
Found 3 unique apps: UberEats, DoorDash, GrubHub
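The per-app split used below can be pictured without pandas. This is a minimal sketch, assuming each record is a dict with `app_name` and `review` keys standing in for a row of the notebook's DataFrame `df`; the helper name `group_reviews_by_app` and the toy records are made up for illustration. Like pandas' `Series.unique()`, it keeps apps in order of first appearance.

```python
from collections import defaultdict


def group_reviews_by_app(records):
    """Group review texts by app, keeping apps in order of first appearance.

    Hypothetical helper: `records` is assumed to be an iterable of dicts with
    'app_name' and 'review' keys, standing in for the notebook's DataFrame rows.
    """
    grouped = defaultdict(list)
    for row in records:
        grouped[row["app_name"]].append(row["review"])
    # dict (and defaultdict) keys keep insertion order, i.e. first appearance
    return dict(grouped)


# Made-up toy records, not taken from the dataset
sample = [
    {"app_name": "UberEats", "review": "order arrived late"},
    {"app_name": "DoorDash", "review": "dasher never arrived"},
    {"app_name": "UberEats", "review": "charged twice"},
    {"app_name": "GrubHub", "review": "cold food again"},
]
by_app = group_reviews_by_app(sample)
unique_apps = list(by_app)
print(f"Found {len(unique_apps)} unique apps: {', '.join(unique_apps)}")
```

With these toy records the print line reproduces the notebook's message format: `Found 3 unique apps: UberEats, DoorDash, GrubHub`.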

Coherence Score vs Number of Topics

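# preprocess_reviews, compute_coherence_values, plot_coherence_scores and
# visualize_topics_pyldavis are helper functions defined elsewhere in the notebook.
for app in unique_apps:
    print(f"\nProcessing app: {app}")
    app_reviews = df[df['app_name'] == app]['review']

    # Tokenize and clean this app's reviews
    processed_reviews = preprocess_reviews(app_reviews)

    # Train one LDA model per topic count (2 through 20) and score each by coherence
    coherence_scores, models = compute_coherence_values(processed_reviews, start=2, limit=20, step=1)
    plot_coherence_scores(coherence_scores, app_name=app)

    # coherence_scores holds (num_topics, score) pairs; keep the highest-scoring one
    best_num_topics, best_score = max(coherence_scores, key=lambda x: x[1])
    print(f"Best number of topics for {app}: {best_num_topics} with coherence score {best_score:.4f}")

    # Fetch the model trained with best_num_topics; next() raises StopIteration
    # rather than silently reusing the previous app's model if nothing matches
    best_model, best_corpus, best_dictionary = next(
        (model, corpus, dictionary)
        for num, model, corpus, dictionary in models
        if num == best_num_topics
    )

    visualize_topics_pyldavis(best_model, best_corpus, best_dictionary, app_name=app)

print("\n✅ All Apps Processed Successfully!")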
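The body of `preprocess_reviews` is not shown in this section. One possible shape, assuming (as is common for gensim-based LDA) that it returns one token list per review, is sketched below. The name `preprocess_reviews_sketch`, the `min_len` parameter and the short stopword set are illustrative assumptions, and a regex tokenizer is swapped in for NLTK's punkt-based tokenizer that the notebook downloads.

```python
import re

# Illustrative stopword set (assumption); a real pipeline would more likely
# use a fuller list such as NLTK's English stopwords.
STOPWORDS = {
    "a", "an", "and", "be", "but", "do", "ever", "for", "from", "i", "is",
    "it", "me", "my", "no", "of", "or", "that", "the", "this", "they", "to",
    "was", "why", "will", "with", "you",
}
TOKEN_RE = re.compile(r"[a-z]+")  # alphabetic runs only; digits and punctuation drop out


def preprocess_reviews_sketch(reviews, min_len=3):
    """Return one token list per review: lowercased, alphabetic, stopword-free.

    Hypothetical stand-in for the notebook's preprocess_reviews.
    """
    processed = []
    for text in reviews:
        if not isinstance(text, str):  # skip missing reviews (None / NaN)
            continue
        tokens = TOKEN_RE.findall(text.lower())
        processed.append([t for t in tokens if t not in STOPWORDS and len(t) >= min_len])
    return processed


# Made-up toy reviews, not taken from the dataset
docs = preprocess_reviews_sketch([
    "Driver was late and the food was cold!",
    "Charged me twice, no refund from support.",
    None,
    "OK ok 5 stars",
])
print(docs)
```

Here `docs` comes out as `[['driver', 'late', 'food', 'cold'], ['charged', 'twice', 'refund', 'support'], ['stars']]`: the missing review is dropped, and the `min_len` filter removes short tokens such as "ok".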

Processing app: UberEats
Preprocessing reviews...
Completed preprocessing 64153 reviews.
Computing coherence scores...
Training LDA with 2 topics...
Training LDA with 3 topics...
Training LDA with 4 topics...
Training LDA with 5 topics...
Training LDA with 6 topics...
Training LDA with 7 topics...
Training LDA with 8 topics...
Training LDA with 9 topics...
Training LDA with 10 topics...
Training LDA with 11 topics...
Training LDA with 12 topics...
Training LDA with 13 topics...
Training LDA with 14 topics...
Training LDA with 15 topics...
Training LDA with 16 topics...
Training LDA with 17 topics...
Training LDA with 18 topics...
Training LDA with 19 topics...
Training LDA with 20 topics...
Completed coherence score calculation.

Best number of topics for UberEats: 15 with coherence score 0.5595
Preparing pyLDAvis visualization for UberEats...
Saved pyLDAvis visualization to UberEats_topics.html 🚀

Processing app: DoorDash
Preprocessing reviews...
Completed preprocessing 53719 reviews.
Computing coherence scores...
Training LDA with 2 topics...
Training LDA with 3 topics...
Training LDA with 4 topics...
Training LDA with 5 topics...
Training LDA with 6 topics...
Training LDA with 7 topics...
Training LDA with 8 topics...
Training LDA with 9 topics...
Training LDA with 10 topics...
Training LDA with 11 topics...
Training LDA with 12 topics...
Training LDA with 13 topics...
Training LDA with 14 topics...
Training LDA with 15 topics...
Training LDA with 16 topics...
Training LDA with 17 topics...
Training LDA with 18 topics...
Training LDA with 19 topics...
Training LDA with 20 topics...
Completed coherence score calculation.

Best number of topics for DoorDash: 6 with coherence score 0.5708
Preparing pyLDAvis visualization for DoorDash...
Saved pyLDAvis visualization to DoorDash_topics.html 🚀

Processing app: GrubHub
Preprocessing reviews...
Completed preprocessing 8844 reviews.
Computing coherence scores...
Training LDA with 2 topics...
Training LDA with 3 topics...
Training LDA with 4 topics...
Training LDA with 5 topics...
Training LDA with 6 topics...
Training LDA with 7 topics...
Training LDA with 8 topics...
Training LDA with 9 topics...
Training LDA with 10 topics...
Training LDA with 11 topics...
Training LDA with 12 topics...
Training LDA with 13 topics...
Training LDA with 14 topics...
Training LDA with 15 topics...
Training LDA with 16 topics...
Training LDA with 17 topics...
Training LDA with 18 topics...
Training LDA with 19 topics...
Training LDA with 20 topics...
Completed coherence score calculation.

Best number of topics for GrubHub: 4 with coherence score 0.5687
Preparing pyLDAvis visualization for GrubHub...
Saved pyLDAvis visualization to GrubHub_topics.html 🚀

✅ All Apps Processed Successfully!
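Each app's topic count above was chosen by the highest coherence score. This section does not show which coherence measure `compute_coherence_values` uses, so the sketch below only illustrates the general idea with the simple UMass measure: a topic scores higher when its top words tend to appear in the same documents. Its values are on a different scale from the scores reported above and are not comparable to them. The function name `umass_coherence` and the toy documents are hypothetical. The pairwise log terms are averaged here rather than summed as in the original UMass definition.

```python
import math
from itertools import combinations


def umass_coherence(top_words, docs):
    """UMass coherence of one topic, averaged over word pairs.

    top_words: the topic's top words, ordered from most to least probable.
    docs: tokenized documents; only word presence per document matters.
    For each pair where w_j ranks above w_i, adds log((D(w_i, w_j) + 1) / D(w_j)),
    where D(...) counts documents containing all the given words.
    """
    doc_sets = [set(d) for d in docs]

    def doc_freq(*words):
        return sum(1 for s in doc_sets if all(w in s for w in words))

    scores = []
    for j, i in combinations(range(len(top_words)), 2):  # j < i, so w_j ranks higher
        w_i, w_j = top_words[i], top_words[j]
        d_j = doc_freq(w_j)
        if d_j == 0:
            continue  # conditioning word absent from the corpus; skip the pair
        scores.append(math.log((doc_freq(w_i, w_j) + 1) / d_j))
    return sum(scores) / len(scores) if scores else float("nan")


# Made-up toy corpus: "refund"/"charged" co-occur, "refund"/"late" never do
docs = [
    ["refund", "charged", "order"],
    ["refund", "charged", "support"],
    ["driver", "late", "order"],
    ["driver", "late", "cold"],
]
tight = umass_coherence(["refund", "charged"], docs)
mixed = umass_coherence(["refund", "late"], docs)
print(f"tight={tight:.3f} mixed={mixed:.3f}")
```

The co-occurring pair scores log(3/2) ≈ 0.405 and the unrelated pair scores log(1/2) ≈ -0.693, so the more coherent topic ranks higher. Choosing a topic count applies the same comparison across models, as the `max(coherence_scores, key=lambda x: x[1])` line in the loop does.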