Topic Modeling of Reviews

Author: Group 19

Affiliation: Rutgers University, New Brunswick

(126725, 8)

	review	review_datetime	data_source	app_name	upvote_count	total_comments	app_rating	sentiment
0	uber eats for owls? will they ever come out wi...	2025-04-20 21:51:15	Reddit	UberEats	1.0	2.0	NaN	Neutral
1	serious question yall is it worth going out to...	2025-04-20 21:41:21	Reddit	UberEats	1.0	1.0	NaN	Neutral
2	ubereats charged me for a successful chargebac...	2025-04-20 20:50:04	Reddit	UberEats	1.0	2.0	NaN	Negative
3	ubereats driver scammed me by buying half the ...	2025-04-20 20:48:13	Reddit	UberEats	1.0	9.0	NaN	Negative
4	ubereats why you do this? family went out of t...	2025-04-20 20:19:15	Reddit	UberEats	1.0	3.0	NaN	Negative

Found 3 unique apps: UberEats, DoorDash, GrubHub
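The `review` column previewed above is free text, so it has to be turned into token lists before any topic model can be trained. The notebook's own `preprocess_reviews` helper is not shown; the following is only a minimal standard-library sketch of what such a step might look like. The function name `simple_preprocess_reviews`, the tiny stopword set, and the `min_len` threshold are all assumptions made for illustration.

```python
import re

# Tiny illustrative stopword set (assumption); a real run would likely use a
# fuller list, for example the English list shipped with NLTK.
STOPWORDS = {"a", "an", "and", "the", "for", "of", "was", "this", "why", "they"}

# Keep runs of ASCII letters only; digits and punctuation act as separators.
TOKEN_RE = re.compile(r"[a-z]+")


def simple_preprocess_reviews(reviews, min_len=3):
    """Lowercase each review, tokenize it, and drop stopwords and short tokens.

    Hypothetical stand-in for the notebook's preprocess_reviews helper.
    Returns a list of token lists; empty or missing reviews are skipped.
    """
    processed = []
    for text in reviews:
        if not isinstance(text, str):
            continue  # skip missing values such as None or NaN
        tokens = [
            tok
            for tok in TOKEN_RE.findall(text.lower())
            if len(tok) >= min_len and tok not in STOPWORDS
        ]
        if tokens:
            processed.append(tokens)
    return processed


# Made-up example reviews, not taken from the dataset.
sample = ["The driver was late and the food was cold!", None, "To it"]
print(simple_preprocess_reviews(sample))  # [['driver', 'late', 'food', 'cold']]
```

Because reviews that end up with no tokens are dropped, the number of processed reviews can be smaller than the number of input rows.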

Coherence Score vs Number of Topics
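A coherence score summarizes how often a topic's top words actually occur together in the documents. The notebook does not state which coherence measure `compute_coherence_values` uses, so the sketch below swaps in the UMass measure purely because it can be computed from document co-occurrence counts with the standard library. Its values are on a different scale (typically negative) from the scores reported in this section. The function name `umass_coherence` and the toy documents are assumptions for illustration.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents):
    """Average UMass coherence for one topic.

    top_words: the topic's words, ordered from most to least probable.
    documents: list of token lists.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    pair_scores = []
    # combinations() preserves order, so `earlier` is the higher-ranked word.
    for earlier, later in combinations(top_words, 2):
        denom = doc_freq(earlier)
        if denom == 0:
            continue  # the word never occurs, so the pair is undefined
        pair_scores.append(math.log((doc_freq(earlier, later) + 1) / denom))
    return sum(pair_scores) / len(pair_scores) if pair_scores else float("nan")


# Made-up toy corpus, not taken from the dataset.
docs = [
    ["delivery", "late", "driver"],
    ["delivery", "driver", "cold"],
    ["refund", "support"],
    ["refund", "support", "charge"],
]
coherent = umass_coherence(["delivery", "driver"], docs)
mixed = umass_coherence(["delivery", "refund"], docs)
print(coherent > mixed)  # True
```

Words that co-occur in the same reviews ("delivery", "driver") score higher than words that never appear together ("delivery", "refund"), which is the intuition behind comparing models with different numbers of topics.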

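# For each app: preprocess its reviews, train LDA models over a range of topic
# counts, keep the model with the highest coherence, and export a pyLDAvis view.
for app in unique_apps:
    print(f"\nProcessing app: {app}")
    app_reviews = df[df['app_name'] == app]['review']

    processed_reviews = preprocess_reviews(app_reviews)

    # coherence_scores: list of (num_topics, score) pairs
    # models: list of (num_topics, model, corpus, dictionary) tuples
    # Note: limit=20 is inclusive here; the log shows a 20-topic model being trained.
    coherence_scores, models = compute_coherence_values(processed_reviews, start=2, limit=20, step=1)
    plot_coherence_scores(coherence_scores, app_name=app)

    # Pick the topic count with the highest coherence score.
    best_num_topics, best_score = max(coherence_scores, key=lambda x: x[1])
    print(f"Best number of topics for {app}: {best_num_topics} with coherence score {best_score:.4f}")

    # Look up the matching model. next() raises StopIteration if nothing matches,
    # rather than silently reusing the previous app's model.
    best_model, best_corpus, best_dictionary = next(
        (model, corpus, dictionary)
        for num, model, corpus, dictionary in models
        if num == best_num_topics
    )

    visualize_topics_pyldavis(best_model, best_corpus, best_dictionary, app_name=app)

print("\n✅ All Apps Processed Successfully!")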


Processing app: UberEats
Preprocessing reviews...
Completed preprocessing 64153 reviews.
Computing coherence scores...
Training LDA with 2 topics...
Training LDA with 3 topics...
Training LDA with 4 topics...
Training LDA with 5 topics...
Training LDA with 6 topics...
Training LDA with 7 topics...
Training LDA with 8 topics...
Training LDA with 9 topics...
Training LDA with 10 topics...
Training LDA with 11 topics...
Training LDA with 12 topics...
Training LDA with 13 topics...
Training LDA with 14 topics...
Training LDA with 15 topics...
Training LDA with 16 topics...
Training LDA with 17 topics...
Training LDA with 18 topics...
Training LDA with 19 topics...
Training LDA with 20 topics...
Completed coherence score calculation.

Best number of topics for UberEats: 15 with coherence score 0.5595
Preparing pyLDAvis visualization for UberEats...
Saved pyLDAvis visualization to UberEats_topics.html 🚀

Processing app: DoorDash
Preprocessing reviews...
Completed preprocessing 53719 reviews.
Computing coherence scores...
Training LDA with 2 topics...
Training LDA with 3 topics...
Training LDA with 4 topics...
Training LDA with 5 topics...
Training LDA with 6 topics...
Training LDA with 7 topics...
Training LDA with 8 topics...
Training LDA with 9 topics...
Training LDA with 10 topics...
Training LDA with 11 topics...
Training LDA with 12 topics...
Training LDA with 13 topics...
Training LDA with 14 topics...
Training LDA with 15 topics...
Training LDA with 16 topics...
Training LDA with 17 topics...
Training LDA with 18 topics...
Training LDA with 19 topics...
Training LDA with 20 topics...
Completed coherence score calculation.

Best number of topics for DoorDash: 6 with coherence score 0.5708
Preparing pyLDAvis visualization for DoorDash...
Saved pyLDAvis visualization to DoorDash_topics.html 🚀

Processing app: GrubHub
Preprocessing reviews...
Completed preprocessing 8844 reviews.
Computing coherence scores...
Training LDA with 2 topics...
Training LDA with 3 topics...
Training LDA with 4 topics...
Training LDA with 5 topics...
Training LDA with 6 topics...
Training LDA with 7 topics...
Training LDA with 8 topics...
Training LDA with 9 topics...
Training LDA with 10 topics...
Training LDA with 11 topics...
Training LDA with 12 topics...
Training LDA with 13 topics...
Training LDA with 14 topics...
Training LDA with 15 topics...
Training LDA with 16 topics...
Training LDA with 17 topics...
Training LDA with 18 topics...
Training LDA with 19 topics...
Training LDA with 20 topics...
Completed coherence score calculation.

Best number of topics for GrubHub: 4 with coherence score 0.5687
Preparing pyLDAvis visualization for GrubHub...
Saved pyLDAvis visualization to GrubHub_topics.html 🚀

✅ All Apps Processed Successfully!
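
The run above selects the number of topics by the raw maximum coherence: 15 for UberEats, 6 for DoorDash, and 4 for GrubHub. When several topic counts score almost the same, a smaller model can be easier to interpret. The sketch below shows one possible alternative rule, choosing the smallest topic count whose score is within a tolerance of the best. The function name `pick_num_topics`, the tolerance value, and the example scores are assumptions for illustration, not part of the notebook.

```python
def pick_num_topics(coherence_scores, tolerance=0.01):
    """Return the (num_topics, score) pair with the fewest topics among those
    whose score is within `tolerance` of the best score.

    coherence_scores: list of (num_topics, score) pairs, the same shape the
    loop above passes to max().
    """
    best_score = max(score for _, score in coherence_scores)
    candidates = [
        (k, score) for k, score in coherence_scores if best_score - score <= tolerance
    ]
    return min(candidates, key=lambda pair: pair[0])


# Made-up scores for illustration only; they are not taken from the run above.
scores = [(2, 0.48), (3, 0.52), (4, 0.556), (5, 0.54), (15, 0.56)]
print(pick_num_topics(scores))               # (4, 0.556)
print(max(scores, key=lambda x: x[1]))       # (15, 0.56)
```

With `tolerance=0` the function reduces to the plain maximum used in the loop, so the two rules can be compared on the same `coherence_scores` list.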