Churn Prediction with Logistic Regression, Choice Models & NLP

⚙️ 1️⃣ Logistic Regression — Churn Prediction

Predict P(churn = 1) via the logit link: ln(p/(1−p)) = β₀ + β₁x₁ + … + βₖxₖ

| Metric | Meaning | Trade-off / Exam Tip |
| --- | --- | --- |
| Precision = TP/(TP+FP) | How accurate are my churn flags? | Higher precision → fewer false alarms |
| Recall = TP/(TP+FN) | How many real churners are caught? | Higher recall → catch more churners |
| AUC | Probability the model ranks a churner higher than a non-churner | 0.5 = random, 1 = perfect |
| ROC Curve | TPR (Recall) vs FPR (1 − Specificity) | Curves that bow toward the top-left are better |
| Threshold | 0.3 → higher recall, lower precision; 0.7 → lower recall, higher precision | Choose threshold using business context |
  • FN = missed churner (big loss) → prefer high recall.
  • β > 0 → increased log‑odds → increased churn probability.
  • exp(β) = odds ratio.
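
A minimal scikit-learn sketch of the ideas above. The data come from make_classification as a synthetic stand-in for a real churn table, and the 0.3 / 0.5 / 0.7 thresholds are only illustrative:

```python
# Minimal sketch: logistic regression churn model, precision / recall / AUC,
# and a business-driven decision threshold. Data are synthetic stand-ins.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced "churn" data (~20% churners)
X, y = make_classification(n_samples=2000, n_features=5, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
print("odds ratios exp(beta):", np.exp(model.coef_[0]).round(2))

p_churn = model.predict_proba(X_test)[:, 1]          # P(churn = 1)
print("AUC:", roc_auc_score(y_test, p_churn).round(3))

for t in (0.3, 0.5, 0.7):                            # lower t -> higher recall, lower precision
    flag = (p_churn >= t).astype(int)
    print(f"t={t}  precision={precision_score(y_test, flag):.2f}  "
          f"recall={recall_score(y_test, flag):.2f}")
```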

🚗 2️⃣ Choice Model (MNL) — Long Format Structure

Rows = individual × alternative.
Example: 151 people × 3 routes = 453 rows.

| Type | Example | Include Rule | Meaning |
| --- | --- | --- | --- |
| ASC | asc_rural, asc_freeway | Drop base (arterial) | Alternative-specific constant: baseline bias vs base |
| ASV | dist_arter, dist_rural, dist_freew | Keep all | Route attributes |
| ISV | male_rural, male_freew | Drop base (male_arter) | Individual trait × route effect |

Utility functions:
U_arterial = 0
U_rural = ASC_rural + β1·dist_rural + β2·vehage_rural + β3·male_rural
U_freew = ASC_freew + β4·dist_freew + β5·vehage_freew + β6·male_freew

P_j = exp(U_j) / Σ_k exp(U_k)

Interpretation:
ASC > 0 → preferred versus base; β < 0 (dist) → dislike long routes. Drop the base for ASC & ISV to avoid collinearity.
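
A minimal numpy sketch of these utilities and the choice probability formula for a single traveller. All coefficient values and attribute levels below are invented for illustration only:

```python
# Minimal sketch: MNL choice probabilities for one traveller.
# All coefficient values and attribute levels are invented for illustration.
import numpy as np

# Illustrative "estimates" (arterial is the base: ASC = 0, no ISV terms)
asc_rural, asc_freew = 0.4, 1.1
b1, b4 = -0.3, -0.25          # dist coefficients (rural, freeway)
b2, b5 = 0.05, 0.10           # vehage × route interactions
b3, b6 = 0.20, 0.35           # male × route interactions

# One traveller's data
dist_rural, dist_freew = 12.0, 8.0
vehage, male = 6.0, 1.0

U_arterial = 0.0
U_rural = asc_rural + b1 * dist_rural + b2 * vehage + b3 * male
U_freew = asc_freew + b4 * dist_freew + b5 * vehage + b6 * male

# P_j = exp(U_j) / sum_k exp(U_k)
U = np.array([U_arterial, U_rural, U_freew])
P = np.exp(U) / np.exp(U).sum()
print(dict(zip(["arterial", "rural", "freeway"], P.round(3))))
```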

✅ Exam Checklist:
☑ Include ASCs (non‑base) ☑ Keep all ASVs ☑ Interact ISVs ☑ Drop 1 baseline ☑ Rows = ind × alts.


🔤 3️⃣ Tokenization — Regex vs spaCy

| Pattern / Tool | Keeps | Loses | Use for |
| --- | --- | --- | --- |
| \w+ | Letters, digits, underscore | Splits “don’t” → “don”, “t” | Keep numbers (e.g., prices, IDs) |
| [A-Za-z']+ | Letters, apostrophe | Drops numbers | Keep contractions (don’t, I’m) |
| spaCy | Tokens + POS + lemma | Slower, heavier than regex | Linguistic pipeline |
| split() | Whitespace-separated chunks | Punctuation stays attached to words | Quick, simple use |

Trade‑offs:
Regex = rule‑based and fast ⚡ (no POS or lemma).
spaCy = linguistically aware 🧠 (gives tokenization, POS, and lemma in one call).
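
A minimal sketch contrasting the regex patterns, split(), and spaCy on one sentence; it assumes the en_core_web_sm model has been downloaded (python -m spacy download en_core_web_sm):

```python
# Minimal sketch: regex vs spaCy tokenization of the same sentence.
import re
import spacy

text = "I don't like the 2024 plan, it costs $59."

print(re.findall(r"\w+", text))        # keeps digits; splits "don't" -> "don", "t"
print(re.findall(r"[A-Za-z']+", text)) # keeps "don't"; drops 2024 and 59
print(text.split())                    # whitespace only; punctuation stays attached

nlp = spacy.load("en_core_web_sm")     # assumes the small English model is installed
for token in nlp(text):
    print(token.text, token.pos_, token.lemma_)  # token + POS + lemma in one pass
```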


🔠 4️⃣ Lemma vs Stem

| Feature | Stemming | Lemmatization |
| --- | --- | --- |
| Rule | Cuts suffix (heuristic) | Uses dictionary + POS |
| POS-aware | No | Yes |
| Example | “running” → “run” | “running” → “run” |
|  | “better” → “better” | “better” → “good” |
| Speed | Fast | Slower |
| Accuracy | Low | High |
| Use for | Quick counting | Model inputs (TF-IDF, sentiment) |

Purpose: unify word variants → smaller vocabulary → better generalization.
Not compulsory but recommended for TF‑IDF and ML models.
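
A minimal NLTK sketch of the contrast (Porter stemmer vs WordNet lemmatizer); it assumes the WordNet data has already been fetched with nltk.download("wordnet"):

```python
# Minimal sketch: Porter stemming vs WordNet lemmatization (NLTK).
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

print(stemmer.stem("running"))                   # 'run'    (suffix chopped by rule)
print(stemmer.stem("better"))                    # 'better' (no rule applies)
print(lemmatizer.lemmatize("running", pos="v"))  # 'run'    (needs the verb POS)
print(lemmatizer.lemmatize("better", pos="a"))   # 'good'   (dictionary lookup)
```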


🧩 5️⃣ TF‑IDF — Formulae & Behavior

TF: term frequency in a document
Raw = count. Sublinear = 1 + log(TF).

IDF: inverse document frequency
= log((1 + N) / (1 + df)) + 1, where N = number of documents and df = number of documents containing the term.

TF‑IDF = TF × IDF

| Concept | Meaning |
| --- | --- |
| High TF, High IDF | Distinctive keyword |
| High TF, Low IDF | Common word (reduced weight) |
| Sublinear TF | Reduces the gap between 100 and 10 occurrences (repetition ≠ 10× meaning) |
| Normalization | Makes long and short documents comparable (L2 norm) |
| POS filter before TF-IDF | Keep NOUN/VERB/ADJ → less noise |
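
A minimal scikit-learn sketch wiring these settings into TfidfVectorizer; the three documents are invented:

```python
# Minimal sketch: TF-IDF with the settings described above
# (sublinear TF, smoothed IDF, L2 normalization).
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "customer churn churn churn risk",
    "customer support ticket",
    "pricing plan for customer",
]

vec = TfidfVectorizer(
    sublinear_tf=True,   # TF -> 1 + log(TF)
    smooth_idf=True,     # IDF = log((1 + N) / (1 + df)) + 1
    norm="l2",           # rows scaled to unit length -> long/short docs comparable
)
X = vec.fit_transform(docs)
print(vec.get_feature_names_out())
print(X.toarray().round(2))  # "churn" = high TF, high IDF; "customer" is down-weighted
```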

📊 6️⃣ Long Format & Interactions (Choice Model)

Structure: each person has multiple rows (one per alternative).

  • choice = 1 if that alternative is chosen.
  • ids link rows for the same person.
  • alt = arterial / rural / freeway.

Interactions: vehage_freew = vehage × (alt == freeway)
Creates within‑person variation → required for ISVs.

✅ Drop base alternative for ASC & ISV (arterial).
✅ Keep all ASVs.
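
A minimal pandas sketch of the long-format layout and one ISV interaction; the ids, distances, and vehicle ages are invented:

```python
# Minimal sketch: long-format rows (one per person × alternative) and an ISV interaction.
import pandas as pd

long = pd.DataFrame({
    "id":     [1, 1, 1, 2, 2, 2],                  # links the rows for the same person
    "alt":    ["arterial", "rural", "freeway"] * 2,
    "choice": [0, 0, 1, 1, 0, 0],                  # 1 on the chosen alternative
    "dist":   [10.0, 12.0, 8.0, 6.0, 9.0, 7.0],    # ASV: varies across alternatives
    "vehage": [6, 6, 6, 2, 2, 2],                  # ISV: constant within a person
})

# Interactions give ISVs within-person variation (arterial base is dropped)
long["vehage_rural"] = long["vehage"] * (long["alt"] == "rural").astype(int)
long["vehage_freew"] = long["vehage"] * (long["alt"] == "freeway").astype(int)
print(long)
```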


📈 7️⃣ ROC / Precision–Recall Quick Block

  • ROC: TPR vs FPR. AUC = model ranking ability.
  • PR curve: Precision vs Recall — best for imbalanced data.
  • High recall → catch more positives. High precision → few false alarms.
  • Lower threshold → higher recall, lower precision.
  • In churn (cost of missed churners high) → prefer recall.
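
A minimal sketch computing both curves with scikit-learn on synthetic imbalanced data; in practice you would pass your own labels and predicted probabilities:

```python
# Minimal sketch: ROC and precision-recall curves on synthetic imbalanced data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, precision_recall_curve, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
p = LogisticRegression().fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

fpr, tpr, _ = roc_curve(y_te, p)                   # ROC: TPR vs FPR
prec, rec, thr = precision_recall_curve(y_te, p)   # PR: clearer view when classes are imbalanced
print("ROC AUC:", round(auc(fpr, tpr), 3))
# Sweeping the threshold down moves along the PR curve toward higher recall, lower precision.
```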

🔍 8️⃣ Regex Pattern Reminders

| Pattern | Matches | Use case |
| --- | --- | --- |
| \w+ | letters + digits + underscore | Keep numbers |
| [A-Za-z']+ | letters + apostrophe | Keep contractions |
| \d+ | digits | Extract numbers |
| [A-Za-z0-9_]+ | letters + digits + underscore | Identifiers |
| [^A-Za-z0-9]+ | non-alphanumeric | Split on punctuation |

Trade‑off:
Preserve apostrophes for English contractions; decide whether numbers matter (years, prices).
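
A minimal sketch applying each pattern from the table to one invented string:

```python
# Minimal sketch: the regex patterns above applied to one string.
import re

s = "Order #4821: don't renew the $59 plan in 2025!"

print(re.findall(r"\w+", s))           # letters + digits + underscore
print(re.findall(r"[A-Za-z']+", s))    # keeps don't; drops the numbers
print(re.findall(r"\d+", s))           # ['4821', '59', '2025']
print(re.split(r"[^A-Za-z0-9]+", s))   # split on runs of punctuation / spaces
```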