The false positive next to the missed fraud. The one the model caught is the one that mattered least.
I trained a LightGBM model on the public Kaggle credit card fraud dataset. 284,807 transactions. The fraud rate was 0.17%, roughly a 578 to 1 class imbalance. After tuning, it caught 84% of fraud while flagging just 0.28% of all transactions. ROC-AUC over 0.95. PR-AUC over 0.90.
Then I asked a different question. Not “is the model good” but “what does this cost a real payments business.”
At a million transactions a month, 0.28% is 2,800 flagged transactions. At a 0.17% fraud rate and 84% recall, about 1,430 of them are real fraud. The rest, roughly 1,370, are good customers at checkout watching their card decline.
The model is working. The business is bleeding.
A blocked legitimate customer costs you their acquisition spend plus their lifetime value. A chargeback costs you the disputed amount plus a fee. For a fintech, that means 100 to 300 dollars of CAC on the line versus 50 to 100 dollars on a typical chargeback. LTV widens the gap. The math rarely favors blocking aggressively.
The asymmetry compounds over time. A chargeback is a single event with a known dollar value. A blocked customer is a lifetime of revenue gone. They tell people. They never come back. The cost lands in marketing’s CAC report two quarters later, far from the fraud dashboard that created it.
This is one reason chasing 95% accuracy is the wrong fight. The accuracy number hides the asymmetry.
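To see how little the accuracy number says, here is a minimal sketch that scores a do-nothing baseline, one that approves every transaction, using the dataset size and fraud rate quoted above. The function name is illustrative, not part of the portfolio code.

```python
def always_approve_metrics(n_transactions: int, fraud_rate: float) -> dict:
    """Score a baseline that never flags anything."""
    n_fraud = round(n_transactions * fraud_rate)
    n_legit = n_transactions - n_fraud
    # Every legitimate transaction counts as correct; every fraud is missed.
    accuracy = n_legit / n_transactions
    recall = 0.0  # no fraud caught
    return {"n_fraud": n_fraud, "accuracy": accuracy, "recall": recall}


baseline = always_approve_metrics(284_807, 0.0017)
print(f"Accuracy: {baseline['accuracy']:.2%}, recall: {baseline['recall']:.0%}")
```

The baseline scores above 99.8% accuracy while catching no fraud at all, which is why accuracy alone cannot settle the threshold question.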

PR curve from the portfolio model. Each point is a threshold. Each threshold is a different business outcome.
A precision-recall curve shows what you give up at each threshold. Move the threshold down, you catch more fraud and flag more good customers. Move it up, you flag fewer transactions but miss more fraud. The curve is the whole conversation.
On the Kaggle dataset I worked with, the model hit 84% recall at a 0.28% overall alert rate. Push recall to 90% and the alert rate climbs. Push it to 95% and you are flagging closer to 1.5% of all traffic. At a million transactions a month, the difference between 0.28% and 1.5% is roughly 12,200 extra customers held up at checkout.
That is the trade. Not “is the model accurate” but “how many good customers can you afford to block to catch a few more fraudsters.”
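A minimal sketch of that trade, assuming you have labels and model scores for a held-out set. The toy data below is made up for illustration and is not from the Kaggle dataset; the helper name is hypothetical.

```python
def alert_rate_and_recall(y_true, y_scores, threshold):
    """Return (alert_rate, recall) when flagging scores >= threshold."""
    flagged = [s >= threshold for s in y_scores]
    n_flagged = sum(flagged)
    n_fraud = sum(y_true)
    caught = sum(1 for y, f in zip(y_true, flagged) if y == 1 and f)
    return n_flagged / len(y_true), caught / n_fraud


# Toy data, made up for illustration: 1 = fraud, 0 = legitimate.
toy_labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
toy_scores = [0.01, 0.02, 0.05, 0.10, 0.30, 0.45, 0.40, 0.60, 0.70, 0.95]

sweep = {t: alert_rate_and_recall(toy_labels, toy_scores, t) for t in (0.65, 0.50, 0.35)}
for t, (alert_rate, recall) in sweep.items():
    print(f"threshold={t:.2f}  alert_rate={alert_rate:.0%}  recall={recall:.0%}")
```

On this toy set, dropping the threshold from 0.65 to 0.50 flags one more transaction without catching any more fraud. Dropping it to 0.35 catches the last fraud but flags half of all traffic.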
The full LightGBM model and PR analysis are on my GitHub, and the full walkthrough of the model behind these numbers covers the imbalanced training setup.
Risk teams report up to compliance or to a CFO worried about fraud loss. Their dashboards track fraud caught and chargebacks prevented. Those are the numbers the board sees.
Nobody on the risk side owns the count of good customers blocked. Marketing might track it as declined checkout abandonment, and support sees it as card-declined tickets. The growth team watches CAC creep up and blames the ad channel.
The data exists. It just sits in different tools owned by different people. So the threshold moves one way only: toward more blocking. The team paying the cost has no hand on the lever.
This is the gap most fraud detection conversations miss. The model is rarely the problem. The org chart is.
If your risk team can quote fraud caught but can’t quote good customers blocked, that’s the gap. Book a call.
You weight each error by what it costs your business.
Define two costs. Let C_fp be the cost of one false positive (blocked good customer) and C_fn the cost of one false negative (missed fraud). For each candidate threshold, the model produces a false positive rate (FPR) and a false negative rate (FNR) on a held-out set. Expected cost per transaction is:
Expected cost = C_fp * FPR * P(legit) + C_fn * FNR * P(fraud)
You pick the threshold that minimizes it. Multiply by monthly volume to get the total monthly cost.

Two cost columns. One slider. The optimum is where total cost is lowest, not where recall is highest.
Here is the math with real numbers. Suppose you process one million transactions a month at a 0.17% fraud rate. Average fraud transaction is 150 dollars. Average blocked customer costs you 80 dollars in lost LTV plus support time.
At threshold A: FPR is 0.28%, FNR is 16%. Expected monthly cost is 2,795 blocked customers times 80 dollars plus 272 missed fraud transactions times 150 dollars. That is 223,600 plus 40,800. Total: 264,400 dollars.
At threshold B (more lenient): FPR is 0.1%, FNR is 25%. Expected monthly cost is 998 blocked times 80 plus 425 missed times 150. That is 79,840 plus 63,750. Total: 143,590 dollars.
Threshold B costs less. The model did not change.
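The arithmetic for both thresholds can be checked with a short sketch. It uses only the figures from the example above; the function name and the rounding to whole transactions are my own choices.

```python
def monthly_cost(volume, fraud_rate, fpr, fnr, cost_fp, cost_fn):
    """Expected monthly cost: blocked good customers plus missed fraud."""
    n_fraud = round(volume * fraud_rate)
    n_legit = volume - n_fraud
    blocked = round(n_legit * fpr)  # legitimate transactions flagged
    missed = round(n_fraud * fnr)   # fraud transactions let through
    return blocked * cost_fp + missed * cost_fn


cost_a = monthly_cost(1_000_000, 0.0017, fpr=0.0028, fnr=0.16, cost_fp=80, cost_fn=150)
cost_b = monthly_cost(1_000_000, 0.0017, fpr=0.001, fnr=0.25, cost_fp=80, cost_fn=150)
print(f"Threshold A: ${cost_a:,}")  # $264,400
print(f"Threshold B: ${cost_b:,}")  # $143,590
```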
Here is the snippet that does the search:
```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Probabilities from your trained LightGBM model on a held-out set
y_true = np.array([...])    # 0 = legitimate, 1 = fraud
y_scores = np.array([...])  # model.predict_proba(X_test)[:, 1]

# Business costs (set these from your finance and growth teams)
cost_fp = 80   # blocked good customer: LTV loss plus support time
cost_fn = 150  # missed fraud transaction: chargeback plus fees

_, _, thresholds = precision_recall_curve(y_true, y_scores)

best_t, best_cost = None, float("inf")
for t in thresholds:
    preds = (y_scores >= t).astype(int)
    fp = ((preds == 1) & (y_true == 0)).sum()
    fn = ((preds == 0) & (y_true == 1)).sum()
    total_cost = fp * cost_fp + fn * cost_fn
    if total_cost < best_cost:
        best_cost, best_t = total_cost, t

print(f"Optimal threshold: {best_t:.4f}")
print(f"Expected monthly cost at this threshold: ${best_cost:,.0f}")
```
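Run this on a representative held-out set. The cost it prints is the total over that set, so scale it to your monthly volume before quoting it as a monthly figure. Update cost_fp and cost_fn as your business changes.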
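On a large held-out set, recomputing the counts for every threshold gets slow. Here is a sketch of the same search done in one sorted pass, in plain Python. The function name is hypothetical. One difference to note: when two thresholds tie on cost, this version keeps the higher one.

```python
def best_threshold(y_true, y_scores, cost_fp, cost_fn):
    """Return (threshold, cost) minimizing fp * cost_fp + fn * cost_fn.

    Flags a transaction when score >= threshold. Candidate thresholds are
    the distinct scores, visited from highest to lowest in one sorted pass.
    """
    pairs = sorted(zip(y_scores, y_true), reverse=True)
    total_fraud = sum(y_true)
    fp = tp = 0
    best_t, best_cost = None, float("inf")
    for i, (score, label) in enumerate(pairs):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Only evaluate once every transaction tied at this score is counted.
        is_last_of_tie = i + 1 == len(pairs) or pairs[i + 1][0] != score
        if is_last_of_tie:
            cost = fp * cost_fp + (total_fraud - tp) * cost_fn
            if cost < best_cost:
                best_t, best_cost = score, cost
    return best_t, best_cost


# Toy data, made up for illustration: 1 = fraud, 0 = legitimate.
demo_labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
demo_scores = [0.01, 0.02, 0.05, 0.10, 0.30, 0.45, 0.40, 0.60, 0.70, 0.95]
demo_t, demo_cost = best_threshold(demo_labels, demo_scores, cost_fp=80, cost_fn=150)
print(f"Optimal threshold: {demo_t:.2f}, cost on this set: ${demo_cost:,}")
```

With these toy costs, the cheapest cutoff lets one fraud through instead of blocking two good customers to catch it.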
The cost weights change with your business model. A bank pays the chargeback. A marketplace eats some of it. A payment processor charges the merchant and walks away. A B2B SaaS fintech might rarely see fraud at all but cares deeply about onboarding friction. Each one has a different optimal threshold on the same model.

Same model, four businesses, four cutoffs. The tolerance is the lever, not the algorithm.
| Business type | Fraud loss tolerance | Customer experience priority | Typical FPR target |
|---|---|---|---|
| High-volume marketplace | Higher (1 to 2% of GMV) | High (repeat purchase economy) | 0.5% to 1.5% |
| Consumer fintech app | Low (regulator scrutiny) | High (acquisition is expensive) | Under 0.3% |
| Card-issuing bank | Very low (liability) | Medium (sticky customers) | 0.3% to 0.5% |
| B2B payment processor | Moderate (merchant absorbs cost) | Medium (contract-driven) | 0.5% to 1% |
The takeaway is not a number. It is that the same LightGBM model works for all four with a different cutoff. If your fraud team is using a default threshold across all customer segments, ask why.
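A small sketch makes the point concrete: the same scores, two different cost pairs, two different cutoffs. The toy data and dollar figures are hypothetical, chosen only to show the effect, and are not estimates for any business type in the table.

```python
def cheapest_cutoff(y_true, y_scores, cost_fp, cost_fn):
    """Brute-force the cutoff (flag when score >= cutoff) with the lowest cost."""
    def cost_at(t):
        fp = sum(1 for y, s in zip(y_true, y_scores) if s >= t and y == 0)
        fn = sum(1 for y, s in zip(y_true, y_scores) if s < t and y == 1)
        return fp * cost_fp + fn * cost_fn
    return min(sorted(set(y_scores)), key=cost_at)


# Toy scores from one model, made up for illustration: 1 = fraud, 0 = legitimate.
labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.01, 0.02, 0.05, 0.10, 0.30, 0.45, 0.40, 0.60, 0.70, 0.95]

# Hypothetical (cost_fp, cost_fn) pairs in dollars; replace with your own.
segment_costs = {
    "blocked customer is expensive": (200, 50),
    "missed fraud is expensive": (20, 500),
}
cutoffs = {name: cheapest_cutoff(labels, scores, fp_cost, fn_cost)
           for name, (fp_cost, fn_cost) in segment_costs.items()}
for name, cutoff in cutoffs.items():
    print(f"{name}: cutoff {cutoff:.2f}")
```

The first cost pair settles on a cutoff of 0.70 and the second on 0.40, without retraining anything.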
Four questions. Each one isolates a piece of the gap.
If you would rather have someone tune this for you, I work on this with founders directly.
A legitimate transaction the model flags as fraud. The customer gets declined or held up at checkout even though the purchase was real. False positives are the hidden cost of aggressive fraud thresholds.
Fraud datasets are heavily imbalanced. A model can score 99% accuracy by predicting everything as not-fraud. False positive rate captures how often the model wrongly blocks legitimate customers, which is the metric a payment platform actually feels.
It depends on your fraud loss tolerance and customer acquisition cost. A marketplace can usually tolerate 0.5 to 2% false positives, while a high-trust fintech app may need under 0.3%. There is no universal target.
Tune the threshold, not the model. Most teams overweight model accuracy and underweight threshold selection. Giving up five points of recall at the right threshold often saves more in retained customers than it costs in missed fraud.
Three numbers at minimum: recall on fraud, false positive rate, and dollar value of blocked legitimate transactions per million processed. The third one is the metric most teams do not track.
Neither alone. The right answer is the cost-weighted blend. If a missed fraud costs 50 USD and a blocked customer costs 200 USD in lifetime value, precision wins. If the ratio inverts, recall wins.
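One way to make that blend concrete is a break-even probability. This is a sketch that assumes the model's scores are calibrated fraud probabilities, which a raw LightGBM output may not be. The function name is hypothetical.

```python
def break_even_probability(cost_fp: float, cost_fn: float) -> float:
    """Fraud probability above which blocking is cheaper than approving.

    Blocking costs (1 - p) * cost_fp in expectation; approving costs
    p * cost_fn. The two are equal at p = cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


# Blocked customer costs 200 USD, missed fraud costs 50 USD.
precision_favored = break_even_probability(cost_fp=200, cost_fn=50)
# The inverted ratio.
recall_favored = break_even_probability(cost_fp=50, cost_fn=200)
print(f"Block only above p = {precision_favored:.2f}")
print(f"Block above p = {recall_favored:.2f}")
```

With a 200 USD blocked customer and a 50 USD missed fraud, blocking only pays off when the fraud probability is above 0.80. Invert the costs and the bar drops to 0.20.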