Altman Z-Score vs Machine Learning: Which Predicts Bankruptcy Better?

The Altman Z-Score is the most widely cited corporate distress model in finance. Published by Edward Altman in 1968, it has been taught in business schools for over 50 years and remains a standard reference in credit analysis.

It also has well-documented limitations when applied to the modern corporate landscape — limitations that gradient-boosted machine learning models are specifically designed to address.

Our model achieves a meaningfully higher out-of-sample ROC-AUC than the Z-Score on equivalent validation data. Here is an objective breakdown of why, and what it means in practice.

What the Altman Z-Score Actually Measures

The Z-Score combines five accounting ratios into a single number using coefficients calibrated in 1968 on a sample of 66 manufacturing companies, 33 bankrupt and 33 solvent. The formula weights working capital, retained earnings, and operating profit (each scaled by total assets), the market value of equity relative to total liabilities, and asset turnover.

The composite score maps to a zone classification (safe, grey, or distress) depending on where it falls relative to fixed thresholds.
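As a minimal sketch, the calculation can be written out using the coefficients and cut-offs from the published 1968 formula (1.2, 1.4, 3.3, 0.6, 1.0, with thresholds at 1.81 and 2.99). The `Financials` container, its field names, and the example figures are illustrative assumptions, not part of any particular data feed.

```python
from dataclasses import dataclass


@dataclass
class Financials:
    """Hypothetical container for the seven inputs the formula needs."""
    working_capital: float
    retained_earnings: float
    ebit: float
    market_value_equity: float
    total_liabilities: float
    sales: float
    total_assets: float


def altman_z(f: Financials) -> float:
    """Original 1968 Z-Score for publicly traded manufacturers."""
    x1 = f.working_capital / f.total_assets          # liquidity
    x2 = f.retained_earnings / f.total_assets        # cumulative profitability
    x3 = f.ebit / f.total_assets                     # operating profitability
    x4 = f.market_value_equity / f.total_liabilities # market value vs debt
    x5 = f.sales / f.total_assets                    # asset turnover
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


def zone(z: float) -> str:
    """Map a score to the three zones using the published cut-offs."""
    if z > 2.99:
        return "safe"
    if z >= 1.81:
        return "grey"
    return "distress"


# Made-up figures (in millions) purely for illustration.
example = Financials(
    working_capital=50, retained_earnings=100, ebit=40,
    market_value_equity=300, total_liabilities=200,
    sales=500, total_assets=500,
)
z = altman_z(example)
print(round(z, 3), zone(z))
```

Every input comes from a balance sheet, an income statement, and a share price, which is a large part of why the score remains so widely used.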

For 1968, this was a genuine methodological advance. It brought quantitative rigour to what had previously been largely qualitative credit judgements.

The Problems With the Altman Z-Score in 2026

1. It was calibrated on a narrow historical sample

The original dataset was 66 companies, all manufacturers, with the bankruptcies in the sample filed between 1946 and 1965. Capital structures, business models, and default dynamics have changed fundamentally in the intervening decades. Technology companies, platform businesses, asset-light service models, and highly leveraged private equity-backed entities behave in ways that the original sample simply does not reflect.

Altman himself later published re-estimated variants for private companies (the Z′-Score) and for non-manufacturers (the Z″-Score), but these remain fixed-coefficient linear formulas fitted to samples that are now decades old.

2. It assumes linear relationships

The Z-Score treats the relationship between each ratio and default probability as linear and additive. In practice, financial distress is highly non-linear. Small differences in debt serviceability at critical thresholds represent dramatically different risk profiles — differences that a linear additive model cannot adequately capture.
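A small sketch makes the linearity point concrete. The 3.3 weight is the Z-Score's published coefficient on EBIT / total assets. The step-shaped `threshold_risk` function and its cut-offs are hypothetical, chosen only to show the shape of a threshold effect, and are not taken from any fitted model.

```python
def linear_contribution(ratio: float, coefficient: float = 3.3) -> float:
    """Contribution of one ratio to a linear additive score."""
    return coefficient * ratio


def threshold_risk(interest_coverage: float) -> float:
    """Hypothetical step-shaped default risk; cut-offs are illustrative only."""
    if interest_coverage < 1.0:
        return 0.60   # cannot cover interest from operating earnings
    if interest_coverage < 1.5:
        return 0.25   # thin cushion
    return 0.05       # comfortable coverage


# In a linear model, a 0.02 drop in the ratio moves the score by the same
# amount whether the company starts out healthy or fragile.
healthy_change = linear_contribution(0.20) - linear_contribution(0.18)
fragile_change = linear_contribution(0.03) - linear_contribution(0.01)
print(round(healthy_change, 3), round(fragile_change, 3))

# With a threshold effect, the same 0.3 drop in coverage is harmless at a
# high level and severe near the critical cut-off.
print(threshold_risk(5.3) - threshold_risk(5.0))
print(threshold_risk(0.9) - threshold_risk(1.2))
```

The linear term produces identical changes in both cases, while the step function reacts only when a cut-off is crossed.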

3. It ignores most market-derived signals

The Z-Score uses market capitalisation as one input but does not incorporate short interest, implied volatility, or credit market indicators — signals that frequently lead accounting disclosures by several months. A model that ignores what informed market participants are saying is leaving a significant source of forward-looking information on the table.

4. It misses interaction effects

The most dangerous companies are rarely just high-leverage in isolation. They are high-leverage, experiencing revenue deterioration, facing a near-term debt maturity, and attracting elevated short interest, all at the same time. The combination of signals matters enormously, and a linear additive formula cannot model such interactions by design.
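The difference can be sketched with four binary flags. In an additive score, switching on one flag always adds the same amount, whatever the other flags say. A rule that looks at the conjunction, similar to one path through a decision tree, can react far more strongly when everything lines up. The weights, flag names, and risk levels below are hypothetical.

```python
FLAGS = ("high_leverage", "revenue_decline",
         "near_term_maturity", "elevated_short_interest")

# Hypothetical weights for an additive score.
WEIGHTS = {"high_leverage": 2.0, "revenue_decline": 1.5,
           "near_term_maturity": 1.0, "elevated_short_interest": 0.5}


def additive_score(active: set) -> float:
    """Each active flag adds a fixed amount, independent of the others."""
    return sum(WEIGHTS[name] for name in active)


def conjunction_score(active: set) -> float:
    """Hypothetical tree-like rule: risk jumps only when all flags coincide."""
    if all(name in active for name in FLAGS):
        return 0.70
    return 0.05 + 0.05 * len(active)


others = {"high_leverage", "revenue_decline", "near_term_maturity"}
short = {"elevated_short_interest"}

# Marginal effect of the short-interest flag, alone and in context.
add_alone = additive_score(short) - additive_score(set())
add_context = additive_score(others | short) - additive_score(others)
conj_alone = conjunction_score(short) - conjunction_score(set())
conj_context = conjunction_score(others | short) - conjunction_score(others)

print(add_alone, add_context)
print(round(conj_alone, 2), round(conj_context, 2))
```

In the additive case the two marginal effects are identical. In the conjunction case the same flag matters much more when the other three are already on.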

How Machine Learning Addresses These Limitations

Our model was trained on hundreds of corporate defaults spanning 1998 to 2025 — covering multiple economic cycles, interest rate regimes, sector rotations, and market environments. This breadth allows the model to learn default patterns across contexts that the original Z-Score dataset could not anticipate.

Key structural advantages:

Non-linearity: Gradient-boosted decision trees naturally capture non-linear relationships and threshold effects without requiring manual specification
Signal breadth: The model incorporates a wider set of inputs including market-implied signals that lead accounting data
Interaction learning: The tree architecture automatically discovers which combinations of signals are most predictive without manual feature engineering
Continuous retraining: The model is updated as new filings are ingested, rather than relying on coefficients fixed decades ago

On out-of-sample validation covering the period 2018–2025, the performance gap relative to the Z-Score is substantial — particularly on the false positive rate, which matters in practice because too many false alarms destroy analytical credibility and create noise.
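For reference, the false positive rate is the share of companies that did not default but were flagged anyway. The sketch below computes it from two parallel lists. The function name and the sample labels are illustrative assumptions, not our validation data.

```python
def false_positive_rate(defaulted, flagged):
    """FPR = false positives / (false positives + true negatives)."""
    fp = sum(1 for d, f in zip(defaulted, flagged) if f and not d)
    tn = sum(1 for d, f in zip(defaulted, flagged) if not f and not d)
    if fp + tn == 0:
        raise ValueError("FPR is undefined without non-defaulting companies")
    return fp / (fp + tn)


# Made-up example: 8 companies, 2 of which defaulted.
defaulted = [1, 0, 0, 0, 0, 1, 0, 0]
flagged   = [1, 1, 0, 0, 1, 0, 0, 0]

fpr = false_positive_rate(defaulted, flagged)
print(round(fpr, 3))
```

Here two of the six healthy companies were flagged, so a third of the healthy universe would land on an analyst's desk as a false alarm.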

Performance Comparison

Figure: Out-of-Sample Model Performance (2018–2025 Holdout). Metric: ROC-AUC score, higher is better. Both models evaluated on identical out-of-sample validation data. Altman Z-Score benchmark based on published out-of-sample studies on modern corporate datasets.
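ROC-AUC can be read as the probability that a randomly chosen defaulter is ranked as riskier than a randomly chosen survivor. The sketch below computes it directly from that definition, counting ties as half. Because a lower Z-Score means higher risk, the Z-Score is negated before being used as a risk ranking. The labels and scores are made-up illustrative values, not our holdout data.

```python
def roc_auc(labels, risk_scores):
    """Pairwise ROC-AUC: share of (defaulter, survivor) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, risk_scores) if y == 1]
    neg = [s for y, s in zip(labels, risk_scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defaulter and one survivor")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = defaulted, 0 = survived.
labels   = [1, 1, 0, 0, 0]
z_scores = [1.2, 2.5, 3.4, 2.1, 4.0]

# Lower Z means riskier, so negate to obtain a "higher is riskier" ranking.
z_risk = [-z for z in z_scores]
auc = roc_auc(labels, z_risk)
print(round(auc, 3))
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in sample size, so a rank-based implementation would be preferable on large universes.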

A Concrete Example: Hertz 2019

Running both approaches against Hertz’s Q4 2019 financials illustrates the difference clearly.

The Altman Z-Score placed Hertz firmly in the distress zone — which is directionally correct, but the Z-Score had been flagging Hertz as distressed for several years prior to the filing, making it difficult to determine whether this was an imminent risk or a chronic condition.

Our model assigned Hertz a high danger score on its Q4 2019 financials, roughly five months before its Chapter 11 filing in May 2020. Crucially, the model also surfaces the primary drivers behind the score: not just a composite number, but an interpretable signal breakdown showing which factors are most responsible for the elevated risk. This allows analysts to validate the model’s reasoning against their own judgement rather than treating the output as a black box.

When the Altman Z-Score Is Still Useful

The Z-Score remains a reasonable first-pass screen and retains practical value because it is:

Simple to calculate from any publicly available balance sheet
Widely understood and accepted in credit memoranda and investment committee presentations
Free to compute, with no proprietary data requirements
A credible benchmark for contextualising ML outputs

We report the Altman Z-Score alongside our danger scores in every Distress Report — not because it outperforms, but because it is a recognisable reference point that provides useful context for the primary signal.

Conclusion

The Altman Z-Score is a remarkable achievement for a model designed with 1968 data and 1968 computational tools. It advanced credit analysis in a meaningful way and its continued relevance is a testament to the underlying insight behind its construction.

But decades of accumulated default data, modern machine learning architectures, and access to market signals that did not exist in 1968 have fundamentally shifted what is achievable in systematic credit screening. For institutions monitoring large universes of companies on a frequent basis, modern gradient-boosted classifiers represent the current state of the art.


For educational and research purposes only. Not financial advice. Model performance on historical data is not indicative of future results.