Model Methodology

An overview of the LightGBM default probability classifier and financial indicators.

DistressSignal's scoring engine is built on gradient-boosted decision trees. Specifically, we use LightGBM (Light Gradient Boosting Machine) for its training efficiency, scalability, and ability to capture non-linear relationships across complex financial datasets.

Signal Categories

Our classifier draws on a proprietary combination of financial signals spanning four broad categories: capital structure and leverage dynamics, operating performance and cash generation, liquidity and working capital positioning, and market-implied sentiment indicators. Each signal category is weighted dynamically based on the company’s filing cycle, sector classification, and prevailing macro regime. The exact composition and weighting of these signals constitute proprietary intellectual property and are not disclosed publicly.

The model is retrained on a rolling basis as new SEC filings are ingested, so that default probability scores reflect the most recent financial disclosures rather than stale trailing data.

Validation & Performance

The prediction engine is validated using a rolling walk-forward out-of-sample testing framework to prevent look-ahead bias. Under this framework, all parameters used to score a given filing period are trained exclusively on data from before that period, so no future information leaks into the scoring function.
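As an illustration of the walk-forward idea, the following minimal sketch generates chronological train/test splits in which each test period is preceded only by earlier periods. The function name `walk_forward_splits`, the `min_train` parameter, and the period labels are hypothetical and are not taken from the DistressSignal codebase.

```python
from typing import Iterator, List, Sequence, Tuple


def walk_forward_splits(
    periods: Sequence[str], min_train: int = 1
) -> Iterator[Tuple[List[str], str]]:
    """Yield (training_periods, test_period) pairs in chronological order.

    Each test period is paired only with strictly earlier periods, so a model
    fitted on the training side never sees data from the period it scores.
    """
    # Labels such as "2019" or "2019Q4" sort chronologically as plain strings.
    ordered = sorted(set(periods))
    for i in range(min_train, len(ordered)):
        yield ordered[:i], ordered[i]


# Hypothetical filing periods, for illustration only.
splits = list(walk_forward_splits(["2016", "2017", "2018", "2019"], min_train=2))
for train, test in splits:
    print(f"train on {train} -> score {test}")
```

In a setup like this, the model would be refitted once per split, and only the scores produced for the held-out test period would count toward the reported out-of-sample metrics.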

On the out-of-sample validation period (2018–2025), the model achieved a ROC-AUC of 0.914 (91.4%), with precision and recall characteristics suitable for institutional-grade credit screening. Full backtest results and confusion matrix breakdowns are available to Premium subscribers upon request.
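For readers unfamiliar with the metric, ROC-AUC can be read as the probability that a randomly chosen defaulting company receives a higher score than a randomly chosen non-defaulting one. The sketch below computes it directly from that definition on made-up toy data. It is not the evaluation code behind the figure above, and the function name `roc_auc` is an assumption for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. The pairwise loop is O(n_pos * n_neg),
    which is fine for illustration but slow for large datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (made up): 1 = defaulted, 0 = did not default.
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
auc = roc_auc(labels, scores)
print(auc)  # 3 of the 4 (positive, negative) pairs are ordered correctly -> 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of defaulters from non-defaulters.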

Model Limitations & Risks

While the classifier offers robust predictive signals, users must be aware of the following inherent limitations:

  • Exogenous Shocks: The model operates on trailing filing data and cannot anticipate sudden, unflagged operational disruptions such as fraud disclosures or catastrophic events.
  • Out-of-Court Restructuring: A company may restructure debt privately without triggering a Chapter 11 filing, which would not register as a default event in our training labels.
  • Macro Regime Shifts: Model accuracy is calibrated on historical interest rate and credit environments. Significant macroeconomic regime changes may alter historical default behavior patterns.
  • No Investment Advice: DistressSignal scores are quantitative research outputs only. They do not constitute investment advice, credit ratings, or solicitations to buy or sell any security.

Risk Classifications

Our system classifies companies into three risk tiers based on their danger score percentile:

CRITICAL (Score > 30)

Severe structural risk. High probability of debt restructuring, covenant breaches, or Chapter 11 filing within 12 months.

MEDIUM (Score 15 - 30)

Elevated risk. Negative profitability trends or tightening liquidity require close monitoring.

SAFE (Score < 15)

Stable financial positioning. Strong equity cushion and sufficient cash flows relative to debt obligations.

For example, a mock gauge for a critical rating would display:

Score: 42 (CRITICAL)
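The tier thresholds listed above can be expressed as a small lookup function. This is a minimal sketch: the name `risk_tier` is hypothetical, and it assumes that the boundary values 15 and 30 both fall in the MEDIUM tier, as the "Score 15 - 30" label suggests.

```python
def risk_tier(score: float) -> str:
    """Map a danger score to one of the three published risk tiers."""
    if score > 30:
        return "CRITICAL"
    if score >= 15:  # assumption: 15 and 30 are both treated as MEDIUM
        return "MEDIUM"
    return "SAFE"


for example_score in (42, 30, 15, 9.5):
    print(example_score, risk_tier(example_score))
```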

Data Integrity & Transparency

To maintain institutional-grade credibility, the data ingested by our LightGBM model is sourced from official and highly verified feeds:

  • SEC EDGAR System: Direct programmatic collection of 10-K (Annual Reports) and 10-Q (Quarterly Reports) SEC filings to capture balance sheet assets, liabilities, operating income, and interest expenses.
  • Federal Reserve Economic Data (FRED): Used to compile macroeconomic overlays, including high-yield credit spreads (e.g., ICE BofA High Yield Index Option-Adjusted Spread) and interest rate curves.
  • Compustat / CRSP Data (Historical Training): Used exclusively for baseline model validation, parameter calibration, and backtesting against historical defaults over a 25-year lookback period.

Because live scoring relies on primary public filings rather than third-party aggregators, we minimize data latency and ensure that every Distress Report is fully auditable back to the company’s official public records.