Source code → github.com/pon00050/kr-beneish
The Beneish M-Score is one of the oldest and most widely cited fraud-screening models in accounting. Beneish (1999) estimated an 8-variable probit model on a sample of US GAAP filers, derived the coefficients, and proposed a threshold (M > -1.78) above which a company was statistically more likely to have manipulated earnings. It is taught in every forensic accounting course. It is included in every off-the-shelf risk-screening library.
It also assumes things about the financial statements that Korean IFRS filings do not always provide. Apply the formula naively to a DART download and you will silently produce wrong scores for roughly a fifth of KOSDAQ.
This library is the version that handles the differences explicitly.
What the Score Actually Is
The M-Score is a weighted sum of eight ratios, each computed from one year of financials versus the prior year:
M = -4.84 + 0.920·DSRI + 0.528·GMI + 0.404·AQI + 0.892·SGI
+ 0.115·DEPI - 0.172·SGAI + 4.679·TATA - 0.327·LVGI
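The weighted sum can be sketched directly. This is a minimal illustration of the formula above with the published coefficients, not this library's internal API:

```python
# Published Beneish (1999) coefficients for the eight components.
WEIGHTS = {
    "DSRI": 0.920, "GMI": 0.528, "AQI": 0.404, "SGI": 0.892,
    "DEPI": 0.115, "SGAI": -0.172, "TATA": 4.679, "LVGI": -0.327,
}
INTERCEPT = -4.84

def m_score(components: dict) -> float:
    """Intercept plus the weighted sum of the eight Beneish ratios."""
    return INTERCEPT + sum(w * components[k] for k, w in WEIGHTS.items())

# Neutral inputs: the seven year-over-year index ratios at 1.0, and TATA
# (an accrual fraction, not an index) at 0.0.
neutral = {k: 1.0 for k in WEIGHTS}
neutral["TATA"] = 0.0
print(round(m_score(neutral), 3))  # → -2.48
```

A fully neutral filing scores about -2.48, comfortably below the -1.78 screen.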
The components have economic interpretations:
- DSRI (Days Sales in Receivables Index) flags when receivables grow faster than revenue — a signature of revenue pull-forward.
- GMI (Gross Margin Index) flags margin deterioration — a motive for manipulation.
- AQI (Asset Quality Index) flags when soft assets (intangibles, capitalized expenses) grow as a share of the balance sheet.
- SGI (Sales Growth Index) flags rapid revenue growth — not fraud itself, but the motive condition.
- DEPI (Depreciation Index) flags slowing depreciation — possible useful-life extension to inflate income.
- SGAI (SG&A Index) flags when SG&A grows faster than sales — operational stress that creates manipulation pressure.
- TATA (Total Accruals to Total Assets) flags accrual-heavy income — the bookkeeping signature of earnings management.
- LVGI (Leverage Index) flags rising leverage — debt covenants as motive.
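A few of the ratios written out make the interpretations concrete. Variable names here are placeholders, and the full definitions of all eight components are in Beneish (1999):

```python
def dsri(rec_t, rev_t, rec_p, rev_p):
    """Days Sales in Receivables Index: receivables-to-revenue ratio
    this year versus the prior year (_t = current, _p = prior)."""
    return (rec_t / rev_t) / (rec_p / rev_p)

def sgi(rev_t, rev_p):
    """Sales Growth Index: year-over-year revenue growth."""
    return rev_t / rev_p

def tata(net_income, cfo, total_assets):
    """Total Accruals to Total Assets: the accrual component of income,
    i.e. earnings not backed by operating cash flow."""
    return (net_income - cfo) / total_assets

# Receivables doubled while revenue grew only 25% — DSRI well above 1:
print(dsri(rec_t=200, rev_t=1250, rec_p=100, rev_p=1000))  # → 1.6
```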
A high M-Score is not a fraud conclusion. It is a flag for human review.
Where Korean IFRS Breaks the Formula
Korean IFRS lets companies report operating expenses by function (revenue / COGS / SG&A — the GAAP-style layout) or by nature (revenue / employee benefits / depreciation / raw materials — the European layout). Roughly 19% of KOSDAQ companies file by nature.
For nature-of-expense filers, COGS and SG&A are not separately reported. They cannot be — the filing structure does not contain those line items. Two of the eight Beneish components (GMI, which depends on gross margin, and SGAI, which depends on the SG&A line) are structurally undefined.
A naive implementation either produces NaN scores for ~19% of KOSDAQ, drops those companies from the screen, or fills the missing values with zeros and produces nonsense. None of those is the right answer.
This library sets GMI and SGAI to 1.0 for nature-of-expense filers — a neutral value that contributes the component’s weighted constant to the score without injecting a directional signal. The remaining six components compute normally. The decision is documented and visible: the input requires an expense_method column that is either "function" or "nature", and the library branches on it explicitly. The user can override by mapping the values themselves before passing the DataFrame in.
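The branch can be sketched as follows. The function name and the presence of precomputed GMI/SGAI columns are illustrative assumptions, not this library's API:

```python
import numpy as np
import pandas as pd

def neutralize_nature_filers(df: pd.DataFrame) -> pd.DataFrame:
    """Pin GMI and SGAI to the neutral value 1.0 for nature-of-expense
    filers, where COGS and SG&A line items do not exist."""
    out = df.copy()
    is_nature = out["expense_method"].eq("nature")
    # Neutral imputation: contributes the weighted constant, no signal.
    out.loc[is_nature, ["GMI", "SGAI"]] = 1.0
    return out

df = pd.DataFrame({
    "expense_method": ["function", "nature"],
    "GMI": [1.12, np.nan],
    "SGAI": [0.97, np.nan],
})
print(neutralize_nature_filers(df))
```

Function-of-expense rows pass through untouched; nature-of-expense rows get the documented 1.0.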
The other Korean-specific adjustments are smaller but accumulate. Zero denominators — common in young companies with negligible prior-year revenue — get replaced with NaN before ratio computation, not zero or Inf. Components are winsorized at the 1st/99th percentile per fiscal year, matching Beneish (1999) methodology, with the caveat that years with fewer than 20 non-null observations are not winsorized (the percentile estimate is unreliable). When two or fewer of the five core components are null, the score is still computed with neutral imputation; when more than two are null, the score is NaN.
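The denominator and winsorization rules can be sketched like this; the function names and the 20-observation floor mirror the description above, but the code is illustrative rather than the library's implementation:

```python
import numpy as np
import pandas as pd

def safe_ratio(num: pd.Series, den: pd.Series) -> pd.Series:
    """Replace zero denominators with NaN before dividing, so a company
    with negligible prior-year revenue yields NaN, not Inf or 0."""
    return num / den.replace(0, np.nan)

def winsorize_by_year(s: pd.Series, year: pd.Series,
                      min_obs: int = 20) -> pd.Series:
    """Clip to the 1st/99th percentile within each fiscal year, skipping
    years with too few non-null observations for a stable estimate."""
    def clip(group: pd.Series) -> pd.Series:
        if group.notna().sum() < min_obs:
            return group  # percentile estimate unreliable; leave as-is
        lo, hi = group.quantile([0.01, 0.99])
        return group.clip(lo, hi)
    return s.groupby(year).transform(clip)
```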
On the Threshold
Beneish’s original threshold (-1.78) was calibrated on US GAAP companies in the 1990s. Whether it transfers to Korean KOSDAQ in the 2020s is an empirical question.
The library ships with both. The default is -1.78 — the original Beneish value. A Korean bootstrap calibration on a labeled dataset of 50 cases (17 confirmed fraud or enforcement actions, 13 clean controls verified by sustained low M-Scores and no enforcement history, 20 auto-controls drawn from the broader universe) yields a recalibrated threshold of -2.45 with a 95% bootstrap confidence interval of [-3.50, -1.60].
The US threshold falls inside the Korean confidence interval, so it is not statistically distinguishable. But the Korean threshold of -2.45 produces fewer false positives on KOSDAQ small-caps in practice, and is the recommended value for Korean data:
scores = compute_mscores(df, threshold=-2.45)
The manually labeled dataset (labels.csv, 30 rows: the 17 fraud cases and 13 clean controls) is shipped with the package. The calibration methodology is documented in docs/calibration.md.
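Bootstrap threshold calibration of this kind can be sketched generically. The library's actual objective lives in docs/calibration.md; Youden's J (sensitivity + specificity − 1) here is a stand-in assumption, as are the function names:

```python
import numpy as np

rng = np.random.default_rng(0)

def best_cutoff(scores: np.ndarray, is_fraud: np.ndarray) -> float:
    """Cutoff maximizing Youden's J: flag when score > cutoff."""
    candidates = np.unique(scores)
    j = [(scores[is_fraud] > t).mean() + (scores[~is_fraud] <= t).mean() - 1
         for t in candidates]
    return float(candidates[int(np.argmax(j))])

def bootstrap_ci(scores, is_fraud, n_boot=1000, alpha=0.05):
    """Percentile bootstrap CI for the calibrated cutoff."""
    cuts, n = [], len(scores)
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        if is_fraud[idx].all() or (~is_fraud[idx]).all():
            continue  # a resample needs both classes to calibrate
        cuts.append(best_cutoff(scores[idx], is_fraud[idx]))
    lo, hi = np.quantile(cuts, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```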
Where the Score Is Wrong by Design
The M-Score has known structural false positive patterns, and Korean data inherits them.
Biotech and pharmaceutical companies generate elevated TATA almost mechanically. R&D capitalization creates large accruals that are not earnings management — they are an accounting policy choice that the formula reads as suspicious. Any biotech screen using the unmodified M-Score will flag a high fraction of clean companies. The library does not auto-exclude these sectors; that is the user’s call. But the limitation is documented and the label dataset includes biotech examples to make the false-positive rate visible.
Companies undergoing legitimate growth phases (rapid sales expansion, rising SGI; inventory build-out, rising AQI; debt-financed expansion, rising LVGI) score high on the same components that fraud cases do. The model cannot distinguish growth from manipulation from financial structure alone. This is a feature of the original Beneish design, not a bug — the threshold defines a screen that requires human follow-up, not a classifier.
The library deliberately does not include the things that depend on your specific data pipeline: XBRL-to-14-column mapping, sector-relative percentile computation, risk-tier bucketing, DART link generation, parquet output, or LASSO/RF recalibration. Those belong in your pipeline layer, where the data conventions are yours. This library is the math, validated against published values, with the Korean structural adjustments — and nothing else.
What This Buys You
A consistent, audited M-Score implementation for any company-year pair where you can supply 14 standard fields (corp_code, year, receivables, revenue, cogs, sga, ppe, depreciation, total_assets, lt_debt, net_income, cfo, expense_method, fs_type). 70 tests, including regression tests against hand-calculated values from Beneish (1999). A labeled validation dataset for calibration and false-positive analysis. Documented handling of every Korean IFRS edge case the author has hit in production.
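A minimal input frame with the 14 fields might look like this. The values are placeholders, and the kr_beneish import path in the comment is an assumption:

```python
import pandas as pd

# One row per company-year; column names are the 14 fields listed above.
REQUIRED = ["corp_code", "year", "receivables", "revenue", "cogs", "sga",
            "ppe", "depreciation", "total_assets", "lt_debt", "net_income",
            "cfo", "expense_method", "fs_type"]

df = pd.DataFrame([{
    "corp_code": "00123456", "year": 2023, "receivables": 120.0,
    "revenue": 1000.0, "cogs": 700.0, "sga": 150.0, "ppe": 400.0,
    "depreciation": 40.0, "total_assets": 1500.0, "lt_debt": 300.0,
    "net_income": 80.0, "cfo": 60.0, "expense_method": "function",
    "fs_type": "CFS",  # placeholder value; see the library docs
}])

# from kr_beneish import compute_mscores   # import path: an assumption
# scores = compute_mscores(df, threshold=-2.45)
```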
The library is at github.com/pon00050/kr-beneish. MIT license. Install with uv add git+https://github.com/pon00050/kr-beneish.