Sixteen Years of Forensic Accounting Research, in One JSON File

Source data → github.com/pon00050/jfia-catalog

The Journal of Forensic & Investigative Accounting (JFIA), published by the National Association of Certified Valuators and Analysts (NACVA), has run since 2009. It is one of the few peer-reviewed journals in the world dedicated specifically to forensic accounting — fraud detection, earnings manipulation, disclosure timing, insider networks, audit failure modes. Sixteen years of issues. 469 articles. The corpus a researcher would actually want to mine.

But the journal’s website is a paginated list of issue pages, each with article titles, occasionally abstracts, occasionally keywords, and a PDF download link. There is no full-text search across the corpus. There is no API. There is no structured index. To answer “what has JFIA published on Madoff-style fraud?” you click through 46 issue pages.

This catalog is the structured index that should have existed.

What Is in the File

jfia_catalog.json is a single 681 KB JSON file containing every article in JFIA from Volume 1 (2009 Q1) through Volume 17 (2024–2025). 46 issues. 469 articles.

For each article, the catalog stores: the title, the author list, the publication issue and period, the abstract (where available), the keyword list (where available), and the PDF URL on NACVA’s S3 bucket. Issues are tagged with is_special_issue for the periodic thematic editions.

The JSON structure is deliberately flat enough to be useful directly:

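import json

# Load the whole catalog; the context manager closes the file afterwards.
with open("jfia_catalog.json", encoding="utf-8") as f:
    catalog = json.load(f)

# Flatten issues -> articles, copying the issue-level fields onto each article.
articles = [
    {**a, "volume": i["volume"], "issue": i["issue"], "period": i["period"]}
    for i in catalog["issues"]
    for a in i["articles"]
]

# Case-insensitive match on abstract or keywords; either may be missing.
beneish_papers = [
    a for a in articles
    if "beneish" in (a.get("abstract") or "").lower()
    or "beneish" in " ".join(a.get("keywords") or []).lower()
]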

Three statements: load, flatten, filter. Every paper whose abstract or keywords mention Beneish.
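
The same structure can be queried at the issue level. The sketch below lists the special issues, assuming is_special_issue is stored as a boolean on each issue object next to volume, issue, and period. The sample data is made up to mirror that shape, and the title key inside each article is a placeholder rather than a confirmed field name.

```python
import json
from pathlib import Path


def load_catalog(path):
    """Read the catalog JSON from disk and return it as a dict."""
    with Path(path).open(encoding="utf-8") as fh:
        return json.load(fh)


def special_issues(catalog):
    """Return (volume, issue, period, article_count) for each special issue."""
    return [
        (i["volume"], i["issue"], i["period"], len(i["articles"]))
        for i in catalog["issues"]
        # .get() so an issue without the flag is treated as a regular issue
        if i.get("is_special_issue")
    ]


# Tiny stand-in with the same shape as the real file (values are made up).
sample = {
    "issues": [
        {"volume": 1, "issue": 1, "period": "2009 Q1",
         "is_special_issue": False, "articles": [{"title": "A"}]},
        {"volume": 3, "issue": 3, "period": "2011 Special",
         "is_special_issue": True,
         "articles": [{"title": "B"}, {"title": "C"}]},
    ]
}
print(special_issues(sample))
```

Against the real file, the call would be special_issues(load_catalog("jfia_catalog.json")).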

What the Coverage Looks Like

Field	Coverage
Title	469 of 469 (100%)
Authors	469 of 469 (100%)
PDF URL	469 of 469 (100%)
Abstract	363 of 469 (77.6%)
Keywords	242 of 469 (51.6%)

Titles, authors, and PDF links are complete because they appear on every JFIA issue page. Abstracts and keywords are not, and the gap is structural rather than accidental: across 17 volumes, NACVA’s website has used at least four different HTML layouts for issue pages. Older issues often omit abstract sections entirely; some issues from the middle period embed abstracts in the PDF but not the HTML. The scraper extracts what is on the page; what is missing reflects what was published.
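
The coverage figures can be recomputed from the file itself. This is a minimal sketch, not the script that produced the table: it counts a field as present when its value is non-empty, and it only checks abstract and keywords by default because those are the two key names used elsewhere in this post. Other key names would need to be read off the JSON first.

```python
def field_coverage(catalog, fields=("abstract", "keywords")):
    """Return {field: (present, total, percent)} across all articles."""
    articles = [a for i in catalog["issues"] for a in i["articles"]]
    total = len(articles)
    report = {}
    for field in fields:
        # Missing keys, None, "" and [] all count as "not present"
        present = sum(1 for a in articles if a.get(field))
        percent = round(100 * present / total, 1) if total else 0.0
        report[field] = (present, total, percent)
    return report


# Made-up sample in the catalog's shape: 4 articles, 3 abstracts, 2 keyword lists.
sample_catalog = {
    "issues": [
        {"articles": [
            {"abstract": "Text.", "keywords": ["fraud"]},
            {"abstract": "Text.", "keywords": []},
        ]},
        {"articles": [
            {"abstract": None, "keywords": ["audit"]},
            {"abstract": "Text."},
        ]},
    ]
}
for field, (present, total, percent) in field_coverage(sample_catalog).items():
    print(f"{field}: {present} of {total} ({percent}%)")
```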

For research that needs full-text content, the PDF URLs are present for every article and download cleanly. The catalog is the index; the PDFs are the content.
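
For anyone who does decide to fetch the PDFs, a download loop might look like the sketch below. The key name pdf_url is an assumption (check the JSON for the real one), and the one-second pause between requests is a courtesy default, not something NACVA specifies.

```python
import shutil
import time
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlparse

PDF_KEY = "pdf_url"  # assumption: the real key name may differ


def local_name(url):
    """Derive a local filename from the last path segment of a URL."""
    name = PurePosixPath(unquote(urlparse(url).path)).name
    return name or "unnamed.pdf"


def download_pdfs(articles, out_dir="pdfs", delay_seconds=1.0):
    """Download each article's PDF once; return the paths that were written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for article in articles:
        url = article.get(PDF_KEY)
        if not url:
            continue
        target = out / local_name(url)
        if target.exists():
            # Already fetched on an earlier run; skip so reruns are cheap
            continue
        with urllib.request.urlopen(url) as resp, target.open("wb") as fh:
            shutil.copyfileobj(resp, fh)
        saved.append(target)
        time.sleep(delay_seconds)  # pause between requests
    return saved
```

Skipping files that already exist keeps reruns cheap, at the cost of not noticing a partially written file from an interrupted run.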

What This Catalog Cannot Do

It is a metadata index, not a full-text search engine. Search across abstracts works for the 77.6% with abstracts; search across body text requires downloading the PDFs and running OCR or text extraction.
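
One way to keep that limit visible is to have the search report its own blind spot. The sketch below returns the matches along with a count of articles that had no abstract to search. How a missing abstract is stored (absent key, null, or empty string) is an assumption, so the code treats all three the same way.

```python
def search_metadata(articles, term):
    """Search abstracts and keywords; also count articles with no abstract."""
    needle = term.lower()
    hits = []
    no_abstract = 0
    for a in articles:
        abstract = a.get("abstract") or ""
        keywords = " ".join(a.get("keywords") or [])
        if not abstract:
            # The abstract search cannot see this article at all
            no_abstract += 1
        if needle in abstract.lower() or needle in keywords.lower():
            hits.append(a)
    return hits, no_abstract


# Made-up records in the shape the catalog uses.
sample_articles = [
    {"title": "A", "abstract": "We test the Beneish M-Score.", "keywords": []},
    {"title": "B", "abstract": None, "keywords": ["Beneish", "fraud"]},
    {"title": "C", "abstract": None, "keywords": None},
]
hits, no_abstract = search_metadata(sample_articles, "beneish")
print(f"{len(hits)} hits; {no_abstract} articles had no abstract to search")
```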

It is not automatically refreshed. The list of issues is hardcoded in the scraper. When NACVA publishes Volume 18, the catalog needs a manual update — add the new issue URLs to the script and re-run. The scraper is in the same repository, so the maintenance path is documented.

It does not include citation counts, download statistics, or any usage metrics. NACVA does not publish those. Citation analysis requires joining against Google Scholar or similar — out of scope for this catalog.

Why It Exists

The catalog is the foundation layer for a larger system that maps academic forensic accounting research onto computable detection rules — what we have been calling “detectlets.” A detectlet is a small structured object that says: this paper proposes that signal X (e.g., elevated Days Sales in Receivables Index) indicates pattern Y (e.g., revenue fabrication), tested against dataset Z. If you can extract detectlets from the literature, you can build detection libraries that are grounded in published methodology rather than ad-hoc heuristics.
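
To make the idea concrete, here is one possible shape for a detectlet. It is purely illustrative: the class and field names are hypothetical, chosen to mirror the signal X / pattern Y / dataset Z description above, and the placeholder strings stand in for details that would come from a specific paper.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Detectlet:
    """Hypothetical record: a paper's claim that a signal indicates a pattern."""
    signal: str        # X: the measurable indicator
    pattern: str       # Y: the behaviour the signal is said to indicate
    dataset: str       # Z: the data the claim was tested against
    source_title: str  # the catalog article the claim was extracted from


example = Detectlet(
    signal="elevated Days Sales in Receivables Index",
    pattern="revenue fabrication",
    dataset="(dataset named in the paper)",
    source_title="(title of the catalog article)",
)
print(asdict(example))
```

Freezing the dataclass keeps each extracted claim immutable, so it can be hashed, deduplicated, and traced back to its source article.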

That mapping needs an index of the literature first. The catalog is that index.

But the catalog is also useful on its own. A researcher writing a literature review on Korean financial enforcement can grep the JSON for relevant authors and abstracts in seconds. A practitioner looking for prior work on a specific detection technique can search across 16 years in a single Python list comprehension. A graduate student working on a thesis can check whether their idea has been covered before without clicking through 46 issue pages.

The data is at github.com/pon00050/jfia-catalog. MIT license on the structured metadata. PDF content remains copyright NACVA — the catalog provides the URLs; whether and how you fetch the underlying PDFs is your decision.
