Overview

Topograph automatically extracts financial data from company annual statements. We analyze PDF documents and extract key financial metrics into a structured, machine-readable format.

How It Works

When a financial statement document is processed:
  1. Document Classification - The AI first determines if the document is a financial statement
  2. Data Extraction - If identified as a financial statement, key metrics are extracted
  3. Structured Output - Data is returned in the extractedData.financialData field
Non-financial documents will not have the extractedData field populated.
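Because the field may be absent, a consumer can guard on it before reading any metrics. The following is a minimal sketch, assuming a document object shaped like the response example on this page; the `DocumentLike` type and the `getFinancialData` helper are hypothetical names used for illustration, not part of the API.

```typescript
// Hypothetical minimal shape of a returned document; only the fields used here.
interface DocumentLike {
  id: string;
  extractedData?: { financialData?: Record<string, unknown> } | null;
}

// Returns the extracted financial data, or null when extractedData is not
// populated (for example, the document was not classified as a financial statement).
function getFinancialData(doc: DocumentLike): Record<string, unknown> | null {
  return doc.extractedData?.financialData ?? null;
}

const statementDoc: DocumentLike = {
  id: "doc-1",
  extractedData: { financialData: { currency: "EUR" } },
};
const otherDoc: DocumentLike = { id: "doc-2" };

console.log(getFinancialData(statementDoc) !== null); // true
console.log(getFinancialData(otherDoc) !== null); // false
```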

Data Model

The extracted data is organized into statement metadata, the income statement, and the balance sheet, as presented below.

Top-Level Structure

{
  extractedData: {
    financialData: {
      // Metadata
      fiscalYear: FiscalYear;
      approvalDate: string | null;
      currency: string | null;
      accountingStandard: AccountingStandard;
      statementType: StatementType;

      // Financial sections
      incomeStatement: IncomeStatement;
      balanceSheet: BalanceSheet;
    }
  }
}

Metadata Fields

Fiscal Year

fiscalYear: {
  startDate: string; // Format: "YYYY-MM-DD"
  endDate: string; // Format: "YYYY-MM-DD"
}
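Since both dates use the "YYYY-MM-DD" format, the length of the reporting period can be derived directly, which helps when comparing a short or extended fiscal year against a standard one. This is a sketch only; `fiscalYearLengthInDays` is a hypothetical helper, and it assumes both dates are present and valid.

```typescript
interface FiscalYear {
  startDate: string; // "YYYY-MM-DD"
  endDate: string; // "YYYY-MM-DD"
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Number of days covered by the fiscal year, counting both the start and end date.
// Date-only ISO strings are parsed as UTC, so daylight saving changes do not skew the count.
function fiscalYearLengthInDays(fy: FiscalYear): number {
  const start = Date.parse(fy.startDate);
  const end = Date.parse(fy.endDate);
  if (Number.isNaN(start) || Number.isNaN(end) || end < start) {
    throw new Error("Invalid fiscal year dates");
  }
  return Math.round((end - start) / MS_PER_DAY) + 1;
}

console.log(fiscalYearLengthInDays({ startDate: "2023-01-01", endDate: "2023-12-31" })); // 365
```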

Accounting Standard

  • "IFRS" - International Financial Reporting Standards
  • "French GAAP" - French Generally Accepted Accounting Principles
  • "US GAAP" - United States GAAP
  • "Swiss GAAP" - Swiss GAAP
  • "Lux GAAP" - Luxembourg GAAP
  • "Other" - Other accounting standards

Statement Type

  • "consolidated" - Group/consolidated financial statements
  • "simplified" - Individual/standalone statements
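On the client side, the values listed above can be modeled as string literal unions so that unexpected values are caught early. This sketch assumes the two lists are exhaustive, which this page does not state explicitly; the constant and function names are illustrative.

```typescript
// Values copied from the lists above; treating them as exhaustive is an assumption.
const ACCOUNTING_STANDARDS = [
  "IFRS",
  "French GAAP",
  "US GAAP",
  "Swiss GAAP",
  "Lux GAAP",
  "Other",
] as const;
type AccountingStandard = (typeof ACCOUNTING_STANDARDS)[number];

const STATEMENT_TYPES = ["consolidated", "simplified"] as const;
type StatementType = (typeof STATEMENT_TYPES)[number];

// Type guards that narrow an unknown value to one of the documented literals.
function isAccountingStandard(value: unknown): value is AccountingStandard {
  return typeof value === "string" && (ACCOUNTING_STANDARDS as readonly string[]).includes(value);
}

function isStatementType(value: unknown): value is StatementType {
  return typeof value === "string" && (STATEMENT_TYPES as readonly string[]).includes(value);
}

console.log(isAccountingStandard("IFRS")); // true
console.log(isStatementType("quarterly")); // false
```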

Income Statement

incomeStatement: {
  revenue: {
    amount: number | null; // Current period
    previousAmount: number | null; // Prior period
    localName: string; // Original term from document
  }
  depreciationAndAmortization: {
    amount: number | null;
    previousAmount: number | null;
    localName: string;
  }
  operatingIncome: {
    amount: number | null;
    previousAmount: number | null;
    localName: string;
    standardDefinition: string | null; // How it's calculated
    notes: string | null; // Additional context
  }
  netIncome: {
    amount: number | null;
    previousAmount: number | null;
    localName: string;
  }
}
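Because each line item carries both the current and the prior period, period-over-period changes can be computed without a second document. A minimal sketch follows; `LineItem` mirrors the fields shown above, and `yearOverYearChange` is a hypothetical helper. It returns null whenever either amount is missing or the prior amount is zero.

```typescript
interface LineItem {
  amount: number | null; // Current period
  previousAmount: number | null; // Prior period
  localName: string; // Original term from document
}

// Change relative to the prior period as a fraction (0.5 means +50%),
// or null when it cannot be computed. Dividing by the absolute prior value
// keeps the sign meaningful when the prior amount is a loss.
function yearOverYearChange(item: LineItem): number | null {
  const { amount, previousAmount } = item;
  if (amount === null || previousAmount === null || previousAmount === 0) {
    return null;
  }
  return (amount - previousAmount) / Math.abs(previousAmount);
}

const revenue: LineItem = { amount: 1500, previousAmount: 1000, localName: "Chiffre d'affaires" };
console.log(yearOverYearChange(revenue)); // 0.5
```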

Balance Sheet

The balance sheet is organized into two main sections: assets, and equity and liabilities.

Assets

assets: {
  fixedAssets: {
    total: {
      amount: number | null;
      previousAmount: number | null;
    }
    components: {
      tangible: {
        // Property, plant, equipment
        amount: number | null;
        previousAmount: number | null;
        localName: string;
      }
      intangible: {
        // Goodwill, patents, software
        amount: number | null;
        previousAmount: number | null;
        localName: string;
      }
      financialAssets: {
        // Long-term investments
        amount: number | null;
        previousAmount: number | null;
        localName: string;
      }
    }
  }
  currentAssets: {
    total: {
      amount: number | null;
      previousAmount: number | null;
    }
    components: {
      cashAndEquivalents: {
        amount: number | null;
        previousAmount: number | null;
        localName: string;
      }
      tradeReceivables: {
        // Accounts receivable
        amount: number | null;
        previousAmount: number | null;
        localName: string;
      }
      inventory: {
        amount: number | null;
        previousAmount: number | null;
        localName: string;
      }
      otherCurrentAssets: {
        amount: number | null;
        previousAmount: number | null;
        localName: string;
      }
    }
  }
}

Equity and Liabilities

equityAndLiabilities: {
  equity: {
    total: {
      amount: number | null;
      previousAmount: number | null;
    }
    components: {
      shareCapital: {
        amount: number | null;
        previousAmount: number | null;
        localName: string;
      }
      retainedEarnings: {
        amount: number | null;
        previousAmount: number | null;
        localName: string;
      }
      otherReserves: {
        amount: number | null;
        previousAmount: number | null;
        localName: string;
      }
    }
  }
  liabilities: {
    nonCurrent: {
      // Long-term (> 1 year)
      total: {
        amount: number | null;
        previousAmount: number | null;
      }
      components: {
        financialDebt: {
          amount: number | null;
          previousAmount: number | null;
          localName: string;
        }
        otherNonCurrentLiabilities: {
          amount: number | null;
          previousAmount: number | null;
          localName: string;
        }
      }
    }
    current: {
      // Short-term (< 1 year)
      total: {
        amount: number | null;
        previousAmount: number | null;
      }
      components: {
        tradePayables: {
          amount: number | null;
          previousAmount: number | null;
          localName: string;
        }
        currentFinancialDebt: {
          amount: number | null;
          previousAmount: number | null;
          localName: string;
        }
        otherCurrentLiabilities: {
          amount: number | null;
          previousAmount: number | null;
          localName: string;
        }
      }
    }
  }
}
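One way to sanity-check an extraction is to compare total assets with total equity plus liabilities for the current period. The sketch below assumes that `assets` and `equityAndLiabilities` are both nested under `balanceSheet`; the `BalanceSheetTotals` type and the helper names are hypothetical. A non-zero gap does not necessarily indicate an extraction error, since a source document may contain items that fall outside the fields modeled here.

```typescript
interface Total {
  amount: number | null;
  previousAmount: number | null;
}

// Only the totals needed for the check; components are omitted.
interface BalanceSheetTotals {
  assets: {
    fixedAssets: { total: Total };
    currentAssets: { total: Total };
  };
  equityAndLiabilities: {
    equity: { total: Total };
    liabilities: {
      nonCurrent: { total: Total };
      current: { total: Total };
    };
  };
}

// Sums the values, or returns null as soon as one of them is missing.
function sumOrNull(values: Array<number | null>): number | null {
  let sum = 0;
  for (const value of values) {
    if (value === null) return null;
    sum += value;
  }
  return sum;
}

// Current-period assets minus (equity + liabilities); null if any total is missing.
function balanceGap(bs: BalanceSheetTotals): number | null {
  const assets = sumOrNull([
    bs.assets.fixedAssets.total.amount,
    bs.assets.currentAssets.total.amount,
  ]);
  const equityAndLiabilities = sumOrNull([
    bs.equityAndLiabilities.equity.total.amount,
    bs.equityAndLiabilities.liabilities.nonCurrent.total.amount,
    bs.equityAndLiabilities.liabilities.current.total.amount,
  ]);
  if (assets === null || equityAndLiabilities === null) return null;
  return assets - equityAndLiabilities;
}
```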

Important Notes

Data Quality

  • Null Values - Fields return null when data is missing, unclear, or not applicable
  • Numeric Values - All amounts are numeric values without currency symbols or thousand separators
  • Negative Values - Losses and deficits preserve the negative sign
  • Local Terms - The localName field presents the exact wording used in the source document
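Since amounts arrive as plain numbers and may be null, display formatting is left to the consumer. The sketch below uses the standard `Intl.NumberFormat` API; `formatAmount` is a hypothetical helper, and treating `currency` as an ISO 4217 code such as "EUR" is an assumption this page does not confirm.

```typescript
// Formats an extracted amount for display. Null amounts become "n/a".
// Assumption: `currency`, when present, is an ISO 4217 code accepted by Intl.NumberFormat.
function formatAmount(
  amount: number | null,
  currency: string | null,
  locale: string = "en-US",
): string {
  if (amount === null) return "n/a";
  if (currency === null) {
    // No currency known: fall back to a plain number with two decimals.
    return new Intl.NumberFormat(locale, {
      minimumFractionDigits: 2,
      maximumFractionDigits: 2,
    }).format(amount);
  }
  // The negative sign of losses is preserved by the formatter.
  return new Intl.NumberFormat(locale, { style: "currency", currency }).format(amount);
}
```

Passing the locale explicitly keeps the output stable across environments with different default locales.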

Multi-Language Support

The extraction works across multiple languages and accounting frameworks:
  • Recognizes financial terms in various languages
  • Maps local terminology to standardized fields
  • Preserves original terms in localName fields

Current Limits

  • Only processes PDF financial statements
  • Extraction accuracy depends on document quality and structure
  • Complex or non-standard formats may have reduced accuracy
  • Currently focuses on core financial metrics

Example Usage

To retrieve a financial statement document, include its ID in the documents array of the request:
POST https://api.topograph.co/v2/company
{
  "companyId": "your-company-id",
  "countryCode": "DE",
  "dataPoints": [],
  "documents": ["financial_statement_id"]
}
The response will include the extracted data:
{
  "documents": {
    "financialStatements": [
      {
        "id": "financial_statement_id",
        "extractedData": {
          "financialData": {
            // Structured financial data as described above
          }
        }
        // ... other document fields
      }
    ]
  }
}
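The request body and the response handling above can be sketched as typed helpers. The HTTP call itself, including authentication, is deliberately left out because it is not described on this page. The type and function names are hypothetical, and the sketch assumes the response has the shape shown in the example.

```typescript
interface CompanyRequest {
  companyId: string;
  countryCode: string;
  dataPoints: string[];
  documents: string[];
}

interface FinancialStatementDocument {
  id: string;
  extractedData?: { financialData?: unknown } | null;
}

interface CompanyResponse {
  documents?: { financialStatements?: FinancialStatementDocument[] };
}

// Builds the JSON body for POST /v2/company, requesting documents only.
function buildCompanyRequest(
  companyId: string,
  countryCode: string,
  documentIds: string[],
): CompanyRequest {
  return { companyId, countryCode, dataPoints: [], documents: documentIds };
}

// Keeps only the financial statements for which extracted data is present.
function statementsWithData(response: CompanyResponse): FinancialStatementDocument[] {
  const statements = response.documents?.financialStatements ?? [];
  return statements.filter((doc) => doc.extractedData?.financialData != null);
}

const body = buildCompanyRequest("your-company-id", "DE", ["financial_statement_id"]);
console.log(JSON.stringify(body));
```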

Feedback

This is a beta feature, and we’re actively collecting feedback to improve extraction accuracy and expand coverage. Please share your experiences and suggestions through our support channels or with your account manager.