Overview

Topograph’s AI-powered financial data extraction analyzes PDF financial statements and extracts key financial metrics into a structured, machine-readable format. It runs as part of document retrieval and requires no additional configuration.
Beta Feature: The financial data extraction feature is currently in beta. The data model may evolve as we refine extraction accuracy and expand coverage. We welcome your feedback to help improve this feature.

How It Works

When a financial statement document is processed:
  1. Document Classification - The AI first determines whether the document is a financial statement
  2. Data Extraction - If it is, key metrics are extracted from the document
  3. Structured Output - The extracted data is returned in the extractedData.financialData field
Non-financial documents will not have the extractedData field populated.
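
A client will usually want to branch on whether extraction produced anything. The following is a minimal sketch, assuming the field is either absent or null on non-financial documents (the text above only says it is not populated). `RetrievedDocument` and `hasFinancialData` are illustrative names, not part of the Topograph API.

```typescript
// Hypothetical minimal document shape: only the fields this sketch needs.
interface RetrievedDocument {
  id: string;
  extractedData?: { financialData?: Record<string, unknown> } | null;
}

type DocumentWithFinancialData = RetrievedDocument & {
  extractedData: { financialData: Record<string, unknown> };
};

// True only when extractedData.financialData is present and non-null.
function hasFinancialData(doc: RetrievedDocument): doc is DocumentWithFinancialData {
  return doc.extractedData?.financialData != null;
}

const docs: RetrievedDocument[] = [
  { id: "statement-2023", extractedData: { financialData: { currency: "EUR" } } },
  { id: "articles-of-association" }, // non-financial: extractedData is absent
];

// Keeps only the documents that carry extracted financial data.
const financialDocIds = docs.filter(hasFinancialData).map((d) => d.id);
```

Using a type guard keeps the null check in one place, so downstream code can read `extractedData.financialData` without repeating it.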

Data Model

The extracted financial data covers statement metadata and the two financial sections currently supported, the income statement and the balance sheet:

Top-Level Structure

{
  extractedData: {
    financialData: {
      // Metadata
      fiscalYear: FiscalYear;
      approvalDate: string | null;
      currency: string | null;
      accountingStandard: AccountingStandard;
      statementType: StatementType;

      // Financial sections
      incomeStatement: IncomeStatement;
      balanceSheet: BalanceSheet;
    }
  }
}

Metadata Fields

Fiscal Year

fiscalYear: {
  startDate: string; // Format: "YYYY-MM-DD"
  endDate: string; // Format: "YYYY-MM-DD"
}
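
Because both dates are plain "YYYY-MM-DD" strings, it can be useful to check how long the reported period is, for example to spot a short or extended fiscal year. This is a sketch only: `parseIsoDate` and `fiscalYearLengthInDays` are hypothetical helpers, and counting both the start and end date is an assumption about how the period is meant to be read.

```typescript
interface FiscalYear {
  startDate: string; // "YYYY-MM-DD"
  endDate: string; // "YYYY-MM-DD"
}

// Parses "YYYY-MM-DD" into a UTC timestamp (ms) so local time zones cannot shift the date.
function parseIsoDate(value: string): number {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(value);
  if (!match) throw new Error(`Unexpected date format: ${value}`);
  return Date.UTC(Number(match[1]), Number(match[2]) - 1, Number(match[3]));
}

// Length of the fiscal period in days, counting both the start and the end date.
function fiscalYearLengthInDays(fiscalYear: FiscalYear): number {
  const msPerDay = 24 * 60 * 60 * 1000;
  const elapsed = parseIsoDate(fiscalYear.endDate) - parseIsoDate(fiscalYear.startDate);
  return Math.round(elapsed / msPerDay) + 1;
}

const calendar2023: FiscalYear = { startDate: "2023-01-01", endDate: "2023-12-31" };
const daysIn2023 = fiscalYearLengthInDays(calendar2023); // 365
```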

Accounting Standard

  • "IFRS" - International Financial Reporting Standards
  • "French GAAP" - French Generally Accepted Accounting Principles
  • "US GAAP" - United States GAAP
  • "Swiss GAAP" - Swiss GAAP
  • "Lux GAAP" - Luxembourg GAAP
  • "Other" - Other accounting standards

Statement Type

  • "consolidated" - Group/consolidated financial statements
  • "simplified" - Individual/standalone statements
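
The two enumerations above can be modeled as string literal unions. The sketch below assumes the listed values are the complete set today. Since the data model is in beta and may evolve, it also validates at runtime rather than trusting the type. The fallback to "Other" is a design choice of this example, not documented API behavior.

```typescript
// Values as listed above, kept as const arrays so they drive both the types and the runtime checks.
const ACCOUNTING_STANDARDS = ["IFRS", "French GAAP", "US GAAP", "Swiss GAAP", "Lux GAAP", "Other"] as const;
const STATEMENT_TYPES = ["consolidated", "simplified"] as const;

type AccountingStandard = (typeof ACCOUNTING_STANDARDS)[number];
type StatementType = (typeof STATEMENT_TYPES)[number];

function isAccountingStandard(value: unknown): value is AccountingStandard {
  return typeof value === "string" && (ACCOUNTING_STANDARDS as readonly string[]).indexOf(value) !== -1;
}

function isStatementType(value: unknown): value is StatementType {
  return typeof value === "string" && (STATEMENT_TYPES as readonly string[]).indexOf(value) !== -1;
}

// Maps any value this client does not recognize to "Other" instead of failing.
function normalizeAccountingStandard(value: unknown): AccountingStandard {
  return isAccountingStandard(value) ? value : "Other";
}
```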

Income Statement

incomeStatement: {
  revenue: {
    amount: number | null; // Current period
    previousAmount: number | null; // Prior period
    localName: string; // Original term from document
  }
  depreciationAndAmortization: {
    amount: number | null;
    previousAmount: number | null;
    localName: string;
  }
  operatingIncome: {
    amount: number | null;
    previousAmount: number | null;
    localName: string;
    standardDefinition: string | null; // How it's calculated
    notes: string | null; // Additional context
  }
  netIncome: {
    amount: number | null;
    previousAmount: number | null;
    localName: string;
  }
}
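
Since each line item carries both the current and the prior period amount, simple comparisons can be derived on the client. The helpers below are a hedged sketch: `yearOverYearChange` and `ratioOrNull` are hypothetical names, the sample figures are invented, and returning null whenever an input is missing is one possible convention.

```typescript
interface LineItem {
  amount: number | null; // Current period
  previousAmount: number | null; // Prior period
  localName: string; // Original term from document
}

// Relative change against the prior period, or null when it cannot be computed.
function yearOverYearChange(item: LineItem): number | null {
  if (item.amount === null || item.previousAmount === null || item.previousAmount === 0) {
    return null;
  }
  return (item.amount - item.previousAmount) / Math.abs(item.previousAmount);
}

// numerator / denominator, or null when either side is missing or the denominator is zero.
function ratioOrNull(numerator: number | null, denominator: number | null): number | null {
  if (numerator === null || denominator === null || denominator === 0) return null;
  return numerator / denominator;
}

// Invented sample values, for illustration only.
const revenue: LineItem = { amount: 1200000, previousAmount: 1000000, localName: "Umsatzerlöse" };
const netIncome: LineItem = { amount: 90000, previousAmount: null, localName: "Jahresüberschuss" };

const revenueGrowth = yearOverYearChange(revenue); // 0.2, i.e. +20%
const netIncomeGrowth = yearOverYearChange(netIncome); // null: prior period missing
const netMargin = ratioOrNull(netIncome.amount, revenue.amount); // 0.075, i.e. 7.5%
```

Dividing by the absolute value of the prior amount keeps the sign of the change meaningful when the prior period was a loss.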

Balance Sheet

The balance sheet is organized into two main sections, assets and equity and liabilities:

Assets

assets: {
  fixedAssets: {
    total: {
      amount: number | null;
      previousAmount: number | null;
    }
    components: {
      tangible: {
        // Property, plant, equipment
        amount: number | null;
        previousAmount: number | null;
        localName: string;
      }
      intangible: {
        // Goodwill, patents, software
        amount: number | null;
        previousAmount: number | null;
        localName: string;
      }
      financialAssets: {
        // Long-term investments
        amount: number | null;
        previousAmount: number | null;
        localName: string;
      }
    }
  }
  currentAssets: {
    total: {
      amount: number | null;
      previousAmount: number | null;
    }
    components: {
      cashAndEquivalents: {
        amount: number | null;
        previousAmount: number | null;
        localName: string;
      }
      tradeReceivables: {
        // Accounts receivable
        amount: number | null;
        previousAmount: number | null;
        localName: string;
      }
      inventory: {
        amount: number | null;
        previousAmount: number | null;
        localName: string;
      }
      otherCurrentAssets: {
        amount: number | null;
        previousAmount: number | null;
        localName: string;
      }
    }
  }
}

Equity and Liabilities

equityAndLiabilities: {
  equity: {
    total: {
      amount: number | null;
      previousAmount: number | null;
    }
    components: {
      shareCapital: {
        amount: number | null;
        previousAmount: number | null;
        localName: string;
      }
      retainedEarnings: {
        amount: number | null;
        previousAmount: number | null;
        localName: string;
      }
      otherReserves: {
        amount: number | null;
        previousAmount: number | null;
        localName: string;
      }
    }
  }
  liabilities: {
    nonCurrent: {
      // Long-term (due after more than 1 year)
      total: {
        amount: number | null;
        previousAmount: number | null;
      }
      components: {
        financialDebt: {
          amount: number | null;
          previousAmount: number | null;
          localName: string;
        }
        otherNonCurrentLiabilities: {
          amount: number | null;
          previousAmount: number | null;
          localName: string;
        }
      }
    }
    current: {
      // Short-term (due within 1 year)
      total: {
        amount: number | null;
        previousAmount: number | null;
      }
      components: {
        tradePayables: {
          amount: number | null;
          previousAmount: number | null;
          localName: string;
        }
        currentFinancialDebt: {
          amount: number | null;
          previousAmount: number | null;
          localName: string;
        }
        otherCurrentLiabilities: {
          amount: number | null;
          previousAmount: number | null;
          localName: string;
        }
      }
    }
  }
}
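
One way to sanity-check an extraction is to compare the section totals against the accounting identity: total assets should equal equity plus liabilities. The sketch below is an assumption-laden example, not a documented validation step. `balanceGap` is a hypothetical helper, and because the model covers only the sections shown above, a non-zero gap may reflect items outside the model (or rounding) rather than an extraction error.

```typescript
interface Total {
  amount: number | null;
  previousAmount: number | null;
}

// Only the total fields of the structure above; components are omitted for brevity.
interface BalanceSheetTotals {
  assets: { fixedAssets: { total: Total }; currentAssets: { total: Total } };
  equityAndLiabilities: {
    equity: { total: Total };
    liabilities: { nonCurrent: { total: Total }; current: { total: Total } };
  };
}

// Sums the values, or returns null as soon as one of them is missing.
function sumOrNull(values: Array<number | null>): number | null {
  let sum = 0;
  for (const value of values) {
    if (value === null) return null;
    sum += value;
  }
  return sum;
}

// Current-period assets minus (equity + liabilities); null if any total is missing.
function balanceGap(sheet: BalanceSheetTotals): number | null {
  const assets = sumOrNull([sheet.assets.fixedAssets.total.amount, sheet.assets.currentAssets.total.amount]);
  const claims = sumOrNull([
    sheet.equityAndLiabilities.equity.total.amount,
    sheet.equityAndLiabilities.liabilities.nonCurrent.total.amount,
    sheet.equityAndLiabilities.liabilities.current.total.amount,
  ]);
  return assets === null || claims === null ? null : assets - claims;
}

// Invented sample values, for illustration only.
const sampleSheet: BalanceSheetTotals = {
  assets: {
    fixedAssets: { total: { amount: 500, previousAmount: null } },
    currentAssets: { total: { amount: 300, previousAmount: null } },
  },
  equityAndLiabilities: {
    equity: { total: { amount: 350, previousAmount: null } },
    liabilities: {
      nonCurrent: { total: { amount: 250, previousAmount: null } },
      current: { total: { amount: 200, previousAmount: null } },
    },
  },
};

const gap = balanceGap(sampleSheet); // 0: 500 + 300 = 350 + 250 + 200
```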

Important Notes

Data Quality

  • Null Values - Fields return null when data is missing, unclear, or not applicable
  • Numeric Values - All amounts are numeric values without currency symbols or thousand separators
  • Negative Values - Losses and deficits preserve the negative sign
  • Local Terms - The localName field preserves the exact terminology from the source document
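
The conventions above suggest two small checks a consumer might run before using the figures: listing which fields came back null, and detecting losses via the sign. This is a sketch under the stated conventions. `missingAmounts` and `isLoss` are hypothetical helpers and the sample values are invented.

```typescript
interface LineItem {
  amount: number | null;
  previousAmount: number | null;
  localName: string;
}

// Names of the line items whose current-period amount is null.
function missingAmounts(items: Record<string, LineItem>): string[] {
  return Object.keys(items).filter((key) => items[key]?.amount === null);
}

// A negative amount denotes a loss or deficit; null is treated as "unknown", not as a loss.
function isLoss(item: LineItem): boolean {
  return item.amount !== null && item.amount < 0;
}

// Invented sample values, for illustration only.
const incomeItems: Record<string, LineItem> = {
  revenue: { amount: 850000, previousAmount: 910000, localName: "Chiffre d'affaires" },
  operatingIncome: { amount: null, previousAmount: null, localName: "Résultat d'exploitation" },
  netIncome: { amount: -42000, previousAmount: 15000, localName: "Résultat net" },
};

const missing = missingAmounts(incomeItems); // ["operatingIncome"]
const netLoss = isLoss(incomeItems.netIncome!); // true
```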

Multi-Language Support

The extraction works across multiple languages and accounting frameworks:
  • Recognizes financial terms in various languages
  • Maps local terminology to standardized fields
  • Preserves original terms in localName fields

Limitations

  • Only processes PDF financial statements
  • Extraction accuracy depends on document quality and structure
  • Accuracy may be lower for complex or non-standard formats
  • Currently limited to the core income statement and balance sheet metrics described in the data model

Example Usage

To retrieve a financial statement together with its extracted data, include the document ID in the request:
POST https://api.topograph.co/v2/company
{
  "companyId": "your-company-id",
  "countryCode": "DE",
  "dataPoints": [],
  "documents": ["financial_statement_id"]
}
The response will include the extracted data:
{
  "documents": {
    "financialStatements": [
      {
        "id": "financial_statement_id",
        "extractedData": {
          "financialData": {
            // Structured financial data as described above
          }
        }
        // ... other document fields
      }
    ]
  }
}
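
The request and response above can be handled on the client roughly as follows. This sketch deliberately leaves out the HTTP call and authentication, which are not covered here. The interfaces model only the fields shown in the example, and `buildFinancialStatementRequest` and `collectFinancialData` are hypothetical helper names.

```typescript
// Request body fields as shown in the example above.
interface CompanyRequest {
  companyId: string;
  countryCode: string;
  dataPoints: string[];
  documents: string[];
}

// Response fields as shown above; real responses contain further document fields.
interface FinancialStatementDocument {
  id: string;
  extractedData?: { financialData?: Record<string, unknown> } | null;
}

interface CompanyResponse {
  documents?: { financialStatements?: FinancialStatementDocument[] };
}

interface ExtractedStatement {
  id: string;
  financialData: Record<string, unknown>;
}

function buildFinancialStatementRequest(companyId: string, countryCode: string, documentIds: string[]): CompanyRequest {
  return { companyId, countryCode, dataPoints: [], documents: documentIds };
}

// Collects the financial statements that actually carry extracted data.
function collectFinancialData(response: CompanyResponse): ExtractedStatement[] {
  const result: ExtractedStatement[] = [];
  for (const doc of response.documents?.financialStatements ?? []) {
    const financialData = doc.extractedData?.financialData;
    if (financialData != null) result.push({ id: doc.id, financialData });
  }
  return result;
}

// Body to send with POST https://api.topograph.co/v2/company
const requestBody = JSON.stringify(buildFinancialStatementRequest("your-company-id", "DE", ["financial_statement_id"]));

// Invented sample response, for illustration only.
const sampleResponse: CompanyResponse = {
  documents: {
    financialStatements: [
      { id: "financial_statement_id", extractedData: { financialData: { currency: "EUR" } } },
      { id: "statement_without_extraction" },
    ],
  },
};

const extracted = collectFinancialData(sampleResponse);
```

Tolerating a missing `documents` or `financialStatements` key keeps the helper safe to call on responses that contain no financial statements at all.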

Feedback

As this is a beta feature, we’re actively collecting feedback to improve extraction accuracy and expand coverage. Please share your experiences and suggestions through your account manager or support channels.