Overview
Topograph’s AI-powered financial data extraction analyzes PDF financial statements and extracts key financial metrics into a structured, machine-readable format. This beta feature runs as part of document retrieval and requires no additional configuration.
Beta Feature: The financial data extraction feature is currently in beta. The data model may evolve as we refine extraction accuracy and expand coverage. We welcome your feedback to help improve this feature.
How It Works
When a financial statement document is processed:
- Document Classification - The AI first determines if the document is a financial statement
- Data Extraction - If identified as a financial statement, key metrics are extracted
- Structured Output - Data is returned in the extractedData.financialData field
Non-financial documents will not have the extractedData field populated.
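A consumer can therefore branch on whether the field is present. The following is a minimal sketch, not an official client: the RetrievedDocument interface and the getFinancialData helper are hypothetical names, and the sketch assumes that "not populated" can mean either an absent field or a null value.

```typescript
// Hypothetical minimal shape of a retrieved document; only the fields used here are modeled.
interface RetrievedDocument {
  id: string;
  extractedData?: { financialData?: Record<string, unknown> } | null;
}

// Returns the extracted financial data, or null when extractedData is not
// populated (assumed here to mean the field is absent or null).
function getFinancialData(doc: RetrievedDocument): Record<string, unknown> | null {
  return doc.extractedData?.financialData ?? null;
}

// Illustrative documents, not real API output.
const statement: RetrievedDocument = {
  id: "doc-1",
  extractedData: { financialData: { currency: "EUR" } },
};
const otherDocument: RetrievedDocument = { id: "doc-2" };

console.log(getFinancialData(statement) !== null); // true
console.log(getFinancialData(otherDocument)); // null
```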
Data Model
The extracted financial data covers statement metadata plus the two main financial sections, the income statement and the balance sheet:
Top-Level Structure
{
  extractedData: {
    financialData: {
      // Metadata
      fiscalYear: FiscalYear;
      approvalDate: string | null;
      currency: string | null;
      accountingStandard: AccountingStandard;
      statementType: StatementType;
      // Financial sections
      incomeStatement: IncomeStatement;
      balanceSheet: BalanceSheet;
    }
  }
}
Fiscal Year
fiscalYear: {
  startDate: string; // Format: "YYYY-MM-DD"
  endDate: string; // Format: "YYYY-MM-DD"
}
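Because both dates use the "YYYY-MM-DD" format, the length of the reporting period can be derived directly, which helps spot short or extended fiscal years before comparing figures. This is a sketch under that assumption; parseIsoDate and fiscalYearLengthInDays are hypothetical helper names, and the sample dates are illustrative.

```typescript
interface FiscalYear {
  startDate: string; // "YYYY-MM-DD"
  endDate: string; // "YYYY-MM-DD"
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Parses "YYYY-MM-DD" as a UTC timestamp so the result does not depend on the local time zone.
function parseIsoDate(value: string): number {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(value);
  if (!match) throw new Error(`Unexpected date format: ${value}`);
  return Date.UTC(Number(match[1]), Number(match[2]) - 1, Number(match[3]));
}

// Number of days covered by the fiscal year, counting both the start and end dates.
function fiscalYearLengthInDays(fy: FiscalYear): number {
  const elapsed = parseIsoDate(fy.endDate) - parseIsoDate(fy.startDate);
  return Math.round(elapsed / MS_PER_DAY) + 1;
}

const calendarYear: FiscalYear = { startDate: "2023-01-01", endDate: "2023-12-31" };
const shortYear: FiscalYear = { startDate: "2023-07-01", endDate: "2023-12-31" };

console.log(fiscalYearLengthInDays(calendarYear)); // 365
console.log(fiscalYearLengthInDays(shortYear)); // 184
```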
Accounting Standard
- "IFRS" - International Financial Reporting Standards
- "French GAAP" - French Generally Accepted Accounting Principles
- "US GAAP" - United States GAAP
- "Swiss GAAP" - Swiss GAAP
- "Lux GAAP" - Luxembourg GAAP
- "Other" - Other accounting standards
Statement Type
- "consolidated" - Group/consolidated financial statements
- "simplified" - Individual/standalone statements
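The two enumerations above can be expressed as string literal unions. The sketch below is one possible client-side typing, not a published SDK; the constant names and type guards are hypothetical. Since the data model may evolve during the beta, runtime guards let a client detect a value it does not yet know about instead of assuming the list is closed.

```typescript
// Values taken from the lists above; the constant and guard names are hypothetical.
const ACCOUNTING_STANDARDS = ["IFRS", "French GAAP", "US GAAP", "Swiss GAAP", "Lux GAAP", "Other"] as const;
type AccountingStandard = (typeof ACCOUNTING_STANDARDS)[number];

const STATEMENT_TYPES = ["consolidated", "simplified"] as const;
type StatementType = (typeof STATEMENT_TYPES)[number];

// Runtime checks that narrow an unknown value to one of the documented literals.
function isAccountingStandard(value: unknown): value is AccountingStandard {
  return typeof value === "string" && (ACCOUNTING_STANDARDS as readonly string[]).indexOf(value) !== -1;
}

function isStatementType(value: unknown): value is StatementType {
  return typeof value === "string" && (STATEMENT_TYPES as readonly string[]).indexOf(value) !== -1;
}

console.log(isAccountingStandard("IFRS")); // true
console.log(isAccountingStandard("UK GAAP")); // false
console.log(isStatementType("consolidated")); // true
```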
Income Statement
incomeStatement: {
  revenue: {
    amount: number | null; // Current period
    previousAmount: number | null; // Prior period
    localName: string; // Original term from document
  }
  depreciationAndAmortization: {
    amount: number | null;
    previousAmount: number | null;
    localName: string;
  }
  operatingIncome: {
    amount: number | null;
    previousAmount: number | null;
    localName: string;
    standardDefinition: string | null; // How it's calculated
    notes: string | null; // Additional context
  }
  netIncome: {
    amount: number | null;
    previousAmount: number | null;
    localName: string;
  }
}
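Since every line item carries both the current and the prior period, simple comparisons can be computed without a second document. The sketch below shows null-safe year-over-year change and operating margin; the helper names are hypothetical and the sample figures are illustrative, not real extraction output.

```typescript
interface LineItem {
  amount: number | null; // Current period
  previousAmount: number | null; // Prior period
  localName: string;
}

// Year-over-year change as a fraction (0.25 means +25%).
// Returns null when either period is missing or the prior amount is zero.
function yearOverYearChange(item: LineItem): number | null {
  if (item.amount === null || item.previousAmount === null || item.previousAmount === 0) {
    return null;
  }
  return (item.amount - item.previousAmount) / Math.abs(item.previousAmount);
}

// Null-safe ratio, used here for operating margin (operating income / revenue).
function ratio(numerator: number | null, denominator: number | null): number | null {
  if (numerator === null || denominator === null || denominator === 0) return null;
  return numerator / denominator;
}

// Illustrative values only.
const revenue: LineItem = { amount: 1250000, previousAmount: 1000000, localName: "Chiffre d'affaires" };
const operatingIncome: LineItem = { amount: 125000, previousAmount: null, localName: "Résultat d'exploitation" };

console.log(yearOverYearChange(revenue)); // 0.25
console.log(ratio(operatingIncome.amount, revenue.amount)); // 0.1
console.log(yearOverYearChange(operatingIncome)); // null (prior period missing)
```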
Balance Sheet
The balance sheet is organized into two main sections:
Assets
assets: {
  fixedAssets: {
    total: {
      amount: number | null;
      previousAmount: number | null;
    }
    components: {
      tangible: {
        // Property, plant, equipment
        amount: number | null;
        previousAmount: number | null;
        localName: string;
      }
      intangible: {
        // Goodwill, patents, software
        amount: number | null;
        previousAmount: number | null;
        localName: string;
      }
      financialAssets: {
        // Long-term investments
        amount: number | null;
        previousAmount: number | null;
        localName: string;
      }
    }
  }
  currentAssets: {
    total: {
      amount: number | null;
      previousAmount: number | null;
    }
    components: {
      cashAndEquivalents: {
        amount: number | null;
        previousAmount: number | null;
        localName: string;
      }
      tradeReceivables: {
        // Accounts receivable
        amount: number | null;
        previousAmount: number | null;
        localName: string;
      }
      inventory: {
        amount: number | null;
        previousAmount: number | null;
        localName: string;
      }
      otherCurrentAssets: {
        amount: number | null;
        previousAmount: number | null;
        localName: string;
      }
    }
  }
}
Equity and Liabilities
equityAndLiabilities: {
  equity: {
    total: {
      amount: number | null;
      previousAmount: number | null;
    }
    components: {
      shareCapital: {
        amount: number | null;
        previousAmount: number | null;
        localName: string;
      }
      retainedEarnings: {
        amount: number | null;
        previousAmount: number | null;
        localName: string;
      }
      otherReserves: {
        amount: number | null;
        previousAmount: number | null;
        localName: string;
      }
    }
  }
  liabilities: {
    nonCurrent: {
      // Long-term (> 1 year)
      total: {
        amount: number | null;
        previousAmount: number | null;
      }
      components: {
        financialDebt: {
          amount: number | null;
          previousAmount: number | null;
          localName: string;
        }
        otherNonCurrentLiabilities: {
          amount: number | null;
          previousAmount: number | null;
          localName: string;
        }
      }
    }
    current: {
      // Short-term (< 1 year)
      total: {
        amount: number | null;
        previousAmount: number | null;
      }
      components: {
        tradePayables: {
          amount: number | null;
          previousAmount: number | null;
          localName: string;
        }
        currentFinancialDebt: {
          amount: number | null;
          previousAmount: number | null;
          localName: string;
        }
        otherCurrentLiabilities: {
          amount: number | null;
          previousAmount: number | null;
          localName: string;
        }
      }
    }
  }
}
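One way to sanity-check an extracted balance sheet is to compare the asset totals with the equity and liability totals. The sketch below does this for the current period; sumOrNull and balanceGap are hypothetical helpers and the figures are illustrative. The model exposes section totals rather than grand totals, and a source document may contain items that fall outside these sections, so a non-zero gap is a prompt for review rather than proof of an extraction error.

```typescript
interface Total {
  amount: number | null;
  previousAmount: number | null;
}

// Only the section totals from the balance sheet model are needed for this check.
interface BalanceSheetTotals {
  assets: { fixedAssets: { total: Total }; currentAssets: { total: Total } };
  equityAndLiabilities: {
    equity: { total: Total };
    liabilities: { nonCurrent: { total: Total }; current: { total: Total } };
  };
}

// Sums the values, or returns null as soon as one of them is missing.
function sumOrNull(values: Array<number | null>): number | null {
  let sum = 0;
  for (const value of values) {
    if (value === null) return null;
    sum += value;
  }
  return sum;
}

// Current-period difference between the asset totals and the equity + liability totals.
// Returns null when any required total is missing.
function balanceGap(bs: BalanceSheetTotals): number | null {
  const assets = sumOrNull([bs.assets.fixedAssets.total.amount, bs.assets.currentAssets.total.amount]);
  const funding = sumOrNull([
    bs.equityAndLiabilities.equity.total.amount,
    bs.equityAndLiabilities.liabilities.nonCurrent.total.amount,
    bs.equityAndLiabilities.liabilities.current.total.amount,
  ]);
  if (assets === null || funding === null) return null;
  return assets - funding;
}

// Illustrative figures only.
const sample: BalanceSheetTotals = {
  assets: {
    fixedAssets: { total: { amount: 600, previousAmount: null } },
    currentAssets: { total: { amount: 400, previousAmount: null } },
  },
  equityAndLiabilities: {
    equity: { total: { amount: 500, previousAmount: null } },
    liabilities: {
      nonCurrent: { total: { amount: 300, previousAmount: null } },
      current: { total: { amount: 200, previousAmount: null } },
    },
  },
};

console.log(balanceGap(sample)); // 0
```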
Important Notes
Data Quality
- Null Values - Fields return null when data is missing, unclear, or not applicable
- Numeric Values - All amounts are numeric values without currency symbols or thousand separators
- Negative Values - Losses and deficits preserve the negative sign
- Local Terms - The localName field preserves the exact terminology from the source document
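In practice this means display code has to handle null explicitly and add any currency label itself. A minimal sketch, assuming a hypothetical formatAmount helper and illustrative values:

```typescript
// Formats an extracted amount for display. Amounts arrive as plain numbers,
// so the currency label is appended here; null is rendered as "n/a".
function formatAmount(amount: number | null, currency: string | null): string {
  if (amount === null) return "n/a";
  const value = amount.toFixed(2); // Keeps the negative sign for losses
  return currency === null ? value : `${value} ${currency}`;
}

console.log(formatAmount(-48250.5, "EUR")); // "-48250.50 EUR"
console.log(formatAmount(null, "EUR")); // "n/a"
console.log(formatAmount(1200, null)); // "1200.00"
```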
Multi-Language Support
The extraction works across multiple languages and accounting frameworks:
- Recognizes financial terms in various languages
- Maps local terminology to standardized fields
- Preserves original terms in localName fields
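To show users the wording from their own document next to the standardized field names, the localName values can be collected into a lookup. The sketch below assumes a hypothetical localNameIndex helper; the German terms are illustrative examples, not output from a real extraction.

```typescript
interface NamedItem {
  localName: string;
}

// Builds a lookup from standardized field name to the original term in the document.
function localNameIndex(items: Record<string, NamedItem>): Record<string, string> {
  const index: Record<string, string> = {};
  for (const key of Object.keys(items)) {
    const item = items[key];
    if (item) index[key] = item.localName;
  }
  return index;
}

// Illustrative income statement items from a German-language document.
const incomeStatementItems: Record<string, NamedItem> = {
  revenue: { localName: "Umsatzerlöse" },
  netIncome: { localName: "Jahresüberschuss" },
};

const names = localNameIndex(incomeStatementItems);
console.log(names.revenue); // "Umsatzerlöse"
console.log(names.netIncome); // "Jahresüberschuss"
```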
Limitations
- Only processes PDF financial statements
- Extraction accuracy depends on document quality and structure
- Complex or non-standard formats may have reduced accuracy
- Currently limited to the core income statement and balance sheet metrics listed in the data model
Example Usage
When retrieving documents with financial statements:
POST https://api.topograph.co/v2/company
{
  "companyId": "your-company-id",
  "countryCode": "DE",
  "dataPoints": [],
  "documents": ["financial_statement_id"]
}
The response will include the extracted data:
{
  "documents": {
    "financialStatements": [
      {
        "id": "financial_statement_id",
        "extractedData": {
          "financialData": {
            // Structured financial data as described above
          }
        }
        // ... other document fields
      }
    ]
  }
}
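The request and response shapes above can be wrapped in two small helpers. This is a sketch rather than an official client: buildCompanyRequest and findFinancialData are hypothetical names, the response interfaces model only the fields shown in the example, and sending the request (including authentication, which this page does not cover) is deliberately left out.

```typescript
interface CompanyRequest {
  companyId: string;
  countryCode: string;
  dataPoints: string[];
  documents: string[];
}

// Builds the request body shown in the example above for the given document IDs.
function buildCompanyRequest(companyId: string, countryCode: string, documentIds: string[]): CompanyRequest {
  return { companyId, countryCode, dataPoints: [], documents: documentIds };
}

// Only the response fields that appear in the example are modeled here.
interface FinancialStatementDocument {
  id: string;
  extractedData?: { financialData?: Record<string, unknown> } | null;
}
interface CompanyResponse {
  documents?: { financialStatements?: FinancialStatementDocument[] };
}

// Returns the extracted data for one document, or null if the document is
// missing from the response or has no extracted financial data.
function findFinancialData(response: CompanyResponse, documentId: string): Record<string, unknown> | null {
  const statements = response.documents?.financialStatements ?? [];
  for (const doc of statements) {
    if (doc.id === documentId) return doc.extractedData?.financialData ?? null;
  }
  return null;
}

const body = buildCompanyRequest("your-company-id", "DE", ["financial_statement_id"]);
console.log(JSON.stringify(body));

// Illustrative response, shaped like the example above.
const exampleResponse: CompanyResponse = {
  documents: {
    financialStatements: [
      { id: "financial_statement_id", extractedData: { financialData: { currency: "EUR" } } },
    ],
  },
};
console.log(findFinancialData(exampleResponse, "financial_statement_id")); // { currency: 'EUR' }
```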
Feedback
As this is a beta feature, we’re actively collecting feedback to improve extraction accuracy and expand coverage. Please share your experiences and suggestions through your account manager or support channels.