Skip to content
>cat insights/building-a-single-source-of-truth-for-revenue.mdx
GTM Engineering

Building a Single Source of Truth for Revenue

Your CRM, billing system, and ERP show three different ARR numbers. Here's how to unify them with a warehouse-native architecture.

December 24, 2024
14 min read
Tolga Oral
#revops#data-architecture#arr#dbt#data-warehouse
revenue-single-source-of-truth.flow
Live
SalesforceCRMSourceexec:FivetranStripeBillingSourceexec:FivetranNetSuiteERPSourceexec:FivetranSnowflakeData WarehouseStorageexec:BatchdbtTransformSQL Modelsexec:Hourlyfct_mrr_dailySSOTMartexec:Authoritative
InputProcessingOutput

ELT pipeline unifying CRM, billing, and ERP into warehouse-native revenue models

The Board Meeting Problem

The CEO, CFO, and CRO walk into a board meeting. Each brings a different ARR number.

Sales says $12.4M—that's what's in Salesforce. Finance says $11.8M—that's what's recognized in NetSuite. The CFO's spreadsheet says $12.1M after "adjustments." The meeting that was supposed to discuss growth strategy becomes a reconciliation exercise.

This isn't a communication problem. It's an architecture problem.

As B2B SaaS companies scale past $5M ARR, the "spreadsheet glue" that held everything together starts failing. Each system—CRM, billing, ERP, product analytics—operates with its own definition of revenue, its own timing, its own customer identity. Without a unifying architecture, these realities drift apart until strategic alignment becomes impossible.

Why Your Systems Disagree

The discrepancy between systems isn't a bug—it's a feature of how each system was designed. They model Time, Identity, and Value differently.

The Time Horizon Paradox

Consider a customer signing a $120,000 contract on January 15th for a one-year term starting February 1st, with Net 30 payment terms.

typescript
// The same deal, four different "revenue" numbers
const dealTimeline = {
  // CRM records $120k in January (contract signed)
  bookings: {
    amount: 120000,
    date: "2024-01-15",
    system: "Salesforce",
    meaning: "Intent to generate revenue",
  },
 
  // Billing generates invoice in February (service starts)
  billings: {
    amount: 120000,
    date: "2024-02-01",
    system: "Stripe",
    meaning: "Cash obligation created",
  },
 
  // Cash arrives in March (Net 30)
  cash: {
    amount: 120000,
    date: "2024-03-02",
    system: "Bank",
    meaning: "Money in the account",
  },
 
  // Revenue recognized monthly over 12 months
  revenue: {
    amount: 10000, // per month
    dates: ["2024-02", "2024-03", "...through 2025-01"],
    system: "NetSuite",
    meaning: "Service delivered (GAAP)",
  },
};

The sales rep hits their January quota. The invoice shows February. Cash arrives in March. The P&L spreads it across twelve months. All four numbers are "correct"—but they answer different questions.

A unified data layer must capture all four timestamps and model the relationships between them. It cannot flatten them into a single "Date" column.

The Mid-Cycle Amendment Problem

SaaS relationships are dynamic. Customers upgrade, downgrade, or churn mid-cycle. This creates "proration chaos."

The scenario: A customer is on a $12,000/year plan (Jan-Dec). On July 1st, they upgrade to a $24,000/year plan.

  • Billing system: Issues a prorated invoice. Credits back 6 months of the old plan (-$6,000), charges 6 months of the new plan (+$12,000). Net invoice: $6,000.
  • CRM: Shows a new Opportunity for $12,000 (the upsell).
  • The discrepancy: Sum of Closed Won Opportunities = $24k ARR (correct). Sum of Invoices = $18k billed (also correct, but different).

The SSOT must define ARR not as a sum of transactions, but as the state of the subscription on a given day. On June 30th, ARR was $12k. On July 1st, ARR became $24k.

The Currency Problem

For companies operating globally, FX fluctuations introduce noise.

A contract signed in Euros (€100k) on January 1st is worth $110k USD. By December 31st, that same €100k might be worth $105k USD.

  • Sales: Wants "Budget Rate" reporting to protect commissions from FX headwinds
  • Finance: Must report "Spot Rate" for GAAP compliance

The data layer must compute three versions of every revenue metric:

sql
-- Multi-currency revenue calculation
SELECT
    subscription_id,
    amount_local,
    currency_code,
    -- For financial reporting (GAAP)
    amount_local * fx.spot_rate as amount_usd_spot,
    -- For sales performance analysis
    amount_local * fx.budget_rate as amount_usd_budget
FROM subscriptions s
JOIN fx_rates fx
    ON s.currency_code = fx.from_currency
    AND s.transaction_date = fx.rate_date

The Identity Mismatch

Systems model "Who is the Customer?" differently.

  • CRM: Complex hierarchies (Global Parent → Regional Subsidiary → Department)
  • Billing: Often flat (One Card = One Customer)
  • Support: Uses "Users" or "Tenants"

You may have 5 billing accounts in Stripe that roll up to one Global Account in Salesforce. If your revenue data layer cannot map these relationships, your Customer Lifetime Value calculations will be fragmented and understated.

The Cost of Disconnection

The impact is quantifiable:

  • Strategic paralysis: Board meetings spent reconciling data rather than making decisions
  • Operational waste: RevOps teams spend up to 40% of their time manually stitching spreadsheets for Weekly Business Reviews
  • Customer experience degradation: A churned customer still receives upsell emails because systems aren't synced
  • Valuation risk: Investors scrutinize data quality heavily. A lack of unified revenue view signals operational immaturity and creates due diligence risk

Architecture Patterns: Choosing Your Approach

This is the dominant architecture for mid-market B2B SaaS. Centralize all logic in a Cloud Data Warehouse.

┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│  Salesforce │  │   Stripe    │  │   NetSuite  │
└──────┬──────┘  └──────┬──────┘  └──────┬──────┘
       │                │                │
       └────────────────┼────────────────┘
                        │
              ┌─────────▼─────────┐
              │  Fivetran/Airbyte │  (Extract & Load)
              └─────────┬─────────┘
                        │
              ┌─────────▼─────────┐
              │     Snowflake     │  (Storage)
              └─────────┬─────────┘
                        │
              ┌─────────▼─────────┐
              │        dbt        │  (Transform)
              └─────────┬─────────┘
                        │
              ┌─────────▼─────────┐
              │   fct_mrr_daily   │  (Single Source of Truth)
              └─────────┬─────────┘
                        │
              ┌─────────▼─────────┐
              │  Census/Hightouch │  (Reverse ETL)
              └─────────┬─────────┘
                        │
       ┌────────────────┼────────────────┐
       │                │                │
┌──────▼──────┐  ┌──────▼──────┐  ┌──────▼──────┐
│  Salesforce │  │    Slack    │  │   Looker    │
│ (ARR field) │  │  (alerts)   │  │ (dashboards)│
└─────────────┘  └─────────────┘  └─────────────┘

The mechanism:

  1. Extract & Load: Fivetran or Airbyte ingest raw data from all source systems into the warehouse. Data is loaded in its raw, unaltered state.
  2. Transform: dbt writes modular SQL models that clean, join, and aggregate data. This is where business logic lives ("What is Churn?").
  3. Model: A unified table (fct_mrr_daily) serves as the definitive source of truth.
  4. Reverse ETL: Census or Hightouch sync calculated metrics back to operational tools.

Why this works:

  • Decoupling: Analytics logic is separated from operational systems. If Salesforce changes a field name, you update the dbt model, not integration code.
  • Auditability: Every metric definition is version-controlled in Git. You can trace exactly how "ARR" is calculated.
  • Flexibility: SQL handles any pricing scheme complexity.

Trade-offs:

  • Latency: Typically batch-based (15-60 minutes old)
  • Requires data engineering expertise

Option 2: Middleware (iPaaS)

MuleSoft, Workato, or custom middleware sits between systems, synchronizing in real-time.

When it works: Real-time sync requirements, simpler data models.

Why it fails at scale: As systems grow, integration paths explode (N×(N-1) connections). You get "spaghetti architecture" that's impossible to debug. Changing business logic becomes risky.

Option 3: Event-Driven Architecture

All systems publish events (OrderCreated, PaymentFailed) to a central bus (Kafka, Kinesis). Downstream services subscribe and react.

When it works: High-volume, usage-based pricing, consumption models (think Twilio, Snowflake).

Why it's overkill for most: Requires advanced engineering. Hard to query "state" (current ARR) without aggregating streams. Usually unnecessary for standard B2B subscription models.

Technical Implementation: The Date Spine

The naive approach to calculating MRR/ARR is summing active subscriptions. This breaks because subscriptions are intervals, not points. To analyze trends, convert intervals to a daily time series.

Step 1: Create a Date Dimension

sql
-- Generate a date spine (every day from 2020 to 2030)
-- Most warehouses have a utility for this
CREATE TABLE dim_dates AS
SELECT
    date_day,
    DATE_TRUNC('month', date_day) as month_start,
    DATE_TRUNC('quarter', date_day) as quarter_start,
    YEAR(date_day) as fiscal_year
FROM (
    SELECT DATEADD(day, seq4(), '2020-01-01') as date_day
    FROM TABLE(GENERATOR(ROWCOUNT => 4000))
)
WHERE date_day <= '2030-12-31';

Step 2: Explode Subscriptions to Daily Grain

sql
-- Convert subscription intervals to daily records
SELECT
    d.date_day,
    s.subscription_id,
    s.customer_id,
    s.mrr,
    s.plan_name
FROM dim_dates d
JOIN subscriptions s
    ON d.date_day >= s.start_date
    AND d.date_day < COALESCE(s.end_date, '2099-12-31')
WHERE s.status = 'active';

This "explodes" a single subscription row into 365 rows (for a 1-year contract). Now you can GROUP BY date_day to get exact ARR for any day in history—handling mid-month churn, leap years, and odd-length contracts.

Step 3: Calculate ARR Movements (The Flux Model)

Understanding why ARR changed is as important as the number itself.

sql
-- ARR movement categorization
WITH daily_arr AS (
    SELECT
        customer_id,
        date_day,
        SUM(mrr) * 12 as arr,
        LAG(SUM(mrr) * 12) OVER (
            PARTITION BY customer_id
            ORDER BY date_day
        ) as arr_yesterday
    FROM fct_subscriptions_daily
    GROUP BY customer_id, date_day
)
SELECT
    date_day,
    customer_id,
    arr,
    arr_yesterday,
    CASE
        -- New Business: Customer didn't exist yesterday
        WHEN arr_yesterday IS NULL AND arr > 0
        THEN 'new_business'
 
        -- Churn: Customer existed yesterday, gone today
        WHEN arr_yesterday > 0 AND arr = 0
        THEN 'churn'
 
        -- Expansion: ARR increased
        WHEN arr > arr_yesterday
        THEN 'expansion'
 
        -- Contraction: ARR decreased but not zero
        WHEN arr < arr_yesterday AND arr > 0
        THEN 'contraction'
 
        ELSE 'no_change'
    END as movement_type,
 
    arr - COALESCE(arr_yesterday, 0) as arr_delta
 
FROM daily_arr;

The mathematical identity that validates your model:

Beginning ARR + New + Expansion - Contraction - Churn = Ending ARR

If this doesn't balance to zero, your data model has a bug.

Handling Historical Changes (SCD Type 2)

Revenue data is historical. You can't just look at current state.

The problem: Customer A was "SMB" segment last year. Today they're "Enterprise." If you run a report on "2023 Revenue by Segment" using their current tag, you incorrectly attribute 2023 revenue to Enterprise.

The solution: Use dbt snapshots to implement Slowly Changing Dimensions Type 2.

yaml
# dbt snapshot configuration
# models/snapshots/snap_customers.sql
{% snapshot snap_customers %}
 
{{
    config(
      target_schema='snapshots',
      unique_key='customer_id',
      strategy='check',
      check_cols=['segment', 'industry', 'owner_id'],
    )
}}
 
SELECT * FROM {{ source('salesforce', 'accounts') }}
 
{% endsnapshot %}

This creates new rows whenever tracked fields change, with valid_from and valid_to timestamps. Revenue queries join to the snapshot table, filtering for the row valid on the date of the revenue event.

sql
-- Historical segmentation query
SELECT
    r.date_day,
    c.segment,  -- Segment AS OF that date, not current
    SUM(r.arr) as arr
FROM fct_revenue r
JOIN snap_customers c
    ON r.customer_id = c.customer_id
    AND r.date_day >= c.valid_from
    AND r.date_day < COALESCE(c.valid_to, '2099-12-31')
GROUP BY r.date_day, c.segment;

Data Quality Tests

A "Single Source of Truth" is worthless if nobody trusts it. Automated testing is mandatory.

yaml
# dbt schema.yml - Revenue model tests
version: 2
 
models:
  - name: fct_mrr_daily
    description: "Daily MRR by subscription - the source of truth"
    columns:
      - name: subscription_id
        tests:
          - unique
          - not_null
 
      - name: mrr
        tests:
          - not_null
          - dbt_utils.accepted_range:
              min_value: 0
              # MRR should never be negative
 
      - name: date_day
        tests:
          - not_null
 
    tests:
      # Custom test: ARR movements must balance
      - arr_movements_balance:
          tolerance: 0.01  # Allow 1 cent rounding error

The "Zero-Sum" test as a custom dbt test:

sql
-- tests/arr_movements_balance.sql
-- Beginning + New + Expansion - Contraction - Churn = Ending
 
WITH monthly_summary AS (
    SELECT
        DATE_TRUNC('month', date_day) as month,
        SUM(CASE WHEN movement_type = 'new_business' THEN arr_delta ELSE 0 END) as new_arr,
        SUM(CASE WHEN movement_type = 'expansion' THEN arr_delta ELSE 0 END) as expansion_arr,
        SUM(CASE WHEN movement_type = 'contraction' THEN arr_delta ELSE 0 END) as contraction_arr,
        SUM(CASE WHEN movement_type = 'churn' THEN arr_delta ELSE 0 END) as churn_arr
    FROM {{ ref('fct_arr_movements') }}
    GROUP BY 1
),
 
arr_by_month AS (
    SELECT
        DATE_TRUNC('month', date_day) as month,
        SUM(arr) as ending_arr,
        LAG(SUM(arr)) OVER (ORDER BY DATE_TRUNC('month', date_day)) as beginning_arr
    FROM {{ ref('fct_mrr_daily') }}
    WHERE date_day = LAST_DAY(date_day)  -- End of month snapshots
    GROUP BY 1
)
 
SELECT
    a.month,
    a.beginning_arr,
    m.new_arr,
    m.expansion_arr,
    m.contraction_arr,
    m.churn_arr,
    a.ending_arr,
    (a.beginning_arr + m.new_arr + m.expansion_arr + m.contraction_arr + m.churn_arr) as calculated_ending,
    ABS(a.ending_arr - (a.beginning_arr + m.new_arr + m.expansion_arr + m.contraction_arr + m.churn_arr)) as variance
FROM arr_by_month a
JOIN monthly_summary m ON a.month = m.month
WHERE ABS(a.ending_arr - (a.beginning_arr + m.new_arr + m.expansion_arr + m.contraction_arr + m.churn_arr)) > {{ tolerance }}
 
-- Test fails if any rows are returned (variance exceeds tolerance)

Governance: The Human Layer

Technology alone doesn't solve this. A "Single Source of Truth" requires a "Single Source of Definition."

The Revenue Data Council

Establish a cross-functional group that meets monthly:

  • Composition: VP RevOps, VP Finance (Controller), Head of Data, Sales Leadership
  • Mandate: Debate and ratify definitions

Example conflict: Sales wants to count a signed contract with an opt-out clause as "Bookings." Finance argues it's not "Revenue."

Resolution: The Council defines a new state: "Conditional Booking." The Data Team implements this flag in the warehouse.

The Data Dictionary

The Council's output is a published Data Dictionary:

Entry: Annual Recurring Revenue (ARR)
Definition: The normalized annual value of the recurring portion
            of active contracts.
Exclusions: One-time implementation fees, Training, Pilot overages.
Source: snowflake.prod.marts.fct_arr
Owner: Finance
Last Updated: 2024-12-01

Clear Ownership (RACI)

TaskRevOpsFinanceData EngineeringSales
Define "Booking"ResponsibleAccountableConsultedInformed
Define "Revenue" (GAAP)ConsultedAccountableInformedInformed
Build/Maintain ETLInformedInformedAccountableInformed
Input Data (CRM Accuracy)AccountableInformedInformedResponsible

Finance must be Accountable for definitions because they bear audit liability. Engineering is Accountable for pipeline reliability, not content.

Common Anti-Patterns

The Reverse ETL Loop of Death

The mistake: Bi-directional sync where Salesforce updates Snowflake, Snowflake recalculates a score, Reverse ETL updates Salesforce.

The result: Without careful Change Data Capture logic, you create infinite loops that exhaust API limits and cause data flickering.

The fix: Designate one system as "Master" for each field. Salesforce owns "Phone Number." Snowflake owns "Lifetime Value." Never let them overwrite each other.

Logic in the Source System

The mistake: Using Salesforce Formula Fields to calculate ARR.

Why it fails: Salesforce formulas can't easily access other objects without complex Apex. They're invisible to the warehouse until ingested.

The fix: Push logic down. Calculate in the warehouse where you have full SQL power and access to all data sources, then push results back via Reverse ETL.

The Spreadsheet Bridge

The mistake: Building a robust data stack, but Finance still exports to Excel for "final adjustments" before the board deck.

The impact: The Excel version becomes the real truth. The warehouse becomes irrelevant.

The fix: Bring adjustments into the warehouse. Create an adjustments table where Finance can upload overrides (via Google Sheet connected to Fivetran). The dbt model becomes: Raw Data + Adjustments = Final Truth. Human judgment is captured in the system of record.

What Good Looks Like

GitLab: Built a "Trusted Data Framework" using Snowflake and dbt. Dashboards are only marked "Trusted" if underlying dbt models pass automated tests. This accelerated their IPO financial close process.

Typeform: Migrated from fragmented Redshift to centralized Snowflake with dbt. Report generation dropped from hours to minutes. Different domains (Product, Sales) own their data marts while adhering to global governance for core revenue metrics.

Netlify: Uses best-of-breed tools connected via warehouse. When they launched usage-based pricing, they updated logic in dbt without overhauling billing infrastructure.

Getting Started

If your leadership team argues about ARR numbers in meetings, the fix isn't better communication—it's better architecture.

Phase 1: Inventory your systems. Map where revenue data lives (CRM, Billing, ERP, Product). Document what each system calls "revenue."

Phase 2: Choose your warehouse. Snowflake and BigQuery are the standard choices. Don't overthink this—both work.

Phase 3: Set up ingestion. Fivetran is the safe choice for mid-market. Airbyte if you need custom connectors or are cost-sensitive.

Phase 4: Build your revenue model. Start with one metric: ARR as of today. Get Finance to sign off that the number matches their expectation. Then expand to historical, then to movements (New, Expansion, Churn).

Phase 5: Operationalize. Push the unified metrics back to Salesforce, Slack, dashboards. Make the warehouse the source, not one of many sources.

The transition from "Operating by Feel" to "Operating by Fact" is the foundational challenge for scaling SaaS companies. The companies that solve it don't just report better—they decide faster.


We build unified revenue data layers as part of our Autonomous Ops track. If you're spending more time reconciling data than acting on it, we can help you architect the system that ends the debate.

author.json

Tolga Oral

Founder & GTM Engineer

8+ years architecting integrated growth engines for ambitious tech companies. Building bridges between marketing, sales, and operations.