API Schema Validation Tool: How to Stop Silent Breaking Changes Before They Break Your Data

13 min read · Tags: api schema validation, schema monitoring, api changes, data quality, schema drift, api monitoring


You built a pipeline last quarter that pulls data from a third-party API. It runs every night, loads clean data into your database, and feeds a dashboard your VP checks every Monday morning. Everything works perfectly — until one Monday when the dashboard shows null values for half the metrics.

You spend two hours debugging. Eventually you discover that the API provider quietly added a new required field, renamed an existing one, and changed a numeric field to a string. No announcement. No changelog. No versioning. The API just changed, and your pipeline kept running — silently ingesting broken data for two weeks before anyone noticed.

This is not a rare edge case. API schemas change constantly, and most teams find out the hard way.


Try it yourself: Start exploring for free. No credit card. 8 demo data sources ready to query.


The Problem with APIs: They Change Without Warning

Schema Drift Is the Norm, Not the Exception

Most public and commercial APIs are maintained by teams under competitive pressure. Features get added, fields get restructured, deprecated fields get removed — often on aggressive timelines. For internal APIs, the situation is even more unpredictable: a backend engineer changes a response format, doesn't think to tell the data team, and three pipelines break simultaneously.

The word "schema" covers a range of things that can drift:

  • Field additions: A new field appears in the response. Usually harmless, but can break strict schema validation.
  • Field removals: A field your pipeline depends on disappears. Silent data loss.
  • Field renames: user_id becomes userId. Your join key breaks.
  • Type changes: A field that returned integers now returns strings. Aggregation queries fail.
  • Nesting changes: A flat field becomes an object. Your extraction logic reads null.
  • Enum changes: Valid values for a categorical field change. Filters silently exclude new values.

Each of these can cause failures ranging from obvious crashes to subtle data quality degradation that nobody catches for weeks.
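Detecting these drift categories is mechanical once you have two flat field-to-type maps to compare. A minimal sketch in Python (the function name and map format are illustrative, not any particular tool's API):

```python
def diff_schemas(baseline: dict, current: dict) -> list:
    """Compare two {field_name: type_name} maps and report drift events."""
    changes = []
    for field, old_type in baseline.items():
        if field not in current:
            changes.append(("removed", field, old_type, None))
        elif current[field] != old_type:
            changes.append(("type_change", field, old_type, current[field]))
    for field, new_type in current.items():
        if field not in baseline:
            changes.append(("added", field, None, new_type))
    return changes

baseline = {"user_id": "string", "amount": "integer"}
current = {"userId": "string", "amount": "string"}
print(diff_schemas(baseline, current))
# [('removed', 'user_id', 'string', None),
#  ('type_change', 'amount', 'integer', 'string'),
#  ('added', 'userId', None, 'string')]
```

Note that a rename (user_id becoming userId) surfaces as a removal plus an addition: pairing the two back up is a heuristic, which is part of why drift detection is harder than it first looks.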

Why Standard Monitoring Doesn't Catch Schema Changes

Most teams monitor their pipelines for failures: did the job finish? Did it throw an exception? These checks tell you when something hard-breaks — when the API returns a 500, or when your database load fails with a type error.

They don't tell you when data silently degrades. If the API renames a field and your code reads it as null, the pipeline often finishes successfully. The data is wrong, but the monitoring says green.

Row count monitoring is slightly better — if a field rename causes a join to return zero rows, you might catch it. But subtle changes like type coercions or new nullable fields are invisible to row count checks.

True schema validation means comparing the actual structure of an API response against a known baseline, field by field, type by type, on every run.
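Building that baseline means inferring structure from a live response rather than writing it by hand. A hedged sketch of the idea, flattening nested objects into dotted paths (the type names and flattening scheme are illustrative):

```python
def infer_schema(obj: dict, prefix: str = "") -> dict:
    """Infer a flat {dotted.path: type_name} map from a decoded JSON object."""
    schema = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            schema.update(infer_schema(value, prefix=f"{path}."))
        elif isinstance(value, bool):
            schema[path] = "boolean"  # check bool before int: bool subclasses int
        elif isinstance(value, int):
            schema[path] = "integer"
        elif isinstance(value, float):
            schema[path] = "number"
        elif isinstance(value, str):
            schema[path] = "string"
        elif value is None:
            schema[path] = "null"
        elif isinstance(value, list):
            schema[path] = "array"
    return schema

response = {"deal_id": 42, "deal_stage": "Proposal", "owner": {"id": "u-7", "active": True}}
print(infer_schema(response))
# {'deal_id': 'integer', 'deal_stage': 'string', 'owner.id': 'string', 'owner.active': 'boolean'}
```

Comparing today's inferred map against yesterday's stored one is exactly the field-by-field, type-by-type check described above.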

The Cost of Late Detection

Consider a real scenario: a SaaS company pulls CRM data from a vendor API to feed their revenue forecasting model. The vendor quietly changes the structure of the deal_stage field from a string to an integer code. The pipeline keeps running. The forecasting model keeps training — on corrupted data. Twelve weeks later, when the forecast is visibly wrong, an engineer traces it back to the schema change. Three months of model training are invalidated.

The cost isn't just engineering time. It's the decisions made with bad data: staffing plans, inventory orders, marketing budgets.

What Existing API Schema Validation Tools Offer

Postman / Insomnia

Postman is excellent for manual API testing. You can define a schema and validate responses against it in a test script. But Postman is a development tool — it's designed for one-off checks, not continuous automated monitoring. Running a Postman collection on a schedule requires a CI/CD integration, and even then you're validating against a static schema file that you have to manually update.

JSON Schema Validators

Tools like ajv (JavaScript) or jsonschema (Python) let you write a JSON Schema spec and validate API responses against it programmatically. This is powerful but requires significant upfront work: you have to write the schema, maintain it as the API evolves intentionally, and integrate validation into every pipeline.

When an API changes in a way you didn't anticipate, your schema file is already wrong before you've had a chance to update it.
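With the Python jsonschema package, the validation call itself is short; the ongoing cost is the hand-written spec. A sketch of the failure mode described above, with illustrative field names (the spec and payload are invented for the example):

```python
from jsonschema import validate
from jsonschema.exceptions import ValidationError

# Hand-written spec -- this is the file you must keep in sync with the API.
deal_schema = {
    "type": "object",
    "required": ["deal_id", "deal_stage"],
    "properties": {
        "deal_id": {"type": "integer"},
        "deal_stage": {"type": "string"},
    },
}

# Provider quietly changed deal_stage from a string to an integer stage code:
response = {"deal_id": 42, "deal_stage": 3}

try:
    validate(instance=response, schema=deal_schema)
except ValidationError as err:
    print(f"schema violation at {list(err.absolute_path)}: {err.message}")
```

The check catches the change, but only because the spec predicted it. An unanticipated restructuring leaves the spec itself stale until someone notices and rewrites it.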

OpenAPI / Swagger Specs

Some APIs publish an OpenAPI specification — a machine-readable description of all endpoints, parameters, and response schemas. If your API provider publishes one and keeps it up to date, you can validate responses against it automatically.

The problem: many APIs have outdated or incomplete OpenAPI specs. And the spec itself can lag behind actual API behavior, giving you a false sense of security.

Homegrown Monitoring Scripts

Many data teams end up writing their own schema monitoring scripts: fetch the API, check that expected fields exist, alert if something changes. This works, but it's toil. Every API needs its own script. Scripts need to be maintained. Edge cases pile up. Eventually the monitoring code becomes more complex than the pipeline it's watching.
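A typical homegrown check is only a few lines, which is exactly why teams write one per API and the maintenance burden compounds. A sketch (the field names are illustrative; in practice each endpoint gets its own copy):

```python
EXPECTED_FIELDS = {"deal_id", "deal_stage", "amount"}

def missing_fields(payload: dict) -> set:
    """Return the expected fields absent from one decoded API response."""
    return EXPECTED_FIELDS - payload.keys()

# One of these per endpoint, per API, maintained by hand:
print(missing_fields({"deal_id": 1, "amount": 100}))  # {'deal_stage'}
```

Presence checks like this miss type changes, nesting changes, and new fields entirely, so the script grows until it is its own small project.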

The Better Approach: Automatic Schema Change Detection on Every Recrawl

Imagine registering an API endpoint once. You paste the URL, optionally add authentication, and the system crawls the endpoint — examining the full response structure, inferring types for every field, documenting the schema it observed. No manual JSON Schema writing. No Postman collections to maintain.

Then, every time the data is refreshed, the system automatically compares the new response against the stored baseline. If anything has changed — a field added, a field removed, a type changed — you're alerted immediately, before broken data flows downstream.

That's exactly what Harbinger Explorer does with its AI Crawler and automatic schema change detection.

How Harbinger Explorer's API Schema Validation Works

Step 1: Register Your API Endpoint

In Harbinger Explorer, add a new data source and paste your API endpoint URL. The AI Crawler fetches the endpoint and automatically maps the full response structure: every field name, every inferred data type, every nested object and array. You don't write a schema — the system infers it from the live response.

For authenticated APIs, you provide headers (Authorization tokens, API keys) which are stored securely. Harbinger Explorer supports standard REST patterns and can handle paginated APIs.
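Cursor-based pagination, one common REST pattern, works by following a continuation token until the provider stops returning one. This is a generic sketch of that pattern, not Harbinger Explorer's internal implementation; fetch_page stands in for an authenticated HTTP call, and the stubbed pages stand in for real responses:

```python
def fetch_all(fetch_page) -> list:
    """Drain a cursor-paginated endpoint: follow next_cursor until exhausted."""
    records, cursor = [], None
    while True:
        page = fetch_page(cursor)      # e.g. GET /deals?cursor=<cursor> with auth headers
        records.extend(page["items"])
        cursor = page.get("next_cursor")
        if cursor is None:
            return records

# Stubbed pages standing in for real HTTP responses:
pages = {
    None: {"items": [{"deal_id": 1}], "next_cursor": "c2"},
    "c2": {"items": [{"deal_id": 2}], "next_cursor": None},
}
print(fetch_all(lambda cursor: pages[cursor]))  # [{'deal_id': 1}, {'deal_id': 2}]
```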

Step 2: Start Querying Immediately

Once crawled, you can run SQL against the API data immediately using DuckDB:

SELECT
  endpoint_id,
  response_field_name,
  inferred_type,
  nullable
FROM schema_registry
WHERE source_name = 'crm_deals_api'
ORDER BY endpoint_id

You can also query the actual data:

SELECT
  deal_id,
  deal_name,
  deal_stage,
  amount,
  close_date
FROM crm_deals_api
WHERE deal_stage IN ('Proposal', 'Negotiation')
  AND close_date >= CURRENT_DATE
ORDER BY amount DESC

Step 3: Enable Recrawling with Schema Diff Alerts

On the Pro plan, you can schedule automatic recrawls. Every time Harbinger Explorer refetches your API, it runs a schema diff against the stored baseline. If anything has changed:

  • New fields are logged and highlighted
  • Missing fields trigger an alert
  • Type changes are flagged with the old and new type shown side by side
  • Nullable status changes are noted

You see exactly what changed, when it changed, and what the old structure looked like.

Step 4: Update Your Queries Accordingly

When a schema change is detected, you can immediately query the updated data in the query editor, see how the new structure looks, and update your saved queries before any downstream pipeline is affected. You're ahead of the breakage, not catching up to it.


Pricing: Starter at €8/month (25 chats/day, 10 crawls/month) or Pro at €24/month (200 chats/day, 100 crawls/month, recrawling, priority support). See pricing →

Free 7-day trial, no credit card required. Start free →


Advanced Use Cases for API Schema Monitoring

Monitoring Multiple API Versions Simultaneously

If a provider supports v1 and v2 of their API, register both as sources in Harbinger Explorer. Run a cross-version comparison query:

SELECT
  COALESCE(v1.field_name, v2.field_name) AS field_name,
  v1.field_type AS v1_type,
  v2.field_type AS v2_type,
  CASE
    WHEN v1.field_name IS NULL THEN 'ADDED_IN_V2'
    WHEN v2.field_name IS NULL THEN 'REMOVED_IN_V2'
    WHEN v1.field_type != v2.field_type THEN 'TYPE_CHANGE'
    ELSE 'OK'
  END AS status
FROM api_v1_schema v1
FULL OUTER JOIN api_v2_schema v2 ON v1.field_name = v2.field_name
WHERE v1.field_type != v2.field_type
   OR v2.field_name IS NULL
   OR v1.field_name IS NULL

This gives you a clear picture of what changed between versions before you commit to a migration.

Combining API Schema Data with Historical Baselines

Store your schema snapshots in Harbinger Explorer and query across time:

SELECT
  crawl_date,
  COUNT(*) AS field_count,
  SUM(CASE WHEN field_type = 'string' THEN 1 ELSE 0 END) AS string_fields,
  SUM(CASE WHEN nullable = true THEN 1 ELSE 0 END) AS nullable_fields
FROM api_schema_history
WHERE source_name = 'payments_api'
GROUP BY crawl_date
ORDER BY crawl_date

This longitudinal view lets you see schema evolution over time — useful for understanding how aggressively a provider changes their API.

PII Detection in API Responses

API responses sometimes include unexpected personal data. Harbinger Explorer's PII Detection automatically flags fields that look like email addresses, phone numbers, national IDs, or IP addresses. When a new field appears in an API response and it looks like PII, you're alerted before that data reaches any storage layer.

Common Mistakes with API Schema Validation

Mistake 1: Validating only the happy path

Most schema validation setups test with a single sample response. But APIs often return different schemas for different query parameters, error states, or edge cases. Register multiple endpoint variants — with different filters, different IDs — to get complete schema coverage.

Mistake 2: Ignoring nullable changes

A field changing from required to nullable (or vice versa) looks like a minor change. In practice, nullable fields that weren't nullable before mean your aggregations suddenly include nulls, which changes results silently.

-- Check for nullability surprises
-- (assumes the source is unpivoted into (field_name, field_value) rows):
SELECT
  field_name,
  COUNT(*) AS total_rows,
  COUNT(field_value) AS non_null_rows,
  ROUND(COUNT(field_value) * 100.0 / COUNT(*), 2) AS pct_non_null
FROM your_api_source
GROUP BY field_name
ORDER BY pct_non_null ASC

Mistake 3: Only monitoring production APIs

Staging and development APIs often receive schema changes before production. Monitor them too — catching a change in staging gives you a warning before it hits production.

Mistake 4: Not documenting why a schema changed

When Harbinger Explorer detects a change, immediately add a note in your team's documentation about what changed and why. This turns a potential crisis into a managed process.

Feature Comparison

Capability                      | Postman        | JSON Schema  | Harbinger Explorer
Auto-infer schema from live API | ❌             | ❌           | ✅
Scheduled recrawl with diff     | Via CI/CD only | ❌           | ✅ (Pro plan)
SQL queries across API data     | ❌             | ❌           | ✅
PII detection in responses      | ❌             | ❌           | ✅
Alert on type changes           | Manual setup   | Manual setup | ✅ Automatic
Multi-source joins              | ❌             | ❌           | ✅

FAQ

Does Harbinger Explorer support authenticated APIs? Yes. You can provide Authorization headers, API keys, and other authentication parameters when registering a source. Credentials are stored securely and used on every recrawl.

How often does recrawling happen? On the Pro plan, you can configure recrawl frequency. The system supports daily recrawls with automatic schema diff detection.

What happens when a breaking change is detected? Harbinger Explorer flags the change in your source dashboard and logs the old and new schema side by side. Your existing queries continue to run against the last-known-good data until you explicitly update the source.

Can I integrate alerts with Slack or email? Schema change alerts can be reviewed directly in the Harbinger Explorer dashboard. Webhook and notification integrations are on the roadmap.

Real-World Case Study: SaaS Analytics Team and the Silent CRM Schema Change

A B2B SaaS company's analytics team was pulling deal pipeline data from their CRM via API every night. The pipeline loaded the data into a reporting database, and the head of sales reviewed the pipeline dashboard every morning.

One Tuesday, the CRM vendor pushed a schema update as part of a larger product release. Two fields changed:

  • deal_stage changed from a string like "Proposal Sent" to a numeric stage code like 3
  • owner_id was renamed to assigned_rep_id

The pipeline didn't crash. It continued running every night. The deal_stage field loaded as numbers, and the sales dashboard (which did string comparisons like WHERE deal_stage = 'Proposal Sent') returned zero rows for those filters. The assigned_rep_id field was absent from the load — the pipeline's column mapping still referenced the old name — so rep-level attribution silently went to null.

For nine days, the sales team's dashboard showed misleading numbers: pipeline stage distributions that appeared empty, and all deals showing as "unassigned." Nobody flagged it as a data problem — they assumed it was a slow pipeline period.

The damage: a commission reconciliation that had to be manually reconstructed for nine days of data, and a Q3 close call where the VP of Sales almost approved a headcount freeze based on pipeline data that showed 30% fewer qualified deals than actually existed.

Had Harbinger Explorer been monitoring this API, the schema diff on the night of the vendor release would have shown:

Schema change detected: crm_deals_api
Crawl: 2025-09-14 02:31:07 UTC

CHANGED FIELDS:
  deal_stage: string → integer
  owner_id: REMOVED
  
NEW FIELDS:
  assigned_rep_id: string
  stage_code: integer
  stage_label: string

Alert sent. Previous schema version preserved.

The on-call analyst would have seen this alert before the sales team opened their dashboards. The pipeline mapping would have been updated the same morning. Zero days of bad data.

The lesson: API schema validation isn't about catching malicious changes. Most schema changes are intentional improvements from the provider's side. The problem is the communication gap — providers change things on their timeline, not yours. Automated schema monitoring closes that gap before it costs you anything. Every day without monitoring is a day where a schema change could be silently corrupting data. The question isn't whether your APIs will change — they will. The question is whether you'll find out immediately or in two weeks when a report looks wrong and you can't explain why.

-- Query to inspect a detected schema change in Harbinger Explorer:
SELECT
  field_name,
  previous_type,
  current_type,
  change_type,
  detected_at
FROM schema_change_log
WHERE source_name = 'crm_deals_api'
  AND detected_at >= CURRENT_DATE - INTERVAL '7 days'
ORDER BY detected_at DESC

Conclusion

API schemas change constantly, and most teams only find out when something breaks. By the time the alert fires, you may have days or weeks of corrupted data in your systems. An API schema validation tool that monitors continuously, diffs automatically, and alerts immediately transforms a recurring crisis into a managed workflow.

Harbinger Explorer registers your API endpoints, infers schemas automatically, and flags changes on every recrawl — without manual schema files, without homegrown monitoring scripts, and without waiting for something to break before you notice.


Ready to skip the setup and start exploring? Try Harbinger Explorer free →




Try Harbinger Explorer for free

Connect any API, upload files, and explore with AI — all in your browser. No credit card required.

Start Free Trial
