API Data Quality Check Tool: Automatic Profiling for Every Response

12 min read · Tags: api data quality check tool, data quality, api profiling, schema validation, data reliability, pii detection


You built the dashboard. The numbers looked right in testing. The stakeholders signed off. Then, six weeks into production, someone noticed the revenue figure was 40% lower than it should have been. Turned out the API had started returning nulls for a key billing field three weeks earlier — silently, with no error, no warning, and no indication that anything had changed.

You trusted the data. The data had quietly broken.

This is the core problem with API data quality: you can't see the failures until the damage is downstream. You need a tool that profiles every API response automatically — before the data reaches your dashboards, your models, or your decisions.

Harbinger Explorer is that tool. It runs data quality checks on every API response automatically, with zero configuration required.


Why API Data Quality Is Different From Database Quality

If you're used to data quality tooling for databases or data warehouses, API data quality will feel familiar in some ways and completely different in others.

You don't control the source. With a database, you own the schema, the write process, and the constraints. With an external API, you're at the mercy of the provider. Fields appear and disappear. Types change silently. The documentation lags the actual behaviour by months. When the API breaks, it doesn't break loudly — it drifts.

Responses are unpredictable at field level. A database row has a defined schema: every row has the same columns with the same types. An API response can be wildly inconsistent. The same endpoint might return a string in one response and an integer in the next. Nested objects might be present in some responses and absent in others. Null handling varies by endpoint, by version, and sometimes by record.

There's no schema contract you can enforce. OpenAPI specifications exist for some APIs, but they describe intent, not reality. The actual response structure can diverge significantly from the spec, especially in older or less-maintained APIs. You find this out the hard way.

Volume makes manual checking impossible. If you're hitting an API 10,000 times a day across 15 endpoints, manually reviewing response quality is not feasible. You need automated profiling that runs continuously — not a one-off check someone runs when they have time.

Quality issues compound downstream. A null value in an API response might be fine to ignore. Or it might feed a broken calculation that propagates through your pipeline, corrupts an aggregation, and surfaces as a wrong number in a report three layers downstream. Without upstream quality checks, you're debugging in the dark.


The Limits of Existing Quality Approaches

Teams handle API data quality in different ways, and most of them are reactive, manual, or both.

Assertions in transformation pipelines (dbt tests, Great Expectations) are the most principled approach available. Write tests that validate data after transformation — check that revenue is never negative, that user IDs are always non-null, that dates fall within expected ranges. This is good practice, but it has a critical limitation: it runs after the data has already entered your system. You're catching problems at the output, not at the source. And writing meaningful assertions requires someone to first understand the expected data distribution — which requires prior analysis.

Monitoring dashboards surface quality issues after they've affected metrics. When a number looks wrong, someone investigates. This is better than nothing, but it's not quality checking — it's quality incident response. The damage is done before the check runs.

Manual API testing with Postman or similar tools lets you inspect individual responses. It's useful for development and debugging, but it's not scalable quality monitoring. You'd have to manually run each endpoint, inspect each response, and compare to expected patterns — and do it continuously. That's not a workflow; it's a nightmare.

Custom Python monitoring scripts can automate response checking, but they require significant development and maintenance effort. Writing a comprehensive quality check script for every API your organisation uses takes weeks. Keeping it updated as APIs change takes ongoing engineering time. This is a significant investment for what should be table-stakes functionality.
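
To make the maintenance burden concrete, here is a minimal sketch of the kind of hand-rolled monitoring script this paragraph describes. The endpoint URL, field names, and rules are illustrative, not from any real API — and real scripts grow one rule at a time, each of which must be updated whenever the API changes.

```python
import json
import urllib.request

def fetch(url):
    """Network call — in a real script, wrapped in retries and scheduling."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def run_checks(records):
    """Return human-readable failures for a batch of API records.

    Each rule here is one the team had to think of, write, and maintain.
    """
    failures = []
    for i, rec in enumerate(records):
        if rec.get("id") is None:  # hypothetical required field
            failures.append(f"record {i}: missing id")
        amount = rec.get("amount")  # hypothetical numeric field
        if isinstance(amount, (int, float)) and amount < 0:
            failures.append(f"record {i}: negative amount {amount}")
    return failures
```

Multiply this by every endpoint and every field your organisation depends on, and the weeks-of-development estimate above starts to look conservative.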

What's missing is a tool that profiles API responses automatically — capturing null rates, type distributions, range statistics, uniqueness, and schema consistency — without any upfront configuration or ongoing maintenance.


Automatic Quality Profiling, Zero Configuration

What if your API data quality check ran every time you crawled a source — automatically, without writing a single assertion or test script?

Imagine connecting an API and immediately seeing: which fields have null values, what the data type distribution looks like, which fields contain potential PII, what the min/max/average values are for numeric fields, and how the schema compares to the last crawl. No setup. No configuration. Just data, profiled.

That's what Harbinger Explorer's automatic profiling delivers. When the AI Crawler maps an API, it doesn't just record field names and types — it profiles the actual data. Sample values are analysed to produce quality metrics that give you immediate, actionable visibility into the health of every response.

Null rate detection flags fields with high proportions of null or missing values. A field that's null in 80% of responses might be expected behaviour — or it might be a broken upstream calculation. Either way, you know. Before you build anything on top of that field, you know its reliability profile.
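
The metric itself is simple to state. As a plain-Python sketch (not Harbinger Explorer's implementation), a null rate treats both explicit nulls and absent keys as missing:

```python
def null_rates(records):
    """Fraction of records in which each observed field is null or absent."""
    fields = {key for rec in records for key in rec}
    total = len(records)
    return {
        field: sum(1 for rec in records if rec.get(field) is None) / total
        for field in fields
    }
```

Counting absent keys as null matters for API data, where a field can simply stop appearing rather than start returning an explicit null.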

Type consistency checking identifies fields where the type varies across responses. A field that should be a number occasionally returning a string is a classic API quirk that causes downstream failures at the worst possible time. Automatic type profiling surfaces this pattern immediately.
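
A rough sketch of what a type-consistency check looks like in plain Python (field names illustrative; this is the idea, not the product's internals):

```python
from collections import Counter

def type_profile(records):
    """Map each field to a Counter of the Python types seen for it."""
    profile = {}
    for rec in records:
        for field, value in rec.items():
            profile.setdefault(field, Counter())[type(value).__name__] += 1
    return profile

def mixed_type_fields(records):
    """Fields whose non-null values span more than one type."""
    return {
        field: dict(counts)
        for field, counts in type_profile(records).items()
        if len([t for t in counts if t != "NoneType"]) > 1
    }
```

A field reporting `{"int": 9500, "str": 3}` is exactly the "occasionally a string" quirk described above — rare enough to pass testing, common enough to break production.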

Schema change detection compares the current crawl against the previous one and shows you exactly what changed. New fields, removed fields, type changes, renamed keys — the diff is shown clearly, so schema evolution is visible rather than silent.
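
Conceptually, the diff is a comparison of two field-to-type mappings. A minimal illustration in Python (schemas here are simplified to flat dicts, which real nested API schemas are not):

```python
def schema_diff(old, new):
    """Compare two {field: type} schemas from consecutive crawls."""
    shared = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "type_changed": {f: (old[f], new[f]) for f in shared if old[f] != new[f]},
    }
```

The value of running this automatically on every recrawl is that "removed" and "type_changed" are surfaced the day they happen, not the day a dashboard breaks.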

PII detection is built into the profiling layer. Fields containing personal data — names, email addresses, phone numbers, national IDs — are flagged automatically. This isn't a separate governance module; it's part of every crawl.
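
To illustrate the shape of pattern-based PII flagging (a deliberately simplified sketch — production detection covers far more identifier types and locales than these two regexes):

```python
import re

# Illustrative patterns only; real PII detection needs many more.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def pii_flags(records):
    """Map each field to the set of PII categories its string values match."""
    flagged = {}
    for rec in records:
        for field, value in rec.items():
            if not isinstance(value, str):
                continue
            for label, pattern in PII_PATTERNS.items():
                if pattern.search(value):
                    flagged.setdefault(field, set()).add(label)
    return flagged
```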

Range and distribution analysis gives context to numeric fields. Knowing that a revenue field ranges from €0 to €1.2M tells you something about what constitutes an anomaly. Automatic range profiling on crawl gives you a baseline — useful both for understanding the data and for writing targeted quality checks later.
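
As a sketch of the underlying computation, range profiling of a numeric field reduces to summary statistics over non-null values (plain Python, field name illustrative):

```python
import statistics

def numeric_profile(records, field):
    """Min/max/mean for a numeric field, ignoring nulls and non-numeric values."""
    values = [
        rec[field] for rec in records
        if isinstance(rec.get(field), (int, float))
        and not isinstance(rec.get(field), bool)  # bool is a subclass of int
    ]
    if not values:
        return None
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
    }
```

With a baseline like this in hand, "revenue of €40M in one record" stops being a surprise buried in a report and becomes an anomaly you can check against the profile.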


How Quality Checks Work in Harbinger Explorer

Step 1: Connect your API. From the Sources panel, add any REST API with its authentication credentials. Harbinger Explorer supports API keys, bearer tokens, and OAuth out of the box.

Step 2: Crawl with automatic profiling. Click crawl. The AI Crawler maps endpoints, samples responses, and automatically runs the quality profiling suite. Within minutes, you have a complete quality report for every field in every endpoint.

Step 3: Review the quality dashboard. The quality overview shows field-level metrics across all endpoints: null rates, type distributions, PII flags, schema change summaries. High-risk fields are surfaced prominently — you don't have to go hunting for problems.

Step 4: Investigate with DuckDB SQL. Use the built-in SQL editor to dig deeper into any quality concern. Query sample data, calculate custom statistics, cross-reference fields across endpoints. If you need to understand why a field has a high null rate, the query layer gives you the tools to investigate.

Step 5: Set up recrawling for ongoing monitoring. On Pro plans, schedule automatic recrawls on a daily or weekly cadence. Schema changes and quality regressions are flagged as diffs — so you're alerted when an API silently changes, not three weeks later when a dashboard starts showing wrong numbers.


Try it yourself: start exploring for free. No credit card. 8 demo data sources ready to query.


Advanced Quality Features

Beyond baseline profiling, Harbinger Explorer offers depth for teams with more demanding quality requirements.

Cross-source consistency checks. When the same conceptual data appears in multiple APIs — say, customer IDs in both your CRM API and your billing API — you can use DuckDB SQL JOINs to check consistency across sources. Do the same customer IDs appear in both? Do their associated values agree? Cross-source quality is one of the hardest problems in data engineering; Harbinger Explorer makes it a SQL query.
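
The logic of that JOIN can be sketched outside SQL as well. A minimal Python illustration of cross-source key consistency (field name and sources are hypothetical):

```python
def id_consistency(source_a, source_b, key="customer_id"):
    """Compare key sets from two sources — the set form of the JOIN check."""
    ids_a = {rec[key] for rec in source_a if rec.get(key) is not None}
    ids_b = {rec[key] for rec in source_b if rec.get(key) is not None}
    union = ids_a | ids_b
    return {
        "only_in_a": ids_a - ids_b,
        "only_in_b": ids_b - ids_a,
        "overlap": len(ids_a & ids_b) / len(union) if union else 1.0,
    }
```

Customer IDs that exist in the CRM but not in billing (or vice versa) are exactly the kind of cross-source inconsistency that is invisible when each API is checked in isolation.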

Governance and lineage. Mark fields as quality-verified, flag fields with known issues, and document expected behaviour in Column Mapping. This governance layer turns individual quality findings into institutional knowledge — future team members benefit from every quality investigation that came before.

Alerting on schema change. On Pro plans, schema change detection sends alerts when a crawled API changes structure. You define which changes matter: a new optional field might be fine to ignore, but a removed required field is critical. Configure the alerting threshold to match your risk tolerance.

Historical quality trending. As recrawls accumulate, Harbinger Explorer builds a history of quality metrics for each source. You can see null rates over time, track schema stability, and spot gradual data quality degradation before it becomes a crisis.
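
As an illustration of what "spotting gradual degradation" means mechanically, here is a sketch that flags fields whose null rate rose beyond a threshold between the oldest and newest crawl in a history (a deliberately simple trend check, not the product's actual trending logic):

```python
def degraded_fields(history, threshold=0.1):
    """history: list of {field: null_rate} dicts, oldest crawl first.

    Flag fields whose null rate rose by more than `threshold`
    between the first and the most recent crawl.
    """
    first, last = history[0], history[-1]
    return {
        field: (first.get(field, 0.0), last[field])
        for field in last
        if last[field] - first.get(field, 0.0) > threshold
    }
```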

Export for downstream testing. Quality profiles generated by Harbinger Explorer can inform the assertions you write in dbt or Great Expectations. Instead of guessing what normal looks like, start from the actual distribution data captured during profiling.


Comparison: Manual Quality vs. Harbinger Explorer

| Quality check | Manual approach | Harbinger Explorer |
| --- | --- | --- |
| Null rate detection | Write custom assertion per field | Automatic on every crawl |
| Type consistency | Manual inspection or script | Automatic profiling |
| Schema change detection | Spot-check or pipeline failure | Automatic diff on recrawl |
| PII detection | Separate audit process | Built-in, runs on crawl |
| Range / distribution stats | Custom pandas profiling script | Automatic, no config needed |
| Time to first quality report | Hours to days | Minutes |
| Ongoing maintenance | High (scripts need updating) | None (recrawl handles it) |
| Cost | Engineering time + tooling | From €8/month |

Pricing: Starter at €8/month (25 chats/day, 10 crawls/month) or Pro at €24/month (200 chats/day, 100 crawls/month, recrawling, priority support). See pricing →

Free 7-day trial, no credit card required. Start free →


Frequently Asked Questions

Does it work with private or internal APIs? Yes. Harbinger Explorer supports authenticated REST APIs with API keys, bearer tokens, and OAuth. Internal APIs are supported — contact support for private network or VPN-based configurations.

What does "automatic profiling" actually cover? Every crawl profiles null rates, inferred type distributions, sample value ranges for numeric fields, uniqueness indicators for potential key fields, PII detection, and schema comparison against the previous crawl. This runs without any configuration — you get the report as part of the crawl result.

Can I write custom data quality rules on top of the automatic profiling? The DuckDB SQL editor lets you write custom quality checks as SQL queries — for example, "show me all records where revenue is null but order_status is 'completed'". You can save these as named views for repeated use. For more formal test frameworks, the profiling data can inform assertions you write in dbt or Great Expectations.

How does schema change detection work? Each crawl captures the full field schema for every endpoint. On recrawl, the new schema is diffed against the previous one. Added fields, removed fields, and type changes are highlighted in the quality report. On Pro plans, schema changes can trigger email alerts.

Is this a replacement for dbt tests or Great Expectations? No — it's complementary. Harbinger Explorer provides upstream source profiling so you understand what you're working with before you build transformations. dbt tests and Great Expectations validate transformed data. Both have a role; Harbinger Explorer fills the gap at the source layer.


Know What You're Working With Before You Build On It

The cost of bad data quality isn't just wrong numbers — it's wrong decisions, missed opportunities, damaged credibility, and hours of debugging to find a root cause that was detectable at the source.

Harbinger Explorer makes API data quality a default, not an afterthought. Every time you connect a new source, you get a full quality profile automatically. Every time the API changes, you see the diff. PII is flagged before it flows downstream. Schema mutations are visible before they break your pipeline.

Stop trusting data you haven't checked. Start from €8/month with a 7-day free trial — no credit card required.


Ready to know your data before you use it? Try Harbinger Explorer free →


