Quick API Data Quality Checks Without Writing Python Scripts
Here's an uncomfortable truth about API data: it's rarely clean.
APIs return nulls where you expect values. They return strings where you expect numbers. They silently change field names between versions. They return stale data without telling you. They paginate inconsistently. They include test records in production responses.
Every analyst who works with API data has a story about a report that went wrong because the source data was garbage. A metric that was off because of duplicate records. A trend line that spiked because of a timezone issue in the timestamps.
The standard solution is to write a data quality script in Python. Install pandas. Write a DataFrame profiler. Check nulls, dtypes, duplicates, value ranges. Output a report. Set it up to run before your analysis.
It's good practice. It's also a significant time investment — especially if you're not a Python developer, or if you need to check a new API quickly without setting up a project.
There's a faster path.
What "Data Quality" Actually Means for API Data
Before we talk tools, let's be clear about what we're checking:
1. Completeness
Are the fields I expect actually populated? A user_id column shouldn't be 30% null. A revenue field shouldn't have blanks.
2. Uniqueness
Does the API return duplicates? Pagination bugs, caching issues, and API version differences can cause the same record to appear multiple times.
3. Validity
Are the values in a sensible range? Negative prices. Future timestamps in historical data. Age fields with values of 0 or 999. These are validity failures.
4. Consistency
Does the data agree with itself? start_date should always precede end_date, country_code should match country_name, and totals should equal the sum of line items.
5. Freshness
Is the data up to date? If the API claims to update daily but the latest timestamp is from last week, that's a freshness failure.
6. Schema Drift
Has the API silently changed? New fields, renamed fields, changed data types — these break downstream analysis silently and painfully.
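Of the six, schema drift is the easiest to miss by eye, because nothing in the data itself looks wrong. A minimal sketch of how you might compare two crawls in plain Python (the field names here are hypothetical, and this is a simplification, not how any particular tool implements it):

```python
def schema_of(records):
    """Map each field name to the set of value types seen across records."""
    schema = {}
    for rec in records:
        for field, value in rec.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema

def diff_schemas(old, new):
    """Report added fields, removed fields, and fields whose types changed."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "type_changed": sorted(
            f for f in set(old) & set(new) if old[f] != new[f]
        ),
    }

last_week = schema_of([{"id": 1, "revenue": 100}])
today = schema_of([{"id": 1, "revenue": "100", "region": "EU"}])
print(diff_schemas(last_week, today))
# {'added': ['region'], 'removed': [], 'type_changed': ['revenue']}
```

Here the revenue field silently became a string and a new region field appeared: exactly the kind of change that breaks downstream analysis without raising an error.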
The Old Way: Python Quality Checks
Here's what a thorough data quality check on an API response looks like in the traditional workflow:
Step 1 — Set up the environment: Create a virtual environment. Install requests, pandas, numpy, maybe great_expectations or ydata-profiling.
Step 2 — Fetch the data: Write the API call. Handle authentication. Handle pagination. Handle rate limits. Flatten the JSON.
Step 3 — Load into pandas: df = pd.DataFrame(data). Deal with nested columns. Cast types.
Step 4 — Write the checks:
- df.isnull().sum() — null counts per column
- df.duplicated().sum() — duplicate rows
- df.describe() — stats per column
- Manual range checks for each critical field
- Custom consistency checks
- Timestamp max for freshness
Step 5 — Interpret the results: 200 lines of output. Decide what's a problem and what isn't.
Step 6 — Document for next time: Hope you remember what you checked and why.
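For a sense of scale, here is roughly what steps 3 and 4 condense to once the data is already fetched: a stdlib-only sketch rather than the pandas version, over a list of dict records with illustrative field names:

```python
from collections import Counter

def quality_report(records, id_field="id", ts_field="created_at"):
    """Null counts per field, duplicate IDs, and freshness for dict records."""
    fields = {f for rec in records for f in rec}
    nulls = {f: sum(1 for r in records if r.get(f) is None) for f in fields}
    id_counts = Counter(r.get(id_field) for r in records)
    dupes = [i for i, n in id_counts.items() if n > 1]
    latest = max(r[ts_field] for r in records if r.get(ts_field))
    return {"rows": len(records), "nulls": nulls,
            "duplicate_ids": dupes, "latest": latest}

data = [
    {"id": 1, "revenue": 120.0, "created_at": "2025-01-06"},
    {"id": 2, "revenue": None, "created_at": "2025-01-07"},
    {"id": 2, "revenue": 95.5, "created_at": "2025-01-07"},
]
report = quality_report(data)
print(report["nulls"]["revenue"])  # 1
print(report["duplicate_ids"])     # [2]
print(report["latest"])            # 2025-01-07
```

And that's the easy part: the fetching, pagination, authentication, and JSON flattening in step 2 usually take longer than the checks themselves.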
This is a 2–4 hour project the first time. Even with experience, it's 30–60 minutes per new API source. And it requires Python fluency throughout.
The New Way: Harbinger Explorer
Harbinger Explorer collapses this workflow to minutes — for people who know what questions to ask, but don't want to write a program to ask them.
Here's the workflow:
Step 1: Crawl Your API (2 minutes)
Add your API endpoint to Harbinger Explorer's Source Catalog. Authenticate once. Run the crawl. The data is loaded into DuckDB WASM — in your browser, instantly queryable.
Step 2: Ask Quality Questions in Plain English or SQL
The beauty of having your API data in a SQL engine is that every data quality check is a query. And with Harbinger's AI agent chat, you can ask in plain English:
Completeness checks:
- "How many rows have a null value in the user_id column?"
- "What percentage of records are missing revenue data?"
- "Show me all columns and their null counts."
Uniqueness checks:
- "Are there any duplicate record IDs?"
- "Show me any rows where the same email appears more than once."
Validity checks:
- "Are there any negative values in the price column?"
- "Show me records where the end_date is before the start_date."
- "What's the min and max value of the age field?"
Freshness checks:
- "What's the most recent timestamp in this dataset?"
- "How many records were created in the last 7 days?"
- "Show me the distribution of records by date."
Schema drift detection:
- Compare current schema to a previous crawl
- "What new columns appeared since last week?"
- "Has the data type of the revenue column changed?"
All of these translate to DuckDB SQL under the hood — fast, accurate, and reproducible.
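You can see the shape of those generated queries by running the same three checks against SQLite from Python. The table and column names below are made up for illustration, and these particular queries would look much the same in DuckDB:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER, email TEXT,
                        revenue REAL, created_at TEXT);
    INSERT INTO users VALUES
        (1, 'a@x.com', 120.0, '2025-01-06'),
        (2, 'b@x.com', NULL,  '2025-01-07'),
        (2, 'b@x.com', 95.5,  '2025-01-07');
""")

# Completeness: how many rows have a null revenue?
null_revenue = conn.execute(
    "SELECT COUNT(*) FROM users WHERE revenue IS NULL").fetchone()[0]

# Uniqueness: which user_ids appear more than once?
dupes = conn.execute("""
    SELECT user_id, COUNT(*) FROM users
    GROUP BY user_id HAVING COUNT(*) > 1
""").fetchall()

# Freshness: most recent timestamp in the dataset
latest = conn.execute("SELECT MAX(created_at) FROM users").fetchone()[0]

print(null_revenue, dupes, latest)  # 1 [(2, 2)] 2025-01-07
```

The point is not that you need to write this SQL yourself; it's that every plain-English question above has a short, deterministic query behind it, which is why the answers are reproducible.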
Example: Validating an External Data Feed
You're an analyst receiving daily data from a third-party market intelligence API. You've been burned before — a field that went null for a week without warning, causing incorrect calculations in your weekly report.
Your new quality gate with Harbinger Explorer:
Monday, 8:50 AM — Open Harbinger Explorer. Re-crawl the API. 60 seconds.
8:52 AM — Ask: "How many rows are in today's data? Is that roughly the same as last week?"
8:53 AM — Ask: "Are there any nulls in the signal_score or market_region columns?"
8:54 AM — Ask: "What's the most recent update timestamp in the data?"
8:55 AM — Ask: "Are there any duplicate record IDs?"
8:57 AM — Green light. Data is clean. Start your analysis.
Total quality check time: 7 minutes. No scripts. No environment setup. No Python.
Compare that to the old way, and you've saved 40+ minutes every single day — while actually being more thorough because you're checking the right things for your specific use case.
The Most Important Quality Checks for API Data
Here are the checks that catch 80% of API data problems, phrased as questions you can ask directly in Harbinger Explorer:
| Check | Question to Ask |
|---|---|
| Row count sanity | "How many rows did we get? Is that normal?" |
| Null completeness | "Show me null counts for each column" |
| Duplicate detection | "Are there any duplicate IDs?" |
| Value range validation | "What's the min and max of [critical numeric field]?" |
| Timestamp freshness | "What's the most recent date in the dataset?" |
| Category distribution | "Show me unique values in [category field] with counts" |
| Cross-field consistency | "Show me rows where end_date < start_date" |
| Outlier detection | "Show me records where [metric] is more than 3x the average" |
These eight checks will catch the vast majority of data quality issues before they reach your reports.
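Two of the trickier rows in that table, cross-field consistency and the 3x-average outlier rule, look like this as queries. Again the table and column names are invented, run here against SQLite; the SQL a tool generates for DuckDB would be similar in shape:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, amount REAL,
                         start_date TEXT, end_date TEXT);
    INSERT INTO orders VALUES
        (1, 100.0, '2025-01-01', '2025-01-05'),
        (2,  90.0, '2025-01-10', '2025-01-03'),  -- end before start
        (3, 110.0, '2025-01-02', '2025-01-04'),
        (4,  95.0, '2025-01-03', '2025-01-06'),
        (5, 900.0, '2025-01-02', '2025-01-04');  -- large outlier
""")

# Cross-field consistency: end_date should never precede start_date
bad_dates = conn.execute(
    "SELECT order_id FROM orders WHERE end_date < start_date").fetchall()

# Outlier detection: amounts more than 3x the average
outliers = conn.execute("""
    SELECT order_id, amount FROM orders
    WHERE amount > 3 * (SELECT AVG(amount) FROM orders)
""").fetchall()

print(bad_dates)  # [(2,)]
print(outliers)   # [(5, 900.0)]
```

One caveat worth knowing: a big enough outlier inflates the average it's compared against, so the 3x rule is a coarse first filter, not a statistical test.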
Who Needs This Most
Freelance Data Consultants
When you're delivering analysis to clients, data quality is your responsibility — even when the source is a third-party API you don't control. A quick quality gate before every deliverable protects your reputation. With Harbinger Explorer, it's a 10-minute habit, not a half-day project.
Internal Analysts at Fast-Moving Companies
Your data team is heads-down on roadmap. You can't create a ticket every time you want a new API validated. Harbinger gives you the autonomy to run your own checks.
Researchers Working with Public APIs
Academic and public datasets are notoriously inconsistent. APIs revise historical data, update field definitions, and change response formats without announcement. Regular quality checks catch these changes before they corrupt your research.
Bootcamp Graduates Entering Analyst Roles
You know what data quality means, but you haven't built the Python tooling yet. Harbinger Explorer gives you the outcome — validated, understood data — while you develop your scripting skills.
Competitor Comparison
| Tool | For Non-Devs | API Crawling | NL Queries | Data Quality Checks | Price |
|---|---|---|---|---|---|
| pandas profiling | ❌ Python required | ❌ | ❌ | ✅ Auto-profile | Free |
| Great Expectations | ❌ Engineering heavy | ❌ | ❌ | ✅ Test suite | Open source |
| Ataccama | ✅ UI-based | ❌ | ❌ | ✅ Full platform | Enterprise $$$ |
| Metabase | ✅ | ❌ (needs DB) | ⚠️ | ⚠️ | $500+/mo |
| Harbinger Explorer | ✅ | ✅ | ✅ | ✅ Via SQL/NL | €8/mo |
Harbinger is the only tool that combines API access, browser-based SQL, and natural language queries at a price accessible to freelancers and small teams.
What Harbinger Explorer Doesn't Do (Be Honest)
It's worth being transparent about limitations:
- No automated alerting — Harbinger doesn't send you a notification if today's crawl has more nulls than yesterday. You run the checks manually. Think of it as a tool you use, not a sentinel that watches for you.
- No persistent data storage — Data lives in your browser session. You're not building a long-term quality history unless you export your results.
- Not a replacement for a full data quality platform — If your organization needs automated, continuous, enterprise-grade data contracts across 50 sources, that's a different tool category. Harbinger is for individuals and small teams who need fast, ad-hoc validation.
For what it is designed to do — fast, browser-based quality checks on API data for non-engineers — it's uniquely positioned.
Time Savings: By the Numbers
| Task | Python Script | Harbinger Explorer |
|---|---|---|
| Environment setup | 15–30 min | 0 |
| Fetch and flatten API data | 30–60 min | 2 min |
| Write null checks | 10 min | 30 sec |
| Write duplicate checks | 10 min | 30 sec |
| Write freshness checks | 10 min | 30 sec |
| Write consistency checks | 15–20 min | 1 min |
| Interpret and document results | 15–30 min | 5 min |
| Total first-time quality check | 1.5–3 hours | ~10 minutes |
| Total repeat check (same source) | 20–30 min | 5–7 min |
For a consultant running quality checks on 3–4 API sources per week, that's 4–6 hours saved every week. That's time that goes back into actual analysis, client communication, and deliverables.
Getting Started
Data quality checks don't need to be a project. They can be a habit.
- Visit harbingerexplorer.com
- Start your 7-day free trial
- Crawl your most important API source
- Ask: "Are there any nulls in the columns I care about?"
- Ask: "Are there any duplicates?"
- Ask: "How fresh is this data?"
You'll have a quality assessment in under 10 minutes — and a new habit that will save your analysis from bad data.
Pricing
| Plan | Price | Best For |
|---|---|---|
| Starter | €8/month | Freelancers, solo analysts, researchers |
| Pro | €24/month | Power users with multiple API sources |
| Free Trial | 7 days | Validate your most important source today |
Bad data is silent. It produces reports that look right, decisions that feel informed, and insights that are subtly wrong. The only defence is asking the right questions before you trust the numbers.
With Harbinger Explorer, those questions take 10 minutes, not half a day.
Continue Reading
Search and Discover API Documentation Efficiently: Stop Losing Hours in the Docs
API documentation is the final boss of data work. Learn how to find what you need faster, stop getting lost in sprawling docs sites, and discover APIs you didn't know existed.
Automatically Discover API Endpoints from Documentation — No More Manual Guesswork
Reading API docs to manually map out endpoints is slow, error-prone, and tedious. Harbinger Explorer's AI agent does it for you — extracting endpoints, parameters, and auth requirements automatically.
Track API Rate Limits Without Writing Custom Scripts
API rate limits are silent project killers. Learn how to monitor them proactively — without building a custom monitoring pipeline — and stop losing hours to 429 errors.
Try Harbinger Explorer for free
Connect any API, upload files, and explore with AI — all in your browser. No credit card required.
Start Free Trial