Data Quality Improvement & Governance

Reduced recurring data defects and improved reporting trust by implementing rule-based monitoring, clear ownership, and a source-first remediation workflow.
Stack: Oracle ADW • PowerCenter • Python • Azure DevOps

Impact

  • Reduced time-to-fix in the source system from weeks to an hour by establishing a governed triage process and clear ownership.
  • Reduced defects reaching downstream applications by over 95% through automated checks and exception handling.
  • Improved reporting accuracy by standardizing definitions and preventing “patch fixes” in reporting layers.

Context

Our analytics ecosystem relied on accurate HR/employee master data flowing through ETL (PowerCenter) into Oracle ADW, which fed downstream reports and applications. Data quality issues created rework, delayed reporting cycles, and reduced stakeholder confidence. The team needed a practical governance model that improved outcomes quickly—without slowing delivery.

Problem

Data defects were arriving late (often discovered in reports), requiring manual investigation and downstream workarounds. This created three recurring failure modes:

  • Slow remediation, because ownership and the “correct place to fix” weren’t consistently defined or enforced
  • Downstream defect leakage—bad records passed through to reporting and other applications
  • Accuracy drift—inconsistent or incorrect fields undermined trust in metrics and dashboards

Goals (first 90 days)

  • Reduce time to fix data in the source system (not just patch it downstream)
  • Reduce downstream defect leakage into reporting and consuming applications
  • Increase reporting accuracy through consistent definitions and automated quality controls

Top recurring issues reduced

  • Missing essential employee data (e.g., position, cost center)
  • Company codes not matching translations/descriptions (mapping issues)
  • Inaccurate employee attributes (e.g., wrong salary or job title)
  • Invalid effective-date / status combinations (e.g., active workers with termination dates)
  • Duplicate or conflicting records (e.g., multiple active assignments)

Define: Critical Data Elements

I identified the critical data elements (CDEs) that mattered most for downstream accuracy (position, cost center, company code, job title, and compensation fields) and documented business definitions and acceptable values for each.
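For illustration, a CDE catalog like this can keep each field’s definition and acceptable-value rule in one version-controlled place; the field names and rules below are hypothetical stand-ins, not the project’s actual artifact.

```python
# Hypothetical sketch of a CDE catalog. Field names, definitions, and
# acceptable-value rules are illustrative, not the project's actual values.
CDE_CATALOG = {
    "POSITION": {
        "definition": "Employee's assigned position code from the HR source system.",
        "required_for": "active employees",
        "acceptable": "non-null code present in the position master",
    },
    "COST_CENTER": {
        "definition": "Cost center charged for the employee's compensation.",
        "required_for": "active employees",
        "acceptable": "non-null code that maps to the finance cost-center list",
    },
    "COMPANY_CODE": {
        "definition": "Legal-entity/company code for the assignment.",
        "required_for": "all employees",
        "acceptable": "must resolve to exactly one company description",
    },
    "BASE_SALARY": {
        "definition": "Annualized base salary in the assignment currency.",
        "required_for": "active employees",
        "acceptable": "non-negative and within the band for the job title",
    },
}
```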

Measure: Rule-based data quality checks

I implemented a set of quality rules to detect issues early in the pipeline:

  • Uniqueness: one active assignment per employee (where applicable)
  • Completeness: position and cost center must be populated for active employees
  • Validity: salary within the expected range, in the correct currency, and non-negative
  • Consistency: company codes must map cleanly to company descriptions

Implementation: Python and SQL checks run in and against Oracle ADW (a sketch follows below).
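As a minimal sketch of how such rules can be expressed as SQL executed from Python against Oracle ADW with the python-oracledb driver: the table and column names (employee_master, company_map, and so on) are hypothetical placeholders rather than the project’s actual schema, and connection details are read from environment variables.

```python
# Minimal sketch: rule-based quality checks as SQL run from Python against
# Oracle ADW. Table/column names are hypothetical placeholders.
import os
import oracledb

RULES = {
    # Uniqueness: at most one active assignment per employee
    "uniqueness_active_assignment": """
        SELECT COUNT(*) FROM (
            SELECT employee_id FROM employee_master
            WHERE status = 'ACTIVE'
            GROUP BY employee_id
            HAVING COUNT(*) > 1
        )""",
    # Completeness: position and cost center populated for active employees
    "completeness_position_cost_center": """
        SELECT COUNT(*) FROM employee_master
        WHERE status = 'ACTIVE'
          AND (position_code IS NULL OR cost_center IS NULL)""",
    # Validity: salary must be non-negative
    "validity_salary_non_negative": """
        SELECT COUNT(*) FROM employee_master
        WHERE base_salary < 0""",
    # Validity: active workers should not carry termination dates
    "validity_active_with_term_date": """
        SELECT COUNT(*) FROM employee_master
        WHERE status = 'ACTIVE' AND termination_date IS NOT NULL""",
    # Consistency: every company code must resolve to a description
    "consistency_company_code_mapping": """
        SELECT COUNT(*) FROM employee_master e
        WHERE NOT EXISTS (
            SELECT 1 FROM company_map c
            WHERE c.company_code = e.company_code
        )""",
}

def run_checks(conn):
    """Run each rule and return {rule_name: count of failing records}."""
    results = {}
    with conn.cursor() as cur:
        for name, sql in RULES.items():
            cur.execute(sql)
            results[name] = cur.fetchone()[0]
    return results

if __name__ == "__main__":
    # Connection details come from the environment in this sketch.
    conn = oracledb.connect(
        user=os.environ["ADW_USER"],
        password=os.environ["ADW_PASSWORD"],
        dsn=os.environ["ADW_DSN"],
    )
    for rule, failures in run_checks(conn).items():
        print(f"{rule}: {failures} failing record(s)")
```

Keeping each rule as plain SQL keeps the checks reviewable by data owners and easy to run from an existing scheduling job.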

Fix: Source-first remediation workflow

Instead of “fixing the report,” the workflow enforced fixing in the source system so issues didn’t recur. Each exception produced the following (a sketch of the record structure follows this list):

  • evidence (employee ID / record key / failing rule)
  • severity
  • owning team
  • expected SLA
  • resolution notes (so repeats become preventable)
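As a sketch, the exception can be modeled as a small structured record so triage and SLA tracking stay consistent; the severity scale and SLA targets below are illustrative assumptions, not the team’s actual standards.

```python
# Sketch of an exception record matching the fields above; severity levels
# and SLA targets are illustrative, not the team's actual standards.
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class DataQualityException:
    employee_id: str            # evidence: record key of the failing row
    failing_rule: str           # evidence: which quality rule was violated
    severity: str               # "high" | "medium" | "low" (illustrative scale)
    owning_team: str            # team responsible for the source-system fix
    detected_at: datetime = field(default_factory=datetime.utcnow)
    resolution_notes: str = ""  # filled in at closure so repeats become preventable

    def sla_deadline(self) -> datetime:
        """Expected SLA: hypothetical targets per severity level."""
        days = {"high": 1, "medium": 3, "low": 7}[self.severity]
        return self.detected_at + timedelta(days=days)
```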

Results in 90 days

Faster Fixes, Fewer Fires, Trusted Reporting

In the first 90 days, this program turned data quality from a recurring headache into a measurable, manageable process—speeding up source-system corrections, stopping defects before they spread downstream, and restoring confidence in core HR reporting.

Faster source-system remediation

Improved the median time from defect detection to correction in the source system through clear ownership, SLAs, and structured triage.

Reduced downstream defect leakage

Reduced the number of data issues reaching downstream applications and reporting by catching exceptions earlier and requiring validation before closure.

Improved reporting accuracy

Increased accuracy and consistency of core HR reporting by enforcing rule-based validation (completeness, consistency, effective dating, and duplication controls) rather than downstream patching.