Python Data Engineer

Loma Technology

  • Location: Phnom Penh, Cambodia
  • Category: Information Technology
  • Job Type: Full Time
  • Salary: Negotiable

Skills Required: Python


Educational Requirements:
  • Bachelor's Degree
Experience:
  • 3 Years

Extra Benefits:

  • Sick Leave
  • Annual Leave
  • Special Leave

Job Description:

Responsibilities

• End-to-end ETL Delivery
◦ Own data extraction, cleansing, transformation, aggregation, and loading into target tables (fact tables / aggregated tables / metrics tables) to support reporting and downstream consumption.

• Re-runnable / Backfill / Traceable Pipeline Design
◦ Design and implement idempotency and deduplication strategies (primary key / dedup keys / write strategy), enabling re-runs and backfills by date/batch while ensuring consistent results with no duplicates or missing records.
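The re-run/backfill design described above can be sketched as a "delete-insert by batch key" load with key-based deduplication. This is a minimal illustration using an in-memory SQLite table; the table and column names (`fact_txn`, `txn_id`, `batch_date`) are hypothetical, not part of the posting.

```python
# Sketch: idempotent partition reload, so re-running the same batch date
# yields identical results with no duplicates (hypothetical schema).
import sqlite3

def load_batch(conn, batch_date, rows):
    """Replace the target partition for batch_date, deduplicating by txn_id."""
    seen, deduped = set(), []
    for row in rows:                    # row = (txn_id, batch_date, amount)
        if row[0] not in seen:          # dedup key: txn_id
            seen.add(row[0])
            deduped.append(row)
    cur = conn.cursor()
    # Idempotency: wipe the partition, then insert, in one transaction.
    cur.execute("DELETE FROM fact_txn WHERE batch_date = ?", (batch_date,))
    cur.executemany("INSERT INTO fact_txn VALUES (?, ?, ?)", deduped)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_txn (txn_id TEXT, batch_date TEXT, amount REAL)")
rows = [("t1", "2026-04-01", 10.0), ("t1", "2026-04-01", 10.0), ("t2", "2026-04-01", 5.0)]
load_batch(conn, "2026-04-01", rows)
load_batch(conn, "2026-04-01", rows)  # re-run: same result, no duplicates
count = conn.execute("SELECT COUNT(*) FROM fact_txn").fetchone()[0]
```

The delete-insert strategy is one of several write strategies the bullet alludes to; upsert/merge by primary key achieves the same idempotency guarantee.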

• Data Quality (DQ) Rules & Alerting Loop
◦ Define DQ rules (null checks, duplicates, range checks, amount/summary reconciliation, row-count anomaly, etc.), generate DQ reports and alerting mechanisms, and drive the closed loop of investigation, fix, backfill, and validation.
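A minimal sketch of the DQ rules named above (null check, duplicate check, range check) producing a report dict; the rule names, field names, and threshold are illustrative, not the company's actual rule set.

```python
# Sketch: run basic DQ rules over a batch and emit a summary report
# suitable for alerting (illustrative rule names and threshold).
def run_dq_checks(rows, key="txn_id", amount="amount", max_amount=1_000_000):
    report = {"null_key": 0, "duplicate_key": 0, "out_of_range": 0}
    seen = set()
    for row in rows:
        k = row.get(key)
        if k is None:
            report["null_key"] += 1          # null check
        elif k in seen:
            report["duplicate_key"] += 1     # duplicate check
        else:
            seen.add(k)
        amt = row.get(amount)
        if amt is not None and not (0 <= amt <= max_amount):
            report["out_of_range"] += 1      # range check
    report["passed"] = all(v == 0 for name, v in report.items())
    return report

rows = [
    {"txn_id": "t1", "amount": 10.0},
    {"txn_id": "t1", "amount": 20.0},   # duplicate key
    {"txn_id": None, "amount": -5.0},   # null key and negative amount
]
report = run_dq_checks(rows)
```

In practice a failing report would trigger the alerting loop: investigate, fix, backfill the affected batch, and re-validate.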

• Reconciliation & Variance Analysis
◦ Define reconciliation scope and dimensions (by day / user / merchant / currency, etc.), produce a variance overview and detailed reports; categorize variances (lateness, duplication, missing data, logic/definition mismatch, etc.), and propose corrective and backfill actions through to closure.
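A day-level reconciliation of the kind described above can be sketched as comparing source and target row counts per dimension value and tagging each variance with a category. The category labels below mirror the bullet but the logic is a simplified assumption.

```python
# Sketch: reconcile source vs. target row counts by day and categorize
# each variance (simplified: only duplication vs. missing data).
def reconcile(source_counts, target_counts):
    days = sorted(set(source_counts) | set(target_counts))
    variances = []
    for day in days:
        src, tgt = source_counts.get(day, 0), target_counts.get(day, 0)
        if src == tgt:
            continue                    # counts match: nothing to report
        category = "missing data" if tgt < src else "duplication"
        variances.append({"day": day, "source": src, "target": tgt,
                          "diff": tgt - src, "category": category})
    return variances

source = {"2026-04-01": 100, "2026-04-02": 120}
target = {"2026-04-01": 100, "2026-04-02": 118}
variances = reconcile(source, target)
```

Real reconciliation would also slice by user/merchant/currency and distinguish lateness and definition mismatches, which need timestamps and metric lineage rather than counts alone.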

• SQL & Job Performance Optimization (Measurable)
◦ Optimize SQL and processing performance (batching, incremental loads, pre-aggregation, computation pushdown, etc.), establish operational baselines and SLAs, and continuously reduce runtime and failure rates.
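Of the techniques listed, incremental loading is the most mechanical to illustrate: each run processes only rows past the last watermark instead of rescanning the whole source. A minimal sketch with SQLite; the `staging`/`target` tables and timestamp column are hypothetical.

```python
# Sketch: watermark-driven incremental load, so each run touches only
# rows newer than the last successful run (hypothetical schema).
import sqlite3

def incremental_load(conn, watermark):
    rows = conn.execute(
        "SELECT id, ts FROM staging WHERE ts > ? ORDER BY ts",
        (watermark,)).fetchall()
    if rows:
        conn.executemany("INSERT INTO target VALUES (?, ?)", rows)
        watermark = rows[-1][1]         # advance to max processed ts
        conn.commit()
    return watermark

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id TEXT, ts TEXT)")
conn.execute("CREATE TABLE target (id TEXT, ts TEXT)")
conn.executemany("INSERT INTO staging VALUES (?, ?)",
                 [("a", "2026-04-01"), ("b", "2026-04-02")])
wm = incremental_load(conn, "2026-04-01")   # picks up only "b"
loaded = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
```

Measuring the "measurable" part means recording runtime per batch against the baseline before and after each optimization.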

• Observability & Operational Governance
◦ Establish run metadata and observability standards (batch id / watermark / processed range / row counts / duration / failure reasons) to ensure failures are diagnosable, traceable, and reviewable.
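The run-metadata fields named above map naturally onto a small record captured around every job run. This is a sketch under assumed field names; a real pipeline would persist the record to a metadata table.

```python
# Sketch: capture batch id, watermark, processed range, row counts,
# duration, and failure reason for every run (assumed field names).
import time
from dataclasses import dataclass

@dataclass
class RunMetadata:
    batch_id: str
    watermark: str
    processed_from: str
    processed_to: str
    rows_read: int = 0
    rows_written: int = 0
    duration_s: float = 0.0
    status: str = "running"
    failure_reason: str = ""

def run_job(meta, job):
    start = time.monotonic()
    try:
        meta.rows_read, meta.rows_written = job()
        meta.status = "success"
    except Exception as exc:            # record why the run failed
        meta.status, meta.failure_reason = "failed", str(exc)
    meta.duration_s = time.monotonic() - start
    return meta

meta = run_job(RunMetadata("b001", "2026-04-01", "2026-04-01", "2026-04-02"),
               lambda: (100, 98))
```

With this record persisted per run, a failed batch can be diagnosed (failure_reason), replayed (watermark, processed range), and audited (row counts, duration).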

• Metric Definitions & Data Contract Alignment
◦ Align with business and engineering teams on field definitions, metric logic, granularity, and time-cut rules; maintain versioning and change logs to prevent definition drift and ensure traceability.

Requirements

• 3–5 years of experience in Python data processing / ETL, with proven production delivery experience including scheduled runs, failure handling, re-runs, backfills, and traceability.

• Proficient in either Pandas or Polars; capable of joins/aggregations/deduplication/exception handling, with performance optimization awareness.
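The join/aggregate/dedup pattern this requirement refers to looks like the following in Pandas (Polars is analogous); the column names and data are illustrative only.

```python
# Sketch: dedup by key, join a dimension, then aggregate -- the core
# Pandas workflow the requirement names (illustrative columns).
import pandas as pd

txns = pd.DataFrame({"txn_id": ["t1", "t1", "t2"],
                     "user": ["u1", "u1", "u2"],
                     "amount": [10.0, 10.0, 5.0]})
users = pd.DataFrame({"user": ["u1", "u2"], "country": ["KH", "TH"]})

deduped = txns.drop_duplicates(subset="txn_id", keep="first")
joined = deduped.merge(users, on="user", how="left")
summary = joined.groupby("country", as_index=False)["amount"].sum()
```

The performance-awareness part of the requirement shows up in choices like deduplicating before the join (smaller join input) and selecting only needed columns on large frames.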

• Strong SQL skills (complex aggregations; window functions preferred), with understanding of indexes and common performance troubleshooting approaches.

• Familiar with at least one scheduler: Airflow / XXL-Job / cron; able to design job dependencies, retries, timeouts, and alerting.
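The retry behavior this requirement asks for can be sketched scheduler-agnostically as a wrapper; in Airflow the same behavior is configured declaratively via task `retries`/`retry_delay` rather than hand-rolled. The function and delay below are illustrative.

```python
# Sketch: retry a flaky task with a fixed backoff delay; a scheduler
# would normally own this policy (illustrative, not Airflow API).
import time

def run_with_retries(task, max_retries=3, delay_s=0.01):
    last_exc = None
    for attempt in range(1, max_retries + 1):
        try:
            return attempt, task()
        except Exception as exc:
            last_exc = exc
            time.sleep(delay_s)         # back off before the next attempt
    raise RuntimeError(f"task failed after {max_retries} attempts") from last_exc

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ValueError("transient failure")
    return "ok"

attempt, result = run_with_retries(flaky)
```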

• Solid fundamentals in data modeling: fact/dimension modeling, metric definitions, granularity, and time-cut conventions.

• Able to clearly explain and implement: ETL idempotency, deduplication, variance categorization & handling, and foundational DQ rules & alerting.

• Basic engineering discipline: version control, structured coding practices, and necessary tests/validations (unit tests or data validation scripts).

• Data security awareness: follow least-privilege access principles; support masking/audit requirements when needed.

Nice to Have

• Kafka streaming / incremental processing experience (watermark, late-arriving data handling).

• Experience with analytical engines such as ClickHouse / Elasticsearch.

• Experience with data lineage / data contracts / metadata governance (field versioning, impact analysis, traceable changes).

• Data quality platform mindset (templated rules, standardized reports, process-driven exception handling).

Job Summary:
  • Job Posted: 17 Apr, 2026

  • Expiration: 17 May, 2026

  • Vacancy: 5

  • Gender: No Preference

Working Conditions:
  • On Site