Python Data Engineer
Loma Technology
Location: Phnom Penh, Cambodia
Category: Information Technology
Job Type: Full Time
Salary: Negotiable
Skills Required: Python
Educational Requirements: Bachelor's Degree
Experience: 3 Years
Extra Benefits:
- Sick Leave
- Annual Leave
- Special Leave
Job Description:
Responsibilities
• End-to-end ETL Delivery
◦ Own data extraction, cleansing, transformation, aggregation, and loading into target tables (fact tables / aggregated tables / metrics tables) to support reporting and downstream consumption.
• Re-runnable / Backfill / Traceable Pipeline Design
◦ Design and implement idempotency and deduplication strategies (primary key / dedup keys / write strategy), enabling re-runs and backfills by date/batch while ensuring consistent results with no duplicates or missing records (see the idempotency sketch after this list).
• Data Quality (DQ) Rules & Alerting Loop
◦ Define DQ rules (null checks, duplicate checks, range checks, amount/summary reconciliation, row-count anomaly detection, etc.), generate DQ reports and alerting mechanisms, and drive the closed loop of investigation, fix, backfill, and validation (see the DQ sketch after this list).
• Reconciliation & Variance Analysis
◦ Define reconciliation scope and dimensions (by day / user / merchant / currency, etc.) and produce variance overview and detailed reports; categorize variances (lateness, duplication, missing data, logic/definition mismatch, etc.) and propose corrective and backfill actions through to closure.
• SQL & Job Performance Optimization (Measurable)
◦ Optimize SQL and processing performance (batching, incremental loads, pre-aggregation, computation pushdown, etc.), establish operational baselines and SLAs, and continuously reduce runtime and failure rates.
• Observability & Operational Governance
◦ Establish run metadata and observability standards (batch id / watermark / processed range / row counts / duration / failure reasons) to ensure failures are diagnosable, traceable, and reviewable (see the run-metadata sketch after this list).
• Metric Definitions & Data Contract Alignment
◦ Align with business and engineering teams on field definitions, metric logic, granularity, and time-cut rules; maintain versioning and change logs to prevent definition drift and ensure traceability.
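
For illustration, a minimal sketch of the delete-then-insert write strategy named under pipeline design, using an in-memory SQLite table; the fact_payments table and its columns are hypothetical, not part of this role's stack. Re-running the same batch date replaces the partition rather than appending duplicates:

import sqlite3

def load_daily_fact(conn: sqlite3.Connection, batch_date: str, rows: list[tuple]) -> None:
    """Idempotently (re)load one day's partition of a fact table."""
    cur = conn.cursor()
    # Delete any rows from a previous run of this batch so a re-run or
    # backfill replaces the partition instead of appending duplicates.
    cur.execute("DELETE FROM fact_payments WHERE batch_date = ?", (batch_date,))
    cur.executemany(
        "INSERT INTO fact_payments (batch_date, user_id, amount) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_payments (batch_date TEXT, user_id TEXT, amount REAL)")
day1 = [("2024-01-01", "u1", 10.0), ("2024-01-01", "u2", 5.0)]
load_daily_fact(conn, "2024-01-01", day1)
load_daily_fact(conn, "2024-01-01", day1)  # re-run of the same batch
print(conn.execute("SELECT COUNT(*) FROM fact_payments").fetchone()[0])  # 2, not 4

An upsert keyed on a primary key, the other strategy the bullet names, achieves the same idempotency when deleting a partition is not practical.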
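Likewise, a minimal sketch of templated DQ rules, assuming pandas and hypothetical columns and thresholds; each rule yields a (name, passed, detail) result that could feed a DQ report or an alert:

import pandas as pd

def run_dq_checks(df: pd.DataFrame, expected_rows: int, tolerance: float = 0.2):
    results = []
    # Null check: the key column must be fully populated.
    nulls = int(df["user_id"].isna().sum())
    results.append(("no_null_user_id", nulls == 0, f"{nulls} null user_id rows"))
    # Duplicate check: the dedup key must be unique.
    dupes = int(df.duplicated(subset=["batch_date", "user_id"]).sum())
    results.append(("unique_dedup_key", dupes == 0, f"{dupes} duplicate keys"))
    # Range check: amounts must be non-negative.
    negatives = int((df["amount"] < 0).sum())
    results.append(("amount_non_negative", negatives == 0, f"{negatives} negative amounts"))
    # Row-count anomaly: flag volume drift beyond the tolerance.
    drift = abs(len(df) - expected_rows) / max(expected_rows, 1)
    results.append(("row_count_within_tolerance", drift <= tolerance, f"drift={drift:.0%}"))
    return results

df = pd.DataFrame({"batch_date": ["2024-01-01"] * 3,
                   "user_id": ["u1", "u2", "u2"],
                   "amount": [10.0, -5.0, -5.0]})
for name, passed, detail in run_dq_checks(df, expected_rows=3):
    print(("PASS" if passed else "FAIL"), name, "-", detail)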
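And a minimal sketch of the run metadata named under observability (batch id, processed range, watermark, row counts, duration, failure reason); the field names and the demo job are illustrative, not a prescribed schema:

import time
import traceback
from dataclasses import dataclass

@dataclass
class RunRecord:
    batch_id: str
    processed_from: str
    processed_to: str
    watermark: str | None = None
    rows_read: int = 0
    rows_written: int = 0
    duration_s: float = 0.0
    status: str = "running"
    failure_reason: str | None = None

def run_with_metadata(batch_id, start, end, job) -> RunRecord:
    rec = RunRecord(batch_id=batch_id, processed_from=start, processed_to=end)
    t0 = time.monotonic()
    try:
        rec.rows_read, rec.rows_written, rec.watermark = job(start, end)
        rec.status = "success"
    except Exception:
        rec.status = "failed"
        rec.failure_reason = traceback.format_exc(limit=1)
    rec.duration_s = time.monotonic() - t0
    return rec  # persist to a run-log table so every run is reviewable

def demo_job(start, end):
    return 100, 100, end  # (rows_read, rows_written, new watermark)

print(run_with_metadata("20240101-001", "2024-01-01", "2024-01-02", demo_job))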
Requirements
• 3–5 years of experience in Python data processing / ETL, with proven production delivery experience including scheduled runs, failure handling, re-runs, backfills, and traceability.
• Proficient in either Pandas or Polars; able to implement joins, aggregations, deduplication, and exception handling with an awareness of performance optimization (see the Pandas sketch after this list).
• Strong SQL skills (complex aggregations; window functions preferred), with an understanding of indexes and common performance troubleshooting approaches.
• Familiar with at least one scheduler (Airflow / XXL-Job / cron); able to design job dependencies, retries, timeouts, and alerting (see the Airflow sketch after this list).
• Solid fundamentals in data modeling: fact/dimension modeling, metric definitions, granularity, and time-cut conventions.
• Able to clearly explain and implement ETL idempotency, deduplication, variance categorization & handling, and foundational DQ rules & alerting.
• Basic engineering discipline: version control, structured coding practices, and necessary tests/validations (unit tests or data validation scripts).
• Data security awareness: follow least-privilege access principles; support masking/audit requirements when needed.
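
For illustration, a minimal pandas sketch of the deduplication, join, and aggregation work described above; the frames and column names are invented, and Polars offers equivalent operations:

import pandas as pd

events = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "amount": [10.0, 10.0, 7.5],
    "loaded_at": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-01"]),
})
users = pd.DataFrame({"user_id": ["u1", "u2"], "country": ["KH", "TH"]})

# Deduplicate: keep the most recently loaded record per user.
latest = (events.sort_values("loaded_at")
                .drop_duplicates(subset="user_id", keep="last"))

# Join to a dimension table, then aggregate to a per-country metric.
summary = (latest.merge(users, on="user_id", how="left")
                 .groupby("country", as_index=False)["amount"].sum())
print(summary)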
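And a minimal Airflow (2.4+ style) sketch of the dependency, retry, timeout, and alerting design mentioned above; the DAG, schedule, and task names are hypothetical, and the same ideas carry over to XXL-Job or cron wrappers:

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract(): ...
def load(): ...

with DAG(
    dag_id="daily_payments_etl",
    start_date=datetime(2024, 1, 1),
    schedule="0 2 * * *",          # daily run at 02:00
    catchup=True,                  # permits backfills by logical date
    default_args={
        "retries": 3,
        "retry_delay": timedelta(minutes=5),
        "execution_timeout": timedelta(hours=1),
        "email_on_failure": True,  # hang alerting off task failures here
    },
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_load = PythonOperator(task_id="load", python_callable=load)
    t_extract >> t_load            # job dependency: load runs after extract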
Nice to Have
• Kafka streaming / incremental processing experience (watermarks, late-arriving data handling); see the watermark sketch after this list.
• Experience with analytical engines such as ClickHouse / Elasticsearch.
• Experience with data lineage / data contracts / metadata governance (field versioning, impact analysis, traceable changes).
• Data quality platform mindset (templated rules, standardized reports, process-driven exception handling).
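
For illustration, a broker-agnostic sketch of watermark-based late-data handling from the Kafka bullet; the lateness bound and events are invented, and a real pipeline would consume them from a Kafka topic:

from datetime import datetime, timedelta

ALLOWED_LATENESS = timedelta(minutes=10)
watermark = datetime.min  # highest event time seen so far

def process(event_time: datetime, on_time: list, late: list) -> None:
    """Route events older than watermark - allowed lateness to a side output."""
    global watermark
    watermark = max(watermark, event_time)
    if event_time < watermark - ALLOWED_LATENESS:
        late.append(event_time)      # late arrival: reconcile/backfill later
    else:
        on_time.append(event_time)   # within the lateness bound

on_time, late = [], []
for ts in ["2024-01-01T00:00", "2024-01-01T00:30", "2024-01-01T00:05"]:
    process(datetime.fromisoformat(ts), on_time, late)
print(len(on_time), "on time,", len(late), "late")  # 2 on time, 1 late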
