Job Title: Lead Data Engineer (Fixed-Term Contract, 2 Years)
Role Purpose
· The Lead Data Engineer (AI) is responsible for designing, building, and managing scalable data infrastructure and pipelines that enable AI and advanced analytics across Mitr Phol. This role ensures reliable ingestion, transformation, governance, and delivery of structured, semi-structured, and unstructured data to support Data Science, Generative AI, and enterprise AI applications. The Lead Data Engineer will oversee a team of data engineers and collaborate closely with data scientists, product managers, and business stakeholders to ensure data readiness and operational excellence.
Key Responsibilities
· Data Architecture & Pipeline Development
o Design and implement data pipelines for AI/ML workloads (batch & streaming).
o Manage ingestion of diverse data sources (ERP, IoT, APIs, external market feeds, unstructured text).
o Build data lakes and warehouses optimized for AI use cases (structured + unstructured).
o Ensure pipelines are resilient, scalable, and cost-efficient in hybrid and multi-cloud environments (Azure/AWS/GCP).
· Data Governance & Quality
o Define and enforce data standards (metadata, lineage, schema, naming).
o Implement monitoring and validation to ensure high-quality, AI-ready data.
o Collaborate with AI Data Governance Lead on compliance, access control, and Responsible AI practices.
· Collaboration & Stakeholder Management
o Partner with Data Scientists to provide curated, feature-ready datasets.
o Work with AI Product Managers to align pipelines with product roadmaps.
o Coordinate with IT/Infra for integration, performance, and cybersecurity.
· Leadership & Mentoring
o Lead a team of Data Engineers and Analysts (mentoring, code reviews, best practices).
o Set technical standards for data engineering, CI/CD, and cloud operations.
o Provide thought leadership on new data engineering tools and AI-driven architectures.
· Innovation & Optimization
o Evaluate and introduce modern data engineering practices (event-driven, lakehouse, microservices).
o Support advanced AI requirements such as RAG pipelines, real-time inference, and vector databases.
o Drive continuous improvements in performance, scalability, and cost optimization.