You will build, manage, and operate scalable data platforms centered on Apache Kafka and data lakes.
Responsibilities
• Design data pipelines to ingest, process, and move data from various sources into the data lake using Kafka.
• Deploy, configure, and maintain highly available Kafka clusters, including Kafka Connect and Schema Registry.
• Oversee the architecture and governance of the data lake, managing storage (e.g., S3/ADLS), security, and metadata.
• Develop producers and consumers to interact with Kafka topics using Python, Java, or Scala.
• Implement data quality checks, manage lineage, and enforce security controls across data flows.
Required Skills
• 5+ years of proven experience designing and managing data platforms with Apache Kafka and big data technologies.
• Strong proficiency in Python, Java, or Scala.
• Expertise in big data processing frameworks such as Apache Spark and Apache Flink.
• Hands-on experience with cloud environments (AWS, Azure, or GCP) and storage services such as Amazon S3 or Azure Data Lake Storage.
• Solid understanding of data lake design principles and open table formats such as Delta Lake or Apache Iceberg.
• Familiarity with infrastructure-as-code and configuration tools such as Terraform or Ansible, and with containerization and orchestration using Docker and Kubernetes.
• Experience with SQL and NoSQL database systems.