[Remote] Senior Data Engineer
Note: This is a remote position open to candidates in the USA. Effectual is seeking a Senior Data Engineer with specialized expertise in data streaming technologies to join its data team. This role focuses on building and maintaining high-performance data streaming architectures that enable real-time data processing and analytics.
Responsibilities
- Design, build, and maintain scalable streaming data architectures using Kafka, MSK, and Kinesis
- Develop real-time data pipelines that handle high-volume, high-velocity data streams
- Implement event-driven architectures and microservices patterns for streaming data processing
- Create and optimize data streaming topologies for complex event processing scenarios
- Design fault-tolerant streaming systems with proper error handling and data recovery mechanisms
- Configure, deploy, and manage Apache Kafka clusters and AWS MSK environments
- Implement Kafka Connect pipelines for streaming data integration
- Design optimal Kafka topic partitioning strategies and replication configurations
- Monitor and optimize Kafka cluster performance, throughput, and latency
- Implement Kafka security configurations including SSL/TLS, SASL, and ACLs
- Manage Kafka Schema Registry for data serialization and evolution
- Design and implement Amazon Kinesis Data Streams and Kinesis Data Firehose solutions
- Configure Kinesis Analytics applications for real-time stream processing
- Optimize Kinesis shard management and auto-scaling configurations
- Implement Kinesis data retention and archival strategies
- Integrate Kinesis with other AWS services for comprehensive streaming solutions
- Develop real-time stream processing applications using Apache Spark Streaming, Kafka Streams, or AWS Lambda
- Implement complex event processing (CEP) patterns for real-time analytics
- Build streaming ETL pipelines that transform data in motion
- Create real-time aggregations, windowing operations, and stateful stream processing
- Optimize streaming query performance and resource utilization
- Ensure seamless integration between streaming systems and data lakes, data warehouses, and operational databases
- Implement data lineage and monitoring for streaming data pipelines
- Create automated data quality checks and validation for streaming data
- Manage data serialization formats (Avro, JSON, Protobuf) and schema evolution
- Coordinate with data scientists and analysts to ensure streaming data meets analytical requirements
- Implement Infrastructure as Code (IaC) for streaming data platforms using Terraform or CloudFormation
- Automate deployment and management of streaming infrastructure through CI/CD pipelines
- Monitor streaming system health, performance metrics, and alerting
- Implement disaster recovery and high availability strategies for streaming systems
- Stay current with emerging trends in streaming technologies and cloud-native solutions
- Collaborate with data architects, data scientists, and application teams on streaming data requirements
- Support rigorous project governance through daily progress reviews and time tracking
- Provide technical leadership and mentorship to junior data engineers
- Communicate complex streaming concepts to technical and non-technical stakeholders
- Operate with transparency and responsiveness to support high-performing teams
Skills
- 7+ years of experience in the data engineering field with significant streaming data specialization
- Bachelor's degree in Computer Science, Engineering, or related STEM field
- Extensive hands-on experience with Apache Kafka including cluster management, performance tuning, and ecosystem tools
- Proven experience with AWS MSK and Amazon Kinesis services in production environments
- Strong background in real-time data processing and stream analytics
- Streaming Technologies: Apache Kafka, Kafka Connect, Kafka Streams, Amazon MSK, Amazon Kinesis (Data Streams, Data Firehose, Analytics)
- Programming Languages: Proficient in Python, Java, and Scala for streaming applications
- Stream Processing Frameworks: Apache Spark Streaming, Apache Flink, AWS Lambda for stream processing
- Data Serialization: Experience with Avro, Protocol Buffers, JSON, and schema registry management
- Big Data Technologies: Hadoop ecosystem, Apache Spark, distributed computing concepts
- Database Technologies: SQL and NoSQL databases, data warehousing solutions, time-series databases
- AWS Services: Deep knowledge of AWS streaming and analytics services (MSK, Kinesis, Lambda, EMR, Glue)
- Containerization: Docker and Kubernetes for streaming application deployment
- Infrastructure as Code: Terraform, CloudFormation for streaming infrastructure automation
- Monitoring: CloudWatch, Prometheus, Grafana for streaming system observability
- Security: Implementation of streaming data security, encryption, and access controls
- Expert use of code versioning tools such as GitHub
- Expert knowledge of Agile methodologies and delivery practices
- Experience with CI/CD pipelines for streaming data applications
- Understanding of data APIs, REST services, and microservices architectures
- Leadership & Team Management
- Risk Management and mitigation strategies for streaming systems
- Conflict Resolution
- Strategic Planning & Leadership for data streaming initiatives
- Resource Management and capacity planning
- Change Management for streaming technology adoption
- Core AWS Certifications: AWS Data Engineer Associate (required)
- AWS Solutions Architect Professional (preferred)
- AWS Developer Professional (recommended)
- Confluent Certified Administrator for Apache Kafka (highly recommended)
- Confluent Certified Developer for Apache Kafka (preferred)
- AWS Big Data Specialty (if available in current form)
- AWS Security Specialty
- Certified Associate Data Analyst with Python
- Certified Professional Python Programmer Level 1
- Databricks Data Engineer Professional
- Certified Associate Python Programmer
- Java or Scala certification (Oracle Certified Professional)
- Experience with Apache Flink for advanced stream processing
- Knowledge of Apache Pulsar as an alternative messaging system
- Experience with event sourcing and CQRS patterns
- Understanding of Apache Airflow for batch and streaming workflow orchestration
- Experience with ksqlDB for stream processing using SQL
- Background in financial services, IoT, or other real-time, data-intensive industries
- Experience with multi-cloud streaming architectures
- Knowledge of Apache NiFi for data flow automation
Benefits
- Medical, dental, and vision insurance
- Short-term disability, long-term disability, and life insurance
- 401(k) with company match
- Paid time off (PTO): 120 hours, accrued over one year
- Paid time off for major holidays (14 days per year)
- These and any other employee benefit offerings are subject to management’s discretion and may change at any time.