[Remote] Founding Data Engineer, AI Platform
Note: This is a remote position open to candidates in the USA. GC AI, the fastest-growing legal AI platform for in-house legal teams, is seeking a Founding Data Engineer to own the entire data stack. The role involves consolidating data from multiple sources into a single warehouse, building internal data tools, and defining data engineering practices as the first dedicated data hire.
Responsibilities
- Take ownership of the data warehouse in BigQuery: modeling, pipeline development, data quality, and performance
- Build pipelines that consolidate product usage data, CRM data, billing, customer contract data, and user analytics into a single source of truth
- Design and build internal data tools using applied AI, including natural-language query interfaces and automated reporting, so the rest of the company can self-serve without waiting on an analyst
- Set up the warehouse so business teams can run their own queries and pull their own numbers without filing a ticket
- Build toward a data lake architecture that supports personalization and model fine-tuning for the GC AI product
- Keep the stack lean: use what's available in BigQuery and the broader GCP ecosystem, and make decisions that reduce complexity and cost without introducing tool sprawl
- Define data engineering practices, tooling, and standards as the first hire on what will become a team
Skills
- 5+ years of experience in data engineering, with hands-on experience building and maintaining data warehouses and pipelines
- Strong SQL skills and deep experience with BigQuery or comparable analytical databases
- Proficiency in Python for pipeline development, scripting, and tooling
- Experience building ETL/ELT pipelines that consolidate data from multiple source systems (SaaS APIs, event streams, databases)
- Experience working within GCP or a comparable cloud ecosystem
- Ability to design data models that are clean, performant, and usable by non-engineers
- Experience building internal data tools or agents using LLMs (text-to-SQL, natural language interfaces, automated reporting). This is a strong differentiator
- Experience as the first or early data hire at a startup, where you owned the full stack
- Familiarity with legaltech, legal operations, or SaaS product analytics
- Experience setting up self-serve analytics layers (semantic layers, BI tool configuration, data documentation)
- Experience with data infrastructure that supports ML workflows (feature stores, training data pipelines, data lakes)
- Experience with infrastructure as code, especially Terraform, for managing GCP data infrastructure
Company Overview