Senior Data Engineer
Data Science
About VolFront
VolFront is an options analytics and intelligence consultancy. We combine decades of derivatives expertise with AI to build tools and solutions for volatility traders. Purpose-built by practitioners, for practitioners.
About The Role
VolFront builds volAI, a conversational AI terminal for professional options traders. Behind the chat interface is a proprietary data layer: vol surfaces, IV-by-delta, realized vol, earnings analytics, dividend forecasts, option prints, OCC volume, fundamentals, and live quotes – all produced by an in-house fleet of data loaders. When a loader fails, traders see stale or missing data. Data freshness and correctness ARE the product – and our proprietary, in-house data is a core competitive moat that is constantly expanding with bespoke datasets you will not find anywhere else.
The role
You own the data-loader fleet – the loader repository, its scheduled tasks, the live-quote infrastructure, and the PostgreSQL schema behind it – as the senior engineer accountable for it. The founder sets product and domain direction and does not write the code; you turn that into a pipeline that is correct, current, observable, and continuously improving.
What You’ll Do
Own (the mandate)
• Be the single accountable owner of the market-data pipeline: a large fleet of Windows scheduled tasks on an Azure batch VM (US Central time), systemd timers and Go loaders on a Linux quote-cache host, the sidecar Postgres + FDW layer, and the analytics schema they write to.
• Set the engineering standard for ingestion: schema design, idempotency, partitioning, backfill strategy, error handling, and data-contract discipline across every loader.
• Own the PostgreSQL data layer at scale: partitioned options tables in the 100M-plus-row range, indexing and query performance, vacuum/bloat and storage health, FDW pushdown, connection management.
Operate (the floor, not the ceiling)
• Guarantee the overnight and evening batch runs land complete, fresh data before the US market open, every day.
• Direct first-line monitoring: a data-ops engineer triages the morning data-quality email and alert stream under your runbooks; you own the hard incidents, the root causes, and the systemic fixes.
• Drive recurring failure patterns to extinction (API quota exhaustion, IP/firewall changes, vendor schema drift, memory pressure, stuck processes) rather than re-fixing them each morning.
Improve (the core of a senior seat)
• Harden the fleet so failures are loud and self-announcing: row-count assertions, freshness checks, dead-man switches, retry/backoff, heartbeat monitoring, better alert routing. A loader that silently writes zero rows and reports success is the bug class you are paid to eliminate.
• Reduce toil and snowflake risk: consolidate logging, standardize task wrappers, improve the centralized monitoring/remediation tooling, move manual steps into code, raise IaC coverage.
• Improve performance and cost: faster loads, leaner storage, fewer wasted compute hours.
Build (a core part of the seat)
• Design and ship new loaders, new data sources, and bespoke in-house datasets as the product expands – this is how we widen the data moat, and it is a core, ongoing part of the seat, not an occasional project.
• Evolve the live-quote architecture (in-memory cache, sidecar Postgres + FDW, HTTP API, Go loaders) – pub/sub and push-delta by design, never polling.
• Own deploy safety for the data layer: staging, verification, rollback.
Coordinate
• Direct the junior data-ops engineer and the interns on pipeline work.
• Interface with the app/LLM engineers on the read side (the application, its tools) so schema and contract changes never break the product.
Tech you will own
• Python (all loaders), PostgreSQL on Azure at scale, partitioning, FDW, performance tuning, psql
• Windows Server: Task Scheduler, batch wrappers, PowerShell
• Linux (Ubuntu): systemd services/timers, journald, SSH via jump host
• Go (the live-quote loaders) – read fluently, extend comfortably
• Azure: VMs, networking/firewall, App Service, storage and scaling
• Git/GitHub workflow; CI/CD and IaC (Terraform or equivalent)
• Vendor data APIs: SpiderRock, AlphaVantage, OCC, EDGAR, news feeds
• AI coding tooling is used heavily for development and diagnostics – fluency working alongside it is expected
What You’ll Bring
• 6+ years building and running production data systems, with real end-to-end ownership (data engineering, backend, or platform/SRE).
• Deep PostgreSQL: partitioning, indexing strategy, query performance on 100M-plus-row tables, FDW, vacuum/bloat and storage health. You diagnose with EXPLAIN ANALYZE, not by guessing.
• Strong Python – you architect loaders, not just patch them.
• Comfortable owning systems on BOTH Windows Server and Linux.
• Data-integrity obsession: fail-loud discipline is a hard requirement. No silent fallbacks, no papering over missing inputs with COALESCE, no swallowed exceptions, no default values for required data. It works as designed or it fails loud.
• Pub/sub over polling for anything live (quotes, chains, deltas).
• Monitoring instinct: you believe a job that silently writes zero rows and reports success is a worse bug than a crash, and you build the systems that catch it before a customer does.
• Self-directed senior: small shop, high autonomy, “just do it.” You set the bar; you are not hand-held and you do not wait for process.
• Working-hours overlap with US Central Time, including attentiveness in the pre-market window (roughly 6:00-8:30 AM CT) when overnight failures must be caught and fixed – though as owner your job is to make that window quiet.
• Clear written English (design docs, incident notes, runbooks, change logs).
Nice to have
• Market-data or finance feed experience (options, equities, OPRA, SpiderRock) – this is gold and cuts the domain ramp significantly.
• Streaming / real-time ingestion architecture.
• Azure specifically (we are all-in on Azure).
• Experience being the sole or primary owner of a production data platform.
• Mentoring or directing junior engineers.
*This job posting exists to fill a vacancy.