Sr Site Reliability Engineer
ClickUp is the world's only all-in-one productivity platform that flexes to the way people want to work. It replaces all individual workplace productivity tools with a single, unified platform including project management, document collaboration, spreadsheets, chat, goals, and more. On a mission to make the world more productive, ClickUp is headquartered in San Diego and scaling remotely and internationally. As one of the fastest-growing SaaS companies in the world, ClickUp helps millions of users to be more productive and save at least one day every week. 🦄
- Build a deep understanding of how ClickUp's systems behave, scale, interact and fail, and use that insight to identify risks and opportunities for remediation
- Own, drive and improve the incident management process across the engineering org and participate in the team's follow-the-sun model
- Define SLOs and SLIs for all of our services and introduce error budgeting
- Own and improve our observability on all of our services
- Build software solutions to enable reliability and operability of large scale distributed systems handling petabytes of data and serving
- Build tools and automation to eliminate toil and reduce operational overhead. Create frameworks, processes and best practices to be used across ClickUp Engineering
- Automate critical portions of ClickUp engineering processes, to minimize risk and maximize the speed of innovation
- Manage capacity and performance to help scale our infrastructure on both public and private clouds around the world
- Software engineering: At the very core, we are looking for strong software engineers with an operational, infrastructural or SRE mentality who can design and build systems for platform and infrastructure layers
- Cloud experience: Production experience in a major cloud environment, including CI/CD deployments, managed services, bootstrapping and provisioning services via infrastructure-as-code (IaC) systems, automation and operations
- Infrastructure Management: You have worked with and managed production grade infrastructure with IaC tools or configuration management tools
- Operating systems: Strong knowledge of *nix based operating systems, their internals and advanced troubleshooting commands
- Compute: Experience of working with VMs, containers and container orchestration systems
- Database: Experience working with RDBMS and NoSQL storage solutions in a production capacity, and you know your way around running and inspecting queries. A good understanding of indexing, locking, replication and sharding is a bonus!
- Observability: You have worked with logging, monitoring and alerting tools before and you know how logs are collected, aggregated and ingested. You have set up monitors and alerts for production services and know your way around concepts such as SLOs and SLIs
- Bonus points: We believe strong engineers can pick up any technologies and tools fast and hit the ground running. Therefore, we avoid listing specific technologies. However, if you have worked with at least one of the technologies we have in our stack, that would definitely be a bonus point.
- CloudFormation/CDK, ECS, Elastic Beanstalk
- PostgreSQL, DynamoDB, AuroraDB
ClickUp was founded on a culture of hard work, consistent growth, and a desire to break norms. We’re a values-driven company and hire based on ambition, merit, and a willingness to do what it takes to succeed. We don’t care where you’re from, what you look like, or who you’re in a relationship with—we hire the best people for the job, and create an environment that supports employees on their journey to do the most exciting work of their lives! ClickUp is an Equal Opportunity Employer, and qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, or national origin.