Hundreds of millions of customers. Billions of queries and billions of dollars in revenue every year. The scale and impact of Amazon Search are huge, and we need smart engineers to manage the infrastructure and operations of the product service. The Search Customer Experience team is responsible for the worldwide customer-facing features on desktops, tablets, and mobile devices: everything from the moment a customer clicks into the search box to when they view search results.
Join us and be part of a new team driving operational excellence within one of Amazon's highest-impact services. This team combines application operations, systems engineering, and software development expertise to run large-scale, fault-tolerant Tier-1 systems. We need customer-obsessed engineers who relentlessly focus on performance, availability, and efficiency to help us solve complex growth and support issues.
Be part of a close-knit team that is agile, data-driven, and highly collaborative. Outside of work, we have regular team social events, and we often get out of the office (rain or shine) to clear our heads and have some fun in Seattle as a team.
Site reliability, systems, and application operations of the Amazon Search CX service.
Own engineering initiatives from inception, engaging actively in design reviews and development efforts to ensure a sound deployment plan and to minimize operational burden.
Represent the Ops team on key engineering releases and features, ensuring operational readiness and communicating deployment and mitigation plans to the worldwide Ops team.
Lead operational excellence efforts: propose high-impact initiatives and projects, and drive them by working with other Ops or Search development engineers.
Provide daytime on-call support, monitoring, and triage as part of a shared rotation. Diagnose and mitigate critical failures in high-pressure situations. Perform deep dives and root-cause analysis as needed.
Collaborate with engineering and remote support engineers to drive down operational burden through improved documentation and SOP/runbook creation.
Fleet and application performance analysis and scaling to keep up with business growth and improve efficiency.
Analyze big data sets to identify optimization opportunities and act on them.
Develop tools and scripts to automate manual processes or improve existing frameworks.
Bachelor's degree in Computer Science or equivalent experience.
5+ years of recent systems engineering, software engineering, site reliability, or DevOps experience in a medium- to large-scale production Linux or other UNIX environment.
5+ years of experience with Linux, Apache, DNS, monitoring, load balancing, and caching.
2+ years of experience developing software in Java or C++.
Experience with Amazon Web Services (AWS), ideally S3, EC2, EMR, and DynamoDB.
Experience managing large-scale internet systems with a focus on application operations.
Experience working with high-availability, distributed systems and services in a hosting environment including hardware, OS, storage, network, and database solutions.
Experience in the development and rollout of technical operations processes and new services.
Working knowledge of Agile development methods (Kanban, Scrum, etc.).
Working knowledge of data structures and algorithms.