Site Reliability Engineer
We are looking for a well-rounded, experienced Site Reliability Engineer (SRE) to join a team of SREs dedicated to supporting and improving our back-end and firewall platforms. This person must dive deep into operational issues from systems, automation, and process perspectives. The candidate will understand the challenges of integrating disparate infrastructures into a new facility with new processes and procedures.
- Perform deep dives into both systemic and latent reliability issues; partner with software and systems engineers across the organization to produce and roll out fixes
- Troubleshoot issues across the entire stack: hardware, software, application and network
- Drive standardization efforts across multiple disciplines and services in conjunction with embedded SREs throughout the organization
- Identify and drive opportunities to improve automation for the company; scope and create automation for deployment, management and visibility of our services
- Represent the SRE organization in design reviews and operational readiness exercises for new and existing services
- Work with software engineers and development SREs to improve deployment processes
- Sound fundamentals in operating systems, networking, and distributed systems
- Strong familiarity with Linux systems administration, management, best practices, and performance tuning
- Familiarity with OS container technology: Docker, LXC, namespaces/cgroups
- Strong understanding of Ethernet, VLAN, IPv4/IPv6, ARP, DHCP, DNS, and TCP
- Strong understanding of routing protocols such as BGP, IS-IS, and OSPF
- Familiarity with the operations of carrier-grade backbone networking
- Familiarity with distributed system problems: leader election, consensus, etc.
- Solid understanding of systems and application design, including the operational trade-offs of various designs
- Expert-level understanding of at least one public or private cloud technology, such as AWS, Azure, or OpenStack
- Practical knowledge of various aspects of service design, including messaging protocols & behavior, caching strategies and software design practices
- Practical, intermediate knowledge of shell scripting and Python
- Demonstrable knowledge of TCP/IP, HTTP, web application security, and experience supporting multi-tier web application architectures
- Minimum of 5 years managing services in an internet-scale Unix/Linux environment
- Must work well with and be able to influence myriad personalities at all levels
- Ability to prioritize tasks and work independently; must be able to work with multiple teams across multiple customers
- Must be adaptable and able to focus on the simplest, most efficient & reliable solutions
- Track record of practical problem solving, with excellent written communication, interpersonal, and documentation skills
- Curiosity and an interest in networking, systems software, and/or distributed systems
- Experience as a systems administrator or operations engineer in a 24/7 production environment
- Experience deploying code to and/or managing large-scale node deployments providing software, platforms, or infrastructure as a service
- Experience with Arista, Dell, Brocade, and HP networking gear
- Experience with HP, Dell/EMC, and Super Micro server and storage gear
- Experience with configuration management tools such as CFEngine, Bcfg2, Puppet, Chef, or Ansible
- Experience with Amazon Web Services, Google Compute Engine, or similar
- Experience with distributed compute (e.g., Spark or Hadoop), storage (relational databases such as Postgres or MySQL, or horizontally scalable non-relational databases such as HBase, Riak, or Cassandra), and search infrastructure (such as Elasticsearch or Solr/Lucene)
- Experience in horizontally scaling a production environment by an order of magnitude, ideally in a startup or other rapid-growth environment
Health, Dental, Vision
About OPĀQ Networks
OPĀQ Networks breaks the traditional security mold with its cloud-based service that empowers organizations to simplify, centralize, and secure their networks. OPĀQ Networks’ management platform integrates networking and security in a way that reduces complexity and costs, tightens security control, and establishes a truly agile infrastructure, allowing organizations to stay ahead of emerging threats and adapt instantly to business and regulatory requirements. Based in Northern Virginia, OPĀQ Networks is a privately held company that has earned the trust of reputable brands.