Site Reliability Engineer - IBM Cloud Databases
Introduction: Software Developers are the backbone of our strategic initiatives to design, code, test, and provide industry-leading solutions that make the world run today: planes and trains take off on time, bank transactions complete in the blink of an eye, and the world remains safe because of the work our software developers do. Whether you are working on projects internally or for a client, software development is critical to the success of IBM and our clients worldwide. You will use the latest software development tools, techniques, and approaches and work with leading minds in the industry to build solutions you can be proud of.
Your Role and Responsibilities: Site Reliability Engineer - IBM Cloud Databases
- Sydney based
The Cloud Data services team is responsible for developing and operating the Software as a Service offerings that provide data services in the cloud. IBM Cloudant, Databases for PostgreSQL, Databases for Redis, Databases for Elasticsearch, Databases for etcd, Databases for MongoDB, Messages for RabbitMQ, and IBM Event Streams (IBM's Apache Kafka offering) make up the family of services that run in multiple IBM Cloud datacenters around the globe. As this set of offerings grows rapidly, we will be bringing new, secure, and compliant data service offerings to market, expanding the breadth of databases that we offer on our platform.
Candidates should have a strong desire to work within a CI/CD environment and have a passion for embracing new cloud technologies and working with our customers to ensure they are successful. You need to be collaborative, able to handle responsibility, and love learning new techniques and tools.
With quality and robustness in mind, you will drive and implement new tools to facilitate operations, create applications to gather insights into our platform, and make changes to resolve or mitigate commonly hit operational issues related to our database offerings.
As a member of the data services team, you will join the primary on-call rotation (including weekends), where you will be the primary responder for day-to-day operational issues and inquiries from our global support team. Working closely with our worldwide teams gives you a unique opportunity to gain first-hand experience with the latest database technologies. You will follow runbooks to resolve such issues and use your troubleshooting and analytical skills to diagnose platform or Data Service issues.
When not the primary responder, you will be working to improve the automation involved in operating the service, finding enhancements and innovative solutions to help the services both scale and become increasingly self-healing.
There is no requirement to be an expert in any one language. However, knowledge of Go, Python, Jenkins, Kubernetes, and Chef is useful. Knowledge of operating highly available databases such as PostgreSQL, Redis, Elasticsearch, and etcd in production environments, and/or streaming technologies such as Apache Kafka, would also be useful.
The key requirement is to have a passion for supporting, operating, and developing a high-quality, highly available service.
Required skills (5+ years of experience)
- Bachelor's degree in Engineering or Computer Science, or relevant experience
- Experience with developing monitoring for production components and instrumenting code for observability
- Ability to debug, optimize code, and automate routine tasks
- Experience in Systems Engineering, such as Linux I/O tuning, performance, memory management and troubleshooting
- Experience with programming languages
- Experience with developing and operating complex mission-critical production database systems
- Experience with troubleshooting issues in production systems
- Capability to work in a global, multicultural and diverse environment
- Fluent English
- Knowledge of Go, Python, Jenkins, Kubernetes, and Chef
- Experience working with containerised workloads and management platforms like Docker or Kubernetes
- Knowledge of operating highly available PostgreSQL, Redis, Elasticsearch, etcd, RabbitMQ, Cassandra, and MongoDB environments
- Knowledge of event streaming technologies such as Apache Kafka
- Experience with database internals and with diagnosing memory leaks, data corruption, replication issues, and database performance and tuning problems
- Past experience working with public cloud platforms like IBM Cloud, AWS, Azure, or others
- Systematic problem-solving approach, coupled with excellent communication skills and a sense of ownership and drive
ManpowerGroup is committed to being a Diversity Confident Recruiter and encourages applications from people from a diverse range of backgrounds, including people with a disability. Please indicate your preferred method of communication in your resume and please let us know if you require any reasonable adjustments should you be contacted for an interview.
Aboriginal and Torres Strait Islander people are encouraged to apply.
State: QLD, licensee/s Manpower Services (Australia) Pty Ltd, LHL-02026-D5L4Q. State: QLD, licensee/s Greythorn Pty Ltd, LHL-02014-Y5F6D. State: SA, licensee/s Manpower Services (Australia) Pty Ltd, LHS 288856