DevOps/k8s engineer

Almaty, Kazakhstan
Full Time
Experienced
About Tothemoon

Tothemoon is a user-centric, multiservice digital asset trading platform. At Tothemoon, we prioritize what matters most in finance: reliability. Whether it’s buying, selling, exchanging, or investing in cryptocurrencies, you can trust us to protect your financial interests and propel you towards a prosperous future. Join a rapidly growing community of users who choose Tothemoon for their digital transactions.

We offer hands-on experience, challenging tasks, and opportunities for professional and career growth within a dynamic fintech project. We’re looking for a DevOps/Kubernetes engineer to operate and develop our production infrastructure, including the Kubernetes clusters that run our API and ML workloads.

Key Responsibilities

  • Production infrastructure operations and development (90%)
    • Maintain and improve managed Kubernetes clusters (control plane, node pools, autoscaling, PodDisruptionBudgets, network policies).
    • Support API and ML workloads.
    • Set up monitoring, alerting, logging, backups, and disaster recovery procedures.
    • Investigate and resolve incidents, and take part in the on-call rotation.

  • R&D and automation (10%)
    • Research, optimize, and automate the current infrastructure setup.

Tech Stack / Core of the Project

  • Orchestration: Kubernetes (multi-pool, autoscaling, GPU workloads)

  • GPU / ML: NVIDIA H100 GPUs, NVIDIA software stack (CUDA, drivers, nvidia-device-plugin), LLM inference

Requirements

  • Deep Kubernetes experience (3+ years):
    • Designing and maintaining production clusters (preferably with autoscaling, PodDisruptionBudgets, and network policies).
    • Confident use of Deployments, StatefulSets, Ingress, RBAC, StorageClass, Helm/Kustomize.
    • Experience integrating Kubernetes with cloud providers (EKS, GKE, AKS, etc.).

  • Strong Linux background:
    • Understanding of kernel internals, the networking stack, cgroups, and namespaces.
    • Ability to diagnose performance issues (CPU, memory, IO, network).

  • GPU and high-load ML/LLM experience (a strong advantage):
    • Deploying and managing GPU-based applications in Kubernetes.
    • Basic knowledge of CUDA, NVIDIA drivers, and nvidia-device-plugin.
    • Experience monitoring GPU utilization, memory, thermals, and errors.

  • Operational and integration experience:
    • Integrating external services into Kubernetes (logging, monitoring, security, storage).
    • Building monitoring and alerting around SLOs/SLAs, and analyzing incidents end to end.
    • Writing runbooks and automating routine operations.

Why Join Us

  • A senior-level team and a friendly, collaborative environment open to innovation and experimentation.

  • Real technical challenges: high load, performance optimization, GPU infrastructure, and real-time workloads.

  • A product company, not an outsourcing shop: your contribution directly impacts the company’s core technology.

  • Opportunities for professional growth and development in AI, ML infrastructure, and blockchain computing.

  • Supportive culture and a comfortable, modern workspace.

Conditions

  • Format: On-site work in Almaty, Kulan Business Center.

  • Compensation: Competitive salary in USDT or fiat, including paid vacation and sick leave.

  • Benefits: Comfortable office and free lunches.

  • Schedule: Full-time, flexible working hours.
