Distributed Systems / GPU Infrastructure Engineer

at CapaCloud • Full-time

Location

Hybrid (Wyoming, United States)

Experience

2-4 years

Compensation

$5k-$7.5k

Must have skills

Product Management

WordPress

Web Development

About this Opportunity

We are looking for a Distributed Systems / GPU Infrastructure Engineer to help architect and scale the core infrastructure behind the CapaCloud decentralized GPU network.

You will work on GPU orchestration, node infrastructure, distributed computing systems, workload scheduling, performance optimization, and platform reliability.

This is a high-impact engineering role for someone passionate about building the next generation of decentralized AI infrastructure.

Key Responsibilities

Design and build scalable distributed GPU infrastructure
Develop systems for node orchestration and workload scheduling
Optimize GPU utilization and compute performance
Build fault-tolerant infrastructure for decentralized environments
Improve network reliability, scalability, and uptime
Develop deployment automation and infrastructure tooling
Work with AI and blockchain teams to integrate compute systems
Monitor infrastructure performance and troubleshoot bottlenecks
Contribute to backend architecture and cloud-native systems
Implement secure infrastructure best practices

Required Skills & Experience

Strong experience with distributed systems and backend infrastructure
Experience with Kubernetes, Docker, and container orchestration
Strong Linux systems administration knowledge
Experience with GPU infrastructure and CUDA environments
Proficiency in Go, Rust, Python, or similar backend languages
Experience with cloud infrastructure platforms
Understanding of networking, virtualization, and load balancing
Experience building scalable APIs and infrastructure services
Familiarity with monitoring tools and observability stacks
Strong debugging and performance optimization skills

Nice To Have

Experience in decentralized infrastructure or Web3
Experience with AI/ML infrastructure
Bare-metal infrastructure experience
Experience with distributed storage systems
Knowledge of peer-to-peer networking systems
Open-source contributions

What Success Looks Like

Reliable decentralized GPU orchestration system
High-performance compute scheduling infrastructure
Reduced latency and improved GPU efficiency
Stable infrastructure scaling across multiple regions
Strong uptime and system reliability metrics

Employment Type

Full-time
Remote
