Jobs at CapaCloud
About this Opportunity
We are looking for a Distributed Systems / GPU Infrastructure Engineer to help architect and scale the core infrastructure behind the CapaCloud decentralized GPU network.
You will work on GPU orchestration, node infrastructure, distributed computing systems, workload scheduling, performance optimization, and platform reliability.
This is a high-impact engineering role for someone passionate about building the next generation of decentralized AI infrastructure.
Key Responsibilities
Design and build scalable distributed GPU infrastructure
Develop systems for node orchestration and workload scheduling
Optimize GPU utilization and compute performance
Build fault-tolerant infrastructure for decentralized environments
Improve network reliability, scalability, and uptime
Develop deployment automation and infrastructure tooling
Work with AI and blockchain teams to integrate compute systems
Monitor infrastructure performance and troubleshoot bottlenecks
Contribute to backend architecture and cloud-native systems
Implement infrastructure security best practices
Required Skills & Experience
Strong experience with distributed systems and backend infrastructure
Experience with Kubernetes, Docker, and container orchestration
Strong Linux systems administration knowledge
Experience with GPU infrastructure and CUDA environments
Proficiency in Go, Rust, Python, or similar backend languages
Experience with cloud infrastructure platforms
Understanding of networking, virtualization, and load balancing
Experience building scalable APIs and infrastructure services
Familiarity with monitoring tools and observability stacks
Strong debugging and performance optimization skills
Nice To Have
Experience in decentralized infrastructure or Web3
Experience with AI/ML infrastructure
Bare-metal infrastructure experience
Experience with distributed storage systems
Knowledge of peer-to-peer networking systems
Open-source contributions
What Success Looks Like
Reliable decentralized GPU orchestration system
High-performance compute scheduling infrastructure
Reduced latency and improved GPU efficiency
Stable infrastructure scaling across multiple regions
Strong uptime and system reliability metrics
Employment Type
Full-time
Remote