Location: Remote - EST timezone
Employment type: Full-time
Compensation: $100K - $130K
We are hiring on behalf of our client, who is building infrastructure to launch a highly distributed fleet of managed, single-tenant personal artificial intelligence (AI) trading agents. Running around the clock, these isolated processes execute high-frequency, complex financial workflows natively on blockchain infrastructure, each dedicated exclusively to an individual user's portfolio. The client is seeking an exceptional, production-proven Infrastructure & DevOps Engineer to take full ownership of the deployment, secure networking, architectural lifecycle, and overall reliability of this distributed agent fleet from day one.
Key Responsibilities
- Fleet Orchestration & Scaling: Architect, provision, and scale the core user agent fleet across a hybrid Railway and AWS ecosystem, ensuring each user retains an isolated, secure, and predictable containerized process with optimized cost tracking and precise lifecycle hooks.
- Secure Network Engineering: Establish, manage, and continuously harden private overlay networks using Tailscale in production, securely linking individual user agents with the core Model Context Protocol (MCP) servers and the underlying live trading runtimes.
- Automated User Provisioning: Design and build an end-to-end, zero-touch deployment pipeline using infrastructure-as-code and CI/CD best practices, enabling single-click automated provisioning of containers, secrets, and environment configuration for new users.
- Operational Resilience & SRE: Define, build, and maintain monitoring, telemetry, alerting, and automated incident response frameworks that ensure graceful state retention, preserving live in-flight transaction state across sudden host restarts, scheduled key rotations, and regional cloud outages.
- Incident Management: Oversee system health and take part in live incident response and on-call rotations to maintain operational continuity for the live global fleet.
Requirements
- Container PaaS Orchestration: Proven professional experience deploying, monitoring, and scaling complex architectures in production on Railway or an equivalent containerized platform-as-a-service (such as Fly.io, Render, or Northflank).
- Advanced AWS Proficiency: In-depth command of Amazon Web Services (AWS), with practical expertise spanning Virtual Private Cloud (VPC), Identity & Access Management (IAM), Secrets Manager, and elastic compute services (ECS / AWS Lambda).
- Production-Grade Tailscale Networking: Demonstrated experience running Tailscale in a high-security production environment, including configuring Access Control Lists (ACLs), complex subnet routing, and ephemeral node lifecycles.
- Modern Infrastructure & CI/CD: Mastery of Docker containerization, end-to-end CI/CD deployment pipelines, and modern Infrastructure-as-Code (IaC) practices.
- Blockchain & Onchain Context: Technical familiarity with blockchain mechanics, smart contract interactions, or web3 infrastructure sufficient to support decentralized application layers.
- High-Availability / Financial SRE Background: A proven professional history managing environments where system stability affects critical financial outcomes, along with full comfort handling on-call duties and live incident response.
Nice to Have
- Direct experience deploying, managing, and monitoring Large Language Model (LLM) or autonomous AI agent fleets at multi-tenant scale.
- Prior exposure to quantitative trading systems, high-frequency execution runtimes, or deep integrations with platforms such as Hyperliquid.
Benefits
- A highly competitive compensation package.
- A fully remote role with an immediate start.
- The opportunity to shape the architectural foundation of a cutting-edge technical ecosystem intersecting Artificial Intelligence and decentralized financial infrastructure.
- Access to top-tier modern tooling and infrastructure frameworks, and a highly streamlined, zero-red-tape development culture.