🇧🇷 Datacenter Hardware & Network Support Technician (Remote, from Brazil)
Context
AS+ provides run support for GPU clusters operated by a cloud infrastructure partner. We are building a support team to handle day-to-day incidents on these clusters. This first role focuses on weekday coverage. The work sits low in the stack — hardware and network diagnosis — rather than high-level HPC or application support.
Responsibilities
Diagnose and triage incidents on GPU compute clusters, determining whether a fault originates on our side or the client's.
Investigate hardware failures: collect and analyze hardware logs, identify failed components, and document findings for resolution or RMA.
Diagnose GPU hardware faults (failure detection and isolation — not performance tuning or porting).
Configure and troubleshoot network connectivity, including InfiniBand fabric.
Work directly with the client as first line of support, in English.
Required skills
Solid system and network fundamentals — low-level networking and connectivity diagnosis.
Hands-on hardware troubleshooting, ideally on Dell server hardware.
Ability to diagnose GPU hardware failures (no deep GPU expertise required).
InfiniBand knowledge (important).
Fluent English (all client communication is in English).
Not required
Advanced OS administration.
Slurm or workload-scheduler expertise.
HPC application or GPU-porting background.
Setup
Fully remote.
Weekday coverage (first hire; the team will expand to cover a wider window).
About GECI Int.
GECI International is a Technology and Digital specialist. Since its founding in 1980, the Group has innovated to design and develop smart solutions, products, and services for the Research, Industry, and Services sectors.