Support-Ansible
An automation framework for hyperconverged cluster maintenance, updates, and log collection — turning repetitive, error-prone manual work into reproducible, one-command runs.
Overview
Support-Ansible is the automation framework I designed and deployed for our support team at Scale Computing. It standardizes the repetitive, high-stakes work that surrounds an escalation — cluster maintenance, updates, and log collection — so any engineer can run it the same way, every time.
The problem
When you’re working a Tier 3 escalation, the slowest part often isn’t the diagnosis — it’s the setup. Collecting logs across cluster nodes, applying maintenance steps, and prepping updates by hand is slow, easy to get subtly wrong, and hard to reproduce when you need to hand a case to engineering. Every engineer had their own slightly different way of doing it.
What I built
- Ansible playbooks that codify cluster maintenance and update procedures into repeatable, reviewable runs — no more tribal, by-hand sequences.
- Python tooling for collecting and organizing logs across nodes, so an escalation arrives at engineering with consistent, complete data.
- A standardized workflow the whole team can use, turning one person’s process into the team’s process.
Impact
It cut manual workload on routine maintenance and, just as importantly, made issues reproducible — which is what lets root causes actually get found and fixed instead of worked around. It’s a core part of how I own escalations end to end.