About the job
Ensure 24/7 availability and performance of network infrastructure through monitoring, incident response, escalation management, and technical support while maintaining SLAs and collaborating with cross-functional teams.
Responsibilities
Network Monitoring & Performance
Monitor infrastructure using SolarWinds, PRTG, Nagios, Zabbix, and proprietary systems such as Cosmos
Analyze performance trends, bandwidth, latency, packet loss, and KPIs
Maintain monitoring thresholds, alert policies, and SLA metrics
Incident & Problem Management
Respond to alerts and outages within SLA timeframes
Perform triage, diagnosis, and manage the incident lifecycle
Conduct root cause analysis and document lessons learned
Track trends and recommend preventive measures
Escalation Management
Serve as the primary escalation point for Tier 1 and Tier 2 support teams
Escalate complex issues to Tier 4 engineers and specialized teams
Coordinate vendor support and lead bridge calls during major incidents
Manage communication between technical teams, management, and stakeholders
Network Operations & Maintenance
Execute firmware updates, patches, and configuration changes
Configure routers, switches, firewalls, load balancers, and wireless controllers
Implement VLANs, access control policies, and VPN connections
Perform change management and maintenance window activities
Troubleshooting & Support