By Anuj Tuli, Chief Technology Officer
As the industry moves towards self-healing containers, agile applications, and seamless infrastructures, there is a growing need to set up auto-remediation of incidents and configuration drift. Infrastructure and Operations teams have to depend heavily on automated tools, systems, and processes to manage the ever-expanding scope of the IT landscape.
The Closed-Loop Incident Process is one subset of Closed-Loop Automation, and works as follows:
- You receive an alert for a service down in your operations center console
- An automation framework picks up the alert and fetches the information contained in its various fields (e.g. reason for alert, configuration item). If the configuration item (CI) that raised the alert does not exist in the Configuration Management Database (CMDB), the framework creates the corresponding CI there. Once the CI exists in the CMDB, the framework creates an Incident Ticket in your IT Service Management system.
- The framework auto-remediates the issue based on the custom runbooks you have defined for your organization. For example, if the disk is full, it deletes the logs and removes any temporary files. The Incident Ticket is also updated with the results of the remediation effort.
- If the auto-remediation succeeds, the associated Incident Ticket is updated and closed. If the auto-remediation fails for any reason, a notification is sent out for human intervention.
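The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a real integration: the names `Alert`, `Incident`, and `handle_alert` are hypothetical, and the CMDB, ticketing system, and notification channel are stood in for by a plain dictionary, a list, and a callback. It also assumes that an Incident Ticket is opened for every alert once the CI is present in the CMDB.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Alert:
    ci_name: str  # configuration item that raised the alert
    reason: str   # alert type, e.g. "disk_full"


@dataclass
class Incident:
    ci_name: str
    reason: str
    status: str = "open"
    notes: List[str] = field(default_factory=list)


# Hypothetical runbook contract: takes the CI name, returns True on success.
Runbook = Callable[[str], bool]


def handle_alert(
    alert: Alert,
    cmdb: Dict[str, dict],          # stand-in for the CMDB
    incidents: List[Incident],      # stand-in for the ITSM ticket store
    runbooks: Dict[str, Runbook],   # alert reason -> remediation runbook
    notify: Callable[[Incident], None],  # stand-in for paging a human
) -> Incident:
    # Make sure the CI is recorded, then open an incident (assumed order).
    if alert.ci_name not in cmdb:
        cmdb[alert.ci_name] = {"source": "created from alert"}
    incident = Incident(alert.ci_name, alert.reason)
    incidents.append(incident)

    # Run the runbook matching the alert reason, if one is defined.
    succeeded = False
    runbook = runbooks.get(alert.reason)
    if runbook is None:
        incident.notes.append("no runbook defined for this alert type")
    else:
        try:
            succeeded = runbook(alert.ci_name)
        except Exception as exc:  # a crashing runbook counts as a failure
            incident.notes.append(f"runbook raised: {exc}")
        incident.notes.append(
            "remediation succeeded" if succeeded else "remediation failed"
        )

    # Close the loop: close the ticket on success, escalate otherwise.
    if succeeded:
        incident.status = "closed"
    else:
        notify(incident)
    return incident
```

In this sketch, a disk-full runbook would be registered as `runbooks["disk_full"] = some_function`, and any alert type without a runbook falls through to human notification while its ticket stays open.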
Many organizations have already adopted this automated remediation process and expanded it to cover the five most common alert types on which they spend the most time. In most cases, they are automating consistent, repeatable processes that an engineer works on again and again, day in and day out. Automating these processes has saved these organizations many hours of manual work, reduced human error, and added tangible efficiency to their infrastructure and operations teams.
If you need assistance in building the auto-remediation framework, Keyva can help. If you’d like to talk about how other organizations have garnered benefits from such automation, please feel free to reach us at [email protected]
About the Author
Anuj Tuli, Chief Technology Officer
Anuj specializes in developing and delivering vendor-agnostic solutions that avoid the “rip-and-replace” of existing IT investments. He has worked on Cloud Automation, DevOps, Cloud Readiness Assessments, and Migration projects for healthcare, banking, ISP, telecommunications, government and other sectors. He leads the development and management of Cloud Automation IP (intellectual property) and related professional services. During his career, he has held multiple roles in the Cloud, Automation, and DevOps domains. With certifications in AWS, VMware, HPE, BMC and ITIL, Anuj offers a hands-on perspective on these technologies.
Like what you read? Follow Anuj on LinkedIn at https://www.linkedin.com/in/anujtuli/