By Omry Koschitzky, VP Solutions of XpoLog
Complex IT environments have made troubleshooting increasingly difficult, but remedies including the DevOps trend and IT log analysis tools have emerged in response. These approaches can be a powerful combination for faster resolution.
It’s never good when mission-critical applications fail; it’s even worse when this happens after normal business hours. Application uptime matters a great deal, because users want constant connectivity and IT is critical to many business operations.
An application has many stakeholders, from DevOps to compliance and security teams and, of course, the sponsor. Their motivations for rapid troubleshooting might differ, but time-to-resolution is a unifying factor that dissolves old divisions. But how do you determine where the problem lies? The cause can be elusive when users complain about transaction failures. It may be an integration issue, bad code, or a problem with infrastructure, security, load, and so on.
Deployments with several components running in private and public clouds, or in highly virtualized hybrid environments, make triage a complicated task, and the adoption of cloud services is only increasing. A recent Gartner survey of 651 organizations found that only 38% were using cloud services, but 80% will be within the next 12 months. That’s a recipe for greater complexity.
DevOps to the rescue
A holistic approach is needed to find exactly where something went wrong. DevOps provides a number of benefits including closer collaboration between IT and developers. These departments didn’t always see eye-to-eye, and developers were sometimes thought of as second-class citizens. DevOps has changed that by making everyone a stakeholder working for a common purpose, and can result in much faster code deployments (30%) and remarkably fewer failures (50%), according to a recent Puppet Labs survey.
“Why is DevOps reshaping enterprise IT? Quite simply, because it works. Because IT operations and development are better in collaboration than in competition,” said Matt Asay, vice president of business development and corporate strategy at MongoDB. “63% of organizations have adopted DevOps practices,” he noted.
Teamwork is essential to the DevOps triage process, which can untangle complex systems to determine a root cause. Otherwise, it would be difficult to learn whether a transaction failed across an entire tier (e.g., all app servers), whether only one server is causing the problem, or whether bad code or a third-party or cloud service is to blame. It’s almost impossible to use old-style methodologies in such dynamic environments, but adopting DevOps practices is a good starting point.
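The tier-versus-single-server question above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the function name, the host names, and the 80% threshold are hypothetical and are not drawn from any particular product.

```python
from collections import Counter


def failure_scope(failed_hosts, tier_hosts, threshold=0.8):
    """Classify where failed transactions are concentrated within one tier.

    failed_hosts: iterable of host names, one entry per failed transaction.
    tier_hosts:   set of every host in the tier (e.g., all app servers).
    threshold:    assumed cutoff; if one host accounts for at least this
                  share of failures, the problem is called single-host.
    """
    # Count failures per host, ignoring hosts outside the tier.
    counts = Counter(h for h in failed_hosts if h in tier_hosts)
    total = sum(counts.values())
    if total == 0:
        return "no-failures"

    top_host, top_count = counts.most_common(1)[0]
    if top_count / total >= threshold:
        return f"single-host:{top_host}"

    # Every host in the tier has failed at least once.
    if set(counts) == set(tier_hosts):
        return "tier-wide"

    return "partial"
```

A "tier-wide" result would point the team toward shared causes such as bad code or a dependent service, while a "single-host" result suggests looking at that machine first.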
In fact, DevOps yields superior results over time. The Puppet Labs survey found that failure rates and time-to-resolution continue to fall the longer DevOps practices are followed. Puppet Labs commercializes open source IT automation software, and it is only one player in a booming market to support DevOps. Other companies include IBM and Opscode, the company behind Chef, an open source tool. Gartner has rated several tool companies as cool vendors for DevOps. Tools establish a framework for collaboration with a range of capabilities that can scale up to large enterprises.
Finding the needle in a needles factory
However, collaboration is only one aspect of the triage process. Teams also require tools that are flexible enough to support dynamic data center architectures and powerful enough to provide real-time analysis of the data. That’s where IT log analysis comes into play. Application and server logs can exist in many places and formats, so analysis tools have evolved to centralize log management and search.
Some tools even utilize correlation and machine learning to help users triage their application problems by filtering out unrelated events. For instance, it’s now possible for software to determine that 25 log entries out of thousands, all related to database connectivity, are urgent. These tools can all help to resolve such a database problem, but they can differ in time-to-resolution depending on the level of automation that the project leaders or vendor have provided.
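To make the filtering idea concrete, here is a minimal Python sketch that picks the urgent database-connectivity entries out of a larger set of log lines. It uses simple rules rather than the correlation or machine learning that commercial tools may apply, and the log line layout (timestamp, level, message), the keyword list, and the function name are all assumptions made for illustration.

```python
import re

# Assumed severity levels that count as urgent.
URGENT_LEVELS = {"ERROR", "FATAL"}

# Assumed keywords that indicate a database connectivity problem.
DB_PATTERN = re.compile(
    r"database|jdbc|connection (refused|timed out|pool)", re.IGNORECASE
)

# Assumed line layout: "<timestamp> <LEVEL> <message>".
LINE_PATTERN = re.compile(r"^(?P<ts>\S+) (?P<level>[A-Z]+) (?P<msg>.*)$")


def urgent_db_entries(lines):
    """Return the lines that are both urgent and database-related."""
    urgent = []
    for line in lines:
        match = LINE_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        if match["level"] in URGENT_LEVELS and DB_PATTERN.search(match["msg"]):
            urgent.append(line)
    return urgent
```

In practice the rules would be tuned to each application's log formats; the point is only that a handful of relevant entries can be separated from thousands of unrelated ones automatically.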
Dark Reading has acknowledged that log analysis could be a daunting task prior to the introduction of augmented search results, but added: “When done right, however, it is a process that can improve response time for both operational and security staff.”
Log analysis tools can vary considerably in utility, but there are many mature options available, both commercial and free. Every organization has its own requirements and can determine which tool is best for its DevOps teams. Even basic log mining makes DevOps more effective and helps with the triage of problems, which is important for increasingly complex environments.
Organizations should consider implementing DevOps practices alongside IT log analysis tools. Receiving an alert about a transaction failure at 10pm could be a disaster, but the right organizational resources can swiftly turn a would-be late-night crisis into an easily resolved incident and extra pillow time.