Enterprise IT has changed considerably with the arrival of new technologies and trends, including the consumerization of IT, BYOD, and cloud computing. Unfortunately, many of these changes have also opened the door to escalating cyber security challenges. Corporate security teams now have to protect the enterprise across global venues and can no longer treat security as a siloed function of “wall building” and defense. Attacks by sophisticated cyber criminals and hackers call for proactive cyber security processes, in which enterprises continuously hunt for current and potential threats.
Today’s cyber security threats frequently show up as patterns of activity that deviate from the expected behavior of authorized users, or from the usual activity on particular devices or IP addresses. To identify and fight such attacks, security teams need technology that can find and analyze these deviations. This is a natural fit for big data mining and analytics, predictive analytics in particular. With the results, enterprises can take the necessary actions to block such threats and make improvements to prevent future attacks.
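To make "deviation from expected behavior" concrete, here is a minimal sketch that compares each user's activity today against that user's own history using a z-score. The function name `find_deviations`, the daily login counts, and the threshold of 3 standard deviations are all illustrative assumptions, not something prescribed by any particular product; real deployments would use richer features and models.

```python
from statistics import mean, stdev

def find_deviations(history, today, threshold=3.0):
    """Return {user: z_score} for users whose activity today is far
    from their own historical baseline.

    history: {user: [past daily counts]}  (hypothetical input shape)
    today:   {user: count observed today}
    """
    flagged = {}
    for user, counts in history.items():
        if len(counts) < 2:
            continue  # not enough history to estimate a spread
        mu = mean(counts)
        sigma = stdev(counts)
        if sigma == 0:
            continue  # constant baseline; a z-score is undefined here
        z = (today.get(user, 0) - mu) / sigma
        if abs(z) >= threshold:
            flagged[user] = round(z, 2)
    return flagged

# Made-up sample data: daily login counts per user.
history = {
    "alice": [12, 15, 11, 14, 13],
    "bob": [3, 4, 2, 3, 4],
}
today = {"alice": 14, "bob": 40}

flagged = find_deviations(history, today)
print(flagged)  # only "bob" is far outside his usual range
```

A per-user baseline is used here because "expected behavior" differs from one user or device to the next; a single global threshold would miss a quiet account that suddenly becomes busy.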
Machine-generated data provides fertile ground for using big data analytics to root out cyber security threats. But machine-generated data can be quite challenging to aggregate, mine, and analyze. Such data must be processed very quickly, frequently in real time, and usually exists in large volumes that grow continuously. Machine data sources are also quite disparate, and many of them use multi-structured formats. Machine data lives throughout the IT infrastructure: network logs, event logs, firewall and security system data, web logs, email logs, and anything else operating in the infrastructure.
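As a rough illustration of what handling disparate, multi-structured sources involves, the sketch below maps two differently shaped log lines onto one common record. The web log line follows the widely used common log layout; the JSON firewall event and its field names (`epoch`, `src`, `verdict`), as well as the target schema, are assumptions made up for this example.

```python
import json
import re
from datetime import datetime, timezone

# Pattern for a common-log-style web server line.
WEB_LOG = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<ts>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3})'
)

def from_web_log(line):
    """Parse one web log line into the shared (hypothetical) schema."""
    m = WEB_LOG.match(line)
    if m is None:
        return None  # leave unparseable lines for separate handling
    ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    return {
        "source": "web",
        "time": ts.astimezone(timezone.utc).isoformat(),
        "ip": m.group("ip"),
        "user": None if m.group("user") == "-" else m.group("user"),
        "action": m.group("request"),
    }

def from_firewall_json(line):
    """Parse one JSON firewall event (assumed field names) into the schema."""
    event = json.loads(line)
    ts = datetime.fromtimestamp(event["epoch"], tz=timezone.utc)
    return {
        "source": "firewall",
        "time": ts.isoformat(),
        "ip": event["src"],
        "user": event.get("user"),
        "action": event["verdict"],
    }

# Made-up sample lines from two different sources.
web_line = '203.0.113.7 - alice [10/Oct/2013:13:55:36 +0000] "GET /login HTTP/1.1" 200'
fw_line = '{"epoch": 1381413336, "src": "203.0.113.7", "verdict": "deny"}'

records = [from_web_log(web_line), from_firewall_json(fw_line)]
for record in records:
    print(record)
```

Once every source is reduced to the same keys and a single time zone, events from different systems can be sorted, joined, and counted together.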
Once data has been extracted from machine-generated sources, it can be enriched with other kinds of data to reveal patterns and trends related to cyber attacks. The analytics take on a forensic quality, searching through data for patterns of irregular or unexpected activity. Big data analytics require complex data modeling and queries, with continuous refinement and testing, to scour all of the data. Based on the understanding derived from the analytics results, machine learning algorithms can be developed for ongoing monitoring of systems to detect new threats. Analytics and monitoring approaches have to be constantly altered and fine-tuned to anticipate ever-changing cyber-threat tactics. Situational awareness is another important aspect of fighting cyber attacks. Big data analytics strengthen situational awareness primarily through fast, real-time assessments that reduce the time it takes to decide and act in response to potential threats and anomalies.
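One simple form of enrichment is to attach context from other data sets to each extracted event. The sketch below adds an asset owner and a watchlist flag; the `enrich` function, the `host` and `ip` event fields, the asset inventory, and the watchlist network are all hypothetical and stand in for whatever reference data an enterprise actually maintains.

```python
import ipaddress

def enrich(events, asset_owners, watchlist_networks):
    """Return copies of events with owner and watchlist context added.

    events:             list of dicts with "host" and "ip" keys (assumed)
    asset_owners:       {hostname: owning team}
    watchlist_networks: list of CIDR strings considered suspicious
    """
    networks = [ipaddress.ip_network(n) for n in watchlist_networks]
    enriched = []
    for event in events:
        addr = ipaddress.ip_address(event["ip"])
        record = dict(event)  # copy so the raw event stays untouched
        record["asset_owner"] = asset_owners.get(event["host"], "unknown")
        record["on_watchlist"] = any(addr in net for net in networks)
        enriched.append(record)
    return enriched

# Made-up sample data using documentation-reserved address ranges.
events = [
    {"host": "db01", "ip": "198.51.100.23", "action": "login_failed"},
    {"host": "web02", "ip": "192.0.2.10", "action": "login_ok"},
]
asset_owners = {"db01": "finance"}
watchlist = ["198.51.100.0/24"]

enriched = enrich(events, asset_owners, watchlist)
for record in enriched:
    print(record)
```

With this context attached, a query such as "failed logins against finance systems from watchlisted networks" becomes a simple filter over the enriched records.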
The advent of Hadoop processing infrastructures is significantly improving the outcomes of big data analytics in applications such as cyber security. Middleware offerings are becoming available that reduce processing time and bottlenecks and provide better tools for quickly developing data mining and analytics processes. With cost-effective and efficient tools, enterprises can ask many more kinds of questions, test (and fail) in many ways, and explore more of what might be found in large volumes of machine-generated data.
For all of these techniques, it is essential to involve a variety of domain experts to prescribe the right methodologies and to validate results and approaches. Expertise in cyber threats, combined with human experience and insight, brings vital perspectives to solving security problems. There is a lot of work to do to create and maintain vigilant processes to fight cyber attacks, as a recent ESG research report shows:
While big data security analytics will roll out faster than most people think, there are bound to be some speed bumps along the way. In fact, some of the more annoying short-term issues will be around basic operational tasks like collecting, normalizing, and sharing security data in a multitude of formats, schemas, and syntaxes.
- 54 percent are experiencing “significant difficulties” or “some difficulties” with security data normalization
- 54 percent are experiencing “significant difficulties” or “some difficulties” with security data capture
- 52 percent are experiencing “significant difficulties” or “some difficulties” with security data sharing