Maintaining robust security is essential to keeping up with the increasing complexity of cyber threats. Continuous security monitoring offers constant visibility of the IT ecosystem, including infrastructure, networks, endpoints, cloud services, applications, data flows, and user identities (via IAM). It also helps with detecting and responding to threats in real time.
This article discusses best practices for effectively implementing and managing continuous security monitoring. We also look at tools and platforms that help establish effective continuous security monitoring.
Summary of key continuous security monitoring best practices
Best practice | Description
Establish security objectives | Define precise security monitoring goals that align with business requirements. Focus on critical assets and high-impact threats rather than attempting to monitor everything.
Implement effective logging and monitoring | Set up a scalable security telemetry pipeline to track and monitor system activities, enabling the real-time identification of anomalies or security threats.
Real-time detection and analysis | Implement continuous, sub-second monitoring of systems, networks, and applications to detect security incidents as they occur rather than after the fact.
Enable cross-correlation capability | Use tools that can correlate events across different systems to identify complex attack patterns that might not be evident in isolated logs.
Conduct regular security scanning | Adopt regular vulnerability scanning of the IT environment, code, and configuration files. A proactive approach helps clarify the scope of possible security risks.
Integrate compliance monitoring | Align monitoring practices with relevant regulatory and industry compliance requirements to satisfy both security and compliance objectives.
Develop an incident response plan | Create a structured plan outlining responses to various security incidents and ensure that stakeholders know their roles.
Maintain a continuous improvement cycle | Implement a formal process for regularly reviewing and updating the monitoring process based on new threats and post-mortems.
Establish security objectives
Setting security objectives is the foundation of an effective monitoring strategy. Organizations need standardized methods and rules defined by their security objectives to protect digital assets and physical infrastructure. The end goal of these objectives is to maintain the pillars of information security known as the CIA triad: confidentiality, integrity, and availability.
Key stakeholders (security, IT, risk, and business leadership) can establish an organization’s security objectives in a few steps that address the following key areas.
Align with the business goal
Security objectives should be in sync with broader business goals. It is hardly possible to cover every potential attack vector, so it makes sense to focus security monitoring efforts on the most critical parts of the business. For instance, a company engaged in online retail may focus its security monitoring on the core components of the ecommerce sales pipeline, such as the availability of the underlying infrastructure and the protection of customer data.
Define specific metrics
Defining measurable metrics helps evaluate the organization's security posture and identify potential bottlenecks. These metrics usually include the number of security incidents, mean time to detect (MTTD) threats, and mean time to respond to them (MTTR).
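As a simple illustration of how these metrics are computed, the sketch below derives MTTD and MTTR from a set of hypothetical incident records with occurred, detected, and resolved timestamps; the data and field names are assumptions made for the example, not output from any specific tool.

from datetime import datetime

# Hypothetical incident records: when each incident occurred, was detected, and was resolved.
incidents = [
    {"occurred": datetime(2024, 5, 1, 10, 0), "detected": datetime(2024, 5, 1, 10, 20), "resolved": datetime(2024, 5, 1, 12, 0)},
    {"occurred": datetime(2024, 5, 3, 8, 0), "detected": datetime(2024, 5, 3, 8, 5), "resolved": datetime(2024, 5, 3, 9, 0)},
]

# MTTD: average time from occurrence to detection.
# MTTR: average time from detection to resolution (exact definitions vary between teams).
mttd = sum((i["detected"] - i["occurred"]).total_seconds() for i in incidents) / len(incidents)
mttr = sum((i["resolved"] - i["detected"]).total_seconds() for i in incidents) / len(incidents)

print(f"MTTD: {mttd / 60:.1f} minutes, MTTR: {mttr / 60:.1f} minutes")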
Prioritize assets
The challenge is to conduct a risk assessment and identify the most valuable assets. Assets include critical data, applications, servers, network infrastructure, and other key resources, and they are not equally critical. As previously mentioned, the primary goal of security monitoring is not to cover everything but to focus on the high-impact threats that could cause the most damage.
Implement a security framework
Adopting a comprehensive security framework, such as ISO 27001 or NIST, helps the organization manage security risks. A key component required by these broad frameworks is vulnerability management.
Frameworks like the Common Vulnerability Scoring System (CVSS) and Stakeholder-Specific Vulnerability Categorization (SSVC) aim to assess and categorize vulnerabilities in software and systems. CVSS is an open industry standard that provides a numerical severity score for vulnerabilities, while SSVC is a decision-making framework that helps prioritize vulnerabilities based on the unique perspectives of various stakeholders.
These systems and methods are complementary and designed to work together. ISO/NIST guides the overall security program, CVSS provides severity data, and SSVC helps prioritize actions within that program's vulnerability management process.
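To make the interplay concrete, the sketch below prioritizes findings by combining a CVSS base score with stakeholder context such as asset criticality and known exploitation, loosely following SSVC's Track/Attend/Act outcomes. The thresholds, input fields, and sample findings are illustrative assumptions, not values taken from either standard.

# Illustrative prioritization combining CVSS severity with stakeholder context
# (loosely inspired by SSVC decision factors; thresholds and data are assumptions).

findings = [
    {"id": "finding-1", "cvss": 9.8, "asset_critical": True, "exploited_in_wild": True},
    {"id": "finding-2", "cvss": 6.5, "asset_critical": False, "exploited_in_wild": False},
]

def priority(finding):
    # Escalate when a vulnerability is actively exploited on a critical asset.
    if finding["exploited_in_wild"] and finding["asset_critical"]:
        return "act"        # remediate immediately
    if finding["cvss"] >= 7.0 and finding["asset_critical"]:
        return "attend"     # schedule remediation soon
    if finding["cvss"] >= 7.0:
        return "track*"     # monitor closely
    return "track"

for finding in findings:
    print(finding["id"], priority(finding))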
Implement effective logging and monitoring
After establishing security objectives, the next step is to set up the security monitoring infrastructure. The monitoring components, such as dashboards, analytics, and alerts, require accurate input, which a scalable and highly available telemetry pipeline can provide. Picking an efficient security telemetry pipeline is therefore an excellent starting point for implementing security monitoring.
A telemetry pipeline is a solution that delivers data—logs, metrics, and events—from the source (e.g., the OS, application server, or network application) to a log collection destination. However, the pipeline's value lies not in the destination but in the changes and transformations applied to the raw data before it is ingested into a data warehouse or security information and event management (SIEM) tool.
Several points need to be considered when designing a data collection pipeline:
Data source types, like OS log files, API endpoints, database logs, network inputs, application logs and metrics, etc.
Requirements for compliance
Requirements for data transformation
Requirements for data ingest latency
Requirements for data distribution across consumers (e.g., data warehouses, SIEM systems, and other log collection and storage solutions)
After identifying the relevant data sources and details about data types, storage, and transformations, security engineering teams must determine how to collect the data. This involves setting up data collection agents or using methods like APIs or custom integrations to connect the data sources to the telemetry pipeline.
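As a minimal illustration of this step, the sketch below tails a log file and forwards each new line to a pipeline's HTTP ingestion endpoint. The endpoint URL and token are hypothetical placeholders, and real deployments typically use dedicated collection agents or vendor integrations rather than hand-rolled scripts.

import time
import requests  # third-party HTTP client

PIPELINE_URL = "https://pipeline.example.com/ingest"  # hypothetical ingestion endpoint
API_TOKEN = "changeme"                                # hypothetical credential

def tail_and_forward(path):
    """Follow a log file and forward each new line to the telemetry pipeline."""
    with open(path, "r") as f:
        f.seek(0, 2)  # start at the end of the file, like `tail -f`
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.5)
                continue
            requests.post(
                PIPELINE_URL,
                json={"source": path, "message": line.strip()},
                headers={"Authorization": f"Bearer {API_TOKEN}"},
                timeout=5,
            )

if __name__ == "__main__":
    tail_and_forward("/var/log/auth.log")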
Observability solutions like Onum can help you take full control of data pipelines, reduce the cost of managing telemetry data, remove irrelevant or repetitive data, transform data before ingestion, and ensure that the data format is correct and easy to analyze.
Onum accomplishes this by doing the following:
Taking control and reducing costs: Onum sits between data sources (servers, applications, cloud services, etc.) and data destinations (like SIEM systems, data lakes, monitoring tools). This intermediary role is key to cost savings because it allows Onum to filter out redundant or low-value data while transforming data and reducing data volumes by removing unnecessary fields or aggregating information.
Removing irrelevant or repetitive data: Security telemetry pipeline solutions, like Onum, act as a data refinery, significantly improving the quality and efficiency of the data stream before it reaches security analysts or downstream tools.
Transforming data and ensuring correct format: Before data lands in your SIEM or data lake, Onum can modify it to make it more valuable and consistent.
For example, some log types, like the NetFlow data shown below, contain many details and require substantial amounts of storage space. At the same time, each log can often be narrowed down to just the portion needed for analytics, monitoring, or compliance retention.
date="2022-02-18" time="15:54:39" tz="PST" logid="0608000001" devid="FAIVMSTM21000033" type="ndr" subtype="ML" severity="low" sessionid=1135774 alproto="DNS" tlproto="TCP" srcip="212.102.209.138" srcport=7033 dstip="123.31.20.78" dstport=63294 srcmac="63:c5:7c:8b:12:da" dstmac="cb:8a:f9:18:bd:81" vlanid=0
Real-time detection and analysis
Traditionally, security monitoring operated with some periodicity, meaning that analytics were performed only after observability signals had been collected over time. Periodic monitoring has significant downsides that should not be underestimated: It leaves critical gaps and may allow attackers to compromise security. Some organizations still choose periodic monitoring for its low cost and easy implementation, but continuous monitoring has become standard practice in recent years.
For greater reliability, there is no real alternative to a real-time security monitoring approach. With continuous, sub-second monitoring, data can be analyzed and anomalies detected as they occur rather than after the fact. At the same time, real-time monitoring raises the bar for the observability data pipeline, which must deliver observability data within seconds or milliseconds.
The following table summarizes the benefits of real-time detection and monitoring.
Monitoring aspect | Benefit
Rapid threat detection | When signals of potential compromise are collected and analyzed in real time, the security team can catch security incidents earlier and prevent them from escalating.
Proactive incident response | Continuous real-time monitoring transforms incident monitoring and management from reactive to proactive. Proactivity enables recognizing weak spots in systems, applications, and configurations before somebody can exploit them.
Greater visibility | Improved visibility means fully understanding what is happening across the IT environment, including networks, endpoints, applications, and cloud services. This also reduces latency when the security team deals with ongoing security incidents.
Improved compliance management | Real-time compliance monitoring identifies areas where the organization may not comply with internal policies or regulatory requirements.
Informed decision-making | Continuous real-time monitoring supplies vital information needed to support risk management decisions. It helps move toward data-driven risk management instead of risk management passively driven by compliance.
Onum's ability to parse, normalize, enrich, and filter data before it reaches analytical tools (like SIEMs) is key for real-time analysis. By offloading this processing work from the destination tools, Onum ensures that data arrives “analysis-ready.” This allows SIEMs and detection engines to apply correlation rules and run queries more quickly, minimizing the critical time gap between an event happening and it being detected or investigated.
Enable cross-correlation capability
While monitoring single observability signals is a good starting point, actionable insights into the entire technology stack (including networks, systems, applications, and cloud services) require data correlation. Log monitoring tools like ELK stack, Splunk, Graylog, or QRadar that correlate data from different sources are important for getting an end-to-end view. Ingesting data from diverse sources and using correlation engines to link related events reveals the full picture of an activity or incident.
Consider the example of a security observability setup that monitors corporate network devices and office employees' workstations. First, a VPN device reports that user "Michael" successfully logged into the corporate network from a foreign IP address:
date=2019-05-13 time=15:55:56 logid="0102043008" type="event" subtype="vpn" logdesc="Authentication success" srcip=104.104.130.115 policyid=1 interface="port10" user="michael" group="workgroup" action="authentication" status="success" msg="User michael succeeded in authentication"
Looking at this log alone, the security operations team cannot tell whether Michael went on a business trip or simply wanted to check his email while on vacation. On its own, the event is not flagged as suspicious if no policy restricts VPN logins from foreign countries.
However, if we have access to the Windows Event Logs from the employee's PC in the office, we can find an event about a successful login to Michael's PC account in the office:
An account was successfully logged on.
Subject:
Security ID: SYSTEM
Account Name: MICHAEL
Account Domain: WORKGROUP
Logon ID: 0x3E7
That raises suspicions because Michael cannot simultaneously be in two different places—at least one of these successful logins has a suspicious origin. This represents a use case with a potentially compromised user account using a simple correlation between the VPN network log and the local Windows Event Log.
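Expressed in code, the correlation logic itself is simple. The sketch below is a simplified, hand-written illustration that assumes the two events have already been parsed into dictionaries (the workstation timestamp is invented for the example); in practice, this logic would live in a SIEM correlation rule rather than an ad hoc script.

from datetime import datetime, timedelta

# Simplified, already-parsed events from the two sources described above.
vpn_login = {"user": "michael", "source": "vpn",
             "time": datetime(2019, 5, 13, 15, 55, 56)}
office_login = {"user": "michael", "source": "workstation",
                "time": datetime(2019, 5, 13, 15, 40, 12)}  # illustrative timestamp

def impossible_copresence(a, b, window=timedelta(hours=1)):
    """Flag the same account appearing in two different locations within a short window."""
    return (a["user"] == b["user"]
            and a["source"] != b["source"]
            and abs(a["time"] - b["time"]) <= window)

if impossible_copresence(vpn_login, office_login):
    print(f'ALERT: account "{vpn_login["user"]}" active in two locations within one hour')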
This example shows the importance of cross-correlation for analyzing data and looking for anomalies. Security teams can automate this process using anomaly detection tools and correlation rules. Security teams leverage tools like SIGMA to create and share rules to detect security threats. These tools are vendor-agnostic and may be used to share detection content without translating a query from one SIEM solution syntax into another.
SIGMA is used primarily for pattern detection within log messages. A related use case, a successful brute-force attack, can be detected with two rules correlated with each other. The first rule detects multiple failed logins within a time window.
title: Multiple failed logins
id: a8418a5a-5fc4-46b5-b23b-6c73beb19d41
description: Detects multiple failed logins within a certain amount of time
name: multiple_failed_login
correlation:
  type: event_count
  rules:
    - failed_login
  group-by:
    - User
  timespan: 10m
  condition:
    gte: 10
The second rule catches a successful login. While a successful login does not mean the account is compromised, a successful login after a chain of failed logins may mean a successful brute-force attempt.
title: Successful login
id: 4d0a2c83-c62c-4ed4-b475-c7e23a9269b8
description: Detects a successful login
name: successful_login
logsource:
  product: windows
  service: security
detection:
  selection:
    EventID:
      - 528
      - 4624
  condition: selection
The correlation in this scenario combines the logic of two previous rules:
title: Correlation - Multiple Failed Logins Followed by Successful Login
id: b180ead8-d58f-40b2-ae54-c8940995b9b6
status: experimental
description: Detects multiple failed logins by a single user followed by a successful login of that user
references:
  - https://reference.com
author: Florian Roth (Nextron Systems)
date: 2023-06-16
correlation:
  type: temporal_ordered
  rules:
    - multiple_failed_login
    - successful_login
  group-by:
    - User
  timespan: 10m
falsepositives:
  - Unlikely
level: high
The main goal of this kind of correlation is to provide meaningful insight into potentially successful brute-force attacks, unlike a single rule that simply shows a sequence of failed logins without any context.
Conduct regular security scanning
Regular security scanning helps teams understand the scope of possible security risks to prevent data breaches. In data observability, it also ensures that the platform, which holds sensitive metadata, is secure. This process includes network scanning, vulnerability assessments, penetration testing, and more. In the context of a data observability platform, it extends to scanning the platform's infrastructure, data pipelines, and the data itself for potential security risks.
Keeping scanning tools and techniques up to date to detect evolving threats is an ongoing challenge, and continuous updates of scanning tools and vulnerability databases are a pillar of cybersecurity. Scanning large and complex data environments without impacting performance can also be difficult.
Decide on the scanning scope and how often scans should be executed. Sensitive assets are usually scanned weekly, while less crucial ones get monthly or annual check-ups.
Source code scanning is another important part of detecting vulnerabilities in the earlier stages of the software development lifecycle. Static application security testing (SAST) tools scan and analyze source code and library dependencies to identify cybersecurity flaws without running the application. This proactive approach helps engineers catch and remediate known issues such as cross-site scripting (XSS) and SQL injections before the code moves to production or pre-production environments.
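As a concrete illustration of the kind of flaw these tools flag, the snippet below contrasts a SQL query built with string formatting (a classic injection pattern that SAST scanners report) with the parameterized alternative. It is a generic example, not output from any particular tool.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")

def find_user_unsafe(name):
    # Typically flagged by SAST tools: user input concatenated into the SQL statement (injection risk).
    return conn.execute(f"SELECT role FROM users WHERE name = '{name}'").fetchall()

def find_user_safe(name):
    # Parameterized query: the driver treats the input as data, not executable SQL.
    return conn.execute("SELECT role FROM users WHERE name = ?", (name,)).fetchall()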
Selecting the right SAST tool requires considering various criteria, such as understanding the needed languages, libraries, or frameworks, the ability to run against binaries, ease of deployment/use, and license cost. Depending on the programming languages and libraries supported, SAST tools can be divided into two categories:
Multi-language tools: These are platforms that support a variety of programming languages.
Language-dedicated tools: Tools focused on a specific programming language/library, such as:
Bandit: A tool that finds known security issues in Python code
Brakeman: A static analysis tool designed to check Ruby on Rails applications
Find Security Bugs: A SpotBugs plugin used for security audits of Java applications
Another approach is dynamic application security testing (DAST), which tests a running application. DAST aims to identify issues at runtime while simulating attacks, helping uncover flaws such as broken authentication or code injection. Since this method evaluates running applications, it is language-agnostic.
To make security scanning more efficient, integrate SAST/DAST into your CI/CD pipeline, implement prioritization of critical assets, and automate the process.
Once the scanning report is complete, the next step is remediation. Patches and updates should be applied with attention to vulnerability severity, the criticality of affected assets, and time to remediation. No one wants to remain exposed to critical vulnerabilities or have their business stop for an extended period.
Integrate compliance monitoring
Modern regulations are complex, and mapping them to specific data and system configurations poses a significant challenge. Real-time security monitoring helps organizations comply with regulatory requirements like GDPR, HIPAA, and PCI DSS. This helps avoid legal penalties while maintaining customer trust. Additionally, aligning with regulations allows organizations to demonstrate diligence and accountability. Real-time monitoring also provides visibility into compliance status, allowing immediate remediation when violations occur.
Many compliance regulations emphasize the principle of data minimization. Organizations should only collect and store the data necessary for specific, legitimate purposes. This means storing only the data required by relevant regulations. Such an approach also brings additional advantages like:
Cost savings on data storage
Easier data management
Lower impact of potential data breaches
Optimized compliance reporting
Integrating compliance monitoring with existing security tools can be complex, and interpreting compliance reports can require specialized knowledge. To make this process easier, organizations can use compliance automation platforms that provide prebuilt compliance checks and reporting. Using SIEM systems can help you correlate compliance data with other security events. Properly processing, filtering, and labeling data can achieve all those objectives without losing insights or compromising security.
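In practice, data minimization often comes down to dropping or masking fields at the pipeline stage before anything is stored. The sketch below is a minimal illustration that assumes a hypothetical event schema and a field policy defined directly in code; real deployments would typically drive this from pipeline or compliance tooling configuration.

import hashlib

# Hypothetical event; field names and values are illustrative.
event = {
    "timestamp": "2024-05-01T10:00:00Z",
    "action": "login_failed",
    "username": "michael",
    "src_ip": "104.104.130.115",
    "credit_card": "4111111111111111",
}

DROP = {"credit_card"}        # never store
PSEUDONYMIZE = {"username"}   # keep a stable identifier without the raw value

def minimize(evt):
    out = {}
    for key, value in evt.items():
        if key in DROP:
            continue
        out[key] = hashlib.sha256(value.encode()).hexdigest()[:12] if key in PSEUDONYMIZE else value
    return out

print(minimize(event))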
Develop an incident response plan
“It takes 20 years to build a reputation and a few minutes of cyber-incident to ruin it.”
– Stéphane Nappo, Global Chief Information Security Officer at Société Générale International Banking
Every organization requires an action plan for security-related incidents, which is where the incident response plan comes into play. An incident response plan is an established set of procedures for handling a security incident, such as a data breach or malware infection. It defines roles, responsibilities, communication protocols, containment, eradication, and recovery steps.
An incident response plan reduces the impact of security incidents. It allows organizations to respond effectively and quickly, minimizing downtime and data loss. A mature and well-organized plan will help any organization before, during, and after a security incident.
Commonly, such a plan consists of six stages:
Preparation
Detection and analysis
Containment
Eradication
Recovery
Post-incident activity
To make the plan work, organizations need to continuously update their security procedures and policies and define the roles and responsibilities of each member of the IR team. The IR team is not limited to the people who remediate the threat; it also involves business, legal, and HR personnel.
Regular incident simulations and modeling are highly recommended since responding to an incident requires fast decision-making and rapid execution of technical steps.
Maintain a continuous improvement cycle
Continuous improvement involves reviewing and improving security practices based on feedback, incident reports, and threat intelligence. This iterative process aims to improve an organization's security posture over time.
Continuous improvement allows organizations to adapt to the ever-changing threat landscape. It helps them identify and address security gaps, increase efficiency, and improve security performance. It ensures that security practices remain relevant and practical.
It can be challenging to gain support from all stakeholders and allocate sufficient resources to support continuous improvement initiatives. It can also be challenging to measure the effectiveness of security improvements and demonstrate the return on investment.
One way to overcome those challenges is to use the plan, do, check, act (PDCA) methodology, which helps organizations introduce objectives, accomplish them, evaluate results, and solidify improvements. A fundamental aspect of the method is iteration—once a hypothesis is confirmed, executing the cycle again will extend understanding of the process and related issues. Repeating the PDCA cycle can bring its users closer to the goal—usually perfect operations and output.
Conclusion
Implementing continuous security monitoring is key to keeping an organization's infrastructure and assets safe. However, a proper monitoring setup requires attention to detail: identifying security objectives that align with business goals, setting up a security observability pipeline that delivers relevant data to the log aggregation solution, integrating regulatory compliance monitoring, and creating a clear and actionable incident response plan.