At Salesforce, Trust is our #1 value. Nothing is more important than the security and privacy of customer data. Our Customer 360 Platform connects data from marketing, sales, commerce, service, and IT teams around every customer. These teams work together, boosting productivity, increasing efficiency, and decreasing costs.
How are we supporting the growth of our global customer base? We now deliver Customer 360 on both first-party infrastructure and through Hyperforce, a new infrastructure architecture that allows Salesforce to scale rapidly and securely using public cloud partners — including AWS and its latest security innovations.
Solving large-scale log collection on Hyperforce
The Detection and Response (DnR) team is critical to securing Salesforce’s infrastructure. The team detects malicious threat activities and provides timely responses to security events. We do this by collecting and inspecting petabytes of security logs across dozens of organizations, some with thousands of accounts.
As Hyperforce instances, accounts, and services grow, the DnR team is challenged to improve the scalability of our data pipeline and data lake. That means taking on time-consuming tasks like reducing log ingestion latency, improving log onboarding efficiency, making the pipeline scalable, and rationalizing cost-to-serve.
DnR had previously designed and deployed a pipeline to collect security logs from global Hyperforce instances on AWS. Due to the variety of security log types, we applied a divide-and-conquer approach and built distinct log collection mechanisms for different log types. For example, we built CloudWatch-based, S3 storage-based, Lambda-based, Kinesis-based, and MSK-based log collection; a simplified sketch of one such collector follows. As log volume grows exponentially, the complexity and efficiency of these separate collection mechanisms become increasingly challenging to manage.
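To illustrate the kind of per-log-type mechanism this approach produces, here is a minimal sketch of an S3-triggered Lambda collector that forwards newly written log objects to a downstream Kinesis stream. The bucket layout, stream name, and record framing are illustrative assumptions, not DnR's actual implementation.

```python
import gzip
import boto3

# Hypothetical downstream stream name; the real pipeline's topology differs.
KINESIS_STREAM = "security-log-ingest"

s3 = boto3.client("s3")
kinesis = boto3.client("kinesis")


def handler(event, context):
    """Lambda entry point, triggered by S3 ObjectCreated events for one log type."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        # Fetch the newly written log object.
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        if key.endswith(".gz"):
            body = gzip.decompress(body)

        # Forward each log line downstream for ETL and indexing.
        for line in body.decode("utf-8").splitlines():
            if line.strip():
                kinesis.put_record(
                    StreamName=KINESIS_STREAM,
                    Data=(line + "\n").encode("utf-8"),
                    PartitionKey=key,
                )
```

Each log type needs its own variant of this kind of collector, plus its own schema handling and monitoring, which is exactly the overhead that grows with log volume.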
To solve large-scale log collection on Hyperforce, we want a convergent architecture that supports multiple log sources, enables scalable ETL, provides accurate schema management, facilitates log analytics and queries, and delivers end-to-end observability of the entire pipeline.
Security events at scale with Amazon Security Lake
Amazon Security Lake is a service that automatically centralizes an organization’s security data into a purpose-built data lake in a customer’s AWS account. This allows enterprise customers like Salesforce to centrally aggregate, manage, and use security-related log and event data at scale. It easily consolidates security logs and events from AWS, on-premises environments, and other cloud providers.
It does so by automating the collection of security-related log and event data from integrated AWS services and third-party sources. Amazon Security Lake manages the lifecycle of that data with customizable retention settings and roll-up to preferred AWS Regions, and it transforms the data into a standard open-source format called the Open Cybersecurity Schema Framework (OCSF). We can then use the security data that’s stored in Amazon Security Lake for incident response and security data analytics.
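For illustration, here is a minimal sketch of how a delegated administrator account might enable collection of the AWS-native sources discussed in this post across a set of accounts and Regions, using the boto3 securitylake client. It reflects the generally available API rather than the beta described here; the account IDs and Regions are placeholders, and the parameter shapes should be checked against the current SDK documentation.

```python
import boto3

# Run from the Amazon Security Lake delegated administrator account.
securitylake = boto3.client("securitylake", region_name="us-east-1")

# Placeholder account IDs and Regions; substitute your organization's values.
accounts = ["111111111111", "222222222222"]
regions = ["us-east-1", "us-west-2"]

# Enable the AWS-native sources mentioned in this post: CloudTrail management
# events, VPC flow logs, and Route53 Resolver query logs.
response = securitylake.create_aws_log_source(
    sources=[
        {"accounts": accounts, "regions": regions, "sourceName": name}
        for name in ["CLOUD_TRAIL_MGMT", "VPC_FLOW", "ROUTE53"]
    ]
)
print(response)
```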
Here are the advantages
Salesforce DnR had the opportunity to experiment with the beta version of Amazon Security Lake. In doing so, we identified several advantages. Amazon Security Lake is good at collecting and transforming AWS-native logs (for example, CloudTrail logs, VPC flow logs, and Route53 Resolver logs). After transformation, the logs are directly consumable in the OCSF schema and Parquet data format.
- One-stop log collection management for accounts under an AWS organization: Amazon Security Lake offers centralized management of security log collection across AWS accounts. Enabling log collection is straightforward, reducing what previously took days or weeks of work to a few hours.
- Abundant security log sources: Amazon Security Lake supports a range of AWS-native logs such as CloudTrail logs, VPC flow logs, and Route53 Resolver logs. In addition, it adds support for OCSF-compliant vendor logs.
- Automatic ETL to transform logs with the OCSF schema: Amazon Security Lake runs automatic ETL jobs to transform log data into the corresponding OCSF schema, and the resulting Parquet-format data in the Amazon Security Lake S3 bucket is easy to consume.
- Effective log partitioning and regional rollup: Amazon Security Lake can roll log data up to designated Regions, which helps us manage log data across our global infrastructure. Log data is well partitioned by log source, Region, account ID, and event hour.
- Custom log source ingestion support: Customer log data can be ingested into Amazon Security Lake as long as it is OCSF-compliant. This creates a unified data lake for managing both AWS-native security logs and Salesforce’s own log data.
- Integration support for other services: Amazon Security Lake integrates well with other services such as Athena and Splunk, so we can easily search and run analytics jobs against Amazon Security Lake log data with several query engines (see the query sketch after this list).
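As an example of that integration, the sketch below runs an Amazon Athena query against a Security Lake table from Python. The Glue database, table, and partition names are illustrative only; Security Lake generates Region-specific names in each environment, so verify them in your own account before running anything like this.

```python
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Illustrative Glue database/table names and partition columns; Security Lake
# generates its own names per Region, following the source/Region/account/time
# partitioning described above.
QUERY = """
SELECT "time", api.operation, actor.user.name AS user_name
FROM amazon_security_lake_glue_db_us_east_1.amazon_security_lake_table_us_east_1_cloud_trail_mgmt
WHERE accountid = '111111111111' AND region = 'us-east-1'
LIMIT 10
"""

execution = athena.start_query_execution(
    QueryString=QUERY,
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
query_id = execution["QueryExecutionId"]

# Poll until the query finishes, then print the first page of results.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    results = athena.get_query_results(QueryExecutionId=query_id)
    for row in results["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```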
Innovation meets trust
Ultimately, adopting Amazon Security Lake takes on much of the heavy lifting of log collection, transformation, aggregation, search, and management for us. It fits well with Salesforce Hyperforce on AWS and complements DnR’s own data pipeline. Moreover, Amazon Security Lake’s log aggregation and regional roll-up fully align with DnR’s own global, decentralized, and hybrid data lake infrastructure.
We’re excited about Amazon Security Lake’s potential to integrate with third-party OCSF consortium members and bring all security log data under one roof. We anticipate that Amazon Security Lake will help offload 30% – 50% of the traffic of our own data pipeline. This will significantly reduce our log onboarding time, increase log coverage, and further uplift the security posture of Salesforce. Our focus will always be on trust and innovation.
This post was authored by members of Salesforce’s Detection and Response (DnR) Engineering Team:
Lei Ye, Software Engineering Architect, Detection & Response (DnR) Engineering, focuses on innovating data processing pipelines, data lake, and query engine for DnR. He’s the tech lead driving Salesforce’s Amazon Security Lake project. He architected the blueprint of DnR’s next-generation data lake infrastructure and drove the collaboration across organizations and teams.
Ajith Jayamohan, Vice President of Detection & Response (DnR) Engineering, leads the Machine Learning and AI-driven data platform teams at Salesforce. He’s focused on detecting and responding to internal and external adversaries to protect Salesforce and its customers.