StreamSets Debuts Solution to Discover, Secure and Govern Personal Data in Motion

March 2018 by Marc Jacob

StreamSets Inc., provider of the industry’s only enterprise DataOps platform, announced immediate availability of the industry’s first solution to discover, secure and govern personally identifiable information (PII) while “in flight” (as it arrives from a batch or streaming data source or moves between compute platforms). Designed with data privacy regulations in mind, StreamSets Data Protector reduces the risk of expensive and embarrassing violations by helping companies meet requirements for GDPR, HIPAA and other compliance regimes. Until now, solutions for handling personal data have relied on “after the fact” scanning of data stores, which, while valuable, can only discover sensitive data once it has landed and has potentially already been shared. Companies are missing the opportunity to encrypt, mask, generalize or discard personal data as it arrives rather than storing it in the clear.

StreamSets Data Protector extends protection to the point of initial data ingestion, leveraging unique Dataflow Sensors that are part of StreamSets Data Collector. These sensors discover PII by comparing incoming data to built-in patterns such as national ID, tax ID or driver’s license numbers, bank account or credit card numbers, or IP addresses, as well as additional patterns created by the customer. Without the automation StreamSets provides, laborious hand-coding is required to continuously check each data source against dozens or hundreds of PII patterns. That manual approach becomes impossible to sustain as unstructured data and data drift (unexpected changes to the structure and semantics of the incoming data) come to the fore.
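The pattern-matching idea described above can be sketched in a few lines of Python. This is only an illustration and not StreamSets code: the pattern set, the `find_pii` function and the field names are hypothetical, and the regular expressions are deliberately simplistic (a production detector would likely add validation such as checksum tests for card numbers). The sketch scans every string value rather than relying on field names, which is one plausible way to stay robust when the structure of incoming records drifts.

```python
import re

# Hypothetical patterns for illustration only; these are NOT the
# built-in identifiers shipped with any StreamSets product.
PII_PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(record):
    """Return {field_name: [pattern names]} for each string field that matches.

    Values are inspected regardless of the field's name, so a renamed or
    newly added field is still checked.
    """
    hits = {}
    for field, value in record.items():
        if not isinstance(value, str):
            continue  # this sketch only inspects string values
        matched = [
            name for name, pattern in PII_PATTERNS.items()
            if pattern.search(value)
        ]
        if matched:
            hits[field] = matched
    return hits
```

Calling `find_pii` on a record whose free-text field happens to contain a card number would flag that field even though nothing in its name suggests payment data.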

StreamSets Data Protector gives enterprises an automatic, centralized and data drift-resistant way to implement data protection policies across all inbound pipelines. The key capabilities of StreamSets Data Protector are to discover sensitive data, secure it “in flight” and provide centralized governance to ensure continuous policy compliance:

• Discover — Dataflow Sensors detect sensitive data as it arrives. Incoming data is checked against hundreds of built-in identifiers or patterns defined in enterprise data catalogs. Enterprises can also customize protection by designing their own identifiers.

• Secure — Once sensitive data is detected, processors can apply standardized operations, such as reversible or irreversible obfuscation algorithms, and can also take actions such as route, filter, quarantine or alert.

• Govern — Enterprise-wide policies are centrally managed and applied to pipelines, while audit reports trace where personal data came from and how it has been handled. Data Protector also introduces Security Zones, which allow security architects to design defense-in-depth strategies around data. It complements data governance solutions for data at rest, integrating with catalogs such as Alation, Apache Atlas, Cloudera Navigator, IBM Information Governance and Waterline Data.
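To make the Secure and Govern steps more concrete, the following Python sketch applies a central policy to fields that a discovery step has already flagged, and records an audit entry for each action. Everything here is an assumption made for illustration: the `POLICY` table, the action names, the `apply_policy` function and the fixed salt are hypothetical and do not reflect how StreamSets Data Protector is implemented. Only irreversible obfuscation (a salted hash) and masking are shown; reversible obfuscation would require an encryption library and key management, which are omitted.

```python
import hashlib

# Hypothetical central policy: PII category -> action to take in flight.
POLICY = {"credit_card": "mask", "us_ssn": "hash", "ipv4": "drop"}


def mask(value, visible=4):
    """Replace all but the last `visible` characters with '*'."""
    keep = value[-visible:] if visible > 0 else ""
    return "*" * (len(value) - len(keep)) + keep


def hash_value(value, salt="demo-salt"):
    """Irreversible obfuscation: salted SHA-256 hex digest.

    The fixed salt is a placeholder; a real deployment would manage it
    as a secret.
    """
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def apply_policy(record, detections):
    """Return (protected_record, audit_entries).

    `detections` maps field name -> PII category, as a discovery step
    might produce. Categories with no policy entry are quarantined.
    """
    protected = dict(record)  # leave the caller's record untouched
    audit = []
    for field, category in detections.items():
        action = POLICY.get(category, "quarantine")
        if action == "mask":
            protected[field] = mask(record[field])
        elif action == "hash":
            protected[field] = hash_value(record[field])
        elif action == "drop":
            del protected[field]
        else:
            # Flag the record so a router can divert it for review.
            protected["_quarantined"] = True
        audit.append({"field": field, "category": category, "action": action})
    return protected, audit
```

Keeping the policy in one table, separate from the code that applies it, mirrors the idea of managing rules centrally and applying them uniformly across pipelines, while the audit list stands in for the kind of trace a governance report could be built from.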