Datrium Awarded Key Patent for Converged Primary and Backup Deduplication
April 2019 by Marc Jacob
Datrium announced it has been assigned US patent #10,235,044 – its ninth US patent – for its high-performance deduplication technology for primary data storage. Invented by Datrium’s founders, who pioneered industry-leading backup deduplication technology at Data Domain, this method for converged primary-and-backup deduplication provides effective deduplication without impacting the performance of high-IOPS applications. The core of the newly patented system and methods is a content-addressed architecture that optimises data at rest, on premises or in the cloud, as well as data in motion between data centres and the public cloud. Deduplication is always on in Datrium’s software-converged infrastructure, both on premises and in the hybrid cloud, ensuring consistently efficient data transport and reducing storage consumption by as much as 95 percent.
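The press release does not disclose Datrium's implementation, but the essence of a content-addressed architecture can be sketched in a few lines: each block is stored under a cryptographic digest of its contents, so writing the same data twice consumes physical space only once. The class below is a hypothetical illustration of that idea, not Datrium's actual design; the counters are included only to show the deduplication effect.

```python
import hashlib


class ContentAddressedStore:
    """Toy content-addressed block store: each block is keyed by the
    SHA-256 digest of its contents, so identical blocks are stored once."""

    def __init__(self):
        self._blocks = {}        # digest -> block bytes
        self.logical_bytes = 0   # bytes clients have written
        self.physical_bytes = 0  # bytes actually stored

    def put(self, block: bytes) -> str:
        """Store a block and return its content address (digest)."""
        digest = hashlib.sha256(block).hexdigest()
        self.logical_bytes += len(block)
        if digest not in self._blocks:   # duplicate blocks cost nothing extra
            self._blocks[digest] = block
            self.physical_bytes += len(block)
        return digest

    def get(self, digest: str) -> bytes:
        """Fetch a block by its content address."""
        return self._blocks[digest]
```

Because the address is derived from the content itself, the same scheme works unchanged whether the blocks live on local disk or in an object store such as Amazon S3: the digest is simply used as the object key.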
Previous industry approaches to data deduplication had limitations: they either supported only narrow use cases, such as streaming sequential backup to disk, or required flash, which was cost-prohibitive for snapshot retention and data protection. In contrast, Datrium’s patented deduplication handles the random IOs of primary storage and can also store snapshots on cost-effective media, such as disks and Amazon S3 in the cloud, so that large numbers of snapshots can be retained, freeing enterprises from maintaining a separate backup process.
Datrium’s latest patent, as published by the U.S. Patent and Trademark Office, describes a system and method by which “data in a storage system is deduplicated after receiving from at least one writing entity requests for a plurality of write operations for a corresponding plurality of data blocks in a storage object. The received blocks are buffered and sorted in order and a sequence of clumps is created from the buffered blocks, where each clump comprises a grouping of at least one of the sorted, buffered blocks. A boundary is determined between at least one pair of clumps based at least in part on the content of at least one of the buffered blocks, and it is then determined whether at least one of the clumps is a duplicate of a previously stored clump.”
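The quoted claim describes a pipeline: buffer incoming writes, sort them, group the sorted blocks into "clumps", pick clump boundaries from block content, and test each clump against previously stored ones. The sketch below is one plausible reading of that flow, not the patented implementation; the boundary rule (a SHA-256 prefix tested against `BOUNDARY_MASK`) and both function names are assumptions for illustration.

```python
import hashlib

# Assumed tuning parameter: roughly 1 in 8 blocks ends a clump.
BOUNDARY_MASK = 0x7


def make_clumps(writes):
    """Buffer (offset, block) write requests, sort them, and group the
    sorted blocks into clumps whose boundaries depend on block content,
    so boundaries stay stable even when surrounding data shifts."""
    buffered = sorted(writes)            # buffer and sort the writes
    clumps, current = [], []
    for _offset, block in buffered:
        current.append(block)
        # Content-defined boundary: end the clump when the low bits of
        # the block's hash match a fixed pattern.
        h = int.from_bytes(hashlib.sha256(block).digest()[:4], "big")
        if h & BOUNDARY_MASK == 0:
            clumps.append(b"".join(current))
            current = []
    if current:
        clumps.append(b"".join(current))
    return clumps


def dedupe(clumps, seen):
    """Return only the clumps whose fingerprint has not been seen before,
    recording new fingerprints in `seen`."""
    unique = []
    for clump in clumps:
        fp = hashlib.sha256(clump).hexdigest()
        if fp not in seen:
            seen.add(fp)
            unique.append(clump)
    return unique
```

Choosing boundaries from content rather than at fixed offsets is what lets duplicate detection survive the random, interleaved IOs of primary storage: two streams that contain the same data in different write orders still converge on identical clumps after sorting and content-defined cutting.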