DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are pending in this office action.
Response to Amendment
This Office Action is in response to applicant’s communication filed on January 21st, 2026. The applicant’s remarks and amendments to the claims were considered with the results that follow.
In response to the last Office Action, claims 1, 8, and 15 have been amended. As a result, claims 1-20 are pending in this application.
Response to Arguments
Applicant’s arguments, see pgs. 8-9 of the remarks filed on January 21st, 2026, with respect to the rejection of independent claims 1, 8, and 15 as amended under 35 U.S.C. 103, assert that the prior art does not teach or suggest the amended limitations recited in independent claims 1, 8, and 15.
Examiner respectfully disagrees. Applicant's arguments fail to comply with 37 CFR 1.111(b) because they amount to a general allegation that the claims define a patentable invention without specifically pointing out how the language of the claims patentably distinguishes them from the references.
Additionally, Examiner believes that U.S. Patent Application Publication 2016/0342545 to ARAI et al. (hereinafter as “ARAI”) still teaches the amended limitation reciting, “adding the data protection field, identified as skipped data, to an electronic record for electronic storage, wherein the electronic record is configured to store a skip descriptor comprising an offset and a byte count for the data protection field”.
ARAI teaches “adding the data protection field, identified as skipped data, to an electronic record for electronic storage”. ARAI teaches the “skipped data” as shown in [0096], stating, “data protection information, e.g., a T10 DIF, which is often attached on a sector-by-sector basis in storage, may be detached and left in the compression information instead of being compressed”. The data that is detached, left in the compression information, and not compressed is the skipped data. The data protection field identified as skipped data is the T10 DIF field, which is excluded from compression and is thus the data left in the compression information.
Additionally, applicant amended the limitation to include, “…store a skip descriptor comprising an offset and a byte count for the data protection field”.
ARAI teaches the “skip descriptor” as the description indicates in [0096], “For example, data protection information, e.g., a T10 DIF, which is often attached on a sector-by-sector basis in storage, may be detached and left in the compression information instead of being compressed. In a case where 8 B of T10 DIF is attached to 512 B of data, the data may be compressed in units of 512 B×four sectors, with 8 B×four sectors of T10 DIF information recorded in the compression information. In a case where sectors are 4,096 B and 8 B of T10 DIF is attached, 4,096 B are compressed and 8 B are recorded in the compression information”. That is, the compression information acts as a metadata holder indicating that the 8 bytes are treated as skipped data governed by a skip descriptor and stored separately (in the compression information). The statement “the data may be compressed in units of 512 B×four sectors, with 8 B×four sectors of T10 DIF information recorded in the compression information” indicates that the compression information must store an offset to know where the 512 B of raw data ends and where the separately stored 8 B of T10 DIF information begins. The 8 B of T10 DIF acts as the length, or byte count, of the separately managed, skipped data, and this repeats at the starting point of every four sectors, which is the offset. Thus a specific amount of data, the 8 bytes attached to each specific 512 B block, is left uncompressed and detached from each known sector, which would require an offset to locate the T10 DIF marking the end of each sector.
As such, ARAI teaches the above limitation.
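By way of illustration only, the detach-and-record arrangement discussed above can be sketched in code. This is a hypothetical sketch of the Examiner’s reading, not any reference’s actual implementation; the function and field names are invented solely to show the offset/byte-count bookkeeping for detached T10 DIF fields.

```python
# Hypothetical sketch: strip 8 B of T10 DIF from each 512 B sector,
# compress only the user data, and record each DIF field's offset and
# byte count (the "skip descriptor") alongside the uncompressed DIF.
import zlib

SECTOR = 512   # bytes of user data per sector
DIF = 8        # bytes of T10 DIF attached to each sector

def detach_and_compress(formatted):
    """Split 520 B formatted sectors into data (compressed) and
    DIF fields (left uncompressed, described by skip descriptors)."""
    data, skip_descriptors, dif_fields = bytearray(), [], bytearray()
    for off in range(0, len(formatted), SECTOR + DIF):
        data += formatted[off:off + SECTOR]
        # Skip descriptor: offset of the DIF field in the original
        # stream plus its byte count.
        skip_descriptors.append({"offset": off + SECTOR, "byte_count": DIF})
        dif_fields += formatted[off + SECTOR:off + SECTOR + DIF]
    compressed = zlib.compress(bytes(data))
    # "Compression information": compressed data plus the uncompressed
    # DIF fields and the descriptors needed to reinsert them on read.
    return {"compressed": compressed,
            "skip_descriptors": skip_descriptors,
            "dif": bytes(dif_fields)}

# Four 520 B formatted sectors (512 B data + 8 B DIF each).
record = detach_and_compress(bytes(4 * (SECTOR + DIF)))
print(record["skip_descriptors"][0])  # {'offset': 512, 'byte_count': 8}
```

Each descriptor pairs an offset with an 8-byte count, mirroring the 512 B/8 B split the Examiner maps onto ARAI’s [0096].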
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2, 5-9, 12-16, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication 2008/0098083 to Shergill et al. (hereinafter as “Shergill”) in view of U.S. Patent Application Publication 2016/0342545 to ARAI et al. (hereinafter as “ARAI”) in further view of U.S. Patent 10,452,616 issued to Bassov et al. (hereinafter as “Bassov”).
Regarding claim 1, Shergill teaches defining a set of data blocks within the data set based on the hash block size (Shergill: [0047]-[0050]; calculate a hash value associated with the LOB file...LOB file 50 have certain prescribed size {NOTE: Shergill also teaches the hash block size being selected independent of the compression block size, since the LOB file’s prescribed size setting is independent of the compression block size, the prescribed size being based on hashing and searching for duplicates associated with that size});
electronically analyzing the data set to identify a data protection field associated with the data block to skip (Shergill: [0052]; The de-duplication flag 310 is used to indicate whether duplication of already stored data is desired. In the illustrated example, the de-duplication flag 310 for the LOB file is set to “ON,” indicating that no duplication of already stored data is desired. [0096]; after the LOB file data have been processed by the data encryption module 26. The metadata are similar to those discussed with reference to FIG. 11A, except that the metadata of FIG. 16A also includes an encryption flag 800 and an encryption key address 802. In the illustrated example, the encryption flag 800 is set to “ON,” indicating that the compression units “CU1,” “CU2,” and “CU3” are encrypted. [0101]; Compression unit “CU1” has an encryption flag that is set to “ON,” indicating that the compression unit “CU1” has been encrypted, wherein the encryption key is located at address “K1.” {Examiner correlates that since the encryption flag is on, this indicates a data protection field in which the data is already encrypted and therefore should not be hashed and should be skipped, since it is sensitive data that requires encryption. See Fig. 16B below
[Shergill, Fig. 16B (media_image1.png)]
});
generating, by a processor, a hash for each data block in the set of data blocks within the data set and refraining from hashing the data protection field associated with the data block (Shergill: [0048]; upon receiving the first block 52 a of the LOB file 50, the de-duplication module 22 then calculates a hash value for the block 52 a. The de-duplication module 22 then checks to see if the calculated hash value can be found in the database 14…then continues to receive the next block 52 b of the LOB file 50, and calculates a second hash value using the data from block 52 b…calculates a third hash value using the data from block 52 b. The above process is repeated until the last block 52 (e.g., block 52 e) of the LOB file 50 is received and processed. [0101]; Compression unit “CU1” has an encryption flag that is set to “ON,” indicating that the compression unit “CU1” has been encrypted, wherein the encryption key is located at address “K1.” {Examiner correlates that since the encryption flag is on, this indicates a data protection field associated with this hash that should not be hashed, since it is sensitive data that requires encryption
[Shergill, Fig. 16B (media_image1.png)]
});
deduplicating a data block in the data set based on a respective hash for the data block (Shergill: [0048]; For example, upon receiving the first block 52 a of the LOB file 50, the de-duplication module 22 then calculates a hash value for the block 52 a. The de-duplication module 22 then checks to see if the calculated hash value can be found in the database 14. For example, the de-duplication module 22 can look up a hash value table or a B-tree. If the calculated hash value cannot be found, then the de-duplication module 22 determines that the LOB file 50 is not yet stored by the system 10. Alternatively, if the calculated hash value can be found, the de-duplication module 22 then continues to receive the next block 52 b of the LOB file 50, and calculates a second hash value using the data from block 52 b. [0058]; As shown in FIG. 5A, the LOB file identifier “XYZ” can be found in the table 300, and has a counter value of “2.” The de-duplication module 22 then updates the counter from “2” to “1” and remove the LOB identifier “XYZ,” thereby “deleting” the LOB file “XYZ” without actually deleting the LOB file data (FIG. 6));
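The hash-and-lookup flow described in Shergill [0048] (compute a hash per block, consult a stored table, and treat a found hash as already-stored data) can be sketched as follows. This is an illustrative sketch only; the identifiers and the choice of SHA-256 are assumptions, not Shergill’s disclosed implementation.

```python
# Minimal sketch of block-level deduplication by hash lookup, in the
# spirit of Shergill [0048]; names and hash function are illustrative.
import hashlib

def deduplicate(blocks, store=None):
    """Return (store, refs): store maps hash -> unique block data,
    refs lists the hash referencing each input block in order."""
    store = {} if store is None else store
    refs = []
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:
            # Hash not found: new data, keep the block itself.
            store[digest] = block
        # Hash found (or just added): record only the reference.
        refs.append(digest)
    return store, refs

store, refs = deduplicate([b"aaa", b"bbb", b"aaa"])
print(len(store), len(refs))  # 2 3
```

Only two unique blocks are stored for the three inputs; the duplicate is represented by its hash reference alone.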
Shergill does not explicitly teach compressing the data set based on the compression block size and refraining from compressing the data protection field; and adding the data protection field, identified as skipped data, to an electronic record for electronic storage.
However, Arai teaches compressing the data set based on the compression block size and refraining from compressing the data protection field (Arai: [0096]; For example, data protection information, e.g., a T10 DIF, which is often attached on a sector-by-sector basis in storage, may be detached and left in the compression information instead of being compressed. In a case where 8 B of T10 DIF is attached to 512 B of data, the data may be compressed in units of 512 B×four sectors, with 8 B×four sectors of T10 DIF information recorded in the compression information {Examiner correlates the 512 B×four sectors as a data set compressed based on a compression block size, and the T10 DIF as the data protection field, where the manner in which it is detached causes that field not to be compressed but instead left in the compression information}); and
adding the data protection field, identified as skipped data, to an electronic record for electronic storage, wherein the electronic record is configured to store a skip descriptor comprising an offset and a byte count for the data protection field (Arai: [0096]; For example, data protection information, e.g., a T10 DIF, which is often attached on a sector-by-sector basis in storage, may be detached and left in the compression information instead of being compressed. In a case where 8 B of T10 DIF is attached to 512 B of data, the data may be compressed in units of 512 B×four sectors, with 8 B×four sectors of T10 DIF information recorded in the compression information. In a case where sectors are 4,096 B and 8 B of T10 DIF is attached, 4,096 B are compressed and 8 B are recorded in the compression information {Examiner correlates the compression information as a metadata holder indicating that the 8 bytes are treated as skipped data governed by a skip descriptor and stored separately (in the compression information). The statement “the data may be compressed in units of 512 B×four sectors, with 8 B×four sectors of T10 DIF information recorded in the compression information” indicates that the compression information must store an offset to know where the 512 B of raw data ends and where the separately stored 8 B of T10 DIF information begins. The 8 B of T10 DIF acts as the length, or byte count, of the separately managed, skipped data, and this repeats at the starting point of every four sectors, which is the offset. Thus a specific amount of data, the 8 bytes attached to each specific 512 B block, is left uncompressed and detached from each known sector, which would require an offset to locate the T10 DIF marking the end of each sector}).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Shergill (teaches defining a set of data blocks within the data set based on the hash block size; generating a hash for each data block in the set of data blocks within the data set; deduplicating a data block in the data set based on a respective hash for the data block; and compressing the data set based on the compression block size) with the teachings of Arai (teaches compressing the data set based on the compression block size and refraining from compressing the data protection field; and adding the data protection field, identified as skipped data, to an electronic record for electronic storage…). One of ordinary skill in the art would have been motivated to make such a combination because storing the data protection information without compressing it provides better results and improves processing efficiency (See Arai: [0108]). In addition, the references (Shergill and Arai) are directed to analogous art and to the same field of endeavor, as both Shergill and Arai are directed to analyzing data block information and performing the appropriate processing according to that analysis.
The modification of Shergill and Arai teaches the claimed invention substantially as claimed; however, the modification of Shergill and Arai does not explicitly teach a method, comprising: identifying a data set to deduplicate based on a hash block size and to compress based on a compression block size, the hash block size being selected independent of the compression block size;
However, Bassov teaches a method, comprising: identifying a data set to deduplicate based on a hash block size and to compress based on a compression block size (Bassov: Col 9, lines 34-37; depending on the data deduplication technique utilized, the chunks of 210 may be of varying or different sizes. Each chunk is provided as an input to hash function 215. Col 9, lines 30-31; The original data may be partitioned into multiple data chunks Cl, C2, C3, C4 and the like. Col 9, lines 40-42; each chunk of 210, the hash function 215 may perform processing and generate, as an output, a hash value or digest. Col 10, lines 24-28; to map the digest to a corresponding table entry varies…when storing a new data chunk. Col 18, lines 56-60; using a variable size compression granularity, unit or chunk of data compressed as a single unit…have an entire data set or all compressed data stored in the data storage system be compressed in the same compression unit size, such as 4 KB or 8 KB {Examiner correlates identifying the data set to deduplicate based on the hash block size with each chunk being provided as input to the hash function, which generates a hash value, and with the mapping of the digest varying when storing the new data chunk}),
the hash block size being selected independent of the compression block size (Bassov: Col 18, lines 59-62; the data storage system…provide for varying the size of the compression granularity, unit or chunk compressed as a single unit. Col 18, lines 66-67 and Col 19, lines 1-7; use the entropy metric, such as described elsewhere herein to select the number of blocks (e.g., size or amount of data) to compress in a single unit….amount of data in a single chunk to decide whether to compress, as a single unit {Examiner correlates selecting the hash block size with selecting the number of blocks (size or amount of data) to compress in a single unit, where the blocks are selected independent of the compression block size since the system separately selects the amount of data to compress});
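The independence of the two granularities at issue, hash block size versus compression block size, can be shown with a brief sketch. The specific sizes and names below are hypothetical and chosen only to illustrate that the same data set may be partitioned once for hashing/deduplication and separately, at a different granularity, for compression.

```python
# Sketch of partitioning one data set at two independently selected
# granularities: hash blocks for deduplication, and larger compression
# units. The sizes here are hypothetical examples.
def partition(data, size):
    """Split data into consecutive chunks of the given size."""
    return [data[i:i + size] for i in range(0, len(data), size)]

data = bytes(16384)
HASH_BLOCK = 4096          # granularity used for hashing/dedup
COMPRESSION_BLOCK = 8192   # independently selected, here 2x larger

hash_blocks = partition(data, HASH_BLOCK)
compression_units = partition(data, COMPRESSION_BLOCK)
print(len(hash_blocks), len(compression_units))  # 4 2
```

Because the two partitionings share no parameter, either size can be changed without affecting the other, which is the independence the claim recites.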
Although, Bassov teaches the chunks may be of varying or different sizes (See Bassov: Col 9, lines 34-37; deduplication technique utilized, the chunks…may be of varying or different sizes), comparing the size of the compressed chunk to the original uncompressed size of the chunk (See Bassov: Col 16, lines 43-45; the size of the compressed chunk is compared to the original uncompressed size of the chunk input to compression) and designating a size to be specified for the size of the compression (See Bassov: Col 18, lines 17-22; denoting a number of 4 KB blocks specified as the size of the compression unit or compression granularity. For example, an X value of 4 means that the data set is partitioned into chunks having a size of 4 blocks where compression is performed with respect to each 4 block chunk or unit). Bassov does not explicitly teach the hash block size to be smaller than the compression block size.
However, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Shergill (teaches defining a set of data blocks within the data set based on the hash block size; generating a hash for each data block in the set of data blocks within the data set; deduplicating a data block in the data set based on a respective hash for the data block; and compressing the data set based on the compression block size) with the teachings of Arai (teaches compressing the data set based on the compression block size and refraining from compressing the data protection field; and adding the data protection field, identified as skipped data, to an electronic record for electronic storage…) with the further teachings of Bassov (teaches a method, comprising: identifying a data set to deduplicate based on a hash block size and to compress based on a compression block size). One of ordinary skill in the art would have been motivated to make such a combination because it would have been obvious to evaluate the block sizes and designate the hash block size to be smaller than the compression block size by choosing from a finite number of identified, predictable solutions (larger than, smaller than, equal to), with a reasonable expectation of success, which may support a conclusion of obviousness (See KSR Int'l Co. v. Teleflex Inc., 550 U.S. 398, 82 USPQ2d 1385 (2007); see MPEP 2143(I)(E)).
Regarding claim 2, the modification of Shergill, Arai, and Bassov teaches the claimed invention substantially as claimed; however, the modification of Shergill, Arai, and Bassov does not explicitly teach that the compression block size is at least twice a size of the hash block size.
Although, Bassov teaches the chunks may be of varying or different sizes (See Bassov: Col 9, lines 34-37; deduplication technique utilized, the chunks…may be of varying or different sizes), comparing the size of the compressed chunk to the original uncompressed size of the chunk (See Bassov: Col 16, lines 43-45; the size of the compressed chunk is compared to the original uncompressed size of the chunk input to compression), and designating a size to be specified for the size of the compression (See Bassov: Col 18, lines 17-22; denoting a number of 4 KB blocks specified as the size of the compression unit or compression granularity. For example, an X value of 4 means that the data set is partitioned into chunks having a size of 4 blocks where compression is performed with respect to each 4 block chunk or unit). Bassov does not explicitly teach the compression block size is at least twice a size of the hash block size.
However, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Shergill (teaches defining a set of data blocks within the data set based on the hash block size; generating a hash for each data block in the set of data blocks within the data set; deduplicating a data block in the data set based on a respective hash for the data block; and compressing the data set based on the compression block size) with the teachings of Arai (teaches compressing the data set based on the compression block size and refraining from compressing the data protection field; and adding the data protection field, identified as skipped data, to an electronic record for electronic storage…) with the further teachings of Bassov (teaches a method, comprising: identifying a data set to deduplicate based on a hash block size and to compress based on a compression block size). One of ordinary skill in the art would have been motivated to make such a combination because it would have been obvious to evaluate the block sizes and designate the compression block size to be larger than the hash block size by choosing from a finite number of identified, predictable solutions (twice as large, three times as large, or larger), with a reasonable expectation of success, which may support a conclusion of obviousness (See KSR Int'l Co. v. Teleflex Inc., 550 U.S. 398, 82 USPQ2d 1385 (2007); see MPEP 2143(I)(E)).
Regarding claim 5, the modification of Shergill, Arai, and Bassov teaches claimed invention substantially as claimed, and Shergill further teaches receiving, via a user interface, user input indicative of the hash block size (Shergill: [0046]; In the illustrated embodiments, the system 10 includes a user interface that allows a user, such as an administrator, to input the prescribed data processing threshold. [0051]; For example, the system 10 may be configured to monitor the size of the portion of the LOB file 50 that has been processed by the de-duplication module 22. [0084]; the LOB file 50 is separated by the data receiving module 22, based on the prescribed processing threshold, into five blocks 52 a-52 e); and
setting the hash block size based on the user input (Shergill: [0064]; FIG. 7 illustrates an example in which the LOB file 50 is separated into five blocks (such as blocks 52 a-52 e shown in FIG. 2). Blocks 52 a-52 e have block identifiers “B1”-“B5,” respectively. The sizes of blocks 52 a-52 e are 250 kb, 250 kb, 248 kb, 252 kb, and 30 kb, respectively. [0073]; In the illustrated example, the prescribed unit size is 500 kb, which may be input by a user, such as an administrator).
Regarding claim 6, the modification of Shergill, Arai, and Bassov teaches claimed invention substantially as claimed, and Shergill further teaches the data set is an electronic file, wherein the compression block size is equal to a size of the electronic file (Shergill: [0070]-[0072]; In other embodiments, the user interface may also allow the user to prescribe which data compression algorithm (which may or may not corresponds to a desired level of compression) to use for certain file based on the file type and/or file size…the data compression module 24 may be configured to determine compression efficiency on at least a portion of the LOB file, and automatically determines a level of data compression for the LOB file. Returning to FIG. 9, next, the data compression module 24 compresses the LOB file data based on the compression criteria obtained from Step 402 (Step 404), and then stores the compressed LOB file data in the database 14 (Step 406)).
Regarding claim 7, the modification of Shergill, Arai, and Bassov teaches claimed invention substantially as claimed, and Shergill further teaches the deduplicated data block is not to be compressed (Shergill: [0069]-[0070]; On the other hand, if the de-duplication module 22 determines that the LOB file 50 is not yet stored, the de-duplication module 22 then passes the LOB file data to the data compression module 24. In the illustrated embodiments, the system 10 may include a user interface that allows a user, such as an administrator, to input the data compression criteria. For example, an administrator may prescribe one of four levels of compression, namely, “None,” “Low,” “Medium,” and “High” for a certain type or/and size of file. “None” compression is prescribed when no compression is desired to be performed for the file).
Regarding claim 8, Shergill teaches defining a set of data blocks within the data set based on the hash block size (Shergill: [0047]-[0050]; calculate a hash value associated with the LOB file...LOB file 50 have certain prescribed size {NOTE: Shergill also teaches the hash block size being selected independent of the compression block size, since the LOB file’s prescribed size setting is independent of the compression block size, the prescribed size being based on hashing and searching for duplicates associated with that size});
electronically analyzing the data set to identify a data protection field associated with the data block to skip, the data protection field to skip not being deduplicated (Shergill: [0052]; The de-duplication flag 310 is used to indicate whether duplication of already stored data is desired. In the illustrated example, the de-duplication flag 310 for the LOB file is set to “ON,” indicating that no duplication of already stored data is desired. [0096]; after the LOB file data have been processed by the data encryption module 26. The metadata are similar to those discussed with reference to FIG. 11A, except that the metadata of FIG. 16A also includes an encryption flag 800 and an encryption key address 802. In the illustrated example, the encryption flag 800 is set to “ON,” indicating that the compression units “CU1,” “CU2,” and “CU3” are encrypted. [0101]; Compression unit “CU1” has an encryption flag that is set to “ON,” indicating that the compression unit “CU1” has been encrypted, wherein the encryption key is located at address “K1.” {Examiner correlates that since the encryption flag is on, this indicates a data protection field that should not be hashed, since it is sensitive data that requires encryption; and if the de-duplication flag is off, then the data protection field would not be deduplicated, in combination with the encryption flag being on. See Fig. 16B below
[Shergill, Fig. 16B (media_image1.png)]
});
generating, by a processor, a first instruction to generate a hash for each data block in the set of data blocks within the data set except for the data protection field (Shergill: [0048]; upon receiving the first block 52 a of the LOB file 50, the de-duplication module 22 then calculates a hash value for the block 52 a. The de-duplication module 22 then checks to see if the calculated hash value can be found in the database 14…then continues to receive the next block 52 b of the LOB file 50, and calculates a second hash value using the data from block 52 b…calculates a third hash value using the data from block 52 b. The above process is repeated until the last block 52 (e.g., block 52 e) of the LOB file 50 is received and processed. [0052]; The de-duplication flag 310 is used to indicate whether duplication of already stored data is desired. In the illustrated example, the de-duplication flag 310 for the LOB file is set to “ON,” indicating that no duplication of already stored data is desired. [0101]; Compression unit “CU1” has an encryption flag that is set to “ON,” indicating that the compression unit “CU1” has been encrypted, wherein the encryption key is located at address “K1.” {Examiner correlates that since the encryption flag is on, this indicates a data protection field that should not be hashed, since it is sensitive data that requires encryption; and if the de-duplication flag is off, then the data protection field would not be deduplicated, in combination with the encryption flag being on. See Fig. 16B below
[Shergill, Fig. 16B (media_image1.png)]
});
generating a second instruction to deduplicate a data block in the data set based on a respective hash for the data block to generate a deduplicated data set (Shergill: [0053]; requesting the system 10 to perform data de-duplication. In such cases, the system 10 determines that data de-duplication is desired if it receives a request from the client 12 to perform data de-duplication. [0056]; a request to the system 10 requesting that a LOB file having identifier “ABC” be stored, wherein the LOB file data is the same as that of file “XYZ” stored for client 12 a. Because the LOB data for files “XYZ” and “ABC” are the same, the calculated hash value for the file “ABC” would be the same as the hash value for the “XYZ” file. The de-duplication module 22, upon checking the table 300, will determine that data being the same as that of the file “ABC” is already stored by the system 10 because the calculated hash value (“H3” in the example) can be found); and
generating a third instruction to compress the deduplicated data set based on the compression block size (Shergill: [0070]; Upon detecting that there is a LOB file that is desired to be stored at the system 10, the data compression module 24 first checks data compression criteria (Step 402). The data compression criteria prescribes whether and/or how to perform data compression for the LOB file data based on certain rules set by a user...the user interface may also allow the user to prescribe which data compression algorithm (which may or may not corresponds to a desired level of compression) to use for certain file based on the file type and/or file size).
Shergill does not explicitly teach adding the data protection field, identified as skipped data, to an electronic record for electronic storage.
However, Arai teaches adding the data protection field, identified as skipped data, to an electronic record for electronic storage, wherein the electronic record is configured to store a skip descriptor comprising an offset and a byte count for the data protection field (Arai: [0096]; For example, data protection information, e.g., a T10 DIF, which is often attached on a sector-by-sector basis in storage, may be detached and left in the compression information instead of being compressed. In a case where 8 B of T10 DIF is attached to 512 B of data, the data may be compressed in units of 512 B×four sectors, with 8 B×four sectors of T10 DIF information recorded in the compression information. In a case where sectors are 4,096 B and 8 B of T10 DIF is attached, 4,096 B are compressed and 8 B are recorded in the compression information {Examiner correlates the compression information as a metadata holder indicating that the 8 bytes are treated as skipped data governed by a skip descriptor and stored separately (in the compression information). The statement “the data may be compressed in units of 512 B×four sectors, with 8 B×four sectors of T10 DIF information recorded in the compression information” indicates that the compression information must store an offset to know where the 512 B of raw data ends and where the separately stored 8 B of T10 DIF information begins. The 8 B of T10 DIF acts as the length, or byte count, of the separately managed, skipped data, and this repeats at the starting point of every four sectors, which is the offset. Thus a specific amount of data, the 8 bytes attached to each specific 512 B block, is left uncompressed and detached from each known sector, which would require an offset to locate the T10 DIF marking the end of each sector}).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Shergill (teaches defining a set of data blocks within the data set based on the hash block size; generating a hash for each data block in the set of data blocks within the data set; deduplicating a data block in the data set based on a respective hash for the data block; and compressing the data set based on the compression block size) with the teachings of Arai (teaches adding the data protection field, identified as skipped data, to an electronic record for electronic storage…). One of ordinary skill in the art would have been motivated to make such a combination because storing the data protection information without compressing it provides better results and improves processing efficiency (See Arai: [0108]). In addition, the references (Shergill and Arai) are directed to analogous art and to the same field of endeavor, as both Shergill and Arai are directed to analyzing data block information and performing the appropriate processing according to that analysis.
The modification of Shergill and Arai teaches the claimed invention substantially as claimed; however, the modification of Shergill and Arai does not explicitly teach a method, comprising: identifying a data set to deduplicate based on a hash block size and to compress based on a compression block size, the hash block size being selected independent of the compression block size;
However, Bassov teaches a method, comprising: identifying a data set to deduplicate based on a hash block size and to compress based on a compression block size (Bassov: Col 9, lines 34-37; depending on the data deduplication technique utilized, the chunks of 210 may be of varying or different sizes. Each chunk is provided as an input to hash function 215. Col 9, lines 40-42; each chunk of 210, the hash function 215 may perform processing and generate, as an output, a hash value or digest. Col 10, lines 24-28; to map the digest to a corresponding table entry varies…when storing a new data chunk. Col 13, lines 24-31; determine whether or not it (the data chunk) is compressible (and should therefore be stored in its compressed form), or otherwise achieves at least a minimum amount of data reduction (e.g., whether or not a compressed form of a data chunk has a reduced size that is less than the size of the original data chunk by at least a threshold amount) to warrant storing the chunk in its compressed form).
Although Bassov teaches the chunks may be of varying or different sizes (See Bassov: Col 9, lines 34-37; deduplication technique utilized, the chunks…may be of varying or different sizes), comparing the size of the compressed chunk to the original uncompressed size of the chunk (See Bassov: Col 16, lines 43-45; the size of the compressed chunk is compared to the original uncompressed size of the chunk input to compression), and designating a size to be specified for the size of the compression (See Bassov: Col 18, lines 17-22; denoting a number of 4 KB blocks specified as the size of the compression unit or compression granularity. For example, an X value of 4 means that the data set is partitioned into chunks having a size of 4 blocks where compression is performed with respect to each 4 block chunk or unit), Bassov does not explicitly teach the hash block size to be smaller than the compression block size.
However, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Shergill (teaches defining a set of data blocks within the data set based on the hash block size; generating a hash for each data block in the set of data blocks within the data set; deduplicating a data block in the data set based on a respective hash for the data block; and compressing the data set based on the compression block size) with the teachings of Arai (teaches compressing the data set based on the compression block size and refraining from compressing the data protection field; and adding the data protection field, identified as skipped data, to an electronic record for electronic storage…) with the further teachings of Bassov (teaches a method, comprising: identifying a data set to deduplicate based on a hash block size and to compress based on a compression block size). One of ordinary skill in the art would have been motivated to make such a combination in order to evaluate the block sizes and designate the hash block size to be smaller than the compression block size by choosing from a finite number of identified, predictable solutions (larger than, smaller than, or equal to) with a reasonable expectation of success, a rationale that may support a conclusion of obviousness (See KSR Int'l Co. v. Teleflex Inc., 550 U.S. 398, 82 USPQ2d 1385 (2007); see also MPEP 2143(I)(E)).
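As an illustration only (a hypothetical sketch, not drawn from any cited reference), deduplicating at a hash-block granularity selected independently of, and smaller than, the compression-block granularity can be outlined as:

```python
# Illustrative sketch only: deduplicate on small hash blocks, then compress
# the surviving data on larger, independently chosen compression blocks.
import hashlib
import zlib

HASH_BLOCK = 4 * 1024          # granularity for hashing/deduplication
COMPRESSION_BLOCK = 16 * 1024  # larger, independently selected granularity

def dedup_then_compress(data: bytes):
    seen = {}                  # digest -> first offset where the block appeared
    unique = bytearray()
    # Deduplicate: hash each hash-sized block and keep only the first copy.
    for off in range(0, len(data), HASH_BLOCK):
        block = data[off:off + HASH_BLOCK]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in seen:
            seen[digest] = off
            unique += block
    # Compress the deduplicated stream in compression-sized units.
    compressed = [zlib.compress(bytes(unique[off:off + COMPRESSION_BLOCK]))
                  for off in range(0, len(unique), COMPRESSION_BLOCK)]
    return seen, compressed

# 64 KB of identical data: sixteen 4 KB hash blocks collapse to one,
# and the single surviving 4 KB block is compressed as one unit.
refs, chunks = dedup_then_compress(b"\x00" * (64 * 1024))
```

Because the two sizes are independent parameters, either can be tuned without affecting the other, which is the design point at issue in the claims.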
Regarding claim 9, the modification of Shergill, Arai, and Bassov teaches the claimed invention substantially as claimed; however, the modification of Shergill, Arai, and Bassov does not explicitly teach the compression block size is at least twice a size of the hash block size.
Although Bassov teaches the chunks may be of varying or different sizes (See Bassov: Col 9, lines 34-37; deduplication technique utilized, the chunks…may be of varying or different sizes), comparing the size of the compressed chunk to the original uncompressed size of the chunk (See Bassov: Col 16, lines 43-45; the size of the compressed chunk is compared to the original uncompressed size of the chunk input to compression), and designating a size to be specified for the size of the compression (See Bassov: Col 18, lines 17-22; denoting a number of 4 KB blocks specified as the size of the compression unit or compression granularity. For example, an X value of 4 means that the data set is partitioned into chunks having a size of 4 blocks where compression is performed with respect to each 4 block chunk or unit), Bassov does not explicitly teach the compression block size is at least twice a size of the hash block size.
However, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Shergill (teaches defining a set of data blocks within the data set based on the hash block size; generating a hash for each data block in the set of data blocks within the data set; deduplicating a data block in the data set based on a respective hash for the data block; and compressing the data set based on the compression block size) with the teachings of Arai (teaches compressing the data set based on the compression block size and refraining from compressing the data protection field; and adding the data protection field, identified as skipped data, to an electronic record for electronic storage…) with the further teachings of Bassov (teaches a method, comprising: identifying a data set to deduplicate based on a hash block size and to compress based on a compression block size). One of ordinary skill in the art would have been motivated to make such a combination in order to evaluate the block sizes and designate the compression block size to be larger than the hash block size by choosing from a finite number of identified, predictable solutions (twice, triple, or larger) with a reasonable expectation of success, a rationale that may support a conclusion of obviousness (See KSR Int'l Co. v. Teleflex Inc., 550 U.S. 398, 82 USPQ2d 1385 (2007); see also MPEP 2143(I)(E)).
Regarding claim 12, the modification of Shergill, Arai, and Bassov teaches the claimed invention substantially as claimed, and Shergill further teaches receiving, via a user interface, user input indicative of the hash block size (Shergill: [0046]; In the illustrated embodiments, the system 10 includes a user interface that allows a user, such as an administrator, to input the prescribed data processing threshold. [0051]; For example, the system 10 may be configured to monitor the size of the portion of the LOB file 50 that has been processed by the de-duplication module 22. [0084]; the LOB file 50 is separated by the data receiving module 22, based on the prescribed processing threshold, into five blocks 52 a-52 e); and
setting the hash block size based on the user input (Shergill: [0064]; FIG. 7 illustrates an example in which the LOB file 50 is separated into five blocks (such as blocks 52 a-52 e shown in FIG. 2). Blocks 52 a-52 e have block identifiers “B1”-“B5,” respectively. The sizes of blocks 52 a-52 e are 250 kb, 250 kb, 248 kb, 252 kb, and 30 kb, respectively. [0073]; In the illustrated example, the prescribed unit size is 500 kb, which may be input by a user, such as an administrator).
Regarding claim 13, the modification of Shergill, Arai, and Bassov teaches the claimed invention substantially as claimed, and Shergill further teaches the data set is an electronic file, wherein the compression block size is equal to a size of the electronic file (Shergill: [0070]-[0072]; In other embodiments, the user interface may also allow the user to prescribe which data compression algorithm (which may or may not corresponds to a desired level of compression) to use for certain file based on the file type and/or file size…the data compression module 24 may be configured to determine compression efficiency on at least a portion of the LOB file, and automatically determines a level of data compression for the LOB file. Returning to FIG. 9, next, the data compression module 24 compresses the LOB file data based on the compression criteria obtained from Step 402 (Step 404), and then stores the compressed LOB file data in the database 14 (Step 406)).
Regarding claim 14, the modification of Shergill, Arai, and Bassov teaches the claimed invention substantially as claimed, and Shergill further teaches the deduplicated data block is not to be compressed (Shergill: [0069]-[0070]; On the other hand, if the de-duplication module 22 determines that the LOB file 50 is not yet stored, the de-duplication module 22 then passes the LOB file data to the data compression module 24. In the illustrated embodiments, the system 10 may include a user interface that allows a user, such as an administrator, to input the data compression criteria. For example, an administrator may prescribe one of four levels of compression, namely, “None,” “Low,” “Medium,” and “High” for a certain type or/and size of file. “None” compression is prescribed when no compression is desired to be performed for the file).
Regarding claim 15, Shergill teaches define a set of data blocks within the data set based on the hash block size, the set of data blocks being a subset of the data set (Shergill: [0047]-[0050]; calculate a hash value associated with the LOB file...LOB file 50 have certain prescribed size {NOTE: Shergill also teaches the hash block size being selected independent of the compression block size as the LOB file having a prescribed size setting is independent from the compression block size as the prescribing is based on hashing and searching for duplicates associated to the prescribed size});
generate a hash for each data block in the set of data blocks within the data set (Shergill: [0048]; upon receiving the first block 52 a of the LOB file 50, the de-duplication module 22 then calculates a hash value for the block 52 a. The de-duplication module 22 then checks to see if the calculated hash value can be found in the database 14…then continues to receive the next block 52 b of the LOB file 50, and calculates a second hash value using the data from block 52 b…calculates a third hash value using the data from block 52 b. The above process is repeated until the last block 52 (e.g., block 52 e) of the LOB file 50 is received and processed); and
deduplicate a data block in the set of data blocks based on a respective hash for the data block (Shergill: [0053]; requesting the system 10 to perform data de-duplication. In such cases, the system 10 determines that data de-duplication is desired if it receives a request from the client 12 to perform data de-duplication. [0056]; a request to the system 10 requesting that a LOB file having identifier “ABC” be stored, wherein the LOB file data is the same as that-of file “XYZ” stored for client 12 a. Because the LOB data for files “XYZ” and “ABC” are the same, the calculated hash value for the file “ABC” would be the same as the hash value for the “XYZ” file. The de-duplication module 22, upon checking the table 300, will determine that data being the same as that of the file “ABC” is already stored by the system 10 because the calculated hash value (“H3” in the example) can be found).
refrain from hashing a data protection field associated with the data set based on the data protection field being identified as a skip candidate, the data protection field not being deduplicated (Shergill: [0048]; upon receiving the first block 52 a of the LOB file 50, the de-duplication module 22 then calculates a hash value for the block 52 a. The de-duplication module 22 then checks to see if the calculated hash value can be found in the database 14…then continues to receive the next block 52 b of the LOB file 50, and calculates a second hash value using the data from block 52 b…calculates a third hash value using the data from block 52 b. The above process is repeated until the last block 52 (e.g., block 52 e) of the LOB file 50 is received and processed. [0052]; The de-duplication flag 310 is used to indicate whether duplication of already stored data is desired. In the illustrated example, the de-duplication flag 310 for the LOB file is set to “ON,” indicating that no duplication of already stored data is desired. [0101]; Compression unit “CU1” has an encryption flag that is set to “ON,” indicating that the compression unit “CU1” has been encrypted, wherein the encryption key is located at address “K1.” {The Examiner correlates that, because the encryption flag is set to “ON,” the data protection field associated with this hash should not be hashed, since it is sensitive data that requires encryption; and, with the de-duplication flag off in combination with the encryption flag being on, the data protection field would not be deduplicated. See Shergill, Fig. 16B (media_image1.png, greyscale)});
Shergill does not explicitly teach adding the data protection field, identified as skipped data, to an electronic record for electronic storage.
However, Arai teaches add the data protection field, identified as skipped data, to an electronic record for electronic storage, wherein the electronic record is configured to store a skip descriptor comprising an offset and a byte count for the data protection field (Arai: [0096]; For example, data protection information, e.g., a T10 DIF, which is often attached on a sector-by-sector basis in storage, may be detached and left in the compression information instead of being compressed. In a case where 8 B of T10 DIF is attached to 512 B of data, the data may be compressed in units of 512 B×four sectors, with 8 B×four sectors of T10 DIF information recorded in the compression information. In a case where sectors are 4,096 B and 8 B of T10 DIF is attached, 4,096 B are compressed and 8 B are recorded in the compression information {{The Examiner correlates the compression information to a metadata holder in which the 8 bytes of T10 DIF is treated as skipped data, defined by a skip descriptor and stored separately (in the compression information). The statement that “the data may be compressed in units of 512 B×four sectors, with 8 B×four sectors of T10 DIF information recorded in the compression information” indicates that the compression information must store an offset identifying where each 512 B of raw data ends and where the separately stored 8 B of T10 DIF information begins. The 8 B of T10 DIF that is detached and managed separately provides the byte count, and repeating this at the starting point of each four-sector unit provides the offset. Thus, a specific amount of data (the 8 bytes attached to each 512 B block) is left uncompressed and detached from its sector, which requires an offset locating the T10 DIF at the end of each sector}}).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Shergill (teaches defining a set of data blocks within the data set based on the hash block size; generating a hash for each data block in the set of data blocks within the data set; deduplicating a data block in the data set based on a respective hash for the data block; and compressing the data set based on the compression block size) with the teachings of Arai (teaches adding the data protection field, identified as skipped data, to an electronic record for electronic storage…). One of ordinary skill in the art would have been motivated to make such a combination because storing the data protection information without compressing it improves processing efficiency (See Arai: [0108]). In addition, the references (Shergill and Arai) are analogous art and are directed to the same field of endeavor, as both Shergill and Arai are directed to analyzing data block information and applying the corresponding criteria based on that analysis.
The modification of Shergill and Arai teaches the claimed invention substantially as claimed; however, the modification of Shergill and Arai does not explicitly teach a system, comprising: a memory; and a processing unit coupled to the memory, the processing unit being configured to cause the system to perform operations comprising: identify a data set to deduplicate based on a hash block size.
However, Bassov teaches a system, comprising: a memory; and a processing unit coupled to the memory (Bassov: Col 13, lines 4-6; the processor performs processing, such as in connection with inline processing 105a, 105b as noted above, data may be loaded from main memory), the processing unit being configured to cause the system to perform operations comprising: identify a data set to deduplicate based on a hash block size (Bassov: Col 9, lines 34-37; depending on the data deduplication technique utilized, the chunks of 210 may be of varying or different sizes. Each chunk is provided as an input to hash function 215. Col 9, lines 40-42; each chunk of 210, the hash function 215 may perform processing and generate, as an output, a hash value or digest. Col 10, lines 24-28; to map the digest to a corresponding table entry varies…when storing a new data chunk).
Although Bassov teaches the chunks may be of varying or different sizes (See Bassov: Col 9, lines 34-37; deduplication technique utilized, the chunks…may be of varying or different sizes), comparing the size of the compressed chunk to the original uncompressed size of the chunk (See Bassov: Col 16, lines 43-45; the size of the compressed chunk is compared to the original uncompressed size of the chunk input to compression), and designating a size to be specified for the size of the compression (See Bassov: Col 18, lines 17-22; denoting a number of 4 KB blocks specified as the size of the compression unit or compression granularity. For example, an X value of 4 means that the data set is partitioned into chunks having a size of 4 blocks where compression is performed with respect to each 4 block chunk or unit), Bassov does not explicitly teach the hash block size to be smaller than the compression block size.
However, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Shergill (teaches defining a set of data blocks within the data set based on the hash block size; generating a hash for each data block in the set of data blocks within the data set; deduplicating a data block in the data set based on a respective hash for the data block; and compressing the data set based on the compression block size) with the teachings of Arai (teaches refrain from hashing a particular data block within the data set based on the particular data block being identified as a skip candidate, the particular data block not being deduplicated…) with the further teachings of Bassov (teaches a method, comprising: identifying a data set to deduplicate based on a hash block size and to compress based on a compression block size). One of ordinary skill in the art would have been motivated to make such a combination in order to evaluate the block sizes and designate the hash block size to be smaller than the compression block size by choosing from a finite number of identified, predictable solutions (larger than, smaller than, or equal to) with a reasonable expectation of success, a rationale that may support a conclusion of obviousness (See KSR Int'l Co. v. Teleflex Inc., 550 U.S. 398, 82 USPQ2d 1385 (2007); see also MPEP 2143(I)(E)).
Regarding claim 16, the modification of Shergill, Arai, and Bassov teaches the claimed invention substantially as claimed; however, the modification of Shergill, Arai, and Bassov does not explicitly teach the compression block size is at least twice a size of the hash block size.
Although Bassov teaches the chunks may be of varying or different sizes (See Bassov: Col 9, lines 34-37; deduplication technique utilized, the chunks…may be of varying or different sizes), comparing the size of the compressed chunk to the original uncompressed size of the chunk (See Bassov: Col 16, lines 43-45; the size of the compressed chunk is compared to the original uncompressed size of the chunk input to compression), and designating a size to be specified for the size of the compression (See Bassov: Col 18, lines 17-22; denoting a number of 4 KB blocks specified as the size of the compression unit or compression granularity. For example, an X value of 4 means that the data set is partitioned into chunks having a size of 4 blocks where compression is performed with respect to each 4 block chunk or unit), Bassov does not explicitly teach the compression block size is at least twice a size of the hash block size.
However, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Shergill (teaches defining a set of data blocks within the data set based on the hash block size; generating a hash for each data block in the set of data blocks within the data set; deduplicating a data block in the data set based on a respective hash for the data block; and compressing the data set based on the compression block size) with the teachings of Arai (teaches refrain from hashing a particular data block within the data set based on the particular data block being identified as a skip candidate, the particular data block not being deduplicated…) with the further teachings of Bassov (teaches a method, comprising: identifying a data set to deduplicate based on a hash block size and to compress based on a compression block size). One of ordinary skill in the art would have been motivated to make such a combination in order to evaluate the block sizes and designate the compression block size to be larger than the hash block size by choosing from a finite number of identified, predictable solutions (twice, triple, or larger) with a reasonable expectation of success, a rationale that may support a conclusion of obviousness (See KSR Int'l Co. v. Teleflex Inc., 550 U.S. 398, 82 USPQ2d 1385 (2007); see also MPEP 2143(I)(E)).
Regarding claim 19, the modification of Shergill, Arai, and Bassov teaches the claimed invention substantially as claimed, and Shergill further teaches receive, via a user interface, user input indicative of the hash block size (Shergill: [0046]; In the illustrated embodiments, the system 10 includes a user interface that allows a user, such as an administrator, to input the prescribed data processing threshold. [0051]; For example, the system 10 may be configured to monitor the size of the portion of the LOB file 50 that has been processed by the de-duplication module 22. [0084]; the LOB file 50 is separated by the data receiving module 22, based on the prescribed processing threshold, into five blocks 52 a-52 e); and
set the hash block size based on the user input (Shergill: [0064]; FIG. 7 illustrates an example in which the LOB file 50 is separated into five blocks (such as blocks 52 a-52 e shown in FIG. 2). Blocks 52 a-52 e have block identifiers “B1”-“B5,” respectively. The sizes of blocks 52 a-52 e are 250 kb, 250 kb, 248 kb, 252 kb, and 30 kb, respectively. [0073]; In the illustrated example, the prescribed unit size is 500 kb, which may be input by a user, such as an administrator).
Regarding claim 20, the modification of Shergill, Arai, and Bassov teaches the claimed invention substantially as claimed, and Shergill further teaches the data set is an electronic file, wherein the compression block size is equal to a size of the electronic file (Shergill: [0070]-[0072]; In other embodiments, the user interface may also allow the user to prescribe which data compression algorithm (which may or may not corresponds to a desired level of compression) to use for certain file based on the file type and/or file size…the data compression module 24 may be configured to determine compression efficiency on at least a portion of the LOB file, and automatically determines a level of data compression for the LOB file. Returning to FIG. 9, next, the data compression module 24 compresses the LOB file data based on the compression criteria obtained from Step 402 (Step 404), and then stores the compressed LOB file data in the database 14 (Step 406)).
Claims 3-4, 10-11, and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication 2008/0098083 issued to Shergill et al. (hereinafter as “Shergill”) in view of U.S. Patent Application Publication 2016/0342545 issued to ARAI et al. (hereinafter as “ARAI”), in view of U.S. Patent 10,452,616 issued to Bassov et al. (hereinafter as “Bassov”), and in further view of U.S. Patent Application Publication 2010/0250896 issued to John Edward Gerard Matze (hereinafter as “Matze”).
Regarding claim 3, the modification of Shergill, ARAI, and Bassov teaches the claimed invention substantially as claimed; however, the modification of Shergill, ARAI, and Bassov does not explicitly teach the hash block size is a divisor of the compression block size.
Matze teaches the hash block size is a divisor of the compression block size (Matze: [0021]; In one embodiment, a system for deduplicating data may comprise a dedicated hardware card operable to receive at least one data block. A processor on the card may generate a hash for each data block. The system may also comprise a first module that determines a processing status for the hash. The processing status may indicate whether the data block associated with the hash is unique. [0023]; Each VBD can be configured to use different deduplication block sizes. For example, the deduplication block size may be set at 4 k, 8 k, 16 k, or 32 k. If the block size is set at 4 k, then a file will be broken into however many 4 k sized blocks are necessary to contain the file).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Shergill (teaches defining a set of data blocks within the data set based on the hash block size; generating a hash for each data block in the set of data blocks within the data set; deduplicating a data block in the data set based on a respective hash for the data block; and compressing the data set based on the compression block size) with the teachings of ARAI (teaches compressing the data set based on the compression block size and refraining from compressing the data protection field; and adding the data protection field, identified as skipped data, to an electronic record for electronic storage…) with the teachings of Bassov (teaches a method, comprising: identifying a data set to deduplicate based on a hash block size and to compress based on a compression block size) with the further teachings of Matze (teaches the hash block size is a divisor of the compression block size). One of ordinary skill in the art would have been motivated to make such a combination to obtain better results in evaluating the data set and determining the block size accordingly for deduplication (See Matze: [0023]; configured to provide improved performance for different applications having different performance requirements). In addition, the references (Shergill, Arai, Bassov, and Matze) are analogous art and are directed to the same field of endeavor, as Shergill, ARAI, Bassov, and Matze are all directed to analyzing data block information and applying the corresponding criteria based on that analysis.
Regarding claim 4, the modification of Shergill, ARAI, and Bassov teaches the claimed invention substantially as claimed; however, the modification of Shergill, ARAI, and Bassov does not explicitly teach the compression block size is not an integer multiple of the hash block size, wherein a second data block of the set of data blocks has a unique hash block size, wherein compressing the data set based on the compression block size comprises at least one of: refraining from compressing the second data block; or discarding a hash that is associated with the second data block.
Matze teaches the compression block size is not an integer multiple of the hash block size (Matze: [0019]; The method additionally comprises discarding the hash value and the block of data if the hash value is not unique, and writing the block of data to a disk if the hash value is unique. [0023]; Each VBD can be configured to use different deduplication block sizes. For example, the deduplication block size may be set at 4 k, 8 k, 16 k, or 32 k. If the block size is set at 4 k, then a file will be broken into however many 4 k sized blocks are necessary to contain the file. [0030]; As shown in FIG. 4, in one embodiment, each LCN is then mapped to addresses in the Virtual Block Device), wherein
a second data block of the set of data blocks has a unique hash block size (Matze: [0019]; The method additionally comprises discarding the hash value and the block of data if the hash value is not unique, and writing the block of data to a disk if the hash value is unique. Furthermore, data compression services may be used in conjunction with data deduplication to further minimize the required storage space for a given dataset), wherein compressing the data set based on the compression block size comprises at least one of: refraining from compressing the second data block; or discarding a hash that is associated with the second data block (Matze: [0019]; The method additionally comprises discarding the hash value and the block of data if the hash value is not unique, and writing the block of data to a disk if the hash value is unique. Furthermore, data compression services may be used in conjunction with data deduplication to further minimize the required storage space for a given dataset).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Shergill (teaches defining a set of data blocks within the data set based on the hash block size; generating a hash for each data block in the set of data blocks within the data set; deduplicating a data block in the data set based on a respective hash for the data block; and compressing the data set based on the compression block size) with the teachings of ARAI (teaches identifying a particular data block to skip, the particular data block to skip not being deduplicated…) with the teachings of Bassov (teaches a method, comprising: identifying a data set to deduplicate based on a hash block size and to compress based on a compression block size) with the further teachings of Matze (teaches the compression block size is not an integer multiple of the hash block size). One of ordinary skill in the art would have been motivated to make such a combination to obtain better results in evaluating the data set and determining the block size accordingly for deduplication (See Matze: [0023]; configured to provide improved performance for different applications having different performance requirements). In addition, the references (Shergill, ARAI, Bassov, and Matze) are analogous art and are directed to the same field of endeavor, as Shergill, ARAI, Bassov, and Matze are all directed to analyzing data block information and applying the corresponding criteria based on that analysis.
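As an illustration only (a hypothetical sketch, not from the cited references), the claim 4 scenario, in which the compression block size is not an integer multiple of the hash block size so that the trailing block has a unique size and one option is to refrain from compressing it, can be outlined as:

```python
# Illustrative sketch only: when the compression block size is not an
# integer multiple of the hash block size, partitioning a compression
# unit into hash blocks leaves one odd-sized trailing block; per the
# claim language, that block may be left uncompressed.
HASH_BLOCK = 4096
COMPRESSION_BLOCK = 10000  # not an integer multiple of 4096

def partition(comp_unit: bytes):
    """Return the full hash-sized blocks plus any odd-sized leftover."""
    full, leftover = [], b""
    for off in range(0, len(comp_unit), HASH_BLOCK):
        piece = comp_unit[off:off + HASH_BLOCK]
        if len(piece) == HASH_BLOCK:
            full.append(piece)   # hashed and eligible for compression
        else:
            leftover = piece     # unique size: refrained from compression
    return full, leftover

full, leftover = partition(b"x" * COMPRESSION_BLOCK)
```

Here 10,000 B yields two full 4,096 B hash blocks and a 1,808 B trailing block of unique size, which is the "second data block" case the claim addresses.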
Regarding claim 10, the modification of Shergill, ARAI, and Bassov teaches the claimed invention substantially as claimed; however, the modification of Shergill, ARAI, and Bassov does not explicitly teach the hash block size is a divisor of the compression block size.
Matze teaches the hash block size is a divisor of the compression block size (Matze: [0021]; In one embodiment, a system for deduplicating data may comprise a dedicated hardware card operable to receive at least one data block. A processor on the card may generate a hash for each data block. The system may also comprise a first module that determines a processing status for the hash. The processing status may indicate whether the data block associated with the hash is unique. [0023]; Each VBD can be configured to use different deduplication block sizes. For example, the deduplication block size may be set at 4 k, 8 k, 16 k, or 32 k. If the block size is set at 4 k, then a file will be broken into however many 4 k sized blocks are necessary to contain the file).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Shergill (which teaches defining a set of data blocks within the data set based on the hash block size; generating a hash for each data block in the set of data blocks within the data set; deduplicating a data block in the data set based on a respective hash for the data block; and compressing the data set based on the compression block size) with the teachings of ARAI (which teaches identifying a particular data block to skip, the particular data block to skip not being deduplicated…), with the teachings of Bassov (which teaches a method comprising: identifying a data set to deduplicate based on a hash block size and to compress based on a compression block size), and with the further teachings of Matze (which teaches that the hash block size is a divisor of the compression block size). One of ordinary skill in the art would have been motivated to make such a combination in order to obtain better results in evaluating the data set and to determine the block size for deduplication accordingly by reviewing the data set (See Matze: [0023]; configured to provide improved performance for different applications having different performance requirements). In addition, the references (Shergill, ARAI, Bassov, and Matze) are analogous art directed to the same field of endeavor, as Shergill, ARAI, Bassov, and Matze are all directed to deduplicating and compressing data based on the received data.
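For illustration only, the divisor relationship discussed above can be sketched as follows. This is a minimal sketch, not drawn from the record: the block sizes follow the examples in Matze [0023], while the hash function, compressor, and all identifiers are hypothetical choices made solely to illustrate deduplication at a hash block size that evenly divides the compression block size.

```python
import hashlib
import zlib

HASH_BLOCK = 4096            # deduplication granularity (per Matze [0023] examples)
COMP_BLOCK = 16384           # compression granularity
assert COMP_BLOCK % HASH_BLOCK == 0   # the hash block size is a divisor

def dedupe_and_compress(data: bytes):
    seen = {}        # digest -> index of the first occurrence of that block
    unique = []      # unique hash-sized blocks retained for compression
    layout = []      # per input block: index into `unique` (the dedupe map)
    for off in range(0, len(data), HASH_BLOCK):
        block = data[off:off + HASH_BLOCK]
        digest = hashlib.sha256(block).digest()
        if digest in seen:               # duplicate: discard the block, keep a reference
            layout.append(seen[digest])
        else:                            # unique: retain the block for compression
            seen[digest] = len(unique)
            layout.append(len(unique))
            unique.append(block)
    # Compress the retained data in compression-block-sized chunks; because
    # HASH_BLOCK divides COMP_BLOCK, chunk boundaries never split a hash block.
    stream = b"".join(unique)
    chunks = [zlib.compress(stream[i:i + COMP_BLOCK])
              for i in range(0, len(stream), COMP_BLOCK)]
    return layout, chunks
```

In this sketch, three 4 KiB input blocks of which two are identical deduplicate to two retained blocks, and the retained data is then compressed in 16 KiB units whose boundaries align with the hash blocks.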
Regarding claim 11, the modification of Shergill, ARAI, and Bassov teaches the claimed invention substantially as claimed; however, the modification of Shergill, ARAI, and Bassov does not explicitly teach that the compression block size is not an integer multiple of the hash block size, wherein a second data block of the set of data blocks has a unique hash block size, wherein the third instruction to compress the data set based on the compression block size comprises a fourth instruction to perform at least one of: refraining from compressing the second data block; or discarding a hash that is associated with the second data block.
Matze teaches the compression block size is not an integer multiple of the hash block size (Matze: [0019]; The method additionally comprises discarding the hash value and the block of data if the hash value is not unique, and writing the block of data to a disk if the hash value is unique. [0023]; Each VBD can be configured to use different deduplication block sizes. For example, the deduplication block size may be set at 4 k, 8 k, 16 k, or 32 k. If the block size is set at 4 k, then a file will be broken into however many 4 k sized blocks are necessary to contain the file. [0030]; As shown in FIG. 4, in one embodiment, each LCN is then mapped to addresses in the Virtual Block Device), wherein
a second data block of the set of data blocks has a unique hash block size (Matze: [0019]; The method additionally comprises discarding the hash value and the block of data if the hash value is not unique, and writing the block of data to a disk if the hash value is unique. Furthermore, data compression services may be used in conjunction with data deduplication to further minimize the required storage space for a given dataset), wherein
the third instruction to compress the data set based on the compression block size comprises a fourth instruction to perform at least one of: refraining from compressing the second data block; or discarding a hash that is associated with the second data block (Matze: [0019]; The method additionally comprises discarding the hash value and the block of data if the hash value is not unique, and writing the block of data to a disk if the hash value is unique. Furthermore, data compression services may be used in conjunction with data deduplication to further minimize the required storage space for a given dataset).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Shergill (which teaches defining a set of data blocks within the data set based on the hash block size; generating a hash for each data block in the set of data blocks within the data set; deduplicating a data block in the data set based on a respective hash for the data block; and compressing the data set based on the compression block size) with the teachings of ARAI (which teaches identifying a particular data block to skip, the particular data block to skip not being deduplicated…), with the teachings of Bassov (which teaches a method comprising: identifying a data set to deduplicate based on a hash block size and to compress based on a compression block size), and with the further teachings of Matze (which teaches that the hash block size is a divisor of the compression block size). One of ordinary skill in the art would have been motivated to make such a combination in order to obtain better results in evaluating the data set and to determine the block size for deduplication accordingly by reviewing the data set (See Matze: [0023]; configured to provide improved performance for different applications having different performance requirements). In addition, the references (Shergill, ARAI, Bassov, and Matze) are analogous art directed to the same field of endeavor, as Shergill, ARAI, Bassov, and Matze are all directed to deduplicating and compressing data based on the received data.
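For illustration only, the "refrain from compressing / discard the hash" alternative recited in claim 11 could operate as sketched below. This is a minimal sketch, not drawn from the record: the 10000-byte compression unit is a hypothetical value chosen so that the 4096-byte hash block size does not divide it, and every identifier and routine here is an illustrative assumption.

```python
import hashlib
import zlib

HASH_BLOCK = 4096
COMP_BLOCK = 10000   # deliberately NOT an integer multiple of HASH_BLOCK

def compress_with_leftover(data: bytes):
    """Compress only whole compression-unit-sized spans; refrain from
    compressing the misaligned remainder and discard its hash rather than
    entering it into the deduplication index."""
    compressed = []
    full = (len(data) // COMP_BLOCK) * COMP_BLOCK
    for i in range(0, full, COMP_BLOCK):
        compressed.append(zlib.compress(data[i:i + COMP_BLOCK]))
    tail = data[full:]
    if tail:
        _ = hashlib.sha256(tail).digest()   # hash computed, then discarded
    return compressed, tail                 # tail is stored uncompressed
```

Under this sketch, a 24096-byte input yields two compressed 10000-byte units and a 4096-byte remainder block that is left uncompressed with its hash discarded.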
Regarding claim 17, the modification of Shergill, ARAI, and Bassov teaches the claimed invention substantially as claimed; however, the modification of Shergill, ARAI, and Bassov does not explicitly teach that the hash block size is a divisor of the compression block size.
Matze teaches the hash block size is a divisor of the compression block size (Matze: [0021]; In one embodiment, a system for deduplicating data may comprise a dedicated hardware card operable to receive at least one data block. A processor on the card may generate a hash for each data block. The system may also comprise a first module that determines a processing status for the hash. The processing status may indicate whether the data block associated with the hash is unique. [0023]; Each VBD can be configured to use different deduplication block sizes. For example, the deduplication block size may be set at 4 k, 8 k, 16 k, or 32 k. If the block size is set at 4 k, then a file will be broken into however many 4 k sized blocks are necessary to contain the file).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Shergill (which teaches defining a set of data blocks within the data set based on the hash block size; generating a hash for each data block in the set of data blocks within the data set; deduplicating a data block in the data set based on a respective hash for the data block; and compressing the data set based on the compression block size) with the teachings of ARAI (which teaches identifying a particular data block to skip, the particular data block to skip not being deduplicated…), with the teachings of Bassov (which teaches a method comprising: identifying a data set to deduplicate based on a hash block size and to compress based on a compression block size), and with the further teachings of Matze (which teaches that the hash block size is a divisor of the compression block size). One of ordinary skill in the art would have been motivated to make such a combination in order to obtain better results in evaluating the data set and to determine the block size for deduplication accordingly by reviewing the data set (See Matze: [0023]; configured to provide improved performance for different applications having different performance requirements). In addition, the references (Shergill, ARAI, Bassov, and Matze) are analogous art directed to the same field of endeavor, as Shergill, ARAI, Bassov, and Matze are all directed to deduplicating and compressing data based on the received data.
Regarding claim 18, the modification of Shergill, ARAI, and Bassov teaches the claimed invention substantially as claimed; however, the modification of Shergill, ARAI, and Bassov does not explicitly teach that the compression block size is not an integer multiple of the hash block size, wherein a second data block of the set of data blocks has a unique hash block size.
Matze teaches the compression block size is not an integer multiple of the hash block size (Matze: [0019]; The method additionally comprises discarding the hash value and the block of data if the hash value is not unique, and writing the block of data to a disk if the hash value is unique. [0023]; Each VBD can be configured to use different deduplication block sizes. For example, the deduplication block size may be set at 4 k, 8 k, 16 k, or 32 k. If the block size is set at 4 k, then a file will be broken into however many 4 k sized blocks are necessary to contain the file. [0030]; As shown in FIG. 4, in one embodiment, each LCN is then mapped to addresses in the Virtual Block Device), wherein
a second data block of the set of data blocks has a unique hash block size (Matze: [0019]; The method additionally comprises discarding the hash value and the block of data if the hash value is not unique, and writing the block of data to a disk if the hash value is unique. Furthermore, data compression services may be used in conjunction with data deduplication to further minimize the required storage space for a given dataset).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Shergill (which teaches defining a set of data blocks within the data set based on the hash block size; generating a hash for each data block in the set of data blocks within the data set; deduplicating a data block in the data set based on a respective hash for the data block; and compressing the data set based on the compression block size) with the teachings of ARAI (which teaches identifying a particular data block to skip, the particular data block to skip not being deduplicated…), with the teachings of Bassov (which teaches a method comprising: identifying a data set to deduplicate based on a hash block size and to compress based on a compression block size), and with the further teachings of Matze (which teaches that the hash block size is a divisor of the compression block size). One of ordinary skill in the art would have been motivated to make such a combination in order to obtain better results in evaluating the data set and to determine the block size for deduplication accordingly by reviewing the data set (See Matze: [0023]; configured to provide improved performance for different applications having different performance requirements). In addition, the references (Shergill, ARAI, Bassov, and Matze) are analogous art directed to the same field of endeavor, as Shergill, ARAI, Bassov, and Matze are all directed to deduplicating and compressing data based on the received data.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
U.S. Patent Application Publication 2012/0221533 issued to Burness et al. (hereinafter as “Burness”) teaches a hierarchical compression which employs a grouping of data blocks that are stored in a storage device, where each stored data block indicates a data portion and a data integrity field.
U.S. Patent 11,055,265 issued to Wang et al. (hereinafter as “Wang”) teaches scaling deduplication of files among a plurality of nodes by dividing the files to be deduplicated among the nodes.
U.S. Patent 10,761,758 issued to Doerner et al. (hereinafter as “Doerner”) teaches data awareness for deduplicating data based on consistent hashing on the object store by performing variable-length deduplication and providing a shared-nothing approach.
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANDREW N HO whose telephone number is (571)270-0590. The examiner can normally be reached Tuesday and Thursday 10:00-6:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sherief Badawi, can be reached at (571) 272-9782. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
3/9/2026
/ANDREW N HO/Examiner
Art Unit 2169
/SHERIEF BADAWI/Supervisory Patent Examiner, Art Unit 2169