DETAILED ACTION
This Action is responsive to the RCE filed on 02/18/2026.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 02/18/2026 has been entered.
Claim Status
Claims 1 and 3-20 are amended. Claim 2 is cancelled. Claims 1 and 3-20 are pending and have been examined.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 1, 3-9, and 16-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claims 1 and 16 recite the limitations “the locally stored information managed and tracked before any request for transferring data” in Lines 10-11 and 12-13, respectively.
Examiner notes that the instant Specification does not explicitly disclose the aforementioned features (i.e., managing and tracking local information before any request for transferring data) using the exact language recited in the independent claims. Further, having reviewed the instant Specification, examiner cannot identify any disclosure of a temporal requirement that local information be managed and tracked “before any” request for transferring data is received. Examiner notes that the instant Specification ¶0046 generally discloses that a database stores (i.e., ‘manage[s]’) fingerprint information (“the locally stored information”) in order to track data patterns (i.e., fingerprint information is generally related to tracking). However, the instant Specification does not appear to disclose an instance where fingerprint information is stored into a database before any request for transferring data is received.
Accordingly, the instant specification does not provide evidence that applicant had possession of the invention now recited in the aforementioned limitations. Therefore, the claims fail to comply with the written description requirement.
Claims 3-9 and 17-20 are similarly rejected under 35 U.S.C. 112(a) due to their respective claim dependencies.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 3, and 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over Wu et al. (US 20180173732 A1)(cited by examiner in previous action)(hereafter referred to as Wu) further in view of Armangau et al. (US 20240028229 A1)(cited by examiner in previous action)(hereafter referred to as Armangau).
Regarding Claim 1,
Wu discloses the following limitations:
A method for managing transfer of data from a source storage system (Client 210, Fig. 2), the method comprising:
obtaining (Fig. 3, step 306), from one or more volumes (“the data in need of backup” [0053]) of a source storage system (Client 210, Fig. 2), first fingerprint information (“a signature” [0053])(“After the client 210 samples the data in need of backup, a signature is generated for the sampled data … At 306, the signature is transmitted to a master storage node in the storage cluster including a plurality of storage nodes, to allow the master storage node to select one storage nodes from the plurality of storage nodes” [0053]) – As shown in Figs. 2 + 3 and detailed in ¶0053, during step 306, a client 210 provides a signature associated with data to be backed up to a master storage node 221. One of ordinary skill in the art would understand that data in need of backup would at least include data from “one or more volumes”--;
providing (Fig. 4, steps 404 + 406) the first fingerprint information to candidate storage systems (Storage Nodes 221-223, Fig. 2 // “a plurality of storage nodes” [0056])(“the master storage node 221 distributes the signature to at least one slave storage node” [0057]);
receiving (Fig. 4, steps 404 + 406), from the candidate storage systems, estimates (“remote” + “local matching information” [0057-58])(“allow the one or more slave storage node to determine remote matching information … the remote matching information fed back by the at least one slave storage node, from the plurality of storage nodes for storing the data to be backed up” [0057-58] // “Meanwhile, the local matching information indicating the matching degree between the sampled data and data stored in the master storage node is determined based on the signature of the sampled data at 406” [0057] // ¶0059) – As shown in Fig. 4 steps 404 + 406 and detailed in ¶¶0057-58, a master storage node receives “remote” and “local matching information” (i.e., receives “estimates”) from storage nodes-- indicative of post-transfer storage consumption based on deduplication effectiveness (“the client 210 routes the data to be backed up to a suitable storage node based on a routing strategy … achieves higher data de-duplication rate, thereby saving storage space of the storage cluster system” [0055] // “the matching information indicates the number of data segments matched in each respective node” [0059] // ¶0004) – As disclosed in ¶0055, the matching information is used to route backup data to a particular storage node such that a “higher data de-duplication rate” is achieved (i.e., indicates “deduplication effectiveness”) and “storage space” is saved in the storage cluster (i.e., “indicative of post-transfer storage consumption”)--,
wherein each candidate storage system generates its estimate by comparing the first fingerprint information with second fingerprint information (“the local storage data” + “data stored in the master storage node” [0057]; see also “the fingerprints of locally stored data” [0074]) stored locally to identify duplicate data blocks based on data patterns (“the master storage node 221 distributes the received signature of the sampled data to the slave storage nodes 222 and 223, such that the slave storage nodes 222 and 223 may determine matching information between the sampled data and the local storage data by lookup and matching operations. Meanwhile, the local matching information indicating the matching degree between the sampled data and the data stored in the master storage node is determined based on the signature of the sampled data 406. For example, the master storage 221 may look up the signature (or fingerprint(s)) to the locally stored data, and further match and compare the received signature of the sample data to obtain the corresponding matching information” [0057]) – As taught in ¶0057, the “local storage data” and the “data stored in the master storage node” (i.e., “second fingerprint information”), which are compared to the signature (“the first fingerprint information”), are stored locally on their respective nodes-- …
based on the estimates, selecting (Fig. 4, step 408), among the candidate storage systems, one or more destination storage systems (“the selected target storage node” [0058]) to minimize total storage consumption across the selected destination storage systems (“At 408, a storage node is selected, at least based on the local matching information and the remote matching information fed back by the at least one slave storage node, from the plurality of storage nodes for storing the data to be backed up. Then, a first indication of the selected target storage node is transmitted to the client at 410” [0058] // “selecting the storage node corresponding to the maximum number of data segments matched” [0059]) – As taught in ¶¶0058-59, the selected target node for receiving the backup data corresponds to the node having “the maximum number of data segments matched”. One of ordinary skill in the art would accordingly understand that within the context of data deduplication (see; e.g., Wu ¶0004), the node identified as having the maximum number of matching data segments correlates to the node requiring the minimum total storage consumption for storing the data because deduplication removes redundant matching segments to improve storage capacity (i.e., “to minimize total storage consumption”)--; and
transferring (Fig. 3, step 310) the data from the one or more volumes to at least the selected storage systems (“At 308, an indication of the selected storage node is received from the master storage node … Correspondingly, the client 210 receives the indication of the selected storage node. Afterwards, data to be backed up is transmitted to the selected target storage node based on the indication at 310. For example, the client 210 transmits the data to be backed up directly to a storage node in the storage cluster system 220 as indicated” [0054]),
wherein transferring comprises deduplicating and compressing the data using the identified duplicate data blocks to reduce an amount of data … (“When the client 110 has a need for backing up data, it may alternately or randomly select the storage nodes for backup storage, for example to “route” the data to the corresponding storage nodes. Then the storage nodes employ the de-duplication technology to de-duplicate the data and save storage space accordingly.” [0040] // “De-duplication technology is a special data compression technology based on removal of redundant data with the purpose of reducing storage capacity used in the storage system” [0004]).
Although Wu ¶¶0057-58 discloses that “the local storage data” of the slave nodes and the “data stored in the master node,” which are used to generate the local and remote matching information (i.e., “second fingerprint information” used to generate “estimates”), are stored on the respective slave and master nodes, Wu does not appear to explicitly disclose that this local storage data is stored locally on those nodes prior to the nodes receiving the signature and generating the corresponding matching information. Additionally, although Wu discloses that employing de-duplication saves storage space, Wu does not explicitly disclose that the data transmitted to the selected storage node is first deduplicated prior to transmission, thus reducing an amount of data transferred. Specifically, Wu does not explicitly disclose the following limitations:
the locally stored second fingerprint information managed and tracked before any request for transferring of data …
to reduce an amount of data transferred
However, Armangau discloses the following limitations:
the locally stored second fingerprint information (“second fingerprints” [0051]) managed and tracked before any request for transferring of data (“in advance of the replication activities described, such that the digest database 142 at the destination natively stores its fingerprints as second fingerprints … e.g., upon ingest when blocks are initially received. Thus, fingerprints stored in the digest database 142 at the destination are computed as second fingerprints 204, such that the second fingerprints are already present and waiting to be compared with first fingerprints 202” [0051]) …
to reduce an amount of data transferred (Fig. 2, steps 250 – 270 // “At step 250, the destination stores the matching first set of blocks 180s by reference to the second set of blocks 180d … without having to copy the data from source to destination as part of the snapshot-shipping update” [0042] // ¶¶0042-44) – As shown in Fig. 2, data identified by the destination as duplicate blocks are not transmitted from the source to the destination, effectively deduplicating the data and thereby reducing an amount of data transferred from source to destination.
Wu and Armangau are considered analogous to the claimed invention because they both relate to the same field of performing data deduplication in distributed storage environments. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wu with the teachings of Armangau and realize a method whereby candidate storage systems store local second fingerprint information prior to receiving a data transfer request. Doing so would ensure that fingerprints are available to the candidate storage system for performing deduplication, reducing the amount of time required to identify duplicate data, as disclosed in Armangau ¶0051: “Preferably, the destination 116d performs the depicted acts of section 360 in advance of replication activities described, such that the digest database 142 at the destination natively stores its fingerprints as second fingerprints 204 … e.g., upon data ingest when blocks are initially received. Thus, … the second fingerprints 204 are already present and waiting to be compared with first fingerprints 202.” [0051]
Regarding Claim 3,
The same motivation to combine provided in Claim 1 is equally applicable to Claim 3. The combined teachings of Wu and Armangau disclose the following limitations:
The method of claim 1, wherein selecting (Wu, Fig. 4, step 408) one or more destination storage systems comprises selecting a destination storage system associated with a lowest estimate among the estimates (Wu, “At 408, a storage node is selected, at least based on the local matching information and the remote matching information fed back by the at least one slave storage node … In one embodiment … selecting the storage node corresponding to the maximum number of data segments matched” [0058-59]) – As disclosed in Wu ¶¶0058-59, a storage node having the maximum number of matching data segments can be selected as the target storage node. As previously discussed (see Claim 1 limitation mappings above), a rate of matching data segments corresponds to an estimated deduplication effectiveness. One of ordinary skill in the art would accordingly understand that a storage node having the “maximum number of matching data segments” amongst a plurality of storage nodes would in effect correspond to “the lowest estimate among the estimates” (i.e., the lowest estimated amount of data storage amongst all estimated amounts of data storage).
Regarding Claim 16,
Wu discloses the following limitations:
A system (Data Processing System 200, Fig. 2) for reducing a data transfer volume, the system comprising:
a management system (Storage Cluster System 220, Fig. 2) configured to receive (Fig. 3, step 306) first information (“a signature” [0053])(“After the client 210 samples the data in need of backup, a signature is generated for the sampled data … At 306, the signature is transmitted to a master storage node in the storage cluster including a plurality of storage nodes, to allow the master storage node to select one storage nodes from the plurality of storage nodes” [0053]) – As shown in Figs. 2 + 3 and detailed in ¶0053, during step 306, a client 210 provides a signature associated with data to be backed up to a master storage node 221-- associated with at least one of fingerprint information (Wu, “the signature of the sampled data may be fingerprints of the sampled data segments” [0056]),
a data access pattern, a data duplication rate, a volume label, or a compression ratio;
a source storage (Client 210, Fig. 2) communicatively coupled to the management system (Fig. 2), the source storage configured to provide (Fig. 3, step 306) the first information to the management system (¶0053); and
candidate storage systems (Storage Nodes 221-223, Fig. 2 // “a plurality of storage nodes” [0056])(“the master storage node 221 distributes the signature to at least one slave storage node” [0057]) communicatively coupled to the management system (Fig. 2),
each of the candidate storage systems is configured to use the first information to generate (Fig. 4, steps 404 + 406) an estimate (“remote” + “local matching information” [0057-58])(“After receiving the signature of the sampled data, the master storage node 221 distributes the signature to at least one slave storage node … so as to allow the at least one slave storage node to determine remote matching information … Meanwhile, the local matching information … is determined based on the signature of the sampled data at 406.” [0057]) – As shown in Fig. 4 steps 404 + 406 and disclosed in ¶0057, the signature of data to be backed up is used to generate both “remote” and “local matching information” corresponding to “a matching degree” between sampled and stored data.-- indicative of post-transfer storage consumption based on deduplication effectiveness (“the client 210 routes the data to be backed up to a suitable storage node based on a routing strategy … achieves higher data de-duplication rate, thereby saving storage space of the storage cluster system” [0055] // “the matching information indicates the number of data segments matched in each respective node” [0059] // ¶0004) – As disclosed in ¶0055, the matching information is used to route backup data to a particular storage node such that a “higher data de-duplication rate” is achieved (i.e., indicates “deduplication effectiveness”) and “storage space” is saved in the storage cluster (i.e., “indicative of post-transfer storage consumption”)--,
wherein each candidate storage system generates its estimate by comparing the first information with fingerprint information (“the local storage data” + “data stored in the master storage node” [0057]; see also “the fingerprints of locally stored data” [0074]) stored locally to identify duplicate data blocks based on data patterns (“the master storage node 221 distributes the received signature of the sampled data to the slave storage nodes 222 and 223, such that the slave storage nodes 222 and 223 may determine matching information between the sampled data and the local storage data by lookup and matching operations. Meanwhile, the local matching information indicating the matching degree between the sampled data and the data stored in the master storage node is determined based on the signature of the sampled data 406. For example, the master storage 221 may look up the signature (or fingerprint(s)) to the locally stored data, and further match and compare the received signature of the sample data to obtain the corresponding matching information” [0057]) – As taught in ¶0057, the “local storage data” and the “data stored in the master storage node” (i.e., “fingerprint information”), which are compared to the signature (“the first information”), are stored locally on their respective nodes--,
wherein the management system is configured to use the estimates to select (Fig. 4, step 408) one or more destination storage systems (“the selected target storage node” [0058]) among the candidate storage systems (“At 408, a storage node is selected, at least based on the local matching information and the remote matching information fed back by the at least one slave storage node, from the plurality of storage nodes for storing the data to be backed up. Then, a first indication of the selected target storage node is transmitted to the client at 410.” [0058]), to minimize total storage consumption across the selected destination storage systems (“selecting the storage node corresponding to the maximum number of data segments matched” [0059]) – As taught in ¶¶0058-59, the selected target node for receiving the backup data corresponds to the node having “the maximum number of data segments matched”. One of ordinary skill in the art would accordingly understand that within the context of data deduplication (see; e.g., Wu ¶0004), the node identified as having the maximum number of matching data segments correlates to the node requiring the minimum total storage consumption for storing the data because deduplication removes redundant matching segments to improve storage capacity (i.e., “to minimize total storage consumption”)--,
wherein transferring comprises deduplicating and compressing the data using the identified duplicate data blocks to reduce an amount of data … (“When the client 110 has a need for backing up data, it may alternately or randomly select the storage nodes for backup storage, for example to “route” the data to the corresponding storage nodes. Then the storage nodes employ the de-duplication technology to de-duplicate the data and save storage space accordingly.” [0040] // “De-duplication technology is a special data compression technology based on removal of redundant data with the purpose of reducing storage capacity used in the storage system” [0004]).
Although Wu ¶¶0057-58 discloses that “the local storage data” of the slave nodes and the “data stored in the master node,” which are used to generate the local and remote matching information (i.e., “second information” used to generate “estimates”), are stored on the respective slave and master nodes, Wu does not appear to explicitly disclose that this local storage data is stored locally on those nodes prior to the nodes receiving the signature and generating the corresponding matching information. Additionally, although Wu discloses that employing de-duplication saves storage space, Wu does not explicitly disclose that the data transmitted to the selected storage node is first deduplicated prior to transmission, thus reducing an amount of data transferred. Specifically, Wu does not explicitly disclose the following limitations:
the locally stored fingerprint information managed and tracked before any request for transferring of data
to reduce an amount of data transferred
However, Armangau discloses the following limitations:
the locally stored fingerprint information (“second fingerprints” [0051]) managed and tracked before any request for transferring of data (“in advance of the replication activities described, such that the digest database 142 at the destination natively stores its fingerprints as second fingerprints … e.g., upon ingest when blocks are initially received. Thus, fingerprints stored in the digest database 142 at the destination are computed as second fingerprints 204, such that the second fingerprints are already present and waiting to be compared with first fingerprints 202” [0051]) …
to reduce an amount of data transferred (Fig. 2, steps 250 – 270 // “At step 250, the destination stores the matching first set of blocks 180s by reference to the second set of blocks 180d … without having to copy the data from source to destination as part of the snapshot-shipping update” [0042] // ¶¶0042-44) – As shown in Fig. 2, data identified by the destination as duplicate blocks are not transmitted from the source to the destination, effectively deduplicating the data and thereby reducing an amount of data transferred from source to destination.
Wu and Armangau are considered analogous to the claimed invention because they both relate to the same field of performing data deduplication in distributed storage environments. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wu with the teachings of Armangau and realize a system whereby candidate storage systems store local fingerprint information prior to receiving a data transfer request. Doing so would ensure that fingerprints are available to the candidate storage system for performing deduplication, reducing the amount of time required to identify duplicate data, as disclosed in Armangau ¶0051: “Preferably, the destination 116d performs the depicted acts of section 360 in advance of replication activities described, such that the digest database 142 at the destination natively stores its fingerprints as second fingerprints 204 … e.g., upon data ingest when blocks are initially received. Thus, … the second fingerprints 204 are already present and waiting to be compared with first fingerprints 202.” [0051]
Regarding Claim 17,
The same motivation to combine provided in Claim 16 is equally applicable to Claim 17. The combined teachings of Wu and Armangau disclose the following limitations:
The system of claim 16, wherein the management system uses a storage device selection module (Wu, Master Storage Node 221, Fig. 2) to select (Wu, Fig. 4, step 408) a destination storage system that is associated with a lowest estimate among the estimates (Wu, “At 408, a storage node is selected, at least based on the local matching information and the remote matching information fed back by the at least one slave storage node … In one embodiment … selecting the storage node corresponding to the maximum number of data segments matched” [0058-59]) – As disclosed in Wu ¶¶0058-59, a storage node having the maximum number of matching data segments can be selected as the target storage node. As previously discussed (see Claim 16 limitation mappings above), a rate of matching data segments corresponds to an estimated deduplication effectiveness. One of ordinary skill in the art would accordingly understand that a storage node having the “maximum number of matching data segments” amongst a plurality of storage nodes would in effect correspond to “a lowest estimate among the estimates” (i.e., the lowest estimated amount of data amongst all estimated amounts of data).
Regarding Claim 18,
The same motivation to combine provided in Claim 16 is equally applicable to Claim 18. The combined teachings of Wu and Armangau disclose the following limitations:
The system of claim 16, wherein the candidate storage systems generate the estimates by using a storage consumption estimation module (Wu, Slave Storage Nodes 222 + 223, Fig. 2) to compare (Wu, Fig. 7, step 704) the first information to second information (Wu, “the fingerprints of locally stored data” [0075]) associated with the candidate storage systems (Wu, Fig. 7 // “At 704, matching information between the sampled data and the data stored in the slave storage node is determined … For example, it may compare the received fingerprints with the fingerprints of locally stored data and determine a comparison result accordingly, such as the number of matched data segments or a similarity score etc.” [0075]) – As shown in Wu Fig. 7 and detailed in ¶0075, remote matching information (i.e., “the estimate[s]”) is generated by slave storage nodes by comparing the received signature (i.e., “the first information”) with fingerprints associated with locally-stored data (i.e., “second information associated with the candidate storage systems”).
Claims 10-11 and 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Wu et al. (US 20180173732 A1)(cited by examiner in previous action)(hereafter referred to as Wu) further in view of Dixit et al. (US 20200226107 A1)(cited by examiner in previous action)(hereafter referred to as Dixit) and Armangau et al. (US 20240028229 A1)(cited by examiner in previous action)(hereafter referred to as Armangau).
Regarding Claim 10,
Wu discloses the following limitations:
A non-transitory computer-readable medium for storing instructions for executing a process (¶0022), the instructions comprising:
at a management system (Master Storage Node 221, Fig. 2), receiving (Fig. 3, step 306), from a source node (Client 210, Fig. 2), first information (“a signature” [0053]) (“After the client 210 samples the data in need of backup, a signature is generated for the sampled data … At 306, the signature is transmitted to a master storage node in the storage cluster including a plurality of storage nodes, to allow the master storage node to select one storage nodes from the plurality of storage nodes” [0053]) – As shown in Figs. 2 + 3 and detailed in ¶0053, during step 306, a client 210 provides a signature associated with data to be backed up to a master storage node 221-- associated with a data match rate (“the master storage node 221 distributes the signature to at least one slave storage node … as to allow the at least one slave storage node to determine remote matching information indicating a matching degree between the sampled data and data stored in the at least one slave storage node.” [0057]) – As detailed in ¶0057, a signature enables a storage node to determine a “matching degree” between sampled data and data stored on the storage node.--;
providing (Fig. 4, steps 404 + 406) the first information to candidate storage systems (Storage Nodes 221-223, Fig. 2 // “a plurality of storage nodes” [0056]) that each use the first information to generate an estimate (“remote” + “local matching information” [0057-58])(“After receiving the signature of the sampled data, the master storage node 221 distributes the signature to at least one slave storage node … so as to allow the at least one slave storage node to determine remote matching information … Meanwhile, the local matching information … is determined based on the signature of the sampled data at 406.” [0057]) – As shown in Fig. 4 steps 404 + 406 and disclosed in ¶0057, the signature of data to be backed up is used to generate both “remote” and “local matching information” corresponding to “a matching degree” between sampled and stored data.--…;
wherein each candidate storage system generates its estimate by comparing the first information with fingerprint information (“the local storage data” + “data stored in the master storage node” [0057]; see also “the fingerprints of locally stored data” [0074]) stored locally to identify duplicate data blocks based on data patterns, the locally stored fingerprint information used to track data patterns (“the master storage node 221 distributes the received signature of the sampled data to the slave storage nodes 222 and 223, such that the slave storage nodes 222 and 223 may determine matching information between the sampled data and the local storage data by lookup and matching operations. Meanwhile, the local matching information indicating the matching degree between the sampled data and the data stored in the master storage node is determined based on the signature of the sampled data 406. For example, the master storage 221 may look up the signature (or fingerprint(s)) to the locally stored data, and further match and compare the received signature of the sample data to obtain the corresponding matching information” [0057]) – As taught in ¶0057, the “local storage data” and the “data stored in the master storage node” (i.e., “fingerprint information”), which are compared to the signature (“the first information”), are stored locally on their respective nodes and enable matching data segments stored on different nodes to be identified (i.e., “used to track data patterns”)--
in response to obtaining the estimates, selecting (Fig. 4, step 408) one or more destination storage systems (“the selected target storage node” [0058]) among the candidate storage systems (“At 408, a storage node is selected, at least based on the local matching information and the remote matching information fed back by the at least one slave storage node, from the plurality of storage nodes for storing the data to be backed up. Then, a first indication of the selected target storage node is transmitted to the client at 410.” [0058])
to reduce an amount of data to be transferred to at least one of the one or more destination storage systems (“In one embodiment, in the case that the matching information indicates the number of data segments matched in each respective storage node, the master storage node 221 may select a storage node with the number of data segments matched greater than a predetermined threshold as the target storage node” [0059] // “Compared with Round Robin algorithm, the solution proposed by the present disclosure gains 14% storage space saving in the cluster storage system having 4 storage nodes” [0079]) – As detailed in ¶0059, a storage node having a number of matching data segments greater than a threshold (as determined by remote and local matching information; see Fig. 4 steps 404 + 406) is selected as a target storage node. As clarified in ¶0079, selecting a target storage node based on a number of matching segments enables a gain of “14% storage space saving” (i.e., in effect “reduce[s] an amount of data to be transferred”)--; and
causing the data to be transferred (Fig. 3, step 310) from the source node to the at least one of the one or more destination storage systems (“At 308, an indication of the selected storage node is received from the master storage node … Correspondingly, the client 210 receives the indication of the selected storage node. Afterwards, data to be backed up is transmitted to the selected target storage node based on the indication at 310. For example, the client 210 transmits the data to be backed up directly to a storage node in the storage cluster system 220 as indicated” [0054]).
wherein transferring comprises deduplicating and compressing the data using the identified duplicate data blocks to reduce an amount of data …(“When the client 110 has a need for backing up data, it may alternately or randomly select the storage nodes for backup storage, for example to “route” the data to the corresponding storage nodes. Then the storage nodes employ the de-duplication technology to de-duplicate the data and save storage space accordingly.” [0040] // “De-duplication technology is a special data compression technology based on removal of redundant data with the purpose of reducing storage capacity used in the storage system” [0004])
and wherein the candidate storage systems maintain similar fingerprint information (“the storage cluster system 200 segments the data by Rolling Hash algorithm. Likewise, the client 210 uses the same algorithm to segment and sample the data in need of backup” [0048] // ¶¶0047-48) and share comparison results with each other before performing write operations (“If a new data segment is identical to the existing data in the storage cluster system 220, the new data will not be stored … the master storage node 221 also has interaction with slave storage nodes regarding related information” [0044-45])
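For illustration only, the routing procedure mapped above from Wu ¶¶0057-59 (distribute the signature, count matches per node, select the node exceeding a threshold) can be sketched as follows; all names and values are hypothetical and do not appear in the reference:

```python
# Hypothetical sketch of the selection flow described in Wu Fig. 4:
# a signature of sampled data is compared against each node's local
# fingerprints, and the node with the most matches is selected.

def count_matches(signature, local_fingerprints):
    """Matching information: how many sampled fingerprints a node already holds."""
    return len(set(signature) & set(local_fingerprints))

def select_target(signature, nodes, threshold=0):
    """Pick the node whose local/remote matching information is highest."""
    matches = {name: count_matches(signature, fps) for name, fps in nodes.items()}
    best = max(matches, key=matches.get)
    return best if matches[best] > threshold else None

nodes = {
    "node-221": ["f1", "f2", "f3"],   # master's local fingerprints
    "node-222": ["f2"],               # slave
    "node-223": ["f9"],               # slave
}
print(select_target(["f1", "f2", "f8"], nodes))  # node-221 (2 matches)
```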
Wu does not explicitly disclose that the remote and local matching information corresponds to an estimate of an amount of data stored on a storage system. Specifically, Wu does not explicitly disclose in a single embodiment the following limitations:
an estimate of an amount of data to be stored on that respective candidate storage system
However, Dixit discloses within the context of data compression and deduplication that an estimated frequency of matches between referenced and stored data directly corresponds to an amount of data to be stored on a storage node.
Dixit discloses the following limitations:
an estimate (“frequency of matches” [0002]) of an amount of data to be stored on that respective candidate storage system (“Deduplication can include a first stage involving identifying … unique sequences or patterns of bytes of data … Deduplication can also include a second stage involving comparing the sequences to stored copies, such as by hashing these sequences and performing a lookup in the deduplication database. If a match is found, the matched sequence can be replaced with a pointer or other reference to the stored copy. The frequency of matches can depend on the size of the sequence, which can in turn affect the amount of data that deduplication can reduce for storage and/or transfer over the network.” [0002]) – Examiner considers the concept of “remote” and “local matching information” as disclosed in Wu ¶0057 as analogous to the concept of the “frequency of matches” between sequences of data and stored data as disclosed in Dixit ¶0002. As taught by Dixit, the frequency of matches (i.e., the claimed “estimate[s]”) corresponds directly to an amount of data which is reduced for storage (i.e., “an amount of data to be stored on that respective candidate storage system”).
Wu and Dixit are considered analogous to the claimed invention because they all relate to the same field of performing data deduplication in distributed storage environments. Therefore, it would have been obvious for someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wu with the teachings of Dixit and realize a method of estimating an amount of data to be stored on a destination storage system based on a data match rate derived from a fingerprint. Estimating an amount of data based on a data match rate would be expected to provide network operators information to assess the rewards and costs of data deduplication, leading to improved decision-making and improved CPU resource utilization, as disclosed in Dixit ¶¶0002 // 0011-13: “The frequency of matches can depend on the size of the sequences, which can in turn affect the amount of data that deduplication can reduce for storage and/or transfer over a network. For example, a smaller size for the sequences can increase the rate of matches while a larger size for the sequences can result in a smaller deduplication database size, faster deduplication, and less fragmentation.” [0002] // “Data deduplication can involve eliminating duplicate or redundant data in storage and/or before transmitting the data over a network … In such environments, where data and resource availability are unpredictable, a network operator may not be capable of assessing the rewards (or costs) of deduplication … However, the network or network operator may not be capable of predicting the rewards (or costs) associated with a particular implementation of deduplication … Various embodiments of the present disclosure can overcome these and other deficiencies of the prior art by utilizing reinforcement learning to determine the set of actions to take for data deduplication to optimize computing resource utilization.” [0011-13]
Although Wu ¶¶0057-58 discloses that “the local storage data” of slave nodes and “data stored in the master node” which is used to generate the local and remote matching information (i.e., “second information” used to generate “estimates”) is stored on respective slave and master nodes, Wu does not appear to explicitly disclose that the aforementioned local storage data is stored locally on those nodes prior to their receiving the signature and generating the corresponding matching information. Additionally, although Wu discloses that employing de-duplication saves storage space, Wu does not explicitly disclose that the data transmitted to the selected storage node is first deduplicated prior to transmission, thus reducing an amount of data transferred. Specifically, the combined teachings of Wu and Dixit do not explicitly disclose the following limitations:
to reduce an amount of data transferred
However, Armangau discloses the following limitations:
to reduce an amount of data transferred (“in advance of the replication activities described, such that the digest database 142 at the destination natively stores its fingerprints as second fingerprints … e.g., upon ingest when blocks are initially received. Thus, fingerprints stored in the digest database 142 at the destination are computed as second fingerprints 204, such that the second fingerprints are already present and waiting to be compared with first fingerprints 202” [0051] // Fig. 2, steps 250 – 270 // “At step 250, the destination stores the matching first set of blocks 180s by reference to the second set of blocks 180d … without having to copy the data from source to destination as part of the snapshot-shipping update” [0042] // ¶¶0042-44) – As shown in Fig. 2, data identified by the destination as duplicate blocks are not transmitted from the source to the destination, effectively deduplicating the data and thereby reducing an amount of data transferred from source to destination.
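For illustration only, the transfer-reduction mechanism cited from Armangau (destination pre-stores fingerprints in its digest database; matched blocks are stored by reference rather than copied) can be sketched as follows; all names are hypothetical and not drawn from the reference:

```python
# Hypothetical sketch: split source blocks into those actually sent over
# the wire and those deduplicated by reference at the destination, whose
# digest database was populated earlier at ingest.

def plan_transfer(source_blocks, destination_digest):
    """Return (blocks to send, fingerprints stored by reference)."""
    to_send, by_reference = [], []
    for fingerprint, data in source_blocks.items():
        if fingerprint in destination_digest:   # already present at destination
            by_reference.append(fingerprint)
        else:
            to_send.append((fingerprint, data))
    return to_send, by_reference

digest = {"fp-a", "fp-b"}                       # pre-computed at ingest
blocks = {"fp-a": b"AAAA", "fp-c": b"CCCC"}
send, ref = plan_transfer(blocks, digest)
print(len(send), ref)   # only one block actually transferred
```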
Wu, Dixit, and Armangau are considered analogous to the claimed invention because they all relate to the same field of performing data deduplication in distributed storage environments. Therefore, it would have been obvious for someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wu and Dixit with the teachings of Armangau and realize a method whereby candidate storage systems store local second fingerprint information prior to receiving a data transfer request. Doing so would ensure that fingerprints are available to the candidate storage system for performing deduplication, reducing the amount of time required to identify duplicate data, as disclosed in Armangau ¶0051: “Preferably, the destination 116d performs the depicted acts of section 360 in advance of replication activities described, such that the digest database 142 at the destination natively stores its fingerprints as second fingerprints 204 … e.g., upon data ingest when blocks are initially received. Thus, … the second fingerprints 204 are already present and waiting to be compared with first fingerprints 202.” [0051]
Regarding Claim 11,
The same motivation to combine provided in Claim 10 is equally applicable to Claim 11. The combined teachings of Wu, Dixit, and Armangau disclose the following limitations:
The non-transitory computer-readable medium of claim 10, wherein generating the estimate (Wu, Fig. 4, steps 404 + 406 // Fig. 7, steps 702 + 704) comprises comparing (Wu, Fig. 7, step 704) the first information to second information (Wu, “the fingerprints of locally stored data” [0075]) associated with the candidate storage systems (Wu, Fig. 7 // “At 704, matching information between the sampled data and the data stored in the slave storage node is determined … For example, it may compare the received fingerprints with the fingerprints of locally stored data and determine a comparison result accordingly, such as the number of matched data segments or a similarity score etc.” [0075]) – As shown in Wu Fig. 7 and detailed in ¶0075, remote matching information (i.e., “the estimate[s]”) is generated by slave storage nodes by comparing the received signature (i.e., “the first information”) with fingerprints associated with locally-stored data (i.e., “second information associated with the candidate storage systems”)
Regarding Claim 13,
The same motivation to combine provided in Claim 10 is equally applicable to Claim 13. The combined teachings of Wu, Dixit, and Armangau disclose the following limitations:
The non-transitory computer-readable medium of claim 11, wherein selecting (Wu, Fig. 4, step 408) the one or more destination storage systems comprises selecting a destination storage system associated with a lowest estimate among the estimates (Wu, “At 408, a storage node is selected, at least based on the local matching information and the remote matching information fed back by the at least one slave storage node … In one embodiment … selecting the storage node corresponding to the maximum number of data segments matched” [0058] // Dixit, ¶0002) – As disclosed in Wu ¶0058, a storage node having the maximum number of matching data segments can be selected as the target storage node. As previously discussed (see Dixit ¶0002 and Claim 10 limitation mappings above), a rate of matching data segments corresponds to an estimated amount of deduplicated data stored on a storage system. One of ordinary skill in the art would accordingly understand that a storage node having the “maximum number of matching data segments” amongst a plurality of storage nodes would in effect correspond to “the lowest estimate among the estimates” (i.e., the lowest estimated amount of data amongst all estimated amounts of data).
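The equivalence argued above (maximum number of matched segments corresponds to the lowest estimated amount of data to be stored) can be checked with simple arithmetic; the segment size and match counts below are hypothetical, not from the references:

```python
# With a fixed segment size, estimated new data to store =
# (total segments - matched segments) * segment size, so the node with
# the MOST matches necessarily has the LOWEST estimate.

SEGMENT_SIZE = 4096          # bytes per data segment (assumed)
TOTAL_SEGMENTS = 100         # segments in the data to be backed up

matches = {"node-221": 80, "node-222": 35, "node-223": 10}

estimates = {n: (TOTAL_SEGMENTS - m) * SEGMENT_SIZE for n, m in matches.items()}

most_matches = max(matches, key=matches.get)
lowest_estimate = min(estimates, key=estimates.get)
assert most_matches == lowest_estimate
print(most_matches, estimates[most_matches])  # node-221 81920
```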
Regarding Claim 14,
The same motivation to combine provided in Claim 10 is equally applicable to Claim 14. The combined teachings of Wu, Dixit, and Armangau disclose the following limitations:
The non-transitory computer-readable medium of claim 11, wherein at least one of the first information (Wu, “a signature” [0053]) or the second information (Wu, “the fingerprints of locally stored data” [0075]) comprises
at least one of the fingerprint information (Wu, “the signature of the sampled data may be fingerprints of the sampled data segments” [0056]) – As disclosed in Wu, both the signature of sampled data (i.e., “first information”) and the fingerprints of data stored locally on slave storage nodes (i.e., “second information”) correspond to fingerprints of associated data segments--,
a data access pattern, a data duplication rate, a volume label, or a compression ratio.
Claims 4 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Wu further in view of Armangau and Madan et al. (US 20140136789 A1)(cited by examiner in previous action)(hereafter referred to as Madan).
Regarding Claim 4,
The same motivation to combine provided in Claim 1 is equally applicable to Claim 4. The combined teachings of Wu and Armangau disclose the following limitations:
The method of claim 1 (see Claim 1 limitation mappings above),
Although Wu ¶0075 discloses that slave storage nodes use “a procedure of matching with the local data” to determine remote matching information, the combined teachings of Wu and Armangau do not explicitly disclose the following limitations:
wherein comparing comprises comparing the first fingerprint information with a subset of the second fingerprint information to reduce a computational cost without significantly reducing an estimate accuracy
However, Madan discloses the following limitations:
comparing comprises comparing the first fingerprint information (“a signature for the writable data” [0054]) with a subset of the second fingerprint information (“one or more signatures associated with data stored by the storage server” [0054]) to reduce a computational cost (“a signature for the writable data may be computed … the signature may be used to query a data structure, such as an index, comprising one or more signatures associated with data stored by the storage server … To reduce the size of the data structure and/or improve query time, signatures of data … accessed by a host computing device below a threshold frequency … may be excluded from the data structure.” [0054]) – As taught in ¶0054, a computed signature for write data (i.e., “first fingerprint information”) is compared to signatures of data stored on a storage server (i.e., “second fingerprint information”) to determine a match. As clarified in ¶0054, comparing the computed signature only with signatures of frequently-accessed data (i.e., comparing to “a subset of” second fingerprints) reduces a size of an index and improves query time (i.e., “reduce[s] a computational cost”)-- without significantly reducing an estimate accuracy (¶0054) – One of ordinary skill in the art would understand that limiting signature comparison only to the subset of frequently-accessed signatures would not impact the estimation accuracy for frequently-accessed data but would diminish the estimation accuracy for data which is not frequently accessed. One of ordinary skill in the art would accordingly understand that such diminished accuracy would by definition only occur for data which is infrequently accessed and thus would infrequently occur (i.e., would not “significantly reduc[e] an estimate accuracy”).
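For illustration only, the index-pruning technique cited from Madan ¶0054 (query only signatures of frequently accessed data, trading a smaller index and faster lookup for possible misses on cold data) can be sketched as follows; the threshold and names are hypothetical:

```python
# Hypothetical sketch: only signatures accessed at or above a threshold
# frequency remain in the queried subset, reducing index size and query
# cost; duplicates among infrequently accessed data may go undetected.

ACCESS_THRESHOLD = 2   # assumed cutoff on access count

all_signatures = {      # signature -> access count on the storage server
    "sig-hot": 15,
    "sig-warm": 3,
    "sig-cold": 1,      # excluded from the queried subset
}

subset = {s for s, hits in all_signatures.items() if hits >= ACCESS_THRESHOLD}

def lookup(write_signature):
    """Query only the pruned subset; cold duplicates may be missed."""
    return write_signature in subset

print(lookup("sig-hot"), lookup("sig-cold"))  # True False
```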
Wu, Armangau, and Madan are all considered analogous to the claimed invention because they all relate to the same field of computing fingerprints in distributed and deduplicated storage environments. Therefore, it would have been obvious for someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wu and Armangau with the teachings of Madan and realize a signature matching procedure limited to a subset of signatures associated with data stored on a storage system. Doing so would improve resource utilization and performance by reducing the size of an index and improving query time, as disclosed in Madan ¶0054: “To reduce the size of the data structure and/or improve the query time, signatures of data accessed by the host computing device outside a threshold time frame (e.g., data last accessed over 10 days from the current date) and/or data accessed by the host computing device below a threshold frequency (e.g., data accessed less than two times within 3 days from the current date) may be excluded from the data structure.” [0054]
Regarding Claim 19,
The same motivation to combine provided in Claim 16 is equally applicable to Claim 19. The combined teachings of Wu and Armangau disclose the following limitations:
The system of claim 18 (see Claim 18 limitation mappings above),
Although Wu ¶0075 discloses that slave storage nodes use “a procedure of matching with the local data” to determine remote matching information, the combined teachings of Wu and Armangau do not explicitly disclose the following limitations:
wherein the candidate storage systems compare the first information with a subset of the second information to reduce a computational cost
However, Madan discloses the following limitations:
wherein the candidate storage systems compare the first information (“a signature for the writable data” [0054]) with a subset of the second information (“one or more signatures associated with data stored by the storage server” [0054]) to reduce a computational cost (“a signature for the writable data may be computed … the signature may be used to query a data structure, such as an index, comprising one or more signatures associated with data stored by the storage server … To reduce the size of the data structure and/or improve query time, signatures of data … accessed by a host computing device below a threshold frequency … may be excluded from the data structure.” [0054]) – As taught in ¶0054, a computed signature for write data (i.e., “the first information”) is compared to signatures of data stored on a storage server (i.e., “the second information”) to determine a match. As clarified in ¶0054, comparing the computed signature only with signatures of frequently-accessed data (i.e., comparing to “a subset of” the second information) reduces a size of an index and improves query time (i.e., “reduce[s] a computational cost”)
Wu, Armangau, and Madan are all considered analogous to the claimed invention because they all relate to the same field of computing fingerprints in distributed and deduplicated storage environments. Therefore, it would have been obvious for someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wu and Armangau with the teachings of Madan and realize a signature matching procedure limited to a subset of signatures associated with data stored on a storage system. Doing so would improve resource utilization and performance by reducing the size of an index and improving query time, as disclosed in Madan ¶0054: “To reduce the size of the data structure and/or improve the query time, signatures of data accessed by the host computing device outside a threshold time frame (e.g., data last accessed over 10 days from the current date) and/or data accessed by the host computing device below a threshold frequency (e.g., data accessed less than two times within 3 days from the current date) may be excluded from the data structure.” [0054]
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Wu further in view of Dixit, Armangau, and Madan et al. (US 20140136789 A1)(cited by examiner in previous action)(hereafter referred to as Madan).
Regarding Claim 12,
The same motivation to combine provided in Claim 10 is equally applicable to Claim 12. The combined teachings of Wu, Dixit, and Armangau disclose the following limitations:
The non-transitory computer-readable medium of claim 11 (see Claim 11 limitation mappings above),
Although Wu ¶0075 discloses that slave storage nodes use “a procedure of matching with the local data” to determine remote matching information, the combined teachings of Wu, Dixit, and Armangau do not explicitly disclose the following limitations:
wherein comparing comprises comparing the first information with a subset of the second information to reduce a computational demand
However, Madan discloses the following limitations:
comparing comprises comparing the first information (“a signature for the writable data” [0054]) with a subset of the second information (“one or more signatures associated with data stored by the storage server” [0054]) to reduce a computational demand (“a signature for the writable data may be computed … the signature may be used to query a data structure, such as an index, comprising one or more signatures associated with data stored by the storage server … To reduce the size of the data structure and/or improve query time, signatures of data … accessed by a host computing device below a threshold frequency … may be excluded from the data structure.” [0054]) – As taught in ¶0054, a computed signature for write data (i.e., “the first information”) is compared to signatures of data stored on a storage server (i.e., “the second information”) to determine a match. As clarified in ¶0054, comparing the computed signature only with signatures of frequently-accessed data (i.e., comparing to “a subset of” the second information) reduces a size of an index and improves query time (i.e., “reduce[s] a computational demand”)
Wu, Dixit, Armangau, and Madan are all considered analogous to the claimed invention because they all relate to the same field of computing fingerprints in distributed and deduplicated storage environments. Therefore, it would have been obvious for someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wu, Dixit, and Armangau with the teachings of Madan and realize a signature matching procedure limited to a subset of signatures associated with data stored on a storage system. Doing so would improve resource utilization and performance by reducing the size of an index and improving query time, as disclosed in Madan ¶0054: “To reduce the size of the data structure and/or improve the query time, signatures of data accessed by the host computing device outside a threshold time frame (e.g., data last accessed over 10 days from the current date) and/or data accessed by the host computing device below a threshold frequency (e.g., data accessed less than two times within 3 days from the current date) may be excluded from the data structure.” [0054]
Claims 5-6 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Wu further in view of Armangau and Narayanam et al. (US 20190205055 A1)(cited by examiner in previous action)(hereafter referred to as Narayanam).
Regarding Claim 5,
The same motivation to combine provided in Claim 1 is equally applicable to Claim 5. The combined teachings of Wu and Armangau disclose the following limitations:
The method of claim 1 (see Claim 1 limitation mappings above),
The combined teachings of Wu and Armangau do not explicitly disclose the following limitations:
wherein the first fingerprint information is provided to the candidate storage systems in response to determining one or more properties associated with at least a subset of the candidate storage systems
However, Narayanam discloses the following limitations:
wherein the first fingerprint information (“a deduplication signature” [0021]) is provided to the candidate storage systems (“a second storage system site (“site-2”)” [0021]) in response to determining one or more properties (“data chunk size” [0018]) associated with at least a subset of the candidate storage systems (“a fingerprint of a write operation for data from a source system to a target system may be calculated and an early lookup operation may be commenced on the target system (prior to receiving the write data). This system (e.g., each target storage system) will be made aware of data chunk size used by the other De-dupe system.” [0018] // “It should be noted that for different deduplication block sizes at different sites, calculating a deduplication signature at a first storage system site (“site-1”) and then sending the deduplication signature to a second storage system site (“site-2”)” [0021]) – As taught in Narayanam ¶¶0018-21, a source storage system determines a chunk data size prior to sending fingerprint information to a target storage system. In this case, examiner considers the source storage system and target storage system disclosed in Narayanam ¶0018 as analogous to Client 210 and Storage Cluster System 220 as disclosed in Wu Fig. 2, respectively (i.e., the claimed “source storage system” and “candidate storage systems”, respectively). Examiner considers a chunk data size as “one or more properties” of a storage system storing deduplicated data.
Wu, Armangau, and Narayanam are all considered analogous to the claimed invention because they all relate to the same field of distributing fingerprint information to target storage systems in distributed and deduplicated storage environments. Therefore, it would have been obvious for someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wu and Armangau with the teachings of Narayanam and realize a method of determining one or more properties of candidate storage systems before transmitting fingerprint information to the candidate storage systems. Doing so would enable fingerprint information to be calculated prior to distribution to the candidate storage systems, which would be expected to improve efficiency and reduce computation time in environments employing variable length deduplication block sizes, as disclosed in Narayanam ¶0021: “It should be noted that for different deduplication block sizes at different sites, calculating a deduplication signature at a first storage system (“site-1”) and then sending the deduplication signature to a second storage system site (“site-2”) is faster (in terms of computation time) and more efficient than doing it directly on the storage system site (“site-2”) … For different length deduplication block size (i.e., variable length deduplication), the deduplication signature calculated at site-1 will be used to look up and match variable length deduplication fingerprints on site-2. This will reduce the lookup time when the actual data has not been received at site-2.” [0021]
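For illustration only, the sequence cited from Narayanam ¶¶0018-21 (site-1 first learns site-2's deduplication chunk size, then computes and sends the signature ahead of the actual write data for an early lookup) can be sketched as follows; names and sizes are hypothetical:

```python
# Hypothetical sketch: fingerprint each chunk using the chunk size
# reported by the target site (the determined "property"), so the
# signature can be sent and looked up before the data itself arrives.

import hashlib

def chunk_signatures(data: bytes, chunk_size: int):
    """Fingerprint each chunk at the target site's chunk size."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]

SITE2_CHUNK_SIZE = 4      # property reported by the target system (assumed)

data = b"ABCDABCDXXXX"
sigs = chunk_signatures(data, SITE2_CHUNK_SIZE)

# Early lookup at site-2 can now proceed before the data is received.
print(len(sigs), sigs[0] == sigs[1])  # 3 True (first two chunks identical)
```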
Regarding Claim 6,
The same motivation to combine provided in Claim 5 is equally applicable to Claim 6. The combined teachings of Wu, Armangau, and Narayanam disclose the following limitations:
The method of claim 5, wherein the one or more properties (Narayanam, “data chunk size” [0018]) comprise a data access pattern that is associated with at least one of
read data,
write data (Narayanam, “While performing the WRITE operation, the system … with larger chunk size will calculate the fingerprint…” [0018]) – As taught in Narayanam ¶0018, data is written into storage with a particular chunk size. Examiner accordingly considers a data chunk size of a fingerprint in a deduplication system as reading on the claimed concept of “a data access pattern” which is “associated with” “write data”, under the Broadest Reasonable Interpretation (BRI) of the claimed language--
a data length (Narayanam, “data chunk size” [0018]) – Examiner considers a data chunk size of a deduplication system as reading on the claimed concept of “a data access pattern” which is “associated with” “a data length”, under the BRI of the claimed language--,
or
a time series.
Regarding Claim 20,
The same motivation to combine provided in Claim 16 is equally applicable to Claim 20. The combined teachings of Wu and Armangau disclose the following limitations:
The system of claim 16 (see Claim 16 limitation mappings above),
The combined teachings of Wu and Armangau do not explicitly disclose the following limitations:
wherein the management system provides the first information to the candidate storage systems in response to determining one or more properties associated with at least a subset of the candidate storage systems.
However, Narayanam discloses the following limitations:
wherein the management system (“a first storage system site (“site-1”)” [0018]) provides the first information (“a deduplication signature” [0021]) to the candidate storage systems (“a second storage system site (“site-2”)” [0021]) in response to determining one or more properties (“data chunk size” [0018]) associated with at least a subset of the candidate storage systems (“a fingerprint of a write operation for data from a source system to a target system may be calculated and an early lookup operation may be commenced on the target system (prior to receiving the write data). This system (e.g., each target storage system) will be made aware of data chunk size used by the other De-dupe system.” [0018] // “It should be noted that for different deduplication block sizes at different sites, calculating a deduplication signature at a first storage system site (“site-1”) and then sending the deduplication signature to a second storage system site (“site-2”)” [0021]) – As taught in Narayanam ¶¶0018-21, a source storage system determines a data chunk size prior to sending fingerprint information to a target storage system. In this case, examiner considers the source storage system and target storage system disclosed in Narayanam ¶0018 as analogous to Client 210 and Storage Cluster System 220 as disclosed in Wu Fig. 2, respectively (i.e., the claimed “source storage system” and “candidate storage systems”, respectively). Examiner considers a data chunk size as “one or more properties” of a storage system storing deduplicated data.
Wu, Armangau, and Narayanam are all considered analogous to the claimed invention because they all relate to the same field of distributing fingerprint information to target storage systems in distributed and deduplicated storage environments. Therefore, it would have been obvious for someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wu and Armangau with the teachings of Narayanam and realize a method of determining one or more properties of candidate storage systems before transmitting fingerprint information to the candidate storage systems. Doing so would enable fingerprint information to be calculated prior to distribution to the candidate storage systems, which would be expected to improve efficiency and reduce computation time in environments employing variable length deduplication block sizes, as disclosed in Narayanam ¶0021: “It should be noted that for different deduplication block sizes at different sites, calculating a deduplication signature at a first storage system (“site-1”) and then sending the deduplication signature to a second storage system site (“site-2”) is faster (in terms of computation time) and more efficient than doing it directly on the storage system site (“site-2”) … For different length deduplication block size (i.e., variable length deduplication), the deduplication signature calculated at site-1 will be used to look up and match variable length deduplication fingerprints on site-2. This will reduce the lookup time when the actual data has not been received at site-2.” [0021]
Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Wu further in view of Dixit, Armangau, and Narayanam et al. (US 20190205055 A1) (cited by examiner in previous action) (hereafter referred to as Narayanam).
Regarding Claim 15,
The same motivation to combine provided in Claim 10 is equally applicable to Claim 15. The combined teachings of Wu, Dixit, and Armangau disclose the following limitations:
The non-transitory computer-readable medium of claim 11 (see Claim 11 limitation mappings above),
The combined teachings of Wu, Dixit, and Armangau do not explicitly disclose the following limitations:
wherein the first fingerprint information is provided to the candidate storage systems in response to determining one or more properties associated with at least a subset of the candidate storage systems
However, Narayanam discloses the following limitations:
wherein the first fingerprint information (“a deduplication signature” [0021]) is provided to the candidate storage systems (“a second storage system site (“site-2”)” [0021]) in response to determining one or more properties (“data chunk size” [0018]) associated with at least a subset of the candidate storage systems (“a fingerprint of a write operation for data from a source system to a target system may be calculated and an early lookup operation may be commenced on the target system (prior to receiving the write data). This system (e.g., each target storage system) will be made aware of data chunk size used by the other De-dupe system.” [0018] // “It should be noted that for different deduplication block sizes at different sites, calculating a deduplication signature at a first storage system site (“site-1”) and then sending the deduplication signature to a second storage system site (“site-2”)” [0021]) – As taught in Narayanam ¶¶0018-21, a source storage system determines a data chunk size prior to sending fingerprint information to a target storage system. In this case, examiner considers the source storage system and target storage system disclosed in Narayanam ¶0018 as analogous to Client 210 and Storage Cluster System 220 as disclosed in Wu Fig. 2, respectively (i.e., the claimed “source storage system” and “candidate storage systems”, respectively). Examiner considers a data chunk size as “one or more properties” of a storage system storing deduplicated data.
Wu, Dixit, Armangau, and Narayanam are all considered analogous to the claimed invention because they all relate to the same field of distributing fingerprint information to target storage systems in distributed and deduplicated storage environments. Therefore, it would have been obvious for someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wu, Dixit, and Armangau with the teachings of Narayanam and realize a method of determining one or more properties of candidate storage systems before transmitting fingerprint information to the candidate storage systems. Doing so would enable fingerprint information to be calculated prior to distribution to the candidate storage systems, which would be expected to improve efficiency and reduce computation time in environments employing variable length deduplication block sizes, as disclosed in Narayanam ¶0021: “It should be noted that for different deduplication block sizes at different sites, calculating a deduplication signature at a first storage system (“site-1”) and then sending the deduplication signature to a second storage system site (“site-2”) is faster (in terms of computation time) and more efficient than doing it directly on the storage system site (“site-2”) … For different length deduplication block size (i.e., variable length deduplication), the deduplication signature calculated at site-1 will be used to look up and match variable length deduplication fingerprints on site-2. This will reduce the lookup time when the actual data has not been received at site-2.” [0021]
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Wu further in view of Armangau, Narayanam, and Zhao et al. (US 20170357553 A1) (cited by examiner in previous action) (hereafter referred to as Zhao).
Regarding Claim 7,
The same motivation to combine provided in Claim 5 is equally applicable to Claim 7. The combined teachings of Wu, Armangau, and Narayanam disclose the following limitations:
The method of claim 5 (see Claim 5 limitations mappings above),
Wu is silent regarding how client 210 selects the candidate storage systems (e.g., cluster system 220) to provide the first signature information. Specifically, the combined teachings of Wu, Armangau, and Narayanam are silent regarding the following limitations:
wherein the one or more properties comprise a volume label associated with at least one of inventory information or account information.
However, Zhao discloses within the context of performing backup operations in distributed storage environments that a backup agent selects candidate nodes for storing data based on account information associated with the candidate nodes.
Zhao discloses the following limitations:
wherein the one or more properties (“second configuration information” [0031]) comprise a volume label associated with at least one of inventory information or account information (“account information for the target node” [0031] // “As used herein, the “target node” refers to a destination node to which the application data is to be backed up. Additionally or alternatively, in some embodiments, the configuration information may also include second configuration information about the target node … In some embodiments, the second configuration information may include account information for the target node … such as authorization and authentication information … the data of the application 120 will be directly forwarded to the target node 130 by the backup service agent” [0031-32]) – As shown in Zhao Fig. 1 and detailed in ¶¶0031-32, a backup service agent transmits backup data to a target node 130, similar to how client 210 of Wu Fig. 2 transmits backup data to master storage node 221. As disclosed in Zhao, “account information” including “authorization and authentication information” is used by a backup service agent to determine the target node 130 for receiving backup data. Examiner considers account information including authorization and authentication information as reading on the claimed concept of “a volume label associated with” “account information”, under the Broadest Reasonable Interpretation (BRI) of the claimed language.
Wu, Armangau, Narayanam, and Zhao are all considered analogous to the claimed invention because they all relate to the same field of performing data backup in a distributed storage environment. Therefore, it would have been obvious for someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wu, Armangau, and Narayanam with the teachings of Zhao and realize a method of determining account information associated with candidate storage nodes before transmitting fingerprint information to the candidate storage nodes. Determining account information associated with candidate storage nodes enables improved configuration and dynamic mapping of backup services in distributed environments with a scaling number of nodes, resulting in simpler and more transparent system management and reduced management overhead, as disclosed in Zhao ¶¶0004 // 0019-20: “In a modern cloud computing environment, all resources are dynamically configured and applications usually can be dynamically deployed, scheduled and scaled. This may require that the data backup system can be quickly configured, dynamically mapped with instances of an application regardless of scale-in, scale-out or movement of the instances of the application across nodes, and easily scaled to support massive application instances, etc. In addition, in such a large-scale and dynamically changed environment, management of the data backup system should be as simple and transparent as possible, thereby avoiding extra management overhead.” [0004] // “it is difficult for the traditional data backup system to be rapidly configured, dynamically mapped with instances of an application, and scaled to support massive application instances. In order to solve one or more of the above and other potential problems, exemplary embodiments of the present disclosure provide a solution for data backup. The solution may register the data backup as a service and implements the backup of data of an application from a source node to a target node via a backup service agent” [0019-20]
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Wu further in view of Armangau and Mayo et al. (US 20240143213 A1) (cited by examiner in previous action) (hereafter referred to as Mayo).
Regarding Claim 8,
The same motivation to combine provided in Claim 1 is equally applicable to Claim 8. The combined teachings of Wu and Armangau disclose the following limitations:
The method of claim 1 (see Claim 1 limitation mappings above),
The combined teachings of Wu and Armangau are silent regarding the following limitations:
wherein the second fingerprint information is processed in a tree format.
However, Mayo discloses the following limitations:
wherein the second fingerprint information is processed in a tree format (“In some implementations, upon receiving a tracking query for a particular fingerprint, the storage system may traverse the corresponding path in the tracking tree structure, and may collect tracking information from the fingerprint entries for that fingerprint that are stored in the traversed path” [0103]) – As taught in Mayo, a storage device performs a lookup of a fingerprint (i.e., performs a lookup of “second fingerprint information”) using a “tracking tree structure”. Examiner considers performing a lookup using a tree structure, such as taught by Mayo, as processing fingerprint information “in a tree format”.
Wu, Armangau, and Mayo are all considered analogous to the claimed invention because they all relate to the same field of performing fingerprint lookups on storage nodes in a distributed storage environment. Therefore, it would have been obvious for someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wu and Armangau with the teachings of Mayo and realize a method of processing fingerprint information in a tree format. Doing so would improve performance of a deduplication storage system by providing a relatively rapid and efficient lookup mechanism for fingerprints, as disclosed in Mayo ¶0103: “In this manner, use of the fingerprint tracking structure may provide a relatively rapid and efficient processing of tracking queries, and may thereby improve the performance of the deduplication storage system.” [0103]
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Wu further in view of Armangau and Wei et al. (US 20170017407 A1) (cited by examiner in previous action) (hereafter referred to as Wei).
Regarding Claim 9,
The same motivation to combine provided in Claim 1 is equally applicable to Claim 9. The combined teachings of Wu and Armangau disclose the following limitations:
The method of claim 1 (see Claim 1 limitation mappings above),
The combined teachings of Wu and Armangau are silent regarding the following limitations:
wherein the first fingerprint information comprises a compression ratio of non-duplicated data
However, Wei discloses within the context of performing deduplication in distributed storage environments that fingerprint information associated with data additionally includes “sample compression ratios”.
Wei discloses the following limitations:
wherein the first fingerprint information (“the fingerprints of the chunks” [0072]) comprises a compression ratio (“the sample compression ratio of the chunks” [0072]) of non-duplicated data (Fig. 4, step 49 // “49: Output the chunks and the fingerprints of the chunks, and optionally, the sample compression ratios of the chunks may also be output.” [0072] // ¶0007) – Examiner considers step 49 of Wei Fig. 4 as analogous to step 306 of Wu Fig. 3 because both are steps taken in a distributed storage environment whereby fingerprints associated with non-duplicate data are output to a deduplication storage device (see Wei ¶0007). As clarified in Wei Fig. 4 and disclosed in Wei ¶0072, a “sample compression ratio” of non-duplicate data chunks (i.e., “a compression ratio of non-duplicated data”) is output along with associated chunk fingerprints. Examiner considers outputting both fingerprints and sample compression ratios for data chunks to a deduplication storage device as reading on the claimed concept of “first fingerprint information” “compris[ing]” (e.g., additionally including) “a compression ratio”.
Wu, Armangau, and Wei are all considered analogous to the claimed invention because they all relate to the same field of outputting non-duplicate data chunks into a distributed storage environment. Therefore, it would have been obvious for someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wu and Armangau with the teachings of Wei and realize a method of outputting first fingerprint information including a compression ratio to candidate storage systems. Providing a compression ratio along with a fingerprint to candidate storage systems would save time and processing resources by enabling appropriate selection of chunk length without increased query pressure to a deduplication storage system, as disclosed in Wei ¶¶0020-22: “Reducing an expected length of a data chunk is conducive to obtaining a higher deduplication rate, but may also increase the number of data chunks and corresponding indexes, thereby increasing the complexity of searching for duplicate data chunks and restricting the deduplication performance. In the prior art, fine-granularity chunking is adopted in a region of transition between duplicate content and non-duplicate content, whereas coarse-granularity chunking is adopted in other regions, thereby forming a bimodal chunking method. The method, however, requires frequently querying for duplication of candidate chunks in a chunking process, and therefore causes query load pressure to a deduplication storage system … a data object is divided into blocks, the blocks are aggregated into a data segment according to a sample compression ratio of each block, and then an expected length is selected according to the lengths and sample compression ratios of data segments to divide each data segment into data chunks” [0020-22].
Response to Arguments
The previous Objection to the Specification is withdrawn in view of the instant claim amendments.
The previous Objection to Claim 10 is withdrawn in view of the instant claim amendments.
The previous 35 U.S.C. 112(a) Rejections of Claims 10-15 are withdrawn in view of the instant amendments to Claim 10. The previous 35 U.S.C. 112(a) Rejections of Claims 1, 3-9, and 16-20 are maintained.
Examiner notes that the amendment to Claim 10 specifying that locally stored fingerprint information is “used to track data patterns” overcomes the outstanding 35 U.S.C. 112(a) Rejection.
In the final paragraph of the 2nd page of Remarks (numbered as page 8), applicant alleges that Claims 1 and 16 are amended similarly with respect to Claim 10. However, examiner notes that such amendments are not present in Claims 1 and 16 as currently presented. Therefore, the 35 U.S.C. 112(a) Rejections of Claims 1, 3-9, and 16-20 are maintained. Examiner notes that amendment of Claims 1 and 16 in a manner identical to that described above with respect to Claim 10 would overcome the remaining outstanding 35 U.S.C. 112(a) Rejections.
Applicant's arguments filed 01/05/2026 with respect to 35 U.S.C. 103 have been fully considered but they are not persuasive.
With respect to applicant’s argument located within the final paragraph of the 3rd page of remarks (numbered as page 9), continuing to the 4th page of remarks (numbered as page 10), which recites:
“Wu discloses matching information indicating a matching degree between sampled data and stored data for routing purposes. However, Wu does not disclose or teach estimates "indicative of post-transfer storage consumption based on deduplication effectiveness" as recited by claim 1 as amended. Wu's matching information relates to similarity or matching degree for backup routing purposes, not specifically to post-transfer storage consumption estimates based on deduplication effectiveness. Furthermore, Wu selects a storage node based on the maximum number of data segments matched for backup routing purposes, but does not disclose or teach selecting destination storage systems "to minimize total storage consumption across the selected destination storage systems" as recited by claim 1 as amended. Wu's selection criterion is based on matching degree for routing, not on minimizing total storage consumption.”
Examiner has fully considered the aforementioned argument but does not find it persuasive. Applicant argues that the matching information disclosed in Wu relates to a similarity or matching degree for “backup routing purposes” as opposed to specifically relating to “post-transfer storage consumption estimates based on deduplication effectiveness” as recited in the claims as amended; and further that selection of a destination storage system is performed based on segments matched “for backup routing purposes” as opposed to “minimiz[ing] total storage consumption across the selected destination storage systems”. Examiner respectfully disagrees with applicant’s characterization that the concept of similarity/matching degree information as taught in Wu does not specifically relate to post-transfer storage consumption based on deduplication effectiveness.
As detailed in Wu ¶0055, the purpose of employing the routing strategy described in Wu (i.e., using similarity/matching degree information to route backup data to a destination; see Wu Fig. 4 // ¶¶0057-58) is explicitly to save storage space in the destination by leveraging a high deduplication rate: “The routing strategy, which will be further elaborated below, achieves higher data de-duplication rate, thereby saving the storage space of the storage cluster system 220.” [0055] (Emphasis added).
Examiner considers any information which can be employed by a system to indicate an effectiveness of deduplication on storage consumption after a transfer of data as reading on the claimed concept of “estimates indicative of post-transfer storage consumption based on deduplication effectiveness”. As taught in Wu ¶¶0055; 0057, the similarity/matching degree information is used in order to route backup data to a destination such that a high de-duplication rate will be achieved and storage space will be saved; i.e., it indicates a degree of deduplication effectiveness with respect to storage space consumed post-transfer. Nothing in the claims as currently presented precludes such an interpretation of Claim 1.
Further, as taught in Wu ¶¶0057-59, similarity/matching degree information is used to route backup data to a target destination node having a “maximum number of data segments matched” (¶0059). As would be understood by a person of ordinary skill in the art, within the context of data deduplication, matching (i.e., redundant) data segments of backup data are removed in order to save storage capacity in a storage system (see also Wu ¶0004). Thus, in the context of Wu ¶0059, selection of a destination storage node corresponding to a “maximum number of data segments matched” would correlate to selection of a destination storage node corresponding to a minimum amount of storage consumption at the destination storage node because matching data segments are removed to save storage space. Therefore, the Wu disclosure of using similarity/matching degree information to select a destination storage node having a maximum number of matching segments reads on the claimed concept of “minimiz[ing] total storage consumption across the selected destination storage systems”. Nothing in the claims as currently presented precludes such an interpretation of Claim 1.
With respect to applicant’s argument located within the final paragraph of the 4th page of remarks (numbered as page 10) continuing to the 5th page of remarks (numbered as page 11), which recites:
“The Examiner relied on Armangau to teach that locally stored fingerprint information is managed before data transfer requests. Armangau discloses asynchronous replication where a source storage system sends fingerprints calculated from data blocks along with LBAs to a destination, and the destination uses the fingerprint to perform inline deduplication by attempting to identify a matching target block already stored at the destination. Armangau et al., paragraphs [0028]-[0029]. Armangau's approach involves fingerprint-based matching for replication between a source and a single destination, not generating estimates indicative of post-transfer storage consumption based on deduplication effectiveness from multiple candidate storage systems to select destinations that minimize total storage consumption. Armangau discloses that the destination attempts to deduplicate candidate blocks by matching first fingerprints of identified blocks in the deltaset with second fingerprints at the destination. Armangau et al., paragraphs [0040]-[0041]. Armangau does not disclose or teach receiving estimates from multiple candidate storage systems indicative of post-transfer storage consumption based on deduplication effectiveness, nor selecting among candidate storage systems to minimize total storage consumption as recited by claim 1 as amended.”
Examiner has fully considered the aforementioned argument but finds it moot because examiner relies on Wu, instead of Armangau, for disclosing the claimed features referenced in the aforementioned argument. As detailed above (see 35 U.S.C. 103 rejections), examiner relies on Armangau only to disclose the concepts of performing deduplication on data prior to transferring data to a destination (Independent Claims); and for the concept of managing and tracking second fingerprint information before receiving a request to transfer data (Claims 1 and 16).
With respect to applicant’s arguments located within the 3rd paragraph of the 5th page of remarks (numbered as page 11) continuing through the 7th page of remarks (numbered as page 13), applicant’s arguments with respect to dependent Claims 3-9, 11-15, and 17-20 are each moot because the new grounds of rejection do not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Yoshii et al. (US 20180039423 A1) – Discloses a method of using fingerprint samples of data sets to estimate deduplication effectiveness (see Figs. 6 and 7).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JULIAN SCOTT MENDEL whose telephone number is (703)756-1608. The examiner can normally be reached M-F 10am - 4pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Rocío del Mar Pérez-Vélez can be reached at 571-270-5935. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/J.S.M./Examiner, Art Unit 2133
/ROCIO DEL MAR PEREZ-VELEZ/Supervisory Patent Examiner, Art Unit 2133