Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This office action is in response to the amendment filed on 06/04/2025 (hereinafter “amendment”).
Claims 1-20 remain pending. Claims 1-20 have been amended.
The previous Office Action mailed 03/05/2025 sets forth the following rejections: Claims 9-14 were rejected under 35 U.S.C. § 112(b) as being indefinite;
Claims 1-20 were rejected under 35 U.S.C. § 101; and
Claims 1-20 were rejected under 35 U.S.C. § 102 as being anticipated by Tofano (U.S. Publication No. 2020/0057752).
The rejection of claims 9-14 under 35 U.S.C. 112(b) as being indefinite is hereby withdrawn.
The rejection of claims 1-20 under 35 U.S.C. 101 as being directed to an abstract idea without significantly more is hereby withdrawn.
Response to Amendment
Applicant’s arguments with respect to the 35 U.S.C. § 102 rejection as being anticipated by Tofano have been considered. Examiner respectfully disagrees with the following characterization of what the applied reference Tofano teaches.
Applicant argues:
“Specifically, Tofano is generally directed to data portion classification using a global deduplication index. See Tofano, par. [0019] ("Some implementations herein are directed to techniques and arrangements for a high-speed data portion classification mechanism that may employ a cluster-wide (i.e., global) distributed deduplication index").
Further, Tofano states as follows (emphases added): The deduplication data portions 310 are received by the classifier 304, which may build a data-portion identifier 312 for each deduplication data portion 310. As mentioned above, comparing full deduplication data portions with each other would be computationally expensive, so one alternative is to generate a data-portion identifier that is representative of at least a portion of the content of each deduplication data portion, and that is far smaller than the actual deduplication data portion. Various types of data-portion identifier generating techniques may be employed, such as hashing, or the like.
Alternatively, the data-portion identifiers may be complex or similarity based and/or may be composed of a reference plus one or more delta bytes. Thus, the data-portion identifiers (of Tofano) herein are not limited to hash-based schemes.”
It appears that Applicant has not considered the following relevant teachings of Tofano:
[0080] The deduplication data portions 310 are received by the classifier 304, which may build a data-portion identifier 312 for each deduplication data portion 310. As mentioned above, comparing full deduplication data portions with each other would be computationally expensive, so one alternative is to generate a data-portion identifier that is representative of at least a portion of the content of each deduplication data portion, and that is far smaller than the actual deduplication data portion. Various types of data-portion identifier generating techniques may be employed, such as hashing, or the like. Alternatively, the data-portion identifiers may be complex or similarity based and/or may be composed of a reference plus one or more delta bytes. Thus, the data-portion identifiers herein are not limited to hash-based schemes.
As shown above, Tofano discusses the efficiency gained by processing data portions selectively and avoiding processing of all the data portions. See Tofano, par. [0080].
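For illustration only, the hash-based option described in Tofano par. [0080] can be sketched as follows. This is a minimal sketch under stated assumptions, not Tofano's implementation: the function names `data_portion_identifier` and `classify` are hypothetical, and SHA-256 is chosen only as one example of "hashing, or the like."

```python
import hashlib


def data_portion_identifier(portion: bytes) -> str:
    """Return a short, content-derived identifier for one data portion.

    Assumption: SHA-256 stands in for the unspecified hashing technique.
    """
    return hashlib.sha256(portion).hexdigest()


def classify(portions):
    """Classify portions as unique or duplicate by comparing identifiers only.

    Returns (unique, duplicates), where unique is a list of portion indexes
    and duplicates is a list of (index, index_of_first_copy) pairs.
    """
    seen = {}  # identifier -> index of the first portion with that content
    unique, duplicates = [], []
    for index, portion in enumerate(portions):
        ident = data_portion_identifier(portion)
        if ident in seen:
            duplicates.append((index, seen[ident]))
        else:
            seen[ident] = index
            unique.append(index)
    return unique, duplicates
```

The identifier has a fixed length regardless of the size of the data portion, which is why comparing identifiers is cheaper than comparing full data portions.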
Since the claims have been amended by adding the generation/extraction and/or comparing of a plurality of fingerprints to identify the target data items and their locations, examiner has updated the prior art search accordingly. Examiner has reviewed the prior art references and applied a reference that is found to be relevant, and mapped claimed limitations to relevant teachings with necessary explanations and clarifications. As such, the teachings should be readily apparent to the applicant in the rejection below.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claim(s) 1-6, 9-13, and 15-19 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by U.S. Patent No. 9952933 issued to Zhang et al., published 2018-04-24, hereinafter “Zhang.”
The meanings of some claim limitations are provided below for convenience. It is noted that claim limitations are interpreted in light of applicant’s disclosure. Paragraph numbers of the specification, such as [0018] and [0030], are provided for convenience.
Different hashing schemes:
[0018] …. The multiple storage systems may implement different hashing schemes for data deduplication. For example, a storage environment may be a datacenter hosting multiple storage systems that perform data deduplication using different block sizes, different block types (e.g., fixed or variable blocks sizes), different hash functions, and so forth.
Location query:
[0030] Referring again to FIG. 1A, in some implementations, the data location engine 120 (in computing device 100) may receive a location query to search for stored instances of a target data item in the storage environment 160. Further, in response to receiving the location query, the data location engine 120 may determine a user entity (e.g., a human user, a company, an organization, an application, etc.) that generated or otherwise caused the location query. For example, the location query may be generated by a user of the client device 105, by a user of the computing device 100, and so forth. The target data item may be a file, data object, string, and so forth.
[0040] In some implementations, all or part of the location query may include (or identify) a target data item to be located in a storage environment 160 that includes multiple deduplication storage systems…. to perform data deduplication.
Location report:
[0054] Accordingly, that storage location may be used to generate an entry in a location report, as described above with reference to FIG. 3. Further, in some implementations, the entry in the Location report may include the match level (or a value or indication based on the match value) that was calculated to identify the storage location. For example, the match level may indicate a degree of confidence or probability that the storage location (or the corresponding user visible object) actually stores the target data item 440.
Potential storage locations:
[0061] Referring again to FIG. 7, instruction 750 may be executed to identify, using the generated fingerprints, potential storage locations of the target data item in the plurality of deduplication storage systems. For example, referring to FIG. 4B, the controller attempts to match the fingerprint 460A against fingerprints stored in data unit references of the deduplication metadata of DSS “S5” 470A. ….. the controller may read a pointer included in that data unit reference to determine storage location(s) 480A of the target data item 440 in DSS “S5” 470A. …..the controller matches the fingerprint 460C against deduplication metadata of DSS “S8” 470C, and thereby determine storage location(s) 480C of the target data item 440 in DSS “S8” 470C.
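For illustration only, the interplay of the limitations quoted above (different hashing schemes, a location query, potential storage locations, and a location report) can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the class names `HashingScheme` and `DedupStorageSystem`, the function `locate`, the fixed block sizes, and the location strings are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class HashingScheme:
    """Hypothetical scheme: a fixed block size plus a hash function name."""
    block_size: int
    hash_name: str

    def fingerprints(self, data: bytes):
        # Split the data into fixed-size blocks and hash each block.
        return [
            hashlib.new(self.hash_name, data[i:i + self.block_size]).hexdigest()
            for i in range(0, len(data), self.block_size)
        ]


@dataclass
class DedupStorageSystem:
    """Hypothetical deduplication storage system (DSS) with its own scheme."""
    name: str
    scheme: HashingScheme
    index: dict = field(default_factory=dict)  # fingerprint -> storage location

    def store(self, data: bytes, base_location: str):
        # Record a location only for fingerprints not already indexed.
        for n, fp in enumerate(self.scheme.fingerprints(data)):
            self.index.setdefault(fp, f"{base_location}/unit{n}")


def locate(target: bytes, systems):
    """Build a location report: DSS name -> matched storage locations."""
    report = {}
    for dss in systems:
        # Fingerprint the target with the scheme used by this particular DSS.
        hits = [dss.index[fp] for fp in dss.scheme.fingerprints(target)
                if fp in dss.index]
        if hits:
            report[dss.name] = hits
    return report
```

A short usage example: if `s5 = DedupStorageSystem("S5", HashingScheme(4, "sha1"))` stores `b"abcdefgh"` under `"c1"` and `s8 = DedupStorageSystem("S8", HashingScheme(8, "sha256"))` stores unrelated data, then `locate(b"abcdefgh", [s5, s8])` reports locations only for `"S5"`. The target is fingerprinted once per scheme because fingerprints produced under one scheme cannot be compared with those produced under another.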
Regarding claim 1,
Zhang teaches, a computing device comprising: a processor; a memory (Zhang, Fig. 1A-B); and a machine-readable storage storing instructions, the instructions executable by the processor to:
receive (Fig. 7, step 705, “detecting a request”) a location query for a (Zhang, col. 6, line 28, “data segment location,”; col. 9, lines 17-20, “location of each data segment”; Fig. 7A, steps 705, 710, 715; request from a client; col. 15, lines 33-34, “At 715, the process locates data segments based on the container ID of the data segments”) target data item to be located in a storage environment including a set of deduplication storage systems that use different hashing schemes (Zhang, col. 11, lines 20-24, “each fingerprint entry 20 includes a first fingerprint calculated using a first fingerprinting algorithm, a second fingerprint previously-calculated using a second fingerprinting algorithm, and locations 420(1)-(N).” The “first fingerprinting algorithm” and “second fingerprinting algorithm” are equated with the “different hashing schemes”);
determine a plurality of deduplication storage systems based on the received location query, wherein the plurality of deduplication storage systems (Fig. 1A-B, element 70, col. 5, lines 12-20, “Local storage 70 can be a persistent storage device and can include one or more of a variety of different storage devices, … or one or more logical storage devices such as volumes implemented on one or more such physical storage devices”) is a subset (in Fig. 7A-B, at “745, the process retrieves a subset of data segments (e.g., data segments 130(1)-(2)) that still have the second fingerprint (e.g., entries 1 or 2 as shown in FIG. 4C)”) of the set of deduplication storage systems included in the storage environment;
determine a plurality of hashing schemes (Zhang, col. 11, lines 15-28, “first fingerprinting algorithm” and “second fingerprinting algorithm”) used by the plurality of deduplication storage systems, respectively;
generate a plurality of fingerprints that represent the target data item, wherein each of plurality of fingerprints is generated by applying, to the target data item, a different hashing scheme (Zhang, col. 11, lines 15-28) of the plurality of hashing schemes used by the plurality of deduplication storage systems;
identify, using the generated plurality of fingerprints that represent the target data item, potential storage locations of the target data item in the plurality of deduplication storage systems (Zhang, col. 16, lines 24-39, “in FIG. 7B ….. the source side is a client system (e.g., computing device 10) and the target side is a server (e.g., deduplication server 210) … the client system … … does not need to calculate fingerprints …..the source side first loads data object A which is to be replicated, and then retrieves data object B of the last full backup from the target side (e.g., from the server). The client system on the source side then compares a fingerprint list of data objects A and B (e.g., list A and list B). If a fingerprint exists in list A, but does not exist in list B, the client system queries the cache in the target side”). In order for the client to compare the fingerprints, the client has to identify the fingerprints first. See also Zhang, col. 11, lines 24-34, “… in FIG. 4A, both the old and new fingerprints of a data segment and a location (or container ID) of the data segment are stored in the data segment's corresponding fingerprint entry. Deduplicated data store 260 …stores fingerprints FP1 and FP1' are identifiers of a respective data segment stored in deduplicated data store 260. Location 420 is an identifier of a location of where a respective data segment is stored in deduplicated data store 260, such as an identifier of a storage container (e.g., container ID) that includes the respective data segment”;
generate a location report based on the identified potential storage locations (see Zhang, Fig. 4B, col. 11, lines 52-61, “FIG. 4B …Fingerprint cache 140 is … to store a subset of fingerprint entries retrieved from index file 90, …. in deduplication server cache 410 or local storage 70…. each fingerprint entry includes both first (new) and second (old) fingerprints (shown as FP1, FP1' and FP2, FP2' etc.), locations of the old and new fingerprints (shown as offsets 450(1)-(N)), and a segment size of the data segments (shown as sizes 440(1)-(N)).”). The offsets are equivalent to the locations as recited in the claims.
Regarding dependent claim 2,
Zhang teaches instructions to determine a particular user entity associated with the location query, and to determine the plurality of deduplication storage systems to include each deduplication storage system, and teaches that the Zhang process is accessible to the particular user entity, because Zhang responds to a client query (“Thus, storage management module 350 can add information received from a client (e.g. a data object, data segment, or data segment fingerprint) to update its information or can retrieve the necessary information to respond to a client query (e.g., from computing device 10).”). See Zhang, col. 9, lines 65-67.
Regarding claim 3,
Zhang teaches the computing device of claim 2, including instructions executable by the processor to, for each deduplication storage system of the plurality of deduplication storage systems: identify a set of fingerprint matches between the generated plurality of fingerprints and fingerprints stored in a set of metadata records of the deduplication storage system; and identify the potential storage locations of the target data item based on the set of metadata records of the deduplication storage system, because Zhang teaches matching a new data segment's fingerprint with existing fingerprints that are stored in a metadata store. (See Zhang, col. 9, lines 32-39: “If a new data segment's fingerprint matches existing fingerprints (e.g., first fingerprint 150(1) and second fingerprint 110(1)) presently stored in metadata store 250 and associated with the new data segment, deduplication server 210 can determine that the new data segment is likely to be already stored within data segments 130 (e.g., the new data segment is a common data segment), and thus does not need to be written to deduplication data store 260.”)
Regarding claim 4,
Zhang teaches the computing device of claim 3, including instructions executable by the processor to, for each deduplication storage system of the plurality of deduplication storage systems: translate, based on a mapping data structure, the potential storage locations into user visible objects; and generate the location report comprising a listing of the user visible objects, because Zhang teaches that its “Deduplication server 210 can create a new entry in metadata store 250 for a new data segment, and can store the data segment's location in the new entry. Deduplication server 210 can also add the new fingerprint of a data segment to the new entry associated with the corresponding data segment. … Thus, in the embodiment shown, metadata store 250 can contain a new first signature 150(N+1) and a new location that correspond to a new data segment 130(N+1) that is stored in deduplicated data store 260.” See Zhang, col. 9, lines 22-27. Storing the new data segment and/or new fingerprints, and the locations of the segments/fingerprints, in a metadata store is equated with the making of user visible objects.
Regarding claim 5,
Zhang teaches a degree of confidence (i.e., probability) associated with each user visible object included in the location report, because Zhang improves the cache hit rate for fingerprint matching and identifies the likely candidates. See Zhang, col. 11, lines 64-67 (“storing relevant and frequently accessed fingerprint entries in fingerprint cache 140 improves the likelihood of fingerprint cache hits, since the data segments corresponding to the relevant fingerprint entries are likely candidates to be reused as part of a new backup operation for data segments that have not changed since a previous (or initial) backup operation.”). The likelihood is equated with the probability and confidence level of cache hits.
Regarding claim 6 (the computing device of claim 2, including instructions executable by the processor to: identify a source device that generated the location query; and determine, based on the identified source device, the particular user entity associated with the location query), Zhang teaches client devices, wherein the client is the user associated with the client device. See Zhang, col. 9, lines: “Thus, storage management module 350 can add information received from a client (e.g. a data object, data segment, or data segment fingerprint) to update its information or can retrieve the necessary information to respond to a client query (e.g., from computing device 10).”
Claims 9-13 and 15-19 are essentially the same as claims 1-6 except they are directed to a method and computer program product, respectively, and are rejected under the same rationale applied to the rejection of claims 1-6 above.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 7-8, 14 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang, US Patent No. 9952933 (“Zhang”).
Regarding claim 7 (each hashing scheme includes a chunking algorithm and a hashing function), Zhang teaches the use of hashing algorithms and the generation of hash values and digests (“A fingerprint is a value generated for a given data segment. Typically, such fingerprint values need to be substantially unique to each data segment, and thus distinguish data segments from one another. An example of a fingerprint is a hash value. For example, hashing algorithms (also called fingerprinting algorithms) such as Rabin's Algorithm, Message-Digest Algorithm 5 (MD5), Secure Hash Algorithm 1 (SHA-1), and Secure Hash Algorithm 256 (SHA-256) and the like can be used to generate hash values. The function of a hashing algorithm is to recreate input data from the hashing algorithm's hash value alone. The input data is typically referred to as the “message” and the hash value is typically referred to as the “message digest” or simply “digest.”” See Zhang, col. 3, lines 51-60). However, Zhang does not explicitly indicate that generating a digest is a chunking algorithm. Zhang teaches that a stronger cryptographic hash function (such as SHA-1 or MD5) is used to generate a unique identifier (a "fingerprint" or "digest") for each block. It would have been obvious to one of ordinary skill in the art to use a chunking algorithm because Zhang teaches dividing the image files into fixed-size chunks (see Zhang, col. 8, lines 1-4, “backup image file can be divided into a plurality of chunks, and each chunk can be divided into a plurality of fixed-size data segments”). The person of ordinary skill would be motivated by Zhang’s suggestion that a deduplication system would be more efficient when fixed-size chunks are used, as the backup speed improves (see Zhang, col. 16, lines 1-10: the determination whether the subset of data retrieved is small enough not to affect backup speed can be made by a predetermined threshold or by a system/network administrator. If the subset is not small enough (e.g., the subset negatively affects backup speed), the process, at 755, retrieves a smaller subset of data segments than the previously retrieved subset of data segments. For example, the process can retrieve a smaller subset of data segments so that backup speed is not adversely affected).
Regarding claims 8, 14 and 20 (the computing device of claim 1, including a plurality of deduplication storage systems: divide, based on the respective hashing scheme of the deduplication storage system, the target data item into a set of data units; generate, based on the respective hashing scheme of the deduplication storage system, a sequence of fingerprints for the set of data units; determine a match level between the generated sequence of fingerprints and a sequence of stored fingerprints of the deduplication storage system; and determine that the target data item is stored in the deduplication storage system in response to a determination that the match level exceeds a predefined threshold), Zhang teaches dividing the data units into a size based on a predetermined threshold; however, Zhang does not explicitly call the threshold a “match level” (see Zhang, col. 16, lines 1-10: the determination whether the subset of data retrieved is small enough not to affect backup speed can be made by a predetermined threshold or by a system/network administrator. If the subset is not small enough (e.g., the subset negatively affects backup speed), the process, at 755, retrieves a smaller subset of data segments than the previously retrieved subset of data segments. For example, the process can retrieve a smaller subset of data segments so that backup speed is not adversely affected).
It would have been obvious to one of ordinary skill in the art to use a chunking algorithm because Zhang teaches dividing the image files into fixed-size chunks (see Zhang, col. 8, lines 1-4, “backup image file can be divided into a plurality of chunks, and each chunk can be divided into a plurality of fixed-size data segments”). The person of ordinary skill would be motivated by Zhang’s suggestion that a deduplication system would be more efficient when fixed-size chunks are used, as the backup speed improves (see Zhang, col. 16, lines 1-10).
Claims 14 and 20 are essentially the same as claim 8 except they are directed to a method and computer program product, respectively, and are rejected under the same rationale applied to the rejection of claim 8 above.
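For illustration only, the “match level” and “predefined threshold” recited in claim 8 can be sketched as follows. This is a minimal sketch under stated assumptions and is taken from neither the application nor Zhang: the function names `match_level` and `is_stored`, the definition of the match level as the fraction of generated fingerprints found among the stored fingerprints, and the default threshold of 0.8 are all hypothetical. The sketch also ignores the ordering of the fingerprint sequences.

```python
def match_level(generated, stored):
    """Fraction of generated fingerprints that also appear among stored ones.

    Assumption: order within the sequences is ignored in this sketch.
    """
    if not generated:
        return 0.0
    stored_set = set(stored)
    hits = sum(1 for fp in generated if fp in stored_set)
    return hits / len(generated)


def is_stored(generated, stored, threshold=0.8):
    """Treat the target as stored when the match level exceeds the threshold."""
    return match_level(generated, stored) > threshold
```

Under these assumptions, three matches out of four generated fingerprints give a match level of 0.75, which exceeds a threshold of 0.5 but not the default of 0.8.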
Response to Arguments
Applicant’s arguments with respect to claim(s) 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Because claims 1-20 have been amended by adding the generation/extraction and/or comparing of a plurality of fingerprints to identify the target data items and their locations, Examiner has updated the prior art search accordingly. Examiner has reviewed the prior art references and applied a reference that is found to be relevant, and mapped claimed limitations to relevant teachings with necessary explanations and clarifications. As such, the teachings cited in the office action should be readily apparent to the applicant in the rejection above.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HOSAIN T ALAM whose telephone number is (571)272-3978. The examiner can normally be reached Mon-Thu, 8:00 - 4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/HOSAIN T ALAM/
Supervisory Patent Examiner, Art Unit 2132