DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 2/16/26 has been entered.
Claims 1-20 are pending.
Response to Amendment/Argument
Applicant has amended the claims to add new limitations. This Office action addresses the amended claims.
Applicant's arguments filed 11/4/25 have been fully considered but they are not persuasive.
As to independent claims 1 and 10, Applicant argues that “Lu does not disclose or suggest the features of ‘determine, prior to initiating garbage collection, similarity of content of a plurality of data segments…’”. This argument is not persuasive. Lu teaches determining similarity in data in order to reorganize the data for better compression. Lu performs scanning of data for similarity before reorganizing the data and sending it for compression (C7:L54-63). After the data is reorganized and compressed, the data can be deleted to make space available for other purposes, which is interpreted as the garbage collection operation (C8:L1-7). Thus, the scanning of data for similarity occurs before the garbage collection (the deletion of data to make space available for other purposes). Furthermore, Lu teaches that this reorganization of data can be part of a garbage collection operation or, separately, part of a data migration (C14:L15-33). This shows that the determining of similar content can be separate from the garbage collection process.
As to claims 9, 18, and 20, Applicant argues that Lu does not suggest that the selection of the group of data segments is based on similarity between different data segments rather than similarity between data chunks within a single data segment. This argument is not persuasive. Lu teaches improving data compression by finding similar data regions and moving them closer together for more effective data compression (C4:L10-30). Lu teaches that data is partitioned into multiple chunks (also referred to as segments) (C6:L55-58). Lu determines similar data chunks/segments (C7:L15-24, 30-35; C18:L5-25). Therefore, Lu does teach selection of different segments based on similarity.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1, 3-5, 7-10, 12-14, and 16-20 are rejected under 35 U.S.C. 102(a)(1) and (a)(2) as being anticipated by Lu (Patent No. US 9,411,815 B1) (“LU”).
As to claims 1, 10, and 19:
LU teaches: A storage system (system; Fig. 1), associated method of operation (Fig. 5A-B), and computer readable media storing instructions (C28:L41-50), comprising: storage memory; and a processing device operatively coupled to the storage memory (C14:L33-56 teach storage software 105 executed from a memory by a processor; see also FIG. 9, elements 105 and 108), the processing device configured to:
determine, prior to initiating garbage collection (Lu performs scanning of data for similarity before reorganizing the data and sending it for compression (C7:L54-63); after the data is reorganized and compressed, the data can be deleted to make space available for other purposes, which is interpreted as the garbage collection operation (C8:L1-7); the reorganization of data can be part of a garbage collection operation or, separately, part of a data migration (C14:L15-33)), similarity of content of a plurality of data segments stored in the storage memory based on a plurality of hash values associated with the plurality of data segments, the plurality of data segments comprising live data and dead data (C7:L15-24 discloses a similarity detector 121 configured to detect or determine the similarity of data chunks based on their respective features, super features, and/or sketches; based on the similarity of the data chunks, the reorganizer is configured to reorganize or rearrange the order or locations of the data chunks, such that similar data chunks are grouped together; thereafter, the compressor is configured to compress the grouped similar data chunks and store them together in one of the storage units, greatly improving the data compression of the stored data chunks. C7:L30-35 discloses that the similarity of the data chunks is determined by the similarity detector based on matching of data patterns of the data chunks; a data pattern of a data chunk can be a feature extracted from content of the data chunk, a super feature formed based on multiple features of the data chunk, or a sketch formed based on multiple super features of the data chunk.
C18:L5-25 teaches that when chunks are copied forward to a new location, they are grouped based on similarity to achieve better compression, which can be done by sketching and binning the chunks, identifying similar containers and grouping those chunks, or identifying all chunks to be copied and sorting their sketches to group them, and then implementing the compression technique in the storage system);
select two or more of the plurality of data segments for combined garbage collection and compression based on a similarity of content in the two or more of the plurality of data segments satisfying a similarity threshold (C7:L15-24 discloses a similarity detector 121 configured to detect or determine the similarity of data chunks based on their respective features, super features, and/or sketches; based on the similarity of the data chunks, the reorganizer is configured to reorganize or rearrange the order or locations of the data chunks, such that similar data chunks are grouped together; thereafter, the compressor is configured to compress the grouped similar data chunks and store them together in one of the storage units, greatly improving the data compression of the stored data chunks). The act of selecting and moving data chunks based on similarity characteristics meets this claimed limitation.
perform garbage collection of the dead data for two or more of the plurality of data segments (C18:L3-6 teaches performing garbage collection to free unused/dead space), wherein to perform the garbage collection, the processing device is configured to compress the live data based on similarities of portions of the live data (C7:L15-24 discloses a similarity detector 121 configured to detect or determine the similarity of data chunks based on their respective features, super features, and/or sketches; based on the similarity of the data chunks, the reorganizer reorganizes or rearranges the order or locations of the data chunks such that similar data chunks are grouped together; thereafter, the compressor compresses the grouped similar data chunks and stores them together in one of the storage units, greatly improving the data compression of the stored chunks. C18:L9-25 teaches that when live chunks are copied forward to a new location, they are grouped based on similarity to achieve better compression, which can be done by sketching and binning the live chunks, identifying similar containers and grouping those live chunks, or identifying all live chunks to be copied and sorting their sketches to group them, and then implementing the compression technique in the storage system. C14:L15-56 & FIG. 9 teach that data chunks that have been deduplicated can be reorganized based on their similarity, compressed, and stored in the same or a different storage area such as a compression region or a container, where storage units 108-109 have containers containing one or more compression regions and their respective metadata, and each compression region contains one or more data chunks and their respective metadata);
and write the compressed live data of the two or more data segments into the storage memory (C14:L15-56 & FIG. 9 teach that data chunks that have been deduplicated can be reorganized based on their similarity, compressed, and stored in the same or a different storage area such as a compression region or a container, where storage units 108-109 have containers containing one or more compression regions and their respective metadata, and each compression region contains one or more data chunks and their respective metadata. C7:L15-24: the data chunks are reorganized by the reorganizer into a second sequence order based on the similarity of the data chunks, where the second sequence order is different from the first sequence order; the reorganized data chunks are then compressed by the compressor into a second file to be stored in the storage system, such that similar data chunks are stored and compressed together in the second file).
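The copy-forward flow cited above (sketch and bin live chunks, then compress each bin together) can be illustrated with a minimal sketch. This is not Lu's actual implementation: the `sketch` function below is a toy stand-in for Lu's feature/super-feature/sketch machinery, and the grouping structure is purely illustrative.

```python
import zlib
from collections import defaultdict

def sketch(chunk: bytes, window: int = 8) -> int:
    """Toy sketch: max hash over all overlapping byte windows.
    (Stable within one process run, which is all binning needs.)"""
    return max(hash(chunk[i:i + window]) & 0xFFFFFFFF
               for i in range(max(1, len(chunk) - window + 1)))

def copy_forward_grouped(live_chunks: list[bytes]) -> dict[int, bytes]:
    """Bin live chunks by sketch, then compress each bin together so that
    similar chunks end up in the same compression region."""
    bins: defaultdict[int, list[bytes]] = defaultdict(list)
    for chunk in live_chunks:
        bins[sketch(chunk)].append(chunk)
    # One compressed region per bin of similar chunks.
    return {s: zlib.compress(b"".join(chunks)) for s, chunks in bins.items()}
```

Because identical (and, with a real sketch, merely similar) chunks land in the same bin, the compressor sees long runs of redundant data, which is the compression benefit the cited passages describe.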
As to claims 3 and 12:
LU teaches the elements of claim 1 as outlined above. LU also teaches wherein the obtaining the hash values comprises: storing a set of hashes for each of the plurality of data segments with the corresponding data segment (LU C15:L25-38 teaches that when data chunks 905-906 were stored in compression regions 903-904, a chunk similarity representation such as a sketch of each data chunk may be generated and stored as part of metadata 907-908).
As to claims 4 and 13:
LU teaches the elements of claim 1 as outlined above. LU also teaches wherein the determining the similarity of content of the plurality of data segments comprises determining similarity of portions of data of the data segments according to a similarity metric applied across the plurality of data segments, based on the hash results (see LU C4:L10-23, C18:L19-32, and FIG. 5A & 7 as taught above in reference to claim 1; C4:L24-35 also teach that similarity of the data chunks is determined based on matching of data patterns of the data chunks, which can be a feature extracted from content of the data chunk, a super feature, or a sketch; see also C2:L1-24 for generating a resemblance hash over a portion of the data chunk and checking the sketch against an index of previously stored data chunks, as well as applying a rolling hash function (e.g. Rabin fingerprint) over all overlapping small regions of the data chunk and generating any number of independent features, which are used for similarity matching; C2:L38-45 also teach that a cryptographic hash referred to as a fingerprint can be utilized to identify a specific data chunk, which can be a portion of a file).
As to claims 5 and 14:
LU teaches the elements of claim 1 as outlined above. LU also teaches wherein the determining the similarity of content of the plurality of data segments comprises determining dissimilarity of portions of data of the data segments according to a dissimilarity metric applied across the plurality of data segments, based on the hash results (LU C4:L10-23, C18:L19-32, and FIG. 5A & 7 as taught above in reference to claim 1; C4:L24-35 also teach similarity of the data chunks is determined based on matching of data patterns of the data chunks, which can be a feature extracted from content of the data chunk, a super feature, or a sketch; see also C2:L1-24 for generating a resemblance hash over a portion of the data chunk and checking the sketch against an index of previously stored data chunks, as well as applying a rolling hash function (e.g. Rabin fingerprint) over all overlapping small regions of the data chunk, and generating any number of independent features, which are used for similarity matching; C2:L38-45 also teach a cryptographic hash referred to as a fingerprint can be utilized to identify a specific data chunk, which can be a portion of a file).
As to claims 7 and 16:
LU teaches the elements of claim 1 as outlined above. LU also teaches wherein to compress the live data the processing device is further configured to: identify identical portions of data in the plurality of data segments (C25:L54-63 teach that metadata associated with a chunk is used to identify identical data segments, where C27:L10-37 teach that deduplicated chunks may be compressed into one or more CRs; FIG. 23 and C24:L17-39 also teach that groups of similar data chunks are compressed and stored; see also C15:L44-48).
As to claims 8 and 17:
LU teaches the elements of claim 1 as outlined above. LU also teaches wherein the performing the data compression of the live data of the two or more of the plurality of data segments comprises performing data compression that records differences among similar portions of data in the two or more data segments (LU C18:L9-25 teach that when live chunks are copied forward to a new location, they are grouped based on similarity to achieve better compression, e.g., by identifying similar containers and grouping those live chunks, or by identifying all live chunks to be copied and sorting their sketches to group them, and then implementing the compression technique in the storage system, i.e., live chunks with different sketches are grouped into different containers).
As to claims 9, 18, and 20:
LU teaches the elements of claim 1 as outlined above. LU also teaches wherein the selection of the group of data segments is based on similarity between different data segments rather than similarity between data chunks within a single data segment (C4:L10-30 teaches improving data compression by finding similar data regions and moving them closer together for more effective data compression; C6:L55-58 teaches that data is partitioned into multiple chunks (also referred to as segments); C7:L15-24, 30-35 and C18:L5-25 teach determining similar data chunks/segments; C15:L6-24 teaches determining the similarity of the data chunks/segments stored in first storage areas, grouping them based on the similarity, compressing and storing them in second storage areas, and then reclaiming resources associated with the first storage areas).
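The C15:L6-24 flow cited above (group similar segments drawn from different first storage areas, compress each group into a second storage area, then reclaim the first areas) can be sketched in miniature. The storage-area and segment structures here are hypothetical illustrations, not Lu's data model; `key` is any similarity-grouping function.

```python
import zlib

def regroup_and_reclaim(first_areas: dict[str, list[bytes]], key):
    """Group segments from *different* source areas by a similarity key,
    compress each group into a second storage area, then reclaim (empty)
    the first storage areas so their space is available again."""
    groups: dict[object, list[bytes]] = {}
    for segments in first_areas.values():        # cross-area iteration
        for seg in segments:
            groups.setdefault(key(seg), []).append(seg)
    second_areas = {k: zlib.compress(b"".join(v)) for k, v in groups.items()}
    reclaimed = {area: [] for area in first_areas}   # space now reclaimable
    return second_areas, reclaimed
```

Note that grouping happens across areas: two similar segments originally in different first areas end up compressed together, i.e., selection is driven by inter-segment similarity rather than intra-segment chunk similarity.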
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 6 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over LU in view of Luo (Pub. No. US 2021/0397350 A1) (“LUO”).
As to claims 6 and 15:
LU teaches the elements of claim 1 as outlined above. LU does not appear to explicitly teach: wherein the determining the similarity of content of the plurality of data segments comprises determining a Jaccard distance between data segments, based on the hash values.
However, LU in view of LUO teaches the limitation (LUO [0021] teaches calculating the difference between any two locality sensitive hash values by using a Jaccard distance, e.g., determining that a Jaccard distance between the first hash value and the second hash value is less than a first distance threshold, where LU C4:L10-35 teach that similarity of the data chunks is determined based on sketches).
Accordingly, it would have been obvious to a person having ordinary skill in the art at the time of the effective filing of the invention, having the teachings of LU and LUO before them, to modify LU's deduplicated storage system to utilize a Jaccard distance to calculate the difference between two hash values, as taught by LUO. Applying this known technique would have yielded the predictable result of quantifying the difference between two hash values in LU's system, and a person having ordinary skill in the art would have recognized that LU was ready for this improvement.
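The Jaccard-distance comparison that LUO [0021] describes can be illustrated with a minimal sketch over sets of hash values (e.g., min-hash or other locality sensitive hashes); the threshold value and the set representation are illustrative assumptions, not LUO's specifics.

```python
def jaccard_distance(hashes_a: set[int], hashes_b: set[int]) -> float:
    """Jaccard distance = 1 - |A ∩ B| / |A ∪ B|; 0.0 for identical sets,
    1.0 for disjoint sets."""
    if not hashes_a and not hashes_b:
        return 0.0
    inter = len(hashes_a & hashes_b)
    union = len(hashes_a | hashes_b)
    return 1.0 - inter / union

def similar_enough(a: set[int], b: set[int], threshold: float = 0.3) -> bool:
    """Treat two segments as similar when the distance between their hash
    sets falls below a distance threshold (cf. LUO's first distance threshold)."""
    return jaccard_distance(a, b) < threshold
```

For example, hash sets {1, 2, 3} and {2, 3, 4} share 2 of 4 distinct values, giving a distance of 0.5.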
Claims 2 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over LU in view of Oltean (Pub. No. US 2014/0244604 A1) (“OLTEAN”).
Regarding claims 2 and 11:
LU teaches the elements of claim 1 as outlined above. LU does not appear to explicitly teach:
the processing device is further configured to: obtain the plurality of hash values based on a sliding window hash function; and deterministically select a subset of the plurality of hash values, for each of the plurality of data segments.
However, OLTEAN teaches the limitation (OLTEAN [0032] teaches an algorithm that chunks file contents based on fast hashing techniques repeatedly computed on a sliding window, where a chunk is selected when the hash functions and the current chunk size/content meet certain heuristics). Accordingly, it would have been obvious to a person having ordinary skill in the art at the time of the effective filing of the invention, having the teachings of LU and OLTEAN before them, to modify LU's deduplicated storage system to utilize fast hashing techniques computed on a sliding window, as taught by OLTEAN. Applying this known technique would have yielded the predictable result of LU's deduplicated storage system selecting chunks via sliding-window hashing, and a person having ordinary skill in the art would have recognized that LU was ready for this improvement.
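OLTEAN's sliding-window heuristic ([0032]) belongs to the content-defined chunking family, which can be sketched generically as follows. The hash, mask, and size constants below are arbitrary illustrative choices, not OLTEAN's actual parameters.

```python
def chunk_boundaries(data: bytes, window: int = 16, mask: int = 0x3F,
                     min_size: int = 64, max_size: int = 1024) -> list[int]:
    """Content-defined chunking: slide a polynomial rolling hash over the
    data and cut a chunk wherever the low bits of the hash hit a fixed
    pattern, subject to min/max chunk-size heuristics."""
    BASE, MOD = 257, (1 << 31) - 1
    pow_w = pow(BASE, window - 1, MOD)       # weight of the outgoing byte
    cuts, start, h = [], 0, 0
    for i, b in enumerate(data):
        if i - start >= window:              # window full: drop oldest byte
            h = (h - data[i - window] * pow_w) % MOD
        h = (h * BASE + b) % MOD             # roll the new byte in
        size = i - start + 1
        if (size >= min_size and (h & mask) == mask) or size >= max_size:
            cuts.append(i + 1)               # cut point (exclusive index)
            start, h = i + 1, 0
    if start < len(data):
        cuts.append(len(data))               # trailing partial chunk
    return cuts
```

Because cut points depend on local content rather than absolute offsets, the same data tends to produce the same chunks even after insertions elsewhere, which is what makes this style of chunking attractive for deduplication.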
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to THAN NGUYEN whose telephone number is (571)272-4198. The examiner can normally be reached M-F, 7:00am-4:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tim Vo can be reached at (571)272-3642. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/THAN NGUYEN/Primary Examiner, Art Unit 2138