Prosecution Insights
Last updated: April 20, 2026
Application No. 18/084,054

COHERENCY BYPASS TAGGING FOR READ-SHARED DATA

Non-Final OA §103
Filed: Dec 19, 2022
Examiner: TALUKDAR, ARVIND
Art Unit: 2132
Tech Center: 2100 — Computer Architecture & Software
Assignee: Intel Corporation
OA Round: 1 (Non-Final)
Grant Probability: 81% (Favorable)
OA Rounds: 1-2
To Grant: 2y 9m
With Interview: 84%

Examiner Intelligence

Grants 81% — above average

Career Allow Rate: 81% (449 granted / 557 resolved; +25.6% vs TC avg)
Interview Lift: +3.5% (minimal), based on resolved cases with interview
Avg Prosecution (typical timeline): 2y 9m; 36 currently pending
Total Applications (career history): 593 across all art units

Statute-Specific Performance

§101: 7.9% (-32.1% vs TC avg)
§102: 15.1% (-24.9% vs TC avg)
§103: 51.5% (+11.5% vs TC avg)
§112: 13.8% (-26.2% vs TC avg)
Tech Center averages are estimates. Based on career data from 557 resolved cases.
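A quick sanity check on the figures above: subtracting each "vs TC avg" delta from its statute-specific rate recovers the baseline the comparison appears to use. This back-calculation is only an illustration of the arithmetic implied by the numbers shown; the product's actual baseline methodology is not stated on this page.

```python
# Hypothetical back-calculation of the Tech Center baseline implied by the
# "vs TC avg" deltas shown above. Rates and deltas are copied from the page;
# the interpretation (baseline = rate - delta) is an assumption.

rates = {
    "§101": (7.9, -32.1),
    "§102": (15.1, -24.9),
    "§103": (51.5, +11.5),
    "§112": (13.8, -26.2),
}

# baseline = statute rate minus its delta vs the TC average
implied_tc_avg = {s: round(rate - delta, 1) for s, (rate, delta) in rates.items()}

print(implied_tc_avg)  # every statute backs out to the same 40.0 baseline
```

Notably, all four statutes back out to the same 40.0% figure, which suggests the deltas were computed against a single Tech Center average rather than per-statute baselines.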

Office Action

§103
DETAILED ACTION

Claims 10-20 are pending. Claims 1-9 are cancelled. Priority: 12/19/2022. Assignee: Intel.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 10-13 and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Haber et al. (20190004801) in view of Turner et al. (20220019534).
As per claim 10, Haber discloses: An apparatus comprising: decoder circuitry to decode a single instruction (Haber, [0073 -- After fetching the instruction from code storage 102, decode circuit 106 decodes the fetched instruction, including by parsing the various fields of the instruction.]), the single instruction to include a field for an identifier of a first source operand and a field for an opcode (Haber, [0080 -- As shown, instruction 300 includes opcode 302, destination identifier 304, immediate 306, optional source identifier 308 (optional instruction fields are shown in boxes with dashed outlines), optional second immediate 310, optional element size identifier 312, and optional write mask 314.], [0122 -- The instruction is received by decode circuitry 605. For example, the decode circuitry 605 receives this instruction from fetch logic/circuitry. The instruction 601 includes fields for an opcode (e.g., mnemonic “VBROADCASTIMM”), a destination identifier to specify a packed destination register, and an immediate.]).

Haber does not explicitly disclose the following; however, Turner discloses: the opcode to indicate execution circuitry is to update coherency bypass information (Turner, [0126 -- Referring to FIG. 12A, in optional block 1202 in the method 1200a, the virtual cache device may receive a release synchronization operation command from a processor core (e.g., processor 124 in FIG. 1, processor 200 in FIG. 2, processor core 310 in FIGS. 3, 6, 10) or other coherent processing device.], [0136 -- Referring to FIG. 12B, in optional block 1230 of the method 1200b, the virtual cache device may receive an invalidate cache coherency operation. The invalidate cache coherency operation may correspond to a type of snoop received by a snoop filter. For example, the cache coherency operation may be an invalidate operation in response to a write snoop.]); and execution circuitry to execute the decoded instruction according to the opcode to update coherency bypass information for data indicated by the first source operand (Turner, [0127 -- In optional block 1204, the virtual cache device may update synchronization status bits for lines in the virtual cache subject to the release synchronization operation command. The synchronization status bits may indicate to the virtual cache device whether there is a pending synchronization operation and what type of cache coherence operation to implement in response to receiving a cache coherence operation while a synchronization operation is pending.], [0137 -- In optional block 1232, the virtual cache device may update synchronization status bits for lines in the virtual cache subject to the invalidate cache coherency operation.]).

Therefore it would have been obvious to a person of ordinary skill at the time of filing to incorporate the features of Turner into the system of Haber for the benefit of a cache coherency operation that is sent to the virtual cache in response to determining that the entry exists in the snoop filter, which reduces latency and constrained bandwidth, reduces complexity such as support for parallel memory management unit (MMU) lookups, reduces the cost of the MMU on the critical path to the lowest level caches, avoids bottlenecks, and prevents the over-invalidation of cache lines (Turner, 0001).

As per claim 11, the rejection of claim 10 is incorporated; in addition, Haber discloses: wherein the field for the identifier of the first source operand is to identify a vector register (Haber, [0083 -- Destination identifier 304 in some embodiments specifies a vector register, such as one of the vector registers provided in a processor's register file.]).

As per claim 12, the rejection of claim 10 is incorporated; in addition, Haber discloses: wherein the field for the identifier of the first source operand is to identify a memory location (Haber, [0084 -- In some embodiments, optional source identifier 308 identifies a general purpose register included in the processor's register file, for example, as illustrated and discussed with respect to the embodiment of FIG. 4C, below. FIG. 30 and its associated description further below describe an embodiment of a processor's register file. In some embodiments, optional source identifier 308 identifies a memory location.]).

As per claim 13, the rejection of claim 10 is incorporated; in addition, Haber discloses: wherein the single instruction is further to include a field for an identifier of a second source operand to indicate a size of the data indicated by the first source operand (Haber, [0086 -- Optional element size identifier 312, in some embodiments, is included in the opcode, such as a prefix or suffix, “B,” “W,” “D,” and “Q,” corresponding to a size—1 byte, 2 bytes, 4 bytes, or 8 bytes, respectively—of each destination vector element. In some embodiments, optional element size identifier 312 is included in the opcode, such as a prefix or suffix, “H,” “S,” “D,” “Q,” and “E,” corresponding to precision levels.]).

As per claim 16, Haber discloses: A method, comprising: fetching an instruction having a field for an opcode and a field for an identifier of a first source operand (Haber, [0074 -- After starting the process, a fetch circuit at 202 fetches the instruction from a code storage, the instruction including an opcode, a destination identifier to specify a destination vector register, a first immediate, a second immediate, and a write mask identifier to specify a write mask register, the write mask register comprising at least one bit corresponding to each element of the destination vector register], [0122 -- The instruction is received by decode circuitry 605. For example, the decode circuitry 605 receives this instruction from fetch logic/circuitry. The instruction 601 includes fields for an opcode (e.g., mnemonic “VBROADCASTIMM”), a destination identifier to specify a packed destination register, and an immediate.]); decoding the instruction (Haber, [0075 -- At 204, the fetched instruction is decoded by decode circuitry]); scheduling execution of the instruction (Haber, [0166 -- In some embodiments, register renaming, register allocation, and/or scheduling circuitry 1207 provides functionality]).

Haber does not explicitly disclose the following; however, Turner discloses: and executing the decoded instruction according to the opcode to update coherency bypass information for data indicated by the first source operand (Turner, [0127 -- In optional block 1204, the virtual cache device may update synchronization status bits for lines in the virtual cache subject to the release synchronization operation command. The synchronization status bits may indicate to the virtual cache device whether there is a pending synchronization operation and what type of cache coherence operation to implement in response to receiving a cache coherence operation while a synchronization operation is pending.], [0137 -- In optional block 1232, the virtual cache device may update synchronization status bits for lines in the virtual cache subject to the invalidate cache coherency operation.]).
Therefore it would have been obvious to a person of ordinary skill at the time of filing to incorporate the features of Turner into the system of Haber for the benefit of a cache coherency operation that is sent to the virtual cache in response to determining that the entry exists in the snoop filter, which reduces latency and constrained bandwidth, reduces complexity such as support for parallel memory management unit (MMU) lookups, reduces the cost of the MMU on the critical path to the lowest level caches, avoids bottlenecks, and prevents the over-invalidation of cache lines (Turner, 0001).

Claim 17 is a method claim that implements steps from apparatus claim 13, and therefore the corresponding mappings are incorporated.

Claims 14 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Haber et al. (20190004801) in view of Turner et al. (20220019534), and further in view of Gabor et al. (20210209023).

As per claim 14, the rejection of claim 10 is incorporated; in addition, Haber does not disclose the following; however, Gabor discloses: wherein the execution circuitry is further to execute the decoded instruction according to the opcode to: set a field value according to the opcode for one or more linear address masks for the data indicated by the first source operand (Gabor, [0042 -- In one embodiment, a thread/CPL where the bits are masked out uses the following linear address space of FIG. 3A. FIG. 3A illustrates a linear address 300 with a proper subset of bits 306 (e.g., bit positions 55-52) inside the address space (bit positions 56-0) available for software use. For example, leaving bit positions 302B and 302A to store values that are not to be masked.], [0126 -- Write mask field 870 (EVEX byte 3, bits [2:0]-kkk)—its content specifies the index of a register in the write mask registers as previously described. In one embodiment of the disclosure, the specific value EVEX kkk=000 has a special behavior implying no write mask is used for the particular instruction (this may be implemented in a variety of ways including the use of a write mask hardwired to all ones or hardware that bypasses the masking hardware).]).

Therefore it would have been obvious to a person of ordinary skill at the time of filing to incorporate the features of Gabor into the system of Haber for the benefit of linear address masking on a proper subset of bits inside the address space bits of the linear address, thus improving the performance of a computer by allowing software to use metadata bits inside pointers which the hardware subsequently ignores. The core includes logic to support a packed data instruction set extension, thus allowing the operations used by many multimedia applications to be performed using packed data (Gabor, 0035).

Claim 18 is a method claim that implements steps from apparatus claim 14, and therefore the corresponding mappings are incorporated.

Claims 15 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Haber et al. (20190004801) in view of Turner et al. (20220019534), and further in view of Basu et al. (20170337136).

As per claim 15, the rejection of claim 10 is incorporated; in addition, Haber does not disclose the following; however, Basu discloses: wherein the execution circuitry is further to execute the decoded instruction according to the opcode to: set a field value according to the opcode for one or more page table attributes for the data indicated by the first source operand (Basu, [0079 -- The operating system also sets, in metadata 204 for the page table entry 200, cache coherency indicator 306 to indicate the first type of processor (step 608). Continuing the example above, this operation includes setting one or more bits, characters, etc. to indicate a CPU. For example, assuming that there are four bits in the cache coherency indicator and the pattern 0110 indicates a CPU, memory management unit 122 can set the cache coherency indicator to 0110.]).

Therefore it would have been obvious to a person of ordinary skill at the time of filing to incorporate the features of Basu into the system of Haber for the benefit of avoiding coherence directory lookups, which can reduce the number of communications on a system bus, reduce power consumption, reduce the number of computational operations performed by the coherence directory, and avoid delay, so that the computing device operates more efficiently. The operation is performed to enforce coherency between the copy of the data in the cache and the other copies of the data, to avoid incoherency/inconsistency between cached copies of data held by processors and the corresponding data in the memory and/or copies of the data held in caches by other processors (Basu, 0022).

Claim 19 is a method claim that implements steps from apparatus claim 15, and therefore the corresponding mappings are incorporated.

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Haber et al. (20190004801) in view of Turner et al. (20220019534), and further in view of Bernat et al. (20180165196).

As per claim 20, the rejection of claim 16 is incorporated; in addition, Haber does not disclose the following; however, Turner discloses: wherein the opcode indicates that the data indicated by the first source operand is to bypass a coherency operation (Turner, [0127 -- In optional block 1204, the virtual cache device may update synchronization status bits for lines in the virtual cache subject to the release synchronization operation command. The synchronization status bits may indicate to the virtual cache device whether there is a pending synchronization operation and what type of cache coherence operation to implement in response to receiving a cache coherence operation while a synchronization operation is pending.], [0137 -- In optional block 1232, the virtual cache device may update synchronization status bits for lines in the virtual cache subject to the invalidate cache coherency operation.]), further comprising: executing the instruction (Turner, [0056 -- The processor 200 may include a shared processor cache memory 230 that may be dedicated for read and/or write access by the processor cores 202, 204, 206, 208 of the processor 200. The shared processor cache 230 may store data and/or instructions, and make the stored data and/or instructions available to the processor cores 202, 204, 206, 208, for use in execution by the processor cores 202, 204, 206, 208]).

Therefore it would have been obvious to a person of ordinary skill at the time of filing to incorporate the features of Turner into the system of Haber for the benefit of a cache coherency operation that is sent to the virtual cache in response to determining that the entry exists in the snoop filter, which reduces latency and constrained bandwidth, reduces complexity such as support for parallel memory management unit (MMU) lookups, reduces the cost of the MMU on the critical path to the lowest level caches, avoids bottlenecks, and prevents the over-invalidation of cache lines (Turner, 0001).
Turner does not explicitly disclose the following; however, Bernat discloses: executing the instruction further comprises: executing the decoded instruction according to the opcode to flush any modified data indicated by the first source operand from one or more caches, invalidate any shared data indicated by the first source operand, flush any translation look-aside buffer entries for data indicated by the first source operand, and set one or more tags associated with data indicated by the first source operand to indicate that copies of the data are to bypass the coherency operation (Bernat, [0134 -- At phase 1501, a core 1420 transmits an ENQ command to controller 1470 specifying an address range (e.g., @a-@b) needs to be copied from its cache to memory. Upon acceptance of the command, controller 1470 transmits an acknowledgement back to the core 1420, phase 1502. At phase 1503, controller 1470 copies the memory range from core 1420 to memory.]).

Therefore it would have been obvious to a person of ordinary skill at the time of filing to incorporate the features of Bernat into the system of Haber for the benefit that the faster the throughput of instructions, the better the overall performance of the processor. Multimedia applications are accelerated and executed more efficiently by using the full width of the processor's data bus for performing operations on packed data. The out-of-order execution logic has a number of buffers to smooth out and re-order the flow of instructions to optimize performance as they go down the pipeline and get scheduled for execution (Bernat, 0038).

Examiner Notes

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Rychlik et al. (20140040552), where the method involves storing shared variable data to cache lines of a cache of a first programmable processor. A store-with-release operation is executed with the first programmable processor. A load-with-acquire operation is executed with a second programmable processor. A value of the data is loaded from a cache of the second programmable processor. The cache of the first programmable processor is snooped with the second programmable processor. A cache hit associated with an updated value of the data is detected with the second programmable processor (Rychlik, 0010).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ARVIND TALUKDAR, whose telephone number is (303) 297-4475. The examiner can normally be reached M-F, 10 am-6 pm EST. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Hosain Alam, can be reached at 571-272-3978. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

Arvind Talukdar
Primary Examiner, Art Unit 2132
/ARVIND TALUKDAR/

Prosecution Timeline

Dec 19, 2022: Application Filed
Jan 25, 2023: Response after Non-Final Action
Mar 07, 2026: Non-Final Rejection, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602317: MEMORY DEVICE HARDWARE HOST READ ACTIONS BASED ON LOOKUP OPERATION RESULTS (granted Apr 14, 2026; 2y 5m to grant)
Patent 12591520: LINEAR TO PHYSICAL ADDRESS TRANSLATION WITH SUPPORT FOR PAGE ATTRIBUTES (granted Mar 31, 2026; 2y 5m to grant)
Patent 12591382: STORAGE DEVICE OPERATION ORCHESTRATION (granted Mar 31, 2026; 2y 5m to grant)
Patent 12579074: HARDWARE PROCESSOR CORE HAVING A MEMORY SLICED BY LINEAR ADDRESS (granted Mar 17, 2026; 2y 5m to grant)
Patent 12566712: A RING BUFFER WITH MULTIPLE HEAD POINTERS (granted Mar 03, 2026; 2y 5m to grant)

Study what changed to get past this examiner. Based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 81%
With Interview: 84% (+3.5%)
Median Time to Grant: 2y 9m
PTA Risk: Low

Based on 557 resolved cases by this examiner. Grant probability derived from career allow rate.
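The headline figures in this panel are consistent with a simple derivation from the examiner counts shown earlier (449 granted / 557 resolved, with a +3.5% interview lift). The rounding rule and the additive treatment of the lift below are my assumptions for illustration, not the product's documented methodology.

```python
# Hypothetical sketch of how the headline projections could be derived
# from the examiner counts shown on this page. The rounding and the
# additive-lift combination are assumptions, not the actual methodology.

granted, resolved = 449, 557          # career counts from the page

allow_rate = granted / resolved       # 0.806..., displayed as 81%
interview_lift = 0.035                # +3.5% lift shown above

base = round(allow_rate * 100)                               # 81
with_interview = round((allow_rate + interview_lift) * 100)  # 84

print(base, with_interview)
```

Both results match the displayed values (81% and 84%), which suggests the "with interview" figure is simply the career allow rate plus the lift, rounded to a whole percent.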
