DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on November 7, 2025, has been entered.
Claims 1-20 are pending in this Office action and presented for examination. Claims 1, 10, and 19 are newly amended by the RCE received December 3, 2025.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 4-8 and 13-17 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 4 recites the limitation “an open aperture command” in line 2. However, it is indefinite as to whether this open aperture command is the same as, or different from, “an open aperture command” as recited in claim 1, lines 2-3.
Claims 5-8 are rejected for failing to alleviate the rejection of claim 4 above.
Claim 5 recites the limitation “the open aperture command” in lines 1-2. However, it is indefinite as to whether the antecedent basis for this limitation is “an open aperture command” as recited in claim 1, lines 2-3, or “an open aperture command” as recited in claim 4, line 2.
Claims 6-8 are rejected for failing to alleviate the rejection of claim 5 above.
Claim 13 recites the limitation “an open aperture command” in line 2. However, it is indefinite as to whether this open aperture command is the same as, or different from, “an open aperture command” as recited in claim 10, lines 4-5.
Claims 14-17 are rejected for failing to alleviate the rejection of claim 13 above.
Claim 14 recites the limitation “the open aperture command” in lines 1-2. However, it is indefinite as to whether the antecedent basis for this limitation is “an open aperture command” as recited in claim 10, lines 4-5, or “an open aperture command” as recited in claim 13, line 2.
Claims 15-17 are rejected for failing to alleviate the rejection of claim 14 above.
The following is a quotation of 35 U.S.C. 112(d):
(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.
The following is a quotation of pre-AIA 35 U.S.C. 112, fourth paragraph:
Subject to the following paragraph [i.e., the fifth paragraph of pre-AIA 35 U.S.C. 112], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.
Claims 4 and 13 are rejected under 35 U.S.C. 112(d) or pre-AIA 35 U.S.C. 112, 4th paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which it depends, or for failing to include all the limitations of the claim upon which it depends. Applicant may cancel the claim(s), amend the claim(s) to place the claim(s) in proper dependent form, rewrite the claim(s) in independent form, or present a sufficient showing that the dependent claim(s) complies with the statutory requirements.
Claim 4 recites the limitation “The method of claim 1, wherein opening the aperture is performed in response to an open aperture command” in lines 1-2. Claim 1, upon which claim 4 is dependent, recites the limitation “opening an aperture for processing partial results per an open aperture command” in lines 2-3. Therefore, claim 4 does not appear to further limit the subject matter of the claim upon which it depends.
Claim 13 recites the limitation “The system of claim 10, wherein opening the aperture is performed in response to an open aperture command” in lines 1-2. Claim 10, upon which claim 13 is dependent, recites the limitation “open the aperture for processing partial results per an open aperture command” in lines 4-5. Therefore, claim 13 does not appear to further limit the subject matter of the claim upon which it depends.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2, 4, 9-11, 13, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (Kim) (US 20070027870 A1) in view of Norrie et al. (Norrie) (US 20210263739 A1) and further in view of Zbiciak (US 20170308381 A1).
Consider claim 1, Kim discloses a method comprising: opening an aperture for processing partial results per an open aperture command, wherein the aperture is a memory address range for writing operands into ([0023], lines 20-25, in one embodiment, the memory controller manages the partial results by storing them in a buffer before reducing the partial results to a combined result, such as a summation of the partial results, product of the partial results, or other operation; regarding the open aperture command, see [0022], lines 3-15, at operation 401, a task is detected that will use multiple processing elements to compute a number of partial results that will be reduced into a combined result. In one embodiment, the task is indicated by a particular type of instruction within a program being executed by a processing system in which one embodiment is used. In another embodiment, the task is detected by an access by an instruction to a particular memory range, in which a data structure is stored, that is designated to be associated with tasks that will use multiple processing elements to generate a number of partial results, which will then be reduced to a combined result before being stored to the data structure. In other embodiments, the task may be indicated in other ways); receiving the partial results in the aperture ([0023], lines 20-25, in one embodiment, the memory controller manages the partial results by storing them in a buffer before reducing the partial results to a combined result, such as a summation of the partial results, product of the partial results, or other operation) from multiple processing units operating in parallel ([0001], lines 3-4, performing a number of computations in a number of processing elements in parallel; claim 9, lines 4-5, a plurality of processing elements to generate a plurality of partial results of the instruction in parallel; [0018], lines 3-4, processing elements 301, 303, 305, 307); processing the partial results to generate final results ([0023], lines 20-25, in one embodiment, the memory controller manages the partial results by storing them in a buffer before reducing the partial results to a combined result, such as a summation of the partial results, product of the partial results, or other operation; [0019], lines 5-13, in one embodiment, the shared memory and/or main memory contains a data structure 317 that will store a final result of a combined set of calculations from each of the processing elements. For example, in one embodiment, the data structure will contain a sum of a plurality of numbers computed by each of the processing elements, whereas in other embodiments, the data structure may contain a product, or some other mathematical reduction of numbers computed by each of the processing elements; [0020], lines 6-17, the individual processing elements may store a result of the partial calculation performed by each processing element in their respective local copies of the data structure and then may send the partial results concurrently to the shared L2 cache or to the memory controller, in which the partial results can be combined and stored in main memory concurrently, instead of serially as in the prior art. In one embodiment of the invention, the memory controller contains logic to combine the results of the computations produced by the processing elements and store the combined result into the original data structure within the main memory).
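For illustration only, the kind of parallel reduction that Kim is cited for above can be sketched as follows. This is a minimal Python sketch and not Kim's implementation: the names partial_sum and reduce_in_parallel, the use of a thread pool to stand in for the processing elements, and the choice of summation as the reduction are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    # Hypothetical stand-in for one processing element computing a partial result.
    return sum(chunk)


def reduce_in_parallel(data, num_elements=4):
    # Split the input into one strided chunk per (assumed) processing element.
    chunks = [data[i::num_elements] for i in range(num_elements)]
    with ThreadPoolExecutor(max_workers=num_elements) as pool:
        # The partial results are collected in a buffer before being reduced.
        buffer = list(pool.map(partial_sum, chunks))
    # Reduce the buffered partial results to a single combined result (a sum here).
    return buffer, sum(buffer)


buffer, combined = reduce_in_parallel(list(range(1, 101)))
print(combined)  # 5050
```

The sketch buffers one partial result per worker and only then combines them, which mirrors the cited description of storing partial results in a buffer before reducing them to a combined result.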
In addition, to any extent to which Kim does not implicitly disclose that the aperture is a memory address range for writing operands into, Norrie explicitly discloses a memory address range in general, as well as an aperture being a memory address range for writing operands into in particular ([0079], lines 14-17, the address locations in memory cells of the shared memory 104 can be used to write (store) results of computes that occur at different components of system 100; [0080], lines 1-5, the system 100 includes an operator/accumulator unit 320 (“operator 320”) that is (or can be) coupled to the shared memory 104. The operator 320 is configured to accumulate values based on one or more arithmetic operations). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Norrie with the invention of Kim to facilitate data storage. Alternatively, this modification merely entails combining prior art elements (the cited prior art elements of Kim, and the well-known concept of a memory address range as explicitly disclosed by Norrie) according to known methods (Examiner submits that memory addressing in general was known to one of ordinary skill in the art before the effective filing date of the claimed invention, as also evidenced by Norrie) to yield predictable results (the invention of Kim, wherein the aperture comprises, in particular, a memory address range, for writing operands into), which is a rationale that may support a conclusion of obviousness as per MPEP 2143.
However, the combination thus far does not entail closing the aperture per a close aperture command, wherein after the aperture is closed, the multiple processing units are not permitted to write into the aperture.
On the other hand, Zbiciak discloses, in addition to an open command ([0198], lines 3-4, A STROPEN instruction opens a stream), closing per a close command ([0205], lines 1-2, a STRCLOSE instruction closes a stream in active state 3602), wherein after closing, not permitting data to be written from memory to be operated on ([0205], lines 7-12, a STRCLOSE instruction ends the corresponding stream immediately, allowing a program to prematurely terminate one or both streams. A STRCLOSE instruction does not stall processor 100, and does not wait for the stream to finish sending requests).
Zbiciak’s teaching allows a program to specifically state it no longer needs resources, and clears state (Zbiciak, [0207], lines 1-8), which facilitates multitasking by enabling other tasks to use those resources and increases security by preventing other tasks from seeing state. Zbiciak’s teaching also enables a known state to be reached (Zbiciak, [0206], lines 6-9).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Zbiciak with the combination of Kim and Norrie in order to facilitate reuse of resources, increase security, and enable a known state to be reached. Note that Zbiciak’s teaching of an open command and a close command to respectively permit and not permit data to be written from memory to be operated on, when applied to the combination of Kim and Norrie wherein data to be operated on is written from multiple processing units to a buffer, results in the overall claimed limitation of closing the aperture per a close aperture command, wherein after the aperture is closed, the multiple processing units are not permitted to write into the aperture. Alternatively, this modification merely entails combining prior art elements (the prior art elements cited above, including an aperture, of the combination of Kim and Norrie, and Zbiciak’s teaching of an open and close command) according to known methods (Examiner submits the general concept of opening and closing to regulate data flow is known, such as opening and closing network ports) to yield predictable results (the combination of Kim and Norrie, wherein writing to an aperture is or is not permitted based on an open aperture command and a close aperture command respectively), which is an example of a rationale that may support a conclusion of obviousness as per MPEP 2143.
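For illustration only, the behavior attributed to the overall combination above can be sketched as follows. This is a minimal Python sketch under stated assumptions, and it is not the claimed implementation or the implementation of any cited reference: the Aperture class, its open, write, reduce, and close methods, the ApertureClosedError exception, the base address 0x1000, and summation as the processing step are all hypothetical.

```python
import threading


class ApertureClosedError(Exception):
    """Raised when a write targets an aperture that is not open (hypothetical)."""


class Aperture:
    """Hypothetical aperture: the address range [base, base + size)."""

    def __init__(self, base, size):
        self.base = base
        self.size = size
        self._open = False
        self._slots = {}
        self._lock = threading.Lock()

    def open(self):
        # "Open aperture command": writes into the range are permitted from now on.
        with self._lock:
            self._slots.clear()
            self._open = True

    def write(self, address, operand):
        with self._lock:
            if not self._open:
                raise ApertureClosedError("aperture is closed")
            if not (self.base <= address < self.base + self.size):
                raise ValueError("address outside the aperture range")
            self._slots[address] = operand

    def reduce(self):
        # Process the received partial results into a final result (a sum here).
        with self._lock:
            return sum(self._slots.values())

    def close(self):
        # "Close aperture command": later writes are rejected.
        with self._lock:
            self._open = False


aperture = Aperture(base=0x1000, size=4)
aperture.open()

# Four threads stand in for processing units writing partial results in parallel.
workers = [
    threading.Thread(target=aperture.write, args=(0x1000 + i, (i + 1) * 10))
    for i in range(4)
]
for w in workers:
    w.start()
for w in workers:
    w.join()

final = aperture.reduce()  # 10 + 20 + 30 + 40
aperture.close()

try:
    aperture.write(0x1000, 1)
    rejected = False
except ApertureClosedError:
    rejected = True

print(final, rejected)  # 100 True
```

In this sketch the open and close commands simply toggle a flag that gates writes into the address range, so a write attempted after the close command is refused rather than silently stored.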
Consider claim 2, the overall combination entails the method of claim 1 (see above), wherein the aperture comprises a memory address into which the partial results are written (Kim, [0023], lines 20-25, in one embodiment, the memory controller manages the partial results by storing them in a buffer before reducing the partial results to a combined result, such as a summation of the partial results, product of the partial results, or other operation; Norrie, [0079], lines 14-17, the address locations in memory cells of the shared memory 104 can be used to write (store) results of computes that occur at different components of system 100; [0080], lines 1-5, the system 100 includes an operator/accumulator unit 320 (“operator 320”) that is (or can be) coupled to the shared memory 104. The operator 320 is configured to accumulate values based on one or more arithmetic operations).
Consider claim 4, the overall combination entails the method of claim 1 (see above), wherein opening the aperture is performed in response to an open aperture command (Kim, [0022], lines 3-15, at operation 401, a task is detected that will use multiple processing elements to compute a number of partial results that will be reduced into a combined result. In one embodiment, the task is indicated by a particular type of instruction within a program being executed by a processing system in which one embodiment is used. In another embodiment, the task is detected by an access by an instruction to a particular memory range, in which a data structure is stored, that is designated to be associated with tasks that will use multiple processing elements to generate a number of partial results, which will then be reduced to a combined result before being stored to the data structure. In other embodiments, the task may be indicated in other ways).
Consider claim 9, the overall combination entails the method of claim 1 (see above), wherein the partial results are generated in parallel (Kim, [0021], lines 9-10, moreover, embodiments of the invention may concurrently generate partial results).
Consider claim 10, Kim discloses a system comprising: a memory configured to store data for an aperture ([0018], line 5, local memories; [0023], line 22, buffer); and a hardware processor ([0018], lines 5-9, processor) configured to: open the aperture for processing partial results per an open aperture command, wherein the aperture is a memory address range for writing operands into ([0023], lines 20-25, in one embodiment, the memory controller manages the partial results by storing them in a buffer before reducing the partial results to a combined result, such as a summation of the partial results, product of the partial results, or other operation; regarding the open aperture command, see [0022], lines 3-15, at operation 401, a task is detected that will use multiple processing elements to compute a number of partial results that will be reduced into a combined result. In one embodiment, the task is indicated by a particular type of instruction within a program being executed by a processing system in which one embodiment is used. In another embodiment, the task is detected by an access by an instruction to a particular memory range, in which a data structure is stored, that is designated to be associated with tasks that will use multiple processing elements to generate a number of partial results, which will then be reduced to a combined result before being stored to the data structure. In other embodiments, the task may be indicated in other ways); receiving the partial results in the aperture ([0023], lines 20-25, in one embodiment, the memory controller manages the partial results by storing them in a buffer before reducing the partial results to a combined result, such as a summation of the partial results, product of the partial results, or other operation) from multiple processing units of the hardware processor operating in parallel ([0001], lines 3-4, performing a number of computations in a number of processing elements in parallel; claim 9, lines 4-5, a plurality of processing elements to generate a plurality of partial results of the instruction in parallel; [0018], lines 3-4, processing elements 301, 303, 305, 307); processing the partial results to generate final results ([0023], lines 20-25, in one embodiment, the memory controller manages the partial results by storing them in a buffer before reducing the partial results to a combined result, such as a summation of the partial results, product of the partial results, or other operation; [0019], lines 5-13, in one embodiment, the shared memory and/or main memory contains a data structure 317 that will store a final result of a combined set of calculations from each of the processing elements. For example, in one embodiment, the data structure will contain a sum of a plurality of numbers computed by each of the processing elements, whereas in other embodiments, the data structure may contain a product, or some other mathematical reduction of numbers computed by each of the processing elements; [0020], lines 6-17, the individual processing elements may store a result of the partial calculation performed by each processing element in their respective local copies of the data structure and then may send the partial results concurrently to the shared L2 cache or to the memory controller, in which the partial results can be combined and stored in main memory concurrently, instead of serially as in the prior art. In one embodiment of the invention, the memory controller contains logic to combine the results of the computations produced by the processing elements and store the combined result into the original data structure within the main memory).
In addition, to any extent to which Kim does not implicitly disclose that the aperture is a memory address range for writing operands into, Norrie explicitly discloses a memory address range in general, as well as an aperture being a memory address range for writing operands into in particular ([0079], lines 14-17, the address locations in memory cells of the shared memory 104 can be used to write (store) results of computes that occur at different components of system 100; [0080], lines 1-5, the system 100 includes an operator/accumulator unit 320 (“operator 320”) that is (or can be) coupled to the shared memory 104. The operator 320 is configured to accumulate values based on one or more arithmetic operations). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Norrie with the invention of Kim to facilitate data storage. Alternatively, this modification merely entails combining prior art elements (the cited prior art elements of Kim, and the well-known concept of a memory address range as explicitly disclosed by Norrie) according to known methods (Examiner submits that memory addressing in general was known to one of ordinary skill in the art before the effective filing date of the claimed invention, as also evidenced by Norrie) to yield predictable results (the invention of Kim, wherein the aperture comprises, in particular, a memory address range, for writing operands into), which is a rationale that may support a conclusion of obviousness as per MPEP 2143.
However, the combination thus far does not entail closing the aperture per a close aperture command, wherein after the aperture is closed, the multiple processing units are not permitted to write into the aperture.
On the other hand, Zbiciak discloses, in addition to an open command ([0198], lines 3-4, A STROPEN instruction opens a stream), closing per a close command ([0205], lines 1-2, a STRCLOSE instruction closes a stream in active state 3602), wherein after closing, not permitting data to be written from memory to be operated on ([0205], lines 7-12, a STRCLOSE instruction ends the corresponding stream immediately, allowing a program to prematurely terminate one or both streams. A STRCLOSE instruction does not stall processor 100, and does not wait for the stream to finish sending requests).
Zbiciak’s teaching allows a program to specifically state it no longer needs resources, and clears state (Zbiciak, [0207], lines 1-8), which facilitates multitasking by enabling other tasks to use those resources and increases security by preventing other tasks from seeing state. Zbiciak’s teaching also enables a known state to be reached (Zbiciak, [0206], lines 6-9).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Zbiciak with the combination of Kim and Norrie in order to facilitate reuse of resources, increase security, and enable a known state to be reached. Note that Zbiciak’s teaching of an open command and a close command to respectively permit and not permit data to be written from memory to be operated on, when applied to the combination of Kim and Norrie wherein data to be operated on is written from multiple processing units to a buffer, results in the overall claimed limitation of closing the aperture per a close aperture command, wherein after the aperture is closed, the multiple processing units are not permitted to write into the aperture. Alternatively, this modification merely entails combining prior art elements (the prior art elements cited above, including an aperture, of the combination of Kim and Norrie, and Zbiciak’s teaching of an open and close command) according to known methods (Examiner submits the general concept of opening and closing to regulate data flow is known, such as opening and closing network ports) to yield predictable results (the combination of Kim and Norrie, wherein writing to an aperture is or is not permitted based on an open aperture command and a close aperture command respectively), which is an example of a rationale that may support a conclusion of obviousness as per MPEP 2143.
Consider claim 11, the overall combination entails the system of claim 10 (see above), wherein the aperture comprises a memory address into which the partial results are written (Kim, [0023], lines 20-25, in one embodiment, the memory controller manages the partial results by storing them in a buffer before reducing the partial results to a combined result, such as a summation of the partial results, product of the partial results, or other operation; Norrie, [0079], lines 14-17, the address locations in memory cells of the shared memory 104 can be used to write (store) results of computes that occur at different components of system 100; [0080], lines 1-5, the system 100 includes an operator/accumulator unit 320 (“operator 320”) that is (or can be) coupled to the shared memory 104. The operator 320 is configured to accumulate values based on one or more arithmetic operations).
Consider claim 13, the overall combination entails the system of claim 10 (see above), wherein opening the aperture is performed in response to an open aperture command (Kim, [0022], lines 3-15, at operation 401, a task is detected that will use multiple processing elements to compute a number of partial results that will be reduced into a combined result. In one embodiment, the task is indicated by a particular type of instruction within a program being executed by a processing system in which one embodiment is used. In another embodiment, the task is detected by an access by an instruction to a particular memory range, in which a data structure is stored, that is designated to be associated with tasks that will use multiple processing elements to generate a number of partial results, which will then be reduced to a combined result before being stored to the data structure. In other embodiments, the task may be indicated in other ways).
Consider claim 18, the overall combination entails the system of claim 10 (see above), wherein the partial results are generated in parallel (Kim, [0021], lines 9-10, moreover, embodiments of the invention may concurrently generate partial results).
Consider claim 19, Kim discloses a non-transitory computer-readable medium storing instructions that, when executed, cause a processor to perform operations ([0032], lines 4-8, aspects may be implemented using instructions stored on a machine-readable medium (software), which if executed by a processor, would cause the processor to perform a method to carry out embodiments of the invention) comprising: opening an aperture for processing partial results per an open aperture command, wherein the aperture is a memory address range for writing operands into ([0023], lines 20-25, in one embodiment, the memory controller manages the partial results by storing them in a buffer before reducing the partial results to a combined result, such as a summation of the partial results, product of the partial results, or other operation; regarding the open aperture command, see [0022], lines 3-15, at operation 401, a task is detected that will use multiple processing elements to compute a number of partial results that will be reduced into a combined result. In one embodiment, the task is indicated by a particular type of instruction within a program being executed by a processing system in which one embodiment is used. In another embodiment, the task is detected by an access by an instruction to a particular memory range, in which a data structure is stored, that is designated to be associated with tasks that will use multiple processing elements to generate a number of partial results, which will then be reduced to a combined result before being stored to the data structure. In other embodiments, the task may be indicated in other ways); receiving the partial results in the aperture ([0023], lines 20-25, in one embodiment, the memory controller manages the partial results by storing them in a buffer before reducing the partial results to a combined result, such as a summation of the partial results, product of the partial results, or other operation) from multiple processing units operating in parallel ([0001], lines 3-4, performing a number of computations in a number of processing elements in parallel; claim 9, lines 4-5, a plurality of processing elements to generate a plurality of partial results of the instruction in parallel; [0018], lines 3-4, processing elements 301, 303, 305, 307); processing the partial results to generate final results ([0023], lines 20-25, in one embodiment, the memory controller manages the partial results by storing them in a buffer before reducing the partial results to a combined result, such as a summation of the partial results, product of the partial results, or other operation; [0019], lines 5-13, in one embodiment, the shared memory and/or main memory contains a data structure 317 that will store a final result of a combined set of calculations from each of the processing elements. For example, in one embodiment, the data structure will contain a sum of a plurality of numbers computed by each of the processing elements, whereas in other embodiments, the data structure may contain a product, or some other mathematical reduction of numbers computed by each of the processing elements; [0020], lines 6-17, the individual processing elements may store a result of the partial calculation performed by each processing element in their respective local copies of the data structure and then may send the partial results concurrently to the shared L2 cache or to the memory controller, in which the partial results can be combined and stored in main memory concurrently, instead of serially as in the prior art. In one embodiment of the invention, the memory controller contains logic to combine the results of the computations produced by the processing elements and store the combined result into the original data structure within the main memory).
In addition, to any extent to which Kim does not implicitly disclose that the aperture is a memory address range for writing operands into, Norrie explicitly discloses a memory address range in general, as well as an aperture being a memory address range for writing operands into in particular ([0079], lines 14-17, the address locations in memory cells of the shared memory 104 can be used to write (store) results of computes that occur at different components of system 100; [0080], lines 1-5, the system 100 includes an operator/accumulator unit 320 (“operator 320”) that is (or can be) coupled to the shared memory 104. The operator 320 is configured to accumulate values based on one or more arithmetic operations). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Norrie with the invention of Kim to facilitate data storage. Alternatively, this modification merely entails combining prior art elements (the cited prior art elements of Kim, and the well-known concept of a memory address range as explicitly disclosed by Norrie) according to known methods (Examiner submits that memory addressing in general was known to one of ordinary skill in the art before the effective filing date of the claimed invention, as also evidenced by Norrie) to yield predictable results (the invention of Kim, wherein the aperture comprises, in particular, a memory address range, for writing operands into), which is a rationale that may support a conclusion of obviousness as per MPEP 2143.
However, the combination thus far does not entail closing the aperture per a close aperture command, wherein after the aperture is closed, the multiple processing units are not permitted to write into the aperture.
On the other hand, Zbiciak discloses, in addition to an open command ([0198], lines 3-4, A STROPEN instruction opens a stream), closing per a close command ([0205], lines 1-2, a STRCLOSE instruction closes a stream in active state 3602), wherein after closing, not permitting data to be written from memory to be operated on ([0205], lines 7-12, a STRCLOSE instruction ends the corresponding stream immediately, allowing a program to prematurely terminate one or both streams. A STRCLOSE instruction does not stall processor 100, and does not wait for the stream to finish sending requests).
Zbiciak’s teaching allows a program to specifically state it no longer needs resources, and clears state (Zbiciak, [0207], lines 1-8), which facilitates multitasking by enabling other tasks to use those resources and increases security by preventing other tasks from seeing state. Zbiciak’s teaching also enables a known state to be reached (Zbiciak, [0206], lines 6-9).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Zbiciak with the combination of Kim and Norrie in order to facilitate reuse of resources, increase security, and enable a known state to be reached. Note that Zbiciak’s teaching of an open command and a close command to respectively permit and not permit data to be written from memory to be operated on, when applied to the combination of Kim and Norrie wherein data to be operated on is written from multiple processing units to a buffer, results in the overall claimed limitation of closing the aperture per a close aperture command, wherein after the aperture is closed, the multiple processing units are not permitted to write into the aperture. Alternatively, this modification merely entails combining prior art elements (the prior art elements cited above, including an aperture, of the combination of Kim and Norrie, and Zbiciak’s teaching of an open and close command) according to known methods (Examiner submits the general concept of opening and closing to regulate data flow is known, such as opening and closing network ports) to yield predictable results (the combination of Kim and Norrie, wherein writing to an aperture is or is not permitted based on an open aperture command and a close aperture command respectively), which is an example of a rationale that may support a conclusion of obviousness as per MPEP 2143.
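For illustration only, and not as a characterization of the actual implementation of Kim, Norrie, Zbiciak, or the claimed invention, the resulting open/close behavior can be sketched as follows. The class name `Aperture` and its methods are hypothetical; the sketch assumes only that an aperture is a memory address range that accepts writes between an open command and a close command.

```python
class Aperture:
    """Hypothetical model: an address range that accepts writes only while open."""

    def __init__(self, base, size):
        self.base = base          # first address of the range
        self.size = size          # number of addresses in the range
        self.is_open = False      # closed until an open command is received
        self.buffer = {}          # address -> list of operands written there

    def open(self):
        # Open aperture command: processing units may now write into the range.
        self.is_open = True

    def close(self):
        # Close aperture command: later writes are no longer permitted.
        self.is_open = False

    def write(self, address, value):
        # Reject the write if the aperture is closed or the address is out of range.
        if not self.is_open:
            return False
        if not (self.base <= address < self.base + self.size):
            return False
        self.buffer.setdefault(address, []).append(value)
        return True
```

In this sketch, a write attempted before `open()` or after `close()` returns `False` and leaves the buffer unchanged, which corresponds to the multiple processing units not being permitted to write into the aperture after it is closed.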
Consider claim 20, the overall combination entails the non-transitory computer-readable medium of claim 19 (see above), wherein the aperture comprises a memory address into which the partial results are written (Kim, [0023], lines 20-25, in one embodiment, the memory controller manages the partial results by storing them in a buffer before reducing the partial results to a combined result, such as a summation of the partial results, product of the partial results, or other operation; Norrie, [0079], lines 14-17, the address locations in memory cells of the shared memory 104 can be used to write (store) results of computes that occur at different components of system 100; [0080], lines 1-5, the system 100 includes an operator/accumulator unit 320 (“operator 320”) that is (or can be) coupled to the shared memory 104. The operator 320 is configured to accumulate values based on one or more arithmetic operations).
Claims 3 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Kim, Norrie, and Zbiciak as applied to claims 1 and 10 above, and further in view of Park (US 20180315153 A1).
Consider claim 3, the combination thus far discloses the method of claim 1 (see above), but does not disclose receiving a first portion of the partial results occurs concurrently with processing a second portion of the partial results.
On the other hand, Park discloses receiving a first portion of partial results occurs concurrently with processing a second portion of the partial results ([0139], lines 1-22, the execution clusters 810 and 812 are programmable circuits that performs computation operations. For this purpose, the execution clusters 810 and 812 may include the multiplier circuits FE0 through FEN, a compressor 1010 and a multi-cycle accumulator 1014. Each of the multiplier circuits FE0 through FEN may store a pixel value in the read data 1008 and a corresponding filter element value in the kernel memory 808. The pixel value and the corresponding filter element value are multiplied in the multiplier circuit to generate a multiplied value 1009. In some embodiments, the compressor 1010 receives the multiplied values 1009 and accumulates subsets of multiplied values 1009 to generate compressed values 1012. In other embodiments, instead of accumulating the subsets of multiplied values 1009, the compressor 1010 may select (i) a minimum value, (ii) a maximum value, or (iii) a median value from each subset of multiplied values 1009. The multi-cycle accumulator 1014 receives the compressed values 1012 and performs accumulation (or selection of a minimum value, a maximum value or a media value) on the compressed values 1012 generated across multiple processing cycles of the convolution core 802).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Park with the combination of Kim, Norrie, and Zbiciak in order to increase system performance via concurrency. Alternatively, this modification merely entails combining prior art elements (the cited prior art elements of Kim, Norrie, and Zbiciak, as well as Kim’s disclosure that in one embodiment, each processing element may return its respective partial result as soon as it has been computed rather than waiting for other processing elements to complete their respective computations, in [0023], lines 14-17, and the accumulation across multiple processing cycles as disclosed by Park) according to known methods (Examiner submits that the concept of performing a running calculation was known to one of ordinary skill in the art before the effective filing date of the claimed invention) to yield predictable results (the combination of Kim, Norrie, and Zbiciak, wherein receiving a first portion of the partial results occurs concurrently with processing a second portion of the partial results), which is a rationale that may support a conclusion of obviousness as per MPEP 2143.
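For illustration only, the concept of a running calculation referenced above can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Kim or Park: the function name `running_reduce`, the use of a thread and a queue, and the sentinel object are all hypothetical choices made to show one portion of the partial results being processed while another portion is still being received.

```python
import queue
import threading


def running_reduce(partial_batches, op, initial):
    """Hypothetical sketch: fold partial results as they arrive on a queue."""
    q = queue.Queue()
    sentinel = object()  # marks the end of all partial results

    def producer():
        # Stands in for the processing units returning partial results
        # as soon as each one has been computed.
        for batch in partial_batches:
            for value in batch:
                q.put(value)
        q.put(sentinel)

    worker = threading.Thread(target=producer)
    worker.start()

    acc = initial
    while True:
        item = q.get()          # receive the next partial result
        if item is sentinel:
            break
        acc = op(acc, item)     # process it while later results may still be arriving
    worker.join()
    return acc
```

Because the consumer loop begins folding values before the producer has finished delivering them, receiving and processing can overlap in time; the final value does not depend on how the two interleave when the operator is applied in arrival order from a single producer.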
Consider claim 12, the combination thus far discloses the system of claim 10 (see above), but does not disclose receiving a first portion of the partial results occurs at least partially concurrently with processing a second portion of the partial results.
On the other hand, Park discloses receiving a first portion of partial results occurs at least partially concurrently with processing a second portion of the partial results ([0139], lines 1-22, the execution clusters 810 and 812 are programmable circuits that performs computation operations. For this purpose, the execution clusters 810 and 812 may include the multiplier circuits FE0 through FEN, a compressor 1010 and a multi-cycle accumulator 1014. Each of the multiplier circuits FE0 through FEN may store a pixel value in the read data 1008 and a corresponding filter element value in the kernel memory 808. The pixel value and the corresponding filter element value are multiplied in the multiplier circuit to generate a multiplied value 1009. In some embodiments, the compressor 1010 receives the multiplied values 1009 and accumulates subsets of multiplied values 1009 to generate compressed values 1012. In other embodiments, instead of accumulating the subsets of multiplied values 1009, the compressor 1010 may select (i) a minimum value, (ii) a maximum value, or (iii) a median value from each subset of multiplied values 1009. The multi-cycle accumulator 1014 receives the compressed values 1012 and performs accumulation (or selection of a minimum value, a maximum value or a media value) on the compressed values 1012 generated across multiple processing cycles of the convolution core 802).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Park with the combination of Kim, Norrie, and Zbiciak in order to increase system performance via concurrency. Alternatively, this modification merely entails combining prior art elements (the cited prior art elements of Kim, Norrie, and Zbiciak, as well as Kim’s disclosure that in one embodiment, each processing element may return its respective partial result as soon as it has been computed rather than waiting for other processing elements to complete their respective computations, in [0023], lines 14-17, and the accumulation across multiple processing cycles as disclosed by Park) according to known methods (Examiner submits that the concept of performing a running calculation was known to one of ordinary skill in the art before the effective filing date of the claimed invention) to yield predictable results (the combination of Kim, Norrie, and Zbiciak, wherein receiving a first portion of the partial results occurs at least partially concurrently with processing a second portion of the partial results), which is a rationale that may support a conclusion of obviousness as per MPEP 2143.
Claims 5-8 and 14-17 are rejected under 35 U.S.C. 103 as being unpatentable over Kim, Norrie, and Zbiciak as applied to claims 4 and 13 above, and further in view of Sprangle et al. (Sprangle) (US 20090172349 A1).
Consider claim 5, the combination thus far entails the method of claim 4, wherein the open aperture command specifies an address of an output buffer (Kim, [0022], lines 3-15, at operation 401, a task is detected that will use multiple processing elements to compute a number of partial results that will be reduced into a combined result. In one embodiment, the task is indicated by a particular type of instruction within a program being executed by a processing system in which one embodiment is used. In another embodiment, the task is detected by an access by an instruction to a particular memory range, in which a data structure is stored, that is designated to be associated with tasks that will use multiple processing elements to generate a number of partial results, which will then be reduced to a combined result before being stored to the data structure. In other embodiments, the task may be indicated in other ways), and an operator ([0022], lines 3-15, at operation 401, a task is detected that will use multiple processing elements to compute a number of partial results that will be reduced into a combined result. In one embodiment, the task is indicated by a particular type of instruction within a program being executed by a processing system in which one embodiment is used. In another embodiment, the task is detected by an access by an instruction to a particular memory range, in which a data structure is stored, that is designated to be associated with tasks that will use multiple processing elements to generate a number of partial results, which will then be reduced to a combined result before being stored to the data structure. In other embodiments, the task may be indicated in other ways; [0019], lines 5-13, in one embodiment, the shared memory and/or main memory contains a data structure 317 that will store a final result of a combined set of calculations from each of the processing elements. For example, in one embodiment, the data structure will contain a sum of a plurality of numbers computed by each of the processing elements, whereas in other embodiments, the data structure may contain a product, or some other mathematical reduction of numbers computed by each of the processing elements).
However, the combination thus far does not entail that the open aperture command specifies an input data type and an output data type.
On the other hand, Sprangle discloses a command that specifies an input data type and an output data type (FIG. 7, which shows the command specifying an addition entailing input data type float16 and output data type F32 (float 32)). (Note that Sprangle also discloses an address of an output buffer, e.g., the destination argument 84, 0b1000, in FIG. 8, and an operator, e.g., opcode 172 in FIG. 7.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Sprangle with the combination of Kim, Norrie, and Zbiciak in order to increase processor capability by supporting different data types. Alternatively, this modification merely entails combining prior art elements (the cited prior art elements of Kim, Norrie, and Zbiciak, and Sprangle’s teaching of a command specifying an input data type and an output data type) according to known methods (Examiner submits that the concept of an input data type and an output data type was known to one of ordinary skill in the art before the effective filing date of the claimed invention) to yield predictable results (the combination of Kim, Norrie, and Zbiciak, entailing the command specifying an input data type and an output data type), which is a rationale that may support a conclusion of obviousness as per MPEP 2143.
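For illustration only, a command that carries an output buffer address, an operator, an input data type, and an output data type can be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce the instruction encoding of Sprangle or the command format of the claims: the names `OpenCommand` and `apply_command`, the use of a dictionary as memory, and the use of Python `struct` format codes (`e` for 16-bit float, `f` for 32-bit float) to stand in for data types are all hypothetical.

```python
import struct
from dataclasses import dataclass
from functools import reduce
from typing import Callable


@dataclass(frozen=True)
class OpenCommand:
    """Hypothetical command fields; not an actual instruction encoding."""
    output_address: int                        # where the final result is stored
    operator: Callable[[float, float], float]  # reduction applied to the inputs
    input_format: str                          # struct code, e.g. "e" = 16-bit float
    output_format: str                         # struct code, e.g. "f" = 32-bit float


def apply_command(cmd, raw_inputs, memory):
    # Decode each raw input according to the input data type named in the command.
    values = [struct.unpack("<" + cmd.input_format, raw)[0] for raw in raw_inputs]
    # Apply the operator across the decoded inputs (requires at least one input).
    result = reduce(cmd.operator, values)
    # Encode the result in the output data type and store it at the output address.
    memory[cmd.output_address] = struct.pack("<" + cmd.output_format, result)
    return result
```

In this sketch the inputs are interpreted in one data type and the result is stored in another, analogous to an addition with 16-bit floating point inputs and a 32-bit floating point output.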
Consider claim 6, the overall combination entails the method of claim 5 (see above), wherein the processing comprises storing the final results in the output buffer (Kim, [0019], lines 5-13, in one embodiment, the shared memory and/or main memory contains a data structure 317 that will store a final result of a combined set of calculations from each of the processing elements. For example, in one embodiment, the data structure will contain a sum of a plurality of numbers computed by each of the processing elements, whereas in other embodiments, the data structure may contain a product, or some other mathematical reduction of numbers computed by each of the processing elements; [0020], lines 6-17, the individual processing elements may store a result of the partial calculation performed by each processing element in their respective local copies of the data structure and then may send the partial results concurrently to the shared L2 cache or to the memory controller, in which the partial results can be combined and stored in main memory concurrently, instead of serially as in the prior art. In one embodiment of the invention, the memory controller contains logic to combine the results of the computations produced by the processing elements and store the combined result into the original data structure within the main memory).
Consider claim 7, the overall combination entails the method of claim 5 (see above), wherein the operator specifies a fixed operation or a programmatically defined operation (Kim, [0019], lines 5-13, in one embodiment, the shared memory and/or main memory contains a data structure 317 that will store a final result of a combined set of calculations from each of the processing elements. For example, in one embodiment, the data structure will contain a sum of a plurality of numbers computed by each of the processing elements, whereas in other embodiments, the data structure may contain a product, or some other mathematical reduction of numbers computed by each of the processing elements; [0020], lines 6-17, the individual processing elements may store a result of the partial calculation performed by each processing element in their respective local copies of the data structure and then may send the partial results concurrently to the shared L2 cache or to the memory controller, in which the partial results can be combined and stored in main memory concurrently, instead of serially as in the prior art. In one embodiment of the invention, the memory controller contains logic to combine the results of the computations produced by the processing elements and store the combined result into the original data structure within the main memory; [0022], lines 3-15, at operation 401, a task is detected that will use multiple processing elements to compute a number of partial results that will be reduced into a combined result. In one embodiment, the task is indicated by a particular type of instruction within a program being executed by a processing system in which one embodiment is used. In another embodiment, the task is detected by an access by an instruction to a particular memory range, in which a data structure is stored, that is designated to be associated with tasks that will use multiple processing elements to generate a number of partial results, which will then be reduced to a combined result before being stored to the data structure. In other embodiments, the task may be indicated in other ways).
Consider claim 8, the overall combination entails the method of claim 5 (see above), wherein the processing comprises applying the operator to the partial results to generate the final results (Kim, [0019], lines 5-13, in one embodiment, the shared memory and/or main memory contains a data structure 317 that will store a final result of a combined set of calculations from each of the processing elements. For example, in one embodiment, the data structure will contain a sum of a plurality of numbers computed by each of the processing elements, whereas in other embodiments, the data structure may contain a product, or some other mathematical reduction of numbers computed by each of the processing elements; [0020], lines 6-17, the individual processing elements may store a result of the partial calculation performed by each processing element in their respective local copies of the data structure and then may send the partial results concurrently to the shared L2 cache or to the memory controller, in which the partial results can be combined and stored in main memory concurrently, instead of serially as in the prior art. In one embodiment of the invention, the memory controller contains logic to combine the results of the computations produced by the processing elements and store the combined result into the original data structure within the main memory).
Consider claim 14, the combination thus far entails the system of claim 13, wherein the open aperture command specifies an address of an output buffer (Kim, [0022], lines 3-15, at operation 401, a task is detected that will use multiple processing elements to compute a number of partial results that will be reduced into a combined result. In one embodiment, the task is indicated by a particular type of instruction within a program being executed by a processing system in which one embodiment is used. In another embodiment, the task is detected by an access by an instruction to a particular memory range, in which a data structure is stored, that is designated to be associated with tasks that will use multiple processing elements to generate a number of partial results, which will then be reduced to a combined result before being stored to the data structure. In other embodiments, the task may be indicated in other ways), and an operator ([0022], lines 3-15, at operation 401, a task is detected that will use multiple processing elements to compute a number of partial results that will be reduced into a combined result. In one embodiment, the task is indicated by a particular type of instruction within a program being executed by a processing system in which one embodiment is used. In another embodiment, the task is detected by an access by an instruction to a particular memory range, in which a data structure is stored, that is designated to be associated with tasks that will use multiple processing elements to generate a number of partial results, which will then be reduced to a combined result before being stored to the data structure. In other embodiments, the task may be indicated in other ways; [0019], lines 5-13, in one embodiment, the shared memory and/or main memory contains a data structure 317 that will store a final result of a combined set of calculations from each of the processing elements. For example, in one embodiment, the data structure will contain a sum of a plurality of numbers computed by each of the processing elements, whereas in other embodiments, the data structure may contain a product, or some other mathematical reduction of numbers computed by each of the processing elements).
However, the combination thus far does not entail that the open aperture command specifies an input data type and an output data type.
On the other hand, Sprangle discloses a command that specifies an input data type and an output data type (FIG. 7, which shows the command specifying an addition entailing input data type float16 and output data type F32 (float 32)). (Note that Sprangle also discloses an address of an output buffer, e.g., the destination argument 84, 0b1000, in FIG. 8, and an operator, e.g., opcode 172 in FIG. 7.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Sprangle with the combination of Kim, Norrie, and Zbiciak in order to increase processor capability by supporting different data types. Alternatively, this modification merely entails combining prior art elements (the cited prior art elements of Kim, Norrie, and Zbiciak, and Sprangle’s teaching of a command specifying an input data type and an output data type) according to known methods (Examiner submits that the concept of an input data type and an output data type was known to one of ordinary skill in the art before the effective filing date of the claimed invention) to yield predictable results (the combination of Kim, Norrie, and Zbiciak, entailing the command specifying an input data type and an output data type), which is a rationale that may support a conclusion of obviousness as per MPEP 2143.
Consider claim 15, the overall combination entails the system of claim 14 (see above), wherein the processing comprises storing the final results in the output buffer (Kim, [0019], lines 5-13, in one embodiment, the shared memory and/or main memory contains a data structure 317 that will store a final result of a combined set of calculations from each of the processing elements. For example, in one embodiment, the data structure will contain a sum of a plurality of numbers computed by each of the processing elements, whereas in other embodiments, the data structure may contain a product, or some other mathematical reduction of numbers computed by each of the processing elements; [0020], lines 6-17, the individual processing elements may store a result of the partial calculation performed by each processing element in their respective local copies of the data structure and then may send the partial results concurrently to the shared L2 cache or to the memory controller, in which the partial results can be combined and stored in main memory concurrently, instead of serially as in the prior art. In one embodiment of the invention, the memory controller contains logic to combine the results of the computations produced by the processing elements and store the combined result into the original data structure within the main memory).
Consider claim 16, the overall combination entails the system of claim 14 (see above), wherein the operator specifies a fixed operation or a programmatically defined operation (Kim, [0019], lines 5-13, in one embodiment, the shared memory and/or main memory contains a data structure 317 that will store a final result of a combined set of calculations from each of the processing elements. For example, in one embodiment, the data structure will contain a sum of a plurality of numbers computed by each of the processing elements, whereas in other embodiments, the data structure may contain a product, or some other mathematical reduction of numbers computed by each of the processing elements; [0020], lines 6-17, the individual processing elements may store a result of the partial calculation performed by each processing element in their respective local copies of the data structure and then may send the partial results concurrently to the shared L2 cache or to the memory controller, in which the partial results can be combined and stored in main memory concurrently, instead of serially as in the prior art. In one embodiment of the invention, the memory controller contains logic to combine the results of the computations produced by the processing elements and store the combined result into the original data structure within the main memory; [0022], lines 3-15, at operation 401, a task is detected that will use multiple processing elements to compute a number of partial results that will be reduced into a combined result. In one embodiment, the task is indicated by a particular type of instruction within a program being executed by a processing system in which one embodiment is used. In another embodiment, the task is detected by an access by an instruction to a particular memory range, in which a data structure is stored, that is designated to be associated with tasks that will use multiple processing elements to generate a number of partial results, which will then be reduced to a combined result before being stored to the data structure. In other embodiments, the task may be indicated in other ways).
Consider claim 17, the overall combination entails the system of claim 14 (see above), wherein the processing comprises applying the operator to the partial results to generate the final results (Kim, [0019], lines 5-13, in one embodiment, the shared memory and/or main memory contains a data structure 317 that will store a final result of a combined set of calculations from each of the processing elements. For example, in one embodiment, the data structure will contain a sum of a plurality of numbers computed by each of the processing elements, whereas in other embodiments, the data structure may contain a product, or some other mathematical reduction of numbers computed by each of the processing elements; [0020], lines 6-17, the individual processing elements may store a result of the partial calculation performed by each processing element in their respective local copies of the data structure and then may send the partial results concurrently to the shared L2 cache or to the memory controller, in which the partial results can be combined and stored in main memory concurrently, instead of serially as in the prior art. In one embodiment of the invention, the memory controller contains logic to combine the results of the computations produced by the processing elements and store the combined result into the original data structure within the main memory).
Response to Arguments
Applicant on page 8 argues: "The Examiner objected to the specification because the abstract is unclear. The abstract has been amended in response. Withdrawal of the objection to the specification is therefore respectfully requested."
In view of the aforementioned amended abstract, the previously presented objection to the specification is withdrawn.
Applicant on page 8 argues: ‘Claims 1-9 stand rejected under 35 U.S.C. 112, 1st paragraph. Claims 1-9 stand rejected under 35 U.S.C. 112, 2nd paragraph. These rejections are both related to the use of the open-ended term "comprises" in claim 1. This word has changed to "is" and thus Applicants request withdrawal of the rejections under 35 USC 112.’
In view of the aforementioned amendments, the previously presented rejections under 35 USC 112(a) and (b) are withdrawn.
Applicant on page 9 argues: "Claims 1, 10, and 19 have been amended to recite ... For the foregoing reasons, Applicant submits that the cited references do not teach each and every feature of claims 1, 10, and 19. Thus, Applicant requests withdrawal of the rejections of claims 1, 10, and 19 and all claims dependent thereon."
In view of the aforementioned amendment, Examiner is newly relying upon the Zbiciak reference — see the Claim Rejections - 35 USC § 103 section above.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEITH E VICARY whose telephone number is (571)270-1314. The examiner can normally be reached Monday to Friday, 9:00 AM to 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached at (571)270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/KEITH E VICARY/ Primary Examiner, Art Unit 2183