DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1 – 18 are pending.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 1 – 5 and 11 – 14 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 4 – 7, 11 and 14 – 16 of U.S. Patent No. 12,189,527. Although the claims at issue are not identical, they are not patentably distinct from each other because the claims at issue are anticipated by the cited claims of U.S. Patent No. 12,189,527, as mapped in the chart below.
Instant Application:

1. A method of managing a unified virtual memory (UVM) that is backed by a main processor memory and a coprocessor memory, the method comprising:
in a forward propagation stage of a deep learning model, executing a first layer of the deep learning model to determine a second data block of the UVM storing an output of the first layer;
prefetching the second data block into the main processor memory;
based on the second data block being prefetched into the main processor memory, in a backward propagation stage of the deep learning model, prefetching the second data block from the main processor memory into the coprocessor memory; and
updating a parameter of the first layer using the second data block.

Patent 12,189,527:

1. A method of managing a unified virtual memory (UVM) that is backed by a main processor memory and a coprocessor memory, the method comprising: (body quoted in the next row)
4. The method of claim 1, further comprising:
in a forward propagation stage of the deep learning model, executing a first layer of the deep learning model to determine a second data block of the UVM storing an output of the first layer;
prefetching the second data block into the main processor memory;
based on the second data block being prefetched into the main processor memory, in a backward propagation stage of the deep learning model, prefetching the second data block from the main processor memory into the coprocessor memory; and
updating a parameter of the first layer using the second data block.

Instant Application:

5. The method of claim 1, further comprising:
checking properties of data blocks of the UVM used to execute the deep learning model;
based on a first of the data blocks storing weight data of the deep learning model, selecting, between the main processor memory and the coprocessor memory, the main processor memory to store the first data block therein; and
performing an operation of the deep learning model based on the first data block using a coprocessor while directly loading at least a portion of the first data block from the main processor memory into a cache memory of the coprocessor without migration of the first data block from the main processor memory to the coprocessor memory.

Patent 12,189,527:

1. (claim 1, continued:)
checking properties of data blocks of the UVM used to execute a deep learning model;
based on a first of the data blocks storing weight data of the deep learning model, storing the first data block in the main processor memory among the main processor memory and the coprocessor memory; and
performing an operation of the deep learning model based on the first data block using a coprocessor while directly loading at least a portion of the first data block from the main processor memory into a cache memory of the coprocessor without migration of the first data block from the main processor memory to the coprocessor memory.

Instant Application:

11. An electronic device using a unified virtual memory (UVM) that is backed by a main processor memory and a coprocessor memory, the electronic device comprising:
one or more processors; and
a memory storing instructions configured to cause the one or more processors to:
in a forward propagation stage of a deep learning model, execute a first layer of the deep learning model to determine a second data block of the UVM storing an output of the first layer;
prefetch the second data block into the main processor memory;
based on the second data block being prefetched into the main processor memory, in a backward propagation stage of the deep learning model, prefetch the second data block from the main processor memory into the coprocessor memory; and
update a parameter of the first layer using the second data block.

Patent 12,189,527:

11. An electronic device, comprising:
one or more processors; and
a memory storing instructions configured to cause the one or more processors to:
check properties of data blocks used to execute a deep learning model, the data blocks comprising blocks of a unified virtual memory (UVM) backed by a main processor memory and by a coprocessor memory;
14. The electronic device of claim 11, wherein the instructions are further configured to cause the one or more processors to:
in a forward propagation stage of the deep learning model, execute a first layer of the deep learning model to determine a second data block of the UVM storing an output of the first layer;
prefetch the second data block into the main processor memory;
based on the second data block being prefetched into the main processor memory, in a backward propagation stage of the deep learning model, prefetch the second data block from the main processor memory into the coprocessor memory; and
update a parameter of the first layer using the second data block.

Instant Application:

14. The electronic device of claim 11, wherein the instructions are further configured to cause the one or more processors to:
check properties of data blocks used to execute the deep learning model, the data blocks comprising blocks of the UVM;
in response to a first of the data blocks storing weight data of the deep learning model, store the first data block in the main processor memory among the main processor memory and the coprocessor memory; and
perform an operation of the deep learning model based on the first data block using a coprocessor while directly loading at least a portion of the first data block from the main processor memory into a cache memory of the coprocessor, without migration of the first data block of the main processor memory to the coprocessor memory.

Patent 12,189,527:

11. An electronic device, comprising:
one or more processors; and
a memory storing instructions configured to cause the one or more processors to:
check properties of data blocks used to execute a deep learning model, the data blocks comprising blocks of a unified virtual memory (UVM) backed by a main processor memory and by a coprocessor memory;
in response to a first of the data blocks storing weight data of the deep learning model, store the first data block in the main processor memory among the main processor memory and the coprocessor memory; and
perform an operation of the deep learning model based on the first data block using a coprocessor while directly loading at least a portion of the first data block from the main processor memory into a cache memory of the coprocessor, without migration of the first data block of the main processor memory to the coprocessor memory.
Claims 2 – 4 map to claims 5 – 7 of Patent 12,189,527.
Claims 12 – 13 map to claims 15 – 16 of Patent 12,189,527.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 4 and 13 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Regarding claim 4, “the prefetching of the second data block is performed at least partly based on” is unclear and indefinite. Claim 1 recites two prefetching steps for said second data block: one into the main processor memory and one into the coprocessor memory. Therefore, the limitation in question is unclear as to which prefetching step is being referred to, rendering the claim indefinite. For the purposes of examination, the Examiner interprets this limitation as referring to the prefetching of the second data block into the main processor memory (see spec, Fig. 7, steps 703 to 707).
Claim 13 is the electronic device claim corresponding to method claim 4, and is rejected on the same grounds as claim 4.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1 – 4, 9, 11 – 13 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Jin (US 20200272907) in view of Hu (US 20200302304) and Rashid (US 20180314431).
Regarding claim 1, Jin teaches
A method of managing [a unified virtual memory (UVM) that is backed by] a main processor memory and a coprocessor memory, the method comprising:
in a forward propagation stage of a deep learning model, executing a first layer of the deep learning model to determine [a second data block of the UVM storing] an output of the first layer;
prefetching the [second data block] output into the main processor memory; (Jin teaches the GPU pre-storing (prefetching) a previous locally-input feature map (output) into host memory (main processor memory) (see ¶[14]), wherein i) said previous locally-input feature map is a result of a convolution operation in a forward propagation process (forward propagation stage) (see Fig. 5, ¶[67]) of a neural network (deep learning model) (see ¶[3]), and ii) said convolution operation is part of a convolution layer (first layer) (see ¶[6], [39]) of said neural network (deep learning model) (see ¶[37]).)
based on the [second data block] output being prefetched into the main processor memory, in a backward propagation stage of the deep learning model, prefetching the [second data block] output from the main processor memory into the coprocessor memory; and (Jin teaches that, during the forward-propagation process, data not required by the presently worked layer are offloaded to said host memory (main processor memory), and that during the back-propagation process (backward propagation stage) of said neural network (deep learning model), said data are pre-stored (prefetched) into the memory (coprocessor memory) of the GPU (coprocessor) (see ¶[65]). As such, said previous locally-input feature map (output) is pre-stored (prefetched) in said host memory when said previous locally-input feature map is not required, and during said back-propagation process (backward propagation stage), said previous locally-input feature map (output) is pre-stored (prefetched) into said memory (coprocessor memory) of said GPU (coprocessor). Note that said back-propagation process uses (based on) said previous locally-input feature map (output) that is pre-stored (prefetched) in said host memory (main processor memory).)
updating a parameter [of the first layer] using the output [second data block] (Jin teaches that, in said back-propagation process, intermediate data are used to amend (update) a weight (parameter) (see ¶[51]). Jin teaches that said previous locally-input feature map (output), a result of a convolution operation, is used to generate a complete output feature map (see Fig. 5, ¶[67]). Note that said previous locally-input feature map (output) is intermediate data. As such, said previous locally-input feature map (output) is used to amend (update) said weight (parameter).)
As noted for claim 1, Jin teaches updating a parameter (weight) using the output of the first layer, but does not appear to explicitly teach that said parameter/weight is of said first layer.
However, Hu teaches
updating a parameter of [the] first layer using the output [second data block] (Hu teaches that Y1, layer 1’s (first layer) output (output), is transferred into BP1(1) of backward propagation, which causes an increment (updating) to W1 (parameter), layer 1’s (first layer) weight (see Fig. 1 and corresponding paragraphs).)
In view of Hu, Jin is modified such that said first layer’s output is used to update parameter/weight of said first layer.
Jin and Hu are analogous art to the claimed invention because they are in the same field of endeavor, memory management.
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which said subject matter pertains to modify Jin in the manner described supra because it would minimize error as a function of weights of deep neural network (Hu, ¶[39]).
As noted for claim 1, Jin in view of Hu teach storing the output of the first layer in the main processor memory and the coprocessor memory, but do not appear to explicitly teach that said main processor memory and said coprocessor memory form a unified virtual memory (UVM) in which said output is stored and prefetched/moved as a second data block.
However, Rashid teaches using system memory (main processor memory) and PPU memory (coprocessor memory) to form a unified virtual memory system (UVM) (see Fig. 3A, ¶[40]), wherein i) data is stored in pages (second data block) in said system memory (see ¶[43]) and said PPU memory (see ¶[52]), and ii) said pages are copied/moved (see Rashid, Abstract).
In view of Rashid, modified Jin is further modified such that said main processor memory and said coprocessor memory form a unified virtual memory system (UVM) in which said output is stored and moved as pages (second data block) in said main processor memory and said coprocessor memory.
Jin, Hu and Rashid are analogous art to the claimed invention because they are in the same field of endeavor, storage management.
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which said subject matter pertains to further modify modified Jin in the manner described supra because it would allow the CPU/PPU to access a physical memory location using a common virtual memory address, regardless of whether said physical memory location is in the system memory or in memory local to said PPU (Rashid, ¶[26]).
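For technical context, the offloading and prefetching described by Jin, combined with the UVM paging of Rashid, correspond closely to the unified memory API of CUDA. The following sketch is provided for illustration only; it is not code from Jin, Hu, or Rashid or from the instant application, and the buffer name, size, and device id are hypothetical. It shows a layer output, held in a single UVM allocation, being prefetched into host memory during the forward stage and back into GPU memory for the backward stage:

    // Illustrative sketch only (not from any cited reference): the
    // forward/backward prefetch pattern expressed with CUDA unified memory,
    // which backs one virtual address space with host and GPU memory.
    #include <cuda_runtime.h>

    int main() {
        const size_t kOutputBytes = 4 << 20;  // hypothetical layer-output size
        const int kGpu = 0;                   // coprocessor device id
        float *layer_output = nullptr;

        // One UVM allocation, addressable by both the CPU and the GPU.
        cudaMallocManaged(&layer_output, kOutputBytes);

        // Forward stage: after the first layer writes its output on the GPU,
        // prefetch (offload) the block into main processor (host) memory.
        cudaMemPrefetchAsync(layer_output, kOutputBytes, cudaCpuDeviceId, 0);

        // Backward stage: prefetch the same block from host memory back into
        // coprocessor (GPU) memory before the parameter update that uses it.
        cudaMemPrefetchAsync(layer_output, kOutputBytes, kGpu, 0);

        cudaDeviceSynchronize();
        cudaFree(layer_output);
        return 0;
    }

In such a scheme, the second prefetch would be issued early enough to overlap preceding backward-propagation work, consistent with the pre-storing Jin describes (see ¶[65]).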
Claim 11 is the electronic device claim corresponding to method claim 1, and is rejected on the same grounds as claim 1.
Jin in view of Hu and Rashid teach the operations of claim 11. The claimed invention improves upon said operations by implementing them using an electronic device, as outlined below.
An electronic device using a unified virtual memory (UVM) that is backed by a main processor memory and a coprocessor memory, the electronic device comprising:
one or more processors; and
a memory storing instructions configured to cause the one or more processors to
This improvement to said electronic device is an application of a known technique from Rashid – implementing a method using computer program instructions executed by a processor. In particular, Rashid teaches
An electronic device using a unified virtual memory (UVM) that is backed by a main processor memory and a coprocessor memory, the electronic device comprising: (Rashid teaches unified virtual memory (UVM) system formed by system memory (main processor memory) and PPU memory (coprocessor memory) (see Fig. 3A, ¶[40]))
one or more processors; and
a memory storing instructions configured to cause the one or more processors to (Rashid teaches computer program instructions (instructions) which, executed on a processor (one or more processors) of a computer (electronic device), enable implementation of the disclosed functions/acts (see ¶[84]), wherein said computer program instructions are stored in a computer readable storage medium such as a disk (memory) (see ¶[83]) that is part of said computer (see Fig. 1).)
One of ordinary skill in the art would recognize that this known technique of using a computer to implement functions/acts can also be applied to implement the operations of claim 11, and the result would have been predictable. In this instance, the operations of claim 11 are implemented by using the computer’s processor to execute computer program instructions stored in the disk of said computer. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to recognize that applying Rashid’s known technique would have yielded i) the predictable result of the operations of claim 11 being implemented using a computer’s (electronic device) processor (one or more processors) to execute computer program instructions (instructions) stored in the disk (memory) of said computer, and ii) the improved claimed invention (see MPEP 2143(I)(D)).
Regarding claim 2, Jin in view of Hu and Rashid teach the method of claim 1 where Jin also teaches
determining whether to prefetch the second data block into the main processor memory based on an output prefetch condition (Jin teaches that when (determining whether) working on the present locally-input feature map (output prefetch condition), the previous locally-input feature map (output) is pre-stored (prefetched) in host memory (main processor memory) (see ¶[14]). Note that in claim 1, said previous locally-input feature map has been modified to be stored in pages (second data block) in said host memory.)
Regarding claim 3, Jin in view of Hu and Rashid teach the method of claim 2 where Jin also teaches
wherein the output prefetch condition comprises:
[a condition of a position of the first layer in the deep learning model] or a condition of usage of the coprocessor memory in a learning process of the deep learning model (Jin teaches that when working on the present locally-input feature map (output prefetch condition), the previous locally-input feature map is pre-stored (prefetched) in host memory (see ¶[14]), wherein said previous locally-input feature map is a result of a convolution operation in a forward propagation process (learning process) (see Fig. 5, ¶[67]) of a neural network (deep learning model) (see ¶[3]).)
Claim 12 is the electronic device claim corresponding to method claims 2 and 3, and is rejected on the same grounds as claims 2 and 3.
Regarding claim 4, Jin in view of Hu and Rashid teach the method of claim 1 where Jin also teaches
wherein the prefetching of the second data block is performed at least partly based on a prediction of an oversubscription condition occurring with respect to the coprocessor memory (112(b) interpretation: this prefetching refers to the prefetching of the second data block into the main processor memory.) (Jin teaches pre-storing (prefetching) the previous locally-input feature map in host memory (main processor memory) (see ¶[14]). Jin also teaches that when (prediction) the storage room (condition) occupied by one iteration of training for the neural network is greater than (oversubscription) the memory (coprocessor memory) of the GPU (coprocessor), during the forward-propagation process, data not required by the presently worked layer are offloaded to said host memory (main processor memory) (see ¶[65]). As such, said previous locally-input feature map (output) is pre-stored (prefetched) in said host memory when (prediction) said storage room (condition) occupied by one iteration of training for said neural network is greater than (oversubscription) said memory (coprocessor memory) of said GPU (coprocessor). Note that in claim 1, said previous locally-input feature map has been modified to be stored in pages (second data block) in said host memory.)
Claim 13 is the electronic device claim corresponding to method claim 4, and is rejected on the same grounds as claim 4.
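For illustration of the oversubscription prediction mapped above for claims 4 and 13, a gating check of this kind could take the following form. This is an assumed heuristic sketched for context only, not Jin's disclosed code; the function name and the footprint estimate are hypothetical:

    // Hypothetical helper (illustration only): decide whether to offload a
    // layer output to host memory based on a predicted oversubscription of
    // coprocessor (GPU) memory.
    #include <cuda_runtime.h>

    bool should_offload_to_host(size_t predicted_iteration_bytes) {
        size_t free_bytes = 0, total_bytes = 0;
        cudaMemGetInfo(&free_bytes, &total_bytes);
        // Offload when one training iteration is predicted to occupy more
        // storage room than the GPU memory provides (oversubscription).
        return predicted_iteration_bytes > total_bytes;
    }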
Regarding claim 9, Jin in view of Hu and Rashid teach the method of claim 1 where Jin also teaches
wherein the main processor comprises a central processing unit (CPU), and the coprocessor comprises a graphic processing unit (GPU) (Jin already teaches, in claim 1, the memory (coprocessor memory) of the GPU (coprocessor, GPU). Jin teaches that the memory of the CPU (main processor, CPU) comprises three pools for storing training sample data, intermediate data and parameter data (see ¶[51]), where said memory is shown as host memory (main processor memory) (see Fig. 4).)
Claim 18 is the electronic device claim corresponding to method claim 9, and is rejected on the same grounds as claim 9.
Allowable Subject Matter
Claim 5 recites limitations that have been indicated as allowable in parent application 18/343,099 (US Patent 12,189,527). Therefore, the reasons for allowance in said parent application also apply here and are reproduced below.
Claim 5 recites, at least, storing, based on a first data block being the deep learning model’s weight data, said first data block in the main processor memory, wherein said first data block is loaded into the coprocessor’s cache without migrating said first data block from said main processor memory to said coprocessor’s memory. This subject matter is reflected in the following limitations of claim 5.
based on a first of the data blocks storing weight data of the deep learning model, storing the first data block in the main processor memory among the main processor memory and the coprocessor memory; and
performing an operation of the deep learning model based on the first data block using a coprocessor while directly loading at least a portion of the first data block from the main processor memory into a cache memory of the coprocessor without migration of the first data block from the main processor memory to the coprocessor memory
Steadman (US 20220327660) teaches that in response to control device 2500’s (main processor) instruction to execute (performing) a particular application routine 3140 (operation of deep learning model) (that trains a neural network (see ¶[168])), a particular node device (coprocessor) executes retrieval of a particular image block 3100 (first of the data blocks) (which contains a copy of said application routine 3140) from cache 2363 (cache memory), where, when said particular image block 3100 is not found in said cache 2363, said particular image block 3100 is retrieved from repository device 2100 (main processor memory) (which communicates with processor 2550 in said control device 2500 (see Steadman ¶[183])) and stored in said cache 2363 of said particular node device 2300 (coprocessor) (see Steadman Fig. 13A, ¶[185-186]). Note that said particular image block 3100 is directly stored (without migration of the first data block from the main processor memory to the coprocessor memory) in said cache 2363. However, Steadman does not appear to explicitly teach that said particular image block 3100 (first of the data blocks) is stored in said repository device 2100 (main processor memory) based on said particular image block 3100 being weight data of said particular application routine 3140 (operation of deep learning model). Therefore, claim 5 is allowable over Steadman.
Claim 14 is the electronic device claim corresponding to method claim 5, and is allowable over prior art for the same reasons as claim 5.
Claims dependent upon claim 5 or claim 14 are also allowable over the prior art for the same reasons as said claims 5 and 14.
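For technical context, the direct-access behavior recited in allowed claims 5 and 14 (weight data resident in the main processor memory and loaded into the coprocessor's cache without page migration) resembles the behavior obtainable from CUDA's memory-advice hints. The following sketch is an illustration under that assumption, not applicant's implementation; the buffer name, size, and device id are hypothetical:

    // Illustrative sketch only: keep a weight block resident in host memory
    // while the GPU maps and reads it in place, so accesses fill the GPU's
    // cache without migrating the underlying pages to coprocessor memory.
    #include <cuda_runtime.h>

    int main() {
        const size_t kWeightBytes = 4 << 20;  // hypothetical weight-block size
        const int kGpu = 0;                   // coprocessor device id
        float *weights = nullptr;

        cudaMallocManaged(&weights, kWeightBytes);

        // Prefer main processor (host) memory as the block's resident location.
        cudaMemAdvise(weights, kWeightBytes,
                      cudaMemAdviseSetPreferredLocation, cudaCpuDeviceId);

        // Let the GPU establish a direct mapping to the host-resident pages,
        // so reads avoid page migration on access.
        cudaMemAdvise(weights, kWeightBytes, cudaMemAdviseSetAccessedBy, kGpu);

        // A kernel launched here would read weights directly from host memory.
        cudaFree(weights);
        return 0;
    }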
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHIE YEW whose telephone number is (571)270-5282. The examiner can normally be reached Monday - Thursday and alternate Fridays.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Reginald Bragdon can be reached at (571) 272-4204. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CHIE YEW/ Primary Examiner, Art Unit 2139