Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-11, 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over FAHS et al (20120089792) in view of TPOINT (One Dimensional Array Address Calculation - https://www.youtube.com/watch?v=NrhuLHp2vRw).
As per claim 1, Fahs teaches the claimed “graphics processor” comprising:
“a graphics core including functional units to perform parallel processing operations on data elements stored in a memory” (Fahs, [0027] - The connection topology, including the number and arrangement of bridges, the number of CPUs 102, and the number of parallel processing subsystems 112, may be modified as desired. For instance, in some embodiments, system memory 104 is connected to CPU 102 directly rather than through a bridge, and other devices communicate with system memory 104 via memory bridge 105 and CPU 102); and
“memory access circuitry configured to facilitate access to the memory by the functional units of the graphics core” (Fahs, [0036] - PPUs 202 may transfer data from system memory 104 and/or local parallel processing memories 204 into internal (on-chip) memory, process the data, and write result data back to system memory 104 and/or local parallel processing memories 204, where such data can be accessed by other system components, including CPU 102 or another parallel processing subsystem 112),
“wherein the memory access circuitry is configured to: receive a message to access a data element of an array of data elements in the memory” (Fahs, [0060] - FIG. 4A illustrates an array of structures of arrays (AoSoA) 400 within the DRAM 220 of the PP Memory 204 of FIG. 2, according to one embodiment of the invention… As described in greater detail herein, the structure of the AoSoA 400 enables the threads executing within the thread/data lane 408 to access memory locations with the DRAM 220 at a unit stride length proportional to the number of thread/data lanes 408),
“the message to include an index of the data element in the array of data elements” (Fahs, [0061] - The AoSoA 400 is divided into rows and columns. Each column of the AoSoA 400 is associated with a different thread/data lane 408, and each row of the AoSoA is associated with a different data element) (Note: the location of a stored data element is defined by its row and its column, also called its index); and
“submit a memory access request to the memory to access the data element at the byte address” (Fahs, Abstract - Specifically, the device driver receives an instruction targeted to a memory set up as an array of structures of arrays; Fahs, [0027] - Referring again to FIG. 1, in some embodiments, some or all of PPUs 202 in parallel processing subsystem 112 are graphics processors with rendering pipelines that can be configured to perform various tasks related to generating pixel data from graphics data supplied by CPU 102 and/or system memory 104 via memory bridge 105 and bus 113, interacting with local parallel processing memory 204 (which can be used as graphics memory including, e.g., a conventional frame buffer) to store and update pixel data, delivering pixel data to display device 110, and the like; [0075] - Once the AoSoA 400 is set up, the device driver 103 receives instructions that target memory locations within the AoSoA 400. The device driver 103 calculates the final address within the AoSoA based on the number of thread/data lanes 408).
It is noted that Fahs does not explicitly teach “calculate a byte address for the data element based in part on the index of the data element in the array of data elements” as claimed; however, Fahs’ address calculation steps set forth in Table 2 (e.g., [0069] - The device driver 103 typically receives various parameters from the instruction including the base address 404 of the AoSoA 400 (AoSoA_base), an index value indicating which structure is to be accessed (struct_idx), an offset indicating which field is to be accessed (field_offset), and a value representing the size in memory of each structure (struct_sz)…; [0070] - Another example is aligning the SoAs within the AoSoA 400 on boundaries consistent with the memory access byte granularity of the particular architecture. FIG. 4A and the code segments of Tables 1 and 2 indicate interleave groups of structures on a granularity equal to sizeof(int). For example, in a system where sizeof(int) is four bytes, FIG. 4A and the code segments of Tables 1 and 2 indicate a four-byte granularity; [0076] - The device driver also receives information from the instruction such as the base address 404 of the AoSoA 400, the structure index (e.g., struct_index in TABLE 2), the field offset, and the structure size. The device driver utilizes these parameters to compute the address of the target field 414 within the AoSoA 400. The result is a memory allocation and access approach where the device driver properly computes the memory address of the target field 414 in the AoSoA 400) suggest calculating a byte address for the data element (e.g., TABLE 2, AoSoA_addr) based in part on the index of the data element (e.g., TABLE 2, Struct_index) in the array of data elements.
Furthermore, Tpoint teaches the claimed “calculate a byte address for the data element based in part on the index of the data element in the array of data elements” (Tpoint, Array Address Calculation using its index address, 01:41-10:37 – the specific formula, in the case of a one-dimensional array, for calculating the byte address of the data element based in part on the index of the data element: Location(A(k)) = BA + W*(k – Lower Bound), where BA is the base address of the array, W is the size of an element in bytes, and k is the index of the data element). Thus, it would have been obvious, in view of Tpoint, to configure Fahs’ method as claimed by using the index of the data element in calculating its byte address. The motivation is to allow the processor to access the data stored at a specific byte location or to load data into the specified byte location.
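The one-dimensional formula cited from Tpoint can be illustrated with a minimal Python sketch. The function name, parameter names, and numeric values below are hypothetical and chosen only to mirror the symbols BA, W, k, and Lower Bound; they are not taken from Fahs, from Tpoint, or from the application.

```python
def element_byte_address(base_address, element_size, index, lower_bound=0):
    """Location(A(k)) = BA + W * (k - Lower Bound)."""
    if index < lower_bound:
        raise ValueError("index is below the array's lower bound")
    return base_address + element_size * (index - lower_bound)


# Assumed example: 4-byte elements in an array whose first element is at byte 1000.
zero_based = element_byte_address(1000, 4, 5)                 # 1000 + 4 * (5 - 0)
one_based = element_byte_address(1000, 4, 5, lower_bound=1)   # 1000 + 4 * (5 - 1)
```

With these assumed values the same index k = 5 maps to byte 1020 for a zero-based array and to byte 1016 for a one-based array, which is the role of the Lower Bound term.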
Claim 2 adds into claim 1 “wherein the memory access request is a request to store the data element to the byte address” (Fahs, [0027] - Referring again to FIG. 1, in some embodiments, some or all of PPUs 202 in parallel processing subsystem 112 are graphics processors with rendering pipelines that can be configured to perform various tasks related to generating pixel data from graphics data supplied by CPU 102 and/or system memory 104 via memory bridge 105 and bus 113, interacting with local parallel processing memory 204 (which can be used as graphics memory including, e.g., a conventional frame buffer) to store and update pixel data, delivering pixel data to display device 110, and the like).
Claim 3 adds into claim 1 “wherein the memory access request is a request to load the data element from the byte address” (Fahs, [0027] - Referring again to FIG. 1, in some embodiments, some or all of PPUs 202 in parallel processing subsystem 112 are graphics processors with rendering pipelines that can be configured to perform various tasks related to generating pixel data from graphics data supplied by CPU 102 and/or system memory 104 via memory bridge 105 and bus 113, interacting with local parallel processing memory 204 (which can be used as graphics memory including, e.g., a conventional frame buffer) to store and update pixel data, delivering pixel data to display device 110, and the like).
Claim 4 adds into claim 1 “the memory access circuitry configured to calculate the byte address for the data element based on a size of the data element” (Fahs, [0070] - FIG. 4A and the code segments of Tables 1 and 2 indicate interleave groups of structures on a granularity equal to sizeof(int). For example, in a system where sizeof(int) is four bytes, FIG. 4A and the code segments of Tables 1 and 2 indicate a four-byte granularity. Other granularities are possible, and interleave granularity would typically be chosen to match the memory access byte granularity most efficiently accessible by the threads; Tpoint, Array Address Calculation using its index address, 01:41-10:37 – the specific formula, in the case of a one-dimensional array, for calculating the byte address of the data element based in part on the index of the data element: Location(A(k)) = BA + W*(k – Lower Bound), where BA is the base address of the array, W is the size of an element in bytes, and k is the index of the data element). Thus, it would have been obvious, in view of Tpoint, to configure Fahs’ method as claimed by using the index of the data element in calculating its byte address. The motivation is to allow the processor to access the data stored at a specific byte location or to load data into the specified byte location.
Claim 5 adds into claim 4 “wherein the memory access circuitry includes first circuitry configured to multiply the size of the data element by the index of the data element to generate a byte offset” (Fahs, [0069] - The device driver 103 first computes the base address 404 of the AoSoA 400 using AoSoA_base, as shown in Table 2 as the first line of the return command. The device driver 103 computes a first partial offset proportional to the starting address of the target SoA 410, as shown in the second line of the return command. This is typically an integer number of rows from the base address 404. In the exemplary AoSoA 400, the target SoA 410 is the second SoA and each structure has three fields, therefore the target SoA is three rows down from the base address 404. To this first partial offset, the device driver 103 adds a second partial offset representing the location of the target structure 412 within the target SoA 410, as shown in the third line of the return command. This is typically an integer number of structures after the target SoA 410 address. In the exemplary AoSoA 400, the target structure 412 is the fourth structure in the target SoA 410, therefore the target SoA is in the fourth column of the AoSoA 400. To this second partial offset, the device driver 103 adds a third partial offset representing the location of the target field 414 within the target structure 412, as shown in the fourth line of the return command. This is typically an integer number of rows relative to the address of the target structure 412. In the exemplary AoSoA 400, the target field 414 is the second field of the target structure 412, therefore the target field 414 is in the second row relative to the target structure 412. The device driver 103 then completes the memory access at the memory location of the target field 414 as determined by the sum of the base address and the computed partial offsets) (Note: the multiplication is shown in the third line of the return command in Table 2, i.e., struct_idx%SIMT_WIDTH * sizeof(int)).
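The sum of a base address and three partial offsets described in Fahs [0069] can be sketched as follows. This is an illustrative reconstruction, not a reproduction of Fahs' Table 2: only the expression struct_idx%SIMT_WIDTH * sizeof(int) is quoted in this Office action. The forms of the first and third partial offsets, the assumption that field_offset is given in bytes, and the assumption that every field is four bytes wide are all hypothetical.

```python
SIMT_WIDTH = 8   # eight thread/data lanes in the exemplary AoSoA 400 (Fahs [0065])
SIZEOF_INT = 4   # assumed four-byte granularity (Fahs [0070])


def aosoa_field_address(aosoa_base, struct_idx, field_offset, struct_sz,
                        simt_width=SIMT_WIDTH, granule=SIZEOF_INT):
    """Sketch of base + three partial offsets; the term forms are assumptions."""
    # First partial offset: start of the target SoA (a whole number of rows).
    soa_offset = (struct_idx // simt_width) * struct_sz * simt_width
    # Second partial offset: column of the target structure within its SoA.
    struct_offset = (struct_idx % simt_width) * granule
    # Third partial offset: row of the target field within the structure,
    # assuming field_offset is a byte offset inside one structure.
    field_part = field_offset * simt_width
    return aosoa_base + soa_offset + struct_offset + field_part


# Exemplary target from [0069]: second SoA, fourth structure, second field.
# Zero-based, that is struct_idx = 8 + 3 = 11 and field_offset = 4 bytes.
addr = aosoa_field_address(aosoa_base=0, struct_idx=11, field_offset=4, struct_sz=12)
row_bytes = SIMT_WIDTH * SIZEOF_INT
row, column = addr // row_bytes, (addr % row_bytes) // SIZEOF_INT
```

Under these assumptions the target field lands in zero-based row 4 (three rows down for the second SoA, plus one row for the second field) and zero-based column 3 (the fourth column), which agrees with the walk-through quoted from [0069].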
Claim 6 adds into claim 5 “wherein the memory access circuitry includes second circuitry configured to scale the byte offset by an offset scale factor to generate a scaled offset, the offset scale factor provided by the message” (Tpoint, Array Address Calculation using its index address, 01:41-03:14 – each data element (e.g., 4 bytes) has four (4) sub-elements (1 byte per sub-element); Fahs, [0065] - In this particular declaration, each structure within the AoSoA 400 has three fields: field 0 (F0) of type int, field 1 (F1) of type float, and field 2 (F2) of type some type. In one embodiment, each thread accesses a different memory location in data row A 416 to perform an operation on the F0 field from a sequential number of data structures proportional to the number of thread/data lanes 408. SIMT_WIDTH in the declaration of Table 1 represents the number of thread/data lanes 408. In the AoSoA 400, SIMT_WIDTH is eight because there are eight thread/data lanes 408. However, SIMT_WIDTH is dependent on the specific architecture of SPM 310) (Note: the size of the data element (e.g., 1 byte) and the index k of the data element are multiplied by the number of sub-elements (e.g., 4) to generate the byte offset). Thus, it would have been obvious, in view of Tpoint, to configure Fahs’ method as claimed by using the index of the data element in calculating its byte address. The motivation is to allow the processor to access the data stored at a specific byte location or to load data into the specified byte location.
Claim 7 adds into claim 6 “wherein the first circuitry or the second circuitry includes a shifter circuit” (Fahs, [0070] - For example, if SIMT_WIDTH is a power of 2, as is commonly the case, then the modulo (`%`) operations may be performed using simple Boolean operations, and the division (`/`) and multiplication (`*`) operations may be performed using bit shift).
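The observation quoted from Fahs [0070] holds for any power-of-two width and can be checked with a short Python sketch. The helper names are hypothetical; the sketch shows only the arithmetic identity and does not model a hardware shifter circuit.

```python
def _log2_pow2(n):
    """Return log2(n) for a positive power of two, else raise ValueError."""
    if n <= 0 or n & (n - 1):
        raise ValueError("value must be a positive power of two")
    return n.bit_length() - 1


def split_index_pow2(struct_idx, simt_width):
    """Compute (struct_idx / simt_width, struct_idx % simt_width) with shift and mask."""
    shift = _log2_pow2(simt_width)
    quotient = struct_idx >> shift               # division by a power of two
    remainder = struct_idx & (simt_width - 1)    # modulo by a power of two
    return quotient, remainder


def multiply_pow2(value, factor):
    """Compute value * factor with a left shift when factor is a power of two."""
    return value << _log2_pow2(factor)
```

For example, with a width of 8 the index 11 splits into quotient 1 and remainder 3, and multiplying by a four-byte element size is a left shift by two bits.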
Claim 8 adds into claim 6 “wherein the memory access circuitry includes third circuitry to add a global offset to the scaled offset, the global offset provided by the message” (Fahs, [0069] - The device driver 103 first computes the base address 404 of the AoSoA 400 using AoSoA_base, as shown in Table 2 as the first line of the return command) (Note: see the first line of the return command in Table 2, i.e., AoSoA_base).
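Read together, the limitations quoted for claims 5, 6, and 8 describe three stages: multiply, scale, and add. A minimal sketch follows, assuming that scaling means multiplication by the scale factor and that the final sum is used as the byte address. The claim language quoted here states neither assumption, and all names and values are hypothetical.

```python
def staged_byte_address(index, element_size, offset_scale, global_offset):
    """Illustrative three-stage address computation (assumptions noted inline)."""
    byte_offset = element_size * index           # claim 5: size multiplied by index
    scaled_offset = byte_offset * offset_scale   # claim 6: assumed to be a multiply
    return global_offset + scaled_offset         # claim 8: global offset added


# Assumed values: index 3, 4-byte elements, scale factor 2, global offset 4096.
example = staged_byte_address(index=3, element_size=4, offset_scale=2, global_offset=4096)
```

With these assumed values the byte offset is 12, the scaled offset is 24, and the resulting sum is 4120.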
Claim 9 adds into claim 6 “wherein the data element includes multiple sub elements and to calculate the byte address for the data element includes to calculate a byte address for each sub element” (Tpoint, Array Address Calculation using its index address, 01:41-03:14 – each data element (e.g., 4 bytes) has four (4) sub-elements (1 byte per sub-element); Fahs, [0065] - In this particular declaration, each structure within the AoSoA 400 has three fields: field 0 (F0) of type int, field 1 (F1) of type float, and field 2 (F2) of type some type. In one embodiment, each thread accesses a different memory location in data row A 416 to perform an operation on the F0 field from a sequential number of data structures proportional to the number of thread/data lanes 408. SIMT_WIDTH in the declaration of Table 1 represents the number of thread/data lanes 408. In the AoSoA 400, SIMT_WIDTH is eight because there are eight thread/data lanes 408. However, SIMT_WIDTH is dependent on the specific architecture of SPM 310). Thus, it would have been obvious, in view of Tpoint, to configure Fahs’ method as claimed by using the index of the data element in calculating its byte address. The motivation is to allow the processor to access the data stored at a specific byte location or to load data into the specified byte location.
Claim 10 adds into claim 9 “the first circuitry configured to multiply the size of the data element and the index of the data element by a number of sub elements to generate the byte offset” (Fahs, [0070] - FIG. 4A and the code segments of Tables 1 and 2 indicate interleave groups of structures on a granularity equal to sizeof(int). For example, in a system where sizeof(int) is four bytes, FIG. 4A and the code segments of Tables 1 and 2 indicate a four-byte granularity. Other granularities are possible, and interleave granularity would typically be chosen to match the memory access byte granularity most efficiently accessible by the threads; Tpoint, Array Address Calculation using its index address, 01:41-10:37 – each data element (e.g., 4 bytes) has four (4) sub-elements (1 byte per sub-element); the specific formula, in the case of a one-dimensional array, for calculating the byte address of the data element based in part on the index of the data element: Location(A(k)) = BA + W*(k – Lower Bound), where BA is the base address of the array, W is the size of an element in bytes, and k is the index of the data element). Thus, it would have been obvious, in view of Tpoint, to configure Fahs’ method as claimed by using the index of the data element in calculating its byte address. The motivation is to allow the processor to access the data stored at a specific byte location or to load data into the specified byte location.
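The sub-element reading applied to claims 9 and 10 (a 4-byte data element made of four 1-byte sub-elements) can be sketched as follows. The sketch assumes that the size being multiplied is the per-sub-element size, that sub-elements are stored contiguously, and that a base address is available as in Tpoint's BA term. These are illustrative assumptions with hypothetical names, not findings about the application.

```python
def sub_element_addresses(base_address, sub_element_size, index, num_sub_elements):
    """Return one byte address per sub-element of the data element at `index`."""
    # Byte offset of the whole data element: size * index * number of sub-elements.
    byte_offset = sub_element_size * index * num_sub_elements
    first = base_address + byte_offset
    # Assumed contiguous layout: each sub-element follows the previous one.
    return [first + j * sub_element_size for j in range(num_sub_elements)]


# Assumed values: base 1000, 1-byte sub-elements, 4 sub-elements per element, index 2.
addresses = sub_element_addresses(1000, 1, 2, 4)
```

Under these assumptions the element at index 2 starts at byte offset 8, so its four sub-elements occupy bytes 1008 through 1011.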
Claim 11 adds into claim 10 “wherein the functional units include multiple processor lanes, the multiple processor lanes associated with the multiple sub elements” (Tpoint, Array Address Calculation using its index address, 01:41-03:14 – each data element (e.g., 4 bytes) has four (4) sub-elements (1 byte per sub-element); Fahs, [0065] - In this particular declaration, each structure within the AoSoA 400 has three fields: field 0 (F0) of type int, field 1 (F1) of type float, and field 2 (F2) of type some type. In one embodiment, each thread accesses a different memory location in data row A 416 to perform an operation on the F0 field from a sequential number of data structures proportional to the number of thread/data lanes 408. SIMT_WIDTH in the declaration of Table 1 represents the number of thread/data lanes 408. In the AoSoA 400, SIMT_WIDTH is eight because there are eight thread/data lanes 408. However, SIMT_WIDTH is dependent on the specific architecture of SPM 310). Thus, it would have been obvious, in view of Tpoint, to configure Fahs’ method as claimed by using the index of the data element in calculating its byte address. The motivation is to allow the processor to access the data stored at a specific byte location or to load data into the specified byte location.
Claims 17-20 claim a system based on the method of claims 1-11; therefore, they are rejected under a similar rationale.
Claim 4 is objected to because of the following informalities: “any one of claims” should be amended to - - claim - -. Appropriate correction is required.
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to final Office action, see 37 CFR 1.113(c). A request for reconsideration while not provided for in 37 CFR 1.113(c) may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 1-3 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 4 and 5 of U.S. Patent No. 12,333,310. Although the claims at issue are not identical, they are not patentably distinct from each other because the calculation of the memory byte address in the claims of the US patent implies the claimed “calculation of the memory byte address” of the pending application.
Claims of the application compared with claims of the US patent:
Claims of the application, claim 1: A graphics processor comprising: a graphics core including functional units to perform parallel processing operations on data elements stored in a memory; and memory access circuitry configured to facilitate access to the memory by the functional units of the graphics core, wherein the memory access circuitry is configured to: receive a message to access a data element of an array of data elements in the memory, the message to include an index of the data element in the array of data elements; calculate a byte address for the data element based in part on the index of the data element in the array of data elements; and submit a memory access request to the memory to access the data element at the byte address.
Claims of the US patent, claim 1: A graphics processor comprising: a graphics core including a plurality of processing resources, each having a plurality of processor lanes to perform a parallel processing operation on a plurality of data elements stored in a memory; and memory access circuitry configured to receive offload of memory address calculations for the plurality of data elements from the plurality of processor lanes of the plurality of processing resources, the memory access circuitry is configured to: determine byte addresses for the plurality of data elements stored in the memory, the byte addresses determined based on a base address, an offset between addresses of data elements of the plurality of data elements, and a scale factor to apply to the offset, wherein the byte addresses are byte granularity addresses of data elements to be processed by the plurality of processor lanes; and submit a memory access request to the memory on behalf of the plurality of processor lanes to access the plurality of data elements at the byte addresses determined for the plurality of data elements.
Claims of the application, claim 2: The graphics processor of claim 1, wherein the memory access request is a request to store the data element to the byte address.
Claims of the US patent, claim 4: The graphics processor of claim 1, wherein the memory access request includes a request to store a data element to the memory.
Claims of the application, claim 3: The graphics processor of claim 1, wherein the memory access request is a request to load the data element from the byte address.
Claims of the US patent, claim 5: The graphics processor of claim 1, wherein the memory access request is a request to load a data element from the memory.
Claims of the application, claim 4: The graphics processor of any one of claims 1, the memory access circuitry configured to calculate the byte address for the data element based on a size of the data element.
Claims of the US patent, claim 1 (excerpt): … determine byte addresses for the plurality of data elements stored in the memory, the byte addresses determined based on a base address, an offset between addresses of data elements of the plurality of data elements, and a scale factor to apply to the offset, wherein the byte addresses are byte granularity addresses of data elements to be processed by the plurality of processor lanes.
Claims 17-20 claim a system based on the method of claims 1-4; therefore, they are rejected under a similar rationale.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PHU K NGUYEN, whose telephone number is (571) 272-7645. The examiner can normally be reached Monday through Friday, 8 am to 5 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel F. Hajnik, can be reached at (571) 272-7515. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/PHU K NGUYEN/
Primary Examiner, Art Unit 2616