DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). Chinese Patent Application No. 202211098928.9, dated 09/07/2022, has been placed of record in the file.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 04/14/2025 has been received and considered. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Drawings
The applicant’s drawings submitted are acceptable for examination purposes.
Specification
The applicant’s specification submitted is acceptable for examination purposes.
Examiner Notes
(1) In the case of amending the claimed invention, Applicant is respectfully requested to indicate the portion(s) of the specification which dictate(s) the structure relied on for proper interpretation and also to verify and ascertain the metes and bounds of the claimed invention. This will assist in expediting compact prosecution. MPEP 714.02 recites: “Applicant should also specifically point out the support for any amendments made to the disclosure. See MPEP § 2163.06. An amendment which does not comply with the provisions of 37 CFR 1.121 (b), (c), (d), and (h) may be held not fully responsive. See MPEP § 714.” Amendments not pointing to specific support in the disclosure may be deemed not to comply with the provisions of 37 C.F.R. 1.121 (b), (c), (d), and (h) and therefore held not fully responsive. Generic statements such as "Applicants believe no new matter has been introduced" may be deemed insufficient.
(2) Examiner cites particular columns, paragraphs, figures and line numbers in the references as applied to the claims below for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the Examiner.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (an abstract idea) without significantly more, as explained in the analysis below.
When considering subject matter eligibility under 35 USC 101, it must be determined whether the claim is directed to one of the four statutory categories of invention, i.e., process, machine, manufacture, or composition of matter (Step 1). If the claim does fall within one of the statutory categories, it must then be determined whether the claim is directed to a judicial exception (i.e., law of nature, natural phenomenon, and abstract idea) (Step 2A), and if so, it must additionally be determined whether the claim is a patent-eligible application of the exception. If an abstract idea is present in the claim, any element or combination of elements in the claim must be sufficient to ensure that the claim amounts to significantly more than the abstract idea itself (Step 2B). Examples of abstract ideas include fundamental economic practices; certain methods of organizing human activities; an idea itself; and mathematical relationships/formulas.
Analysis
STEP 1:
The subject matter of claims 1, 10, and 19 falls within the four statutory categories of patentable subject matter identified by 35 U.S.C. § 101: process, machine, manufacture, or composition of matter.
STEP 2A, PRONG 1 (Claim 1):
Under step 2A, prong 1, of the 2019 Guidance, we first look to whether the claim recites any judicial exceptions, including certain groupings of abstract ideas (i.e., mathematical concepts, certain methods of organizing human activities such as a fundamental economic practice, or mental processes). MPEP § 2106.04(a).
Claim 1 recites:
“A model optimization method, comprising:
(a) receiving optimization request information entered by a user or sent by an artificial intelligence (AI) application, wherein the optimization request information comprises a first model file, the first model file comprises M operators, each operator is used to perform one matrix multiplication calculation, each operator corresponds to one kernel function, and M is a positive integer;
(b) generating a second model file based on the first model file, wherein the second model file comprises N fused operators, each fused operator is used to perform at least two matrix multiplication calculations, each fused operator corresponds to one kernel function, N is a positive integer, and N<M; and
(c) providing the second model file for the user or sending the second model file to the AI application.”
Claim 1 recites the limitation “generating a second model file based on the first model file, wherein the second model file comprises N fused operators, each fused operator is used to perform at least two matrix multiplication calculations, each fused operator corresponds to one kernel function, N is a positive integer, and N<M”; under its broadest reasonable interpretation, this limitation covers a mathematical concept/calculation.
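For illustration only, the mathematical character of the recited fusion (reducing M per-multiplication operators to N fused operators, N<M) can be sketched as follows; the function names, shapes, and the choice of a single batched call are the examiner's own hypothetical illustration and are not drawn from the claims or the specification:

```python
import numpy as np

def run_unfused(pairs):
    # M operators: one call (one "kernel") per matrix multiplication
    return [a @ b for a, b in pairs]

def run_fused(pairs):
    # One fused operator: a single batched call performs all M
    # matrix multiplications at once (here N = 1 < M)
    lhs = np.stack([a for a, _ in pairs])   # shape (M, m, k)
    rhs = np.stack([b for _, b in pairs])   # shape (M, k, n)
    return list(lhs @ rhs)                  # batched matrix multiply

rng = np.random.default_rng(0)
pairs = [(rng.standard_normal((4, 3)), rng.standard_normal((3, 2)))
         for _ in range(5)]                 # M = 5 operators
for u, f in zip(run_unfused(pairs), run_fused(pairs)):
    assert np.allclose(u, f)                # identical mathematical results
```

Both paths compute the same products; only the number of invocations differs.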
STEP 2A, PRONG 2 (Claim 1):
Limitations (a) and (c) recite “receiving…” and “providing…” respectively, which merely constitute insignificant extra-solution activity (mere data gathering and output, selecting a particular data source or type of data to be manipulated; see MPEP 2106.05(g) – presenting offers, selecting information examples).
The additional limitation “AI application” describes a generic computer component, akin to adding the words "apply it" in connection with the abstract idea.
STEP 2B (Claim 1):
Under step 2B, limitations (a) and (c) merely constitute insignificant extra-solution activity (mere data gathering and output, selecting a particular data source or type of data to be manipulated; see MPEP 2106.05(g) – presenting offers, selecting information examples) and are well-understood, routine, and conventional in the art (see Mayo, 566 U.S. at 79, 101 USPQ2d at 1968; OIP Techs., Inc. v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1092-93 (Fed. Cir. 2015) (presenting offers and gathering statistics amounted to mere data gathering)).
Viewed as a whole, the additional claim elements do not provide meaningful limitations sufficient to transform the abstract idea into a patent eligible application of the abstract idea such that the claims amount to “significantly more” than the abstract idea itself. Therefore, claim 1 is rejected under 35 U.S.C. §101 as being directed to non-statutory subject matter.
Claims 10 and 19 are rejected under 35 U.S.C. 101 for similar reasons.
Similarly, the additional limitations “at least one processor”, “at least one memory”, and “AI application” in claim 10, and “a computing device…”, “a processor…”, “a memory”, “user interface…”, “a plurality of prediction engines”, “a non-transitory computer-readable storage medium”, and “AI application” in claim 19, describe generic computer components, akin to adding the words "apply it" in connection with the abstract idea.
Under step 2B, the limitations “receiving…” and “providing…” merely constitute insignificant extra-solution activity (mere data gathering and output, selecting a particular data source or type of data to be manipulated; see MPEP 2106.05(g) – presenting offers, selecting information examples) and are well-understood, routine, and conventional in the art (see Mayo, 566 U.S. at 79, 101 USPQ2d at 1968; OIP Techs., Inc. v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1092-93 (Fed. Cir. 2015) (presenting offers and gathering statistics amounted to mere data gathering)).
Claims 2-9, 11-18, and 20 depend from claims 1, 10, and 19, respectively, and include all the limitations of those claims. Because these claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception, they are likewise directed to an abstract idea.
Claims 1-20 are therefore not drawn to eligible subject matter as they are directed to an abstract idea without significantly more.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-7, 10-16, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over KAIYUAN et al. (CN112529207 A) in view of Shivam et al. (U.S. Pub. No. 2023/0297643 A1), further in view of Ashari et al. (U.S. Pub. No. 2017/0032487 A1).
Regarding claim 1, KAIYUAN teaches a method, comprising: receiving optimization request information entered by a user or sent by an artificial intelligence (AI) application, wherein the optimization request information comprises a first model file (page 7, paragraph [7], the search configuration information is input or selected by a user on a graphical user interface GUI; also see page 20, paragraph [1]-[3], regarding the implementation process of obtaining the original model, the model optimization device may obtain the original model according to a storage path configured in the configuration interface by the user, or receive the original model directly uploaded by the user through the client; also see page 14, paragraph [8]-[9]), and providing the second model file for the user or sending the second model file to the AI application (page 13, paragraph [3], obtain an optimized model, and feed back the optimized model to the client).
KAIYUAN does not explicitly disclose: the first model file comprises M operators, each operator is used to perform one matrix multiplication calculation, each operator corresponds to one kernel function, and M is a positive integer; generating a second model file based on the first model file.
Shivam teaches: the first model file comprises M operators, each operator is used to perform one matrix multiplication calculation, each operator corresponds to one kernel function, and M is a positive integer (paragraph [0039], a matrix multiplication operation can be referred to as a General Matrix Multiplication; A and B are input matrices of dimensions M rows by K columns and K rows by N columns, respectively, and C is a preexisting output matrix that is overwritten with the result and has dimensions of M rows by N columns; it will be appreciated that the product of AB has MxN elements, each of which is calculated as the dot product of two K-element vectors, thus a total of MxNxK FMA (fused multiply-add) operations are needed to compute the product of AB, and each FMA operation is a combination of a multiply operation followed by an addition operation; also see paragraph [0037], launching the kernels to execute the desired operation(s) in the PPU; launching operations on two or more PPUs simultaneously; also see paragraph [0139]); generating a second model file based on the first model file (paragraph [0039], a matrix multiplication operation can be referred to as a General Matrix Multiplication; A and B are input matrices of dimensions M rows by K columns and K rows by N columns, respectively, and C is a preexisting output matrix that is overwritten with the result and has dimensions of M rows by N columns).
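For the applicant's convenience, the GEMM arithmetic described in Shivam's paragraph [0039] (each of the MxN output elements is a dot product of two K-element vectors, for MxNxK FMA operations in total) can be illustrated as follows; this sketch is the examiner's own and is not code from the reference:

```python
import numpy as np

def gemm_naive(A, B):
    # C[i, j] is the dot product of row i of A (K elements) with
    # column j of B (K elements): M*N*K multiply-adds (FMAs) in total.
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N))
    fma_count = 0
    for i in range(M):
        for j in range(N):
            for k in range(K):
                C[i, j] += A[i, k] * B[k, j]   # one FMA
                fma_count += 1
    return C, fma_count

A = np.arange(6.0).reshape(2, 3)    # M=2, K=3
B = np.arange(12.0).reshape(3, 4)   # K=3, N=4
C, fmas = gemm_naive(A, B)
assert fmas == 2 * 4 * 3            # M*N*K = 24 FMA operations
assert np.allclose(C, A @ B)        # matches the library result
```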
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the first model file comprises M operators, each operator is used to perform one matrix multiplication calculation, each operator corresponds to one kernel function, and M is a positive integer; generating a second model file based on the first model file into the model optimization of KAIYUAN.
Motivation to do so would be to include the first model file comprises M operators, each operator is used to perform one matrix multiplication calculation, each operator corresponds to one kernel function, and M is a positive integer; generating a second model file based on the first model file, to address the issue that conventional solutions simply are not executed as fast as desirable (Shivam, paragraph [0003], line 20-21).
KAIYUAN as modified by Shivam do not explicitly disclose: wherein the second model file comprises N fused operators, each fused operator is used to perform at least two matrix multiplication calculations, each fused operator corresponds to one kernel function, N is a positive integer, and N<M.
Ashari teaches: wherein the second model file comprises N fused operators, each fused operator is used to perform at least two matrix multiplication calculations, each fused operator corresponds to one kernel function, N is a positive integer, and N<M (paragraph [0058], development of GPU kernels for primitive linear algebraic operators (e.g., matrix-vector multiplication) that are used in developing ML processes is extended by developing fused kernels for a combination of primitive operators …; a fused kernel is developed to optimize the computation on the GPU; also see paragraph [0061], receiving an input matrix comprising a matrix (sparse or dense) that includes row and column data; also see paragraph [0064], GPU exploitation is to launch separate kernels for individual operations, one for matrix-vector multiplication (XxY) and another for transpose-matrix-vector multiplication (XTx(∙)); also see paragraphs [0085]-[0086]; in combination with the matrices of the matrix multiplication of Shivam, this reads on the limitation as claimed).
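The fused-kernel pattern Ashari describes (combining the matrix-vector product XxY with the transpose-matrix-vector product XTx(∙) so that each row of X is loaded only once) can be illustrated as below; the names and the single-pass loop are the examiner's own sketch under that assumption, not code from the reference:

```python
import numpy as np

def unfused(X, y):
    p = X @ y          # kernel 1: matrix-vector multiply
    return X.T @ p     # kernel 2: transpose-matrix-vector multiply

def fused(X, y):
    # Single pass: each row X[r, :] is read once and used both for
    # p[r] = X[r, :] @ y and for the update w += X[r, :] * p[r],
    # improving temporal locality as Ashari describes.
    w = np.zeros(X.shape[1])
    for r in range(X.shape[0]):
        row = X[r, :]
        p_r = row @ y
        w += row * p_r
    return w

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 4))
y = rng.standard_normal(4)
assert np.allclose(unfused(X, y), fused(X, y))
```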
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include wherein the second model file comprises N fused operators, each fused operator is used to perform at least two matrix multiplication calculations, each fused operator corresponds to one kernel function, N is a positive integer, and N<M into the model optimization of KAIYUAN.
Motivation to do so would be to include wherein the second model file comprises N fused operators, each fused operator is used to perform at least two matrix multiplication calculations, each fused operator corresponds to one kernel function, N is a positive integer, and N<M, which not only reduces the processing cost of data loads due to improved temporal locality but also enables other optimizations (Ashari, paragraph [0058], line 17-19).
Regarding claim 2, KAIYUAN as modified by Shivam and Ashari teach all claimed limitations as set forth in the rejection of claim 1, and further teach wherein the first model file is a model file of a trained AI model (KAIYUAN, page 13, paragraph [2]-[4], optimize the original model according to the combined operator to obtain an optimized model; the model optimization device is operated in the purchased computing resources, so that the model optimization device optimizes the AI model).
Regarding claim 3, KAIYUAN as modified by Shivam and Ashari teach all claimed limitations as set forth in the rejection of claim 1, and further teach wherein any matrix multiplication calculation performed using each fused operator does not depend on a calculation result of another matrix multiplication calculation performed using the same fused operator (Ashari, paragraph [0058], development of GPU kernels for primitive linear algebraic operators (e.g., matrix-vector multiplication) that are used in developing ML processes is extended by developing fused kernels for a combination of primitive operators…; a fused kernel is developed to optimize the computation on the GPU; also see paragraph [0061], receiving an input matrix comprising a matrix (sparse or dense) that includes row and column data; also see paragraph [0064], GPU exploitation is to launch separate kernels for individual operations, one for matrix-vector multiplication (XxY) and another for transpose-matrix-vector multiplication (XTx(∙)); also see paragraph [0077], the goal behind the technique is to reuse the rows r while computing w[:]=X [r,:] TxpH, where p[r] = X[r,:]x y[:]).
Regarding claim 4, KAIYUAN as modified by Shivam and Ashari teach all claimed limitations as set forth in the rejection of claim 1, and further teach wherein that each operator is used to perform one matrix multiplication calculation comprises: each operator is used to perform a matrix multiplication calculation between a left hand side matrix and a right hand side matrix (Shivam, paragraph [0039], a matrix multiplication operation can be referred to as a General Matrix Multiplication; A and B are input matrices of dimensions M rows by K columns and K rows by N columns, respectively, and C is a preexisting output matrix that is overwritten with the result and has dimensions of M rows by N columns; it will be appreciated that the product of AB has MxN elements, each of which is calculated as the dot product of two K-element vectors, thus a total of MxNxK FMA (fused multiply-add) operations are needed to compute the product of AB, and each FMA operation is a combination of a multiply operation followed by an addition operation).
Regarding claim 5, KAIYUAN as modified by Shivam and Ashari teach all claimed limitations as set forth in the rejection of claim 4, and further teach wherein the generating a second model file based on the first model file comprises: generating the second model file based on a dimension of the left hand side matrix or a dimension of the right hand side matrix in the first model file (Shivam, paragraph [0039], a matrix multiplication operation can be referred to as a General Matrix Multiplication; A and B are input matrices of dimensions M rows by K columns and K rows by N columns, respectively, and C is a preexisting output matrix that is overwritten with the result and has dimensions of M rows by N columns; it will be appreciated that the product of AB has MxN elements, each of which is calculated as the dot product of two K-element vectors, thus a total of MxNxK FMA (fused multiply-add) operations are needed to compute the product of AB, and each FMA operation is a combination of a multiply operation followed by an addition operation).
Regarding claim 6, KAIYUAN as modified by Shivam and Ashari teach all claimed limitations as set forth in the rejection of claim 4, and further teach wherein left hand side matrices of a plurality of matrix multiplication calculations performed using a first fused operator have a same dimension, right hand side matrices of the plurality of matrix multiplication calculations performed using the first fused operator have a same dimension, and the first fused operator is one of the N fused operators (Ashari, paragraph [0058], development of GPU kernels for primitive linear algebraic operators (e.g., matrix-vector multiplication) that are used in developing ML processes is extended by developing fused kernels for a combination of primitive operators …; a fused kernel is developed to optimize the computation on the GPU; also see paragraph [0061], receiving an input matrix comprising a matrix (sparse or dense) that includes row and column data; also see paragraph [0064], GPU exploitation is to launch separate kernels for individual operations, one for matrix-vector multiplication (XxY) and another for transpose-matrix-vector multiplication (XTx(∙))).
Regarding claim 7, KAIYUAN as modified by Shivam and Ashari teach all claimed limitations as set forth in the rejection of claim 4, and further teach wherein the method further comprises: determining a plurality of second matrix multiplication calculations based on a first matrix multiplication calculation, wherein the first matrix multiplication calculation is a matrix multiplication calculation performed using one of the M operators; and performing the plurality of second matrix multiplication calculations based on at least one second fused operator of the N fused operators (Ashari, paragraph [0058], development of GPU kernels for primitive linear algebraic operators (e.g., matrix-vector multiplication) that are used in developing ML processes is extended by developing fused kernels for a combination of primitive operators …; a fused kernel is developed to optimize the computation on the GPU; also see paragraph [0061], receiving an input matrix comprising a matrix (sparse or dense) that includes row and column data; also see paragraph [0064], GPU exploitation is to launch separate kernels for individual operations, one for matrix-vector multiplication (XxY) and another for transpose-matrix-vector multiplication (XTx(∙))); and adding calculation results of the plurality of second matrix multiplication calculations (Ashari, paragraph [0062], performing inter-vector or intra-block aggregations for the vector via atomic operations using the partial output vector results; also see paragraph [0067], the partial values of w computed by multiple threads spanning warps and blocks are aggregated to obtain the final vector result).
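The pattern addressed in claim 7 (determining a plurality of second matrix multiplication calculations from a first one and adding their results) corresponds to splitting a product along its shared inner dimension and aggregating the partial outputs; the following sketch is illustrative only and is not drawn from any cited reference:

```python
import numpy as np

def split_matmul(A, B, parts=2):
    # Split C = A @ B along the inner dimension K into `parts`
    # smaller multiplications, then sum (aggregate) the partial results.
    K = A.shape[1]
    bounds = np.linspace(0, K, parts + 1, dtype=int)
    partials = [A[:, s:e] @ B[s:e, :]
                for s, e in zip(bounds[:-1], bounds[1:])]
    return sum(partials)   # aggregation of partial output results

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 8))
B = rng.standard_normal((8, 5))
assert np.allclose(split_matmul(A, B), A @ B)
```

The split count is arbitrary; any partition of the inner dimension yields the same sum.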
As per claims 10-16, these claims are rejected on grounds corresponding to the same rationales given above for rejected claims 1-7 and are similarly rejected.
As per claims 19-20, these claims are rejected on grounds corresponding to the same rationales given above for rejected claims 1-2 and are similarly rejected.
Claims 8 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over KAIYUAN et al. (CN112529207 A) in view of Shivam et al. (U.S. Pub. No. 2023/0297643 A1) and Ashari et al. (U.S. Pub. No. 2017/0032487 A1), further in view of JINKUN et al. (CN114168154 A).
Regarding claim 8, KAIYUAN as modified by Shivam and Ashari teach all claimed limitations as set forth in rejection of claim 7, but do not explicitly disclose: wherein the determining a plurality of second matrix multiplication calculations based on a first matrix multiplication calculation comprises: splitting the first matrix multiplication calculation into the plurality of second matrix multiplication calculations based on a dimension of the left hand side matrix or a dimension of the right hand side matrix in the first matrix multiplication calculation.
JINKUN teaches: wherein the determining a plurality of second matrix multiplication calculations based on a first matrix multiplication calculation comprises: splitting the first matrix multiplication calculation into the plurality of second matrix multiplication calculations based on a dimension of the left hand side matrix or a dimension of the right hand side matrix in the first matrix multiplication calculation (page 9, paragraph [5], one representation file obtained after decomposition corresponds to “+” in the operator (i.e., the algorithm of multiplication included in the operator, and the corresponding algorithm type is the algorithm type of multiplication); also see page 11, paragraph [4], the electronic device may use, as the intermediate representation to be fused, …, and may convert the intermediate representations, which may be understood as splitting the operator corresponding to the intermediate representation to be adjusted to obtain the corresponding operators (that is, splitting operators) respectively converted and represented by the plurality of fused intermediate representations).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include wherein the determining a plurality of second matrix multiplication calculations based on a first matrix multiplication calculation comprises: splitting the first matrix multiplication calculation into the plurality of second matrix multiplication calculations based on a dimension of the left hand side matrix or a dimension of the right hand side matrix in the first matrix multiplication calculation into the model optimization of KAIYUAN.
Motivation to do so would be to include wherein the determining a plurality of second matrix multiplication calculations based on a first matrix multiplication calculation comprises: splitting the first matrix multiplication calculation into the plurality of second matrix multiplication calculations based on a dimension of the left hand side matrix or a dimension of the right hand side matrix in the first matrix multiplication calculation, such that the operator fusion rate is improved, memory reuse can be improved, and the model effect and the model performance after compiling optimization are improved (JINKUN, page 2, third-to-last paragraph, line 10-11).
As per claim 17, this claim is rejected on grounds corresponding to the same rationales given above for rejected claim 8 and is similarly rejected.
Claims 9 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over KAIYUAN et al. (CN112529207 A) in view of Shivam et al. (U.S. Pub. No. 2023/0297643 A1) and Ashari et al. (U.S. Pub. No. 2017/0032487 A1), further in view of Palmer et al. (U.S. Patent No. 11,169,798 B1).
Regarding claim 9, KAIYUAN as modified by Shivam and Ashari teach all claimed limitations as set forth in rejection of claim 1, but do not explicitly disclose: wherein the AI application deploys the second model file on a cloud server of the AI application after receiving the second model file.
Palmer teaches: wherein the AI application deploys the second model file on a cloud server of the AI application after receiving the second model file (col. 6, line 24-40, enables build-once/reusability/generic models with model inheritance from the base model class, model version tracking, model packaging, automatic realtime performance profiling, etc.; also see col. 3, line 43-56, the AI model serving system can reside on specific computers or be otherwise distributed between multiple computer systems, including within a fabric/cloud-based computing environment in which the functionality of the AI model serving system is provided as a service over a network; also see col. 5, line 18-23, accept new model package versions and allow remote program package managers or the like to download model packages by, e.g., version number, and load them into their program runtime(s); also see col. 8, line 15-18, deploying new and in-development models to the API).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include wherein the AI application deploys the second model file on a cloud server of the AI application after receiving the second model file into the model deployment of KAIYUAN.
Motivation to do so would be to include wherein the AI application deploys the second model file on a cloud server of the AI application after receiving the second model file, which provides an effective automated way to share and prototype runnable model packages (Palmer, col. 6, line 30-31).
As per claim 18, this claim is rejected on grounds corresponding to the same rationales given above for rejected claim 9 and is similarly rejected.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEN HOANG whose telephone number is (571)272-8401. The examiner can normally be reached M-F 7:30am-5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Charles Rones can be reached at (571)272-4085. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/KEN HOANG/Examiner, Art Unit 2168