DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are:
“the permutation module is a random patch permutation module and configured to randomly patch-permutate data transmitted from a client side to the server side…” in claim 16 – which is interpreted as being implemented using hardware or a combination of software and/or hardware components as described on page 27, paragraph 2, of the instant application.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a mathematical concept without significantly more.
In determining whether the claims are subject matter eligible, the Examiner applies the 2019 USPTO Patent Eligibility Guidelines. (2019 Revised Patent Subject Matter Eligibility Guidance, 84 Fed. Reg. 50, Jan. 7, 2019.)
Regarding claim 1:
Step 1: Is the claim to a process, machine, manufacture, or composition of matter? Yes—claim 1, a method.
Step 2A, prong one: Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes. The claim recites “A transformation method using a distributed learning-based multi-task vision transformer through random patch permutation, the transformation method comprising: preparing, using a task non-specific patch embedder, a patch embedding for each client and passing the patch embedding through a permutation module and then transmitting the patch embedding to a server; and storing, by the server, the received patch embedding and using the patch embedding to update body and tail parts of a vision transformer model.” These limitations recite mathematical concepts (organizing data, mathematical transformations, mathematical operations, and mathematical relationships), an abstract idea that can be performed in the human mind and/or with the aid of pen and paper. (An illustrative sketch of the recited operations follows the discussion of the dependent claims below.)
Step 2A, prong two: Does the claim recite additional elements that integrate the judicial exception into a practical application? No. Although claim 1 recites a “server” and the use of “a model,” these additional elements are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using generic machines that gather data and feed it into readily available, off-the-shelf models.
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? No. For the same reasons, the additional elements, considered individually and in combination, amount to no more than mere instructions to apply the exception using generic computer components. See Alice Corp. Pty. Ltd. v. CLS Bank Int’l, 573 U.S. 208, 134 S. Ct. 2347, 2360 (2014).
For the reasons above, claim 1 is rejected as being directed to non-patentable subject matter under §101. The additional limitations of the dependent claims are addressed briefly below:
Regarding dependent claim 2: “wherein the vision transformer model that performs multi-task learning is separated into a head and tail part that is a model of a client side and a body part that is a model of a server side and learning is performed with a distributed learning method without directly sharing data” – which continues to recite the abstract idea of mathematical concepts of organizing data and mathematical transformation, mathematical operations, and mathematical relationships of claim 1.
Regarding dependent claim 3: “wherein the preparing and the passing and then transmitting comprises randomly shuffling, using the permutation module, patch permutation before transmitting patch features from a client side to the server and transmitting the same” – which continues to recite the abstract idea of mathematical concepts of organizing data and mathematical transformation, mathematical operations, and mathematical relationships of claim 1.
Regarding dependent claim 4: “wherein the permutation module is a random patch permutation module and configured to randomly patch-permutate data transmitted from a client side to a server side to transmit representational feature data in which original data is unidentifiable” – which continues to recite the abstract idea of mathematical concepts of organizing data and mathematical transformation, mathematical operations, and mathematical relationships of claim 1.
Regarding dependent claim 5: “wherein the permutation module is a random patch permutation module and configured to allow only some of model weights aggregated and distributed by a server side to be shared to make it infeasible to restore the entire data in reverse order” – which continues to recite the abstract idea of mathematical concepts of organizing data and mathematical transformation, mathematical operations, and mathematical relationships of claim 1.
Regarding dependent claim 6: “further comprising: performing, by the body part of the vision transformer model of the server, forward pass with permutated patch features and transmitting encoded features back to the client” – which continues to recite the abstract idea of mathematical concepts of organizing data and mathematical transformation, mathematical operations, and mathematical relationships of claim 1.
Regarding dependent claim 7: “further comprising: reversing, by the client, permutation with a stored key, transmitting reverted features to a task-specific tail part, and yielding a final output, wherein the preparing and the passing and then transmitting comprises randomly shuffling, using the permutation module, patch permutation before transmitting patch features from a client side to the server and storing the key to reverse the permutation on the client side” – which continues to recite the abstract idea of mathematical concepts of organizing data and mathematical transformation, mathematical operations, and mathematical relationships of claim 6.
Regarding dependent claim 8: “further comprising: reversing, by the client, permutation with a stored key, transmitting reverted features to a task-specific tail part, and yielding a final output, wherein the preparing and the passing and then transmitting comprises randomly shuffling, using the permutation module, patch permutation before transmitting patch features from a client side to the server and storing the key to reverse the permutation on the client side” – which continues to recite the abstract idea of mathematical concepts of organizing data and mathematical transformation, mathematical operations, and mathematical relationships of claim 6.
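By way of illustration only, the following is a minimal Python sketch of the kind of operations recited in independent claim 1: preparing a patch embedding for a client with a task non-specific patch embedder, passing it through a permutation module, and handing the permuted embedding to a server that stores it. The function names, array shapes, and the in-memory hand-off standing in for network transmission are hypothetical assumptions, not the applicant's disclosed implementation or the cited art.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_patches(image, patch_size=16, embed_dim=64):
    """Task non-specific patch embedder: split the image into non-overlapping
    patches and linearly project each patch to an embedding vector."""
    h, w, c = image.shape
    patches = (image.reshape(h // patch_size, patch_size, w // patch_size, patch_size, c)
                    .transpose(0, 2, 1, 3, 4)
                    .reshape(-1, patch_size * patch_size * c))
    projection = rng.normal(size=(patches.shape[1], embed_dim))
    return patches @ projection  # (num_patches, embed_dim)

def permute_patches(patch_embedding):
    """Permutation module: randomly shuffle the patch order before transmission."""
    key = rng.permutation(patch_embedding.shape[0])
    return patch_embedding[key], key

# Client side: prepare the patch embedding and pass it through the permutation module.
image = rng.random((224, 224, 3))
embedding = embed_patches(image)
permuted_embedding, permutation_key = permute_patches(embedding)

# "Transmission" to the server is simulated by an in-memory hand-off; the server
# stores the received embedding for later updates of the body and tail parts.
server_feature_storage = {"client_0": permuted_embedding}
```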
Regarding claim 9:
Step 1: Is the claim to a process, machine, manufacture, or composition of matter? Yes—claim 9, a method.
Step 2A, prong one: Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes. The claim recites “A transformation method using a distributed learning-based multi-task vision transformer through random patch permutation, the transformation method comprising: randomly shuffling, using a permutation module, patch permutation and storing a key to reverse permutation on a client side and then transmitting patch features from a client to a server; performing, by a body part of a vision transformer model of the server, forward pass with permutated patch features and transmitting encoded features back to the client; and reversing, by the client, the permutation with the stored key, passing reverted features to a task-specific tail part, and yielding a final output.” These limitations recite mathematical concepts (organizing data, mathematical transformations, mathematical operations, and mathematical relationships), an abstract idea that can be performed in the human mind and/or with the aid of pen and paper. (An illustrative sketch of the recited client-server round trip follows the discussion of the dependent claims below.)
Step 2A, prong two: Does the claim recite additional elements that integrate the judicial exception into a practical application? No. Although claim 9 recites a “server” and the use of “a model,” these additional elements are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using generic machines that gather data and feed it into readily available, off-the-shelf models.
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? No. For the same reasons, the additional elements, considered individually and in combination, amount to no more than mere instructions to apply the exception using generic computer components. See Alice Corp. Pty. Ltd. v. CLS Bank Int’l, 573 U.S. 208, 134 S. Ct. 2347, 2360 (2014).
For the reasons above, claim 9 is rejected as being directed to non-patentable subject matter under §101. The additional limitations of the dependent claims are addressed briefly below:
Regarding dependent claim 10: “wherein the transmitting the encoded features back to the client comprises performing, using the permutation module, back-propagation in order of tail, body, and head of the vision transformer model that is the opposite way of forward propagation” – which continues to recite the abstract idea of mathematical concepts of organizing data and mathematical transformation, mathematical operations, and mathematical relationships of claim 9.
Regarding dependent claim 11: “wherein the transmitting the patch features to the server comprises preparing, using a task non-specific patch embedder, a patch embedding for each client and passing the patch embedding through the permutation module and then transmitting the patch embedding to the server” – which continues to recite the abstract idea of mathematical concepts of organizing data and mathematical transformation, mathematical operations, and mathematical relationships of claim 9.
Regarding dependent claim 12: “further comprising: storing, by the server, the received patch embedding and using the patch embedding to update body and tail parts of a vision transformer model” – which continues to recite the abstract idea of mathematical concepts of organizing data and mathematical transformation, mathematical operations, and mathematical relationships of claim 9.
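By way of illustration only, the following minimal Python sketch walks through the round trip recited in claim 9: the client randomly shuffles the patch order and stores the permutation key, a stand-in for the server-side body performs a forward pass over the permuted features, and the client reverses the permutation with the stored key before running a task-specific tail. The stand-in functions and shapes are hypothetical assumptions, not the applicant's disclosed implementation or the cited art.

```python
import numpy as np

rng = np.random.default_rng(1)

def server_body_forward(features):
    """Stand-in for the server-side body of the vision transformer model;
    any feature-to-feature mapping suffices for this illustration."""
    return np.tanh(features)

def client_tail(features):
    """Stand-in for a task-specific tail part that yields a final output."""
    return features.mean(axis=0)

# Client: randomly shuffle the patch order and store the key needed to reverse it.
patch_features = rng.random((196, 64))          # e.g., 196 patch feature vectors
permutation_key = rng.permutation(patch_features.shape[0])
permuted_features = patch_features[permutation_key]

# Server: forward pass of the body over the permuted features; the encoded
# features are "transmitted" back to the client (here, simply returned).
encoded_features = server_body_forward(permuted_features)

# Client: reverse the permutation with the stored key, pass the reverted
# features to the task-specific tail part, and yield the final output.
inverse_key = np.argsort(permutation_key)
reverted_features = encoded_features[inverse_key]
final_output = client_tail(reverted_features)
```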
Regarding claim 13:
Step 1: Is the claim to a process, machine, manufacture, or composition of matter? No. Claim 13 recites “A distributed learning-based multi-task vision transformer,” which is not a process, machine, manufacture, or composition of matter; the claimed invention is therefore directed to non-statutory subject matter. The claim does not fall within at least one of the four categories of patent-eligible subject matter because the recited “distributed learning-based multi-task vision transformer,” claimed as a product without any structural recitations, does not have a physical or tangible form and is considered information (often referred to as “data per se”) or a computer program per se (often referred to as “software per se”). Claim 13 is therefore treated as software per se, which falls outside the statutory categories, because “[a]s the courts' definitions of machines, manufactures and compositions of matter indicate, a product must have a physical or tangible form in order to fall within one of these statutory categories.” (See MPEP 2106.03.)
Step 2A, prong one: Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes. The claim recites “A distributed learning-based multi-task vision transformer through random patch permutation, the transformer comprising: a head part configured to prepare, using a task non-specific patch embedder, a patch embedding for each client, to pass the patch embedding through a permutation module and then transmit the patch embedding to a server; and a feature storage configured to store the received patch embedding in the server and to use the patch embedding to update and tail parts of a vision transformer model.” These limitations recite mathematical concepts (organizing data, mathematical transformations, mathematical operations, and mathematical relationships), an abstract idea that can be performed in the human mind and/or with the aid of pen and paper.
Step 2A, prong two: Does the claim recite additional elements that integrate the judicial exception into a practical application? No. Although claim 13 recites a “server” and the use of “a model,” these additional elements are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using generic machines that gather data and feed it into readily available, off-the-shelf models.
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? No. For the same reasons, the additional elements, considered individually and in combination, amount to no more than mere instructions to apply the exception using generic computer components. See Alice Corp. Pty. Ltd. v. CLS Bank Int’l, 573 U.S. 208, 134 S. Ct. 2347, 2360 (2014).
For the reasons above, claim 13 is rejected as being directed to non-patentable subject matter under §101. The additional limitations of the dependent claims are addressed briefly below:
Regarding dependent claim 14: “wherein the vision transformer model that performs multi-task learning is separated into a head and tail part that is a model of a client side and a body part that is a model of a server side and learning is performed with a distributed learning method without directly sharing data” – which continues to recite software per se and the abstract idea of mathematical concepts of organizing data and mathematical transformation, mathematical operations, and mathematical relationships of claim 13.
Regarding dependent claim 15: “wherein the head part is configured to randomly shuffle, using the permutation module, patch permutation before transmitting patch features from a client side to the server and to transmit the same” – which continues to recite the software per se and the abstract idea of mathematical concepts of organizing data and mathematical transformation, mathematical operations, and mathematical relationships of claim 13.
Regarding dependent claim 16: “wherein the permutation module is a random patch permutation module and configured to randomly patch-permutate data transmitted from a client side to the server side to transmit representational feature data in which original data is unidentifiable”— which continues to recite the software per se and the abstract idea of mathematical concepts of organizing data and mathematical transformation, mathematical operations, and mathematical relationships of claim 13.
Regarding dependent claim 17: “wherein the permutation module is a random patch permutation module and configured to allow only some of model weights aggregated and distributed by the server side to be shared to make it infeasible to restore the entire data in reverse order” – which continues to recite the software per se and the abstract idea of mathematical concepts of organizing data and mathematical transformation, mathematical operations, and mathematical relationships of claim 13.
Regarding dependent claim 18: “further comprising: a body part configured to perform forward pass with permutated patch features in the body part of the vision transformer model of the server and to transmit encoded features back to the client” – which continues to recite the software per se and the abstract idea of mathematical concepts of organizing data and mathematical transformation, mathematical operations, and mathematical relationships of claim 13.
Regarding dependent claim 19: “further comprising: a tail part configured to reverse, by the client, permutation with a stored key, to transmit reverted features to a task-specific tail part, and to yield a final output, wherein the head part is configured to randomly shuffle, using the permutation module, patch permutation before transmitting patch features from a client side to the server and to store the key to reverse the permutation on the client side” – which continues to recite the software per se and the abstract idea of mathematical concepts of organizing data and mathematical transformation, mathematical operations, and mathematical relationships of claim 18.
Regarding dependent claim 20: “wherein the body part is configured to perform, using the permutation module, back-propagation in order of tail, body, and head of the vision transformer model that is the opposite way of forward propagation” – which continues to recite the software per se and the abstract idea of mathematical concepts of organizing data and mathematical transformation, mathematical operations, and mathematical relationships of claim 18.
Taken alone, the additional elements of the dependent claims above do not amount to significantly more than the above-identified judicial exception (the abstract idea). Viewing the limitations as an ordered combination adds nothing that is not already present when the elements are considered individually. There is no indication that the combination of elements improves the functioning of a computer or any other technology; their collective functions merely provide a conventional computer implementation.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-4, 6, 13-16, and 18 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Ma et al. (US 2023/0306723 A1, hereinafter Ma).
Regarding claims 1 and 13, taking claim 13 as exemplary:
Ma shows:
“A distributed learning-based multi-task vision transformer through random patch permutation, the transformer comprising: a head part configured to prepare, using a task non-specific patch embedder,” (Paragraph [0010]: “Unlike CNNs, Transformers measure the relationships between pairs of input tokens (words in the case of text strings), termed attention. The cost is exponential with the number of tokens. For images, the basic unit of analysis is the pixel. However, computing relationships for every pixel pair in a typical image is prohibitive in terms of memory and computation. Instead, ViT computes relationships among pixels in various small sections of the image (e.g., 16x16 pixels), at a drastically reduced cost. The sections (with positional embeddings) are placed in a sequence. The embeddings are learnable vectors. Each section is arranged into a linear sequence and multiplied by the embedding matrix. The result, with the position embedding is fed to the transformer. The architecture for image classification is the most common and uses only the Transformer Encoder in order to transform the various input tokens.” And in paragraph [0044]: “For transfer learning to the classification target tasks, the transformer is taken as a pre-trained model and a task-specific classification head is then added. The transfer learning performance of all pre-trained models are further assessed by fine-tuning all layers in the downstream networks.” And in paragraph [0123]: “such a ImageNet, with randomized initialization weights (e.g., via the transformer benchmarking platform 1145) to generate initialized vision transformer models corresponding to each of the pre-trained base models stored in memory via model training 1147”)
“a patch embedding for each client, to pass the patch embedding through a permutation module and then transmit the patch embedding to a server;” (Paragraph [0010]: “Unlike CNNs, Transformers measure the relationships between pairs of input tokens (words in the case of text strings), termed attention. The cost is exponential with the number of tokens. For images, the basic unit of analysis is the pixel. However, computing relationships for every pixel pair in a typical image is prohibitive in terms of memory and computation. Instead, ViT computes relationships among pixels in various small sections of the image (e.g., 16x16 pixels), at a drastically reduced cost. The sections (with positional embeddings) are placed in a sequence. The embeddings are learnable vectors. Each section is arranged into a linear sequence and multiplied by the embedding matrix. The result, with the position embedding is fed to the transformer. The architecture for image classification is the most common and uses only the Transformer Encoder in order to transform the various input tokens.” And in paragraph [0044]: “For transfer learning to the classification target tasks, the transformer is taken as a pre-trained model and a task-specific classification head is then added. The transfer learning performance of all pre-trained models are further assessed by fine-tuning all layers in the downstream networks.” And in paragraph [0121]: “FIG. 11 shows a diagrammatic representation of a system 1101 within which embodiments may operate, be installed, integrated, or configured. In accordance with one embodiment, there is a system 1101 having at least a processor 1190 and a memory 1195 therein to execute implementing application code 1194. Such a system 1101 may communicatively interface with and cooperatively execute with the benefit of remote systems, such as a user device sending instructions and data, a user device to receive as an output from the system 1101 a semantics-enriched pre-trained model having a trained encoder-decoder structure with generic feature extraction and refinement operations as performed by the system 1101, or systems within a networked or within a client-server environment, etc.”)
“and a feature storage configured to store the received patch embedding in the server and to use the patch embedding to update and tail parts of a vision transformer model.” (Paragraph [0044]: “For transfer learning to the classification target tasks, the transformer is taken as a pre-trained model and a task-specific classification head is then added. The transfer learning performance of all pre-trained models are further assessed by fine-tuning all layers in the downstream networks.”)
Regarding claims 2 and 14, taking claim 14 as exemplary:
Ma shows the method and transformer of claims 1 and 13 as claimed and specified above.
And Ma shows “wherein the vision transformer model that performs multi-task learning is separated into a head and tail part that is a model of a client side and a body part that is a model of a server side and learning is performed with a distributed learning method without directly sharing data.” (Paragraph [0044]: “For transfer learning to the classification target tasks, the transformer is taken as a pre-trained model and a task-specific classification head is then added. The transfer learning performance of all pre-trained models are further assessed by fine-tuning all layers in the downstream networks.” And in paragraph [0121]: “FIG. 11 shows a diagrammatic representation of a system 1101 within which embodiments may operate, be installed, integrated, or configured. In accordance with one embodiment, there is a system 1101 having at least a processor 1190 and a memory 1195 therein to execute implementing application code 1194. Such a system 1101 may communicatively interface with and cooperatively execute with the benefit of remote systems, such as a user device sending instructions and data, a user device to receive as an output from the system 1101 a semantics-enriched pre-trained model having a trained encoder-decoder structure with generic feature extraction and refinement operations as performed by the system 1101, or systems within a networked or within a client-server environment, etc.” And in paragraph [0151]: “The machine may operate in the capacity of a server or a client machine in a client-server network environment, as a peer machine in a peer-to-peer (or distributed) network environment, as a server or series of servers within an on-demand service environment.”)
Regarding claims 3 and 15, taking claim 15 as exemplary:
Ma shows the method and transformer of claims 1 and 13 as claimed and specified above.
And Ma shows “wherein the head part is configured to randomly shuffle, using the permutation module, patch permutation before transmitting patch features from a client side to the server and to transmit the same.” (Paragraph [0123]: “The system 1101 is further configured for initializing each of the pre-trained base models 1139 stored in memory 1195 using a standardized dataset 1140, such a ImageNet, with randomized initialization weights (e.g., via the transformer benchmarking platform 1145) to generate initialized vision transformer models corresponding to each of the pre-trained base models stored in memory via model training 1147.”)
Regarding claims 4 and 16, taking claim 16 as exemplary:
Ma shows the method and transformer of claims 1 and 13 as claimed and specified above.
And Ma shows “wherein the permutation module is a random patch permutation module and configured to randomly patch-permutate data transmitted from a client side to the server side to transmit representational feature data in which original data is unidentifiable.” (Paragraph [0055]: “Masked Image Modeling (MIM) is an approach in which portions of the input image signals are randomly masked and then the original input signals are subsequently recovered at the masked area. Such a technique has recently received great attention in computer vision for pre-training transformers in a self-supervised manner. The MIM-based self-supervised methods are widely accepted to capture more task-agnostic features than supervised pre-trained models, making them better suited for fine-tuning on various vision tasks.” In paragraph [0121]: “FIG. 11 shows a diagrammatic representation of a system 1101 within which embodiments may operate, be installed, integrated, or configured. In accordance with one embodiment, there is a system 1101 having at least a processor 1190 and a memory 1195 therein to execute implementing application code 1194. Such a system 1101 may communicatively interface with and cooperatively execute with the benefit of remote systems, such as a user device sending instructions and data, a user device to receive as an output from the system 1101 a semantics-enriched pre-trained model having a trained encoder-decoder structure with generic feature extraction and refinement operations as performed by the system 1101, or systems within a networked or within a client-server environment, etc.” And in paragraph [0123]: “The system 1101 is further configured for initializing each of the pre-trained base models 1139 stored in memory 1195 using a standardized dataset 1140, such a ImageNet, with randomized initialization weights (e.g., via the transformer benchmarking platform 1145) to generate initialized vision transformer models corresponding to each of the pre-trained base models stored in memory via model training 1147.”)
Regarding claims 6 and 18, taking claim 18 as exemplary:
Ma shows the method and transformer of claims 1 and 13 as claimed and specified above.
And Ma shows “further comprising: a body part configured to perform forward pass with permutated patch features in the body part of the vision transformer model of the server and to transmit encoded features back to the client.” (Paragraph [0010]: “Unlike CNNs, Transformers measure the relationships between pairs of input tokens (words in the case of text strings), termed attention. The cost is exponential with the number of tokens. For images, the basic unit of analysis is the pixel. However, computing relationships for every pixel pair in a typical image is prohibitive in terms of memory and computation. Instead, ViT computes relationships among pixels in various small sections of the image (e.g., 16x16 pixels), at a drastically reduced cost. The sections (with positional embeddings) are placed in a sequence. The embeddings are learnable vectors. Each section is arranged into a linear sequence and multiplied by the embedding matrix. The result, with the position embedding is fed to the transformer. The architecture for image classification is the most common and uses only the Transformer Encoder in order to transform the various input tokens.” In paragraph [0044]: “For transfer learning to the classification target tasks, the transformer is taken as a pre-trained model and a task-specific classification head is then added. The transfer learning performance of all pre-trained models are further assessed by fine-tuning all layers in the downstream networks.” And in paragraph [0121]: “FIG. 11 shows a diagrammatic representation of a system 1101 within which embodiments may operate, be installed, integrated, or configured. In accordance with one embodiment, there is a system 1101 having at least a processor 1190 and a memory 1195 therein to execute implementing application code 1194. Such a system 1101 may communicatively interface with and cooperatively execute with the benefit of remote systems, such as a user device sending instructions and data, a user device to receive as an output from the system 1101 a semantics-enriched pre-trained model having a trained encoder-decoder structure with generic feature extraction and refinement operations as performed by the system 1101, or systems within a networked or within a client-server environment, etc.” And in paragraph [0151]: “The machine may operate in the capacity of a server or a client machine in a client-server network environment, as a peer machine in a peer-to-peer (or distributed) network environment, as a server or series of servers within an on-demand service environment.”)
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 5 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Ma in view of Jacobs et al. (US 2022/0051133 A1, hereinafter Jacobs).
Regarding claims 5 and 17, taking claim 17 as exemplary:
Ma shows the method and transformer of claims 1 and 13 as claimed and specified above.
But Ma does not appear to explicitly recite “wherein the permutation module is a random patch permutation module and configured to allow only some of model weights aggregated and distributed by the server side to be shared to make it infeasible to restore the entire data in reverse order.”
However, Jacobs teaches “wherein the permutation module is a random patch permutation module and configured to allow only some of model weights aggregated and distributed by the server side to be shared to make it infeasible to restore the entire data in reverse order.” (Paragraph [0081]: “3) Computer-based healthcare systems: Due to privacy concerns, hospitals cannot share training data but still might benefit from knowledge transfer, which can be achieved using embodiments of the present invention by sharing only metadata and model weights, potentially in anonymized form. Prediction tasks here include, but are not limited to, segmentation of normal structures and segmentation of white matter lesions in brain magnetic resonance imaging (MRI) and treating electronic health record (EHR) systems as different tasks.”)
Ma and Jacobs are analogous art because both describe machine learning models operating on image data.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Ma and Jacobs before him or her, to modify the teachings of Ma to include the teachings of Jacobs in order to increase the privacy of Ma's system and thereby increase marketability (see Jacobs, paragraph [0081]).
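For illustration of the partial weight sharing relied on from Jacobs, the following is a minimal, hypothetical Python sketch in which only a named subset of server-aggregated model weights is redistributed; the parameter names and dictionary-based aggregation are assumptions of this illustration and do not reproduce the code of either reference.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical model weights aggregated on the server side (for example, by
# averaging client updates); only a named subset is redistributed to clients.
aggregated_weights = {
    "body.block0": rng.normal(size=(64, 64)),
    "body.block1": rng.normal(size=(64, 64)),
    "patch_embed": rng.normal(size=(768, 64)),
}
shared_names = {"body.block0", "body.block1"}   # weights the server chooses to share

shared_weights = {name: w for name, w in aggregated_weights.items() if name in shared_names}

# Clients receive only `shared_weights`; with the remaining weights withheld, a
# client (or an eavesdropper) lacks the full model needed to run the pipeline in
# reverse and reconstruct the original data from the transmitted features.
```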
Claims 8 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Ma in view of Mahto et al. (US 2019/0244064 A1, hereinafter Mahto).
Regarding claims 8 and 20, taking claim 20 as exemplary:
Ma shows the method and transformer of claims 6 and 18 as claimed and specified above.
But Ma does not appear to explicitly recite “wherein the body part is configured to perform, using the permutation module, back-propagation in order of tail, body, and head of the vision transformer model that is the opposite way of forward propagation.”
However, Mahto teaches “wherein the body part is configured to perform, using the permutation module, back-propagation in order of tail, body, and head of the vision transformer model that is the opposite way of forward propagation.” (Paragraph [0055]: “The parameter updater 140 updates the parameters of the feature transformer 110 and the classifier 120 according to the cost which is minimized by using popular numerical methods such as back propagation. This process of the pattern recognition apparatus 100 keeps going till convergence when the cost can be reduced no more. After convergence, the parameter updater 140 stores the parameters of the trained feature transformer 110 in the storage 150. The parameter updater 140 or the feature transformer 110 may store structure of the feature transformer 110.”)
Ma and Mahto are analogous art because both describe models employing feature transformation with transformers.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Ma and Mahto before him or her, to modify the teachings of Ma to include the teachings of Mahto in order to minimize the training cost of pattern recognition through back-propagation (see Mahto, paragraph [0055]).
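For illustration of back-propagation proceeding in the order tail, body, head (the reverse of forward propagation), the following minimal Python sketch uses three hypothetical linear stages standing in for the claimed head, body, and tail parts; it is an illustrative assumption, not the applicant's disclosed implementation or the code of either reference.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical linear stages standing in for the claimed parts:
# head (client side) -> body (server side) -> tail (client side).
W_head = rng.normal(size=(8, 8))
W_body = rng.normal(size=(8, 8))
W_tail = rng.normal(size=(8, 8))
x = rng.normal(size=(1, 8))

# Forward propagation runs head -> body -> tail.
h = x @ W_head
b = h @ W_body
y = b @ W_tail
loss = 0.5 * np.sum(y ** 2)

# Back-propagation runs in the opposite order: tail first, then body, then head.
grad_y = y                        # dLoss/dy for the squared-error loss above
grad_W_tail = b.T @ grad_y        # tail gradient computed first
grad_b = grad_y @ W_tail.T
grad_W_body = h.T @ grad_b        # then the body gradient
grad_h = grad_b @ W_body.T
grad_W_head = x.T @ grad_h        # the head gradient comes last
```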
Allowable Subject Matter
Claims 7 and 19 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 101 set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:
As per claims 7 and 19, taking claim 19 as exemplary:
Though Pang et al., (US 2024/0078666 A1), part of the prior art made of record, teaches the use of patch embeddings for a visual transformer model in paragraph [0020] through the use of “an image transformer operates by dividing an image into fixed-size patches, correctly embedding each of the patches, and concatenating positional embedding as an input to a transformer encoder.”
And though Jia et al. (Visual Prompt Tuning, 20 July 2022), part of the prior art made of record, teaches the use of embeddings for a visual transformer model on page 4, paragraph 2, through “Visual-Prompt Tuning (VPT) for adapting large pre-trained vision Transformer models. VPT injects a small number of learnable parameters into Transformer’s input space and keeps the backbone frozen during the downstream training stage.”
And though Ma et al. (US 2023/0306723 A1), part of the prior art made of record, teaches the use of patch embeddings for a visual transformer model in paragraph [0010], where sections with positional embeddings are fed to the transformer for image classification.
The primary reason for the indication of allowable subject matter in dependent claims 7 and 19 (taking claim 19 as exemplary) is the inclusion in these claims of the limitations of a method and transformer comprising:
“a tail part configured to reverse, by the client, permutation with a stored key, to transmit reverted features to a task-specific tail part, and to yield a final output, wherein the head part is configured to randomly shuffle, using the permutation module, patch permutation before transmitting patch features from a client side to the server and to store the key to reverse the permutation on the client side.”
The prior art made of record above neither anticipates nor renders obvious the above-recited combination. Specifically, though the prior art made of record teaches embeddings with image classification, it does not teach a tail part configured to reverse, by the client, permutation with a stored key, to transmit reverted features to a task-specific tail part, and to yield a final output, wherein the head part is configured to randomly shuffle, using the permutation module, patch permutation before transmitting patch features from a client side to the server and to store the key to reverse the permutation on the client side.
Claims 9-12 would be allowable if rewritten or amended to overcome the rejection(s) under 35 U.S.C. 101 set forth in this Office action.
The following is a statement of reasons for the indication of allowable subject matter:
As per claim 9:
Though Pang et al., (US 2024/0078666 A1), part of the prior art made of record, teaches the use of patch embeddings for a visual transformer model in paragraph [0020] through the use of “an image transformer operates by dividing an image into fixed-size patches, correctly embedding each of the patches, and concatenating positional embedding as an input to a transformer encoder.”
And though Jia et al. (Visual Prompt Tuning, 20 July 2022), part of the prior art made of record, teaches the use of embeddings for a visual transformer model on page 4, paragraph 2, through “Visual-Prompt Tuning (VPT) for adapting large pre-trained vision Transformer models. VPT injects a small number of learnable parameters into Transformer’s input space and keeps the backbone frozen during the downstream training stage.”
And though Ma et al. (US 2023/0306723 A1), part of the prior art made of record, teaches the use of patch embeddings for a visual transformer model in paragraph [0010], where sections with positional embeddings are fed to the transformer for image classification.
The primary reason for the indication of allowable subject matter in independent claim 9 is the inclusion in the claim of the limitations of a method comprising:
“A transformation method using a distributed learning-based multi-task vision transformer through random patch permutation, the transformation method comprising: randomly shuffling, using a permutation module, patch permutation and storing a key to reverse permutation on a client side and then transmitting patch features from a client to a server; performing, by a body part of a vision transformer model of the server, forward pass with permutated patch features and transmitting encoded features back to the client; and reversing, by the client, the permutation with the stored key, passing reverted features to a task-specific tail part, and yielding a final output.”
The prior art made of record above neither anticipates nor renders obvious the above-recited combination. Specifically, though the prior art made of record teaches embeddings with image classification, it does not teach a transformation method using a distributed learning-based multi-task vision transformer through random patch permutation, the transformation method comprising: randomly shuffling, using a permutation module, patch permutation and storing a key to reverse permutation on a client side and then transmitting patch features from a client to a server; performing, by a body part of a vision transformer model of the server, forward pass with permutated patch features and transmitting encoded features back to the client; and reversing, by the client, the permutation with the stored key, passing reverted features to a task-specific tail part, and yielding a final output.
Dependent claims 10-12 would likewise be allowable at least for the reasons recited above, as they include all of the limitations of claim 9.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Pang et al., (US 2024/0078666 A1), part of the prior art made of record, teaches the use of patch embeddings for a visual transformer model of claims 1, 9, and 13 in paragraph [0020] through the use of “an image transformer operates by dividing an image into fixed-size patches, correctly embedding each of the patches, and concatenating positional embedding as an input to a transformer encoder.”
Jia et al. (Visual Prompt Tuning, 20 July 2022), part of the prior art made of record, teaches the use of embeddings for a visual transformer model of claims 1, 9, and 13 on page 4, paragraph 2, through “Visual-Prompt Tuning (VPT) for adapting large pre-trained vision Transformer models. VPT injects a small number of learnable parameters into Transformer’s input space and keeps the backbone frozen during the downstream training stage.”
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHANE D WOOLWINE whose telephone number is (571)272-4138. The examiner can normally be reached M-F 9:30-6:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, MIRANDA HUANG can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
SHANE D. WOOLWINE
Primary Examiner
Art Unit 2124
/SHANE D WOOLWINE/Primary Examiner, Art Unit 2124