DETAILED ACTION
This action is responsive to the application filed on 11/10/2025. Claims 1-5, 7-19, and 21-24 are pending and have been examined. This action is non-final (RCE).
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, 365(c), or 386(c) is acknowledged.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 11/10/2025 has been entered.
Response to Arguments
Argument 1: Applicant argues that the claims recite subject matter that is in line with the guidance issued on 8/6/2025.
Examiner Response to Argument 1: The examiner has carefully considered the arguments set forth above; however, they are not persuasive. As set forth in the current 101 analysis, even as amended, independent claim 1 (and claims depending therefrom) is still directed, under Step 2A Prong 1, to generating feature vectors, interpreting those feature vectors to produce different “interpretations” (including semantic segmentations and text), and using those interpretations for functions such as searching images, captioning images, providing navigation or instructions, or communicating with a human operator, all of which, as drafted, amount to evaluation, interpretation, and use of information, which are mental processes. The amendments (e.g., expressly reciting that at least one of the different tasks comprises semantic segmentation and generating text data, and that a natural language processing decoder generates text data from a final feature vector) merely refine the type of data interpretation and text generation already recited and do not add any additional element that improves the functioning of the computer itself or otherwise integrates the abstract idea into a practical application under Step 2A Prong 2. Instead, the claim continues to implement these mental processes on generic computer components, such as processors, memories, a camera, a “machine,” and neural-network encoders/decoders performing their conventional roles, which, as explained in the current mapping, are well-understood, routine, and conventional and do not provide “significantly more” under Step 2B. Accordingly, notwithstanding Applicant’s citation to more recent guidance generally addressing AI-related subject matter, the claims as amended remain directed to a judicial exception (a mental process and, in some instances, mathematical concepts) without additional elements that integrate that exception into a practical application or add an inventive concept, and the 101 rejection is therefore maintained.
Argument 2: Applicant contends that the OA acknowledges Wang does not describe “wherein the text data is used to perform at least one of searching the images captured by the camera, captioning the images, providing navigation or instructions to the machine, or communicating with a human operator of the machine” but nevertheless relies on Wang for that feature. Applicant further asserts that Wang’s model “generates an answer prediction from the encoded visual dialogue input based on dialogue input from a human user” and therefore does not describe “the natural language processing decoder generates text data by operating on a single input that is a final one of the feature vectors received from the unified encoder” as recited in claim 1. Applicant argues that Shazeer and Teichmann likewise fail to describe this “single input / final feature vector” configuration and therefore independent claims 1, 5, and 19 (and their dependents) are allowable over the cited references.
Examiner Response to Argument 2: The examiner has considered the argument set forth above; however, it is not persuasive because it mischaracterizes how the rejection relies on the references and because, in a rejection under 35 U.S.C. 103, no single reference is required to disclose all limitations. As set forth in the current mapping, the mapping does not rely on Wang to teach the limitation “wherein the natural language processing decoder generates the text data by operating on a single input that is a final one of the feature vectors received from the unified encoder.” Rather, Shazeer is relied on for a computer-implemented system with processors, memories, and executable instructions implementing a machine-learning model and text-generating tasks; Teichmann is relied on for a unified encoder that produces shared feature vectors used by task-specific decoders (including semantic segmentation) in a multi-task architecture; Wang is relied on for using text data to perform searching of images, captioning images, and communicating with a human operator; and Vinyals is relied on specifically for the natural language processing decoder that operates on a single final feature vector output by the encoder to generate text data. As explained in the mapping for claim 1 (and similarly for claims 5 and 19), Vinyals discloses a CNN image encoder whose last hidden layer is used “as an input to the RNN decoder that generates sentences” and states that “the image I is only input once,” i.e., a single fixed-length vector produced by the encoder is supplied as the sole image input to the decoder RNN that generates sentences. The examiner interprets this last hidden-layer feature vector, provided once as the only image input to the RNN decoder that “generates sentences,” to be the same as “a single input that is a final one of the feature vectors received from the unified encoder” being supplied to a natural language processing decoder that generates text data, because both describe a natural-language decoder that takes the encoder’s final feature representation (a single feature vector produced by the encoder) as its only input for text generation. For independent claims 1, 5, and 19 specifically, the overall combination remains obvious. Shazeer supplies the general multi-task, multi-modal ML system with text-generating tasks; Teichmann supplies the unified image encoder and semantic-segmentation decoder in a multi-task encoder/decoder architecture; Wang supplies the use of generated text data to search images, caption images, and communicate with a human operator; and Vinyals supplies the particular natural-language processing (NLP) decoder that operates on a single final feature vector from an image encoder to generate sentences. It would have been obvious to a person of ordinary skill in the art, before the effective filing date, to incorporate the well-known Neural Image Caption (NIC)-style encoder/decoder configuration from Vinyals into the multi-task image and text system of Shazeer/Teichmann so that the text-generating portion is implemented as a dedicated NLP decoder that consumes the encoder’s final image feature vector and outputs sentences, and to further use those sentences for image search, captioning, and human-to-machine communication as taught by Wang.
Doing so represents a predictable use of known encoder/decoder image captioning techniques (Vinyals) in the context of multi-task visual perception and dialogue systems (Shazeer, Teichmann, Wang) and thus would have been a routine design choice for a skilled practitioner seeking to generate text interpretations from shared image feature vectors. Accordingly, the combination of Shazeer, Teichmann, Wang, and Vinyals still renders independent claims 1, 5, and 19 obvious, in spite of Applicant’s observation that Wang alone does not describe the “single input / final feature vector” NLP decoder.
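For purposes of illustration only, the NIC-style configuration relied on from Vinyals, in which the encoder’s final feature vector is supplied once as the sole image input to a decoder that generates sentences, may be sketched as follows (this sketch and all identifiers in it are hypothetical and are not drawn from the claims or the cited references):

import torch
import torch.nn as nn

class NICStyleDecoder(nn.Module):
    # Hypothetical sketch: the encoder's final feature vector is fed once,
    # as the first step of the sequence, to an RNN that generates a sentence.
    def __init__(self, feat_dim=512, embed_dim=256, hidden_dim=256, vocab_size=10000):
        super().__init__()
        self.img_proj = nn.Linear(feat_dim, embed_dim)  # maps the final feature vector into the word-embedding space
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, final_feature_vec, prev_words):
        # The image representation is input only once (cf. "the image I is only input once").
        img_step = self.img_proj(final_feature_vec).unsqueeze(1)  # (B, 1, E)
        word_steps = self.embed(prev_words)                       # (B, T, E)
        states, _ = self.rnn(torch.cat([img_step, word_steps], dim=1))
        return self.out(states)                                   # per-step vocabulary logits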
Argument 3: Applicant further asserts that “independent claims 1, 5, and 19 are allowable over the cited references” and that “the dependent claims are submitted to be allowable over the cited references in the same manner, at least because they are dependent on the independent claims and thus contain all the limitations of the independent claims.”
Examiner Response to Argument 3: The examiner has considered the argument set forth above; however, it is not persuasive because the dependent claims have been individually addressed and rejected under 35 U.S.C. 103 based on additional specific teachings in the cited art, beyond those applied to the independent claims. Accordingly, Applicant’s general assertion that the dependent claims are allowable “in the same manner” as the independent claims does not overcome the specific 103 rejections of those dependent claims.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition
of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the
conditions and requirements of this title.
Claims 1-5, 7-19, and 21-24 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claim 1,
Step 1 – Is the claim to a process, machine, manufacture, or composition of matter?
The claim is directed to a system, which falls under the category of machine. Step 1 is satisfied.
Step 2A Prong 1:
“generate the one or more feature vectors useful for performing, based on the one or more feature vectors, a plurality of different tasks each comprising different interpretations of the data; and the different tasks comprise semantic segmentation and generating text data; interpreting the one or more feature vectors so as to decode one or more of the feature vectors… or communicating with a human operator of the machine” -- The limitation is directed to analyzing data and producing interpretations (semantic segmentations and text) from internal representations, and communication with a human operator. As recited, this is evaluation and interpretation of information, which can be performed mentally using observation, reasoning, and judgment, and therefore recites a mental process.
Step 2A Prong 2 and Step 2B:
“A computer implemented system for interpreting data using machine learning, comprising: one or more processors; one or more memories; and one or more computer executable instructions embedded on the one or more memories, wherein the computer executable instructions are configured to execute…a unified encoder comprising a neural network encoding data of the images into one or more feature vectors, wherein the unified encoder is trained using machine learning to…a plurality of decoders connected to the unified encoder, each of the decoders comprising a neural network…a plurality of decoders, including a semantic segmentation decoder and a natural language processing decoder connected to the unified encoder, each of the decoders comprising a neural network”-- This limitation recites a system for machine learning that contains instructions to apply the limitations of the claim onto a computer, i.e., the processors/memories and the computer system itself, which cannot integrate the judicial exception into a practical application, nor provide significantly more than the judicial exception (see MPEP 2106.05(f)).
“to output one of the interpretations and wherein the natural language processing decoder generates the text data by operating on a single input that is a final one of the feature vectors received from the unified encoder; and an output to the machine comprising the semantic segmentation and the text data,” -- The limitation recites outputting interpretations, the natural language processing decoder generating the text data by operating on a single input of feature vectors received from the encoder, and an output to the machine that comprises semantic segmentation and text data. The limitation is directed to an insignificant, extra-solution activity that cannot be integrated into a practical application (see MPEP 2106.05(g)). Furthermore, under Step 2B, transmitting data over a network is a well-understood, routine, and conventional activity (WURC), and does not provide significantly more than the judicial exception (see MPEP 2106.05(d)(II)).
“wherein the text data is used to perform at least one of searching the images captured by the camera, captioning the images, providing navigation or instructions to the machine,”-- The limitation is directed to text data that is used to perform searching of images captured by a camera, image captioning, and machine communications/instructions. The limitation amounts to mere instructions to apply the abstract idea on a computer, and thus it does not integrate the exception into a practical application, nor provide significantly more than the judicial exception (see MPEP 2106.05(f)).
Therefore, claim 1 is not patent eligible.
Regarding claim 2,
Step 1: The claim depends on claim 1 and is thus directed to a system, which falls under the category of machine. The claim satisfies Step 1.
Step 2A Prong 1:
“The computer implemented system of claim 1, wherein the different interpretations comprise at least one of a different classification or a conversion of the data into a different data format.” – The limitation is directed to interpretations of data that include classifications or conversions of the data into different formats, which is directed to a mental process.
There are no elements to be evaluated under step 2A Prong 2 and Step 2B.
Therefore, claim 2 is not patent eligible.
Regarding claim 3,
Step 1: The claim depends on claim 1 and is thus directed to a system, which falls under the category of machine. The claim satisfies Step 1.
There are no elements to be evaluated under Step 2A Prong 1.
Step 2A Prong 2 and Step 2B:
“The system of claim 1, wherein the data comprises first image data, and the different interpretations comprise at least one of text data, second image data, or semantic segmentation” – This limitation recites that the data further comprises first image data and that the different interpretations comprise text data, second image data, or semantic segmentation. This merely limits the judicial exception to a particular field of use or environment and cannot integrate it into a practical application, nor provide significantly more than the judicial exception (see MPEP 2106.05(h)).
Therefore, claim 3 is not patent eligible.
Regarding claim 4,
Step 1: The claim depends on claim 1 and is thus directed to a system, which falls under the category of machine. The claim satisfies Step 1.
There are no elements to be evaluated under step 2A Prong 1.
Step 2A Prong 2 and Step 2B:
“The system of claim 1, wherein the different tasks comprise image captioning or natural language processing, semantic segmentation, and image reconstruction.” – This limitation recites that the tasks first introduced in claim 1 further include image captioning or natural language processing, semantic segmentation, and image reconstruction, which merely applies the mental process of claim 1 to a particular use/field of use and cannot integrate it into a practical application, nor provide significantly more than the judicial exception (see MPEP 2106.05(h)).
Therefore, claim 4 is not patent eligible.
Regarding claim 5,
Step 1: The claim is directed to a system, which falls under the category of machine. The claim satisfies Step 1.
Step 2A Prong 1:
“to generate the one or more feature vectors useful for performing a plurality of different tasks each comprising different interpretations of data;…interpreting the one or more feature vectors so as to decode one or more feature vectors to output one of the interpretations;” -- This limitation is directed to generating feature vectors for performing tasks based on interpretation of data, which is directed to a process that can be performed mentally using observation, evaluation, and judgment, and thus is considered a mental process.
Step 2A Prong 2 and Step 2B:
“A computer implemented system for interpreting data using machine learning, comprising: an application-specific integrated circuit (ASIC) for an artificial neural network, the ASIC comprising one or more processors and one or more memories configured to execute: a unified encoder comprising a neural network encoding data into one or more feature vectors, wherein the unified encoder is trained using machine learning… a plurality of decoders including a semantic segmentation decoder and a natural language processing decoder connected to the unified encoder, each of the decoders comprising a neural network…wherein the natural language processing decoder generates text data by operating on a single input that is a final one of the feature vectors from the unified encoder:” -- The limitation recites a system for data interpretation using machine learning that comprises an integrated circuit for an ANN configuring processors and memories to execute the preceding instructions on the computer/system/machine, which cannot integrate the judicial exception into a practical application, nor provide significantly more than the judicial exception (see MPEP 2106.05(f)).
Therefore, claim 5 is not patent eligible.
Regarding claim 7,
Step 1: The claim is directed to a system, which is directed to a machine. The claim satisfies Step 1.
There are no elements to be evaluated under step 2A Prong 1.
Step 2A Prong 2 and Step 2B:
“The system of claim 1, wherein: the system comprises a distributed network of the processors, the unified encoder is modular so that the unified encoder can be transmitted between different ones of the processors and executed or trained on each of the different ones of the processors, and the decoders can be executed on different ones of the processors.” – The limitation recites a system that is a “distributed”/transmittable network over a computer (processor), which falls under insignificant, extra-solution activity that cannot be integrated into a practical application (see MPEP 2106.05(g)). Furthermore, under Step 2B, the act of a distributed network transmitting data from the encoder is a well-understood, routine, and conventional activity (WURC), and cannot provide significantly more than the judicial exception (see MPEP 2106.05(d)(II)).
Therefore, claim 7 is not patent eligible.
Regarding claim 8,
Step 1: The claim is directed to a system, which is directed to a machine. The claim satisfies Step 1.
Step 2A Prong 1:
“concatenates the first output and the second output to form a combined output;” – This limitation recites concatenating outputs to form a combined output, i.e., linking outputs together to produce a combined output, which can be performed as a mental process.
Step 2A Prong 2 and Step 2B:
“the data comprises an image; one of the decoders comprises a semantic segmentation decoder executing a decoder convolution layer and deconvolution layers, the semantic segmentation decoder:” – This limitation recites that the data further comprises images and a decoder that will execute (apply on a computer) convolution/deconvolution layers, which cannot integrate the exception into a practical application, nor provide significantly more than the judicial exception (see MPEP 2106.05(f)).
“the unified encoder executes a plurality of encoder convolution layers so as to output a first one of the feature vectors comprising an intermediate feature vector after a first plurality of the convolution layers and a final feature vector after all the convolution layers; receives the intermediate feature vector and passing the intermediate feature vector through the decoder convolution layer to form a first output; receives the final feature vector and passing the final feature vector through a first one of the deconvolution layers to form a second output; passes the combined output through at least a second one of the deconvolution layers to form the one of the interpretations.” -- This limitation recites an encoder that executes convolution layers to output feature vectors (an intermediate feature vector and a final feature vector) and decoder layers that receive and pass those vectors through convolution and/or deconvolution layers, which is directed to mere data gathering and data manipulation, and thus is an insignificant, extra-solution activity that cannot be integrated into a practical application (see MPEP 2106.05(g)). Furthermore, under Step 2B, the act of transmitting data over a network is well-understood, routine, and conventional (WURC) and thus cannot be considered significantly more than the judicial exception (see MPEP 2106.05(d)(II)).
Therefore, claim 8 is not patent eligible.
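For purposes of illustration only, the decoder topology recited in claim 8 (a decoder convolution layer applied to an intermediate feature vector, a first deconvolution applied to the final feature vector, concatenation into a combined output, and at least a second deconvolution) may be sketched as follows (hypothetical identifiers not drawn from the claims or the cited art; the sketch assumes the intermediate features have twice the spatial resolution of the final features):

import torch
import torch.nn as nn

class SegmentationDecoder(nn.Module):
    def __init__(self, inter_ch=128, final_ch=256, num_classes=21):
        super().__init__()
        self.skip_conv = nn.Conv2d(inter_ch, 64, kernel_size=1)                      # decoder convolution layer (intermediate features)
        self.deconv1 = nn.ConvTranspose2d(final_ch, 64, 4, stride=2, padding=1)      # first deconvolution layer (final features)
        self.deconv2 = nn.ConvTranspose2d(128, num_classes, 4, stride=2, padding=1)  # second deconvolution layer

    def forward(self, intermediate, final):
        first = self.skip_conv(intermediate)          # first output
        second = self.deconv1(final)                  # second output, upsampled to match `first`
        combined = torch.cat([first, second], dim=1)  # combined output
        return self.deconv2(combined)                 # per-pixel class scores (the interpretation)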
Regarding claim 9,
Step 1: The claim depends on claim 8, which depends on claim 1, and is thus directed to a system, which falls under the category of machine. The claim satisfies Step 1.
Step 2A Prong 1:
“concatenates the reduced dimension feature vector with a previous hidden state and previous hidden word, if necessary; to form a concatenated layer” – This limitation is directed to concatenating vectors from previous words or state; the act of concatenating vectors is considered to be a mental process.
Step 2A Prong 2 and Step 2B:
“receives only the final feature vector; passes the flattened feature vector through a fully connected layer to reduce a number of dimensions and form a reduced dimension feature vector; inputs the concatenated layer to a bidirectional GRU layer to form a GRU output; passes the GRU output through at least one fully connected layer so as to reduce a number of dimensions and form another of the interpretations comprising a word output.” – This limitation recites receiving vectors, passing the vectors through layers in a network, and passing the GRU output through layers to form a word output, and thus is an insignificant, extra-solution activity that cannot be integrated into a practical application (see MPEP 2106.05(g)). Furthermore, under Step 2B, the act of transmitting data over a network is well-understood, routine, and conventional (WURC) and thus cannot be considered significantly more than the judicial exception (see MPEP 2106.05(d)(II)).
Therefore, claim 9 is not patent eligible.
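For purposes of illustration only, the word-decoder steps recited in claim 9 (flatten the final feature vector, reduce its dimensions with a fully connected layer, concatenate with the previous hidden state and previous word, pass through a bidirectional GRU, and reduce to a word output) may be sketched as follows (hypothetical identifiers; the previous hidden state is assumed to have the reduced dimension):

import torch
import torch.nn as nn

class GRUWordDecoder(nn.Module):
    def __init__(self, feat_dim=2048, reduced=256, hidden=256, vocab_size=10000):
        super().__init__()
        self.reduce = nn.Linear(feat_dim, reduced)    # fully connected layer reducing dimensions
        self.embed = nn.Embedding(vocab_size, reduced)
        self.gru = nn.GRU(3 * reduced, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, vocab_size)  # reduces dimensions to a word output

    def forward(self, final_feature_vec, prev_hidden, prev_word):
        flat = torch.flatten(final_feature_vec, start_dim=1)  # flattened feature vector
        concat = torch.cat([self.reduce(flat), prev_hidden,
                            self.embed(prev_word)], dim=1)    # concatenated layer
        gru_out, _ = self.gru(concat.unsqueeze(1))            # bidirectional GRU output
        return self.out(gru_out.squeeze(1))                   # vocabulary logits (word output)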
Regarding claim 10,
Step 1: The claim is directed to a system, which falls under the category of machine. The claim satisfies Step 1.
Step 2A Prong 1:
“successively deconvolutes the final feature vector through a plurality of deconvolution layers so as to reconstruct the data comprising an image” – The limitation is directed to deconvolution of feature vectors through layers of the neural network for image reconstruction, which involves mathematical calculations, and thus is directed to a mathematical concept.
Step 2A Prong 2 and Step 2B:
“The system of claim 9, wherein another one of the decoders comprises an image reconstruction decoder:” – This limitation merely limits the claim to a particular environment or field of use, and thus cannot integrate the exception into a practical application, nor provide significantly more than the judicial exception (see MPEP 2106.05(h)).
Therefore, claim 10 is not patent eligible.
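For purposes of illustration only, an image reconstruction decoder that successively deconvolutes the final feature vector, as recited in claim 10 (with ReLU-activated hidden layers as recited in claim 11), may be sketched as follows (hypothetical identifiers not drawn from the claims or the cited art):

import torch.nn as nn

class ReconstructionDecoder(nn.Module):
    def __init__(self, final_ch=256):
        super().__init__()
        self.net = nn.Sequential(                                       # successive deconvolution layers
            nn.ConvTranspose2d(final_ch, 128, 4, stride=2, padding=1),
            nn.ReLU(),                                                  # RELU activation on hidden layers
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),          # reconstructed 3-channel image
        )

    def forward(self, final_feature_map):
        return self.net(final_feature_map)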
Regarding claim 11,
Step 1: The claim depends on claim 10, which depends on claim 9, and is thus directed to a system, which falls under the category of machine. The claim satisfies Step 1.
There are no elements to be evaluated under Step 2A prong 1.
Step 2A Prong 2 and Step 2B:
“The system of claim 10, wherein hidden layers in the image reconstruction decoder and the semantic segmentation decoder are equipped with RELU activation.” – This limitation merely limits the claim to a particular environment or field of use, and thus cannot integrate the exception into a practical application, nor provide significantly more than the judicial exception (see MPEP 2106.05(h)).
Therefore, claim 11 is not patent eligible.
Regarding claim 12,
Step 1: The claim depends on claim 8, which depends on claim 1, and is thus directed to a system, which falls under the category of machine. The claim satisfies Step 1.
There are no elements to be evaluated under Step 2A Prong 1.
Step 2A Prong 2 and Step 2B:
“The system of claim 8, wherein the unified encoder comprises a spatial pyramid pooling layer after the convolution layers.” – The limitation recites a unified encoder that further comprises a spatial pyramid pooling layer after the convolution layers, which merely further limits the unified encoder to a particular field of use/environment for the convolution layers, and thus cannot integrate the exception into a practical application, nor provide significantly more than the judicial exception (see MPEP 2106.05(h)).
Therefore, claim 12 is not patent eligible.
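For purposes of illustration only, a spatial pyramid pooling layer placed after the convolution layers, as recited in claim 12, may be sketched as follows (hypothetical identifiers; the layer pools the last convolutional feature map at several grid sizes and concatenates the results into one fixed-length vector):

import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialPyramidPooling(nn.Module):
    def __init__(self, grid_sizes=(1, 2, 4)):
        super().__init__()
        self.grid_sizes = grid_sizes

    def forward(self, conv_features):            # (B, C, H, W) from the convolution layers
        pooled = [F.adaptive_max_pool2d(conv_features, g).flatten(1)
                  for g in self.grid_sizes]      # one fixed-size pooling per grid size
        return torch.cat(pooled, dim=1)          # fixed-length vector of C * (1 + 4 + 16) dims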
Regarding claim 13,
Step 1: The claim is directed to a system, which is directed to a machine. The claim satisfies Step 1.
There are no elements to be evaluated under step 2A Prong 1.
Step 2A Prong 2 and Step 2B:
“The system of claim 1, wherein the different tasks comprise terrain classification and image captioning.” – This limitation merely limits the claim to a particular environment or field of use, and thus cannot integrate the exception into a practical application, nor provide significantly more than the judicial exception (see MPEP 2106.05(h)).
Therefore, claim 13 is not patent eligible.
Regarding claim 14,
Step 1: The claim is directed to a system, which is directed to a machine. The claim satisfies Step 1.
There are no elements to be evaluated under step 2A Prong 1.
Step 2A Prong 2 and Step 2B:
“The system of claim 1, wherein the unified encoder comprises a RES NET neural network, an Xception neural network, or a MobileNet neural network.” -- This limitation merely limits the claim to a particular environment or field of use, and thus cannot integrate the exception into a practical application, nor provide significantly more than the judicial exception (see MPEP 2106.05(h)).
Therefore, claim 14 is not patent eligible.
Regarding claim 15,
Step 1: The claim is directed to a system, which is directed to a machine. The claim satisfies Step 1.
Step 2A Prong 1:
“wherein the machine utilizes one or more of the interpretations for operation of the machine.” – The limitation is directed to one or more interpretations being utilized for operation of a machine, which can be performed in the human mind with or without the aid of a machine and is thus directed to a mental process.
Step 2A Prong 2 and Step 2B:
“The system of claim 1, further comprising a machine coupled to or including one of the processors, wherein the machine comprises a vehicle, a spacecraft, a weapon, an aircraft, a robot, a medical device, an imaging device or camera, a rover, a sensor, an actuator, an intelligent agent, or a smart device in one or more smart buildings” -- This limitation merely limits the claim to a particular environment or field of use, and thus cannot integrate the exception into a practical application, nor provide significantly more than the judicial exception (see MPEP 2106.05(h)).
Therefore, claim 15 is not patent eligible.
Regarding claim 16,
Step 1: The claim is directed to a system, which is directed to a machine. The claim satisfies Step 1.
Step 2A Prong 1:
“utilizing the interpretations for operation of the apparatus” – The limitation is directed to utilizing interpretations, which is directed to a mental process.
Step 2A Prong 2 and Step 2B:
“The system of claim 1, further comprising an apparatus coupled to or including one of the processors… wherein the apparatus comprises at least one machine selected from a machine performing automated manufacturing, devices controlled by a control system, one or more devices used in banking, one or more devices supplying power or controlling power distribution, or one or more devices in an automotive or aerospace system.” – The limitation recites instructions, in a generic manner, to apply the exception on a system that comprises an apparatus with processors, and thus cannot integrate the exception into a practical application, nor provide significantly more than the judicial exception (see MPEP 2106.05(f)).
Therefore, claim 16 is not patent eligible.
Regarding claim 17,
Step 1: The claim depends on claim 15, which depends on claim 1, and is thus directed to a system, which falls under the category of machine. The claim satisfies Step 1.
There are no elements to be evaluated under step 2A Prong 1.
Step 2A Prong 2 and Step 2B:
“The system of claim 15, comprising a control system actuating motion of the machine in response to the interpretations” – This limitation recites further generic instructions to apply the exception, namely that the control system will control/actuate the motion of the machine based on the interpretations, which cannot integrate the exception into a practical application nor provide significantly more than the judicial exception (see MPEP 2106.05(f)).
Therefore, claim 17 is not patent eligible.
Regarding claim 18,
Step 1: The claim is directed to a system, which is directed to a machine. The claim satisfies Step 1.
There are no elements to be evaluated under step 2A Prong 1.
Step 2A Prong 2 and Step 2B:
“The system of claim 1, further comprising a display displaying the interpretations and a camera for capturing the data” -- This limitation merely limits the claim to a particular environment or field of use, and thus cannot integrate the exception into a practical application, nor provide significantly more than the judicial exception (see MPEP 2106.05(h)).
Therefore, claim 18 is not patent eligible.
Regarding claim 19,
Step 1: The claim is directed to a method, which falls under the category of process. The claim satisfies Step 1.
Step 2A Prong 1:
“to generate one or more training feature vectors useful for performing, based on the one or more feature vectors, a plurality of different tasks each comprising different interpretations of training data;” -- This limitation is directed to generating feature vectors for performing tasks based on interpretation of data, which is directed to a process that can be performed mentally using observation, evaluation, and judgment, and thus is considered a mental process.
“mutual transfer learning comprising propagating a gradient across orthogonal task specific parameter spaces” -- The limitation is directed to the use of a mutual transfer learning method that involves gradient propagation across parameter spaces. The limitation is directed to the use of a known mathematical concept, gradient propagation, and thus it is directed to a mathematical concept.
“wherein the text data is used to perform at least one of searching the images captured by the camera, captioning the images … or communicating with the human operator of machine;” -- The limitation is directed to text data being used to perform searching of images captured by a camera, image captioning, and communicating with a human operator of the machine. The limitation is directed to a process that can be performed in the human mind using evaluation, observation, and judgment, and thus it is directed to a mental process.
Step 2A Prong 2 and Step 2B:
“outputting the interpretations to a machine coupled to a camera, wherein the interpretations comprise an identification of an environment of the machine” -- The limitation recites outputting interpretations to a machine coupled to a camera, where the interpretations comprise an identification of the machine’s environment. The limitation is directed to outputting interpretations (gathered data) to be input/output over a network, and it is considered an insignificant, extra-solution activity that cannot be integrated into a practical application (see MPEP 2106.05(g)). Furthermore, under Step 2B, the act of inputting/outputting gathered data is a well-understood, routine, and conventional activity (WURC) that cannot provide significantly more than the judicial exception (see MPEP 2106.05(d)(II)).
“A method for interpreting data using machine learning, comprising: training a unified encoder, comprising a neural network, using one or more machine learning models,… encoding new data, using the unified encoder, into one or more feature vectors,… using a plurality of decoders, including a semantic segmentation decoder and a natural language processing decoder connected to the unified encoder, each of the decoders comprising a neural network…wherein the natural language processing decoder generates one of the interpretations comprising text data by operating on a single input that is a final one of the feature vectors from the unified encoder;…and the text data is used to perform at least one of searching the images captured by the camera, captioning the images, providing navigation or instructions to the machine…and wherein the unified encoder is trained using at least one of:…or the machine learning comprising a first model for performing a first one of the different tasks and a second model for performing a second one of the different tasks, and the training of the unified encoder alternates between the first model and the second model after an epoch, or trains both methods each epoch.” -- The limitation recites a method for training an encoder using machine learning models, encoding new data using the encoder and its feature vectors, and multiple other instructions to apply onto a computer, which does not integrate the exception into a practical application, nor provide significantly more than the judicial exception (see MPEP 2106.05(f)).
Therefore, claim 19 is not patent eligible.
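For purposes of illustration only, the training option recited in claim 19 in which the training of the unified encoder alternates between a first task model and a second task model after each epoch may be sketched as follows (hypothetical identifiers not drawn from the claims or the cited art; the other recited option would simply train both task heads every epoch):

import torch

def train_alternating(encoder, head1, head2, loaders, loss_fns, epochs=10):
    params = (list(encoder.parameters()) + list(head1.parameters())
              + list(head2.parameters()))
    opt = torch.optim.Adam(params, lr=1e-4)
    for epoch in range(epochs):
        # Alternate which task model drives the update after each epoch.
        head, loader, loss_fn = ((head1, loaders[0], loss_fns[0]) if epoch % 2 == 0
                                 else (head2, loaders[1], loss_fns[1]))
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(head(encoder(x)), y)
            loss.backward()   # gradients propagate into the shared (unified) encoder
            opt.step()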
Regarding claim 21:
Step 1: The claim is directed to a system, which is directed to a machine. The claim satisfies Step 1.
There are no elements to be evaluated under Step 2A Prong 1.
Step 2A Prong 2 and Step 2B:
“The system of claim 1, wherein the machine comprises a vehicle, a spacecraft, a weapon, an aircraft, a robot, a medical device, a machine performing automated manufacturing, one or more devices used in banking, one or more devices supplying power or controlling power distribution, or a smart device in one or more smart buildings.” – The limitation recites that the machine first introduced in claim 1 further includes elements such as a vehicle, a spacecraft, a machine performing automated manufacturing, and one or more devices used in various ways. The limitation amounts to no more than merely limiting the machine to particular fields of use/environments, and thus it cannot integrate the exception into a practical application, nor provide significantly more than the judicial exception (see MPEP 2106.05(h)).
Therefore, claim 21 is not patent eligible.
Regarding claim 22:
Step 1: The claim is directed to a system, which is directed to a machine. The claim satisfies Step 1.
There are no elements to be evaluated under Step 2A Prong 1.
Step 2A Prong 2 and Step 2B:
“The system of claim 1, wherein the image captioning comprises connections of words relating different parts in the image according to Scientific Captioning of Terrain Images (SCOTI).” -- The limitation recites that the image captioning further comprises connections of words relating to different parts of the image based on the SCOTI captioning approach. The limitation amounts to no more than further limiting the claim to a field of use/environment, and thus does not integrate the exception into a practical application, nor provide significantly more than the judicial exception (see MPEP 2106.05(h)).
Therefore, claim 22 is not patent eligible.
Regarding claim 23:
Step 1: The claim is directed to a system, which is directed to a machine. The claim satisfies Step 1.
There are no elements to be evaluated under Step 2A Prong 1.
Step 2A Prong 2 and Step 2B:
“The system of claim 1, wherein the unified encoder is trained using the machine learning comprising a first model for performing a first one of the different tasks and a second model for performing a second one of the different tasks, and the training of the unified encoder alternates between the first model and the second model after an epoch or trains both methods each epoch.” -- The limitation recites that the unified encoder is further trained using machine learning comprising a first model and a second model for performing different tasks; the remaining limitations were recited before. The limitation amounts to no more than further limiting the claim to a field of use/environment, and thus does not integrate the exception into a practical application, nor provide significantly more than the judicial exception (see MPEP 2106.05(h)).
Therefore, claim 23 is not patent eligible.
Regarding claim 24:
Step 1: The claim is directed to a system, which is directed to a machine. The claim satisfies Step 1.
Step 2A Prong 1:
“The system of claim 5, wherein the unified encoder is trained using mutual transfer learning comprising gradient propagation across orthogonal task-specific parameter spaces” -- The limitation is directed to a unified encoder that is trained using mutual transfer learning that comprises gradient propagation across parameter spaces. The limitation is directed to the use of mathematical concepts/calculations, and thus is directed to a mathematical concept.
Step 2A Prong 2 and Step 2B:
“and the different tasks comprise commonalities or utilize shared information” -- The limitation recites that the different tasks will further comprise commonalities or use shared information. The limitation amounts to no more than further limiting the claim to a field of use/environment, and thus does not integrate the exception into a practical application, nor provide significantly more than the judicial exception (see MPEP 2106.05(h)).
Therefore, claim 24 is not patent eligible.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not
identically disclosed as set forth in section 102, if the differences between the claimed invention and the
prior art are such that the claimed invention as a whole would have been obvious before the effective filing
date of the claimed invention to a person having ordinary skill in the art to which the claimed invention
pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-4, 7-8, 13-18, 21, and 23 are rejected under 35 U.S.C. 103 as being unpatentable over US-10789427-B2 by Shazeer et al. (referred to herein as Shazeer) in view of the NPL reference “Multinet: Real-time joint semantic reasoning for autonomous driving” by Teichmann et al. (referred to herein as Teichmann), in view of US-11562147-B2 by Wang (referred to herein as Wang), and further in view of the NPL reference “Show and tell: A neural image caption generator” by Vinyals et al. (referred to herein as Vinyals).
Regarding claim 1, Shazeer teaches:
A computer implemented system for interpreting data using machine learning, comprising a camera for capturing images, a machine, and one or more processors, one or more memories, and one or more computer executable instructions; ([Shazeer, col. 1, lines 40-44] “one innovative aspect … can be embodied in a system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to implement a machine learning model”, AND [Shazeer, col. 8, line 66 to col. 9, line 1] “An image input modality network is a neural network that is configured to deepen a received input image feature depth using one or more convolutional layers”, wherein the examiner interprets one or more computers and one or more storage devices storing instructions to be the same as one or more processors, one or more memories, and computer-executable instructions, and interprets a received input image to be the same as an image captured by a camera).
embedded on the one or more memories, wherein the computer executable instructions are configured to execute: ([Shazeer, col 14 lines 16-20] “The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data.” wherein the examiner interprets “central processing unit for performing or executing instructions and one or more memory devices” to be the same as “memories, wherein the computer executable instructions are configured to execute”.)
each comprising different interpretations of the data, and the different tasks comprise semantic segmentation and generating text data; ([Shazeer, col. 6] “input text segment … to be parsed … an output modality neural network … may be configured to generate a text output” AND [Shazeer, col. 5, lines 20-26] “machine learning tasks include speech recognition, image classification, machine translation, or parsing. For example, the multi task multi modal machine learning model 100 may receive text inputs corresponding to a machine translation task, e.g., an input text segment in an input natural language to be translated into a target natural language, or text inputs corresponding to a parsing task, e.g., an input text segment to be parsed.”, wherein the examiner interprets parsing the input segment and generating a text output to be the same as semantic segmentation and generating text data. The examiner interprets “machine learning tasks include speech recognition, image classification, machine translation, or parsing” to be the same as “different interpretations of the data, and the different tasks” because they are both related to different interpretation tasks. The examiner further interprets “… an input text segment in an input natural language to be translated into a target natural language … an input text segment to be parsed” and “an output modality neural network … may be configured to generate a text output” to be the same as “comprise semantic segmentation and generating text data” because they are both directed to multiple “machine learning tasks” that provide different “image classification,” “machine translation,” and “parsing” interpretations of the data and that use an “output modality neural network … configured to generate a text output” as text data corresponding to those different interpretations of the data.)
Shazeer does not teach a unified encoder comprising a neural network encoding data of the images into one or more feature vectors, wherein the unified encoder is trained using machine learning to generate the one or more feature vectors useful for performing, based on the one or more feature vectors, a plurality of different tasks; a plurality of decoders, including a semantic segmentation decoder …, connected to the unified encoder, each of the decoders comprising a neural network interpreting the one or more feature vectors so as to decode one or more of the feature vectors to output one of the interpretations; and wherein the natural language processing decoder generates the text data by operating on a single input that is a final one of the feature vectors received from the unified encoder; and an output to the machine comprising the semantic segmentation and the text data, wherein the text data is used to perform at least one of searching the images captured by the camera, captioning the images, providing navigation or instructions to the machine, or communicating with a human operator of the machine.
Teichmann teaches:
a unified encoder comprising a neural network encoding data of the images into one or more feature vectors, wherein the unified encoder is trained using machine learning to generate the one or more feature vectors useful for performing, based on the one or more feature vectors, a plurality of different tasks ([Teichmann, page 1] “we present an approach to joint classification, detection and semantic segmentation using a unified architecture where the encoder is shared amongst the three tasks”, AND [Teichmann, page 1] “The encoder is a deep CNN, producing rich features that are shared among all task. Those features are then utilized by task-specific decoders”, wherein the examiner interprets unified architecture where the encoder is shared and rich features … shared among all task to be the same as a unified encoder that produces feature vectors reused across multiple different tasks).
a plurality of decoders, including a semantic segmentation decoder …, connected to the unified encoder, each of the decoders comprising a neural network interpreting the one or more feature vectors so as to decode one or more of the feature vectors to output one of the interpretations … ([Teichmann, page 1] “This is done by incorporating all three task into a unified encoder-decoder architecture. We name our approach MultiNet … The encoder is a deep CNN, producing rich features that are shared among all task. Those features are then utilized by task-specific decoders”, wherein the examiner interprets the “unified encoder-decoder architecture” with “rich features that are shared among all task” and “task-specific decoders” to be the same as “a plurality of decoders, including a semantic segmentation decoder, … connected to the unified encoder, each of the decoders comprising a neural network interpreting the one or more feature vectors so as to decode one or more of the feature vectors to output one of the interpretations” because they are both directed to a deep CNN encoder that produces shared feature representations (feature vectors) which are fed into multiple task-specific decoder neural networks (for classification, detection, and semantic segmentation), each decoder taking the shared features as input and producing its own task-specific interpretation as output.)
an output to the machine comprising the semantic segmentation … ([Teichmann, page 1, Introduction] “Fig. 1: Our goal: Solving street classification, vehicle detection and road segmentation in one forward pass.” AND [Teichmann, page 1] “The encoder is a deep CNN … which produce their outputs in real-time.”, wherein the examiner interprets the “segmentation” performed by the “CNN” to “produce their outputs” to be the same as “output to the machine comprising the semantic segmentation”, because both are providing segmentation of the scene into different categories using a machine).
… providing navigation or instructions to the machine; ([Teichmann, page 1] “we argue that computational times are very important in order to enable real-time applications such as autonomous driving.”, wherein the examiner interprets real-time applications such as autonomous driving to be the same as providing navigation or instructions to the machine).
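For purposes of illustration only, the shared-encoder, task-specific-decoder arrangement relied on from Teichmann may be sketched as follows (hypothetical identifiers not drawn from the claims or the cited art; one encoder produces features consumed by every task decoder):

import torch.nn as nn

class MultiTaskModel(nn.Module):
    def __init__(self, encoder, decoders):  # decoders: dict mapping task name -> decoder module
        super().__init__()
        self.encoder = encoder
        self.decoders = nn.ModuleDict(decoders)

    def forward(self, images):
        features = self.encoder(images)     # shared feature vectors
        return {task: dec(features) for task, dec in self.decoders.items()}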
Shazeer and Teichmann do not teach and the text data, wherein the text data is used to perform at least one of searching the images captured by the camera, captioning the images, communicating with a human operator of the machine;
Wang teaches:
and the text data, wherein the text data is used to perform at least one of searching the images captured by the camera, ([Wang, col 2, lines 35-37] “yielding mixed results in tasks, such as VQA, visual reasoning, and image retrieval.”, wherein the examiner interprets image retrieval to be the same as searching the images captured by the camera).
captioning the images; ([Wang, col 16, lines 54-57] “The text data may also include one or more captions 352 relating or corresponding to the image data 350.”, wherein the examiner interprets captions 352 relating … to the image data 350 to be the same as captioning the images).
communicating with a human operator of the machine; ([Wang, col. 3, lines 46-48] “The visual dialogue model 140 can operate with an AI-based machine agent to hold a meaningful dialogue with humans in natural, conversational language about visual content.”, wherein the examiner interprets meaningful dialogue with humans in natural, conversational language about visual content to be the same as communicating with a human operator of the machine).
Shazeer, Teichmann, and Wang do not teach …and a natural language processing decoder … and wherein the natural language processing decoder generates the text data by operating on a single input that is a final one of the feature vectors received from the unified encoder.
Vinyals teaches …and a natural language processing decoder … and wherein the natural language processing decoder generates the text data by operating on a single input that is a final one of the feature vectors received from the unified encoder, ([Vinyals, page 3156] “An ‘encoder’ RNN reads the source sentence and transforms it into a rich fixed-length vector representation, which in turn is used as the initial hidden state of a ‘decoder’ RNN that generates the target sentence.”, [Vinyals, page 3157] “Hence, it is natural to use a CNN as an image “encoder”, by first pre-training it for an image classification task and using the last hidden layer as an input to the RNN decoder that generates sentences (see Fig. 1). We call this model the Neural Image Caption, or NIC.”, [Vinyals, page 3159] “The image I is only input once” AND [Vinyals, page 3163] “the CNN to extract features that are relevant to horse-looking animals.”, wherein the examiner interprets the “last hidden layer” of the CNN image encoder, which is provided as the only image input (“image I is only input once”) to the RNN decoder that “generates sentences,” to be the same as a single final feature vector produced by a unified encoder and supplied as the single input to a natural language processing decoder that generates the text data.)