DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy of priority Application No. JP2022-019858, filed on 02/10/2022, has been received.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 08/26/2022, 04/06/2023, and 04/18/2025 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Response to Arguments
Applicant's arguments filed 09/08/2025 have been fully considered but they are not persuasive.
Rejections Under 35 U.S.C. 101:
Applicant asserts that “According to the background section, existing systems use various detection techniques for detecting potentially malicious network packets and can alert a network administrator to potential problems. The disclosed system detects network intrusions and takes real-time remedial actions, including dropping suspicious packets and blocking traffic from suspicious source addresses. The background section further explains that the disclosed system enhances security by acting in real time to proactively prevent network intrusions. The claimed invention reflects this improvement in the technical field of network intrusion detection. Steps (d)- (f) provide for improved network security using the information from the detection to enhance security by taking proactive measures to remediate the danger by detecting the source address associated with the potentially malicious packets. Specifically, the claim reflects the improvement in step (d), dropping potentially malicious packets in step (e), and blocking future traffic from the source address in step (f). These steps reflect the improvement described in the background. Thus, the claim as a whole integrates the judicial exception into a practical application such that the claim is not directed to the judicial exception.” (Remarks pg. 6-10)
Examiner’s response:
The Examiner respectfully disagrees. The claim as a whole remains directed to the abstract idea of a mental process. The newly added limitations do not capture the improvement aspect of the invention; under their broadest reasonable interpretation, they still recite a mental process, as set forth in the new grounds of rejection below. Accordingly, the claim is directed to an abstract idea.
Claim Rejections - 35 USC § 103
Applicant’s arguments with respect to claim(s) 1, 7-8 and 10-13 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1, 7-8 and 10-13 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea and does not integrate the judicial exception into a practical application or amount to significantly more than the judicial exception.
Regarding claim 1
Step 1: The claim recites an apparatus; therefore, it falls into the statutory category of machines.
Step 2A Prong 1: The claim recites multiple mental processes, as explained below. The claim recites, inter alia:
“…the training sample in the VQA format including a combination of an object, a question text regarding the object and an answer text in response to the question text as elements, the sample in the non-VQA format including a combination of an object and a label related to the object as elements, and trains a statistical model of the VQA task based on the generated training sample in the VQA format.”
This limitation is directed to the abstract idea of a mental process (concepts performed in the human mind, including observation and evaluation [see MPEP 2106.04(a)(2) III. C.]).
“wherein the VQA task is an image detection task,
the sample in a format of the image detection task of the non-VQA format includes, as the label, a rectangular ground truth position and ground truth size surrounding an object in an image and a ground truth class of the object, and
the …generates the question text and the answer text based on the ground truth position and/or the ground truth size and the ground truth class.”
This limitation is directed to the abstract idea of a mental process (concepts performed in the human mind, including observation and evaluation [see MPEP 2106.04(a)(2) III. C.]).
Step 2A Prong 2: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element of “A machine learning apparatus comprising: a processing circuit that generates a training sample in a visual question answering (VQA) format regarding a VQA task based on a sample in a non-VQA format… processing circuit”, as drafted, recites generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Thus, the claim is not patent eligible.
Regarding claim 7
Step 1: The claim recites an apparatus; therefore, it falls into the statutory category of machines.
Step 2A Prong 1: The claim recites multiple mental processes, as explained below. The claim recites, inter alia:
“wherein the statistical model comprises: an encoder that converts the object into a first feature; an encoder that converts the answer text into a second feature; a fuser that generates a fused feature of the first feature and the second feature; and a converter that converts the fused feature into a character string of natural language representing the answer text.”
This limitation is directed to the abstract idea of a mental process (concepts performed in the human mind, including observation and evaluation [see MPEP 2106.04(a)(2) III. C.]).
Step 2A Prong 2: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element of “machine learning apparatus”, as drafted, recites generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Thus, the claim is not patent eligible.
Regarding claim 8
Step 1: The claim recites an apparatus; therefore, it falls into the statutory category of machines.
Step 2A Prong 1: The claim recites multiple mental processes, as explained below. The claim recites, inter alia:
“wherein the converter converts the fused feature into a relative value series representing occurrence probabilities of words constituting the answer text.”
This limitation is directed to the abstract idea of a mental process (concepts performed in the human mind, including observation and evaluation [see MPEP 2106.04(a)(2) III. C.]).
Step 2A Prong 2: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element of “machine learning apparatus”, as drafted, recites generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Thus, the claim is not patent eligible.
Regarding claim 10
Step 1: The claim recites a method; therefore, it falls into the statutory category of processes.
Step 2A Prong 1: The claim recites multiple mental processes, as explained below. The claim recites, inter alia:
“A machine learning method comprising: a conversion step of generating a training sample in a visual question answering (VQA) format regarding a VQA task based on a sample in a non-VQA format, the training sample in the VQA format including a combination of an object, a question text regarding the object and an answer text in response to the question text as elements, and the sample in the non-VQA format including a combination of an object and a label related to the object as elements…”
This limitation is directed to the abstract idea of a mental process (concepts performed in the human mind, including observation and evaluation [see MPEP 2106.04(a)(2) III. C.]).
“wherein the VQA task is an image detection task, the sample in a format of the image detection task of the non-VQA format includes, as the label, a rectangular ground truth position and ground truth size surrounding an object in an image and a ground truth class of the object, and the method includes generating the question text and the answer text based on the ground truth position and/or the ground truth size and the ground truth class.”
This limitation is directed to the abstract idea of a mental process (concepts performed in the human mind, including observation and evaluation [see MPEP 2106.04(a)(2) III. C.]).
Step 2A Prong 2: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element of “A machine learning… and a training step of training a statistical model of the VQA task based on the training sample in the VQA format generated in the conversion step”, as drafted, recites generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Thus, the claim is not patent eligible.
Regarding claim 11
Step 1: The claim recites an apparatus; therefore, it falls into the statutory category of machines.
Step 2A Prong 1: The claim recites multiple mental processes, as explained below. The claim recites, inter alia:
“A… that applies an object and a question text regarding the object to a statistical model of a visual question answering (VQA) task according to claim 1 to infer an answer text in response to the question text, and displays the answer text at a display.”
This limitation is directed to the abstract idea of a mental process (concepts performed in the human mind, including observation and evaluation [see MPEP 2106.04(a)(2) III. C.]).
Step 2A Prong 2: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element of “inference apparatus comprising: a processing circuit”, as drafted, recites generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Thus, the claim is not patent eligible.
Regarding claim 12
Step 1: The claim recites an apparatus; therefore, it falls into the statutory category of machines.
Step 2A Prong 1:
Step 2A Prong 2: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element of “inference apparatus… wherein the processing circuit generates the question text based on a label associated with the object.”, as drafted, recites generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Thus, the claim is not patent eligible.
Regarding claim 13
Step 1: The claim recites an apparatus; therefore, it falls into the statutory category of machines.
Step 2A Prong 1:
Step 2A Prong 2: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element of “inference apparatus wherein the processing circuit generates the question text that is fixed.”, as drafted, recites generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Thus, the claim is not patent eligible.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 1, 7-8 and 10-13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Pham et al. (US 2022/0129693 A1) in view of Zhou et al. (US 2022/0130499 A1), and further in view of Gao et al. (“Learning to Recognize Visual Concepts for Visual Question Answering with Structural Label Space”).
Regarding claim 1 (Currently Amended)
Pham teaches a machine learning apparatus comprising: a processing circuit that generates a training sample in a visual question answering (VQA) format regarding a VQA task based on a sample in a… (abstract: “According to one embodiment, a state determination apparatus includes a processor. The processor acquires a targeted image. The processor acquires a question concerning the targeted image and an expected answer to the question. The processor generates an estimated answer estimated with respect to the question concerning the targeted image using a trained model trained to estimate an answer based on a question concerning an image”)
the training sample in the VQA format including a combination of an object, (Examiner notes that the combination with the object includes the question and answer; see para. [0031] “In step S202, the question and answer acquisition unit 12 acquires a question concerning the targeted image and an expected answer to the question. In the first embodiment, it is assumed that whether preparation or work is performed in accordance with a safety manual is determined. Therefore, questions and expected answers are prepared in advance based on the safety manual. In other words, the expected answers are prepared on the assumption of normal states.”)
a question text regarding the object and an answer text in response to the question text as elements, (para [0031-0032] “In step S202, the question and answer acquisition unit 12 acquires a question concerning the targeted image and an expected answer to the question. In the first embodiment, it is assumed that whether preparation or work is performed in accordance with a safety manual is determined. Therefore, questions and expected answers are prepared in advance based on the safety manual. In other words, the expected answers are prepared on the assumption of normal states. In step S203, the inference unit 13 generates an estimated answer to a question concerning the targeted image using the trained model relating to VQA”)
…
wherein the VQA task is an image detection task, (para [0106] “In step S1301, the image feature calculation unit 83 generates a combined ROI by combining the ROI obtained in step S1001 with an image region obtained in step S1003 . In the generation of the combined ROI, for example, the sum of the ROI detected in step S1001 and the image region”)
Pham does not teach the sample in the non- VQA format including a combination of an object and a label related to the object as elements, and trains a statistical model of the VQA task based on the generated training sample in the VQA format,
…
the sample in a format of the image detection task of the non-VQA format includes, as the label, a rectangular ground truth position and ground truth size surrounding an object in an image and a ground truth class of the object, and
the processing circuit generates the question text and the answer text based on the ground truth position and/or the ground truth size and the ground truth class.
Gao teaches the sample in the non-VQA format including a combination of an object and a label related to the object as elements, (pg. 495 “we usually subconsciously learn the new concept with relevant concepts together (e.g. “batter, pitcher, catcher”) and eliminate the disturbance of irrelevant concepts (e.g. “red, sunny”). Motivated by this idea, we propose a structural label space to represent the semantic meanings of the answers, as shown in Fig. 1(d). Our structural label space organizes the labels into many groups, where the concepts in one group have relevant semantic meanings. More concretely, the concepts in one group classify the things from the same perspective, e.g. “pitcher, batter” classify the people based on their baseball positions, while “red, blue, etc.” classify the objects based on their colors”)
and trains a statistical model of the VQA task based on the generated training sample in the VQA format, (pg. 500 “To demonstrate this idea, we select the results on classifying some groups of concepts, as shown in Fig. 6. For classifying “kitchenware,” “athlete,” “gender,” we can see that our method achieves better results. These groups are all relatively hard to classify, compared to “animal” group. Secondly, The correlation among the groups can help Dynamic Concept Recognizer transform the knowledge of classifying one group of concepts to classifying another relevant group. The transformed knowledge can facilitate classifying those concept groups with relatively few samples. For example, in Visual Genome, “plane figure” and “pattern” are two groups with relatively fewer training samples. From Fig. 6 in the paper, we can see that our method obtains better results on classifying these two groups of concepts. We also calculate the mean accuracy of the groups which contain less than 100 samples”)
…
the sample in a format of the image detection task of the non-VQA format includes, as the label, a rectangular ground truth position and ground truth size surrounding an object in an image and a ground truth class of the object, (pg. 498 “The visual concept recognition requires the model to classify many different kinds of concepts. The standard VQA benchmark is one way to test the performance of visual concept recognition, but many factors impact the results, such as whether the model attends on the correct region, whether the model correctly parses the question. Thus, we propose a toy benchmark to test the performance of visual concept recognition purely. In details, for one sample (as shown in Fig. 4), the inputs of the model are one image, one ground truth bounding box of an object and one group index that indicates the model need to classify one specific group of concepts. The group index can be viewed as a parsed simple question, e.g., “what color is it?”. The output should be the corresponding concept appearing in the image. We use classification accuracy to evaluate a model.”)
and the processing circuit generates the question text and the answer text based on the ground truth position and/or the ground truth size and the ground truth class. (Pg. 498 “we propose a toy benchmark to test the performance of visual concept recognition purely. In details, for one sample (as shown in Fig. 4), the inputs of the model are one image, one ground truth bounding box of an object and one group index that indicates the model need to classify one specific group of concepts. The group index can be viewed as a parsed simple question, e.g., “what color is it?”. The output should be the corresponding concept appearing in the image. We use classification accuracy to evaluate a model.” Also see pg. 500 right col “. Besides, for the DAG model evaluated on test-dev set, we show the performances of three modules in the DAG model: image grounding network, group prediction network, and dynamic concept recognizer. The outputs of three modules all have ground truth annotation: the object related to questions, the group related to questions, and the answers.”)
Pham and Gao are analogous art because they are both directed to machine learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the state determination of image analysis using the visual question and answer system of Pham with the computer visual question answering system with structural label space of Gao.
One of ordinary skill in the art would have been motivated to make this modification in order to provide a “structural label space and DCR module [that] can efficiently learn the visual concept recognition and benefit the performance of the VQA model,” as disclosed by Gao (abstract: “we propose a novel visual recognition module named Dynamic Concept Recognizer (DCR), which is easy to be plugged in an attention-based VQA model, to utilize the semantics of the labels in answer prediction …structural label space and DCR module can efficiently learn the visual concept recognition and benefit the performance of the VQA model.”).
Regarding claim 10
Claim 10 recites analogous limitations to independent claim 1 and therefore is rejected on the same ground as independent claim 1.
Regarding claim 11
Pham in view of Gao teaches the machine learning apparatus according to claim 1.
Pham further teaches an inference apparatus comprising: a processing circuit (para [0127] “The computer adopted in the embodiments is not limited to a PC; it may be a calculation processing apparatus, a microcomputer, or the like included in an information processor, and a device and apparatus that can realize the functions of the embodiments by a program.”) that applies an object and a question text regarding the object to a statistical model of a visual question answering (VQA) task according to claim 1 to infer an answer text in response to the question text, and displays the answer text at a display. (Para [0045] “FIG. 5 is, for example, a user interface screen displayed on, for example, a display device. The presentation unit 15 adds estimated answers to the table of the questions and the expected answers shown in FIG. 3, and displays the table in the user interface screen.”)
Regarding claim 12
Pham in view of Gao teaches the inference apparatus according to claim 11.
Pham further teaches wherein the processing circuit (para [0127] “The computer adopted in the embodiments is not limited to a PC; it may be a calculation processing apparatus, a microcomputer, or the like included in an information processor, and a device and apparatus that can realize the functions of the embodiments by a program.”) generates the question text based on a label associated with the object. (Para [0104] “The left part of FIG. 12 shows an image example of the target for processing, in which a dog and a cat are present on a sofa. In the semantic segmentation, labeling is performed for each pixel of the image. According to the third embodiment, the divided image regions obtained by step S1003 respectively correspond to, for example, silhouette regions of a dog, a cat, a sofa, and a background in the right part of FIG. 12.”)
Regarding claim 13
Pham in view of Gao teaches the machine learning apparatus according to claim 1.
Pham further teaches wherein the processing circuit (para [0127] “The computer adopted in the embodiments is not limited to a PC; it may be a calculation processing apparatus, a microcomputer, or the like included in an information processor, and a device and apparatus that can realize the functions of the embodiments by a program.”) generates the question text that is fixed. (Para [0039] “FIG. 3 is an example of a table storing questions and expected answers in association with each other. The table shows a preparation list for a safe state (there is nothing anomalous) prescribed in the safety manual that the worker should comply with. Specifically, the question “Wear a cap?” and the expected answer “Yes” are associated and stored in the table.”)
Claim(s) 7-8 is/are rejected under 35 U.S.C. 103 as being unpatentable over Pham et al. (US 2022/0129693 A1) in view of Gao et al. (“Learning to Recognize Visual Concepts for Visual Question Answering with Structural Label Space”) and further in view of Zhou et al. (US 2022/0130499 A1).
Regarding claim 7
Pham in view of Gao teaches the machine learning apparatus according to claim 1.
Pham in view of Gao does not teach wherein the statistical model comprises: an encoder that converts the object into a first feature; an encoder that converts the answer text into a second feature; a fuser that generates a fused feature of the first feature and the second feature; and a converter that converts the fused feature into a character string of natural language representing the answer text.
Zhou teaches wherein the statistical model comprises: an encoder that converts the object into a first feature; an encoder that converts the answer text into a second feature; (para [0025] “the text document 306 includes a relatively smaller amount of information related to the semantic meaning. By converting both the image 202 and the text document 206 into respective embedding vectors 204, 208 within the same embedding space, the system 100 can map the image embedding vector 204 to the text embedding vector 208, regardless of the dimensional differences”)
a fuser that generates a fused feature of the first feature and the second feature; (para [0027] “The second phase is performed by the second module 210, which learns features found in the image embedding vector 204 that relate to features in the text embedding vector 208, and vice-versa. The second module 210 includes a multi-modal encoder 212 for encoding a relationship between features from the image embedding vector 204 and the text embedding vector 208.”) and a converter that converts the fused feature into a character string of natural language representing the answer text. (Para [0023] “The text embedder unit 104 can apply natural language processing techniques, via a model, to semantically analyze the text document 206. The model can be, for example, a word embedding model. The text embedder unit 104 can receive the text document 206 and segment it into passages (e.g., paragraphs, sections, etc.).”)
Pham, Gao and Zhou are analogous art because they are all directed to machine learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the state determination of image analysis using the visual question and answer system of Pham in view of Gao with the computer visual question answering system for solving a complex process of Zhou.
One of ordinary skill in the art would have been motivated to make this modification in order to solve the “complex process that involves textual analysis and visual analysis to determine an image and text relationship through computer-based reasoning” as disclosed by Zhou (“A visual question answering system can be tasked with analyzing the question and searching for objects in the image related to the question. Therefore, the computer visual question answering system has to analyze the questions in relation to the content of the digital image. As such, computer visual question answering is a complex process that involves textual analysis and visual analysis to determine an image and text relationship through computer-based reasoning”).
Regarding claim 8
Pham in view of Gao with Zhou teaches the machine learning apparatus according to claim 7.
Zhou further teaches wherein the converter converts the fused feature into a relative value series representing occurrence probabilities of words constituting the answer text. (Para [0032] “The multi-modal encoder 212 can employ a classifier that is trained to determine whether a text matches an image or an object in an image. For example, if a token describes a liver and an image feature describes a liver, the classifier can be trained to determine a match exists. If, however, a token describes a broken arm and an image feature describes an ear, the classifier can be trained to determine that there is no match. The multi-modal encoder 212 can also employ the image-text matching model to generate pairs of tokens (or sets of tokens) and features (or sets of features) and determine a probability that the pairs are a match or not a match. A match suggests that the token or set of tokens describes the object described by a feature or set of features.”)
Pham, Gao and Zhou are analogous art because they are all directed to machine learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the state determination of image analysis using the visual question and answer system of Pham in view of Gao with the computer visual question answering system for solving a complex process of Zhou.
One of ordinary skill in the art would have been motivated to make this modification in order to solve the “complex process that involves textual analysis and visual analysis to determine an image and text relationship through computer-based reasoning” as disclosed by Zhou (“A visual question answering system can be tasked with analyzing the question and searching for objects in the image related to the question. Therefore, the computer visual question answering system has to analyze the questions in relation to the content of the digital image. As such, computer visual question answering is a complex process that involves textual analysis and visual analysis to determine an image and text relationship through computer-based reasoning”).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to VAN C MANG whose telephone number is (571)270-7598. The examiner can normally be reached Mon-Fri 8:00 am-5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David Yi, can be reached at (571)270-7519. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/VAN C MANG/Primary Examiner, Art Unit 2126