Prosecution Insights
Last updated: April 19, 2026
Application No. 18/300,217

METHOD OF GENERATING MULTIMODAL SET OF SAMPLES FOR INTELLIGENT INSPECTION, AND TRAINING METHOD

Non-Final OA • §101, §102, §112
Filed: Apr 13, 2023
Examiner: COLE, BRANDON S
Art Unit: 2128
Tech Center: 2100 — Computer Architecture & Software
Assignee: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.
OA Round: 1 (Non-Final)
Grant Probability: 80% (Favorable)
OA Rounds: 1-2
To Grant: 2y 7m
With Interview: 87%

Examiner Intelligence

Career Allow Rate: 80% (958 granted / 1,205 resolved; +24.5% vs TC avg, above average)
Interview Lift: +7.6% (moderate), based on resolved cases with interview
Avg Prosecution: 2y 7m (typical timeline); 39 applications currently pending
Total Applications: 1,244 across all art units (career history)
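The headline allowance figure follows directly from the career counts shown above; a quick check (rounding to a whole percent is an assumption about how the dashboard displays it):

```python
# Career counts reported for this examiner.
granted = 958
resolved = 1205

# Career allowance rate: 958 / 1205 ≈ 79.5%, displayed as 80% once rounded.
allow_rate_pct = round(100 * granted / resolved)
print(allow_rate_pct)
```

The +7.6% interview lift is roughly the gap between the 87% with-interview figure and this ~79.5% baseline, though the dashboard's exact definition of the lift metric is not stated here.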

Statute-Specific Performance

§101: 13.0% (-27.0% vs TC avg)
§103: 40.6% (+0.6% vs TC avg)
§102: 34.6% (-5.4% vs TC avg)
§112: 7.1% (-32.9% vs TC avg)

Tech Center average estimates shown for comparison • Based on career data from 1,205 resolved cases

Office Action

§101 §102 §112
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections

Claim 9 is objected to because of the following informalities: in claim 9, lines 1-2, "at least one selected from" should be changed to -- at least one learning technique selected from --. Appropriate correction is required.

Election/Restrictions

Claims 10-13, 16, 17, 19, and 20 are withdrawn from further consideration pursuant to 37 CFR 1.142(b), as being drawn to a nonelected species, there being no allowable generic or linking claim. Applicant timely traversed the restriction (election) requirement in the reply filed on 4/13/2023.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-9, 14, 15, and 18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

As to claims 1, 14, and 18, the limitation "inputting an environmental sample in a collected multimodal set of environmental samples into a single-modal model" is ambiguous because it supports three different readings. It is not clear where the environmental sample is being inputted:

(1) the environmental sample is being inputted into a collected multimodal set of environmental samples;
(2) the environmental sample is one of a plurality of samples in a collected multimodal set of environmental samples and is being inputted into a single-modal model; or
(3) the environmental sample is being inputted into a collected multimodal set of environmental samples, and the collected multimodal set of environmental samples is inputted into a single-modal model.

Because of this ambiguity, the bounds of the limitation are not clear and the claims are therefore indefinite. The examiner will interpret the claims under reading (2): the environmental sample is one of a plurality of samples in the collected multimodal set of environmental samples and is being inputted into a single-modal model. Claims 2-9 and 15 depend on independent claims 1 and 14, respectively, and are therefore also rejected.

As to claim 3, the limitation "the intermediate subset of samples comprises…wherein the first subset of other samples is a set of other samples in the intermediate set of samples other than the intermediate subset of samples" is not understood by the examiner. An intermediate subset must be derived from, and contained within, the original set of samples; if additional elements (a set of other samples) are introduced beyond those selected from the original set, the resulting collection is not the same subset but a newly formed set. Therefore, the examiner will interpret "the set of other samples in the intermediate set of samples other than the intermediate subset of samples" as a new, distinct set, which must also satisfy the requirement that the other samples come from the same intermediate set of samples from which the intermediate subset was drawn.

Claim Rejections - 35 USC § 101

35 U.S.C.
101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-9, 14, 15, and 18 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Step One

The claims are directed to a method (claims 1-9), an electronic device comprising structural components (claims 14 and 15), and a non-transitory computer-readable medium (claim 18). Thus, each of the claims falls within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).

As to claim 1:

Step 2A, Prong One

The claim recites in part: "determining an initial set of samples from the multimodal set of environmental samples according to the model processing result". As drafted and under its broadest reasonable interpretation, this limitation covers performance of the limitation in the mind (including an observation, evaluation, judgment, or opinion) or with the aid of pencil and paper, but for the recitation of generic computer components. For example, a human can use a sheet of paper to write down (determine) an initial set of samples from a result displayed on a computer screen. Accordingly, at Step 2A, Prong One, the claim is directed to an abstract idea.

Step 2A, Prong Two

The judicial exception is not integrated into a practical application. In particular, the claim recites the additional element of "inputting an environmental sample in a collected multimodal set of environmental samples into a single-modal model matched with a modality of the environmental sample, so as to obtain a model processing result corresponding to the environmental sample". This element is recited at a high level of generality and amounts to no more than adding the words "apply it" (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or mere use of a computer as a tool to perform an abstract idea. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea (see MPEP 2106.05(f)).

The claim further recites "processing the initial set of samples by means of an active learning, so as to determine the multimodal set of samples", which is recited at a high level of generality with no detail of the training process and likewise amounts to no more than mere instructions to implement the abstract idea on a computer (see MPEP 2106.05(f)).

A processor and memory are recited at a high level of generality and amount to no more than mere instructions to apply the exception using a generic computer component (see MPEP 2106.05(f)). The recitation of a multimodal set of environmental samples amounts to generally linking the use of the judicial exception to a particular environment or field of use (see MPEP 2106.05(h)).

Accordingly, at Step 2A, Prong Two, the additional elements, individually or in combination, do not integrate the judicial exception into a practical application.

Step 2B

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above, the additional elements of inputting an environmental sample into a modality-matched single-modal model and processing the initial set of samples by means of an active learning are recited at a high level of generality and amount to no more than mere instructions to implement the abstract idea on a computer, or mere use of a computer as a tool to perform the abstract idea (see MPEP 2106.05(f)). A processor and memory amount to no more than mere instructions to apply the exception using a generic computer component (see MPEP 2106.05(f)), and the recitation of a multimodal set of environmental samples amounts to generally linking the judicial exception to a particular field of use (see MPEP 2106.05(h)). Accordingly, at Step 2B, the additional elements, individually or in combination, do not amount to significantly more than the judicial exception.
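For orientation only, the claim 1 flow as the examiner characterizes it (route each sample to a modality-matched single-modal model, treat the model's confidence as the processing result, and keep the extreme-confidence samples as the initial set) can be sketched as follows. Every name, the toy models, and the thresholds here are hypothetical, not taken from the application:

```python
# Hypothetical sketch: each collected sample is routed to the single-modal
# model matching its modality; samples whose confidence is very high or
# very low are kept as the initial set for later active learning.

def select_initial(samples, models, low=0.3, high=0.9):
    scored = [(s, models[s["modality"]](s["data"])) for s in samples]
    return [s for s, confidence in scored if confidence > high or confidence < low]

# Toy single-modal "models" that return a fixed confidence per modality.
models = {"image": lambda d: 0.95, "text": lambda d: 0.50, "audio": lambda d: 0.10}
collected = [
    {"modality": "image", "data": "img0"},
    {"modality": "text", "data": "txt0"},
    {"modality": "audio", "data": "aud0"},
]

initial = select_initial(collected, models)
# The image sample (0.95 > 0.9) and the audio sample (0.10 < 0.3) are
# selected; the text sample (0.50) falls between the thresholds.
```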
As to claim 2:

Step 2A, Prong One

The claim is directed to the same abstract idea identified in claim 1 above.

Step 2A, Prong Two

The judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements of: "wherein the processing the initial set of samples by means of an active learning so as to determine the multimodal set of samples comprises: processing the initial set of samples by means of the first active learning, so as to obtain an intermediate set of samples; performing at least one round of processing on the intermediate set of samples by means of the second active learning, so as to obtain an intermediate subset of samples corresponding to the round; and determining the multimodal set of samples according to the intermediate subset of samples." These elements are recited at a high level of generality with no detail of the training process and amount to no more than adding the words "apply it" (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or mere use of a computer as a tool to perform an abstract idea. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea (see MPEP 2106.05(f)). At Step 2A, Prong Two, the additional elements, individually or in combination, do not integrate the judicial exception into a practical application.

Step 2B

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The limitations quoted above are recited at a high level of generality with no detail of the training process and amount to no more than mere instructions to implement the abstract idea on a computer (see MPEP 2106.05(f)). Accordingly, at Step 2B, the additional elements, individually or in combination, do not amount to significantly more than the judicial exception.

As to claim 3:

Step 2A, Prong One

The claim recites in part: "determining the multimodal set of samples according to the intermediate subset of samples comprises: determining an intermediate subset of labeled samples and a first subset of other samples as the multimodal set of samples, wherein the first subset of other samples is a set of other samples in the intermediate set of samples other than the intermediate subset of samples." As drafted and under its broadest reasonable interpretation, this limitation covers performance of the limitation in the mind (including an observation, evaluation, judgment, or opinion) or with the aid of pencil and paper, but for the recitation of generic computer components.
For example, a human can determine a multimodal set of samples from the intermediate subset of samples by using a pencil and paper to write down the samples associated with images (or text or audio).

Step 2A, Prong Two

The claim does not include additional elements that integrate the judicial exception into a practical application or amount to significantly more than the judicial exception itself.

Step 2B

The claim does not include additional elements that are sufficient to amount to "significantly more" than the judicial exception.

As to claim 4:

Step 2A, Prong One

The claim recites in part: "determining the multimodal set of other samples according to the intermediate subset of samples comprises: determining an intermediate subset of labeled samples as the multimodal set of samples." As drafted and under its broadest reasonable interpretation, this limitation covers performance of the limitation in the mind (including an observation, evaluation, judgment, or opinion) or with the aid of pencil and paper, but for the recitation of generic computer components. For example, a human can determine a multimodal set of samples from the intermediate subset of samples by using a pencil and paper to write down the samples associated with images (or text or audio) and label the samples. Humans were applying labels to data before computers were ever invented.

Step 2A, Prong Two

The claim does not include additional elements that integrate the judicial exception into a practical application or amount to significantly more than the judicial exception itself.

Step 2B

The claim does not include additional elements that are sufficient to amount to "significantly more" than the judicial exception.

As to claim 5:

Step 2A, Prong One

The claim recites in part: "wherein the at least one round comprises a first round to an Mth round, and M is a positive integer." As drafted and under its broadest reasonable interpretation, this limitation covers performance of the limitation in the mind (including an observation, evaluation, judgment, or opinion) or with the aid of pencil and paper, but for the recitation of generic computer components. For example, a human can create a sample set during a first round of processing, then go back and create another sample set during a second round of processing. Humans were creating multiple sample sets from data and repeating processes before computers were ever created.

Step 2A, Prong Two

The judicial exception is not integrated into a practical application.
In particular, the claim recites the additional elements of: "wherein the performing at least one round of processing on the intermediate set of samples by means of the second active learning so as to obtain an intermediate subset of samples corresponding to the round comprises: processing, in the first round, the intermediate set of samples by means of the second active learning, so as to obtain an intermediate subset of samples corresponding to the first round; and processing, in an mth round, a second subset of other samples by means of the second active learning, so as to obtain an intermediate subset of samples corresponding to the mth round, wherein the second subset of other samples is a set of other samples in the intermediate set of samples other than intermediate subsets of samples corresponding to first m-1 rounds, and wherein m is greater than 1 and less than or equal to M, and m is a positive integer." These elements are recited at a high level of generality with no detail of the training process and amount to no more than adding the words "apply it" (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or mere use of a computer as a tool to perform an abstract idea. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea (see MPEP 2106.05(f)). At Step 2A, Prong Two, the additional elements, individually or in combination, do not integrate the judicial exception into a practical application.

Step 2B

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The limitations quoted above are recited at a high level of generality with no detail of the training process and amount to no more than mere instructions to implement the abstract idea on a computer (see MPEP 2106.05(f)). Accordingly, at Step 2B, the additional elements, individually or in combination, do not amount to significantly more than the judicial exception.

As to claim 6:

Step 2A, Prong One

The claim is directed to the same abstract idea identified in claim 1 above.

Step 2A, Prong Two

The judicial exception is not integrated into a practical application.
In particular, the claim recites the additional elements of: "wherein the processing the initial set of samples by means of an active learning so as to determine the multimodal set of samples comprises: determining a first target set of samples, wherein the first target set of samples is obtained by processing the initial set of samples by means of the active learning; and determining the first target set of samples as the multimodal set of samples." These elements are recited at a high level of generality with no detail of the training process and amount to no more than adding the words "apply it" (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or mere use of a computer as a tool to perform an abstract idea. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea (see MPEP 2106.05(f)). At Step 2A, Prong Two, the additional elements, individually or in combination, do not integrate the judicial exception into a practical application.

Step 2B

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The limitations quoted above are recited at a high level of generality with no detail of the training process and amount to no more than mere instructions to implement the abstract idea on a computer (see MPEP 2106.05(f)). Accordingly, at Step 2B, the additional elements, individually or in combination, do not amount to significantly more than the judicial exception.

As to claim 7:

Step 2A, Prong One

The claim recites in part: "the model processing result comprises a confidence information corresponding to the model processing result, and wherein the determining an initial set of samples from the multimodal set of environmental samples according to the model processing result comprises: determining the initial set of samples from the multimodal set of environmental samples according to the confidence information." As drafted and under its broadest reasonable interpretation, this limitation covers performance of the limitation in the mind (including an observation, evaluation, judgment, or opinion) or with the aid of pencil and paper, but for the recitation of generic computer components.
For example, a human can determine the initial set of samples based on how confident they are that the data belong in a group.

Step 2A, Prong Two

The claim does not include additional elements that integrate the judicial exception into a practical application or amount to significantly more than the judicial exception itself.

Step 2B

The claim does not include additional elements that are sufficient to amount to "significantly more" than the judicial exception.

As to claim 8:

Step 2A, Prong One

The claim recites in part: "determining the initial set of samples from the multimodal set of environmental samples according to the confidence information comprises: determining a first target set of environmental samples corresponding to a confidence information greater than a first predetermined threshold; determining a second target set of environmental samples corresponding to a confidence information less than a second predetermined threshold, wherein the second predetermined threshold is less than the first predetermined threshold; determining the initial set of samples according to the first target set of environmental samples and the second target set of environmental samples." As drafted and under its broadest reasonable interpretation, this limitation covers performance of the limitation in the mind (including an observation, evaluation, judgment, or opinion) or with the aid of pencil and paper, but for the recitation of generic computer components. For example: (1) a human can determine the initial set of samples based on how confident they are that the data belong in a group; and (2) a human can determine that a confidence level is above a threshold, which determines whether a sample is in the group.
Step 2A, Prong Two

The claim does not include additional elements that integrate the judicial exception into a practical application or amount to significantly more than the judicial exception itself.

Step 2B

The claim does not include additional elements that are sufficient to amount to "significantly more" than the judicial exception.

As to claim 9:

Step 2A, Prong One

The claim is directed to the same abstract idea identified in claim 1 above.

Step 2A, Prong Two

The judicial exception is not integrated into a practical application. In particular, the claim recites the additional element of: "the active learning comprises at least one selected from: Uncertainty Sampling, Query-By-Committee, or Expected Model Change." This element is recited at a high level of generality with no detail of the training process and amounts to no more than adding the words "apply it" (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or mere use of a computer as a tool to perform an abstract idea. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea (see MPEP 2106.05(f)). At Step 2A, Prong Two, the additional elements, individually or in combination, do not integrate the judicial exception into a practical application.

Step 2B

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The limitation quoted above is recited at a high level of generality and amounts to no more than mere instructions to implement the abstract idea on a computer (see MPEP 2106.05(f)). Accordingly, at Step 2B, the additional elements, individually or in combination, do not amount to significantly more than the judicial exception.

Claim 14 has similar limitations as claim 1 and is therefore rejected for the same reasons as above. The claim further recites at least one processor and a memory, which are recited at a high level of generality and amount to no more than mere instructions to apply the exception using a generic computer component (see MPEP 2106.05(f)).

Claim 15 has similar limitations as claim 2 and is therefore rejected for the same reasons as above.

Claim 18 has similar limitations as claim 1 and is therefore rejected for the same reasons as above. The claim further recites a non-transitory computer-readable storage medium and computer system, which are recited at a high level of generality and amount to no more than mere instructions to apply the exception using a generic computer component (see MPEP 2106.05(f)).

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C.
102 that form the basis for the rejections under this section made in this Office action: A person shall be entitled to a patent unless – (a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention. Claim(s) 1 - 4, 6, 7, 9, 14, and 15 is/are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Pena Pena et al (US 11/861,884) As to claim 1, Pena Pena et al teaches figures 1 and 2 shows/teaches a method of generating a multimodal set of samples (column 4, lines 40 - 45…the information extracting transformer model 104 is configured to process the multimodal data input 102 to generate extracted data 106 as recognized entity information (e.g., key entity information) that is classified into one or more classification types ; Examiner’s Note: “generate extracted data 106 as recognized entity information” reads on “generating a multimodal set of samples”), comprising: inputting an environmental sample in a collected multimodal set of environmental samples into a single-modal model matched with a modality of the environmental sample, so as to obtain a model processing result corresponding to the environmental sample (column 4, lines 40 - 50…the information extracting transformer model 104 is configured to process the multimodal data input 102 to generate extracted data 106 as recognized entity information (e.g., key entity information) that is classified into one or more classification types…multimodal data input 102 may be captured by a sensor of a device, such as a scanner, camera, or the like ; column 5, lines 25 - 40…FIG. 2 depicts an embodiment of a training flow 200 of the information extracting transformer model, such as model 104 of FIG. 
1…In process block 202, the first multimodal transformer model (e.g., 302 in FIG. 3) is pre-trained on unlabeled data of input block 204. Pre-training at process block 202 is useful to build a general model of contextual multi-modal document representations before being trained in a task oriented manner for specific KIE tasks, such as multi-modal named entity recognition for documents ; column 6, lines 10 - 60…In process block 210, the second multimodal transformer model (e.g., 304 of FIG. 3) is trained on a labeled dataset for KIE task 205… In process block 214, the third multimodal transformer model is trained in an uncertainty-aware manner based on at least (i) the pseudo-labels as updated in process block 212 and (i) the noise-aware loss function (e.g., 310 of FIG. 3) to generate the updated multimodal transformer model (e.g., 312 of FIG. 3). In some embodiments, a calibrated confidence score of each of the pseudo-labels based on the second multimodal transformer model. Then during the training of the third multimodal transformer model the noise-aware loss function, which takes account of the calibrated scores as weight coefficient for each pseudo label, is used to compute the iterative updates of the parameters.) 
(Examiner’s Note: “input block 204” reads on “inputting an environmental sample”; “multimodal data input 102 captured by a sensor of a device” reads on “collected multimodal set of environmental samples”; “In process block 210, the second multimodal transformer model is trained on a labeled dataset for KIE task 205” reads on “single-modal model”; “calibrated confidence score of each of the pseudo-labels based on the second multimodal transformer model” reads on “obtain a model processing result corresponding to the environmental sample”); determining an initial set of samples from the multimodal set of environmental samples according to the model processing result (column 9, lines 35 - 50…one or more new documents to be labeled for further training of the updated third multimodal transformer model 312 may be identified (corresponding to process block 218 of FIG. 2). In some embodiments, the one or more new documents may be identified based on a set of calibrated confidence scores indicative of model uncertainty for the one or more unlabeled documents. When the calibrated confidence score is above a predetermined threshold, a corresponding document may be identified to be labeled and/or as a labeled document (corresponding to input block 206 of FIG. 2) for further training of the updated third multimodal transformer model 312. The updated third multimodal transformer model 312 may be utilized to classify and label key information elements in one or more multimodal documents, such as the documents input as multimodal data input 102 as described above in FIG.
1)(Examiner’s Note: “when the calibrated confidence score is above a predetermined threshold, a corresponding document may be identified to be labeled and/or as a labeled document” reads on “determining an initial set of samples”); and processing the initial set of samples by means of an active learning, so as to determine the multimodal set of samples (column 4, lines 1 - 5…the information extracting transformer model 104 is configured to process the multimodal data input 102 to generate extracted data 106 as recognized entity information (e.g., key entity information) that is classified into one or more classification types (i.e., whether an extracted item is a description, quantity, or price in a receipt that is associated with a product sold by an identified vendor at an identified time); column 7, line 60 - column 8, line 10…In process block 218, one or more documents may be sampled for labeling and added to the set of labeled documents in process block 206 for continuous active learning (e.g., training the updated multimodal transformer model). The newly labeled documents 206 may then be utilized to continue to fine-tune the updated multimodal transformer model (e.g., for the KIE task). Thus, the updated multimodal transformer model may be continually improved by such active learning. (40) In embodiments in which the data is continually able to be labeled, an uncertainty based active learning loop may thus be employed that continuously selects such one or more documents for labeling that meet a threshold. For example, documents having a calibrated confidence score determined using the confidence values associated with the pseudo-labels may be selected when their calibrated confidence score is in a score range (e.g., higher or lower than a threshold value).
In some cases, the selected documents in the score range may be ranked from a highest calibrated confidence score to a lowest calibrated confidence score, and a subset of the ranked documents may be selected for manual labeling) (Examiner’s Note: “one or more documents may be sampled for labeling and added to the set of labeled documents in process block 206 for continuous active learning” reads on “processing the initial set of samples by means of an active learning”; “generate extracted data 106” reads on “determine the multimodal set of samples”). As to claim 2, Pena Pena et al., in figures 1 and 2, teaches the method, wherein the active learning comprises a first active learning and a second active learning, and wherein the processing the initial set of samples by means of an active learning so as to determine the multimodal set of samples comprises: processing the initial set of samples by means of the first active learning, so as to obtain an intermediate set of samples; performing at least one round of processing on the intermediate set of samples by means of the second active learning, so as to obtain an intermediate subset of samples corresponding to the round; and determining the multimodal set of samples according to the intermediate subset of samples.
(column 4, lines 1 - 5…the information extracting transformer model 104 is configured to process the multimodal data input 102 to generate extracted data 106 as recognized entity information (e.g., key entity information) that is classified into one or more classification types (i.e., whether an extracted item is a description, quantity, or price in a receipt that is associated with a product sold by an identified vendor at an identified time); column 7, line 60 - column 8, line 10…In process block 218, one or more documents may be sampled for labeling and added to the set of labeled documents in process block 206 for continuous active learning (e.g., training the updated multimodal transformer model). The newly labeled documents 206 may then be utilized to continue to fine-tune the updated multimodal transformer model (e.g., for the KIE task). Thus, the updated multimodal transformer model may be continually improved by such active learning. (40) In embodiments in which the data is continually able to be labeled, an uncertainty based active learning loop may thus be employed that continuously selects such one or more documents for labeling that meet a threshold. For example, documents having a calibrated confidence score determined using the confidence values associated with the pseudo-labels may be selected when their calibrated confidence score is in a score range (e.g., higher or lower than a threshold value).
In some cases, the selected documents in the score range may be ranked from a highest calibrated confidence score to a lowest calibrated confidence score, and a subset of the ranked documents may be selected for manual labeling) (Examiner’s Note: “continuous active learning, the newly labeled documents 206 may then be utilized to continue to fine-tune the updated multimodal transformer model” reads on “the active learning comprises a first active learning and a second active learning”; “fine-tune the updated multimodal transformer model” reads on “determining the multimodal set of samples according to the intermediate subset of samples”; The applicant teaches in paragraph [0067] of the current specification that the strategy of the first active learning may be the same as or different from the strategy of the second active learning, and Pena Pena et al. teaches an active learning loop. In Pena Pena et al., a first round of active learning would involve processing the initial set of samples to determine a multimodal set of samples. A second iteration of active learning would then involve processing the multimodal set of samples (i.e., an intermediate subset of samples) to determine a fine-tuned multimodal set of samples). As to claim 3, Pena Pena et al., in figures 1 and 2, teaches the method, wherein determining the multimodal set of samples according to the intermediate subset of samples comprises: determining an intermediate subset of labeled samples and a first subset of other samples as the multimodal set of samples, wherein the first subset of other samples is a set of other samples in the intermediate set of samples other than the intermediate subset of samples.
(column 3, lines 40 - 65…a third model is generated via training of the first model to perform key information extraction based on a second labeled dataset, which may be a closed-source labeled dataset, comprising one or more labels, the weakly-labeled dataset as pseudo-labels generated by the second model (e.g., as the generated pseudo-labels), or combinations thereof. In an embodiment, the third model may be trained on the closed-source labeled dataset as a small strongly labeled dataset (e.g., human annotated) when available in place of the pseudo-labels. Training of the first model to generate the third model based on the closed-source labeled dataset in place of the generated pseudo-labels of the second model allows for knowledge transfer from the first model to the third model and label enrichment. Training of the first model to generate the third model based on the generated pseudo-labels of the second model allows for knowledge transfer from the second model to the third model and iterative label enrichment. The unlabeled dataset may further be processed by the third multimodal transformer model to update the pseudo-labels for the unlabeled data. Owing to the nature of the partial weak-supervision for the third model (e.g., based on the pseudo-labels), this training may use an uncertainty-aware training objective such as through a noise-aware loss to allow the model to dynamically and differentially learn from different pseudo-labels based on the amount of label uncertainty) (Examiner’s Note: “closed-source labeled dataset” reads on “intermediate subset of labeled samples”; “unlabeled dataset” reads on “a first subset of other samples”). As to claim 4, Pena Pena et al., in figures 1 and 2, teaches the method, wherein the determining the multimodal set of samples according to the intermediate subset of samples comprises: determining an intermediate subset of labeled samples as the multimodal set of samples.
(column 4, lines 1 - 20…the third model may be further fine-tuned through a process of active learning. For example, to improve the model performance in high uncertainty/low confidence data points, uncertainty based active learning samples a subset of low uncertainty pseudo-labels for labeling by humans. The new inputs/labels pairs from this high uncertainty set is then added to the small set of strongly labeled data. This growing set of strongly-labeled data may be used for active and continual fine tuning. In some cases, the samples selected for strong labelling are selected based on a measure of uncertainty of the model's output, which is calibrated during training. This allows uncertainty-based sampling of samples for strong labelling.) (Examiner’s Note: “strongly-labeled data may be used for active and continual fine tuning” reads on “determining an intermediate subset of labeled samples as the multimodal set of samples”). As to claim 6, Pena Pena et al., in figures 1 and 2, teaches the method, wherein the processing the initial set of samples by means of an active learning so as to determine the multimodal set of samples comprises: determining a first target set of samples, wherein the first target set of samples is obtained by processing the initial set of samples by means of the active learning; and determining the first target set of samples as the multimodal set of samples. (column 4, lines 5 - 20…strongly-labeled data may be used for active and continual fine tuning. In some cases, the samples selected for strong labelling are selected based on a measure of uncertainty of the model's output, which is calibrated during training). (Examiner’s Note: “strongly-labeled data may be used for active and continual fine tuning.
In some cases, the samples selected for strong labelling are selected based on a measure of uncertainty of the model's output, which is calibrated during training” reads on “determining a first target set of samples”). As to claim 7, Pena Pena et al., in figures 1 and 2, teaches the method, wherein the model processing result comprises a confidence information corresponding to the model processing result, and wherein the determining an initial set of samples from the multimodal set of environmental samples according to the model processing result comprises: determining the initial set of samples from the multimodal set of environmental samples according to the confidence information. (column 9, lines 35 - 50…one or more new documents to be labeled for further training of the updated third multimodal transformer model 312 may be identified (corresponding to process block 218 of FIG. 2). In some embodiments, the one or more new documents may be identified based on a set of calibrated confidence scores indicative of model uncertainty for the one or more unlabeled documents. When the calibrated confidence score is above a predetermined threshold, a corresponding document may be identified to be labeled and/or as a labeled document (corresponding to input block 206 of FIG. 2) for further training of the updated third multimodal transformer model 312. The updated third multimodal transformer model 312 may be utilized to classify and label key information elements in one or more multimodal documents, such as the documents input as multimodal data input 102 as described above in FIG.
1)(Examiner’s Note: “when the calibrated confidence score is above a predetermined threshold, a corresponding document may be identified to be labeled and/or as a labeled document” reads on “determining an initial set of samples from the multimodal set of environmental samples according to the confidence information”). As to claim 9, Pena Pena et al., in figures 1 and 2, teaches the method, wherein the active learning comprises at least one selected from: Uncertainty Sampling, Query-By-Committee, or Expected Model Change (column 4, lines 5 - 20…the third model may be further fine-tuned through a process of active learning. For example, to improve the model performance in high uncertainty/low confidence data points, uncertainty based active learning samples a subset of low uncertainty pseudo-labels for labeling by humans. The new inputs/labels pairs from this high uncertainty set is then added to the small set of strongly labeled data. This growing set of strongly-labeled data may be used for active and continual fine tuning. In some cases, the samples selected for strong labelling are selected based on a measure of uncertainty of the model's output, which is calibrated during training. This allows uncertainty-based sampling of samples for strong labelling) (Examiner’s Note: “to improve the model performance in high uncertainty/low confidence data points, uncertainty based active learning samples a subset of low uncertainty pseudo-labels for labeling by humans” reads on “active learning comprises at least: Uncertainty Sampling”). Claim 14 has similar limitations as claim 1. Therefore, the claim is rejected for the same reasons as above. Claim 15 has similar limitations as claim 2. Therefore, the claim is rejected for the same reasons as above. Claim 18 has similar limitations as claim 1. Therefore, the claim is rejected for the same reasons as above.
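The uncertainty-based selection procedure the examiner repeatedly maps onto the claims (keep unlabeled documents whose calibrated confidence passes a threshold, rank them, and take a subset for manual labeling) can be sketched as follows. This is an illustrative reading of the cited passages, not code from the reference; the function and parameter names are hypothetical:

```python
def select_for_labeling(confidences, threshold, k):
    """Pick up to k documents for manual labeling, as characterized in the
    cited passages: filter by calibrated confidence against a threshold,
    rank the qualifying documents, and take a subset of the ranking."""
    # Keep only documents whose calibrated confidence is in the score range.
    candidates = [(score, idx) for idx, score in enumerate(confidences)
                  if score > threshold]
    # Rank from highest calibrated confidence score to lowest.
    candidates.sort(reverse=True)
    # A subset of the ranked documents is selected for manual labeling.
    return [idx for _, idx in candidates[:k]]

# Four documents, threshold 0.5, select at most two for labeling.
picked = select_for_labeling([0.2, 0.9, 0.7, 0.95], threshold=0.5, k=2)
```

Whether the filter keeps scores above or below the threshold depends on the sampling strategy (the reference, as quoted, allows "higher or lower than a threshold value"); the sketch uses "above" to match the claim-1 mapping.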
Conclusion Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRANDON S COLE whose telephone number is (571)270-5075. The examiner can normally be reached Mon - Fri 7:30am - 5pm EST (Alternate Fridays Off). Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez, can be reached at 571-272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /BRANDON S COLE/ Primary Examiner, Art Unit 2128

Prosecution Timeline

Apr 13, 2023
Application Filed
Mar 05, 2026
Non-Final Rejection — §101, §102, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12596908
WEAK NEURAL ARCHITECTURE SEARCH (NAS) PREDICTOR
2y 5m to grant Granted Apr 07, 2026
Patent 12596940
SMART TRAINING AND SMART DEPLOYMENT OF MACHINE LEARNING MODELS
2y 5m to grant Granted Apr 07, 2026
Patent 12596913
CONVOLUTIONAL NEURAL NETWORK (CNN) PROCESSING METHOD AND APPARATUS
2y 5m to grant Granted Apr 07, 2026
Patent 12598117
METHODS AND SYSTEMS FOR IMPLEMENTING DYNAMIC-ACTION SYSTEMS IN REAL-TIME DATA STREAMS
2y 5m to grant Granted Apr 07, 2026
Patent 12578502
WEATHER-DRIVEN MULTI-CATEGORY INFRASTRUCTURE IMPACT FORECASTING
2y 5m to grant Granted Mar 17, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
80%
Grant Probability
87%
With Interview (+7.6%)
2y 7m
Median Time to Grant
Low
PTA Risk
Based on 1205 resolved cases by this examiner. Grant probability derived from career allow rate.
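The headline projections are consistent with simple ratios over the examiner's career counts shown earlier (958 granted of 1205 resolved). A minimal sketch, assuming the interview lift is applied as an additive percentage-point adjustment (the page does not state the exact formula):

```python
# Career counts from the examiner-intelligence panel above.
granted, resolved = 958, 1205

# Career allow rate: 958 / 1205 ~ 0.795, displayed as 80%.
allow_rate = granted / resolved

# Assumed: the +7.6% interview lift is added in percentage points,
# giving ~0.871, displayed as 87% "with interview".
interview_lift = 0.076
with_interview = allow_rate + interview_lift
```

Under this assumption the displayed 80% and 87% figures both round out of the same career data.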
