Prosecution Insights
Last updated: April 19, 2026
Application No. 18/703,045

MODEL TRAINING METHOD AND PLATFORM, IMAGE INPAINTING METHOD AND APPARATUS, DEVICE, AND MEDIUM

Non-Final OA: §101, §102, §103, §112
Filed
Apr 19, 2024
Examiner
LEMIEUX, IAN L
Art Unit
2669
Tech Center
2600 — Communications
Assignee
BOE TECHNOLOGY GROUP CO., LTD.
OA Round
1 (Non-Final)
87%
Grant Probability
Favorable
1-2
OA Rounds
2y 4m
To Grant
97%
With Interview

Examiner Intelligence

Grants 87% — above average
87%
Career Allow Rate
496 granted / 569 resolved
+25.2% vs TC avg
+9.6%
Interview Lift
Moderate (+10%) lift, comparing resolved cases with and without an interview
Typical timeline
2y 4m
Avg Prosecution
34 currently pending
Career history
603
Total Applications
across all art units

Statute-Specific Performance

§101: 11.2% (-28.8% vs TC avg)
§102: 19.1% (-20.9% vs TC avg)
§103: 39.6% (-0.4% vs TC avg)
§112: 19.4% (-20.6% vs TC avg)
Tech Center averages are estimates • Based on career data from 569 resolved cases

Office Action

§101, §102, §103, §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-20 are currently pending in U.S. Patent Application No. 18/703,045 and an Office action on the merits follows.

Drawings

The drawings are objected to because Fig. 8 shows the “third image sample” and “text data sample” inputs reversed, feeding the “text encoder” and “image encoder,” respectively (of the CLIP/second preset model). Applicant’s PGPUB US 2025/0265687 A1 at, e.g., [0199]-[0200] discloses the text encoder receiving the text data sample and the image encoder receiving the third image sample.

Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claim 20 is rejected under 35 U.S.C. 101 because the claimed invention does not appear directed to any of the four statutory categories (it is directed to a “platform” comprising “a training module”), and instead appears directed to non-statutory subject matter, specifically a program/software per se. See MPEP § 2106.03. In the context of the flowchart illustrated therein, claim 20 fails at Step 1 of the Subject Matter Eligibility test (Step 1: No). Even under an interpretation wherein the “training module” invokes the provisions of 112(f), such an invocation is not understood to exclude the software embodiments disclosed. Reference may be made to PGPUB US 2025/0265687 A1 at [0292] with reference to Figure 13. While Applicant’s disclosure does recite statutory embodiments, including a processor/memory combination ([0083]), claim 20 is distinct from claim 22 (claim 22 is more explicitly directed to a ‘machine’ and/or ‘manufacture’), the disclosure is non-limiting, and the broadest reasonable interpretation of the claim in light of the specification concludes that the claim as a whole covers a program/software per se, which does not fall within the definition of a process, machine, manufacture, or composition of matter (In re Nuijten).

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:

(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;

(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always, linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and

(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material, or acts to entirely perform the recited function.

Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes claim limitations that do not use the word “means” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitations use a generic placeholder “___ module” coupled with functional language without reciting sufficient structure to perform the recited function, and the generic placeholder is not preceded by a structural modifier. Such claim limitations are: “similarity determination module” in claim 8, and “training module” in claim 20.

Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitations to avoid their being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid their being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim 20 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

At line 2 of claim 20, the claim limitation “a sample library” has been evaluated under the three-prong test set forth in MPEP § 2181, subsection I, but the result is inconclusive. Thus, it is unclear whether this limitation should be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the means/nonce term “library” (when considered at Prong A) is modified by the word “sample,” which is ambiguous as to whether it conveys structure given that the samples are images (Prong C inconclusive). The boundaries of this claim limitation are ambiguous; therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.

In response to this rejection, applicant must clarify whether this limitation should be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. Mere assertion regarding applicant’s intent to invoke or not invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is insufficient. Applicant may:

(a) Amend the claim to clearly invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, by reciting “means” or a generic placeholder for means, or by reciting “step.” The “means,” generic placeholder, or “step” must be modified by functional language, and must not be modified by sufficient structure, material, or acts for performing the claimed function;

(b) Present a sufficient showing that 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, should apply because the claim limitation recites a function to be performed and does not recite sufficient structure, material, or acts to perform that function;

(c) Amend the claim to clearly avoid invoking 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, by deleting the function or by reciting sufficient structure, material, or acts to perform the recited function; or

(d) Present a sufficient showing that 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, does not apply because the limitation does not recite a function, or does recite a function along with sufficient structure, material, or acts to perform that function.
Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

1. Claims 1-4, 16, 19-20 and 22 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Zhang et al., “Text-Guided Image Inpainting.”

As to claim 1, Zhang discloses a model training method, comprising: acquiring a plurality of image sample pairs, wherein the image sample pair comprises a first image sample and a second image sample of a same image (Fig. 3 input images x1-xn for the training stage, as distinguished from the inference stage, comprising an original/complete image and one or more images with masked/omitted regions for subsequent inpainting operations; page 4082, Section 3.2: “denotes the image sequence input during training, where 𝑛 is step number and x1 is the masked image. Let I and M denote the original image and mask respectively … We construct {x2, x3, · · · , x𝑛} by applying a box blur filter on the missing region of original image and decrease blur degree from x2 to x𝑛. … Note that we need the whole image sequence for parallel training while during test only x1 is input”); and image quality of the second image sample is higher than image quality of the first image sample (x1 is characterized by the omitted/cropped/masked white portion/block – the “missing region,” analogous to a ‘scratch’ of Applicant’s disclosure – and is accordingly lowest in image quality relative to the additionally constructed images x2-xn, all of which are lower quality than the original/input/true image I (x_{n+1}); the original image best corresponds to the ‘second’ image sample recited).
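For illustration, here is a minimal sketch of the training-sequence construction described in the quoted Section 3.2, assuming Pillow’s box blur and a hypothetical decreasing kernel schedule; the paper’s exact blur parameters and masking convention are not reproduced here.

```python
# Sketch of Zhang Sec. 3.2 as read in this rejection: x1 masks the region out;
# x2..xn fill it with progressively less-blurred content; I plays x_{n+1}.
# The radius schedule and white-fill convention are assumptions for illustration.
import numpy as np
from PIL import Image, ImageFilter

def build_sequence(original: Image.Image, mask: np.ndarray, n: int) -> list:
    img = np.asarray(original).astype(np.uint8)
    m = mask.astype(bool)
    seq = []
    # x1: the missing region replaced by a white block (lowest image quality)
    x1 = img.copy()
    x1[m] = 255
    seq.append(Image.fromarray(x1))
    # x2..xn: paste box-blurred pixels into the missing region only,
    # decreasing the blur radius so xk approaches the original as k -> n
    for k in range(2, n + 1):
        radius = max(1, 2 * (n - k + 1))  # assumed schedule, decreasing in k
        blurred = np.asarray(original.filter(ImageFilter.BoxBlur(radius)))
        xk = img.copy()
        xk[m] = blurred[m]
        seq.append(Image.fromarray(xk))
    return seq
```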
Zhang further discloses training a first preset model (Fig. 3(a) Inpainting Module, Section 3.3, etc.) with the plurality of image sample pairs as training samples, wherein the first preset model is configured to improve the image quality of the first image sample (Fig. 3 output images y1-yn improving image quality by inpainting/filling the masked/missing/‘scratch’ regions; page 4082, Section 3.3, Inpainting Module), and a process of the training (Section 3.7) comprises: acquiring text data corresponding to a prediction image currently output by the first preset model (a permissible interpretation includes both/either of the non-masked/original user-supplied text prompt and the masked version as used in deriving the reconstruction loss; Fig. 3, see both “a small sized bird…” text instances, as both correspond to, e.g., output/predicted images y1-yn), wherein the text data comprises data for evaluating image quality of the prediction image (the supplied prompt/text guides, via e.g. semantic similarity (see page 4083, Section 3.5), the image inpainting quality/desired effect – while not required for the rejection of claim 1, see also Morita et al., “Interactive Image Manipulation with Complex Text Instructions,” e.g. Ldamsm of Equation 5, Eqn. 4, and a manipulation phase that is reiterated based on fine-tuned text inputs); and updating parameters of the first preset model based on the text data, the prediction image, and the second image sample (page 4080, left column, paragraph 3: “We think the reason causing this problem is that the algorithms really don’t know what to fill the missing region with due to insufficient contextual information of source region. To tackle this problem, we try to analyze how we human inpaint. We consult several experts about this problem and conclude that leaving out the image artistic effect, we human first consider what should be filled into the missing region to make the image reasonable. Like the example in Figure 1, we human’s thought must be that a belly should be embedded into the bird’s body, but existing methods may not have this kind of ability to decide what to add. Based on this observation, we introduce the textual description for masked images as a kind of guidance to direct the system to inpaint”; page 4081, Fig. 3, textual encoder input/prompt, in view of each of the disclosed losses – GCU GAN loss, L1 loss, TV loss, and the Reconstruction Loss from the (b) reconstruction module (best corresponding to Applicant’s second preset model, not recited in claim 1); pages 4082-4083, Section 3.4: “The above inpainting module can be regarded as a generator, then we need to design a powerful discriminator … As a kind of auxiliary information, the text feature can be regarded as conditioning code, which can be input into conditional discriminators to determine whether the generated image matches the text or not and direct the generator to approximate the conditional image distribution”; page 4083, Section 3.7 (see where x_{n+1} represents the original image I, best corresponding to the ‘second’ training sample recited – see above); etc.).
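For reference, a hedged sketch of how the four losses cited above (GAN, L1, TV, and reconstruction) could be combined into the multi-task loss Zhang’s Section 3.7 describes; the weights and the exact loss forms below are placeholders, not values from the paper.

```python
# Illustrative combination of the four loss terms cited from Zhang Sec. 3.7.
# Weights w_* and the non-saturating GAN form are assumptions for this sketch.
import torch
import torch.nn.functional as F

def multi_task_loss(pred, target, d_score_fake, text_rec_loss,
                    w_gan=1.0, w_l1=10.0, w_tv=0.1, w_rec=1.0):
    # L1 between predicted image y and the original image I (x_{n+1})
    l1 = F.l1_loss(pred, target)
    # generator-side GAN term from the conditional discriminator's score
    gan = F.softplus(-d_score_fake).mean()
    # total-variation smoothness over the prediction (B, C, H, W)
    tv = (pred[..., :, 1:] - pred[..., :, :-1]).abs().mean() + \
         (pred[..., 1:, :] - pred[..., :-1, :]).abs().mean()
    # text_rec_loss stands in for the reconstruction-module loss Lrec
    return w_gan * gan + w_l1 * l1 + w_tv * tv + w_rec * text_rec_loss
```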
As to claim 2, Zhang discloses the method of claim 1. Zhang further discloses the method wherein the updating comprises: determining a similarity between a target text and the prediction image, wherein the target text is the text data or a text with semantics opposite to that of the text data (Fig. 3, prediction/output image(s) fed to the visual encoder of (b), wherein ‘target text’ best corresponds to the masked version of the initial/input prompt, the masked version being supplied to the texture decoder (see also Reconstructed Description); page 4083, Section 3.5: “we introduce the reconstruction module to improve the semantic similarity between visual and textual content. This strategy is similar to dual learning, which is a new learning framework that leverages the primal-dual structure of AI tasks to obtain effective feedback or regularization signals to enhance the learning and inference process”; see also the Lrec loss, Equation 14 at page 4084, and Fig. 3); determining a loss value based on the prediction image and the second image sample (Fig. 3, L1 loss; page 4083, Section 3.7, Eqns. 11 and 12, wherein the predicted image is y, and in view of the manner in which x_{n+1} represents the original image I (as identified above, original I best corresponds to the ‘second’ image sample)); and updating the parameters of the first preset model based on the similarity and the loss value (Network Training, Section 3.7, page 4084: “Multi-Task Loss. After getting above four loss functions, we combine them and compute a multi-task loss to train the network in an end-to-end manner”; see also Parameter Sharing, Section 3.6).

As to claim 3, Zhang discloses the method of claim 2. Zhang further discloses the method wherein the determining a similarity between a target text and the prediction image comprises: encoding the target text to obtain a text feature vector of the target text (page 4083, Section 3.5: “We can extract word and image sequence feature by the same way, given by…”, Equations 7 and 8; see word2vec); encoding the prediction image to obtain an image feature vector of the prediction image (Fig. 3, y is passed to the visual encoder; see page 4083, Section 3.5, top right column), wherein the text feature vector and the image feature vector have a consistent dimension (see Fig. 3, yellow nxDown, ensuring the prediction-image y feature vector from the visual encoder matches that of the text feature – it is the Examiner’s understanding that these two features should/must generally be of the same/consistent dimension for mathematical compatibility (vector addition/fusion requires the same space), and transformer architectures (performing such fusion for, e.g., cross-attention mechanisms) rely on fixed dimensions regardless of the input source/domain – see also Morita, cited below in the rejection of claim 14, and its page 5 visual vector v that is the same size as a, with subsequent fusion to produce ā); and determining the similarity (Lrec) based on the text feature vector and the image feature vector (both vectors are used to reconstruct the reconstructed description, which is used to derive the ‘similarity’ equivalent Lrec). Should applicant assert non-equivalence between the similarity determination as recited and the operations of Zhang’s Reconstruction Module (Fig. 3, module (b), Section 3.5) – e.g., if applicant intends a similarity score that is not itself a loss measure per se – the Examiner understands the limitations of claim 3 to concern admitted prior art, e.g. CLIP, which is known to provide a score of how close an image is to a supplied text/caption – see CLIP/Radford et al., “Learning Transferable Visual Models From Natural Language Supervision.” See also Nichol et al., “GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models.”
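To make the CLIP fallback concrete: CLIP embeds text and image into a single joint space of consistent dimension and scores them by cosine similarity. A minimal sketch using the public Hugging Face checkpoint follows; this illustrates the cited Radford et al. technique generally, not Zhang’s or Applicant’s specific encoders.

```python
# CLIP-style text-image similarity: both modalities are projected into the
# same joint embedding dimension (512 for this checkpoint) and compared by
# cosine similarity. Uses the public openai/clip-vit-base-patch32 weights.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def text_image_similarity(image: Image.Image, text: str) -> float:
    inputs = processor(text=[text], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        txt = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])
        img = model.get_image_features(pixel_values=inputs["pixel_values"])
    # L2-normalize so the dot product is cosine similarity
    txt = txt / txt.norm(dim=-1, keepdim=True)
    img = img / img.norm(dim=-1, keepdim=True)
    return float((txt * img).sum())
```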
As to claim 4, Zhang discloses the method of claim 1. Zhang further discloses the method wherein the training a first preset model with the plurality of image sample pairs as training samples comprises: performing first training on the first preset model with part of the image sample pairs as training samples, wherein in the first training, the parameters of the first preset model are updated based on the prediction image output by the first preset model and the second image sample (L1 loss as considered for, e.g., a first or early mini-batch; page 4084: “During the training process, we employ an Adam optimizer [12] to minimize the multi-task loss, where we set initial learning rate to 0.0001 and batch size to 32. The learning rate increases linearly to the maximum with 1600 warm-up steps and then decreases itself based on the number of updates. For each mini-batch, we first update discriminators and then update generators”); and performing second training on a first preset model obtained by the first training with part of or all the image sample pairs as training samples (continuing training for additional mini-batches after a first/initial set of batches; page 4084, Equation 18), wherein in the second training, parameters of the first preset model obtained by the first training are updated based on the text data, the prediction image, and the second image sample (Zhang, page 4084, Section 3.7, Network Training – while Zhang discloses training the network in an end-to-end manner, the Examiner understands it still to be the case that the discriminator is frozen/locked while updating the weights of the generator portion, and vice versa, for each mini-batch, performed iteratively (see the sketch after this passage); see also Zhang’s parameter-sharing disclosure at Section 3.6, cited to pre-emptively address any assertion that the training sub-steps relate more to Zhang’s reconstruction module (b) and less to the ‘first’/inpainting module (a). Furthermore, while multiple iterations of optimizing generator parameters in view of Zhang’s Equation 18 would consider L1 and Lrec for each iteration, the first training as recited, involving y and I/x_{n+1}, does not explicitly exclude any basis on the text/Lrec during that first training).
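A minimal sketch of the per-mini-batch alternation read out of the quoted passage (discriminator first, then generator, Adam at the quoted initial learning rate). G, D, loader, d_loss_fn, and rec_loss_fn are stand-ins, and multi_task_loss is the illustrative helper from the earlier sketch; none of these are Zhang’s actual code.

```python
# Alternating updates per mini-batch, as the examiner characterizes Zhang
# Sec. 3.7: D steps on a detached generator output (G effectively frozen),
# then G steps against a fixed D. All networks/losses here are placeholders.
import torch

opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)  # quoted initial lr 0.0001
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)

for x1, target, text_feat in loader:  # batch size 32 in the quoted setup
    fake = G(x1, text_feat)
    # --- discriminator step: fake.detach() keeps G's weights out of this update
    d_loss = d_loss_fn(D(target, text_feat), D(fake.detach(), text_feat))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # --- generator step: only G's parameters are stepped, so D is held fixed
    g_loss = multi_task_loss(fake, target, D(fake, text_feat),
                             rec_loss_fn(fake, text_feat))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```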
As to claim 16, Zhang discloses the method of claim 1. Zhang further discloses the method wherein the text data comprises at least one entry (Fig. 3, see exemplary text; page 4084: “For each text, we tokenize the sentence and split it into words using NLTK [23] and extract word features using the pre-trained word2vec Glove [28] of version cased-300d. The maximum text length is set to 20”); the at least one entry is used for describing image quality of the prediction image in different image regions and/or different quality dimensions (Fig. 1, Fig. 3, exemplary regions being “belly” and “wings” and exemplary dimensions being color, e.g. white/black/blue, and/or size, “small sized bird” – additional examples are implicitly disclosed in the captions of the associated datasets disclosed in Section 4.1).

As to claim 19, this claim is the method claim corresponding to an inference stage executing the model as obtained in accordance with claim 1, and is rejected accordingly. Zhang discloses the same in at least page 4085, Sections 4.4-4.5, evaluating quantitative and qualitative results post training.

As to claims 22/20, these claims are the system claims corresponding to the method of claim 1 and are rejected accordingly. Claim 20, as directed to a ‘platform’/software per se, is further addressed above with respect to § 101, the 112(f) invocation, and the corresponding 112(b) rejection. For an interpretation that ‘a sample library’ constitutes the sample images in the aggregate, Zhang as applied for the case of claim 1 reads (see also Zhang, Section 4.1, Dataset, with reference to CUB-200-2011 and Oxford-102). Regarding explicit structure, see Zhang, page 4085: “All experiments are conducted on a server with the Ubuntu 20.04 OS, Intel(R) Xeon(R) Silver 4114 CPU and Nvidia RTX 2080Ti GPU.”

2. Claim 1 is alternatively rejected under 35 U.S.C. 102(a)(1) as being anticipated by Xu et al., “ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation.”

As to claim 1, Xu discloses a model training method (page 8; page 5, Section 3, ReFL: Reward Feedback Learning Improves Text-to-Image Diffusion: “Though ImageReward can pick out highly human-preferred images from many generations of a prompt, the generate-and-then-filter paradigm could be expensive and inefficient in practical applications. Therefore, we seek to improve text-to-image generative models, particularly for the popular latent diffusion models, for allowing high-quality generation in single or very few trials”), comprising: acquiring a plurality of image sample pairs, wherein the image sample pair comprises a first image sample and a second image sample of a same image, and image quality of the second image sample is higher than image quality of the first image sample (page 6, Section 4.1: “Dataset & Training Setting. Rankings of annotated images are collected to train ImageReward, which contains 8,878 prompts and 136,892 pairs of image comparisons”; page 7, Table 2: “for each pair of images, we use the image considered better by most people as the better one”); and training a first preset model with the plurality of image sample pairs as training samples (page 8, Section 4.2: “Training Settings. We use Stable Diffusion v1.4 [45] as the baseline generative model and fine-tune it for experiments. For the dataset, the pre-training dataset is from a 625k subset of LAION-5B [50] selected by aesthetic score, while the prompt set for ReFL is sampled from DiffusionDB”), wherein the first preset model is configured to improve the image quality of the first image sample (page 3, Figure 2, see the various images A-D generated in response to the prompt, wherein the generative/diffusion model concerns image generation over subsequent iterations so as to arrive at an improved/acceptable-quality image; page 5, Section 3, referenced in the preamble above: “Therefore, we seek to improve text-to-image generative models, particularly for the popular latent diffusion models, for allowing high-quality generation in single or very few trials”), and a process of the training comprises: acquiring text data corresponding to a prediction image currently output by the first preset model, wherein the text data comprises data for evaluating image quality of the prediction image (page 6, Section 4.1: “Rankings of annotated images are collected”; Fig. 2, annotation; page 3, Section 2.1: “Our annotation pipeline involves a prompt annotation stage, which includes categorizing prompts and identifying problematic ones, and a text-image rating stage, where images are rated based on alignment, fidelity, and harmlessness. Subsequently, annotators rank the images in order of preference. To manage potential contradictions in the ranking, we provide trade-offs in our annotation document (completely attached in Appendix B). Our annotation system is composed of three stages: Prompt Annotation, Text-Image Rating, and Image Ranking”); and updating parameters of the first preset model based on the text data, the prediction image, and the second image sample (Fig. 2: “ReFL leverages ImageReward’s feedback to directly optimize diffusion models at a random latter denoising step”; page 3: “We propose Reward Feedback Learning (ReFL) for tuning diffusion models regarding human preference scorers. Our unique insight on ImageReward’s quality identifiability at latter denoising steps allows the direct feedback learning on diffusion models, which offer no likelihood for their generations. Extensive automatic and human evaluations demonstrate ReFL’s advantages over existing approaches including data augmentation [61; 13] and loss reweighing [23]”).
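The ReFL idea quoted above reduces to a simple loop: decode the image at a randomly chosen late denoising step, score it with a frozen preference model, and ascend that score. The sketch below is a heavily simplified illustration; diffusion, sample_until, decode, and reward_model are hypothetical stand-ins, not Xu’s API.

```python
# Hedged sketch of reward feedback learning per the quoted ReFL description:
# maximize a frozen reward model's score on images decoded mid-denoising.
# All model objects and method names here are illustrative placeholders.
import random
import torch

opt = torch.optim.AdamW(diffusion.parameters(), lr=1e-5)

for prompt in prompt_set:                 # e.g. prompts sampled from DiffusionDB
    # stop denoising at a random late step, as the paper's insight suggests
    latents = diffusion.sample_until(prompt, stop_at=random.randint(1, 10))
    image = decode(latents)               # differentiable decode of the latent
    reward = reward_model(prompt, image)  # ImageReward-style preference score
    loss = -reward.mean()                 # maximizing reward = minimizing -reward
    opt.zero_grad(); loss.backward(); opt.step()
```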
Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

1. Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al., “Text-Guided Image Inpainting,” in view of Morita et al., “Interactive Image Manipulation with Complex Text Instructions.”

As to claim 14, Zhang discloses the method of claim 1. Zhang suggests the method wherein the acquiring text data corresponding to a prediction image currently output by the first preset model comprises: displaying the prediction image (y1-yn are output images, as illustrated). Zhang fails to explicitly disclose, in response to the displaying, acquiring text data input for the prediction image. For clarity of record, the Examiner understands what is at least an implicit ordering as presented (see the underlined language/proposed reading) to be critical/required, such that the ‘acquiring’ occurs after the displaying (generally such ordering is not read into the claim(s) unless explicitly required – there is a presumption against strict order/sequence), and while Zhang at least suggests that intermediate prediction images are output, and that they may be output prior to an ‘acquiring’ that need not be the same as the step of a user supplying the text input/prompt, the supplied text is required for the generation of predicted/output image y (the visual decoder uses output from the textual encoder). Morita, however, evidences the obvious nature of such a text-input acquisition in response to a predicted/generated image being displayed (page 3, Fig. 2, top arrow back to the user interface, providing via said interface the resultant/predicted image from the combination phase, and the text-relevant mask Mtr refined on the basis of newly supplied text; page 2, Section 1: “The whole process is first to determine the text relevant content and the text-irrelevant content, then only modify the text-relevant content, and then put the modified result back to the original position”). Morita further evidences the manner in which a user may not be satisfied with an initial/first generated/predicted image on the basis of a supplied text prompt, and how allowing the user to modify such text – and the resultant Mtr/embeddings of Zhang supplied to the visual decoder – may better ensure that specific areas of interest to the user receive a desired degree of inpainting/manipulation. It would have been obvious to a person of ordinary skill in the art, before the effective filing date, to modify the system and method of Zhang so as to acquire text data in response to displaying a generated/predicted image, as taught/suggested by Morita, the motivation, as similarly taught/suggested therein, being that such an acquisition may allow the user to fine-tune the supplied text/prompt to ultimately achieve a predicted/generated image more closely reflecting user intent.
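The ordering at issue in claim 14 (display first, then acquire text in response) amounts to a simple interactive loop of the kind Morita’s Fig. 2 depicts. A toy sketch, with display, model.generate, and the console I/O as purely illustrative stand-ins:

```python
# Toy illustration of the display-then-acquire ordering disputed in claim 14:
# the prediction is shown before the next text input is acquired in response.
# `display` and `model.generate` are hypothetical placeholders.
def interactive_inpaint(image, mask, model):
    text = input("Describe what the missing region should contain: ")
    while True:
        prediction = model.generate(image, mask, text)
        display(prediction)                  # prediction shown to the user first
        text = input("Refine the text (empty to accept): ")  # acquired in response
        if not text:
            return prediction
```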
Additional References

Prior art made of record and not relied upon that is considered pertinent to applicant’s disclosure: the additionally cited references (see attached PTO-892) not otherwise relied upon above have been made of record in view of the manner in which they evidence the general state of the art.

Allowable Subject Matter

Claims 5-13 and 17-18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. The references of record fail to serve in any obvious combination teaching each and every limitation as required therein, and more specifically the third training as recited for claim 5, wherein for that ‘updating’ the prediction/generated image is input into the second preset model after completion of the third training.

Inquiry

Any inquiry concerning this communication or earlier communications from the examiner should be directed to IAN L LEMIEUX, whose telephone number is (571) 270-5796. The examiner can normally be reached Mon - Fri 9:00 - 6:00 EST. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chan Park, can be reached at 571-272-7409. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/IAN L LEMIEUX/
Primary Examiner, Art Unit 2669

Prosecution Timeline

Apr 19, 2024
Application Filed
Feb 26, 2026
Non-Final Rejection — §101, §102, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602825
Human body positioning method based on multi-perspectives and lighting system
2y 5m to grant Granted Apr 14, 2026
Patent 12592086
POSE DETERMINING METHOD AND RELATED DEVICE
2y 5m to grant Granted Mar 31, 2026
Patent 12586397
METHOD AND APPARATUS EMPLOYING FONT SIZE DETERMINATION FOR RESOLUTION-INDEPENDENT RENDERED TEXT FOR ELECTRONIC DOCUMENTS
2y 5m to grant Granted Mar 24, 2026
Patent 12579840
BEHAVIOR ESTIMATION DEVICE, BEHAVIOR ESTIMATION METHOD, AND RECORDING MEDIUM
2y 5m to grant Granted Mar 17, 2026
Patent 12573086
CONTROL METHOD, RECORDING MEDIUM, METHOD FOR MANUFACTURING PRODUCT, AND SYSTEM
2y 5m to grant Granted Mar 10, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
87%
Grant Probability
97%
With Interview (+9.6%)
2y 4m
Median Time to Grant
Low
PTA Risk
Based on 569 resolved cases by this examiner. Grant probability derived from career allow rate.

Free tier: 3 strategy analyses per month