Last updated: May 29, 2026
Application No. 18/177,733
IMAGE ENCODING DEVICE, IMAGE ENCODING METHOD, IMAGE ENCODING PROGRAM, IMAGE DECODING DEVICE, IMAGE DECODING METHOD, IMAGE DECODING PROGRAM, IMAGE PROCESSING DEVICE, LEARNING DEVICE, LEARNING METHOD, LEARNING PROGRAM, SIMILAR IMAGE SEARCH DEVICE, SIMILAR IMAGE SEARCH METHOD, AND SIMILAR IMAGE SEARCH PROGRAM

Final Rejection §102 §103
Filed: Mar 02, 2023
Priority: Sep 15, 2020, JP 2020-154532 (+1 more)
Examiner: DRYDEN, EMMA ELIZABETH
Art Unit: 2677
Tech Center: 2600 (Communications)
Assignee: Fujifilm Corporation
OA Round: 2 (Final)
Interview Optional

+30.0% interview lift. An interview has already been conducted in this application's prosecution history. This examiner has a 62% grant rate with a +30.0% interview lift. Since an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.
Based on 13 resolved cases, 2023–2026
Examiner Intelligence

DRYDEN, EMMA ELIZABETH
Grants 62% of resolved cases
Career Allowance Rate
8 granted / 13 resolved
-0.5% vs TC avg
Strong +30% interview lift
Interview Lift: +30.0% (resolved cases with interview vs. without)
Typical timeline
2y 10m
Avg Prosecution
14 currently pending
Statute-Specific Performance

§101: 2.9% (-37.1% vs TC avg)
§103: 95.2% (+55.2% vs TC avg)
§102: 1.0% (-39.0% vs TC avg)
§112: 1.0% (-39.0% vs TC avg)
Based on career data from 13 resolved cases
Office Action

§102 §103
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Receipt is acknowledged that application claims priority to foreign application with application number JP2020-154532 dated 09/15/2020. Copies of certified papers required by 37 CFR 1.55 have been received. Priority is acknowledged under 35 USC 119(e) and 37 CFR 1.78.
Response to Amendment
The amendment filed 10/30/2025 has been entered. Applicant’s amendments to the claims have overcome each and every objection and 35 U.S.C. 112 rejection previously set forth in the Non-Final Office Action mailed 08/07/2025. Claims 1-18 remain pending in the application.
Response to Arguments
Applicant's arguments regarding claim 1, on pg. 14-16 of the remarks filed 10/30/2025, have been fully considered but they are not persuasive. Applicant states that the unique and common features of Vorontsov refer to features in the latent representation used for disentanglement. Examiner agrees that Vorontsov teaches disentangling the common and unique features, as cited from paragraph 25. Applicant states that in the disclosure of Vorontsov, the features are not directly used for determining any condition of the image. Examiner disagrees. As shown in Figure 1 of Vorontsov, the unique features directly indicate an abnormal or normal condition of the image. By training the model with images wherein the common features are normal regions and the unique features are abnormal regions, the model is able to disentangle the two types of regions. Paragraph 20 from Vorontsov states: “Both images 114(0) and 114(1) include features typically associated with the brain, including, for example, cerebral hemispheres, fissures, and gyri. These features (indicated with a “C”) are common to both images 114(0) and 114(1). Additionally, images 114(0) include an abnormality that could be, for example, a brain tumor. This feature (indicated with a “U”) uniquely occurs in images 114(0).” 
Applicant argues that the claimed invention differs from Vorontsov because “the first feature amount and the second feature amount in the claimed invention are values used for actual determination of the image condition (i.e., whether a region is normal or abnormal), rather than abstract or intermediate feature representations. Accordingly, the claimed invention enables explicit and interpretable judgement between different image conditions based on the derived features amounts, instead of relying on latent or non-interpretable features.” In response to applicant's argument that the references fail to show certain features of the invention, it is noted that the features upon which applicant relies (i.e., details about the determination of image condition) are not recited in the rejected claim(s). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
Applicant argues that Vorontsov fails to teach the following claim limitations: 

[Image: media_image1.png, greyscale reproduction of the claim limitations at issue]

In the disclosure of Vorontsov, as demonstrated by paragraph 20 above, common features are those considered to be normal to the dataset. For example, it is normal for a brain to have fissures and gyri. Unique features are abnormal features, such as the brain tumor provided in the example. Thus, since the encoder disentangles image data into an amount of unique features and an amount of common features, the first and second feature amounts, indicating abnormal and normal regions respectively, are taught. Additionally, an intended use of the Vorontsov disclosure is to indicate where unique features are located (FIG 6 and para 46-48 of Vorontsov). Because the segmentation relies on the encoded features, the encoded features must distinguish an abnormal region from a normal one.
In view of the foregoing, the rejection of claim 1 is maintained. For the same reasons, the double patenting rejection for claims 10, 14, and 18 is maintained.
Applicant's arguments regarding claim 3, on pg. 16-17 of the remarks filed 10/30/2025, have been fully considered but they are not persuasive. Applicant argues that the relied upon teachings of Gârbacea “does not involve any operation for determining image abnormality or normality based on derived feature amounts as required in the claimed invention” and “cannot be reasonably be interpreted as disclosing or suggestion the claimed image encoding process for abnormality determination”. Claim 3 is rejected with the combination of Vorontsov in view of Gârbacea. One cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).
In response to applicant's argument that it would not have been obvious for a skilled person to combine Gârbacea into other cited references to arrive at claim 3, the fact that the inventor has recognized another advantage which would flow naturally from following the suggestion of the prior art cannot be the basis for patentability when the differences would otherwise be obvious. See Ex parte Obiaya, 227 USPQ 58, 60 (Bd. Pat. App. & Inter. 1985). Further, the use of vector-quantized autoencoder architecture in image analysis is not novel, as demonstrated by the last paragraph on pg. 1 from Gârbacea, cited in the Non-Final Office Action and claim 1 rejection, and the NPL disclosure from van den Oord, cited in the pertinent art section below. In view of the foregoing, the rejection of claim 3 is maintained.
Claim Objections
Claims 9-10 are objected to because of the following informalities:
In claim 9, two instances of “at least one third processor” should read “at least 
In claim 10, two instances of “at least one fourth processor” should read “at least one second processor”. Based on its dependence on claim 1, the processor of claim 10 is the at least one second processor. 
Appropriate correction is required.
Claim Interpretation
Regarding claims 5, 10, 12, 14, 16, and 18, which each reference “the image encoding device according to claim 1”, each claim will be interpreted as including all limitations of claim 1. 
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to final Office action, see 37 CFR 1.113(c). A request for reconsideration while not provided for in 37 CFR 1.113(c) may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Instant claims 10, 14, and 18 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claim 1 of co-pending Application No. 18/539,303 in view of Vorontsov et al. (U.S. Patent Application Publication No. 2022/0254029 A1), hereinafter Vorontsov.
Regarding instant claim 10, co-pending claim 1 discloses a similar image search device comprising: at least one processor; wherein the processor is configured to derive a first feature amount and a second feature amount for a query image (finding feature amount and normal feature amount in co-pending claim 1), to derive a similarity between the query image and each of a plurality of reference images on the basis of at least one of the first feature amount or the second feature amount derived from the query image with reference to an image database in which a first feature amount and a second feature amount for each of the plurality of reference images are registered in association with each of the plurality of reference images, and to extract a reference image that is similar to the query image as a similar image from the image database on the basis of the similarity.
However, co-pending claim 1 fails to disclose the image encoding device according to claim 1 (instant claim 1) and to derive a first feature amount and a second feature amount for a query image using the image encoding device. In the similar art, Vorontsov discloses the image encoding device according to claim 1, which derives a first feature amount and a second feature amount for an image using the image encoding device. Regarding claim 1, Vorontsov teaches an image encoding device (Vorontsov, encoder of the model, and related computing components, see Figure 2 and 7, para 45: “encoder 200 of FIG. 2”) comprising:
at least one processor (Vorontsov, para 56: “CPU 702 is the master processor of computer system 700, controlling and coordinating operations of other system components”),
wherein the processor is configured to encode a target image (Vorontsov, input image, para 45: “encoder 200 of FIG. 2 encodes an input image into a latent space to disentangle common features and unique features that may occur in the input image”, See Fig. 6) to derive at least one first feature amount indicating an image feature for an abnormality of a region of interest included in the target image (Vorontsov, para 45: “unique features”; See also Fig. 2, para 25: “unique features Up”; See unique feature, an abnormality in image 114(0) in Fig. 1) and to encode the target image to derive at least one second feature amount indicating an image feature for an image in a case in which the region of interest included in the target image is a normal region (Vorontsov, para 45: “common features”; See 122(0) in Fig. 1, para 21: “common features from the latent space to generate translated images 122(0), which lack the unique features”).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified co-pending claim 1 to incorporate the teachings of Vorontsov to encode normal and abnormal image feature amounts separately, for readily available analysis of the distinct regions in the image (para 17: “features of interest can be readily identified within images without needing to perform a complex manual process to generate training data. Another technological advantage of the disclosed techniques relative to the prior art is that fewer reference segmentations are needed to train the neural network compared to conventional approaches, thereby simplifying and expediting the training process”).
Regarding instant claims 14 and 18, the claimed limitations are taught by co-pending claim 1 in combination with Vorontsov, in the same way as instant claim 1 above.
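For illustration only, the similar image search recited in instant claim 10 can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the application or of the co-pending claim: the cosine similarity measure, the averaging of the two similarities, the function names, and the toy database are all hypothetical. The claim language requires only that a similarity be derived from at least one of the first or second feature amounts with reference to a database of registered reference images.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search_similar(query, database, use_first=True, use_second=True, top_k=1):
    """Rank reference images by similarity to the query.

    query:    (first_feature_amount, second_feature_amount)
    database: dict mapping an image name to the same kind of pair
    The similarity uses at least one of the two feature amounts
    (assumption: the selected similarities are simply averaged).
    """
    if not (use_first or use_second):
        raise ValueError("at least one feature amount must be used")
    scored = []
    for name, (ref_first, ref_second) in database.items():
        parts = []
        if use_first:
            parts.append(cosine(query[0], ref_first))
        if use_second:
            parts.append(cosine(query[1], ref_second))
        scored.append((sum(parts) / len(parts), name))
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]


# Hypothetical database: name -> (abnormality features, normal features)
database = {
    "ref_a": ([1.0, 0.0], [0.0, 1.0]),
    "ref_b": ([0.0, 1.0], [1.0, 0.0]),
}
best = search_similar(([0.9, 0.1], [0.1, 0.9]), database)
```

Restricting the search to the first feature amount alone (use_second=False) would rank references by abnormality features only, which is one way to read the "at least one of" wording.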
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-2, 4-6, 8, 11-12, and 15-16 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Vorontsov.

Regarding claim 1, Vorontsov teaches an image encoding device (Vorontsov, encoder of the model, and related computing components, see Figure 2 and 7, para 45: “encoder 200 of FIG. 2”) comprising:
at least one first processor (Vorontsov, para 56: “CPU 702 is the master processor of computer system 700, controlling and coordinating operations of other system components”),
wherein the at least one first processor is configured to encode a target image (Vorontsov, input image, para 45: “encoder 200 of FIG. 2 encodes an input image into a latent space to disentangle common features and unique features that may occur in the input image”, See Fig. 6) to derive at least one first feature amount indicating an image feature for an abnormality of a region of interest included in the target image (Vorontsov, para 45: “unique features”; See also Fig. 2, para 25: “unique features Up”; See unique feature, an abnormality in image 114(0), in Fig. 1) and to encode the target image to derive at least one second feature amount indicating an image feature for an image in a case in which the region of interest included in the target image is a normal region (Vorontsov, para 45: “common features”; See 122(0) in Fig. 1, para 21: “common features from the latent space to generate translated images 122(0), which lack the unique features”).


[Image: media_image2.png, greyscale]

Regarding claim 2 (dependent on claim 1), Vorontsov teaches wherein a combination of the first feature amount and the second feature amount indicates an image feature for the target image (Vorontsov, combining the first and second feature amounts indicates an image feature for the input image of both normal and abnormal features).

Regarding claim 4 (dependent on claim 1), Vorontsov teaches wherein the at least one first processor is configured to derive the first feature amount and the second feature amount, using an encoding learning model (Vorontsov, encoder, para 45: “encoder 200 of FIG. 2”) which has been trained to derive the first feature amount and the second feature amount in a case in which the target image is input (Vorontsov, para 21: “In one embodiment, training engine 110 trains neural network 120 to encode images 114(0) and 114(1) into a latent space based on weakly-labeled training data 112. The latent space disentangles unique features and common features”, see Fig. 2 where encoder is part of neural network 120).

Regarding claim 5, Vorontsov teaches an image decoding device (Vorontsov, decoders of the model, and related computing components, see Figure 2 and 7, para 23: “common decoder 210, and a residual decoder 220”) comprising:
at least one second processor (Vorontsov, Vorontsov teaches a plurality of processors to execute the methods performed by the device, see FIG. 8/para 58 and claim 27 of Vorontsov; para 111: “Such processors may be, without limitation, general purpose processors, special-purpose processors, application-specific processors, or field-programmable gate arrays”), wherein the at least one second processor is configured to extract a region corresponding to a type of the abnormality of the region of interest in the target image (Vorontsov, Step 608 in Figure 6, para 48: “At step 608, neural network 120 configures residual decoder 220 to generate a segmentation mask indicating locations of unique features in the input image”) on the basis of the first feature amount (Vorontsov, decodes UP, para 25: “Residual decoder 220 decodes unique features Up based on common features CP to generate ΔPA. ΔPA is a translated version of XP that no longer includes the common features”) derived from the target image by the image encoding device according to claim 1 (See Figure 2 wherein Up is input from encoder to decoder 220, see also claim 1 rejection regarding the image encoding device).

Regarding claim 6 (dependent on claim 5), Vorontsov teaches wherein the at least one second processor is configured to derive a first reconstructed image obtained by reconstructing an image feature for an image in a case in which the region of interest in the target image is a normal region (Vorontsov, Step 604 in Figure 6, para 46: “At step 604, common decoder 210 of FIG. 2 decodes common features from the latent space to generate a translated image that lacks unique features”) on the basis of the second feature amount (Vorontsov, decodes CP, para 25: “Common decoder 210 decodes common features Cp to generate XPA. XPA is a translated version of XP that no longer includes the unique features”) and to derive a second reconstructed image obtained by reconstructing an image feature for the target image on the basis of the first feature amount and the second feature amount (Vorontsov, para 16: “The neural network combines the translated images with the image deltas to generate combined images that may include both common features and unique features”; para 25: “ΔPA is expressed as a residual difference between images and may include pixel values that, when combined with XPA, produce a translated image XPP”; Refer to Figure 2 where the combined image is based on CP and Up, or the first and second feature amounts. XPA and ΔPA are based on the decoding of features CP and Up).
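The relationship among the decoded outputs cited above (XPA from the common decoder, ΔPA from the residual decoder, and the combined image XPP) can be illustrated with a toy sketch. The sketch assumes that "combined" means elementwise addition and that a segmentation mask is obtained by thresholding the residual. Neither assumption is stated in the passages quoted from Vorontsov, and the pixel values, threshold, and function names are hypothetical.

```python
def combine(x_pa, delta_pa):
    """Elementwise sum of the translated image and the residual image.

    Assumption: 'combined' in the cited passage is modeled as addition.
    """
    return [
        [a + d for a, d in zip(row_a, row_d)]
        for row_a, row_d in zip(x_pa, delta_pa)
    ]


def segmentation_mask(delta_pa, threshold=0.25):
    """Mark pixels whose residual magnitude exceeds a threshold.

    Assumption: thresholding stands in for the mask generation step.
    """
    return [[1 if abs(d) > threshold else 0 for d in row] for row in delta_pa]


# Toy 2x2 images: x_pa holds only common (normal) content,
# delta_pa holds the residual for one unique (abnormal) pixel.
x_pa = [[0.25, 0.25], [0.25, 0.25]]
delta_pa = [[0.0, 0.0], [0.0, 0.5]]

x_pp = combine(x_pa, delta_pa)      # reconstruction with both feature types
mask = segmentation_mask(delta_pa)  # location of the unique feature
```

In this toy case the first reconstructed image corresponds to x_pa and the second reconstructed image to x_pp, with the mask flagging the single pixel where the two differ.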

Regarding claim 8, Vorontsov teaches an image processing device comprising: 
an image encoding device (Vorontsov, encoder of the model, and related computing components, see Figure 2 and 7, para 45: “encoder 200 of FIG. 2”) comprising at least one first processor (Vorontsov, para 56: “CPU 702 is the master processor of computer system 700, controlling and coordinating operations of other system components”) configured to encode a target image (Vorontsov, input image, para 45: “encoder 200 of FIG. 2 encodes an input image into a latent space to disentangle common features and unique features that may occur in the input image”, See Fig. 6) to derive at least one first feature amount indicating an image feature for an abnormality of a region of interest included in the target image (Vorontsov, para 45: “unique features”; See also Fig. 2, para 25: “unique features Up”; See unique feature, an abnormality in image 114(0), in Fig. 1) and to encode the target image to derive at least one second feature amount indicating an image feature for an image in a case in which the region of interest included in the target image is a normal region (Vorontsov, para 45: “common features”; See 122(0) in Fig. 1, para 21: “common features from the latent space to generate translated images 122(0), which lack the unique features”); and 
an image decoding device (Vorontsov, decoders of the model, and related computing components, see Figure 2 and 7, para 23: “common decoder 210, and a residual decoder 220”) comprising at least one second processor (Vorontsov, Vorontsov teaches a plurality of processors to execute the methods performed by the device, see FIG. 8/para 58 and claim 27 of Vorontsov; para 111: “Such processors may be, without limitation, general purpose processors, special-purpose processors, application-specific processors, or field-programmable gate arrays”) configured to extract a region corresponding to a type of the abnormality of the region of interest in the target image (Vorontsov, Step 608 in Figure 6, para 48: “At step 608, neural network 120 configures residual decoder 220 to generate a segmentation mask indicating locations of unique features in the input image”) on the basis of the first feature amount (Vorontsov, decodes UP, para 25: “Residual decoder 220 decodes unique features Up based on common features CP to generate ΔPA. ΔPA is a translated version of XP that no longer includes the common features”) derived from the target image by the image encoding device (See Figure 2 wherein Up is input from encoder to decoder 220).

Regarding claim 11, all claim limitations are met by Vorontsov because the method steps of claim 11 are the same as claim 1.

Regarding claim 12, all claim limitations are met by Vorontsov because the method steps of claim 12 are the same as claim 5.

Regarding claim 15, Vorontsov teaches a non-transitory computer-readable storage medium that stores an image encoding program that causes a computer to execute (Vorontsov, para 52: “system disk 714 that may be configured to store content and applications and data for use by CPU 702 and parallel processing subsystem 712. In one embodiment, system disk 714 provides non-volatile storage for applications and data”; see the executed encoding program steps below):
a procedure of encoding a target image to derive at least one first feature amount indicating an image feature for an abnormality of a region of interest included in the target image (Vorontsov, para 45: “encoder 200 of FIG. 2 encodes an input image into a latent space to disentangle common features and unique features that may occur in the input image”, See Fig. 6; para 45: “unique features”; See also Fig. 2, para 25: “unique features Up”; See unique feature, an abnormality in image 114(0) in Fig. 1); and
a procedure of encoding the target image to derive at least one second feature amount indicating an image feature for an image in a case in which the region of interest included in the target image is a normal region (Vorontsov, para 45: “common features”; See 122(0) in Fig. 1, para 21: “common features from the latent space to generate translated images 122(0), which lack the unique features”).

Regarding claim 16, Vorontsov teaches a non-transitory computer-readable storage medium that stores an image decoding program that causes a computer to execute (Vorontsov, para 52: “system disk 714 that may be configured to store content and applications and data for use by CPU 702 and parallel processing subsystem 712. In one embodiment, system disk 714 provides non-volatile storage for applications and data”; see the executed decoding program steps below):
a procedure of extracting a region corresponding to a type of the abnormality of the region of interest in the target image on the basis of the first feature amount derived from the target image by the image encoding device according to claim 1 (Vorontsov, para 25: “Residual decoder 220 decodes unique features Up based on common features CP to generate ΔPA. ΔPA is a translated version of XP that no longer includes the common features”; See Figure 2 wherein Up is input from encoder to decoder 220; See also claim 1 rejection regarding the image encoding device).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Vorontsov in view of Gârbacea et al. (Gârbacea, C., van den Oord, A., Li, Y., Lim, F. S., Luebs, A., Vinyals, O., & Walters, T. C. (2019, May). Low bit-rate speech coding with VQ-VAE and a WaveNet decoder. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 735-739). IEEE.), hereinafter Gârbacea.

Regarding claim 3 (dependent on claim 1), Vorontsov teaches further comprising:
a storage that stores at least one first feature vector (Vorontsov teaches first and second feature data, CP and UP) indicating a representative image feature for the abnormality of the region of interest and at least one second feature vector indicating a representative image feature for the image in a case in which the region of interest is the normal region (Vorontsov, storage for data, para 52: “system disk 714 that may be configured to store content and applications and data for use by CPU 702 and parallel processing subsystem 712. In one embodiment, system disk 714 provides non-volatile storage for applications and data”),
wherein the at least one first processor is configured to derive the first feature amount and the second feature amount (Vorontsov, para 56, See claim 1 rejection), but fails to explicitly teach first and second feature vectors and configured to derive the first feature amount by substituting a feature vector indicating the image feature for the abnormality of the region of interest with a first feature vector, which minimizes a difference from the image feature for the abnormality of the region of interest, among the at least one first feature vector to quantize the feature vector indicating the image feature for the abnormality of the region of interest and to derive the second feature amount by substituting a feature vector indicating the image feature for the image in a case in which the region of interest is the normal region with a second feature vector, which minimizes a difference from the image feature for the image in a case in which the region of interest is the normal region, among the at least one second feature vector to quantize the feature vector indicating the image feature in a case in which the region of interest is the normal region (Vorontsov teaches encoding the images into a latent space, but fails to teach further details regarding feature vectors and deriving the feature amounts, para 21: “In one embodiment, training engine 110 trains neural network 120 to encode images 114(0) and 114(1) into a latent space based on weakly-labeled training data 112. The latent space disentangles unique features and common features”).
However, Gârbacea teaches an autoencoder (Gârbacea, bottom right paragraph on pg. 1: “VQ-VAE combines a variational autoencoder (VAE) [13] with a vector quantization (VQ) layer to produce a discrete latent representation which has been shown to capture important high-level features in image, …”), further disclosing to derive a feature amount by substituting a feature vector indicating an image feature (Gârbacea similarly teaches two separate types of data features, time-varying and non time-varying information, see section II-A on pg. 2) with a first feature vector, which minimizes a difference from the image feature, among the at least one first feature vector to quantize the feature vector indicating the image feature (Gârbacea, performed by the vector-quantized VAE, 3rd paragraph on pg. 1: “van den Oord et al. [12] demonstrate a learned autoencoder – the vector-quantized variational autoencoder (VQ-VAE) – which is able to encode speech into a compact discrete latent representation”) and to derive the second feature amount by substituting a feature vector indicating an image feature for the image with a second feature vector (Gârbacea teaches two separate types of data features, time-varying and non time-varying information, see section II-A on pg. 2), which minimizes a difference from the image feature for the image, among the at least one second feature vector to quantize the feature vector indicating the image feature for the (Gârbacea discloses two separate VQ-VAE codebooks for different inputs, Section II-A on pg. 2: “In its place, we add a latent representation (with an associated codebook) that takes its input from the whole utterance and does not vary over time. The time invariant code is generated by mean pooling over the time dimension of the encoder output and fed to a separate codebook. 
The expectation is that the network will learn to use the time-varying set of codes to encode the message content which varies over time, while summarising and passing speaker-related information through the separate non time-varying set of codes”; bottom left paragraph on pg. 3: “two latent maps each with a 256-element codebook”). Use of the vector-quantized VAE teaches first feature vector indicating a representative feature and second feature vector indicating a representative feature (Gârbacea, in the VQ-VAE vectors are quantized to obtain a discrete latent representation).
Gârbacea discloses the use of two latent maps/codebooks in a vector-quantized variational autoencoder. Although Gârbacea demonstrates use of their model with speech data, the VQ-VAE architecture has been demonstrated for use with image data as well (Gârbacea, last paragraph on pg. 1: “The VQ-VAE combines a variational autoencoder (VAE) [13] with a vector quantization (VQ) layer to produce a discrete latent representation which has been shown to capture important high-level features in image, audio and video data, yielding an extremely compact and semantically meaningful representation of the input”). It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the VQ-VAE architecture, including two discrete latent representations, of Gârbacea with the device of Vorontsov in order to capture high-level image features in the model (Gârbacea, last paragraph on pg. 1, see last citation).
Further, it would have been obvious to a person having ordinary skill in the art to utilize two discrete latent representations in the VQ-VAE to separately encode different types of features (Gârbacea, Section II-A on pg. 2: “The expectation is that the network will learn to use the time-varying set of codes to encode the message content which varies over time, while summarising and passing speaker-related information through the separate non time-varying set of codes”).
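
For illustration only, the quantization step discussed above (substituting a feature vector with the codebook vector that minimizes the difference from it, using a separate codebook for each type of feature) might be sketched as follows. This is a minimal sketch and not the implementation of Gârbacea, Vorontsov, or the application; the function names, the squared Euclidean distance, and the toy codebook values are all assumptions made for the example.

```python
from typing import Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (an assumed choice of 'difference')."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(feature: Vector, codebook: Sequence[Vector]) -> Tuple[int, Vector]:
    """Return the index and entry of the codebook vector nearest to `feature`."""
    index = min(
        range(len(codebook)),
        key=lambda i: squared_distance(feature, codebook[i]),
    )
    return index, codebook[index]


# Hypothetical toy codebooks: one per feature type, as in a two-codebook VQ-VAE.
first_codebook = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
second_codebook = [(0.0, 1.0), (3.0, 3.0)]

# Hypothetical encoder outputs for the two feature types.
first_index, first_feature_amount = quantize((0.9, 1.2), first_codebook)
second_index, second_feature_amount = quantize((2.6, 2.9), second_codebook)
```

Each encoder output is replaced by its nearest codebook entry, so the two resulting feature amounts are discrete and drawn from separate codebooks.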

Claims 7, 9, 13, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Vorontsov in view of Guida et al. (U.S. Patent Application Publication No. 2022/0076422 A1), hereinafter Guida.

Regarding claim 7 (dependent on claim 6), Vorontsov teaches wherein the at least one second processor is configured to derive an image corresponding to the type of the abnormality of the region of interest in the target image (Vorontsov, See claim 5 rejection), the first reconstructed image (Vorontsov, See claim 6 rejection), and the second reconstructed image (Vorontsov, See claim 6 rejection), using a decoding learning model (Vorontsov, neural network 120 components besides the encoder - decoders, para 23: “common decoder 210, and a residual decoder 220”, and software to produce translated image XPP, see para 25) which has been trained to derive the image corresponding to the type of the abnormality of the region of interest in the target image on the basis of the first feature amount (Vorontsov, para 21: “Training engine 110 further trains neural network 122(1) to decode the unique features (in conjunction with the common features) from the latent space to generate translated images 122(1), which lack the common features”; See 122(1) in Figure 1), to derive the first reconstructed image obtained by reconstructing the image feature for the image in a case in which the region of interest in the target image is the normal region on the basis of the second feature amount (Vorontsov, para 21: “Training engine 110 also trains neural network 120 to decode the common features from the latent space to generate translated images 122(0)”; See 122(0) in Figure 1), and to derive the second reconstructed image obtained by reconstructing the image feature of the target image on the basis of the first feature amount and the second feature amount (Vorontsov, para 16: “The neural network combines the translated images with the image deltas to generate combined images that may include both common features and unique features”; para 26: “training engine 110 of FIG. 1 includes a generative adversarial network (GAN) that discriminates between XP and XPP during training and modifies at least one of encoder 200, common decoder 210, and residual decoder 220 to reduce the difference between XP and XPP. Those familiar with autoencoder training will understand how the GAN can be implemented to improve image translation from XP to XPP”; Refer to Figure 2 where the combined image is based on CP and Up, or the first and second feature amounts. XPA and ΔPA are based on the decoding of features CP and Up).
	Vorontsov fails to teach wherein the image corresponding to the type of the abnormality of the region of interest in the target image is a label image.
	However, Guida teaches an encoder-decoder model (Guida, abstract: “encoder-decoder convolutional neural network architecture is employed to process multiparametric magnetic resonance images for the generation of cancer predication maps”) that derives a label image corresponding to the type of the abnormality of the region of interest in the target image (Guida, para 83: “use of encoder-decoder convolutional neural networks for the per-pixel classification of multiparametric MR prostate images for the detection of prostate cancer (prostate adenocarcinoma)”). It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the derived label image of Guida with the device of Vorontsov in order to classify the abnormality in the image (Guida, para 47: “Such a classifier, capable of additionally providing location and classification information with pixel-wise resolution, could result in a powerful aid to the clinician”). 
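
For illustration only, a label image of the kind produced by per-pixel classification can be pictured as taking, at every pixel, the class with the highest score. The following is a minimal sketch under that assumption; it is not Guida's network, and the function name, the three hypothetical classes (0 = normal, 1 and 2 = two abnormality types), and the score values are invented for the example.

```python
from typing import List

ScoreMap = List[List[float]]  # scores[y][x] for one class


def to_label_image(class_scores: List[ScoreMap]) -> List[List[int]]:
    """Assign each pixel the index of the class with the highest score."""
    height = len(class_scores[0])
    width = len(class_scores[0][0])
    return [
        [
            max(range(len(class_scores)), key=lambda c: class_scores[c][y][x])
            for x in range(width)
        ]
        for y in range(height)
    ]


# Hypothetical per-class score maps for a 2 x 2 image.
scores = [
    [[0.7, 0.2], [0.1, 0.5]],  # class 0: normal
    [[0.2, 0.7], [0.3, 0.4]],  # class 1: hypothetical abnormality type A
    [[0.1, 0.1], [0.6, 0.1]],  # class 2: hypothetical abnormality type B
]

label_image = to_label_image(scores)
```

The result has the same height and width as the input image and holds one class index per pixel, which is what distinguishes a label image from a reconstructed intensity image.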

Regarding claim 9, Vorontsov teaches a learning device (Vorontsov, para 45: “training engine 110”) that trains an encoding learning model in an image encoding device (Vorontsov, encoder of the model, and related computing components, see Figure 2 and 7, para 45: “encoder 200 of FIG. 2”) and a decoding learning model in an image decoding device (Vorontsov, decoders of the model, and related computing components, see Figure 2 and 7, para 23: “common decoder 210, and a residual decoder 220”), using training data consisting of a training image including a region of interest and a training label image corresponding to an abnormality of the region of interest in the training image (Vorontsov, para 45: “training engine 110 of FIG. 1 trains encoder 200 based on weakly-labeled training data 112. Weakly-labeled training data 112 includes images 114(0) and 114(1)”; see how the training images in FIG. 1 include a label for a unique feature in a region in the image), the learning device comprising: 
at least one processor, wherein the at least one processor is configured to derive a first learning feature amount and a second learning feature amount corresponding to the first feature amount and the second feature amount, respectively, from the training image using the encoding learning model (Vorontsov, encodes a feature amount of unique features and a feature amount of common features, para 21: “training engine 110 trains neural network 120 to encode images 114(0) and 114(1) into a latent space based on weakly-labeled training data 112. The latent space disentangles unique features and common features”), to derive a learning label image corresponding to the abnormality of the region of interest included in the training image on the basis of the first learning feature amount, to derive a first learning reconstructed image obtained by reconstructing an image feature for an image in a case in which the region of interest in the training image is a normal region on the basis of the second learning feature amount (Vorontsov, para 21: “Training engine 110 also trains neural network 120 to decode the common features from the latent space to generate translated images 122(0), which lack the unique features”), and to derive a second learning reconstructed image obtained by reconstructing an image feature for the training image on the basis of the first learning feature amount and the second learning feature amount, using the decoding learning model (Vorontsov, para 21: “Training engine 110 further trains neural network 122(1) to decode the unique features (in conjunction with the common features) from the latent space to generate translated images 122(1), which lack the common features. 
Translated images 122(1) may be represented as image differences, in some embodiments.”; a difference image requires both types of feature amounts), and to train the encoding learning model and the decoding learning model such that at least one of a first loss which is a difference between the first learning feature amount and a predetermined probability distribution of the first feature amount, a second loss which is a difference between the second learning feature amount and a predetermined probability distribution of the second feature amount, a third loss based on a difference between the training label image included in the training data and the learning label image as semantic segmentation for the training image, a fourth loss based on a difference between the first learning reconstructed image and an image outside the region of interest in the training image, a fifth loss based on a difference between the second learning reconstructed image and the training image (Vorontsov, loss term for input image and translated image, para 40: “In one embodiment, training engine 110 trains common decoder 210 and residual decoder 220 based on Objective Function 1 in order to improve image reconstruction. Objective Function 1 includes various loss terms that can be evaluated based on specific input images and translated images”; see para 21 citation above wherein the second learning reconstructed image is a translated image), or a sixth loss based on a difference between regions corresponding to an inside and an outside of the region of interest in the first learning reconstructed image and in the second learning reconstructed image satisfies a predetermined condition (Vorontsov, Objective functions satisfy the condition to minimize the difference between images in order to improve image reconstruction).
Vorontsov fails to teach a training label image corresponding to a type of an abnormality of the region of interest in the training image and to derive a learning label image corresponding to the type of the abnormality of the region of interest included in the training image on the basis of the first learning feature amount (emphasis added). However, Guida teaches an encoder-decoder model (Guida, abstract: “encoder-decoder convolutional neural network architecture is employed to process multiparametric magnetic resonance images for the generation of cancer predication maps”) that derives a label image corresponding to the type of the abnormality of the region of interest in the target image (Guida, para 83: “use of encoder-decoder convolutional neural networks for the per-pixel classification of multiparametric MR prostate images for the detection of prostate cancer (prostate adenocarcinoma)”). It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the derived label image of Guida with the training images of Vorontsov in order to classify the abnormality in the image (Guida, para 47: “Such a classifier, capable of additionally providing location and classification information with pixel-wise resolution, could result in a powerful aid to the clinician”). Doing so could further train the model to identify the type of abnormality, using the training labels.

Regarding claim 13, all claim limitations are met and rendered obvious by Vorontsov in view of Guida because the method steps of claim 13 are the same as those performed in claim 9.

Regarding claim 17, all claim limitations are met and rendered obvious by Vorontsov in view of Guida because the steps performed in claim 17 are the same as those of claim 9.

Claims 10, 14, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Vorontsov in view of Kondo et al. (U.S. Patent Application Publication No. 2013/0114867 A1), hereinafter Kondo.

Regarding claim 10, Vorontsov teaches a device (Vorontsov, neural network 120, and related computing components, see Figure 2 and 7, para 18: “neural network 120”) comprising:
at least one second processor (Vorontsov teaches a plurality of processors to execute the methods performed by the device, see FIG. 8, para 58, and claim 27 of Vorontsov; para 111: “Such processors may be, without limitation, general purpose processors, special-purpose processors, application-specific processors, or field-programmable gate arrays”),
wherein the at least one second processor is configured to derive a first feature amount and a second feature amount for a query image using the image encoding device (Vorontsov, for the input image, para 45: “encoder 200 of FIG. 2 encodes an input image into a latent space to disentangle common features and unique features that may occur in the input image”, See Fig. 6) according to claim 1 (Vorontsov, para 23: “neural network 120 includes an encoder 200”, see also claim 1 rejection regarding the image encoding device), 
but fails to teach wherein the device is a similar image search device, and to derive a similarity between the query image and each of a plurality of reference images on the basis of at least one of the first feature amount or the second feature amount derived from the query image with reference to an image database in which a first feature amount and a second feature amount for each of the plurality of reference images are registered in association with each of the plurality of reference images, and to extract a reference image that is similar to the query image as a similar image from the image database on the basis of the similarity.
However, Kondo teaches a similar image search device (Kondo, computer program for Figure 1 units, para 261: “Here, the computer program is configured by combining plural instruction codes indicating instructions for the computer, so as to allow execution of predetermined functions”), disclosing to derive a similarity between a query image (Kondo, target image; para 87: “The image feature quantity extracting unit 150 extracts a plurality of kinds of image feature quantities from the interpretation target image”) and each of a plurality of reference images (Kondo, images in the case database, see para 92 reference below) on the basis of at least one of the first feature amount or the second feature amount (Kondo, plurality of kinds of image feature quantities in para 87) derived from the query image (Kondo, para 92: “The similar case search unit 200 searches the case database 100 for a case including a medical image similar to the interpretation target image and registered in the case database 100 by weighting each of pairs of an image feature quantity of a kind extracted by the image feature quantity extracting unit 150 and an image feature quantity of the same kind extracted from the medical image included in the case”) with reference to an image database (Kondo, case database) in which a first feature amount and a second feature amount for each of the plurality of reference images are registered in association with each of the plurality of reference images (Kondo, multiple feature amounts may be determined, para 98: “The "image feature quantities" relate to, for example, the shapes of organs or lesion portions in medical images, or the luminance distributions of the medical images… As image feature quantities used in this embodiment, several ten to several hundred kinds of image feature quantities are predefined for each of medical imaging apparatuses (modality apparatuses) used to capture the medical images or each of target organs used for image interpretation”; 
feature quantities for each case image are stored, para 208: “corresponding one of the image feature quantities extracted from a medical image included in a case stored in the case database 100”), and to extract a reference image that is similar to the query image as a similar image from the image database on the basis of the similarity (Kondo, see para 92 reference above; para 59: “similar case searching apparatus which searches a case database for similar case data items including similar images similar to an interpretation target image”).
It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the similar image search device of Kondo with the device of Vorontsov in order to obtain similar medical images based on abnormal features in the images for clinicians to reference (Kondo, para 274: “One or more exemplary embodiments of the present disclosure are applicable to similar case searching apparatus which search and present similar cases provided to doctors for reference”).
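
For illustration only, the search described in the claim 10 limitations (deriving a similarity between a query image and each registered reference image from at least one of two feature amounts, then extracting the most similar reference images) might be sketched as follows. This is a minimal sketch and not Kondo's weighted search or the applicant's method; the cosine similarity measure, the averaging of the two similarities, the function names, and the database contents are all assumptions made for the example.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_similar(
    query_first: Vector,
    query_second: Vector,
    database: Dict[str, Tuple[Vector, Vector]],
    top_k: int = 1,
    use_first: bool = True,
    use_second: bool = True,
) -> List[Tuple[str, float]]:
    """Rank reference images by the mean similarity of the selected feature amounts."""
    if not (use_first or use_second):
        raise ValueError("at least one feature amount must be used")
    ranked = []
    for image_id, (ref_first, ref_second) in database.items():
        parts = []
        if use_first:
            parts.append(cosine_similarity(query_first, ref_first))
        if use_second:
            parts.append(cosine_similarity(query_second, ref_second))
        ranked.append((image_id, sum(parts) / len(parts)))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Hypothetical database: image id -> (first feature amount, second feature amount).
image_database = {
    "ref_a": ((1.0, 0.0), (0.0, 1.0)),
    "ref_b": ((0.0, 1.0), (1.0, 0.0)),
    "ref_c": ((1.0, 1.0), (0.0, 1.0)),
}

results = search_similar((1.0, 0.1), (0.0, 2.0), image_database, top_k=2)
```

Setting `use_first` or `use_second` to `False` restricts the comparison to a single feature amount, which mirrors the "at least one of" wording of the limitation.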

Regarding claim 14, all claim limitations are met and rendered obvious by Vorontsov in view of Kondo because the method steps of claim 14 are the same as those of claim 10.

	
Regarding claim 18, Vorontsov teaches a non-transitory computer-readable storage medium that stores a computer program (Vorontsov, para 52: “system disk 714 that may be configured to store content and applications and data for use by CPU 702 and parallel processing subsystem 712. In one embodiment, system disk 714 provides non-volatile storage for applications and data”). 
The similar image search program and the further limitations of claim 18 are met by Vorontsov in view of Kondo, as demonstrated for claim 10 above.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Van Den Oord et al. (introduced in the Non-Final Office Action of 08/07/2025 - Van Den Oord, A., & Vinyals, O. (2017). Neural discrete representation learning. Advances in neural information processing systems, 30.) discloses the Vector-Quantised-Variational AutoEncoder (See abstract on pg. 1).
Gao et al. (introduced in the Non-Final Office Action of 08/07/2025 - Gao, F., Zeng, H., & Huang, Z. (2019, December). Vector Quantization for Large Scale CBIR. In 2019 International Conference on Intelligent Computing, Automation and Systems (ICICAS) (pp. 133-137). IEEE.) discloses a content-based image retrieval system based on vector quantization (See abstract on pg. 1).
Nozaki et al. (introduced in the Non-Final Office Action of 08/07/2025 - U.S. Patent Application Publication No. 2021/0256701 A1) discloses an autoencoder that encodes normal and abnormal feature amounts and reconstructs an image in a case in which the region of interest is the normal region (para 53: “Additionally, when given an abnormal image, such as a reference image indicative of a particular severity level or an abnormal subject image, the trained model can generate a corresponding normal image. For instance, an abnormal image may be obtained and compressed with the encoder. Using the decoder, the input normal features (encodings) are restored, and the input abnormal features are not restored”); and a method for performing a similar image search (para 61: “The method 100 may acquire reference stomach data for other gastric disorders, which may be processed in a manner similar to those discussed in the present example”).
Yasutomi et al. (introduced in the Non-Final Office Action of 08/07/2025 - U.S. Patent Application Publication No. 2020/0226796 A1) discloses an encoder and two decoders that decode separate image features (See Figure 2, attached below).

    [Image: media_image3.png (PNG, greyscale, 273 x 708)]

Astaraki et al. (introduced in the Non-Final Office Action of 08/07/2025 - Astaraki, M., Toma-Dasu, I., Smedby, Ö., & Wang, C. (2019, October). Normal appearance autoencoder for lung cancer detection and segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 249-256). Cham: Springer International Publishing.) discloses a similar system (See Figure 1, attached below).

    [Image: media_image4.png (PNG, greyscale, 442 x 750)]


THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EMMA E DRYDEN whose telephone number is (571)272-1179. The examiner can normally be reached M-F 9-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ANDREW BEE, can be reached at (571) 270-5183. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/EMMA E DRYDEN/Examiner, Art Unit 2677
/ANDREW W BEE/Supervisory Patent Examiner, Art Unit 2677
Prosecution Timeline

Aug 07, 2025: Non-Final Rejection mailed (§102, §103)
Oct 30, 2025: Response Filed
Jan 12, 2026: Final Rejection mailed (§102, §103)
Mar 04, 2026: Interview Requested
Mar 16, 2026: Examiner Interview Summary
Mar 16, 2026: Applicant Interview (Telephonic)
Apr 13, 2026: Request for Continued Examination
Apr 15, 2026: Response after Non-Final Action
Precedent Cases

Applications granted by this same examiner with similar technology

18/340,515 (Patent 12632966): METHOD, ELECTRONIC DEVICE, AND COMPUTER PROGRAM PRODUCT FOR RECOGNIZING OBJECT REGIONS IN IMAGE. 2y 11m to grant; granted May 19, 2026.
18/171,522 (Patent 12561873): IMAGE PROCESSING APPARATUS AND METHOD. 3y 0m to grant; granted Feb 24, 2026.
17/641,440 (Patent 12543950): SLIT LAMP MICROSCOPE, OPHTHALMIC INFORMATION PROCESSING APPARATUS, OPHTHALMIC SYSTEM, METHOD OF CONTROLLING SLIT LAMP MICROSCOPE, AND RECORDING MEDIUM. 3y 11m to grant; granted Feb 10, 2026.
17/951,249 (Patent 12526379): AUTOMATIC IMAGE ORIENTATION VIA ZONE DETECTION. 3y 3m to grant; granted Jan 13, 2026.
17/934,618 (Patent 12340443): METHOD AND APPARATUS FOR ACCELERATED ACQUISITION AND ARTIFACT REDUCTION OF UNDERSAMPLED MRI USING A K-SPACE TRANSFORMER NETWORK. 2y 9m to grant; granted Jun 24, 2025.
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 62%
With Interview (+30.0%): 92%
Median Time to Grant: 2y 10m (~0m remaining)
PTA Risk: Moderate
Based on 13 resolved cases by this examiner. Grant probability derived from career allowance rate.