DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendments
Applicant’s preliminary amendments filed August 24th, 2023, to the Specification and Claims have been entered. In the preliminary remarks and amendments, claims 1-15 are amended. Accordingly, claims 1-15 are currently pending for examination in Application No. 18/547,855 filed August 24th, 2023.
Priority
Acknowledgment is made of the present application’s status as a U.S. National Stage filing under 35 U.S.C. § 371 of International Application No. PCT/EP2022/053756, filed on February 16th, 2022, which claims priority to foreign Patent Application No. EP 21159750.5, filed on February 26th, 2021.
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed as to foreign Patent Application No. EP 21159750.5, filed on February 26th, 2021.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on August 24th, 2023, is in compliance with the provisions of 37 CFR 1.97. Accordingly, the IDS is being considered by the examiner, and a signed copy is attached.
Drawings
The drawings are objected to because of the following informalities:
Figs. 4-8 are objected to as depicting block diagrams without “readily identifiable” descriptors for each block, as required by 37 CFR 1.84(n). Rule 84(n) requires “labeled representations” of graphical symbols, such as blocks, and provides that symbols “not universally recognized may be used, subject to approval by the Office, if they are not likely to be confused with existing conventional symbols, and if they are readily identifiable.” In the case of Figs. 4-8, the blocks are not readily identifiable per se and therefore require the insertion of text identifying the function of each block. That is, each vacant block should be provided with a corresponding label identifying its function or purpose. The examiner respectfully suggests that applicant include descriptive labels to correct this informality.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Claim Objections
Claims 8 and 14 are objected to because of the following informalities, which fail to comply with the requirement of 37 CFR 1.71(a) for "full, clear, concise, and exact terms" (see MPEP § 608.01(m)):
In claim 8, “The method according to of claim 1…” should be “The method [[according to ]]of claim 1…”; and
In claim 14, “output the synthetic image via the output unit” should be “output the synthetic image[[ via the output unit]]”.
Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 3-4 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Regarding claim(s) 3 and 4, the term “preferably” in the limitation “preferably a convolutional neural network” in each of these claims renders these claims indefinite because it is unclear whether the limitation(s) following the term are part of the claimed invention. See MPEP § 2173.05(d). For examination purposes, the limitation “preferably a convolutional neural network” in each of these claims will be read as “[[preferably]]the artificial neural network is a convolutional neural network”.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-6, 8-11, and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou et al. (Zhou; US 2019/0220977 A1; provided in Applicant’s IDS submitted on August 24th, 2023) in view of Liu et al. (Liu; “Multimodal MR Image Synthesis Using Gradient Prior and Adversarial Learning,” 2020).
Regarding claim 1, Zhou discloses a computer-implemented method comprising:
providing an actor-critic framework comprising an actor and a critic (para(s). [0050], recite(s)
[0050] “A generative adversarial network (GAN) is described above and illustrated in FIG. 2. The GAN can be extended to cross-domain medical image synthesis in which a synthesized medical image J in a target domain B is generated from an input medical image I is a source domain A. FIG. 6 illustrates a generative adversarial network (GAN) for cross-domain medical image synthesis according to an embodiment of the present invention. As shown in FIG. 6, the GAN includes a generator network G 600 and a discriminator network 610. An input image I 602 in a source domain is input to the generator G 600. The generator G 600 is a deep neural network that generates a synthesized output image J′ 604 in a target domain from the input image I 602. In an exemplary implementation, the generator G can be implemented using a DI2IN, such as the DI2IN 100 of FIG. 1. The synthesized output image J′ 604 and a real image J 606 in the target domain are input to a discriminator D 610. The discriminator D 610 is another deep neural network that distinguishes between the synthesized output imager J′ 604 and the real image J 606 in the target domain. In particular, the discriminator D 610 classifies each image as real (positive) or synthesized (negative). …”
, where a “generative adversarial network (GAN)” is an actor-critic framework comprising an actor (i.e., a “generator network”) and a critic (i.e., a “discriminator network”));
training the actor-critic framework based on training data comprising a multitude of datasets, each dataset comprising an input dataset and a corresponding ground truth image (para(s). [0030], recite(s)
[0030] “…In particular, a first set of M training pairs {(I_A^m, J_A^m)} from domain A and a second set of N training pairs {(I_B^m, J_B^m)} from domain B. For each domain, each training pair includes a training input image I from that domain and a corresponding ground truth output image J that provides the results of the target medical image analysis task for the corresponding training input image I. In many cases, the set of training pairs from one domain will be much larger than the set of training pairs from the other domain (e.g., M>>N). In such cases, the training of the cross-domain DI2IN 400 and adversarial network is beneficial since knowledge learned from training the encoder for the domain with the larger set of training pairs is automatically integrated into the training of the encoder for the other domain with the smaller set of training pairs.”
, where the sets of “training pairs” are training data comprising a multitude of datasets, each dataset comprising an input dataset (e.g., “input image I”) and a corresponding ground truth image (e.g., “corresponding ground truth output image J”)), wherein training the actor-critic framework comprises:
training the actor to generate, for each dataset, at least one synthetic image from the input dataset (para(s). [0050]—see citation above—, where the “synthesized output image J’” is a synthetic image generated by training the actor (i.e., “generator network”)),
training the critic to:
receive the at least one synthetic image and/or the corresponding ground truth image (para(s). [0050]—see citation above—, where
[0050] “…The synthesized output image J′ 604 and a real image J 606 in the target domain are input to a discriminator D 610. …”
the critic (i.e., “discriminator network”) is trained to receive (e.g., as “input”) at least one synthetic image (e.g., “synthesized output image J’”) and the corresponding ground truth image (e.g., “real image J”)),
classify the received image(s) into one of two classes, the two classes comprising a first class and a second class, the first class comprising synthetic images, and the second class comprising ground truth images (para(s). [0050]—see citation above—, where
[0050] “… the discriminator D 610 classifies each image as real (positive) or synthesized (negative). …”
is classifying the received images into at least a first class comprising synthetic images (i.e., “synthesized (negative)”) and a second class comprising ground truth images (i.e., “real (positive)”)), and
output a classification result (para(s). [0050]—see citation above—, where classifying the received images as either “real (positive) or synthesized (negative)” is outputting a classification result)
wherein a loss function is used to minimize deviations between the at least one synthetic image and the corresponding ground truth image (para(s). [0050] further recite(s)
[0050] “…During training, the generator G 600 and the discriminator D 610 together play the following minimax game:
min_G max_D E_{J∼p_J}[log D(J)] + E_{I∼p_I}[log(1 − D(J′ = G(I)))]   (1)
The networks are trained end-to-end by iteratively adjusting the parameters (weights) of the discriminator D 610 and the generator G 600 to optimize the minimax objective function in Equation (1). In Equation (1), the first term is a cost related to classification of the real sample J 606 by the discriminator D 610 and the second term is a cost related to the synthesized sample J′ 604 by the discriminator D 610. The discriminator D 610 maximizes the function (i.e., trying its best to distinguish between the real and synthesized samples) and the generator G 600 minimizes the function (i.e., synthesize real looking samples to fool the discriminator). The generator G 600 and the discriminator D 610 evolve dynamically in the sense of learning better network parameters until they reach equilibrium, that is, the synthesized sample J′ 604 becomes indistinguishable (or as close as possible from being indistinguishable) from the real sample J 606 through the eyes of the discriminator D 610. …”
, where the objective cost function of Eq. (1) is a loss function minimizing the deviations between the synthetic (“synthesized”) image and the corresponding ground truth (“real”) image).
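For orientation only, the generator/discriminator pairing and the Eq. (1) objective described in the cited passages can be pictured with a minimal PyTorch sketch. Every module name, layer size, and image shape below is an illustrative assumption, not Zhou's actual DI2IN implementation:

```python
import torch
import torch.nn as nn

# Minimal sketch of the actor-critic (generator/discriminator) pairing and
# the Eq. (1) objective discussed above; all modules are placeholders.
actor = nn.Sequential(                      # "generator G": input image I -> synthetic image J'
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
critic = nn.Sequential(                     # "discriminator D": image -> probability image is real
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Flatten(), nn.LazyLinear(1), nn.Sigmoid(),
)

I = torch.randn(4, 1, 64, 64)               # input images I (source domain)
J = torch.randn(4, 1, 64, 64)               # ground truth images J (target domain)
J_synth = actor(I)                          # synthesized output images J'

# The two terms of the minimax objective in Eq. (1):
#   E_{J~p_J}[log D(J)]  +  E_{I~p_I}[log(1 - D(J' = G(I)))]
# The critic maximizes this sum; the actor minimizes the second term.
objective = torch.log(critic(J)).mean() + torch.log1p(-critic(J_synth)).mean()
```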
Where Zhou does not specifically disclose
…wherein a saliency map relating to the received image(s) is generated from the critic based on the classification result, and
wherein a loss function is used to minimize deviations between the at least one synthetic image and the corresponding ground truth image at least partially based on the saliency map;
Liu teaches in the same field of endeavor of synthetic image generation using an actor-critic framework
…wherein a saliency map relating to the received image(s) is generated from the critic based on the classification result (sections III(A)(2) on pg. 1180 and III(B)(2) on pg. 1181, recite(s)
[2) Gradient Detector] “Most GAN-based image synthesis models focus on pixel-to-pixel conversion, but for medical images, edge information contains important tissue, organ, and even lesion information. To better utilize the structural information in the discriminator to supervise the generator, we use the gradient maps of both synthesized image and real image. The gradient map of the synthesized image and the gradient map of original image are calculated using equation (2). The gradient map and the original image are then combined into an image pair as the input to the discriminator. The gradient information from the real image will be used as a priori information to guide the discriminator to distinguish between the real image and the synthesized image, forcing the generator to synthesize an image with similar structure to the real image. … The gradient map pair of the synthesized image pair and the real image pair are respectively used as the input to the discriminator.”
[2) Synthesis Loss] “…In medical image processing, image gradient information is important, and it contains important diagnostic information such as tissue structure or contour of the lesion area [26]. To protect this important diagnostic information, we added gradient loss in [13] to the synthesis loss. …
…In equation (9), the gradient difference loss calculates the horizontal and vertical gradient difference between the synthesized image and the ground truth. Minimizing the gradient difference loss attempts to keep the same texture information between the synthesized image and ground truth.”
, where the “gradient map” underlying the “gradient difference loss” is a saliency map generated from the critic (“discriminator”) based on the classification result (i.e., the “synthesis loss”)), and
wherein a loss function is used to minimize deviations between the at least one synthetic image and the corresponding ground truth image at least partially based on the saliency map (sections III(A)(2) on pg. 1180 and III(B)(2) on pg. 1181—see preceding citation above—, where the added loss of “gradient loss” is a loss function further used to minimize deviations between the at least one synthetic image (“synthesized image”) and the corresponding ground truth image (“ground truth”) at least partially based on the saliency map (“gradient difference”)).
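Liu's gradient-map and gradient-difference-loss mechanism can likewise be sketched in code. The simple horizontal and vertical finite differences below are an assumed stand-in for Liu's equations (2) and (9), whose exact form is not reproduced in the citation:

```python
import torch

def gradient_map(img: torch.Tensor):
    # Horizontal and vertical finite differences of an image batch (B, C, H, W),
    # an assumed stand-in for the gradient maps of Liu's equation (2).
    dh = img[..., :, 1:] - img[..., :, :-1]   # horizontal gradient
    dv = img[..., 1:, :] - img[..., :-1, :]   # vertical gradient
    return dh, dv

def gradient_difference_loss(synth: torch.Tensor, truth: torch.Tensor) -> torch.Tensor:
    # Penalizes differences between the gradient maps of the synthesized image
    # and the ground truth, in the spirit of Liu's equation (9); minimizing it
    # encourages the synthesized image to keep the ground truth's structure.
    sh, sv = gradient_map(synth)
    th, tv = gradient_map(truth)
    return (sh - th).abs().mean() + (sv - tv).abs().mean()
```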
Since Zhou also discloses using synthesis loss based on gradients of the at least one synthetic image and the corresponding ground truth (para(s). [0062], recite(s)
[0062] “In steps 908 and 910, it is practically found that rather than minimizing log(1−D), maximizing log(D) (minimizing −log(D)) leads to better gradient signals early in learning, yet both objective functions yield the same fixed point.”
, where the “objective functions” are synthesis losses), it would have been obvious to one of ordinary skill in the art before the effective filing date of the presently filed invention to modify the system of Zhou to incorporate a saliency map, generated from the critic based on the classification result related to the received images, and to minimize deviations between the at least one synthetic image and the corresponding ground truth image at least partially based on the saliency map, in order to improve generation of the synthetic image by minimizing the gradient loss between the at least one synthetic image and the corresponding ground truth image and thereby minimizing the loss of texture information in medical images, as taught by Liu above.
Regarding claim 2, Zhou in view of Liu discloses the method of claim 1, wherein Zhou further discloses the method further comprising:
storing the actor in a data storage (para(s). [0064] and [0081], recite(s)
[0064] “At step 914, once the stop condition is reached, the training ends. The trained first generator G 1 802 …are stored in a memory or storage of a computer system …”
[0081] “The above-described methods for cross-domain medical image analysis, cross-domain medical image synthesis, training deep neural networks for cross-domain medical image analysis, and training deep neural networks for cross-domain medical image synthesis may be implemented on a computer using well-known computer processors, memory units, storage devices, computer software, and other components. … The computer program instructions may be stored in a storage device 1112 (e.g., magnetic disk) and loaded into memory 1110 when execution of the computer program instructions is desired. Thus, the steps of the methods of FIGS. 3, 5, 7, 9, and 10 may be defined by the computer program instructions stored in the memory 1110 and/or storage 1112 and controlled by the processor 1104 executing the computer program instructions. …”
, where the “deep neural networks” includes the actor (e.g., a “first generator”)).
Regarding claim 3, Zhou in view of Liu discloses the method of claim 1, wherein Zhou further discloses the actor is or comprises an artificial neural network, preferably a convolutional neural network (para(s). [0048-0049], recite(s)
[0048] “Suppose that given an input image I of size m x n, we aim to synthesize an output image J of the same size. Note that we use 2D as a working example, but it is straightforward extend this to 3D or even higher dimensions. … Many machine learning methods can be used, including k-nearest neighbor, support vector regression, random regression forest, boosting regression, etc. Recently, neural networks, such as a convolutional neural network (CNN), have been used for learning such a mapping function for patch-based image synthesis. The benefits of using a CNN lie in its powerful hierarchical feature representation and efficient computation.”
[0049] “A deep DI2IN is a universal variant of CNN that has applications for medical image analysis tasks beyond image synthesis, including landmark detection, image segmentation, image registration etc. In a possible implementation, a deep image-to-image network (DI2IN), such as the DI2IN 100 of FIG. 1 described above, can be used for cross-domain medical image synthesis. In this case, the input image I is a medical image from a source domain and the output image J is medical image in a target domain.”
, where the generator (e.g., “deep DI2IN”) is at least a convolutional neural network (CNN)).
Regarding claim 4, Zhou in view of Liu discloses the method of claim 1, wherein Zhou further discloses the critic is or comprises an artificial neural network, preferably a convolutional neural network (para(s). [0048]—see citation in claim 3 above—, where para(s). [0071] further recite(s):
[0071] “At step 1000, in the training stage, a geometry-preserving GAN is trained for cross-domain medical image synthesis. The GAN includes a generator that is a deep neural network for generating a synthesized medical image is a target domain from an input medical image in a source domain, and a discriminator that is another deep neural network for distinguishing between synthesized medical images in the target domain generated by the generator and real medical images in the target domain. The GAN framework for cross-domain medical image synthesis is illustrated in FIG. 6. In an advantageous implementation, the generator of the geometry-preserving GAN can be implemented using a deep image-to-image network (DI2IN), such as the DI2IN illustrated in FIG. 1.”
, where the “discriminator that is another deep neural network” is at least a convolutional neural network as disclosed in para(s). [0048]).
Regarding claim 5, Zhou in view of Liu discloses the method of claim 1, wherein Liu further teaches the saliency map is generated by taking a gradient of the classification result with respect to the received image(s) (sections III(A)(2) on pg. 1180 and III(B)(2) on pg. 1181—see citations in claim 1 above—, where the “gradient difference between the synthesized image and the ground truth” is at least a saliency map generated by taking a gradient of the classification result with respect to the received image(s)).
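Under the claim 5 reading above, taking a gradient of the classification result with respect to the received image(s) corresponds to the familiar input-gradient saliency computation, sketched here as hypothetical code (neither reference recites this implementation):

```python
import torch

def input_gradient_saliency(critic, image: torch.Tensor) -> torch.Tensor:
    # Saliency map = gradient of the critic's scalar classification output
    # with respect to the input image (a hypothetical illustration).
    image = image.clone().requires_grad_(True)
    score = critic(image).sum()   # sum over the batch to obtain a scalar
    score.backward()
    return image.grad.abs()       # |d(classification result)/d(pixel)|
```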
Regarding claim 6, Zhou in view of Liu discloses the method of claim 1, wherein Liu further teaches the saliency map is generated from gradient maps related to the at least one synthetic image and to the corresponding ground truth image (sections III(A)(2) on pg. 1180 and III(B)(2) on pg. 1181—see citations in claim 1 above—, where the “gradient map of the synthesized image” is a gradient map related to the at least one synthetic image and the “gradient map of the original image” or “real image” is a gradient map of a corresponding ground truth image).
Regarding claim 8, Zhou in view of Liu discloses the method of claim 1, wherein Zhou further discloses the method further comprising:
receiving a new input dataset (para(s). [0051], recite(s)
[0051] “…Once the first and second deep neural networks for cross-domain bilateral medical image synthesis are trained in the training stage, the inference stage can be repeated for newly received medical images to generate synthesized medical images in a second domain from received medical images in a first domain and to generate synthesized medical images in the first domain from received medical images in the second domain.”
, where the “newly received medical images to generate synthesized medical images” is receiving a new input dataset);
inputting the new input dataset into the actor (para(s). [0052], recite(s)
[0052] “…As shown in FIG. 8, the bilateral GAN 800 includes a first generator network G 1 802 for generating synthesized images of a second domain (e.g., domain B) from input images of a first domain (e.g., domain A) …”
, where the “first generator network G” is an actor equivalent to the actor disclosed in para(s). [0050] in claim 1 above);
receiving from the actor a new synthetic image (para(s). [0051-0052]—see citations above—, where inputting the new input dataset as “input images” into an actor generates new “synthesized images” corresponding to the new input dataset); and
outputting the new synthetic image (para(s). [0051-0052]—see citations above—, where generating “synthesized images” corresponding to the new input dataset is outputting new synthetic images).
Regarding claim 9, Zhou in view of Liu discloses the method of claim 1, wherein Zhou further discloses each dataset of the multitude of datasets belongs to a subject or an object (para(s). [0030]—see citation in the claim 1 limitation “training the actor-critic framework based on training data…” above—, where para(s). [0021] further recite(s):
[0021] “Medical images can be acquired using different types of imaging devices, such as ultrasound, computed tomography (CT), and magnetic resonance imaging (MRI) image acquisition devices. Analysis of such medical images thus can benefit from leveraging shared knowledge from multiple domains. For example, consider the medical image analysis task of segmented target anatomical structures from medical images. The same anatomical structure, e.g., the liver, appearing in CT and MRI images for the same patient shares the same morphology though its appearance is different. Designing two independent segmentation pipelines, one for CT and the other for MRI, is suboptimal. Embodiments of the present invention provide a machine learning based method for cross-domain image analysis. In an advantageous embodiment of the present invention, a deep image-to-image network and adversarial network are used to train together deep neural networks to perform medical image analysis tasks for medical images from different domains, such that knowledge from one domain can improve the performance of the medical image analysis tasks in another domain. …”
, where the multitude of datasets are “medical images” belonging to at least a subject (e.g., a “patient”)).
Regarding claim 10, Zhou in view of Liu discloses the method of claim 9, wherein Zhou further discloses each subject is a patient and the ground truth image of each subject is at least one medical image of the patient (para(s). [0030] and [0021]—see citations in claim 9 above).
Regarding claim 11, Zhou in view of Liu discloses the method of claim 9, wherein Zhou further discloses the subject is a patient, and the input dataset comprises at least one medical image of the patient (para(s). [0030] and [0021]—see citations in claim 9 above).
Regarding claim 14, Zhou discloses a computer system, comprising one or more processors configured to:
receive an input dataset (para(s). [0030]—see citation in claim 1 limitation “training the actor-critic framework based on training data…” above—, where each input image (e.g., “input image I”) in each set of “training pairs” is an input dataset);
input the input dataset into a predictive machine learning model (para(s). [0050]—see citation in claim 1 limitation “providing an actor-critic framework…” above—, where the “generator network” is a predictive machine learning model receiving the input data set (i.e., each “input image I” in each set of “training pairs”) as input);
receive from the predictive machine learning model a synthetic image (para(s). [0050]—see citation in claim 1 limitation “providing an actor-critic framework…” above—, where the “synthesized output image J’” is a synthetic image received from training the predictive machine learning model (i.e., “generator network”)); and
output the synthetic image via the output unit (para(s). [0050]—see citation in claim 1 limitation “providing an actor-critic framework…” above—, where the “generator network” outputs the “synthesized output image J’”),
wherein the predictive machine learning model was trained in a training process to generate synthetic images from input datasets (para(s). [0050]—see citation in claim 1 limitation “wherein a loss function is used…” above—, where “optimiz[ing] the minimax objective function” is training the predictive machine learning model in a training process to generate synthetic images from input datasets), the training process comprising:
receiving training data comprising a multitude of datasets, each dataset comprising an input dataset and a corresponding ground truth image (para(s). [0030]—see similar limitation in claim 1, “training the actor-critic framework based on training data…”, above);
providing an actor-critic framework comprising an actor and a critic (para(s). [0050]—see similar limitation in claim 1, “providing an actor-critic framework…”, above);
training the actor-critic framework based on the training data (para(s). [0030]—see similar limitation in claim 1, “training the actor-critic framework based on training data…”, above), wherein training the actor-critic framework comprises:
training the actor to:
generate, for each dataset, at least one synthetic image from the input dataset (para(s). [0050]—see similar limitation in claim 1, “training the actor to generate, for each dataset, at least one synthetic image…”, above), and
output the at least one synthetic image (para(s). [0050]—see similar limitation in claim 1, “training the actor to generate, for each dataset, at least one synthetic image…”, above—, where the generated synthetic image is output by the actor (i.e., “generator network”)),
wherein the critic is trained to:
receive the at least one synthetic image and/or the corresponding ground truth image (para(s). [0050]—see similar limitation in claim 1, “receive the at least one synthetic image and/or…”, above), and
output a classification result for each received image, wherein the classification result indicates whether the received image is a synthetic image or a ground truth image (para(s). [0050]—see similar limitations in claim 1, “classify the received image(s) into one of two classes…” and “output a classification result…”, above—, where classifying the image as either “real (positive) or synthesized (negative)” is outputting a classification result indicating whether the received image is a ground truth image or a synthetic image, respectively)
wherein a loss function is used to minimize deviations between the at least one synthetic image and the corresponding ground truth image (para(s). [0050]—see similar limitation in claim 1, “wherein a loss function is used…”, above).
Where Zhou does not specifically disclose
…wherein a saliency map relating to the received image(s) is generated from the critic based on the classification result, and
wherein a loss function is used to minimize deviations between the at least one synthetic image and the corresponding ground truth image at least partially based on the saliency map;
Liu teaches in the same field of endeavor of synthetic image generation using an actor-critic framework
…wherein a saliency map relating to the received image(s) is generated from the critic based on the classification result (sections III(A)(2) on pg. 1180 and III(B)(2) on pg. 1181—see similar teaching in claim 1 above), and
wherein a loss function is used to minimize deviations between the at least one synthetic image and the corresponding ground truth image at least partially based on the saliency map (sections III(A)(2) on pg. 1180 and III(B)(2) on pg. 1181—see similar teaching in claim 1 above);
Claim 14 recites similar limitations to claim 1 and is rejected for similar rationale and reasoning (see the analysis for claim 1 above).
Regarding claim 15, the claim differs from claim 14 in that the claim is in the form of a non-transitory computer-readable storage medium storing instructions for generating a synthetic image. Zhou discloses said non-transitory computer-readable storage medium (para(s). [0064] and [0081]—see citations in claim 2 above—, where the “memory or storage” is a non-transitory computer-readable storage medium). Therefore, claim 15 recites similar limitations to claim 14 and is rejected for similar rationale and reasoning (see the analysis for claim 14 above).
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Zhou in view of Liu as applied to claim 1 above, and further in view of Mescheder et al. (Mescheder; “Which Training Methods for GANs do actually Converge?,” 2018).
Regarding claim 7, Zhou in view of Liu discloses the method of claim 1, wherein Mescheder teaches, in the same field of endeavor of generating synthetic images using actor-critic frameworks, that the loss function is computed by multiplying a pixel-wise loss function with the saliency map (section 2.1 on pg. 2, recite(s)
[Reproduced from Mescheder, section 2.1, pg. 2 (media_image1.png and media_image2.png): the GAN training objective, Eq. (1), and the gradient descent updates, Eq. (2).]
, where the gradient descent algorithm as depicted in Eq. (2) is multiplying a pixel-wise loss function (i.e., the objective loss function of a generative adversarial network (GAN) depicted in Eq. (1)) with a saliency map (e.g., the gradients ∇)).
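The training dynamics formalized in Mescheder's section 2.1, simultaneous gradient descent on the generator parameters and ascent on the discriminator parameters, can be sketched as follows (hypothetical code; Mescheder's exact notation and update rule are not reproduced here, and theta and psi are assumed to be tensors created with requires_grad=True):

```python
import torch

def simultaneous_gd_step(L, theta: torch.Tensor, psi: torch.Tensor, lr: float = 1e-3):
    # One simultaneous step on a two-player objective L(theta, psi):
    # the generator parameters theta descend L while the discriminator
    # parameters psi ascend it (a sketch of the gradient updates).
    g_theta, g_psi = torch.autograd.grad(L(theta, psi), (theta, psi))
    with torch.no_grad():
        theta -= lr * g_theta
        psi += lr * g_psi
    return theta, psi
```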
Since Zhou also discloses the pixel-wise loss function to train and optimize a GAN (Zhou; para(s). [0050]—see citation in claim 1 limitation “wherein a loss function…” above—, where para(s). [0054] further recites:
[0054] “…In an advantageous implementation, the first cost function C.sub.1 810 computes a pixel-wise (or voxel-wise) error between the each training image I in the first domain and the respective synthesized image I″ synthesized from the synthesized image J′ synthesized from that training image I. Since the synthesized image I″ can be traced back the original image I, they can be directly compared using a pixel-wise cost function to measure the consistency between the images. …”
, where Eq. (1) of Zhou is identical to Eq. (1) of Mescheder), it would have been obvious to one of ordinary skill in the art before the effective filing date of the presently filed invention to modify the system of Zhou in view of Liu to incorporate multiplying a pixel-wise loss function with the saliency map (e.g., a “gradient map” as disclosed in Liu) to train and optimize the actor-critic framework (i.e., the GAN) as commonly known in the art and as disclosed by Mescheder above.
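A loss of the kind claim 7 recites, a pixel-wise loss multiplied by a saliency map, might look like the following sketch; this particular weighting illustrates the claimed feature and is not code taken from Zhou, Liu, or Mescheder:

```python
import torch

def saliency_weighted_l1(synth: torch.Tensor, truth: torch.Tensor,
                         saliency: torch.Tensor) -> torch.Tensor:
    # Element-wise |error| between synthetic and ground truth images,
    # multiplied pixel-by-pixel by a saliency map so that deviations in
    # salient regions contribute more heavily to the loss.
    return (saliency * (synth - truth).abs()).mean()
```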
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Zhou in view of Liu as applied to claim 1 above, and further in view of Zhao et al. (Zhao; “Craniomaxillofacial Bony Structures Segmentation from MRI with Deep-Supervision Adversarial Learning,” 2018; provided in Applicant’s IDS submitted on August 24th, 2023).
Regarding claim 12, Zhou in view of Liu discloses the method of claim 1, wherein Zhou further discloses the input dataset of each dataset of the multitude of datasets comprises a medical image (para(s). [0030] and [0021]—see citations in claim 9 above), wherein the actor is trained to generate synthetically segmented medical images from the medical images (para(s). [0027], recite(s)
[0027] “…The cross-domain DI2IN 400 of FIG. 4 is trained to perform the target medical image analysis task for two domains (A and B) of medical images. Domains A and B can be any two domains of medical images, such as different medical imaging modalities (e.g., CT and MRI) or different image domains within the same medical imaging modality (e.g., T1-weighted MRI and T2-weighted MRI). The target medical image analysis task can be any medical image analysis task, such as landmark detection, anatomical object segmentation, etc. For example, in a possible implementation, the cross-domain DI2IN 400 can be trained to perform segmentation of an organ (e.g., the liver) in both CT and MRI images.”
, where training the “cross-domain DI2IN” to perform “anatomical object segmentation” is training the actor to generate synthetically segmented medical images from the medical images).
Where Zhou in view of Liu does not specifically disclose
…the multitude of datasets comprises a medical image and a segmented medical image;
Zhao teaches in the same field of endeavor of synthetic image generation using an actor-critic framework
…multitude of datasets comprises a medical image and a segmented medical image (sections 2 Method on pg. 2 and 2.2 on pg. 3, recite(s)
[2 Method] “In this section, we propose a cascaded generative adversarial network with deep-supervision discriminators (Deep-supGAN) to perform CMF bony structures segmentation from the MR image and generated CT image. The proposed framework is shown in Fig. 1. It includes two parts: (1) a simulation GAN that estimates a CT image from an MR image and (2) a segmentation GAN that segments the CMF bony structures based on both the original MR image and the generated CT image. …”
[2.2 Segmentation GAN] “Similarly, with the generated CT x′ from G_c(z), we can construct a segmentation GAN G_s(z, x′), which learns to predict a bony structures segmentation y′. Then, the ground-truth y and the predicted segmentation y′ are forwarded to the discriminator D_s(y) to get an evaluation. …”
, where the “original” and/or “ground truth” MR and/or CT images are medical images and the segmentations of the “original” images are segmented medical images).
Since Zhou and Zhao each disclose using an actor-critic framework generating synthetic images from one image domain to another for image segmentation, it would have been obvious to one of ordinary skill in the art before the effective filing date of the presently filed invention to modify the system of Zhou in view of Liu to further incorporate segmented medical images to train the actor for generating synthetically segmented medical images from the medical images as taught by Zhao above.
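Zhao's cascade, in which a simulation GAN produces a CT image x′ that, together with the MR input, feeds a segmentation GAN, can be pictured as two generators in series (a hypothetical sketch with single-layer placeholder modules, not Zhao's Deep-supGAN architecture):

```python
import torch
import torch.nn as nn

# Sketch of Zhao's cascade: G_c synthesizes a CT image x' from the MR image z;
# G_s then predicts a bony-structure segmentation y' from (z, x').
G_c = nn.Conv2d(1, 1, 3, padding=1)   # simulation GAN generator (placeholder)
G_s = nn.Conv2d(2, 1, 3, padding=1)   # segmentation GAN generator (placeholder)

z = torch.randn(1, 1, 64, 64)                                # MR image
x_synth = G_c(z)                                             # generated CT x'
y_pred = torch.sigmoid(G_s(torch.cat([z, x_synth], dim=1)))  # predicted segmentation y'
```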
Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Zhou in view of Liu as applied to claim 1 above, and further in view of Zaharchuk et al. (Zaharchuk; US 2019/0108634 A1; provided in Applicant’s IDS submitted on August 24th, 2023).
Regarding claim 13, Zhou in view of Liu discloses the method of claim 1, wherein Zhou further discloses the input dataset of each dataset of the multitude of datasets comprises a zero-contrast image, wherein the actor is trained to generate synthetic full-contrast images from the zero-contrast image (para(s). [0047], recite(s)
[0047] “As used herein, cross-domain synthesis refers to synthesis of medical images across medical imaging modalities, such as synthesizing a CT image from an MR image, as well as synthesis of images across an image domain, such MR images with different protocols (e.g., T1 and T2), contrast CT images and non-contrast CT images, CT image captured with low kV and CT images captured with high kV, or any type of low resolution medical image to a corresponding high resolution medical image. That is, the “source domain” and “target domain” may be completely different medical imaging modalities or different image domains or protocols within the same overall imaging modality.”
, where “contrast CT images” are full-contrast images and “non-contrast CT images” are zero-contrast images; such that the “source domain” is the “non-contrast” images and the “target domain” (e.g., domain of the synthetic image) is the “contrast” images).
Where Zhou in view of Liu does not specifically disclose
…multitude of datasets comprises a zero-contrast image, a low-contrast image, and a full-contrast image, wherein the actor is trained to generate synthetic full-contrast images from the zero-contrast and the low-contrast images;
Zaharchuk teaches in the same field of endeavor of generating synthetic full-contrast images from no-contrast images
…multitude of datasets comprises a zero-contrast image, a low-contrast image, and a full-contrast image, wherein the actor is trained to generate synthetic full-contrast images from the zero-contrast and the low-contrast images (para(s). [0024], recite(s)
[0024] “…The input to the deep learning network is a zero-contrast dose image 100 and low-contrast dose image 102, while the output of the network is a synthesized prediction of a full-contrast dose image 116. During training, a reference full contrast image 104 is compared with the synthesized image 116 using a loss function to train the network using error backpropagation.”
, where the “deep learning network” synthesizing a “full-contrast dose image” is an actor trained to generate a synthetic full-contrast image (i.e., “full-contrast dose image”) from a zero-contrast image (i.e., “zero-contrast dose image”) and low-contrast image (i.e., “low-contrast dose image”)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the presently filed invention to modify the system of Zhou in view of Liu to incorporate a low-contrast image in addition to the no-contrast image and full-contrast image in the input dataset in each dataset of the multitude of datasets to improve synthesizing full-contrast images by using both zero-contrast and low-contrast images as input images to reduce noise in the generated synthetic full-contrast images as taught by Zaharchuk (para(s). [0030], recite(s)
[0030] “After pre-processing, a deep learning network is trained using the true 100% full-dose CE-MRI images as the reference ground-truth. The non-contrast (zero-dose) MRI and the 10% low-dose CE-MRI are provided to the network as inputs, and the output of the network is an approximation of the full-dose CE-MRI. During training, this network implicitly learns the guided denoising of the noisy contrast uptake extracted from the difference signal between low-dose and non-contrast (zero-dose) images, which can be scaled to generate the contrast enhancement of a full-dose image.”
).
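Zaharchuk's two-input arrangement, with zero-dose and low-dose images in and a synthesized full-dose image out, amounts to stacking the two inputs as channels, as in this illustrative sketch (architecture, shapes, and the per-pixel loss are assumptions):

```python
import torch
import torch.nn as nn

# Illustrative sketch of Zaharchuk's setup: the network receives a
# zero-contrast image and a low-contrast image (concatenated as channels)
# and predicts a synthetic full-contrast image, trained against the
# reference full-dose image.
actor = nn.Sequential(
    nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
zero_dose = torch.randn(1, 1, 64, 64)
low_dose = torch.randn(1, 1, 64, 64)
full_dose_ref = torch.randn(1, 1, 64, 64)

pred_full = actor(torch.cat([zero_dose, low_dose], dim=1))
loss = nn.functional.l1_loss(pred_full, full_dose_ref)   # backpropagated during training
```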
Conclusion
Any inquiry concerning this commu