DETAILED ACTION
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on December 8, 2025 has been entered.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s amendment has overcome the rejection of claims 1-12 and 14-20 under 35 U.S.C. 112(a) set forth in the previous Office action.
Applicant's arguments filed December 8, 2025 have been fully considered but they are not persuasive.
On page 15, line 7 of “Remarks”, applicant alleges that "the techniques described by Ning do not teach or suggest the proposed modifications to the system of Olszewski." Examiner respectfully disagrees. The disclosure of Ning suggests that using vector quantization on a volume will reduce the amount of data stored and allow for fast access to any given voxel (Ning Section 3 "In consideration of the data compression requirements imposed by volume rendering, it should be evident that vector quantization is particularly suitable. Vectors may be formed from contiguous blocks of voxels in a volume and then quantized according to some codebook. The voxel data is then replaced by a much smaller data set of codebook indices representing the quantized blocks. Decompression consists of simple table lookups into the codebook so fast, on-the-fly voxel access is possible."). This would suggest to one of ordinary skill in the art that applying vector quantization to the system of Olszewski would lead to an improvement in that system.
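For illustration only, the block-based quantization and table-lookup decompression that Ning describes can be sketched as follows. This is a minimal sketch, not the system of Ning or of the claims; the block size, codebook size, and function names are hypothetical choices made for the example.

```python
import numpy as np

def quantize_volume(volume, codebook, block=2):
    """Split the volume into contiguous block x block x block cells and
    replace each cell with the index of its nearest codebook vector
    (Euclidean distortion), yielding a compressed index volume."""
    Z, Y, X = volume.shape
    b = block
    idx = np.empty((Z // b, Y // b, X // b), dtype=np.uint8)
    for i in range(Z // b):
        for j in range(Y // b):
            for k in range(X // b):
                v = volume[i*b:(i+1)*b, j*b:(j+1)*b, k*b:(k+1)*b].ravel()
                idx[i, j, k] = np.argmin(np.sum((codebook - v) ** 2, axis=1))
    return idx

def dequantize_volume(idx, codebook, block=2):
    """Decompression is a simple table lookup: each stored index pulls
    its codeword back out of the codebook to rebuild the voxel block."""
    b = block
    Z, Y, X = (d * b for d in idx.shape)
    out = np.empty((Z, Y, X), dtype=codebook.dtype)
    for i in range(idx.shape[0]):
        for j in range(idx.shape[1]):
            for k in range(idx.shape[2]):
                out[i*b:(i+1)*b, j*b:(j+1)*b, k*b:(k+1)*b] = \
                    codebook[idx[i, j, k]].reshape(b, b, b)
    return out
```

The index volume is far smaller than the voxel data (one 8-bit index per block of eight voxels here), and any block can be recovered with a single lookup, which is the storage and access advantage relied upon above.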
On page 15, line 15 of “Remarks”, applicant alleges that "Ning does not further suggest modifying the encoder-decoder system of Olszewski to include a quantizer that quantizes a transformed volume to 'map a feature vector of each cell in the three-dimensional (3D) space of the transformed volume to one of a discrete number of quantized feature vector entries in a codebook' and a de-quantizer that 'de-quantizes the pulled discrete quantized values to produce a quantized 3D representation' and a decoder that 'decodes the quantized 3D representation to produce two-dimensional (2D) feature maps and that synthesizes the camera pose of the view of the target image from the 2D feature maps.' Ning generally describes data compression using vector quantization but does not suggest how vector quantization could have applied to the system of Olszewski. In particular, Ning does not provide sufficient teachings for one skilled in the art to modify the methods of Olszewski to enable flexible content manipulation and novel view synthesis by encoding-quantizing-de-quantizing-decoding as claimed." Examiner respectfully disagrees.
Ning teaches performing vector quantization on cells of a volume, which includes mapping vectors of each cell in the volume to one of the entries within the codebook (Ning Figure 1(a) and 4; Section 4 "The input volume of size LxMxN is divided into contiguous blocks of size IxJxK. Each block is then treated as a vector of length I • J • K samples that is quantized according to an 8-bit codebook."), which results in a compressed volume. This quantizer and quantization process is analogous to mapping a feature vector of each cell of the 3D volume to one entry within the codebook. Then de-quantization can be performed by the decoder on the compressed volume by using the codebook to search for indices which were indicated by the VQ encoder (Ning Figure 1(a) and 4; Section 4 "The indices and full codebook are then combined to form the new data file. This format is then accessed directly by the volume renderer to generate an image."), which results in a decompressed volume. This de-quantizer and de-quantization process is analogous to de-quantizing the discrete values from the codebook to form a quantized 3D representation. Finally, the decompressed volume can be rendered as a 2D image by the volume renderer (Ning Figure 1(a)). This decoder and decoding process is analogous to decoding the quantized 3D volume to produce a target image. While Ning does not explicitly enable novel view synthesis, Ning does describe advantages that could be used when rendering volumes (as described above), and Olszewski is in the same field of volume rendering. Therefore, one of ordinary skill in the art would have found it obvious to use the volume rendering and quantization techniques of Ning to modify the novel view synthesis system of Olszewski.
On page 15, line 28 of “Remarks”, applicant alleges that "there are no teachings by Olszewski or Ning that would have led one skilled in the art to modify the encoder of Olszewski to extract data from the input image 'describing features representing different aspects of an appearance and shape of one or more objects in the input image to create feature vectors representing a spatially disentangled volumetric representation of relative camera poses of the one or more objects of the input image, wherein creating the spatially disentangled volumetric representation comprises learning a volumetric feature representation that is resampled under a new camera pose and rendered to a decoder to generate novel view images' and to perform 'a relative pose transformation of the spatially disentangled volumetric representation of the one or more objects of the input image between a camera pose of a view of the input image and a camera pose of a view of the target image to form a transformed volume,' where the transformed volume is quantized 'to map a feature vector of each cell in three-dimensional (3D) space of the transformed volume to one of a discrete number of quantized feature vector entries in a codebook including quantized discrete values of volumetric constituents of input objects learned during training, wherein the feature vectors are mapped to the codebook feature vector entries represented as indices to the codebook that are used to look up feature vectors in the codebook to pull corresponding discrete quantized values' before symmetrically de-quantizing the quantized image and decoding the de-quantized image to produce the view of the target image as claimed." Examiner respectfully disagrees. 
Examiner recognizes that obviousness may be established by combining or modifying the teachings of the prior art to produce the claimed invention where there is some teaching, suggestion, or motivation to do so found either in the references themselves or in the knowledge generally available to one of ordinary skill in the art. See In re Fine, 837 F.2d 1071, 5 USPQ2d 1596 (Fed. Cir. 1988), In re Jones, 958 F.2d 347, 21 USPQ2d 1941 (Fed. Cir. 1992), and KSR International Co. v. Teleflex, Inc., 550 U.S. 398, 82 USPQ2d 1385 (2007). In this case, there is an advantage of Ning that would lead one of ordinary skill in the art to modify the system of Olszewski to include the teachings of Ning, as described above.
On page 16, line 17 of “Remarks”, applicant alleges that “Ning does not suggest quantizing a transformed volume to map a feature vector of each cell in three-dimensional (3D) space of the transformed volume to one of a discrete number of quantized feature vector entries in a codebook where the codebook includes 'quantized discrete values of volumetric constituents of input objects learned during training.' Ning merely suggests that an iterative technique may be used to optimize the codebook." Examiner respectfully disagrees. Ning teaches that a codebook can be iteratively optimized to fit the data which it is meant to represent, or that a codebook would have already been established based on the data which is being processed (Ning Section 4 "For many applications codebooks are designed on an entire class of data (e.g. medical magnetic resonance images) and used repeatedly, so it is reasonable not to include it with every data set. In this investigation, however, we do not assume the existence of an appropriate class, so a codebook is custom designed for each volume."). This iterative optimization technique is analogous to the learning technique applied in the current disclosure. The learning technique of the current disclosure is shown in paragraph [0026] of the specification, where a loss between the input vectors and the indices of the codebook is minimized. This is analogous to the iterative optimization technique of Ning, which seeks to minimize the distortion between the input vectors and the codebook indices (Ning Figure 3; Section 3 "The second and third steps are then repeated until the improvement in overall distortion falls below some threshold.").
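The iterative codebook design quoted above (the generalized Lloyd algorithm) alternates a nearest-neighbor partition with a centroid update, much like k-means. The following is an illustrative sketch only; the vector dimension, codebook size, and iteration count are hypothetical and are not taken from Ning or the application.

```python
import numpy as np

def design_codebook(vectors, n_codewords=8, n_iters=20, seed=0):
    """Generalized Lloyd algorithm sketch.
    First: initialize the codebook from randomly selected input vectors.
    Second: partition the inputs by the nearest-neighbor rule.
    Third: replace each codeword with the centroid of its cell.
    The second and third steps repeat, reducing overall distortion."""
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), n_codewords, replace=False)]
    for _ in range(n_iters):
        # Nearest-neighbor assignment of every input vector to a codeword.
        d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        # Centroid update for each non-empty cell.
        for c in range(n_codewords):
            members = vectors[assign == c]
            if len(members):
                codebook[c] = members.mean(axis=0)
    return codebook
```

A fixed iteration count stands in here for Ning's distortion-improvement threshold; the structure of the loop is the same either way.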
On page 16, line 21 of “Remarks”, applicant alleges that “Ning does not further suggest that the quantized feature vector entries in the codebook include 'quantized discrete values of volumetric constituents of input objects learned during training.' Instead Ning teaches that 'vectors may be randomly selected from the input or chosen as a uniform distribution in k-dimensional space' (Section 3, page 71, first column). Thus, even if the teachings of Ning were applied to Olszewski, there would be no teaching of encoding features representing different aspects of an appearance and shape of one or more objects in an input image to create feature vectors representing a spatially disentangled volumetric representation of relative camera poses of the one or more objects of the input image, quantizing a transformed volume to map a feature vector of each cell in three-dimensional (3D) space of the transformed volume to one of a discrete number of quantized feature vector entries in a codebook including quantized discrete values of volumetric constituents of input objects learned during training, wherein the feature vectors features representing different aspects of an appearance and shape of one or more objects in an input image are mapped to the codebook feature vector entries represented as indices to the codebook. On the contrary, the vectors of Ning are 'randomly selected from the input.' " Examiner respectfully disagrees.
Ning suggests that the vectors of the input volume would correspond to some real-world object. This can be seen in Section 5, where the system described by Ning is used to render smoke from an air jet (Ning Section 5 "Volume rendered images of the air jet are shown in Fig. 8. The color and opacity mappings are assigned so that the highest smoke concentration is white, medium concentration is blue, and lowest concentration is gold."). This teaching suggests that the system proposed by Ning would be applied to vectors which represent real-world objects. Furthermore, the codebook indices would have been learned during the iterative optimization technique, as described above. While Ning does suggest that the vectors of the codebook could initially be randomly selected or chosen from a uniform distribution, it also teaches that this initial codebook would be iteratively optimized based on the vectors of the input volume, as shown above. Thus, Ning does suggest that the encoded features would represent the appearance and shape of objects in the input volume, and that the quantized discrete values of the codebook would be learned from input objects during training. Nonetheless, Olszewski is cited to teach that the features would represent the appearance and shape of objects within an input image.
On page 18, line 2 of “Remarks”, applicant alleges that “Watanabe does not further suggest application of vector quantization to the system of Olszewski or suggest what the vectors would represent for a transformed volume." Watanabe is cited solely to teach quantizing a transformed volume. Olszewski is cited to teach what the vectors would represent within a volume.
For at least the reasons stated above, the rejection of claims 1-12 and 14-20 is maintained.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-9, 12, and 14-20 are rejected under 35 U.S.C. 103 as being unpatentable over Olszewski (Transformable Bottleneck Networks) in view of Ning (Vector Quantization for Volume Rendering), and Watanabe (US20120251014).
In regards to claim 1, Olszewski teaches a method of transforming an input image to a target image (Olszewski Abstract “We demonstrate that the bottlenecks produced by networks trained for this task contain meaningful spatial structure that allows us to intuitively perform a variety of image manipulations in 3D”), the method comprising: receiving, by an encoder, an input image at an arbitrary camera pose (Olszewski Section 3.1 Step 1 “An encoder network with parameters, that takes in an image Ik” Examiner note: As can be seen in Figure 2, the input image can be at a variety of poses.); extracting, by the encoder, data from the input image describing features representing different aspects of an appearance and shape of one or more objects in the input image to create feature vectors representing a spatially disentangled volumetric representation of relative camera poses of the one or more objects of the input image (Olszewski Section 3.1 Step 1 “An encoder network, that… outputs a bottleneck representation Xk structured as a volumetric grid of cells, each containing an n-dimensional feature vector” Examiner note: It can be appreciated that feature vectors, by definition, represent different aspects of an appearance and shape of an object. The grid of cells which each contain a feature vector (defined as a bottleneck representation) is analogous to the spatially disentangled volumetric representation of the camera pose of the one or more objects in the input image.), wherein creating the spatially disentangled volumetric representation comprises learning a volumetric feature representation that is resampled under a new camera pose and rendered to a decoder to generate novel view images (Olszewski Section 3.2.1 “NVS requires a minimum of two images of a given object from different, known viewpoints. Given {Ik,It} and Fk->t we can compute a reconstruction I’t of It using equation (1). 
Using this, we define several losses in image space with which to train our network parameters.” Examiner note: This section teaches that, during training, the system of Olszewski is given an image from two viewpoints, represented as Ik and It. The system then attempts to recreate It from Ik; this attempted reconstruction from a new viewpoint (or camera pose) is performed by the decoder and is represented as I’t. The error between I’t and It is then used to change (or train) the network parameters.); performing a relative pose transformation of the spatially disentangled volumetric representation of the one or more objects of the input image between a camera pose of a view of the input image and a camera pose of a view of the target image to form a transformed volume (Olszewski Section 3.1 Step 2 Examiner note: This step teaches transforming the bottleneck representation using a user-provided transform; this transform will change the bottleneck representation from one view of an object to the different, target view.); and decoding, by a decoder, the transformed volume to produce two-dimensional (2D) feature maps (Olszewski Figure 2 “Decoder”; Section 3.1 Step 3; Abstract “We propose a novel approach to performing fine-grained 3D manipulation of image content via a convolutional neural network, which we call the Transformable Bottleneck Network (TBN).” Examiner note: The transformable bottleneck network described here is implemented through a convolutional neural network. It is known in the art that a convolutional network performs a variety of convolutions which result in feature maps. (See, for example, “A Review of Convolutional Neural Networks” Section II A.) Therefore, when the decoder of this disclosure is creating the 2D image from the aggregated volumetric representation, each convolution will result in a feature map, eventually producing a 2D image, as can be seen in Figure 2(a) “Decoder”.); and synthesizing, by the decoder, the camera pose of the view of the target image from the 2D feature maps (Olszewski Section 3.1 Step 3).
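The encode/transform/decode training signal described above can be sketched schematically. The encoder and decoder below are stand-in linear maps chosen purely to show where the reconstruction error between I't and It enters; they are hypothetical and are not the networks of Olszewski.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the encoder and decoder network parameters
# (simple linear maps, only to illustrate the shape of the training loop).
W_enc = rng.normal(size=(16, 8))   # image features -> bottleneck features
W_dec = rng.normal(size=(8, 16))   # bottleneck features -> image features

def reconstruct(I_k, F_kt):
    X_k = I_k @ W_enc        # Step 1: encode source view into a bottleneck
    X_t = F_kt(X_k)          # Step 2: resample bottleneck under target pose
    return X_t @ W_dec       # Step 3: decode into the target view, I'_t

def nvs_loss(I_k, I_t, F_kt):
    """Error between the reconstruction I'_t and the true target view I_t;
    this error is the signal used to train the network parameters."""
    I_t_hat = reconstruct(I_k, F_kt)
    return float(np.mean((I_t_hat - I_t) ** 2))
```

In training, gradients of this loss with respect to the encoder and decoder parameters would drive the updates; the sketch stops at computing the loss itself.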
Olszewski does not teach quantizing the transformed volume to map a feature vector of each cell in the three-dimensional (3D) space of the transformed volume to one of a discrete number of quantized feature vector entries in a codebook including quantized discrete values of volumetric constituents of input objects learned during training, wherein the feature vectors are mapped to the codebook feature vector entries represented as indices to the codebook that are used to look up feature vectors in the codebook to pull corresponding discrete quantized values; de-quantizing the pulled discrete quantized values to produce a quantized 3D representation; and decoding, by a decoder, the quantized 3D representation.
However, Ning teaches quantizing the transformed volume to map a feature vector of each cell in the three-dimensional (3D) space of the transformed volume to one of a discrete number of quantized feature vector entries in a codebook (Ning Figure 4, Volume and Block; Section 3 “With an 8-bit vector quantizer we can have 256 possible reproduction vectors known as codewords. These are stored in a table known as the codebook. Given a vector to be quantized, the encoder finds the codeword that best represents the input according to some distortion criterion.” Examiner note: This section teaches that each of the blocks in the volume is separately quantized. Once the vectors are quantized, they are represented by the index of the closest codeword (also known as a reproduction vector) within the codebook.) including quantized discrete values of volumetric constituents of input objects learned during training (Ning Section 3 “Perhaps the most interesting aspect of vector quantization is the design of a good codebook for a particular input distribution. One popular design technique (the generalized Lloyd algorithm [7]) exploits the following necessary (but not sufficient) conditions for optimality: 1) given a codebook, the optimal partition is a nearest neighbor assignment; 2) given a partition, the optimal codevector for any cell is the centroid of the input vectors that fall into that cell. The design is an iterative technique that successively optimizes the codebook. First, an initial codebook is determined by some means. For example, vectors may be randomly selected from the input or chosen as a uniform distribution in k-dimensional space. Second, the input vectors are partitioned into cells by the nearest neighbors rule applied to the current codebook. Third, the codebook is updated by replacing codevectors with the centroids of the new cells. The second and third steps are then repeated until the improvement in overall distortion falls below some threshold.” Examiner note: This section describes how the system of Ning creates a codebook to adapt to a particular given input. 
This section describes an iterative technique where the codebook is optimized with each step, until a point where the codebook stops improving by some threshold. In the field of artificial intelligence, training is defined as a system of feedback that takes in data to “learn” and can generalize to unseen data. The procedure described by Ning can reasonably be interpreted as training, as the system uses the input data to improve the codebook (thus “learning” how to create a codebook), and can then generalize that codebook training to any unseen data, as the algorithm is generic for any input data.), wherein the feature vectors are mapped to the codebook feature vector entries represented as indices to the codebook that are used to look up feature vectors in the codebook to pull corresponding discrete quantized values (Ning Figure 4 Indices + Codebook; Section 3 “Given a vector to be quantized, the encoder finds the codeword that best represents the input according to some distortion criterion. The binary index of the best codeword is then used to represent the entire vector. A decoder having an identical codebook uses this index to reproduce the vector as a simple table lookup (Fig. 2).” Examiner note: This section teaches that once a vector is quantized, it can be represented by the codeword that represents the closest vector. Then, as shown in Figure 4, the decoder can use the indices and the codebook to finally render the volume (which is analogous to de-quantizing the vectors).); de-quantizing the pulled discrete quantized values to produce a quantized 3D representation (Ning Figure 1, 4 Examiner note: These figures show that once the vectors are quantized, their indices can be pulled from the codebook to either decompress the volume or to simply render the volume into an image. These figures show that this disclosure begins with a volume and ends with an image (2D representation). 
Note that in figure 1, the “compressed volume” is the same as the indices + codebook of figure 4.); and decoding, by a decoder, the quantized 3D representation (Ning Figure 1 “Volume Renderer” Examiner note: The quantized (compressed) volume can either be passed directly to a volume renderer for image generation, or it can first be decompressed and then passed to the volume renderer.).
Ning is considered to be analogous to the claimed invention because both are in the same field of using volumes to represent and manipulate images. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Olszewski to include the teachings of Ning, to provide the advantage of requiring less digital storage space and allowing for fast access to voxels (Ning Section 3 "In consideration of the data compression requirements imposed by volume rendering, it should be evident that vector quantization is particularly suitable. Vectors may be formed from contiguous blocks of voxels in a volume and then quantized according to some codebook. The voxel data is then replaced by a much smaller data set of codebook indices representing the quantized blocks. Decompression consists of simple table lookups into the codebook so fast, on-the-fly voxel access is possible.").
Furthermore, Watanabe teaches quantizing the transformed volume (Watanabe Figure 2, 75-76; Paragraph [0015] “The 3D transformation unit further includes a 3D orthogonal transform coefficient data transformation unit for performing one-dimensional orthogonal transformation on the sorted 2D orthogonal transform coefficient data in the direction the plurality of images are arranged, and transforming the data into 3D orthogonal transform coefficient data, a quantization unit for quantizing the transformed 3D orthogonal transform coefficient data…” Examiner note: This reference specifically shows quantizing a transformed volume, which is not shown by Olszewski in view of Ning.).
Watanabe is considered to be analogous to the claimed invention because both are in the same field of transforming images from their two-dimensional form to a three-dimensional form. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Olszewski in view of Ning to include the teachings of Watanabe, to provide the advantage of allowing for a system that preserves image quality (Watanabe Paragraph [0024] “According to an embodiment of the present disclosure, deterioration in image quality can be reduced.").
In regards to claim 2, Olszewski in view of Ning, and Watanabe teaches the method of claim 1, and further teaches resampling the transformed volume to correspond to a layout of image content in the view of the target image prior to quantizing the transformed volume. (Olszewski Section 3.1, Step 2)
In regards to claim 3, Olszewski in view of Ning, and Watanabe teaches the method of claim 1, and further teaches interpolating between two views of a same object in the input image, or interpolating two different objects in a same view or a similar view of the input image. (Olszewski Figure 2)
In regards to claim 4, Olszewski in view of Ning, and Watanabe teaches the method of claim 1, and further teaches wherein the input image is a red, green, blue (RGB) image captured in a given camera pose, and wherein creating the spatially disentangled volumetric representation of the input image comprises generating, by the encoder, a volumetric representation of content of the input image using learnable parameters whereby each cell in the volumetric representation contains a feature vector describing a local shape and appearance of a corresponding region in the input image (Olszewski Figure 2, Volume; Section 3.1, Step 1).
In regards to claim 5, Olszewski in view of Ning, and Watanabe teaches the method of claim 4, and further teaches wherein the spatially disentangled volumetric representation is defined within a view space of the input image such that a depth dimension corresponds to a distance from a camera (Olszewski Section 3.1.2 “In this work the grid cells are chosen to be equally spaced, with the volume centered on the target object and axis aligned with the camera coordinate frame.” Examiner note: Since the axis is aligned with the camera coordinates, the depth (and therefore distance from the camera) would correspond to one of the x, y, or z components.).
In regards to claim 6, Olszewski in view of Ning, and Watanabe teaches the method of claim 4, and further teaches wherein performing the relative pose transformation of the spatially disentangled volumetric representation of the one or more objects of the input image between a camera pose of a view of the input image and a camera pose of a view of the target image to form the transformed volume comprises using a trilinear resampling operation with parameters defined based on a transformation between the camera pose of the view of the input image and the camera pose of the view of the target image (Olszewski Section 3.1 Step 2 “transforms the bottleneck via a trilinear resampling operation”).
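The trilinear resampling operation cited above for claim 6 can be illustrated with a small sketch: each output cell's value is interpolated from the eight surrounding source cells after mapping the output coordinates through the pose transform. This is illustrative only; the grid size and the rotation-about-center formulation are hypothetical choices and not Olszewski's implementation.

```python
import numpy as np

def trilinear_sample(vol, coords):
    """Trilinear interpolation of vol (D, H, W) at fractional coords (N, 3)."""
    D, H, W = vol.shape
    z, y, x = coords[:, 0], coords[:, 1], coords[:, 2]
    z0 = np.clip(np.floor(z).astype(int), 0, D - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    dz, dy, dx = z - z0, y - y0, x - x0
    out = np.zeros(len(coords))
    # Blend the 8 corner cells of the enclosing cube, weighted by proximity.
    for cz in (0, 1):
        for cy in (0, 1):
            for cx in (0, 1):
                w = (np.where(cz, dz, 1 - dz) * np.where(cy, dy, 1 - dy)
                     * np.where(cx, dx, 1 - dx))
                out += w * vol[z0 + cz, y0 + cy, x0 + cx]
    return out

def transform_volume(vol, R):
    """Resample a volume under a 3x3 transform R applied about its center:
    for each output cell, sample the source volume at the mapped location."""
    D, H, W = vol.shape
    grid = np.stack(np.meshgrid(np.arange(D), np.arange(H), np.arange(W),
                                indexing="ij"), axis=-1).reshape(-1, 3).astype(float)
    center = np.array([D - 1, H - 1, W - 1]) / 2.0
    src = (grid - center) @ R.T + center  # mapped source sample coordinates
    return trilinear_sample(vol, src).reshape(D, H, W)
```

Because the sampling coordinates depend smoothly on R, this operation is differentiable in the volume values, which is what allows such a resampling step to sit inside a trainable encoder-decoder pipeline.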
In regards to claim 7, Olszewski in view of Ning, and Watanabe teaches the method of claim 1, and further teaches receiving information from a number of input views of an object in the input image, transforming the number of input views into the view of the target image, and computing per-cell averages of the feature vectors before decoding the de-quantized image (Olszewski Section 3.1.1 Equation 2 [average]; Section 3.1.1 “our formulation naturally extends to an arbitrary number of inputs”).
In regards to claim 8, Olszewski in view of Ning, and Watanabe teaches the method of claim 1, and further teaches wherein the feature vectors representing the spatially disentangled volumetric representation of poses of the input image comprise 2D feature maps, further comprising reshaping, by the encoder, the 2D feature maps to generate spatially transformed 3D feature maps and reshaping, by the decoder, the spatially transformed 3D feature maps to produce 2D feature maps for synthesis of the target image (Olszewski Figure 2, Encoder and Decoder; Section 3.1 Steps 1 and 3).
In regards to claim 9, Olszewski in view of Ning, and Watanabe teaches the method of claim 1, and further teaches training the codebook using at least one multi-view dataset in which source and target images are randomly selected and a corresponding pose transformation is applied to an encoded source image bottleneck to produce a result that is quantized and decoded to synthesize a synthesized image in the codebook. (Olszewski Section 3.2.1 “NVS requires a minimum of two images of a given object from different, known viewpoints. Given {Ik, It} and Fk->t, we can compute a reconstruction, I’t, of It using equation (1). Using this, we define several losses in image space with which to train our network parameters.”)
In regards to claim 12, Olszewski in view of Ning, and Watanabe renders obvious the claim limitations as in the consideration of claim 1.
In regards to claim 14, Olszewski in view of Ning, and Watanabe teaches a processor and a memory storing computer readable instructions that, when executed by the processor, configure the system to perform operations (Ning Section 1 “Scientific visualization is an emerging discipline that seeks to use computer graphics to extract information and insight from a variety of data.” “This integrated system would allow for efficient use of disk space, main memory, and network bandwidth.” Examiner note: Since this system is performed with a computer, it is obvious that a processor would need to be used to execute the instructions.) and renders obvious the remaining claim limitations as in the consideration of claims 2 and 12.
In regards to claim 15, Olszewski in view of Ning, and Watanabe renders obvious the claim limitations as in the consideration of claims 3 and 12.
In regards to claim 16, Olszewski in view of Ning, and Watanabe renders obvious the claim limitations as in the consideration of claims 4, 5, and 12.
In regards to claim 17, Olszewski in view of Ning, and Watanabe renders obvious the claim limitations as in the consideration of claims 6 and 12.
In regards to claim 18, Olszewski in view of Ning, and Watanabe renders obvious the claim limitations as in the consideration of claims 7 and 12.
In regards to claim 19, Olszewski in view of Ning, and Watanabe renders obvious the claim limitations as in the consideration of claims 7 and 12.
In regards to claim 20, Olszewski in view of Ning, and Watanabe teaches a non-transitory computer-readable storage medium, the computer-readable storage medium including instructions that when executed by a processor cause the processor to [perform operations] (Watanabe Paragraph [0129] “The program may be temporarily or permanently stored (recorded) in a removable medium 111 such as a flexible disk, a compact disc read-only memory (CD-ROM), a magnet-optical (MO) disk, a digital versatile disc (DVD), a magnetic disk, a semiconductor memory, or the like. Such a removable medium 111 may be provided as so-called packed software.”) and renders obvious the remaining claim limitations as in the consideration of claims 1 and 12.
Claims 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over Olszewski in view of Ning, and Watanabe, and further in view of Löhdefink (Scalar and Vector Quantization for Learned Image Compression: A Study on the Effects of MSE and GAN Loss in Various Space).
In regards to claim 10, Olszewski in view of Ning, and Watanabe teaches the method of claim 9, but fails to further teach wherein training the codebook comprises employing an adversarial loss using a discriminator network with learnable parameters and optimizing the codebook during training using the adversarial loss.
However, Löhdefink teaches wherein training the codebook comprises employing an adversarial loss using a discriminator network with learnable parameters and optimizing the codebook during training using the adversarial loss (Section III B “We employ a GAN loss LGAN = LMSE + αLFM + βLadv, using a learned discriminator network, combining a standard MSE loss LMSE (1) with a feature matching (FM) loss LFM (2) and an adversarial loss Ladv”).
Löhdefink is considered to be analogous to the claimed invention because both are in the same field of training a neural network to perform vector quantization. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Olszewski in view of Ning, and Watanabe to include the teachings of Löhdefink, to provide the advantage of guaranteeing that the system generates realistic outputs (Löhdefink Section I “Incorporating adversarial loss functions into learned image compression by self-supervised autoencoders constituting the generator of a GAN, see Fig. 1, guarantees to generate realistic outputs even at very low bitrates and has benefits, e.g., in real-time applications dealing with data-intensive machine learning functions.”).
In regards to claim 11, Olszewski in view of Ning, Watanabe, and Löhdefink teaches the method of claim 10, and further teaches wherein training the codebook further comprises selecting an adversarial loss weight applied to the adversarial loss by using a reconstruction loss measured between a ground truth and a reconstructed image and a gradient of an input with respect to a final layer of an image generator. (Löhdefink Equation 1; Section III B “In the loss functions we use the image x=(xi) with pixel index i ∈ I = {1, 2, …, H⋅W} and I being the set of I = H⋅W pixel indices. The same holds for the reconstructed image x̂=(x̂i).” Examiner note: In equation (1), the difference between the reconstructed image and the original image is computed.)
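The combined objective quoted from Löhdefink, LGAN = LMSE + αLFM + βLadv, can be written out term by term as a sketch. The weights alpha and beta and the particular form of the adversarial term below are hypothetical stand-ins for illustration; they are not the values or exact loss of Löhdefink.

```python
import numpy as np

def gan_loss(x, x_hat, feats, feats_hat, d_fake, alpha=0.1, beta=0.01):
    """Combined objective L_GAN = L_MSE + alpha * L_FM + beta * L_adv.
    alpha/beta and the -log(D) adversarial term are illustrative choices."""
    l_mse = np.mean((x - x_hat) ** 2)           # pixel reconstruction error
    l_fm = np.mean((feats - feats_hat) ** 2)    # discriminator feature matching
    l_adv = -np.mean(np.log(d_fake + 1e-8))     # generator adversarial term
    return float(l_mse + alpha * l_fm + beta * l_adv)
```

The relative weighting (alpha, beta) is exactly the kind of adversarial-loss-weight selection recited in claims 10-11: raising beta pushes the reconstruction toward outputs the discriminator accepts as realistic, at some cost in pixelwise fidelity.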
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
“Using Vector Quantization for Image Processing” discusses different ways of using vector quantization to process images. This includes rendering volumes as discussed in section VIII.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CALEB LOGAN ESQUINO whose telephone number is (703)756-1462. The examiner can normally be reached M-Fr 8:00AM-4:00PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Bee can be reached at (571) 270-5183. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CALEB L ESQUINO/Examiner, Art Unit 2677
/ANDREW W BEE/Supervisory Patent Examiner, Art Unit 2677