Office Action Analysis: 18709218 — High Dynamic Range View Synthesis from Noisy Raw Images

Office Action

§103 §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

	Information Disclosure Statement
The information disclosure statements (IDS) submitted on May 7, 2025 and February 4, 2026 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.

Priority
The present application is a 371 continuation of PCT/US2022/047381 filed on October 21, 2022, and claims benefit of the provisional application number 63/279,363 filed on November 15, 2021. 
Claim Objections
Claims 14-16 are objected to because of the following informalities: 
Regarding claim 14, line 2, “input data set the plurality” should read “input data set of the plurality”.  
Regarding claim 15, line 2, “input data set the plurality” should read “input data set of the plurality”.  
Regarding claim 16, line 2, “input data set the plurality” should read “input data set of the plurality”.  
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 6-7 and 10 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.

Regarding claims 6 and 10,
	Claim 6 recites “processing the predicted quad bayer filter data to generate a novel view rendering” and claim 10 recites “A computer-implemented method for novel view rendering, the method comprising”.   The term “novel” is a relative term, not defined by the claim; the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention.  While the disclosure describes examples of “novel” views as objects (e.g. rendering “novel views based on previously unobserved poses” ([0140])), it also describes performing “novel” views (e.g. “novel view synthesis” ([0130]), enabling “various novel HDR view synthesis tasks” ([0127])). It is unclear whether a “novel view” is a new copy of an image based on existing image data, an action being performed, or some other underlying framework. Accordingly, claims 6 and 10 are rejected under 35 U.S.C. 112(b) for being indefinite.

Regarding claim 7,
The phrase “generalizing to low confidence values” in claim 7 contains relative terms which render the claim indefinite. The term “generalizing” is not defined by the claim, the term “low confidence values” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. Specifically, it is unclear how the neural radiance field model performs “generalizing”, and it is unclear how “low confidence values” are determined, or what constitutes “low confidence values”.  Thus, claim 7 is rejected under 35 U.S.C. 112(b) for being indefinite.
Due to the indefinite nature of claim 7, examiner is unable to perform a prior art search for claim 7.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA  to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 1-2, 4, 6, 8-9, and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Mildenhall (“NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis”; examiner relied on a more easily readable copy of the article than provided by applicant, updated version of reference provided by examiner) in view of Lim (US 20130322752 A1).

Regarding Claim 1,
Mildenhall teaches: (Original) A computing system, the system comprising:
one or more processors; and one or more non-transitory computer-readable media that collectively store instructions that, when executed by the one or more processors, cause the computing system to perform operations, the operations comprising
(Mildenhall teaches a system that can use a NVIDIA V100 GPU (Abstract; p. 8, section 5.3, last paragraph; p. 17-18 Annex A).  NVIDIA V100 GPU is known by one of ordinary skill to comprise one or more processors and one or more non-transitory computer-readable media that collectively store instructions that, when executed by the one or more processors, cause the computing system to perform operations.):
obtaining a training dataset, wherein the training dataset comprises a plurality of three-dimensional positions, a plurality of two-dimensional view directions, and a plurality of 
(Abstract "…input is a single continuous 5D coordinate (spatial location (x,y,z) and viewing direction (θ,φ))"; see FIG. 1-3 and their corresponding descriptions; see p. 14, section 7, first paragraph; Mildenhall further teaches using pixels representing images throughout the model, including in the training data set and implementation (e.g. p. 6, section 4, paragraph 1 "Rendering a view from our continuous neural radiance field requires estimating this integral C(r) for a camera ray traced through each pixel of the desired virtual camera.", and p. 9, section 5.3, paragraph 1 and p. 9, section 6, paragraph 1, teach the implementation and use of synthetic renderings of objects using pixels).  Under the broadest reasonable interpretation of the claim and to a person of ordinary skill in the art, the use of pixels representative of an image constitutes the use of digital images, i.e. "a plurality of bits structured in a format".);
	processing a first three-dimensional position of the plurality of three-dimensional positions and a first two-dimensional view direction of the plurality of two-dimensional view directions with a neural radiance field model to generate a view rendering (See p. 5, FIG. 2 (found below)),

[Image: media_image1.png (Mildenhall, p. 5, FIG. 2)]

wherein the neural radiance field model comprises one or more multi-layer perceptrons (p. 1, section 1, second paragraph "Our method optimizes a deep fully-connected neural network without any convolutional layers (often referred to as a multilayer perceptron or MLP) to represent this function by regressing from a single 5D coordinate (x,y,z,θ,φ) to a single volume density and view-dependent RGB color."; p. 2, section 1, bullet point from last paragraph "An approach for representing continuous scenes with complex geometry and materials as 5D neural radiance fields, parameterized as basic MLP networks.")
	wherein the view rendering is descriptive of one or more predicted color values and one or more predicted volume density values (FIG. 2 and description (seen above); p. 5, section 3, paragraph 2 (seen below));

[Image: media_image2.png (Mildenhall, p. 5, section 3, paragraph 2)]

 
evaluating a loss function that evaluates a difference between the view rendering and a first image of the plurality of (p. 2, section 1, last paragraph "…our technical contributions are…A differentiable rendering procedure based on classical volume rendering techniques, which we use to optimize these representations from standard RGB images. This includes a hierarchical sampling strategy to allocate the MLP’s capacity towards space with visible scene content."; p. 9, section 5.3, paragraph 1 "Our loss is simply the total squared error between the rendered and true pixel colors for both the coarse and fine renderings" and equation 6 includes loss function to measure the difference between the rendered pixel color and the ground truth pixel colors (i.e. minimized error).), wherein the first image is associated with at least one of the first three-dimensional position or the first two-dimensional view direction (p. 9, section 5.3, paragraph 1 "…a dataset of captured RGB images of the scene, the corresponding camera poses and intrinsic parameters, and scene bounds (we use ground truth camera poses, intrinsics, and bounds for synthetic data, and use the COLMAP structure-from-motion package [39] to estimate these parameters for real data).");
and adjusting one or more parameters of the neural radiance field model based at least in part on the loss function (Mildenhall teaches a neural network parameterized by a fully connected MLP (p. 2, section 1, paragraph 5 "…representing continuous scenes with complex geometry and materials as 5D neural radiance fields, parameterized as basic MLP networks").  Such MLPs are understood in the art to include learnable parameters (e.g., weights and biases).  Mildenhall further teaches defining and minimizing an error between rendered and ground truth pixel colors as a loss function (p. 2, section 1, first paragraph "…we can use gradient descent to optimize this model by minimizing the error between each observed image and the corresponding views rendered from our representation."; see also p. 9, section 5.3, paragraph 1; p. 5-6, section 4, paragraphs 1-2), and optimizing the network parameters over 100-300k iterations using an optimizer (p. 9, section 5.3, paragraph 2), thereby iteratively adjusting one or more parameters of the neural radiance field model based at least in part on the loss function.).
Mildenhall fails to explicitly disclose: using raw noisy images, high dynamic range images, comprising of unprocessed bits, in a raw format.
In a related art, Lim teaches: systems and methods for “reducing chrominance (chroma) noise in image data” through known image processing techniques (Abstract).  Lim further teaches using image data comprising raw noisy images and a plurality of images, including high dynamic range images, comprising unprocessed bits, in a raw format, for image processing (see paragraphs [0218], [00247], and [0353]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Mildenhall to incorporate the teachings of Lim to make the system more robust by increasing the types and quality of images and image data the system is capable of handling, thereby increasing the system’s effectiveness in handling different image data formats or sources.  Lim also notes conventional image processing techniques do not adequately account for the locations and direction of components (e.g. edges) within an image (Lim [006]), while Mildenhall, Lim, and the instant application all lie in the same field of endeavor of image processing with a specific aim of improving image quality through the use of location, direction, and colors. 
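
For readers following the claim 1 mapping, the rendering and loss steps cited from Mildenhall (FIG. 2; p. 9, section 5.3) can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the reference: the function names `composite_ray` and `squared_error_loss` are hypothetical, the MLP that would produce the per-sample colors and densities is omitted, and the compositing rule is the standard discrete volume rendering sum.

```python
import numpy as np

def composite_ray(colors, densities, deltas):
    """Composite per-sample predictions along one camera ray into a pixel color.

    colors:    (N, 3) predicted RGB at each sample (hypothetical MLP output)
    densities: (N,)   predicted volume density at each sample, >= 0
    deltas:    (N,)   distance between adjacent samples along the ray
    """
    # Opacity contributed by each ray segment.
    alpha = 1.0 - np.exp(-densities * deltas)
    # Transmittance: fraction of light that reaches sample i unoccluded.
    trans = np.concatenate(([1.0], np.cumprod(1.0 - alpha)[:-1]))
    weights = trans * alpha
    return (weights[:, None] * colors).sum(axis=0)

def squared_error_loss(rendered, target):
    """Total squared error between a rendered pixel and the observed pixel."""
    return float(np.sum((rendered - target) ** 2))
```

In a training loop, the gradient of this loss with respect to the MLP weights would drive the "adjusting one or more parameters" step; that step depends on an autodiff framework and is left out of the sketch.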

Regarding Claim 2,
	Mildenhall and Lim teach: (Currently Amended) The computing system of claim 1. 
Mildenhall further teaches: wherein the operations further comprise: processing the view rendering with a color correction model to generate a color corrected rendering (See p. 1-2, paragraph 2).

Regarding Claim 4,
	Mildenhall and Lim teach: (Currently Amended) The computing system of claim 1, including: evaluating the loss function that evaluates the difference between the view rendering and the first image of the plurality of raw noisy images 
	Mildenhall further teaches: mosaic masking being comprised in evaluating the loss function (by sampling pixels from the dataset of a frame and rendering true pixel colors, thereby altering pixels of a portion of a frame, a known form of “mosaic masking”) (Mildenhall p. 9, section 5.3; Mildenhall teaches the system uses a NVIDIA V100 that uses frames when rendering, p. 18, section A, subsection Rendering Details).

Regarding Claim 6,
	Mildenhall and Lim teach: (Currently Amended) The computing system of claim 1.
	Mildenhall further teaches: wherein the operations further comprise:
	obtaining an input view direction and an input position (Abstract "..input is a single continuous 5D coordinate (spatial location (x,y,z) and viewing direction (θ,φ))");
	processing the input view direction and the input position with the neural radiance field model to generate predicted (p. 1-2, paragraph 2);
	and processing the predicted 
(Abstract "We describe how to effectively optimize neural radiance fields to render photorealistic novel views"; FIG. 1 and caption "we show two novel views rendered from our optimized NeRF representation.").
	Mildenhall fails to explicitly disclose: quad bayer filter data.
	Lim further teaches: Bayer quad data ([0377]; [0409]) and using a filter on the Bayer quad data ([0468]).  Bayer quad data taught by Lim is interpreted to be equivalent to “quad bayer data”.  Thus, under the broadest reasonable interpretation of claim 6, Lim teaches quad bayer filter data.
	It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Mildenhall, previously modified by Lim, to incorporate the further teachings of Mildenhall and Lim.  Doing so would increase the efficiency of the model by filtering the input patterns faster and improve brightness resolution rendering performance.

Regarding Claim 8,
	Mildenhall and Lim teach: The computing system of claim 1,
	Mildenhall further teaches: wherein the first image comprises a real-world photon signal data generated by a camera, and wherein the view rendering comprises predicted photon signal data (see FIG. 2’s description “We synthesize images by sampling 5D coordinates (location and viewing direction) along camera rays (a), feeding those locations into an MLP to produce a color and volume density (b), and using volume rendering techniques to composite these values into an image (c). This rendering function is differentiable, so we can optimize our scene representation by minimizing the residual between synthesized and ground truth observed images.” Camera rays are known in the art to be equivalent to photons).

Regarding Claim 9,
	Mildenhall and Lim teach: (Currently Amended) The computing system of claim 1, including the plurality of raw noisy images.
	Mildenhall further teaches: (Currently Amended) The computing system of claim 1, wherein the plurality of (p. 8-9, section 5.3, paragraph 1 "…a dataset of captured RGB images of the scene, the corresponding camera poses and intrinsic parameters, and scene bounds").
	Mildenhall fails to explicitly disclose: raw noisy images and using red-green-green-blue datasets.  However, Mildenhall and Lim teach the plurality of raw noisy images in claim 1.
	In a related art, Lim further teaches: capturing red-green-green-blue image data with a color filter array (e.g. Bayer color filter array) (FIG. 2; [0229]).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Mildenhall, previously modified by Lim, to incorporate the further teachings of Lim to provide information regarding intensity of light at green, red, and blue wavelengths and increase the model’s accuracy in processing raw image data (see Lim, [0229]). 
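
The red-green-green-blue layout referred to in the claim 9 discussion can be illustrated with a small sketch. It assumes a 2x2 Bayer tile with red at the top-left position (an assumption; real sensors use different phases), and the helper names `rggb_masks` and `mosaic` are hypothetical, not taken from Lim.

```python
import numpy as np

def rggb_masks(height, width):
    """Boolean masks marking which sensor sites sample red, green, and blue.

    Assumes an RGGB tile: R at (even row, even col), B at (odd row, odd col),
    and G at the two remaining sites of each 2x2 tile.
    """
    rows, cols = np.indices((height, width))
    red = (rows % 2 == 0) & (cols % 2 == 0)
    blue = (rows % 2 == 1) & (cols % 2 == 1)
    green = ~(red | blue)
    return red, green, blue

def mosaic(rgb):
    """Reduce an (H, W, 3) RGB image to the (H, W) single-channel Bayer mosaic."""
    red, green, blue = rggb_masks(*rgb.shape[:2])
    return np.where(red, rgb[..., 0], np.where(green, rgb[..., 1], rgb[..., 2]))
```

Each 2x2 tile therefore carries one red, two green, and one blue measurement, which is why the data is described as red-green-green-blue.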

Regarding Claim 17,
	Mildenhall teaches: (Original) One or more non-transitory computer-readable media that collectively store instructions that, when executed by one or more computing devices, cause the one or more computing devices to perform operations (Mildenhall teaches a system that can use a NVIDIA V100 GPU (Abstract; p. 8, section 5.3, last paragraph; p. 17-18 Annex A).  NVIDIA V100 GPU is known by one of ordinary skill to comprise one or more processors and one or more non-transitory computer-readable media that collectively store instructions that, when executed by the one or more processors, cause the computing system to perform operations.), the operations comprising:
	obtaining a training dataset, wherein the training dataset comprises a plurality of (Mildenhall teaches training a neural radiance field model using a set of images of a scene, including camera poses, directions, and spatial locations (Abstract, lines 1-4 and 10-12; Abstract, lines 4-6 "…input is a single continuous 5D coordinate (spatial location (x,y,z) and viewing direction"), which serve as training data for optimizing the model (p. 8-9, section 5.3); also see FIG 1-3 and corresponding captions; p. 14, section 7, first paragraph);
	processing a first view direction and a first position with a neural radiance field model to generate first predicted data, wherein the first predicted data is descriptive of one or more first predicted color values and one or more first predicted density values (Abstract lines 4-10; p. 1-2, section 1, paragraph 2; p. 4-5, section 3, first paragraph);
evaluating a loss function that evaluates a difference between the first predicted data and a first (p. 8-9, section 5.3, paragraphs 1-2, and equation 6), wherein the first (p. 8-9, section 5.3, paragraphs 1-2, including "a dataset of captured RGB images of the scene, the corresponding camera poses and intrinsic parameters, and scene bounds (we use ground truth camera poses, intrinsics, and bounds for synthetic data, and use the COLMAP structure-from-motion package [39] to estimate these parameters for real data)." and equation 6); and
	adjusting one or more parameters of the neural radiance field model based at least in part on the loss function (Mildenhall teaches a neural network parameterized by a fully connected MLP (p. 2, section 1, paragraph 5 "…representing continuous scenes with complex geometry and materials as 5D neural radiance fields, parameterized as basic MLP networks").  Such MLPs are understood in the art to include learnable parameters (e.g., weights and biases).  Mildenhall further teaches defining and minimizing an error between rendered and ground truth pixel colors as a loss function (p. 2, section 1, first paragraph "…we can use gradient descent to optimize this model by minimizing the error between each observed image and the corresponding views rendered from our representation."; see also p. 8-9, section 5.3, paragraph 1; p. 5-6, section 4, paragraphs 1-2), and optimizing the network parameters over 100-300k iterations using an optimizer (p. 9, section 5.3, paragraph 2), thereby iteratively adjusting one or more parameters in the neural radiance field model based at least in part on the loss function.).
	Mildenhall fails to explicitly disclose: raw input.
In a related art, Lim teaches: systems and methods for “reducing chrominance (chroma) noise in image data” through known image processing techniques (Abstract).  Lim further teaches using image data comprising raw noisy images and a plurality of images, in a raw format, for image processing (see paragraphs [0218] and [0353]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Mildenhall to incorporate the teachings of Lim to make the system more robust by increasing the types and quality of images and image data the system is capable of handling, thereby increasing the system’s effectiveness in handling different image data formats or sources.  Lim also notes conventional image processing techniques do not adequately account for the locations and direction of components (e.g. edges) within an image (Lim [006]), while Mildenhall, Lim, and the instant application all lie in the same field of endeavor of image processing with a specific aim of improving image quality through the use of location, direction, and colors. 

Regarding Claim 18,
	Mildenhall and Lim teach: (Currently Amended) The one or more non-transitory computer-readable media of claim 17.
Mildenhall further teaches: wherein the one or more parameters are associated with a learned three-dimensional representation associated with an environment (p. 14, section 7, "We demonstrate that representing scenes as 5D neural radiance fields (an MLP that outputs volume density and view-dependent emitted radiance as a function of 3D location and 2D viewing direction)…"; see FIG 1-3; p. 8-9, paragraph 1-2).

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Mildenhall (“NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis”; examiner relied on a more easily readable copy of the article than provided by applicant, updated version of reference provided by examiner) in view of Lim (US 20130322752 A1), and in further view of Li (“A reweighted L2 method for image restoration with Poisson and mixed Poisson-Gaussian noise”; copy provided by examiner).

Regarding Claim 3,
Mildenhall and Lim teach: (Currently Amended) The computing system of claim 1, including the loss function.
Mildenhall and Lim fail to explicitly disclose: wherein the loss function comprises a reweighted L2 loss.
In a related art, Li teaches: a reweighted L2 loss, referred to as reweighted L2 “fidelity”, for noise related image restoration (Abstract) that iteratively estimates noise variance (p. 2, section 1, 2nd to last paragraph).
	It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Mildenhall, previously modified by Lim, to incorporate the teachings of Li in order to more efficiently estimate noise variance.  The teachings lie in the same field of endeavor as the instant application of image processing with an aim at improving image quality.
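
To make the term "reweighted L2" concrete, the sketch below shows one common form of such a loss: a squared residual divided by a per-pixel variance estimate that is held fixed from a previous iterate. Whether this matches Li's exact formulation is an assumption; the names `reweighted_l2` and `poisson_gaussian_variance`, the `eps` stabilizer, and the variance model (signal mean plus Gaussian variance) are illustrative choices, not quotations from the reference.

```python
import numpy as np

def poisson_gaussian_variance(previous_estimate, gaussian_sigma=0.0):
    """Per-pixel noise variance under an assumed mixed Poisson-Gaussian model.

    Poisson variance equals the (non-negative) signal mean, approximated here
    by the previous iterate; the Gaussian part adds a constant sigma**2.
    """
    return np.maximum(previous_estimate, 0.0) + gaussian_sigma ** 2

def reweighted_l2(prediction, observation, variance_estimate, eps=1e-3):
    """Squared error where each pixel is down-weighted by its estimated variance."""
    # The weights are treated as constants, so only the residual carries gradient.
    weights = 1.0 / (variance_estimate + eps)
    return float(np.sum(weights * (prediction - observation) ** 2))
```

Re-estimating the variance after each update and recomputing the weights gives the iterative behavior described above, where noise variance is estimated repeatedly.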

Claims 5 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Mildenhall (“NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis”; examiner relied on a more easily readable copy of the article than provided by applicant, updated version of reference provided by examiner) in view of Lim (US 20130322752 A1), and in further view of Zamir (“Learning Digital Camera Pipeline for Extreme Low-Light Imaging”; copy provided by examiner).

Regarding Claim 5,
	Mildenhall and Lim teach: (Currently Amended) The computing system of claim 1, including: evaluating the loss function that evaluates the difference between the view rendering and the first image of the plurality of raw noisy images 
	Mildenhall and Lim fail to explicitly disclose exposure adjustments being comprised in the evaluation of the loss function.
	In a related art, Zamir teaches:  transforming short-exposure raw images into well-exposed images using a neural network trained with a loss function (Abstract; Figure 2), thereby performing and evaluating exposure adjustment, with regard to the loss function.
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate the teachings of Zamir into the teachings of Mildenhall and Lim to more accurately adjust levels of brightness and provide for better visualization of images by improving the model’s ability to react to blotchy appearances, amplified noise, and inaccurate colors and/or levels of darkness caused by short exposure times (Zamir, p. 1, section 1, first paragraph).  All inventions lie in the same field of endeavor of image processing with a specific aim of improving image quality.

Regarding Claim 19,
	Mildenhall and Lim teach: The one or more non-transitory computer-readable media of claim 17, including evaluating a loss function that evaluates a difference between the first predicted data and a first raw input dataset.
Mildenhall and Lim fail to explicitly disclose: wherein the loss function comprises a tone-mapping loss associated with processing at least one of the first predicted data or the first raw input dataset.
In a related art, Zamir teaches: known conventional image processing uses image data to evaluate and generate tone mapping during the course of an imaging pipeline (“A conventional camera imaging pipeline processes the RAW sensor data through a sequence of operations (such as … tone mapping, sharpening, etc.) in order to generate the final RGB images” (p. 1, section 1, paragraph 2)).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Mildenhall, previously modified by Lim, to incorporate the teachings of Zamir in order to more accurately map colors and brightness levels to the corresponding at least one of the first predicted data or the first raw input dataset.

Claim(s) 10-11, and 14-16 are rejected under 35 U.S.C. 103 as being unpatentable over Mildenhall (“NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis”; examiner relied on a more easily readable copy of the article than provided by applicant, updated version of reference provided by examiner) in view of Lim (US 20130322752 A1), in further view of Chen (“Learning to See in the Dark”; copy provided by examiner).

Regarding Claim 10,
	Mildenhall teaches: (Original) A computer-implemented method for novel view rendering (Abstract "We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes…"; p. 17-18 Annex A),
the method comprising: obtaining, by a computing system comprising one or more processors (Mildenhall teaches a system that can use a NVIDIA V100 GPU (Abstract; p. 8, section 5.3, last paragraph; p. 17-18 Annex A).  NVIDIA V100 GPU is known by one of ordinary skill to comprise one or more processors that cause the computing system to perform operations.),
an input two-dimensional view direction and an input three-dimensional position associated with an environment (Abstract "Our algorithm represents a scene … whose input is a single continuous 5D coordinate (spatial location (x,y,z) and viewing direction (θ,φ))…"; see FIGS. 1-3 and their corresponding descriptions)
obtaining, by the computing system, a neural radiance field model, (p. 1, section 1, paragraph 2) wherein the neural radiance field model was trained on a training dataset, wherein the training dataset comprises a plurality of (Mildenhall teaches training a neural radiance field model using a set of images of a scene, including camera poses, directions, and spatial locations (Abstract, lines 1-4 and 10-12), which serve as training data for optimizing the model (p. 8-9, section 5.3).);
processing, by the computing system, the input two-dimensional view direction and the input three-dimensional position with the neural radiance field model to generate prediction data, wherein the prediction data comprises one or more predicted density values and one or more predicted color values (Abstract, lines 4-10; p. 1-2, section 1, paragraph 2; FIG. 2 and p. 5, section 3, paragraphs 1-2)
processing, by the computing system, the prediction data with an image (FIG. 2 and p. 4-5, section 3, paragraphs 1-3; FIG. 3 and p. 5-6, paragraphs 1-2).
	Mildenhall fails to explicitly disclose: noisy input datasets and an image augmentation block.
In a related art, Lim teaches: systems and methods for “reducing chrominance (chroma) noise in image data” through known image processing techniques (Abstract).  Lim further teaches using image data comprising noisy image data (Abstract) and input structures (FIG. 1) and capturing image data from an image sensor input signal for image processing ([0241]-[0242]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Mildenhall to incorporate the teachings of Lim to make the system more robust by increasing the types and quality of images and image data the system is capable of handling, thereby increasing the system’s effectiveness in handling different image data formats or sources.  Lim also notes conventional image processing techniques do not adequately account for the locations and direction of components (e.g. edges) within an image (Lim [006]), while Mildenhall, Lim, and the instant application all lie in the same field of endeavor of image processing with a specific aim of improving image quality through the use of location, direction, and colors.
Mildenhall and Lim fail to explicitly disclose: the use of an image augmentation block to generate predicted view rendering.
In a related art, Chen teaches: applying data augmentation to training images, including cropping or patch-based processing (p. 5, section 4.2, first paragraph), which constitutes an image augmentation block in the training pipeline.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Mildenhall, previously modified by Lim, to incorporate the teachings of Chen in order to increase the effectiveness and efficiency of training the model and predicting view rendering when a variety of unknown types of images (e.g. raw images) are provided to the model.  All references and the instant application lie in the same field of endeavor of image processing with a specific aim of improving image quality, specifically relating to colors.  Also, Mildenhall, Chen, and the instant application use “Adam optimizers” for training the network (Mildenhall p. 9, section 5.3, last paragraph; Chen, p. 5, section 4.2, first paragraph).
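
The cropping or patch-based augmentation attributed to Chen can be pictured with the following minimal sketch. It shows only a random patch crop; the function name `random_crop`, the patch size, and the use of a NumPy random generator are assumptions for illustration and are not drawn from the reference.

```python
import numpy as np

def random_crop(image, patch_size, rng):
    """Return a randomly positioned square patch from an (H, W, ...) image."""
    height, width = image.shape[:2]
    if patch_size > height or patch_size > width:
        raise ValueError("patch_size exceeds image dimensions")
    # Generator.integers excludes the upper bound, so add 1 to allow the last offset.
    top = rng.integers(0, height - patch_size + 1)
    left = rng.integers(0, width - patch_size + 1)
    return image[top:top + patch_size, left:left + patch_size]
```

Drawing a fresh patch at each training iteration exposes the model to many sub-views of the same image, which is the sense in which cropping acts as augmentation during training.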

Regarding Claim 11,
	Mildenhall, Lim, and Chen teach: (Currently Amended) The method of claim 10, including processing the prediction data with an augmentation block to generate predicted view rendering, descriptive of a predicted scene.
 	Mildenhall, Lim, and Chen fail to explicitly teach: wherein the image augmentation block adjusts a focus of the prediction data.
	However, Chen and Lim each teach: well-known image adjustment techniques for focusing.  For example, Chen teaches camera settings like focus, and focal length can be adjusted to maximize the quality of images (Chen, p. 3, section 3, paragraph 5) and teaches there are a variety of deblurring techniques found in prior art (Chen, Abstract), while Lim teaches “a technique for performing auto-focus” (Lim, [0071] FIG. 75; also see [0072] and [0245]).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Mildenhall, previously modified by Lim and Chen, to incorporate the further focus techniques of Chen and/or Lim to increase the accuracy of the model and corresponding predicted view rendering by adjusting focus of the prediction data throughout the training pipeline, specifically by use of the modified image augmentation block.

Regarding Claim 14,
	Mildenhall, Lim, and Chen teach: (Currently Amended) The method of claim 10, including a plurality of noisy input datasets.
	Mildenhall further teaches: datasets comprising photon signal data (Mildenhall teaches sampling camera rays, which are known in the art to be photons (see FIG. 2’s description)).
	Mildenhall doesn’t, by itself, explicitly disclose: wherein each noisy input dataset the plurality of noisy input datasets.  However, Mildenhall, as previously modified by Lim and Chen, teaches these principles in claim 10.  Refer back to claim 10 for further details.
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Mildenhall, previously modified by Lim and Chen, to incorporate the further focus techniques of Mildenhall in order to more accurately capture location and viewing direction.

Regarding Claim 15,
	Mildenhall, Lim, and Chen teach: (Currently Amended) The method of claim 10, including a plurality of noisy input datasets.
	Mildenhall further teaches: wherein each (Examiner notes, signal data is known in the art as data related to the visual representation of an image (e.g. pixel color represented at a location). Mildenhall teaches using an input dataset of a plurality of input datasets as a "sparse set of input views" (Abstract). Mildenhall further teaches input data sets comprising data associated with RGB images (i.e. "data associated with at least one of a red value, a green value, or a blue value") (p. 8, section 5.3 "We optimize a separate neural continuous volume representation network for each scene. This requires only a dataset of captured RGB images of the scene…"; p. 2, section 1, last paragraph, lines 9-21); thus, Mildenhall also teaches the use of signal data.) 
	Mildenhall fails to explicitly disclose: noisy input dataset.  However, Mildenhall, as previously modified by Lim and Chen (in claim 10), does teach a noisy input dataset.
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate the teachings of input datasets associated with color values, discussed in the previous paragraph of the present office action and taught by Mildenhall, to the teachings of Mildenhall, previously modified by Lim, to increase the accuracy of the system by accounting for color values.

Regarding Claim 16,
	Mildenhall, Lim, and Chen teach: (Currently Amended) The method of claim 10, including a plurality of noisy input datasets.
	Lim further teaches: wherein each noisy input dataset the plurality of noisy input datasets comprises one or more noisy mosaicked linear raw images (p. 22, [0103] of the instant application’s disclosure states “input data 4002 (e.g., noisy mosaicked linear raw images (e.g., RGGB bayer filter image datasets)),” indicating RGGB bayer filter image data is an example of noisy mosaicked linear raw images.  Under the broadest interpretation of the claim, Lim teaches RGGB bayer filter image datasets through the use of Bayer quad filter data derived from RGGB images ([0377]; [0409]; [0468])). Thus, Mildenhall and Lim teach wherein each noisy input dataset the plurality of noisy input datasets comprises one or more noisy mosaicked linear raw images.

Claim(s) 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over Mildenhall (“NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis”; examiner relied on a more easily readable copy of the article than provided by applicant, updated version of reference provided by examiner) in view of Lim (US 20130322752 A1), in further view of Chen (“Learning to See in the Dark”; copy provided by examiner), and in further view of Zamir (“Learning Digital Camera Pipeline for Extreme Low-Light Imaging”; copy provided by examiner).

Regarding Claim 12,
	Mildenhall, Lim, and Chen teach: (Currently Amended) The method of claim 10, including processing the prediction data with an augmentation block to generate predicted view rendering, descriptive of a predicted scene.
 	Mildenhall, Lim, and Chen fail to explicitly teach: wherein the image augmentation block adjusts an exposure level of the prediction data.
	In a related art, Zamir teaches: evaluating and performing exposure adjustments (see Abstract, Figure 2, and refer back to claim 5 for further explanation), thereby adjusting exposure levels, under BRI.
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Mildenhall, previously modified by Lim and Chen, to incorporate the teachings of Zamir to more accurately adjust levels of brightness and provide for better visualization of images by improving the model’s ability to react to blotchy appearances, amplified noise, and inaccurate colors and/or levels of darkness caused by short exposure times (Zamir, p. 1, section 1, first paragraph).  All inventions lie in the same field of endeavor of image processing with a specific aim of improving image quality.

Regarding Claim 13,
	Mildenhall, Lim, and Chen teach: (Currently Amended) The method of claim 10, including processing the prediction data with an augmentation block to generate predicted view rendering, descriptive of a predicted scene.
 	Mildenhall, Lim, and Chen fail to explicitly teach: wherein the image augmentation block adjusts a tone-mapping of the prediction data.
	In a related art, Zamir teaches: known conventional imaging processing uses image data to evaluate and generate tone mapping during the course of an imaging pipeline (“A conventional camera imaging pipeline processes the RAW sensor data through a sequence of operations (such as … tone mapping, sharpening, etc.) in order to generate the final RGB images” (p.1, section 1, paragraph 2)).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Mildenhall, previously modified by Lim and Chen, to incorporate the teachings of Zamir in order to more accurately map colors and brightness levels to corresponding prediction data.

Allowable Subject Matter
	Claim 20 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form, including all of the limitations of the base claim and any intervening claims.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SAMUEL DAVID BAYNES whose telephone number is (571)272-0607. The examiner can normally be reached Monday - Friday 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Stephen R Koziol can be reached at (408)918-7630. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/S.D.B./
Samuel Baynes
Examiner | Art Unit 2665

/BOBBAK SAFAIPOUR/Primary Examiner, Art Unit 2665