Prosecution Insights
Last updated: April 19, 2026
Application No. 18/658,496

METHOD FOR CREATING SYNTHETIC DATA FOR AI MODEL TRAINING

Non-Final OA: §102, §103, §112

Filed: May 08, 2024
Examiner: SONNERS, SCOTT E
Art Unit: 2613
Tech Center: 2600 — Communications
Assignee: Karl Storz SE & Co. KG
OA Round: 1 (Non-Final)
Grant Probability: 69% (Favorable)
OA Rounds: 1-2
To Grant: 3y 2m
With Interview: 81%

Examiner Intelligence

Grants 69% — above average.

Career Allow Rate: 69% (258 granted / 375 resolved; +6.8% vs TC avg)
Interview Lift: +12.0% (moderate lift, based on resolved cases with interview)
Avg Prosecution: 3y 2m (typical timeline); 25 currently pending
Total Applications: 400 (career history, across all art units)
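A quick consistency check on these figures: 258 granted of 375 resolved gives 258 / 375 ≈ 68.8%, which rounds to the 69% career allow rate shown, and 375 resolved plus 25 pending matches the 400 total applications. The 81% "with interview" grant probability above is the 69% baseline plus the +12-point interview lift.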

Statute-Specific Performance

§101: 7.9% (-32.1% vs TC avg)
§103: 39.2% (-0.8% vs TC avg)
§102: 29.4% (-10.6% vs TC avg)
§112: 14.1% (-25.9% vs TC avg)
Black line = Tech Center average estimate • Based on career data from 375 resolved cases
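If each delta is read as the examiner's rate minus the Tech Center average, every row implies the same baseline: 7.9% + 32.1% = 40.0%, 39.2% + 0.8% = 40.0%, 29.4% + 10.6% = 40.0%, and 14.1% + 25.9% = 40.0%, consistent with a single 40% Tech Center average estimate (the black line).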

Office Action

§102 §103 §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Regarding claim 24, the phrase "should" renders the claim indefinite because it is unclear whether the limitations following the phrase are part of the claimed invention. See MPEP § 2173.05(d). As used in the claims, it is not clear whether it is actually required that the scene is simulated “in predetermined surroundings” or whether such simulation is merely a preference: for example, although the scene described in the configuration should be simulated in predetermined surroundings, this does not mean that it “must be” or “is” simulated. In the interest of compact prosecution, the Examiner will interpret the claim language as if it reads “is” instead of “should be,” which would render the claims definite.

Regarding claim 25, the instant claim carries through the deficiency of parent claim 24 above such that it is rejected for the same reasons as claim 24, and is interpreted as rendered definite according to the interpretation of claim 24.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 16, 21-24, and 27-34 are rejected under 35 U.S.C. 102(a)(1)/(a)(2) as being anticipated by Wrenninge.
Regarding claim 1, Wrenninge teaches a computer-implemented method for synthesizing application-specific image data for training artificial intelligence-based object recognition in images of medical and/or clinical workflows, comprising (note that here “for synthesizing application-specific image data” is a recitation of intended use, and “for training artificial intelligence-based object recognition in images of medical and/or clinical workflows” is also a statement of intended use; only “for synthesizing application-specific image data” is given patentable weight here as the body of the claim recites what may be considered synthesizing application-specific image data and thus the image synthesis operations below have meaning given to them by such preamble language, whereas “for training artificial intelligence-based object recognition in images of medical and/or clinical workflows” is given no patentable weight as a statement of intended use as it “merely states, for example, the purpose or intended use of the invention, rather than any distinct definition of any of the claimed invention’s limitations” and thus “the preamble is not considered a limitation and is of no significance to claim construction” (see MPEP 2111.02), as the body of the claim does not recite any limitations specifically relating to “training” or how such images are used “for training” that limits the body of the claim, nor does object recognition in images of medical and/or clinical workflows breathe any meaning into the claim as, for example, AI-based object recognition in images would not necessitate any specific content or format for training images, as AI object recognition trained on other types of data could be used for object recognition in any type of image, including those of medical and/or clinical workflows; in other words, the body of the claim defines a structurally complete and operable computer-implemented method which does not rely on any specific elements or terminology of the claim language for antecedent basis, completeness, or further meaning; furthermore, with regard to “synthesizing application-specific image data”, the claim does not define what is meant by application-specific nor image data, and in the context of the claims, application-specific image data is considered any data related to an image, such as any data used in creating an image, processing an image, and/or image data itself, which is specific to an application in some way, such as being used by an application or used for a certain application; thus see Wrenninge, paragraphs 0033-0034 and figures 1A-1B teaching “in FIGS. 1A-1B, the method 100 can include: determining a set of parameter values associated with at least one of a set of geometric parameters, a set of rendering parameters, and a set of augmentation parameters S100; generating a three dimensional (3D) scene based on the set of geometric parameters S200; rendering a synthetic image of the scene based on the set of rendering parameters S300; augmenting the synthetic image based on the set of augmentation parameters S400; and, generating a synthetic image dataset S500. The method can additionally or alternatively include: training a model based on the synthetic image dataset S600” and “method 100 functions to generate synthetic data that is suited (e.g., optimally suited, adapted to optimize model convergence and/or accuracy, etc.) to image-based machine learning (e.g., computer vision models, vehicle control models, classification models, mission planning models, etc.)” and “can also function to generate a synthetic dataset for provision to a third party (e.g., a developer of machine image classification models, a tester of computer vision models, a creator of vehicle control models based on image analysis, etc.)” such that here the generation of the 3D models for the specific application being tested, such as image classification/object recognition using the process described below, is synthesizing application-specific image data and, though not required by the claims, this further teaches that this image data is generated for training AI-based object recognition in images of any scene designed by a user or based on any environment that can be modeled, which would include images of medical and/or clinical workflows, as such images are simply a type of scene like any other made up of objects that can be recognized in some manner; see paragraphs 0116-0118 and figure 9 teaching “implementation 900 of the simulation system” which uses a computer in the form of system 900 including “processor 904” which implements the functions of the methods described):

receiving a command file for execution on a processor to create image data records comprising up to three dimensions using at least one configuration file (see Wrenninge, paragraphs 0033-0034 and figures 1A-1B teaching “in FIGS. 1A-1B, the method 100 can include: determining a set of parameter values associated with at least one of a set of geometric parameters, a set of rendering parameters, and a set of augmentation parameters S100; generating a three dimensional (3D) scene based on the set of geometric parameters S200; rendering a synthetic image of the scene based on the set of rendering parameters S300; augmenting the synthetic image based on the set of augmentation parameters S400; and, generating a synthetic image dataset S500” where “Block S100 functions to define the parametric rule set for 3D scene generations, image rendering, and image augmentation” such that this “parametric rule set” corresponds to a command file for execution on a processor (such as in paragraphs 0116-0117) and the rule set operates on the “set of parameter values” such that the set of parameter values functions as a configuration file, and thus, for example, in steps S300-S500 these receive a command file to create image data records comprising up to three dimensions using at least one configuration file when the steps receive the “parametric rule set” for execution by the processor to create image data records comprising up to three dimensions, such as “generating a three-dimensional (3D) scene based on the set of geometric parameters”, and this is done using the “set of parameter values” which are the configuration files; see also paragraphs 0056-0058 teaching “block S200 functions to procedurally create a three dimensional virtual representation of an environment (e.g., a scene) based on the parameter values that control the implementation of the procedure (e.g., geometric parameter values, parameter values defining which object classes will be populated into a scene). Block S200 can also function to produce a realistic virtual scene, wherein realism in the overall scene composition, along with the geometric structure and the material properties of objects in the scene, is enforced by the parametrized rule set and parameter values (e.g., determined in an instance of Block S100)” such that here there is such receiving of the command files and configuration files that dictate the creation of the image data records);

creating a configuration file describing a scene to be simulated, the configuration file defining one or more object parameters (see Wrenninge, paragraphs 0033-0034 and figures 1A-1B teaching “in FIGS. 1A-1B, the method 100 can include: determining a set of parameter values associated with at least one of a set of geometric parameters, a set of rendering parameters, and a set of augmentation parameters S100; generating a three dimensional (3D) scene based on the set of geometric parameters S200; rendering a synthetic image of the scene based on the set of rendering parameters S300; augmenting the synthetic image based on the set of augmentation parameters S400; and, generating a synthetic image dataset S500” where “Block S100 functions to define the parametric rule set for 3D scene generations, image rendering, and image augmentation” such that, as explained above, this “parametric rule set” corresponds to a command file and the rule set operates on the “set of parameter values” such that the set of parameter values functions as a configuration file, and these parameter values are created in step S100 and describe a scene to be simulated and define one or more object parameters, where “a set of parameter values associated with at least one of a set of geometric parameters, a set of rendering parameters, and a set of augmentation parameters” correspond to object parameters, with geometric parameters, rendering parameters, and augmentation parameters all being parameters corresponding to the geometry of objects, rendering of objects, and any augmentation of objects, where an example scene to be simulated and defined with various types of object parameters for various objects is taught in paragraph 0041 teaching “the discrete variables include: an overall number of pedestrians in the scene, a wet road surface, and light; the continuous variables include: the camera location (e.g., x/y/z), each pedestrian's height and pose, and the angle and magnitude of the light (e.g., into the camera); and the stochastic variables include: the pedestrian locations within the scene (e.g., wherein the stochastic variable can be changed between images to produce scenes with the desired parametric properties but with unique combinations)” where, for example, pedestrians are objects defined and a camera is an object defined by object parameters that describes the scene to be captured from that viewpoint, and as in paragraphs 0043-0044 a camera object may be defined by parameters for capturing a scene where “Alternatively, a single scene is generated, and the parameters defining the viewpoint of a virtual camera (e.g., camera angle and position of the camera) can be changed within the scene to generate a set of synthetic images” and “output(s) of Block S100 preferably include the definition of each object property (e.g., defined by parameter values) as well as the layout of each object within the scene (e.g., defined by parameter values), which can be used to generate the three dimensional virtual representation in Block S200”);

executing the command file and the configuration file on the processor to create a first image data record, the first image data record including a first object defined by the one or more object parameters (see Wrenninge, paragraphs 0056-0066 teaching “Block S200 includes generating a 3D scene based on parameter values (e.g., determined in accordance with one or more variations of Block S100). Block S200 functions to procedurally create a three dimensional virtual representation of an environment (e.g., a scene) based on the parameter values that control the implementation of the procedure (e.g., geometric parameter values, parameter values defining which object classes will be populated into a scene)” such that, when generating a 3D scene based on the particular values passed in any iteration to block S200, this executes the command file and configuration file to generate the scene according to the parameters in order to create a first image data record in the form of the 3D scene which can be captured, and does so according to the various objects such as “objects in the scene” and “a virtual camera model” object as in “Scene synthesis in conjunction with the method 100 preferably includes a defined model of the 3D virtual scene (e.g., determined in accordance with one or more variations of Block S100) that contains the geometric description of objects in the scene, a set of materials describing the appearance of the objects, specifications of the light sources in the scene, and a virtual camera model”; and then, additionally, “Block S300 includes generating a synthetic image of the generated scene, which functions to create a realistic synthetic two-dimensional (2D) representation of objects in the scene, wherein the objects are intrinsically labeled with the parameter values used to generate the scene (e.g., the objects in the scene, object classifications, the layout of the objects, all other parametrized metadata, etc.)” where this generation of the synthetic image of the scene also corresponds to creation of a first image data record, either alone or in combination with step S200; in both cases the image data records include a first object defined by the one or more object parameters, as the virtual scene generated according to the object parameters includes the procedurally generated 3D virtual representations of objects that are to be captured by virtual camera objects that are defined, and the synthetic images capturing the 3D virtual representation from the perspective of the virtual cameras include a first object defined by one or more object parameters as “Block S300 can also function to produce realistic synthetic images with pixel-perfect ground truth annotations and/or labels (e.g., of what each object should be classified as, to any suitable level of subclassification and/or including any suitable geometric parameter, such as pose, heading, position, orientation, etc.). Block S300 is preferably performed based on a set of rendering parameters (e.g., determined in accordance with an instance of a variation of Block S100), but can additionally or alternatively be performed based on a fixed set of rendering rules or procedures. Block S300 is preferably performed by a virtual camera, wherein the viewpoint of the virtual camera is determined as a parameter value in a variation of Block S100 (e.g., as a rendering parameter). Generating the synthetic image preferably includes generating a projection of the 3D scene onto a 2D virtual image plane at a predetermined location in the 3D scene (e.g., at the virtual camera location)” and, as in paragraphs 0120-0123, the camera can be included in such labeled/annotated data where “metadata associated with each image is stored in a meta subdirectory, with a single JSON file corresponding to each RGB image. In some implementations, three types of metadata are provided: scene metadata, which describes the properties of the scene as a whole; camera/sensor metadata, describing the intrinsic and extrinsic characteristics of the sensor” and “the camera/sensor metadata describes attributes of the intrinsic and extrinsic behavior of each camera/sensor. For example, the camera metadata may include extrinsic camera metadata and intrinsic camera metadata”);

creating a first annotation file associated with the first image data record (see Wrenninge, paragraph 0057 teaching a first example of a first annotation file associated with the first image data record when the image data record is the 3D virtual representation of the scene, where “output of Block S200 preferably includes a three dimensional virtual representation of a set of objects (e.g., a scene), the aspects of which are defined by the parameter values determined in Block S100. Each object in the scene generated in Block S200 is preferably automatically labeled with the parameter values used to generate the object (e.g., including the object class name, object type, material properties, etc.), and other suitable object metadata (e.g., subclassification), in order to enable the synthetic image(s) generated in Block S300 to be used for supervised learning without additional labeling (e.g., the objects in the scene are preferably intrinsically labeled)”, and another example of a first annotation file associated with the first image data record corresponds to the labeled 2D images of the 3D scene representation defined by the objects as in paragraphs 0066-0070 teaching “Block S300 includes generating a synthetic image of the generated scene, which functions to create a realistic synthetic two-dimensional (2D) representation of objects in the scene, wherein the objects are intrinsically labeled with the parameter values used to generate the scene (e.g., the objects in the scene, object classifications, the layout of the objects, all other parametrized metadata, etc.)” and “can also function to produce realistic synthetic images with pixel-perfect ground truth annotations and/or labels (e.g., of what each object should be classified as, to any suitable level of subclassification and/or including any suitable geometric parameter, such as pose, heading, position, orientation, etc.). Block S300 is preferably performed based on a set of rendering parameters (e.g., determined in accordance with an instance of a variation of Block S100)” and “Block S300 is preferably performed by a virtual camera, wherein the viewpoint of the virtual camera is determined as a parameter value in a variation of Block S100 (e.g., as a rendering parameter). Generating the synthetic image preferably includes generating a projection of the 3D scene onto a 2D virtual image plane at a predetermined location in the 3D scene (e.g., at the virtual camera location)” and “output of Block S300 preferably includes a two dimensional synthetic image that realistically depicts a realistic 3D scene. The synthetic image defines a set of pixels, and each pixel is preferably labeled with the object depicted by the pixel (e.g., intrinsically labeled based on the parameters used to generate the object rendered in the image). In this manner, a “pixel-perfect” intrinsically annotated synthetic image can be created. In alternative variations, labelling can be performed on a basis other than a pixel-by-pixel basis; for example, labelling can include automatically generating a bounding box around objects depicted in the image, a bounding polygon of any other suitable shape, a centroid point, a silhouette or outline, a floating label, and any other suitable annotation, wherein the annotation includes label metadata such as the object class and/or other object metadata. Preferably, labelling is performed automatically (e.g., to generate an automatically semantically segmented image)” such that these labelled/annotated images correspond to annotation files, where also, as in paragraphs 0120-0123, it is disclosed that “each image is annotated with class, instance, and depth information” and “metadata associated with each image is stored in a meta subdirectory, with a single JSON file corresponding to each RGB image. In some implementations, three types of metadata are provided: scene metadata, which describes the properties of the scene as a whole; camera/sensor metadata, describing the intrinsic and extrinsic characteristics of the sensor; and instance metadata, which provides details on the individual actors in each image” such that these are additional examples of annotation files);
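To visualize the annotation layout Wrenninge describes (a meta subdirectory holding one JSON file per RGB image, with scene, camera/sensor, and instance metadata), here is a minimal Python sketch. The concrete keys and values are illustrative assumptions, not the reference's actual schema:

```python
import json
from pathlib import Path

def write_annotation(meta_dir: Path, image_stem: str,
                     scene_meta: dict, camera_meta: dict,
                     instance_meta: list) -> Path:
    """Write one JSON annotation file for one rendered RGB image,
    mirroring the three metadata types Wrenninge's paragraphs 0120-0123
    describe. All concrete keys below are hypothetical."""
    meta_dir.mkdir(parents=True, exist_ok=True)
    annotation = {
        "scene": scene_meta,         # properties of the scene as a whole
        "camera": camera_meta,       # intrinsic/extrinsic sensor characteristics
        "instances": instance_meta,  # per-object details (class, pose, bbox, etc.)
    }
    out_path = meta_dir / f"{image_stem}.json"
    out_path.write_text(json.dumps(annotation, indent=2))
    return out_path

# Hypothetical usage for one image "00001.png":
write_annotation(
    Path("dataset/meta"), "00001",
    scene_meta={"light_angle_deg": 35.0, "wet_road": True},
    camera_meta={"position_xyz": [0.0, 1.5, -4.0], "exposure": 0.8},
    instance_meta=[{"class": "pedestrian", "height_m": 1.72,
                    "bbox": [120, 88, 210, 340]}],
)
```

Keeping one sidecar JSON per image, rather than one monolithic index, matches the per-image metadata pairing the examiner relies on for the "annotation file" limitation.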
storing the first image data record together with the first annotation file (see Wrenninge, paragraph 0057 teaching “output of Block S200 preferably includes a three dimensional virtual representation of a set of objects (e.g., a scene), the aspects of which are defined by the parameter values determined in Block S100. Each object in the scene generated in Block S200 is preferably automatically labeled with the parameter values used to generate the object (e.g., including the object class name, object type, material properties, etc.), and other suitable object metadata (e.g., subclassification), in order to enable the synthetic image(s) generated in Block S300 to be used for supervised learning without additional labeling (e.g., the objects in the scene are preferably intrinsically labeled)” such that here the 3D virtual representation as an image data record is output along with annotations such that, as an output of the computer, these are stored, such as when the first image data record is considered the 2D image of the 3D scene generated, which is also provided with an annotation file as in paragraphs 0066-0070 teaching “Block S300 includes generating a synthetic image of the generated scene, which functions to create a realistic synthetic two-dimensional (2D) representation of objects in the scene, wherein the objects are intrinsically labeled with the parameter values used to generate the scene (e.g., the objects in the scene, object classifications, the layout of the objects, all other parametrized metadata, etc.)” and “output of Block S300 preferably includes a two dimensional synthetic image that realistically depicts a realistic 3D scene. The synthetic image defines a set of pixels, and each pixel is preferably labeled with the object depicted by the pixel (e.g., intrinsically labeled based on the parameters used to generate the object rendered in the image). In this manner, a “pixel-perfect” intrinsically annotated synthetic image can be created. In alternative variations, labelling can be performed on a basis other than a pixel-by-pixel basis; for example, labelling can include automatically generating a bounding box around objects depicted in the image, a bounding polygon of any other suitable shape, a centroid point, a silhouette or outline, a floating label, and any other suitable annotation, wherein the annotation includes label metadata such as the object class and/or other object metadata. Preferably, labelling is performed automatically (e.g., to generate an automatically semantically segmented image)” where such output of the image data record and annotation file are stored as explicitly mentioned in paragraphs 0119-0123 teaching “[t]he images of the synthetic dataset include training classes for semantic segmentation. As an illustrative example, a set of RGB images may be stored to facilitate training of neural network architectures configured to be trained using one or more training classes. An exemplary storage format of the RGB image is the Portable Network Graphics (PNG) format. As an illustrative but non-limiting example, the dataset may include 25,000 RGB images stored in PNG format with a structure and content selected to form training classes for semantic segmentation” and “each image is annotated with class, instance, and depth information” and “metadata associated with each image is stored in a meta subdirectory, with a single JSON file corresponding to each RGB image. In some implementations, three types of metadata are provided: scene metadata, which describes the properties of the scene as a whole; camera/sensor metadata, describing the intrinsic and extrinsic characteristics of the sensor; and instance metadata, which provides details on the individual actors in each image”);

randomly modifying the configuration file by varying the one or more object parameters (see Wrenninge, paragraphs 0040-0043 teaching randomly modifying the configuration file by varying the one or more object parameters, where in block S100 there is the “determining an entire set of parameters defining a multidimensional parameter space that encompasses scene parameters, rendering parameters, and augmentation parameters. The parametrized variables (e.g., parameters) used in subsequent Blocks of the method can be known or determined prior to the generation of a scene (e.g., Block S200), rendering an image of the scene (e.g., Block S300), and/or augmenting of the image (e.g., Block S400); for example, prior to scene generation, Block S100 can include determining which augmentation parameters (e.g., camera exposure, scaling ranges, etc.) will be sampled and used in generating the synthetic image dataset. However, in alternative variations, Block S100 can include determining any suitable subset of the set of parameters at any suitable relative time relative to other Blocks or portions of the method 100” where “Block S100 includes determining a set of parameter values of the determined parameters (e.g., sampling a value of each of the set of parameters from a distribution function associated with each parameter and/or group of parameters). Parameters can include any quantifiable property of a virtual scene” and “Each parameter of the set of parameters can take on a value that is defined by a random variable, which can each be of several types: discretely valued (DV) random variables, stochastically valued (SV) random variables, and continuously valued (CV) random variables. DV parameters are preferably selected from a predetermined set or range (e.g., a set of discrete numerical values, a set of predetermined 3D model descriptions, etc.), but can be otherwise suitably determined” where in “a specific example, the discrete variables include: an overall number of pedestrians in the scene, a wet road surface, and light; the continuous variables include: the camera location (e.g., x/y/z), each pedestrian's height and pose, and the angle and magnitude of the light (e.g., into the camera); and the stochastic variables include: the pedestrian locations within the scene (e.g., wherein the stochastic variable can be changed between images to produce scenes with the desired parametric properties but with unique combinations)” such that this establishes a set of randomly determined parameters that can be sampled, where “a new set of parameter values is determined (e.g., sampled from an LDS) each time a scene is generated, and then a synthetic image is generated of the newly generated scene (e.g., wherein a camera angle and position is defined by a pair of parameter values that are sampled to maximize the coverage of the parameter space). Alternatively, a single scene is generated, and the parameters defining the viewpoint of a virtual camera (e.g., camera angle and position of the camera) can be changed within the scene to generate a set of synthetic images. However, Block S100 can have any other suitable temporal characteristics” where “each time a scene is generated” this corresponds to randomly modifying the configuration file, as the parameters are then changed to generate the next training data output image data as in paragraphs 0047-0049 where “Each parameter value (e.g., of each parameter of the set of parameters, each geometric parameter, each selection parameter, each rendering parameter, each augmentation parameter, any other parameter associated with a variable of the 3D scene and related synthetic images, etc.) is preferably determined (e.g., sampled, selected, computed, generated) according to a low discrepancy sequence (LDS) that covers the range of possible and/or allowable parameter values” such that any sampling by the LDS is of such random values, meaning that any sampling by the LDS to modify the parameter values of the configuration file corresponds to randomly modifying the configuration by varying the one or more object properties as these are randomly set values, and furthermore such sampling may also be random as “each parameter value can be determined by sampling the PDF with a random or pseudorandom sequence, using any suitable random or pseudorandom sampling technique”; see also paragraphs 0056-0060 teaching “Block S200 includes generating a 3D scene based on parameter values (e.g., determined in accordance with one or more variations of Block S100). Block S200 functions to procedurally create a three dimensional virtual representation of an environment (e.g., a scene) based on the parameter values that control the implementation of the procedure (e.g., geometric parameter values, parameter values defining which object classes will be populated into a scene)” and “Block S200 can include defining the set of parameters to vary as well as the rule set that translates the parameter values (e.g., sampled according to one or more variations of Block S100) into the scene arrangement. However, the set of parameters to vary can additionally or alternatively be defined prior to Block S200 (e.g., in an instance of Block S100) or at any other suitable time” such that here again there is “defining the set of parameters to vary” as these then produce the multiple varied outputs based on the random selection of the parameters as “Block S200 preferably produces, as output, a plurality of synthesized virtual scenes. Scene synthesis in conjunction with the method 100 preferably includes a defined model of the 3D virtual scene (e.g., determined in accordance with one or more variations of Block S100) that contains the geometric description of objects in the scene, a set of materials describing the appearance of the objects, specifications of the light sources in the scene, and a virtual camera model”; see also paragraphs 0076-0079 wherein the “dataset generation” iterates through each block to vary the parameters and “In a first variation, all the parameters defining the final image (e.g., scene generation parameters associated with Block S200, rendering parameters associated with Block S300, and augmentation parameters associated with Block S400) are varied (e.g., resampled from an LDS, determined in accordance with one or more variations of Block S100 in multiple instances) for each synthetic image. By utilizing an LDS to determine the parameter values anew for each image, this variation preferably maximizes the variation among the set of images generated, given a set of parameters to vary” such that this is another instance of the variations being applied again);

executing the modified configuration file on the processor to generate a second image data record, the second image data record including a second object different from the first object (see paragraphs 0076-0079 teaching “Block S500 can include repeating Blocks S100, S200, S300, and S400 to build up a synthetic dataset of synthetic images. Repetition of the aforementioned Blocks can be performed any suitable number of times, to produce a synthetic image dataset of any suitable size. The predetermined number of iterations can be: selected by a user (e.g., an end user of the dataset), based upon the parameter ranges of one or more parameters (e.g., wherein a larger parameter range can correspond to a larger number of iterations and larger resulting dataset, to prevent sparse sampling)” and, as in paragraph 0043 as explained above, “a new set of parameter values is determined (e.g., sampled from an LDS) each time a scene is generated, and then a synthetic image is generated of the newly generated scene (e.g., wherein a camera angle and position is defined by a pair of parameter values that are sampled to maximize the coverage of the parameter space). Alternatively, a single scene is generated, and the parameters defining the viewpoint of a virtual camera (e.g., camera angle and position of the camera) can be changed within the scene to generate a set of synthetic images” which uses varied parameters to generate a second image data record when the block is run again, which then leads to creation of the multiple synthesized scenes as in paragraphs 0056-0059 teaching “Block S200 preferably produces, as output, a plurality of synthesized virtual scenes. Scene synthesis in conjunction with the method 100 preferably includes a defined model of the 3D virtual scene (e.g., determined in accordance with one or more variations of Block S100) that contains the geometric description of objects in the scene, a set of materials describing the appearance of the objects, specifications of the light sources in the scene, and a virtual camera model” which are second image data records which contain objects different from the first objects in the form of the new versions of any object or camera object or rendering property defined by the parameters for the scene which have been randomly varied; furthermore, this leads to creation of second image data records in the form of the synthetic images of the varied scenes with varied objects as in paragraph 0066 teaching “Block S300 includes generating a synthetic image of the generated scene, which functions to create a realistic synthetic two-dimensional (2D) representation of objects in the scene, wherein the objects are intrinsically labeled with the parameter values used to generate the scene (e.g., the objects in the scene, object classifications, the layout of the objects, all other parametrized metadata, etc.)” and, for example, this may include a second object different from a first object, such as a first camera object different from a second camera object defined with different parameters, where “Block S300 is preferably performed by a virtual camera, wherein the viewpoint of the virtual camera is determined as a parameter value in a variation of Block S100 (e.g., as a rendering parameter). Generating the synthetic image preferably includes generating a projection of the 3D scene onto a 2D virtual image plane at a predetermined location in the 3D scene (e.g., at the virtual camera location)” and, as established above, the camera object can have its properties varied like any other object to change it to a different version of the object).
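To make the element-by-element mapping easier to follow, here is a minimal Python sketch of the flow the examiner reads onto claim 1: a configuration describing a scene with object parameters, execution to produce a first record plus annotation, random modification, and re-execution for a second record. It is an illustrative reconstruction under stated assumptions; the function names, keys, and value ranges are hypothetical, not code from Wrenninge or the application:

```python
import copy
import json
import random
from pathlib import Path

# Illustrative configuration "file": object parameters describing a scene.
config = {
    "scene": {"wet_road": False, "light_angle_deg": 30.0},
    "objects": [{"class": "pedestrian", "height_m": 1.70,
                 "pose_deg": 0.0, "position_xy": [2.0, 5.0]}],
    "camera": {"position_xyz": [0.0, 1.5, -4.0]},
}

def execute(cfg: dict, index: int, out_dir: Path) -> None:
    """Stand-in for scene generation and rendering (Blocks S200-S300):
    emits an 'image data record' and its annotation file side by side."""
    out_dir.mkdir(parents=True, exist_ok=True)
    record = {"config_used": cfg}  # placeholder for the rendered pixels
    (out_dir / f"{index:05d}.record.json").write_text(json.dumps(record))
    (out_dir / f"{index:05d}.meta.json").write_text(json.dumps(cfg["objects"]))

def randomly_modify(cfg: dict) -> dict:
    """Claim 1's 'randomly modifying' step: vary the object parameters."""
    varied = copy.deepcopy(cfg)
    for obj in varied["objects"]:
        obj["height_m"] = random.uniform(1.5, 2.0)
        obj["pose_deg"] = random.uniform(-180.0, 180.0)
    varied["scene"]["light_angle_deg"] = random.uniform(0.0, 90.0)
    return varied

out = Path("dataset")
execute(config, 0, out)                   # first image data record
execute(randomly_modify(config), 1, out)  # second record, different object
```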
Regarding claim 16, Wrenninge teaches all that is required as applied to claim 1 above and further teaches wherein executing the command file and the configuration file on the processor to create the first image data record comprises receiving the first object from an object database based on the one or more object parameters (see Wrenninge, paragraphs 0056-0062, where, as explained above, Block S200 executes the command and configuration file on the processor to create the image data records and “In a specific example, Block S200 includes selecting object geometry of each object from a geometry database (e.g., a database of CAD models, geometry gathered from real-world examples of objects in the form of images, etc.) and arranging the objects (e.g., rendered using the selected geometry) within a virtual space based on a pose parameter value (e.g., determined in a variation of Block S100). In another specific example, Block S200 includes placing an object in a 3D scene, wherein the object is an instance of an object class, based on the geometric parameter value (e.g., defining a spacing of the placed object relative to previously placed objects, defining the angular orientation of the object, etc.). In another specific example, Block S200 can include simultaneously placing a plurality of objects into a 3D scene (e.g., generating a 3D scene composed of a plurality of objects)” such that here an object may be taken from a geometry database).

Regarding claim 21, Wrenninge teaches all that is required as applied to claim 1 above and further teaches wherein the one or more parameters comprise an inherent hierarchy of objects stored in an object database (see Wrenninge, paragraph 0042 teaching “Objects that are defined by scene parameters can be made up of subobjects, which can have properties (e.g., defined by parameters, which can be the same parameters, similar parameters, or different parameters than the parent object) associated therewith. For example, a building object can include a footprint (e.g., defined by a DV parameter that can take on a value of a footprint selected from a set of predetermined building footprints) and a number of floors and windows, each of which can be defined by additional parameters” such that here this parent-child relationship is an inherent hierarchy of objects stored in an object database, and as in paragraph 0061, “Block S200 includes selecting object geometry of each object from a geometry database (e.g., a database of CAD models, geometry gathered from real-world examples of objects in the form of images, etc.) and arranging the objects (e.g., rendered using the selected geometry) within a virtual space based on a pose parameter value (e.g., determined in a variation of Block S100). In another specific example, Block S200 includes placing an object in a 3D scene, wherein the object is an instance of an object class, based on the geometric parameter value (e.g., defining a spacing of the placed object relative to previously placed objects, defining the angular orientation of the object, etc.)” such that here parameters comprise an inherent hierarchy of objects stored in a database in relation to their hierarchy for placement parameters, where objects are placed relative to other objects previously placed in a hierarchical placement).

Regarding claim 22, Wrenninge teaches all that is required as applied to claim 1 above and further teaches wherein the one or more parameters comprise states of defined, allowed, and forbidden (see Wrenninge, paragraph 0051 teaching “Block S100 can optionally include constraining parameter values and determining parameter values based on the constraint(s) (e.g., constrained values). For example, in a case wherein the parameter defines the orientation (e.g., heading) of an object, the orientation can be constrained to align substantially with a direction of traffic flow. In a related example, bicycle objects can be selected from a set of predetermined models (i.e., the bicycle objects types are a discrete variable) and then positioned along a defined roadway (e.g., stochastically by sampling a stochastic variable), but can take on an orientation defined by a continuous variable that is constrained to be no more than 5° from parallel with the traffic direction of the virtually defined roadway” such that these constraints comprise states of defined, allowed, and forbidden, as parameters are defined by being in a possible set, are allowed if within a specific range or value, and certain parameter values may be forbidden if outside the constraint, and as in paragraph 0052, “Block S100 can include constraining parameter values based on extracted parameter ranges. In examples, the method can include extracting (e.g., from a real image dataset, from a database, etc.) a set of parameter ranges (e.g., maximum and minimum parameter values in a real image dataset) and constraining parameter values (e.g., sampled from an LDS) to fall within the extracted parameter range (e.g., by remapping the sampled LDS value to be between the minimum and maximum values of the range). In a specific example, the minimum and maximum camera exposure can be extracted from a real-image dataset, and an augmentation parameter defining a camera exposure (e.g., image brightness and/or contrast) can be constrained to fall between the minimum and maximum camera exposure upon sampling the parameter value from an LDS” where this also provides parameters that comprise constraints which comprise states of defined, allowed, and forbidden).

Regarding claim 23, Wrenninge teaches all that is required as applied to claim 1 above and further teaches wherein the configuration file describes a static or dynamic scene (note that the claim appears to recite the only possible states of a scene, which would be static or dynamic, and furthermore does not limit in what way the configuration file describes that a scene is static or dynamic; see Wrenninge, paragraph 0095 teaching for example “a single static scene (e.g., of particular relevance to an end-user application) can be used to render many image variations, which can be augmented or left un-augmented post-rendering” such that here a single static scene means that the configuration file describes a single static scene).
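The defined/allowed/forbidden distinction in claim 22 maps naturally onto code. A minimal Python sketch in the spirit of Wrenninge's paragraphs 0051-0052, with hypothetical names and values (the 5-degree orientation constraint and exposure-range clamping come from the quoted passages; everything else is assumed):

```python
import random

# Defined: values come from a predetermined discrete set (a DV parameter).
BICYCLE_MODELS = ["model_a", "model_b", "model_c"]

def sample_bicycle_orientation(traffic_heading_deg: float) -> float:
    """Allowed: within 5 degrees of the traffic direction (per paragraph 0051)."""
    return traffic_heading_deg + random.uniform(-5.0, 5.0)

def clamp_exposure(value: float, lo: float, hi: float) -> float:
    """Forbidden outside [lo, hi]: remap into the extracted range (per 0052)."""
    return max(lo, min(hi, value))

model = random.choice(BICYCLE_MODELS)                       # defined set
heading = sample_bicycle_orientation(90.0)                  # allowed band
exposure = clamp_exposure(random.uniform(0.0, 2.0), 0.4, 1.6)  # forbidden clamped
```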
Regarding claim 24, Wrenninge teaches all that is required as applied to claim 1 above and further teaches wherein the configuration file comprises general parameters, objects, materials, illumination, camera properties, object positions, object movements, properties of surroundings, occlusion planes, or time-dependent parameter changes (see Wrenninge, paragraphs 0036-0042 teaching “all aspects of the individual object classes, such as geometry, materials, color, and placement can be parameterized, and a synthesized image and its corresponding annotations (e.g., of each instance of an object class in a virtual scene) represent a sampling of that parameter space (e.g., multidimensional parameter space)” and “Parameters can include any quantifiable property of a virtual scene” such that the configuration file, which is the set of parameters, comprises general parameters, objects, materials, illumination, camera properties, object positions, object movements, and/or properties of surroundings; see the illustrative sketch after claim 27 below).

Regarding claim 27, Wrenninge teaches all that is required as applied to claim 24 above and further teaches wherein the general parameters comprise a description of the scene to be simulated (see Wrenninge, paragraphs 0036-0042 teaching “all aspects of the individual object classes, such as geometry, materials, color, and placement can be parameterized, and a synthesized image and its corresponding annotations (e.g., of each instance of an object class in a virtual scene) represent a sampling of that parameter space (e.g., multidimensional parameter space)” and “Block S100 can include determining an object class, which functions to select a class of object to be included in the parametric scene generation (e.g., in accordance with one or more variations of Block S200). Block S100 can include determining a set of object classes to be depicted in a 3D scene. Determining the set of object classes can include selecting from a predetermined list of possible object classes (e.g., automotive vehicles, pedestrians, light and/or human-powered vehicles, buildings, traffic signage or signals, road surfaces, vegetation, trees etc.). Determining the set of object classes can be based upon received instructions (e.g., user preferences, preferences of an entity requesting generation of a synthetic dataset, etc.), contextual information (e.g., the physical environment of which the synthetic image dataset is intended to be representative), or otherwise suitably determined with any other suitable basis. The set of object classes can be different for each scene, and can include various subsets of the list of possible or available object classes (e.g., wherein a first scene includes only automotive vehicles, and a second scene includes only traffic signage and buildings, etc.)” where this selection of objects is according to a general parameter that dictates a “predetermined list of possible object classes” and, for example, “the object classes defining the list of possible object classes for inclusion can be extracted from a real world dataset (e.g., via an object detection and classification process), such as a real world cityscape dataset; in such examples and related examples, determining the list of object classes can include replicating the list of object classes extracted, selecting a subset from among the extracted object classes (e.g., to increase the likelihood of low-probability object classes of appearing in the synthetic dataset), or otherwise suitably determining the object class or set of object classes for inclusion” such that this corresponds to a general parameter that the object classes are based on particular scenarios).
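For reference, a configuration grouping the categories claim 24 enumerates might look like the following Python sketch. The structure and every key/value are illustrative assumptions; neither Wrenninge nor the application discloses this particular format:

```python
# Hypothetical configuration covering several of claim 24's categories:
# general parameters, objects, materials, illumination, camera properties,
# object positions/movements, and properties of surroundings.
scene_config = {
    "general": {"scene_type": "urban_street", "num_pedestrians": 4},
    "objects": [{"class": "pedestrian", "position_xy": [2.0, 5.0],
                 "movement": {"speed_mps": 1.4, "heading_deg": 90.0}}],
    "materials": {"road": {"albedo": 0.18, "wet": True}},
    "illumination": {"sources": [{"type": "sun", "angle_deg": 35.0,
                                  "magnitude": 0.9}]},
    "camera": {"position_xyz": [0.0, 1.5, -4.0], "exposure": 0.8},
    "surroundings": {"buildings": "procedural", "vegetation_density": 0.3},
}
```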
Regarding claim 28, Wrenninge teaches all that is required as applied to claim 1 above and further teaches wherein further image data records and annotation files associated with the further image data records are created on the basis of variations defined by the random modifications of the configuration file (note that technically the claim refers to “variations defined by the random modifications,” which refers back to the random modifications of the configuration file recited in claim 1; see Wrenninge, paragraphs 0076-0079 teaching “Block S500 can include repeating Blocks S100, S200, S300, and S400 to build up a synthetic dataset of synthetic images. Repetition of the aforementioned Blocks can be performed any suitable number of times, to produce a synthetic image dataset of any suitable size. The predetermined number of iterations can be: selected by a user (e.g., an end user of the dataset), based upon the parameter ranges of one or more parameters (e.g., wherein a larger parameter range can correspond to a larger number of iterations and larger resulting dataset, to prevent sparse sampling)” such that this built-up dataset of many images comprises further data records and annotation files, and as in paragraph 0043 as explained above, “a new set of parameter values is determined (e.g., sampled from an LDS) each time a scene is generated, and then a synthetic image is generated of the newly generated scene (e.g., wherein a camera angle and position is defined by a pair of parameter values that are sampled to maximize the coverage of the parameter space). Alternatively, a single scene is generated, and the parameters defining the viewpoint of a virtual camera (e.g., camera angle and position of the camera) can be changed within the scene to generate a set of synthetic images” which uses varied parameters to generate further image data records when the block is run again, which then leads to creation of the multiple synthesized scenes as in paragraphs 0056-0059 teaching “Block S200 preferably produces, as output, a plurality of synthesized virtual scenes. Scene synthesis in conjunction with the method 100 preferably includes a defined model of the 3D virtual scene (e.g., determined in accordance with one or more variations of Block S100) that contains the geometric description of objects in the scene, a set of materials describing the appearance of the objects, specifications of the light sources in the scene, and a virtual camera model” which are further image data records which contain objects different from the first objects in the form of the new versions of any object or camera object or rendering property defined by the parameters for the scene which have been randomly varied; furthermore, this leads to creation of further image data records in the form of the synthetic images of the varied scenes with varied objects as in paragraph 0066 teaching “Block S300 includes generating a synthetic image of the generated scene, which functions to create a realistic synthetic two-dimensional (2D) representation of objects in the scene, wherein the objects are intrinsically labeled with the parameter values used to generate the scene (e.g., the objects in the scene, object classifications, the layout of the objects, all other parametrized metadata, etc.)”).

Regarding claim 29, Wrenninge teaches all that is required as applied to claim 28 above and further teaches wherein the configuration file contains allowed and forbidden states of objects and illumination, and the variations are defined by these states (see Wrenninge, paragraph 0059 teaching “Block S200 preferably produces, as output, a plurality of synthesized virtual scenes. Scene synthesis in conjunction with the method 100 preferably includes a defined model of the 3D virtual scene (e.g., determined in accordance with one or more variations of Block S100) that contains the geometric description of objects in the scene, a set of materials describing the appearance of the objects, specifications of the light sources in the scene, and a virtual camera model” and “materials describing or defining the appearance of the objects preferably define how light interacts with surfaces and participating media (e.g., air or dust interposed between a virtual camera and surfaces), but can otherwise suitably define object properties. After generation, the scene is then virtually illuminated using a light source (e.g., a single light source, several light sources, etc.), and the composition of the rendered frame is defined by introducing a virtual camera” such that these parameters make up the configuration files and have the parameters or states of objects and illumination which are set in block S100, and as explained in paragraphs 0040-0041, each object, including lighting and camera objects, is defined by parameter values from the set of parameter values, where “Each parameter of the set of parameters can take on a value that is defined by a random variable, which can each be of several types: discretely valued (DV) random variables, stochastically valued (SV) random variables, and continuously valued (CV) random variables. DV parameters are preferably selected from a predetermined set or range (e.g., a set of discrete numerical values, a set of predetermined 3D model descriptions, etc.), but can be otherwise suitably determined. In a specific example, a DV parameter can be a vehicle object parameter, which can take on one of 30 predetermined values (e.g., 3D models of various vehicle types), and the parameter value can be sampled from the set of 30 3D models in order to maximize the coverage of the set within the generated scene. CV parameters are preferably determined from either a discrete or continuous distribution of values based on a randomized seed (e.g., computed as a pseudorandom number), but can be manually selected, automatically extracted from a map or image (e.g., of a real-world scene, of a physical location, of a manually or automatically detected edge case), or otherwise suitably determined. CV parameters are preferably computed from a continuous functional distribution (e.g., a single valued function), but can be otherwise suitably determined. In one example, DV parameters are determined for a set of candidate objects (e.g., pedestrians, roads, light source, etc.); and CV parameter values and SV parameter values are determined (e.g., selected, specified, received, randomly generated, etc.) for each candidate object to be included in the scene (e.g., based on a set of potential values associated with each candidate object). In a specific example, the discrete variables include: an overall number of pedestrians in the scene, a wet road surface, and light; the continuous variables include: the camera location (e.g., x/y/z), each pedestrian's height and pose, and the angle and magnitude of the light (e.g., into the camera); and the stochastic variables include: the pedestrian locations within the scene (e.g., wherein the stochastic variable can be changed between images to produce scenes with the desired parametric properties but with unique combinations)” and setting of such a range for a parameter means that the variation of the parameters is defined by allowed states within a range and forbidden states which are outside the range, and, as in paragraphs 0047-0054, these parameters are selected by an LDS such that the variations of the parameters selected lead to creation of each synthetic scene and image).

Regarding claim 30, Wrenninge teaches all that is required as applied to claim 28 above and further teaches wherein the variations relate to at least one light source (see Wrenninge, paragraph 0041 teaching that each object, including lighting and camera objects, is defined by parameter values from the set of parameter values, where “Each parameter of the set of parameters can take on a value that is defined by a random variable, which can each be of several types: discretely valued (DV) random variables, stochastically valued (SV) random variables, and continuously valued (CV) random variables. DV parameters are preferably selected from a predetermined set or range (e.g., a set of discrete numerical values, a set of predetermined 3D model descriptions, etc.), but can be otherwise suitably determined. In a specific example, a DV parameter can be a vehicle object parameter, which can take on one of 30 predetermined values (e.g., 3D models of various vehicle types), and the parameter value can be sampled from the set of 30 3D models in order to maximize the coverage of the set within the generated scene. CV parameters are preferably determined from either a discrete or continuous distribution of values based on a randomized seed (e.g., computed as a pseudorandom number), but can be manually selected, automatically extracted from a map or image (e.g., of a real-world scene, of a physical location, of a manually or automatically detected edge case), or otherwise suitably determined. CV parameters are preferably computed from a continuous functional distribution (e.g., a single valued function), but can be otherwise suitably determined. In one example, DV parameters are determined for a set of candidate objects (e.g., pedestrians, roads, light source, etc.); and CV parameter values and SV parameter values are determined (e.g., selected, specified, received, randomly generated, etc.) for each candidate object to be included in the scene (e.g., based on a set of potential values associated with each candidate object). In a specific example, the discrete variables include: an overall number of pedestrians in the scene, a wet road surface, and light; the continuous variables include: the camera location (e.g., x/y/z), each pedestrian's height and pose, and the angle and magnitude of the light (e.g., into the camera); and the stochastic variables include: the pedestrian locations within the scene (e.g., wherein the stochastic variable can be changed between images to produce scenes with the desired parametric properties but with unique combinations)” where “light” and “angle and magnitude of the light” are parameters that can be varied relating to at least one light source).

Regarding claim 31, Wrenninge teaches all that is required as applied to claim 1 above and further teaches wherein the image data records created and the annotation files associated therewith are stored in categorized fashion (note that data “stored in categorized fashion” is extremely broad, as “categorized” would refer to any data that has been stored with respect to a particular class or group; thus see Wrenninge, paragraphs 0076-0079 teaching that running the processing blocks numerous times outputs image data records created and annotation files associated therewith, where “Block S500 includes generating a synthetic image dataset. Block S500 functions to output a synthetic image dataset, made up of intrinsically labelled images, as a result of the procedural generation, rendering, and/or augmentation Blocks of the method, for use in downstream applications (e.g., model training, model evaluation, image capture methodology validation, etc.). Block S500 can thus also include combining a plurality of images into dataset (e.g., made up of the plurality of images)” such that this combining into a dataset for use by another program downstream is storage in a categorized fashion, as all have been placed into the same class or group dataset to be used by another process which will take the categorized data and use it downstream).
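Claims 28-30 turn on Wrenninge's low discrepancy sequence (LDS) sampling of parameter values over their allowable ranges (paragraph 0047). A minimal, self-contained Python sketch of one common LDS, the van der Corput sequence, mapped onto a parameter range; the parameter names and ranges are assumptions for illustration:

```python
def van_der_corput(n: int, base: int = 2) -> float:
    """n-th element of the van der Corput low-discrepancy sequence in [0, 1)."""
    value, denom = 0.0, 1.0
    while n > 0:
        n, remainder = divmod(n, base)
        denom *= base
        value += remainder / denom
    return value

def sample_parameter(i: int, lo: float, hi: float, base: int = 2) -> float:
    """Map the i-th LDS value onto a parameter's allowable range [lo, hi]."""
    return lo + (hi - lo) * van_der_corput(i, base)

# Vary light angle and camera height across images 1..5 with even coverage
# (different bases keep the two dimensions from correlating):
for i in range(1, 6):
    light_angle = sample_parameter(i, 0.0, 90.0, base=2)   # claim 30: light source
    camera_height = sample_parameter(i, 1.2, 2.0, base=3)
```

Unlike independent uniform draws, successive LDS samples fill the range evenly, which matches the reference's stated goal of maximizing coverage of the parameter space across the generated dataset.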
Regarding claim 32, Wrenninge teaches all that is required as applied to claim 1 above and further teaches wherein the configuration file, at least in part, contains pieces of information based on a default (note that a file containing information based on a default is extremely broad as this could mean information has been set to some default state (default meaning some element that is automatically selected without user intervention, such as by being pre-assigned, pre-set, or otherwise automatically set), or it could mean the pieces of information in the configuration file are themselves based on a default, or it could mean that the system detects a default in some manner and the information is based on such a default; see Wrenninge, paragraph 0041 teaching “Preferably, the parameters are related to driving-relevant environments (e.g., roadways and surrounding objects and scenery), but can additionally or alternatively be related to any suitable virtual environment (e.g., airplane landing and takeoff corridors, orbital trajectories, pedestrian routes, mixed use environments, etc.). Each parameter of the set of parameters can take on a value that is defined by a random variable, which can each be of several types: discretely valued (DV) random variables, stochastically valued (SV) random variables, and continuously valued (CV) random variables. DV parameters are preferably selected from a predetermined set or range (e.g., a set of discrete numerical values, a set of predetermined 3D model descriptions, etc.), but can be otherwise suitably determined. In a specific example, a DV parameter can be a vehicle object parameter, which can take on one of 30 predetermined values (e.g., 3D models of various vehicle types), and the parameter value can be sampled from the set of 30 3D models in order to maximize the coverage of the set within the generated scene” such that here, for example, objects which are selected and defined to be in the scene with particular object parameters are based on a default in the form of default objects and object parameters which are pre-existing, such that the file containing these objects and parameters is based on the default libraries for a particular environment, for example).

Regarding claim 33, Wrenninge teaches a method for creating an application specific, artificial intelligence-based object recognition, comprising the training of a prediction algorithm (see Wrenninge, paragraphs 0033-0034 and figures 1A-1B teaching “the method 100 can include: determining a set of parameter values associated with at least one of a set of geometric parameters, a set of rendering parameters, and a set of augmentation parameters S100; generating a three dimensional (3D) scene based on the set of geometric parameters S200; rendering a synthetic image of the scene based on the set of rendering parameters S300; augmenting the synthetic image based on the set of augmentation parameters S400; and, generating a synthetic image dataset S500.
The method can additionally or alternatively include: training a model based on the synthetic image dataset S600” where S600 corresponds to training a prediction algorithm as part of a method for creating an application specific, artificial intelligence-based object recognition, as for example “method 100 can also function to generate a synthetic dataset for provision to a third party (e.g., a developer of machine image classification models, a tester of computer vision models, a creator of vehicle control models based on image analysis, etc.)” where image classification models and computer vision models which are trained on data are AI based object recognition models as further described in paragraphs 0080-0082 teaching “Block S600 functions to train a learning model (e.g., an ML model, a synthetic neural network, a computational network, etc.) using supervised learning, based on the intrinsically labeled synthetic image dataset. Block S600 can function to modify a model to recognize objects (e.g., classify objects, detect objects, etc.) depicted in images with an improved accuracy (e.g., as compared to an initial accuracy, a threshold accuracy, a baseline accuracy, etc.)” and as in paragraph 0083 the model is trained for “various tasks (e.g., object classification, object detection, etc.) using the synthetic dataset as an input”) with training data synthesized according to claim 1 (note that the claim requires all of the particulars of claim 1 as well as the limitations above; see the rejection of claim 1 above, which teaches such data synthesized according to claim 1 and which is used for such creation as explained above).

Regarding claim 34, Wrenninge teaches a method for recognizing objects in a specific application, wherein a prediction algorithm trained with training data synthesized according to claim 1 is applied to at least one digital image and wherein the at least one image is an image of a scene which forms the basis of the training data created (note that the claim requires all of the particulars of claim 1 such that the training data used is generated according to the limitations of claim 1, thus for such limitations, see the rejection of claim 1 above which addresses training data synthesized according to claim 1; then see further paragraph 0033 teaching “the method 100 can include: determining a set of parameter values associated with at least one of a set of geometric parameters, a set of rendering parameters, and a set of augmentation parameters S100; generating a three dimensional (3D) scene based on the set of geometric parameters S200; rendering a synthetic image of the scene based on the set of rendering parameters S300; augmenting the synthetic image based on the set of augmentation parameters S400; and, generating a synthetic image dataset S500. The method can additionally or alternatively include: training a model based on the synthetic image dataset S600; evaluating a trained model based on the synthetic image dataset S700” where such evaluating corresponds to recognizing objects in a specific application corresponding to that dataset and that uses a prediction algorithm trained with the synthetic training data generated according to claim 1 above, and as the model is an object recognition model that is applied to digital images, such evaluation and use of the model corresponds to application of it to at least one digital image, as for example evaluation in paragraphs 0083-0087 comprises “evaluating a model based on the synthetic image dataset.
Block S700 can include generating a performance metric that quantifies the performance of the model at various tasks (e.g., object classification, object detection, etc.) using the synthetic dataset as an input” where as in paragraph 0080 it is a “model to recognize objects (e.g., classify objects, detect objects, etc.) depicted in images” such that here, when using the trained model, this applies the model to whatever input image is provided for object detection, where the image with objects being recognized during inference is an image of a scene, and where during testing it may be “using the synthetic dataset as an input” such that these form the basis of the training data created).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claim(s) 17-20 and 25-26 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wrenninge in view of Wolf et al.2 (“Wolf”).

Regarding claim 17, Wrenninge teaches all that is required as applied to claim 1 above but fails to specifically teach wherein the first image data record and the second image data record include a hospital room. Note that such image data records that “include a hospital room” are defined without limit as to how the records “include a hospital room”; for example, such records could include a hospital room if they contain any reference to a hospital room or describe a hospital room in any manner. Wrenninge teaches that the first and second image data records correspond to the models and synthetic images created from parameters in Block S100, where these parameters describe the objects and object parameters that define everything visible in the scene and the surrounding environment, and the parameters “can additionally or alternatively be related to any suitable virtual environment (e.g., airplane landing and takeoff corridors, orbital trajectories, pedestrian routes, mixed use environments, etc.)” (see Wrenninge, paragraph 0041).
Wrenninge teaches that “objects” and “object classes” “include any suitable three-dimensional objects that can be arranged within a virtual world” and “object classes defining the list of possible object classes for inclusion can be extracted from a real world dataset” (see Wrenninge, paragraph 0038) and that objects and parameters “can be…automatically extracted from a map or image (e.g., of a real-world scene, of a physical location” (see Wrenninge, paragraph 0041), such that Wrenninge teaches to one of ordinary skill in the art that any conceivable environment containing any type of content can be generated, which would include generating first and second images of a hospital scene defined by objects. However, Wrenninge is technically silent with respect to the system generating training data where the first and second image data records explicitly include a hospital room, or that the predetermined surroundings are of an operating room. Thus Wrenninge stands as a base device upon which the claimed invention can be seen as an improvement insofar as the claimed system would contain an improvement over the ability of Wrenninge to specifically provide hospital/operating room data as the basis for training the AI-based object recognition.

In the same field of endeavor relating to providing images of specific environments as training examples for training artificial intelligence-based object detection and recognition in images of medical and/or clinical workflows, Wolf teaches that it is known to provide image data of hospital or operating rooms and images of medical and/or clinical workflow to train an AI object recognition model in order to use such an object recognition model to recognize objects in images of medical and/or clinical workflow, and where multiple images are of a hospital or operating room setting such that the images provided for training include a hospital or operating room and are of predetermined surroundings of such hospital or operating room (see Wolf, paragraph 0079 teaching “machine learning algorithms (also referred to as machine learning models in the present disclosure) may be trained using training examples, for example in the cases described below. Some non-limiting examples of such machine learning algorithms may include classification algorithms, data regressions algorithms, image segmentation algorithms, visual detection algorithms (such as object detectors, face detectors, person detectors, motion detectors, edge detectors, etc.), visual recognition algorithms (such as face recognition, person recognition, object recognition, etc.)” and paragraph 0083 teaching “analyzing image data (for example, by the methods, steps and modules described herein) may comprise analyzing the image data and/or the preprocessed image data using one or more rules, functions, procedures, artificial neural networks, object detection algorithms” and “examples of such inference models may include: … a result of training algorithms, such as machine learning algorithms and/or deep learning algorithms, on training examples, where the training examples may include examples of data instances, and in some cases, a data instance may be labeled with a corresponding desired label and/or result; and so forth” and as in paragraphs 0085-0086 it is shown that the application of the techniques is with regard to an operating room setting, which can be considered a hospital room as well, where “FIG. 1 shows an example operating room 101, consistent with disclosed embodiments” and “some of the cameras (e.g., cameras 115, 123 and 125) may capture video/image data of operating table 141 (e.g., the cameras may capture the video/image data at a location 127 of a body of patient 143 on which a surgical procedure is performed), camera 121 may capture video/image data of other parts of operating room 101. For instance, camera 121 may capture video/image data of a surgeon 131 performing the surgery. In some cases, cameras may capture video/image data associated with surgical team personnel, such as an anesthesiologist, nurses, surgical tech and the like located in operating room 101. Additionally, operating room cameras may capture video/image data associated with medical equipment located in the room” such that what is to be analyzed is a hospital and/or operating room setting of medical and/or clinical workflow, and see paragraphs 0188-0189 teaching multiple instances of training examples based on video clips of a medical or clinical workflow in a hospital room or operating room surroundings such as “a machine learning model may be trained using training examples to determine event types from videos, and the trained machine learning model may be used to analyze the video footage and determine the event type. An example of such training example may include a video clip depicting an event together with a label indicating the event type. In an additional example, the event characteristic may include information related to a medical instrument involved in the event (such as type of medical instrument, usage of the medical instrument, etc.), a machine learning model may be trained using training examples to identify such information related to medical instruments from videos, and the trained machine learning model may be used to analyze the video footage and determine the information related to a medical instrument involved in the event. An example of such training example may include video clip depicting an event including a usage of a medical instrument, together with a label indicative of the information related to the medical instrument”).

Thus Wolf teaches that it is known and desirable to utilize multiple image data records of a hospital room when generating training images for artificial intelligence-based object recognition in images of medical and/or clinical workflows, and Wolf provides a known technique for generating such application specific AI-based object recognition in hospital and/or operating room environments. Therefore it would have been obvious for one of ordinary skill in the art before the effective filing date to modify Wrenninge with the known techniques of Wolf, as doing so would be no more than application of a known technique to a base device ready for improvement which would yield predictable results and result in an improved system. Here the predictable result of the application of Wolf’s technique to Wrenninge would be that the virtual environment modeled and parameterized and defined as in Wrenninge would be a virtual environment related to a hospital room or including the surroundings of an operating room in a hospital, such that the synthesized model of Wrenninge as so modified would generate a model of the workflow in a medical and/or clinical setting and would then vary such parameters and output annotated image data for training as in Wrenninge.
As explained above, Wrenninge already teaches that any environment, including real environments with people and objects, can be simulated and used as the basis for training data, and a hospital or operating room environment is such an environment. This would result in an improved system, as Wrenninge as modified would now be capable of providing training data and training AI models based on such specific data, which would result in improved object detection models for that domain.

Regarding claim 18, Wrenninge as modified teaches all that is required as applied to claim 17 above and further teaches wherein the first object is a camera, the one or more parameters defining a perspective of the camera (see Wrenninge, paragraphs 0056-0066 teaching “Block S200 includes generating a 3D scene based on parameter values (e.g., determined in accordance with one or more variations of Block S100). Block S200 functions to procedurally create a three dimensional virtual representation of an environment (e.g., a scene) based on the parameter values that control the implementation of the procedure (e.g., geometric parameter values, parameter values defining which object classes will be populated into a scene)” such that when generating a 3D scene based on the particular values passed in any iteration to block S200, this executes the command file and configuration file to generate the scene according to the parameters in order to create a first image data record in the form of the 3D scene which can be captured, and does so according to the various objects such as “objects in the scene” and “a virtual camera model” object as in “Scene synthesis in conjunction with the method 100 preferably includes a defined model of the 3D virtual scene (e.g., determined in accordance with one or more variations of Block S100) that contains the geometric description of objects in the scene, a set of materials describing the appearance of the objects, specifications of the light sources in the scene, and a virtual camera model”; then, additionally, “Block S300 includes generating a synthetic image of the generated scene, which functions to create a realistic synthetic two-dimensional (2D) representation of objects in the scene, wherein the objects are intrinsically labeled with the parameter values used to generate the scene (e.g., the objects in the scene, object classifications, the layout of the objects, all other parametrized metadata, etc.)” where this generation of the synthetic image of the scene also corresponds to creation of a first image data record, either alone or in combination with step S200; in both cases the image data records include a first object defined by the one or more object parameters, as the virtual scene generated according to the object parameters includes the procedurally generated 3D virtual representations of objects that are to be captured by virtual camera objects that are defined, and the synthetic images capturing the 3D virtual representation from the perspective of the virtual cameras include a first object defined by one or more object parameters as “Block S300 can also function to produce realistic synthetic images with pixel-perfect ground truth annotations and/or labels (e.g., of what each object should be classified as, to any suitable level of subclassification and/or including any suitable geometric parameter, such as pose, heading, position, orientation, etc.).
Block S300 is preferably performed based on a set of rendering parameters (e.g., determined in accordance with an instance of a variation of Block S100), but can additionally or alternatively be performed based on a fixed set of rendering rules or procedures. Block S300 is preferably performed by a virtual camera, wherein the viewpoint of the virtual camera is determined as a parameter value in a variation of Block S100 (e.g., as a rendering parameter). Generating the synthetic image preferably includes generating a projection of the 3D scene onto a 2D virtual image plane at a predetermined location in the 3D scene (e.g., at the virtual camera location)” and as in paragraphs 0120-0123 the camera can be included in such labeled/annotated data where “metadata associated with each image is stored in a meta subdirectory, with a single JSON file corresponding to each RGB image. In some implementations, three types of metadata are provided: scene metadata, which describes the properties of the scene as a whole; camera/sensor metadata, describing the intrinsic and extrinsic characteristics of the sensor” and “the camera/sensor metadata describes attributes of the intrinsic and extrinsic behavior of each camera/sensor. For example, the camera metadata may include extrinsic camera metadata and intrinsic camera metadata”).

Regarding claim 19, Wrenninge teaches all that is required as applied to claim 18 above and further teaches wherein the second object is the camera, the one or more parameters defining a different perspective of the camera than the first object (note that the claim requires that the second object is the camera after claim 18 required that “the first object is a camera”, which means that the second camera must also be considered to be the first object which is the camera, with the one or more parameters defining a different perspective; this is seen as making the second camera the first camera, and would essentially include a case in which the first camera is moved and is in one sense still the first camera, but can be considered a second camera as it has moved and functions as a second camera in that instance; see Wrenninge, paragraphs 0056-0059 teaching “Block S200 preferably produces, as output, a plurality of synthesized virtual scenes.
Scene synthesis in conjunction with the method 100 preferably includes a defined model of the 3D virtual scene (e.g., determined in accordance with one or more variations of Block S100) that contains the geometric description of objects in the scene, a set of materials describing the appearance of the objects, specifications of the light sources in the scene, and a virtual camera model” which are second image data records which contain objects different from the first objects in the form of the new versions of any object or camera object or rendering property defined by the parameters for the scene which have been randomly varied; furthermore this leads to creation of second image data records in the form of the synthetic images of the varied scenes with varied objects as in paragraph 0066 teaching “Block S300 includes generating a synthetic image of the generated scene, which functions to create a realistic synthetic two-dimensional (2D) representation of objects in the scene, wherein the objects are intrinsically labeled with the parameter values used to generate the scene (e.g., the objects in the scene, object classifications, the layout of the objects, all other parametrized metadata, etc.)” and for example this may include a second object different from a first object such as a first camera object different from a second camera object defined with different parameters where “Block S300 is preferably performed by a virtual camera, wherein the viewpoint of the virtual camera is determined as a parameter value in a variation of Block S100 (e.g., as a rendering parameter). Generating the synthetic image preferably includes generating a projection of the 3D scene onto a 2D virtual image plane at a predetermined location in the 3D scene (e.g., at the virtual camera location)” and as established above the camera object can have its properties varied like any other object to change it to a different version of the object).

Regarding claim 20, Wrenninge as modified teaches all that is required as applied to claim 18 above and further teaches wherein the first image data record includes a two-dimensional individual image, the individual image taken from a viewing angle of the camera (see Wrenninge, paragraphs 0066-0070 teaching “Block S300 includes generating a synthetic image of the generated scene, which functions to create a realistic synthetic two-dimensional (2D) representation of objects in the scene, wherein the objects are intrinsically labeled with the parameter values used to generate the scene (e.g., the objects in the scene, object classifications, the layout of the objects, all other parametrized metadata, etc.)” and “output of Block S300 preferably includes a two dimensional synthetic image that realistically depicts a realistic 3D scene”).
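The camera-perspective reading in the rejections of claims 18 through 20 reduces to a standard pinhole projection: camera parameters (position and viewing direction) define the perspective, and rendering projects the 3D scene onto a 2D image plane at the camera location. A minimal sketch follows, assuming NumPy; the look-at construction and the intrinsics (focal length, principal point) are illustrative choices, not values taken from the reference.

```python
import numpy as np

def look_at(cam_pos, target, up=(0.0, 1.0, 0.0)):
    """Build a world-to-camera rotation from a camera position and a view
    target; the camera pose is the perspective-defining parameter here."""
    f = np.asarray(target, float) - np.asarray(cam_pos, float)
    f /= np.linalg.norm(f)                      # forward (toward the target)
    r = np.cross(f, np.asarray(up, float))
    r /= np.linalg.norm(r)                      # camera right
    u = np.cross(r, f)                          # camera up
    return np.stack([r, u, -f])                 # rows: right, up, -forward

def project(points_xyz, cam_pos, target, focal_px=800.0, cx=640.0, cy=360.0):
    """Project 3D scene points onto a 2D virtual image plane located at the
    camera position (pinhole model); returns pixel coordinates for the
    points in front of the camera."""
    R = look_at(cam_pos, target)
    pc = (np.asarray(points_xyz, float) - np.asarray(cam_pos, float)) @ R.T
    z = -pc[:, 2]                               # camera looks down its -z axis
    in_front = z > 1e-6
    x = focal_px * pc[in_front, 0] / z[in_front] + cx
    y = -focal_px * pc[in_front, 1] / z[in_front] + cy
    return np.stack([x, y], axis=1)

# The same scene rendered from two camera parameterizations, i.e. a first
# and a second perspective as in claims 18 and 19:
scene = [(0.0, 0.0, 5.0), (1.0, 1.0, 6.0), (-1.0, 0.5, 4.0)]
print(project(scene, cam_pos=(0.0, 1.0, 0.0), target=(0.0, 0.0, 5.0)))
print(project(scene, cam_pos=(3.0, 1.0, 0.0), target=(0.0, 0.0, 5.0)))
```

Varying only `cam_pos` between the two calls is the code-level analogue of the examiner's reading that a moved camera is still "the camera" but with parameters defining a different perspective.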
Regarding claim 25, as rendered definite as explained above, Wrenninge teaches all that is required as applied to claim 1 above and further teaches wherein the scene described in the configuration file is simulated in predetermined surroundings, the predetermined surroundings including a description of an operating room (see Wrenninge, paragraphs 0041-0042 teaching for example “parameters are related to driving-relevant environments (e.g., roadways and surrounding objects and scenery), but can additionally or alternatively be related to any suitable virtual environment” and “Examples of parameters for which values are determined in Block S100 include: roadway numerosity and connectivity, roadway spacing (e.g., space between intersections, width of roadways, length of roadways, lane spacing on roadways, number of lanes, road width, etc.), ground and/or roadway surface properties (e.g., roughness, reflectivity, traffic markings, repair markings, material, moisture, etc.), sidewalk properties (e.g., presence or absence of sidewalks, curb height, width, surface properties, color, material, dirt amount, etc.), surrounding object properties (e.g., presence or absence of buildings, pedestrians, fences, cars, vegetation, etc.) and properties associated therewith (e.g., height, width, window height, depth, material, numerosity, color, surface characteristics, orientation, position, geometric model, type, count, etc.), objects on the roadway (e.g., motor vehicles, human-powered vehicles, sedans, vans, trucks, etc.) and properties associated therewith (e.g., numerosity, color, surface characteristics, orientation, position, etc.), lighting parameters (e.g., practical lighting such as traffic lights or street lights, atmospheric lighting, longitude and latitude dictating sun positions, direction, angle, intensity, color, cloud cover, etc.), or any other suitable parameter” where all of such parameters describe the surroundings of the scene, as for example if there are multiple objects or they are set in any background then there are predetermined surroundings). Wrenninge teaches all of the above but fails to specifically teach that the surroundings include a description of an operating room, as instead it is taught more generally that the system can produce a model of any environment that is able to be modeled in 3D, as explained above. Thus Wrenninge stands as a base device upon which the claimed invention can be seen as an improvement insofar as the claimed system would contain an improvement over the ability of Wrenninge to specifically provide hospital/operating room data as the basis for training the AI-based object recognition. As explained above, Wrenninge teaches that “objects” and “object classes” “include any suitable three-dimensional objects that can be arranged within a virtual world” and “object classes defining the list of possible object classes for inclusion can be extracted from a real world dataset” (see Wrenninge, paragraph 0038) and that objects and parameters “can be…automatically extracted from a map or image (e.g., of a real-world scene, of a physical location” (see Wrenninge, paragraph 0041), such that Wrenninge teaches to one of ordinary skill in the art that any conceivable environment containing any type of content can be generated, which would include generating first and second images of a hospital scene defined by objects.
However, Wrenninge is technically silent with respect to the system generating training data where the predetermined surroundings include a hospital room, or that the predetermined surroundings are of an operating room. In the same field of endeavor relating to providing images of specific environments as training examples for training artificial intelligence-based object detection and recognition in images of medical and/or clinical workflows, Wolf teaches that it is known to provide image data of hospital or operating rooms and images of medical and/or clinical workflow to train an AI object recognition model in order to use such an object recognition model to recognize objects in images of medical and/or clinical workflow, and where multiple images are of a hospital or operating room setting such that the images provided for training include a hospital or operating room and are of predetermined surroundings of such hospital or operating room (see Wolf, paragraph 0079 teaching “machine learning algorithms (also referred to as machine learning models in the present disclosure) may be trained using training examples, for example in the cases described below. Some non-limiting examples of such machine learning algorithms may include classification algorithms, data regressions algorithms, image segmentation algorithms, visual detection algorithms (such as object detectors, face detectors, person detectors, motion detectors, edge detectors, etc.), visual recognition algorithms (such as face recognition, person recognition, object recognition, etc.)” and paragraph 0083 teaching “analyzing image data (for example, by the methods, steps and modules described herein) may comprise analyzing the image data and/or the preprocessed image data using one or more rules, functions, procedures, artificial neural networks, object detection algorithms” and “examples of such inference models may include: … a result of training algorithms, such as machine learning algorithms and/or deep learning algorithms, on training examples, where the training examples may include examples of data instances, and in some cases, a data instance may be labeled with a corresponding desired label and/or result; and so forth” and as in paragraphs 0085-0086 it is shown that the application of the techniques is with regard to an operating room setting, which can be considered a hospital room as well, where “FIG. 1 shows an example operating room 101, consistent with disclosed embodiments” and “some of the cameras (e.g., cameras 115, 123 and 125) may capture video/image data of operating table 141 (e.g., the cameras may capture the video/image data at a location 127 of a body of patient 143 on which a surgical procedure is performed), camera 121 may capture video/image data of other parts of operating room 101. For instance, camera 121 may capture video/image data of a surgeon 131 performing the surgery. In some cases, cameras may capture video/image data associated with surgical team personnel, such as an anesthesiologist, nurses, surgical tech and the like located in operating room 101. Additionally, operating room cameras may capture video/image data associated with medical equipment located in the room” such that what is to be analyzed is a hospital and/or operating room setting of medical and/or clinical workflow, and see paragraphs 0188-0189 teaching multiple instances of training examples based on video clips of a medical or clinical workflow in a hospital room or operating room surroundings such as “a machine learning model may be trained using training examples to determine event types from videos, and the trained machine learning model may be used to analyze the video footage and determine the event type. An example of such training example may include a video clip depicting an event together with a label indicating the event type. In an additional example, the event characteristic may include information related to a medical instrument involved in the event (such as type of medical instrument, usage of the medical instrument, etc.), a machine learning model may be trained using training examples to identify such information related to medical instruments from videos, and the trained machine learning model may be used to analyze the video footage and determine the information related to a medical instrument involved in the event. An example of such training example may include video clip depicting an event including a usage of a medical instrument, together with a label indicative of the information related to the medical instrument”).

Thus Wolf provides a known technique for generating application specific AI-based object recognition in hospital and/or operating room environments. Therefore it would have been obvious for one of ordinary skill in the art before the effective filing date to modify Wrenninge with the known techniques of Wolf, as doing so would be no more than application of a known technique to a base device ready for improvement which would yield predictable results and result in an improved system. Here the predictable result of the application of Wolf’s technique to Wrenninge would be that the virtual environment modeled and parameterized and defined as in Wrenninge would be a virtual environment related to a hospital room or including the surroundings of an operating room in a hospital, such that the synthesized model of Wrenninge as so modified would generate a model of the workflow in a medical and/or clinical setting and would then vary such parameters and output annotated image data for training as in Wrenninge. As explained above, Wrenninge already teaches that any environment, including real environments with people and objects, can be simulated and used as the basis for training data, and a hospital or operating room environment is such an environment. This would result in an improved system, as Wrenninge as modified would now be capable of providing training data and training AI models based on such specific data, which would result in improved object detection models for that domain.
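As for claim 25's configuration-file language, the kind of file at issue might look like the hypothetical sketch below: a configuration that names predetermined surroundings (here an operating room) and gives ranges whose interiors are the allowed states and whose exteriors are forbidden, in the sense discussed above for parameter ranges. All field names and values are invented for illustration; none come from the application or the references.

```python
import json
import random

# Hypothetical configuration describing a scene to be simulated in
# predetermined surroundings; field names are illustrative only.
CONFIG = json.loads("""
{
  "surroundings": "operating_room",
  "objects": [
    {"class": "operating_table", "count": 1},
    {"class": "surgeon",         "count": {"min": 1, "max": 3}},
    {"class": "medical_cart",    "count": {"min": 0, "max": 2}}
  ],
  "camera": {"height_m": {"min": 2.0, "max": 3.0}},
  "light":  {"intensity": {"min": 0.2, "max": 1.0}}
}
""")

ALLOWED_SURROUNDINGS = {"operating_room", "hospital_ward", "roadway"}

def instantiate(config: dict, rng: random.Random) -> dict:
    """Resolve ranged parameters into one concrete scene description.
    Values are drawn only from inside the configured ranges, so states
    outside a range are forbidden by construction."""
    if config["surroundings"] not in ALLOWED_SURROUNDINGS:
        raise ValueError("surroundings outside the predetermined set")

    def draw(spec):
        if isinstance(spec, dict):  # a {min, max} range
            lo, hi = spec["min"], spec["max"]
            return rng.randint(lo, hi) if isinstance(lo, int) else rng.uniform(lo, hi)
        return spec                 # a fixed value

    return {
        "surroundings": config["surroundings"],
        "objects": [{"class": o["class"], "count": draw(o["count"])}
                    for o in config["objects"]],
        "camera_height_m": draw(config["camera"]["height_m"]),
        "light_intensity": draw(config["light"]["intensity"]),
    }

print(instantiate(CONFIG, random.Random(0)))
```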
Regarding claim 26, Wrenninge as modified teaches all that is required as applied to claim 25 above and further teaches wherein the description of the operating room is captured by way of a scanning device or comprises digital images of the room (see Wrenninge as modified, where Wrenninge already teaches in the combination that a scanning device may be the basis for the object parameters of the surroundings, as in paragraph 0041 teaching objects and parameters “can be…automatically extracted from a map or image (e.g., of a real-world scene, of a physical location”; furthermore the description of the operating room in Wrenninge is based on models which are based on the images of the operating room environment in Wolf, such that when modeling the environment imaged in Wolf according to the Wrenninge technique, the description of the operating room, which is the predetermined surroundings of an operating room environment, would be captured by way of a scanning device, and as the surroundings would digitally depict the room being modeled, the description would comprise digital images of the room as well).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See Nasrollahi (US PGPUB No. 20230081908), teaching to generate synthetic images from synthesized models based on object parameters that are defined and can be varied to generate training data for machine learning/artificial intelligence based object detection models (see paragraphs 0003-0010 teaching “video analytics software modules can detect and identify activities or behaviour” and “example would be a video analytics module used to analyse video data in a hospital or care home environment that can identify a patient in distress, for example someone falling over. Another example would be a video analytics module used for traffic monitoring which can detect illegal traffic manoeuvres or traffic accidents” and see paragraphs 0011-0014 teaching “a method of training a machine learning algorithm to identify objects or activities in video surveillance data. The method includes generating a 3D simulation of a real environment from video surveillance data captured by at least one video surveillance camera installed in the real environment, synthesizing objects or activities within the simulated 3D environment, and using the synthesized objects or activities within the simulated 3D environment as training data to train the machine learning algorithm to identify objects or activities, wherein the synthesized objects or activities within the simulated 3D environment used as training data are all viewed from the same viewpoint in the simulated 3D environment” and paragraphs 0020-0050).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SCOTT E SONNERS whose telephone number is (571)270-7504. The examiner can normally be reached Mon-Friday 9-5. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Xiao Wu, can be reached at (571) 272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SCOTT E SONNERS/
Examiner, Art Unit 2613

/XIAO M WU/
Supervisory Patent Examiner, Art Unit 2613

1 US PGPUB No. 20220237410
2 US PGPUB No. 20200237452
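As one last illustration before the prosecution data below, the rejections of claims 33 and 34 lean on Wrenninge's Blocks S600 and S700: train a prediction model on the intrinsically labeled synthetic dataset, then evaluate it on held-out synthetic images. The deliberately tiny sketch below assumes scikit-learn and reuses the toy dataset layout from the earlier sketch; the features and the label are placeholders, where a real system would train an image model on the rendered pixels.

```python
import json
from pathlib import Path

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def load_dataset(dataset_dir: Path):
    """Read the meta/*.json annotations written during generation and turn
    each record into a (feature vector, label) pair."""
    X, y = [], []
    for meta_file in sorted((dataset_dir / "meta").glob("*.json")):
        params = json.loads(meta_file.read_text())
        X.append([params["light_magnitude"], params["light_angle_deg"]])
        # Deliberately trivial label: flag low-light scenes, derived from the
        # same intrinsic annotations the generator wrote out.
        y.append(int(params["light_magnitude"] < 0.4))
    return np.array(X), np.array(y)

if __name__ == "__main__":
    X, y = load_dataset(Path("synthetic_dataset"))
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    model = LogisticRegression().fit(X_tr, y_tr)           # Block S600 analogue
    print("held-out accuracy:", model.score(X_te, y_te))   # Block S700 analogue
```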

Prosecution Timeline

May 08, 2024: Application Filed
Aug 02, 2024: Response after Non-Final Action
Mar 19, 2026: Non-Final Rejection (§102, §103, §112) (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12561816: MOTION CAPTURE USING CONCAVE REFLECTOR STRUCTURES (granted Feb 24, 2026; 2y 5m to grant)
Patent 12561845: DISTORTION INFORMATION FOR EACH ITERATION OF VERTICES RECONSTRUCTION (granted Feb 24, 2026; 2y 5m to grant)
Patent 12524957: METHOD OF GENERATING THREE-DIMENSIONAL MODEL AND DATA PROCESSING DEVICE PERFORMING THE SAME (granted Jan 13, 2026; 2y 5m to grant)
Patent 12518408: VIDEO-BASED TRACKING SYSTEMS AND METHODS (granted Jan 06, 2026; 2y 5m to grant)
Patent 12519919: METHOD AND SYSTEM FOR CONVERTING SINGLE-VIEW IMAGE TO 2.5D VIEW FOR EXTENDED REALITY (XR) APPLICATIONS (granted Jan 06, 2026; 2y 5m to grant)
Based on this examiner's 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 69%
Grant Probability with Interview: 81% (+12.0%)
Median Time to Grant: 3y 2m
PTA Risk: Low

Based on 375 resolved cases by this examiner. Grant probability derived from career allow rate.
