DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Allowable Subject Matter
Claim 6 is objected to as being dependent upon a rejected base claim, but would be allowable (for similar reasons to those of claim 7, below) if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Claims 7-9 are allowed.
The following is an examiner's statement of reasons for allowance:
Claim 7 is allowable over the prior art of record since the cited references taken individually or in combination fail to particularly disclose or suggest a system comprising: wherein the controller learns handling information for implementing optimal handling of the target including an optimal gripping position of the target by performing a simulation of handling of the target by a robot and further registers the learned handling information in the target master data, as presented in the environment of the remaining limitations of claim 7. It is noted that the closest prior art, Dal Mutto, shows a sensor that detects information regarding appearance of a handling target of a robot; and a robot controller that controls the robot so as to perform handling of the handling target on a basis of a detection result of the information regarding the appearance of the handling target and handling information of target master data acquired from a target digital twin model generation system, the target digital twin model generation system comprising: a light output configured to output planar light having a predetermined pattern and light, which includes visible light and invisible light to a target, having a plurality of wavelengths, from a plurality of illumination positions that surround the target at different timings; an imager configured to individually capture images of the target irradiated with the planar light and the target sequentially irradiated with the light having the plurality of wavelengths, at a plurality of imaging positions corresponding to the plurality of illumination positions in synchronization with timings at which the planar light and the light having the plurality of wavelengths are respectively output; and a controller configured to control the light output and the imager, wherein the controller acquires three-dimensional data indicating a three-dimensional shape over an entire circumference of a surface of the target on a basis of an imaging result of the target irradiated with the planar light, acquires two-dimensional data indicating a two-dimensional appearance over the entire circumference of the target viewed from the plurality of imaging positions on a basis of imaging results of the target sequentially irradiated with the light having the plurality of wavelengths, and generates a target digital twin model that reproduces appearance of the target in a computer-readable form by associating the three-dimensional data with the two-dimensional data, wherein the controller acquires texture data indicating texture of the surface of the target as the two-dimensional data on a basis of an imaging result of the target irradiated with the visible light and acquires optical absorption property data in which an optical absorption region on the surface of the target is visualized as the two-dimensional data on a basis of an imaging result of the target irradiated with the invisible light, wherein the controller acquires a product code and product information associated with the target on a basis of the additional information and further registers the acquired product code and product information in the target master data.
However, Dal Mutto fails to disclose or suggest wherein the controller corrects the three-dimensional data based on a generator and a discriminator such that a difference discriminated by the discriminator becomes smaller, and generates the target digital twin model by associating the corrected three-dimensional data with the two-dimensional data, the generator generating a first appearance image indicating appearance of the target at a certain viewpoint from a model generated by pasting the texture data to the three-dimensional data, the discriminator discriminating the difference between the first appearance image generated by the generator and a second appearance image which is generated from the imaging result of the target irradiated with the visible light and which indicates appearance of the target at a same viewpoint as a viewpoint of the first appearance image, wherein the controller recognizes additional information added to the surface of the target as characters or figures on a basis of the two-dimensional appearance of the target acquired from the target digital twin model and generates target master data as a comprehensive database regarding the target in which the additional information is registered along with the three-dimensional data and the two-dimensional data, wherein the controller learns handling information for implementing optimal handling of the target including an optimal gripping position of the target by performing a simulation of handling of the target by a robot and further registers the learned handling information in the target master data.
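For orientation only, the claimed learning of handling information by simulation can be pictured as a loop that scores candidate gripping positions on the target's digital twin and registers the best-scoring position in the target master data. The Python sketch below is a hypothetical illustration under simplified assumptions; the scoring function, data layout, and field names are illustrative placeholders, not the applicant's disclosed method or anything shown by the references:

    import numpy as np

    def simulate_grip_quality(points: np.ndarray, grip: np.ndarray) -> float:
        """Toy stand-in for a handling simulation: score a candidate grip by how
        close it lies to the centroid of the target's surface points (assumed
        more stable). A real system would run a physics-based grasp simulation."""
        centroid = points.mean(axis=0)
        return float(-np.linalg.norm(grip - centroid))

    def learn_handling_information(points: np.ndarray, n_candidates: int = 100,
                                   seed: int = 0) -> dict:
        """Sample candidate gripping positions from the surface points, score each
        by simulation, and return handling information for the best candidate."""
        rng = np.random.default_rng(seed)
        candidates = points[rng.integers(0, len(points), size=n_candidates)]
        scores = [simulate_grip_quality(points, c) for c in candidates]
        best = candidates[int(np.argmax(scores))]
        return {"optimal_gripping_position": best.tolist(),
                "simulated_score": max(scores)}

    # Register the learned handling information in a placeholder master data record.
    target_master_data = {"product_code": None, "three_dimensional_data": None}
    surface_points = np.random.default_rng(1).random((500, 3))  # placeholder 3-D data
    target_master_data["handling_information"] = learn_handling_information(surface_points)
    print(target_master_data["handling_information"])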
Claims 8 and 9 depend from independent claim 7, either directly or indirectly, and are accordingly allowable.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1 and 2 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Dal Mutto et al. (US Pub. 2020/0372626), hereinafter Dal Mutto.
Regarding claim 1, Dal Mutto discloses a target digital twin model generation system comprising: a light output configured to output planar light having a predetermined pattern and light, which includes visible light and invisible light to a target, having a plurality of wavelengths, from a plurality of illumination positions that surround the target at different timings (Paragraphs [0076]-[0079]: provide additional illumination by projecting a pattern that is designed to improve or optimize the performance of block matching algorithm that can capture small 3-D details such as the one described in U.S. Pat. No. 9,392,262 “System and Method for 3-D Reconstruction Using Multiple Multi-Channel Cameras,” issued on Jul. 12, 2016, the entire disclosure of which is incorporated herein by reference. Another approach projects a pattern that is purely used to provide a texture to the scene and particularly improve the depth estimation of texture-less regions by disambiguating portions of the scene that would otherwise appear the same…projection source 106 according to embodiments of the present invention may be configured to emit visible light (e.g., light within the spectrum visible to humans and/or other animals) or invisible light (e.g., infrared light) toward the scene imaged by the cameras 102 and 104. In other words, the projection source may have an optical axis substantially parallel to the optical axes of the cameras 102 and 104 and may be configured to emit light in the direction of the fields of view of the cameras 102 and 104…invisible light projection source may be better suited to for situations where the subjects are people (such as in a videoconferencing system) because invisible light would not interfere with the subject's ability to see, whereas a visible light projection source may shine uncomfortably into the subject's eyes or may undesirably affect the experience by adding patterns to the scene. Examples of systems that include invisible light projection sources are described, for example, in U.S. patent application Ser. No. 14/788,078 “Systems and Methods for Multi-Channel Imaging Based on Multiple Exposure Settings,” filed in the United States Patent and Trademark Office on Jun. 30, 2015, the entire disclosure of which is herein incorporated by reference…Active projection sources can also be classified as projecting static patterns, e.g., patterns that do not change over time, and dynamic patterns, e.g., patterns that do change over time. In both cases, one aspect of the pattern is the illumination level of the projected pattern. This may be relevant because it can influence the depth dynamic range of the depth camera system); an imager configured to individually capture images of the target irradiated with the planar light and the target sequentially irradiated with the light having the plurality of wavelengths, at a plurality of imaging positions corresponding to the plurality of illumination positions in synchronization with timings at which the planar light and the light having the plurality of wavelengths are respectively output (Paragraphs [0066]-[0071]: the image sensors 102a and 104a are conventional visible light sensors. In some embodiments of the present invention, the system includes one or more visible light cameras (e.g., RGB cameras) and, separately, one or more invisible light cameras (e.g., infrared cameras, where an IR band-pass filter is located across all over the pixels). 
In other embodiments of the present invention, the image sensors 102a and 104a are infrared (IR) light sensors. In some embodiments of the present invention, the image sensors 102a and 104a are infrared light (IR) sensors. In some embodiments (such as those in which the image sensors 102a and 104a are IR sensors) the depth camera 100 may include a third camera 105 including a color image sensor 105a (e.g., an image sensor configured to detect visible light in the red, green, and blue wavelengths, such as an image sensor arranged in a Bayer layout or RGBG layout) and an image signal processor 105b… embodiments in which the depth cameras 100 include color image sensors (e.g., RGB sensors or RGB-IR sensors), the color image data collected by the depth cameras 100 may supplement the color image data captured by the color cameras 150. In addition, in some embodiments in which the depth cameras 100 include color image sensors (e.g., RGB sensors or RGB-IR sensors), the color cameras 150 may be omitted from the system… detect the depth of a feature in a scene imaged by the cameras, the depth camera system determines the pixel location of the feature in each of the images captured by the cameras. The distance between the features in the two images is referred to as the disparity, which is inversely related to the distance or depth of the object; Paragraph [0093]: color camera and the depth camera can be synchronized and geometrically calibrated, allowing it to capture sequences of frames that are constituted by color images and corresponding depth maps, which can be geometrically aligned (e.g., each pixel or location of a depth map can be correlated with a corresponding color from a color image, thereby allowing capture of the surface colors of the scene). The combination of a depth map and a color image captured at substantially the same time as the depth map may be referred to as a “frame” of data. In this case, a color image with a depth map (or “depth image”) may be called an RGB-D frame, which contains color (RGB) and depth (D) information, as if both were acquired by a single camera with a single shutter and a single vantage point); and a controller configured to control the light output and the imager, wherein the controller acquires three-dimensional data indicating a three-dimensional shape over an entire circumference of a surface of the target on a basis of an imaging result of the target irradiated with the planar light, acquires two-dimensional data indicating a two-dimensional appearance over the entire circumference of the target viewed from the plurality of imaging positions on a basis of imaging results of the target sequentially irradiated with the light having the plurality of wavelengths, and generates a target digital twin model that reproduces appearance of the target in a computer-readable form by associating the three-dimensional data with the two-dimensional data (Fig. 13; Paragraph [0145]: in the embodiment shown in FIG. 13, the descriptor or feature vector is computed from 2-D views 16 of the 3-D model 10, as rendered by a view generation module in operation 1112. In operation 1114, the synthesized 2-D views are supplied to a descriptor generator to extract a descriptor or feature vector for each view. 
In operation 1116, the feature vectors for each view are combined (e.g., using max pooling, where a “pooled” feature vector is computed, where each position of the pooled feature vector is the maximum of the values at the corresponding position of the input feature vectors computed for each 2D view, as described in more detail below) to generate a descriptor for the 3-D model and to classify the object based on the descriptor; Paragraphs [0153]-[0155]: architecture of a classifier described above with respect to FIG. 12 can be applied to classifying multi-view shape representations of 3-D objects based on n different 2-D views of the object. For example, the first stage CNN.sub.1 can be applied independently to each of the n 2-D views used to represent the 3-D shape, thereby computing a set of n feature vectors (one for each of the 2-D views). Aspects of this technique are described in more detail in, for example, Su, H., Maji, S., Kalogerakis, E., & Learned-Miller, E. (2015). Multi-view convolutional neural networks for 3-D shape recognition. In Proceedings of the IEEE International Conference on Computer Vision (pp. 945-953). In some embodiments, the n separate feature vectors are combined using, for example, max pooling (see, e.g., Boureau, Y. L., Ponce, J., & LeCun, Y. (2010). A theoretical analysis of feature pooling in visual recognition. In Proceedings of the 27th international conference on machine learning…the selection of particular poses of the virtual cameras, e.g., the selection of which particular 2-D views to render, results in a descriptor F having properties that are substantially rotationally invariant. For example, considering a configuration where all the virtual cameras are located on a sphere (e.g., all arranged at poses that are at the same distance from the center of the 3-D model or a particular point p on the ground plane, and all having optical axes that intersect at the center of the 3-D model or at the particular point p on the ground plane). Another example of an arrangement with similar properties includes all of the virtual cameras located at the same elevation above the ground plane of the 3-D model, oriented toward the 3-D model (e.g., having optical axes intersecting with the center of the 3-D model), and at the same distance from the 3-D model, in which case any rotation of the object around a vertical axis (e.g., perpendicular to the ground plane) extending through the center of the 3-D model will result in essentially the same vector or descriptor F).
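By way of background on the max-pooling step quoted above from Dal Mutto (paragraphs [0145], [0153]-[0155]), the per-view feature vectors computed from rendered 2-D views can be combined element-wise so that the pooled descriptor does not depend on which view produced which feature. The following Python sketch is a simplified illustration; the toy histogram-based feature extractor stands in for the CNN described by the reference and is an assumption, not Dal Mutto's implementation:

    import numpy as np

    def toy_view_features(view: np.ndarray, dim: int = 8) -> np.ndarray:
        """Stand-in for the per-view CNN: reduce a rendered 2-D view (H x W array)
        to a fixed-length feature vector using a simple intensity histogram."""
        hist, _ = np.histogram(view.ravel(), bins=dim, range=(0.0, 1.0), density=True)
        return hist

    def pooled_descriptor(views: list[np.ndarray]) -> np.ndarray:
        """Max-pool the per-view feature vectors: each position of the pooled
        descriptor is the maximum of that position across all views."""
        features = np.stack([toy_view_features(v) for v in views])
        return features.max(axis=0)

    # Example: several synthetic 2-D views of the same object from virtual cameras.
    rng = np.random.default_rng(0)
    views = [rng.random((64, 64)) for _ in range(6)]
    descriptor = pooled_descriptor(views)
    print(descriptor.shape, descriptor)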
Regarding claim 2, Dal Mutto discloses the target digital twin model generation system according to claim 1, wherein the controller acquires texture data indicating texture of the surface of the target as the two-dimensional data on a basis of an imaging result of the target irradiated with the visible light and acquires optical absorption property data in which an optical absorption region on the surface of the target is visualized as the two-dimensional data on a basis of an imaging result of the target irradiated with the invisible light (Paragraph [0080]: Depth computations may fail in some region areas due to multiple factors, including: the mechanism used to compute depth (triangulation, with or without an active illuminator, or time of flight); the geometry of the scene (such as the angle between each surface element and the associated line of sight, or the presence of partial occlusion which may impede view by either sensor in a stereo system); and the reflectivity characteristics of the surface (such as the presence of a specular component which may hinder stereo matching or reflect away light from a projector, or a very low albedo causing insufficient light reflected by the surface). For those pixels of the depth image where depth computation fails or is unreliable, only color information may be available; Paragraphs [0090]-[0093]: Objects can typically be characterized by both specific surface colors (e.g., different colors on different portions of the surface of the object) and geometry (although these may be subject to variation between different instances of the same object, such as variations in the surface shape of a soft handbag or duffel bag based on the locations and depth of folds in the material). This type of information can be used to estimate the size and dimensions of the objects themselves, as described in more detail below…RGB-D camera according to some embodiments includes one or more color cameras (e.g., color camera 105), which acquire the color information of a scene imaged by the one or more color cameras and by one or more depth cameras (e.g., cameras 102 and 104), which acquire the geometry information (e.g., using infrared light). In some embodiments, the RGB-D camera includes one or more color cameras and one or more Infra-Red (IR) cameras, which, coupled with an IR structured-light illuminator (e.g., projection source 106), constitute the depth camera. The case in which there are two IR cameras and an IR structured-light illuminator is called active stereo…color camera and the depth camera can be synchronized and geometrically calibrated, allowing it to capture sequences of frames that are constituted by color images and corresponding depth maps, which can be geometrically aligned (e.g., each pixel or location of a depth map can be correlated with a corresponding color from a color image, thereby allowing capture of the surface colors of the scene). The combination of a depth map and a color image captured at substantially the same time as the depth map may be referred to as a “frame” of data. In this case, a color image with a depth map (or “depth image”) may be called an RGB-D frame, which contains color (RGB) and depth (D) information, as if both were acquired by a single camera with a single shutter and a single vantage point (even though the individual cameras).
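To make the quoted "RGB-D frame" concept concrete: once the color and depth cameras are synchronized and calibrated, each depth pixel can be paired with a color sample. The Python sketch below assumes the images are already geometrically aligned (the calibration and warping steps are omitted), and the thresholded absorption map derived from the invisible-light image is a simplified assumption rather than the claimed optical absorption property data:

    import numpy as np

    def make_rgbd_frame(color: np.ndarray, depth: np.ndarray) -> np.ndarray:
        """Combine an aligned H x W x 3 color image and an H x W depth map into a
        single H x W x 4 RGB-D frame (color plus depth per pixel)."""
        assert color.shape[:2] == depth.shape
        return np.concatenate([color, depth[..., None]], axis=-1)

    def absorption_map(ir_image: np.ndarray, threshold: float = 0.3) -> np.ndarray:
        """Toy visualization of optical absorption: regions returning little
        infrared light are flagged as absorbing (1), the rest as 0."""
        return (ir_image < threshold).astype(np.uint8)

    # Example with synthetic, already-aligned images.
    h, w = 4, 6
    color = np.random.default_rng(0).random((h, w, 3))  # visible-light texture data
    depth = np.full((h, w), 0.75)                        # depth from structured light
    ir = np.random.default_rng(1).random((h, w))         # invisible-light response
    frame = make_rgbd_frame(color, depth)
    print(frame.shape, absorption_map(ir).sum(), "absorbing pixels")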
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 3-5 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Dal Mutto, in view of Saphier et al. (US Pub. 2021/0321872), hereinafter Saphier.
Regarding claim 3, Dal Mutto discloses the target digital twin model generation system according to claim 2.
Dal Mutto does not explicitly disclose wherein the controller corrects the three-dimensional data based on a generator and a discriminator such that a difference discriminated by the discriminator becomes smaller, and generates the target digital twin model by associating the corrected three-dimensional data with the two-dimensional data, the generator generating a first appearance image indicating appearance of the target at a certain viewpoint from a model generated by pasting the texture data to the three-dimensional data, the discriminator discriminating the difference between the first appearance image generated by the generator and a second appearance image which is generated from the imaging result of the target irradiated with the visible light and which indicates appearance of the target at a same viewpoint as a viewpoint of the first appearance image.
However, Saphier teaches scanning an object surface (Abstract), further comprising wherein the controller corrects the three-dimensional data based on a generator and a discriminator such that a difference discriminated by the discriminator becomes smaller, and generates the target digital twin model by associating the corrected three-dimensional data with the two-dimensional data, the generator generating a first appearance image indicating appearance of the target at a certain viewpoint from a model generated by pasting the texture data to the three-dimensional data, the discriminator discriminating the difference between the first appearance image generated by the generator and a second appearance image which is generated from the imaging result of the target irradiated with the visible light and which indicates appearance of the target at a same viewpoint as a viewpoint of the first appearance image (Paragraph [0260]: input intraoral scan data/3D surface data is limited to intraoral scans (e.g., height maps) and/or 3D surfaces (or projections of 3D surfaces onto one or more plane) generated by stitching together such intraoral scans. In further embodiments, input layers/data that are used (e.g., that is input one or more trained machine learning model) include color images (e.g., color 2D images) and/or images generated under specific lighting conditions (e.g., NIRI images). Scanner 150 may separately generate intraoral scans (which include height information) and color 2D images and/or other 2D images. The intraoral scans and 2D images may be generated close enough in time that they depict the same or close to the same surface. The 2D images (e.g., color 2D images) may provide additional data that improves distinction between teeth and gums, tongue, and so on due to differences in color between these objects; Paragraphs [0316]-[0317]: small difference at a dental site between earlier scans generated before a modification of the dental site and later scans generated after the modification of the dental site may be at an error level per surface point of the scans. However, differences will generally be detected for an area that includes multiple points rather than at a single point. Such differences of an area of a dental site can be represented by creating a difference map between the earlier scans (generated prior to the modification) and the later scans (generated after the modification). A low pass filter may be applied to the difference map to determine if differences are point differences or area differences. Point differences are generally noise, and area differences have a high probability of being actual differences in the dental site…intraoral scan application 115 may be able to detect particular types of common differences between a 3D model or 3D surface and intraoral scans generated after a change to a dental site depicted in the 3D model or 3D surface. For example, differences in some areas (like when taking out a dental retraction cord) will have a specific place and be around the tooth. 
Intraoral scan application may include one or more rules for detecting signs of such common differences and/or may include one or more machine learning models that have been trained to receive data from two intraoral scans (or data from two sets of intraoral scans) and to identify particular types of differences between the data from the two intraoral scans or from the two sets of intraoral scans; Paragraph [0373]: a generative adversarial network (GAN) is used for one or more machine learning models. A GAN is a class of artificial intelligence system that uses two artificial neural networks contesting with each other in a zero-sum game framework. The GAN includes a first artificial neural network that generates candidates and a second artificial neural network that evaluates the generated candidates. The GAN learns to map from a latent space to a particular data distribution of interest (a data distribution of changes to input images that are indistinguishable from photographs to the human eye), while the discriminative network discriminates between instances from a training dataset and candidates produced by the generator. The generative network's training objective is to increase the error rate of the discriminative network (e.g., to fool the discriminator network by producing novel synthesized instances that appear to have come from the training dataset). The generative network and the discriminator network are co-trained, and the generative network learns to generate images that are increasingly more difficult for the discriminative network to distinguish from real images (from the training dataset) while the discriminative network at the same time learns to be better able to distinguish between synthesized images and images from the training dataset. The two networks of the GAN are trained once they reach equilibrium. The GAN may include a generator network that generates artificial intraoral images and a discriminator network that segments the artificial intraoral images. In embodiments, the discriminator network may be a MobileNet). Saphier teaches that this will improve accuracy of scanning (Paragraphs [0260]-[0261]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Dal Mutto with the features of above as taught by Saphier so as to improve accuracy of scanning as presented by Saphier.
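For illustration of the correction idea discussed for claim 3: a generator renders a first appearance image from the textured three-dimensional data, a discriminator reports the difference from a second appearance image captured under visible light at the same viewpoint, and the three-dimensional data is adjusted so that the discriminated difference becomes smaller. The Python sketch below collapses the adversarial discriminator of Saphier's GAN into a fixed difference measure and uses finite-difference updates, so it is only a rough, assumption-laden analogue, not the claimed method or Saphier's implementation:

    import numpy as np

    def render_appearance(shape_3d: np.ndarray, texture: np.ndarray) -> np.ndarray:
        """Toy 'generator': produce a first appearance image at one viewpoint by
        modulating the texture with a 1-D shape profile."""
        return texture * (1.0 + shape_3d)

    def discriminate(first_image: np.ndarray, second_image: np.ndarray) -> float:
        """Stand-in 'discriminator': scalar difference between the rendered image
        and the photographed image at the same viewpoint (fixed, not co-trained)."""
        return float(np.mean((first_image - second_image) ** 2))

    def correct_shape(shape_3d, texture, observed, lr=1.0, steps=200, eps=1e-4):
        """Adjust the 3-D data so the discriminated difference becomes smaller,
        using finite-difference gradients (a real system would backpropagate)."""
        shape = shape_3d.copy()
        for _ in range(steps):
            base = discriminate(render_appearance(shape, texture), observed)
            grad = np.zeros_like(shape)
            for i in range(len(shape)):
                probe = shape.copy()
                probe[i] += eps
                grad[i] = (discriminate(render_appearance(probe, texture), observed) - base) / eps
            shape -= lr * grad
        return shape

    rng = np.random.default_rng(0)
    texture = rng.random(16) + 0.5                       # texture pasted onto the 3-D data
    true_shape = rng.normal(0, 0.2, 16)                  # unknown true shape
    observed = render_appearance(true_shape, texture)    # second appearance image (photo)
    noisy_shape = true_shape + rng.normal(0, 0.1, 16)    # measured 3-D data to correct
    corrected = correct_shape(noisy_shape, texture, observed)
    print("difference before:", discriminate(render_appearance(noisy_shape, texture), observed))
    print("difference after: ", discriminate(render_appearance(corrected, texture), observed))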
Regarding claim 4, Dal Mutto, in view of Saphier, teaches the target digital twin model generation system according to claim 3. Saphier discloses wherein the controller recognizes additional information added to the surface of the target as characters or figures on a basis of the two-dimensional appearance of the target acquired from the target digital twin model and generates target master data as a comprehensive database regarding the target in which the additional information is registered along with the three-dimensional data and the two-dimensional data (Paragraphs [0283]-[0285]: a doctor may take a pre-scan of a patient's dental arch (e.g., before any treatment is performed). A pre-scan 3D model of the patient's dental arch may be generated based on the pre-scan. The intraoral scan application 115 may save the pre-scan and/or pre-scan 3D model to a patient record, and may identify that saved scan/3D model as a pre-scan/pre-scan 3D model. In an example, a pre-scan 3D model may be generated before a tooth is ground to form a preparation tooth. The pre-scan 3D model may provide information for a shape, coloration, position, etc. of a tooth before that tooth is ground to form a preparation tooth. The pre-scan 3D model may then be used for various purposes, such as to determine how much tooth has been ground to generate the preparation, to determine a shape of a prosthodontic, and so on…intraoral scan application 115 may compare a current intraoral scan (or a 3D surface or 3D model generated from current intraoral scans) to previous intraoral scans (or 3D surfaces or 3D models generated from previous intraoral scans) of the patient. Based on the comparison, intraoral scan application 115 may determine differences between the current teeth of the patient and the previous teeth of the patient. These differences may indicate which tooth or teeth are being treated. This information on teeth being treated may be added to a prescription).
Regarding claim 5, Dal Mutto, in view of Saphier, teaches the target digital twin model generation system according to claim 4. Dal Mutto discloses wherein the controller acquires a product code and product information associated with the target on a basis of the additional information and further registers the acquired product code and product information in the target master data (Paragraph [0096]: an electronic scale may provide measurements of the weight of the object, and a barcode decoding system may provide an identifier (e.g., a Universal Product Code or UPC) of the object in order to allow metadata about the object to be retrieved from a database or other data store. In some embodiments, the barcode decoding system may use an image of a barcode captured by a color camera of the depth cameras system (e.g., applying image rectification to a barcode appearing in a portion of the color image)).
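Purely as an illustration of registering a product code and product information on the basis of recognized additional information, the following hypothetical Python sketch looks up a decoded code in a small product database and writes the result into a target master data record; the database contents, field names, and the assumption that the code has already been recognized from the digital twin's two-dimensional appearance are illustrative assumptions:

    # Hypothetical product database keyed by UPC (illustrative contents only).
    PRODUCT_DB = {
        "012345678905": {"name": "Detergent 1L", "weight_g": 1100, "category": "household"},
    }

    def register_product_info(master_data: dict, additional_info: list[str]) -> dict:
        """If the additional information recognized on the target's surface contains
        a known product code, register the code and its product information."""
        for token in additional_info:
            if token in PRODUCT_DB:
                master_data["product_code"] = token
                master_data["product_information"] = PRODUCT_DB[token]
                break
        return master_data

    master = {"three_dimensional_data": "...", "two_dimensional_data": "...",
              "additional_information": ["LOT 42A", "012345678905"]}
    print(register_product_info(master, master["additional_information"]))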
Regarding claim 10, Dal Mutto discloses a virtual shop generation system comprising: an acquirer configured to acquire target master data from a target digital twin model generation system (Paragraphs [0283]-[0285]: a doctor may take a pre-scan of a patient's dental arch (e.g., before any treatment is performed). A pre-scan 3D model of the patient's dental arch may be generated based on the pre-scan. The intraoral scan application 115 may save the pre-scan and/or pre-scan 3D model to a patient record, and may identify that saved scan/3D model as a pre-scan/pre-scan 3D model. In an example, a pre-scan 3D model may be generated before a tooth is ground to form a preparation tooth. The pre-scan 3D model may provide information for a shape, coloration, position, etc. of a tooth before that tooth is ground to form a preparation tooth. The pre-scan 3D model may then be used for various purposes, such as to determine how much tooth has been ground to generate the preparation, to determine a shape of a prosthodontic, and so on…intraoral scan application 115 may compare a current intraoral scan (or a 3D surface or 3D model generated from current intraoral scans) to previous intraoral scans (or 3D surfaces or 3D models generated from previous intraoral scans) of the patient. Based on the comparison, intraoral scan application 115 may determine differences between the current teeth of the patient and the previous teeth of the patient. These differences may indicate which tooth or teeth are being treated. This information on teeth being treated may be added to a prescription); and a virtual shop generator configured to generate a virtual shop that virtually reproduces a store in which products associated with product codes and product information registered in the target master data acquired by the acquirer are arbitrarily displayed (Paragraph [0097]: some aspects of embodiments of the present invention relate to computing bounding boxes of objects (e.g., arbitrary objects). FIG. 3 is a flowchart of a method for measuring dimensions of object according to one embodiment of the present invention; Paragraph [0130]: embodiments of the present invention relate to systems and methods for using higher level data, in particular, a classification and/or identification of an object to apply heuristics or to retrieve other stored information regarding the target object. For example, manufactured products are, generally, substantially physically identical across all instances of those products. For example, continuing the above example of the bottle of detergent, all such bottles of detergent corresponding to a particular stock keeping unit (SKU) are substantially identical in size. Accordingly, if the target object can be identified as an instance of a particular known SKU, then the dimensions of the target object can be extrapolated as being the same as other instances of the SKU. As another example, aluminum beverage cans appear in very few standard sizes, of which the 12 oz variety is the most prevalent. 
Accordingly, if a target object is identified, based on partial information, as being a beverage can, then the object may be extrapolated as having a particular shape and dimensions consistent with the known size of an intact beverage can), wherein the target digital twin model generation system comprises: a light output configured to output planar light having a predetermined pattern and light, which includes visible light and invisible light to a target, having a plurality of wavelengths, from a plurality of illumination positions that surround the target at different timings (Paragraphs [0076]-[0079]: provide additional illumination by projecting a pattern that is designed to improve or optimize the performance of block matching algorithm that can capture small 3-D details such as the one described in U.S. Pat. No. 9,392,262 “System and Method for 3-D Reconstruction Using Multiple Multi-Channel Cameras,” issued on Jul. 12, 2016, the entire disclosure of which is incorporated herein by reference. Another approach projects a pattern that is purely used to provide a texture to the scene and particularly improve the depth estimation of texture-less regions by disambiguating portions of the scene that would otherwise appear the same…projection source 106 according to embodiments of the present invention may be configured to emit visible light (e.g., light within the spectrum visible to humans and/or other animals) or invisible light (e.g., infrared light) toward the scene imaged by the cameras 102 and 104. In other words, the projection source may have an optical axis substantially parallel to the optical axes of the cameras 102 and 104 and may be configured to emit light in the direction of the fields of view of the cameras 102 and 104…invisible light projection source may be better suited to for situations where the subjects are people (such as in a videoconferencing system) because invisible light would not interfere with the subject's ability to see, whereas a visible light projection source may shine uncomfortably into the subject's eyes or may undesirably affect the experience by adding patterns to the scene. Examples of systems that include invisible light projection sources are described, for example, in U.S. patent application Ser. No. 14/788,078 “Systems and Methods for Multi-Channel Imaging Based on Multiple Exposure Settings,” filed in the United States Patent and Trademark Office on Jun. 30, 2015, the entire disclosure of which is herein incorporated by reference…Active projection sources can also be classified as projecting static patterns, e.g., patterns that do not change over time, and dynamic patterns, e.g., patterns that do change over time. In both cases, one aspect of the pattern is the illumination level of the projected pattern. This may be relevant because it can influence the depth dynamic range of the depth camera system); an imager configured to individually capture images of the target irradiated with the planar light and the target sequentially irradiated with the light having the plurality of wavelengths, at a plurality of imaging positions corresponding to the plurality of illumination positions in synchronization with timings at which the planar light and the light having the plurality of wavelengths are respectively output (Paragraphs [0066]-[0071]: the image sensors 102a and 104a are conventional visible light sensors. 
In some embodiments of the present invention, the system includes one or more visible light cameras (e.g., RGB cameras) and, separately, one or more invisible light cameras (e.g., infrared cameras, where an IR band-pass filter is located across all over the pixels). In other embodiments of the present invention, the image sensors 102a and 104a are infrared (IR) light sensors. In some embodiments of the present invention, the image sensors 102a and 104a are infrared light (IR) sensors. In some embodiments (such as those in which the image sensors 102a and 104a are IR sensors) the depth camera 100 may include a third camera 105 including a color image sensor 105a (e.g., an image sensor configured to detect visible light in the red, green, and blue wavelengths, such as an image sensor arranged in a Bayer layout or RGBG layout) and an image signal processor 105b… embodiments in which the depth cameras 100 include color image sensors (e.g., RGB sensors or RGB-IR sensors), the color image data collected by the depth cameras 100 may supplement the color image data captured by the color cameras 150. In addition, in some embodiments in which the depth cameras 100 include color image sensors (e.g., RGB sensors or RGB-IR sensors), the color cameras 150 may be omitted from the system… detect the depth of a feature in a scene imaged by the cameras, the depth camera system determines the pixel location of the feature in each of the images captured by the cameras. The distance between the features in the two images is referred to as the disparity, which is inversely related to the distance or depth of the object; Paragraph [0093]: color camera and the depth camera can be synchronized and geometrically calibrated, allowing it to capture sequences of frames that are constituted by color images and corresponding depth maps, which can be geometrically aligned (e.g., each pixel or location of a depth map can be correlated with a corresponding color from a color image, thereby allowing capture of the surface colors of the scene). The combination of a depth map and a color image captured at substantially the same time as the depth map may be referred to as a “frame” of data. In this case, a color image with a depth map (or “depth image”) may be called an RGB-D frame, which contains color (RGB) and depth (D) information, as if both were acquired by a single camera with a single shutter and a single vantage point); and a controller configured to control the light output and the imager, wherein the controller acquires three-dimensional data indicating a three-dimensional shape over an entire circumference of a surface of the target on a basis of an imaging result of the target irradiated with the planar light, acquires two-dimensional data indicating a two-dimensional appearance over the entire circumference of the target viewed from the plurality of imaging positions on a basis of imaging results of the target sequentially irradiated with the light having the plurality of wavelengths, and generates a target digital twin model that reproduces appearance of the target in a computer-readable form by associating the three-dimensional data with the two-dimensional data (Fig. 13; Paragraph [0145]: in the embodiment shown in FIG. 13, the descriptor or feature vector is computed from 2-D views 16 of the 3-D model 10, as rendered by a view generation module in operation 1112. In operation 1114, the synthesized 2-D views are supplied to a descriptor generator to extract a descriptor or feature vector for each view. 
In operation 1116, the feature vectors for each view are combined (e.g., using max pooling, where a “pooled” feature vector is computed, where each position of the pooled feature vector is the maximum of the values at the corresponding position of the input feature vectors computed for each 2D view, as described in more detail below) to generate a descriptor for the 3-D model and to classify the object based on the descriptor; Paragraphs [0153]-[0155]: architecture of a classifier described above with respect to FIG. 12 can be applied to classifying multi-view shape representations of 3-D objects based on n different 2-D views of the object. For example, the first stage CNN.sub.1 can be applied independently to each of the n 2-D views used to represent the 3-D shape, thereby computing a set of n feature vectors (one for each of the 2-D views). Aspects of this technique are described in more detail in, for example, Su, H., Maji, S., Kalogerakis, E., & Learned-Miller, E. (2015). Multi-view convolutional neural networks for 3-D shape recognition. In Proceedings of the IEEE International Conference on Computer Vision (pp. 945-953). In some embodiments, the n separate feature vectors are combined using, for example, max pooling (see, e.g., Boureau, Y. L., Ponce, J., & LeCun, Y. (2010). A theoretical analysis of feature pooling in visual recognition. In Proceedings of the 27th international conference on machine learning…the selection of particular poses of the virtual cameras, e.g., the selection of which particular 2-D views to render, results in a descriptor F having properties that are substantially rotationally invariant. For example, considering a configuration where all the virtual cameras are located on a sphere (e.g., all arranged at poses that are at the same distance from the center of the 3-D model or a particular point p on the ground plane, and all having optical axes that intersect at the center of the 3-D model or at the particular point p on the ground plane). 
Another example of an arrangement with similar properties includes all of the virtual cameras located at the same elevation above the ground plane of the 3-D model, oriented toward the 3-D model (e.g., having optical axes intersecting with the center of the 3-D model), and at the same distance from the 3-D model, in which case any rotation of the object around a vertical axis (e.g., perpendicular to the ground plane) extending through the center of the 3-D model will result in essentially the same vector or descriptor F), wherein the controller acquires texture data indicating texture of the surface of the target as the two-dimensional data on a basis of an imaging result of the target irradiated with the visible light and acquires optical absorption property data in which an optical absorption region on the surface of the target is visualized as the two-dimensional data on a basis of an imaging result of the target irradiated with the invisible light (Paragraph [0080]: Depth computations may fail in some region areas due to multiple factors, including: the mechanism used to compute depth (triangulation, with or without an active illuminator, or time of flight); the geometry of the scene (such as the angle between each surface element and the associated line of sight, or the presence of partial occlusion which may impede view by either sensor in a stereo system); and the reflectivity characteristics of the surface (such as the presence of a specular component which may hinder stereo matching or reflect away light from a projector, or a very low albedo causing insufficient light reflected by the surface). For those pixels of the depth image where depth computation fails or is unreliable, only color information may be available; Paragraphs [0090]-[0093]: Objects can typically be characterized by both specific surface colors (e.g., different colors on different portions of the surface of the object) and geometry (although these may be subject to variation between different instances of the same object, such as variations in the surface shape of a soft handbag or duffel bag based on the locations and depth of folds in the material). This type of information can be used to estimate the size and dimensions of the objects themselves, as described in more detail below…RGB-D camera according to some embodiments includes one or more color cameras (e.g., color camera 105), which acquire the color information of a scene imaged by the one or more color cameras and by one or more depth cameras (e.g., cameras 102 and 104), which acquire the geometry information (e.g., using infrared light). In some embodiments, the RGB-D camera includes one or more color cameras and one or more Infra-Red (IR) cameras, which, coupled with an IR structured-light illuminator (e.g., projection source 106), constitute the depth camera. The case in which there are two IR cameras and an IR structured-light illuminator is called active stereo…color camera and the depth camera can be synchronized and geometrically calibrated, allowing it to capture sequences of frames that are constituted by color images and corresponding depth maps, which can be geometrically aligned (e.g., each pixel or location of a depth map can be correlated with a corresponding color from a color image, thereby allowing capture of the surface colors of the scene). The combination of a depth map and a color image captured at substantially the same time as the depth map may be referred to as a “frame” of data. 
In this case, a color image with a depth map (or “depth image”) may be called an RGB-D frame, which contains color (RGB) and depth (D) information, as if both were acquired by a single camera with a single shutter and a single vantage point (even though the individual cameras), wherein the controller acquires a product code and product information associated with the target on a basis of the additional information and further registers the acquired product code and product information in the target master data (Paragraph [0096]: an electronic scale may provide measurements of the weight of the object, and a barcode decoding system may provide an identifier (e.g., a Universal Product Code or UPC) of the object in order to allow metadata about the object to be retrieved from a database or other data store. In some embodiments, the barcode decoding system may use an image of a barcode captured by a color camera of the depth cameras system (e.g., applying image rectification to a barcode appearing in a portion of the color image).
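As a rough illustration of the virtual shop limitation of claim 10, one can imagine iterating over target master data records and assigning each product, identified by its registered product code, a shelf and slot in a virtual store. The Python sketch below is a hypothetical toy under that assumption, not the claimed system or anything shown by the cited references:

    # Minimal sketch: lay products out on virtual shelves from master data records.
    def generate_virtual_shop(master_records: list[dict], slots_per_shelf: int = 4) -> list[dict]:
        """Assign each product (identified by its registered product code) a shelf
        and slot position in the virtual store."""
        shop = []
        for i, record in enumerate(master_records):
            shop.append({
                "product_code": record.get("product_code"),
                "model": record.get("digital_twin_model"),  # 3-D/2-D data for rendering
                "shelf": i // slots_per_shelf,
                "slot": i % slots_per_shelf,
            })
        return shop

    records = [{"product_code": f"SKU-{n:04d}", "digital_twin_model": None} for n in range(6)]
    for placement in generate_virtual_shop(records):
        print(placement)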
Dal Mutto does not explicitly disclose wherein the controller corrects the three-dimensional data based on a generator and a discriminator such that a difference discriminated by the discriminator becomes smaller, and generates the target digital twin model by associating the corrected three-dimensional data with the two-dimensional data, the generator generating a first appearance image indicating appearance of the target at a certain viewpoint from a model generated by pasting the texture data to the three-dimensional data, the discriminator discriminating the difference between the first appearance image generated by the generator and a second appearance image which is generated from the imaging result of the target irradiated with the visible light and which indicates appearance of the target at a same viewpoint as a viewpoint of the first appearance image, wherein the controller recognizes additional information added to the surface of the target as characters or figures on a basis of the two-dimensional appearance of the target acquired from the target digital twin model and generates target master data as a comprehensive database regarding the target in which the additional information is registered along with the three-dimensional data and the two-dimensional data.
However, Saphier teaches scanning an object surface (Abstract), further comprising wherein the controller corrects the three-dimensional data based on a generator and a discriminator such that a difference discriminated by the discriminator becomes smaller, and generates the target digital twin model by associating the corrected three-dimensional data with the two-dimensional data, the generator generating a first appearance image indicating appearance of the target at a certain viewpoint from a model generated by pasting the texture data to the three-dimensional data, the discriminator discriminating the difference between the first appearance image generated by the generator and a second appearance image which is generated from the imaging result of the target irradiated with the visible light and which indicates appearance of the target at a same viewpoint as a viewpoint of the first appearance image (Paragraph [0260]: input intraoral scan data/3D surface data is limited to intraoral scans (e.g., height maps) and/or 3D surfaces (or projections of 3D surfaces onto one or more plane) generated by stitching together such intraoral scans. In further embodiments, input layers/data that are used (e.g., that is input one or more trained machine learning model) include color images (e.g., color 2D images) and/or images generated under specific lighting conditions (e.g., NIRI images). Scanner 150 may separately generate intraoral scans (which include height information) and color 2D images and/or other 2D images. The intraoral scans and 2D images may be generated close enough in time that they depict the same or close to the same surface. The 2D images (e.g., color 2D images) may provide additional data that improves distinction between teeth and gums, tongue, and so on due to differences in color between these objects; Paragraphs [0316]-[0317]: small difference at a dental site between earlier scans generated before a modification of the dental site and later scans generated after the modification of the dental site may be at an error level per surface point of the scans. However, differences will generally be detected for an area that includes multiple points rather than at a single point. Such differences of an area of a dental site can be represented by creating a difference map between the earlier scans (generated prior to the modification) and the later scans (generated after the modification). A low pass filter may be applied to the difference map to determine if differences are point differences or area differences. Point differences are generally noise, and area differences have a high probability of being actual differences in the dental site…intraoral scan application 115 may be able to detect particular types of common differences between a 3D model or 3D surface and intraoral scans generated after a change to a dental site depicted in the 3D model or 3D surface. For example, differences in some areas (like when taking out a dental retraction cord) will have a spe