DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: an acquisition unit that acquires data and an output unit that outputs at least one of data relating to one change object or a cognitive parameter in claims 1-17; and a display control unit that controls the display apparatus to display a target object and change the target object in claims 8-9.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claim 19 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.
Regarding claim 19, the broadest reasonable interpretation of the term “a computer-readable recording medium”, as claimed, covers forms of non-transitory tangible media and transitory propagating signals per se in view of the ordinary and customary meaning of computer-readable media. The specification describes in paragraph [0078] “The storage unit 24 is a nonvolatile storage device. For the storage unit 24, for example, a recording medium using a solid-state element such as a solid state drive (SSD) or a magnetic recording medium such as a hard disk drive (HDD) is used. In addition, the type or the like of the recording medium used as the storage unit 24 is not limited, and for example, any recording medium for recording data in a non-transitory manner may be used”. Further, in paragraph [0081] of the specification, it is recited “the storage unit 24 corresponds to a computer-readable recording medium on which programs are recorded”. The specification does not explicitly exclude transitory propagating signals from the scope of the recording medium. Therefore, the broadest reasonable interpretation of “computer-readable recording medium”, as claimed and described in the specification, could cover transitory propagating signals, which are non-statutory. Accordingly, claim 19 is rejected under 35 U.S.C. 101 as covering non-statutory subject matter.
Claim 20 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim recites at least one step or act for generating a model representing a relationship between the distance in the latent space and a degree of recognition with respect to a change of the visual object based on the distance on a basis of the acquired data. Thus, the claim recites a process, which is a statutory category of invention.
The claim recites the step of generating data relating to each of a first visual object and a second visual object that are represented by different points in a latent space relating to a visual object, which under the broadest reasonable interpretation is merely plotting of data representing the first and second visual objects in a latent space. This generating step is a mental process that can be performed using a pen and paper. Therefore, this step falls within the mental process grouping of abstract ideas. The claim further recites the step of generating a model representing a relationship between the distance in the latent space and a degree of recognition with respect to a change of the visual object based on the distance on a basis of the acquired data. Under the broadest reasonable interpretation, this step generates a model representing a relationship between different data and is merely a mathematical concept. Therefore, this step falls within the mathematical concept grouping of abstract ideas.
The claim further recites the additional limitation of acquiring data in which a determination result of a test is associated with a distance between the points representing the first visual object and the second visual object in the latent space, the test being for allowing a tester to determine presence or absence of a cognitive difference between the first visual object and the second visual object or a degree of the cognitive difference. This limitation amounts to mere data gathering. It is necessary to acquire data in order to use the recited judicial exception to generate the claimed model. This limitation does not impose any other meaningful limits on the claim. Therefore, this additional limitation is insignificant extra-solution activity. See MPEP 2106.05(g).
The data gathering activity in the step of acquiring data based on the test result is recited at a high level of generality. Therefore, this limitation remains insignificant extra-solution activity and does not amount to significantly more. Thus, claim 20 is directed to a judicial exception and is ineligible.
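To illustrate why the generating step is characterized as a mathematical concept, the following sketch reduces the step to fitting a curve that maps latent-space distance to a degree of recognition. The data values and the choice of a logistic model are assumptions for illustration only; the claim does not recite any particular fitting technique.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical test data: each row pairs a latent-space distance between the
# first and second visual objects with a tester's determination result
# (1 = cognitive difference detected, 0 = not detected).
distances = np.array([[0.2], [0.5], [0.9], [1.4], [2.0], [2.6]])
detected = np.array([0, 0, 0, 1, 1, 1])

# "Generating a model representing a relationship between the distance in the
# latent space and a degree of recognition" reduces to fitting a curve such as
# this logistic model to the acquired (distance, result) pairs.
model = LogisticRegression().fit(distances, detected)

# The fitted model maps a latent-space distance to a degree of recognition.
degree_of_recognition = model.predict_proba([[1.1]])[0, 1]
```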
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 10-11, 15, 18 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Lin et al. (US 2022/0122232, hereinafter Lin), in view of Fradkin et al. (US 2023/0325678, hereinafter Fradkin), and further in view of Edi et al. (US 2019/0116290, hereinafter Edi).
Regarding claim 1, Lin teaches an information processing apparatus (fig. 1A: image editing system 102), comprising:
an acquisition unit (this element is interpreted under 35 USC 112(f) as dedicated hardware such as an integrated circuit (IC); image editing system 102; projection subsystem 110, fig. 1A) that acquires data relating to an input object as a visual object (input image 408, fig. 4; [0072]: The projection subsystem 110 includes hardware and/or software configured to identify and transform latent space representations of images. The projection subsystem 110 receives as input the input image 106 and generates as output a modified latent space representation of the input image 117, which is a vector string of numbers reflecting edits to be applied to the input image 106); and
an output unit (this element is interpreted under 35 USC 112(f) as dedicated hardware such as an integrated circuit (IC); the image generation subsystem 130, fig. 1A) that outputs ([0080]: the image generation subsystem 130 includes hardware and/or software configured to generate an output image 150 based on input code (e.g., the modified latent space representation 117)), on a basis of a model (machine learning model, [0049]; [0074]: The encoder 112 is a machine learning model trained to generate such a latent space representation. The encoder 112 may, for example, be a feed forward network trained to encode the input image 106. Given an input image 106 and a generator 132, the encoder discovers a latent space representation of the input image z, such that when the latent space representation of the input image z is input to the generator 132, the resulting generated image 139 perceptually resembles the target input image 106; [0081]: The generator 132 includes a machine learning model which has been trained to generate a generated image 139 based on input latent code. In some implementations, the generator 132 is a neural network. The generator 132 is pretrained to generate data that is similar to a training set. Depending on the type of image to be edited by the image editing system 102, the generator may be trained to generate an image of a human face, a landscape, a dog, a cat, a shoe, and so forth. In some aspects, the generator 132 is trained to generate a specific type of image, as such targeted training can produce very realistic results; [0083]: The training subsystem 140 includes hardware and/or software configured to train one or more machine learning models as used by the image editing system 102. The training subsystem 140 includes a discriminator 136. The discriminator 136 is part of the GAN 138 including the generator 132, and evaluates the output of the generator 132 to train the generator 132. The discriminator 136 compares images produced by the generator 132 to real images, and the generator 132 works to “trick” the discriminator into determining that a generated image is actually a real image) representing a relationship between a distance in a latent space relating to the visual object (distance is represented by vector product of the latent space representation and the filtering vector; [0049]: The image editing system applies a machine learning model to the input image to generate the latent space representation of the input image. The latent space representation can be inferred by optimization and/or generated using an encoder, as described herein. The image editing system applies the learnt filtering vector to the latent space representation of the input image. Applying the filtering vector may be performed using vector addition to add the latent space representation to the filtering vector times a scalar used to control the degree of change of the target attribute to be edited; [0052]: In certain embodiments, for an attribute in the image that can be edited, the image editing system computes a metric for the attribute as a function of a latent space representation of the image and a filtering vector trained for editing the image. In particular, using the normal vector and latent space representation described above, it has been determined that the vector product of the latent space representation and the filtering vector represents the distance to the separating hyperplane in the latent space dividing images with and without a particular attribute. 
This distance to the separating hyperplane highly correlates to the intensity of the attribute value. For example, traveling in latent space away from the hyperplane discovered for the age attribute in one direction, the images begin to have features associated with increasing age. This can be used to identify an attribute intensity that reflects the magnitude of the attribute value in the attribute for the input image. For example, if the product of the latent space representation and the filtering vector is relatively high, then the image editing system can determine that this is a picture of an older person) and a degree of (degree of change of the target attribute is controlled by the calculated distance), at least one of data relating to at least one change object (modified image 410, fig. 4) in which the input object is changed in the latent space (input image is changed to a modified image such as output image as shown in fig. 4 by manipulating the slider corresponding to particular attribute) in accordance with instruction information including an instruction value (threshold value) relating to the degree of recognition (based on a threshold value identified by the image editing system or manually selected by an administrator, the user manipulates the slider of an attribute within a range determined by the threshold value and modifies the input image 408 and output a modified image 410; [0051]: The attribute range available can represent permissible values of the scalar controlling the degree of attribute modification. For example, when editing images via a slider-based editing interface as illustrated in FIG. 4, the dynamic range selection is used to automatically place an upper and/or lower bound on the values of the slider; [0058]: As another example, thresholds are determined for an attribute that can be edited based upon the training data, where the thresholds define a range of editable values for the attribute such that the results from the edits appear realistic (i.e., do not lose their realism); [0069]: The edit attribute 122A is a target attribute to be edited. For example, for an image of a human face, the smile, hair length, age, and gender are examples of attributes that can be selected for editing. For an image of a shoe, attributes that can be edited include heel size, whether there is an open toe, and the color. The edit magnitude 122B is a degree of change to make to the edit attribute 122A. For example, a user can interact with a slider of the editor interface 104 to indicate that the smile should be increased or decreased by a certain amount. In some implementations, the attribute modifier supports multiple attribute editing—for example, the attribute modifier will receive indications of several edit attributes 122A and edit magnitudes 122B that are processed together (e.g., increase smile by +1 unit and decrease age by—2 units); [0100]: For example, based on a user moving a slider for a particular attribute, the image editing system that attribute as the target attribute to be edited. Based on a degree of modification specified by the user input (e.g., how far the slider is moved), the image editing system may increase or decrease the magnitude for the edit; [0128]: image editing is performed by applying a filtering vector multiplied by a scalar to control the degree of modification; [0132]: In some embodiments, at 1004, the image editing system identifies a threshold value for the attribute. 
For example, the range adapter performs statistical analysis of distance from hyperplane as a function of age for a set of images to determine a threshold above which there are few images. Alternatively, the range adapter analyzes distance from hyperplane versus attribute score to find a threshold near the maximum and minimum attribute scores in the data set. The threshold, η can be determined using a distance vs attribute score plot as illustrated in FIG. 11. In this example, η=±4.5 will lead to a smaller filter range for some images with relatively young or old ages. Alternatively, or additionally, an administrator can manually select a desired threshold).
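For clarity, the latent-space editing mechanism Lin describes can be summarized by the following minimal sketch. It is not Lin’s implementation; the function names, the 512-dimensional latent code, and the unit-normal filtering vector are illustrative assumptions. It shows the dot product of the latent code with the filtering vector as a distance/attribute-intensity metric, and an edit applied as the filtering vector scaled by a scalar that is clipped to a threshold range.

```python
import numpy as np

def attribute_intensity(z, n):
    """Signed distance from latent code z to the attribute hyperplane with
    unit normal n (the filtering vector); per Lin, this distance correlates
    with the intensity of the attribute value."""
    return float(np.dot(z, n))

def edit_latent(z, n, alpha, eta=4.5):
    """Edit by adding the filtering vector n scaled by alpha, with alpha
    clipped to the threshold range +/- eta so the result stays realistic."""
    alpha = float(np.clip(alpha, -eta, eta))
    return z + alpha * n

# Illustrative 512-dimensional latent code and unit filtering vector.
rng = np.random.default_rng(0)
z = rng.standard_normal(512)
n = rng.standard_normal(512)
n /= np.linalg.norm(n)
z_edited = edit_latent(z, n, alpha=2.0)
```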
Lin does not explicitly teach that the model represents a relationship between distance in a latent space relating to the visual object and a degree of recognition with respect to a change of the visual object based on the distance, or outputting a cognitive parameter representing the degree of recognition with respect to a change from the input object to a reference object corresponding to the input object on the basis of the model.
Fradkin teaches a model (data protector 121, fig. 3; [0022]: data protector 121 is configured to include interpretable models (e.g., deep learning or neural network models) that are trained and leveraged for identification and prevention of data poisoning and backdoor insertion) representing a relationship between a distance (perceptual distance) in a latent space relating to the visual object and a degree of recognition (amount of change in perceptual similarity) with respect to a change of the visual object based on the distance (data protector employs latent space embeddings for training image data where distances between inputs corresponds to an amount of change in the perceptual similarity of the images; [0023]: To detect adversarial examples characterized by small modifications of input leading to significantly different model output, data protector 121 employs latent space embedding for training data (e.g., image and audio data) where distances correspond to dissimilarities in perception or meaning within the current context. Perceptual distance metrics between inputs, regardless of whether they are on the manifold of natural images, can be informative of perceptual similarity between inputs and allows creation of meaningful latent spaces where distance corresponds to amount of change in perception or meaning … Embedding data into such a latent space would also make predictive models and the detector 121 more robust and significantly smaller, simplifying computation of robustness guarantees. Perceptual distance may be defined via a dynamic partial function. Another approach models the image space as a fiber bundle, where the base/projection space corresponds to the perception-sensitive latent space; [0036]: i) a meaningful latent space should have short distances between similar instances, and long distances between instances of different types; and (ii) interpretable models are used to allow a check for whether the models are focusing on the appropriate aspects of the data, or picking up on spurious associations, backdoor triggers or mislabeled training data; [0040]: Perceptually-compact latent space—In an embodiment, data protector 121 implements latent space embedding to create meaningful perceptually-compact latent space. Ideally, distances within the latent space of a neural network should represent distances in the space of concepts or perceptions). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply Fradkin’s knowledge of using a model that represents distance relating to an image in the latent space and an amount of change in perceptual similarity with respect to a change in the image based on distance and modify the system of Lin because such a system improves interpretability of the latent spaces of the interpretable neural networks, where the interpretable models are complemented by label correction and anomaly detection for identifying potential cases of data poisoning ([0039]).
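The following sketch is illustrative only and is not Fradkin’s implementation; `embed` and `classify` stand in for Fradkin’s latent-space embedding and predictive model, and the distance tolerance is an assumption. It shows how a perceptually-compact latent space could be used: the distance between embeddings serves as the measure of perceptual change, and a small perceptual distance paired with a changed model output flags a potential adversarial example.

```python
import numpy as np

def perceptual_distance(embed, img_a, img_b):
    """Distance between two inputs in a perceptually-compact latent space;
    under this framing, the distance tracks the amount of change in
    perception or meaning."""
    return float(np.linalg.norm(embed(img_a) - embed(img_b)))

def flag_adversarial(embed, classify, img, perturbed, dist_tol=0.1):
    """Flag a perturbed input whose perceptual distance from the original is
    small but whose predicted label differs (threshold is illustrative)."""
    close = perceptual_distance(embed, img, perturbed) < dist_tol
    return close and classify(img) != classify(perturbed)
```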
Edi teaches a cognitive parameter (preserved or not preserved is determined as a sentiment value between an image and a modified image) representing the degree of recognition (extent to which the meaning of the original image is preserved) with respect to a change from the input object (original image i 200, fig. 3) to a reference object (morphed image i* 202, fig. 3) corresponding to the input object ([0006]: analyse an image to determine an image scene category; and generate an anonymized image based on a determined image scene category by an application of a morphing model to the image as a whole, such that the anonymised image is between the image and a generic image associated with the identified scene category; [0023]: apply a sentiment detector deep neural network to determine an anonymized image sentiment value and an image sentiment value; and compare the anonymized image sentiment value and the image sentiment value to determine whether the sentiment associated with the image in the anonymized image is preserved; [0088]: The evaluator takes as input the original image i and the anonymized image i*, and may comprise a data utility evaluator configured to evaluate the extent to which the meaning of the original image is preserved after anonymization; [0091]: The sentiment detector 403 is configured to receive the original image i and the anonymized image i* and verify that the sentiment aroused by the original image is preserved after anonymization … The output of the sentiment detector is a value s with respect to the original image i and value s* with respect to the anonymized image i* which may be output to a sentiment comparator 407. The values s and s* reflecting how positive the sentiment of the image is. The data utility evaluation module may further comprise a sentiment comparator 407. The sentiment comparator 407 may be configured to receive the original image sentiment value s and anonymized image sentiment value s* and compare them to determine the distance between them. In some embodiments the distance is compared against a threshold value to determine whether the sentiment is maintained between the original and anonymized images; [0110]: The evaluation may then perform an evaluation operation to determine the extent to which the meaning of the original image is preserved in the anonymized image; [0111]: The operation of evaluating the extent to which the meaning of the original image is preserved in the anonymized image is shown in FIG. 8 by step 803; [0113]: Where the meaning of the original image within the anonymized image is not preserved then the anonymization operation may pass to a fail state. In other words there is a lack of match in the output from the scene detector when the input is the original image and the anonymized image or the sentiment value output based on the original image and the anonymized image differs significantly (and is above a determined threshold value)). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply Edi’s knowledge of determining if the extent to which the meaning of an original image is preserved as taught and modify the system of Lin and Fradkin because such a system enhances the user’s experience by suggesting whether the morphed image preserves the sentiment of the original image.
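A minimal sketch of the sentiment comparison Edi describes follows; the threshold value and the example sentiment scores are assumptions, not values from the reference. The sentiment value s of the original image and the value s* of the anonymized image are compared, and the sentiment is treated as not preserved (a fail state) when the difference exceeds a threshold.

```python
def sentiment_preserved(s_original, s_anonymized, threshold=0.2):
    """Compare the sentiment value s of the original image with the value s*
    of the anonymized image; the sentiment is treated as not preserved when
    the difference exceeds the threshold (threshold is an assumption)."""
    return abs(s_original - s_anonymized) <= threshold

# e.g. s = 0.71 for the original image and s* = 0.64 for the morphed image
print(sentiment_preserved(0.71, 0.64))  # True: sentiment is preserved
```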
Claims 18 and 19 are similar in scope to claim 1, and the examiner therefore applies a similar rationale to reject claims 18 and 19.
Regarding claim 10, the combination of Lin, Fradkin and Edi teaches the information processing apparatus according to claim 1, wherein the output unit outputs, as the cognitive parameter (preserved or not preserved is determined as a sentiment value between an image and a modified image), a change detection rate with respect to an overall visual change caused by changing the input object to the reference object (extent to which the meaning of original image is preserved or not preserved is functionally analogous to a change detection rate with respect to the image; Edi - [0006]: analyse an image to determine an image scene category; and generate an anonymized image based on a determined image scene category by an application of a morphing model to the image as a whole, such that the anonymised image is between the image and a generic image associated with the identified scene category; Edi - [0023]: apply a sentiment detector deep neural network to determine an anonymized image sentiment value and an image sentiment value; and compare the anonymized image sentiment value and the image sentiment value to determine whether the sentiment associated with the image in the anonymized image is preserved; Edi - [0088]: The evaluator takes as input the original image i and the anonymized image i*, and may comprise a data utility evaluator configured to evaluate the extent to which the meaning of the original image is preserved after anonymization; Edi - [0091]: The sentiment detector 403 is configured to receive the original image i and the anonymized image i* and verify that the sentiment aroused by the original image is preserved after anonymization … The output of the sentiment detector is a value s with respect to the original image i and value s* with respect to the anonymized image i* which may be output to a sentiment comparator 407. The values s and s* reflecting how positive the sentiment of the image is. The data utility evaluation module may further comprise a sentiment comparator 407. The sentiment comparator 407 may be configured to receive the original image sentiment value s and anonymized image sentiment value s* and compare them to determine the distance between them. In some embodiments the distance is compared against a threshold value to determine whether the sentiment is maintained between the original and anonymized images; Edi - [0110]: The evaluation may then perform an evaluation operation to determine the extent to which the meaning of the original image is preserved in the anonymized image; Edi - [0111]: The operation of evaluating the extent to which the meaning of the original image is preserved in the anonymized image is shown in FIG. 8 by step 803; Edi - [0113]: Where the meaning of the original image within the anonymized image is not preserved then the anonymization operation may pass to a fail state. In other words there is a lack of match in the output from the scene detector when the input is the original image and the anonymized image or the sentiment value output based on the original image and the anonymized image differs significantly (and is above a determined threshold value)).
Regarding claim 11, the combination of Lin, Fradkin and Edi teaches the information processing apparatus according to claim 10, wherein the output unit outputs, a first change detection rate (preserved or not preserved is determined as a sentiment value between an image and a modified image; extent to which the meaning of original image is preserved or not preserved is functionally analogous to a change detection rate with respect to the image) with respect to a visual change associated with first change processing of changing the input object to the reference object at one time (a change in an extent to which the original meaning of the input image i is preserved in an output morphed image i* that is changed on the basis of an anonymization parameter; Edi - [0006]: analyse an image to determine an image scene category; and generate an anonymized image based on a determined image scene category by an application of a morphing model to the image as a whole, such that the anonymised image is between the image and a generic image associated with the identified scene category; Edi - [0023]: apply a sentiment detector deep neural network to determine an anonymized image sentiment value and an image sentiment value; and compare the anonymized image sentiment value and the image sentiment value to determine whether the sentiment associated with the image in the anonymized image is preserved; Edi - [0088]: The evaluator takes as input the original image i and the anonymized image i*, and may comprise a data utility evaluator configured to evaluate the extent to which the meaning of the original image is preserved after anonymization; Edi - [0091]: The sentiment detector 403 is configured to receive the original image i and the anonymized image i* and verify that the sentiment aroused by the original image is preserved after anonymization … The output of the sentiment detector is a value s with respect to the original image i and value s* with respect to the anonymized image i* which may be output to a sentiment comparator 407. The values s and s* reflecting how positive the sentiment of the image is. The data utility evaluation module may further comprise a sentiment comparator 407. The sentiment comparator 407 may be configured to receive the original image sentiment value s and anonymized image sentiment value s* and compare them to determine the distance between them. In some embodiments the distance is compared against a threshold value to determine whether the sentiment is maintained between the original and anonymized images; Edi - [0110]: The evaluation may then perform an evaluation operation to determine the extent to which the meaning of the original image is preserved in the anonymized image; Edi - [0111]: The operation of evaluating the extent to which the meaning of the original image is preserved in the anonymized image is shown in FIG. 8 by step 803; Edi - [0113]: Where the meaning of the original image within the anonymized image is not preserved then the anonymization operation may pass to a fail state. In other words there is a lack of match in the output from the scene detector when the input is the original image and the anonymized image or the sentiment value output based on the original image and the anonymized image differs significantly (and is above a determined threshold value)).
Regarding claim 15, the combination of Lin, Fradkin and Edi teaches the information processing apparatus according to claim 10, wherein the reference object is an object input by a user or an object obtained by changing the input object (morphed image i* 202 as shown in fig. 3 is obtained by changing the input image i 200 using an anonymization parameter α) in accordance with an amount of change input by the user (anonymization parameter α is functionally equivalent to the amount of change input by the user; Edi – [0077]: the anonymization parameter α input 309 may be provided by a user input configured to control the degree of anonymization of the images; Edi – [0083]: The morphing module 307 may be a scene-specific morphing module M selected by from pool of pre-trained models. The morphing module 307 receives the image i 200 and the anonymization parameter α∈[0,1] used to select the desired anonymization level that needs to be applied to the picture. The morphing module 307 is then configured to process the input original image i 200 and output an anonymized picture i* 202 which is a “generalized” version of i.; Edi - [0085]: The initialization of f is of significance to the anonymization process. If f were to be initialized with random noise, then the resulting G(f) would be the average representation of category h (of, e.g., a garden). Instead, since f is initialized with i, then the resulting G(f) is i's morphed version; Edi – [0087]: In such embodiments the hyper-parameters may be chosen as 30 iterations, and λ=0.005. α is used as initial learning rate hyper-parameter to control the extent to which the original image is morphed. A low morphing (e.g., learning rate 0.01) generates the morphed image i* which is still quite similar in terms of colours and structure to the initial picture i. Implementing medium morphing (e.g., learning rate 0.5), the generated morphed image i* has a look in between the average image of category h, and the original image i. A high morphing (e.g., learning rate close to 1), produces a morphed image i* similar to the average image of category h.).
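As a toy stand-in for Edi’s generator-based morphing (not the reference’s actual optimization; the linear interpolation below merely mimics the qualitative effect of the anonymization parameter α), the following sketch shows how α could control how far the morphed image i* departs from the original image i toward the generic category image.

```python
import numpy as np

def morph(image_i, category_average, alpha):
    """Toy morph: alpha in [0, 1] controls the extent of anonymization.
    alpha near 0 keeps the morphed image i* close to the original image i;
    alpha near 1 pushes i* toward the generic (average) image of the scene
    category."""
    alpha = float(np.clip(alpha, 0.0, 1.0))
    return (1.0 - alpha) * image_i + alpha * category_average
```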
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Sukegawa et al. (US 2014/0079299) describes that if a plurality of results is output by the person determination unit 107, the results may be displayed or recorded in a predetermined order, or images may be displayed on the screen in descending order of the total value of the degrees of reliability (the degrees of similarity) of a face part. Showing a plurality of candidates makes it possible to increase the possibility that a target image is included in a human visual check.
Tomar et al. (US 2021/0382936) describes one or more image styles 211 (of the image style repository 209) that have corresponding feature vectors within a threshold distance (or similarity score) of the feature vector representing the image style of the target image 203 are rendered or provided to the one or more consumer applications 215. Learning an embedding may include learning the distance between two or more feature vectors representing two or more image style features of two or more images based on feature similarity of values between the two or more images and adjusting weights of the deep learning model. For example, as described above, the more that image style features of two images are matching or are within a threshold feature vector value, the closer the two images (e.g., data points 403-1 and 403-2) are to each other in feature space, whereas when features do not match or are not within a feature vector value threshold, the further away the two feature vectors are from each other in feature space.
Yamane (US 2005/0187975) describes when the distance calculation unit 124 receives from the search unit 123 vector sets corresponding to two image files to be compared, the distance calculation unit 124 calculates the distance between the received image files. Specifically, the distance calculation unit 124 establishes one-to-one correspondences between feature vectors in the two inputted vector sets so as to form a plurality of vector pairs. The distance calculation unit 124 calculates distances between the respective vector pairs. Then, the distance calculation unit 124 calculates the sum of the distances between the respective vector pairs as a distance between the two vector sets, which is information indicating a degree of similarity between the two image files. The value indicating this distance decreases with increase in the degree of similarity.
Cocias et al. (US 2018/0189607) describes receiving generic image data of an object type, step 310, and receiving recorded image data related to the object type, step 320. The method 300 further comprises modifying the generic image data with respect to at least one imaging-related parameter, step 330, and determining a degree of similarity between the modified generic image data and the recorded image data, step 340, as has been described in more detail in connection with the computing device 200 of FIG. 2. When the determined degree of similarity fulfils a similarity condition, the modified generic image data is stored as generated training image data of the object type, step 350.
Kajimoto et al. (US 2019/0392259) describes when GAN is used to generate a face image, a generator generates a generation image by receiving a latent noise as input which is randomly selected within a range of [−1, 1], and randomly gives either the generation image or a face training image to a discriminator. The discriminator discriminates whether the given image is the generation image or the training image. The generator and the discriminator learn in an adversarial manner by performing error backpropagation of the accuracy of the discrimination in the discriminator so that the generator outputs a generation image which captures features of a face training image.
Aliper et al. (US 2019/0392304) describes the reconstructed object (e.g., generated object data 203) is compared with the input object (e.g., object data 202) using a specified similarity measure (e.g., object loss module 150). The deviation of these two objects (e.g., difference between object data 202 and generated object data 203) is called an object reconstruction loss and is denoted by L.sub.object. A deviation in terms of a distance measure (e.g. Eucledian distance) between z.sub.xy (e.g., latent object-condition data 206a) and z.sub.yx (e.g., latent condition-object data 208a) is computed, such as by the distance comparator 154, and is called a latent deviation loss. A distance metric can be used to measure the similarity of “x” and “y” based on the encodings z.sub.xy and z.sub.yx to capture common information and allows ranking generated objects “x” accordingly to similarity or relevance to “y,” and vice versa.
Sullivan et al. (US 2020/0035350) describes receiving a histological image captured by a medical imaging device and generating a modified image of the histological image by, for example, using a first autoencoder (e.g., first autoencoder 310) and a predictive model (e.g., predictive model 620). An image generating unit (e.g., image generating unit 360) may apply one or more display characteristics associated with the identified at least one target histological image to the histological image to modify the histological image. The modified image may result in improving visibility of the target region.
Mitra et al. (US 2022/0028139) describes identifying an original image including a plurality of semantic attributes, wherein each of the semantic attributes represents a complex set of features of the original image; identifying a target attribute value that indicates a change to a target attribute of the semantic attributes; computing a modified feature vector based on the target attribute value, wherein the modified feature vector incorporates the change to the target attribute while holding at least one preserved attribute of the semantic attributes substantially unchanged; and generating a modified image based on the modified feature vector, wherein the modified image includes the change to the target attribute and retains the at least one preserved attribute from the original image.
Tagra et al. (US 2022/0121839) describes new facial image synthesizer 220 determines the Euclidean distance between a feature vector for the base facial image and a feature vector for a reference facial image. A similarity score is a metric measuring the degree of similarity or the degree of difference between the updated input image and the original input image. In some embodiments, the similarity score is utilized to ensure that some aesthetics of the input image are retained in the updated user input image. In these cases, the similarity score determined for an updated input image is compared to a predetermined minimum threshold similarity score, and if the similarity score for the updated input image does not meet the minimum threshold, a user is provided a notification indicating that identity obfuscation will change the aesthetics of the user's input image. Additionally or alternatively, the similarity score may be utilized to ensure that the updated input image is not too similar to the input image so that the identity is not properly protected. As such, in these instances, the similarity score determined for an updated input image is compared to a predetermined maximum threshold similarity score, and if the similarity score for the updated input image does not meet the maximum threshold, a user is provided a notification indicating that facial identity could not be protected. Where the updated input image does not meet the maximum threshold score, the updated input image may either be presented with the notification or may not be presented at all. In some embodiment, the determined similarity score is compared to both a maximum score and a minimum score to determine whether the updated input image both sufficiently obfuscates the individual's identity and preserves the aesthetics of the original input image.
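The double-threshold check Tagra describes can be summarized as follows; the threshold values are illustrative assumptions, not values from the reference.

```python
def check_identity_obfuscation(similarity, min_threshold=0.35, max_threshold=0.85):
    """Double-threshold check on the similarity score between the updated
    image and the original input image (threshold values are illustrative)."""
    if similarity < min_threshold:
        return "warn: identity obfuscation will change the aesthetics of the input image"
    if similarity > max_threshold:
        return "warn: facial identity could not be protected"
    return "ok: identity obfuscated and aesthetics preserved"
```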
Ko et al. (US 2022/0148244) describes a generation apparatus may generate various transformed images by applying a target coefficient that is to be changed, to basis vectors in an embedding space of a neural network. A direction in which a predetermined semantic element changes in the transformed images may be analyzed by sampling a basis vector. When an input vector is changed in a direction of each of basis vectors and when an image is generated using a generator, a change in a predetermined semantic element in the image may be analyzed. A top row 810 shows a process in which an input image representing an expressionless face is gradually transformed to images representing bright smiling expressions exposing teeth by changing (for example, increasing by “2”) a target coefficient corresponding to a facial expression of the input image, when a target semantic element among semantic elements of the input image is a facial expression.
Ikede (US 2024/0004878) describes a generation unit 424 configured to generate, based on the neighbor data, a local model LM that outputs an estimation value dp.sub.i of a latent distance d.sub.i when difference information V.sub.i is inputted, the latent distance being a distance between the query data and the neighbor data in the latent space, the difference information being related to a difference, for each element of the features, between the query data and the neighbor data in the presentation space; and a calculation unit 425 configured to calculate, based on the local model and the difference information, an element contribution degree c.sub.i,f representing a magnitude of an effect that each element of the features of the neighbor data exerts on the latent distance.
Shaobo Guan (“Generating custom photo-realistic faces using AI”) describes testing whether the discovered feature axes can be used to control the corresponding feature of the generated image. To do this, a random vector z_0 in the latent space of the GAN is passed through the generator network to produce a synthetic image x_0 = G(z_0). Next, moving the latent vector along one feature axis u (a unit vector in the latent space, say, corresponding to the gender of the face) by a distance λ to a new location z_1 = z_0 + λu, a new image x_1 = G(z_1) is generated. Ideally, the corresponding feature of the new image would be modified toward the expected direction.
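A short sketch of the latent-space walk Guan describes follows; the generator G is supplied by the caller, and only the update z_1 = z_0 + λu and the decode x_1 = G(z_1) are taken from the reference.

```python
import numpy as np

def walk_feature_axis(G, z_0, u, lam):
    """Move the latent code z_0 along a unit feature axis u by a distance
    lam and decode with the generator G, i.e. z_1 = z_0 + lam * u and
    x_1 = G(z_1)."""
    u = u / np.linalg.norm(u)
    z_1 = z_0 + lam * u
    return G(z_1)
```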
Allowable Subject Matter
Claims 2-9, 12-14, 16-17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:
Regarding claims 2-9, none of the cited prior art references of record teaches, either individually or in combination, “the at least one change object is an object switched for display in a predetermined order instead of the input object, and the output unit sets a distance in the latent space, which corresponds to an amount of change of a visual change caused by switching the change object for display, such that the degree of recognition with respect to the visual change caused by switching the change object for display does not exceed the threshold of the degree of recognition.”
Regarding claims 12-14, none of the cited prior art references of record teaches, either individually or in combination, “the output unit outputs a second change detection rate with respect to a visual change associated with second change processing including a plurality of division change processes of changing the input object to the reference object a plurality of times.”
Regarding claims 16-17, none of the cited prior art references of record teaches, either individually or in combination, “the model includes a plurality of pieces of graph data each indicating the degree of recognition with respect to the change of the visual object based on the distance in the latent space in each of change directions mutually different in the latent space.”
The following subject matter has been identified as allowable subject matter in view of the cited prior art references:
A model generating method, comprising: generating data relating to each of a first visual object and a second visual object that are represented by different points in a latent space relating to a visual object; acquiring data in which a determination result of a test is associated with a distance between the points representing the first visual object and the second visual object in the latent space, the test being for allowing a tester to determine presence or absence of a cognitive difference between the first visual object and the second visual object or a degree of the cognitive difference; and generating a model representing a relationship between the distance in the latent space and a degree of recognition with respect to a change of the visual object based on the distance on a basis of the acquired data.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JWALANT B AMIN whose telephone number is (571) 272-2455. The examiner can normally be reached Monday-Friday, 10:00 am - 6:30 pm CST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Said Broome can be reached at 571-272-2931. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JWALANT AMIN/Primary Examiner, Art Unit 2612