DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Remarks
This action is in response to the applicant’s response filed 30 July 2025, which is in response to the USPTO office action mailed 3 April 2025. Claims 1-20 are currently pending.
Response to Arguments
With respect to the 35 USC §103 rejections of claims 1-20, the applicant’s arguments have been fully considered but have not been deemed persuasive.
Firstly, the applicant argues “There is no teaching or suggestion in Beauchamp of a ‘non-intended search term.’” (Remarks pg. 9). Respectfully, this argument is not persuasive.
Beauchamp teaches processing an image using a convolutional neural network (CNN) which “includes a plurality of layers that process the image 12 in order to generate an output, such as a predicted classification or predicted label” for the image (Beauchamp, [0051]). In particular, the CNN includes “a fully connected layer 18 [that] processes the set of feature maps 16 in order to perform a classification of the image, based on the features encoded in the set of feature maps 16. The fully connected layer 18 contains learned parameters that, when applied to the set of feature maps 16, outputs a set of probabilities representing the likelihood that the image 12 belongs to each of a defined set of possible classes” (Beauchamp, [0052]). To clarify the rejection, the examiner interprets determining the likelihood an image belongs to a set of possible classes as reading on “determining a first keyword indicated as an intended search term for the image” and “determining a second keyword indicated as a non-intended search term for the image” as claimed, where a class with a high likelihood reads on an intended search term and a class with a low likelihood reads on a non-intended search term. Therefore, this argument is not persuasive.
Lastly, the applicant argues “And even if there was disclosure of a non-intended search, Beauchamp fails to teach performing a reward adjustment based on the non-intended search term in general, or performing ‘reward adjustments that result in a decrease in a second similarity score corresponding to a similarity between the vector representation of the adjusted image and a vector representation of the non-intended search term.’” (Remarks pg. 9). Respectfully, this argument is not persuasive.
Beauchamp teaches a training process where learned parameters of a CNN are “updated based on a difference between the generated output value and the desired target value. For example, if the value outputted by the ML model is excessively high, the parameters may be adjusted so as to lower the output value in future training iterations. An objective function is a way to quantitatively represent how close the output value is to the target value. An objective function represents a quantity (or one or more quantities) to be optimized (e.g., minimize a loss or maximize a reward) in order to bring the output value as close to the target value as possible. The goal of training the ML model typically is to minimize a loss function or maximize a reward function” (Beauchamp, [0046]). To clarify the rejection, the examiner interprets updating the learned parameters of the CNN (which determines the likelihood an image belongs to a set of possible classes) reads on “reward adjustments that result in a decrease in a second similarity score corresponding to the non-intended search term” based on the broadest reasonable interpretation of reward, which is giving recognition (i.e. the parameters are iteratively updated based on the difference between an output value and a desired target value such that the CNN may more accurately determine the likelihood an input image belongs to a set of possible classes). Therefore, this argument is not persuasive.
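For context, the parameter-update behavior the examiner relies on in Beauchamp [0046] can be sketched as follows. This is an illustrative sketch only; the function and variable names are hypothetical and appear in neither reference:

```python
# Minimal sketch of the training loop described in Beauchamp [0046]:
# the model's output is compared to a desired target value, and a
# learned parameter is adjusted so the output moves closer to the
# target in future iterations.

def train_step(param: float, x: float, target: float, lr: float = 0.1) -> float:
    output = param * x       # model output for this iteration
    error = output - target  # difference between output and desired target
    # If the output is excessively high, error > 0 and the update lowers
    # the output in future training iterations (and vice versa).
    return param - lr * error * x

param = 2.0  # initial learned parameter (arbitrary)
for _ in range(100):
    param = train_step(param, x=1.0, target=0.5)
# After repeated iterations, param converges toward the target value.
```

The iterative update quantified by an objective function, as quoted above, is the behavior this sketch illustrates.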
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3 and 11-13 are rejected under 35 U.S.C. 103 as being unpatentable over BEAUCHAMP et al., US 2024/0289365 A1 (hereinafter “Beauchamp”) in view of Shi et al., US 2018/0075581 A1 (hereinafter “Shi”).
Claim 1: Beauchamp teaches a method comprising:
accessing an image (Beauchamp, [Fig. 1] note user-supplied query, [0075] note The input prompt may be a user-supplied prompt that is received from a user device 120 via a network 150. In the context of a search, the input prompt may be or include a user-supplied search query… The search query may comprise text, images, audio, and/or other forms of unstructured data; i.e. inputting an image reads on accessing an image);
determining a first keyword indicated as an intended search term for the image; determining a second keyword indicated as a non-intended search term for the image (Beauchamp, [Fig. 6], [0051] note CNN 10 includes a plurality of layers that process the image 12 in order to generate an output, such as a predicted classification or predicted label for the image 12, [0052] note a fully connected layer 18 processes the set of feature maps 16 in order to perform a classification of the image, based on the features encoded in the set of feature maps 16. The fully connected layer 18 contains learned parameters that, when applied to the set of feature maps 16, outputs a set of probabilities representing the likelihood that the image 12 belongs to each of a defined set of possible classes);
inputting the image to a machine learning system comprising a generative model and a discriminative model, wherein (Beauchamp, [Fig. 1] note 112, 114, 116, [0074] note system 100 includes a generative AI model 112, a search engine 114, and an embeddings module 116):
the generative model iteratively makes adjustments to the image to output an adjusted image, wherein the generative model modifies the adjustments to the image based on a loss function (Beauchamp, [0037] note generated enhancement data may include the search query and/or additional keywords or synonyms. Additionally, or alternatively, the generated enhancement data may be in a format associated with the embedding space, [0038] note the LLM may be provided additional context for generating the enhancement text, [0105] note the enhancement criteria may relate to length, size, and/or dimensionality of the input enhancement data. The enhancement criteria may include, for example, defined threshold (e.g., minimum or maximum) values for size of text, images, etc), and
wherein the loss function is configured to: reward adjustments that result in an increase in a first similarity score corresponding to the intended search term, wherein the first similarity score corresponds to a similarity between a vector representation of the adjusted image and a vector representation of the intended search term; reward adjustments that result in a decrease in a second similarity score corresponding to the non-intended search term, wherein the second similarity score corresponds to a similarity between the vector representation of the adjusted image and a vector representation of the non-intended search term (Beauchamp, [0046] note Training an ML model… parameters of the ML model are updated based on a difference between the generated output value and the desired target value. For example, if the value outputted by the ML model is excessively high, the parameters may be adjusted so as to lower the output value in future training iterations. An objective function is a way to quantitatively represent how close the output value is to the target value. An objective function represents a quantity (or one or more quantities) to be optimized (e.g., minimize a loss or maximize a reward) in order to bring the output value as close to the target value as possible. The goal of training the ML model typically is to minimize a loss function or maximize a reward function, [0047], [0048] note Backpropagation is performed iteratively, so that the loss function is converged or minimized); and
the discriminative model determines the first and second similarity scores based on the adjusted image, the intended search term, and the non-intended search term (Beauchamp, [0080] note search engine 114 computes similarity between vectors in the embedding space, [0092] note input enhancement data, is passed to a search engine associated with the computing system… the first vector embedding may be created based on a combination of the search query and the input enhancement data, [0093] note the computing system performs a search of an embedding space based on the first vector embedding).
Beauchamp does not explicitly teach an image for upload to a sharing platform; penalize adjustments that result in an increase in perceptual loss of the adjusted image compared to the image; and causing the adjusted image to be uploaded to the sharing platform.
However, Shi teaches this (Shi, [0026] note a super-resolution generative adversarial network, [0038] note a loss function based on the Euclidean distance between feature maps extracted from the VGG19 network can be used to obtain perceptually superior results for both super-resolution and artistic style-transfer; i.e. minimizing perceptual loss, [0041] note using a combination of content loss and adversarial loss as perceptual loss functions. For example, the adversarial loss is driven by the discriminator network 408 to encourage solutions from the natural image domain, while the content loss function ensures that the super-resolved images have the same content as their low-resolution counterparts, [0079] note training the network 710 includes increasing the quality of the input visual data 720, [0085] note Using the updated parameters 765, the method may iterate, seeking to reduce the differences between the plurality of characteristics determined from the high-quality visual data 730 and the estimated enhanced quality visual data 740, each time using the updated parameters 765 produced by the comparison network 760, [0073] note The techniques described herein have a wide variety of applications in which increasing the resolution of a visual image would be helpful. For example, the resolution of still, or video, images can be enhanced, where the images are uploaded to a social media site).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the image enhancement based on minimizing a loss function of Beauchamp with the perceptual loss functions of Shi according to known methods (i.e. minimizing perceptual loss in an image). Motivation for doing so is that it improves a variety of applications in which increasing the resolution of a visual image would be helpful (Shi, [0073]).
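For context, the loss function recited in claim 1, as mapped above, can be sketched as follows using cosine similarity between vector representations. This is an illustrative sketch only; the vector values, weight, and function names are hypothetical and are not drawn from either reference:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vector representations."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def claim1_loss(adj_img_vec, orig_img_vec, intended_vec, non_intended_vec, w=1.0):
    s1 = cosine(adj_img_vec, intended_vec)      # first similarity score
    s2 = cosine(adj_img_vec, non_intended_vec)  # second similarity score
    # Squared-error stand-in for perceptual loss between adjusted and original image.
    perceptual = sum((a - b) ** 2 for a, b in zip(adj_img_vec, orig_img_vec))
    # Minimizing this loss rewards increases in s1, rewards decreases in s2,
    # and penalizes increases in perceptual loss, per the claim language.
    return -s1 + s2 + w * perceptual
```

An adjustment that moves the adjusted image's vector toward the intended search term and away from the non-intended search term, without departing far from the original image, lowers this loss.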
Claim 2: Beauchamp and Shi teach the method of claim 1, further comprising causing the adjusted image to be uploaded to the sharing platform based on:
determining that the first similarity score of the intended search term for the adjusted image is greater than the first similarity score of the intended search term for the image; and determining that the second similarity score of the non-intended search term for the adjusted image is less than the second similarity score of the non-intended search term for the image (Beauchamp, [0046] note Training an ML model generally involves inputting into an ML model (e.g., an untrained ML model) training data to be processed by the ML model, processing the training data using the ML model, collecting the output generated by the ML model (e.g., based on the inputted training data), and comparing the output to a desired set of target values… parameters of the ML model are updated based on a difference between the generated output value and the desired target value. For example, if the value outputted by the ML model is excessively high, the parameters may be adjusted so as to lower the output value in future training iterations. An objective function is a way to quantitatively represent how close the output value is to the target value. An objective function represents a quantity (or one or more quantities) to be optimized (e.g., minimize a loss or maximize a reward) in order to bring the output value as close to the target value as possible. The goal of training the ML model typically is to minimize a loss function or maximize a reward function).
Claim 3: Beauchamp and Shi teach the method of claim 1, further comprising:
determining a plurality of first keywords indicated as intended search terms for the image; and determining a plurality of second keywords indicated as non-intended search terms for the image, wherein the loss function is further configured to (Beauchamp, [Fig. 6], [0051] note CNN 10 includes a plurality of layers that process the image 12 in order to generate an output, such as a predicted classification or predicted label for the image 12, [0052] note a fully connected layer 18 processes the set of feature maps 16 in order to perform a classification of the image, based on the features encoded in the set of feature maps 16. The fully connected layer 18 contains learned parameters that, when applied to the set of feature maps 16, outputs a set of probabilities representing the likelihood that the image 12 belongs to each of a defined set of possible classes):
reward adjustments that result in an increase in the respective similarity scores corresponding to any of the intended search terms; and reward adjustments that result in a decrease in the respective similarity scores corresponding to any of the non-intended search terms (Beauchamp, [0046] note Training an ML model generally involves inputting into an ML model (e.g., an untrained ML model) training data to be processed by the ML model, processing the training data using the ML model, collecting the output generated by the ML model (e.g., based on the inputted training data), and comparing the output to a desired set of target values… parameters of the ML model are updated based on a difference between the generated output value and the desired target value. For example, if the value outputted by the ML model is excessively high, the parameters may be adjusted so as to lower the output value in future training iterations. An objective function is a way to quantitatively represent how close the output value is to the target value. An objective function represents a quantity (or one or more quantities) to be optimized (e.g., minimize a loss or maximize a reward) in order to bring the output value as close to the target value as possible. The goal of training the ML model typically is to minimize a loss function or maximize a reward function).
Claim 11: Beauchamp teaches a system comprising:
input/output circuitry configured to (Beauchamp, [0071] note computing system 500 may optionally include at least one input/output (I/O) interface 508, which may interface with optional input device(s) 510 and/or optional output device(s) 512):
access an image (Beauchamp, [Fig. 1] note user-supplied query, [0075] note The input prompt may be a user-supplied prompt that is received from a user device 120 via a network 150. In the context of a search, the input prompt may be or include a user-supplied search query… The search query may comprise text, images, audio, and/or other forms of unstructured data; i.e. inputting an image reads on accessing an image); and
control circuitry configured to (Beauchamp, [0069] note processor 502 may be, for example, a central processing unit, a microprocessor, a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a dedicated logic circuitry, a dedicated artificial intelligence processor unit, a graphics process):
determine a first keyword indicated as an intended search term for the image; determine a second keyword indicated as a non-intended search term for the image (Beauchamp, [Fig. 6], [0051] note CNN 10 includes a plurality of layers that process the image 12 in order to generate an output, such as a predicted classification or predicted label for the image 12, [0052] note a fully connected layer 18 processes the set of feature maps 16 in order to perform a classification of the image, based on the features encoded in the set of feature maps 16. The fully connected layer 18 contains learned parameters that, when applied to the set of feature maps 16, outputs a set of probabilities representing the likelihood that the image 12 belongs to each of a defined set of possible classes);
input the image to a machine learning system comprising a generative model and a discriminative model, wherein (Beauchamp, [Fig. 1] note 112, 114, 116, [0074] note system 100 includes a generative AI model 112, a search engine 114, and an embeddings module 116):
the generative model iteratively makes adjustments to the image to output an adjusted image, wherein the generative model modifies the adjustments to the image based on a loss function (Beauchamp, [0037] note generated enhancement data may include the search query and/or additional keywords or synonyms. Additionally, or alternatively, the generated enhancement data may be in a format associated with the embedding space, [0038] note the LLM may be provided additional context for generating the enhancement text, [0105] note the enhancement criteria may relate to length, size, and/or dimensionality of the input enhancement data. The enhancement criteria may include, for example, defined threshold (e.g., minimum or maximum) values for size of text, images, etc), and
wherein the loss function is configured to: reward adjustments that result in an increase in a first similarity score corresponding to the intended search term, wherein the first similarity score corresponds to a similarity between a vector representation of the adjusted image and a vector representation of the intended search term; reward adjustments that result in a decrease in a second similarity score corresponding to the non-intended search term, wherein the second similarity score corresponds to a similarity between the vector representation of the adjusted image and a vector representation of the non-intended search term (Beauchamp, [0046] note Training an ML model… parameters of the ML model are updated based on a difference between the generated output value and the desired target value. For example, if the value outputted by the ML model is excessively high, the parameters may be adjusted so as to lower the output value in future training iterations. An objective function is a way to quantitatively represent how close the output value is to the target value. An objective function represents a quantity (or one or more quantities) to be optimized (e.g., minimize a loss or maximize a reward) in order to bring the output value as close to the target value as possible. The goal of training the ML model typically is to minimize a loss function or maximize a reward function, [0047], [0048] note Backpropagation is performed iteratively, so that the loss function is converged or minimized); and
the discriminative model determines the first and second similarity scores based on the adjusted image, the intended search term, and the non-intended search term (Beauchamp, [0080] note search engine 114 computes similarity between vectors in the embedding space, [0092] note input enhancement data, is passed to a search engine associated with the computing system… the first vector embedding may be created based on a combination of the search query and the input enhancement data, [0093] note the computing system performs a search of an embedding space based on the first vector embedding).
Beauchamp does not explicitly teach an image for upload to a sharing platform; penalize adjustments that result in an increase in perceptual loss of the adjusted image compared to the image; and cause the adjusted image to be uploaded to the sharing platform.
However, Shi teaches this (Shi, [0026] note a super-resolution generative adversarial network, [0038] note a loss function based on the Euclidean distance between feature maps extracted from the VGG19 network can be used to obtain perceptually superior results for both super-resolution and artistic style-transfer; i.e. minimizing perceptual loss, [0041] note using a combination of content loss and adversarial loss as perceptual loss functions. For example, the adversarial loss is driven by the discriminator network 408 to encourage solutions from the natural image domain, while the content loss function ensures that the super-resolved images have the same content as their low-resolution counterparts, [0079] note training the network 710 includes increasing the quality of the input visual data 720, [0085] note Using the updated parameters 765, the method may iterate, seeking to reduce the differences between the plurality of characteristics determined from the high-quality visual data 730 and the estimated enhanced quality visual data 740, each time using the updated parameters 765 produced by the comparison network 760, [0073] note The techniques described herein have a wide variety of applications in which increasing the resolution of a visual image would be helpful. For example, the resolution of still, or video, images can be enhanced, where the images are uploaded to a social media site).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the image enhancement based on minimizing a loss function of Beauchamp with the perceptual loss functions of Shi according to known methods (i.e. minimizing perceptual loss in an image). Motivation for doing so is that it improves a variety of applications in which increasing the resolution of a visual image would be helpful (Shi, [0073]).
Claim 12: Beauchamp and Shi teach the system of claim 11, wherein the control circuitry is further configured to cause the adjusted image to be uploaded to the sharing platform based on:
determining that the first similarity score of the intended search term for the adjusted image is greater than the first similarity score of the intended search term for the image; and determining that the second similarity score of the non-intended search term for the adjusted image is less than the second similarity score of the non-intended search term for the image (Beauchamp, [0046] note Training an ML model generally involves inputting into an ML model (e.g., an untrained ML model) training data to be processed by the ML model, processing the training data using the ML model, collecting the output generated by the ML model (e.g., based on the inputted training data), and comparing the output to a desired set of target values… parameters of the ML model are updated based on a difference between the generated output value and the desired target value. For example, if the value outputted by the ML model is excessively high, the parameters may be adjusted so as to lower the output value in future training iterations. An objective function is a way to quantitatively represent how close the output value is to the target value. An objective function represents a quantity (or one or more quantities) to be optimized (e.g., minimize a loss or maximize a reward) in order to bring the output value as close to the target value as possible. The goal of training the ML model typically is to minimize a loss function or maximize a reward function).
Claim 13: Beauchamp and Shi teach the system of claim 11, wherein the control circuitry is further configured to:
determine a plurality of first keywords indicated as intended search terms for the image; and determine a plurality of second keywords indicated as non-intended search terms for the image, wherein the loss function is further configured to (Beauchamp, [Fig. 6], [0051] note CNN 10 includes a plurality of layers that process the image 12 in order to generate an output, such as a predicted classification or predicted label for the image 12, [0052] note a fully connected layer 18 processes the set of feature maps 16 in order to perform a classification of the image, based on the features encoded in the set of feature maps 16. The fully connected layer 18 contains learned parameters that, when applied to the set of feature maps 16, outputs a set of probabilities representing the likelihood that the image 12 belongs to each of a defined set of possible classes):
reward adjustments that result in an increase in the respective similarity scores corresponding to any of the intended search terms; and reward adjustments that result in a decrease in the respective similarity scores corresponding to any of the non-intended search terms (Beauchamp, [0046] note Training an ML model generally involves inputting into an ML model (e.g., an untrained ML model) training data to be processed by the ML model, processing the training data using the ML model, collecting the output generated by the ML model (e.g., based on the inputted training data), and comparing the output to a desired set of target values… parameters of the ML model are updated based on a difference between the generated output value and the desired target value. For example, if the value outputted by the ML model is excessively high, the parameters may be adjusted so as to lower the output value in future training iterations. An objective function is a way to quantitatively represent how close the output value is to the target value. An objective function represents a quantity (or one or more quantities) to be optimized (e.g., minimize a loss or maximize a reward) in order to bring the output value as close to the target value as possible. The goal of training the ML model typically is to minimize a loss function or maximize a reward function).
Claims 4-6, 10, 14-16 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Beauchamp and Shi in further view of Shen et al., US 11,069,030 B2 (hereinafter “Shen”).
Claim 4: Beauchamp and Shi do not explicitly teach the method of claim 1, further comprising determining a segmentation mask for the image, wherein the generative model is configured to iteratively adjust the image based on the segmentation mask, wherein: adjustments to a first portion of the image covered by the segmentation mask are prioritized over adjustments to a second portion of the image not covered by the segmentation mask.
However, Shen teaches this (Shen, [Fig. 6], [Col. 19 Lines 17-23] note Input 614 can be fed into aesthetic enhancement neural network 606. Such an input can include, for example, an image and a corresponding segmentation map. In embodiments, upon an indication that an image is selected to be fed into aesthetic enhancement neural network, a corresponding segmentation map can be generated and/or obtained for the image, [Col. 15 Lines 64-67]-[Col. 16 Line 1] note a segmentation map along with an input image is advantageous because the segmentation map ensures that the network learns to make adaptive adjustments dependent on the categorization/content of an image (e.g., sky is not green, skin is not purple, etc.)).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the vector based search of Beauchamp and Shi with the aesthetics-guided image enhancement of Shen according to known methods (i.e. enhancing an image based on a corresponding segmentation map). Motivation for doing so is that this enhances an image while minimizing content loss (Shen, [Col. 11 Lines 48-67]).
Claim 5: Beauchamp, Shi and Shen teach the method of claim 4, wherein determining the segmentation mask for the image comprises automatically determining the segmentation mask based on the first keyword (Shen, [Col. 15 Lines 54-62] note a selected image can generally be an image input into a fully trained aesthetic enhancement neural network. the segmentation map corresponds to the image fed into the aesthetic enhancement neural network. A segmentation map generally refers to a parsed image where image content is mapped for each image pixel (e.g., water, sky, building, etc.). Parsing of an image to generate a segmentation map can be performed using, for example, a pyramid parsing network).
Claim 6: Beauchamp, Shi and Shen teach the method of claim 4, wherein determining the segmentation mask for the image comprises:
receiving input via a user interface of a selected portion of the image; and determining the segmentation mask for the image based on the selected portion of the image (Shen, [Col. 6 lines 3-17] note a user can select or input an image via a graphical user interface (GUI)… An image can be selected or input in any manner. For example, a user may take a picture using a camera function on a device. As another example, a user may select a desired image from a repository, for example, stored in a data store accessible by a network or stored locally at the user device 102a. Based on the input image, an enhanced image can be generated and provided to the user via the user device 102a. In this regard, the enhanced image can be displayed via a display screen of the user device. Such an enhanced image can be further manipulated and or edited by a user via a GUI on a user device).
Claim 10: Beauchamp and Shi do not explicitly teach the method of claim 1, wherein the generative model is configured to iteratively adjust the image by changing the color of one or more pixels of the image.
However, Shen teaches this (Shen, [Col. 2 Lines 45-47] note increasing the aesthetics of an image often requires modifying varying aspects of the image (e.g., adjusting lighting, color, focus, composition, etc.), [Col. 12 Lines 3-16] note Euclidean loss can be determined by comparing the original red/green/blue (“RGB”) input image and the generated enhanced image, using the original RGB input image as ground-truth to calculate pixel-wise loss between the two images. As the aesthetic enhancement neural network is being trained to enhance images, the pixels of the enhanced image should remain relatively close to the RGB colors of the input image (e.g., a dark red pixel of the input image is enhanced to a slightly brighter red pixel in the generated enhanced image). However, if only Euclidean loss is minimized, the aesthetic enhancement neural network will be trained to generate images identical to the input images. As such, perceptual loss can also be evaluated to prevent this outcome).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the image enhancement of Beauchamp and Shi with the aesthetics-guided image enhancement of Shen according to known methods (i.e. enhancing the RGB colors of an input image). Motivation for doing so is that this enhances the aesthetics of an image (Shen, [Col. 3 Lines 20-21]).
Claim 14: Beauchamp and Shi do not explicitly teach the system of claim 11, wherein the control circuitry is further configured to determine a segmentation mask for the image, wherein the generative model is configured to iteratively adjust the image based on the segmentation mask, wherein: adjustments to a first portion of the image covered by the segmentation mask are prioritized over adjustments to a second portion of the image not covered by the segmentation mask.
However, Shen teaches this (Shen, [Fig. 6], [Col. 19 Lines 17-23] note Input 614 can be fed into aesthetic enhancement neural network 606. Such an input can include, for example, an image and a corresponding segmentation map. In embodiments, upon an indication that an image is selected to be fed into aesthetic enhancement neural network, a corresponding segmentation map can be generated and/or obtained for the image, [Col. 15 Lines 64-67]-[Col. 16 Line 1] note a segmentation map along with an input image is advantageous because the segmentation map ensures that the network learns to make adaptive adjustments dependent on the categorization/content of an image (e.g., sky is not green, skin is not purple, etc.)).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the vector-based search of Beauchamp and Shi with the aesthetics-guided image enhancement of Shen according to known methods (i.e. enhancing an image based on a corresponding segmentation map). Motivation for doing so is that this enhances an image while minimizing content loss (Shen, [Col. 11 Lines 48-67]).
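The prioritization recited in claim 14 can be illustrated schematically as follows. This is a hypothetical sketch, not taken from Beauchamp, Shi, or Shen: adjustments inside the segmentation mask are applied at full weight, while adjustments outside the mask are applied at a reduced weight.

```python
import numpy as np

def masked_adjustment(image, adjustment, mask, outside_weight=0.25):
    # Apply the adjustment at full weight inside the segmentation mask
    # and at a reduced weight outside it, so that masked regions are
    # prioritized without freezing the rest of the image.
    weight = np.where(mask, 1.0, outside_weight)
    return image + adjustment * weight[..., np.newaxis]
```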
Claim 15: Beauchamp, Shi and Shen teach the system of claim 14, wherein the control circuitry is further configured to determine the segmentation mask for the image by automatically determining the segmentation mask based on the first keyword (Shen, [Col. 15 Lines 54-62] note a selected image can generally be an image input into a fully trained aesthetic enhancement neural network. The segmentation map corresponds to the image fed into the aesthetic enhancement neural network. A segmentation map generally refers to a parsed image where image content is mapped for each image pixel (e.g., water, sky, building, etc.). Parsing of an image to generate a segmentation map can be performed using, for example, a pyramid parsing network)).
Claim 16: Beauchamp, Shi and Shen teach the system of claim 14, wherein the control circuitry is further configured to determine the segmentation mask for the image by:
receiving input via a user interface of a selected portion of the image; and determining the segmentation mask for the image based on the selected portion of the image (Shen, [Col. 6 lines 3-17] note a user can select or input an image via a graphical user interface (GUI)… An image can be selected or input in any manner. For example, a user may take a picture using a camera function on a device. As another example, a user may select a desired image from a repository, for example, stored in a data store accessible by a network or stored locally at the user device 102a. Based on the input image, an enhanced image can be generated and provided to the user via the user device 102a. In this regard, the enhanced image can be displayed via a display screen of the user device. Such an enhanced image can be further manipulated and/or edited by a user via a GUI on a user device).
Claim 20: Beauchamp and Shi do not explicitly teach the system of claim 11, wherein the generative model is configured to iteratively adjust the image by changing the color of one or more pixels of the image.
However, Shen teaches this (Shen, [Col. 2 Lines 45-47] note increasing the aesthetics of an image often requires modifying varying aspects of the image (e.g., adjusting lighting, color, focus, composition, etc.), [Col. 12 Lines 3-16] note Euclidean loss can be determined by comparing the original red/green/blue (“RGB”) input image and the generated enhanced image, using the original RGB input image as ground-truth to calculate pixel-wise loss between the two images. As the aesthetic enhancement neural network is being trained to enhance images, the pixels of the enhanced image should remain relatively close to the RGB colors of the input image (e.g., a dark red pixel of the input image is enhanced to a slightly brighter red pixel in the generated enhanced image). However, if only Euclidean loss is minimized, the aesthetic enhancement neural network will be trained to generate images identical to the input images. As such, perceptual loss can also be evaluated to prevent this outcome).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the enhancement images of Beauchamp and Shi with the aesthetics-guided image enhancement of Shen according to known methods (i.e. enhancing the RGB colors of an input image). Motivation for doing so is that this enhances the aesthetics of an image (Shen, [Col. 3 Lines 20-21]).
Claims 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Beauchamp and Shi in further view of Ferrer et al., US 2019/0197670 A1 (hereinafter “Ferrer”).
Claim 7: Beauchamp and Shi teach the method of claim 1, further comprising:
determining a perceptual loss; and causing the adjusted image to be uploaded to the sharing platform (Shi, [0038] note a loss function based on the Euclidean distance between feature maps extracted from the VGG19 network can be used to obtain perceptually superior results for both super-resolution and artistic style-transfer; i.e. minimizing perceptual loss, [0073] note The techniques described herein have a wide variety of applications in which increasing the resolution of a visual image would be helpful. For example, the resolution of still, or video, images can be enhanced, where the images are uploaded to a social media site).
Beauchamp and Shi do not explicitly teach a perceptual loss threshold; and based on determining that the perceptual loss of the adjusted image compared to the image is less than the perceptual loss threshold.
However, Ferrer teaches this (Ferrer, [0033] note an example method 600 for training the Generator. In particular embodiments, the Generator may be trained in the manner shown prior to being used to generate in-painted images for training the Discriminator, [0034] note At step 680, the system may determine whether the training is complete based on one or more termination criteria. For example, if the perceptual loss is below a predetermined threshold and/or if its changes over the last several iterations have stabilized (e.g., fluctuated within a predetermined range), then the system may determine that training is complete).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the perceptual loss of Beauchamp and Shi with the perceptual loss threshold of Ferrer according to known methods (i.e. enhancing an image based on a perceptual loss threshold). Motivation for doing so is that this would minimize the perceptual loss of an image (Ferrer, [0027]).
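The termination criterion Ferrer describes at [0034], in which training completes once the perceptual loss falls below a predetermined threshold and its recent changes have stabilized, can be sketched as follows. This is an illustrative sketch only; the function name and parameters are hypothetical and not drawn from Ferrer.

```python
def training_complete(loss_history, threshold, window=5, tolerance=0.01):
    # Training terminates once the most recent perceptual loss is below
    # the predetermined threshold and the last `window` losses have
    # fluctuated within a predetermined range (i.e. stabilized).
    if len(loss_history) < window:
        return False
    recent = loss_history[-window:]
    below_threshold = recent[-1] < threshold
    stabilized = max(recent) - min(recent) < tolerance
    return below_threshold and stabilized
```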
Claim 17: Beauchamp and Shi teach the system of claim 11, wherein the control circuitry is further configured to:
determine a perceptual loss; and cause the adjusted image to be uploaded to the sharing platform (Shi, [0038] note a loss function based on the Euclidean distance between feature maps extracted from the VGG19 network can be used to obtain perceptually superior results for both super-resolution and artistic style-transfer; i.e. minimizing perceptual loss, [0073] note The techniques described herein have a wide variety of applications in which increasing the resolution of a visual image would be helpful. For example, the resolution of still, or video, images can be enhanced, where the images are uploaded to a social media site).
Beauchamp and Shi do not explicitly teach a perceptual loss threshold; and based on determining that the perceptual loss of the adjusted image compared to the image is less than the perceptual loss threshold.
However, Ferrer teaches this (Ferrer, [0033] note an example method 600 for training the Generator. In particular embodiments, the Generator may be trained in the manner shown prior to being used to generate in-painted images for training the Discriminator, [0034] note At step 680, the system may determine whether the training is complete based on one or more termination criteria. For example, if the perceptual loss is below a predetermined threshold and/or if its changes over the last several iterations have stabilized (e.g., fluctuated within a predetermined range), then the system may determine that training is complete).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the perceptual loss of Beauchamp and Shi with the perceptual loss threshold of Ferrer according to known methods (i.e. enhancing an image based on a perceptual loss threshold). Motivation for doing so is that this would minimize the perceptual loss of an image (Ferrer, [0027]).
Claims 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Beauchamp and Shi in further view of Epstein et al., US 2016/0179847 A1.
Claim 8: Beauchamp and Shi do not explicitly teach the method of claim 1, further comprising: presenting, via a user interface, the image and the first keyword indicated as the intended search term for the image; identifying, based on the image and the first keyword, a plurality of candidate second keywords; receiving, via the user interface, a selected candidate second keyword of the plurality of candidate second keywords; and identifying, as the second keyword indicated as the non-intended search term for the image, the selected candidate second keyword.
However, Epstein teaches this (Epstein, [Fig. 4A]-[Fig. 4L], [0115] note FIG. 4D shows a detailed view UI 400d associated with the image 402c in FIG. 4C. By way of example, the UI 400c transitions to the UI 400d by the user selecting the image 402c in the UI 400c… User interface element 410a lists the tags that are associated with the image 402c. The user interface element 410a allows a user to directly see the tags that are associated with the image 402c, [0116] note FIG. 4E shows a UI 400e that includes a recommendation for a source based on the currently presented image 402d, which is presented based on the computer-implemented method or algorithm 200a. For example, the user may have indicated a negative preference or dislike in response to being presented the image 402c).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the enhancement images of Beauchamp and Shi with the image search algorithm informed by user feedback of Epstein according to known methods (i.e. determining negative and positive tags based on user feedback). Motivation for doing so is that, by processing tags differently based on the user's input, the image search algorithm can operate very quickly to locate the next image. Its simplicity saves significant computing, memory, and bandwidth resources yet results in a highly accurate search methodology that very quickly iterates to an image having content that represents something the user desires or wants at that given instant (Epstein, [0007]).
Claim 18: Beauchamp and Shi do not explicitly teach the system of claim 11, wherein: the input/output circuitry is further configured to: present, via a user interface, the image and the first keyword indicated as the intended search term for the image; and the control circuitry is further configured to identify, based on the image and the first keyword, a plurality of candidate second keywords, wherein the input/output circuitry is further configured to: receive, via the user interface, a selected candidate second keyword of the plurality of candidate second keywords, and wherein the control circuitry is further configured to identify, as the second keyword indicated as the non-intended search term for the image, the selected candidate second keyword.
However, Epstein teaches this (Epstein, [Fig. 4A]-[Fig. 4L], [0115] note FIG. 4D shows a detailed view UI 400d associated with the image 402c in FIG. 4C. By way of example, the UI 400c transitions to the UI 400d by the user selecting the image 402c in the UI 400c… User interface element 410a lists the tags that are associated with the image 402c. The user interface element 410a allows a user to directly see the tags that are associated with the image 402c, [0116] note FIG. 4E shows a UI 400e that includes a recommendation for a source based on the currently presented image 402d, which is presented based on the computer-implemented method or algorithm 200a. For example, the user may have indicated a negative preference or dislike in response to being presented the image 402c).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the enhancement images of Beauchamp and Shi with the image search algorithm informed by user feedback of Epstein according to known methods (i.e. determining negative and positive tags based on user feedback). Motivation for doing so is that, by processing tags differently based on the user's input, the image search algorithm can operate very quickly to locate the next image. Its simplicity saves significant computing, memory, and bandwidth resources yet results in a highly accurate search methodology that very quickly iterates to an image having content that represents something the user desires or wants at that given instant (Epstein, [0007]).
Claims 9 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Beauchamp and Shi in further view of Bedi et al., US 2016/0239944 A1 (hereinafter “Bedi”).
Claim 9: Beauchamp and Shi do not explicitly teach the method of claim 1, further comprising: presenting, via a user interface, the image and the adjusted image; presenting a prompt via the user interface for confirmation of the adjusted image; and based on receiving confirmation of the adjusted image via the user interface, causing the adjusted image to be uploaded to the sharing platform.
However, Bedi teaches this (Bedi, [Fig. 2] note 204, 226, [0035] note the image service 110 can receive input data 202, such as a low resolution image 204, [0036] note convert the input data 202, such as the low resolution image 204, into output data 224 that includes a high resolution image 226, [0066] note In at least some implementations, the user can be prompted to accept or cancel the image resolution enhancement of the input image. In one example, if the input image is only partially enhanced based on insufficient data from the related images to enhance the image resolution of the entire input image, the user can be provided a preview of the partially enhanced input image and a request to accept or cancel the partial enhancement, [0031] note one or more of the images 112 can be uploaded to the storage 114, [0036] note the storage 114 of the service provider 102 such as a cloud-based repository (e.g., Facebook™, Instagram™, and so on)).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the enhancement images of Beauchamp and Shi with the prompt to accept an image resolution enhancement of Bedi according to known methods (i.e. prompting a user to accept or cancel an image resolution enhancement of an input image). Motivation for doing so is that this avoids sacrificing image quality (Bedi, [0001]).
Claim 19: Beauchamp and Shi do not explicitly teach the system of claim 11, wherein: the input/output circuitry is further configured to: present, via a user interface, the image and the adjusted image; and present a prompt via the user interface for confirmation of the adjusted image; and the control circuitry is further configured to: based on receiving confirmation of the adjusted image via the user interface, cause the adjusted image to be uploaded to the sharing platform.
However, Bedi teaches this (Bedi, [Fig. 2] note 204, 226, [0035] note the image service 110 can receive input data 202, such as a low resolution image 204, [0036] note convert the input data 202, such as the low resolution image 204, into output data 224 that includes a high resolution image 226, [0066] note In at least some implementations, the user can be prompted to accept or cancel the image resolution enhancement of the input image. In one example, if the input image is only partially enhanced based on insufficient data from the related images to enhance the image resolution of the entire input image, the user can be provided a preview of the partially enhanced input image and a request to accept or cancel the partial enhancement, [0031] note one or more of the images 112 can be uploaded to the storage 114, [0036] note the storage 114 of the service provider 102 such as a cloud-based repository (e.g., Facebook™, Instagram™, and so on)).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the enhancement images of Beauchamp and Shi with the prompt to accept an image resolution enhancement of Bedi according to known methods (i.e. prompting a user to accept or cancel an image resolution enhancement of an input image). Motivation for doing so is that this avoids sacrificing image quality (Bedi, [0001]).
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier comm