Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over SMETANIN et al. (2024/0412433) in view of ISOLA et al. (Image-to-Image Translation with Conditional Adversarial Networks) and ZHANG et al. (Adding Conditional Control to Text-to-Image Diffusion Models).
As per claim 1, Smetanin teaches the claimed “method for processing comments,” comprising: “displaying a plurality of image-generating entries on a comment editing panel of a comment object, wherein the plurality of image-generating entries is configured to generate comment images for commenting on the comment object by triggering a plurality of image-generating networks” (Smetanin, [0022] - Generative AI enables users to transform their raw media into unique and captivating forms, fostering a culture of creativity and innovation. Through the application of sophisticated algorithms, users can easily apply filters, effects, and other artistic enhancements to their media content, resulting in visually stunning and engaging creations; [0095] - Consistent with some examples, the stylization pipeline leverages a machine learning model, such as a neural network implementing a process consistent with a latent diffusion technique. The model is trained or fine-tuned as follows. First, the pre-trained neural network model is executed in text-to-image mode, and several text prompts are provided, sequentially, as input, to generate a reference image for each text-based prompt. For instance, each text-based prompt may instruct the model to generate an image having one of several specific stylization effects; [0099] - The user interface element with reference number 602 is an example of a carousel, where each element in the carouse (e.g., each circular icon or graphic) represents an individual special effect that might be selected and applied to the real-time image. As the end-user scrolls, by swiping left or right, through the elements presented in the carousel 602, the end-user eventually selects a particular element 604 or graphic that is associated with a special effect that involves generating a stylized image using a generative neural network technique, such as latent diffusion, as implemented with a neural network model. When the end-user selects the specific special effect associated with the icon 604, a user interface element (e.g., the button 606) appears above the icon 602 representing the selected special effect. Consistent with some embodiments, the image presented within the circular area of the button 606 depicts an example of the stylization effect that will be applied to an input image, when the end-user captures the input image). It is noted that Smetanin does not explicitly teach “wherein each image-generating entry of the plurality of image-generating entries corresponds to an image-generating network of the plurality of image-generating networks, and wherein each image-generating entry is configured to generate a comment image by triggering the image-generating network corresponding to the each image-generating entry” as claimed; however, Isola’s neural network models, in which each model generates a specific corresponding image using a generative neural network technique (Isola, Figure 1 - Many problems in image processing, graphics, and vision involve translating an input image into a corresponding output image. These problems are often treated with application-specific algorithms, even though the setting is always the same: map pixels to pixels. Conditional adversarial nets are a general-purpose solution that appears to work well on a wide variety of these problems. Here we show results of the method on several. 
In each case we use the same architecture and objective, and simply train on different data) suggest “the plurality of image-generating networks,” in which each network is “an image-generating network” that generates “a comment image” when selected. Thus, it would have been obvious, in view of Isola, to configure Smetanin’s method as claimed by providing a plurality of conditional adversarial nets (e.g., Isola, Figure 1) so that a specific network may be selected to generate a comment image. The motivation is to allow a user to select one of a plurality of neural networks for generating a desired image corresponding to the selected network.
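For context only, the following is a minimal Python sketch (hypothetical names; not taken from Smetanin, Isola, or Zhang) of the arrangement the combination is relied upon to suggest: a plurality of image-generating entries, each bound to its own image-generating network, where triggering an entry invokes only the network corresponding to that entry.

    # Minimal sketch: each entry on the comment editing panel is bound to its own
    # image-generating network; triggering an entry invokes only that network.
    # All names here are hypothetical and not drawn from the cited references.
    from typing import Callable, Dict, List

    Image = bytes  # placeholder for an encoded comment image

    class ImageGeneratingEntry:
        def __init__(self, label: str, network: Callable[[str], Image]):
            self.label = label
            self._network = network  # the corresponding image-generating network

        def trigger(self, description: str) -> Image:
            # Generate a comment image by triggering the entry's own network.
            return self._network(description)

    def build_panel(networks: Dict[str, Callable[[str], Image]]) -> List[ImageGeneratingEntry]:
        # One entry per network, displayed together on the comment editing panel.
        return [ImageGeneratingEntry(name, net) for name, net in networks.items()]

    # Stand-in "networks" for illustration; a real system would call generative models.
    panel = build_panel({
        "text-based": lambda text: ("image-from-text:" + text).encode(),
        "image-based": lambda ref: ("image-from-image:" + ref).encode(),
    })
    comment_image = panel[0].trigger("a cat in watercolor")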
Claim 2 adds into claim 1 “wherein the plurality of image-generating entries comprises at least two of a text-based image-generating entry, an image-based image-generating entry, a graffiti-based image-generating entry, or an image-and-text-based image-generating entry” (Isola, Figure 1 - an image-based image-generating entry (e.g., Aerial-to-map, Day-to-night, BW to color), a graffiti-based image-generating entry (e.g., Edges to photo)). Thus, it would have been obvious, in view of Isola, to configure Smetanin’s method as claimed by providing at least two conditional adversarial nets (e.g., Isola, Figure 1) so that a specific network may be selected to generate a comment image. The motivation is to allow a user to select one of a plurality of neural networks for generating a desired image corresponding to the selected network.
Claim 3 adds into claim 1 “wherein displaying the plurality of image-generating entries on the comment editing panel of the comment object comprises: displaying the comment editing panel provided with the plurality of image-generating entries in response to a comment triggering instruction for the comment object” which would have been obvious, given the availability of the plurality of image-generating entries, to allow a user to select a specific image-generating entry by presenting the selection options on the user interface (analogous to Smetanin’s selection of one of the multiple special effects (Figure 6, [0099] - The user interface element with reference number 602 is an example of a carousel, where each element in the carouse (e.g., each circular icon or graphic) represents an individual special effect that might be selected and applied to the real-time image. As the end-user scrolls, by swiping left or right, through the elements presented in the carousel 602, the end-user eventually selects a particular element 604 or graphic that is associated with a special effect that involves generating a stylized image using a generative neural network technique, such as latent diffusion, as implemented with a neural network model. When the end-user selects the specific special effect associated with the icon 604, a user interface element (e.g., the button 606) appears above the icon 602 representing the selected special effect. Consistent with some embodiments, the image presented within the circular area of the button 606 depicts an example of the stylization effect that will be applied to an input image, when the end-user captures the input image)).
Claim 4 adds into claim 1 “displaying an initial comment editing panel in response to a comment triggering instruction for the comment object, wherein the initial comment editing panel is provided with entry triggering information; and displaying the plurality of image-generating entries on the comment editing panel of the comment object comprising: updating the initial comment editing panel to the comment editing panel provided with the plurality of image-generating entries in response to an entry displaying instruction triggered by the entry triggering information” which would have been obvious, given the availability of the plurality of image-generating entries, to allow a user to select a specific image-generating entry by presenting the selection options on the user interface (analogous to Smetanin’s selection of one of the multiple special effects (Figure 6, [0099] - The user interface element with reference number 602 is an example of a carousel, where each element in the carouse (e.g., each circular icon or graphic) represents an individual special effect that might be selected and applied to the real-time image. As the end-user scrolls, by swiping left or right, through the elements presented in the carousel 602, the end-user eventually selects a particular element 604 or graphic that is associated with a special effect that involves generating a stylized image using a generative neural network technique, such as latent diffusion, as implemented with a neural network model. When the end-user selects the specific special effect associated with the icon 604, a user interface element (e.g., the button 606) appears above the icon 602 representing the selected special effect. Consistent with some embodiments, the image presented within the circular area of the button 606 depicts an example of the stylization effect that will be applied to an input image, when the end-user captures the input image)).
Claim 5 adds into claim 1 “displaying a target comment image on the comment editing panel in response to generating the target comment image by triggering a target image-generating entry” (Smetanin, Figure 6 – image 612), “wherein the target image-generating entry is any one of the plurality of image-generating entries” (Smetanin, the neural network 230 generates a stylized image), and “the target comment image is generated based on an image-generating network corresponding to the target image-generating entry” (Smetanin, Figure 6 – image 612); and “displaying, in response to a comment posting instruction, comment information comprising the target comment image on a comment displaying region of the comment object” (Smetanin, Figure 6, [0099] - The user interface element with reference number 602 is an example of a carousel, where each element in the carouse (e.g., each circular icon or graphic) represents an individual special effect that might be selected and applied to the real-time image. As the end-user scrolls, by swiping left or right, through the elements presented in the carousel 602, the end-user eventually selects a particular element 604 or graphic that is associated with a special effect that involves generating a stylized image using a generative neural network technique, such as latent diffusion, as implemented with a neural network model. When the end-user selects the specific special effect associated with the icon 604, a user interface element (e.g., the button 606) appears above the icon 602 representing the selected special effect. Consistent with some embodiments, the image presented within the circular area of the button 606 depicts an example of the stylization effect that will be applied to an input image, when the end-user captures the input image).
Claim 6 adds into claim 5 “wherein displaying the target comment image on the comment editing panel in response to generating the target comment image by triggering the target image-generating entry” (Smetanin, Figure 6) comprises: “displaying operation information in response to triggering an image-generating triggering instruction based on the target image-generating entry” (Smetanin, the neural network 230 generates a stylized image); “in response to acquiring image description information by editing based on the operation information and triggering an image-generating confirming instruction, displaying at least one candidate image, wherein the at least one candidate image is generated based on the image description information and the image-generating network corresponding to the target image-generating entry; and displaying the target comment image on the comment editing panel in response to an image confirming instruction for the target comment image, wherein the target comment image is any image from the at least one candidate image” (Smetanin, Figure 6 - Frames of “transition” animation).
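As a purely illustrative sketch of the claim 6 flow mapped above (hypothetical names; not taken from the cited references), the steps are: trigger an entry, edit image description information, generate at least one candidate image with the entry's corresponding network, and confirm one candidate as the target comment image.

    # Hypothetical sketch of the claimed flow: trigger entry -> edit description ->
    # generate candidate images -> confirm one -> return it for display.
    from typing import Callable, List

    def comment_image_flow(network: Callable[[str], List[bytes]],
                           description: str,
                           confirm: Callable[[List[bytes]], int]) -> bytes:
        candidates = network(description)   # at least one candidate image
        chosen = confirm(candidates)        # image confirming instruction (an index)
        return candidates[chosen]           # target comment image to display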
Claim 7 adds into claim 6 “in response to the target image-generating entry being a text-based image-generating entry, the image description information is text information as edited; and in response to the image-generating network corresponding to the target image-generating entry being a text-based image-generating network, the at least one candidate image is generated based on the text information and the text-based image-generating network” (Smetanin, [0018] - The neural network model may be pre-trained to generate different stylized effects, based on a text prompt. This is achieved through a technique called conditional or text-guided generation; Smetanin, [0098] - The reference images, organized by individual style, are then used along with the image data derived through the pre-processing techniques, to fine-tune the pre-trained model, so that at inference time, given an input image and a text prompt, and optionally image characteristics determined during the pre-processing of the image, the fined-tuned model will generate an output image having specific stylization characteristics or effects).
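For illustration of text-guided generation of the kind described in Smetanin at [0018] and [0098], the following sketch invokes a text-to-image latent diffusion pipeline from the Hugging Face diffusers library; the model identifier, prompt, and file name are assumptions for illustration only and are not taken from the reference.

    # Illustrative only: edited text information drives a text-based
    # image-generating network (text-to-image latent diffusion).
    import torch
    from diffusers import StableDiffusionPipeline  # assumes diffusers is installed

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # assumed model id, for illustration
        torch_dtype=torch.float16,
    ).to("cuda")

    text_information = "a cat in watercolor style"  # text information as edited
    candidate = pipe(text_information, num_inference_steps=30).images[0]
    candidate.save("candidate.png")  # one of the at least one candidate images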
Claim 8 adds into claim 6 “in response to the target image-generating entry being an image-based image-generating entry, the image description information is a reference image as selected” (Smetanin, [0106] - Upon detecting that the specific special effect has been selected, a button 806 is presented, in addition to a user interface element 808 depicting a selection of images stored on the camera roll of the end-user); and “in response to the image-generating network corresponding to the target image-generating entry being an image-based image-generating network, the at least one candidate image is generated based on the reference image and the image-based image-generating network” (Smetanin, [0015] - select a photo or video to be stylized with a particular special effect, generate a stylized version of the photo or video based on the selected effect, and then save or share the resulting stylized photo or video).
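For illustration of image-based generation of the kind described in Smetanin at [0015] and [0106], where a selected reference image drives the stylization, the following sketch uses an image-to-image diffusion pipeline; the model identifier, file names, prompt, and strength value are assumptions for illustration only.

    # Illustrative only: a reference image selected by the user conditions an
    # image-based image-generating network (image-to-image latent diffusion).
    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # assumed model id
        torch_dtype=torch.float16,
    ).to("cuda")

    reference = Image.open("reference.png").convert("RGB").resize((512, 512))
    candidate = pipe(prompt="stylized version of the reference image",
                     image=reference, strength=0.6).images[0]
    candidate.save("candidate.png")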
Claim 9 adds into claim 6 “in response to the target image-generating entry being a graffiti-based image-generating entry, the image description information is a graffiti image as edited; and in response to the image-generating network corresponding to the target image-generating entry being a graffiti-based image-generating network, the at least one candidate image is generated based on the graffiti image and the graffiti-based image-generating network” (Isola, Figure 1 – Edges to photo). Thus, it would have been obvious, in view of Isola, to configure Smetanin’s method as claimed by providing a graffiti-based image-generating network (e.g., Isola, Figure 1) so that a specific network may be selected to generate an image from a drawn (graffiti) input. The motivation is to allow a user to select one of a plurality of neural networks for generating a desired image corresponding to the selected network and the input (e.g., the graffiti).
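For illustration of a graffiti-based (drawn-input) image-generating network of the kind suggested by Isola's edges-to-photo example, the following sketch conditions a text-to-image diffusion model on a user-drawn scribble through a ControlNet, in the general spirit of the conditional-control technique named in the Zhang reference; the model identifiers, prompt, and file names are assumptions for illustration only.

    # Illustrative only: a user-drawn graffiti/scribble image conditions a
    # text-to-image diffusion model through a ControlNet (conditional control).
    import torch
    from PIL import Image
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-scribble",  # assumed scribble-conditioned ControlNet
        torch_dtype=torch.float16,
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # assumed base model id
        controlnet=controlnet,
        torch_dtype=torch.float16,
    ).to("cuda")

    graffiti = Image.open("user_scribble.png").convert("RGB")  # graffiti image as edited
    candidate = pipe("a photo matching the scribble", image=graffiti).images[0]
    candidate.save("candidate.png")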
Claim 10 adds into claim 6 “in response to the target image-generating entry being an image-and-text-based image-generating entry, the image description information is image-and-text information as edited; and in response to the image-generating network corresponding to the target image-generating entry being an image-and-text-based image-generating network, the at least one candidate image is generated based on the image-and-text information and the image-and-text-based image-generating network” (Smetanin, [0098] - The reference images, organized by individual style, are then used along with the image data derived through the pre-processing techniques, to fine-tune the pre-trained model, so that at inference time, given an input image and a text prompt, and optionally image characteristics determined during the pre-processing of the image, the fined-tuned model will generate an output image having specific stylization characteristics or effects).
Claim 11 adds into claim 6 “wherein the at least one candidate image comprises candidate images of a plurality of first styles, and wherein displaying the at least one candidate image comprises: displaying a first candidate image and a first style switch control corresponding to each of the plurality of first styles, wherein the first candidate image is a candidate image of a first target style of the plurality of first styles, and any first style switch control is configured to switch a currently displayed candidate image to the candidate image of the first style corresponding to the first style switch control” (Smetanin, [0098] - The reference images, organized by individual style, are then used along with the image data derived through the pre-processing techniques, to fine-tune the pre-trained model, so that at inference time, given an input image and a text prompt, and optionally image characteristics determined during the pre-processing of the image, the fined-tuned model will generate an output image having specific stylization characteristics or effects).
Claim 12 adds into claim 6 “wherein the at least one candidate image comprises a second candidate image of a second target style of a plurality of second styles, and wherein displaying the at least one candidate image comprises: displaying the second candidate image and at least one of a displaying style adjust control, a first image update control, or a second style switch control corresponding to each of the plurality of second styles, wherein the style adjust control is configured to adjust an intensity of an image style of a currently displayed candidate image, wherein the first image update control is configured to update the currently displayed candidate image to an updated candidate image, and an image style of the updated candidate image is consistent with an image style of the currently displayed candidate image, and wherein the corresponding second style switch control is configured to switch the currently displayed candidate image to the candidate image of the second style corresponding to the second style switch control” (Smetanin, [0098] - The reference images, organized by individual style, are then used along with the image data derived through the pre-processing techniques, to fine-tune the pre-trained model, so that at inference time, given an input image and a text prompt, and optionally image characteristics determined during the pre-processing of the image, the fined-tuned model will generate an output image having specific stylization characteristics or effects) (the style switch is performed by Smetanin’s text prompt).
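One plausible, purely hypothetical reading of the claimed style adjust control (offered for illustration only and not asserted to be taught by the references) maps a user-facing slider value onto the strength parameter of an image-to-image diffusion call, so that a larger value applies the selected style more strongly.

    # Hypothetical sketch only: a UI slider value in [0, 1] is forwarded as the
    # img2img "strength" so the intensity of the applied style can be adjusted.
    from typing import Any

    def regenerate_with_intensity(pipe: Any, reference: Any, prompt: str,
                                  slider_value: float):
        strength = min(max(slider_value, 0.0), 1.0)  # clamp to the valid range
        return pipe(prompt=prompt, image=reference, strength=strength).images[0]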
Claim 13 adds into claim 12 “wherein the second style switch control corresponding to a second non-target style is further configured to, in response to being triggered for a first time, generate a candidate image of the second non-target style, and wherein the second non-target style is any one of the plurality of second styles is different from the second target style” (Smetanin, [0098] - The reference images, organized by individual style, are then used along with the image data derived through the pre-processing techniques, to fine-tune the pre-trained model, so that at inference time, given an input image and a text prompt, and optionally image characteristics determined during the pre-processing of the image, the fined-tuned model will generate an output image having specific stylization characteristics or effects) (Any target or non-target style is defined by the text prompt and the selected reference image).
Claim 14 adds into claim 6 “wherein the at least one candidate image comprises a plurality of third candidate images of a third target style of a plurality of third styles, and wherein displaying the at least one candidate image comprises: displaying the plurality of third candidate images and a third style switch control corresponding to each of the plurality of third styles, wherein any third style switch control is configured to switch a plurality of currently displayed candidate images to the each of the plurality of candidate images of the third style using the corresponding to the third style switch control” (Smetanin, [0098] - The reference images, organized by individual style, are then used along with the image data derived through the pre-processing techniques, to fine-tune the pre-trained model, so that at inference time, given an input image and a text prompt, and optionally image characteristics determined during the pre-processing of the image, the fined-tuned model will generate an output image having specific stylization characteristics or effects) (Any target style is defined by the text prompt and performed by Smetanin’s text prompt; furthermore, the candidate images are shown in Smetanin’s Figure 6 – frames of the “transition” animation).
Claim 15 adds into claim 6 “wherein the third style switch control corresponding to a third non-target style is further configured to, in response to being triggered for a first time, generate a plurality of candidate images of the third non-target style, and wherein the third non-target style is one of the plurality of third styles is different from the third target style” (Smetanin, [0098] - The reference images, organized by individual style, are then used along with the image data derived through the pre-processing techniques, to fine-tune the pre-trained model, so that at inference time, given an input image and a text prompt, and optionally image characteristics determined during the pre-processing of the image, the fined-tuned model will generate an output image having specific stylization characteristics or effects) (Any target style is defined by the text prompt and performed by Smetanin’s text prompt; furthermore, the candidate images are shown in Smetanin’s Figure 6 – frames of the “transition” animation).
Claim 16 adds into claim 6 “wherein the at least one candidate image comprises candidate images of a plurality of fourth styles; and wherein displaying the at least one candidate image comprises: displaying the candidate images of the plurality of fourth styles and a second image update control, wherein the second image update control is configured to update currently displayed candidate images of the plurality of fourth styles” (Smetanin, [0098] - The reference images, organized by individual style, are then used along with the image data derived through the pre-processing techniques, to fine-tune the pre-trained model, so that at inference time, given an input image and a text prompt, and optionally image characteristics determined during the pre-processing of the image, the fined-tuned model will generate an output image having specific stylization characteristics or effects) (Any target style is defined by the text prompt and performed by Smetanin’s text prompt; furthermore, the candidate images are shown in Smetanin’s Figure 6 – frames of the “transition” animation).
Claims 17-19 and 20 recite an electronic device and a non-transitory computer-readable storage medium storing instructions corresponding to the methods of claims 1-16; therefore, they are rejected under a similar rationale.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PHU K NGUYEN whose telephone number is (571)272-7645. The examiner can normally be reached M-F 8-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel F. Hajnik can be reached at (571) 272-7642. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/PHU K NGUYEN/Primary Examiner, Art Unit 2616