DETAILED ACTION
Response to Arguments
The amendment filed 11 February 2026 has been entered in full. Accordingly, claims 1-20 are pending in the application.
Regarding the rejections under 35 U.S.C. 102(a)(1), the applicant has amended independent claims 1 and 20 to recite “subsequent to determining the image processing setting, receive a portion of the image data from an image sensor; and process the first portion of the image data based on the image processing setting”, while deleting earlier recitations of “with an image of” and “capture of the image”. The applicant argues that the prior art of record does not disclose or suggest these limitations. The examiner now relies on a reference cited by the applicant in the IDS filed 2 January 2024, Bahng et al. (Coloring with Words: Guiding Image Colorization Through Text-based Palette Generation, 2018, ECCV, Pages 1-17), hereinafter “Bahng”, in a new grounds of rejection as necessitated by the applicant’s amendment.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 1, 13, 14, and 18-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Manjunatha (Learning to Color from Language, 17 April 2018, arXiv, pages 1-6) in view of Bahng (Coloring with Words: Guiding Image Colorization Through Text-based Palette Generation, 2018, ECCV, Pages 1-17).
Claim 1 is met by the combination of Manjunatha and Bahng, wherein
Manjunatha discloses:
An apparatus for processing image data (See the Abstract.), the apparatus comprising:
at least one memory (See page 1, right column: “We present two neural architectures for language-based colorization that augment an existing fully-convolutional model (Zhang et al., 2016) with representations learned from image captions.” Presence of a processor and memory is understood.); and
at least one processor coupled to the at least one memory and configured to (See page 1, right column: “We present two neural architectures for language-based colorization that augment an existing fully-convolutional model (Zhang et al., 2016) with representations learned from image captions.” Presence of a processor and memory is understood.):
obtain at least one character associated with a scene (See any individual character only in the underlined portion of the caption input “An orange dog sitting on a blue couch” in Fig. 1 and “a green pickup truck next to trees” in Fig. 2. Also see page 2: “Next, we propose two ways to integrate additional text input into FCNN.”);
obtain additional data other than the at least one character, the additional data including context associated with the scene (See the underlined portion of the caption input “An orange dog sitting on a blue couch” in Fig. 1 and “a green pickup truck next to trees” in Fig. 2. These portions meet the claimed “context associated with the scene” since they specify a location or scene in which the image is captured.);
determine an image processing setting based on the at least one character and the additional data (See Fig. 4 and its caption on page 5: “Examples of intermediate layer activations while generating colorized images using the FILM network. These activation maps correspond to the mean activation immediately after the FILM layers of the sixth, seventh, and eighth blocks. Interestingly, the activations after the FILM layer of Block 6 always seems to focus on the object that is to be colorized, while those of Block 8 focus almost exclusively on the background.”);
…
process the first portion of the image data based on the image processing setting (See page 1: “crowdsourced evaluations confirm that our models properly localize and color objects based on captions (Figure 1).” Also see Fig. 3 and page 4: “Both CONCAT and FILM can manipulate image color from captions (further supported by the top row of Figure 3).”).
Manjunatha does not disclose the following; however, Bahng discloses:
subsequent to determining the image processing setting, receive a portion of the image data from an image sensor; and process the first portion of the image data based on the image processing setting (See page 2, last paragraph: “In this paper, we propose a novel method to generate multiple color palettes that convey the semantics of rich text and then colorize a given grayscale image according to the generated color palette.”).
Manjunatha and Bahng together disclose the limitations of claim 1. Bahng is directed to a similar field of art (language-guided image colorization). Therefore, Manjunatha and Bahng are combinable. Manjunatha does not receive image data from an image sensor after determining the image processing setting. Modifying the system and method of Manjunatha by adding the capability to generate color palettes from input text and then colorize a given grayscale image, as taught by Bahng, would yield the expected and predictable result of an additional processing branch for broader image editing application. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine Manjunatha and Bahng in this way.
Claim 13 is met by the combination of Manjunatha and Bahng, wherein
The combination of Manjunatha and Bahng discloses
The apparatus of claim 1, wherein the at least one processor is configured to:
And Manjunatha further discloses:
identify at least one message communicated using an application, wherein the at least one character is based on the at least one message (See the Abstract: “We condition this process on language, allowing end users to manipulate a colorized image by feeding in different captions.” Also see the recognition of the language in Fig. 2. Use of an application by the user to feed in the captions is understood.).
Claim 14 is met by the combination of Manjunatha and Bahng, wherein
The combination of Manjunatha and Bahng discloses
The apparatus of claim 1, wherein
And Manjunatha further discloses:
the at least one character associated with the scene is included in a caption for a second portion of the image data (See the captions in Figs. 1-2.).
Claim 18 is met by the combination of Manjunatha and Bahng, wherein
The combination of Manjunatha and Bahng discloses
The apparatus of claim 1, wherein,
And Bahng further discloses:
to process the first portion of the image data based on the image processing setting, the at least one processor is configured to: perform post-processing of the first portion of the image data according to the image processing setting after capture of the first portion of the image data (See Fig. 2 and page 2, last paragraph: “In this paper, we propose a novel method to generate multiple color palettes that convey the semantics of rich text and then colorize a given grayscale image according to the generated color palette.”).
Claim 19 is met by the combination of Manjunatha and Bahng, wherein
The combination of Manjunatha and Bahng discloses
The apparatus of claim 1, wherein the at least one processor is configured to:
And Manjunatha further discloses:
store the first portion of the image data as processed using the image processing setting (See page 4, left column: “Workers receive a pair of images, a ground-truth MSCOCO image and a generated output from one of our three architectures, and are asked to choose the image that was not colored by a computer.” To later provide the generated images to workers, storing of those generated images is understood.).
The combination of Manjunatha and Bahng teaches the method of claim 20 for the reasons given in the treatment of claim 1.
Claim(s) 2-4 is/are rejected under 35 U.S.C. 103 as being unpatentable over Manjunatha (Learning to Color from Language, 17 April 2018, arXiv, pages 1-6) in view of Bahng (Coloring with Words: Guiding Image Colorization Through Text-based Palette Generation, 2018, ECCV, Pages 1-17) in view of Li (Data-Driven Affective Filtering for Images and Videos, 2015, IEEE Transactions on Cybernetics, Vol. 45, No. 10, Pages 2336-2349).
Claim 2 is met by the combination of Manjunatha, Bahng, and Li, wherein
The combination of Manjunatha and Bahng teaches:
The apparatus of claim 1, wherein
The combination of Manjunatha and Bahng does not disclose the following; however, Li teaches:
the image data includes at least one video frame of a video (See page 2336, right column: “Given an input image or video, we can synthesize any user-specific emotion onto it automatically.”).
Manjunatha, Bahng, and Li together teach the limitations of claim 2. Li is directed to a similar field of art (receiving an input image and generating from it an output image with a target emotion specified by a user). Therefore, Manjunatha, Bahng, and Li are combinable. Modifying the system and method of Manjunatha and Bahng by adding the capability of accepting and transforming a dataset of video frames would yield the expected and predictable result of wider applicability of the Manjunatha & Bahng technique. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine Manjunatha, Bahng, and Li in this way.
Claim 3 is met by the combination of Manjunatha, Bahng, and Li, wherein
The combination of Manjunatha, Bahng, and Li discloses:
The apparatus of claim 2, wherein
And Li and Manjunatha further teach:
the at least one video frame is part of a video (See page 2336, right column: “Given an input image or video, we can synthesize any user-specific emotion onto it automatically.”), and wherein the at least one character is based on video data associated with at least a portion of the video (See any individual character only in the underlined portion of the caption input “An orange dog sitting on a blue couch” in Fig. 1 and “a green pickup truck next to trees” in Fig. 2. Each character in the underlined portion is part of a word describing scene content, which meets “based on video data associated with at least a portion of the video”.).
Claim 4 is met by the combination of Manjunatha, Bahng, and Li, wherein
The combination of Manjunatha, Bahng, and Li discloses:
The apparatus of claim 2, wherein
And Manjunatha further teaches:
the at least one video frame is part of a video, and wherein the additional data is based on video data associated with at least a portion of the video (See the underlined portion of the caption input “An orange dog sitting on a blue couch” in Fig. 1 and “a green pickup truck next to trees” in Fig. 2. The additional data describes more scene content, which meets “based on video data associated with at least a portion of the video”.).
Claim(s) 5-7, 9, and 12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Manjunatha (Learning to Color from Language, 17 April 2018, arXiv, pages 1-6) in view of Bahng (Coloring with Words: Guiding Image Colorization Through Text-based Palette Generation, 2018, ECCV, Pages 1-17) in view of Kumar (Domain Adaptation Based Technique for Image Emotion Recognition Using Image Captions, 4 December 2020, International Conference on Computer Vision and Image Processing, Pages 1-12).
Claim 5 is met by the combination of Manjunatha, Bahng, and Kumar, wherein
The combination of Manjunatha and Bahng teaches:
The apparatus of claim 1, wherein the at least one processor is configured to:
The combination of Manjunatha and Bahng does not disclose the following; however, Kumar teaches:
identify an identity of at least one object in a second portion of the image data, wherein the at least one character is based on the identity of the at least one object (See Fig. 2, “man”, and page 6, first sentence: “This phase takes an image and generates a n words long caption c = {c1, c2, . . . , cn}, which is a sentence describing the content of the image.”).
Manjunatha, Bahng, and Kumar together teach the limitations of claim 5. Kumar is directed to a similar field of art (generating image captions and emotions based on image content). Therefore, Manjunatha, Bahng, and Kumar are combinable. Modifying the system and method of Manjunatha and Bahng by adding the capability to “identify an identity of at least one object in a second portion of the image data, wherein the at least one character is based on the identity of the at least one object”, as taught by Kumar, would yield the expected and predictable result of an additional way to receive a caption for an image, in the case that no user input is received in Manjunatha. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine Manjunatha, Bahng, and Kumar in this way.
Claim 6 is met by the combination of Manjunatha, Bahng, and Kumar, wherein
The combination of Manjunatha and Bahng teaches:
The apparatus of claim 1, wherein the at least one processor is configured to:
The combination of Manjunatha and Bahng does not disclose the following; however, Kumar teaches:
identify an expression of at least one person, wherein the at least one character is based on the expression of the at least one person (See Fig. 2, “man smiling”, and page 6, first sentence: “This phase takes an image and generates a n words long caption c = {c1, c2, . . . , cn}, which is a sentence describing the content of the image.”).
See the motivation to combine in the treatment of claim 5.
Claim 7 is met by the combination of Manjunatha, Bahng, and Kumar, wherein
The combination of Manjunatha, Bahng, and Kumar teaches:
The apparatus of claim 6, wherein,
And Kumar further teaches:
to identify the expression of the at least one person, the at least one processor is configured to identify the expression of the at least one person in a second portion of the image data (See Fig. 2, “man smiling”, and page 6, first sentence: “This phase takes an image and generates a n words long caption c = {c1, c2, . . . , cn}, which is a sentence describing the content of the image.”).
See the motivation to combine in the treatment of claim 5.
Claim 9 is met by the combination of Manjunatha, Bahng, and Kumar, wherein
The combination of Manjunatha and Bahng teaches:
The apparatus of claim 1, wherein the at least one processor is configured to:
The combination of Manjunatha and Bahng does not disclose the following; however, Kumar teaches:
identify a scheduled event associated with the scene, wherein the at least one character is based on the scheduled event (See Fig. 5(b): generating a caption that reads “a woman and a woman are posing for a picture”. The examiner asserts that the act of posing for the picture meets the claimed “scheduled event” since it is a planned occurrence for a personal purpose.).
See the motivation to combine in the treatment of claim 5.
Claim 12 is met by the combination of Manjunatha, Bahng, and Kumar, wherein
The combination of Manjunatha and Bahng teaches:
The apparatus of claim 1, wherein the at least one processor is configured to:
The combination of Manjunatha and Bahng does not disclose the following; however, Kumar teaches:
identify an application, wherein at least one of the at least one character or the additional data is based on the application (See page 10, a captioning model generates the caption: “a man is cutting a piece of wood”. The system identifies an application of the piece of wood (i.e., for cutting).).
See the motivation to combine in the treatment of claim 5.
Claim(s) 8 and 15-17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Manjunatha (Learning to Color from Language, 17 April 2018, arXiv, pages 1-6) in view of Bahng (Coloring with Words: Guiding Image Colorization Through Text-based Palette Generation, 2018, ECCV, Pages 1-17) in view of Chen (Language-Based Image Editing with Recurrent Attentive Models, 2018, CVPR, Pages 8721-8729), as cited in the IDS filed 2 January 2024.
Claim 8 is met by the combination of Manjunatha, Bahng, and Chen, wherein
The combination of Manjunatha and Bahng discloses:
The apparatus of claim 1, wherein the at least one processor is configured to:
The combination of Manjunatha and Bahng does not disclose the following; however, Chen teaches:
identify a time of day associated with capture of the image, wherein the at least one character is based on the time of day (See Fig. 2, “The user provides a textual description for image editing: ‘The afternoon light flooded the little room from the window…’”.).
Manjunatha, Bahng, and Chen together teach the limitations of claim 8. Chen is directed to a similar field of art (generating a target image based on a user-provided description). Therefore, Manjunatha, Bahng, and Chen are combinable. Modifying the system and method of Manjunatha and Bahng by adding the capability to “identify a time of day associated with capture of the image, wherein the at least one character is based on the time of day”, as taught by Chen, would yield the expected and predictable result of enabling a wider variety of image modification. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine Manjunatha, Bahng, and Chen in this way.
Claim 15 is met by the combination of Manjunatha, Bahng, and Chen, wherein
The combination of Manjunatha and Bahng teaches:
The apparatus of claim 1, wherein
The combination of Manjunatha and Bahng does not disclose the following; however, Chen teaches:
the additional data identifies at least one image capture setting configured to be used to capture the first portion of the image data (See page 8721, Fig. 2 caption: “The user provides a textual description: ‘The afternoon light flooded the little room from the window…’”. The “afternoon light” language, serving as the claimed “additional data”, identifies a time of day to capture the image. The examiner asserts this indication of capture time meets an “image capture setting”.).
See the motivation to combine in the treatment of claim 8.
Claim 16 is met by the combination of Manjunatha, Bahng, and Chen, wherein
The combination of Manjunatha and Bahng teaches:
The apparatus of claim 1, wherein
The combination of Manjunatha and Bahng does not disclose the following; however, Chen teaches:
the additional data identifies parameters of an image signal processor (ISP) that are adjustable using the image processing setting (See Fig. 1 caption: “In an interactive design interface, a sketch of shoes is presented to a customer, who then gives a verbal instruction on how to modify the design: ‘The insole of the shoes should be brown. The vamp and the heel should be purple and shining’.” The underlined portions are user-provided instructions to modify color, which is performed by choosing parameters in the learning model of Fig. 4. The examiner asserts that this model is implemented by an image signal processor based on page 8727, section titled “Model Implementation”.).
See the motivation to combine in the treatment of claim 8.
Claim 17 is met by the combination of Manjunatha, Bahng, and Chen, wherein
The combination of Manjunatha and Bahng teaches:
The apparatus of claim 1, wherein
The combination of Manjunatha and Bahng does not appear to disclose the following; however, Chen teaches:
to process the first portion of the image data based on the image processing setting, the at least one processor is configured to: apply the image processing setting using an image signal processor (ISP) during capture of the first portion of the image data (See the model in Fig. 4 that colorizes the image. The examiner asserts that this model is implemented by an image signal processor based on page 8727, section titled “Model Implementation”.).
See the motivation to combine in the treatment of claim 8.
Claim(s) 10 is/are rejected under 35 U.S.C. 103 as being unpatentable over Manjunatha (Learning to Color from Language, 17 April 2018, arXiv, pages 1-6) in view of Bahng (Coloring with Words: Guiding Image Colorization Through Text-based Palette Generation, 2018, ECCV, Pages 1-17) in view of Vaidyanathan (SNAG: Spoken Narratives and Gaze Dataset, 2018, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Pages 132-137).
Claim 10 is met by the combination of Manjunatha, Bahng, and Vaidyanathan, wherein
The combination of Manjunatha and Bahng teaches:
The apparatus of claim 1, wherein the at least one processor is configured to:
The combination of Manjunatha and Bahng does not disclose the following; however, Vaidyanathan teaches:
receive speech using a microphone, wherein the at least one character is based on the speech (See Fig. 2, transcript (including at least one character) of a participant’s spoken description of an image.).
Manjunatha, Bahng, and Vaidyanathan together teach the limitations of claim 10. Vaidyanathan is directed to a similar field of art (image captioning). Therefore, Manjunatha, Bahng, and Vaidyanathan are combinable. Modifying the system and method of Manjunatha and Bahng by adding the capability to “receive speech using a microphone, wherein the at least one character is based on the speech”, as taught by Vaidyanathan, would yield the expected and predictable result of an additional way to obtain a character/word/caption for an image. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine Manjunatha, Bahng, and Vaidyanathan in this way.
Allowable Subject Matter
Claim 11 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter: the prior art of record, individually or in combination, does not disclose or suggest in claim 11: “The apparatus of claim 1, wherein the at least one processor is configured to: determine an identity of a user operating an image capture device that captures the first portion of the image data, wherein the at least one character is based on the identity of the user.”
The concept of “determine an identity of a user operating an image capture device that captures the first portion of the image data” is found in the prior art; however, in combination with “wherein the at least one character is based on the identity of the user” and the limitations of claim 1 that it refers back to, there does not appear to be a reasonable combination of prior art that arrives at these limitations as a whole.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Contact
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONATHAN S LEE whose telephone number is (571)272-1981. The examiner can normally be reached 11:30 AM - 7:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Bee, can be reached at (571)270-5183. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Jonathan S Lee/
Primary Examiner, Art Unit 2677