DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1, 7, 8, 14, 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chakraborty et al. (US 20240184524) in view of He et al. (US 20260004490), and further in view of Wannerberg et al. (US 20250252661).
Regarding claim 1, Chakraborty discloses a method comprising: generating by a processing device, using a second ML model, a set of test images using the plurality of test images by: adding a first test image from the plurality of test images to the set of test images; for each of a set of subsequent test images from the plurality of test images, adding the subsequent test image to the set of test images if it is determined that a difference between visual properties of the subsequent test image and visual properties of each other test image currently in the set of test images meets a threshold based on a set of distance metrics (Chakraborty, “[0056] Training: one or more embodiments make use of training vectors; for example, a set of images. For a first image, construct input vectors X.sub.i and initialize the W.sub.i to a certain predetermined value; say, all zeroes or all ones, or some middle value. [0063] Inferencing: suppose training has occurred using eight (8) images; the weight vectors have been stored for all eight (8) images, and the expected output word is known. During inferencing, comparator 203 compares the output for the unknown features (say the image of a DOG) to the outputs for the known features (say images of CAT, DOG, COW, ELEPHANT, LION, TIGER, . . . ). Look for the sum of pixel distance between the unknown image and the known images. During inferencing, use the values stored in the memory cell(s) 208 during training. In a non-limiting example, suppose the system was trained on a set of known features (e.g. images of DOGS and BIRDS), and the system needs to be trained with a new object (say the image of a KANGAROO). In this case, after the MAC computations and comparisons, no closest features are found (e.g., the distances from all the previously known features are larger than the acceptable threshold), and the image of kangaroo is added to the feature set”. 
Therefore, for instance, an image of KANGAROO corresponds to the subsequent test image, and the images of DOGS and BIRDS correspond to the set of test images).
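For illustration only (not part of the record), the set-construction logic described in the cited Chakraborty passages can be sketched as follows; the specific distance function and threshold value are assumptions, not disclosed parameters:

```python
def build_test_set(images, threshold, distance):
    """Add a candidate to the set only if its distance from every
    image already in the set meets the threshold (cf. Chakraborty [0063])."""
    test_set = []
    for img in images:
        if not test_set:
            test_set.append(img)  # the first test image is always added
        elif all(distance(img, member) >= threshold for member in test_set):
            test_set.append(img)  # sufficiently different, e.g. the KANGAROO image
    return test_set

def pixel_distance(a, b):
    # sum of pixel distances between two equal-size images
    return sum(abs(x - y) for x, y in zip(a, b))
```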
On the other hand, Chakraborty fails to explicitly disclose but He discloses generating, using a first machine learning (ML) model, a plurality of candidate images that are each expected to meet a set of visual and semantic criteria; identifying as a test image, each of the plurality of candidate images that meets the set of visual and semantic criteria to obtain a plurality of test images (He, “[0026] The rich human feedback can also include scores for plausibility, text-image alignment, aesthetics, and overall quality. [0042] a set of training data can be generated by selecting or filtering images based on the output of the multimodal feedback prediction model. For example, a set of images can be generated for each of one or more text prompts. For instance, eight images can be generated in response to a prompt. The images can be generated using a target model that is to be trained. The system can select images for a finetuning dataset based on the output of the prediction model. For example, the system can obtain one or more scores generated for an image. If the score(s) generated for an image satisfies one or more criteria (e.g., exceeds a fixed threshold), the image and corresponding prompt can be selected as part of the finetuning dataset”).
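A minimal sketch of the threshold-based selection described in the cited He passage (the score function and threshold value are illustrative assumptions):

```python
def select_finetuning_pairs(generated, score_fn, threshold):
    """Keep each (image, prompt) pair whose predicted score exceeds
    a fixed threshold (cf. He [0042])."""
    return [(image, prompt) for image, prompt in generated
            if score_fn(image, prompt) > threshold]
```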
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined Chakraborty and He. That is, applying the test-image set generation of Chakraborty to the selected dataset of images and corresponding prompts of He. The motivation/suggestion would have been that the automatically generated feedback can be used to finetune and improve the generative models and/or to generate masks or other data for inpainting problematic regions in generated images (He, [0026]).
On the other hand, Chakraborty in view of He fails to explicitly disclose but Wannerberg discloses determining the difference of visual and semantic properties (Wannerberg, “[0120] machine learning model 900 may be trained using training data 904 to receive as input data corresponding to an image of 3D object 122 (e.g., a beach chair in FIG. 1B inserted into the 3D environment) and/or other portions of 3D environment 106 (e.g., palm trees 108 and/or the ocean and/or the sand, and output text, e.g., natural language description 120 of FIG. 1B. [0160] a vector may be generated for each 3D object in the library indicative of its textual description and/or image characteristics, and compared to a vector for the textual description and/or image characteristics of a current 3D scene, to determine whether there is sufficient similarity”).
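The vector comparison described in the cited Wannerberg passage could, for example, be implemented with cosine similarity; the metric choice and threshold below are illustrative, not disclosed values:

```python
import math

def cosine_similarity(u, v):
    # standard cosine similarity between two equal-length vectors
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def sufficiently_similar(object_vec, scene_vec, threshold=0.8):
    """Compare an object's textual-description/image-characteristics vector
    to the current scene's vector (cf. Wannerberg [0160])."""
    return cosine_similarity(object_vec, scene_vec) >= threshold
```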
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined Wannerberg, Chakraborty and He, to include all limitations of claim 1. That is, replacing the compared features of Chakraborty with the textual description and/or image characteristics of Wannerberg to determine the difference. The motivation/suggestion would have been to enable providing an interactive, semantics-based recommendation system to facilitate 3D content creation by way of providing an intuitive and easy-to-use user interface to one or more users, e.g., via a user interface of an HMD, in a dynamic virtual design environment (Wannerberg, [0007]).
Regarding claim(s) 8, 15, they are interpreted and rejected for the same reasons set forth in claim(s) 1. Specifically, claims 8 and 15 further recite “A system comprising: a memory; and a processing device operatively coupled to the memory”, and “A non-transitory computer-readable medium having instructions stored thereon which, when executed by a processing device, cause the processing device to…”, respectively. Chakraborty further discloses “[0011] The code can then be executed on a system (or apparatus) including a memory, and at least one processor that is coupled to the memory and operative to perform the techniques. [0124] Computer readable program instructions are typically loaded onto computer 101 to cause a series of operational steps to be performed by processor set 110 of computer 101…”.
Regarding claim 7, Chakraborty in view of He and Wannerberg discloses the method of claim 1.
On the other hand, Chakraborty in view of Wannerberg fails to explicitly disclose but He discloses wherein a third ML model identifies as a test image, each of the plurality of candidate images that meets the set of visual and semantic criteria (He, “[0042] The system can select images for a finetuning dataset based on the output of the prediction model. For example, the system can obtain one or more scores generated for an image. If the score(s) generated for an image satisfies one or more criteria (e.g., exceeds a fixed threshold), the image and corresponding prompt can be selected as part of the finetuning dataset”). The same motivation for combining He set forth in claim 1 applies here.
Regarding claim(s) 14, it is interpreted and rejected for the same reasons set forth in claim(s) 7.
Claim(s) 2, 9, 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chakraborty et al. (US 20240184524) in view of He et al. (US 20260004490), and further in view of Wannerberg et al. (US 20250252661) and Dasgupta et al. (US 11176154).
Regarding claim 2, Chakraborty in view of He and Wannerberg discloses the method of claim 1.
On the other hand, Chakraborty in view of He and Wannerberg fails to explicitly disclose but Dasgupta discloses displaying, as they are added to the set of test images, the first test image and each of the set of subsequent test images that is added to the set of test images (Dasgupta, col.8, lines 47-51, “For example, in some embodiments, the sample review interface may include a GUI that displays incoming images to be added a dataset and selectively reject any samples that fall outside the scope of that dataset”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined Dasgupta into the combination of Wannerberg, Chakraborty and He, to include all limitations of claim 2. That is, adding the display of incoming images of Dasgupta to the set of test images of Chakraborty in view of He and Wannerberg. The motivation/suggestion would have been that the sample review interface 154 allows users to review changes to the dataset based on the sample data (Dasgupta, col. 8, lines 45-47).
Regarding claim(s) 9, 16, they are interpreted and rejected for the same reasons set forth in claim(s) 2.
Claim(s) 5, 12, 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chakraborty et al. (US 20240184524) in view of He et al. (US 20260004490), and further in view of Wannerberg et al. (US 20250252661) and AHN et al. (US 20200110788).
Regarding claim 5, Chakraborty in view of He and Wannerberg discloses the method of claim 1. He further discloses generating each of the plurality of candidate images using a text prompt (He, “[0042] For example, a set of images can be generated for each of one or more text prompts. For instance, eight images can be generated in response to a prompt”).
On the other hand, Chakraborty in view of He and Wannerberg fails to explicitly disclose but AHN discloses the text prompt is a candidate object of interest from a predefined list of candidate objects of interest (AHN, “[0138] The user of the first terminal 210 may generate a list of interests by selecting a more preferred item among the items of “music appreciation” and “restaurant.””).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined AHN into the combination of Wannerberg, Chakraborty and He, to include all limitations of claim 5. That is, applying the list of interests of AHN to the prompt of He. The motivation/suggestion would have been that the satisfaction level with the mediation felt by the user of the terminal may be further improved (AHN, [0136]).
Regarding claim(s) 12, 19, they are interpreted and rejected for the same reasons set forth in claim(s) 5.
Allowable Subject Matter
Claim(s) 3-4, 6, 10-11, 13, 17-18, 20 is/are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:
Regarding claim 3, it recites, building an input dimensional space for the first ML model; and defining within the input dimensional space, the set of distance metrics, wherein the set of distance metrics comprises: a first minimum distance between visual and semantic properties of test images that are key images and visual and semantic properties of test images that are distractor images; a second minimum distance between visual and semantic properties of test images that are key images; and a third minimum distance between visual and semantic properties of all test images. None of the prior art of record, or any of the prior art searched, alone or in combination, renders obvious the combination of elements recited in the claim(s) as a whole.
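For illustration only, the three minimum-distance constraints recited in claim 3 could be checked as follows; the distance function and the minimum values d1, d2, d3 are hypothetical placeholders, not claimed values:

```python
from itertools import combinations

def metrics_satisfied(keys, distractors, dist, d1, d2, d3):
    """Check the three minimum distances of claim 3:
    d1: key vs. distractor, d2: key vs. key, d3: any pair of test images."""
    key_vs_distractor = all(dist(k, x) >= d1 for k in keys for x in distractors)
    key_vs_key = all(dist(a, b) >= d2 for a, b in combinations(keys, 2))
    all_pairs = all(dist(a, b) >= d3
                    for a, b in combinations(keys + distractors, 2))
    return key_vs_distractor and key_vs_key and all_pairs
```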
Regarding claim(s) 10, 17, they are interpreted and allowed under similar rationale as set forth in claim 3.
Regarding claim 4, it recites, wherein the set of visual and semantic criteria include: a generic and non-descript background; a unitary object of interest whose semantics can be determined; a restriction that the unitary object of interest cannot contain human or animal faces; a restriction on the subject matter of the unitary object of interest; and a position and orientation of the unitary object of interest. None of the prior art of record, or any of the prior art searched, alone or in combination, renders obvious the combination of elements recited in the claim(s) as a whole.
Regarding claim(s) 11, 18, they are interpreted and allowed under similar rationale as set forth in claim 4.
Regarding claim 6, it recites, wherein generating a candidate image of the plurality of candidate images using a candidate object of interest comprises: determining whether the candidate object of interest has been previously used to generate any of the plurality of candidate images; in response to determining that the candidate object of interest has been previously used to generate any of the plurality of candidate images, modifying one or more visual properties of the candidate object of interest; and generating the candidate image of the plurality of images using the modified candidate object of interest. None of the prior art of record, or any of the prior art searched, alone or in combination, renders obvious the combination of elements recited in the claim(s) as a whole.
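Illustratively, the reuse check recited in claim 6 might be sketched as below; the modify and render functions are hypothetical placeholders, not disclosed implementations:

```python
def generate_candidate(obj, previously_used, modify, render):
    """If the object of interest was already used, alter its visual
    properties before generating a new candidate image (cf. claim 6)."""
    if obj in previously_used:
        obj = modify(obj)  # e.g., change color, scale, or texture
    previously_used.add(obj)
    return render(obj)
```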
Regarding claim(s) 13, 20, they are interpreted and allowed under similar rationale as set forth in claim 6.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GRACE Q LI whose telephone number is (571)270-0497. The examiner can normally be reached Monday - Friday, 8:00 am-5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, DEVONA FAULK can be reached at 571-272-7515. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/GRACE Q LI/Primary Examiner, Art Unit 2618 3/18/2026