Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Detailed Action
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 10/10/25 has been entered.
In amendments dated 10/10/25, Applicant amended claims 2, 9, and 16, canceled no claims, and added no new claims. Claims 2-21 are presented for examination.
Applicant is advised that the instant application is now being examined by Examiner Bruce Moser.
Rejections under 35 U.S.C. 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 2-21 are rejected under 35 U.S.C. 101 because the claimed invention is directed to mental processes without significantly more. Independent claims 2, 9, and 16 each recite determining, using one or more hardware processors coupled to a non-transitory memory of a system for an artificial intelligence (AI) model pipeline, content in a media file using an image analyzer AI model of a plurality of computer vision AI models of the AI model pipeline; selecting, using the one or more hardware processors, a subset of the plurality of computer vision AI models usable to analyze the media file based on the content, wherein the subset comprises a first AI model trained for a first image classification analysis and a second AI model trained for a second image classification analysis; executing, using the one or more hardware processors, a first run of the first AI model for the first image classification analysis based on the content; generating, using the one or more hardware processors, a first set of computer vision tags for the media file based on the first run; executing, using the one or more hardware processors, a second run of the second AI model for the second image classification analysis based on the content; generating, using the one or more hardware processors, a second set of computing vision tags for the media file based on the first run; segmenting a plurality of objects from the media file using the first set and the second set of computing vision tags; concatenating the plurality of objects in the media file that prevents, for each of the plurality of objects, different ones of the plurality of objects from being combined when segmented from the media file for generating a plurality of first computer vision tags for the plurality of objects segmented from the media file; performing an intra-model fusion of outputs from at least the first and second AI models of the AI model pipeline based on the concatenated plurality of objects and the first set and the second set of computing vision tags; generating, using the one or more hardware processors and based on the intra-model fusion, the plurality of first computer vision tags for the media file, wherein each of the plurality of first computer vision tags is associated with a confidence value that each of the plurality of objects is properly labeled; filtering, using the one or more hardware processors, the plurality of first computer vision tags based on the confidence values and a Natural Language Processing (NLP) model, wherein the filtering removes a portion of the plurality of first computer vision tags based on first corresponding ones of the confidence values at or below a predetermined threshold and prioritizes a remaining portion of the plurality of first computer vision tags based on a ranking of second corresponding ones of the confidence values; and tagging, using the one or more hardware processors, the content in the media file based on the filtered plurality of first computer vision tags.
Determining content is evaluating and is a mental process; selecting AI models is evaluating and is a mental process; generating the first and second sets of tags is recited broadly and is a mental process accomplishable in the human mind or on paper; executing first and second runs of an AI model is not significantly more than mental processes per Recentive Analytics v. Fox Broadcasting Corp. (134 F.4th 1205, 2025 U.S.P.Q.2d 628); segmenting and concatenating objects and performing a fusion of outputs are each recited broadly and are mental processes accomplishable in the human mind or on paper; generating and filtering the plurality of tags are each recited broadly and are mental processes accomplishable in the human mind or on paper; and tagging content is a mental process accomplishable in the human mind or on paper.
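For illustration only, the filtering limitation recited above (removing tags whose confidence values are at or below a predetermined threshold and prioritizing the remainder by a ranking of confidence values) can be sketched as follows. This is a minimal sketch and not a characterization of Applicant's implementation: the `Tag` type, the `filter_and_rank` function, the sample labels, and the 0.40 threshold are all hypothetical, and the recited NLP model is omitted because the claims do not specify how it participates in the filtering.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tag:
    """Hypothetical computer vision tag with its confidence value."""
    label: str
    confidence: float


def filter_and_rank(tags: List[Tag], threshold: float) -> List[Tag]:
    """Remove tags at or below the threshold, then rank the rest.

    Assumption: "at or below" is read as confidence <= threshold, and
    "prioritizes ... based on a ranking" is read as a descending sort.
    """
    kept = [t for t in tags if t.confidence > threshold]
    return sorted(kept, key=lambda t: t.confidence, reverse=True)


# Hypothetical sample data, not taken from the application.
sample = [Tag("dog", 0.92), Tag("frisbee", 0.40), Tag("park", 0.75), Tag("cat", 0.10)]
ranked = filter_and_rank(sample, threshold=0.40)
print([t.label for t in ranked])  # ['dog', 'park']
```

In this sketch a tag whose confidence value equals the threshold is removed, consistent with the "at or below" language of the claims.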
Each claim recites an additional element of outputting, from the AI model pipeline using the one or more hardware processors, the media file comprising the tagged content having the filtered plurality of first computer vision tags searchable by a search process in place of performing an AI visual analysis of the content, which is an output step and insignificant extra-solution activity. Claim 9 recites a non-transitory memory and one or more hardware processors, and claim 16 recites a non-transitory computer readable medium comprising computer readable instructions, which are each generic components of a computer. Examiner notes specification paragraphs 0004-0006 recite drawbacks in the technology for object identification in images, videos, and other content (some images and/or videos lack meaningful tags or descriptions causing users to be unable to discover said content via search or any means other than direct user lookup, deep learning has been successful in identifying some information in images, a human-comparable automatic annotation of images and videos (comparable to deep learning identifying information in images) such as producing natural-language descriptions solely from visual data is still far from being achieved, and recognition parameters are not personalized at a user level and may not account for user preferences in searches). Specification paragraphs 0018-0025 describe techniques in the invention for addressing the above drawbacks, but the claim steps do not recite a particular improvement in any technology or function of a computer per MPEP 2106.04(d) and do not recite any unconventional steps in the invention per MPEP 2106.05(a). Therefore, the recited mental processes are not integrated into a practical application.
Taking the claims as a whole, the output step is recited broadly and amounts to sending data across a network per specification paragraph 0026 figure 1A computer networks 101, which is routine and conventional activity per the list of such activities in MPEP 2106.05(d) part II. The non-transitory memory, one or more hardware processors, and non-transitory computer readable medium comprising computer readable instructions are still each generic components of a computer. Thus the claims do not include additional elements that are sufficient to amount to significantly more than the recited mental processes.
Claims 3, 10, and 17 each recite wherein the subset of the plurality of computer vision AI models is further selected based on user preferences for a user performing a search associated with the media file, and selecting AI models is evaluating and a mental process. Claims 4, 11, and 18 each recite determining, using the one or more hardware processors, the user preferences based on at least one of past searches for past content in past media files by the user or ones of the plurality of computer vision AI models usable for identifying the past content for the past searches, and determining user preferences is evaluating and a mental process. Claims 5, 12, and 19 each recite identifying, using the one or more hardware processors, one of the plurality of first computer vision tags having a corresponding one of the confidence values at or below the predetermined threshold, and identifying a confidence value per a threshold is evaluating and a mental process; reprocessing, using the one or more hardware processors, the one of the plurality of first computer vision tags using the subset of the plurality of computer vision AI models and the NLP model, and reprocessing by applying an AI model is not significantly more than mental processes per Recentive Analytics v. Fox Broadcasting Corp. (134 F.4th 1205, 2025 U.S.P.Q.2d 628); determining, using the one or more hardware processors, that the one of the plurality of first computer vision tags is an irrelevant tag based on the reprocessing, and determining a tag is irrelevant is evaluating and a mental process; and discarding the one of the plurality of first computer vision tags based on being the irrelevant tag, and discarding a tag as irrelevant is a mental process accomplishable in the human mind or on paper.
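For illustration only, the identify, reprocess, determine, and discard sequence recited in claims 5, 12, and 19 can be sketched as follows. This is a minimal sketch under stated assumptions and not Applicant's implementation: the function name `discard_irrelevant_tags`, the `reprocess` callable (standing in for re-running the selected models and the NLP model), and the sample values are hypothetical. The claims do not state how irrelevance is determined from the reprocessing; the sketch assumes a tag is irrelevant when its re-scored confidence value is still at or below the threshold.

```python
from typing import Callable, Dict


def discard_irrelevant_tags(
    tags: Dict[str, float],
    threshold: float,
    reprocess: Callable[[str], float],
) -> Dict[str, float]:
    """Re-score low-confidence tags and drop those that stay low.

    Assumption: `reprocess` returns a new confidence value for a label,
    and a tag is treated as irrelevant if that value is <= threshold.
    """
    kept: Dict[str, float] = {}
    for label, confidence in tags.items():
        if confidence <= threshold:          # identify a low-confidence tag
            confidence = reprocess(label)    # reprocess it
            if confidence <= threshold:      # still low: treat as irrelevant
                continue                     # discard
        kept[label] = confidence
    return kept


# Hypothetical stand-in for re-running the models; values are made up.
rescored = {"frisbee": 0.80, "cat": 0.05}
tags = {"dog": 0.92, "frisbee": 0.40, "cat": 0.10}
result = discard_irrelevant_tags(tags, 0.40, lambda label: rescored[label])
print(result)  # {'dog': 0.92, 'frisbee': 0.8}
```

Here "frisbee" survives because reprocessing raises its confidence value above the threshold, while "cat" is discarded.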
Claims 6, 13, and 20 each recite wherein the determining the content includes determining a plurality of second computer vision tags initially used to tag the content in the media file, and wherein the selecting is further based on the plurality of second computer vision tags, and determining tags is evaluating and a mental process. Claims 7, 14, and 21 each recite extracting, using the one or more hardware processors, a plurality of frames from the media file based on the content and the second plurality of computer vision tags, and extracting frames is recited broadly and amounts to receiving data across a network and is routine and conventional per the list of such activities in MPEP 2106.05(d) part II; and building, using the one or more hardware processors, at least one scene using the extracted plurality of frames, which is recited broadly and a mental process accomplishable in the human mind or on paper, wherein the executing the first run and the second run are further based on the built at least one scene, and executing an AI model is not significantly more than mental processes per Recentive Analytics v. Fox Broadcasting Corp. (134 F.4th 1205, 2025 U.S.P.Q.2d 628). Claims 8 and 15 each recite wherein the plurality of computer vision AI models comprises at least one of an object segmentation model, an object localization model, an object detection and recognition model, the NLP model, or a relevance feedback loop model, and using an AI model is applying it, which is not significantly more than mental processes per Recentive Analytics v. Fox Broadcasting Corp. (134 F.4th 1205, 2025 U.S.P.Q.2d 628).
Relevant Prior Art
During his search for prior art, Examiner found the following references to be relevant to Applicant's claimed invention. Each reference is listed on the Notice of References form included in this office action:
Mishra (US 9,465,994) teaches predicting performance of vision algorithms for imaging data, determining characteristics or attributes of imaging data such as tags, determining content of the images, predicting the most appropriate of the algorithms, running the algorithm, and generating the tags, but does not teach segmentation using the tags or object concatenation in the images, or intra-model fusion using output from the algorithms or filtering of the tags (columns 2-3 lines 64-46, column 4 lines 3-25, columns 8-9 lines 42-8); and
Garrigues et al (US 9,218,364) teaches providing a user with tags for video or image content and techniques for combining and segmenting images prompting the tagging of said images, but does not teach segmentation using the tags or object concatenation in the images, or intra-model fusion using output from the algorithms or filtering of the tags (column 3 lines 8-54, column 10 lines 1036 figure 4A).
Responses to Applicant’s Remarks
Regarding rejections of claims 2-21 under 35 U.S.C. 101 for reciting mental processes without significantly more, Applicant’s arguments have been considered and are not persuasive. On pages 11-12 of his Remarks Applicant asserts, under Step 2A Prong Two of the Eligibility Analysis, “the claims provide specific improvements in technology so as to be limited to a practical application that improves over the prior systems.” Examiner disagrees and notes the claims do not recite specific details of how the invention implements an AI pipeline of individual AI models and fuses computer vision tags generated by such models into a set of tags that better identifies objects in images. The claims recite conclusory statements of determining content in a media file, selecting a subset of models, executing the models, generating tags, segmenting objects using the tags, concatenating objects in the media file, performing an intra-model fusion of output from the executed models, generating the tags, filtering the tags, and tagging the determined content. Thus Examiner does not believe the claims recite a particular improvement in any technology or function of a computer per MPEP 2106.04(d), and the recited mental processes are not integrated into a practical application.
Regarding rejections of claims 2-21 under 35 U.S.C. 103 by Pesavento, Givental, Weisel, and Dunn, Applicant’s arguments on pages 12-14 of his Remarks have been considered and are persuasive that Applicant’s amendments overcome these references’ teachings.
Inquiry
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRUCE M MOSER whose telephone number is (571)270-1718. The examiner can normally be reached M-F 9a-5p.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Boris Gorney, can be reached at 571-270-5626. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/BRUCE M MOSER/
Primary Examiner, Art Unit 2154
5/15/26