DETAILED ACTION
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine,
manufacture, or composition of matter, or any new and useful improvement
thereof, may obtain a patent therefor, subject to the conditions and requirements
of this title.
Claims 1-8 and 21-32 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. When reviewing the independent claims, and based upon consideration of all of the relevant factors with respect to each claim as a whole, Claims 1-8 and 21-32 are held to claim an abstract idea without reciting elements that amount to significantly more than the abstract idea, and are therefore rejected as ineligible subject matter under 35 U.S.C. 101.
The Examiner will analyze Claim 1; a similar rationale applies to independent Claims 21 and 27. The rationale for this finding, under MPEP § 2106, is explained below:
The claimed invention (1) must be directed to one of the four statutory categories, and (2) must not be wholly directed to subject matter encompassing a judicially recognized exception, as defined below. The following two step analysis is used to evaluate these criteria.
Step 1: Is the claim directed to one of the four patent-eligible subject matter categories: process, machine, manufacture, or composition of matter?
When examining the claim under 35 U.S.C. 101, the Examiner interprets that the claim is directed to a machine, since the claim is directed to a system with a processing device and a memory.
Step 2a, Prong 1: Does the claim wholly embrace a judicially recognized exception, which includes laws of nature, physical phenomena, and abstract ideas, or is it a particular practical application of a judicial exception?
The Examiner interprets that a judicial exception applies, since the Claim 1 limitations of selecting a pivot image [key frame] from the plurality of images and generating a descriptor based at least in part on the pivot image are directed to an abstract idea.
These limitations could be performed by a person processing a selected image to generate a language-based descriptor (a mental process/step). Because the claim does not specify how the pivot image is selected, the selection can also be performed by a person.
Accordingly, the "mental processes" abstract idea grouping is defined as concepts performed in the human mind, and examples of mental processes include observations, evaluations, judgments, and opinions. The courts consider a mental process (thinking) that "can be performed in the human mind, or by a human using a pen and paper" to be an abstract idea. CyberSource Corp. v. Retail Decisions, Inc., 654 F.3d 1366, 1372, 99 USPQ2d 1690, 1695 (Fed. Cir. 2011). As the Federal Circuit explained, "methods which can be performed mentally, or which are the equivalent of human mental work, are unpatentable abstract ideas, the 'basic tools of scientific and technological work' that are open to all." 654 F.3d at 1371, 99 USPQ2d at 1694 (citing Gottschalk v. Benson, 409 U.S. 63, 175 USPQ 673 (1972)). See also Mayo Collaborative Servs. v. Prometheus Labs. Inc., 566 U.S. 66, 71, 101 USPQ2d 1961, 1965 (2012) ("'[M]ental processes[] and abstract intellectual concepts are not patentable, as they are the basic tools of scientific and technological work'" (quoting Benson, 409 U.S. at 67, 175 USPQ at 675)); Parker v. Flook, 437 U.S. 584, 589, 198 USPQ 193, 197 (1978) (same).
If/when the claim recites a judicial exception (i.e., an abstract idea enumerated in MPEP § 2106.04(a), a law of nature, or a natural phenomenon), the claim requires further analysis in Prong Two.
Step 2a, Prong 2: Does the claim recite additional elements that integrate the judicial exception into a practical application?
The additional claim limitations of obtaining a video and providing the image and the descriptor are nothing more than insignificant extra-solution activity.
A machine learning model and a decoder are recited merely to generally apply the abstract idea, without limiting how it functions.
Step 2b: If the claim does not integrate the judicial exception into a practical application, the Examiner must determine whether the claim recites additional elements that amount to significantly more than the judicial exception.
The Examiner interprets that the Claims do not amount to significantly more, since the Claims are generally linking the use of the judicial exception to a particular technological environment or field of use, e.g., a claim describing how the abstract idea of hedging could be used in the commodities and energy markets, as discussed in Bilski v. Kappos, 561 U.S. 593, 595, 95 USPQ2d 1001, 1010 (2010), or a claim limiting the use of a mathematical formula to the petrochemical and oil-refining fields, as discussed in Parker v. Flook, 437 U.S. 584, 588-90, 198 USPQ 193, 197-98 (1978) (MPEP § 2106.05(h)).
Furthermore, the generic computer components (the processor and memory), recited as performing generic computer functions that are well-understood, routine, and conventional activities, amount to no more than implementing the abstract idea with a computerized system.
Claims 2-8, 22-26, and 28-32, which depend on the independent claims, include all the limitations of the independent claims. The Examiner finds that Claims 2-8, 22-26, and 28-32 do not recite significantly more, since these claims only recite additional steps for analyzing the video using a machine learning model.
Thus, Claims 1-8 and 21-32 recite the same abstract idea and therefore are not drawn to eligible subject matter, as they are directed to the abstract idea without significantly more.
Therefore, Claims 1-8 and 21-32 are rejected under 35 U.S.C. 101.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1, 2, 21 and 27 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Chhaya et al. (Pub. No. US 2023/0290146).
Regarding claim 1, Chhaya teaches a system comprising: a memory component [Para. 35, 73, and 75]; and a processing device coupled to the memory component, the processing device to perform operations comprising [Para. 35, 73, and 75]: obtaining a video (digital video 114) comprising a plurality of images (frames 210) [Para. 45 “a digital video input module 202 is configured to input a plurality of digital videos 114 that are to be used as a basis to generate the digital document 120”; Para. 48 “The information references the selected digital videos 114 having frames 210 and corresponding digital audio 212”]; selecting a pivot image (key frame 512) from the plurality of images (frames from respective action clips) [Para. 29 “The selected path is utilized by a frame location module to find key frames by mapping the nodes back to the action clips. The frame location module, for instance, locates a key frame from a collection of frames from respective action clips using a clustering technique” and “A centroid is computed for each of the frames in the action clip, and a frame that is closest to the centroid is selected as a frame that is representative of the action client, i.e., is the “key frame”]; causing a first machine learning model (model 524 trained using machine learning) to generate a descriptor (textual components 518 including a sequence of entity 520 and respective action descriptions 522) based at least in part on the pivot image (key frame 512) by at least providing the pivot image as an input to the first machine learning model (model 524 trained using machine learning), where the descriptor includes a language description (textual components describe entities and corresponding action descriptions) of the pivot image (key frame) [Para. 59 “The decoding module 516 is configured to form textual components 518 that include a sequence of entity 520 and respective action descriptions 522 using a model 524 trained using machine learning (block 920). 
To do so, the decoding module 516 is configured to obtain the key frames 512 that are representative of the clusters along with content associated with the digital videos 114, e.g., titles, content outline, synopsis”; Para. 30 “A decoding module is then utilized by the digital document generation system to generate textual components based on the frames. The textual components describe entities and corresponding action descriptions.”]; and providing the pivot image (frames) and the descriptor (entities identified for each of the frames) to a decoder (action description decoding module) [Para. 30 “The entities are identified by an entity decoding module from portions of transcripts corresponding to the frames and/or from the frames themselves, e.g., using image processing and machine-learning classifiers”; and Para. 31 “The entities identified for each of the frames are processed using machine learning along with the frames using an action description decoding module to generate action descriptions for each of the entities”].
Claims 21 and 27 are rejected for the same reasons as claim 1.
Regarding claim 2, Chhaya teaches wherein the pivot image (key frame 512) depicts a conceptual element of the video (actions from the plurality of digital videos 114) [Para. 49 “The digital document generation system 118 begins by locating action clips. The action clips includes frames that depict actions from the plurality of digital videos 114 (block 906)”; Para. 29 “The selected path is utilized by a frame location module to find key frames by mapping the nodes back to the action clips” and “The frame location module, for instance, locates a key frame from a collection of frames from respective action clips using a clustering technique”].
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 3 and 4 are rejected under 35 U.S.C. 103 as being unpatentable over Chhaya et al. (Pub. No. US 2023/0290146) in view of LEE et al. (Pub. No. US 2023/0306056).
Regarding claim 3, Chhaya doesn’t explicitly teach the claim limitation.
However, LEE teaches wherein selecting the pivot image (key frame) further comprises selecting the pivot image from the plurality of images by detecting a change between two or more images of the plurality of images [Para. 100 and 113].
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Chhaya to include the feature taught by LEE, because the modification enables the system to improve video compression efficiency and conceptual fidelity by using a machine learning model to pick pivot/key frames only when there is a meaningful change between video images.
Regarding claim 4, Chhaya doesn’t explicitly teach the claim limitation.
However, LEE teaches wherein the change comprises a modification to an object depicted in the two or more images that is detected by the second machine learning model [Para. 59, 116, and 121].
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Chhaya to include the feature taught by LEE, because the modification enables the system to improve video compression efficiency and conceptual fidelity by using a machine learning model to pick pivot/key frames only when there is a meaningful change between video images.
Claims 5 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Chhaya et al. (Pub. No. US 2023/0290146) in view of LEE et al. (Pub. No. US 2023/0306056) further in view of LI et al. (Pub. No. US 2022/0207750).
Regarding claim 5, Chhaya in view of LEE doesn’t explicitly teach the claim limitation.
However, LI teaches wherein the change comprises detecting, by the second machine learning model, an additional object relative to at least one image of the two or more images [Para. 73 and 85].
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Chhaya in view of LEE to include the feature taught by LI, because the modification enables the system to improve automated visual change detection by using a second machine learning model to identify newly appearing objects when comparing one image against other images in a set.
Regarding claim 6, Chhaya in view of LEE does not explicitly teach the claim limitation.
However, LI teaches wherein causing the first machine learning model to generate the descriptor further comprises prompting the first machine learning model to describe a conceptual element of the video relative to the pivot image and at least one other image of the plurality of images [fig. 8, 9 and related description].
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Chhaya in view of LEE to include the feature taught by LI, because the modification enables the system to improve automated visual change detection by using a second machine learning model to identify newly appearing objects when comparing one image against other images in a set.
Claims 7 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Chhaya et al. (Pub. No. US 2023/0290146) in view of Kreis et al. (Pub. No. US 2024/0171788).
Regarding claim 7, Chhaya teaches wherein the processing device further performs operations causing [Claim 11 and corresponding description].
However, Chhaya doesn’t explicitly teach the rest of the claim limitations.
Kreis teaches, at the decoder, a third machine learning model to generate a reconstructed video by at least providing as a first input to the third machine learning model the pivot image and the descriptor, where the third machine learning model uses the pivot image and at least a portion of the descriptor to output a second plurality of images that are combined to generate the reconstructed video [Para. 17 “The method can include updating a neural network model to align a plurality of images into frames of a first video by updating at least one first temporal attention layer of the neural network model” Para. 8 “the neural network model is to generate a third video and the first video by generating at least one frame between two consecutive frames of the third video according to relative time step embedding.” Para. 9-11].
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Chhaya to include the features taught by Kreis, because the modification enables the system to improve scalability at high resolutions.
Regarding claim 8, Chhaya does not explicitly teach the claim limitation.
However, Kreis teaches wherein the first machine learning model comprises a large language model, the second machine learning model comprises a neural network, and the third machine learning model comprises a diffusion model [Para. 2 and 11].
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Chhaya to include the features taught by Kreis, because the modification enables the system to improve scalability at high resolutions.
Claims 22 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Chhaya et al. (Pub. No. US 2023/0290146) in view of YIN et al. (Pub. No. US 2025/0088675).
Regarding claim 22, Chhaya does not explicitly teach the claim limitation.
However, YIN teaches wherein the medium further stores executable instructions that cause the processing device to perform operations causing a decoder executed by the endpoint to generate a reconstructed video by at least providing the descriptor and the pivot image as an input to a generative model [Para. 73 and 74].
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Chhaya to include the features taught by YIN, because the modification enables the system to improve video streaming efficiency by letting the decoder reconstruct the video from a key image plus a compact descriptor using a generative model, reducing the amount of data that must be transmitted while maintaining visual quality.
Regarding claim 23, Chhaya does not explicitly teach the claim limitation.
However, YIN teaches wherein the generative model generates intermediate frames of the reconstructed video between the pivot image and a second pivot image based at least in part on the descriptor [fig. 5 and related description].
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Chhaya to include the features taught by YIN, because the modification enables the system to improve video streaming efficiency by letting the decoder reconstruct the video from a key image plus a compact descriptor using a generative model, reducing the amount of data that must be transmitted while maintaining visual quality.
Claim 24 is rejected under 35 U.S.C. 103 as being unpatentable over Chhaya et al. (Pub. No. US 2023/0290146) in view of Liu et al. (Pub. No. US 2020/0012940).
Regarding claim 24, Chhaya does not explicitly teach the claim limitation.
However, Liu teaches wherein obtaining the pivot image further comprises sampling frames of the video over an interval of time [Para. 4 and 50].
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Chhaya to include the feature taught by Liu, because the modification enables the system to efficiently obtain representative pivot images by sampling frames of the video over an interval of time rather than processing every frame.
Claim 25 is rejected under 35 U.S.C. 103 as being unpatentable over Chhaya et al. (Pub. No. US 2023/0290146) in view of Shetty et al. (Pub. No. US 2016/0070962).
Regarding claim 25, Chhaya does not explicitly teach the claim limitation.
However, Shetty teaches wherein obtaining the pivot image further comprises causing a second machine learning model to determine the pivot image includes a conceptual element of the video [Para. 36 and 43].
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Chhaya to include the feature taught by Shetty, because the modification enables the system to solve the problem of how to use a few pivot frames plus rich natural language descriptions to drive a generative model that reconstructs a video.
Claims 26, 28, and 29 are rejected under 35 U.S.C. 103 as being unpatentable over Chhaya et al. (Pub. No. US 2023/0290146) in view of Yu et al. (Pub. No. US 2017/0127016).
Regarding claim 26, Chhaya does not explicitly teach the claim limitation.
However, Yu teaches wherein the medium further stores executable instructions that cause the processing device to perform operations causing the machine learning model to generate a second descriptor that includes a second natural language description of a relationship between the set of pivot images and at least one other pivot image obtained from the video, where the pivot image of the at least one other pivot image is provided to the machine learning model as an input [Para. 27 and 70].
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Chhaya to include the features taught by Yu, because the modification enables the system to improve the quality, efficiency, and semantic controllability of video reconstruction by using a small set of intelligently chosen pivot frames plus rich natural language descriptors instead of needing the full original video.
Regarding claim 28, Chhaya does not explicitly teach the claim limitation.
However, Yu teaches wherein the descriptor further includes a second natural language description of objects within the pivot image [fig. 1, 3 and related description].
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Chhaya to include the features taught by Yu, because the modification enables the system to improve the quality, efficiency, and semantic controllability of video reconstruction by using a small set of intelligently chosen pivot frames plus rich natural language descriptors instead of needing the full original video.
Regarding claim 29, Chhaya does not explicitly teach the claim limitation.
However, Yu teaches wherein causing the second machine learning model to generate the reconstructed video further comprises causing the second machine learning model to reconstruct a first version of the video [fig. 2, 3 and related description].
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Chhaya to include the features taught by Yu, because the modification enables the system to improve the quality, efficiency, and semantic controllability of video reconstruction by using a small set of intelligently chosen pivot frames plus rich natural language descriptors instead of needing the full original video.
Claims 30 and 32 are rejected under 35 U.S.C. 103 as being unpatentable over Chhaya et al. (Pub. No. US 2023/0290146) in view of YIN et al. (Pub. No. US 2025/0088675).
Regarding claim 30, Chhaya does not explicitly teach the claim limitation.
However, YIN teaches wherein causing the second machine learning model to generate the reconstructed video further comprises combining a plurality of images generated by the second machine learning model based at least in part on the pivot image and the descriptor [Para. 82].
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Chhaya to include the features taught by YIN, because the modification enables the system to improve the quality, efficiency, and semantic controllability of video reconstruction by using a small set of intelligently chosen pivot frames plus rich natural language descriptors instead of needing the full original video.
Regarding claim 32, Chhaya does not explicitly teach the claim limitation.
However, YIN teaches wherein causing the second machine learning model to generate the reconstructed video further comprises providing, as an input, a plurality of pivot images and a plurality of descriptors to the second machine learning model [fig. 2, 4 and related description].
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Chhaya to include the features taught by YIN, because the modification enables the system to improve the quality, efficiency, and semantic controllability of video reconstruction by using a small set of intelligently chosen pivot frames plus rich natural language descriptors instead of needing the full original video.
Claim 31 is rejected under 35 U.S.C. 103 as being unpatentable over Chhaya et al. (Pub. No. US 2023/0290146) in view of YIN et al. (Pub. No. US 2025/0088675) further in view of Yu et al. (Pub. No. US 2017/0127016).
Regarding claim 31, Chhaya in view of YIN does not explicitly teach the claim limitation.
However, Yu teaches wherein the descriptor further includes a second natural language description of objects within the pivot image [fig. 1, 3 and related description].
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Chhaya in view of YIN to include the features taught by Yu, because the modification enables the system to improve the quality, efficiency, and semantic controllability of video reconstruction by using a small set of intelligently chosen pivot frames plus rich natural language descriptors instead of needing the full original video.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SOLOMON G BEZUAYEHU whose telephone number is (571)270-7452. The examiner can normally be reached on Monday-Friday 10 AM-7 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, O’Neal Mistry can be reached on 313-446-4912. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-0101 (IN USA OR CANADA) or 571-272-1000.
/SOLOMON G BEZUAYEHU/ Primary Examiner, Art Unit 2666