Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
This action is responsive to the Application filed on 11/04/2025.
Claims 1-20 are pending in the case. Claims 1 and 11 are independent claims. Claims 1-2, 4, 11-12, and 14 are currently amended.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 11/04/2025 has been entered.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claim(s) 1-20 are rejected under 35 U.S.C. § 101 because the claimed invention is directed to an abstract idea without significantly more.
When considering subject matter eligibility under 35 U.S.C. 101, it must be determined whether the claim is directed to one of the four statutory categories of invention, i.e., process, machine, manufacture, or composition of matter (Step 1). If the claim does fall within one of the statutory categories, the second step in the analysis is to determine whether the claim is directed to a judicial exception (Step 2A). The Step 2A analysis is broken into two prongs. In the first prong (Step 2A, Prong 1), it is determined whether or not the claims recite a judicial exception (e.g., mathematical concepts, mental processes, certain methods of organizing human activity). If it is determined in Step 2A, Prong 1 that the claims recite a judicial exception, the analysis proceeds to the second prong (Step 2A, Prong 2), where it is determined whether or not the claims integrate the judicial exception into a practical application. If it is determined at step 2A, Prong 2 that the claims do not integrate the judicial exception into a practical application, the analysis proceeds to determining whether the claim is a patent-eligible application of the exception (Step 2B). If an abstract idea is present in the claim, any element or combination of elements in the claim must be sufficient to ensure that the claim integrates the judicial exception into a practical application, or else amounts to significantly more than the abstract idea itself. Applicant is advised to consult the 2019 PEG for more details of the analysis.
Step 1 Analysis: Is the claim to a process, machine, manufacture or composition of matter? See MPEP § 2106.03.
Claim(s) 1-10 are drawn to a system and claims 11-20 are drawn to a method; therefore, each of these claim groups falls within one of the four categories of statutory subject matter (machine, process, manufacture, or composition of matter; Step 1). Nonetheless, the claims are directed to a judicially recognized exception, i.e., an abstract idea, without significantly more (Step 2A, see below). Independent claims 1 and 11 are non-verbatim but similar in claim construction, and hence share the same rationale that the claimed inventions are directed to non-statutory subject matter, as follows:
Regarding claim 1:
Claim 1 recites: A system comprising:
a processing hardware; and
a system memory storing a software code and a machine learning (ML) model based embedder trained using contrastive learning based on a similarity metric to map each of a plurality of video segments to a respective embedding in a continuous vector space;
the processing hardware configured to execute the software code to:
receive an input including the plurality of video segments;
map, using the ML model based embedder, each of the plurality of video segments to the respective embedding in the continuous vector space to provide a plurality of mapped embeddings corresponding respectively to the plurality of video segments;
perform, using the ML model based embedder or another trained ML model, one of a classification or a regression of the plurality of video segments using the plurality of mapped embeddings;
classify, based on the one of the classification or the regression, the plurality of content video segments as being a first content type among a plurality of content types;
determine an encoding schema suitable for encoding the first content type; and
encode the plurality of video segments using the encoding schema determined to be suitable for encoding the first content type
Step 2A Prong One Analysis: Does the claim recite an abstract idea, law of nature, or natural phenomenon? See MPEP § 2106.04(II)(A)(1).
Claim 1 is directed to an abstract idea, specifically, a mental process that can practically be performed in the human mind, with or without the use of a physical aid such as pen and paper (including an observation, evaluation, judgment, or opinion). See MPEP § 2106.04(a)(2)(III). The claim also recites a mathematical concept; “a mathematical calculation is a mathematical operation (such as multiplication) or an act of calculating using mathematical methods to determine a variable or number.” See MPEP § 2106.04(a)(2)(I)(C).
Independent claim 1 recites in part:
map, […] each of the plurality of video segments to the respective embedding in the continuous vector space to provide a plurality of mapped embeddings corresponding respectively to the plurality of video segments
The limitation above is broadly and reasonably interpreted as a mental and mathematical concept. For example, one can map each video segment by hand, performing a mathematical transformation of data into numerical vector representations. See MPEP § 2106.04(a)(2)(III) and § 2106.04(a)(2)(I)(C).
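Purely as an illustration of this characterization, and not as part of the record, the recited mapping and similarity metric reduce to elementary arithmetic of the kind that could be carried out with pen and paper. The feature values and segment names below are hypothetical:

```python
import math

# Hypothetical per-segment feature values (illustrative only; not from the record).
segments = {"seg_a": [0.2, 0.9, 0.4], "seg_b": [0.1, 0.8, 0.5]}

def embed(features):
    # The "mapping" here is a simple normalization of features into a
    # continuous vector space (unit vectors).
    norm = math.sqrt(sum(x * x for x in features))
    return [x / norm for x in features]

embeddings = {name: embed(f) for name, f in segments.items()}

def cosine_similarity(u, v):
    # A similarity metric of the kind recited: a dot product of
    # unit-normalized vectors.
    return sum(a * b for a, b in zip(u, v))

score = cosine_similarity(embeddings["seg_a"], embeddings["seg_b"])
```

Each step above is a mathematical calculation (normalization, multiplication, summation) applied to numerical representations of data.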
classify, based on the one of the classification or the regression, the plurality of content video segments as being a first content type among a plurality of content types
The limitation above is broadly and reasonably interpreted as a mental concept. For example, with pen and paper one can sort video segments into one main type from a group of different types. See MPEP § 2106.04(a)(2)(III).
determine an encoding schema suitable for encoding the first content type
The limitation above is broadly and reasonably interpreted as a mental concept. For example, one can mentally evaluate information and make a judgment. See MPEP § 2106.04(a)(2)(III).
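Again purely as an illustration (the content-type names and encoding parameters below are hypothetical, not drawn from the claims or specification), the "determine an encoding schema" step, as broadly claimed, amounts to a simple lookup of a stored association:

```python
# Hypothetical mapping from content type to encoding parameters
# (codec names and bitrates are assumptions, not from the record).
ENCODING_SCHEMAS = {
    "animation": {"codec": "av1", "bitrate_kbps": 1500},
    "sports":    {"codec": "h264", "bitrate_kbps": 4000},
    "talk_show": {"codec": "h265", "bitrate_kbps": 1000},
}

def determine_schema(content_type):
    # "Determine an encoding schema suitable for encoding the first
    # content type" reduces, as broadly claimed, to consulting a table.
    return ENCODING_SCHEMAS[content_type]
```

Such a lookup is the sort of evaluation-and-judgment that could also be made mentally or with a written table.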
Step 2A Prong Two Analysis: Does the claim recite additional elements that integrate the judicial exception into a practical application? See MPEP § 2106.04(d).
Independent claim 1 recites in part:
A system comprising:
a processing hardware; and
a system memory storing a software code and a machine learning (ML) model based embedder trained using contrastive learning based on a similarity metric to map each of a plurality of video segments to a respective embedding in a continuous vector space, as drafted, does not integrate the judicial exception into a practical application. In particular, the claim recites only generic computing components. Such generic computing components are recited at a high level of generality (i.e., as a generic processor performing data gathering and mathematical calculations) such that they amount to no more than mere instructions to apply the exception using generic computer components.
the processing hardware configured to execute the software code to:
receive an input including the plurality of video segments, as drafted, does not integrate the judicial exception into a practical application. In particular, the claim recites only generic computing components. Such generic computing components are recited at a high level of generality (i.e., as a generic processor performing data gathering and mathematical calculations) such that they amount to no more than mere instructions to apply the exception using generic computer components.
[…] using the ML model based embedder, […], as drafted, amounts to adding the words “apply it” (or an equivalent) to the judicial exception and reciting only the idea of a solution or outcome; i.e., the claim fails to recite details of how a solution to a problem is accomplished, because it is unclear how the “ML model” is used, and the specification does not make clear how these actions are performed. Thus, these additional elements are recited in a manner that represents no more than mere instructions to apply the judicial exception on a computer. See MPEP § 2106.05(f) and § 2106.04(d).
perform, using the ML model based embedder or another trained ML model, one of a classification or a regression of the plurality of video segments using the plurality of mapped embeddings, as drafted, amounts to adding the words “apply it” (or an equivalent) to the judicial exception and reciting only the idea of a solution or outcome; i.e., the claim fails to recite details of how a solution to a problem is accomplished, because it is unclear how the “ML model” is used, and the specification does not make clear how these actions are performed. Thus, these additional elements are recited in a manner that represents no more than mere instructions to apply the judicial exception on a computer. See MPEP § 2106.05(f) and § 2106.04(d).
encode the plurality of video segments using the encoding schema determined to be suitable for encoding the first content type, as drafted, amounts to adding the words “apply it” (or an equivalent) to the judicial exception and reciting only the idea of a solution or outcome; i.e., the claim fails to recite details of how a solution to a problem is accomplished, because it is unclear how the “encoding schema” is used, and the specification does not make clear how these actions are performed. Thus, these additional elements are recited in a manner that represents no more than mere instructions to apply the judicial exception on a computer. See MPEP § 2106.05(f) and § 2106.04(d).
Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea when considered as an ordered combination and as a whole.
Step 2B Analysis: Does the claim recite additional elements that amount to significantly more than the judicial exception? See MPEP § 2106.05.
First, additional elements that generally link the use of a judicial exception to a particular technological environment or field of use are insufficient to transform the judicial exception into patent-eligible subject matter, because the claimed limitations merely link the judicial exception to that technological environment. See MPEP § 2106.05(h). However, they are addressed below for the sake of completeness.
Second, additional elements amounting to mere application of the abstract idea, or mere instructions to implement an abstract idea on a computer, are insufficient to transform the judicial exception into patent-eligible subject matter, because the limitations merely apply the judicial exception using a generic computer and/or process. See MPEP § 2106.05(f). However, they are addressed below for the sake of completeness.
Independent claim 1 recites in part:
A system comprising:
a processing hardware; and
a system memory storing a software code and a machine learning (ML) model based embedder trained using contrastive learning based on a similarity metric to map each of a plurality of video segments to a respective embedding in a continuous vector space, as drafted, does not integrate the judicial exception into a practical application. In particular, the claim recites only generic computing components. Such generic computing components are recited at a high level of generality (i.e., as a generic processor performing data gathering and mathematical calculations) such that they amount to no more than mere instructions to apply the exception using generic computer components.
the processing hardware configured to execute the software code to:
receive an input including the plurality of video segments, as drafted, does not integrate the judicial exception into a practical application. In particular, the claim recites only generic computing components. Such generic computing components are recited at a high level of generality (i.e., as a generic processor performing data gathering and mathematical calculations) such that they amount to no more than mere instructions to apply the exception using generic computer components.
[…] using the ML model based embedder, […], as drafted, amounts to adding the words “apply it” (or an equivalent) to the judicial exception and reciting only the idea of a solution or outcome; i.e., the claim fails to recite details of how a solution to a problem is accomplished, because it is unclear how the “ML model” is used, and the specification does not make clear how these actions are performed. Thus, these additional elements are recited in a manner that represents no more than mere instructions to apply the judicial exception on a computer. See MPEP § 2106.05(f) and § 2106.04(d).
perform, using the ML model based embedder or another trained ML model, one of a classification or a regression of the plurality of video segments using the plurality of mapped embeddings, as drafted, amounts to adding the words “apply it” (or an equivalent) to the judicial exception and reciting only the idea of a solution or outcome; i.e., the claim fails to recite details of how a solution to a problem is accomplished, because it is unclear how the “ML model” is used, and the specification does not make clear how these actions are performed. Thus, these additional elements are recited in a manner that represents no more than mere instructions to apply the judicial exception on a computer. See MPEP § 2106.05(f) and § 2106.04(d).
encode the plurality of video segments using the encoding schema determined to be suitable for encoding the first content type, as drafted, amounts to adding the words “apply it” (or an equivalent) to the judicial exception and reciting only the idea of a solution or outcome; i.e., the claim fails to recite details of how a solution to a problem is accomplished, because it is unclear how the “encoding schema” is used, and the specification does not make clear how these actions are performed. Thus, these additional elements are recited in a manner that represents no more than mere instructions to apply the judicial exception on a computer. See MPEP § 2106.05(f) and § 2106.04(d).
Thus, considering the additional elements individually and in combination and the claims as a whole, the additional elements do not provide significantly more than the abstract idea. The claims are not eligible subject matter.
Therefore, in examining the elements recited by the limitations individually and as an ordered combination, the independent claim limitations, as a whole, do not recite what the courts have identified as “significantly more.”
Regarding claim 11:
Claim 11 recites: A method for use by a system including a processing hardware, and a system memory storing a software code and a machine learning (ML) model based embedder trained using contrastive learning based on a similarity metric to map each of a plurality of video segments to a respective embedding in a continuous vector space, the method comprising:
receiving, by the software code executed by the processing hardware, an input including the plurality of video segments;
mapping, by the software code executed by the processing hardware and using the ML model based embedder, each of the plurality of video segments to the respective embedding in the continuous vector space to provide a plurality of mapped embeddings corresponding respectively to the plurality of video segments;
performing, using the ML model based embedder or another trained ML model, one of a classification or a regression of the content plurality of video segments, by the software code executed by the processing hardware, using the plurality of mapped embeddings;
classifying, by the software code executed by the processing hardware based on the one of the classification or the regression, the plurality of video segments as being a first content type among a plurality of content types;
determining, by the software code executed by the processing hardware, an encoding schema suitable for encoding the first content type; and
encoding, by the software code executed by the processing hardware, the plurality of video segments using the encoding schema determined to be suitable for encoding the first content type
Step 2A Prong One Analysis: Does the claim recite an abstract idea, law of nature, or natural phenomenon? See MPEP § 2106.04(II)(A)(1).
Independent claim 11 recites in part:
mapping, […] each of the plurality of video segments to the respective embedding in the continuous vector space to provide a plurality of mapped embeddings corresponding respectively to the plurality of video segments
The limitation above is broadly and reasonably interpreted as a mental and mathematical concept. For example, one can map each video segment by hand, performing a mathematical transformation of data into numerical vector representations. See MPEP § 2106.04(a)(2)(III) and § 2106.04(a)(2)(I)(C).
classifying, by the software code executed by the processing hardware based on the one of the classification or the regression, the plurality of video segments as being a first content type among a plurality of content types
The limitation above is broadly and reasonably interpreted as a mental concept. For example, with pen and paper one can sort video segments into one main type from a group of different types. See MPEP § 2106.04(a)(2)(III).
determining, […] an encoding schema suitable for encoding the first content type
The limitation above is broadly and reasonably interpreted as a mental concept. For example, one can mentally evaluate information and make a judgment. See MPEP § 2106.04(a)(2)(III).
Step 2A Prong Two Analysis: Does the claim recite additional elements that integrate the judicial exception into a practical application? See MPEP § 2106.04(d).
Independent claim 11 recites in part:
A method for use by a system including a processing hardware, and a system memory storing a software code and a machine learning (ML) model based embedder trained using contrastive learning based on a similarity metric to map each of a plurality of video segments to a respective embedding in a continuous vector space, the method comprising, as drafted, does not integrate the judicial exception into a practical application. In particular, the claim recites only generic computing components. Such generic computing components are recited at a high level of generality (i.e., as a generic processor performing data gathering and mathematical calculations) such that they amount to no more than mere instructions to apply the exception using generic computer components.
receiving, by the software code executed by the processing hardware, an input including the plurality of video segments, as drafted, does not integrate the judicial exception into a practical application. In particular, the claim recites only generic computing components. Such generic computing components are recited at a high level of generality (i.e., as a generic processor performing data gathering and mathematical calculations) such that they amount to no more than mere instructions to apply the exception using generic computer components.
[…] by the software code executed by the processing hardware and using the ML model based embedder, […], as drafted, amounts to adding the words “apply it” (or an equivalent) to the judicial exception and reciting only the idea of a solution or outcome; i.e., the claim fails to recite details of how a solution to a problem is accomplished, because it is unclear how the “ML model” is used, and the specification does not make clear how these actions are performed. Thus, these additional elements are recited in a manner that represents no more than mere instructions to apply the judicial exception on a computer. See MPEP § 2106.05(f) and § 2106.04(d).
performing, using the ML model based embedder or another trained ML model, one of a classification or a regression of the content plurality of video segments, by the software code executed by the processing hardware, using the plurality of mapped embeddings, as drafted, amounts to adding the words “apply it” (or an equivalent) to the judicial exception and reciting only the idea of a solution or outcome; i.e., the claim fails to recite details of how a solution to a problem is accomplished, because it is unclear how the “ML model” is used, and the specification does not make clear how these actions are performed. Thus, these additional elements are recited in a manner that represents no more than mere instructions to apply the judicial exception on a computer. See MPEP § 2106.05(f) and § 2106.04(d).
encoding, by the software code executed by the processing hardware, the plurality of video segments using the encoding schema determined to be suitable for encoding the first content type, as drafted, amounts to adding the words “apply it” (or an equivalent) to the judicial exception and reciting only the idea of a solution or outcome; i.e., the claim fails to recite details of how a solution to a problem is accomplished, because it is unclear how the “encoding schema” is used, and the specification does not make clear how these actions are performed. Thus, these additional elements are recited in a manner that represents no more than mere instructions to apply the judicial exception on a computer. See MPEP § 2106.05(f) and § 2106.04(d).
Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea when considered as an ordered combination and as a whole.
Step 2B Analysis: Does the claim recite additional elements that amount to significantly more than the judicial exception? See MPEP § 2106.05.
First, additional elements that generally link the use of a judicial exception to a particular technological environment or field of use are insufficient to transform the judicial exception into patent-eligible subject matter, because the claimed limitations merely link the judicial exception to that technological environment. See MPEP § 2106.05(h). However, they are addressed below for the sake of completeness.
Second, additional elements amounting to mere application of the abstract idea, or mere instructions to implement an abstract idea on a computer, are insufficient to transform the judicial exception into patent-eligible subject matter, because the limitations merely apply the judicial exception using a generic computer and/or process. See MPEP § 2106.05(f). However, they are addressed below for the sake of completeness.
Independent claim 11 recites in part:
A method for use by a system including a processing hardware, and a system memory storing a software code and a machine learning (ML) model based embedder trained using contrastive learning based on a similarity metric to map each of a plurality of video segments to a respective embedding in a continuous vector space, the method comprising, as drafted, does not integrate the judicial exception into a practical application. In particular, the claim recites only generic computing components. Such generic computing components are recited at a high level of generality (i.e., as a generic processor performing data gathering and mathematical calculations) such that they amount to no more than mere instructions to apply the exception using generic computer components.
receiving, by the software code executed by the processing hardware, an input including the plurality of video segments, as drafted, does not integrate the judicial exception into a practical application. In particular, the claim recites only generic computing components. Such generic computing components are recited at a high level of generality (i.e., as a generic processor performing data gathering and mathematical calculations) such that they amount to no more than mere instructions to apply the exception using generic computer components.
[…] by the software code executed by the processing hardware and using the ML model based embedder, […], as drafted, amounts to adding the words “apply it” (or an equivalent) to the judicial exception and reciting only the idea of a solution or outcome; i.e., the claim fails to recite details of how a solution to a problem is accomplished, because it is unclear how the “ML model” is used, and the specification does not make clear how these actions are performed. Thus, these additional elements are recited in a manner that represents no more than mere instructions to apply the judicial exception on a computer. See MPEP § 2106.05(f) and § 2106.04(d).
performing, using the ML model based embedder or another trained ML model, one of a classification or a regression of the content plurality of video segments, by the software code executed by the processing hardware, using the plurality of mapped embeddings, as drafted, amounts to adding the words “apply it” (or an equivalent) to the judicial exception and reciting only the idea of a solution or outcome; i.e., the claim fails to recite details of how a solution to a problem is accomplished, because it is unclear how the “ML model” is used, and the specification does not make clear how these actions are performed. Thus, these additional elements are recited in a manner that represents no more than mere instructions to apply the judicial exception on a computer. See MPEP § 2106.05(f) and § 2106.04(d).
encoding, by the software code executed by the processing hardware, the plurality of video segments using the encoding schema determined to be suitable for encoding the first content type, as drafted, amounts to adding the words “apply it” (or an equivalent) to the judicial exception and reciting only the idea of a solution or outcome; i.e., the claim fails to recite details of how a solution to a problem is accomplished, because it is unclear how the “encoding schema” is used, and the specification does not make clear how these actions are performed. Thus, these additional elements are recited in a manner that represents no more than mere instructions to apply the judicial exception on a computer. See MPEP § 2106.05(f) and § 2106.04(d).
Thus, considering the additional elements individually and in combination and the claims as a whole, the additional elements do not provide significantly more than the abstract idea. The claims are not eligible subject matter.
Therefore, in examining the elements recited by the limitations individually and as an ordered combination, the independent claim limitations, as a whole, do not recite what the courts have identified as “significantly more.”
Furthermore, regarding dependent claims 2-10, which depend from claim 1, and claims 12-20, which depend from claim 11, the claims are directed to a judicial exception without significantly more, as shown below by evaluating the claim limitations under Steps 2A and 2B:
Claims 2 and 12 are dependent on claims 1 and 11, respectively, and include an additional element that amounts to adding the words “apply it” (or an equivalent) to the judicial exception, or merely uses a computer in its ordinary capacity as a tool to perform an existing process. See MPEP §§ 2106.04(d), 2106.05(f)(2).
Claims 3 and 13 are dependent on claims 1 and 11, respectively, and recite a mental concept, i.e., a concept that can be performed with pen and paper (including an observation, evaluation, judgment, or opinion). For example, with pen and paper one can group similar data points corresponding to a distinct category. See MPEP § 2106.04(a)(2)(III).
Claims 4 and 14 are dependent on claims 3 and 13, respectively, and include an additional element that amounts to adding the words “apply it” (or an equivalent) to the judicial exception and reciting only the idea of a solution or outcome; i.e., the claim fails to recite details of how a solution to a problem is accomplished, because it is unclear how the “NN” or “unsupervised process” is used, and the specification does not make clear how these actions are performed. Thus, these additional elements are recited in a manner that represents no more than mere instructions to apply the judicial exception on a computer. See MPEP § 2106.05(f) and § 2106.04(d).
Claims 5 and 15 are dependent on claims 1 and 11, respectively, and include an additional element that amounts to adding the words “apply it” (or an equivalent) to the judicial exception and reciting only the idea of a solution or outcome; i.e., the claim fails to recite details of how a solution to a problem is accomplished, because it is unclear how the “NN” or “CNN” is used, and the specification does not make clear how these actions are performed. Thus, these additional elements are recited in a manner that represents no more than mere instructions to apply the judicial exception on a computer. See MPEP § 2106.05(f) and § 2106.04(d).
Claims 6 and 16 are dependent on claims 1 and 11 respectively, and include an additional element that generally links the use of the judicial exception to a particular technological environment or field of use. See MPEP §§ 2106.04(d), 2106.05(h).
Claims 7 and 17 are dependent on claims 1 and 11 respectively, and include an additional element that amounts to adding insignificant extra-solution activity to the judicial exception. See MPEP §§ 2106.04(d), 2106.05(g).
Claims 8 and 18 are dependent on claims 1 and 11, respectively, and include additional elements recited in a manner that represents no more than mere instructions to apply the judicial exception on a computer. See MPEP § 2106.05(f) and § 2106.04(d).
Claims 9 and 19 are dependent on claims 1 and 11, respectively, and include additional elements recited in a manner that represents no more than mere instructions to apply the judicial exception on a computer. See MPEP § 2106.05(f) and § 2106.04(d).
Claims 10 and 20 are dependent on claims 1 and 11, respectively, and include additional elements recited in a manner that represents no more than mere instructions to apply the judicial exception on a computer. See MPEP § 2106.05(f) and § 2106.04(d).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-4, 7, 9, 11-14, 17, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Rotman et al. (US Patent No. 11,450,111 B1), hereinafter referred to as Rotman, in view of Lin et al. (US Patent No. 11,042,798 B2), hereinafter referred to as Lin.
With respect to claim 1, Rotman discloses:
A system comprising: a processing hardware (In Fig. 8 and Col. 3-4, lines 67-2, Rotman discloses external hardware components.)
a system memory storing a software code and a machine learning (ML) model based embedder trained using contrastive learning based on a similarity metric to map each of a plurality of video segments to a respective embedding in a continuous vector space (In Col. 3, lines 15-26, Rotman discloses a technical process involving machine learning (neural network) and video analysis, wherein a Siamese network learns to differentiate between data points by comparing them. It learns how to measure the “distance” (similarity or dissimilarity) between features extracted from video inputs. In addition, the passage discusses a machine-learning approach for analyzing and segmenting video content based on a “similarity metric” that integrates both visual and textual information. In Col. 21, lines 23-27, Rotman discloses that memory and persistent storage are a computer-readable storage medium that can be a tangible device that can retain and store instructions for use by an instruction execution device. In Col. 22, lines 48-56, Rotman further discloses that the computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer.)
the processing hardware configured to execute the software code to: receive an input including the plurality of video segments (In Col. 7, lines 20-31, Rotman discloses receiving a plurality of feature vectors corresponding to audio components of a video presentation.)
map, using the ML model based embedder, each of the plurality of video segments to the respective embedding in the continuous vector space to provide a plurality of mapped embeddings corresponding respectively to the plurality of video segments (In FIGS. 5A-5B and Col. 16, lines 14-21, Rotman discloses that the neural network is designed to produce “feature space embeddings” for video segments, which summarize their visual and audio characteristics. The block-diagonal structure of the resulting distance matrix implies that segments of the same scene will have shorter distances (they are closer in the feature space) compared to segments from different scenes. This structure helps in the effective analysis of video content.)
perform, using the ML model based embedder or another trained ML model, one of a classification or a regression of the plurality of video segments using the plurality of mapped embeddings (In Col. 16, lines 14-35, Rotman discloses that the machine learning model clusters video segments belonging to the same class (or category) close together in the feature space embedding, whereas segments from different categories are spaced further apart.)
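For context only, the embedder-and-contrastive-training arrangement described in the Rotman passages cited above can be sketched as follows. This is an illustrative toy, not code from Rotman or from the claimed invention; the linear `embed` map, its weight layout, and the `contrastive_loss` margin are hypothetical stand-ins for the Siamese network.

```python
import math

def embed(features, weight_columns):
    """Toy ML-model-based embedder: a linear map into a continuous
    vector space, L2-normalized. A real Siamese network is a deep
    model, but the input/output contract sketched here is the same."""
    vec = [sum(f * w for f, w in zip(features, col)) for col in weight_columns]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

def contrastive_loss(emb_a, emb_b, same_class, margin=1.0):
    """Contrastive objective: pull same-class embeddings together and
    push different-class embeddings at least `margin` apart."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(emb_a, emb_b)))
    return dist ** 2 if same_class else max(0.0, margin - dist) ** 2
```

Under this objective, same-scene segments incur loss when far apart, while different-scene segments incur loss only while they remain closer than the margin.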
With respect to claim 1, Rotman does not explicitly disclose:
classify, based on the one of the classification or the regression, the plurality of video segments as being a first content type among a plurality of content types
determine an encoding schema suitable for encoding the first content type
encode the plurality of video segments using the encoding schema determined to be suitable for encoding the first content type
However, Lin discloses:
Classify, based on the one of the classification or the regression, the plurality of video segments as being a first content type among a plurality of content types (In Col. 19, lines 47–61, Lin discloses a neural network to learn features by learning to classify a plurality of content items based at least partially on the first or the second feature of the content item.)
Determine an encoding schema suitable for encoding the first content type (In Col. 4-5, lines 60–2, Lin discloses the latent factor that encodes information about the similarity of content items or the semantics of content items (e.g., description or meaning associated with the content items). In Col. 18, lines 7–21, Lin discloses the machine-learning application learns or determines a first latent factor for the content item based on the equation above by using the latent factor module 204 of FIG. 2. The first latent factor includes a content item latent factor or a text query latent factor.)
Encode the plurality of video segments using the encoding schema determined to be suitable for encoding the first content type (In Col. 18, lines 7–21, Lin discloses the machine-learning application learns or determines a first latent factor for the content item based on the equation above by using the latent factor module 204 of FIG. 2. The first latent factor includes a content item latent factor or a text query latent factor.)
Rotman and Lin are analogous art because both references concern video scene detection models that learn representations of visual, textual, and audio features in video scenes. Accordingly, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Rotman's encoding of the visual and audio components of video segments into vectors for input into a machine learning model, as taught by Rotman, with the classification of a content item by its first latent factor, as taught by Lin. The motivation for doing so would have been to improve the accuracy of video scene detection models that divide video segments into scenes and further classify those scenes (see Col. 2-3, lines 62-1 of Rotman).
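To make the "determine an encoding schema suitable for the first content type" limitation concrete, the combination described above reduces to a lookup from classified content type to encoder settings. The sketch below is purely hypothetical; the `ENCODING_SCHEMAS` table and its codec/tune/crf values are illustrative x264-style parameters, not settings disclosed by Rotman or Lin.

```python
# Hypothetical schema table keyed by classified content type; the
# codec/tune/crf values are illustrative, not from the references.
ENCODING_SCHEMAS = {
    "animation": {"codec": "h264", "tune": "animation", "crf": 20},
    "live_action": {"codec": "h264", "tune": "film", "crf": 23},
}

def select_schema(content_type):
    """Return the encoding schema determined to be suitable for the
    content type produced by the classification step."""
    return ENCODING_SCHEMAS[content_type]
```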
Regarding claim 2, Rotman in view of Lin discloses the elements of claim 1. In addition, Lin discloses:
The system of claim 1, wherein the plurality of content types include a live action and an animation, and wherein the first content type is one of the live action or the animation (In Col. 6, lines 16-28, Lin discloses that the machine-learning application obtains the visual web data from the visual web data system. The visual web data includes data related to a text query provided by a user and an image resulting from a search of the visual web data system based on that text query.)
Regarding claim 3, Rotman in view of Lin discloses the elements of claim 1. In addition, Rotman discloses:
The system of claim 1, wherein the classification comprises grouping each of at least one of the plurality of mapped embeddings into one or more clusters each corresponding respectively to a distinct category of the similarity metric (In Col. 13, lines 14-39, Rotman discloses finding the optimal grouping of related video clips based on how similar they are. A distance matrix records how different or similar the clips are, and the program looks for groups of clips that share common features, identifying regions of the distance matrix with lower values, which suggest multiple similar clips in those regions. Specific equations compute the quantities used to measure these similarities, and these calculations help the program improve its grouping of video clips.)
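The grouping operation described in the citation above can be illustrated with a hypothetical greedy simplification. Rotman's optimal sequential grouping optimizes globally over the distance matrix, which this sketch does not attempt; it merely shows how consecutive segments can be assigned to a new cluster whenever the distance to the preceding segment exceeds a threshold.

```python
def sequential_group(D, threshold):
    """Greedy sequential grouping over a distance matrix D (list of
    lists): start a new cluster whenever a segment's distance to its
    predecessor exceeds `threshold`. Returns one cluster label per
    segment."""
    labels = [0]
    for i in range(1, len(D)):
        labels.append(labels[-1] + (1 if D[i][i - 1] > threshold else 0))
    return labels

# Four 1-D embeddings: two near 0, two near 5, giving two clusters.
points = [0.0, 0.1, 5.0, 5.1]
D = [[abs(p - q) for q in points] for p in points]
assert sequential_group(D, threshold=1.0) == [0, 0, 1, 1]
```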
Regarding claim 4, Rotman in view of Lin discloses the elements of claim 3. In addition, Lin discloses:
The system of claim [[3]] 1, wherein the processing hardware is further configured to execute the software code to: select, based on the first content type, a pre-processing algorithm for pre-processing the plurality of video segments (In Col. 7, lines 53-65, Lin discloses that the trained machine-learning model helps the application find and suggest similar content items (like images) to the user. For example, when the user picks a content item from a search result to view, the system lets the machine-learning application know about this choice. Then, the application finds more content items that are similar to what the user selected for the system to show the user.)
Pre-process the plurality of video segments using the selected pre-processing algorithm (In Col. 7, lines 37-39, Lin discloses that the user selects a content item associated with a result of a text query to view on the visual web data system.)
Regarding claim 7, Rotman in view of Lin discloses the elements of claim 1. In addition, Rotman discloses:
The system of claim 1, wherein the similarity metric comprises one of a quantitative similarity metric or a perceptual similarity metric (In Col. 3, lines 6-26, Rotman discloses that some solutions organize video shots into scenes with little supervision by comparing keyframes based on certain visual features: quantization helps group the shots, and a sliding-window technique decides whether shots or short scenes should be merged. More generally, deep learning features from the video are grouped into scenes, and linked shots are combined using an overlapping-links, graph-like method, which yields stronger features, although most methods evaluate from just one viewpoint. These solutions also learn a way to measure the distance between shots using a specific type of deep learning model, which helps define the boundaries of scenes, and connect visual information and text to measure similarity in the video. In Col. 7, lines 7-19, Rotman discloses that, in different versions of the invention, machine learning model 124 assesses how similar shots within the same scene are compared to shots from different scenes. The similarity between scenes is represented using a distance matrix, which typically results in a block-diagonal structure, indicating that shots from the same scene tend to have lower distance values than shots from different scenes. Furthermore, optimal sequence grouping program 126 employs an optimal sequential grouping algorithm to arrange the distance matrix, ensuring that the intra-scene distances are minimized.)
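The block-diagonal distance-matrix property the Rotman citations rely on can be demonstrated with a small, hypothetical example (toy 2-D points standing in for the network's feature space embeddings; this is not code from the reference):

```python
def distance_matrix(points):
    """Pairwise Euclidean distances between embeddings."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    return [[dist(p, q) for q in points] for p in points]

# Two "scenes": three embeddings near the origin, three near (5, 5).
scene_a = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]
scene_b = [[5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
D = distance_matrix(scene_a + scene_b)

# Block-diagonal structure: every intra-scene distance is smaller
# than every inter-scene distance.
intra = max(D[i][j] for i in range(3) for j in range(3))
inter = min(D[i][j] for i in range(3) for j in range(3, 6))
assert intra < inter
```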
Regarding claim 9, Rotman in view of Lin discloses the elements of claim 1. In addition, Rotman discloses:
The system of claim 1, wherein the one of the classification or the regression is performed using a respective one of a trained classification ML model or a trained regression ML model, and wherein the respective one of the trained classification ML model or the trained regression ML model comprises a trained neural network (NN) (In FIGS. 5A-5B and Col. 16, lines 14-21, Rotman discloses that the neural network is designed to produce “feature space embeddings” for video segments, which summarize their visual and audio characteristics. The block-diagonal structure of the resulting distance matrix implies that segments of the same scene will have shorter distances (they are closer in the feature space) compared to segments from different scenes. This structure helps in the effective analysis of video content.)
With respect to claim 11, Rotman discloses:
A method for use by a system including a processing hardware, and a system memory storing a software code and a machine learning (ML) model based embedder trained using contrastive learning based on a similarity metric to map each of a plurality of video segments to a respective embedding in a continuous vector space (In Col. 3, lines 15-26, Rotman discloses a technical process involving machine learning (neural network) and video analysis, wherein a Siamese network learns to differentiate between data points by comparing them. It learns how to measure the “distance” (similarity or dissimilarity) between features extracted from video inputs. In addition, the paragraph discusses a machine-learning approach for analyzing and segmenting video content based on a “similarity metric” that integrates both visual and textual information. In Col. 21, lines 23-27, Rotman discloses that memory and persistent storage are a computer-readable storage medium that can be a tangible device that can retain and store instructions for use by an instruction execution device. In Col. 22, lines 48-56, Rotman further discloses that the computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer.)
the method comprising: receiving, by the software code executed by the processing hardware, an input including the plurality of video segments (In Col. 7, lines 20-31, Rotman discloses receiving a plurality of feature vectors corresponding to audio components of a video presentation.)
mapping, by the software code executed by the processing hardware and using the ML model based embedder, each of the plurality of video segments to the respective embedding in the continuous vector space to provide a plurality of mapped embeddings corresponding respectively to the plurality of video segments (In FIGS. 5A-5B and Col. 16, lines 14-21, Rotman discloses that the neural network is designed to produce “feature space embeddings” for video segments, which summarize their visual and audio characteristics. The block-diagonal structure of the resulting distance matrix implies that segments of the same scene will have shorter distances (they are closer in the feature space) compared to segments from different scenes. This structure helps in the effective analysis of video content.)
performing, using the ML model based embedder or another trained ML model, one of a classification or a regression of the plurality of video segments, by the software code executed by the processing hardware, using the plurality of mapped embeddings (In Col. 16, lines 14-35, Rotman discloses that the machine learning model clusters video segments belonging to the same class (or category) close together in the feature space embedding, whereas segments from different categories are spaced further apart.)
With respect to claim 11, Rotman does not explicitly disclose:
classifying, by the software code executed by the processing hardware based on the one of the classification or the regression, the plurality of video segments as being a first content type among a plurality of content types
determining, by the software code executed by the processing hardware, an encoding schema suitable for encoding the first content type
encoding, by the software code executed by the processing hardware, the plurality of video segments using the encoding schema determined to be suitable for encoding the first content type
However, Lin discloses:
Classifying, by the software code executed by the processing hardware based on the one of the classification or the regression, the plurality of video segments as being a first content type among a plurality of content types (In Col. 19, lines 47–61, Lin discloses a neural network to learn features by learning to classify a plurality of content items based at least partially on the first or the second feature of the content item.)
Determining, by the software code executed by the processing hardware, an encoding schema suitable for encoding the first content type (In Col. 4-5, lines 60–2, Lin discloses the latent factor that encodes information about the similarity of content items or the semantics of content items (e.g., description or meaning associated with the content items). In Col. 18, lines 7–21, Lin discloses the machine-learning application learns or determines a first latent factor for the content item based on the equation above by using the latent factor module 204 of FIG. 2. The first latent factor includes a content item latent factor or a text query latent factor.)
Encoding, by the software code executed by the processing hardware, the plurality of video segments using the encoding schema determined to be suitable for encoding the first content type (In Col. 18, lines 7–21, Lin discloses the machine-learning application learns or determines a first latent factor for the content item based on the equation above by using the latent factor module 204 of FIG. 2. The first latent factor includes a content item latent factor or a text query latent factor.)
Rotman and Lin are analogous art because both references concern video scene detection models that learn representations of visual, textual, and audio features in video scenes. Accordingly, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Rotman's encoding of the visual and audio components of video segments into vectors for input into a machine learning model, as taught by Rotman, with the classification of a content item by its first latent factor, as taught by Lin. The motivation for doing so would have been to improve the accuracy of video scene detection models that divide video segments into scenes and further classify those scenes (see Col. 2-3, lines 62-1 of Rotman).
Regarding claim 12, Rotman in view of Lin discloses the elements of claim 11. In addition, Lin discloses:
The method of claim 11, wherein the plurality of content types include a live action and an animation, and wherein the first content type is one of the live action or the animation (In Col. 6, lines 16-28, Lin discloses that the machine-learning application obtains the visual web data from the visual web data system. The visual web data includes data related to a text query provided by a user and an image resulting from a search of the visual web data system based on that text query.)
Regarding claim 13, Rotman in view of Lin discloses the elements of claim 11. In addition, Rotman discloses:
The method of claim 11, wherein the classification comprises grouping each of at least one of the plurality of mapped embeddings into one or more clusters each corresponding respectively to a distinct category of the similarity metric (In Col. 13, lines 14-39, Rotman discloses finding the optimal grouping of related video clips based on how similar they are. A distance matrix records how different or similar the clips are, and the program looks for groups of clips that share common features, identifying regions of the distance matrix with lower values, which suggest multiple similar clips in those regions. Specific equations compute the quantities used to measure these similarities, and these calculations help the program improve its grouping of video clips.)
Regarding claim 14, Rotman in view of Lin discloses the elements of claim 13. In addition, Lin discloses:
The method of claim [[13]] 11, further comprising: selecting, based on the first content type, a pre-processing algorithm for pre-processing the plurality of video segments (In Col. 7, lines 53-65, Lin discloses that the trained machine-learning model helps the application find and suggest similar content items (like images) to the user. For example, when the user picks a content item from a search result to view, the system lets the machine-learning application know about this choice. Then, the application finds more content items that are similar to what the user selected for the system to show the user.)
Pre-processing the plurality of video segments using the selected pre-processing algorithm (In Col. 7, lines 37-39, Lin discloses that the user selects a content item associated with a result of a text query to view on the visual web data system.)
Regarding claim 17, Rotman in view of Lin discloses the elements of claim 11. In addition, Rotman discloses:
The method of claim 11, wherein the similarity metric comprises one of a quantitative similarity metric or a perceptual similarity metric (In Col. 3, lines 6-26, Rotman discloses that some solutions organize video shots into scenes with little supervision by comparing keyframes based on certain visual features: quantization helps group the shots, and a sliding-window technique decides whether shots or short scenes should be merged. More generally, deep learning features from the video are grouped into scenes, and linked shots are combined using an overlapping-links, graph-like method, which yields stronger features, although most methods evaluate from just one viewpoint. These solutions also learn a way to measure the distance between shots using a specific type of deep learning model, which helps define the boundaries of scenes, and connect visual information and text to measure similarity in the video. In Col. 7, lines 7-19, Rotman discloses that, in different versions of the invention, machine learning model 124 assesses how similar shots within the same scene are compared to shots from different scenes. The similarity between scenes is represented using a distance matrix, which typically results in a block-diagonal structure, indicating that shots from the same scene tend to have lower distance values than shots from different scenes. Furthermore, optimal sequence grouping program 126 employs an optimal sequential grouping algorithm to arrange the distance matrix, ensuring that the intra-scene distances are minimized.)
Regarding claim 19, Rotman in view of Lin discloses the elements of claim 11. In addition, Rotman discloses:
The method of claim 11, wherein the one of the classification or the regression is performed using a respective one of a trained classification ML model or a trained regression ML model, and wherein the respective one of the trained classification ML model or the trained regression ML model comprises a trained neural network (NN) (In FIGS. 5A-5B and Col. 16, lines 14-21, Rotman discloses that the neural network is designed to produce “feature space embeddings” for video segments, which summarize their visual and audio characteristics. The block-diagonal structure of the resulting distance matrix implies that segments of the same scene will have shorter distances (they are closer in the feature space) compared to segments from different scenes. This structure helps in the effective analysis of video content.)
Claims 5-6 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Rotman in view of Lin, and further in view of Anthony et al. (US Pub. No. 2020/0180647 A1), hereinafter referred to as Anthony.
Regarding claim 5, Rotman in view of Lin discloses the elements of claim 1. Rotman in view of Lin does not appear to explicitly disclose:
The system of claim 1, wherein the at least one ML model based embedder comprises at least one of a one-dimensional (1D) convolutional neural network (CNN), a two-dimensional (2D) CNN, or a three-dimensional (3D) CNN
However, Anthony discloses the limitation (In step 412, the collected statistics and images—such as video frames or segments captured by the vehicle's camera or sensor—are transmitted over network 104 to the model training system 112. This data is then used to train a machine learning model. For instance, the images and statistics can train a supervised learning model. This can include various types of models, such as a random forest regressor, a support vector regressor, a simple neural network, a deep convolutional neural network (CNN), a recurrent neural network (RNN), or a long short-term memory (LSTM) network with linear or nonlinear kernels that are two-dimensional or three-dimensional. These models are designed to work with labeled data that includes continuous values. They adapt their structure—such as weights and configurations—to minimize the difference between their predicted outputs for new inputs and the actual observed outputs, using the same methods applied during training.)
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teaching of Rotman in view of Lin, to include Anthony’s simulation of non-stationary traffic objects to tell apart real and fake images in a special setup. This helps make more realistic pictures of people for simulations, which are then used to figure out some hidden details as taught by Anthony (see [0049]).
Regarding claim 6, Rotman in view of Lin discloses the elements of claim 1. Rotman in view of Lin does not appear to explicitly disclose:
The system of claim 1, wherein the continuous vector space is multi-dimensional
However, Anthony discloses the limitation (In step 412, the collected statistics and images—such as video frames or segments captured by the vehicle's camera or sensor—are transmitted over network 104 to the model training system 112. This data is then used to train a machine learning model. For instance, the images and statistics can train a supervised learning model. This can include various types of models, such as a random forest regressor, a support vector regressor, a simple neural network, a deep convolutional neural network (CNN), a recurrent neural network (RNN), or a long short-term memory (LSTM) network with linear or nonlinear kernels that are two-dimensional or three-dimensional. These models are designed to work with labeled data that includes continuous values. They adapt their structure—such as weights and configurations—to minimize the difference between their predicted outputs for new inputs and the actual observed outputs, using the same methods applied during training.)
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teaching of Rotman in view of Lin, to include Anthony’s simulation of non-stationary traffic objects to tell apart real and fake images in a special setup. This helps make more realistic pictures of people for simulations, which are then used to figure out some hidden details as taught by Anthony (see [0049]).
Regarding claim 15, Rotman in view of Lin discloses the elements of claim 11. Rotman in view of Lin does not appear to explicitly disclose:
The method of claim 11, wherein the at least one ML model based embedder comprises at least one of a one-dimensional (1D) convolutional neural network (CNN), a two-dimensional (2D) CNN, or a three-dimensional (3D) CNN
However, Anthony discloses the limitation (In step 412, the collected statistics and images—such as video frames or segments captured by the vehicle's camera or sensor—are transmitted over network 104 to the model training system 112. This data is then used to train a machine learning model. For instance, the images and statistics can train a supervised learning model. This can include various types of models, such as a random forest regressor, a support vector regressor, a simple neural network, a deep convolutional neural network (CNN), a recurrent neural network (RNN), or a long short-term memory (LSTM) network with linear or nonlinear kernels that are two-dimensional or three-dimensional. These models are designed to work with labeled data that includes continuous values. They adapt their structure—such as weights and configurations—to minimize the difference between their predicted outputs for new inputs and the actual observed outputs, using the same methods applied during training.)
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teaching of Rotman in view of Lin, to include Anthony’s simulation of non-stationary traffic objects to tell apart real and fake images in a special setup. This helps make more realistic pictures of people for simulations, which are then used to figure out some hidden details as taught by Anthony (see [0049]).
Regarding claim 16, Rotman in view of Lin discloses the elements of claim 11. Rotman in view of Lin does not appear to explicitly disclose:
The method of claim 11, wherein the continuous vector space is multi-dimensional
However, Anthony discloses the limitation (In step 412, the collected statistics and images—such as video frames or segments captured by the vehicle's camera or sensor—are transmitted over network 104 to the model training system 112. This data is then used to train a machine learning model. For instance, the images and statistics can train a supervised learning model. This can include various types of models, such as a random forest regressor, a support vector regressor, a simple neural network, a deep convolutional neural network (CNN), a recurrent neural network (RNN), or a long short-term memory (LSTM) network with linear or nonlinear kernels that are two-dimensional or three-dimensional. These models are designed to work with labeled data that includes continuous values. They adapt their structure—such as weights and configurations—to minimize the difference between their predicted outputs for new inputs and the actual observed outputs, using the same methods applied during training.)
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teaching of Rotman in view of Lin, to include Anthony’s simulation of non-stationary traffic objects to tell apart real and fake images in a special setup. This helps make more realistic pictures of people for simulations, which are then used to figure out some hidden details as taught by Anthony (see [0049]).
Claims 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Rotman in view of Lin, and further in view of Wei (US Patent No. 10,798,399 B1), hereinafter referred to as Wei.
Regarding claim 8, Rotman in view of Lin discloses the elements of claim 1. Rotman in view of Lin does not appear to explicitly disclose:
The system of claim 1, wherein the one of the classification or the regression is performed using a respective one of a trained classification ML model or a trained regression ML model, and wherein the ML model based embedder and the respective one of the trained classification ML model or the trained regression ML model are trained independently of one another
However, Wei discloses the limitation (In Col. 7-8, lines 50-2, Wei discloses a learning database that stores training information for a system that sorts videos into different categories, keeping track of how each category is defined and what features each category has. The encoding module then takes the video and compresses it based on its category, with some information about how to compress the video stored in another database. The transmission module sends the compressed video to a device over the internet. The video compression system can have extra features or parts that perform different jobs, and some parts can work together or be split into further parts.)
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teaching of Rotman in view of Lin, to include Wei’s adaptive video compression to improve the operation and performance of the computing device(s) as taught by Wei (see Col. 4, lines 13-17).
Regarding claim 18, Rotman in view of Lin discloses the elements of claim 11. Rotman in view of Lin does not appear to explicitly disclose:
The method of claim 11, wherein the one of the classification or the regression is performed using a respective one of a trained classification ML model or a trained regression ML model, and wherein the ML model based embedder and the respective one of the trained classification ML model or the trained regression ML model are trained independently of one another
However, Wei discloses the limitation (In Col. 7-8, lines 50-2, Wei discloses a learning database that stores training information for a system that sorts videos into different categories, keeping track of how each category is defined and what features each category has. The encoding module then takes the video and compresses it based on its category, with some information about how to compress the video stored in another database. The transmission module sends the compressed video to a device over the internet. The video compression system can have extra features or parts that perform different jobs, and some parts can work together or be split into further parts.)
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teaching of Rotman in view of Lin, to include Wei’s adaptive video compression to improve the operation and performance of the computing device(s) as taught by Wei (see Col. 4, lines 13-17).
Claims 10 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Rotman in view of Lin, and further in view of Zhu (US Pub. No. 2021/0397941 A1), hereinafter referred to as Zhu.
Regarding claim 10, Rotman in view of Lin discloses the elements of claim 1. Rotman in view of Lin does not appear to explicitly disclose:
The system of claim 1, wherein the one of the classification or the regression is performed using a respective one of a classification block or a regression block of the ML model based embedder, and wherein the ML model based embedder including the respective one of the classification block or the regression block is trained using end-to-end learning
However, Zhu discloses the limitation (In paragraph [0024], Zhu discloses the machine learning model is trained with an end-to-end learning scheme that automatically integrates task-based evaluation criteria into the learning process.)
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teaching of Rotman in view of Lin, to include Zhu's task-oriented ML to improve the predictor, which collaboratively forms a virtuous circle for the learning of both the task-oriented estimator network and the predictor network.
Regarding claim 20, Rotman in view of Lin discloses the elements of claim 11 but does not appear to explicitly disclose:
The method of claim 11, wherein the one of the classification or the regression is performed using a respective one of a classification block or a regression block of the ML model based embedder, and wherein the ML model based embedder including the respective one of the classification block or the regression block and the trained NN is trained using end-to-end learning.
However, Zhu discloses this limitation (in paragraph [0024], Zhu discloses that the machine learning model is trained with an end-to-end learning scheme that automatically integrates task-based evaluation criteria into the learning process).
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teaching of Rotman in view of Lin, to include Zhu's task-oriented ML to improve the predictor, which collaboratively forms a virtuous circle for the learning of both the task-oriented estimator network and the predictor network.
Response to Arguments
The applicant's arguments filed 11/04/2025 have been fully considered but are not persuasive in part.
Pertaining to Rejection under 101
The examiner respectfully remains convinced that claim 1 does not overcome the rejection under 35 U.S.C. 101. Under Step 2A, Prong 1, limitations such as "using contrastive learning based on a similarity metric…" recite a mathematical concept. Likewise, classifying, "based on the one of the classification or the regression, the plurality of content video segments as being a first content type among a plurality of content types," is reasonably interpreted as a mental process; for example, with pen and paper one can sort video segments into one main type from a group of different types. See MPEP § 2106.04(a)(2)(III). Under Step 2A, Prong 2, limitations such as "the processing hardware configured to execute the software code to: receive an input including a plurality of content segments" do not integrate the judicial exception into a practical application. In particular, the claim recites only generic computing components, and those components are recited at a high level of generality (i.e., as a generic processor performing data gathering and mathematical calculations) such that they amount to no more than mere instructions to apply the exception using generic computer components.
Pertaining to the improvement of technology
The examiner believes that, under 35 U.S.C. 101, Step 2A, Prong 2, the claims do not show an improvement to a computer or to another technology field; rather, they merely use a computer (e.g., an ML model) to perform analysis that includes mapping video segments to embeddings, performing classification or regression, determining content type, selecting an encoding schema, and then encoding the video. Although these steps involve a machine learning model and video processing, the claim does not actually change or improve how the machine learning model operates, nor does it modify the underlying video encoding technology itself. (See MPEP § 2106.04 for further explanation of evaluating improvements in the functioning of a computer or other technology.)
Pertaining to Rejection under 103
Applicant’s arguments with respect to claim(s) 1 and 11 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EVEL HONORE whose telephone number is (703) 756-1179. The examiner can normally be reached Monday through Friday, 8:00 a.m. to 5:30 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mariela D Reyes can be reached at (571) 270-1006. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
EVEL HONORE
Examiner
Art Unit 2142
/Mariela Reyes/Supervisory Patent Examiner, Art Unit 2142