Prosecution Insights
Last updated: April 19, 2026
Application No. 18/227,661

ADAPTIVE THRESHOLDING FOR VIDEOS USING ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING

Status: Final Rejection (§103)
Filed: Jul 28, 2023
Examiner: SHERMAN, STEPHEN G
Art Unit: 2621
Tech Center: 2600 — Communications
Assignee: Twelve Labs, Inc.
OA Round: 2 (Final)
Grant Probability: 82% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 2y 7m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 82%, above average (1334 granted / 1626 resolved; +20.0% vs TC avg)
Interview Lift: +17.2% across resolved cases with interview (strong)
Typical Timeline: 2y 7m average prosecution; 30 applications currently pending
Career History: 1656 total applications across all art units

Statute-Specific Performance

§101: 2.9% (-37.1% vs TC avg)
§103: 50.5% (+10.5% vs TC avg)
§102: 19.9% (-20.1% vs TC avg)
§112: 17.9% (-22.1% vs TC avg)
Based on career data from 1626 resolved cases; Tech Center averages are estimates.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant's arguments with respect to claims 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-5, 7-12 and 14-19 are rejected under 35 U.S.C. 103 as being unpatentable over Suri et al. (US 2016/0034786) in view of Li et al. (CN 115359074 A).

Regarding claim 1, Suri et al. disclose a method of retrieving videos using an adaptive thresholding model, comprising:

receiving, from a client device, a request to retrieve one or more videos relevant to a query (Paragraph [0028]: "In other examples, the techniques described herein may be useful for video editing. For example, in at least one example, a user may be presented with a user interface identifying desirable video data for manipulating the video data. In other examples, desirable video data may be segmented and combined to automatically create video files summarizing or highlighting content in a video collection and/or video file. A summarizing video file may be a video file that includes video segments and/or video frames from every portion of a video file and/or video collection. A highlighting video file may be a video file that includes important video segments and/or video frames from a video file and/or video collection. In at least one example, transitions may be added between the segmented video data to provide for seamless viewing." See also Figure 7, step 702 and paragraph [0100].);

accessing a set of videos, wherein a video in the set of videos is indexed to divide the video into one or more video segments (Paragraph [0030]: "FIG. 1 is a diagram showing an example environment 100 for training models from video data based on low level and high level features and applying the learned models to new video data for identifying desirable video data. More particularly, the example environment 100 may include a service provider 102, one or more network(s) 104, one or more users 106, and one or more user devices 108 associated with the one or more users 106. Alternatively, or in addition, the functionality described herein can be performed, at least in part, by one or more hardware logic components such as accelerators. For example, and without limitation, illustrative types of hardware logic components that can be used include Field-programmable Gate Arrays (FPGAs), Application-specific Integrated Circuits (ASICs), Application-specific Standard Products (ASSPs), System-on-a-chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), etc.");

generating relevance scores for the video segments obtained from the set of videos, wherein a relevance score for a respective video segment is generated by applying a machine-learned video retrieval model to the query and the video segment, and wherein the relevance score indicates a likelihood the video segment is related to the query (Paragraph [0043]: "The training module 116 may be configured to train models based on extracted features. The training module 116 may include a receiving module 202 and a learning module 204. The receiving module 202 may be configured to receive video data. In at least one example, the video data may include a video collection, video files, video segments, and/or video frames. The video data may be pre-labeled with semantic categories describing the video data and/or scores indicative of desirability. In some examples, individual video files, video segments, video frames, and/or video collections may be individually labeled with the semantic categories and/or scores indicative of desirability after the receiving module 202 receives the video data. For example, the video data may be labeled as shown in the example pseudocode below." See also Figure 7, steps 704/706 and paragraphs [0101]-[0102].);

generating one or more predicted threshold values for the requested query, wherein the one or more predicted threshold values are generated by applying a machine-learned adaptive thresholding model to the query (Paragraph [0044]: "The semantic categories may belong to a set of predefined semantic categories that describes the subject matter of the video data. For example, semantic categories may include indoor, outdoor, mountain, lake, city, country, home, party, sporting event, zoo, concert, etc. The scores indicative of desirability, or desirability scores, may be a number on a scale (e.g., a five point scale, scale from zero to one, etc.). Such scores may be based on technical quality and subjective importance, as perceived from a human labeling the video data. For example, on a five point scale, the combination of strong technical quality (e.g., full exposure, even color distribution, minimal camera motion, bright colors, focused faces, etc.) and subject importance above a predetermined threshold (e.g., main character, clear audio, minimal camera motion and/or object motion, etc.) may result in desirable video data, or a desirability score closer to five. In contrast, poor technical quality (e.g., poor exposure, uneven color distribution, significant camera motion, dark picture, etc.) and/or subjective importance below a predetermined threshold (e.g., unimportant character, muffled audio, significant camera motion and/or object motion, etc.) may result in undesirable video data, or a desirability score closer to zero. In at least some examples, video data may have a neutral level of desirability (e.g., the video data is not desirable or undesirable), or a desirability score near 2. The receiving module 202 may provide the received video data to the extracting module 118 for feature extraction before training the classifier and scoring model in the learning module 204.");

filtering a subset of video segments based on the one or more predicted threshold values, wherein the subset of video segments are associated with relevance scores that are equal to or above a value obtained from the predicted threshold values (Paragraph [0097]: "The processed video data 610 may be used for identifying a set of video data and ranking individual video frames, video segments, video files, or video collections in the video data based on which individual video frames, video segments, video files, or video collections are the most desirable according to desirability scores. In other examples, the identified set of video data may be filtered based on levels of desirability per the desirability scores associated with individual video frames, video segments, video files, and/or video collections. For instance, video frames, video segments, video files, and/or video collections may be ranked against other video frames, video segments, video files, and/or video collections based on the desirability scores. The processed video data 610 may also be leveraged in other manners as described above."); and

providing the filtered subset of video segments to the client device as being relevant to the query of the request (Paragraph [0097]).

Suri et al. fail to explicitly teach: receiving another request to retrieve videos relevant to another query different from the query; generating one or more higher threshold values for the another query by applying the machine-learned adaptive thresholding model to the another query; and providing another filtered subset of video segments as being relevant to the another query, wherein the another filtered subset of video segments are associated with relevance scores equal to or above a value obtained from the one or more higher threshold values.

Li et al. disclose an adaptive thresholding model wherein one or more threshold values are generated for each query (See the middle of page 19 of the provided document: "Further, in the step S3.3, the threshold learning device is adaptive threshold learning device, using two layers are fully connected to obtain...").

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the adaptive thresholding as taught by Li et al. and apply the teachings to the threshold as taught by Suri et al., such that when another query is received, one or more higher threshold values are generated and the steps of Suri et al. are repeated. This would result in receiving another request to retrieve videos relevant to another query different from the query (Paragraph [0043] of Suri et al. for the second query), generating one or more higher threshold values for the another query by applying the machine-learned adaptive thresholding model to the another query (the adaptive thresholding of Li et al. with paragraph [0044] of Suri et al.), and providing another filtered subset of video segments as being relevant to the another query, wherein the another filtered subset of video segments are associated with relevance scores equal to or above a value obtained from the one or more higher threshold values (Paragraphs [0043], [0097] and [0101]-[0102] of Suri et al. using the adaptive thresholding of Li et al.). The motivation to combine would have been to improve the filtering of the video segments, thus providing better results for the queries.
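To make the claimed flow concrete, here is a minimal sketch of per-query adaptive-threshold retrieval as recited in claim 1. It is an illustration only, not the applicant's or either reference's implementation; the function names, `Segment` fields, and score conventions are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Segment:
    video_id: str
    start: float   # segment boundaries in seconds (hypothetical indexing)
    end: float

def retrieve(
    query: str,
    segments: List[Segment],
    score_fn: Callable[[str, Segment], float],   # stand-in for the machine-learned video retrieval model
    threshold_fn: Callable[[str], float],        # stand-in for the machine-learned adaptive thresholding model
) -> List[Segment]:
    """Return the segments whose relevance score meets the query-specific threshold."""
    # One predicted threshold per query, rather than a single global cutoff.
    tau = threshold_fn(query)
    # Score every indexed segment against the query.
    scored = [(score_fn(query, s), s) for s in segments]
    # Keep segments whose scores are equal to or above the predicted threshold value.
    return [s for score, s in scored if score >= tau]
```

The point of contention in the rejection maps onto `threshold_fn`: a second, different query would yield a different (here, higher) `tau`, changing which segments survive the filter.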
Regarding claim 2, Suri et al. and Li et al. disclose the method of claim 1, wherein parameters of the machine-learned adaptive thresholding model are trained by: obtaining a training dataset including a set of instances including at least a previous query and a subset of video segments that are known to be relevant to the previous query (Suri et al.: Paragraph [0070]); obtaining relevance scores generated for the subset of video segments using the video retrieval model (Suri et al.: Paragraph [0071]); and training the parameters of the adaptive thresholding model using the training dataset (Suri et al.: Paragraph [0072]).

Regarding claim 3, Suri et al. and Li et al. disclose the method of claim 2, wherein training the parameters further comprises: generating estimated threshold values by applying the adaptive thresholding model to the previous query (Suri et al.: Paragraphs [0070]-[0072]); computing a loss function indicating a difference between the estimated threshold values and the relevance scores generated for the subset of video segments (Suri et al.: Paragraphs [0070]-[0072]); and backpropagating a value obtained from the loss function to update the parameters of the adaptive thresholding model (Suri et al.: Paragraphs [0070]-[0072]).

Regarding claim 4, Suri et al. and Li et al. disclose the method of claim 1, wherein parameters of the video retrieval model are trained by: obtaining a training dataset including a set of instances including at least a previous query, a set of video segments, and labels for the set of video segments that each indicate whether a respective video segment is relevant to the previous query (Suri et al.: Paragraphs [0070]-[0072]); and training the parameters of the video retrieval model using the training dataset (Suri et al.: Paragraphs [0070]-[0072]).

Regarding claim 5, Suri et al. and Li et al. disclose the method of claim 4, wherein obtaining the training dataset further comprises: for the previous query, identifying one or more augmented queries from the previous query that each describe an object, person, or entity described in the previous query (Suri et al.: Paragraphs [0070]-[0072]); augmenting the training dataset by generating additional instances based on the augmented queries (Suri et al.: Paragraphs [0070]-[0072]); and training the parameters of the video retrieval model using at least the augmented instances of the training dataset (Suri et al.: Paragraphs [0070]-[0072]).
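Claims 2-3 recite the training procedure for the thresholding model, and the cited passage of Li et al. describes a threshold learner built from two fully connected layers. Below is a minimal sketch of one such training step, assuming PyTorch, a frozen retrieval model that has already produced the relevance scores, and a hinge-style loss; the claims do not specify the loss form, so that choice is an assumption.

```python
import torch
import torch.nn as nn

class AdaptiveThreshold(nn.Module):
    """Two fully connected layers mapping a query embedding to a threshold,
    loosely following the two-layer learner described in Li et al."""
    def __init__(self, dim: int = 256, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),  # assumes relevance scores live in [0, 1]
        )

    def forward(self, query_emb: torch.Tensor) -> torch.Tensor:
        return self.net(query_emb).squeeze(-1)

model = AdaptiveThreshold()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training instance per claim 2: a previous query plus the relevance scores the
# (frozen) video retrieval model produced for segments known to be relevant to it.
query_emb = torch.randn(8, 256)     # batch of 8 query embeddings (stand-in data)
relevant_scores = torch.rand(8, 5)  # retrieval scores of 5 known-relevant segments each

# Per claim 3: a loss comparing the estimated threshold against those scores.
# Here we penalize any threshold above the lowest relevant score, so that no
# known-relevant segment would be filtered out.
tau = model(query_emb)  # estimated threshold per query
loss = torch.relu(tau - relevant_scores.min(dim=1).values).mean()

opt.zero_grad()
loss.backward()  # backpropagate a value obtained from the loss, per claim 3
opt.step()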
Regarding claim 7, Suri et al. and Li et al. disclose the method of claim 1, wherein parameters of the machine-learned adaptive thresholding model are trained by: obtaining a training dataset including a set of instances including at least a previous query and a set of video segments (Suri et al.: Paragraphs [0070]-[0072]); obtaining relevance scores generated for the set of video segments using the video retrieval model (Suri et al.: Paragraphs [0070]-[0072]); identifying one or more relevance scores each associated with a respective performance metric when used to filter the set of video segments (Suri et al.: Paragraphs [0070]-[0072]); and training the parameters of the adaptive thresholding model using the previous query and the identified relevance scores for the set of video segments (Suri et al.: Paragraphs [0070]-[0072]).

Regarding claim 8, please refer to the rejection of claim 1; furthermore, Suri et al. also disclose a non-transitory computer-readable medium including instructions for execution on a processor (Figure 1 shows a processor 112 which executes the instructions stored on non-transitory computer-readable medium 114).

Claims 9-12 and 14-19 are rejected under the same rationales as follows: claim 9 as claim 2; claim 10 as claim 3; claim 11 as claim 4; claim 12 as claim 5; claim 14 as claim 7; claim 15 as claim 8; claim 16 as claim 2; claim 17 as claim 3; claim 18 as claim 4; and claim 19 as claim 5.

Claims 6, 13 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Suri et al. (US 2016/0034786) in view of Li et al. (CN 115359074 A) and further in view of Muffat et al. (US 2020/0279105).

Regarding claim 6, Suri et al. and Li et al. disclose the method of claim 1. Suri et al. and Li et al. fail to explicitly disclose wherein the adaptive thresholding model is configured as a bidirectional encoding representations from transformer (BERT) architecture. Muffat et al. disclose wherein an adaptive thresholding model is configured as a bidirectional encoding representations from transformer (BERT) architecture (Paragraph [0033]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the BERT teachings of Muffat et al. in the method taught by the combination of Suri et al. and Li et al. The motivation to combine would have been to provide the classification process with increased accuracy and speed, improved scalability, and ease of adaptation.

Regarding claim 13, this claim is rejected under the same rationale as claim 6. Regarding claim 20, this claim is rejected under the same rationale as claim 6.
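Claim 6 configures the thresholding model as a BERT architecture. One plausible reading, sketched below with Hugging Face Transformers, is a pretrained BERT encoder whose pooled [CLS] representation feeds a small regression head that emits a per-query threshold; the checkpoint name and head design are assumptions, not Muffat et al.'s disclosed architecture.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class BertThreshold(nn.Module):
    """Pretrained BERT encoder plus a regression head that predicts a
    per-query threshold from the [CLS] representation (illustrative only)."""
    def __init__(self, name: str = "bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(name)
        self.head = nn.Sequential(
            nn.Linear(self.encoder.config.hidden_size, 1),
            nn.Sigmoid(),  # threshold in [0, 1], matching the earlier sketches
        )

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] token embedding
        return self.head(cls).squeeze(-1)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(["person kicking a ball"], return_tensors="pt")
model = BertThreshold()
tau = model(batch["input_ids"], batch["attention_mask"])  # one threshold for the query
```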
Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.

In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to STEPHEN G SHERMAN, whose telephone number is (571) 272-2941. The examiner can normally be reached Monday - Friday, 8:00 am - 4:00 pm ET.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, AMR AWAD, can be reached at (571) 272-7764. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/STEPHEN G SHERMAN/
Primary Examiner, Art Unit 2621
29 December 2025

Prosecution Timeline

Jul 28, 2023
Application Filed
Jul 18, 2025
Non-Final Rejection — §103
Nov 24, 2025
Response Filed
Dec 29, 2025
Final Rejection — §103 (current)

Precedent Cases

Applications involving similar technology that were granted by this same examiner

Patent 12603045
ELECTRONIC DEVICE FOR REDUCING OUTPUT VARIATION FACTORS OF PIXEL CIRCUITS
Granted Apr 14, 2026 (2y 5m to grant)
Patent 12597219
HEAD MOUNTABLE DISPLAY
Granted Apr 07, 2026 (2y 5m to grant)
Patent 12592044
Systems and Methods for Providing Real-Time Composite Video from Multiple Source Devices Featuring Augmented Reality Elements
Granted Mar 31, 2026 (2y 5m to grant)
Patent 12591302
GENERATING AI-CURATED AR CONTENT BASED ON COLLECTED USER INTEREST LABELS
Granted Mar 31, 2026 (2y 5m to grant)
Patent 12586407
IMAGE PROCESSING DEVICE, IMAGE PROCESSING METHOD, AND STORAGE MEDIUM FOR ADJUSTING IMAGE PARAMETERS
Granted Mar 24, 2026 (2y 5m to grant)
Study what changed in these applications to get past this examiner, based on the 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 82%
With Interview: 99% (+17.2%)
Median Time to Grant: 2y 7m
PTA Risk: Moderate
Based on 1626 resolved cases by this examiner. Grant probability derived from career allow rate.
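The displayed figures are mutually consistent under a simple additive reading: 1334 / 1626 ≈ 82.0% career allow rate, and 82.0% + 17.2% interview lift ≈ 99.2%, rendered as 99%. A quick check, assuming the lift is additive (an assumption; the tool's actual model is not disclosed):

```python
granted, resolved = 1334, 1626
allow_rate = granted / resolved                         # 0.8204... -> shown as 82%
interview_lift = 0.172                                  # +17.2 percentage points
with_interview = min(allow_rate + interview_lift, 1.0)  # 0.9924... -> shown as 99%
print(f"{allow_rate:.0%}, {with_interview:.0%}")        # "82%, 99%"
```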

Free tier: 3 strategy analyses per month