Prosecution Insights
Last updated: April 19, 2026
Application No. 17/744,343

COMPUTERIZED SYSTEM AND METHOD FOR KEY EVENT DETECTION USING DENSE DETECTION ANCHORS

Status: Non-Final OA (§103)
Filed: May 13, 2022
Examiner: VARNDELL, ROSS E
Art Unit: 2674
Tech Center: 2600 — Communications
Assignee: Yahoo Assets LLC
OA Round: 4 (Non-Final)
Grant Probability: 85% (Favorable)
Expected OA Rounds: 4-5
Time to Grant: 2y 4m
Grant Probability With Interview: 98%

Examiner Intelligence

Career Allow Rate: 85% (above average; 520 granted / 615 resolved; +22.6% vs TC avg)
Interview Lift: +13.0% (moderate), measured over resolved cases with interview
Typical Timeline: 2y 4m average prosecution; 28 applications currently pending
Career History: 643 total applications across all art units

Statute-Specific Performance

§101: 6.3% (-33.7% vs TC avg)
§103: 66.9% (+26.9% vs TC avg)
§102: 6.4% (-33.6% vs TC avg)
§112: 10.7% (-29.3% vs TC avg)
Based on career data from 615 resolved cases; Tech Center averages are estimates.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

This non-final office action is in response to the amendment filed 1/2/2026. Claims 1, 3-8, 10-15, and 17-20 are pending in this application and have been considered below. Claims 2, 9, and 16 are canceled by the applicant. Applicant's arguments with respect to claims 1, 3-8, 10-15, and 17-20 regarding the lack of temporal displacements in Liu have been considered but are moot in view of the new ground(s) of rejection, Yang, which explicitly teaches predicting temporal displacement from every frame (see below).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors.
In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 3-8, 10-15, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Vahdani et al. (Deep Learning-based Action Detection in Untrimmed Videos: A Survey, hereinafter "Vahdani") in view of Yang et al. (Revisiting Anchor Mechanisms for Temporal Action Localization, hereinafter "Yang") in view of De Souza et al. (US 2018/0053057 A1, hereinafter "De Souza").

Claims 1, 8, and 15. Yang discloses a non-transitory computer-readable storage medium, the computer-readable storage medium including instructions that when executed by a computer (Yang p. 6, § IV. B.: "The proposed A2Net is implemented based on PyTorch 1.4 [40]. We perform experiments with one NVIDIA TITAN Xp GPU, Intel Xeon E5-2683 v3 CPU and 128G memory"), cause the computer to …

Vahdani discloses a method comprising: retrieving a frame sequence depicting an event (Abstract discloses "activity detection in untrimmed videos"; Fig. 1), the frame sequence having a plurality of frames (p. 2, left column, discloses "predict action labels at every frame of the video"), each frame corresponding to a temporal location of the frame sequence (p. 3, top right column, discloses "The start time, end time, and label"); extracting a feature from each frame of the plurality of frames (p. 2, left column, ¶3 discloses "When targeting fine-grained actions, temporal action detection (segmentation) is similar to semantic segmentation as both aim to classify every single instance, i.e., frames in temporal domain"; p. 4, left column, §2.2 discloses, in equation 4, a sequence S with l frames where "RGB frame xtn is fed to spatial network ResNet [25], extracting feature vector fS,n"); generating an input matrix by … (p. 4, left column, §2.2 discloses "spatial and temporal features, fS,n and fT,n, are concatenated to represent the visual feature fn for snippet sn", where fn is a matrix with dimension d×T, and an output matrix A (T-CAM) that contains activation scores (confidences) for each class at each temporal position); applying an event detection model to the input matrix to combine features across all temporal locations and generate an output matrix (p. 10, left column, § 2.4.1.1 discloses "T-CAM is a matrix denoted by A which represent the possibility of activities at each temporal position. Matrix A has nc rows which is the total number of action classes, and T columns which is the number of temporal positions in the video."), and every class, confidences (p. 10, right column, § 2.4.1.2 discloses "each video should be represented using a single confidence score per category. The confidence score for each category is computed as the average of top activation k scores over the temporal dimension for that category"; p. 11, top right column, discloses "nc is the number of action classes" and p. 3, left column, §2.1, definition 1 discloses multiple "categories of action instances") and temporal displacements from the output matrix (p. 10, left column, § 2.4.1.1 discloses "T-CAM is a matrix denoted by A which represent the possibility of activities at each temporal position … T columns which is the number of temporal positions in the video."); and determining, based on the confidences and temporal displacements (p. 10, right column, § 2.4.1.2 discloses confidences and timing; pp. 17-18, §4.1.3 discloses action spotting in sports and "Human activity localization in sports videos is studied in [192], [193], [194], [195], salient game actions are identified in [196], [197], automatic game highlights identification and summarization are performed in [198], [199], [200], [201], [202]. Moreover, action spotting, which is the task of temporal localization of human-induced events, has been popular in soccer game broadcasts [3], [203] and some methods aimed to automatically detect goals, penalties, corner kicks, and card events [204].").

Vahdani discloses all of the subject matter as described above except for specifically teaching "the output matrix comprising a plurality of classes for each frame," "determining, for every frame," and "the event depicted by the frame sequence." However, Yang, in the same field of endeavor, teaches the output matrix comprising a plurality of classes for each frame (Yang teaches the model outputs a matrix/map that contains the class scores and displacement values for the temporal sequence. Yang's matrix structure: "The anchor-free module uses two individual branches for predicting classification scores … and regression distances to the starting boundary and ending boundary (rs; re)" (p. 5, § III. D.). Table II lists the "Anchor-Free Localization Module" output size as "B x (C + 2) x t", where t is the temporal length (frames), C is the plurality of classes, and the +2 corresponds to the two temporal displacement values (start/end) (p. 4, Table II).); determining, for every frame (Yang p. 2, right column & Fig. 2: "The anchor-free module simultaneously predicts the classification score and regresses the distances to the starting and ending boundaries. The anchor-based module first chooses the closest-matched anchor then refines the action boundaries via regression. These two modules share the same backbone and independently make predictions at each temporal location from every pyramid level" – every frame/location confidence and displacements; Yang p. 4, § III. D.: "The proposed anchor-free module regresses the distances from the center of an action instance to its boundaries."); and the event depicted by the frame sequence (p. 6, § III. F.: "During inference … the predicted action boundaries (saf, eaf) can be obtained via inverting equation 3 [using the temporal displacements]. The maximum value of classification score Saf is regarded as the confidence for the localization results … Finally, action localization results for the anchor-free and anchor-based modules are merged together … and obtain the final localization results" [the event depicted].).

Vahdani discloses all of the subject matter as described above except for specifically teaching "stacking." However, De Souza, in the same field of endeavor, teaches "stacking" (¶26: "Feature vectors extracted from the transformation(s) are aggregated ('stacked'), a process referred to herein as Data Augmentation by Feature Stacking (DAFS). The stacked descriptors form a feature matrix").

It would have been obvious to a person having ordinary skill in the art (POSITA), before the effective filing date of the claimed invention, to combine the teachings of Vahdani, Yang, and De Souza to arrive at the claimed invention. A POSITA would have been motivated to improve the temporal localization capability of Vahdani's system. Yang teaches an anchor-free prediction head to allow for flexible action detection that treats each temporal location equally (Yang p. 2) and avoids the limitation of pre-defined anchors. A POSITA would have found it obvious to modify Vahdani's system by incorporating the specific localization techniques taught by Yang to achieve more precise boundary detection, thereby arriving at a system that determines an event based on confidences and temporal displacements.
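As a rough sketch of the anchor-free mechanism the rejection relies on (per-location class scores plus two regressed boundary distances, decoded into action boundaries), the following is illustrative only; the shapes and values are hypothetical, not taken from the cited references:

```python
import numpy as np

# Hypothetical anchor-free head output: for each of T temporal locations,
# C class scores plus two regressed distances (r_s, r_e) to the start/end
# boundaries, i.e. a (C + 2) x T slice of the "B x (C + 2) x t" output.
T, C = 8, 3
rng = np.random.default_rng(0)
out = rng.random((C + 2, T))

cls_scores = out[:C, :]              # per-location class confidences
r_s, r_e = out[C, :], out[C + 1, :]  # distances to start/end boundaries

t = np.arange(T)                     # temporal location of each prediction
starts = t - r_s                     # decode: start boundary = location - r_s
ends = t + r_e                       # decode: end boundary   = location + r_e
conf = cls_scores.max(axis=0)        # max class score as localization confidence
labels = cls_scores.argmax(axis=0)   # predicted class at each location
```

Each temporal location thus yields a candidate interval `[starts[i], ends[i]]` with a confidence, which downstream logic would filter and merge.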
De Souza teaches that stacking feature vectors into a matrix is a well-known and conventional method for preparing sequential data for input into a deep learning model. Therefore, it would have been an obvious and routine design choice to apply the stacking technique of De Souza to the features within the combined Vahdani and Yang framework, as this would be a predictable step with a reasonable expectation of success.

Claims 3, 10, and 17. The combination of Vahdani, Yang, and De Souza discloses the method of claim 1, wherein determining confidences and temporal displacements further comprises performing separate convolution operations on the output matrix (Vahdani p. 10, left column, §2.4.1.1, definition 15 discloses "T-CAM is a matrix denoted by A"; p. 6, right column, § 2.3.4 discloses "Several convolutional layers are applied on the features to predict actionness score (def 6), completeness score (def 8), classification score (def 9), and to adjust the temporal boundary of the proposals."; Yang uses two parallel convolution branches (Fig. 4).).

Claims 4, 11, and 18. The combination of Vahdani, Yang, and De Souza discloses the method of claim 1, wherein the event detection model includes a model trunk selected from a group consisting of a 1-D U-Net (Vahdani p. 5, right column, ¶2 discloses "U-shaped TFPNs") and a transformer encoder (TE) (Vahdani p. 8, bottom left column, §2.3.5.3 (Transformers) discloses "encoder decoder transformer … Their encoder generates a context graph where the nodes are initially video level features and the interactions among nodes are modeled as learnable edge weights. Also, positional information for each node is provided using learnable positional encodings.").

Claims 5, 12, and 19. The combination of Vahdani, Yang, and De Souza discloses the method of claim 1, wherein determining the event depicted by the frame sequence based on the confidences and temporal displacements includes consolidating the confidences and temporal displacements by displacing the confidences by the temporal displacements (Vahdani p. 10, left column, § 2.4.1.1 discloses "T-CAM is a matrix denoted by A which represent the possibility of activities at each temporal position."; p. 10, right column, § 2.4.1.2, last paragraph, discloses "A[c, tcl] is the activation (def 15) of class c at temporal position tcl").

Claims 6 and 13. The combination of Vahdani, Yang, and De Souza discloses the method of claim 1, wherein prior to applying the event detection model, a dimensionality reduction technique is applied to the input matrix (Vahdani p. 4, top left column, discloses "In some cases, the recognition scores of sampled frames are aggregated with the Top-k pooling", where pooling layers are commonly used in neural networks to summarize features from an area).

Claims 7, 14, and 20. The combination of Vahdani, Yang, and De Souza discloses the method of claim 1, further comprising training the event detection model by optimizing a confidence loss and a temporal displacement loss (Vahdani p. 11, top left column, discloses "MIL loss is a cross-entropy loss applied over all videos and all action classes"; p. 6, § 2.3.4 discloses loss functions for proposal evaluation (definition 14), an action regression loss to adjust the temporal boundaries of the proposals. Yang details a "classification loss" which corresponds to the focal loss structure (§ III. E., Eq. 5) for confidence and an "L1 loss" for displacement (§ III. E.)).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Ross Varndell, whose telephone number is (571) 270-1922. The examiner can normally be reached M-F 9-5 EST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, O'Neal Mistry, can be reached at (313) 446-4912. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Ross Varndell/
Primary Examiner, Art Unit 2674
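The top-k pooling the examiner quotes from Vahdani (§ 2.4.1.2) can be sketched in a few lines; the shapes and values below are hypothetical, chosen only to illustrate the "average of the top-k activations per category" computation over a T-CAM:

```python
import numpy as np

# Hypothetical T-CAM: an (nc x T) matrix A, where A[c, t] is the activation
# of class c at temporal position t (nc action classes, T temporal positions).
nc, T, k = 4, 10, 3
rng = np.random.default_rng(1)
A = rng.random((nc, T))

# Video-level confidence per category: mean of the k largest activations
# over the temporal dimension for that category.
top_k = np.sort(A, axis=1)[:, -k:]   # k largest activations per class
confidence = top_k.mean(axis=1)      # shape (nc,): one score per category
```

By construction, each per-category score sits between that category's overall temporal mean and its single largest activation.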

Prosecution Timeline

May 13, 2022: Application Filed
Mar 07, 2025: Non-Final Rejection — §103
Jun 12, 2025: Response Filed
Jun 17, 2025: Final Rejection — §103
Sep 11, 2025: Request for Continued Examination
Sep 16, 2025: Response after Non-Final Action
Sep 26, 2025: Non-Final Rejection — §103
Jan 02, 2026: Response Filed
Feb 02, 2026: Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12603810: System and Method for Communications Beam Recovery (2y 5m to grant; granted Apr 14, 2026)
Patent 12597238: Automatic Image Variety Simulation for Improved Deep Learning Performance (2y 5m to grant; granted Apr 07, 2026)
Patent 12582348: Device and Method for Inspecting a Hair Sample (2y 5m to grant; granted Mar 24, 2026)
Patent 12579441: Systems and Methods for Image Reconstruction (2y 5m to grant; granted Mar 17, 2026)
Patent 12579786: System and Method for Property Typicality Determination (2y 5m to grant; granted Mar 17, 2026)
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 4-5
Grant Probability: 85%
With Interview: 98% (+13.0%)
Median Time to Grant: 2y 4m
PTA Risk: High
Based on 615 resolved cases by this examiner. Grant probability derived from career allow rate.
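A plausible reading of the arithmetic behind these figures (assuming the with-interview number is simply the career allow rate plus the interview lift, rounded for display; that formula is an inference, not documented by the page):

```python
# Figures from the examiner stats above; the combination rule is assumed.
granted, resolved = 520, 615
career_allow_rate = granted / resolved               # ~0.846, shown as 85%
interview_lift = 0.13                                # the +13.0% interview lift
with_interview = career_allow_rate + interview_lift  # ~0.976, shown as 98%
```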
