DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
This non-final Office action is in response to the amendment filed 1/2/2026. Claims 1, 3-8, 10-15, and 17-20 are pending in this application and have been considered below. Claims 2, 9, and 16 have been canceled by the applicant.
Applicant’s arguments with respect to claims 1, 3-8, 10-15, and 17-20 regarding the lack of temporal displacements in Liu have been considered but are moot in view of the new ground(s) of rejection based on Yang, which explicitly teaches predicting temporal displacements from every frame (see below).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 3-8, 10-15, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Vahdani et al. (Deep Learning-based Action Detection in Untrimmed Videos: A Survey – hereinafter “Vahdani”) in view of Yang et al. (Revisiting Anchor Mechanisms for Temporal Action Localization – hereinafter “Yang”), and further in view of De Souza et al. (US 2018/0053057 A1 – hereinafter “De Souza”).
Claims 1, 8, and 15.
Yang discloses a non-transitory computer-readable storage medium, the computer-readable storage medium including instructions that when executed by a computer (Yang p. 6, § IV. B.: “The proposed A2Net is implemented based on PyTorch 1.4 [40]. We perform experiments with one NVIDIA TITAN Xp GPU, Intel Xeon E5-2683 v3 CPU and 128G memory”), cause the computer to …
Vahdani discloses a method comprising:
retrieving a frame sequence depicting an event (Abstract discloses “activity detection in untrimmed videos”; Fig. 1), the frame sequence having a plurality of frames (p. 2, left column, discloses “predict action labels at every frame of the video”), each frame corresponding to a temporal location of the frame sequence (p. 3, top right column, discloses “The start time, end time, and label”);
extracting a feature from each frame of the plurality of frames (p. 2, left column, ¶3 discloses “When targeting fine-grained actions, temporal action detection (segmentation) is similar to semantic segmentation as both aim to classify every single instance, i.e., frames in temporal domain”; p. 4, left column, §2.2 discloses equation 4 a sequence S with l frames where, “RGB frame xtn is fed to spatial network ResNet [25], extracting feature vector fS,n”; );
generating an input matrix by (p. 4, left column, §2.2 discloses “spatial and temporal features, fS,n and fT,n, are concatenated to represent the visual feature fn for snippet sn”; where fn is a matrix with dimension d×T, and an output matrix A (T-CAM) contains activation scores (confidences) for each class at each temporal position);
applying an event detection model to the input matrix to combine features across all temporal locations and generate an output matrix (p. 10, left column, § 2.4.1.1 discloses “T-CAM is a matrix denoted by A which represent the possibility of activities at each temporal position. Matrix A has nc rows which is the total number of action classes, and T columns which is the number of temporal positions in the video.”),
and every class, confidences (p. 10, right column, § 2.4.1.2 discloses “each video should be represented using a single confidence score per category. The confidence score for each category is computed as the average of top activation k scores over the temporal dimension for that category”; p. 11, top right column, discloses “nc is the number of action classes” and p. 3, left column, §2.1, definition 1 discloses multiple “categories of action instances”) and temporal displacements from the output matrix (p. 10, left column, § 2.4.1.1 discloses “T-CAM is a matrix denoted by A which represent the possibility of activities at each temporal position … T columns which is the number of temporal positions in the video.”); and
determining, based on the confidences and temporal displacements, (p. 10, right column, § 2.4.1.2 discloses confidences and timing; pp. 17-18, §4.1.3 discloses action spotting in sports and “Human activity localization in sports videos is studied in [192], [193], [194], [195], salient game actions are identified in [196], [197], automatic game highlights identification and summarization are performed in [198], [199], [200], [201], [202]. Moreover, action spotting, which is the task of temporal localization of human-induced events, has been popular in soccer game broadcasts [3], [203] and some methods aimed to automatically detect goals, penalties, corner kicks, and card events [204].”).
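As context for the confidence limitation mapped above, the top-k averaging that Vahdani describes in § 2.4.1.2 (a per-category video-level confidence computed as the mean of the k largest T-CAM activations over the temporal dimension) can be sketched as follows. This is an illustrative sketch only, not code from the cited reference; the function name and `k` default are assumptions.

```python
import numpy as np

def topk_confidence(tcam: np.ndarray, k: int = 8) -> np.ndarray:
    """Per-category video-level confidence from a T-CAM.

    tcam: array of shape (nc, T) -- activation scores for nc action
    classes at each of T temporal positions (Vahdani, def. 15).
    Returns an (nc,) vector: the average of the top-k activations over
    the temporal dimension for each category (Vahdani, sec. 2.4.1.2).
    """
    k = min(k, tcam.shape[1])
    # Sort each row's activations descending and average the k largest.
    top = np.sort(tcam, axis=1)[:, ::-1][:, :k]
    return top.mean(axis=1)

# Toy example: nc = 3 classes, T = 10 temporal positions.
A = np.random.rand(3, 10)
conf = topk_confidence(A, k=3)   # one confidence per category
```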
Vahdani discloses all of the subject matter as described above except for specifically teaching “the output matrix comprising a plurality of classes for each frame,” “determining, for every frame,” and “the event depicted by the frame sequence.” However, Yang, in the same field of endeavor, teaches the output matrix comprising a plurality of classes for each frame (Yang teaches that the model outputs a matrix/map containing the class scores and displacement values for the temporal sequence. Yang’s matrix structure: “The anchor-free module uses two individual branches for predicting classification scores … and regression distances to the starting boundary and ending boundary (rs; re)” (p. 5, § III. D.). Table II lists the “Anchor-Free Localization Module” output size as “B x (C+2) x t”, where t is the temporal length (frames), C is the plurality of classes, and +2 corresponds to the two temporal displacement values (start/end) (p. 4, Table II).), determining, for every frame (Yang p. 2, right column & Fig. 2: “The anchor-free module simultaneously predicts the classification score and regresses the distances to the starting and ending boundaries. The anchor-based module first chooses the closest-matched anchor then refines the action boundaries via regression. These two modules share the same backbone and independently make predictions at each temporal location from every pyramid level” – confidences and displacements at every frame/location; Yang p. 4, § III. D.: “The proposed anchor-free module regresses the distances from the center of an action instance to its boundaries.”), and the event depicted by the frame sequence (p. 6, § III. F.: “During inference … the predicted action boundaries (saf, eaf) can be obtained via inverting equation 3 [using the temporal displacements]. The maximum value of classification score Saf is regarded as the confidence for the localization results … Finally, action localization results for the anchor-free and anchor-based modules are merged together … and obtain the final localization results” [the event depicted].).
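The Yang output mapping cited above (a (C+2) × t map whose first C rows are class scores and whose last two rows are regressed start/end distances) can be illustrated with a short decoding sketch. This is a simplified stand-in, not Yang's actual code: the boundary inversion used here (s = t − rs, e = t + re) approximates the inversion of Yang's equation 3, and the function name is an assumption.

```python
import numpy as np

def decode_anchor_free(out: np.ndarray, num_classes: int):
    """Decode a (C+2, T) anchor-free output map into candidate actions.

    Rows 0..C-1 hold per-class scores at each temporal location; the
    last two rows hold the regressed distances (rs, re) to the start
    and end boundaries (cf. Yang, Table II: B x (C+2) x t).
    """
    C = num_classes
    scores, rs, re = out[:C], out[C], out[C + 1]
    results = []
    for t in range(out.shape[1]):
        cls = int(np.argmax(scores[:, t]))
        # The maximum classification score serves as the confidence
        # for the localization result (cf. Yang, sec. III. F.).
        conf = float(scores[cls, t])
        # Simplified boundary inversion: displace from location t.
        results.append((t - rs[t], t + re[t], cls, conf))
    return results
```

In a full pipeline the per-location candidates would then be merged (e.g., by non-maximum suppression) to obtain final localization results, as Yang's § III. F. describes.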
Vahdani discloses all of the subject matter as described above except for specifically teaching “stacking.” However, De Souza, in the same field of endeavor, teaches “stacking” (¶26: “Feature vectors extracted from the transformation(s) are aggregated (“stacked”), a process referred to herein as Data Augmentation by Feature Stacking (DAFS). The stacked descriptors form a feature matrix”).
It would have been obvious to a person having ordinary skill in the art (POSITA), before the effective filing date of the claimed invention, to combine the teachings of Vahdani, Yang, and De Souza to arrive at the claimed invention. A POSITA would have been motivated to improve the temporal localization capability of Vahdani’s system. Yang teaches an anchor-free prediction head that allows for flexible action detection, treats each temporal location equally (Yang p. 2), and avoids the limitation of pre-defined anchors. A POSITA would have found it obvious to modify Vahdani’s system by incorporating the specific localization techniques taught by Yang to achieve more precise boundary detection, thereby arriving at a system that determines an event based on confidences and temporal displacements.
De Souza teaches that stacking feature vectors into a matrix is a well-known and conventional method for preparing sequential data for input into a deep learning model. Therefore, it would have been an obvious and routine design choice to apply the stacking technique of De Souza to the features within the combined Vahdani and Yang framework, as this would be a predictable step with a reasonable expectation of success.
Claims 3, 10, and 17.
The combination of Vahdani, Yang, and De Souza discloses the method of claim 1, wherein determining confidences and temporal displacements further comprises performing separate convolution operations on the output matrix (Vahdani p. 10, left column, §2.4.1.1, definition 15 discloses “T-CAM is a matrix denoted by A”; p. 6, right column, § 2.3.4 discloses “Several convolutional layers are applied on the features to predict actionness score (def 6), completeness score (def 8), classification score (def 9), and to adjust the temporal boundary of the proposals.”; Yang uses two parallel convolution branches (Fig. 4).).
Claims 4, 11, and 18.
The combination of Vahdani, Yang, and De Souza discloses the method of claim 1, wherein the event detection model includes a model trunk selected from a group consisting of a 1-D U-Net (Vahdani p. 5, right column, ¶2 discloses “U-shaped TFPNs”) and a transformer encoder (TE) (Vahdani p. 8, bottom left column, 2.3.5.3 Transformers disclose “encoder decoder transformer … Their encoder generates a context graph where the nodes are initially video level features and the interactions among nodes are modeled as learnable edge weights. Also, positional information for each node is provided using learnable positional encodings.”).
Claims 5, 12, and 19.
The combination of Vahdani, Yang, and De Souza discloses the method of claim 1, wherein determining the event depicted by the frame sequence based on the confidences and temporal displacements includes consolidating the confidences and temporal displacements by displacing the confidences by the temporal displacements (Vahdani p. 10, left column, § 2.4.1.1 discloses “T-CAM is a matrix denoted by A which represent the possibility of activities at each temporal position.”; p. 10, right column, § 2.4.1.2, last paragraph, discloses “A[c, tcl] is the activation (def 15) of class c at temporal position tcl”).
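The consolidation operation recited in these claims (displacing each position's confidence by its predicted temporal displacement and accumulating the results) can be sketched as follows. This is an illustrative sketch of the claimed step, not code from any cited reference; integer displacements are assumed for simplicity.

```python
import numpy as np

def consolidate(conf: np.ndarray, disp: np.ndarray) -> np.ndarray:
    """Consolidate per-position confidences via temporal displacements.

    conf[t] is the confidence predicted at temporal position t, and
    disp[t] the signed integer displacement to the event's predicted
    position. Each confidence is moved to t + disp[t] and summed, so
    positions that many frames "vote" for accumulate high scores.
    """
    T = conf.shape[0]
    out = np.zeros(T)
    for t in range(T):
        target = t + int(disp[t])
        if 0 <= target < T:   # drop votes displaced off the sequence
            out[target] += conf[t]
    return out
```

For example, three neighboring positions that all point at the same frame concentrate their confidences there, sharpening the event peak.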
Claims 6 and 13.
The combination of Vahdani, Yang, and De Souza discloses the method of claim 1, wherein prior to applying the event detection model, a dimensionality reduction technique is applied to the input matrix (Vahdani p. 4, top left column, discloses “In some cases, the recognition scores of sampled frames are aggregated with the Top-k pooling”; where pooling layers are commonly used in neural networks to summarize features from a region).
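The top-k pooling cited above can serve as a simple dimensionality reduction along the temporal axis. The sketch below, which keeps only the k largest values per feature row, is an illustrative assumption in the spirit of the pooling Vahdani describes, not code from the reference.

```python
import numpy as np

def topk_temporal_pool(X: np.ndarray, k: int) -> np.ndarray:
    """Reduce a d x T input matrix to d x k by top-k pooling.

    For each of the d feature rows, keep only the k largest values
    across the T temporal positions, discarding the rest -- shrinking
    the matrix fed to the event detection model.
    """
    k = min(k, X.shape[1])
    return np.sort(X, axis=1)[:, ::-1][:, :k]
```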
Claims 7, 14, and 20.
The combination of Vahdani, Yang, and De Souza discloses the method of claim 1, further comprising training the event detection model by optimizing a confidence loss and a temporal displacement loss (Vahdani p. 11, top left column, discloses “MIL loss is a cross-entropy loss applied over all videos and all action classes”; p. 6, § 2.3.4 discloses loss functions for proposal evaluation (definition 14) and an action regression loss to adjust the temporal boundaries of the proposals. Yang details a “classification loss” with a focal loss structure (§ III. E., Eq. 5) for confidence and an “L1 loss” for displacement (§ III. E.)).
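The two-term training objective mapped above (a classification/confidence loss plus an L1 displacement loss) can be sketched as a weighted sum. This is an illustrative sketch only: Yang uses a focal classification loss, for which plain cross-entropy is substituted here for brevity, and the function name and `lam` weight are assumptions.

```python
import numpy as np

def detection_loss(cls_probs, cls_targets, disp_pred, disp_true, lam=1.0):
    """Joint confidence + displacement loss (sketch).

    cls_probs:   (N, nc) predicted class probabilities per sample
    cls_targets: (N,) ground-truth class indices
    disp_pred / disp_true: predicted and ground-truth displacements
    lam:         weight on the regression term
    """
    # Confidence term: cross-entropy on the true-class probabilities.
    ce = -np.mean(np.log(cls_probs[np.arange(len(cls_targets)), cls_targets]))
    # Displacement term: L1 loss on the temporal displacements.
    l1 = np.mean(np.abs(disp_pred - disp_true))
    return ce + lam * l1
```

During training both terms are minimized jointly, so the model learns class confidences and boundary displacements from the same backbone features.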
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Ross Varndell whose telephone number is (571)270-1922. The examiner can normally be reached M-F 9-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, O’Neal Mistry, can be reached at (313)446-4912. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Ross Varndell/Primary Examiner, Art Unit 2674