Prosecution Insights
Last updated: April 19, 2026
Application No. 18/209,374

OBJECT TRACKING WITH SHOT TRANSITION DETECTION AND DYNAMIC QUEUE RESIZING

Final Rejection — §102, §103
Filed: Jun 13, 2023
Examiner: OMETZ, RACHEL ANNE
Art Unit: 2668
Tech Center: 2600 — Communications
Assignee: Microsoft Technology Licensing, LLC
OA Round: 2 (Final)
Grant Probability: 69% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 2y 11m
Grant Probability with Interview: 99%

Examiner Intelligence

Career Allow Rate: 69%, above average (18 granted / 26 resolved; +7.2% vs TC avg)
Interview Lift: +30.1%, strong (grant rate with vs. without an interview, across resolved cases with an interview)
Typical Timeline: 2y 11m average prosecution; 24 applications currently pending
Career History: 50 total applications across all art units

Statute-Specific Performance

§101: 3.1% (-36.9% vs TC avg)
§102: 18.8% (-21.2% vs TC avg)
§103: 62.1% (+22.1% vs TC avg)
§112: 14.7% (-25.3% vs TC avg)

TC averages are estimates. Based on career data from 26 resolved cases.

Office Action

§102, §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Status

Claims 1-20 were pending for examination in Application No. 18/209,374, filed June 13, 2023. In the remarks and amendments received on January 12, 2026, claims 1-3, 5-7, 13, and 19-20 are amended, no claims are cancelled, and no claims are added. Accordingly, claims 1-20 are currently pending for examination in the application.

Response to Amendment

Applicant’s amendments filed January 12, 2026, have overcome the 35 U.S.C. 112(b) rejections previously set forth in the Non-Final Office Action mailed September 11, 2025. Accordingly, the rejections are withdrawn.

Response to Arguments

Applicant’s arguments filed January 12, 2026, with respect to the rejection of claims 1 and 19-20, have been fully considered but are moot because the arguments do not apply to the new combination of references, necessitated by Applicant’s newly submitted amendments, that is used in the current rejection.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claim(s) 1, 4-5, 7-9, and 19-20 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Castellani et al. (US-20170200280-A1).

Regarding claim 1, Castellani teaches: In a computer system, a method comprising: reading a given frame of a video sequence (“An initial frame 202a is processed using full-frame feature extraction,” Para [0031]); determining whether an object detection condition is satisfied for the given frame (“identify locations in that frame of features of one or more objects to be tracked,” Para [0031]), including determining whether the given frame depicts a shot transition (“Yet another example event is raised when a scene change analysis detects that the current frame is different enough from a previous frame that a scene change is detected,” Para [0030]), wherein the object detection condition for the given frame depends at least in part on whether the given frame depicts the shot transition, the object detection condition being satisfied if the given frame depicts the shot transition (“Based on some event, such as… detecting a scene change as between frame 202d and 202e, the processing of the stream returns to full frame feature-based detection for frame 202e,” Para [0032]); and tracking an object in the given frame, including determining feature information for the object in the given frame (“a traditional full frame feature-based algorithm is used for feature extraction to detect interesting/relevant features, i.e. points or areas of interest,” Para [0027]), wherein the determining the feature information for the object in the given frame includes selecting between different tracking operations to determine at least some of the feature information for the object in the given frame depending on a result of the determining whether the object detection condition is satisfied for the given frame (“Based on some event, such as… detecting a scene change as between frame 202d and 202e, the processing of the stream returns to full frame feature-based detection for frame 202e,” Para [0032] for when a shot transition is detected, or simply “motion estimation” for non-scene-change frames, see Para [0027]).
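The claim 1 mapping turns on a simple control structure: run full-frame object detection only when the object detection condition is satisfied (a detected shot transition here, or a frame counter reaching a threshold per claim 5), and otherwise propagate existing tracks with cheaper motion estimation. A minimal sketch of that gating, with hypothetical detect_objects, estimate_motion, and is_shot_transition callables standing in for the actual detection and tracking operations (none of these names come from the record):

```python
# Sketch of the condition-gated loop recited in claim 1 (and claim 5's frame
# counter). Helper callables are hypothetical placeholders, not the claimed
# implementation.

DETECTION_INTERVAL = 30  # claim 5's frame-counter threshold (assumed value)

def track_sequence(frames, detect_objects, estimate_motion, is_shot_transition):
    tracks = []
    prev_frame = None
    frames_since_detection = 0
    for frame in frames:
        shot_transition = prev_frame is not None and is_shot_transition(prev_frame, frame)
        condition = (
            prev_frame is None                                  # first frame: must detect
            or shot_transition                                  # claim 1: shot transition
            or frames_since_detection >= DETECTION_INTERVAL     # claim 5: counter reached
        )
        if condition:
            tracks = detect_objects(frame)   # full-frame detection + feature extraction
            frames_since_detection = 0
        else:
            tracks = estimate_motion(prev_frame, frame, tracks)  # propagate tracks
            frames_since_detection += 1
        prev_frame = frame
        yield frame, tracks
```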
Regarding claim 4, the rejection of claim 1 is incorporated herein. Castellani teaches the method of claim 1, and further teaches: wherein the shot transition is a viewpoint change in a scene, an abrupt scene change, a gradual scene change, a zoom-in, a zoom-out, a fade-in, a fade-out, or a wipe (“Sudden camera movements, entrance of new objects into field of view, and camera cutaways are just some examples of a scene change,” Para [0030]).

Regarding claim 5, the rejection of claim 1 is incorporated herein. Castellani teaches the method of claim 1, and further teaches: wherein the determining whether the object detection condition is satisfied for the given frame further includes determining whether a frame counter has reached a threshold (“Based on some event, such as meeting/exceeding a threshold number of frames processed… the processing of the stream returns to full frame feature-based detection,” Para [0032]), and wherein the object detection condition for the given frame further depends at least in part on whether the frame counter has reached the threshold, the object detection condition being satisfied if the frame counter has reached the threshold (“Based on some event, such as meeting/exceeding a threshold number of frames processed,… or detecting a scene change as between frame 202d and 202e, the processing of the stream returns to full frame feature-based detection,” Para [0032]).

Regarding claim 7, the rejection of claim 1 is incorporated herein. Castellani teaches the method of claim 1, and further teaches: wherein the feature information for the object in the given frame includes spatial information for the object in the given frame (“A list of coordinates corresponding to the points/areas of interest is established,” Para [0027]) and visual information for the object in the given frame (“a traditional full frame feature-based algorithm is used for feature extraction to detect interesting/relevant features, i.e. points or areas of interest,” Para [0027]).

Regarding claim 8, the rejection of claim 7 is incorporated herein. Castellani teaches the method of claim 7, and further teaches: wherein, for the object in the given frame: the spatial information indicates location of the object in the given frame (“A list of coordinates corresponding to the points/areas of interest is established,” Para [0027]); and the visual information indicates aspects of appearance of the object in the given frame (“a traditional full frame feature-based algorithm is used for feature extraction to detect interesting/relevant features, i.e. points or areas of interest,” Para [0027]).
Regarding claim 9, the rejection of claim 7 is incorporated herein. Castellani teaches the method of claim 7, and further teaches: wherein the object detection condition is satisfied (“Based on some event, such as… detecting a scene change,” Para [0032]), and wherein the determining the feature information for the object in the given frame (“the processing of the stream returns to full frame feature-based detection,” Para [0032]) includes: getting results of object detection operations to determine the spatial information for the object in the given frame (“A list of coordinates corresponding to the points/areas of interest is established,” Para [0027]); and performing feature extraction operations to determine the visual information for the object in the given frame (“a traditional full frame feature-based algorithm is used for feature extraction to detect interesting/relevant features, i.e. points or areas of interest,” Para [0027]).

Claims 19 and 20 are non-transitory computer-readable medium and system claims that correspond to method claim 1. Implementation of method claim 1 would necessarily encompass the non-transitory computer-readable medium and system of claims 19 and 20. Therefore, the rejection of method claim 1 applies to claims 19 and 20.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 2 and 10 is/are rejected under 35 U.S.C. 103 as being unpatentable over Castellani et al. (US-20170200280-A1) as applied to claim 1 above, and further in view of Xu et al., "A shot boundary detection method for news video based on object segmentation and tracking," 2008 International Conference on Machine Learning and Cybernetics, Kunming, China, 2008, pp. 2470-2475, doi: 10.1109/ICMLC.2008.462082, hereinafter referred to as Xu.

Regarding claim 2, the rejection of claim 1 is incorporated herein. Castellani teaches the method of claim 1, but fails to teach the following limitations as further claimed. However, Xu teaches wherein the determining whether the given frame depicts a shot transition (“shot boundary”) includes evaluating a result of shot transition detection for the given frame, the result of shot transition detection for the given frame having been determined by shot transition detection operations comprising: calculating a given frame histogram (“histogram”) using sample values (“color”) of the given frame (“first, video frame is divided into N sub-blocks; then, we compute the color histogram of each sub-blocks,” Pg. 2470, Section 2); and measuring differences between the given frame histogram and a previous frame histogram (“Compare the histogram difference between the frame and the immediate preceding frame,” Pg. 2472, Section 4), the previous frame histogram having been calculated using sample values of a previous frame of the video sequence (Pg. 2471, Section 2, Equation 1, where H(i-1)(n) is the “color histogram” of the previous frame).

Xu is considered to be analogous to the claimed invention because both are in the field of shot boundary detection that is dependent on motion or lack thereof. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Xu into Castellani for the benefit of improved object tracking.
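Xu's partitioned color-histogram comparison is straightforward to make concrete: divide each frame into sub-blocks, histogram each block's colors, and flag a shot boundary when the difference against the preceding frame's histograms exceeds a threshold. A minimal sketch over NumPy HxWx3 frames; the block count, bin count, and threshold are illustrative assumptions, not Xu's parameters:

```python
import numpy as np

def block_histograms(frame, blocks=4, bins=16):
    """frame: HxWx3 uint8 array; returns normalized per-block color histograms."""
    h, w, _ = frame.shape
    hists = []
    for by in range(blocks):
        for bx in range(blocks):
            block = frame[by * h // blocks:(by + 1) * h // blocks,
                          bx * w // blocks:(bx + 1) * w // blocks]
            hist, _ = np.histogramdd(block.reshape(-1, 3),
                                     bins=(bins, bins, bins),
                                     range=((0, 256),) * 3)
            hists.append(hist / hist.sum())  # normalize so blocks are comparable
    return np.array(hists)

def is_shot_transition(prev_frame, frame, threshold=0.4):
    """Mean per-block L1 histogram difference compared against a strict threshold."""
    diff = np.abs(block_histograms(frame) - block_histograms(prev_frame)).sum(axis=(1, 2, 3))
    return diff.mean() > threshold
```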
Regarding claim 10, Castellani teaches the method of claim 9, but fails to teach the following limitations as further claimed. Xu, however, further teaches wherein, for the object (“discrete objects”) in the given frame: the object detection operations produce a bounding box around the object as the spatial information for the object (“First, the discrete objects are extracted from the segmentation results of the video frames, and their bounding boxes and centroids are obtained,” Pg. 2472, Section 3.2); and the feature extraction operations produce an embedding vector as the visual information for the object (“the pixel value in object mask map is binary data (either 1 or 2),” Pg. 2472, Section 4). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Xu into Castellani for the benefit of improved object tracking.

Claim(s) 3 is/are rejected under 35 U.S.C. 103 as being unpatentable over Castellani et al. (US-20170200280-A1) as applied to claim 1 above, and further in view of Xu (cited above); Hameed, "Video shot detection by motion estimation and compensation," 2009 International Conference on Emerging Technologies, Islamabad, Pakistan, 2009, pp. 241-246, doi: 10.1109/ICET.2009.5353168, hereinafter referred to as Hameed; and Hassanien et al., "Large-scale, Fast and Accurate Shot Boundary Detection through Spatio-temporal Convolutional Neural Networks," arXiv:1705.03281v2, hereinafter referred to as Hassanien.

Regarding claim 3, the rejection of claim 1 is incorporated herein. Castellani teaches the method of claim 1, but fails to teach the following limitations as further claimed. Xu, however, further teaches: wherein the determining whether the given frame depicts a shot transition includes evaluating a result of shot transition detection for the given frame (“partitioned color histogram comparison method as the first filter and uses strict threshold in order not to miss any possible shot boundary,” Pg. 2470, Section 2), the result of shot transition detection for the given frame having been determined by shot transition detection operations comprising: analyzing statistical properties of sample values of the given frame (“Compare the histogram difference between the frame and the immediate preceding frame,” Pg. 2472, Section 4) compared to statistical properties of another frame of the video sequence (“histogram… [of] the immediate preceding frame,” Pg. 2472, Section 4).

Xu fails to teach the following limitations as further claimed. Hameed, however, teaches analyzing encoded data (“blocks,” Pg. 243, Section III) for the given frame, including analyzing one or more of statistical properties of motion vectors for the units of the given frame (“correlation coefficient gives a measure of the degree of similarity between two regions in different video frames,” Pg. 243, Section III); analyzing results of block matching or other motion estimation (calculating “The correlation coefficient ρij between the two blocks i and j in consecutive frames,” Pg. 243, Section III) between blocks of the given frame and a previous frame of the video sequence (“consecutive frames”).

Hameed fails to teach the following limitations as further claimed. Hassanien, however, teaches using a spatio-temporal convolutional neural network (“spatio-temporal Convolutional Neural Networks”) or other neural network to detect boundaries between different shots (“we present an SBD technique based on spatio-temporal Convolutional Neural Networks (CNN),” Abstract, where SBD is “shot boundary detection”).

Xu is considered to be analogous to the claimed invention because both are in the field of shot boundary detection that is dependent on motion or lack thereof. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Xu into Castellani for the benefit of more accurate shot boundary detection. Hameed is considered to be analogous to the claimed invention because both are in the field of detecting shot transitions. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Hameed into Castellani and Xu for the benefit of more accurate shot transition detection. Hassanien is considered to be analogous to the claimed invention because both are in the field of shot boundary detection. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Hassanien into Hameed, Xu, and Castellani for the benefit of more accurate shot transition detection.
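Hameed's block-level check can also be pictured directly: compute a correlation coefficient ρij between co-located blocks of consecutive frames, and treat widespread low correlation as evidence of a shot change. A minimal sketch over grayscale NumPy frames; the block size and voting thresholds are illustrative assumptions rather than Hameed's values:

```python
import numpy as np

def block_correlation(a, b):
    """Pearson correlation coefficient between two grayscale blocks."""
    a = a.astype(np.float64).ravel()
    b = b.astype(np.float64).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom else 0.0

def shot_change_by_correlation(prev_gray, gray, block=16, corr_thresh=0.5, vote_frac=0.6):
    """Flag a shot change when most co-located blocks correlate poorly."""
    h, w = gray.shape
    low = total = 0
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            total += 1
            if block_correlation(prev_gray[y:y + block, x:x + block],
                                 gray[y:y + block, x:x + block]) < corr_thresh:
                low += 1
    return total > 0 and low / total > vote_frac
```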
Claim(s) 6 is/are rejected under 35 U.S.C. 103 as being unpatentable over Castellani et al. (US-20170200280-A1), as applied to claim 1 above, and further in view of Huang et al., "Shot Change Detection via Local Keypoint Matching," IEEE Transactions on Multimedia, vol. 10, no. 6, pp. 1097-1108, Oct. 2008, doi: 10.1109/TMM.2008.2001374, hereinafter referred to as Huang.

Regarding claim 6, the rejection of claim 1 is incorporated herein. Castellani teaches the method of claim 1, but fails to teach the following limitations as further claimed. However, Huang further teaches wherein the determining whether the object detection condition is satisfied for the given frame further includes: determining whether an in-interval shot transition occurs (“candidate shot changes”) anywhere in an interval between two end-point frames on opposite sides of the given frame (“we match the frames before and after the intervals of candidate shot changes,” Pg. 1100, Section III), wherein the object detection condition for the given frame further depends at least in part on whether the in-interval shot transition (“detected transition”) occurs anywhere in the interval between the two end-point frames (“the first and the last frames of each interval”; “After finding the intervals of transitions in the initial step, discussed in Section II, the first and the last frames of each interval are matched again. If there are still many matching keypoints, the two frames are considered similar because the detected transition is probably a false alarm,” Pg. 1100, Section III).

Huang is considered to be analogous to the claimed invention because both are in the field of shot transition detection. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Huang into Castellani for the benefit of another shot transition “check,” which makes the overall shot transition detection system more accurate.
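Huang's false-alarm filter re-matches local keypoints between the end-point frames of each candidate interval and discards candidates whose end points still match well. A minimal sketch using OpenCV's ORB as a stand-in for the paper's keypoint detector; the minimum match count is an illustrative assumption:

```python
import cv2

def interval_is_real_transition(first_frame, last_frame, min_matches=30):
    """first_frame/last_frame: 8-bit grayscale arrays for the interval end points."""
    orb = cv2.ORB_create()
    _, des1 = orb.detectAndCompute(first_frame, None)
    _, des2 = orb.detectAndCompute(last_frame, None)
    if des1 is None or des2 is None:
        return True  # no matchable structure; keep the candidate transition
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    # Many surviving matches => end-point frames are similar => likely false alarm.
    return len(matches) < min_matches
```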
Claim(s) 11 is/are rejected under 35 U.S.C. 103 as being unpatentable over Castellani et al. (US-20170200280-A1), as applied to claim 1 above, and further in view of Hameed (cited above).

Regarding claim 11, the rejection of claim 7 is incorporated herein. Castellani teaches the method of claim 7, but fails to teach the following limitations as further claimed. However, Hameed further teaches wherein the object detection condition is not satisfied (“Inaccurate motion estimation is observed for the block if a high pass filter was not applied,” Pg. 244, Section IV), and wherein the determining the feature information for the object in the given frame includes: performing interpolation operations (“high pass filter”) to determine the spatial information (related to motion) for the object in the given frame (“motion of a block is projected more accurately which have been passed through high pass filter,” Pg. 244, Section IV); and performing feature extraction operations (“correlation metric,” Pg. 245, Section V) to determine the visual information for the object in the given frame (“matches image features corresponding to high frequency occurrence viz. edges, corners and certain types of textures,” Pg. 245, Section V).

Hameed is considered to be analogous to the claimed invention because both are in the field of detecting shot transitions. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Hameed into Castellani for the benefit of more accurately detecting and tracking objects when determining a shot transition.

Claim(s) 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Castellani et al. (US-20170200280-A1), as applied to claim 1 above, and further in view of Wang et al. (US-20200126241-A1), hereinafter referred to as Wang '241.

Regarding claim 13, the rejection of claim 1 is incorporated herein. Castellani teaches the method of claim 1, but fails to teach the following limitations as further claimed. Wang '241, however, teaches wherein the tracking the object in the given frame further includes: using at least some of the feature information (Fig. 2, 203, “feature map”), determining affinities for the object in the given frame (“concatenating bounding boxes coordinates and corresponding velocities to each first feature map into corresponding first dimensional vectors”); and using at least some of the affinities, associating the object in the given frame with corresponding objects (Fig. 2, 202, “each object in the second sequence of tracklets”) in other frames of the video sequence as part of updating tracking information (Fig. 3, Frame t vs Frame t-1).

Wang '241 is considered to be analogous to the claimed invention because both are in the field of multi-object tracking and association. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Wang '241 into Castellani for the benefit of more accurate object tracking and shot detection.
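The Wang '241 mapping describes a common association pattern: concatenate appearance features with bounding box coordinates and velocities into per-object vectors, score pairwise affinities between frames, and match objects with an assignment step. A minimal sketch using cosine affinity and an optimal assignment; the vector layout, affinity choice, and gating value are illustrative assumptions, not Wang '241's architecture:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def object_vector(feature, box, velocity):
    """Concatenate appearance feature, box (x, y, w, h), and velocity (dx, dy)."""
    return np.concatenate([feature, box, velocity])

def associate(prev_objects, curr_objects, min_affinity=0.3):
    """Each object: dict with 'feature' (1D array), 'box', 'velocity'.
    Returns (curr_index, prev_index) pairs above the affinity gate."""
    prev = np.stack([object_vector(o['feature'], o['box'], o['velocity'])
                     for o in prev_objects])
    curr = np.stack([object_vector(o['feature'], o['box'], o['velocity'])
                     for o in curr_objects])
    prev /= np.linalg.norm(prev, axis=1, keepdims=True)
    curr /= np.linalg.norm(curr, axis=1, keepdims=True)
    affinity = curr @ prev.T                       # cosine affinity matrix
    rows, cols = linear_sum_assignment(-affinity)  # maximize total affinity
    return [(r, c) for r, c in zip(rows, cols) if affinity[r, c] >= min_affinity]
```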
Claim(s) 17-18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Castellani et al. (US-20170200280-A1), as applied to claim 1 above, and further in view of Wang (JP-2015050591-A), hereinafter referred to as Wang '591.

Regarding claim 17, the rejection of claim 1 is incorporated herein. Castellani teaches the method of claim 1, but fails to teach the following limitations as further claimed. Wang '591, however, further teaches determining that a queue (“queues”) is not full; and after the reading the given frame (“packet”), storing the given frame in the queue (“queues… that store packets classified into the video category (AC_VI),” Para [0045]).

Wang '591 is considered to be analogous to the claimed invention because both are in the field of video processors. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Wang '591 into Castellani for the benefit of better video processing, as there will be less or no buffering in the system.

Regarding claim 18, the rejection of claim 17 is incorporated herein. Castellani in view of Wang '591 teaches the method of claim 17, and Wang '591 further teaches wherein the queue has a maximum queue size (“sets the number of stages in each queue buffer (queues 111 to 113) that stores packets classified in the video category (AC_VI),” Para [0089]), the method further comprising: selectively adjusting the maximum queue size (“accept a size change”) depending on whether a queue condition is satisfied (“The queue buffer unit 22 can accept a size change from the change means 109 of the video queue size change unit 25,” Para [0089]; in other words, the size or number of buffer stages in the queue can change depending on the processing needs of the user). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Wang '591 into Castellani for the benefit of optimized video processing performance.
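Claims 17-18 describe a bounded frame queue whose maximum size is adjusted when a queue condition is satisfied. A minimal sketch of that behavior; the occupancy heuristic, growth step, and cap are illustrative assumptions, not anything taught by Wang '591:

```python
from collections import deque

class FrameQueue:
    """Bounded queue of frames with a selectively adjustable maximum size."""

    def __init__(self, max_size=32):
        self.max_size = max_size
        self.frames = deque()

    def is_full(self):
        return len(self.frames) >= self.max_size

    def push(self, frame):
        # Claim 17: after reading the given frame, store it only if the
        # queue is determined not to be full.
        if not self.is_full():
            self.frames.append(frame)
            return True
        return False

    def maybe_resize(self, high_water=0.9, grow_by=16, cap=256):
        # Claim 18: selectively adjust the maximum queue size when the queue
        # condition (here, occupancy above a high-water mark) is satisfied.
        if len(self.frames) / self.max_size >= high_water:
            self.max_size = min(self.max_size + grow_by, cap)
```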
Allowable Subject Matter

Claims 12 and 14-16 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. It is suggested that the allowable dependent claims be moved into the independent claims as limitations. The claims are potentially allowable on account of the prior art failing to anticipate or render obvious the limitations of the allowable subject matter.

The primary reason for allowance of claim 12 is the implementation of “as a condition for the performing the interpolation operations, determining that visual information for the object matches in the two end-point frames or that an identifier for the object matches in the two end-point frames; determining spatial information for the object in two end-point frames on opposite sides of the given frame, the object detection condition being satisfied for each of the two end-point frames; and interpolating between the spatial information for the object in the two end-point frames.” Although interpolation is not new in the field of shot detection or object tracking, it is in combination and in the context of all claim limitations that claim 12 recites novel matter.

The primary reason for allowance of claim 14 is the implementation of selectively updating information relating to the target/detected object based on whether a shot transition is detected. This, in combination with the claims it depends on, is novel. Claim 15 is novel because of the specific filtering of the tracking information, and claim 16 is novel because of the specific models adapted for each type of object. All of these claims are novel in combination with the claims from which they depend.
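The interpolation recited in allowable claim 12 can be pictured concretely: when the object's identifier (or visual information) matches in the two end-point frames on opposite sides of the given frame, the object's spatial information for the given frame is interpolated between the end points. A minimal sketch, assuming linear interpolation of box coordinates and hypothetical track records (the field names are not from the claims):

```python
def interpolate_box(box_a, box_b, frame_idx, idx_a, idx_b):
    """box_*: (x, y, w, h) at end-point frames idx_a < frame_idx < idx_b."""
    t = (frame_idx - idx_a) / (idx_b - idx_a)
    return tuple((1 - t) * a + t * b for a, b in zip(box_a, box_b))

def spatial_info_for_frame(track_a, track_b, frame_idx):
    # Claim 12's condition for interpolating: the object identifier matches
    # in both end-point frames (a visual-information match would serve too).
    if track_a['object_id'] != track_b['object_id']:
        return None
    return interpolate_box(track_a['box'], track_b['box'],
                           frame_idx, track_a['frame'], track_b['frame'])
```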
Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to RACHEL A OMETZ, whose telephone number is (571) 272-2535. The examiner can normally be reached 6:45am-4:00pm ET Monday-Thursday and 6:45am-1:00pm ET every other Friday. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Vu Le, can be reached at 571-272-7332. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Rachel Anne Ometz/
Examiner, Art Unit 2668
2/12/26

/VU LE/
Supervisory Patent Examiner, Art Unit 2668

Prosecution Timeline

Jun 13, 2023
Application Filed
Sep 04, 2025
Non-Final Rejection — §102, §103
Jan 12, 2026
Response Filed
Feb 12, 2026
Final Rejection — §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602925 — HYPERSPECTRAL IMAGE ANALYSIS USING MACHINE LEARNING
Granted Apr 14, 2026 (2y 5m to grant)
Patent 12555255 — ABSOLUTE DEPTH ESTIMATION FROM A SINGLE IMAGE USING ONLINE DEPTH SCALE TRANSFER
Granted Feb 17, 2026 (2y 5m to grant)
Patent 12548354 — METHOD FOR PROCESSING CELL IMAGE, ELECTRONIC DEVICE, AND STORAGE MEDIUM
Granted Feb 10, 2026 (2y 5m to grant)
Patent 12541970 — SYSTEM AND METHOD FOR ESTIMATING THE POSE OF A LOCALIZING APPARATUS USING REFLECTIVE LANDMARKS AND OTHER FEATURES
Granted Feb 03, 2026 (2y 5m to grant)
Patent 12530735 — IMAGE PROCESSING APPARATUS THAT IMPROVES COMPRESSION EFFICIENCY OF IMAGE DATA, METHOD OF CONTROLLING SAME, AND STORAGE MEDIUM
Granted Jan 20, 2026 (2y 5m to grant)
Study what changed to get past this examiner, based on the 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 69%
With Interview: 99% (+30.1%)
Median Time to Grant: 2y 11m
PTA Risk: Moderate

Based on 26 resolved cases by this examiner; grant probability is derived from the career allow rate.
