Prosecution Insights
Last updated: April 19, 2026
Application No. 18/292,723

METHOD AND SYSTEM OF ANALYZING MOTION BASED ON FEATURE TRACKING

Status: Non-Final OA (§102)
Filed: Jan 26, 2024
Examiner: BAKER, CHARLOTTE M
Art Unit: 2664
Tech Center: 2600 — Communications
Assignee: Postech Research and Business Development Foundation
OA Round: 1 (Non-Final)
Grant Probability: 93% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 2m
Grant Probability with Interview: 93%

Examiner Intelligence

Career Allow Rate: 93%, above average (991 granted / 1067 resolved; +30.9% vs TC avg)
Interview Lift: -0.2% (minimal), based on resolved cases with and without an interview
Typical Timeline: 2y 2m average prosecution; 15 applications currently pending
Career History: 1082 total applications across all art units
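For readers who want to sanity-check the headline numbers, a minimal Python sketch follows. It assumes the allow rate is simply grants divided by resolved cases and that the Tech Center baseline is implied by the +30.9% delta; neither derivation is confirmed by the tool.

# Career statistics as reported above (assumed derivation: allow rate = granted / resolved)
granted, resolved = 991, 1067
allow_rate = granted / resolved            # ~0.929, displayed as 93%
tc_delta = 0.309                           # "+30.9% vs TC avg"
implied_tc_avg = allow_rate - tc_delta     # ~0.62, implied Tech Center average
print(f"Career allow rate: {allow_rate:.1%}, implied TC average: {implied_tc_avg:.1%}")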

Statute-Specific Performance

§101: 21.6% (-18.4% vs TC avg)
§103: 24.7% (-15.3% vs TC avg)
§102: 27.4% (-12.6% vs TC avg)
§112: 4.3% (-35.7% vs TC avg)
Tech Center averages are estimates. Based on career data from 1067 resolved cases.

Office Action

§102
DETAILED ACTION Notice of Pre-AIA or AIA Status The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA . Claim Rejections - 35 USC § 102 In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action: A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention. Claim(s) 1-3, 6, 9, 12, 15-17 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Mao et al. (hereinafter Mao) (WO 2021/236296 A1). Regarding claim 1: Mao discloses receiving an image frame containing an image of a target to determine a reference point (For example, the process 930 can be performed to select an initial video frame from a sequence of frames and to set an object bounding box center point (or other point) and an object bounding box diagonal length (or other length) as a reference point. The process 935 can be performed to crop and scale subsequent video frames to maintain the size and/or position of the object throughout the sequence of frames., par. 185) and a direction vector of the image (The motion estimation/for each pixel can include an optical flow vector that indicates a movement of the pixel between the frames. In some cases, the optical flow vector for a pixel can be a displacement vector (e.g., indicating horizontal and vertical displacements, such as x- and y- displacements) showing the movement of a pixel from a first frame to a second frame., par. 210); rotating the image on the basis of the reference point and the direction vector of the image (One example of a video stabilization technique that can be performed is a fast and robust two-dimensional motion model of Euclidean transformation, which can be used by motion models to solve the video stabilization problem. In the Euclidean motion model, a square in an image can be transformed to any other square with a different location, size, and/or rotation for motion stabilization (because the camera movement between successive frames of a video is usually small). FIG. 11 is a diagram illustrating examples of applied motion models, including an original square and various transforms applied relative to the original square. The transforms include translation, Euclidean, Affine, and Homography. FIG. 12 is a flow diagram illustrating an example of a process 1200 for performing image stabilization. The image stabilization process includes tracking one or more feature points between two consecutive frames. The tracked features allow the system to estimate the motion between frames and compensate for the motion. An input frame sequence 1202 including a sequence of frames is provided as input to the process 1200. The input frame sequence 1202 can include the output frames 814. At block 1204, the process 1200 includes performing saliency points detection using optical flow. 
The saliency detection is performed to determine feature points in a current frame. Any suitable type of optical flow technique or algorithm can be used at block 1204. The optical flow motion estimation can be performed on a pixel-by-pixel basis in some cases. For instance, for each pixel in a current frame y, the motion estimation / defines the location of the corresponding pixel in the previous frame x. The motion estimation/for each pixel can include an optical flow vector that indicates a movement of the pixel between the frames. In some cases, the optical flow vector for a pixel can be a displacement vector (e.g., indicating horizontal and vertical displacements, such as x- and y- displacements) showing the movement of a pixel from a first frame to a second frame., pars. 209-210); extracting a feature by setting a region of interest centered on the reference point (An example is described with respect to FIG. 10A and FIG. 10B. FIG. 10A is a diagram illustrating an example of an initial frame 1002 of a video. A user has selected a person as an object of interest. A bounding box 1004 is generated to represent a region of interest for the person. The bounding box 1004 is shown with a height of h and a width of w. A location (e.g., an (x, y) coordinate location) of the center point 1006 of the bounding box 1004 and a diagonal length 1008 of the bounding box 1004 are determined and used as references from which to crop and scale subsequent frames of the video in order to maintain the person with a constant size and location in the subsequent frames. [0204] FIG. 10B is a diagram illustrating an example of a subsequent frame 1012 occurring after the initial frame 1002 in the video. Based on object detection and tracking, a bounding box 1014 is generated around the person in the subsequent frame 1012. The bounding box 1014 has a width of w-n and a height of h-m. The width w-n of the bounding box 10014 is smaller than the width w of the bounding box 1004 in the initial frame 1002, and the height of h-m of the bounding box 1014 is smaller than the height h of the bounding box 1004 in the initial frame 1002. A location (e.g., an (x, y) coordinate location) of the center point 1016 and a diagonal length 1008 of the bounding box 1004 are determined., pars. 203-204 and Fig. 10A) and masking a region other than the region of interest (Figs. 10A and 10B); and tracking the motion of the feature (FIG. 12 is a flow diagram illustrating an example of a process 1200 for performing image stabilization. The image stabilization process includes tracking one or more feature points between two consecutive frames. The tracked features allow the system to estimate the motion between frames and compensate for the motion., par. 210). Regarding claim 2: Mao satisfies all the elements of claim 1. Mao further discloses wherein when determining the reference point (For example, the process 930 can be performed to select an initial video frame from a sequence of frames and to set an object bounding box center point (or other point) and an object bounding box diagonal length (or other length) as a reference point. The process 935 can be performed to crop and scale subsequent video frames to maintain the size and/or position of the object throughout the sequence of frames., par. 185) and the direction vector of the image (The motion estimation/for each pixel can include an optical flow vector that indicates a movement of the pixel between the frames. 
In some cases, the optical flow vector for a pixel can be a displacement vector (e.g., indicating horizontal and vertical displacements, such as x- and y- displacements) showing the movement of a pixel from a first frame to a second frame., par. 210), the reference point and the direction vector are automatically determined through edge detection (In some examples, an object in the initial frame can be automatically detected (e.g., using object detection and/or recognition) in the initial frame, and the ROI determination engine 804 can define a ROI around the detected object. The object can be detected using object detection and/or recognition technique (e.g., a facial detection and/or recognition algorithm, a feature detection and/or recognition algorithm, an edge detection algorithm, a boundary tracing function, any combination thereof, and/or other object detection and/or recognition technique)., par. 222). Regarding claim 3: Mao satisfies all the elements of claim 1. Mao further discloses wherein when determining the reference point (For example, the process 930 can be performed to select an initial video frame from a sequence of frames and to set an object bounding box center point (or other point) and an object bounding box diagonal length (or other length) as a reference point. The process 935 can be performed to crop and scale subsequent video frames to maintain the size and/or position of the object throughout the sequence of frames., par. 185) and the direction vector of the image (The motion estimation/for each pixel can include an optical flow vector that indicates a movement of the pixel between the frames. In some cases, the optical flow vector for a pixel can be a displacement vector (e.g., indicating horizontal and vertical displacements, such as x- and y- displacements) showing the movement of a pixel from a first frame to a second frame., par. 210), the reference point and the direction vector are received through a user interface (As described above, in some examples the ROI can be determined based on a user selection of a portion of the initial frame, such as an object depicted in the initial frame. For example, a user can select the target object that will be used in the autozoom process to maintain the object with a fixed size (e.g., the size of the object in the initial frame) across multiple frames of the sequence of frames. The user input can be received using any input interface of the device, such as a touchscreen, an electronic drawing tool, a gesture-based user interface (e.g., one or more image sensors used to detect gesture input), a voice input based user interface (e.g., a speaker and voice recognition tool used to identify voice inputs), and/or other user interface. Any of the inputs described above with respect to FIG. 8C and FIG. 9 and/or other inputs can be provided by a user. For instance, the object selection can be performed based on a tap (e.g., a single tap, a double tap, or the like) on an object displayed in the initial frame, the user drawing a bounding box around the object, or other type of object selection. In some cases, guidance can be provided for the end user on how to utilize the feature of keeping a target object size unchanged throughout a video or other sequence of frames. For instance, a prompt can be displayed to the user indicating how to select an object to keep fixed throughout the video. 
For a video, the user can select an object of interest by tapping (e.g., on a touchscreen) on the object or drawing a bounding box around the object in the initial frame of the video. Based on the selected portion of the initial frame, the ROI determination engine 804 can define a ROI around the selected portion (e g., around a selected object)., par. 221). Regarding claim 6: Mao satisfies all the elements of claim 1. Mao further discloses further comprising: obtaining an optimal model on the basis of the motion of the feature (The background subtraction engine 412 can model the background of a scene (e.g., captured in the video sequence) using any suitable background subtraction technique (also referred to as background extraction). One example of a background subtraction method used by the background subtraction engine 412 includes modeling the background of the scene as a statistical model based on the relatively static pixels in previous frames which are not considered to belong to any moving region. For example, the background subtraction engine 412 can use a Gaussian distribution model for each pixel location, with parameters of mean and variance to model each pixel location in frames of a video sequence. All the values of previous pixels at a particular pixel location are used to calculate the mean and variance of the target Gaussian model for the pixel location. When a pixel at a given location in a new video frame is processed, its value will be evaluated by the current Gaussian distribution of this pixel location. A classification of the pixel to either a foreground pixel or a background pixel is done by comparing the difference between the pixel value and the mean of the designated Gaussian model. In one illustrative example, if the distance of the pixel value and the Gaussian Mean is less than three (3) times of the variance, the pixel is classified as a background pixel. Otherwise, in this illustrative example, the pixel is classified as a foreground pixel. At the same time, the Gaussian model for a pixel location will be updated by taking into consideration the current pixel value., par. 114); and removing an extreme value on the basis of the optimal model (The background subtraction engine 412 can model the background of a scene (e.g., captured in the video sequence) using any suitable background subtraction technique (also referred to as background extraction). One example of a background subtraction method used by the background subtraction engine 412 includes modeling the background of the scene as a statistical model based on the relatively static pixels in previous frames which are not considered to belong to any moving region. For example, the background subtraction engine 412 can use a Gaussian distribution model for each pixel location, with parameters of mean and variance to model each pixel location in frames of a video sequence. All the values of previous pixels at a particular pixel location are used to calculate the mean and variance of the target Gaussian model for the pixel location. When a pixel at a given location in a new video frame is processed, its value will be evaluated by the current Gaussian distribution of this pixel location. A classification of the pixel to either a foreground pixel or a background pixel is done by comparing the difference between the pixel value and the mean of the designated Gaussian model. 
In one illustrative example, if the distance of the pixel value and the Gaussian Mean is less than three (3) times of the variance, the pixel is classified as a background pixel. Otherwise, in this illustrative example, the pixel is classified as a foreground pixel. At the same time, the Gaussian model for a pixel location will be updated by taking into consideration the current pixel value., par. 114). Regarding claim 9: Mao discloses receiving an image frame containing an image of a target (For example, the process 930 can be performed to select an initial video frame from a sequence of frames and to set an object bounding box center point (or other point) and an object bounding box diagonal length (or other length) as a reference point. The process 935 can be performed to crop and scale subsequent video frames to maintain the size and/or position of the object throughout the sequence of frames., par. 185); finding a trackable feature in the image frame (FIG. 12 is a flow diagram illustrating an example of a process 1200 for performing image stabilization. The image stabilization process includes tracking one or more feature points between two consecutive frames. The tracked features allow the system to estimate the motion between frames and compensate for the motion. An input frame sequence 1202 including a sequence of frames is provided as input to the process 1200. The input frame sequence 1202 can include the output frames 814. At block 1204, the process 1200 includes performing saliency points detection using optical flow. The saliency detection is performed to determine feature points in a current frame. Any suitable type of optical flow technique or algorithm can be used at block 1204. The optical flow motion estimation can be performed on a pixel-by-pixel basis in some cases. For instance, for each pixel in a current frame y, the motion estimation / defines the location of the corresponding pixel in the previous frame x. The motion estimation/for each pixel can include an optical flow vector that indicates a movement of the pixel between the frames. In some cases, the optical flow vector for a pixel can be a displacement vector (e.g., indicating horizontal and vertical displacements, such as x- and y- displacements) showing the movement of a pixel from a first frame to a second frame., par. 210); tracking the motion of the feature (FIG. 12 is a flow diagram illustrating an example of a process 1200 for performing image stabilization. The image stabilization process includes tracking one or more feature points between two consecutive frames. The tracked features allow the system to estimate the motion between frames and compensate for the motion. An input frame sequence 1202 including a sequence of frames is provided as input to the process 1200. The input frame sequence 1202 can include the output frames 814. At block 1204, the process 1200 includes performing saliency points detection using optical flow. The saliency detection is performed to determine feature points in a current frame. Any suitable type of optical flow technique or algorithm can be used at block 1204. The optical flow motion estimation can be performed on a pixel-by-pixel basis in some cases. For instance, for each pixel in a current frame y, the motion estimation / defines the location of the corresponding pixel in the previous frame x. The motion estimation/for each pixel can include an optical flow vector that indicates a movement of the pixel between the frames. 
In some cases, the optical flow vector for a pixel can be a displacement vector (e.g., indicating horizontal and vertical displacements, such as x- and y- displacements) showing the movement of a pixel from a first frame to a second frame., par. 210); obtaining an optimal model on the basis of the motion of the feature (The background subtraction engine 412 can model the background of a scene (e.g., captured in the video sequence) using any suitable background subtraction technique (also referred to as background extraction). One example of a background subtraction method used by the background subtraction engine 412 includes modeling the background of the scene as a statistical model based on the relatively static pixels in previous frames which are not considered to belong to any moving region. For example, the background subtraction engine 412 can use a Gaussian distribution model for each pixel location, with parameters of mean and variance to model each pixel location in frames of a video sequence. All the values of previous pixels at a particular pixel location are used to calculate the mean and variance of the target Gaussian model for the pixel location. When a pixel at a given location in a new video frame is processed, its value will be evaluated by the current Gaussian distribution of this pixel location. A classification of the pixel to either a foreground pixel or a background pixel is done by comparing the difference between the pixel value and the mean of the designated Gaussian model. In one illustrative example, if the distance of the pixel value and the Gaussian Mean is less than three (3) times of the variance, the pixel is classified as a background pixel. Otherwise, in this illustrative example, the pixel is classified as a foreground pixel. At the same time, the Gaussian model for a pixel location will be updated by taking into consideration the current pixel value., par. 114); and removing an extreme value on the basis of the optimal model (The background subtraction engine 412 can model the background of a scene (e.g., captured in the video sequence) using any suitable background subtraction technique (also referred to as background extraction). One example of a background subtraction method used by the background subtraction engine 412 includes modeling the background of the scene as a statistical model based on the relatively static pixels in previous frames which are not considered to belong to any moving region. For example, the background subtraction engine 412 can use a Gaussian distribution model for each pixel location, with parameters of mean and variance to model each pixel location in frames of a video sequence. All the values of previous pixels at a particular pixel location are used to calculate the mean and variance of the target Gaussian model for the pixel location. When a pixel at a given location in a new video frame is processed, its value will be evaluated by the current Gaussian distribution of this pixel location. A classification of the pixel to either a foreground pixel or a background pixel is done by comparing the difference between the pixel value and the mean of the designated Gaussian model. In one illustrative example, if the distance of the pixel value and the Gaussian Mean is less than three (3) times of the variance, the pixel is classified as a background pixel. Otherwise, in this illustrative example, the pixel is classified as a foreground pixel. 
At the same time, the Gaussian model for a pixel location will be updated by taking into consideration the current pixel value., par. 114). Regarding claim 12: Mao discloses an image capturing device that captures an image frame containing an image of a target (Fig. 1); and a controller configured to receive the image frame from the image capturing device (The image capture and processing system 100 includes various components that are used to capture and process images of scenes (e.g., an image of a scene 110). The image capture and processing system 100 can capture standalone images (or photographs) and/or can capture videos that include multiple images (or video frames) in a particular sequence. A lens 115 of the system 100 faces a scene 110 and receives light from the scene 110. The lens 115 bends the light toward an image sensor 130. The light received by the lens 115 passes through an aperture controlled by one or more control mechanisms 120 and is received by an image sensor 130., par. 82 and Fig. 1), extract a trackable feature from the image frame, track a motion of the extracted feature (An example is described with respect to FIG. 10A and FIG. 10B. FIG. 10A is a diagram illustrating an example of an initial frame 1002 of a video. A user has selected a person as an object of interest. A bounding box 1004 is generated to represent a region of interest for the person. The bounding box 1004 is shown with a height of h and a width of w. A location (e.g., an (x, y) coordinate location) of the center point 1006 of the bounding box 1004 and a diagonal length 1008 of the bounding box 1004 are determined and used as references from which to crop and scale subsequent frames of the video in order to maintain the person with a constant size and location in the subsequent frames. [0204] FIG.
10B is a diagram illustrating an example of a subsequent frame 1012 occurring after the initial frame 1002 in the video. Based on object detection and tracking, a bounding box 1014 is generated around the person in the subsequent frame 1012. The bounding box 1014 has a width of w-n and a height of h-m. The width w-n of the bounding box 10014 is smaller than the width w of the bounding box 1004 in the initial frame 1002, and the height of h-m of the bounding box 1014 is smaller than the height h of the bounding box 1004 in the initial frame 1002. A location (e.g., an (x, y) coordinate location) of the center point 1016 and a diagonal length 1008 of the bounding box 1004 are determined., pars. 203-204 and Fig. 10A), obtain an optimal model on the basis of the motion of the feature (The background subtraction engine 412 can model the background of a scene (e.g., captured in the video sequence) using any suitable background subtraction technique (also referred to as background extraction). One example of a background subtraction method used by the background subtraction engine 412 includes modeling the background of the scene as a statistical model based on the relatively static pixels in previous frames which are not considered to belong to any moving region. For example, the background subtraction engine 412 can use a Gaussian distribution model for each pixel location, with parameters of mean and variance to model each pixel location in frames of a video sequence. All the values of previous pixels at a particular pixel location are used to calculate the mean and variance of the target Gaussian model for the pixel location. When a pixel at a given location in a new video frame is processed, its value will be evaluated by the current Gaussian distribution of this pixel location. A classification of the pixel to either a foreground pixel or a background pixel is done by comparing the difference between the pixel value and the mean of the designated Gaussian model. In one illustrative example, if the distance of the pixel value and the Gaussian Mean is less than three (3) times of the variance, the pixel is classified as a background pixel. Otherwise, in this illustrative example, the pixel is classified as a foreground pixel. At the same time, the Gaussian model for a pixel location will be updated by taking into consideration the current pixel value., par. 114), and remove an extreme value on the basis of the optimal model (The background subtraction engine 412 can model the background of a scene (e.g., captured in the video sequence) using any suitable background subtraction technique (also referred to as background extraction). One example of a background subtraction method used by the background subtraction engine 412 includes modeling the background of the scene as a statistical model based on the relatively static pixels in previous frames which are not considered to belong to any moving region. For example, the background subtraction engine 412 can use a Gaussian distribution model for each pixel location, with parameters of mean and variance to model each pixel location in frames of a video sequence. All the values of previous pixels at a particular pixel location are used to calculate the mean and variance of the target Gaussian model for the pixel location. When a pixel at a given location in a new video frame is processed, its value will be evaluated by the current Gaussian distribution of this pixel location. 
A classification of the pixel to either a foreground pixel or a background pixel is done by comparing the difference between the pixel value and the mean of the designated Gaussian model. In one illustrative example, if the distance of the pixel value and the Gaussian Mean is less than three (3) times of the variance, the pixel is classified as a background pixel. Otherwise, in this illustrative example, the pixel is classified as a foreground pixel. At the same time, the Gaussian model for a pixel location will be updated by taking into consideration the current pixel value., par. 114). Regarding claim 15: Mao satisfies all the elements of claim 12. Mao further discloses wherein the controller (Fig. 1) is configured to extract the trackable feature from the image frame through region of interest (ROI) filtering (An example is described with respect to FIG. 10A and FIG. 10B. FIG. 10A is a diagram illustrating an example of an initial frame 1002 of a video. A user has selected a person as an object of interest. A bounding box 1004 is generated to represent a region of interest for the person. The bounding box 1004 is shown with a height of h and a width of w. A location (e.g., an (x, y) coordinate location) of the center point 1006 of the bounding box 1004 and a diagonal length 1008 of the bounding box 1004 are determined and used as references from which to crop and scale subsequent frames of the video in order to maintain the person with a constant size and location in the subsequent frames. [0204] FIG. 10B is a diagram illustrating an example of a subsequent frame 1012 occurring after the initial frame 1002 in the video. Based on object detection and tracking, a bounding box 1014 is generated around the person in the subsequent frame 1012. The bounding box 1014 has a width of w-n and a height of h-m. The width w-n of the bounding box 10014 is smaller than the width w of the bounding box 1004 in the initial frame 1002, and the height of h-m of the bounding box 1014 is smaller than the height h of the bounding box 1004 in the initial frame 1002. A location (e.g., an (x, y) coordinate location) of the center point 1016 and a diagonal length 1008 of the bounding box 1004 are determined., pars. 203-204 and Fig. 10A). Regarding claim 16: Mao satisfies all the elements of claim 15. Mao further discloses wherein the controller (Fig. 1) is configured to extract the feature by determining a reference point (For example, the process 930 can be performed to select an initial video frame from a sequence of frames and to set an object bounding box center point (or other point) and an object bounding box diagonal length (or other length) as a reference point. The process 935 can be performed to crop and scale subsequent video frames to maintain the size and/or position of the object throughout the sequence of frames., par. 185) and a direction vector of the image (The motion estimation/for each pixel can include an optical flow vector that indicates a movement of the pixel between the frames. In some cases, the optical flow vector for a pixel can be a displacement vector (e.g., indicating horizontal and vertical displacements, such as x- and y- displacements) showing the movement of a pixel from a first frame to a second frame., par. 
210), rotating the image on the basis of the reference point and the direction vector of the image (One example of a video stabilization technique that can be performed is a fast and robust two-dimensional motion model of Euclidean transformation, which can be used by motion models to solve the video stabilization problem. In the Euclidean motion model, a square in an image can be transformed to any other square with a different location, size, and/or rotation for motion stabilization (because the camera movement between successive frames of a video is usually small). FIG. 11 is a diagram illustrating examples of applied motion models, including an original square and various transforms applied relative to the original square. The transforms include translation, Euclidean, Affine, and Homography. FIG. 12 is a flow diagram illustrating an example of a process 1200 for performing image stabilization. The image stabilization process includes tracking one or more feature points between two consecutive frames. The tracked features allow the system to estimate the motion between frames and compensate for the motion. An input frame sequence 1202 including a sequence of frames is provided as input to the process 1200. The input frame sequence 1202 can include the output frames 814. At block 1204, the process 1200 includes performing saliency points detection using optical flow. The saliency detection is performed to determine feature points in a current frame. Any suitable type of optical flow technique or algorithm can be used at block 1204. The optical flow motion estimation can be performed on a pixel-by-pixel basis in some cases. For instance, for each pixel in a current frame y, the motion estimation / defines the location of the corresponding pixel in the previous frame x. The motion estimation/for each pixel can include an optical flow vector that indicates a movement of the pixel between the frames. In some cases, the optical flow vector for a pixel can be a displacement vector (e.g., indicating horizontal and vertical displacements, such as x- and y- displacements) showing the movement of a pixel from a first frame to a second frame., pars. 209-210), setting a region of interest centered on the reference point (An example is described with respect to FIG. 10A and FIG. 10B. FIG. 10A is a diagram illustrating an example of an initial frame 1002 of a video. A user has selected a person as an object of interest. A bounding box 1004 is generated to represent a region of interest for the person. The bounding box 1004 is shown with a height of h and a width of w. A location (e.g., an (x, y) coordinate location) of the center point 1006 of the bounding box 1004 and a diagonal length 1008 of the bounding box 1004 are determined and used as references from which to crop and scale subsequent frames of the video in order to maintain the person with a constant size and location in the subsequent frames. [0204] FIG. 10B is a diagram illustrating an example of a subsequent frame 1012 occurring after the initial frame 1002 in the video. Based on object detection and tracking, a bounding box 1014 is generated around the person in the subsequent frame 1012. The bounding box 1014 has a width of w-n and a height of h-m. The width w-n of the bounding box 10014 is smaller than the width w of the bounding box 1004 in the initial frame 1002, and the height of h-m of the bounding box 1014 is smaller than the height h of the bounding box 1004 in the initial frame 1002. 
A location (e.g., an (x, y) coordinate location) of the center point 1016 and a diagonal length 1008 of the bounding box 1004 are determined., pars. 203-204 and Fig. 10A), and masking a region other than the region of interest (Figs. 10A and 10B). Regarding claim 17: Mao satisfies all the elements of claim 16. Mao further discloses wherein the controller (Fig. 1) is configured to automatically determine the reference point and the direction vector through edge detection (In some examples, an object in the initial frame can be automatically detected (e.g., using object detection and/or recognition) in the initial frame, and the ROI determination engine 804 can define a ROI around the detected object. The object can be detected using object detection and/or recognition technique (e.g., a facial detection and/or recognition algorithm, a feature detection and/or recognition algorithm, an edge detection algorithm, a boundary tracing function, any combination thereof, and/or other object detection and/or recognition technique)., par. 222). Allowable Subject Matter Claims 4-5, 7-8, 10-11, 13-14 and 18-19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. Conclusion Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHARLOTTE M BAKER whose telephone number is (571)272-7459. The examiner can normally be reached Mon - Fri 8:00-5:00. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, JENNIFER MEHMOOD can be reached at (571)272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /CHARLOTTE M BAKER/Primary Examiner, Art Unit 2664 17 March 2026
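To make the technology at issue concrete, the pipeline that claim 1 recites and that the rejection maps onto Mao (determine a reference point and direction vector, rotate the image about the reference point, mask outside a region of interest, and track feature motion) can be pictured with a short Python/OpenCV sketch. This is an illustrative reconstruction only, not the applicant's disclosed implementation or Mao's; the circular ROI, the Lucas-Kanade tracker, and all parameter values are assumptions of the sketch, and the frames are assumed to be BGR color images.

import cv2
import numpy as np

def analyze_motion(prev_frame, curr_frame, ref_point, direction_vec, roi_radius=80):
    # Angle of the direction vector; rotating the image by this angle is an assumed convention
    angle = float(np.degrees(np.arctan2(direction_vec[1], direction_vec[0])))
    h, w = curr_frame.shape[:2]

    # Rotate both frames about the reference point
    M = cv2.getRotationMatrix2D((float(ref_point[0]), float(ref_point[1])), angle, 1.0)
    prev_rot = cv2.warpAffine(prev_frame, M, (w, h))
    curr_rot = cv2.warpAffine(curr_frame, M, (w, h))

    # Mask everything outside a circular region of interest centered on the reference point
    mask = np.zeros((h, w), dtype=np.uint8)
    cv2.circle(mask, (int(ref_point[0]), int(ref_point[1])), roi_radius, 255, -1)

    # Extract trackable features inside the ROI, then track them with sparse optical flow
    prev_gray = cv2.cvtColor(prev_rot, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_rot, cv2.COLOR_BGR2GRAY)
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=100, qualityLevel=0.01,
                                  minDistance=7, mask=mask)
    new_pts, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)

    # Per-feature motion: displacement of each successfully tracked point
    good = status.ravel() == 1
    return (new_pts[good] - pts[good]).reshape(-1, 2)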
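The Mao passage cited for the "optimal model" and "extreme value" limitations (par. 114) describes a per-pixel Gaussian background model. A minimal numpy sketch of that idea follows; the running-update rate, the initial variance, and the use of three standard deviations as the threshold are assumptions of this illustration (the quoted text literally says "three (3) times of the variance").

import numpy as np

class PixelGaussianBackground:
    # Per-pixel Gaussian background model in the spirit of Mao's quoted par. 114:
    # each pixel location keeps a running mean and variance, and new pixel values
    # far from the mean are classified as foreground.
    def __init__(self, first_frame, alpha=0.05):
        self.mean = first_frame.astype(np.float64)
        self.var = np.full_like(self.mean, 15.0 ** 2)    # assumed initial variance
        self.alpha = alpha                               # assumed update rate

    def classify_and_update(self, frame):
        frame = frame.astype(np.float64)
        # Background if the value is close to the per-pixel mean; the quoted text says
        # "three (3) times of the variance", while three standard deviations is the usual form.
        foreground = np.abs(frame - self.mean) > 3.0 * np.sqrt(self.var)
        # Update the Gaussian parameters with the current pixel value (running estimate)
        self.mean = (1.0 - self.alpha) * self.mean + self.alpha * frame
        self.var = (1.0 - self.alpha) * self.var + self.alpha * (frame - self.mean) ** 2
        return foreground    # True where the pixel looks like a moving/foreground object

In the spirit of the rejection's mapping, tracked feature displacements that land on pixels flagged as foreground by such a model could be treated as the "extreme values" to be removed.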

Prosecution Timeline

Jan 26, 2024
Application Filed
Mar 17, 2026
Non-Final Rejection — §102 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602905
A Computer Software Module Arrangement, a Circuitry Arrangement, an Arrangement and a Method for Improved Object Detection Adapting the Detection through Shifting the Image
2y 5m to grant; granted Apr 14, 2026
Patent 12585654
Dynamic Vision System for Robot Fleet Management
2y 5m to grant; granted Mar 24, 2026
Patent 12579900
UAV PERCEPTION VALIDATION BASED UPON A SEMANTIC AGL ESTIMATE
2y 5m to grant; granted Mar 17, 2026
Patent 12548331
TECHNIQUES TO PERFORM TRAJECTORY PREDICTIONS
2y 5m to grant; granted Feb 10, 2026
Patent 12543924
MEDICAL SUPPORT SYSTEM, MEDICAL SUPPORT DEVICE, AND MEDICAL SUPPORT METHOD
2y 5m to grant; granted Feb 10, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 93%
Grant Probability with Interview: 93% (-0.2%)
Median Time to Grant: 2y 2m
PTA Risk: Low
Based on 1067 resolved cases by this examiner. Grant probability derived from career allow rate.

Free tier: 3 strategy analyses per month