Prosecution Insights
Last updated: April 19, 2026
Application No. 18/558,130

VIDEO PROCESSING METHOD AND DEVICE, AND ELECTRONIC DEVICE

Status: Final Rejection (§103)
Filed: Oct 30, 2023
Examiner: KAUR, JASPREET
Art Unit: 2662
Tech Center: 2600 — Communications
Assignee: BEIJING ZITIAO NETWORK TECHNOLOGY CO., LTD.
OA Round: 2 (Final)
Grant Probability: 81% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 2y 8m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 81% (13 granted / 16 resolved; +19.3% vs TC avg) — grants above average
Interview Lift: +30.0% among resolved cases with an interview — a strong lift
Typical Timeline: 2y 8m average prosecution; 31 applications currently pending
Career History: 47 total applications across all art units
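
The headline figures above reduce to simple arithmetic over the examiner's docket. A minimal sketch in Python, assuming the Tech Center average is only implied by the stated +19.3% delta (the page does not publish it directly):

```python
# Illustrative arithmetic behind the examiner stats above.
# Assumption: the TC average is inferred from the stated +19.3% delta.

granted, resolved, pending = 13, 16, 31

allow_rate = granted / resolved          # 0.8125 -> shown as 81%
delta_vs_tc = 0.193                      # stated: +19.3% vs TC avg
tc_avg = allow_rate - delta_vs_tc        # ~62%, the implied TC average

total_applications = resolved + pending  # 47, matching the career total

print(f"Allow rate: {allow_rate:.0%} ({granted}/{resolved})")
print(f"Implied TC average: {tc_avg:.0%}")
print(f"Total applications: {total_applications}")
```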

Statute-Specific Performance

§101: 17.2% (-22.8% vs TC avg)
§103: 53.2% (+13.2% vs TC avg)
§102: 7.4% (-32.6% vs TC avg)
§112: 15.3% (-24.7% vs TC avg)
Tech Center averages are estimates. Based on career data from 16 resolved cases.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Applicant’s response to the Non-Final Office Action dated 11/10/2025, filed with the office on 02/10/2026, has been entered and made of record.

Status of Claims

Claims 1-6, 8-10, 12-13, 16-20, and 22-24 are pending. Claims 7, 11, 14-15, and 21 are cancelled.

Response to Amendments

In light of the Applicant’s amendments to claim 10, the 112(a) rejections of record for written description have been withdrawn.

Response to Arguments

Applicant’s arguments filed on 02/10/2026 with respect to the rejection of claims under 35 U.S.C. 103 have been fully considered, but they are not persuasive.

On page 15 of its reply, Applicant states “However, first, it is acknowledged in the last paragraph on p. 6 and the first paragraph on p. 7 of the Office action that "Chang does not teach 'acquire position information of three-dimensional points within the patch region, and determine, according to the position information of the three-dimensional points within the patch region; display, based on the three-dimensional position information of the patch.'"” Examiner did not intend to state “does not teach” and instead should have stated “is not relied on to teach”. The elements of the claim are taught by the combination of Chang and Wang: Chang is not relied on to teach “three-dimensional position information of a patch” as recited in claim 1, and Chang is combined with Wang to teach the ordered combination of claims as presented in the Non-Final Office Action.

On page 16 of its reply, Applicant states “The above excerpt of Wang merely discloses that position information in three-dimensional space is placed in the encoded bitstream of a video frame and transmitted along with the video frame data, so as to avoid the increased cost and synchronization difficulties. However, Wang does not mention displaying a patch corresponding to a target object in subsequent frames (i.e., cross-frame displaying). Thus, Wang fails to disclose displaying the patch at a position corresponding to at least one second video frame, let alone the specific content included in the displaying step recited in claim 1.” Wang discloses “display, based on the three-dimensional position information of the patch (Wang paragraph [0047] "The image information of the target object is blended into the preset video scene based on its position information in the three-dimensional space")”.

In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).

The combination of Chang and Wang teaches “acquire position information of three-dimensional points (Wang paragraph [0127] "After determining the image coordinates (u,v) of the target object, the x and y coordinates of the target object can be calculated based on the image coordinates (u,v) and the position of the image sensing device.
The x and y coordinates can be relative coordinates with respect to the origin in the three-dimensional space coordinate system, or they can be absolute coordinates in the world coordinate system") within the patch region; segment the first video frame to acquire a patch corresponding to a target object and a patch region (Chang paragraph [0092] "Image segmentation techniques can be used to separate the main region containing the target object from the image of the video frame"); and determine, according to the position information of the three-dimensional points within the patch region (Wang paragraph [0127] "After determining the image coordinates (u,v) of the target object, the x and y coordinates of the target object can be calculated based on the image coordinates (u,v) and the position of the image sensing device. The x and y coordinates can be relative coordinates with respect to the origin in the three-dimensional space coordinate system, or they can be absolute coordinates in the world coordinate system"), three-dimensional position information of the patch (Wang paragraph [0128] "Depth information can be understood as the distance between each pixel in a video frame acquired by an image sensing device and the image sensing device itself. Therefore, based on the depth information and the origin in three-dimensional space, the z-coordinate of the target object in the video frame can be obtained"); display (for clarification – Wang paragraph [0154] “After decoding the encoded bitstream to obtain video frames and the position information of the target object in three-dimensional space, preset information can be output based on the position information of the target object in three-dimensional space while playing the video frame” – the output is interpreted as a display), based on the three-dimensional position information of the patch (Wang paragraph [0047] "The image information of the target object is blended into the preset video scene based on its position information in the three-dimensional space" – for clarification, the blending of the information of the target object and the preset video scene is interpreted as obtaining a cross-frame image)”.

In light of Applicant’s amendment of independent claims 1 and 12-13 by incorporating the subject matter of claims 7 and 21, on pages 15-16 of its reply, Applicant states “Thus, Wang fails to disclose displaying the patch at a position corresponding to at least one second video frame, let alone the specific content included in the displaying step recited in claim 1.
In addition, although Wang involves a three-dimensional space, Wang does not mention acquiring a direction of the patch, let alone displaying the patch based on the three-dimensional position information and the direction of the patch, as recited in claim 1.”

As stated on page 14 of the Non-Final Office Action, for claim 21, the combination of Chang and Wang is relied on to teach “The device according to claim 12, wherein the at least one processor is further caused to: acquire a direction of the patch (Chang paragraph [0034] "The alignment parameters are determined based on the positional offset of the target object in the video frame relative to the target object in the reference frame" – for clarification, the direction of the patch can be determined by identifying the positional offset of the patch/target object); display (for clarification – Wang paragraph [0154] “After decoding the encoded bitstream to obtain video frames and the position information of the target object in three-dimensional space, preset information can be output based on the position information of the target object in three-dimensional space while playing the video frame” – the output is interpreted as a display), based on the three-dimensional position information of the patch (for clarification – Wang paragraph [0047] "The image information of the target object is blended into the preset video scene based on its position information in the three-dimensional space") and the direction of the patch (Chang paragraph [0034] "The alignment parameters are determined based on the positional offset of the target object in the video frame relative to the target object in the reference frame" – for clarification, the direction of the patch can be determined by identifying the positional offset of the patch/target object), the patch at a position corresponding to at least one second video frame (Chang paragraph [0062] "Based on the alignment parameters, the main body region is overlaid on the target position of at least one video frame other than the video frame where the target object is located, generating an effect frame that simultaneously contains the two main body regions where the target object is located" – for clarification, the alignment parameter can be interpreted as obtaining the direction of the patch, and the other video frame is equivalent to the second video frame).”

The combination of Chang and Wang teaches all the limitations of claims 7 and 21, which are currently amended into the independent claims. Therefore, Applicant’s arguments are not found persuasive. Consequently, THIS ACTION IS MADE FINAL.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-4, 10, 12-13, and 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over Chang et al. (CN 111832539 A – translation from Espacenet) in view of Wang (CN 111954032 A – translation from Espacenet).

Regarding claim 12, Chang teaches “A video processing device (Chang paragraph [0006] "This disclosure provides a video processing method, apparatus, and storage medium"), comprising: at least one processor and a memory; wherein the memory stores computer execution instructions (Chang paragraph [0068] "the device comprising at least: a processor and a memory for storing executable instructions capable of running on the processor"); the at least one processor executes the computer execution instructions (Chang paragraph [0069] "When the processor runs the executable instructions, the executable instructions perform the steps in any of the video processing methods described above"), causing the at least one processor to: acquire a first video frame to be processed (Chang paragraph [0090] "The video to be processed can be a pre-recorded video file or a video that is currently being recorded"); segment the first video frame to acquire a patch corresponding to a target object and a patch region (Chang paragraph [0092] "Image segmentation techniques can be used to separate the main region containing the target object from the image of the video frame"); and displaying (Chang paragraph [0092] "By overlaying the main area of the target object onto another video frame, it is possible to achieve a clone effect that displays at least two main areas in the video frame" – for clarification, the overlaid image that shows a clone effect is interpreted as displaying, as one can physically see, i.e.
via a screen, the clone effect that shows two main areas in a video frame); wherein displaying (for clarification – Chang paragraph [0092] "By overlaying the main area of the target object onto another video frame, it is possible to achieve a clone effect that displays at least two main areas in the video frame") comprises: acquiring a direction of the patch (Chang paragraph [0034] "The alignment parameters are determined based on the positional offset of the target object in the video frame relative to the target object in the reference frame" – for clarification, the direction of the patch can be determined by identifying the positional offset of the patch/target object); and displaying (for clarification – Chang paragraph [0092] "By overlaying the main area of the target object onto another video frame, it is possible to achieve a clone effect that displays at least two main areas in the video frame" – the overlaid image that shows a clone effect is interpreted as displaying, as one can physically see, i.e. via a screen, the clone effect that shows two main areas in a video frame), (Chang paragraph [0062] "Based on the alignment parameters, the main body region is overlaid on the target position of at least one video frame other than the video frame where the target object is located, generating an effect frame that simultaneously contains the two main body regions where the target object is located" – for clarification, the alignment parameter can be interpreted as obtaining the direction of the patch, and the other video frame is equivalent to the second video frame).

However, Chang is not relied on to teach “acquire position information of three-dimensional points within the patch region, and determine, according to the position information of the three-dimensional points within the patch region” and “the three-dimensional position information of the patch”.

Wang teaches “acquiring position information of three-dimensional points within the patch region, and determining, according to the position information of the three-dimensional points within the patch region, three-dimensional position information of the patch (Wang paragraph [0127] "After determining the image coordinates (u,v) of the target object, the x and y coordinates of the target object can be calculated based on the image coordinates (u,v) and the position of the image sensing device. The x and y coordinates can be relative coordinates with respect to the origin in the three-dimensional space coordinate system, or they can be absolute coordinates in the world coordinate system"); and displaying (for clarification – Wang paragraph [0154] “After decoding the encoded bitstream to obtain video frames and the position information of the target object in three-dimensional space, preset information can be output based on the position information of the target object in three-dimensional space while playing the video frame” – the output is interpreted as a display), based on the three-dimensional position information of the patch (Wang paragraph [0047] "The image information of the target object is blended into the preset video scene based on its position information in the three-dimensional space")”.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention of the instant application to combine a video processing device that segments and overlays the segmented region to another frame, as taught by Chang, with the system for determining three-dimensional coordinates of an object in a video frame, as taught by Wang. The suggestion/motivation for doing so would have been that “it is necessary to use the coordinate position information of the target object in the video in three-dimensional space. In existing technologies, the common practice for processing video in network transmission is to package the location and depth information of the tracked target separately and send them through a separate channel. This method requires the transmission of one video stream and one information stream, which is costly, and the tracked target and its location are difficult to synchronize during video playback", as noted by the Wang disclosure in paragraph [0004]. Therefore, it would have been obvious to combine the disclosure of Chang with the Wang disclosure to obtain the invention as specified in claim 12, as there is a reasonable expectation of success and/or because doing so merely combines prior art elements according to known methods to yield predictable results.

Claim 1 recites a method with steps corresponding to the device elements recited in claim 12. Therefore, the recited steps of this claim are mapped to the proposed combination in the same manner as the corresponding elements of device claim 12. Additionally, the rationale and motivation to combine the Chang and Wang references, presented in the rejection of claim 12, apply to this claim.

Regarding claim 10, the combination of Chang and Wang teaches “The method according to claim 1, wherein the acquiring the first video frame to be processed comprises: acquiring the first video frame in response to a triggering operation (Chang paragraph [0118] "terminal can determine the freeze frame based on the predetermined operation command") acting on a screen of an electronic device (Chang paragraph [0118] "the pre-defined operation command can be a pre-defined gesture touch command, such as a click, double click, swipe up, swipe down, or irregular swipe touch operation command; it can also be a button operation command, such as pressing the volume button and the power button at the same time; or it can be a voice input command, etc"); and/or, acquiring the first video frame when detecting that the target object is in a static state (Chang paragraph [0128] "the selection of video content can also be automated. The selected freeze frame is determined by detecting the pose of the target object in the video frame" and paragraph [0129] "when the subject is moving, they can strike a specific pose and pause their movement for a period of time to create a freeze-frame effect before continuing to move"), and/or, acquiring the first video frame at a preset interval (Chang paragraph [0122] "the terminal can automatically select a freeze frame according to a predetermined time interval").”

Claim 13 recites a computer readable storage medium including computer executable instructions corresponding to the elements of the device recited in claim 12. Therefore, the recited instructions of the computer readable medium of claim 13 are mapped to the proposed combination in the same manner as the corresponding elements of the apparatus claim 12.
Additionally, the rationale and motivation to combine Chang and Wang, presented in the rejection of claim 12, apply to this claim. The additional elements of “A non-transitory computer readable storage medium, wherein the computer readable storage medium stores therein computer execution instructions, when the computer execution instructions are executed by a processor” are taught by Chang (for example, Chang paragraph [0070] "a non-transitory computer-readable storage medium is provided, wherein computer-executable instructions are stored therein, which, when executed by a processor, implement the steps in any of the video processing methods described above").

Regarding claim 16 (similarly claim 2), the combination of Chang and Wang teaches “The device according to claim 12, wherein the at least one processor is further caused to: acquire position information of a target point corresponding to the patch region (Wang paragraph [0119] "the three dimensional spatial coordinates corresponding to all pixels on the target object can be obtained, or the three-dimensional spatial coordinates corresponding to some pixels on the target object can be collected"); determine, according to the position information of each of the three-dimensional points within the patch region, a depth corresponding to the patch (Wang paragraph [0128] "Depth information can be understood as the distance between each pixel in a video frame acquired by an image sensing device and the image sensing device itself"); determine, according to the depth and the position information of the target point, three-dimensional position information of the patch (Wang paragraph [0128] "based on the depth information and the origin in three-dimensional space, the z-coordinate of the target object in the video frame can be obtained").” The proposed combination, as well as the motivation for combining the Chang and Wang references presented in the rejection of claim 12, applies to claim 16. Finally, the device recited in claim 16 is met by Chang and Wang.

Regarding claim 17 (similarly claim 3), the combination of Chang and Wang teaches “The device according to claim 16, wherein the three-dimensional position information of the patch is three-dimensional position information in a world coordinate system (Wang paragraph [0127] "After determining the image coordinates (u,v) of the target object, the x and y coordinates of the target object can be calculated based on the image coordinates (u,v) and the position of the image sensing device. The x and y coordinates can be […] absolute coordinates in the world coordinate system"); wherein the at least one processor is further caused to: acquire a camera pose (Wang paragraph [0118] "the location of the image sensing device can be set as the origin (0,0,0) in the three-dimensional space"), and determine, according to the depth, the position information of the target point and the camera pose (Wang paragraph [0128] "Depth information can be understood as the distance between each pixel in a video frame acquired by an image sensing device and the image sensing device itself"), three-dimensional position information of the patch in the world coordinate system (Wang paragraph [0128] "the depth information and the origin in three-dimensional space, the z-coordinate of the target object in the video frame can be obtained").” The proposed combination, as well as the motivation for combining the Chang and Wang references presented in the rejection of claim 12, applies to claim 17.
Finally, the device recited in claim 17 is met by Chang and Wang.

Regarding claim 18 (similarly claim 4), the combination of Chang and Wang teaches “The device according to claim 17, wherein the at least one processor is further caused to: determine, according to the depth and the position information of the target point (Wang paragraph [0128] "the depth information and the origin in three-dimensional space, the z-coordinate of the target object in the video frame can be obtained"), first three-dimensional position information corresponding to the target point, wherein the first three-dimensional position information corresponding to the target point is three-dimensional position information of the target point in a camera coordinate system (Wang paragraph [0127] "After determining the image coordinates (u,v) of the target object, the x and y coordinates of the target object can be calculated based on the image coordinates (u,v) and the position of the image sensing device. The x and y coordinates can be relative coordinates with respect to the origin in the three-dimensional space coordinate system"); convert, according to the camera pose (Wang paragraph [0128] "Depth information can be understood as the distance between each pixel in a video frame acquired by an image sensing device and the image sensing device itself"), the first three-dimensional position information of the target point to obtain second three-dimensional position information corresponding to the target point, wherein the second three-dimensional position information corresponding to the target point is three-dimensional position information of the target point in the world coordinate system (Wang paragraph [0127] "After determining the image coordinates (u,v) of the target object, the x and y coordinates of the target object can be calculated based on the image coordinates (u,v) and the position of the image sensing device. The x and y coordinates can be […] absolute coordinates in the world coordinate system"); take the second three-dimensional position information corresponding to the target point as the three-dimensional position information of the patch in the world coordinate system (Wang paragraph [0127] "After determining the image coordinates (u,v) of the target object, the x and y coordinates of the target object can be calculated based on the image coordinates (u,v) and the position of the image sensing device. The x and y coordinates can be […] absolute coordinates in the world coordinate system").” The proposed combination, as well as the motivation for combining the Chang and Wang references presented in the rejection of claim 12, applies to claim 18. Finally, the device recited in claim 18 is met by Chang and Wang.

Claims 5-6 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Chang and Wang in view of Saxena et al. (“Learning Depth from Single Monocular Images” – published 2005).

Regarding claim 19 (similarly claim 5), the combination of Chang and Wang teaches “The device according to claim 16, wherein the position information of the three-dimensional points comprises depths corresponding to the three-dimensional points; wherein the at least one processor is further caused to: (Wang paragraph [0128] "based on the depth information and the origin in three-dimensional space, the z-coordinate of the target object in the video frame can be obtained").” However, the combination of Chang and Wang does not teach “perform statistical processing on the depth”.
Saxena teaches “perform statistical processing on the depth (Saxena Figure 2 and page 4, paragraph 5: "depth at a higher scale is constrained to be the average of the depths at lower scales")”.

[Saxena, Figure 2]

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention of the instant application to combine a video processing device that segments and overlays the segmented region, using three-dimensional position information, to another frame, as taught by Chang and Wang, to include averaging of depths as taught by Saxena. The suggestion/motivation for doing so would have been that “Recovering 3-D depth from images is a basic problem in computer vision, and has important applications in robotics, scene understanding and 3-D reconstruction. Most work on visual 3-D reconstruction has focused on binocular vision (stereopsis) [1] and on other algorithms that require multiple images, such as structure from motion [2] and depth from defocus [3]. Depth estimation from a single monocular image is a difficult task, and requires that we take into account the global structure of the image, as well as use prior knowledge about the scene", as noted by the Saxena disclosure in page 1, paragraph 1. Therefore, it would have been obvious to combine the disclosure of Chang and Wang with the Saxena disclosure to obtain the invention as specified in claim 19, as there is a reasonable expectation of success and/or because doing so merely combines prior art elements according to known methods to yield predictable results.

Regarding claim 20 (similarly claim 6), the combination of Chang, Wang, and Saxena teaches “The device according to claim 19, wherein the at least one processor is further caused to: acquire a median of depths corresponding to the three-dimensional points within the patch region, and determine the median as the depth corresponding to the patch; or acquire a mode of depths corresponding to the three-dimensional points within the patch region, and determine the mode as the depth corresponding to the patch; or acquire an average value of depths (Saxena Figure 2 and page 4, paragraph 5: "depth at a higher scale is constrained to be the average of the depths at lower scales") corresponding to the three-dimensional points within the patch region, and determine the average value as the depth corresponding to the patch (Wang paragraph [0128] "based on the depth information and the origin in three-dimensional space, the z-coordinate of the target object in the video frame can be obtained").” The proposed combination, as well as the motivation for combining the Chang, Wang, and Saxena references presented in the rejection of claim 19, applies to claim 20. Finally, the device recited in claim 20 is met by Chang, Wang, and Saxena.

Claims 8-9 and 22-24 are rejected under 35 U.S.C. 103 as being unpatentable over Chang and Wang in view of Liu et al. (US 20180089832 A1).

Regarding claim 22 (similarly claim 8), the combination of Chang and Wang teaches “The device according to claim 12, wherein the at least one processor is further caused to: determine, (Wang paragraph [0127] "After determining the image coordinates (u,v) of the target object, the x and y coordinates of the target object can be calculated based on the image coordinates (u,v) and the position of the image sensing device.
The x and y coordinates can be […] absolute coordinates in the world coordinate system"); determine, according to the position information of the spatial three-dimensional points, spatial three-dimensional points within the patch region from the spatial three-dimensional points (Wang paragraph [0127] "After determining the image coordinates (u,v) of the target object, the x and y coordinates of the target object can be calculated based on the image coordinates (u,v) and the position of the image sensing device. The x and y coordinates can be […] absolute coordinates in the world coordinate system"); take the position information of the spatial three-dimensional points within the patch region as the position information of the three-dimensional points within the patch region (Wang paragraph [0128] "the depth information and the origin in three-dimensional space, the z-coordinate of the target object in the video frame can be obtained").” However, the combination of Chang and Wang does not teach “determining, based on a simultaneous localization and mapping construction algorithm, position information”.

In an analogous field of endeavor, Liu teaches “determine, based on a simultaneous localization and mapping construction algorithm (Liu paragraph [0012] "SLAM techniques may be used to determine a location of an object")”.

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention of the instant application to combine a video processing device that segments and overlays the segmented region, using three-dimensional position information, to another frame, as taught by Chang and Wang, to include a simultaneous localization and mapping construction algorithm (SLAM) for detecting position, as taught by Liu. The suggestion/motivation for doing so would have been that “SLAM can be used to determine how the object moves in an unknown environment while simultaneously building a map of the three dimensional structure surrounding the object", as noted by the Liu disclosure in paragraph [0012]. Therefore, it would have been obvious to combine the disclosure of Chang and Wang with the Liu disclosure to obtain the invention as specified in claim 22, as there is a reasonable expectation of success and/or because doing so merely combines prior art elements according to known methods to yield predictable results.

Regarding claim 23 (similarly claim 9), the combination of Chang, Wang, and Liu teaches “The device according to claim 12, wherein the at least one processor is further caused to: determine, based on a simultaneous localization and mapping construction algorithm, a camera pose corresponding to the first video frame (Liu paragraph [0012] "SLAM can be used to obtain camera pose and environmental structure in real time").” The proposed combination, as well as the motivation for combining the Chang, Wang, and Liu references presented in the rejection of claim 22, applies to claim 23. Finally, the device recited in claim 23 is met by Chang, Wang, and Liu.
Regarding claim 24, the combination of Chang, Wang, and Liu teaches “The method according to claim 8, wherein determining, according to the position information of the spatial three-dimensional points, the spatial three-dimensional points within the patch region from the spatial three-dimensional points (Wang paragraph [0127] "After determining the image coordinates (u,v) of the target object, the x and y coordinates of the target object can be calculated based on the image coordinates (u,v) and the position of the image sensing device. The x and y coordinates can be […] absolute coordinates in the world coordinate system") comprises: for each of the spatial three-dimensional points, acquiring a first coordinate and a second coordinate of the spatial three-dimensional point (Wang paragraph [0127] "After determining the image coordinates (u,v) of the target object, the x and y coordinates of the target object can be calculated based on the image coordinates (u,v) and the position of the image sensing device. The x and y coordinates can be […] absolute coordinates in the world coordinate system"), and if the first coordinate and the second coordinate are both within a coordinate range comprised in the patch region (Chang paragraph [0131] "It is understandable that, since the subject may not be able to maintain a completely unchanged posture during the N video frames in which the movement is paused, 'same posture' here can be understood as no movement beyond the predetermined range, and is not limited to slight swaying of the target object or small movements of the body such as blinking or finger waving"), determining the spatial three-dimensional point as a spatial three-dimensional point belonging to the patch region (Wang paragraph [0144] "When the position information of the target object in a video frame in three-dimensional space is known, different preset processing can be performed according to different scenarios based on the position information in three-dimensional space. For example, in a monitoring scenario, after receiving the encoded bitstream of the video frame of the monitoring site from the image sensing device at the monitoring site, the current position and behavior information of the target object can be reported in real time based on the position information of the target object in three-dimensional space").”

Conclusion

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JASPREET KAUR, whose telephone number is (571) 272-5534. The examiner can normally be reached Monday - Friday, 9:30 am - 5:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Amandeep Saini, can be reached at (571) 272-3382. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JASPREET KAUR/
Examiner, Art Unit 2662

/AMANDEEP SAINI/
Supervisory Patent Examiner, Art Unit 2662
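
Editor's note: the Wang passages cited throughout the rejection (paragraphs [0127]-[0128]) describe recovering a 3-D point for the segmented patch from its image coordinates (u, v) plus per-pixel depth, with the image sensing device at the origin. Wang is quoted without a formula, so the sketch below assumes a standard pinhole back-projection; the intrinsics fx, fy, cx, cy and the function name are hypothetical, not from the record.

```python
import numpy as np

# Hedged sketch of the 3-D recovery Wang [0127]-[0128] describes:
# x and y from the image coordinates (u, v), z from per-pixel depth,
# with the camera (image sensing device) at the origin (0, 0, 0).
# A pinhole camera model is assumed; all intrinsics are hypothetical.

def backproject(u, v, depth, fx=1000.0, fy=1000.0, cx=640.0, cy=360.0):
    """Map a pixel (u, v) with depth in meters to a camera-frame XYZ point."""
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

# e.g. a patch pixel at (800, 400) observed 2.5 m from the sensor:
print(backproject(800, 400, 2.5))  # -> [0.4  0.1  2.5]
```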
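Likewise, the "statistical processing on the depth" of claims 19-20, together with the region-membership test of claim 24, amounts to a small aggregation step: collect the depths of the 3-D points whose first and second coordinates fall within the patch region's coordinate range, then take the median, mode, or average as the patch depth. A minimal sketch, assuming a rectangular region; the function names are illustrative, not from the claims:

```python
import statistics

def point_in_region(x, y, region):
    """Claim 24-style test: both coordinates within the region's range."""
    (x0, y0), (x1, y1) = region
    return x0 <= x <= x1 and y0 <= y <= y1

def patch_depth(points, region, method="median"):
    """Aggregate depths of the 3-D points inside `region` (claims 19-20)."""
    depths = [z for (x, y, z) in points if point_in_region(x, y, region)]
    if method == "median":
        return statistics.median(depths)
    if method == "mode":
        return statistics.mode(depths)
    return statistics.fmean(depths)  # average value of depths

pts = [(1.0, 1.0, 2.4), (1.2, 0.9, 2.6), (5.0, 5.0, 9.0)]  # last lies outside
print(patch_depth(pts, ((0.0, 0.0), (2.0, 2.0))))          # -> 2.5
```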

Prosecution Timeline

Oct 30, 2023 — Application Filed
Nov 06, 2025 — Non-Final Rejection (§103)
Feb 10, 2026 — Response Filed
Mar 25, 2026 — Final Rejection (§103, current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12596301
RETICLE INSPECTION AND PURGING METHOD AND TOOL
Granted Apr 07, 2026 • 2y 5m to grant
Patent 12555199
IMAGE PROCESSING APPARATUS, IMAGE PROCESSING METHOD, AND STORAGE MEDIUM, WITH SYNTHESIS OF TWO INFERENCE RESULTS ABOUT AN IDENTICAL FRAME AND WITH INITIALIZING OF RECURRENT INFORMATION
Granted Feb 17, 2026 • 2y 5m to grant
Patent 12513319
END-TO-END INSTANCE-SEPARABLE SEMANTIC-IMAGE JOINT CODEC SYSTEM AND METHOD
Granted Dec 30, 2025 • 2y 5m to grant
Patent 12427606
SYSTEMS AND METHODS FOR NON-DESTRUCTIVELY TESTING STATOR WELD QUALITY AND EPOXY THICKNESS
Granted Sep 30, 2025 • 2y 5m to grant
Patent 12421641
LAUNDRY TREATMENT APPLIANCE AND METHOD OF USING THE SAME ACCORDING TO MATCHED LAUNDRY LOADS
Granted Sep 23, 2025 • 2y 5m to grant
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 81%
With Interview: 99% (+30.0%)
Median Time to Grant: 2y 8m
PTA Risk: Moderate
Based on 16 resolved cases by this examiner. Grant probability derived from career allow rate.
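
A sketch of how these projections could follow from the career data, assuming the interview lift is an additive percentage-point adjustment capped at 99% (the page does not state its exact formula):

```python
# Hypothetical derivation of the projection figures above.
# Assumption: the +30.0-point interview lift is additive and capped.

granted, resolved = 13, 16
base = granted / resolved                 # 0.8125 -> shown as 81%

INTERVIEW_LIFT = 0.30                     # +30.0 points, per the dashboard
with_interview = min(base + INTERVIEW_LIFT, 0.99)  # capped -> 99%

print(f"Grant probability: {base:.0%}")             # 81%
print(f"With interview: {with_interview:.0%}")      # 99%
```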
