DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Prior art cited in this Office action:
Van Hoof et al. (US 20180349708 A1, hereinafter “Van Hoof”)
Tahan (WO 2012021246 A2, hereinafter “Tahan”)
Thapliyal et al. (US 20170060389 A1, hereinafter “Thapliyal”)
Metzler et al. (US 20220020131 A1, hereinafter “Metzler”)
Lindberg (US 20140043432 A1, hereinafter “Lindberg”)
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 12/08/2025 has been entered.
Response to Arguments
Applicant's arguments filed 12/08/2025 have been fully considered but are moot in view of the new grounds of rejection set forth below.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and content of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-3 and 5-20 are rejected under 35 U.S.C. 103 as being unpatentable over Van Hoof et al. (US 20180349708 A1, hereinafter “Van Hoof”) in view of Thapliyal et al. (US 20170060389 A1, hereinafter “Thapliyal”), in view of Lindberg (US 20140043432 A1, hereinafter “Lindberg”), in view of Metzler et al. (US 20220020131 A1, hereinafter “Metzler”), and in view of Tahan (WO 2012021246 A2, hereinafter “Tahan”).
Regarding claim 1:
Van Hoof teaches a system comprising:
at least one processor (Van Hoof [0005], where Van Hoof teaches the system includes one or more processors); and
memory storing instructions that, when executed by the at least one processor, cause the system to perform a set of operations (Van Hoof [0005], where Van Hoof teaches the system includes memory storing one or more programs for execution by the processor), the set of operations comprising:
obtaining an input video stream (Van Hoof [0028], [0058], [0066], where the system is configured to receive video data streams from one or more cameras);
identifying, within the input video stream, a frame portion containing a subject of interest (Van Hoof [0030], [0091], [0097], figs. 10A-10F, where Van Hoof teaches identifying the presence of an occupant or an object within the input video in a particular frame portion or region of interest);
determining if the frame portion contains the subject of interest in the first position; tracking a movement of the subject of interest from the first position to a second position; determining the frame portion containing the subject of interest at the second position; and enlarging the frame portion containing the subject of interest (Van Hoof figs. 10A-10B and 10C-10D, where Van Hoof shows that the portion containing the car is enlarged such that the car can be seen properly; the same action is taken for the subject or person 1004, where in fig. 10D the person 1004 is enlarged);
identifying a first portion of the frame portion containing at least a portion of the subject of interest to be enlarged (Van Hoof [0100], figs. 10A-10F and 11, where Van Hoof teaches identifying a portion 1004 in fig. 10D that includes the whole body of the subject (driver) without the package, and in fig. 10E a region 1004 focusing on the subject’s face while ignoring, for example, the feet area);
identifying a second portion of the frame portion to not be enlarged (Van Hoof [0100], figs. 10A-10F and 11, where Van Hoof teaches identifying a portion of the portion 1004, such as the feet area, to not display, i.e., not needing enhancement);
enhancing the frame portion of the input video stream to increase fidelity within the frame portion (Van Hoof [0091]; while Van Hoof does not explicitly teach performing the enhancement only on the framed portion, he teaches that the framed portion (the cropped portion or the second video stream) may have a higher resolution than the unframed stream, such that details of the shown portion of the field of view are more apparent. He further teaches that the system can perform one or more operations on the raw image data to modify characteristics of the captured image data (e.g., enhancing image quality). Examples of such operations include, but are not limited to: automatic exposure functions for providing capture of illuminance/color ranges by the image sensor 816; noise reduction techniques for improving signal-to-noise ratio (SNR); color processing techniques (e.g., white balance, color correction, gamma correction, or color conversion); and/or other image enhancement operations. Therefore, the enhancement can be done on all the frames, on selected frames, or on a portion of a frame);
Van Hoof fails to explicitly teach wherein the set of operations further comprises: determining if the frame portion is smaller than a designated threshold, wherein, if the frame portion is smaller than the designated threshold, then the frame portion is enlarged.
However, determining whether an object is big enough for display such that it does not need enlarging is well-known in the art and would have been obvious to one of ordinary skill in the art in this case. For example, Thapliyal teaches, in response to the zoom-in command and for each object of the group of objects, comparing the defined object size of that object to a zoomed-in object size threshold that is smaller than the initial object size threshold, adding that object to a second set of objects when the defined object size of that object is greater than the zoomed-in object size threshold, and omitting that object from the second set of objects when the defined object size of that object is less than the zoomed-in object size threshold. Furthermore, the arrangements include rendering a second diagram view of the diagram model on the electronic display to the user, the second diagram view including the second set of objects (Thapliyal [0017]). In contrast to the above-described conventional document viewing program, which enlarges or shrinks a whole view in a flat/static manner, improved techniques are directed to providing, to a user, a set of diagram views of a diagram model stored in memory. In particular, the user is able to work at different levels using a zoom feature which selects which objects (e.g., shapes, graphics, etc.) and associated relations (e.g., lines, arrows, etc.) to display to the user. For example, suppose that a set of initial objects is rendered to the user (Thapliyal [0004]-[0005]).
Therefore, taking the teachings of Van Hoof and Thapliyal as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the application to check whether the image is big enough for viewing, in order to avoid over-enlarging the image, which could prevent proper display of the image, and to display the enlarged portion while the unenlarged portion is also visible on the display.
Van Hoof in view of Thapliyal fails to explicitly teach: if the input video stream is a video stream of a video call, determining if the frame portion containing the subject is smaller than a threshold; and, in response to determining the frame portion is not smaller than the designated threshold when the subject of interest is at the first position, continuing to display the frame portion at its current size.
However, Lindberg teaches, in an embodiment, that at t2 a video image scene is adjusted using the redundant pixels that fall outside the maximal output range of the peer display device, such as a TV. Once targets are identified, the output image (virtually) pans to align the targets to the composition layout rules, which vary according to the number of targets and the relative positioning of targets. The purpose of this method is to create compositionally balanced video call scenes by repositioning the subjects automatically, without physically adjusting the input device (camera). This method also applies to simple zooming functions ("make me bigger"/"make me smaller"). This method can be part of a set procedure, or a dynamic feature that continually optimizes according to the number of subjects in the scene (i.e., people moving in/out of the scene). As can be seen in FIG. 8, at t2, the first resolution view area 820 is targeted towards the detected active subject 830 compared to phase t0-t1, based on the determined active subject information (Lindberg [0128]-[0138], [0141]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the application to modify the system of Van Hoof in view of Thapliyal such that the system is applied not only to video calls but also determines whether to adjust the video stream containing the subject based on the determination that the frame portion is not smaller than a predetermined threshold, continuing to display the frame portion at its current size, in order to provide tracking of an active subject in a video call service that is easy to use, low cost, utilizes different resolutions at different ends of the call, and is still highly functional, and to enhance the experience of all users of the service with a convenient way to increase the perceived quality of the video call through superior video call compositions (Lindberg [0004]).
The combination above fails to teach enhancing the first portion of the frame portion of the input video stream to increase fidelity within the frame portion by a trained model, wherein the trained model has been trained based on one or more up-sampled images and corresponding one or more original images to reduce a fidelity.
However, Metzler, in the same line of endeavor, teaches that the enhanced image is upsampled by the neural network, wherein a resolution of the upsampled enhanced image is higher than a resolution of the sensor image. The enhanced image has a processed image geometric correctness, the processed image geometric correctness relating to distorted metrological information representing a loss of initial metrological information caused by an image processing with a neural network, the processed image geometric correctness being lower than the sensor image geometric correctness. The method further comprises the steps of: 1) providing a geometric correction image having an image geometric correctness higher than the processed image geometric correctness and showing at least a part of the scene of interest, and 2) at least partially reducing the loss of initial metrological information in the distorted metrological information by fusing the enhanced image with the geometric correction image (Metzler [0017], [0025], [0027], [0046]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the application to perform image enhancement on the portion of the image or the portion of the frame using a trained model, where the resulting enhanced image does not lose significant information relative to the original image, in order to present an image that allows the user to obtain as much detail as possible from the image.
The combination above fails to explicitly teach displaying the enhanced first portion and the unenhanced portion of the frame portion.
Although this particular limitation can be deduced from the figures of Van Hoof (Van Hoof figs. 10A-10F and 11), for clarity’s sake and to correspond to the disclosure of the applicant we turn to Tahan. Tahan teaches that, according to an embodiment, the client device 110 can be configured to allow the user to move, using a pointing device, the virtual lens 160 over the base-resolution image in display region 150 in order to change the portion of the high-resolution image displayed in the virtual lens 160. As described above, while the virtual lens 160 appears to operate as a magnifying glass on the base-resolution image in display region 150, no actual magnification of the base-resolution image is performed. Instead, the virtual lens 160 displays the portion of the high-resolution image received from the server 120 (Tahan [0047], fig. 1).
Therefore, taking the teachings of Van Hoof, Thapliyal, Lindberg, Metzler, and Tahan as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date to enhance a portion of a frame, leave another portion of the frame unenhanced, and further display both on a screen, such that a user viewing the screen can better see the desired portion being emphasized while the undesired portion remains visible but not emphasized.
Regarding claim 11:
Van Hoof teaches a method for video stream refinement of a dynamic scene (Van Hoof [0003]-[0004]), the method comprising:
receiving an input video stream (Van Hoof [0003]-[0004], [0057]-[0058], figs. 1, 2 and 11);
identifying, within the input video stream, a subject of interest (Van Hoof [0030], [0091], [0097], figs. 10A-10F, where Van Hoof teaches identifying the presence of an occupant or an object within the input video in a particular frame portion or region of interest);
generating a subject frame around the subject of interest (Van Hoof [0166], figs. 10C-10H);
identifying, within the input video stream, a feature of interest that corresponds to the subject of interest (Van Hoof [0163], where Van Hoof discloses zoomed-in on a face of a person, for example);
generating a feature frame around the feature of interest (Van Hoof [0070], [0097], [0163], fig. 10);
determining if the frame portion contains the subject of interest in the first position; tracking a movement of the subject of interest from the first position to a second position; determining the frame portion containing the subject of interest at the second position; and enlarging the feature frame portion containing the subject of interest (Van Hoof [0163], figs. 10A-10B and 10C-10D, where Van Hoof shows that the portion containing the car is enlarged such that the car can be seen properly; the same action is taken for the subject or person 1004, where in fig. 10D the person 1004 is enlarged);
enhancing the input video stream, within the feature frame, to increase fidelity within the feature frame (Van Hoof [0091]; while Van Hoof does not explicitly teach performing the enhancement only on the framed portion, he teaches that the framed portion (the cropped portion or the second video stream) may have a higher resolution than the unframed stream, such that details of the shown portion of the field of view are more apparent. He further teaches that the system can perform one or more operations on the raw image data to modify characteristics of the captured image data (e.g., enhancing image quality). Examples of such operations include, but are not limited to: automatic exposure functions for providing capture of illuminance/color ranges by the image sensor 816; noise reduction techniques for improving signal-to-noise ratio (SNR); color processing techniques (e.g., white balance, color correction, gamma correction, or color conversion); and/or other image enhancement operations. Therefore, the enhancement can be done on all the frames, on selected frames, or on a portion of a frame);
identifying a first portion of the frame portion containing at least a portion of the subject of interest to be enlarged (Van Hoof [0100], figs. 10A-10F and 11, where Van Hoof teaches identifying a portion 1004 in fig. 10D that includes the whole body of the subject (driver) without the package, and in fig. 10E a region 1004 focusing on the subject’s face while ignoring, for example, the feet area);
identifying a second portion of the frame portion to not be enlarged (Van Hoof [0100], figs. 10A-10F and 11, where Van Hoof teaches identifying a portion of the portion 1004, such as the feet area, to not display, i.e., not needing enhancement); and
displaying the feature frame (Van Hoof [0100], figs. 10A-10F, and 11, where Van Hoof teaches displaying the enhanced portion while tracking the object).
Van Hoof fails to explicitly teach wherein the set of operations further comprises: determining if the frame portion is smaller than a designated threshold, wherein, if the frame portion is smaller than the designated threshold, then the frame portion is enlarged.
However, determining whether an object is big enough for display such that it does not need enlarging is well-known in the art and would have been obvious to one of ordinary skill in the art in this case. For example, Thapliyal teaches, in response to the zoom-in command and for each object of the group of objects, comparing the defined object size of that object to a zoomed-in object size threshold that is smaller than the initial object size threshold, adding that object to a second set of objects when the defined object size of that object is greater than the zoomed-in object size threshold, and omitting that object from the second set of objects when the defined object size of that object is less than the zoomed-in object size threshold. Furthermore, the arrangements include rendering a second diagram view of the diagram model on the electronic display to the user, the second diagram view including the second set of objects (Thapliyal [0017]).
Therefore, taking the teachings of Van Hoof and Thapliyal as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the application to check whether the image is big enough for viewing, in order to avoid over-enlarging the image, which could prevent proper display of the image.
Van Hoof in view of Thapliyal fails to explicitly teach: if the input video stream is a video stream of a video call, determining if the frame portion containing the subject is smaller than a threshold; and, in response to determining the frame portion is not smaller than the designated threshold when the subject of interest is at the first position, continuing to display the frame portion at its current size.
However, Lindberg teaches, in an embodiment, that at t2 a video image scene is adjusted using the redundant pixels that fall outside the maximal output range of the peer display device, such as a TV. Once targets are identified, the output image (virtually) pans to align the targets to the composition layout rules, which vary according to the number of targets and the relative positioning of targets. The purpose of this method is to create compositionally balanced video call scenes by repositioning the subjects automatically, without physically adjusting the input device (camera). This method also applies to simple zooming functions ("make me bigger"/"make me smaller"). This method can be part of a set procedure, or a dynamic feature that continually optimizes according to the number of subjects in the scene (i.e., people moving in/out of the scene). As can be seen in FIG. 8, at t2, the first resolution view area 820 is targeted towards the detected active subject 830 compared to phase t0-t1, based on the determined active subject information (Lindberg [0128]-[0138], [0141]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the application to modify the system of Van Hoof in view of Thapliyal such that the system is applied not only to video calls but also determines whether to adjust the video stream containing the subject based on the determination that the frame portion is not smaller than a predetermined threshold, continuing to display the frame portion at its current size, in order to provide tracking of an active subject in a video call service that is easy to use, low cost, utilizes different resolutions at different ends of the call, and is still highly functional, and to enhance the experience of all users of the service with a convenient way to increase the perceived quality of the video call through superior video call compositions (Lindberg [0004]).
The combination above fails to teach enhancing the frame portion of the input video stream to increase fidelity within the frame portion by a trained model, wherein the trained model has been trained based on one or more up-sampled images and corresponding one or more original images to reduce a fidelity.
However, Metzler, in the same line of endeavor, teaches that the enhanced image is upsampled by the neural network, wherein a resolution of the upsampled enhanced image is higher than a resolution of the sensor image. The enhanced image has a processed image geometric correctness, the processed image geometric correctness relating to distorted metrological information representing a loss of initial metrological information caused by an image processing with a neural network, the processed image geometric correctness being lower than the sensor image geometric correctness. The method further comprises the steps of: 1) providing a geometric correction image having an image geometric correctness higher than the processed image geometric correctness and showing at least a part of the scene of interest, and 2) at least partially reducing the loss of initial metrological information in the distorted metrological information by fusing the enhanced image with the geometric correction image (Metzler [0017], [0025], [0027], [0046]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the application to perform image enhancement on the portion of the image or the portion of the frame using a trained model, where the resulting enhanced image does not lose significant information relative to the original image, in order to present an image that allows the user to obtain as much detail as possible from the image.
Regarding claim 17:
Van Hoof teaches a system comprising:
at least one processor (Van Hoof [0005], where Van Hoof teaches the system includes one or more processors); and
memory storing instructions that, when executed by the at least one processor, cause the system to perform a set of operations (Van Hoof [0005], where Van Hoof teaches the system includes memory storing one or more programs for execution by the processor), the set of operations comprising:
receiving an input video stream (Van Hoof [0028], [0058], [0066], where the system is configured to receive video data streams from one or more cameras);
identifying, within the input video stream, a frame portion containing a subject of interest (Van Hoof [0030], [0091], [0097], figs. 10A-10F, where Van Hoof teaches identifying the presence of an occupant or an object within the input video in a particular frame portion or region of interest);
determining if the frame portion contains the subject of interest in the first position; tracking a movement of the subject of interest from the first position to a second position; determining the frame portion containing the subject of interest at the second position; and enlarging the frame portion containing the subject of interest (Van Hoof figs. 10A-10B and 10C-10D, where Van Hoof shows that the portion containing the car is enlarged such that the car can be seen properly; the same action is taken for the subject or person 1004, where in fig. 10D the person 1004 is enlarged);
identifying a first portion of the frame portion containing at least a portion of the subject of interest to be enlarged (Van Hoof [0100], figs. 10A-10F and 11, where Van Hoof teaches identifying a portion 1004 in fig. 10D that includes the whole body of the subject (driver) without the package, and in fig. 10E a region 1004 focusing on the subject’s face while ignoring, for example, the feet area);
identifying a second portion of the frame portion to not be enlarged (Van Hoof [0100], figs. 10A-10F and 11, where Van Hoof teaches identifying a portion of the portion 1004, such as the feet area, to not display, i.e., not needing enhancement);
enhancing the first portion of the frame portion of the input video stream (Van Hoof [0091]; while Van Hoof does not explicitly teach performing the enhancement only on the framed portion, he teaches that the framed portion (the cropped portion or the second video stream) may have a higher resolution than the unframed stream, such that details of the shown portion of the field of view are more apparent. He further teaches that the system can perform one or more operations on the raw image data to modify characteristics of the captured image data (e.g., enhancing image quality). Examples of such operations include, but are not limited to: automatic exposure functions for providing capture of illuminance/color ranges by the image sensor 816; noise reduction techniques for improving signal-to-noise ratio (SNR); color processing techniques (e.g., white balance, color correction, gamma correction, or color conversion); and/or other image enhancement operations. Therefore, the enhancement can be done on all the frames, on selected frames, or on a portion of a frame); and
displaying the enhanced frame portion moving across a display screen, the enhanced frame portion moving based on a movement of the subject of interest (Van Hoof [0070], [0091], [0098], [0100], [0104]-[0107], [0129], [0163], figs. 10A-10F and 11, where Van Hoof teaches displaying the enhanced portion while tracking the object).
Van Hoof fails to teach wherein the set of operations further comprises: determining if the frame portion is smaller than a designated threshold, wherein, if the frame portion is smaller than the designated threshold, then the frame portion is enlarged.
However, determining whether an object is big enough for display such that it does not need enlarging is well-known in the art and would have been obvious to one of ordinary skill in the art in this case. For example, Thapliyal teaches, in response to the zoom-in command and for each object of the group of objects, comparing the defined object size of that object to a zoomed-in object size threshold that is smaller than the initial object size threshold, adding that object to a second set of objects when the defined object size of that object is greater than the zoomed-in object size threshold, and omitting that object from the second set of objects when the defined object size of that object is less than the zoomed-in object size threshold. Furthermore, the arrangements include rendering a second diagram view of the diagram model on the electronic display to the user, the second diagram view including the second set of objects (Thapliyal [0017]).
Therefore, taking the teachings of Van Hoof and Thapliyal as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the application to check whether the image is big enough for viewing, in order to avoid over-enlarging the image, which could prevent proper display of the image.
Van Hoof in view of Thapliyal fails to explicitly teach: if the input video stream is a video stream of a video call, determining if the frame portion containing the subject is smaller than a threshold; and, in response to determining the frame portion is not smaller than the designated threshold when the subject of interest is at the first position, continuing to display the frame portion at its current size.
However, Lindberg teaches, in an embodiment, that at t2 a video image scene is adjusted using the redundant pixels that fall outside the maximal output range of the peer display device, such as a TV. Once targets are identified, the output image (virtually) pans to align the targets to the composition layout rules, which vary according to the number of targets and the relative positioning of targets. The purpose of this method is to create compositionally balanced video call scenes by repositioning the subjects automatically, without physically adjusting the input device (camera). This method also applies to simple zooming functions ("make me bigger"/"make me smaller"). This method can be part of a set procedure, or a dynamic feature that continually optimizes according to the number of subjects in the scene (i.e., people moving in/out of the scene). As can be seen in FIG. 8, at t2, the first resolution view area 820 is targeted towards the detected active subject 830 compared to phase t0-t1, based on the determined active subject information (Lindberg [0128]-[0138], [0141]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the application to modify the system of Van Hoof in view of Thapliyal such that the system is applied not only to video calls but also determines whether to adjust the video stream containing the subject based on the determination that the frame portion is not smaller than a predetermined threshold, continuing to display the frame portion at its current size, in order to provide tracking of an active subject in a video call service that is easy to use, low cost, utilizes different resolutions at different ends of the call, and is still highly functional, and to enhance the experience of all users of the service with a convenient way to increase the perceived quality of the video call through superior video call compositions (Lindberg [0004]).
The combination above fails to teach enhancing the frame portion of the input video stream to increase fidelity within the frame portion by a trained model, wherein the trained model has been trained based on one or more up-sampled images and corresponding one or more original images to reduce a fidelity.
However, Metzler, in the same line of endeavor, teaches that the enhanced image is upsampled by the neural network, wherein a resolution of the upsampled enhanced image is higher than a resolution of the sensor image. The enhanced image has a processed image geometric correctness, the processed image geometric correctness relating to distorted metrological information representing a loss of initial metrological information caused by an image processing with a neural network, the processed image geometric correctness being lower than the sensor image geometric correctness. The method further comprises the steps of: 1) providing a geometric correction image having an image geometric correctness higher than the processed image geometric correctness and showing at least a part of the scene of interest, and 2) at least partially reducing the loss of initial metrological information in the distorted metrological information by fusing the enhanced image with the geometric correction image (Metzler [0017], [0025], [0027], [0046]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the application to perform image enhancement on the portion of the image or the portion of the frame using a trained model, where the resulting enhanced image does not lose significant information relative to the original image, in order to present an image that allows the user to obtain as much detail as possible from the image.
Although that particular limitation can be deduced from the figures of Van Hoof (Van Hoof figs. 10A-10F and 11), for clarity's sake and to correspond to the disclosure of the applicant we turn to Tahan. Tahan teaches that, according to an embodiment, the client device 110 can be configured to allow the user to move, using a pointing device, the virtual lens 160 over the base-resolution image in display region 150 in order to change the portion of the high-resolution image displayed in the virtual lens 160. As described above, while the virtual lens 160 appears to operate as a magnifying glass on the base-resolution image in display region 150, no actual magnification of the base-resolution image is performed. Instead, the virtual lens 160 displays the portion of the high-resolution image received from the server 120 (Tahan [0047], fig. 1).
Therefore, taking the teachings of Van Hoof, Thapliyal, Lindberg, Metzler and Tahan as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date to enhance a portion of a frame, leave another portion of the frame unenhanced, and further display both on a screen such that a user viewing the screen can better see the desired portion, which is emphasized, while the undesired portion remains visible but is not emphasized.
Regarding claim 2:
Van Hoof, in view of Thapliyal, in view of Lindberg, in view of Metzler and in view of Tahan teaches further comprising a display screen, wherein the enhanced frame portion is displayed on the display screen, and wherein the designated threshold is a unit of area of the display screen on which the enhanced frame portion can be displayed relative to an overall area of the display (Van Hoof [0138], [0144], [0149], [0165]-[0166], [0187], figs. 10A-10K; Thapliyal [0060]-[0063]).
Regarding claim 3:
Van Hoof, in view of Thapliyal, in view of Lindberg, in view of Metzler and in view of Tahan teaches wherein the frame portion is digitally enlarged (Van Hoof fig. 10D).
Regarding claim 5:
Van Hoof in view of Thapliyal, in view of Lindberg, fails to explicitly teach wherein, after identifying the frame portion, the set of operations further comprises generating a transition portion extending between the enhanced frame portion and an unenhanced portion, wherein displaying the enhanced frame portion further comprises displaying the transition portion and the unenhanced portion.
However, Tahan, in the same line of endeavor, teaches that, as an alternative type of border for the virtual lens, the portion of the image at the edges of the virtual lens can be reduced in size and resolution such that the transition between the high-resolution image displayed in the virtual lens and the base-resolution image is smoother and there is less loss or no loss of a portion of the image due to the size differences between the high-resolution and base-resolution images (Tahan [0115]).
Therefore, taking the teachings of Van Hoof, Thapliyal, Lindberg and Tahan as a whole, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the application to display the transition portion and the enhanced portion, in order to achieve better viewing and minimize loss.
Regarding claim 6:
Van Hoof, in view of Thapliyal, in view of Lindberg, in view of Metzler and in view of Tahan teaches wherein a loss of fidelity in the transition portion is higher than a loss of fidelity in the enhanced frame portion (Van Hoof [0091], [0163]; Tahan [0026], [0042]-[0043]).
Regarding claim 7:
Van Hoof, in view of Thapliyal, in view of Lindberg, in view of Metzler and in view of Tahan teaches wherein the set of operations further comprises:
tracking movements of the subject of interest; and storing, in memory, a record corresponding to the movements of the subject of interest, the movements occurring over a period of time (Van Hoof [0070], [0091], [0098], [0105]-[0107], [0163]).
Regarding claims 8 and 18:
Van Hoof, in view of Thapliyal, in view of Lindberg, in view of Metzler and in view of Tahan teaches wherein the subject of interest is a plurality of subjects of interest, and wherein from amongst the plurality of subjects of interest, a focal subject of interest is identified (Van Hoof [0070], [0091], [0098], [0105]-[0107], [0143]-[0144], [0163]).
Regarding claim 9:
Van Hoof, in view of Thapliyal, in view of Lindberg, in view of Metzler and in view of Tahan teaches wherein the frame portion surrounds the focal subject of interest (Van Hoof [0070], [0091], [0098], [0105]-[0107], [0143]-[0144], [0163]).
Regarding claim 10:
Van Hoof, in view of Thapliyal, in view of Lindberg, in view of Metzler and in view of Tahan teaches wherein the set of operations further comprise:
determining if the focal subject of interest is moving; and
if the focal subject of interest is moving, translating the enhanced frame portion across a display screen, based on a movement of the focal subject of interest (Van Hoof [0070], [0091], [0098], [0105]-[0107], [0143]-[0144], [0163]).
Regarding claim 12:
Van Hoof, in view of Thapliyal, in view of Lindberg, in view of Metzler and in view of Tahan teaches wherein after enlarging the feature frame, the feature frame is enhanced, and displaying the feature frame comprises displaying the enhanced feature frame (Van Hoof [0163], figs. 10E, 10J; Metzler [0025], [0027], claim 1).
Regarding claim 13:
Van Hoof, in view of Thapliyal, in view of Lindberg, in view of Metzler and in view of Tahan teaches further comprising: training a model to enhance the feature frame, wherein the training is based on a loss of fidelity between one or more original images and one or more enhanced images that correspond to the original images (Metzler [0025], [0027], claim 1).
Regarding claim 14:
Van Hoof, in view of Thapliyal, in view of Lindberg, in view of Metzler and in view of Tahan teaches wherein the model is a machine learning model (Metzler [0025], claim 1).
Regarding claim 15:
Van Hoof, in view of Thapliyal, in view of Lindberg, in view of Metzler and in view of Tahan teaches wherein the subject of interest is one or more persons, one or more animals, or one or more objects (Van Hoof [0038]; Lindberg [0127]; Metzler [0002]-[0004]).
Regarding claim 16:
Van Hoof, in view of Thapliyal, in view of Lindberg, in view of Metzler and in view of Tahan teaches wherein, when the subject of interest is a person, the feature of interest is a head of the person, or hands of the person (Van Hoof [0038], [0078]; Lindberg [0127]; Metzler [0002]-[0004]).
Regarding claim 19:
Van Hoof, in view of Thapliyal, in view of Lindberg, in view of Metzler and in view of Tahan teaches wherein the focal subject of interest is a person (Van Hoof [0163], figs. 10A and 10B, 10C and 10D).
Regarding claim 20:
Van Hoof, in view of Thapliyal, in view of Lindberg, in view of Metzler and in view of Tahan teaches wherein the input video stream is obtained from a video data source (Van Hoof [0028], [0058], [0066], where the system is configured to receive video data streams from one or more cameras).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WEDNEL CADEAU whose telephone number is (571)270-7843. The examiner can normally be reached Mon-Fri 9:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chieh Fan can be reached at 571-272-3042. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/WEDNEL CADEAU/Primary Examiner, Art Unit 2632 January 21, 2026