Prosecution Insights
Last updated: April 19, 2026
Application No. 18/537,291

CASCADED VIDEO ANALYTICS FOR EDGE COMPUTING

Non-Final Office Action (§102, §103)
Filed: Dec 12, 2023
Examiner: HO, ANDREW N
Art Unit: 2169
Tech Center: 2100 — Computer Architecture & Software
Assignee: Microsoft Technology Licensing, LLC
OA Round: 3 (Non-Final)

Grant Probability: 62% (Moderate)
Projected OA Rounds: 3-4
Projected Time to Grant: 4y 1m
Grant Probability With Interview: 92%

Examiner Intelligence

Career Allow Rate: 62% (137 granted / 221 resolved cases; +7.0% vs Tech Center average)
Interview Lift: strong, +30.3% for resolved cases with an interview vs without
Typical Timeline: 4y 1m average prosecution; 18 applications currently pending
Career History: 239 total applications across all art units

Statute-Specific Performance

§101: 21.2% (-18.8% vs TC avg)
§103: 58.0% (+18.0% vs TC avg)
§102: 10.7% (-29.3% vs TC avg)
§112: 6.1% (-33.9% vs TC avg)
Tech Center averages are estimates • Based on career data from 221 resolved cases

Office Action

Rejections under §102 and §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 21-40 are pending in this Office action.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on December 30, 2025 has been entered.

Response to Amendment

This Office action is in response to applicant's communication filed on December 30, 2025. The applicant's remarks and amendments to the claims were considered, with the results that follow. In response to the last Office action, claims 21, 28, and 34 have been amended. As a result, claims 21-40 are pending in this application.

Response to Arguments

Applicant's arguments (see pgs. 6-9 of the remarks filed on December 30, 2025) with respect to the rejection of independent claims 21, 28, and 34 as amended under 35 U.S.C. 102 and 35 U.S.C. 103 assert that the non-patent literature "Delivering Deep Learning to Mobile Devices via Offloading" by Xukan Ran et al. (hereinafter "Ran") does not explicitly teach or suggest "subsequent local processing of the input data ..., thereby continuing processing of the input data according to a different resource allocation." Examiner respectfully disagrees. Ran teaches "subsequent local processing of the input data ..., thereby continuing processing of the input data according to a different resource allocation." Ran indicates on pg. 43: [image 1: excerpt from Ran, pg. 43].

The excerpted statement specifies that a smartwatch with camera capabilities and processing requirements, unable to process the video feed itself due to performance limitations, would determine to offload the video to a nearby device such as the user's smartphone. Additionally, Ran teaches continuing processing of the input data according to a different resource allocation on pg. 45: [image 2: excerpt from Ran, pg. 45]. As shown in that excerpt, the offloading decision is highly dependent on network conditions; it is therefore understood that the offloading decision must change as network resources change, thereby continuing the processing of the input data under the changed resource allocation. Accordingly, on pg. 43, Ran gives the following example: [image 3: excerpt from Ran, pg. 43]. The Minecraft game is first uploaded to the back-end server; when performance is not good, the decision switches processing to a smaller neural network model without invoking the cloud server. As such, Ran teaches the amended limitations as discussed above.

Applicant's argument (see pg. 9 of the remarks filed on December 30, 2025) with respect to the rejection of dependent claim 26 asserts that Ran does not explicitly teach or suggest "the determined configuration directs one or more cloud devices to perform processing of the extracted video frames using a heavy DNN model when results from the lightweight DNN model do not meet the defined threshold confidence value". Examiner respectfully disagrees.
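The claim 26 limitation describes a cascade: a lightweight DNN runs locally, and frames whose results do not meet a defined threshold confidence value are escalated to a heavy DNN on a cloud device. A minimal sketch of that control flow, assuming hypothetical model functions that each return a label and a confidence score (nothing here is taken from Ran or the application itself):

```python
# Illustrative sketch of the cascaded routing recited in claim 26.
# The model functions and the threshold value are assumptions for
# illustration, not drawn from Ran or the claims.

def classify_frame(frame, run_light_model, run_heavy_model, threshold=0.8):
    """Classify a frame locally, escalating to the heavy cloud model
    when the lightweight model's confidence is below the threshold."""
    label, confidence = run_light_model(frame)
    if confidence >= threshold:
        return label, "local-light-dnn"
    # Result does not meet the threshold confidence value: direct the
    # frame to the heavy DNN model on a cloud device instead.
    label, _ = run_heavy_model(frame)
    return label, "cloud-heavy-dnn"
```

Under this sketch, only low-confidence frames incur the network and compute cost of the heavy model, which is the tradeoff the claim language captures.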
The above limitation recites reallocation between neural network models according to a threshold value: traffic is redirected to the cloud device running the big CNN when the lightweight DNN model (the little CNN) does not meet the threshold. Ran teaches this aspect at pg. 44-45: [images 4-5: excerpts from Ran, pg. 44-45]. {Examiner correlates the offloading decision for the video processing such that when the rate is above a certain threshold (4 Mbps), the CNN can be performed on the server (the heavyweight DNN model), while below the 4 Mbps threshold the CNN is run locally on the phone.} As such, Ran teaches the above limitation as discussed.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 28-30 and 33 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by non-patent literature "Delivering Deep Learning to Mobile Devices via Offloading" by Xukan Ran et al. (hereinafter "Ran").

Regarding claim 28, Ran teaches a method, comprising:

allocating initial local processing of input data, received from a first device, between a set of local devices comprising one or more edge devices and one or more cameras (Ran pg. 42-43: [images 6-7]; pg. 44-45: [images 8-9]. {Examiner correlates allocating initial local processing of input data received from a first device as a baseline diagnostic step: processing is first done locally on the phone (the first device), and the tradeoff between frame rates is analyzed before deciding to offload to a cloud server. The allocation between the camera and edge devices is based on the received features, adjusting the camera resolution according to the input: lower resolution requires less computation and is fed to the small CNN run locally on the phone, while higher resolution requires more computation and is sent to the cloud server, the user's home router, or the user's laptop based on the associated learnable parameters of the device. Accordingly, a smartwatch with a camera offloads to the mobile phone (edge device), which includes a camera module.});

detecting a change in resource availability for the determined configuration (Ran pg. 42-43: [images 10, 1]; pg. 45: [image 11]. {Examiner correlates detecting the change in resources with the device's offloading policy, under which the device may run the CNN on the phone when bandwidth is about 4 Mbps or below, and above that must offload to a server.}), wherein the change is at least one of: a device condition or a network condition (Ran pg. 45: [image 11]. {Examiner correlates the network condition with the network bandwidth fluctuating around 4 Mbps, where the change in bandwidth determines whether the phone must offload to a bigger server.}); and

when the detected change in resource availability is insufficient for the initial local processing of the input data (Ran pg. 43-44: [images 1, 12]. {Examiner correlates the detected change being insufficient with the rate yielding a low resolution, such that the phone must decide whether to analyze at a better resolution based on latency.}), reallocating local processing among the set of local devices (Ran pg. 43: [image 1]; pg. 45: [image 13]. {Examiner correlates reallocating among the local devices with pg. 43, where the smartwatch decides to offload to a nearby smartphone based on the offloading policy: the smartwatch offloads locally to the smartphone at network bandwidth of 4 Mbps and below, while bandwidth at or above the 4 Mbps threshold requires offloading to a bigger cloud server.}), thereby continuing processing of the input data according to a different resource allocation (Ran pg. 43: [image 1]; pg. 44-45: [images 14, 9, 15]. {Examiner interprets a wearable device (smartwatch) offloading data to a nearby smartphone as directly teaching continuing processing under a different resource allocation: an edge-assisted task transfers the computation from the resource-limited smartwatch to a capable nearby device.}).
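The claim 28 mapping turns on Ran's offloading policy: processing stays on the local device chain when bandwidth is low and moves to the cloud server when bandwidth is at or above a threshold (the cited passages use 4 Mbps). A rough sketch of that decision follows; only the 4 Mbps figure comes from the cited passages, and every name and device string is assumed for illustration:

```python
# Rough illustration of the bandwidth-threshold offloading policy the
# rejection attributes to Ran. Only the 4 Mbps threshold is from the
# cited passages; the rest is an assumption for illustration.

BANDWIDTH_THRESHOLD_MBPS = 4.0

def choose_processor(bandwidth_mbps, local_devices):
    """Pick where to run the CNN for the current network condition.

    local_devices lists the local chain in order of capability,
    e.g. ["smartwatch", "smartphone"].
    """
    if bandwidth_mbps >= BANDWIDTH_THRESHOLD_MBPS:
        return "cloud-server"
    # Below threshold: reallocate among the set of local devices by
    # offloading to the most capable nearby device.
    return local_devices[-1]

def reallocate_on_change(bandwidth_samples, local_devices):
    """Re-evaluate the allocation each time network conditions change."""
    return [choose_processor(bw, local_devices) for bw in bandwidth_samples]
```

Re-running the decision on each bandwidth sample is the "continuing processing under a different resource allocation" behavior the rejection reads onto the claim.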
Regarding claim 29, Ran further teaches further allocating processing to one or more smart devices, the one or more smart devices performing processing that is computationally cheaper than processing performed by the one or more edge devices (Ran pg. 44-45: [images 16-17]. {Examiner correlates allocating between the camera and edge devices based on the received features, adjusting the camera resolution according to the input: lower resolution requires less computation while higher resolution requires more, based on the associated learnable parameters of the device.}).

Regarding claim 30, Ran further teaches dynamically shifting the processing load of the input data back to the one or more edge devices upon detecting that network capability between the set of local devices has been restored (Ran pg. 43: [image 18]; pg. 44-45: [image 15]. {Examiner correlates the offloading decision with shifting the processing load back to the mobile device: when the frame rate is below a threshold, processing is run on the mobile phone instead. The set of local devices follows from offloading from a nearby device, e.g., a head-mounted display with a camera offloads to a smartphone based on the frame rate requirement.}).

Regarding claim 33, Ran teaches the claimed invention substantially as claimed, and further teaches that the allocated processing of input data directs the set of local devices to perform processing of the extracted video frames using a lightweight DNN model locally on the set of local devices (Ran pg. 43: [image 18]; pg. 45: [image 5]. {Examiner correlates the offloading decision for the video processing such that when the rate is below a certain threshold, the CNN is performed on the mobile phone (the lightweight DNN model).}).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 21, 27, 34, and 40 are rejected under 35 U.S.C. 103 as being unpatentable over non-patent literature "Delivering Deep Learning to Mobile Devices via Offloading" by Xukan Ran et al. (hereinafter "Ran") in view of non-patent literature "Distributed Deep Neural Networks over the Cloud, the Edge and End Devices" by Teerapittayanon et al. (hereinafter "Teerapittayanon").

Regarding claim 21, Ran teaches a system comprising ([image 19: Fig. 1 of Ran, disclosing a system]):

a processor (Ran Introduction pg. 43, Fig. 1: [image 20]. {Examiner correlates the mobile phone as having a processor to analyze input video and display an output.}); and

a storage memory storing computer-readable instructions, which when executed by the processor, cause the processor to (Ran Introduction pg. 43: "a typical Android phone; that is, the video streams cannot be analyzed in real time [2]. Even with speedup from the mobile GPU [13], typical processing times are approximately 600 ms, which is equivalent to less than 1.7 frames per second, and is still not acceptable for real time processing." Pg. 43: [image 21]. {Examiner correlates the Android phone as having storage memory to process the video stream and analyze it based on its parameters.}):

receive a video query regarding a live video stream (Ran Introduction pg. 43: [image 22]. {Examiner correlates the input video as the video query.});

determine a configuration from the determined set of configurations for performing local processing of the video query based on current resources available to the system (Ran Introduction pg. 43: [image 23]. {Examiner correlates selecting the local processing of the video query based on resources with the tradeoff received for the object: based on its parameters, the system decides an image resolution, model size, and offloading decision to provide the best result by utilizing the offloading decision engine of Fig. 1. Real-time detection based on the tradeoff is thus performed according to the system parameters, e.g., by the device's camera: [image 22].});

allocate initial local processing for the video query between a set of local devices according to the determined configuration, the set of local devices comprising at least one camera and at least one edge device (Ran pg. 42-43: [images 6-7]; pg. 44-45: [images 8-9]. {Examiner correlates allocating initial local processing as a baseline diagnostic step: processing is first done locally on the phone, and the tradeoff between frame rates is analyzed before deciding to offload to a cloud server. Lower-resolution input requires less computation and is fed to the small CNN on the phone, while higher-resolution input requires more computation and is sent to the cloud server, the user's home router, or the user's laptop based on the associated learnable parameters of the device. Accordingly, a smartwatch with a camera offloads to the mobile phone (edge device), which includes a camera module.});

detect a change in resource availability for the determined configuration (Ran pg. 42-43: [images 10, 18]; pg. 45: [image 11]), wherein the change is at least one of: a device condition or a network condition (Ran pg. 45: [images 11, 13]); and

when the detected change in resource availability is insufficient for the initial local processing of the video query (Ran pg. 43-44: [images 1, 12]. {Examiner correlates the detected change being insufficient with the rate yielding a low resolution, such that the phone must decide whether to analyze at a better resolution based on latency.}), reallocate subsequent local processing for the video query among the set of local devices (Ran pg. 43: [image 1]; pg. 45: [image 13]. {Examiner correlates reallocating among the local devices with pg. 43, where the smartwatch decides to offload to a nearby smartphone based on the offloading policy: the smartwatch offloads locally to the smartphone at network bandwidth of 4 Mbps and below, while bandwidth at or above the 4 Mbps threshold requires offloading to a bigger cloud server.}), thereby continuing processing for the video query according to a different resource allocation (Ran pg. 43: [image 1]; pg. 44-45: [images 14, 9, 15]. {Examiner interprets a wearable device (smartwatch) offloading data to a nearby smartphone as directly teaching continuing processing under a different resource allocation: an edge-assisted task transfers the computation from the resource-limited smartwatch to a capable nearby device.}).

Ran does not explicitly teach: determine a resource requirement estimate for the video query; determine a set of configurations from a plurality of processing configurations, based upon an ability of each configuration of the set of configurations to process the video query. However, Teerapittayanon teaches determining a resource requirement estimate for the video query (Teerapittayanon: D.
DDNN Inference: "multiple preconfigured exit thresholds T (one element T at each exit point) as a measure of confidence in the prediction of the sample. One way to define T is by searching over the ranges of T on a validation set and picking the one with the best accuracy." H. Reducing Communication Costs: "offloading raw sensor input to the cloud. Sending a 32x32 RGB pixel image (the input size of our dataset) to the cloud costs 3072 bytes per image sample. By comparison, as shown in Table II, the largest DDNN model used in our evaluation section requires only 140 bytes of communication per sample on average (an over 20x reduction in communication costs)." {Examiner correlates the resource estimate with the measure of confidence in the sample: based on that estimate, the system determines the best accuracy and selects the best quality accordingly, sending the best quality image.});

determine a set of configurations from a plurality of processing configurations, based upon an ability of each configuration of the set of configurations to process the video query (Teerapittayanon: D. DDNN Inference: "multiple preconfigured exit thresholds T (one element T at each exit point) as a measure of confidence in the prediction of the sample. One way to define T is by searching over the ranges of T on a validation set and picking the one with the best accuracy. We use a normalized entropy threshold as the confidence criterion (instead of unnormalized entropy as used in [3]) that determines whether to classify (exit) a sample at a particular exit point. This normalized entropy η has values between 0 and 1, which allows easier interpretation and searching of its corresponding threshold T. For example, η close to 0 means that the DDNN is confident about the prediction of the sample; η close to 1 means it is not confident." {Examiner correlates selecting the best configuration with the measured confidence of the sample data: searching over the set and picking the configuration whose confidence best satisfies the criterion.});

It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Ran (which teaches receiving a video query regarding a live video stream and determining the best allocation to select the recommended configuration setting) with the teachings of Teerapittayanon (which teaches selecting one or more configurations from a plurality of stored processing configurations and, upon detecting a change in resource availability, adjusting the selected configuration). One of ordinary skill in the art would have been motivated to make such a combination to dramatically improve the accuracy of the system at both the local and cloud level, automatically tuned to process geographically unique inputs and working together toward the same overall objective, leading to high overall accuracy (see Teerapittayanon: DDNN Provision for Horizontal and Vertical Scaling). In addition, Ran and Teerapittayanon are analogous art directed to the same field of endeavor, as both are directed to receiving inputs from devices and adjusting the processing load to improve overall system performance.
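The quoted Teerapittayanon passages rest on a normalized-entropy confidence criterion: the entropy of the softmax output, divided by the log of the number of classes so it lies in [0, 1], is compared against a preconfigured exit threshold T. A short sketch of that computation (the threshold values used here are illustrative, not taken from the reference):

```python
import math

# Sketch of the normalized-entropy exit criterion quoted from
# Teerapittayanon. Threshold values are illustrative only.

def normalized_entropy(probs):
    """Entropy of a probability vector, normalized by log(num_classes)
    so the result lies in [0, 1]: 0 = fully confident, 1 = uniform."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))

def exits_at_this_point(probs, threshold):
    """A sample exits (is classified) at a given exit point when its
    normalized entropy falls below the preconfigured threshold T."""
    return normalized_entropy(probs) < threshold
```

A confident prediction (mass concentrated on one class) yields a value near 0 and exits locally; a near-uniform prediction yields a value near 1 and continues to a later (edge or cloud) exit point.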
Regarding claim 27, the modification of Ran and Teerapittayanon teaches the claimed invention substantially as claimed, and Ran further teaches that detecting a change in resource availability comprises determining whether network connectivity to the at least one edge device is available (Ran 4.3 Impact of variable network conditions, pg. 45: [image 15]. {Examiner correlates determining resource availability with network connectivity, according to the offloading policy's determination of whether the CNN runs better on the phone or on the server.}).

Regarding claim 34, Ran teaches a method, comprising:

receiving a video query regarding a live video stream (Ran Introduction pg. 43: [image 22]. {Examiner correlates the input video as the video query.});

determining a configuration from the determined set of configurations for performing local processing of the video query based on current resources available to the system (Ran Introduction pg. 43: [image 23]. {Examiner correlates selecting the processing of the video query based on resources with the tradeoff received for the object: based on its parameters, the system decides an image resolution, model size, and offloading decision to provide the best result by utilizing the offloading decision engine of Fig. 1: [image 22].});

allocating initial local processing for the video query between a set of local devices according to the determined configuration, the set of local devices comprising at least one camera and at least one edge device (Ran pg. 42-43: [images 6-7]; pg. 44-45: [images 8-9]. {Examiner correlates allocating initial local processing as a baseline diagnostic step: processing is first done locally on the phone, and the tradeoff between frame rates is analyzed before deciding to offload to a cloud server. Lower-resolution input requires less computation and is fed to the small CNN on the phone, while higher-resolution input requires more computation and is sent to the cloud server, the user's home router, or the user's laptop based on the associated learnable parameters of the device. Accordingly, a smartwatch with a camera offloads to the mobile phone (edge device), which includes a camera module.});

detecting a change in resource availability for the determined configuration, wherein the change is at least one of a device condition or a network condition (Ran pg. 45: [images 11, 13]); and

when the detected change in resource availability is insufficient for the initial local processing of the video query (Ran pg. 43-44: [images 1, 12]. {Examiner correlates the detected change being insufficient with the rate yielding a low resolution, such that the phone must decide whether to analyze at a better resolution based on latency.}), reallocating subsequent local processing for the video query among the set of local devices (Ran pg. 43: [image 1]; pg. 45: [image 13]. {Examiner correlates reallocating among the local devices with pg. 43, where the smartwatch decides to offload to a nearby smartphone based on the offloading policy: the smartwatch offloads locally to the smartphone at network bandwidth of 4 Mbps and below, while bandwidth at or above the 4 Mbps threshold requires offloading to a bigger cloud server.}), thereby continuing processing for the video query according to a different resource allocation (Ran pg. 43: [image 1]; pg. 44-45: [images 14, 9, 15]. {Examiner interprets a wearable device (smartwatch) offloading data to a nearby smartphone as directly teaching continuing processing under a different resource allocation: an edge-assisted task transfers the computation from the resource-limited smartwatch to a capable nearby device.}).

Ran does not explicitly teach: determine a resource requirement estimate for the video query; determine a set of configurations from a plurality of processing configurations, based upon an ability of each configuration of the set of configurations to process the video query. However, Teerapittayanon teaches determining a resource requirement estimate for the video query (Teerapittayanon: D. DDNN Inference: "multiple preconfigured exit thresholds T (one element T at each exit point) as a measure of confidence in the prediction of the sample. One way to define T is by searching over the ranges of T on a validation set and picking the one with the best accuracy." H. Reducing Communication Costs: "offloading raw sensor input to the cloud. Sending a 32x32 RGB pixel image (the input size of our dataset) to the cloud costs 3072 bytes per image sample. By comparison, as shown in Table II, the largest DDNN model used in our evaluation section requires only 140 bytes of communication per sample on average (an over 20x reduction in communication costs)." {Examiner correlates the resource estimate with the measure of confidence in the sample: based on that estimate, the system determines the best accuracy and selects the best quality accordingly.});

determine a set of configurations from a plurality of processing configurations, based upon an ability of each configuration of the set of configurations to process the video query (Teerapittayanon: D. DDNN Inference: "multiple preconfigured exit thresholds T (one element T at each exit point) as a measure of confidence in the prediction of the sample. One way to define T is by searching over the ranges of T on a validation set and picking the one with the best accuracy. We use a normalized entropy threshold as the confidence criterion (instead of unnormalized entropy as used in [3]) that determines whether to classify (exit) a sample at a particular exit point. This normalized entropy η has values between 0 and 1, which allows easier interpretation and searching of its corresponding threshold T. For example, η close to 0 means that the DDNN is confident about the prediction of the sample; η close to 1 means it is not confident." {Examiner correlates selecting the best configuration with the measured confidence of the sample data: searching over the set and picking the configuration whose confidence best satisfies the criterion.});

It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Ran (which teaches receiving a video query regarding a live video stream and determining the best allocation to select the recommended configuration setting) with the teachings of Teerapittayanon (which teaches selecting one or more configurations from a plurality of stored processing configurations and, upon detecting a change in resource availability, adjusting the selected configuration). One of ordinary skill in the art would have been motivated to make such a combination to dramatically improve the accuracy of the system at both the local and cloud level, automatically tuned to process geographically unique inputs and working together toward the same overall objective, leading to high overall accuracy (see Teerapittayanon: DDNN Provision for Horizontal and Vertical Scaling). In addition, Ran and Teerapittayanon are analogous art directed to the same field of endeavor, as both are directed to receiving inputs from devices and adjusting the processing load to improve overall system performance.
Regarding claim 40, the modification of Ran and Teerapittayanon teaches the claimed invention substantially as claimed, and Ran further teaches detecting a change in resource availability comprises determining whether network connectivity to one or more cloud devices is available (Ran: 4.3 Impact of variable network conditions, pg. 45, [figure excerpt omitted] {Examiner correlates determining the resources available to the network connectivity, according to the offload policy determining whether the CNN runs better on the phone or on the server}). Claims 22-26 and 35-39 are rejected under 35 U.S.C. 103 as being unpatentable over Non-Patent Literature “Delivering Deep Learning to Mobile Devices via Offloading” issued to Xukan Ran et al. (hereinafter as “Ran”) in view of Non-Patent Literature “Distributed Deep Neural Networks over the Cloud, the Edge and End Devices” issued to Teerapittayanon et al. (hereinafter as “Teerapittayanon”) in further view of Non-Patent Literature “A Computing Platform for Video Crowdprocessing Using Deep Learning” issued to Lu et al. (hereinafter as “Lu”). Regarding claim 22, the modification of Ran and Teerapittayanon teaches the claimed invention substantially as claimed; however, the modification of Ran and Teerapittayanon does not explicitly teach the determined configuration directs the set of local devices to extract video frames from the live video stream using a decoding module. Lu teaches the determined configuration directs the set of local devices to extract video frames from the live video stream using a decoding module ([figure excerpt omitted] See Lu, pg. 1431, III. OVERVIEW: We consider a crowdprocessing approach to perform object detection/classification of videos. This task includes filtering of videos based on metadata, and then processing the videos to perform object detection.
Alternatively, the user can process some of the frames locally… Once these videos are processed using deep learning either locally on the mobile device…For frames that are processed on the mobile devices, the user will forward either the tags, the frames of interest. Alternatively, the user may perform frame extraction locally and then offload specific frames to the cloud. In this case, it needs to determine whether each frame is processed by (ii) frame offload in which the frame is sent to the cloud for detection or (iii) local detection where the frame is detected on the mobile device {Examiner correlates the frame extraction such that heavy processing is offloaded to the cloud to perform the video processing, while light processing can be handled immediately by performing the object detection locally and then delivering the results to the cloud}). It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Ran (which teaches receiving a video query regarding a live video stream and determining the best allocation to select the recommended configuration setting) with the teachings of Teerapittayanon (which teaches selecting one or more configurations from a plurality of stored processing configurations and detecting a change in resource availability by adjusting the selected configuration) and with the further teachings of Lu (which teaches the determined configuration directs the set of local devices to extract video frames from the live video stream using a decoding module). One of ordinary skill in the art would have been motivated to make such a combination to dramatically improve processing of the video frames by saving energy usage, offloading the video to a different location for processing without extra energy-usage concerns (See Lu: III.
OVERVIEW; At the expense of some energy usage, the user can offload the videos to the cloud, where high performance computing can quickly process the videos without energy usage concerns). In addition, the references (Ran, Teerapittayanon, and Lu) teach features that are directed to analogous art and the same field of endeavor, as Ran, Teerapittayanon, and Lu are all directed to receiving inputs from devices and adjusting the loading process to improve the overall performance of the system. Regarding claim 23, the modification of Ran, Teerapittayanon, and Lu teaches the claimed invention substantially as claimed, and Lu further teaches the determined configuration directs the set of local devices to perform background subtraction on the extracted video frames (Lu: pg. 1431, III. OVERVIEW: Alternatively, the user can process some of the frames locally… Once these videos are processed using deep learning either locally on the mobile device…For frames that are processed on the mobile devices, the user will forward either the tags, the frames of interest. pg. 1432, A. Frame Extraction, “Frame extraction is used to take individual video frames and transform them into images upon which object detection may be performed. To target objects with different dynamics within the video, the task issuer may request a different frame extraction rate. For example, for an object moving at a high speed like a car, the rate should be high enough to not miss the object”. B. Detection, “However, batch processing performs much better, where the intercept (α) is about 240ms and the slope (β) is 400ms. The difference grows with the increase of the number of frames. For detection, it is better to put more frames in a batch to reduce processing time. However, the system must wait longer to get the extracted frames from the video.
Therefore, it is difficult to determine the best batch size” {Examiner correlates the difference in the detection rate to determining the best batch size to be delivered}). Regarding claim 24, the modification of Ran, Teerapittayanon, and Lu teaches the claimed invention substantially as claimed, and Lu further teaches the background subtraction is performed on the extracted video frames to determine whether additional processing should be performed (Lu: pg. 1432, A. Frame Extraction, “Frame extraction is used to take individual video frames and transform them into images upon which object detection may be performed. To target objects with different dynamics within the video, the task issuer may request a different frame extraction rate. For example, for an object moving at a high speed like a car, the rate should be high enough to not miss the object”. B. Detection, “However, batch processing performs much better, where the intercept (α) is about 240ms and the slope (β) is 400ms. The difference grows with the increase of the number of frames. For detection, it is better to put more frames in a batch to reduce processing time. However, the system must wait longer to get the extracted frames from the video. Therefore, it is difficult to determine the best batch size” {Examiner correlates the additional processing to observing the growth in the difference in the detection rate, which allows more frames to be put in the batch and requires additional processing to determine the best batch size}). Regarding claim 25, the modification of Ran, Teerapittayanon, and Lu teaches the claimed invention substantially as claimed, and Ran further teaches the determined configuration directs the set of local devices to perform processing of the extracted video frames using a lightweight DNN model locally on the set of local devices (Ran, Introduction, pg. 43; [figure excerpt omitted] pg.
45, [figure excerpt omitted] {Examiner correlates the offloading to determine the best course of action to be performed on the video processing, such that when the frame rate is below a certain threshold, the CNN can be performed on the mobile phone (lightweight DNN model)}). Regarding claim 26, the modification of Ran, Teerapittayanon, and Lu teaches the claimed invention substantially as claimed, and Ran further teaches the determined configuration directs one or more cloud devices to perform processing of the extracted video frames using a heavy DNN model when results from the lightweight DNN model do not meet the defined threshold confidence value (Ran, pg. 44-45, [figure excerpts omitted] {Examiner correlates the offloading to determine the best course of action to be performed on the video processing, such that when the frame rate is above a certain threshold, the CNN can be performed on the server (heavyweight DNN model), while frame rates below the threshold of 4 Mbps are run locally on the phone}). Regarding claim 35, the modification of Ran and Teerapittayanon teaches the claimed invention substantially as claimed; however, the modification of Ran and Teerapittayanon does not explicitly teach the determined configuration directs the set of local devices to extract video frames from the live video stream using a decoding module. Lu teaches the determined configuration directs the set of local devices to extract video frames from the live video stream using a decoding module ([figure excerpt omitted] See Lu, pg. 1431, III. OVERVIEW: We consider a crowdprocessing approach to perform object detection/classification of videos. This task includes filtering of videos based on metadata, and then processing the videos to perform object detection.
Alternatively, the user can process some of the frames locally… Once these videos are processed using deep learning either locally on the mobile device…For frames that are processed on the mobile devices, the user will forward either the tags, the frames of interest. Alternatively, the user may perform frame extraction locally and then offload specific frames to the cloud. In this case, it needs to determine whether each frame is processed by (ii) frame offload in which the frame is sent to the cloud for detection or (iii) local detection where the frame is detected on the mobile device {Examiner correlates the frame extraction such that heavy processing is offloaded to the cloud to perform the video processing, while light processing can be handled immediately by performing the object detection locally and then delivering the results to the cloud}). It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Ran (which teaches receiving a video query regarding a live video stream and determining the best allocation to select the recommended configuration setting) with the teachings of Teerapittayanon (which teaches selecting one or more configurations from a plurality of stored processing configurations and detecting a change in resource availability by adjusting the selected configuration) and with the further teachings of Lu (which teaches the determined configuration directs the set of local devices to extract video frames from the live video stream using a decoding module). One of ordinary skill in the art would have been motivated to make such a combination to dramatically improve processing of the video frames by saving energy usage, offloading the video to a different location for processing without extra energy-usage concerns (See Lu: III.
OVERVIEW; At the expense of some energy usage, the user can offload the videos to the cloud, where high performance computing can quickly process the videos without energy usage concerns). In addition, the references (Ran, Teerapittayanon, and Lu) teach features that are directed to analogous art and the same field of endeavor, as Ran, Teerapittayanon, and Lu are all directed to receiving inputs from devices and adjusting the loading process to improve the overall performance of the system. Regarding claim 36, the modification of Ran, Teerapittayanon, and Lu teaches the claimed invention substantially as claimed, and Lu further teaches the determined configuration directs the set of local devices to perform background subtraction on the extracted video frames (Lu: pg. 1431, III. OVERVIEW: Alternatively, the user can process some of the frames locally… Once these videos are processed using deep learning either locally on the mobile device…For frames that are processed on the mobile devices, the user will forward either the tags, the frames of interest. pg. 1432, A. Frame Extraction, “Frame extraction is used to take individual video frames and transform them into images upon which object detection may be performed. To target objects with different dynamics within the video, the task issuer may request a different frame extraction rate. For example, for an object moving at a high speed like a car, the rate should be high enough to not miss the object”. B. Detection, “However, batch processing performs much better, where the intercept (α) is about 240ms and the slope (β) is 400ms. The difference grows with the increase of the number of frames. For detection, it is better to put more frames in a batch to reduce processing time. However, the system must wait longer to get the extracted frames from the video.
Therefore, it is difficult to determine the best batch size” {Examiner correlates the difference in the detection rate to determining the best batch size to be delivered}). Regarding claim 37, the modification of Ran, Teerapittayanon, and Lu teaches the claimed invention substantially as claimed, and Lu further teaches the background subtraction is performed on the extracted video frames to determine whether additional processing should be performed (Lu: pg. 1432, A. Frame Extraction, “Frame extraction is used to take individual video frames and transform them into images upon which object detection may be performed. To target objects with different dynamics within the video, the task issuer may request a different frame extraction rate. For example, for an object moving at a high speed like a car, the rate should be high enough to not miss the object”. B. Detection, “However, batch processing performs much better, where the intercept (α) is about 240ms and the slope (β) is 400ms. The difference grows with the increase of the number of frames. For detection, it is better to put more frames in a batch to reduce processing time. However, the system must wait longer to get the extracted frames from the video. Therefore, it is difficult to determine the best batch size” {Examiner correlates the additional processing to observing the growth in the difference in the detection rate, which allows more frames to be put in the batch and requires additional processing to determine the best batch size}). Regarding claim 38, the modification of Ran, Teerapittayanon, and Lu teaches the claimed invention substantially as claimed, and Ran further teaches the determined configuration directs the set of local devices to perform processing of the extracted video frames using a lightweight DNN model locally on the set of local devices (Ran, Introduction, pg. 43; [figure excerpt omitted] pg.
45, [figure excerpt omitted] {Examiner correlates the offloading to determine the best course of action to be performed on the video processing, such that when the frame rate is below a certain threshold, the CNN can be performed on the mobile phone (lightweight DNN model)}). Regarding claim 39, the modification of Ran, Teerapittayanon, and Lu teaches the claimed invention substantially as claimed, and Ran further teaches the determined configuration directs one or more cloud devices to perform processing of the extracted video frames using a heavy DNN model when results from the lightweight DNN model do not meet the defined threshold confidence value (Ran, pg. 45, [figure excerpt omitted] {Examiner correlates the offloading to determine the best course of action to be performed on the video processing, such that when the frame rate is above a certain threshold, the CNN can be performed on the server (heavyweight DNN model)}). Claims 31-32 are rejected under 35 U.S.C. 103 as being unpatentable over Non-Patent Literature “Delivering Deep Learning to Mobile Devices via Offloading” issued to Xukan Ran et al. (hereinafter as “Ran”) in view of Non-Patent Literature “A Computing Platform for Video Crowdprocessing Using Deep Learning” issued to Lu et al. (hereinafter as “Lu”). Regarding claim 31, Ran teaches the claimed invention substantially as claimed; however, Ran does not explicitly teach the allocated processing of input data directs the set of local devices to extract video frames from the live video stream using a decoding module. Lu further teaches the allocated processing of input data directs the set of local devices to extract video frames from the live video stream using a decoding module ([figure excerpt omitted] See Lu, pg. 1431, III. OVERVIEW: We consider a crowdprocessing approach to perform object detection/classification of videos.
This task includes filtering of videos based on metadata, and then processing the videos to perform object detection. Alternatively, the user can process some of the frames locally… Once these videos are processed using deep learning either locally on the mobile device…For frames that are processed on the mobile devices, the user will forward either the tags, the frames of interest. Alternatively, the user may perform frame extraction locally and then offload specific frames to the cloud. In this case, it needs to determine whether each frame is processed by (ii) frame offload in which the frame is sent to the cloud for detection or (iii) local detection where the frame is detected on the mobile device {Examiner correlates the frame extraction such that heavy processing is offloaded to the cloud to perform the video processing, while light processing can be handled immediately by performing the object detection locally and then delivering the results to the cloud}). It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Ran (which teaches receiving a video query regarding a live video stream and determining the best allocation to select the recommended configuration setting) with the teachings of Lu (which teaches the allocated processing of input data directs the one or more cameras or the one or more edge devices to extract video frames from the live video stream using a decoding module). One of ordinary skill in the art would have been motivated to make such a combination to dramatically improve processing of the video frames by saving energy usage, offloading the video to a different location for processing without extra energy-usage concerns (See Lu: III. OVERVIEW; At the expense of some energy usage, the user can offload the videos to the cloud, where high performance computing can quickly process the videos without energy usage concerns).
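As an illustrative sketch only (not asserted by either reference), the per-frame decision discussed above — run a lightweight on-device model first, offloading a frame to a heavier cloud model only when the local result is not confident and the link permits — might look like the following. All function names, the stub models, and the 0.8 confidence threshold are hypothetical assumptions; the 4 Mbps figure mirrors the threshold the Office Action cites from Ran:

```python
def classify_frame(frame, light_model, heavy_model, bandwidth_mbps,
                   confidence_threshold=0.8, min_offload_bandwidth=4.0):
    """Cascaded decision: try the lightweight on-device model first;
    offload to the heavy cloud model only when the local prediction is
    not confident enough AND the link can support offloading."""
    label, confidence = light_model(frame)
    if confidence >= confidence_threshold:
        return label, "local"            # confident -> exit locally
    if bandwidth_mbps >= min_offload_bandwidth:
        label, _ = heavy_model(frame)    # not confident -> offload frame
        return label, "cloud"
    return label, "local"                # no connectivity: keep local result

# Stub models standing in for the lightweight and heavy DNNs
light = lambda f: ("car", 0.6)
heavy = lambda f: ("truck", 0.95)
```

With the stubs above, a frame on a 10 Mbps link is offloaded (the lightweight confidence of 0.6 is below threshold), while the same frame on a 1 Mbps link falls back to the local result.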
In addition, the references (Ran and Lu) teach features that are directed to analogous art and the same field of endeavor, as Ran and Lu are both directed to receiving inputs from devices and adjusting the loading process to improve the overall performance of the system. Regarding claim 32, Ran teaches the claimed invention substantially as claimed; however, Ran does not explicitly teach the allocated processing of input data directs the set of local devices to perform background subtraction on the extracted video frames. Lu teaches the allocated processing of input data directs the set of local devices to perform background subtraction on the extracted video frames (Lu: pg. 1431, III. OVERVIEW: Alternatively, the user can process some of the frames locally… Once these videos are processed using deep learning either locally on the mobile device…For frames that are processed on the mobile devices, the user will forward either the tags, the frames of interest. pg. 1432, A. Frame Extraction, “Frame extraction is used to take individual video frames and transform them into images upon which object detection may be performed. To target objects with different dynamics within the video, the task issuer may request a different frame extraction rate. For example, for an object moving at a high speed like a car, the rate should be high enough to not miss the object”. B. Detection, “However, batch processing performs much better, where the intercept (α) is about 240ms and the slope (β) is 400ms. The difference grows with the increase of the number of frames. For detection, it is better to put more frames in a batch to reduce processing time. However, the system must wait longer to get the extracted frames from the video. Therefore, it is difficult to determine the best batch size” {Examiner correlates the difference in the detection rate to determining the best batch size to be delivered}).
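The batch-size trade-off Lu describes — batch latency grows linearly with intercept α and slope β, while larger batches amortize overhead but force the system to wait for frames to accumulate — can be sketched numerically. The α and β values below are the ones quoted from Lu; the linear wait model and the 10 fps extraction rate are hypothetical assumptions for illustration:

```python
def batch_latency_ms(n, alpha=240.0, beta=400.0):
    """Processing time for a batch of n frames: alpha + beta * n (ms),
    using the intercept/slope quoted from Lu."""
    return alpha + beta * n

def per_frame_cost_ms(n, frame_interval_ms=100.0, alpha=240.0, beta=400.0):
    """Amortized per-frame cost: time to accumulate n frames at an assumed
    10 fps extraction rate, plus amortized batch processing time."""
    wait = n * frame_interval_ms
    return (wait + batch_latency_ms(n, alpha, beta)) / n
```

Larger batches lower the amortized per-frame cost (the α intercept is spread over more frames) while the total batch latency keeps growing, which is exactly why the quoted passage says the best batch size is difficult to determine.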
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Ran (which teaches receiving a video query regarding a live video stream and determining the best allocation to select the recommended configuration setting) with the teachings of Lu (which teaches the allocated processing of input data directs the one or more cameras or the one or more edge devices to extract video frames from the live video stream using a decoding module). One of ordinary skill in the art would have been motivated to make such a combination to dramatically improve processing of the video frames by saving energy usage, offloading the video to a different location for processing without extra energy-usage concerns (See Lu: III. OVERVIEW; At the expense of some energy usage, the user can offload the videos to the cloud, where high performance computing can quickly process the videos without energy usage concerns). In addition, the references (Ran and Lu) teach features that are directed to analogous art and the same field of endeavor, as Ran and Lu are both directed to receiving inputs from devices and adjusting the loading process to improve the overall performance of the system. Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Non-Patent Literature “Energy-Traffic Tradeoff Cooperative Offloading for Mobile Cloud Computing” issued to Song et al. (hereinafter as “Song”) teaches an energy trade-off in a mobile cloud computing environment, achieving a desirable trade-off between energy consumption and internet data traffic from a mobile device to a cloud server. Non-Patent Literature “LAVEA: latency-aware video analytics on edge computing platform” issued to YI et al. (hereinafter as “YI”) teaches formulating a way to offload tasks by comparing workloads and choosing the best overall task placement for collaboration.
U.S. Patent Application Publication 2007/0271570 issued to Brown et al. (hereinafter as “Brown”) teaches an existing system that determines the load on the nodes and determines a way to balance the workload among the database system nodes. Contact Information Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANDREW N HO, whose telephone number is (571) 270-0590. The examiner can normally be reached Tuesday and Thursday, 10:00-6:00. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sherief Badawi, can be reached at (571) 272-9782. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 2/27/2026 /ANDREW N HO/ Examiner, Art Unit 2169 /SHERIEF BADAWI/ Supervisory Patent Examiner, Art Unit 2169

Prosecution Timeline

Dec 12, 2023
Application Filed
Jun 05, 2024
Response after Non-Final Action
May 13, 2025
Non-Final Rejection — §102, §103
Aug 07, 2025
Applicant Interview (Telephonic)
Aug 07, 2025
Examiner Interview Summary
Sep 19, 2025
Response Filed
Oct 28, 2025
Final Rejection — §102, §103
Dec 30, 2025
Request for Continued Examination
Jan 20, 2026
Response after Non-Final Action
Feb 17, 2026
Non-Final Rejection — §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12541533
DATA SYNCHRONIZATION ERROR RESOLUTION
2y 5m to grant Granted Feb 03, 2026
Patent 12524423
Systems and Methods for Using Multiple Aggregation Levels in a Single Data Visualization
2y 5m to grant Granted Jan 13, 2026
Patent 12511265
DEDUPLICATION FOR DATA TRANSFERS TO PORTABLE STORAGE DEVICES
2y 5m to grant Granted Dec 30, 2025
Patent 12475002
SYSTEM AND METHOD FOR EFFICIENT BLOCK LEVEL GRANULAR REPLICATION
2y 5m to grant Granted Nov 18, 2025
Patent 12475130
NATURAL LANGUAGE KEYWORD TAG EXTRACTION
2y 5m to grant Granted Nov 18, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
62%
Grant Probability
92%
With Interview (+30.3%)
4y 1m
Median Time to Grant
High
PTA Risk
Based on 221 resolved cases by this examiner. Grant probability derived from career allow rate.
