DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 21-40 are pending in this Office action.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on December 30th, 2025 has been entered.
Response to Amendment
This Office Action is in response to the applicant’s communication filed on December 30th, 2025. The applicant’s remarks and amendments to the claims have been considered, with the results that follow.
In response to the last Office Action, claims 21, 28, and 34 have been amended. As a result, claims 21-40 are pending in this application.
Response to Arguments
Applicant’s arguments, see pgs. 6-9 of the remarks filed on December 30th, 2025, with respect to the rejection of independent claims 21, 28, and 34 as amended under 35 U.S.C. 102 and 35 U.S.C. 103, assert that Non-Patent Literature "Delivering Deep Learning to Mobile Devices via Offloading" issued to Xukan Ran et al. (hereinafter as “Ran”) does not explicitly teach or suggest "subsequent local processing of the input data ... , thereby continuing processing of the input data according to a different resource allocation."
Examiner respectfully disagrees. Ran teaches “subsequent local processing of the input data ... , thereby continuing processing of the input data according to a different resource allocation."
Ran indicates on pg. 43:
[media_image1.png]
The above passage specifies that when the smartwatch processes data, it determines whether to offload to a nearby device based on its camera capability and processing requirements; because it is unable to process the video feed itself due to performance issues, it uploads the video to a nearby device (Smartwatch with camera capabilities and processing requirements…could offload to a nearby device such as the user’s smartphone).
Additionally, Ran teaches thereby continuing processing of the input data according to a different resource allocation, where Ran indicates on pg. 45:
[media_image2.png]
As shown above, the offloading decision is highly dependent on the network conditions; therefore, it is understood that the offloading decision must change as the network resources change, thereby continuing the processing of the input data under a different resource allocation. Accordingly, on pg. 43, Ran provides the example shown below:
[media_image3.png]
In this example, the Minecraft game is first uploaded to the back-end server; when performance is not good, the decision engine switches processing to a smaller neural network model without invoking the cloud server.
As such, Ran teaches the amended limitations as discussed above.
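To illustrate the bandwidth-dependent offloading policy discussed above, the following is a minimal sketch; the 4 Mbps figure is the one referenced in this discussion of Ran, while the function and variable names are hypothetical and are not taken from the reference.

```python
# Illustrative sketch only: a bandwidth-threshold offloading policy of the kind
# discussed above. The 4 Mbps threshold is the value referenced in the discussion
# of Ran; the names below are hypothetical.
BANDWIDTH_THRESHOLD_MBPS = 4.0

def choose_processing_target(bandwidth_mbps: float) -> str:
    """Continue processing the same input under a different resource allocation:
    offload to the server (big CNN) when bandwidth is at or above the threshold,
    otherwise run the small CNN locally on the phone."""
    if bandwidth_mbps >= BANDWIDTH_THRESHOLD_MBPS:
        return "server (big CNN)"
    return "phone (small CNN)"

# As measured bandwidth fluctuates, the allocation changes while processing continues.
for measured_bandwidth in (6.0, 3.2, 5.1):
    print(measured_bandwidth, "Mbps ->", choose_processing_target(measured_bandwidth))
```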
Applicant’s argument, see pg. 9 of the remarks filed on December 30th, 2025, with respect to the rejection of dependent claim 26, asserts that Non-Patent Literature "Delivering Deep Learning to Mobile Devices via Offloading" issued to Xukan Ran et al. (hereinafter as “Ran”) does not explicitly teach or suggest "the determined configuration directs one or more cloud devices to perform processing of the extracted video frames using a heavy DNN model when results from the lightweight DNN model do not meet the defined threshold confidence value".
Examiner respectfully disagrees. The above limitation merely indicates reallocation between neural network models according to the latency value. Thus, the limitation merely indicates redirecting traffic to the cloud device, which uses the big CNN, when the lightweight DNN model (little CNN) does not meet the threshold value. Ran teaches this aspect on pg. 44-45:
[media_image4.png]
[media_image5.png]
{Examiner correlates the offloading with determining the best course of action to be performed on the video processing, such that when the bandwidth is above the 4 Mbps threshold the CNN can be performed on the server (heavyweight DNN model), while below that threshold the CNN is run locally on the phone}).
As such, Ran teaches the above limitation as discussed above.
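The confidence-gated reallocation recited in claim 26, as characterized above, can be sketched as follows; this is an illustrative sketch of the claimed behavior, not Ran's implementation, and the function names and threshold value are assumptions.

```python
# Illustrative sketch only: results from a lightweight local model are accepted when
# they meet a defined confidence threshold; otherwise the frame is redirected to a
# cloud device running the heavier model. Names and the threshold are assumptions.
CONFIDENCE_THRESHOLD = 0.8

def classify_frame(frame, run_lightweight_dnn, run_heavy_dnn_on_cloud):
    """run_lightweight_dnn and run_heavy_dnn_on_cloud are caller-supplied callables
    returning (label, confidence) and label, respectively."""
    label, confidence = run_lightweight_dnn(frame)
    if confidence >= CONFIDENCE_THRESHOLD:
        return label  # lightweight result meets the defined threshold confidence value
    # Otherwise, direct a cloud device to reprocess the frame with the heavy DNN model.
    return run_heavy_dnn_on_cloud(frame)

# Example with stub models standing in for the little CNN and the big CNN.
little_cnn = lambda f: ("cat", 0.55)   # low-confidence local result
big_cnn = lambda f: "dog"              # cloud result used as the fallback
print(classify_frame("frame-1", little_cnn, big_cnn))  # falls back to the cloud model
```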
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 28-30 and 33 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Non-Patent Literature "Delivering Deep Learning to Mobile Devices via Offloading" issued to Xukan Ran et al. (hereinafter as “Ran”).
Regarding claim 28, Ran teaches a method, comprising: allocating initial local processing of input data, received from a first device, between a set of local devices comprising one or more edge devices and one or more cameras (Ran pg. 42-43:
[media_image6.png]
[media_image7.png]
Pg. 44-45:
[media_image8.png]
[media_image9.png]
{Examiner correlates allocating initial local processing of input data received from a first device with a baseline diagnostic step in which the processing is done locally on the phone (the first device) and the tradeoff between frame rates is analyzed before deciding to offload to a cloud server. The allocation between the camera and the edge devices is based on the received features: the resolution of the camera is adjusted according to the input, and the results are reviewed to determine that lower resolution requires less computation and is fed to the small CNN run locally on the phone, while higher resolution requires more computation and is sent to the cloud server, the user’s home router, or the user’s laptop based on the associated learnable parameters of the device. Accordingly, a smartwatch with a camera offloads to the mobile phone (edge device), which includes a camera module});
detecting a change in resource availability for the determined configuration (Ran pg. 42-43:
[media_image10.png]
[media_image1.png]
Pg. 45:
[media_image11.png]
{Examiner correlates the detection of the change in resources with the policy of the device, whereby the device may run the CNN on the phone when the bandwidth is below the threshold of about 4 Mbps, while bandwidth at or above that threshold would require offloading to a server}), wherein
the change is at least one of: a device condition or a network condition (
Ran: Pg. 45
[media_image11.png]
{Examiner correlates the network condition with the network bandwidth fluctuating around 4 Mbps, where a drop in the bandwidth conditions would require the phone to offload to a bigger server}); and
when the detected change in resource availability is insufficient for the initial local processing of the input data
(Ran: Pg. 43-44;
[media_image1.png]
[media_image12.png]
{Examiner correlates the detected change being insufficient with the rate yielding a low resolution, such that the phone must decide whether to analyze a better resolution based on the latency}),
reallocating local processing among the set of local devices (Ran: pg. 43
[media_image1.png]
Pg. 45:
[media_image13.png]
{Examiner correlates the reallocating among the local devices as shown on pg. 43, where the smartwatch decides to offload to a nearby smartphone based on the offloading policy, under which the smartwatch offloads locally to the smartphone for network bandwidth of 4 Mbps and below, while network bandwidth at or above the 4 Mbps threshold would require offloading to a bigger cloud server}), thereby continuing processing of the input data according to a different resource allocation (
Ran: Pg. 43;
[media_image1.png]
pg. 44-45:
[media_image14.png]
[media_image9.png]
[media_image15.png]
{Examiner interprets the teaching that a wearable device (smartwatch) can offload data to a nearby device such as the smartphone as directly teaching the concept of continuing processing under a different resource allocation. This merely utilizes an edge-assisted task in which the resource-limited smartwatch transfers its computational task to a capable nearby device}).
Regarding claim 29, Ran further teaches further allocating processing to one or more smart devices, the one or more smart devices performing processing that is computationally cheaper than processing performed by the one or more edge devices (Ran: pg. 44-45, “
[media_image16.png]
[media_image17.png]
{Examiner correlates allocating between the camera and the edge devices based on the received features: the resolution of the camera is adjusted according to the input and the results are reviewed to determine that lower resolution requires less computation while higher resolution requires more computation, based on the associated learnable parameters of the device}).
Regarding claim 30, Ran further teaches dynamically shifting the processing load of the input data back to the one or more edge devices upon detecting that network capability between the set of local devices has been restored (Ran: pg. 43
[media_image18.png]
pg. 44-45:
[media_image15.png]
{Examiner correlates the offloading with determining to shift the processing load back by increasing the processing on the mobile device: if the frame rate is observed to be below a threshold, the workload is then run on the mobile phone instead. The set of local devices follows from offloading to a nearby device such as a smartphone, where the head-mounted display with a camera would offload based on its frame rate requirement}).
Regarding claim 33, Ran further teaches the allocated processing of input data directs the set of local devices to perform processing of the extracted video frames using a lightweight DNN model locally on the set of local devices (Ran,
pg. 43;
[media_image18.png]
pg. 45:
[media_image5.png]
{Examiner correlates the offloading with determining the best course of action to be performed on the video processing, such that when the frame rate is below a certain threshold, the CNN can be performed on the mobile phone (lightweight DNN model)}).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 21, 27, 34, and 40 are rejected under 35 U.S.C. 103 as being unpatentable over Non-Patent Literature "Delivering Deep Learning to Mobile Devices via Offloading" issued to Xukan Ran et al. (hereinafter as “Ran”) in view of Non-Patent Literature "Distributed Deep Neural Networks over the Cloud, the Edge and End Devices” issued to Teerapittayanon et al. (hereinafter as “Teerapittayanon”).
Regarding claim 21, Ran teaches a system comprising (
[media_image19.png]
{See Fig. 1 above, which discloses a system}): a processor (Ran: Introduction pg. 43, Fig. 1:
[media_image20.png]
{Examiner correlates the mobile phone with having a processor to analyze input video and display an output afterwards}); and a storage memory storing computer-readable instructions, which when executed by the processor, cause the processor to (Ran: Introduction pg. 43, "a typical Android phone; that is, the video streams cannot be analyzed in real time [2]. Even with speedup from the mobile GPU [13], typical processing times are approximately 600 ms, which is equivalent to less than 1.7 frames per second, and is still not acceptable for real time processing." Pg. 43:
[media_image21.png]
{Examiner correlates the Android phone as having storage memory to process the video stream and analyze the stream based on its parameters}): receive a video query regarding a live video stream (Ran: Introduction pg. 43;
[media_image22.png]
{Examiner correlates the input video as the video query});
determine a configuration from the determined set of configurations for performing local processing the video query based on current resources available to the system (Ran: Introduction pg. 43;
[media_image23.png]
{Examiner correlates the selection of the configuration for local processing of the video query, based on current resources, with the tradeoff received from the object detection: based on this tradeoff the system determines, according to its parameters, an image resolution, a model size, and an offloading decision that provide the best overall decision by utilizing the offloading decision engine of Fig. 1 (shown below). Thus, performing the real-time detection based on the tradeoff is according to the system parameters, such as those of the camera of the device:
[media_image22.png]
});
allocate initial local processing for the video query between a set of local devices according to the determined configuration, the set of local devices comprising at least one camera and at least one edge device (Ran pg. 42-43:
[media_image6.png]
[media_image7.png]
Pg. 44-45:
[media_image8.png]
[media_image9.png]
{Examiner correlates allocating initial local processing of input data received from a first device with a baseline diagnostic step in which the processing is done locally on the phone (the first device) and the tradeoff between frame rates is analyzed before deciding to offload to a cloud server. The allocation between the camera and the edge devices is based on the received features: the resolution of the camera is adjusted according to the input, and the results are reviewed to determine that lower resolution requires less computation and is fed to the small CNN run locally on the phone, while higher resolution requires more computation and is sent to the cloud server, the user’s home router, or the user’s laptop based on the associated learnable parameters of the device. Accordingly, a smartwatch with a camera offloads to the mobile phone (edge device), which includes a camera module});
detect a change in resource availability for the determined configuration (Ran pg. 42-43:
[media_image10.png]
[media_image18.png]
Pg. 45:
[media_image11.png]
), wherein the change is at least one of: a device condition or a network condition (Ran: Pg. 45
[media_image11.png]
[media_image13.png]
); and
when the detected change in resource availability is insufficient for the initial local processing of the video query (Ran: Pg. 43-44;
[media_image1.png]
[media_image12.png]
{Examiner correlates the detected change being insufficient with the rate yielding a low resolution, such that the phone must decide whether to analyze a better resolution based on the latency}),
reallocate subsequent local processing for the video query among the set of local devices (Ran: pg. 43
[media_image1.png]
Pg. 45:
[media_image13.png]
{Examiner correlates the reallocating among the local devices as shown on pg. 43, where the smartwatch decides to offload to a nearby smartphone based on the offloading policy, under which the smartwatch offloads locally to the smartphone for network bandwidth of 4 Mbps and below, while network bandwidth at or above the 4 Mbps threshold would require offloading to a bigger cloud server}),
thereby continuing processing for the video query according to a different resource allocation (Ran: Pg. 43;
[media_image1.png]
pg. 44-45:
[media_image14.png]
[media_image9.png]
[media_image15.png]
{Examiner interprets the teaching that a wearable device (smartwatch) can offload data to a nearby device such as the smartphone as directly teaching the concept of continuing processing under a different resource allocation. This merely utilizes an edge-assisted task in which the resource-limited smartwatch transfers its computational task to a capable nearby device}).
Ran does not explicitly teach determine a resource requirement estimate for the video query; determine a set of configurations from a plurality of processing configurations, based upon an ability of each configuration of the set of configurations to process the video query;
However, Teerapittayanon teaches determine a resource requirement estimate for the video query (Teerapittayanon: D. DDNN Inference: multiple preconfigured exit thresholds T (one element T at each exit point) as a measure of confidence in the prediction of the sample. One way to define T is by searching over the ranges of T on a validation set and pick the one with the best accuracy. H. Reducing Communication Costs: offloading raw sensor input to the cloud. Sending a 32x32 RGB pixel image (the input size of our dataset) to the cloud costs 3072 bytes per image sample. By comparison, as shown in Table II, the largest DDNN model used in our evaluation section requires only 140 bytes of communication per sample on average (an over 20x reduction in communication costs) {Examiner correlates the resource estimate with the measure of the confidence of the sample. Based on this estimate, the system determines the best accuracy to select the best quality, utilizing the measure of confidence of the sample data and sending the best quality image accordingly});
determine a set of configurations from a plurality of processing configurations, based upon an ability of each configuration of the set of configurations to process the video query (Teerapittayanon: D. DDNN Inference: multiple preconfigured exit thresholds T (one element T at each exit point) as a measure of confidence in the prediction of the sample. One way to define T is by searching over the ranges of T on a validation set and pick the one with the best accuracy. We use a normalized entropy threshold as the confidence criteria (instead of unnormalized entropy as used in [31]) that determines whether to classify (exit) a sample at a particular exit point. This normalized entropy η has values between 0 and 1, which allows easier interpretation and searching of its corresponding threshold T. For example, η close to 0 means that the DDNN is confident about the prediction of the sample; η close to 1 means it is not confident {Examiner correlates the selecting of the best configuration with the measure of the confidence of the sample data, searching over the set and picking the best quality to ensure, based on the criteria, that the confidence of the sample is the best, thereby achieving the most confident results});
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Ran (which teaches receiving a video query regarding a live video stream and determining the best allocation to select the recommended configuration setting) with the teachings of Teerapittayanon (which teaches selecting one or more configurations from a plurality of stored processing configurations and detecting a change in resource availability by adjusting the selected configuration). One of ordinary skill in the art would have been motivated to make such a combination to dramatically improve the accuracy of the system at both the local and cloud levels by automatically tuning it to process geographically unique inputs and work together toward the same overall objective, leading to high overall accuracy (See Teerapittayanon: DDNN PROVISION FOR HORIZONTAL AND VERTICAL SCALING). In addition, the references (Ran and Teerapittayanon) teach features directed to analogous art and to the same field of endeavor, as Ran and Teerapittayanon are both directed to receiving inputs from devices and adjusting the loading process to improve the overall performance of the system.
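The normalized-entropy confidence criterion described in the Teerapittayanon passage quoted above can be illustrated with a minimal sketch; the function names and the example threshold value are assumptions for illustration and are not code from the reference.

```python
# Illustrative sketch only: normalized-entropy confidence at an exit point, as
# described in the Teerapittayanon passage above. Names and threshold are assumed.
import math

def normalized_entropy(probabilities):
    """Entropy of a class-probability vector, scaled to [0, 1] by log(C).
    Values near 0 indicate a confident prediction; values near 1 indicate low confidence."""
    c = len(probabilities)
    entropy = -sum(p * math.log(p) for p in probabilities if p > 0)
    return entropy / math.log(c)

def exit_locally(probabilities, threshold_T=0.3):
    """Classify (exit) the sample at this exit point if the normalized entropy is
    below the preconfigured threshold T; otherwise forward it to the next tier."""
    return normalized_entropy(probabilities) < threshold_T

# Example: a sharply peaked prediction exits locally; a flat one is forwarded onward.
print(exit_locally([0.95, 0.03, 0.02]))  # True  -> confident, exit at this point
print(exit_locally([0.4, 0.35, 0.25]))   # False -> not confident, send onward
```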
Regarding claim 27, the modification of Ran and Teerapittayanon teaches the claimed invention substantially as claimed, and Ran further teaches detecting a change in resource availability comprises determining whether network connectivity to the at least one edge device is available (Ran: 4.3 Impact of variable network conditions, pg. 45:
[media_image15.png]
{Examiner correlates determining the available resources with the network connectivity, according to the offload policy that determines whether the CNN on the phone or the CNN on the server is better suited to run}).
Regarding claim 34, Ran teaches a method, comprising: receive a video query regarding a live video stream (Ran: Introduction pg. 43;
[media_image22.png]
{Examiner correlates the input video as the video query});
determine a configuration from the determined set of configurations for performing local processing the video query based on current resources available to the system (Ran: Introduction pg. 43;
[media_image23.png]
{Examiner correlates the selection of the configuration for processing the video query, based on current resources, with the tradeoff received from the object detection: based on this tradeoff the system determines, according to its parameters, an image resolution, a model size, and an offloading decision that provide the best overall decision by utilizing the offloading decision engine of Fig. 1 (shown below):
[media_image22.png]
});
allocate initial local processing for the video query between a set of local devices according to the determined configuration, the set of local devices comprising at least one camera and at least one edge device (Ran pg. 42-43:
[media_image6.png]
[media_image7.png]
Pg. 44-45:
[media_image8.png]
[media_image9.png]
{Examiner correlates allocating initial local processing of input data received from a first device with a baseline diagnostic step in which the processing is done locally on the phone (the first device) and the tradeoff between frame rates is analyzed before deciding to offload to a cloud server. The allocation between the camera and the edge devices is based on the received features: the resolution of the camera is adjusted according to the input, and the results are reviewed to determine that lower resolution requires less computation and is fed to the small CNN run locally on the phone, while higher resolution requires more computation and is sent to the cloud server, the user’s home router, or the user’s laptop based on the associated learnable parameters of the device. Accordingly, a smartwatch with a camera offloads to the mobile phone (edge device), which includes a camera module});
detecting a change in resource availability for the determined configuration, wherein the change is at least one of a device condition or a network condition (Ran: Pg. 45
[media_image11.png]
[media_image13.png]
); and
when the detected change in resource availability is insufficient for the initial local processing of the video query (Ran: Pg. 43-44;
[media_image1.png]
[media_image12.png]
{Examiner correlates the detected change being insufficient with the rate yielding a low resolution, such that the phone must decide whether to analyze a better resolution based on the latency}),
reallocate subsequent local processing for the video query among the set of local devices (Ran: pg. 43
[media_image1.png]
Pg. 45:
[media_image13.png]
{Examiner correlates the reallocating among the local devices as shown on pg. 43, where the smartwatch decides to offload to a nearby smartphone based on the offloading policy, under which the smartwatch offloads locally to the smartphone for network bandwidth of 4 Mbps and below, while network bandwidth at or above the 4 Mbps threshold would require offloading to a bigger cloud server}),
thereby continuing processing for the video query according to a different resource allocation (Ran: Pg. 43;
[media_image1.png]
pg. 44-45:
[media_image14.png]
[media_image9.png]
[media_image15.png]
{Examiner interprets the teaching that a wearable device (smartwatch) can offload data to a nearby device such as the smartphone as directly teaching the concept of continuing processing under a different resource allocation. This merely utilizes an edge-assisted task in which the resource-limited smartwatch transfers its computational task to a capable nearby device}).
Ran does not explicitly teach determine a resource requirement estimate for the video query; determine a set of configurations from a plurality of processing configurations, based upon an ability of each configuration of the set of configurations to process the video query;
However, Teerapittayanon teaches determine a resource requirement estimate for the video query (Teerapittayanon: D. DDNN Inference: multiple preconfigured exit thresholds T (one element T at each exit point) as a measure of confidence in the prediction of the sample. One way to define T is by searching over the ranges of T on a validation set and pick the one with the best accuracy. H. Reducing Communication Costs: offloading raw sensor input to the cloud. Sending a 32x32 RGB pixel image (the input size of our dataset) to the cloud costs 3072 bytes per image sample. By comparison, as shown in Table II, the largest DDNN model used in our evaluation section requires only 140 bytes of communication per sample on average (an over 20x reduction in communication costs) {Examiner correlates the resource estimate with the measure of the confidence of the sample. Based on this estimate, the system determines the best accuracy to select the best quality, utilizing the measure of confidence of the sample data and sending the best quality image accordingly});
determine a set of configurations from a plurality of processing configurations, based upon an ability of each configuration of the set of configurations to process the video query (Teerapittayanon: D. DDNN Inference: multiple preconfigured exit thresholds T (one element T at each exit point) as a measure of confidence in the prediction of the sample. One way to define T is by searching over the ranges of T on a validation set and pick the one with the best accuracy. We use a normalized entropy threshold as the confidence criteria (instead of unnormalized entropy as used in [31]) that determines whether to classify (exit) a sample at a particular exit point. This normalized entropy η has values between 0 and 1, which allows easier interpretation and searching of its corresponding threshold T. For example, η close to 0 means that the DDNN is confident about the prediction of the sample; η close to 1 means it is not confident {Examiner correlates the selecting of the best configuration with the measure of the confidence of the sample data, searching over the set and picking the best quality to ensure, based on the criteria, that the confidence of the sample is the best, thereby achieving the most confident results});
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Ran (which teaches receiving a video query regarding a live video stream and determining the best allocation to select the recommended configuration setting) with the teachings of Teerapittayanon (which teaches selecting one or more configurations from a plurality of stored processing configurations and detecting a change in resource availability by adjusting the selected configuration). One of ordinary skill in the art would have been motivated to make such a combination to dramatically improve the accuracy of the system at both the local and cloud levels by automatically tuning it to process geographically unique inputs and work together toward the same overall objective, leading to high overall accuracy (See Teerapittayanon: DDNN PROVISION FOR HORIZONTAL AND VERTICAL SCALING). In addition, the references (Ran and Teerapittayanon) teach features directed to analogous art and to the same field of endeavor, as Ran and Teerapittayanon are both directed to receiving inputs from devices and adjusting the loading process to improve the overall performance of the system.
Regarding claim 40, the modification of Ran and Teerapittayanon teaches the claimed invention substantially as claimed, and Ran further teaches detecting a change in resource availability comprises determining whether network connectivity to one or more cloud devices is available (Ran: 4.3 Impact of variable network conditions, pg. 45:
[media_image15.png]
{Examiner correlates determining the available resources with the network connectivity, according to the offload policy that determines whether the CNN on the phone or the CNN on the server is better suited to run}).
Claims 22-26 and 35-39 are rejected under 35 U.S.C. 103 as being unpatentable over Non-Patent Literature "Delivering Deep Learning to Mobile Devices via Offloading" issued to Xukan Ran et al. (hereinafter as “Ran”) in view of Non-Patent Literature "Distributed Deep Neural Networks over the Cloud, the Edge and End Devices” issued to Teerapittayanon et al. (hereinafter as “Teerapittayanon”), and further in view of Non-Patent Literature "A Computing Platform for Video Crowdprocessing Using Deep Learning" issued to Lu et al. (hereinafter as "Lu").
Regarding claim 22, the modification of Ran and Teerapittayanon teaches the claimed invention substantially as claimed; however, the modification of Ran and Teerapittayanon does not explicitly teach that the determined configuration directs the set of local devices to extract video frames from the live video stream using a decoding module.
Lu teaches the determined configuration directs the set of local devices to extract video frames from the live video stream using a decoding module (
[media_image24.png]
See Lu pg. 1431, III. OVERVIEW: We consider a crowdprocessing approach to perform object detection/classification of videos. This task includes filtering of videos based on metadata, and then processing the videos to perform object detection. Alternatively, the user can process some of the frames locally… Once these videos are processed using deep learning either locally on the mobile device…For frames that are processed on the mobile devices, the user will forward either the tags, the frames of interest. Alternatively, the user may perform frame extraction locally and then offload specific frames to the cloud. In this case, it needs to determine whether each frame is processed by (ii) frame offload in which the frame is sent to the cloud for detection or (iii) local detection where the frame is detected on the mobile device {Examiner correlates the frame extraction with heavy processing being offloaded to the cloud to perform the video processing, while light processing can be handled immediately by performing the object detection locally and then delivering the results to the cloud}).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Ran (which teaches receiving a video query regarding a live video stream and determining the best allocation to select the recommended configuration setting) with the teachings of Teerapittayanon (which teaches selecting one or more configurations from a plurality of stored processing configurations and detecting a change in resource availability by adjusting the selected configuration) and with the further teachings of Lu (which teaches the determined configuration directs the set of local devices to extract video frames from the live video stream using a decoding module). One of ordinary skill in the art would have been motivated to make such a combination to dramatically improve the processing of the video frames by saving energy usage, offloading to a different location to process the video without the need for extra energy usage concerns (See Lu: III. OVERVIEW; At the expense of some energy usage, the user can offload the videos to the cloud, where high performance computing can quickly process the videos without energy usage concerns). In addition, the references (Ran, Teerapittayanon, and Lu) teach features directed to analogous art and to the same field of endeavor, as Ran, Teerapittayanon, and Lu are directed to receiving inputs from devices and adjusting the loading process to improve the overall performance of the system.
Regarding claim 23, the modification of Ran, Teerapittayanon, and Lu teaches the claimed invention substantially as claimed, and Lu further teaches the determined configuration directs the set of local devices to perform background subtraction on the extracted video frames (Lu: pg. 1431, III. OVERVIEW: Alternatively, the user can process some of the frames locally… Once these videos are processed using deep learning either locally on the mobile device…For frames that are processed on the mobile devices, the user will forward either the tags, the frames of interest. pg. 1432, A. Frame Extraction, “Frame extraction is used to take individual video frames and transform them into images upon which object detection may be performed. To target objects with different dynamics within the video, the task issuer may request a different frame extraction rate. For example, for an object moving at a high speed like a car, the rate should be high enough to not miss the object”. B. Detection, “However, batch processing performs much better, where the intercept (α) is about 240ms and the slope (β) is 400ms. The difference grows with the increase of the number of frames. For detection, it is better to put more frames in a batch to reduce processing time. However, the system must wait longer to get the extracted frames from the video. Therefore, it is difficult to determine the best batch size” {Examiner correlates the different rates at which the object is detected with determining the best batch size to be delivered}).
Regarding claim 24, the modification of Ran, Teerapittayanon, and Lu teaches the claimed invention substantially as claimed, and Lu further teaches the background subtraction is performed on the extracted video frames to determine whether additional processing should be performed (Lu: pg. 1432, A. Frame Extraction, “Frame extraction is used to take individual video frames and transform them into images upon which object detection may be performed. To target objects with different dynamics within the video, the task issuer may request a different frame extraction rate. For example, for an object moving at a high speed like a car, the rate should be high enough to not miss the object”. B. Detection, “However, batch processing performs much better, where the intercept (α) is about 240ms and the slope (β) is 400ms. The difference grows with the increase of the number of frames. For detection, it is better to put more frames in a batch to reduce processing time. However, the system must wait longer to get the extracted frames from the video. Therefore, it is difficult to determine the best batch size” {Examiner correlates the additional processing with observing how the difference in rate grows when detecting the object, which allows more frames to be put in a batch and requires additional processing to determine the best batch size}).
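To illustrate the background-subtraction gating addressed in claims 23 and 24, the following is a minimal sketch; it is not Lu's implementation, and the frame-differencing approach, threshold values, and names are assumptions for illustration only.

```python
# Illustrative sketch only: background subtraction used to decide whether additional
# processing (e.g., DNN-based detection) should be performed on an extracted frame.
# This is not Lu's implementation; the frame-differencing approach, thresholds, and
# names are assumptions.
import numpy as np

MOTION_THRESHOLD = 0.02  # assumed fraction of changed pixels that triggers detection

def needs_additional_processing(frame: np.ndarray, background: np.ndarray,
                                pixel_delta: int = 25) -> bool:
    """Return True if the extracted frame differs enough from the background model
    that it should be forwarded for further (heavier) processing."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    changed_fraction = np.mean(diff > pixel_delta)
    return changed_fraction > MOTION_THRESHOLD

# Example with synthetic greyscale frames: a static frame is skipped,
# while a frame with a moving region is forwarded for detection.
background = np.full((120, 160), 100, dtype=np.uint8)
static_frame = background.copy()
moving_frame = background.copy()
moving_frame[40:80, 60:100] = 200  # a bright object enters the scene
print(needs_additional_processing(static_frame, background))  # False
print(needs_additional_processing(moving_frame, background))  # True
```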
Regarding claim 25, the modification of Ran, Teerapittayanon, and Lu teaches the claimed invention substantially as claimed, and Ran further teaches the determined configuration directs the set of local devices to perform processing of the extracted video frames using a lightweight DNN model locally on the set of local devices (Ran, Introduction pg. 43;
[media_image25.png]
pg. 45:
[media_image5.png]
{Examiner correlates the offloading with determining the best course of action to be performed on the video processing, such that when the frame rate is below a certain threshold, the CNN can be performed on the mobile phone (lightweight DNN model)}).
Regarding claim 26, the modification of Ran, Teerapittayanon, and Lu teaches the claimed invention substantially as claimed, and Ran further teaches the determined configuration directs one or more cloud devices to perform processing of the extracted video frames using a heavy DNN model when results from the lightweight DNN model do not meet the defined threshold confidence value (Ran, pg. 44-45:
[media_image4.png]
[media_image5.png]
{Examiner correlates the offloading with determining the best course of action to be performed on the video processing, such that when the bandwidth is above the 4 Mbps threshold the CNN can be performed on the server (heavyweight DNN model), while below that threshold the CNN is run locally on the phone}).
Regarding claim 35, the modification of Ran and Teerapittayanon teaches the claimed invention substantially as claimed; however, the modification of Ran and Teerapittayanon does not explicitly teach the determined configuration directs the set of local devices to extract video frames from the live video stream using a decoding module.
Lu teaches the determined configuration directs the set of local devices to extract video frames from the live video stream using a decoding module (
[media_image24.png]
See Lu pg. 1431, III. OVERVIEW: We consider a crowdprocessing approach to perform object detection/classification of videos. This task includes filtering of videos based on metadata, and then processing the videos to perform object detection. Alternatively, the user can process some of the frames locally… Once these videos are processed using deep learning either locally on the mobile device…For frames that are processed on the mobile devices, the user will forward either the tags, the frames of interest. Alternatively, the user may perform frame extraction locally and then offload specific frames to the cloud. In this case, it needs to determine whether each frame is processed by (ii) frame offload in which the frame is sent to the cloud for detection or (iii) local detection where the frame is detected on the mobile device {Examiner correlates the frame extraction with heavy processing being offloaded to the cloud to perform the video processing, while light processing can be handled immediately by performing the object detection locally and then delivering the results to the cloud}).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Ran (which teaches receiving a video query regarding a live video stream and determining the best allocation to select the recommended configuration setting) with the teachings of Teerapittayanon (which teaches selecting one or more configurations from a plurality of stored processing configurations and detecting a change in resource availability by adjusting the selected configuration) and with the further teachings of Lu (which teaches the determined configuration directs the set of local devices to extract video frames from the live video stream using a decoding module). One of ordinary skill in the art would have been motivated to make such a combination to dramatically improve the processing of the video frames by saving energy usage, offloading to a different location to process the video without the need for extra energy usage concerns (See Lu: III. OVERVIEW; At the expense of some energy usage, the user can offload the videos to the cloud, where high performance computing can quickly process the videos without energy usage concerns). In addition, the references (Ran, Teerapittayanon, and Lu) teach features directed to analogous art and to the same field of endeavor, as Ran, Teerapittayanon, and Lu are directed to receiving inputs from devices and adjusting the loading process to improve the overall performance of the system.
Regarding claim 36, the modification of Ran, Teerapittayanon, and Lu teaches the claimed invention substantially as claimed, and Lu further teaches the determined configuration directs the set of local devices to perform background subtraction on the extracted video frames (Lu: pg. 1431, III. OVERVIEW: Alternatively, the user can process some of the frames locally… Once these videos are processed using deep learning either locally on the mobile device…For frames that are processed on the mobile devices, the user will forward either the tags, the frames of interest. pg. 1432, A. Frame Extraction, “Frame extraction is used to take individual video frames and transform them into images upon which object detection may be performed. To target objects with different dynamics within the video, the task issuer may request a different frame extraction rate. For example, for an object moving at a high speed like a car, the rate should be high enough to not miss the object”. B. Detection, “However, batch processing performs much better, where the intercept (α) is about 240ms and the slope (β) is 400ms. The difference grows with the increase of the number of frames. For detection, it is better to put more frames in a batch to reduce processing time. However, the system must wait longer to get the extracted frames from the video. Therefore, it is difficult to determine the best batch size” {Examiner correlates the different rates at which the object is detected with determining the best batch size to be delivered}).
Regarding claim 37, the modification of Ran, Teerapittayanon, and Lu teaches the claimed invention substantially as claimed, and Lu further teaches the background subtraction is performed on the extracted video frames to determine whether additional processing should be performed (Lu: pg. 1432, A. Frame Extraction, “Frame extraction is used to take individual video frames and transform them into images upon which object detection may be performed. To target objects with different dynamics within the video, the task issuer may request a different frame extraction rate. For example, for an object moving at a high speed like a car, the rate should be high enough to not miss the object”. B. Detection, “However, batch processing performs much better, where the intercept (α) is about 240ms and the slope (β) is 400ms. The difference grows with the increase of the number of frames. For detection, it is better to put more frames in a batch to reduce processing time. However, the system must wait longer to get the extracted frames from the video. Therefore, it is difficult to determine the best batch size” {Examiner correlates the additional processing with observing how the difference in rate grows when detecting the object, which allows more frames to be put in a batch and requires additional processing to determine the best batch size}).
Regarding claim 38, the modification of Ran, Teerapittayanon, and Lu teaches the claimed invention substantially as claimed, and Ran further teaches the determined configuration directs the set of local devices to perform processing of the extracted video frames using a lightweight DNN model locally on the set of local devices (Ran, Introduction pg. 43;
[media_image25.png]
pg. 45:
[media_image5.png]
{Examiner correlates the offloading with determining the best course of action to be performed on the video processing, such that when the frame rate is below a certain threshold, the CNN can be performed on the mobile phone (lightweight DNN model)}).
Regarding claim 39, the modification of Ran, Teerapittayanon, and Lu teaches the claimed invention substantially as claimed, and Ran further teaches the determined configuration directs one or more cloud devices to perform processing of the extracted video frames using a heavy DNN model when results from the lightweight DNN model do not meet the defined threshold confidence value (Ran, pg. 45:
[media_image5.png]
{Examiner correlates the offloading with determining the best course of action to be performed on the video processing, such that when the frame rate is above a certain threshold, the CNN can be performed on the server (heavyweight DNN model)}).
Claims 31-32 are rejected under 35 U.S.C. 103 as being unpatentable over Non-Patent Literature "Delivering Deep Learning to Mobile Devices via Offloading" issued to Xukan Ran et al. (hereinafter as “Ran”) in view of Non-Patent Literature "A Computing Platform for Video Crowdprocessing Using Deep Learning" issued to Lu et al. (hereinafter as "Lu").
Regarding claim 31, Ran teaches the claimed invention substantially as claimed; however, Ran does not explicitly teach the allocated processing of input data directs the set of local devices to extract video frames from the live video stream using a decoding module.
Lu teaches the allocated processing of input data directs the set of local devices to extract video frames from the live video stream using a decoding module (
[media_image24.png]
See Lu pg. 1431, III. OVERVIEW: We consider a crowdprocessing approach to perform object detection/classification of videos. This task includes filtering of videos based on metadata, and then processing the videos to perform object detection. Alternatively, the user can process some of the frames locally… Once these videos are processed using deep learning either locally on the mobile device…For frames that are processed on the mobile devices, the user will forward either the tags, the frames of interest. Alternatively, the user may perform frame extraction locally and then offload specific frames to the cloud. In this case, it needs to determine whether each frame is processed by (ii) frame offload in which the frame is sent to the cloud for detection or (iii) local detection where the frame is detected on the mobile device {Examiner correlates the frame extraction with heavy processing being offloaded to the cloud to perform the video processing, while light processing can be handled immediately by performing the object detection locally and then delivering the results to the cloud}).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Ran (which teaches receiving a video query regarding a live video stream and determining the best allocation to select the recommended configuration setting) with the teachings of Lu (which teaches the allocated processing of input data directs the one or more cameras or the one or more edge devices to extract video frames from the live video stream using a decoding module). One of ordinary skill in the art would have been motivated to make such a combination to dramatically improve the processing of the video frames by saving energy usage, offloading to a different location to process the video without the need for extra energy usage concerns (See Lu: III. OVERVIEW; At the expense of some energy usage, the user can offload the videos to the cloud, where high performance computing can quickly process the videos without energy usage concerns). In addition, the references (Ran and Lu) teach features directed to analogous art and to the same field of endeavor, as Ran and Lu are directed to receiving inputs from devices and adjusting the loading process to improve the overall performance of the system.
Regarding claim 32, Ran teaches the claimed invention substantially as claimed; however, Ran does not explicitly teach the allocated processing of input data directs the set of local devices to perform background subtraction on the extracted video frames.
Lu teaches the allocated processing of input data directs the set of local devices to perform background subtraction on the extracted video frames (Lu:
pg. 1431, III. OVERVIEW: Alternatively, the user can process some of the frames locally… Once these videos are processed using deep learning either locally on the mobile device…For frames that are processed on the mobile devices, the user will forward either the tags, the frames of interest. pg. 1432, A. Frame Extraction, “Frame extraction is used to take individual video frames and transform them into images upon which object detection may be performed. To target objects with different dynamics within the video, the task issuer may request a different frame extraction rate. For example, for an object moving at a high speed like a car, the rate should be high enough to not miss the object”. B. Detection, “However, batch processing performs much better, where the intercept (α) is about 240ms and the slope (β) is 400ms. The difference grows with the increase of the number of frames. For detection, it is better to put more frames in a batch to reduce processing time. However, the system must wait longer to get the extracted frames from the video. Therefore, it is difficult to determine the best batch size” {Examiner correlates the different rates at which the object is detected with determining the best batch size to be delivered}).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Ran (which teaches receiving a video query regarding a live video stream and determining the best allocation to select the recommended configuration setting) with the teachings of Lu (which teaches the allocated processing of input data directs the set of local devices to perform background subtraction on the extracted video frames). One of ordinary skill in the art would have been motivated to make such a combination to dramatically improve the processing of the video frames by saving energy usage, offloading to a different location to process the video without the need for extra energy usage concerns (See Lu: III. OVERVIEW; At the expense of some energy usage, the user can offload the videos to the cloud, where high performance computing can quickly process the videos without energy usage concerns). In addition, the references (Ran and Lu) teach features directed to analogous art and to the same field of endeavor, as Ran and Lu are directed to receiving inputs from devices and adjusting the loading process to improve the overall performance of the system.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Non-Patent Literature “Energy-Traffic Tradeoff Cooperative Offloading for Mobile Cloud Computing” issued to Song et al. (hereinafter as “Song”) teaches an energy-traffic tradeoff in a mobile cloud computing environment, achieving a desirable tradeoff between energy consumption and Internet data traffic when offloading from a mobile device to a cloud server.
Non-Patent Literature “LAVEA: latency-aware video analytics on edge computing platform” issued to YI et al. (hereinafter as “YI”) teaches formulating a way to offload tasks by comparing workloads and choosing the best overall task placement for collaboration.
U.S. Patent Application Publication 2007/0271570 issued to Brown et al. (hereinafter as “Brown”) teaches an existing system that determines the load on the nodes and determines a way to balance the workload among the database system nodes.
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANDREW N HO whose telephone number is (571)270-0590. The examiner can normally be reached Tuesday and Thursday 10:00-6:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sherief Badawi can be reached at (571) 272-9782. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
2/27/2026
/ANDREW N HO/Examiner
Art Unit 2169
/SHERIEF BADAWI/Supervisory Patent Examiner, Art Unit 2169