DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Election/Restrictions
Applicant’s election of Group I (claims 1-7, 14-15, and 17-22) in the reply filed on 9/24/2025 is acknowledged. Because applicant did not distinctly and specifically point out the supposed errors in the restriction requirement, the election has been treated as an election without traverse (MPEP § 818.01(a)).
Claim Objections
Claims 8-11 and 16 are objected to because of the following informalities: claims 8-11 and 16 (Group II) were not elected. Therefore, the claim status identifier for these claims should be “withdrawn”. Appropriate correction is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claim 15 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claim recites a “storage medium.” Under the broadest reasonable interpretation, a "storage medium" can encompass non-statutory transitory forms of signal transmission, such as a propagating electrical or electromagnetic signal per se. A claim directed only to signals per se is not a process, machine, manufacture, or composition of matter and therefore is not directed to statutory subject matter. See MPEP 2106.03.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-7, 14-15, and 17-22 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Kezele et al. (US 2023/0169794 A1).
Consider claim 14, Kezele teaches an electronic device (Fig. 2 shows an example of a device), comprising: at least one processor (The device 200 may include one or more processor devices, such as a processor. [0067]); and a storage apparatus, configured to store at least one program (The device may include non-transitory memories that may store instructions for execution by the processor. [0070].), wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement a feature extraction method for a video (The device may include non-transitory memories that may store instructions for execution by the processor to carry out examples described in the present disclosure. [0070].), which comprises: determining a plurality of groups of pictures of video data, wherein each group of pictures comprises, according to a time sequence, an intra coding frame and at least one predictive-frame (a decoder may first decode GOP 1 102 and GOP 2 104. GOP 1 and GOP 2 each include an I-frame and at least one predictive frame. [0060] – [0061] and Fig. 1); for each group of pictures, extracting a first frame feature of the intra coding frame (a decoder may first decode GOP 1 102. The decoder will decode the image data 122 of the first I-frame encoding 112 and use the resulting frame (i.e. an RGB image) as the video frame at t=0. [0060]. The resulting frame (i.e. an RGB image) is considered to be the first frame feature), and extracting a compensation feature of motion compensation information of the at least one predictive-frame relative to the intra coding frame (The decoder will then decode or generate the first inter frame at t=1 by decoding the motion information 124 and residual information 126 from the first inter frame encoding 114, then applying video decompression techniques to reconstruct the inter frame at t=1 by transforming the image at t=0 using the motion information 124 and residual information 126. [0060]. The residual information is considered to be a compensation feature); and updating the compensation feature according to the first frame feature to obtain a second frame feature of the at least one predictive-frame (The second inter frame at t=2 is similarly decoded by transforming the reconstructed first inter frame at t=1 using the motion information 124 and residual information 126 decoded from the second inter frame encoding 116. [0060]. Decoding the residual information from the second inter frame encoding is considered to be updating the compensation feature. Because the decoding of the residual information from the second inter frame encoding is performed after the reconstruction of the inter frame at t=1, and the inter frame at t=1 is reconstructed by transforming the image at t=0, the updating is according to the first frame feature. The inter frame at t=2 is considered to be the second frame feature.), so as to obtain a frame feature of a video frame in the video data (At 1004, the RGB CNN 604a processes the RGB spatial attention information 602a and the inter frame to generate spatially weighted inter-frame feature information 332, specifically RGB-mode spatially weighted inter-frame feature information 332a. At 1008, the MV CNN 604b processes the MV spatial attention information 602b and the MV map to generate MV-mode spatially weighted inter-frame feature information 332b.
At 1012, the residual CNN 604c processes the residual spatial attention information 602c and the residual map to generate residual-mode spatially weighted inter-frame feature information 332c. [0114] – [0116] and Fig. 3A-Fig. 3B. See also [0111] – [0117]. The spatially weighted inter-frame feature information 332a, 332b, and 332c is considered to be a frame feature of a video frame).
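For illustration only, the following Python sketch shows one way the claim-14 flow (a first frame feature from the intra coding frame, a compensation feature from the motion compensation information, and an updated second frame feature per predictive frame) could be organized. It is not code from Kezele or from the application; the helpers extract_frame_feature() and extract_compensation_feature() are hypothetical stand-ins for any learned feature extractor, and the update is a toy fusion.

    import numpy as np

    def extract_frame_feature(image):
        # hypothetical stand-in for a learned backbone; returns a per-channel mean
        return image.mean(axis=(0, 1))

    def extract_compensation_feature(motion, residual):
        # hypothetical stand-in feature over a predictive frame's motion compensation information
        return np.concatenate([motion.mean(axis=(0, 1)), residual.mean(axis=(0, 1))])

    def gop_features(i_frame, p_frames):
        # first frame feature of the intra coding frame
        first_feat = extract_frame_feature(i_frame)
        features = [first_feat]
        for motion, residual in p_frames:                          # predictive frames in time order
            comp = extract_compensation_feature(motion, residual)  # compensation feature
            # toy "update according to the first frame feature" -> second frame feature
            second_feat = comp + np.resize(first_feat, comp.shape)
            features.append(second_feat)
        return features

    rng = np.random.default_rng(0)
    i_frame = rng.random((8, 8, 3))                             # toy 8x8 RGB I-frame
    p_frames = [(rng.random((8, 8, 2)), rng.random((8, 8, 3)))  # toy MV map and residual map
                for _ in range(2)]
    print([f.shape for f in gop_features(i_frame, p_frames)])   # [(3,), (5,), (5,)]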
Consider claim 17, Kezele teaches updating the compensation feature according to the first frame feature, comprises: splicing the first frame feature, the motion compensation information and the compensation feature to obtain a spliced image (the decoded frame information 212 of each frame encoding may be stored in the memory 208, including a decoded frame 214 (such as the image data 122 of an I-frame encoding or a reconstructed inter frame in RGB image format for an inter frame encoding), a decoded MV map 216 (or other motion information) of an inter frame encoding, and/or a decoded residual map 218 (or other residual information) of an inter frame encoding. [0071]; FIG. 3A is a schematic diagram showing data flows of a first example adaptive inference software system 222 (222A in this example embodiment) as implemented by the processor 202. The adaptive inference software system 222 takes a compressed video 100 as input. The adaptive inference software system 222 uses the inference module 226 to adaptively perform an inference task, based on adaptation decisions made by the decision module 224. A modified video decoder 236 is used to generate decoded video information 212 including not only decoded images (i.e. inter frames 214), but also motion vector (MV) maps 216 and residual maps 218, for all inter frames. [0083]. Fig. 3A-Fig. 3B); determining a first weight of the spliced image in a channel dimension, and a second weight of the spliced image in a spatial dimension, respectively (spatial salience information 602a-602c. [0111] – [0117] and Fig. 3A-3B, Fig. 6); and processing the first frame feature according to the first weight and the second weight to obtain an update parameter (At 1004, the RGB CNN 604a processes the RGB spatial attention information 602a and the inter frame to generate spatially weighted inter-frame feature information 332, specifically RGB-mode spatially weighted inter-frame feature information 332a. The above steps are repeated for each other modality (although it will be appreciated that, in some embodiments, each modality is processed in parallel and independently from each other modality). At 1005, the decision information determines whether the MV processing module 232 is to be used to process the MV map 216. If so, the method proceeds to step 1006; if not, to step 1009. At 1006, the MV processing module 232 processes the MV map 216 using a MV spatial attention module 306b to generate spatial salience information 602b. In some embodiments, the MV spatial attention information 602b is combined with MV map 216 to generate a spatially weighted MV map, such as a cropped ROI of the MV map 216. At 1008, the MV CNN 604b processes the MV spatial attention information 602b and the MV map to generate MV-mode spatially weighted inter-frame feature information 332b. At 1009, the decision information determines whether the residual processing module 234 is to be used to process the residual map 216. If so, the method proceeds to step 1010; if not, step 910 ends (and method 900 proceeds to step 912). At 1010, the residual processing module 234 processes the residual map 218 using a residual spatial attention module 306c to generate spatial salience information 602c. In some embodiments, the residual spatial attention information 602c is combined with residual map 218 to generate a spatially weighted residual map, such as a cropped ROI of the residual map 218. 
At 1012, the residual CNN 604c processes the residual spatial attention information 602c and the residual map to generate residual-mode spatially weighted inter-frame feature information 332c. Returning to FIG. 3A and FIG. 9, at step 912, the inter-frame feature information 332a, 332b, and/or 332c for each selected modality is processed by the inference module 226 to perform the inference task. The inference model 226 typically performs the inference task over many frames of the compressed video 100, such that the inference module 226 performs the inference task by processing a plurality of frame encodings of the compressed video 100, including the inter frame encoding 114 used to generate the inter-frame feature information 332a, 332b, and/or 332c. Thus, the operations of the decision module 224 determine whether the inter frame encoding 114 is included in the plurality of frame encodings processed by the inference module 226, and if so, extracts feature information from one or more selected modalities of the inter frame encoding 114 prior to processing by the inference module 226. [0111] – [0117] and Fig. 3A-3B, Fig. 6), and updating the compensation feature according to the update parameter (FIG. 11A shows operations of an example step 912 of method 900, as implemented by the inference module 226 (226A in this embodiment) shown in FIG. 3A. The inference module 226A includes three modality-specific multi-class or binary classifiers or other inference models, such as deep CNNs including one or more fully connected layers. These are shown as RGB inference model 310, MV inference model 312, and residual inference model 314. At 1102, 1102 the modality-specific feature information of each modality is processed by the respective modality-specific inference model 310, 2312, 314 to generate modality-specific inference information 334a, 334b, 334c of each modality. The modality-specific inference information 334a, 334b, 334c includes inference or prediction information sufficient to complete the inference task, such as logits or a normalized probability distribution across the classes of the inference task. At 1004, the modality-specific inference information 334a, 334b, 334c of all processed modalities is then fused by an inference fusion module 320 to generate inference information 330, such as a single logits or probability distribution across the classes of the inference task, to complete the inference task. In some embodiments, the inference fusion module 320 performs a simple fusion (such as averaging); in other embodiments, the inference fusion module 320 performs a more complex fusion operation, such as multiplicative log probs fusion (i.e. multiplying the logarithms of the probabilities of each modality). FIG. 3B shows an alternative architecture 222B of the adaptive inference software system 222A of FIG. 3A, whose operations are shown in FIG. 11B. In the alternative architecture 222B, the inference module 226B reverses the order of operations of the first example inference module 226A described with reference to FIG. 3A and FIG. 11A above. At 1152, the inputs 332a, 332b, 332c to the inference module 226B are first fused by a feature fusion module 350. At 1154, the fused feature information 356 generated thereby is processed by a single fused inference model 352 to generate the inference information 330, thereby completing the inference task. [0118] – [0119] and Fig. 3A-3B, Fig. 6).
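For illustration only, a minimal Python sketch of the claim-17 sequence follows (splice, derive a channel weight and a spatial weight, then update). It is not code from Kezele or from the application; the sigmoid-based weight computations and the shapes (a common H x W grid, 16-channel features) are assumptions made solely for this example.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def update_compensation_feature(first_feat, motion_info, comp_feat):
        # splice the first frame feature, motion compensation information, and
        # compensation feature along the channel axis -> "spliced image"
        spliced = np.concatenate([first_feat, motion_info, comp_feat], axis=-1)
        # first weight: one value per channel of the spliced image
        channel_weight = sigmoid(spliced.mean(axis=(0, 1)))   # shape (C_total,)
        # second weight: one value per spatial location
        spatial_weight = sigmoid(spliced.mean(axis=-1))       # shape (H, W)
        # process the first frame feature with both weights -> update parameter
        c = first_feat.shape[-1]
        update_param = first_feat * channel_weight[:c] * spatial_weight[..., None]
        # update the compensation feature according to the update parameter
        return comp_feat + update_param

    rng = np.random.default_rng(1)
    first_feat = rng.random((8, 8, 16))
    motion_info = rng.random((8, 8, 2))
    comp_feat = rng.random((8, 8, 16))
    print(update_compensation_feature(first_feat, motion_info, comp_feat).shape)  # (8, 8, 16)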
Consider claim 18, Kezele teaches determining the first weight of the spliced image in the channel dimension, and the second weight of the spliced image in the spatial dimension, respectively, comprises: extracting a splicing feature of the spliced image, wherein the splicing feature and the first frame feature have an identical size (FIG. 6 shows details of the operation of a set of three example modality-specific processing modules 230, 232, 234. The RGB processing module 230 receives the inter frame 214 as input. In the examples discussed herein, we assume that the inter frame 214 is a 3-channel image (i.e. Red, Green, and Blue channels) that can be expressed as a tensor of the size H×W×3, wherein H is the pixel height and W is the pixel width of the inter frame 214. The MV processing module 232 receives the MV map 216 as input. In the examples discussed herein, we assume that the MV map 216 is a 2-component vector field (i.e. horizontal (x) and vertical (y) vector components) that can be expressed as a tensor of the size H×W×2. The residual processing module 234 receives the residual map 218 as input. In the examples discussed herein, we assume that the residual map 218 is a 3-channel image (R, G, B channels) of motion-compensated RGB residuals that can be expressed as a tensor of the size H×W×3. [0106] and Fig. 6; In some embodiments, the spatial salience information 602a generated by the RGB spatial attention module 306a is soft spatial salience information, such as an attention map (e.g., dimensions H.sub.a×W.sub.a) indicating weight values at each pixel location that, when applied to one or multiple intermediate feature maps of the respective models 234, 232, 234 (note: the map may be downsampled to the corresponding feature map's spatial dimensions), weighs each feature map pixel location (over the totality of the map's channels) with an attentional weight indicating the degree in proportion to which each pixel or pixel region should affect the inference task. In some embodiments, the RGB spatial salience information 602s generated by the RGB spatial attention module 306a is hard spatial salience information, such as a ROI comprising a set of coordinates indicating a cropping operation to be performed on the inter frame 214, thereby limiting further processing of the inter frame 214 to the cropped ROI. The cropping operation may be a differentiable cropping operation, allowing this component to be trained with other components end-to-end. The pixel height and pixel width of the inter frame 214 could be reduced from H×W to a smaller region H.sub.r×W.sub.r, contained within H×W, while maintaining the same number of channels. The cropping operation may thus effectively be regarded as achieving the same result as a binary (i.e., hard) attention map, wherein a given pixel or pixel region is given a weight of either 1 or 0, although it may be implemented using a cropping operation. [0111]); pooling the splicing feature in the spatial dimension, and performing full connection on a pooling result to obtain the first weight of the spliced image in the channel dimension (At 1004, the RGB CNN 604a processes the RGB spatial attention information 602a and the inter frame to generate spatially weighted inter-frame feature information 332, specifically RGB-mode spatially weighted inter-frame feature information 332a. 
The above steps are repeated for each other modality (although it will be appreciated that, in some embodiments, each modality is processed in parallel and independently from each other modality). At 1005, the decision information determines whether the MV processing module 232 is to be used to process the MV map 216. If so, the method proceeds to step 1006; if not, to step 1009. At 1006, the MV processing module 232 processes the MV map 216 using a MV spatial attention module 306b to generate spatial salience information 602b. In some embodiments, the MV spatial attention information 602b is combined with MV map 216 to generate a spatially weighted MV map, such as a cropped ROI of the MV map 216. At 1008, the MV CNN 604b processes the MV spatial attention information 602b and the MV map to generate MV-mode spatially weighted inter-frame feature information 332b. At 1009, the decision information determines whether the residual processing module 234 is to be used to process the residual map 216. If so, the method proceeds to step 1010; if not, step 910 ends (and method 900 proceeds to step 912). At 1010, the residual processing module 234 processes the residual map 218 using a residual spatial attention module 306c to generate spatial salience information 602c. In some embodiments, the residual spatial attention information 602c is combined with residual map 218 to generate a spatially weighted residual map, such as a cropped ROI of the residual map 218. At 1012, the residual CNN 604c processes the residual spatial attention information 602c and the residual map to generate residual-mode spatially weighted inter-frame feature information 332c. Returning to FIG. 3A and FIG. 9, at step 912, the inter-frame feature information 332a, 332b, and/or 332c for each selected modality is processed by the inference module 226 to perform the inference task. The inference model 226 typically performs the inference task over many frames of the compressed video 100, such that the inference module 226 performs the inference task by processing a plurality of frame encodings of the compressed video 100, including the inter frame encoding 114 used to generate the inter-frame feature information 332a, 332b, and/or 332c. Thus, the operations of the decision module 224 determine whether the inter frame encoding 114 is included in the plurality of frame encodings processed by the inference module 226, and if so, extracts feature information from one or more selected modalities of the inter frame encoding 114 prior to processing by the inference module 226. [0111] – [0118], Fig. 3A-3B and Fig. 6); and performing convolution on feature maps of a plurality of channels in the splicing feature (RGB CNN 604a, MV CNN 604b, and residual CNN 604c. A differentiable backbone model (shown as RGB CNN 604a, MV CNN 604B, or residual CNN 604c) is provided for each processed modality, each model (generically, 604) being denoted as m.sub.i(α.sub.i) parametrized by α.sub.i. Each model m.sub.i 604 is a CNN model in the illustrated embodiment, but in some embodiments may be a general DNN model or other differentiable function. At 1004, the RGB CNN 604a processes the RGB spatial attention information 602a and the inter frame to generate spatially weighted inter-frame feature information 332, specifically RGB-mode spatially weighted inter-frame feature information 332a. At 1008, the MV CNN 604b processes the MV spatial attention information 602b and the MV map to generate MV-mode spatially weighted inter-frame feature information 332b. 
At 1012, the residual CNN 604c processes the residual spatial attention information 602c and the residual map to generate residual-mode spatially weighted inter-frame feature information 332c. [0111] – [0118], Fig. 3A-3B and Fig. 6), and performing logistic regression on a convolution result to obtain the second weight of the spliced image in the spatial dimension (The modality selection module 238 processes the final feature vectors V.sub.i (or single final feature vector V) to generate the decision information 512 using a number N of Gumbel-Softmax operations 510. A single Gumbel-Softmax operation 510 may be used in some embodiments; in others, a set or a composition of Gumbel-Softmax is used to allow for multiple modalities to be modeled for the inter frame encoding 114. In some embodiments, reinforcement learning may be used in place of the Gumbel-Softmax operations 510. [0097] and Fig. 5).
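For illustration only, the two weight computations recited in claim 18 can be sketched as follows in Python. This is not code from Kezele or from the application; the random fully connected matrix, the 3x3 averaging kernel, and the use of a sigmoid for the logistic regression step are assumptions made solely for this example.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    rng = np.random.default_rng(2)
    H, W, C = 8, 8, 16
    splice_feat = rng.random((H, W, C))   # splicing feature, same size as the first frame feature

    # first weight (channel dimension): pool over the spatial dimension, then a full connection
    pooled = splice_feat.mean(axis=(0, 1))            # (C,)
    fc = rng.random((C, C))                           # stand-in fully connected weights
    channel_weight = pooled @ fc                      # (C,)

    # second weight (spatial dimension): convolve the multi-channel feature maps,
    # then apply logistic regression (sigmoid) to the convolution result
    kernel = np.ones((3, 3, C)) / (9.0 * C)           # stand-in 3x3 kernel over all channels
    padded = np.pad(splice_feat, ((1, 1), (1, 1), (0, 0)), mode="edge")
    conv = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            conv[i, j] = (padded[i:i + 3, j:j + 3, :] * kernel).sum()
    spatial_weight = sigmoid(conv)                    # (H, W)

    print(channel_weight.shape, spatial_weight.shape)  # (16,) (8, 8)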
Consider claim 19, Kezele teaches processing the first frame feature according to the first weight and the second weight to obtain the update parameter, comprises: multiplying the first frame feature by the first weight and the second weight to obtain the update parameter (At 1004, the RGB CNN 604a processes the RGB spatial attention information 602a and the inter frame to generate spatially weighted inter-frame feature information 332, specifically RGB-mode spatially weighted inter-frame feature information 332a. The above steps are repeated for each other modality (although it will be appreciated that, in some embodiments, each modality is processed in parallel and independently from each other modality). At 1005, the decision information determines whether the MV processing module 232 is to be used to process the MV map 216. If so, the method proceeds to step 1006; if not, to step 1009. At 1006, the MV processing module 232 processes the MV map 216 using a MV spatial attention module 306b to generate spatial salience information 602b. In some embodiments, the MV spatial attention information 602b is combined with MV map 216 to generate a spatially weighted MV map, such as a cropped ROI of the MV map 216. At 1008, the MV CNN 604b processes the MV spatial attention information 602b and the MV map to generate MV-mode spatially weighted inter-frame feature information 332b. At 1009, the decision information determines whether the residual processing module 234 is to be used to process the residual map 216. If so, the method proceeds to step 1010; if not, step 910 ends (and method 900 proceeds to step 912). At 1010, the residual processing module 234 processes the residual map 218 using a residual spatial attention module 306c to generate spatial salience information 602c. In some embodiments, the residual spatial attention information 602c is combined with residual map 218 to generate a spatially weighted residual map, such as a cropped ROI of the residual map 218. At 1012, the residual CNN 604c processes the residual spatial attention information 602c and the residual map to generate residual-mode spatially weighted inter-frame feature information 332c. Returning to FIG. 3A and FIG. 9, at step 912, the inter-frame feature information 332a, 332b, and/or 332c for each selected modality is processed by the inference module 226 to perform the inference task. The inference model 226 typically performs the inference task over many frames of the compressed video 100, such that the inference module 226 performs the inference task by processing a plurality of frame encodings of the compressed video 100, including the inter frame encoding 114 used to generate the inter-frame feature information 332a, 332b, and/or 332c. Thus, the operations of the decision module 224 determine whether the inter frame encoding 114 is included in the plurality of frame encodings processed by the inference module 226, and if so, extracts feature information from one or more selected modalities of the inter frame encoding 114 prior to processing by the inference module 226. [0111] – [0117] and Fig. 3A-3B, Fig. 6).
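For illustration only, the claim-19 step reduces to an elementwise multiplication; the short Python sketch below (shapes assumed, not taken from Kezele or the application) shows the first frame feature multiplied by the channel weight and the spatial weight to yield the update parameter.

    import numpy as np

    rng = np.random.default_rng(3)
    H, W, C = 8, 8, 16
    first_feat = rng.random((H, W, C))
    channel_weight = rng.random(C)            # first weight (per channel)
    spatial_weight = rng.random((H, W))       # second weight (per spatial location)

    # multiply the first frame feature by both weights to obtain the update parameter
    update_param = first_feat * channel_weight * spatial_weight[..., None]
    print(update_param.shape)                 # (8, 8, 16)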
Consider claim 20, Kezele teaches updating the compensation feature according to the update parameter, comprises: pooling the compensation feature in the spatial dimension, and adding a pooling result to the update parameter (FIG. 11A shows operations of an example step 912 of method 900, as implemented by the inference module 226 (226A in this embodiment) shown in FIG. 3A. The inference module 226A includes three modality-specific multi-class or binary classifiers or other inference models, such as deep CNNs including one or more fully connected layers. These are shown as RGB inference model 310, MV inference model 312, and residual inference model 314. At 1102, 1102 the modality-specific feature information of each modality is processed by the respective modality-specific inference model 310, 2312, 314 to generate modality-specific inference information 334a, 334b, 334c of each modality. The modality-specific inference information 334a, 334b, 334c includes inference or prediction information sufficient to complete the inference task, such as logits or a normalized probability distribution across the classes of the inference task. At 1004, the modality-specific inference information 334a, 334b, 334c of all processed modalities is then fused by an inference fusion module 320 to generate inference information 330, such as a single logits or probability distribution across the classes of the inference task, to complete the inference task. In some embodiments, the inference fusion module 320 performs a simple fusion (such as averaging); in other embodiments, the inference fusion module 320 performs a more complex fusion operation, such as multiplicative log probs fusion (i.e. multiplying the logarithms of the probabilities of each modality). FIG. 3B shows an alternative architecture 222B of the adaptive inference software system 222A of FIG. 3A, whose operations are shown in FIG. 11B. In the alternative architecture 222B, the inference module 226B reverses the order of operations of the first example inference module 226A described with reference to FIG. 3A and FIG. 11A above. At 1152, the inputs 332a, 332b, 332c to the inference module 226B are first fused by a feature fusion module 350. At 1154, the fused feature information 356 generated thereby is processed by a single fused inference model 352 to generate the inference information 330, thereby completing the inference task. [0117] – [0119] and Fig. 3A-3B, Fig. 6).
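For illustration only, the claim-20 step can be sketched as follows in Python (shapes and global average pooling assumed, not taken from Kezele or the application): the compensation feature is pooled over the spatial dimension and the pooled result is added to the update parameter.

    import numpy as np

    rng = np.random.default_rng(4)
    H, W, C = 8, 8, 16
    comp_feat = rng.random((H, W, C))      # compensation feature
    update_param = rng.random((H, W, C))   # update parameter from the previous step

    pooled = comp_feat.mean(axis=(0, 1))   # pool the compensation feature in the spatial dimension
    updated = update_param + pooled        # add the pooling result to the update parameter
    print(updated.shape)                   # (8, 8, 16)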
Consider claim 21, Kezele teaches the motion compensation information comprises a motion vector and a residual (Fig. 1, Fig. 3A – Fig. 3B); and updating the compensation feature according to the first frame feature to obtain the second frame feature of the at least one predictive-frame (The second inter frame at t=2 is similarly decoded by transforming the reconstructed first inter frame at t=1 using the motion information 124 and residual information 126 decoded from the second inter frame encoding 116. [0060]. Decoding the residual information from the second inter frame encoding is considered to be updating the compensation feature. Because the decoding of the residual information from the second inter frame encoding is performed after the reconstruction of the inter frame at t=1, and the inter frame at t=1 is reconstructed by transforming the image at t=0, the updating is according to the first frame feature. The inter frame at t=2 is considered to be the second frame feature.), comprises: updating an initial vector feature of the motion vector and an initial residual feature of the residual respectively according to the first frame feature to obtain a target vector feature and a target residual feature ([0060], for the same reasons set forth above); and determining the second frame feature of the at least one predictive-frame according to the target vector feature and the target residual feature ([0060], for the same reasons set forth above).
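For illustration only, the two-branch structure of claim 21 can be sketched as follows in Python. This is not code from Kezele or from the application; the additive updates and the concatenation used to form the second frame feature are assumptions made solely for this example.

    import numpy as np

    rng = np.random.default_rng(5)
    H, W, C = 8, 8, 16
    first_feat = rng.random((H, W, C))     # first frame feature of the intra coding frame
    mv_feat = rng.random((H, W, C))        # initial vector feature of the motion vector
    res_feat = rng.random((H, W, C))       # initial residual feature of the residual

    target_mv_feat = mv_feat + first_feat    # update the vector feature according to the first frame feature
    target_res_feat = res_feat + first_feat  # update the residual feature according to the first frame feature

    # determine the second frame feature of the predictive frame from both target features
    second_feat = np.concatenate([target_mv_feat, target_res_feat], axis=-1)
    print(second_feat.shape)                 # (8, 8, 32)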
Consider claim 22, Kezele teaches the motion compensation information of the at least one predictive-frame relative to the intra coding frame is determined based on following steps: respectively taking the at least one predictive-frame as a starting point to circularly determine a reference frame of a current frame forwards according to the time sequence (FIG. 1 shows the structure of an example compressed video 100. The compressed video 100 includes a plurality of frame encodings (shown as frame encodings 112, 114, 116, . . . 118, 120, . . . ) representative of a temporal sequence of frames, beginning with a first I-frame encoding 112 representative of a first I-frame at t=0, followed by an immediately subsequent first inter frame encoding 114 at t=1, followed by an immediately subsequent second inter frame encoding 116 at t=2, optionally followed by one or more additional inter frame encodings, followed by a second I-frame encoding 118 representative of a second I-frame at t=M, followed by an immediately subsequent further inter frame encoding 120 at t=M+1, optionally followed by one or more additional frame encodings. The plurality of frame encodings are segmented into one or more groups of pictures (GOPs), each of which may encompass a fixed or variable number of frame encodings, such as positive integer K number of frame encodings in GOP 1 102 shown in FIG. 1. The first GOP, GOP 1 102, includes the first I-frame encoding 112 and multiple (i.e., K−1) subsequent inter frame encodings (including first inter frame encoding 114 and second inter frame encoding 116) representative of inter frames subsequent to the first I-frame in the temporal sequence, and a second GOP, GOP 2 104, includes the second I-frame encoding 118 and multiple subsequent inter frame encodings (including further inter frame encoding 120) representative of inter frames subsequent to the second I-frame in the temporal sequence. As described above, each I-frame encoding 112, 118 includes image data 122 representative of a frame, and each inter frame encoding 114, 116, 120 includes motion information 124 and residual information 126 of the respective inter frame relative to one or more reference frames in the temporal sequence, which are used to generate the corresponding inter frame in combination with the one or more reference frames. In the present disclosure, the term “temporal information” may be used to refer to either or both of the motion information and/or residual information of a frame. In some examples, the motion information 124 of an inter frame encoding (such as 114, 116, or 120) includes a motion vector (MV) map of the corresponding frame relative to a reference frame, and the residual information 126 of the inter frame encoding includes a residual map of the corresponding frame relative to the reference frame. For example, the motion information 124 and residual information 126 of the first inter frame encoding 114 may include a motion vector (MV) map and a residual map used to define or generate the first inter frame relative to the first I-frame of the first I-frame encoding 112. Thus, in decoding the compressed video 100, a decoder may first decode GOP 1 102. The decoder will decode the image data 122 of the first I-frame encoding 112 and use the resulting frame (i.e. an RGB image) as the video frame at t=0. 
The decoder will then decode or generate the first inter frame at t=1 by decoding the motion information 124 and residual information 126 from the first inter frame encoding 114, then applying video decompression techniques to reconstruct the inter frame at t=1 by transforming the image at t=0 using the motion information 124 and residual information 126. The second inter frame at t=2 is similarly decoded by transforming the reconstructed first inter frame at t=1 using the motion information 124 and residual information 126 decoded from the second inter frame encoding 116. When a new GOP is encountered in the compressed video 100, such as GOP 2 104, the decoder begins the process again. The first frame encoding of the GOP is an I-frame encoding, such as second I-frame encoding 118 of GOP 2 104, and is decoded in the same manner as the first I-frame encoding 112, resulting in generation or decoding of a frame at t=K. Subsequent inter frames of the new GOP are decoded based on their respective previously decoded reference frames. [0059] – [0061]), and taking the reference frame as a new current frame until the reference frame is the intra coding frame ([0059] – [0061]); accumulating motion compensation information between the current frame and the reference frame in a circulation process ( [0059] – [0061]); and obtaining, when the circulation process is stopped, the motion compensation information of the at least one predictive-frame relative to the intra coding frame ([0059] – [0061]).
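For illustration only, the claim-22 accumulation loop can be sketched as follows in Python. This is not code from Kezele or from the application; it assumes each predictive frame references the immediately preceding frame and that motion compensation information can be accumulated by simple summation, whereas a real codec would compose warped motion fields and residuals.

    import numpy as np

    def accumulate_to_i_frame(frames, start_index):
        # walk from a predictive frame back through its reference frames, accumulating
        # the motion compensation information, until the intra coding frame is reached
        h, w = frames[start_index]["motion"].shape[:2]
        acc_motion = np.zeros((h, w, 2))
        acc_residual = np.zeros((h, w, 3))
        current = start_index
        while frames[current]["type"] != "I":
            acc_motion += frames[current]["motion"]      # info between current frame and its reference
            acc_residual += frames[current]["residual"]
            current -= 1                                 # the reference frame becomes the new current frame
        return acc_motion, acc_residual                  # relative to the intra coding frame

    rng = np.random.default_rng(6)
    frames = [{"type": "I"}] + [
        {"type": "P", "motion": rng.random((8, 8, 2)), "residual": rng.random((8, 8, 3))}
        for _ in range(3)
    ]
    motion, residual = accumulate_to_i_frame(frames, start_index=3)
    print(motion.shape, residual.shape)                  # (8, 8, 2) (8, 8, 3)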
Consider claim 1, claim 1 recites the method implemented by the device recited in claim 14. Thus, it is rejected for the same reasons.
Consider claim 2, claim 2 recites the method implemented by the device recited in claim 17. Thus, it is rejected for the same reasons.
Consider claim 3, claim 3 recites the method implemented by the device recited in claim 18. Thus, it is rejected for the same reasons.
Consider claim 4, claim 4 recites the method implemented by the device recited in claim 19. Thus, it is rejected for the same reasons.
Consider claim 5, claim 5 recites the method implemented by the device recited in claim 20. Thus, it is rejected for the same reasons.
Consider claim 6, claim 6 recites the method implemented by the device recited in claim 21. Thus, it is rejected for the same reasons.
Consider claim 7, claim 7 recites the method implemented by the device recited in claim 22. Thus, it is rejected for the same reasons.
Consider claim 15, claim 15 recites a storage medium comprising computer-executable instructions (The device may include non-transitory memories that may store instructions for execution by the processor to carry out examples described in the present disclosure. [0070].), wherein the computer-executable instructions, when executed by a computer processor (The device 200 may include one or more processor devices, such as a processor. [0067]), are used to perform the feature extraction method for the video according to claim 1 (see the rejections of claims 1 and 14).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TAT CHI CHIO whose telephone number is (571)272-9563. The examiner can normally be reached Monday-Thursday 10am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, JAMIE J ATALA can be reached at 571-272-7384. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/TAT C CHIO/Primary Examiner, Art Unit 2486