DETAILED ACTION
Notice of Pre-AIA or AIA Status
This is in response to application No. 18/747,587, filed on June 19, 2024. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claims 9 and 12 are objected to due to the following informalities:
Claim 9 erroneously uses the word “base”; it should instead be “based”.
In claim 12 (line 3), the word “aving” contains a typographical error. Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-5, 7-10 and 12 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Claim 1 recites “A method for encoding features into a video frame, the video frame being partitionable into a plurality of subpictures...” The use of the term “into” creates ambiguity as to how the claim should be interpreted because “encoding features into a video frame” has multiple reasonable meanings. For instance, it could be interpreted as converting the features, by encoding, into a video frame, or as inserting the features into the frame, thereby rendering the claim indefinite.
In claim 1, the use of the term “partitionable” creates ambiguity as to whether partitioning is required or merely optional, since the claim does not indicate whether actual partitioning occurs, thereby rendering the claim indefinite.
Claim 2, which depends from claim 1, recites “wherein features are extracted as an output of each layer.” It is unclear whether the claimed element “features” is the same as the “plurality of features” recited in claim 1 or is a different claim element. If it is the same element, the claim should refer back to the previously introduced element (e.g., “the plurality of features are extracted…”); if it is a new element, a different term should be used. This ambiguity renders the claim indefinite.
Claims 7 and 12 recite “at least one frame partitioned with a plurality of subpictures”. The use of the phrase “partitioned with” makes the claim ambiguous, since it is unclear whether the frame is partitioned into a plurality of subpictures or the phrase should be understood differently, thereby rendering the claim indefinite.
Dependent claims 3-5 and 8-10 are rejected based on their dependency from rejected claims 1 and 7.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-5, 7-10 and 12 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Ikonin et al. (US 20230353764 A1).
Regarding claim 1, Ikonin teaches the limitations of claim 1 as follows:
A method for encoding features into a video frame, the video frame being partitionable into a plurality of subpictures, comprising (See Figs. 1, 8-9, 10-10B, 11, ¶0101, 0172-0174, 0199-0201: the video encoder 20 as shown in FIG. 7A may be further configured to partition and/or encode the picture using slices (also referred to as video slices), wherein a picture may be partitioned into or encoded using one or more slices (typically non-overlapping), and each slice may comprise one or more blocks (e.g. CTUs)): processing an image to extract a plurality of features (¶0207-0209: FIG. 11 illustrates an exemplary implementation, in which the feature map 1110 is a dense optical flow of motion vectors with a width W and a height H...the output (L1-L3) of each layer is a feature map with a gradually lower resolution. The input to L1 is the dense optical flow 1110…Each square in the L1 output (bottom right of FIG. 11) corresponds to a motion vector obtained by downsampling (downspl4) from the sixteen motion vectors of the dense optical flow...Then the output L1 of the first layer is input to the second layer (downspl2). An output L2 feature map element of the second layer is determined from four elements of L1…); representing each of the image features as a two-dimensional feature unit (Figs. 10A-10B, 11: 2x2, 4x4 arrays corresponding to the feature maps); grouping the feature units into at least one subpicture of the frame (¶0201: each (neighboring 2×2 square) four elements of an array 1010 are grouped and used to determine one element in array 1020. ¶0209, 0211-0212: each element of feature map with a lower resolution may also be determined by a group consisting of any other number of elements of the feature map with the next higher resolution); and encoding the video frame into a bitstream (¶0172-0175, 0224-0225, 0231: the video encoder 20 as shown in FIG. 7A may be configured to encode the picture 17 block by block, e.g. 
the encoding and prediction is performed per block 203…The video decoder 30 is configured to receive encoded picture data 21 (e.g. encoded bitstream 21), e.g. encoded by encoder 20, to obtain a decoded picture 331. ¶0210-0211, 0224: Groups of elements may be arranged in a square shape as in the example of FIG. 11…This shape may be signaled within the bitstream 1150).
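The layer-by-layer downsampling that Ikonin describes in ¶0201 and ¶0207-0209 (each element of a lower-resolution feature map determined from a group of, e.g., four elements of the next-higher-resolution map) can be sketched as follows. This is an illustrative example only, not code from the record; Ikonin leaves the reduction operation open, so the use of averaging here is an assumption.

```python
# Illustrative sketch (assumption: averaging as the reduction operation):
# each non-overlapping 2x2 group of elements in a feature map determines
# one element of the next-lower-resolution feature map, as in Ikonin's
# cascaded layers L1 -> L2 -> L3 (cf. ¶0201, ¶0209).

def downsample_2x2(feature_map):
    """Reduce an H x W feature map (H and W even) to an H/2 x W/2 map."""
    h, w = len(feature_map), len(feature_map[0])
    out = []
    for r in range(0, h, 2):
        row = []
        for c in range(0, w, 2):
            # Four elements of the higher-resolution map form one group.
            group_sum = (feature_map[r][c] + feature_map[r][c + 1] +
                         feature_map[r + 1][c] + feature_map[r + 1][c + 1])
            row.append(group_sum / 4.0)
        out.append(row)
    return out

# A 4x4 "L1" map downsamples to a 2x2 "L2" map, and again to a 1x1 "L3" map.
l1 = [[1, 1, 2, 2],
      [1, 1, 2, 2],
      [3, 3, 4, 4],
      [3, 3, 4, 4]]
l2 = downsample_2x2(l1)   # -> [[1.0, 2.0], [3.0, 4.0]]
l3 = downsample_2x2(l2)   # -> [[2.5]]
```

Applied repeatedly, each pass yields a feature map of gradually lower resolution, mirroring the contracting path shown in FIG. 11.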
Regarding claim 2, Ikonin teaches the method of claim 1, wherein the processing of an image includes a convolutional neural network (CNN) having a plurality of processing layers and wherein features are extracted as an output of each layer (¶0095-0096, 0201, 0205-0209: convolutional operations are used for the downsampling in some or all of the layers. FIG. 11 shows an example for the outputs (L1-L3) of different layers in the contracting path on the right hand side).
Regarding claim 3, Ikonin teaches the method of claim 2, wherein the grouping step includes selecting feature units based on at least one of (1) features representing similar spatial characteristics, (2) features that represent similar object types, (3) features that are extracted using the same filters, (4) features from spatially neighboring regions; (5) features from the same layer of the CNN, and (6) features that relate to a specific task on the decoder side (¶0210-0211, 0213, 0322: Signal selection module 1100 of the motion segmentation net 1140 selects the above mentioned motion vectors (elements of feature maps from the outputs of the first and second layer) and provides them to the bitstream 1150. ¶0225: In the example of signaling motion vectors, groups of similar motion vectors may be signaled by one common motion vector due to the downsampling).
Regarding claim 4, Ikonin teaches the method of claim 1, wherein parameters of the feature units in the at least one subpicture are signaled in the bitstream (¶0210, 0213-0214: Signal selection module 1100 of the motion segmentation net 1140 selects the above mentioned motion vectors (elements of feature maps from the outputs of the first and second layer) and provides them to the bitstream 1150…and instead or in addition to motion vectors, other data may be processed, such as prediction modes, prediction directions, filtering parameters, or even spatial picture information (samples) or depth information or the like).
Regarding claim 5, Ikonin teaches the method of claim 4, wherein the parameters include at least one of: (1) a flag that signals if feature units are present; (2) the number of feature units in the subpicture; (3) the position and dimensions of each feature unit, in sequence; and (4) a feature unit type identifier (¶0222, 0229-0230: Feature map elements (motion vectors) of three of the four parts are signaled, and, correspondingly, the flags are set to 1. The remaining one part of the feature map L2 does not include motion vector(s) and the flag is thus set to 0).
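The flag-based signaling that Ikonin describes in ¶0229-0230 (a flag set to 1 for each part of the feature map whose elements are signaled, and to 0 for parts that are not) can be sketched as a simple round trip. This is an illustrative sketch under assumed data structures, not Ikonin's actual bitstream syntax.

```python
# Illustrative sketch (assumption: list-based "bitstream" representation):
# a flag per feature-map part indicates whether that part's elements are
# present in the bitstream; only flagged parts contribute payload elements
# (cf. Ikonin ¶0229-0230, where three of four parts are signaled).

def serialize(parts):
    """parts: list of element lists, or None for a part that is not signaled."""
    flags = [1 if p is not None else 0 for p in parts]
    payload = [e for p in parts if p is not None for e in p]
    return flags, payload

def deserialize(flags, payload, part_size):
    """Recover the parts, reading part_size elements for each set flag."""
    parts, pos = [], 0
    for f in flags:
        if f:
            parts.append(payload[pos:pos + part_size])
            pos += part_size
        else:
            parts.append(None)  # flag 0: no elements signaled for this part
    return parts

# Three of four parts carry motion vectors; the fourth is flagged 0.
flags, payload = serialize([[5, 6], None, [7, 8], [9, 10]])
# flags == [1, 0, 1, 1]; payload == [5, 6, 7, 8, 9, 10]
restored = deserialize(flags, payload, 2)
```

The round trip restores the original arrangement, with the unset flag marking the part for which no elements were transmitted.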
Regarding claim 7, Ikonin teaches the limitations of claim 7 as follows:
A method for decoding a video signal, the method comprising: receiving an encoded bistream having at least one frame partitioned with a plurality of subpictures (See Figs. 1, 8-9, 10-10B, 11, ¶0173-0176: the video encoder 20 as shown in FIG. 7A may be further configured to partition and/or encode the picture using slices (also referred to as video slices), wherein a picture may be partitioned into or encoded using one or more slices (typically non-overlapping), and each slice may comprise one or more blocks (e.g. CTUs)… The video decoder 30 is configured to receive encoded picture data 21 (e.g. encoded bitstream 21), e.g. encoded by encoder 20, to obtain a decoded picture 331), the subpictures having a plurality of feature units arranged therein (FIGS. 11, 26, ¶0326-0327, 0364-0368: obtaining 3310, from the bitstream, two or more sets of feature map elements, wherein each set of feature map elements relates to a (respective) feature map); identifying at least one subpicture having a plurality of feature units spatially arranged therein (¶0364-0366: FIG. 26 illustrates an exemplary segmentation information for three-layer decoding. The segmentation information may be seen as selecting (cf. the encoder-side description) layers for which feature map elements are to be parsed or otherwise obtained…in FIG. 26, in feature map 2620, out of the four feature map elements that are used to determine the feature map element of feature map 2610, three feature map elements are selected for signaling (indicated by flags 2621, 2622 and 2624) while one feature map element 2623 is not selected); reconstructing a sequence of feature units from spatially arranged feature units in the subpicture (¶0330-0332, 0363-0364: obtaining 3330 said decoded data for picture or video processing as a result of the processing by the plurality of cascaded layers).
Regarding claim 8, Ikonin teaches the method of claim 7, wherein the reconstructing further comprises ordering the feature units based on a predetermined mapping (¶0189, 0329-0330: The method further includes obtaining 3330 said decoded data for picture or video processing as a result of the processing by the plurality of cascaded layers. For example, the first set is a latent feature map element set which is processed by all layers of the network. The second set is an additional set provided to another layer. When referring to FIG. 9, the decoded data 911 is obtained after processing the first set by the three layers 953, 952, and 951 (in this order)).
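The notion of reconstructing a sequence of feature units from their spatial arrangement according to a predetermined mapping can be sketched as follows. The raster-scan mapping used here is an assumption for illustration; the record does not specify the mapping.

```python
# Illustrative sketch (assumption: a raster-scan mapping): a 1-D sequence
# of feature units is arranged into a 2-D grid inside a subpicture and
# later recovered in the same predetermined order.

def pack_raster(units, cols):
    """Arrange a sequence of feature units into a grid, row by row."""
    return [units[i:i + cols] for i in range(0, len(units), cols)]

def unpack_raster(grid):
    """Recover the original sequence by scanning the grid in raster order."""
    return [unit for row in grid for unit in row]

units = ["f0", "f1", "f2", "f3", "f4", "f5"]
grid = pack_raster(units, 3)   # [["f0","f1","f2"], ["f3","f4","f5"]]
restored = unpack_raster(grid)  # round trip restores the ordering
```

Because both sides apply the same predetermined mapping, the decoder recovers the ordering without any per-unit position signaling.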
Regarding claim 9, Ikonin teaches the method of claim 7, wherein the reconstructing further comprises ordering the feature units base on information signalled in the encoded bitstream (¶0211, 0369-0372: Groups of elements may be arranged in a square shape as in the example of FIG. 11. However, the groups may also be arranged in any other shape like, for instance, a rectangular shape wherein the longer sides of the rectangular shape may be arranged in a horizontal or in a vertical direction… This shape may be signaled within the bitstream 1150, too. The signaling may be implemented by a map of flags indicating which feature elements belong to the shape and which do not).
Regarding claim 10, Ikonin teaches the method of claim 7, wherein each subpicture comprising the frame has at least one feature unit (¶0207-0208, 0364-0367: feature map elements; motion vectors).
Regarding claim 12, Ikonin teaches the limitations of claim 12 as follows:
A hybrid video decoder comprising: a demultiplexor, the demultiplexor receiving an encoded bistream having a video substream and a feature substream (¶0165-0167, 0175-0176: The video decoder 30 is configured to receive encoded picture data 21 (e.g. encoded bitstream 21), e.g. encoded by encoder 20, to obtain a decoded picture 331. The encoded picture data or bitstream comprises information for decoding the encoded picture data, e.g. data that represents picture blocks of an encoded video slice (and/or tile groups or tiles) and associated syntax elements. ¶0316, 0320-0323: encoder provides a bitstream which includes for the selected layer feature data and/or segmentation information. Correspondingly, the decoder processes the data received from the bitstream in multiple layers). Note that the recited “demultiplexor” merely receives a bitstream and does not impose a structural limitation; thus, the decoder configured to receive encoded picture data in the Ikonin reference meets the limitation. The feature substream having at least one frame partitioned with a plurality of subpictures (See Figs. 1, 8-9, 10-10B, 11, ¶0173-0176: the video encoder 20 as shown in FIG. 7A may be further configured to partition and/or encode the picture using slices (also referred to as video slices), wherein a picture may be partitioned into or encoded using one or more slices (typically non-overlapping), and each slice may comprise one or more blocks (e.g. CTUs)… The video decoder 30 is configured to receive encoded picture data 21 (e.g. encoded bitstream 21), e.g. encoded by encoder 20, to obtain a decoded picture 331), the subpictures having a plurality of feature units arranged therein (FIGS. 
11, 26, ¶0326-0327, 0364-0368: obtaining 3310, from the bitstream, two or more sets of feature map elements, wherein each set of feature map elements relates to a (respective) feature map); a video decoder receiving the video substream and providing video output for a human viewer (¶0175, 0478, 0480: The video decoder 30 is configured to receive encoded picture data 21 (e.g. encoded bitstream 21), e.g. encoded by encoder 20, to obtain a decoded picture 331. ¶0478-0480: The display device 34 of the destination device 14 is configured to receive the post-processed picture data 33 for displaying the picture, e.g. to a user or viewer); a feature decoder, the feature decoder receiving the feature substream (¶0323-0327: The method comprises obtaining 3310, from the bitstream, two or more sets of feature map elements, wherein each set of feature map elements relates to a (respective) feature map. ¶0361-0362: On the receiving side the decoder of this embodiment performs parsing and interpretation of segmentation information. Accordingly, a method is provided, as illustrated in FIG. 34, for decoding data for picture or video processing from a bitstream), the feature decoder: identifying at least one subpicture having a plurality of feature units spatially arranged therein (¶0364-0366: FIG. 26 illustrates an exemplary segmentation information for three-layer decoding. The segmentation information may be seen as selecting (cf. the encoder-side description) layers for which feature map elements are to be parsed or otherwise obtained…in FIG. 
26, in feature map 2620, out of the four feature map elements that are used to determine the feature map element of feature map 2610, three feature map elements are selected for signaling (indicated by flags 2621, 2622 and 2624) while one feature map element 2623 is not selected); and reconstructing a sequence of feature units from spatially arranged feature units in the subpicture (¶0330-0332, 0363-0364: obtaining 3330 said decoded data for picture or video processing as a result of the processing by the plurality of cascaded layers).
The following prior art, made of record and not relied upon, is considered pertinent to applicant's disclosure:
Chao et al. (US 20200065251 A1) describes “a processing method for a convolutional neural network and a system thereof…a memory-adaptive processing method for a convolutional neural network and a system thereof.” (¶0002)
Misra et al. (US 20220321906 A1) describes “coding multi-dimensional data and more particularly to techniques for performing padding.” (¶0002)
Kim et al. (US 20210377554 A1) describes “image encoding and decoding, and more particularly, to an apparatus and method for performing artificial intelligence (AI) encoding/decoding on an image.” (¶0002)
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NATHNAEL AYNALEM whose telephone number is (571)270-1482. The examiner can normally be reached M-F 9AM-5:30 PM ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, SATH PERUNGAVOOR can be reached at 571-272-7455. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/NATHNAEL AYNALEM/Primary Examiner, Art Unit 2488