DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments filed 11/19/2025 have been fully considered but they are not persuasive.
Applicant argues that Kim does not explicitly teach each of said multiple analyses corresponds to a distinct machine-learning task.
In response, the examiner respectfully disagrees. Kim teaches that the processing of multiple videos/images/pictures according to the embodiments constitutes a multi-task process, multi-task neural network analysis, and multi-task machine learning. The extracted features of FIG. 45 represent feature data related to multi-tasks and to tasks related to each other. The method/device according to the embodiments of FIGS. 1 to 5, 30, 31, and 35 may perform video compression and decompression not only for a single task but also for multi-tasks. [0631].
FIG. 31 illustrates a structure for extracting, encoding, and decoding features according to embodiments. FIG. 31 illustrates a structure in which the methods/devices according to the embodiments of FIGS. 1 to 5 and 30 encode and decode a video/image. Each element in FIG. 31 may correspond to hardware, software, a processor, and/or a combination thereof. Embodiments present a prediction process and a bitstream generation method required to compress an activation (feature) map generated in a hidden layer of a deep neural network. The input data provided to the neural network goes through computation processes of several hidden layers, and the computation results from the respective hidden layers are presented as a feature map with various sizes and channels depending on the type of neural network in use and the locations of the hidden layers within the neural network. FIG. 31 shows a feature extraction network to extract intermediate layer activation (feature) map data of the neural network from images or videos, an encoding apparatus to compress the extracted features, a decoding apparatus to reconstruct the features, and a task network to receive the features as input and perform a task. Images and/or videos, which are the input data according to the embodiments, may be in the form of RGB, YUV, or the like. The feature extractor network of FIG. 31 may be considered a set of successive hidden layers from the input to the neural network. The output data 201 may be defined as feature data extracted by the feature extractor network from an image or video frame. The encoding apparatus of FIGS. 1 to 5 and 30 may compress the feature and output the compressed feature in the form of a bitstream (202). The decoding apparatus of FIGS. 1 to 5 and 30 may reconstruct a feature from the bitstream (203) and transmit the reconstructed feature to a network capable of performing a task.
The methods presented in the present document may be applied to the encoding apparatus and the decoding apparatus of FIG. 31. FIG. 32 illustrates a feature map principal component analysis procedure according to embodiments. FIG. 32 illustrates a process of acquiring, compressing, and reconstructing a feature map in the coding process of the method/device according to the embodiments of FIGS. 1 to 5 and 30. For example, embodiments describe a method for expressing feature data 201 in a lower dimension and reconstructing the feature expressed in the lower dimension back to the original dimension using the technique of principal component analysis (PCA). [0427] – [0440].
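For illustration only, the PCA-based compression and reconstruction described in the cited passages of Kim can be sketched as follows. The function names and shapes here are hypothetical, chosen to mirror the "mean", "principal components", and "projected" factor data named in Kim; they are not Kim's actual implementation.

```python
import numpy as np

def pca_compress(feature_map, k):
    """Express a (C, H, W) feature map in a lower dimension via PCA.

    Returns the factor data analogous to Kim's mean_feature,
    principal_components, and projected values (names are illustrative).
    """
    c, h, w = feature_map.shape
    x = feature_map.reshape(c, h * w).astype(np.float64)
    mean_feature = x.mean(axis=0)                 # "mean" factor
    centered = x - mean_feature
    # principal components from the SVD of the centered data
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    principal_components = vt[:k]                 # (k, H*W)
    projected = centered @ principal_components.T # (C, k) coefficients
    return mean_feature, principal_components, projected

def pca_reconstruct(mean_feature, principal_components, projected, shape):
    """Reconstruct the lower-dimensional expression back to the original."""
    x = projected @ principal_components + mean_feature
    return x.reshape(shape)
```

With a sufficiently large k the reconstruction is lossless; choosing a smaller k reduces the amount of factor data to be coded at the cost of reconstruction fidelity, which is the trade-off the compression exploits.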
Applicant argues that Kim does not explicitly teach said bitstream comprises separate task-specific layers and dependency flags identifying inter-task reference tensors.
In response, the examiner respectfully disagrees. Kim teaches the method/device according to the embodiments of FIGS. 1 to 5, 30, 31, and 35 may acquire and extract a feature map, which is a set of features for video/image input data, in the intervals of Time 0 and Time 1. In addition, in order to use the correlation between the two feature maps, a difference value (residual) between the two feature maps may be generated. Factor data (mean, projected component, etc. as a factor) may be generated from the residual feature map.
FIG. 44 illustrates a configuration of signaling information for factor extraction using temporal redundancy according to embodiments.
The methods/devices according to the embodiments of FIGS. 1 to 5, 30, 31, and 35 may extract a factor using temporal redundancy, and generate and transmit/receive signaling information as shown in FIG. 44.
FIG. 44-(a) shows definitions of signaling information needed to use temporal redundancy. The information may be encoded in different levels (sequence, group, frame) according to the coding unit.
The methods/devices according to the embodiments of FIGS. 1 to 5, 30, 31, and 35 may generate and transmit/receive the following information.
For example, Factor_prediction indicates whether factor extraction and encoding using temporal redundancy is applied.
Factor_prediction_method indicates a method applied to factor extraction and encoding using temporal redundancy. For example, the method includes linear factor models such as PCA and Sparse.
Factor_Reference_List indicates a factor list for using temporal redundancy.
For example, when the feature maps for Time 0, Time 1, Time 2, . . . , and Time N are sequentially encoded, the factor data for Time 0 is coded independently because there is no factor to be referenced. The factor data may be added to and stored in the list. For example, the data of Time 0 is stored in Factor_Reference_List[0].
When encoding Time 1, there is factor data to be referenced. When it is efficient to use the factor data already stored, information signaling that Factor_Reference_List[0] is used may be encoded and indicated. An example is the reference index.
When it is efficient to independently encode factor data, factor data may be independently encoded without using the information of Time 0 and the corresponding information may be added to the list.
Reference_Index specifies information for signaling a factor used in the factor list.
Factor_Prediction_Data_coding specifies information needed for factor decoding.
FIG. 44-(b) shows an example of PCA application. Not only PCA but also various methods such as a linear factor model, sparse coding, and ICA may be used according to embodiments, and the description of FIG. 44 may be applied. FIG. 44-(b) is an example of removing temporal redundancy at the factor level. As shown in the figure, as a factor of Factor_Prediction_Data_coding, the factor used as a reference and the current factor information may be delivered together. The operation of Factor_Prediction_Data_coding depends on Factor_prediction_method.
When FeaturePredMode is, for example, PCA and factor coding and factor prediction are performed at the factor level, Factor_Prediction_Data_coding according to Factor_Reference_List[Reference_Index], mean_feature, principal_components, projected, and the like may be delivered.
Alternatively, in order to increase the coding/compression performance by reducing the amount of additional information to be transmitted/received, the current feature coding information may carry Factor_Prediction_Data_coding according to projected only. In this case, the video reception method/device according to the embodiments may use the information received through the feature coding information of the previous level/unit and the currently received Factor_Prediction_Data_coding (projected) to efficiently perform feature reconstruction. [0607] – [0624].
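The Factor_Reference_List signaling of FIG. 44 described above can be sketched, purely for illustration, as follows. The encoder either codes factor data independently (appending it to the reference list, as for Time 0) or signals a Reference_Index into previously stored factors when reuse is more efficient. All names and the `close_enough` reuse criterion are hypothetical stand-ins, not Kim's syntax.

```python
def encode_factors(factors, close_enough):
    """Illustrative sketch of factor coding with a reference list.

    factors: sequence of factor data for Time 0, Time 1, ...
    close_enough: predicate deciding when reusing a stored factor
                  is more efficient than coding the current one.
    """
    reference_list = []  # plays the role of Factor_Reference_List
    bitstream = []
    for factor in factors:
        ref_idx = next((i for i, ref in enumerate(reference_list)
                        if close_enough(ref, factor)), None)
        if ref_idx is not None:
            # efficient case: signal reuse via a reference index
            bitstream.append(("Reference_Index", ref_idx))
        else:
            # no usable reference (e.g. Time 0): code the factor data
            # independently and add it to the list
            bitstream.append(("Factor_Data", factor))
            reference_list.append(factor)
    return bitstream
```

The sketch shows why temporal redundancy helps: when consecutive feature maps yield similar factors, a small index replaces the full factor payload in the bitstream.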
FIG. 47 illustrates a single feature map and a feature pyramid network according to embodiments. FIG. 47 compares the features used in the single feature map scheme (FIG. 47-(a)) according to the embodiment with the features used in the FPN scheme (FIG. 47-(b)). Video/image/image data 47000 may be composed of one or more layers. A layer may be some data of the video data. The layers may have various sizes. For example, layer data may include more specific data in a process of enlarging a portion of the video data. Since the size and resolution of the video data for each layer may be hierarchical, a pyramid structure may be formed. The single feature map scheme (FIG. 47(a)) according to the embodiments performs feature coding on a feature of a single layer among multiple layers. In the FPN scheme (FIG. 47-(b)) according to the embodiments, a feature for each of several layers among the multiple layers may be feature-coded. Features generated in a network having such a structure have different resolutions (width, height), and therefore independent predictive coding (PCA) may be performed for each layer.
FIG. 48 illustrates feature extraction and calculation based on a feature pyramid network according to embodiments. FIG. 48 illustrates a process in which the method/device according to the embodiments of FIGS. 1 to 5, 30, 31, and 35 extracts features in the FPN scheme of FIG. 47 and performs a PCA calculation. The method/device according to the embodiments may acquire video/image data for multiple layers from input video/image data, and extract feature data from each layer. Various feature values, for example, data p0, p1, and p2 may be extracted. As shown in FIGS. 32, 35, 36, 39, 41, 43, 45, 48, etc., the method/device according to the embodiments may generate prediction data, for example, a mean feature, principal components, a transform coefficient, and the like from each feature. [0641] – [0650].
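Because the FPN layers described above have different resolutions, the prediction data (mean feature, principal components, coefficients) must be derived independently per layer. A minimal illustrative sketch, with hypothetical names not drawn from Kim:

```python
import numpy as np

def per_layer_prediction_data(layers, k=2):
    """Derive PCA-style prediction data independently for each FPN layer.

    layers: dict mapping layer name (e.g. "p0", "p1", "p2") to a
            (C, H, W) feature map; H and W differ per layer.
    """
    prediction = {}
    for name, fmap in layers.items():
        c = fmap.shape[0]
        x = fmap.reshape(c, -1).astype(np.float64)
        mean_feature = x.mean(axis=0)
        # independent predictive coding (PCA) per layer, since the
        # component dimension depends on that layer's resolution
        _, _, vt = np.linalg.svd(x - mean_feature, full_matrices=False)
        components = vt[:k]                 # (k, H*W) for this layer
        coeffs = (x - mean_feature) @ components.T  # (C, k)
        prediction[name] = (mean_feature, components, coeffs)
    return prediction
```

Note that only the component matrices vary in size across layers; the per-channel coefficient shape (C, k) is resolution-independent, which is one reason per-layer PCA remains tractable in a pyramid structure.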
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claim(s) 1-2, 5, 7-8, 13, and 19 is/are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Kim et al. (US 2023/0156212 A1).
Consider claim 2, Kim teaches an apparatus, comprising: a processor ([0069]), configured to: generate a plurality of tensors of feature maps from multiple analyses of at least one image portion ([0439] – [0446]), wherein each of said multiple analyses corresponds to a distinct machine-learning task ([0427] – [0440], [0631]); and encode said plurality of tensors into a bitstream ([0455] – [0465]), wherein said bitstream comprises separate task-specific layers ([0607] – [0624], [0641] – [0650]) and dependency flags identifying inter-task reference tensors ([0607] – [0624], [0641] – [0650]).
Consider claim 1, claim 1 recites the method implemented by the apparatus recited in claim 2. Thus, it is rejected for the same reasons.
Consider claim 5, Kim teaches each tensor of feature maps is input to a different synthesis stage, for performing a given task ([0081], [0087], [0098], [0417], [0421], [0424] – [0426], [0432] – [0437], [0471] – [0472], [0487], [0631], [0635] – [0640], [0658] – [0664]).
Consider claim 7, Kim teaches tensors are compressed using predictive coding ([0455] – [0465], [0638]).
Consider claim 8, Kim teaches predictive coding comprises transmitting encoded residuals between different tensors ([0106], [0111], [0159], [0499] – [0503], [0549] – [0552], [0657]).
Consider claim 19, Kim teaches predictive coding comprises transmitting encoded residuals between different tensors ([0106], [0111], [0159], [0499] – [0503], [0549] – [0552], [0657]).
Consider claim 13, "data content generated by a method, the method comprising…" is a product-by-process claim limitation where the product is the data content and the process is the method steps that generate the bitstream. MPEP §2113 recites "Product-by-Process claims are not limited to the manipulations of the recited steps, only the structure implied by the steps". Thus, the scope of the claim is the computer-readable medium storing the data content (with the structure implied by the method steps). The structure includes the information and samples manipulated by the steps.
"To be given patentable weight, the printed matter and associated product must be in a functional relationship. A functional relationship can be found where the printed matter performs some function with respect to the product to which it is associated". MPEP §2111.05(I)(A). When a claimed computer-readable medium "merely serves as a support" for information or data, no functional relationship exists. MPEP §2111.05(III). The storage medium storing the claimed bitstream in claim 13 merely serves as a support for the storage of the bitstream and provides no functional relationship between the stored bitstream and the storage medium. Therefore the bitstream structure, whose scope is implied by the method steps, is non-functional descriptive material and is given no patentable weight. MPEP §2111.05(III). Thus, the claim scope is just a storage medium storing data and is anticipated by Kim, which recites a storage medium storing a bitstream ([0068], [0075], [0111], [0357], [0413], [0419]).
Alternatively, if it is found that patentable weight should be given to the process recited in claim 13, claim 13 is rejected as follows:
Kim teaches a non-transitory computer readable medium containing data content generated according to claim 1 (see rejection for claim 1), for playback using a processor ([0068], [0075], [0111], [0357], [0413], [0419]).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 10 is/are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US 2023/0156212 A1) in view of Babaheidarian (US 2021/0150769 A1) (hereinafter “Bab”).
Consider claim 10, Kim teaches all the limitations in claim 2 but does not explicitly teach said bitstream comprises multi-view video coding.
Bab teaches said bitstream comprises multi-view video coding ([0049]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the known multi-view video coding standard to generate the bitstream because such incorporation would allow the bitstream to conform to known video coding standards (Bab, [0049]).
Claim(s) 6 and 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US 2023/0156212 A1) in view of Kim et al. (US 2023/0082561 A1) (hereinafter “Kim II”).
Consider claim 6, Kim teaches all the limitations in claim 5 but does not explicitly teach said different synthesis stages are optimized for said given task.
Kim II teaches said different synthesis stages are optimized for said given task ([0080], [0105], [0137] – [0139], [0155] – [0157], [0168], [0171] – [0173], [0186], [0193], [0255] – [0264], [0295], [0371] – [0372]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the known technique of optimizing different synthesis stages for a given task because such incorporation would improve encoding efficiency (Kim II, [0098]).
Consider claim 17, Kim II teaches said different synthesis stages are optimized for said given task ([0080], [0105], [0137] – [0139], [0155] – [0157], [0168], [0171] – [0173], [0186], [0193], [0255] – [0264], [0295], [0371] – [0372]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the known technique of optimizing different synthesis stages for a given task because such incorporation would improve encoding efficiency (Kim II, [0098]).
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TAT CHI CHIO whose telephone number is (571)272-9563. The examiner can normally be reached Monday-Thursday 10am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, JAMIE J ATALA can be reached at 571-272-7384. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/TAT C CHIO/ Primary Examiner, Art Unit 2486