DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
(Submitted 1/2/2026)
- The applicant argues with respect to amended claims 1 and 9 that Matlage and Bhagheri do not teach all required features of the claims.
Examiner’s Response:
The examiner respectfully disagrees. Because the newly applied reference HAASE teaches the amended portion (“wherein the first layer parameter set is used to construct a candidate list for the second layer parameter set, and wherein the second layer parameter set is decoded by identifying a candidate among a plurality of candidates in the candidate list, based on identifier information obtained from the bitstream and corresponding to the information on the at least one neural network access unit”), the applicant’s arguments with respect to claims 1 and 9 have been considered but are MOOT, because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
- Specifically, the applicant argues on Page 6 that reference MATLAGE does not teach constructing a list of layer parameters (as amended) and decoding a layer parameter set with reference to a previous layer.
Examiner’s Response:
The examiner interprets the amended limitation “wherein the information on the at least one neural network access unit includes a layer parameter set representing parameter information of neural network layers, wherein the layer parameter set includes a first layer parameter set and a second layer parameter set to be decoded subsequent to the first layer parameter set” as directed to a similarity measure between the first layer parameter set and the second layer parameter set. The examiner states that this argument is MOOT in view of the newly applied reference Zhuang, which teaches such a similarity measure.
- The applicant argues on Page 7, with respect to the compressed NN model and referring to specification [0094], that the specification specifically provides the context of structural similarity that may exist between the layers constituting the neural network model (for example, various similarities may exist, such as the number of sub-layers included in a layer, the type of sub-layers, the number of neurons constituting a layer, and the similarity of connections).
Examiner’s Response:
Once again, the examiner states that this aspect of the applicant’s argument is MOOT because the newly applied reference Zhuang teaches within the context of a similarity measure.
- The applicant argues on Page 6 with respect to the amended claim that reference MATLAGE does not teach parsing identifier information as required by claim 1.
Examiner’s Response:
First, the examiner observes that the term “parsing” is not recited anywhere in the specification. In view of the new reference HAASE, relied upon to teach the amendment, this argument is MOOT.
- The applicant argues on Page 6 with respect to the amended claim that reference MATLAGE does not teach “decoding a bitstream received in a compressed form” as required by claim 1. The claim recites “wherein the layer parameter set includes a first layer parameter set and a second layer parameter set to be decoded subsequent to the first layer parameter set”.
Examiner’s Response:
First, the examiner does not see that amended claim 1 specifically recites the term “compression.” However, the new reference Zhuang does teach compression as an encoding process (see [0008]) that is subsequently decoded.
In conclusion, the examiner rejects claims 1-9 under 35 U.S.C. 103 and makes this Office action FINAL.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-9 are rejected under 35 U.S.C. 103 as being unpatentable over Stefan MATLAGE et al. (hereinafter MATLAGE), US 2022/0222541 A1 [Foreign Priority: EP 19200928.0, filed 2019-10-01], in view of Zhongfeng Zhuang et al. (hereinafter Zhuang), US 2020/0050941 A1, and further in view of Paul HAASE et al. (hereinafter HAASE), US 2022/0393986 A1 [Foreign Priority: EP 19218862.1, filed 2019-12-20].
In regard to claim 1: (Currently Amended)
MATLAGE discloses:
- A neural network-based signal processing method, the method comprising:
In [0084]:
NNs that feature data-channel specific operations, e.g. a layer of an image-processing NN whose operations can be executed separately per, e.g., colour-channel in a parallel fashion
In [0089]:
as illustrated in FIG. 3 for an exemplary image-processing NN with a clear association between entries, i.e. the weights 32, of the parameter matrices, i.e. the parameter tensor 30, and samples 100.sub.2 and color channels 100.sub.1.
in [0092]:
FIG. 5 shows an example for a single-output-channel convolutional layer, e.g., for a picture and/or video analysing application. Color images have multiple channels, typically one for each color channel, such as red, green, and blue. From a data perspective, that means that a single image provided as input to the model is, in fact, three images.
In [0176]:
when the chosen serialization 100.sub.1 results in sub-layers 240 being image color channel specific and this allowing for data channel-wise parallelization of decoding/inference, this should be indicated in the bitstream 45 to a client
- receiving a bitstream including information on a neural network model, the bitstream including at least one neural network access unit;
In [0015]:
an apparatus for encoding neural network parameters, which represent a neural network, into a data stream, so that the neural network parameters are encoded into the data stream in a manner quantized onto quantization indices, and the neural network parameters are encoded into the data stream so that neural network parameters in different neural network portions of the neural network are quantized differently, wherein the apparatus is configured to provide the data stream indicating, for each of the neural network portions, a reconstruction rule for dequantizing neural network parameters relating to the respective neural network portion.
(BRI: an apparatus is a NN access unit)
In [0016]:
the apparatus is configured to decode from the data stream, for each of the neural network portions, a reconstruction rule for dequantizing neural network parameters relating to the respective neural network portion.
In [0089]:
In a further embodiment, as shown in FIG. 4, the bitstream, i.e. the data stream 45, specifies the order 104 in which the encoder 40 traversed the NN parameters 32, e.g., layers, neurons, tensors, while encoding so that the decoder 50 can reconstruct the NN parameters 32 accordingly while decoding,
In [0141]:
the data stream 45 is structured into individually accessible portions 200, each portion 200 representing a corresponding NN portion, e.g. one or more NN layer or portions of a NN layer, of the neural network,
In [0138]:
Accessing subsets of bitstreams is vital in many applications, e.g. to parallelize the layer processing, or package the bitstream into respective container formats. One way in the state-of-the-art for allowing such access, for instance, is breaking coding dependencies after the parameter tensors 30 of each layer 210 and inserting start codes into the model bitstream, i.e. data stream 45, before each of the layer bitstreams
In [0070]:
It was found, that in the current activities of coded representations of NN such as developed in the ongoing MPEG activity on NN compression, it can be beneficial to separate a model bitstream representing parameter tensors of multiple layers into smaller sub-bitstreams that contain the coded representation of the parameter tensors of individual layers, i.e. layer bitstreams,
In [0071]:
various examples are described which may assist in achieving an effective compression of a neural network, NN, and/or in improving an access to data representing the NN and thus resulting in an effective transmission/updating of the NN.
In [0090] :
encoding parameters along different dimensions may benefit the resulting compression performance since the entropy coder may be able to better capture dependencies among them
- obtaining information on the at least one neural network access unit from the bitstream;
In [0016]:
an apparatus for decoding neural network parameters, which represent a neural network, from a data stream, wherein the neural network parameters are encoded into the data stream in a manner quantized onto quantization indices, and the neural network parameters are encoded into the data stream so that neural network parameters in different neural network portions of the neural network are quantized differently, wherein the apparatus is configured to decode from the data stream, for each of the neural network portions,
(BRI: “obtaining information on the at least one neural network access unit from the bitstream” is a step in the decoding process.)
- and reconstructing the neural network model based on the information on the at least one neural network access unit.
In [0020]:
the NN parameters can be encoded using individual coding orders dependent on the application scenario of the NN and the decoder can reconstruct the NN parameters accordingly while decoding, because of the information provided by the serialization parameter. The NN parameters might represent entries of one or more parameter matrices or tensors, wherein the parameter matrices or tensors might be used for inference procedures. It was found that the one or more parameter matrices or tensors of the NN can be efficiently reconstructed by a decoder based on decoded NN parameters and the serialization parameter.
In [0016]:
the neural network are quantized differently, wherein the apparatus is configured to decode from the data stream, for each of the neural network portions, a reconstruction rule for dequantizing neural network parameters relating to the respective neural network portion.
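For illustration only, a minimal sketch (not MATLAGE's actual syntax; the names reconstruct_tensors, layer_shapes, and serialization_order are hypothetical) of how a decoder might place serially decoded NN parameters back into per-layer tensors using the traversal order signalled in the data stream, consistent with the reconstruction described in [0020] and [0089]:

```python
# Illustrative sketch only (not MATLAGE's syntax): a decoder placing serially
# decoded NN parameters back into per-layer tensors according to a traversal
# order signalled in the data stream, in the spirit of MATLAGE [0020]/[0089].
# `layer_shapes` and `serialization_order` are hypothetical names.
import numpy as np

def reconstruct_tensors(decoded_values, layer_shapes, serialization_order):
    """Rebuild per-layer parameter tensors from a flat array of decoded values.

    decoded_values: 1-D array of NN parameters in decoding order.
    layer_shapes: mapping layer name -> tensor shape.
    serialization_order: layer names in the traversal order signalled in the stream.
    """
    tensors = {}
    offset = 0
    for layer in serialization_order:
        shape = layer_shapes[layer]
        count = int(np.prod(shape))
        tensors[layer] = decoded_values[offset:offset + count].reshape(shape)
        offset += count
    return tensors

# Example: two layers traversed in the signalled order "conv1" then "fc1".
values = np.arange(10, dtype=np.float32)
model = reconstruct_tensors(values, {"conv1": (2, 3), "fc1": (4,)}, ["conv1", "fc1"])
```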
MATLAGE does not explicitly disclose:
- wherein the information on the at least one neural network access unit includes a layer parameter set representing parameter information of neural network layers, wherein the layer parameter set includes a first layer parameter set and a second layer parameter set to be decoded subsequent to the first layer parameter set,
However, Zhuang discloses:
- wherein the information on the at least one neural network access unit includes a layer parameter set representing parameter information of neural network layers, wherein the layer parameter set includes a first layer parameter set and a second layer parameter set to be decoded subsequent to the first layer parameter set,
In [0005]:
A closely-related problem is distance metric learning. It is often desirable that the feature representation of observed data has the property that similar observations have similar features, i.e., that such observations are clustered in the feature space while the representations of dissimilar observations are more distantly separated. In distance metric learning, the goal is therefore to learn a suitable distance metric based on a set of similar/dissimilar pairs of instances.
(Note: Paragraph [0005] is cited for its relevance to the invention because it provides details related to the “distance”; [0005] is not relied upon as disclosing the invention itself, nor as providing the motivation for the 103 rejection.)
In [0019]:
It is an objective to learn an embedding whereby the pairs of fixed-length feature representations of corresponding samples of attributed sequence data have a smaller distance under the distance metric when labeled as similar
In [Abstract]:
Machine learning systems and methods for embedding attributed sequence data. The attributed sequence data includes an attribute data part having a fixed number of attribute data elements and a sequence data part having a variable number of sequence data elements.
(BRI: a sequence of data is a “bitstream”.)
In [0008]:
The system includes an attribute network module comprising a feedforward neural network configured to convert the attribute data part to an encoded attribute vector having a first predetermined number of attribute features, and a sequence network module comprising a recurrent neural network configured to convert the sequence data part to an encoded sequence vector having a second predetermined number of sequence features. The attribute network module and the sequence network module may be operatively coupled such that, in use, the machine learning system is configured to learn and output a fixed-length feature representation of input attributed sequence data which encodes dependencies between different attribute data elements in the attribute data part, dependencies between different sequence data elements in the sequence data part, and dependencies between attribute data elements and sequence data elements within the attributed sequence data.
In [0051] :
The term ‘attributed sequence’ is used throughout this specification to refer to any data sample, such as the e-commerce interaction data 202, 204, which comprises associated attribute and sequence records. More particularly, an attributed sequence
J.sub.k comprising a fixed-length attribute vector x.sub.k and a variable-length sequence S.sub.k may be denoted J.sub.k = (x.sub.k, S.sub.k).
(BRI: As may be known to a POSITA, encoding a sequence can represent compression of the sequence by assigning variable-length codes to input characters, effectively reducing the total number of bits used to represent the sequence.)
In [0040]:
referring to a range of possible implementations of devices, apparatus and systems comprising a combination of hardware and software. This includes single-processor and multi-processor devices and apparatus, including portable devices, desktop computers, and various types of server systems, including cooperating hardware and software platforms that may be co-located or distributed. Physical processors may include general purpose CPUs, digital signal processors, graphics processing units (GPUs), field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), and/or other hardware devices suitable for efficient execution of required programs and algorithms.
In [0073]:
The learning processes are composed of a number of iterations, and the parameters are updated during each iteration based on the gradient computed. L.sup.τ.sub.A and L.sup.τ.sub.S denote the τ-th iteration of attribute network and sequence network, respectively.
In [0057]:
FIG. 5 is a schematic illustration of an attribute network 500 having a fixed number u of input attributes 502 comprising an input x.sub.k, an input layer 504, and a plurality of further layers, e.g., 506, 508. In particular, an attribute network 500 may comprise M layers, with d.sub.m hidden units and corresponding output V.sub.k.sup.(m) in the m-th layer (m=1 . . . M). The structure of the attribute network 500 may then be represented as:
[Equation (1): attribute network structure, reproduced in the reference as image media_image1.png]
In [0058]:
In Equation (1) δ is a nonlinear activation function, e.g., sigmoid, ReLU or tan h, W.sub.A.sup.(m) is a matrix of weight parameters, and b.sub.A.sup.(m) is a vector of bias parameters. In the case of a system configured to learn feature representations of attributed sequences in an unsupervised manner, i.e., in the absence of any labeled data identifying similar and/or dissimilar attributed sequences, it is convenient to define an alternative network size parameter M′ such that M=2M′, and to define the structure of the attribute network 500 as:
[Equation image media_image2.png: attribute network structure for the unsupervised case]
In [0014]:
In a related embodiment of the training method, the multilayer feedforward neural network comprises an encoder having an encoder input layer which comprises the attribute data input layer and an encoder output layer which comprises the attribute vector output layer. The encoder further comprises a decoder having a decoder input layer coupled to the encoder output layer, and a decoder output layer which comprises a reconstructed estimate of an input to the encoder input layer. The first objective function may comprise a distance measure between the input to the encoder input layer and the reconstructed estimate. Training the multilayer feedforward neural network may then comprise iteratively performing steps of forward- and back-propagation with the attribute data part of the attributed sequence as input to the encoder input layer until the distance measure satisfies a first convergence target. The second objective function may comprise a likelihood measure of incorrect prediction of a next sequence item at each one of a plurality of training time steps of the LSTM network. Training the LSTM network may comprise iteratively repeating the plurality of training time steps until the likelihood measure satisfies a second convergence target.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine MATLAGE and Zhuang.
MATLAGE teaches a signal processing method, compression, and reconstruction.
Zhuang teaches similarity between encoder and decoder layers.
One of ordinary skill would have been motivated to combine MATLAGE and Zhuang to effectively measure the similarity between sequences (Zhuang [0093]).
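For illustration only, a minimal sketch of the M-layer attribute network structure described in Zhuang [0057]-[0058], assuming the standard feedforward recursion in which each layer applies a nonlinear activation δ to an affine map W.sub.A.sup.(m)·V.sub.k.sup.(m−1)+b.sub.A.sup.(m); the function names and layer sizes are hypothetical:

```python
# Illustrative sketch of an M-layer attribute network per Zhuang [0057]-[0058];
# assumes the standard recursion V_k^(1) = delta(W_A^(1) x_k + b_A^(1)) and
# V_k^(m) = delta(W_A^(m) V_k^(m-1) + b_A^(m)). Names and sizes are hypothetical.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attribute_network(x_k, weights, biases, delta=sigmoid):
    """Forward pass: weights[m-1], biases[m-1] play the role of W_A^(m), b_A^(m)."""
    v = x_k
    for W, b in zip(weights, biases):
        v = delta(W @ v + b)  # nonlinear activation of an affine map, cf. Equation (1)
    return v

# Example: u = 4 input attributes, two layers with 3 and 2 hidden units.
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((3, 4)), rng.standard_normal((2, 3))]
bs = [np.zeros(3), np.zeros(2)]
feature = attribute_network(rng.standard_normal(4), Ws, bs)
```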
MATLAGE and Zhuang do not explicitly disclose:
- wherein the first layer parameter set is used to construct a candidate list for the second layer parameter set, and wherein the second layer parameter set is decoded by identifying a candidate among a plurality of candidates in the candidate list, based on identifier information obtained from the bitstream and corresponding to the information on the at least one neural network access unit.
However, HAASE discloses:
- wherein the first layer parameter set is used to construct a candidate list for the second layer parameter set, and wherein the second layer parameter set is decoded by identifying a candidate among a plurality of candidates in the candidate list, based on identifier information obtained from the bitstream and corresponding to the information on the at least one neural network access unit.
In [0010]:
Another embodiment may have an apparatus for reconstructing neural network parameters, which define a neural network, configured to derive first neural network parameters for a first reconstruction layer to yield, per neural network parameter, a first-reconstruction-layer neural network parameter value, decode second neural network parameters for a second reconstruction layer from a data stream to yield, per neural network parameter, a second-reconstruction-layer neural network parameter value, and reconstruct the neural network parameters by, for each neural network parameter, combining the first-reconstruction-layer neural network parameter value and the second-reconstruction-layer neural network parameter value.
In [0290] :
the syntax for transmitting the quantization indexes of a layer includes a bin that specifies whether the quantization index is greater than zero or lower than zero, e.g. the beforementioned sign_flag. In other words, the bin indicates the sign of the quantization index. The selection of the probability model used depends on the quantization set (i.e., the set of reconstruction levels) that applies to the corresponding quantization index.
In [0293]:
In embodiments, the described selection of probability models is combined with one or more of the following entropy coding aspects:
In [0294]:
The absolute values of the quantization indexes are transmitted using a binarization scheme that consists of a number of bins that are coded using adaptive probability models
In [0295]:
The probability model (as referred to a context) used for coding this bin is selected among a set of candidate probability models. The selected candidate probability model is not only determined by the quantization set (set of admissible reconstruction levels) or state variable for the current quantization index 56, but, in addition, it is also determined by already transmitted quantization indexes for the layer. In an embodiment, the quantization set (or state variable) determines a subset (also called context set) of the available probability models and the values of already coded quantization indexes determine the used probability model inside this subset (context set)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine MATLAGE, Zhuang, and HAASE.
MATLAGE teaches a signal processing method, compression, and reconstruction.
Zhuang teaches similarity between encoder and decoder layers.
HAASE teaches constructing a candidate list for the second layer parameter set and selecting a candidate among a plurality of candidates in the candidate list.
One of ordinary skill would have been motivated to combine MATLAGE, Zhuang, and HAASE to provide an improved NN codec (HAASE [0189]).
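For illustration only, a minimal sketch of the claimed candidate-list mechanism as the examiner maps it onto HAASE: a candidate list is built from the already-decoded first layer parameter set, and the second layer parameter set is decoded by selecting one candidate via identifier information parsed from the bitstream and combining it with layer-specific signalling (cf. HAASE [0010]). All names and fields below are hypothetical and are not taken from HAASE's syntax:

```python
# Illustrative sketch only; the candidate derivation, field names and the
# residual update are hypothetical and not HAASE's actual syntax.
from typing import Dict, List

def build_candidate_list(first_lps: Dict[str, float]) -> List[Dict[str, float]]:
    # Candidates derived from the first layer parameter set, e.g. reuse it
    # unchanged or reuse it with a modified quantization step.
    return [
        dict(first_lps),                                       # candidate 0: copy
        {**first_lps, "qp_step": first_lps["qp_step"] * 0.5},  # candidate 1: finer step
    ]

def decode_second_lps(first_lps, identifier, residual):
    candidates = build_candidate_list(first_lps)
    base = candidates[identifier]  # identifier information obtained from the bitstream
    # Combine the selected candidate with layer-specific residual values,
    # analogous to combining first- and second-reconstruction-layer values (HAASE [0010]).
    return {k: base.get(k, 0.0) + residual.get(k, 0.0) for k in {*base, *residual}}

first = {"qp_step": 0.25, "bias_offset": 0.0}
second = decode_second_lps(first, identifier=1, residual={"bias_offset": 0.1})
```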
In regard to claim 2: (Original)
MATLAGE discloses:
- wherein the at least one neural network access unit includes a plurality of neural network layers.
In [0019]:
The neuron interconnections might represent connections between neurons of different NN layers of the NN. In other words, a NN parameter might define a connection between a first neuron associated with a first layer of the NN and a second neuron associated with a second layer of the NN. A decoder might use the coding order to assign NN parameters serially decoded from the data stream to the neuron interconnections.
In [0062]:
The neural network model may include one or more nodes or one or more layers to define the relationship between the input value and the output value. In the training process of the neural network model, a relationship between nodes (example: weight) or relationship between layers may be varied
In regard to claim 3: (Currently Amended)
MATLAGE discloses:
- wherein the information on the at least one neural network access unit further includes at least one of model information specifying the neural network model
In [0027]:
it may be advantageous to encode/decode into/from a data stream a type parameter indicting a parameter type of the NN parameters. The type parameter may indicate whether the NN parameters represent weights or bias. The data stream is structured into one or more individually accessible portions, each individually accessible portion representing a corresponding NN layer of the NN.
In regard to claim 4: (Original)
MATLAGE discloses:
- wherein the layer information includes at least one of a number, a type, a location, an identifier, an arrangement order, a priority, whether to skip compression, node information of the neural network layers, or whether there is a dependency between the neural network layers.
In [0029]:
apparatus is configured to encode/decode into/from the data stream, for each of the one or more predetermined individually accessible sub-portions, a start code at which the respective predetermined individually accessible sub-portion begins, and/or a pointer pointing to a beginning of the respective predetermined individually accessible sub-portion, and/or a data stream length parameter indicating a data stream length of the respective predetermined individually accessible sub-portion for skipping the respective predetermined individually accessible sub-portion in parsing the DS.
In regard to claim 5: (Original)
MATLAGE discloses:
- wherein the model parameter set includes at least one of a number of the neural network layers, entry point information specifying a starting position in the bitstream corresponding to the neural network layers, quantization information used for compression of the neural network model, or type information of the neural network layers.
In [0029]:
The start code, the pointer and/or the data stream length parameter enable an efficient access to the predetermined individually accessible sub-portions. This is especially beneficial for applications that may rely on grouping NN parameter within a NN layer in a specific configurable fashion as it can be beneficial to have the NN parameter decoded/processed/inferred partially or in parallel.
In [0029]:
based on the finding, that an amount of data per NN layer, i.e. individually accessible portion, is usually less than in case NN layers are to be detected by start codes within the whole data stream.
In [0031]:
The NN parameters are encoded into the data stream so that NN parameters in different NN portions of the NN are quantized differently, and the data stream indicates, for each of the NN portions, a reconstruction rule for dequantizing NN parameters relating to the respective NN portion.
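For illustration only, a minimal sketch of per-portion dequantization consistent with MATLAGE [0031], in which each NN portion carries its own reconstruction rule (reduced here to a uniform step size); the field names are hypothetical:

```python
# Illustrative sketch: each NN portion (e.g. a layer) has its own signalled
# reconstruction rule, modelled here as a per-portion step size.
import numpy as np

def dequantize_portions(quant_indices, step_sizes):
    """quant_indices: portion -> integer quantization indices.
    step_sizes: portion -> step size signalled as that portion's reconstruction rule."""
    return {portion: np.asarray(indices, dtype=np.float32) * step_sizes[portion]
            for portion, indices in quant_indices.items()}

# Example: two portions quantized differently.
params = dequantize_portions(
    {"layer1": [4, -2, 0], "layer2": [1, 3]},
    {"layer1": 0.05, "layer2": 0.5},
)
```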
In regard to claim 6: (Original)
MATLAGE discloses:
- wherein the entry point information is individually included in the model parameter set according to the number of the neural network layers.
In [0019]:
The serialization parameter indicates a coding order at which NN parameters, which define neuron interconnections of the NN, are encoded into the data stream. The neuron interconnections might represent connections between neurons of different NN layers of the NN. In other words, a NN parameter might define a connection between a first neuron associated with a first layer of the NN and a second neuron associated with a second layer of the NN. A decoder might use the coding order to assign NN parameters serially decoded from the data stream to the neuron interconnections.
In [0027]:
The type parameter may indicate whether the NN parameters represent weights or bias.
In [0027]:
Each individually accessible sub-portion is completely traversed by a coding order before a subsequent individually accessible sub-portion is traversed by the coding order. Into each individually accessible sub-portion,
In regard to claim 7: (Currently Amended)
MATLAGE discloses:
- wherein the layer parameter set includes at least one of a parameter type of a current neural network layer, a number of sub-layers of the current neural network layer, entry point information specifying a starting position in the bitstream corresponding to the sub-layers, quantization information used for compression of the current neural network layer
In [0053] :
FIG. 14a shows a sub-layer access using pointer, according to an embodiment;
In [0054]:
FIG. 14b shows a sub-layer access using start codes, according to an embodiment;
In [0058]:
FIG. 18 shows a determination of a reconstruction rule based on quantization indices representing quantized neural network parameter, according to an embodiment;
In regard to claim 8: (Original)
MATLAGE discloses:
- wherein the compressed neural network layer information includes at least one of weight information, bias information or normalization parameter information.
In [0027]:
Similarly, it may be advantageous to encode/decode into/from a data stream a type parameter indicting a parameter type of the NN parameters. The type parameter may indicate whether the NN parameters represent weights or bias.
In [0302]:
Further variants are depicted in FIG. 23, wherein an advanced version of the NN is created to compensate for a compression impact on the original NN by training in presence of the lossy compressed baseline NN variant. The advanced NN is inferred in parallel to the baseline NN and its NN parameter, e.g., weights, connect to the same neurons as the baseline NN. FIG. 23 shows, for example, a training of an augmentation NN based on a lossy coded baseline NN variant.
In regard to claim 9: (Currently Amended)
MATLAGE discloses:
- A neural network-based signal processing apparatus, the apparatus comprising: a processor controlling the signal processing apparatus;
In [0397];
- and a memory combined with the processor and storing data,
In [0398];
- wherein the processor receives a bitstream including information on a neural network model, wherein the bitstream includes at least one neural network access unit, wherein the processor obtains information on the at least one neural network access unit from the bitstream, [[and]]
In [0138]:
Accessing subsets of bitstreams is vital in many applications, e.g. to parallelize the layer processing, or package the bitstream into respective container formats. One way in the state-of-the-art for allowing such access, for instance, is breaking coding dependencies after the parameter tensors 30 of each layer 210 and inserting start codes into the model bitstream, i.e. data stream 45, before each of the layer bitstreams
In [0070] :
in the current activities of coded representations of NN such as developed in the ongoing MPEG activity on NN compression, it can be beneficial to separate a model bitstream representing parameter tensors of multiple layers into smaller sub-bitstreams that contain the coded representation of the parameter tensors of individual layers, i.e. layer bitstreams
In [0016]:
Yet another embodiment may have an apparatus for decoding neural network parameters, which represent a neural network, from a data stream, wherein the neural network parameters are encoded into the data stream in a manner quantized onto quantization indices, and the neural network parameters are encoded into the data stream so that neural network parameters in different neural network portions of the neural network are quantized differently, wherein the apparatus is configured to decode from the data stream, for each of the neural network portions,
In [0071]:
various examples are described which may assist in achieving an effective compression of a neural network, NN, and/or in improving an access to data representing the NN and thus resulting in an effective transmission/updating of the NN.
In [0090] :
encoding parameters along different dimensions may benefit the resulting compression performance since the entropy coder may be able to better capture dependencies among them
- and wherein the processor reconstructs the neural network model based on the information on the at least one neural network access unit.
In [0020]:
the NN parameters can be encoded using individual coding orders dependent on the application scenario of the NN and the decoder can reconstruct the NN parameters accordingly while decoding, because of the information provided by the serialization parameter. The NN parameters might represent entries of one or more parameter matrices or tensors, wherein the parameter matrices or tensors might be used for inference procedures. It was found that the one or more parameter matrices or tensors of the NN can be efficiently reconstructed by a decoder based on decoded NN parameters and the serialization parameter.
In [0016]:
the neural network are quantized differently, wherein the apparatus is configured to decode from the data stream, for each of the neural network portions, a reconstruction rule for dequantizing neural network parameters relating to the respective neural network portion.
MATLAGE does not explicitly disclose:
- wherein the information on the at least one neural network access unit includes a layer parameter set representing parameter information of neural network layers, wherein the layer parameter set includes a first layer parameter set and a second layer parameter set to be decoded subsequent to the first layer parameter set, wherein the first layer parameter set is used to construct a candidate list for the second layer parameter set,
However, Zhuang discloses:
- wherein the information on the at least one neural network access unit includes a layer parameter set representing parameter information of neural network layers, wherein the layer parameter set includes a first layer parameter set and a second layer parameter set to be decoded subsequent to the first layer parameter set,
In [0005]:
A closely-related problem is distance metric learning. It is often desirable that the feature representation of observed data has the property that similar observations have similar features, i.e., that such observations are clustered in the feature space while the representations of dissimilar observations are more distantly separated. In distance metric learning, the goal is therefore to learn a suitable distance metric based on a set of similar/dissimilar pairs of instances.
(Note: Paragraph [0005] is cited for its relevance to the invention because it provides details related to the “distance”; [0005] is not relied upon as disclosing the invention itself, nor as providing the motivation for the 103 rejection.)
In [0019]:
It is an objective to learn an embedding whereby the pairs of fixed-length feature representations of corresponding samples of attributed sequence data have a smaller distance under the distance metric when labeled as similar
In [Abstract]:
Machine learning systems and methods for embedding attributed sequence data. The attributed sequence data includes an attribute data part having a fixed number of attribute data elements and a sequence data part having a variable number of sequence data elements.
(BRI: a sequence of data is a “bitstream”.)
In [0008]:
The system includes an attribute network module comprising a feedforward neural network configured to convert the attribute data part to an encoded attribute vector having a first predetermined number of attribute features, and a sequence network module comprising a recurrent neural network configured to convert the sequence data part to an encoded sequence vector having a second predetermined number of sequence features. The attribute network module and the sequence network module may be operatively coupled such that, in use, the machine learning system is configured to learn and output a fixed-length feature representation of input attributed sequence data which encodes dependencies between different attribute data elements in the attribute data part, dependencies between different sequence data elements in the sequence data part, and dependencies between attribute data elements and sequence data elements within the attributed sequence data.
In [0051] :
The term ‘attributed sequence’ is used throughout this specification to refer to any data sample, such as the e-commerce interaction data 202, 204, which comprises associated attribute and sequence records. More particularly, an attributed sequence
J.sub.k comprising a fixed-length attribute vector x.sub.k and a variable-length sequence S.sub.k may be denoted J.sub.k = (x.sub.k, S.sub.k).
(BRI: As may be known to a POSITA, encoding a sequence can represent compression of the sequence by assigning variable-length codes to input characters, effectively reducing the total number of bits used to represent the sequence.)
In [0040]:
referring to a range of possible implementations of devices, apparatus and systems comprising a combination of hardware and software. This includes single-processor and multi-processor devices and apparatus, including portable devices, desktop computers, and various types of server systems, including cooperating hardware and software platforms that may be co-located or distributed. Physical processors may include general purpose CPUs, digital signal processors, graphics processing units (GPUs), field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), and/or other hardware devices suitable for efficient execution of required programs and algorithms.
In [0073]:
The learning processes are composed of a number of iterations, and the parameters are updated during each iteration based on the gradient computed. L.sup.τ.sub.A and L.sup.τ.sub.S denote the τ-th iteration of attribute network and sequence network, respectively.
In [0057]:
FIG. 5 is a schematic illustration of an attribute network 500 having a fixed number u of input attributes 502 comprising an input x.sub.k, an input layer 504, and a plurality of further layers, e.g., 506, 508. In particular, an attribute network 500 may comprise M layers, with d.sub.m hidden units and corresponding output V.sub.k.sup.(m) in the m-th layer (m=1 . . . M). The structure of the attribute network 500 may then be represented as:
[Equation (1): attribute network structure, reproduced in the reference as image media_image1.png]
In [0058]:
In Equation (1) δ is a nonlinear activation function, e.g., sigmoid, ReLU or tan h, W.sub.A.sup.(m) is a matrix of weight parameters, and b.sub.A.sup.(m) is a vector of bias parameters. In the case of a system configured to learn feature representations of attributed sequences in an unsupervised manner, i.e., in the absence of any labeled data identifying similar and/or dissimilar attributed sequences, it is convenient to define an alternative network size parameter M′ such that M=2M′, and to define the structure of the attribute network 500 as:
[Equation image media_image2.png: attribute network structure for the unsupervised case]
In [0014]:
In a related embodiment of the training method, the multilayer feedforward neural network comprises an encoder having an encoder input layer which comprises the attribute data input layer and an encoder output layer which comprises the attribute vector output layer. The encoder further comprises a decoder having a decoder input layer coupled to the encoder output layer, and a decoder output layer which comprises a reconstructed estimate of an input to the encoder input layer. The first objective function may comprise a distance measure between the input to the encoder input layer and the reconstructed estimate. Training the multilayer feedforward neural network may then comprise iteratively performing steps of forward- and back-propagation with the attribute data part of the attributed sequence as input to the encoder input layer until the distance measure satisfies a first convergence target. The second objective function may comprise a likelihood measure of incorrect prediction of a next sequence item at each one of a plurality of training time steps of the LSTM network. Training the LSTM network may comprise iteratively repeating the plurality of training time steps until the likelihood measure satisfies a second convergence target.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine MATLAGE and Zhuang.
MATLAGE teaches a signal processing method, compression, and reconstruction.
Zhuang teaches similarity between encoder and decoder layers.
One of ordinary skill would have been motivated to combine MATLAGE and Zhuang to effectively measure the similarity between sequences (Zhuang [0093]).
MATLAGE and Zhuang do not explicitly disclose:
- and wherein the second layer parameter set is decoded by identifying a candidate among a plurality of candidates in the candidate list, based on identifier information obtained from the bitstream and corresponding to the information on the at least one neural network access unit.
However, HAASE discloses:
- and wherein the second layer parameter set is decoded by identifying a candidate among a plurality of candidates in the candidate list, based on identifier information obtained from the bitstream and corresponding to the information on the at least one neural network access unit.
In [0010]:
Another embodiment may have an apparatus for reconstructing neural network parameters, which define a neural network, configured to derive first neural network parameters for a first reconstruction layer to yield, per neural network parameter, a first-reconstruction-layer neural network parameter value, decode second neural network parameters for a second reconstruction layer from a data stream to yield, per neural network parameter, a second-reconstruction-layer neural network parameter value, and reconstruct the neural network parameters by, for each neural network parameter, combining the first-reconstruction-layer neural network parameter value and the second-reconstruction-layer neural network parameter value.
In [0290] :
the syntax for transmitting the quantization indexes of a layer includes a bin that specifies whether the quantization index is greater than zero or lower than zero, e.g. the beforementioned sign_flag. In other words, the bin indicates the sign of the quantization index. The selection of the probability model used depends on the quantization set (i.e., the set of reconstruction levels) that applies to the corresponding quantization index.
In [0293]:
In embodiments, the described selection of probability models is combined with one or more of the following entropy coding aspects:
In [0294]:
The absolute values of the quantization indexes are transmitted using a binarization scheme that consists of a number of bins that are coded using adaptive probability models
In [0295]:
The probability model (as referred to a context) used for coding this bin is selected among a set of candidate probability models. The selected candidate probability model is not only determined by the quantization set (set of admissible reconstruction levels) or state variable for the current quantization index 56, but, in addition, it is also determined by already transmitted quantization indexes for the layer. In an embodiment, the quantization set (or state variable) determines a subset (also called context set) of the available probability models and the values of already coded quantization indexes determine the used probability model inside this subset (context set)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine MATLAGE, Zhuang, and HAASE.
MATLAGE teaches a signal processing method, compression, and reconstruction.
Zhuang teaches similarity between encoder and decoder layers.
HAASE teaches constructing a candidate list for the second layer parameter set and selecting a candidate among a plurality of candidates in the candidate list.
One of ordinary skill would have been motivated to combine MATLAGE, Zhuang, and HAASE to provide an improved NN codec (HAASE [0189]).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the
examiner should be directed to TIRUMALE KRISHNASWAMY RAMESH whose telephone number is (571)272-4605. The examiner can normally be reached by phone.
Examiner interviews are available via telephone, in-person, and video conferencing
using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen, can be reached at (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be
obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit:
https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for
information about filing in DOCX format. For additional questions, contact the Electronic
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO
Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/TIRUMALE K RAMESH/Examiner, Art Unit 2121
/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121