Prosecution Insights
Last updated: April 19, 2026
Application No. 18/547,446

REINFORCEMENT LEARNING FOR SON PARAMETER OPTIMIZATION

Non-Final OA (§102, §103)
Filed: Aug 22, 2023
Examiner: SCHLACK, SCOTT A
Art Unit: 2418
Tech Center: 2400 (Computer Networks)
Assignee: Nokia Solutions and Networks Oy
OA Round: 1 (Non-Final)
Grant Probability: 44% (Moderate)
Expected OA Rounds: 1-2
Time to Grant: 3y 10m
Grant Probability with Interview: 79%

Examiner Intelligence

This examiner grants 44% of resolved cases.

Career Allow Rate: 44% (23 granted / 52 resolved; -13.8% vs TC avg)
Interview Lift: +34.8% (strong; resolved cases with vs. without an interview)
Typical Timeline: 3y 10m avg prosecution; 37 applications currently pending
Career History: 89 total applications across all art units

Statute-Specific Performance

§101: 0.6% (-39.4% vs TC avg)
§103: 65.8% (+25.8% vs TC avg)
§102: 16.7% (-23.3% vs TC avg)
§112: 16.7% (-23.3% vs TC avg)

Tech Center averages are estimates. Based on career data from 52 resolved cases.

Office Action

Rejections: §102, §103
DETAILED ACTION

This Office Action is responsive to the claims filed on 08/22/2023. Claims 141-160 are pending for examination. Claims 1-140 were cancelled by preliminary amendment.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statements

The information disclosure statement (IDS) submitted on 12/27/2023 is determined to be in compliance with the provisions of 37 CFR 1.97. Accordingly, this IDS is being considered by the Examiner.

Claim Interpretation – Alternative Claim Language

The claims of the instant application are given their Broadest Reasonable Interpretation (BRI) using the plain meaning of the claim language in light of the specification, as it would be understood by one of ordinary skill in the art. Accordingly, the BRI of an alternative claim limitation or term can be determined to be the least-limiting interpretation, consistent with the specification. In this context, the term “or” by plain meaning can be interpreted to alternatively be: one or the other (i.e., A or B), but not both (i.e., not A and B). The term “and/or” by plain meaning can be interpreted to be “and,” or alternatively “or,” but not both, as this would not make sense. In this context, the forward slash “/” is equivalent to the alternative “or.” Likewise, the alternative terms “at least one of,” “one or more of,” and the like, followed by multiple alternative claim limitations, can be reasonably interpreted to be only “one of” a group of alternative claim limitations. Prior art disclosing any one of multiple alternative claim limitations discloses matter within the scope of the claimed invention. "When a claim covers several structures or compositions, either generically or as alternatives, the claim is deemed anticipated if any of the structures or compositions within the scope of the claim is known in the prior art." Brown v.
3M, 265 F.3d 1349, 1351, 60 USPQ2d 1375, 1376 (Fed. Cir. 2001) (claim to a system for setting a computer clock to an offset time to address the Year 2000 (Y2K) problem, applicable to records with year date data in "at least one of two-digit, three-digit, or four-digit" representations, was held anticipated by a system that offsets year dates in only two-digit formats). See MPEP 2131.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 (or as subject to pre-AIA 35 U.S.C. 102) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C.
102(a)(2) prior art against the later invention.

Claims 141-144, 149, 154-155, 157-158, and 160 are rejected under 35 U.S.C. 102(a)(2) as being unpatentable in view of US PG Pub. 2022/0394531 A1, Jeong et al. (hereinafter “Jeong”).

With respect to claim 141, Jeong teaches: A method comprising: receiving at least one network performance indicator of a communication network from at least one cell in the network (paras. [0005], [0008], [0102]-[0104], and [0108]-[0119] —KPIs, i.e., live KPIs, can be received from one or more network cells; KPIs 610 of Figs. 6, 10, and 11; and network nodes 542 of network 540); determining a reward for the at least one cell in the network based on the at least one network performance indicator (paras. [0005], [0043], [0076], [0102], [0146], and [0176] —a reward/loss value can be determined for one or more network cells based on KPI values; blocks 1210 of Fig. 12, 1702 of Fig. 17, and 1902 of Fig. 19); and determining whether to modify at least one self-organizing network parameter of the at least one cell in the network to change the at least one network performance indicator or an average value of the reward, based in part on the determined reward (paras. [0005], [0088], [0108]-[0122], and [0149]; 1706 of Fig. 17, and 1904 of Fig. 19 —a configurable parameter, i.e., a RET antenna tilt parameter, of a SON BS optimization solution can be modified/adjusted based on the KPI(s) and a determined reward/loss value —the alternative term “or” only requires examination on-the-merits of a single claimed alternative for the reasons explained above in the Claim Interpretation — Alternative Claim Language section).

With respect to claim 142, Jeong teaches: An apparatus comprising: at least one processor; and at least one non-transitory memory including computer program code (paras. [0007] and [0125]; and network node 542 with processor 1403 and memory 1405 of Fig. 14); wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to: receive at least one network performance indicator of a communication network from at least one cell in the network (paras. [0005], [0008], [0102]-[0104], and [0108]-[0119] —KPIs, i.e., live KPIs, can be received from one or more network cells; KPIs 610 of Figs. 6, 10, and 11; and block 1900 of Fig. 19); determine a reward for the at least one cell in the network based on the at least one network performance indicator (paras. [0005], [0043], [0076], [0102], [0146], and [0176] —a reward/loss value can be determined for one or more cells based on KPI values; blocks 1210 of Fig. 12, 1702 of Fig. 17, and 1902 of Fig. 19); and determine whether to modify at least one self-organizing network parameter of the at least one cell in the network to change the at least one network performance indicator or an average value of the reward, based in part on the determined reward (paras. [0005], [0088], [0108]-[0122], and [0149]; 1706 of Fig. 17, and 1904 of Fig. 19 —a configurable parameter, i.e., a RET antenna tilt parameter, of a SON BS optimization solution can be modified/adjusted based on the KPI(s) and a determined reward/loss value —the alternative term “or” only requires examination on-the-merits of a single claimed alternative for the reasons explained above in the Claim Interpretation — Alternative Claim Language section).

With respect to claim 143, Jeong teaches: The apparatus of claim 142, wherein the at least one self-organizing network parameter is related to at least one of: an antenna tilt of at least one antenna in the network; an electrical antenna tilt of the at least one antenna in the network; a parameter related to a multiple input multiple output antenna; a mobility parameter; a cell individual offset; or a time to trigger (paras. [0088] and [0108]-[0122]; and Fig. 1 —a configurable parameter of a SON BS optimization solution can be associated with a degree of RET antenna tilt —the terms “at least one of” and “or” only require examination on-the-merits of a single claimed alternative for the reasons explained above in the Claim Interpretation — Alternative Claim Language section).

With respect to claim 144, Jeong teaches: The apparatus of claim 143, wherein the at least one antenna is at least one antenna of a base station in the network (paras. [0088] and [0108]-[0122]; and Fig. 1 —the configurable antenna tilt parameter of the SON BS optimization solution can be that of a network BS, as depicted in Fig. 1 —the terms “at least one of” and “or” only require examination on-the-merits of a single claimed alternative for the reasons explained above in the Claim Interpretation — Alternative Claim Language section).

With respect to claim 149, Jeong teaches: The apparatus of claim 142, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: determine a state of the network using at least one selected one of the at least one network performance indicator (paras. [0058], [0061], and [0075]-[0077]; block 707 of Fig. 7, and Figs. 10 and 11 —a network state can be associated with signal RSRP, quality, interference, and/or cell load, cell overlap, etc. state(s) of one or more cells in a network that can be determined by received KPIs 610 —network policy modeling can learn from such input data: (state, action, reward) trajectories).

With respect to claim 157, Jeong teaches: The apparatus of claim 142, wherein the reward is determined with at least one initialized value (paras. [0062] and [0076]; and blocks 250 of Fig. 2 and 705 of Fig. 7 —a reward can be determined with an initial pretrained value —also, a reward can be weighted through iteration having a previous reward value as an initial value, as depicted in Fig. 2).
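[Editor's note] The flow mapped to claims 141 and 142 above (receive KPIs, determine a per-cell reward, then decide whether to modify a SON parameter such as antenna tilt) can be illustrated with a minimal sketch. Every function name, weight, and threshold below is hypothetical; the Gaussian-CDF normalization mirrors the statistical technique the examiner later cites for claim 145, and none of this is code from Jeong.

```python
import math
import statistics

def normalize_kpi(value, samples):
    """Map a raw KPI onto [0, 1] via a Gaussian CDF parameterized by the
    sample mean and sample standard deviation of prior measurements."""
    mu = statistics.mean(samples)
    sigma = statistics.stdev(samples)
    return 0.5 * (1.0 + math.erf((value - mu) / (sigma * math.sqrt(2.0))))

def cell_reward(normalized_kpis, weights):
    """Scalar reward for one cell: a weighted sum of normalized KPIs."""
    return sum(weights[name] * v for name, v in normalized_kpis.items())

def should_adjust_parameter(reward, reward_history, margin=0.05):
    """Hypothetical decision rule: modify the SON parameter only when the
    new reward falls noticeably below the running average reward."""
    return reward < sum(reward_history) / len(reward_history) - margin
```

A KPI equal to its sample mean normalizes to 0.5, so heterogeneous KPIs (throughput, drop rate, RSRP) end up on a common scale before they are combined into a reward.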
With respect to claim 158, Jeong teaches the apparatus of claim 157, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: generate a simulator of the network that approximates at least one self-organizing network parameter of the network (paras. [0058], [0088], [0098], and [0120]-[0121]; and Fig. 18 —a generated policy training model can approximate/estimate a SON RET antenna tilt parameter); and connect the simulator off-line within a closed-loop with a reinforcement learning agent to converge the reinforcement learning agent to the at least one initialized value (paras. [0096]-[0098], [0127], and [0131]-[0132]; and Figs. 2, 10, and 15 —a policy model can be trained using a baseline data set, i.e., block 1500, of initialized values for iterative, closed-loop model training that is offline, i.e., block 1502, as depicted in Figs. 2 and 10 —during policy training, baseline values can be binned offline).

With respect to claim 160, this claim recites similar features to independent claims 141 and 142, except claim 160 is directed to a non-transitory storage device readable by a machine (memory 516 with program code 518 of Fig. 5 or memory 1405 with program code 1413 of Fig. 14). As such, claim 160 is likewise rejected under 35 U.S.C. 102(a)(2) based on Jeong, for the same reasons explained above for independent claims 141 and 142.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 145 is rejected under 35 U.S.C. 103 as being unpatentable over Jeong in view of US PG Pub. 2020/0076520 A1, Jana et al. (hereinafter “Jana”).

With respect to claim 145, Jeong teaches the apparatus of claim 142, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to evaluate at least one performance indicator prior to determining a loss/reward value, where the network performance indicator is associated with a measurement recorded in a simulation round with the at least one SON parameter of the at least one cell in the network set to a static value (paras. [0005], [0043], [0076], [0102], [0146], and [0176]; and Fig. 19 —a reward/loss value can be determined for one or more network cells 1902 after determining one or more training KPIs associated with measurement values of a live telecommunications network 1900 —the SON cell parameter can be associated with a SON BS antenna tilt adjustment, as depicted in Figs. 1-2). However, Jeong does not explicitly teach: normalizing the at least one performance indicator using a cumulative distribution function with a sample mean and sample standard deviation of at least one measurement recorded in a simulation round.
Jana does teach: normalizing a performance indicator using a cumulative distribution function with a sample mean and sample standard deviation of at least one measurement recorded in a simulation round (paras. [0049], [0066]-[0075], and [0093]-[0094]; and Figs. 2G-3 —a CDF can be applied to normalize KPI measurement data considering both mean and SD of measured sample data, as these are widely-known statistical modeling techniques). It would have been prima-facie obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have modified Jeong’s KPI evaluation prior to reward adjustment/modeling with KPI value normalization via a statistical CDF, as taught by Jana. The motivation for doing so would have been to apply known statistical normalization techniques, i.e., mean and SD, to more accurately model KPI data in order to improve ML model training, as recognized by Jana (paras. [0049], [0066]-[0075], and [0093]-[0094]; and Figs. 2G-3).

Claims 146-148 are rejected under 35 U.S.C. 103 as being unpatentable over Jeong in view of US PG Pub. 2022/0264330 A1, Xie (hereinafter “Xie”).

With respect to claim 146, Jeong teaches: The apparatus of claim 142, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: determine a current physical resource block utilization based on the received at least one network performance indicator (paras. [0043] and [0067]-[0068]; block 230 of Fig. 2 and Fig. 4 —received KPIs, i.e., DOF KPIs, can be utilized to determine a present network capacity, coverage, and/or quality, which is interpreted to be associated with a physical resource block utilization of a network BS/cell, i.e., in terms of RET DOF, depicted in Fig. 1); decrease the at least one self-organizing network parameter in response to the current physical resource block utilization being less than a physical resource block utilization of an RL policy, or increase the at least one self-organizing network parameter in response to the current physical resource block utilization being greater than a physical resource block utilization of an RL policy (paras. [0043], [0058], [0067]-[0071], and [0076]; Fig. 1, block 240 of Fig. 2, and Fig. 4 —a SON parameter can be a degree of RET antenna tilt, which can be increased or decreased based on model training and reward/loss variation —the term “or” only requires examination on-the-merits of a single claimed alternative for the reasons explained above in the Claim Interpretation — Alternative Claim Language section). However, Jeong does not explicitly teach: determine an optimal physical resource block utilization based on the reward; and determine a difference between the current physical resource block utilization and the optimal physical resource block utilization. Xie does teach: determining an optimal physical resource block utilization based on a reward (paras. [0131]-[0132] —a thresholded/optimal PRB utilization determination can be based on the reward function); and determining a difference between a current physical resource block utilization/state and the optimal physical resource block utilization/state (paras. [0131]-[0132] —a difference between the optimal and current PRB utilization states can be determined for iterative reward calculation). It would have been prima-facie obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have modified Jeong’s SON parameter adjustment, i.e., antenna tilt, solution based on reward-based policy training, with iterative RL evaluation of differences between actual and ideal PRB utilization, as taught by Xie. The motivation for doing so would have been to train a policy model with adjusted reward data in terms of PRB utilization, as recognized by Xie (paras. [0131]-[0132]).

With respect to claim 147, Jeong in view of Xie teaches: The apparatus of claim 146, wherein the at least one self-organizing network parameter is a tilt of at least one antenna in the network (Jeong: paras. [0005], [0088], [0108]-[0122], and [0149]; Fig. 1, and 240 of Fig. 2 —the SON parameter can be a RET antenna tilt parameter of a BS optimization solution).

With respect to claim 148, Jeong teaches the apparatus of claim 146. However, Jeong does not explicitly teach: wherein the optimal physical resource block utilization is determined through estimating the reward for a plurality of discrete quantized levels of physical resource block utilization for the at least one cell. Xie does teach: determining an optimal physical resource block utilization by estimating a reward for a plurality of discrete quantized levels of PRB utilization for the at least one cell (paras. [0074]-[0080] and [0131]-[0133]; and Fig. 8 —through an iterative state determination such as PRB utilization levels, i.e., a discrete quantization, reward estimations can converge to an optimal PRB utilization value after a maximum number of iterations is reached, as depicted in Fig. 8). It would have been prima-facie obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have modified Jeong’s SON parameter adjustment solution based on reward-based policy training, with iterative state determinations of PRB utilization to converge on an ideal PRB utilization, as taught by Xie. The motivation for doing so would have been to train a policy model with iterative, adjusted reward data to more quickly and autonomously converge on an optimal PRB utilization for a network cell(s), as recognized by Xie (paras. [0074]-[0080] and [0131]-[0133]; and Fig. 8).
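[Editor's note] The Xie-based rationale for claims 146-148 (estimate the reward at discrete, quantized PRB-utilization levels, pick the reward-optimal level, and compare it against the current utilization) can be sketched as follows. The reward function and level grid are illustrative assumptions, not taken from Xie.

```python
def optimal_prb_utilization(reward_fn, levels):
    """Estimate the reward at each discrete quantized PRB-utilization
    level and return the level with the highest estimated reward."""
    return max(levels, key=reward_fn)

def utilization_gap(current, reward_fn, levels):
    """Signed gap between current and reward-optimal PRB utilization;
    the sign suggests whether the SON parameter should rise or fall."""
    return current - optimal_prb_utilization(reward_fn, levels)

# Illustrative reward peaking at 60% utilization, over 11 quantized levels.
levels = [i / 10 for i in range(11)]
reward = lambda u: -(u - 0.6) ** 2
```

With this illustrative reward, `optimal_prb_utilization(reward, levels)` returns 0.6, and a current utilization of 0.8 yields a positive gap, i.e., utilization should be driven down.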
Claims 150 and 159 are rejected under 35 U.S.C. 103 as being unpatentable over Jeong in view of US PG Pub. 2020/0382361 A1, Chandrasekhar et al. (hereinafter “Chandrasekhar”).

With respect to claim 150, Jeong teaches the apparatus of claim 149, including determining the network state. However, Jeong does not explicitly teach: determining a normalized number of active users connected to the at least one cell. Chandrasekhar does teach: determining a normalized number of active users connected to the at least one cell (paras. [0126] and [0135] —a given normalized number of active DL/UL users, i.e., UEActiveDLAvg and/or UEActiveULAvg, of a particular cell geographic area can be determined and binned for ML training —the term “or” only requires examination on-the-merits of a single claimed alternative for the reasons explained above in the Claim Interpretation — Alternative Claim Language section). It would have been prima-facie obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have modified Jeong’s network state determination indicating load-based KPI with determination of a number of active users in a cell at a given time, as taught by Chandrasekhar. The motivation for doing so would have been to more accurately determine network load in terms of a number of active users, as recognized by Chandrasekhar (paras. [0126] and [0135]).

With respect to claim 159, Jeong teaches the apparatus of claim 142, where a SON parameter adjustment is associated with antenna tilt, as depicted in Figs. 1 and 2. However, Jeong does not explicitly teach: increase a tilt of at least one antenna in the network when a physical resource block utilization should be decreased, and decrease the tilt of the at least one antenna in the network when the physical resource block utilization should be increased. Chandrasekhar does teach: increasing a tilt of at least one antenna in the network when a physical resource block utilization should be decreased, and decreasing the tilt of the at least one antenna in the network when the physical resource block utilization should be increased (paras. [0080], [0117], and [0136] —when PRB utilization should be increased, tilting an antenna upward will increase coverage area and utilization, whereas to decrease PRB utilization, tilting an antenna downward decreases coverage area and utilization). It would have been prima-facie obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have modified Jeong’s SON parameter adjustment for RET in terms of traffic/PRB utilization, as taught by Chandrasekhar. The motivation for doing so would have been to specify antenna tilt adjustment in terms of desired traffic/PRB utilization adjustment, as recognized by Chandrasekhar (paras. [0080], [0117], and [0136]).

Claims 151-155 are rejected under 35 U.S.C. 103 as being unpatentable over Jeong in view of Xie, in further view of US PG Pub. 2022/0264331 A1, Arnott et al. (hereinafter “Arnott”).

With respect to claim 151, Jeong teaches the apparatus of claim 149. Jeong in view of Xie teaches the subject matter of claim 148, and the corresponding obviousness rationale and reference citation for this analogous subject matter of claim 151 is the same as that provided above for claim 148. However, Jeong and Xie do not explicitly teach: determining, with a probability epsilon, a value for the at least one self-organizing parameter that maximizes the reward for the at least one cell among a set of possible values for the self-organizing parameter, based on the state of the network; and determining, with probability one minus epsilon, the value for the at least one self-organizing parameter.
Arnott does teach: determining, with a probability epsilon, a value for the at least one self-organizing parameter that maximizes the reward for the at least one cell among a set of possible values for the self-organizing parameter, based on the state of the network (paras. [0036]-[0039], [0099]-[0100], and [0177]-[0178]; and Figs. 6, 9B, and 11 —a statistical probability epsilon, i.e., an epsilon-greedy strategy/algorithm, can be applied to a SON parameter based on a network state in order to maximize reward); and determining, with probability one minus epsilon, the value for the at least one self-organizing parameter (paras. [0038], [0099]-[0100], and [0178] —parameter determination can be made with a probability of 1 minus epsilon —this is a well-known statistical modeling algorithm, i.e., the epsilon-greedy strategy, in the field of reinforcement learning (RL)). It would have been prima-facie obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have modified Jeong and Xie’s iterative, optimal PRB utilization determinations to include a statistical probability-epsilon strategy, as taught by Arnott. The motivation for doing so would have been to improve determination of SON parameters in RL by employing a probability-epsilon strategy, as recognized by Arnott (paras. [0038], [0099]-[0100], and [0178]).

With respect to claim 152, Jeong in view of Xie and Arnott teaches: The apparatus of claim 151, wherein the at least one self-organizing network parameter is a tilt of at least one antenna in the network, and the set of possible values is a set of possible antenna tilts (Jeong: paras. [0005], [0088], and [0120]-[0122] —the SON parameter can be an RET antenna tilt that is adjustable within a set of tilt degrees, as depicted in 240 of Fig. 2).

With respect to claim 153, Jeong teaches the apparatus of claim 149, and running a gradient-descent based optimization on an objective with a parameterized policy. Jeong in view of Xie also teaches the subject matter of claim 148, and the corresponding obviousness rationale and reference citation for this analogous subject matter of claim 153 is the same as that provided above for claim 148. However, Jeong and Xie do not explicitly teach: determining, with a probability epsilon, a value for the at least one self-organizing parameter that maximizes the reward for the at least one cell among a set of possible values for the self-organizing parameter, based on the state of the network; a predicted reward determined using a neural network trained with gradient descent; and determining, with probability one minus epsilon, the value for the at least one self-organizing parameter. Arnott does teach: determining, with a probability epsilon, a value for the at least one self-organizing parameter that maximizes the reward for the at least one cell among a set of possible values for the self-organizing parameter, based on the state of the network (paras. [0036]-[0039], [0099]-[0100], and [0177]-[0178]; and Figs. 6, 9B, and 11 —a statistical probability epsilon, i.e., an epsilon-greedy strategy/algorithm, can be applied to a SON parameter based on a network state in order to maximize reward); determining a predicted reward using a neural network trained with gradient descent (paras. [0095]-[0097]; and Fig. 8 —the weighted reward values can be determined/updated using stochastic gradient descent); and determining, with probability one minus epsilon, the value for the at least one self-organizing parameter (paras. [0038], [0099]-[0100], and [0178] —parameter determination can be made with a probability of 1 minus epsilon —this is a well-known statistical modeling algorithm, i.e., the epsilon-greedy strategy, in the field of reinforcement learning (RL)). It would have been prima-facie obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have modified Jeong and Xie’s iterative, optimal PRB utilization determinations to include a statistical probability-epsilon strategy and a gradient-descent optimization, as taught by Arnott. The motivation for doing so would have been to improve determination of SON parameters in RL by employing known statistical strategies, as recognized by Arnott (paras. [0038], [0099]-[0100], and [0178]).

With respect to claim 154, Jeong in view of Xie and Arnott teaches: The apparatus of claim 153, wherein the at least one self-organizing network parameter is a tilt of at least one antenna in the network, and the set of possible values is a set of possible antenna tilts (Jeong: paras. [0005], [0088], and [0120]-[0122] —the SON parameter can be an RET antenna tilt that is adjustable within a set of tilt degrees, as depicted in 240 of Fig. 2).

With respect to claim 155, Jeong in view of Xie teaches: The apparatus of claim 153, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to train the neural network using a target vector corresponding to an action taken by the at least one cell (Jeong: paras. [0043], [0054], [0138], and [0170]; and Figs. 2, 12, and 16 —the neural network 520 can be configured/trained with a training weight vector corresponding to an action related to a tilt adjustment 1600 taken by a cell, as depicted in Fig. 2). However, Jeong in view of Xie does not explicitly teach: the target vector having been overwritten with the determined reward. Arnott does teach: a target action having been overwritten with the determined reward (paras. [0046], [0088], and [0092]-[0099]; and Figs. 6 and 8 —the training action(s) can be overwritten/updated with a target determined weight/reward, as depicted in Fig. 8).
It would have been prima-facie obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have modified Jeong and Xie’s neural network training with an action of a cell to include overwriting a target vector with a determined reward, as taught by Arnott. The motivation for doing so would have been to explicitly teach the iterative training mechanism, including overwriting of trained data, as recognized by Arnott (paras. [0046], [0088], and [0092]-[0099]; and Figs. 6 and 8).

Claim 156 is rejected under 35 U.S.C. 103 as being unpatentable over Jeong in view of US PG Pub. 2023/0116202 A1, Mendo Mateo et al. (hereinafter “Mendo Mateo”).

With respect to claim 156, Jeong teaches the apparatus of claim 142, where a weighted reward can be calculated based on information of both a first cell and a neighboring cell (paras. [0073], [0096], and [0139]; and Figs. 11 and 12 —the cell and neighboring cell data can include interference, load, and overlap metrics between the neighboring cells, and this reward value can be weighted). However, Jeong does not explicitly teach the reward being a weighted average of a reward for each of a cell and its neighboring cell. Mendo Mateo does teach: the reward being a weighted average of a reward for each of a cell and its neighboring cell (paras. [0077], [0092]-[0094], [0109], and [0112]; and Figs. 8 and 10 —the reward can be determined and applied as a global reward that is weighted considering mutual, interrelated KPI input affecting multiple cells at the same time). It would have been prima-facie obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have modified Jeong’s reward determination for both a cell and its neighbor cell(s), with the determination of the reward being based on a weighted average of rewards of the cell and its neighboring cell(s), as taught by Mendo Mateo. The motivation for doing so would have been to improve reward determinations by averaging rewards over a larger area including both a cell’s and its neighbors’ coverage areas, as recognized by Mendo Mateo (paras. [0077], [0092]-[0094], [0109], and [0112]; and Figs. 8 and 10).

Conclusion

The prior art made of record and not relied upon is considered pertinent to Applicant’s disclosure, as follows:

US PG Pub. 2022/0343117 A1, Jeong et al.: teaches machine learning solutions for self-optimizing parameters associated with distributed cells.
US PG Pub. 2019/0014488 A1, Tan et al.: teaches machine learning solutions for self-optimizing parameters associated with distributed cells.
US PG Pub. 2021/0360474 A1, Chen et al.: teaches machine learning solutions for self-optimizing parameters associated with distributed cells.
US Patent 12,047,248 B2, Lee et al.: teaches machine learning solutions for self-optimizing parameters associated with distributed cells.
US PG Pub. 2022/0248237 A1, Hu et al.: teaches machine learning solutions for self-optimizing parameters associated with distributed cells.
US PG Pub. 2021/0241090 A1, Chen et al.: teaches machine learning solutions for self-optimizing parameters associated with distributed cells.

Any inquiry concerning this communication or earlier communications from the Examiner should be directed to Scott Schlack, whose telephone number is (571) 272-2332. The Examiner can normally be reached Mon. through Fri., 11am-6pm EST. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s supervisor, Huy Vu, can be reached at (571) 272-3155. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Scott A. Schlack/
Examiner, Art Unit 2461

/HUY D VU/
Supervisory Patent Examiner, Art Unit 2461
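[Editor's note] The epsilon-greedy strategy that the rejections of claims 151-155 attribute to Arnott is conventional in reinforcement learning: exploit the value-maximizing action with probability 1 - epsilon, and explore uniformly otherwise. A generic sketch over a discrete set of candidate antenna tilts (tilt values and Q-values are invented for illustration, not taken from any cited reference):

```python
import random

def epsilon_greedy_tilt(q_values, epsilon=0.1, rng=random):
    """Pick an antenna tilt from a dict mapping tilt -> estimated reward:
    exploit the argmax with probability 1 - epsilon, else explore."""
    tilts = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(tilts)            # explore: uniform random tilt
    return max(tilts, key=q_values.get)     # exploit: reward-maximizing tilt
```

With epsilon=0 the selection is purely greedy; annealing epsilon toward 0 over training trades exploration for exploitation, which is the usual reading of the "probability epsilon / one minus epsilon" claim language.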

Prosecution Timeline

Aug 22, 2023
Application Filed
Dec 04, 2025
Non-Final Rejection — §102, §103
Mar 24, 2026
Examiner Interview Summary
Mar 24, 2026
Applicant Interview (Telephonic)
Apr 14, 2026
Applicant Interview (Telephonic)
Apr 14, 2026
Examiner Interview Summary

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12604212: METHOD AND SYSTEM FOR MOBILITY MANAGEMENT (granted Apr 14, 2026; 2y 5m to grant)
Patent 12581325: APPARATUS FOR WIRELESS COMMUNICATIONS SYSTEM AND USER EQUIPMENT (granted Mar 17, 2026; 2y 5m to grant)
Patent 12550195: REDUCED OVERHEAD BEAM SWEEP FOR INITIAL ACCESS (granted Feb 10, 2026; 2y 5m to grant)
Patent 12507258: Range Extension for Sidelink Control Information (SCI) Stage 2 (granted Dec 23, 2025; 2y 5m to grant)
Patent 12489510: Beam Failure Detection (granted Dec 02, 2025; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 44% (79% with interview, a +34.8-point lift)
Median Time to Grant: 3y 10m
PTA Risk: Low

Based on 52 resolved cases by this examiner. Grant probability derived from career allow rate.
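The projection arithmetic is internally consistent with the examiner's career data above, assuming the grant probability is simply the career allow rate and the interview figure adds the quoted lift:

```python
granted, resolved = 23, 52
allow_rate = granted / resolved              # career allow rate
with_interview = allow_rate + 0.348          # quoted +34.8-point interview lift
print(round(allow_rate * 100), round(with_interview * 100))
```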
