DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement filed on 09/30/2024 complies with all applicable rules and regulations. Therefore, the information referred to therein has been considered.
Response to Remarks
This Office action is considered fully responsive to the amendments filed 09/05/2025.
a) Claims 1-2, 4-12, and 14-20 are pending in the application. Claims 1, 8, 11, and 17-18 have been amended, claims 3 and 13 have been canceled, and claims 2, 4-7, 9-10, 12, 14-16, and 19-20 were previously presented.
b) The objection to the claims is withdrawn in light of Applicant’s amendments.
c) The claim rejection under 35 USC § 112(b) is withdrawn in light of Applicant’s amendments.
Response to Arguments
Applicant's arguments filed 8/20/2025 have been fully considered but they are not persuasive. The claim rejections section below details the rejections of the instant claims.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-2, 4, 6-11, and 14-19 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Yeh et al. (US 2022/0014963 A1), hereinafter referred to as Yeh1, as filed on Sep. 24, 2021 and published on Jan. 13, 2022.
Regarding claim 1 (Currently Amended), Yeh1 teaches A computer-implemented method (Figs. 26-27, [0078], lines 1-4, [0088], lines 1-12, [0517], [0421]; the figures illustrate an example of compute circuitry 2602, which includes processor circuitry and memory with other peripheral subsystems capable of performing the computing tasks of the method and communicating with other devices), comprising:
obtaining network state data by transmitting (Fig. 1a, [0037], [0171], lines 1-7, claim 53, lines 4-5, and claim 72; the network state data (observation measurements) can be obtained from UEs in the form of measurement reports (states s)), according to a predetermined time interval (Table 2, [0334]; a periodic reporting mechanism is described for the collected network state data (measurement reports): the node collects the related data and reports it at the end of the reporting period), a request to a radio access network intelligent controller ("RIC") for the network state data of the base station ([0284], lines 11-14 and 23-26, [0322], [0334]; the request is made over the E2 interface, which is a part of the BS, and the radio access network intelligent controller (RIC) triggers the definition IE for the data that should be reported); wherein the RIC comprises a near real-time RIC and a non real-time RIC (Figs. 21-24, [0102], lines 1-4, [0319], [0294]-[0295] describe that the RIC includes both a near real-time RIC (Near-RT RIC) and a non real-time RIC (Non-RT RIC)), the predetermined time interval comprises a near real-time time interval and a non real-time time interval ([0554]: the term "superframe" may refer to a time interval comprising two time slots, which can be a time interval during which a signal is signaled; [0102] describes the two distinct time intervals; [0294]: non-real-time control and optimization of the RAN, which typically includes long-term processing due to policy generation and AI/ML training [0059]; and near-real-time control and optimization of the RAN [0295], where the time interval is less than 10 ms, [0554], lines 3-4, [0195], lines 1-3), the near real-time time interval is less than the non real-time time interval ([0198] describes that the Near-RT RIC operates with ultra-low latency to handle the requirements of multiple applications, while Fig. 13 and [0297] describe that the non-RT RIC is responsible for long-term processing, AI/ML model training, and policy generation under different conditions; these tasks are computationally intensive and high-latency, making the cloud data center suitable for hosting these processes; [0195], lines 7-14: "As a result, operations at a core network data center 1335 or a cloud data center 1345, with latencies of at least 50 to 100 ms or more," as shown in Fig. 13, which implies the near real-time time interval is less than the non real-time time interval), the request comprises a first request to the near real-time RIC according to the near real-time time interval ([0337]: the first request to the Near-RT RIC, "Near-RT RIC is requesting to subscribe followed by a list of categories or subcounters to be measured for each measurement type, and a granularity period indicating collection interval of those measurements," which is based on the time interval for each measurement, [0357]) and a second request to the non real-time RIC according to the non real-time time interval ([0296], lines 13-15: the second request, to the non-RT RIC, is for long-term tasks such as AI/ML model training and policy generation, "the non-RT RIC 2112 may request or trigger ML model training in the training hosts regardless of where the model is deployed and executed. ML models may be trained and not currently deployed," which is also according to its time interval, [0297]); determine, based on the network state data, a set of control parameter data using a deep reinforcement-based learning model (Fig. 3, [0051], lines 1-11, [0054], lines 1-6, [0055], lines 1-16, [0058], lines 1-8, claim 74, lines 9-24; the figure describes the observation data (network state data) as an input to a recurrent neural network (RNN) (301), the actor network (303) selects/extracts the optimum control parameter data (actions), and the critic network (305) evaluates the quality of the selected control parameter data), wherein the deep reinforcement-based learning model comprises a near real-time deep reinforcement-based learning model (Fig. 5, [0032], lines 1-3, [0062], lines 1-8, [0096]; the DRL model includes a real-time multi-access RL/NN for optimizing traffic management in multi-access edge compute nodes) and a non real-time deep reinforcement-based learning model ([0102]: "Model training is located at non-RT RIC for off-line RL or at near-RT RIC for on-line RL"; [0568]: the non-RT model focuses on a long-term objective, "In RL, an agent aims to optimize a long-term objective by interacting with the environment based on a trial and error process"), the deep reinforcement-based learning model determines the set of control parameter data as an action and based on the network state data as a state (Figs. 1a, 1b, and 3, abstract, lines 6-10, claim 74, lines 9-24, illustrate agent 140, which includes the RNN and determines the set of control parameter data as an action based on the network state data as a state), thereby improving a probability of achieving a target condition at the base station as a reward that results from the action (Fig. 7, [0038], lines 4-14, [0042], lines 18-21, [0079], lines 12-14, [0136], lines 7-10, [0568], lines 4-6, [0569], lines 1-5, and claim 83 describe the reward function provided by agent 140, which is part of the DRL/RL, to improve the performance of the network conditions (target conditions) as a reward resulting from the action, where the selection of the action can be done using a probabilistic approach); the set of control parameter data comprises a set of near real-time control parameter data ([0295], [0390]; these control parameters are managed by the Near-RT RIC, such as traffic steering parameters ([0079], lines 12-14), network resource parameters ([0074]), QoS parameters (claim 71), and other control parameters ([0368]), etc.)
and a set of non real-time control parameter data ([0294], [0278], lines 15-19; these parameters are managed by the non-RT RIC and are communicated to the near-RT RIC via the A1 interface to guide real-time decision making, such as the policy parameters described in [0378], ML model parameters [0561], configuration parameters [0192], and parameters for optimizing network operations [0261], etc.), the near real-time deep reinforcement-based learning model creates the set of near real-time control parameter data, and the non real-time deep reinforcement-based learning model creates the set of non real-time control parameter data ([0135], [0296], [0367] describe an example in which the near-RT DRL model creates control parameter data for real-time RAN optimization; [0032]: "two different architectures are discussed herein to address the scalability issues in DRL to handle arbitrary number of active multi-access UEs or flows in the network," which implies the operation of the two models; [0352] describes the role of the E2 interface in transmitting these parameters to RAN nodes; and [0152]: "From these graphs, it can be observed that the data-driven approach (DDPG) can outperform all existing solutions, and achieve higher score and better QoS performance than the existing solutions," which emphasizes the ability of the DRL model to create optimized control parameters); and transmitting the set of near real-time control parameter data to the near real-time RIC ([0295] and [0367] describe that the control parameters are sent to the near-RT RIC through the E2 interface, Figs. 21-22, and the near-RT RIC uses these parameters for optimizing the system and communicates them to the RAN node for real-time implementation [0295]) and the set of non real-time control parameter data to the near real-time RIC through the non real-time radio access intelligent controller ([0294], [0299] and Figs. 21-22 describe that the control parameters are sent to the near-RT RIC through the A1 interface, since the non-RT RIC is a part of the SMO and uses the A1 interface to interact with the near-RT RIC ([0278], lines 15-19) and to enable policy-driven guidance of Near-RT RIC applications/functions), causing the base station to update control parameters of the base station (Figs. 20-22, [0278], lines 1-11, and [0277] illustrate the O1 interface between the O-RAN managed elements and the SMO, which is essential for exchanging data: "The O1 interface is an interface between orchestration & management entities (Orchestration/NMS) and O-RAN managed elements, for operation and management, by which FCAPS management, Software management, File management and other similar functions shall be achieved"; Fig. 21, [0364], lines 4-9, [0284], lines 6-14 and 24-27, [0367], lines 9-13; the RIC can also asynchronously send a "CONTROL action" asking the E2 node (e.g., e/gNBs) to update its control parameters based on the values assigned by the RIC via the "CONTROL action," as shown in Table 12).
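As an illustration of the actor-critic arrangement cited above (Yeh1, Fig. 3, elements 301/303/305), the following minimal Python sketch shows how an RNN encoder, an actor head, and a critic head can map a time series of network-state reports to control-parameter actions and a value estimate. The class name, layer sizes, and dimensions are illustrative assumptions for discussion, not taken from the reference.

import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    # Hypothetical names; dimensions are illustrative only.
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        # RNN encodes a time series of network-state observations (cf. element 301).
        self.rnn = nn.GRU(state_dim, hidden, batch_first=True)
        # Actor head proposes control-parameter values, i.e., the action (cf. element 303).
        self.actor = nn.Linear(hidden, action_dim)
        # Critic head scores the encoded state to evaluate the selection (cf. element 305).
        self.critic = nn.Linear(hidden, 1)

    def forward(self, states: torch.Tensor):
        # states: (batch, time, state_dim), e.g., a window of periodic measurement reports.
        _, h = self.rnn(states)
        h = h.squeeze(0)
        action = torch.tanh(self.actor(h))  # bounded control parameters
        value = self.critic(h)              # critic's value estimate
        return action, value

# Example: 8 measurements per report, 10 reports per window, 4 control parameters.
model = ActorCritic(state_dim=8, action_dim=4)
action, value = model(torch.randn(1, 10, 8))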
Regarding claim 2 (Original), Yeh1 teaches the computer-implemented method according to claim 1, wherein the base station is associated with a 5G multi-access edge computing and core network (Figs. 4 and 18, [0083], lines 1-6, [0088], lines 20-23, [0089], lines 7-11, [0290], lines 1-4, [0302], lines 11-19; the base station is connected to the 5G core network via the NG-C (N2) interface and integrated with multi-access edge computing (MEC)), and wherein the deep reinforcement-based learning model includes a trained deep neural network (Fig. 1c, [0055], lines 16-20, "In DRL, the policy function and/or the value function is computed by respective multi-layer neural networks. The independent-layer structure of a deep neural network DNN allows the gradient computations through backpropagation").
Regarding claim 4 (Original), Yeh1 teaches the computer-implemented method according to claim 1,
wherein the network state data includes at least one of (Fig. 1b, [0036], lines 2-7 and lines 14-17, [0041], lines 8-10; the network state data (state s or s') is determined based on the obtained observation data, various measurements, and/or contextual data):
a channel condition (Figs. 7-8, [0073], lines 1-6, [0133], lines 2-5, [0136], lines 1-5; as shown in the figures, "the design of state 721/821, action 722/822, and reward 723/823 is used to design the context space to be continuous (e.g., time series of traffic load, channel condition, and backhaul delay) or discrete (e.g., average traffic load, channel condition, and backhaul delay)"),
a user-requested data rate,
latency,
reliability data,
location data associated with the base station,
traffic load data, or
quality-of-service parameter data.
Regarding claim 6, Yeh1 teaches the computer-implemented method according to claim 1,
wherein the set of control parameter data includes at least one of ([0367], lines 2-6; a set of control actions where each action deals with a specific function):
Yeh1 teaches a second value associated with a signal transmission power ([0171], lines 10-24; transmission power is considered, which is a direct measurement of the signal power level at which signals are transmitted by the UE or BS, in addition to other forms of power measurements, where the sequence of the values, first or second among the control data, can be varied),
a level of priority associated with scheduling data transmission associated with user equipment connected to the base station,
a cyclic prefix length,
a first value associated with reference signal density, or
a third value associated with a mobility management parameter.
Regarding claim 7 (Original), Yeh1 teaches the computer-implemented method according to claim 1,
wherein the target condition includes at least one of ([0569]; the outputs are reward values based on one or more reward variables):
Yeh1 teaches target latency data ([0075], lines 5-7, [0116], lines 8-13, [0195], and Table 9; the goal includes latency data as a measurement for evaluating the network performance),
target spectral efficiency of radio signals at the base station,
target reliability of the base station,
target energy efficiency level at the base station, or
target fairness values associated with allocating computing resources to user equipment at the base station.
Regarding claim 8 (Currently Amended), Yeh1 teaches the computer-implemented method according to claim 1, wherein the transmitting the set of control parameter data uses an E2 Control interface of Open Radio Access Network (RAN) protocols ([0295], lines 1-4; the O-RAN near-RT RIC 2114 is a logical function that enables near-real-time control and optimization of RAN elements and resources via fine-grained data collection and actions (control parameter data) over the E2 interface), and wherein the receiving of the network state data is according to E2 Monitor interface of Open RAN protocols (Fig. 22, [0295], [0329], lines 1-8, [0330], lines 3-8; the observation measurements (state data/collection data) are received through the E2 interface in O-RAN, which includes various metrics reported periodically).
Regarding claim 9 (Original), Yeh1 teaches the computer-implemented method according to claim 1, wherein the predetermined time interval is less than or equal to 10 milliseconds (Table 2, [0334], [0339], lines 5-11, and [0554], lines 3-9; Table 2 describes that the predetermined time interval for reporting the collected measurements can be less than 10 milliseconds. In case the E2 Node is not able to provide data for a granularity period during the reporting period, it may include the optional Incomplete Flag IE, which indicates that the corresponding measurement record in the reported data is not reliable).
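To make the granularity-period bookkeeping concrete, a short hedged sketch follows: a node that cannot supply data for every granularity period in the reporting period flags the report, analogous to the optional Incomplete Flag IE described above. The class and function names are hypothetical, not the actual E2 encoding.

from dataclasses import dataclass, field

@dataclass
class MeasurementReport:
    records: list = field(default_factory=list)
    incomplete: bool = False  # analogous to the optional Incomplete Flag IE

def collect_report(samples: list, expected: int) -> MeasurementReport:
    # Flag the report as unreliable when fewer granularity periods were captured
    # than the reporting period required.
    return MeasurementReport(records=samples, incomplete=len(samples) < expected)

report = collect_report(samples=[3.1, 2.9], expected=4)  # only 2 of 4 periods captured
print(report.incomplete)  # True -> the measurement record is not reliable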
Regarding claim 10 (Original), Yeh1 teaches the computer-implemented method according to claim 1, further comprising: performing offline training of the deep reinforcement-based learning model using training data (Figs. 20-24 and [0102], lines 3-4; as shown in the figures, model training is located at the non-RT RIC for off-line RL or at the near-RT RIC for on-line RL), wherein the training data includes a set of truthful network state data as the state, truthful control parameter data as the action, and truthful target conditions as rewards (Fig. 1a, [0037], lines 1-6, claim 72, [0111], lines 1-4, [0471], lines 2-4, [0459]; all the observation data are real data, the actions are based on the real observations, and the reward value is calculated using a reward function based on the collected observation (real) data).
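The off-line training described above amounts to fitting the model from logged (state, action, reward) tuples rather than from live interaction. The sketch below uses a simplified tabular value update as a stand-in for the DRL update; the dataset contents and learning rate are illustrative assumptions.

import random

def offline_train(dataset, q_table, alpha=0.1):
    # dataset: logged (state, action, reward) tuples collected from the live network.
    for state, action, reward in random.sample(dataset, k=len(dataset)):
        key = (state, action)
        q_table[key] = q_table.get(key, 0.0) + alpha * (reward - q_table.get(key, 0.0))
    return q_table

logged = [("high_load", "raise_tx_power", 0.7), ("low_load", "lower_tx_power", 0.9)]
print(offline_train(logged, {}))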
Regarding claim 11 (Currently Amended), Yeh1 teaches A system for updating a set of control parameter data of a base station ([0079], lines 1-4, [0287], lines 6-8, [0523], lines 14-16; the system for updating a set of actions is directly related to the BS by using DRL models to dynamically update the control parameters (actions) based on the network conditions and user requirements, as shown in Figs. 1b and 6), the system comprising: a processor; and a memory storing computer-executable instructions that when executed by the processor cause the system to execute operations (Figs. 26-27, [0229], lines 1-10, [0406], lines 1-3, claim 74, lines 1-8; a processor and memory storing computer-executable instructions which can be executed by the processor) comprising: obtaining network state data by transmitting (Fig. 1a, [0037], [0171], lines 1-7, claim 53, lines 4-5, and claim 72; the network state data (observation measurements) can be obtained from UEs in the form of measurement reports (states s)), according to a predetermined time interval (Table 2, [0334]; a periodic reporting mechanism is described for the collected network state data (measurement reports): the node collects the related data and reports it at the end of the reporting period), a request to a radio access network intelligent controller ("RIC") for the network state data of the base station ([0284], lines 11-14 and 23-26, [0322], [0334]; the request is made over the E2 interface, which is a part of the BS, and the radio access network intelligent controller (RIC) triggers the definition IE for the data that should be reported); wherein the RIC comprises a near real-time RIC and a non real-time RIC (Figs. 21-24, [0102], lines 1-4, [0319], [0294]-[0295] describe that the RIC includes both a near real-time RIC (Near-RT RIC) and a non real-time RIC (Non-RT RIC)), the predetermined time interval comprises a near real-time time interval and a non real-time time interval ([0554]: the term "superframe" may refer to a time interval comprising two time slots, which can be a time interval during which a signal is signaled; [0102] describes the two distinct time intervals; [0294]: non-real-time control and optimization of the RAN, which typically includes long-term processing due to policy generation and AI/ML training [0059]; and near-real-time control and optimization of the RAN [0295], where the time interval is less than 10 ms, [0554], lines 3-4, [0195], lines 1-3), the near real-time time interval is less than the non real-time time interval ([0198] describes that the Near-RT RIC operates with ultra-low latency to handle the requirements of multiple applications, while Fig. 13 and [0297] describe that the non-RT RIC is responsible for long-term processing, AI/ML model training, and policy generation under different conditions; these tasks are computationally intensive and high-latency, making the cloud data center suitable for hosting these processes; [0195], lines 7-14: "As a result, operations at a core network data center 1335 or a cloud data center 1345, with latencies of at least 50 to 100 ms or more," as shown in Fig. 13, which implies the near real-time time interval is less than the non real-time time interval), the request comprises a first request to the near real-time RIC according to the near real-time time interval ([0337]: the first request to the Near-RT RIC, "Near-RT RIC is requesting to subscribe followed by a list of categories or subcounters to be measured for each measurement type, and a granularity period indicating collection interval of those measurements," which is based on the time interval for each measurement, [0357]) and a second request to the non real-time RIC according to the non real-time time interval ([0296], lines 13-15: the second request, to the non-RT RIC, is for long-term tasks such as AI/ML model training and policy generation, "the non-RT RIC 2112 may request or trigger ML model training in the training hosts regardless of where the model is deployed and executed. ML models may be trained and not currently deployed," which is also according to its time interval, [0297]); determine, based on the network state data, the set of control parameter data using a deep reinforcement-based learning model (Fig. 3, [0051], lines 1-11, [0054], lines 1-6, [0055], lines 1-16, [0058], lines 1-8, claim 74, lines 9-24; the figure describes the observation data (network state data) as an input to a recurrent neural network (RNN) (301), the actor network (303) selects/extracts the optimum control parameter data (actions), and the critic network (305) evaluates the quality of the selected control parameter data), wherein the deep reinforcement-based learning model comprises a near real-time deep reinforcement-based learning model (Fig. 5, [0032], lines 1-3, [0062], lines 1-8, [0096]; the DRL model includes a real-time multi-access RL/NN for optimizing traffic management in multi-access edge compute nodes) and a non real-time deep reinforcement-based learning model ([0102]: "Model training is located at non-RT RIC for off-line RL or at near-RT RIC for on-line RL"; [0568]: the non-RT model focuses on a long-term objective, "In RL, an agent aims to optimize a long-term objective by interacting with the environment based on a trial and error process"), the deep reinforcement-based learning model determines the set of control parameter data as an action and based on the network state data as a state (Figs. 1a, 1b, and 3, abstract, lines 6-10, claim 74, lines 9-24, illustrate agent 140, which includes the RNN and determines the set of control parameter data as an action based on the network state data as a state), thereby improving a probability of achieving a target condition at the base station as a reward that results from the action (Fig. 7, [0038], lines 4-14, [0042], lines 18-21, [0079], lines 12-14, [0136], lines 7-10, [0568], lines 4-6, [0569], lines 1-5, and claim 83 describe the reward function provided by agent 140, which is part of the DRL/RL, to improve the performance of the network conditions (target conditions) as a reward resulting from the action, where the selection of the action can be done using a probabilistic approach), the set of control parameter data comprises a set of near real-time control parameter data ([0295], [0390]; these control parameters are managed by the Near-RT RIC, such as traffic steering parameters ([0079], lines 12-14), network resource parameters ([0074]), QoS parameters (claim 71), and other control parameters ([0368]), etc.)
and a set of non real-time control parameter data ([0294], [0278], lines 15-19; these parameters are managed by the non-RT RIC and are communicated to the near-RT RIC via the A1 interface to guide real-time decision making, such as the policy parameters described in [0378], ML model parameters [0561], configuration parameters [0192], and parameters for optimizing network operations [0261], etc.), the near real-time deep reinforcement-based learning model creates the set of near real-time control parameter data, and the non real-time deep reinforcement-based learning model creates the set of non real-time control parameter data ([0135], [0296], [0367] describe an example in which the near-RT DRL model creates control parameter data for real-time RAN optimization; [0032]: "two different architectures are discussed herein to address the scalability issues in DRL to handle arbitrary number of active multi-access UEs or flows in the network," which implies the operation of the two models; [0352] describes the role of the E2 interface in transmitting these parameters to RAN nodes; and [0152]: "From these graphs, it can be observed that the data-driven approach (DDPG) can outperform all existing solutions, and achieve higher score and better QoS performance than the existing solutions," which emphasizes the ability of the DRL model to create optimized control parameters); and transmitting the set of near real-time control parameter data to the near real-time RIC ([0295] and [0367] describe that the control parameters are sent to the near-RT RIC through the E2 interface, Figs. 21-22, and the near-RT RIC uses these parameters for optimizing the system and communicates them to the RAN node for real-time implementation [0295]) and the set of non real-time control parameter data to the near real-time RIC through the non real-time radio access intelligent controller ([0294], [0299] and Figs. 21-22 describe that the control parameters are sent to the near-RT RIC through the A1 interface, since the non-RT RIC is a part of the SMO and uses the A1 interface to interact with the near-RT RIC ([0278], lines 15-19) and to enable policy-driven guidance of Near-RT RIC applications/functions), causing the base station to update control parameters of the base station (Figs. 20-22, [0278], lines 1-11, and [0277] illustrate the O1 interface between the O-RAN managed elements and the SMO, which is essential for exchanging data: "The O1 interface is an interface between orchestration & management entities (Orchestration/NMS) and O-RAN managed elements, for operation and management, by which FCAPS management, Software management, File management and other similar functions shall be achieved"; Fig. 21, [0364], lines 4-9, [0284], lines 6-14 and 24-27, [0367], lines 9-13; the RIC can also asynchronously send a "CONTROL action" asking the E2 node (e.g., e/gNBs) to update its control parameters based on the values assigned by the RIC via the "CONTROL action," as shown in Table 12).
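For illustration of the two polling cadences recited in the claim, the hedged sketch below issues the first request to the near-RT RIC on a short period and the second request to the non-RT RIC on a longer one. The 10 ms and 1 s periods and the function names are assumptions for the sketch, not values from the reference.

import sched
import time

scheduler = sched.scheduler(time.monotonic, time.sleep)

def request_near_rt(period_s=0.01):  # near real-time interval, e.g., ~10 ms
    print("first request: state data from the Near-RT RIC")
    scheduler.enter(period_s, 1, request_near_rt, (period_s,))

def request_non_rt(period_s=1.0):    # non real-time interval, much longer
    print("second request: state data from the Non-RT RIC")
    scheduler.enter(period_s, 2, request_non_rt, (period_s,))

scheduler.enter(0, 1, request_near_rt)
scheduler.enter(0, 2, request_non_rt)
# scheduler.run()  # would run the two request loops indefinitely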
Regarding claim 14 (Original), Yeh1 teaches the system according to claim 11,
wherein the network state data includes at least one of (Fig. 1b, [0036], lines 2-7 and lines 14-17, [0041], lines 8-10; the network state data (state s or s') is determined based on the obtained observation data, various measurements, and/or contextual data):
a channel condition (Figs. 7-8, [0073], lines 1-6, [0133], lines 2-5, [0136], lines 1-5; as shown in the figures, "the design of state 721/821, action 722/822, and reward 723/823 is used to design the context space to be continuous (e.g., time series of traffic load, channel condition, and backhaul delay) or discrete (e.g., average traffic load, channel condition, and backhaul delay)"),
a user-requested data rate,
latency,
reliability data,
location data associated with the base station,
traffic load data, or
quality-of-service parameter data.
Regarding claim 15 (Original), Yeh1 teaches the system according to claim 11,
wherein the set of control parameter data includes at least one of ([0367], lines 2-6; a set of control actions where each action deals with a specific function):
Yeh1 teaches a second value associated with a signal transmission power ([0171], lines 10-24; transmission power is considered, which is a direct measurement of the signal power level at which signals are transmitted by the UE or BS, in addition to other forms of power measurements, where the sequence of the values, first or second among the control data, can be varied),
a level of priority associated with scheduling data transmission associated with user equipment connected to the base station,
a cyclic prefix length,
a first value associated with reference signal density, or
a third value associated with a mobility management parameter.
Regarding claim 16 (Original), Yeh1 teaches the system according to claim 11, wherein the target condition includes at least one of ([0569]; the outputs are reward values based on one or more reward variables):
Yeh1 teaches target latency data ([0075], lines 5-7, [0116], lines 8-13, [0195], and Table 9; the goal includes latency data as a measurement for evaluating the network performance),
target spectral efficiency of radio signals at the base station,
target reliability of the base station,
target energy efficiency level at the base station, or
target fairness values associated with allocating computing resources to user equipment at the base station.
Regarding claim 17 (Currently Amended), Yeh1 teaches a device for deep reinforcement-based learning of control parameters of a base station (Figs. 1a-1b, [0094], lines 3-7, claim 53, [0304], lines 2-5, [0453], [0491], [0523], lines 1-6; as shown in the figures, the DRL model is hosted by the edge compute node and is embedded within the agent that collects observations, determines states, and refines control parameters for traffic management, which can be used by the BS), comprising:
a memory; and a processor configured to execute operations ([0421], lines 1-5, [0422], lines 1-6, Figs. 26-27, [0229], lines 1-10, [0406], lines 1-3; a processor and memory storing computer-executable instructions which can be executed by the processor) comprising:
transmitting, according to a predetermined time interval, a first request to a radio access network intelligent controller ("RIC") for receiving network state data of the base station (Fig. 21, [0284], lines 23-26, [0303], lines 1-3, [0335], [0346], lines 1-5, [0357], lines 16-18, [0359], lines 8-12; a RIC service is a service provided on the E2 Node (as a base station) for messages and measurements and/or to enable control of the E2 Node from the Near-RT RIC. An on-demand report, as a service, is periodically requested, which allows xApps or, more generally, the Near-RT RIC to request state data or observations, which the RIC collects and sends back. The device may request the observations (state data) from the RIC in different ways: based on request, on-demand report, event triggers, or subscription), wherein the RIC comprises a near real-time RIC and a non real-time RIC (Figs. 21-24, [0102], lines 1-4, [0319], [0294]-[0295] describe that the RIC includes both a near real-time RIC (Near-RT RIC) and a non real-time RIC (Non-RT RIC)), the predetermined time interval comprises a near real-time time interval and a non real-time time interval ([0554]: the term "superframe" may refer to a time interval comprising two time slots, which can be a time interval during which a signal is signaled; [0102] describes the two distinct time intervals; [0294]: non-real-time control and optimization of the RAN, which typically includes long-term processing due to policy generation and AI/ML training [0059]; and near-real-time control and optimization of the RAN [0295], where the time interval is less than 10 ms, [0554], lines 3-4, [0195], lines 1-3), the near real-time time interval is less than the non real-time time interval ([0198] describes that the Near-RT RIC operates with ultra-low latency to handle the requirements of multiple applications, while Fig. 13 and [0297] describe that the non-RT RIC is responsible for long-term processing, AI/ML model training, and policy generation under different conditions; these tasks are computationally intensive and high-latency, making the cloud data center suitable for hosting these processes; [0195], lines 7-14: "As a result, operations at a core network data center 1335 or a cloud data center 1345, with latencies of at least 50 to 100 ms or more," as shown in Fig. 13, which implies the near real-time time interval is less than the non real-time time interval), the first request comprises a first sub-request to the near real-time RIC according to the near real-time time interval ([0337]: the first request to the Near-RT RIC, "Near-RT RIC is requesting to subscribe followed by a list of categories or subcounters to be measured for each measurement type, and a granularity period indicating collection interval of those measurements," which is based on the time interval for each measurement, [0357]) and a second sub-request to the non real-time RIC according to the non real-time time interval ([0296], lines 13-15: the second request, to the non-RT RIC, is for long-term tasks such as AI/ML model training and policy generation, "the non-RT RIC 2112 may request or trigger ML model training in the training hosts regardless of where the model is deployed and executed. ML models may be trained and not currently deployed," which is also according to its time interval, [0297]); receiving the network state data from a near real-time RIC ([0357], lines 15-18, [0359], lines 18-21; an on-demand report (the state data report style) is used to report cell-related and UE-related information upon request from the Near-RT RIC); determine, based on the network state data, a set of control parameter data using a deep reinforcement-based learning model (Fig. 3, [0051], lines 1-11, [0054], lines 1-6, [0055], lines 1-16, [0058], lines 1-8, claim 74, lines 9-24; the figure describes the observation data (network state data) as an input to a recurrent neural network (RNN) (301), the actor network (303) selects/extracts the optimum control parameter data (actions), and the critic network (305) evaluates the quality of the selected control parameter data), wherein the deep reinforcement-based learning model comprises a near real-time deep reinforcement-based learning model (Fig. 5, [0032], lines 1-3, [0062], lines 1-8, [0096]; the DRL model includes a real-time multi-access RL/NN for optimizing traffic management in multi-access edge compute nodes) and a non real-time deep reinforcement-based learning model ([0102]: "Model training is located at non-RT RIC for off-line RL or at near-RT RIC for on-line RL"; [0568]: the non-RT model focuses on a long-term objective, "In RL, an agent aims to optimize a long-term objective by interacting with the environment based on a trial and error process"), the deep reinforcement-based learning model determines the set of control parameter data as an action and based on the network state data as a state (Figs. 1a, 1b, and 3, abstract, lines 6-10, claim 74, lines 9-24, illustrate agent 140, which includes the RNN and determines the set of control parameter data as an action based on the network state data as a state), thereby improving a probability of achieving a target condition at the base station as a reward that results from the action (Fig. 7, [0038], lines 4-14, [0042], lines 18-21, [0079], lines 12-14, [0136], lines 7-10, [0568], lines 4-6, [0569], lines 1-5, and claim 83 describe the reward function provided by agent 140, which is part of the DRL/RL, to improve the performance of the network conditions (target conditions) as a reward resulting from the action, where the selection of the action can be done using a probabilistic approach), the set of control parameter data comprises a set of near real-time control parameter data ([0295], [0390]; these control parameters are managed by the Near-RT RIC, such as traffic steering parameters ([0079], lines 12-14), network resource parameters ([0074]), QoS parameters (claim 71), and other control parameters ([0368]), etc.)
and a set of non real-time control parameter data ([0294], [0278], lines 15-19; these parameters are managed by the non-RT RIC and are communicated to the near-RT RIC via the A1 interface to guide real-time decision making, such as the policy parameters described in [0378], ML model parameters [0561], configuration parameters [0192], and parameters for optimizing network operations [0261], etc.), the near real-time deep reinforcement-based learning model creates the set of near real-time control parameter data, and the non real-time deep reinforcement-based learning model creates the set of non real-time control parameter data ([0135], [0296], [0367] describe an example in which the near-RT DRL model creates control parameter data for real-time RAN optimization; [0032]: "two different architectures are discussed herein to address the scalability issues in DRL to handle arbitrary number of active multi-access UEs or flows in the network," which implies the operation of the two models; [0352] describes the role of the E2 interface in transmitting these parameters to RAN nodes; and [0152]: "From these graphs, it can be observed that the data-driven approach (DDPG) can outperform all existing solutions, and achieve higher score and better QoS performance than the existing solutions," which emphasizes the ability of the DRL model to create optimized control parameters); and transmitting the set of near real-time control parameter data to the near real-time RIC ([0295] and [0367] describe that the control parameters are sent to the near-RT RIC through the E2 interface, Figs. 21-22, and the near-RT RIC uses these parameters for optimizing the system and communicates them to the RAN node for real-time implementation [0295]) and the set of non real-time control parameter data to the near real-time RIC through the non real-time radio access intelligent controller ([0294], [0299] and Figs. 21-22 describe that the control parameters are sent to the near-RT RIC through the A1 interface, since the non-RT RIC is a part of the SMO and uses the A1 interface to interact with the near-RT RIC ([0278], lines 15-19) and to enable policy-driven guidance of Near-RT RIC applications/functions), causing the base station to update the control parameters of the base station (Figs. 20-22, [0278], lines 1-11, and [0277] illustrate the O1 interface between the O-RAN managed elements and the SMO, which is essential for exchanging data: "The O1 interface is an interface between orchestration & management entities (Orchestration/NMS) and O-RAN managed elements, for operation and management, by which FCAPS management, Software management, File management and other similar functions shall be achieved"; Fig. 21, [0364], lines 4-9, [0284], lines 6-14 and 24-27, [0367], lines 9-13; the RIC can also asynchronously send a "CONTROL action" asking the E2 node (e.g., e/gNBs) to update its control parameters based on the values assigned by the RIC via the "CONTROL action," as shown in Table 12).
Regarding claim 18 (Currently Amended), Yeh1 teaches the device according to claim 17, the processor further configured to execute operations ([0421], lines 1-5, [0422], lines 1-6, Figs. 26-27, [0229], lines 1-10, [0406], lines 1-3; the device includes a processor and memory storing computer-executable instructions which can be executed by the processor) comprising:
transmitting the first request for the network state data to the radio access network intelligent controller ([0077], lines 7-9, [0303], lines 1-3, [0357], lines 16-18, [0359], lines 8-12; the device (e.g., agent) can request the observations (state data) from the RIC based on request, on-demand report, event triggers, or subscription); causing the radio access network intelligent controller to transmit a second request for the network state data to the base station using an E2 Control interface of Open RAN (Fig. 21, Table 1, [0284], lines 1-14, [0301], [0367], lines 6-13; the Near-RT RIC sends a RIC Control Request message to the BS via the E2 interface, with the RIC Control Header and RIC Control Message as shown in Table 1, where the E2 interface in Open RAN connects the Near-RT RIC to the E2 nodes); and causing the radio access network intelligent controller to receive the network state data from the base station using an E2 Monitor interface of Open RAN (Table 1, Fig. 21, [0397], lines 9-12, [0295], lines 1-9; the BS processes the control request, executes the required actions (e.g., collecting state data or modifying configurations), and sends the message back to the RIC using the RIC Indication message).
Regarding claim 19 (Original), Yeh1 teaches the device according to claim 17, the processor further configured to execute operations comprising:
transmitting the set of control parameter data to the radio access network intelligent controller (Table 1, Fig. 21, [0397], lines 9-12, [0295], lines 1-9, [0339], lines 1-9; the BS processes the control request, executes the required actions (e.g., collecting state data or modifying configurations), and sends the message back to the RIC using the RIC Indication message);
causing the radio access network intelligent controller to transmit the set of control parameter data to the base station using the E2 Control interface of Open RAN; and causing the base station to update the control parameter setting according to the set of control parameter data (Table 12, [0301], [0352]; the BS, through some units (e.g., the O-RAN Distributed Unit), receives the control parameter data via the E2 interface, which includes instructions for optimizing network performance, and the BS updates its internal settings accordingly, as shown in Table 12, row 2).
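The claim-19 flow above reduces to: the controller forwards the parameter set, and the base station applies it and acknowledges. A minimal hedged sketch follows; the message fields and function name are hypothetical stand-ins, not the actual E2AP encoding.

def ric_control_request(base_station: dict, parameters: dict) -> dict:
    # Analogous to a RIC "CONTROL action" sent over the E2 Control interface:
    # the BS updates its internal settings to the values assigned by the RIC.
    base_station.update(parameters)
    return {"status": "ack", "applied": list(parameters)}

bs = {"tx_power_dbm": 20, "cp_length": "normal"}
print(ric_control_request(bs, {"tx_power_dbm": 23}))  # BS now set to 23 dBm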
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 5, 12, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Yeh et al. (US 2022/0014963 A1), hereinafter referred to as Yeh1, in view of Yeh et al. (US 2022/0124560 A1), hereinafter referred to as Yeh2.
Regarding claim 5 (Original), Yeh1 teaches the computer-implemented method according to claim 1.
However, Yeh1 does not explicitly teach wherein the deep reinforcement-based learning model includes a deep convolutional neural network.
Yeh2 teaches wherein the deep reinforcement-based learning model includes a deep convolutional neural network (Figs. 5b and 5c, [0103], lines 1-8; the DRL framework (NN 500c) can include a convolutional NN (CNN), deep CNN (DCN), or recurrent NN (RNN)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yeh1 to incorporate the teachings of Yeh2 (in analogous art) by including a deep convolutional neural network in the deep reinforcement-based learning model. This integration can enhance the ability of the model to extract features and handle high-dimensional data (Yeh2, [0103]).
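As a sketch of the modification proposed above, a deep CNN can serve as the feature extractor feeding the actor/critic heads, treating a window of measurement time series as a 1-D signal whose channels are measurement types. All layer sizes below are illustrative assumptions, not taken from either reference.

import torch
import torch.nn as nn

cnn_encoder = nn.Sequential(
    # channels = measurement types, width = time steps
    nn.Conv1d(in_channels=8, out_channels=16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv1d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),
    nn.Flatten(),  # -> (batch, 32) feature vector for the actor/critic heads
)

features = cnn_encoder(torch.randn(1, 8, 10))
print(features.shape)  # torch.Size([1, 32])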
Regarding claim 12 (Original), Yeh1 teaches the system according to claim 11, wherein the base station and the radio access network intelligent controller are associated with a 5G multi-access edge computing and core network (Figs. 4 and 18, [0083], lines 1-6, [0088], lines 20-23, [0089], lines 7-11, [0290], lines 1-4, [0302], lines 11-19, claim 73; the base station and RIC are connected to the 5G core network via the NG-C (N2) interface and integrated with multi-access edge computing (MEC)).
However, Yeh1 does not explicitly teach wherein the deep reinforcement-based learning model includes a trained deep convolutional neural network.
Yeh2 teaches wherein the deep reinforcement-based learning model includes a trained deep convolutional neural network (Figs. 5b and 5c, [0103], lines 1-8; the DRL framework (NN 500c) can include a convolutional NN (CNN), deep CNN (DCN), or recurrent NN (RNN)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yeh1 to incorporate the teachings of Yeh2 (in analogous art) by including a deep convolutional neural network in the deep reinforcement-based learning model. This integration can enhance the ability of the model to extract features and handle high-dimensional data (Yeh2, [0103]).
Regarding claim 20 (Original), Yeh1 teaches the device according to claim 17.
Yeh1 further teaches wherein the network state data includes at least one of (Fig. 1b, [0036], lines 2-7 and lines 14-17, [0041], lines 8-10; the network state data (state s or s') is determined based on the obtained observation data, various measurements, and/or contextual data):
a channel condition (Figs. 7-8, [0073], lines 1-6, [0133], lines 2-5, [0136], lines 1-5; as shown in the figures, "the design of state 721/821, action 722/822, and reward 723/823 is used to design the context space to be continuous (e.g., time series of traffic load, channel condition, and backhaul delay) or discrete (e.g., average traffic load, channel condition, and backhaul delay)"),
a user-requested data rate,
latency,
reliability data,
location data associated with the base station,
traffic load data, or
quality-of-service parameter data.
Yeh1 further teaches wherein the set of control parameter data includes at least one of ([0367], lines 2-6; a set of control actions where each action deals with a specific function):
a second value associated with a signal transmission power ([0171], lines 10-24; transmission power is considered, which is a direct measurement of the signal power level at which signals are transmitted by the UE or BS, in addition to other forms of power measurements, where the sequence of the values, first or second among the control data, can be varied),
a level of priority associated with scheduling data transmission associated with user equipment connected to the base station,
a cyclic prefix length,
a first value associated with reference signal density, or
a third value associated with a mobility management parameter.
Yeh1 further teaches wherein the target condition includes at least one of ([0569]; the outputs are reward values based on one or more reward variables):
target latency data ([0075], lines 5-7, [0116], lines 8-13, [0195], and Table 9; the goal includes latency data as a measurement for evaluating the network performance),
target spectral efficiency of radio signals at the base station,
target reliability of the base station,
target energy efficiency level at the base station, or
target fairness values associated with allocating computing resources to user equipment at the base station.
However, Yeh1 does not explicitly teach wherein the deep reinforcement-based learning model includes a deep convolutional neural network.
Yeh2 teaches wherein the deep reinforcement-based learning model includes a deep convolutional neural network (Figs. 5b and 5c, [0103], lines 1-8; the DRL framework (NN 500c) can include a convolutional NN (CNN), deep CNN (DCN), or recurrent NN (RNN)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yeh1 to incorporate the teachings of Yeh2 (in analogous art) by including a deep convolutional neural network in the deep reinforcement-based learning model. This integration can enhance the ability of the model to extract features and handle high-dimensional data (Yeh2, [0103]).
Relevant Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Tirola et al. (US 2013/0343314 A1), Song et al. (US 2007/0211672 A1), Lee et al. (US 2022/0131764 A1), Ranganath et al. (US 2024/0259879 A1), Lee et al. (US 2022/0104113 A1), and Ranganath et al. (WO 2023/091664 A1) teach methods involving optimizing control parameters in wireless communication systems.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SANAA AL SAMAHI whose telephone number is (571)272-4171. The examiner can normally be reached M-F 8-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Asad Nawaz can be reached at (571) 272-3988. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).