Last updated: May 29, 2026
Application No. 17/891,273
ELECTRONIC DEVICE FOR PROVIDING INFORMATION FOR REINFORCEMENT LEARNING AND METHOD FOR OPERATING THEREOF

Final Rejection: §101, §103
Filed: Aug 19, 2022
Priority: Aug 19, 2021 (RE 10-2021-0109360 +1 more)
Examiner: KAPOOR, DEVAN
Art Unit: 2126
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Samsung Electronics Co., Ltd.
OA Round: 4 (Final)
Interview: Optional

This examiner has a 10% grant rate with a +16.7% interview lift (based on 10 resolved cases, 2023–2026). An interview has already been conducted in this application's prosecution history, so a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.
Examiner Intelligence

KAPOOR, DEVAN
Career Allowance Rate: 10% (1 granted / 10 resolved), -45.0% vs TC avg
Interview Lift: +16.7% (resolved cases with interview vs. without)
Avg Prosecution: 4y 4m
Currently pending: 20
Statute-Specific Performance

§103: 100.0% (+60.0% vs TC avg)
Based on career data from 10 resolved cases
Office Action

DETAILED ACTION
This action is responsive to the application filed on 09/30/2025. Claims 1-16, 22-23 are pending and have been examined. This action is Non-final.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 09/30/2025 has been entered.
 
Priority
Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, 365(c), or 386(c) is acknowledged.
Response to Arguments
Argument 1: The applicant argues that the examiner wrongly equated Shi’s Q-learning agent with both the operation determination model and the learning learner. They assert that claim 1 requires these to be distinct entities (multiple models and corresponding learners) whereas Shi only describes an internal reinforcement learning loop, not a request/response structure. The applicant also contends that Peng and Devitt do not disclose or suggest multiple learners, models, or a reward-based information exchange.
Examiner Response to Argument 1: Regarding the applicant’s argument “The Office Action improperly equates Q-learning agent = operation determination model = learning learner,” it has been fully considered. However, it is not persuasive because the updated mapping to the amended claims now explicitly distinguishes the roles of the operation determination model and the learning learner. Specifically, Katti teaches a framework where trained models executed in the RIC near-RT output operational decisions to the RAN, while distinct learners in the RIC non-RT perform model training and optimization based on feedback and data exchange over the A1 interface. This structure demonstrates separation between the learner (non-RT RIC) and the operational model (near-RT RIC), directly addressing the applicant’s contention that prior art such as Shi and Peng show only a single agent. Accordingly, the examiner respectfully refers to the detailed mapping to the amended claims, which clearly identifies how each reference collectively teaches or renders obvious the claimed distinction and functionality. Regarding the applicant's argument that the cited art is centralized and does not disclose “a plurality of operation determination models” or “a plurality of learning learners,” it has been fully considered but is not persuasive. Please see the detailed mapping to the amended claims provided above. As shown therein, Katti expressly teaches a distributed architecture involving a plurality of models and learners, where trained models and policy messages generated in the non-RT RIC are conveyed to the near-RT RIC for runtime execution, satisfying the recited “plurality of operation determination models” and corresponding “learning learners.” Further, the combined teachings of Katti, Shi, and Peng collectively disclose and suggest the claimed multi-model and multi-learner framework, as well as distinct yet corresponding learning and execution entities, as detailed in the updated mapping. 
Accordingly, the applicant’s contentions regarding the absence of a plurality of models and learners, or lack of distinct learner-model correspondence, are overcome by the evidence cited and discussed in the updated rejection.
Argument 2: The applicant argues that claim 1 is patent eligible because it is tied to a device that runs multiple learning models on a processor, not to steps done in the mind. They say the claim improves how a RAN is run by producing training data only when asked by a learner, which cuts resource waste, reduces duplicate data, allows user settings, and supports flexible learning for new or updated agents. The claim also stores RAN parameters, computes a reward, and provides it to the learner, which the applicant frames as a technical step central to reinforcement learning rather than mere data handling. They add that managing and training multiple models in parallel boosts RAN efficiency and overall network performance, so the claim integrates any abstract idea into a practical application and amounts to significantly more. On that basis, they ask that the 101 rejection be withdrawn.
Examiner Response to Argument 2: The examiner has considered the applicant’s arguments above; however, they do not overcome the mapping on record. As amended, claim 1 still recites receiving a request from a learning learner, identifying parameter values and information in response, computing a reward from those values, producing information for learning an operation determination model using the values and reward, and providing that information. Under Step 2A Prong 1, these are mental steps and mathematical concepts even if executed on a processor. Naming multiple operation determination models distinct from learners does not supply a specific model architecture, data structure, or algorithm that changes how the computer or RAN operates. Under Step 2A Prong 2, the claim remains result oriented: the asserted benefits, including reduced waste, avoidance of duplicate data, and support for user settings, are outcomes, while the recited actions are generic data storage, selection, calculation, formatting, and transmission. Under Step 2B, these additional elements are well-understood, routine, and conventional, and the change from providing to producing and providing does not add a technical improvement. Arguments about efficiency, parallel learners, or novelty do not establish eligibility. The section 101 rejection of claims 1, 14, 15, 16, 22, and 23 is therefore maintained. Likewise, the system and non-transitory computer readable medium claims merely implement these abstract data handling steps on generic processors and memories and therefore do not add an inventive concept.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition
of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the
conditions and requirements of this title. 
Claims 1-16, 22-23 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding claim 1, 
Step 1: The claim is directed to a method, which is considered to be a process. The claim satisfies step 1.
Step 2A Prong 1: 
“in response to the request, identifying, among the at least one value, a value corresponding to at least one first parameter corresponding to the operation determination model and information associated with an operation corresponding to the at least one first parameter;” -- This limitation is directed to identifying, in response to a request, a value that corresponds to a parameter of the operation determination model, together with information associated with an operation corresponding to that parameter. Identifying values from stored data upon receiving a request is a process that can be performed in the human mind using evaluation, observation, and judgement, and thus the limitation is directed to a mental process.
“producing the information for learning the operation determination model based on the value corresponding to the at least one first parameter, the information associated with the operation corresponding to the at least one first parameter, and a reward value identified based on at least a part of the value corresponding to the at least one first parameter” -- The limitation is directed to producing the information for learning the model based on the values and information that correspond to the first parameter, as well as on a reward value identified based on at least a part of the value that corresponds to the at least one first parameter. The limitation involves the mathematical concept of computing a new value from existing values, and thus the limitation is directed to a mathematical concept.
Step 2A Prong 2 and Step 2B: 
“A method of operating an electronic device, the method comprising:…wherein a value corresponding to one or more parameters among the plurality of parameters is used based on a plurality of operation determination models executable by the electronic device to determine at least a part of the information associated with the operation, wherein each of the plurality of operation determination models is configured to... and each of the plurality of operation determination models is distinct from a respective learning learner and corresponds to the respective learning learner among a plurality of learning learners;… from a learning learner among the plurality of learning learners for learning the operation determination model” – The limitation recites that a value corresponding to the parameters is used based on operation determination models executable by the electronic device to determine at least a part of the information, and that the operation determination models are configured to perform additional functions. The limitation amounts to mere instructions to apply the exception on a computer, and thus it does not integrate the exception into a practical application, nor does it provide significantly more than the judicial exception (see MPEP 2106.05(f)).
“storing at least one value corresponding to each of a plurality of parameters associated with a radio access network (RAN), and information associated with an operation performed by the RAN,” -- The limitation recites storing values of parameters associated with a network and storing information associated with operations performed by the network. Storing data amounts to mere data gathering, which is an insignificant, extra-solution activity that cannot be integrated into a practical application (see MPEP 2106.05(g)). Furthermore, under step 2B, the act of storing and/or retrieving data in memory and electronic recordkeeping is a well-understood, routine, and conventional activity that cannot provide significantly more than the judicial exception (see MPEP 2106.05(d)(II)).
“receiving…a request for information for learning an operation determination model selected from among the plurality of operation determination models…and providing the information for learning the operation determination model to the learning learner” -- The limitation recites receiving a request for information for learning an operation determination model that is selected from among multiple operation determination models, as well as providing the information for learning the model to the learning learner. The limitation is directed to an insignificant, extra-solution activity that cannot be integrated into a practical application (see MPEP 2106.05(g)). Furthermore, under Step 2B, the act of sending and receiving data and requests over a network is a well-understood, routine, and conventional (WURC) activity, and does not provide significantly more than the judicial exception (see MPEP 2106.05(d)(II)).
Thus, claim 1 is non-patent eligible. Claims 14 and 22 are analogous to claim 1, the main difference being the type of claim. The same rejection applies to both claims. For claim 22, the main difference is the claim type and the below limitation in the preamble:
“A non-transitory computer-readable storage medium for storing instructions which, when executed individually and/or collectively by at least one processor of an electronic device, control the electronic device to perform:” -- The limitation recites a non-transitory CRM storing instructions that, when executed, control the electronic device to perform the recited steps. The limitation amounts to no more than mere instructions to apply the exception on a computer, and it does not integrate the exception into a practical application, nor does it provide significantly more than the judicial exception (see MPEP 2106.05(f)).
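
For illustration only, the sequence of steps recited in claim 1 (storing RAN parameter values and operations, receiving a learner's request, identifying the values for the requested model's parameters, identifying a reward, and producing and providing the learning information) can be sketched as below. This is a minimal sketch under stated assumptions and is not the applicant's implementation: the names `RanStore`, `MODEL_PARAMS`, `reward_from`, and `handle_request`, the parameter names, and the reward scheme are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical mapping: which stored parameters each operation determination model uses.
MODEL_PARAMS = {
    "handover_model": ["rsrp", "cell_load"],
    "power_model": ["cell_load", "tx_power"],
}


@dataclass
class RanStore:
    """Holds the latest value per RAN parameter and the operation the RAN performed."""
    values: dict = field(default_factory=dict)
    operation: str = ""

    def store(self, values, operation):
        self.values.update(values)
        self.operation = operation


def reward_from(selected):
    # Assumed reward scheme, for illustration only: lower cell load is better.
    return 1.0 - selected.get("cell_load", 0.0)


def handle_request(store, model_name):
    """On a learner's request, produce the learning information for one model."""
    params = MODEL_PARAMS[model_name]
    # Identify only the stored values that correspond to this model's parameters.
    selected = {p: store.values[p] for p in params if p in store.values}
    return {
        "model": model_name,
        "values": selected,
        "operation": store.operation,
        "reward": reward_from(selected),
    }


store = RanStore()
store.store({"rsrp": -95.0, "cell_load": 0.25, "tx_power": 20.0}, "handover_to_cell_7")
info = handle_request(store, "handover_model")
print(info)
```

In this sketch the learning information is assembled only when a request arrives, which mirrors the request-driven production the applicant emphasizes in Argument 2.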

Regarding claim 2, 
Step 1: The claim is directed to a method, which is considered to be a process. The claim satisfies step 1.
Step 2A Prong 1:
“classifying” – The limitation is directed to classifying, in the context and broadest reasonable interpretation of the claim as a whole, values that will correspond to a parameter and information associated with the operation performed by the computer. The act of classifying is directed to a process that can be performed using evaluation, judgement, and observation in the human mind with aid of pen and paper, and thus is directed to a mental process. 
Step 2A Prong 2 and Step 2B: 
“The method of claim 1, wherein the storing of the at least one value corresponding to each of the plurality of parameters associated with the RAN, and the information associated with the operation performed by the RAN comprises: and storing, for each of a plurality of points in time, the at least one value corresponding to each of the at least one parameter and the information associated with the operation performed by the RAN.” – The limitation recites that the storing of the values that correspond to a parameter of the (neural network/computer) comprises storing, for multiple points in time, the values that correspond to a parameter and the information associated with an operation of the computer (network), which is considered to be an insignificant, extra-solution activity that cannot be integrated into a practical application (see MPEP 2106.05(g)). Furthermore, under step 2B, the act of storing and/or retrieving data from memory (RAN) and electronic recordkeeping is a well-understood, routine, and conventional activity, and cannot provide significantly more than the judicial exception (see MPEP 2106.05(d)(II)).
Thus, claim 2 is non-patent eligible. 

Regarding claim 3, 
Step 1: The claim is directed to a method, which is considered to be a process. The claim satisfies step 1.
There are no elements to be evaluated under Step 2A Prong 1.
Step 2A Prong 2 and Step 2B: 
“The method of claim 1, further comprising: obtaining, from the learning learner, the operation determination model updated based on the provided information; obtaining a new value corresponding to the at least one first parameter from the RAN; obtaining information associated with a new operation which is a result obtained by applying the new value corresponding to the at least one first parameter to the updated operation determination model;” – The limitation recites obtaining, from a learning learner, an operation determination model that has been updated based on the provided information. The limitation involves updating a model (similar to updating an activity log) and obtaining that model based on gathered information. The limitation goes on to recite obtaining a new value that corresponds to a parameter from the network (RAN), and lastly obtaining information associated with a new operation that results from applying the new value to the updated model. All of the limitations recited above are considered to be insignificant, extra-solution activity that cannot be integrated into a practical application (see MPEP 2106.05(g)). Furthermore, under step 2B, the act of updating a model and mere data gathering is considered to be directed to electronic recordkeeping, which is a well-understood, routine, and conventional activity that cannot provide significantly more than the judicial exception (see MPEP 2106.05(d)(II)).
“providing the information associated with the new operation to the RAN.” – The limitation recites providing information that is associated with a new operation to the network. Sending/receiving data over a network and merely outputting data is considered to be an insignificant, extra-solution activity that cannot be integrated into a practical application (see MPEP 2106.05(g)). Furthermore, under step 2B, the act of transmitting data over a network is a well-understood, routine, and conventional (WURC) activity, which cannot provide significantly more than the judicial exception (see MPEP 2106.05(d)(II)).
Thus, claim 3 is non-patent eligible. 

Regarding claim 4,  
Step 1: The claim is directed to a method, which is considered to be a process. The claim satisfies step 1.
Step 2A Prong 1:
“identifying a parameter used by the updated operation determination model as at least one second parameter which is at least partially different from the at least one first parameter” – The limitation is directed to identifying a parameter that is to be a parameter that is different from the first parameter. Identifying a parameter is a process that can be performed in the human mind using evaluation, observation, and judgment, thus the limitation is considered to be a mental process.
Step 2A Prong 2 and Step 2B:
The majority of the limitations in this claim are analogous to those of claim 3. The following limitations are considered analogous and thus face the same rejection as recited for claim 3 (insignificant, extra-solution activity under MPEP 2106.05(g) and WURC under MPEP 2106.05(d)(II)):
“obtaining, from the learning learner, the operation determination model updated based on the provided information; obtaining a value corresponding to the at least one second parameter from the RAN; obtaining information associated with a new operation which is a result obtained by applying the value corresponding to the at least one second parameter to the updated operation determination model; and providing the information associated with the new operation to the RAN.” 
Thus, claim 4 is non-patent eligible. 

Regarding claim 5, 
Step 1: The claim is directed to a method, which is considered to be a process. The claim satisfies step 1.
Step 2A Prong 1:  
“The method of claim 4, further comprising: identifying a new request for information for learning the operation determination model; in response to the new request, identifying a value corresponding to the at least one second parameter and information associated with an operation corresponding to the at least one second parameter;” – The limitation is directed to identifying a request for information and, in response to the request, identifying a value that corresponds to a parameter and the associated information. These are all processes that can be performed in the human mind using evaluation, observation, and judgment, with the aid of pen and paper, and thus the limitation is directed to a mental process.
Step 2A Prong 2 and Step 2B: 
“providing, as new information, the value corresponding to the at least one second parameter, the information associated with the operation corresponding to the at least one second parameter, and a reward value identified based on at least a part of the value corresponding to the at least one second parameter.” – The limitation recites types of information (aka data) and the value that corresponds to a parameter. Providing information and values that correspond to parameters in the RAN is directed to selecting particular data to be manipulated and will also involve mere data gathering, which is an insignificant, extra-solution activity, and it cannot be integrated to a practical application (see MPEP 2106.05(g)). Furthermore, under step 2B, the act of transmitting data over a network is a well-understood, routine, and conventional activity (WURC), and it cannot provide significantly more than the judicial exception (see MPEP 2106.05(d)(II)). 
Thus, claim 5 is non-patent eligible.

Regarding claim 6, 
Step 1: The claim is directed to a method, which is considered to be a process. The claim satisfies step 1.
Step 2A Prong 1: 
“The method of claim 1, wherein, in response to the request, the identifying of the value corresponding to the at least one first parameter corresponding to the operation determination model among the at least one value and the information associated with the operation corresponding to the at least one first parameter comprises:” –The limitation is analogous to claim 1’s limitation: “in response to the request, identifying, among the at least one value, a value corresponding to at least one first parameter corresponding to the operation determination model and information associated with an operation corresponding to the at least one first parameter;” which was directed to a mental process, and thus the limitation of claim 6 is also directed to a mental process. 
“identifying whether at least one value corresponding to each of the plurality of parameters supports the at least one first parameter, and based on the at least one first parameter being supported, identifying the value corresponding to the at least one first parameter corresponding to the operation determination model and the information associated with the operation corresponding to the at least one first parameter.” – The limitation is directed to identifying whether a value corresponds to a parameter and based on the supported parameter, identifying a value that corresponds to an operation of the model and information associated with an operation that corresponds to a parameter. Identifying a value and determining if that value should correspond with other data or values is directed to a process that can be performed in the human mind, and thus is directed to a mental process. 
There are no elements to be evaluated under Step 2A Prong 2 and Step 2B. 
Thus, claim 6 is non-patent eligible. Claim 15 is analogous to claim 6, aside from the added limitation below. The majority of claim 6’s mapping applies to claim 15. For claim 15, under Step 2A Prong 2 and Step 2B, the added limitation “instructions, when executed by the at least one processor, cause the electronic device to:” is directed to mere instructions to apply the exception on a computer, and thus does not integrate the exception into a practical application, nor does it provide significantly more than the judicial exception (see MPEP 2106.05(f)).

Regarding claim 7, 
Step 1: The claim is directed to a method, which is considered to be a process. The claim satisfies step 1.
Step 2A Prong 1: 
“The method of claim 1, wherein the producing the information for learning the operation determination model comprises: producing the information for learning the operation determination model based on the value corresponding to the at least one first parameter, the information associated with the operation corresponding to the at least one first parameter, and an entirety of the value corresponding to the at least one first parameter, and an entirety of a reward value identified based on the entirety of the value” -- The limitation is directed to producing information for learning the operation determination model based on the value that corresponds to the first parameter, the information associated with the operation that corresponds to that parameter, the entirety of the value, and a reward value identified based on the entirety of the value. The limitation is directed to a process that can be performed in the human mind (with aid of pen and paper), and thus the limitation is directed to a mental process.
There are no elements to be evaluated under Step 2A Prong 2 and Step 2B. 
Thus, claim 7 is non-patent eligible. 

Regarding claim 8, 
Step 1: The claim is directed to a method, which is considered to be a process. The claim satisfies step 1.
Step 2A Prong 1: 
“The method of claim 1, wherein the producing information for learning the operation determination model comprises: selecting a part among the value corresponding to the at least one first parameter, the information associated with the operation corresponding to the at least one first parameter, and the value corresponding to the at least one first parameter, and producing the information for learning the operation determination model based on the selected part, and a reward value identified based on the selected part to the learning learner as the information for learning the operation determination model.” -- The limitation is directed to producing information for learning the model by selecting a part from among the value that corresponds to a parameter and the information associated with the corresponding operation, and by using a reward value identified based on the selected part. The limitation is directed to a process that can be performed in the human mind using evaluation, observation, and judgement (with aid of pen and paper), and thus the limitation is directed to a mental process.
There are no elements to be evaluated under Step 2A Prong 2 and Step 2B.
Thus, claim 8 is non-patent eligible. Claim 16 is analogous to claim 8, aside from the added limitation below. The majority of claim 8’s mapping applies to claim 16. For claim 16, under Step 2A Prong 2 and Step 2B, the added limitation “instructions, when executed by the at least one processor, cause the electronic device to:” is directed to mere instructions to apply the exception on a computer, and thus does not integrate the exception into a practical application, nor does it provide significantly more than the judicial exception (see MPEP 2106.05(f)).

Regarding claim 9, 
Step 1: The claim is directed to a method, which is considered to be a process. The claim satisfies step 1.
Step 2A Prong 1:
“selecting the part based on priority of each of the at least one first parameter, selecting the part based on a point in time at which each value corresponding to the at least one first parameter is obtained, or selecting the part in a random manner” – The limitation is directed to selecting a part of the values that correspond to a parameter based on priority, based on the point in time at which each value was obtained, or at random. All of these selections are capable of being performed in the human mind using evaluation, observation, and judgement, with the aid of pen and paper, and thus the limitation is directed to a mental process.
Step 2A Prong 2 and Step 2B:
“The method of claim 8, wherein the selecting of the value corresponding to the at least one first parameter, the information associated with the operation corresponding to the at least one first parameter, and the part of the value corresponding to the at least one first parameter comprises:  selecting the part among the value corresponding to the at least one first parameter, the information associated with the operation corresponding to the at least one first parameter, and the value corresponding to the at least one first parameter based on at least one operation among:” – The limitation is very similar to the limitations already addressed in claim 1 and claim 8 above, with the added limitation of selecting a part of a value that corresponds to a parameter based on priority, based on a point in time, or at random. This does not amount to more than merely limiting the field of use and/or particular environment of the value, and it cannot be integrated into a practical application, nor does it provide significantly more than the judicial exception (see MPEP 2106.05(h)).
Thus, claim 9 is non-patent eligible. 
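
For illustration only, the three alternative selection criteria recited in claim 9 (priority, point in time at which a value was obtained, or random selection) might look like the following. This is a hedged sketch, not the applicant's implementation: the record layout, the parameter names, and the function names `select_by_priority`, `select_by_time`, and `select_random` are hypothetical, and the convention that a lower number means higher priority is an assumption.

```python
import random

# Each record: (parameter name, value, priority, timestamp). All names are hypothetical.
records = [
    ("rsrp", -95.0, 2, 100),
    ("cell_load", 0.25, 1, 120),
    ("tx_power", 20.0, 3, 110),
]


def select_by_priority(recs, k):
    # Assumption: a lower number means a higher priority.
    return sorted(recs, key=lambda r: r[2])[:k]


def select_by_time(recs, k):
    # Keep the k most recently obtained values.
    return sorted(recs, key=lambda r: r[3], reverse=True)[:k]


def select_random(recs, k, seed=None):
    # A seed makes the random choice repeatable; None uses system entropy.
    rng = random.Random(seed)
    return rng.sample(recs, k)


print([r[0] for r in select_by_priority(records, 2)])
print([r[0] for r in select_by_time(records, 2)])
```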

Regarding claim 10, 
Step 1: The claim is directed to a method, which is considered to be a process. The claim satisfies step 1.
Step 2A Prong 1: 
“The method of claim 1, further comprising: identifying the reward value based on a reward determination scheme and at least a part of the value corresponding to the at least one first parameter,” – The limitation is directed to identifying a reward value based on a scheme and a part of a value that will correspond to a parameter. Identifying values based on a scheme and a value based on data can be performed in the human mind with aid of pen and paper, thus the limitation is directed to a mental process. 
Step 2A Prong 2 and Step 2B: 
“wherein the reward determination scheme is stored in advance in the electronic device or is received by the electronic device.” – The limitation recites that the determination scheme will either be stored in an electronic device (computer/network) or received by the device. This limitation is directed to mere data gathering, which is an insignificant, extra-solution activity that cannot be integrated into a practical application (see MPEP 2106.05(g)). Furthermore, under step 2B, the acts of storing data and receiving data over a network are both considered well-understood, routine, and conventional (WURC) activities that cannot provide significantly more than the judicial exception (see MPEP 2106.05(d)(II)).
Thus, claim 10 is non-patent eligible. 

Regarding claim 11, 
Step 1: The claim is directed to a method, which is considered to be a process. The claim satisfies step 1.
Step 2A Prong 1:
“The method of claim 1, wherein, in response to the request, the identifying of the value corresponding to the at least one first parameter corresponding to the operation determination model among the at least one value, and the information associated with the operation corresponding to the at least one first parameter comprises; identifying the at least one first parameter declared by the operation determination model.” – This limitation is directed to identifying, in response to a request, a value that corresponds to a parameter of the operation determination model, as well as identifying a parameter that was declared by the operation determination model. Identifying values from stored data upon receiving a request, using evaluation and observation, is directed to a mental process.
There are no elements to be evaluated under Step 2A Prong 2 and Step 2B.  
Thus, claim 11 is non-patent eligible. 

Regarding claim 12, 
Step 1: The claim is directed to a method, which is considered to be a process. The claim satisfies step 1.
Step 2A Prong 1:
“The method of claim 1, wherein, in response to the request, the identifying of the value corresponding to the at least one first parameter corresponding to the operation determination model among the at least one value, and the information associated with the operation corresponding to the at least one first parameter comprises: identifying the at least one first parameter based on an external input.” – This limitation is directed to identifying, in response to a request, a value that corresponds to a parameter of the operation determination model, as well as identifying a parameter based on an input. Identifying values from stored data upon receiving a request, using evaluation and observation, is directed to a mental process.
There are no elements to be evaluated under Step 2A Prong 2 and Step 2B.
Thus, claim 12 is not patent eligible.

Regarding claim 13, 
Step 1: The claim is directed to a method, which is considered to be a process. The claim satisfies step 1.
There are no elements to be evaluated under Step 2A Prong 1. 
Step 2A Prong 2 and Step 2B: 
“The method of claim 1, wherein the information for learning the operation determination model comprises: a value corresponding to the at least one first parameter at a first point in time, an operation performed by the RAN at the first point in time, a value corresponding to the at least one first parameter at a second point in time after the first point in time according to a result of performing the first operation, and a reward value at the first point in time.” – The limitation recites a value corresponding to a parameter at a first point in time, an operation performed by the RAN at that first point in time, a value corresponding to the parameter at a later second point in time that results from performing the operation, and a reward value. These limitations merely further limit the claim to a particular field of use or technological environment; they neither integrate the judicial exception into a practical application nor provide significantly more than the judicial exception (see MPEP 2106.05(h)).
Thus, claim 13 is not patent eligible.
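For orientation only, the four items recited in claim 13 have the shape of a conventional reinforcement-learning transition record (state, action, next state, reward). The following is a minimal Python sketch under stated assumptions: the names `Transition` and `make_transition`, the parameter key `prb_utilization`, the operation string, and the reward function are all hypothetical illustrations and are not taken from the application, Katti, or Devitt.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Params = Dict[str, float]


@dataclass(frozen=True)
class Transition:
    state: Params       # value(s) of the first parameter at the first point in time
    operation: str      # operation performed by the RAN at the first point in time
    next_state: Params  # value(s) at the second point in time, after the operation
    reward: float       # reward value associated with the first point in time


def make_transition(before: Params, operation: str, after: Params,
                    reward_fn: Callable[[Params, Params], float]) -> Transition:
    # The reward is derived from the parameter values themselves (an assumption
    # made here only so that the example is self-contained).
    return Transition(dict(before), operation, dict(after), reward_fn(before, after))


# Hypothetical example: load on a cell drops after an offloading operation.
example = make_transition(
    {"prb_utilization": 0.9},
    "offload_to_neighbor_cell",
    {"prb_utilization": 0.6},
    lambda b, a: b["prb_utilization"] - a["prb_utilization"],
)
```

The sketch only illustrates how the recited items relate to one another; it takes no position on how the application actually structures or computes them.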

Regarding claim 23, 
Step 1: The claim is directed to a method, which is considered to be a process. The claim satisfies step 1.
There are no elements to be evaluated under Step 2A Prong 1.
Step 2A Prong 2 and Step 2B: 
“The method of claim 1, wherein the information for learning the operation determination model is produced temporarily, and wherein the method further comprises: after the information for learning the operation determination model is provided to the learning learner, deleting the information for learning the operation determination model.” – The limitation recites that the information for learning the operation determination model is produced temporarily and that, after the information is provided to the learning learner, the information is deleted. The limitation amounts to no more than further limiting the claim to a field of use or technological environment; it does not integrate the judicial exception into a practical application, nor does it provide significantly more than the judicial exception (see MPEP 2106.05(h)).
Thus, claim 23 is not patent eligible.
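As a purely illustrative aid, the produce-temporarily, provide, then delete sequence recited in claim 23 might look like the following Python sketch. The function name `provide_then_delete`, the use of a temporary JSON file, and the `send` callback standing in for the learning learner are assumptions made for illustration; the application does not state that the information is held in a file or serialized in this way.

```python
import json
import os
import tempfile


def provide_then_delete(learning_info, send):
    """Hold learning info in a temporary file, hand it to `send`, then delete it."""
    fd, path = tempfile.mkstemp(suffix=".json")
    try:
        # Produce the information temporarily.
        with os.fdopen(fd, "w") as fh:
            json.dump(learning_info, fh)
        # Provide it to the (hypothetical) learner callback.
        with open(path) as fh:
            send(json.load(fh))
    finally:
        # Delete the temporary information whether or not sending succeeded.
        os.remove(path)
    return path


received = []
path = provide_then_delete(
    {"state": [0.9], "operation": "op_a", "reward": 0.3}, received.append
)
```

After the call, `received` holds the provided record and the temporary file at `path` no longer exists.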


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this
Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not
identically disclosed as set forth in section 102, if the differences between the claimed invention and the
prior art are such that the claimed invention as a whole would have been obvious before the effective filing
date of the claimed invention to a person having ordinary skill in the art to which the claimed invention
pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are
summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness. 
Claims 1, 11-12, 14-15, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over NPL reference “O-RAN: Towards an Open and Smart RAN. White Paper.”, by Katti et al. (referred to herein as Katti) in view of US-8255524-B2, by Devitt et al. (referred to herein as Devitt).

Regarding claim 1, Katti teaches:
A method of operating an electronic device, the method comprising: storing at least one value corresponding to each of a plurality of parameters associated with a radio access network (RAN), and information associated with an operation performed by the RAN, ([Katti, page 11] “The RIC near-RT functions leverage a database called the Radio-Network Information Base (R-NIB) which captures the near real-time state of the underlying network via E2 and commands from RIC non-RT via A1.”, and [Katti, page 13] “Network & UE-level information/context exposure from eNB/gNB to RIC non-RT to support various requirements such as network management, online learning and offline training of AI/ML models and driving non-RT optimization into the network.”, wherein the examiner interprets the Radio-Network Information Base (R-NIB) capturing the near real-time state of the network and the network and UE-level information/context exposure between RIC near-RT and RIC non-RT to be the same as storing at least one value corresponding to each of a plurality of parameters associated with a RAN and information associated with an operation performed by the RAN, because they are both directed to the storage and maintenance of RAN-related parameters (such as network state, UE context, and operational data) that reflect ongoing operations of the radio access network.)
wherein a value corresponding to one or more parameters among the plurality of parameters is used based on a plurality of operation determination models executable by the electronic device to determine at least a part of the information associated with the operation, wherein each of the plurality of operation determination models is configured to output a respective operation to the RAN for the RAN to operate based on the respective operation, and each of the plurality of operation determination models is distinct from a respective learning learner and corresponds to the respective learning learner among a plurality of learning learners; ([Katti, page 11] “Trained models and real-time control functions produced in the RIC non-RT are distributed to the RIC near-RT for runtime execution…Messages generated from AI-enabled policies and ML based training models in RIC non-RT are conveyed to RIC near-RT. The core algorithm of RIC non-RT is developed and owned by operators. It provides the capability to modify the RAN behaviors by deployment of different models optimized to individual operator policies and optimization objectives…While the E2 interface feeds data, including various RAN measurements, to the RIC near-RT to facilitate radio resource management, it is also the interface through which the RIC near-RT may initiate configuration commands directly to CU/DU.”, and [Katti, page 12] “With the amount of L1/L2/L3 data collected from eNB/gNB (including CU/DU), useful data features and models can be learned to empower the intelligent management and control in RAN.”, wherein the examiner interprets the RIC non-RT [non-Real Time] performing model-training and conveying trained models and policy messages to the RIC near-RT [near Real Time] and the RIC near-RT executing those models and initiating configuration commands to CU/DU, to be the same as “a plurality of operation determination models executable by the electronic device that output respective operations to the RAN” and the RIC non-RT (learning component) being distinct from but corresponding to the executed models in the RIC near-RT, because they are both directed to a separated learner-model pipeline where multi-layer RAN parameters (L1/L2/L3 features) are used to train models and the executed models produce operational outputs that reconfigure and control the RAN.)
receive, from a learning learner among the plurality of learning learners for learning the operation determination model, a request for information for learning an operation determination model selected from among the plurality of operation determination models ([Katti, page 11] “Messages generated from AI-enabled policies and ML based training models in RIC non-RT are conveyed to RIC near-RT” and [Katti, page 13] “The A1 interface supports communication & information exchange between Orchestration/NMS layer containing RIC non-RT and eNB/gNB containing RIC near-RT. Key functions that the A1 interface is expected to provide include: Network & UE-level information/context exposure from eNB/gNB to RIC non-RT to support various requirements such as network management, online learning and offline training of AI/ML models and driving non-RT optimization into the network. Support for policy-based guidance of RIC near-RT functions/use-cases, deploying/updating AI/ML models into RIC near-RT, and feedback mechanisms from RIC near-RT to ensure SLAs.”, wherein the examiner interprets “messages ..are conveyed” and “the A1 interface supporting communication and information exchange between RIC non-RT and RIC near-RT”, including the exposure of network and UE-level information and the deployment and updating of AI/ML models, to be the same as receiving, from a learning learner among the plurality of learning learners, a request for information for learning an operation determination model, because they are both directed to an explicit message-based exchange between the learning learner (non-RT RIC) requesting and receiving information and the electronic device (near-RT RIC) providing data and model-related updates for learning and training operations.)
and information associated with an operation corresponding to the at least one first parameter; ([Katti, page 11] “E2 interface feeds data, including various RAN measurements, to the RIC near-RT to facilitate radio resource management, it is also the interface through which the RIC near-RT may initiate configuration commands directly to CU/DU.”, wherein the examiner interprets the RIC near-RT obtaining RAN measurement data and issuing configuration commands to be the same as obtaining information associated with an operation corresponding to at least one first parameter, because they are both directed to using network-level operational data (RAN measurements) and control actions (configuration commands) that correspond to the identified parameters for adaptive model-based operation.)
provide the produced information for learning the operation determination model to the learning learner. ([Katti, page 11] “RIC non-RT can distribute well-trained user mobility and traffic prediction models to the RIC near-RT so that near-real-time predictions and decisions related to user mobility and traffic load are efficiently executed…In a similar fashion, E2 interface can be leveraged to fetch data feeds from the radio nodes and provide those to the RIC non-RT to train AI models.”, wherein the examiner interprets the distribution of trained models and the provision of data feeds between the RIC near-RT and RIC non-RT to be the same as “providing the information for learning the operation determination model to the learning learner”, because they are both directed to transmitting model-related training information and performance data between an executing component and a learning component, enabling continuous exchange for learning and model improvement.).
Katti does not teach in response to the request, identifying, among the at least one value, a value corresponding to at least one first parameter corresponding to the operation determination model…produce the information for learning the operation determination model based on the value corresponding to the at least one first parameter, the information associated with the operation corresponding to the at least one first parameter, and a reward value identified based on at least a part of the value corresponding to the at least one first parameter. 
Devitt teaches: 
in response to the request, identifying, among the at least one value, a value corresponding to at least one first parameter corresponding to the operation determination model ([Devitt, page 6] “A sensitivity analysis can be used to determine which events...have the strongest influence on the KPI...to perform a root cause analysis of predicted or actual KPI violations”, wherein the examiner interprets identifying the most influential inputs for a given KPI model (via sensitivity analysis) to be the same as “identifying a value corresponding to at least one first parameter corresponding to the operation determination model,” because both refer to locating specific parameters relevant to a given model’s function or training.)
produce the information for learning the operation determination model based on the value corresponding to the at least one first parameter, the information associated with the operation corresponding to the at least one first parameter, and a reward value identified based on at least a part of the value corresponding to the at least one first parameter; ([Devitt, col. 6, lines 7-9] “The utility node is particularly adapted to assign a value to each quality evaluation based on parameter (variable) value and decision combinations.”, wherein the examiner interprets the act of assigning a value to each quality evaluation based on parameter and decision combinations to be the same as “producing the information for learning the operation determination model based on the value corresponding to at least one first parameter, the information associated with the operation, and a reward value”, because they are both directed to generating a performance-based data output derived from parameters and operational decisions that quantifies how well a system performed, which is then usable for model training or learning.)
Katti, Devitt, and the instant application are analogous art because they are all directed to systems and methods for operating radio access networks (RANs) using machine learning models that dynamically determine operations based on stored parameter data, operational feedback, and model learning processes.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the model-based RAN control framework disclosed by Katti to include the sensitivity analysis disclosed by Devitt. One would be motivated to do so to effectively identify and prioritize key parameters most relevant to model performance within Katti’s RIC-based model training pipeline, as suggested by Devitt ([Devitt, page 6] “A sensitivity analysis can be used to determine which events...have the strongest influence on the KPI... to perform a root cause analysis of predicted or actual KPI violations.”). 

Regarding claim 11, Katti and Devitt teach the method of claim 1 (see rejection of claim 1).
Devitt further teaches:
 wherein, in response to the request, the identifying of the value corresponding to the at least one first parameter corresponding to the operation determination model among the at least one value, and the information associated with the operation corresponding to the at least one first parameter comprises:  ([Devitt et al., page 6] “A sensitivity analysis can be used to determine which events... have the strongest influence on the KPI... to perform a root cause analysis of predicted or actual KPI violations”, wherein the examiner interprets identifying the most influential inputs for a given KPI model (via sensitivity analysis) to be the same as “the identifying of the value corresponding to the at least one first parameter corresponding to the operation determination model” because both refer to locating specific parameters relevant to a given model’s function or training.)
identifying the at least one first parameter declared by the operation determination model. ([Devitt, col 6, lines 43-55] “A Bayesian Network comprises as referred to above, a DAG structure with nodes representing statistical variables such as performance counters and the arcs represent the influential relationships between these nodes. In addition thereto there is an associated conditional probability distribution over said statistical variables, for example performance counters. The conditional probability distribution encodes the probability that the variables assume their different values given the values of other variables in the BN. According to different embodiments the probability distribution is assigned by an expert, learnt off-line from historical data or learnt on-line incrementally from a live feed of data. Most preferably the probabilities are learnt on-line on the network devices.”, wherein the examiner interprets “nodes representing statistical variables such as performance counters” to be the same as “identifying the at least one first parameter declared by the operation determination model”, as both describe defining parameters used within a model for determining system outcomes.)
Katti, Devitt, and the instant application are analogous art because they are all directed to systems and methods for identifying and processing parameters in decision-making models using stored or dynamically learned data.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the method of claim 1 as disclosed by Katti and Devitt to include the “nodes representing statistical variables such as performance counters” as disclosed by Devitt. One would be motivated to do so to effectively enhance the learning and identification of key parameters used in decision-making models, as suggested by Devitt (Devitt, [col 6, lines 43-55] “Most preferably the probabilities are learnt on-line on the network devices.”).


Regarding claim 12, Katti and Devitt teach the method of claim 1 (see rejection of claim 1).
Devitt further teaches:
wherein, in response to the request, the identifying of the value corresponding to the at least one first parameter corresponding to the operation determination model among the at least one value, and the information associated with the operation corresponding to the at least one first parameter comprises: ([Devitt et al., page 6] “A sensitivity analysis can be used to determine which events... have the strongest influence on the KPI... to perform a root cause analysis of predicted or actual KPI violations”, wherein the examiner interprets identifying the most influential inputs for a given KPI model (via sensitivity analysis) to be the same as “the identifying of the value corresponding to the at least one first parameter corresponding to the operation determination model” because both refer to locating specific parameters relevant to a given model’s function or training.)
identifying the at least one first parameter based on an external input. ([Devitt, col 5, lines 40-55] “The performance parameters particularly comprise performance counters. The additional or correlation parameters may comprise one or more of alarm, configuration action, KPI definition or external performance counters. In a preferred implementation a conditional probability distribution over the performance parameters (or variables) is provided which is adapted to encode the probability that the performance parameters (variables) assume different values when specific values are given for other performance variables or parameters. In one particular embodiment the arrangement is adapted to receive the probability distribution on-line although there are also other ways to provide it, for example it may be provided by an expert, or learnt off-line e.g. from historical data.”, wherein the examiner interprets “The additional or correlation parameters may comprise one or more of alarm, configuration action, KPI definition or external performance counters” to be the same as “identifying the at least one first parameter based on an external input”, as both describe determining at least one first parameter using externally provided information.)
Katti, Devitt, and the instant application are analogous art because they are all directed to methods for identifying and processing parameters in decision-making models based on externally provided inputs.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the method of claim 1 as disclosed by Katti and Devitt to include the “additional or correlation parameters may comprise one or more of alarm, configuration action, KPI definition or external performance counters” as disclosed by Devitt. One would be motivated to do so to effectively enhance adaptability in parameter identification by incorporating external inputs, as suggested by Devitt (Devitt, [col 5, lines 40-55] “In one particular embodiment the arrangement is adapted to receive the probability distribution on-line although there are also other ways to provide it, for example it may be provided by an expert, or learnt off-line e.g. from historical data.”).

Regarding claim 14, Katti teaches:
An electronic device, comprising: a storage device storing instructions; and at least one processor operatively connected to the storage device, wherein the instructions, when executed by the at least one processor, cause the electronic device to: store, in the storage device, at least one value corresponding to each of a plurality of parameters associated with a radio access network (RAN), and information associated with an operation performed by the RAN  ([Katti, page 11] “The RIC near-RT functions leverage a database called the Radio-Network Information Base (R-NIB) which captures the near real-time state of the underlying network via E2 and commands from RIC non-RT via A1.”, and [Katti, page 13] “Network & UE-level information/context exposure from eNB/gNB to RIC non-RT to support various requirements such as network management, online learning and offline training of AI/ML models and driving non-RT optimization into the network.”, wherein the examiner interprets the Radio-Network Information Base (R-NIB) capturing the near real-time state of the network and the network and UE-level information/context exposure between RIC near-RT and RIC non-RT to be the same as storing at least one value corresponding to each of a plurality of parameters associated with a RAN and information associated with an operation performed by the RAN, because they are both directed to the storage and maintenance of RAN-related parameters (such as network state, UE context, and operational data) that reflect ongoing operations of the radio access network.)
wherein a value corresponding to one or more parameters among the plurality of parameters is used based on a plurality of operation determination models executable by the electronic device to determine at least a part of the information associated with the operation, wherein each of the plurality of operation determination models is configured to output a respective operation to the RAN for the RAN to operate based on the respective operation, and each of the plurality of operation determination models is distinct from a respective learning learner and corresponds to the respective learning learner among a plurality of learning learners; ([Katti, page 11] “Trained models and real-time control functions produced in the RIC non-RT are distributed to the RIC near-RT for runtime execution…Messages generated from AI-enabled policies and ML based training models in RIC non-RT are conveyed to RIC near-RT. The core algorithm of RIC non-RT is developed and owned by operators. It provides the capability to modify the RAN behaviors by deployment of different models optimized to individual operator policies and optimization objectives…While the E2 interface feeds data, including various RAN measurements, to the RIC near-RT to facilitate radio resource management, it is also the interface through which the RIC near-RT may initiate configuration commands directly to CU/DU.”, and [Katti, page 12] “With the amount of L1/L2/L3 data collected from eNB/gNB (including CU/DU), useful data features and models can be learned to empower the intelligent management and control in RAN.”, wherein the examiner interprets the RIC non-RT [non-Real Time] performing model-training and conveying trained models and policy messages to the RIC near-RT [near Real Time] and the RIC near-RT executing those models and initiating configuration commands to CU/DU, to be the same as “a plurality of operation determination models executable by the electronic device that output respective operations to the RAN” and the RIC non-RT (learning component) being distinct from but corresponding to the executed models in the RIC near-RT, because they are both directed to a separated learner-model pipeline where multi-layer RAN parameters (L1/L2/L3 features) are used to train models and the executed models produce operational outputs that reconfigure and control the RAN.)
receive, from a learning learner among the plurality of learning learners for learning the operation determination model, a request for information for learning an operation determination model selected from among the plurality of operation determination models ([Katti, page 11] “Messages generated from AI-enabled policies and ML based training models in RIC non-RT are conveyed to RIC near-RT” and [Katti, page 13] “The A1 interface supports communication & information exchange between Orchestration/NMS layer containing RIC non-RT and eNB/gNB containing RIC near-RT. Key functions that the A1 interface is expected to provide include: Network & UE-level information/context exposure from eNB/gNB to RIC non-RT to support various requirements such as network management, online learning and offline training of AI/ML models and driving non-RT optimization into the network. Support for policy-based guidance of RIC near-RT functions/use-cases, deploying/updating AI/ML models into RIC near-RT, and feedback mechanisms from RIC near-RT to ensure SLAs.”, wherein the examiner interprets “messages ..are conveyed” and “the A1 interface supporting communication and information exchange between RIC non-RT and RIC near-RT”, including the exposure of network and UE-level information and the deployment and updating of AI/ML models, to be the same as receiving, from a learning learner among the plurality of learning learners, a request for information for learning an operation determination model, because they are both directed to an explicit message-based exchange between the learning learner (non-RT RIC) requesting and receiving information and the electronic device (near-RT RIC) providing data and model-related updates for learning and training operations.)
and information associated with an operation corresponding to the at least one first parameter; ([Katti, page 11] “E2 interface feeds data, including various RAN measurements, to the RIC near-RT to facilitate radio resource management, it is also the interface through which the RIC near-RT may initiate configuration commands directly to CU/DU.”, wherein the examiner interprets the RIC near-RT obtaining RAN measurement data and issuing configuration commands to be the same as obtaining information associated with an operation corresponding to at least one first parameter, because they are both directed to using network-level operational data (RAN measurements) and control actions (configuration commands) that correspond to the identified parameters for adaptive model-based operation.)
provide the produced information for learning the operation determination model to the learning learner. ([Katti, page 11] “RIC non-RT can distribute well-trained user mobility and traffic prediction models to the RIC near-RT so that near-real-time predictions and decisions related to user mobility and traffic load are efficiently executed … In a similar fashion, E2 interface can be leveraged to fetch data feeds from the radio nodes and provide those to the RIC non-RT to train AI models.”, wherein the examiner interprets the distribution of trained models and the provision of data feeds between the RIC near-RT and RIC non-RT to be the same as “providing the information for learning the operation determination model to the learning learner”, because they are both directed to transmitting model-related training information and performance data between an executing component and a learning component, enabling continuous exchange for learning and model improvement.).


Katti does not teach in response to the request, identify a value corresponding to at least one first parameter corresponding to the operation determination model … produce the information for learning the operation determination model based on the value corresponding to the at least one first parameter, the information associated with the operation corresponding to the at least one first parameter, and a reward value identified based on at least a part of the value corresponding to the at least one first parameter.
Devitt teaches:
in response to the request, identify a value corresponding to at least one first parameter corresponding to the operation determination model ([Devitt, page 6] “A sensitivity analysis can be used to determine which events... have the strongest influence on the KPI... to perform a root cause analysis of predicted or actual KPI violations”, wherein the examiner interprets identifying the most influential inputs for a given KPI model (via sensitivity analysis) to be the same as “identifying a value corresponding to at least one first parameter corresponding to the operation determination model,” because both refer to locating specific parameters relevant to a given model’s function or training.)
produce the information for learning the operation determination model based on the value corresponding to the at least one first parameter, the information associated with the operation corresponding to the at least one first parameter, and a reward value identified based on at least a part of the value corresponding to the at least one first parameter; ([Devitt, col. 6, lines 7-9] “The utility node is particularly adapted to assign a value to each quality evaluation based on parameter (variable) value and decision combinations.”, wherein the examiner interprets the act of assigning a value to each quality evaluation based on parameter and decision combinations to be the same as “producing the information for learning the operation determination model based on the value corresponding to at least one first parameter, the information associated with the operation, and a reward value”, because they are both directed to generating a performance-based data output derived from parameters and operational decisions that quantifies how well a system performed, which is then usable for model training or learning.)
Katti, Devitt, and the instant application are analogous art because they are all directed to ML driven RAN management in which network parameters and operational data are used to generate learning information that trains models controlling RAN operations.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the RIC based RAN control disclosed by Katti to include the “assign a value to each quality evaluation based on parameter value and decision combinations” disclosed by Devitt. One would be motivated to do so to efficiently generate training information that reflects KPI oriented performance for model learning, as suggested by Devitt (Devitt, [col. 6, lines 7-9] “assign a value to each quality evaluation based on parameter value and decision combinations.”). In addition, Devitt teaches identifying inputs with the “strongest influence on the KPI,” reinforcing the benefit of targeted learning signals (Devitt, [page 6] “strongest influence on the KPI”).

Regarding claim 15, Katti and Devitt teach the electronic device of claim 14 (see rejection of claim 14).
Devitt further teaches wherein, in response to the request, as at least a part of the identifying of the value corresponding to the at least one first parameter corresponding to the operation determination model among the at least one value and the information associated with the operation corresponding to the at least one first parameter, ([Devitt, col 8, lines 34-42] “For monitoring the KPI for a network device, the Decision Graph model subscribes to performance parameters and other events of interest, i.e. all those encoded in the Decision Graph, on the network device (or entire network). This means that any change to a performance parameter automatically is updated in the Decision Graph model. The basic functionality of a BN ensures that each individual change propagates through the Decision Graph changing the probabilities of related variables in the graph.”, wherein the examiner interprets “the Decision Graph model subscribes to performance parameters and other events of interest” to be the same as “in response to the request”, as both describe a system reacting to new input data. The examiner further interprets “any change to a performance parameter automatically is updated in the Decision Graph model” to be the same as “identifying of the value corresponding to the at least one first parameter corresponding to the operation determination model among the at least one value”, as both describe detecting changes in network parameters and integrating them into a decision-making framework. Finally, the examiner interprets “this incremental learning process means that over time the Decision Graph will be able to make predictions about future behaviour on the basis of past experience” to be the same as “the information associated with the operation corresponding to the at least one first parameter”, as both describe how historical performance data informs predictive decision-making for network optimization.)
the instructions, when executed by the at least one processor, cause the electronic device to: identify whether at least one value corresponding to each of the plurality of parameters supports the at least one first parameter, and ([Devitt, col 8, lines 34-42] “For monitoring the KPI for a network device, the Decision Graph model subscribes to performance parameters and other events of interest, i.e. all those encoded in the Decision Graph, on the network device (or entire network). This means that any change to a performance parameter automatically is updated in the Decision Graph model. The basic functionality of a BN ensures that each individual change propagates through the Decision Graph changing the probabilities of related variables in the graph.”, wherein the examiner interprets “the Decision Graph model subscribes to performance parameters and other events of interest” to be the same as “the instructions, when executed by the at least one processor, cause the electronic device to: identify whether at least one value corresponding to each of the plurality of parameters supports the at least one first parameter”, as both describe a process and instructions where system components monitor and evaluate performance parameters to determine their impact on associated variables, ensuring the identification of relevant values for decision-making.)
based on the at least one first parameter being supported, identify the value corresponding to the at least one first parameter corresponding to the operation determination model and the information associated with the operation corresponding to the at least one first parameter. [Devitt, col 8, lines 34-54] “For monitoring the KPI for a network device, the Decision Graph model subscribes to performance parameters and other events of interest, i.e. all those encoded in the Decision Graph, on the network device (or entire network). This means that any change to a performance parameter automatically is updated in the Decision Graph model. The basic functionality of a BN ensures that each individual change propagates through the Decision Graph changing the probabilities of related variables in the graph. This means that a change to a performance parameter will result in a modification of the probabilities of the values of any associated KPIs. Over time these probability estimates will stabilize and they are fine-tuned by constant updates to performance parameters which constitute a consistent supply of evidence for incremental learning algorithms of the Decision Graph. This incremental learning process means that over time the Decision Graph will be able to make predictions about future behaviour on the basis of past experience. Normally these predictions become more accurate over time as the probability estimates are fine-tuned.”, wherein the examiner interprets “the Decision Graph model subscribes to performance parameters and other events of interest, i.e. all those encoded in the Decision Graph” to be the same as “based on the at least one first parameter being supported”, as both describe a system monitoring and relying on specific performance parameters to make determinations. 
The examiner further interprets “a change to a performance parameter will result in a modification of the probabilities of the values of any associated KPIs” to be the same as “identify the value corresponding to the at least one first parameter corresponding to the operation determination model and the information associated with the operation corresponding to the at least one first parameter”, as both describe dynamically determining and adjusting values related to a model based on parameter changes in a network system.)
Katti, Devitt, and the instant application are analogous art because they are all directed to systems and methods for identifying and evaluating parameters in decision-making models based on performance data and system updates.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the electronic device of claim 14 disclosed by Katti and Devitt to include the Decision Graph model as disclosed by Devitt. One would be motivated to do so to effectively improve decision-making accuracy by continuously updating parameter values based on real-time system changes, as suggested by Devitt (Devitt, [col 8, lines 34-54] “Over time these probability estimates will stabilize and they are fine-tuned by constant updates to performance parameters which constitute a consistent supply of evidence for incremental learning algorithms of the Decision Graph.”).

Regarding claim 22, Katti teaches: 
A non-transitory computer-readable storage medium for storing instructions which, when executed individually and/or collectively by at least one processor of an electronic device, control the electronic device to perform: storing at least one value corresponding to each of a plurality of parameters associated with a radio access network (RAN), and information associated with an operation performed by the RAN, ([Katti, page 11] “The RIC near-RT functions leverage a database called the Radio-Network Information Base (R-NIB) which captures the near real-time state of the underlying network via E2 and commands from RIC non-RT via A1.”, and [Katti, page 13] “Network & UE-level information/context exposure from eNB/gNB to RIC non-RT to support various requirements such as network management, online learning and offline training of AI/ML models and driving non-RT optimization into the network.”, wherein the examiner interprets the Radio-Network Information Base (R-NIB) capturing the near real-time state of the network and the network and UE-level information/context exposure between RIC near-RT and RIC non-RT to be the same as storing at least one value corresponding to each of a plurality of parameters associated with a RAN and information associated with an operation performed by the RAN, because they are both directed to the storage and maintenance of RAN-related parameters (such as network state, UE context, and operational data) that reflect ongoing operations of the radio access network.)
wherein a value corresponding to one or more parameters among the plurality of parameters is used based on a plurality of operation determination models executable by the electronic device to determine at least a part of the information associated with the operation, wherein each of the plurality of operation determination models is configured to output a respective operation to the RAN for the RAN to operate based on the respective operation, and each of the plurality of operation determination models is distinct from a respective learning learner and corresponds to the respective learning learner among a plurality of learning learners; ([Katti, page 11] “Trained models and real-time control functions produced in the RIC non-RT are distributed to the RIC near-RT for runtime execution…Messages generated from AI-enabled policies and ML based training models in RIC non-RT are conveyed to RIC near-RT. The core algorithm of RIC non-RT is developed and owned by operators. It provides the capability to modify the RAN behaviors by deployment of different models optimized to individual operator policies and optimization objectives…While the E2 interface feeds data, including various RAN measurements, to the RIC near-RT to facilitate radio resource management, it is also the interface through which the RIC near-RT may initiate configuration commands directly to CU/DU.”, and [Katti, page 12] “With the amount of L1/L2/L3 data collected from eNB/gNB (including CU/DU), useful data features and models can be learned to empower the intelligent management and control in RAN.”, wherein the examiner interprets the RIC non-RT [non-Real Time] performing model-training and conveying trained models and policy messages to the RIC near-RT [near Real Time] and the RIC near-RT executing those models and initiating configuration commands to CU/DU, to be the same as “a plurality of operation determination models executable by the electronic device that output respective operations to the RAN” and the RIC non-RT (learning component) being distinct from but corresponding to the executed models in the RIC near-RT, because they are both directed to a separated learner-model pipeline where multi-layer RAN parameters (L1/L2/L3 features) are used to train models and the executed models produce operational outputs that reconfigure and control the RAN.)
receiving, from a learning learner among the plurality of learning learners for learning the operation determination model, a request for information to learn an operation determination model selected from among the plurality of operation determination models; ([Katti, page 11] “Messages generated from AI-enabled policies and ML based training models in RIC non-RT are conveyed to RIC near-RT” and [Katti, page 13] “The A1 interface supports communication & information exchange between Orchestration/NMS layer containing RIC non-RT and eNB/gNB containing RIC near-RT. Key functions that the A1 interface is expected to provide include: Network & UE-level information/context exposure from eNB/gNB to RIC non-RT to support various requirements such as network management, online learning and offline training of AI/ML models and driving non-RT optimization into the network. Support for policy-based guidance of RIC near-RT functions/use-cases, deploying/updating AI/ML models into RIC near-RT, and feedback mechanisms from RIC near-RT to ensure SLAs.”, wherein the examiner interprets “messages ..are conveyed” and “the A1 interface supporting communication and information exchange between RIC non-RT and RIC near-RT”, including the exposure of network and UE-level information and the deployment and updating of AI/ML models, to be the same as receiving, from a learning learner among the plurality of learning learners, a request for information for learning an operation determination model, because they are both directed to an explicit message-based exchange between the learning learner (non-RT RIC) requesting and receiving information and the electronic device (near-RT RIC) providing data and model-related updates for learning and training operations.)
and information associated with an operation corresponding to the at least one first parameter; ([Katti, page 11] “E2 interface feeds data, including various RAN measurements, to the RIC near-RT to facilitate radio resource management, it is also the interface through which the RIC near-RT may initiate configuration commands directly to CU/DU.”, wherein the examiner interprets the RIC near-RT obtaining RAN measurement data and issuing configuration commands to be the same as obtaining information associated with an operation corresponding to at least one first parameter, because they are both directed to using network-level operational data (RAN measurements) and control actions (configuration commands) that correspond to the identified parameters for adaptive model-based operation.)
providing the information for learning the operation determination model to the learning learner. ([Katti, page 11] “RIC non-RT can distribute well-trained user mobility and traffic prediction models to the RIC near-RT so that near-real-time predictions and decisions related to user mobility and traffic load are efficiently executed … In a similar fashion, E2 interface can be leveraged to fetch data feeds from the radio nodes and provide those to the RIC non-RT to train AI models.”, wherein the examiner interprets the distribution of trained models and the provision of data feeds between the RIC near-RT and RIC non-RT to be the same as “providing the information for learning the operation determination model to the learning learner”, because they are both directed to transmitting model-related training information and performance data between an executing component and a learning component, enabling continuous exchange for learning and model improvement.).
Katti does not teach: in response to the request, identifying, among the at least one value, a value corresponding to at least one first parameter corresponding to the operation determination model … producing the information for learning the operation determination model based on the value corresponding to the at least one first parameter, the information associated with the operation corresponding to the at least one first parameter, and a reward value identified based on at least a part of the value corresponding to the at least one first parameter.
Devitt teaches:
 in response to the request, identifying, among the at least one value, a value corresponding to at least one first parameter corresponding to the operation determination model ([Devitt, page 6] “A sensitivity analysis can be used to determine which events...have the strongest influence on the KPI... to perform a root cause analysis of predicted or actual KPI violations”, wherein the examiner interprets identifying the most influential inputs for a given KPI model (via sensitivity analysis) to be the same as “identifying a value corresponding to at least one first parameter corresponding to the operation determination model,” because both refer to locating specific parameters relevant to a given model’s function or training.)
producing the information for learning the operation determination model based on the value corresponding to the at least one first parameter, the information associated with the operation corresponding to the at least one first parameter, and a reward value identified based on at least a part of the value corresponding to the at least one first parameter; ([Devitt, col. 6, lines 7-9] “The utility node is particularly adapted to assign a value to each quality evaluation based on parameter (variable) value and decision combinations.”, wherein the examiner interprets the act of assigning a value to each quality evaluation based on parameter and decision combinations to be the same as “producing the information for learning the operation determination model based on the value corresponding to at least one first parameter, the information associated with the operation, and a reward value”, because they are both directed to generating a performance-based data output derived from parameters and operational decisions that quantifies how well a system performed, which is then usable for model training or learning.)
Katti, Devitt, and the instant application are analogous art because they are all directed to ML driven RAN management in which network parameters and operational data are used to generate learning information that trains models controlling RAN operations.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the RIC based RAN control disclosed by Katti to include the “assign a value to each quality evaluation based on parameter value and decision combinations” disclosed by Devitt. One would be motivated to do so to efficiently generate training information that reflects KPI oriented performance for model learning, as suggested by Devitt (Devitt, [col. 6, lines 7-9] “assign a value to each quality evaluation based on parameter value and decision combinations.”). In addition, Devitt teaches identifying inputs with the “strongest influence on the KPI,” reinforcing the benefit of targeted learning signals (Devitt, [page 6] “strongest influence on the KPI”).
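For illustration only, and not as part of the rejection or of any cited reference, the claim 22 limitations of identifying a value for the first parameter(s), and producing learning information from that value, the associated operation, and a reward value, can be sketched as follows. This is a minimal sketch under stated assumptions: the names `Record`, `produce_learning_info`, and `example_reward`, the parameter names, and the reward formula are all hypothetical and appear in neither the application, Katti, nor Devitt.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Record:
    """One stored entry: RAN parameter values plus the operation performed."""
    values: Dict[str, float]  # parameter name -> stored value
    operation: str            # operation the RAN performed (hypothetical label)


def example_reward(selected: Dict[str, float]) -> float:
    # Hypothetical reward: favors high throughput and low latency.
    return selected["throughput"] - selected["latency"]


def produce_learning_info(
    record: Record,
    first_params: List[str],
    reward_fn: Callable[[Dict[str, float]], float],
) -> dict:
    """Select only the values for the parameters the requested model uses,
    then bundle them with the operation and a reward derived from them."""
    selected = {p: record.values[p] for p in first_params}
    return {
        "values": selected,
        "operation": record.operation,
        "reward": reward_fn(selected),
    }


record = Record(
    values={"throughput": 10.0, "latency": 2.0, "cqi": 7.0},
    operation="increase_tx_power",
)
info = produce_learning_info(record, ["throughput", "latency"], example_reward)
```

In this sketch the stored `cqi` value is left out of the produced information because the hypothetical requested model does not use it, which mirrors the "value corresponding to at least one first parameter" selection recited in the claim.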

Claims 2-5 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Katti in view of Devitt, and further in view of Peng et al., US11201784B2 (referred to herein as Peng).

Regarding claim 2, Katti and Devitt teach the method of claim 1 (see rejection of claim 1).
	Katti and Devitt do not teach wherein the storing of the at least one value corresponding to each of the plurality of parameters associated with the RAN, and the information associated with the operation performed by the RAN comprises: classifying and storing, for each of a plurality of points in time, the at least one value corresponding to each of the plurality of parameters and the information associated with the operation performed by the RAN.
Peng teaches wherein the storing of the at least one value corresponding to each of the plurality of parameters associated with the RAN, and the information associated with the operation performed by the RAN comprises: classifying and storing, for each of a plurality of points in time, the at least one value corresponding to each of the plurality of parameters and the information associated with the operation performed by the RAN. ([Peng, col 7, lines 22-35] “As shown in FIG. 1, FIG. 1 is a flow chart of an artificial intelligence-based networking method for F-RANs, which may include the following steps: Step 110: a central computing logic module receives reported data which may include: measurement report data from user terminals, wireless transmission data from base stations, and operation and maintenance data from a radio access network. The measurement report data relates to user behavior information, the wireless transmission data relates to the performance indicators of the radio access network, and operation and maintenance data relates to service attributes.”, [Peng, FIG. 1] “Based on the reported data obtained during a cycle T1 and a proper machine learning algorithm, the central computing logic module configures an operating mode of the radio access network that matches the user behavior information, the service attributes, and the radio access network performance indicators ... The edge computing logic module receives the operating mode information from the central computing logic module. According to the operating mode, the edge computing logic module, during the cycle T2, determines whether the current configuration of the edge communication entity meets the networking aim.”, AND [Peng, col 13, lines 30-35] “The step 141 may further include, but is not limited to: using a deep reinforcement learning algorithm to learn the data related to the user terminals, and obtaining a strategy to perform configuration optimization of the edge communication entity.” wherein the examiner interprets “receives reported data which may include: measurement report data from user terminals, wireless transmission data from base stations, and operation and maintenance data from a radio access network” to be the same as “storing of the at least one value corresponding to each of the at least one parameter associated with the RAN, and the information associated with the operation performed by the RAN”, as both describe gathering and storing various types of network-related (or associated) data. The examiner further interprets “based on the reported data obtained during a cycle T1” to be the same as “for each of a plurality of points in time”, as both describe organizing and analyzing data over distinct time intervals. Additionally, the examiner interprets “using a deep reinforcement learning algorithm to learn the data related to the user terminals, and obtaining a strategy to perform configuration optimization of the edge communication entity” to be the same as “the information associated with the operation performed by the RAN”, as both describe leveraging stored data to optimize the operation of network components. The “plurality of parameters” is analogous to the “user behavior information, the service attributes, and the radio access network performance indicators”.)
Katti, Devitt, Peng, and the instant application are analogous art because they are all directed to storing values and data used in reinforcement learning and RAN performance optimization.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the method of claim 1 disclosed by Katti and Devitt to include the “receives reported data ... including user behavior, service attributes, and performance indicators” disclosed by Peng. One would be motivated to do so to efficiently enhance the learning model's input data richness and relevance, as suggested by Peng ([Peng, col. 7, lines 22-35] “Measurement report data relates to user behavior information, the wireless transmission data relates to the performance indicators of the radio access network, and operation and maintenance data relates to service attributes.” AND [Peng, col 13, lines 30-35] “The step 141 may further include, but is not limited to: using a deep reinforcement learning algorithm to learn the data related to the user terminals, and obtaining a strategy to perform configuration optimization of the edge communication entity.”).
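For illustration only, the claim 2 limitation of "classifying and storing, for each of a plurality of points in time" can be pictured as a store keyed by time point, where each entry holds both the parameter values and the operation performed. The sketch below is an assumption-laden example: the class `TimeIndexedStore`, its method names, and the sample parameter and operation labels are hypothetical and do not come from the application or from Peng.

```python
from collections import defaultdict
from typing import Any, Dict, Optional


class TimeIndexedStore:
    """Keeps parameter values and the RAN operation grouped by time point."""

    def __init__(self) -> None:
        # time point -> {"values": {param: value}, "operation": label or None}
        self._by_time: Dict[int, Dict[str, Any]] = defaultdict(
            lambda: {"values": {}, "operation": None}
        )

    def put_value(self, t: int, param: str, value: float) -> None:
        self._by_time[t]["values"][param] = value

    def put_operation(self, t: int, operation: str) -> None:
        self._by_time[t]["operation"] = operation

    def snapshot(self, t: int) -> Optional[Dict[str, Any]]:
        # .get() avoids creating an empty entry for an unseen time point.
        return self._by_time.get(t)


store = TimeIndexedStore()
store.put_value(1, "rsrp", -90.0)
store.put_value(1, "active_users", 12.0)
store.put_operation(1, "handover")
store.put_value(2, "rsrp", -85.0)
```

Calling `store.snapshot(1)` returns the values and the operation recorded together for time point 1, while a time point that was never written returns `None`.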

Regarding claim 3, Katti and Devitt teach the method of claim 1 (see rejection of claim 1).
	Katti and Devitt do not teach further comprising: obtaining, from the learning learner, the operation determination model updated based on the provided information; obtaining a new value corresponding to the at least one first parameter from the RAN; obtaining information associated with a new operation which is a result obtained by applying the new value corresponding to the at least one first parameter to the updated operation determination model; and providing the information associated with the new operation to the RAN.
Peng teaches: 
further comprising: obtaining, from the learning learner, the operation determination model updated based on the provided information; [Peng, col. 11, lines 18-22] “The central computing logic module receives updated reported data which includes the measurement report data from the user terminals, the wireless transmission data from the base stations, and the operation and maintenance data from the radio access network.” wherein the examiner interprets “the central computing logic module receives updated reported data” to be the same as “obtaining, from the first learning learner, the operation determination model updated based on the provided information”, as both describe receiving updated input that informs a subsequent decision-making or operational process.)
obtaining a new value corresponding to the at least one first parameter from the RAN; ([Peng, col. 18, lines 65-67, col. 19, lines 1-5] “based on the reported data obtained during the cycle T1 and the proper machine learning algorithm, the central computing logic module configures more operating modes of radio access network that match the user behavior, the service attributes, and the performance indicators of the radio access network.” wherein the examiner interprets “reported data obtained during the cycle T1 and proper machine learning algorithm” to be the same as “obtaining a new value corresponding to the at least one first parameter from the RAN”, as both describe acquiring updated network data, including performance indicators, for further processing.)
obtaining information associated with a new operation which is a result obtained by applying the new value corresponding to the at least one first parameter to the updated operation determination model; [Peng, col. 11, lines 27-33] “The edge computing logic module receives the operating mode information from the central computing logic module. According to the operating mode, the edge computing logic module, during a cycle T2, determines whether the current configuration of the edge communication entity meets the networking aim.” wherein the examiner interprets “the edge computing logic module receives the operating mode information from the central computing logic module” to be the same as “obtaining information associated with a new operation which is a result obtained by applying the new value corresponding to the at least one first parameter to the updated operation determination model”, as both describe deriving operational configurations based on newly processed data.)
and providing the information associated with the new operation to the RAN. [Peng, col. 11, lines 65-67, col. 12, lines 1-3] “Step 140: If the current configuration meets the aim, the edge computing logic module allocates resources to the user terminals connected to the edge communication entity. The edge resources communication entities and user terminals that are allocated with proper resources are networked as an F-RAN. The resources may include radio resources, computing resources, and caching resources.” wherein the examiner interprets “allocates resources to the user terminals connected to the edge communication entity” to be the same as “providing the information associated with the new operation to the RAN”, as both describe transmitting the determined operational configuration to the network for implementation.)
Katti, Devitt, Peng, and the instant application are analogous art, because they are all directed to machine learning-based configuration and optimization of RAN behavior in response to updated network observations.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the method of claim 1 disclosed by Katti and Devitt to include the “central computing logic module receives updated reported data which includes the measurement report data from the user terminals, the wireless transmission data from the base stations, and the operation and maintenance data from the radio access network” disclosed by Peng. One would be motivated to do so to effectively refine network control decisions based on fresh, granular network conditions and user behavior, as suggested by Peng (Peng, [col. 11, lines 18-22]) “The central computing logic module receives updated reported data which includes the measurement report data from the user terminals, the wireless transmission data from the base stations, and the operation and maintenance data from the radio access network.” AND [Peng, col. 11, lines 27-33] “the edge computing logic module receives the operating mode information from the central computing logic module”).
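For illustration only, the sequence recited in claim 3 (obtain the updated model, obtain a new value for the first parameter(s) from the RAN, apply that value to the updated model, and provide the resulting operation to the RAN) can be sketched as a single control step. Everything named here is hypothetical: `control_step`, the toy `updated_model`, the threshold of 0.8, and the operation labels are assumptions made for the example and are not taken from the application or from Peng.

```python
from typing import Callable, Dict, List


def control_step(
    updated_model: Callable[[Dict[str, float]], str],
    new_values: Dict[str, float],
    first_params: List[str],
    send_to_ran: Callable[[str], None],
) -> str:
    """Apply freshly obtained parameter values to the updated model and
    forward the resulting operation to the RAN."""
    # Keep only the parameters that the model actually consumes.
    inputs = {p: new_values[p] for p in first_params}
    operation = updated_model(inputs)
    send_to_ran(operation)
    return operation


def updated_model(inputs: Dict[str, float]) -> str:
    # Toy stand-in for a model returned by the learner (assumed rule).
    return "offload_traffic" if inputs["cell_load"] > 0.8 else "no_change"


sent: List[str] = []  # stands in for the interface toward the RAN
result = control_step(
    updated_model,
    new_values={"cell_load": 0.9, "rsrp": -95.0},
    first_params=["cell_load"],
    send_to_ran=sent.append,
)
```

The design choice worth noting is that the model and the RAN-facing sender are passed in as callables, so the step itself stays independent of how the learner updates the model.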

Regarding claim 4, Katti and Devitt teach the method of claim 1 (see rejection of claim 1).
	Katti and Devitt do not teach further comprising: obtaining, from the learning learner, the operation determination model updated based on the provided information; identifying a parameter used by the updated operation determination model as at least one second parameter which is at least partially different from the at least one first parameter; obtaining a value corresponding to the at least one second parameter from the RAN; obtaining information associated with a new operation which is a result obtained by applying the value corresponding to the at least one second parameter to the updated operation determination model; and providing the information associated with the new operation to the RAN.
Peng teaches:
further comprising: obtaining, from the learning learner, the operation determination model updated based on the provided information; [Peng, col 13, lines 32-35] “using a deep reinforcement learning algorithm to learn the data related to the user terminals, and obtaining a strategy to perform configuration optimization of the edge communication entity.”, wherein the examiner interprets “using a deep reinforcement learning algorithm to learn the data related to the user terminals” to be the same as “obtaining, from the learning learner, the operation determination model updated based on the provided information”, as both describe acquiring an updated learning model that incorporates previously gathered data to refine operational decisions.)
identifying a parameter used by the updated operation determination model as at least one second parameter which is at least partially different from the at least one first parameter; [Peng, col 3, lines 10-19] “during the cycle T2, monitoring, by the edge computing logic module, performance of the edge communication entity and checking whether a variation of a target performance indicator exceeds a preset threshold; if exceeds, determining that the current configuration of the edge communication entity does not meet the networking aim. Then there is a need for the edge computing logic module to optimize the current configuration of the edge communication entity.”, wherein the examiner interprets “monitoring, by the edge computing logic module, performance of the edge communication entity and checking whether a variation of a target performance indicator exceeds a preset threshold” to be the same as “identifying a parameter used by the updated operation determination model as at least one second parameter which is at least partially different from the at least one first parameter”, as both describe evaluating a parameter that has changed and requires adjustment for the updated model.)
obtaining a value corresponding to the at least one second parameter from the RAN; [Peng, col 8, lines 48-54] “Step 1. the central computing logic module monitors the measurement report data from all the user terminals in the radio access network, and checks whether the obtained quality of service and the number of active user terminals exceed respective preset thresholds.”, wherein the examiner interprets “the central computing logic module monitors the measurement report data from all the user terminals in the radio access network, and checks whether the obtained quality of service and the number of active user terminals exceed respective preset thresholds” to be the same as “obtaining a value corresponding to the at least one second parameter from the RAN”, as both describe retrieving updated parameter values from network conditions for use in further processing. )
obtaining information associated with a new operation which is a result obtained by applying the value corresponding to the at least one second parameter to the updated operation determination model; [Peng, col 13, lines 32-35] “using a deep reinforcement learning algorithm to learn the data related to the user terminals, and obtaining a strategy to perform configuration optimization of the edge communication entity.”, wherein the examiner interprets “using a deep reinforcement learning algorithm to learn the data related to the user terminals, and obtaining a strategy to perform configuration optimization of the edge communication entity” to be the same as “obtaining information associated with a new operation which is a result obtained by applying the value corresponding to the at least one second parameter to the updated operation determination model”, as both describe processing updated data within a learning model to determine a new strategy or action for system optimization. )
and providing the information associated with the new operation to the RAN. [Peng, col 3, lines 10-19] “Then there is a need for the edge computing logic module to optimize the current configuration of the edge communication entity.”, wherein the examiner interprets “optimizing the current configuration of the edge communication entity” to be the same as “providing the information associated with the new operation to the RAN”, as both describe updating the network configuration based on newly obtained information.)
Katti, Devitt, Peng, and the instant application are analogous art, because they are all directed to dynamic model-based reconfiguration of a radio access network using updated and identified operational parameters and reinforcement learning.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the method of claim 1 disclosed by Katti and Devitt to include the “monitoring, by the edge computing logic module, performance of the edge communication entity and checking whether a variation of a target performance indicator exceeds a preset threshold; if exceeds, determining that the current configuration ... does not meet the networking aim” disclosed by Peng. One would be motivated to do so to effectively detect when system performance deviates from desired metrics and dynamically update model inputs, as suggested by Peng (Peng, [col. 3, lines 10-19]) “During the cycle T2, monitoring ... and checking whether a variation of a target performance indicator exceeds a preset threshold; if exceeds, determining that the current configuration ... does not meet the networking aim.”)

Regarding claim 5, Katti, Devitt, and Peng teach the method of claim 4 (see rejection of claim 4).
Peng further teaches further comprising: identifying a new request for information for learning the operation determination model; [Peng, col 17, lines 15-21] “In the step 2, the edge computing logic module enters the trigger state to optimize the resource allocation of the edge communication entity. Here, taking the deep reinforcement learning as an example, referring to FIG. 7, the edge computing logic module performs actions according to the rewards brought by different actions in the current state”, wherein the examiner interprets “the edge computing logic module enters the trigger state to optimize the resource allocation of the edge communication entity ... taking deep reinforcement learning as an example” to be the same as “identifying a new request for information for learning the operation determination model”, as both describe a process where the system initiates an update based on a machine learning technique for optimization.) 
 in response to the new request, identifying a value corresponding to the at least one second parameter and information associated with an operation corresponding to the at least one second parameter; [Peng, col 17, lines 19-21] “the edge computing logic module performs actions according to the rewards brought by different actions in the current state” wherein the examiner interprets “the edge computing logic module performs actions according to the rewards brought by different actions in the current state” to be the same as “identifying a value corresponding to the at least one second parameter and information associated with an operation corresponding to the at least one second parameter”, as both describe evaluating parameters and corresponding operational actions in response to system conditions.)
 and providing, as new information, the value corresponding to the at least one second parameter, the information associated with the operation corresponding to the at least one second parameter, and a reward value identified based on at least a part of the value corresponding to the at least one second parameter. [Peng, col 17, lines 21-33] “The choice according to the resource allocation strategy obtained by deep reinforcement learning maximizes the benefits in a continuous time. In the step 3, after the resource adjustment is completed, the edge computing logic module monitors the performance and checks whether the networking aim is met. If it is not met, the edge computing logic module directly jumps to the cycle T2 and triggers the configuration optimization of edge communication entity; If it is met, then the edge computing logic module continues monitoring until the time reaches an integral multiple of the cycle T3, and performs the next round of resource allocation.”, wherein the examiner interprets “the choice according to the resource allocation strategy obtained by deep reinforcement learning maximizes the benefits in a continuous time” to be the same as “providing, as new information, the value corresponding to the at least one second parameter, the information associated with the operation corresponding to the at least one second parameter, and a reward value identified based on at least a part of the value corresponding to the at least one second parameter”, as both describe leveraging learned strategies to refine system performance based on received data using a learning approach.)
Katti, Devitt, Peng, and the instant application are analogous art, because they are all directed to machine learning-based systems for adaptively controlling and optimizing radio access network operations based on updated network data and observed performance.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the method of claim 4 disclosed by Katti, Devitt, and Peng to include the “edge computing logic module enters the trigger state to optimize the resource allocation of the edge communication entity ... taking deep reinforcement learning as an example” disclosed by Peng. One would be motivated to do so to effectively initiate model updates and data collection in response to real-time network conditions for improved configuration learning, as suggested by Peng (Peng, [col. 17, lines 15-21]) “the edge computing logic module enters the trigger state to optimize the resource allocation of the edge communication entity ... referring to FIG. 7, the edge computing logic module performs actions according to the rewards brought by different actions in the current state.”)

Regarding claim 10, Katti and Devitt teach the method of claim 1 (see rejection of claim 1).
	Katti and Devitt do not teach further comprising: identifying the reward value based on a reward determination scheme and at least a part of the value corresponding to the at least one first parameter, wherein the reward determination scheme is stored in advance in the electronic device or is received by the electronic device.
	Peng teaches further comprising: identifying the reward value based on a reward determination scheme and at least a part of the value corresponding to the at least one first parameter, wherein the reward determination scheme is stored in advance in the electronic device or is received by the electronic device. [Peng, col 14, lines 1-3, 17-19, and 45-46] “The reward function of the DRL1 may refer to the number of user terminals whose outage rate is greater than a preset threshold. ... The reward function of the DRL2 may refer to the average throughput of all access points. ... the reward function is a weighted sum of success and failure of data transmission to a node of the next hop”, wherein the examiner interprets “the reward function of the DRL1 may refer to the number of user terminals whose outage rate is greater than a preset threshold,” and “The reward function of the DRL2 may refer to the average throughput of all access points,” and “the reward function is a weighted sum of success and failure of data transmission to a node of the next hop” to be the same as “identifying the reward value based on a reward determination scheme and at least a part of the value corresponding to the at least one first parameter”, as both describe determining a reward value using predefined criteria that evaluate network performance. The reward functions of DRL1, DRL2, etc. are predefined or provided externally, and hence the passage quoted from Peng is also the same as “the reward determination scheme is stored in advance in the electronic device or is received by the electronic device”.)
	Katti, Devitt, Peng, and the instant application are analogous art because they are all directed to optimizing network behavior by evaluating parameter-driven performance using predefined or externally provided reward functions.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the method of claim 1 disclosed by Katti and Devitt to include the reward functions disclosed by Peng, namely “the reward function of the DRL1 may refer to the number of user terminals whose outage rate is greater than a preset threshold ... the reward function is a weighted sum of success and failure of data transmission to a node of the next hop”. One would be motivated to do so to effectively incorporate structured and externally configurable performance metrics that allow reinforcement models to evaluate network behavior in a consistent and scalable manner, as suggested by Peng (Peng, [col. 14, lines 1–3, 17–19, 45–46] “The reward function is a weighted sum of success and failure of data transmission to a node of the next hop”)
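As an illustration of the reading applied above, a reward determination scheme that is either stored in advance or received can be sketched as follows. This is a hedged sketch only: the weights, the dictionary keys, and the function names are hypothetical and do not come from Peng or from the instant application.

```python
from typing import Callable, Dict, Optional

# A reward determination scheme maps parameter values to a scalar reward.
RewardScheme = Callable[[Dict[str, float]], float]

# Hypothetical weights for a "weighted sum of success and failure" scheme.
W_SUCCESS = 1.0
W_FAILURE = -1.0


def weighted_success_failure(values: Dict[str, float]) -> float:
    """Scheme stored in advance: weighted sum of transmission successes and failures."""
    return W_SUCCESS * values["successes"] + W_FAILURE * values["failures"]


def identify_reward(values: Dict[str, float],
                    received_scheme: Optional[RewardScheme] = None) -> float:
    """Use a received scheme when one is supplied, else the stored default."""
    scheme = received_scheme if received_scheme is not None else weighted_success_failure
    return scheme(values)


# Stored scheme applied to part of the parameter values.
print(identify_reward({"successes": 8.0, "failures": 2.0}))
# Received scheme (hypothetical): reward equals average throughput.
print(identify_reward({"avg_throughput": 42.0}, lambda v: v["avg_throughput"]))
```

The optional argument is the only design choice of note: it keeps the "stored in advance" and "received" alternatives of the claim language as two paths through one function.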

Claims 7-9 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Katti in view of Devitt in view of NPL reference “Reinforcement Learning for Dynamic Resource Optimization in 5G Radio Access Network Slicing” by Shi et al. (referred to herein as Shi) further in view of US-11201784-B2, by Peng et al. (referred to herein as Peng).

Regarding claim 7, Katti and Devitt teach the method of claim 1 (see rejection of claim 1).
Katti and Devitt do not teach wherein the producing of the information for learning the operation determination model comprises: producing the information for learning the operation determination model based on the value corresponding to the at least one first parameter, the information associated with the operation corresponding to the at least one first parameter, and an entirety of the value corresponding to the at least one first parameter, and an entirety of a reward value identified based on the entirety of the value.
Shi teaches wherein the producing of the information for learning the operation determination model comprises: producing the information for learning the operation determination model based on the value corresponding to the at least one first parameter, the information associated with the operation corresponding to the at least one first parameter, ([Shi, page 4] “The gNodeB applies Q-learning to compute the function Q : S × A → R to evaluate the quality of action A producing reward R at state S. Note that the gNodeB maintains Q as the Q-table. At each time t, the gNodeB selects an action aₜ, observes a reward rₜ, and transitions from the current state sₜ to a new state sₜ₊₁ (this transition depends on current state sₜ and action aₜ), and updates Q.” wherein the examiner interprets the state S and action A used to compute Q(S, A) to be the same as the value corresponding to at least one first parameter because state S includes resource parameters (frequency, CPU, latency, etc.), and because they are both directed to using parameter values (network state variables) as inputs for producing learning information (Q-updates). The examiner further interprets the action aₜ (which corresponds to the operation taken by the system, e.g., resource allocation) selected by “the gNodeB” to be the same as the information associated with the operation corresponding to the at least one first parameter, because they are both directed to an operational decision derived from specific parameter values used in model learning.)
Katti, Devitt, and Shi do not teach and an entirety of the value corresponding to the at least one first parameter, and an entirety of a reward value identified based on the entirety of the value.
Peng teaches and an entirety of the value corresponding to the at least one first parameter, and an entirety of a reward value identified based on the entirety of the value. ([Peng, Figure 1, [110]] “Based on the reported data obtained during a cycle T₁ and a proper machine-learning algorithm, the central computing logic module configures an operating mode of the radio access network that matches the user behavior information, the service attributes, and the radio access network performance indicators.” wherein the examiner interprets “reported data…user behavior, service attributes, and performance indicators” to be the same as an entirety of the value corresponding to the at least one first parameter, because Peng uses all collected parameters (e.g., user behavior, service attributes, and performance indicators) together to determine network operation and they are both directed to using the full set of such measured parameter values rather than partial subsets when generating learning or configuration information.)
Katti, Devitt, Shi, Peng, and the instant application are analogous art because they are all directed to methods of operating a RAN that utilize learning-based models to optimize operational decisions.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the  method of claim 1 disclosed by Katti and Devitt to include the reinforcement learning-based RAN control framework disclosed by Shi and the central computing logic module disclosed by Peng. One would be motivated to do so to effectively enhance the RAN’s adaptability and decision-making precision by combining Shi’s distributed, reward-driven Q-learning update framework with Peng’s centralized configuration logic and cycle-based parameter aggregation, as suggested by Peng ([Peng, page 3, Fig. 1] “Based on the reported data obtained during a cycle T₁ and a proper machine-learning algorithm, the central computing logic module configures an operating mode of the radio access network that matches the user behavior information, the service attributes, and the radio access network performance indicators.”).

Regarding claim 8, Katti and Devitt teach the method of claim 1 (see rejection of claim 1).
Katti and Devitt do not teach wherein the producing of the information for learning the operation determination model comprises: selecting a part among the value corresponding to the at least one first parameter, the information associated with the operation corresponding to the at least one first parameter, and the value corresponding to the at least one first parameter, and producing the information for learning the operation determination model based on the selected part and a reward value identified based on the selected part.
Shi teaches wherein the producing of the information for learning the operation determination model comprises: selecting a part among the value corresponding to the at least one first parameter, the information associated with the operation corresponding to the at least one first parameter, and the value corresponding to the at least one first parameter, ([Shi, page 6] “At each time t, the gNodeB selects an action aₜ, observes a reward rₜ, and transitions from the current state sₜ to a new state sₜ₊₁ (this transition depends on current state sₜ and action aₜ), and updates Q.” wherein the examiner interprets selecting an action a_t based on the current state s_t in Shi to be the same as selecting a part among the value corresponding to the at least one first parameter and the information associated with the operation, because they are both directed to choosing a subset of available parameter information (state/action pair) that will be used to generate learning data for the model’s update step.)
Katti, Devitt, and Shi do not teach and producing the information for learning the operation determination model based on the selected part and a reward value identified based on the selected part.
Peng teaches and producing the information for learning the operation determination model based on the selected part and a reward value identified based on the selected part. ([Peng, col 15, lines 45-54]), “The edge computing logic module selects an action according to certain rewards so that the resource allocation based on Deep Q Network (DQN) can maximize the benefit in a continuous period. The state is jointly defined by interference distribution, link status, buffer status, available computing resources, etc. The reward function is selected from one or more of the following: rate, energy efficiency, delay, etc.”, wherein the examiner interprets producing updated model information by selecting an action according to a reward function in Peng to be the same as producing the information for learning the operation determination model based on the selected part and a reward value identified based on the selected part, because they are both directed to using a reward-driven feedback mechanism to generate training information for model optimization based on the parameters and decisions previously selected.)
Katti, Devitt, Shi, Peng, and the instant application are analogous art because they are all directed to methods of generating and refining learning information in a RAN environment by selecting parameters and operational information to update a machine learning or reinforcement learning (RL) model.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the reinforcement learning-based resource selection and Q-value update process disclosed by Shi to include the edge computing logic module disclosed by Peng. One would be motivated to do so to effectively enhance model optimization and improve continuous RAN decision-making by integrating a DQN-based reward function that enables faster convergence and higher operational efficiency, as suggested by Peng (Peng, [col. 15, lines 45–54] “The edge computing logic module selects an action according to certain rewards so that the resource allocation based on Deep Q Network (DQN) can maximize the benefit in a continuous period.”).
Regarding claim 9, Katti, Devitt, Shi, and Peng teach the method of claim 8 (see rejection of claim 8).
	Shi further teaches wherein the selecting of the value corresponding to the at least one first parameter, the information associated with the operation corresponding to the at least one first parameter, and the part of the value corresponding to the at least one first parameter comprises: [Shi, page 4], “We use Q-learning as the model-free RL algorithm to learn the policy that determines which action (resource assignment) to take under a given state (available resources and requests) for the gNodeB. The gNodeB applies Q-learning to compute the function Q : S × A → R to evaluate the quality of action A producing reward R at state S. Note that the gNodeB maintains Q as the Q-table. At each time t, the gNodeB selects an action aₜ, observes a reward rₜ, and transitions from the current state sₜ to a new state sₜ₊₁ (this transition depends on current state sₜ and action aₜ), and updates Q. … An action of the gNodeB at time t corresponds to the assignment of resources to a request at time t.” wherein the examiner interprets learning a policy that determines which action to take under a given state, selecting an action at each time, and the action being the assignment of resources to a request to be the same as the selecting of the value corresponding to the at least one first parameter, the information associated with the operation corresponding to the at least one first parameter, and the part of the value corresponding to the at least one first parameter because they are both directed to choosing, under system state and reward feedback, the specific subset of parameter values and operation information that will be used for the learning experience.)
selecting the part among the value corresponding to the at least one first parameter, the information associated with the operation corresponding to the at least one first parameter, and the value corresponding to the at least one first parameter based on at least one operation among: ([Shi, page 5], “For comparison, we consider three baseline algorithms: random, first come first serve (FCFS), and myopic algorithms.” wherein the examiner interprets using “baseline algorithms” such as “random” and “first come first serve (FCFS)” that determine which request is served to be the same as “selecting the part among the value corresponding to the at least one first parameter….selecting the part based on at least one operation among” because they are both directed to applying a chosen operation from multiple available operations to perform the selection.)
selecting the part based on priority of each of the at least one first parameter, ([Shi, page 5], “In optimization problem (18), weight w_ij assigns priority to request j of UE i. … We show the number of served requests for each UE in Table II [the impact of weight]. Results indicate that if a UE’s weight is increased and it is larger than others, the number of served requests for this UE increases relative [to] other UEs.”, wherein the examiner interprets weights that “assigns priority to request(s)” and affect which requests are served to be the same as “selecting the part based on priority of each of the at least one first parameter” because they are both priority-driven choices about which item is selected.)
selecting the part based on a point in time at which each value corresponding to the at least one first parameter is obtained, ([Shi, page 5], “FCFS algorithm: Available resources are allocated to network slice requests based on the arrival times of requests, i.e., at any given time, the oldest network slice request is answered first provided that the available resources are sufficient to grant this request.” wherein the examiner interprets selecting by “arrival time” and serving the “oldest” request to be the same as “selecting the part based on the point in time at which each value is obtained” because they are both time-order criteria (i.e., parameters) that use the time at which the item arrived or was obtained to decide selection.)
or selecting the part in a random manner. ([Shi, page 5], “Random algorithm: Available resources are allocated to uniformly randomly selected network slice requests.”, wherein the examiner interprets “uniformly random selection” of requests to be the same as “selecting the part in a random manner” because they are both random selection rules that do not depend on additional attributes.)
Katti, Devitt, Shi, Peng, and the instant application are analogous art because they are all directed to optimizing network behavior by evaluating parameter-driven performance using predefined reward functions.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the method of claim 8 as disclosed by Katti, Devitt, Shi, and Peng to include the reward and priority procedure for serving requests as disclosed by Shi. One would be motivated to do so to effectively guide the learning toward prioritized service fulfillment over time, as suggested by Shi ([Shi, page 3], “assignments is to maximize the weighted number of supported requests or the total provided services, where weights represent priorities of these requests.”)
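The three selection alternatives discussed for claim 9 (priority, point in time, and random) can be sketched as follows. This is an illustrative sketch only; the `Sample` record, its field names, and the convention that a larger number means higher priority are assumptions and are not taken from Shi or the instant application.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    name: str           # hypothetical parameter name
    value: float        # value corresponding to the parameter
    priority: int       # assumed: larger number means higher priority
    obtained_at: float  # time at which the value was obtained


def select_by_priority(samples: List[Sample], k: int) -> List[Sample]:
    """Keep the k samples with the highest priority."""
    return sorted(samples, key=lambda s: s.priority, reverse=True)[:k]


def select_by_time(samples: List[Sample], k: int) -> List[Sample]:
    """Keep the k oldest samples, mirroring a first-come-first-served order."""
    return sorted(samples, key=lambda s: s.obtained_at)[:k]


def select_random(samples: List[Sample], k: int, seed: Optional[int] = None) -> List[Sample]:
    """Keep k samples chosen uniformly at random without replacement."""
    return random.Random(seed).sample(samples, k)


samples = [
    Sample("a", 0.7, priority=3, obtained_at=3.0),
    Sample("b", 0.2, priority=1, obtained_at=1.0),
    Sample("c", 0.5, priority=2, obtained_at=2.0),
]
print([s.name for s in select_by_priority(samples, 2)])
print([s.name for s in select_by_time(samples, 2)])
print([s.name for s in select_random(samples, 2, seed=0)])
```

Whether the FCFS and random baselines in Shi, which choose among requests, correspond to selecting a part of parameter values for learning is a matter of claim interpretation; the sketch only shows what each selection rule does mechanically.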

Regarding claim 16, Katti and Devitt teach the electronic device of claim 14 (see rejection of claim 14).
Katti and Devitt do not teach wherein, as at least a part of the producing the information for learning the operation determination model, the instructions, when executed by the at least one processor, cause the electronic device to: select a part among the value corresponding to the at least one first parameter, the information associated with the operation corresponding to the at least one first parameter, and the value corresponding to the at least one first parameter, produce the information for learning the operation determination model based on the selected part and a reward value identified based on the selected part. 
Shi teaches wherein, as at least a part of the producing the information for learning the operation determination model, the instructions, when executed by the at least one processor, cause the electronic device to: select a part among the value corresponding to the at least one first parameter, the information associated with the operation corresponding to the at least one first parameter, and the value corresponding to the at least one first parameter, ([Shi, page 6] “At each time t, the gNodeB selects an action aₜ, observes a reward rₜ, and transitions from the current state sₜ to a new state sₜ₊₁ (this transition depends on current state sₜ and action aₜ), and updates Q.” wherein the examiner interprets selecting an action a_t based on the current state s_t in Shi to be the same as select a part among the value corresponding to the at least one first parameter, the information associated with the operation corresponding to the at least one first parameter, and the value corresponding to the at least one first parameter, because they are both directed to choosing a subset of available parameter information (state/action pair) that will be used to generate learning data for the model’s update step.)
Katti, Devitt, and Shi do not teach produce the information for learning the operation determination model based on the selected part and a reward value identified based on the selected part.
Peng teaches produce the information for learning the operation determination model based on the selected part and a reward value identified based on the selected part ([Peng, col 15, lines 45-54], “The edge computing logic module selects an action according to certain rewards so that the resource allocation based on Deep Q Network (DQN) can maximize the benefit in a continuous period. The state is jointly defined by interference distribution, link status, buffer status, available computing resources, etc. The reward function is selected from one or more of the following: rate, energy efficiency, delay, etc.” wherein the examiner interprets producing updated model information by selecting an action according to a reward function in Peng to be the same as producing the information for learning the operation determination model based on the selected part and a reward value identified based on the selected part, because they are both directed to using a reward-driven feedback mechanism to generate training information for model optimization based on the parameters and decisions previously selected.)
Katti, Devitt, Shi, Peng, and the instant application are analogous art because they are all directed to methods of operating radio access network (RAN) control systems that use parameterized state information, machine learning models, and reward-based updates to optimize network operations.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the electronic device of claim 14 disclosed by Katti and Devitt to include the selection of a subset of available parameter information disclosed by Shi and the edge computing logic module disclosed by Peng. One would be motivated to do so to effectively enhance the optimization of RAN operational decisions under dynamic network conditions by incorporating reward-based resource control, as suggested by Peng ([Peng, col. 15, lines 45-54] “The edge computing logic module selects an action according to certain rewards so that the resource allocation based on Deep Q Network (DQN) can maximize the benefit in a continuous period.”).

Claims 6, 13, and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Katti in view of Devitt further in view of Shi.
Regarding claim 6, Katti and Devitt teach the method of claim 1 (see rejection of claim 1).
	Devitt further teaches wherein, in response to the request, the identifying of the value corresponding to the at least one first parameter corresponding to the operation determination model among the at least one value and the information associated with the operation corresponding to the at least one first parameter comprises: ([Devitt et al., page 6] “A sensitivity analysis can be used to determine which events... have the strongest influence on the KPI... to perform a root cause analysis of predicted or actual KPI violations”, wherein the examiner interprets identifying the most influential inputs for a given KPI model (via sensitivity analysis) to be the same as “identifying a value corresponding to at least one first parameter corresponding to the operation determination model,” because both refer to locating specific parameters relevant to a given model’s function or training.)
	Katti and Devitt do not teach identifying whether at least one value corresponding to each of the plurality of parameters supports the at least one first parameter, and based on the at least one first parameter being supported, identifying the value corresponding to the at least one first parameter corresponding to the operation determination model and the information associated with the operation corresponding to the at least one first parameter.
Shi teaches identifying whether at least one value corresponding to each of the plurality of parameters supports the at least one first parameter, ([Shi, Sec 1.B] “In our Q-learning solution, the states correspond to the available resources that transition over time depending on how they are occupied (for granted network slicing requests) or released (for completed requests).”, wherein the examiner interprets “the states correspond to the available resources that transition over time depending on how they are occupied (for granted network slicing requests) or released (for completed requests)” to be the same as “identifying whether at least one value corresponding to each of the plurality of parameters supports the at least one first parameter”, as both describe evaluating whether existing resource states (values corresponding to parameters) align with the requirements of network slicing requests (first parameter), determining whether they can be used in a given operational scenario.)
 and based on the at least one first parameter being supported, identifying the value corresponding to the at least one first parameter corresponding to the operation determination model and the information associated with the operation corresponding to the at least one first parameter. [Shi, Sec 1.B] “We show that Q-learning successfully allocates resources over a time horizon and provides major gains in network utility compared to myopic, random and first come first served (FCFS) resource allocation algorithms. As the number of UEs increases or priorities of network slicing requests change over time, we show that Q-learning successfully adapts to dynamic user demands.”, wherein the examiner interprets “Q-learning successfully allocates resources over a time horizon and provides major gains in network utility” to be the same as identifying the value corresponding to the parameters corresponding to the operation determination model and the information associated with the operation corresponding, as both describe determining and applying resource allocation decisions based on dynamic conditions and an optimization model.)
Katti, Devitt, Shi, and the instant application are analogous art because they are all directed to dynamically optimizing resource allocation in networked systems using machine learning techniques.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the method of claim 1 disclosed by Katti and Devitt to include the approach in which “Q-learning successfully allocates resources over a time horizon and provides major gains in network utility” as disclosed by Shi. One would be motivated to do so to efficiently enhance resource allocation and adaptability in response to changing network conditions, as suggested by Shi (Shi, [Sec 1.B] “we show that Q-learning successfully adapts to dynamic user demands.”)
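The "support" check as read onto Shi's resource states for claim 6 can be pictured with the following sketch. It is an assumption-laden illustration: the resource names, the rule that a parameter is supported when every required amount is available, and the function names are hypothetical and appear in neither Shi nor the instant application.

```python
from typing import Dict, Optional


def is_supported(required: Dict[str, float], available: Dict[str, float]) -> bool:
    """Assumed rule: supported when every required resource is available in sufficient amount."""
    return all(available.get(name, 0.0) >= amount for name, amount in required.items())


def identify_if_supported(required: Dict[str, float],
                          available: Dict[str, float]) -> Optional[Dict[str, float]]:
    """Return the current values of the required parameters only when they are supported."""
    if not is_supported(required, available):
        return None
    return {name: available[name] for name in required}


# Hypothetical resource state of a gNodeB.
available = {"frequency_mhz": 20.0, "cpu_cores": 4.0}
print(identify_if_supported({"frequency_mhz": 10.0, "cpu_cores": 2.0}, available))
print(identify_if_supported({"cpu_cores": 8.0}, available))
```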

Regarding claim 13, Katti and Devitt teach the method of claim 1 (see rejection of claim 1).
	Katti and Devitt do not teach wherein the information for learning the operation determination model comprises: a value corresponding to the at least one first parameter at a first point in time, a first operation performed by the RAN at the first point in time, a value corresponding to the at least one first parameter at a second point in time after the first point in time according to a result of performing the first operation, and a reward value at the first point in time.
Shi teaches wherein the information for learning the operation determination model comprises: a value corresponding to the at least one first parameter at a first point in time, a first operation performed by the RAN at the first point in time, a value corresponding to the at least one first parameter at a second point in time after the first point in time according to a result of performing the first operation, and a reward value at the first point in time. [Shi, page 4, sec 3.A] “Starting Q as a random matrix and using the weighted average of the old value and the new information, Q-learning performs the value iteration update for Q as follows:  
    [Equation (19): Q-learning value iteration update for Q; reproduced as an image (media_image1.png) in the original action]
where α is the learning rate (0 < α ≤ 1) and γ is the discount factor (0 ≤ γ ≤ 1) for rewards over time. In (19), max_a Q(st+1, a) refers to the estimate of the optimal future value of Q”  wherein the examiner interprets “using the weighted average of the old value and the new information” to be the same as “a value corresponding to the at least one first parameter at a first point in time”, as both describe incorporating past parameter values as a start into an iterative learning process. The examiner further interprets “Q-learning performs the value iteration update for Q” to be the same as “an operation performed by the RAN at the first point in time”, as both describe executing an operation that updates system behavior. The examiner also interprets “max_a Q(st+1, a) refers to the estimate of the optimal future value of Q” to be the same as “a value corresponding to the at least one first parameter at a second point in time after the first point in time according to a result of performing the first operation”, as both describe computing a future parameter value based on the results of a prior operation. Finally, the examiner interprets “γ is the discount factor (0 ≤ γ ≤ 1) for rewards over time” to be the same as “a reward value at the first point in time”, as both describe assigning a reward value based on past actions and their anticipated impact over time.
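For illustration only, the update rule quoted above can be sketched in its standard tabular form, Q(s_t, a_t) ← (1 − α)·Q(s_t, a_t) + α·(r_t + γ·max_a Q(s_{t+1}, a)). This is a minimal sketch under stated assumptions, not Shi's implementation and not the applicant's: the function name `q_update`, the action labels, the zero initialization (Shi describes a random initial Q), and the numeric values are all hypothetical. It shows how one update consumes a state at time t, an action at time t, a reward at time t, and a state at time t+1.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one standard tabular Q-learning update and return the new Q value.

    The new value is a weighted average of the old value (weight 1 - alpha)
    and the new information (weight alpha): reward + gamma * best future value.
    """
    # Estimate of the optimal future value: max over actions in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    old_value = q[(state, action)]
    q[(state, action)] = (1 - alpha) * old_value + alpha * (reward + gamma * best_next)
    return q[(state, action)]

# Hypothetical example values; Q starts at zero here for reproducibility,
# whereas the quoted passage starts from a random matrix.
q = defaultdict(float)
actions = ["grant", "reject"]
new_value = q_update(q, state=0, action="grant", reward=1.0, next_state=1, actions=actions)
```

In this sketch the reward `reward` and the discount factor `gamma` are separate inputs: the reward is the feedback observed at time t, while the discount factor only scales the estimated future value.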
Katti, Devitt, Shi, and the instant application are analogous art because they are all directed to reinforcement learning-based optimization of network resource allocation and decision-making over time.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the method of claim 1 disclosed by Katti and Devitt to include the process of “using the weighted average of the old value and the new information” as disclosed by Shi. One would be motivated to do so to efficiently improve decision-making accuracy by incorporating past and future values into learning iterations, as suggested by Shi (Shi, [page 4, Sec 3.A] “Q-learning performs the value iteration update for Q”).
Regarding claim 23, Katti and Devitt teach the method of claim 1 (see rejection of claim 1).
Katti and Devitt do not teach wherein the information for learning the operation determination model is produced temporarily, and wherein the method further comprises: after the information for learning the operation determination model is provided to the learning learner, deleting the information for learning the operation determination model.
Shi teaches wherein the information for learning the operation determination model is produced temporarily, and wherein the method further comprises: after the information for learning the operation determination model is provided to the learning learner, deleting the information for learning the operation determination model. ([Shi, page 1] “Each decision of resource allocation makes some of the resources temporarily unavailable for future,...therefore, a Q-learning solution is presented to maximize the network utility…over a time horizon.” and [Shi, page 4] “The transition of the state at time t is driven by blocking resources for requests that are granted at time t and releasing resources after the lifetimes of active services expire at time t.”, wherein the examiner interprets these passages to mean that the RL agent generates temporary state and resource allocation data at each time step to determine actions over a time horizon; this data exists only during the current learning episode and is discarded once the next episode begins. Furthermore, as highlighted in the specification [0058, 0063] of the instant application, using the learning information and subsequently deleting it is the same as producing the data “temporarily”.)
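As a reading aid, the claim 23 limitation as recited (learning information produced temporarily, provided to a learner, then deleted) can be sketched as follows. This is a hypothetical illustration only: the names `Transition`, `Learner`, and `provide_and_delete`, and all values, are assumptions and are not drawn from Shi, Katti, Devitt, or the specification of the instant application.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Transition:
    """One item of learning information (hypothetical field names)."""
    state: float       # parameter value at the first point in time
    action: str        # operation performed at the first point in time
    next_state: float  # parameter value at the second point in time
    reward: float      # reward value at the first point in time

class Learner:
    """Stand-in learner that only counts the samples it receives."""
    def __init__(self) -> None:
        self.samples_seen = 0

    def consume(self, batch: List[Transition]) -> None:
        self.samples_seen += len(batch)

def provide_and_delete(buffer: List[Transition], learner: Learner) -> None:
    """Provide the buffered learning information to the learner, then delete it."""
    learner.consume(list(buffer))  # hand over a copy of the current contents
    buffer.clear()                 # delete the temporary learning information

# Hypothetical usage: two transitions are produced, provided, and then deleted.
buffer = [Transition(0.2, "op_a", 0.3, 1.0), Transition(0.3, "op_b", 0.1, -0.5)]
learner = Learner()
provide_and_delete(buffer, learner)
```

The deletion here is an explicit step that follows the hand-off to the learner, which is distinct from data that is merely overwritten as the state evolves.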
Katti, Devitt, Shi, and the instant application are analogous art because they are all directed to methods of operating a RAN using machine-learning-based techniques that manage network parameters, generate operational data for learning, and optimize RAN configurations.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the centralized RAN optimization process disclosed by Peng to include the reinforcement-learning technique disclosed by Shi. One would be motivated to do so to efficiently improve adaptive decision-making and learning continuity within each RAN optimization cycle, as suggested by Shi ([Shi, page 4] “The transition of the state at time t is driven by blocking resources for requests that are granted at time t and releasing resources after the lifetimes of active services expire at time t.”).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DEVAN KAPOOR, whose telephone number is (703) 756-1434. The examiner can normally be reached Monday - Friday, 9:00 AM - 5:00 PM EST (times may vary).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David Yi, can be reached at (571) 270-7519. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DEVAN KAPOOR/Examiner, Art Unit 2126                                                                                                                                                                                                        
/DAVID YI/Supervisory Patent Examiner, Art Unit 2126
Prosecution Timeline

May 29, 2025: Applicant Interview (Telephonic)
May 29, 2025: Examiner Interview Summary
Jul 30, 2025: Final Rejection mailed (§101, §103)
Sep 30, 2025: Request for Continued Examination
Oct 08, 2025: Response after Non-Final Action
Nov 14, 2025: Non-Final Rejection mailed (§101, §103)
Feb 12, 2026: Response Filed
May 26, 2026: Final Rejection mailed (§101, §103; current)
Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 10%
With Interview (+16.7%): 27%
Median Time to Grant: 4y 4m (~7m remaining)
PTA Risk: High
Based on 10 resolved cases by this examiner. Grant probability derived from career allowance rate.