DETAILED ACTION
Claim(s) 1-20 are presented for examination.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
As required by MPEP § 201.14(c), acknowledgement is made of applicant’s claim for priority based on Application No. 63/470,131, filed on May 31, 2023.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on March 19, 2024 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 U.S.C. § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. § 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 2, 4-6, 8-11, 13-15 and 17-20 are rejected under 35 U.S.C. § 103 as being unpatentable over Jeong et al. (US 2022/0343117 A1), hereinafter “Jeong,” in view of Lin et al. (US 2024/0056933 A1), hereinafter “Lin.”
Regarding Claims 1 and 10,
Jeong discloses an apparatus for operating a target base station [see fig(s). 2B & 4, pg. 3, ¶52 lines 1-15, an agent, such as a policy training server “250”], the apparatus [see fig(s). 2B & 4, pg. 3, ¶52 lines 1-15, the agent or policy training server “250”] comprising:
a memory storing instructions [see fig(s). 2B & 4, pg. 3, ¶52 lines 1-15, memory “255” storing various functional modules]; and
one or more processors communicatively coupled to the memory [see fig(s). 2B & 4, pg. 3, ¶52 lines 1-15, the memory “255” coupled to a processing circuit “253”];
wherein the one or more processors are configured to execute the instructions to [see fig(s). 2B & 4, pg. 3, ¶52 lines 1-15, the processing circuit “253” implemented to]:
collect a plurality of trajectories corresponding to the target base station and a plurality of source base stations [see fig. 4: Step “404”, pg. 5, ¶71 lines 1-9, the agent collects a trajectory sample including the current state, the next state following the action, the action and the reward associated with the action];
cluster [see fig. 4: Step “406”, pg. 5, ¶72 lines 1-6; ¶73 lines 1-10, the agent runs a policy update and a value function estimator update], using an unsupervised reinforcement learning model [see fig. 4: Step “406”, pg. 5, ¶72 lines 1-6; ¶73 lines 1-10, by computing a robust target value using collected samples and the value function estimator], the plurality of trajectories into a plurality of clusters comprising a target cluster [see fig. 4: Step “406”, pg. 5, ¶72 lines 1-6; ¶73 lines 1-10, in parallel using trajectory samples collected];
select [see fig. 4: Step(s) “408”/ “410”, pg. 5, ¶73 lines 1-10, the agent updates], as a target trajectory [see fig. 4: Step(s) “408”/ “410”, pg. 5, ¶73 lines 1-10, upon computation of the robust target value and a loss of value function estimator], a selected trajectory from the target cluster that maximizes an energy-saving parameter of the target base station [see fig. 4: Step(s) “408”/ “410”, pg. 5, ¶73 lines 1-10, the value function estimator]; and
apply [see fig. 4: Step “402”, pg. 5, ¶71 lines 1-9, the agent applies], to the target base station [see fig. 4: Step “402”, pg. 5, ¶71 lines 1-9, to the environment], an energy-saving control policy corresponding to the target trajectory [see fig. 4: Step “402”, pg. 5, ¶71 lines 1-9, the policy by executing an action according to the policy].
Although Jeong discloses clustering the plurality of trajectories into a plurality of clusters comprising a target cluster, Jeong does not explicitly teach “the target cluster corresponding to the target base station and at least one source base station from among the plurality of source base stations”.
However, Lin discloses a method for operating a target base station [see fig. 9, pg. 12, ¶159 lines 1-6, an example method in a network node], by an apparatus [see fig. 9, pg. 12, ¶159 lines 1-6, the network node], the method [see fig. 9, pg. 12, ¶159 lines 1-6, the example method] comprising:
collecting a plurality of trajectories corresponding to the target base station and a plurality of source base stations [see fig. 9: Step “912”, pg. 12, ¶161 lines 1-14, the network node obtains data samples for modeling a wireless network environment that comprises a plurality of cells];
clustering [see fig. 9: Step “914”, pg. 12, ¶164 lines 1-6; ¶165 lines 1-10, the network node builds], using an unsupervised reinforcement learning model [see fig. 9: Step “914”, pg. 12, ¶164 lines 1-6; ¶165 lines 1-10, via a machine learning model (e.g., reinforcement learning model) of the wireless network trained to determine], the plurality of trajectories into a plurality of clusters comprising a target cluster [see fig. 9: Step “914”, pg. 12, ¶164 lines 1-6; ¶165 lines 1-10, using the obtained data samples], the target cluster corresponding to the target base station and at least one source base station from among the plurality of source base stations [see fig. 9: Step “914”, pg. 12, ¶164 lines 1-6; ¶165 lines 1-10, a sequence of handovers for a wireless device among the plurality of cells for the wireless device to traverse from a source cell to a destination cell by minimizing one or more of a number of handovers, radio link failure (RLF) .. etc.];
selecting [see fig. 9: Step “918”, pg. 12, ¶168 lines 1-10, the network node (the machine learning model) determines], as a target trajectory [see fig. 9: Step “918”, pg. 12, ¶168 lines 1-10, based on the mobility information comprising a destination], a selected trajectory from the target cluster that maximizes an energy-saving parameter of the target base station [see fig. 9: Step “918”, pg. 12, ¶168 lines 1-10, an optimal next hop and/or an optimal route to a destination]; and
applying [see fig. 9: Step “920”, pgs. 12-13, ¶170 lines 1-11, the network node transmits], to the target base station [see fig. 9: Step “920”, pgs. 12-13, ¶170 lines 1-11, to the wireless device as the wireless device navigates to the destination (i.e. destination cell)], an energy-saving control policy corresponding to the target trajectory [see fig. 9: Step “920”, pgs. 12-13, ¶170 lines 1-11, a next hop handover or a sequence of handovers].
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to include “the target cluster corresponding to the target base station and at least one source base station from among the plurality of source base stations,” as taught by Lin, in the system of Jeong in order to provide efficient three-dimensional (3D) mobility support using reinforcement learning [see Lin pg. 1, ¶1 lines 1-4].
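For clarity of the record, the following is a minimal, non-limiting sketch of the collect–cluster–select–apply sequence mapped above. It is illustrative only: k-means stands in for the claimed “unsupervised reinforcement learning model” (which the claims do not limit to any particular algorithm), and the trajectory features, energy-saving metric, and data layout are assumptions made solely for the example.

```python
# Illustrative sketch only; not the applicant's or the references' implementation.
# K-means is a stand-in for the claimed "unsupervised reinforcement learning
# model"; the features and the energy metric are assumptions.
import numpy as np
from sklearn.cluster import KMeans

def select_energy_saving_policy(trajectories, target_idx, n_clusters=3):
    """trajectories: list of dicts with 'states' (T x d array of observed
    states), 'policy' (the control policy that produced the trajectory), and
    'energy_saved' (scalar energy-saving parameter). target_idx indexes the
    target base station's trajectory; the rest come from source base stations."""
    # Collect: summarize each trajectory as one feature vector (assumed featurization).
    features = np.stack([t["states"].mean(axis=0) for t in trajectories])
    # Cluster: group trajectories; the target cluster is the cluster that
    # contains the target base station's own trajectory.
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
    members = [t for t, lbl in zip(trajectories, labels) if lbl == labels[target_idx]]
    # Select: the member trajectory maximizing the energy-saving parameter.
    best = max(members, key=lambda t: t["energy_saved"])
    # Apply: return its control policy for application to the target base station.
    return best["policy"]
```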
Regarding Claims 2 and 11,
The combined system of Jeong and Lin discloses the apparatus of claim 10.
Jeong further discloses wherein the one or more processors are further configured to execute further instructions to:
monitor one or more energy-saving parameters of the target base station [see fig. 4: Step(s) “408”/ “410”, pg. 5, ¶73 lines 1-10, the agent computes a loss of value function estimator]; and
adjust the energy-saving control policy applied to the target base station based on the one or more energy-saving parameters [see fig. 4: Step(s) “408”/ “410”, pg. 5, ¶73 lines 1-10, the agent updates the value function estimator, for example, via backpropagation with loss function, which is a method of training a neural network].
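A minimal sketch of the monitor-and-adjust steps mapped above for claims 2 and 11; the threshold test, the fallback policy, and the base-station interface (measure_energy_savings, apply) are hypothetical names introduced only to make the two recited steps concrete.

```python
# Illustrative sketch only; the interface and threshold are assumptions.
def monitor_and_adjust(base_station, policy, fallback_policy, min_savings=0.1):
    # Monitor: read one energy-saving parameter of the target base station.
    savings = base_station.measure_energy_savings()  # hypothetical API
    # Adjust: swap the applied energy-saving control policy when the
    # monitored parameter falls below the (assumed) acceptable level.
    if savings < min_savings:
        policy = fallback_policy
        base_station.apply(policy)  # hypothetical API
    return policy
```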
Regarding Claims 4 and 13,
The combined system of Jeong and Lin discloses the apparatus of claim 10.
Jeong further discloses wherein the one or more processors are further configured to execute further instructions to:
generate [see pg. 7, ¶90 lines 1-21, generating], using a base reinforcement learning model [see pg. 7, ¶90 lines 1-21, according to a policy π, a subsequent state st+1 of the simulated system following the action at, and a reward r associated with the action at], a plurality of source control policies corresponding to the plurality of source base stations [see pg. 7, ¶90 lines 1-21, a trajectory sample of a simulated system that corresponds to the dynamic system, the trajectory sample comprising a current state st of the simulated system at time t, an action at taken on the simulated system at time t].
Regarding Claims 5 and 14,
The combined system of Jeong and Lin discloses the apparatus of claim 13.
Jeong further discloses wherein the one or more processors are further configured to execute further instructions to:
collect a plurality of source base station trajectories corresponding to the plurality of source base stations [see fig. 4: Step “404”, pg. 5, ¶71 lines 1-9, the agent collects a trajectory sample including the current state, the next state following the action, the action and the reward associated with the action], based on the plurality of source control policies [see fig. 4: Step “402”, pg. 5, ¶71 lines 1-9, according to the policy by executing an action according to the policy]; and
select the energy-saving control policy from among a control policy of the target base station and the plurality of source control policies [see fig. 4: Step(s) “408”/ “410”, pg. 5, ¶73 lines 1-10, the agent updates the value function estimator, for example, via backpropagation with loss function].
Regarding Claims 6 and 15,
The combined system of Jeong and Lin discloses the apparatus of claim 10.
Jeong further discloses wherein the one or more processors are further configured to execute further instructions to:
formulate the plurality of trajectories based on a Markov Decision Process (MDP) [see pgs. 3-4, ¶54 lines 1-8, robust reinforcement learning utilizes Robust Markov Decision Processes], wherein each trajectory of the plurality of trajectories comprises a state space [see pgs. 3-4, ¶54 lines 1-8, which combines ideas from Reinforcement Learning and Robust Control], an action space [see pgs. 3-4, ¶54 lines 1-8, to create agents with embedded uncertainty about the simulated environment], a reward function [see pgs. 3-4, ¶54 lines 1-8, opting for pessimistic optimization in order to handle potential gaps between simulators], and a state transition probability function [see pgs. 3-4, ¶54 lines 1-8, and reality].
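For reference, the recited MDP elements correspond to the standard tuple below; this is textbook notation, not a quotation from Jeong or Lin.

```latex
% Standard MDP underlying claims 6/15: state space, action space,
% reward function, and state transition probability function.
\[
  \mathcal{M} = (\mathcal{S}, \mathcal{A}, r, P), \qquad
  r : \mathcal{S} \times \mathcal{A} \to \mathbb{R}, \qquad
  P(s' \mid s, a) = \Pr[\,S_{t+1} = s' \mid S_t = s,\ A_t = a\,],
\]
% with a trajectory sampled under a policy $\pi$:
\[
  \tau = (s_0, a_0, r_0, s_1, a_1, r_1, \dots), \qquad
  a_t \sim \pi(\cdot \mid s_t), \quad s_{t+1} \sim P(\cdot \mid s_t, a_t).
\]
```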
Regarding Claims 8 and 17,
The combined system of Jeong and Lin discloses the apparatus of claim 10.
Jeong further discloses wherein the one or more processors are further configured to execute further instructions to:
perform iterative testing of respective control policies of each trajectory of the target cluster [see pg. 7, ¶90 lines 1-21, estimating a robust target value Vπ(st) for the trajectory sample, wherein the robust target value Vπ(st) is an expected value of a sum of the reward r and a minimum estimated value Vπ(st+1) of the simulated system at the subsequent state st+1 based on a plurality of transition possibilities p from the current state st in response to the action at];
determine [see pg. 7, ¶90 lines 1-21, updating], for each trajectory of the target cluster [see pg. 7, ¶90 lines 1-21, based on the robust target value], an accumulated reward [see pg. 7, ¶90 lines 1-21, a value function estimator]; and
select [see pg. 7, ¶90 lines 1-21, updating], as the target trajectory [see pg. 7, ¶90 lines 1-21, based on the trajectory and the value function estimator], the selected trajectory from the target cluster that maximizes the accumulated reward [see pg. 7, ¶90 lines 1-21, the policy].
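In standard notation, the robust target value quoted from Jeong ¶90, together with the accumulated-reward selection recited in claims 8/17, reads as follows; the discount factor gamma and the minimization set of transition possibilities are textbook conventions supplied here for readability, not claim limitations.

```latex
% Jeong ¶90: robust target value over a set P of transition possibilities.
\[
  V^{\pi}(s_t) \;=\; \mathbb{E}\!\left[\, r \;+\; \min_{p \in \mathcal{P}} V^{\pi}(s_{t+1}) \,\right]
\]
% Claims 8/17: accumulated reward per trajectory and selection of the
% trajectory in the target cluster that maximizes it (the discount
% factor gamma is an assumption, not recited in the claims).
\[
  G(\tau) \;=\; \sum_{t=0}^{T} \gamma^{t} r_t, \qquad
  \tau^{*} \;=\; \arg\max_{\tau \in \mathcal{C}_{\text{target}}} G(\tau)
\]
```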
Regarding Claims 9 and 18,
The combined system of Jeong and Lin discloses the apparatus of claim 17.
Jeong further discloses wherein the one or more processors are further configured to execute further instructions to:
perform iterative testing of the respective control policies of each trajectory of the target cluster for a predetermined number of iterations [see pg. 7, ¶90 lines 1-21, estimating a robust target value Vπ(st) for the trajectory sample, wherein the robust target value Vπ(st) is an expected value of a sum of the reward r and a minimum estimated value Vπ(st+1) of the simulated system at the subsequent state st+1 based on a plurality of transition possibilities p from the current state st in response to the action at].
Regarding Claim 19,
Jeong discloses a non-transitory computer-readable storage medium storing computer-executable instructions for operating a target base station by an apparatus that [see fig(s). 2B & 4, pg. 3, ¶52 lines 1-15, memory “255” storing various functional modules], when executed by at least one processor of the apparatus [see fig(s). 2B & 4, pg. 3, ¶52 lines 1-15, the memory “255” coupled to a processing circuit “253” of an agent, such as a policy training server “250”], cause the apparatus to [see fig(s). 2B & 4, pg. 3, ¶52 lines 1-15, the processing circuit “253” of the policy training server “250” implemented to]:
collect a plurality of trajectories corresponding to the target base station and a plurality of source base stations [see fig. 4: Step “404”, pg. 5, ¶71 lines 1-9, the agent collects a trajectory sample including the current state, the next state following the action, the action and the reward associated with the action];
cluster [see fig. 4: Step “406”, pg. 5, ¶72 lines 1-6; ¶73 lines 1-10, the agent runs a policy update and a value function estimator update], using an unsupervised reinforcement learning model [see fig. 4: Step “406”, pg. 5, ¶72 lines 1-6; ¶73 lines 1-10, by computing a robust target value using collected samples and the value function estimator], the plurality of trajectories into a plurality of clusters comprising a target cluster [see fig. 4: Step “406”, pg. 5, ¶72 lines 1-6; ¶73 lines 1-10, in parallel using trajectory samples collected];
select [see fig. 4: Step(s) “408”/ “410”, pg. 5, ¶73 lines 1-10, the agent updates], as a target trajectory [see fig. 4: Step(s) “408”/ “410”, pg. 5, ¶73 lines 1-10, upon computation of the robust target value and a loss of value function estimator], a selected trajectory from the target cluster that maximizes an energy-saving parameter of the target base station [see fig. 4: Step(s) “408”/ “410”, pg. 5, ¶73 lines 1-10, the value function estimator]; and
apply [see fig. 4: Step “402”, pg. 5, ¶71 lines 1-9, the agent applies], to the target base station [see fig. 4: Step “402”, pg. 5, ¶71 lines 1-9, to the environment], an energy-saving control policy corresponding to the target trajectory [see fig. 4: Step “402”, pg. 5, ¶71 lines 1-9, the policy by executing an action according to the policy].
Although Jeong discloses clustering the plurality of trajectories into a plurality of clusters comprising a target cluster, Jeong does not explicitly teach “the target cluster corresponding to the target base station and at least one source base station from among the plurality of source base stations”.
However, Lin discloses a method for operating a target base station [see fig. 9, pg. 12, ¶159 lines 1-6, an example method in a network node], by an apparatus [see fig. 9, pg. 12, ¶159 lines 1-6, the network node], the method [see fig. 9, pg. 12, ¶159 lines 1-6, the example method] comprising:
collecting a plurality of trajectories corresponding to the target base station and a plurality of source base stations [see fig. 9: Step “912”, pg. 12, ¶161 lines 1-14, the network node obtains data samples for modeling a wireless network environment that comprises a plurality of cells];
clustering [see fig. 9: Step “914”, pg. 12, ¶164 lines 1-6; ¶165 lines 1-10, the network node builds], using an unsupervised reinforcement learning model [see fig. 9: Step “914”, pg. 12, ¶164 lines 1-6; ¶165 lines 1-10, via a machine learning model (e.g., reinforcement learning model) of the wireless network trained to determine], the plurality of trajectories into a plurality of clusters comprising a target cluster [see fig. 9: Step “914”, pg. 12, ¶164 lines 1-6; ¶165 lines 1-10, using the obtained data samples], the target cluster corresponding to the target base station and at least one source base station from among the plurality of source base stations [see fig. 9: Step “914”, pg. 12, ¶164 lines 1-6; ¶165 lines 1-10, a sequence of handovers for a wireless device among the plurality of cells for the wireless device to traverse from a source cell to a destination cell by minimizing one or more of a number of handovers, radio link failure (RLF) .. etc.];
selecting [see fig. 9: Step “918”, pg. 12, ¶168 lines 1-10, the network node (machine learning model) determines], as a target trajectory [see fig. 9: Step “918”, pg. 12, ¶168 lines 1-10, based on the mobility information comprising a destination], a selected trajectory from the target cluster that maximizes an energy-saving parameter of the target base station [see fig. 9: Step “918”, pg. 12, ¶168 lines 1-10, an optimal next hop and/or an optimal route to a destination]; and
applying [see fig. 9: Step “920”, pgs. 12-13, ¶170 lines 1-11, the network node transmits], to the target base station [see fig. 9: Step “920”, pgs. 12-13, ¶170 lines 1-11, to the wireless device as the wireless device navigates to the destination (i.e. destination cell)], an energy-saving control policy corresponding to the target trajectory [see fig. 9: Step “920”, pgs. 12-13, ¶170 lines 1-11, a next hop handover or a sequence of handovers].
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to include “the target cluster corresponding to the target base station and at least one source base station from among the plurality of source base stations,” as taught by Lin, in the system of Jeong in order to provide efficient three-dimensional (3D) mobility support using reinforcement learning [see Lin pg. 1, ¶1 lines 1-4].
Regarding Claim 20,
The combined system of Jeong and Lin discloses the non-transitory computer-readable storage medium of claim 19.
Jeong further discloses wherein the computer-executable instructions, when executed by the at least one processor, further cause the apparatus to:
monitor one or more energy-saving parameters of the target base station [see fig. 4: Step(s) “408”/ “410”, pg. 5, ¶73 lines 1-10, the agent computes a loss of value function estimator]; and
adjust the energy-saving control policy applied to the target base station based on the one or more energy-saving parameters [see fig. 4: Step(s) “408”/ “410”, pg. 5, ¶73 lines 1-10, the agent updates the value function estimator, for example, via backpropagation with loss function, which is a method of training a neural network].
Allowable Subject Matter
Claims 3, 7, 12 and 16 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
United States Patent Application Publication: Legg et al. (US 2015/0189589 A1); see fig. 4, pg. 4, ¶56.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RUSHIL P SAMPAT whose telephone number is (469) 295-9141. The examiner can normally be reached on Mon-Fri (8 AM - 5 PM).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ian Moore can be reached on (571) 272-3085. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/RUSHIL P. SAMPAT/Primary Examiner, Art Unit 2469, Technology Center 2400