DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office action is responsive to the following communication(s): the original application filed on 12/05/2022, which claims a priority filing date of 12/28/2021. Claims 1-5 are pending. Claims 1 and 4 are independent.
Drawings
Figure 1 should be designated by a legend such as --Prior Art-- because only that which is old is illustrated (see Page 2, lines 8-9 of the specification). See MPEP § 608.02(g). Corrected drawings in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. The replacement sheet(s) should be labeled “Replacement Sheet” in the page header (as per 37 CFR 1.84(c)) so as not to obstruct any portion of the drawing figures. If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they do not include the following reference sign(s) mentioned in the description: 120 in Page 12, lines 2-8. Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(4) because (1) reference characters "220" in FIG. 3; Page 10, line 8; Page 12, line 23; Page 13, line 25 – Page 14, line 26; Page 16, line 12 – Page 17, line 4; Page 18, line 21; and Claims 1 and 3 and "120" in Page 2, lines 2-8 have both been used to designate "reinforcement learning agent"; and (2) reference characters "123" in FIG. 4 and "213" in Page 12, line 28; Page 14, line 2; Page 18, line 20; and Claim 3 have both been used to designate "simulation portion". Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they include the following reference character(s) not mentioned in the description: 123 in FIG. 4. Corrected drawing sheets in compliance with 37 CFR 1.121(d), or amendment to the specification to add the reference character(s) in the description in compliance with 37 CFR 1.121(b) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Specification
The disclosure is objected to because of the following informalities:
in .
Appropriate correction is required.
Claim Objections
Claims 1-5 are objected to because of the following informalities:
in Claim 1, lines 17-19 (i.e., Page 19, lines 19-21), "… feedback regarding decision making by a reinforcement learning agent (220); and a reinforcement learning agent (220) configured to …" appears to be "… feedback regarding decision making by a reinforcement learning agent (220); and the reinforcement learning agent (220) configured to …";
in Claim 2, lines 1-2 (i.e., Page 20, lines 9-10) and Claim 3, lines 1-2 (i.e., Page 20, lines 14-15), "The apparatus for reinforcement learning based on a user learning environment in semiconductor design of claim 1 …" appears to be "The apparatus for the reinforcement learning based on the user learning environment in the semiconductor design of claim 1 …" (see also 112 rejection to Claim 1);
in Claim 3, lines 5-12 (i.e., Page 20, lines 18-25), "… included in design data through configuration information input from the user terminal (100), distinguish semiconductor elements, standard cells, and wires according to characteristics or functions so as to prevent learning ranges from increasing during reinforcement learning, and distinguish, based on addition of specific colors, the objects distinguished according to characteristics or functions, thereby configuring a customized reinforcement learning environment …" appears to be "… included in the design data through the configuration information input from the user terminal (100), distinguish the semiconductor elements, the standard cells, and the wires according to the characteristics or functions so as to prevent the learning ranges from the increasing during the reinforcement learning, and distinguish, based on the addition of the specific colors, the objects distinguished according to the characteristics or functions, thereby configuring the customized reinforcement learning environment …" (see also 112 rejection to Claim 1);
in Claim 3, lines 13-22 (i.e., Page 20, line 26 – Page 21, line 5), "… analyze object information comprising semiconductor elements and standard cells based on design data comprising semiconductor netlist information, generate simulation data constituting a customized reinforcement learning environment by adding constraint or position change information configured by the environment configuration portion (211), and request, based on the simulation data, the reinforcement learning agent (220) to provide optimization information for disposition of at least one semiconductor element and standard cell …" appears to be "… analyze the object information comprising the semiconductor elements and the standard cells based on the design data comprising the semiconductor netlist information, generate simulation data constituting the customized reinforcement learning environment by adding the constraint or the position change information configured by the environment configuration portion (211), and request, based on the simulation data, the reinforcement learning agent (220) to provide optimization information for the disposition of the at least one semiconductor element and standard cell …" (see also 112 rejection to Claims 1 and 3);
in Claim 3, lines 23-31 (i.e., Page 21, lines 6-14), "… perform simulation constituting a reinforcement learning environment regarding semiconductor elements and standard cells, based on actions received from the reinforcement learning agent (220), and state information comprising semiconductor element disposition information to be used for reinforcement learning, and provide the reinforcement learning agent (220) with reward information calculated based on connection information of semiconductor elements and standard cells simulated as feedback regarding decision making …" appears to be "… perform the simulation constituting the reinforcement learning environment regarding the semiconductor elements and the standard cells, based on actions received from the reinforcement learning agent (220), and the state information comprising semiconductor element disposition information to be used for the reinforcement learning, and provide the reinforcement learning agent (220) with the reward information calculated based on the connection information of the semiconductor elements and the standard cells simulated as the feedback regarding the decision making …" (see also 112 rejection to Claim 1);
in Claim 4, lines 2-3 (i.e., Page 21, lines 18-19), "… the method comprising the steps of …" appears to be "… the method comprising steps of …";
in Claim 4, lines 10-12 (i.e., Page 21, lines 26-28), "… adding constraint or position change information with regard to each object through configuration information input from a user terminal (100) …" appears to be "… adding constraint or position change information with regard to each object through configuration information input from the user terminal (100) …";
in Claim 4, lines 19-21 (i.e., Page 22, lines 5-7), "… so as to optimize disposition of at least one semiconductor element disposition and stand cell disposition …" appears to be "… so as to optimize disposition of at least one semiconductor element and standard cell …" according to Claim 1;
in Claim 4, lines 30-32 (i.e., Page 22, lines 16-17), "… wherein the customized reinforcement learning environment configured in step b) …" appears to be "… wherein the customized reinforcement learning environment configured in the step b) …";
in Claim 4, lines 37-38 (i.e., Page 22, lines 23-24), "… wherein, in step c), the reinforcement learning server (200) determines an action …" appears to be "… wherein, in the step c), the reinforcement learning server (200) determines an action …";
in Claim 5, lines 1-3 (i.e., Page 22, line 30 – Page 23, line 2), "The method for reinforcement learning based on a user learning environment in semiconductor design of claim 4, wherein the design data in step a) is …" appears to be "The method for the reinforcement learning based on the user learning environment in the semiconductor design of claim 4, wherein the design data in the step a) is …" (see also 112 rejection to Claim 4).
Appropriate correction is required.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: "a simulation engine (210) configured to ..." and "a reinforcement learning agent (220) configured to ..." in Claim 1; and "an environment configuration portion (211) configured to ...", "a reinforcement learning environment construction portion (212) configured to …" and "a simulation portion (213) configured to ..." in Claim 3.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-5 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 recites the limitation "An apparatus for " in lines 1-2, which rendering the claim indefinite because it is unclear whether these instances of ".
Claim 1 recites the limitation "." in lines 4-35 (i.e., Page 19, line 6 , which rendering the claim indefinite because (1) it is unclear whether "a semiconductor element and a standard cell", "at least one semiconductor element and standard cell", and "semiconductor elements and standard cells" are related or not; (2) it is unclear whether the multiple instances of "semiconductor elements", "standard cells", and "wires" are the same or not; and (3) if they are different, which instance of "semiconductor elements" and "standard cells" is respectively referred by "the semiconductor elements" and "the standard cells" disposed in optimal positions. Clarification is required.
Claim 1 recites the limitation "... based on an action determined to optimize disposition of at least one semiconductor element and standard cell ... determining an action so as to optimize disposition of semiconductor elements and standard cells ... determines an action ... through learning using a reinforcement learning algorithm such that the semiconductor elements and the standard cells are disposed in optimal positions" in lines 12-35 (i.e., Page 19, line 14 , which rendering the claim indefinite because it is unclear whether these three instances of ".
Claim 1 recites the limitation "... perform simulation based on …, and state information of the customized reinforcement learning environment ... perform reinforcement learning based on state information and reward information ..." in line, which rendering the claim indefinite because it is unclear whether these two instances of ".
Claim 1 recites the limitation "... provide reward information calculated based on ... perform reinforcement learning based on state information and reward information ..." in line, which rendering the claim indefinite because it is unclear whether these two instances of ".
Claim 1 recites the limitation "... distinguishes semiconductor elements, standard cells, and wires according to characteristics or functions, and distinguishes, based on addition of specific colors, the objects distinguished according to characteristics or functions ..." in line, which rendering the claim indefinite because it .
Claim 1 recites the limitation "... by adding constraint or position change information with regard to each object through configuration information input … distinguishes, based on addition of specific colors, the objects distinguished according to characteristics or functions ..." in lines 7-27 (i.e., Page 19, lines 9-29). There is insufficient antecedent basis for the limitation "the objects" in the claim. Clarification is required.
Claims 2-3 are rejected for fully incorporating the deficiencies of their base claim.
Claim 3 recites the limitation "." in lines 4-7 (i.e., Page 20, lines 17-20), which renders the claim indefinite because ".
Claim 4 recites the limitation "A " in lines 1-, which renders the claim indefinite because it is unclear whether .
Claim 4 recites the limitation "." in lines (see also claim objection to Claim 4), which renders the claim indefinite because (1) it is unclear whether "a semiconductor element and a standard cell", "at least one semiconductor element and standard cell", and "semiconductor elements and standard cells" are related or not; (2) it is unclear whether the multiple instances of "semiconductor elements", "standard cells", and "wires" are the same or not; and (3) if they are different, which instance of "semiconductor elements" and "standard cells" is respectively referred to by "the semiconductor elements" and "the standard cells" disposed in optimal positions. Clarification is required.
Claim 4 recites the limitation "... determining an action so as to optimize disposition of at least one semiconductor element ... regarding disposition of the semiconductor element and standard cell based on an action ... determines an action ... through learning using a reinforcement learning algorithm such that the semiconductor elements and the standard cells are disposed in optimal positions" in lines 1, which renders the claim indefinite because it is unclear whether these three instances of ".
Claim 4 recites the limitation "... performing … reinforcement learning based on reward information and state information ... generating reward information calculated based on ..." in line, which renders the claim indefinite because it is unclear whether these two instances of ".
Claim 4 recites the limitation "... configuring a customized reinforcement learning environment by adding ... the customized reinforcement learning environment comprising disposition information of semiconductor elements and standard cells ... simulation constituting a reinforcement learning environment regarding disposition of the semiconductor element and standard cell ... " in lines, which renders the claim indefinite because it is unclear whether ".
Claim 4 recites the limitation "... performing" in lines 22-2, which renders the claim indefinite because it is unclear whether these two instances of ".
Claim 4 recites the limitation "... distinguishes semiconductor elements, standard cells, and wires according to characteristics or functions … and distinguishes, based on addition of specific colors, the objects distinguished according to characteristics or functions ..." in line, which renders the claim indefinite because it is unclear .
Claim 4 recites the limitation "... by adding constraint or position change information with regard to each object through configuration information input … distinguishes, based on addition of specific colors, the objects distinguished according to characteristics or functions ..." in lines . There is insufficient antecedent basis for the limitation "the objects" in the claim. Clarification is required.
Claim 5 is rejected for fully incorporating the deficiencies of its base claim.
Claim limitations "a simulation engine (210) configured to ..." and "a reinforcement learning agent (220) configured to ..." in Claims 1-3; and "an environment configuration portion (211) configured to ...", "a reinforcement learning environment construction portion (212) configured to …" and "a simulation portion (213) configured to ..." in Claim 3 invokes 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. The whole disclosure is completely silent on any structure that performs these functions in these claims Therefore, these claims are indefinite and are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.
Applicant may:
(a) Amend these claims so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph;
(b) Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
(c) Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either:
(a) Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
(b) Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-3 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claim(s) does/do not fall within at least one of the four categories of patent eligible subject matter because "An apparatus ... comprising: a simulation engine (210) configured to ...; and a reinforcement learning agent (220) configured to ..." (in Claim 1) and "… wherein the simulation engine (210) comprises: an environment configuration portion (211) configured to …; a reinforcement learning environment construction portion (212) configured to …; and a simulation portion (213) configured to … " (in Claim 3) are recited; however, the whole specification is completely silent on any structure that performs the functions of the "simulation engine (210)", "reinforcement learning agent (220)", "environment configuration portion (211)", "reinforcement learning environment construction portion (212)", and "simulation portion (213)" in an apparatus. Because the specification does not describe any structure for these functional components, they can be considered software components that implement these functions, i.e., software per se. Therefore, an apparatus in which only software components are recited does not fall within any of the four categories of patent eligible subject matter.
Allowable Subject Matter
Claims 1-3 would be allowable if rewritten or amended to overcome the rejection(s) under 35 U.S.C. 101 and 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, set forth in this Office action.
Claims 4-5 would be allowable if rewritten or amended to overcome the rejection(s) under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, set forth in this Office action.
The following is a statement of reasons for the indication of allowable subject matter:
In regard to independent Claims 1 and 4, the prior art of record, either singularly or in combination, does not teach or suggest the combination of claimed elements including "an apparatus for reinforcement learning based on a user learning environment in semiconductor design, the apparatus comprising: a simulation engine (210) configured to analyze object information comprising a semiconductor element and a standard cell based on design data comprising semiconductor netlist information, configure a customized reinforcement learning environment by adding constraint or position change information with regard to each object through configuration information input from a user terminal (100) and the analyzed object information, perform reinforcement learning based on the customized reinforcement learning environment, perform simulation based on an action determined to optimize disposition of at least one semiconductor element and standard cell, and state information of the customized reinforcement learning environment, and provide reward information calculated based on connection information of semiconductor elements and standard cells according to a simulation result as feedback regarding decision making by a reinforcement learning agent (220); and a reinforcement learning agent (220) configured to perform reinforcement learning based on state information and reward information received from the simulation engine (210), thereby determining an action so as to optimize disposition of semiconductor elements and standard cells, wherein the simulation engine (210) distinguishes semiconductor elements, standard cells, and wires according to characteristics or functions, and distinguishes, based on addition of specific colors, the objects distinguished according to characteristics or functions, thereby preventing learning ranges from increasing during reinforcement learning, and wherein the reinforcement learning agent (220) determines an action, by reflecting distances between semiconductor elements and lengths of wires connecting semiconductor elements and standard cells, through learning using a reinforcement learning algorithm such that the semiconductor elements and the standard cells are disposed in optimal positions" or "a method for reinforcement learning based on a user learning environment in semiconductor design, the method comprising the steps of: a) receiving, by a reinforcement learning server (200), design data comprising semiconductor netlist information from a user terminal (100); b) analyzing, by the reinforcement learning server (200), object information comprising a semiconductor element and a standard cell from the received design data, and configuring a customized reinforcement learning environment by adding constraint or position change information with regard to each object through configuration information input from a user terminal (100), based on the analyzed object information; c) performing, by the reinforcement learning server (200), reinforcement learning based on reward information and state information of the customized reinforcement learning environment comprising disposition information of semiconductor elements and standard cells to be used for reinforcement learning through a reinforcement learning agent, thereby determining an action so as to optimize disposition of at least one semiconductor element disposition and stand cell disposition; and d) performing, by the reinforcement learning server (200), simulation constituting a reinforcement learning environment 
regarding disposition of the semiconductor element and standard cell based on an action, and generating reward information calculated based on connection information of semiconductor elements and standard cells according to a result of performing simulation as feedback regarding decision making by the reinforcement learning agent, wherein the customized reinforcement learning environment configured in step b) distinguishes semiconductor elements, standard cells, and wires according to characteristics or functions so as to prevent learning ranges from increasing during reinforcement learning, and distinguishes, based on addition of specific colors, the objects distinguished according to characteristics or functions, and wherein, in step c), the reinforcement learning server (200) determines an action, by reflecting distances between semiconductor elements and lengths of wires connecting semiconductor elements and standard cells, through learning using a reinforcement learning algorithm such that the semiconductor elements and the standard cells are disposed in optimal positions" when interpreted as a whole.
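For illustration only, the following minimal sketch outlines the kind of simulation-engine/agent feedback loop recited above; the class names, the placement grid, and the Manhattan-wirelength reward are assumptions introduced for illustration and are not taken from the application or the cited references.
```python
# Hypothetical sketch of a simulation-engine / reinforcement-learning-agent loop.
# All names and the Manhattan-wirelength reward are illustrative assumptions only.
import random

class SimulationEngine:
    """Holds a customized environment built from netlist objects and user constraints,
    applies the agent's placement actions, and returns state and reward feedback."""

    def __init__(self, objects, grid, constraints):
        self.objects = objects          # e.g., semiconductor elements / standard cells
        self.grid = grid                # candidate placement positions
        self.constraints = constraints  # per-object constraint or position-change info
        self.placements = {}

    def state(self):
        # State information: disposition of the objects placed so far.
        return dict(self.placements)

    def step(self, obj, position):
        # Apply the action, then compute a reward from connection information
        # (here, negative Manhattan distance to already-placed objects).
        self.placements[obj] = position
        reward = -sum(abs(px - position[0]) + abs(py - position[1])
                      for other, (px, py) in self.placements.items() if other != obj)
        return self.state(), reward

class ReinforcementLearningAgent:
    """Chooses placement actions; the learning update itself is omitted here."""

    def act(self, state, candidates):
        return random.choice(candidates)  # placeholder policy

engine = SimulationEngine(objects=["cell_a", "cell_b", "cell_c"],
                          grid=[(x, y) for x in range(4) for y in range(4)],
                          constraints={})
agent = ReinforcementLearningAgent()
for obj in engine.objects:
    state, reward = engine.step(obj, agent.act(engine.state(), engine.grid))
```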
Goldie et al. ("Placement Optimization with Deep Reinforcement Learning", ARXIV ID: 2003.08445, Mar. 18, 2020; ISPD ’20, March 29 – April 1, 2020, Taipei, Taiwan) discloses in ABSTRACT that (1) Placement Optimization is an important problem in systems and chip design, which consists of mapping the nodes of a graph onto a limited set of resources to optimize for an objective, subject to constraints; (2) start by motivating reinforcement learning as a solution to the placement problem; (3) then give an overview of what deep reinforcement learning is; and (4) next formulate the placement problem as a reinforcement learning problem, and show how this problem can be solved with policy gradient optimization. Goldie'2020 further discloses in Section 1 that (1) an important problem in systems and chip design is Placement Optimization, which refers to the problem of mapping the nodes of a graph onto a limited set of resources to optimize for an objective, subject to constraints; (2) common examples of this class of problem include placement of TensorFlow graphs onto hardware devices to minimize training or inference time, or placement of an ASIC or FPGA netlist onto a grid to optimize for power, performance, and area; (3) placement is a very challenging problem as several factors, including the size and topology of the input graph, number and properties of available resources, and the requirements and constraints of feasible placements all contribute to its complexity; (4) a range of algorithms including analytical approaches [3, 12, 14, 15], genetic and hill-climbing methods [4, 6, 13], Integer Linear Programming (ILP) [2, 27], and problem-specific heuristics have been proposed to the placement problem; (5) more recently, a new type of approach to the placement problem based on deep Reinforcement Learning (RL) [16, 17, 28] has emerged; (6) RL-based methods bring new challenges, such as interpretability, brittleness of training to convergence, and unsafe exploration; and (7) however, they also offer new opportunities, such as the ability to leverage distributed computing, ease of problem formulation, end-to-end optimization, and domain adaptation, meaning that these methods can potentially transfer what they learn from previous problems to new unseen instances. Goldie'2020 also discloses in Section 2 with FIG. 
1 that (1) most successful applications of machine learning are examples of supervised learning, where a model is trained to approximate a particular function, given many input-output examples; (2) today’s state-of-the-art supervised models are typically deep learning models, meaning that the function approximation is achieved by updating the weights of a multi-layered (deep) neural network via gradient descent against a differentiable loss function; (3) reinforcement learning, on the other hand, is a separate branch of machine learning in which a model, or policy in RL parlance, learns to take actions in an environment (either the real world or a simulation) to maximize a given reward function; (4) deep reinforcement learning is simply reinforcement learning in which the policy is a deep neural network; (5) RL problems can be reformulated as Markov Decision Processes (MDPs) which rely on the Markov assumption, meaning that the next state st+1 depends only on the current state st, and is conditionally independent of the past; (6) like MDPs, RL problems are defined by five key components: (a) states: the set of possible states of the world; (b) actions: the set of actions that can be taken by the agent; (c) state transition probabilities: the probability of transitioning between any two given states; (d) reward: the objective to be maximized, subject to future discounting as defined below; and (e) discount for future rewards: how much to discount the value of future reward, due to its relative uncertainty; (7) at each time step t, the agent begins in state (st), takes an action (at), arrives at a new state (st+1), and receives a reward (rt) from the environment, as shown in Figure 1; (8) through repeated episodes (sequences of states, actions, and rewards), the agent learns to take actions that will maximize cumulative reward; (9) reinforcement learning approaches can be divided into two broad categories: model-free and model-based; (10) in model-free reinforcement learning, train a policy to take actions that maximize reward from a black-box environment; (11) in model-based reinforcement learning, train a policy to take actions that maximize reward, while also training an explicit model of the world, which learns to predict the reward and state transitions of the environment; (12) most existing work on reinforcement learning for systems problems has taken a model-free approach, as it is generally easier to train to convergence; (13) however, model-based reinforcement learning has been shown to be more sample efficient in other domains [11], so it may be a viable direction to take in future work, especially in situations where the reward function is very expensive to evaluate; (14) since the agent's goal is to maximize cumulative reward, one approach is to learn a value function that can predict the reward given a state, v(s), and then take the action which will bring the agent into a state that yields the highest reward; (15) however, a more common approach in recent years is to use policy gradient methods, which seek to directly learn the policy π(a|s) that predicts the optimal action given the current state; (16) 
popular policy gradient methods include REINFORCE [25], A3C [18], TRPO [21], and PPO [22]; (17) RL is helpful in cases where we do not have sufficient labeled data (input-output examples) to take a supervised learning approach or when the objective function is not differentiable; (18) it is also well-suited to massive search problems, where exhaustive or heuristic-based methods cannot scale, such as AlphaGo [23], AlphaStar [24], and OpenAI Five DOTA [19]; and (19) Reinforcement learning policies are famously difficult to train, as they tend to be brittle with respect to their hyperparameters, hard to interpret and debug, and prone to catastrophic failures and unsafe exploration. Goldie'2020 further teaches in Section 3.1 with FIG. 2 that (1) assume the input graph g has nodes v1, v2, …, vN, and want to place these nodes onto placement locations l1, l2, …, lM; (2) use RL to find a mapping (v1, v2, …, vN) → (l1, l2, …, lM) that maximizes a reward function R subject to constraints; (3) here, there is a unique placement location for each node vi, but each location lj can be assigned to multiple nodes; (4) the constraints vary by problem, but a common constraint is limited capacity for each placement location, meaning that there is a limit on how many nodes can be assigned to each location; (5) instead of finding the absolute best placement, one can train a policy that generates a probability distribution of nodes to placement locations such that it maximizes the expected reward generated by those placements; (6) let us denote the policy π parameterized by θ as πθ, wherein θ represents the weights of a deep network architecture; (7) describe the objective, which is to train parameters θ such that the network predicts placement decisions for the nodes of the input graph g, and as a result, the placement reward Rl,g is maximized; (8) train this policy (optimize parameters θ) using a policy gradient based method; (9) as shown in Figure 2, all of these placement problems require mapping the nodes of a graph onto placement locations such that their corresponding reward metrics are optimized; (10) for each of these problems, the neural network policy receives a state as input, and outputs an action for that state; (11) the network then outputs a probability distribution representing the probability of assigning an input node onto each placement location; (12) the action is selected by sampling or taking the argmax of the output probability distribution; and (13) the reward function varies for different problems; e.g., for TensorFlow graph placement, use negative runtime of a training step of the placed deep network model, and for ASIC and FPGA netlists, the reward is more complex and should include various metrics related to power and timing (e.g. total wirelength, routability congestion, and cell density). 
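For illustration only, the following minimal sketch shows a REINFORCE-style policy-gradient formulation of the placement problem summarized above (nodes mapped to locations by a softmax policy, with a reward equal to negative wirelength); the toy graph, sizes, and hyperparameters are assumptions introduced for illustration and are not taken from Goldie'2020.
```python
# Illustrative REINFORCE-style sketch: N nodes are mapped onto M one-dimensional
# locations by a softmax policy; the reward is the negative total wirelength.
# Graph, sizes, and hyperparameters are assumptions, not values from Goldie'2020.
import numpy as np

rng = np.random.default_rng(0)
N, M = 4, 6                       # nodes v1..vN and placement locations l1..lM
edges = [(0, 1), (1, 2), (2, 3)]  # connectivity of the input graph g
logits = np.zeros((N, M))         # policy parameters theta (one logit per node/location)
lr, baseline = 0.1, 0.0

def sample_placement():
    """Sample a location for every node from the softmax policy pi_theta(l | v)."""
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    placement = np.array([rng.choice(M, p=probs[v]) for v in range(N)])
    return placement, probs

def reward(placement):
    # Negative total wirelength over connected node pairs (to be maximized).
    return -sum(abs(int(placement[u]) - int(placement[v])) for u, v in edges)

for _ in range(500):
    placement, probs = sample_placement()
    r = reward(placement)
    advantage = r - baseline
    baseline += 0.05 * (r - baseline)   # running-mean baseline for variance reduction
    for v in range(N):
        grad = -probs[v]                # d log pi(chosen | v) / d logits
        grad[placement[v]] += 1.0
        logits[v] += lr * advantage * grad
```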
Goldie'2020 also teaches in Section 3.2 that (1) many placement problems take input in the form of graph; (2) the way in which these input graphs are represented has great impact on the ability of machine learning models to generate high-quality placements; (3) more meaningful representations also help models to learn patterns that generalize to new unseen graphs, as opposed to merely memorizing the graphs that they encounter; (4) graph neural networks can be divided into four high-level categories [26]: recurrent graph neural networks (RecGNNs) [7, 8, 20], convolutional graph neural networks (ConvGNNs) [1 , 5, 10], graph autoencoders (GAEs), and spatial-temporal graph neural networks (STGNNs); (5) most graph neural network methods used in systems today are ConvGNNs, which generalize the concept of convolution; and (6) ConvGNNs use deep convolutional networks to capture even distant relationships within the graph. Goldie'2020 further discloses in Section 3.3-3.4 that (1) domain adaptation in placement is the problem of training policies that can learn across multiple graphs and transfer the acquired knowledge to generate more optimized placements for new unseen graphs; (2) domain adaptation means to train a policy across a set of TensorFlow graphs, ASIC or FPGA netlists and apply the trained policy to an unseen TensorFlow graph, ASIC or FPGA netlist; (3) write the derivative of the objective function in Equation 1 as Equations 3-6; and (4) the equations 3-6 are the basis of various policy gradient optimization methods, such as REINFORCE [25], PPO [22], and SAC[9]. Goldie'2020 also discloses in Section 4 that (1) designing the right reward function is one of the most critical decisions to solve placement problems in computer systems and chip design; (2) some properties of effective reward functions are as follows: (a) reward functions should be fast to evaluate; (b) reward functions should be strongly correlated with the true objective; (c) another important factor is correctly engineering the reward function; (3) another key ingredient is designing the appropriate action space; (4) the constraints for feasible placements vary across placement problems; e.g., a common constraint is the capacity of placement locations, which limits the number of nodes that can be placed onto that location; (5) perhaps the most straightforward way to handle the constraints is to penalize the policy with a large negative reward whenever it generates infeasible placements, and a challenge with this solution is that the policy does not gain any information about how far this placement was from a feasible placement; (6) if all of the initial placements generated by the policy are infeasible, there will be no positive signal to teach the policy how to explore the environment and training will fail; (7) thus, creating a reward function that penalizes the infeasible placements relative to how far they are from viable placements becomes critical; (8) another approach is to force the policy to only generate feasible placements; (9) each time a new node is placed, the density of all the locations is updated (based on the locations of the nodes that are already placed); (10) the action space then becomes limited to those locations that have enough free capacity to accept the new node; and (11) the way in which state is represented has significant impact on the performance of the policy and its ability to generalize to unseen instances of the placement problem.
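For illustration only, the following minimal sketch shows the capacity-based action masking described above, in which the action space is limited to locations with remaining capacity so that only feasible placements are generated; the capacities, scores, and helper names are assumptions introduced for illustration and are not taken from Goldie'2020.
```python
# Illustrative sketch of capacity-based action masking: the action space for each
# placement step is limited to locations that still have free capacity, so the
# policy only generates feasible placements. Capacities, logits, and helper names
# are assumptions for illustration.
import numpy as np

def masked_probs(logits, remaining_capacity):
    """Zero the probability of full locations, then renormalize the rest."""
    mask = (remaining_capacity > 0).astype(float)
    scores = np.exp(logits - logits.max()) * mask
    return scores / scores.sum()

rng = np.random.default_rng(1)
capacity = np.array([1, 2, 1, 1, 2])     # how many nodes each location can still accept
logits = rng.normal(size=capacity.size)  # policy scores for the node being placed

for node in range(6):                    # place six nodes one at a time
    probs = masked_probs(logits, capacity)
    location = rng.choice(capacity.size, p=probs)
    capacity[location] -= 1              # update occupancy/density after each placement
```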
Goldie (US 2021/0334445 A1, pub. on 10/28/2021) discloses in ABSTRACT that (1) obtaining netlist data for a computer chip; (2) generating a computer chip placement, comprising placing a respective macro node at each time step in a sequence comprising a plurality of time steps; (3) generating an input representation for the time step; (4) processing the input representation using a node placement neural network having a plurality of network parameters, wherein the node placement neural network is configured to process the input representation in accordance with current values of the network parameters to generate a score distribution over a plurality of positions on the surface of the computer chip; and (5) assigning the macro node to be placed at the time step to a position from the plurality of positions using the score distribution. Goldie'445 further discloses in ¶¶ [0005]-[0011] that (1) generate a chip placement for an integrated circuit; (2) the integrated circuit for which the chip placement is being generated will be referred to in this specification as a "computer chip" but should generally be understood to mean any collection of electronic circuits that are fabricated on one piece of semiconductor material; (3) the chip placement places each node from a netlist of nodes at a respective location on the surface of the computer chip; (4) floor planning, which involves placing the components of a chip on the surface of the chip, is a crucial step in the chip design process; (5) the placement of the components should optimize metrics such as area, total wire length and congestion; (6) if a floorplan does not perform well on these metrics, the integrated circuit chip that is generated based on the floor plan will perform poorly; e.g., the integrated circuit chip could fail to function, could consume an excessive amount of power, could have an unacceptable latency, or have any of a variety of other undesirable properties that are caused by sub-optimal placement of components on the chip; (7) allow for a high-quality chip floorplan to be generated automatically and with minimal user involvement by making use of the described node placement neural network and the described training techniques; (8) as a particular example, when distributed training is employed, a high-quality (i.e., a superhuman) placement can be generated in on the order of hours without any human expert involvement; (9) by effectively making use of reinforcement learning to train the described node placement neural network, however, the described techniques are able to quickly generate a high-quality floorplan; (10) an integrated circuit chip which is produced using the method may have reduced power consumption compared to one produced by a conventional method; (11) when the encoder neural network is trained through supervised learning and the policy neural network is trained through reinforcement learning, can generalize quickly to new netlists and new integrated circuit chip dimensions; and (12) this greatly reduces the amount of computational resources that are required to generate placements for new netlists, because little to no computationally expensive fine-tuning is required to generate a high-quality floorplan for a new netlist. Goldie'445 also discloses in ¶¶ [0018]-[0055] with FIG. 
1 that (1) the system 100 receives netlist data 102 for a computer chip, i.e., a very large-scale integration (VLSI) chip, that is to be manufactured and that includes a plurality of integrated circuit components, e.g., transistors, resistors, capacitors, and so on; (2) the netlist data 102 specifies a connectivity on the computer chip among a plurality of nodes that each correspond to one or more of a plurality of integrated circuit components of the computer chip; (3) in other words, the netlist data 102 identifies, for each of the plurality of nodes, which other nodes (if any) the node needs to be connected to by one or more wires in the manufactured computer chip; (4) the system 100 generates, as output, a final computer chip placement 152 that places some or all of the nodes in the netlist data 102 at a respective position on the surface of the computer chip; i.e., the final computer chip placement 152 identifies a respective position on the surface of the computer chip for some or all of the nodes in the netlist data 102 and, therefore, for the integrated circuit components that are represented by the node; (5) the netlist data 102 can identify two types of nodes: nodes that represent macro components and nodes that represent standard cell components; (6) macro components are large blocks of IC components, e.g., static random-access memory (SRAM) or other memory blocks, that are represented as a single node in the netlist; (7) standard cell components are a group of transistor and interconnect structures, e.g., a group that provides a Boolean logic function (e.g., AND, OR, XOR, XNOR, inverters) or a group that provides a storage function (e.g., flipflop or latch); (8) the placement 152 assigns each node to a grid square in an N x M grid overlaid over the surface of the chip, where N and M are integers; (9) the system 100 can process an input derived from the netlist data, data characterizing the surface of the integrated circuit chip, or both using a grid generation machine learning model that is configured to process the input to generate an output that defines how to divide the surface of the integrated circuit chip into the N x M grid; (10) the system 100 includes a node placement neural network 110 and a graph placement engine 130; (11) the system 100 uses the node placement neural network 110 to generate a macro node placement 122; (12) the macro node placement 122 places each macro node, i.e., each node representing a macro, in the netlist data 102 at a respective position on the surface of the computer chip; (13) the system 100 generates the macro node placement node-by-node over a number of time steps, with each macro node being placed at a location at a different one of the time steps, accord