DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office action is responsive to the following communication(s): original application filed on 12/05/2022; said application claims a priority filing date of 12/28/2021. Claims 1-5 are pending. Claims 1 and 4 are independent.
Drawings
Figure 1 should be designated by a legend such as --Prior Art-- because only that which is old is illustrated (see Page 2, lines 8-9 of the specification). See MPEP § 608.02(g). Corrected drawings in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. The replacement sheet(s) should be labeled “Replacement Sheet” in the page header (as per 37 CFR 1.84(c)) so as not to obstruct any portion of the drawing figures. If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they do not include the following reference sign(s) mentioned in the description: 120 in Page 12, lines 2-8. Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(4) because (1) reference characters "220" in FIG. 3; Page 10, line 8; Page 12, line 23; Page 13, line 25 – Page 14, line 26; Page 16, line 12 – Page 17, line 4; Page 18, line 21; and Claims 1 and 3 and "120" in Page 2, lines 2-8 have both been used to designate "reinforcement learning agent"; and (2) reference characters "123" in FIG. 4 and "213" in Page 12, line 28; Page 14, line 2; Page 18, line 20; and Claim 3 have both been used to designate "simulation portion". Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they include the following reference character(s) not mentioned in the description: 123 in FIG. 4. Corrected drawing sheets in compliance with 37 CFR 1.121(d), or amendment to the specification to add the reference character(s) in the description in compliance with 37 CFR 1.121(b) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Specification
The disclosure is objected to because of the following informalities:
in .
Appropriate correction is required.
Claim Objections
Claims 1-5 are objected to because of the following informalities:
in Claim 1, lines 17-19 (i.e., Page 19, lines 19-21), "… feedback regarding decision making by a reinforcement learning agent (220); and a reinforcement learning agent (220) configured to …" appears to be "… feedback regarding decision making by a reinforcement learning agent (220); and the reinforcement learning agent (220) configured to …";
in Claim 2, lines 1-2 (i.e., Page 20, lines 9-10) and Claim 3, lines 1-2 (i.e., Page 20, lines 14-15), "The apparatus for reinforcement learning based on a user learning environment in semiconductor design of claim 1 …" appears to be "The apparatus for the reinforcement learning based on the user learning environment in the semiconductor design of claim 1 …" (see also 112 rejection to Claim 1);
in Claim 3, lines 5-12 (i.e., Page 20, lines 18-25), "… included in design data through configuration information input from the user terminal (100), distinguish semiconductor elements, standard cells, and wires according to characteristics or functions so as to prevent learning ranges from increasing during reinforcement learning, and distinguish, based on addition of specific colors, the objects distinguished according to characteristics or functions, thereby configuring a customized reinforcement learning environment …" appears to be "… included in the design data through the configuration information input from the user terminal (100), distinguish the semiconductor elements, the standard cells, and the wires according to the characteristics or functions so as to prevent the learning ranges from the increasing during the reinforcement learning, and distinguish, based on the addition of the specific colors, the objects distinguished according to the characteristics or functions, thereby configuring the customized reinforcement learning environment …" (see also 112 rejection to Claim 1);
in Claim 3, lines 13-22 (i.e., Page 20, line 26 – Page 21, line 5), "… analyze object information comprising semiconductor elements and standard cells based on design data comprising semiconductor netlist information, generate simulation data constituting a customized reinforcement learning environment by adding constraint or position change information configured by the environment configuration portion (211), and request, based on the simulation data, the reinforcement learning agent (220) to provide optimization information for disposition of at least one semiconductor element and standard cell …" appears to be "… analyze the object information comprising the semiconductor elements and the standard cells based on the design data comprising the semiconductor netlist information, generate simulation data constituting the customized reinforcement learning environment by adding the constraint or the position change information configured by the environment configuration portion (211), and request, based on the simulation data, the reinforcement learning agent (220) to provide optimization information for the disposition of the at least one semiconductor element and standard cell …" (see also 112 rejection to Claims 1 and 3);
in Claim 3, lines 23-31 (i.e., Page 21, lines 6-14), "… perform simulation constituting a reinforcement learning environment regarding semiconductor elements and standard cells, based on actions received from the reinforcement learning agent (220), and state information comprising semiconductor element disposition information to be used for reinforcement learning, and provide the reinforcement learning agent (220) with reward information calculated based on connection information of semiconductor elements and standard cells simulated as feedback regarding decision making …" appears to be "… perform the simulation constituting the reinforcement learning environment regarding the semiconductor elements and the standard cells, based on actions received from the reinforcement learning agent (220), and the state information comprising semiconductor element disposition information to be used for the reinforcement learning, and provide the reinforcement learning agent (220) with the reward information calculated based on the connection information of the semiconductor elements and the standard cells simulated as the feedback regarding the decision making …" (see also 112 rejection to Claim 1);
in Claim 4, lines 2-3 (i.e., Page 21, lines 18-19), "… the method comprising the steps of …" appears to be "… the method comprising steps of …";
in Claim 4, lines 10-12 (i.e., Page 21, lines 26-28), "… adding constraint or position change information with regard to each object through configuration information input from a user terminal (100) …" appears to be "… adding constraint or position change information with regard to each object through configuration information input from the user terminal (100) …";
in Claim 4, lines 19-21 (i.e., Page 22, lines 5-7) "… so as to optimize disposition of at least one semiconductor element disposition and stand cell disposition …" appears to be "… so as to optimize disposition of at least one semiconductor element and standard cell …" according to Claim 1;
in Claim 4, lines 30-32 (i.e., Page 22, lines 16-17), "… wherein the customized reinforcement learning environment configured in step b) …" appears to be "… wherein the customized reinforcement learning environment configured in the step b) …";
in Claim 4, lines 37-38 (i.e., Page 22, lines 23-24), "… wherein, in step c), the reinforcement learning server (200) determines an action …" appears to be "… wherein, in the step c), the reinforcement learning server (200) determines an action …";
in Claim 5, lines 1-3 (i.e., Page 22, line 30 – Page 23, line 2), "The method for reinforcement learning based on a user learning environment in semiconductor design of claim 4, wherein the design data in step a) is …" appears to be "The method for the reinforcement learning based on the user learning environment in the semiconductor design of claim 4, wherein the design data in the step a) is …" (see also 112 rejection to Claim 4).
Appropriate correction is required.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: "a simulation engine (210) configured to ..." and "a reinforcement learning agent (220) configured to ..." in Claim 1; and "an environment configuration portion (211) configured to ...", "a reinforcement learning environment construction portion (212) configured to …" and "a simulation portion (213) configured to ..." in Claim 3.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-5 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 recites the limitation "An apparatus for " in lines 1-2, which renders the claim indefinite because it is unclear whether these instances of ".
Claim 1 recites the limitation "." in lines 4-35 (i.e., Page 19, line 6 , which renders the claim indefinite because (1) it is unclear whether "a semiconductor element and a standard cell", "at least one semiconductor element and standard cell", and "semiconductor elements and standard cells" are related or not; (2) it is unclear whether the multiple instances of "semiconductor elements", "standard cells", and "wires" are the same or not; and (3) if they are different, which instance of "semiconductor elements" and "standard cells" is respectively referred to by "the semiconductor elements" and "the standard cells" disposed in optimal positions. Clarification is required.
Claim 1 recites the limitation "... based on an action determined to optimize disposition of at least one semiconductor element and standard cell ... determining an action so as to optimize disposition of semiconductor elements and standard cells ... determines an action ... through learning using a reinforcement learning algorithm such that the semiconductor elements and the standard cells are disposed in optimal positions" in lines 12-35 (i.e., Page 19, line 14 , which renders the claim indefinite because it is unclear whether these three instances of ".
Claim 1 recites the limitation "... perform simulation based on …, and state information of the customized reinforcement learning environment ... perform reinforcement learning based on state information and reward information ..." in line, which renders the claim indefinite because it is unclear whether these two instances of ".
Claim 1 recites the limitation "... provide reward information calculated based on ... perform reinforcement learning based on state information and reward information ..." in line, which renders the claim indefinite because it is unclear whether these two instances of ".
Claim 1 recites the limitation "... distinguishes semiconductor elements, standard cells, and wires according to characteristics or functions, and distinguishes, based on addition of specific colors, the objects distinguished according to characteristics or functions ..." in line, which renders the claim indefinite because it .
Claim 1 recites the limitation "... by adding constraint or position change information with regard to each object through configuration information input … distinguishes, based on addition of specific colors, the objects distinguished according to characteristics or functions ..." in lines 7-27 (i.e., Page 19, lines 9-29). There is insufficient antecedent basis for the limitation "the objects" in the claim. Clarification is required.
Claims 2-3 are rejected for fully incorporating the deficiency of their respective base claims.
Claim 3 recites the limitation "." in lines 4-7 (i.e., Page 20, lines 17-20), which renders the claim indefinite because ".
Claim 4 recites the limitation "A " in lines 1-, which renders the claim indefinite because it is unclear whether .
Claim 4 recites the limitation "." in lines (see also claim objection to Claim 4), which renders the claim indefinite because (1) it is unclear whether "a semiconductor element and a standard cell", "at least one semiconductor element and standard cell", and "semiconductor elements and standard cells" are related or not; (2) it is unclear whether the multiple instances of "semiconductor elements", "standard cells", and "wires" are the same or not; and (3) if they are different, which instance of "semiconductor elements" and "standard cells" is respectively referred to by "the semiconductor elements" and "the standard cells" disposed in optimal positions. Clarification is required.
Claim 4 recites the limitation "... determining an action so as to optimize disposition of at least one semiconductor element ... regarding disposition of the semiconductor element and standard cell based on an action ... determines an action ... through learning using a reinforcement learning algorithm such that the semiconductor elements and the standard cells are disposed in optimal positions" in lines 1, which renders the claim indefinite because it is unclear whether these three instances of ".
Claim 4 recites the limitation "... performing … reinforcement learning based on reward information and state information ... generating reward information calculated based on ..." in line, which renders the claim indefinite because it is unclear whether these two instances of ".
Claim 4 recites the limitation "... configuring a customized reinforcement learning environment by adding ... the customized reinforcement learning environment comprising disposition information of semiconductor elements and standard cells ... simulation constituting a reinforcement learning environment regarding disposition of the semiconductor element and standard cell ... " in lines, which renders the claim indefinite because it is unclear whether ".
Claim 4 recites the limitation "... performing" in lines 22-2, which renders the claim indefinite because it is unclear whether these two instances of ".
Claim 4 recites the limitation "... distinguishes semiconductor elements, standard cells, and wires according to characteristics or functions … and distinguishes, based on addition of specific colors, the objects distinguished according to characteristics or functions ..." in line, which renders the claim indefinite because it is unclear .
Claim 4 recites the limitation "... by adding constraint or position change information with regard to each object through configuration information input … distinguishes, based on addition of specific colors, the objects distinguished according to characteristics or functions ..." in lines . There is insufficient antecedent basis for the limitation "the objects" in the claim. Clarification is required.
Claim 5 is rejected for fully incorporating the deficiency of its base claim.
Claim limitations "a simulation engine (210) configured to ..." and "a reinforcement learning agent (220) configured to ..." in Claims 1-3; and "an environment configuration portion (211) configured to ...", "a reinforcement learning environment construction portion (212) configured to …" and "a simulation portion (213) configured to ..." in Claim 3 invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. The whole disclosure is completely silent on any structure that performs these functions in these claims. Therefore, these claims are indefinite and are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.
Applicant may:
(a) Amend these claims so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph;
(b) Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
(c) Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either:
(a) Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
(b) Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-3 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claim(s) does/do not fall within at least one of the four categories of patent eligible subject matter because "An apparatus ... comprising: a simulation engine (210) configured to ...; and a reinforcement learning agent (220) configured to ..." (in Claim 1) and "… wherein the simulation engine (210) comprises: an environment configuration portion (211) configured to …; a reinforcement learning environment construction portion (212) configured to …; and a simulation portion (213) configured to … " (in Claim 3) are recited; however, the whole specification is completely silent on any structure that performs the functions of the "simulation engine (210)", "reinforcement learning agent (220)", "environment configuration portion (211)", "reinforcement learning environment construction portion (212)", and "simulation portion (213)" in an apparatus. Because the specification does not describe their structures, these functional components can be considered software components that implement these functions; i.e., software per se. Therefore, an apparatus in which only software components are recited does not fall within any of the four categories of patent eligible subject matter.
Allowable Subject Matter
Claims 1-3 would be allowable if rewritten or amended to overcome the rejection(s) under 35 U.S.C. 101 and 112(b) or 35 U.S.C. 112 (pre-AIA ), 2nd paragraph, set forth in this Office action.
Claims 4-5 would be allowable if rewritten or amended to overcome the rejection(s) under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), 2nd paragraph, set forth in this Office action.
The following is a statement of reasons for the indication of allowable subject matter:
In regard to independent Claims 1 and 4, the prior art of record, either singly or in combination, does not teach or suggest the combination of claimed elements including "an apparatus for reinforcement learning based on a user learning environment in semiconductor design, the apparatus comprising: a simulation engine (210) configured to analyze object information comprising a semiconductor element and a standard cell based on design data comprising semiconductor netlist information, configure a customized reinforcement learning environment by adding constraint or position change information with regard to each object through configuration information input from a user terminal (100) and the analyzed object information, perform reinforcement learning based on the customized reinforcement learning environment, perform simulation based on an action determined to optimize disposition of at least one semiconductor element and standard cell, and state information of the customized reinforcement learning environment, and provide reward information calculated based on connection information of semiconductor elements and standard cells according to a simulation result as feedback regarding decision making by a reinforcement learning agent (220); and a reinforcement learning agent (220) configured to perform reinforcement learning based on state information and reward information received from the simulation engine (210), thereby determining an action so as to optimize disposition of semiconductor elements and standard cells, wherein the simulation engine (210) distinguishes semiconductor elements, standard cells, and wires according to characteristics or functions, and distinguishes, based on addition of specific colors, the objects distinguished according to characteristics or functions, thereby preventing learning ranges from increasing during reinforcement learning, and wherein the reinforcement learning agent (220) determines an action, by reflecting distances between semiconductor elements and lengths of wires connecting semiconductor elements and standard cells, through learning using a reinforcement learning algorithm such that the semiconductor elements and the standard cells are disposed in optimal positions" or "a method for reinforcement learning based on a user learning environment in semiconductor design, the method comprising the steps of: a) receiving, by a reinforcement learning server (200), design data comprising semiconductor netlist information from a user terminal (100); b) analyzing, by the reinforcement learning server (200), object information comprising a semiconductor element and a standard cell from the received design data, and configuring a customized reinforcement learning environment by adding constraint or position change information with regard to each object through configuration information input from a user terminal (100), based on the analyzed object information; c) performing, by the reinforcement learning server (200), reinforcement learning based on reward information and state information of the customized reinforcement learning environment comprising disposition information of semiconductor elements and standard cells to be used for reinforcement learning through a reinforcement learning agent, thereby determining an action so as to optimize disposition of at least one semiconductor element disposition and stand cell disposition; and d) performing, by the reinforcement learning server (200), simulation constituting a reinforcement learning environment regarding disposition of the semiconductor element and standard cell based on an action, and generating reward information calculated based on connection information of semiconductor elements and standard cells according to a result of performing simulation as feedback regarding decision making by the reinforcement learning agent, wherein the customized reinforcement learning environment configured in step b) distinguishes semiconductor elements, standard cells, and wires according to characteristics or functions so as to prevent learning ranges from increasing during reinforcement learning, and distinguishes, based on addition of specific colors, the objects distinguished according to characteristics or functions, and wherein, in step c), the reinforcement learning server (200) determines an action, by reflecting distances between semiconductor elements and lengths of wires connecting semiconductor elements and standard cells, through learning using a reinforcement learning algorithm such that the semiconductor elements and the standard cells are disposed in optimal positions" when interpreted as a whole.
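For illustration only, and not as a characterization of the applicant's disclosure or of any reference of record, the quoted feature of reward information "calculated based on connection information" and reflecting distances and wire lengths can be pictured with the minimal Python sketch below. The Manhattan-distance wire-length measure, the function names manhattan and wirelength_reward, the two-pin net representation, and the sample placements are all assumptions introduced here; the claims do not specify any particular reward formula.

```python
from typing import Dict, List, Tuple

Position = Tuple[int, int]   # hypothetical (x, y) grid coordinate of a placed object
Net = Tuple[str, str]        # hypothetical two-pin connection between two objects


def manhattan(a: Position, b: Position) -> int:
    """Manhattan distance, used here as an assumed stand-in for wire length."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def wirelength_reward(placement: Dict[str, Position], nets: List[Net]) -> float:
    """Return a reward that grows as the total wire length shrinks.

    The reward is the negative sum of the assumed wire lengths over all
    connections, so a more compact placement scores higher.
    """
    total = sum(manhattan(placement[u], placement[v]) for u, v in nets)
    return -float(total)


# Hypothetical connection information: which objects are wired together.
nets = [("cell_1", "cell_2"), ("cell_2", "macro_1")]

# Two hypothetical candidate placements of the same three objects.
placement_a = {"cell_1": (0, 0), "cell_2": (3, 4), "macro_1": (1, 1)}
placement_b = {"cell_1": (0, 0), "cell_2": (1, 0), "macro_1": (1, 1)}

reward_a = wirelength_reward(placement_a, nets)
reward_b = wirelength_reward(placement_b, nets)
print(reward_a, reward_b)
```

Under these assumptions the more compact placement_b receives the higher reward, which is the kind of feedback an agent could use when choosing a disposition action.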
Goldie et al. ("Placement Optimization with Deep Reinforcement Learning", ARXIV ID: 2003.08445, Mar. 18, 2020; ISPD ’20, March 29 – April 1, 2020, Taipei, Taiwan) discloses in ABSTRACT that (1) Placement Optimization is an important problem in systems and chip design, which consists of mapping the nodes of a graph onto a limited set of resources to optimize for an objective, subject to constraints; (2) start by motivating reinforcement learning as a solution to the placement problem; (3) then give an overview of what deep reinforcement learning is; and (4) next formulate the placement problem as a reinforcement learning problem, and show how this problem can be solved with policy gradient optimization. Goldie'2020 further discloses in Section 1 that (1) an important problem in systems and chip design is Placement Optimization, which refers to the problem of mapping the nodes of a graph onto a limited set of resources to optimize for an objective, subject to constraints; (2) common examples of this class of problem include placement of TensorFlow graphs onto hardware devices to minimize training or inference time, or placement of an ASIC or FPGA netlist onto a grid to optimize for power, performance, and area; (3) placement is a very challenging problem as several factors, including the size and topology of the input graph, number and properties of available resources, and the requirements and constraints of feasible placements all contribute to its complexity; (4) a range of algorithms including analytical approaches [3, 12, 14, 15], genetic and hill-climbing methods [4, 6, 13], Integer Linear Programming (ILP) [2, 27], and problem-specific heuristics have been proposed to the placement problem; (5) more recently, a new type of approach to the placement problem based on deep Reinforcement Learning (RL) [16, 17, 28] has emerged; (6) RL-based methods bring new challenges, such as interpretability, brittleness of training to convergence, and unsafe exploration; and 
(7) however, they also offer new opportunities, such as the ability to leverage distributed computing, ease of problem formulation, end-to-end optimization, and domain adaptation, meaning that these methods can potentially transfer what they learn from previous problems to new unseen instances. Goldie'2020 also discloses in Section 2 with FIG. 1 that (1) most successful applications of machine learning are examples of supervised learning, where a model is trained to approximate a particular function, given many input-output examples; (2) today’s state-of-the-art supervised models are typically deep learning models, meaning that the function approximation is achieved by updating the weights of a multi-layered (deep) neural network via gradient descent against a differentiable loss function; (3) reinforcement learning, on the other hand, is a separate branch of machine learning in which a model, or policy in RL parlance, learns to take actions in an environment (either the real world or a simulation) to maximize a given reward function; (4) deep reinforcement learning is simply reinforcement learning in which the policy is a deep neural network; (5) RL problems can be reformulated as Markov Decision Processes (MDPs) which rely on the Markov assumption, meaning that the next state st+1 depends only on the current state st, and is conditionally independent of the past; (6) like MDPs, RL problems are defined by five key components: (a) states: the set of possible states of the world; (b) actions: the set of actions that can be taken by the agent; (c) state transition probabilities: the probability of transitioning between any two given states; (d) reward: the objective to be maximized, subject to future discounting as defined below; and (e) discount for future rewards: how much to discount the value of future reward, due to its relative uncertainty; (7) at each time step t, the agent begins in state (st), takes an action (at), arrives at a new state (st+1), and 
receives a reward (rt) from the environment, as shown in Figure 1; (8) through repeated episodes (sequences of states, actions, and rewards), the agent learns to take actions that will maximize cumulative reward; (9) reinforcement learning approaches can be divided into two broad categories: model-free and model-based; (10) in model-free reinforcement learning, train a policy to take actions that maximize reward from a black-box environment; (11) in model-based reinforcement learning, train a policy to take actions that maximize reward, while also training an explicit model of the world, which learns to predict the reward and state transitions of the environment; (12) most existing work on reinforcement learning for systems problems has taken a model-free approach, as it is generally easier to train to convergence; (13) however, model-based reinforcement learning has been shown to be more sample efficient in other domains [11], so it may be a viable direction to take in future work, especially in situations where the reward function is very expensive to evaluate; (14) since the agent's goal is to maximize cumulative reward, one approach is to learn a value function that can predict the reward given a state, v(s), and then take the action which will bring the agent into a state that yields the highest reward; (15) however, a more common approach in recent years is to use policy gradient methods, which seek to directly learn the policy π(a|s) that predicts the optimal action given the current state; (16) 
popular policy gradient methods include REINFORCE [25], A3C [18], TRPO [21], and PPO [22]; (17) RL is helpful in cases where we do not have sufficient labeled data (input-output examples) to take a supervised learning approach or when the objective function is not differentiable; (18) it is also well-suited to massive search problems, where exhaustive or heuristic-based methods cannot scale, such as AlphaGo [23], AlphaStar [24], and OpenAI Five DOTA [19]; and (19) reinforcement learning policies are famously difficult to train, as they tend to be brittle with respect to their hyperparameters, hard to interpret and debug, and prone to catastrophic failures and unsafe exploration. Goldie'2020 further teaches in Section 3.1 with FIG. 2 that (1) assume the input graph g has nodes v1, v2, … , vN, and want to place these nodes onto placement locations l1, l2, … , lM; (2) use RL to find a mapping (v1, v2, … , vN) → (l1, l2, … , lM) that maximizes a reward function R subject to constraints; (3) here, there is a unique placement location for each node vi, but each location lj can be assigned to multiple nodes; (4) the constraints vary by problem, but a common constraint is limited capacity for each placement location, meaning that there is a limit on how many nodes can be assigned to each location; (5) instead of finding the absolute best placement, one can train a policy that generates a probability distribution of nodes to placement locations such that it maximizes the expected reward generated by those placements; (6) let us denote the policy π parameterized by θ as πθ, wherein θ represents the weights of a deep network architecture; (7) describe the objective, which is to train parameters θ such that the network predicts placement decisions for the nodes of the input graph g, and as a result, the placement reward Rl,g is maximized; (8) train this policy (optimize parameters θ) using a policy gradient based method; (9) as shown in Figure 2, all of these placement 
problems require mapping the nodes of a graph onto placement locations such that their corresponding reward metrics are optimized; (10) for each of these problems, the neural network policy receives a state as input, and outputs an action for that state; (11) the network then outputs a probability distribution representing the probability of assigning an input node onto each placement location; (12) the action is selected by sampling or taking the argmax of the output probability distribution; and (13) the reward function varies for different problems; e.g., for TensorFlow graph placement, use negative runtime of a training step of the placed deep network model, and for ASIC and FPGA netlists, the reward is more complex and should include various metrics related to power and timing (e.g. total wirelength, routability congestion, and cell density). Goldie'2020 also teaches in Section 3.2 that (1) many placement problems take input in the form of a graph; (2) the way in which these input graphs are represented has great impact on the ability of machine learning models to generate high-quality placements; (3) more meaningful representations also help models to learn patterns that generalize to new unseen graphs, as opposed to merely memorizing the graphs that they encounter; (4) graph neural networks can be divided into four high-level categories [26]: recurrent graph neural networks (RecGNNs) [7, 8, 20], convolutional graph neural networks (ConvGNNs) [1, 5, 10], graph autoencoders (GAEs), and spatial-temporal graph neural networks (STGNNs); (5) most graph neural network methods used in systems today are ConvGNNs, which generalize the concept of convolution; and (6) ConvGNNs use deep convolutional networks to capture even distant relationships within the graph. 
Goldie'2020 further discloses in Section 3.3-3.4 that (1) domain adaptation in placement is the problem of training policies that can learn across multiple graphs and transfer the acquired knowledge to generate more optimized placements for new unseen graphs; (2) domain adaptation means to train a policy across a set of TensorFlow graphs, ASIC or FPGA netlists and apply the trained policy to an unseen TensorFlow graph, ASIC or FPGA netlist; (3) write the derivative of the objective function in Equation 1 as Equations 3-6; and (4) the equations 3-6 are the basis of various policy gradient optimization methods, such as REINFORCE [25], PPO [22], and SAC[9]. Goldie'2020 also discloses in Section 4 that (1) designing the right reward function is one of the most critical decisions to solve placement problems in computer systems and chip design; (2) some properties of effective reward functions are as follows: (a) reward functions should be fast to evaluate; (b) reward functions should be strongly correlated with the true objective; (c) another important factor is correctly engineering the reward function; (3) another key ingredient is designing the appropriate action space; (4) the constraints for feasible placements vary across placement problems; e.g., a common constraint is the capacity of placement locations, which limits the number of nodes that can be placed onto that location; (5) perhaps the most straightforward way to handle the constraints is to penalize the policy with a large negative reward whenever it generates infeasible placements, and a challenge with this solution is that the policy does not gain any information about how far this placement was from a feasible placement; (6) if all of the initial placements generated by the policy are infeasible, there will be no positive signal to teach the policy how to explore the environment and training will fail; (7) thus, creating a reward function that penalizes the infeasible placements relative to how far they 
are from viable placements becomes critical; (8) another approach is to force the policy to only generate feasible placements; (9) each time a new node is placed, the density of all the locations is updated (based on the locations of the nodes that are already placed); (10) the action space then becomes limited to those locations that have enough free capacity to accept the new node; and (11) the way in which state is represented has significant impact on the performance of the policy and its ability to generalize to unseen instances of the placement problem.
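For illustration only, the capacity-limited action space and the sampling/argmax action selection summarized above from Goldie'2020 can be sketched as follows. This sketch is not taken from the reference: the function names (masked_softmax, place_nodes), the per-node logits standing in for a policy network output, and the capacity values are all hypothetical assumptions.

```python
import math
import random


def masked_softmax(logits, mask):
    """Softmax over the entries where mask is True; masked-out entries get probability 0."""
    feasible = [l for l, m in zip(logits, mask) if m]
    if not feasible:
        raise ValueError("no feasible placement location")
    peak = max(feasible)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def place_nodes(node_logits, capacity, greedy=False, rng=None):
    """Place nodes one at a time, limiting each action to locations with free capacity.

    node_logits: one list of scores per node (hypothetical stand-in for a policy output).
    capacity:    maximum number of nodes each location can hold.
    """
    rng = rng or random.Random(0)
    load = [0] * len(capacity)
    placement = []
    for logits in node_logits:
        # Feasibility mask: only locations that can still accept a node.
        mask = [load[j] < capacity[j] for j in range(len(capacity))]
        probs = masked_softmax(logits, mask)
        if greedy:
            loc = max(range(len(probs)), key=probs.__getitem__)  # argmax
        else:
            loc = rng.choices(range(len(probs)), weights=probs, k=1)[0]  # sample
        load[loc] += 1  # update occupancy before the next node is placed
        placement.append(loc)
    return placement
```

In this sketch, a node that prefers a full location is forced onto the best remaining feasible location, so no infeasible placement is ever generated and no penalty reward is needed.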
Goldie (US 2021/0334445 A1, pub. on 10/28/2021) discloses in ABSTRACT that (1) obtaining netlist data for a computer chip; (2) generating a computer chip placement, comprising placing a respective macro node at each time step in a sequence comprising a plurality of time steps; (3) generating an input representation for the time step; (4) processing the input representation using a node placement neural network having a plurality of network parameters, wherein the node placement neural network is configured to process the input representation in accordance with current values of the network parameters to generate a score distribution over a plurality of positions on the surface of the computer chip; and (5) assigning the macro node to be placed at the time step to a position from the plurality of positions using the score distribution. Goldie'445 further discloses in ¶¶ [0005]-[0011] that (1) generate a chip placement for an integrated circuit; (2) the integrated circuit for which the chip placement is being generated will be referred to in this specification as a "computer chip" but should generally be understood to mean any collection of electronic circuits that are fabricated on one piece of semiconductor material; (3) the chip placement places each node from a netlist of nodes at a respective location on the surface of the computer chip; (4) floor planning, which involves placing the components of a chip on the surface of the chip, is a crucial step in the chip design process; (5) the placement of the components should optimize metrics such as area, total wire length and congestion; (6) if a floorplan does not perform well on these metrics, the integrated circuit chip that is generated based on the floor plan will perform poorly; e.g., the integrated circuit chip could fail to function, could consume an excessive amount of power, could have an unacceptable latency, or have any of a variety of other undesirable properties that are caused by sub-optimal placement 
of components on the chip; (7) allow for a high-quality chip floorplan to be generated automatically and with minimal user involvement by making use of the described node placement neural network and the described training techniques; (8) as a particular example, when distributed training is employed, a high-quality (i.e., a superhuman) placement can be generated in on the order of hours without any human expert involvement; (9) by effectively making use of reinforcement learning to train the described node placement neural network, however, the described techniques are able to quickly generate a high-quality floorplan; (10) an integrated circuit chip which is produced using the method may have reduced power consumption compared to one produced by a conventional method; (11) when the encoder neural network is trained through supervised learning and the policy neural network is trained through reinforcement learning, can generalize quickly to new netlists and new integrated circuit chip dimensions; and (12) this greatly reduces the amount of computational resources that are required to generate placements for new netlists, because little to no computationally expensive fine-tuning is required to generate a high-quality floorplan for a new netlist. Goldie'445 also discloses in ¶¶ [0018]-[0055] with FIG. 
1 that (1) the system 100 receives netlist data 102 for a computer chip, i.e., a very large-scale integration (VLSI) chip, that is to be manufactured and that includes a plurality of integrated circuit components, e.g., transistors, resistors, capacitors, and so on; (2) the netlist data 102 specifies a connectivity on the computer chip among a plurality of nodes that each correspond to one or more of a plurality of integrated circuit components of the computer chip; (3) in other words, the netlist data 102 identifies, for each of the plurality of nodes, which other nodes (if any) the node needs to be connected to by one or more wires in the manufactured computer chip; (4) the system 100 generates, as output, a final computer chip placement 152 that places some or all of the nodes in the netlist data 102 at a respective position on the surface of the computer chip; i.e., the final computer chip placement 152 identifies a respective position on the surface of the computer chip for some or all of the nodes in the netlist data 102 and, therefore, for the integrated circuit components that are represented by the node; (5) the netlist data 102 can identify two types of nodes: nodes that represent macro components and nodes that represent standard cell components; (6) macro components are large blocks of IC components, e.g., static random-access memory (SRAM) or other memory blocks, that are represented as a single node in the netlist; (7) standard cell components are a group of transistor and interconnect structures, e.g., a group that provides a Boolean logic function (e.g., AND, OR, XOR, XNOR, inverters) or a group that provides a storage function (e.g., flipflop or latch); (8) the placement 152 assigns each node to a grid square in an N x M grid overlaid over the surface of the chip, where N and M are integers; (9) the system 100 can process an input derived from the netlist data, data characterizing the surface of the integrated circuit chip, or both using a grid 
generation machine learning model that is configured to process the input to generate an output that defines how to divide the surface of the integrated circuit chip into the N x M grid; (10) the system 100 includes a node placement neural network 110 and a graph placement engine 130; (11) the system 100 uses the node placement neural network 110 to generate a macro node placement 122; (12) the macro node placement 122 places each macro node, i.e., each node representing a macro, in the netlist data 102 at a respective position on the surface of the computer chip; (13) the system 100 generates the macro node placement node-by-node over a number of time steps, with each macro node being placed at a location at a different one of the time steps, according to a macro node order; (14) at each particular time step in the sequence, the system 100 generates an input representation for the particular time step and processes the input representation using the node placement neural network 110; (15) the input representation for a particular time step generally characterizes at least (i) respective positions on the surface of the chip of any macro nodes that are before a particular macro node to be placed at the particular time step in the macro node order and (ii) the particular macro node to be placed at the particular time step; (16) the input representation can also optionally include data that characterizes the connectivity between the nodes that is specified in the netlist data 102; (17) the node placement neural network 110 is a neural network that has parameters (referred to in this specification as "network parameters") and that is configured to process the input representation in accordance with current values of the network parameters to generate a score distribution, e.g., a probability distribution or a distribution of logits, over a plurality of positions on the surface of the computer chip; (18) the system 100 then assigns the macro node to be placed at the 
particular time step to a position from the plurality of positions using the score distribution generated by the neural network; (19) once the system 100 has generated the macro node placement 122, the graph placement engine 130 generates an initial computer chip placement 132 by placing each of the standard cells at a respective position on the surface of a partially placed integrated circuit chip that includes the macro components represented by the macro nodes placed according to the macro node placement, i.e., placed as in the macro node placement 122; (20) the engine 130 can cluster the standard cells using a partitioning technique that is based on the normalized minimum cut objective; (21) the engine 130 can directly place each standard cell at a respective position on the surface of the partially placed integrated circuit chip using the graph placement technique without clustering the standard cells; (22) the engine 130 can use a force based technique, i.e., a force directed technique, and in particular, when using a force based technique, the engine 130 represents the netlist as a system of springs that apply force to each node, according to the weight x distance formula, causing tightly connected nodes to be attracted to one another; (23) optionally, the engine 130 also introduces a repulsive force between overlapping nodes to reduce placement density; (24) to reduce oscillations, the engine 130 can set a maximum distance for each move; (25) the system 100 provides the initial placement 132 as input to a legalization engine 150 that adjusts the initial placement 132 to generate the final placement 152; (26) the engine 150 can perform a greedy legalization step to snap macros onto the nearest legal position while honoring the minimum spacing constraints; (27) optionally, the engine 150 can further refine the legalized placement or can refine the initial placement 132 directly without generating the legalized placement, e.g., by performing simulated 
annealing on a reward function; e.g., the engine 150 can perform simulated annealing by applying a hill climbing algorithm to iteratively adjust the placements in the legalized placement or the initial placement 132 to generate the final computer chip placement 152; (28) the system 100 or an external system can then fabricate (produce) a chip (integrated circuit) according to the final placement 152, and such an integrated circuit may exhibit improved performance, e.g., have one or more of lower power consumption, lower latency, or smaller surface area, than one designed using a conventional design process, and/or be producible using fewer resources; (29) the system 100 can receive the netlist data 102 as an upload from a remote user of the system over a data communication network, e.g., using an application programming interface (API); (30) the system 100 can then provide the final placement 152 to the remote user through the API for use in fabricating a chip according to the final placement 152; (31) the system 100 can be part of an electronic design automation (EDA) software tool and can receive the netlist data 102 from a user of the tool or from another component of the tool; and (32) the system 100 can provide the final placement 152 for evaluation by another component of the EDA software tool before the computer chip is fabricated.
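For illustration only, the force-directed technique summarized above from Goldie'445 (springs applying a weight x distance force, with a maximum distance for each move) can be sketched as one update step. This is a simplified sketch under stated assumptions, not the reference's implementation: the function name force_directed_step, the step size, and the dictionary-based data layout are hypothetical, and the optional repulsive force between overlapping nodes is omitted.

```python
import math


def force_directed_step(positions, edges, fixed, step=0.1, max_move=1.0):
    """Move each non-fixed node along the net spring force acting on it.

    positions: {node: (x, y)}
    edges:     list of (node_a, node_b, weight); spring force = weight x distance
    fixed:     set of nodes (e.g., already placed macros) that must not move
    max_move:  cap on the distance a node may travel in one step
    """
    forces = {n: [0.0, 0.0] for n in positions}
    for a, b, w in edges:
        dx = positions[b][0] - positions[a][0]
        dy = positions[b][1] - positions[a][1]
        # Each spring pulls its two endpoints toward one another.
        forces[a][0] += w * dx
        forces[a][1] += w * dy
        forces[b][0] -= w * dx
        forces[b][1] -= w * dy

    new_positions = {}
    for n, (x, y) in positions.items():
        if n in fixed:
            new_positions[n] = (x, y)
            continue
        mx, my = step * forces[n][0], step * forces[n][1]
        dist = math.hypot(mx, my)
        if dist > max_move:  # clamp the move to reduce oscillations
            mx, my = mx * max_move / dist, my * max_move / dist
        new_positions[n] = (x + mx, y + my)
    return new_positions
```

Repeating this step moves tightly connected standard cells toward the fixed macros they connect to; a real placer would also add the repulsive term and a legalization pass.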
Cheng et al. ("On Joint Learning for Solving Placement and Routing in Chip Design", ARXIV ID: 2111.00234, Oct. 30, 2021, pp. 1-12) discloses in ABSTRACT that (1) machine learning has been an emerging tool for solving the placement and routing problems, as two critical steps in modern chip design flow; (2) to achieve end-to-end placement learning, first propose a joint learning method termed by DeepPlace for the placement of macros and standard cells, by the integration of reinforcement learning with a gradient based optimization scheme; (3) to further bridge the placement with the subsequent routing task, also develop a joint learning approach via reinforcement learning to fulfill both macro placement and routing, which is called DeepPR; (4) one key design in our (reinforcement) learning paradigm involves a multi-view embedding model to encode both global graph level and local node level information of the input macros; and (5) moreover, the random network distillation is devised to encourage exploration. Cheng further discloses in Section 1 with FIG. 1 that (1) placement is one of the most crucial but time-consuming steps of the chip design process. It maps the components of a netlist including macros and standard cells to locations on the chip layout, where standard cells are basic logic cells e.g. logic gates and macros are functional blocks e.g. 
SRAMs; (2) a good placement leads to better chip area utilization, timing performance and routability; (3) based on the placement assignment, routing assigns wires to connect the components, which is strongly coupled with placement task; (4) in addition, the placement solution also serves as a rough estimation of wirelength and congestion, which is valuable in guiding the earlier stages of design flow; (5) the objective of placement is to minimize metrics of power, performance, and area (PPA) without violating the constraints such as placement density and routing congestion; (6) provide a pipeline to the placement problem: a netlist is represented by hypergraph H = (V;E), where V denotes set of nodes (cells) and E denotes set of hyperedges (nets) that indicates the connectivity between circuit components; (7) macro placement firstly determines the locations of macros on the chip canvas, followed by immense numbers of standard cells adjust their position based on adjacent macros and finally obtains the full placement solution, as shown in Fig. 1(a); (8) the routing problem, however, takes placement solution as input and tries to connect those electronic components in a circuit coarsely; (9) without violating constraints on edges between neighboring routing tiles, the target of routing is to minimize the total wirelength, as shown in Fig. 
1(b); (10) for learning based placement, propose an end-to-end approach DeepPlace for both macros and standard cells, whereby the two kinds of components are sequentially arranged by reinforcement learning and neural network formed gradient optimization, respectively; (11) propose DeepPR to jointly solve placement and routing via (reinforcement) learning; (12) to adapt reinforcement learning more effectively into our pipeline, design a novel policy network that introduces both CNN and GNN to provide two views to the placement input, in contrast to previous works that use CNN [3] or GNN [1] alone to obtain the embedding; (13) the hope is that both global embedding and node level embedding information can be synthetically explored; and (14) further adopt the random network distillation to encourage exploration in reinforcement learning. Cheng also discloses in Section 3 with FIGS. 2-3 that (1) target the problem of macro placement, whose objective is to determine locations of macros on the chip canvas with no overlap and wirelength minimized; (2) RL agent sequentially maps the macros to valid positions on the layout; (3) once all macros have been placed, either fix their positions and adopt gradient-based placement optimization to obtain a complete placement solution with corresponding evaluation metrics such as wirelength and congestion, as shown in Fig. 2(a); (4) alternatively, develop another RL agent to route the placement solution and regard the exact total wirelength as rewards for both placement and routing task, which is shown in Fig. 
2(b); (5) the key elements of the Markov Decision Processes (MDPs) are defined as follows: (a) State st: the state representation consists of two parts, global image I portrayed the layout and netlist graph H which contains detailed position of all macros that have been placed; (b) Action at: the action space contains available positions in the n × n canvas at time t, where n denotes the size of grid, and once a spare position (x, y) is selected by the current macro, set Ixy = 1 and remove this position from the available list; (c) Reward rt: the reward at the end of episode is a negative weighted sum of wirelength and routing congestion from the final solution, wherein the weight is a trade-off between main objective wirelength and routing congestion which indicates the routability for routing task, and different from other deep RL placers that set the reward to 0 for all previous actions, adopt random network distillation (RND) inspired from [27] to calculate intrinsic rewards at each time step; (6) as the episode goes, the policy network learns to maximize the expected reward from placing prior chips and improves the quality of placement over time; (7) use Proximal Policy Optimization (PPO) to update the policy network; (8) to determine the position of macros (pieces) in a sequential manner, we model the current state as an image I of size n × n, wherein Ixy = 1 when previous macro is placed on position (x, y); (9) this image representation gives an overview of the partial placement with loss of some detailed information; (10) further obtain the global embedding from convolutional neural network (CNN); (11) moreover, the netlist graph as critical input information implies the rule of reward calculation, which is detailed guidance on the action prediction; (12) develop a graph neural network (GNN) architecture that produces detailed node embedding for current macro in consideration; (13) the role of graph neural 
networks is to explore the physical meaning of netlist and distill information about connectivity of nodes into low-dimensional vector representations which is utilized in following calculations; (14) after obtain both global embedding from CNN and detailed node embedding from GNN, fuse them by concatenation and pass the result to a fully-connected layer to generate a probability distribution over actions; (15) the multi-view embedding model is able to synthetically explore global and node level information, and whole structure of policy network is shown in Fig. 3(a); (16) define cost functions with both wirelength and congestion, trying to optimize the performance and routability simultaneously; (17) employ half-perimeter wirelength (HPWL) as the approximation for wirelength; (18) adopt Rectangular Uniform wire Density (RUDY) [29] to approximate the routing congestion, as HPWL is an intermediate result during the calculation process; (19) inspired by the idea of random network distillation (RND) [27], give an intrinsic bonus in each time step to encourage exploration as shown in Fig. 
3(b); (20) there are two networks involved in RND: a fixed and randomly initialized target network and a predictor network trained on global images I collected by the agent; (21) the predictor network is trained by SGD to minimize this expected MSE which distills randomly initialized network into a trained one; (22) this distillation error could be seen as a quantification of prediction uncertainty; (23) to ensure the runtime of each iteration affordable for training, apply state-of-the-art gradient based optimization placer DREAMPlace [2] to arrange standard cells in the reward calculation step; (24) on the one hand, the position of large macros as fixed instances will influence the solution quality of gradient based optimization placer, which can improve over time through training; (25) on the other hand, better approximation to the metrics such as wirelength leads to a better guidance for training the agent; (26) as a result, the combination of RL agent with gradient based optimization placer will mutually enhance each other; (27) the state-of-the-art tool DREAMPlace implements key kernels in analytical placement, e.g., wirelength and density computation with deep learning toolkit, which fully explores the potential of GPU acceleration and reduces the runtime in less than a minute; (28) routability is one of the most critical factors to consider during placement, hence routing congestion is a necessary component in the objective (reward) function in most previous methods; (29) jointly learn placement and routing task, both of which try to minimize the wirelength in practice; (30) adopt another RL agent that predicts the routing direction after decomposing the netlist obtained from the placement task to pin-to-pin routing problems; (31) the overall wirelength is then used as episodic reward for both placement and routing agents to update parameters of two policy networks respectively; and (32) the advantages of this joint learning paradigm are twofold: (a) on one 
hand, placement solution provides abundant training data for the routing agent, instead of randomly generated data used in previous work which lacks of modeling the distribution of real domain data; (b) on the other hand, routing provides a direct objective for the placement agent to optimize, hence relieving the need of intermediate cost models and reducing bias in the reward signal.
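For illustration only, the half-perimeter wirelength (HPWL) approximation and the end-of-episode reward summarized above from Cheng (a negative weighted sum of wirelength and routing congestion) can be sketched as follows. The function names, the data layout, and the weight value are hypothetical assumptions; the congestion term is passed in as a precomputed number rather than derived with RUDY.

```python
def hpwl(pins):
    """Half-perimeter wirelength of one net: half the perimeter of its pin bounding box."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets, positions):
    """Sum HPWL over all nets; each net is a list of node names."""
    return sum(hpwl([positions[n] for n in net]) for net in nets)


def episode_reward(nets, positions, congestion, weight=0.5):
    """Negative weighted sum of wirelength and congestion.

    congestion: precomputed congestion estimate (assumed given, not computed here).
    weight:     hypothetical trade-off between wirelength and congestion.
    """
    return -(total_hpwl(nets, positions) + weight * congestion)


# Usage: three placed nodes connected by two nets.
positions = {"a": (0, 0), "b": (3, 4), "c": (1, 1)}
nets = [["a", "b"], ["a", "b", "c"]]
reward = episode_reward(nets, positions, congestion=2.0)
```

A shorter total wirelength or lower congestion yields a reward closer to zero, which is the direction the policy update favors.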
Mirhoseini et al. ("Chip Placement with Deep Reinforcement Learning", ARXIV ID: 2004.10746, April 22, 2020) discloses in ABSTRACT that (1) present a learning-based approach to chip placement, one of the most complex and time-consuming stages of the chip design process; (2) unlike prior methods, this approach has the ability to learn from past experience and improve over time; (3) in particular, as train over a greater number of chip blocks, this method becomes better at rapidly generating optimized placements for previously unseen chip blocks; (4) to achieve these results, pose placement as a Reinforcement Learning (RL) problem and train an agent to place the nodes of a chip netlist onto a chip canvas; (5) to enable our RL policy to generalize to unseen blocks, ground representation learning in the supervised task of predicting placement quality; (6) by designing a neural architecture that can accurately predict reward across a wide variety of netlists and their placements, capable to generate rich feature embeddings of the input netlists; (7) then use this architecture as the encoder of our policy and value networks to enable transfer learning; (8) objective is to minimize PPA (power, performance, and area), and show that, in under 6 hours, this method can generate placements that are superhuman or comparable on modern accelerator netlists, whereas existing baselines require human experts in the loop and take several weeks. 
Mirhoseini further discloses in Section 1 that (1) present a learning-based approach to chip placement, one of the most complex and time-consuming stages of the chip design process; (2) the objective is to place a netlist graph of macros (e.g., SRAMs) and standard cells (logic gates, such as NAND, NOR, and XOR) onto a chip canvas, such that power, performance, and area (PPA) are optimized, while adhering to constraints on placement density and routing congestion; (3) pose chip placement as a Reinforcement Learning (RL) problem, where we train an agent (e.g., RL policy network) to optimize the placements; (4) in each iteration of training, all of the macros of the chip block are sequentially placed by the RL agent, after which the standard cells are placed by a force-directed method; and (5) training is guided by a fast-but-approximate reward signal for each of the agent’s chip placements. Mirhoseini further discloses in Section 3 with FIG. 1 that (1) target the chip placement optimization problem, in which the objective is to map the nodes of a netlist (the graph describing the chip) onto a chip canvas (a bounded 2D space), such that final power, performance, and area (PPA) is optimized; (2) take a deep reinforcement learning approach to the placement problem, where an RL agent (policy network) sequentially places the macros; once all macros are placed, a force-directed method is used to produce a rough placement of the standard cells, as shown in Figure 1; (3) RL problems can be formulated as Markov Decision Processes (MDPs), consisting of four key elements: (a) states: the set of possible states of the world (e.g., every possible partial placement of the netlist onto the chip canvas); (b) actions: the set of actions that can be taken by the agent (e.g., given the current macro to place, the available actions are the set of all the locations in the discrete canvas space (grid cells) onto which that macro can be placed without violating any hard constraints on 
density or blockages); (c) state transition: given a state and an action, this is the probability distribution over next states; (d) reward: the reward for taking an action in a state (e.g., the reward is 0 for all actions except the last action where the reward is a negative weighted sum of proxy wirelength and congestion, subject to density constraints); (4) at the initial state, s0, have an empty chip canvas and an unplaced netlist; (5) the final state sT corresponds to a completely placed netlist; (6) at each step, one macro is placed, and thus, T is equal to the total number of macros in the netlist; (7) at each time step t, the agent begins in state (st), takes an action (at), arrives at a new state (st+1), and receives a reward (rt) from the environment (0 for t < T and negative proxy cost for t = T); (8) define st to be a concatenation of features representing the state at time t, including a graph embedding of the netlist (including both placed and unplaced nodes), a node embedding of the current macro to place, metadata about the netlist, and a mask representing the feasibility of placing the current node onto each cell of the grid; (9) the action space is all valid placements of the tth macro, which is a function of the density mask; (10) action at is the cell placement of the tth macro that was chosen by the RL policy network; (11) st+1 is the next state, which includes an updated representation containing information about the newly placed macro, an updated density mask, and an embedding for the next node to be placed; (12) rt is 0 for every time step except for the final rT, where it is a weighted sum of approximate wirelength and congestion; (13) through repeated episodes (sequences of states, actions, and rewards), the policy network learns to take actions that will maximize cumulative reward; (14) goal in this work is to minimize power, performance and area, subject to constraints on routing congestion and density; (15) true reward is the output 
of a commercial EDA tool, including wirelength, routing congestion, density, power, timing, and area; (16) define approximate cost functions for both wirelength and congestion; (17) to combine multiple objectives into a single reward function, take the weighted sum of proxy wirelength and congestion where the weight can be used to explore the trade-off between the two metrics; (18) while treat congestion as a soft constraint (i.e., lower congestion improves the reward function), treat density as a hard constraint, masking out actions (grid cells to place nodes onto) whose density exceeds the target density; (19) apply several approximations to the calculation of the reward function: (a) group millions of standard cells into a few thousand clusters using hMETIS; (b) discretize the grid to a few thousand grid cells and place the center of macros and standard cell clusters onto the center of the grid cells; (c) when calculating wirelength, make the simplifying assumption that all wires leaving a standard cell cluster originate at the center of the cluster; and (d) to calculate routing congestion cost, only consider the average congestion of the top 10% most congested grid cells; (20) employ half-perimeter wirelength (HPWL), the most commonly used approximation for wirelength; (21) limit the maximum number of rows and columns to 128, and treat choosing the optimal number of rows and columns as a bin-packing problem and rank different combinations of rows and columns by the amount of wasted space they incur; (22) to select the order in which the macros are placed, sort macros by descending size and break ties using a topological sort; (23) by placing larger macros first, reduce the chance of there being no feasible placement for a later macro; (24) to place standard cell clusters, use an approach similar to classic force-directed methods; (25) calculating proxy congestion, using a simple deterministic routing based on the locations of the driver and loads on the net; 
(26) treat density as a hard constraint, disallowing the policy network from placing macros in locations which would cause density to exceed the target (maxdensity) or which would result in infeasible macro overlap; (27) enable blockage-aware placements (such as clock straps) by setting the density function of the blocked areas to 1; (28) for policy optimization purposes, convert the canvas into an m × n grid; (29) thus, for any given state, the action space (or the output of the policy network) is the probability distribution of placements of the current macro over the m × n grid; and (30) state contains information about the netlist graph (adjacency matrix), its node features (width, height, type, etc.), edge features (number of connections), current node (macro) to be placed, and metadata of the netlist and the underlying technology (e.g., routing allocations, total number of wires, macros, and standard cell clusters, etc.). Mirhoseini further discloses in Section 4 with FIG. 
2 that (1) goal is to develop RL agents that can generate higher quality results as they gain experience placing chips; (2) formally define the placement objective function as shown in Equation (3), where (a) J(θ, G) is the cost function; (b) the agent is parameterized by θ; (c) the dataset of netlist graphs of size K is denoted by G with each individual netlist in the dataset written as g; and (d) Rp,g is the episode reward of a placement p drawn from the policy network applied to netlist g; (4) Equation 4 shows the reward used for policy network optimization, which is the negative weighted average of wirelength and congestion, subject to density constraints; (5) propose a novel neural architecture that enables us to train domain-adaptive policies for chip placement; (6) first focused on learning rich representations of the state space; (7) intuition was that a policy network architecture capable of transferring placement optimization across chips should also be able to encode the state associated with a new unseen chip into a meaningful signal at inference time; (8) to train a supervised model that can accurately predict wirelength and congestion labels and generalize to unseen data, develop a novel graph neural network architecture that embeds information about the netlist; (9) create a vector representation of each node by concatenating the node features; (10) then repeatedly perform the following updates: (a) each edge updates its representation by applying a fully connected network to an aggregated representation of intermediate node embeddings, and (b) each node updates its representation by taking the mean of adjacent edge embeddings; (11) supervised model consists of: (a) The graph neural network described above that embeds information about node types and the netlist adjacency matrix; (b) a fully connected feedforward network that embeds the metadata, including information about the underlying semiconductor technology (horizontal and vertical routing 
capacity), the total number of nets (edges), macros, and standard cell clusters, canvas size and number of rows and columns in the grid; (c) a fully connected feedforward network (the prediction layer) whose input is a concatenation of the netlist graph and metadata embedding and whose output is the reward prediction; (12) Figure 2 depicts an overview of the policy network (modeled by πθ in Equation 3) and the value network architecture developed for chip placement; (13) the netlist graph is passed through proposed graph neural network architecture to generate embeddings of (a) the partially placed graph and (b) the current node; and (c) use a simple feedforward network to embed the metadata; and (14) to optimize the parameters of the policy network, use Proximal Policy Optimization (PPO) with a clipped objective.
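For illustration only, the proxy reward summarized above from Mirhoseini (half-perimeter wirelength, average congestion of the top 10% most congested grid cells, a negative weighted sum of the two, and density treated as a hard constraint through action masking) may be sketched as follows. This is a minimal sketch under stated assumptions, not the reference's implementation: the function names, the argument layout, and the `congestion_weight` parameter are hypothetical and do not appear in the reference.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def hpwl(net_pins: Sequence[Point]) -> float:
    """Half-perimeter wirelength of one net: width plus height of the pins' bounding box."""
    xs = [x for x, _ in net_pins]
    ys = [y for _, y in net_pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def top_fraction_mean(values: Sequence[float], fraction: float = 0.10) -> float:
    """Mean of the largest `fraction` of the values (always at least one value)."""
    k = max(1, int(len(values) * fraction))
    return sum(sorted(values, reverse=True)[:k]) / k


def proxy_reward(nets: Sequence[Sequence[Point]],
                 cell_congestion: Sequence[float],
                 congestion_weight: float) -> float:
    """Negative weighted sum of proxy wirelength and proxy congestion.

    `congestion_weight` is a hypothetical name for the trade-off weight.
    """
    wirelength = sum(hpwl(pins) for pins in nets)
    congestion = top_fraction_mean(cell_congestion, 0.10)
    return -(wirelength + congestion_weight * congestion)


def feasible_mask(density_grid: Sequence[Sequence[float]],
                  max_density: float) -> List[List[bool]]:
    """True where a grid cell may still receive a node; cells above the target are masked out."""
    return [[d <= max_density for d in row] for row in density_grid]
```

In this sketch the reward would be evaluated only once per episode, after the last macro is placed, consistent with the summary that the reward is 0 for every earlier time step.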
Somayaji et al. ("Prioritized Reinforcement Learning for Analog Circuit Optimization With Design Knowledge", 2021 58th ACM/IEEE Design Automation Conference (DAC), Dec. 5-9, 2021, pp. 1231-1236) discloses in Abstract that (1) analog circuit design and optimization manifests as a critical phase in IC design, which still heavily relies on extensive and time-consuming manual designing by experienced experts; (2) in recent years, the development of reinforcement learning (RL) algorithms draws attention with related techniques being introduced into the analog design field for circuit optimization; (3) however, for robust and efficient analog circuit design, a smart and rapid search for high-quality design points is more desired than finding a globally optimal agent as in traditional RL applications, which was a point not fully considered in some previous works; (4) in this work, propose three techniques within the RL framework aiming at fast high-quality design point search in a data efficient manner: (i) incorporate design knowledge from experienced designers into the critic network design to achieve a better reward evaluation with less data; (ii) guide the RL training with non-uniform sampling techniques prioritizing exploitation over high quality designs and exploration for poorly-trained space; and (iii) leverage the trained critic network and limited additional circuit simulation for smart and efficient sampling to get high-quality design points. 
Somayaji further discloses in Section I that (1) treat automatic analog sizing as a black-box function optimization problem and explore the Bayesian optimization (BO) techniques for efficient search of the sizing solutions; (2) place the analog optimization problem inside the reinforcement learning (RL) context where they treat the circuit simulator as an environment to interact with, and train a specific RL agent using RL algorithms like deep deterministic policy gradient (DDPG) for efficient circuit design; (3) in order to achieve efficient and high-quality analog design, the consideration for RL algorithm design is significantly different from traditional application; (4) for traditional applications, in order to deal with different situations under an unknown environment, the RL agent should be globally well-trained for different scenarios and make no specific assumption about the environment; (5) while, in analog circuit design, to find high quality designs using less computational resources, the RL agent should be guided to focus only on certain regions of the design space for achieving good design points implying that a globally-optimal agent, which requires extensive training, is unnecessary; (6) in addition, with designers’ experience, design knowledge of the environment (circuit simulator) can be incorporated into RL agent for a data-efficient circuit optimization process; (7) propose a prioritized reinforcement learning framework for automatic analog circuit design incorporating design knowledge; (8) propose a non-uniform probabilistic sampling strategy for RL agent training to guide the training process towards potentially high-quality design areas, achieving fast convergence to optimized design solutions while significantly reducing the amount of expensive simulation data needed; (9) in addition, also pave way for embedding design knowledge of expert human designers into the critic network design to obtain an accurate model with less training data, 
also leading to efficient circuit optimization; and (10) finally, develop a trajectory guided local exploration method to efficiently search for good circuit designs along the samples collected by the RL agent. Somayaji also discloses in Section II that (1) given a D-dimensional parameter point x ∈ Ω, get the circuit performance f(x) of the analog circuit using the circuit simulator, where x represents the circuit design parameters such as resistance, capacitance or transistor sizing parameters; (2) the circuit performance f(x) can be a single metric or a combined figure of merit (FOM); (3) therefore, the problem of finding high-quality design points to achieve good circuit performance, can be formulated as Equation (1); (4) any constraints can be easily embedded into the design of f(x), and typically, f(x) is a highly-complex nonlinear black-box function requiring a high simulation cost which makes it challenging to be optimized; (5) in general, reinforcement learning trains an agent interacting with an environment to achieve high reward; (6) at a given time point t, the agent acts over the environment using some action at observing the next state st+1 and reward rt from the updated environment; (6) based on the collected transitions (st, at, rt, st+1), the RL agent is trained to achieve the maximum sum of future discounted rewards with a discounting factor γ ∈ [0,1]; (7) for analog circuit, the environment is the circuit simulator, and define the corresponding state, action, and reward as follows: (a) state: the design parameters x in the current circuit design are used as the state for RL; (b) action: the action is defined as the incremental change in the design parameters given the current state; and (c) reward: correspondingly, the reward at a time t is defined to be the incremental change in circuit performance, where both states and actions are defined under continuous spaces; (8) the RL agent is encouraged to find positive 
rewards suggesting performance boosts; (9) the sum of future discounted reward Rt is directly related to the circuit performance as long as γ is close to 1; and (10) the particular RL agent employed for this work is deep deterministic policy gradients (DDPG). Somayaji further teaches in Section III that (1) the DDPG algorithm uses an experience replay buffer serving as a repository of all state, action, reward, next state pairs encountered by the RL agent; (2) non-uniform sampling of the replay buffer can be utilized to prioritize over select regions of the design space; (3) one particular indicator of prioritization is the temporal-difference error (TD error); (4) prioritizing over the transition with a high TD error is beneficial as policy learning can be accelerated towards regions which are not well-trained; (5) a small TD error implies that the current critic network predicts the future rewards well according to the policy at the time it was populated at; (5) however, a large TD error suggests that the future discounted reward evaluation is not accurate enough. 
Thus prioritizing over these transitions in the replay buffer automatically redirects learning over states for which an optimal policy is not learnt; (6) propose a probabilistic approach to switch between two different modes for replay buffer sampling; (7) the probabilistic sampling is beneficial to the RL process as it strikes a balance between random sampling and prioritized sampling; (8) propose a novel non-uniform prioritization metric to enable accelerated design point search for the circuit optimization problem; (8) make use of the critic evaluation to guide the RL learning process by sampling transitions in the replay buffer which help improve the RL performance to obtain high-quality designs; (9) the weighting parameter w is tuned in an adaptive manner during the RL process with a sampling budget of T; (10) the proposed adapting weighting parameter ensures that in the initial phase exploration over poorly-trained regions is favored, as the priority is weighted more towards the absolute TD error value; (11) however, towards the end of the simulation, intend to guide the RL process towards high quality design points and gain better confidence in the critic network evaluation; (12) therefore, transitions with higher critic evaluation are favored for further exploitation over high-quality design areas; and (13) the weight adapted non-uniform sampling confers potential boosts and speed-ups compared to the traditional RL agent. Somayaji also teaches in Section IV with FIG. 
2 that (1) propose to incorporate such human expert based design knowledge into the RL framework for fast design process; (2) for analog circuit design, the mapping from sizing (states) to the final performance value (FOM) is usually highly complex, which involves numerous nonlinear circuit behaviors and complex secondary device characteristics; (3) however with the help of design knowledge, the circuit performance can be better evaluated than just using the sizing choice; (4) therefore, the RL agent’s performance can be further boosted by embedding the design knowledge into the architecture of the critic network (performance predictor); (5) the newly designed critic network mainly consists of two stages as shown in Fig. 2: (a) the first stage is to embed the design knowledge chosen by the circuit designer so as to extract important design parameters and is denoted as DK (s) derived from the RL state s; and (b) the second stage is an augmented critic network QDK (s, s0, a|θQ) with additional inputs from the design knowledge; (6) combining the two stages together, the new critic network can be represented as Equation (11) which can be easily incorporated into the original RL algorithm flow; and (7) the extra internal design parameter input to the augmented critic network QDK, incorporates the circuit designers’ experience and insights, leading to a better performance evaluation with less training samples required.
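For illustration only, the replay-buffer sampling summarized above from Somayaji (a probabilistic switch between random and prioritized sampling, with a priority that shifts from the absolute TD error early in the sampling budget toward the critic evaluation late in the budget) may be sketched as follows. This is a minimal sketch under stated assumptions and is not the reference's method: the linear schedule for the weight w, the min-max normalization, the convex combination used as the priority, and every function and parameter name are hypothetical choices made here for illustration.

```python
import random
from typing import List, Optional, Sequence


def _min_max(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1]; a constant sequence maps to all ones."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def adaptive_weight(step: int, budget: int) -> float:
    """Assumed linear schedule: 1.0 at step 0, falling to 0.0 at the sampling budget."""
    return max(0.0, 1.0 - step / budget)


def priorities(td_errors: Sequence[float],
               critic_values: Sequence[float],
               w: float) -> List[float]:
    """Assumed priority: w * |TD error| + (1 - w) * critic value, both normalized."""
    td = _min_max([abs(e) for e in td_errors])
    q = _min_max(critic_values)
    # The small constant keeps every transition selectable.
    return [w * a + (1.0 - w) * b + 1e-6 for a, b in zip(td, q)]


def sample_indices(td_errors: Sequence[float],
                   critic_values: Sequence[float],
                   step: int,
                   budget: int,
                   batch_size: int,
                   p_prioritized: float = 0.5,
                   rng: Optional[random.Random] = None) -> List[int]:
    """Pick replay-buffer indices, switching at random between the two sampling modes."""
    rng = rng or random.Random()
    n = len(td_errors)
    if rng.random() < p_prioritized:
        weights = priorities(td_errors, critic_values, adaptive_weight(step, budget))
        return rng.choices(range(n), weights=weights, k=batch_size)
    return [rng.randrange(n) for _ in range(batch_size)]
```

With w near 1 the sketch favors transitions with a large absolute TD error (poorly trained regions), and with w near 0 it favors transitions the critic rates highly, which mirrors the exploration-then-exploitation behavior described in the summary.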
NAGARAJA (US 2018 / 0260498 A1, pub. date: 09/13/2018) discloses in ABSTRACT that (1) designing SoC by using a reinforcement learning processor; (2) an SoC specification input is received and a plurality of domains and a plurality of subdomains is created using application specific instruction set to generate chip specific graph library; (3) an interaction is initiated between the reinforcement learning agent and the reinforcement learning environment using the application specific instructions; (4) each of the SoC sub domains from the plurality of SoC sub domains is mapped to a combination of environment, rewards and actions by a second processor; (5) further, interaction of a plurality of agents is initiated with the reinforcement learning environment for a predefined number of times and further Q value, V value, R value, and A value is updated in the second memory module; and (6) thereby, an optimal chip architecture for designing SoC is acquired using application domain specific instruction set (ASI). 
NAGARAJA further discloses in ¶¶ [0023] and [0062]-[0069] that (1) Single Instruction Multiple Agents (SIMA) type instructions are specifically designed to be implemented simultaneously on a plurality of reinforcement learning agents which in turn are interacting with corresponding reinforcement learning environments; (2) the SIMA type instructions are configured to create a plurality of domains and a plurality of subdomains for artificial intelligence (AI) setup to generate chip specific graph library; (3) the SIMA type instructions are specifically configured to receive either a reinforcement learning agent ID or a reinforcement learning environment ID as the operand, wherein the reinforcement learning agent ID (RL agent ID) corresponds to a reinforcement learning agent, while the reinforcement learning environment ID (RL environment ID) corresponds to a reinforcement learning environment (with which the reinforcement learning agent represented by reinforcement learning agent ID interacts); (4) the SIMA type instructions executed by the reinforcement learning processor trigger an interaction between the reinforcement learning agent and reinforcement learning environment to derive values corresponding to an optimal chip design; (5) the SoC design framework is embedded within the SIMA based processor and configured to adapt and implement reinforcement learning techniques; (6) the SoC design framework is designed to reduce the complexity in the decision making process associated with design of SoC circuits; (7) the SoC design frame work is configured to generalize the multitude of heterogeneous decisions into a plurality of generalized, homogenous decisions, which are in turn utilized to finalize and implement the design for the SoC circuits; (8) the SoC design framework is designed to utilize a combination of artificial intelligence and reinforcement learning principles to arrive at optimal decisions regarding the design and implementation of SoC circuits; (9) the 
SIMA type instructions when executed by the reinforcement processor, trigger a reinforcement learning agent to interact with a corresponding reinforcement learning environment and further enable the reinforcement learning agent to explore the reinforcement learning environment and deduce relevant learnings from the reinforcement learning environment; (10) additionally, SIMA type instructions also provide for the deduced learnings to be iteratively applied onto the reinforcement learning environment to deduce furthermore learnings therefrom; (11) the SIMA type instructions when executed by the reinforcement learning processor, also enable the reinforcement learning agent to exploit the learnings deduced from any previous interactions between the reinforcement learning agent and the reinforcement learning environment; (12) further, the SIMA type instructions also enable the reinforcement learning agent to iteratively exploit the learnings deduced from the previous interactions, in any of the subsequent interactions with the reinforcement learning environment; (13) further, the SIMA type instructions also provide for construction of a Markov Decision Process ( MDP ) and a Semi-Markov Decision Process (SMDP) based on the interaction between the reinforcement learning agent and the corresponding reinforcement learning environment; (14) the SIMA type instructions also enable selective updating of the MDP and SMDP, based on the interactions between the reinforcement learning agent and the corresponding reinforcement learning environment; (15) further, the SIMA type instructions, upon execution by the reinforcement learning processor, read and analyze the ‘learning context' corresponding to the reinforcement learning agent and the reinforcement learning environment; (16) further, the SIMA type instructions determine an optimal Q-value corresponding to a current state of the reinforcement learning agent, and trigger the reinforcement learning agent to perform generalized 
policy iteration, and on-policy and off-policy learning method; (17) further , the SIMA type instructions, upon execution, approximate a state-value function and a reward function for the current state of the reinforcement learning agent; and (18) further, the SIMA type instructions, when executed by the reinforcement learning processor, train at least one of a deep neural network (DNN) and a recurrent neural network (RNN) using a predetermined learning context, and further trigger the deep neural network or the recurrent neural network for approximating at least one of a reward function and state-value function corresponding to the current state of the reinforcement learning agent. NAGARAJA further discloses in ¶¶ [0089]-[0092] with FIG. 1B that (1) receiving a SoC specification input from a first memory module; (2) further, the chip design (of SoC) is initialized by extracting details regarding chip, chip skeleton, clock, Input outputs, partitions that are retrieved from the received SoC specification input and a chip database library; (3) subsequently, a plurality of domains and a plurality of subdomains is created by artificial intelligence (AI) setup in form of Markov Decision Process (MDP), Semi Markov Decision Process (SMDP)s, Hierarchical Abstract Machines (HAM)s and MAX-Q Q using application specific instruction set to generate chip specific graph library (103); (4) the artificial intelligence setup comprises a combination of reinforcement learning (AI) agent, a reinforcement learning environment, and a task; (5) thereafter, an interaction is initiated between the reinforcement learning agent created and the reinforcement learning environment using the application specific instructions; (6) each of the SoC subdomains from the plurality of SoC sub domains is mapped to a combination of environment, rewards and actions by a second processor; (7) mapping is performed with a reinforcement learning environment ID by the second processor; (8) the Q values for 
each domain are generated using the reinforcement learning agent; (9) the AI agent is configured to interact with AI environment through task to extract Q values for each domain; (10) thereafter, the reinforcement learning environment is mapped to respective electronic design automation (EDA) tools associated with the corresponding AI SOC subdomain; (11) sequentially, an interaction of a plurality of agents is initiated with the reinforcement learning environment for a predefined number of times and further Q value, V value, R value, and A value is updated in the second memory module; (12) thereby, an optimal chip architecture for designing SoC is acquired using application domain specific instruction set (ASI); (13) the optimal chip architecture corresponds to a maximum Q value of a top level in a SMDP Q table; (14) an empty environment corresponding to the reinforcement learning environment ID is created; (15) further, an empty agent within the reinforcement learning environment denoted by the reinforcement learning environment ID is created; (16) thereafter, the plurality of reinforcement learning agents is associated to at least one reinforcement learning environment; (17) subsequently, training is initiated on the reinforcement learning agent represented by the reinforcement learning agent ID by using exploration instruction; (18) further, the chip topology is associated to a Markov Decision Process (MDP) with HAM constraints; (19) the chip topology is branched to a plurality of SMDPs or MDPs; (20) the plurality of SMDPs or MDPs are activated; (21) further, the plurality of activated SMDPs or MDPs is terminated on achieving a preset Q-value or a preset objective; (22) the plurality of SMDPs or MDPs are synchronized after termination; (23) a plurality of subdomains at a same level is activated; (24) further, the plurality of activated subdomains is terminated on achieving a preset Q-value or a preset objective; and (25) hereafter, the plurality of subdomains are 
synchronized after termination.
However, the closest art of record, as discussed above, singly or in combination, does not teach or suggest at least the following features "a simulation engine (210) configured to analyze object information comprising a semiconductor element and a standard cell based on design data comprising semiconductor netlist information, configure a customized reinforcement learning environment by adding constraint or position change information with regard to each object through configuration information input from a user terminal (100) and the analyzed object information, perform reinforcement learning based on the customized reinforcement learning environment, perform simulation based on an action determined to optimize disposition of at least one semiconductor element and standard cell, and state information of the customized reinforcement learning environment, and provide reward information calculated based on connection information of semiconductor elements and standard cells according to a simulation result as feedback regarding decision making by a reinforcement learning agent (220); and a reinforcement learning agent (220) configured to perform reinforcement learning based on state information and reward information received from the simulation engine (210), thereby determining an action so as to optimize disposition of semiconductor elements and standard cells, wherein the simulation engine (210) distinguishes semiconductor elements, standard cells, and wires according to characteristics or functions, and distinguishes, based on addition of specific colors, the objects distinguished according to characteristics or functions, thereby preventing learning ranges from increasing during reinforcement learning, and wherein the reinforcement learning agent (220) determines an action, by reflecting distances between semiconductor elements and lengths of wires connecting semiconductor elements and standard cells, through learning using a reinforcement learning algorithm such that 
the semiconductor elements and the standard cells are disposed in optimal positions" or "a) receiving, by a reinforcement learning server (200), design data comprising semiconductor netlist information from a user terminal (100);b) analyzing, by the reinforcement learning server (200), object information comprising a semiconductor element and a standard cell from the received design data, and configuring a customized reinforcement learning environment by adding constraint or position change information with regard to each object through configuration information input from a user terminal (100), based on the analyzed object information; c) performing, by the reinforcement learning server (200), reinforcement learning based on reward information and state information of the customized reinforcement learning environment comprising disposition information of semiconductor elements and standard cells to be used for reinforcement learning through a reinforcement learning agent, thereby determining an action so as to optimize disposition of at least one semiconductor element disposition and stand cell disposition; and d) performing, by the reinforcement learning server (200), simulation constituting a reinforcement learning environment regarding disposition of the semiconductor element and standard cell based on an action, and generating reward information calculated based on connection information of semiconductor elements and standard cells according to a result of performing simulation as feedback regarding decision making by the reinforcement learning agent, wherein the customized reinforcement learning environment configured in step b) distinguishes semiconductor elements, standard cells, and wires according to characteristics or functions so as to prevent learning ranges from increasing during reinforcement learning, and distinguishes, based on addition of specific colors, the objects distinguished according to characteristics or functions, and wherein, in step c), 
the reinforcement learning server (200) determines an action, by reflecting distances between semiconductor elements and lengths of wires connecting semiconductor elements and standard cells, through learning using a reinforcement learning algorithm such that the semiconductor elements and the standard cells are disposed in optimal positions" when combining with all other limitations of the claim as a whole.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HWEI-MIN LU whose telephone number is (313)446-4913. The examiner can normally be reached Mon - Fri: 9:00 AM - 6:00 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mariela D. Reyes, can be reached at (571) 270-1006. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/HWEI-MIN LU/Primary Examiner, Art Unit 2142