Prosecution Insights
Last updated: April 19, 2026
Application No. 18/707,243

CONTROL OF AN INDUSTRIAL ROBOT FOR A GRIPPING TASK

Status: Non-Final OA (§103, §112)
Filed: May 03, 2024
Examiner: GAMMON, MATTHEW CHRISTOPHER
Art Unit: 3657
Tech Center: 3600 (Transportation & Electronic Commerce)
Assignee: Vathos GmbH
OA Round: 1 (Non-Final)
Grant Probability: 65% (Moderate)
Estimated OA Rounds: 1-2
Estimated Time to Grant: 2y 9m
Grant Probability with Interview: 88%

Examiner Intelligence

Career Allow Rate: 65% (66 granted / 102 resolved; +12.7% vs TC avg)
Interview Lift: +23.4% (strong; resolved cases with interview vs without)
Avg Prosecution: 2y 9m (typical timeline)
Currently Pending: 32 applications
Career History: 134 total applications across all art units

Statute-Specific Performance

§101: 7.4% (-32.6% vs TC avg)
§103: 32.4% (-7.6% vs TC avg)
§102: 26.8% (-13.2% vs TC avg)
§112: 31.1% (-8.9% vs TC avg)

Tech Center averages are estimates; based on career data from 102 resolved cases.
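The headline figures above are simple derived statistics, and they can be sanity-checked from the raw counts shown on this page. Note that the Tech Center average is not given directly here; it is inferred below from the stated "+12.7% vs TC avg" delta, so treat it as an estimate:

```python
# Recompute the dashboard's examiner metrics from the raw counts shown above.
granted, resolved = 66, 102
allow_rate = 100 * granted / resolved          # career allow rate, in percent
tc_avg = allow_rate - 12.7                     # implied Tech Center average (estimate)

base_grant_prob = 65.0                         # predicted grant probability for this case
interview_lift = 23.4                          # stated lift from conducting an interview
with_interview = base_grant_prob + interview_lift

print(round(allow_rate, 1))    # 64.7, displayed as 65%
print(round(with_interview))   # 88, matching the "With Interview" figure
```

The 88% "With Interview" figure is thus consistent with simply adding the examiner's +23.4% interview lift to the 65% baseline prediction.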

Office Action (§103, §112)
DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority

Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Claim Objections

Claims 1, 3, 12, 14, and 16-18 are objected to because of the following informalities:

Claim 1 recites “serves to transmitting the image data”. It should read “serves to transmit the image data”.
Claim 3 recites “annotated post-training data are generated”. It should read “annotated post-training data is generated”.
Claim 3 recites “the the 3D model”. It should read “the 3D model”.
Claim 12 recites “in particular CAD model”. It should read “in particular a CAD model”.
Claim 12 alternates the form of the verbs recited in the method steps, at times using verbs ending in “-ing” and at times not. A specific example is “Provisioning” followed immediately by “Transmission” followed immediately again by “Reading”. They should share a verb form, preferably “-ing”.
Claim 14, a method claim, recites “pre-training of a neural network”. It should read “pre-training a neural network” (no “of”).
Claim 16 recites “the objects depicted in the image data”. It should read “objects depicted in the image data” (no “the”).
Claim 17 recites “the g instructions”. It should read “the instructions”.
Claim 18 recites “the specifief”. It should read “the specified”.

Appropriate correction is required.

Claim Rejections - 35 USC § 112(b)

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-21 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

The claims are generally narrative and indefinite, failing to conform with current U.S. practice. They appear to be a literal translation into English from a foreign document and are replete with errors. Due to these errors, the exact metes and bounds of the claims are unclear to the point where the Examiner has only been able to make a best-effort attempt at examining the claims with respect to 35 USC §§ 101, 102, and 103. For example, while allowable subject matter might exist, without knowing the proper scope of the claims it cannot be determined with any certainty. As another example, it is unclear whether actual execution of any command, control signal, or similar is performed such that an attempt is made to actually grasp/grip an object. Such a feature directly relates to the 35 USC § 101 analysis. For the purpose of compact prosecution, such a limitation is interpreted as existing, for example with respect to “gripping the object”. Applicant is respectfully requested to review the following rejections and amend the claims so that they conform with current U.S. practice. The rejections found under 35 USC § 103 are made in light of these interpretations. A non-exhaustive list of issues, foremost with respect to Claim 1, follows.
As all claims depend from Claim 1, Claim 1 is considered representative of the issues present in all claims; all claims at a minimum inherit the issues of Claim 1 and are therefore also rejected. The issues of Claim 1 carry into the dependent claims and appear extensive; rejections are provided with respect to dependent claims only where considered necessary.

First, Claim 1 and many of its dependent claims recite, and are directed to, an apparatus. “Features of an apparatus may be recited either structurally or functionally. In re Schreiber, 128 F.3d 1473, 1478, 44 USPQ2d 1429, 1432 (Fed. Cir. 1997)” (MPEP 2114(I)). Furthermore, per MPEP 2114(II): “[A]pparatus claims cover what a device is, not what a device does.” Hewlett-Packard Co. v. Bausch & Lomb Inc., 909 F.2d 1464, 1469, 15 USPQ2d 1525, 1528 (Fed. Cir. 1990) (emphasis in original). A claim containing a “recitation with respect to the manner in which a claimed apparatus is intended to be employed does not differentiate the claimed apparatus from a prior art apparatus” if the prior art apparatus teaches all the structural limitations of the claim. Ex parte Masham, 2 USPQ2d 1647 (Bd. Pat. App. & Inter. 1987). A positively recited apparatus claim limitation should not be a recitation of how the apparatus is designed or intended to operate, or of an intended or achieved result or effect, particularly where there is no clear and definite structural distinction required by said design or intent.

Second, those claims depending from Claim 1 which recite, and are directed towards, a method or process of using the particular apparatus continue to exhibit many of the issues presented below. For example, certain phrases are unclear as positive recitations regardless of whether the claim is directed towards an apparatus or a method.

(1) It is frequently at least somewhat unclear whether a recitation is a structural or functional limitation. Sometimes a recitation appears to be only a recitation of an intended result.
Sometimes a recitation appears to indicate only a preference rather than a requirement of the claim. Finally, sometimes entire sections of recitations are clearly not structural or functional limitations and are structured in a manner more consistent with a method claim than an apparatus claim.

First, Applicant repeatedly uses the phrases “designed to”, “intended for”, “intended to”, and “serves to”. It is unclear whether these phrasings are intended as functional limitations, especially “intended for/to”, which clearly encompasses an interpretation of mere intent and purpose rather than indicating that a particular structure is required. See, for example, “the central training computer is intended for pre-training and for post-training of the neural network”. It is unclear whether the central training computer must be capable of, or in other words be configured to, perform the above function, or whether it is merely sufficient that there be an intent to use a central training computer in this fashion. These might be clarified into clear recitations of functional limitations by instead reciting “configured to” or similar. In the interest of compact prosecution, all recitations of “designed to” and “serves to” have been interpreted as meaning and reading “configured to” or “is configured to”. In light of the use of other phrasing such as “designed to” within the same claims, and the plain English meaning of the phrase, the phrases “intended for” and “intended to” have been interpreted as non-functional description, and therefore non-limiting.

Next, Applicant recites transitional phrases such as “to”, “for”, “as a result of”, “in order to”, “so that”, “used to”, and “for the purpose of”. The broadest reasonable interpretation of the prepositions “to” and “for” includes meanings indicating a purpose, intention, tendency, or result, while the other phrases indicate similar meanings with less nuance towards other interpretations.
This is regardless of whether the claim is an apparatus or method claim. For example, Claim 1 recites “as a result of the pre-training, pre-training parameters of a pre-trained ANN are transmitted”. First, the phrasing is similar to that of a method claim rather than an apparatus claim; second, and more importantly, it is highly unclear whether this recitation is merely descriptive of what can or may happen rather than being a required function of a particular structure. In other words, it appears to potentially describe only the manner in which the claimed apparatus is intended to be employed. As another example, Claim 1 recites “the neural network is trained for object recognition and position detection … to calculate grasping instructions”. Again, it is unclear whether this recitation merely describes an intended outcome or result of being trained for object recognition and position detection (that is, the manner in which the neural network or apparatus as a whole is intended to be employed), or a particular function and/or structure of the neural network or the greater apparatus.

These might be clarified into clear recitations of functional limitations by reciting “configured to” or similar in some cases (usually those of “for” or “to”), and in other cases (usually those of “as a result of” and “in order to”) by implementing each function as its own clear clause or recitation for a given structure/component, for example by using formatting such as: “at least one local processing unit configured to: store different instances …; receive pre-training parameters …; execute a pre-trained ANN …; continuously and cyclically replace a pre-trained ANN …”, wherein each functional limitation is recited by the particular functional action verb following a phrase such as “configured to”.
However, in the interest of compact prosecution and for the purposes of examination during this Office Action, the phrases above (“to”, “for”, “as a result of”, “in order to”, “so that”, “used to”, and “for the purpose of”) have been interpreted as non-functional, non-limiting phrases merely indicating intended results, purposes, preferences, etc.

Next, Applicant sometimes uses the phrase “in particular”. If the recitation is a limitation which narrows the scope of the claim, such phrases are wholly unnecessary. Instead, inclusion of such phrases indicates that what is recited may merely be descriptive or refer to a preference which does not narrow the scope of the claim, especially when following a phrase such as “in order to” or similar, as is typical within the claims. In the interest of compact prosecution, these phrases have instead been interpreted to read “specifically”, as it is believed they are meant to be limiting and to provide the specific way in which to limit the claim.

Finally, Applicant frequently provides recitations which are wholly directed to an action performed rather than to a structural or functional limitation. For example, Claim 1 recites two sections which begin with “wherein the pretrained or post-trained neural network is applied in an inference phase on the local process unit” and “wherein a modified Iterative Closest Point, ICP, algorithm is executed on the local process unit”, and another section wherein, in the middle, it recites “serves as a” (which already indicates a potential purpose rather than a clear recitation of a limitation, see above) and recites “image data … and the refined result data set … is transmitted”. In the case of the first two items, these are clear recitations of how a component of the apparatus has been or will be operated (in other words, recitations with respect to the manner in which a claimed apparatus is intended to be employed) rather than functional limitations.
These could be at least partially clarified by implementing each action as its own clear functional limitation clause or recitation for a given structure/component, as shown above, or as in the example: “wherein the local processing unit is configured to: execute an ANN during an inference phase …; execute a modified iterative closest point algorithm …; compare reference image data with image data …” or similar. In the case of the last item, there is no clear structure which might be performing the actions described; if this is performed by a/the local processing unit, the above example is applicable. Furthermore, and related to the above, “serves as a” should be removed and something like “comprises a” used instead. In the interest of compact prosecution, the first two sections/items are interpreted as instead being constructed in a manner wherein the actions are clearly functional limitations, and in the case of the third section/item “serves as a” is interpreted as reading “comprises” and the “transmitted” action is construed as a functional limitation of the distributed system. Other claims similarly recite actions which are neither clear functional nor structural limitations; they are addressed below as if they were, where possible, and similarly require correction. Such claims include Claims 3, 4, 6, 7, 10-12, and 19-20.

(2) Applicant appears to create terms and to use terms contrary to their ordinary meaning. Where applicant acts as his or her own lexicographer to specifically define a term of a claim contrary to its ordinary meaning, the written description must clearly redefine the claim term and set forth the uncommon definition so as to put one reasonably skilled in the art on notice that the applicant intended to so redefine that claim term. Process Control Corp. v. HydReclaim Corp., 190 F.3d 1350, 1357, 52 USPQ2d 1029, 1033 (Fed. Cir. 1999).
The terms “pre-training/ed” and “post-training/ed” appear to be used in the claims to mean “initial training”/“initially trained” and “further training”/“further trained” (or similar), respectively, while the accepted meanings, based on the plain English definitions of the prefixes “pre” and “post”, are “before/in advance of/prior to training” and “after/subsequent to training”. The terms are indefinite because the specification does not clearly redefine them. The specification nevertheless makes it apparent that “pre-training” refers to an initial instance of training, rather than to something done before any training; see, for example, Page 4 of Applicant’s specification, which reads “The initial training or pre-training”, and Figure 1. Similarly, Page 6 of Applicant’s specification reads “Post-training is used to retrain”; see also Figure 1. The plain meaning of “post-training”, by contrast, refers to something which occurs after all training, not to further or later training. In light of the above, for the purposes of examination in this Office Action the terms have instead been interpreted as reading “initial training”/“initially trained” and “further training”/“further trained”.

(3) Applicant frequently omits subjects or objects. Claim 1 recites “to perform a pre-training”; however, it is not stated what the “pre-training” is of. The limitations preceding and following this limitation imply what it might be (“an instance of a neural network”), but it is far from clear and should be clearly and explicitly recited if so. Additionally, this relates to the above issue of the meaning of terms.

(4) Applicant frequently has significant issues of antecedent basis. Claim 1 first recites “objects” rather than “an object” or similar; however, throughout the claims the phrase “the object” is used. There is insufficient antecedent basis for this limitation in the claim.
The first instance should read “an object”, or, more appropriately for clarity, a particular object referred to with respect to the already-introduced objects.

Claim 1 first recites “different object types” and “an object type” but later recites “the respective object types”. There is insufficient antecedent basis for this limitation in the claim. It should recite “having a respective object type of the different object types” or “the received object type” or similar if referring to either, or simply read “an object type” if not. In the interest of compact prosecution, the limitation has been interpreted as simply reading “an object type”.

Claim 1 recites “the image data of the optical acquisition device which has been fed to the implemented neural network for application”. There is insufficient antecedent basis for this limitation in the claim; no such highly specific “image data” is previously recited. While Applicant may have intended it to refer to the “image data” of the “applied in an inference phase” limitation above, that image data is not recited as being “fed” or “for application”, and the neural networks are described as “applied”, not “implemented”. Furthermore, if such is the case, it would appear simpler to refer to it as “the image data”, as no other image data is recited with which it could be confused, and the image data appears to be no different. In the interest of compact prosecution, the limitation has been interpreted as simply reading “the image data”.

Claim 1 recites “the result data set determined by the neural network”. There is insufficient antecedent basis for this limitation in the claim. It is believed that this limitation refers to “wherein the pretrained or post-trained neural network is applied in an inference phase on the local processing unit determining a result data set”. As shown, “the neural network” is not referred to, but instead “the pretrained or post-trained neural network”.
Furthermore, it is not actually stated that the result data set is determined by any neural network. What is stated is that, in applying one of said neural networks, a result data set is determined in an inference phase; however, the inference phase does not consist of applying said neural networks. Therefore, the result data set may presently be determined by any means in the inference phase, the relationship to said neural networks not being actually claimed. In the interest of compact prosecution, the limitation has been interpreted as simply reading “the result data set”.

Claim 1 recites “the local processing unit” as well as “the set of local processing units”; previously recited is “at least one local processing unit”. There is insufficient antecedent basis for these limitations in the claim. In the interest of compact prosecution, these limitations have been interpreted as simply reading “the at least one local processing unit”.

Claim 2 recites “the refined result data set generated on the at least one local processing unit”. There is insufficient antecedent basis for this limitation in the claim. While it may be implied that the refined result data set is generated on the at least one local processing unit, it is not presently positively claimed as such. In the interest of compact prosecution, the limitation has been interpreted as reading “the refined result data set”.

Claim 3 recites “the image data acquired locally with the optical acquisition device and fed to the neural network”. There is insufficient antecedent basis for this limitation in the claim, for several reasons: previously recited image data is not recited as being “acquired locally”; previously recited image data is not recited as being “fed” to anything; and previously recited image data was recited in the same clause as “the pretrained or post-trained neural network”, not “the neural network”.
In the interest of compact prosecution, the limitation has been interpreted as simply reading “the image data”.

Claim 4 recites “the objects to be grasped”. There is insufficient antecedent basis for this limitation in the claim. In the interest of compact prosecution, the limitation has been interpreted as simply reading “the objects of different object types”.

(5) Some limitations do not appear to have a clear meaning. Claim 1 recites “a pre-trained ANN which is continuously and cyclically replaced by a post-trained neural network”. Based on Applicant’s disclosure (see Figure 1), what is understood by this is that a particular ANN may be replaced by later “instances” of an initial neural network; the same initial neural network is itself not repeatedly replaced, but rather each preceding replacement is itself replaced. The plain English meaning of the limitation, however, would appear to indicate that a pre-trained ANN is somehow repeatedly replaced, indicating that while it is replaced, it also somehow persists. It is not entirely clear which meaning is intended. In the interest of compact prosecution, the limitation has instead been interpreted as reading/meaning “an ANN which is continuously and cyclically replaced by a post-trained neural network and may initially be an initially trained ANN”.

Claim 1 recites “and is transmitted”. What “is transmitted” under normal English grammar would be “the image data … and the refined result data set”; however, it should then read “are”. This also follows the phrase “a post-training data set”. For the purpose of compact prosecution, “and is” has been removed from the limitation.

Claim 3 recites “the system according to claim 1, in which annotated post-training data are generated … and synthesized reference image data … which are transmitted”. Everything beginning with “and synthesized reference image data” to the end of the claim appears to be an incomplete clause or phrase.
It is entirely unclear what the meaning is or should be. For the purpose of compact prosecution, everything after “and synthesized” has been ignored.

Regarding Claim 12, the claim recites “executing a modified ICP algorithm which, as input data, firstly evaluates and compares … and, secondly, reference image data …”. Everything following and including “secondly, reference image data” appears to be an incomplete clause or phrase. For the purpose of compact prosecution, in light of other claims, and in particular Claim 1, from which it depends and which it appears to at least mostly repeat, this phrase/clause is interpreted to share the same meaning as the similar phrase found in Claim 1, in method form.

Regarding Claim 18, the claim as a whole does not particularly make sense. The claim recites “when using the … neural network”; however, the verb “use” or its derivatives is not previously used, so what is specifically referred to is unclear, the verb “use” being extremely broad. The claim also recites “the objects are arranged under certain simplifying assumptions, in particular on a plane and disjointly in the working area” and “the objects are arranged in the working area without adhering to any simplifying assumptions”. These are vague and unclear phrasings. Furthermore, this is a contingent limitation (see “when …”) in a method claim, and may therefore not be required; MPEP 2111.04 relates. The claim has been interpreted as merely indicating that image data input in an initial training phase is synthetic data and that image data input in a later or further training phase is real.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Examiner notes that recitations not necessarily required to be disclosed by the prior art have still been addressed where expedient in the interest of compact prosecution; see the 112(b) rejections above. Furthermore, the rejections provided below are a best effort in light of the 112(b) rejections above.

Claims 1-4 and 6-21 are rejected under 35 U.S.C. 103 as being unpatentable over Tremblay et al. (US 20190228495 A1) in view of Kehoe et al. (B. Kehoe, A. Matsukawa, S. Candido, J. Kuffner and K. Goldberg, "Cloud-based robot grasping with the google object recognition engine," 2013 IEEE International Conference on Robotics and Automation, Karlsruhe, Germany, 2013, pp. 4263-4270) and Shanley (US 10133696 B1).
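Before the limitation-by-limitation mapping, it may help to restate the claimed architecture as interpreted above: initial ("pre-") training on synthetic data at a central training computer, parameters transmitted to a local processing unit, then cyclic further ("post-") training, with the deployed local network replaced each round until a convergence criterion is fulfilled. The following toy sketch only illustrates that control flow; the names and the trivial one-parameter "training" step are illustrative assumptions, not taken from the application:

```python
def train_step(params, data):
    # Toy stand-in for a real training step: move the single "parameter"
    # halfway toward the mean of the batch.
    target = sum(data) / len(data)
    return params + 0.5 * (target - params)

def run_cycle(synthetic_data, post_training_batches, tol=1e-3):
    # Initial ("pre-") training, exclusively on synthetic data.
    central = train_step(0.0, synthetic_data)
    local = central                       # parameters transmitted to the local unit
    for batch in post_training_batches:   # cyclic "post-training" rounds
        updated = train_step(central, batch)
        if abs(updated - central) < tol:  # convergence criterion fulfilled
            break
        central = updated
        local = central                   # deployed local network replaced each cycle
    return local

params = run_cycle([1.0], [[2.0]] * 20)   # settles near the post-training data mean
```

Note that each round replaces the previously deployed parameters rather than the original pre-trained network, matching the interpretation adopted above for "continuously and cyclically replaced".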
Regarding Claim 1, Tremblay teaches:

A distributed system (See at least example environment 100 and Figure 1) for controlling at least one robot (See at least robot 102) in a gripping task for gripping objects of different object types which are arranged in a working area of the robot (See at least Figures 2A-2D), comprising:

a central training computer (See at least Provider Environment 125), having a memory (See at least memory 704) on which an instance of a neural network, ANN, is stored (See at least model repository 134), wherein the central training computer is intended for pre-training and for post-training of the neural network (See at least [0124] “The communication, or information from the communication, can be directed to a training manager 130, which can select an appropriate model or network and then train the model using relevant training data 132”);

wherein the neural network is trained for object recognition and position, including detection of an orientation of the object detection (See at least [0027] “FIGS. 2A through 2C illustrate portions of a basic task that can be learned … A robot capturing image data representative of these actions could analyze the image data to determine orientation, location, relationship, and other information about the objects”), to calculate grasping instructions for an end effector unit (See at least [0026] “end effector”) of the robot for grasping the object (See at least [0026] “gripper assembly” and [0027] “The plan can be a program, file, database, or set of actions or instructions, which could include steps such as “Place Block B on Block A” followed by “Place Block C to the right of Block A” ”);

wherein the central training computer is designed to receive an object type (See at least [0027] “identifiable by their respective colors or other such aspects … orientation, location, relationship, and other information about the objects”);

and wherein the central training computer is designed to perform a pre-training (See at least [0065] “the training manager 504 can be instructed to perform further training” (meaning there is prior initial training)) exclusively with synthetically generated object data (See at least [0037] “Leveraging convolutional pose machines, object cuboids can be reliably detected in images even when severely occluded, after training only on synthetic images” (emphasis added)) which is generated by means of a geometric, object-type-specific (See at least [0056] “The classified data can include instances of at least one type of object for which a statistical model is to be trained”) 3D model (See at least [0041] “Each object of interest can be modeled, such as by a bounding cuboid” and [0083] “our system operates in 3D”) of the object, and wherein, as a result of the pre-training, pre-training parameters of a pre-trained ANN are transmitted to at least one local processing unit via a network interface (See at least [0023] “This can involve, for example, using a training module 110 on the robot itself, or sending the data across the at least one network 122 for processing … At least some functionality may also operate on a remote device, networked device, or in “the cloud” in some embodiments”), and wherein the central training computer is further designed to continuously and cyclically perform a post-training of the neural network (See at least [0065] “the training manager 504 can be instructed to perform further training, or in some instances try training a new or different model”) and to transmit post-training parameters of a post-trained neural network to at least one local processing unit via the network interface as a result of the post-training (See at least [0026] “The execution neural network can perform the inference on the robot 102, on the client device 138, or using an inference 136 in the provider environment 124, among other such options. Once the instructions are generated, the instructions can be provided to the control system 104 of the robot, either directly or upon execution by the processor 112, etc.”);

a set of local resources that interact via a local network (See at least Figure 1): the robot with a robot controller (See at least controller 104), a manipulator (See at least [0026] “multi-link manipulator”) and the end effector unit, wherein the robot controller is intended for controlling the robot and in particular its end effector unit for executing the gripping task for a respective object of the respective object type (See again at least Figures 2A-2D);

an optical acquisition device for capturing image data of objects in the working area of the robot (See at least [0022] “sensors 108 … for example, one or more cameras to capture images or video of the performance in the environment within a field of view 118 of the respective sensors … the sensors 108 can capture information, such as video and position data, representative of the objects 120 in the task environment”);

at least one local processing unit for interacting with the robot controller (See at least processor 112), the at least one local processing unit being intended to store different instances of the neural network (See at least training program 110 and/or memory 114), receiving pre-training parameters and post-training parameters from the central training computer (See at least [0026] “The execution neural network can perform the inference on the robot 102, on the client device 138, or using an inference 136 in the provider environment 124, among other such options. Once the instructions are generated, the instructions can be provided to the control system 104 of the robot, either directly or upon execution by the processor 112, etc.”), in particular, in order to implement a pre-trained ANN which is continuously and cyclically replaced by a post-trained neural network until a convergence criterion is fulfilled (See at least [0061] “In some embodiments the training manager can monitor the quality of patterns (i.e., the model convergence) during training, and can automatically stop the training when there are no more data points or patterns to discover”),

and wherein the pretrained or post-trained neural network is applied in an inference phase on the local processing unit determining a result data set from the image data captured by the optical acquisition device, which is used to calculate the gripping instructions for the end effector unit for gripping the object and to transmit these to the robot controller for execution (See at least [0041] “a camera can acquire a live video feed of a scene, from which a pair of networks can infer the positions and relationships of objects in the scene in real time. The resulting percepts can be fed to another network that generates a plan to explain how to recreate those percepts”);

… and, secondly, reference image data and compares them with each other in order to minimize errors and to generate a refined result data set, the reference image data being a synthesized, rendered image which is rendered based on the result data set determined by the neural network and the 3D model (See at least [0054] “During training, data flows through the DNN in a forward propagation phase until a prediction is produced that indicates a label corresponding to the input. If the neural network does not correctly label the input, then errors between the correct label and the predicted label are analyzed, and the weights are adjusted for each feature during a backward propagation phase until the DNN correctly labels the input and other inputs in a training dataset”);

and whereby the image data captured with the optical acquisition device and the refined result data set serves as a post-training data set and is transmitted to the central training computer for the purpose of post-training (See again [0054]);

the network interface for data exchange between the central training computer and the set of local processing unit (See at least network 122), …

Tremblay does not teach, but Kehoe teaches:

… wherein a modified Iterative Closest Point, ICP, algorithm is executed on the local processing unit, which firstly takes as input data the image data of the optical acquisition device which has been fed to the implemented neural network for application (See Section IV, E, “First, estimating the pose of the object using a least-squares fit between the detected 3D point cloud and the reference point set using the iterative closest point method (ICP) [36] [38]. We use the ICP implementation from PCL.
The ICP algorithm performs a local optimization and therefore requires a reasonable initial pose estimate to find the correct alignment.”), … It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to utilize a well-known pose estimation technique such as that using an ICP method as disclosed in Kehoe in the system of Tremblay with a reasonable expectation of success. ICP is a well known method with particular advantages which would be obvious to utilize in Tremblay, which is not particular as to how ground truth and other data for comparison is generated. Tremblay does not teach, but Shanley teaches: … whereby the data exchange takes place via an asynchronous protocol (See at least Column 4, Lines 29 – 33 “An example system having a bridge, an asynchronous channel based bus, and a message broker to provide asynchronous communication is shown in accordance with various embodiments, are then described”). … It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to utilize an asynchronous protocol as disclosed in Shanley in the system of Tremblay with a reasonable expectation of success. It is common for different components within a system to operate with different timings such that an asynchronous protocol is required. Asynchronous protocols are well known and routine in computer systems including networking and that disclosed by Shanley would merely be one of many different solutions to a typical problem. 
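As background to the ICP discussion above (Kehoe, Section IV.E): the iterative closest point method alternates nearest-neighbor matching with a least-squares rigid alignment, and performs only a local optimization. A minimal Python sketch of the general technique follows; it is illustrative only, not the PCL implementation cited by the examiner, and all function names are assumptions:

```python
import numpy as np

def icp_step(source, target):
    """One ICP iteration: match each source point to its nearest
    target point, then solve the least-squares rigid transform
    (Kabsch/SVD) that aligns the matched pairs."""
    # Nearest-neighbor correspondences (brute force, for clarity)
    dists = np.linalg.norm(source[:, None, :] - target[None, :, :], axis=2)
    matched = target[np.argmin(dists, axis=1)]

    # Kabsch: center both sets, SVD of the cross-covariance matrix
    src_c, tgt_c = source.mean(axis=0), matched.mean(axis=0)
    H = (source - src_c).T @ (matched - tgt_c)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_c - R @ src_c
    return R, t

def icp(source, target, iters=20, tol=1e-8):
    """Iterate until the point set stops moving. As the office
    action notes, ICP only finds a local optimum and therefore
    needs a reasonable initial pose estimate."""
    src = source.copy()
    for _ in range(iters):
        R, t = icp_step(src, target)
        src_new = src @ R.T + t
        moved = np.linalg.norm(src_new - src)
        src = src_new
        if moved < tol:
            break
    return src
```

The initial-pose caveat quoted from Kehoe corresponds to the nearest-neighbor step: if the starting pose is far off, the correspondences are wrong and the local optimization converges to a misalignment.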
Regarding Claim 2, the combination of Tremblay, Kehoe, and Shanley teaches: The system according to claim 1, Tremblay further teaches: wherein the network interface serves to transmit parameters for instantiating the pre-trained or post-trained neural network from the central training computer to the at least one local processing unit (See at least [0026] “The execution neural network can perform the inference on the robot 102, on the client device 138, or using an inference 136 in the provider environment 124, among other such options. Once the instructions are generated, the instructions can be provided to the control system 104 of the robot, either directly or upon execution by the processor 112, etc.” and/or Figure 1), and/or wherein the network interface serves to transmitting the image data captured with the optical acquisition device and the refined result data set generated on the at least one local processing unit to the central training computer for post-training (See at least Figure 1 and [0069] “the now classified data instances can be stored to the classified data repository, which can be used for further training of the trained model 508 by the training manager”) and/or wherein the network interface serves to load the geometric, object-type-specific 3D model on the local processing unit (See again at least [0023], [0041], and [0056]). 
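For orientation only, the parameter-transfer pattern addressed in Claim 2 can be sketched as a local processing unit that stores different network instances and swaps in the newest pre- or post-training parameters received from the central training computer. All names below are hypothetical, not the application's actual design:

```python
class LocalProcessingUnit:
    """Toy local unit: stores network instances by parameter
    version and serves inference with the most recent one."""

    def __init__(self, instantiate):
        self._instantiate = instantiate   # builds a network from parameters
        self._instances = {}              # version -> network instance
        self._current = None

    def receive_parameters(self, version, params):
        """Instantiate a network from newly transmitted parameters;
        the newest version replaces the previous one for inference."""
        self._instances[version] = self._instantiate(params)
        self._current = version

    def infer(self, image):
        """Apply the currently active network instance."""
        return self._instances[self._current](image)
```

In the claimed system this replacement would repeat cyclically, the post-trained network displacing the pre-trained one until the convergence criterion is fulfilled.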
Regarding Claim 3, the combination of Tremblay, Kehoe, and Shanley teaches: The system according to claim 1, Tremblay further teaches: in which annotated post-training data are generated on the local processing unit from the image data acquired locally with the optical acquisition device and fed to the neural network (See at least [0056] “For example, the classified data might include a set of images that each includes a representation of a type of object, where each image also includes, or is associated with, a label, metadata, classification, or other piece of information identifying the type of object represented in the respective image”) and synthesized reference image data by means of an annotation algorithm, which are transmitted to the central training computer for the purpose of post-training, the synthesized reference image data being a synthesized, rendered image which is rendered based on the result data set determined by the neural network and the the 3D model. Regarding Claim 4, the combination of Tremblay, Kehoe, and Shanley teaches: The system according to claim 1, Tremblay further teaches: wherein the system comprises a user interface (See at least [0024] “The interface layer 126 can include application programming interfaces (APIs) or other exposed interfaces enabling a user, client device, or other such source to submit requests or other communications to the provider environment”) which is intended to provide one selection field in order to determine an object type of the objects to be grasped and wherein the determined object type is transmitted to the central training computer, so that the central training computer, in response to the determined object type, loads the object-type-specific 3D model from a model storage in order to synthesize object-type-specific images in all physically plausible positions and/or orientations by means of a synthesis algorithm, which serve as the basis for the pre-training of the neural network. 
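To illustrate the synthesis step referenced in Claim 4 (rendering object-type-specific images in all physically plausible positions and/or orientations from the loaded 3D model), here is a hedged sketch of pose sampling. The parameterization, ranges, and names are assumptions for illustration, not the application's actual synthesis algorithm:

```python
import math
import random

def sample_plausible_poses(n, stable_orientations, seed=None):
    """Sample object poses for synthetic pre-training images:
    a random in-plane position and yaw in a (hypothetical) work
    area, combined with one of the object's physically stable
    resting orientations."""
    rng = random.Random(seed)
    return [
        {
            "x": rng.uniform(-0.3, 0.3),          # metres, assumed work area
            "y": rng.uniform(-0.3, 0.3),
            "yaw": rng.uniform(0.0, 2.0 * math.pi),
            "rest": rng.choice(stable_orientations),
        }
        for _ in range(n)
    ]
```

Each sampled pose would then be handed to a renderer together with the object-type-specific 3D model to produce one annotated synthetic training image.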
Regarding Claim 4, the combination of Tremblay, Kehoe, and Shanley teaches: The system according to claim 1, Tremblay does not teach, but Shanley has already been shown to teach in combination with Tremblay: The system according to claim 1, wherein the network interface facilitates synchronization using a message broker implemented as a microservice (See at least Column 10, Lines 11 – 14, “The function of bridge 510 is to extend specific channel(s) 640 on bus 410 out to an application's message broker 555 (broker 555 may be a broker, platform, designated microservice or the like)”). Regarding Claim 7, the combination of Tremblay, Kehoe, and Shanley teaches: The system according to claim 1, Tremblay further teaches: in which the data exchange between the local resources and the central training computer takes place exclusively via the local processing unit, which serves as a gateway (See at least Figure 1). Regarding Claim 8, the combination of Tremblay, Kehoe, and Shanley teaches: The system according to claim 1, Tremblay further teaches: wherein the grasping instructions comprise an identification data set (See at least [0041] “The resulting percepts can be fed to another network that generates a plan to explain how to recreate those percepts. Finally, an execution network reads the plan and generates actions for the robot”) used to identify at least one end effector suitable for the object from a set of end effectors of the end effector unit. Regarding Claim 9, the combination of Tremblay, Kehoe, and Shanley teaches: The system according to claim 1, Tremblay further teaches: wherein the optical acquisition device is a device for capturing depth images (See at least [0022] “Other sensors or mechanisms can be utilized as well, as may include depth sensors”) and optionally for capturing intensity images in the visible or infrared spectrum.
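As context for the broker-mediated, asynchronous exchange discussed above (per Shanley), a minimal publish/consume sketch shows why the protocol decouples timings: the publisher returns immediately, and the consumer reads at its own pace. This is an illustrative toy, not Shanley's bridge/bus design, and all names are hypothetical:

```python
import queue
import threading

class MessageBroker:
    """Toy named-channel broker: publishers do not wait for
    consumers, so the two sides run on independent timings."""

    def __init__(self):
        self._channels = {}
        self._lock = threading.Lock()

    def _channel(self, name):
        # Lazily create the named channel, thread-safely
        with self._lock:
            return self._channels.setdefault(name, queue.Queue())

    def publish(self, name, message):
        """Enqueue and return immediately (asynchronous send)."""
        self._channel(name).put(message)

    def consume(self, name, timeout=1.0):
        """Block until a message is available or the timeout expires."""
        return self._channel(name).get(timeout=timeout)
```

In the claimed arrangement, for example, a local processing unit could publish a post-training batch to one channel while the central training computer consumes it whenever its next training cycle begins.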
Regarding Claim 10, the combination of Tremblay, Kehoe, and Shanley teaches: The system according to claim 9, Tremblay further teaches: wherein the computed grasping instructions can be (This indicates capability or capacity, which is especially broad. There is no indication that any plan comprising “grasping instructions” in Tremblay is unable to meet the following requirements, and furthermore such “visualization” is standard practice for robotic interfaces. See also [0083] “It also generates human-readable plans, unlike those of the recent work” and [0016] “In embodiments where the plan is human readable, a human can view the plan and make any corrections, either manually or through another demonstration of the task”) visualized by showing a virtual scene of the gripper grasping the object, the calculated visualization of the grasping instructions being output on a user interface. Regarding Claim 11, the combination of Tremblay, Kehoe, and Shanley teaches: The system according to claim 1, Tremblay further teaches: wherein the post-training of the neural network (ANN) is performed iteratively and cyclically (See at least [0068] “In one embodiment building a machine learning application is an iterative process that involves a sequence of steps”) following a transmission of post-training data in the form of refined result data sets comprising image data acquired locally by the optical acquisition device, which are automatically annotated and which have been transmitted from the local processing unit (LCU) to the central training computer (See at least [0069] “the now classified data instances can be stored to the classified data repository, which can be used for further training of the trained model 508 by the training manager.
In some embodiments the model will be continually trained as new data is available” and [0049] “During a training process, performance data is captured 402 or otherwise obtained or received that is representative of a task to be performed at least partially in the physical world. As mentioned, this can include image data captured by at least one camera, among other such options”). Regarding Claim 12, the combination of Tremblay, Kehoe, and Shanley teaches: The system according to claim 1, Tremblay further teaches: wherein a post-training data set for post-training the neural network is gradually and continuously expanded by image data acquired by sensors in the vicinity of the robot (See at least [0049] “During a training process, performance data is captured 402 or otherwise obtained or received that is representative of a task to be performed at least partially in the physical world. As mentioned, this can include image data captured by at least one camera, among other such options”). 
Regarding Claim 13, the combination of Tremblay, Kehoe, and Shanley teaches: a system according to claim 1, Tremblay further teaches or has already been shown to teach: An operating method for operating a system according to claim 1, comprising the following method steps: on the central training computer: Read in an object type (See at least [0027] “identifiable by their respective colors or other such aspects … orientation, location, relationship, and other information about the objects”); on the central training computer: Access a model storage (See at least model repository 134) in order to load the 3D model, in particular CAD model, assigned to the selected object type and generate synthetic object data from it (See at least [0047] “synthetic data generated by randomly sampling”) and use it (See at least [0032] “The object detection network can be a convolutional neural network that is trained on a set of training images, using domain randomization to overcome any reality gap resulting from the use of synthetic data”) for the purpose of pre-training; on the central training computer: Pre-training of a neural network with the generated synthetic object data (See again at least [0032]); on the central training computer: Provisioning of pre-training parameters (See again at least [0032]); on the central training computer: Transmission of the pre-training parameters via the network interface to at least one local processing unit (See at least [0023] “This can involve, for example, using a training module 110 on the robot itself, or sending the data across the at least one network 122 for processing … At least some functionality may also operate on a remote device, networked device, or in “the cloud” in some embodiments”); on the at least one local processing unit: reading pre-training parameters or post-training parameters of a pre-trained or post-trained neural network via the network interface (See at least [0026] “The execution neural network can perform the inference
on the robot 102, on the client device 138, or using an inference 136 in the provider environment 124, among other such options. Once the instructions are generated, the instructions can be provided to the control system 104 of the robot, either directly or upon execution by the processor 112, etc.”) in order to implement the pre-trained or post-trained neural network; on the at least one local processing unit: Acquisition of image data (See at least Figure 1 and field of view 118); on the at least one local processing unit: applying the pre-trained or post-trained neural network with the acquired image data to determine the result dataset (See at least [0041] “a camera can acquire a live video feed of a scene, from which a pair of networks can infer the positions and relationships of objects in the scene in real time. The resulting percepts can be fed to another network that generates a plan to explain how to recreate those percepts”); on the at least one local processing unit: … secondly, reference image data, to minimize alignment errors and to generate a refined result data set, wherein the reference image data is a synthesized, rendered image which is rendered based on the result data set determined by the neural network and the 3D model (See at least [0054] “During training, data flows through the DNN in a forward propagation phase until a prediction is produced that indicates a label corresponding to the input. 
If the neural network does not correctly label the input, then errors between the correct label and the predicted label are analyzed, and the weights are adjusted for each feature during a backward propagation phase until the DNN correctly labels the input and other inputs in a training dataset”); on the at least one local processing unit: calculating gripping instructions for the end effector unit of the robot based on the generated refined result data set (See at least [0041] “The resulting percepts can be fed to another network that generates a plan to explain how to recreate those percepts. Finally, an execution network reads the plan and generates actions for the robot”); on the at least one local processing unit: Data exchange with the robot controller (See at least Figure 1) for controlling the end effector unit of the robot with the generated gripping instructions; on the at least one local processing unit: generating post-training data, wherein the refined result data set serves as the post-training data set and is transmitted to the central training computer (CTC) for the purpose of post-training (See at least [0069] “the now classified data instances can be stored to the classified data repository, which can be used for further training of the trained model 508 by the training manager. In some embodiments the model will be continually trained as new data is available” and [0049] “During a training process, performance data is captured 402 or otherwise obtained or received that is representative of a task to be performed at least partially in the physical world. 
As mentioned, this can include image data captured by at least one camera, among other such options”); on the central training computer: acquisition of the post-training data via the network interface (See at least Figure 1), the post-training data comprising the labeled real image data acquired with the optical acquisition device (See again at least [0049] and [0069]); on the central training computer: Continuous and cyclical retraining of the neural network with the recorded retraining data until a convergence criterion is fulfilled for the provision of post-training parameters (See at least [0061] “In some embodiments the training manager can monitor the quality of patterns (i.e., the model convergence) during training, and can automatically stop the training when there are no more data points or patterns to discover”); on the central training computer: Transmission of the post-training parameters via the network interface to at least one local processing unit (See at least Figure 1). 
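The continuous, cyclical retraining step above reduces to a loop that trains on newly arrived post-training data until a convergence criterion is met, then hands the parameters back for distribution to the local processing units. The sketch below is a hypothetical summary of that control flow; every name and signature is an assumption:

```python
def post_train(params, fetch_batch, train_step, converged, max_rounds=1000):
    """Cyclically retrain: pull newly annotated post-training data,
    run one training step, and stop once the convergence criterion
    holds. Returns the post-training parameters."""
    for _ in range(max_rounds):
        batch = fetch_batch()
        if batch is None:          # no new annotated data yet; next cycle
            continue
        params, loss = train_step(params, batch)
        if converged(loss):        # e.g. loss below a threshold
            break
    return params
```

This mirrors the cited behavior in Tremblay [0061], where the training manager monitors model convergence and stops training automatically.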
Kehoe has already been shown to teach in combination with Tremblay: … executing a modified ICP algorithm which, as input data, firstly evaluates and compares the image data of the optical acquisition device which have been supplied to the implemented neural network for application and … Regarding Claim 14, the combination of Tremblay, Kehoe, and Shanley teaches: a system according to claim 1, Tremblay further teaches or has already been shown to teach: A method for operating a central training computer in a system according to claim 1, comprising the following method steps: reading in an object type (See at least [0027] “identifiable by their respective colors or other such aspects … orientation, location, relationship, and other information about the objects”); accessing the model storage (See at least model repository 134) in order to load the 3D model, in particular the CAD model, assigned to the detected object type and to generate synthetic object data from it and use it for the purpose of pre-training; pre-training of a neural network (See again at least [0032]) with the generated synthetic object data, which serve as pre-training data, to provide pre-training parameters; transmission of the pre-training parameters via the network interface to the at least one local processing unit

Prosecution Timeline

May 03, 2024
Application Filed
Dec 12, 2025
Non-Final Rejection — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12594673: Method of Calibrating Manipulator, Control System and Robot System
Granted Apr 07, 2026 (2y 5m to grant)
Patent 12588646: MILKING SYSTEM COMPRISING A MILKING ROBOT
Granted Mar 31, 2026 (2y 5m to grant)
Patent 12583110: ROBOT CONTROL SYSTEM
Granted Mar 24, 2026 (2y 5m to grant)
Patent 12576523: CONTROLLING ROBOTS USING MULTI-MODAL LANGUAGE MODELS
Granted Mar 17, 2026 (2y 5m to grant)
Patent 12544926: OBJECT INTERFERENCE CHECK METHOD
Granted Feb 10, 2026 (2y 5m to grant)
Based on the 5 most recent grants by this examiner.


Prosecution Projections

1-2
Expected OA Rounds
65%
Grant Probability
88%
With Interview (+23.4%)
2y 9m
Median Time to Grant
Low
PTA Risk
Based on 102 resolved cases by this examiner. Grant probability derived from career allow rate.
