Last updated: May 29, 2026
Application No. 18/077,746
METHOD AND ELECTRONIC DEVICE FOR OBTAINING TAG THROUGH HUMAN COMPUTER INTERACTION AND PERFORMING COMMAND ON OBJECT

Final Rejection §103§112
Filed
Dec 08, 2022
Priority
Jan 25, 2022 — RE 10-2022-0011063 +2 more
Examiner
HANSEN, CONNOR LEVI
Art Unit
2672
Tech Center
2600 — Communications
Assignee
Samsung Electronics Co., Ltd.
OA Round
4 (Final)
Interview Optional

Interview lift: +27.9%. An interview has already been conducted in this application's prosecution history. This examiner has a 78% grant rate. Because an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.
Based on 32 resolved cases, 2023–2026
Examiner Intelligence

HANSEN, CONNOR LEVI
Grants 78% — above average
Career Allowance Rate
25 granted / 32 resolved
+16.1% vs TC avg
Strong +28% interview lift
Without
With
+27.9%
Interview Lift
resolved cases with interview
Typical timeline
2y 11m
Avg Prosecution
20 currently pending
Career history
Total Applications
across all art units
Statute-Specific Performance

§101: 4.4% (-35.6% vs TC avg)
§103: 81.7% (+41.7% vs TC avg)
§102: 2.6% (-37.4% vs TC avg)
§112: 11.3% (-28.7% vs TC avg)
Tech Center averages are estimates. Based on career data from 32 resolved cases
Office Action

§103 §112
Detailed Action
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed 12/18/2025 have been fully considered but they are not persuasive. 


[Image: media_image1.png]
On pages 11-12, Applicant argues,


[Image: media_image2.png]
Examiner respectfully disagrees and maintains that the combination of Wake in view of Saito teaches all the limitations of amended claim 1. Wake teaches a robotic system that learns tags for target objects to perform tasks based on verbal cues. While Wake’s tags are limited to object names and attributes, Saito provides the missing teaching for associating user and contextual information with those tags.
For example, Saito at col. 12, lines 53-58, teaches storing a “target user” field in its specific expression database (see Fig. 5). This field associates the specific expression (e.g., “curry dish”) with the identity of the user who uses that expression. This directly corresponds to the claim limitation “information of a subject who uses the target object”. The user “uses” the target object by referencing it with a specific verbal cue. For example, when the user states “this is a curry dish,” they are using the dish as the subject of their utterance to define its identity for the system. Thus, the combination of Wake in view of Saito would teach associating object tags with the identities of the users who refer to and thereby “use” the target object.
Saito at col. 4, lines 63-67, further teaches learning and storing user-related information, such as a user-preferred location, in association with the specific expression for the target object. This directly corresponds to the claim limitation “information of the subject related to the target object”, in that a location preference specified by the user (information of the subject) is linked to the target objects. Thus, the combination of Wake in view of Saito would teach tags that not only identify an object and its user but also contain contextual information like object locations.
Note that under the Broadest Reasonable Interpretation, the claim limitation “wherein the tag of the target object comprises information about…” merely requires that the tag includes or contains this information and does not specify a particular structure of the tag. The tag learning process as taught by Saito, which links a specific expression to associated data for user identities and location preferences, satisfies this requirement. The “tag” is the entire data structure (see the tags stored for target objects in Fig. 5), and it comprises the user and location information as parts. Therefore, Wake in view of Saito teaches all the limitations according to claim 1 and the claim is rejected under 35 U.S.C. 103 (see below for additional detail).
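The structural reading above, in which the "tag" is the whole stored record and the user and location data are parts of it, can be pictured with a small sketch. This is only an illustration under stated assumptions: the class name `ObjectTag`, its field names, and the two helper methods are hypothetical and are not taken from Saito, Wake, or the application. The fields loosely follow what the action says Fig. 5 stores (general expression, specific expression, target user, number of times) plus the learned location.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ObjectTag:
    """Hypothetical record; field names are assumptions, not Saito's schema."""
    general_expression: str                      # e.g. "dish"
    specific_expression: str                     # e.g. "curry dish"
    target_users: list[str] = field(default_factory=list)  # who uses the expression
    use_count: int = 0                           # "number of times"
    location: Optional[str] = None               # learned location preference

    def has_subject_information(self) -> bool:
        # True when at least one user identity is stored in the record.
        return bool(self.target_users)

    def has_subject_related_information(self) -> bool:
        # True when a user-specified location is stored in the record.
        return self.location is not None


tag = ObjectTag(
    general_expression="dish",
    specific_expression="curry dish",
    target_users=["user_a"],
    use_count=3,
    location="top shelf in a cupboard",
)
print(tag.has_subject_information(), tag.has_subject_related_information())
```

The sketch shows only that a single record can contain both kinds of information; it takes no position on whether that record meets the claim language.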


Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 3 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Claim 3, lines 5-7, recites “wherein determining the size of the FOV includes adjusting a length of the FOV at least horizontally, vertically, or diagonally, based on a center of a forward direction of the electronic device” which is indefinite. It is unclear how determining the size of the FOV is “based on a center of a forward direction of the electronic device”. For example, a “forward direction” would be represented as a vector and would not contain a center. Thus, what constitutes a center of a forward direction and its relationship to the size adjustment is unclear, and one of ordinary skill in the art would not be able to ascertain the scope of the claim. For examination purposes, the claim limitation will be interpreted as “determine the size of the FOV… based on a forward direction of a center of the electronic device.”
Note claim 3, line 8 additionally recites, “wherein the location of the FOV is based on a center point of size adjustment”. The claim limitation is being interpreted to mean that the location is based on a center point used for size adjustment.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 6, 7, 10-14, 16, 17, 19, 20, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Wake et al. (US 20210402593 A1), (hereinafter Wake) in view of Saito (US 11430429 B2).

Regarding claim 1, Wake teaches a method, performed by an electronic device, of performing an operation through an interaction with a user (Wake, “In aspects, Learning-from-Observation (LFO) is a machine learning model that observes and maps human movements into motor commands for a robot to perform. A LFO model is an efficient way to reduce robot programming by teaching physical movement through observation.”, pg. 2, paragraph 0020, lines 1-5, see Fig. 1A), the method comprising: 
obtaining, using a camera of the electronic device, a plurality of images including a target object; detecting a motion of the user manipulating the target object, based on the plurality of images obtained using the camera of the electronic device (Wake, “In aspects, the human demonstration of the task sequence in environment 120 may be recorded as a set of time-series images. The time-series images may be RGB–D images, which include RGB color information with per pixel depth information. To increase efficiency of the object recognition, pre-processing may be performed by cropping the set of time-series images using a minimum rectangle area that includes all of the detected hand positions of the human during the demonstration.”, pg. 3, paragraph 0028, lines 1-9); 
obtaining, by the electronic device, a visual descriptor of the target object including visual information for identifying the target object wherein the visual descriptor further includes grasping information for providing movement of the target object, the grasping information obtained by the electronic device by analyzing the motion of the user using the plurality of images (Wake, “As illustrated, first voxel 134 is associated with a right hand of human 122 grasping cup 124 at the first time, and second voxel 135 is associated with the right hand of human 122 releasing cup 124 on shelf 128 at a second time... For simplicity, voxels associated with cup 124 within intermediate frames (e.g., images showing the human hand grasping the cup in the air between the table 126 and the shelf 128) are not illustrated; however, this should not be understood as limiting.”, pg. 3, paragraph 0033, lines 9-24, see Fig. 1B, A robot observes a task demonstrated by a human to extract grasp and release positions for moving a target object. This includes representing the object in a voxel grid and identifying visual characteristics, such as color and grasping points, from the demonstration.); 
obtaining, by the electronic device, a tag of the target object based on information related to the target object received by marking the target object, wherein the tag of the target object is based on characteristics of the target object (Wake, “the images may be further cropped to form a bounding box around each detected object, converting the bounding box into a point cloud represented by an environmental coordinate, and calculating a mean value of the point cloud as a 3D position for each detected object . A color attribute for each detected object may be determined by searching a dominant pixel color in the hue, saturation, value ( HSV ) color space of the cropped time - series images . Each detected object may then be represented in four-dimensional (4D) space (e.g., spatial/temporal space based on the 3D position and a one-dimensional (1D) time attribute extracted from the time – series images) along with a color attribute. Based on the above processing, a first object may be identified as a “cup ” (e.g., cup 124), a second object may be identified as a shaker (e.g., salt shaker 130), and third object may be identified as a plate (e.g., plate 132).”, pg. 3, paragraphs 0028 and 0029, The method includes marking each detected object in the image with bounding boxes. The user’s verbal input (e.g., “cup”) is then matched to the corresponding bounding box. This generates an object tag for the target object, which can be used for future task execution.); and 
in response to receiving an input signal corresponding to the tag, performing, by the electronic device, an operation corresponding to the input signal on the target object, based on the visual descriptor including the grasping information (Wake, “Once the robot knows “where and when” to pay attention, a task model encoder identifies skill parameters associated with the grasp and release actions, including corresponding hand laterality. The skill parameters may then be encoded in a task model defining the task sequence of picking up and placing a cup on a shelf. The robot 142 then decodes the task model to calculate motor commands based on the encoded skill parameters, thereby enabling the robot 142 to perform the task sequence of “picking up the cup 144 and placing it on shelf 148.” In this way, the input-based FOA 140 enables robot 142 to learn a task (or task sequence) in a real environment 120.”, pg. 5, paragraph 0040, “The language parser 214 may analyze the input (e.g., verbal cue and/or textual cue) received from human 202 to identify one or more task-related verbs, a target object name (or object type), and an object attribute.”, pg. 5, paragraph 0045, lines 1-4, Once the task model for moving the target object is learned, the robot can then be instructed through verbal and/or text cues that include the object’s name (e.g., “pick up the cup”).).

Wake does not teach wherein the tag of the target object comprises information about a subject who uses the target object, and information of the subject related to the target object. 
However, Saito teaches wherein the tag of the target object comprises information about a subject who uses the target object, and information of the subject related to the target object (Saito, “At this time, the response information output by the agent device can be presented on the basis of information learned by the agent device. For example, the agent device learns by recognizing a recognition target such as an object by general object recognition and acquiring a general name (hereinafter also referred to as a general expression) of the recognition target… However, the same object may be expressed differently (hereinafter also referred to as a specific expression) from the general expression depending on an individual or a group (for example, a family)… an embodiment of the present disclosure are conceived focusing on the above points, and enable more smooth communication with a user by learning a recognition target and a specific expression for the recognition target in association with each other. For this purpose, the information processing apparatus for implementing the information processing method according to the present embodiment is characterized in controlling output of response information using the specific expression on the basis of input information. Here, the specific expression is an expression different from the general expression. For example, the general expression for dishes used for meals is "dish". In a case where the dish is used for serving curry, the dish may be called "curry dish" in some homes. The expression "curry dish" corresponds to the specific expression. Note that the specific expression is not limited to a name of an object such as the above-described dish. For example, the specific expression may be an adjective expression indicating a property, a state, or the like of an object or the like, in addition to the name of an object or the like.”, cols. 3 and 4, lines 54-67 and 1-41, respectively, “Note that the information processing terminal 10 may learn information indicating a location of the dish 52 of the user utterance 300 of "the curry dish is placed on a top shelf in a cupboard" in association with the specific expression of the dish 52.”, col. 4, lines 63-67, “Furthermore, the specific expression DB may store information regarding a target user, the number of times, and an image, in addition to the general expression and the specific expression. The target user is a user who is used the registered specific expression and is stored in association with the specific expression.”, col. 12, lines 53-58, see Fig. 5, An electronic device learns tags for target objects in order to perform operations based on verbal cues given by a user. Figure 5 illustrates stored tags associated with a target object. These tags include general and user-specific expressions along with names of users who use the target object. The device can further learn user-specified location preferences and store them as part of the tag associated with the target object. Thus, the tag can be considered to comprise information of a subject who uses the object and information about the subject related to the target object such that each tag is linked to a target object for subsequent operations prompted by the user.).
Wake teaches a robotic system that learns tags for target objects to perform tasks, such as moving objects, based on verbal cues containing the object’s name (Wake, pg. 5, paragraph 0045, lines 1-4). Wake further teaches including adjectives as attribute information to distinguish between objects of the same type (Wake, “Additionally, the adjective modifying “cup” within the input may be identified as the object attribute. In aspects, the object attribute distinguishes the target object from other objects of the same object type (e.g., having the same object name, “cup”) within a cluttered space.”, pg. 5, paragraph 0045, lines 18-22) but does not teach associating a tag with information of a subject who uses the target object or information about the user related to the target object. Saito teaches a system that learns tags for target objects which include information of a user who uses the object and user-related information about the object, such as a specified location preference, to perform subsequent operations (see above). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the robotic system of Wake to include the object tag learning as taught by Saito (Saito, cols. 3 and 4, lines 54-67 and 1-41, respectively, col. 4, lines 63-67, see Figs. 5 and 10). The motivation for doing so would have been to incorporate additional descriptors for target objects to distinguish objects of the same base type, thereby improving object detection and contextual understanding of the robot. The combination of Wake in view of Saito would learn comprehensive tags for target objects, including general expressions, user-specified expressions, user identities, and location preferences, and upon receiving a verbal cue containing a specific expression, perform a moving operation on the corresponding target objects using all of the associated tags.
Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine the teachings of Wake with Saito to obtain the invention as specified in claim 1.
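The cue handling that the rejection of claim 1 attributes to Wake (paragraph 0045: task-related verbs, a target object name, and an object attribute parsed from the input, then used to pick one object among several of the same type) can be sketched roughly as follows. This is a toy keyword matcher written for illustration only. The vocabularies, `ParsedCue`, `parse_cue`, and `select_object` are hypothetical names, and Wake's actual language parser and object selector are not described at this level of detail in the action.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed toy vocabularies; a real parser would not be a keyword list.
VERBS = {"pick", "place", "move", "put"}
OBJECT_NAMES = {"cup", "plate", "shaker"}
ATTRIBUTES = {"red", "green", "blue"}


@dataclass
class ParsedCue:
    verbs: list[str]
    object_name: Optional[str]
    attribute: Optional[str]


def parse_cue(text: str) -> ParsedCue:
    """Split a cue into verbs, the first object name, and the first attribute."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    verbs = [w for w in words if w in VERBS]
    object_name = next((w for w in words if w in OBJECT_NAMES), None)
    attribute = next((w for w in words if w in ATTRIBUTES), None)
    return ParsedCue(verbs, object_name, attribute)


def select_object(cue: ParsedCue, detections: list[dict]) -> Optional[dict]:
    """Return the first detection matching the name and, if given, the attribute."""
    for det in detections:
        name_ok = det["name"] == cue.object_name
        attr_ok = cue.attribute is None or det["color"] == cue.attribute
        if name_ok and attr_ok:
            return det
    return None


detections = [
    {"name": "plate", "color": "white"},
    {"name": "cup", "color": "red"},
    {"name": "cup", "color": "green"},
]
cue = parse_cue("Pick up the green cup and place it on the shelf.")
selected = select_object(cue, detections)
print(cue, selected)
```

In this sketch the attribute plays the distinguishing role the action describes: without "green", the first cup in the list would be selected.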

Regarding claim 2, Wake in view of Saito teaches the method according to claim 1, wherein the obtaining of the visual descriptor comprises obtaining the visual descriptor, in response to the motion of the user being detected within a field of view in which the plurality of images are obtained (Wake, “From the viewpoint of robotic manipulation, tasks may be classified into three categories, including a position goal task, a force goal task, and a hybrid goal task. A position goal task is a task that achieves a desired state by applying a positional shift to the target object (e.g., NC-NC for moving an object); a force goal task is a task that achieves a desired state by applying force to a target object (e.g., PC-NC for lifting an object); and a hybrid goal task is a task that achieves a desired state by applying positional shift and force to a target object (e.g., PC-PC for scraping with an object).”, pg. 7, paragraph 0060, lines 1-11, A demonstration for moving the target object is collected as a time-series of images taken in a field of view defined by the robot’s camera. The motion of a user is analyzed from these images to identify visual characteristics, such as color and grasping points, based on the demonstration.).

Regarding claim 4, Wake in view of Saito teaches the method according to claim 1, wherein the tag of the target object comprises information about at least one of:
	a frequency of use of the target object, 
	a purpose of the target object,
	an exterior of the target object (Wake, “Rather, in addition to recognizing an object name , a color attribute for each object may be determined . For instance, the first object may be identified as a “ red cup,”… ”, pg. 3, paragraph 0029, lines 8-11), or
	a preference of the user for the target object.

Regarding claim 6, Wake in view of Saito teaches the method according to claim 1, wherein the performing of the operation corresponding to the input signal comprises:
	identifying the target object, based on the visual information included in the visual descriptor; and providing the movement of the target object, based on the grasping information (Wake, “Thereafter, the images may be further cropped to form a bounding box around each detected object, converting the bounding box into a point cloud represented by an environmental coordinate, and calculating a mean value of the point cloud as a 3D position for each detected object. A color attribute for each detected object may be determined by searching a dominant pixel color in the hue, saturation, value (HSV) color space of the cropped time-series images. Each detected object may then be represented in four-dimensional (4D) space (e.g., spatial/temporal space based on the 3D position and a one-dimensional (1D) time attribute extracted from the time-series images) along with a color attribute. Based on the above processing, a first object may be identified as a “cup” (e.g., cup 124)…”, pg. 3, paragraphs 0028 and 0029, lines 9-21 and 1-2, respectively, The bounding boxes converted into point cloud representations provide spatial and color attributes for candidate objects in the time-series of images. This information, along with the user’s verbal cue, allows the system to identify the target object by name. Once identified, grasp locations for the target object can be extracted and used to move the target object upon instruction.).
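Two of the steps quoted from Wake above, taking the mean of an object's point cloud as its 3D position and finding a dominant pixel color in HSV space, can be sketched as below. This is a minimal illustration, not Wake's implementation: the function names, the choice to bin by hue only, and the 12-bin histogram are assumptions, and the sample points and pixels are made up.

```python
import colorsys
from collections import Counter


def centroid(points):
    """Mean of a point cloud, used here as the object's 3D position."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def dominant_hue_bin(pixels_rgb, bins=12):
    """Index of the most frequent hue bin among 8-bit RGB pixels.

    Assumption: "dominant color" is approximated by a hue histogram;
    saturation and value are ignored in this sketch.
    """
    counts = Counter()
    for r, g, b in pixels_rgb:
        h, _s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        counts[min(int(h * bins), bins - 1)] += 1
    return counts.most_common(1)[0][0]


# Made-up sample data: three points and three pixels (two reddish, one blue).
points = [(0.0, 0.0, 1.0), (2.0, 0.0, 1.0), (1.0, 3.0, 1.0)]
pixels = [(250, 10, 10), (240, 20, 15), (10, 10, 200)]

position = centroid(points)
hue_bin = dominant_hue_bin(pixels)
print(position, hue_bin)
```

With 12 bins, bin 0 covers hues near red, so the two reddish pixels outvote the single blue one in this sample.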

Regarding claim 7, Wake in view of Saito teaches the method according to claim 1, wherein the visual descriptor comprises information indicating at least one of:
	a 3-dimensional (3D) model of the target object,
	a point cloud of all or a portion of the target object,
	texture of all or a portion of the target object,
	a descriptor limited to visual characteristics of the target object,
	a geometric structure of the target object, or 
	an exterior of the target object (Wake, “Thereafter, the images may be further cropped to form a bounding box around each detected object, converting the bounding box into a point cloud represented by an environmental coordinate, and calculating a mean value of the point cloud as a 3D position for each detected object . A color attribute for each detected object may be determined by searching a dominant pixel color in the hue , saturation , value ( HSV ) color space of the cropped time-series images. Each detected object may then be represented in four-dimensional (4D) space (e.g., spatial/temporal space based on the 3D position and a one-dimensional (1D) time attribute extracted from the time–series images) along with a color attribute .”, pg. 3, paragraph 0028, lines 9-21).  

Regarding claim 10, Wake in view of Saito teaches the method according to claim 1, further comprising:
	storing the visual descriptor in a database; storing the tag of the target object in the database; and storing a link between the visual descriptor and the tag of the target object in the database (Wake, “In aspects, object selector 218 may communicate with various databases 224 and/or servers 222 via network 206 to identify the target object. For example, in a cluttered environment including a plate, a cup, and a salt shaker, based on the target object (e.g., cup) and the object attribute (e.g., green) output by the language parser 214, the object selector 218 may identify a green cup within the time-series images.”, pg. 5, paragraph 0047, lines 1-10, Bounding boxes from the time-series of images are stored as object candidates with spatial and color data. The user’s verbal cue is stored separately for linguistic analysis. The system links the verbal cue to the correct bounding box to identify the target object and saves this association for later task execution.).
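The three storing steps recited in claim 10 (a visual descriptor, a tag, and a link between them, all in a database) can be pictured with a small relational sketch. This uses Python's standard `sqlite3` module purely for illustration. The table and column names are hypothetical, and neither the application nor Wake is stated in the action to use this schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visual_descriptor (
    id INTEGER PRIMARY KEY,
    color TEXT, x REAL, y REAL, z REAL
);
CREATE TABLE tag (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- The link is stored as its own row, separate from descriptor and tag.
CREATE TABLE descriptor_tag_link (
    descriptor_id INTEGER NOT NULL REFERENCES visual_descriptor(id),
    tag_id INTEGER NOT NULL REFERENCES tag(id),
    PRIMARY KEY (descriptor_id, tag_id)
);
""")

# Store the descriptor, the tag, and the link between them.
desc_id = conn.execute(
    "INSERT INTO visual_descriptor (color, x, y, z) VALUES (?, ?, ?, ?)",
    ("green", 0.4, 0.1, 0.9),
).lastrowid
tag_id = conn.execute("INSERT INTO tag (name) VALUES (?)", ("cup",)).lastrowid
conn.execute("INSERT INTO descriptor_tag_link VALUES (?, ?)", (desc_id, tag_id))
conn.commit()


def descriptor_for_tag(name):
    """Follow the stored link from a tag name to its visual descriptor."""
    return conn.execute(
        """
        SELECT d.color, d.x, d.y, d.z
        FROM visual_descriptor d
        JOIN descriptor_tag_link l ON l.descriptor_id = d.id
        JOIN tag t ON t.id = l.tag_id
        WHERE t.name = ?
        """,
        (name,),
    ).fetchone()


print(descriptor_for_tag("cup"))
```

Keeping the link in its own table makes the three stored items separately identifiable, which is the distinction the claim language draws.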

Claim 11 corresponds to claim 1, additionally reciting an electronic device including a camera, a memory and at least one processor. Wake in view of Saito teaches an electronic device including a camera, a memory and at least one processor (Wake, “Furthermore , embodiments of the disclosure may be practiced in an electrical circuit comprising discrete electronic elements , packaged or integrated electronic chips containing logic gates, a circuit utilizing a microprocessor, or on a single chip containing electronic elements or microprocessors… The computing device 600 may also have one or more input device (s) 612 such as visual image sensors”, pg. 11, paragraphs 0088 and 0089, lines 1-6 and 1-2, respectively) to perform the method of claim 1. As indicated in the analysis of claim 1, Wake in view of Saito teaches the method according to claim 1. Therefore, claim 11 is rejected for the same reasons of obviousness as claim 1. 

Claim 12 corresponds to claim 2, additionally reciting an electronic device including a camera, a memory and at least one processor. Wake in view of Saito teaches an electronic device including a camera, a memory and at least one processor (see analysis of claim 11) to perform the method of claim 2. As indicated in the analysis of claim 2, Wake in view of Saito teaches the method according to claim 2. Therefore, claim 12 is rejected for the same reasons of obviousness as claim 2.

Claim 13 corresponds to claim 3, additionally reciting an electronic device including a camera, a memory and at least one processor. Wake in view of Saito teaches an electronic device including a camera, a memory and at least one processor (see analysis of claim 11) to perform the method of claim 3. As indicated in the analysis of claim 3, Wake in view of Saito teaches the method according to claim 3. Therefore, claim 13 is rejected for the same reasons of obviousness as claim 3.

Claim 14 corresponds to claim 4, additionally reciting an electronic device including a camera, a memory and at least one processor. Wake in view of Saito teaches an electronic device including a camera, a memory and at least one processor (see analysis of claim 11) to perform the method of claim 4. As indicated in the analysis of claim 4, Wake in view of Saito teaches the method according to claim 4. Therefore, claim 14 is rejected for the same reasons of obviousness as claim 4.

Claim 16 corresponds to claim 6, additionally reciting an electronic device including a camera, a memory and at least one processor. Wake in view of Saito teaches an electronic device including a camera, a memory and at least one processor (see analysis of claim 11) to perform the method of claim 6. As indicated in the analysis of claim 6, Wake in view of Saito teaches the method according to claim 6. Therefore, claim 16 is rejected for the same reasons of obviousness as claim 6.

Claim 17 corresponds to claim 7, additionally reciting an electronic device including a camera, a memory and at least one processor. Wake in view of Saito teaches an electronic device including a camera, a memory and at least one processor (see analysis of claim 11) to perform the method of claim 7. As indicated in the analysis of claim 7, Wake in view of Saito teaches the method according to claim 7. Therefore, claim 17 is rejected for the same reasons of obviousness as claim 7.

Claim 19 corresponds to claim 10, additionally reciting an electronic device including a camera, a memory and at least one processor. Wake in view of Saito teaches an electronic device including a camera, a memory and at least one processor (see analysis of claim 11) to perform the method of claim 10. As indicated in the analysis of claim 10, Wake in view of Saito teaches the method according to claim 10. Therefore, claim 19 is rejected for the same reasons of obviousness as claim 10.

Claim 20 corresponds to claim 1, additionally reciting a non-transitory computer-readable recording medium. Wake in view of Saito teaches a non-transitory computer-readable recording medium (Wake, “Robot 204 may include a computer system 208 comprising an input-based FOA 210 having a number of modules. Each module may perform aspects of the input based FOA 210 based on executing computer-readable instructions.”, pg. 5, paragraph 0043, lines 1-5) to perform the method of claim 1. As indicated in the analysis of claim 1, Wake in view of Saito teaches the method according to claim 1. Therefore, claim 20 is rejected for the same reasons of obviousness as claim 1.

Regarding claim 22, Wake in view of Saito teaches the method of claim 1, wherein the performing of the operation corresponding to the input signal on the target object includes moving the target object using a robot arm of the electronic device (Wake, “Based on the processing described in FIG. 1B, including parsing the input 136 and spatio-temporally filtering 138 the time-series images of the human demonstration of the task sequence (collectively, input-based Focus-of-Attention 140), a robot 142 performs the task sequence of picking up a cup 144 from table 146 and placing it on shelf 148.”, pg. 4, paragraph 0037, lines 7-12, see Figs. 1A-1C).

Claims 8, 9, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Wake et al. (US 20210402593 A1) in view of Saito (US 11430429 B2) and further in view of Buehler et al. (US 20130346348 A1), (hereinafter, Buehler).

Regarding claim 8, Wake in view of Saito teaches the method of claim 1. Wake in view of Saito does not teach wherein the obtaining of the tag of the target object comprises marking the target object by using at least one light source, based on the visual descriptor.

However, Buehler teaches wherein the obtaining of the tag of the target object comprises marking the target object by using at least one light source, based on the visual descriptor (Buehler, “the robot may place optical signals into the real world itself. For example, to communicate its object selection to the user, the robot may simply shine a laser pointer at the object.”, pg. 7, paragraph 0048, lines 26-29, The robot marks the object via a laser pointer to communicate the object selection to the user.).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified Wake in view of Saito to include a laser pointer for object selection as taught by Buehler (Buehler, pg. 7, paragraph 0048, lines 26-29). The motivation for doing so would have been to allow the robot to verify the intended target object with the user prior to task learning, thereby improving the communication between user and robot. Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine the teachings of Wake in view of Saito with Buehler to obtain the invention as specified in claim 8.

Regarding claim 9, Wake in view of Saito teaches the method according to claim 1. Wake in view of Saito does not teach wherein the obtaining of the tag of the target object comprises marking the target object using at least one augmented reality (AR) projection, based on the visual descriptor.

However, Buehler teaches wherein the obtaining of the tag of the target object comprises marking the target object using at least one augmented reality (AR) projection, based on the visual descriptor (Buehler, “Visually displaying the task in an augmented-reality display means that the robot overlays computer-generated graphics (in the broadest sense, i.e., including pictorial elements, text, etc.) onto the real-world, or a representation (such as a camera image) thereof. For example, the robot may overlay graphics onto a camera view captured by the robot, and display the resulting augmented-reality image on a monitor.”, pg. 2, paragraph 0011, lines 1-8, “In some embodiments, the task involves selecting a vision model from among a plurality of computer-implemented vision models, and identifying an object in a camera view based on a representation of the object in accordance with the model. Again, visually displaying performance of the task may involve overlaying an object outline onto the camera view.”, pg. 2, paragraph 0013, The object is detected based on a visual model and an outline of the detection result is projected in augmented reality for user confirmation.).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified Wake in view of Saito to include marking the target object using an augmented reality projection as taught by Buehler (Buehler, pg. 2, paragraph 0011, lines 1-8 and pg. 2, paragraph 0013). The motivation for doing so would have been to allow the robot to verify the intended target object with the user prior to task learning, thereby improving the communication between user and robot. Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine the teachings of Wake in view of Saito with Buehler to obtain the invention as specified in claim 9.

Claim 18 corresponds to claim 8, additionally reciting an electronic device including a camera, a memory and at least one processor. Wake in view of Saito and Buehler teaches an electronic device including a camera, a memory and at least one processor (see analysis of claim 11) to perform the method of claim 8. As indicated in the analysis of claim 8, Wake in view of Saito and Buehler teaches the method according to claim 8. Therefore, claim 18 is rejected for the same reasons of obviousness as claim 8.

Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Wake et al. (US 20210402593 A1) in view of Saito (US 11430429 B2) and further in view of Shin (US 20150360368 A1).

Regarding claim 21, Wake in view of Saito teaches the non-transitory computer-readable recording medium of claim 20, wherein the grasping information includes information about a point where a body part of a user contacts the target object to manipulate the target object (Wake, “As illustrated, first voxel 134 is associated with a right hand of human 122 grasping cup 124 at the first time, and second voxel 135 is associated with the right hand of human 122 releasing cup 124 on shelf 128 at a second time... For simplicity, voxels associated with cup 124 within intermediate frames (e.g., images showing the human hand grasping the cup in the air between the table 126 and the shelf 128) are not illustrated; however, this should not be understood as limiting.”, pg. 3, paragraph 0033, lines 9-24, see Fig. 1B, A voxel position where the hand contacts the target object is determined by the robot.).

	Wake in view of Saito does not teach wherein the grasping information includes a center of gravity of the target object.
	However, Shin teaches wherein the grasping information includes a center of gravity of the target object (Shin, “In order to achieve the above object, according to an aspect of the present disclosure, there is provided a method of controlling a system for calculating weight and center of gravity of an object to be lifted by a robot”, pg. 1, paragraph 0009, lines 1-4, see Fig. 3). 
	Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified Wake in view of Saito to include a center of gravity determination step for each of the target objects, as taught by Shin (Shin, pg. 1, paragraph 0009, lines 1-4, see Fig. 3). The motivation for doing so would have been to account for object variability and mass distribution, thereby improving the robot’s ability to maintain balance and control during object manipulation (as suggested by Shin, “it is necessary to know the weights and the centers of gravity of objects to be manipulated so as to prevent the intelligent robots or wearable robots from losing their balance due to the objects.”, pg. 1, paragraph 0005). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine the teachings of Wake in view of Saito with Shin to obtain the invention as specified in claim 21.

Examiner notes that no prior art was applied against claim 3. However, claim 3 stands rejected under 35 U.S.C. 112(b) due to indefiniteness, and cannot be considered to be allowable subject matter because the scope of the claims is not clearly defined.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
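As a rough illustration of the reply-period rule stated in the preceding paragraph, the following Python sketch computes the expiry date of the shortened statutory period. The function names `add_months` and `shortened_period_expiry` are hypothetical, and the April 7, 2026 mailing date is assumed from the prosecution timeline on this page. The sketch does not adjust for weekends, federal holidays, or extension fees, so it is not a docketing tool.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Return the date `months` calendar months after `d`, clamping the day
    to the last day of the target month (e.g. Jan 31 + 1 month -> Feb 28/29)."""
    total = d.month - 1 + months
    year = d.year + total // 12
    month = total % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def shortened_period_expiry(
    mailing: date,
    first_reply: Optional[date] = None,
    advisory_mailed: Optional[date] = None,
) -> date:
    """Hypothetical helper: expiry of the shortened statutory period.

    Default: three months from the mailing date of the final action.
    If a first reply was filed within two months and the advisory action
    was mailed after the three-month date, the period expires on the
    advisory mailing date, but never later than six months from mailing.
    """
    two_months = add_months(mailing, 2)
    three_months = add_months(mailing, 3)
    six_months = add_months(mailing, 6)

    replied_early = first_reply is not None and first_reply <= two_months
    advisory_late = advisory_mailed is not None and advisory_mailed > three_months

    if replied_early and advisory_late:
        return min(advisory_mailed, six_months)
    return three_months


# Assumed mailing date of this final action, taken from the timeline.
mailing_date = date(2026, 4, 7)
print(shortened_period_expiry(mailing_date))
```

Passing `first_reply` and `advisory_mailed` exercises the advisory-action branch; the six-month cap is applied with `min` so that a very late advisory action cannot push the date past the statutory limit.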
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CONNOR LEVI HANSEN whose telephone number is (703)756-5533. The examiner can normally be reached Monday-Friday 9:00-5:00 (ET).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sumati Lefkowitz, can be reached at (571) 272-3638. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/CONNOR L HANSEN/Examiner, Art Unit 2672



/SUMATI LEFKOWITZ/Supervisory Patent Examiner, Art Unit 2672
Prosecution Timeline

Jun 20, 2025: Response after Non-Final Action
Jul 29, 2025: Request for Continued Examination
Jul 30, 2025: Response after Non-Final Action
Sep 30, 2025: Non-Final Rejection mailed (§103, §112)
Dec 18, 2025: Response Filed
Apr 07, 2026: Final Rejection mailed (§103, §112)
May 19, 2026: Request for Continued Examination
May 22, 2026: Response after Non-Final Action
Precedent Cases

Applications granted by this same examiner with similar technology

18/042,602 | Patent 12633085 | METHOD FOR DETERMINING THE STORAGE FUNCTIONALITY OF AN IMAGING PLATE FOR X-RAY IMAGES | 3y 2m to grant | Granted May 19, 2026
17/928,394 | Patent 12530785 | TRACKING DEVICE, TRACKING METHOD, AND RECORDING MEDIUM | 3y 1m to grant | Granted Jan 20, 2026
17/932,201 | Patent 12524984 | HISTOGRAM OF GRADIENT GENERATION | 3y 4m to grant | Granted Jan 13, 2026
18/152,283 | Patent 12518363 | IMAGE PROCESSING APPARATUS, IMAGE PROCESSING METHOD, IMAGE PROCESSING SYSTEM, AND STORAGE MEDIUM WITH PIECEWISE LINEAR FUNCTION FOR TONE CONVERSION ON IMAGE | 2y 12m to grant | Granted Jan 06, 2026
18/160,126 | Patent 12499648 | IMAGE PROCESSING APPARATUS, IMAGE CAPTURING APPARATUS, CONTROL METHOD, AND STORAGE MEDIUM FOR DETECTING SUBJECT IN CAPTURED IMAGE | 2y 10m to grant | Granted Dec 16, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 78%
With Interview (+27.9%): 99%
Median Time to Grant: 2y 11m (~0m remaining)
PTA Risk: High
Based on 32 resolved cases by this examiner. Grant probability derived from career allowance rate.