DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office action is responsive to the RCE filed on 01/28/2026. Claims remain pending in the application. Claims 1, 11, and 16 are independent.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 4-9, 11, 14-16, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over TANG et al. (US 2020/0226814 A1, filed on 03/11/2019), hereinafter TANG, in view of Ravasz et al. (US 2021/0090332 A1, filed on 09/20/2019), hereinafter Ravasz; Mendes et al. ("Mid-Air Interactions Above Stereoscopic Interactive Tables", 2014 IEEE Symposium on 3D User Interfaces (3DUI), March 29-30, 2014, pp. 3-10), hereinafter Mendes; and YANAI et al. (US 2015/0324001 A1, pub. date: 11/12/2015), hereinafter YANAI.
Independent Claims 1, 11, and 16
TANG discloses a method of interacting with a virtual object (TANG, ¶¶ [0002], [0014], and [0028]: various interaction methodologies have each been developed to facilitate a user's virtual interactions with computer generated three-dimensional object within virtual reality (VR), augmented reality (AR), and mixed reality (MR) environments; with ray-casting, a virtual light ray of sorts, projected from a user's hand or head, can enable the user to interact with virtual objects that are far away or presented as being out of arms' reach; facilitate user interaction with one or more rendered virtual objects , facilitated by one of a plurality of interaction methodologies), the method comprising:
receiving one or more images of a first hand and a second hand of a user (TANG, ¶¶ [0017]-[0018] with FIG. 1: the head-mounted display device 10 includes a two-dimensional image camera 20 (e.g., a visible light camera and/or infrared camera) and a depth imaging device 22; depth imaging device 22 may include an infrared light-based depth camera (also referred to as an infrared light camera) configured to acquire video of a scene including one or more human subjects; ¶¶ [0026] and [0015] with 255 and 260 in FIG. 2: a first ray 265 is displayed as originating from the palm of the user's left hand 255, depicted as a dashed line terminating in a selection cursor 270; a second ray 275 is displayed as originating from the palm of the user's right hand 260, depicted as a dashed line terminating in a selection cursor 280; casting the ray through the user's modeled arm joint and their hand-tracked palm joint via hand-tracking technology; ¶¶ [0029]-[0032] with 320-350 in FIG. 3: receiving information from the depth camera about an environment; determine a position of the display device within the environment based on the information received from the depth camera and one or more additional sensor components (e.g., gyroscope, accelerometer, and magnetometer, infrared lights, infrared cameras, motion sensors, light sensors, 3D scanners, CMOS sensors, GPS radio, etc.); inferring a position of a joint of a user's arm based on the position of the head-mounted display when the joint of the user's arm is not visible to the depth camera of the head-mounted display; when an external camera is available, this calculation may not be needed; determining a position of a user's hand based on the information received from the depth camera);
analyzing the one or more images to detect a plurality of keypoints associated with each of the first hand and the second hand (TANG, ¶¶ [0033]-[0034] with 350 in FIG. 3: point clouds (e.g., portions of a depth map) corresponding to the user's hands may be further processed to reveal the skeletal substructure of the hands, and to identify components of the user's hands, such as wrist joints, finger joints, adjoining finger segments, knuckles, palm, dorsum, etc.; a virtual skeleton is fit to the pixels of depth video that correspond to the user);
determining an interaction point for each of the first hand and the second hand based on the plurality of keypoints associated with each of the first hand and the second hand, wherein determining the interaction point for each of the first hand and the second hand (TANG, ¶¶ [0034]-[0035] with 360-370 in FIG. 3: casting a ray from a portion of the user's hand based on the position of the joint of the user's arm and the position of the user's hand, which determines the poses, movements, gestures or actions of the imaged hand; e.g., lines may be generated for the shoulder, and/or elbow, as well as for the palm, wrist, knuckle, etc.; ¶¶ [0037]-[0040] with 410 in FIGS. 4A-4B and 4D and 460 in FIG. 4C: an initial ray 415 is positioned with an origin at the user's shoulder 405 passing through the user's palm 410; an initial ray 440 is positioned with an origin at the user's elbow 435 passing through the user's palm 410; rays may be cast from a user's fingertips; an initial ray 450 is positioned with an origin at the user's eye 455 passing through the user's finger 460; the ray may be cast based on palm orientation alone, e.g., a targeting ray 470 is extended from the user's palm 410 into the environment; i.e., interaction point – user's palm 410 or user's fingertips is selected/determined from a portion of the user's hand keypoints – wrist joints, finger joints, adjoining finger segments, knuckles, palm, dorsum, etc.) includes:
identifying a first particular gesture performed by the first hand from a plurality of gestures (TANG, ¶ [0018]: process the acquired video to identify one or more objects within the operating environment, one or more postures and/or gestures of the user wearing head-mounted display device 10, one or more postures and/or gestures of other users within the operating environment, etc.; ¶¶ [0033]-[0034] with 350 in FIG. 3: a virtual skeleton is fit to the pixels of depth video that correspond to the user; by analyzing positional change in the various hand joints and/or segments, the corresponding poses, movements, gestures or actions of the imaged hand may be determined; ¶ [0036]: the user may be able to make a ray longer or shorter with a predetermined gesture; ¶ [0042]: pointing the palm up may be a gesture designated to signify another function; pointing down, making a fist or closed grip may also signal the user's intent not to cast the ray; this may enable the system to match the user's intent as to when they want to use the functionality and to cast a ray; other intentional gestures may be provided to the user for turning ray-casting on or off; semantic gestures such as finger pointing, or a "spiderman-style web-cast" gesture may also be used; ¶¶ [0048]-[0049]: once a virtual object is targeted, the head-mounted display may recognize a selection gesture from the user's hand based on information received from the depth camera , and select the targeted object responsive to recognizing the selection gesture; perform a finger-based gesture, such as a two-finger pinch to select the object; the appearance of the targeting cursor may be adjusted responsive to recognizing a selection gesture from the user's hand; ¶¶ [0051]-[0054] with FIG. 6: recognize a manipulation gesture from fingers of the user's hand based on information received from the depth camera; if a user is manipulating a proximal object, they do so with a predetermined set of actions and gestures; e.g., at 600 of FIG . 6, a user is shown grasping virtual ball 605 with a left hand 610 and pinching a control point 615 of virtual ball with a right hand 620; a pinch gesture may be used to scale an object, grasping may enable moving an object, etc.; examples presented herein are centered around a "pinch" gesture, but more generally any sort of suitable manipulation gesture may be used, e.g., pinch, point, grasp, wipe, push; objects may be rotated, translated, resized, stretched, deleted, etc.; to manipulate a distal object, the same actions and gestures can be used, because the user's palm sets the ray, and the fingers are left available for manipulation; e.g., at 630 of FIG . 6, a user is shown targeting virtual ball 605 with a first ray 635 emanating from left hand 610 and targeting virtual ball 605 with a second ray 640 emanating from left hand 620; the user may manipulate virtual ball 605 using the same manipulation gestures shown at 600, e.g., grasping virtual ball 605 with a left hand 610 and pinching a control point 615 of virtual ball with a right hand 620; ¶ [0015]: the user may make a selection gesture, such as an air-tap, to select an item that they are currently targeting);
selecting a subset of the plurality of keypoints associated with the first hand (TANG, ¶¶ [0034]-[0035] with 360-370 in FIG. 3: point clouds (e.g., portions of a depth map) corresponding to the user's hands may be further processed to reveal the skeletal substructure of the hands, and to identify components of the user's hands, such as wrist joints, finger joints, adjoining finger segments, knuckles, palm, dorsum, etc.; casting a ray from a portion of the user's hand based on the position of the joint of the user's arm and the position of the user's hand, which determines the poses, movements, gestures or actions of the imaged hand; e.g., lines may be generated for the shoulder, and/or elbow, as well as for the palm, wrist, knuckle, etc.; ¶¶ [0037]-[0040] with 410 in FIGS. 4A-4B and 4D and 460 in FIG. 4C: an initial ray 415 is positioned with an origin at the user's shoulder 405 passing through the user's palm 410; an initial ray 440 is positioned with an origin at the user's elbow 435 passing through the user's palm 410; rays may be cast from a user's fingertips; an initial ray 450 is positioned with an origin at the user's eye 455 passing through the user's finger 460; the ray may be cast based on palm orientation alone, e.g., a targeting ray 470 is extended from the user's palm 410 into the environment; i.e., interaction point – user's palm 410 or user's fingertips is selected/determined from a portion of the user's hand keypoints – wrist joints, finger joints, adjoining finger segments, knuckles, palm, dorsum, etc.);
generating one or more rays based on the interaction point for each of the first hand and the second hand; and interacting with the virtual object using the one or more rays (TANG, ¶¶ [0043] and [0045] with 370 in FIG. 3: responsive to the ray intersecting with one or more control points of a virtual object, indicating to the user that the object is being targeted; a virtual object may be targeted when a cursor at the terminus of a cast ray is coincident with one or more control points of the virtual object; ¶¶ [0046]-[0058] with FIGS. 5A-5B and FIG. 6: user casting a ray 500 from the user's palm 505; virtual ball 515 is selected based on the intersection of selection cursor 510 with bounding box 520; when selection is done (e.g., the user pinches their index and thumb together, as shown at 525 of FIG. 5B) the cursor is animated to match the finger actions (e.g., resize to dot) as shown at 530 of FIG. 5B; grasping virtual ball 605 with a left hand 610 and pinching a control point 615 of virtual ball with a right hand 620; a pinch gesture may be used to scale an object, grasping may enable moving an object, etc.; targeting virtual ball 605 with a first ray 635 emanating from left hand 610 and targeting virtual ball 605 with a second ray 640 emanating from right hand 620; for two hand manipulation, two or more modes may be available; e.g., each hand may select a different affordance, thus enabling two hand scaling, rotation, manipulation, etc.; this may provide the user more control; movement of the user's hands may cause the manipulation; e.g., if the user's hands stay the same distance apart the object rotates; if the user's hands spread, the object may be resized, etc.).
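As an illustration of the ray-casting geometry mapped above (TANG, FIGS. 4A-4D), the following sketch casts a ray from an origin joint (shoulder, elbow, or eye) through a hand control point (palm or fingertip) and tests whether it targets a virtual object. The joint coordinates and the sphere-based targeting test are illustrative assumptions and are not taken from TANG.

```python
# Sketch of ray casting from an origin joint (shoulder, elbow, or eye) through a hand
# control point (palm or fingertip), per the geometry described for TANG's FIGS. 4A-4D.
# The sphere test stands in for the "cursor coincident with a control point" check.
import numpy as np

def cast_ray(origin_joint, control_point):
    """Ray positioned at the control point, aimed along origin_joint -> control_point."""
    direction = np.asarray(control_point, float) - np.asarray(origin_joint, float)
    return np.asarray(control_point, float), direction / np.linalg.norm(direction)

def ray_hits_sphere(ray_origin, ray_dir, center, radius):
    """True if the ray passes within `radius` of `center` (simple targeting test)."""
    to_center = np.asarray(center, float) - ray_origin
    along = np.dot(to_center, ray_dir)          # distance along the ray
    if along < 0:                               # object is behind the hand
        return False
    closest = to_center - along * ray_dir       # perpendicular offset from the ray
    return np.linalg.norm(closest) <= radius

shoulder, palm = [0.2, 1.4, 0.0], [0.4, 1.2, 0.5]   # example joint positions (meters)
origin, direction = cast_ray(shoulder, palm)
print(ray_hits_sphere(origin, direction, center=[0.9, 0.7, 1.75], radius=0.15))  # True
```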
TANG further discloses a system (TANG, ¶ [0060] with 700 in FIG. 7: a computing system 700) comprising: one or more processors (TANG, ¶¶ [0062]-[0063] with 710 in FIG. 7: the logic machine 710 may include one or more processors); and a machine-readable medium (TANG, ¶¶ [0064]-[0066] with 720 in FIG. 7: storage machine 720 may include optical memory, semiconductor memory, and/or magnetic memory, among others) comprising instructions (TANG, ¶ [0064]: storage machine 720 includes one or more physical devices configured to hold instructions) that, when executed by the one or more processors, cause the one or more processors to perform operations described above (TANG, ¶¶ [0062]-[0064]: logic machine 710 includes one or more physical devices configured to execute instructions; the logic machine 710 may include one or more processors configured to execute software instructions; storage machine 720 configured to hold instructions executable by the logic machine 710 to implement the methods and processes described herein).
TANG further discloses a non-transitory machine-readable medium (TANG, ¶¶ [0064]-[0066] with 720 in FIG. 7: storage machine 720 may include optical memory, semiconductor memory, and/or magnetic memory, among others) comprising instructions (TANG, ¶ [0064]: storage machine 720 includes one or more physical devices configured to hold instructions) that, when executed by one or more processors (TANG, ¶¶ [0062]-[0063] with 710 in FIG. 7: the logic machine 710 may include one or more processors), cause the one or more processors to perform operations described above (TANG, ¶¶ [0062]-[0064]: logic machine 710 includes one or more physical devices configured to execute instructions; the logic machine 710 may include one or more processors configured to execute software instructions; storage machine 720 configured to hold instructions executable by the logic machine 710 to implement the methods and processes described herein).
TANG fails to explicitly disclose (1) selecting a subset of the plurality of keypoints associated with the first hand that correspond to the first particular gesture for determining the interaction point; (2) determining a bimanual interaction point as a midpoint between the interaction point for each of the first hand and the second hand; (3) generating one or more bimanual deltas based on the interaction point for each of the first hand and the second hand; and interacting with the virtual object using the one or more bimanual deltas and the bimanual interaction point.
Ravasz teaches a system and a method relating to object interaction in an artificial reality environment (Ravasz, ¶ [0002]), wherein determining the interaction point based on the first particular gesture (Ravasz, ¶¶ [0080]-[0081], [0084], [0086], [0100], [0104], and [0126] with 502 in FIG. 5; FIGS. 6-9; 1202 in FIG. 12; FIG. 13/42B; and FIG. 20: casting a ray projection with an origin point (e.g., a user's dominant eye, a point between the user's eyes, another point on the user's head, the user's hip, the user's shoulder, or a context variable point, e.g., between the user's hip and shoulder) and a control point (e.g., the tips of a user's finger, a user's palm, a user's wrist, or a center of a user's fist); track a portion of a hand as a control point; the control point can be identified in response to a user making a particular gesture, such as forming her fingers into a pinch; in some cases, the control point can be offset from a tracked portion of the user, e.g., the control point can be an offset from the users palm or wrist; determine a control point and a casting direction, for a ray projection, based on a tracked position of one or more body parts; a user's hand 1306/4252 has formed a gesture by connecting her thumb and middle/index finger, indicating initiation of a ray projection 1302 extended from the control point 1304 which is an offset from the user's middle/index finger; the user began by making a gesture 2002, bringing her thumb-tip, index fingertip, and middle fingertip together; the hand interaction system then began tracking control point 2004 (a point offset in front of the gesture 2002) and casting direction 2006); determining a bimanual interaction point based on the interaction point for each of the first hand and the second hand; generating one or more bimanual deltas based on the interaction point for each of the first hand and the second hand; and interacting with the virtual object using the one or more bimanual deltas and the bimanual interaction point (Ravasz, ¶¶ [0079]-[0085] with FIG. 5: track a portion of a hand as a control point; e.g., a control point can be the tips of a user's finger, a user's palm, a user's wrist, or a center of a user's fist; the control point can be offset from a tracked portion of the user; e.g., the control point can be an offset from the users palm or wrist; track a second body part as an origin point; origin point can be based on a position of a user's eye, shoulder, hip, etc.; a machine learning model can be used to analyze images from such a camera and to generate 3D position data for a model of the user's hands or other various body parts; determine a projection orientation that is centered on a line the passes through the origin point determined and the control point determined; the projection can be one of various types such as a ray, sphere, cylinder, cone, pyramid, etc.; the projection can extend outward from the user starting at the control point or offset from the control point; perform an action in relation to real or virtual objects, based on one or more locations of the projection; ¶¶ [0142]-[0166] with FIGS. 
25-34: identify an action corresponding to starting object selection; the action can be a two-handed "pinch" gesture, formed with the user's thumb from each hand touching the index or middle finger of that hand, with the two pinches touching at the thumb/finger intersection point; the action can be a two-handed "L" gesture, formed with a first hand having the thumb sticking up perpendicular to the floor and the index finger parallel to the floor and with a second hand having the thumb sticking down perpendicular to the floor and the index finger parallel to the floor, defining two opposite corners of a rectangle; continuously determine a shape defined by a first tracked portion of a first user hand and a second tracked portion of a second user hand; display a representation (such as an outline) of the shape determined; identify a pyramid formed with the tip of the pyramid at one of the user's eyes and the pyramid walls being formed based on the shape determined; upon the user releasing the gesture identified, or when the velocity of the user's hand movement falls below a threshold, select the objects that fall within the pyramid (or other shape) identified; the user began by making gestures 2602A and 2602B , bringing her thumb-tips and index fingertips together and touching those gestures together at point 2612; began tracking opposite corners of rectangle 2604 (i.e., bimanual deltas) based on the locations of gestures 2602A and 2602B, from the user's point of view; from the user's point of view, moving gesture points 2602A and 2602B formed rectangle 2604; as the user moved her hands apart, the corners of rectangle 2604 moved apart, increasing the size of rectangle 2604; as the user formed this rectangle, determined a pyramid formed with the pyramid tip at the user's dominant eye and with sides extending through the edges of rectangle 2604; in example 2600, continuously selected (or deselected) objects that at least partially intersect with the pyramid until the user released one of gestures 2602A or 2602B; in example 2650, the user formed a rectangle 2652 by forming two pinch gestures and pulling them apart; formed a pyramid 2654 with the tip of the pyramid at the user's dominant eye 2656 and extending so that the four triangles that form edges of the pyramid coincide with rectangle 2652; determined any objects that are both beyond the rectangle 2652 (i.e. 
on the opposite side of the rectangle 2652 from the user) and that fall completely within the pyramid 2654; at block 2702, determine a control point and casting direction based on a tracked position of one or more body parts; at block 2704, generate a ray projection from the control point along the casting direction; at block 2706, continuously determine a distance relationship (i.e., bimanual deltas) between a first hand (e.g., a dominant hand) controlling the control point and a second hand (e.g., a non-dominant hand); the distance relationship can be linearly or exponentially proportional to the actual distance between the user's hands; this relationship can be based on a speed at which the user changes the distance between her hands; at block 2708, continuously set a length of the ray projection or a "hook" location based on the distance relationship determined at block 2706; instead of setting a ray length, set an interaction point along the ray based on the distance relationship; whether at the end of the ray or at a point along the ray, this interaction point is referred to herein as the "hook"; at block 2710, identify one or more objects based on an intersection with the hook; example 2800 begins with a user creating ray projection 2802 by performing pinch gesture 2804 between her thumb and middle finger on her dominant hand 2806; the user can position the ray 2802 so that it intersects with objects 2808 and 2810; the user can then control the length of the ray, with a hook 2814 at the end of the ray 2802, based on a distance 2816 between her dominant hand 2806 and non-dominant hand 2818; as the hook 2814 intersects with the object 2810, which the user intends to target, she can make a second pinch gesture (not shown), this time with her index finger and thumb on her dominant hand 2806; as the hand interaction system identifies this gesture, it selects object 2810, which the hook 2814 intersects at that moment; in example 2900, the user then changes the length of ray 2802, and accordingly the position of the hook 2814, by lengthening the distance 2816 (i.e., bimanual deltas) between her dominant hand 2806 and her non-dominant hand 2818, while still holding the first thumb/middle-finger pinch, causing the hook 2814 to intersect with object 2812; the user can also select object 2812, now that it intersects with the hook 2814, by again making the gesture 2902 with a further thumb/index-finger pinch; at block 3002, determine a control point and casting direction based on a tracked position of one or more body parts; at block 3004, generate a cone or cylinder projection from the control point along the casting direction; at block 3006, continuously determine a distance relationship (i.e., bimanual deltas) between a first hand (e.g., a dominant hand ) controlling the control point and a second hand (e.g., a non-dominant hand); at block 3008, continuously set a diameter of the cylinder or of the base of the cone based on the distance relationship determined at block 3006; at block 3010, identify one or more objects based on at least partial intersection or full encompassment by the cone or cylinder; the user can control the length of the cylinder or cone with one gesture (e.g., the distance between the tip of a thumb and forefinger on a dominant hand) and can control the diameter of the cylinder or cone base with another gesture (e.g., the distance between her two hands; example 3100 begins with a user creating cone projection 3102 by performing pinch gesture 3104 between her thumb and 
middle finger on her dominant hand 3106; the user can then control the diameter of the base 3114 of cone 3102 based on a distance 3116 between her dominant hand 3106 and non-dominant hand 3118; in example 3200, the user has changed the diameter of the base 3114 of cone 3102 by changing distance 3116 between her dominant hand 3106 and non-dominant hand 3118, while still holding the first thumb/middle-finger pinch gesture, causing the cone 3102 to intersect with only objects 3112 and 3108; example 3300 begins with a user creating cylinder projection 3302 by performing pinch gesture 3304 between her thumb and middle finger on her dominant hand 3310; the user can then control the diameter of the base 3314 of cylinder 3302 by changing the distance 3316 between her dominant hand 3310 and non-dominant hand 3318; in example 3400, the user then changes the diameter of the base 3314 of cylinder 3302 by changing distance 3316 between her dominant hand 3310 and non-dominant hand 3318, while still holding the first thumb/middle-finger pinch).
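To make the mapped "bimanual delta" concrete, the following sketch implements the distance relationship of Ravasz's blocks 2706-2710: the separation between the two hands' interaction points continuously sets the position of the "hook" along the cast ray. The linear gain value is an illustrative assumption; Ravasz also contemplates exponential and speed-dependent mappings.

```python
# Sketch of Ravasz's distance relationship (blocks 2706-2710): the delta between the
# two hands continuously sets where the "hook" sits along the dominant hand's ray.
# The linear `gain` is illustrative; Ravasz also describes exponential mappings.
import numpy as np

def hook_position(ray_origin, ray_dir, dominant_hand, non_dominant_hand, gain=2.0):
    bimanual_delta = np.linalg.norm(np.asarray(dominant_hand, float)
                                    - np.asarray(non_dominant_hand, float))
    ray_length = gain * bimanual_delta          # hands farther apart -> longer ray
    return np.asarray(ray_origin, float) + ray_length * np.asarray(ray_dir, float)

hook = hook_position(ray_origin=[0.4, 1.2, 0.5], ray_dir=[0.0, 0.0, 1.0],
                     dominant_hand=[0.4, 1.2, 0.5], non_dominant_hand=[-0.1, 1.2, 0.5])
print(hook)   # hands 0.5 m apart -> hook 1.0 m out along the ray
```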
TANG and Ravasz are analogous art because they are from the same field of endeavor, a system and a method relating to object interaction in an artificial reality environment. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teaching of Ravasz to TANG. Motivation for doing so would be to provide flexibility for users to control objects easily and to improve selection accuracy (Ravasz, ¶¶ [0150], [0158], and [0097]).
TANG in view of Ravasz fails to explicitly disclose (1) selecting a subset of the plurality of keypoints associated with the first hand that correspond to the first particular gesture for determining the interaction point; and (2) determining a bimanual interaction point as a midpoint between the interaction point for each of the first hand and the second hand.
Mendes teaches a system and a method relating to interaction with virtual objects (Mendes, ABSTRACT), wherein determining a bimanual interaction point as a midpoint between the interaction point for each of the first hand and the second hand (Mendes, Section 4 with FIGS. 3-8 of Pages 5-6: implement five different interaction techniques for object manipulation; four of these use mid-air interactions, both direct and indirect, and one is solely touch-based; all implemented techniques provide 7 DOF, three for translation, three for rotation, and a uniform scale; the 6-DOF Hand technique in FIG. 4: (a) the hand that grabs the object directly controls its translation and rotation; (b) the distance between both grabbed hands scales the object; and (c) the grabbed point in the object will remain the center of all transformations, during the entire manipulation, until the object is released; the 3-DOF Hand technique in FIG. 5: (a) the hand that grabs the object directly controls its translation; (b) the rotations of the other hand define the object orientation; (c) the distance between both hands scales the object; and (d) the grabbed point in the object will remain as the center of all transformations; the Handle-Bar technique in FIG. 6: (a) the middle point of both hands is used to manipulate the object, reacting as if the user was holding a bar placed across the object (i.e., the user can translate the object by moving both hands in the same direction and rotate it by moving the hands in different directions); and (b) the distance between both hands scales the object; the Air TRS technique in FIG. 7: (a) the first hand grabs and moves the object; (b) the movement of the second hand relatively to the first defines rotation and scale transformations; and (c) two transformations are centered in the object pinched point; the Touch TRS + Widgets technique in FIG. 8: (a) one touch bellow the object enables widget visibility and moves the object; (b) a second touch outside the widgets apply the TRS algorithm (translation and yaw rotation); and (c) the widgets offer height manipulation, roll and pitch rotations).
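The Handle-Bar technique relied upon above can be summarized numerically as follows. This sketch is illustrative only (not the authors' code): the bimanual interaction point is the midpoint of the two hands' interaction points, translation follows that midpoint, and the change in inter-hand distance drives uniform scaling.

```python
# Sketch of the Handle-Bar mapping (Mendes, FIG. 6): the midpoint of the two hands is
# the bimanual interaction point used to translate the object, and the change in hand
# separation drives uniform scaling. Illustration only, not the authors' code.
import numpy as np

def handle_bar_update(left_hand, right_hand, prev_left, prev_right):
    left, right = np.asarray(left_hand, float), np.asarray(right_hand, float)
    p_left, p_right = np.asarray(prev_left, float), np.asarray(prev_right, float)

    bimanual_point = (left + right) / 2.0               # midpoint of the two hands
    translation = bimanual_point - (p_left + p_right) / 2.0
    scale = np.linalg.norm(left - right) / np.linalg.norm(p_left - p_right)
    return bimanual_point, translation, scale

point, delta, scale = handle_bar_update(left_hand=[-0.2, 1.0, 0.6], right_hand=[0.4, 1.0, 0.6],
                                         prev_left=[-0.1, 1.1, 0.5], prev_right=[0.3, 1.1, 0.5])
print(point, delta, scale)   # midpoint (0.1, 1.0, 0.6), moved (0, -0.1, 0.1), scaled 1.5x
```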
TANG in view of Ravasz and Mendes are analogous art because they are from the same field of endeavor, a system and a method relating to interaction with virtual objects. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teaching of Mendes to TANG in view of Ravasz. Motivation for doing so would be to provide more seamless interactions and an effective way to transform 3D objects in mid-air (Mendes, Abstract and second paragraph of Section 2.3 on page 4).
TANG in view of Ravasz and Mendes fails to explicitly disclose selecting a subset of the plurality of keypoints associated with the first hand that correspond to the first particular gesture for determining the interaction point.
YANAI teaches a system and a method relating to gesture-based UI control (YANAI, ¶¶ [0002]-[0003] and [0022]), wherein selecting a subset of the plurality of keypoints associated with the first hand that correspond to the first particular gesture for determining the interaction point (YANAI, ¶¶ [0037]-[0039] and [0081] with FIGS. 2, 17A-B, and 18A-B: the UI control system 200 may be configured to perform one or more gesture-recognition techniques, and thus enable a gesture-based interface; as used herein, "gesture recognition" may refer to identifying specific movements or pose configurations performed by a user; e.g., gesture recognition may refer to identifying a swipe of a hand in a particular direction having a particular speed, a finger tracing a specific shape on a touch screen, or the wave of a hand; FIGS. 17A-B and 18A-B illustrate various gestures that may be recognized by the selection logic 212 to select an identified UI element; in particular, FIGS. 17 A-B illustrate a "pinch" gesture and FIGS. 18A-B illustrate a "grab" gesture, either or both of which may be recognized by the selection logic 212 to select an identified UI element; ¶¶ [0049]-[0056] and [0060]-[0067] with FIGS. 2-10: multiple landmarks that may be identified by the landmark logic 204 on a portion 300 of a user's body (e.g., a user's hand); several different types of landmarks are indicated in FIG. 3, including fingertip landmarks 302a-e, first finger joint landmarks 304a-e, second finger joint landmarks 306a-e, base joint landmarks 308a-e, a palm center landmark 314 and a hand base landmark 312; the landmark logic 204 may identify other landmarks of a user's hand (e.g., landmarks associated with edges of a user's hand in an image) and/or landmarks on other portions of a user's body; FIG. 3 also depicts structural lines 316, which may connect various landmarks identified by the landmark logic 204 to form a skeleton model of the portion 300 of the user's body; the landmark logic 204 may utilize such a model for tracking and landmark recognition purposes; FIG. 
3 also depicts several auxiliary landmarks determined by the landmark logic 204 based on other landmarks; in particular, the auxiliary landmark 318 may be located at a point, such as the midpoint, on the line segment between the landmarks 302c (the tip of the middle finger) and 302d (the tip of the index finger); the auxiliary landmark 320 may be located at a point, such as the midpoint, on the line segment between the landmarks 302b (the tip of the ring finger) and 302e (the tip of the thumb); the auxiliary landmark 322 may be located at a point, such as the midpoint, on the line segment between the landmarks 102d (the tip of the index finger) and 302e (the tip of the thumb); the control operations logic 202 may include pointer logic 206, which may be coupled to the landmark logic 204 and may be configured to determine a pointer based on the landmark locations determined by the landmark logic 204; the pointer logic 206 may be configured to determine a pointer in any of a number of ways; the pointer may be a multi-location pointer based on a plurality of landmark locations; the pointer may be a base pointer based on a location of a lower portion of an index finger of a user's hand; the pointer may be a virtual pointer based on a relative position of a landmark location with respect to a virtual interaction region; the control operations logic 202 may include identification logic 208, which may be coupled to the pointer logic 206 and may be configured to identify a UI element, from multiple UI elements, based on the pointer determined by the pointer logic 206; the control operations logic 202 may include display logic 210, which may be coupled to the display 222 and may be configured to adjust the display 222 based on various UI control operations performed by the control operations logic 202; e.g., the display logic 210 may be configured to visually indicate the pointer determined by the pointer logic 206 in the display 222, wherein the pointer may be visually indicated as a cursor, an arrow, an image of a hand or another visual indicator; the pointer is a virtual pointer based on a relative position of a landmark location and a virtual interaction region; the control operations logic 202 may include selection logic 212, which may be coupled to the identification logic 208 and may be configured to select the UI element identified by the identification logic 208 based on gesture data indicative of a gesture of the user; the gesture data analyzed by the selection logic 212 may include any of the kinds of data discussed above with reference to landmark location for the landmark logic 204; identify user gestures based on the color and/or depth data included in one or more images of the user's body captured by the depth camera; the landmark logic 204 may be configured to determine locations of multiple landmarks associated with a portion of a user's body, and the pointer logic 206 may be configured to determine a multi-location pointer based on the multiple landmark locations; FIG. 
4 illustrates a multilocation pointer 400 that includes the landmarks 302c, 304c, 302d, 304d, 302e, 314, 318, 320 and 322; the pointer logic 206 determines a multi-location pointer, the identification logic 208 may identify a UI element by determining a pointer score for each element in a UI and identifying the UI element that has the highest score; the pointer score for each UI element may be equal to the sum of the number of landmarks included in the multi-location pointer that overlap with the UI element; in the case of a tie, the identification logic 208 may be configured to give priority to certain kinds of landmarks (e.g., tip landmarks may be more important than palm landmarks or auxiliary landmarks); the pointer score for each UI element may be equal to a weighted sum of the number of landmarks included in the multi-location pointer that overlap with the UI element; each landmark in the multi-location pointer may be associated with a weight, and this weight may be used to determine the contribution of that particular landmark to the pointer score of a UI element; these weights may be assigned in any desired manner, and may be dynamically determined (e.g., using a learning algorithm) based on the types of hand motions that a user typically or preferably uses to interact; a weight associated with a landmark in a multi-location pointer may change depending upon the distance between the landmark and a reference location on a portion of the user's body; such a weight adjustment may be performed to reduce the influence of landmarks on the fingers when those fingers are closed onto the user's palm; the weights associated with each of the landmarks may be nominal weights and may be reduced as the landmark approaches a reference location (e.g., a location at the center of the palm) until the weight is approximately 0 when the finger is fully folded; a multi-location pointer may be determined by calculating a weighted average location based on the landmark locations; this weighted average location may represent a single point in a UI, which may correspond to a cursor that may be used to identify a UI element (e.g., when the cursor overlaps with the UI element); a multi-location pointer may include an area, in addition to or instead of a collection of landmark locations; FIGS. 6-9 depict multi-location pointers 600-900 (for different gestures), each of which includes an area whose boundaries are based on one or more landmark locations determined by the landmark logic 204; in particular, the multi-location pointer 600 for a gesture shown in FIG. 6 may be based on the landmark 102e, the multilocation pointer 700 for a gesture shown in FIG. 7 may be based on the landmarks 302e and 302d, the multi-location pointer 800 for a gesture shown in FIG. 8 may be based on the landmark 302d, and the multi-location pointer 900 for a gesture shown in FIG. 
9 may be based on the landmarks 302c, 302d and 302e; the boundaries of the multi-location pointers 600-900 may be set to be a particular distance away from an identified landmark, and may include regions between multiple landmarks included in the multi-location pointer; when the portion of the user's body includes the user's hand and one or more of the landmarks is associated with points on the user's fingers, a landmark located on a finger that is fully folded onto the palm may be ignored for the purposes of determining a pointer; i.e., landmarks located on a finger that is fully folded are not selected (or filtered out) for determining a pointer; a landmark on a finger must be greater than a threshold distance away from a reference location (e.g., the center of the palm) in order for the landmark to be used in the determination of the pointer; i.e., landmarks located on a finger less than a threshold distance away from the center of the palm are filtered out for determining a pointer; ¶¶ [0070]-[0074] with FIG. 13: the UI control system 200 may be configured to determine and utilize a virtual pointer; in particular, the landmark logic 204 may be configured to determine a location of a landmark associated with a portion of the user's body, and the pointer logic 206 may be configured to determine a virtual pointer based on a relative position of the landmark location with respect to a virtual interaction region; as used herein, a "virtual interaction region" may refer to a plane, a curved surface, a volume (such as a sphere or rectangular solid) or another region spaced away from UI hardware of a computing system (e.g., a display or input device), which may be used as a reference for user interaction with a computing system; a virtual interaction region may be a three-dimensional region in which UI elements are projected or otherwise displayed; e.g., FIG. 13 depicts a configuration 1300 in which a virtual interaction region 1302 is shown, where the virtual interaction region 1302 example of FIG. 
13 is a concave curved surface spaced away from a display 222; the pointer logic 206 may determine the virtual pointer based on relative positions of landmarks on the user's body (e.g., landmarks on the index and middle fingers of the user 1304, and/or landmarks on the other fingers of the user 1304) and the virtual interaction region 1302; the landmark used by the pointer logic 206 to determine the virtual pointer may vary as the portion of the user's body changes position with respect to the virtual interaction region; the landmark logic 204 may determine the location of the landmark on the portion of the user's body by determining a location of a first landmark proximate to a tip of a finger of the user's hand (e.g., any of the tip landmarks 302a-e) and determine a location of a second landmark proximate to a base joint of the finger of the user's hand (e.g., any of the base joint landmarks 308a-e); when the first landmark (proximate to the tip of the finger) is positioned on a first side of the virtual interaction region, the pointer logic 206 may determine the virtual pointer based on the relative position of the location of the first landmark with respect to the virtual interaction region; when the first landmark is positioned on a second side of the virtual interaction region, the pointer logic may determine the virtual pointer based on the relative position of the location of the second landmark (proximate to the base joint of the finger) with respect to the virtual interaction region; as the tip of the user's finger approaches the virtual interaction region, intersects with the virtual interaction region, and proceeds through the virtual interaction region to the opposite side, the landmark used by the pointer logic 206 to determine the virtual pointer may change accordingly, and may transition from the first landmark to the second landmark).
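The landmark filtering and pointer determination relied upon from YANAI can be sketched as follows. The gesture-to-landmark subsets, the fold threshold, and the plain averaging are illustrative assumptions standing in for YANAI's weighted schemes and are not values taken from the reference.

```python
# Sketch of YANAI-style pointer determination: keep only the landmarks relevant to the
# recognized gesture, drop landmarks on fingers folded onto the palm (distance threshold
# from the palm center), and average the rest. Names, subsets, and the threshold are
# illustrative assumptions, not values from the reference.
import numpy as np

GESTURE_SUBSETS = {                 # illustrative subsets per recognized gesture
    "pinch": ["thumb_tip", "index_tip"],
    "point": ["index_tip", "index_base"],
}

def interaction_point(keypoints, gesture, fold_threshold=0.04):
    """keypoints: dict name -> (x, y, z); returns the registered interaction point."""
    palm = np.asarray(keypoints["palm_center"], float)
    subset = []
    for name in GESTURE_SUBSETS[gesture]:
        pt = np.asarray(keypoints[name], float)
        if np.linalg.norm(pt - palm) > fold_threshold:   # ignore folded-finger landmarks
            subset.append(pt)
    return np.mean(subset, axis=0) if subset else palm   # fall back to the palm center

kp = {"palm_center": (0.0, 0.0, 0.0), "thumb_tip": (0.05, 0.02, 0.0),
      "index_tip": (0.06, 0.03, 0.01), "index_base": (0.02, 0.01, 0.0)}
print(interaction_point(kp, "pinch"))   # mean of the thumb tip and index tip
```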
TANG in view of Ravasz and Mendes, and YANAI, are analogous art because they are from the same field of endeavor, a system and a method relating to gesture-based UI control. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teaching of YANAI to TANG in view of Ravasz and Mendes. Motivation for doing so would be to improve the accuracy of gesture-based pointer determination and selection by using the landmarks that correspond to the recognized gesture while reducing the influence of landmarks on fingers folded onto the palm (YANAI, ¶¶ [0060]-[0067]).
Claims 4, 14, and 19
TANG in view of Ravasz, Mendes, and YANAI discloses all the elements as stated in Claims 1, 11, and 16 respectively and further discloses wherein determining the interaction point for the first hand further includes: determining a first particular location relative to the subset of the plurality of keypoints associated with the first hand, wherein the first particular location is determined based on the subset of the plurality of keypoints associated with the first hand and the first particular gesture; and registering the interaction point for the first hand to the first particular location (TANG, ¶¶ [0034]-[0035] with 360-370 in FIG. 3: casting a ray from a portion of the user's hand based on the position of the joint of the user's arm and the position of the user's hand, which determines the poses, movements, gestures or actions of the imaged hand; e.g., lines may be generated for the shoulder, and/or elbow, as well as for the palm, wrist, knuckle, etc.; ¶¶ [0037]-[0045] with 410 in FIGS. 4A-4B and 4D and 460 in FIG. 4C: an initial ray 415 is positioned with an origin at the user's shoulder 405 passing through the user's palm 410; an initial ray 440 is positioned with an origin at the user's elbow 435 passing through the user's palm 410; rays may be cast from a user's fingertips; an initial ray 450 is positioned with an origin at the user's eye 455 passing through the user's finger 460; the ray may be cast based on palm orientation alone, e.g., a targeting ray 470 is extended from the user's palm 410 into the environment; i.e., interaction point – user's palm 410 or user's fingertips is selected/determined from a portion of the user's hand keypoints – wrist joints, finger joints, adjoining finger segments, knuckles, palm, dorsum, etc.; pointing the palm up may be a gesture designated to signify another function; pointing down, making a fist or closed grip may also signal the user's intent not to cast the ray; this may enable the system to match the user's intent as to when they want to use the functionality and to cast a ray; other intentional gestures may be provided to the user for turning ray-casting on or off; semantic gestures such as finger pointing, or a "spiderman-style web-cast" gesture may also be used; the ray may be cast based at least in part on dynamic positioning and not just a fixed location) (Ravasz, ¶¶ [0080]-[0081], [0084], [0086], [0100], [0104], and [0126] with 502 in FIG. 5; FIGS. 6-9; 1202 in FIG. 12; FIG. 13/42B; and FIG. 
20: casting a ray projection with an origin point (e.g., a user's dominant eye, a point between the user's eyes, another point on the user's head, the user's hip, the user's shoulder, or a context variable point, e.g., between the user's hip and shoulder) and a control point (e.g., the tips of a user's finger, a user's palm, a user's wrist, or a center of a user's fist); track a portion of a hand as a control point; the control point can be identified in response to a user making a particular gesture, such as forming her fingers into a pinch; in some cases, the control point can be offset from a tracked portion of the user, e.g., the control point can be an offset from the users palm or wrist; determine a control point and a casting direction, for a ray projection, based on a tracked position of one or more body parts; a user's hand 1306/4252 has formed a gesture by connecting her thumb and middle/index finger, indicating initiation of a ray projection 1302 extended from the control point 1304 which is an offset from the user's middle/index finger; the user began by making a gesture 2002, bringing her thumb-tip, index fingertip, and middle fingertip together; the hand interaction system then began tracking control point 2004 (a point offset in front of the gesture 2002) and casting direction 2006) (YANAI, ¶¶ [0037]-[0039] and [0081] with FIGS. 2, 17A-B, and 18A-B: the UI control system 200 may be configured to perform one or more gesture-recognition techniques, and thus enable a gesture-based interface; as used herein, "gesture recognition" may refer to identifying specific movements or pose configurations performed by a user; e.g., gesture recognition may refer to identifying a swipe of a hand in a particular direction having a particular speed, a finger tracing a specific shape on a touch screen, or the wave of a hand; FIGS. 17A-B and 18A-B illustrate various gestures that may be recognized by the selection logic 212 to select an identified UI element; in particular, FIGS. 17 A-B illustrate a "pinch" gesture and FIGS. 18A-B illustrate a "grab" gesture, either or both of which may be recognized by the selection logic 212 to select an identified UI element; ¶¶ [0049]-[0056] and [0060]-[0067] with FIGS. 2-10: multiple landmarks that may be identified by the landmark logic 204 on a portion 300 of a user's body (e.g., a user's hand); several different types of landmarks are indicated in FIG. 3, including fingertip landmarks 302a-e, first finger joint landmarks 304a-e, second finger joint landmarks 306a-e, base joint landmarks 308a-e, a palm center landmark 314 and a hand base landmark 312; the landmark logic 204 may identify other landmarks of a user's hand (e.g., landmarks associated with edges of a user's hand in an image) and/or landmarks on other portions of a user's body; FIG. 3 also depicts structural lines 316, which may connect various landmarks identified by the landmark logic 204 to form a skeleton model of the portion 300 of the user's body; the landmark logic 204 may utilize such a model for tracking and landmark recognition purposes; FIG. 
3 also depicts several auxiliary landmarks determined by the landmark logic 204 based on other landmarks; in particular, the auxiliary landmark 318 may be located at a point, such as the midpoint, on the line segment between the landmarks 302c (the tip of the middle finger) and 302d (the tip of the index finger); the auxiliary landmark 320 may be located at a point, such as the midpoint, on the line segment between the landmarks 302b (the tip of the ring finger) and 302e (the tip of the thumb); the auxiliary landmark 322 may be located at a point, such as the midpoint, on the line segment between the landmarks 102d (the tip of the index finger) and 302e (the tip of the thumb); the control operations logic 202 may include pointer logic 206, which may be coupled to the landmark logic 204 and may be configured to determine a pointer based on the landmark locations determined by the landmark logic 204; the pointer logic 206 may be configured to determine a pointer in any of a number of ways; the pointer may be a multi-location pointer based on a plurality of landmark locations; the pointer may be a base pointer based on a location of a lower portion of an index finger of a user's hand; the pointer may be a virtual pointer based on a relative position of a landmark location with respect to a virtual interaction region; the control operations logic 202 may include identification logic 208, which may be coupled to the pointer logic 206 and may be configured to identify a UI element, from multiple UI elements, based on the pointer determined by the pointer logic 206; the control operations logic 202 may include display logic 210, which may be coupled to the display 222 and may be configured to adjust the display 222 based on various UI control operations performed by the control operations logic 202; e.g., the display logic 210 may be configured to visually indicate the pointer determined by the pointer logic 206 in the display 222, wherein the pointer may be visually indicated as a cursor, an arrow, an image of a hand or another visual indicator; the pointer is a virtual pointer based on a relative position of a landmark location and a virtual interaction region; the control operations logic 202 may include selection logic 212, which may be coupled to the identification logic 208 and may be configured to select the UI element identified by the identification logic 208 based on gesture data indicative of a gesture of the user; the gesture data analyzed by the selection logic 212 may include any of the kinds of data discussed above with reference to landmark location for the landmark logic 204; identify user gestures based on the color and/or depth data included in one or more images of the user's body captured by the depth camera; the landmark logic 204 may be configured to determine locations of multiple landmarks associated with a portion of a user's body, and the pointer logic 206 may be configured to determine a multi-location pointer based on the multiple landmark locations; FIG. 
4 illustrates a multilocation pointer 400 that includes the landmarks 302c, 304c, 302d, 304d, 302e, 314, 318, 320 and 322; the pointer logic 206 determines a multi-location pointer, the identification logic 208 may identify a UI element by determining a pointer score for each element in a UI and identifying the UI element that has the highest score; the pointer score for each UI element may be equal to the sum of the number of landmarks included in the multi-location pointer that overlap with the UI element; in the case of a tie, the identification logic 208 may be configured to give priority to certain kinds of landmarks (e.g., tip landmarks may be more important than palm landmarks or auxiliary landmarks); the pointer score for each UI element may be equal to a weighted sum of the number of landmarks included in the multi-location pointer that overlap with the UI element; each landmark in the multi-location pointer may be associated with a weight, and this weight may be used to determine the contribution of that particular landmark to the pointer score of a UI element; these weights may be assigned in any desired manner, and may be dynamically determined (e.g., using a learning algorithm) based on the types of hand motions that a user typically or preferably uses to interact; a weight associated with a landmark in a multi-location pointer may change depending upon the distance between the landmark and a reference location on a portion of the user's body; such a weight adjustment may be performed to reduce the influence of landmarks on the fingers when those fingers are closed onto the user's palm; the weights associated with each of the landmarks may be nominal weights and may be reduced as the landmark approaches a reference location (e.g., a location at the center of the palm) until the weight is approximately 0 when the finger is fully folded; a multi-location pointer may be determined by calculating a weighted average location based on the landmark locations; this weighted average location may represent a single point in a UI, which may correspond to a cursor that may be used to identify a UI element (e.g., when the cursor overlaps with the UI element); a multi-location pointer may include an area, in addition to or instead of a collection of landmark locations; FIGS. 6-9 depict multi-location pointers 600-900 (for different gestures), each of which includes an area whose boundaries are based on one or more landmark locations determined by the landmark logic 204; in particular, the multi-location pointer 600 for a gesture shown in FIG. 6 may be based on the landmark 102e, the multilocation pointer 700 for a gesture shown in FIG. 7 may be based on the landmarks 302e and 302d, the multi-location pointer 800 for a gesture shown in FIG. 8 may be based on the landmark 302d, and the multi-location pointer 900 for a gesture shown in FIG. 
9 may be based on the landmarks 302c, 302d and 302e; the boundaries of the multi-location pointers 600-900 may be set to be a particular distance away from an identified landmark, and may include regions between multiple landmarks included in the multi-location pointer; when the portion of the user's body includes the user's hand and one or more of the landmarks is associated with points on the user's fingers, a landmark located on a finger that is fully folded onto the palm may be ignored for the purposes of determining a pointer; i.e., landmarks located on a finger that is fully folded are not selected (or filtered out) for determining a pointer; a landmark on a finger must be greater than a threshold distance away from a reference location (e.g., the center of the palm) in order for the landmark to be used in the determination of the pointer; i.e., landmarks located on a finger less than a threshold distance away from the center of the palm are filtered out for determining a pointer; ¶¶ [0070]-[0074] with FIG. 13: the UI control system 200 may be configured to determine and utilize a virtual pointer; in particular, the landmark logic 204 may be configured to determine a location of a landmark associated with a portion of the user's body, and the pointer logic 206 may be configured to determine a virtual pointer based on a relative position of the landmark location with respect to a virtual interaction region; as used herein, a "virtual interaction region" may refer to a plane, a curved surface, a volume (such as a sphere or rectangular solid) or another region spaced away from UI hardware of a computing system (e.g., a display or input device), which may be used as a reference for user interaction with a computing system; a virtual interaction region may be a three-dimensional region in which UI elements are projected or otherwise displayed; e.g., FIG. 13 depicts a configuration 1300 in which a virtual interaction region 1302 is shown, where the virtual interaction region 1302 example of FIG. 
13 is a concave curved surface spaced away from a display 222; the pointer logic 206 may determine the virtual pointer based on relative positions of landmarks on the user's body (e.g., landmarks on the index and middle fingers of the user 1304, and/or landmarks on the other fingers of the user 1304) and the virtual interaction region 1302; the landmark used by the pointer logic 206 to determine the virtual pointer may vary as the portion of the user's body changes position with respect to the virtual interaction region; the landmark logic 204 may determine the location of the landmark on the portion of the user's body by determining a location of a first landmark proximate to a tip of a finger of the user's hand (e.g., any of the tip landmarks 302a-e) and determine a location of a second landmark proximate to a base joint of the finger of the user's hand (e.g., any of the base joint landmarks 308a-e); when the first landmark (proximate to the tip of the finger) is positioned on a first side of the virtual interaction region, the pointer logic 206 may determine the virtual pointer based on the relative position of the location of the first landmark with respect to the virtual interaction region; when the first landmark is positioned on a second side of the virtual interaction region, the pointer logic may determine the virtual pointer based on the relative position of the location of the second landmark (proximate to the base joint of the finger) with respect to the virtual interaction region; as the tip of the user's finger approaches the virtual interaction region, intersects with the virtual interaction region, and proceeds through the virtual interaction region to the opposite side, the landmark used by the pointer logic 206 to determine the virtual pointer may change accordingly, and may transition from the first landmark to the second landmark).
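As applied to claims 4, 14, and 19, the "particular location relative to the subset of keypoints" can be illustrated by registering the interaction point at a small offset in front of a pinch point, consistent with Ravasz's offset control point. The offset distance and the use of the palm center to define "forward" are illustrative assumptions made for the sketch.

```python
# Sketch of registering the interaction point to a location determined from the
# gesture's keypoint subset: here, the thumb-tip/index-tip midpoint of a pinch, offset
# slightly "forward" of the palm (offset size and the forward definition are
# illustrative assumptions, in the spirit of Ravasz's offset control point).
import numpy as np

def register_pinch_interaction_point(thumb_tip, index_tip, palm_center, offset=0.02):
    thumb, index = np.asarray(thumb_tip, float), np.asarray(index_tip, float)
    palm = np.asarray(palm_center, float)

    pinch_point = (thumb + index) / 2.0               # location relative to the subset
    forward = pinch_point - palm                      # palm -> pinch defines "forward"
    forward /= np.linalg.norm(forward)
    return pinch_point + offset * forward             # registered interaction point

print(register_pinch_interaction_point(thumb_tip=[0.05, 0.02, 0.00],
                                        index_tip=[0.06, 0.03, 0.01],
                                        palm_center=[0.0, 0.0, 0.0]))
```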
Claims 5, 15, and 20
TANG in view of Ravasz, Mendes, and YANAI discloses all the elements as stated in Claims 1, 11, and 16 respectively and further discloses wherein determining the interaction point for the second hand further includes: determining, based on analyzing the one or more images, whether the second hand is making or is transitioning into making a second particular gesture from the plurality of gestures (TANG, ¶ [0018]: process the acquired video to identify one or more objects within the operating environment, one or more postures and/or gestures of the user wearing head-mounted display device 10, one or more postures and/or gestures of other users within the operating environment, etc.; ¶¶ [0033]-[0034] with 350 in FIG. 3: point clouds (e.g., portions of a depth map) corresponding to the user's hands may be further processed to reveal the skeletal substructure of the hands, and to identify components of the user's hands, such as wrist joints, finger joints, adjoining finger segments, knuckles, palm, dorsum, etc.; a virtual skeleton is fit to the pixels of depth video that correspond to the user; by analyzing positional change in the various hand joints and/or segments, the corresponding poses, movements, gestures or actions of the imaged hand may be determined; ¶ [0051]: recognize a manipulation gesture from fingers of the user's hand based on information received from the depth camera) (Ravasz, ¶¶ [0062] and [0083]: one or more cameras included in the HMD 200 or external to it can monitor the positions and poses of the user's hands to determine gestures and other hand and body motions; a machine learning model can be used to analyze images from such a camera and to generate 3D position data for a model of the user's hands or other various body parts) (YANAI, ¶¶ [0037]-[0039] and [0081] with FIGS. 2, 17A-B, and 18A-B: the UI control system 200 may be configured to perform one or more gesture-recognition techniques, and thus enable a gesture-based interface; as used herein, "gesture recognition" may refer to identifying specific movements or pose configurations performed by a user; e.g., gesture recognition may refer to identifying a swipe of a hand in a particular direction having a particular speed, a finger tracing a specific shape on a touch screen, or the wave of a hand; FIGS. 17A-B and 18A-B illustrate various gestures that may be recognized by the selection logic 212 to select an identified UI element; in particular, FIGS. 17 A-B illustrate a "pinch" gesture and FIGS. 18A-B illustrate a "grab" gesture, either or both of which may be recognized by the selection logic 212 to select an identified UI element); and
in response to determining that the second hand is making or is transitioning into making the second particular gesture: selecting a subset of the plurality of keypoints associated with the second hand that correspond to the second particular gesture; determining a second particular location relative to the subset of the plurality of keypoints associated with the second hand, wherein the second particular location is determined based on the subset of the plurality of keypoints associated with the second hand and the second particular gesture; and registering the interaction point for the second hand to the second particular location (TANG, ¶¶ [0034]-[0035] with 360-370 in FIG. 3: casting a ray from a portion of the user's hand based on the position of the joint of the user's arm and the position of the user's hand, which determines the poses, movements, gestures or actions of the imaged hand; e.g., lines may be generated for the shoulder, and/or elbow, as well as for the palm, wrist, knuckle, etc.; ¶¶ [0037]-[0045] with 410 in FIGS. 4A-4B and 4D and 460 in FIG. 4C: an initial ray 415 is positioned with an origin at the user's shoulder 405 passing through the user's palm 410; an initial ray 440 is positioned with an origin at the user's elbow 435 passing through the user's palm 410; rays may be cast from a user's fingertips; an initial ray 450 is positioned with an origin at the user's eye 455 passing through the user's finger 460; the ray may be cast based on palm orientation alone, e.g., a targeting ray 470 is extended from the user's palm 410 into the environment; i.e., interaction point – user's palm 410 or user's fingertips is selected/determined from a portion of the user's hand keypoints – wrist joints, finger joints, adjoining finger segments, knuckles, palm, dorsum, etc.; pointing the palm up may be a gesture designated to signify another function; pointing down, making a fist or closed grip may also signal the user's intent not to cast the ray; this may enable the system to match the user's intent as to when they want to use the functionality and to cast a ray; other intentional gestures may be provided to the user for turning ray-casting on or off; semantic gestures such as finger pointing, or a "spiderman-style web-cast" gesture may also be used; the ray may be cast based at least in part on dynamic positioning and not just a fixed location) (Ravasz, ¶¶ [0080]-[0081], [0084], [0086], [0100], [0104], and [0126] with 502 in FIG. 5; FIGS. 6-9; 1202 in FIG. 12; FIG. 13/42B; and FIG. 
20: casting a ray projection with an origin point (e.g., a user's dominant eye, a point between the user's eyes, another point on the user's head, the user's hip, the user's shoulder, or a context variable point, e.g., between the user's hip and shoulder) and a control point (e.g., the tips of a user's finger, a user's palm, a user's wrist, or a center of a user's fist); track a portion of a hand as a control point; the control point can be identified in response to a user making a particular gesture, such as forming her fingers into a pinch; in some cases, the control point can be offset from a tracked portion of the user, e.g., the control point can be an offset from the users palm or wrist; determine a control point and a casting direction, for a ray projection, based on a tracked position of one or more body parts; a user's hand 1306/4252 has formed a gesture by connecting her thumb and middle/index finger, indicating initiation of a ray projection 1302 extended from the control point 1304 which is an offset from the user's middle/index finger; the user began by making a gesture 2002, bringing her thumb-tip, index fingertip, and middle fingertip together; the hand interaction system then began tracking control point 2004 (a point offset in front of the gesture 2002) and casting direction 2006) (YANAI, ¶¶ [0049]-[0056] and [0060]-[0067] with FIGS. 2-10: multiple landmarks that may be identified by the landmark logic 204 on a portion 300 of a user's body (e.g., a user's hand); several different types of landmarks are indicated in FIG. 3, including fingertip landmarks 302a-e, first finger joint landmarks 304a-e, second finger joint landmarks 306a-e, base joint landmarks 308a-e, a palm center landmark 314 and a hand base landmark 312; the landmark logic 204 may identify other landmarks of a user's hand (e.g., landmarks associated with edges of a user's hand in an image) and/or landmarks on other portions of a user's body; FIG. 3 also depicts structural lines 316, which may connect various landmarks identified by the landmark logic 204 to form a skeleton model of the portion 300 of the user's body; the landmark logic 204 may utilize such a model for tracking and landmark recognition purposes; FIG. 
3 also depicts several auxiliary landmarks determined by the landmark logic 204 based on other landmarks; in particular, the auxiliary landmark 318 may be located at a point, such as the midpoint, on the line segment between the landmarks 302c (the tip of the middle finger) and 302d (the tip of the index finger); the auxiliary landmark 320 may be located at a point, such as the midpoint, on the line segment between the landmarks 302b (the tip of the ring finger) and 302e (the tip of the thumb); the auxiliary landmark 322 may be located at a point, such as the midpoint, on the line segment between the landmarks 102d (the tip of the index finger) and 302e (the tip of the thumb); the control operations logic 202 may include pointer logic 206, which may be coupled to the landmark logic 204 and may be configured to determine a pointer based on the landmark locations determined by the landmark logic 204; the pointer logic 206 may be configured to determine a pointer in any of a number of ways; the pointer may be a multi-location pointer based on a plurality of landmark locations; the pointer may be a base pointer based on a location of a lower portion of an index finger of a user's hand; the pointer may be a virtual pointer based on a relative position of a landmark location with respect to a virtual interaction region; the control operations logic 202 may include identification logic 208, which may be coupled to the pointer logic 206 and may be configured to identify a UI element, from multiple UI elements, based on the pointer determined by the pointer logic 206; the control operations logic 202 may include display logic 210, which may be coupled to the display 222 and may be configured to adjust the display 222 based on various UI control operations performed by the control operations logic 202; e.g., the display logic 210 may be configured to visually indicate the pointer determined by the pointer logic 206 in the display 222, wherein the pointer may be visually indicated as a cursor, an arrow, an image of a hand or another visual indicator; the pointer is a virtual pointer based on a relative position of a landmark location and a virtual interaction region; the control operations logic 202 may include selection logic 212, which may be coupled to the identification logic 208 and may be configured to select the UI element identified by the identification logic 208 based on gesture data indicative of a gesture of the user; the gesture data analyzed by the selection logic 212 may include any of the kinds of data discussed above with reference to landmark location for the landmark logic 204; identify user gestures based on the color and/or depth data included in one or more images of the user's body captured by the depth camera; the landmark logic 204 may be configured to determine locations of multiple landmarks associated with a portion of a user's body, and the pointer logic 206 may be configured to determine a multi-location pointer based on the multiple landmark locations; FIG. 
4 illustrates a multilocation pointer 400 that includes the landmarks 302c, 304c, 302d, 304d, 302e, 314, 318, 320 and 322; the pointer logic 206 determines a multi-location pointer, the identification logic 208 may identify a UI element by determining a pointer score for each element in a UI and identifying the UI element that has the highest score; the pointer score for each UI element may be equal to the sum of the number of landmarks included in the multi-location pointer that overlap with the UI element; in the case of a tie, the identification logic 208 may be configured to give priority to certain kinds of landmarks (e.g., tip landmarks may be more important than palm landmarks or auxiliary landmarks); the pointer score for each UI element may be equal to a weighted sum of the number of landmarks included in the multi-location pointer that overlap with the UI element; each landmark in the multi-location pointer may be associated with a weight, and this weight may be used to determine the contribution of that particular landmark to the pointer score of a UI element; these weights may be assigned in any desired manner, and may be dynamically determined (e.g., using a learning algorithm) based on the types of hand motions that a user typically or preferably uses to interact; a weight associated with a landmark in a multi-location pointer may change depending upon the distance between the landmark and a reference location on a portion of the user's body; such a weight adjustment may be performed to reduce the influence of landmarks on the fingers when those fingers are closed onto the user's palm; the weights associated with each of the landmarks may be nominal weights and may be reduced as the landmark approaches a reference location (e.g., a location at the center of the palm) until the weight is approximately 0 when the finger is fully folded; a multi-location pointer may be determined by calculating a weighted average location based on the landmark locations; this weighted average location may represent a single point in a UI, which may correspond to a cursor that may be used to identify a UI element (e.g., when the cursor overlaps with the UI element); a multi-location pointer may include an area, in addition to or instead of a collection of landmark locations; FIGS. 6-9 depict multi-location pointers 600-900 (for different gestures), each of which includes an area whose boundaries are based on one or more landmark locations determined by the landmark logic 204; in particular, the multi-location pointer 600 for a gesture shown in FIG. 6 may be based on the landmark 102e, the multilocation pointer 700 for a gesture shown in FIG. 7 may be based on the landmarks 302e and 302d, the multi-location pointer 800 for a gesture shown in FIG. 8 may be based on the landmark 302d, and the multi-location pointer 900 for a gesture shown in FIG. 
9 may be based on the landmarks 302c, 302d and 302e; the boundaries of the multi-location pointers 600-900 may be set to be a particular distance away from an identified landmark, and may include regions between multiple landmarks included in the multi-location pointer; when the portion of the user's body includes the user's hand and one or more of the landmarks is associated with points on the user's fingers, a landmark located on a finger that is fully folded onto the palm may be ignored for the purposes of determining a pointer; i.e., landmarks located on a finger that is fully folded are not selected (or filtered out) for determining a pointer; a landmark on a finger must be greater than a threshold distance away from a reference location (e.g., the center of the palm) in order for the landmark to be used in the determination of the pointer; i.e., landmarks located on a finger less than a threshold distance away from the center of the palm are filtered out for determining a pointer; ¶¶ [0070]-[0074] with FIG. 13: the UI control system 200 may be configured to determine and utilize a virtual pointer; in particular, the landmark logic 204 may be configured to determine a location of a landmark associated with a portion of the user's body, and the pointer logic 206 may be configured to determine a virtual pointer based on a relative position of the landmark location with respect to a virtual interaction region; as used herein, a "virtual interaction region" may refer to a plane, a curved surface, a volume (such as a sphere or rectangular solid) or another region spaced away from UI hardware of a computing system (e.g., a display or input device), which may be used as a reference for user interaction with a computing system; a virtual interaction region may be a three-dimensional region in which UI elements are projected or otherwise displayed; e.g., FIG. 13 depicts a configuration 1300 in which a virtual interaction region 1302 is shown, where the virtual interaction region 1302 example of FIG. 
13 is a concave curved surface spaced away from a display 222; the pointer logic 206 may determine the virtual pointer based on relative positions of landmarks on the user's body (e.g., landmarks on the index and middle fingers of the user 1304, and/or landmarks on the other fingers of the user 1304) and the virtual interaction region 1302; the landmark used by the pointer logic 206 to determine the virtual pointer may vary as the portion of the user's body changes position with respect to the virtual interaction region; the landmark logic 204 may determine the location of the landmark on the portion of the user's body by determining a location of a first landmark proximate to a tip of a finger of the user's hand (e.g., any of the tip landmarks 302a-e) and determine a location of a second landmark proximate to a base joint of the finger of the user's hand (e.g., any of the base joint landmarks 308a-e); when the first landmark (proximate to the tip of the finger) is positioned on a first side of the virtual interaction region, the pointer logic 206 may determine the virtual pointer based on the relative position of the location of the first landmark with respect to the virtual interaction region; when the first landmark is positioned on a second side of the virtual interaction region, the pointer logic may determine the virtual pointer based on the relative position of the location of the second landmark (proximate to the base joint of the finger) with respect to the virtual interaction region; as the tip of the user's finger approaches the virtual interaction region, intersects with the virtual interaction region, and proceeds through the virtual interaction region to the opposite side, the landmark used by the pointer logic 206 to determine the virtual pointer may change accordingly, and may transition from the first landmark to the second landmark).
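For illustration only of registering an interaction point from a gesture-specific subset of hand keypoints, as paraphrased from TANG, Ravasz, and YANAI above, a minimal Python sketch follows; the keypoint names, the gesture-to-keypoint table, the equal weighting, and the 4 cm fold-filter threshold are assumptions and do not reproduce any cited reference's actual implementation.

    import numpy as np

    # Hypothetical mapping from a recognized gesture to the keypoints that define its interaction point.
    GESTURE_KEYPOINTS = {
        "pinch": ["thumb_tip", "index_tip"],
        "point": ["index_tip", "index_base"],
        "grab":  ["thumb_tip", "index_tip", "middle_tip", "palm_center"],
    }

    def interaction_point(keypoints, gesture, fold_threshold=0.04):
        # keypoints: dict of name -> (x, y, z) in metres for one hand.
        # Landmarks closer than fold_threshold to the palm center are treated as
        # folded onto the palm and filtered out before the point is registered.
        palm = np.asarray(keypoints["palm_center"], dtype=float)
        subset = []
        for name in GESTURE_KEYPOINTS[gesture]:
            p = np.asarray(keypoints[name], dtype=float)
            if name != "palm_center" and np.linalg.norm(p - palm) < fold_threshold:
                continue  # folded finger: ignore this landmark
            subset.append(p)
        # Equal weights here; per-landmark weights could be substituted, as in YANAI's weighted sum.
        return np.mean(subset, axis=0) if subset else palm

    kp = {"palm_center": (0.0, 0.0, 0.40), "thumb_tip": (0.03, 0.01, 0.37),
          "index_tip": (0.02, 0.06, 0.36), "middle_tip": (0.0, 0.07, 0.36),
          "index_base": (0.01, 0.03, 0.39)}
    point = interaction_point(kp, "pinch")  # registered between the thumb and index fingertips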
Claim 6
TANG in view of Ravasz, Mendes, and YANAI discloses all the elements as stated in Claim 5 and further discloses wherein the plurality of gestures includes at least one of a grasping gesture, a pointing gesture, or a pinching gesture (TANG, ¶ [0042]: semantic gestures such as finger pointing, or a "spiderman-style web-cast" gesture may also be used; ¶ [0048]: the user may then perform a finger-based gesture, such as a two-finger pinch to select the object; ¶¶ [0052]-[0053] with FIG. 6: a user is shown grasping virtual ball 605 with a left hand 610 and pinching a control point 615 of virtual ball with a right hand 620; e.g., a pinch gesture may be used to scale an object, grasping may enable moving an object, etc.; any sort of suitable manipulation gesture may be used, e.g., pinch, point, grasp, wipe, push).
Claim 7
TANG in view of Ravasz, Mendes, and YANAI discloses all the elements as stated in Claim 1 and further discloses wherein the one or more images include: a first image of the first hand and a second image of the second hand; or a single image of the first hand and the second hand (TANG, ¶¶ [0018] and [0034] with FIG. 1: acquire video of a scene including one or more human subjects; identify one or more objects within the operating environment, one or more postures and/or gestures of the user wearing head-mounted display device 10, one or more postures and/or gestures of other users within the operating environment, etc.; by analyzing positional change in the various hand joints and/or segments, the corresponding poses, movements, gestures or actions of the imaged hand may be determined) (Ravasz, ¶¶ [0062] and [0083]: one or more cameras included in the HMD 200 or external to it can monitor the positions and poses of the user's hands to determine gestures and other hand and body motions; a machine learning model can be used to analyze images from such a camera and to generate 3D position data for a model of the user's hands or other various body parts).
Claim 8
TANG in view of Ravasz, Mendes, and YANAI discloses all the elements as stated in Claim 1 and further discloses wherein the one or more images include a series of time-sequenced images (TANG, ¶¶ [0018] and [0034] with FIG. 1: acquire video of a scene including one or more human subjects; the video may include a time-resolved sequence of images of spatial resolution and suitable frame rate; identify one or more objects within the operating environment, one or more postures and/or gestures of the user wearing head-mounted display device 10, one or more postures and/or gestures of other users within the operating environment, etc.; by analyzing positional change in the various hand joints and/or segments, the corresponding poses, movements, gestures or actions of the imaged hand may be determined) (Ravasz, ¶ [0148] with FIGS. 26A-B: the user began by making gestures 2602A and 2602B, bringing her thumb-tips and index fingertips together and touching those gestures together at point 2612; began tracking opposite corners of rectangle 2604 (i.e., bimanual deltas) based on the locations of gestures 2602A and 2602B, from the user's point of view; from the user's point of view, moving gesture points 2602A and 2602B formed rectangle 2604; as the user moved her hands apart, the corners of rectangle 2604 moved apart, increasing the size of rectangle 2604).
Claim 9
TANG in view of Ravasz, Mendes, and YANAI discloses all the elements as stated in Claim 1 and further discloses wherein the one or more bimanual deltas are determined based on a frame-to-frame movement of the interaction point for each of the first hand and the second hand (Ravasz, ¶¶ [0148] and [0142] with FIGS. 26A-B: the user began by making gestures 2602A and 2602B, bringing her thumb-tips and index fingertips together and touching those gestures together at point 2612; began tracking opposite corners of rectangle 2604 (i.e., bimanual deltas) based on the locations of gestures 2602A and 2602B, from the user's point of view; from the user's point of view, moving gesture points 2602A and 2602B formed rectangle 2604; as the user moved her hands apart, the corners of rectangle 2604 moved apart, increasing the size of rectangle 2604).
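For illustration only of the frame-to-frame bimanual deltas addressed above, a minimal Python sketch follows; the per-frame data layout (one interaction point per hand) is an assumption.

    import numpy as np

    def bimanual_deltas(prev_frame, curr_frame):
        # Translation delta of each hand's interaction point between consecutive frames.
        return {hand: np.asarray(curr_frame[hand], dtype=float) -
                      np.asarray(prev_frame[hand], dtype=float)
                for hand in ("left", "right")}

    prev = {"left": (0.10, 0.00, 0.40), "right": (-0.10, 0.00, 0.40)}
    curr = {"left": (0.12, 0.01, 0.40), "right": (-0.13, 0.01, 0.40)}
    deltas = bimanual_deltas(prev, curr)  # the hands moved apart between the two frames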
Claims 10, 13, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over TANG in view of Ravasz, Mendes, and YANAI as applied to Claims 9, 11, and 16 respectively above, and further in view of Everitt et al. (US 2014/0282274 A1, pub. date: 09/18/2014), hereinafter Everitt.
Claims 10, 13, and 18
TANG in view of Ravasz, Mendes, and YANAI discloses all the elements as stated in Claims 9, 11, and 16 respectively and further discloses wherein the one or more bimanual deltas include: (Ravasz, ¶ [0142]-[0149] with FIGS. 25 and 26A-B: identify an action corresponding to starting object selection; the action can be a two-handed "pinch" gesture, formed with the user's thumb from each hand touching the index or middle finger of that hand, with the two pinches touching at the thumb/finger intersection point; the action can be a two-handed "L" gesture, formed with a first hand having the thumb sticking up perpendicular to the floor and the index finger parallel to the floor and with a second hand having the thumb sticking down perpendicular to the floor and the index finger parallel to the floor, defining two opposite corners of a rectangle; continuously determine a shape defined by a first tracked portion of a first user hand and a second tracked portion of a second user hand; display a representation (such as an outline) of the shape determined; identify a pyramid formed with the tip of the pyramid at one of the user's eyes and the pyramid walls being formed based on the shape determined; upon the user releasing the gesture identified, or when the velocity of the user's hand movement falls below a threshold, select the objects that fall within the pyramid (or other shape) identified; the user began by making gestures 2602A and 2602B , bringing her thumb-tips and index fingertips together and touching those gestures together at point 2612; began tracking opposite corners of rectangle 2604 (i.e., bimanual deltas) based on the locations of gestures 2602A and 2602B, from the user's point of view; from the user's point of view, moving gesture points 2602A and 2602B formed rectangle 2604; as the user moved her hands apart, the corners of rectangle 2604 moved apart, increasing the size of rectangle 2604; as the user formed this rectangle, determined a pyramid formed with the pyramid tip at the user's dominant eye and with sides extending through the edges of rectangle 2604; in example 2600, continuously selected (or deselected) objects that at least partially intersect with the pyramid until the user released one of gestures 2602A or 2602B; in example 2650, the user formed a rectangle 2652 by forming two pinch gestures and pulling them apart; formed a pyramid 2654 with the tip of the pyramid at the user's dominant eye 2656 and extending so that the four triangles that form edges of the pyramid coincide with rectangle 2652; determined any objects that are both beyond the rectangle 2652 (i.e. on the opposite side of the rectangle 2652 from the user) and that fall completely within the pyramid 2654) (TANG, ¶ [0058] with FIG. 6: for two hand manipulation, two or more modes may be available; e.g., each hand may select a different affordance, thus enabling two hand scaling, rotation, manipulation, etc.; this may provide the user more control; as shown at 630 of FIG. 6, an affordanceless mode may also be called, where the user aims two rays at an object surface and performs scaling, rotation, translation, etc.; movement of the user's hands may cause the manipulation; e.g., if the user's hands stay the same distance apart the object rotates; if the user's hands spread, the object may be resized, etc.).
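For illustration only of the two-handed rectangle and eye-apex selection pyramid paraphrased from Ravasz above, a minimal Python sketch follows; it assumes a planar rectangle given by a corner and two orthogonal edge vectors and tests whether an object's center lies beyond the rectangle and inside the pyramid, which is a simplification rather than Ravasz's actual implementation.

    import numpy as np

    def inside_selection_pyramid(eye, rect, obj_point):
        # rect: (corner, u_edge, v_edge) describing the planar rectangle formed by the two hands.
        origin, u, v = (np.asarray(a, dtype=float) for a in rect)
        eye = np.asarray(eye, dtype=float)
        p = np.asarray(obj_point, dtype=float)
        n = np.cross(u, v)                        # normal of the rectangle's plane
        denom = float(np.dot(p - eye, n))
        if abs(denom) < 1e-9:
            return False                          # eye-to-object ray is parallel to the plane
        t = float(np.dot(origin - eye, n)) / denom
        if not (0.0 < t <= 1.0):
            return False                          # object is not beyond the rectangle's plane
        hit = eye + t * (p - eye) - origin        # where the eye-to-object ray crosses the plane
        s = float(np.dot(hit, u)) / float(np.dot(u, u))
        r = float(np.dot(hit, v)) / float(np.dot(v, v))
        return 0.0 <= s <= 1.0 and 0.0 <= r <= 1.0  # crossing lies within the rectangle

    # Example: rectangle held about 40 cm in front of the eye; object 1 m away behind it.
    rect = ((-0.1, -0.1, 0.4), (0.2, 0.0, 0.0), (0.0, 0.2, 0.0))
    selected = inside_selection_pyramid((0.0, 0.0, 0.0), rect, (0.05, 0.05, 1.0))  # True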
TANG in view of Ravasz, Mendes, and YANAI fails to explicitly disclose wherein the one or more bimanual deltas include: a translation delta corresponding to a frame-to-frame translational movement of the interaction point for each of the first hand and the second hand.
Everitt teaches a system and a method relating to detecting gestures (Everitt, ¶ [0001]), wherein the one or more bimanual deltas include: a translation delta corresponding to a frame-to-frame translational movement of the interaction point for each of the first hand and the second hand (Everitt, ¶ [0014]: identifying a synchronous movements of the plurality of control objects to adjust/pan the displayed context based on the detected movement; the panning matches the detected synchronous movement of the plurality of control objects; the plurality of control objects are hands of a user; ¶¶ [0064]-[0075] with FIGS. 2A-C: user 210 is shown positioned in a positive z-axis location facing the x-y plane, and user 210 may thus make a gesture that may be captured by a camera, with the user facing the display, with the coordinates of the motion captured by the camera processed by a computer using the corresponding x, y, and z coordinates as observed by the camera; for a panning gesture illustrated by FIG. 2A, movement across x and y coordinates by control objects in a control plane may be the same or different from x and y coordinates used to display and manipulate content on a display surface; the user may then move the control objects, which are hands in FIG. 2A; detect the motion of the control objects, and translate this motion to pan content displayed in a display surface; two hands are used in a linear, open palm motion across the detection area as illustrated; a stream of frames containing x, y, and Z coordinates of the user hands and optionally other joint locations may then be received to identify the gesture; to engage the panning operation, the user may hold both hands still and level; once the system is engaged, panning may begin; while panning the application may track the average motion of the 2 hands onto the object being panned; when the user has moved the object to the desired location they may disengage the panning operation using a panning disengagement motion; during a panning gesture, first control object 230 moves from position 1A to position 1B, and second control object Substantially simultaneously moves in an approximately synchronized motion from location 2A to location 2B; the synchronized relative position (i.e., bimanual delta) between the first control object 230 and the second control object 240 is maintained during the gesture; in response to detection and processing of the gesture, a content portion 215 moves from an initial position 3a to a panned position 3b, where the movement from position 3a to 3b corresponds with the synchronized movement from locations 1A and 2A to locations 1B and 2B of control objects 230 and 240; a tolerance threshold may be identified for the level of synchronization of the control objects; FIG. 
2C includes first control object 230 and second control object 240; during a panning gesture movement mode, an initial synchronized relative position may be established as synchronized relative position 220 when a user first places the first control object and the second control object 240 into a control plane; threshold 226 shows an allowable variation in the synchronized relative position 220 during a panning gesture; if synchronized relative position 220 varies beyond threshold 226, the panning gesture movement mode may be terminated, and the content presented at the content surface may stop panning to match movements of the first and second control objects; the threshold 226 may be variable, based on a number of different factors; e.g., the threshold may be made proportional to the velocity or speed of the control objects; during extended panning sessions, a user may grow tired, and the threshold may be increased over time to compensate for reduced user control as the user grows tired; ¶ [0104]: a gesture dictionary may store movement data or patterns for recognizing gestures that may include pokes, pats, taps, pushes, guiding, flicks, turning, rotating, grabbing and pulling, two hands with palms open for panning images, drawing (e.g., finger painting), forming shapes with fingers, and Swipes, all of which may be accomplished on or in close proximity to the apparent location of a virtual object in a generated display).
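For illustration only of the synchronized two-hand panning paraphrased from Everitt above, a minimal Python sketch follows; the content tracks the average motion of the two hands and disengages when their relative position drifts beyond a tolerance, and the data layout, the separation-based tolerance test, and the 5 cm default are assumptions.

    import numpy as np

    def update_pan(prev, curr, content_pos, baseline_separation, tolerance=0.05):
        # prev/curr: dicts of hand -> (x, y, z) control-object positions in consecutive frames.
        l0, r0 = (np.asarray(prev[h], dtype=float) for h in ("left", "right"))
        l1, r1 = (np.asarray(curr[h], dtype=float) for h in ("left", "right"))
        # Disengage if the hands' synchronized relative position drifts beyond the tolerance.
        if abs(float(np.linalg.norm(l1 - r1)) - baseline_separation) > tolerance:
            return np.asarray(content_pos, dtype=float), False
        # Pan the content by the average motion of the two hands.
        delta = ((l1 - l0) + (r1 - r0)) / 2.0
        return np.asarray(content_pos, dtype=float) + delta, True

    prev = {"left": (-0.20, 0.0, 0.5), "right": (0.20, 0.0, 0.5)}
    curr = {"left": (-0.15, 0.0, 0.5), "right": (0.25, 0.0, 0.5)}
    pos, engaged = update_pan(prev, curr, (0.0, 0.0, 0.0), baseline_separation=0.4)
    # Both hands moved +5 cm in x while keeping their spacing, so the content pans by +5 cm.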
TANG in view of Ravasz, Mendes, and YANAI, and Everitt are analogous art because they are from the same field of endeavor, a system and a method relating to detecting gestures. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teaching of Everitt to TANG in view of Ravasz, Mendes, and YANAI. The motivation for doing so would be to provide a more comprehensive gesture dictionary for users.
Response to Arguments
Applicant’s arguments filed with respect to Claims 1, 11, and 16 have been fully considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
HOLZ et al. (US 2019/0042957 A1, pub. date: 02/07/2019) discloses in ABSTRACT and ¶¶ [0019]-[0031] that (1) determining whether positions and/or motions of an object (e.g., hand, tool, hand and tool combinations, other detectable objects or combinations thereof) might be interpreted as an interaction with one or more virtual objects; (2) detecting a hand in a three-dimensional (3D) sensory space and generating a predictive model of the hand, and using the predictive model to track motion of the hand, wherein the predictive model includes positions of calculation points of fingers, thumb and palm of the hand; (3) dynamically selecting at least one manipulation point proximate to a virtual object based on the motion tracked by the predictive model and positions of one or more of the calculation points, and manipulating the virtual object by interaction between at least some of the calculation points of the predictive model and the dynamically selected manipulation point; (4) detecting opposable motion and positions of the calculation points of the hand using the predictive model, which includes detecting opposable motion and positions of the calculation points of the hand using the predictive model, detecting a manipulation point proximate to a point of convergence of the opposable calculation points, and assigning a strength attribute to the manipulation point based on a degree of convergence of the opposable calculation points; (5) the dynamically selected manipulation point is selected from a predetermined list of available manipulation points for a particular form of the virtual object; (6) the dynamically selected manipulation point is created proximate to the virtual object based on the motion tracked by the predictive model and positions of the calculation points; (7) dynamically selecting at least one grasp point proximate to the predictive model based on the motion tracked by the predictive model and positions of two or more of the calculation points on the predictive model; (8) force applied by the calculation points is calculated between the manipulation point and grasp point; (9) manipulating the virtual object responsive to a proximity between at least some of the calculation points of the predictive model and the manipulation point of the virtual object; (10) the calculation points include opposable finger tips and a base of the hand or an opposable finger and thumb; (11) detecting two or more hands in the 3D sensory space, generating predictive models of the respective hands, and using the predictive models to track respective motions of the hands; (12) the predictive models include positions of calculation points of the fingers, thumb and palm of the respective hands; (13) dynamically selecting two or more manipulation points proximate to opposed sides of the virtual object based on the motion tracked by the respective predictive models and positions of one or more of the calculation points of the respective predictive models, defining a selection plane through the virtual object linking the two or more manipulation points, and manipulating the virtual object responsive to manipulation of the selection plane; (14) dynamically selecting an grasp point for the predictive model proximate to convergence of two or more of the calculation points, assigning a strength attribute to the grasp point based on a degree of convergence to the dynamically selected manipulation point proximate to the virtual object, and manipulating the virtual object responsive to the grasp point strength 
attribute when the grasp point and the manipulation point are within a predetermined range of each other; (15) the grasp point of a pinch gesture includes convergence of at least two opposable finger or thumb contact points; (16) the grasp point of a grab gesture includes convergence of a palm contact point with at least one opposable finger contact point; (17) the grasp point of a swat gesture includes convergence of at least two opposable finger contact points; (18) using the predictive model to track motion of the hand and positions of the calculation points relative to two or more virtual objects to be manipulated, dynamically selecting one or more manipulation points proximate to at least one of the virtual objects based on the motion tracked by the predictive model and positions of the calculation points, and manipulating at least one of the virtual objects by interaction between at least some of the calculation points of the predictive model and the dynamically selected manipulation point; and (19) using the predictive model to track motion of the hand and positions of the calculation points relative to two or more virtual objects to be manipulated, manipulating a first virtual object by interaction between at least some of the calculation points of the predictive model and at least one virtual manipulation point of the first virtual object, dynamically selecting at least one manipulation point of a second virtual object responsive to convergence of calculation points of the first virtual object, and manipulating the second virtual object when the virtual manipulation point of the first virtual object and the virtual manipulation point of the second virtual object are within a predetermined range of each other. HOLZ further discloses in ¶¶ [0064]-[0070] with FIG. 2 that (1) determining a manipulation point 201A relative to a prediction model 201A-1, wherein the prediction model 201A-1 is a predicted virtual representation of at least a portion of a hand (i.e., a "virtual hand"), but could also include virtual representations of a face, a tool, or any combination thereof; (2) manipulation point 201A comprises a location in virtual space, wherein a manipulation point can comprise one or more quantities representing various attributes, such as for example a manipulation point "strength" attribute, which is indicated in FIG. 2 by the shading of manipulation point 201A; (3) a manipulation point can be used to describe an interaction in virtual space, properties and/or attributes thereof, as well as combinations thereof; (4) in example 201, a manipulation point 201A represents a location of a "pinch" gesture in virtual space; the shading of the point as depicted by FIG. 
2 indicates a relative strength of the manipulation point; (5) with reference to a manipulation point example 202, a manipulation point 202A comprises a strength and a location of a "grab" gesture 202A-1; (6) manipulation points, or attributes thereof, can be used to describe interactions with objects in virtual space; (7) in single handed manipulation example 203 a virtual hand 203A-1 starts with a weak "pinch" manipulation point between the thumb and the index finger; (8) the virtual hand 203A-1 approaches a virtual object 203A-2, and the thumb and index finger are brought closer together; (9) this proximity may increase the strength of the manipulation point 203A; (10) if the strength of the manipulation point exceeds a threshold and/or the manipulation point is in sufficient proximity to a virtual object, the virtual object can be "selected"; (11) selection can comprise a virtual action (e.g., virtual grab, virtual pinch, virtual swat and so forth) relative to the virtual object that represents a physical action that can be made relative to a physical object; (12) virtual actions can result in virtual results (e.g., a virtual pinch can result in a virtual deformation or a virtual swat can result in a virtual translation); (13) thresholding (or other quantitative techniques) can be used to describe the extent of a virtual action yielding a virtual result depending on an object type and other properties of the scene; (14) once a manipulation point selects a virtual object, the virtual object can be rotated, translated, scaled, and otherwise manipulated; (15) if the thumb and index finger of the virtual hand become separated, the strength of the manipulation point may decrease, and the object may be disengaged from the prediction model; (16) a two handed interaction example 204 illustrates a two-handed manipulation of a virtual object 204A-2 facilitated by a plurality of manipulation points 204A; (17) the manipulation point 204A need not intersect the virtual object 204A-2 to select it; (18) a plurality of manipulation points may engage with one another and "lock" on as if one or more of the plurality was itself a virtual object; and (19) two or more manipulation points may lock if they both exceed a threshold strength; this may define a "selection plane" 204X (or vector, or other mathematical construct defining a relationship) as illustrated in 204. HOLZ also discloses in ¶¶ [0071]-[0075] with FIG. 
3 that (1) a collection of "calculation points" 301-1 in proximity to a virtual hand 301 can be input into a "manipulation point determination method" to determine at least a portion of at least one parameter of a manipulation point 301-3; (2) determining a weighted average of distance from each calculation point to an anchor point; (3) calculation point(s) can evolve through space; (4) in example 301B underlying prediction model 301 has changed from previous configuration of prediction model 301 in Example 301A, and the manipulation point 301-3 is determined to be at a different location based at least in part on the evolution of model 301; (5) with reference to example 303A, an "anchor point" 303-2 can be defined as a calculation point and can serve as an input into the manipulation point determination method; e.g., an anchor point can be selected according to a type of interaction and/or a location of where the interaction is to occur (i.e., a center of activity) (e.g., a pinch gesture indicates an anchor point between the thumb and index finger, a thrumming of fingertips on a desk indicates an anchor point located at the desk where the wrist is in contact); (6) as shown with reference to example 303B in comparison to example 303A, a manipulation point 303-3 can be determined based at least in part upon the one or more calculation points 303-1 and the anchor point 303-2; e.g., the location is determined using a weighted average of the locations of the calculation points with respect to the location of the anchor point; (7) the strength of the manipulation point 303-3 can be determined in a variety of ways, such as for example according to a location of the calculation point determined to be "farthest" away from manipulation point 303-3; (8) alternatively, the strength could be determined according to a weighting of different distances of calculation points from the manipulation point 303-3; (9) by moving an anchor point around relative to a predictive model, a resulting manipulation point can be in various locations; e.g., with reference to example 305A, an anchor point 305-2 may be defined in a different location on the prediction model 301; (10) the location of an anchor point can influence the type of manipulation point calculated; (11) with reference to example 303B, the anchor point 303-3 could be used to define a "grab" point, while the configuration of example 305B yields a manipulation point 305-3 that can be used to define a pinch point; (12) an anchor point 307-3 in example 307A can itself serve as a calculation point, thereby enabling determining a further refined manipulation point 307-4 as shown by example 307B; (13) a weighted average of the location and strength of a plurality of manipulation points 307-3, 307-3-2 in example 307 can be used to define a "general manipulation point" 307-4 in example 307B; (14) anchor or calculation points can be placed on objects external to the prediction model as illustrated with reference to example 309; (15) as shown by example 309, an object 309-5, separate from predictive model 301 includes an anchor point 309-2. 
Object(s) 309-5 can be purely virtual constructs, or virtual constructs based at least in part on prediction models of physical objects; (16) with reference to example 311, such object is a "virtual surface" 311-5, and complex interactions can be enabled by determining the manipulation point of a prediction model 301 with respect to at least one anchor point 311-2 defined on virtual surface 311-5; (17) such virtual surface can correspond to a desk, kitchen countertop, lab table or other work surface(s) in physical space; and (18) association of anchor point 311-2 with virtual surface 311-5 can enable modeling of a user interaction "anchored" to a physical surface, e.g., a user's hand resting on a flat surface while typing while interacting meaningfully with the virtual space.
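For illustration only of HOLZ's manipulation point and strength attribute as summarized above, a minimal Python sketch follows; the inverse-distance weighting about the anchor point, the 8 cm pinch falloff, and the keypoint names are assumptions rather than HOLZ's actual computation.

    import numpy as np

    def manipulation_point(calc_points, anchor, pinch_pair=("thumb_tip", "index_tip")):
        # calc_points: dict of calculation-point name -> (x, y, z); anchor: reference location.
        pts = {k: np.asarray(v, dtype=float) for k, v in calc_points.items()}
        anchor = np.asarray(anchor, dtype=float)
        # Weighted average of calculation-point locations with respect to the anchor point.
        weights = {k: 1.0 / (float(np.linalg.norm(p - anchor)) + 1e-6) for k, p in pts.items()}
        total = sum(weights.values())
        location = sum(w * pts[k] for k, w in weights.items()) / total
        # Strength rises toward 1 as the opposable calculation points converge (e.g., a pinch).
        gap = float(np.linalg.norm(pts[pinch_pair[0]] - pts[pinch_pair[1]]))
        strength = float(np.clip(1.0 - gap / 0.08, 0.0, 1.0))
        return location, strength

    calc = {"thumb_tip": (0.020, 0.010, 0.36), "index_tip": (0.025, 0.020, 0.36),
            "palm_center": (0.0, 0.0, 0.40)}
    loc, strength = manipulation_point(calc, anchor=(0.0, 0.0, 0.40))
    # A virtual object near loc could be "selected" once strength exceeds a chosen threshold.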
Krejov et al. ("A Multitouchless Interface: Expanding User Interaction", IEEE Computer Graphics and Applications, VOL. 34, NO. 3, May 2014, pp. 40-48) discloses in Introduction Section of Pages 40-41 with FIG. 1 that (1) detects only fingertips but does so without machine learning and a large amount of training data; (2) its real-time methodology uses geodesic maxima on the hand’s surface instead of its visual appearance, which is efficient to compute and robust to both the pose and environment; (3) with this methodology, create a “multitouchless” interface that allows direct interaction with data through finger tracking and gesture recognition; (4) extending interaction beyond surfaces within the operator’s physical reach (see Figure 1) provides greater scope for interaction; (5) easily extend the user’s working space to walls or the entire environment; (6) the approach also achieves real-time tracking of up to four hands on a standard desktop computer, which opens up possibilities for collaboration, especially because the workspace isn’t limited by the display’s size. Krejov further discloses "Tracking Fingertips" with FIGS. 2-5 Section of Pages 41-45 that (1) first, capture the depth image and calibrate the point cloud; (2) locate and separate hand blobs with respect to the image domain; (3) processing each hand in parallel, build a weighted graph from the real-world point information for each hand’s surface; (4) an efficient shortest-path algorithm traverses the graph to find candidate fingertips; (5) then filter these candidates on the basis of their location relative to the center of the hand and the wrist, and use them in a temporally smoothed model of the fingertip locations; (6) remove the body and background from the depth image so the remaining depth space is the workspace for gesture-based interaction; (7) cluster any points in this space into connected blobs; (8) ignore blobs smaller than potential hand shapes so the remaining blobs are candidates for the user’s arms; (9) then classify the points in each blob as part of a hand subset or wrist subset, wherein this classification uses a depth threshold of one-quarter of the arm’s total depth; (10) the hand’s center serves as the seed point for distance computation; (11) simply using the centroid of points would result in the seed point shifting when the hand is opened and closed; (12) use the hand’s chamfer distance, measured from the external boundary to find a stable center, wherein the chamfer distance is a transform that computes the distance between sets of points, and in this context, it computes each point’s distance from the closest boundary point; (13) a hand’s center is the point with the greatest distance in the chamfer image; (14) by mapping the hand’s surface and searching for extremities, find the fingertips, excluding the closed fingers; (15) conduct this search by mapping the distance from the hand’s center to all other points on the hand’s surface; (16) the geodesic distances with the greatest local value correspond to geodesic maxima; (17) search for up to five extrema to account for all open fingertips; (18) in practice, however, the wrist forms additional extremities with a similar geodesic distance so, greedily compute the first seven extremities, which accounts for each fingertip, including two false positives; (19) when the fingers are closed, the tips aren’t of interest because the fingers contribute to the fist’s formation; (20) the extremity normally associated with a folded finger forms an additional 
false positive that we filter out at a later stage; (21) to find the candidate fingertips, first build a weighted undirected graph of the hand, wherein each point in the hand represents a vertex in the graph; (22) connect these vertices with neighboring vertices in an 8-neighborhood fashion, deriving the edge cost from the Euclidean distance between their world coordinates; (23) using the hand’s center as the seed point, compute the first geodesic extremity of the hand graph; (24) use an efficient implementation of Dijkstra’s shortest-path algorithm and search for hand locations with the longest direct path from the hand’s center (for a given source vertex, Dijkstra’s algorithm finds the lowest-cost path to all other vertices in the mesh, and the geodesic extremity is the vertex with the greatest cost); (25) this uses the hand’s internal structure to find the extrema and, as such, is robust to contour noise, which is frequent in the depth image and would cause circular features to fail; (26) then find the remaining extrema iteratively, again using Dijkstra’s algorithm but with a noninitialized distance map, reducing the computational complexity; (27) Figure 2 shows the seven extremities found for various hand shapes, wherein each extremity is associated with its shortest path, as Figure 2f shows; (28) next, filter the candidate fingertips’ points to a subset of valid fingertips; (29) combine a penalty metric derived from the path taken during Dijkstra’s algorithm with a candidate’s position relative to the hand’s center; (30) finding the penalty for each candidate requires the covariance of the hand and arm points, translated to the wrist location (the covariance models the variation in pixels for the chosen body part); (31) using the covariance to map the wrist in this manner takes into consideration the variability of hand shapes; (32) translate the covariance to form an elliptical mask centered around the wrist; (33) if a pixel has a Mahalanobis distance within three standard deviations of the wrist, mark it as 1; otherwise, we mark it as 0 (see Figure 3); (34) to find the penalty, use both the paths from the extrema to the hand’s center and the elliptical mask; (35) after finding the penalty for each candidate, remove the candidate with the greatest penalty; (36) reduce the remaining candidates using the Euclidean distance of the fingertip to the hand’s center, rejecting fingers with a distance less than 7 mm, which forms a sphere around the user’s hand; (37) consider any fingertip outside this sphere a true positive; and (38) it’s more accurate to track the fingertips in 3D than in the image domain because the additional depth information and filter estimation provide subpixel accuracy.
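For illustration only of the geodesic-extremity search summarized from Krejov above, a minimal Python sketch follows; the toy hand graph and edge costs are assumptions, whereas a real system would build the weighted graph from depth-image points connected in an 8-neighborhood and seed Dijkstra's algorithm at the chamfer-distance hand center.

    import heapq

    def dijkstra(graph, seed):
        # graph: dict vertex -> list of (neighbour, edge_cost); returns shortest geodesic distances from seed.
        dist = {seed: 0.0}
        heap = [(0.0, seed)]
        while heap:
            d, v = heapq.heappop(heap)
            if d > dist.get(v, float("inf")):
                continue
            for nbr, cost in graph[v]:
                nd = d + cost
                if nd < dist.get(nbr, float("inf")):
                    dist[nbr] = nd
                    heapq.heappush(heap, (nd, nbr))
        return dist

    def geodesic_extremity(graph, seed):
        # The vertex with the greatest geodesic distance from the hand center is a fingertip candidate.
        dist = dijkstra(graph, seed)
        return max(dist, key=dist.get)

    # Toy hand graph: palm center connected along one finger chain and toward the wrist (costs in cm).
    hand = {
        "palm": [("index_base", 3.0), ("wrist", 4.0)],
        "index_base": [("palm", 3.0), ("index_mid", 2.5)],
        "index_mid": [("index_base", 2.5), ("index_tip", 2.5)],
        "index_tip": [("index_mid", 2.5)],
        "wrist": [("palm", 4.0)],
    }
    candidate = geodesic_extremity(hand, "palm")  # -> "index_tip"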
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HWEI-MIN LU whose telephone number is (313)446-4913. The examiner can normally be reached Mon - Fri: 9:00 AM - 6:00 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mariela D. Reyes can be reached at (571) 270-1006. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/HWEI-MIN LU/Primary Examiner, Art Unit 2142