Prosecution Insights
Last updated: April 19, 2026
Application No. 18/978,248

USING WEARABLE DEVICES TO CAPTURE ACTIONS OF PARTICIPANTS IN A MEETING

Non-Final OA: §102, §103

Filed: Dec 12, 2024
Examiner: LAM, VINH TANG
Art Unit: 2628
Tech Center: 2600 — Communications
Assignee: Zoom Video Communications, Inc.
OA Round: 2 (Non-Final)

Grant Probability: 72% (Favorable)
Expected OA Rounds: 2-3
Time to Grant: 2y 8m
With Interview: 81%

Examiner Intelligence

Career Allow Rate: 72%, above average (471 granted / 655 resolved; +9.9% vs TC avg)
Interview Lift: +9.2%, moderate (resolved cases with vs. without an interview)
Avg Prosecution: 2y 8m typical timeline; 25 applications currently pending
Total Applications: 680 career history, across all art units
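The headline probabilities in this panel are simple ratios of the examiner's career counts. A minimal sketch of the arithmetic, using the counts shown above (the rounding convention, and treating the interview lift as a plain percentage-point offset, are assumptions):

```python
# Career allow rate and interview-adjusted grant probability,
# reproduced from the examiner's raw counts shown above.
granted = 471
resolved = 655

allow_rate_pct = granted / resolved * 100          # 71.9...
print(round(allow_rate_pct))                       # 72 ("Career Allow Rate")

interview_lift_pts = 9.2                           # percentage points, per the panel
print(round(allow_rate_pct + interview_lift_pts))  # 81 ("With Interview")
```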

Statute-Specific Performance

§101: 2.0% (-38.0% vs TC avg)
§103: 47.4% (+7.4% vs TC avg)
§102: 31.5% (-8.5% vs TC avg)
§112: 14.3% (-25.7% vs TC avg)

Black line = Tech Center average estimate. Based on career data from 655 resolved cases.
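The per-statute deltas above are all consistent with a single Tech Center baseline. A small check, using the figures from the panel (treating each delta as a plain percentage-point offset is an assumption; how the panel defines each statute-specific rate is not stated here):

```python
# Recover the Tech Center average estimate (the chart's "black line") from
# each statute-specific rate and its stated delta vs the TC average.
rates = {
    "101": (2.0, -38.0),
    "103": (47.4, +7.4),
    "102": (31.5, -8.5),
    "112": (14.3, -25.7),
}

for statute, (examiner_pct, delta_pts) in rates.items():
    tc_avg = round(examiner_pct - delta_pts, 1)
    print(statute, "TC average estimate:", tc_avg)  # 40.0 for every statute
```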

Office Action

Rejections: §102, §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

2. Claim(s) 1-3, 6-10, 13-17, and 20 is/are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Legatski et al. (US Patent/PGPub. No. 20240163123).

Regarding Claim 1, Legatski et al. teach a method ([0010], FIG. 1, i.e. methods of the present disclosure allow users to use their hands to activate pre-programmed execution of the various controls) comprising: connecting ([0012], FIG. 1, i.e. connected to multiple communication networks 120, 130, through which various client devices 140-180 can participate in video conferences), by a virtual conference device ([0012], FIG. 1, i.e. video conference provider 110 … client devices 140-180), to one or more client devices ([0012], FIG. 1, i.e. client devices 140-180; [0031], FIG. 2, i.e. client devices 220-250) associated with one or more remote participants ([0039], FIG. 2, i.e. media servers 212 may not be co-located, but instead may be located at multiple different geographic locations) in a virtual conference ([0039], FIG. 2, i.e. video conference) and one or more user devices ([0032], FIG. 2, i.e. video conference provider 210 uses one or more real-time media servers 212) associated with one or more local participants ([0039], FIG. 2, i.e. one or more of these servers may be co-located … local servers 212) within a common area ([0039], FIG. 2, i.e. a client's premises, e.g., at a business or other organization); receiving, by the virtual conference device, sensor data ([0056], FIG. 2, i.e. gesture recognition engine 270, a gaze recognition engine 275) related to a local participant ([0039], FIG. 2, i.e. one or more of these servers may be co-located … local servers 212) associated with a user device ([0032], FIG. 2, i.e. video conference provider 210) of the one or more user devices ([0056], FIG. 2, i.e. combination of the servers or clients, for example, depicted in systems 100, 200) within the common area during the virtual conference ([0056], FIG. 2, i.e. user is participating within a virtual meeting); determining ([0058], FIG. 2, i.e. gesture recognition engine 270 can be configured to track and interpret movements made by a user), by the virtual conference device, a physical action ([0056], FIG. 2, i.e. user's movements can be captured by a digital camera) of the local participant associated with the user device at least based on the sensor data (i.e. please see above citation(s)); and providing, by the virtual conference device, a representation ([0057], FIG. 2, i.e. user's movements can be captured by a digital camera and the user can view themselves in a video feed) of the physical action to the one or more client devices associated with the one or more remote participants ([0033], FIG. 2, i.e. real-time media servers 212 provide multiplexed multimedia streams to meeting participants, such as the client devices 220-250 … video and audio streams … are transmitted from the client devices 220-250).

Regarding Claim 2, Legatski et al. teach the method of claim 1, wherein the physical action (i.e. please see above citation(s)) comprises hand raising (FIG. 4 & 5A, i.e. as shown by the figure(s) user raising his hand 304A), headshaking, head-nodding, hand-waving, standing up, sitting down, or making a thumbs-up gesture (i.e. alternative limitation(s) omitted).

Regarding Claim 3, Legatski et al. teach the method of claim 1, wherein the representation of the physical action (i.e. please see above citation(s)) is overlayed on a section of a graphical user interface ([0057], FIG. 4 & 5A, i.e. user can interact with the accessible control panel generated by the overlay engine 265 by using their hands within a video feed to select different controls) of the virtual conference corresponding to the virtual conference device (i.e. please see above citation(s)).

Regarding Claim 6, Legatski et al. teach the method of claim 1, wherein the user device of the one or more user devices (i.e. please see above citation(s)) comprises one or more sensors ([0050], FIG. 2, i.e. computing device designed to receive video … from each of the cameras) configured to collect sensor data ([0056], FIG. 2, i.e. gesture recognition engine 270, a gaze recognition engine 275) associated with the local participant (i.e. please see above citation(s)).

Regarding Claim 7, Legatski et al. teach the method of claim 1, wherein the sensor data (i.e. please see above citation(s)) comprises motion data ([0056], FIG. 2, i.e. gesture recognition engine 270), location data, biometric data, audio data, or video data (i.e. alternative limitation(s) omitted).

Regarding Claim 8, Legatski et al. teach a system ([0031], FIG. 2, i.e. system 200) comprising: a communications interface ([0090], FIG. 8, i.e. communications buses 802); a non-transitory computer-readable medium ([0090], FIG. 8, i.e. memory 820); and one or more processors ([0090], FIG. 8, i.e. processor 810) communicatively coupled to the communications interface and the non-transitory computer-readable medium, the one or more processors (i.e. please see above citation(s)) configured to execute processor-executable instructions ([0090], FIG. 8, i.e. execute processor-executable instructions) stored in the non-transitory computer-readable medium (i.e. please see above citation(s)) to: connect ([0012], FIG. 1, i.e. connected to multiple communication networks 120, 130, through which various client devices 140-180 can participate in video conferences) to one or more client devices ([0012], FIG. 1, i.e. client devices 140-180; [0031], FIG. 2, i.e. client devices 220-250) associated with one or more remote participants ([0039], FIG. 2, i.e. media servers 212 may not be co-located, but instead may be located at multiple different geographic locations) in a virtual conference ([0039], FIG. 2, i.e. video conference) and one or more user devices ([0032], FIG. 2, i.e. video conference provider 210 uses one or more real-time media servers 212) associated with one or more local participants ([0039], FIG. 2, i.e. one or more of these servers may be co-located … local servers 212) within a common area ([0039], FIG. 2, i.e. a client's premises, e.g., at a business or other organization); receive sensor data ([0056], FIG. 2, i.e. gesture recognition engine 270, a gaze recognition engine 275) related to a local participant ([0039], FIG. 2, i.e. one or more of these servers may be co-located … local servers 212) associated with a user device ([0032], FIG. 2, i.e. video conference provider 210) of the one or more user devices ([0056], FIG. 2, i.e. combination of the servers or clients, for example, depicted in systems 100, 200) within the common area during the virtual conference ([0056], FIG. 2, i.e. user is participating within a virtual meeting); determine ([0058], FIG. 2, i.e. gesture recognition engine 270 can be configured to track and interpret movements made by a user) a physical action ([0056], FIG. 2, i.e. user's movements can be captured by a digital camera) of the local participant associated with the user device at least based on the sensor data (i.e. please see above citation(s)); and provide a representation ([0057], FIG. 2, i.e. user's movements can be captured by a digital camera and the user can view themselves in a video feed) of the physical action to the one or more client devices associated with the one or more remote participants ([0033], FIG. 2, i.e. real-time media servers 212 provide multiplexed multimedia streams to meeting participants, such as the client devices 220-250 … video and audio streams … are transmitted from the client devices 220-250).

Regarding Claim 9, Legatski et al. teach the system of claim 8, wherein the physical action (i.e. please see above citation(s)) comprises hand raising (FIG. 4 & 5A, i.e. as shown by the figure(s) user raising his hand 304A), headshaking, head-nodding, hand-waving, standing up, sitting down, or making a thumbs-up gesture (i.e. alternative limitation(s) omitted).

Regarding Claim 10, Legatski et al. teach the system of claim 8, wherein the representation of the physical action (i.e. please see above citation(s)) is overlayed on a section of a graphical user interface ([0057], FIG. 4 & 5A, i.e. user can interact with the accessible control panel generated by the overlay engine 265 by using their hands within a video feed to select different controls) of the virtual conference on a virtual conference device (i.e. please see above citation(s)).

Regarding Claim 13, Legatski et al. teach the system of claim 8, wherein the user device of the one or more user devices (i.e. please see above citation(s)) comprises one or more sensors ([0050], FIG. 2, i.e. computing device designed to receive video … from each of the cameras) configured to collect sensor data ([0056], FIG. 2, i.e. gesture recognition engine 270, a gaze recognition engine 275) associated with the local participant (i.e. please see above citation(s)).
Regarding Claim 14, Legatski et al. teach the system of claim 8, wherein the sensor data (i.e. please see above citation(s)) comprises motion data ([0056], FIG. 2, i.e. gesture recognition engine 270), location data, biometric data, audio data, or video data (i.e. alternative limitation(s) omitted).

Regarding Claim 15, Legatski et al. teach a non-transitory computer-readable medium ([0090], FIG. 8, i.e. memory 820) comprising processor-executable instructions ([0090], FIG. 8, i.e. execute processor-executable instructions) configured to cause one or more processors ([0090], FIG. 8, i.e. processor 810) to: connect ([0012], FIG. 1, i.e. connected to multiple communication networks 120, 130, through which various client devices 140-180 can participate in video conferences) to one or more client devices ([0012], FIG. 1, i.e. client devices 140-180; [0031], FIG. 2, i.e. client devices 220-250) associated with one or more remote participants ([0039], FIG. 2, i.e. media servers 212 may not be co-located, but instead may be located at multiple different geographic locations) in a virtual conference ([0039], FIG. 2, i.e. video conference) and one or more user devices ([0032], FIG. 2, i.e. video conference provider 210 uses one or more real-time media servers 212) associated with one or more local participants ([0039], FIG. 2, i.e. one or more of these servers may be co-located … local servers 212) within a common area ([0039], FIG. 2, i.e. a client's premises, e.g., at a business or other organization); receive sensor data ([0056], FIG. 2, i.e. gesture recognition engine 270, a gaze recognition engine 275) related to a local participant ([0039], FIG. 2, i.e. one or more of these servers may be co-located … local servers 212) associated with a user device ([0032], FIG. 2, i.e. video conference provider 210) of the one or more user devices ([0056], FIG. 2, i.e. combination of the servers or clients, for example, depicted in systems 100, 200) within the common area during the virtual conference ([0056], FIG. 2, i.e. user is participating within a virtual meeting); determine ([0058], FIG. 2, i.e. gesture recognition engine 270 can be configured to track and interpret movements made by a user) a physical action ([0056], FIG. 2, i.e. user's movements can be captured by a digital camera) of the local participant associated with the user device at least based on the sensor data (i.e. please see above citation(s)); and provide a representation ([0057], FIG. 2, i.e. user's movements can be captured by a digital camera and the user can view themselves in a video feed) of the physical action to the one or more client devices associated with the one or more remote participants ([0033], FIG. 2, i.e. real-time media servers 212 provide multiplexed multimedia streams to meeting participants, such as the client devices 220-250 … video and audio streams … are transmitted from the client devices 220-250).

Regarding Claim 16, Legatski et al. teach the non-transitory computer-readable medium of claim 15, wherein the physical action (i.e. please see above citation(s)) comprises hand raising (FIG. 4 & 5A, i.e. as shown by the figure(s) user raising his hand 304A), headshaking, head-nodding, hand-waving, standing up, sitting down, or making a thumbs-up gesture (i.e. alternative limitation(s) omitted).

Regarding Claim 17, Legatski et al. teach the non-transitory computer-readable medium of claim 15, wherein the representation of the physical action (i.e. please see above citation(s)) is overlayed on a section of a graphical user interface ([0057], FIG. 4 & 5A, i.e. user can interact with the accessible control panel generated by the overlay engine 265 by using their hands within a video feed to select different controls) of the virtual conference on a virtual conference device (i.e. please see above citation(s)).
Regarding Claim 20, Legatski et al. teach the non-transitory computer-readable medium of claim 15, wherein the sensor data (i.e. please see above citation(s)) comprises motion data ([0056], FIG. 2, i.e. gesture recognition engine 270), location data, biometric data, audio data, or video data (i.e. alternative limitation(s) omitted).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

3. Claim(s) 4, 11, and 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Legatski et al. (US Patent/PGPub. No. 20240163123) in view of HARO (US Patent/PGPub. No. 20230146178).

Regarding Claim 4, Legatski et al. teach the method of claim 1. However, Legatski et al. do not explicitly teach the representation of the physical action is inserted to a chat window associated with the virtual conference. In the same field of endeavor, HARO teaches the representation of the physical action is inserted to a chat window ([0150], FIG. 1, i.e. the eye movements are represented in the 3D environment in which the clients engage, for example during chat sessions) associated with the virtual conference ([0039], FIG. 1, i.e. environments such as videoconferences, chat sessions, virtual reality environments).
It would have been obvious to a person having ordinary skill in the art at the time the invention’s effective date was filed to combine Legatski et al. teaching method of tracking user’s action in a virtual conference with HARO teaching of method of tracking user’s action and presenting user’s representation in a virtual conference to effectively provide seamless, realistic and natural communication feeling due to the corrected eye gaze (HARO’s [0151]).

Regarding Claim 11, Legatski et al. teach the system of claim 8. However, Legatski et al. do not explicitly teach the representation of the physical action is inserted to a chat window associated with the virtual conference. In the same field of endeavor, HARO teaches the representation of the physical action is inserted to a chat window ([0150], FIG. 1, i.e. the eye movements are represented in the 3D environment in which the clients engage, for example during chat sessions) associated with the virtual conference ([0039], FIG. 1, i.e. environments such as videoconferences, chat sessions, virtual reality environments). It would have been obvious to a person having ordinary skill in the art at the time the invention’s effective date was filed to combine Legatski et al. teaching system of tracking user’s action in a virtual conference with HARO teaching of system of tracking user’s action and presenting user’s representation in a virtual conference to effectively provide seamless, realistic and natural communication feeling due to the corrected eye gaze (HARO’s [0151]).

Regarding Claim 18, Legatski et al. teach the non-transitory computer-readable medium of claim 15. However, Legatski et al. do not explicitly teach the representation of the physical action is inserted to a chat window associated with the virtual conference. In the same field of endeavor, HARO teaches the representation of the physical action is inserted to a chat window ([0150], FIG. 1, i.e. the eye movements are represented in the 3D environment in which the clients engage, for example during chat sessions) associated with the virtual conference ([0039], FIG. 1, i.e. environments such as videoconferences, chat sessions, virtual reality environments). It would have been obvious to a person having ordinary skill in the art at the time the invention’s effective date was filed to combine Legatski et al. teaching medium of tracking user’s action in a virtual conference with HARO teaching of medium of tracking user’s action and presenting user’s representation in a virtual conference to effectively provide seamless, realistic and natural communication feeling due to the corrected eye gaze (HARO’s [0151]).

4. Claim(s) 5, 12, and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Legatski et al. (US Patent/PGPub. No. 20240163123) in view of Cheung et al. (US Patent/PGPub. No. 10701316).

Regarding Claim 5, Legatski et al. teach the method of claim 1. However, Legatski et al. do not explicitly teach the user device of the one or more user devices is connected to one or more wearable devices comprising one or more sensors, the one or more sensors are configured to collect sensor data associated with the local participant. In the same field of endeavor, Cheung et al. teach the user device of the one or more user devices (Col. 5, Ln. 31-36, FIG. 1, i.e. first video conferencing system 12) is connected to one or more wearable devices (Col. 5, Ln. 16-23, FIG. 1, i.e. variety of devices with video conferencing capabilities, such as … wearable device) comprising one or more sensors (Col. 5, Ln. 31-36, FIG. 1, i.e. image capture systems 20A-20B), the one or more sensors are configured to collect sensor data (Col. 5, Ln. 31-36, FIG. 1, i.e. image capture capabilities) associated with the local participant (Col. 5, Ln. 23-30, FIG. 1, i.e. user 30A).
It would have been obvious to a person having ordinary skill in the art at the time the invention’s effective date was filed to combine Legatski et al. teaching method of tracking user’s action in a virtual conference with Cheung et al. teaching method of sensing user’s movement of a wearable device in a virtual conference to effectively provide video conferencing via an increasing popular wearable device by sensing user’s movement (Cheung et al.’s Col. 1, Ln. 11-16).

Regarding Claim 12, Legatski et al. teach the system of claim 8. However, Legatski et al. do not explicitly teach the user device of the one or more user devices is connected to one or more wearable devices comprising one or more sensors, the one or more sensors are configured to collect sensor data associated with the local participant. In the same field of endeavor, Cheung et al. teach the user device of the one or more user devices (Col. 5, Ln. 31-36, FIG. 1, i.e. first video conferencing system 12) is connected to one or more wearable devices (Col. 5, Ln. 16-23, FIG. 1, i.e. variety of devices with video conferencing capabilities, such as … wearable device) comprising one or more sensors (Col. 5, Ln. 31-36, FIG. 1, i.e. image capture systems 20A-20B), the one or more sensors are configured to collect sensor data (Col. 5, Ln. 31-36, FIG. 1, i.e. image capture capabilities) associated with the local participant (Col. 5, Ln. 23-30, FIG. 1, i.e. user 30A). It would have been obvious to a person having ordinary skill in the art at the time the invention’s effective date was filed to combine Legatski et al. teaching system of tracking user’s action in a virtual conference with Cheung et al. teaching of system of sensing user’s movement of a wearable device in a virtual conference to effectively provide video conferencing via an increasing popular wearable device by sensing user’s movement (Cheung et al.’s Col. 1, Ln. 11-16).

Regarding Claim 19, Legatski et al. teach the non-transitory computer-readable medium of claim 15. However, Legatski et al. do not explicitly teach the user device of the one or more user devices is connected to one or more wearable devices comprising one or more sensors, the one or more sensors are configured to collect sensor data associated with the local participant. In the same field of endeavor, Cheung et al. teach the user device of the one or more user devices (Col. 5, Ln. 31-36, FIG. 1, i.e. first video conferencing system 12) is connected to one or more wearable devices (Col. 5, Ln. 16-23, FIG. 1, i.e. variety of devices with video conferencing capabilities, such as … wearable device) comprising one or more sensors (Col. 5, Ln. 31-36, FIG. 1, i.e. image capture systems 20A-20B), the one or more sensors are configured to collect sensor data (Col. 5, Ln. 31-36, FIG. 1, i.e. image capture capabilities) associated with the local participant (Col. 5, Ln. 23-30, FIG. 1, i.e. user 30A). It would have been obvious to a person having ordinary skill in the art at the time the invention’s effective date was filed to combine Legatski et al. teaching medium of tracking user’s action in a virtual conference with Cheung et al. teaching medium of sensing user’s movement of a wearable device in a virtual conference to effectively provide video conferencing via an increasing popular wearable device by sensing user’s movement (Cheung et al.’s Col. 1, Ln. 11-16).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to VINH TANG LAM whose telephone number is (571) 270-3704. The examiner can normally be reached Monday to Friday 8:00 AM to 5:00 PM. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Nitin K Patel, can be reached at (571) 272-7677. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/VINH T LAM/
Primary Examiner, Art Unit 2628

Prosecution Timeline

Dec 12, 2024: Application Filed
Oct 24, 2025: Non-Final Rejection — §102, §103
Jan 28, 2026: Response Filed
Mar 02, 2026: Non-Final Rejection — §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12596512: CONTENT RENDERING METHOD AND APPARATUS, READABLE MEDIUM, AND ELECTRONIC DEVICE (granted Apr 07, 2026; 2y 5m to grant)
Patent 12592051: OPTIMIZATION OF EYE CAPTURE CONDITIONS FOR EACH USER AND USE CASE (granted Mar 31, 2026; 2y 5m to grant)
Patent 12579446: MACHINE-LEARNING TECHNIQUES FOR RISK ASSESSMENT BASED ON CLUSTERING (granted Mar 17, 2026; 2y 5m to grant)
Patent 12581829: DISPLAY DEVICE (granted Mar 17, 2026; 2y 5m to grant)
Patent 12566525: TOUCH DEVICE (granted Mar 03, 2026; 2y 5m to grant)
These cases show what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 2-3
Grant Probability: 72%
With Interview: 81% (+9.2%)
Median Time to Grant: 2y 8m
PTA Risk: Moderate

Based on 655 resolved cases by this examiner. Grant probability derived from career allow rate.
