DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 11, 2024, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-2 and 4-6 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Kelly (US 20220327961 A1).
Regarding claim 1, Kelly discloses a computer-implemented method comprising: extracting one or more video frames from video streams of a set of communication sessions (“a sequence of these per-frame features” Kelly [0019]), wherein the video frames include a representation of an object (Fig. 5, 53; Kelly), and wherein the object is associated with an issue for which the communication session is established (“sign language information” Kelly [0019]); defining a training dataset from the one or more video frames and features extracted from the set of communication sessions (Fig. 3; Kelly); training a neural network using the training dataset, the neural network being configured to generate predictions of actions associated with the object (Fig. 4, 407-408; Kelly); extracting a video frame from a new video stream of a new communication session, wherein the video frame includes a representation of a particular object, and wherein the object is associated with a particular issue (“In real time, or after the signing is completed, the sign language information is sent to 12, which extracts out features (e.g. body pose keypoints, hand keypoints, hand pose, thresholded image, etc. . . . )” Kelly [0019]); executing the neural network using the video frame from the new video stream, wherein the neural network generates a predicted action associated with the particular object (“The features produced by 12 are then transmitted to component 13 which extracts sign language information (e.g. detecting if an individual is signing, transcribing that signing into gloss, or translating that signing into a target language) from a sequence of these per-frame features.” Kelly [0019]); and facilitating a transmission of a communication to a device of the new communication session, the communication including a representation of the predicted action (“Finally, the output is displayed on 14.” Kelly [0019]; Fig. 1, 14; Kelly).
Regarding claim 2, Kelly discloses the computer-implemented method of claim 1, wherein the new communication session is between a user device (Fig. 1, 11; Kelly) and a terminal device (Fig. 1, 14; Kelly).
Regarding claim 4, Kelly discloses the computer-implemented method of claim 1, wherein the neural network is an ensemble network comprising two or more neural networks configured to generate outputs of different types (“In the processing module 202, the feature train is split into each individual sign via the sign-splitting component 209 via a 1D Convolutional Neural Network which highlights the sign transition periods.” Kelly [0028]).
Regarding claim 5, Kelly discloses the computer-implemented method of claim 1, wherein the neural network is configured to generate a boundary box over the object (“These results are combined to find the bounding box of both the dominant and non-dominant hand by iterating through all bounding boxes found from 205 and finding the one closest to each wrist joint produced by 206.” Kelly [0023]).
Regarding claim 6, Kelly discloses the computer-implemented method of claim 1, wherein the neural network is configured to generate a predicted identification of the object (Fig. 4, 407-408; Kelly).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 3 and 7-20 are rejected under 35 U.S.C. 103 as being unpatentable over Kelly (US 20220327961 A1) in view of Ramakrishnan et al. (US 20240146871 A1, hereinafter Ramakrishnan).
Regarding claim 3, Kelly discloses the computer-implemented method of claim 1.
Kelly does not disclose “wherein the particular issue is associated with a hardware or software fault in a device operated by a user.”
However, Ramakrishnan does teach wherein the particular issue is associated with a hardware or software fault in a device operated by a user (Fig. 3, 312; Ramakrishnan).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the sign recognition application (as taught by Kelly) with the greenhouse emission advising application (as taught by Ramakrishnan). The rationale to do so is to combine prior art elements according to known methods to yield the predictable result of utilizing sign language to communicate an issue with the software or hardware in a device in a video conference environment.
Regarding claim 7, Kelly discloses the computer-implemented method of claim 1.
Kelly does not disclose “wherein the predicted action associated with the particular object comprises a maintenance action or a repair action configured to restore operability in the particular action.”
However, Ramakrishnan does teach wherein the predicted action associated with the particular object comprises a maintenance action or a repair action configured to restore operability in the particular action (Fig. 5, 502, 504, 508; Ramakrishnan).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the sign recognition application (as taught by Kelly) with the greenhouse emission advising application (as taught by Ramakrishnan). The rationale to do so is to combine prior art elements according to known methods to yield the predictable result of utilizing sign language to communicate an issue with the software or hardware in a device in a video conference environment.
Regarding claim 8, Kelly discloses a system comprising: extracting one or more video frames from video streams of a set of communication sessions (“a sequence of these per-frame features” Kelly [0019]), wherein the video frames include a representation of an object (Fig. 5, 53; Kelly), and wherein the object is associated with an issue for which the communication session is established (“sign language information” Kelly [0019]); defining a training dataset from the one or more video frames and features extracted from the set of communication sessions (Fig. 3; Kelly); training a neural network using the training dataset, the neural network being configured to generate predictions of actions associated with the object (Fig. 4, 407-408; Kelly); extracting a video frame from a new video stream of a new communication session, wherein the video frame includes a representation of a particular object, and wherein the object is associated with a particular issue (“In real time, or after the signing is completed, the sign language information is sent to 12, which extracts out features (e.g. body pose keypoints, hand keypoints, hand pose, thresholded image, etc. . . . )” Kelly [0019]); executing the neural network using the video frame from the new video stream, wherein the neural network generates a predicted action associated with the particular object (“The features produced by 12 are then transmitted to component 13 which extracts sign language information (e.g. detecting if an individual is signing, transcribing that signing into gloss, or translating that signing into a target language) from a sequence of these per-frame features.” Kelly [0019]); and facilitating a transmission of a communication to a device of the new communication session, the communication including a representation of the predicted action (“Finally, the output is displayed on 14.” Kelly [0019]; Fig. 1, 14; Kelly).
Kelly does not expressly teach “one or more processors; and a non-transitory machine-readable storage medium storing instructions that when executed by the one or more processors, cause the one or more processors to perform operations including.”
However, Ramakrishnan does teach one or more processors (Fig. 1, 101; Ramakrishnan); and a non-transitory machine-readable storage medium storing instructions that when executed by the one or more processors, cause the one or more processors to perform operations including (Fig. 1, 102-103; Ramakrishnan).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the sign recognition application (as taught by Kelly) with the greenhouse emission advising application (as taught by Ramakrishnan). The rationale to do so is to combine prior art elements according to known methods to yield the predictable result of utilizing sign language to communicate an issue with the software or hardware in a device in a video conference environment.
Regarding claim 9, Kelly, in view of Ramakrishnan, discloses the system of claim 8, wherein the new communication session is between a user device (Fig. 1, 11; Kelly) and a terminal device (Fig. 1, 14; Kelly).
Regarding claim 10, Kelly, in view of Ramakrishnan, discloses the system of claim 8, wherein the particular issue is associated with a hardware or software fault in a device operated by a user (Fig. 3, 312; Ramakrishnan).
Regarding claim 11, Kelly, in view of Ramakrishnan, discloses the system of claim 8, wherein the neural network is an ensemble network comprising two or more neural networks configured to generate outputs of different types (“In the processing module 202, the feature train is split into each individual sign via the sign-splitting component 209 via a 1D Convolutional Neural Network which highlights the sign transition periods.” Kelly [0028]).
Regarding claim 12, Kelly, in view of Ramakrishnan, discloses the system of claim 8, wherein the neural network is configured to generate a boundary box over the object (“These results are combined to find the bounding box of both the dominant and non-dominant hand by iterating through all bounding boxes found from 205 and finding the one closest to each wrist joint produced by 206.” Kelly [0023]).
Regarding claim 13, Kelly, in view of Ramakrishnan, discloses the system of claim 8, wherein the neural network is configured to generate a predicted identification of the object (Fig. 4, 407-408; Kelly).
Regarding claim 14, Kelly, in view of Ramakrishnan, discloses the system of claim 8, wherein the neural network is configured to generate a predicted identification of the object (Fig. 4, 407-408; Kelly).
Regarding claim 15, Kelly discloses extracting one or more video frames from video streams of a set of communication sessions (“a sequence of these per-frame features” Kelly [0019]), wherein the video frames include a representation of an object (Fig. 5, 53; Kelly), and wherein the object is associated with an issue for which the communication session is established (“sign language information” Kelly [0019]); defining a training dataset from the one or more video frames and features extracted from the set of communication sessions (Fig. 3; Kelly); training a neural network using the training dataset, the neural network being configured to generate predictions of actions associated with the object (Fig. 4, 407-408; Kelly); extracting a video frame from a new video stream of a new communication session, wherein the video frame includes a representation of a particular object, and wherein the object is associated with a particular issue (“In real time, or after the signing is completed, the sign language information is sent to 12, which extracts out features (e.g. body pose keypoints, hand keypoints, hand pose, thresholded image, etc. . . . )” Kelly [0019]); executing the neural network using the video frame from the new video stream, wherein the neural network generates a predicted action associated with the particular object (“The features produced by 12 are then transmitted to component 13 which extracts sign language information (e.g. detecting if an individual is signing, transcribing that signing into gloss, or translating that signing into a target language) from a sequence of these per-frame features.” Kelly [0019]); and facilitating a transmission of a communication to a device of the new communication session, the communication including a representation of the predicted action (“Finally, the output is displayed on 14.” Kelly [0019]; Fig. 1, 14; Kelly).
Kelly does not expressly teach “a non-transitory machine-readable storage medium storing instructions that when executed by one or more processors, cause the one or more processors to perform operations including.”
However, Ramakrishnan does teach a non-transitory machine-readable storage medium storing instructions that when executed by one or more processors, cause the one or more processors to perform operations including (Fig. 1, 186-187; Ramakrishnan).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the sign recognition application (as taught by Kelly) with the greenhouse emission advising application (as taught by Ramakrishnan). The rationale to do so is to combine prior art elements according to known methods to yield the predictable result of utilizing sign language to communicate an issue with the software or hardware in a device in a video conference environment.
Regarding claim 16, Kelly, in view of Ramakrishnan, discloses the non-transitory machine-readable storage medium of claim 15, wherein the new communication session is between a user device (Fig. 1, 11; Kelly) and a terminal device (Fig. 1, 14; Kelly).
Regarding claim 17, Kelly, in view of Ramakrishnan, discloses the non-transitory machine-readable storage medium of claim 15, wherein the particular issue is associated with a hardware or software fault in a device operated by a user (Fig. 3, 312; Ramakrishnan).
Regarding claim 18, Kelly, in view of Ramakrishnan, discloses the non-transitory machine-readable storage medium of claim 15, wherein the neural network is an ensemble network comprising two or more neural networks configured to generate outputs of different types (“In the processing module 202, the feature train is split into each individual sign via the sign-splitting component 209 via a 1D Convolutional Neural Network which highlights the sign transition periods.” Kelly [0028]).
Regarding claim 19, Kelly, in view of Ramakrishnan, discloses the non-transitory machine-readable storage medium of claim 15, wherein the neural network is configured to generate a boundary box over the object (“These results are combined to find the bounding box of both the dominant and non-dominant hand by iterating through all bounding boxes found from 205 and finding the one closest to each wrist joint produced by 206.” Kelly [0023]).
Regarding claim 20, Kelly, in view of Ramakrishnan, discloses the non-transitory machine-readable storage medium of claim 15, wherein the neural network is configured to generate a boundary box over the object (“These results are combined to find the bounding box of both the dominant and non-dominant hand by iterating through all bounding boxes found from 205 and finding the one closest to each wrist joint produced by 206.” Kelly [0023]).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SAAD AHMED SYED whose telephone number is (571) 272-6777. The examiner can normally be reached Monday - Friday 8:30 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Duc Nguyen, can be reached at (571) 272-7503. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SAAD AHMED SYED/Examiner, Art Unit 2691
/DUC NGUYEN/Supervisory Patent Examiner, Art Unit 2691