DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-2, 4-6, 8-11, 13-15, 17-18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Bikumala in view of Mireles OR Allegretti, further in view of Cheng et al. (US 2020/0401812) OR Teplitsky et al. (US 9,582,406), and further in view of Pauli et al. (US 2023/0154166).
Regarding claims 1, 10 and 17, Bikumala teaches a method, a medium and a system, comprising:
receiving, by a server of a unified communications platform for multimedia communication between clients, an identifier of a video frame of a video conference from a client device; (video frames 112 with timestamp 126, associated with a sentiment 128, [0020-0025]);
obtaining, by the server, a time-contiguous set of video frames based on the identifier; (timestamp: [0012, 0016-0019, 0022-0025]);
extracting, using an image data extraction engine executed by the server, standardized image data from the time-contiguous video frames using an acquisition frequency limiter that adjusts a capture rate based on a processing speed of the server; (Bikumala does not teach this feature. Cheng teaches: “During the detection and recognition of a target in a real-time video, the video needs to be split into image frames, and a target is then detected and recognized based on a single frame of image. From the perspective of a discernible frequency limit of naked eyes, human eyes can no longer further discern any difference when a video frame rate exceeds 30 frames per second. Therefore, currently the video frame rate is usually set to 30 frames per second or less”, [0060]. Teplitsky teaches: “Another example for a constraint may be that any video streaming over the Ethernet controller 106 or Ethernet itself may not exceed certain number of frames per seconds (e.g., 24 frames per second—FPS). Such constraint may act for example on Ethernet controller 106 or video controller 108. If a video stream (with a frame rate of 24 frames per second) is to be streamed through the system, then through process-of-elimination, as the Ethernet controller cannot be implicated, it must be that either the video resides on the DUT in RAM 104 or the video must come from the camera controller 112 (provided that no other relevant constraint limits this frequency of display). A user, for example, may specify a scenario such as: ‘play high frequency video at 30 frames per second.’ The frame rate constraint would inform that such a video cannot be captured through the Ethernet controller 106 but must instead either come from RAM 104 or camera controller 114 according to the system topology or DUT representation”, col. 6, lines 30-55.)
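For illustration only, the following is a minimal sketch of an acquisition frequency limiter of the kind suggested by Cheng and Teplitsky: it caps the capture rate (e.g., at 30 frames per second, per Cheng) and stretches the capture interval when per-frame processing on the server is slow. All names are hypothetical and the code is not taken from any cited reference.

```python
import time

class AcquisitionFrequencyLimiter:
    """Caps frame capture at a maximum rate (e.g., 30 fps per Cheng) and
    stretches the capture interval when per-frame processing is slow."""

    def __init__(self, max_fps=30.0):  # hypothetical default, per Cheng's 30 fps observation
        self.max_fps = max_fps
        self.interval = 1.0 / max_fps

    def update(self, processing_seconds):
        # If the server spends longer than one capture interval processing a
        # frame, lower the effective capture rate to match processing speed.
        self.interval = max(1.0 / self.max_fps, processing_seconds)

    def next_capture_delay(self, elapsed_since_last_capture):
        # Seconds to wait before grabbing the next frame.
        return max(0.0, self.interval - elapsed_since_last_capture)

limiter = AcquisitionFrequencyLimiter(max_fps=30.0)
start = time.monotonic()
# ... extract standardized image data from one frame here ...
limiter.update(processing_seconds=time.monotonic() - start)
```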
computing, for each frame in at least a subset of the time-contiguous set of video frames, a score corresponding to a likelihood of having a specified feature; (timestamp of a frame with a sentiment, e.g., happy, sad, disgust, confused, and the like, OR a neutral sentiment, a surprise sentiment, a fear sentiment, a disgust sentiment, an angry sentiment, a happy sentiment, a sad sentiment, or a contempt sentiment, [0012, 0014, 0016-0025]).
While Bikumala does not explicitly detail a “score”, he teaches “pre-specified” sentiments such as “a surprise sentiment, a fear sentiment, a disgust sentiment, an angry sentiment, or a contempt sentiment, [0019]”, where a particular sentiment is mapped or matched to the pre-specified sentiment based on the identification of a portion of the media content at a specific timestamp, [0016, 0018]. The examiner reads Bikumala as suggesting that this determination is a score, since it is utilized to determine the appropriate sentiment (a surprise sentiment, a fear sentiment, a disgust sentiment, an angry sentiment, or a contempt sentiment). If not an express suggestion, it is at least obvious in view of the ordinary understanding of the ordinary artisan. To support this position, the examiner provides the following additional references, with an illustrative sketch after the Allegretti citation below:
Mireles: [0025-0029]: initialize a first face model containing an initial set of coefficients (or “weights”) OR a set of conditions to obtain the photorealistic feed. See Fig. 3 and [0058-0060, 0062] for the description of weights or conditions, or a target set of facial landmarks. See [0064-0070] for baseline coefficients. Also see [0093] for a target weight.
Allegretti: see Tables 1 and 2, [0078-0083]…
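For illustration only, a minimal sketch of treating a per-frame sentiment determination as a score, as the examiner reads Bikumala to suggest: each frame receives class probabilities over the pre-specified sentiments, and the probability of the specified feature serves as that frame's score. The names and the softmax formulation are hypothetical, not taken from any cited reference.

```python
import numpy as np

# Hypothetical list of pre-specified sentiments, following Bikumala's examples.
SENTIMENTS = ["neutral", "surprise", "fear", "disgust", "angry", "happy", "sad", "contempt"]

def score_frame(logits, target="happy"):
    """Convert one frame's raw model outputs into a likelihood score
    for the specified feature (here, a target sentiment)."""
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    probs = exp / exp.sum()
    return float(probs[SENTIMENTS.index(target)])

# Example: raw outputs over the eight sentiment classes for one frame.
logits = np.array([0.1, 0.3, 0.0, 0.2, 0.1, 2.5, 0.4, 0.0])
print(score_frame(logits, target="happy"))  # "happy" dominates, so the score is high
```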
by applying, to the standardized image data, a pre-trained neural network model configured by a model configuration file specifying weights; (Bikumala does not teach the feature. Pauli teaches: performance analyzer 316 may provide a command 340 to image selector 318. Command 340 may specify the average classification performance score. Image selector 318 may utilize a weighting scheme, where images 310 comprising such items of interest are weighted more, thereby increasing the likelihood that such images are selected for training, [0040-0041]).
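For illustration only, a minimal sketch of configuring a pre-trained model from a configuration file that specifies weights, in the general manner the claim recites. The file name and JSON layout are hypothetical and not taken from Pauli.

```python
import json
import numpy as np

def load_model_config(path):
    """Read a model configuration file that names the architecture and
    specifies the weight values to load into a pre-trained network."""
    with open(path) as f:
        config = json.load(f)
    weights = {name: np.array(values) for name, values in config["weights"].items()}
    return config["architecture"], weights

# Hypothetical file contents: {"architecture": "small_cnn", "weights": {"fc1": [[...]]}}
# architecture, weights = load_model_config("model_config.json")
```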
determining, based on the computed scores, a frame having a highest likelihood of having the specified feature; (See the previous step); and
generating, for storage in a data repository, an image based on the determined frame. (Bikumala: the server may store a sentiment map of the media content; the sentiment map may identify, based on the timestamp, a portion (e.g., a scene or a chapter) of the media content and a sentiment associated with the portion, [0030]. Mireles: store these types and/or magnitudes of facial muscle actions, expressions, and/or emotions in any other format in the facial expression container, [0053]).
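For illustration only, a minimal sketch of the last two limitations: determining the frame with the highest computed score and generating an image from it for storage. The helper names and repository path are hypothetical, and OpenCV (opencv-python) is assumed only for writing the image.

```python
import cv2  # opencv-python is assumed to be installed; used only to write the image

def best_frame(frames, scores):
    """Return the frame whose computed score (likelihood of the
    specified feature) is highest in the time-contiguous set."""
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return frames[best_index]

def store_image(frame, repository_path="repository/best_frame.png"):
    # Persist the determined frame to the data repository as an image file.
    cv2.imwrite(repository_path, frame)
```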
Therefore it would have been obvious to the ordinary artisan before the effective filing date to incorporate the teaching of Mireles or Allegretti for the purpose of explicitly presenting a target, threshold or score to accurately determine the content of the communication at a specific time (e.g., identifier/timestamp) with a specific feature (facial feature, human emotion or sentiment).
Claims 2, 11 and 18. The method of claim 1, wherein the time-contiguous set of video frames is from a camera-generated video stream of the client device in the video conference. (Bikumala: camera may capture micro-expressions associated with the users and associate a time stamp with each micro-expression. [0014]).
Claim 4. The method of claim 1, wherein the identifier comprises a timestamp. (See the independent claims).
Claim 5. The method of claim 1, comprising: providing, to the client device, access to the image in the data repository. (Bikumala: the server 104 may use a database 140 to store multiple media content items 132(1) to 132(M) (M>0) and associated sentiment maps 136(1) to 136(M), respectively, [0024]; for user 150 viewing, see [0022, 0025]. Allegretti: not just the user of a device can access the content, but others as well, i.e., “processing the video file on a server according to a set of processing parameters, 4) uploading the processed video file to a storage location, and 5) sending a location (e.g. URL) of the finished processed video file to a second user, wherein the second user can access and view the video at the location provided”, [0055], including an administrator, [0116, 0126]).
Claims 6 and 15. The method of claim 1, comprising: providing, to an administrator device different from the client device, access to the image in the data repository. (See claim 5 above).
Claim 8. The method of claim 1, wherein computing the score and determining the frame occur at a time when demand for the server is below a threshold demand level. (Mireles: the image falls below a threshold difference, [0024, 0062, 0072]).
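For illustration only, a minimal sketch of gating the scoring work on server demand, as claim 8 recites. The load-average proxy and threshold are hypothetical (and Unix-specific), not taken from the cited references.

```python
import os

DEMAND_THRESHOLD = 0.5  # hypothetical: fraction of CPU capacity in use

def demand_is_low():
    """Rough demand proxy: 1-minute load average normalized by CPU count.
    os.getloadavg() is available on Unix-like systems only."""
    one_minute_load, _, _ = os.getloadavg()
    return one_minute_load / os.cpu_count() < DEMAND_THRESHOLD

# Scoring and frame determination would be deferred until demand_is_low() is True.
```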
Claim 9. The method of claim 1, wherein the server receives multiple identifiers, including the identifier, wherein the server determines, for each of the multiple identifiers, a corresponding frame having the highest likelihood of having the specified feature, the method further comprising: selecting, from among the corresponding frames, a first frame based on the likelihood of having the specified feature, wherein the generated image for storage in the data repository corresponds to the first frame. (Bikumala: see Fig. 2 for sentiment mapping with each frame and each timestamp).
Claim 13. The non-transitory computer readable medium of claim 10, wherein the identifier comprises a frame identification number. (See claim 9 OR Bikumala: see Fig. 2 for sentiment mapping with each frame and each timestamp).
Claim 14. The non-transitory computer readable medium of claim 10, the operations comprising: providing, to a device different from the client device, access to the image in the data repository. (See claims 5 and 6).
Claim 20. The system of claim 17, wherein the identifier comprises at least one of a timestamp or a frame identification number. (See claim 9 OR Bikumala: see Fig. 2 for sentiment mapping with each frame and each timestamp).
Claim(s) 3, 12 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Bikumala in view of Mireles OR Allegretti, further in view of Cheng et al. (US 2020/0401812) OR Teplitsky et al. (US 9,582,406), further in view of Pauli et al. (US 2023/0154166), and further in view of Muraoka et al. (US 2018/0330501) OR Namose et al. (US 2023/0237843).
Claims 3, 12 and 19. Bikumala does not teach “the time-contiguous set of video frames includes m frames before a video frame associated with the identifier and n frames after the video frame associated with the identifier, wherein m and n are positive integers”.
Muraoka teaches, [0072], “the bone extraction result (labeling) are compared among corresponding pixels for each pixel of a certain target frame image. The compared frame images include m-frame images (m is a positive integer) successive before the target frame image (capturing order is before), n-frame images (n is a positive integer) successive after the target frame image (capturing order is after), and the target frame image, that is m+n+1 frame images”. OR
Namose teaches, “[0052] Operation section detector 103 detects an operation section to which each frame of the moving image obtained from camera 40 belongs. Specifically, operation section detector 103 inputs a predetermined number of consecutive frames including a frame (hereinafter referred to as “target frame”) from which an operation section is to be detected to inference model 133. For example, a predetermined number (m+n+1) of frames including m consecutive frames before the target frame, the target frame, and n consecutive frames after the target frame are input to inference model 133. Operation section detector 103 detects the operation section indicated by the label output from inference model 133 as the operation section to which the target frame belongs”.
Therefore it would have been obvious to the ordinary artisan before the effective filing date to incorporate the teaching of Muraoka or Namose into the teaching of Bikumala for the purpose of precisely labeling the frames to identify exactly the target frame matching the region of interest.
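For illustration only, a minimal sketch of gathering the m+n+1 frame window described by Muraoka and Namose around a target frame. Names are hypothetical, and clipping at the stream boundaries is one possible design choice.

```python
def frame_window(frames, target_index, m, n):
    """Return the m frames before the target, the target frame itself,
    and the n frames after it: m + n + 1 frames in total."""
    start = max(0, target_index - m)              # clip at the start of the stream
    end = min(len(frames), target_index + n + 1)  # clip at the end of the stream
    return frames[start:end]

frames = list(range(100))        # stand-in for decoded video frames
window = frame_window(frames, target_index=50, m=2, n=3)
assert len(window) == 2 + 3 + 1  # 6 frames: indices 48 through 53
```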
Claim(s) 7 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Bikumala in view of Mireles OR Allegretti, further in view of Cheng et al. (US 2020/0401812) OR Teplitsky et al. (US 9,582,406), further in view of Pauli et al. (US 2023/0154166), and further in view of Droz et al. (US 2012/0011454) OR Shionozaki et al. (US 2013/0142452).
Claims 7 and 16. Bikumala does not teach “generating the image comprises: identifying a foreground of the determined frame; and generating the image to include the foreground of the determined frame and a preset background different from a background of the determined frame”.
Droz teaches the feature; see Figs. 8-11. OR
Shionozaki teaches the feature in Figures 2 and 3.
Therefore it would have been obvious to the ordinary artisan before the effective filing date to incorporate the teaching of Droz or Shionozaki into the teaching of Bikumala for the purpose of presenting the communication with customized new imagery in which the foreground and a preset background are merged together.
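For illustration only, a minimal sketch of compositing an identified foreground over a preset background using a binary mask, in the general manner of claims 7 and 16. The mask and arrays are hypothetical stand-ins, not the method of Droz or Shionozaki.

```python
import numpy as np

def composite(frame, mask, preset_background):
    """Keep the foreground pixels of the determined frame (where mask is
    True) and fill everything else from the preset background."""
    return np.where(mask[..., None], frame, preset_background)

h, w = 4, 4
frame = np.full((h, w, 3), 200, dtype=np.uint8)   # stand-in for the determined frame
background = np.zeros((h, w, 3), dtype=np.uint8)  # preset background, same shape
mask = np.zeros((h, w), dtype=bool)
mask[1:3, 1:3] = True                             # hypothetical foreground region
image = composite(frame, mask, background)
```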
Response to Arguments
Applicant’s arguments with respect to the current claim(s) filed 12/23/25 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Applicant states/argues, “Independent claim 1 is amended to recite, among other things, (i) "extracting, using an image data extraction engine executed by the server, standardized image data from the time-contiguous video frames using an acquisition frequency limiter that adjusts a capture rate based on a processing speed of the server," and (ii) "applying, to the standardized image data, a pre-trained neural network model configured by a model configuration file specifying weights." None of the cited references discloses or suggests these features”. See Remarks, pages 6-8, for detail.
The examiner respectfully disagrees, as new references have been applied that address the applicant’s arguments.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PHUNG-HOANG J. NGUYEN whose telephone number is (571)270-1949. The examiner can normally be reached on a regular schedule, 6:00-3:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Duc Nguyen can be reached at 571-272-7503. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/PHUNG-HOANG J NGUYEN/Primary Examiner, Art Unit 2691