DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Interpretation - 35 USC § 112(f)
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
Use of the word “means” (or “step for”) in a claim with functional language creates a rebuttable presumption that the claim element is to be treated in accordance with 35 U.S.C. 112(f) (pre-AIA 35 U.S.C. 112, sixth paragraph). The presumption that 35 U.S.C. 112(f) (pre-AIA 35 U.S.C. 112, sixth paragraph) is invoked is rebutted when the function is recited with sufficient structure, material, or acts within the claim itself to entirely perform the recited function.
Absence of the word “means” (or “step for”) in a claim creates a rebuttable presumption that the claim element is not to be treated in accordance with 35 U.S.C. 112(f) (pre-AIA 35 U.S.C. 112, sixth paragraph). The presumption that 35 U.S.C. 112(f) (pre-AIA 35 U.S.C. 112, sixth paragraph) is not invoked is rebutted when the claim element recites function but fails to recite sufficiently definite structure, material or acts to perform that function.
Claim elements in this application that use the word “means” (or “step for”) are presumed to invoke 35 U.S.C. 112(f) except as otherwise indicated in an Office action. Similarly, claim elements that do not use the word “means” (or “step for”) are presumed not to invoke 35 U.S.C. 112(f) except as otherwise indicated in an Office action.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: “encoder blocks are configured to process,” “bottleneck blocks are configured to process,” “decoder blocks are configured to process,” “a convolution block configured to process,” “concatenation block configured to concatenate,” and “upsample block configured to upsample,” in claims 15-20.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-7 and 9-11 are rejected under 35 U.S.C. 102(a)(1) and/or 102(a)(2) as being anticipated by Lu et al. (US 20190130229 A1, hereinafter Lu).
Regarding claim 1, Lu discloses a computing system (figs. 1, 7) for web-based video segmentation (components 702-720 of the deep salient object segmentation system 110 may be implemented as one or more web-based applications hosted on a remote server, fig. 7, ¶0143; the deep salient object segmentation system 110 can provide a real-time salient content neural network and/or a static salient content neural network to the client device 104a (i.e., a mobile device) as part of a digital image editing application installed on the client device 104a, ¶0054, fig. 1. As apparent from fig. 1 and ¶0054, the neural network is provided to client device 104a through the web, since system 102 resides on a server; see also that system 102 is a web-hosting server, ¶0050 & ¶0052), the system comprising:
one or more processors (1002, fig. 10); and
one or more non-transitory computer-readable media (1004, 1006, ¶0166-0168, fig. 10) that collectively store instructions that, when executed by the one or more processors, cause the computing system to perform operations, the operations comprising (¶0166-0168):
obtaining image data (utilize the client device 104a to capture a real-time digital visual media feed (e.g., a live video feed from a smartphone camera), ¶0055) from a user computing device (client device 104a), wherein the image data comprises an object (salient object in ¶0055, also see fig. 4);
implementing, by one or more processors of the user computing device, a video conferencing web application (¶0054 and ¶0143; for video conferencing see ¶0091: video call), wherein implementing the video conferencing web application comprises:
processing, by the one or more processors of the user computing device, the image data with a machine-learned image segmentation model to generate segmentation data (¶0054, ¶0091);
generating augmented image data based at least in part on the segmentation data (¶0091, last sentence; the deep salient object segmentation system 110 can modify the segmented real-time digital visual media feed (similar to modifying the segmented digital image at the act 318), ¶0089 in connection with ¶0087 describing act 318. Whether the modified video is regarded as “augmented” is a question of characterization rather than a technical one, and the processing in Lu is reasonably understood to be performed with the intention of producing a “better,” and thus “augmented,” video. See also claim 7 of the application: background replacement is disclosed in Lu at ¶0087, last sentence, and is also applied to video, see ¶0089 as cited above); and
providing the augmented image data for display (¶0082, first sentence).
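By way of illustration only, the mapped obtain-segment-augment-display flow may be sketched as follows. This is a minimal, hypothetical PyTorch-style sketch; the stand-in model and all names (e.g., augment_frame) are illustrative assumptions and are not drawn from Lu or from the claims as filed.

```python
# Hypothetical sketch only: segment one captured frame and composite the
# foreground object over a replacement background (cf. claims 1 and 7).
import torch
import torch.nn as nn

# Stand-in for a machine-learned image segmentation model: any module mapping
# an RGB frame (1, 3, H, W) to a single-channel foreground probability mask.
model = nn.Sequential(nn.Conv2d(3, 1, kernel_size=3, padding=1), nn.Sigmoid())

def augment_frame(model: nn.Module, frame: torch.Tensor,
                  background: torch.Tensor) -> torch.Tensor:
    """Generate segmentation data for `frame`, then generate augmented image data."""
    with torch.no_grad():
        mask = model(frame)                     # (1, 1, H, W) probabilities
    mask = (mask > 0.5).float()                 # binary segmentation mask
    return mask * frame + (1.0 - mask) * background  # augmented image data

frame = torch.rand(1, 3, 64, 64)        # stand-in for one captured video frame
background = torch.zeros(1, 3, 64, 64)  # stand-in artificial background
augmented = augment_frame(model, frame, background)  # provided for display
print(augmented.shape)  # torch.Size([1, 3, 64, 64])
```

For a video feed (cf. claim 3), the same call would simply be repeated for each frame to yield a plurality of segmentation masks.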
Regarding claim 2, Lu discloses the system of claim 1, wherein the operations further comprise:
transmitting a request to utilize an image augmentation service associated with the video conferencing web application; and obtaining a software package from a server computing system based on the request, wherein the software package comprises the machine-learned image segmentation model (¶0054).
Regarding claim 3, Lu discloses the system of claim 1, wherein the image data comprises a video associated with a video conference call, wherein each frame of the video is processed by the machine-learned image segmentation model to generate a plurality of segmentation masks, and wherein the augmented image data comprises an augmented video generated based on the plurality of segmentation masks (¶0091 & fig. 2c, probability (e.g., foreground pixel mask)).
Regarding claim 4, Lu discloses the system of claim 1, wherein the operations further comprise: sending the augmented image data to a second user computing device associated with a second user (¶0091, last sentence).
Regarding claim 5, Lu discloses the system of claim 1, wherein the segmentation data comprises a segmentation mask associated with the object (fig. 2c).
Regarding claim 6, Lu discloses the system of claim 5, wherein the segmentation mask is descriptive of a plurality of pixels associated with a human, wherein the object comprises a human (figs. 2c & 4).
Regarding claim 7, Lu discloses the system of claim 1, wherein the augmented image data is descriptive of the object with an artificial background (¶0091 & ¶0089 in connection with ¶0087, last sentence).
Regarding claim 9, Lu discloses the system of claim 1, wherein the object comprises at least a portion of a human (fig. 4).
Regarding claim 10, Lu discloses the system of claim 1, wherein the image data is obtained as part of a video conference service provided by the video conferencing web application, wherein the video conference service comprises: sending the augmented image data to a second user computing device; receiving second user image data from the second user computing device; and providing the second user image data for display (¶0091; receiving and displaying video from a remote user is implicit to the term “video call”).
Regarding method claim 11, although the wording is different, the subject matter is substantively equivalent to that of system claim 1 and is rejected for the same reasons described above.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 8 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Lu in view of Zingade et al. (US 20220237735 A1, hereinafter Zingade).
Regarding claim 8, Lu discloses the system of claim 1, but does not expressly disclose wherein the image data is captured by a webcam associated with the user computing device.
However, Zingade discloses techniques to automatically detect and enlarge a speaking one of a plurality of participants on one side of a video conference, wherein the speaking participant is identified using one or more heuristics and/or one or more neural networks (Abstract). A web camera is used to capture images of the participants in the teleconference (¶0002-0003, ¶0054, ¶0073).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention (AIA) to modify the invention of Lu such that the image-capturing camera is a web camera, as disclosed by Zingade, because combining prior art elements according to known methods to yield predictable results is obvious.
Regarding claim 12, Lu discloses the method of claim 11, wherein processing, by the user computing system, the image data with the machine-learned image segmentation model to generate the segmentation data comprises processing utilizing a processing unit of a user computing device of the user computing system (see processors in ¶0142, figs. 7-8; see also the claim 1 rejection above, Abstract, ¶0050-0054, ¶0143).
Lu is not found to expressly disclose that the processing unit is a graphics processing unit.
However, Zingade discloses techniques to automatically detect and enlarge a speaking one of a plurality of participants on one side of a video conference, wherein the speaking participant is identified using one or more heuristics and/or one or more neural networks (Abstract). Zingade further discloses that a machine learning segmentation task is performed in a computing system having a GPU therein (¶0527-0529, fig. 37).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention (AIA) to implement the machine-learned image segmentation model such that the segmentation data is generated by processing utilizing a graphics processing unit of a user computing device of the user computing system, as disclosed by Zingade, because combining prior art elements according to known methods to yield predictable results is obvious.
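By way of illustration only, a minimal hypothetical sketch of performing the segmentation inference on a graphics processing unit when one is available (PyTorch-style; the stand-in model is an illustrative assumption, not drawn from Lu or Zingade):

```python
# Hypothetical sketch only: move a stand-in segmentation model and frame to a
# GPU when one is available, then compute the segmentation data there.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Sequential(nn.Conv2d(3, 1, kernel_size=3, padding=1),
                      nn.Sigmoid()).to(device)
frame = torch.rand(1, 3, 64, 64, device=device)  # stand-in captured frame
with torch.no_grad():
    mask = model(frame)  # segmentation data computed on the GPU (or CPU fallback)
```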
Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Lu in view of Zingade and further in view of Smith et al. (US 20200098161 A1, hereinafter Smith).
Regarding claim 13, Lu discloses the method of claim 11, wherein processing, by the user computing system, the image data with the machine-learned image segmentation model to generate the segmentation data comprises accessing the one or more processors (see processors in ¶0142, figs. 7-8; see also the claim 1 rejection above, Abstract, ¶0050-0054, ¶0143).
Lu is not found to expressly disclose that accessing the one or more processors is done via a web browser application programming interface.
However, Zingade discloses techniques to automatically detect and enlarge a speaking one of a plurality of participants on one side of a video conference, wherein the speaking participant is identified using one or more heuristics and/or one or more neural networks (Abstract). Zingade further discloses that the one or more HW processors 3622 are accessed to set up a model registry 3624 of training system 36604 via an API (¶0517, ¶0521, fig. 37).
Zingade, however, is not found to expressly disclose that the API is web browser based.
Smith, however, discloses that a GPU is accessed through WebGL APIs running on a web browser (¶0001).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention (AIA) to implement the model training of Lu using an API of Zingade to transfer the training system registry elements, wherein the API runs on a web browser as disclosed by Smith, such that accessing the one or more processors is done via a web browser application programming interface, because combining prior art elements according to known methods to yield predictable results is obvious.
Allowable Subject Matter
Claims 15-20 are allowed.
Regarding claim 15, Lu discloses one or more non-transitory computer-readable media (fig. 10: memory 1004, storage 1006) that collectively store instructions that, when executed by one or more computing devices, cause the one or more computing devices to perform operations (first sentence ¶0166, and ¶0168), the operations comprising:
obtaining input data, wherein the input data comprises image data associated with a user, and wherein the image data comprises an object in a scene (utilize the client device 104a to capture a real-time digital visual media feed (e.g., a live video feed from a smartphone camera), ¶0055 and salient object; see also fig. 4);
processing the input data with a machine-learned image segmentation model to generate segmentation data (¶0054, 0091).
Lu is not found to expressly disclose the limitation wherein the machine-learned image segmentation model comprises: one or more encoder blocks, wherein the one or more encoder blocks are configured to process the image data to generate an encoder output, and wherein the one or more encoder blocks comprise: a channel expansion block; a depthwise convolution block; and a channel compression block; one or more bottleneck blocks, wherein the one or more bottleneck blocks are configured to process the encoder output to generate a bottleneck output;
one or more decoder blocks, wherein the one or more decoder blocks are configured to process the bottleneck output to generate the segmentation data; and in response to processing the input data with the machine-learned image segmentation model, generating output data based on the segmentation data, wherein the output data comprises augmented image data, wherein the augmented image data is descriptive of the object without the scene.
Machine-learned image segmentation models using an encoder-decoder architecture with a bottleneck are well known in the art. For example, Goren (US 20200327334 A1) discloses that segmenting neural network 108 may be a SegNet deep fully convolutional neural network having 10,000-100,000 parameters trained to perform semantic pixel-wise segmentation on reduced-resolution input (¶0017).
Therefore, a basic encoder-decoder structure is indeed disclosed in the art as it pertains to segmentation of image data, even for a video conferencing session. However, the detailed structure, at least as it pertains to the channel expansion block, the channel compression block, and the bottleneck block, is not expressly disclosed in Goren. Therefore, the prior art of record, taken alone or in combination, fails to reasonably disclose or suggest: encoder blocks comprising a channel expansion block and a channel compression block; one or more bottleneck blocks, wherein the one or more bottleneck blocks are configured to process the encoder output to generate a bottleneck output; one or more decoder blocks, wherein the one or more decoder blocks are configured to process the bottleneck output to generate the segmentation data; and, in response to processing the input data with the machine-learned image segmentation model, generating output data based on the segmentation data, wherein the output data comprises augmented image data, wherein the augmented image data is descriptive of the object without the scene.
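By way of illustration only, the claimed block structure may be sketched as follows. This is a minimal, hypothetical PyTorch-style sketch; the class, layer, and parameter names (and the expansion factor) are illustrative assumptions and are not drawn from the specification or from Goren.

```python
# Hypothetical sketch only: an encoder block with channel expansion, depthwise
# convolution, and channel compression; a bottleneck; and a decoder with the
# upsample, concatenation, and convolution blocks noted in the claim
# interpretation above.
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Channel expansion -> depthwise convolution -> channel compression."""
    def __init__(self, in_ch: int, out_ch: int, expand: int = 4):
        super().__init__()
        mid = in_ch * expand
        self.expansion = nn.Conv2d(in_ch, mid, kernel_size=1)     # channel expansion block
        self.depthwise = nn.Conv2d(mid, mid, kernel_size=3, stride=2,
                                   padding=1, groups=mid)         # depthwise convolution block
        self.compression = nn.Conv2d(mid, out_ch, kernel_size=1)  # channel compression block
        self.act = nn.ReLU()

    def forward(self, x):
        return self.compression(self.act(self.depthwise(self.act(self.expansion(x)))))

class DecoderBlock(nn.Module):
    """Upsample, concatenate with a skip connection, then convolve."""
    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        self.upsample = nn.Upsample(scale_factor=2, mode="bilinear",
                                    align_corners=False)          # upsample block
        self.conv = nn.Conv2d(in_ch + skip_ch, out_ch,
                              kernel_size=3, padding=1)           # convolution block

    def forward(self, x, skip):
        x = self.upsample(x)
        x = torch.cat([x, skip], dim=1)                           # concatenation block
        return torch.relu(self.conv(x))

class SegmentationModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = EncoderBlock(3, 16)
        self.enc2 = EncoderBlock(16, 32)
        self.bottleneck = nn.Conv2d(32, 32, kernel_size=3, padding=1)  # bottleneck block
        self.dec1 = DecoderBlock(32, 16, 16)
        self.head = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(16, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, x):
        e1 = self.enc1(x)                    # encoder output (kept as skip connection)
        e2 = self.enc2(e1)
        b = torch.relu(self.bottleneck(e2))  # bottleneck output
        d = self.dec1(b, e1)                 # decoder processes the bottleneck output
        return self.head(d)                  # segmentation data (foreground mask)

mask = SegmentationModel()(torch.rand(1, 3, 64, 64))
print(mask.shape)  # torch.Size([1, 1, 64, 64])
```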
Claims 16-20 are allowable for being dependent on allowable claim 15.
Claim 14 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:
Regarding claim 14, the prior art of record, taken alone or in combination, fails to reasonably disclose or suggest:
storing, by the user computing system, the software package; accessing, by the user computing system, a web page associated with the web application; obtaining, by the user computing system, additional image data; processing, by the user computing system, the additional image data with the machine-learned image segmentation model to generate additional segmentation data; and generating, by the user computing system, additional augmented image data based at least in part on the additional segmentation data.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NURUN FLORA whose telephone number is (571)272-5742. The examiner can normally be reached M-F 9:30 am -5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jason Chan can be reached at (571) 272-3022. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/NURUN FLORA/Primary Examiner, Art Unit 2619