Prosecution Insights
Last updated: April 20, 2026
Application No. 18/543,832

CONTEXT-AWARE OBJECT INTERACTION FOR VIDEO CONFERENCE STREAM COMPOSITING

Status: Final Rejection (§103)
Filed: Dec 18, 2023
Examiner: NGUYEN, PHUNG HOANG JOSEPH
Art Unit: 2691
Tech Center: 2600 — Communications
Assignee: Intel Corporation
OA Round: 2 (Final)

Grant Probability: 79% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 2y 9m
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 79%, above average (694 granted / 877 resolved; +17.1% vs TC avg)
Interview Lift: +32.1% (strong), comparing resolved cases with vs. without an interview
Typical Timeline: 2y 9m average prosecution; 32 applications currently pending
Career History: 909 total applications across all art units
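As a quick check on how these figures fit together, here is a minimal arithmetic sketch in Python. It assumes the interview lift is expressed in percentage points and that the 99% figure is the allow rate among this examiner's interviewed resolved cases; the page implies but does not state this composition.

```python
# Sanity check of the examiner statistics above; values copied from this page.
granted, resolved = 694, 877
career_allow_rate = granted / resolved        # 0.7913 -> displayed as 79%

# Assumption: the +32.1% lift is in percentage points, so the implied
# allow rate without an interview is the interviewed rate minus the lift.
with_interview = 0.99
interview_lift = 0.321
without_interview = with_interview - interview_lift   # ~0.669

print(f"career allow rate: {career_allow_rate:.1%}")          # 79.1%
print(f"implied non-interview rate: {without_interview:.1%}") # 66.9%
```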

Statute-Specific Performance

Statute   Rate     vs TC Avg
§101      5.6%     -34.4%
§103      56.8%    +16.8%
§102      15.2%    -24.8%
§112      8.2%     -31.8%

Tech Center averages are estimates. Based on career data from 877 resolved cases.
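The original chart's Tech Center baseline did not survive extraction, but it can be recovered from the deltas. A small sketch, assuming each "vs TC Avg" figure is the examiner's rate minus the TC average in percentage points (a reading the labels suggest but do not confirm):

```python
# Recover the implied Tech Center average for each statute from the
# examiner's rate and the "vs TC Avg" delta shown in the table above.
stats = {"§101": (5.6, -34.4), "§103": (56.8, 16.8),
         "§102": (15.2, -24.8), "§112": (8.2, -31.8)}
for statute, (rate, delta) in stats.items():
    print(f"{statute}: implied TC avg = {rate - delta:.1f}%")  # 40.0% for all four
```

All four statutes imply the same 40.0% baseline, consistent with a single TC-average line on the original chart.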

Office Action

DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 U.S.C. § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-8, 10-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Shoss in view of Bae et al. (US 2025/0103141) or Hassan (US 2017/0103124), and further in view of Mori et al. (US 2023/0108419), Springer (US 2022/0353473), or Duquette (US 2024/0242380).

Claims 1 and 11: Shoss teaches a computing system and medium configured to perform video segmentation and processing operations, comprising: a memory device to store received video data; and processing circuitry configured to: obtain video data from a video data source, the video data depicting a human user and an object in a scene (Shoss: Figs. 6A and 6B: host 604 and objects 621, 622, and 685); obtain context data from at least one other data source that provides non-video data, the context data related to an interaction of the human user with the object (Shoss: Figs. 7A and 7B: a host individual picking up a product from a table, speech pertaining to the product (e.g., a host individual mentioning the product), eye gaze (e.g., a host individual looking at the product), and/or other activities, [0069-0070]); analyze the context data to determine a shape of the object and a type of the interaction of the human user with the object (Shoss: "to identify objects," [0035]; see [0033-0038] for more detail on the analysis of object and human interaction). Regarding context data, which can be "news and weather information, sports highlights, product information, reviews of products and services, product promotion, educational material, how-to videos, advertising, and more," [0008], or icons and company logos, [0051, 0069, 0070, 0076]. Regarding shape, which can be a "box" as in product 622, or a can shape as in product 621, which is a subject matter keyword that the host selects and discusses. Springer, via Fig. 10, utilizes speech-to-text to evaluate the textual data for keywords or character strings, [0120-0128], the keyword "zoom.com," [0105], or a keyword for a baseball, [0106]. Since both Shoss and Springer teach a capability to identify an object, it is obvious that it can also identify the shape of an object, i.e., every object in Shoss's Figs. 6B, 7A, and 7B, or a baseball (round, at least by obviousness) as in Springer, [0106]. However, Examiner wishes to provide additional data to support the obviousness, as Bae teaches that a keyword identifies an object having a unique shape; see Fig. 2.
Hassan teaches that "when the search keyword describes a shape of a visual pattern, the analysis program analyzes the content of the scene and lists the shape of each object that appears in the scene assigned with a location corresponding to each object's location" ([0032]).

The claims further recite: generate a video stream that includes a virtual background overlaid on the video data (Shoss: the virtual background can replace the actual background, [0009]); wherein the virtual background is modified to be segmented based on at least one outline of the human user, to remove the virtual background and cause the video stream to display portions of video data inside the at least one outline of the human user; and wherein the virtual background is modified to be further segmented based on the shape of the object and the type of the interaction of the human user with the object, to remove the virtual background and cause the video stream to display other portions of the video data inside the at least one outline of the shape of the object.

As presented in the previous action, Shoss teaches everything in the immediate step (Figs. 7A and 7B and [0066]) except the highlighted amendment. Shoss, via Figs. 7A and 7B, generates/composes a livestream video combining the outline or contour of the host individual's hand lifting up the objects/products 621 and 622, with the shape of a can or a box, as the host individual interacts (i.e., discusses/demonstrates the products), [0066]. Furthermore, Shoss teaches removing a virtual background and replacing it with a new background, [0058]. Again, Shoss does not teach the highlighted (underlined) amendment.

Mori teaches, via [0037]: "In order to increase the accuracy of the picking control model, it is necessary to let the picking control model learn an image for training which is as close to the real environment RE as possible. For this reason, it is conceivable to generate a virtual environment image showing a state within a photographing range of the virtual camera VC after making the virtual object VO a color close to the real object RO and making the virtual background VB a color close to the real background RB. Alternatively, for example, a portion of the virtual background VB in the virtual environment image may be replaced with a portion of the real background RB in the image taken by the camera 40."

Or Springer similarly teaches, via [0065]: "A preconfigured rule may also include negative operators to preclude when a virtual background is not to be used. For example, a user may have personal based virtual backgrounds that are used when video conferencing with family or friends, and may not want to have the virtual background used for company or business meetings. The user may identify conditions and/or parameters of when not to use a particular virtual background for certain meeting contextual information. In this instance, this would preclude use of the virtual background from being automatically selected for those do-not-use meeting situations. Also, the system may be configured to preclude the user from manually selecting the virtual background in such situations, or at least prompt the user noting the virtual background has been precluded for use for the particular meeting at hand, and then allowing the user to override the preconfigured rule if the user so desires."

Duquette, via Figs. 5B and 5D, shows the contours/masks of the virtual objects once the virtual background has been removed.
Therefore, it would have been obvious to the ordinary artisan before the effective filing date to incorporate the teaching of Bae or Hassan into the teaching of Shoss, for the purpose of expressly detailing the keyword associated with an item and its characteristic (i.e., shape) to enhance the communication, and also to incorporate the teaching of Mori, Springer, or Duquette into the teaching of Shoss, for the purpose of providing a tool to replace/remove the virtual background with proper real images for the communication.

Claims 2 and 12: wherein the context data includes audio data with speech from the human user, and wherein the processing circuitry is further configured to: perform speech-to-text conversion of the audio data to produce text (Shoss: speech-to-text, [0045, 0051], where the selection of the product can be based on information in an audio track 232; the information can include a combination of tones, and utterances and/or speech from a host individual; "The speech can be processed by a speech-to-text process for further analysis," [0045], where a product is selected and discussed by a host, [0011]); wherein the shape of the object is determined based on at least one keyword from the text (Springer, via Fig. 10, utilizes speech-to-text to evaluate the textual data for keywords or character strings, [0120-0128]; Springer further details that if a preconfigured rule includes the keyword "zoom.com" and the contextual information includes a reference to zoom.com, such as a user email address, then the virtual background boundary area 704 would be initiated and become active during the video meeting, and the system would display the associated virtual background (e.g., the graphical image of the word "ZOOM"), [0105]; or a keyword for a baseball, [0106]).

Claims 3 and 13: The computing system of claim 2, wherein the processing circuitry is further configured to: identify the object in the video data based on the at least one keyword from the text (Shoss: subject matter keyword, [0051]; the selection of the product can be based on information in an audio track 232, which can include a combination of tones and utterances and/or speech from a host individual; "The speech can be processed by a speech-to-text process for further analysis," [0045]).

Claims 4 and 14: The computing system of claim 2, wherein the shape of the object is provided from a database of pre-trained objects, and wherein a selection of the object from the database is performed using the at least one keyword from the text (Springer, via Fig. 10, utilizes speech-to-text to evaluate the textual data for keywords or character strings, [0120-0128]).

Claims 5 and 15: The computing system of claim 2, further comprising: a camera to capture the video data; and a microphone to capture the audio data (Shoss: some devices may include multiple cameras, including wide-angle, ultrawide, and telephoto lenses (i.e., camera 608 of Fig. 5), along with stereo microphones, [0005], to provide information which can include a combination of tones and utterances and/or speech from a host individual; the speech can be processed by a speech-to-text process for further analysis; in embodiments, defining the virtual background is based on the host individual's spoken words, [0045]).
Claims 6 and 16: further comprising a display device; wherein the processing circuitry is further configured to identify screen content output on the display device; and wherein the interaction of the human user with the object is ignored or identified based on the screen content (Shoss: the identification of the foreground object as a product can include performing optical character recognition on text imprinted on a foreground object, and/or implementing other image recognition techniques; further, the identification of the foreground object as a product can include scanning of an optical code such as a barcode that is imprinted on the product, [0026], or by motion of the product; for example, when a host individual picks up an object, the motion of the object can be detected and a virtual background can be defined and/or selected based on the motion of the object, [0027], on the screen of Figs. 7A and 7B).

Claims 7 and 17: further comprising a user input device; wherein the processing circuitry is further configured to identify a user input provided to the user input device, wherein the user input includes at least one of a keyboard, mouse, touch, or gesture input from the human user; and wherein the interaction of the human user with the object is ignored or identified based on the user input (Shoss: the insertion point can be based on an absolute time, a time interval, a change of subject matter, motion of a foreground object, spoken words of a host individual, motion/gestures of a host individual, and/or other criteria, [0039, 0040, 0066, 0072]).

Claims 8 and 18: wherein the processing circuitry is further configured to: analyze the context data to determine a plurality of candidate objects in the scene for interaction; and select the object from the plurality of candidate objects based on at least one other interaction performed by the human user related to the object (Shoss: Figs. 6A and 6B: products 621, 622, and 685, where the individual interacts with the products in Figs. 7A and 7B).

Claims 10 and 20: further comprising: communications circuitry to provide the video stream to another computing system in a video call or video conferencing session (Springer: see Fig. 9 for multiple participants at different locations; here the video stream is clearly distributed to more than one participating system).

Claims 9 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Shoss in view of Bae or Hassan, in view of Mori, Springer, or Duquette, and further in view of Lovemelt.

Claims 9 and 19: Shoss does not teach "perform video post-processing on the video data based on the shape of the object and the type of the interaction of the human user." Lovemelt teaches, via [0085]: "In addition to previously described features and functionality, the video processing system 1020 is depicted as compositing functionality 1026 and effects functionality 1027. In an example, the compositing functionality 1026 is adapted to process camera video streams (e.g., camera feeds 760, 765) from a NIR/Visible camera system (e.g., dual camera system 300), and create a matte (e.g., luma matte 780) and generate output image and video (e.g., image with alpha channel 785) from the two respective video streams. The effects functionality 1027 is also adapted to implement post-processing video effects on all or a portion of the video streams (e.g., with the addition of additional video objects or layers, the distortion of colors, shapes, or perspectives in the video, and any other number of other video changes)."
In a further example, the video processing system 1020 may operate as a server, to receive and process video data obtained from the video capture system 1030, and to serve video data output to the video input/output system 1010.

Therefore, it would have been obvious to the ordinary artisan before the effective filing date to incorporate the teaching of Lovemelt into the teaching of Shoss for the purpose of providing a post-processing effect in the compositing technique to enhance communication in the immersive video environment.

Response to Arguments

Applicant's arguments with respect to the current claims filed 1/16/26 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Please see the Remarks for detail of the applicant's arguments. Examiner respectfully disagrees, as Examiner has produced new references to address applicant's arguments.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to PHUNG-HOANG J. NGUYEN, whose telephone number is (571) 270-1949. The examiner can normally be reached on a regular schedule, 6:00-3:00. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Duc Nguyen, can be reached at 571-272-7503. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/PHUNG-HOANG J NGUYEN/
Primary Examiner, Art Unit 2691
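For orientation, the pipeline recited in independent claim 1 and the keyword limitations of claims 2-4 can be sketched in a few lines. This is a hypothetical illustration only, not code from the application or from Shoss, Springer, or any other cited reference; the shape database, function names, and binary-mask representation are all assumptions.

```python
import numpy as np

# Hypothetical keyword -> pre-trained shape database (claims 2-4 recite
# selecting an object shape from a database using speech keywords).
SHAPE_DB = {"can": "cylinder", "box": "cuboid", "baseball": "sphere"}

def keywords_from_speech(transcript: str) -> list[str]:
    """Naive keyword spotting over speech-to-text output (claim 2)."""
    return [w for w in transcript.lower().split() if w in SHAPE_DB]

def composite_frame(frame: np.ndarray, virtual_bg: np.ndarray,
                    user_mask: np.ndarray, object_mask: np.ndarray,
                    interacting: bool) -> np.ndarray:
    """Overlay the virtual background, then remove it inside the user
    outline and, when an interaction is detected, inside the object's
    shape as well, so real video shows through both regions (claim 1).
    frame and virtual_bg are HxWx3; the masks are HxW binary arrays."""
    show_real = user_mask.astype(bool)
    if interacting:                      # gate on "type of the interaction"
        show_real = show_real | object_mask.astype(bool)
    out = virtual_bg.copy()
    out[show_real] = frame[show_real]    # punch holes in the virtual bg
    return out

# Example: keywords_from_speech("let me show you this can") -> ["can"],
# which would select the cylinder shape and drive the object mask.
```

Under these assumptions, a "can" keyword selects the cylinder shape, and its mask keeps the real can visible through the virtual background while the user handles it, which is the segmentation behavior the claim language describes.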

Prosecution Timeline

Dec 18, 2023: Application Filed
Oct 11, 2025: Non-Final Rejection — §103
Jan 16, 2026: Response Filed
Mar 08, 2026: Final Rejection — §103
Apr 01, 2026: Interview Requested
Apr 09, 2026: Applicant Interview (Telephonic)
Apr 09, 2026: Examiner Interview Summary

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12598256: DISRUPTED-SPEECH MANAGEMENT ENGINE FOR A MEETING MANAGEMENT SYSTEM
Granted Apr 07, 2026 (2y 5m to grant)

Patent 12591408: DISPLAY APPARATUS AND METHOD INCORPORATING INTEGRATED SPEAKERS WITH ADJUSTMENTS
Granted Mar 31, 2026 (2y 5m to grant)

Patent 12587612: Method and Device for Invoking Public or Private Interactions during a Multiuser Communication Session
Granted Mar 24, 2026 (2y 5m to grant)

Patent 12587705: LIVESTREAMING AUDIO PROCESSING METHOD AND DEVICE
Granted Mar 24, 2026 (2y 5m to grant)

Patent 12587700: GROUPING IN A SYSTEM WITH MULTIPLE MEDIA PLAYBACK PROTOCOLS
Granted Mar 24, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 79% (99% with interview, +32.1%)
Median Time to Grant: 2y 9m
PTA Risk: Moderate

Based on 877 resolved cases by this examiner. Grant probability derived from career allow rate.
