DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Status
Claims 1-23 are pending. Independent Claims 1 and 22-23 have been amended. No claims have been added or cancelled.
Response to Arguments
Applicant’s arguments based on the amended limitation are moot in view of the Examiner’s new ground of rejection.
Compact Prosecution
With respect to Claim Interpretation, the Examiner has provided notes regarding “[BRI on the record]” throughout the Office Action so that the record is clear about the scope of the claimed invention and about the basis for the Examiner’s analyses. A clear record of the claim interpretation can expedite examination by allowing it to focus on Applicant’s inventive concept and its comparison with related prior art.
If there are disagreements, Applicant may present an alternative interpretation based on MPEP 2111. The Examiner will adopt Applicant’s interpretation on the record if Applicant’s interpretation is reasonable and/or Applicant’s arguments are persuasive.
Applicant may amend claims relying on the Examiner’s claim interpretation provided on the record.
Claim Objections
The objection to Claim 1 due to minor informalities is withdrawn in view of Applicant’s amendments.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-5, 16-18, 20, and 22-23 are rejected under 35 U.S.C. 103 as being unpatentable over Chong et al. (“Xpression: Mobile real-time facial expression transfer”) in view of Groshev et al. (“GHOST—A New Face Swap Approach for Image and Video Domains”) and Zakharov et al. (“Few-Shot Adversarial Learning of Realistic Neural Talking Head Models”).
Regarding Claim 1, Chong teaches A computer-implemented method for video analysis (
[Mapping Analysis]
“We developed Xpression, a mobile application which allows user to reenact faces from images and videos with only RGB camera on mobile devices. It transfers facial expression from source user to target user. Unlike other reenactment researches, our method works on video as well as still images and requires only mobile device. Our application is freely available to the public on iOS App Store.” Chong Abstract.) comprising:
receiving, by a human host, a request for a video chat, wherein the request is initiated by a user (
[BRI on the record]
With respect to “human host,” the Examiner is interpreting the limitation to mean: a person who hosts the video chat recited in the claim.
With respect to “a user,” the Examiner is interpreting the limitation to mean: a person, another party in addition to the claimed “human host,” who uses the video chat recited in the claim.
[Mapping Analysis]
[Image: media_image1.png (greyscale)]
Here, the video interview is a video chat. Figure 2 shows the human host, and the other party to the interview is mapped to the user.
“Video chat enable better communication through facial expression and gesture etc., and is widely adopted in both formal and informal scenarios such as remote work, interviews and causal communication.” Chong 2.4 Applications.
Chong teaches a video chat between a human host and a user. However, Chong does not explicitly disclose who made an initial request.
It would have been “Obvious to try” – choosing from a finite number of identified, predictable solutions, with a reasonable expectation of success. For a video chat between two parties A and B, there are two identified, predictable solutions to the problem of starting a video chat: (a) a request is initiated by party A and received by party B; and (b) a request is initiated by party B and received by party A. There is a reasonable expectation of success in starting the video chat regardless of which party initiates the request.
Therefore, it would have been obvious to try receiving, by a human host, a request for a video chat, wherein the request is initiated by a user.);
retrieving an image for a synthetic host, wherein the image includes a representation of an individual (
[BRI on the record]
With respect to “synthetic host,” the Examiner is interpreting the limitation to mean: a host that is computer generated and is not the real host of the claimed video chat. For example, person A, a real person, is hosting the video chat. An image of person B, also a real person, is received, and person B is made by the computer to appear as the host, even though person B is not the host. Here, person B is a “synthetic host.” The point of the example is that the term “synthetic” concerns the hosting aspect and the performance of the “host.” Meanwhile, the “synthetic host” could represent a real person, e.g., person B in the example, or could represent a synthetic person based on an AI-generated image of a certain person.
With respect to “an individual,” the Examiner is interpreting the limitation to mean: an individual person. This interpretation is based on the plain meaning of the term and the specification:
[0039] The infographic 300 includes retrieving an image for a synthetic host 350, wherein the image includes a representation of an individual. In some embodiments, the retrieving of image can include a video of the individual. The retrieving can further comprise retrieving an image of the individual based on information about the user. An image of the user can be combined with the demographic, economic, and geographic information collected from the website hosting the chat and can be used as input to an artificial intelligence (AI) machine learning model. In embodiments, an AI machine learning model can be trained to recognize ethnicity, sex, and age. The AI machine learning model can access a library of images of individuals that can be used as synthetic hosts. The library of images can include options of ethnicity, sex, age, hair color and style, clothing, accessories, etc. Information related to each host image can be stored as metadata with each image.
Spec. ¶ 39.
With respect to “the image includes a representation of an individual,” the Examiner is interpreting the limitation to mean: an image of an individual person. It is not, for example, interpreted as including an image of a dog/cat that is supposed to represent the individual. This interpretation is in light of the specification:
[0048] The infographic 400 includes retrieving an image 450 for a synthetic host, wherein the image includes a representation of an individual. In some embodiments, the retrieving of image can include a video of the individual. The retrieving can further comprise retrieving an image of the individual based on information about the user. An image of the user can be combined with the demographic, economic, and geographic information collected from the website hosting the chat and can be used as input to an artificial intelligence (AI) machine learning model. In embodiments, an AI machine learning model can be trained to recognize ethnicity, sex, and age. The AI machine learning model can access a library of images of individuals that can be used as synthetic hosts. The library of images can include options of ethnicity, sex, age, hair color and style, clothing, accessories, etc. Information related to each host image can be stored as metadata with each image.
Spec. ¶ 48.
[Image: media_image2.png (greyscale)]
[Mapping Analysis]
[Image: media_image3.png (greyscale)]
The claimed “image for a synthetic host” is mapped to the disclosed video/image of the source actor.);
extracting, using one or more processors, the individual from the image that was retrieved (
Figs. 1-2 show that at least the actor’s face and facial expression are extracted and used in a synthetic image/video.);
capturing a video performance by the human host that is in response to a statement or query by the user (
[BRI on the record]
With respect to the limitation, the Examiner is interpreting the limitation to require: a video performance . . . that is in response to a statement or query by the user.
[Mapping Analysis]
“We developed Xpression, a mobile application which allows user to reenact faces from images and videos with only RGB camera on mobile devices. It transfers facial expression from source user to target user.” Chong Abstract.
Fig. 1 shows the human host performing before a smartphone’s camera.
Regarding fig. 2, Chong explains, “Even if the user wake up with his pajamas on (shown in the small window), he can still chat with the interviewers appear to be professional and well prepared.” During a video interview, the human host provides a video performance in front of a camera to be captured, in response to questions, including a statement or query, by the interviewer, who is mapped to the user.);
creating, by a generative adversarial network (GAN), a synthetic host performance, wherein the video performance of the human host is replaced by the individual that was extracted, wherein the synthetic host performance is created dynamically (
[BRI on the record]
With respect to “the video performance of the human host is replaced by the individual that was extracted,” the Examiner is interpreting the limitation to mean: the human host in the video performance is replaced by the individual that was extracted.
[0027] . . . In embodiments, multiple images of a synthetic host may be used to create a synthesized chat video that replaces the human host performance in the chat with a performance by the synthesized host.
Spec. ¶ 27.
[Image: media_image2.png (greyscale)]
With respect to “a synthetic host performance,” the Examiner is interpreting the limitation to mean: the human host in the video performance is replaced by the individual that was extracted, and, after the replacement, the video performance becomes the synthetic host performance.
With respect to “the synthetic host performance is created dynamically,” the Examiner is interpreting the limitation to mean: the human host in the video performance is replaced by the individual that was extracted as the video performance is being captured.
[0022] Techniques for video analysis are disclosed. A user can initiate a video chat session from a website, during a livestream event, etc., and ask a question or make a comment. Information about the user, including demographic, economic, and geographic data can be combined with the image of the user captured from the video chat and can be used as input to an AI machine learning model. The AI neural network can analyze the user information to select an image of a synthetic host that can be customized to interact with the user during the video chat session.
[Mapping Analysis]
Chong figs. 1-2 show a synthetic host performance. Regarding fig. 2, Chong explains, “Even if the user wake up with his pajamas on (shown in the small window), he can still chat with the interviewers appear to be professional and well prepared.” Here, the synthetic host performance is live and changes as the interview chat continues; therefore, the synthetic host performance is created dynamically.), and
wherein the synthetic host performance responds to the statement or query by the user (Regarding fig. 2, Chong explains, “Even if the user wake up with his pajamas on (shown in the small window), he can still chat with the interviewers appear to be professional and well prepared.” During an interview, the interviewee continues to respond to the statement or query by the interviewer.);
rendering the video chat, wherein the video chat includes the synthetic host performance (fig. 2); and
supplementing the video chat with one or more additional synthetic host performances (
[BRI on the record]
With respect to “supplementing the video chat,” the Examiner is interpreting the limitation to mean: continuing the video chat. The interpretation is made in light of the specification:
[0086] The system 800 can include a supplementing component 892. The supplementing component 892 can include functions and instructions for supplementing the video chat with one or more additional synthetic host performances. In embodiments, the one or more additional host performances are based on at least one further statement or query by the user.
[Mapping Analysis]
Regarding fig. 2, Chong explains, “Even if the user wake up with his pajamas on (shown in the small window), he can still chat with the interviewers appear to be professional and well prepared.” During an interview, the interviewee continues to respond to the statement or query by the interviewer. The interviewee’s responses in synthetic video performances to subsequent interview questions are mapped to additional synthetic host performances.).
Chong does not explicitly disclose
creating, by a generative adversarial network (GAN), a synthetic host performance,
wherein the video performance of the human host is replaced by the individual that was extracted.
Groshev teaches
extracting, using one or more processors, aspects of the individual from the image that was retrieved, wherein the aspects of the individual are the individual’s facial features (
[Image: media_image4.png (greyscale)]
);
creating, by a generative adversarial network (GAN), a synthetic host performance (
[Image: media_image5.png (greyscale)]
[Image: media_image6.png (greyscale)]
“In terms of future research we consider the fine tuning process of our solution in a GAN pipeline, where the proposed architecture will be used as a generator and SoTA deep fake detection models will be used as a discriminator.” Groshev VI Conclusion.),
wherein the video performance of the human host is replaced by the individual that was extracted (
“Deep fake [1] is a technique of swapping an original face (target) with another one (source) in an image or video. Different deep fake synthesis approaches exist, when source and target data can be presented as image or video data.” Groshev 1. Introduction.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Groshev’s Face Swap for Video with primary reference Chong. One of ordinary skill in the art would be motivated to beautify the appearance of a participant of a video chat, and/or obscure the identity of the participant of the video chat. Groshev states, “Deep fake stands for a face swapping algorithm where the source and target can be an image or a video. Researchers have investigated sophisticated generative adversarial networks (GAN), autoencoders, and other approaches to establish precise and robust algorithms for face swapping.” Groshev Abstract.
Chong in view of Groshev teaches creating, by a generative adversarial network (GAN), a synthetic host performance, in the context of future research.
Zakharov also teaches creating, by a generative adversarial network (GAN), a synthetic host performance (
[Image: media_image7.png (greyscale)]
[Image: media_image8.png (greyscale)]
Note the teaching of “Generator” and “Discriminator.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Zakharov’s GAN with Chong in view of Groshev. One of ordinary skill in the art would be motivated to improve image quality. Zakharov discloses:
[Image: media_image9.png (greyscale)].
Regarding Claim 2, Chong in view of Groshev and Zakharov teaches The method of claim 1 wherein the one or more additional synthetic host performances is based on at least one further statement or query by the user (Regarding Chong fig. 2, Chong explains, “Even if the user wake up with his pajamas on (shown in the small window), he can still chat with the interviewers appear to be professional and well prepared.” During an interview, the interviewee continues to respond to the statement or query by the interviewer. The interviewee’s responses to subsequent interview questions, captured in synthetic video performances, are mapped to additional synthetic host performances.).
Regarding Claim 3, Chong in view of Groshev and Zakharov teaches The method of claim 1 wherein the creating a synthetic host performance further comprises changing attributes of the synthetic host (
[Image: media_image10.png (greyscale)]
After the combination of Chong in view of Groshev, the synthetic host is mapped to the person in the disclosed “Source image” in Groshev. At least the skin tone and ears of the person in the disclosed “source image” are changed in the resulting synthetic image.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Groshev’s Face Swap for Video with primary reference Chong. One of ordinary skill in the art would be motivated to beautify the appearance of a participant of a video chat, and/or obscure the identity of the participant of the video chat.
Regarding Claim 4, Chong in view of Groshev and Zakharov teaches The method of claim 1 wherein the creating a synthetic host performance further comprises changing a background of the synthetic host (
[Image: media_image4.png (greyscale)]
After the combination of Chong in view of Groshev, the synthetic host is mapped to the person in the disclosed “Source image” in Groshev. The background of the person in the disclosed “source image” is changed in the resulting synthetic image.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Groshev’s Face Swap for Video with primary reference Chong. One of ordinary skill in the art would be motivated to beautify the appearance of a participant of a video chat, and/or obscure the identity of the participant of the video chat.
Regarding Claim 5, Chong in view of Groshev and Zakharov teaches The method of claim 4 wherein the background comprises images, text, audio, or video (
[Image: media_image4.png (greyscale)]
After the combination of Chong in view of Groshev, the synthetic host is mapped to the person in the disclosed “Source image” in Groshev. The image background of the person in the disclosed “source image” is changed in the resulting synthetic image.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Groshev’s Face Swap for Video with primary reference Chong. One of ordinary skill in the art would be motivated to beautify the appearance of a participant of a video chat, and/or obscure the identity of the participant of the video chat.
Regarding Claim 16, Chong in view of Groshev and Zakharov teaches The method of claim 1 further comprising creating an image of a synthetic host, wherein the creating is based on the individual from the image (
[Image: media_image4.png (greyscale)]
After the combination of Chong in view of Groshev, the synthetic host is mapped to the person in the disclosed “Source image” in Groshev.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Groshev’s Face Swap for Video with primary reference Chong. One of ordinary skill in the art would be motivated to beautify the appearance of a participant of a video chat, and/or obscure the identity of the participant of the video chat.
Regarding Claim 17, Chong in view of Groshev and Zakharov teaches The method of claim 16 further comprising displaying, to the user, the image of the synthetic host (
Chong figs. 1-2 show a synthetic host performance. Regarding fig. 2, Chong explains, “Even if the user wake up with his pajamas on (shown in the small window), he can still chat with the interviewers appear to be professional and well prepared.” The interviewee looks professional to the interviewer during the video chat.).
Regarding Claim 18, Chong in view of Groshev and Zakharov teaches The method of claim 1 wherein the video chat includes the user and the synthetic host (Chong figs. 1-2 show a synthetic host performance. Regarding fig. 2, Chong explains, “Even if the user wake up with his pajamas on (shown in the small window), he can still chat with the interviewers appear to be professional and well prepared.” The video chat is between the synthetic host, the interviewee, and the user, the interviewer.).
Regarding Claim 20, Chong in view of Groshev and Zakharov teaches The method of claim 1 wherein the retrieving an image for a synthetic host includes a video of the individual (
“Deep fake [1] is a technique of swapping an original face (target) with another one (source) in an image or video. Different deep fake synthesis approaches exist, when source and target data can be presented as image or video data.” Groshev 1. Introduction.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Groshev’s Face Swap for Video with primary reference Chong. One of ordinary skill in the art would be motivated to beautify the appearance of a participant of a video chat, and/or obscure the identity of the participant of the video chat.
Claims 22-23 are substantially similar to Claim 1. The rejection analyses of Claim 1 based on Chong in view of Groshev and Zakharov are applied to Claims 22-23. In addition, Claim 22 recites, “A computer program product embodied in a non-transitory computer readable medium for video analysis, the computer program product comprising code which causes one or more processors to perform operations of: . . .” and Claim 23 recites, “A computer system for video analysis, comprising: a memory which stores instructions; one or more processors coupled to the memory wherein the one or more processors, when executing the instructions which are stored, are configured to: . . .” (Chong fig. 1, Abstract, 1 Introduction, showing and discussing the use of smartphones and desktop computers to process images).
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Chong in view of Groshev and Zakharov as applied to Claim 1, in further view of Yao et al. (US 20130051542 A1).
Regarding Claim 6, Chong in view of Groshev and Zakharov teaches The method of claim 1.
Chong in view of Groshev and Zakharov does not explicitly disclose wherein the request for a video chat includes information about the user.
Yao teaches wherein the request for a video chat includes information about the user (
[Image: media_image11.png (greyscale)]
“FIG. 2 illustrates an example interface implementing a social caller ID for a friend of the user.” Yao ¶ 4.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Yao’s inclusion of caller information with Chong in view of Groshev and Zakharov. One of ordinary skill in the art would be motivated to help the other party of a video chat to decide whether to take the call. Spam calls are prevalent, and such information would help a person to screen calls. “FIG. 2 illustrates an example interface implementing a social caller ID for a friend of the user.” Yao ¶ 4.
Claims 7-8, 10, 12, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Chong in view of Groshev, Zakharov, and Yao as applied to Claim 6, in further view of Finster et al. (US 20140129343 A1).
Regarding Claim 7, Chong in view of Groshev, Zakharov, and Yao teaches The method of claim 6.
However, Chong in view of Groshev, Zakharov, and Yao does not explicitly disclose wherein the retrieving an image further comprises selecting an image of the individual based on the information about the user.
Finster teaches wherein the retrieving an image further comprises selecting an image of the individual based on the information about the user (
Finster teaches generating an avatar to please a user, stating “For example, a user is watching an episode of a TV show ‘ABC’ on a device (e.g., Xbox). During an advertising break, the user is presented with an advertisement with an avatar having one or more characteristics which allow the user to recognize that it is based on the user's avatar attributes, but is now wearing a shirt with ‘XYZ’ brand label on the shirt. The user can obtain further information about the ‘XYZ’ brand by interacting with the avatar. For example, the user can click on the avatar and may be presented with additional information about the brand, e.g., a web site, video, etc. The avatar can be dynamically generated as needed for each advertisement presented. By employing the avatar as a digital spokesperson to promote a certain brand of clothing, the advertiser for that brand is able to deliver an engaging and interactive advertising experience to the user that is likely to result in conversions for the advertiser.” Finster ¶ 20.
Finster teaches generating an avatar based on user information, stating “The interface may be the aforementioned user interface 204 provided by the content management or may comprise an API allowing advertisers to create dynamically generated user-based advertising, provide dynamically generated user-based avatars and advertising campaign information to the system 100. The dynamically generated user-based advertising avatar may have avatar feature attributes, such as gender, hair style, hair color, and race, as well as style attributes such as branded clothing, branded props and animations, all of which become associated with the dynamically generated user-based advertising avatar during an instance of the avatar in an advertisement.” Finster ¶ 46.
After Chong in view of Groshev, Zakharov, and Yao is combined with Finster, an image of the individual, a synthetic host/avatar, is selected based on user information that considers information, including “gender, hair style, hair color, and race.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Finster’s customization of an avatar to influence a user with Chong in view of Groshev, Zakharov, and Yao. One of ordinary skill in the art would be motivated to influence a user by creating the perception that the user is communicating with someone who shares the user’s attributes. “The avatar can be dynamically generated as needed for each advertisement presented. By employing the avatar as a digital spokesperson to promote a certain brand of clothing, the advertiser for that brand is able to deliver an engaging and interactive advertising experience to the user that is likely to result in conversions for the advertiser.” Finster ¶ 20.
Regarding Claim 8, Chong in view of Groshev, Zakharov, Yao, and Finster teaches The method of claim 6 further comprising customizing an appearance of the synthetic host, wherein the customizing is based on the information from the user (Finster teaches generating an avatar based on user information, stating “The interface may be the aforementioned user interface 204 provided by the content management or may comprise an API allowing advertisers to create dynamically generated user-based advertising, provide dynamically generated user-based avatars and advertising campaign information to the system 100. The dynamically generated user-based advertising avatar may have avatar feature attributes, such as gender, hair style, hair color, and race, as well as style attributes such as branded clothing, branded props and animations, all of which become associated with the dynamically generated user-based advertising avatar during an instance of the avatar in an advertisement.” Finster ¶ 46.
After Chong in view of Groshev, Zakharov, and Yao is combined with Finster, an image of the individual, a synthetic host/avatar, is selected based on user information that considers information, including “gender, hair style, hair color, and race.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Finster’s customization of an avatar to influence a user with Chong in view of Groshev, Zakharov, and Yao. One of ordinary skill in the art would be motivated to influence a user by creating the perception that the user is communicating with someone who shares the user’s attributes.
Regarding Claim 10, Chong in view of Groshev, Zakharov, Yao, and Finster teaches The method of claim 8 wherein the customizing includes a gender of the synthetic host (Finster teaches generating an avatar based on user information, stating “The interface may be the aforementioned user interface 204 provided by the content management or may comprise an API allowing advertisers to create dynamically generated user-based advertising, provide dynamically generated user-based avatars and advertising campaign information to the system 100. The dynamically generated user-based advertising avatar may have avatar feature attributes, such as gender, hair style, hair color, and race, as well as style attributes such as branded clothing, branded props and animations, all of which become associated with the dynamically generated user-based advertising avatar during an instance of the avatar in an advertisement.” Finster ¶ 46. After Chong in view of Groshev, Zakharov, and Yao is combined with Finster, an image of the individual, a synthetic host/avatar, is selected based on user information that considers information, including “gender, hair style, hair color, and race.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Finster’s customization of an avatar to influence a user with Chong in view of Groshev, Zakharov, and Yao. One of ordinary skill in the art would be motivated to influence a user by creating the perception that the user is communicating with someone who shares the user’s attributes.
Regarding Claim 12, Chong in view of Groshev, Zakharov, Yao, and Finster teaches The method of claim 8 wherein the customizing includes clothing or accessories of the synthetic host (
Finster teaches generating an avatar based on user information, stating “The interface may be the aforementioned user interface 204 provided by the content management or may comprise an API allowing advertisers to create dynamically generated user-based advertising, provide dynamically generated user-based avatars and advertising campaign information to the system 100. The dynamically generated user-based advertising avatar may have avatar feature attributes, such as gender, hair style, hair color, and race, as well as style attributes such as branded clothing, branded props and animations, all of which become associated with the dynamically generated user-based advertising avatar during an instance of the avatar in an advertisement.” Finster ¶ 46. After Chong in view of Groshev, Zakharov, and Yao is combined with Finster, an image of the individual, a synthetic host/avatar, is selected based on user information that considers information, including “style attributes such as branded clothing.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Finster’s customization of an avatar to influence a user with Chong in view of Groshev, Zakarov, and Yao. One of ordinary skill in the art would be motivated to influence a user by creating the perception that the user is communicating with someone who shares the user’s attributes.
Regarding Claim 15, Chong in view of Groshev, Zakarov, Yao, and Finster teaches The method of claim 8 further comprising highlighting, by the synthetic host, a product for sale (
Finster teaches generating an avatar to please a user, stating “For example, a user is watching an episode of a TV show ‘ABC’ on a device (e.g., Xbox). During an advertising break, the user is presented with an advertisement with an avatar having one or more characteristics which allow the user to recognize that it is based on the user's avatar attributes, but is now wearing a shirt with ‘XYZ’ brand label on the shirt. The user can obtain further information about the ‘XYZ’ brand by interacting with the avatar. For example, the user can click on the avatar and may be presented with additional information about the brand, e.g., a web site, video, etc. The avatar can be dynamically generated as needed for each advertisement presented. By employing the avatar as a digital spokesperson to promote a certain brand of clothing, the advertiser for that brand is able to deliver an engaging and interactive advertising experience to the user that is likely to result in conversions for the advertiser.” Finster ¶ 20.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Finster’s customization of an avatar to influence a user with Chong in view of Groshev, Zakarov, and Yao. One of ordinary skill in the art would be motivated to influence a user to purchase by creating the perception that the user is communicating with someone who shares the user’s attributes. “The avatar can be dynamically generated as needed for each advertisement presented. By employing the avatar as a digital spokesperson to promote a certain brand of clothing, the advertiser for that brand is able to deliver an engaging and interactive advertising experience to the user that is likely to result in conversions for the advertiser.” Finster ¶ 20.
Claims 9 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Chong in view of Groshev, Zakarov, Yao, and Finster as applied to Claim 8, in further view of Taylor et al. (US 20180048865 A1).
Regarding Claim 9, Chong in view of Groshev, Zakarov, Yao, and Finster teaches The method of claim 8.
Chong in view of Groshev, Zakarov, Yao, and Finster does not explicitly disclose wherein the customizing includes an accent of the synthetic host.
Taylor teaches wherein the customizing includes an accent of the synthetic host (
“In addition, the mobile device 600 can enable the user to select a further customize selectable element 622 to further customize the avatar of the virtual support representative 612. For example, the mobile device 600 enables the user to select features relevant to the speech of the virtual support representative 612, such as talk speed, accent, inflection, pitch, and response time. Another feature can include whether to provide closed captioning within the video chat. Additional or alternative features can include language preferences, privacy settings for the video chat, bandwidth constraints, and preferences with respect to a virtual support representative versus a support representative user.” Taylor ¶ 116.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Taylor’s customization of an avatar’s accent with Chong in view of Groshev, Zakarov, Yao, and Finster. One of ordinary skill in the art would be motivated to influence a user by creating the perception that the user is communicating with someone who shares the user’s attributes. A user might be more likely to be persuaded by someone who shares the same accent and background.
Regarding Claim 11, Chong in view of Groshev, Zakarov, Yao, Finster, and Taylor teaches The method of claim 8 wherein the customizing includes an intonation or pitch of a voice of the synthetic host (“In addition, the mobile device 600 can enable the user to select a further customize selectable element 622 to further customize the avatar of the virtual support representative 612. For example, the mobile device 600 enables the user to select features relevant to the speech of the virtual support representative 612, such as talk speed, accent, inflection, pitch, and response time. Another feature can include whether to provide closed captioning within the video chat. Additional or alternative features can include language preferences, privacy settings for the video chat, bandwidth constraints, and preferences with respect to a virtual support representative versus a support representative user.” Taylor ¶ 116.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Taylor’s customization of an avatar’s speech with Chong in view of Groshev, Zakarov, Yao, and Finster. One of ordinary skill in the art would be motivated to influence a user by creating the perception that the user is communicating with someone who shares the user’s attributes. A user might be more likely to be persuaded by someone who shares the same speech habits and background; some users may trust an authoritative voice more, while others may prefer a friendly voice.
Claims 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Chong in view of Groshev, Zakarov, Yao, and Finster as applied to Claim 8, in further view of Brady et al. (US 20200034025 A1).
Regarding Claim 13, Chong in view of Groshev, Zakarov, Yao, and Finster teaches The method of claim 8.
Chong in view of Groshev, Zakarov, Yao, and Finster does not explicitly disclose wherein the customizing includes a nationality of the synthetic host.
Brady teaches wherein the customizing includes a nationality of the synthetic host (
“. . . wherein the processor is further configured to: output a customization selection list to the display, wherein the customization selection list comprises at least one of a gender selection, an age selection, an emotion selection, a race selection, a location selection, a nationality selection, and a language selection; receive a customization selection based on the customization list; modify the avatar based on the customization selection to create a modified avatar, . . .” Brady, Claim 6.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Brady’s customization of an avatar with Chong in view of Groshev, Zakarov, Yao, and Finster. One of ordinary skill in the art would be motivated to influence a user by creating the perception that the user is communicating with someone who shares the user’s nationality. A user might be more likely to be persuaded by someone who shares the same background.
Regarding Claim 14, Chong in view of Groshev, Zakarov, Yao, and Finster teaches The method of claim 8.
Chong in view of Groshev, Zakarov, Yao, and Finster does not explicitly disclose wherein the customizing includes an age of the synthetic host.
Brady teaches wherein the customizing includes an age of the synthetic host (
“. . . wherein the processor is further configured to: output a customization selection list to the display, wherein the customization selection list comprises at least one of a gender selection, an age selection, an emotion selection, a race selection, a location selection, a nationality selection, and a language selection; receive a customization selection based on the customization list; modify the avatar based on the customization selection to create a modified avatar, . . .” Brady, Claim 6.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Brady’s customization of an avatar with Chong in view of Groshev, Zakarov, Yao, and Finster. One of ordinary skill in the art would be motivated to influence a user by creating the perception that the user is communicating with someone who belongs to the same age group. A user might be more likely to be persuaded by such a person.
Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Chong in view of Groshev and Zakarov as applied to Claim 18, in further view of Zhu et al. (US 20200177823 A1).
Regarding Claim 19, Chong in view of Groshev and Zakarov teaches The method of claim 18.
However, Chong in view of Groshev and Zakarov does not explicitly disclose wherein the user and the synthetic host are displayed in a split screen display.
Zhu teaches wherein the user and the synthetic host are displayed in a split screen display (
[Zhu FIG. 4 (media_image12.png) reproduced here.]
“For example, as shown in FIG. 4, the first terminal respectively displays the video image of the first terminal and the video image of the second terminal in two rectangular display subareas with the same size that are arranged side by side in the video communication interface.” Zhu ¶ 37.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Zhu’s split screen with Chong in view of Groshev and Zakarov. One of ordinary skill in the art would be motivated to allow both parties of the video-chat conversation to be seen during the chat. The user would be able to check whether the user appears presentable on camera, e.g., looking professional, and whether adjustments are needed, while simultaneously observing the other party’s facial expressions.
Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Chong in view of Groshev and Zakarov as applied to Claim 1, in further view of Studnicka (US 20190391858 A1).
Regarding Claim 21, Chong in view of Groshev and Zakarov teaches The method of claim 1.
Chong in view of Groshev and Zakarov does not explicitly teach wherein the video chat supports an ecommerce purchase.
Studnicka teaches wherein the video chat supports an ecommerce purchase (“During video chat session 1202, the first user associated with first user device 110 may utilize another interface and/or application to perform online shopping session 1210, and may wish to share an item 1212 that the first user views during online shopping session 1210 with second user 1203 on second user 1203's device.” Studnicka ¶ 78.
[Studnicka figure (media_image13.png) reproduced here.]
).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Studnicka’s integration of video chat with online shopping with Chong in view of Groshev and Zakarov. One of ordinary skill in the art would be motivated to make online shopping more convenient for a user. If shoppers are able to easily compare notes about their shopping experiences, the shared experience may encourage further shopping. “During video chat session 1202, the first user associated with first user device 110 may utilize another interface and/or application to perform online shopping session 1210, and may wish to share an item 1212 that the first user views during online shopping session 1210 with second user 1203 on second user 1203's device.” Studnicka ¶ 78.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Thies et al. (“HeadOn: Real-time Reenactment of Human Portrait Videos”)
Shang et al. (“Protecting Real-time Video Chat against Fake Facial Videos Generated by Face Reenactment”)
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ZHENGXI LIU whose telephone number is (571)270-7509. The examiner can normally be reached M-F 9 AM - 5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung can be reached at 571-272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ZHENGXI LIU/ Primary Examiner, Art Unit 2611