DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-16 and 18-21 are rejected under 35 U.S.C. 103 as being unpatentable over Chang, Tang OR Karri, in view of Amento et al. (US 2009/0276802), and further in view of Neustaedter, Gopal OR Kurtz.
(A) Regarding claims 1, 9 and 18: Chang, via Fig. 2, teaches a method, a non-transitory medium and a system for updating a model of a participant of a three dimensional (3D) video conference, the method comprising:
obtaining images of the participant during the 3D video conference; (Fig. 2, step 210: The real-time data 125 collection begins when a user has selected an option for enabling interactive event and facial expression data collection. The facial expression data is collected by a camera 120 photographing the user's face while the user interacts with an interactive device 115. In some embodiments, the facial expression data can be collected by another image sensing device (e.g., a 3D scanner, a thermographic imager, a hyperspectral imager, etc.) in addition to or instead of the camera 120. The facial expression data can be in the form of still image or video data, and can be 2D or 3D images, [0029]);
determining, by a change detector and based on the images, that a captured expression of the participant is an unmodeled expression by a trained model of the participant; (Step 220: A detection module 140 receives the facial expression data, and detects differences in the user's expression relative to the previous images. The detection component 140 records image(s) depicting the new expression, as well as corresponding interactive event data. For example, the detection component 140 can record the interactive event data received at approximately (e.g., ±1-5 seconds) the same time as the new facial expression. In some embodiments, the detection component records the real-time data 125 only when the user's facial expression has changed relative to a baseline facial expression (e.g., a neutral expression). In other embodiments, the real-time data 125 is recorded when the detection module 140 detects any change from one expression to another, [0031]);
causing the trained model of the participant to be re-trained to incorporate the captured expression in response to the determination that the captured expression is an unmodeled expression; (If a matching defined facial expression is not found at step 230, the most similar defined facial expression is selected, [0035]. This is illustrated at step 250: the training data 130 is updated to include the recognized facial expression and its associated definition, which allows the expression classifier 150 to be retrained on the updated training data 130 for greater accuracy, [0034]);
receiving an updated model of the participant; (Chang's claim 10: updating the set of training data with the new facial expression and the definition); and
generating a representation of the participant using the updated model to mimic the captured expression. (Per applicant's argument, Chang, Tang, and Karri fail to teach "generating a representation of the participant using the updated model to mimic the captured expression." Examiner respectfully cites Amento, who teaches a virtual environment display wherein the method further includes continually updating the avatar in response to the animation input data, which may be continually received or continually updated; in some embodiments, monitoring the viewer includes estimating facial expressions, and the animation input data may include facial expression data, to mimic actions by corresponding viewers, [0010]-[0012]. Please note that cameras 157 are used to generate animation data that is processed and translated by one or more animation modules into corresponding avatar emotions and actions; each viewer has a corresponding avatar that may be shown in each instance of a virtual environment, [0045].)
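For illustration only, the following minimal sketch shows the kind of update loop mapped above (capture, change detection against a trained model, retraining on the unmodeled expression, and regeneration of the participant's representation). It is not drawn from Chang or Amento, and every identifier in it (ExpressionModel, is_modeled, retrain, render_avatar) is hypothetical:

```python
# Hypothetical sketch only; not drawn from any cited reference. All names are illustrative.
from dataclasses import dataclass, field

@dataclass
class ExpressionModel:
    """Stand-in for a trained per-participant expression model."""
    known_expressions: set = field(default_factory=set)

    def is_modeled(self, expression: str) -> bool:
        # Change-detector check: an expression absent from the model is "unmodeled".
        return expression in self.known_expressions

    def retrain(self, expression: str) -> "ExpressionModel":
        # Re-train by folding the newly captured expression into the training data.
        return ExpressionModel(self.known_expressions | {expression})

def render_avatar(model: ExpressionModel, expression: str) -> str:
    # Generate a representation (avatar) that mimics the captured expression.
    return f"avatar mimicking '{expression}'"

def conference_update_step(model: ExpressionModel, captured: str):
    if not model.is_modeled(captured):            # determine: unmodeled expression?
        model = model.retrain(captured)           # cause the model to be re-trained
    return model, render_avatar(model, captured)  # generate the representation

model = ExpressionModel({"neutral", "smile"})
model, avatar = conference_update_step(model, "surprise")
print(avatar)  # avatar mimicking 'surprise'
```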
(B) Regarding claims 1, 9 and 18: Tang, via Figs. 1B and 1C, teaches a method, a non-transitory medium and a system for updating a model of a participant of a three dimensional (3D) video conference, the method comprising:
obtaining images of the participant during the 3D video conference; (As shown by reference number 106, the user device may obtain first video data of a first facial expression and/or a first eye gaze direction of the user, [0022]);
determining, by a change detector and based on the images, that a captured expression of the participant is an unmodeled expression by a trained model of the participant; (a sentiment analysis of the first facial expression to categorize the first facial expression as a negative facial expression, and/or the like, [0027]);
causing the trained model of the participant to be re-trained to incorporate the captured expression in response to the determination that the captured expression is an unmodeled expression; (the user device may train, retrain, update, and/or the like the product recommendation model based on any one or more of the first utterance, the first facial expression, the first eye gaze direction, the one or more first words, the sentiment category of the first utterance, the first emotion, the sentiment category of the first facial expression, the first attribute of the product observed by the user, and/or the like. Similarly, the user device may train, retrain, update, and/or the like the product recommendation model based on any one of the second utterance, the second facial expression, the second eye gaze direction, the one or more second words, the sentiment category of the second utterance, the second emotion, the sentiment category of the second facial expression, the second attribute of the product observed by the user, and/or the like, [0045]. In some implementations, the user device may determine that the second reaction is a positive reaction, [0035]);
receiving an updated model of the participant (see the steps above); and
generating a representation of the participant using the updated model to mimic the captured expression. (As in ground (A) above, Examiner respectfully cites Amento, [0010]-[0012] and [0045], for this feature.)
(C) Regarding claims 1, 9 and 18: Karri teaches a method, a non-transitory medium and a system for updating a model of a participant of a three dimensional (3D) video conference, the method comprising:
obtaining images of the participant during the 3D video conference; (The user computing device 100 may be coupled to a video camera 114, internal or external to the user computing device 100, to capture video or still images of the user 112 face and optionally a biometric device 116 to capture biometric data from the user, such as heart rate, perspiration, body movement, etc. The user computing device 100 includes program components comprising: a gaze tracking analyzer 118 to determine from the video camera 114 a location in the document 108 at which the user 112 eyes are gazing; an emotion detector 120 to determine from the video camera 114 an emotion of the user 112 based on an image of the user 112 facial expressions; a biometric analyzer 122 to analyze biometric data read from the biometric device 116; a real-time content interface 124 to generate user reaction data 200 including information on a content value in a section 110.sub.i on which the user is gazing, user emotions, and biometric data to return to the content server 102 for analysis; [0018, 0035]);
determining, by a change detector and based on the images, that a captured expression of the participant is an unmodeled expression by a trained model of the participant; (A monitoring device detects user biometric data in response to detecting the section the user is observing. Input comprising the content value in the section the user is observing, the detected user biometric data, and personal information of the user is provided to a machine learning module to produce output indicating a likelihood that the user approved or disapproved of the content value in the section the user was observing, [0017]);
causing the trained model of the participant to be re-trained to incorporate the captured expression in response to the determination that the captured expression is an unmodeled expression; (Fig. 10: the machine learning module is retrained to improve the accuracy of the predicted likelihood of user approval or disapproval based on the user reaction data and personal information to provide more optimized and accurate output 134, [0044]-[0045]);
receiving an updated model of the participant; (see the step above, or: "The machine learning module 132 is retrained (at block 1006) with the generated input 500 to output 134 the received indication of approval or disapproval," [0044]); and
generating a representation of the participant using the updated model to mimic the captured expression. (As in ground (A) above, Examiner respectfully cites Amento, [0010]-[0012] and [0045], for this feature.)
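For illustration only, a minimal sketch of the retraining mechanism the examiner reads in Karri, [0044]-[0045]: a learning module predicts the likelihood of user approval from content value, biometric data, and personal information, and is retrained as new reaction data arrives. The code is not Karri's; the features, labels, and use of scikit-learn are assumptions:

```python
# Hypothetical sketch only; not Karri's code. Illustrates retraining an
# approval predictor on user reaction data, per the examiner's reading of Fig. 10.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Features per observation: [content value, biometric signal, personal-info feature]
X = np.array([[0.9, 0.2, 1.0], [0.1, 0.8, 0.0], [0.7, 0.3, 1.0], [0.2, 0.7, 0.0]])
y = np.array([1, 0, 1, 0])  # 1 = user approved, 0 = user disapproved

model = LogisticRegression().fit(X, y)

# A new user reaction arrives with its ground-truth approval indication;
# retrain the module to improve the predicted likelihood of approval.
X = np.vstack([X, [0.3, 0.9, 1.0]])
y = np.append(y, 0)
model = LogisticRegression().fit(X, y)

print(model.predict_proba(np.array([[0.5, 0.5, 1.0]]))[0, 1])  # P(approval)
```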
While Chang, Tang and Karri teach the technical features that meet the claimed requirements in various environments, none of the three teaches those features in a conferencing environment.
Neustaedter teaches: The networked media space or video communications system 290 of FIG. 1 advantageously supports video conferencing or video-telephony, particularly from one residential location to another. During a video communication event, comprising one or more video scenes, the video communication client 300 at the local site 362 can both transmit local video and audio signals to the remote site 364 and also receive remote video and remote audio signals from the remote site 364. As would be expected, the local user 10a at the local site 362 is able to see the remote user 10b (located at the remote site 364) as an image displayed locally on display 110, thereby enhancing human interaction. Image processor 320 can provide a number of functions to facilitate two-way communication, including improving the quality of image capture at the local site 362, improving the quality of images displayed at the local display 110, and handling the data for remote communication (by data compression, encryption, etc.), [0042]…
Gopal teaches, via Figs. 3A and 3B, a flowchart of a method for carrying out scheduled video chat sessions according to various aspects of the present disclosure, based on one or more rules, [0019].
OR
Kurtz presents a system for aiding family video-conferencing or video communications with one or more remote individuals. Such a system should function as seamlessly as is reasonably possible while being adaptable to the dynamic situations present in a residence. In particular, the system should enable the users to readily manage and maintain their privacy, relative at least to image capture, recording, and transmission. This system should also manage the contextual information of the user and their environments, to provide an effective communication experience.
Therefore, it would have been obvious to the ordinary artisan before the effective filing date to incorporate the teaching of Amento into the teaching of Chang, Tang OR Karri for the purpose of explicitly detailing the generating/updating of an avatar/representation reflecting the participant's latest emotion/expression. It would likewise have been obvious to incorporate the teaching of Neustaedter, Gopal OR Kurtz into the teaching of Chang, Tang OR Karri for the purpose of extending the claimed features to a conferencing environment, in which a real-time video communication link connects two or more locations and, more particularly, an automated method detects and characterizes activity in a local environment and then transmits or records video images for live or time-shifted viewing at a remote location, depending on both the acceptability of the characterized images and the status of users at the remote viewing system, in accordance with the desired privacy rules for sending and receiving images.
Claims 2, 10 and 19, wherein the model of the participant was generated based on training expressions, and wherein the captured expression is determined to be an unmodeled expression based on a detected difference between the captured expression and the training expressions. (See the independent claims.)
Claims 7-8, 15-16 and 20-21, determining one or more parameters of the one or more captured expressions; wherein the determining that the captured expression of the participant is an unmodeled expression is based on values of the one or more parameters; and, wherein the trained model of the participant is retrained using the one or more parameters. (Chang: facial expression, mental state (e.g., interested, happy, excited, bored, etc.); Tang: first/second facial expressions, a positive facial expression, a neutral facial expression, a negative facial expression, and/or the like, [0027]; Karri: facial expressions; biometric data, user gazing, user emotions, [0018]).
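For illustration only, a minimal sketch of determining an unmodeled expression from parameter values as recited in these claims. The two parameters, their values, and the distance threshold are all assumptions, not taken from Chang, Tang or Karri:

```python
# Hypothetical sketch only. Determining "unmodeled" from captured expression
# parameters (here, two made-up parameters: mouth openness and brow raise).
import math

TRAINING_PARAMS = [
    (0.30, 0.10),  # neutral (illustrative values)
    (0.55, 0.15),  # smile
]
THRESHOLD = 0.2  # assumed distance threshold for "sufficiently different"

def is_unmodeled(params, training=TRAINING_PARAMS, threshold=THRESHOLD):
    """Unmodeled if the captured parameters are far from every training expression."""
    return all(math.dist(params, t) > threshold for t in training)

captured = (0.80, 0.60)  # e.g., surprise
if is_unmodeled(captured):
    TRAINING_PARAMS.append(captured)  # the model is retrained using these parameters
print(TRAINING_PARAMS)
```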
17. (Cancelled)
Claims 3 and 11. (Currently Amended) The method according to claim 1, further comprising: sending the images comprising the captured expression to a computerized system under privacy restrictions.
Neustaedter: [0060] The acceptability test 520 can operate by comparing the results or values obtained by characterizing the activity, or attributes thereof, appearing in the captured video content to the pre-determined acceptable content criteria for such attributes or activities, as provided by the local or remote users of the video communications clients 300 and 305. If the activity is not acceptable, video is not transmitted in real time to the respective remote video communications clients 305, nor is it recorded for future transmission and playback. In this case, delete video step 525 deletes the video from the frame buffer 347. Ongoing video capture and monitoring (capture video step 505 and detect activity step 510) can then continue. As an optional alternative, local user preferences can initiate a record video for local use step 557, during which acceptable video image content of activity in the local environment is automatically recorded, regardless of whether the resulting recorded video is ever transmitted to a remote site 364 or not. This resulting recorded video can be characterized, subjected to privacy constraints, and processed, in a similar manner to the time-shifted video that is recorded for transmission.
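For illustration only, a minimal sketch mirroring Neustaedter's acceptability test in [0060]: characterized activity attributes are compared to pre-determined acceptable-content criteria, and the frame is then transmitted, optionally recorded for local use, or deleted from the frame buffer. The criteria, attribute names, and return values are assumptions, not Neustaedter's:

```python
# Hypothetical sketch only; names are illustrative, not Neustaedter's.
ACCEPTABLE_CRITERIA = {"nudity": False, "private_activity": False}

def acceptability_test(attributes: dict) -> bool:
    # Compare characterized activity attributes to acceptable-content criteria.
    return all(attributes.get(key, False) == value
               for key, value in ACCEPTABLE_CRITERIA.items())

def handle_frame(frame, attributes, record_for_local_use=False):
    if acceptability_test(attributes):
        return ("transmit", frame)       # real-time transmission to the remote client
    if record_for_local_use:
        return ("record_local", frame)   # optional local-use recording
    return ("delete", None)              # delete the video from the frame buffer

print(handle_frame("frame-001", {"nudity": False, "private_activity": True}))
# ('delete', None)
```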
Gopal: [0019] In some instances, the preloading operations may include transmitting audio and/or video from a participant's device to one or more other devices for some period of time prior to the start of the meet-and-greet. For example, a short video clip may be transmitted to a server, which may determine (e.g., via human review, image analysis, etc.) whether the participant's video contains images that violate one or more rules (e.g., prohibited symbols, nudity or inappropriate attire, etc.). Alternatively and/or additionally, a video clip may be transmitted to the celebrity or public figure's device for him or her to decide whether or not to initiate the call, [0019].
Kurtz teaches an enhancement for privacy as seen in the table below:
[Image in original action: media_image1.png, 648 × 656 pixels, greyscale; a table of Kurtz privacy enhancements.]
Claims 4 and 12, wherein the privacy restrictions restrict a number of one or more images sent to the computerized system. (Kurtz teaches the limitations via the predetermined privacy and contextual settings, [0163]. It would have been obvious that a user can set a rule limiting the quantity of images transmitted to any destination.)
Claims 5 and 13, wherein the privacy restrictions restrict information embedded in the one or more images sent to the computerized system. (Kurtz teaches the limitations via the predetermined privacy and contextual settings, [0163], and encryption is notoriously well known in the art. It would have been obvious that a user can set such a condition with ease, without much modification to the cited references.)
Claims 6 and 14, wherein the privacy restrictions prevent sending one or more images that enable a reconstruction of a content of the 3D video conference. (Gopal: "preloading operations," "preloading checks," and the like may refer to steps in which video and/or audio information captured by the client device during preloading are analyzed and determined to represent images and/or sounds that preclude the user from proceeding with a private one-on-one video chat with the celebrity or public figure. For example, certain profanity, nudity, or other content may be prohibited on the platform, and the preloading stage may be used to prevent such prohibited content from being broadcast to the celebrity or public figure (e.g., using image analysis, video analysis, object recognition, human review, etc.). In other examples, the celebrity or public figure (or a representative of that celebrity or public figure) may be given a chance to preview the transmission from a client device prior to the private one-on-one meet-and-greet with that client device's user to determine for themselves whether to initiate or decline the meet-and-greet. In such cases, a celebrity's or public figure's decision to decline the meet-and-greet may be considered a preloading failure, in that the client device did not meet the requirement of prior approval by the celebrity or public figure, [0069]. Here the examiner reads Gopal as teaching that no image will be sent before the preloading checks pass.)
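For illustration only, a minimal sketch combining the privacy restrictions mapped to claims 4-6 above: a cap on the number of images sent, stripping of embedded information, and a preloading-style check that blocks sending entirely until it passes. All names and limits are assumptions, not taken from Kurtz or Gopal:

```python
# Hypothetical privacy gate; all names and limits are illustrative.
MAX_IMAGES = 3  # claims 4/12: restrict the number of images sent

def strip_embedded_info(image: dict) -> dict:
    # Claims 5/13: restrict information embedded in the images (metadata, etc.).
    return {"pixels": image["pixels"]}

def send_images(images, preloading_check_passed: bool):
    # Claims 6/14, per the reading of Gopal [0069]: nothing is sent before
    # the preloading check passes.
    if not preloading_check_passed:
        return []
    return [strip_embedded_info(img) for img in images[:MAX_IMAGES]]

batch = [{"pixels": b"...", "location": "living room"} for _ in range(5)]
print(len(send_images(batch, preloading_check_passed=True)))   # 3
print(send_images(batch, preloading_check_passed=False))       # []
```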
Response to Arguments
Applicant’s arguments with respect to claim(s) filed 11/21/25 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Applicant argues that, “Generally, Chang is directed to a technique for tracking interactive events. A facial expression and interactive event are detected. A similar expression is identified, and the event is tagged in accordance with the previously tagged expression. See Chang, Abstract. The Examiner cites to paragraphs [0034]-[0035]. See Office Action, p. 4. A review of the cited portion of Chang reveals that Chang actually describes updating training data to include recognized facial
expressions and their associated definition. "This allows the expression classifier 150 to be retrained on updated training data." The Examiner further cites to claim 10, which discloses updating the set of training data with the new facial expression. As a result, the retrained model can now classify additional expressions. Nothing in Chang describes a model that is used to generate a representation of the participant. To that end, Chang fails to disclose "generating a representation of the participant using the updated model to mimic the captured expression."
The Examiner additionally relies, in the alternative, on Tang. Generally, Tang is directed to a product recommendation model. See Tang, Abstract. The Examiner cites to paragraphs [0035] and [0045]. The cited portions of Tang are actually directed to determining a sentiment category based on a reaction by a user. Tang then generates a product recommendation based on the detected sentiment. Nothing in Tang describes a model that is used to generate a representation of the participant. To that end, Tang fails to disclose "generating a representation of the participant using the updated model to mimic the captured expression."
The Examiner additionally relies, in the alternative, on Karri. Generally, Karri is directed to a technique for providing targeted content to users. See Karri, Abstract. A user's facial reaction to content can be used to re-train a model configured to detect whether a user approves or disapproves of content. The Examiner cites to paragraphs [0044]-[0045]. The cited portions of Karri are actually directed to using user reaction data and personal information to retrain a machine learning model "to improve the accuracy of the predicted likelihood of user approval or disapproval based on the user reaction data." Nothing in Karri describes a model that is used to generate a representation of the participant. To that end, [Karri] fails to disclose "generating a representation of the participant using the updated model to mimic the captured expression."
For at least the reasons described above, the various combinations of cited art fail to disclose each and every feature of independent claim 1. Thus, claim 1 is allowable. Claims 9 and 18 are also independent and, thus, are also allowable for similar reasons to claim 1. The remaining claims depend, directly or indirectly, from claims 1, 9, and 18 and, thus, are allowable at least by virtue of their dependence from allowable independent claims. Withdrawal of the rejection is respectfully requested."
Examiner respectfully disagrees, as the examiner has provided additional teaching from Amento to address applicant's argument.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PHUNG-HOANG J. NGUYEN whose telephone number is (571)270-1949. The examiner can normally be reached on Reg. Sched. 6:00-3:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Duc Nguyen can be reached on 571-272-7503. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/PHUNG-HOANG J NGUYEN/ Primary Examiner, Art Unit 2691