DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 6-9 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention.
Claim 6 recites the limitations "an image editing process" and "explanatory variable". It is unclear whether "an image editing process" in claim 6 refers to the same image editing process recited in claim 1 or to a different image editing process. It is also unclear how the "explanatory variable" is an explained variable.
Claim 7 recites the limitation "the predicted type". There is insufficient antecedent basis for this limitation in the claim.
All dependent claims are also rejected based on their dependency on the defective parent claims.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-5 and 10-14 are rejected under 35 U.S.C. 103 as being unpatentable over Ni et al. (US20180027307) in view of Cai et al. (Emotion Recognition Using Different Sensors, Emotion Models, Methods and Datasets: A Comprehensive Review).
Regarding claim 1, Ni teaches:
An imaging apparatus system comprising:
one or more image sensors configured to generate video data by capturing images; (Ni at least in par. [0003], teaches a camera on a client device may be initialized to capture frames of video of a user in response to determining that the user is viewing content.)
an interface device for acquiring data, the interface device acquiring at least one of audio data acquired during image capturing (Ni at least in pars. [0042, 0044, 0051], teaches emotional reaction data may be identified from video, audio, and/or other information relating to a user (e.g., a microphone recording a user crying, a camera capturing a user crossing her arms in frustration, the camera capturing a user waving her arms while cheering, etc.).)
Ni does not teach acquiring biometric data that is biometric information of a user acquired during the image capturing, in addition to capturing other data sets for determining user emotion;
On the other hand, Cai teaches acquiring biometric data that is biometric information of a user acquired during the image capturing in addition to capturing other data set for determining user emotion; (Cai at least in Figs. 8-9 and the end of Section 6. Datasets, teaches to consider the synchronization of multi-channel signals during the recording of different sensors... Multi-modal signal datasets can make the machine’s analysis of emotion more comprehensive.)
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to acquire and use the multiple user data channels for identifying a user’s emotion, as taught in Cai, with Ni’s user-avatar building system. The combination provides multi-modal signal datasets that can make the machine’s analysis of emotion more comprehensive.
Ni in view of Cai teaches:
a storage configured to record a data set in which at least one of the audio data and the biometric data is correlated with the video data; (Ni at least in pars. [0022, 0042], teaches storing many types of signals, such as in memory as physical memory states. It implies storing emotional reaction data identified from video, audio, and/or other information relating to a user. Cai at least in Figs. 8-9 and the end of Section 6. Datasets, teaches recording of different sensors.)
one or more signal processing circuits configured to determine an emotion the user felt regarding an event that was occurring at the time of the image capturing, using at least one of the video data, the audio data, and the biometric data, and configured to execute an image editing process on the video data depending on the determined emotion. (Ni in pars. [0029, 0034], processors. Ni par. [0051], teaches responsive to determining that the user 501 is viewing the eSports championship 504, a camera 512 and/or a microphone 514 may be initialized. The camera 512 may be used to capture one or more frames of video of the user 501. The frames may be evaluated to identify facial features of the user 501, such as a first ear 518, a second ear 516, a first eyebrow 520, a second eyebrow 522, a first eye 528, a second eye 526, a nose 524, a mouth 530, and/or other facial features. The frames may be captured and evaluated in real-time to generate sets of landmark points representing locations of facial features within the frames as the user 501 views the eSports championship 504. The microphone 514 may be utilized to capture audio content of the user 501 that is evaluated to identify a mood of the user… pars. [0058-0060], the expressive (edited) avatar may be displayed to the user while the user views the content. Cai at least in Figs. 8-9 and the end of Section 6. Datasets, teaches recording of different sensors.)
Regarding claim 2, Ni in view of Cai teaches:
The imaging apparatus system according to claim 1, wherein
the one or more signal processing circuits grasp the event using the video data and determine an emotion the user felt at the event using at least one of the audio data and the biometric data. (Ni in pars. [0029, 0034], processors. Ni par. [0051], teaches responsive to determining that the user 501 is viewing the eSports championship 504, a camera 512 and/or a microphone 514 may be initialized. The camera 512 may be used to capture one or more frames of video of the user 501. The frames may be evaluated to identify facial features of the user 501, such as a first ear 518, a second ear 516, a first eyebrow 520, a second eyebrow 522, a first eye 528, a second eye 526, a nose 524, a mouth 530, and/or other facial features. The frames may be captured and evaluated in real-time to generate sets of landmark points representing locations of facial features within the frames as the user 501 views the eSports championship 504. The microphone 514 may be utilized to capture audio content of the user 501 that is evaluated to identify a mood of the user… Cai at least in Figs. 8-9 above, teaches contribution from different sensors to user’s emotion.)
Regarding claim 3, Ni in view of Cai teaches:
The imaging apparatus system according to claim 2, wherein
the interface device acquires line-of-sight data indicating direction of the user's line of sight at the time of the image capturing, and wherein the one or more signal processing circuits grasp the event using the video data and the line-of-sight data and determine an emotion the user felt at the event using at least one of the audio data and the biometric data. (Ni in pars. [0029, 0034], processors. Ni par. [0051], teaches responsive to determining that the user 501 is viewing the eSports championship 504, a camera 512 and/or a microphone 514 may be initialized. The camera 512 may be used to capture one or more frames of video of the user 501. The frames may be evaluated to identify facial features of the user 501, such as a first ear 518, a second ear 516, a first eyebrow 520, a second eyebrow 522, a first eye 528, a second eye 526, a nose 524, a mouth 530, and/or other facial features. The frames may be captured and evaluated in real-time to generate sets of landmark points representing locations of facial features within the frames as the user 501 views the eSports championship 504. The microphone 514 may be utilized to capture audio content of the user 501 that is evaluated to identify a mood of the user… Cai at least in Figs. 8-9 above, teaches contribution from different sensors to user’s emotion. Ni in par. [0003], teaches a camera on a client device may be initialized to capture frames of video of a user in response to determining that the user is viewing content (e.g., the user is viewing a presidential inauguration video for which the user desires to share emotional reactions with other users). It implies the determining of user’s line of sight so that the user is viewing the displaying content.)
Regarding claim 5, Ni in view of Cai teaches:
The imaging apparatus system according to claim 3, wherein
the one or more signal processing circuits calculate a first value from the audio data, calculate a second value from the biometric data, and calculate a third value from the line-of-sight data, and wherein the one or more signal processing circuits determine the emotion based on a total value from the first value to the third value. (Ni in par. [0003], teaches a camera on a client device may be initialized to capture frames of video of a user in response to determining that the user is viewing content (e.g., the user is viewing a presidential inauguration video for which the user desires to share emotional reactions with other users). It implies the determining of user’s line of sight so that the user is viewing the displaying content. Cai at least in Figs. 8-9, teaches combining the calculated value data from different sensors to determine the user’s emotion.)
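For illustration only, the total-value determination recited in claim 5 may be sketched as follows. The sketch assumes each modality yields a signed score and that the emotion is chosen by comparing the sum of the three scores against thresholds. The names ModalityScores and determine_emotion, the score ranges, the threshold values, and the emotion labels are hypothetical and are not drawn from the claim, Ni, or Cai.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModalityScores:
    """Hypothetical per-modality scores (signed; positive = pleasant)."""
    audio: float          # first value, calculated from the audio data
    biometric: float      # second value, calculated from the biometric data
    line_of_sight: float  # third value, calculated from the line-of-sight data

    def total(self) -> float:
        # Total value from the first value to the third value.
        return self.audio + self.biometric + self.line_of_sight


def determine_emotion(scores: ModalityScores,
                      positive_threshold: float = 1.0,
                      negative_threshold: float = -1.0) -> str:
    """Map the total value to a coarse emotion label.

    The thresholds and labels are assumptions made for this sketch only.
    """
    total = scores.total()
    if total >= positive_threshold:
        return "positive"
    if total <= negative_threshold:
        return "negative"
    return "neutral"
```

Under these assumptions, determine_emotion(ModalityScores(0.5, 0.25, 0.25)) returns "positive", because the total value of 1.0 meets the assumed positive threshold.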
Regarding claim 4, Ni in view of Cai teaches:
The imaging apparatus system according to claim 1, wherein
the one or more image sensors comprise a first image sensor configured to capture images of a predetermined subject to generate first video data, (Ni at least in par. [0003], teaches the user is viewing a presidential inauguration video… Ni at least in par. [0060], teaches the content may be evaluated to identify a visual feature of the content (e.g., the content may comprise a president speech in a field, where a president is identified as a first entity visual feature… Ni par. [0038], One or more computing devices and/or techniques for emotional reaction sharing are provided. Users, viewing content such as a live stream of a video, may desire to share emotional reactions to the video. The live stream of a video (predetermined subject/president) is captured by an image sensor.) and
a second image sensor configured to capture images of the user's face to generate second video data, (Ni at least in par. [0003], teaches a camera on a client device may be initialized to capture frames of video of a user in response to determining that the user is viewing content (e.g., the user is viewing a presidential inauguration video for which the user desires to share emotional reactions with other users). It implies the determining of user’s line of sight so that the user is viewing the displaying content.) and
wherein the one or more signal processing circuits grasp the event using the first video data and determine an emotion the user felt at the event using the second video data and at least one of the audio data and the biometric data. (Ni in pars. [0029, 0034], processors. Ni in par. [0051], teaches responsive to determining that the user 501 is viewing the eSports championship 504, a camera 512 and/or a microphone 514 may be initialized. The camera 512 may be used to capture one or more frames of video of the user 501. The frames may be evaluated to identify facial features of the user 501, such as a first ear 518, a second ear 516, a first eyebrow 520, a second eyebrow 522, a first eye 528, a second eye 526, a nose 524, a mouth 530, and/or other facial features. The frames may be captured and evaluated in real-time to generate sets of landmark points representing locations of facial features within the frames as the user 501 views the eSports championship 504. The microphone 514 may be utilized to capture audio content of the user 501 that is evaluated to identify a mood of the user… Cai at least in Figs. 8-9 above, teaches contribution from different sensors to user’s emotion. Ni in par. [0003], teaches a camera on a client device may be initialized to capture frames of video of a user in response to determining that the user is viewing content (e.g., the user is viewing a presidential inauguration video for which the user desires to share emotional reactions with other users).)
Regarding claim 10, Ni in view of Cai teaches:
The imaging apparatus system according to claim 1, comprising:
an imaging apparatus and a server apparatus that are communicable with each other, (Ni in Fig.1, teaches client devices - servers networks.)
wherein the imaging apparatus comprises: the one or more image sensors configured to generate the video data; a microphone configured to generate the audio data; and a biometric sensor configured to generate the biometric data (Ni at least in pars. [0029, 0034], processors. Ni at least in par. [0051], teaches responsive to determining that the user 501 is viewing the eSports championship 504, a camera 512 and/or a microphone 514 may be initialized. The camera 512 may be used to capture one or more frames of video of the user 501. The frames may be evaluated to identify facial features of the user 501, such as a first ear 518, a second ear 516, a first eyebrow 520, a second eyebrow 522, a first eye 528, a second eye 526, a nose 524, a mouth 530, and/or other facial features. The frames may be captured and evaluated in real-time to generate sets of landmark points representing locations of facial features within the frames as the user 501 views the eSports championship 504. The microphone 514 may be utilized to capture audio content of the user 501 that is evaluated to identify a mood of the user… Ni at least in par. [0052], teaches a user reaction distribution service (e.g., a server) may be configured to receive sets of landmark points from client devices (e.g., scalable to hundreds of thousands of client devices), evaluate the sets of landmark points to identify facial expressions of users viewing content, and provide a facial expression to client devices. Cai at least in Figs. 8-9 above, teaches contribution from different (including biometric) sensors to user’s emotion.)
wherein the server apparatus comprises: the storage; and the one or more signal processing circuits. (Ni at least in par. [0030], teaches the server 104 may comprise a mainboard featuring one or more communication buses 212 that interconnect the processor 210, the memory.)
Regarding claims 11-14, Ni in view of Cai teaches:
Claims 11-14 recite limitations similar to those of claims 1-5 and 10. The rationale for rejecting claims 1-5 and 10 is applied to claims 11-14, with the additional limitation of a server performing the functions, including an image editing process on the video data with a factor (preference) of the user's emotion (Ni at least in pars. [0047, 0062], teaches the avatar may be selected for presentation to the second user based upon the user specifying a preference for being represented to other users using the avatar (e.g., the user may prefer to be depicted as a robot to other users). The facial expression data 906 may comprise other metadata and tags, such as an avatar preference of a user from which the crying with tears facial expression was identified (e.g., the avatar preference may indicate that the user prefers to be represented as a cat if other users have not specified how users are to be displayed to them). Ni at least in par. [0052], teaches a user reaction distribution service (e.g., a server) may be configured to receive sets of landmark points from client devices (e.g., scalable to hundreds of thousands of client devices), evaluate the sets of landmark points to identify facial expressions of users viewing content, and provide a facial expression to client devices.)
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Please see form PTO-892.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PHUC N DOAN whose telephone number is (571)270-3397. The examiner can normally be reached on Monday - Friday: 9am - 5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Faulk Devona, can be reached at (571) 272-7515. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/PHUC N DOAN/Examiner, Art Unit 2618