DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification
Applicant is reminded of the proper content of an abstract of the disclosure.
A patent abstract is a concise statement of the technical disclosure of the patent and should include that which is new in the art to which the invention pertains. The abstract should not refer to purported merits or speculative applications of the invention and should not compare the invention with the prior art.
If the patent is of a basic nature, the entire technical disclosure may be new in the art, and the abstract should be directed to the entire disclosure. If the patent is in the nature of an improvement in an old apparatus, process, product, or composition, the abstract should include the technical disclosure of the improvement. The abstract should also mention by way of example any preferred modifications or alternatives.
Where applicable, the abstract should include the following: (1) if a machine or apparatus, its organization and operation; (2) if an article, its method of making; (3) if a chemical compound, its identity and use; (4) if a mixture, its ingredients; (5) if a process, the steps.
Extensive mechanical and design details of an apparatus should not be included in the abstract. The abstract should be in narrative form and generally limited to a single paragraph within the range of 50 to 150 words in length.
See MPEP § 608.01(b) for guidelines for the preparation of patent abstracts.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-6 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Osada (U.S. Patent Pub. No. 2018/0268262) in view of Francis (U.S. Patent Pub. No. 2022/0277217).
Regarding Claim 1, Osada teaches an information processing apparatus comprising:
at least one memory that is configured to store instructions; and
at least one processor that is configured to execute the instructions to (Fig. 2; ¶40 The CPU 18 reads a processing program stored in the ROM 20 or the HDD 26, and performs the functions of the learner 141, the inferrer 142, and the data interpolator 143 using the RAM 22 as a working memory.)
acquire a first modal set that includes at least a first type of modal of multiple types of modals and that does not include a second type of modal of the multiple types of modals (Fig. 4; ¶24 The data acquirers 10, 12 acquire different data (observational data), and supply the data to the processor 14. From the viewpoint of improvement of accuracy of inference processing by the processor 14, it is desirable that the observational data acquired by the data acquirers 10, 12 correlate to each other. For instance, the data acquirer 10 acquires data on motion of joints of a person (first modality), and the data acquirer 12 acquires voice data on the person (second modality);)
allow a modal generation model to output at least the second type of modal, by inputting the first modal set to the modal generation model, wherein the modal generation model outputs at least one of the multiple types of modals in a case where at least one of the multiple types of modals is inputted (¶32 The inferrer 142 uses a model obtained through learning by the learner 141 to infer the emotion of a person using the observational data acquired by the data acquirers 10, 12. The inferrer 142 outputs a result of the inference to the output 16. Although the inferrer 142 infers an emotion basically using both the observational data acquired by the data acquirers 10, 12, even when the observational data of either one of the data acquirers 10, 12 is missing, the inferrer 142 continues to make emotion inference using the remaining not missing observational data; Examiner's note: the emotion is still determined when only one of the modalities is available, see Fig. 4, and the data consisting of one modality and missing data constitutes a first modal set) and the modal generation model is generated by machine learning using a second modal set including the multiple types of modals; and (¶27 The learner 141 collects training data from the observational data acquired by the data acquirers 10, 12 (voice and motion data together without any data missing constitute the second modal set), and performs machine learning using the training data.)
generate a third modal set including the first modal set and the second type of modal outputted by the modal generation model (¶33 When data is missing in either of the data acquirers 10 and 12, the data interpolator 143 interpolates the missing data using an inference result obtained by the inferrer 142. The data interpolation includes a process of generating observational data using an inference result obtained by the inferrer 142, specifically, an emotion as an inferred latent factor, and a process of interpolating a missing portion by the generated observational data. The data interpolator 143 outputs a result of the data interpolation to the output 16.)
Osada implies but does not explicitly disclose the modal generation model is generated by machine learning using a second modal set including the multiple types of modals.
Francis is in the same field of art of image analysis. Further, Francis teaches the modal generation model is generated by machine learning using a second modal set including the multiple types of modals (Francis, Fig 3a and 3b; ¶36 As shown in FIG. 3A, a system may include a computing pipeline for multimodal scene understanding. The system may receive information from multiple sensors. In the embodiment shown below, there are two sensors utilized; ¶43 FIG. 3B is an alternative embodiment of a computing pipeline. The alternative embodiment may include, for example, a process to allow a fusion module 320 to obtain the features from the feature extraction or decoder. The fusion module may then fuse all the data to generate a data set to be fed a single machine learning model/decoder.)
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Osada by generating, by machine learning, a model using multiple types of modal information as taught by Francis; one of ordinary skill in the art would have been motivated to combine the references to perform scene determination beyond what a human can do (Francis ¶35).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
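Examiner's note (illustrative only): the interpolation scheme relied upon from Osada ¶32-¶33 can be sketched in a few lines of Python. The sketch below is the examiner's own illustration, not code from Osada or Francis; all names (ModalGenerationModel, motion_enc, and the like) and dimensions are hypothetical. A model trained on the complete second modal set encodes whichever modality is available into a shared latent factor (the inferred "emotion") and decodes every modality back from it, so that a missing second type of modal is generated and the third modal set is formed.

    # Hypothetical sketch of Osada-style missing-modality interpolation
    # (¶32-¶33); names are illustrative, not taken from the cited references.
    import torch
    import torch.nn as nn

    MOTION_DIM, VOICE_DIM, LATENT_DIM = 16, 8, 4

    class ModalGenerationModel(nn.Module):
        """Encodes available modalities into a shared latent factor and
        decodes every modality back from that latent."""
        def __init__(self):
            super().__init__()
            self.motion_enc = nn.Linear(MOTION_DIM, LATENT_DIM)
            self.voice_enc = nn.Linear(VOICE_DIM, LATENT_DIM)
            self.motion_dec = nn.Linear(LATENT_DIM, MOTION_DIM)
            self.voice_dec = nn.Linear(LATENT_DIM, VOICE_DIM)

        def forward(self, motion=None, voice=None):
            # Infer the latent from whichever modalities are present
            # (the first modal set).
            parts = []
            if motion is not None:
                parts.append(self.motion_enc(motion))
            if voice is not None:
                parts.append(self.voice_enc(voice))
            z = torch.stack(parts).mean(dim=0)        # fused latent factor
            # Reconstruct all modalities; a missing one is thereby interpolated.
            return self.motion_dec(z), self.voice_dec(z)

    model = ModalGenerationModel()
    motion_only = torch.randn(1, MOTION_DIM)          # voice data is missing
    _, generated_voice = model(motion=motion_only)    # second type of modal
    third_modal_set = (motion_only, generated_voice)  # first modal set + generated modal

Training such a model on the complete second modal set (Osada ¶27) would minimize a reconstruction loss over both decoded modalities.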
Regarding Claim 2, Osada in view of Francis discloses the information processing apparatus according to claim 1, wherein the at least one processor is configured to execute the instructions to generate the modal generation model by performing the machine learning using the second modal set (Osada, ¶27 The learner 141 collects training data from the observational data acquired by the data acquirers 10, 12 (voice and motion data together without any data missing constitute the second modal set), and performs machine learning using the training data.)
Regarding Claim 3, Osada in view of Francis discloses the information processing apparatus according to claim 1, wherein the at least one processor is configured to execute the instructions to allow the modal generation model to output the second type of modal (see claim 1), by inputting, to the modal generation model, environmental information indicating an acquisition environment when the first type of modal is acquired (Francis, ¶4 The processor is programmed to receive the first and second set of information indicative of the environment, extract one or more data features associated with the images and sound information utilizing an encoder, output metadata via a decoder to a spatiotemporal reasoning engine, wherein the metadata is derived utilizing the decoder and the one or more data features.)
Regarding Claim 4, Osada in view of Francis discloses the information processing apparatus according to claim 3, wherein
the modal generation model includes: an encoder unit (Osada, ¶42 FIG. 3 and FIG. 4 schematically illustrate the processing by the processor 14. The processor 14 infers an emotion of a person using motion data acquired in time series, and voice data acquired in time series. Although emotions to be inferred include happiness, sadness, fear, anger, dislike, contempt, the emotions are not limited to these. Known techniques for inferring a latent factor include the hidden Markov model (HMM), the recurrent neural network (RNN), the autoencoder (AE), and the variational autoencoder (VAE).) that transforms features of at least one of the multiple types of modals into latent variables in a case where at least one of the multiple types of modals is inputted; and a decoder unit that generates at least one of the multiple types of modals by reconstructing the latent variables, and (Osada, ¶65 FIG. 10 and FIG. 11 schematically illustrate learning processing performed by the learner 141. Let x be collected observational data (motion data and voice data), H be a characteristic quantity of the observational data, z be a latent variable (emotion), and y be a label; the learner 141 uses the RNN for learning calculation of a characteristic quantity H from the collected observational data x_given, and uses the VAE, as learning using data without a label, for learning calculation (encoding) of a latent variable (emotion) z from the characteristic quantity H. In addition to these, in order to ensure the accuracy of calculation of the latent variable (emotion) z, data with a label is used, and a label y_inferred corresponding to the calculated latent variable z is compared with a label y_given as the correct data.)
the at least one processor is configured to execute the instructions to input the environmental information to at least one of the encoder unit and the decoder unit (Francis, ¶4 The processor is programmed to receive the first and second set of information indicative of the environment, extract one or more data features associated with the images and sound information utilizing an encoder, output metadata via a decoder to a spatiotemporal reasoning engine, wherein the metadata is derived utilizing the decoder and the one or more data features.)
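Examiner's note (illustrative only): the encoder/decoder structure mapped above (Osada ¶65, VAE; Francis ¶4, environmental information) can be sketched as a conditional VAE in which the environmental information is concatenated to both the encoder input and the decoder input. The sketch is the examiner's own, not code from either reference; all names and dimensions are hypothetical.

    # Hypothetical conditional VAE-style sketch for claims 3-4; not code from
    # the cited references.
    import torch
    import torch.nn as nn

    FEAT_DIM, ENV_DIM, LATENT_DIM = 16, 3, 4

    class Encoder(nn.Module):
        """Transforms modal features (plus environmental information) into
        latent variables (mean and log-variance)."""
        def __init__(self):
            super().__init__()
            self.mu = nn.Linear(FEAT_DIM + ENV_DIM, LATENT_DIM)
            self.logvar = nn.Linear(FEAT_DIM + ENV_DIM, LATENT_DIM)

        def forward(self, features, env):
            h = torch.cat([features, env], dim=-1)
            return self.mu(h), self.logvar(h)

    class Decoder(nn.Module):
        """Reconstructs a modality from the latent variables (plus
        environmental information)."""
        def __init__(self):
            super().__init__()
            self.out = nn.Linear(LATENT_DIM + ENV_DIM, FEAT_DIM)

        def forward(self, z, env):
            return self.out(torch.cat([z, env], dim=-1))

    enc, dec = Encoder(), Decoder()
    feats = torch.randn(1, FEAT_DIM)   # features of the first type of modal
    env = torch.randn(1, ENV_DIM)      # acquisition-environment information
    mu, logvar = enc(feats, env)
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization
    reconstructed = dec(z, env)        # generated modality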
Regarding Claim 5, Osada in view of Francis discloses the information processing apparatus according to claim 1, wherein the at least one processor is configured to execute the instructions to generate a modal estimation model that outputs a fourth type of modal that is different from a third type of modal of the multiple types of modals as an output modal, in a case where the third type of modal of the multiple types of modals is inputted as an input modal, by performing machine learning using the third modal set (Osada, ¶43 Also, even when one of the motion data and the voice data, for instance, the voice data is temporarily missing for some reasons, the processor 14 interpolates the missing voice data by continuing to make emotion inference processing and using an inferred emotion. As long as an emotion can be inferred, the missing voice data may not necessarily need to be interpolated. However, for instance, when acquired voice data is converted to text data and the text data is utilized (fourth modal type), the missing voice data may need to be interpolated, and thus the missing data is interpolated in consideration of such a situation.)
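Examiner's note (illustrative only): a modal estimation model as recited can be sketched as a supervised mapping learned from paired examples in the third modal set, taking a third type of modal (e.g., voice features) as the input modal and producing a fourth type of modal (e.g., text features, cf. Osada ¶43) as the output modal. In this examiner's own hypothetical sketch, a least-squares fit stands in for the machine-learning step.

    # Hypothetical sketch of a modal estimation model for claim 5: learn a map
    # from a third type of modal to a fourth type of modal over the third
    # modal set. Synthetic data; not code from the cited references.
    import numpy as np

    rng = np.random.default_rng(0)
    third_modal = rng.normal(size=(100, 8))    # e.g. voice features (input modal)
    fourth_modal = rng.normal(size=(100, 5))   # e.g. text features (output modal)

    # "Machine learning" step: least-squares fit of W so that
    # third_modal @ W approximates fourth_modal.
    W, *_ = np.linalg.lstsq(third_modal, fourth_modal, rcond=None)

    new_input = rng.normal(size=(1, 8))        # new instance of the input modal
    estimated_fourth_modal = new_input @ W     # estimated output modal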
Regarding Claim 6, Osada in view of Francis discloses the information processing apparatus according to claim 1, wherein the multiple types of modals are multiple types of biometric information including information about at least one of a face image, heart rate, and oxygen saturation (Osada, ¶121 Specifically, in the case where an emotion as a latent factor is inferred from motion data on the joints, motion data on the face, and the voice data as the observational data, when the motion data on the face has a missing portion, the missing portion of the motion data on the face is interpolated and outputted based on the inferred emotion. This can be expressed such that the motion of the face is artificially composed and simulated.)
Regarding Claim 10, claim 10 has been analyzed with regard to claim 1 and is rejected for the same reasons of obviousness as set forth above.
Claims 7-9 are rejected under 35 U.S.C. 103 as being unpatentable over Osada (U.S. Patent Pub. No. 2018/0268262) in view of Francis (U.S. Patent Pub. No. 2022/0277217) and further in view of Darling (U.S. Patent Pub. No. 2022/0270344).
Regarding Claim 7, Osada teaches a biometric information estimation apparatus comprising:
at least one memory that is configured to store instructions; and
at least one processor that is configured to execute the instructions to (Fig. 2; ¶40 The CPU 18 reads a processing program stored in the ROM 20 or the HDD 26, and performs the functions of the learner 141, the inferrer 142, and the data interpolator 143 using the RAM 22 as a working memory.)
acquire a face image of a target person; and (Osada, ¶121 Specifically, in the case where an emotion as a latent factor is inferred from motion data on the joints, motion data on the face, and the voice data as the observational data, when the motion data on the face has a missing portion, the missing portion of the motion data on the face is interpolated and outputted based on the inferred emotion. This can be expressed such that the motion of the face is artificially composed and simulated.)
by performing machine learning using a third modal set (¶32 The inferrer 142 uses a model obtained through learning by the learner 141 to infer the emotion of a person using the observational data acquired by the data acquirers 10, 12. The inferrer 142 outputs a result of the inference to the output 16. Although the inferrer 142 infers an emotion basically using both the observational data acquired by the data acquirers 10, 12, even when the observational data of either one of the data acquirers 10, 12 is missing, the inferrer 142 continues to make emotion inference using the remaining not missing observational data; Examiner's note: the emotion is still determined when only one of the modalities is available, see Fig. 4, and the data consisting of one modality and missing data constitutes a third modal set) including (i) a first modal set that includes at least a first type of modal of the multiple types of modals and that does not include a second type of modal of the multiple types of modals, and (ii) the second type of modal outputted by a modal generation model, wherein (Fig. 4; ¶24 The data acquirers 10, 12 acquire different data (observational data), and supply the data to the processor 14. From the viewpoint of improvement of accuracy of inference processing by the processor 14, it is desirable that the observational data acquired by the data acquirers 10, 12 correlate to each other. For instance, the data acquirer 10 acquires data on motion of joints of a person (first modality), and the data acquirer 12 acquires voice data on the person (second modality))
the modal generation model outputs at least one of the multiple types of modals in a case where at least one of the multiple types of modals is inputted, and the modal generation model is generated by machine learning using a second modal set including the multiple types of modals (¶27 The learner 141 collects training data from the observational data acquired by the data acquirers 10, 12 (voice and motion data together without any data missing constitute the second modal set), and performs machine learning using the training data.)
Osada does not explicitly disclose the following: allow a modal estimation model to output biometric information on the target person as an output modal; and the modal generation model is generated by machine learning using a second modal set including the multiple types of modals.
Francis is in the same field of art of image analysis. Further, Francis teaches the modal generation model is generated by machine learning using a second modal set including the multiple types of modals (Francis, Fig 3a and 3b; ¶36 As shown in FIG. 3A, a system may include a computing pipeline for multimodal scene understanding. The system may receive information from multiple sensors. In the embodiment shown below, there are two sensors utilized; ¶43 FIG. 3B is an alternative embodiment of a computing pipeline. The alternative embodiment may include, for example, a process to allow a fusion module 320 to obtain the features from the feature extraction or decoder. The fusion module may then fuse all the data to generate a data set to be fed a single machine learning model/decoder.)
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Osada by generating, by machine learning, a model using multiple types of modal information as taught by Francis; one of ordinary skill in the art would have been motivated to combine the references to perform scene determination beyond what a human can do (Francis ¶35).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
Darling is in the same field of art of image analysis. Further, Darling teaches allow a modal estimation model to output biometric information on the target person as an output modal (¶55 The present invention therefore provides a way of identifying and removing spectral components in the PPG image signal which result from artificial (ambient) light interference. It also provides an elegant and simple way of obtaining the actual PPG signal frequency which corresponds to the heart rate. Another aspect uses a similar method to obtain a breathing rate measurement from the PPG image signal. It is also possible with the embodiments to obtain a measurement of the peripheral arterial blood oxygen saturation SpO2. Furthermore, even in such environments with ambient light interference, multi-modal techniques can be combined to overcome such detriments and yet provide a higher confidence diagnosis or result if just using the camera alone.)
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Osada in view of Francis by outputting biometric information based on multimodal techniques as taught by Darling; one of ordinary skill in the art would have been motivated to combine the references to diagnose with greater accuracy (Darling ¶3).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
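Examiner's note (illustrative only): the Darling mapping (¶55) rests on the fact that the heart rate corresponds to the dominant spectral frequency of the PPG image signal within the physiological band, with out-of-band components (e.g., ambient-light interference) discarded. The sketch below demonstrates the idea on synthetic data; it is the examiner's own illustration, not Darling's implementation.

    # Hypothetical sketch: heart rate as the dominant in-band spectral peak of
    # a PPG signal (cf. Darling ¶55); synthetic data, not the reference's code.
    import numpy as np

    FS = 30.0                                  # camera frame rate, Hz
    t = np.arange(0, 20, 1 / FS)               # 20 s of samples
    ppg = np.sin(2 * np.pi * 1.2 * t) \
        + 0.3 * np.random.default_rng(0).normal(size=t.size)

    spectrum = np.abs(np.fft.rfft(ppg))
    freqs = np.fft.rfftfreq(ppg.size, d=1 / FS)

    # Restrict to a plausible heart-rate band (0.7-3.0 Hz, i.e. 42-180 bpm) so
    # that interference outside the band is ignored.
    band = (freqs >= 0.7) & (freqs <= 3.0)
    peak_hz = freqs[band][np.argmax(spectrum[band])]
    print(f"estimated heart rate: {peak_hz * 60:.0f} bpm")  # ~72 bpm for 1.2 Hz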
Regarding Claim 8, Osada in view of Francis in view of Darling discloses the biometric information estimation apparatus according to claim 7, wherein the at least one processor is configured to execute the instructions to acquire the face image from an image generation apparatus that generates the face image by imaging the target person for whom the face image is generated, through a communication line (Osada, ¶121 Specifically, in the case where an emotion as a latent factor is inferred from motion data on the joints, motion data on the face, and the voice data as the observational data, when the motion data on the face has a missing portion, the missing portion of the motion data on the face is interpolated and outputted based on the inferred emotion. This can be expressed such that the motion of the face is artificially composed and simulated.)
Regarding Claim 9, Osada in view of Francis in view of Darling discloses the biometric information estimation apparatus according to claim 7, wherein the biometric information includes information about at least one of heart rate and oxygen saturation (Darling, ¶55 The present invention therefore provides a way of identifying and removing spectral components in the PPG image signal which result from artificial (ambient) light interference. It also provides an elegant and simple way of obtaining the actual PPG signal frequency which corresponds to the heart rate. Another aspect uses a similar method to obtain a breathing rate measurement from the PPG image signal. It is also possible with the embodiments to obtain a measurement of the peripheral arterial blood oxygen saturation SpO2. Furthermore, even in such environments with ambient light interference, multi-modal techniques can be combined to overcome such detriments and yet provide a higher confidence diagnosis or result if just using the camera alone.)
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DUSTIN BILODEAU whose telephone number is (571) 272-1032. The examiner can normally be reached 9 am-5 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Mehmood can be reached at (571) 272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/DUSTIN BILODEAU/Examiner, Art Unit 2664
/JENNIFER MEHMOOD/Supervisory Patent Examiner, Art Unit 2664