Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
This action is responsive to the Amendment filed on 08/27/2025. Claims 1, 11, and 16 have been amended. Claims 1-20 are pending in the case. Claims 1, 11, and 16 are independent claims.
Response to Arguments
Examiner notes that the “common portion” of the neural network, absent any specificity as to what that portion comprises, is treated under the broadest reasonable interpretation (BRI) as a software module.
In regard to the § 103 rejection, Applicant’s arguments with respect to claims 1-20 have been considered and are persuasive. The previous rejection is withdrawn. A new reference is now applied, and the current arguments do not apply to the newly cited reference, which renders the claims obvious.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Atallah et al. (US 20240091623 A1), whose provisional application was filed on 2/20/2018, in view of El Kaliouby et al. (US 20170337438 A1), and in further view of Waxman et al. (US 20110251985 A1).
Referring to claims 1, 11, and 16, Atallah discloses a method for detecting two or more physiological signals of a subject based on a video input of the subject obtained using a mobile device and performed using local computational resources of the mobile device, ([0006] of Atallah, video input or images, to estimate physiological conditions and [0034] of Atallah, video input includes facial area of the individual. [0016] of Atallah, smart phone) comprising the steps of:
obtaining the video input, wherein the video input comprises a sequence of frames of image data depicting a face of the subject; (it is well understood that video data is a sequence of frames of image data, here [0006] of Atallah, video input or images, to estimate physiological conditions and [0034] of Atallah, video input includes facial area of the individual)
applying the video input to a multi-head neural network model to generate estimates of two or more physiological signals of the subject, wherein the multi-head neural network model is trained from a set of facial video inputs from a multitude of other subjects, to predict the two or more physiological signals from the video input, wherein applying the video input to the multi-head neural network model includes ([0006] of the Specification defines “multi-head neural network model” as “a machine learning model in the form of trained neural network that has more than one output layers and associated predictions (referred to as ‘heads’)” and states “While the following disclosure gives an example of two-head neural network model predicting heart rate and respiratory rate”; here [0019] of Atallah, neural networks may include multiple layers (e.g., where a signal path traverses from front layers to back layers); in some embodiments, back propagation techniques may be utilized by the neural networks, where forward stimulation is used to reset weights on the “front” neural units; and [0034] of Atallah, “prediction component 114 is configured to determine individual 700's vital signs (e.g., heart rate, respiratory rate, SpO2, etc.) by determining (i) one or more changes in the color of the face of individual 700 during one or more cardiac cycles and (ii) chest wall movements”).
Although Atallah discloses analyzing different video content using different layers, Atallah does not specifically disclose “applying the video input to a common portion, of the multi-head neural network model, that comprises one or more common neural network layers to generate an intermediate output, applying the intermediate output to a first head, of the multi-head neural network model, that comprises one or more first head neural network layers to generate an estimate of a first physiological signal of the two or more physiological signals of the subject, wherein the first physiological signal is a rate of a physiological process of the subject across the video input, and applying the intermediate output to a second head, of the multi-head neural network model, that comprises one or more second head neural network layers to generate an estimate of a second physiological signal of the two or more physiological signals of the subject, wherein the second physiological signal is another rate of a second physiological process of the subject across the video input, and wherein the one or more common neural network layers include more layers than either of the one or more first head neural network layers or the one or more second head neural network layers.”
However, El Kaliouby discloses applying the video input to a common portion, of the multi-head neural network model, that comprises one or more common neural network layers to generate an intermediate output (Figs. 17-18 and [0110]-[0119] of El Kaliouby, video input is passed through a common portion 1820 to generate an intermediate output), applying the intermediate output to a first head, of the multi-head neural network model, that comprises one or more first head neural network layers to generate an estimate of a first physiological signal of the two or more physiological signals of the subject, wherein the first physiological signal is a rate of a physiological process of the subject across the video input, and applying the intermediate output to a second head, of the multi-head neural network model, that comprises one or more second head neural network layers to generate an estimate of a second physiological signal of the two or more physiological signals of the subject, wherein the second physiological signal is another rate of a second physiological process of the subject across the video input, and wherein the one or more common neural network layers include more layers than either of the one or more first head neural network layers or the one or more second head neural network layers. (Figs. 17-18 and [0110]-[0119] of El Kaliouby, “The intermediate layers can include a Rectified Linear Units (RELU) layer 1826. The output of the pooling layer 1824 can be input to the RELU layer 1826. In embodiments, the RELU layer implements an activation function such as f(x)=max(0,x), thus providing an activation with a threshold at zero. In some embodiments, the RELU layer 1826 is a leaky RELU layer. In this case, instead of the activation function providing zero when x<0, a small negative slope is used, resulting in an activation function such as f(x)=1(x<0)(αx)+1(x>=0)(x).
This can reduce the risk of “dying RELU” syndrome, where portions of the network can be “dead” with nodes/neurons that do not activate across the training dataset. The image analysis can comprise training a multilayered analysis engine using the plurality of images, wherein the multilayered analysis engine can include multiple layers that include one or more convolutional layers 1822 and one or more hidden layers, and wherein the multilayered analysis engine can be used for emotional analysis… The example 1800 includes a fully connected layer 1830. The fully connected layer 1830 processes each pixel/data point from the output of the collection of intermediate layers 1820. The fully connected layer 1830 takes all neurons in the previous layer and connects them to every single neuron it has. The output of the fully connected layer 1830 provides input to a classification layer 1840. The output of the classification layer 1840 provides a facial expression and/or mental state as its output. Thus, a multilayered analysis engine such as the one depicted in FIG. 18 processes image data using weights, models the way the human visual cortex performs object recognition and learning, and is effective for analysis of image data to infer facial expressions and mental states.” And [0034] of Atallah, “prediction component 114 is configured to determine individual 700's vital signs (e.g., heart rate, respiratory rate, SpO2, etc.) by determining (i) one or more changes in the color of the face of individual 700 during one or more cardiac cycles and (ii) chest wall movements”)
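For illustration only, the RELU and leaky-RELU activation functions quoted from El Kaliouby above can be sketched as follows. This is a minimal sketch; the choice of Python/NumPy and the α=0.01 default are assumptions for illustration and are not taken from any cited reference:

```python
import numpy as np

def relu(x):
    # Standard RELU quoted above: f(x) = max(0, x), thresholding at zero.
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Leaky RELU per El Kaliouby [0110]-[0119]:
    # f(x) = 1(x<0)*(alpha*x) + 1(x>=0)*x.
    # The small negative slope keeps a nonzero gradient for x < 0,
    # reducing the risk of "dying RELU" nodes that never activate.
    return np.where(x < 0, alpha * x, x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
standard = relu(x)       # negative inputs clamp to zero
leaky = leaky_relu(x)    # negative inputs keep a small negative value
```
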
Atallah and El Kaliouby are analogous art because both references concern detecting physiological changes in a subject using a trained network. Accordingly, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Atallah's video-based determination of a subject's vital signs to classify the estimated physiological signals through a shared common CNN layer, as taught by El Kaliouby. The motivation for doing so would have been that, by automatically identifying the ROIs, expert intervention is not required to perform specialized imaging, and thus such imaging and analysis can be provided at more facilities and for more subjects at a lower cost.
Atallah in view of El Kaliouby does not specifically disclose “wherein the first head and the second head comprise respective separate predictor neural networks that separately depend from the intermediate output of the common portion, wherein the estimate of the first physiological signal is not based on any output of the one or more second head neural network layers, and wherein the estimate of the second physiological signal is not based on any output of the one or more first head neural network layers.”
However, Waxman discloses wherein the first head and the second head comprise respective separate predictor neural networks that separately depend from the intermediate output of the common portion, wherein the estimate of the first physiological signal is not based on any output of the one or more second head neural network layers, and wherein the estimate of the second physiological signal is not based on any output of the one or more first head neural network layers (Fig. 2B and [0059]-[0064] of Waxman, where the multiple output of the physiological event result are displayed after the data is processed through the “common portion” to the intermediate portion of 242 and 252).
Atallah, El Kaliouby, and Waxman are analogous art because all three references concern detecting physiological changes in a subject using a trained network. Accordingly, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Atallah's video-based determination of a subject's vital signs to classify the estimated physiological signals through a shared common CNN layer as taught by El Kaliouby, and to provide multiple physiological outputs from the processed input as taught by Waxman. The motivation for doing so would have been that, by automatically identifying the ROIs, expert intervention is not required to perform specialized imaging, and thus such imaging and analysis can be provided at more facilities and for more subjects at a lower cost.
Referring to claim 2, Atallah in view of El Kaliouby and Waxman disclose the method of claim 1, wherein the sequence of frames of image data depict the face and a chest of the subject and wherein the first and second physiological signals comprise heart rate and respiratory rate, respectively. ([0034] of Atallah, “prediction component 114 is configured to determine individual 700's vital signs (e.g., heart rate, respiratory rate, SpO2, etc.) by determining (i) one or more changes in the color of the face of individual 700 during one or more cardiac cycles and (ii) chest wall movements”)
Referring to claim 3, Atallah in view of El Kaliouby and Waxman disclose the method of claim 1, wherein obtaining the video input comprises operating a camera ([0004] of Atallah, camera) of a smartphone ([0016] of Atallah, smart phone) to generate the sequence of frames of image data (it is well understood that video data is a sequence of frames of image data, here [0006] of Atallah, video input or images, to estimate physiological conditions and [0034] of Atallah, video input includes facial area of the individual) and wherein providing the video input to the multi-head neural network model and generating with the model the estimate of the two or more physiological signals are performed by one or more processors of the smartphone. ([0034] of Atallah, “prediction component 114 is configured to determine individual 700's vital signs (e.g., heart rate, respiratory rate, SpO2, etc.) by determining (i) one or more changes in the color of the face of individual 700 during one or more cardiac cycles and (ii) chest wall movements”)
Referring to claim 4, Atallah in view of El Kaliouby and Waxman disclose the method of claim 3, further comprising: transmitting, by the smartphone over a communications network, an indication of the estimate of the two or more physiological signals of the subject. ([0034] of Atallah, “prediction component 114 is configured to determine individual 700's vital signs (e.g., heart rate, respiratory rate, SpO2, etc.) by determining (i) one or more changes in the color of the face of individual 700 during one or more cardiac cycles and (ii) chest wall movements” and [0019] of Atallah, information through network)
Referring to claim 5, Atallah in view of El Kaliouby and Waxman disclose the method of claim 1, wherein obtaining the video input comprises operating a camera of a smartphone to generate the sequence of frames of image data. (it is well understood that video data is a sequence of frames of image data, here [0006] of Atallah, video input or images, to estimate physiological conditions and [0034] of Atallah, video input includes facial area of the individual)
Referring to claim 6, Atallah in view of El Kaliouby and Waxman disclose the method of claim 5, further comprising: transmitting an indication of the video input from the smartphone to a remote computing resource, wherein providing the video input to the multi-head neural network model and generating with the model the estimate of the two or more physiological signals are performed by the remote computing resource. ([0019] of Atallah, neural networks may include multiple layers (e.g., where a signal path traverses from front layers to back layers). In some embodiments, back propagation techniques may be utilized by the neural networks, where forward stimulation is used to reset weights on the “front” neural units and [0034] of Atallah, “prediction component 114 is configured to determine individual 700's vital signs (e.g., heart rate, respiratory rate, SpO2, etc.) by determining (i) one or more changes in the color of the face of individual 700 during one or more cardiac cycles and (ii) chest wall movements” and [0019] of Atallah, information through network)
Referring to claim 7, Atallah in view of El Kaliouby and Waxman disclose the method of claim 1, wherein the method is implemented in one or more computing resources facilitating check-in at a medical office, clinic or hospital. ([0048] of Atallah, hospital)
Referring to claim 8, Atallah in view of El Kaliouby and Waxman disclose the method of claim 7, wherein the one or more computing resources comprises a smartphone. ([0016] of Atallah, smart phone)
Referring to claim 9, Atallah in view of El Kaliouby and Waxman disclose the method of claim 1, wherein the method is executed in a smart display. ([0016] of Atallah, smart phone)
Referring to claim 10, Atallah in view of El Kaliouby and Waxman disclose the method of claim 1, further comprising: using the set of facial video inputs from the multitude of other subjects to train the multi- head neural network model to predict the first and second physiological signals, wherein training the multi-head neural network model to predict the first and second physiological signals comprises updating parameters of the common portion, the first head, and the second head of the multi-head neural network model a plurality of times; ([0028] of Atallah, “Information exchange component 112 is configured to store a neural network (described below) on one or more computer-readable storage media 106 of first client computer system 100. In some embodiments, information exchange component 112 is configured to obtain updated information related to an updated neural network. In some embodiments, the updated neural network is trained on other audio, image, or video data to estimate physiological conditions. In some embodiments, information exchange component 112 is configured to replace the neural network with the updated neural network such that further video data is provided to the updated neural network in lieu of the neural network to obtain further physiological condition information. In one embodiment, the present disclosure comprises means for obtaining updated information related to an updated neural network, with such means for obtaining taking the form of information exchange component 112. In one embodiment, the present disclosure comprises a means for replacing the neural network with the updated neural network, with such means for replacing taking the form of information exchange component 112.”) and
using an additional set of facial video inputs, training a third head of the multi-head neural network model to predict a third physiological signal without altering the parameters of the common portion of the multi-head neural network model. ([0032] of Atallah, “Prediction component 114 is configured to provide, during a time period (e.g., video streaming session, video recording session, etc.), video data of the video stream (e.g., live video stream, recorded video stream, etc.) as input to the prediction model to obtain physiological condition information from the prediction model. In some embodiments, the physiological condition information indicates one or more physiological conditions of individual 700. In some embodiments, prediction component 114 is configured to cause the trained prediction model to output, during the time period (e.g., during the video streaming session, during the video recording session, etc.) prediction model responses to at least some of video data of the video stream (e.g., live video stream, recorded video stream, etc.) by providing the at least some of video data of the video stream obtained during the time period as input to the trained prediction model. In some embodiments, the prediction model responses are based on analysis of one or more colored signals of the video data. In some embodiments, the prediction model responses one or more biometrical signals (e.g., vital signs). In one embodiment, the present disclosure comprises means for providing, during a video streaming session, video data of a live video stream as input to the machine learning model to obtain physiological condition information from the machine learning model, with such means for providing taking the form of the prediction component 114.”)
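The claim-10 training scheme mapped above, i.e., jointly updating the common portion and both heads, then later training a third head on additional data without altering the common portion's parameters, can be sketched as follows. The synthetic data, layer sizes, and plain gradient-descent update are assumptions for illustration only and are not taken from Atallah:

```python
import numpy as np

rng = np.random.default_rng(1)
common_w = rng.normal(size=(4, 4))      # frozen shared ("common") layer
frozen_snapshot = common_w.copy()       # to verify it is never updated
head3_w = np.zeros((4, 1))              # new third head, trained from scratch

x = rng.normal(size=(32, 4))            # stand-in for facial-video features
y = x @ rng.normal(size=(4, 1))         # synthetic third-signal targets

init_loss = float(np.mean(y ** 2))      # loss before training (head3_w = 0)
z = np.maximum(0.0, x @ common_w)       # intermediate output, never updated
for _ in range(500):
    err = z @ head3_w - y
    head3_w -= 0.05 * z.T @ err / len(x)  # gradient step on the head only

final_loss = float(np.mean((z @ head3_w - y) ** 2))
```

Only `head3_w` receives gradient updates; the common portion's parameters remain byte-for-byte identical, matching the “without altering the parameters of the common portion” limitation.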
Referring to claim 12, Atallah in view of El Kaliouby and Waxman disclose the smartphone of claim 11, wherein a representation of the multi-head neural network model is stored in a memory of the smartphone. ([0004] of Atallah)
Referring to claim 13, Atallah in view of El Kaliouby and Waxman disclose the smartphone of claim 11, wherein the first and second physiological signals comprise heart rate and respiratory rate, respectively. ([0034] of Atallah, “prediction component 114 is configured to determine individual 700's vital signs (e.g., heart rate, respiratory rate, SpO2, etc.) by determining (i) one or more changes in the color of the face of individual 700 during one or more cardiac cycles and (ii) chest wall movements” and [0019] of Atallah, information through network)
Referring to claim 14, Atallah in view of El Kaliouby and Waxman disclose the smartphone of claim 11, wherein the controller operations further comprise: transmitting, by the smartphone over a communications network, an indication of the generated estimate of the two or more physiological signals of the subject. ([0034] of Atallah, “prediction component 114 is configured to determine individual 700's vital signs (e.g., heart rate, respiratory rate, SpO2, etc.) by determining (i) one or more changes in the color of the face of individual 700 during one or more cardiac cycles and (ii) chest wall movements” and [0019] of Atallah, information through network via [0016] of Atallah, smart phone)
Referring to claim 15, Atallah in view of El Kaliouby and Waxman disclose the smartphone of claim 11, wherein the video input comprises a sequence of frames of RGB color images. ([0024] of Atallah, computer screen, touch screen, etc.; it is known that computer screens/monitors use RGB color for images)
Referring to claim 17, Atallah in view of El Kaliouby and Waxman disclose the method of claim 16, wherein the video of the face of the subject comprises a sequence of frames of RGB color images obtained from a smartphone. ([0024] of Atallah, computer screen, touch screen, etc.; it is known that computer screens/monitors use RGB color for images)
Referring to claim 18, Atallah in view of El Kaliouby and Waxman disclose the method of claim 17, further comprising the step of reporting the estimate of the physiological parameters to a remotely located medical provider. ([0048] of Atallah)
Referring to claim 19, Atallah in view of El Kaliouby and Waxman disclose the method of claim 17, further comprising the step of reporting the estimate of the physiological parameters to a communications device providing the video. ([0034] of Atallah, “prediction component 114 is configured to determine individual 700's vital signs (e.g., heart rate, respiratory rate, SpO2, etc.) by determining (i) one or more changes in the color of the face of individual 700 during one or more cardiac cycles and (ii) chest wall movements”)
Referring to claim 20, Atallah in view of El Kaliouby and Waxman disclose the method of claim 16, wherein the first physiological parameter and second physiological parameter comprise heart rate and respiratory rate, respectively. ([0034] of Atallah, “prediction component 114 is configured to determine individual 700's vital signs (e.g., heart rate, respiratory rate, SpO2, etc.) by determining (i) one or more changes in the color of the face of individual 700 during one or more cardiac cycles and (ii) chest wall movements”)
The prior art made of record and not relied upon is considered pertinent to Applicant's disclosure:
Levi et al (US 20210401298 A1): Devices and methods are provided for performing remote physiological monitoring of vital signs from one or more subjects. Camera pairs including an intensity camera and a depth camera are used to obtain intensity image data and depth image data that are then processed using one or more ROIs for extracting heart rate and respiratory waveforms from which the heart rate and respiratory rate may be estimated. In other embodiments, multiple ROIs may be used to obtain several heart rate and respiratory rate values which are then fused together. In some embodiments motion compensation may be used prior to generating the heart rate and respiratory waveforms. In other embodiments, multiple camera pairs may be used to obtain intensity and depth data from multiple fields of view which may be used to obtain several heart rate and respiratory rate values which are then fused together.
Applicant is required under 37 C.F.R. § 1.111(c) to consider these references fully when responding to this action.
It is noted that any citation to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way. A reference is relevant for all it contains and may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art. In re Heck, 699 F.2d 1331, 1332-33, 216 U.S.P.Q. 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 U.S.P.Q. 275, 277 (C.C.P.A. 1968)).
In the interests of compact prosecution, Applicant is invited to contact the examiner via electronic media pursuant to USPTO policy outlined in MPEP § 502.03. All electronic communication must be authorized in writing. Applicant may wish to file an Internet Communications Authorization Form PTO/SB/439. Applicant may wish to request an interview using the Interview Practice website: http://www.uspto.gov/patent/laws-and-regulations/interview-practice.
Applicant is reminded Internet e-mail may not be used for communication for matters under 35 U.S.C. § 132 or which otherwise require a signature. A reply to an Office action may NOT be communicated by Applicant to the USPTO via Internet e-mail. If such a reply is submitted by Applicant via Internet e-mail, a paper copy will be placed in the appropriate patent application file with an indication that the reply is NOT ENTERED. See MPEP § 502.03(II).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HAIMEI JIANG whose telephone number is (571)270-1590. The examiner can normally be reached M-F, 9am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mariela D Reyes can be reached on 571-270-1006. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/HAIMEI JIANG/Primary Examiner, Art Unit 2142