Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
This Office Action is in response to an AMENDMENT entered on January 5, 2026, for patent application 18/772,870, filed on July 15, 2024.
Claims 1-20 are pending.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-5, 12, 13, 15, 16 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Verma et al. (Pub. No.: US 2022/0405507) in view of Addison et al. (Pub. No.: US 2021/0327245).
Regarding claim 1, Verma discloses a method of multimodal multimedia processing for at least one wearable device comprising an image sensor (Fig. 3, element 320, para. [0096]) and at least one processor (Fig. 3, element 306, para. [0091]), the method comprising: in response to the image sensor of the at least one wearable device being turned on, obtaining, by the at least one wearable device, a first measurement of an environmental parameter captured by the at least one wearable device in a vicinity of the individual (Fig. 5, paras. [0104]-[0106]); determining, by the at least one processor of the at least one wearable device, whether a triggering event has occurred based on the first measurement, the triggering event associated with generating tagging information based on the first measurement for the image sensor of the at least one wearable device (para. [0059]) using a machine learning model adapted to run on the at least one processor of the at least one wearable device (para. [0048]); and in response to determining that the triggering event has occurred, editing, by the at least one processor of the at least one wearable device, a selected clip from a multimedia stream currently being captured by the image sensor of the at least one wearable device to include the tagging information generated based on the first measurement at a corresponding timestamp (para. [0059]).
Verma does not explicitly disclose obtaining, by the at least one wearable device, a first measurement of a physiological parameter of an individual. However, in analogous art, Addison discloses “a method of artificial intelligence (AI) based video tagging for alarm management includes receiving, using a processor, a video stream, the video stream comprising a sequence of images for at least a portion of a patient, determining, using the processor, a physiological parameter for the patient based on the sequence of images, detecting, using machine learning, presence of a noise object and setting a interaction-flag to a positive value in response to detecting the noise object, comparing a quality level of the sequence of images with a threshold quality level, and modifying an alarm level based on the value of the interaction-flag and comparison of the quality level of the sequence of depth images with the threshold quality level (para. [0002]),” wherein “[s]uch physiological signal may be a volume signal associated with the breathing of a patient (para. [0022]).” Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Verma to allow for the first measurement to be a physiological parameter of an individual. This would have produced predictable and desirable results, in that it would allow for the safety feature of Addison to be incorporated into the system of Verma, such that if a person alone in a home conducting an inspection started having breathing trouble, for example, an alarm could be triggered.
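For illustration only, and not as a characterization of Verma, Addison, or Applicant’s disclosure, the trigger-and-tag flow mapped above can be sketched in a few lines of Python. Every name, threshold, and value below is a hypothetical stand-in; in particular, a fixed respiration-rate threshold substitutes for the on-device machine learning model recited in the claim.

from dataclasses import dataclass, field

@dataclass
class Tag:
    timestamp: float    # corresponding timestamp in the captured stream (seconds)
    measurement: float  # the first measurement (e.g., respiration rate, breaths/min)
    label: str          # tagging information generated from the measurement

@dataclass
class ClipBuffer:
    # Stand-in for the multimedia stream currently being captured.
    tags: list = field(default_factory=list)

    def edit_selected_clip(self, tag: Tag) -> None:
        # Embed the tagging information at its timestamp in the selected clip.
        self.tags.append(tag)

def is_trigger(respiration_rate: float) -> bool:
    # Stand-in for the on-device machine learning model; a fixed threshold
    # substitutes here for a learned decision boundary.
    return respiration_rate > 25.0 or respiration_rate < 8.0

def process_sample(t: float, respiration_rate: float, buffer: ClipBuffer) -> None:
    if is_trigger(respiration_rate):
        buffer.edit_selected_clip(
            Tag(timestamp=t, measurement=respiration_rate, label="abnormal respiration"))

buffer = ClipBuffer()
for t, rate in [(0.0, 14.0), (1.0, 28.5), (2.0, 15.0)]:  # simulated sensor readings
    process_sample(t, rate, buffer)
print(buffer.tags)  # one Tag at t=1.0 for the 28.5 breaths/min reading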
Regarding claim 2, the combination of Verma and Addison discloses the method of claim 1, and further discloses wherein the at least one wearable device comprises the image sensor and a second sensor, wherein the second sensor performs multimodal cooperation with the image sensor (Verma, Fig. 3, elements 318 and 322, paras. [0095] and [0097]).
Regarding claim 3, the combination of Verma and Addison discloses the method of claim 2, and further discloses wherein determining, by the at least one processor of the at least one wearable device, whether the triggering event has occurred based on the first measurement further comprises: obtaining, by the second sensor, a second measurement of at least one of another physiological parameter of the individual or an environmental parameter captured by the second sensor in a vicinity of the individual; and determining, by the at least one processor of the at least one wearable device, whether the triggering event has occurred based on the first measurement and the second measurement (Verma, para. [0064]).
Regarding claim 4, the combination of Verma and Addison discloses the method of claim 1, and further discloses wherein the physiological parameter comprises at least one of: heart rate, heart rate variability, blood pressure, blood glucose level, body temperature, or respiration rate (Addison, para. [0025]. This claim is rejected on the same grounds as claim 1.).
Regarding claim 5, the combination of Verma and Addison discloses the method of claim 3, and further discloses wherein the environmental parameter comprises at least one of: altitude, GPS location, ambient temperature, ambient humidity, ambient noise index, or an environmental pollution index (Verma, para. [0064]).
Regarding claim 12, the combination of Verma and Addison discloses the method of claim 1, and further discloses further comprising: detecting, by the at least one processor of the at least one wearable device, at least one object from the selected clip based on the tagging information (Verma, para. [0065]); and determining a task associated with the at least one object using the machine learning model (Verma, para. [0065]).
Regarding claim 13, the combination of Verma and Addison discloses the method of claim 12, and further discloses wherein a parameter derived from the first measurement is used to determine a type of the at least one object in the task (Verma, paras. [0064]-[0065]).
Regarding claim 15, Verma discloses a wearable device for multimodal multimedia processing, comprising: an image sensor (Fig. 3, element 320, para. [0096]); a non-transitory memory (para. [0075]); and at least one processor (Fig. 3, element 306, para. [0091]) configured to execute instructions stored in the non-transitory memory to: in response to the image sensor of the wearable device being turned on, obtain a first measurement of an environmental parameter captured by the wearable device in a vicinity of the individual (Fig. 5, paras. [0104]-[0106]); determine whether a triggering event has occurred based on the first measurement, the triggering event associated with generating tagging information based on the first measurement for the image sensor (para. [0059]) using a machine learning model adapted to run on the at least one processor (para. [0048]); and in response to determining that the triggering event has occurred, edit a selected clip from a multimedia stream currently being captured by the image sensor to include the tagging information generated based on the first measurement at a corresponding timestamp (para. [0059]).
Verma does not explicitly disclose obtaining, by the at least one wearable device, a first measurement of a physiological parameter of an individual. However, in analogous art, Addison discloses “a method of artificial intelligence (AI) based video tagging for alarm management includes receiving, using a processor, a video stream, the video stream comprising a sequence of images for at least a portion of a patient, determining, using the processor, a physiological parameter for the patient based on the sequence of images, detecting, using machine learning, presence of a noise object and setting a interaction-flag to a positive value in response to detecting the noise object, comparing a quality level of the sequence of images with a threshold quality level, and modifying an alarm level based on the value of the interaction-flag and comparison of the quality level of the sequence of depth images with the threshold quality level (para. [0002]),” wherein “[s]uch physiological signal may be a volume signal associated with the breathing of a patient (para. [0022]).” Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Verma to allow for the first measurement to be a physiological parameter of an individual. This would have produced predictable and desirable results, in that it would allow for the safety feature of Addison to be incorporated into the system of Verma, such that if a person alone in a home conducting an inspection started having breathing trouble, for example, an alarm could be triggered.
Regarding claim 16, the combination of Verma and Addison discloses the wearable device of claim 15, and further discloses further comprising a second sensor, wherein the second sensor performs multimodal cooperation with the image sensor (Verma, Fig. 3, elements 318 and 322, paras. [0095] and [0097]), and the instructions to determine whether a triggering event has occurred based on the first measurement comprise instructions to: obtain, by the second sensor, a second measurement of at least one of another physiological parameter of the individual or an environmental parameter captured by the second sensor in a vicinity of the individual; and determine, by the at least one processor, whether the triggering event has occurred based on the first measurement and the second measurement (Verma, para. [0064]).
Regarding claim 20, the combination of Verma and Addison discloses a non-transitory computer-readable storage medium configured to store computer programs for multimodal multimedia processing using at least one wearable device (Verma, para. [0075]), the computer programs comprising instructions executable by at least one processor to perform the method of claim 1 (See the rejection of claim 1, above).
Claims 6, 14 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Verma et al. (Pub. No.: US 2022/0405507) in view of Addison et al. (Pub. No.: US 2021/0327245) and Yamasaki et al. (Pub. No.: US 2024/0386636).
Regarding claim 6, the combination of Verma and Addison discloses the method of claim 1, wherein the tagging information comprises the corresponding timestamp (Verma, para. [0059]). It could be argued, however, that the combination does not explicitly disclose wherein the tagging information comprises the first measurement and information extracted from the multimedia stream. In analogous art, Yamasaki discloses that a monitoring service can be provided for users wearing smart glasses, wherein “[e]vent analysis data is provided. The event analysis data includes classification of emotions, their factors, and objects of interest. It may further include statistical data. If a specific period of time in the video is specified, data analyzed for that period of time is provided. Suppose that a request is received from a family of a user to watch over that user. In that case, the family of the user is presented with event analysis data that includes the number of steps the user has taken, the number of times the user has been moved, the number of times the user's heart has sunk, etc., which are calculated using the acceleration sensor 122. FIG. 10 shows a display example of event analysis data provided by the monitoring service. By displaying such case analysis data, the user's family can grasp the user's condition even from a remote location, for example (paras. [0066]-[0067]).” Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Verma and Addison to allow for the tagging information to comprise the first measurement and information extracted from the multimedia stream. This would have produced predictable and desirable results, in that it would allow for increased versatility from the wearable device, which could increase user satisfaction.
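As a purely illustrative sketch of the claimed tagging information, one hypothetical data shape carrying the corresponding timestamp, the first measurement, and information extracted from the multimedia stream (in the manner of Yamasaki’s event analysis data) might be the following; all field names and values are stand-ins, not drawn from any cited reference.

from dataclasses import dataclass, asdict

@dataclass
class TaggingInfo:
    timestamp: float          # the corresponding timestamp in the stream (seconds)
    first_measurement: float  # e.g., heart rate at the moment of the trigger
    extracted: dict           # information extracted from the multimedia stream

tag = TaggingInfo(
    timestamp=42.0,
    first_measurement=112.0,  # beats per minute
    extracted={"emotion": "startled", "object_of_interest": "dog", "steps": 1834},
)
print(asdict(tag))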
Regarding claim 14, the combination of Verma and Addison discloses the method of claim 1, but does not disclose further comprising: transmitting, by the at least one wearable device to a recipient monitoring the triggering event for the individual, the selected clip edited to include the tagging information with an alert. However, in analogous art, Yamasaki discloses that a monitoring service can be provided for users wearing smart glasses, wherein “[e]vent analysis data is provided. The event analysis data includes classification of emotions, their factors, and objects of interest. It may further include statistical data. If a specific period of time in the video is specified, data analyzed for that period of time is provided. Suppose that a request is received from a family of a user to watch over that user. In that case, the family of the user is presented with event analysis data that includes the number of steps the user has taken, the number of times the user has been moved, the number of times the user's heart has sunk, etc., which are calculated using the acceleration sensor 122. FIG. 10 shows a display example of event analysis data provided by the monitoring service. By displaying such case analysis data, the user's family can grasp the user's condition even from a remote location, for example (paras. [0066]-[0067]).” Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Verma and Addison to allow for transmitting, by the at least one wearable device to a recipient monitoring the triggering event for the individual, the selected clip edited to include the tagging information with an alert. This would have produced predictable and desirable results, in that it would allow for increased versatility from the wearable device, which could increase user satisfaction.
Regarding claim 17, the combination of Verma and Addison discloses the wearable device of claim 16, wherein the physiological parameter comprises at least one of: heart rate, heart rate variability, blood pressure, blood glucose level, body temperature, or respiration rate (Addison, para. [0025]), and the environmental parameter comprises at least one of: altitude, GPS location, ambient temperature, ambient humidity, ambient noise index, or an environmental pollution index (Verma, para. [0064]), and the tagging information comprises the corresponding timestamp (Verma, para. [0059]). It could be argued, however, that the combination does not explicitly disclose wherein the tagging information comprises the first measurement and information extracted from the multimedia stream. In analogous art, Yamasaki discloses that a monitoring service can be provided for users wearing smart glasses, wherein “[e]vent analysis data is provided. The event analysis data includes classification of emotions, their factors, and objects of interest. It may further include statistical data. If a specific period of time in the video is specified, data analyzed for that period of time is provided. Suppose that a request is received from a family of a user to watch over that user. In that case, the family of the user is presented with event analysis data that includes the number of steps the user has taken, the number of times the user has been moved, the number of times the user's heart has sunk, etc., which are calculated using the acceleration sensor 122. FIG. 10 shows a display example of event analysis data provided by the monitoring service. By displaying such case analysis data, the user's family can grasp the user's condition even from a remote location, for example (paras. [0066]-[0067]).” Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Verma and Addison to allow for the tagging information to comprise the first measurement and information extracted from the multimedia stream. This would have produced predictable and desirable results, in that it would allow for increased versatility from the wearable device, which could increase user satisfaction.
Claims 7, 8, 11, 18 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Verma et al. (Pub. No.: US 2022/0405507) in view of Addison et al. (Pub. No.: US 2021/0327245) and Lee et al. (Pub. No.: US 2024/0362272).
Regarding claim 7, the combination of Verma and Addison discloses the method of claim 1, but does not disclose wherein the selected clip is to be analyzed with other tagged clips to determine a personalized multimedia lifelog entry. However, in analogous art, Lee discloses that “the video analysis system 130 generates clip descriptions as described in conjunction with FIG. 8 and stores the clip descriptions in a database store 990 for the video. In one embodiment, the LLM 970, which may be the same or different model from the LLM 950, is coupled to receive at least the set of prompt embeddings and the clip descriptions and generate an output for a given prompt. In one embodiment, the video analysis system 130 applies the LLM 970 to the dense clip descriptions and the set of prompt embeddings to generate a list of video-level text, including at least one of a title, hashtags, topic, summary, chapters, highlights, dense narrations, and the like of the video (para. [0131]; see also paras. [0133]-[0145]).” Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Verma and Addison to allow for the selected clip to be analyzed with other tagged clips to determine a personalized multimedia lifelog entry. This would have produced predictable and desirable results, in that it would allow for an automated processing of the video to allow for it to be quickly reviewed and understood, which could increase user satisfaction with the system.
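By way of a non-limiting sketch of this kind of aggregation, tagged clips might be summarized into a lifelog entry by prompting a language model, loosely analogous to Lee’s use of clip descriptions to generate video-level text. The function llm_complete below is a placeholder, not an API from any cited reference, and all data values are hypothetical.

def llm_complete(prompt: str) -> str:
    # Placeholder: a deployed system would invoke a large language model here.
    return "(model output)"

def build_lifelog_entry(tagged_clips: list) -> str:
    # Each clip contributes its timestamp, measurement, and description to the prompt.
    lines = [
        "- t={timestamp}s, measurement={measurement}: {description}".format(**clip)
        for clip in tagged_clips
    ]
    prompt = ("Summarize the following tagged clips into one lifelog entry "
              "with a title and highlights:\n" + "\n".join(lines))
    return llm_complete(prompt)

entry = build_lifelog_entry([
    {"timestamp": 42.0, "measurement": 112.0,
     "description": "elevated heart rate while climbing stairs"},
    {"timestamp": 318.0, "measurement": 74.0,
     "description": "resting heart rate during a break"},
])
print(entry)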
Regarding claim 8, the combination of Verma, Addison and Lee discloses the method of claim 7, and further discloses wherein the machine learning model adapted to run on the at least one processor of the at least one wearable device comprises a first large language model customized for the individual and adapted to run on the at least one processor of the at least one wearable device (Lee, Fig. 8, element 850, para. [0112]), the method further comprising: sending the selected clip to a server, wherein the selected clip including the tagging information is analyzed to update the personalized multimedia lifelog entry using a second large language model and an expert knowledge base interacting with the second large language model (Lee, Fig. 9, elements 950 and 970, paras. [0117] and [0130]-[0145]. This claim is rejected on the same grounds as claim 7.).
Regarding claim 11, the combination of Verma, Addison and Lee discloses the method of claim 8, and further discloses further comprising: generating, by the at least one processor of the at least one wearable device, an instruction to provide a recommendation to the individual based on the first measurement and the personalized multimedia lifelog entry for the individual using at least one of the first large language model or the second large language model (Lee, para. [0018]. This claim is rejected on the same grounds as claim 7.).
Regarding claim 18, the combination of Verma and Addison discloses the wearable device of claim 15, but does not disclose wherein the selected clip is to be analyzed with other tagged clips to determine a personalized multimedia lifelog entry, and the machine learning model adapted to run on the at least one processor comprises a first large language model customized for the individual and adapted to run on the at least one processor, and the instructions stored in the non-transitory memory further comprise instructions to: send the selected clip to a server, wherein the selected clip including the tagging information is analyzed to update the personalized multimedia lifelog entry using a second large language model and an expert knowledge base interacting with the second large language model. However, in analogous art, Lee discloses that “the video analysis system 130 generates clip descriptions as described in conjunction with FIG. 8 and stores the clip descriptions in a database store 990 for the video. In one embodiment, the LLM 970, which may be the same or different model from the LLM 950, is coupled to receive at least the set of prompt embeddings and the clip descriptions and generate an output for a given prompt. In one embodiment, the video analysis system 130 applies the LLM 970 to the dense clip descriptions and the set of prompt embeddings to generate a list of video-level text, including at least one of a title, hashtags, topic, summary, chapters, highlights, dense narrations, and the like of the video (para. [0131]; see also paras. [0133]-[0145]),” and further discloses that “FIG. 8 illustrates an example inference process for generating responses to a user query using a video encoder 810 and/or a decoder 830 including an alignment model 840 and a LLM 850, in accordance with an embodiment. In one embodiment, the video encoder 810 is configured substantially similar or identical to the video encoder 210 of FIG. 2. The video encoder 810. The decoder 830 includes an alignment model 840 and a LLM 850. However, it is appreciated that in other embodiments, the video encoder 810 and/or the decoder 830 may include fewer or more components than that illustrated in FIG. 2 (para. [0112]),” wherein “[t]he LLM 850 interprets the set of video-language-aligned embeddings 828 based on the prompt incorporating the user query. In one embodiment, the LLM 850 is coupled to receive the set of prompt embeddings 825 (i.e., which may be generated substantially similar to the process described in FIGS. 2 and 4) and the set of video-language-aligned embeddings 828 to generate an output. For example, the LLM 850 is applied to the combined tensor to generate the output. The LLM 850 leverages extensive knowledge base to interpret the video-language-aligned embeddings 828 based on the user query. The LLM 850 decodes the information into an output, that can be converted into coherent, human-readable text (para. [0117]).” Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Verma and Addison to allow for the selected clip to be analyzed with other tagged clips to determine a personalized multimedia lifelog entry, and the machine learning model adapted to run on the at least one processor comprises a first large language model customized for the individual and adapted to run on the at least one processor, and the instructions stored in the non-transitory memory further comprise instructions to: send the selected clip to a server, wherein the selected clip including the tagging information is analyzed to update the personalized multimedia lifelog entry using a second large language model and an expert knowledge base interacting with the second large language model. This would have produced predictable and desirable results, in that it would allow for an automated processing of the video to allow for it to be quickly reviewed and understood, which could increase user satisfaction with the system.
Regarding claim 19, the combination of Verma, Addison and Lee discloses the wearable device of claim 18, and further discloses wherein the instructions stored in the non-transitory memory further comprise instructions to: generate an instruction to direct the image sensor to switch to perform a task based on the first measurement, wherein the task is generated using at least one of the first large language model or the second large language model; or generate an instruction to provide a recommendation to the individual based on the first measurement and the personalized multimedia lifelog entry for the individual using at least one of the first large language model or the second large language model (Lee, para. [0018]. This claim is rejected on the same grounds as claim 18.).
Claims 9 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Verma et al. (Pub. No.: US 2022/0405507) in view of Addison et al. (Pub. No.: US 2021/0327245), Lee et al. (Pub. No.: US 2024/0362272) and Horvath (Pub. No.: US 2016/0150196).
Regarding claim 9, the combination of Verma, Addison and Lee discloses the method of claim 8, but it could be argued that the combination does not explicitly disclose further comprising: generating, by the at least one processor of the at least one wearable device, an instruction to direct the image sensor of the at least one wearable device to switch to perform a task based on the first measurement, wherein the task is generated using at least one of the first large language model or the second large language model. However, in analogous art, Horvath discloses that a “capture setting adjustment step 740 can flag the camera modules 102 to increase the frame rate of the motion videos 306 when the thresholds are met for the sensors within the sensor block 504 of the tag 104 so that the camera modules 102 will capture the motion video 306 in the high frame rate video format. For example, when the sensors of the sensor block 502 detect an acceleration above the sensor threshold 738, the frame rate of the motion video 306 capture can increase from 24 to 30 frames per second when capturing the standard frame rate to 60 or 120 frames per second when capturing the high frame rate (para. [0139]).” Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Verma, Addison and Lee to allow for generating, by the at least one processor of the at least one wearable device, an instruction to direct the image sensor of the at least one wearable device to switch to perform a task based on the first measurement, wherein the task is generated using at least one of the first large language model or the second large language model. This would have produced predictable and desirable results, in that it would allow for the wearable device to adapt to changing conditions.
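As a purely illustrative sketch of Horvath’s capture setting adjustment, the frame-rate switch might reduce to a threshold comparison; the constants below are hypothetical stand-ins for Horvath’s sensor threshold 738 and frame rates, not values from the reference or the claims.

STANDARD_FPS = 30
HIGH_FPS = 120
SENSOR_THRESHOLD = 2.5  # hypothetical acceleration threshold, in g

def select_frame_rate(sensor_value: float) -> int:
    # Direct the image sensor to the high frame rate when the measurement
    # crosses the threshold; otherwise keep the standard frame rate.
    return HIGH_FPS if sensor_value > SENSOR_THRESHOLD else STANDARD_FPS

assert select_frame_rate(3.1) == HIGH_FPS      # threshold met: switch up
assert select_frame_rate(0.4) == STANDARD_FPS  # threshold not met: standard rate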
Regarding claim 10, the combination as stated above discloses the method of claim 9, and further discloses wherein the task to be performed by the image sensor of the at least one wearable device comprises at least one of: updating a frame rate of the multimedia stream currently being captured, or taking a high-resolution still photo (Horvath, para. [0139]. This claim is rejected on the same grounds as claim 9.).
Response to Arguments
Applicant’s arguments, see pages 7-8, filed January 5, 2026, with respect to the 35 USC § 112 rejections of claims 12 and 13, have been fully considered and are persuasive. The 35 USC § 112 rejections of claims 12 and 13 have been withdrawn.
Applicant’s arguments with respect to the 35 USC § 102 and 103 rejections of all claims have been considered but are moot in view of the new grounds of rejection based on Addison.
Conclusion
Claims 1-20 are rejected.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Joshua D Taylor whose telephone number is (571)270-3755. The examiner can normally be reached Monday - Friday 8 am - 6 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Nasser Goodarzi, can be reached at 571-272-4195. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Joshua D Taylor/Primary Examiner, Art Unit 2426 February 5, 2026