DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on September 30, 2025, and December 5, 2025, are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Response to Amendment
Applicant’s Amendments to the claims and specification filed on September 30, 2025, have been entered and made of record.
Currently pending Claim(s): 1-18
Independent Claim(s): 1, 8, and 15
Amended Claim(s): 1, 8, and 15
Canceled Claim(s): N/A
Response to Arguments
This office action is responsive to Applicant’s Arguments/Remarks Made in an Amendment received on September 30, 2025.
In the amendments filed on September 30, 2025, the Applicant amended the title to be more descriptive; therefore, the objection to the specification has been overcome. Additionally, the Applicant amended independent claims 1, 8, and 15 to include an additional limitation (supported by the specification at [0086]) directed to adding client devices to a video conference call automatically, without input from the client devices.
In view of Applicant’s Arguments/Remarks filed July 10, 2025, with respect to the claims, the Applicant argued (Remarks page 10, paragraph 1) that neither Verma (US 2018/0253954 A1) nor Slotznick (US 11,343,293 B1), whether taken alone or in combination, discloses or makes obvious “joining, by the video conference provider, one or more of the plurality of client devices to an identification session without any input from the client devices, the identification session comprising a video conference hosted by the video conference provider.” In the previous office action (Non-Final Rejection dated June 30, 2025), the Examiner showed that Verma teaches a system for incident response which utilizes many client devices [0013]—such as cameras, microphone arrays, and smart-assistant devices, all connected to a central host server over the internet—and which includes video conferencing capabilities [0031]. The client, the client’s family, and the caregiver are able to join a video conference using the system [0031], and the system alerts the family and the caregiver of incidents [0030], providing alarms which cause the caregiver to act and contact the client and/or the family [Abstract]. Additionally, the Examiner showed that Slotznick teaches well-known functionalities and implementations of web conferencing software, such as increasing the display sizes of certain streams in the conference [Col. 3-4] and break-out room assignments [Col. 10, lines 53-59], and Slotznick explains that video conferencing software is widely used and implemented by those of ordinary skill in the art for any purpose involving remote communication between people [Col. 2, line 65 – Col. 3, line 4].
However, the Examiner agrees that neither Verma nor Slotznick teaches joining client devices to a video conference without any input from the client devices. Thus, a new search was conducted for amended claims 1, 8, and 15, and the Examiner now relies on the art of Rosenberg (US 2019/0356703 A1) in the rejections to teach connecting client devices to a video conference without input from the client devices [Fig. 2].
The Applicant expands upon the argument (Remarks page 10, paragraph 2), stating that Verma teaches an intelligent digital client assistant (IDCA) that monitors a client and alerts a web server when incidents occur, but that the web server does not create identification sessions or join devices to the session automatically. Rather, as discussed above, Verma teaches that the device relevant to an incident uses alerts to inform the caregiver, who then initiates a video conference; the Examiner agrees that Slotznick likewise does not disclose automatically joining client devices to a video conferencing session. Therefore, a new search was conducted, and Rosenberg is now used in combination with Verma and Slotznick to address the new limitation.
Thus, claims 1, 8, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Verma in view of Slotznick and Rosenberg. Accordingly, the rejections applied to dependent claims 2-7, 9-14, and 16-18 are likewise not overcome.
Additionally, the Examiner recommends reviewing the art of Washino (US 5,625,410 A) and Olds (US 2021/0209932 A1), cited in the conclusion section of this office action. These references are not relied upon in the 35 U.S.C. 103 rejections, but were found during the new prior art search and disclose relevant subject matter regarding increasing the prominence of incident-related video streams [Washino Col. 3, lines 20-27] and establishing video connections between alarm systems and responders [Olds 0004-0020].
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-5, 7-11, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Verma (US 2018/0253954 A1), in view of Slotznick (US 11,343,293 B1), and further in view of Rosenberg et al. (US 2019/0356703 A1), hereinafter Rosenberg.
Regarding claim 1, Verma teaches a method comprising:
establishing, by a video conference provider, an incident identification system ([Abstract] “The camera detects client's movements—such as sitting, lying, and fall—and generates appropriate alarms to a central server for a client assistance operator to act.” [0013] “The intelligent digital client assistant of the invention has plurality of built in function with at least a digital camera, and a HDMI connector for video conference.” Additionally, paragraph 0031 discusses the capabilities of the system to video conference between the client, the client assistance operator, and/or the client’s family via a connected phone application.),
determining, by the video conference provider, a plurality of client devices associated with the incident identification system (Fig. 1 shows all these devices in the system, and they are all connected over broadband internet and communicate with a server. Intelligent digital cameras with microphones are installed around the premises and stream video to the server. Smart assistant devices and the intelligent assistant AI monitor and communicate with the client. Additionally, a drone capable of streaming location and video information can monitor the client.),
receiving, by the video conference provider, at least one multimedia stream from each of the one or more client devices ([0027] “Invention uses an intelligent digital camera, iDC (102), mounted in all the locations of the house (100) and outside premises, including backyard. The intelligent camera (102) has smarts to detect clients face from live video during day and night.”);
determining, by the video conference provider, a relevant multimedia stream from the received multimedia streams and transmitting, by the video conference provider, an indication to increase a prominence of the relevant multimedia stream ([0028] “The camera can time stamp client performing all the activities to analyze if it is normal or abnormal to generate an alert for the system to check client's well-being. The camera in the invention is preloaded with these scene based analytic alerts at the time of installation based on location, e.g. a camera in the kitchen will alert how many times client has opened the refrigerator a day to deduce eating habits as well as need to refill supplies.” Alerts can be generated based on the location, and each intelligent camera knows its own location. Therefore, the alerts inform the caregiver and/or server operator which video stream is relevant.).
Although Verma teaches sending alerts to the web server in response to an incident so that the caregiver and/or server operator can initiate a video conference with the client experiencing the incident [Abstract; 0030-0031], manual input from the client and the host of the video conference is required. Thus, Verma fails to teach joining, by the video conference provider, one or more of the plurality of client devices to an identification session without any input from the client devices, the identification session comprising a video conference hosted by the video conference provider.
However, Rosenberg teaches joining, by the video conference provider, one or more of the plurality of client devices to an identification session without any input from the client devices, the identification session comprising a video conference hosted by the video conference provider (Fig. 2 shows a flowchart for automatically joining client devices to a video conference if the client devices have previously been invited to the conference. [0020] “Once the video conference is launched in the conference room by any device in the conference room that is in communication with the collaboration service, all other devices that are in communication with the collaboration service can also be automatically joined to the conference. Alternatively, only devices that are both in communication with the collaboration service, and that are associated with an identity that has been invited, or otherwise has access privileges to be present in the conference can be automatically joined.”).
Verma and Rosenberg are analogous art to the claimed invention, because both utilize video conferencing software for viewing video streamed from a client device. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Verma’s invention by automatically joining the client to a video conference call when the caregiver and/or server operator initiates a video conference in response to a client-related incident. This modification would allow the client to be joined automatically, which can greatly increase convenience for the client ([Rosenberg 0016] “For example if a conference participant were to join their portable device to the conferencing service, conference participant would be open to view conference materials on their portable device. However, asking a conference participant to take the steps required to manipulate their portable device to join a conference provides a barrier to entry so great that many conference participants will not take such steps. As such the present technology can automatically join the conference participants portable device to a conference without any action taken by the user of the portable device.”). Additionally, Verma teaches that the client may be an Alzheimer’s or Dementia patient experiencing an incident alone [0026], so automatically joining him/her to an incident-related call can be vital if the incident prevents him/her from providing input to the client device. For this purpose, Verma teaches two-way voice devices for automatic communication with an elderly client experiencing a fall [0027].
Additionally, although Verma teaches identifying multimedia streams which are relevant to an incident (Fig. 1 shows many connected client devices which can each identify incidents and transmit video streams of the incident) by alerting the web server from the relevant streaming device [0027-0028], Verma does not specifically teach increasing the prominence of a multimedia stream during a video conference.
However, Slotznick teaches transmitting, by the video conference provider, an indication to increase a prominence of the relevant multimedia stream during the identification session (Col. 3-4 discuss the resizing of display size windows as a commonly known feature of known video conferencing systems, such as Zoom and Microsoft Teams.).
Verma and Slotznick are analogous art to the claimed invention, because both teach systems utilizing video conferencing software. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize views such as the speaker view (Col. 3, lines 38-42: “a ‘speaker view’ refers to when the screen shows only (or primarily features) the feed of the person speaking. The videoconferencing system may automatically determine who is speaking and whose feed is shown.”) to increase the prominence of the relevant multimedia stream. This modification would allow the video conferencing host (the caregiver and/or server operator in Verma’s invention) or the system to specify particular multimedia streams within the conference to be the focus of the meeting (Col. 3, lines 44-48: “the videoconferencing software may permit the host to specify several panelists in a panel discussion as speakers (e.g., Zoom meeting software permits the host to ‘spotlight’ up to nine speakers),”), which is a well-known feature of many video conferencing software packages.
Regarding claim 2, Verma teaches wherein the incident identification system is established in response to receiving a request from an administrator of a facility where an incident event has occurred ([0008] “In the present invention, the Web-based Server System with intelligent digital cameras, and intelligent digital client Assistant offers a solution to the total management of care for Alzheimer, Dementia, Autistic and assisted living population.” [Claim 18] “The system of claim 1 can be used to provide managed care services for a single client living alone to clients living in nursing home or assisted community living.” Verma’s invention provides a solution for managed care in nursing homes; the need for automatic managed care is due to incidents with single clients living alone.).
Regarding claim 3, Verma teaches wherein determining, by the video conference provider, the relevant multimedia stream comprises:
identifying, by the video conference provider, one or more incident factors present in a first multimedia stream of the received multimedia streams ([0027] “The intelligent camera, (102) has built in AI algorithms to detect fire, danger scenes like lake, canal etc. It will alert the server if the client is in danger, and near to these situations.”); and
determining, by the video conference provider, the first multimedia stream to be relevant based on the one or more incident factors ([0028] “The camera can time stamp client performing all the activities to analyze if it is normal or abnormal to generate an alert for the system to check client's well-being. The camera in the invention is preloaded with these scene based analytic alerts at the time of installation based on location, e.g. a camera in the kitchen will alert how many times client has opened the refrigerator a day to deduce eating habits as well as need to refill supplies.”).
Regarding claim 4, Verma teaches wherein the first multimedia stream comprises:
a first audio stream ([0008] “And camera has built in beam forming multiple microphones…” The cameras stream video with audio. Additionally, the digital assistant devices listen to the client to recognize his/her speech. [0013] “The digital assistant in the invention has a multiple MEMS microphones array with digital signal processor for beam forming and speaker independent key words recognition such as TV, CNN, Alexa, Siri, Google, etc, and an array of smart speakers for quality sound listening…”), and
wherein identifying, by the video conference provider, the one or more incident factors present in the first multimedia stream comprises:
performing speech recognition on the first audio stream; and identifying, based on the speech recognition, one or more keywords indicating the one or more incident factors are present in the first audio stream ([0008] “And camera has built in beam forming multiple microphones with speaker independent key word recognition—like help, hurt, fall etc, and multiplicity of smart speakers and multiple of this camera installed at various positions in the client's residence to actively monitor client's well-being.”).
Regarding claim 5, Verma teaches wherein the first multimedia stream comprises a first video stream ([0009] “The system comprises of an intelligent digital camera, with night vision, and build in Artificial Intelligence based algorithms for face, age, emotion, movement, detection on the real time video, to monitor client 24/7,” The system can include more than one intelligent camera device, each of which streams video to the server 24/7.), and
wherein identifying, by the video conference provider, the one or more incident factors present in the first multimedia stream comprises:
performing visual recognition on the first video stream; and identifying, based on the visual recognition, visual activity indicating the one or more incident factors are present in the first video stream ([0009] “The system comprises of an intelligent digital camera, with night vision, and build in Artificial Intelligence based algorithms for face, age, emotion, movement, detection on the real time video, to monitor client 24/7, and to compare the video with pre stored images for hazards like fire, fall.” [0014] “The camera of this invention has built in AI based algorithms to monitor clients movements, recognize face and age, identify positions like sitting, lying, and falls which assists in analyzing client's behavior and routine.”).
Regarding claim 7, Verma fails to teach increasing the display size of the relevant multimedia stream. However, Slotznick teaches the indication to increase a display size of the relevant multimedia stream during the identification session relative to a size of other multimedia streams (Col. 3-4 discuss the resizing of display size windows as a commonly known feature of known video conferencing systems, such as Zoom and Microsoft Teams.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize views such as the speaker view (Col. 3, lines 38-42: “a ‘speaker view’ refers to when the screen shows only (or primarily features) the feed of the person speaking. The videoconferencing system may automatically determine who is speaking and whose feed is shown.”) to increase the prominence of the relevant multimedia stream. This modification would allow the video conferencing host or the system to specify particular multimedia streams or panelists within the conference to be the focus of the meeting (Col. 3, lines 44-48: “the videoconferencing software may permit the host to specify several panelists in a panel discussion as speakers (e.g., Zoom meeting software permits the host to ‘spotlight’ up to nine speakers),”).
Regarding claim 8, Verma teaches a system comprising:
a non-transitory computer-readable medium ([0009] “…and to compare the video with pre stored images for hazards like fire, fall.” The system is able to store images for use by the AI algorithms. [0011] “…and a database system to store, retrieve and archive all the data in the system.”);
a communications interface; and a processor communicatively coupled to the non-transitory computer-readable medium and the communications interface, the processor configured to execute processor-executable instructions stored in the non-transitory computer-readable medium (Fig. 1; Claim 1 “…a server which handles secure communication with all the connected devices...”) to:
establish, by a video conference provider ([0013] “The intelligent digital client assistant of the invention has plurality of built in function with at least a digital camera, and a HDMI connector for video conference.” Additionally, paragraph 0031 discusses the capabilities of the system to video conference between the client, the client assistance operator, and/or the client’s family via a connected phone application.),
an incident identification system ([Abstract] “The camera detects client's movements—such as sitting, lying, and fall—and generates appropriate alarms to a central server for a client assistance operator to act.”);
determine, by the video conference provider, a plurality of client devices associated with the incident identification system (Fig. 1 shows all devices in the system connected over broadband internet.);
receive, by the video conference provider, at least one multimedia stream from each of the one or more client devices ([0027] “Invention uses an intelligent digital camera, iDC (102), mounted in all the locations of the house (100) and outside premises, including backyard. The intelligent camera (102) has smarts to detect clients face from live video during day and night.”);
determine, by the video conference provider, a relevant multimedia stream from the received multimedia streams; and transmit, by the video conference provider, an indication to increase the prominence of the relevant multimedia stream ([0028] “The camera can time stamp client performing all the activities to analyze if it is normal or abnormal to generate an alert for the system to check client's well-being. The camera in the invention is preloaded with these scene based analytic alerts at the time of installation based on location, e.g. a camera in the kitchen will alert how many times client has opened the refrigerator a day to deduce eating habits as well as need to refill supplies.” Alerts can be generated based on the location, and each intelligent camera knows its own location. Therefore, the alerts inform the caregiver and/or server operator which video stream is relevant.).
Although Verma teaches sending alerts to the web server in response to an incident so that the caregiver and/or server operator can initiate a video conference with the client experiencing the incident [Abstract; 0030-0031], manual input from the client and the host of the video conference is required. Thus, Verma fails to teach joining, by the video conference provider, one or more of the plurality of client devices to an identification session without any input from the client devices, the identification session comprising a video conference hosted by the video conference provider.
However, Rosenberg teaches join, by the video conference provider, one or more of the plurality of client devices to an identification session without any input from the client devices, the identification session comprising a video conference hosted by the video conference provider (Fig. 2 shows a flowchart for automatically joining client devices to a video conference if the client devices have previously been invited to the conference. [0020] “Once the video conference is launched in the conference room by any device in the conference room that is in communication with the collaboration service, all other devices that are in communication with the collaboration service can also be automatically joined to the conference. Alternatively, only devices that are both in communication with the collaboration service, and that are associated with an identity that has been invited, or otherwise has access privileges to be present in the conference can be automatically joined.”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Verma’s invention by automatically joining the client to a video conference call when the caregiver and/or server operator initiates a video conference in response to a client-related incident. This modification would allow the client to be joined automatically, which can greatly increase convenience for the client ([Rosenberg 0016] “For example if a conference participant were to join their portable device to the conferencing service, conference participant would be open to view conference materials on their portable device. However, asking a conference participant to take the steps required to manipulate their portable device to join a conference provides a barrier to entry so great that many conference participants will not take such steps. As such the present technology can automatically join the conference participants portable device to a conference without any action taken by the user of the portable device.”). Additionally, Verma teaches that the client may be an Alzheimer’s or Dementia patient experiencing an incident alone [0026], so automatically joining him/her to an incident-related call can be vital if the incident prevents him/her from providing input to the client device. For this purpose, Verma teaches two-way voice devices for automatic communication with an elderly client experiencing a fall [0027].
Additionally, although Verma teaches identifying multimedia streams which are relevant to an incident (Fig. 1 shows many connected client devices which can each identify incidents and transmit video streams of the incident) by alerting the web server from the relevant streaming device [0027-0028], Verma does not specifically teach increasing the prominence of a multimedia stream during a video conference.
However, Slotznick teaches transmitting, by the video conference provider, an indication to increase a prominence of the relevant multimedia stream during the identification session (Col. 3-4 discuss the resizing of display size windows as a commonly known feature of known video conferencing systems, such as Zoom and Microsoft Teams.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize views such as the speaker view (Col. 3, lines 38-42: “a ‘speaker view’ refers to when the screen shows only (or primarily features) the feed of the person speaking. The videoconferencing system may automatically determine who is speaking and whose feed is shown.”) to increase the prominence of the relevant multimedia stream. This modification would allow the video conferencing host (the caregiver and/or server operator in Verma’s invention) or the system to specify particular multimedia streams within the conference to be the focus of the meeting (Col. 3, lines 44-48: “the videoconferencing software may permit the host to specify several panelists in a panel discussion as speakers (e.g., Zoom meeting software permits the host to ‘spotlight’ up to nine speakers),”), which is a well-known feature of many video conferencing software packages.
Regarding claim 9, Verma teaches wherein the instructions to determine, by the video conference provider, the relevant multimedia stream further comprise processor-executable instructions stored in the non-transitory computer-readable medium (Fig. 1; [0009] “…and to compare the video with pre stored images for hazards like fire, fall.” The system is able to store images for use by the AI algorithms. [0011] “…and a database system to store, retrieve and archive all the data in the system.” Claim 1 “…a server which handles secure communication with all the connected devices...”) to:
determine, by the video conference provider, a first audio signature and a second audio signature in a first multimedia stream ([0008] “And camera has built in beam forming multiple microphones…” The cameras stream video with audio. Additionally, the digital assistant devices listen to the client to recognize his/her speech.);
identify, by the video conference provider, the first audio signature to be not relevant to the identification session and the second audio signature to be relevant to the identification session ([0008] “And camera has built in beam forming multiple microphones with speaker independent key word recognition—like help, hurt, fall etc, and multiplicity of smart speakers and multiple of this camera installed at various positions in the client's residence to actively monitor client's well-being.” [0013] “The digital assistant in the invention has a multiple MEMS microphones array with digital signal processor for beam forming and speaker independent key words recognition such as TV, CNN, Alexa, Siri, Google, etc, and an array of smart speakers for quality sound listening…”); and
determine, by the video conference provider, the first multimedia stream to be relevant ([0012] “…independent keyword recognition such as fire, help, hurt, to intimate web server with alerts,”).
Regarding claim 10, Verma teaches wherein the instructions to determine, by the video conference provider, the relevant multimedia stream further comprise processor executable instructions stored in the non-transitory computer-readable medium (Fig. 1; [0009] “…and to compare the video with pre stored images for hazards like fire, fall.” The system is able to store images for use by the AI algorithms. [0011] “…and a database system to store, retrieve and archive all the data in the system.” Claim 1 “…a server which handles secure communication with all the connected devices...”) to:
analyze, by the video conference provider, the received multimedia streams for one or more incident factors ([0027] “The intelligent camera, (102) has built in AI algorithms to detect fire, danger scenes like lake, canal etc. It will alert the server if the client is in danger, and near to these situations.”),
wherein the one or more incident factors comprise one or more of:
one or more keywords; one or more audio signatures; an increase in audio activity; an increase in visual activity; or one or more visual signatures ([0008] “And camera has built in beam forming multiple microphones with speaker independent key word recognition—like help, hurt, fall etc, and multiplicity of smart speakers and multiple of this camera installed at various positions in the client's residence to actively monitor client's well-being.”); and
determine, by the video conference provider, a multimedia stream to be relevant based on the presence of one or more incident factors in the multimedia stream ([0028] “The camera can time stamp client performing all the activities to analyze if it is normal or abnormal to generate an alert for the system to check client's well-being. The camera in the invention is preloaded with these scene based analytic alerts at the time of installation based on location, e.g. a camera in the kitchen will alert how many times client has opened the refrigerator a day to deduce eating habits as well as need to refill supplies.” Alerts can be generated based on the location, and each intelligent camera knows its own location. Therefore, the system is aware of which specific multimedia stream is relevant.).
Regarding claim 11, Verma teaches wherein the processor is configured to execute further processor-executable instructions stored in the non-transitory computer-readable medium (Fig. 1; [0009] “…and to compare the video with pre stored images for hazards like fire, fall.” The system is able to store images for use by the AI algorithms. [0011] “…and a database system to store, retrieve and archive all the data in the system.” Claim 1 “…a server which handles secure communication with all the connected devices...”) to:
responsive to determining the multimedia stream to be relevant, generate, by the video conference provider, an incident alert for an incident event based on the one or more incident factors ([0009] “The camera generates appropriate alerts and send these alert to the web server over wi-fi router to act.”).
Regarding claim 15, Verma teaches a non-transitory computer-readable medium comprising processor-executable instructions configured to cause one or more processors (Fig. 1; [0009] “…and to compare the video with pre stored images for hazards like fire, fall.” The system is able to store images for use by the AI algorithms. [0011] “…and a database system to store, retrieve and archive all the data in the system.” Claim 1 “…a server which handles secure communication with all the connected devices...”) to:
establish, by a video conference provider, an incident identification system (Abstract “The camera detects client's movements—such as sitting, lying, and fall—and generates appropriate alarms to a central server for a client assistance operator to act.”);
determine, by the video conference provider, a plurality of client devices associated with the incident identification system (Fig. 1 shows all these devices in the system, and they are all connected over broadband internet and communicate with a server. Intelligent digital cameras with microphones are installed around the premises and stream video to the server. Smart assistant devices and the intelligent assistant AI monitor and communicate with the client. Additionally, a drone capable of streaming location and video information can monitor the client.);
receive, by the video conference provider, at least one multimedia stream from each of the one or more client devices ([0027] “Invention uses an intelligent digital camera, iDC (102), mounted in all the locations of the house (100) and outside premises, including backyard. The intelligent camera (102) has smarts to detect clients face from live video during day and night.”);
determine, by the video conference provider, a relevant multimedia stream from the received multimedia streams; and transmit, by the video conference provider, an indication to increase the prominence of the relevant multimedia stream ([0028] “The camera can time stamp client performing all the activities to analyze if it is normal or abnormal to generate an alert for the system to check client's well-being. The camera in the invention is preloaded with these scene based analytic alerts at the time of installation based on location, e.g. a camera in the kitchen will alert how many times client has opened the refrigerator a day to deduce eating habits as well as need to refill supplies.”).
Although Verma teaches sending alerts to the web server in response to an incident so that the caregiver and/or server operator can initiate a video conference with the client experiencing the incident [Abstract; 0030-0031], manual input from the client and the host of the video conference is required. Thus, Verma fails to teach joining, by the video conference provider, one or more of the plurality of client devices to an identification session without any input from the client devices, the identification session comprising a video conference hosted by the video conference provider.
However, Rosenberg teaches join, by the video conference provider, one or more of the plurality of client devices to an identification session without any input from the client devices, the identification session comprising a video conference hosted by the video conference provider (Fig. 2 shows a flowchart for automatically joining client devices to a video conference if the client devices have previously been invited to the conference. [0020] “Once the video conference is launched in the conference room by any device in the conference room that is in communication with the collaboration service, all other devices that are in communication with the collaboration service can also be automatically joined to the conference. Alternatively, only devices that are both in communication with the collaboration service, and that are associated with an identity that has been invited, or otherwise has access privileges to be present in the conference can be automatically joined.”).
Therefore, it would have been obvious to one of ordinary skill in the art to modify Verma’s invention by automatically joining the client to a video conference call when the caregiver and/or server operator initiates a video conference in response to a client-related incident. This modification would allow the client to be joined automatically, which can greatly increase the convenience for the client ([Rosenberg 0016] “For example if a conference participant were to join their portable device to the conferencing service, conference participant would be open to view conference materials on their portable device. However, asking a conference participant to take the steps required to manipulate their portable device to join a conference provides a barrier to entry so great that many conference participants will not take such steps. As such the present technology can automatically join the conference participants portable device to a conference without any action taken by the user of the portable device.”). Additionally, Verma teaches that the client may be an Alzheimer’s or dementia patient undergoing an incident alone [0026], so automatically joining him/her to an incident-related call can be vital if the incident prohibits him/her from providing input to the client device. For this purpose, Verma teaches two-way voice devices for automatic communication with an elderly client experiencing a fall [0027].
Additionally, although Verma teaches identifying multimedia streams which are relevant to an incident (Fig. 1 shows many connected client devices which can each identify incidents and transmit video streams of the incident) by alerting the web server from the relevant streaming device [0027-0028], Verma does not specifically teach increasing the prominence of a multimedia stream during a video conference.
However, Slotznick teaches transmitting, by the video conference provider, an indication to increase a prominence of the relevant multimedia stream during the identification session (Col. 3-4 discuss the resizing of display size windows as a commonly known feature of known video conferencing systems, such as Zoom and Microsoft Teams.).
Verma and Slotznick are analogous art to the claimed invention because both teach systems utilizing video conferencing software. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize views such as the speaker view (Col. 3, lines 38-42 “a "speaker view" refers to when the screen shows only (or primarily features) the feed of the person speaking. The videoconferencing system may automatically determine who is speaking and whose feed is shown.”) to increase the prominence of the relevant multimedia stream. This modification would allow the video conferencing host (the caregiver and/or server operator in Verma’s invention) or the system to specify specific multimedia within the conference to be the focus of the meeting (Col. 3, lines 44-48 “the videoconferencing software may permit the host to specify several panelists in a panel discussion as speakers (e.g., Zoom meeting software permits the host to "spotlight" up to nine speakers),”), which is a well-known feature of many video conferencing applications.
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Verma in view of Slotznick and Rosenberg, and further in view of Kienzle and Sheinin (US 2008/0309761 A1), hereafter Kienzle.
Regarding claim 6, Verma teaches wherein the first multimedia stream comprises a first audio stream ([0008] “And camera has built in beam forming multiple microphones…” The cameras stream video with audio. Additionally, the digital assistant devices listen to the client to recognize his/her speech. [0013] “The digital assistant in the invention has a multiple MEMS microphones array with digital signal processor for beam forming and speaker independent key words recognition such as TV, CNN, Alexa, Siri, Google, etc, and an array of smart speakers for quality sound listening…”), and
wherein identifying, by the video conference provider, the one or more incident factors present in the first multimedia stream comprises:
performing audio recognition on the first audio stream ([0008] “And camera has built in beam forming multiple microphones with speaker independent key word recognition—like help, hurt, fall etc, and multiplicity of smart speakers and multiple of this camera installed at various positions in the client's residence to actively monitor client's well-being.”).
Although Verma teaches using audio recognition to identify key words, Verma fails to teach using audio recognition for other audio-based activity. However, Kienzle teaches identifying, based on the audio recognition, audio activity indicating the one or more incident factors are present in the first audio stream ([0013] “as well as audio/speech recognition algorithms for speech recognition of a particular vocabulary (“Help”, “Robbery”, etc.). The audio recognition engine may be trained to recognize special audio signals such as gun shots, explosions, etc. as well as high-pitch and other voice signatures indicative of an alarm or emergency situation.”).
Verma and Kienzle are analogous art to the claimed invention because both teach methods for automatically recognizing incidents and generating alerts in response to those incidents. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Verma’s invention by incorporating the audio recognition engine taught by Kienzle. This modification would allow fewer false-positive alarms to be generated (Kienzle [0005] “Conventional video surveillance systems typically do not include any functionality or provision for monitoring audio… Police forces can be dispatched to the monitored location as a consequence of such an alarm. Obviously, fast sudden motion could have been generated by a child running towards his/her parent/friend and in this case the generated alarm becomes a false alarm which will cause an expensive dispatch of the police force.”).
Claims 12-13 and 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over Verma in view of Slotznick and Rosenberg, and further in view of deCharms (US 2016/0192166 A1).
Regarding claim 12, Verma teaches wherein the processor is configured to execute further processor-executable instructions stored in the non-transitory computer-readable medium (Fig. 1; [0009] “…and to compare the video with pre stored images for hazards like fire, fall.” The system is able to store images for use by the AI algorithms. [0011] “…and a database system to store, retrieve and archive all the data in the system.” Claim 1 “…a server which handles secure communication with all the connected devices...”) to:
Verma fails to teach joining an authorized agency device to the identification session. However, deCharms teaches responsive to the incident alert for the incident event, transmit, by the video conference provider, a request to join an authorized agency device to the identification session ([0011] “…while transmitting real-time video from the mobile computing device to the other computing device; receiving a request to connect a mobile computing device with a responder service; Identifying an appropriate responder service based on the location of the mobile computing device; initiating contact with the appropriate responder service on behalf of the mobile computing device;” [0146] “emergency responders or police officers carrying mobile devices running the software provided for here may receive real time alerts when an event has taken place near them, including mapped information of its location, photo, video, audio or other information collected about the event.”); and
join, by the video conference provider, the authorized agency device to the identification session ([0011] “In another implementation, a computer-implemented method includes communicating, by a mobile computing device, with another computing device as part of a two-way video chat session over a network connection; …initiating contact with the appropriate responder service on behalf of the mobile computing device; recording video using one or more cameras that are accessible to a computing device; modifying the video by adding one or more features; and transmitting the modified video with the features to a remote storage system for persistent storage.”).
Verma and deCharms are analogous art because both teach methods and systems for generating alerts in response to incidents to notify authorized agencies. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Verma’s invention by allowing an authorized agency to join a video call with the client in response to an alert. Verma teaches the capability of creating alerts and informing the police when the client leaves the premises (see paragraph 0030), and the system is capable of hosting video calls between the client and others, such as the caregiver or the client’s family members (see paragraph 0029). This modification is the result of combining prior art elements according to known methods to yield predictable results.
Regarding claim 13, Verma teaches wherein the identification session comprises at least one breakout room, and wherein the instructions to transmit, by the video conference provider, the indication to increase the prominence of the relevant multimedia stream during the identification session cause the processor to execute further processor-executable instructions stored in the non-transitory computer-readable medium (Fig. 1; [0009] “…and to compare the video with pre stored images for hazards like fire, fall.” The system is able to store images for use by the AI algorithms. [0011] “…and a database system to store, retrieve and archive all the data in the system.” Claim 1 “…a server which handles secure communication with all the connected devices...”) to:
Verma fails to teach assigning participants to a breakout room. However, Slotznick teaches determine, by the video conference provider, a breakout room assignment for the relevant multimedia stream ([Col. 10, lines 53-59] “In prior art, when a host wishes to initiate breakout rooms, a window, 601, pops up. The window displays options for the host to choose, such as the number of breakout rooms (603) and whether participants are assigned to the rooms automatically (605) or manually (607), or participants may choose the breakout room they join (609).”). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Verma’s invention by placing the relevant multimedia stream into a breakout room, because such a modification is the result of combining prior art elements according to known methods to yield predictable results.
Additionally, Verma fails to teach adding an agency device to the session and allowing the agency device to view the relevant multimedia stream. However, deCharms teaches transmit, by the video conference provider, the relevant multimedia stream to the authorized agency device based on the breakout room assignment ([0011] “In another implementation, a computer-implemented method includes communicating, by a mobile computing device, with another computing device as part of a two-way video chat session over a network connection; …initiating contact with the appropriate responder service on behalf of the mobile computing device; recording video using one or more cameras that are accessible to a computing device; modifying the video by adding one or more features; and transmitting the modified video with the features to a remote storage system for persistent storage.”). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Verma’s invention by allowing an authorized agency to join a video call with the client in response to an alert. Verma teaches the capability of creating alerts and informing the police when the client leaves the premises (see paragraph 0030), and the system is capable of hosting video calls between the client and others, such as the caregiver or the client’s family members (see paragraph 0029).
Regarding claim 16, Verma teaches wherein the processor is configured to execute further processor-executable instructions stored in the non-transitory computer-readable medium (Fig. 1; [0009] “…and to compare the video with pre stored images for hazards like fire, fall.” The system is able to store images for use by the AI algorithms. [0011] “…and a database system to store, retrieve and archive all the data in the system.” Claim 1 “…a server which handles secure communication with all the connected devices...”) to:
Although Verma teaches alerting authorized agencies of incidents and video conferencing with the system’s assistant operator, Verma fails to teach joining authorized agency devices to an identification session. However, deCharms teaches join, by the video conference provider, a first authorized agency device to the identification session ([0011] “In another implementation, a computer-implemented method includes communicating, by a mobile computing device, with another computing device as part of a two-way video chat session over a network connection; …initiating contact with the appropriate responder service on behalf of the mobile computing device; recording video using one or more cameras that are accessible to a computing device; modifying the video by adding one or more features; and transmitting the modified video with the features to a remote storage system for persistent storage.”). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Verma’s invention by allowing an authorized agency to join a video call with the client in response to a