Prosecution Insights
Last updated: April 19, 2026
Application No. 18/840,889

VIDEO RECORDING SYSTEM AND METHOD FOR COMPOSITING

Non-Final OA (§102, §103)
Filed: Aug 22, 2024
Examiner: LI, JAI WEI TOMMY
Art Unit: 2613
Tech Center: 2600 — Communications
Assignee: Deepbrain AI Inc.
OA Round: 1 (Non-Final)
Grant Probability: Favorable
Expected OA Rounds: 1-2
Expected Time to Grant: 2y 9m

Examiner Intelligence

Career Allow Rate: 0% (0 granted / 0 resolved; -62.0% vs TC avg)
Interview Lift: +0.0% (minimal; measured on resolved cases with interview)
Avg Prosecution: 2y 9m (typical timeline)
Total Applications: 9 (career history across all art units; 9 currently pending)

Statute-Specific Performance

§103: 46.2% (+6.2% vs TC avg)
§102: 53.9% (+13.9% vs TC avg)
Comparison baseline: Tech Center average estimate. Based on career data from 0 resolved cases.

Office Action

Rejections under §102 and §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections

Claims 1-11 are objected to because of the following typographical error(s):

Claim 1 introduces “the basis of a user video” where “a gaze area of a user” is expected. Appropriate correction is required.

Claims 2-11 introduce “the video recording system for compositing” where “the video recording system for video compositing” is expected. Appropriate correction is required.

Claim 2 introduces “current basic posture still image” where “basic posture still image” is expected. Appropriate correction is required.

Specification

The disclosure is objected to because of the following label error(s):

Paragraph 39, line(s) 23-25, references a label “one recording apparatus 110”; Figure 1, label 110, refers to a “camera device & monitor”. Appropriate correction is required.

Paragraph 42, line(s) 13, references a label “first monitor 120”; Figure 1, label 120, refers to a “camera device & monitor”. Appropriate correction is required.

Paragraph 44, line(s) 28, and paragraph 52, line(s) 13, reference a label “first monitor 120”; Figure 1, label 120, refers to a “camera device & monitor”. Appropriate correction is required.

Paragraph 47, line(s) 11, references a label “camera device, 110”; Figure 1, label 110, refers to a “camera device & monitor”. Appropriate correction is required.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-5, 10, and 12-16 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Lianides et al. (U.S. Pub. No. 2022/0105389).

Regarding claim 1, Lianides discloses a video recording system for video compositing (Lianides: paragraph 1, line(s) 1 "The present invention relates generally to a system"), comprising: a first monitor (Lianides: FIG. 1, 110; also, paragraph 14, line(s) 15-16 "an audio and video displaying device 110 (e.g., a computer monitor, a laptop or tablet's screen display, or the like)") which is positioned in a gaze area of a user and is configured for outputting a live video of the user (Lianides: FIG. 5; also, FIG. 6; also, paragraph 12, line(s) 1-4 "FIG. 5 is diagram illustrating the user live video feed with the superposed skeleton image displayed in the user interface as shown in FIG. 3 when the user is doing an exercise correctly; and"; also, paragraph 13, line(s) 1-4 "FIG. 6 is a diagram illustrating the user live video feed with the superposed skeleton image displayed in the user interface as shown in FIG. 3 when the user is doing an exercise incorrectly."; also, paragraph 5, line(s) 15-24 "a provider video capturing device, a provider audio capturing device, and a provider interface controlled by a provider frontend application; the user interfacing device and the provider interfacing device are communicating video data and audio data via an interactive communication API over a network wherein the video data includes a user live stream showing the user body image and the user poses captured by the user video capturing device and displayed on the user interface and the provider interface") and a basic posture still image displayed to be superimposed on the live video of the user; a recording apparatus configured for recording the user (Lianides: paragraph 5, line(s) 10-12 "user interfacing device further includes a user video capturing device, a user audio capturing device"); and an image controller configured for transmitting the basic posture still image and the live video of the user to the first monitor on the basis of a user video transmitted from the recording apparatus (Lianides: paragraph 16, line(s) 3-19 "user interface 114 and the provider interface 120 accessed via the user internet browser 116 and the provider internet browser 122 to send, receive, and/or share (collectively hereinafter referred to as “communicate”): (a) at least one video data stream and audio data stream during the ARPT using Twilio video chat API, an equivalence such as WebRTC, Pubnub, TokBox, or the like (hereinafter collectively referred to “interactive communication API” 192) over a network 119; and (b) at least one data stream via Twilio DataTrack API, an equivalence such as a web socket, or a web socket interface such as socket.io or the like (collectively hereinafter referred as “data communication API” 194) over the network 119. The video stream sends and receives video data 126 between the user interface 114 and the provider interface 120, and the frontend applications (118, 124) render the video data 126 for the user and the provider to see."; also, paragraph 12, line(s) 1-4 "FIG. 5 is diagram illustrating the user live video feed with the superposed skeleton image displayed in the user interface as shown in FIG. 3 when the user is doing an exercise correctly; and") and changing the basic posture still image transmitted to the first monitor when an image conversion condition is met while recording the live video of the user (Lianides: paragraph 13, line(s) 1-4 "FIG. 6 is a diagram illustrating the user live video feed with the superposed skeleton image displayed in the user interface as shown in FIG. 3 when the user is doing an exercise incorrectly.").

Regarding claim 2, Lianides discloses the video recording system for compositing of claim 1, wherein the basic posture still image includes a plurality of still images including a standby posture still image, the start still image of the gesture, and the end still image of the gesture (Lianides: paragraph 33, line(s) 1-10 "an image superposing process 414 whereby the user frontend application 118 (and optionally the provider frontend application 124) uses the pose rendering library 198 and the analyzed body motion frame data 157 to create and overlay a superposed skeleton image 206 onto the user body image 208 shown in the user live stream 166. The image superposing process 414 allows the superposed skeleton image 206 to dynamically tracks and moves with the movements of the user's markers 204"), and the image controller transmits one of the standby posture still image, the start still image of the gesture, and the end still image of the gesture, as a next basic posture still image of the current basic posture still image, to the first monitor when the image conversion condition is met (Lianides: paragraph 47, line(s) 1-4 "If the user is in the correct pose for a given number of frames consecutively (e.g., ≥5 frames), the user pose is considered matched to the target pose, and the reference image shifts to the next stage of the exercise").

Regarding claim 3, Lianides discloses the video recording system for compositing of claim 1, further comprising: a user terminal configured for transmitting an image conversion request to the image controller (Lianides: FIG. 5; also, FIG. 6; also, paragraph 12, line(s) 1-4 "FIG. 5 is diagram illustrating the user live video feed with the superposed skeleton image displayed in the user interface as shown in FIG. 3 when the user is doing an exercise correctly; and"; also, paragraph 13, line(s) 1-4 "FIG. 6 is a diagram illustrating the user live video feed with the superposed skeleton image displayed in the user interface as shown in FIG. 3 when the user is doing an exercise incorrectly."; also, paragraph 38, line(s) 5-19 "The system 100 uses the video capturing device 106 and the user frontend application 118 to capture and track each video frame of the reference user doing an exercise correctly during the reference user live stream 166 in order to provide the captured body motion frame data 156. This data 156 is then analyzed by the pose detection model 196 to detect the reference user's markers 204 in real-time thereby creating the analyzed body motion frame data 157. This analyzed body motion frame data 157 includes an (X, Y) coordinate for each marker 204 and a confidence score 210 for each coordinate. The analyzed body motion frame data 157 is sent back to the user frontend application 118 where it is processed by the pose rendering library 198 to create the target poses shown in the reference skeleton image 202."), wherein the image controller is configured to determine that the image conversion condition is met when a still image of the live video of the user matches the basic posture still image, or an image conversion request is received from the user terminal, or a preset time elapses after transmitting the basic posture still image to the first monitor while recording the live video of the user (Lianides: paragraph 38, line(s) 5-19 "The system 100 uses the video capturing device 106 and the user frontend application 118 to capture and track each video frame of the reference user doing an exercise correctly during the reference user live stream 166 in order to provide the captured body motion frame data 156. This data 156 is then analyzed by the pose detection model 196 to detect the reference user's markers 204 in real-time thereby creating the analyzed body motion frame data 157. This analyzed body motion frame data 157 includes an (X, Y) coordinate for each marker 204 and a confidence score 210 for each coordinate. The analyzed body motion frame data 157 is sent back to the user frontend application 118 where it is processed by the pose rendering library 198 to create the target poses shown in the reference skeleton image 202.").
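Editor's note: as mapped above, the claim 3 "image conversion condition" is a three-way disjunction: the live pose matches the displayed still, the user terminal requests a change, or a preset time expires. A minimal sketch of that gate in Python, assuming flattened keypoint vectors and illustrative thresholds (the 0.07 cutoff borrows Lianides' "strict" cosine level from paragraph 46; nothing below is code from either party):

```python
import math
import time

MATCH_THRESHOLD = 0.07   # cosine-distance cutoff; mirrors Lianides' "strict" level (assumption)
TIMEOUT_SECONDS = 10.0   # the claimed "preset time"; this value is invented for illustration

def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity between two flattened keypoint vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.hypot(*a), math.hypot(*b)
    return 1.0 if na == 0 or nb == 0 else 1.0 - dot / (na * nb)

def conversion_condition_met(live_pose: list[float],
                             still_pose: list[float],
                             user_requested: bool,
                             sent_at: float,
                             now: float | None = None) -> bool:
    """True when any one of the three claimed disjuncts holds."""
    now = time.time() if now is None else now
    return (
        cosine_distance(live_pose, still_pose) < MATCH_THRESHOLD  # still image matched
        or user_requested                                         # request from user terminal
        or (now - sent_at) >= TIMEOUT_SECONDS                     # preset time elapsed
    )
```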
Regarding claim 4, Lianides discloses the video recording system for compositing of claim 1, wherein the image controller is configured to output a compositing criterion meeting notification when a still image of the live video of the user while recording the live video of the user meets a preset compositing criterion compared to the basic posture still image (Lianides: paragraph 42, line(s) 1-9 "In order to ensure that the user's body shape and size are properly considered and evaluated during the AR process 400, the image superposing process 414 uses the above-described normalization process to compare the bounding box around the user pose's markers 204 to the bounding box around the target pose's markers 204. This comparison results in a factor that the system 100 must scale the user pose's markers 204 in order to match the target pose's markers 204.").

Regarding claim 5, Lianides discloses the video recording system for compositing of claim 1, wherein the image controller is configured to output guidance on the compositing criterion including at least one of a posture change portion for the still image of the live video of the user to match the basic posture still image, a description of the posture change portion, and an identification number of the posture change portion through the first monitor when a still image of the live video of the user while recording the live video of the user does not meet a preset compositing criterion compared to the basic posture still image (Lianides: paragraph 43, line(s) 1-9 "During the motion tracking process 412 and the image superposing process 414, the AR process 400 further provides for a movement matching process 416 whereby the user frontend application 118 (and optionally the provider frontend application 124) also uses the pose matching algorithm to determine whether the corresponding target poses shown in the reference skeleton image 202 have been matched by the user poses shown in the superposed skeleton image 206"; also, paragraph 46, line(s) 1-22 "During the motion tracking process 412, the image superposing process 414, and the movement matching process 416, the AR process 400 further provides for a movement alerting process 418 whereby if the similarity score for a predetermined number of frames is under or within the target threshold, then the system 400 renders that certain markers 204 for specific body part(s) of the superposed skeleton image 206 a particular color 212 (e.g., green and shown in FIG. 5 as a solid line). Otherwise, the system 400 renders such markers 204 of the superposed skeleton image 206 a different color 214 (e.g., red and shown in FIG. 6 as a dashed line). For example, in one exemplary embodiment, in order to get color 212, the user pose shown in the superposed skeleton image 206 must meet the match target threshold of having the similarity score to be less than 0.07 and having such “correct” pose held for at least 5 frames. The system 100 further optionally allows the provider to adjust the requirement/strictness of the target threshold by setting the cosine similarity to be a specific level (e.g., <0.07 for strict, <0.08 for medium, <0.09 for lenient) and/or the duration of the “correct” pose (e.g., >5 frames, >10 frames, >30 frames, etc.)"; also, paragraph 48, line(s) 1-5 "During the movement alerting process 418, the system 100 using the user frontend application 118 optionally counts and displays on the interfaces (114, 120) the number of repetitions of the exercise completed by the user matching the target pose").

Regarding claim 10, Lianides discloses the video recording system for compositing of claim 1, further comprising: a recorder configured for receiving the user video transmitted from the recording apparatus (Lianides: paragraph 5, line(s) 10-12 "user interfacing device further includes a user video capturing device, a user audio capturing device"; also, paragraph 10, line(s) 1-4 "FIG. 3 is a diagram of an exemplary embodiment of the user interface of the augmented-reality system during a guided augmented reality physical therapy in accordance with embodiments of the present invention;"; also, paragraph 11, line(s) 1-4 "FIG. 4 is a diagram of an exemplary embodiment of the provider interface of the augmented-reality system during a guided augmented reality physical therapy in accordance with embodiments of the present invention;") and transmitting the user video to the image controller; and a control panel configured for receiving control information input by a user and transmitting the control information to the image controller (Lianides: paragraph 16, line(s) 15-26 "The video stream sends and receives video data 126 between the user interface 114 and the provider interface 120, and the frontend applications (118, 124) render the video data 126 for the user and the provider to see. The audio stream sends and receives audio data 128 between the user interface 114 and the provider interface 120, and the frontend applications (118, 124) render the audio data 128 for the user and the provider to hear. The data stream sends and receives additional data 130 between the user interface 114 and the provider interface 120 via their respective internet browsers (116, 122) and the frontend applications (118, 124)").

Regarding claim 12, Lianides discloses a video recording method for video compositing (Lianides: paragraph 1, line(s) 1-2 "The present invention relates generally to a system and a method"), comprising: recording a user video using a recording apparatus (Lianides: paragraph 5, line(s) 10-12 "user interfacing device further includes a user video capturing device, a user audio capturing device"); outputting a live video of a user (Lianides: FIG. 5; also, FIG. 6; also, paragraph 12, line(s) 1-4 "FIG. 5 is diagram illustrating the user live video feed with the superposed skeleton image displayed in the user interface as shown in FIG. 3 when the user is doing an exercise correctly; and"; also, paragraph 13, line(s) 1-4 "FIG. 6 is a diagram illustrating the user live video feed with the superposed skeleton image displayed in the user interface as shown in FIG. 3 when the user is doing an exercise incorrectly."; also, paragraph 5, line(s) 15-24 "a provider video capturing device, a provider audio capturing device, and a provider interface controlled by a provider frontend application; the user interfacing device and the provider interfacing device are communicating video data and audio data via an interactive communication API over a network wherein the video data includes a user live stream showing the user body image and the user poses captured by the user video capturing device and displayed on the user interface and the provider interface") and a basic posture still image displayed to be superimposed (Lianides: paragraph 5, line(s) 38-40 "creating a superposed skeleton image onto the user body image displayed on the user live stream using the user frontend application") on the live video of the user through the first monitor (Lianides: FIG. 1, 110; also, paragraph 14, line(s) 15-16 "an audio and video displaying device 110 (e.g., a computer monitor, a laptop or tablet's screen display, or the like)") on the basis of a user video transmitted from the recording apparatus; checking whether an image conversion condition is met while recording the live video of the user; and changing the basic posture still image output through the first monitor when it is checked that the image conversion condition is met (Lianides: paragraph 13, line(s) 1-4 "FIG. 6 is a diagram illustrating the user live video feed with the superposed skeleton image displayed in the user interface as shown in FIG. 3 when the user is doing an exercise incorrectly.").

Regarding claim 13, Lianides discloses the video recording method for compositing of claim 12, wherein the basic posture still image includes a plurality of still images including a standby posture still image, a start still image of a gesture, and an end still image of a gesture (Lianides: paragraph 33, line(s) 1-10 "an image superposing process 414 whereby the user frontend application 118 (and optionally the provider frontend application 124) uses the pose rendering library 198 and the analyzed body motion frame data 157 to create and overlay a superposed skeleton image 206 onto the user body image 208 shown in the user live stream 166. The image superposing process 414 allows the superposed skeleton image 206 to dynamically tracks and moves with the movements of the user's markers 204"), and in the changing of the basic posture still image, one of the standby posture still image, the start still image of the gesture, and the end still image of the gesture, as a next basic posture still image of the current basic posture still image, is transmitted to the first monitor (Lianides: paragraph 47, line(s) 1-4 "If the user is in the correct pose for a given number of frames consecutively (e.g., ≥5 frames), the user pose is considered matched to the target pose, and the reference image shifts to the next stage of the exercise").
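Editor's note: the Lianides passages cited against claims 2, 5, and 13 (paragraphs 42 and 46-47) describe a dwell-gated matcher: scale the user pose to the target pose's bounding box, score each frame against a strictness threshold, and advance the reference still only after the pose stays "correct" for at least five consecutive frames. A sketch under those assumptions; the data layout and all names are hypothetical:

```python
from dataclasses import dataclass

# Strictness levels quoted in Lianides paragraph 46 (cosine-similarity thresholds).
STRICT, MEDIUM, LENIENT = 0.07, 0.08, 0.09

def bbox_scale(user_pts: list[tuple[float, float]],
               target_pts: list[tuple[float, float]]) -> float:
    """Factor to scale the user pose so its bounding box matches the target's (para. 42)."""
    def span(pts):
        xs = [x for x, _ in pts]
        ys = [y for _, y in pts]
        return max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    return span(target_pts) / span(user_pts)

@dataclass
class PoseGate:
    threshold: float = STRICT  # provider-adjustable strictness (para. 46)
    hold_frames: int = 5       # "correct" pose must persist for >= 5 frames (para. 47)
    stage: int = 0             # index of the current reference still image
    _streak: int = 0

    def update(self, similarity_score: float) -> bool:
        """Feed one frame's score; True when the reference image advances (para. 47)."""
        self._streak = self._streak + 1 if similarity_score < self.threshold else 0
        if self._streak >= self.hold_frames:
            self.stage += 1    # shift to the next stage of the exercise
            self._streak = 0
            return True
        return False
```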
Regarding claim 14, Lianides discloses the video recording method for compositing of claim 12, wherein, in the checking of whether the image conversion condition is met, it is determined that the image conversion condition is met when a still image of the live video of the user matches the basic posture still image, or an image conversion request is received from the user terminal, or a preset time elapses after transmitting the basic posture still image to the first monitor while recording the live video of the user (Lianides: paragraph 38, line(s) 5-19 "The system 100 uses the video capturing device 106 and the user frontend application 118 to capture and track each video frame of the reference user doing an exercise correctly during the reference user live stream 166 in order to provide the captured body motion frame data 156. This data 156 is then analyzed by the pose detection model 196 to detect the reference user's markers 204 in real-time thereby creating the analyzed body motion frame data 157. This analyzed body motion frame data 157 includes an (X, Y) coordinate for each marker 204 and a confidence score 210 for each coordinate. The analyzed body motion frame data 157 is sent back to the user frontend application 118 where it is processed by the pose rendering library 198 to create the target poses shown in the reference skeleton image 202.").

Regarding claim 15, Lianides discloses the video recording method for compositing of claim 12, wherein, in the checking of whether the image conversion condition is met, it is checked whether a still image of the live video of the user while recording the live video of the user meets a preset compositing criterion compared to the basic posture still image (Lianides: paragraph 42, line(s) 1-9 "In order to ensure that the user's body shape and size are properly considered and evaluated during the AR process 400, the image superposing process 414 uses the above-described normalization process to compare the bounding box around the user pose's markers 204 to the bounding box around the target pose's markers 204. This comparison results in a factor that the system 100 must scale the user pose's markers 204 in order to match the target pose's markers 204."), and when it is checked that the preset compositing criterion is met, a compositing criterion meeting notification is output before the changing of the basic posture still image (Lianides: paragraph 47, line(s) 1-10 "If the user is in the correct pose for a given number of frames consecutively (e.g., ≥5 frames), the user pose is considered matched to the target pose, and the reference image shifts to the next stage of the exercise. If it is at the end position of a particular movement, the user is said to have completed a repetition, and the repetition count 174 discussed below is updated. If the user is not in the correct pose for the whole body, then need to show the user which portion of his body (i.e., specific individual body part) is in an incorrect pose/position."; also, FIG. 6; also, paragraph 13, line(s) 1-4 "FIG. 6 is a diagram illustrating the user live video feed with the superposed skeleton image displayed in the user interface as shown in FIG. 3 when the user is doing an exercise incorrectly."; also, FIG. 5; also, paragraph 12, line(s) 1-4 "FIG. 5 is diagram illustrating the user live video feed with the superposed skeleton image displayed in the user interface as shown in FIG. 3 when the user is doing an exercise correctly; and").

Regarding claim 16, Lianides discloses the video recording method for compositing of claim 12, wherein, in the checking of whether the image conversion condition is met, it is checked whether a still image of the live video of the user while recording the live video of the user meets a preset compositing criterion compared to the basic posture still image, and when it is checked that the preset compositing criterion is not met, in the outputting of the basic posture still image through the first monitor, guidance on the compositing criterion including at least one of a posture change portion for the still image of the live video of the user to match the basic posture still image, a description of the posture change portion, and an identification number of the posture change portion is output through the first monitor (Lianides: paragraph 43, line(s) 1-9 "During the motion tracking process 412 and the image superposing process 414, the AR process 400 further provides for a movement matching process 416 whereby the user frontend application 118 (and optionally the provider frontend application 124) also uses the pose matching algorithm to determine whether the corresponding target poses shown in the reference skeleton image 202 have been matched by the user poses shown in the superposed skeleton image 206"; also, paragraph 46, line(s) 1-22 "During the motion tracking process 412, the image superposing process 414, and the movement matching process 416, the AR process 400 further provides for a movement alerting process 418 whereby if the similarity score for a predetermined number of frames is under or within the target threshold, then the system 400 renders that certain markers 204 for specific body part(s) of the superposed skeleton image 206 a particular color 212 (e.g., green and shown in FIG. 5 as a solid line). Otherwise, the system 400 renders such markers 204 of the superposed skeleton image 206 a different color 214 (e.g., red and shown in FIG. 6 as a dashed line). For example, in one exemplary embodiment, in order to get color 212, the user pose shown in the superposed skeleton image 206 must meet the match target threshold of having the similarity score to be less than 0.07 and having such “correct” pose held for at least 5 frames. The system 100 further optionally allows the provider to adjust the requirement/strictness of the target threshold by setting the cosine similarity to be a specific level (e.g., <0.07 for strict, <0.08 for medium, <0.09 for lenient) and/or the duration of the “correct” pose (e.g., >5 frames, >10 frames, >30 frames, etc.)"; also, paragraph 48, line(s) 1-5 "During the movement alerting process 418, the system 100 using the user frontend application 118 optionally counts and displays on the interfaces (114, 120) the number of repetitions of the exercise completed by the user matching the target pose").

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 6-9, 11, and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Lianides et al. (U.S. Pub. No. 2022/0105389) in view of Olshansky (U.S. Pub. No. 2022/0019806).

Regarding claim 6, Lianides discloses the video recording system for compositing of claim 1, wherein the recording apparatus (Lianides: paragraph 5, line(s) 15-24 "a provider video capturing device, a provider audio capturing device, and a provider interface controlled by a provider frontend application; the user interfacing device and the provider interfacing device are communicating video data and audio data via an interactive communication API over a network wherein the video data includes a user live stream showing the user body image and the user poses captured by the user video capturing device and displayed on the user interface and the provider interface") includes a plurality of recording apparatuses located in different areas to record videos from different angles for the user. Lianides does not disclose a plurality of recording apparatuses located in different areas to record videos from different angles for the user. However, in a similar field of endeavor, Olshansky discloses a plurality of recording apparatuses located in different areas to record videos from different angles for the user (Olshansky: paragraph 44, line(s) 2-4 "a first camera 122, a second camera 124, and a third camera 126. Each of the cameras 120 is capable of recording video of the individual 110 from different angles."). It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified Lianides' video recording system for compositing of claim 1 and the recording apparatus with the features of Olshansky's invention, where a plurality of recording apparatuses are located in different areas to record videos from different angles for the user. As demonstrated by Olshansky, one could add support for viewing individuals from different camera points of view to further improve the video recording system.

Regarding claim 7, Lianides as modified by Olshansky discloses the video recording system for compositing of claim 6, wherein the image controller is configured to manage a plurality of user videos transmitted from the plurality of recording apparatuses by dividing the plurality of user videos into one of a gesture unit and a preset length unit and matching each of the plurality of user videos with at least one of video identification information and the recording apparatus's identification information (Olshansky: paragraph 68, line(s) 1-12 "The video computer 430 is similarly responsible for the control and receipt of video data from cameras 120. The video computer 430 is preferably specially configured to handle video processing in an efficient manner. In one embodiment, the video computer 430 contains a high-powered graphics processing unit (or GPU) that will speed up its handling of the multiple video feeds coming from the multiple cameras 120. The GPU can be responsible, for instance, for all video encoding and decoding required by the booth 400. The video information received from the cameras 120 are stored on the video computer as video data 432."; also, paragraph 103, line(s) 9-13 "Similarly, camera one data 1010 is divided into four video segments 1012, 1014, 1016, 1018 in FIG. 10, while camera two data 1020 is divided into segments 1022, 1024, 1026, 1028 and camera three data 1030 is divided into segments 1032, 1034, 1036, and 1038"; also, paragraph 104, line(s) 1-18 "Although determining when certain instructions 412 are provided to the individual 110 is one of the best ways to divide up the time segment data 610, it is not the only way. The incorporated Ser. No. 16/366,746 patent application, for example, describe other techniques for defining time segments 810. This application described these techniques as searching for “switch-initiating events” that can be detected in the content of data 419, 422, 432, 434 recorded at the booth 400. Furthermore, behavioral data analysis 600 created by analyzing this recorded data can also be helpful. For example, facial recognition data, gesture recognition data, posture recognition data, and speech-to-text can be monitored to look for switch-initiating events. For example, if the candidate turns away from one of the video cameras to face a different video camera, the system can detect that motion and note it as a switch-initiating event. Hand gestures or changes in posture can also be used to trigger the system to cut from one camera angle to a different camera angle.").

Regarding claim 8, Lianides as modified by Olshansky discloses the video recording system for compositing of claim 7, wherein the plurality of user videos includes videos from different angles, and the image controller is configured to group the plurality of user videos including the videos from different angles into videos of the same angle, or configured to group all of a plurality of user videos recorded at the same time, and manage grouped videos separately (Olshansky: paragraph 103, line(s) 9-13 "Similarly, camera one data 1010 is divided into four video segments 1012, 1014, 1016, 1018 in FIG. 10, while camera two data 1020 is divided into segments 1022, 1024, 1026, 1028 and camera three data 1030 is divided into segments 1032, 1034, 1036, and 1038"; also, paragraph 104, line(s) 1-18 "Although determining when certain instructions 412 are provided to the individual 110 is one of the best ways to divide up the time segment data 610, it is not the only way. The incorporated Ser. No. 16/366,746 patent application, for example, describe other techniques for defining time segments 810. This application described these techniques as searching for “switch-initiating events” that can be detected in the content of data 419, 422, 432, 434 recorded at the booth 400. Furthermore, behavioral data analysis 600 created by analyzing this recorded data can also be helpful. For example, facial recognition data, gesture recognition data, posture recognition data, and speech-to-text can be monitored to look for switch-initiating events. For example, if the candidate turns away from one of the video cameras to face a different video camera, the system can detect that motion and note it as a switch-initiating event. Hand gestures or changes in posture can also be used to trigger the system to cut from one camera angle to a different camera angle."; also, paragraph 105, line(s) 1-31 "While the Ser. No. 16/366,746 patent application primarily defines switch-initiating events in the context of switching cameras, these events are equally useful for dividing the time segment data 610 into different time segments 810. In one embodiment, the changes in instruction data 412 provided to the individual 110 are first used to create the separate time segments 810. Switching events detected within a single time segment 810 can then be used to split that time segment 810 into two different time segments 810. For example, the Ser. No. 16/366,746 application explains that the identification of low-noise event can be considered a switch-initiating events. If an average decibel level over a particular range of time (such as 4 seconds) is below a threshold level (such as 30 decibels), this will be considered a low noise audio segment that can be used to subdivide time segments 810. In the context of an interview, time segment 814 can originally be defined to cover the entire answer the individual 110 provided to a first instruction 412. If a low-noise event is identified within that answer, time segment 814 is split into two different time segments—one before the low-noise event and one after the low-noise event. Furthermore, this incorporated patent application describes the ability to optionally remove extended low volume segments or pauses from an audiovisual presentation 714 altogether. If time segment 814 were divided into two using this technique, the first of these new time segments would be the time before the beginning of the low noise event, and the second time segment would be the time after the low-volume segment or pause is completed, thereby removing the low volume segment from any of the defined time segments 810"; also, paragraph 107, line(s) 1-16 "For each time segment 810, the controller computer 410 can select the preferred audio and video data source. For instance, if time segment two 814 is desired in the presentation 714, then the controller computer 410 can select between the two microphones 130—namely between audio segment two (M1) 914 and audio segment two (M2) 924—for the audio. The controller computer 410 would also select between the three cameras—between video segment two (C1) 1014, video segment two (C2) 1024, and video segment two (C3) 1034. If the controller computer 410 determines that the best presentation of time segment two 814 is to use audio segment two (M1) 914 and video segment two (C3) 1034, then it will record that determination and use that audio segment 914 and that video segment 1034 whenever time segment two 814 is desired as part of a presentation 714."; also, paragraph 108, line(s) 3-8 "While this may have originally been only a single time segment 810, the process of subdividing the time segments 810 (such as by searching for switching events as described above) may have split this into multiple segments 810. These multiple segments can be grouped together by the controller computer 410").

Regarding claim 9, Lianides discloses the video recording system for compositing of claim 1, further comprising: a second monitor located outside a recording booth and configured for receiving and outputting the same information as the output information of the first monitor transmitted from the image controller (Lianides: FIG. 5; also, FIG. 6; also, paragraph 12, line(s) 1-4 "FIG. 5 is diagram illustrating the user live video feed with the superposed skeleton image displayed in the user interface as shown in FIG. 3 when the user is doing an exercise correctly; and"; also, paragraph 13, line(s) 1-4 "FIG. 6 is a diagram illustrating the user live video feed with the superposed skeleton image displayed in the user interface as shown in FIG. 3 when the user is doing an exercise incorrectly."; also, paragraph 5, line(s) 15-24 "a provider video capturing device, a provider audio capturing device, and a provider interface controlled by a provider frontend application; the user interfacing device and the provider interfacing device are communicating video data and audio data via an interactive communication API over a network wherein the video data includes a user live stream showing the user body image and the user poses captured by the user video capturing device and displayed on the user interface and the provider interface"). Lianides does not disclose the use of a second monitor located outside of a recording booth. However, in a similar field of endeavor, Olshansky discloses a second monitor located outside a recording booth (Olshansky: paragraph 42, line(s) 8-9 "possible for the computers 10 to be located outside the booth"). It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified Lianides' video recording system for compositing of claim 1, configured for receiving and outputting the same information as the output information of the first monitor transmitted from the image controller, with the features of Olshansky's invention, where a second monitor can be located outside a recording booth. As demonstrated by Olshansky, one could add a series of monitors outside of a recording booth and mirror the screen of the first monitor so that the second monitor shares the same information as the first monitor.

Regarding claim 11, Lianides discloses the video recording system for compositing of claim 1, wherein the first monitor (Lianides: FIG. 1, 110; also, paragraph 14, line(s) 15-16 "an audio and video displaying device 110 (e.g., a computer monitor, a laptop or tablet's screen display, or the like)") is located inside the recording booth. Lianides does not disclose the first monitor being located inside the recording booth. However, in a similar field of endeavor, Olshansky discloses the location of a monitor inside the recording booth (Olshansky: Figure 1; also, paragraph 49, line(s) 1-4 "The kiosk 100 also includes one or more user interfaces 150. User interface 150 is shown as a display screen that can display content and images to the individual 110"). It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified Lianides' video recording system for compositing of claim 1 and the first monitor with the features of Olshansky's invention, where the first monitor is located inside the recording booth. As demonstrated by Olshansky, one could add support for a booth containing a display screen.
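Editor's note: the Olshansky segmentation behind the claim 7-8 mapping splits time segments at "switch-initiating events"; the concrete example in paragraph 105 is a low-noise event, where the average level over a roughly 4-second window falls below about 30 decibels. A minimal sketch of that splitting rule; the per-frame decibel input and every name below are assumptions, not Olshansky's code:

```python
NOISE_FLOOR_DB = 30.0  # example threshold level from Olshansky paragraph 105
WINDOW_SECONDS = 4.0   # example averaging window from paragraph 105

def split_on_low_noise(levels_db: list[float], frame_rate: float,
                       start: float, end: float) -> list[tuple[float, float]]:
    """Split [start, end) wherever a window's average level drops below the floor."""
    win = max(1, int(WINDOW_SECONDS * frame_rate))
    segments: list[tuple[float, float]] = []
    seg_start, i = start, 0
    while i + win <= len(levels_db):
        if sum(levels_db[i:i + win]) / win < NOISE_FLOOR_DB:  # low-noise event found
            t0 = start + i / frame_rate                # quiet span begins here
            if t0 > seg_start:
                segments.append((seg_start, t0))       # keep segment before the pause
            seg_start = start + (i + win) / frame_rate # resume after the pause
            i += win                                   # skip past the quiet window
        else:
            i += 1
    if seg_start < end:
        segments.append((seg_start, end))              # trailing segment
    return segments
```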
Regarding claim 17, Lianides discloses the video recording method for compositing of claim 12, wherein the recording apparatus (Lianides: paragraph 5, line(s) 15-24 "a provider video capturing device, a provider audio capturing device, and a provider interface controlled by a provider frontend application; the user interfacing device and the provider interfacing device are communicating video data and audio data via an interactive communication API over a network wherein the video data includes a user live stream showing the user body image and the user poses captured by the user video capturing device and displayed on the user interface and the provider interface") includes a plurality of recording apparatuses located in different areas to record videos from different angles for the user, and the video recording method for compositing further comprises managing a plurality of user videos transmitted from the plurality of recording apparatuses by dividing the plurality of user videos into one of a gesture unit and a preset length unit and matching each of the plurality of user videos with at least one of video identification information and the recording apparatus's identification information. Lianides does not disclose that the video recording method for compositing further comprises managing a plurality of user videos transmitted from the plurality of recording apparatuses by dividing the plurality of user videos into one of a gesture unit and a preset length unit and matching each of the plurality of user videos with at least one of video identification information and the recording apparatus's identification information. However, in a similar field of endeavor, Olshansky discloses that the video recording method for compositing further comprises managing a plurality of user videos transmitted from the plurality of recording apparatuses by dividing the plurality of user videos into one of a gesture unit and a preset length unit and matching each of the plurality of user videos with at least one of video identification information and the recording apparatus's identification information (Olshansky: paragraph 68, line(s) 1-12 "The video computer 430 is similarly responsible for the control and receipt of video data from cameras 120. The video computer 430 is preferably specially configured to handle video processing in an efficient manner. In one embodiment, the video computer 430 contains a high-powered graphics processing unit (or GPU) that will speed up its handling of the multiple video feeds coming from the multiple cameras 120. The GPU can be responsible, for instance, for all video encoding and decoding required by the booth 400. The video information received from the cameras 120 are stored on the video computer as video data 432."; also, paragraph 103, line(s) 9-13 "Similarly, camera one data 1010 is divided into four video segments 1012, 1014, 1016, 1018 in FIG. 10, while camera two data 1020 is divided into segments 1022, 1024, 1026, 1028 and camera three data 1030 is divided into segments 1032, 1034, 1036, and 1038"; also, paragraph 104, line(s) 1-18 "Although determining when certain instructions 412 are provided to the individual 110 is one of the best ways to divide up the time segment data 610, it is not the only way. The incorporated Ser. No. 16/366,746 patent application, for example, describe other techniques for defining time segments 810. This application described these techniques as searching for “switch-initiating events” that can be detected in the content of data 419, 422, 432, 434 recorded at the booth 400. Furthermore, behavioral data analysis 600 created by analyzing this recorded data can also be helpful. For example, facial recognition data, gesture recognition data, posture recognition data, and speech-to-text can be monitored to look for switch-initiating events. For example, if the candidate turns away from one of the video cameras to face a different video camera, the system can detect that motion and note it as a switch-initiating event. Hand gestures or changes in posture can also be used to trigger the system to cut from one camera angle to a different camera angle."). It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified Lianides' video recording method for compositing of claim 12, the recording apparatus, and the managing of a plurality of user videos transmitted from the plurality of recording apparatuses by dividing the plurality of user videos into one of a gesture unit and a preset length unit and matching each of the plurality of user videos with at least one of video identification information and the recording apparatus's identification information, with the features of Olshansky's invention, which includes a plurality of recording apparatuses located in different areas to record videos from different angles for the user. As demonstrated by Olshansky, one could add support for viewing individuals from different camera points of view to further improve the video recording system.

Regarding claim 18, Lianides as modified by Olshansky discloses the video recording method for compositing of claim 17, wherein the plurality of user videos includes videos from different angles, and in the managing, the plurality of user videos including the videos from different angles are grouped into videos of the same angle, or all of a plurality of user videos recorded at the same time are grouped, and grouped videos are managed separately (Olshansky: paragraph 103, line(s) 9-13 "Similarly, camera one data 1010 is divided into four video segments 1012, 1014, 1016, 1018 in FIG. 10, while camera two data 1020 is divided into segments 1022, 1024, 1026, 1028 and camera three data 1030 is divided into segments 1032, 1034, 1036, and 1038"; also, paragraph 104, line(s) 1-18 "Although determining when certain instructions 412 are provided to the individual 110 is one of the best ways to divide up the time segment data 610, it is not the only way. The incorporated Ser. No. 16/366,746 patent application, for example, describe other techniques for defining time segments 810. This application described these techniques as searching for “switch-initiating events” that can be detected in the content of data 419, 422, 432, 434 recorded at the booth 400. Furthermore, behavioral data analysis 600 created by analyzing this recorded data can also be helpful. For example, facial recognition data, gesture recognition data, posture recognition data, and speech-to-text can be monitored to look for switch-initiating events. For example, if the candidate turns away from one of the video cameras to face a different video camera, the system can detect that motion and note it as a switch-initiating event. Hand gestures or changes in posture can also be used to trigger the system to cut from one camera angle to a different camera angle."; also, paragraph 105, line(s) 1-31 "While the Ser. No. 16/366,746 patent application primarily defines switch-initiating events in the context of switching cameras, these events are equally useful for dividing the time segment data 610 into different time segments 810. In one embodiment, the changes in instruction data 412 provided to the individual 110 are first used to create the separate time segments 810. Switching events detected within a single time segment 810 can then be used to split that time segment 810 into two different time segments 810. For example, the Ser. No. 16/366,746 application explains that the identification of low-noise event can be considered a switch-initiating events. If an average decibel level over a particular range of time (such as 4 seconds) is below a threshold level (such as 30 decibels), this will be considered a low noise audio segment that can be used to subdivide time segments 810. In the context of an interview, time segment 814 can originally be defined to cover the entire answer the individual 110 provided to a first instruction 412. If a low-noise event is identified within that answer, time segment 814 is split into two different time segments—one before the low-noise event and one after the low-noise event. Furthermore, this incorporated patent application describes the ability to optionally remove extended low volume segments or pauses from an audiovisual presentation 714 altogether. If time segment 814 were divided into two using this technique, the first of these new time segments would be the time before the beginning of the low noise event, and the second time segment would be the time after the low-volume segment or pause is completed, thereby removing the low volume segment from any of the defined time segments 810"; also, paragraph 107, line(s) 1-16 "For each time segment 810, the controller computer 410 can select the preferred audio and video data source. For instance, if time segment two 814 is desired in the presentation 714, then the controller computer 410 can select between the two microphones 130—namely between audio segment two (M1) 914 and audio segment two (M2) 924—for the audio. The controller computer 410 would also select between the three cameras—between video segment two (C1) 1014, video segment two (C2) 1024, and video segment two (C3) 1034. If the controller computer 410 determines that the best presentation of time segment two 814 is to use audio segment two (M1) 914 and video segment two (C3) 1034, then it will record that determination and use that audio segment 914 and that video segment 1034 whenever time segment two 814 is desired as part of a presentation 714."; also, paragraph 108, line(s) 3-8 "While this may have originally been only a single time segment 810, the process of subdividing the time segments 810 (such as by searching for switching events as described above) may have split this into multiple segments 810. These multiple segments can be grouped together by the controller computer 410").

Regarding claim 19, Lianides discloses the video recording method for compositing of claim 12, wherein the first monitor (Lianides: FIG. 1, 110; also, paragraph 14, line(s) 15-16 "an audio and video displaying device 110 (e.g., a computer monitor, a laptop or tablet's screen display, or the like)") is located inside a recording booth, and in the outputting of the live video of the user and the basic posture still image through the first monitor (Lianides: FIG. 5; also, FIG. 6; also, paragraph 12, line(s) 1-4 "FIG. 5 is diagram illustrating the user live video feed with the superposed skeleton image displayed in the user interface as shown in FIG. 3 when the user is doing an exercise correctly; and"; also, paragraph 13, line(s) 1-4 "FIG. 6 is a diagram illustrating the user live video feed with the superposed skeleton image displayed in the user interface as shown in FIG. 3 when the user is doing an exercise incorrectly."; also, paragraph 5, line(s) 15-24 "a provider video capturing device, a provider audio capturing device, and a provider interface controlled by a provider frontend application; the user interfacing device and the provider interfacing device are communicating video data and audio data via an interactive communication API over a network wherein the video data includes a user live stream showing the user body image and the user poses captured by the user video capturing device and displayed on the user interface and the provider interface"), the same information as the output information of the first monitor is output through a second monitor located outside the recording booth. Lianides does not disclose a second monitor located outside the recording booth. However, in a similar field of endeavor, Olshansky discloses a second monitor located outside the recording booth (Olshansky: paragraph 42, line(s) 8-9 "possible for the computers 10 to be located outside the booth"). It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified Lianides' video recording method for compositing of claim 12, the first monitor, and the outputting of the live video of the user and the basic posture still image through the first monitor with the features of Olshansky's invention, which include the first monitor being located inside a recording booth and a second monitor located outside the recording booth. As demonstrated by Olshansky, one could modularize the displays in different positions so that they can be either within the confines of a room or outside of a room.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAI WEI TOMMY LI, whose telephone number is (571) 272-1170. The examiner can normally be reached 6:00AM-4:00PM EST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Xiao Wu, can be reached at (571) 272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JAI W LI/
Junior Examiner, Art Unit 2613

/XIAO M WU/
Supervisory Patent Examiner, Art Unit 2613
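Practitioner note: the per-segment source selection in Olshansky paragraph 107, which the rejection maps onto claims 17-18, amounts to choosing one audio and one video source per time segment and reusing that choice whenever the segment appears in a presentation. A minimal sketch; the scoring callback and all identifiers are hypothetical stand-ins, not Olshansky's API:

```python
from typing import Callable

Segment = tuple[float, float]  # (start, end) in seconds

def select_sources(segments: list[Segment],
                   mics: list[str],
                   cameras: list[str],
                   score: Callable[[str, Segment], float]) -> dict[Segment, tuple[str, str]]:
    """Map each time segment to its best-scoring (microphone, camera) pair."""
    plan: dict[Segment, tuple[str, str]] = {}
    for seg in segments:
        best_mic = max(mics, key=lambda m: score(m, seg))
        best_cam = max(cameras, key=lambda c: score(c, seg))
        plan[seg] = (best_mic, best_cam)  # recorded once and reused per paragraph 107
    return plan
```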

Prosecution Timeline

Aug 22, 2024
Application Filed
Feb 17, 2026
Non-Final Rejection — §102, §103 (current)


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: Favorable
Median Time to Grant: 2y 9m
PTA Risk: Low
Based on 0 resolved cases by this examiner; grant probability derived from career allow rate.
