DETAILED ACTION
This action is in response to the remarks filed 12/05/2025. Claims 1-20 are pending and have been examined.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments filed 12/05/2025 have been fully considered but they are not persuasive.
Regarding the Remarks filed 12/05/2025, Applicant argues that amended claim 1 overcomes the cited references because, whether taken alone or in combination, they fail to disclose or suggest at least the amended elements. Examiner respectfully disagrees. The limitation added to claim 1, “performing at least one process on the first audio data and/or the first image data based on configuration information and/or usage information of an output component for outputting the target data to be outputted to obtain the target data to be outputted,” does not overcome the prior art of record and, under the broadest reasonable interpretation (BRI), is taught by Baldwin (specifically, Jiao in view of Baldwin; see the rejection below). Baldwin discusses distributing communication of a data stream among multiple devices, focusing on the capabilities of each device and processing the communication to account for those capabilities, such as bandwidth capability. Because Baldwin bases data transmission characteristics on device capability information, that information can be interpreted as the usage information or the configuration information of the device. Therefore, claim 1 remains rejected under 35 U.S.C. § 103, as do dependent claims 2-9. Independent claims 10 and 11, although different in scope from claim 1, recite elements similar to those of claim 1 discussed above and therefore remain rejected under 35 U.S.C. § 103 for the same reasons, as do dependent claims 12-20.
Response to Amendment
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 9-11, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Jiao (CN112887654 A) in view of Baldwin et al. (US 20140040364 A1, hereinafter Baldwin).
Regarding Claim 1, Jiao teaches
A processing method applied to a first electronic device (“the video collecting component” Jiao (Page 2, ¶6)), comprising:
obtaining first audio data and/or first image data (“the at least two image collecting module, for collecting respectively least two first video in different directions” Jiao (Page 2, ¶6));
performing at least one process on the first audio data and/or the first image data to obtain target data to be outputted (“the data processing component is used for receiving the second video transmitted by the image processor; based on the second video, determining the video to be displayed” Jiao (Page 2, ¶8)), including:
transmitting the target data (“sending the to-be-displayed video” Jiao (Page 3, ¶15)) to be outputted to a target application running on a second electronic device (“to the electronic device and/or display component, so that the electronic device and/or the display component displays the video to be displayed.” Jiao (Page 3, ¶15)) having a communication connection with the first electronic device (the video collecting component (first electronic device) of at least two image collecting module 112 of the output end are connected with the first end of the image processor 111. The second end of the image processor 111 is connected with the first end of the data processing component 12. The second end of the data processing component 12 is connected with the first end of the control processor 131; the second end of the control processor 131 is connected with the display touch screen 132 (second electronic device) Jiao (Page 8, ¶7)).
Jiao does not expressly teach
performing at least one process on the first audio data and/or the first image data based on configuration information and/or usage information of an output component for outputting the target data to be outputted to obtain the target data to be outputted; and
wherein data size of the target data to be outputted is different from data size of the first audio data and/or the first image data based on an output requirement of the target application, the output requirement being a requirement for enabling the target application to directly output the target data to be outputted.
However, Baldwin teaches
performing at least one process on the first audio data and/or the first image data based on configuration information and/or usage information of an output component for outputting the target data to be outputted to obtain the target data to be outputted (“Example methods disclosed herein to distribute communication of a first data stream among multiple devices include receiving a request from one of a group of example devices to establish a shared connection to distribute the communication of the first data stream among the group of devices. Such example methods also include establishing, in response to the request, respective data connections with the group of devices based on device capability information” Baldwin [0014], “consumer may have access to multiple electronic devices that support Internet connectivity, but none of these devices may have sufficient data bandwidth capability, on its own, to support the data intensive applications the consumer may wish to access” Baldwin [0019], and “The connection manager 135 further establishes the data connections with the devices 105A-E by associating respective data transmission characteristic(s) with each of the data connections. In the illustrate example, the data transmission characteristic(s) are determined by the connection manager 135 based on the device capability information obtained from the service provider network 110. For example, the connection manager 135 can allocate different bandwidth and/or data rate limits to some or all of the data connections based on the device capability information for each of the devices 105A-E. After establishing the data connections with the respective devices 105A-E and determining their respective data transmission characteristics, the connection manager 135 then initiates the data stream from target data source 115 using the target data source identification information included in the shared connection request” Baldwin [0031]); and
wherein data size of the target data to be outputted is different from data size of the first audio data and/or the first image data based on an output requirement of the target application, the output requirement being a requirement for enabling the target application to directly output the target data to be outputted (“At block 1035, the stream aggregator 630 aggregates, as described above, the partial data streams (e.g., possibly after reordering to account for different data packet arrival times at the different user devices 105A-E) to form the complete data stream being provided by the target data source 115. At block 1040, the primary user device 105A performs any appropriate post-processing on the aggregated, complete data stream, and/or the stream relayer 635 of the primary user device 105A outputs the complete data stream for use by another device, such as the output device 125.” Baldwin [0064]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jiao with Baldwin so as to “support the data transfer speeds associated with a data intensive application the consumer wishes to access” (Baldwin [0002]) without unnecessary strain on the processor.
Regarding Claim 9, Jiao in view of Baldwin teaches
The method of claim 1, further comprising:
outputting the target data to be outputted to a target output component, the target output component being an output component of the first electronic device and/or a display component and/or an audio output component connected to the first electronic device (“sending the to-be-displayed video to the electronic device and/or display component, so that the electronic device and/or the display component displays the video to be displayed” Jiao (Page 3, ¶14)), wherein:
the target data to be outputted is output to the target output component and the target application through the same or different channels (“In some embodiments, the based on the second video determining video to be displayed, comprising: analyzing the second video; determining the target vocalizing portrait; performing specific manner presentation processing to the target sounding portrait in the first video to obtain the video to be displayed.” Jiao (Page 3, ¶16-18)).
Regarding Claim 10, Jiao teaches
An electronic device (“conference device, conference system and data processing method” Jiao (Page 2, ¶ 3)), comprising:
a microphone array (“the video collecting component further comprises a first microphone array and a voice processor” Jiao (Page 2, ¶14)) arranged on a body of the electronic device for collecting audio data in a target space environment (“the voice processor is used for receiving the first sound information sent by the first microphone array” Jiao (Page 2, ¶16); Fig. 11);
a camera array arranged on the body for collecting image data in the target space environment (“the video collecting component 11 comprises an image processor 111 and at least two image collecting module 112 different from the optical axis direction.” Jiao (Page 5, ¶16); Fig. 3A, 1); and
a processing device disposed in the body (“a data processing component 12” Jiao (Page 5, ¶15)), the processing device being configured to:
obtain first audio data and/or first image data, the first audio data including or not including the audio data collected by the microphone array, the first image data including or not including the image data collected by the camera array (“the at least two image collecting module, for collecting respectively least two first video in different directions.” Jiao (Page 2, ¶6));
perform at least one process on the first audio data and/or the first image data to obtain target data to be outputted (“the data processing component is used for receiving the second video transmitted by the image processor; based on the second video, determining the video to be displayed” Jiao (Page 2, ¶8)), including:
transmit the target data to be outputted to the target application running on a second electronic device having a communication connection with the electronic device (the video collecting component (first electronic device) of at least two image collecting module 112 of the output end are connected with the first end of the image processor 111. The second end of the image processor 111 is connected with the first end of the data processing component 12. The second end of the data processing component 12 is connected with the first end of the control processor 131; the second end of the control processor 131 is connected with the display touch screen 132 (second electronic device) Jiao (Page 8, ¶7)).
Jiao does not expressly teach
performing at least one process on the first audio data and/or the first image data based on configuration information and/or usage information of an output component for outputting the target data to be outputted to obtain the target data to be outputted, wherein data size of the target data to be outputted is different from data size of the first audio data and/or the first image data based on an output requirement of a target application, and the output requirement is a requirement for enabling the target application to directly output the target data to be outputted.
However, Baldwin teaches
performing at least one process on the first audio data and/or the first image data based on configuration information and/or usage information of an output component for outputting the target data to be outputted to obtain the target data to be outputted (“Example methods disclosed herein to distribute communication of a first data stream among multiple devices include receiving a request from one of a group of example devices to establish a shared connection to distribute the communication of the first data stream among the group of devices. Such example methods also include establishing, in response to the request, respective data connections with the group of devices based on device capability information” Baldwin [0014], “consumer may have access to multiple electronic devices that support Internet connectivity, but none of these devices may have sufficient data bandwidth capability, on its own, to support the data intensive applications the consumer may wish to access” Baldwin [0019], and “The connection manager 135 further establishes the data connections with the devices 105A-E by associating respective data transmission characteristic(s) with each of the data connections. In the illustrate example, the data transmission characteristic(s) are determined by the connection manager 135 based on the device capability information obtained from the service provider network 110. For example, the connection manager 135 can allocate different bandwidth and/or data rate limits to some or all of the data connections based on the device capability information for each of the devices 105A-E. After establishing the data connections with the respective devices 105A-E and determining their respective data transmission characteristics, the connection manager 135 then initiates the data stream from target data source 115 using the target data source identification information included in the shared connection request” Baldwin [0031]), wherein data size of the target data to be outputted is different from data size of the first audio data and/or the first image data based on an output requirement of the target application, and the output requirement is a requirement for enabling the target application to directly output the target data to be outputted (“At block 1035, the stream aggregator 630 aggregates, as described above, the partial data streams (e.g., possibly after reordering to account for different data packet arrival times at the different user devices 105A-E) to form the complete data stream being provided by the target data source 115. At block 1040, the primary user device 105A performs any appropriate post-processing on the aggregated, complete data stream, and/or the stream relayer 635 of the primary user device 105A outputs the complete data stream for use by another device, such as the output device 125.” Baldwin [0064]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jiao with Baldwin so as to “support the data transfer speeds associated with a data intensive application the consumer wishes to access” (Baldwin [0002]) without unnecessary strain on the processor.
Regarding Claim 11, Jiao teaches
A processing device (“conference device, conference system and data processing method” Jiao (Page 2, ¶ 3)) comprising:
an acquisition module configured to obtain first audio data and/or first image data (“the at least two image collecting module, for collecting respectively least two first video in different directions” Jiao (Page 2, ¶6));
a processing module (“the data processing component” Jiao (Page 2, ¶8)) configured to:
perform at least one process on the first audio data and/or the first image data to obtain target data to be outputted (“the data processing component is used for receiving the second video transmitted by the image processor; based on the second video, determining the video to be displayed” Jiao (Page 2, ¶8)), including:
an output module configured to transmit the target data to be outputted (“sending the to-be-displayed video” Jiao (Page 3, ¶15)) to a target application running on a second electronic device (“to the electronic device and/or display component” Jiao (Page 3, ¶15)) having a communication connection with the electronic device (the video collecting component (first electronic device) of at least two image collecting module 112 of the output end are connected with the first end of the image processor 111. The second end of the image processor 111 is connected with the first end of the data processing component 12. The second end of the data processing component 12 is connected with the first end of the control processor 131; the second end of the control processor 131 is connected with the display touch screen 132 (second electronic device) Jiao (Page 8, ¶7)).
Jiao does not expressly teach
performing at least one process on the first audio data and/or the first image data based on configuration information and/or usage information of an output component for outputting the target data to be outputted to obtain the target data to be outputted; and
wherein data size of the target data to be outputted is different from data size of the first audio data and/or the first image data based on an output requirement of the target application, the output requirement being a requirement for enabling the target application to directly output the target data to be outputted.
However, Baldwin teaches
performing at least one process on the first audio data and/or the first image data based on configuration information and/or usage information of an output component for outputting the target data to be outputted to obtain the target data to be outputted (“Example methods disclosed herein to distribute communication of a first data stream among multiple devices include receiving a request from one of a group of example devices to establish a shared connection to distribute the communication of the first data stream among the group of devices. Such example methods also include establishing, in response to the request, respective data connections with the group of devices based on device capability information” Baldwin [0014], “consumer may have access to multiple electronic devices that support Internet connectivity, but none of these devices may have sufficient data bandwidth capability, on its own, to support the data intensive applications the consumer may wish to access” Baldwin [0019], and “The connection manager 135 further establishes the data connections with the devices 105A-E by associating respective data transmission characteristic(s) with each of the data connections. In the illustrate example, the data transmission characteristic(s) are determined by the connection manager 135 based on the device capability information obtained from the service provider network 110. For example, the connection manager 135 can allocate different bandwidth and/or data rate limits to some or all of the data connections based on the device capability information for each of the devices 105A-E. After establishing the data connections with the respective devices 105A-E and determining their respective data transmission characteristics, the connection manager 135 then initiates the data stream from target data source 115 using the target data source identification information included in the shared connection request” Baldwin [0031]); and
wherein data size of the target data to be outputted is different from data size of the first audio data and/or the first image data based on an output requirement of the target application, the output requirement being a requirement for enabling the target application to directly output the target data to be outputted (“At block 1035, the stream aggregator 630 aggregates, as described above, the partial data streams (e.g., possibly after reordering to account for different data packet arrival times at the different user devices 105A-E) to form the complete data stream being provided by the target data source 115. At block 1040, the primary user device 105A performs any appropriate post-processing on the aggregated, complete data stream, and/or the stream relayer 635 of the primary user device 105A outputs the complete data stream for use by another device, such as the output device 125.” Baldwin [0064]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jiao with Baldwin so as to “support the data transfer speeds associated with a data intensive application the consumer wishes to access” (Baldwin [0002]) without unnecessary strain on the processor.
Regarding Claim 19, Jiao in view of Baldwin teaches
The processing device of claim 10, wherein the processing device is further configured to:
transmit the target data to be outputted to a target output component, the target output component being an output component of the processing device and/or a display component and/or an audio output component connected to the processing device (“sending the to-be-displayed video to the electronic device and/or display component, so that the electronic device and/or the display component displays the video to be displayed” Jiao (Page 3, ¶12)), the target data to be outputted being output to the target output component and the target application through the same or different channels (“In some embodiments, the based on the second video determining video to be displayed, comprising: analyzing the second video; determining the target vocalizing portrait; performing specific manner presentation processing to the target sounding portrait in the first video to obtain the video to be displayed.” Jiao (Page 3, ¶16-18)).
Claims 2-3 and 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over Jiao (CN112887654 A) in view of Baldwin et al. (US 20140040364 A1, hereinafter Baldwin) and further in view of Tanaka et al. (US 20180322896 A1, hereinafter Tanaka).
Regarding Claim 2, Jiao in view of Baldwin teaches
The method of claim 1, wherein obtaining the first audio data and/or the first image data includes at least one of the alternatives below. The Examiner has elected to reject based on the first alternative.
using a microphone array and/or a camera array of the first electronic device (“the video collecting component” Jiao (Page 2, ¶6)) to collect audio data and/or image data in a target space environment as the first audio data and/or the first image data (“the first microphone array is used for collecting the first sound information” Jiao (Page 2, ¶15));
or using audio data and/or image data from the target application as the first audio data and/or the first image data;
or using the audio data and/or the image data in the target space environment collected by the microphone array and/or the camera array of the first electronic device, and the audio data and/or the image data from the target application as the first audio data and/or the first image data;
or using the audio data and/or the image data in the target space environment collected by the microphone array and/or the camera array of the first electronic device, the audio data and/or the image data from the target application, and audio data and/or image data collected by a third electronic device as the first audio data and/or the first image data;
Jiao in view of Baldwin does not expressly teach
wherein the target space environment being a space environment where the first electronic device is located, the microphone array and/or the camera array being configured to adjust their collection ranges in the target space environment based on change information in the target space environment, the target application including one application or multiple applications of the same or different types.
However, Tanaka teaches
wherein the target space environment being a space environment where the first electronic device is located, the microphone array and/or the camera array being configured to adjust their collection ranges in the target space environment based on change information in the target space environment, the target application including one application or multiple applications of the same or different types (“In addition, in the present embodiment, scanning for specifying a direction to an environmental noise source can be performed to improve voice recognition performance. At step S6, the control unit 1 narrows the sound collection range of the microphone unit 2a and changes the sound collection range.” Tanaka [0103]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jiao in view of Baldwin with Tanaka so as to improve the performance of voice recognition of the target voice. In particular, when another person's voice is present as environmental noise relative to the dictation target voice, the performance of voice recognition of the target voice degrades, such that dictation cannot reliably be performed in some cases. Tanaka acknowledges the desire to improve target voice detection and does so by adjusting the collection range of the microphone in the target space environment (Tanaka [0008] and [0009]).
Regarding Claim 3, Jiao in view of Baldwin and Tanaka teaches
The method of claim 2, wherein performing the at least one process on the first audio data and/or the first image data to obtain the target data to be outputted further includes at least one of the alternatives below. The Examiner has elected to reject based on the first alternative.
performing at least one process on the first audio data based on the change information in the target space environment to obtain the target data to be outputted (“In some embodiments, the data processing component 12 can be based on the sound direction information, from the second image to determine the sound direction information corresponding to the target sound image, then determining the target sound image for specific presentation of the video to be displayed. For example, the target sounding portrait specific presentation may include: and amplifying the target sounding image in the second video.” Jiao (Page 10, ¶2));
or performing at least one process on the first audio data in response to obtaining instruction information generated by operations acting on the target application to obtain the target data to be outputted;
or performing at least one process on the first audio data based on target space environment information and resource information of the first electronic device to obtain the target data to be outputted.
Regarding Claim 12, Jiao in view of Baldwin teaches
The electronic device of claim 10, wherein the processing device is further configured to perform at least one of the alternatives below. The Examiner has elected to reject based on the first alternative.
use a microphone array and/or a camera array of the processing device to collect audio data and/or image data in a target space environment as the first audio data and/or the first image data (“In some embodiments, the video collecting component further comprises a first microphone array and a voice processor; the first microphone array is used for collecting the first sound information; sending the first sound information to the voice processor” Jiao (Page 2, ¶14-15));
or use audio data and/or image data from the target application as the first audio data and/or the first image data;
or use the audio data and/or the image data in the target space environment collected by the microphone array and/or the camera array of the processing device, and the audio data and/or the image data from the target application as the first audio data and/or the first image data;
or use the audio data and/or the image data in the target space environment collected by the microphone array and/or the camera array of the processing device, the audio data and/or the image data from the target application, and audio data and/or image data collected by a third electronic device as the first audio data and/or the first image data, the target space environment being a space environment where the processing device is located;
Jiao in view of Baldwin does not expressly teach
wherein the microphone array and/or the camera array are configured to adjust their collection ranges in the target space environment based on change information in the target space environment, the target application including one application or multiple applications of the same or different types.
However, Tanaka teaches
wherein the target space environment being a space environment where the first electronic device is located, the microphone array and/or the camera array being configured to adjust their collection ranges in the target space environment based on change information in the target space environment, the target application including one application or multiple applications of the same or different types (“In addition, in the present embodiment, scanning for specifying a direction to an environmental noise source can be performed to improve voice recognition performance. At step S6, the control unit 1 narrows the sound collection range of the microphone unit 2a and changes the sound collection range.” Tanaka [0103]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jiao in view of Baldwin with Tanaka so as to improve the performance of voice recognition of the target voice. In particular, when another person's voice is present as environmental noise relative to the dictation target voice, the performance of voice recognition of the target voice degrades, such that dictation cannot reliably be performed in some cases. Tanaka acknowledges the desire to improve target voice detection and does so by adjusting the collection range of the microphone in the target space environment (Tanaka [0008] and [0009]).
Regarding Claim 13, Jiao in view of Baldwin and Tanaka teaches
The electronic device of claim 12, wherein the processing device is further configured to perform at least one of the alternatives below. The Examiner has elected to reject based on the first alternative.
perform at least one process on the first audio data based on the change information in the target space environment to obtain the target data to be outputted (“In some embodiments, the data processing component 12 can be based on the sound direction information, from the second image to determine the sound direction information corresponding to the target sound image, then determining the target sound image for specific presentation of the video to be displayed. For example, the target sounding portrait specific presentation may include: and amplifying the target sounding image in the second video.” Jiao (Page 10, ¶2));
or perform at least one process on the first audio data in response to obtaining instruction information generated by operations acting on the target application to obtain the target data to be outputted;
or perform at least one process on the first audio data based on target space environment information and resource information of the first electronic device to obtain the target data to be outputted.
Claims 4 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Jiao (CN112887654 A) in view of Baldwin et al. (US 20140040364 A1, hereinafter Baldwin), Tanaka et al. (US 20180322896 A1, hereinafter Tanaka), and further in view of Ostap et al. (US 11350029 B1, hereinafter Ostap).
Regarding Claim 4, Jiao in view of Baldwin and Tanaka teaches
The method of claim 2, wherein performing the at least one process on the first image data and/or the first audio data to obtain the target data to be outputted further includes at least one of the alternatives below. The Examiner has elected to reject based on the first alternative.
performing at least one process on the first image data to obtain the target data to be outputted (For example, the middleware software can intercept each character in the second video, combining into a head portrait splicing image and outputting to the conference software, or middleware software can intercept and amplify some area in the 360 degrees panoramic picture, only the intercepted part is output to the conference software; or overlapping a small display window on the screenshot, for displaying the portrait picture of the appointed character (target data). Jiao (Page 12, ¶3));
or performing at least one process on the first image data in response to obtaining instruction information generated by operations acting on the target application to obtain the target data to be outputted;
or performing at least one process on the first image data based on target space environment information and resource information of the first electronic device to obtain the target data to be outputted.
Jiao in view of Baldwin and Tanaka does not expressly teach
based on the change information in the target space environment.
However, Ostap teaches
based on the change information in the target space environment (“Typically, the survey frames are analyzed at the beginning of the video-conferencing session, e.g., to detect conference participants 306, and periodically throughout the video-conferencing session to detect changes in the video-conferencing session, such as participants leaving, participants changing location, new participants joining, changes in participant activity (changes in who is speaking) and shifting participant engagement levels.” Ostap [39]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jiao in view of Baldwin and Tanaka, with Ostap so as to improve the target data acquisition during a videoconference. Ostap acknowledges that there is a desire to optimize this and does so by prioritizing a certain region-of-interest during a videoconference to ensure the desired view of the target data is displayed (Ostap [5]).
Regarding Claim 14, Jiao in view of Baldwin and Tanaka teaches
The electronic device of claim 12, wherein the processing device is further configured to perform at least one of the alternatives below. Examiner has chosen to reject alternative 1.
perform at least one process on the first image data to obtain the target data to be outputted (For example, the middleware software can intercept each character in the second video, combining into a head portrait splicing image and outputting to the conference software, or middleware software can intercept and amplify some area in the 360 degrees panoramic picture, only the intercepted part is output to the conference software; or overlapping a small display window on the screenshot, for displaying the portrait picture of the appointed character (target data). Jiao (Page 12, ¶3));
or perform at least one process on the first image data in response to obtaining instruction information generated by operations acting on the target application to obtain the target data to be outputted;
or perform at least one process on the first image data based on target space environment information and resource information of the first electronic device to obtain the target data to be outputted.
Jiao in view of Baldwin and Tanaka does not expressly teach
based on the change information in the target space environment.
However, Ostap teaches
based on the change information in the target space environment (“Typically, the survey frames are analyzed at the beginning of the video-conferencing session, e.g., to detect conference participants 306, and periodically throughout the video-conferencing session to detect changes in the video-conferencing session, such as participants leaving, participants changing location, new participants joining, changes in participant activity (changes in who is speaking) and shifting participant engagement levels.” Ostap [39]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jiao in view of Baldwin and Tanaka, with Ostap so as to improve the target data acquisition during a videoconference. Ostap acknowledges that there is a desire to optimize this and does so by prioritizing a certain region-of-interest during a videoconference to ensure the desired view of the target data is displayed (Ostap [5]).
Claims 5-6 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Jiao (CN112887654 A) in view of Baldwin et al. (US 20140040364 A1, hereinafter Baldwin), Tanaka et al. (US 20180322896 A1, hereinafter Tanaka), and further in view of Feng (CN 113573161 A).
Regarding Claim 5, Jiao in view of Baldwin and Tanaka teaches
The method of claim 2, wherein performing the at least one process on the first audio data and/or the first image data to obtain the target data to be outputted further includes:
processing a plurality of first audio data obtained through a control signal into target audio data (“the second microphone array, for collecting the second sound information, sending the second sound information to the signal processor” Jiao (Page 3, ¶5)); (“sending the third sound information to the data processing component, so that the data processing component forwards the third sound information to the electronic device, so that the electronic device plays the third sound information.” Jiao (Page 3, ¶9));
processing a plurality of first image data (“In some embodiments, the based on the second video determining video to be displayed, comprising: intercepting all the images in the second video; arranging all the images to obtain a fourth video; determining the fourth video as the video to be displayed, or superimposing the fourth video on the second video to obtain the video to be displayed” Jiao (Page 4, ¶1-4)) obtained through the control signal into target image data (“obtaining the trigger instruction generated by triggering the appointed area in the video to be displayed” Jiao (Page 14, ¶9)); and
wherein the control signal at least includes a signal for triggering the microphone array and/or the camera array of the first electronic device to collect corresponding data (“obtaining the trigger instruction generated by triggering the appointed area in the video to be displayed” Jiao (Page 14, ¶9)).
Jiao in view of Baldwin and Tanaka does not expressly teach
merging the target audio data and the target image data.
However, Feng teaches
merging the target audio data and the target image data (“a fusion module, for extracting key video segment from the target video data, fusing the key video segment and the secondary song segment of the target audio data, obtaining the multimedia data comprising the key video segment and the secondary song segment of the target audio data.” Feng (P2, ¶13)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jiao in view of Baldwin and Tanaka, with Feng so as to improve the efficiency of providing the target data to meeting participants. Feng acknowledges that there is a desire to optimize this efficiency and does so by fusing the target audio data and target image data to be outputted together (Feng, P2, ¶1).
Regarding Claim 6, Jiao in view of Baldwin and Tanaka teaches
The method of claim 2, wherein performing the at least one process on the first audio data and/or the first image data to obtain the target data to be outputted further includes:
determining a use mode of the first electronic device (“collecting mode”, “first structure mode”, “second structure mode” Jiao (Page 6, ¶6)); and
selecting target audio data and target image data from the first audio data and the first image data based at least on the use mode (For example, in the first structure mode, at least two image collecting module 112 can collect the panoramic image, adapted to the panoramic collection mode of the conference scene; In the second structure mode, at least two image collecting module 112 can collect the image in a certain viewing angle range, adapted to the single direction collecting mode of the conference scene. In some embodiments, the first structure mode, at least two image collecting module 112 can be annularly set, the second structure mode, at least two image collecting module 112 can be set on the plane, or can be set in the preset angle range, to collect a certain visual angle under the video; so as to obtain the video of the specific view angle. Jiao (Page 6, ¶6)).
Jiao in view of Baldwin and Tanaka does not expressly teach
performing fusion processing on the target audio data and the target image data based at least on the use mode to obtain the target data to be outputted.
However, Feng teaches
performing fusion processing on the target audio data and the target image data based at least on the use mode to obtain the target data to be outputted (“a fusion module, for extracting key video segment from the target video data, fusing the key video segment and the secondary song segment of the target audio data, obtaining the multimedia data comprising the key video segment and the secondary song segment of the target audio data.” Feng (P2, ¶13)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jiao in view of Baldwin and Tanaka, with Feng so as to improve the efficiency of providing the target data to meeting participants. Feng acknowledges that there is a desire to optimize this efficiency and does so by fusing the target audio data and target image data to be outputted together (Feng, P2, ¶1).
Regarding Claim 15, Jiao in view of Baldwin and Tanaka teaches
The electronic device of claim 12, wherein the processing device is further configured to:
process a plurality of first audio data obtained through a control signal into target audio data (“the second microphone array, for collecting the second sound information, sending the second sound information to the signal processor” Jiao (Page 3, ¶5)); (“sending the third sound information to the data processing component, so that the data processing component forwards the third sound information to the electronic device, so that the electronic device plays the third sound information.” Jiao (Page 3, ¶9));
process a plurality of first image data (“In some embodiments, the based on the second video determining video to be displayed, comprising: intercepting all the images in the second video; arranging all the images to obtain a fourth video; determining the fourth video as the video to be displayed, or superimposing the fourth video on the second video to obtain the video to be displayed” Jiao (Page 4, ¶1-4)) obtained through the control signal into target image data (“obtaining the trigger instruction generated by triggering the appointed area in the video to be displayed” Jiao (Page 14, ¶9));
based on the control signal to obtain the target data to be outputted, the control signal at least including a signal for triggering the microphone array and/or the camera array of the first electronic device to collect corresponding data (“obtaining the trigger instruction generated by triggering the appointed area in the video to be displayed; based on the trigger instruction, intercepting the designated area in the video to be displayed; obtaining the video of the designated area; based on the specified area of the video, determining the target video; obtaining the trigger instruction generated by triggering the appointed area in the video to be displayed” Jiao (Page 14, ¶9)).
Jiao in view of Baldwin and Tanaka does not expressly teach
merging the target audio data and the target image data.
However, Feng teaches
merging the target audio data and the target image data (“a fusion module, for extracting key video segment from the target video data, fusing the key video segment and the secondary song segment of the target audio data, obtaining the multimedia data comprising the key video segment and the secondary song segment of the target audio data.” Feng (P2, ¶13)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jiao in view of Baldwin and Tanaka, with Feng so as to improve the efficiency of providing the target data to meeting participants. Feng acknowledges that there is a desire to optimize this efficiency and does so by fusing the target audio data and target image data to be outputted together (Feng, P2, ¶1).
Regarding Claim 16, Jiao in view of Baldwin and Tanaka teaches
The electronic device of claim 12, wherein the processing device is further configured to:
determine a use mode of the processing device (“collecting mode”, “first structure mode”, “second structure mode” Jiao (Page 6, ¶6)); and
select target audio data and target image data from the first audio data and the first image data based at least on the use mode (For example, in the first structure mode, at least two image collecting module 112 can collect the panoramic image, adapted to the panoramic collection mode of the conference scene; In the second structure mode, at least two image collecting module 112 can collect the image in a certain viewing angle range, adapted to the single direction collecting mode of the conference scene. In some embodiments, the first structure mode, at least two image collecting module 112 can be annularly set, the second structure mode, at least two image collecting module 112 can be set on the plane, or can be set in the preset angle range, to collect a certain visual angle under the video; so as to obtain the video of the specific view angle. Jiao (Page 6, ¶6)).
Jiao in view of Baldwin and Tanaka does not expressly teach
to perform fusion processing on the target audio data and the target image data based at least on the use mode to obtain the target data to be outputted.
However, Feng teaches
to perform fusion processing on the target audio data and the target image data based at least on the use mode to obtain the target data to be outputted (“a fusion module, for extracting key video segment from the target video data, fusing the key video segment and the secondary song segment of the target audio data, obtaining the multimedia data comprising the key video segment and the secondary song segment of the target audio data.” Feng (P2, ¶13)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jiao in view of Baldwin and Tanaka, with Feng so as to improve the efficiency of providing the target data to meeting participants. Feng acknowledges that there is a desire to optimize this efficiency and does so by fusing the target audio data and target image data to be outputted together (Feng, P2, ¶1).
Claims 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Jiao (CN112887654 A) in view of Baldwin et al. (US 20140040364 A1, hereinafter Baldwin) and further in view of Albadawi et al. (WO 2018152006 A1, hereinafter Albadawi).
Regarding Claim 7, Jiao in view of Baldwin teaches
The method of claim 1, wherein performing the at least one process on the first audio data and/or the first image data to obtain the target data to be outputted further includes (“the voice processor is used for receiving the first sound information sent by the first microphone array; based on the first sound information, determining the sound direction information of the first sound information;” Jiao (Page 2, ¶16); Fig. 11).
Jiao in view of Baldwin does not expressly teach
obtaining system resource information of the first electronic device, determining a target algorithm set from an algorithm library preset by the first electronic device based on the system resource information, and performing corresponding processing on the first audio data and/or the first image data by using an algorithm model in the target algorithm set to obtain the target data to be outputted, the algorithm library being located in the first electronic device or in the space environment where the first electronic device is located, the target algorithm set being updated correspondingly based on changes in the system resource information; or
obtaining the system resource information of the first electronic device, optimizing an original algorithm model based on the system resource information, and performing the corresponding processing on the first audio data and/or the first image data by using the optimized target algorithm model or the target algorithm set to obtain the target data to be outputted, the target algorithm set or the target algorithm model being updated correspondingly based on changes in the system resource information.
However, Albadawi teaches at least one of the following alternatives. Examiner chooses to reject alternative 2.
obtaining system resource information of the first electronic device, determining a target algorithm set from an algorithm library preset by the first electronic device based on the system resource information, and performing corresponding processing on the first audio data and/or the first image data by using an algorithm model in the target algorithm set to obtain the target data to be outputted, the algorithm library being located in the first electronic device or in the space environment where the first electronic device is located, the target algorithm set being updated correspondingly based on changes in the system resource information;
or obtaining the system resource information of the first electronic device (“The selection module 914 may select from among the tracking algorithms based on the available computing resources of the computing device(s) with which tracking is performed.” Albadawi [157]), optimizing an original algorithm model based on the system resource information (“the selection module 914 may select the tracking algorithm that consumes the least power when used to process image data if the remaining battery life is less than a predetermined threshold (e.g., 10%), and may forego selection of the other tracking algorithms.” Albadawi [157]), and performing the corresponding processing on the first audio data and/or the first image data by using the optimized target algorithm model or the target algorithm set to obtain the target data to be outputted, the target algorithm set or the target algorithm model being updated correspondingly based on changes in the system resource information (“Algorithm feedback in this manner may enable other types of updates, including updates to a tracking algorithm by the face detection algorithm, and updates between two or more tracking algorithms, as described in further detail below with reference to FIG. 25.” Albadawi [162]; “The selection module 914 then determines a less computationally intensive manner in which to track the first and second people 930 and 932 by selecting one or more of the tracking algorithms 920-926.” Albadawi [165]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jiao in view of Baldwin, with Albadawi so as to improve the processor load during image analysis. In order to prevent stalls and a degraded user experience, the tracking algorithm can be optimized such that the image processing does not consume more than the available share of logic processor resources (Albadawi [158]). Albadawi acknowledges that there is a desire to optimize processor load during image analysis, such as by selecting a tracking algorithm that does not consume more than the available share of logic processor resources, i.e., one that is less processor intensive (for further support see ¶157, ¶158, ¶162, and ¶165 of Albadawi).
Regarding Claim 17, Jiao in view of Baldwin teaches
The electronic device of claim 10 (“the data processing component” Jiao (Page 2, ¶8)), wherein the processing device is further configured to:
Jiao in view of Baldwin does not expressly teach
to obtain system resource information of the processing device, determine a target algorithm set from an algorithm library preset by the processing device based on the system resource information, and perform corresponding processing on the first audio data and/or the first image data by using an algorithm model in the target algorithm set to obtain the target data to be outputted, the algorithm library being located in the processing device or in the space environment where the processing device is located, the target algorithm set being updated correspondingly based on changes in the system resource information; or
obtain the system resource information of the processing device, optimize an original algorithm model based on the system resource information, and perform the corresponding processing on the first audio data and/or the first image data by using the optimized target algorithm model or the target algorithm set to obtain the target data to be outputted, the target algorithm set or the target algorithm model being updated correspondingly based on changes in the system resource information.
However, Albadawi teaches at least one of the following alternatives. Examiner chooses to reject alternative 2.
obtain system resource information of the processing device, determine a target algorithm set from an algorithm library preset by the processing device based on the system resource information, and perform corresponding processing on the first audio data and/or the first image data by using an algorithm model in the target algorithm set to obtain the target data to be outputted, the algorithm library being located in the processing device or in the space environment where the processing device is located, the target algorithm set being updated correspondingly based on changes in the system resource information;
or obtain the system resource information of the processing device (“The selection module 914 may select from among the tracking algorithms based on the available computing resources of the computing device(s) with which tracking is performed.” Albadawi [157]), optimize an original algorithm model based on the system resource information (“the selection module 914 may select the tracking algorithm that consumes the least power when used to process image data if the remaining battery life is less than a predetermined threshold (e.g., 10%), and may forego selection of the other tracking algorithms.” Albadawi [157]), and perform the corresponding processing on the first audio data and/or the first image data by using the optimized target algorithm model or the target algorithm set to obtain the target data to be outputted, the target algorithm set or the target algorithm model being updated correspondingly based on changes in the system resource information (“Algorithm feedback in this manner may enable other types of updates, including updates to a tracking algorithm by the face detection algorithm, and updates between two or more tracking algorithms, as described in further detail below with reference to FIG. 25.” Albadawi [162]; “The selection module 914 then determines a less computationally intensive manner in which to track the first and second people 930 and 932 by selecting one or more of the tracking algorithms 920-926.” Albadawi [165]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jiao in view of Baldwin, with Albadawi so as to improve the processor load during image analysis. In order to prevent stalls and a degraded user experience, the tracking algorithm can be optimized such that the image processing does not consume more than the available share of logic processor resources (Albadawi [158]). Albadawi acknowledges that there is a desire to optimize processor load during image analysis, such as by selecting a tracking algorithm that does not consume more than the available share of logic processor resources, i.e., one that is less processor intensive (for further support see ¶157, ¶158, ¶162, and ¶165 of Albadawi).
Claims 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Jiao (CN112887654 A) in view of Baldwin et al. (US 20140040364 A1, hereinafter Baldwin), Tanaka et al. (US 20180322896 A1, hereinafter Tanaka), and further in view of Hu et al. (CN 112866619 A, hereinafter Hu).
Regarding Claim 8, Jiao in view of Baldwin and Tanaka teaches
The method of claim 2, wherein transmitting the target data to be outputted to the target application running on the second electronic device having the communication connection with the first electronic device includes at least one of the alternatives below. Examiner chooses to reject alternative 1.
when the first audio data and/or the first image data includes audio data and/or image data from a first target application (“sending the to-be-displayed video to the electronic device and/or display component, so that the electronic device and/or the display component displays the video to be displayed” Jiao (Page 3, ¶14));
or when the first audio data and/or the first image data includes the audio data and/or the image data from the first target application, transmitting the target data to be outputted to a third target application identical to the first target application, the first target application and the third target application being ran on different second electronic devices;
or in response to obtaining sharing request from the first target application, transmitting the target data to be outputted to a fourth target application corresponding to the sharing object, the sharing request including a sharing object of the target data to be outputted, the fourth target application and the first target application being the same or different applications running on different second electronic devices.
Jiao in view of Baldwin and Tanaka does not expressly teach the entirety of alternative 1: “transmitting the target data to be outputted to the second target application different from the first target application, the first target application and the second target application being ran on different second electronic devices.”
However, Hu discloses the remainder of alternative 1: transmitting the target data to be outputted to the second target application different from the first target application, the first target application and the second target application being ran on different second electronic devices (“taking the network easy cloud conference as an example, the service end can run the transfer server, receiving remote control instruction sent by the external computer (i.e., remote control terminal), opening the network easy cloud conference software application, and login by the conference account of the network easy cloud conference, then initiating a remote conference; the participants of other remote conference are added in, or an existing remote conference is added. the transfer server of the service end also can open the remote conference login page through the browser, login by the conference account number, then initiating a remote conference, the participants of other remote conference are added in, or adding an existing remote conference.” Hu (Pages 6-7, ¶10)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jiao in view of Baldwin and Tanaka, with Hu. The motivation for doing so is to allow multiple devices to view the target data in the target application across multiple electronic devices (Hu, Page 2, ¶1-3).
Regarding Claim 18, Jiao in view of Baldwin and Tanaka teaches
The electronic device of claim 12, wherein the processing device is further configured to perform at least one of the alternatives below. Examiner chooses to reject alternative 1.
when the first audio data and/or the first image data includes audio data and/or image data from a first target application (“sending the to-be-displayed video to the electronic device and/or display component, so that the electronic device and/or the display component displays the video to be displayed” Jiao (Page 3, ¶14));
or when the first audio data and/or the first image data includes the audio data and/or the image data from the first target application, transmit the target data to be outputted to a third target application identical to the first target application, the first target application and the third target application being ran on different second electronic devices;
or in response to obtaining sharing request from the first target application, the sharing request including a sharing object of the target data to be outputted, transmit the target data to be outputted to a fourth target application corresponding to the sharing object, the fourth target application and the first target application being the same or different applications running on different second electronic devices.
Jiao in view of Baldwin and Tanaka does not expressly teach the entirety of alternative 1: “transmit the target data to be outputted to the second target application different from the first target application, the first target application and the second target application being ran on different second electronic devices.”
However, Hu discloses the remainder of alternative 1: transmit the target data to be outputted to the second target application different from the first target application, the first target application and the second target application being ran on different second electronic devices (“taking the network easy cloud conference as an example, the service end can run the transfer server, receiving remote control instruction sent by the external computer (i.e., remote control terminal), opening the network easy cloud conference software application, and login by the conference account of the network easy cloud conference, then initiating a remote conference; the participants of other remote conference are added in, or an existing remote conference is added. the transfer server of the service end also can open the remote conference login page through the browser, login by the conference account number, then initiating a remote conference, the participants of other remote conference are added in, or adding an existing remote conference.” Hu (Pages 6-7, ¶10)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jiao in view of Baldwin and Tanaka, with Hu. The motivation for doing so is to allow multiple devices to view the target data in the target application across multiple electronic devices (Hu, Page 2, ¶1-3).
Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Jiao (CN112887654 A) in view of Baldwin et al. (US 20140040364 A1, hereinafter Baldwin), Tanaka et al. (US 20180322896 A1, hereinafter Tanaka), Feng (CN 113573161 A), and further in view of Witriol et al. (US 12294919 B2, hereinafter Witriol).
Regarding Claim 20, Jiao in view of Baldwin, Tanaka, and Feng teaches the electronic device of claim 16.
Jiao in view of Baldwin, Tanaka, and Feng does not expressly teach “wherein: the use mode of the processing device includes at least one of the alternatives below.” Examiner chooses to reject alternative 2.
“a whiteboard mode,
a speech mode,
a comparison mode,
or a display mode.”
However, Witriol does teach wherein: the use mode of the processing device includes at least a speech mode (“a speech mode” Witriol [32]; Fig. 3, 303).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jiao in view of Baldwin, Tanaka, and Feng, with Witriol. Videoconferences may have different use cases, e.g., a conference meeting with twenty participants or a personal one-on-one, which could necessitate a variety of use modes to better improve the user experience. Witriol acknowledges that there is a motivation to do so through including a variety of use modes, thereby improving the user experience (Witriol [32]).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Refer to PTO-892, Notice of References Cited for a listing of analogous art.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CARISSA A JONES whose telephone number is (703)756-1677. The examiner can normally be reached via telework, M-F, 6:30 AM - 4:00 PM CT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Duc Nguyen, can be reached at 571-272-7503. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CARISSA A JONES/Examiner, Art Unit 2691
/DUC NGUYEN/Supervisory Patent Examiner, Art Unit 2691