Prosecution Insights
Last updated: April 19, 2026
Application No. 18/606,683

MULTIMEDIA DATA PROCESSING METHOD, DEVICE AND ELECTRONIC DEVICE

Non-Final Office Action with rejections under §101, §102, §103, and §112

Filed: Mar 15, 2024
Examiner: BROUGHTON, KATHLEEN M
Art Unit: 2661
Tech Center: 2600 — Communications
Assignee: Lenovo (Beijing) Limited
OA Round: 1 (Non-Final)
Grant Probability: 83% (Favorable)
Expected OA Rounds: 1-2
Estimated Time to Grant: 2y 7m
Grant Probability with Interview: 92%

Examiner Intelligence

Career Allow Rate: 83% (219 granted / 263 resolved), +21.3% vs Tech Center average (above average)
Interview Lift: +8.3% (moderate) among resolved cases with an interview
Typical Timeline: 2y 7m average prosecution; 34 applications currently pending
Career History: 297 total applications across all art units
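As a sanity check, the headline examiner figures above are mutually consistent; a small sketch of the arithmetic (the 0.213 offset is the reported +21.3% delta vs the Tech Center average):

```python
# Cross-check of the examiner statistics reported above.
granted, resolved = 219, 263

career_allow_rate = granted / resolved
print(f"Career allow rate: {career_allow_rate:.1%}")   # 83.3%, shown as 83%

# A +21.3% offset vs the Tech Center average implies a TC-average
# allow rate of roughly 62% for this examiner's peer group.
tc_avg = career_allow_rate - 0.213
print(f"Implied TC average: {tc_avg:.1%}")
```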

Statute-Specific Performance

§101: 10.9% (-29.1% vs TC avg)
§102: 24.1% (-15.9% vs TC avg)
§103: 51.2% (+11.2% vs TC avg)
§112: 11.4% (-28.6% vs TC avg)

Tech Center averages are estimates; figures are based on career data from 263 resolved cases.
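Each per-statute rate and its reported offset together imply an estimated Tech Center average for that statute; a small sketch of that arithmetic (notably, every implied average works out to 40.0%, suggesting the offsets are measured against a single pooled baseline):

```python
# Implied Tech Center averages, derived from each statute-specific
# rate and its reported offset ("vs TC avg"), both in percent.
rates  = {"101": 10.9, "102": 24.1, "103": 51.2, "112": 11.4}
deltas = {"101": -29.1, "102": -15.9, "103": +11.2, "112": -28.6}

implied = {s: rates[s] - deltas[s] for s in rates}
for statute, avg in implied.items():
    print(f"§{statute}: examiner {rates[statute]:.1f}%, implied TC avg {avg:.1f}%")
```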

Office Action

DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority

Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C.
112, sixth paragraph:

(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;

(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and

(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.

Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.

Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitations are:

Claim 20: The electronic device is described to be executed by a processor stored on a computer (¶ [0074], [0080]). The following elements are claimed:

“a slice creator” (element 401, Figure 4, ¶ [0075]-[0078]), based on S101 methodology (Fig 1, 2 and at least ¶ [0019]-[0021], [0033]-[0052], [0062]-[0063]).

“a combiner” (element 402, Figure 4, ¶ [0075]-[0078]), based on S102 methodology (Fig 1, 2 and at least ¶ [0022], [0053]-[0056], [0062]-[0063]).

“an encoder” (element 403, Figure 4, ¶ [0075]-[0078]), based on S103 methodology (Fig 1, 2 and at least ¶ [0023]-[0024], [0062]-[0063]).

Under 35 U.S.C. § 112(f), the broadest reasonable interpretation of the claims each incorporate particular detailed computer processing operations that are considered an improvement upon existing technological processes and therefore are statutorily eligible. See Enfish, LLC v. Microsoft Corp., 822 F.3d 1327, 1336-37, 118 USPQ2d 1684, 1689-90 (Fed. Cir. 2016) and MPEP § 2106(II).

Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, each is being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1, 4, 6, 7, 10-13, 15-18 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Claim 1 recites a multimedia data processing method (the mind can process multiple types of sensory information), comprising: performing a first processing on raw multimedia data based on target requirement information to obtain at least one to-be-processed multimedia data (the mind can process information captured through sensory organs, such as the eyes (plural sensors and data continuously input from different viewpoints regarding a given target of the environment)), wherein a data volume of each to-be-processed multimedia data is smaller than a data volume of the raw multimedia data (each type of sensory input data (visual versus audio, for example) is a segment of the entire input of the sensory input data (visual with audio)); performing a second processing on a corresponding to-be-processed multimedia data by using at least one processing algorithm to obtain a processing result for each to-be-processed multimedia data (the mind can process information captured through a second sensory organ, such as the ears (plural sensors and data continuously input from different input points)), wherein the at least one processing algorithm is related to the target requirement information (sound input is associated with the visual input related to a given target of the environment); and performing a target processing on the raw multimedia data based on the processing result for each to-be-processed multimedia data to obtain target multimedia data (the mind can process both visual and audio input from the multiple sensors per multimedia input to detect a given target within the environment).
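Read without the mental-process annotations, claim 1 recites a slice, analyze, apply-back pipeline. A minimal sketch of that structure, assuming hypothetical frame and algorithm representations (the tag-based selection and `label` field are illustrative, not taken from the application):

```python
# Illustrative sketch of the three steps recited in claim 1; the frame
# representation and algorithm table are assumptions for illustration.
def process_multimedia(raw, requirement, algorithms):
    # First processing: derive a to-be-processed subset whose data
    # volume is smaller than the raw multimedia data.
    subset = [f for f in raw if requirement in f["tags"]]

    # Second processing: run a processing algorithm related to the
    # target requirement information on the subset.
    params = algorithms[requirement](subset)

    # Target processing: apply the processing result back onto the
    # raw multimedia data to obtain the target multimedia data.
    return [{**f, "label": params.get(f["id"])} for f in raw]

algorithms = {"face": lambda frames: {f["id"]: f"face@{f['id']}" for f in frames}}
raw = [{"id": 0, "tags": ["face"]},
       {"id": 1, "tags": []},
       {"id": 2, "tags": ["face"]}]
out = process_multimedia(raw, "face", algorithms)
# Frames 0 and 2 are labeled from the processing result; frame 1 is not.
```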
Claim 4 recites the method of claim 1 (as described above), wherein performing the first processing on the raw multimedia data based on the target requirement information (mentally processing visual information input from the eyes to detect a given target within the environment) comprises: identifying at least one target data frame set from the raw multimedia data based on the target requirement information (the mind can process the visual data to identify a given target within the environment), and take the at least one target data frame set as the at least one to-be-processed multimedia data (the mind can identify a given initial location for the target), wherein there is at least a temporal correlation between data frames within each of the at least one target data frame set (the mind can determine changes of position of the given target from an initial input time and a later time of viewing the environment).

Claim 6 recites the method of claim 1 (as described above), wherein performing the first processing on the raw multimedia data based on the target requirement information (mentally processing visual information input from the eyes to detect a given target within the environment) comprises: performing a target content sampling on the raw multimedia data based on the target requirement information, to obtain at least one target data frame set (the mind can process a fraction of the entire visual environment to determine a region of interest to evaluate a target), and taking the at least one target data frame set as the at least one to-be-processed multimedia data (a segment of time can be mentally analyzed to determine a representation of the target in the environment), wherein there is at least a content correlation between data frames within each of the at least one target data frame set (the visual input is taken continuously and therefore a content correlation exists temporally).
Claim 7 recites the method of claim 1 (as described above), further comprising obtaining the target requirement information (the input sensory information of the environment is used to obtain a given intended datapoint regarding the target).

Claim 10 recites the method of claim 7 (as described above), wherein obtaining the target requirement information (the input sensory information of the environment is used to obtain a given intended datapoint regarding the target) comprises: obtaining the target requirement information based on application scenario information of the target multimedia data (the mind can interpret the input sensory information of the target in the environment based on the intended purpose of understanding the target in the environment).

Claim 11 recites the method of claim 7 (as described above), wherein obtaining the target requirement information (the input sensory information of the environment is used to obtain a given intended datapoint regarding the target) comprises: obtaining the target requirement information based on configuration information of a receiving terminal of the target multimedia data (the given sensory information of the environment is input based on the position of the person intaking the sensory environment allowing for the mind to process information regarding a target).
Claim 12 recites the method of claim 1 (as described above), wherein performing the second processing on a corresponding to-be-processed multimedia data by using the at least one processing algorithm (the mind can process information captured through a second sensory organ, such as the ears (plural sensors and data continuously input from different input points)) comprises: determining a processing algorithm corresponding to each to-be-processed multimedia data based on the target requirement information (an audio analysis may be performed in the mind to relate the sound to the associated visual input related to a given target of the environment); performing a parallel processing on each to-be-processed multimedia data based on the corresponding processing algorithm to obtain at least one processing result (the mind can simultaneously process sensory information pertaining to both the visual and audio sensory inputs); and processing the at least one processing result into processing parameters for the raw multimedia data (the mind can decipher audio input information corresponding to the visual information to determine audio associations of a given target).

Claim 13 recites the method of claim 12 (as described above), wherein performing the target processing of the raw multimedia data based on the processing result for each to-be-processed multimedia data (the mind can decipher audio input information corresponding to the visual information to determine audio associations of a given target) comprises: performing the target processing on the raw multimedia data based on the processing parameters corresponding to each processing result to obtain the target multimedia data (the mind processes the sensory information related to the audio of the target based on a given input received).
Claim 15 recites the method of claim 1 (as described above), further comprising: updating the at least one processing algorithm based on changing information of the target requirement information (the mind continuously changes information processing based on sensory input, including how to process information of an environment of a target based on changing conditions of the environment).

Claim 16 recites the method of claim 1 (as described above), further comprising: executing a corresponding first processing by at least one electronic device in a processing system (the mind can process additional visual sensory information using different neurons other than those used for the initial visual processing of the target), the processing system including a plurality of electronic devices capable of executing the multimedia data processing method (the mind contains a plurality of regions processing information simultaneously and the processing can occur simultaneously and in coordination).

Claim 17 recites the method of claim 1 (as described above), further comprising: executing a corresponding second processing using a corresponding processing algorithm by at least one electronic device in a processing system (the mind can process additional sensory information using different neurons other than those used for the initial audio processing of the target), the processing system including a plurality of electronic devices capable of executing the multimedia data processing method (the mind contains a plurality of regions processing information simultaneously and the processing can occur simultaneously and in coordination).
Claim 18 recites a multimedia data processing device (the mind is a device capable of processing multiple types of sensory input), comprising: a processor (the mind processes sensory information of the environment); and a memory coupled to the processor (the mind can store information and process the information), the memory storing instructions that, when executed by the processor, cause the processor (the mind can process information representing the environment input from the sensory organs, including interpretation of the environment associated with a given target) to: perform steps identical to claim 1 (as discussed above).

The limitations of processing multimedia data are processes that, under their broadest reasonable interpretation, cover performance of the limitation in the mind but for the recitation of generic computer components. That is, regarding the method or electronic device, other than reciting generic placeholder-related computer components, such as a processor and memory, nothing in the claim elements precludes the steps from practically being performed in the mind. For example, in claim 1, the language of “performing” in the context of each step is broadly described for processing data using a given generically claimed processing algorithm related to the data information from the environment. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, these claims each recite an abstract idea.

This judicial exception is not integrated into a practical application. In particular, the above identified method and device claims do not recite any elements which could not be performed in the mind and the claims only recite generic placeholder-related computer components, including a memory and processor.
The computer components are recited at a high level of generality (i.e., generic machines for performing general processing) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, the computer components do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Therefore, the aforementioned claims are directed to abstract ideas.

The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of generic placeholder-related computer components, the memory and processor, used to analyze multimedia data, amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claims are not patent eligible.

To note regarding Claim 20: under 35 U.S.C. § 112(f), the broadest reasonable interpretation of the claim incorporates particular detailed computer processing operations that are considered an improvement upon existing technological processes and therefore statutorily eligible. See Enfish, LLC v. Microsoft Corp., 822 F.3d 1327, 1336-37, 118 USPQ2d 1684, 1689-90 (Fed. Cir. 2016) and MPEP § 2106(II).

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1, 7-15, 18 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Wu et al (CN 112073649 with application CN 2020/10924421.9 cited by US 2023/0144094 with foreign priority; US 2023/0144094 used as translation in the rejections below).

Regarding Claim 1, Wu et al teach a multimedia data processing method (method for processing multimedia data for a first client; Fig 1 and ¶ [0045]), comprising: performing a first processing on raw multimedia data based on target requirement information (“target requirement information” described as information identification requirements, labeling requirements, among other processing requirements, specification ¶ [0019]) to obtain at least one to-be-processed multimedia data (the multimedia data is obtained S101 and is first processed with an editing process by the user inputs, S102, to be labeled, S103, with the goal of generating a template from the labeled multimedia data based on a template instruction (target requirement information); Fig 1 and ¶ [0046]-[0052]), wherein a data volume of each to-be-processed multimedia data is smaller than a data volume of the raw multimedia data (the amount of editing material data is one type of data (video, text, audio, picture) and thereby smaller than the entire multimedia data; ¶ [0050]-[0051]); performing a second processing on a corresponding to-be-processed multimedia data by using at least one processing algorithm to obtain a processing result for each to-be-processed multimedia data (a template generation process can be performed on the labeled material data by
removing the label to result in generating a template based on the labeled material data from the multimedia data, S104, and editing information may include an adapted resolution (corresponding to-be-processed multimedia data) to generate the template; Fig 1 and ¶ [0055]-[0058], [0073]-[0074]), wherein the at least one processing algorithm is related to the target requirement information (the template generation instruction is performed on the labeled material data from the multimedia data; ¶ [0055]); and performing a target processing on the raw multimedia data based on the processing result for each to-be-processed multimedia data to obtain target multimedia data (the processing multimedia data can be performed multiple times to generate multiple templates; Fig 1 and ¶ [0058]).

Regarding Claim 7, Wu et al teach the method of claim 1 (as described above), further comprising obtaining the target requirement information (the template instruction is generated based on the labeled multimedia data from user edits; ¶ [0048]).

Regarding Claim 8, Wu et al teach the method of claim 7 (as described above), wherein obtaining the target requirement information comprises: obtaining the target requirement information based on information input by a target user acting on an input component of an electronic device (the template instruction is generated based on the labeled multimedia data from user edits on an electronic device supporting user interface, such as a personal computer; ¶ [0045], [0048]).

Regarding Claim 9, Wu et al teach the method of claim 7 (as described above), wherein obtaining the target requirement information comprises: obtaining the target requirement information based on interaction data between an electronic device and a receiving terminal of the target multimedia data (the user uses an electronic device to log in to a web client or application client (a receiving terminal) to process multimedia data; ¶ [0045]).
Regarding Claim 10, Wu et al teach the method of claim 7 (as described above), wherein obtaining the target requirement information comprises: obtaining the target requirement information based on application scenario information of the target multimedia data (the user will label the displayed material data based on the type of data (video data, text information data, audio data, picture data) and the intended template to generate for a particular purpose determined by the user; ¶ [0047], [0057]-[0058]).

Regarding Claim 11, Wu et al teach the method of claim 7 (as described above), wherein obtaining the target requirement information comprises: obtaining the target requirement information based on configuration information of a receiving terminal of the target multimedia data (the receiving template generation instructions indicates the user adjustments for the multimedia data for the desired template generation, which is input by the user to the terminal of the user electronic device at an interaction layer; ¶ [0045], [0048], [0056]).
Regarding Claim 12, Wu et al teach the method of claim 1 (as described above), including performing the second processing on a corresponding to-be-processed multimedia data by using the at least one processing algorithm (a template generation process (second processing) can be performed on the labeled material data with adapted resolution (corresponding to-be-processed multimedia data) by performing at least one function control in the editing to generate the template; Fig 1 and ¶ [0048]-[0050], [0055]-[0063], [0073]-[0074]) comprises: determining a processing algorithm corresponding to each to-be-processed multimedia data based on the target requirement information (the template instruction (target requirement information) is based on the editing instruction from the user including the integration of given function controls (processing algorithms) to generate the labeled material data; Fig 1, 3 and ¶ [0048], [0052]-[0056], [0074]-[0075]); performing a parallel processing on each to-be-processed multimedia data based on the corresponding processing algorithm to obtain at least one processing result (the multiple function controls (parallel processing functions with corresponding processing algorithm) to achieve the given intended template; Fig 1-3 and ¶ [0062]-[0065], [0073]-[0075], [0213]); and processing the at least one processing result into processing parameters for the raw multimedia data (a given template is generated and may be used for processing additional (raw) material data; Fig 1 and ¶ [0057]-[0058]). 
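Setting aside the prior-art mapping, the three-step structure of claim 12 (determine an algorithm per to-be-processed data item, process in parallel, convert results into parameters) can be sketched as follows; the table keys, subset shape, and use of a thread pool are illustrative assumptions, not the application's implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def second_processing(subsets, requirement, algorithm_table):
    # Determine a processing algorithm corresponding to each
    # to-be-processed multimedia data based on the target
    # requirement information.
    chosen = [algorithm_table[(requirement, s["kind"])] for s in subsets]

    # Perform a parallel processing on each subset with its algorithm
    # to obtain at least one processing result.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda pair: pair[0](pair[1]["frames"]),
                                zip(chosen, subsets)))

    # Process the results into processing parameters keyed for
    # later application to the raw multimedia data.
    return {s["kind"]: r for s, r in zip(subsets, results)}

# Hypothetical usage: one algorithm per (requirement, data-kind) pair.
table = {("detect", "video"): lambda fr: {"boxes": len(fr)},
         ("detect", "audio"): lambda fr: {"peaks": len(fr)}}
subsets = [{"kind": "video", "frames": [1, 2, 3]},
           {"kind": "audio", "frames": [1, 2]}]
params = second_processing(subsets, "detect", table)
```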
Regarding Claim 13, Wu et al teach the method of claim 12 (as described above), wherein performing the target processing of the raw multimedia data based on the processing result for each to-be-processed multimedia data comprises: performing the target processing on the raw multimedia data based on the processing parameters corresponding to each processing result to obtain the target multimedia data (the processing multimedia data can be performed multiple times to generate multiple templates, with changes made to the editing functions, thereby enhancing the templates and the conversion rate of templates; Fig 1 and ¶ [0058]-[0063]).

Regarding Claim 14, Wu et al teach the method of claim 13 (as described above), wherein the target processing includes at least one of the following: cropping, content replacement, annotation, scaling, parameter adjustment, special effects processing, encoding, or rendering (editing includes a labeling operation, S103; Fig 1 and ¶ [0052]).

Regarding Claim 15, Wu et al teach the method of claim 1 (as described above), further comprising updating the at least one processing algorithm based on changing information of the target requirement information (the function controls (processing algorithms) can be edited (updated) based on the template instruction (target requirement information) from the user edits; Fig 1, 3 and ¶ [0048], [0052]-[0057], [0074]-[0075]).

Regarding Claim 18, Wu et al teach a multimedia data processing device (electronic device 1900; Fig 19 and ¶ [0205]), comprising: a processor (processor 1901; Fig 19 and ¶ [0205]); and a memory coupled to the processor (memory 1902, 1903 coupled to processor 1901; Fig 19 and ¶ [0205]), the memory storing instructions that, when executed by the processor, cause the processor (memory storing program and executed by processor; Fig 19 and ¶ [0205]) to: perform steps identical to claim 1 (as described above).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C.
103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2-3, 19 are rejected under 35 U.S.C. 103 as being unpatentable over Wu et al (CN 112073649 with application CN 2020/10924421.9 cited by US 2023/0144094 with foreign priority; US 2023/0144094 used as translation in the rejections below) in view of Gao et al (US 2020/0302179).

Regarding Claim 2, Wu et al teach the method of claim 1 (as described above). Wu et al does not teach further comprising at least one of following: obtaining first timestamp information of the raw multimedia data, and using the first timestamp information to align each to-be-processed multimedia data, the processing result for each to-be-processed multimedia data, and a corresponding data frame of the raw multimedia data; or determining an availability of the processing result for each to-be-processed multimedia data based on at least second timestamp information, and determining, based on the availability of the processing result for each to-be-processed multimedia data, whether or not to perform the target processing on the raw multimedia data based on the processing result.
Gao et al is analogous art pertinent to the technological problem addressed in this application and teaches at least one of following: obtaining first timestamp information of the raw multimedia data (the obtained multimedia file includes frames and corresponding audio, each image and audio frame labeled with a timestamp, S203; Fig 2 and ¶ [0079]-[0082]), and using the first timestamp information to align each to-be-processed multimedia data, the processing result for each to-be-processed multimedia data, and a corresponding data frame of the raw multimedia data (the timestamp of a given data frame corresponds between each data of the multimedia data (thereby align) and for the given target data for multiple consecutive data frames resulting in a performance segment, S204, S205; Fig 2 and ¶ [0083]-[0088]); or determining an availability of the processing result for each to-be-processed multimedia data based on at least second timestamp information, and determining, based on the availability of the processing result for each to-be-processed multimedia data, whether or not to perform the target processing on the raw multimedia data based on the processing result (examiner note: this claim limitation is not required based on the “at least one of the following”…”or” language). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the current application to combine the teachings of Wu et al with Gao et al including at least one of following: obtaining first timestamp information of the raw multimedia data, and using the first timestamp information to align each to-be-processed multimedia data, the processing result for each to-be-processed multimedia data, and a corresponding data frame of the raw multimedia data; or determining an availability of the processing result for each to-be-processed multimedia data based on at least second timestamp information, and determining, based on the availability of the processing result for each to-be-processed multimedia data, whether or not to perform the target processing on the raw multimedia data based on the processing result. By using timestamp data to identify the frame data from multimedia file data, the different types of data (video image frame and associated audio) is labeled and temporally identified for each data, thereby increasing precision and efficiency resulting in reduced granularity, as recognized by Gao et al (¶ [0088]). 
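The timestamp scheme at issue in claims 2 and 3 (align data by a first timestamp, then keep a frame's result only while its processing time, second timestamp minus first, stays within a threshold) reduces to a per-frame comparison. A minimal sketch with assumed field names and threshold value:

```python
def usable_results(frames, threshold):
    """Keep a frame's processing result only when its processing time
    (second timestamp minus first timestamp) does not exceed the
    threshold; otherwise the result is discarded (claim 3's logic).
    Field names are illustrative assumptions."""
    kept, discarded = [], []
    for f in frames:
        processing_time = f["t_result"] - f["t_raw"]  # second - first timestamp
        (kept if processing_time <= threshold else discarded).append(f["result"])
    return kept, discarded

frames = [
    {"t_raw": 0.0, "t_result": 0.04, "result": "A"},  # 40 ms: within threshold
    {"t_raw": 0.1, "t_result": 0.35, "result": "B"},  # 250 ms: too slow, discard
]
kept, dropped = usable_results(frames, threshold=0.1)
```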
Regarding Claim 3, Wu et al teach the method of claim 2 (as described above), wherein determining the availability of the processing result based on at least the second timestamp information (examiner note: this claim limitation is not required based on the “at least one of the following”…“or”…“second timestamp” language in claim 2, and the following limitations depend on a limitation that is not required in the claim from which this claim depends) comprises: marking each frame of the to-be-processed multimedia data using the first timestamp information of the raw multimedia data; marking a processing result of each frame of the to-be-processed multimedia data using the second timestamp information; if it is determined, based on the second timestamp information and the first timestamp information, that a processing time length of a first multimedia data frame is not greater than a first threshold, performing the target processing on the raw multimedia data based on a processing result of the first multimedia data frame; and if it is determined, based on the second timestamp information and the first timestamp information, that the processing time length of the first multimedia data frame is greater than the first threshold, discarding the processing result of the first multimedia data frame (these limitations all depend on a claim limitation not required in the claim from which this claim depends).

Regarding Claim 19, Wu et al teach the multimedia data processing device according to claim 18 (as described above), wherein the instructions, when executed by the processor, further cause the processor to perform at least one of the following: steps identical to claim 2 (as described above).

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Wu et al (CN 112073649 with application CN 2020/10924421.9, cited by US 2023/0144094 with foreign priority; US 2023/0144094 used as the translation in the rejections below) in view of Yang et al (US 2022/0239882).
Regarding Claim 4, Wu et al teach the method of claim 1 (as described above), including performing the first processing on the raw multimedia data based on the target requirement information (the multimedia data is obtained S101 and is first processed with an editing process by the user inputs, S102, to be labeled, S103, with the goal of generating a template from the labeled multimedia data based on a template instruction (target requirement information); Fig 1 and ¶ [0046]-[0052]). Wu et al do not teach identifying at least one target data frame set from the raw multimedia data based on the target requirement information, and taking the at least one target data frame set as the at least one to-be-processed multimedia data, wherein there is at least a temporal correlation between data frames within each of the at least one target data frame set.

Yang et al is analogous art pertinent to the technological problem addressed in this application and teaches identifying at least one target data frame set from the raw multimedia data based on the target requirement information (the multimedia data stream includes a video stream (multiple frame set), which can be identified as a series based on a timestamp corresponding to a target timestamp, S110; Fig 1 and ¶ [0038]-[0039]), and taking the at least one target data frame set as the at least one to-be-processed multimedia data (the multimedia data stream based on the timestamp (associated with corresponding speech and text) is determined as the target content, S120; Fig 1 and ¶ [0040]-[0042]), wherein there is at least a temporal correlation between data frames within each of the at least one target data frame set (the video stream of the multimedia data stream corresponds to the timestamp (temporal correlation) and the display text and associated audio within the target content; ¶ [0039], [0042]).
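The timestamp-based identification of a target data frame set described above can be illustrated with a short sketch. The frame structure and the target window bounds are hypothetical assumptions, not drawn from Yang or the claims; the only point shown is that frames selected by a timestamp range are temporally correlated by construction.

```python
# Hypothetical sketch: identify a target data frame set from raw multimedia
# data using timestamp-based target requirement information. Frames whose
# timestamps fall inside the target window form a temporally correlated set.

def identify_target_frame_set(frames, target_start, target_end):
    """Return the frames whose timestamps fall inside the target window;
    their consecutive timestamps give the set its temporal correlation."""
    return [f for f in frames if target_start <= f["ts"] <= target_end]

# Illustrative raw stream: one frame every 40 ms.
frames = [{"ts": t, "data": f"frame-{t}"} for t in range(0, 200, 40)]
target_set = identify_target_frame_set(frames, target_start=40, target_end=120)
```

The resulting `target_set` holds the consecutive frames at timestamps 40, 80, and 120, which would then serve as the to-be-processed multimedia data.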
It would have been obvious to one of ordinary skill in the art before the effective filing date of the current application to combine the teachings of Wu et al with Yang et al including identifying at least one target data frame set from the raw multimedia data based on the target requirement information, and taking the at least one target data frame set as the at least one to-be-processed multimedia data, wherein there is at least a temporal correlation between data frames within each of the at least one target data frame set. By using a timestamp to identify the correspondence within the multimedia data stream, the target page data and the multimedia data are synchronized, thereby improving search efficiency, as recognized by Yang et al (¶ [0004]).

Claims 5, 6 are rejected under 35 U.S.C. 103 as being unpatentable over Wu et al (CN 112073649 with application CN 2020/10924421.9, cited by US 2023/0144094 with foreign priority; US 2023/0144094 used as the translation in the rejections below) in view of Raveendran et al (US 2007/0081587).

Regarding Claim 5, Wu et al teach the method of claim 1 (as described above), including performing the first processing on the raw multimedia data based on the target requirement information (the multimedia data is obtained S101 and is first processed with an editing process by the user inputs, S102, to be labeled, S103, with the goal of generating a template from the labeled multimedia data based on a template instruction (target requirement information); Fig 1 and ¶ [0046]-[0052]).
Wu et al do not teach performing a downsampling processing on target data frames of the raw multimedia data based on the target requirement information, to obtain the at least one to-be-processed multimedia data, wherein the number of target data frames is not greater than a total number of data frames of the raw multimedia data, and a resolution of each target data frame is lower than a resolution of a corresponding data frame in the raw multimedia data.

Raveendran et al is analogous art pertinent to the technological problem addressed in this application and teaches performing a downsampling processing on target data frames of the raw multimedia data based on the target requirement information (raw video data 224 is preprocessed by preprocessor 226, including resizing (down-sampling) the multimedia data based on a given video sequence; Fig 2, 6 and ¶ [0086]-[0087]), to obtain the at least one to-be-processed multimedia data (the multimedia data output from the preprocessor 226 includes resizing; Fig 2, 6 and ¶ [0086]-[0087]), wherein the number of target data frames is not greater than a total number of data frames of the raw multimedia data (each of multiple video sequences is a fraction of the entire raw video data; ¶ [0086]), and a resolution of each target data frame is lower than a resolution of a corresponding data frame in the raw multimedia data (the resizing (spatial resolution down-sampling) yields a lower-resolution output than the video sequence input to the resizing by the preprocessor 226; Fig 2, 6 and ¶ [0086]).
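The downsampling recited for claim 5 combines two reductions: fewer frames than the raw total (temporal subsampling) and a lower resolution per frame (spatial subsampling). The sketch below illustrates both under stated assumptions; the frame fields, stride, and scale factor are hypothetical, not taken from Raveendran or the claims.

```python
# Hypothetical sketch of the claim 5 downsampling: select a subset of target
# frames (count not greater than the raw total), then reduce each selected
# frame's resolution. Field names, stride, and scale are illustrative.

def downsample(frames, keep_every, scale):
    """Temporally subsample frames with a stride, then divide each kept
    frame's width and height by the scale factor."""
    out = []
    for frame in frames[::keep_every]:
        w, h = frame["resolution"]
        out.append({"ts": frame["ts"], "resolution": (w // scale, h // scale)})
    return out

# Illustrative raw stream: four 1920x1080 frames, one every 40 ms.
raw = [{"ts": t, "resolution": (1920, 1080)} for t in range(0, 160, 40)]
low = downsample(raw, keep_every=2, scale=2)
```

With these illustrative parameters, two of the four raw frames are kept and each is reduced to 960x540, satisfying both conditions in the limitation.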
It would have been obvious to one of ordinary skill in the art before the effective filing date of the current application to combine the teachings of Wu et al with Raveendran et al including performing a downsampling processing on target data frames of the raw multimedia data based on the target requirement information, to obtain the at least one to-be-processed multimedia data, wherein the number of target data frames is not greater than a total number of data frames of the raw multimedia data, and a resolution of each target data frame is lower than a resolution of a corresponding data frame in the raw multimedia data. By downsampling the video stream data, the data may be processed in a smaller bit quantity, thereby creating an efficient encoding process with reduced error, leading to improved mobile broadcasting of streaming multimedia information, as recognized by Raveendran et al (¶ [0007]).

Regarding Claim 6, Wu et al teach the method of claim 1 (as described above), including performing the first processing on the raw multimedia data based on the target requirement information (the multimedia data is obtained S101 and is first processed with an editing process by the user inputs, S102, to be labeled, S103, with the goal of generating a template from the labeled multimedia data based on a template instruction (target requirement information); Fig 1 and ¶ [0046]-[0052]). Wu et al do not teach performing a target content sampling on the raw multimedia data based on the target requirement information, to obtain at least one target data frame set, and taking the at least one target data frame set as the at least one to-be-processed multimedia data, wherein there is at least a content correlation between data frames within each of the at least one target data frame set.
Raveendran et al is analogous art pertinent to the technological problem addressed in this application and teaches performing a target content sampling on the raw multimedia data based on the target requirement information, to obtain at least one target data frame set (a given video sequence (target content sampling) of the raw video data 224 is sent from the decoder 214 to the preprocessor 226; ¶ [0086]), and taking the at least one target data frame set as the at least one to-be-processed multimedia data (the given video sequence is processed by the preprocessor 226; Fig 2, 6 and ¶ [0086]), wherein there is at least a content correlation between data frames within each of the at least one target data frame set (the video sequences are converted into a progressive (content correlation) video sequence for further processing, performed by the encoder 228; Fig 2, 6 and ¶ [0086]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the current application to combine the teachings of Wu et al with Raveendran et al including performing a target content sampling on the raw multimedia data based on the target requirement information, to obtain at least one target data frame set, and taking the at least one target data frame set as the at least one to-be-processed multimedia data, wherein there is at least a content correlation between data frames within each of the at least one target data frame set. By using sampling of the video stream data, the data may be processed in a smaller bit quantity, thereby creating an efficient encoding process with reduced error, leading to improved mobile broadcasting of streaming multimedia information, as recognized by Raveendran et al (¶ [0007]).

Claims 16, 17 are rejected under 35 U.S.C. 103 as being unpatentable over Wu et al (CN 112073649 with application CN 2020/10924421.9, cited by US 2023/0144094 with foreign priority; US 2023/0144094 used as the translation in the rejections below) first embodiment (Fig 1) in view of the second embodiment (Fig 16).

Regarding Claim 16, Wu et al teach the method of claim 1 (as described above). The Wu et al method of the first client 100 (Fig 1) does not include, in the same embodiment, the teaching of executing a corresponding first processing by at least one electronic device in a processing system, the processing system including a plurality of electronic devices capable of executing the multimedia data processing method. Wu et al teach a second embodiment for executing a corresponding first processing by at least one electronic device in a processing system (an execution environment for performing the method for processing and generating a template based on multimedia data from second client 300 to perform corresponding processing; Fig 16 and ¶ [0178]), the processing system including a plurality of electronic devices capable of executing the multimedia data processing method (the processing system includes electronic devices 100, 200, 300, and a second user 300 obtains new target multimedia data by performing editing based on the multimedia data (editing via labeling interpreted as first processing) and publishes the new target multimedia data and template, where the first client 100 and second client 300 may be interchanged; Fig 16 and ¶ [0178]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the current application to combine the first embodiment (first client 100) with the second embodiment (includes second client 300) of the teachings of Wu et al including executing a corresponding first processing by at least one electronic device in a processing system, the processing system including a plurality of electronic devices capable of executing the multimedia data processing method.
By using corresponding processing systems and edits, the efficiency of generating target multimedia data is increased and the requirements for single-terminal processing of multimedia data are reduced, thereby improving the efficiency and quantity of target multimedia data editing, as recognized by Wu et al (¶ [0177]).

Regarding Claim 17, Wu et al teach the method of claim 1 (as described above). The Wu et al method of the first client 100 (Fig 1) does not include, in the same embodiment, the teaching of executing a corresponding second processing using a corresponding processing algorithm by at least one electronic device in a processing system, the processing system including a plurality of electronic devices capable of executing the multimedia data processing method. Wu et al teach a second embodiment for executing a corresponding second processing using a corresponding processing algorithm by at least one electronic device in a processing system (an execution environment for performing the method for processing and generating a template based on multimedia data from second client 300 to perform corresponding processing; Fig 16 and ¶ [0178]), the processing system including a plurality of electronic devices capable of executing the multimedia data processing method (the processing system includes electronic devices 100, 200, 300, and a second user 300 obtains new target multimedia data by performing editing based on the multimedia data and publishes the new target multimedia data and template (template publishing interpreted as second processing), where the first client 100 and second client 300 may be interchanged; Fig 16 and ¶ [0178]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the current application to combine the first embodiment (first client 100) with the second embodiment (includes second client 300) of the teachings of Wu et al including executing a corresponding second processing using a corresponding processing algorithm by at least one electronic device in a processing system, the processing system including a plurality of electronic devices capable of executing the multimedia data processing method. By using corresponding processing systems and edits, the efficiency of generating the multimedia data is increased and the requirements for single-terminal processing are reduced, thereby improving the efficiency and quantity of target multimedia data editing, as recognized by Wu et al (¶ [0177]).

Allowable Subject Matter

Claim 20 is allowed based on the claim interpretation under 35 U.S.C. § 112(f) as discussed above.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Matsuda et al (US 2022/0028427) teach a system and method for media editing including analysis of multimedia across multiple clips. Mahapatra et al (US 2018/0130496) teach a method and system for multimedia data compilation and auto-generation of transcript notes based on visual and audio data.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to KATHLEEN M BROUGHTON whose telephone number is (571) 270-7380. The examiner can normally be reached Monday-Friday 8:00-5:00. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, John Villecco, can be reached at (571) 272-7319.
The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /KATHLEEN M BROUGHTON/Primary Examiner, Art Unit 2661

Prosecution Timeline

Mar 15, 2024
Application Filed
Feb 21, 2026
Non-Final Rejection — §101, §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602915
FEATURE FUSION FOR NEAR FIELD AND FAR FIELD IMAGES FOR VEHICLE APPLICATIONS
2y 5m to grant Granted Apr 14, 2026
Patent 12597233
SYSTEM AND METHOD FOR TRAINING A MACHINE LEARNING MODEL
2y 5m to grant Granted Apr 07, 2026
Patent 12586203
IMAGE CUTTING METHOD AND APPARATUS, COMPUTER DEVICE, AND STORAGE MEDIUM
2y 5m to grant Granted Mar 24, 2026
Patent 12567227
METHOD AND SYSTEM FOR UNSUPERVISED DEEP REPRESENTATION LEARNING BASED ON IMAGE TRANSLATION
2y 5m to grant Granted Mar 03, 2026
Patent 12565240
METHOD AND SYSTEM FOR GRAPH NEURAL NETWORK BASED PEDESTRIAN ACTION PREDICTION IN AUTONOMOUS DRIVING SYSTEMS
2y 5m to grant Granted Mar 03, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
83%
Grant Probability
92%
With Interview (+8.3%)
2y 7m
Median Time to Grant
Low
PTA Risk
Based on 263 resolved cases by this examiner. Grant probability derived from career allow rate.
