DETAILED ACTION
This Office action is responsive to the amendments and arguments filed on January 9, 2026. Claims 1, 4, 11-13, and 16 have been amended; claim 2 has been canceled. Claims 1 and 3-20 are pending and have been examined. This action is made FINAL.
Any objections/rejections not mentioned in this Office action have been withdrawn by the Examiner.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority to European Patent Application 21151010. Accordingly, claims 1 and 3-20 have been afforded the benefit of the earlier filing date.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on July 26 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) is invoked. As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
Claim 11 recites, “The computer-implemented method of claim 1, further comprising providing the user with means for switching from playing the master 3D digital asset to playing of the auxiliary digital asset,” and this limitation is therefore being interpreted to cover the corresponding structure described in the specification, paragraph [25]: “The switching means may comprise an interactive hotspot located at a virtual positional location within the master 3D digital asset.”
Response to Arguments
With respect to rejections made under 35 U.S.C. 101, the independent claims as amended include method steps that cannot practically be performed in the human mind; accordingly, the rejections under 35 U.S.C. 101 are withdrawn.
With respect to rejections made under 35 U.S.C. 102, “Applicant has amended independent claim 1 to incorporate the subject matter of dependent claim 2. The Office Action acknowledges (p. 17) that this subject matter is not disclosed by Shih,” (page 10 of Remarks). Accordingly, the rejections under 35 U.S.C. 102 are withdrawn; however, new grounds of rejection are set forth under 35 U.S.C. 103 in view of the Weber reference. Further details are provided below.
With regard to the rejections made under 35 U.S.C. 103, Applicant argues “that Shih and Weber fail to disclose or suggest at least the [switching to playing of the auxiliary digital asset based on the time synchronisation data]” (emphasis original, page 11 of Remarks). Applicant further argues, “Weber’s ‘cue points’ fail to disclose or suggest the ‘time synchronisation data’ of the claim. Thus, switching based on cue points fails to disclose or suggest the claim’s ‘switching… based on time synchronisation data’” (page 12 of Remarks).
Examiner respectfully disagrees. As shown in the Remarks, Weber’s cue points “[indicate] a position in an audio track where… the audio track is suitable for editing,” and Weber’s system may “stop playing a [first] audio track at a cue point and… start playing a [second] audio track at that cue point,” thereby “enabling the dynamic change of audio tracks…” The claims broadly recite a correlation between audio assets (a feature taught by Shih) and “switching to playing… the auxiliary digital asset based on the time synchronisation data,” without distinguishing the composition or function of that time synchronisation data from Weber’s cue points. Both elements mark a point in a timeline at which audio (or any other digital media) may be overlaid or interchanged. Accordingly, the rejections relying on Weber are maintained.
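By way of illustration only (a minimal sketch in Python, not drawn from either reference’s disclosure, with all names hypothetical), both a cue point and a stored time offset reduce to the same mechanism: a position on a shared timeline at which playback is handed from one asset to another.

    def switch_at_marker(master_pos_s, marker_s, offset_s):
        # Whether the marker is labeled a "cue point" (Weber) or derived
        # from "time synchronisation data" (the claims), it is a position
        # on the shared timeline; reaching it triggers the hand-off, and
        # the stored offset maps the master playback position onto the
        # auxiliary asset's own timeline.
        if master_pos_s >= marker_s:
            return master_pos_s - offset_s  # start position within the auxiliary asset
        return None  # not yet at the marker; keep playing the master asset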
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 3-7, 9-10, 16, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication 2017/0262255 to Shih et al. (hereinafter, "Shih") in view of U.S. Patent Application Publication 2017/0069351 to Weber (hereinafter, "Weber").
Regarding claim 1, Shih teaches a computer-implemented method for processing 3D media data, the method comprising: receiving a master 3D digital asset comprising master audio data (paragraph [0004], "According to one embodiment of the present invention, an audio synchronization method comprises: receiving a first audio signal from a first recording device;");
receiving an auxiliary digital asset comprising auxiliary audio data (paragraph [0004], "According to one embodiment of the present invention, an audio synchronization method comprises: receiving a first audio signal from a first recording device; receiving a second audio signal from a second recording device;");
determining a correlation between the master audio data and the auxiliary audio data (paragraph [0004], "According to one embodiment of the present invention, an audio synchronization method comprises: receiving a first audio signal from a first recording device; receiving a second audio signal from a second recording device; performing a correlation operation upon the first audio signal and the second audio signal to align a first pattern of the first audio signal and the first pattern of the second audio signal;");
identifying time synchronisation data of the auxiliary digital asset, relative to the master 3D digital asset, based on the determined correlation (paragraph [0004], "after the first patterns of the first audio signal and the second audio signal are aligned, calculating a difference between a second pattern of the first audio signal and the second pattern of the second audio signal; and obtaining a starting-time difference between the first audio signal and the second audio signal for audio synchronization according to the difference between the second pattern of the first audio signal and the second pattern of the second audio signal.");
storing the time synchronisation data in memory (paragraph [0050], "In detail, the electronic device 1450 comprises a capturing module 1451, a stitching module 1452, an encoder 1453 and a file composing module 1454. The capturing module 1451 is used to receive the audio signals from two or more microphones. The stitching module 1452 is used to calculate the starting-time difference between the microphones to compensate/synchronize the audio signals. The encoder 1453 is used to encode the compensated/synchronized audio signals according to related codec standard. The file composing module 1454 converts the encoded audio signals to a media file having a designated format such as ISO base media file format (ISOBMFF). Finally the files generated by the electronic device 1450 are stored in a storage device or a cloud server."); and

delivering, to a user device, the master 3D digital asset and the auxiliary digital asset; and playing the master 3D digital asset (paragraph [0044], "As shown in FIG. 13, the system comprises an electronic device 1350 and a head-mounted display 1370, where the electronic device 1350 can be any one of the electronic devices 608, 708, 908, 1108 that is capable of receiving the audio signals recorded by the microphones and providing the synchronized audio signals, and the head-mounted display 1370 is used to receive the synchronized audio signals from the electronic device 1350 via network 1360 and to play the synchronized audio signals for the user.").
Shih does not explicitly teach “switching to playing of the auxiliary digital asset based on the time synchronisation data,” and thus, Weber is introduced.
Weber teaches switching to playing of the auxiliary digital asset based on the time synchronisation data (paragraph [0018], "In an embodiment, the separate audio tracks include cue points, a cue point indicating a position in an audio track where, during playing of that audio track, the audio track is suitable for editing. The movie episode data object further comprises playback instructions configured to direct the playback engine to stop playing a separate audio track at a cue point and to start playing a different separate audio track at that cue point.").
Shih and Weber are considered analogous because they are each concerned with aligning and processing audio data. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Shih with the teachings of Weber for the purpose of enhancing user experience. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
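For context, the correlation-based offset estimation of the kind Shih describes may be sketched as follows (an illustrative approximation only, not Shih's implementation, which aligns synchronization sound patterns; the sketch assumes mono signals sampled at a common rate):

    import numpy as np

    def estimate_time_offset(master_audio, aux_audio, sample_rate):
        # Cross-correlate the two signals; the peak of the correlation
        # gives the lag (in samples) at which they best align, i.e. the
        # starting-time difference used as time synchronisation data.
        corr = np.correlate(master_audio, aux_audio, mode="full")
        lag_samples = int(np.argmax(corr)) - (len(aux_audio) - 1)
        # A positive offset means the auxiliary audio begins that many
        # seconds into the master timeline.
        return lag_samples / sample_rate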
Regarding claim 3, Shih teaches the computer-implemented method of claim 1, wherein the step of storing comprises storing the time synchronisation data in a media database in association with the master 3D digital asset and/or the auxiliary digital asset (paragraph [0050], "In detail, the electronic device 1450 comprises a capturing module 1451, a stitching module 1452, an encoder 1453 and a file composing module 1454. The capturing module 1451 is used to receive the audio signals from two or more microphones. The stitching module 1452 is used to calculate the starting-time difference between the microphones to compensate/synchronize the audio signals. The encoder 1453 is used to encode the compensated/synchronized audio signals according to related codec standard. The file composing module 1454 converts the encoded audio signals to a media file having a designated format such as ISO base media file format (ISOBMFF). Finally the files generated by the electronic device 1450 are stored in a storage device or a cloud server.").
Regarding claim 4, Shih teaches the computer-implemented method of claim 3, wherein the step of switching to playing of the auxiliary digital asset based on the time synchronisation data comprises performing a lookup in the media database of the master 3D digital asset and/or the auxiliary digital asset and identifying the time synchronisation data and utilising the time synchronisation data for playback of the auxiliary digital asset (paragraph [0051], "The head-mounted display 1470 comprises a file parsing module 1472, a decoder 1473, a head/eye tracking module 1474 and a rendering module 1475. When the user wants the virtual reality, the file parsing module 1472 is used to receive the media files from the storage device or the cloud server, and to parse the media file to generate the encoded audio signals.").
Regarding claim 5, Shih teaches the computer-implemented method of claim 1, further comprising, prior to determining a correlation: processing the master 3D digital asset to extract the master audio data (paragraph [0024], "After receiving the original recorded audio signals, an auto-correlation operation is performed upon the original recorded audio signals to align the synchronization sound patterns within the original recorded audio signals."); and
separately processing the auxiliary digital asset to extract the auxiliary audio data (paragraph [0030], "After the first synchronization sound patterns “b” and “c” are aligned, a difference between the second synchronization sound pattern “e” of the audio signal recorded by the microphone 420_2 and the second synchronization sound pattern “f” of the audio signal recorded by the microphone 420_1 is calculated,").
Regarding claim 6, Shih teaches the computer-implemented method of claim 5, wherein the step of processing the master 3D digital asset is a pre-processing step performed upon receipt of the master 3D digital asset by a media processing device (paragraph [0024], "After receiving the original recorded audio signals, an auto-correlation operation is performed upon the original recorded audio signals to align the synchronization sound patterns within the original recorded audio signals.").
Regarding claim 7, Shih teaches the computer-implemented method of claim 5, wherein the step of separately processing the auxiliary digital asset is a subsequent processing step performed upon receipt of the auxiliary digital asset by a media processing device (paragraph [0030], "After the first synchronization sound patterns “b” and “c” are aligned, a difference between the second synchronization sound pattern “e” of the audio signal recorded by the microphone 420_2 and the second synchronization sound pattern “f” of the audio signal recorded by the microphone 420_1 is calculated,").
Regarding claim 9, Shih teaches the computer-implemented method of claim 1, wherein: the 3D digital asset comprises 360 degree or 180 degree video and associated master audio data (paragraph [0022], "In addition, the embodiments of the audio synchronization method can be applied to a 360-degree audio/video application as shown in FIG. 1, where a plurality of cameras 110_1 and 110_2 and microphones 120_1 and 120_2 are used to record audio and video signals for use of virtual reality."), and/or the master and auxiliary data both comprise audio from a common source.
Regarding claim 10, Shih teaches the computer-implemented method of claim 1, wherein the auxiliary digital asset comprises at least the auxiliary audio data and optionally 3D video content, such as 360 degree or 180 degree video content, or 2D video content (paragraph [0022], "In addition, the embodiments of the audio synchronization method can be applied to a 360-degree audio/video application as shown in FIG. 1, where a plurality of cameras 110_1 and 110_2 and microphones 120_1 and 120_2 are used to record audio and video signals for use of virtual reality.").
Regarding claim 16, Shih teaches the computer-implemented method of claim 1, wherein identifying the time synchronisation data comprises identifying a time offset of the auxiliary digital asset relative to the master 3D digital asset (paragraph [0004], "after the first patterns of the first audio signal and the second audio signal are aligned, calculating a difference between a second pattern of the first audio signal and the second pattern of the second audio signal; and obtaining a starting-time difference between the first audio signal and the second audio signal for audio synchronization according to the difference between the second pattern of the first audio signal and the second pattern of the second audio signal."),
wherein the master 3D digital asset and auxiliary digital asset are uploaded to a server (paragraph [0037], "FIG. 9 is a system capable of recording audio signals for 360-degree audio application according to one embodiment of the present invention. As shown in FIG. 9, the system comprises four electronic devices 902, 904, 906 and 908... In addition, the electronic device 902, 904 and 906 can be any portable device such as a smart phone or a tablet having a speaker and a microphone, and the electronic device 908 can be a recording device, a remote controller or a cloud server device, but it is not a limitation of the present invention."), and
wherein the server is configured to perform the steps of: determining the correlation, identifying the time offset and storing the time offset in memory automatically, immediately or via a queue upon receipt by the server of the auxiliary digital asset (paragraph [0024], "After receiving the original recorded audio signals, an auto-correlation operation is performed upon the original recorded audio signals to align the synchronization sound patterns within the original recorded audio signals."), and
optionally wherein: the delivery of the master 3D digital asset or auxiliary digital asset to the user device is from the server and occurs in real time with playback of the master 3D digital asset or auxiliary digital asset on the user device (Weber, paragraph [0017], "The movie episode data object further comprises audio tracks which are separate from the audio/video segments and have no video content, and further playback instructions configured to direct the playback engine to retrieve from the movie episode data object one of the separate audio tracks and to play the separate audio track uninterruptedly during a change of audio/video segments at a point in time selected interactively and dynamically by the user, while playing an end of a first audio/video segment, up to the point in time selected by the user, and a start of a second audio/video segment which is selected by the user and played subsequently to the first audio/video segment.").
Regarding claim 18, Shih teaches the computer-implemented method of claim 1, wherein the step of identifying time synchronisation data comprises identifying a time offset of the auxiliary digital asset relative to the master 3D digital asset based on the determined correlation, wherein the time synchronisation data comprises the time offset (paragraph [0004], "obtaining a starting-time difference between the first audio signal and the second audio signal for audio synchronization according to the difference between the second pattern of the first audio signal and the second pattern of the second audio signal."),
optionally wherein the time offset is the difference in start or end time between the master 3D digital asset and the auxiliary digital asset (paragraph [0005], "referring to time points of the first pattern and the second pattern within the first audio signal and time points of the first pattern and the second pattern within the second audio signal to obtain a starting-time difference between the first audio signal and the second audio signal for audio synchronization.").
Regarding claim 19, the combination of Shih and Weber teach a processing device configured to perform the method of claim 1 (paragraph [0006], "According to another embodiment of the present invention, an electronic device comprises a processing circuit. The processing circuit is arranged for receiving a first audio signal and a second audio signal from a first recording device and a second recording device, respectively, and performing a correlation operation upon the first audio signal and the second audio signal to align a first pattern of the first audio signal and the first pattern of the second audio signal; and after the first patterns of the first audio signal and the second audio signal are aligned, calculating a difference between a second pattern of the first audio signal and the second pattern of the second audio signal; and obtaining a starting-time difference between the first audio signal and the second audio signal for audio synchronization according to the difference between the second pattern of the first audio signal and the second pattern of the second audio signal.").
Shih and Weber are considered analogous because they are each concerned with aligning and processing audio data. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Shih with the teachings of Weber for the purpose of enhancing user experience. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Regarding claim 20, Shih teaches a system comprising: the processing device of claim 19; and a server configured to store the master 3D digital asset and auxiliary digital asset (paragraph [0037], "In addition, the electronic device 902, 904 and 906 can be any portable device such as a smart phone or a tablet having a speaker and a microphone, and the electronic device 908 can be a recording device, a remote controller or a cloud server device, but it is not a limitation of the present invention.");
a user device configured to request and obtain the master 3D digital asset and auxiliary digital asset from the server and playback the master 3D digital asset and auxiliary digital asset on a display of the user device, wherein playback of the auxiliary digital asset is commenced based on the time synchronisation data of the auxiliary digital asset relative to the master 3D digital asset when a user indicates to the user device that playback is to be switched from the master 3D digital asset to the auxiliary digital asset (paragraph [0046], "The head-mounted display 1370 comprises a media requesting and receiving module 1371, a file parsing module 1372, a decoder 1373, a head/eye tracking module 1374 and a rendering module 1375.").
Claims 8 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Shih and Weber as applied to claims 1 and 5 above, and further in view of U.S. Patent Application Publication 2019/0294877 to Ayalon (hereinafter, "Ayalon").
Regarding claim 8, the combination of Shih and Weber does not explicitly teach “The computer-implemented method of claim 5, wherein the step of processing the master 3D digital asset further comprises computing a first Fourier transform of successive segments in time of the master audio data to generate sets of master FT data, each segment having a time offset from a known time within the master audio data, wherein the step of separately processing the auxiliary digital asset further comprises computing a second Fourier transform of the auxiliary audio data to generate auxiliary FT data; wherein determining a correlation between the master audio data and auxiliary audio data comprises matching the auxiliary FT data with at least one set of the master FT data by determining a sufficient similarity between the auxiliary FT data and the at least one set of the master FT data, wherein the step of determining the time synchronisation data comprises identifying the time offset of the segment of the master audio data corresponding to the matched at least one set of the master FT data,” and thus, Ayalon is introduced.
Ayalon teaches computing a first Fourier transform of successive segments in time of the master audio data to generate sets of master FT data, each segment having a time offset from a known time within the master audio data (paragraph [0033], "The step 304, includes various sub steps. At first sub step that is at step 3042, a chromatogram of the first audio signal is computed consisting of a plurality of frames. The chromatogram may be generated using any well-known algorithm like short-time Fourier transform (STFT). STFT is a Fourier-related transform used to determine a sinusoidal frequency and phase content of local sections of a signal that changes overtime. STFT splits a longer time signal to shorter segments of equal length. Then the Fourier transform separately on each of the shorter segments. This generates a Fourier spectrum for each of the shorter segments."),

wherein the step of separately processing the auxiliary digital asset further comprises computing a second Fourier transform of the auxiliary audio data to generate auxiliary FT data (paragraph [0033], "At step 304, a spectral analysis of the first audio signal is performed by the audio analyzer 206. It is to be noted that a simultaneous spectral analysis is also performed on a bank of signals stored in the storage 208 of the processor 200 or a post analysis results of each of the signals within the signal bank is stored in the database 210 that may be accessed by the audio analyzer 206 during the identification method 300."),

wherein determining a correlation between the master audio data and auxiliary audio data comprises matching the auxiliary FT data with at least one set of the master FT data by determining a sufficient similarity between the auxiliary FT data and the at least one set of the master FT data (paragraph [0033], "For this method 300, the instruction received may be to identify a matching signal to the first audio signal. At step 304, a spectral analysis of the first audio signal is performed by the audio analyzer 206."), and

wherein the step of determining the time synchronisation data comprises identifying the time offset of the segment of the master audio data corresponding to the matched at least one set of the master FT data (paragraph [0044], "This sync point, defined in digital audio samples, is calculated by taking the beat time on which the similarity score between the first audio signal beat times and the sliding window beat times was the highest, and reducing the number of samples that leads to the first beat on the sliding window beat times.").
Shih, Weber and Ayalon are considered analogous because they are each concerned with aligning and processing audio data. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Shih and Weber with the teachings of Ayalon for the purpose of improving synchronization accuracy. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
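The FT-segment matching recited in claim 8 may be sketched roughly as follows (illustrative only; Ayalon's method operates on chromagram frames and beat times rather than raw magnitude spectra, and the frame parameters here are arbitrary assumptions; both signals are assumed mono, at a common sample rate, and at least one frame long):

    import numpy as np

    def frame_fts(audio, frame_len=4096, hop=2048):
        # One Fourier transform per successive time segment; each row's
        # index times the hop size is that segment's offset in samples.
        n_frames = 1 + (len(audio) - frame_len) // hop
        frames = np.stack([audio[i * hop:i * hop + frame_len] for i in range(n_frames)])
        return np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))

    def match_time_offset(master_audio, aux_audio, sample_rate, hop=2048):
        # Match the first auxiliary FT segment against every master FT
        # segment by cosine similarity; the index of the best-matching
        # segment yields the time offset within the master audio data.
        master_ft = frame_fts(master_audio, hop=hop)
        aux_ft = frame_fts(aux_audio, hop=hop)[0]
        sims = master_ft @ aux_ft / (
            np.linalg.norm(master_ft, axis=1) * np.linalg.norm(aux_ft) + 1e-12)
        return int(np.argmax(sims)) * hop / sample_rate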
Regarding claim 17, Shih teaches the computer-implemented method of claim 1… optionally wherein identifying the time synchronisation data comprises identifying a time offset of the auxiliary digital asset relative to the master 3D digital asset (paragraph [0004], "after the first patterns of the first audio signal and the second audio signal are aligned, calculating a difference between a second pattern of the first audio signal and the second pattern of the second audio signal; and obtaining a starting-time difference between the first audio signal and the second audio signal for audio synchronization according to the difference between the second pattern of the first audio signal and the second pattern of the second audio signal."),
optionally wherein the steps of determining the correlation, identifying the time synchronisation data and storing the time offset in memory are triggered automatically upon completion of the transcoding and the audio extraction (paragraph [0024], "After receiving the original recorded audio signals, an auto-correlation operation is performed upon the original recorded audio signals to align the synchronization sound patterns within the original recorded audio signals.").
The combination of Shih and Weber does not teach the “computer-implemented method of claim 1, further comprising transcoding the master 3D digital asset and auxiliary digital asset, and extracting the master and auxiliary audio data from the transcoded 3D digital asset and auxiliary digital asset respectively”; however, Ayalon teaches transcoding the master 3D digital asset and auxiliary digital asset, and extracting the master and auxiliary audio data from the transcoded 3D digital asset and auxiliary digital asset respectively (paragraph [0033], "At step 304, a spectral analysis of the first audio signal is performed by the audio analyzer 206. It is to be noted that a simultaneous spectral analysis is also performed on a bank of signals stored in the storage 208 of the processor 200 or a post analysis results of each of the signals within the signal bank is stored in the database 210 that may be accessed by the audio analyzer 206 during the identification method 300. The step 304, includes various sub steps. At first sub step that is at step 3042, a chromatogram of the first audio signal is computed consisting of a plurality of frames," and paragraph [0034], "At step 3044, each of the plurality of chromatogram frames is further split into a plurality of pitch classes. FIG. 4 illustrates a sample table 400 wherein each frame is split into 6 pitch classes over 15 STFT. Further at step 3046, each of the pitch class of the frame is analyzed.").
Shih, Weber and Ayalon are considered analogous because they are each concerned with aligning and processing audio data. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Shih with the teachings of Ayalon for the purpose of improving synchronization accuracy. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
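In practice, the transcode-and-extract step of claim 17 takes the following general form (a sketch under the assumption of an ffmpeg-based pipeline, which neither reference specifies; file paths are hypothetical):

    import subprocess

    def extract_audio(asset_path, wav_path):
        # Decode the (3D) video asset and extract its audio track as
        # 48 kHz mono 16-bit PCM, ready for correlation against the
        # audio extracted from another asset.
        subprocess.run(
            ["ffmpeg", "-y", "-i", asset_path,
             "-vn", "-ac", "1", "-ar", "48000", "-acodec", "pcm_s16le", wav_path],
            check=True)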
Claims 11-13 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Shih and Weber as applied to claim 1 above, further in view of U.S. Patent Application Publication 2020/0312029 to Heinen et al. (hereinafter, "Heinen").
Regarding claim 11, the combination of Shih and Weber does not teach “The computer-implemented method of claim 1, further comprising providing the user with means for switching from playing the master 3D digital asset to playing of the auxiliary digital asset,” and thus, Heinen is introduced.
Heinen teaches providing the user with means for switching from playing the master 3D digital asset to playing of the auxiliary digital asset (paragraph [0247], "Additionally or alternatively, the systems and methods here may support audio features and/or experiences including using various audio channels in user displays, headsets and computing devices. FIG. 11a is an illustration that visualizes example audio sources 210104 with a 3D position in a virtual scene 210102.").
Shih, Weber and Heinen are considered analogous because they are each concerned with processing spatial audio data. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Shih and Weber with the teachings of Heinen for the purpose of enhancing user experience. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Regarding claim 12, the combination of Shih and Weber does not teach the “computer-implemented method of claim 11, wherein the step of switching to playing of the auxiliary digital asset based on the time synchronisation data comprises: detecting activation of the switching means at an activation time corresponding to a current playback time of the 3D digital asset,” or “said playing of the auxiliary digital asset is commenced from an auxiliary start time in the auxiliary digital asset which corresponds to said activation time,” however, Heinen teaches a method wherein the step of switching to playing of the auxiliary digital asset based on the time synchronisation data comprises: detecting activation of the switching means at an activation time corresponding to a current playback time of the 3D digital asset (paragraph [0247], "An audio source can either be played all the time, or be triggered by certain events. Such methods can also be used to annotate different elements in a scene with audio annotations which could be triggered if the viewing user later clicks the virtual element the audio annotation is attached to."); and
playing the auxiliary digital asset in response to the detected activation of the switching means, wherein said playing of the auxiliary digital asset is commenced from an auxiliary start time in the auxiliary digital asset which corresponds to said activation time (paragraph [0247], "An audio source can either be played all the time, or be triggered by certain events. Such methods can also be used to annotate different elements in a scene with audio annotations which could be triggered if the viewing user later clicks the virtual element the audio annotation is attached to.").
Shih, Weber and Heinen are considered analogous because they are each concerned with processing spatial audio data. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Shih and Weber with the teachings of Heinen for the purpose of enhancing user experience. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Regarding claim 13, Shih teaches the computer-implemented method of claim 12, wherein identifying the time synchronisation data comprises identifying a time offset of the auxiliary digital asset relative to the master 3D digital asset; further comprising determining the auxiliary start time based on the time offset being applied to the current playback time of the 3D digital asset (paragraph [0004], "after the first patterns of the first audio signal and the second audio signal are aligned, calculating a difference between a second pattern of the first audio signal and the second pattern of the second audio signal; and obtaining a starting-time difference between the first audio signal and the second audio signal for audio synchronization according to the difference between the second pattern of the first audio signal and the second pattern of the second audio signal.") and
optionally wherein the step of determining the auxiliary start time comprises subtracting the time offset from the current playback time of the master digital asset (paragraph [0024], "Then, because the device/microphone specifications and placements are known, the sound delays corresponding to the distance difference “y-x” and “y-z” can be pre-calculated, and the sound delays corresponding to the distance difference “y-x” and “y-z” are applied to compensate the aligned audio signals to generate the compensated audio signals.").
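Numerically, this optional limitation amounts to the following (a trivial sketch; for example, if the master asset has played for 95 seconds and the auxiliary asset begins 30 seconds into the master timeline, auxiliary playback would commence at 65 seconds):

    def auxiliary_start_time(current_playback_s, time_offset_s):
        # Subtract the stored time offset from the current master
        # playback time; clamping at zero guards against activation
        # before the auxiliary asset's earliest corresponding point.
        return max(0.0, current_playback_s - time_offset_s)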
Regarding claim 15, Heinen further teaches the computer-implemented method of claim 12, wherein the switching means comprises an interactive hotspot located at a virtual positional location within the master 3D digital asset (paragraph [0247], "FIG. 11a is an illustration that visualizes example audio sources 210104 with a 3D position in a virtual scene 210102. The virtual scene 210102 can both be viewed and created with a technical device 400106. This method can embed audio sources 210104 in virtual scenes 210102. Example audio sources 210104 can have a 3D position in a scene 210102."). Shih teaches optionally wherein the interactive hotspot is activated via user input to a user device (paragraph [0051], "Finally, the rendering module 1475 receives the decoded audio signals from the decoder 1473 to play the audio signals for the user according to the user's head/eye tracking information generated by the head/eye tracking module 1474.").
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Shih, Weber and Heinen as applied to claim 12 above, and further in view of U.S. Patent 9,223,458 to Qureshi (hereinafter, "Qureshi").
Regarding claim 14, the combination of Shih, Weber and Heinen does not teach “The computer-implemented method of claim 12, wherein the switching means are made available at a playback time in the master 3D digital asset corresponding to the earliest possible playback time of the auxiliary digital asset relative to the playback time of the master 3D digital asset, and are made unavailable at a point in the master 3D digital asset corresponding to the latest point in the auxiliary digital asset,” and thus, Qureshi is introduced.
Qureshi teaches the switching means are made available at a playback time in the master 3D digital asset corresponding to the earliest possible playback time of the auxiliary digital asset relative to the playback time of the master 3D digital asset, and are made unavailable at a point in the master 3D digital asset corresponding to the latest point in the auxiliary digital asset (column 7, lines 12-16, " In another example, presenting the user with the option of switching from one media file to another may only make sense during an initial period of playback of the first media file (e.g., the first minute). After that, it may reasonably be assumed that the user is not interested.").
Shih, Weber, Heinen and Qureshi are considered analogous because they are each concerned with processing audio information. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Shih, Weber and Heinen with the teachings of Qureshi for the purpose of enhancing user experience. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
U.S. Patent 9,445,042 to Gandolph et al.
U.S. Patent 10,277,813 to Thomas et al.
U.S. Patent 11,227,440 to Yang.
U.S. Patent 11,429,340 to Munoz et al.
U.S. Patent Application Publication 2016/0373615 to Chen.
U.S. Patent Application Publication 2018/0358030 to Stefanakis et al.
U.S. Patent Application Publication 2022/0256231 to Eniwumide.
European Patent Specification EP 0694243 to Phillips et al.
International Publication WO 03/094518 to Corby.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEAN T SMITH whose telephone number is (571)272-6643. The examiner can normally be reached Monday - Friday 8:00am - 5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, PIERRE-LOUIS DESIR can be reached at (571) 272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SEAN THOMAS SMITH/Examiner, Art Unit 2659
/PIERRE LOUIS DESIR/Supervisory Patent Examiner, Art Unit 2659