DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1-5, 12 and 14-20 are rejected under 35 U.S.C. 103 as being unpatentable over Oh et al. (published as US 20080192941 A1; hereafter Oh).
Regarding claim 1, Oh discloses a device (e.g., encoder in Fig. 19 or decoder in Fig. 17) for generating a processed signal (e.g., downmix signal forwarded to the decoder in Fig. 19 or rendered signal to be applied to speakers in Fig. 17) using a plurality of audio objects (“Multi-Object Input” in Figs. 17-19), each audio object of the plurality of audio objects comprising an audio object signal and audio object metadata (side info; [0165]-[0168]), the audio object metadata comprising a position of the audio object and a gain parameter of the audio object (for distinct position and volume of each source, [0057], [0140], [0176]; see also “when the downmix includes a plural object, a side information includes information for each object.” in [0170]), the device comprising:
an interface (user interface, [0156], provided for a producer in [0159] and [0175], [0178], [0179], and for the user at the decoder; see “User Control” in Fig. 17, e.g.) for specification of at least one effect parameter of a processing-object group of audio objects on the part of a user (“able to control similar objects in a lump…” in [0162], “grouped object” in [0175], “combined objects” in [0176]; e.g., modifying gain in [0175], modifying position in [0176], or applying a general effect in [0157]), the processing-object group of audio objects comprising two or more audio objects of the plurality of audio objects ([0175], or objects in the entire scene; see also [0057], gain for all channels/objects), and
a processor unit configured to generate the processed signal (encoded bitstream from an encoder is shown in Fig. 14, encoder is shown as 1210 in Fig. 17 or in Fig. 19, or output of decoder 1200B for speaker in Fig. 17) using
the audio object signals (multi-object input for 1310 in encoder of Fig. 19 or for 1210 in Fig. 17; object signals included in downmix for 1200B in Fig. 17),
the positions of the plurality of audio objects comprised by the audio object metadata (side info includes object position, such as instrument position, e.g., [0057], [0140], [0176]),
the gain parameters of the plurality of audio objects comprised by the metadata (side info includes object gain, such as the gain for an instrument; e.g., [0057], [0140]) (see also [0165], [0167]), and
the at least one effect parameter (effect mode info at the encoder, see “4.1.1 Transmitting Effect-Mode Information to Decoder side” in [0158]-[0160], or effect mode info at 1200B at the decoder, see “4.1.2 Generating Effect-Mode Information in Decoder Side” in [0161]; “number of objects” indicates a group of objects; “control similar objects in a lump” in [0162] indicates an effect parameter for similar objects in a lump),
such that, for each audio object of the processing-object group of audio objects (e.g., each drum of a group of drums as discussed in [0176] or “similar objects in a lump” in [0162]), the at least one effect parameter (e.g., a moving-position parameter for the group of drums) specified by means of the interface (at the encoder side in [0157] and [0159] or the decoder side in [0161]-[0162]) is applied to the audio object signal or to the audio object metadata of each of the audio objects of the processing-object group of audio objects (the effect parameter is applied to each drum in the group of drums) (see also [0162], [0176] or [0157], Fig. 1).
Oh fails to explicitly demonstrate that changing a parameter for the group would change the gain parameter of each object in the group and that changing a position of the group would change the position of each object in the group. However, the claimed features are among the many available tools taught in Oh that could be applied to a group of objects based on the user’s preference (e.g., a producer at the encoder side in Fig. 19 or an end user at the decoder side in Fig. 17). That is, the user could keep the objects as they are without any modification, but if the user wants to change gain or position, the user could use the available tools for such changes. Gain control and position control based on user selection are basic tools that allow a user to control how plural objects are mixed ([0005], [0010]). For example, the producer at the encoder side may wish to move a lead singer to the left of the stage, or to increase the gain of the lead singer. The producer could do so with basic gain control or positioning/panning control. The end user at the decoder could also use similar tools available at the decoder side to adjust the final output generated from the speakers. In addition to these basic tools, Oh further teaches other tools, such as a grouped object (“b) Grouped Object”, [0175]) at the encoder side or a similar tool for the user at the decoder side (“the grouping information may be generated in a decoder” in [0180], “to control similar objects in a lump” in [0162]). Oh further teaches the advantage of applying a parameter (e.g., gain X, wherein gain is a type of effect) to a group of objects (e.g., 4 objects), namely that all of the objects in the group can be controlled simultaneously ([0162], [0175]), instead of applying the same parameter (gain X) multiple times (4 times), once for each object in the group. This not only saves time but also spares the user from monotonous, repetitive actions.
Oh further clearly states “It is able to apply one parameter to grouped object for controlling object panning and object gain” in paragraph [0175] and “to control object panning and gain in a lump” in paragraph [0176]. Thus, it would have been obvious to one of ordinary skill in the art to modify Oh by utilizing available tools, such as grouping objects, controlling gain in a lump and controlling panning in a lump, in order to give the user at either the encoder side or the decoder side the flexibility of controlling a sound scene with multiple sound objects using a tool that saves time and mental effort.
Regarding claims 2 and 3, Oh shows that the effect is applied to the objects belonging to the processing-object group ([0176], e.g.); thus, an object that does not belong to the processing-object group does not have the effect applied.
Regarding claim 4, Oh discloses that the at least one effect parameter is applied to the gain parameter of the metadata of each of the objects in the processing-object group (e.g., [0175], [0162]), and not to any object that does not belong to the processing-object group.
Regarding claim 5, Oh discloses that the at least one effect parameter is applied to the position of the metadata of each of the objects in the processing-object group (e.g., [0176], [0162]), and not to any object that does not belong to the processing-object group.
Regarding claim 12, Oh fails to explicitly show that one or more further processing-object groups of audio objects also exist. However, in view of the teaching of Oh as a whole, one skilled in the art would have been able to designate one or more additional processing-object groups, such as a group for violins/string instruments in addition to the group of drums ([0176]) for an orchestra, without undue experimentation. A corresponding effect parameter for the one or more further processing-object groups would also be utilized without undue experimentation in order to allow the user to make fast adjustments applied to all objects within the one or more further processing-object groups, similar to the effect parameter for the first group. Orchestra, drums and violins are mentioned in this Office action for explanatory purposes. One skilled in the art would have recognized that the user can group the objects into one or more groupings as he/she prefers. Thus, it would have been obvious to one of ordinary skill in the art to modify Oh by allowing the user/editor to designate one or more further processing-object groups and their corresponding effect parameters in order to enhance the functionality of the audio device by allowing the user/editor to adjust multiple groups of objects efficiently.
Regarding claims 14 and 17, Oh shows an encoder (e.g., Fig. 19, 1200A in Figs. 17 and 18), a downmix signal (“Downmix of an Audio Signal”) and a metadata signal (“Side Info”). A decoder similar to the one shown in Fig. 17 is inherently coupled to the encoder in Fig. 19.
Regarding claims 15 and 20, Oh shows a decoder (1200B in Fig. 17), a downmix signal (“Downmix of an Audio Signal”) and a metadata signal (“Side Info”). Oh also shows reconstruction (by 1230) and one or more audio output channels (“Multi-Channel Output”). Oh also shows applying the effect parameter ([0161]). Oh also shows an encoder (1200A in Fig. 17).
Regarding claim 16, Oh shows specification of one or more rendering parameters by the user (user control supplied to 1220 in Fig. 17, [0056]-[0060], e.g.).
Claim 18 corresponds to claim 1 discussed above.
Most of the limitations in claim 19 correspond to those specified in claim 1 discussed above. Oh teaches a computer-readable medium and a computer ([0026], [0027], claims 15 and 16), but fails to show a non-transitory digital storage medium. Examiner takes Official Notice that this feature is notoriously well known in the art. Thus, it would have been obvious to one of ordinary skill in the art to modify Oh by utilizing a well-known data storage medium, such as a non-transitory type, in order to save the computer program locally and make it available for controlling the audio signal processing by the computer.
Claims 1-13 and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Scheirer et al. (hereafter Scheirer; “AudioBIFS: Describing Audio Scenes with the MPEG-4 Multimedia Standard”, IDS filed on 10/6/2020).
Regarding claim 1, Scheirer discloses a device for generating a processed signal while using a plurality of audio objects (Fig. 3, right column of p. 242), each audio object of the plurality of audio objects comprising an audio object signal (Fig. 3) and audio object metadata (Fig. 3), the audio object metadata comprising a position of the audio object (left column of p. 242) and a gain parameter of the audio object (sect. D on p. 240, “moving sound should have a Doppler shift; distance sound should be attenuated ...”; Fig. 2; sect. “7)” in the right column of p. 244), the device comprising:
an interface (“the mouse or other input device” under section B of p. 238) for specification of at least one effect parameter of a processing-object group of audio objects on the part of a user (e.g., defining movement to a new position, sect. B, “made to move”), the processing-object group of audio objects comprising two or more audio objects of the plurality of audio objects (sect. B on p. 238, “all subgraphs”; Fig. 1; sect. “9)” on p. 244), and
a processor unit (inherently included to implement Figs. 1 and 3) configured to generate the processed signal (encoded signal to be transmitted for an end user, or a decoded signal by the decoder to be heard by the end user) using
the audio object signals (e.g., AudioClip defined on p. 239),
the positions of the plurality of audio objects comprised by the audio object metadata (e.g., Sound in Table 1 on p. 239),
the gain parameters of the plurality of audio objects comprised by the metadata (e.g., intensity field defined on the second column of p. 239, y-axis in Fig. 2, second column of p. 244 discussing “7) Sound”), and
the at least one effect parameter (e.g., second column of p. 244 discussing “9) Group”; one example of an effect parameter for the group is moving the group to a new position, another example is Transform),
such that the at least one effect parameter specified by means of the interface is applied to the audio object signal or to the audio object metadata of each of the audio objects of the processing-object group of audio objects (right column of p. 245; Fig. 3, sound output to listener).
Scheirer fails to explicitly demonstrate that changing a parameter for the group would change the gain parameter of each object in the group and that changing a position of the group would change the position of each object in the group. However, the claimed features are among the many available tools taught in Scheirer that could be applied to a group of objects based on the user’s preference. That is, the user could keep the objects as they are without any modification, but if the user wants to change gain or position, the user could use the available tools for such changes. Gain control and position control based on user selection are basic tools that allow a user to control how plural objects are mixed (“intensity control” in the first column of p. 240; “location”, “moving sounds” and “angle of radiation” in the second column of p. 240; the functions of “Sound” or “Sound2D” in Table II of p. 242; “position of the Sound node” in the second column of p. 243). For example, the producer at the encoder side may wish to move a lead singer to the left of the stage, or to increase the gain of the lead singer. The producer could do so with basic gain control or positioning/panning control. In addition to these basic tools, Scheirer further teaches other tools, such as a grouped object (“Group” in Table II on p. 242). Scheirer further teaches the advantage of applying a parameter (e.g., positioning X, wherein positioning is a type of effect) to a group of objects (e.g., 4 objects), namely that all of the objects in the group can be controlled simultaneously, instead of applying the same parameter (positioning X) multiple times (4 times), once for each object in the group. This not only saves time but also spares the user from monotonous, repetitive actions. See the specific example provided in the second column of p. 238 (“By transforming the position of the character, all the subgraphs (‘local coordinate spaces’) are automatically transformed as well.”; see also the last full paragraph of the second column of p. 244).
Scheirer further clearly explains that the “Transform” function applied to a group transforms all objects in the group simultaneously (first column of p. 245). The same benefit could also apply to a gain change for a group of objects (e.g., the singing volume of the members of a choir). Thus, it would have been obvious to one of ordinary skill in the art to modify Scheirer by utilizing available tools, such as grouping objects and transform, in order to give the user the flexibility of controlling the gain and position of multiple objects in a group in the sound scene with a tool that saves time and mental effort.
Regarding claims 2 and 3, by assigning related objects to form a Group (Table II), the remaining objects would not be affected by the parameter for the objects in the Group, while the parameter for the objects in the Group would be applied to each and every object in the Group object.
Regarding claim 4, Scheirer discloses object parameters such as gain (Fig. 2; matrix field disclosed in the right column of p. 245).
Regarding claim 5, Scheirer discloses, in sec. “9)” on p. 244, that the objects in a Group can be moved, which affects all members of the group. Furthermore, the group object can be rotated (left column of p. 254), which also affects the positions in the metadata of the audio objects.
Regarding claim 6, the claimed “at least one definition parameter” reads on Group, Group2D, Transform and Transform2D in sec. “9)” on p. 244.
Regarding claims 7-9, Scheirer discloses the modeling of the acoustic environment (e.g., walls, objects, etc.) via geometric regions having different acoustic properties (sects. A and B on p. 246; “geometrical regions”). These acoustic properties are modeled by effects such as frequency-dependent attenuation, for example, as disclosed in sect. “2) AcousticMaterial” on p. 247. The application of this filter to an audio object is thus controlled by whether the object lies within the range of influence of the filter, which is defined by the geometric region. Scheirer also discloses a distance-dependent weighting (right column of p. 247).
Regarding claims 10 and 11, Scheirer fails to explicitly show at least one angle specifying a direction from a defined user position in which there is an area of interest associated with the processing-object group. However, one would have understood that a sound located in front of the user would sound different from the same sound located behind the user. When a sound moves from the front of the user, the user can distinguish the acoustic difference. The same property also applies to a group of sound objects located at approximately the same central location and moved in the same fashion (from one angle to another angle relative to the user). Scheirer teaches the listening position of a virtual listener (p. 245, “10) ListeningPoint”, “The listening point ... facing direction”). Furthermore, the position information of the listening position is provided to the AudioFX node in order to enable processing that is dependent on the position of the virtual listener (p. 245, left column, “The listening-point location is also provided to the AudioFx node so that the SAOL code may provide virtual-listener-location-dependent processing”). Thus, in view of the teaching of Scheirer, it would have been obvious to one of ordinary skill in the art to modify Scheirer by having a parameter defining the orientation of the user in order to accurately process sound from a group of objects relative to the front, back or sides of the user.
Regarding claims 12 and 13, Scheirer fails to explicitly show that one or more further processing-object groups of audio objects also exist. However, in view of the teaching of Scheirer as a whole, one skilled in the art would have been able to designate one or more additional processing-object groups, such as a group for violins/string instruments in addition to a group of drums for an orchestra, without undue experimentation. A corresponding effect parameter for the one or more further processing-object groups would also be utilized without undue experimentation in order to allow the user to make fast adjustments applied to all objects within the one or more further processing-object groups, similar to the effect parameter for the first group. Orchestra, drums and violins are mentioned in this Office action for explanatory purposes. One skilled in the art would have recognized that the user can group the objects into one or more groupings as he/she prefers. Thus, it would have been obvious to one of ordinary skill in the art to modify Scheirer by allowing the user/editor to designate one or more further processing-object groups and their corresponding effect parameters in order to enhance the functionality of the audio device by allowing the user/editor to adjust multiple groups of objects efficiently.
Claims 17 and 18 correspond to claim 1 discussed above. A decoder is disclosed in the left column of p. 236 and is inherently provided for post-production or presentation to the end user (Sect. C on p. 245).
Regarding claim 19, Scheirer teaches each and every limitation with the exception of a non-transitory digital storage medium having a computer program. However, Scheirer teaches software implementing the process, allowing the user to define and modify the audio objects recorded on tracks with a wide variety of available tools (p. 248, sect. VI). A computer program is required to control a CPU in response to the user’s selection through an interface. Examiner takes Official Notice that storing a computer program on a non-transitory digital storage medium is notoriously well known in the art. Thus, it would have been obvious to one of ordinary skill in the art to modify Scheirer by storing a computer program for implementing the tools for sound composition on a non-transitory computer medium in order to provide easy transportation of the computer program.
Claims 6-8, 10 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Oh as applied to claims 1 and 12 above, and further in view of Laaksonen (US 20180109901A1).
Regarding claims 6-8, 10 and 13, Oh fails to explicitly show specifying at least one definition parameter, such as at least one position of an area of interest, a radius of the area, or an angle specifying a direction from a user position. Oh teaches examples of grouping, such as a drum group ([0176]) or instruments associated with a rhythm ([0162]). Since Oh teaches that the user can group any objects together ([0175]), one skilled in the art would have expected that the user could group the objects according to his/her preference. Oh groups the objects based on common characteristics among the objects, such as a group of drums (percussion section) ([0176]). One skilled in the art would have expected that other common characteristics among the objects, based on the user’s definition, could be used without producing any unexpected result. For example, a group could consist of objects (e.g., dancers) located within a circular area having a defined radius in front of the user, or of objects located between 30° and 45° to the left of the user’s front view. By specifying the common characteristic shared by the objects in the group, the user could easily and quickly lump objects into a defined group based on his/her definition. Laaksonen, which teaches an audio object modification device, is cited here as an example. Laaksonen teaches that a user can define objects into a group based upon a classification ([0040]). Thus, it would have been obvious to one of ordinary skill in the art to modify Oh by allowing the user to define a characteristic for grouping two or more objects in order to enable the user to efficiently select objects into a defined group and apply the common effect to all objects within the group as defined by the user.
Claims 9 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Oh and Laaksonen as applied to claims 6-8 and 10 above, and further in view of Scheirer.
Regarding claims 9 and 11, Oh fails to explicitly show a weighting factor depending on a distance or an angular difference. As taught in Oh, the panning/repositioning is based on the effect parameter. In order to perceive the proper sound effect after applying the effect parameter, the dry sound source representing each object is inherently adjusted based on where the object is located relative to the user. For example, with a group of three objects located in front of the user, one behind another, the object closest to the user is expected to have the highest sound level if all three objects receive the same dry source, while the farthest object has the lowest sound level, due to the respective distance between each object in the group and the user. The level of each sound object is thus a function of the distance between the position of that object and a reference point of the group at which the group’s sound level is defined. If the user prefers to apply a gain increment to the group at the reference point of the group, a weighting is applied to each object in the group based on the distance between that object and the reference point of the group. The same logic would apply if the grouping of the objects is based on the orientation of the user. Scheirer teaches an audio device with AudioBIFS that allows the user to define a group of objects (“Group node” under “9)” in the right column of p. 244) and an effect parameter for the group (e.g., being moved, as discussed in the right column of p. 244). Since the objects in the group maintain the same relative positions, the sound from each object in the group would maintain this relationship as the group moves. When a sound moves from the front of the user, the user can distinguish the acoustic difference. The same property also applies to a group of sound objects located at approximately the same central location and moved in the same fashion (from one angle to another angle relative to the user).
Scheirer teaches the listening position of a virtual listener (p. 245, “10) ListeningPoint”, “The listening point ... facing direction”). Furthermore, the position information of the listening position is provided to the AudioFX node in order to enable processing that is dependent on the position of the virtual listener (p. 245, left column, “The listening-point location is also provided to the AudioFx node so that the SAOL code may provide virtual-listener-location-dependent processing”). Thus, in view of the teaching of Scheirer, it would have been obvious to one of ordinary skill in the art to modify Oh in view of Scheirer by having a weighting factor representing the relationship between each object in the group and the reference position/angle of the group, based on the distance or angular difference, in order to accurately process sound from a group of objects relative to the front, back or sides of the user when an effect parameter is applied to the reference position/angle of the group.
Response to Arguments
Applicant's arguments filed 3/26/2026 have been fully considered but they are not persuasive.
Throughout the arguments presented on pp. 13 and 14, applicant relies on the phrases “preprocessing stage” and “decoding” or similar phrases. However, claims 1, 18 and 19 do not recite such features.
On p. 13, applicant argued that Oh fails to disclose or provide a pointer to the claimed arrangement in which a downmix signal is preprocessed using object-based parameters prior to input into a multi-channel decoder. The office disagrees with applicant’s interpretation of the claimed features and of Oh’s disclosure. Claim 1 does not specifically define the claimed device as either an encoder or a decoder. Claims 18 and 19 are similar to claim 1 and thus also fail to explicitly specify the argued features.
On p. 13, applicant argued that the downmix signal in Oh’s device is provided as an input to the decoder without modification. This is not true. Fig. 1 of Oh clearly shows that metadata (“Side Info”) is modified (by 110) before being utilized by a decoder (120). Fig. 2 of Oh clearly shows that metadata (“Side Info”) is modified (by 210) before being utilized by a decoder (230). In view of Figs. 1 and 2 of Oh, the terms decoder and renderer are used interchangeably. Due to the modification of the metadata, object positioning, gain control and spatial synthesis are executed internally within the decoder (120 or 230; [0057], [0106]).
On p. 13, applicant argued that claims 1, 18 and 19 are directed to a particular architecture that includes a downmix processing unit. The office disagrees. Claims 1, 18 and 19 do not explicitly specify a downmix processing unit, a downmix signal, mix information, a processed downmix signal, or a multi-channel decoder. Contrary to applicant’s argument, Oh teaches a preprocessing step (before 120 in Fig. 1 or 230 in Fig. 2) that moves rendering functionality, in particular object gain or panning control ([0057], [0106]), out of the decoder (110 is outside 120, 210 is outside 230) and into a preprocessing stage (preprocessed before 120 or 230) operating directly on the downmix signal (the encoder generates the downmix signal and “Side Info”; see Figs. 14, 15, 17 and 18).
On p. 13, applicant argued that Oh teaches away from altering the downmix signal prior to decoding. The office disagrees. Figs. 1 and 2 of Oh explicitly illustrate that the metadata (“Side Info”) of the original audio signal (“Multi-Object Input” in Figs. 17 and 18) is modified prior to decoding.
On p. 14, applicant argued that Oh does not disclose or point to combining object parameters with user input to dynamically influence the downmix signal prior to decoding. The office disagrees. Figs. 1, 9 and 17 of Oh clearly illustrate “User Control” before rendering.
On p. 14, applicant argued that Oh’s system would have to be redesigned. The office disagrees. Since Oh’s teaching meets the claimed feature, there is no need to redesign the decoding process.
On p. 14, applicant argued that the claimed invention provides the ability to use an unmodified multi-channel decoder while enabling flexible, user-driven object rendering. The phrase “multi-channel decoder” is not explicitly defined in claims 1, 18 and 19. Furthermore, Oh teaches that the user could control/modify metadata before rendering/decoding. Some examples provided for group effect parameter are discussed in paragraphs [0157]-[0163] and [0175]-[0180].
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PING LEE whose telephone number is (571)272-7522. The examiner can normally be reached Monday-Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vivian Chin can be reached at 571-272-7848. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/PING LEE/Primary Examiner, Art Unit 2695