DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This Office action is based on the communications filed July 23, 2025. Claims 1-10 are currently pending and are considered below.
Response to Arguments
Applicant’s arguments with respect to claims 1-10 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-7 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Murtaza et al. (US 2020/0278828 A1), hereinafter Murtaza, in view of Jensen et al. (US 2014/0114560 A1), hereinafter Jensen.
Claim 1: Murtaza discloses an information processing device comprising:
a memory that stores position information of a destination and sound information corresponding to the destination (see at least, “For ensuring a good audio experience all the audio elements composing an audio scene at a certain moment in time, may have to be made available to a Media Decoder which can make use of the position information for creating the final audio scene,” Murtaza [0256], “If the content is pre-encoded, for a number of pre-defined locations, the system can provide accurate reproduction of the audio scenes at these specific locations under the assumption that these audio scenes do not overlap and the user can “jump/switch” from one location to the next one,” Murtaza [0257], “The Audio Streams are stored on a Media Server, where for each Audio Stream the different encodings at different bitrates (i.e., different Representations) are grouped in one Adaptation Set with the appropriate data signaling the availability of all the created Adaptation Sets,” Murtaza [0264]);
an acquirer that acquires position information of a user (see at least, “User position information: location information (e.g., x, y, z coordinates), orientation information (yaw, pitch, roll), direction and speed of movement, etc.,” Murtaza [0018], “A metadata processor 1236 may be provided to receive from the download and switching information about the audio streams received, information that may include the audio metadata corresponding to each audio stream received. The metadata processor 1236 may be also configured to process and manipulate the audio metadata associated with each audio stream 113, based on the information received from the viewport processor 1232 that may include information about the user location and/or orientation and/or direction of movement 110, in order to select/enable the useful audio elements 152 composing the new audio scene as indicated by the viewport processor 1232, allow the merge of all audio streams 113 into a single audio stream 106,” Murtaza [0184]);
a calculator that calculates a distance from the user to the destination on a basis of the position information of the destination and the position information of the user (see at least, “In examples, a plurality of N audio elements are defined, and, in case the user's distance to the position or area of these audio elements is larger than a predetermined threshold, the N audio elements are processed to obtain a smaller number M of audio elements (M<N) associated to a position or area close to the position or area of the N audio elements, so as to provide the system with at least one audio stream associated to the N audio elements, in case the user's distance to the position or area of the N audio elements is smaller than a predetermined threshold, or to provide the system with at least one audio stream associated to the M audio elements, in case the user's distance to the position or area of the N audio elements is larger than a predetermined threshold,” Murtaza [0422]); and
a processor that performs switching between a virtual point sound source (see at least, “At a first instant t=t1 shown in FIG. 5a, a user is positioned e.g. at a first position. In this first position, a first audio element 1(152-1) and a second audio element 2 (152-2) are located (e.g., virtually) at distances d1 and respective d2 from the user equipped with the MCD. Both distances d1 and d2 may be greater in this case than a defined threshold distance dthreshold, and therefore the system 102 is configured to group both audio elements into one single virtual source 152-3. The position and the properties (e.g., spatial extent) of the single virtual source can be computed based for example on the positions of the original two sources in such a way that it mimics as good as possible the original sound field generated by the two sources (e.g., two well localized point sources can be reproduced in the middle of the distance between them as a single source). The user position data 110 (d1, d2) may be transmitted from the MCD to the system 102 (client) and subsequently to the server 120, which may decide to send an appropriate audio stream 106 to be rendered by the server system 120 (in other embodiments, it is the client 102 which decides which streams to be transmitted from the server 120). By grouping both audio elements into one single virtual source 152-3, the server 120 may select one of a multitude of representations associated with the aforementioned information. (For example, it is possible to deliver accordingly a dedicated stream 106 an adaptation set 113' accordingly associated with e.g. one single channel. 
Consequently the user may receive through the MCD an audio signal as being transmitted from the single virtual audio element 152-3 positioned between the real audio elements 1 (152-1) and 2 (152-2)),” Murtaza [0230]) and a virtual surround sound source and causes the sound information to be output according to the distance calculated by the calculator (see at least, “At a second instant t=t2 shown in FIG. 5b, a user is positioned e.g. in the same scene 150, having a second defined position in the same VR-environment as in FIG. 5a. In this second position, the two audio elements 152-1 and 152-2 are located (e.g., virtually) at distances d3 and respective d4 from the user. Both distances d3 and d4 may be smaller as the threshold distance dthreshold, and therefore the grouping of the audio elements 152-1 and 152-2 into one single virtual source 152-3 is not used anymore. The user position data are transmitted from the MCD to the system 102 and subsequently to the server 120, which may decide to send another appropriate audio stream 106 to be rendered by the system server 120 (in other embodiments, this decision is made by the client 102). By avoiding to group the audio elements, the server 120 may select a different representation associated with the aforementioned information to deliver accordingly a dedicated stream 106 with an adaptation set 113' accordingly associated with different channels for each audio element. Consequently the user may receive through the MCD an audio signal 108 as being transmitted from two different audio elements 1 (152-1) and 2 (152-2). Therefore the closer the user's position 110 to the audio sources 1 (152-1) and 2 (152-2), the higher the needed quality level of the stream associated to the audio sources has to be selected,” Murtaza [0231], see also at least, Murtaza [0435] – [0443]).
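For clarity of the record, the threshold-based grouping mechanism quoted above from Murtaza [0230]-[0231] can be sketched as follows. This is an illustrative sketch only; the function name, variable names, and the two-dimensional geometry are assumptions for exposition and do not appear in the reference.

```python
import math

def select_rendering(user_pos, sources, d_threshold):
    # Sketch of Murtaza [0230]-[0231]: when both point sources lie
    # farther than d_threshold from the user, group them into one
    # single virtual source placed midway between them (one channel);
    # otherwise render each source separately on its own channel.
    distances = [math.dist(user_pos, s) for s in sources]
    if all(d > d_threshold for d in distances):
        midpoint = tuple((a + b) / 2 for a, b in zip(*sources))
        return {"mode": "single_virtual_source", "position": midpoint}
    return {"mode": "separate_sources", "positions": list(sources)}
```

The midpoint placement follows Murtaza's example that "two well localized point sources can be reproduced in the middle of the distance between them as a single source."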
Murtaza does not disclose acquiring position information of a user by Global Positioning System (GPS) or by Pedestrian Dead-Reckoning (PDR). However, Jensen discloses a similar hearing device with a distance measurement unit and further discloses acquiring position information of a user by Global Positioning System (GPS) or by Pedestrian Dead-Reckoning (PDR) (see at least, “Thus, the navigation system comprises a hearing device configured to be head worn and having one or more loudspeakers for emission of sound towards one or both ears of a user and accommodating the inertial measurement unit configured for determining head yaw, when the user wears the hearing device in its intended operational position on the user's head, a distance measurement unit configured to measure distance from the user to an object in the field of view of the user, the GPS unit for determining the geographical position of the user, the sound generator connected for outputting audio signals to the loudspeakers, and the processor configured for, based on the determined head yaw, selecting a POI in the field of view of the user, receiving a distance measurement from the distance measurement unit, determining the distance from the user to the selected POI based on geographical positions of the user and the selected POI, and provided that absolute value of a difference between the received distance measurement and the determined distance from the user to the selected POI is less than a predetermined obstruction threshold, controlling the sound generator to output audio signals with spoken information on the selected POI,” Jensen [0132] – [0138], “In absence of GPS-signal, e.g. when buildings or terrain block the satellite signals, the navigation system may continue its operation relying on data from the inertial measurement unit of the hearing device utilising dead reckoning as is well-known from Inertial navigation systems in general. 
The processor uses information from gyros and accelerometers of the inertial measurement unit of the hearing device to calculate speed and direction of travel as a function of time and integrates to determine geographical positions of the user with the latest determined position based on GPS-signals as a starting point, until appropriate GPS-signal reception is resumed,” Jensen [0140]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the aforementioned features of Jensen in the invention of Murtaza, thereby improving the accuracy of the user position information by utilizing Global Positioning System (GPS) or Pedestrian Dead-Reckoning (PDR) position determination in addition to the user position information of Murtaza (see at least, “User position information: location information (e.g., x, y, z coordinates), orientation information (yaw, pitch, roll), direction and speed of movement, etc.,” Murtaza [0018]) for the purposes of virtual sound positioning in augmented/mixed reality applications.
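The dead-reckoning fallback quoted from Jensen [0140] can be sketched as follows. This is an illustrative sketch only; the sample format of (speed, heading, time-step) tuples is an assumption for exposition, not Jensen's actual inertial data layout.

```python
import math

def dead_reckon(last_gps_fix, imu_samples):
    # Sketch of Jensen [0140]: with the GPS signal blocked, integrate
    # speed and direction of travel derived from the inertial
    # measurement unit, starting from the latest GPS-determined
    # position, until GPS reception resumes. Each sample is assumed
    # to be a (speed_m_per_s, heading_rad, dt_s) tuple.
    x, y = last_gps_fix
    for speed, heading, dt in imu_samples:
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
    return (x, y)
```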
Claim 2: Murtaza and Jensen disclose the information processing device according to claim 1, wherein the processor causes the virtual point sound source to output the sound information in a situation in which the distance exceeds a threshold, and causes the virtual surround sound source to output the sound information when the distance becomes equal to or shorter than the threshold (see at least, “At a first instant t=t1 shown in FIG. 5a, a user is positioned e.g. at a first position. In this first position, a first audio element 1(152-1) and a second audio element 2 (152-2) are located (e.g., virtually) at distances d1 and respective d2 from the user equipped with the MCD. Both distances d1 and d2 may be greater in this case than a defined threshold distance dthreshold, and therefore the system 102 is configured to group both audio elements into one single virtual source 152-3. The position and the properties (e.g., spatial extent) of the single virtual source can be computed based for example on the positions of the original two sources in such a way that it mimics as good as possible the original sound field generated by the two sources (e.g., two well localized point sources can be reproduced in the middle of the distance between them as a single source). The user position data 110 (d1, d2) may be transmitted from the MCD to the system 102 (client) and subsequently to the server 120, which may decide to send an appropriate audio stream 106 to be rendered by the server system 120 (in other embodiments, it is the client 102 which decides which streams to be transmitted from the server 120). By grouping both audio elements into one single virtual source 152-3, the server 120 may select one of a multitude of representations associated with the aforementioned information. (For example, it is possible to deliver accordingly a dedicated stream 106 an adaptation set 113' accordingly associated with e.g. one single channel. 
Consequently the user may receive through the MCD an audio signal as being transmitted from the single virtual audio element 152-3 positioned between the real audio elements 1 (152-1) and 2 (152-2)),” Murtaza [0230], “At a second instant t=t2 shown in FIG. 5b, a user is positioned e.g. in the same scene 150, having a second defined position in the same VR-environment as in FIG. 5a. In this second position, the two audio elements 152-1 and 152-2 are located (e.g., virtually) at distances d3 and respective d4 from the user. Both distances d3 and d4 may be smaller as the threshold distance dthreshold, and therefore the grouping of the audio elements 152-1 and 152-2 into one single virtual source 152-3 is not used anymore. The user position data are transmitted from the MCD to the system 102 and subsequently to the server 120, which may decide to send another appropriate audio stream 106 to be rendered by the system server 120 (in other embodiments, this decision is made by the client 102). By avoiding to group the audio elements, the server 120 may select a different representation associated with the aforementioned information to deliver accordingly a dedicated stream 106 with an adaptation set 113' accordingly associated with different channels for each audio element. Consequently the user may receive through the MCD an audio signal 108 as being transmitted from two different audio elements 1 (152-1) and 2 (152-2). Therefore the closer the user's position 110 to the audio sources 1 (152-1) and 2 (152-2), the higher the needed quality level of the stream associated to the audio sources has to be selected,” Murtaza [0231]).
Claim 3: Murtaza and Jensen disclose the information processing device according to claim 2, wherein the processor arranges the one virtual point sound source in a direction, in which the destination is present, and causes the sound information suggesting the presence of the destination to be output in a situation in which the distance exceeds the threshold (see at least, “At a first instant t=t1 shown in FIG. 5a, a user is positioned e.g. at a first position. In this first position, a first audio element 1(152-1) and a second audio element 2 (152-2) are located (e.g., virtually) at distances d1 and respective d2 from the user equipped with the MCD. Both distances d1 and d2 may be greater in this case than a defined threshold distance dthreshold, and therefore the system 102 is configured to group both audio elements into one single virtual source 152-3. The position and the properties (e.g., spatial extent) of the single virtual source can be computed based for example on the positions of the original two sources in such a way that it mimics as good as possible the original sound field generated by the two sources (e.g., two well localized point sources can be reproduced in the middle of the distance between them as a single source). The user position data 110 (d1, d2) may be transmitted from the MCD to the system 102 (client) and subsequently to the server 120, which may decide to send an appropriate audio stream 106 to be rendered by the server system 120 (in other embodiments, it is the client 102 which decides which streams to be transmitted from the server 120). By grouping both audio elements into one single virtual source 152-3, the server 120 may select one of a multitude of representations associated with the aforementioned information. (For example, it is possible to deliver accordingly a dedicated stream 106 an adaptation set 113' accordingly associated with e.g. one single channel. 
Consequently the user may receive through the MCD an audio signal as being transmitted from the single virtual audio element 152-3 positioned between the real audio elements 1 (152-1) and 2 (152-2)),” Murtaza [0230]), and arranges a plurality of the virtual point sound sources around the user and causes the sound information associated with an atmosphere of the destination to be output when the distance becomes equal to or shorter than the threshold (see at least, “At a second instant t=t2 shown in FIG. 5b, a user is positioned e.g. in the same scene 150, having a second defined position in the same VR-environment as in FIG. 5a. In this second position, the two audio elements 152-1 and 152-2 are located (e.g., virtually) at distances d3 and respective d4 from the user. Both distances d3 and d4 may be smaller as the threshold distance dthreshold, and therefore the grouping of the audio elements 152-1 and 152-2 into one single virtual source 152-3 is not used anymore. The user position data are transmitted from the MCD to the system 102 and subsequently to the server 120, which may decide to send another appropriate audio stream 106 to be rendered by the system server 120 (in other embodiments, this decision is made by the client 102). By avoiding to group the audio elements, the server 120 may select a different representation associated with the aforementioned information to deliver accordingly a dedicated stream 106 with an adaptation set 113' accordingly associated with different channels for each audio element. Consequently the user may receive through the MCD an audio signal 108 as being transmitted from two different audio elements 1 (152-1) and 2 (152-2). Therefore the closer the user's position 110 to the audio sources 1 (152-1) and 2 (152-2), the higher the needed quality level of the stream associated to the audio sources has to be selected,” Murtaza [0231]).
Claim 4: Murtaza and Jensen disclose the information processing device according to claim 2, wherein the processor causes, in a case where there is a plurality of the destinations the distances to which are equal to or shorter than the threshold, the virtual surround sound source to output the sound information corresponding to the destination the distance to which becomes equal to or shorter than the threshold at latest timing, and stops the output of the sound information corresponding to the other destination (see at least, “A mixer/renderer 1238 may be provided in the system 102 being configured to reproduce the final audio scene based on the information about the user location and/or orientation and/or direction of movement, i.e., for example, some of the audio elements which are not audible at that specific location should be disabled or not rendered,” Murtaza [0255], “According to an aspect the system may be configured to disable the decoding of audio elements selected the basis of the user's current viewport and/or head orientation and/or movement data and/or metadata and/or virtual position,” Murtaza [0105], “For example, if the user never goes close to the wall, there is no need for the client system 102 to request the streams of the neighboring environment (e.g., they may requested by the client system 102 only when the user approaches the wall). Moreover, the streams coming from outside the wall may have a reduced bitrate, as they may be heard at low volume. Notably, more relevant streams (e.g., streams coming from audio objects within the current environment) may be delivered by the server system 120 to the client system 102 at the highest bitrate and/or highest quality level (as a consequence of the fact that the less relevant streams are at lower bitrate and or quality level, hence leaving free band for the more relevant streams),” Murtaza [0155]).
Claim 5: Murtaza and Jensen disclose the information processing device according to claim 2, wherein the processor causes, in a case where there is a plurality of the destinations the distances to which are equal to or shorter than the threshold, the virtual point sound source to output pieces of the sound information respectively corresponding to all the destinations the distances to which are equal to or shorter than the threshold, and then gradually switches the virtual point sound source to the virtual surround sound source and causes the sound information to be output (see at least, “At positions very far away from the user position (e.g. higher than a first threshold) the objects are mixed into 2 signals (other numbers are possible, based on their spatial position and semantic) and delivered as 2 "virtual objects",” Murtaza [0157], “At positions closer to the user position (e.g. lower than the first threshold but higher than a second threshold smaller than the first threshold) the objects are mixed into 5 signals (based on their spatial position and semantic) and delivered as 5 (other numbers are possible) "virtual objects",” Murtaza [0158], “At positions very close to the user positions (lower than the first and second thresholds) the 10 objects are delivered as 10 audio signals provided the highest quality,” Murtaza [0159]).
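The tiered mixing cited above from Murtaza [0157]-[0159] can be sketched as follows. This is an illustrative sketch only; the function name is an assumption, while the object counts (2, 5, 10) follow the cited example.

```python
def virtual_object_count(distance, first_threshold, second_threshold):
    # Sketch of Murtaza [0157]-[0159]: objects far from the user
    # (beyond the first threshold) are mixed down to 2 "virtual
    # objects"; mid-range objects (between the two thresholds) are
    # mixed into 5; objects within both thresholds are delivered
    # individually at the highest quality (10 in the cited example).
    # Per [0158], second_threshold is smaller than first_threshold.
    if distance > first_threshold:
        return 2
    if distance > second_threshold:
        return 5
    return 10
```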
Claim 6: Murtaza and Jensen disclose the information processing device according to claim 2, wherein the processor causes the sound information acquired by processing of sound to be output when the distance is equal to or longer than a predetermined distance, and causes the sound information acquired without the processing of the sound to be output when the distance is shorter than the predetermined distance in a case where the distance exceeds the threshold (see at least, “At a first instant t=t1 shown in FIG. 5a, a user is positioned e.g. at a first position. In this first position, a first audio element 1(152-1) and a second audio element 2 (152-2) are located (e.g., virtually) at distances d1 and respective d2 from the user equipped with the MCD. Both distances d1 and d2 may be greater in this case than a defined threshold distance dthreshold, and therefore the system 102 is configured to group both audio elements into one single virtual source 152-3. The position and the properties (e.g., spatial extent) of the single virtual source can be computed based for example on the positions of the original two sources in such a way that it mimics as good as possible the original sound field generated by the two sources (e.g., two well localized point sources can be reproduced in the middle of the distance between them as a single source). The user position data 110 (d1, d2) may be transmitted from the MCD to the system 102 (client) and subsequently to the server 120, which may decide to send an appropriate audio stream 106 to be rendered by the server system 120 (in other embodiments, it is the client 102 which decides which streams to be transmitted from the server 120). By grouping both audio elements into one single virtual source 152-3, the server 120 may select one of a multitude of representations associated with the aforementioned information. 
(For example, it is possible to deliver accordingly a dedicated stream 106 an adaptation set 113' accordingly associated with e.g. one single channel. Consequently the user may receive through the MCD an audio signal as being transmitted from the single virtual audio element 152-3 positioned between the real audio elements 1 (152-1) and 2 (152-2)),” Murtaza [0230], “At a second instant t=t2 shown in FIG. 5b, a user is positioned e.g. in the same scene 150, having a second defined position in the same VR-environment as in FIG. 5a. In this second position, the two audio elements 152-1 and 152-2 are located (e.g., virtually) at distances d3 and respective d4 from the user. Both distances d3 and d4 may be smaller as the threshold distance dthreshold, and therefore the grouping of the audio elements 152-1 and 152-2 into one single virtual source 152-3 is not used anymore. The user position data are transmitted from the MCD to the system 102 and subsequently to the server 120, which may decide to send another appropriate audio stream 106 to be rendered by the system server 120 (in other embodiments, this decision is made by the client 102). By avoiding to group the audio elements, the server 120 may select a different representation associated with the aforementioned information to deliver accordingly a dedicated stream 106 with an adaptation set 113' accordingly associated with different channels for each audio element. Consequently the user may receive through the MCD an audio signal 108 as being transmitted from two different audio elements 1 (152-1) and 2 (152-2). Therefore the closer the user's position 110 to the audio sources 1 (152-1) and 2 (152-2), the higher the needed quality level of the stream associated to the audio sources has to be selected,” Murtaza [0231], “A determining step 805 having three different results may be performed at a subsequent moment. One or two defined threshold(s) may be relevant at this step for determining e.g. 
a predictive decision regarding a subsequent viewport and/or head orientation and/or movement data and/or interaction metadata and/or virtual position. Therefore, a comparison with a first and/or a second threshold may be performed, regarding the probability of a change into a second position, resulting in e.g. three different subsequent steps to be performed,” Murtaza [0249]).
Claim 7: Murtaza and Jensen disclose the information processing device according to claim 1, wherein the processor causes, in a case of switching the virtual point sound source and the virtual surround sound source, a switching sound indicating the switching to be output (see at least, “FIG. 4 (case 3) shows an embodiment with another exemplary scenario (represented in a vertical plane XZ of a space XYZ, where the axis Y is represented as entering the paper), where the user moves in a YR, AR and/or MR scene 150A implying a transition of audio from one first position at time t1 to a second position also in the first scene 150A at time t2 . The user in the first position may be far from a wall at time t1 at a distance d1 from the wall; and may be close to the wall at time t2 , at a distance d2 from the wall. Here, d1>d2 . While at the distance d1 the user only hears the source 152A of the scene 150A, he may also hear the source 152B of the scene 150B beyond the wall,” Murtaza [0254], “When the user is in the second position (d2 ), the client 102 sends to the server 120 the data regarding the user's position 110 (d2 ) and receives, from the server 120, not only the audio streams 106 of the first scene 150A, but also the audio streams 106 of the second scene 150B. On the basis of the metadata provided by the server 120, for example, the client 102 will cause the reproduction, e.g., via the decoder 104, of the streams 106 of the second scene 150B (beyond the wall) at a low volume,” Murtaza [0255]).
Claim 10 is directed to an information processing method substantially similar in scope to claim 1 and therefore is rejected for the same reasons (see also at least, “Generally, examples may be implemented as a computer program product with program instructions, the program instructions being operative for performing one of the methods when the computer program product runs on a computer. The program instructions may for example be stored on a machine readable medium,” Murtaza [0436]).
Claims 8 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Murtaza and Jensen in view of Shakil et al. (US 2014/0222462 A1), hereinafter Shakil.
Claim 8: Murtaza and Jensen disclose the information processing device according to claim 1, but do not disclose wherein the processor causes open earbuds to output the sound information. However, Shakil discloses, in regards to augmented reality, utilizing “An "ears-open" earpiece,” Shakil [0015], for delivering audio data. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the “ears-open” earpiece of Shakil in the invention of Murtaza and Jensen, thereby allowing for the advantage of delivering audio data “without obstructing the ear canal,” Shakil [0015].
Claim 9 is directed to an information processing system substantially similar in scope to claim 8 and therefore is rejected for the same reasons.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOSEPH SAUNDERS whose telephone number is (571)270-1063. The examiner can normally be reached Monday-Thursday, 9:00 a.m. - 4 p.m., EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Carolyn R Edwards can be reached at (571)270-7136. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JOSEPH SAUNDERS JR/Primary Examiner, Art Unit 2692