Detailed Action
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. See 35 U.S.C. § 100 (note).
Art Rejections
Obviousness
The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1–7, 9–12, 14–17 and 19 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of US Patent Application Publication 2022/0360931 (effectively filed 21 June 2019) (“Namba”) and US Patent Application Publication 2008/0219509 (published 11 September 2008) (“White”) and Akihito Akutsu et al., 2020 Public Viewing—Kirari! Immersive Telepresence Technology, in 14 NTT Technical Review (December 2016) (“Akutsu”).
Claims 8, 13 and 18 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Namba; White; US Patent Application Publication 2020/0364462 (published 19 November 2020) (“Imes”) and US Patent Application Publication 2013/0278727 (published 24 October 2013) (“Tamir”).
Claim 1 is drawn to “a sound source reproduction device.” The following table illustrates the correspondence between the claimed device and the Namba reference.
Claim 1
The Namba Reference
“1. A sound source reproduction device comprising:
“a processor configured to execute operations comprising:
The Namba reference similarly describes a signal processing device, method and computer readable medium with instructions that cause a processor 501 to generate and reproduce sounds. Namba at Abs., ¶¶ 11–13, 313–319, FIGs.5, 10, 11.
“receiving, as input, sound source information recorded in advance in synchronization with time information of a time;
Namba describes receiving audio information and metadata about objects in a piece of content. Id. at ¶¶ 48, 53, 65–83. The content may include a video of a sports match, like a soccer game. Id. at ¶¶ 48, 53, 68. The information may include an audio recording of a soccer ball being kicked. Id.
“receiving, as input, position information of a first sound source at the time; …
The metadata includes object position at various times within the content corresponding to frames of video. Id. at ¶¶ 72–80.
“generating a first virtual sound source for reproducing the first sound source in the real space by using the sound source information and the sound source position information at the time;
“reproducing the virtual sound source in the real space;
Namba analyzes the metadata in order to generate and reproduce a spatialized version of the object’s sound based on an arbitrarily determined listener position. Id. at ¶¶ 114–129.
“receiving, as input, first sound source attribute information for expressing an attribute of the first sound source at the time by using an image and for reproducing the attribute in a real space; …
“generating a sound source attribute synthesis image for reproducing the first sound source attribute information in the real space by using the position information of the first sound source and the first sound source attribute information at the time, wherein the sound source attribute synthesis image is generated at a substantially similar virtual position as the first sound source, in which the virtual position of the first sound source visually expresses associated sound source coordinates; and
“displaying the sound source attribute synthesis image in the real space.”
The Namba reference does not describe the generation and display of a sound source attribute synthesis image in real space, let alone at a similar virtual position as a first sound source to visually express associated sound source coordinates.
Table 1
The table above shows that the Namba reference describes a device that corresponds closely to the claimed device. The Namba reference does not anticipate the claimed device, however, because Namba does not describe a corresponding mechanism for generating and displaying a synthesized image from sound source attribute information at a time by using an image. Namba also does not describe that the synthesized image is displayed at a similar virtual position as a first sound source to visually express associated sound source coordinates.
Generating and Displaying a Synthesized Image
The differences between the claimed invention and the Namba reference are such that the invention as a whole would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention. Namba describes a device capable of accurately spatializing sounds from a user-selectable viewing position in a variety of content contexts, such as a sports field, a theatre, a live show venue, a theme park and an orchestral performance. Namba at ¶¶ 26–45. Though Namba’s content includes both audible and visual elements, Namba does not describe any mechanism for enhancing the visual elements of the content.
The Namba reference teaches that its device can enhance the realistic presentation of a sporting event by modifying certain audio characteristics of the event. The White reference similarly relates to enhancing the presentation of sporting events. White at Abs., ¶¶ 2–7, 25–30, FIG.1. Rather than describing techniques for enhancing audio characteristics, the White reference teaches and suggests a device for enhancing visual characteristics of a sporting event. Id. White teaches tracking objects, such as a ball in a sporting game, in order to determine the object’s position, direction and speed at different points in time. Id. White teaches and suggests that a detected object, like a ball, is tracked in a sequence of time-indexed images. Id. at ¶ 61, FIG.10. The White device uses the object’s location, direction and speed to synthesize and apply a graphic to the object in order to highlight, or enhance, the object and to display the object’s path over a sequence of image frames. Id.
Read together, the Namba and the White references reasonably suggest modifying the Namba device to include both audible and visual elements of a sporting event. The Namba device would be modified to include features taught and suggested by White. For example, one of ordinary skill would modify the Namba device to include a processing facility 140, like the one taught by White. Processing facility 140 would receive sequences of images from cameras 120 and 130 that are recording a sporting event. Processing facility 140 would process the images to identify and track objects, such as a ball. Parameters tracked by processing facility 140 would include the object’s position, direction and speed. Processing facility 140 would then enhance the video produced for the sporting event by highlighting the tracked object. As an example, a ball would be highlighted by the addition of a graphical overlay at the position of the ball in each image frame. Additionally, processing facility 140 would blend a sequence of image frames to show the tracked object’s motion throughout a sporting space. The enhanced images would then be displayed at a user’s display device to improve the visibility of the tracked object.
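By way of illustration only, the following sketch shows how per-frame ball positions of the kind White tracks could yield direction and speed and drive a highlight-and-trail overlay; the frame rate, coordinates, colors and helper name `annotate` are hypothetical and are not taken from White.

```python
# Minimal sketch (not White's actual implementation) of deriving a tracked
# ball's direction and speed from per-frame positions and overlaying a
# highlight plus a motion trail, as the modified Namba/White device might.
import numpy as np
import cv2

FPS = 30.0  # assumed capture rate

def annotate(frames, positions):
    """frames: list of HxWx3 uint8 images; positions: one (x, y) per frame."""
    out = []
    for i, (frame, (x, y)) in enumerate(zip(frames, positions)):
        img = frame.copy()
        if i > 0:
            px, py = positions[i - 1]
            velocity = np.array([x - px, y - py]) * FPS     # pixels/second
            speed = float(np.linalg.norm(velocity))
            # Trail: mark recent positions to show the object's path.
            for tx, ty in positions[max(0, i - 10):i]:
                cv2.circle(img, (int(tx), int(ty)), 4, (0, 255, 255), -1)
            cv2.putText(img, f"{speed:.0f} px/s", (int(x) + 12, int(y)),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 1)
        cv2.circle(img, (int(x), int(y)), 10, (0, 0, 255), 2)  # highlight ring
        out.append(img)
    return out
```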
Positioning a Synthesized Image at the Virtual Position of a Sound Source to Visually Express Associated Sound Source Coordinates
The combination of Namba and White shows the obviousness of modifying Namba’s system and method to generate synthesized images, such as a ball highlighted with graphical overlays that facilitate tracking the object’s motion through a sporting space. Further, the Akutsu reference teaches and suggests an additional modification. Akutsu is drawn to an immersive telepresence system that enhances conventional television and surround sound systems. Akutsu at Abs., § 1. Sounds and images from a sporting venue are captured in real time. Id. at § 6, FIG.1. The images are processed to extract particular objects, such as an athlete, and encoded for distribution to a remote viewing venue. Id. at §§ 3, 4, FIG.1. The remote venue receives the transmission and uses a pseudo-3D display device and an array of ultrasonic loudspeakers to present the images and audio together in the remote venue. Id. at FIGs.1, 2, 5. The unique combination of elements creates a pseudo-3D display of extracted images in addition to sound space reconstruction to place a virtual sound source, or acoustic image, alongside the visual image of the athlete. Id. at § 5, FIGs.2, 5. This presentation reproduces the extracted subject and its corresponding sound as if those in the remote venue were at the sporting venue. Id.
Read in combination with Namba and White, Akutsu reasonably suggests modifying Namba to include a similar recording, processing, encoding, transmission and reproduction system. Rather than relying on a remote venue’s conventional television displays and surround sound speakers to reproduce events at a sporting venue, a pseudo-3D display and array of ultrasonic transducers may be used to virtually project the image and sounds of an extracted subject at the remote venue. The images and sounds would be collocated as claimed in order to present a unified representation of the subject that expresses both the subject’s visual and audible characteristics. For the foregoing reasons, the combination of the Namba, the White and the Akutsu references makes obvious all limitations of the claim.
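By way of illustration only, the collocation concept reduces to a single set of subject coordinates driving both the visual and the acoustic rendering, as in the toy sketch below; the class and method names are hypothetical stand-ins, not an API from any cited reference.

```python
# Toy sketch of Akutsu-style collocation: the extracted subject's coordinates
# position both the pseudo-3D image and the virtual sound source, so sight
# and sound stay aligned at the remote venue.
class RemoteVenue:
    def draw_subject(self, xyz):
        print(f"pseudo-3D image rendered at {xyz}")

    def place_acoustic_image(self, xyz):
        print(f"ultrasonic array steers acoustic image to {xyz}")

venue = RemoteVenue()
athlete_xyz = (3.0, 1.5, 0.0)            # hypothetical venue coordinates (m)
venue.draw_subject(athlete_xyz)          # image and sound share one position,
venue.place_acoustic_image(athlete_xyz)  # i.e., collocated as claimed
```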
Claim 2 depends on claim 1 and further requires the following:
“wherein the generating the sound source attribute synthesis image further comprises generating the sound source attribute synthesis image of a different mode according to a speed of the first sound source.”
Similarly, the White reference teaches and suggests displaying statistical information concerning the speed of a sports object. White at ¶ 30, FIG.10. For the foregoing reasons, the combination of the Namba, the White and the Akutsu references makes obvious all limitations of the claim.
Claim 3 depends on claim 1 and further requires the following:
“wherein the receiving the position information further comprises receiving, as input, the position information of the first sound source and position information of a second sound source at the time, and
“the processor further configured to execute operations comprising:
“receiving, as input second sound source attribute information for expressing an attribute of the second sound source at the time by sound and reproducing the attribute in the real space,
“generating a second virtual sound source for reproducing the second sound source attribute information in the real space by using the position information of the second sound source and the second sound source attribute information at the time, and
“reproducing the second virtual sound source in the real space.”
The Namba reference similarly describes receiving position information concerning multiple objects that serve as sound sources. Namba at ¶¶ 2, 49–54, FIG.1. Each object is associated with sound source attribute information, such as a source type indicating the type of sound associated with the object. Id. at ¶¶ 2, 53, 66–70. Namba describes generating sounds for each sound source based on the source type information and the position in order to spatialize each sound relative to a listening position using vector-based amplitude panning, wave field synthesis or head-related transfer functions. Id. at ¶¶ 3, 115, 216. For the foregoing reasons, the combination of the Namba, the White and the Akutsu references makes obvious all limitations of the claim.
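By way of illustration only, a textbook two-loudspeaker vector-based amplitude panning computation, one of the spatialization techniques Namba lists, might look like the following sketch; the speaker and source angles are hypothetical and the code is not taken from Namba.

```python
# Illustrative 2-D vector-based amplitude panning (VBAP): solve for the pair
# of speaker gains whose weighted sum of speaker direction vectors points at
# the desired source direction, then normalize to keep loudness constant.
import numpy as np

def vbap_gains(source_deg, spk1_deg, spk2_deg):
    to_vec = lambda d: np.array([np.cos(np.radians(d)), np.sin(np.radians(d))])
    L = np.column_stack([to_vec(spk1_deg), to_vec(spk2_deg)])  # speaker basis
    g = np.linalg.solve(L, to_vec(source_deg))                 # raw gains
    return g / np.linalg.norm(g)                               # power-normalized

print(vbap_gains(15.0, -30.0, 30.0))  # source nearer +30° gets the larger gain
```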
Claim 4 depends on claim 3 and further requires the following:
“wherein, when there is a plurality of the second sound sources, the generating the second sound source further comprises generating the second virtual sound sources of different modes.”
The Namba reference describes generating sounds of different modes for each audio source depending on the type of the audio source. Namba at ¶¶ 2, 53, 66–70. For the foregoing reasons, the combination of the Namba, the White and the Akutsu references makes obvious all limitations of the claim.
Claim 8 depends on claim 1 and further requires the following:
“wherein the first sound source includes a ball used in a soccer,
“the attribute of the first sound source includes a velocity of movement of the ball, and
“wherein the first virtual sound source for reproducing the first sound source in the real space includes an image of the ball with a color representing the velocity of the movement of the ball and with a size representing a position of the ball according to the position information of the first sound source.”
The obviousness rejection of claim 1, incorporated herein, shows the obviousness of modifying Namba’s device to include features of White’s device in order to highlight sports objects, such as a ball. Further, Namba describes tracking and highlighting a soccer ball. Namba at ¶ 48. White teaches and suggests determining the ball’s velocity and highlighting it by displaying a graphic of the ball and its path in a sequence of frames in addition to a graphic containing statistical information, such as velocity. White at ¶ 61, FIG.10.
Neither Namba nor White teaches or suggests altering the ball’s color to represent velocity or altering the ball’s size according to position information. However, Imes teaches and suggests visually highlighting sports balls by adding different colored graphics based on speed. Imes at ¶ 256. For example, a red color is added to indicate a high speed ball. Id. And the Tamir reference teaches and suggests scaling the size of graphics used to highlight tracked sports objects based on the size of the ball as it varies with depth. Tamir at ¶¶ 252–259. Based on these teachings, one of ordinary skill would have reasonably considered further modifying Namba’s device so that its object graphics signal speed with color and distance or depth from the viewing perspective with size. For the foregoing reasons, the combination of the Namba, the White, the Imes and the Tamir references makes obvious all limitations of the claim.
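By way of illustration only, the Imes/Tamir-style rendering rule might reduce to a mapping like the sketch below, in which color encodes ball speed and on-screen size scales with depth; the threshold, reference depth and color values are hypothetical choices, not taken from the references.

```python
# Hedged sketch of a color-by-speed, size-by-depth overlay rule.
def overlay_params(speed_mps, depth_m, ref_depth_m=10.0, base_radius_px=12):
    # Red (BGR) marks a fast ball, green a slow one; threshold is arbitrary.
    color = (0, 0, 255) if speed_mps > 20.0 else (0, 255, 0)
    # Graphic shrinks as the ball recedes from the viewing position.
    radius = max(2, int(base_radius_px * ref_depth_m / depth_m))
    return color, radius
```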
Claim 9 depends on claim 1 and further requires the following:
“wherein the sound source attribute synthesis image includes a visual representation of the first sound source and a visual indication of a direction and a speed of the first sound source in motion.”
The obviousness rejection of claim 1, incorporated herein, shows the obviousness of modifying Namba’s device to include features of White’s device in order to highlight sports objects, such as a ball. Further, Namba describes tracking and highlighting a soccer ball. Namba at ¶ 48. White teaches and suggests determining the ball’s velocity and highlighting it by displaying a graphic of the ball and its path in a sequence of frames in addition to a graphic containing statistical information, such as velocity. White at ¶ 61, FIG.10. For the foregoing reasons, the combination of the Namba, the White and the Akutsu references makes obvious all limitations of the claim.
Claim 5 is drawn to “a sound source reproduction method.” The following table illustrates the correspondence between the claimed method and the Namba reference.
Claim 5
The Namba Reference
“5. A sound source reproduction method for presenting, to a listener, a sound source together with visual information representing a state and situation of the sound source, comprising:
The Namba reference similarly describes a signal processing device, method and computer readable medium with instructions that cause a processor 501 to generate and reproduce sounds corresponding to visual objects in a content video. Namba at Abs., ¶¶ 11–13, 313–319, FIGs.5, 10, 11.
“inputting sound source information recorded in advance in synchronization with time information of a time;
Namba describes receiving audio information and metadata about objects in a piece of content. Id. at ¶¶ 48, 53, 65–83. The content may include a video of a sports match, like a soccer game. Id. at ¶¶ 48, 53, 68. The information may include an audio recording of a soccer ball being kicked. Id.
“inputting position information of a first sound source at the time;
The metadata includes object position at various times within the content corresponding to frames of video. Id. at ¶¶ 72–80.
“generating a first virtual sound source for reproducing the first sound source in the real space by using the sound source information and the sound source position information at the time;
“step of reproducing the virtual sound source in the real space;
Namba analyzes the metadata in order to generate and reproduce a spatialized version of the object’s sound based on an arbitrarily determined listener position. Id. at ¶¶ 114–129.
“inputting first sound source attribute information for expressing an attribute of the first sound source at the time by using an image and reproducing the attribute in a real space;
“generating a sound source attribute synthesis image for reproducing the first sound source in the real space by using the position information of the first sound source and the first sound source attribute information at the time, wherein the sound source attribute synthesis image is generated at a substantially similar virtual position as the first sound source, in which the virtual position of the first sound source visually expresses associated sound source coordinates; and
“displaying the sound source attribute synthesis image in the real space.”
The Namba reference does not describe the generation and display of a sound source attribute synthesis image in real space, let alone at a similar virtual position as a first sound source to visually express associated sound source coordinates.
Table 2
The table above shows that the Namba reference describes a method that corresponds closely to the claimed method. The Namba reference does not anticipate the claimed method, however, because Namba does not describe a corresponding mechanism for generating and displaying a synthesized image from sound source attribute information at a time by using an image. Namba also does not describe that the synthesized image is displayed at a similar virtual position as a first sound source to visually express associated sound source coordinates.
Generating and Displaying a Synthesized Image
The differences between the claimed invention and the Namba reference are such that the invention as a whole would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention. Namba describes a device capable of accurately spatializing sounds from a user-selectable viewing position in a variety of content contexts, such as a sports field, a theatre, a live show venue, a theme park and an orchestral performance. Namba at ¶¶ 26–45. Though Namba’s content includes both audible and visual elements, Namba does not describe any mechanism for enhancing the visual elements of the content.
The Namba reference teaches that its device can enhance the realistic presentation of a sporting event by modifying certain audio characteristics of the event. The White reference similarly relates to enhancing the presentation of sporting events. White at Abs., ¶¶ 2–7, 25–30, FIG.1. Rather than describing techniques for enhancing audio characteristics, the White reference teaches and suggests a device for enhancing visual characteristics of a sporting event. Id. White teaches tracking objects, such as a ball in a sporting game, in order to determine the object’s position, direction and speed at different points in time. Id. White teaches and suggests that a detected object, like a ball, is tracked in a sequence of time-indexed images. Id. at ¶ 61, FIG.10. The White device uses the object’s location, direction and speed to synthesize and apply a graphic to the object in order to highlight, or enhance, the object and to display the object’s path over a sequence of image frames. Id.
Read together, the Namba and the White references reasonably suggest modifying the Namba device to include both audible and visual elements of a sporting event. The Namba device would be modified to include features taught and suggested by White. For example, one of ordinary skill would modify the Namba device to include a processing facility 140, like the one taught by White. Processing facility 140 would receive sequences of images from cameras 120 and 130 that are recording a sporting event. Processing facility 140 would process the images to identify and track objects, such as a ball. Parameters tracked by processing facility 140 would include the object’s position, direction and speed. Processing facility 140 would then enhance the video produced for the sporting event by highlighting the tracked object. As an example, a ball would be highlighted by the addition of a graphical overlay at the position of the ball in each image frame. Additionally, processing facility 140 would blend a sequence of image frames to show the tracked object’s motion throughout a sporting space. The enhanced images would then be displayed at a user’s display device to improve the visibility of the tracked object.
Positioning a Synthesized Image at the Virtual Position of a Sound Source to Visually Express Associated Sound Source Coordinates
The combination of Namba and White shows the obviousness of modifying Namba’s system and method to generate synthesized images, such as a ball highlighted with graphical overlays that facilitate tracking the object’s motion through a sporting space. Further, the Akutsu reference teaches and suggests an additional modification. Akutsu is drawn to an immersive telepresence system that enhances conventional television and surround sound systems. Akutsu at Abs., § 1. Sounds and images from a sporting venue are captured in real time. Id. at § 6, FIG.1. The images are processed to extract particular objects, such as an athlete, and encoded for distribution to a remote viewing venue. Id. at §§ 3, 4, FIG.1. The remote venue receives the transmission and uses a pseudo-3D display device and an array of ultrasonic loudspeakers to present the images and audio together in the remote venue. Id. at FIGs.1, 2, 5. The unique combination of elements creates a pseudo-3D display of extracted images in addition to sound space reconstruction to place a virtual sound source, or acoustic image, alongside the visual image of the athlete. Id. at § 5, FIGs.2, 5. This presentation reproduces the extracted subject and its corresponding sound as if those in the remote venue were at the sporting venue. Id.
Read in combination with Namba and White, Akutsu reasonably suggests modifying Namba to include a similar recording, processing, encoding, transmission and reproduction system. Rather than relying on a remote venue’s conventional television displays and surround sound speakers to reproduce events at a sporting venue, a pseudo-3D display and array of ultrasonic transducers may be used to virtually project the image and sounds of an extracted subject at the remote venue. The images and sounds would be collocated as claimed in order to present a unified representation of the subject that expresses both the subject’s visual and audible characteristics. For the foregoing reasons, the combination of the Namba, the White and the Akutsu references makes obvious all limitations of the claim.
Claim 6 depends on claim 5 and further requires the following:
“further comprising: inputting the position information of the first sound source and position information of a second sound source at the time;
“inputting second sound source attribute information for expressing an attribute of the second sound source at the time by sound and for reproducing the attribute in the real space;
“generating a second virtual sound source for reproducing the second sound source attribute information in the real space by using the position information of the second sound source and the second sound source attribute information at the time; and
“reproducing the second virtual sound source in the real space.”
The Namba reference similarly describes receiving position information concerning multiple objects that serve as sound sources. Namba at ¶¶ 2, 49–54, FIG.1. Each object is associated with sound source attribute information, such as a source type indicating the type of sound associated with the object. Id. at ¶¶ 2, 53, 66–70. Namba describes generating sounds for each sound source based on the source type information and the position in order to spatialize each sound relative to a listening position using vector-based amplitude panning, wave field synthesis or head-related transfer functions. Id. at ¶¶ 3, 115, 216. For the foregoing reasons, the combination of the Namba, the White and the Akutsu references makes obvious all limitations of the claim.
Claim 10 depends on claim 5 and further requires the following:
“wherein the generating the sound source attribute synthesis image further comprises generating the sound source attribute synthesis image of a different mode according to a speed of the first sound source.”
Similarly, the White reference teaches and suggests displaying statistical information concerning the speed of a sports object. White at ¶ 30, FIG.10. For the foregoing reasons, the combination of the Namba, the White and the Akutsu references makes obvious all limitations of the claim.
Claim 11 depends on claim 5 and further requires the following:
“wherein the receiving the position information further comprises receiving, as input, the position information of the first sound source and position information of a second sound source at the time, and
“the processor further configured to execute operations comprising:
“receiving, as input second sound source attribute information for expressing an attribute of the second sound source at the time by sound and
“reproducing the attribute in the real space, generating a second virtual sound source for reproducing the second sound source attribute information in the real space by using the position information of the second sound source and the second sound source attribute information at the time, and
“reproducing the second virtual sound source in the real space.”
The Namba reference similarly describes receiving position information concerning multiple objects that serve as sound sources. Namba at ¶¶ 2, 49–54, FIG.1. Each object is associated with sound source attribute information, such as a source type indicating the type of sound associated with the object. Id. at ¶¶ 2, 53, 66–70. Namba describes generating sounds for each sound source based on the source type information and the position in order to spatialize each sound relative to a listening position using vector-based amplitude panning, wave field synthesis or head-related transfer functions. Id. at ¶¶ 3, 115, 216. For the foregoing reasons, the combination of the Namba, the White and the Akutsu references makes obvious all limitations of the claim.
Claim 12 depends on claim 11 and further requires the following:
“wherein, when there is a plurality of the second sound sources, the generating the second sound source further comprises generating the second virtual sound sources of different modes.”
The Namba reference describes generating sounds of different modes for each audio source depending on the type of the audio source. Namba at ¶¶ 2, 53, 66–70. For the foregoing reasons, the combination of the Namba, the White and the Akutsu references makes obvious all limitations of the claim.
Claim 13 depends on claim 5 and further requires the following:
“wherein the first sound source includes a ball used in a soccer,
“the attribute of the first sound source includes a velocity of movement of the ball, and
“wherein the first virtual sound source for reproducing the first sound source in the real space includes an image of the ball with a color representing the velocity of the movement of the ball and with a size representing a position of the ball according to the position information of the first sound source.”
The obviousness rejection of claim 5, incorporated herein, shows the obviousness of modifying Namba’s device to include features of White’s device in order to highlight sports objects, such as a ball. Further, Namba describes tracking and highlighting a soccer ball. Namba at ¶ 48. White teaches and suggests determining the ball’s velocity and highlighting it by displaying a graphic of the ball and its path in a sequence of frames in addition to a graphic containing statistical information, such as velocity. White at ¶ 61, FIG.10.
Neither Namba nor White teaches or suggests altering the ball’s color to represent velocity or altering the ball’s size according to position information. However, Imes teaches and suggests visually highlighting sports balls by adding different colored graphics based on speed. Imes at ¶ 256. For example, a red color is added to indicate a high speed ball. Id. And the Tamir reference teaches and suggests scaling the size of graphics used to highlight tracked sports objects based on the size of the ball as it varies with depth. Tamir at ¶¶ 252–259. Based on these teachings, one of ordinary skill would have reasonably considered further modifying Namba’s device so that its object graphics signal speed with color and distance or depth from the viewing perspective with size. For the foregoing reasons, the combination of the Namba, the White, the Imes and the Tamir references makes obvious all limitations of the claim.
Claim 14 depends on claim 5 and further requires the following:
“wherein the sound source attribute synthesis image includes a visual representation of the first sound source and a visual indication of a direction and a speed of the first sound source in motion.”
The obviousness rejection of claim 5, incorporated herein, shows the obviousness of modifying Namba’s device to include features of White’s device in order to highlight sports objects, such as a ball. Further, Namba describes tracking and highlighting a soccer ball. Namba at ¶ 48. White teaches and suggests determining the ball’s velocity and highlighting it by displaying a graphic of the ball and its path in a sequence of frames in addition to a graphic containing statistical information, such as velocity. White at ¶ 61, FIG.10. For the foregoing reasons, the combination of the Namba, the White and the Akutsu references makes obvious all limitations of the claim.
Claim 7 is drawn to “a computer-readable non-transitory recording medium storing computer-executable program instructions that when executed by a processor cause a computer to execute operations.” The following table illustrates the correspondence between the claimed medium and the Namba reference.
Claim 7
The Namba Reference
“7. A computer-readable non-transitory recording medium storing computer-executable program instructions that when executed by a processor cause a computer to execute operations comprising:
The Namba reference similarly describes a signal processing device, method and computer readable medium with instructions that cause a processor 501 to generate and reproduce sounds. Namba at Abs., ¶¶ 11–13, 313–319, FIGs.5, 10, 11.
“receiving, as input, sound source information recorded in advance in synchronization with time information of a time;
Namba describes receiving audio information and metadata about objects in a piece of content. Id. at ¶¶ 48, 53, 65–83. The content may include a video of a sports match, like a soccer game. Id. at ¶¶ 48, 53, 68. The information may include an audio recording of a soccer ball being kicked. Id.
“receiving, as input, position information of a first sound source at the time;
The metadata includes object position at various times within the content corresponding to frames of video. Id. at ¶¶ 72–80.
“generating a first virtual sound source for reproducing the first sound source in the real space by using the sound source information and the sound source position information at the time;
“reproducing the virtual sound source in the real space;
Namba analyzes the metadata in order to generate and reproduce a spatialized version of the object’s sound based on an arbitrarily determined listener position. Id. at ¶¶ 114–129.
“receiving, as input, first sound source attribute information for expressing an attribute of the first sound source at the time by using an image and for reproducing the attribute in a real space;
“generating a sound source attribute synthesis image for reproducing the first sound source attribute information in the real space by using the position information of the first sound source and the first sound source attribute information at the time, wherein the sound source attribute synthesis image is generated at a substantially similar virtual position as the first sound source, in which the virtual position of the first sound source visually expresses associated sound source coordinates; and
“displaying the sound source attribute synthesis image in the real space.”
The Namba reference does not describe the generation and display of a sound source attribute synthesis image in real space, let alone at a similar virtual position as a first sound source to visually express associated sound source coordinates.
Table 3
The table above shows that the Namba reference describes a medium that corresponds closely to the claimed medium. The Namba reference does not anticipate the claimed medium, however, because Namba does not describe a corresponding mechanism for generating and displaying a synthesized image from sound source attribute information at a time by using an image. Namba also does not describe that the synthesized image is displayed at a similar virtual position as a first sound source to visually express associated sound source coordinates.
Generating and Displaying a Synthesized Image
The differences between the claimed invention and the Namba reference are such that the invention as a whole would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention. Namba describes a device capable of accurately spatializing sounds from a user-selectable viewing position in a variety of content contexts, such as a sports field, a theatre, a live show venue, a theme park and an orchestral performance. Namba at ¶¶ 26–45. Though Namba’s content includes both audible and visual elements, Namba does not describe any mechanism for enhancing the visual elements of the content.
The Namba reference teaches that its device can enhance the realistic presentation of a sporting event by modifying certain audio characteristics of the event. The White reference similarly relates to enhancing the presentation of sporting events. White at Abs., ¶¶ 2–7, 25–30, FIG.1. Rather than describing techniques for enhancing audio characteristics, the White reference teaches and suggests a device for enhancing visual characteristics of a sporting event. Id. White teaches tracking objects, such as a ball in a sporting game, in order to determine the object’s position, direction and speed at different points in time. Id. White teaches and suggests that a detected object, like a ball, is tracked in a sequence of time-indexed images. Id. at ¶ 61, FIG.10. The White device uses the object’s location, direction and speed to synthesize and apply a graphic to the object in order to highlight, or enhance, the object and to display the object’s path over a sequence of image frames. Id.
Read together, the Namba and the White references reasonably suggest modifying the Namba device to include both audible and visual elements of a sporting event. The Namba device would be modified to include features taught and suggested by White. For example, one of ordinary skill would modify the Namba device to include a processing facility 140, like the one taught by White. Processing facility 140 would receive sequences of images from cameras 120 and 130 that are recording a sporting event. Processing facility 140 would process the images to identify and track objects, such as a ball. Parameters tracked by processing facility 140 would include the object’s position, direction and speed. Processing facility 140 would then enhance the video produced for the sporting event by highlighting the tracked object. As an example, a ball would be highlighted by the addition of a graphical overlay at the position of the ball in each image frame. Additionally, processing facility 140 would blend a sequence of image frames to show the tracked object’s motion throughout a sporting space. The enhanced images would then be displayed at a user’s display device to improve the visibility of the tracked object.
Positioning a Synthesized Image at the Virtual Position of a Sound Source to Visually Express Associated Sound Source Coordinates
The combination of Namba and White shows the obviousness of modifying Namba’s system and method to generate synthesized images, such as a ball highlighted with graphical overlays that facilitate tracking the object’s motion through a sporting space. Further, the Akutsu reference teaches and suggests an additional modification. Akutsu is drawn to an immersive telepresence system that enhances conventional television and surround sound systems. Akutsu at Abs., § 1. Sounds and images from a sporting venue are captured in real time. Id. at § 6, FIG.1. The images are processed to extract particular objects, such as an athlete, and encoded for distribution to a remote viewing venue. Id. at §§ 3, 4, FIG.1. The remote venue receives the transmission and uses a pseudo-3D display device and an array of ultrasonic loudspeakers to present the images and audio together in the remote venue. Id. at FIGs.1, 2, 5. The unique combination of elements creates a pseudo-3D display of extracted images in addition to sound space reconstruction to place a virtual sound source, or acoustic image, alongside the visual image of the athlete. Id. at § 5, FIGs.2, 5. This presentation reproduces the extracted subject and its corresponding sound as if those in the remote venue were at the sporting venue. Id.
Read in combination with Namba and White, Akutsu reasonably suggests modifying Namba to include a similar recording, processing, encoding, transmission and reproduction system. Rather than relying on a remote venue’s conventional television displays and surround sound speakers to reproduce events at a sporting venue, a pseudo-3D display and array of ultrasonic transducers may be used to virtually project the image and sounds of an extracted subject at the remote venue. The images and sounds would be collocated as claimed in order to present a unified representation of the subject that expresses both the subject’s visual and audible characteristics. For the foregoing reasons, the combination of the Namba, the White and the Akutsu references makes obvious all limitations of the claim.
Claim 15 depends on claim 7 and further requires the following:
“wherein the generating the sound source attribute synthesis image further comprises generating the sound source attribute synthesis image of a different mode according to a speed of the first sound source.”
Similarly, the White reference teaches and suggests displaying statistical information concerning the speed of a sports object. White at ¶ 30, FIG.10. For the foregoing reasons, the combination of the Namba, the White and the Akutsu references makes obvious all limitations of the claim.
Claim 16 depends on claim 7 and further requires the following:
“wherein the receiving the position information further comprises receiving, as input, the position information of the first sound source and position information of a second sound source at the time, and
“the processor further configured to execute operations comprising:
“receiving, as input second sound source attribute information for expressing an attribute of the second sound source at the time by sound and reproducing the attribute in the real space,
“generating a second virtual sound source for reproducing the second sound source attribute information in the real space by using the position information of the second sound source and the second sound source attribute information at the time, and
“reproducing the second virtual sound source in the real space.”
The Namba reference similarly describes receiving position information concerning multiple objects that serve as sound sources. Namba at ¶¶ 2, 49–54, FIG.1. Each object is associated with sound source attribute information, such as a source type indicating the type of sound associated with the object. Id. at ¶¶ 2, 53, 66–70. Namba describes generating sounds for each sound source based on the source type information and the position in order to spatialize each sound relative to a listening position using vector-based amplitude panning, wave field synthesis or head-related transfer functions. Id. at ¶¶ 3, 115, 216. For the foregoing reasons, the combination of the Namba, the White and the Akutsu references makes obvious all limitations of the claim.
Claim 17 depends on claim 16 and further requires the following:
“wherein, when there is a plurality of the second sound sources, the generating the second sound source further comprises generating the second virtual sound sources of different modes.”
The Namba reference describes generating sounds of different modes for each audio source depending on the type of the audio source. Namba at ¶¶ 2, 53, 66–70. For the foregoing reasons, the combination of the Namba, the White and the Akutsu references makes obvious all limitations of the claim.
Claim 18 depends on claim 7 and further requires the following:
“wherein the first sound source includes a ball used in a soccer,
“the attribute of the first sound source includes a velocity of movement of the ball, and
“wherein the first virtual sound source for reproducing the first sound source in the real space includes an image of the ball with a color representing the velocity of the movement of the ball and with a size representing a position of the ball according to the position information of the first sound source.”
The obviousness rejection of claim 7, incorporated herein, shows the obviousness of modifying Namba’s device to include features of White’s device in order to highlight sports objects, such as a ball. Further, Namba describes tracking and highlighting a soccer ball. Namba at ¶ 48. White teaches and suggests determining the ball’s velocity and highlighting it by displaying a graphic of the ball and its path in a sequence of frames in addition to a graphic containing statistical information, such as velocity. White at ¶ 61, FIG.10.
Neither Namba nor White teaches or suggests altering the ball’s color to represent velocity or altering the ball’s size according to position information. However, Imes teaches and suggests visually highlighting sports balls by adding different colored graphics based on speed. Imes at ¶ 256. For example, a red color is added to indicate a high speed ball. Id. And the Tamir reference teaches and suggests scaling the size of graphics used to highlight tracked sports objects based on the size of the ball as it varies with depth. Tamir at ¶¶ 252–259. Based on these teachings, one of ordinary skill would have reasonably considered further modifying Namba’s device so that its object graphics signal speed with color and distance or depth from the viewing perspective with size. For the foregoing reasons, the combination of the Namba, the White, the Imes and the Tamir references makes obvious all limitations of the claim.
Claim 19 depends on claim 7 and further requires the following:
“wherein the sound source attribute synthesis image includes a visual representation of the first sound source and a visual indication of a direction and a speed of the first sound source in motion.”
The obviousness rejection of claim 7, incorporated herein, shows the obviousness of modifying Namba’s device to include features of White’s device in order to highlight sports objects, such as a ball. Further, Namba describes tracking and highlighting a soccer ball. Namba at ¶ 48. White teaches and suggests determining the ball’s velocity and highlighting it by displaying a graphic of the ball and its path in a sequence of frames in addition to a graphic containing statistical information, such as velocity. White at ¶ 61, FIG.10. For the foregoing reasons, the combination of the Namba and the White references makes obvious all limitations of the claim.
Summary
Claims 1–19 are rejected under 35 U.S.C. § 103 as being unpatentable over the cited prior art. In the event the determination of the status of the application as subject to AIA 35 U.S.C. §§ 102 and 103 (or as subject to pre-AIA 35 U.S.C. §§ 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 C.F.R. § 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. § 102(b)(2)(C) for any potential 35 U.S.C. § 102(a)(2) prior art against the later invention.
Response to Applicant’s Arguments
Applicant’s Reply (25 November 2025) substantively amended all of the claims, and the rejections have been updated accordingly in this Office action.
Applicant’s Reply at 9–10 additionally includes comments pertaining to the rejections presented in the Non-Final Rejection (25 August 2025). The Examiner has considered those comments, but they are moot in light of the new grounds of rejection presented in this Office action.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 C.F.R. § 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 C.F.R. § 1.17(a)) pursuant to 37 C.F.R. § 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WALTER F BRINEY III whose telephone number is (571)272-7513. The examiner can normally be reached M-F 8 am-4:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Carolyn Edwards, can be reached at 571-270-7136. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Walter F Briney III/
Walter F Briney III
Primary Examiner
Art Unit 2692
1/23/2026