DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.
The following title is suggested: “Virtual space interface device and method with control based on position of hands with respect to user’s face”.
Claim Objections
Claims 1 and 4 are objected to because of the following informalities:
Claim 1 line 12 reads “of the client terminal to display an image showing a situation in the virtual space”. For proper antecedent basis, this should read “of the client terminal to display the image showing the situation in the virtual space” because “an image showing a situation” was already introduced in lines 4-5.
Claim 1 line 15 reads “of the client terminal to output a sound in the virtual space”. For proper antecedent basis, this should read “of the client terminal to output the sound in the virtual space” because “a sound in the virtual space” was already introduced in line 6.
Claim 4 line 10 reads “terminal to output a sound in the virtual space”. For proper antecedent basis, this should read “terminal to output the sound in the virtual space” because “a sound in the virtual space” was already introduced in line 4.
Appropriate correction is required.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitations use a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitations are:
“display data generating unit” in claims 1-3, which maps to a server in Fig. 1.
“sound data generating unit” in claims 1-6, which maps to a server in Fig. 1.
Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitations to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 4, 7 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Jarvinen et al. in US 2020/0209952 (hereinafter Jarvinen) in view of Osman in US 2018/0311585 (hereinafter Osman).
Regarding claim 1, Jarvinen discloses a virtual space interface device (Jarvinen’s Fig. 1 and par. 142: see 104) provided in a virtual space providing system (Jarvinen’s Fig. 1 and par. 142, 252: see 100) having at least a client terminal used by a user (Jarvinen’s Fig. 1 and par. 151: see 107, 102), wherein the client terminal includes a display device (Jarvinen’s Fig. 1 and par. 155: see 110) configured to display an image showing a situation in the virtual space (Jarvinen’s par. 155: visual imagery in VR), a sound output device (Jarvinen’s Fig. 1 and par. 155: see 111) configured to output a sound in the virtual space (Jarvinen’s par. 155: audio content of the VR), a sound pickup device (Jarvinen’s par. 119, 124, 127: microphone) configured to pick up a sound uttered by the user (Jarvinen’s par. 119, 124, 127: capture aural scene), and a photographing device (Jarvinen’s par. 119, 205: camera) configured to capture a facial image of the user (Jarvinen’s par. 119, 205: camera detects free-space gestures, where the gesture includes a face per par. 208), wherein the virtual space interface device (Jarvinen’s Fig. 1 and par. 142: see 104) includes a display data generating unit (112f: server)(Jarvinen’s par. 150-152, 252) configured to generate display data for causing the display device of the client terminal to display the image showing the situation in the virtual space (Jarvinen’s par. 155: 101 in communication with 104 to display visual imagery content of the visual scene) and
a sound data generating unit (112f: server)(Jarvinen’s par. 150-152, 252) configured to generate sound data for causing the sound output device of the client terminal to output the sound in the virtual space (Jarvinen’s par. 155: 101 in communication with 104 to present the spatial audio content of the audio scene), wherein the sound data generating unit (112f: server)(Jarvinen’s par. 150-152, 252) generates sound data for outputting the user-uttered sound (Jarvinen’s par. 124: aural scene captured, par. 205: voice command) picked up by the sound pickup device of the client terminal (Jarvinen’s par. 124: microphone) into the virtual space (Jarvinen’s par. 192-193: speech bubbles, par. 205: voice command), wherein the display data generating unit (112f: server)(Jarvinen’s par. 150-152, 252) and the sound data generating unit (112f: server)(Jarvinen’s par. 150-152, 252) control at least one item of the display data (as shown in Jarvinen’s Figs. 8-9) for causing the display device of the client terminal (Jarvinen’s Figs. 1, 8-9: see 110) to display the image showing the situation in the virtual space (Jarvinen’s Figs. 8-9), the sound data for causing the sound output device of the client terminal to output the sound in the virtual space (Jarvinen’s Figs. 8-9 and par. 192-193: speech), and the sound data for outputting the sound uttered by the user into the virtual space (Jarvinen’s par. 205: voice command), as a control target (Jarvinen’s Figs. 8-9 and par. 192-193: orientation of display and sound being presented, par. 205: voice command), on the basis of a gesture of positioning the user's hands at an area of the user's face (Jarvinen’s Fig. 11 and par. 206-208) photographed by the photographing device of the client terminal (Jarvinen’s par. 119, 205) and a positional relationship of the user (Jarvinen’s Fig. 10 and par. 203: user has a close up view of the band by moving forward from location 1018 to 1017), and wherein the display data generating unit (112f: server)(Jarvinen’s par. 150-152, 252) and the sound data generating unit (112f: server)(Jarvinen’s par. 150-152, 252) differentiate the control target in accordance with a part of the face area where the user positions the user's hands (Jarvinen’s Fig. 11 and par. 207: hand to ear for locking audio scene, vs. hand to eyes to lock visual scene).
Jarvinen fails to disclose a positional relationship between the photographing device of the client terminal and the user's face.
However, in the same field of endeavor of VR interfaces, Osman discloses a user of an HMD (Osman’s Fig. 2B and par. 96) with a positional relationship between a photographing device and the user’s face (Osman’s Fig. 5 and par. 103-104: distance between the face and the camera of a hand-held device) used to control the image viewed through an external hand-held device (Osman’s Fig. 5).
Therefore, it would have been obvious to one of ordinary skill in the art that Jarvinen controls at least one item of the display data as a control target not only on the basis of the gesture (Jarvinen’s Figs. 8-9), but also on the basis of the distance between the photographing device and the user’s face (Osman’s Fig. 5), in order to obtain the benefit of enabling the user to see more or less content when using an external hand-held device (Osman’s par. 102), and because Jarvinen already discloses zooming content based on distance (Jarvinen’s Fig. 10).
With this combination, Jarvinen in view of Osman discloses:
A virtual space interface device (Jarvinen’s Fig. 1 and par. 142: see 104) provided in a virtual space providing system (Jarvinen’s Fig. 1 and par. 142, 252: see 100) having at least a client terminal used by a user (Jarvinen’s Fig. 1 and par. 151: see 107, 102),
wherein the client terminal includes
a display device (Jarvinen’s Fig. 1 and par. 155: see 110) configured to display an image showing a situation in the virtual space (Jarvinen’s par. 155: visual imagery in VR),
a sound output device (Jarvinen’s Fig. 1 and par. 155: see 111) configured to output a sound in the virtual space (Jarvinen’s par. 155: audio content of the VR),
a sound pickup device (Jarvinen’s par. 119, 124, 127: microphone) configured to pick up a sound uttered by the user (Jarvinen’s par. 119, 124, 127: capture aural scene), and
a photographing device (Jarvinen’s par. 119, 205: camera) configured to capture a facial image of the user (Jarvinen’s par. 119, 205: camera detects free-space gestures, where the gesture includes a face per par. 208),
wherein the virtual space interface device (Jarvinen’s Fig. 1 and par. 142: see 104) includes
a display data generating unit (112f: server)(Jarvinen’s par. 150-152, 252) configured to generate display data for causing the display device of the client terminal to display the image showing the situation in the virtual space (Jarvinen’s par. 155: 101 in communication with 104 to display visual imagery content of the visual scene) and
a sound data generating unit (112f: server)(Jarvinen’s par. 150-152, 252) configured to generate sound data for causing the sound output device of the client terminal to output the sound in the virtual space (Jarvinen’s par. 155: 101 in communication with 104 to present the spatial audio content of the audio scene),
wherein the sound data generating unit (112f: server)(Jarvinen’s par. 150-152, 252) generates sound data for outputting the user-uttered sound (Jarvinen’s par. 124: aural scene captured, par. 205: voice command) picked up by the sound pickup device of the client terminal (Jarvinen’s par. 124: microphone) into the virtual space (Jarvinen’s par. 192-193: speech bubbles, par. 205: voice command),
wherein the display data generating unit (112f: server)(Jarvinen’s par. 150-152, 252) and the sound data generating unit (112f: server)(Jarvinen’s par. 150-152, 252) control at least one item of the display data (as shown in Jarvinen’s Figs. 8-9) for causing the display device of the client terminal (Jarvinen’s Figs. 1, 8-9: see 110) to display the image showing the situation in the virtual space (Jarvinen’s Figs. 8-9), the sound data for causing the sound output device of the client terminal to output the sound in the virtual space (Jarvinen’s Figs. 8-9 and par. 192-193: speech), and the sound data for outputting the sound uttered by the user into the virtual space (Jarvinen’s par. 205: voice command), as a control target (Jarvinen’s Figs. 8-9 and par. 192-193: orientation of display and sound being presented, par. 205: voice command), on the basis of a gesture of positioning the user's hands at an area of the user's face (Jarvinen’s Fig. 11 and par. 206-208) photographed by the photographing device of the client terminal (Jarvinen’s par. 119, 205) and a positional relationship between the photographing device of the client terminal and the user's face (upon combination with Osman’s Figs. 2B, 5 and par. 96, 103-104: the user can zoom in or out of an observed image based on the distance between a camera and the face of the user), and
wherein the display data generating unit (112f: server)(Jarvinen’s par. 150-152, 252) and the sound data generating unit (112f: server)(Jarvinen’s par. 150-152, 252) differentiate the control target in accordance with a part of the face area where the user positions the user's hands (Jarvinen’s Fig. 11 and par. 207: hand to ear for locking audio scene, vs. hand to eyes to lock visual scene).
Regarding claim 4, Jarvinen discloses a virtual space interface device (Jarvinen’s Fig. 1 and par. 142: see 104) provided in a virtual space providing system (Jarvinen’s Fig. 1 and par. 142, 252: see 100) having at least a client terminal used by a user (Jarvinen’s Fig. 1 and par. 151: see 107, 102), wherein the client terminal includes a sound output device (Jarvinen’s Fig. 1 and par. 155: see 111) configured to output a sound in the virtual space (Jarvinen’s par. 155: audio content of the VR), a sound pickup device (Jarvinen’s par. 119, 124, 127: microphone) configured to pick up a sound uttered by the user (Jarvinen’s par. 119, 124, 127: capture aural scene), and a photographing device (Jarvinen’s par. 119, 205: camera) configured to capture a facial image of the user (Jarvinen’s par. 119, 205: camera detects free-space gestures, where the gesture includes a face per par. 208), wherein the virtual space interface device (Jarvinen’s Fig. 1 and par. 142: see 104) includes a sound data generating unit (112f: server)(Jarvinen’s par. 150-152, 252) configured to generate sound data for causing the sound output device of the client terminal to output the sound in the virtual space (Jarvinen’s par. 155: 101 in communication with 104 to present the spatial audio content of the audio scene), wherein the sound data generating unit (112f: server)(Jarvinen’s par. 150-152, 252) generates sound data for outputting the user-uttered sound (Jarvinen’s par. 124: aural scene captured, par. 205: voice command) picked up by the sound pickup device of the client terminal (Jarvinen’s par. 124: microphone) into the virtual space (Jarvinen’s par. 192-193: speech bubbles, par. 205: voice command), and wherein the sound data generating unit (112f: server)(Jarvinen’s par. 150-152, 252) controls at least one item of the sound data (Jarvinen’s Figs. 8-10 and par. 191-201) for causing the sound output device of the client terminal to output the sound in the virtual space (Jarvinen’s Figs. 8-9 and par. 192-193: speech) and the sound data for outputting the sound uttered by the user into the virtual space (Jarvinen’s par. 205: voice command), as a control target (Jarvinen’s Figs. 8-10 and par. 192-193, 201: orientation of display and sound being presented, par. 205: voice command), on the basis of a gesture of positioning the user's hands at an area of the user's face (Jarvinen’s Fig. 11 and par. 206-208) photographed by the photographing device of the client terminal (Jarvinen’s par. 119, 205) and differentiates the control target in accordance with a part of the face area where the user positions the user's hands (Jarvinen’s Fig. 11 and par. 207: hand to ear for locking audio scene, vs. hand to eyes to lock visual scene).
Jarvinen fails to disclose a positional relationship between the photographing device of the client terminal and the user's face.
However, in the same field of endeavor of VR interfaces, Osman discloses a user of an HMD (Osman’s Fig. 2B and par. 96) with a positional relationship between a photographing device and the user’s face (Osman’s Fig. 5 and par. 103-104: distance between the face and the camera of a hand-held device) used to control the image viewed through an external hand-held device (Osman’s Fig. 5).
Therefore, it would have been obvious to one of ordinary skill in the art that Jarvinen controls at least one item of the display data as a control target not only on the basis of the gesture (Jarvinen’s Figs. 8-9), but also on the basis of the distance between the photographing device and the user’s face (Osman’s Fig. 5), in order to obtain the benefit of enabling the user to see more or less content when using an external hand-held device (Osman’s par. 102), and because Jarvinen already discloses zooming content based on distance (Jarvinen’s Fig. 10).
With this combination, Jarvinen in view of Osman discloses:
A virtual space interface device (Jarvinen’s Fig. 1 and par. 142: see 104) provided in a virtual space providing system (Jarvinen’s Fig. 1 and par. 142, 252: see 100) having at least a client terminal used by a user (Jarvinen’s Fig. 1 and par. 151: see 107, 102),
wherein the client terminal includes
a sound output device (Jarvinen’s Fig. 1 and par. 155: see 111) configured to output a sound in the virtual space (Jarvinen’s par. 155: audio content of the VR),
a sound pickup device (Jarvinen’s par. 119, 124, 127: microphone) configured to pick up a sound uttered by the user (Jarvinen’s par. 119, 124, 127: capture aural scene), and
a photographing device (Jarvinen’s par. 119, 205: camera) configured to capture a facial image of the user (Jarvinen’s par. 119, 205: camera detects free-space gestures, where the gesture includes a face per par. 208),
wherein the virtual space interface device (Jarvinen’s Fig. 1 and par. 142: see 104) includes
a sound data generating unit (112f: server)(Jarvinen’s par. 150-152, 252) configured to generate sound data for causing the sound output device of the client terminal to output the sound in the virtual space (Jarvinen’s par. 155: 101 in communication with 104 to present the spatial audio content of the audio scene),
wherein the sound data generating unit (112f: server)(Jarvinen’s par. 150-152, 252) generates sound data for outputting the user-uttered sound (Jarvinen’s par. 124: aural scene captured, par. 205: voice command) picked up by the sound pickup device of the client terminal (Jarvinen’s par. 124: microphone) into the virtual space (Jarvinen’s par. 192-193: speech bubbles, par. 205: voice command), and
wherein the sound data generating unit (112f: server)(Jarvinen’s par. 150-152, 252) controls at least one item of the sound data (Jarvinen’s Figs. 8-10 and par. 191-201) for causing the sound output device of the client terminal to output the sound in the virtual space (Jarvinen’s Figs. 8-9 and par. 192-193: speech) and the sound data for outputting the sound uttered by the user into the virtual space (Jarvinen’s par. 205: voice command), as a control target (Jarvinen’s Figs. 8-10 and par. 192-193, 201: orientation of display and sound being presented, par. 205: voice command), on the basis of a gesture of positioning the user's hands at an area of the user's face (Jarvinen’s Fig. 11 and par. 206-208) photographed by the photographing device of the client terminal (Jarvinen’s par. 119, 205) and a positional relationship between the photographing device of the client terminal and the user's face (upon combination with Osman’s Figs. 2B, 5 and par. 96, 103-104: the user can zoom in or out of an observed image based on the distance between a camera and the face of the user) and
differentiates the control target in accordance with a part of the face area where the user positions the user's hands (Jarvinen’s Fig. 11 and par. 207: hand to ear for locking audio scene, vs. hand to eyes to lock visual scene).
Regarding claim 7, Jarvinen in view of Osman discloses wherein the sound data generating unit (112f: server)(Jarvinen’s par. 150-152, 252) controls a direction of arrival of the sound from the virtual space output by the sound output device of the client terminal (Jarvinen’s Fig. 10 and par. 201-204: locking orientation of output to orientation 1003) on the basis of an action of the user (Jarvinen’s Fig. 11 and par. 207), who is photographed by the photographing device of the client terminal, placing the user's hands at the user's ears (Jarvinen’s Fig. 11 and par. 207: locking audio scene) and the orientation of the user's face relative to the photographing device of the client terminal (Jarvinen’s Fig. 10 and par. 202: locked audio scene in orientation 1003, where orientation 1003 is determined according to the orientation of the face relative to an external device with a camera upon combination with Osman’s Fig. 5).
It would also have been obvious to one of ordinary skill in the art that the face orientation of Jarvinen’s Fig. 10 is determined as the orientation of the user’s face relative to a photographing device (Osman’s Fig. 5), in order to obtain the predictable result explained for claim 4.
Regarding claim 9, Jarvinen discloses a virtual space interface control method (Jarvinen’s par. 1) for controlling a virtual space providing system (Jarvinen’s Fig. 1 and par. 142, 252: see 100) having at least a client terminal used by a user (Jarvinen’s Fig. 1 and par. 151: see 107, 102), the virtual space interface control method comprising:
generating, by a computer (Jarvinen’s Fig. 1 and par. 1), display data (Jarvinen’s par. 155: display visual imagery content of the visual scene) for causing a display device of the client terminal (Jarvinen’s Fig. 1 and par. 155: see 110) to display an image showing a situation in a virtual space (Jarvinen’s par. 155: visual imagery in VR);
generating, by the computer (Jarvinen’s Fig. 1 and par. 1), first sound data (Jarvinen’s par. 124: aural scene captured, par. 205: voice command) for outputting a user-uttered sound (Jarvinen’s par. 119, 124, 127: capture aural scene) picked up by a sound pickup device of the client terminal (Jarvinen’s par. 119, 124, 127: microphone) into the virtual space (Jarvinen’s par. 205: voice command);
generating, by the computer (Jarvinen’s Fig. 1 and par. 1), second sound data (Jarvinen’s par. 155: present the spatial audio content of the audio scene) for causing a sound output device of the client terminal (Jarvinen’s Fig. 1 and par. 155: see 111) to output a sound in the virtual space (Jarvinen’s par. 155: audio content of the VR); and
performing, by the computer (Jarvinen’s Fig. 1 and par. 1), control by differentiating at least one item of the display data (as shown in Jarvinen’s Figs. 8-9), the first sound data (as shown in Jarvinen’s Fig. 10 or voice command of par. 205), and the second sound data (Jarvinen’s Figs. 8-10 and par. 192-204: speech or audio) in accordance with a part of a face area where the user positions the user's hands on the basis of a gesture of positioning the user's hands at an area of the user's face (Jarvinen’s Fig. 11 and par. 207: hand to ear for locking audio scene, vs. hand to eyes to lock visual scene) photographed by a photographing device of the client terminal (Jarvinen’s par. 119, 205).
Jarvinen fails to disclose a positional relationship between the photographing device of the client terminal and the user's face.
However, in the same field of endeavor of VR interfaces, Osman discloses a user of an HMD (Osman’s Fig. 2B and par. 96) with a positional relationship between a photographing device and the user’s face (Osman’s Fig. 5 and par. 103-104: distance between the face and the camera of a hand-held device) used to control the image viewed through an external hand-held device (Osman’s Fig. 5).
Therefore, it would have been obvious to one of ordinary skill in the art that Jarvinen controls at least one item of the display data as a control target not only on the basis of the gesture (Jarvinen’s Figs. 8-9), but also on the basis of the distance between the photographing device and the user’s face (Osman’s Fig. 5), in order to obtain the benefit of enabling the user to see more or less content when using an external hand-held device (Osman’s par. 102), and because Jarvinen already discloses zooming content based on distance (Jarvinen’s Fig. 10).
With this combination, Jarvinen in view of Osman discloses:
performing, by the computer (Jarvinen’s Fig. 1 and par. 1), control by differentiating at least one item of the display data (as shown in Jarvinen’s Figs. 8-9), the first sound data (as shown in Jarvinen’s Fig. 10 or voice command of par. 205), and the second sound data (Jarvinen’s Figs. 8-10 and par. 192-204: speech or audio) in accordance with a part of a face area where the user positions the user's hands on the basis of a gesture of positioning the user's hands at an area of the user's face (Jarvinen’s Fig. 11 and par. 207: hand to ear for locking audio scene, vs. hand to eyes to lock visual scene) photographed by a photographing device of the client terminal (Jarvinen’s par. 119, 205) and a positional relationship between the photographing device of the client terminal and the user's face (upon combination with Osman’s Figs. 2B, 5 and par. 96, 103-104: the user can zoom in or out of an observed image based on the distance between a camera and the face of the user).
Allowable Subject Matter
Claims 2-3, 5-6 and 8 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Regarding dependent claim 2, the prior art fails to disclose ALL limitations of claim 1, in addition to “wherein the display data generating unit controls at least one of enlargement and reduction of the image showing the situation in the virtual space displayed by the display device of the client terminal on the basis of an action of the user, who is photographed by the photographing device of the client terminal, placing the user's hands over the user's eyes and a distance between the photographing device of the client terminal and the user's face, wherein the sound data generating unit controls a volume of the sound in the virtual space output by the sound output device of the client terminal on the basis of an action of the user, who is photographed by the photographing device of the client terminal, placing the user's hands at the user's ears and the distance between the photographing device of the client terminal and the user's face, and wherein the sound data generating unit controls a volume of the user-uttered sound picked up by the sound pickup device of the client terminal and output into the virtual space on the basis of an action of the user, who is photographed by the photographing device of the client terminal, placing the user's hand at the user's mouth and the distance between the photographing device of the client terminal and the user's face”.
Regarding dependent claim 3, the prior art fails to disclose ALL limitations of claim 1, in addition to “wherein the display data generating unit controls a position corresponding to the image displayed by the display device of the client terminal as a position in the virtual space on the basis of an action of the user, who is photographed by the photographing device of the client terminal, placing the user's hands over the user's eyes and an orientation of the user's face relative to the photographing device of the client terminal, wherein the sound data generating unit controls a direction of arrival of the sound from the virtual space output by the sound output device of the client terminal on the basis of an action of the user, who is photographed by the photographing device of the client terminal, placing the user's hands at the user's ears and the orientation of the user's face relative to the photographing device of the client terminal, and wherein the sound data generating unit controls a direction in which the sound uttered by the user is output to the virtual space on the basis of an action of the user, who is photographed by the photographing device of the client terminal, placing the user's hand at the user's mouth and the orientation of the user's face relative to the photographing device of the client terminal”.
Regarding dependent claim 5, the prior art fails to disclose ALL limitations of claim 4, in addition to “wherein the sound data generating unit controls a volume of the sound in the virtual space output by the sound output device of the client terminal on the basis of an action of the user, who is photographed by the photographing device of the client terminal, placing the user's hands at the user's ears and a distance between the photographing device of the client terminal and the user's face”.
Regarding dependent claim 6, the prior art fails to disclose ALL limitations of claim 4, in addition to “wherein the sound data generating unit controls a volume of the user-uttered sound picked up by the sound pickup device of the client terminal and output into the virtual space on the basis of an action of the user, who is photographed by the photographing device of the client terminal, placing the user's hand at the user's mouth and the distance between the photographing device of the client terminal and the user's face”.
Regarding dependent claim 8, the prior art fails to disclose ALL limitations of claim 4, in addition to “wherein the sound data generating unit controls a direction in which the sound uttered by the user is output to the virtual space on the basis of an action of the user, who is photographed by the photographing device of the client terminal, placing the user's hand at the user's mouth and the orientation of the user's face relative to the photographing device of the client terminal”.
The closest prior art of record, Jarvinen and Osman, fails to disclose the volume control as described in claims 2, 5, or 6.
The closest prior art, Jarvinen, discloses control of sound direction (Fig. 10), but that control is not based on placing the user's hand at the user's mouth as required by claims 3 and 8.
Nor does any other prior art disclose ALL features as claimed.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Liliana Cerullo whose telephone number is (571)270-5882. The examiner can normally be reached 8AM to 3PM MT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Amr Awad can be reached at 571-272-7764. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/LILIANA CERULLO/Primary Examiner, Art Unit 2621