Last updated: July 17, 2026

Application No. 18/733,802

LOCATIONS OF MEDIA CONTROLS FOR MEDIA CONTENT AND CAPTIONS FOR MEDIA CONTENT IN THREE-DIMENSIONAL ENVIRONMENTS

Final Rejection §103

Filed

Jun 04, 2024

Priority

Jun 04, 2023 — provisional 63/506,122 +2 more

Examiner

ZALALEE, SULTANA MARCIA

Art Unit

2614

Tech Center

2600 — Communications

Assignee

Apple Inc.

OA Round

2 (Final)

Interview Optional

+15.1% interview lift. This examiner has a 71% grant rate. An interview has already been conducted in this application's prosecution history, so a written response with narrowed claims, guided by precedent claim evolution patterns, is recommended.

Based on 499 resolved cases, 2023–2026

Examiner Intelligence

ZALALEE, SULTANA MARCIA

Grants 71% — above average

Career Allowance Rate

356 granted / 499 resolved

+9.3% vs TC avg

Strong +15% interview lift

+15.1%

Interview Lift

resolved cases with interview

Typical timeline

2y 7m

Avg Prosecution

26 currently pending

Career history

528

Total Applications

across all art units

Statute-Specific Performance

§101

2.1%

-37.9% vs TC avg

§103

78.4%

+38.4% vs TC avg

§102

2.4%

-37.6% vs TC avg

§112

3.4%

-36.6% vs TC avg

Based on career data from 499 resolved cases

Office Action

§103

DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
This office action is responsive to the communication filed on 04/29/2026. As an initial matter, the claim interpretation under 35 USC 112(f) set forth in the previous office action has been withdrawn in view of Applicant's arguments on pages 15-16.
Applicant's remaining arguments regarding the 35 USC 103 rejections with respect to the amended limitations of claims 19-35 have been fully considered, but they are moot in view of the new ground(s) of rejection necessitated by the amendment.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 19-35 are rejected under 35 U.S.C. 103 as being unpatentable over Faulkner et al (US 20220091722 A1), in view of Kurzhals et al (Kurzhals K, Göbel F, Angerbauer K, Sedlmair M, Raubal M. A view on the viewer: Gaze-adaptive captions for videos. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 2020 Apr 21 (pp. 1-12)), and further in view of Hong et al (US 20220360764 A1).
RE claim 19, Faulkner teaches A method comprising: at a computer system in communication with a display generation component and one or more input devices (Figs 1-3, [0043]): 
while a three-dimensional environment is visible via the display generation component, concurrently displaying, via the display generation component: media content, wherein the media content is displayed from a first viewpoint of a user relative to the three-dimensional environment, and a user interface element that includes one or more captions for the media content, wherein the user interface element has a first spatial relationship relative to a spatial reference in the media content in the three-dimensional environment (Figs 7-8, [0107]-[0108], [0117], [0122], [0209]); 
while concurrently displaying the media content from the first viewpoint of the user relative to the three-dimensional environment and the user interface element  having the first spatial relationship relative to the spatial reference in the media content in the three-dimensional environment, detecting, via the one or more input devices, an event corresponding to a change of a viewpoint of the user relative to the three-dimensional environment (Figs 6-8, [0110], [0117], [0152]); and 
in response to detecting the event corresponding to the change of the viewpoint of the user relative to the three-dimensional environment from the first viewpoint of the user to a second viewpoint of the user relative to the three-dimensional environment, wherein the second viewpoint of the user is different from the first viewpoint of the user: in accordance with an immersive media content: displaying the user interface element having a second spatial relationship relative to the spatial reference in the media content, different from the first spatial relationship relative to the spatial reference in the media content, based on the change of the viewpoint of the user (Figs 7, 8, [0040], [0110], [0117], [0152], [0186], [0200]).
Faulkner is silent RE: that includes one or more captions for the media content. However, Kurzhals teaches (abstract, Figs 1-2, page 2 col 2) that gaze input is used to adapt the position of the captions, which help communicate audio information in loud environments, for viewers with hearing impairments, for unknown languages, etc.
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to include in Faulkner a system and method that includes one or more captions for the media content, as suggested by Kurzhals, as this does not change the overall operation of the system and could be used to present the captions in a comfortable view, thereby increasing system effectiveness and improving the user experience.
Faulkner as modified by Kurzhals is silent RE: a determination that the media content is immersive media content; and in accordance with a determination that the media content is non-immersive media content: maintaining the display of the user interface element including the one or more captions for the media content having the first spatial relationship relative to the spatial reference in the media content.
However, Hong teaches selecting or switching between a 2D and a 3D operating mode based on determining whether the content is a 2D or 3D image, in [0079]: “the display modes of the wearable electronic device 300 may also include a 2D mode for displaying 2D images, as well as the 3D mode. If the wearable electronic device 300 operates in the 3D mode or the display mode is switched from the 2D mode to the 3D mode, the wearable electronic device 300 may send a request for conversion of 3D images to the mobile electronic device 350 that interoperates with the wearable electronic device 300. The mobile electronic device 350 may determine whether currently provided content is a 2D image or a 3D image. If the currently provided content is determined as a 3D image, the mobile electronic device 350 may transmit information of colors corresponding to each of the left eye and the right eye to the wearable electronic device 300. If the currently provided content is determined as a 2D image, the mobile electronic device 350 may perform image processing to convert the 2D image into a 3D image.” This is readily available and can equally be applied to displaying the captions based on whether the content is 3D/immersive or 2D/non-immersive, as Faulkner teaches operating in or switching between immersive and non-immersive modes for immersive movie/video and non-immersive content in [0304], [0330], [0345], to provide an appropriate user experience.
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to include in Faulkner as modified by Kurzhals a system and method including a determination that the media content is immersive media content; and in accordance with a determination that the media content is non-immersive media content: maintaining the display of the user interface element including the one or more captions for the media content having the first spatial relationship relative to the spatial reference in the media content, as set forth above applying Hong, as this does not change the overall operation of the system and could be used to provide a better user experience according to the type of the media content.
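The two-way determination that the rejection reads onto claim 19 can be illustrated with a minimal sketch. Every name in it (`MediaKind`, `CaptionState`, `on_viewpoint_change`, the "first"/"second" labels) is hypothetical and appears in neither the claims nor the cited references; it only shows the branch on media type after a viewpoint change, not how any cited system is implemented.

```python
from dataclasses import dataclass
from enum import Enum


class MediaKind(Enum):
    IMMERSIVE = "immersive"
    NON_IMMERSIVE = "non-immersive"


@dataclass(frozen=True)
class CaptionState:
    # Hypothetical label for the caption element's spatial relationship
    # to the spatial reference in the media content.
    spatial_relationship: str


def on_viewpoint_change(state: CaptionState, kind: MediaKind) -> CaptionState:
    """Return the caption state after the user's viewpoint changes."""
    if kind is MediaKind.IMMERSIVE:
        # Immersive content: the captions take a second, different relationship.
        return CaptionState(spatial_relationship="second")
    # Non-immersive content: the first relationship is maintained unchanged.
    return state


first = CaptionState(spatial_relationship="first")
print(on_viewpoint_change(first, MediaKind.IMMERSIVE).spatial_relationship)
print(on_viewpoint_change(first, MediaKind.NON_IMMERSIVE).spatial_relationship)
```

Under this reading, Faulkner and Kurzhals are relied on for the repositioning branch, and Hong is relied on for the determination that selects between the two branches.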

RE claim 20, Faulkner as modified by Kurzhals and Hong teaches comprising: while the three-dimensional environment is visible from a respective viewpoint of the user and while displaying the user interface element including the one or more captions for the media content having a respective spatial relationship to the spatial reference in the media content, wherein the user interface element including the one or more captions for the media content is displayed at a first location in the three-dimensional environment, detecting, via the one or more input devices, an input to display a playback control user interface including one or more selectable options for performing one or more actions associated with the media content; and in response to detecting the input to display the playback control user interface including one or more selectable options for performing one or more actions associated with the media content: displaying, via the display generation component, the playback control user interface including one or more selectable options for performing one or more actions associated with the media content wherein the playback control user interface is spatially separated from the media content in the three-dimensional environment; and in accordance with a determination that one or more criteria is met, displaying, via the display generation component, the user interface element including the one or more captions for the media content at a second location, different from the first location, wherein when the user interface element including the one or more captions for the media content is at the second location, a spatial arrangement of the playback control user interface and the user interface element including the one or more captions for the media content is a first spatial arrangement (Faulkner Figs 7E-F, K-L, AD, 9, [0122], [0240], [0242] and [0037], [0138] etc., wherein user interface objects are decoupled from an initial reference point and coupled to the user's viewpoint/gaze based on specific input criteria).

RE claim 21, Faulkner as modified by Kurzhals and Hong teaches wherein the one or more criteria includes a requirement that attention of the user is directed to the playback control user interface in order for the one or more criteria to be met (Faulkner Figs 6, 7E-F, K-L, AD, 9, [0122], [0240], [0242] and [0037], [0138] etc., wherein user interface objects are decoupled from an initial reference point and coupled to the user's viewpoint/gaze based on specific input criteria. In addition, Kurzhals abstract, Figs 1-2, page 2 col 2 etc., wherein gaze input is used to adapt the position of the captions).

RE claim 22, Faulkner as modified by Kurzhals and Hong teaches comprising: while the three-dimensional environment is visible via the display generation component from the respective viewpoint of the user, while displaying, at the second location in the three-dimensional environment, the user interface element including the one or more captions for the media content, and while displaying the playback control user interface, wherein the user interface element and the playback control user interface have the first spatial arrangement, detecting, via the one or more input devices, a second event corresponding to a second change of the viewpoint of the user relative to the three-dimensional environment; and in response to detecting the second event corresponding to the second change of the viewpoint of the user relative to the three-dimensional environment, continuing display, via the display generation component, of the user interface element including the one or more captions for the media content at the second location in the three-dimensional environment and the playback control user interface, with the spatial arrangement of the user interface element including the one or more captions for the media content and the playback control user interface being the first spatial arrangement (Faulkner Figs 7, 8, [0110], [0117], [0152], [0200]. In addition, Kurzhals abstract, Figs 1-2, page 2 col 2 etc., wherein gaze input is used to adapt the position of the captions).

RE claim 23, Faulkner as modified by Kurzhals and Hong teaches comprising: after continuing display, via the display generation component, of the user interface element including the one or more captions for the media content at the second location and the playback control user interface with the spatial arrangement of the user interface element including the one or more captions for the media content and the playback control user interface being the first spatial arrangement, and in accordance with a determination that one or more second criteria is met: reducing a visual prominence of the playback control user interface; initiating a process to display, via the display generation component, the user interface element including the one or more captions for the media content at a location in the three-dimensional environment that is different from the second location in the three-dimensional environment, having a third spatial relationship relative to the spatial reference in the media content, based on the second change of viewpoint of the user relative to the three-dimensional environment, including displaying, via the display generation component, the user interface element including the one or more captions for the media content at the location in the three-dimensional environment having the third spatial relationship relative to the spatial reference in the media content (Faulkner [0317], [0347], [0159]).

RE claim 24, Faulkner as modified by Kurzhals and Hong teaches wherein: initiating the process to display, via the display generation component, the user interface element including the one or more captions for the media content at the location in the three-dimensional environment having the third spatial relationship relative to the spatial reference in the media content includes: reducing a visual prominence of the user interface element including the one or more captions for the media content that was displayed at the second location with the spatial arrangement of the user interface element including the one or more captions for the media content and the playback control user interface being the first spatial arrangement; and after reducing the visual prominence of the user interface element including the one or more captions for the media content that was displayed at the second location with the spatial arrangement of the user interface element including the one or more captions for the media content and the playback control user interface being the first spatial arrangement, displaying, via the display generation component, the user interface element including the one or more captions for the media content at the location in the three-dimensional environment having the third spatial relationship relative to the spatial reference in the media content (Faulkner  Figs 7, 8, [0110], [0117], [0152], [0200], [0317], [0347], [0159]).
RE claim 25, Faulkner as modified by Kurzhals and Hong teaches wherein displaying, via the display generation component, the user interface element including the one or more captions for the media content at the location in the three-dimensional environment having the third spatial relationship relative to the spatial reference in the media content includes: increasing the visual prominence of the user interface element including the one or more captions for the media content while visually moving the user interface element including the one or more captions for the media content towards the location in the three-dimensional environment having the third spatial relationship relative to the spatial reference in the media content (Faulkner  Figs 7, 8, [0110], [0117], [0152], [0200], [0317], [0347], [0159]).

RE claim 26, Faulkner as modified by Kurzhals and Hong teaches wherein displaying the user interface element including the one or more captions for the media content having the second spatial relationship relative to the spatial reference in the media content is performed in accordance with a determination that one or more criteria is met, wherein the one or more criteria includes a requirement that a threshold period of time has passed since detecting the event corresponding to the change of the viewpoint of the user relative to the three-dimensional environment from the first viewpoint of the user to the second viewpoint of the user relative to the three-dimensional environment in order for the one or more criteria to be met (Faulkner [0119], [0205], [0214] etc.).

RE claim 27, Faulkner as modified by Kurzhals and Hong teaches comprising: in response to detecting the event corresponding to the change of the viewpoint of the user relative to the three-dimensional environment from the first viewpoint of the user to the second viewpoint of the user: displaying, via the display generation component, the user interface element including the one or more captions for the media content at a first respective location in the three-dimensional environment from the second viewpoint of the user, with a respective amount of visual prominence, wherein the one or more captions are displayed as moving towards a second respective location in the three-dimensional environment; and after displaying, via the display generation component, the user interface element including the one or more captions for the media content having the first respective location in the three-dimensional environment from the second viewpoint of the user and with the respective amount of visual prominence, displaying, via the display generation component, the user interface element including the one or more captions for the media content having the second respective location in the three-dimensional environment from the second viewpoint of the user, different from the first respective location in the three-dimensional environment from the second viewpoint of the user, with an amount of visual prominence that is greater than the respective amount of visual prominence, and wherein when the user interface element has the second respective location in the three-dimensional environment from the second viewpoint of the user, the user interface element including the one or more captions for the media content has the second spatial relationship relative to the spatial reference in the media content in the three-dimensional environment (Faulkner  Figs 7, 8, [0110], [0117], [0152], [0200], [0317], [0347], [0159]).
RE claim 28, Faulkner as modified by Kurzhals and Hong teaches wherein while displaying, via the display generation component, the user interface element including the one or more captions for the media content at the first respective location: in accordance with a determination that the change in viewpoint of the user relative to the three-dimensional environment from the first viewpoint of the user to the second viewpoint of the user is a first amount of change of viewpoint of the user, the respective amount of visual prominence is a first amount of visual prominence, and in accordance with a determination that the change in viewpoint of the user relative to the three-dimensional environment from the first viewpoint of the user to the second viewpoint of the user is a second amount of change of viewpoint of the user, greater than the first amount of change of viewpoint of the user, the respective amount of visual prominence is a second amount of visual prominence that is less than the first amount of visual prominence (Faulkner  Figs 7, 8, [0110], [0117], [0152], [0200], [0317], [0347], [0159] wherein  the amount of fading or shrinking corresponds to an amount of change in the current distance between the first control and the first object).
RE claim 29, Faulkner as modified by Kurzhals and Hong teaches comprising: in response to detecting the event corresponding to the change of the viewpoint of the user relative to the three-dimensional environment from the first viewpoint of the user to the second viewpoint of the user: in accordance with a determination that the change in viewpoint of the user relative to the three-dimensional environment from the first viewpoint of the user to the second viewpoint of the user is greater than a threshold change in viewpoint of the user, forgoing display of the user interface element including the one or more captions for the media content via the display generation component at the first respective location  (Faulkner  Figs 7, 8, [0110], [0117], [0152], [0200], [0262]).
RE claim 30, Faulkner as modified by Kurzhals and Hong teaches wherein: while displaying the user interface element including the one or more captions for the media content having the second spatial relationship relative to the spatial reference in the media content in the three-dimensional environment, the user interface element including the one or more captions for the media content is displayed at a first location in three-dimensional environment that, in a view of the three-dimensional environment from the second viewpoint of the user, is offset from a center of the second viewpoint of the user (Faulkner  Figs 7 K-L, AD, [0194], [0239], [0274] etc).

RE claim 31, Faulkner as modified by Kurzhals and Hong teaches comprising: in response to detecting the event corresponding to the change of the viewpoint of the user relative to the three-dimensional environment from the first viewpoint of the user to the second viewpoint of the user relative to the three-dimensional environment: displaying, via the display generation component, the media content, wherein the media content is displayed from the second viewpoint of the user relative to the three-dimensional environment, wherein a closest element of the media content to the second viewpoint of the user is at a first distance from the second viewpoint of the user, wherein the user interface element including the one or more captions for the media content is displayed at a second distance from the second viewpoint of the user that is less than the first distance from the second viewpoint of the user (Faulkner  Figs 7O-T, [0192]-[0193], [0310]).

RE claim 32, Faulkner as modified by Kurzhals and Hong teaches comprising: while displaying the media content: in accordance with a determination that attention of the user is directed to a respective user interface element of a first type, different from the user interface element of a second type including the one or more captions for the media content, reducing a visual prominence of the user interface element including the one or more captions for the media content (Faulkner  Figs 7S-T, [0276], [0205]-[0209], [0184] etc).

RE claim 33, Faulkner as modified by Kurzhals and Hong teaches wherein the media content is immersive media content, the method further comprising: displaying, via the display generation component, second media content that is non-immersive media content at a location in the three-dimensional environment; detecting, via the one or more input devices, an event corresponding to a trigger to display a user interface element that includes one or more captions for the second media content; and in response to detecting the event corresponding to the trigger to display the user interface element that includes the one or more captions for the media content, displaying, via the display generation component, the user interface element that includes the one or more captions for the media content at a location in the three-dimensional environment that is independent of a viewpoint of the user (Faulkner  Figs 7AA-AD, [0186], [0345], [0353]).
Claim 34 recites limitations similar in scope to the limitations of claim 19 and is therefore rejected under the same rationale. In addition, Faulkner teaches A computer system that is in communication with a display generation component and one or more input devices, the computer system comprising: one or more processors; memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions (Figs 1-3, [0043]).
Claim 35 recites limitations similar in scope to the limitations of claim 19 and is therefore rejected under the same rationale. In addition, Faulkner teaches A non-transitory computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of a computer system that is in communication with a display generation component and one or more input devices, cause the computer system to perform the method (Figs 1-3, [0043], [0060]).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See attached 892.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
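The reply dates described above can be worked out with a short sketch. It assumes the mailing date is the Final Rejection date shown in the prosecution timeline (Jun 23, 2026), and it deliberately ignores weekend and federal holiday rollover as well as the advisory action exception; `add_months` is a helper written for this illustration, not a standard library function.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the target month's length."""
    index = d.month - 1 + months
    year = d.year + index // 12
    month = index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


# Assumption: mailing date taken from the timeline entry for the Final Rejection.
mailing_date = date(2026, 6, 23)

two_month_date = add_months(mailing_date, 2)        # first-reply window
shortened_period_end = add_months(mailing_date, 3)  # shortened statutory period
statutory_limit = add_months(mailing_date, 6)       # absolute outer limit

print(two_month_date.isoformat())        # 2026-08-23
print(shortened_period_end.isoformat())  # 2026-09-23
print(statutory_limit.isoformat())       # 2026-12-23
```

These are nominal calendar dates only; any actual docketing should be checked against the mailing date on the action itself and the applicable rules.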
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SULTANA MARCIA ZALALEE, whose telephone number is (571)270-1411. The examiner can normally be reached Monday-Friday, 8:00am-4:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kent Chang, can be reached at (571)272-7667. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Sultana M Zalalee/ Primary Examiner, Art Unit 2614

Prosecution Timeline

Jun 04, 2024

Application Filed

Jan 30, 2026

Non-Final Rejection mailed — §103

Apr 22, 2026

Applicant Interview (Telephonic)

Apr 22, 2026

Examiner Interview Summary

Apr 29, 2026

Response Filed

Jun 23, 2026

Final Rejection mailed — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

18/744,775

Patent 12664922

DISPLAY CONTROL DEVICE, DISPLAY CONTROL METHOD, AND COMPUTER-READABLE STORAGE MEDIUM

2y 0m to grant Granted Jun 23, 2026

18/522,577

Patent 12651410

PARTS-BASED DECOMPOSITION OF HUMAN BODY FOR BLEND SHAPE PIPELINE INTEGRATION AND MUSCLE PRIOR CREATION

2y 6m to grant Granted Jun 09, 2026

18/509,426

Patent 12639881

GRAPHICS PROCESSORS

2y 6m to grant Granted May 26, 2026

18/509,679

Patent 12626462

GRAPHICS PROCESSORS

2y 5m to grant Granted May 12, 2026

18/540,462

Patent 12626449

RAY TRACING OF DISPLACED MICRO MESHES USING A BOUNDING PRISM HIERARCHY

2y 5m to grant Granted May 12, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

3-4

Expected OA Rounds

71%

Grant Probability

86%

With Interview (+15.1%)

2y 7m (~6m remaining)

Median Time to Grant

Moderate

PTA Risk

Based on 499 resolved cases by this examiner. Grant probability derived from career allowance rate.
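The projection figures can be reproduced from the career numbers quoted on this page. The sketch below assumes that the interview lift is added to the allowance rate in percentage points; the page does not state its formula, so that step is an assumption that merely happens to match the displayed 86%.

```python
# Figures quoted on this page.
granted = 356
resolved = 499
interview_lift_pct = 15.1

# Career allowance rate as a percentage of resolved cases.
allowance_rate_pct = 100 * granted / resolved

# Assumption: the lift is additive in percentage points.
with_interview_pct = allowance_rate_pct + interview_lift_pct

print(round(allowance_rate_pct))  # 71
print(round(with_interview_pct))  # 86
```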