Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claims 1 – 7 are pending in this application. Claims 1 and 4 are independent.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claim(s) 5 – 7 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent eligible subject matter because, under the broadest reasonable interpretation of the claims when read in light of the specification and from the perspective of one skilled in the art, claims 5 – 7 encompass both statutory and non-statutory embodiments and therefore embrace subject matter that is not eligible for patent protection.
Claim(s) 5 – 7 recite "…A program…", and Applicant's Specification as filed fails to limit the claimed medium to statutory embodiments. As a result, neither the claims nor the disclosure limits the medium to statutory embodiments. The broadest reasonable interpretation of a claim drawn to a computer readable medium (also called a machine readable medium, among other variations) typically covers both non-transitory tangible media and transitory propagating signals per se, in view of the ordinary and customary meaning of computer readable media. See MPEP 2111.01. When the broadest reasonable interpretation of a claim covers a signal per se, the claim must be rejected under 35 U.S.C. § 101 as covering non-statutory subject matter. See In re Nuijten, 500 F.3d 1346, 1356-57 (Fed. Cir. 2007) (transitory embodiments are not directed to statutory subject matter) and Interim Examination Instructions for Evaluating Subject Matter Eligibility Under 35 U.S.C. § 101, Aug. 24, 2009, p. 2. To assist Applicant in overcoming this rejection (or a potential rejection) under 35 U.S.C. § 101, the Examiner suggests the following approach: claims drawn to such a computer readable medium that cover both transitory and non-transitory embodiments may be amended to add the limitation "non-transitory" to the claims, thereby narrowing them to cover only statutory embodiments and avoiding a rejection under 35 U.S.C. § 101.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1 – 7 are rejected under 35 U.S.C. 103 as being unpatentable over Kundu, Malay (US-20220030179-A1, hereinafter referred to as Malay) in view of Ross, David (US-20180324229-A1, hereinafter referred to as David).
Regarding independent claims 1 and 4, Malay teaches:
A video generation method (e.g., FIG. 2 of Malay) for use in a video generation device (e.g., imaging unit 102 (FIG. 1) of Malay), comprising: a capturing step in which a presenter (e.g., first actor of Malay) wearing a wearable device (e.g., display unit 110 (FIG. 1) of Malay) including a camera CP (See at least Malay, ¶ [0046]; FIGS. 1, 2; "…display unit 110 may be augmented reality glasses, virtual reality headset, display presented by other wearables, or any other devices that display image(s) or video(s) for visual presentation…") and an acquirer (e.g., second actor of Malay) wearing a wearable device (e.g., display unit 110 (FIG. 1) of Malay) including a camera CL and a transparent display (e.g., transparent portion of Malay) facing each other and capturing images of each other with each other's cameras (See at least Malay, ¶ [0036]; FIGS. 1 – 4; "…The imaging unit may be facing to capture a front of an actor, a face of an actor, a side view of an actor, a top view of an actor, a rear view of an actor, a perspective view of an actor, and the like. Similarly, the imaging unit can zoom in and out to vary the captured area of the actor…"); a corresponding points (e.g., FIG. 4 of Malay) obtaining step in which the video generation device estimates a frame of the acquirer from a video obtained with the camera CP (See at least Malay, ¶ [0047, 0070]; FIGS. 1 – 4, 8; "…FIG. 4 shows example scenes 402 404 from an output video presentable using a display unit…The actor 406 is presented in the actor layer where the subject 408 of the subject layer is unobstructed by the actor…As shown, each of the scene layers can include transparent portions such that the output video plays a multilayer scene depicting the actor 406 and the subject 408 of the subject layer over the background layer 410…", "…changes in the lateral or vertical position of the actor within its actor layer are triggered by recognition of gestures performed or exercised by the actor. The gesture performed by the actor (i.e., a user or a person captured by a camera) is captured by the imaging unit…this feature of the present disclosure is provided in an application of a video gaming environment where the actor's image or video is portrayed…Two actors 808 804 are depicted in this scene 800…"), obtains a set FL of one or more corresponding points representing the frame of the acquirer which is estimated (See at least Malay, ¶ [0062]; FIGS. 1 – 4, 8; "…the neural network is trained using a training set of images containing annotated human (i.e. actor) faces and body parts. Once trained, the neural network is used as a classifier by which it can tag, in a binary matter, which regions of the image are most likely part of a human face or body parts. The identified regions are considered the image of the actor which can be extracted from the video stream or the data feed capturing the actor on a frame by frame basis…").
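By way of illustration only, and not as a characterization of any reference's actual implementation, the corresponding points obtaining step mapped above can be sketched as follows. This is a minimal Python sketch: the estimate_keypoints function, the keypoint names, and the dummy frames are hypothetical stand-ins for any trained pose estimator (compare the neural network classifier of Malay, ¶ [0062]).

```python
import numpy as np

# Hypothetical stand-in for a trained keypoint detector (compare the
# neural-network classifier of Malay, para. [0062]); a real detector
# would infer these locations from the pixels.
def estimate_keypoints(frame: np.ndarray) -> dict:
    h, w = frame.shape[:2]
    return {
        "neck_trunk": np.array([w * 0.5, h * 0.3]),  # neck/trunk intersection
        "trunk_legs": np.array([w * 0.5, h * 0.6]),  # trunk/legs intersection
    }

def obtain_corresponding_points(frame: np.ndarray) -> np.ndarray:
    """Estimate a person's frame in one video frame and return the set
    of corresponding points as an (N, 2) array of image coordinates."""
    kp = estimate_keypoints(frame)
    return np.stack([kp["neck_trunk"], kp["trunk_legs"]])

# FL: points representing the acquirer, estimated from camera CP's video;
# FP: points representing the presenter, estimated from camera CL's video.
frame_from_CP = np.zeros((480, 640, 3), dtype=np.uint8)  # dummy frame
frame_from_CL = np.zeros((480, 640, 3), dtype=np.uint8)  # dummy frame
FL = obtain_corresponding_points(frame_from_CP)
FP = obtain_corresponding_points(frame_from_CL)
```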
Malay teaches the subject matter of the claimed inventive concept as expressed in the rejections above.
However, Malay does not expressly disclose the concept of estimating a frame of the presenter from a video obtained with the camera CL, and obtaining a set FP of one or more corresponding points representing the frame of the presenter which is estimated; a video generating step in which the video generation device generates, based on the set FL and the set FP, a semi-transparent video representing a posture of the acquirer such that the semi-transparent video matches a posture of the presenter which is transparently seen through the transparent display; and a displaying step in which the semi-transparent video is displayed on the transparent display.
Nevertheless, David teaches the concept of estimating a frame of the presenter from a video obtained with the camera CL, and obtaining a set FP of one or more corresponding points representing the frame of the presenter which is estimated (See at least David, ¶ [0046, 0047, 0070]; FIGS. 1, 2; "…the assistance content includes different types of content that is presented to the first user via the first user device (e.g., using the techniques that are described elsewhere herein). Examples of different types of content include: (i) visual or audio content generated by the second user as described elsewhere herein; (ii) one or more movements or gestures made by the second user that are presented to the first user…", "…one or more movements or gestures of the second user can be captured using a camera (e.g., an AR device)…Such gestures or movements can be correlated…In one embodiment, virtual representations of the gestures or movements are depicted on a display of the first user device…", "…The augment reality headset connects to a server component which facilitates a connection to a remote expert wearing a virtual reality headset. Video is streamed from the field technician to the remote expert…"); a video generating step in which the video generation device generates, based on the set FL and the set FP, a semi-transparent video (e.g., transparent/video see-through display of David) representing a posture of the acquirer such that the semi-transparent video matches a posture of the presenter which is transparently seen through the transparent display (e.g., transparent/video see-through display of David) (See at least David, ¶ [0046, 0047, 0070]; FIGS. 1, 2; "…the assistance content includes different types of content that is presented to the first user via the first user device (e.g., using the techniques that are described elsewhere herein). Examples of different types of content include: (i) visual or audio content generated by the second user as described elsewhere herein; (ii) one or more movements or gestures made by the second user that are presented to the first user…", "…one or more movements or gestures of the second user can be captured using a camera (e.g., an AR device)…Such gestures or movements can be correlated…In one embodiment, virtual representations of the gestures or movements are depicted on a display of the first user device…", "…The augment reality headset connects to a server component which facilitates a connection to a remote expert wearing a virtual reality headset. Video is streamed from the field technician to the remote expert…"); and a displaying step in which the semi-transparent video is displayed on the transparent display (e.g., transparent/video see-through display of David) (See at least David, ¶ [0046, 0047, 0070]; FIGS. 1, 2; "…the assistance content includes different types of content that is presented to the first user via the first user device (e.g., using the techniques that are described elsewhere herein). Examples of different types of content include: (i) visual or audio content generated by the second user as described elsewhere herein; (ii) one or more movements or gestures made by the second user that are presented to the first user…", "…one or more movements or gestures of the second user can be captured using a camera (e.g., an AR device)…Such gestures or movements can be correlated…In one embodiment, virtual representations of the gestures or movements are depicted on a display of the first user device…", "…The augment reality headset connects to a server component which facilitates a connection to a remote expert wearing a virtual reality headset. Video is streamed from the field technician to the remote expert…").
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the known technique disclosed in David (estimating a frame of the presenter from a video obtained with the camera CL and obtaining a set FP of one or more corresponding points representing the estimated frame of the presenter; a video generating step in which the video generation device generates, based on the set FL and the set FP, a semi-transparent video representing a posture of the acquirer such that the semi-transparent video matches a posture of the presenter which is transparently seen through the transparent display; and a displaying step in which the semi-transparent video is displayed on the transparent display) to modify and improve the known and similar device of Malay. One of ordinary skill would have been motivated to do so for the desirable and advantageous purpose of enabling an expert to assist with resolving a problem and/or with on-premises training, thus reducing the length of time it takes to solve a problem, as discussed in David (see ¶ [0052]), thereby achieving the predictable result of improving the overall efficiency and speed of the system with a reasonable expectation of success.
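As a further illustrative sketch (the Examiner's own hypothetical, assuming OpenCV and the two-point sets FL and FP from the sketch above; it is not the disclosed implementation of Malay or David), the video generating and displaying steps could align the acquirer's video to the presenter's posture with a similarity transform and render it semi-transparently by alpha blending:

```python
import cv2
import numpy as np

def render_semitransparent(acquirer_frame: np.ndarray,
                           FL: np.ndarray,
                           FP: np.ndarray,
                           display_frame: np.ndarray,
                           alpha: float = 0.5) -> np.ndarray:
    """Warp the acquirer's image so its corresponding points FL land on
    the presenter's points FP, then alpha-blend the result over the
    scene buffer to obtain a semi-transparent overlay."""
    # Similarity transform (rotation, uniform scale, translation) mapping
    # FL onto FP; requires at least two point pairs.
    M, _ = cv2.estimateAffinePartial2D(FL.astype(np.float32),
                                       FP.astype(np.float32))
    h, w = display_frame.shape[:2]
    warped = cv2.warpAffine(acquirer_frame, M, (w, h))
    # Semi-transparency: blend the warped acquirer video with the view
    # standing in for what is seen through the transparent display.
    return cv2.addWeighted(warped, alpha, display_frame, 1.0 - alpha, 0.0)
```

The blend weight alpha controls the degree of transparency, so the overlaid posture of the acquirer remains visible without fully occluding the presenter seen through the display.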
Regarding dependent claim 2, Malay as modified by David above teaches:
wherein the set FL and the set FP include corresponding points consisting of intersection points of a neck and a trunk and corresponding points consisting of intersection points of the trunk and legs (See at least Malay, ¶ [0036, 0062]; FIGS. 1 – 4, 8; "…The imaging unit may be facing to capture a front of an actor, a face of an actor, a side view of an actor, a top view of an actor, a rear view of an actor, a perspective view of an actor, and the like. Similarly, the imaging unit can zoom in and out to vary the captured area of the actor…", "…the neural network is trained using a training set of images containing annotated human (i.e. actor) faces and body parts. Once trained, the neural network is used as a classifier by which it can tag, in a binary matter, which regions of the image are most likely part of a human face or body parts. The identified regions are considered the image of the actor which can be extracted from the video stream or the data feed capturing the actor on a frame by frame basis…" Also, see at least Malay, ¶ [0071, 0074, 0075] and David, ¶ [0012, 0013, 0046, 0047, 0070]; FIGS. 1, 2), the video generation part enlarges or reduces a distance DL between the corresponding points consisting of the intersection points of the neck and the trunk and the corresponding points consisting of the intersection points of the trunk and the legs included in the set FL such that the distance DL matches a distance DP between the corresponding points consisting of the intersection points of the neck and the trunk and the corresponding points consisting of the intersection points of the trunk and the legs included in the set FP (See at least Malay, ¶ [0059, 0064]; FIGS. 1 – 4, 8; "…depth information is extracted from the input data and used to position layers relative to each other. The depth information can be extracted from the input data or the data feed. Once depth information related to the scene has been extracted, portions of the scene are placed in different layers in accordance to their distance (depth) from the camera. These portions are assigned to one of the layers defined in the scene. Background layers and subject layers are both instances of the layers within the scene. The actor layer can be positioned in between the layers in the scene in accordance with the distance of the at least one actor from the camera. As such, the actor can be seen to be moving back and forth among image portions displayed in the various layers of the scene. And the actor can interact with the subjects positioned in any of these layers…", "…when combining more than one actor into a shared space, the distance between the imaging unit (e.g., a camera) of the actor can be used to modulate the scale of that actor when placed into a local space of another actor. More specifically, the actor is made larger when coming closer to the camera and made smaller when moving further from the camera…" Also, see at least Malay, ¶ [0071, 0074, 0075] and David, ¶ [0012, 0013, 0046, 0047, 0070]; FIGS. 1, 2), and generates the semi-transparent video representing the posture of the acquirer such that the corresponding points consisting of the intersection points of the neck and the trunk in the distance DL which is enlarged or reduced match the corresponding points consisting of the intersection points of the neck and the trunk included in the set FP (See at least Malay, ¶ [0059, 0064]; FIGS. 1 – 4, 8; "…depth information is extracted from the input data and used to position layers relative to each other. The depth information can be extracted from the input data or the data feed. Once depth information related to the scene has been extracted, portions of the scene are placed in different layers in accordance to their distance (depth) from the camera. These portions are assigned to one of the layers defined in the scene. Background layers and subject layers are both instances of the layers within the scene. The actor layer can be positioned in between the layers in the scene in accordance with the distance of the at least one actor from the camera. As such, the actor can be seen to be moving back and forth among image portions displayed in the various layers of the scene. And the actor can interact with the subjects positioned in any of these layers…", "…when combining more than one actor into a shared space, the distance between the imaging unit (e.g., a camera) of the actor can be used to modulate the scale of that actor when placed into a local space of another actor. More specifically, the actor is made larger when coming closer to the camera and made smaller when moving further from the camera…" Also, see at least Malay, ¶ [0071, 0074, 0075] and David, ¶ [0012, 0013, 0046, 0047, 0070]; FIGS. 1, 2), and the corresponding points consisting of the intersection points of the trunk and the legs in the distance DL which is enlarged or reduced match the corresponding points consisting of the intersection points of the trunk and the legs included in the set FP (See at least Malay, ¶ [0047, 0070]; FIGS. 1 – 4, 8; "…FIG. 4 shows example scenes 402 404 from an output video presentable using a display unit…The actor 406 is presented in the actor layer where the subject 408 of the subject layer is unobstructed by the actor…As shown, each of the scene layers can include transparent portions such that the output video plays a multilayer scene depicting the actor 406 and the subject 408 of the subject layer over the background layer 410…", "…changes in the lateral or vertical position of the actor within its actor layer are triggered by recognition of gestures performed or exercised by the actor. The gesture performed by the actor (i.e., a user or a person captured by a camera) is captured by the imaging unit…this feature of the present disclosure is provided in an application of a video gaming environment where the actor's image or video is portrayed…Two actors 808 804 are depicted in this scene 800…" Also, see at least Malay, ¶ [0071, 0074, 0075] and David, ¶ [0012, 0013, 0046, 0047, 0070]; FIGS. 1, 2).
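Purely as an illustrative sketch of the enlargement or reduction recited in claim 2 (the Examiner's own hypothetical arithmetic, assuming FL and FP are two-point arrays ordered neck/trunk then trunk/legs), the distance DL can be rescaled to match DP about the neck-trunk intersection point:

```python
import numpy as np

def scale_FL_to_match(FL: np.ndarray, FP: np.ndarray) -> np.ndarray:
    """Enlarge or reduce FL so the neck/trunk-to-trunk/legs distance DL
    matches the presenter's distance DP, then pin the scaled neck/trunk
    point onto the presenter's neck/trunk point."""
    neck_L, hips_L = FL[0], FL[1]
    neck_P, hips_P = FP[0], FP[1]
    DL = np.linalg.norm(hips_L - neck_L)
    DP = np.linalg.norm(hips_P - neck_P)
    s = DP / DL  # enlargement (s > 1) or reduction (s < 1) factor
    # Scale about FL's neck/trunk point, then translate so both the
    # neck/trunk and trunk/legs correspondences coincide with FP's.
    return (FL - neck_L) * s + neck_P
```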
Regarding dependent claim 3, Malay as modified by David above teaches:
wherein the set FL and the set FP include four corresponding points consisting of apexes of a plane surface having a quadrangle shape copying the trunk (See at least Malay, ¶ [0062]; FIGS. 1 – 4, 8; "…the neural network is trained using a training set of images containing annotated human (i.e. actor) faces and body parts. Once trained, the neural network is used as a classifier by which it can tag, in a binary matter, which regions of the image are most likely part of a human face or body parts. The identified regions are considered the image of the actor which can be extracted from the video stream or the data feed capturing the actor on a frame by frame basis…" Also, see at least Malay, ¶ [0071, 0074, 0075] and David, ¶ [0012, 0013, 0046, 0047, 0070]; FIGS. 1, 2), and the video generation part generates the semi-transparent video representing the posture of the acquirer such that the plane surface formed of the four corresponding points included in the set FL matches the plane surface formed of the four corresponding points included in the set FP (See at least Malay, ¶ [0062]; FIGS. 1 – 4, 8; "…the neural network is trained using a training set of images containing annotated human (i.e. actor) faces and body parts. Once trained, the neural network is used as a classifier by which it can tag, in a binary matter, which regions of the image are most likely part of a human face or body parts. The identified regions are considered the image of the actor which can be extracted from the video stream or the data feed capturing the actor on a frame by frame basis…" Also, see at least Malay, ¶ [0071, 0074, 0075] and David, ¶ [0012, 0013, 0046, 0047, 0070]; FIGS. 1, 2).
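For the four-point quadrangular plane surface recited in claim 3, a plane-to-plane match is naturally expressed as a homography. The sketch below is illustrative only (assuming OpenCV and 4 x 2 point arrays; it is not a mapping of either reference): it warps the acquirer's image so the quadrangle formed by the four points in FL coincides with the quadrangle formed by the four points in FP.

```python
import cv2
import numpy as np

def match_trunk_plane(acquirer_frame: np.ndarray,
                      quad_FL: np.ndarray,
                      quad_FP: np.ndarray,
                      out_size: tuple) -> np.ndarray:
    """Warp the acquirer's image so the quadrangular plane surface whose
    apexes are the four points of FL coincides with the plane surface
    formed by the four points of FP."""
    # getPerspectiveTransform takes exactly four source and destination
    # points (float32) and returns the 3x3 homography between the planes.
    H = cv2.getPerspectiveTransform(quad_FL.astype(np.float32),
                                    quad_FP.astype(np.float32))
    return cv2.warpPerspective(acquirer_frame, H, out_size)
```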
Regarding dependent claim 5, Malay as modified by David above teaches:
A program for causing a computer to function as the video generation device according to claim 1 (See at least Malay, ¶ [0046]; FIGS. 1, 2; "…display unit 110 may be augmented reality glasses, virtual reality headset, display presented by other wearables, or any other devices that display image(s) or video(s) for visual presentation…" Also, see at least Malay, ¶ [0071, 0074, 0075] and David, ¶ [0012, 0013, 0046, 0047, 0070]; FIGS. 1, 2).
Regarding dependent claim 6, Malay as modified by David above teaches:
A program for causing a computer to function as the video generation device according to claim 2 (See at least Malay, ¶ [0046]; FIGS. 1, 2; "…display unit 110 may be augmented reality glasses, virtual reality headset, display presented by other wearables, or any other devices that display image(s) or video(s) for visual presentation…" Also, see at least Malay, ¶ [0071, 0074, 0075] and David, ¶ [0012, 0013, 0046, 0047, 0070]; FIGS. 1, 2).
Regarding dependent claim 7, Malay as modified by David above teaches:
A program for causing a computer to function as the video generation device according to claim 3 (See at least Malay, ¶ [0046]; FIGS. 1, 2; "…display unit 110 may be augmented reality glasses, virtual reality headset, display presented by other wearables, or any other devices that display image(s) or video(s) for visual presentation…" Also, see at least Malay, ¶ [0071, 0074, 0075] and David, ¶ [0012, 0013, 0046, 0047, 0070]; FIGS. 1, 2).
Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicant's disclosure. See the Notice of References Cited (PTO-892).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to IDOWU O OSIFADE, whose telephone number is (571) 272-0864. The Examiner can normally be reached Monday-Friday, 8:00 am-5:00 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the Examiner's Supervisor, ANDREW MOYER, can be reached at (571) 272-9523. The fax number for the organization where this application or proceeding is assigned is (571) 273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov.
Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at (866) 217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call (800) 786-9199 (IN USA OR CANADA) or (571) 272-1000.
/IDOWU O OSIFADE/Primary Examiner, Art Unit 2675