Prosecution Insights
Last updated: April 19, 2026
Application No. 18/826,116

SYSTEMS AND METHODS FOR ROBUST MULTI-VIEW IMAGE TRANSLATION FOR ROBOTICS

Status: Final Rejection (§103)
Filed: Sep 05, 2024
Examiner: LI, TRACY Y
Art Unit: 2487
Tech Center: 2400 — Computer Networks
Assignee: Honda Motor Co. Ltd.
OA Round: 2 (Final)
Grant Probability: 80% (Favorable)
Expected OA Rounds: 3-4
Estimated Time to Grant: 2y 10m
Grant Probability With Interview: 97%

Examiner Intelligence

Career Allow Rate: 80% (above average; 594 granted / 739 resolved; +22.4% vs TC avg)
Interview Lift: +16.4% among resolved cases with an interview (a strong lift)
Typical Timeline: 2y 10m average prosecution; 25 applications currently pending
Career History: 764 total applications across all art units

Statute-Specific Performance

§101: 8.3% (-31.7% vs TC avg)
§103: 66.6% (+26.6% vs TC avg)
§102: 12.7% (-27.3% vs TC avg)
§112: 6.3% (-33.7% vs TC avg)
Tech Center averages are estimates. Based on career data from 739 resolved cases.

Office Action (§103)

DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant's arguments with respect to claim(s) 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claim(s) 1-6, 8-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over US 20250045952 A1 Popov; Alexander et al. (hereafter Popov), in view of US 20250022099 A1 Song; Yizhi et al. (hereafter Song), and further in view of US 20210056668 A1 Barnes; Connelly et al. (hereafter Barnes).

Regarding claim 1, Popov discloses A system for multi-view image translation (Fig.1), comprising: one or more image encoders that encode a plurality of images into a plurality of feature-encoded images in a feature space (Fig.1, [41]-[43], encoder compresses the representation of the image data 116 that includes an array of elements of features of image data 108 into a latent space), wherein an image and an associated feature-encoded image are associated with one of a plurality of image sources ([34]), and wherein each image source is associated with a different view of a common scene ([57]); an image processor ([30]).
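For readers mapping the claim language above onto a concrete architecture, a minimal PyTorch sketch of per-view image encoders that compress each camera view of a common scene into tokens in a shared feature (latent) space might look like the following; the module name, layer choices, and dimensions are illustrative assumptions, not the applicant's or Popov's disclosed implementation.

```python
import torch
import torch.nn as nn

class ViewEncoder(nn.Module):
    """Toy per-view encoder: RGB image -> grid of feature tokens (hypothetical)."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=4, stride=4),         # downsample 4x
            nn.ReLU(),
            nn.Conv2d(64, feat_dim, kernel_size=4, stride=4),   # downsample 16x total
        )

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        f = self.net(img)                    # (B, C, H/16, W/16)
        return f.flatten(2).transpose(1, 2)  # (B, N_tokens, C): one "feature-encoded image"

# One image source per view of the same scene; a shared encoder is assumed here.
views = [torch.randn(1, 3, 128, 128) for _ in range(2)]   # two views of one common scene
encoder = ViewEncoder()
feature_encoded = [encoder(v) for v in views]              # each tensor: (1, 64, 128)
```

Each (B, N_tokens, C) tensor plays the role of one feature-encoded image associated with a distinct image source in the claim language.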
Popov fails to disclose processes the feature-encoded images to generate a plurality of translated images in the feature space, wherein the image processor removes at least one feature of a selected feature-encoded image that is not present in an other of the plurality of feature encoded images when generating the translated images and replaces the removed at least one feature of the selected feature encoded image with, at least partially, a second feature in at least one of an other of the plurality of translated images using second feature data from the other of the feature-encoded images; and one or more image decoders that output one or more translated images in an output image format.

However, Song teaches that processes the feature-encoded images to generate a plurality of translated images in the feature space (Fig.8, [160], image generation apparatus 800 generates a translated image in feature/latent space), wherein the image processor (Fig.17) removes at least one feature of a selected feature-encoded image that is not present in an other of the plurality of feature encoded images when generating the translated images (Fig.8, [160], [175], [197], as image generation apparatus 800 generates a translated image (e.g. diffused image), the masked feature of a first image that is the selected feature-encoded image is removed); and one or more image decoders that output one or more translated images in an output image format ([160]).

Barnes teaches replaces the removed at least one feature of the selected feature encoded image with, at least partially, a second feature in at least one of an other of the plurality of translated images using second feature data from the other of the feature-encoded images (Figs.4, [11], [39], [107], a neural network based system generates translated/transformed feature-encoded images such as a primary image and an auxiliary image, in that the feature representing an unwanted object in the primary image is removed, then is replaced or pasted by a region representing the feature in the auxiliary image).

Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having all the references Popov, Song and Barnes before him/her, to modify the system for multi-view image translation disclosed by Popov to include the teaching in the same field of endeavor of Song and Barnes, in order to provide a machine learning technique for image generation, as identified by Song, and techniques for effectively replacing a selected region of a given image with a corresponding region of an auxiliary image, as identified by Barnes.

Regarding claim 2, Popov discloses The system of claim 1, wherein the image processor associates positional encoding data of one or more robotic elements with the feature-encoded images, and wherein the at least one feature that is removed by the processor is associated with a robotic element ([24], [28]).

Regarding claim 3, Popov discloses The system of claim 2, wherein the image processor further associates robotic proprioception data of the one or more robotic elements with the feature-encoded images ([27]).

Regarding claim 4, Popov discloses The system of claim 1, wherein the processing includes passing at least the plurality of feature-encoded images through a neural network ([33]).
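The limitation Song and Barnes are cited against, removing a feature from a selected view's encoded image and filling the removed region from another view's features, can be sketched with a simple similarity-weighted lookup across views. This is a hedged illustration only: the soft copy below stands in for whatever learned in-painting or diffusion mechanism the cited references actually use, and every name here is hypothetical.

```python
import torch

def replace_masked_features(selected: torch.Tensor,
                            other: torch.Tensor,
                            remove_mask: torch.Tensor) -> torch.Tensor:
    """Remove masked tokens from `selected` and fill them from `other` (illustrative).

    selected, other: (B, N, C) feature-encoded images from two views of one scene.
    remove_mask:     (B, N) bool, True where a feature (e.g. an occluding robot arm)
                     should be removed from the selected view.
    """
    sim = torch.einsum("bnc,bmc->bnm", selected, other)      # token-to-token similarity
    weights = sim.softmax(dim=-1)                            # (B, N, M)
    borrowed = torch.einsum("bnm,bmc->bnc", weights, other)  # feature data from other view
    mask = remove_mask.unsqueeze(-1).to(selected.dtype)      # (B, N, 1)
    return selected * (1 - mask) + borrowed * mask           # translated feature image

# Example: drop the first 8 tokens of view A and in-paint them from view B.
view_a, view_b = torch.randn(1, 64, 128), torch.randn(1, 64, 128)
mask = torch.zeros(1, 64, dtype=torch.bool)
mask[:, :8] = True
translated_a = replace_masked_features(view_a, view_b, mask)
```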
Regarding claim 5, Popov discloses The system of claim 4, wherein the neural network is a multi-headed self-attention network that performs attention operations on features within a feature-encoded image and between the plurality of feature-encoded images ([43]).

Regarding claim 6, Popov discloses The system of claim 4, wherein the neural network is a cross-image attention network that substantially performs attention operations on features between the plurality of feature-encoded images ([48]).

Regarding claim 8, Popov discloses A method for multi-view image translation, comprising: receiving a first sequence of images of a first view of a scene from a first video source; receiving a second sequence of images of a second view of the scene from a second video source ([57]); encoding the first sequence of images of the first view into a third sequence of images encoded in a feature space (Fig.1, [41]-[43]); encoding the second sequence of images of the second view into a fourth sequence of images encoded in the feature space (Fig.1, [40]-[42]); outputting the fifth sequence of images as the first video source; and outputting the sixth sequence of images as the second video source ([178]).

Popov fails to disclose processing the third sequence of images and the fourth sequence of images using a neural network to generate translated sequences of images in the feature space associated with the first view and the second view, wherein the processing removes a first feature of a feature-encoded image of the third sequence when generating the translated sequences of images and restores, at least partially, a second feature in at least one of the translated sequences of images from the fourth sequence of images; decoding the translated sequences of images into a fifth sequence of images of the first view and a sixth sequence of images of the second view.

However, Song teaches processing the third sequence of images and the fourth sequence of images using a neural network to generate translated sequences of images in the feature space associated with the first view and the second view ([43], [160]); decoding the translated sequences of images into a fifth sequence of images of the first view and a sixth sequence of images of the second view ([87], [165], [175], [178]).

Barnes teaches wherein the processing determines a first feature is blocking a view of a second feature of a feature encoded second image ([12]), removes a first feature of a feature-encoded image of the third sequence when generating the translated sequences of images and restores, at least partially, a second feature in at least one of the translated sequences of images from the fourth sequence of images (Figs.4, [11], [39], [107]).

Regarding claims 9, 16, Popov discloses The method of claim 8, wherein the scene includes a robotic element and an object, and wherein the robotic element is encoded as the first feature in the feature space and the object is encoded as the second feature in the feature space (Fig.1, [38]).

Regarding claims 10, 17, Song teaches The method of claim 9, further comprising: receiving a plurality of positional encodings of objects in the scene; and combining the positional encodings of the objects with the associated features appearing in the images prior to processing images using the neural network (Fig.5, [42]).
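Claims 5, 6, 12, and 13 distinguish between self-attention that mixes features both within and across views and cross-image attention that attends only across views. A minimal sketch of both, using PyTorch's stock multi-head attention and the token shapes assumed in the earlier snippets, is shown below; it is illustrative only and not drawn from any of the cited references.

```python
import torch
import torch.nn as nn

embed_dim, num_heads = 128, 4
view_a = torch.randn(1, 64, embed_dim)   # feature-encoded image, first view
view_b = torch.randn(1, 64, embed_dim)   # feature-encoded image, second view

# Multi-headed self-attention over the concatenated token sequences: each token
# attends to features within its own view and in the other view.
self_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
joint = torch.cat([view_a, view_b], dim=1)            # (1, 128, embed_dim)
joint_out, _ = self_attn(joint, joint, joint)

# Cross-image attention: queries from the first view attend only to the second
# view's features (run in both directions for a symmetric translation).
cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
a_attending_b, _ = cross_attn(view_a, view_b, view_b)
```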
Regarding claims 11, 18, Popov discloses The method of claim 9, further comprising: receiving proprioception data associated with the robotic element; and combining the proprioception data with the associated features appearing in the images prior to processing images using the neural network ([38]).

Regarding claim 12, Popov discloses The method of claim 8, wherein the neural network is a multi-headed self-attention network that performs attention operations on features appearing within images and between features appearing in sequences of images associated with the first view and the second view ([38], [59]).

Regarding claim 13, Popov discloses The method of claim 8, wherein the neural network is a cross-image attention network that substantially performs attention operations between features appearing in sequences of images associated with the first view and the second view ([38], [59]).

Regarding claims 14, 20, Popov discloses The method of claim 8, further comprising: receiving one or more additional sequences of images of one or more additional views; and encoding the additional sequences of images into the feature space, wherein the processing operation using the neural network to generate translated sequences of images in the feature space includes processing the encoded additional sequences of images associatively with the third sequence of images and the fourth sequence of images (Fig.1, [53]).

Regarding claim 15, see the rejection for claim 8.

Regarding claim 19, Popov discloses The non-transitory computer readable storage medium of claim 18, wherein the neural network is selected from the group consisting of: a multi-headed self-attention network that performs attention operations on features appearing both within images and between sequences of images associated with the first view and the second view, and a cross-image attention network that substantially performs attention operations between features appearing in sequences of images associated with the first view and the second view (Fig.1, [38], [53], [59]).

Claim(s) 7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Popov, in view of Song and Barnes, and further in view of WO 2023085897 A1 GWAK, Donggyu et al. (hereafter Gwak).

Regarding claim 7, Gwak teaches The system of claim 1, wherein a format of the feature-encoded images is different from at least one of the output image format and the received plurality of images (p. 10, 2nd paragraph).

Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having all the references Popov, Song, Barnes and Gwak before him/her, to modify the system for multi-view image translation disclosed by Popov to include the teaching in the same field of endeavor of Song, Barnes and Gwak, in order to provide a machine learning technique for image generation, as identified by Song, techniques for effectively replacing a selected region of a given image with a corresponding region of an auxiliary image, as identified by Barnes, and improved encoding/decoding efficiency, as identified by Gwak.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.
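Claims 2, 3, 10, 11, 17, and 18 add positional encodings and robot proprioception data that are combined with the image features before the attention network runs. One simple way to do that, sketched below under the same illustrative assumptions as the earlier snippets (a 7-DoF joint state and additive conditioning are arbitrary choices, not taken from the references), is to project the extra signals into the feature dimension and add them to every token.

```python
import torch
import torch.nn as nn

feat_dim, num_tokens, prop_dim = 128, 64, 7   # 7-DoF joint state is an assumed example

prop_proj = nn.Linear(prop_dim, feat_dim)                        # proprioception -> feature space
pos_embed = nn.Parameter(torch.zeros(1, num_tokens, feat_dim))   # learned positional encoding

features = torch.randn(1, num_tokens, feat_dim)    # one feature-encoded image
proprioception = torch.randn(1, prop_dim)          # robot joint angles / state

# Combine positional and proprioception data with the features prior to attention.
conditioned = features + pos_embed + prop_proj(proprioception).unsqueeze(1)
```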
In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to TRACY Y. LI whose telephone number is (571) 270-3671. The examiner can normally be reached Monday through Friday, 8:30 AM - 4:30 PM EST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, David Czekaj, can be reached at (571) 272-7327. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/TRACY Y. LI/
Primary Examiner, Art Unit 2487

Prosecution Timeline

Sep 05, 2024: Application Filed
Sep 16, 2025: Non-Final Rejection (§103)
Dec 19, 2025: Response Filed
Feb 11, 2026: Final Rejection (§103), current

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12598298: METHOD AND APPARATUS FOR RECONSTRUCTING 360-DEGREE IMAGE ACCORDING TO PROJECTION FORMAT
Granted Apr 07, 2026 (2y 5m to grant)

Patent 12587661: VIDEO ENCODING METHOD, VIDEO DECODING METHOD, AND APPARATUS
Granted Mar 24, 2026 (2y 5m to grant)

Patent 12579629: Systems and methods for utilizing remote visualization for performing micro-trenching
Granted Mar 17, 2026 (2y 5m to grant)

Patent 12574556: DECODED PICTURE BUFFER MEMORY ALLOCATION AND PICTURE OUTPUT IN SCALABLE VIDEO CODING
Granted Mar 10, 2026 (2y 5m to grant)

Patent 12574567: VIDEO PROCESSING METHOD AND APPARATUS, DEVICE, AND MEDIUM
Granted Mar 10, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 80%
With Interview: 97% (+16.4%)
Median Time to Grant: 2y 10m
PTA Risk: Moderate
Based on 739 resolved cases by this examiner. Grant probability is derived from the examiner's career allow rate.
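As a rough check on how the headline numbers above fit together, the displayed figures can be reproduced from the examiner's career counts under a simple additive reading of the interview lift. The analytics provider's exact methodology is not disclosed, so the snippet below is an assumption-laden sketch, not the actual model.

```python
# Reconstructing the displayed figures from the counts shown on this page
# (an additive interview lift is an assumption about the tool's methodology).
granted, resolved = 594, 739                 # examiner career counts
allow_rate = granted / resolved              # ~0.804 -> displayed as "80% Grant Probability"

interview_lift = 0.164                       # "+16.4%" lift among cases with an interview
with_interview = min(allow_rate + interview_lift, 1.0)   # ~0.968 -> displayed as "97%"

print(f"career allow rate: {allow_rate:.1%}")       # 80.4%
print(f"with interview:    {with_interview:.1%}")   # 96.8%
```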
