DETAILED ACTION
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1–24 are rejected under 35 U.S.C. 103 as being unpatentable over Xing et al. (CN 111586360 A) in view of Michael et al. (WO 2017/144049 A1) and Hirano (US 2006/0210338 A1).
Regarding claim 1, see the rejection of claim 23.
Regarding claim 2, Xing in view of Michael and Hirano discloses all the limitations of claim 1.
Xing discloses wherein the captured image is an image captured from the air (Page 2 paragraph 3 - an unmanned aerial vehicle with a camera takes the shot.).
Regarding claim 3, Xing in view of Michael and Hirano discloses all the limitations of claim 1.
Xing discloses wherein the plurality of specific points are points in a geographical space of the imaging target range (Page 7 paragraph 5 - obtaining the two-dimensional image corresponding to the range of the unmanned aerial vehicle camera shot under the initial position attitude, then performing feature matching for the two-dimensional picture and the video frame shot by the unmanned aerial vehicle camera; determining the three-dimensional feature point corresponding to the two-dimensional feature point on the video frame in the three-dimensional map after the matching is finished;).
Regarding claim 4, Xing in view of Michael and Hirano discloses all the limitations of claim 1.
Xing discloses acquire map data corresponding to the imaging target range, and acquire the position information of the plurality of specific points from the map data (Page 6 paragraph 6 - S102, determining the three-dimensional characteristic point corresponding to the two-dimensional characteristic point on the three-dimensional map on the video frame according to the characteristic matching result. after obtaining the rendered two-dimensional picture, the unmanned aerial vehicle camera shooting the video frame and the two-dimensional picture characteristic matching, namely extracting the characteristic point in the video frame and the two-dimensional picture, the extracting must include position information in the points “acquire the position information”, and according to the similarity (characteristic vector distance) between the characteristic points to match, and generating the matching result. wherein the feature point on the video frame is a two-dimensional feature point; the feature point on the two-dimensional picture is a matching feature point.).
Regarding claim 5, Xing in view of Michael and Hirano discloses all the limitations of claim 4.
Xing discloses wherein the map data includes data of latitude, longitude, and altitude, and the one or more processors transform the map data into rectangular coordinate data (Page 5 paragraph 12 – camera, the positioning information comprises longitude, latitude and height, which can be detected by the GPS module carried on the unmanned aerial vehicle.
Page 9 paragraph 1 - S209: according to the corresponding relation of the three-dimensional map and the two-dimensional picture coordinate point, determining the coordinate of the three-dimensional feature point corresponding to the two-dimensional feature point in the three-dimensional map.).
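The claim 5 limitation transforms map data given as latitude, longitude, and altitude into rectangular coordinate data. Xing does not spell out the transform; as an illustration only, a standard geodetic-to-ECEF conversion (using WGS-84 constants, which are not taken from the references) can be sketched as:

```python
import math

# WGS-84 ellipsoid constants (standard values, not from the cited references)
A = 6378137.0                 # semi-major axis, meters
F = 1.0 / 298.257223563       # flattening
E2 = F * (2.0 - F)            # first eccentricity squared

def geodetic_to_ecef(lat_deg, lon_deg, alt_m):
    """Transform (latitude, longitude, altitude) into rectangular ECEF (x, y, z)."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # prime vertical radius of curvature at this latitude
    n = A / math.sqrt(1.0 - E2 * math.sin(lat) ** 2)
    x = (n + alt_m) * math.cos(lat) * math.cos(lon)
    y = (n + alt_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - E2) + alt_m) * math.sin(lat)
    return x, y, z
```

For example, `geodetic_to_ecef(0.0, 0.0, 0.0)` returns a point on the equator at the prime meridian, with x equal to the semi-major axis 6378137.0 m.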
Regarding claim 6, Xing in view of Michael and Hirano discloses all the limitations of claim 1.
Xing discloses wherein the plurality of specific points include points of specifying a feature (column 6 paragraph 7 - the unmanned aerial vehicle camera shooting the video frame and the two-dimensional picture characteristic matching, namely extracting the characteristic point in the video frame and the two-dimensional picture, and according to the similarity (characteristic vector distance) between the characteristic points to match, and generating the matching result. wherein the feature point on the video frame is a two-dimensional feature point; the feature point on the two-dimensional picture is a matching feature point.).
Michael discloses specifying a shape of a house (Page 9 last paragraph, Fig. 4 - a shape of a building, a region of the building.).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Xing, as combined with Michael and Hirano, to specify a shape of a house as taught by Michael. The motivation for doing so is to provide greater realism.
Regarding claim 7, Xing in view of Michael and Hirano discloses all the limitations of claim 1.
Xing discloses wherein the plurality of specific points include points of specifying a position of a feature (column 6 paragraph 7 - the unmanned aerial vehicle camera shooting the video frame and the two-dimensional picture characteristic matching, namely extracting the characteristic point in the video frame and the two-dimensional picture, and according to the similarity (characteristic vector distance) between the characteristic points to match, and generating the matching result. wherein the feature point on the video frame is a two-dimensional feature point; the feature point on the two-dimensional picture is a matching feature point. obtaining the two-dimensional image corresponding to the range of the unmanned aerial vehicle camera shot under the initial position attitude).
Michael discloses specifying a shape of a road (Page 5 paragraph 5, Fig. 4 - a shape of a road.).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Xing, as combined with Michael and Hirano, to specify a shape of a road as taught by Michael. The motivation for doing so is to provide greater realism.
Regarding claim 8, Xing in view of Michael and Hirano discloses all the limitations of claim 1.
Xing discloses wherein a transformation matrix used for the perspective projection transformation includes a plurality of the parameters (Page 7 paragraph 5 - determining the camera pose matrix by the posture-solving algorithm, focus information and/or distortion parameter; setting the camera in the virtual scene according to the above information, and adding the video frame into the rendering pipeline for video projection. obtaining the two-dimensional image corresponding to the range of the unmanned aerial vehicle camera shot under the initial position attitude.),
perform the evaluation of the rate of match a plurality of times while changing a combination of values of the plurality of parameters (Page 8 paragraph 6 – 8 S206: and performing characteristic matching to the characteristic points between the video frame and the two-dimensional picture according to the distance of the descriptor.
Specifically, after obtaining the feature points on the two-dimensional picture and the video frame, judging the similarity between two feature points according to the distance corresponding to the descriptor: the smaller the distance is, the higher the similarity. The distance of the descriptor can be a Euclidean distance, Hamming distance, cosine distance, and so on. There are a plurality of feature points, and the feature points are arranged in order (ordering), so the evaluation is performed a plurality of times.
further, based on the description of the GPU traversing the two-dimensional picture and the video frame, ordering the feature points according to the distance, displaying the matching result of the front N features under a certain confidence degree, namely, matching the feature points between the two-dimensional picture and the video frame according to the similarity reflected by the distance “rate of match”.).
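The S206 passage quoted above orders candidate matches by descriptor distance and keeps the front N results. A minimal sketch of that scheme, with made-up descriptor vectors (Xing does not specify the descriptor type, and Euclidean distance is chosen here only because the quote lists it first), is:

```python
import math

def euclidean(d1, d2):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(d1, d2)))

def match_top_n(frame_desc, picture_desc, n):
    """Pair each video-frame descriptor with its nearest two-dimensional-picture
    descriptor, order all pairs by distance (smaller distance = higher
    similarity), and keep the front n matches."""
    pairs = []
    for i, df in enumerate(frame_desc):
        best_dist, best_j = min(
            (euclidean(df, dp), j) for j, dp in enumerate(picture_desc)
        )
        pairs.append((best_dist, i, best_j))
    pairs.sort()  # ordering the feature-point pairs by distance
    return [(i, j) for _, i, j in pairs[:n]]
```

A real system would also apply a confidence threshold before displaying the front N matches, as the quoted passage notes.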
Regarding claim 9, Xing in view of Michael and Hirano discloses all the limitations of claim 8.
Xing discloses wherein the plurality of parameters are parameters related to a position and a posture of the camera that captures the captured image (Page 7 paragraph 5 - determining the camera pose matrix by the posture-solving algorithm, focus information and/or distortion parameter; setting the camera in the virtual scene according to the above information, and adding the video frame into the rendering pipeline for video projection. obtaining the two-dimensional image corresponding to the range of the unmanned aerial vehicle camera shot under the initial position attitude.).
Regarding claim 10, Xing in view of Michael and Hirano discloses all the limitations of claim 1.
Xing discloses wherein the captured image is an image captured by using the camera mounted on a flying object (Page 2 paragraph 8 - performing characteristic matching for the video frame and the two-dimensional image shot by the unmanned aerial vehicle;), and the one or more processors acquire camera position information indicating a position of the camera during capturing of the captured image and posture information indicating a posture of the camera during the capturing of the captured image (Page 5 paragraph 12 – camera, the positioning information comprises longitude, latitude and height, which can be detected by the GPS module carried on the unmanned aerial vehicle.
Page 7 paragraph 5 - determining the camera pose matrix by the posture-solving algorithm, focus information and/or distortion parameter; setting the camera in the virtual scene according to the above information, and adding the video frame into the rendering pipeline for video projection. obtaining the two-dimensional image corresponding to the range of the unmanned aerial vehicle camera shot under the initial position attitude.), and
decide a search range in which the value of the parameter is searched for, based on the camera position information and the posture information (page 7 paragraph 5 - firstly based on the positioning information of the unmanned aerial vehicle, attitude information and holder information determining the initial position posture of the unmanned aerial vehicle camera, and rendering the three-dimensional map based on the initial position attitude, obtaining the two-dimensional image corresponding to the range of the unmanned aerial vehicle camera shot under the initial position attitude, then performing characteristic matching to the video frame shot by the two-dimensional image and the unmanned aerial vehicle camera.).
Regarding claim 11, Xing in view of Michael and Hirano discloses all the limitations of claim 10.
Xing discloses wherein the camera position information includes data of latitude, longitude, and altitude, and the posture information includes data of an azimuthal angle, a tilt angle, and a roll angle indicating an inclination from horizontal (Page 5 paragraph 12 –that the camera is mounted on the pan-tilt, and the pan-tilt is mounted on the unmanned aerial vehicle. wherein the positioning information comprises longitude, latitude and height, which can be detected by the GPS module carried on the unmanned aerial vehicle. the attitude information comprises pitch angle, roll angle and yaw angle, which can be detected by the IMU (inertia measuring unit) carried on the unmanned aerial vehicle. pan-tilt information (PTZ, Pan/Tilt/Zoom, rotating/pitching/zooming) comprises rotation of the pan-tilt, pitching and zoom information, representing the pan-tilt omnidirectional (left and right/upper) movement and lens zoom, zoom control “an inclination from horizontal”. omnidirectional movement includes the azimuthal angle.
Page 7 paragraph 5 - determining the camera pose matrix by the posture-solving algorithm, focus information and/or distortion parameter.).
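Claim 11 recites posture information comprising an azimuthal angle, a tilt angle, and a roll angle, which the quoted passage maps to the yaw, pitch, and roll reported by the IMU. For illustration only — the references do not state a composition order — a conventional Z-Y-X (yaw-pitch-roll) rotation matrix can be assembled as:

```python
import math

def rotation_from_ypr(yaw, pitch, roll):
    """Compose a 3x3 rotation matrix from yaw (azimuthal angle), pitch (tilt
    angle), and roll, applied in Z-Y-X order; angles are in radians."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp,     cp * sr,                cp * cr],
    ]
```

With all three angles zero the result is the identity matrix, i.e. a level camera facing the reference azimuth.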
Regarding claim 12, Xing in view of Michael and Hirano discloses all the limitations of claim 10.
Xing discloses wherein the camera position information and the posture information are acquired from sensor data obtained by a sensor disposed in at least one of the camera or the flying object (Page 5 paragraph 12 – the camera is mounted on the pan-tilt, and the pan-tilt is mounted on the unmanned aerial vehicle. wherein the positioning information comprises longitude, latitude and height, which can be detected by the GPS module carried on the unmanned aerial vehicle. the attitude information comprises pitch angle, roll angle and yaw angle, which can be detected by the IMU (inertia measuring unit) “sensor” carried on the unmanned aerial vehicle. pan-tilt information (PTZ, Pan/Tilt/Zoom, rotating/pitching/zooming) comprises rotation of the pan-tilt, pitching and zoom information, representing the pan-tilt omnidirectional (left and right/upper) movement and lens zoom, zoom control.).
Regarding claim 13, Xing in view of Michael and Hirano discloses all the limitations of claim 1.
Xing discloses make a weight for the evaluation of the rate of match different between a portion and a portion of the captured image (
Page 8 paragraph 6 – 8 S206: and performing characteristic matching to the characteristic points between the video frame and the two-dimensional picture according to the distance of the descriptor.
Specifically, after obtaining the feature points on the two-dimensional picture and the video frame, judging the similarity between two feature points according to the distance corresponding to the descriptor: the smaller the distance is, the higher the similarity “weight”. The distance of the descriptor can be a Euclidean distance, Hamming distance, cosine distance, and so on. There are a plurality of feature points, and the feature points are arranged in order (ordering), so the evaluation is performed a plurality of times.
further, based on the description of the GPU traversing the two-dimensional picture and the video frame, ordering the feature points according to the distance, displaying the matching result of the front N features under a certain confidence degree, namely, matching the feature points between the two-dimensional picture and the video frame according to the similarity reflected by the distance “rate of match”. ).
Michael discloses between a central portion and a peripheral portion of the captured image (Page 10 paragraph 7 - the object orientation and the determination or calculation of the visible part of the virtual object 19 are based on orientation and relative arrangement to the virtual position 14.2 and virtual line of sight 15.2, possibly taking into account the occlusion by other parts of the simulation environment 13 and possibly a scaling of the virtual object 19 based on the distance 31 between the object position 16 and the virtual position 14.2; also taking into account the virtual position 14.2, the virtual viewing direction 15.2, the object position 16 and, if necessary, the extent of the virtual field of view 2.2, it calculates an insertion position of the virtual image content on the display element.).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Xing, as combined with Michael and Hirano, to make the weight different between a central portion and a peripheral portion of the captured image, as taught by Michael. The motivation for doing so is to provide greater realism.
Regarding claim 14, Xing in view of Michael and Hirano discloses all the limitations of claim 1.
Xing discloses select the value of the parameter at which the rate of match is highest, based on the results of the evaluation performed a plurality of times (
Page 8 paragraph 6 – 8 S206: and performing characteristic matching to the characteristic points between the video frame and the two-dimensional picture according to the distance of the descriptor.
Specifically, after obtaining the feature points on the two-dimensional picture and the video frame, judging the similarity between two feature points according to the distance corresponding to the descriptor: the smaller the distance is, the higher the similarity “select the value of the parameter”. The distance of the descriptor can be a Euclidean distance, Hamming distance, cosine distance, and so on. There are a plurality of feature points, and the feature points are arranged in order (ordering), so the evaluation is performed a plurality of times.
further, based on the description of the GPU traversing the two-dimensional picture and the video frame, ordering the feature points according to the distance, displaying the matching result of the front N features under a certain confidence degree, namely, matching the feature points between the two-dimensional picture and the video frame according to the similarity reflected by the distance “rate of match”.).
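Claims 8 and 14 together recite evaluating the rate of match a plurality of times while changing a combination of parameter values and selecting the value at which the rate is highest. A schematic grid search over hypothetical camera parameters — the evaluation function here is a caller-supplied stand-in, not Xing's descriptor matching — illustrates the selection step:

```python
import itertools

def select_best_parameters(param_grid, match_rate):
    """Evaluate match_rate for every combination of parameter values and
    return the combination with the highest rate.

    param_grid: dict mapping a parameter name to a list of candidate values.
    match_rate: callable taking {name: value} and returning a numeric rate.
    """
    names = sorted(param_grid)
    best_rate, best_combo = float("-inf"), None
    # evaluation performed a plurality of times, one per combination
    for values in itertools.product(*(param_grid[n] for n in names)):
        combo = dict(zip(names, values))
        rate = match_rate(combo)
        if rate > best_rate:
            best_rate, best_combo = rate, combo
    return best_combo, best_rate
```

The parameter names and the evaluation function are hypothetical; in Xing the rate would come from the feature-matching step described above.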
Regarding claim 15, Xing in view of Michael and Hirano discloses all the limitations of claim 14.
Xing discloses generate a composite image in which the first line segment generated by using the perspective projection transformation defined by the selected value of the parameter is superimposed on the captured image (Page 6 paragraph 6 - S102 after obtaining the rendered two-dimensional picture, the unmanned aerial vehicle camera shooting the video frame and the two-dimensional picture characteristic matching, namely extracting the characteristic point in the video frame and the two-dimensional picture, and according to the similarity (characteristic vector distance) between the characteristic points to match, and generating the matching result. wherein the feature point on the video frame is a two-dimensional feature point; the feature point on the two-dimensional picture is a matching feature point “selected value”.
Page 7 paragraph 5 - S104 performing the fusion projection transformation “composite image”, determining the mapping relationship between the pixel point in the video frame and the three-dimensional point in the three-dimensional scene (virtual scene), and according to the mapping relation, the video frame in the three-dimensional scene for colour texture mapping, and the overlapped area of the colour texture mapping for smooth transition processing, “superimposed”
Page 6 paragraph 6 - subsequent to the video frame and the characteristic matching of the two-dimensional picture can solve the difference caused by different viewing angle. or the real-time positioning information and posture information of the unmanned aerial vehicle and the previous video frame solution to calculate the camera position as fusion filter to obtain the initial position posture of the new frame, and rendering the three-dimensional map based on the initial position posture.
S102 to S104 are sequential steps, and together they describe the limitation above.).
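Claim 15 superimposes a first line segment generated by perspective projection onto the captured image. As a sketch only (the intrinsic parameters below are hypothetical; Xing obtains the actual pose via the posture-solving algorithm), projecting the endpoints of a 3-D segment through a pinhole model yields the image-space segment to draw:

```python
def project_point(point_3d, fx, fy, cx, cy):
    """Pinhole perspective projection: camera-frame (x, y, z) -> image (u, v)."""
    x, y, z = point_3d
    if z <= 0:
        raise ValueError("point must lie in front of the camera")
    u = fx * x / z + cx
    v = fy * y / z + cy
    return u, v

def project_segment(p0, p1, fx, fy, cx, cy):
    """Project both endpoints of a 3-D line segment; the image-space segment
    joining the two results is what gets superimposed on the captured image."""
    return (project_point(p0, fx, fy, cx, cy),
            project_point(p1, fx, fy, cx, cy))
```

For instance, with focal lengths of 1000 px and a principal point at (640, 360), a point on the optical axis at depth 10 projects to the image center (640.0, 360.0).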
Regarding claim 16, Xing in view of Michael and Hirano discloses all the limitations of claim 1.
Xing discloses results with superior evaluation records among the evaluations performed a plurality of times, and receiving an instruction to select one result from among the plurality of results with the superior evaluation records (Page 8 paragraphs 6 – 8, S206: performing characteristic matching to the characteristic points between the video frame and the two-dimensional picture according to the distance of the descriptor.
Specifically, after obtaining the feature points on the two-dimensional picture and the video frame, judging the similarity between two feature points according to the distance corresponding to the descriptor: the smaller the distance is, the higher the similarity “instruction to select”. The distance of the descriptor can be a Euclidean distance, Hamming distance, cosine distance, and so on. There are a plurality of feature points, and the feature points are arranged in order (ordering), so the evaluation is performed a plurality of times.
further, based on the description of the GPU traversing the two-dimensional picture and the video frame, ordering the feature points according to the distance, displaying the matching result of the front N features under a certain confidence degree, namely, matching the feature points between the two-dimensional picture and the video frame according to the similarity reflected by the distance.).
Michael discloses displaying a plurality of results (page 3 paragraph 9 - The effect of the simulation environment as an invisible virtual background for the virtual objects and image content is that the realistic alignment of the virtual objects and image content is transferred, as part of the process, into the real environment, allowing a realistic superimposed display of the virtual image content. For example, this ensures that a virtual object in the form of a vehicle is arranged in the simulation environment such that, at the object position or in the perimeter of the object position, the four or more wheels of the vehicle touch the surface of the simulation environment. Page 4 paragraph 5 - display of at least one virtual image content on the display element takes place taking into account the previously calculated insertion position on the display element.).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Xing, as combined with Michael and Hirano, to display a plurality of results as taught by Michael. The motivation for doing so is to provide greater realism.
Regarding claim 17, Xing in view of Michael and Hirano discloses all the limitations of claim 16.
Xing discloses wherein the one or more processors generate a composite image in which the first line segment generated by using the perspective projection transformation defined by the value of the parameter corresponding to the selected result is superimposed on the captured image in accordance with the received instruction (Page 6 paragraph 6 - S102 after obtaining the rendered two-dimensional picture, the unmanned aerial vehicle camera shooting the video frame and the two-dimensional picture characteristic matching, namely extracting the characteristic point in the video frame and the two-dimensional picture, and according to the similarity (characteristic vector distance) between the characteristic points to match “line segment”, and generating the matching result. wherein the feature point on the video frame is a two-dimensional feature point; the feature point on the two-dimensional picture is a matching feature point “selected value”.
Page 7 paragraph 5 - S104 performing the fusion projection transformation “composite image”, determining the mapping relationship between the pixel point in the video frame and the three-dimensional point in the three-dimensional scene (virtual scene) “projection transformation”, and according to the mapping relation, the video frame in the three-dimensional scene for colour texture mapping, and the overlapped area of the colour texture mapping for smooth transition processing, “superimposed”
Page 6 paragraph 6 - subsequent to the video frame and the characteristic matching of the two-dimensional picture can solve the difference caused by different viewing angle. or the real-time positioning information and posture information of the unmanned aerial vehicle and the previous video frame solution to calculate the camera position as fusion filter to obtain the initial position posture of the new frame, and rendering the three-dimensional map based on the initial position posture.
S102 to S104 are sequential steps, and together they describe the limitation above.).
Regarding claim 18, Xing in view of Michael and Hirano discloses all the limitations of claim 15.
Xing discloses wherein the plurality of specific points include points of specifying a shape of a feature, and the composite image is an image in which a figure indicating a region of the feature using the first line segment is superimposed on the captured image (Page 6 paragraph 6 - S102 after obtaining the rendered two-dimensional picture, the unmanned aerial vehicle camera shooting the video frame and the two-dimensional picture characteristic matching, namely extracting the characteristic point in the video frame and the two-dimensional picture, and according to the similarity (characteristic vector distance) between the characteristic points to match “first line segment”, and generating the matching result. wherein the feature point on the video frame is a two-dimensional feature point; the feature point on the two-dimensional picture is a matching feature point “selected value”.
Page 7 paragraph 5 - S104 performing the fusion projection transformation “composite image”, determining the mapping relationship between the pixel point in the video frame and the three-dimensional point in the three-dimensional scene (virtual scene) “projection transformation”, and according to the mapping relation, the video frame in the three-dimensional scene for colour texture mapping, and the overlapped area of the colour texture mapping for smooth transition processing, “superimposed”
Page 6 paragraph 6 - subsequent to the video frame and the characteristic matching of the two-dimensional picture can solve the difference caused by different viewing angle. or the real-time positioning information and posture information of the unmanned aerial vehicle and the previous video frame solution to calculate the camera position as fusion filter to obtain the initial position posture of the new frame, and rendering the three-dimensional map based on the initial position posture.
S102 to S104 are sequential steps, and together they describe the limitation above.).
Michael discloses specifying a shape of a house, a region of the house (Page 9 last paragraph, Fig. 4 - a shape of a building, a region of the building.).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Xing, as combined with Michael and Hirano, to specify a shape of a house and a region of the house, as taught by Michael. The motivation for doing so is to provide greater realism.
Regarding claim 19, Xing in view of Michael and Hirano discloses all the limitations of claim 18.
Michael discloses wherein receive input of an instruction to move the figure indicating the region of the house displayed by being superimposed on the captured image, and move the figure on the captured image in accordance with the input instruction (Page 7 paragraphs 5 and 6 - moving virtual objects can be superimposed into the real environment by means of the virtual image contents, which is made possible via the superimposed display of virtual and real image contents by means of the display device. The same may additionally or alternatively also apply to the orientation of the virtual objects in the simulation environment. The process is performed by a computer, which must have received an instruction in order to perform the claimed method. A building in the simulated environment is described at page 4 paragraph 5.).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Xing, as combined with Michael and Hirano, to receive input of an instruction to move the figure indicating the region of the house displayed by being superimposed on the captured image, and to move the figure on the captured image in accordance with the input instruction, as taught by Michael. The motivation for doing so is to provide greater realism.
Regarding claim 20, Xing in view of Michael and Hirano discloses all the limitations of claim 18.
Michael discloses cut out an image area of the house surrounded by the figure from the image (Page 6 paragraph 6 - cropping the building from the image.).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Xing, as combined with Michael and Hirano, to cut out an image area of the house surrounded by the figure from the image, as taught by Michael. The motivation for doing so is to provide greater realism.
Regarding claim 21, Xing in view of Michael and Hirano discloses all the limitations of claim 1.
Xing discloses a display unit that displays a result of the association between the captured image and the positions of the plurality of specific points (page 8 paragraph 7 - after obtaining the feature point on the two-dimensional picture and video frame, judging the similarity between two feature points according to the distance corresponding to the descriptor, the smaller the distance is, the similarity is higher. wherein the distance of the descriptor can be Euclidean distance, hamming distance, cosine distance and so on.
page 8 paragraph 8 - further, based on the description of the GPU traversing the two-dimensional picture and the video frame, ordering the feature points according to the distance, displaying the matching result of the front N features under a certain confidence degree, namely, matching the feature points between the two-dimensional picture and the video frame according to the similarity reflected by the distance.); and
an input unit for inputting an instruction from a user (column 11 line 3 - The input device 43 may be used to receive the input digital or character information, and generate a key signal input associated with the user setting and function control of the device. The output device 44 may include a display device such as a display screen.).
Regarding claim 22, see the rejection of claim 23.
Regarding claim 23, Xing discloses a non-transitory, computer-readable tangible recording medium which records thereon a program causing, when read by a computer, the computer to implement: a function of acquiring a captured image captured by using a camera (Page 4, last paragraph - A storage medium containing computer-executable instructions, the computer-executable instructions when executed by a computer processor for executing unmanned machine projection method provided by the embodiment, the unmanned machine projection method comprises: based on the positioning information of the unmanned aerial vehicle; determining the initial position posture of the unmanned aerial vehicle camera under the world coordinate system;
Page 11 paragraph 1 - FIG. 4 is a structure schematic diagram of a computer device provided by the embodiment of the invention. Referring to FIG. 4, the computer device includes: an input device 43, an output device 44, a memory 42, and one or more processors 41; the memory 42 is configured to store one or more programs; when the one or more programs are executed by the one or more processors 41, the one or more processors 41 realize the unmanned machine projection method provided by the above embodiments. wherein the input device 43, the output device 44, the memory 42 and the processor 41 can be connected by a bus or other means; in FIG. 4 they are connected by a bus as an example. The memory 42, as a computing device readable storage medium, can be used for storing a software program, a computer executable program and a module, to perform the method:);
a function of acquiring three-dimensional position information indicating positions of a plurality of specific points in a space of an imaging target range (Page 6 paragraph 3 - rendering the three-dimensional map, so as to obtain the picture corresponding to the camera shot at the initial position posture corresponding to the two-dimensional picture.);
a function of setting a value of a parameter of perspective projection transformation (Page 8 paragraphs 6 – 8, S206: and performing characteristic matching to the characteristic points between the video frame and the two-dimensional picture according to the distance of the descriptor.
Specifically, after obtaining the feature point on the two-dimensional picture and video frame, judging the similarity between two feature points according to the distance corresponding to the descriptor, the smaller the distance is, the similarity is higher. wherein the distance of the descriptor can be Euclidean distance, hamming distance, cosine distance and so on. There are a plurality of feature points. The feature points are arranged in order (ordering), so the evaluation is performed a plurality of times, i.e., “setting a value of a parameter”.);
a function of transforming the position information of the plurality of specific points into data of the image coordinates by using the perspective projection transformation (Page 9 paragraph 4 - obtaining the two-dimensional feature point coordinate on the video frame and the three-dimensional feature point coordinate on the three-dimensional map, the two-dimensional feature point coordinate and three-dimensional feature point coordinate into the PnP algorithm and nonlinear optimization algorithm, obtaining the accurate camera pose matrix by the PnP algorithm, and then optimizing the camera parameters by the nonlinear optimization algorithm to obtain the focus information and/or distortion parameter.
Page 7 paragraph 5 - determining the three-dimensional feature point corresponding to the two-dimensional feature point on the video frame in the three-dimensional map; determining the camera pose matrix by the posture-solving algorithm, focus information and/or distortion parameter; setting the camera in the virtual scene according to the above information, and adding the video frame into the rendering pipeline for video projection);
a function of evaluating a rate of match between the first line segment and the second line segment (Page 6 paragraph 6 - S102 after obtaining the rendered two-dimensional picture, the unmanned aerial vehicle camera shooting the video frame and the two-dimensional picture characteristic matching, namely extracting the characteristic point in the video frame and the two-dimensional picture, and according to the similarity (characteristic vector distance) between the characteristic points to match “line segment”, and generating the matching result. wherein the feature point on the video frame is a two-dimensional feature point; the feature point on the two-dimensional picture is a matching feature point.
Page 8 paragraphs 7 and 8 – Specifically, after obtaining the feature point on the two-dimensional picture and video frame, judging the similarity between two feature points according to the distance corresponding to the descriptor, the smaller the distance is, the similarity is higher. wherein the distance of the descriptor can be Euclidean distance, hamming distance, cosine distance and so on.
further, based on the description of the GPU traversing the two-dimensional picture and the video frame, ordering the feature points according to the distance, displaying the matching result of the front N features under a certain confidence degree, namely, matching the feature points between the two-dimensional picture and the video frame according to the similarity reflected by the distance.);
a function of performing the evaluation of the rate of match a plurality of times while changing the value of the parameter of the perspective projection transformation (
Page 8 paragraph 6 – 8 S206: and performing characteristic matching to the characteristic points between the video frame and the two-dimensional picture according to the distance of the descriptor.
Specifically, after obtaining the feature point on the two-dimensional picture and video frame, judging the similarity between two feature points according to the distance corresponding to the descriptor, the smaller the distance is, the similarity is higher. wherein the distance of the descriptor can be Euclidean distance, hamming distance, cosine distance and so on. There are a plurality of feature points. The feature points are arranged in order (ordering), so the evaluation is performed a plurality of times.
further, based on the description of the GPU traversing the two-dimensional picture and the video frame, ordering the feature points according to the distance, displaying the matching result of the front N features under a certain confidence degree, namely, matching the feature points between the two-dimensional picture and the video frame according to the similarity reflected by the distance “rate of match”.);
and a function of associating the captured image with the positions of the plurality of specific points based on results of the evaluation performed a plurality of times (Page 8 paragraph 6 – 8 S206: and performing characteristic matching to the characteristic points between the video frame and the two-dimensional picture according to the distance of the descriptor.
Specifically, after obtaining the feature point on the two-dimensional picture and video frame, judging the similarity between two feature points according to the distance corresponding to the descriptor, the smaller the distance is, the similarity is higher. wherein the distance of the descriptor can be Euclidean distance, hamming distance, cosine distance and so on.
further, based on the description of the GPU traversing the two-dimensional picture and the video frame, ordering the feature points according to the distance, displaying the matching result of the front N features under a certain confidence degree, namely, matching the feature points between the two-dimensional picture and the video frame according to the similarity reflected by the distance.
The feature points are arranged in order (ordering), so the evaluation is performed a plurality of times.).
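For illustration only (hypothetical names and values, neither the applicant's method nor Xing's code), the mapped limitations of transforming 3D positions by perspective projection and evaluating a rate of match a plurality of times while changing a projection parameter can be sketched as a sweep over candidate focal lengths:

```python
# Illustrative sketch only: project known 3D points through a pinhole
# camera for several candidate values of a projection parameter (here,
# the focal length), evaluate a match rate against the observed 2D
# points each time, and keep the best-scoring value.
import math

def project(point3d, f):
    """Pinhole perspective projection (camera at the origin, +z forward)."""
    x, y, z = point3d
    return (f * x / z, f * y / z)

def match_rate(points3d, observed2d, f, tol=1.0):
    """Fraction of projected points landing within tol of an observation."""
    hits = 0
    for p3, (ou, ov) in zip(points3d, observed2d):
        u, v = project(p3, f)
        if math.hypot(u - ou, v - ov) <= tol:
            hits += 1
    return hits / len(points3d)

def best_focal(points3d, observed2d, candidates):
    """Evaluate the match rate once per candidate value and pick the best."""
    return max(candidates, key=lambda f: match_rate(points3d, observed2d, f))
```

The same loop structure applies to any projection parameter (pose, distortion, etc.); the evaluation is simply repeated per candidate value.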
Xing does not disclose the following limitation; however, Michael discloses
perspective projection transformation of transforming the three-dimensional position information into two-dimensional image coordinates based on an imaging condition of the image (Page 6 paragraph 1 - wherein the projection is generated at least on the basis of the orientation of the virtual object in the simulation environment. The dependence of the projection or the virtual image content resulting from the projection on the orientation of the virtual object also means that the virtual image content displayed to the user will change even if the corresponding virtual object in the simulation environment only makes one rotation without a change in the object position around an example vertically running object axis executes. An illustrative example of this is also a tracked vehicle as a virtual object, since even in reality tracked vehicles are capable of turning on the spot, ie only performing a rotation about a vertical axis. In such a movement of a real tracked vehicle, the viewer sees in reality also a constantly changing part of the vehicle surface. When the virtual image content is generated as a two-dimensional projection of three-dimensional virtual objects, taking into account the orientation of the object in the simulation environment, such rotation of the object may also involve shaping one or more axes both individually and in combination with a change in the object position be presented to the user together or superimposed with real image content of very realistic virtual image content.).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Xing with perspective projection transformation of transforming the three-dimensional position information into two-dimensional image coordinates based on an imaging condition of the image, as taught by Michael. The motivation for doing so is to provide more realism.
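For illustration only (assumed geometry, not Michael's implementation), the cited principle — that the two-dimensional projection of a virtual object changes when the object merely rotates about a vertical axis, with no change in object position — can be sketched as:

```python
# Illustrative sketch only: a 3D point's 2D pinhole projection changes
# under a rotation about the vertical (y) axis alone, even though the
# point's distance from the axis and the camera setup are unchanged.
import math

def rotate_y(point, angle):
    """Rotate a 3D point about the vertical (y) axis through the origin."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)

def pinhole(point, f=1.0, cam_dist=5.0):
    """Project a point through a pinhole camera placed on the z axis."""
    x, y, z = point
    return (f * x / (z + cam_dist), f * y / (z + cam_dist))

corner = (1.0, 0.5, 1.0)            # a point on the virtual object
before = pinhole(corner)            # projection at the original orientation
after = pinhole(rotate_y(corner, math.pi / 2))  # after a 90-degree turn
```

The differing `before` and `after` coordinates correspond to Michael's tracked-vehicle example: turning on the spot alone alters the displayed virtual image content.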
Xing in view of Michael does not disclose the following limitations; however, Hirano discloses
a function of extracting a first line segment based on the data of the image coordinates obtained by the transformation ([0012] - a second fitting line formed out of a line of a predetermined shape according to the edge points in the effective edge point region on the contour image, “function”.
[0047] - A size and position of the line are determined by the Hough transformation, and the fitting line is extracted from the contour image based on the transformation, “function”.);
a function of extracting a second line segment from the captured image (
[0047] - the first fitting line is extracted from the contour image.);
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Xing in view of Michael with a function of extracting a first line segment based on the data of the image coordinates obtained by the transformation and a function of extracting a second line segment from the captured image, as taught by Hirano. The motivation for doing so is to improve the accuracy of fitting.
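For illustration only (a minimal sketch with assumed details, not Hirano's implementation), extraction of a fitting line by the Hough transformation — voting edge points into (theta, rho) bins and taking the best-supported bin as the line's orientation and position — can be sketched as:

```python
# Illustrative sketch only: a minimal Hough transform for lines in the
# normal form rho = x*cos(theta) + y*sin(theta). Each edge point votes
# for every (theta, rho) bin consistent with it; the bin with the most
# votes determines the fitting line extracted from the contour image.
import math
from collections import Counter

def hough_line(edge_points, theta_steps=180):
    """Return (theta, rho) for the line supported by the most edge points."""
    votes = Counter()
    for t in range(theta_steps):
        theta = math.pi * t / theta_steps
        for x, y in edge_points:
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(t, rho)] += 1
    # Most votes wins; ties are broken toward the smaller angle bin.
    t, rho = max(votes, key=lambda b: (votes[b], -b[0]))
    return math.pi * t / theta_steps, rho
```

For example, edge points lying on the vertical line x = 3 plus one outlier recover theta = 0 and rho = 3, and the outlier does not disturb the fit — the robustness that motivates Hough-based line fitting.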
Regarding claim 24, Xing in view of Michael and Hirano discloses all the limitations of claim 1.
Hirano discloses generate the first line segment by connecting points of the data of the image coordinates obtained by the transformation ([0047] - a fitting line generation unit for generating a fitting line formed out of a line of a predetermined shape according to the edge points in the effective edge point region on the contour image.
[0012] - a fitting line formed out of a line of a predetermined shape according to the edge points in the effective edge point region on the contour image.).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Xing in view of Michael with generating the first line segment by connecting points of the data of the image coordinates obtained by the transformation, as taught by Hirano. The motivation for doing so is to improve the accuracy of fitting.
Response to Arguments
The Examiner suggests amending the claims to recite a specific element such that, when a claim is read in light of the invention, it is directed to a unique technology.
Claim Rejection Under 35 U.S.C. 103
Applicant asserts “Amended independent claim 1 specifies that the one or more processors execute a command of the program to, in part, "extract a first line segment based on the data of the image coordinates obtained by the transformation," "extract a second line segment from the captured image," and "evaluate a rate of match between the first line segment and the second line segment." Amended independent claims 22 and 23 recite similar features. At least paras. [0065], [0077], [0078], [0135], and [0136] of the specification support the amended features. Applicant contends that both Xing and Michael fail to disclose or suggest at least the features of extracting "a first line segment" and "a second line segment" and evaluating "a rate of match between the first line segment and the second line segment" recited in the amended claims. The invention of Xing may look similar to the claimed features at first glance in that Xing is directed to a technique for matching (positional matching) between a two-dimensional image shot by a drone and a three-dimensional map. However, Applicant contends that the invention of Xing and the claimed features are different in a processing method used for matching. In Xing, a three-dimensional map is rendered based on position information, posture information, etc. of a drone to generate a two-dimensional image, characteristic points are extracted from each of the two-dimensional image (an image generated through transformation from the three-dimensional map) and a video frame shot by the drone, characteristic matching is performed on the characteristic points of the images, and a coordinate of a 3D characteristic point in the three-dimensional map corresponding to a 2D characteristic point on the video frame is determined based on a result of the characteristic matching. A characteristic point of an image in Xing is a "point" extracted using an image feature extraction algorithm, such as SIFT, SURF, or ORB and is not a "line segment." 
Applicant contends that evaluation of a correspondence between "characteristic points" in Xing and evaluation of a rate of match between "line segments" as in the amended claims are different techniques. Applicant contends that neither Xing nor Michael discloses the technical idea of evaluating a rate of match between "line segments" (performing positional matching with a focus on "line segments") as in the amended claims. Thus, Applicant contends that the claimed features would not have been easily arrived at from the combination of Xing and Michael.”
The argument has been fully considered and is persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground of rejection is made in view of the Hirano reference.
Regarding claims 2 – 21 and 24, Applicant asserts that they are not obvious based on their dependency from independent claim 1. The Examiner respectfully cannot concur with Applicant, for the same reasons noted in the Examiner's response to the arguments asserted for claim 1.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Ming Wu whose telephone number is (571) 270-0724. The examiner can normally be reached on Monday-Thursday and alternate Fridays (9:30am - 6:00pm) EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Devona Faulk can be reached on 571-272-7515. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Ming Wu/
Primary Examiner, Art Unit 2618