Prosecution Insights
Last updated: April 19, 2026
Application No. 18/589,174

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND COMPUTER PROGRAM

Non-Final OA: §103, §112
Filed: Feb 27, 2024
Examiner: PATEL, PINALBEN V
Art Unit: 2673
Tech Center: 2600 — Communications
Assignee: Kabushiki Kaisha Toshiba
OA Round: 1 (Non-Final)
Grant Probability: 89% (Favorable)
OA Rounds: 1-2
To Grant: 2y 6m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 89% (484 granted / 545 resolved), +26.8% vs TC avg; grants above average
Interview Lift: +9.9% (moderate, roughly +10% lift), based on resolved cases with interview
Typical Timeline: 2y 6m average prosecution; 23 currently pending
Career History: 568 total applications across all art units

Statute-Specific Performance

§101: 9.1% (-30.9% vs TC avg)
§103: 59.9% (+19.9% vs TC avg)
§102: 5.9% (-34.1% vs TC avg)
§112: 14.9% (-25.1% vs TC avg)
Based on career data from 545 resolved cases; comparisons are against the Tech Center average estimate.

Office Action

§103, §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority

Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 02/27/2024 was filed. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-13 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Claims 1, 12 and 13 recite limitations – “a selection unit configured to select a plurality of correspondence relationships, based on effectiveness of the correspondence relationship and an influence on at least one of the images when the correspondence relationship with the effectiveness lower than an effectiveness threshold is eliminated”, appears to be directed to selecting correspondence relationship among multiple correspondence relationships when its effectiveness threshold is lower than preset threshold wherein the effectiveness of the correspondence relationship is based on its influence on one of the images. However, the structure of the claimed features where it recites – “based on an influence on at least one of the images when the correspondence relationship with the effectiveness lower…”, appears to be not clear as to what specific feature is eliminated, is it the correspondence relationship with lower threshold or the image which is less influenced. Therefore, the Examiner suggests amending the claims to clarify the above discussed features in order to render the claims definite. Specifically, may be amended as to clearly indicate that correspondence relationship is selected based on its effectiveness and influence on at least one of the images and further correspondence relationship is eliminated if it is lower than preset effectiveness threshold as outlined in one of the embodiments on original specifications.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C.
103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. Claims 1-13 are rejected under 35 U.S.C. 103 as being unpatentable over Matsubara et al. (JP 5953142 B2, as provided) in view of Lin et al. US Pub No. 20190244050 A1). Regarding Claim 1, Matsubara discloses an information processing apparatus, comprising: one or more hardware processors configured to function as: (Matsubara, Description of Embodiments, discloses image processing apparatus 10 according to the present embodiment is shown in FIG. The image processing apparatus 10 is an apparatus that performs a process of removing blur on an input moving image and outputs a moving image from which the blur has been removed. The image processing apparatus 10 includes a shake correction unit 100, a control unit 11, an image acquisition unit 12, a compression / decompression unit 13, a first storage unit 14, a second storage unit 15, and an image output unit 16. Is provided. Each unit is connected to each other via a bus 17; apparatus is disclosed) a correspondence relationship acquisition unit configured to calculate a plurality of image features from a plurality of images captured by a camera and acquire a correspondence relationship between the image features; (Matsubara, Description of Embodiments, Details of a configuration example of the shake correction unit 100 are shown in FIG. As shown in this figure, the shake correction unit 100 includes a corresponding point acquisition unit 110, a posture estimation unit 140, and a correction unit 150. In the present embodiment, the corresponding point acquisition unit 110 includes a feature point tracking unit 120 that acquires corresponding points based on the feature points. The feature point tracking unit 120 includes a feature point detection unit 124, a feature amount calculation unit 126, a matching setting unit 127, and a matching calculation unit 128. The feature point detection unit 124 obtains the image of the Nth frame and the image of the N + 1th frame (between consecutive images) acquired by the image acquisition unit 12 and input to the blur correction unit 100 via the bus 17. Each feature point candidate is detected. The feature amount calculation unit 126 calculates feature amounts of candidate feature points detected by the feature point detection unit 124, and determines a point having a high feature amount as a feature point. The matching setting unit 127 reduces the number of feature points used in subsequent processing according to the depth of the feature points determined by the feature amount calculation unit 126. The matching calculation unit 128 acquires a correspondence relationship between the feature point of the Nth frame and the feature point of the (N + 1) th frame. 
The feature point for which the correspondence relationship has been acquired will be referred to as a first corresponding point; corresponding relationship (amount of shake or motion) between features of N and N+1th frames is determined captured by camera sensor) and a selection unit configured to select a plurality of correspondence relationships, based on effectiveness of the correspondence relationship and an influence on at least one of the images when the correspondence relationship with the effectiveness lower than an effectiveness threshold is eliminated; (Matsubara, Description of Embodiments, discloses basic matrix calculation unit 144 illustrated in FIG. 2 calculates a basic matrix based on the basic matrix calculated by the basic matrix calculation unit 142. Based on the basic matrix calculated by the basic matrix calculation unit 144, the rotation / translation calculation unit 146 calculates the rotation and translation of the imaging device that has captured the N + 1th frame relative to the imaging device that has captured the Nth frame. The depth calculation unit 148 calculates the three-dimensional coordinates of the feature points determined as inlier corresponding points by the inlier calculation unit 1423 based on the basic matrix, and outputs the result to the matching setting unit 127. Based on the rotation and translation of the imaging apparatus calculated by the posture estimation unit 140, the correction unit 150 performs correction to remove blurring between the Nth frame image and the N + 1th frame image; The posture estimation unit 140 includes a basic matrix calculation unit 142, a basic matrix calculation unit 144, a rotation / translation calculation unit 146, and a depth calculation unit 148. Details of a configuration example of the basic matrix calculation unit 142 are shown in FIG. As shown in this figure, the basic matrix calculation unit 142 includes a corresponding point extraction unit 1421, a temporary basic matrix calculation unit 1422, an inlier calculation unit 1423, an iterative determination unit 1424, and a basic matrix determination unit 1425. The corresponding point extraction unit 1421 randomly extracts, for example, eight points from the first corresponding points acquired by the corresponding point acquisition unit 110. The temporary basic matrix calculation unit 1422 calculates a basic matrix based on the eight first corresponding points extracted by the corresponding point extraction unit 1421. Here, a basic matrix calculated from eight points extracted at random is referred to as a temporary basic matrix. The inlier calculating unit 1423 calculates an epipolar line for each first corresponding point acquired by the corresponding point acquiring unit 110 based on the temporary basic matrix calculated by the temporary basic matrix calculating unit 1422, and the epipolar line and the first corresponding point. The distance to the corresponding point is calculated. The inlier calculation unit 1423 determines whether the distance from the epipolar line is smaller than a predetermined threshold for each of the first corresponding points, and sets the corresponding points that are less than the predetermined threshold (or less) as the inlier corresponding points. And the number of corresponding points to be inlier corresponding points among the first corresponding points is counted. 
The repetition determining unit 1424 repeats the calculation of the number of inlier corresponding points corresponding to each temporary base matrix, that is, the processing from the corresponding point extracting unit 1421 to the inlier calculating unit 1423 a predetermined number of times or until a predetermined condition is satisfied. And a plurality of provisional basic matrices and the number of inlier corresponding points for the temporary basic matrix. The basic matrix determination unit 1425 compares the number of inlier corresponding points with respect to the temporary basic matrix, and determines the temporary basic matrix having the largest number of inlier corresponding points as the basic matrix; rotation or translation required to correct the corresponding relationship in form of shake or blur is determined and their values of relevance is determined based on threshold value) and Matsubara does not explicitly disclose a calculation unit configured to calculate at least one of a position and direction of the camera and three-dimensional information on the image features from the correspondence relationship selected from among the correspondence relationships. Lin discloses a calculation unit configured to calculate at least one of a position and direction of the camera and three-dimensional information on the image features from the correspondence relationship selected from among the correspondence relationships. (Lin, [0039-0041], Fig. 2-7, discloses an AR application scenario according to an exemplary embodiment of this application. There exists a desktop 220 in the real world, a picture 222 is on the desktop 220, and picture content of the picture 222 may be regarded as a Marker image. The Marker image is a reference image used for matching. A mobile terminal 240 having the camera performs continuous shooting by using the desktop 220 as a shot image, to obtain frames of images, such as images 1 to 6 shown in FIG. 2. The continuously shot frames of images are successively input to the processor for processing. In this embodiment of this application, the first image is used to refer to an i.sup.th frame of image collected by the camera, and the second image is used to refer to an (i+1).sup.th frame of image collected by the camera. The mobile terminal measures a homography between the Marker image and the second image by using a detector, and measures a homography between the first image and the second image by using a tracker; and then, performs complementary filtering processing on the two homographies, to obtain camera pose information of the mobile terminal through calculation, the camera pose information being used to represent a spatial position of the mobile terminal when the mobile terminal shoots the second image in the real world; homography is also referred to as a homography matrix, and generally describes a transformation relationship of some points on a common plane between two images. The homography describes a mapping relationship between two planes, and if all feature points in a real environment fall on the same physical plane, movement estimation may be performed between two frames of images by using the homography. For image A and image B, when at least four pairs of matched feature points exist in image A and image B, the mobile terminal decomposes the homography by using a ransac (Random Sample Consensus) algorithm, to obtain a rotation and translation matrix R|T. 
R is a rotation matrix corresponding to the camera changing from a first pose for shooting image A to a second pose for shooting image B, and T is a displacement vector corresponding to the camera changing from the first pose for shooting image A to the second pose for shooting image B; for the characteristic of a low operational capability of the mobile device, in this solution, a complementary filtering algorithm is used to accurately fuse a natural image detection result and an image inter-frame tracking result stored by a user, to implement a stable and rapid method that is for determining camera pose information and that has strong robustness. The method may be applied to an AR scenario, for example, an interactive scenario such as an AR type game scenario, an AR type educational scenario, or an AR type conference scenario. The method may be applied to an application program of camera positioning and pose correction based on the Marker image. The template image in this application is the Marker image. The Marker image may also be referred to as an Anchor image; camera pose is estimated based on matched features (corresponding relationship) between captured images using the camera sensor). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention was made to combine the teachings of Dujmic in view of Geng having a method of detecting a radiographic object by the training of the training model, with the teachings of Gates having, by the training module, performing an optimization to modify a weighted vector of a decision boundary separating the target group of feature vectors and the control group of feature vectors so that the margin representing the distance from the decision boundary to each of support vectors of the target group of feature vectors and of the control group of feature vectors is maximized through an objective function of the training model. Regarding Claim 2, The combination of Matsubara and Lin further discloses an initial value acquisition unit configured to acquire at least one of an initial value of a capturing position of the camera and an initial value of the direction of the camera, and the calculation unit calculates at least one of the position and direction of the camera and the three-dimensional information on the image features, further based on at least one of the initial value of the capturing position of the camera and the initial value of the direction of the camera. (Lin, [0039-0041], [0099], Fig. 2-7, discloses an AR application scenario according to an exemplary embodiment of this application. There exists a desktop 220 in the real world, a picture 222 is on the desktop 220, and picture content of the picture 222 may be regarded as a Marker image. The Marker image is a reference image used for matching. A mobile terminal 240 having the camera performs continuous shooting by using the desktop 220 as a shot image, to obtain frames of images, such as images 1 to 6 shown in FIG. 2. The continuously shot frames of images are successively input to the processor for processing. In this embodiment of this application, the first image is used to refer to an i.sup.th frame of image collected by the camera, and the second image is used to refer to an (i+1).sup.th frame of image collected by the camera. 
The mobile terminal measures a homography between the Marker image and the second image by using a detector, and measures a homography between the first image and the second image by using a tracker; and then, performs complementary filtering processing on the two homographies, to obtain camera pose information of the mobile terminal through calculation, the camera pose information being used to represent a spatial position of the mobile terminal when the mobile terminal shoots the second image in the real world; homography is also referred to as a homography matrix, and generally describes a transformation relationship of some points on a common plane between two images. The homography describes a mapping relationship between two planes, and if all feature points in a real environment fall on the same physical plane, movement estimation may be performed between two frames of images by using the homography. For image A and image B, when at least four pairs of matched feature points exist in image A and image B, the mobile terminal decomposes the homography by using a ransac (Random Sample Consensus) algorithm, to obtain a rotation and translation matrix R|T. R is a rotation matrix corresponding to the camera changing from a first pose for shooting image A to a second pose for shooting image B, and T is a displacement vector corresponding to the camera changing from the first pose for shooting image A to the second pose for shooting image B; for the characteristic of a low operational capability of the mobile device, in this solution, a complementary filtering algorithm is used to accurately fuse a natural image detection result and an image inter-frame tracking result stored by a user, to implement a stable and rapid method that is for determining camera pose information and that has strong robustness. The method may be applied to an AR scenario, for example, an interactive scenario such as an AR type game scenario, an AR type educational scenario, or an AR type conference scenario. The method may be applied to an application program of camera positioning and pose correction based on the Marker image. The template image in this application is the Marker image. The Marker image may also be referred to as an Anchor image; tracker in the 104 module tracks the template image, and updates a second homography. Therefore, complementary filtering processing may be performed on the first homography and the second homography in the module 107 for fusion, and the camera pose information obtained after the fusion is output to a module 108. If the module 105 determines that the detection has a result and a module 109 determines that the tracker has not been initialized, the tracker is initialized, and the tracker starts to work from a next frame; camera pose is estimated based on matched features (corresponding relationship) between captured images using the camera sensor based on initial camera position when tracker is initialized to determine its initial position and three-dimensional direction). Additionally, the rational and motivation to combine the references Matsubara and Lin as applied in rejection of claim 1 apply to this claim. 
Regarding Claim 3, The combination of Matsubara and Lin further discloses wherein the effectiveness of the correspondence relationship is such that an amount of movement between positions indicated by two image features in the correspondence relationship is greater, the effectiveness threshold indicates a predetermined amount of movement, and the selection unit selects the correspondence relationships so as not to eliminate the correspondence relationship having a greater amount of movement between the positions indicated by the two image features in the correspondence relationship. (Lin, [0038-0039],[0097], discloses processor 120 is further electrically connected to the camera 160. Optionally, the processor 120 is connected to the camera 160 by using a bus. The camera 160 is a sensing device having an image collection capability. The camera 160 may also be referred to as another name such as a camera or a sensing device. The camera 160 has a capability of continuously collecting images or collecting images for multiple times. Optionally, the camera 160 is disposed inside the device or outside the device. In this embodiment of this application, the camera 160 may continuously collect multi-frame images, an i.sup.th frame of image in the multi-frame images is a first image, and an (i+1).sup.th image in the multi-frame images is a second image; Fig. 2 is a schematic scenario diagram of an AR application scenario according to an exemplary embodiment of this application. There exists a desktop 220 in the real world, a picture 222 is on the desktop 220, and picture content of the picture 222 may be regarded as a Marker image. The Marker image is a reference image used for matching. A mobile terminal 240 having the camera performs continuous shooting by using the desktop 220 as a shot image, to obtain frames of images, such as images 1 to 6 shown in FIG. 2. The continuously shot frames of images are successively input to the processor for processing. In this embodiment of this application, the first image is used to refer to an i.sup.th frame of image collected by the camera, and the second image is used to refer to an (i+1).sup.th frame of image collected by the camera. The mobile terminal measures a homography between the Marker image and the second image by using a detector, and measures a homography between the first image and the second image by using a tracker; and then, performs complementary filtering processing on the two homographies, to obtain camera pose information of the mobile terminal through calculation, the camera pose information being used to represent a spatial position of the mobile terminal when the mobile terminal shoots the second image in the real world; obtaining q optical flow feature points as the second optical flow feature points if the quantity of the second optical flow feature points is less than a preset threshold, so that the quantity of the second optical flow feature points reaches the preset threshold, q being a positive integer; image features in two frames are tracked and their motion is determined and compared with threshold to determine its effectiveness for processing including selecting). Additionally, the rational and motivation to combine the references Matsubara and Lin as applied in rejection of claim 1 apply to this claim. 
Regarding Claim 4, The combination of Matsubara and Lin further discloses wherein the effectiveness of the correspondence relationship is such that a number of images containing the image features in the correspondence relationship is greater, the effectiveness threshold indicates a predetermined number of images, and the selection unit selects the correspondence relationships so as not to eliminate the correspondence relationship having a greater number of the images containing the image features in the correspondence relationship. (Lin, [0038-0039],[0097], discloses processor 120 is further electrically connected to the camera 160. Optionally, the processor 120 is connected to the camera 160 by using a bus. The camera 160 is a sensing device having an image collection capability. The camera 160 may also be referred to as another name such as a camera or a sensing device. The camera 160 has a capability of continuously collecting images or collecting images for multiple times. Optionally, the camera 160 is disposed inside the device or outside the device. In this embodiment of this application, the camera 160 may continuously collect multi-frame images, an i.sup.th frame of image in the multi-frame images is a first image, and an (i+1).sup.th image in the multi-frame images is a second image; Fig. 2 is a schematic scenario diagram of an AR application scenario according to an exemplary embodiment of this application. There exists a desktop 220 in the real world, a picture 222 is on the desktop 220, and picture content of the picture 222 may be regarded as a Marker image. The Marker image is a reference image used for matching. A mobile terminal 240 having the camera performs continuous shooting by using the desktop 220 as a shot image, to obtain frames of images, such as images 1 to 6 shown in FIG. 2. The continuously shot frames of images are successively input to the processor for processing. In this embodiment of this application, the first image is used to refer to an i.sup.th frame of image collected by the camera, and the second image is used to refer to an (i+1).sup.th frame of image collected by the camera. The mobile terminal measures a homography between the Marker image and the second image by using a detector, and measures a homography between the first image and the second image by using a tracker; and then, performs complementary filtering processing on the two homographies, to obtain camera pose information of the mobile terminal through calculation, the camera pose information being used to represent a spatial position of the mobile terminal when the mobile terminal shoots the second image in the real world; obtaining q optical flow feature points as the second optical flow feature points if the quantity of the second optical flow feature points is less than a preset threshold, so that the quantity of the second optical flow feature points reaches the preset threshold, q being a positive integer; image features in two frames are tracked and their motion is determined and compared with threshold to determine its effectiveness for processing including selecting). Additionally, the rational and motivation to combine the references Matsubara and Lin as applied in rejection of claim 1 apply to this claim. 
Regarding Claim 5, The combination of Matsubara and Lin further discloses wherein the effectiveness of the correspondence relationship is such that a degree of similarity of the image features in the correspondence relationship is higher, the effectiveness threshold indicates a predetermined degree of similarity, and the selection unit selects the correspondence relationships so as not to eliminate the correspondence relationship having a higher degree of similarity of the image features in the correspondence relationship. (Lin, [0038-0039],[0097], [0131], discloses processor 120 is further electrically connected to the camera 160. Optionally, the processor 120 is connected to the camera 160 by using a bus. The camera 160 is a sensing device having an image collection capability. The camera 160 may also be referred to as another name such as a camera or a sensing device. The camera 160 has a capability of continuously collecting images or collecting images for multiple times. Optionally, the camera 160 is disposed inside the device or outside the device. In this embodiment of this application, the camera 160 may continuously collect multi-frame images, an i.sup.th frame of image in the multi-frame images is a first image, and an (i+1).sup.th image in the multi-frame images is a second image; Fig. 2 is a schematic scenario diagram of an AR application scenario according to an exemplary embodiment of this application. There exists a desktop 220 in the real world, a picture 222 is on the desktop 220, and picture content of the picture 222 may be regarded as a Marker image. The Marker image is a reference image used for matching. A mobile terminal 240 having the camera performs continuous shooting by using the desktop 220 as a shot image, to obtain frames of images, such as images 1 to 6 shown in FIG. 2. The continuously shot frames of images are successively input to the processor for processing. In this embodiment of this application, the first image is used to refer to an i.sup.th frame of image collected by the camera, and the second image is used to refer to an (i+1).sup.th frame of image collected by the camera. The mobile terminal measures a homography between the Marker image and the second image by using a detector, and measures a homography between the first image and the second image by using a tracker; and then, performs complementary filtering processing on the two homographies, to obtain camera pose information of the mobile terminal through calculation, the camera pose information being used to represent a spatial position of the mobile terminal when the mobile terminal shoots the second image in the real world; obtaining q optical flow feature points as the second optical flow feature points if the quantity of the second optical flow feature points is less than a preset threshold, so that the quantity of the second optical flow feature points reaches the preset threshold, q being a positive integer; target grid is a part of grids of the plurality of grids of the template image. That is, the first feature point in the target grid has a matched target feature point in the second image, and each target grid only corresponds to a set of matched feature point pairs. Because when homography calculation is performed on two images, only at least four pairs of feature point pairs may be needed to calculate the homography, the quantity of feature point pairs is less required but the feature point pair requires higher quality. 
Feature point pairs in the same grid have a relatively high similarity degree, and the terminal may select feature point pairs belonging to different target grids as possible as it can for subsequent calculation; image features in two frames are tracked and their motion is determined and compared with threshold to determine its effectiveness for processing including selecting with use of their features degree of similarity when tracked in image frames). Additionally, the rational and motivation to combine the references Matsubara and Lin as applied in rejection of claim 1 apply to this claim. Regarding Claim 6, The combination of Matsubara and Lin further discloses wherein, by using a neural network that calculates reliability of the correspondence relationship, the correspondence relationship acquisition unit further acquires the reliability, the effectiveness of the correspondence relationship is such that the reliability is higher, the effectiveness threshold indicates predetermined reliability, and the selection unit selects the correspondence relationships so as not to eliminate the correspondence relationship having higher reliability of the correspondence relationship. (Lin, [0038-0039],[0097], [0131], discloses processor 120 is further electrically connected to the camera 160. Optionally, the processor 120 is connected to the camera 160 by using a bus. The camera 160 is a sensing device having an image collection capability. The camera 160 may also be referred to as another name such as a camera or a sensing device. The camera 160 has a capability of continuously collecting images or collecting images for multiple times. Optionally, the camera 160 is disposed inside the device or outside the device. In this embodiment of this application, the camera 160 may continuously collect multi-frame images, an i.sup.th frame of image in the multi-frame images is a first image, and an (i+1).sup.th image in the multi-frame images is a second image; Fig. 2 is a schematic scenario diagram of an AR application scenario according to an exemplary embodiment of this application. There exists a desktop 220 in the real world, a picture 222 is on the desktop 220, and picture content of the picture 222 may be regarded as a Marker image. The Marker image is a reference image used for matching. A mobile terminal 240 having the camera performs continuous shooting by using the desktop 220 as a shot image, to obtain frames of images, such as images 1 to 6 shown in FIG. 2. The continuously shot frames of images are successively input to the processor for processing. In this embodiment of this application, the first image is used to refer to an i.sup.th frame of image collected by the camera, and the second image is used to refer to an (i+1).sup.th frame of image collected by the camera. 
The mobile terminal measures a homography between the Marker image and the second image by using a detector, and measures a homography between the first image and the second image by using a tracker; and then, performs complementary filtering processing on the two homographies, to obtain camera pose information of the mobile terminal through calculation, the camera pose information being used to represent a spatial position of the mobile terminal when the mobile terminal shoots the second image in the real world; obtaining q optical flow feature points as the second optical flow feature points if the quantity of the second optical flow feature points is less than a preset threshold, so that the quantity of the second optical flow feature points reaches the preset threshold, q being a positive integer; target grid is a part of grids of the plurality of grids of the template image. That is, the first feature point in the target grid has a matched target feature point in the second image, and each target grid only corresponds to a set of matched feature point pairs. Because when homography calculation is performed on two images, only at least four pairs of feature point pairs may be needed to calculate the homography, the quantity of feature point pairs is less required but the feature point pair requires higher quality. Feature point pairs in the same grid have a relatively high similarity degree, and the terminal may select feature point pairs belonging to different target grids as possible as it can for subsequent calculation; image features in two frames are tracked and their motion is determined and compared with threshold to determine its effectiveness for processing including selecting with use of their features degree of similarity when tracked in image frames based on reliability of feature point pairs (quality of features)). Additionally, the rational and motivation to combine the references Matsubara and Lin as applied in rejection of claim 1 apply to this claim. Regarding Claim 7, The combination of Matsubara and Lin further discloses wherein the influence on at least one of the images is variability in positions of the image features distributed in each of the images, and the selection unit selects the correspondence relationships so as to avoid an evaluation value used for evaluating the variability in the positions of the image features from becoming smaller than an influence threshold when the correspondence relationship with the effectiveness lower than the effectiveness threshold is eliminated. (Lin, [0038-0039],[0097], [0131], discloses processor 120 is further electrically connected to the camera 160. Optionally, the processor 120 is connected to the camera 160 by using a bus. The camera 160 is a sensing device having an image collection capability. The camera 160 may also be referred to as another name such as a camera or a sensing device. The camera 160 has a capability of continuously collecting images or collecting images for multiple times. Optionally, the camera 160 is disposed inside the device or outside the device. In this embodiment of this application, the camera 160 may continuously collect multi-frame images, an i.sup.th frame of image in the multi-frame images is a first image, and an (i+1).sup.th image in the multi-frame images is a second image; Fig. 2 is a schematic scenario diagram of an AR application scenario according to an exemplary embodiment of this application. 
There exists a desktop 220 in the real world, a picture 222 is on the desktop 220, and picture content of the picture 222 may be regarded as a Marker image. The Marker image is a reference image used for matching. A mobile terminal 240 having the camera performs continuous shooting by using the desktop 220 as a shot image, to obtain frames of images, such as images 1 to 6 shown in FIG. 2. The continuously shot frames of images are successively input to the processor for processing. In this embodiment of this application, the first image is used to refer to an i.sup.th frame of image collected by the camera, and the second image is used to refer to an (i+1).sup.th frame of image collected by the camera. The mobile terminal measures a homography between the Marker image and the second image by using a detector, and measures a homography between the first image and the second image by using a tracker; and then, performs complementary filtering processing on the two homographies, to obtain camera pose information of the mobile terminal through calculation, the camera pose information being used to represent a spatial position of the mobile terminal when the mobile terminal shoots the second image in the real world; obtaining q optical flow feature points as the second optical flow feature points if the quantity of the second optical flow feature points is less than a preset threshold, so that the quantity of the second optical flow feature points reaches the preset threshold, q being a positive integer; target grid is a part of grids of the plurality of grids of the template image. That is, the first feature point in the target grid has a matched target feature point in the second image, and each target grid only corresponds to a set of matched feature point pairs. Because when homography calculation is performed on two images, only at least four pairs of feature point pairs may be needed to calculate the homography, the quantity of feature point pairs is less required but the feature point pair requires higher quality. Feature point pairs in the same grid have a relatively high similarity degree, and the terminal may select feature point pairs belonging to different target grids as possible as it can for subsequent calculation; image features in two frames are tracked and their motion is determined and compared with threshold to determine its effectiveness for processing including selecting with use of their features degree of similarity when tracked in image frames based on reliability of feature point pairs (quality of features)). Additionally, the rational and motivation to combine the references Matsubara and Lin as applied in rejection of claim 1 apply to this claim. Regarding Claim 8, The combination of Matsubara and Lin further discloses wherein the influence on at least one of the images is a total number of the correspondence relationships in the images, and the selection unit selects the correspondence relationships so as to avoid the total number of the correspondence relationships in the images from becoming smaller than an influence threshold when the correspondence relationship with the effectiveness lower than the effectiveness threshold is eliminated. (Lin, [0038-0039],[0097], [0131], discloses processor 120 is further electrically connected to the camera 160. Optionally, the processor 120 is connected to the camera 160 by using a bus. The camera 160 is a sensing device having an image collection capability. 
The camera 160 may also be referred to as another name such as a camera or a sensing device. The camera 160 has a capability of continuously collecting images or collecting images for multiple times. Optionally, the camera 160 is disposed inside the device or outside the device. In this embodiment of this application, the camera 160 may continuously collect multi-frame images, an i.sup.th frame of image in the multi-frame images is a first image, and an (i+1).sup.th image in the multi-frame images is a second image; Fig. 2 is a schematic scenario diagram of an AR application scenario according to an exemplary embodiment of this application. There exists a desktop 220 in the real world, a picture 222 is on the desktop 220, and picture content of the picture 222 may be regarded as a Marker image. The Marker image is a reference image used for matching. A mobile terminal 240 having the camera performs continuous shooting by using the desktop 220 as a shot image, to obtain frames of images, such as images 1 to 6 shown in FIG. 2. The continuously shot frames of images are successively input to the processor for processing. In this embodiment of this application, the first image is used to refer to an i.sup.th frame of image collected by the camera, and the second image is used to refer to an (i+1).sup.th frame of image collected by the camera. The mobile terminal measures a homography between the Marker image and the second image by using a detector, and measures a homography between the first image and the second image by using a tracker; and then, performs complementary filtering processing on the two homographies, to obtain camera pose information of the mobile terminal through calculation, the camera pose information being used to represent a spatial position of the mobile terminal when the mobile terminal shoots the second image in the real world; obtaining q optical flow feature points as the second optical flow feature points if the quantity of the second optical flow feature points is less than a preset threshold, so that the quantity of the second optical flow feature points reaches the preset threshold, q being a positive integer; target grid is a part of grids of the plurality of grids of the template image. That is, the first feature point in the target grid has a matched target feature point in the second image, and each target grid only corresponds to a set of matched feature point pairs. Because when homography calculation is performed on two images, only at least four pairs of feature point pairs may be needed to calculate the homography, the quantity of feature point pairs is less required but the feature point pair requires higher quality. Feature point pairs in the same grid have a relatively high similarity degree, and the terminal may select feature point pairs belonging to different target grids as possible as it can for subsequent calculation; image features in two frames are tracked and their motion is determined and compared with threshold to determine its effectiveness for processing including selecting with use of their features degree of similarity when tracked in image frames based on reliability of feature point pairs (quality of features)). Additionally, the rational and motivation to combine the references Matsubara and Lin as applied in rejection of claim 1 apply to this claim. 
Regarding Claim 9, The combination of Matsubara and Lin further discloses wherein the selection unit changes the effectiveness threshold until the total number of the correspondence relationships approaches a predetermined total number, and selects the correspondence relationships so as to avoid the total number of the correspondence relationships in the images from becoming smaller than the influence threshold when the correspondence relationship with the effectiveness lower than the effectiveness threshold is eliminated. (Lin, [0038-0039],[0097], [0131], discloses processor 120 is further electrically connected to the camera 160. Optionally, the processor 120 is connected to the camera 160 by using a bus. The camera 160 is a sensing device having an image collection capability. The camera 160 may also be referred to as another name such as a camera or a sensing device. The camera 160 has a capability of continuously collecting images or collecting images for multiple times. Optionally, the camera 160 is disposed inside the device or outside the device. In this embodiment of this application, the camera 160 may continuously collect multi-frame images, an i.sup.th frame of image in the multi-frame images is a first image, and an (i+1).sup.th image in the multi-frame images is a second image; Fig. 2 is a schematic scenario diagram of an AR application scenario according to an exemplary embodiment of this application. There exists a desktop 220 in the real world, a picture 222 is on the desktop 220, and picture content of the picture 222 may be regarded as a Marker image. The Marker image is a reference image used for matching. A mobile terminal 240 having the camera performs continuous shooting by using the desktop 220 as a shot image, to obtain frames of images, such as images 1 to 6 shown in FIG. 2. The continuously shot frames of images are successively input to the processor for processing. In this embodiment of this application, the first image is used to refer to an i.sup.th frame of image collected by the camera, and the second image is used to refer to an (i+1).sup.th frame of image collected by the camera. The mobile terminal measures a homography between the Marker image and the second image by using a detector, and measures a homography between the first image and the second image by using a tracker; and then, performs complementary filtering processing on the two homographies, to obtain camera pose information of the mobile terminal through calculation, the camera pose information being used to represent a spatial position of the mobile terminal when the mobile terminal shoots the second image in the real world; obtaining q optical flow feature points as the second optical flow feature points if the quantity of the second optical flow feature points is less than a preset threshold, so that the quantity of the second optical flow feature points reaches the preset threshold, q being a positive integer; target grid is a part of grids of the plurality of grids of the template image. That is, the first feature point in the target grid has a matched target feature point in the second image, and each target grid only corresponds to a set of matched feature point pairs. Because when homography calculation is performed on two images, only at least four pairs of feature point pairs may be needed to calculate the homography, the quantity of feature point pairs is less required but the feature point pair requires higher quality. 
Feature point pairs in the same grid have a relatively high similarity degree, and the terminal may select feature point pairs belonging to different target grids as possible as it can for subsequent calculation; image features in two frames are tracked and their motion is determined and compared with threshold to determine its effectiveness for processing including selecting with use of their features degree of similarity when tracked in image frames based on reliability of feature point pairs (quality of features)). Additionally, the rational and motivation to combine the references Matsubara and Lin as applied in rejection of claim 1 apply to this claim. Regarding Claim 10, The combination of Matsubara and Lin further discloses wherein the influence on at least one of the images is a total number of the image features contained in each of the images, and the selection unit selects the correspondence relationships so as to avoid the total number of the image features from becoming smaller than an influence threshold when the correspondence relationship with the effectiveness lower than the effectiveness threshold is eliminated. (Lin, [0038-0039],[0097], [0131], discloses processor 120 is further electrically connected to the camera 160. Optionally, the processor 120 is connected to the camera 160 by using a bus. The camera 160 is a sensing device having an image collection capability. The camera 160 may also be referred to as another name such as a camera or a sensing device. The camera 160 has a capability of continuously collecting images or collecting images for multiple times. Optionally, the camera 160 is disposed inside the device or outside the device. In this embodiment of this application, the camera 160 may continuously collect multi-frame images, an i.sup.th frame of image in the multi-frame images is a first image, and an (i+1).sup.th image in the multi-frame images is a second image; Fig. 2 is a schematic scenario diagram of an AR application scenario according to an exemplary embodiment of this application. There exists a desktop 220 in the real world, a picture 222 is on the desktop 220, and picture content of the picture 222 may be regarded as a Marker image. The Marker image is a reference image used for matching. A mobile terminal 240 having the camera performs continuous shooting by using the desktop 220 as a shot image, to obtain frames of images, such as images 1 to 6 shown in FIG. 2. The continuously shot frames of images are successively input to the processor for processing. In this embodiment of this application, the first image is used to refer to an i.sup.th frame of image collected by the camera, and the second image is used to refer to an (i+1).sup.th frame of image collected by the camera. 
The mobile terminal measures a homography between the Marker image and the second image by using a detector, and measures a homography between the first image and the second image by using a tracker; and then, performs complementary filtering processing on the two homographies, to obtain camera pose information of the mobile terminal through calculation, the camera pose information being used to represent a spatial position of the mobile terminal when the mobile terminal shoots the second image in the real world; obtaining q optical flow feature points as the second optical flow feature points if the quantity of the second optical flow feature points is less than a preset threshold, so that the quantity of the second optical flow feature points reaches the preset threshold, q being a positive integer; target grid is a part of grids of the plurality of grids of the template image. That is, the first feature point in the target grid has a matched target feature point in the second image, and each target grid only corresponds to a set of matched feature point pairs. Because when homography calculation is performed on two images, only at least four pairs of feature point pairs may be needed to calculate the homography, the quantity of feature point pairs is less required but the feature point pair requires higher quality. Feature point pairs in the same grid have a relatively high similarity degree, and the terminal may select feature point pairs belonging to different target grids as possible as it can for subsequent calculation; image features in two frames are tracked and their motion is determined and compared with threshold to determine its effectiveness for processing including selecting with use of their features degree of similarity when tracked in image frames based on reliability of feature point pairs (quality of features)). Additionally, the rational and motivation to combine the references Matsubara and Lin as applied in rejection of claim 1 apply to this claim. Regarding Claim 11, The combination of Matsubara and Lin further discloses wherein the selection unit changes the effectiveness threshold until the total number of the image features contained in each of the images approaches a predetermined total number, and selects the correspondence relationships so as to avoid the total number of the image features contained in each of the images from becoming smaller than the influence threshold when the correspondence relationship with the effectiveness lower than the effectiveness threshold is eliminated. (Lin, [0038-0039],[0097], [0131], discloses processor 120 is further electrically connected to the camera 160. Optionally, the processor 120 is connected to the camera 160 by using a bus. The camera 160 is a sensing device having an image collection capability. The camera 160 may also be referred to as another name such as a camera or a sensing device. The camera 160 has a capability of continuously collecting images or collecting images for multiple times. Optionally, the camera 160 is disposed inside the device or outside the device. In this embodiment of this application, the camera 160 may continuously collect multi-frame images, an i.sup.th frame of image in the multi-frame images is a first image, and an (i+1).sup.th image in the multi-frame images is a second image; Fig. 2 is a schematic scenario diagram of an AR application scenario according to an exemplary embodiment of this application. 
There exists a desktop 220 in the real world, a picture 222 is on the desktop 220, and picture content of the picture 222 may be regarded as a Marker image. The Marker image is a reference image used for matching. A mobile terminal 240 having the camera performs continuous shooting by using the desktop 220 as a shot image, to obtain frames of images, such as images 1 to 6 shown in FIG. 2. The continuously shot frames of images are successively input to the processor for processing. In this embodiment of this application, the first image is used to refer to an i.sup.th frame of image collected by the camera, and the second image is used to refer to an (i+1).sup.th frame of image collected by the camera. The mobile terminal measures a homography between the Marker image and the second image by using a detector, and measures a homography between the first image and the second image by using a tracker; and then, performs complementary filtering processing on the two homographies, to obtain camera pose information of the mobile terminal through calculation, the camera pose information being used to represent a spatial position of the mobile terminal when the mobile terminal shoots the second image in the real world; obtaining q optical flow feature points as the second optical flow feature points if the quantity of the second optical flow feature points is less than a preset threshold, so that the quantity of the second optical flow feature points reaches the preset threshold, q being a positive integer; target grid is a part of grids of the plurality of grids of the template image. That is, the first feature point in the target grid has a matched target feature point in the second image, and each target grid only corresponds to a set of matched feature point pairs. Because when homography calculation is performed on two images, only at least four pairs of feature point pairs may be needed to calculate the homography, the quantity of feature point pairs is less required but the feature point pair requires higher quality. Feature point pairs in the same grid have a relatively high similarity degree, and the terminal may select feature point pairs belonging to different target grids as possible as it can for subsequent calculation; image features in two frames are tracked and their motion is determined and compared with threshold to determine its effectiveness for processing including selecting with use of their features degree of similarity when tracked in image frames based on reliability of feature point pairs (quality of features)). Additionally, the rational and motivation to combine the references Matsubara and Lin as applied in rejection of claim 1 apply to this claim. Claims 12 and 13 recite computer program and method with instructions and steps corresponding to the apparatus elements recited in Claim 1. Therefore, the recited instructions of the computer program and steps of the method Claims 12 and 13 are mapped to the proposed combination in the same manner as the corresponding elements of Claim 1. Additionally, the rationale and motivation to combine the Matsubara and Lin references presented in rejection of Claim 1, apply to these claims. 
Furthermore, the combination of Matsubara and Lin further discloses A computer program product having a non-transitory computer readable medium including programmed instructions stored thereon, wherein the instructions, when executed by a computer, cause the computer to function (Lin, [0226], discloses memory 420 may be configured to store a software program and module. The processor 480 runs the software program and module stored in the memory 420, to implement various functional applications and data processing of the mobile phone. The memory 420 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, an application program required by at least one function (such as a sound playback function and an image display function), and the like. The data storage area may store data (such as audio data and an address book) created according to use of the mobile phone, and the like. In addition, the memory 420 may include a high speed random access memory, and may further include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory, or other volatile solid-state storage devices).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to PINALBEN V PATEL whose telephone number is (571)270-5872. The examiner can normally be reached M-F: 10am - 8pm.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Wills-Burns Chineyere can be reached at 571-272-9752. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Pinalben Patel/
Examiner, Art Unit 2673
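To make the mechanisms the examiner relies on above easier to picture, the following is a minimal sketch of the standard two-view pipeline the cited references describe: feature matching between consecutive frames, RANSAC fundamental-matrix fitting with epipolar-distance inlier selection (the procedure the Office action attributes to Matsubara), and homography fitting with decomposition into rotation and translation (the procedure it attributes to Lin). It uses OpenCV's standard API; the file names, camera intrinsics, and thresholds are assumptions for illustration, and nothing here is code from the application, the references, or the Office action.

```python
# Illustrative sketch only: generic two-view geometry with OpenCV, approximating the steps
# the Office action attributes to Matsubara (epipolar-inlier selection via a RANSAC
# fundamental matrix) and Lin (homography decomposition into rotation/translation).
import cv2
import numpy as np

def match_features(img_n, img_n1):
    """Detect ORB features in two consecutive frames and return matched point pairs."""
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(img_n, None)
    kp2, des2 = orb.detectAndCompute(img_n1, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    return pts1, pts2

def select_epipolar_inliers(pts1, pts2, thresh_px=1.0):
    """RANSAC fundamental-matrix fit; correspondences farther than thresh_px from their
    epipolar line are discarded (the inlier-counting step described for Matsubara)."""
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, thresh_px, 0.99)
    if F is None:
        raise RuntimeError("not enough correspondences for a fundamental matrix")
    inliers = mask.ravel().astype(bool)
    return F, pts1[inliers], pts2[inliers]

def pose_from_homography(pts1, pts2, K):
    """Fit a homography to the surviving correspondences and decompose it into candidate
    rotation/translation pairs (the R|T recovery described for Lin)."""
    H, _ = cv2.findHomography(pts1, pts2, cv2.RANSAC, 3.0)
    num_solutions, Rs, Ts, normals = cv2.decomposeHomographyMat(H, K)
    return H, Rs, Ts  # a real system would disambiguate the candidates (cheirality, visibility)

if __name__ == "__main__":
    frame_n = cv2.imread("frame_n.png", cv2.IMREAD_GRAYSCALE)       # assumed input files
    frame_n1 = cv2.imread("frame_n_plus_1.png", cv2.IMREAD_GRAYSCALE)
    K = np.array([[800.0, 0, 320.0], [0, 800.0, 240.0], [0, 0, 1]])  # assumed intrinsics
    p1, p2 = match_features(frame_n, frame_n1)
    F, in1, in2 = select_epipolar_inliers(p1, p2)
    H, Rs, Ts = pose_from_homography(in1, in2, K)
    print(f"{len(in1)} inlier correspondences; {len(Rs)} candidate poses")
```

Note that this generic pipeline discards correspondences on epipolar error alone; claim 1 as quoted in the rejection additionally recites weighing the influence on at least one of the images when a correspondence with effectiveness below the threshold is eliminated.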

Prosecution Timeline

Feb 27, 2024
Application Filed
Feb 05, 2026
Non-Final Rejection — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602824
SUBSTRATE TREATING APPARATUS AND SUBSTRATE TREATING METHOD
2y 5m to grant • Granted Apr 14, 2026
Patent 12596437
Monitoring System and Method Having Gesture Detection
2y 5m to grant • Granted Apr 07, 2026
Patent 12597235
INFORMATION PROCESSING APPARATUS, LEARNING METHOD, RECOGNITION METHOD, AND NON-TRANSITORY COMPUTER READABLE MEDIUM
2y 5m to grant • Granted Apr 07, 2026
Patent 12586215
VEHICLE POSE
2y 5m to grant • Granted Mar 24, 2026
Patent 12586217
VISION SENSOR, OPERATING METHOD OF VISION SENSOR, AND IMAGE PROCESSING DEVICE INCLUDING THE VISION SENSOR
2y 5m to grant • Granted Mar 24, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 89%
With Interview: 99% (+9.9%)
Median Time to Grant: 2y 6m
PTA Risk: Low
Based on 545 resolved cases by this examiner. Grant probability derived from career allow rate.
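As a quick check, the headline probabilities can be reproduced from the examiner counts shown on this page. This is a minimal sketch assuming the interview lift is applied as an additive percentage-point adjustment; the page does not state the exact formula.

```python
# Reproducing the dashboard's headline figures from the counts shown on this page.
# Assumption (not stated on the page): the +9.9% interview lift is additive in percentage points.
granted, resolved = 484, 545
allow_rate = granted / resolved               # 0.888... -> displayed as 89%
interview_lift = 0.099                        # +9.9% interview lift
with_interview = allow_rate + interview_lift  # 0.987... -> displayed as 99%
print(f"career allow rate: {allow_rate:.1%}")      # 88.8%
print(f"with interview:    {with_interview:.1%}")  # 98.7%
```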
