DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. The Amendment filed 6 October 2025 (hereinafter “the Amendment”) has been entered and considered. Claims 21, 30, 32-33, and 37-38 have been amended. Claims 26, 31, and 36 have been canceled. Claims 21-25, 27-30, 32-35, and 37-40, all the claims pending in the application, are rejected. All new grounds of rejection set forth in the present action were necessitated by Applicant’s claim amendments; accordingly, this action is made final.
Response to Amendment
Claim Rejections - 35 USC § 112
In view of the amendments to independent claims 21, 32, and 37, the rejections under 35 USC § 112 are withdrawn.
Double Patenting
Applicant indicates that the filing of a terminal disclaimer to obviate the double patenting rejections will be considered when the pending claims are otherwise in condition for allowance. The double patenting rejections are maintained.
Prior Art Rejections
On pages 9-11, Applicant contends that the applied art does not teach or suggest the newly added features of independent claims 21, 32, and 37: “selecting, by the computing system, a respective machine-learned model from a plurality of machine-learned models, wherein each machine-learned model is trained using imagery from a particular geographic region and is associated with the particular geographic region and the respective machine-learned model is selected when the computing system comprising the camera is within the particular geographic region”. In support of this assertion, Applicant argues that Tomioka describes matching the location of images to the location of images used in training models and does not describe machine-learned models associated with particular regions at all, much less selecting a particular machine-learned model when the computing system is within that region. The Examiner respectfully disagrees and maintains that Tomioka does indeed teach the newly added limitations of the claims.
Initially, the Examiner notes that Tomioka discloses that the “location of images” to which Applicant refers is the location of the apparatus itself. In particular, Tomioka’s information processing apparatus 2, which includes image capturing apparatus 11, uses “sensor information such as a GPS signal” to calculate “position information” that is compared with the locations of the images used to train the various machine learning models ([0092] and Fig. 10). For example, the information processing apparatus may be “used in an automobile” in conjunction with a “car-navigation system” ([0126]). Indeed, the stated object of Tomioka’s invention is “to calculate the position and orientation of the image capturing apparatus accurately” ([0100]).
Furthermore, each of Tomioka’s candidate models is necessarily characterized by the range of locations (i.e., a geographic region) within which the training images used to train that model are located.
For example, Tomioka discloses that “the learning model group holding unit 130 holds at least two learning models and position information lists that each include information relating to the position at which a training image used in training of a learning model was shot” ([0092]). Tomioka’s system “compares the position information acquired in step S1410 with the position information list loaded in step S1420, and searches for matching position information” ([0098]). For example, using “latitude and longitude obtained from the GPS” as position information, “it is determined whether or not the distance between the position information included in the position information list acquired in step S1410 and the position information held by the learning model group holding unit is within a predetermined threshold”, and if so, “a count is given to this learning model” ([0099]). The apparatus may then “select the learning model that has the largest number of counts” ([0065]).
Here, Tomioka describes a system that compares the position information of an image captured by an apparatus with the locations of the training images used to train different machine learning models. If the position information (i.e., the location of the vehicle) is within the threshold distance of enough of the training-image locations, as reflected in the count incremented for each learning model through comparison against the positions in that model’s position information list, that learning model is selected.
Importantly, the locations of training images necessarily define a geographic region for the model trained using those training images. Such a geographic region may be thought of as including all the locations of training images plus the “predetermined threshold” distance. The apparatus/vehicle MUST be within the geographic region characterizing the selected model; otherwise, the “count” for the model would never increment, and the model would not be selected.
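To make this reading concrete, the following minimal sketch (with hypothetical names and a nominal threshold; an illustration of the count-based selection described in [0092], [0098-0099], and [0065], not Tomioka’s actual implementation) shows that a model can accumulate counts, and thus be selected, only when the apparatus lies within the region defined by that model’s training-image locations expanded by the threshold distance:

```python
import math

EARTH_RADIUS_M = 6_371_000.0

def distance_m(p, q):
    """Approximate ground distance in meters between (lat, lon) points p and q
    (equirectangular approximation, adequate over short distances)."""
    mean_lat = math.radians((p[0] + q[0]) / 2.0)
    dx = math.radians(q[1] - p[1]) * math.cos(mean_lat) * EARTH_RADIUS_M
    dy = math.radians(q[0] - p[0]) * EARTH_RADIUS_M
    return math.hypot(dx, dy)

def select_model(apparatus_pos, models, threshold_m=100.0):
    """Count, for each learning model, the training-image positions lying
    within threshold_m of the apparatus position, and select the model with
    the largest count (cf. Tomioka [0092], [0099], [0065]). `models` maps a
    model identifier to its "position information list" of (lat, lon) points."""
    counts = {
        model_id: sum(1 for train_pos in positions
                      if distance_m(apparatus_pos, train_pos) <= threshold_m)
        for model_id, positions in models.items()
    }
    best = max(counts, key=counts.get)
    # A count increments only when the apparatus is within the threshold
    # distance of a training-image location; a model with zero counts
    # cannot be selected.
    return best if counts[best] > 0 else None
```

As the sketch makes explicit, a nonzero count, and thus selection, presupposes that the apparatus is located within the geographic region characterized by the selected model’s training imagery.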
Accordingly, the Examiner maintains that Tomioka does indeed disclose machine-learned models associated with particular regions, as well as selecting a particular machine-learned model when the computing system is within that region, contrary to Applicant’s assertions.
Moreover, Tomioka describes a “modification” to the above-described system in which “the position information may be…a region name” such that a model is selected if the “region name” at which the input image is captured matches the “region name” at which the training image was shot ([0101-0103]). Here, Tomioka expressly discloses machine learning models associated with particular regions (defined by name) and selecting a particular machine learning model when the input image (i.e., vehicle) location is within a matching region.
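Under this modification, the selection reduces to a name match; a minimal sketch (hypothetical names; merely an illustration of [0101-0103], not Tomioka’s actual implementation) follows:

```python
def select_model_by_region(input_region_name, models_by_region):
    """Select the learning model whose training images were shot in the named
    region matching the region in which the input image was captured
    (cf. Tomioka [0101-0103]). `models_by_region` maps region name -> model."""
    return models_by_region.get(input_region_name)
```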
For all the foregoing reasons, the prior art rejections are maintained.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 21-25, 27-30, 32-33, and 37 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-5, 7-10, and 12-14 of U.S. Patent No. 11,994,405 (hereinafter “the ‘405 patent”) in view of U.S. Patent Application Publication No. 2019/0130216 to Tomioka et al. (made of record in parent U.S. Patent Application No. 16/978,374 and cited in IDS filed 4/2/24; hereinafter “Tomioka”).
As to independent claim 21, claim 1 of the ‘405 patent discloses a computer-implemented method comprising: receiving, by a computing system comprising a camera, data generated by the computing system, wherein the data comprises a geographic location of the computing system and imagery that includes at least a portion of a physical real-world environment comprising the camera and a travelway (“A computer-implemented method comprising: receiving, by a computing system, data generated by a camera and representing imagery that includes at least a portion of a physical real-world environment comprising the camera and a travelway…determining a geographic location of the camera”); selecting, by the computing system, a respective machine-learned model from a plurality of machine-learned models, wherein each machine-learned model is trained using imagery from a particular geographic region and is associated with the particular geographic region and the respective machine-learned model is selected based on the particular geographic region (“selecting, based at least in part on the geographic location of the camera, a machine-learning model from amongst a plurality of different machine-learning models for determining geographic orientations of cameras with respect to travelways, the plurality of different machine-learning models being based at least in part on training data comprising imagery from different corresponding geographic regions, the machine-learning model being based at least in part on training data comprising imagery from a geographic region comprising the geographic location of the camera”); determining, by the computing system, a geographic orientation of the camera with respect to the travelway (“determining, by the computing system, a geographic orientation of the camera with respect to the travelway”).
The claims of the ‘405 patent do not expressly disclose providing, by the computing system, the imagery to the respective machine-learned model as input or that the geographic orientation of the camera is determined based on an output of the machine-learned model or that the respective machine-learned model is selected when the computing system comprising the camera is within the particular geographic region.
Tomioka, like the claims of the ‘405 patent, is directed to an apparatus for “measuring the position and orientation of an image capturing apparatus” based on images captured thereby, wherein the apparatus further uses GPS of the camera at the time the images are captured (Abstract, [0022, 0090-0103]). Tomioka discloses that the model that measures the position and orientation of the camera is a “Convolutional Neural Network (CNN)” which is known to be a “learning model” ([0023]). Tomioka discloses acquiring an input image from the camera, selecting a learning model from among a plurality of learning models stored in unit 130 based on evaluation values of the plurality of learning models, and obtaining the position and orientation of the camera based on an output of the selected learning model in response to the input image ([0038-0047]). For example, Tomioka discloses that “the learning model group holding unit 130 holds at least two learning models and position information lists that each include information relating to the position at which a training image used in training of a learning model was shot” ([0092]). Tomioka’s system “compares the position information acquired in step S1410 with the position information list loaded in step S1420, and searches for matching position information” ([0098]). For example, using “latitude and longitude obtained from the GPS” as position information, “it is determined whether or not the distance between the position information included in the position information list acquired in step S1410 and the position information held by the learning model group holding unit is within a predetermined threshold”, and if so, “a count is given to this learning model” ([0099]). The apparatus may then “select the learning model that has the largest number of counts” ([0065]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the claims of the ‘405 patent to determine the camera position and orientation based on the output of the selected machine learning model responsive to the input image and to select the machine learning model trained on training images within a geographic region in which the camera is located, as taught by Tomioka, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. It is predictable that the proposed modification would have put the selected model in claim 1 of the ‘405 patent to use, thereby resulting in a working device. Indeed, there would be no reason to select a model, as claim 1 of the ‘405 patent requires, except to apply the selected model to the input image and produce an output, as taught by Tomioka. It is further predictable that selecting a model trained on images located within the geographic region in which the camera is located would have made “it possible to calculate the position and orientation of the image capturing apparatus accurately” ([0100] of Tomioka).
Claims 22-25 and 27-30 are dependent on claim 21 and recite features identical to those recited in claims 2-5 and 7-10, respectively, of the ‘405 patent. Since claims 2-5 and 7-10 of the ‘405 patent are dependent on claim 1 of the ‘405 patent, claims 22-25 and 27-30 are rejected based on claims 2-5 and 7-10 of the ‘405 patent as modified by Tomioka above.
As to independent claim 32, claim 12 of the ‘405 patent discloses a computing system comprising: a camera; one or more processors; and a memory storing instructions that when executed by the one or more processors cause the system to perform operations comprising: receiving data generated by the computing system, wherein the data comprises a geographic location of the computing system and imagery that includes at least a portion of a physical real-world environment comprising the camera and a travelway (“A system comprising: one or more processors; and a memory storing instructions that when executed by the one or more processors cause the system to perform operations comprising: receiving data generated by a camera and representing imagery that includes at least a portion of a physical real-world environment comprising the camera and a travelway…determining a geographic location of the camera”); selecting a respective machine-learned model from a plurality of machine-learned models, wherein each machine-learned model is trained using imagery from a particular geographic region and is associated with the particular geographic region and the respective machine-learned model is selected based on the particular geographic region (“selecting, based at least in part on the geographic location of the camera, a machine-learning model from amongst a plurality of different machine-learning models for determining geographic orientations of cameras with respect to travelways, the plurality of different machine-learning models being based at least in part on training data comprising imagery from different corresponding geographic regions, the machine-learning model being based at least in part on training data comprising imagery from a geographic region comprising the geographic location of the camera”); determining a geographic orientation of the camera with respect to the travelway (“determining a geographic orientation of the camera with respect to the travelway”).
The claims of the ‘405 patent do not expressly disclose providing the imagery to the respective machine-learned model as input or that the geographic orientation of the camera is determined based on an output of the machine-learned model or that the respective machine-learned model is selected when the computing system comprising the camera is within the particular geographic region.
Tomioka, like the claims of the ‘405 patent, is directed to an apparatus for “measuring the position and orientation of an image capturing apparatus” based on images captured thereby, wherein the apparatus further uses GPS of the camera at the time the images are captured (Abstract, [0022, 0090-0103]). Tomioka discloses that the model that measures the position and orientation of the camera is a “Convolutional Neural Network (CNN)” which is known to be a “learning model” ([0023]). Tomioka discloses acquiring an input image from the camera, selecting a learning model from among a plurality of learning models stored in unit 130 based on evaluation values of the plurality of learning models, and obtaining the position and orientation of the camera based on an output of the selected learning model in response to the input image ([0038-0047]). For example, Tomioka discloses that “the learning model group holding unit 130 holds at least two learning models and position information lists that each include information relating to the position at which a training image used in training of a learning model was shot” ([0092]). Tomioka’s system “compares the position information acquired in step S1410 with the position information list loaded in step S1420, and searches for matching position information” ([0098]). For example, using “latitude and longitude obtained from the GPS” as position information, “it is determined whether or not the distance between the position information included in the position information list acquired in step S1410 and the position information held by the learning model group holding unit is within a predetermined threshold”, and if so, “a count is given to this learning model” ([0099]). The apparatus may then “select the learning model that has the largest number of counts” ([0065]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the claims of the ‘405 patent to determine the camera position and orientation based on the output of the selected machine learning model responsive to the input image and to select the machine learning model trained on training images within a geographic region in which the camera is located, as taught by Tomioka, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. It is predictable that the proposed modification would have put the selected model in claim 12 of the ‘405 patent to use, thereby resulting in a working device. Indeed, there would be no reason to select a model, as claim 12 of the ‘405 patent requires, except to apply the selected model to the input image and produce an output, as taught by Tomioka. It is further predictable that selecting a model trained on images located within the geographic region in which the camera is located would have made “it possible to calculate the position and orientation of the image capturing apparatus accurately” ([0100] of Tomioka).
Claim 33 is dependent on claim 32 and recites features identical to those recited in claim 13 of the ‘405 patent. Since claim 13 of the ‘405 patent is dependent on claim 12 of the ‘405 patent, claim 33 is rejected based on claim 13 of the ‘405 patent as modified by Tomioka above.
As to independent claim 37, claim 14 of the ‘405 patent discloses one or more non-transitory computer-readable media comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: receiving data generated by a user computing device comprising a camera, wherein the data comprises a geographic location of the user computing device and imagery that includes at least a portion of a physical real-world environment comprising the camera and a travelway (“One or more non-transitory computer-readable media comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: receiving data generated by a camera and representing imagery that includes at least a portion of a physical real-world environment comprising the camera and a travelway…determining a geographic location of the camera”); selecting a respective machine-learned model from a plurality of machine-learned models, wherein each machine-learned model is trained using imagery from a particular geographic region and is associated with the particular geographic region and the respective machine-learned model is selected based on the particular geographic region (“selecting, based at least in part on the geographic location of the camera, a machine-learning model from amongst a plurality of different machine-learning models for determining geographic orientations of cameras with respect to travelways, the plurality of different machine-learning models being based at least in part on training data comprising imagery from different corresponding geographic regions, the machine-learning model being based at least in part on training data comprising imagery from a geographic region comprising the geographic location of the camera”); determining a geographic orientation of the camera with respect to the travelway (“determining a geographic orientation of the camera with respect to the travelway”).
The claims of the ‘405 patent do not expressly disclose providing the imagery to the respective machine-learned model as input or that the geographic orientation of the camera is determined based on an output of the machine-learned model or that the respective machine-learned model is selected when the computing system comprising the camera is within the particular geographic region.
Tomioka, like the claims of the ‘405 patent, is directed to an apparatus for “measuring the position and orientation of an image capturing apparatus” based on images captured thereby, wherein the apparatus further uses GPS of the camera at the time the images are captured (Abstract, [0022, 0090-0103]). Tomioka discloses that the model that measures the position and orientation of the camera is a “Convolutional Neural Network (CNN)” which is known to be a “learning model” ([0023]). Tomioka discloses acquiring an input image from the camera, selecting a learning model from among a plurality of learning models stored in unit 130 based on evaluation values of the plurality of learning models, and obtaining the position and orientation of the camera based on an output of the selected learning model in response to the input image ([0038-0047]). For example, Tomioka discloses that “the learning model group holding unit 130 holds at least two learning models and position information lists that each include information relating to the position at which a training image used in training of a learning model was shot” ([0092]). Tomioka’s system “compares the position information acquired in step S1410 with the position information list loaded in step S1420, and searches for matching position information” ([0098]). For example, using “latitude and longitude obtained from the GPS” as position information, “it is determined whether or not the distance between the position information included in the position information list acquired in step S1410 and the position information held by the learning model group holding unit is within a predetermined threshold”, and if so, “a count is given to this learning model” ([0099]). The apparatus may then “select the learning model that has the largest number of counts” ([0065]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the claims of the ‘405 patent to determine the camera position and orientation based on the output of the selected machine learning model responsive to the input image and to select the machine learning model trained on training images within a geographic region in which the camera is located, as taught by Tomioka, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. It is predictable that the proposed modification would have put the selected model in claim 14 of the ‘405 patent to use, thereby resulting in a working device. Indeed, there would be no reason to select a model, as claim 14 of the ‘405 patent requires, except to apply the selected model to the input image and produce an output, as taught by Tomioka. It is further predictable that selecting a model trained on images located within the geographic region in which the camera is located would have made “it possible to calculate the position and orientation of the image capturing apparatus accurately” ([0100] of Tomioka).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 21-22, 28-30, 32-33, and 37-38 are rejected under 35 U.S.C. 103 as being unpatentable over “A Sensor Fusion Framework Using Multiple Particle Filters for Video-Based Navigation” by Bai et al. (made of record in parent U.S. Patent Application No. 16/978,374 and cited in IDS filed 4/2/24; hereinafter “Bai”) in view of U.S. Patent Application Publication No. 2019/0130216 to Tomioka et al. (made of record in parent U.S. Patent Application No. 16/978,374 and cited in IDS filed 4/2/24; hereinafter “Tomioka”).
As to independent claim 21, Bai discloses a computer-implemented method comprising: receiving, by a computing system comprising a camera, data generated by the computing system, wherein the data comprises a geographic location of the computing system and imagery that includes at least a portion of a physical real-world environment comprising the camera and a travelway (Section II and Fig. 2 disclose a system which receives vision data and GPS coordinates from a device comprising a camera and GPS sensor, the vision data comprising video frames including the road and surrounding features of the environment; see Figs. 1-2, 10, 11, and 13; Section VI discloses that the method is implemented on a “laptop”); providing, by the computing system, the imagery to a model as input; and determining, by the computing system, a geographic orientation of the camera with respect to the travelway based on an output of the model (Sections II and IV and Fig. 2 show that the “vision” data (camera images) are input to a model that outputs “position and camera pose”; in particular, the system comprises a plurality of “models” that work collectively and use GPS coordinates to perform map matching to identify the most probable road segments that the vehicle is on, then uses visual tracking for tracking multiple road arcs in the camera images to “estimate the lateral position and orientation of the vehicle (or the camera) with respect to the local [road coordinate system] RCS”).
Bai discloses that the road shape is modeled based on the images, and the model learns pertinent features from example images (Section IV(D)). Although this appears to be an implicit disclosure that the model is a machine-learning model, such a machine-learned model is not expressly disclosed. That is, Bai does not expressly disclose that the model that inputs the imagery and outputs information used to determine the geographic orientation of the camera with respect to the travelway is a respective machine-learned model.
Although Bai discloses that the GPS coordinates are used in a coarse, first step in the localization process (Section II: “GPS coordinates are indexed into the GIS to determine through map matching the most probable road segments that the vehicle is on”), Bai does not expressly disclose selecting, by the computing system, a respective machine-learned model from a plurality of machine-learned models, wherein each machine-learned model is trained using imagery from a particular geographic region and is associated with the particular geographic region and the respective machine-learned model is selected when the computing system comprising the camera is within the particular geographic region.
Tomioka, like Bai, is directed to an apparatus for “measuring the position and orientation of an image capturing apparatus” based on images captured thereby, wherein the apparatus further uses GPS of the camera at the time the images are captured, and wherein the scene captured by the camera may include a road, traffic lights, signs, and automobiles (Abstract, [0022, 0090-0103, 0126, 0212, 0216]). Tomioka discloses that the model that measures the position and orientation of the camera is a “Convolutional Neural Network (CNN)” which is known to be a “learning model” ([0023]).
Generally, Tomioka discloses acquiring an input image from the camera, selecting a learning model from among a plurality of learning models stored in unit 130 based on evaluation values of the plurality of learning models, and obtaining the position and orientation of the camera based on an output of the selected learning model ([0038-0047]). In a particular embodiment, Tomioka discloses that the evaluation values of the respective learning models are based on “positions at which the input image and the training image used in training of the learning model were shot”, wherein the positions are based on GPS and “the higher the degree of match between information on the position at which an input image was shot and information on the position at which a training image used in training of a learning model was shot is, the higher the evaluation value that is given to the learning model is” ([0090, 0100]).
Specifically, Tomioka discloses that “the learning model group holding unit 130 holds at least two learning models and position information lists that each include information relating to the position at which a training image used in training of a learning model was shot” ([0092]). Tomioka’s system “compares the position information acquired in step S1410 with the position information list loaded in step S1420, and searches for matching position information” ([0098]). For example, using “latitude and longitude obtained from the GPS” as position information, “it is determined whether or not the distance between the position information included in the position information list acquired in step S1410 and the position information held by the learning model group holding unit is within a predetermined threshold”, and if so, “a count is given to this learning model” ([0099]). The apparatus may then “select the learning model that has the largest number of counts” ([0065]).
That is, Tomioka teaches that the model that inputs the imagery and outputs information used to determine the geographic orientation of the camera is a respective machine-learned model ([0022-0023] disclose that the model that measures the position and orientation of the camera based on an input image captured by the camera is a CNN) and selecting, by the computing system, a respective machine-learned model from a plurality of machine-learned models, wherein each machine-learned model is trained using imagery from a particular geographic region and is associated with the particular geographic region and the respective machine-learned model is selected when the computing system comprising the camera is within the particular geographic region ([0090-0103] and Figs. 1 and 8 disclose that each of a plurality of learning models is stored in unit 130 along with the positions at which the training images used to train the respective learning model were shot, as determined by GPS, and that the learning model used to determine the position and orientation of the camera capturing an input image is selected as the learning model whose training images were shot at positions sufficiently near the location of the camera image (i.e., of the vehicle/apparatus), as reflected in the count incremented for each learning model through comparison against the positions in that model’s position information list. Importantly, the locations of training images necessarily define a geographic region for the model trained using those training images. Such a geographic region may be thought of as including all the locations of training images plus the “predetermined threshold” distance. The apparatus/vehicle MUST be within the geographic region characterizing the selected model; otherwise, the “count” for the model would never increment, and the model would not be selected.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Bai to determine the camera position and orientation using a machine learning model and to select that machine learning model, from among a plurality of machine learning models, as the one trained on images captured within a geographic region in which the input image was captured (i.e., the location of the vehicle/apparatus including the image capturing device), as taught by Tomioka, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. More specifically, modifying Bai’s model for determining camera position and orientation with Tomioka’s machine learning model, and the selection thereof, would predictably yield a determination of the camera position and orientation for an input image, since both references teach precisely that. Further, the use of machine learning would have resulted in a more accurate determination of the camera position and orientation than the classic image processing techniques of Bai, and selecting the model according to the proximity of the input image to the training images upon which the model was trained would have further increased the accuracy ([0100] of Tomioka). Thus, a person of ordinary skill would have appreciated including in Bai’s system the ability to select a machine learning model in the manner taught by Tomioka, since the claimed invention is merely a combination of old elements, in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.
As to claim 22, Bai as modified by Tomioka above further teaches determining, by the computing system, the geographic orientation of the travelway with respect to the physical real-world environment; and determining, by the computing system and based at least in part on the geographic orientation of the camera with respect to the travelway and the geographic orientation of the travelway with respect to the physical real-world environment, a geographic orientation of the camera with respect to the physical real-world environment (Section II and Fig. 2 of Bai disclose that the system uses GPS coordinates to perform map matching to identify the most probable road segments that the vehicle is on (geographic orientation of the road with respect to real-world coordinates), then uses visual tracking for tracking multiple road arcs in the camera images to estimate the lateral position and orientation of the camera with respect to the road coordinate system, and GIS map features are matched to the image features to improve GPS accuracy (camera orientation with respect to real-world coordinates)).
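For illustration only, the following minimal sketch (hypothetical names; it merely illustrates composing the two orientations recited in claim 22, not Bai’s particle-filter implementation) shows how a camera orientation in real-world coordinates follows from the road heading and the camera’s orientation relative to the road:

```python
def camera_heading_deg(road_heading_deg, camera_to_road_deg):
    """Compose the geographic orientation of the travelway (road heading,
    degrees clockwise from north) with the camera's orientation relative to
    the travelway to obtain the camera's geographic orientation."""
    return (road_heading_deg + camera_to_road_deg) % 360.0

# e.g., a road heading due east (90 degrees) and a camera rotated 15 degrees
# clockwise relative to the road yield a camera heading of 105 degrees.
```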
As to claim 28, Bai as modified by Tomioka further teaches communicating, by the computing system, data based at least in part on the geographic orientation to one or more of a geographic-mapping application or a geographic-navigation application (Section II and Fig. 2 of Bai disclose that the camera position and pose is fed back to the GIS system which is a geographic mapping application).
As to claim 29, Bai as modified by Tomioka further teaches communicating, by the computing system, data based at least in part on the geographic orientation to an augmented reality (AR) application (Section VII and Fig. 13 of Bai disclose that the camera pose/orientation is used in an augmented reality route-planning program in which virtual road signs are superimposed on the road).
As to claim 30, Bai as modified by Tomioka further teaches that determining the geographic orientation comprises determining, locally, by the computing system, the geographic orientation (Section VI of Bai discloses that the system is implemented by a camera locally connected to a laptop that performs the disclosed algorithm locally).
Independent claim 32 recites a computing system comprising: a camera; one or more processors; and a memory storing instructions that when executed by the one or more processors cause the system to perform operations (Section II of Bai discloses that the system includes a “camera”, and Section VI of Bai discloses that the algorithm is implemented using a “laptop” necessarily comprising a processor running software that must be stored in memory) comprising those recited in the method of independent claim 21. Accordingly, claim 32 is rejected for reasons analogous to those discussed above in conjunction with claim 21.
Claim 33 recites features nearly identical to those recited in claim 22. Accordingly, claim 33 is rejected for reasons analogous to those discussed above in conjunction with claim 22.
Independent claim 37 recites one or more non-transitory computer-readable media comprising instructions that when executed by one or more computers cause the one or more computers to perform operations (Section VI of Bai discloses that the algorithm is implemented using a “laptop” necessarily running software instructions which must be stored on some media) comprising those recited in the method of independent claim 21. Accordingly, claim 37 is rejected for reasons analogous to those discussed above in conjunction with claim 21.
Claim 38 recites features nearly identical to those recited in claim 22. Accordingly, claim 38 is rejected for reasons analogous to those discussed above in conjunction with claim 22.
Claims 23, 34, and 39 are rejected under 35 U.S.C. 103 as being unpatentable over Bai in view of Tomioka and further in view of U.S. Patent Application Publication No. 2019/0009777 to Cheng (made of record in parent U.S. Patent Application No. 16/978,374 and cited in IDS filed 4/2/24; hereinafter “Cheng”).
As to claim 23, Bai as modified by Tomioka above further teaches the computer-implemented method of claim 21 (see mapping of claim 21 above), wherein determining the geographic orientation of the camera with respect to the travelway comprises: determining, based at least in part on a machine-learning model, the geographic orientation of the camera with respect to the travelway (Sections II and IV and Fig. 2 of Bai show that the “vision” data (camera images) are input to a model that outputs “position and camera pose”; in particular, the system comprises a plurality of “models” that work collectively and use GPS coordinates to perform map matching to identify the most probable road segments that the vehicle is on, then uses visual tracking for tracking multiple road arcs in the camera images to “estimate the lateral position and orientation of the vehicle (or the camera) with respect to the local [road coordinate system] RCS”; [0090-0103] and Figs. 1 and 8 of Tomioka disclose a machine learning model used to determine the position and orientation of the camera; the reasons for combining the references are the same as those discussed above in conjunction with claim 21).
Bai as modified by Tomioka above does not expressly disclose determining two possible geographic orientations of the camera with respect to the travelway, the two possible geographic orientations differing by one hundred and eighty degrees; and selecting, from amongst the two possible geographic orientations, the geographic orientation of the camera with respect to the travelway.
Cheng, like Bai, is directed to judging a vehicle driving direction based on image data from a camera mounted on the vehicle (Abstract and [0036]). Cheng contemplates a scenario in which the vehicle driving direction algorithm arrives at two equally likely candidate directions, 0 degrees and the opposite direction of 180 degrees ([0056]). Cheng discloses distinguishing the front end of the vehicle from the rear end in order to select one of the candidate directions ([0057-0060]).
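A minimal sketch (hypothetical names; an illustration of the selection logic of [0056-0060], not Cheng’s actual implementation) of choosing between the two candidate directions:

```python
def disambiguate_direction(theta_deg, front_end_detected):
    """Choose between two candidate driving directions that differ by 180
    degrees (cf. Cheng [0056]): keep theta_deg when the vehicle's front end
    is detected where that candidate direction predicts it; otherwise flip."""
    if front_end_detected:
        return theta_deg % 360.0
    return (theta_deg + 180.0) % 360.0
```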
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the proposed combination of Bai and Tomioka to select one of two determined candidate vehicle/camera orientations differing by 180 degrees, as taught by Cheng, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. It is predictable that the proposed modification would have accurately disambiguated the vehicle/camera orientation.
Each of claims 34 and 39 recites features nearly identical to those recited in claim 23. Accordingly, claims 34 and 39 are rejected for reasons analogous to those discussed above in conjunction with claim 23.
Claims 24-25, 35, and 40 are rejected under 35 U.S.C. 103 as being unpatentable over Bai in view of Tomioka and Cheng and further in view of U.S. Patent Application Publication No. 2018/0357907 to Reiley et al. (made of record in parent U.S. Patent Application No. 16/978,374 and cited in IDS filed 4/2/24; hereinafter “Reiley”).
As to claim 24, Bai as modified by Tomioka and Cheng above does not expressly disclose that selecting the geographic orientation of the camera with respect to the travelway comprises: identifying, in the imagery, one or more of at least a portion of a building or at least a portion of a different travelway; and selecting the geographic orientation of the camera with respect to the travelway based at least in part on the one or more of the at least a portion of the building or the at least a portion of the different travelway.
Reiley, like Bai, is directed to analyzing images to determine a location of a device (Abstract). In particular, Reiley discloses extracting from captured images text features of buildings and/or street signs, matching the extracted text features with text features stored in conjunction with map data, and identifying the geographic location of the imaging device based on the matched features and corresponding map data ([0054-0055]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the proposed combination of Bai, Tomioka and Cheng to utilize images to determine the geographic location of the camera based on recognized road signs and building text, as taught by Reiley, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. It is predictable that the proposed modification would have more accurately localized the device by virtue of using additional information in the captured images for orientation determination.
As to claim 25, Bai as modified by Tomioka, Cheng, and Reiley above further teaches that identifying the one or more of the at least a portion of the building or the at least a portion of the different travelway comprises: recognizing, in the imagery, text associated with the one or more of the at least a portion of the building or the at least a portion of the different travelway; and identifying, based at least in part on the text, the one or more of the at least a portion of the building or the at least a portion of the different travelway ([0054-0055] of Reiley discloses extracting from captured images text features of buildings and/or street signs, matching the extracted text features with text features stored in conjunction with map data, and identifying the geographic location of the imaging device based on the matched features and corresponding map data; the reasons for combining the references are the same as those discussed above).
Each of claims 35 and 40 recites features nearly identical to those recited in claim 24. Accordingly, claims 35 and 40 are rejected for reasons analogous to those discussed above in conjunction with claim 24.
Claim 27 is rejected under 35 U.S.C. 103 as being unpatentable over Bai in view of Tomioka and further in view of “Leveraging 3D City Models for Rotation Invariant Place-of-Interest Recognition” by Baatz et al. (made of record in parent U.S. Patent Application No. 16/978,374 and cited in IDS filed 4/2/24; hereinafter “Baatz”).
As to claim 27, Bai as modified by Tomioka above does not expressly disclose that the machine-learning model is trained based at least in part on training data comprising: a plurality of images cropped from panoramic imagery generated by a camera mounted on a vehicle that includes one or more sensors for determining a geographic orientation of the camera mounted on the vehicle with respect to one or more of: a travelway upon a portion of which the vehicle is traveling while the camera mounted on the vehicle captures the panoramic imagery, or a physical real-world environment comprising the vehicle and the travelway upon the portion of which the vehicle is traveling; and for each image of the plurality of images, a geographic orientation of the image with respect to a travelway upon a portion of which the vehicle was traveling when the camera mounted on the vehicle captured panoramic imagery from which the image was cropped, the geographic orientation of the image being determined based at least in part on data generated by the one or more sensors when the camera mounted on the vehicle captured the panoramic imagery from which the image was cropped.
Baatz, like Bai, is directed to analyzing images to output camera pose in real world coordinates (Abstract). Baatz discloses an offline training system in which the training data includes perspective images extracted, with a 60 degree field of view at every 20 degrees, from “panoramic images captured by a vehicle driving systematically through the streets” (Section 3.1). Such a training system is a machine-learning model. For each of these panoramic images (and thus for the extracted perspective images), “the camera position and orientation is known from GPS and sensor data” (Section 3.1).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the proposed combination of Bai and Tomioka to train a machine-learning model using training data that includes perspective images extracted from panoramic images captured by a vehicle driving systematically through the streets, wherein the camera position and orientation for each image are known from GPS and sensor data, as taught by Baatz, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. It is predictable that the proposed modification would have enhanced accuracy of the image recognition results by virtue of training the model using a large number of training images.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEAN M CONNER whose telephone number is (571)272-1486. The examiner can normally be reached 10 AM - 6 PM Monday through Friday, and some Saturday afternoons.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Greg Morse can be reached at (571) 272-3838. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SEAN M CONNER/Primary Examiner, Art Unit 2663