DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 12 February 2026 has been entered.
Response to Amendment
Claims 1-30 are pending. Claims 1-30 are amended directly or by dependency on an amended claim.
Response to Arguments
Applicant's arguments, see page 8, filed 12 February 2026, with respect to the 35 USC 101 rejections have been fully considered but are not persuasive. The “circuitry” and “neural networks” are recited at a high level of generality, i.e., as generic computer components performing generic computer functions. The claims do not reveal any concrete way of employing the “circuitry” and “neural networks” to perform the object detection. While the claims are more specific following amendment, they remain rejected under 35 USC 101 as being directed to an abstract idea under the new rationale laid out in the 35 USC 101 section below. Rather than falling within the mental processes grouping, the claims now appear to recite pure mathematical concepts.
Applicant’s arguments, see pages 9-13, with respect to the 35 USC 103 rejections of claim(s) 1-3, 7-9, 13-15, 19-21 and 25-27 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Similarly, the dependent claims are now rejected under a new combination of references.
Applicant’s arguments, see pages 8-9, with respect to the 35 USC 112b/2nd rejections of claims 1, 13, 19 and 25 have been fully considered and are persuasive. The 35 USC 112b/2nd rejections of claims 1, 13, 19 and 25 have been withdrawn.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
The following is the 101 analysis of claim 1:
Step 1: Statutory Category?
Yes, a series of steps carried out using a processor.
Step 2A, Prong 1: Judicial Exception Recited?
Yes, the limitations “compare one or more generated feature vectors of an object segmentation in each of two or more images to determine that the one or more generated feature vectors match across the two or more images” and “identify the same object depicted in the two or more images based on determining the one or more generated feature vectors match the one or more generated feature vectors” fall within the mathematical concepts grouping of abstract ideas. The phrase “A processor comprising: one or more circuits to use one or more neural networks” is still considered to recite generic computer components. The “one or more circuits” are recited at a high level of generality; no detailed description of their operation or architecture is provided. The generic computer model is not a model invented or improved by the applicant, so it does not amount to a technical improvement. There is no practical application, as the neural network merely stands in for the human mind performing the image evaluation. Neural networks, especially as applied to image processing, are well-known, routine, and conventional. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claims recite an abstract idea.
Step 2A, Prong 2: Integrated into a Practical Application?
No. The claim recites one additional element: that a processor comprising circuitry is used to perform the identifying step. The processor is recited at a high level of generality, i.e., as a generic processor performing a generic computer function of processing data (compare one or more generated feature vectors of an object segmentation in each of two or more images to determine that the one or more generated feature vectors match across the two or more images and identify the same object). This generic processor limitation is no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to the abstract idea.
Step 2B: Claim provides an Inventive Concept?
No. As discussed with respect to Step 2A Prong Two, the additional element in the claim amounts to no more than mere instructions to apply the exception using a generic computer component.
The same analysis applies here in 2B, i.e., mere instructions to apply an exception using a generic computer component cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B. The claim is ineligible.
As discussed in Step 2A, Prong Two, the neural network is recited generically and at a high level. The generic computer model is not a model invented or improved by the applicant, so it does not amount to a technical improvement. There is no practical application, as the neural network merely performs a mathematical calculation. Neural networks, especially as applied to image processing, are well-known, routine, and conventional. For instance, see the following prior art:
US 20040091134 A1 “The image processing system 107 may also make use of the neural network 115 in typing the vessels that pass through the pass-through area. This may be performed in accordance with well known neural network image processing techniques during which the processing is preceded by one or more training sessions to enhance the accuracy of the image recognition” [0088]
US 20180225504 A1 “Artificial intelligence and machine learning techniques such as neural networks and rule-based expert systems have been known in the art for many years, and are applied in many different fields. In particular, in the field of machine vision and image processing a particular type of neural network known as a convolutional neural network (CNN) is often used, and such networks achieve fast and accurate results in image recognition and classification systems (see e.g. http://en.wikipedia.org/wiki/Convolutional_neural_network for a review of the state of the art), and it has been reported (ibid.) that the performance of convolutional neural networks is now close to that of humans, although they still struggle with identifying objects that are small and thin. However CNNs can be capable of outperforming humans in classifying images of objects into fine-grained categories, for example images of different breeds of dog or bird” [0003]
US 20060147101 A1 “Neural networks are well known in the image processing arts for their ability to represent non-linear mappings between a set of input variables and a set of output variables” [0124]
US 20180082407 A1 “As explained in Gatys, one class of Deep Neural Networks (DNN) that are especially powerful in image processing tasks are known as Convolutional Neural Networks (CNNs). Convolutional Neural Networks consist of layers of small computational units that process visual information in a hierarchical fashion, e.g., often represented in the form of ‘layers.’” [0024]
US 20120206050 A1 “The artificial vision scene recognition system algorithms programmed into the controller 501 are obtained by methodologies known in the art such as using neural network architectures and learning algorithms for pattern recognition, image processing, and computer vision” [0300]
Countless other references establish neural networks on a general level as well-known, routine, and conventional.
With respect to the processor in the 2B analysis there is court evidence establishing that a computer/processor is well-known, routine, and conventional.
The limitations reciting “non-transitory computer readable medium,” “memory,” and “processing device” have been recognized by the courts as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality). MPEP § 2106.05(d), II. Courts have held computer-implemented processes not to be significantly more than an abstract idea (and thus ineligible) where the claim as a whole amounts to nothing more than generic computer functions merely used to implement an abstract idea, such as an idea that could be done by a human analog (i.e., by hand or by merely thinking). See MPEP § 2106.05(d), II; Intellectual Ventures I LLC v. Symantec Corp., 838 F.3d at 1321, 120 USPQ2d at 1362; TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 610, 118 USPQ2d 1744, 1745 (Fed. Cir. 2016); OIP Techs., Inc. v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015); buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014).
“Another consideration when determining whether a claim recites significantly more than a judicial exception is whether the additional elements amount to more than a recitation of the words ‘apply it’ (or an equivalent) or are more than mere instructions to implement an abstract idea or other exception on a computer. … Thus, for example, claims that amount to nothing more than an instruction to apply the abstract idea using a generic computer do not render an abstract idea eligible.” MPEP § 2106.05(f).
The Supreme Court has identified additional elements as mere instructions to apply an exception in several cases… The [Alice] Court found that the recitation of the computer in the claim amounted to mere instructions to apply the abstract idea on a generic computer. 134 S. Ct. at 2359-60, 110 USPQ2d at 1984. The Supreme Court also discussed this concept in an earlier case, Gottschalk v. Benson, 409 U.S. 63, 70, 175 USPQ 673, 676 (1972), where the claim recited a process for converting binary-coded-decimal (BCD) numerals into pure binary numbers. The Court found that the claimed process had no substantial practical application except in connection with a computer. Benson, 409 U.S. at 71-72, 175 USPQ at 676. The claim simply stated a judicial exception (e.g., law of nature or abstract idea) while effectively adding words that “apply it” in a computer. Id. MPEP § 2106.05(f).
An example of a case in which a computer was used as a tool to perform a mental process is Mortgage Grader, 811 F.3d. at 1324, 117 USPQ2d at 1699. The patentee in Mortgage Grader claimed a computer-implemented system for enabling borrowers to anonymously shop for loan packages offered by a plurality of lenders, comprising a database that stores loan package data from the lenders, and a computer system providing an interface and a grading module. The interface prompts a borrower to enter personal information, which the grading module uses to calculate the borrower’s credit grading, and allows the borrower to identify and compare loan packages in the database using the credit grading. 811 F.3d. at 1318, 117 USPQ2d at 1695. The Federal Circuit determined that these claims were directed to the concept of "anonymous loan shopping", which was a concept that could be "performed by humans without a computer." 811 F.3d. at 1324, 117 USPQ2d at 1699. Another example is Berkheimer v. HP, Inc., 881 F.3d 1360, 125 USPQ2d 1649 (Fed. Cir. 2018), in which the patentee claimed methods for parsing and evaluating data using a computer processing system. The Federal Circuit determined that these claims were directed to mental processes of parsing and comparing data, because the steps were recited at a high level of generality and merely used computers as a tool to perform the processes. 881 F.3d at 1366, 125 USPQ2d at 1652-53.
Therefore, based on the above, claim 1 is rejected under 35 USC 101 as an abstract idea.
Claim 2 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim recites “use the one or more neural networks to identify the object segmentation.” This could be a human determining a region of interest.
Claim 6 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The limitation “provide information identifying the corresponding representations of the one or more objects, wherein a task is to be performed using an identified object for the corresponding representations” could describe a human verbally stating that the image is an apple or an orange. While the specification indicates that one possible embodiment of the task is manipulation by a robot, the claim does not recite a specific instrument such as a robot; an unclaimed embodiment is not enough to make the claim statutory.
Claim 7 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more, for the same reasons as claim 1. Claim 7 is a slightly broader version of claim 1, as claim 7 does not specify one or more circuits. The same rationale applies.
Claim 13 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more, for the same reasons as claim 1. Claim 13 is a broader version of claim 1, as no processor or circuits are specified.
Claim 19 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more, for the same reasons as claim 1. Claim 19 recites a machine-readable medium and a processor as opposed to a processor comprising one or more circuits. The same rationale applies.
Claim 25 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more, for the same reasons as claim 1. Claim 25 additionally recites a memory; however, the memory is recited at a high level of generality and is therefore again considered a generic computer component.
Claims 8, 14, 20, and 26 are rejected for the same reasons as claim 2.
Claims 12, 18, 24, and 30 are rejected for the same reasons as claim 6.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claim(s) 1-3, 7-9, 13-15, 19-21, and 25-27 is/are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Bottos et al. (US 20220335626 A1).
Regarding claims 1, 7, 13, 19, and 25, Bottos et al. disclose one or more processors, comprising circuitry to use (circuit, [0033], ASICs, [0072]); a system (abstract) comprising: one or more processors ([0033], [0072], [0075]) to use; a method (abstract) comprising: using; a machine-readable medium having stored thereon a set of instructions, which if performed by one or more processors, cause the one or more processors (storage medium with software, [0077]) to at least use; and an object identification system, comprising one or more processors ([0033], [0072], [0075]) and one or more neural networks ([0037]) to: compare one or more generated feature vectors of an object segmentation in each of two or more images to determine that the one or more generated feature vectors match across the two or more images (portions or snippets of images extracted from or based on the created bounding box, [0038], Upon leaving the view of camera 100A and entering another ROI (e.g., camera 100C), object 102 can be detected and tagged or identified as a new object relative to camera 100C, object 102 should be re-identified, feature vectors of the frames may be compared to feature vectors of potential matches in object gallery 252, comparison may be implemented using a cosine similarity metric, finding a match, [0041]) and identify the same object depicted in the two or more images based on determining the one or more generated feature vectors match the one or more generated feature vectors (“Following the example of FIG. 1, object 102 may move from an ROI within the field of view of camera 100A to another ROI within the field of view of camera 100B. 
To provide or obtain a pool of samples relating to each object (e.g., object 102) in the form of feature vectors, and to enable maintaining a record or log of where object 102 travels, where it may be at a given time, and/or actions performed by object 102, object 102 may be re-identified upon entry into a second ROI covered by another camera”, [0040], “Upon finding a match (if a match exists), the five frames of “interim” data can be discarded from object gallery 252 and any additional data or information collected regarding object 102 from camera 100C may be saved. The location to save the additional data may correspond to the original file location (e.g., under images and/or features files corresponding to object 102 and camera 100A). That is, images or features captured by camera 100A may be saved as images/camA_1 and features/camA_1 and further images/features captured by camera 100C may be saved under the same files, since object 102 has been re-identified as being the same object detected by camera 100A”, [0041]).
Regarding claims 2, 8, 14, 20, and 26, Bottos et al. disclose the one or more processors, system, method, machine-readable medium, and system of claims 1, 7, 13, 19, and 25. Bottos et al. further indicate the circuitry is further to use the one or more neural networks to identify the object segmentation of the object in the two or more images (As alluded to above, statistical analysis, big data analytics, machine learning, or other processes can be performed on or with the aggregated data. The results of these analyses or processes can be used to further refine the operation of camera 100A or client edge device 200A. For example, certain objects detected or monitored within an ROI may be deemed to be unimportant, and camera 100A or client edge device 200A can be updated to ignore such objects, [0035]).
Regarding claims 3, 9, 15, 21, and 27, Bottos et al. disclose the one or more processors, system, method, machine-readable medium, and system of claims 2, 8, 14, 20, and 26. Bottos et al. further indicate the one or more feature vectors are calculated from a plurality of features indicative of a respective object segmentation (In some embodiments, a bounding box may be generated about some location data (e.g., scene feature) or object relative to which images may be captured (e.g., encircling, surrounding, around the object, etc.), [0038], “Feature encoder 220 may extract, for example, prominent features of the bounded object and can output those features to an object gallery 252 representative of all or some subset of images of object 102 captured by camera 100A. The features may be output as a raw image. For example, feature encoder 220 may extract features such as the shape of the bag being carried, the color of the bag, and other identifying features or aspects of the object. An assumption can be made that a tracked bounding box may bound or contain the same object(s). The output of feature encoder 220 may be a plurality of image sets or vectors, each corresponding to one of the set of raw images 250. Each image (1−N) may be represented by an M-by-1 (M×1) dimensional feature vector corresponding to a given image in object gallery 252”, [0039]).
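For illustration only, and forming no part of the rejection, the cosine-similarity re-identification scheme described in Bottos et al. at [0039]-[0041] (M×1 feature vectors per gallery image, compared via a cosine similarity metric to find a match) may be sketched as follows. The function names, gallery structure, and the 0.8 threshold are assumptions for the sketch; Bottos et al. do not specify a threshold value.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def reidentify(query_vec, gallery, threshold=0.8):
    """Return the object gallery ID whose stored feature vector best
    matches the query vector, or None if no score clears the threshold
    (i.e., the object is tagged as new rather than re-identified)."""
    best_id, best_score = None, threshold
    for obj_id, feat in gallery.items():
        score = cosine_similarity(query_vec, feat)
        if score >= best_score:
            best_id, best_score = obj_id, score
    return best_id
```

Under this sketch, a feature vector extracted at a second camera is compared against each entry in the object gallery; a match above the assumed threshold re-identifies the object with its original record, consistent with the mapping above.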
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 4, 10, 16, 22, and 28 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bottos et al. (US 20220335626 A1) as applied to claims 3, 9, 15, 21, and 27 above, further in view of Vaidwan et al. (“A study on transformer-based Object Detection”, 2021).
Regarding claims 4, 10, 16, 22, and 28, Bottos et al. disclose the one or more processors, system, method, machine-readable medium, and system of claims 3, 9, 15, 21, and 27. Bottos et al. do not disclose the one or more neural networks include a deformable detection transformer to generate the one or more feature vectors.
Vaidwan et al. teach one or more neural networks include a deformable detection transformer to generate the one or more feature vectors (The overall DETR architecture is made of 3 components: a CNN backbone [20], an encoder-decoder architecture & FFN [17]. An image is fed into the backbone & a feature vector is given as output, typically of size C x H x W. Further, this is passed through a 1x1 convolution [21] to reduce the channel dimension from C to a smaller dimension d. The encoder expects a sequence of inputs, so the vector is flattened to dx(HW) size. Since the transformer has a permutation invariant architecture, a fixed sine based encoding technique is used and these encodings are fed to the input of each layer in the encoder. Decoder transforms N embeddings of size d in parallel. Here the decoder is also not affected by permutation, so the N input embeddings to the decoder must be different to produce different results. These embeddings are also called object queries and these are learnt positional embeddings, part IIIA).
Bottos et al. and Vaidwan et al. are in the same art of using feature vectors for object detection (Bottos et al., abstract, [0041]; Vaidwan et al., abstract, part IIIA). The combination of Vaidwan et al. with Bottos et al. allows use of DETR. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the DETR of Vaidwan et al. with the invention of Bottos et al., as this was known at the time of filing and the combination would have predictable results. As Vaidwan et al. indicate, “End to end object detection is a new paradigm that has got attention in recent times. It does not require complex hand-engineered components such as non max suppression to detect objects inside an image” and “These enhanced models not only improve the mean average precision of the model but also improves the total convergence time” (abstract), thereby providing an accuracy benefit to the combination of inventions.
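For illustration only, and forming no part of the rejection, the shape bookkeeping of the DETR front end described by Vaidwan et al. in part IIIA (a backbone feature map of size C x H x W, a 1x1 convolution reducing the channel dimension from C to d, and flattening to a d x (HW) sequence for the transformer encoder) may be sketched in minimal form as follows. The nested-list representation and function names are assumptions made for the sketch.

```python
def conv1x1(feature_map, weights):
    """1x1 convolution: a per-pixel linear map reducing channels C -> d.
    feature_map is C x H x W nested lists; weights is a d x C matrix."""
    C = len(feature_map)
    H, W = len(feature_map[0]), len(feature_map[0][0])
    d = len(weights)
    return [[[sum(weights[k][c] * feature_map[c][i][j] for c in range(C))
              for j in range(W)]
             for i in range(H)]
            for k in range(d)]

def flatten_to_sequence(feature_map):
    """Flatten a d x H x W map to a d x (H*W) token sequence, since the
    transformer encoder expects a sequence of inputs."""
    return [[v for row in plane for v in row] for plane in feature_map]
```

For a 2 x 1 x 2 feature map and a 1 x 2 weight matrix, the sketch produces a 1 x 1 x 2 map after the 1x1 convolution and a 1 x 2 token sequence after flattening, mirroring the C -> d -> d x (HW) progression Vaidwan et al. describe.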
Claim(s) 5, 6, 11, 12, 17, 18, 23, 24, 29, and 30 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bottos et al. (US 20220335626 A1) as applied to claims 2, 8, 14, 20, and 26 above, further in view of Lee (US 20210158815 A1).
Regarding claims 5, 11, 17, 23, and 29, Bottos et al. disclose the one or more processors, system, method, machine-readable medium, and system of claims 2, 8, 14, 20, and 26. Bottos et al. do not disclose the circuitry is further to use a comparator to determine corresponding feature vectors, for corresponding representations of the object, for the two or more images.
Lee teaches use a comparator to determine corresponding feature vectors, for corresponding representations of the one or more objects, for the one or more images (“The comparator 265 may extract feature vectors from the first and second images, respectively, and may compare the feature vectors extracted from the first image and the second image. That is, the comparator 265 may extract the feature vector of the first image that is the generated image, may extract the feature vector of the second image that is the photographed image, and then may compare similarities as a vector value. For example, the similarity value may be extracted as 0.0 to 1.0, and may be extracted as a value close to 1.0 depending on a degree of similarity. Thus, the comparator 265 may compare each of the plurality of captured images with the generated image to extract a similarity value, and may output the same result (true) with respect to the captured image extracted at a value above a reference similarity value. The reference similarity value may be preset as a reference value for determining that the compared images are the same”, [0148]).
Bottos et al. and Lee are in the same art of using feature vectors for object detection (Bottos et al., abstract, [0041]; Lee, abstract, [0148]). The combination of Lee with Bottos et al. allows use of a comparator. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the comparator of Lee with the invention of Bottos et al., as this was known at the time of filing and the combination would have predictable results. As Lee indicates, “The generative model-based device may generate an image based on a spoken utterance of a user, and may repeatedly a procedure of discriminating an actual image and an image generated through an internal discrimination model to learn the generative model, and thus may compensate for a region that is not covered by training data, thereby enhancing performance of the generative model. The discriminative model-based device may perform text classification and image classification based on the spoken utterance of a user, may combine the performing results, and may perform a specific intended operation to efficiently and accurately determine the instruction intent of the user using as much information as possible” ([0163]-[0164]), thereby indicating generative AI applications of the object detection of Lee when combined with Bottos et al.
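For illustration only, and forming no part of the rejection, the thresholding scheme of Lee's comparator at [0148] (similarity values expressed on a 0.0 to 1.0 scale and compared against a preset reference similarity value) may be sketched as follows. The particular distance-based similarity metric, the 0.9 reference value, and all names are assumptions made for the sketch; Lee specifies only the 0.0-1.0 scale and the preset reference comparison.

```python
import math

def similarity(u, v):
    """Toy similarity on a 0.0-1.0 scale: 1.0 for identical vectors,
    decreasing toward 0.0 as Euclidean distance grows (an assumed metric)."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return 1.0 / (1.0 + dist)

def comparator(generated_feat, captured_feats, reference=0.9):
    """Compare the generated image's feature vector against each captured
    image's vector; output True ('same result') for each captured image
    whose similarity meets or exceeds the preset reference value."""
    return [similarity(generated_feat, f) >= reference
            for f in captured_feats]
```

A captured image identical to the generated image yields similarity 1.0 and passes the reference check; a distant vector yields a value near 0.0 and fails, consistent with the true/false output Lee describes.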
Regarding claims 6, 12, 18, 24, and 30, Bottos et al. and Lee disclose the one or more processors, system, method, machine-readable medium, and system of claims 5, 11, 17, 23, and 29. Lee further indicates the circuitry is further to provide information identifying a corresponding representation of the object, wherein a task is to be performed using an identified object for the corresponding representations (The robot 30a may refer to a machine which automatically handles a given task by its own ability, or which operates autonomously. In particular, a robot having a function of recognizing an environment and performing an operation according to its own judgment may be referred to as an intelligent robot, [0041], VR technology provides objects or backgrounds of the real world only in the form of CG images. AR technology provides virtual CG images overlaid on the physical object images, [0043], For example, the imaging apparatus 200 may perform an action of photographing children within a predetermined space in order to search for a missing child and determining whether the photographed children correspond to the missing child. Here, the action may be a task performed by the imaging apparatus 200 in order to perform the instruction indicated by the speech input and the speech input may include a named entity that is a target of the action, [0072], The imaging apparatus 200 that is embodied as a robot may capture an image and may communicate with the user while moving in an indoor space. Although FIG. 2 illustrates the case in which the imaging apparatus 200 is embodied as a robot, the imaging apparatus 200 may be embodied as various electronic devices such as an AI speaker, a smartphone, a tablet PC, or a computer, [0073], when the intent to search for a body in an airport is analyzed, the imaging apparatus 200 may capture an image of boys while moving in the airport, [0156]).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M ENTEZARI HAUSMANN whose telephone number is (571)270-5084. The examiner can normally be reached 10-7 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vincent M Rudolph can be reached at (571) 272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHELLE M ENTEZARI HAUSMANN/Primary Examiner, Art Unit 2671