DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The Amendment filed 11/14/2025 in response to the Non-Final Office Action mailed 05/14/2025 has been entered.
Claims 1-20 are currently pending in U.S. Patent Application No. 18/179,803 and an Office action on the merits follows.
Response to 35 USC § 112 Rejections
In view of the foregoing amendments to claims 3/11 and 10 (correcting the antecedent basis issues), the rejections under 35 U.S.C. § 112(b) are withdrawn.
Response to 35 USC § 101 Rejections
Applicant's arguments regarding the subject matter eligibility analysis have been considered but are determined to be non-persuasive. Applicant’s remarks reference the August 4, 2025 memo and assert that a fair/proper reading of at least some of the limitations as required by e.g. claim 1 (namely the ‘querying a feature vector database’ limitation) is of a complexity that precludes the limitation(s) from being practically performed in the mind, with or without the assistance of tools such as pen and paper and/or a computer as a tool (see MPEP 2106.04(a)(2) III Mental Processes and C. A Claim That Requires a Computer May Still Recite a Mental Process – and more specifically “3) merely using a computer as a tool to perform the concept” therein), and that as a result the claims are eligible (directed to a Prong One finding). Applicant argues that a low complexity feature vector would be shared across many objects/targets/identities and accordingly a person would not be able to mentally derive a unique identifier on the basis of the hypothetical low complexity vector. This argument however makes assumptions regarding the nature of the database, what feature vectors are stored therein, how many unique identifiers it comprises, etc., in addition to how the search space may be additionally restricted. While it is true that a low complexity vector may be common to multiple identities, those identities can still be unique even if they share the low complexity vector in common (just because Bob is male and tall does not mean the identifier for Bob is not unique if other individuals in the database are also male and tall), and a human/operator/person performing a database query may further limit the search space based on any number of criteria (male and tall need not be the sole criteria), thereby rendering the determination of a unique identity/identifier possible. Even if a low complexity feature vector is used as an initial or partial basis (e.g. the low complexity vector may serve to quickly/efficiently reduce the search space), it need not be the sole basis in deriving/deciding/determining a unique identifier. There are no explicitly recited limitations regarding the complexity of any of the feature vector(s), the database, the information it comprises, the unique identifier(s) determined, etc., so as to exclude the hypothetical illustration previously provided. Even assuming arguendo that the ‘querying’ in question cannot be performed practically in the mind/with a computer as a tool, Applicant’s remarks do not address any of the other limitations additionally drawn under the mental processes Abstract Idea grouping. A single (or even multiple) ‘additional element(s)’ excluded from being drawn to the exception does/do not prevent a Prong One finding that the claim ‘recites’ the exception. Various Examples from the 2024 PEG make this clear, as they feature ‘additional elements’ yet the corresponding Prong One analysis of Step 2A still finds the claim(s) ‘recite’ an exception. In some instances these ‘additional elements’ are still insufficient at Prong Two. Examiner maintains the Prong One finding previously presented accordingly.
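The following illustrative sketch (Python) accompanies the hypothetical above and forms no part of the claims, the references of record, or the grounds of rejection; the record structure, attribute names, and identifiers are invented for the example. It merely shows that a "low complexity" query (male and tall) may match several identities while further restriction of the search space still isolates a single unique identifier.

```python
# Purely hypothetical records; attribute names, values, and identifiers are illustrative only.
records = [
    {"id": "ID-001", "sex": "male",   "height": "tall",  "hat": "white", "last_zone": "lobby"},
    {"id": "ID-002", "sex": "male",   "height": "tall",  "hat": "none",  "last_zone": "garage"},
    {"id": "ID-003", "sex": "female", "height": "short", "hat": "white", "last_zone": "lobby"},
]

# A "low complexity" query (male and tall) matches more than one identity...
coarse = [r for r in records if r["sex"] == "male" and r["height"] == "tall"]

# ...but further restricting the search space (e.g. an observed hat and last known zone)
# still yields a single, unique identifier.
narrowed = [r for r in coarse if r["hat"] == "white" and r["last_zone"] == "lobby"]
print(len(coarse), narrowed[0]["id"])  # 2 ID-001
```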
Applicant’s Prong Two analysis identifies a ‘practical application’ that appears subsumed within/by the identified exception. Applicant’s remarks assert that a process of collecting additional feature vectors/information, in association with a unique identifier, is in itself practical/useful (regardless of how much of that process can be drawn under the exception). It is well established however that information collection and analysis, including when limited to particular content (e.g. discriminative features for object/target/person re-identification), falls within the realm of abstract ideas. See, e.g., Internet Patents Corp. v. Active Network, Inc., 790 F.3d 1343, 1349 (Fed. Cir. 2015); Digitech Image Techs., LLC v. Elecs. for Imaging, Inc., 758 F.3d 1344, 1351 (Fed. Cir. 2014); CyberSource Corp. v. Retail Decisions, Inc., 654 F.3d 1366, 1370 (Fed. Cir. 2011). Applicant’s arguments regarding the eligibility analysis fail to identify which specific ‘additional elements’ (other than perhaps/implicitly the ‘querying’, which is the only limitation asserted to be an ‘additional element’), distinct from those drawn under the exception (which the querying is not, for those reasons identified above), serve for integration (i.e. how they realize the improvement), and what pertinent section of the MPEP would support that finding - how it/they serve for integration in view of the manner in which they realize an improvement per MPEP 2106.05(a) as distinguished from (f), (g) and/or (h). Instead, the remarks appear to assert that the claim as a whole facilitates the collection of additional information/feature vectors, and is thereby useful. Is the asserted improvement based on a position that the state of the art does not permit updating a datastore to comprise/‘associate’ additional/complementary feature vectors? Or that such additional information would be inaccurate if not acquired in part on the basis of an initial/first detection, and that the state of the art is/was never to consider any initial/first detection? The references of record suggest neither. It is only those ‘additional elements’ beyond the exception that may serve for integration, and the Alice-Mayo framework’s roots in pre-emption concern weighing the recited exception against any additional elements – evaluating neither in a vacuum (this is also addressed in the August 4, 2025 memo with reference to footnote 12). As an additional consideration for Applicant, while Examination practice concerns the enumerated Abstract Idea groupings from the 2019 PEG, the courts have repeatedly declined to adopt those groupings. Rideshare Displays, Inc. v. Lyft, Inc., No. 23-2033 (Fed. Cir. Sept. 29, 2025) (https://www.cafc.uscourts.gov/opinions-orders/23-2033.OPINION.9-29-2025_2579953.pdf), footnote at page 14.
Applicant’s Step 2B analysis borrows from the Prong Two of Step 2A finding given their overlap. Examiner finds this argument non-persuasive for those same reasons identified above regarding Prong Two of 2A – it is not clear which ‘additional elements’ serve as ‘significantly more’/an ‘inventive concept’ (at 2B), much like there are none that realize an improvement (distinct from the exception itself) to any ‘technical field’ (at Prong Two of 2A). While it has not been previously asserted that any explicitly identified ‘additional elements’ are well-understood, routine, conventional activity (WURC) (since there is an additional burden of proof on the Examiner in such an instance and there are few if any ‘additional elements’ distinct from the exception), the claim(s) as permissibly interpreted rest with ‘associating’ that second feature vector with the unique identifier. Law enforcement personnel/detectives, as an example, likely commonly/conventionally extract additional/supplemental discriminative features of a target object/person of interest (with the assistance of a computer as a tool and more broadly), from second imagery, based on an earlier detection – particularly if, for example, the earlier view/imagery provides a certain degree of identifying features, such as clothes, height/body type, presence within an approximate time and/or area/location/venue/position, etc., and the second imagery may provide complementary information (e.g. a more complete/unobstructed, higher quality, etc., view of the individual’s face, corroboration of their presence proximate a time/location of interest, etc.). This/these same personnel may then link/connect/associate (see NF at page 8 regarding ‘associating’) this information in the aggregate. The corresponding rejections to the claims are maintained accordingly.
Response to Arguments/Remarks
Applicant's arguments regarding the claim rejections under 35 USC 103 in view of Kim as modified by Rebien have been fully considered but they are not persuasive. Applicant’s remarks at page 13 appear to acknowledge that Kim discloses/suggests forecasting/predicting ‘where’ movement takes an object/person of interest, but assert that such a forecasted/predicted position/location/‘where’ is not a basis in that “locating the target object in the second image using the position determined from the first image” limitation. Examiner disagrees, because the claim does not specify how the determined position is ‘used’, let alone how the ‘position’ should necessarily be interpreted outside of the permissible interpretations afforded by BRI in view of MPEP 2173.01 and 2111. Using an initial position to derive/forecast/predict a second position, then using the predicted/forecasted position to locate a target object in second imagery, reads – because the first/initial position was still used/relied upon – even if indirectly/for the purposes of predicting that second position. Also, it may be the case that a position/location associated with the second image is that same/‘the’ ‘position’ determined from the first image. Stated differently, the ‘position’ recited need not be e.g. any tube for the first image, and need not exist within the first image even if it optionally may. Does Applicant intend for the position to be defined within the context of the first image (e.g. an ‘image position’ that is a set of pixel coordinates?), and not within the context of e.g. a real/physical space as defined by a world coordinate system? Even in such an instance, a simple bounding box delineating the person/target/object, as a preprocessing step, would read – because like the instance identified above, it is ‘used’ even if indirectly/initially and is accordingly a basis, even if not the sole basis, in the locating as recited (e.g. an initial bounding box to set a region for feature extraction, and using those extracted features for a subsequent detection/locating step, would read; see the illustrative sketch below). Remarks at page 14 further assert that Kim relies upon ‘tubes’ that are not themselves defined as feature vectors, and that Kim thereby fails to teach/suggest the recited ‘associating’. The Office Action at page 13 however identifies the manner in which Kim fails to disclose storing feature vectors directly, and instead this is a teaching from the secondary/Rebien reference. In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). Applicant’s remarks fall silent regarding Rebien as applied and are non-persuasive accordingly. Examiner disagrees with any assertion that Kim fails to suggest ‘feature vectors’ entirely. Kim features multiple instances of such language ([0125], [0132]), in addition to that CNN-based network structure that commonly/routinely produces (and is explicitly disclosed as doing so in Kim [0132]) output in such a format. Examiner maintains that the references of record, as reasonably combined, serve to teach/suggest the instant claims as broadly/appropriately interpreted.
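The following sketch (Python) is offered for illustration only; it is not asserted to be Applicant's method or Kim's implementation, and the one-dimensional "images", the bounding interval, and the mean-intensity feature are hypothetical simplifications. It merely shows the reading discussed above, in which an initial bounding region from the first image is "used", even if only indirectly, because features extracted from that region are the basis for a subsequent locating step in the second image.

```python
# Hypothetical 1-D "images" and a trivial feature (mean intensity); values illustrative only.
first_image  = [10, 10, 200, 210, 205, 12, 11]
second_image = [9, 11, 13, 198, 207, 209, 10]

# Step 1: an initial bounding "box" (here a 1-D interval), i.e. the position determined
#         from the first image, sets the region from which features are extracted.
bbox = (2, 5)
template = first_image[bbox[0]:bbox[1]]
feature = sum(template) / len(template)

# Step 2: that feature, derived via the initial position, is then the basis for locating
#         the target in the second image, e.g. by a sliding-window comparison.
width = bbox[1] - bbox[0]
scores = [abs(sum(second_image[i:i + width]) / width - feature)
          for i in range(len(second_image) - width + 1)]
print(scores.index(min(scores)))  # offset of the best match in the second image
```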
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claim(s) 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception, in particular an Abstract Idea falling under at least the (c) mental processes grouping (concepts performed in the human mind including an observation, evaluation, judgment, opinion) and/or the (a) mathematical concepts category/grouping (mathematical relationships, formulas or equations, and/or calculations), not ‘integrated into a practical application’ at Prong Two of Step 2A and without ‘significantly more’ at Step 2B.
Step 1: The claim(s) in question are directed to a computer implemented method/process (hardware/structural limitations considered under the ‘apply it’ provisions of MPEP 2106.05(f)) for object re-identification. Examiner notes that the independent claims at least rest with broadly ‘associating’ extracted feature vectors - serving to suggest that any additional elements of the claim(s) that do not pertain to a mentally performed object re-identification likely fall under mere data gathering as discussed in MPEP 2106.05(g) (see e.g. Ultramercial, 772 F.3d at 715, 112 USPQ2d at 1754 for that ‘tenth step’ of the claim, and at the second step in the analysis with reference to CyberSource Corp. v. Retail Decisions, Inc., 654 F.3d 1366, 1370 (Fed. Cir. 2011)). (Step 1: Yes).
Step 2A, Prong One: This part of the eligibility analysis evaluates whether the claim recites a judicial exception. As explained in MPEP 2106.04, subsection II, a claim “recites” a judicial exception when the judicial exception is “set forth” or “described” in the claim. Claim(s) 1/9/16 recite(s) at a high level of generality – ‘determining, from a first image captured by a first view sensor, a position of a target object’, ‘querying a feature vector database to determine a unique identifier…’, ‘locating the target object in the second image using the position determined from the first image’, and ‘extracting… discriminative features’, each/all falling (in view of a plain meaning/broadest reasonable interpretation(s), see MPEP 2111.01) under the mental processes grouping (and/or the mathematical concepts grouping for the case of that ‘querying’ and/or ‘extracting’) (as per the recent guidance, a claim as a whole need not be drawn to exclusively one of the three identified Abstract Idea groupings). Reference may be made to the July 2024 PEG and those various limitations drawn to the mental processes grouping(s), to include those of Example 47 claim 2. The claims/limitations in question are recited at a high level of generality and lack any specifics precluding such ‘determining’, ‘locating’, ‘extracting’, etc., from being interpreted under the mental processes grouping as practically performed in the mind (see also MPEP 2106.04(a)(2) identifying how e.g. the use of pen and paper and/or a computer as a tool (to visually analyze/observe acquired images/video) fails to preclude such an interpretation under the mental processes Abstract Idea grouping). The determined position for a visually recognized and/or queried (e.g. person matching a description of person(s) of interest) target object/person may be an absolute position such as a certain room in a building/campus, a street intersection, etc., mentally determined by means of visually analyzing e.g. CCTV imagery (alternatively a predicted/expected position). Knowing/having determined this position, a person/human/user may then view second imagery from a complementary view/angle in a CCTV system as an example, and use said position to identify/locate a subject of interest in the second imagery. No constraints are placed on the complexity of any ‘extracted’ feature vectors so as to preclude them from being determined mentally (e.g. the recited features may minimally constitute a few binary values associated with the presence or absence of certain visually recognizable features (e.g. glasses, beard, white hat, PGPUB at [0015])). As such the claim(s), in view of those limitations recited at a high level of generality, fail(s) to preclude an interpretation falling under the mental processes grouping. Regarding claim 2, the recited ‘comparing’ may also be performed mentally/with pen and paper – e.g. a determination of Squared Euclidean distances between one or more query and reference feature vectors (see the illustrative sketch below). Regarding claim 5, the recited determining may be, for example, reading a visual indication of geodetic coordinates for any of the views/cameras in the CCTV system. Claim 7 also recites a ‘mapping’ that fails to preclude one being performed mentally – e.g. the user/human viewing the various CCTV video feeds may recognize a landmark from one perspective and use this landmark as detected in a complementary view to corroborate/validate the determined position relative to the second image for that ‘locating’. While not claimed, Applicant’s Specification discloses e.g. 
object detection by means of CV/ML techniques broadly, however as identified in the most recent PEG, even a form of automating that broadly/generically involves the use of a machine learning model, would fail to preclude the limitations in question from being drawn to the mental processes grouping (see guidance with respect to ‘apply it’ considerations of MPEP 2106.05(f) – a generically recited ML model for implementing what is otherwise manually/mentally performed fails to integrate). (Step 2A, Prong One: Yes).
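As a purely illustrative aside, and not as part of the record, the scale of computation contemplated for the 'comparing' of claim 2 may be appreciated from the following sketch (Python), in which a squared Euclidean distance is computed between short, hypothetical binary attribute vectors (e.g. glasses, beard, white hat per the PGPUB at [0015]); the attribute choices and values are invented for the example.

```python
# Hypothetical binary attribute vectors (glasses, beard, white hat); values illustrative only.
query     = [1, 0, 1]   # glasses, no beard, white hat
reference = [1, 1, 1]   # glasses, beard, white hat

# Squared Euclidean distance: the sum of squared element-wise differences.
sq_dist = sum((q - r) ** 2 for q, r in zip(query, reference))
print(sq_dist)  # 1 -- small enough to evaluate mentally or with pen and paper
```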
Step 2A, Prong Two: This part of the eligibility analysis evaluates whether the claim as a whole integrates the recited judicial exception into a practical application of the exception. This evaluation is performed by (1) identifying whether there are any ‘additional elements’ recited in the claim beyond the judicial exception, and (2) evaluating those additional elements individually and in combination to determine whether the claim as a whole integrates the exception into a practical application. See MPEP 2106.04(d). Examiner notes for consideration at Prong Two of 2A that MPEP 2106.05(a), (b), (c), and (e) generally concern limitations that are indicative of integration, whereas 2106.05(f), (g), and (h) generally concern limitations that are not indicative of integration. As an additional note, ‘additional elements’ are generally limitations excluded from interpretation under the Abstract Idea groupings, and may comprise portions of limitations otherwise identified as falling under those Abstract Idea groupings of the 2019 PEG (e.g. any ‘determination’ that may be made mentally accompanied by the use of a neural network and/or generic computer hardware considered under the ‘apply it’ considerations of 2106.05(f)). Any ‘providing’/outputting broadly, and ‘collection’ of data (i.e. image acquisition(s) and ‘associating’/storing feature vectors), be they images for training any learning model and/or data/images visually observable/evaluated by a user/operator, also fail(s) to integrate, at least in view of MPEP 2106.05(g) (extra-solution data gathering/output) and/or 2106.05(h) as ‘generally linking’ the exception to a field of use involving machine learning and/or imagery so acquired. The same determination holds for dependent claims that serve to limit the collection of data/images and/or introduce limitations generally linking to a field of use, e.g. claim(s) 4/15/17. Assuming arguendo that a camera pose determination as recited in claim 5 is precluded from being performed mentally/read from a screen, and also that such a pose determination does not fall under the mathematical concepts grouping, it is not relied upon for any explicitly recited purpose in the claim(s) (failing to integrate in view of 2106.05(g) and/or (h)). None of the instant claims appear to explicitly/clearly capture/recite any disclosed improvement in technology (see MPEP 2106.05(a)), and any ‘additional elements’, even when considered in combination, fail to integrate at Prong Two of Step 2A accordingly. Integration in view of subsection (a) requires an identification of the manner in which the improvement is achieved, to be explicitly and specifically (not at a high level of generality) recited in the claims, as ‘additional elements’ precluded from interpretation under any of the Abstract Idea groupings (since the improvement cannot be to the exception itself). Even the ‘locating’ (which is not an ‘additional element’ for the reasons identified above with respect to Prong One and so cannot integrate at Prong Two), at best broadly constitutes restricting a visual search space based on a recognized/predicted, etc., position as visually determined/recognized from a first image – which would not constitute a specifically recited improvement even if it were somehow precluded from being performed mentally. In view of MPEP 2106.05(f), the improvement cannot be merely/broadly automating what is otherwise the exception and also cannot itself be an exception.
Even when viewed in combination, the ‘additional elements’ present do not integrate the recited judicial exception into a practical application (Step 2A, Prong Two: No), and the claims are directed to the judicial exception. (Revised Step 2A: Yes → Step 2B).
Step 2B: This part of the eligibility analysis evaluates whether the claim as a whole amounts to ‘significantly more’ than the recited exception, i.e., whether any ‘additional element’, or combination of additional elements, adds an inventive concept to the claim. The considerations of Step 2A Prong Two and Step 2B overlap, but differ in that 2B also requires considering whether the claims feature any “specific limitation(s) other than what is well-understood, routine, conventional activity in the field” (WURC) (MPEP 2106.05(d)). Such a limitation, if specifically recited, must however still be excluded from interpretation under any of the Abstract Idea groupings. Step 2B further requires a re-evaluation of any additional elements drawn to extra-solution activity in Step 2A (e.g. gathering imagery and ‘associating’/storing vectors) – however no limitations appear directed to any novel collection per se, or to any novel storage/associating format, as examples. Claim 3 in particular is worthy of reconsideration at 2B in view of the manner in which that ‘saving’ fails to serve in integration at Prong Two of 2A in view of MPEP 2106.05(g). However, even at 2B for the case of claim 3, as an example, updating a database to comprise a series of different/complementary reference/template views for a same object is well-understood/conventional (a pineapple top appears very different from its bottom/fruit portion, but both are characteristic of a pineapple – and as such distinct views of each serve as valuable reference/template information when identifying (visually or otherwise) a pineapple by reference/template comparison/matching). Limitations not indicative of an inventive concept/‘significantly more’ include those that are not specifically recited (instead recited at a high level of generality), those that are established as WURC, and/or those that are not ‘additional elements’ by nature of their analysis at Prong One (i.e. reciting the exception). Reference may also be made to the 2024 PEG describing that an improvement/inventive concept (for ‘significantly more’ determination(s)) cannot be to the judicial exception itself. The claim(s) in question recite little beyond those limitations recited at a high level of generality and falling under the mental processes grouping, limitations involving computer implementation appear directed, at best, to automating what is/was otherwise routinely performed mentally/manually (see 2106.05(f) as distinguished from (a)), and ‘additional elements’ otherwise recited at a high level of generality fail to specifically recite the limitations required for achieving any improvement. Accordingly, even when considered in combination, the additional elements for the case of the instant claims represent mere instructions to apply the exception (MPEP 2106.05(f)), and therefore cannot provide an inventive concept/‘significantly more’ (Step 2B: No).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
1. Claims 1-2, 9-10, 14 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US 2021/0225013 A1) (cited by Applicant), in view of Rebien et al. (US 2021/0127071 A1).
As to claim 1, Kim discloses a method for detecting objects in sensor images, the method comprising:
determining, from a first image captured by a first view sensor (Fig. 8, [0009] “receiving a plurality of source videos captured in a plurality of areas, wherein the plurality of areas includes areas in which some of the plurality of CCTVs are installed”, [0050]), a position of a target object (Fig. 2 S130, S190 and S230, optionally Fig. 8 “Location at which target object was seen for the first time” or alternatively that predicted/search candidate area of S190 & S230, [0097], [0149] “Additionally, the server 10 may calculate movement information of the object of interest from the tube of the object of interest (S190). Here, the movement information includes speed information and direction information of the object of interest being tracked”, [0160] “When the movement information of the target object is calculated, the server 10 determines at least one search candidate area based on the location information (for example, the local information, the installation coordinates, etc.) of the CCTV 30 of the system 1 in which the CCTV 30 is installed and the movement information of the target object. To this end, the system 1 is further configured to pre-store the location information of the CCTV 30 in the memory of the server 10 or a separate database”, [0163] “Referring to FIG. 8, fifteen CCTVs 30 are installed in four areas. When the target object is detected, the tube of the target object is generated by the server 10, and the movement information of the target object is acquired. Then, the server 10 determines the first to third areas of FIG. 8 as the search candidate area using the movement speed”, [0166], etc.; as an interpretation note the recited position includes any of an absolute or relative position, and even a predicted/future position, under a plain meaning reading – stated differently the determined position need not exist within the first image, or within a physical space represented thereby, even if Kim also discloses a within-image position that is e.g. bounding boxes (Fig. 5)/image patches associated with target objects, Fig. 7/S130&S150, and both an initial and predicted position are disclosed/considered in Kim);
querying a feature vector database to determine a unique identifier associated with the target object (Fig. 2 S170, [0016], [0045], [0054-0055] database/stored object of interest, [0126-0127], [0143], [0148] “the server 10 acquires a plurality of tubes ID 1, ID 2, ID 3 of an object of interest related to a specific object A. That is, before unification, the tubes for the same object are identified as different ones. Through the unification operation, the tubes ID 1, ID 2, ID 3 are unified into the specific object A. As a result of unification, the identifiers of the tubes may be matched to have the same ID 1 as shown in the lower part of FIG. 7”);
receiving a second image captured from a second view sensor ([0050], Fig. 3 S230 & S250, receive CAM#7-10 frames, as distinguished from ‘first’ corresponding to CAM #1-4 and/or CAM#5-6, [0167] “by capturing at least part of the determined search candidate area (S250)”);
locating the target object in the second image using the position determined from the first image (Fig. 2 S250 on the basis of S230 (as well as an initial detection of S130), [0055] “Then, when the server 10 receives the image query, the server 10 re-identifies if the target object is seen in part of the image including the detected (or stored) object of interest, thereby performing efficient search. The operation of the server 10 will be described in more detail with reference to FIGS. 2 to 10 below”, [0155], [0161] “the server 10 is configured to determine, using the movement speed, at least one search candidate area as a first area in which the target object is likely to be captured when the movement speed decreases, a second area in which the target object is likely to be captured when the movement speed does not change, and a third area in which the target object is likely to be captured when the movement speed increases (S230)”);
extracting, based on the position determined from the first image, discriminative features of the target object from the second image into a second image feature vector ([0026], Fig. 10, wherein the input to S231 comprises those frames pertaining to the candidate search areas and are therefore ‘based on’ the position(s) identified above, [0172] “the re-identification model is a machine learning model having a CNN based network structure. In an embodiment, the re-identification model is configured to: when an image (for example, the representative frame) for matching is inputted, calculate skeleton information and body-part information by detecting body parts in the human image seen in the representative frame (for example, the target patch) of the target tube or the representative frame (for example, the patch of the object of interest) of the tube of the object of interest (S231); extract global features from reconstruction information (for example, a mask reconstructed image) of the skeleton and the body-part in combination (S233); extract features per part from each of the skeleton information and the body-part information (S235), and output feature result information of the tube by concatenating the global features and the features per part (S237)”); and
associating the second image feature vector of the target object from the second image with the unique identifier from the feature vector database (Fig. 2 S270, wherein a query match is returned in association with the identifier associated with the searched/unified tube, [0054] “To this end, the system 1 may further include a database (not shown) to store the detection result of the object of interest in the source video”, [0174] “When the target object is re-identified in the tube of the object of interest related to the search candidate area, the server 10 provides the user with the tube of the object of interest re-identified as the target object as a query result. The provided tube may be the whole or part of the unified tube”, [0177]).
While Kim discloses the use of feature vectors in ultimately determining one or more object/target identifiers e.g. ID1-3, in addition to that database disclosure (e.g. [0054-0055] “server 10 detects and stores an object of interest that is highly likely to be included in the target object in response to the user's request for search in the source video. An image including the stored object of interest includes a tube of the object of interest, and may be pre-stored in the system 1”) storing tubes/frames, and implicitly memory storing, even if temporarily, feature vectors associated therewith, Kim fails to disclose the database storing feature vectors directly. Stated differently, Kim is understood to directly store tubes/frames, and derive associated feature vectors therefrom ([0125], [0126], [0132], [0172], Fig. 10).
Rebien however evidences the obvious nature of querying a feature vector database to determine a unique identifier associated with the target object (Fig. 1 database 191, [0039] “In some examples, the camera module 198 is able to detect humans and extract images of humans with respective bounding boxes outlining the human objects (for example, human full body, human face, etc.) for inclusion in metadata which along with the associated surveillance video may transmitted to the server system 108. At the system 108, the media server module 168 can process extracted images and generate signatures (e.g. feature vectors) to represent objects. In computer vision, a feature descriptor is generally known as an algorithm that takes an image and outputs feature descriptions or feature vectors. Feature descriptors encode information, i.e. an image, into a series of numbers to act as a numerical "fingerprint" that can be used to differentiate one feature from another. Ideally this information is invariant under image transformation so that the features may be found again in another image of the same object”, [0040] “a feature vector is an n-dimensional vector of numerical features (numbers) that represent an image of an object processable by computers. By comparing the feature vector of a first image of one object with the feature vector of a second image, a computer implementable process may determine whether the first image and the second image are images of the same object”, [0042] “storage of feature vectors within the surveillance system 100 is contemplated. For instance, feature vectors may be indexed and stored in the database 191 with respective video. The feature vectors may also be associated with reference coordinates to where extracted images of respective objects are located in respective video. Storing may include storing surveillance video with, for example, time stamps, camera identifications, metadata with the feature vectors and reference coordinates, etc.”, [0052] “The server system 108 generates signatures based on the faces (when identified) and bodies of the people who are identified, as described above. The server system 108 stores information on whether faces were identified and the signatures as metadata together with the surveillance video recordings”, [0053], [0054], [0058], etc.,). Rebien further evidences the obvious nature of storing/associating extracted feature vectors in addition to e.g. video, reference coordinates, etc., in association with identified persons broadly (i.e. regardless of whether the recognition/identification is a first, or second/re-identification). Stated differently, motivation for storing feature vectors associated with an initial/enrollment phase, similarly applies to storing additional feature vectors and/or updating those stored vectors for second/subsequent identifications. PHOSITA would be aware of the manner in which such a continuous enrollment/feature vector updating/associating ensures stored information for comparison is the most up-to-date and/or accounts for historical information, thereby enabling more accurate re-identification. PHOSITA would further recognize that storing feature vectors directly allows for continuous modification to the means/methods used in deriving vectors for comparison, in addition to reducing/eliminating any redundant feature vector extraction from stored frames/images that have previously been processed.
It would have been obvious to a person of ordinary skill in the art, before the effective filing date, to modify the system and method of Kim such that in addition to stored tubes, associated feature vectors are stored directly in a feature vector database, and to further comprise equivalent ‘querying’ and ‘associating’ steps as taught/ suggested by Rebien, the motivation as identified above that such a storing ensures up-to-date information for similarity comparison and identifier retrieval and may reduce otherwise redundant feature vector extraction/determination processing.
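The following sketch (Python) is offered for illustration only and is not asserted to be Rebien's implementation: a small in-memory "feature vector database" keyed by unique identifier is queried by nearest stored vector, and the newly extracted vector is then associated with the returned identifier, reflecting the continuous-enrollment rationale noted above. The gallery contents, distance metric, and function names are hypothetical.

```python
import math

# Hypothetical gallery: unique identifier -> list of stored reference feature vectors.
gallery = {
    "ID-1": [[0.9, 0.1, 0.3]],
    "ID-2": [[0.2, 0.8, 0.5]],
}

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def query(vec):
    """Return the identifier whose closest stored vector is nearest to vec."""
    return min(gallery, key=lambda uid: min(euclidean(vec, v) for v in gallery[uid]))

def associate(uid, vec):
    """Store the newly extracted vector under the same unique identifier."""
    gallery[uid].append(vec)

second_image_vec = [0.85, 0.15, 0.35]
uid = query(second_image_vec)       # -> "ID-1"
associate(uid, second_image_vec)    # the gallery entry for ID-1 now holds two vectors
print(uid, len(gallery[uid]))
```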
As to claim 2, Kim in view of Rebien teaches/suggests the method of claim 1.
Kim in view of Rebien further teaches/suggests the method wherein the feature vector database also includes a first feature vector associated with the target object, and which further includes comparing the second image feature vector with the first feature vector from the feature vector database (Kim S130 & S150, S250, [0167], in further view of Rebien [0040], [0041] “Similarity calculation can be just an extension of the above. Specifically, by calculating the Euclidean distance between two feature vectors of two images captured by one or more of the cameras 169, a computer implementable process can determine a similarity score to indicate how similar the two images may be”).
As to claim 9, this claim is the non-transitory CRM claim corresponding to the method of claim 1 and is rejected accordingly.
As to claim 10, this claim is the non-transitory CRM claim corresponding to the method of claim 2 and is rejected accordingly.
As to claim 14, this claim is the non-transitory CRM claim comparable to the method of claim 2, but for a ‘third’ instance similar to that ‘second’ recited for the case(s) of claim(s) 1-2 above. Kim discloses a plurality of such images, detections/re-identification(s), and at the minimum suggests Fig. 2 as a whole as implemented in a re-iterated manner over the course of a long enough timeline monitoring an object of interest. That modification/motivation as presented in the rejection of claim 1 is similarly applicable to any number of repeated executions/additionally performed ‘associating’ steps for that same reason previously identified (and comparable to those various vantage points all reflected in a single unified tube for the case of Kim), namely that storing additional feature vectors based on re-identification serves to build a more complete data set for future detections/re-identifications robust to changes in factors such as view angle, illumination, object/person clothing/behavior, etc.
As to claim 16, this claim is the system claim corresponding to the method of claim 1 and is rejected accordingly.
2. Claims 3-4 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US 2021/0225013 A1) in view of Rebien et al. (US 2021/0127071 A1) and Schumann et al. “Person re-identification across aerial and ground-based cameras by deep feature fusion”.
As to claim 3, Kim in view of Rebien teaches/suggests the method of claim 2.
Kim in view of Rebien further teaches/suggests the method wherein if a comparison threshold is not satisfied as a result of the comparing the second image feature vector with the first feature vector from the feature vector database (Rebien [0058] “However, in certain other embodiments, the application 144 may use a non-zero match likelihood threshold that is other than 25%, or may display search results 406 in a manner not specifically based on a match likelihood threshold”, [0060] “In certain other embodiments, the application 144 may be configured to display additional results 406 in response to the user's selecting the button 424 even if those additional results 406 are below the match likelihood threshold”, see also [0038]).
Kim fails to explicitly disclose that saving as required by claim 3. Kim however does suggest the manner in which differing views may still be unified/associated with a same unique identifier (see Fig. 7).
Schumann evidences the obvious nature of saving distinct feature vectors/views in association with a shared/common identity (feature fusion in view of Fig. 1(b) “The aerial boxes show a much greater variety of aspect ratios, which lead to distortions when the image is scaled to a uniform size prior to re-identification. Furthermore, the extreme angles can result in very different positions of body parts within the aerial images, e.g. head in a top-view is located `inside' the torso box, instead of above it”, page 3 Section 2 fusion disclosure, page 4 Section 3.3 “The general person re-identification pipeline consists of three stages: 1) feature computation, 2) feature comparison by some distance metric, and 3) ranking according to the computed distances. There are three conventional methods which can be employed when the information from two different features is to be fused”, page 5 sections 3.2-3.3, etc.).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date, to further modify the system and method of Kim in view of Rebien so as to include that saving/fusion of even distinct/distanced feature vectors/information in association with a same/common identifier/class/etc., as taught/suggested by Schumann, the motivation as similarly taught/suggested therein that such a saving may facilitate detection/re-identification even for the case of extreme angles and resultant self-occlusion.
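Purely as an illustration of the claim 3 scenario discussed above, and not asserted to be Schumann's implementation, the following sketch (Python) shows a second-view vector that fails a comparison threshold nonetheless being saved under the same shared identifier, enriching the reference set for later re-identification; the threshold value, vectors, and identifier are hypothetical.

```python
import math

# Hypothetical gallery keyed by a shared identifier; values illustrative only.
gallery = {"ID-7": [[0.9, 0.1]]}    # e.g. a ground-view feature vector
MATCH_THRESHOLD = 0.5               # hypothetical comparison threshold

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

aerial_vec = [0.1, 0.9]             # a very different (e.g. top-down) view of the same person

best = min(distance(aerial_vec, v) for v in gallery["ID-7"])
if best > MATCH_THRESHOLD:
    # The comparison threshold is not satisfied, yet the identity is otherwise confirmed
    # (e.g. via unified tubes), so the distinct view is saved under the same identifier.
    gallery["ID-7"].append(aerial_vec)

print(len(gallery["ID-7"]))  # 2
```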
As to claim 4, Kim in view of Rebien and Schumann teaches/suggests the method of claim 3.
Kim discloses the method further includ[ing] capturing the first image with the first view sensor (Kim Fig. 3, CAM#1 input at S110 as used for S130 and S150).
As to claim 11, this claim is the non-transitory CRM claim corresponding to the method of claim 3 and is rejected accordingly.
3. Claims 5-6 and 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US 2021/0225013 A1) in view of Rebien et al. (US 2021/0127071 A1), Schumann et al. “Person re-identification across aerial and ground-based cameras by deep feature fusion”, and Kalirajan et al. (US 2022/0351519 A1).
As to claim 5, Kim in view of Rebien and Schumann teaches/suggests the method of claim 4.
Kim discloses the method further includ[ing] capturing the second image with the second view sensor (Kim Fig. 3), and which further includes determining a platform position (Kim ‘detailed information’ comprising “location information of the CCTV 30”, [0052], [0152], [0154]).
While Kim discloses camera/sensor location information reading on a platform position, Kim fails to explicitly disclose any camera/platform orientation.
Kalirajan evidences the obvious nature of determining a platform orientation for one or more view sensors (Kalirajan Fig. 15 platform orientation 218d Yaw, Pitch, Roll in further view of platform position 218a latitude and longitude, [0060] “This enables some distance to be traveled between images while still allowing for sufficient overlap between successive images. At block 218, a predicted position P2 is determined. This may include as inputs the location (such as latitude and longitude) of the drone at time t1, as indicated at 218a; the speed of movement as indicated at 218b, the drone's altitude as indicated at 218c and the yaw, pitch and roll of the drone, as indicated at 218d. A second image is captured at the predicted position P2 as indicated at block 220”).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date, to further modify the system and method of Kim in view of Rebien such that those moveable CCTV embodiment(s) disclosed therein further comprise e.g. a drone/UAV and a platform orientation determination so as to assist drone navigation along one or more flight paths as taught/suggested by Kalirajan, the motivation as similarly taught/suggested therein and readily recognized by PHOSITA that such a surveillance sensor embodiment and associated pose/orientation determination may serve to allow for sensor coverage in areas requiring increased and/or dynamic sensor mobility.
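For illustration only, and not asserted to be Kalirajan's algorithm, the following sketch (Python) dead-reckons a predicted position P2 from a subset of the inputs Kalirajan enumerates at [0060] (latitude/longitude, speed, and yaw), using a flat-earth approximation; all numerical values are hypothetical.

```python
import math

# Hypothetical drone state at time t1 (flat-earth approximation; values illustrative only).
lat, lon = 38.8951, -77.0364     # position, degrees
speed    = 10.0                  # ground speed, m/s
yaw      = 90.0                  # heading, degrees clockwise from north
dt       = 5.0                   # seconds until the second image is captured

north = speed * dt * math.cos(math.radians(yaw))   # metres moved north
east  = speed * dt * math.sin(math.radians(yaw))   # metres moved east

# Approximate metres-per-degree conversion near the given latitude.
lat2 = lat + north / 111_320.0
lon2 = lon + east / (111_320.0 * math.cos(math.radians(lat)))
print(round(lat2, 6), round(lon2, 6))              # predicted position P2
```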
As to claim 6, Kim in view of Rebien, Schumann and Kalirajan teaches/suggests the method of claim 5.
Kim discloses the method further include[ing] associating the second image feature vector with the platform position and platform orientation (Kim ‘detailed information’ associated with CCTV 30 as identified for claim 5 above stored in association with source video frames, tubes, query results, etc., [0052] “the CCTV 30 is further configured to generate an image or a video and generate detailed information associated with the generated image or video (for example, the source video). The detailed information includes identification information (for example, an identifier) of the CCTV 30 having captured the source video, location information of the CCTV 30, a timecode of the source video and/or identification information of the source video. For example, when the CCTV 30 captures a situation in the coverage to generate a source video, a plurality of frames of the source video, and detailed information including an identifier of the corresponding CCTV 30, a frame identifier and a timecode are acquired”, [0152] “server 10 may further generate detailed information related to the corresponding tube, and associate the detailed information to the corresponding tube”, [0154]; see also that storing disclosure of Rebien (e.g. [0042]) identified above for the case of claim 1).
As to claims 12-13, these claims are the non-transitory CRM claims corresponding to method claims 5-6 respectively, and are rejected accordingly.
4. Claims 7, 15 and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US 2021/0225013 A1) in view of Rebien et al. (US 2021/0127071 A1) and Kalirajan et al. (US 2022/0351519 A1).
As to claim 7, Kim as modified by Rebien teaches/suggests the method of claim 1.
Kim fails to explicitly disclose the method wherein the position of the target object is determined from the first image on the basis of a mapping between the first image and a fiducial common between the first view sensor and the second view sensor. Kim as applied would permit such a mapping, particularly for the case that the so-called first and second view sensors comprised overlapping/co-extensive fields of view ([0061] “Then, for example, when an object moving in the corresponding area passes through the view range shared by the plurality of CCTVs 30, a plurality of source videos in which the object is seen is acquired. That is, the object is captured by the plurality of CCTVs 30 in overlapping views”). Official Notice is taken, however, of the manner in which a shared/common fiducial/landmark/GCP is commonly used to derive image correspondences (particularly for stereo vision/overlapping views using epipolar geometry) and object positions based thereon (see also Applicant’s Specification at [0019] with reference to ‘straightforward determination.. using trigonometry’).
Kalirajan further evidences the obvious nature of determining an inter-image registration/mapping on the basis of a fiducial/landmark common between two or more sensors/views ([0008] “The controller is configured to analyze each of the video streams in order to find a first common landmark in both a first video frame from a first video stream and a second video frame from a second video stream”, etc.,).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date, to further modify the system and method of Kim in view of Rebien so as to rely at least in part on a common/shared landmark/fiducial when determining an object position for overlapping views as taught/suggested by Kalirajan and routinely performed in the art, the motivation as similarly taught/suggested therein that such a fiducial/landmark serves as a readily detected/salient point that is itself less prone to errors in determined position, thereby minimizing the propagation of such error(s) otherwise for additional position determinations based thereon.
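By way of illustration only, and not asserted to be Kalirajan's method or Applicant's disclosed trigonometry, the following two-dimensional sketch (Python) shows how a fiducial common to two views can anchor each sensor's in-image angular measurement to a shared frame, after which basic trigonometry (ray intersection) recovers the target position; the sensor positions, fiducial, and target are hypothetical.

```python
import math

# Hypothetical 2-D layout; the target's ground truth is used only to synthesize measurements.
cam1, cam2 = (0.0, 0.0), (10.0, 0.0)
fiducial   = (5.0, 5.0)     # landmark/GCP visible to both sensors, at a known position
target     = (6.0, 8.0)

def bearing(src, dst):
    return math.atan2(dst[1] - src[1], dst[0] - src[0])

# In-image measurement: the angular offset of the target from the shared fiducial.
offset1 = bearing(cam1, target) - bearing(cam1, fiducial)
offset2 = bearing(cam2, target) - bearing(cam2, fiducial)

# The known bearing to the common fiducial anchors each offset to the world frame.
b1 = bearing(cam1, fiducial) + offset1
b2 = bearing(cam2, fiducial) + offset2

# Intersect the two rays (basic trigonometry) to recover the target position.
dx, dy = cam2[0] - cam1[0], cam2[1] - cam1[1]
t = (dx * math.sin(b2) - dy * math.cos(b2)) / \
    (math.cos(b1) * math.sin(b2) - math.sin(b1) * math.cos(b2))
x, y = cam1[0] + t * math.cos(b1), cam1[1] + t * math.sin(b1)
print(round(x, 3), round(y, 3))   # (6.0, 8.0)
```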
As to claim 17, Kim as modified by Rebien teaches/suggests the system of claim 16.
Kim suggests the system further includes transmitting the position of the target object to a moving vehicle having the second view sensor (Kim Fig. 1, Fig. S230 in view of [0051] “The plurality of CCTVs 30 is connected to the server 10 via a wired/wireless network to generate a plurality of source videos and provide the source videos to the server 10. The CCTV 30 is not limited to an analog CCTV, a digital CCTV, a fixed CCTV and a moveable CCTV, and may include a variety of types of imaging devices capable of acquiring image information in real time. For example, a smartphone, a black box, etc. may acts as the CCTV 30 of the present disclosure”). While Kim discloses acquiring imagery from e.g. a second view sensor (that may optionally be ‘moveable’) located within one or more of the candidate search regions of S230, and each of the CCTVs 30 is in communication with server 10, Kim fails to disclose transmitting candidate locations to e.g. a drone/UAV embodiment of one or more of the CCTVs so as to reposition such a sensor within the candidate area(s).
Kalirajan however evidences the obvious nature of transmitting the position of the target object to a moving vehicle having the second view sensor (Fig. 11 142-144, Fig. 12 156, 152 Receive instructions to fly to a particular location at which an incident is believed to be occurring, [0056] “FIG. 12 is a flow diagram showing an illustrative method 150 that the controller 136 of the drone 128 may be configured to carry out. Instructions are received to fly to a particular location at which an incident is believed to be occurring, as indicated at block 152. A first video of the incident is captured using the video camera, as indicated at block 154. The controller 136 is configured to fly the drone to a second location away from the particular location to follow the incident, as indicated at block 156”, etc.,).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date, to further modify the system and method of Kim in view of Rebien such that those moveable CCTV embodiment(s) disclosed therein further comprise e.g. a drone/UAV and a transmission/receipt of position information/flight path to the CCTVs 30 of Kim to facilitate a dynamic capture of second view sensor imagery as taught/suggested by Kalirajan, the motivation as similarly taught/suggested therein and readily recognized by PHOSITA that such a surveillance sensor embodiment and associated transmission may serve to allow for sensor coverage in e.g. sparsely populated areas wherein otherwise fixed infrastructure is impractical.
As to claim 18, Kim in view of Rebien and Kalirajan teaches/suggests the system of claim 17.
Kim in view of Rebien and Kalirajan teaches/suggests the system further including maneuvering the moving vehicle such that the target object is within a field of view of the second view sensor (Kalirajan Fig. 11 142-144, Fig. 12 156, 152 Receive instructions to fly to a particular location at which an incident is believed to be occurring, [0056], etc., in view of that modification/motivation as presented above for the case of claim 17 and similarly applicable to claim 18).
As to claim 19, Kim in view of Rebien and Kalirajan teaches/suggests the system of claim 18.
Kim in view of Rebien and Kalirajan teaches/suggests the system further including determining a platform position and platform orientation of the second view sensor that corresponds with the second image (Kalirajan Fig. 15 platform orientation 218d Yaw, Pitch, Roll in further view of platform position 218a latitude and longitude, [0060] “This enables some distance to be traveled between images while still allowing for sufficient overlap between successive images. At block 218, a predicted position P2 is determined. This may include as inputs the location (such as latitude and longitude) of the drone at time t1, as indicated at 218a; the speed of movement as indicated at 218b, the drone's altitude as indicated at 218c and the yaw, pitch and roll of the drone, as indicated at 218d. A second image is captured at the predicted position P2 as indicated at block 220”; see also that modification/motivation as presented above for the case of claim 5).
As to claim 20, Kim in view of Rebien and Kalirajan teaches/suggests the system of claim 19.
Kim in view of Rebien and Kalirajan teaches/suggests the system further including associating the second image feature vector with the platform position and platform orientation in the feature vector database (Kim ‘detailed information’ associated with CCTV 30 as identified for claim 5 above stored in association with source video frames, tubes, query results, etc., [0052] “the CCTV 30 is further configured to generate an image or a video and generate detailed information associated with the generated image or video (for example, the source video). The detailed information includes identification information (for example, an identifier) of the CCTV 30 having captured the source video, location information of the CCTV 30, a timecode of the source video and/or identification information of the source video. For example, when the CCTV 30 captures a situation in the coverage to generate a source video, a plurality of frames of the source video, and detailed information including an identifier of the corresponding CCTV 30, a frame identifier and a timecode are acquired”, [0152] “server 10 may further generate detailed information related to the corresponding tube, and associate the detailed information to the corresponding tube”, [0154]; see also that storing disclosure of Rebien (e.g. [0042]) identified above for the case of claim 1).
As to claim 15, this claim is the non-transitory CRM claim corresponding to the system of claim 17 and is rejected accordingly.
5. Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US 2021/0225013 A1) in view of Rebien et al. (US 2021/0127071 A1), Kalirajan et al. (US 2022/0351519 A1) and Johnston “Ground object geo-location using UAV video camera”.
As to claim 8, Kim as modified by Rebien teaches/suggests the method of claim 1.
Kim fails to explicitly disclose the method wherein the position of the target object is expressed in geodetic coordinates (Kim Fig. 8 comprising e.g. candidate areas that comprise portions of e.g. one or more streets/intersections, but falling silent with respect to any geodetic coordinates).
Kalirajan however evidences the obvious nature of expressing position information generally in terms of geodetic coordinates (Fig. 13 178 lat, lon of cameras, readily extended to and/or serving to reflect a corresponding position of the object for the case that the object is proximate the camera/featured in an associated FoV).
Johnston further evidences the obvious nature of object localization in terms of geodetic coordinates (Abs “This paper will present an approach for object localization that utilizes basic rotation matrices and camera system geometry matrices to calculate the WGS84 Latitude, Longitude and Elevation of a cued ground object that appears within the field of view of a camera system. The calculations for obtaining the object localization are relative to the GPS position and attitude of an Uninhabited Aerial Vehicle (UAV)”).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date, to further modify the system and method of Kim in view of Rebien such that one or more candidate areas/target positions are expressed in terms of geodetic coordinates as taught/suggested by Kalirajan and/or Johnston, the motivation as similarly taught/suggested therein and readily recognized by PHOSITA, that such a position format is conventionally used enabling efficient integration with related systems/processes, in addition to the manner in which such an expression may be more conceptually/contextually appropriate for certain object re-identification applications (e.g. for a vehicle of interest (Kalirajan [0061]) across a larger area/array of cameras).
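For illustration only, and not asserted to be Johnston's WGS84 solution (which employs rotation matrices and the full camera geometry), the following simplified flat-earth sketch (Python) expresses a ground object's position in geodetic coordinates from a hypothetical UAV position, altitude, and attitude; all values are invented for the example.

```python
import math

# Hypothetical UAV state; flat-earth simplification, values illustrative only.
uav_lat, uav_lon, altitude = 38.90, -77.03, 120.0   # degrees, degrees, metres above ground
yaw, pitch = 45.0, -30.0                            # degrees; negative pitch looks downward

# Ground range to the point where the camera boresight intersects the (flat) ground.
ground_range = altitude / math.tan(math.radians(-pitch))

north = ground_range * math.cos(math.radians(yaw))
east  = ground_range * math.sin(math.radians(yaw))

# Approximate metres-per-degree conversion near the UAV's latitude.
obj_lat = uav_lat + north / 111_320.0
obj_lon = uav_lon + east / (111_320.0 * math.cos(math.radians(uav_lat)))
print(round(obj_lat, 6), round(obj_lon, 6))         # geodetic coordinates of the ground object
```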
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Inquiry
Any inquiry concerning this communication or earlier communications from the examiner should be directed to IAN L LEMIEUX whose telephone number is (571)270-5796. The examiner can normally be reached Mon - Fri 9:00 - 6:00 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chan Park can be reached on 571-272-7409. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/IAN L LEMIEUX/Primary Examiner, Art Unit 2669