Prosecution Insights
Last updated: April 19, 2026
Application No. 17/868,157

Computer Vision Systems and Methods for Blind Localization of Image Forgery

Final Rejection §103
Filed: Jul 19, 2022
Examiner: HAUSMANN, MICHELLE M
Art Unit: 2671
Tech Center: 2600 — Communications
Assignee: The Regents of the University of Colorado
OA Round: 4 (Final)
Grant Probability: 76% (Favorable)
Expected OA Rounds: 5-6
Time to Grant: 3y 1m
Grant Probability with Interview: 98%

Examiner Intelligence

Career Allow Rate: 76% (above average; 658 granted / 863 resolved; +14.2% vs TC avg)
Interview Lift: +21.6% (strong), measured across resolved cases with interview
Typical Timeline: 3y 1m average prosecution; 23 applications currently pending
Career History: 886 total applications across all art units

Statute-Specific Performance

§101: 14.6% (-25.4% vs TC avg)
§102: 5.7% (-34.3% vs TC avg)
§103: 61.2% (+21.2% vs TC avg)
§112: 10.1% (-29.9% vs TC avg)

Tech Center averages are estimates. Based on career data from 863 resolved cases.

Office Action

§103
DETAILED ACTION

Response to Amendment

Claims 1-19 are pending. Claims 1-19 are amended directly or by dependency on an amended claim.

Response to Arguments

Applicant's arguments filed January 5, 2026 have been fully considered but they are not persuasive. Applicant argues on pages 7-8 that the cited references do not teach the concept of "blind splice detection". Zhou et al., Cozzolino et al., and Mayer et al. all disclose or teach this concept.

Zhou et al. disclose: the Columbia dataset focuses on splicing based on uncompressed images, p1058; qualitative results for multi-class image manipulation detection on the NIST16 dataset, where RGB and noise map provide different information for splicing, copy-move and removal, p1059; and classes for manipulation classification of splicing, removal and copy-move so as to learn distinct visual tampering artifacts and noise features for each class, p1060.

Cozzolino et al. teach: "As an example, Fig. 1 shows two images subject to a splicing attack, which can be easily detected by visual inspection of their noiseprints", p2; "In Splicebuster [28] the expectation-maximization algorithm is used to this end", p3; a dataset focus on splicing, p6; and generative adversarial networks, p11.

Mayer et al. teach: "We show that our proposed system accurately determines if two patches were captured by the same or different camera models, even when the camera models are unknown to the investigator. We also demonstrate the utility of this approach for image splicing detection and localization", abstract; and "An investigator may also be concerned with whether an image is a spliced forgery. In this experiment, we demonstrate the promise of our proposed approach for detecting spliced images that are a composite of content from two different camera models. To do this, we took a spliced image and split it into 256 × 256 patches with 50% overlap. We selected one patch as a reference, computed its comparison score to every other patch in the image, and then highlighted all of the patches that were detected as captured by a different source. An example of this analysis is shown in Fig. 3 on an image downloaded from an online forum. The original and spliced images are shown in (a) and (b). In (c) we selected a patch in the original part of the image as reference, and our method highlighted the foreign portion of the image as having a different source. In (d) we selected a patch in the foreign part of the image as reference, and our method highlighted the original parts of the image as having a different source. The results of this experiment demonstrate the promise of our proposed approach at detecting and localizing spliced images", part 4.2.

The term "blind" is not explicitly used; however, as the source camera is not known, these are all "blind" splice detections.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 6-8, 13, 14 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou et al. (IDS: "Learning Rich Features for Image Manipulation Detection", 2018) in view of Bayar et al. (IDS: "Constrained Convolutional Neural Networks: A New Approach Towards General Purpose Image Manipulation Detection") in view of Cozzolino et al. (IDS: "Noiseprint: a CNN-based camera model fingerprint", 2018) in view of Mayer et al. ("Learned Forensic Source Similarity for Unknown Camera Models", 2018).

Regarding claims 1, 8 and 14, Zhou et al. disclose a computer vision system for localizing image forgery (manipulation detection, abstract) comprising: a memory and a processor in communication with the memory; a method for localizing image forgery by a computer vision system; and a non-transitory computer readable medium having instructions stored thereon for localizing image forgery by a computer vision system which, when executed by a processor, cause the processor to carry out the steps of: generating a convolution using a plurality of learned rich filters (two-stream Faster R-CNN network; finds tampering artifacts like strong contrast difference and unnatural tampered boundaries; leverages the noise features extracted from a steganalysis rich model filter layer to discover the noise inconsistency between authentic and tampered regions, abstract; Faster R-CNN to learn rich features for image manipulation detection, p1055; the unnaturally high contrast along the baseball player's edges provides a strong cue about the presence of tampering, p1056); training a neural network with the constrained convolution and a plurality of images of a dataset to perform blind splice detection utilizing a low-level representation indicative of a statistical signature of at least one source camera model for each image among the plurality of images (adopt Faster R-CNN [28] within a two-stream network and perform end-to-end training; "Based on recent work on steganalysis rich model (SRM) for manipulation classification [35, 15], we select SRM filter kernels to produce the noise features and use them as the input channel to the second Faster R-CNN network", p1054, Fig. 2; learn rich features for image manipulation detection, p1055; "We directly use the noise features as the input to the noise stream network. The backbone convolutional network architecture of the noise stream is the same as the RGB stream", p1056; proposed network is trained end-to-end, pre-trained on a synthetic dataset, p1057; the Columbia dataset focuses on splicing based on uncompressed images, p1058;
qualitative results for multi-class image manipulation detection on the NIST16 dataset, where RGB and noise map provide different information for splicing, copy-move and removal, p1059; classes for manipulation classification of splicing, removal and copy-move so as to learn distinct visual tampering artifacts and noise features for each class, p1060); and localizing an attribute of an image of the dataset by the trained neural network (localize the tampered regions, p1053; the second Faster R-CNN stream analyzes the local noise features in an image, p1054; "Cozzolino et al. [7] explore and demonstrate the performance of SRM features in distinguishing tampered and authentic regions. They also combine SRM features by including the quantization and truncation operations with a Convolutional Neural Network (CNN) to perform manipulation localization... We fuse the two streams through bilinear pooling before a fully connected layer for manipulation classification. The RPN uses the RGB stream to localize tampered regions", p1055; "Starting from 30 basic filters, along with nonlinear operations like maximum and minimum of the nearby outputs after filtering, SRM features gather the basic noise features. SRM quantifies and truncates the output of these filters and extracts the nearby co-occurrence information as the final features. The feature obtained from this process can be regarded as a local noise descriptor", p1056; detection results boxes show localization on images, Fig. 5, Fig. 6).
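[Editor's note] The SRM noise stream the rejection maps to the "learned rich filters" element can be illustrated with a fixed high-pass kernel. A minimal sketch in PyTorch; the 5×5 kernel below is the commonly cited SRM "square" filter (an assumption for illustration, not text from the record):

```python
# Illustrative only: extract an SRM-style noise residual with a fixed
# high-pass kernel, as used for the noise stream described in Zhou et al.
import torch
import torch.nn.functional as F

# Commonly cited 5x5 SRM kernel (normalized by 12); assumed here,
# not reproduced from the Office Action or the cited papers.
SRM_SQUARE = torch.tensor([
    [-1.,  2.,  -2.,  2., -1.],
    [ 2., -6.,   8., -6.,  2.],
    [-2.,  8., -12.,  8., -2.],
    [ 2., -6.,   8., -6.,  2.],
    [-1.,  2.,  -2.,  2., -1.],
]) / 12.0

def srm_residual(gray: torch.Tensor) -> torch.Tensor:
    """gray: (N, 1, H, W) batch in [0, 1] -> high-pass noise residual."""
    kernel = SRM_SQUARE.view(1, 1, 5, 5)   # (out_ch, in_ch, kH, kW)
    return F.conv2d(gray, kernel, padding=2)
```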
Zhou et al. disclose a convolution (uses a CNN) but do not disclose a constrained convolution in particular, or that it is "without utilizing knowledge of the at least one source camera model". Zhou et al. disclose a statistical signature of at least one source model, but do not disclose that it is a camera model.

Bayar et al. teach generating a constrained convolution (new type of CNN layer, called a constrained convolutional layer, abstract; "Constrained Convolutional Neural Network"; propose a new type of convolutional layer, called a constrained convolutional layer, that is designed to be used in forensic tasks, part III A); training a neural network with the constrained convolution and a plurality of images of a dataset to learn a low-level representation indicative of a statistical signature of at least one source camera model for each image among the plurality of images (adaptively learn manipulation detection features, abstract; prediction errors are then used as low-level forensic features; the feature maps it produces correspond to prediction error fields that are used as low-level forensic traces; provide the CNN with low-level forensic features; learning image manipulation fingerprints, part III A; "This may allow us to learn better feature extractors than the human-designed feature extractors incorporated into the rich models. Training time is an important factor when devising a data-driven manipulation detection approach. Our CNN-based approach took approximately six hours to train on this database", part V E); localizing an attribute of an image of the dataset by the trained neural network ("Our approach is able to use data to directly learn the changes introduced by image tampering operations into local pixel relationships", part III A); and detecting an image fake or forgery (accurately detect image manipulations in realistic scenarios where there is a source camera model mismatch between the training and testing data, abstract; adaptively learn image manipulation features and accurately identify the type of editing that an image has undergone, part I; learning image manipulation fingerprints, part III A; "Our ET-based CNN approach can achieve at least 99.55% detection rate with all types of manipulations", part V A).

Zhou et al. and Bayar et al. are in the same art of detecting fake images (Zhou et al., abstract; Bayar et al., abstract). The combination of Bayar et al. with Zhou et al. enables using a constrained convolution. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the constrained convolution of Bayar et al. with the tampered image detection of Zhou et al., as this was known at the time of filing, the combination would have predictable results, and as Bayar et al. indicate: "Our experimental results demonstrate that our CNN can detect multiple different editing operations with up to 99.97% accuracy and outperform the existing state-of-the-art general purpose manipulation detector. Furthermore, our constrained CNN can still accurately detect image manipulations in realistic scenarios where there is a source camera model mismatch between the training and testing data" (abstract), indicating a performance improvement over a simple rich feature model. To the extent Zhou et al. and Bayar et al. do not explicitly disclose a statistical signature of at least one source camera model, another reference is provided to make this explicit.

Zhou et al. and Bayar et al. do not explicitly disclose that the constrained convolution is "without utilizing knowledge of the at least one source camera model".
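[Editor's note] Bayar et al.'s constrained convolutional layer is concrete enough to sketch. A minimal PyTorch version, assuming the constraint as the paper describes it (center weight fixed to -1, remaining weights summing to 1, re-imposed after each optimizer step):

```python
# Minimal sketch of a Bayar/Stamm-style constrained convolutional layer:
# each filter is forced to be a prediction-error filter, so its output is
# a low-level forensic residual rather than image content.
import torch
import torch.nn as nn

class ConstrainedConv2d(nn.Conv2d):
    def constrain(self):
        """Re-impose the constraint (call after every optimizer step):
        center weight = -1, all other weights sum to 1."""
        with torch.no_grad():
            w = self.weight                       # (out, in, kH, kW)
            c = w.shape[-1] // 2                  # center index (odd kernel)
            w[:, :, c, c] = 0.0
            # Normalize neighbors to sum to 1 (epsilon guards a zero sum)
            w /= w.sum(dim=(2, 3), keepdim=True) + 1e-12
            w[:, :, c, c] = -1.0                  # subtract the center pixel

# Usage sketch: layer = ConstrainedConv2d(1, 3, kernel_size=5, padding=2);
# after loss.backward() and optimizer.step(), call layer.constrain().
```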
Cozzolino et al. teach training a neural network with the convolution and a plurality of images of a dataset to learn a low-level representation indicative of a statistical signature of at least one source camera model for each image among the plurality of images ("Forensic analyses of digital images rely heavily on the traces of in-camera and out-camera processes left on the acquired images. Such traces represent a sort of camera fingerprint... In this paper we propose a method to extract a camera model fingerprint, called noiseprint, where the scene content is largely suppressed and model-related artifacts are enhanced"; a large number of methods have been proposed for forgery detection and localization or camera identification [1], [2], [3]; some of them rely on semantic or physical inconsistencies [4], [5], but statistical methods, based on pixel-level analyses of the data, are by far the most successful and widespread; statistical methods can follow both a model-based and a data-driven approach, p1; "To any single image the network associates a noise residual, called noiseprint from now on, which shows clear traces of camera artifacts. Therefore, it can be regarded as a camera model fingerprint, much like the PRNU pattern represents a device fingerprint. It can also happen that image manipulations leave traces very evident in the noiseprint", p2; "methods proposed exploit rich-model features only to perform unsupervised anomaly detection"; "the product of our system will be an image-size noise residual, just like in PRNU-based methods, a noiseprint image that will bear traces of camera model artifacts, rather than of the individual device imperfections", p3; "the noiseprint is desired to contain mostly camera model artifacts"; "When the training process ends, the system is freezed. Consequently, to each input image a noiseprint is deterministically associated, which enhances the camera model artifacts with their model-dependent spatial distribution", p4; build a suitable probability distribution through softmax processing, p5; "The comparison with Splicebuster (average MCC=0.365, average ranking=2.7) is especially meaningful, since the two methods differ only in the input noise residual, obtained through high-pass filtering in Splicebuster and given by noiseprint here", p8); and localizing an attribute of an image of the dataset by the trained neural network ("Forgery localization based on noiseprints", p6). Cozzolino et al. further teach performing blind splice detection (as an example, Fig. 1 shows two images subject to a splicing attack, which can be easily detected by visual inspection of their noiseprints, p2; in Splicebuster [28] the expectation-maximization algorithm is used to this end, p3; dataset focus on splicing, p6; generative adversarial networks, p11).
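[Editor's note] Cozzolino et al.'s noiseprint is a residual produced by a trained CNN denoiser. As a stand-in for intuition only, the same "suppress scene content, keep noise-like traces" idea can be mimicked with an ordinary denoiser residual; the median filter below is a placeholder, not the noiseprint network:

```python
# Crude stand-in for a noiseprint-style residual: subtract a denoised
# estimate so scene content is suppressed and noise-like traces remain.
# Cozzolino et al. use a trained CNN; the median filter is illustrative.
import numpy as np
from scipy.ndimage import median_filter

def noise_residual(gray: np.ndarray) -> np.ndarray:
    """gray: 2-D float array in [0, 1] -> content-suppressed residual."""
    return gray - median_filter(gray, size=3)
```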
Zhou et al., Bayar et al., and Cozzolino et al. are in the same art of detecting fake images (Zhou et al., abstract; Bayar et al., abstract; Cozzolino et al., abstract). The combination of Cozzolino et al. with Zhou et al. and Bayar et al. enables using a camera model. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the model of Cozzolino et al. with the tampered image detection of Zhou et al. and Bayar et al., as this was known at the time of filing, the combination would have predictable results, and as Cozzolino et al. indicate: "Forensic analyses of digital images rely heavily on the traces of in-camera and out-camera processes left on the acquired images. Such traces represent a sort of camera fingerprint. If one is able to recover them, by suppressing the high-level scene content and other disturbances, a number of forensic tasks can be easily accomplished" (abstract) and "To any single image the network associates a noise residual, called noiseprint from now on, which shows clear traces of camera artifacts. Therefore, it can be regarded as a camera model fingerprint, much like the PRNU pattern represents a device fingerprint. It can also happen that image manipulations leave traces very evident in the noiseprint, such to allow easy localization even by direct inspection" (p2), indicating the benefit to Zhou et al. and Bayar et al. of using a camera model.

Zhou et al., Bayar et al., and Cozzolino et al. do not explicitly disclose that the constrained convolution is "without utilizing knowledge of the at least one source camera model". Mayer et al. teach a constrained convolution without utilizing knowledge of the at least one source camera model ("Information about an image's source camera model is important knowledge in many forensic investigations. In this paper we propose a system that compares two image patches to determine if they were captured by the same camera model. To do this, we first train a CNN based feature extractor to output generic, high level features which encode information about the source camera model of an image patch. Then, we learn a similarity measure that maps pairs of these features to a score indicating whether the two image patches were captured by the same or different camera models. We show that our proposed system accurately determines if two patches were captured by the same or different camera models, even when the camera models are unknown to the investigator", abstract; "Then, we train a similarity network that maps pairings of these features to a score that indicates whether the two image patches were captured by the same camera model or two different camera models. We experimentally show that our proposed approach effectively differentiates between camera models, even if the source camera models were not used to train the system", part 1; "To create our proposed system, we first train a convolutional neural network (CNN) to learn a feature extractor that produces generic, high-level features from image patches that are useful for camera model identification... The camera model sets A and B are disjoint to ensure that the similarity network learns to differentiate camera models that have not been learned by the feature extractor, including unknown camera models", part 3; train a convolutional neural network (CNN) to identify camera models from image patches [figure: media_image1.png], part 3.1). Mayer et al. further teach performing blind splice detection ("We show that our proposed system accurately determines if two patches were captured by the same or different camera models, even when the camera models are unknown to the investigator. We also demonstrate the utility of this approach for image splicing detection and localization", abstract; "An investigator may also be concerned with whether an image is a spliced forgery. In this experiment, we demonstrate the promise of our proposed approach for detecting spliced images that are a composite of content from two different camera models. To do this, we took a spliced image and split it into 256 × 256 patches with 50% overlap. We selected one patch as a reference, computed its comparison score to every other patch in the image, and then highlighted all of the patches that were detected as captured by a different source. An example of this analysis is shown in Fig. 3 on an image downloaded from an online forum. The original and spliced images are shown in (a) and (b). In (c) we selected a patch in the original part of the image as reference, and our method highlighted the foreign portion of the image as having a different source. In (d) we selected a patch in the foreign part of the image as reference, and our method highlighted the original parts of the image as having a different source. The results of this experiment demonstrate the promise of our proposed approach at detecting and localizing spliced images", part 4.2).
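[Editor's note] The patch-comparison experiment Mayer et al. describe (256 × 256 patches, 50% overlap, one reference patch scored against all others) reduces to a short loop. In this hedged sketch, `features` and `similarity` stand in for their trained feature extractor and similarity network; both callables and the threshold are assumptions:

```python
# Sketch of Mayer et al.'s splice-localization experiment: tile the image
# into overlapping patches, score each against a reference patch, and flag
# patches attributed to a different source camera model.
import numpy as np

def splice_map(img, features, similarity, ref_yx=(0, 0), patch=256, thresh=0.5):
    stride = patch // 2                      # 50% overlap
    h, w = img.shape[:2]
    ry, rx = ref_yx
    ref = features(img[ry:ry + patch, rx:rx + patch])
    mask = np.zeros((h, w))
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            score = similarity(ref, features(img[y:y + patch, x:x + patch]))
            if score < thresh:               # low similarity: different source
                mask[y:y + patch, x:x + patch] = 1.0
    return mask
```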
Zhou et al., Bayar et al., Cozzolino et al., and Mayer et al. are in the same art of detecting fake images/image forensics (Zhou et al., abstract; Bayar et al., abstract; Cozzolino et al., abstract; Mayer et al., abstract). The combination of Mayer et al. with Zhou et al., Bayar et al., and Cozzolino et al. enables performing constrained convolution without utilizing knowledge of the at least one source camera model. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the unknown-model approach of Mayer et al. with the tampered image detection of Zhou et al., Bayar et al., and Cozzolino et al., as this was known at the time of filing, the combination would have predictable results, and as Mayer et al. indicate: "Our approach is different from camera model identification in that it does not determine the exact camera model that was used to capture either patch. The power of our proposed approach is that it is able to compare camera models that were not used to train the system. This allows an investigator to learn important information about images captured by any camera, and isn't limited by the set of camera models in the investigator's database" (part 1), thereby indicating a benefit in digital forensics.

Regarding claim 6, Zhou et al., Bayar et al., Cozzolino et al., and Mayer et al. disclose the system of claim 1. Cozzolino et al. further indicate the dataset is a Dresden Image dataset (44 cameras from the Dresden dataset, p7).

Regarding claims 7, 13 and 19, Zhou et al., Bayar et al., Cozzolino et al., and Mayer et al. disclose the system, method, and CRM of claims 1, 8, and 14. Zhou et al. and Cozzolino et al. further indicate the localized adversarial perturbation of the image is a splicing manipulation (Zhou et al.: the Columbia dataset focuses on splicing based on uncompressed images, p1058; qualitative results for multi-class image manipulation detection on the NIST16 dataset, where RGB and noise map provide different information for splicing, copy-move and removal, p1059; classes for manipulation classification of splicing, removal and copy-move so as to learn distinct visual tampering artifacts and noise features for each class, p1060. Cozzolino et al.: as an example, Fig. 1 shows two images subject to a splicing attack, which can be easily detected by visual inspection of their noiseprints, p2; in Splicebuster [28] the expectation-maximization algorithm is used to this end, p3; dataset focus on splicing, p6; generative adversarial networks, p11).

Claims 2, 3, 9, 10, 15 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou et al. (IDS: "Learning Rich Features for Image Manipulation Detection", 2018), Bayar et al. (IDS: "Constrained Convolutional Neural Networks: A New Approach Towards General Purpose Image Manipulation Detection"), Cozzolino et al. (IDS: "Noiseprint: a CNN-based camera model fingerprint", 2018), and Mayer et al. (IDS: "Learned Forensic Source Similarity for Unknown Camera Models") as applied to claim 1 above, further in view of Nordstrom et al. ("Biased Anisotropic Diffusion: A Unified Regularization and Diffusion Approach to Edge Detection").

Regarding claims 2, 9, and 15, Zhou et al., Bayar et al., Cozzolino et al., and Mayer et al. disclose the system, method, and CRM of claims 1, 8, and 14.
Zhou et al., Bayar et al., Cozzolino et al., and Mayer et al. further indicate the processor extracts at least one noise residual pattern from each image among the plurality of images via the constrained convolution (Zhou et al.: the other is a noise stream that leverages the noise features extracted from a steganalysis rich model filter layer to discover the noise inconsistency between authentic and tampered regions, abstract; "In our setting, noise is modeled by the residual between a pixel's value and the estimate of that pixel's value produced by interpolating only the values of neighboring pixels", part 3.2. Cozzolino et al.: in this paper, we propose a method to extract a camera model fingerprint, called noiseprint, where the scene content is largely suppressed and model-related artifacts are enhanced, abstract; to any single image the network associates a noise residual, called noiseprint from now on, which shows clear traces of camera artifacts, part 1. Mayer et al.: "A variety of techniques exist for camera model identification [2]. These techniques employ a range of features including CFA demosaicing artifacts [3–7], sensor pattern noise [8], local binary patterns [9], noise models [10], and chromatic aberration [11]. More recently, convolutional neural networks (CNNs) have been shown to be powerful tools for camera model identification [12–16]", part 1); and determines a spatial distribution of the extracted at least one noise residual pattern (Zhou et al.: so, we utilize the local noise distributions of the image to provide additional evidence, part 3.2. Cozzolino et al.: consequently, to each input image a noiseprint is deterministically associated, which enhances the camera model artifacts with their model-dependent spatial distribution, part III A; they can be roughly grouped in three classes according to the features they exploit: i) JPEG artifacts [14], [61]–[65], ii) CFA artifacts [12], [13], iii) inconsistencies in the spatial distribution of features [24]–[26], [57], [66], [67], part IV B).

Zhou et al., Bayar et al., Cozzolino et al., and Mayer et al. do not disclose suppressing semantic edges present in each image among the plurality of images by applying a probabilistic regularization. Nordstrom et al. teach suppressing semantic edges present in each image among the plurality of images by applying a probabilistic regularization ([figure: media_image2.png], part 1). Zhou et al. and Nordstrom et al. are in the same art of detecting edges (Zhou et al., part 1; Nordstrom et al., abstract). The combination of Nordstrom et al. with Zhou et al., Bayar et al., and Cozzolino et al. enables using probabilistic regularization. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the probabilistic regularization of Nordstrom et al. with the tampered image detection of Zhou et al., Bayar et al., and Cozzolino et al., as this was known at the time of filing, the combination would have predictable results, and as Nordstrom et al. indicate: "The first of these properties implies very significant advantages over other existing regularization methods; the computation cost is typically cut by an order of magnitude or more. The second property represents considerable advantages over the existing diffusion methods; it removes the problem of deciding when to stop, as well as that of actually stopping the diffusion process" (abstract), indicating a computational improvement to the combination of inventions.
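[Editor's note] Nordstrom-style biased anisotropic diffusion can be sketched as an explicit update: the conductance shuts diffusion off at strong (semantic) edges while the bias term anchors the solution to the input, removing the need for a stopping rule. The discretization and parameter values below are illustrative assumptions, not taken from the reference:

```python
# Hedged sketch of biased anisotropic diffusion (Nordstrom-style):
#   u_t = (img - u) + lam * div( g(|grad u|) * grad u )
# with g(s) = exp(-(s/kappa)^2), so diffusion stops at strong edges and
# the bias (img - u) keeps the result anchored to the original image.
import numpy as np

def biased_diffusion(img, iters=100, dt=0.15, kappa=0.05, lam=1.0):
    u = img.astype(float).copy()
    for _ in range(iters):
        # Differences toward the four neighbors (periodic boundary for brevity)
        diffs = [np.roll(u, s, axis=a) - u for a in (0, 1) for s in (-1, 1)]
        flux = sum(np.exp(-(d / kappa) ** 2) * d for d in diffs)
        u += dt * ((img - u) + lam * flux)
    return u
```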
Regarding claims 3, 10, and 16, Zhou et al., Bayar et al., Cozzolino et al., and Mayer et al. disclose the system, method, and CRM of claims 2, 9, and 15. Zhou et al., Bayar et al., Cozzolino et al., and Mayer et al. further indicate the processor trains the neural network with a complete loss function based on a cross-entropy loss function over the dataset (Zhou et al.: Lcls denotes cross entropy loss for the RPN network, part 3.1; Mayer et al.: training is performed through iterative back propagation with binary cross entropy loss, part 3.2), the probabilistic regularization (Nordstrom et al., part 1), and a rich filter constraint penalty (Zhou et al.: the other is a noise stream that leverages the noise features extracted from a steganalysis rich model filter layer to discover the noise inconsistency between authentic and tampered regions, abstract; based on recent work on steganalysis rich model (SRM) for manipulation classification [35, 15], we select SRM filter kernels to produce the noise features and use them as the input channel to the second Faster R-CNN network, part 1; we use a two-stream network built on Faster R-CNN to learn rich features for image manipulation detection, part 2).
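[Editor's note] For the claimed "complete loss function", a hedged composite in PyTorch: cross-entropy plus a regularization term plus a penalty toward the rich-filter constraint. The weights and the exact penalty form are illustrative assumptions, not the claim's or any reference's formulation:

```python
# Illustrative composite loss: cross-entropy over the dataset, plus a
# probabilistic-regularization term, plus a penalty pushing learned filters
# toward the rich-filter constraint (center = -1, neighbors sum to 1).
import torch
import torch.nn.functional as F

def total_loss(logits, labels, reg_term, filt, alpha=0.1, beta=0.1):
    ce = F.cross_entropy(logits, labels)          # classification term
    c = filt.shape[-1] // 2                       # filter center index
    center = filt[:, :, c, c]
    neighbors = filt.sum(dim=(2, 3)) - center
    constraint = ((center + 1.0) ** 2 + (neighbors - 1.0) ** 2).mean()
    return ce + alpha * reg_term + beta * constraint
```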
Claims 4, 11, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou et al. (IDS: "Learning Rich Features for Image Manipulation Detection", 2018), Bayar et al. (IDS: "Constrained Convolutional Neural Networks: A New Approach Towards General Purpose Image Manipulation Detection"), Cozzolino et al. (IDS: "Noiseprint: a CNN-based camera model fingerprint", 2018), and Mayer et al. (IDS: "Learned Forensic Source Similarity for Unknown Camera Models") as applied to claims 1, 8, and 14 above, further in view of Shu et al. (IDS: "Unsupervised 3D shape segmentation and co-segmentation via deep learning", 2016).

Regarding claims 4, 11, and 17, Zhou et al., Bayar et al., Cozzolino et al., and Mayer et al. disclose the system, method, and CRM of claims 1, 8, and 14. Zhou et al., Bayar et al., Cozzolino et al., and Mayer et al. further indicate the processor localizes the attribute of the image of the dataset by the trained neural network by: subdividing the image into a plurality of patches and determining a hundred-dimensional feature vector for each patch (Bayar et al.: the previously learned hierarchical features are produced by learning local spatial association within a receptive field (local region/patch convolved with a filter) in the same feature map, part IV A; each 256×256 patch in the training and testing data has its corresponding 200-feature vector, part IV G. Mayer et al.: "To extract features f(X) from an image patch X, we feed X forward through the trained network and record the neuron values, preactivation, of layer fc a2. The feature vector f(X) has dimension 200 and encodes information about the source camera model of X", part 3.1); and segmenting the plurality of patches by applying an expectation maximization algorithm to each patch to fit a two component Gaussian mixture model to each feature vector (Zhou et al.: Goljan et al. [19] propose a Gaussian Mixture Model (GMM) to classify CFA present regions (authentic regions) and CFA absent regions (tampered regions); Bappy et al. [2] propose an LSTM based network applied to small image patches to find the tampering artifacts on the boundaries between tampered patches and image patches, p1055. Cozzolino et al.: these vectors are then fed to the expectation-maximization (EM) algorithm, which learns the two models together with the corresponding segmentation map, part IV A).

Zhou et al., Bayar et al., Cozzolino et al., and Mayer et al. do not explicitly disclose a hundred-dimensional feature vector. Shu et al. teach determining a hundred-dimensional feature vector for each patch (define a common GMM to guide the consistent segmentation of a family of models, part 3.3; collection of 100-dimensional high-level feature vectors in the output layer, part 4.1; cluster the patches of the shape by considering their corresponding high-level feature vectors; GMM is employed for clustering in our method, resulting in a probability matrix depicting the probabilities for a patch belonging to a cluster, part 5). Zhou et al. and Shu et al. are in the same art of neural networks (Zhou et al., abstract; Shu et al., part 1). The combination of Shu et al. with Zhou et al., Dolhansky et al., and Cozzolino et al. enables using a 100-dimensional feature vector. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the vector of Shu et al. with the tampered image detection of Zhou et al., Dolhansky et al., and Cozzolino et al., as this was known at the time of filing, the combination would have predictable results, and as it has been held that discovering an optimum value of a result-effective variable involves only routine skill in the art, In re Boesch, 617 F.2d 272, 205 USPQ 215 (CCPA 1980), and as Shu et al. indicate it is a commonly used design (part 4), indicating that it would be obvious to try.
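[Editor's note] The claimed localization step (per-patch feature vectors fit with a two-component Gaussian mixture via expectation maximization) maps directly onto a standard EM implementation. A minimal sketch with scikit-learn, whose GaussianMixture runs EM internally; `feats` is assumed to be the stacked per-patch vectors:

```python
# Hedged sketch: segment image patches by fitting a two-component GMM
# with EM over their feature vectors, as in the claimed localization step.
import numpy as np
from sklearn.mixture import GaussianMixture

def segment_patches(feats: np.ndarray) -> np.ndarray:
    """feats: (num_patches, 100) array -> component label (0/1) per patch."""
    gmm = GaussianMixture(n_components=2, covariance_type="full")
    return gmm.fit_predict(feats)   # EM fit, then posterior argmax per patch
```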
Claims 5, 12 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou et al. ("Learning Rich Features for Image Manipulation Detection", 2018), Dolhansky et al. (US 10810725 B1), Cozzolino et al. (IDS: "Noiseprint: a CNN-based camera model fingerprint", 2018), and Mayer et al. (IDS: "Learned Forensic Source Similarity for Unknown Camera Models") as applied to claims 1, 8 and 14 above, further in view of Alberti et al. (IDS: "Are You Tampering With My Data?", 2018).

Regarding claims 5, 12, and 18, Zhou et al., Dolhansky et al., and Cozzolino et al. disclose the system, method, and CRM of claims 1, 8 and 14. Zhou et al. further indicate the neural network is a multiple-layer deep Convolutional Neural Network (CNN) (bilinear pooling [23], first proposed for fine-grained classification, combines streams in a two-stream CNN network while preserving spatial information to improve the detection confidence, p1056; layers shown in Fig. 2), but do not specify it is 18 layers deep. Alberti et al. teach a neural network that is an 18-layer deep Convolutional Neural Network (CNN) ("propose a novel approach towards adversarial attacks on neural networks (NN), focusing on tampering the data used for training instead of generating attacks on trained models... networks to misclassify any images to which the modification is applied", abstract; network models ResNet-18, p6; "The residual network we used differs from the original ResNet-18 model as it has an expected input size of 32×32 instead of the standard 224×224", p7). Zhou et al., Dolhansky et al., Cozzolino et al., and Alberti et al. are in the same art of detecting fake images (Zhou et al., abstract; Dolhansky et al., abstract; Cozzolino et al., abstract; Alberti et al., abstract). The combination of Alberti et al. with Zhou et al., Dolhansky et al., and Cozzolino et al. enables using an 18-layer CNN. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the 18-layer CNN of Alberti et al. with the tampered image detection of Zhou et al., Dolhansky et al., and Cozzolino et al., as this was known at the time of filing, the combination would have predictable results, and as Alberti et al. indicate this will avoid a significant overhead in terms of computations performed and exhibits higher performance on CIFAR-10 (p7).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: "SpliceRadar: A Learned Method For Blind Image Forensics".

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M ENTEZARI HAUSMANN, whose telephone number is (571) 270-5084. The examiner can normally be reached 10-7 M-F. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Vincent M Rudolph, can be reached at (571) 272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MICHELLE M ENTEZARI HAUSMANN/
Primary Examiner, Art Unit 2671

Prosecution Timeline

Jul 19, 2022
Application Filed
Jun 07, 2024
Non-Final Rejection — §103
Nov 12, 2024
Response Filed
Dec 17, 2024
Final Rejection — §103
Jun 16, 2025
Request for Continued Examination
Jun 23, 2025
Response after Non-Final Action
Jul 01, 2025
Non-Final Rejection — §103
Jan 05, 2026
Response Filed
Feb 07, 2026
Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602775
INTERPOLATION OF MEDICAL IMAGES
Granted Apr 14, 2026 (2y 5m to grant)
Patent 12602793
Systems and Methods for Predicting Object Location Within Images and for Analyzing the Images in the Predicted Location for Object Tracking
Granted Apr 14, 2026 (2y 5m to grant)
Patent 12602949
SYSTEM AND METHOD FOR DETECTING HUMAN PRESENCE BASED ON DEPTH SENSING AND INERTIAL MEASUREMENT
Granted Apr 14, 2026 (2y 5m to grant)
Patent 12597261
OBJECT MOVEMENT BEHAVIOR LEARNING
Granted Apr 07, 2026 (2y 5m to grant)
Patent 12597244
METHOD AND DEVICE FOR IMPROVING OBJECT RECOGNITION RATE OF SELF-DRIVING CAR
Granted Apr 07, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 76%
Grant Probability with Interview: 98% (+21.6% lift)
Median Time to Grant: 3y 1m
PTA Risk: High

Based on 863 resolved cases by this examiner. Grant probability is derived from the career allow rate (658 granted / 863 resolved ≈ 76%).