Prosecution Insights
Last updated: April 19, 2026
Application No. 18/366,750

METHOD AND SYSTEM FOR TRAINING A MACHINE LEARNING MODEL WITH A SUBCLASS OF ONE OR MORE PREDEFINED CLASSES OF VISUAL OBJECTS

Status: Non-Final OA (§103)
Filed: Aug 08, 2023
Examiner: ANSARI, TAHMINA N
Art Unit: 2674
Tech Center: 2600 — Communications
Assignee: Briefcam Ltd.
OA Round: 6 (Non-Final)

Grant Probability: 86% (Favorable)
OA Rounds: 6-7
Time to Grant: 2y 8m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 86% (743 granted / 868 resolved; +23.6% vs TC avg), above average
Interview Lift: +17.9% higher allow rate for resolved cases with an interview than without (a strong lift of roughly +18 points)
Typical Timeline: 2y 8m average prosecution; 33 applications currently pending
Career History: 901 total applications across all art units
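
The headline numbers above follow from simple arithmetic over the examiner's resolved cases. A minimal sketch in Python; the with/without-interview split is an assumed illustration (only the blended 86% rate and the roughly +18-point lift are reported), chosen to be consistent with both figures:

    # Career allow rate from the reported counts: 743 granted of 868 resolved.
    granted, resolved = 743, 868
    print(f"Career allow rate: {granted / resolved:.1%}")   # -> 85.6%, shown as 86%

    # Interview lift compares the allow rate of resolved cases that had an
    # examiner interview against those that did not. The split below is an
    # assumption for illustration; only the lift itself is reported.
    with_iv = (216, 218)      # (granted, resolved) -- assumed counts
    without_iv = (527, 650)   # remainder of the 743 / 868 career totals
    lift = with_iv[0] / with_iv[1] - without_iv[0] / without_iv[1]
    print(f"Interview lift: {lift:+.1%}")                   # -> roughly +18 points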

Statute-Specific Performance

§101: 12.2% (-27.8% vs TC avg)
§103: 40.4% (+0.4% vs TC avg)
§102: 22.6% (-17.4% vs TC avg)
§112: 10.5% (-29.5% vs TC avg)

Rates are compared against a Tech Center average estimate • Based on career data from 868 resolved cases
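
Each delta above can be inverted to recover the baseline it was measured against. A short sketch; note that all four rows imply the same 40.0% Tech Center average, consistent with the single average line referenced by the original chart note:

    # Each row pairs the examiner's rate with its delta vs the Tech Center
    # average, so the implied baseline is simply rate - delta.
    rows = {"§101": (12.2, -27.8), "§103": (40.4, +0.4),
            "§102": (22.6, -17.4), "§112": (10.5, -29.5)}
    for statute, (rate, delta) in rows.items():
        print(f"{statute}: examiner {rate}% vs TC avg {rate - delta:.1f}%")
    # Every row resolves to the same 40.0% baseline estimate.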

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This is in response to the applicant’s reply filed October 14, 2025. In the applicant’s reply, no claims were amended, cancelled, or newly added. Claims 1-18 are pending in this application. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Examiner’s Responses to Applicant’s Remarks

Applicant’s amendments filed on October 14, 2025 have been fully considered. The amendments overcome the rejection set forth in the office action mailed on July 14, 2025: the rejection of Claims 1-18 under 35 U.S.C. 103 as being unpatentable over Zhang et al. (US PGPub 2018/0165554 A1, hereby referred to as “Zhang”) in view of Truong et al. (US Patent 10,614,207 B1, hereby referred to as “Truong”) is hereby withdrawn. Applicant’s arguments with respect to claims 1-18 have been considered but are moot in view of the new ground(s) of rejection presented below.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the manner in which the invention was made.

Claims 1-18 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (US PGPub 2018/0165554 A1, hereby referred to as “Zhang”), in view of Truong et al. (US Patent 10,614,207 B1, hereby referred to as “Truong”), further in view of Adeel et al. (US PGPub 2024/0046515 A1, with priority to August 4, 2022, hereby referred to as “Adeel”).

Consider Claims 1, 7 and 13. Zhang teaches:

1. A method of training a machine learning model with a subclass of one predefined class of visual objects obtained from one or more videos, the method comprising: / 7. A system for training a machine learning model with a subclass of one predefined class of visual objects obtained from one or more videos, the system comprising: a computer memory configured to store one or more input videos comprising visual objects; / 13.
A non-transitory computer readable medium for training a machine learning model with a subclass of one predefined class of visual objects obtained from one or more videos, the computer readable medium comprising a set of instructions that, when executed, cause at least one computer processor to: (Zhang: abstract, A method of modelling data, comprising: training an objective function of a linear classifier, based on a set of labeled data, to derive a set of classifier weights; defining a posterior probability distribution on the set of classifier weights of the linear classifier; approximating a marginalized loss function for an autoencoder as a Bregman divergence, based on the posterior probability distribution on the set of classifier weights learned from the linear classifier; and classifying unlabeled data using the autoencoder according to the marginalized loss function. [0021]-[0030]) 1. presenting to a human operator, over an electronic display, a plurality of visual objects obtained from the one or more videos, wherein the plurality of visual objects belongs to one predefined class of visual objects; / 7. a classifier implemented by a computer processor configured to classify the visual objects from the input videos into a plurality of predefined classes; an electronic display configured to present to a human operator, a plurality of visual objects obtained from the one or more videos, belonging to one of the one predefined class requested by the human operator; / 13. classify the visual objects from the one or more videos into a plurality of predefined classes; present to a human operator, over an electronic display, a plurality of visual objects obtained from the one or more videos, belonging to the one or more of the predefined classes requested by the human operator; (Zhang: [0031], [0032] A prototype-based cluster is a set of objects in which each object is closer (more similar) to the prototype that defines the cluster than to the prototype of any other cluster. For data with continuous attributes, the prototype of a cluster is often a centroid, i.e., the average (mean) of all the points in the cluster. When a centroid is not meaningful, such as when the data has categorical attributes, the prototype is often a medoid, i.e., the most representative point of a cluster. For many types of data, the prototype can be regarded as the most central point. These clusters tend to be globular. K-means is a prototype-based, partitional clustering technique that attempts to find a user-specified number of clusters (K), which are represented by their centroids. Prototype-based clustering techniques create a one-level partitioning of the data objects. There are a number of such techniques, but two of the most prominent are K-means and K-medoid. K-means defines a prototype in terms of a centroid, which is usually the mean of a group of points, and is typically applied to objects in a continuous n-dimensional space. K-medoid defines a prototype in terms of a medoid, which is the most representative point for a group of points, and can be applied to a wide range of data since it requires only a proximity measure for a pair of objects. While a centroid almost never corresponds to an actual data point, a medoid, by its definition, must be an actual data point), 1. 
receiving from the human operator, over a user interface associated with the electronic display, selected visual objects being a selection of some of the plurality of the presented visual objects, wherein the selection is directed at visual objects belonging to at least one subclass of the one predefined class; / 7. a user interface associated with the electronic display, configured to receive from the human operator, selected visual objects being a selection of some of the plurality of the presented visual objects, wherein the selection is directed at visual objects belonging to at least one subclass of the one predefined class; / 13. receive from the human operator, over a user interface associated with the electronic display, selected visual objects being a selection of some of the plurality of the presented visual objects, wherein the selection is directed at visual objects belonging to at least one subclass of the one predefined class; (Zhang: [0021] In supervised classification, the mapping from a set of input data vectors to a finite set of discrete class labels is modeled in terms of some mathematical function including a vector of adjustable parameters. The values of these adjustable parameters are determined (optimized) by an inductive learning algorithm (also termed inducer), whose aim is to minimize an empirical risk function on a finite data set of input. [0022]-[0023], [0049] According to the present technology, the semisupervised approach is adopted, where label information is introduced to guide the feature learning procedure. In particular, a novel loss function is provided for training autoencoders that are directly coupled with the classification task. A linear classifier is first trained on BoW, then a Bregman Divergence [Banerjee et al. 2004] is derived as the loss function of a subsequent autoencoder. [0055] The labelled data may be sentiment data, user preferences, social network data/documents, newsfeed, email, or other types of documents or semantic information, and in some cases multimodal data or non-semantic data, though preferably the data has semantic content amenable to analysis.)

1. carrying out training of a classifier, using a computer processor, to distinguish between the selected visual objects, and the presented visual objects which were not selected / 7. and machine learning module, implemented by the computer processor, configured to carry out training of a classifier to distinguish between the selected visual objects and the presented visual objects which were not selected, / 13. carry out training of a machine learning model to distinguish between the selected visual objects and the presented visual objects which were not selected; (Zhang: [0024] Clustering algorithms partition data into a certain number of clusters (groups, subsets, or categories). Important considerations include feature selection or extraction (choosing distinguishing or important features, and only such features); clustering algorithm design or selection (accuracy and precision with respect to the intended use of the classification result; feasibility and computational cost; etc.); and to the extent different from the clustering criterion, optimization algorithm design or selection. [0025], [0026] There are generally three types of clustering structures, known as partitional clustering, hierarchical clustering, and individual clusters. The most commonly discussed distinction among different types of clusterings is whether the set of clusters is nested or unnested, or in more traditional terminology, hierarchical or partitional. A partitional clustering is simply a division of the set of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset. If the clusters have sub-clusters, then we obtain a hierarchical clustering, which is a set of nested clusters that are organized as a tree. Each node (cluster) in the tree (except for the leaf nodes) is the union of its children (sub-clusters), and the root of the tree is the cluster containing all the objects. Often, but not always, the leaves of the tree are singleton clusters of individual data objects. A hierarchical clustering can be viewed as a sequence of partitional clusterings and a partitional clustering can be obtained by taking any member of that sequence; i.e., by cutting the hierarchical tree at a particular level)

1. and applying the classifier which detects visual objects belonging to the selected at least one subclass / 7. Wherein the computer processor applies the classifier which detects visual objects belonging to the selected at least one subclass / 13. and apply the classifier which detects visual objects belonging to the selected at least one subclass (Zhang: [0026] There are generally three types of clustering structures, known as partitional clustering, hierarchical clustering, and individual clusters. The most commonly discussed distinction among different types of clusterings is whether the set of clusters is nested or unnested, or in more traditional terminology, hierarchical or partitional. A partitional clustering is simply a division of the set of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset. If the clusters have sub-clusters, then we obtain a hierarchical clustering, which is a set of nested clusters that are organized as a tree. Each node (cluster) in the tree (except for the leaf nodes) is the union of its children (sub-clusters), and the root of the tree is the cluster containing all the objects. Often, but not always, the leaves of the tree are singleton clusters of individual data objects. A hierarchical clustering can be viewed as a sequence of partitional clusterings and a partitional clustering can be obtained by taking any member of that sequence; i.e., by cutting the hierarchical tree at a particular level. [0063] Autoencoders transform an unsupervised learning problem to a supervised one by the self-reconstruction criteria. This enables one to use all the tools developed for supervised learning such as back propagation to efficiently train the autoencoders)

Zhang does not teach the following amended features for the independent claims:
- a generative adversarial network model;
- creating, by a computer processor, additional samples of the visual objects belonging to at least one subclass of the one predefined class, wherein the additional samples are created by tracking, by the computer processor, said visual objects across video frames of the one or more videos;
- carrying out training of a classifier, using [[a]] the computer processor, the selected visual objects and the additional samples of the visual objects, to distinguish between the selected visual objects and the presented visual objects which were not selected.

Truong teaches:

1.
A method of training a machine learning model with a subclass of one predefined class of visual objects obtained from one or more videos, the method comprising: / 7. A system for training a machine learning model with a subclass of one predefined class of visual objects obtained from one or more videos, the system comprising: a computer memory configured to store one or more input videos comprising visual objects; / 13. A non-transitory computer readable medium for training a machine learning model with a subclass of one predefined class of visual objects obtained from one or more videos, the computer readable medium comprising a set of instructions that, when executed, cause at least one computer processor to: (Truong: column 4 lines 1-65, column 5 lines 1-8, Figure 1, According to some aspects, the GAN model may generate variants of the objects and the backgrounds. The captcha images may be assembled from the variants of the objects and the backgrounds, adding blurs on boundaries of the objects and the backgrounds. Before discussing these concepts in greater detail, however, several examples of a computing device that may be used in implementing and/or otherwise providing various aspects of the disclosure will first be discussed with respect to FIG. 1. FIG. 1 illustrates one example of a computing device 101 that may be used to implement one or more illustrative aspects discussed herein.) 1. presenting to a human operator, over an electronic display, a plurality of visual objects obtained from the one or more videos, wherein the plurality of visual objects belongs to one predefined class of visual objects; / 7. a classifier implemented by a computer processor configured to classify the visual objects from the input videos into a plurality of predefined classes; an electronic display configured to present to a human operator, a plurality of visual objects obtained from the one or more videos, belonging to one of the one predefined class requested by the human operator; / 13. classify the visual objects from the one or more videos into a plurality of predefined classes; present to a human operator, over an electronic display, a plurality of visual objects obtained from the one or more videos, belonging to the one or more of the predefined classes requested by the human operator; (Truong: Figure 2A, column 5 lines 40-65, column 6 lines 1-25, FIG. 2 illustrates an example network architecture 200 of a GAN model. A Generative Adversarial Network (GAN) model may be a class of machine learning systems that may include a generator 210 and a discriminator 220, which may contest with each other in a zero-sum game framework. Generator 210 or discriminator 220 may each be an artificial neural network, which may be a collection of connected nodes, with the nodes and connections each having assigned weights used to generate predictions. Each node in the artificial neural network may receive input and generate an output signal. The output of a node in the artificial neural network may be a function of its inputs and the weights associated with the edges. Generator 210 may generate new data instances based on a training dataset such as ground-truth images 202, while discriminator 220 may evaluate them for authenticity. For example, discriminator 220 may decide whether each instance of data that it reviews belongs to the actual training dataset or not. 
Meanwhile, generator 210 may create new, synthetic images 204 that it passes to discriminator 220 in the hopes that these new synthetic images 204 will be deemed authentic, even though they are fake. The goal of generator 210 may be to generate passable synthetic images 204 and the goal of discriminator 220 may be to identify images coming from generator 210 as fake. Figure 5, step 525, 530)

1. receiving from the human operator, over a user interface associated with the electronic display, a selection of some of the plurality of visual objects, wherein the selection is directed at visual objects belonging to at least one subclass of the one predefined class; / 7. a user interface associated with the electronic display, configured to receive from the human operator, a selection of some of the plurality of visual objects, wherein the selection is directed at visual objects belonging to at least one subclass of the one predefined class; / 13. receive from the human operator, over a user interface associated with the electronic display, a selection of some of the plurality of visual objects, wherein the selection is directed at visual objects belonging to at least one subclass of the one predefined class; (Truong: column 6 lines 64-67, column 7 lines 1-16, FIG. 3 illustrates a flow chart for a method of using the GAN model to generate captcha images using variations of the same object in accordance with one or more aspects described herein. Other models such as AE and VAE may be used to generate the captcha images using variations of the same object. As used herein (and as discussed above with respect to FIG. 2), the GAN model may include one or more generators and discriminators. Method 300 may be implemented by a suitable computing system, as described further herein. For example, method 300 may be implemented by any suitable computing environment by a computing device and/or combination of computing devices, such as computing devices 101, 105, 107, and 109 of FIG. 1. Method 300 may be implemented in suitable program instructions, such as in machine learning software 127, and may operate on a suitable training set, such as training set data 129. At step 305, the system may generate a plurality of backgrounds for a plurality of captcha images. A captcha may be a type of challenge-response test used in computing to determine whether or not the user is human. Column 8 lines 44-49, FIG. 4 depicts a flow chart for a method of training the GAN model according to one or more aspects of the disclosure, which may be used to implement step 315. Turning to step 405, the generator may generate synthetic images based on the plurality of GAN parameters and the set of training data. Column 10 lines 1-5, At step 425, the discriminator may evaluate the new synthetic images and ground-truth images for the predictions of authenticity, similar to that of step 410. At step 430, the system may determine whether predictions of authenticity reach a threshold value.)

1. creating, by a computer processor, additional samples of the visual objects belonging to at least one subclass of the one predefined class, wherein the additional samples are created by mapping, by the computer processor, said visual objects across video frames of the one or more videos; / 7. wherein the computer processor creates additional samples of the visual objects belonging to at least one subclass of the one predefined class, wherein the additional samples are created by mapping, by the computer processor, said visual objects across video frames of the one or more videos; / 13. create additional samples of the visual objects belonging to at least one subclass of the one predefined class, wherein the additional samples are created by mapping said visual objects across video frames of the one or more videos; (Truong: column 5 lines 65-67, column 6 lines 1-7, Truong also teaches a GAN. As illustrated in FIG. 2, generator 210 may take in ground-truth images 202 as a training dataset and return synthetic images 204. These generated images may be fed into discriminator 220 with ground-truth images 202. Discriminator 220 may take in both real and fake images and return predicted labels 206 such as real and fake. In some instances, predicted labels may be represented in probabilities, as a number between 0 and 1, with 1 representing a prediction of authenticity and 0 representing fake. Column 10 lines 14-67, The system may not be allowed to set the predictions of authenticity of the synthetic images that equal 100%. When the generator generates the synthetic images, the generator may be aware that there are X number of objects in the scene. When the discriminator tries to guess how many objects in the scene, the system may set the target so that the discriminator is correct, for example, 80% of the time. Given that the system may not be allowed to set the discriminator to be perfectly accurate, there may be a minimal and maximal confidence level associated with the predictions of authenticity for the discriminator. The system may implement a mechanism so that the human users may recognize the images generated by the GAN model. When these images are employed as captcha images, they may have the opportunity to be applied to millions of users, who may function analogously to the discriminator in the GAN model. In this fashion, based on the sample set of the images generated by the GAN model, the users may serve as a human discriminator to calibrate the predictions of authenticity. For example, if the users are unable to identify the objects in the scene of the synthetic images, the system may tune the generator to generate less real-looking images. The system may roll back the GAN model to the previous version with the corresponding model parameters. Based on the feedback from human users, the GAN model may be tuned to generate the synthetic images so that the discriminator may identify these images as real or fake with the appropriate predictions of authenticity. Column 8 lines 47-63, Turning to step 405, the generator may generate synthetic images based on the plurality of GAN parameters and the set of training data. The generator may be a function that transforms an arbitrary input into a synthetic output. For example, an arbitrary input may be a 2D sample such as a ground-truth image, with a (x, y) value drawn from a uniform or Gaussian distribution, and the output may also be a 2D sample, such as a synthetic image, but mapped into a different position, which is a fake sample. The mapping may be visualized using manifold, where the input space may be represented as a uniform square grid. As the function maps positions in the input space into new positions, the whole grid in the output, now consisting of irregular quadrangles, would look like a warped version of the original regular grid. The area or density of each warped cell may have changed, and a very fine-grained manifold may look approximately the same as the visualization of the fake samples.)

1. carrying out training of a classifier, using [[a]] the computer processor, the selected visual objects and the additional samples of the visual objects, to distinguish between the selected visual objects and the presented visual objects which were not selected; / 7. And machine learning module, implemented by the computer processor, configured to carry out training of a classifier, using the selected visual objects and the additional samples of the visual objects, to distinguish between the selected visual objects and the presented visual objects which were not selected, / 13. carry out training of a classifier using the selected visual objects and the additional samples of the visual objects, to distinguish between the selected visual objects and the presented visual objects which were not selected; (Truong: Column 8 lines 44-49, FIG. 4 depicts a flow chart for a method of training the GAN model according to one or more aspects of the disclosure, which may be used to implement step 315. Turning to step 405, the generator may generate synthetic images based on the plurality of GAN parameters and the set of training data. Column 10 lines 14-67, If the answer to step 430 is no, that the predictions of authenticity have not reached a desirable threshold value, the process may go to step 405, where the system proceeds with a new iteration of the training process and the generator may generate new synthetic images based on the new model parameters. If the answer to step 430 is yes, that the predictions of authenticity have reached a threshold value, the process may go to step 435, where the system may generate refined model parameters based on the refined GAN model with appropriate predictions of authenticity. The refined model parameters may be based on the updated model parameters generated in step 415. Referring back to FIG. 3, after the system has trained the GAN model at step 315 as illustrated in FIG. 4, the process may proceed to step 320, where the system may generate variants of the plurality of objects based on the trained GAN model. Based on the synthetic images generated by the generator, with the corresponding appropriate predictions of authenticity recognizable by the discriminator, the system may select one or more of the objects in the synthetic images as the variants of the objects. In the cases that there are a range of predictions of authenticity that satisfy the threshold value at step 430, the system may select a plurality of objects in the synthetic images as the variants.)

1. and applying the classifier which detects visual objects belonging to the selected at least one subclass / 7. Wherein the computer processor applies the classifier which detects visual objects belonging to the selected at least one subclass / 13. and apply the classifier which detects visual objects belonging to the selected at least one subclass (Truong: Column 9 lines 3-7, lines 19-22, lines 43-45, At step 410, the discriminator may evaluate the generated synthetic images and ground-truth images for predictions of authenticity. The discriminator may be implemented as a multilayer perceptron (MLP), which is a deep neural network….
At step 415, the system may update the plurality of model parameters based on predictions of authenticity. The system may adjust the parameters, or the weights and biases, of the model in order to minimize error in its prediction of authenticity.... At step 420, the system may start a second iteration, where the generator may generate new synthetic images based on the updated model parameters. Column 10 lines 1-5, At step 425, the discriminator may evaluate the new synthetic images and ground-truth images for the predictions of authenticity, similar to that of step 410. At step 430, the system may determine whether predictions of authenticity reach a threshold value.)

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Zhang’s semisupervised autoencoder that leverages a hierarchical class-subclass architecture and clustering algorithm with Truong’s GAN variant-model based machine learning. The determination of obviousness is predicated upon the following findings: Both Truong and Zhang are in the same overall field of endeavor of learning-based architectures for object classification, and one skilled in the art would have been motivated to modify Zhang’s overall label-based object detection and classifier model to benefit from Truong’s GAN, as it is an improved and more robust algorithm for learning and object classification. Furthermore, the prior art collectively includes each element claimed (though not all in the same reference), and one of ordinary skill in the art could have combined the elements in the manner explained above using known engineering design, interface and programming techniques, without changing a “fundamental” operating principle of Zhang, while the teaching of Truong continues to perform the same function as originally taught prior to being combined, in order to produce the repeatable and predictable result of using a GAN-based model that can improve the overall learning and classification system. It is for at least the aforementioned reasons that the examiner has reached a conclusion of obviousness with respect to the claim in question.

Even if the combination of Zhang and Truong does not teach “wherein the additional samples are created by tracking said visual objects across video frames of the one or more videos,” Adeel teaches:

1. A method of training a machine learning model with a subclass of one predefined class of visual objects obtained from one or more videos, the method comprising: / 7. A system for training a machine learning model with a subclass of one predefined class of visual objects obtained from one or more videos, the system comprising: a computer memory configured to store one or more input videos comprising visual objects; / 13. A non-transitory computer readable medium for training a machine learning model with a subclass of one predefined class of visual objects obtained from one or more videos, the computer readable medium comprising a set of instructions that, when executed, cause at least one computer processor to: (Adeel: abstract, A video file may be presented via a user application that displays one or more video frames of the video file. A user request to perform an object detection for objects of a specific object type in a video frame of the video file may be received from the user application. A machine-learning model of a plurality of machine-learning models that is configured to detect objects of the specific object type may be applied to the video frame to detect an object of the specific object type in the video frame. Each of the plurality of machine-learning models may be trained to detect objects of a corresponding object type. Subsequently, an object tracking algorithm may be applied to one or more additional video frames of the video file to track the object of the specific object type across the one or more additional video frames. [0028]-[0033], Figure 5)

1. receiving from the human operator, over a user interface associated with the electronic display, a selection of some of the plurality of visual objects, wherein the selection is directed at visual objects belonging to at least one subclass of the one predefined class; / 7. a user interface associated with the electronic display, configured to receive from the human operator, a selection of some of the plurality of visual objects, wherein the selection is directed at visual objects belonging to at least one subclass of the one predefined class; / 13. receive from the human operator, over a user interface associated with the electronic display, a selection of some of the plurality of visual objects, wherein the selection is directed at visual objects belonging to at least one subclass of the one predefined class; (Adeel: [0020] In turn, the machine-learning model may detect and highlight the objects of the specific object type in the video frame 118. The application of a machine-learning model may result in the content processing engine 104 drawing an outline shape that surrounds each object of the specific object type that is detected, as well as presenting a text label that indicates the object type of each object detected. For example, as shown in the user interface screen illustrated in FIG. 2, the machine-learning model configured to detect faces may have detected three human faces in the video frame 118. As such, the content processing engine 104 may have drawn an outline shape (e.g., a box 202) around each detected face and labeled each detected human face with the object type label “face” (e.g., a label 204). [0021] Following a review of the objects in a video frame detected by the machine-learning model, the user may use the user interface controls to initiate one or more additional data processing operations on each object of the specific object type that is detected. In one instance, the user may initiate a tracking of the object of the specific object type across other video frames of the video file. For example, the content processing engine 104 may apply an object tracking algorithm to the object of the specific object type so that as the object appears in other video frames, the content processing engine 104 is able to highlight and identify the presence of the object in the other video frames. [0031] The content processing engine 104 may include an interface module 512, a detection module 514, a tracking module 516, a redaction module 518, and a machine-learning module 520. The modules may include routines, program instructions, objects, and/or data structures that perform particular tasks or implement particular abstract data types. The memory 506 may also include a data store 522 that is used by the content processing engine 104. [0032] The interface module 512 may include functionalities for streaming video files to a user application (e.g., user application 108) on a remote user device, such as the user device 110.)

1. creating, by a computer processor, additional samples of the visual objects belonging to at least one subclass of the one predefined class, wherein the additional samples are created by tracking, by the computer processor, said visual objects across video frames of the one or more videos; / 7. wherein the computer processor creates additional samples of the visual objects belonging to at least one subclass of the one predefined class, wherein the additional samples are created by tracking, by the computer processor, said visual objects across video frames of the one or more videos; / 13. create additional samples of the visual objects belonging to at least one subclass of the one predefined class, wherein the additional samples are created by tracking said visual objects across video frames of the one or more videos; (Adeel: [0021] Following a review of the objects in a video frame detected by the machine-learning model, the user may use the user interface controls to initiate one or more additional data processing operations on each object of the specific object type that is detected. In one instance, the user may initiate a tracking of the object of the specific object type across other video frames of the video file. For example, the content processing engine 104 may apply an object tracking algorithm to the object of the specific object type so that as the object appears in other video frames, the content processing engine 104 is able to highlight and identify the presence of the object in the other video frames. As shown in the example user interface screen 200 illustrated in FIG. 2, the user 112 may use the option 206 to initiate the tracking of the three detected faces in the video frames of the video file 116. [0022], [0031], [0037] The tracking module 516 may be activated to track an object of a specific object type that is identified by a machine-learning model across multiple video frames. In some embodiments, the tracking may be performed using an object tracking algorithm that makes use of object pattern recognition. The object pattern recognition may reduce an image of the object to be tracked into a set of features. The object pattern recognition may then look for the set of features in the next video frame to track the object across multiple video frames.)

1. carrying out training of a classifier, using [[a]] the computer processor, the selected visual objects and the additional samples of the visual objects, to distinguish between the selected visual objects and the presented visual objects which were not selected; / 7. And machine learning module, implemented by the computer processor, configured to carry out training of a classifier, using the selected visual objects and the additional samples of the visual objects, to distinguish between the selected visual objects and the presented visual objects which were not selected, / 13. carry out training of a classifier using the selected visual objects and the additional samples of the visual objects, to distinguish between the selected visual objects and the presented visual objects which were not selected; (Adeel: [0022], [0031], [0037] The tracking module 516 may be activated to track an object of a specific object type that is identified by a machine-learning model across multiple video frames.
In some embodiments, the tracking may be performed using an object tracking algorithm that makes use of object pattern recognition. The object pattern recognition may reduce an image of the object to be tracked into a set of features. The object pattern recognition may then look for the set of features in the next video frame to track the object across multiple video frames. For example, the object tracking algorithm may be a target representation and localization algorithm, a filtering and data association algorithm, or some other comparable algorithm. For an object that is tracked across multiple video frames, the tracking module 516 may superimpose an indicator on each video frame to show that the object of the specific object type is being tracked across multiple video frames. For example, the indicator may include an outline shape that surrounds the image of the object, as well as present a text label obtained from the detection module 514 that indicates the object type of the object. This may result in the object being shown as being bounded by an outline shape with an object type label as the object moves around in a field of view as the video file is being played back. In some embodiments, the tracking module 516 may provide user interface controls that enable a user to select a particular video portion of the video file (e.g., a specific set of frames), for which the tracking of an object of an object type may be performed. In some instances, the tracking module 516 may track an object of a specific object type across multiple video frames of a video file after the user has corrected the object type label for the object.)

1. and applying the classifier which detects visual objects belonging to the selected at least one subclass / 7. Wherein the computer processor applies the classifier which detects visual objects belonging to the selected at least one subclass / 13. and apply the classifier which detects visual objects belonging to the selected at least one subclass (Adeel: [0037], [0038] In other embodiments, the object pattern recognition may be fine-tuned to detect to not only track objects of specific types, but objects of the specific types with specific feature attributes. For example, the object pattern recognition may be used to track the face of a particular person or a particular license plate across multiple video frames. In such embodiments, the tracking module 516 may provide additional user interface control accessible via a user application, such as the user application 108, that enables users to select objects with specific feature attributes for tracking across the multiple video frames. For example, the user interface controls may be used to independently track images of objects of the same type but of different relative sizes in video frames. [0039] The redaction module 518 may be activated to redact the image of an object in a video frame that is identified via the detection module 514 and/or the tracking module 516. The redaction module 518 may redact the image of the object by applying a visual effect on the image of the object.)

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the combination of Zhang and Truong for a GAN variant-based machine learning model that uses a hierarchical class-subclass architecture and clustering with Adeel’s interactive machine learning model for video frames. The determination of obviousness is predicated upon the following findings: Truong and Zhang are both in the same overall field of endeavor of machine learning-based architectures for object classification in image data, and Adeel is also directed towards object tracking in video frames using machine learning models. One skilled in the art would have been motivated to modify the combination of Zhang and Truong for a GAN-based object detection learning model for a classifier to leverage Adeel’s improved and more robust learning algorithm for object detection using image tracking across the different video frames. Furthermore, the prior art collectively includes each element claimed (though not all in the same reference), and one of ordinary skill in the art could have combined the elements in the manner explained above using known engineering design, interface and programming techniques, without changing a “fundamental” operating principle of the combination of Zhang and Truong, while the teaching of Adeel continues to perform the same function as originally taught prior to being combined, in order to produce the repeatable and predictable result of using a machine learning model for tracking objects across the different image frames in order to improve the overall learning and classification system. It is for at least the aforementioned reasons that the examiner has reached a conclusion of obviousness with respect to the claim in question.

Consider Claims 2, 8 and 14. The combination of Zhang, Truong and Adeel teaches:

2. The method according to claim 1, further comprising: determining, using the computer processor and based on the trained model, whether or not a newly obtained visual object belongs to the at least one subclass. / 8. The system according to claim 7, wherein the computer processor is further configured to determine, based on the trained model, whether or not a newly obtained visual object belongs to the at least one subclass. / 14. The non-transitory computer readable medium according to claim 13, further comprises determining, using the computer processor and based on the trained model, whether or not a newly obtained visual object belongs to the at least one subclass. (Truong: column 11 lines 24-33, The system may generate a plurality of captcha images based on variants of the object. For example, the system may select an object from an object ground-truth image database, and a background from a background ground-truth image database. The system may assemble a ground-truth captcha image using the object and the background from ground-truth image databases. The system may assemble a ground-truth captcha image using a plurality of objects and a plurality of backgrounds from ground-truth image databases. Column 12 lines 4-24, The security challenge may include a captcha image with a plurality of objects and backgrounds. For example, the captcha image may include five segments. The first segment may include a ground-truth image of a dog and a first background. The second segment may include a first variant of a cat and a second background. The third segment may include a first variant of the dog and a first variant of the first background. The fourth segment may include a second variant of the dog and a second variant of the first background. The fifth segment may include a second variant of the cat and a first variant of the second background. The security challenge may ask the user to identify the number of dogs in the captcha image.
The security challenge may ask how many types of animals exist in the captcha image. The security challenge may ask whether the captcha image contains the same type of animals. These images noted above are for illustration purposes, and the system may use any combinations of the objects, the backgrounds and their variants in the captcha images and may ask any combinations of questions in the security challenges. Adeel: [0037] In some embodiments, the tracking module 516 may provide user interface controls that enable a user to select a particular video portion of the video file (e.g., a specific set of frames), for which the tracking of an object of an object type may be performed. In some instances, the tracking module 516 may track an object of a specific object type across multiple video frames of a video file after the user has corrected the object type label for the object. [0038] In other embodiments, the object pattern recognition may be fine-tuned to detect to not only track objects of specific types, but objects of the specific types with specific feature attributes. For example, the object pattern recognition may be used to track the face of a particular person or a particular license plate across multiple video frames. In such embodiments, the tracking module 516 may provide additional user interface control accessible via a user application, such as the user application 108, that enables users to select objects with specific feature attributes for tracking across the multiple video frames. For example, the user interface controls may be used to independently track images of objects of the same type but of different relative sizes in video frames. [0039] The redaction module 518 may be activated to redact the image of an object in a video frame that is identified via the detection module 514 and/or the tracking module 516. The redaction module 518 may redact the image of the object by applying a visual effect on the image of the object.)

Consider Claims 3, 9 and 15. The combination of Zhang, Truong and Adeel teaches:

3. The method according to claim 1, wherein the at least one subclass comprises at least a first subclass and a second subclass, wherein the selection and the training is carried out for the first subclass and then repeated with the second subclass. / 9. The system according to claim 7, wherein the at least one subclass comprises at least a first subclass and a second subclass, wherein the selection and the training is carried out for the first subclass and then repeated with the second subclass. / 15. The non-transitory computer readable medium according to claim 13, wherein the at least one subclass comprises at least a first subclass and a second subclass, wherein the selection and the training is carried out for the first subclass and then repeated with the second subclass. (Truong: Column 12 lines 4-40, The security challenge may include a captcha image with a plurality of objects and backgrounds. For example, the captcha image may include five segments. The first segment may include a ground-truth image of a dog and a first background. The second segment may include a first variant of a cat and a second background. The third segment may include a first variant of the dog and a first variant of the first background. The fourth segment may include a second variant of the dog and a second variant of the first background. The fifth segment may include a second variant of the cat and a first variant of the second background. The security challenge may ask the user to identify the number of dogs in the captcha image. The security challenge may ask how many types of animals exist in the captcha image. The security challenge may ask whether the captcha image contains the same type of animals. These images noted above are for illustration purposes, and the system may use any combinations of the objects, the backgrounds and their variants in the captcha images and may ask any combinations of questions in the security challenges. At step 335, the system may determine whether to authorize a user access request based on a response to the security challenge. The response may indicate a number of object types contained in the security challenge or whether the security challenge contains a same type of objects. Based on the response, the system may grant or deny the user access requests. For example, the captcha images may include five objects, a ground-truth image of a dog, three variants of the dog, and a variant of a cat. The security challenge may ask how many dogs exist in the captcha images. Given that the GAN model may generate the variants of the dog or cat with an appropriate prediction of authenticity, the human users may be able to distinguish whether the variants are cats or dogs, while the bots or computer programs may not be able to identify them. If the user provides the correct response that there are four dogs in the captcha images, the system may grant the access request to the underlying resources from the user. Adeel: [0023], In another instance, the user may initiate a redaction of the object from one or more video frames in the video file. In such an instance, the content processing engine 104 may apply the object tracking algorithm to the object of the specific object type until all appearances of the object in the one or more video frames are recognized. Subsequently, the content processing engine 104 may apply a redaction algorithm to render the appearances of the object in the one or more video frames visually unrecognizable. In various embodiments, the redaction algorithm may apply a pixelation effect, a blurring effect, an opaque overlay effect, and/or some other obfuscation effect to the appearances of the object in the one or more video frames, [0031], [0037]-[0038] In other embodiments, the object pattern recognition may be fine-tuned to detect to not only track objects of specific types…
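
For reference on the Zhang citation earlier in this action: Zhang's abstract derives the autoencoder's marginalized loss as a Bregman divergence. The standard definition, for a strictly convex and differentiable generator function phi, is:

    D_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla \phi(y),\, x - y \rangle

Choosing \phi(x) = \lVert x \rVert^2 gives D_\phi(x, y) = \lVert x - y \rVert^2, the ordinary squared-error reconstruction loss, so losses of this family can be read as generalized reconstruction errors; the specific divergence induced by Zhang's posterior over classifier weights is not reproduced in the office action.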
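
The Truong passages quoted above describe the FIG. 4 training loop (steps 405-435) at the flow level only. Below is a minimal control-flow sketch of that loop in Python; the generator, discriminator, and parameter update are toy stubs (Truong mentions an MLP discriminator but the excerpt discloses no concrete implementation), and the function names are hypothetical:

    import random

    # Stub generator (steps 405/420): produce synthetic samples from the
    # training data using the current parameters. A real generator would be
    # a neural network; this toy version just perturbs the inputs.
    def generate_synthetic(params, training_data):
        return [x + params["noise"] * random.random() for x in training_data]

    # Stub discriminator (steps 410/425): score each sample in [0, 1],
    # where 1 means "predicted authentic (real)".
    def predict_authenticity(samples):
        return [random.random() for _ in samples]

    # Step 415: update model parameters to reduce prediction error.
    # A toy multiplicative update stands in for backpropagation here.
    def update_parameters(params):
        params["noise"] *= 0.95
        return params

    def train_gan(training_data, threshold=0.8, max_iters=50):
        params = {"noise": 1.0}
        for _ in range(max_iters):
            synthetic = generate_synthetic(params, training_data)    # 405/420
            preds = predict_authenticity(synthetic + training_data)  # 410/425
            params = update_parameters(params)                       # 415
            if sum(preds) / len(preds) >= threshold:                 # 430
                return params                      # 435: refined parameters
        return params

    refined = train_gan([float(i) for i in range(10)])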

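Read together, the claim elements mapped above recite a concrete pipeline: operator-selected objects become positive examples, presented-but-unselected objects become negatives, tracking each selected object across frames contributes additional positives, and a classifier is trained on the union. A runnable sketch under assumed details (hypothetical feature vectors, a jitter-based stand-in for the tracker, and scikit-learn's LogisticRegression standing in for the unspecified classifier):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Stand-in tracker: one selected object yields additional samples, one per
    # frame, each a slightly different view of the same object. The jitter
    # model is illustrative; the application does not limit the tracker.
    def track_across_frames(feature, n_frames=5, jitter=0.05):
        rng = np.random.default_rng(0)
        return feature + jitter * rng.standard_normal((n_frames, feature.shape[0]))

    # Hypothetical 2-D feature vectors for objects of one predefined class,
    # split by the operator's selection of a subclass.
    selected = np.array([[0.9, 0.8], [0.85, 0.9]])                # subclass positives
    unselected = np.array([[0.1, 0.2], [0.2, 0.1], [0.15, 0.3]])  # negatives

    # Create additional positive samples by tracking, then train the classifier
    # to distinguish selected objects from presented-but-unselected ones.
    tracked = np.vstack([track_across_frames(s) for s in selected])
    X = np.vstack([selected, tracked, unselected])
    y = np.hstack([np.ones(len(selected) + len(tracked)), np.zeros(len(unselected))])
    clf = LogisticRegression().fit(X, y)

    # Apply the classifier to a newly obtained visual object (cf. claims 2/8/14).
    print(clf.predict(np.array([[0.8, 0.85]])))  # -> [1.]: in the selected subclass
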
Prosecution Timeline

Aug 08, 2023: Application Filed
Sep 30, 2023: Non-Final Rejection — §103
Jan 04, 2024: Response Filed
Feb 02, 2024: Final Rejection — §103
Aug 08, 2024: Request for Continued Examination
Aug 09, 2024: Response after Non-Final Action
Aug 28, 2024: Non-Final Rejection — §103
Mar 03, 2025: Response Filed
Mar 20, 2025: Final Rejection — §103
Jun 25, 2025: Request for Continued Examination
Jun 26, 2025: Response after Non-Final Action
Jul 10, 2025: Non-Final Rejection — §103
Oct 07, 2025: Examiner Interview Summary
Oct 07, 2025: Applicant Interview (Telephonic)
Oct 14, 2025: Response Filed
Nov 04, 2025: Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12586249
PROCESSING APPARATUS, PROCESSING METHOD, AND STORAGE MEDIUM FOR CALIBRATING AN IMAGE CAPTURE APPARATUS
2y 5m to grant • Granted Mar 24, 2026
Patent 12586354
TRAINING METHOD, APPARATUS AND NON-TRANSITORY COMPUTER READABLE MEDIUM FOR A MACHINE LEARNING MODEL
2y 5m to grant • Granted Mar 24, 2026
Patent 12573083
COMPUTER-READABLE RECORDING MEDIUM STORING OBJECT DETECTION PROGRAM, DEVICE, AND MACHINE LEARNING MODEL GENERATION METHOD OF TRAINING OBJECT DETECTION MODEL TO DETECT CATEGORY AND POSITION OF OBJECT
2y 5m to grant • Granted Mar 10, 2026
Patent 12548297
IMAGE PROCESSING METHOD AND APPARATUS, COMPUTER DEVICE, STORAGE MEDIUM, AND PROGRAM PRODUCT BASED ON FEATURE AND DISTRIBUTION CORRELATION
2y 5m to grant • Granted Feb 10, 2026
Patent 12524504
METHOD AND DATA PROCESSING SYSTEM FOR PROVIDING EXPLANATORY RADIOMICS-RELATED INFORMATION
2y 5m to grant • Granted Jan 13, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 6-7
Grant Probability: 86%
With Interview: 99% (+17.9%)
Median Time to Grant: 2y 8m
PTA Risk: High
Based on 868 resolved cases by this examiner. Grant probability derived from career allow rate.
