Prosecution Insights
Last updated: April 19, 2026
Application No. 18/474,672

SYSTEM, DEVICES AND/OR PROCESSES FOR TRAINING ENCODER AND/OR DECODER PARAMETERS FOR OBJECT DETECTION AND/OR CLASSIFICATION

Non-Final OA: §101, §102, §103
Filed: Sep 26, 2023
Examiner: BAKER, CHARLOTTE M
Art Unit: 2664
Tech Center: 2600 (Communications)
Assignee: Akasa Inc.
OA Round: 1 (Non-Final)
Grant Probability: 93% (Favorable)
OA Rounds: 1-2
To Grant: 2y 2m
With Interview: 93%

Examiner Intelligence

Grants 93% (above average)
Career Allow Rate: 93% (991 granted / 1067 resolved, i.e. about 92.9%; +30.9% vs TC avg)
Interview Lift: -0.2% (minimal), comparing resolved cases with vs. without an interview
Typical timeline: 2y 2m avg prosecution; 15 currently pending
Career history: 1082 total applications across all art units

Statute-Specific Performance

§101: 21.6% (-18.4% vs TC avg)
§103: 24.7% (-15.3% vs TC avg)
§102: 27.4% (-12.6% vs TC avg)
§112: 4.3% (-35.7% vs TC avg)
Tech Center averages are estimates • Based on career data from 1067 resolved cases

Office Action

§101 §102 §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The limitations, under their broadest reasonable interpretation, cover a mental process (a concept performed in the human mind, including an observation, evaluation, judgment, or opinion, as well as organizing human activity and mathematical concepts and calculations).

Claim 1 recites: An apparatus: an encoder configured to transform samples of a content signal obtained from an electronic document retrieved from a memory to provide an embedded state, the embedded state comprising encoded samples and tokens associating the encoded samples with positional references in the content signal; a decoder configured to transform the embedded state to provide a reconstruction of at least a portion of the content signal; and one or more first neural networks to receive an input tensor populated with an intermediate state of the decoder, the one or more first neural networks to be configured to provide an output tensor comprising indications of detections of one or more features in the content signal based, at least in part, on the input tensor.

This judicial exception is not integrated into a practical application because the steps do not add meaningful limitations that would be considered specifically applied to a particular technological problem to be solved. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the steps of the claimed invention can be performed mentally, and no additional features in the claims would preclude them from being performed as such, except for the generic computer elements recited at a high level of generality (i.e., processor, memory).

According to the USPTO guidelines, a claim is directed to non-statutory subject matter if:

STEP 1: the claim does not fall within one of the four statutory categories of invention (process, machine, manufacture or composition of matter), or
STEP 2: the claim recites a judicial exception, e.g., an abstract idea, without reciting additional elements that amount to significantly more than the judicial exception, as determined using the following analysis:
STEP 2A (PRONG 1): Does the claim recite an abstract idea, law of nature, or natural phenomenon?
STEP 2A (PRONG 2): Does the claim recite additional elements that integrate the judicial exception into a practical application?
STEP 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?

Using the two-step inquiry, it is clear that claims 1-20 are directed to an abstract idea, as shown below.

STEP 1: Do the claims fall within one of the statutory categories? YES. Claims 1-20 are directed to an apparatus performing a method, a method, and an article performing a method.

STEP 2A (PRONG 1): Is the claim directed to a law of nature, a natural phenomenon or an abstract idea? YES, the claims are directed toward a mental process (i.e., an abstract idea).
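As an editorial aid for mapping the rejection to the claim language, here is a minimal PyTorch sketch of an apparatus of the shape recited in claim 1: an encoder producing an embedded state with positional tokens, a decoder reconstructing the signal, and a detection head reading the decoder's intermediate state. All module names, dimensions, and layer choices are illustrative assumptions, not the applicant's disclosed implementation.

```python
# Illustrative sketch only: maps claim 1's recited elements onto generic
# PyTorch modules. Dimensions, layer choices, and names are assumptions.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Transforms content-signal samples into an 'embedded state' of
    encoded samples plus tokens tied to positional references."""
    def __init__(self, in_dim=256, embed_dim=128, max_len=512):
        super().__init__()
        self.proj = nn.Linear(in_dim, embed_dim)      # encode samples
        self.pos = nn.Embedding(max_len, embed_dim)   # positional tokens

    def forward(self, samples):                        # (B, L, in_dim)
        pos_ids = torch.arange(samples.size(1), device=samples.device)
        return self.proj(samples) + self.pos(pos_ids)  # embedded state

class Decoder(nn.Module):
    """Transforms the embedded state back toward a reconstruction of the
    content signal, exposing an intermediate state for the detection head."""
    def __init__(self, embed_dim=128, out_dim=256):
        super().__init__()
        self.hidden = nn.Linear(embed_dim, embed_dim)
        self.out = nn.Linear(embed_dim, out_dim)

    def forward(self, embedded):
        intermediate = torch.relu(self.hidden(embedded))  # intermediate state
        return self.out(intermediate), intermediate      # reconstruction, tap

class DetectionHead(nn.Module):
    """'First neural network': consumes an input tensor populated with the
    decoder's intermediate state and emits an output tensor of detections."""
    def __init__(self, embed_dim=128, num_classes=10):
        super().__init__()
        self.cls = nn.Linear(embed_dim, num_classes)  # class scores
        self.box = nn.Linear(embed_dim, 4)            # locations (boxes)

    def forward(self, input_tensor):
        return self.cls(input_tensor), self.box(input_tensor)

# Wiring matching the claim: encoder -> decoder -> detection head.
enc, dec, head = Encoder(), Decoder(), DetectionHead()
samples = torch.randn(2, 512, 256)       # stand-in content-signal samples
recon, inter = dec(enc(samples))
class_scores, boxes = head(inter)
```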
With regard to STEP 2A (PRONG 1), the guidelines provide three groupings of subject matter that are considered abstract ideas:

Mathematical concepts: mathematical relationships, mathematical formulas or equations, mathematical calculations;
Certain methods of organizing human activity: fundamental economic principles or practices (including hedging, insurance, mitigating risk); commercial or legal interactions (including agreements in the form of contracts; legal obligations; advertising, marketing or sales activities or behaviors; business relations); managing personal behavior or relationships or interactions between people (including social activities, teaching, and following rules or instructions); and
Mental processes: concepts that are practicably performed in the human mind (including an observation, evaluation, judgment, opinion).

The apparatus in claim 1 comprises a mental process that can be practicably performed in the human mind (or by generic computers or components configured to perform the method performed by the apparatus) and is, therefore, an abstract idea.

Regarding claim 1: an encoder configured to transform samples of a content signal obtained from an electronic document retrieved from a memory to provide an embedded state, the embedded state comprising encoded samples and tokens associating the encoded samples with positional references in the content signal (the functions performed are mathematical formulas or fundamental concepts applied using standard hardware without a significant, inventive technological improvement); a decoder configured to transform the embedded state to provide a reconstruction of at least a portion of the content signal (the functions performed are mathematical formulas or fundamental concepts applied using standard hardware without a significant, inventive technological improvement); and one or more first neural networks to receive an input tensor populated with an intermediate state of the decoder, the one or more first neural networks to be configured to provide an output tensor comprising indications of detections of one or more features in the content signal based, at least in part, on the input tensor (the functions performed by a neural network with tensors are mathematical formulas or fundamental concepts applied using standard hardware without a significant, inventive technological improvement).

These limitations, as drafted, are a simple process that, under their broadest reasonable interpretation, covers performance of the limitations in the mind or by a human (or by a generic computer). The Examiner notes that under MPEP 2106.04(a)(2)(III), the courts consider a mental process (thinking) that "can be performed in the human mind, or by a human using a pen and paper" to be an abstract idea. CyberSource Corp. v. Retail Decisions, Inc., 654 F.3d 1366, 1372, 99 USPQ2d 1690, 1695 (Fed. Cir. 2011). As the Federal Circuit explained, "methods which can be performed mentally, or which are the equivalent of human mental work, are unpatentable abstract ideas, the 'basic tools of scientific and technological work' that are open to all." 654 F.3d at 1371, 99 USPQ2d at 1694 (citing Gottschalk v. Benson, 409 U.S. 63, 175 USPQ 673 (1972)). See also Mayo Collaborative Servs. v. Prometheus Labs. Inc., 566 U.S. 66, 71, 101 USPQ2d 1961, 1965 ("[M]ental processes and abstract intellectual concepts are not patentable, as they are the basic tools of scientific and technological work" (quoting Benson, 409 U.S. at 67, 175 USPQ at 675)); Parker v. Flook, 437 U.S. 584, 589, 198 USPQ 193, 197 (1978) (same).

STEP 2A (PRONG 2): Does the claim recite additional elements that integrate the judicial exception into a practical application? NO, the claims do not recite additional elements that integrate the judicial exception into a practical application.

With regard to STEP 2A (PRONG 2), whether the claim recites additional elements that integrate the judicial exception into a practical application, the guidelines provide the following exemplary considerations that are indicative that an additional element (or combination of elements) may have integrated the judicial exception into a practical application:

an additional element reflects an improvement in the functioning of a computer, or an improvement to other technology or technical field;
an additional element applies or uses a judicial exception to effect a particular treatment or prophylaxis for a disease or medical condition;
an additional element implements a judicial exception with, or uses a judicial exception in conjunction with, a particular machine or manufacture that is integral to the claim;
an additional element effects a transformation or reduction of a particular article to a different state or thing; and
an additional element applies or uses the judicial exception in some other meaningful way beyond generally linking the use of the judicial exception to a particular technological environment, such that the claim as a whole is more than a drafting effort designed to monopolize the exception.

While the guidelines further state that the exemplary considerations are not an exhaustive list and that there may be other examples of integrating the exception into a practical application, the guidelines also list examples in which a judicial exception has not been integrated into a practical application:

an additional element merely recites the words "apply it" (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea;
an additional element adds insignificant extra-solution activity to the judicial exception; and
an additional element does no more than generally link the use of a judicial exception to a particular technological environment or field of use.

Claim 1 does not recite any of the exemplary considerations that are indicative of an abstract idea having been integrated into a practical application.

STEP 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? NO, the claims do not recite additional elements that amount to significantly more than the judicial exception.

With regard to STEP 2B, whether the claims recite additional elements that provide significantly more than the recited judicial exception, the guidelines specify that the pre-guideline procedure is still in effect. Specifically, examiners should continue to consider whether an additional element or combination of elements:

adds a specific limitation or combination of limitations that are not well-understood, routine, conventional activity in the field, which is indicative that an inventive concept may be present; or
simply appends well-understood, routine, conventional activities previously known to the industry, specified at a high level of generality, to the judicial exception, which is indicative that an inventive concept may not be present.
Claim 1 does not recite any additional elements that are not well-understood, routine or conventional. The use of generic computer elements to "encode, decode and detect" as claimed in claim 1 is a routine, well-understood and conventional process that is performed by computers. Independent claims 9 and 19 contain the same steps included in claim 1; therefore, the same rationale pertains.

Thus, since claims 1-20 (a) are directed toward an abstract idea, (b) do not recite additional elements that integrate the judicial exception into a practical application, and (c) do not recite additional elements that amount to significantly more than the judicial exception, claims 1-20 are not eligible subject matter under 35 U.S.C. 101.

Regarding claim 2: the additional limitations do not integrate the mental process into a practical application or add significantly more to the mental process. The limitation(s): wherein the decoder comprises one or more second neural networks, and wherein the input tensor is populated with one or more intermediate states of the one or more second neural networks (the functions performed by a decoder, neural networks and tensors are mathematical formulas or fundamental concepts applied using standard hardware without a significant, inventive technological improvement).

Regarding claim 3: the additional limitations do not integrate the mental process into a practical application or add significantly more to the mental process. The limitation(s): wherein the encoder comprises one or more third neural networks to extract the samples of the content signal, wherein the input tensor is further populated with an intermediate state of the one or more third neural networks (the functions performed by an encoder, neural networks and tensors are mathematical formulas or fundamental concepts applied using standard hardware without a significant, inventive technological improvement).

Regarding claim 4: the additional limitations do not integrate the mental process into a practical application or add significantly more to the mental process. The limitation(s): wherein: the decoder comprises one or more second neural networks, and the input tensor is populated with one or more intermediate states of the one or more second neural networks; the encoder comprises one or more third neural networks to extract the samples of the content signal, the input tensor is further populated with an intermediate state of the one or more third neural networks; and the one or more second neural networks and the one or more third neural networks comprise weights to be applied at activation functions of the one or more second neural networks and the one or more third neural networks that are trained to reconstruct the content signal at an output of the decoder (the functions performed by an encoder, decoder, neural networks and tensors are mathematical formulas or fundamental concepts applied using standard hardware without a significant, inventive technological improvement). Claims 17 and 20 recite the same limitations; therefore, the same rationale is applicable.

Regarding claim 5: the additional limitations do not integrate the mental process into a practical application or add significantly more to the mental process.
The limitation(s): wherein: the content signal comprises one or more images; and the one or more first neural networks comprise weights to be applied at activation functions of the one or more first neural networks to provide inferences of classifications and locations of objects in the one or more images (the functions performed by neural networks and weights applied to an activation function are mathematical formulas or fundamental concepts applied using standard hardware without a significant, inventive technological improvement). Claim 18 recites the same limitations; therefore, the same rationale is applicable.

Regarding claim 6: the additional limitations do not integrate the mental process into a practical application or add significantly more to the mental process. The limitation(s): wherein: the content signal comprises one or more images; and the output tensor comprises indications of classifications and locations of objects in at least one of the one or more images (the functions performed by tensors are mathematical formulas or fundamental concepts applied using standard hardware without a significant, inventive technological improvement). Claim 13 recites the same limitations; therefore, the same rationale is applicable.

Regarding claim 7: the additional limitations do not integrate the mental process into a practical application or add significantly more to the mental process. The limitation(s): wherein the one or more first neural networks comprise a cascade mask region-based convolutional neural network (the functions performed by a neural network are mathematical formulas or fundamental concepts applied using standard hardware without a significant, inventive technological improvement; furthermore, the cascade mask region-based convolutional neural network is not performing a claimed function that is a significant, inventive technological improvement). Claim 15 recites the same limitations; therefore, the same rationale is applicable.

Regarding claim 8: the additional limitations do not integrate the mental process into a practical application or add significantly more to the mental process. The limitation(s): wherein the one or more first neural networks comprise a feature pyramid network (the functions performed by a neural network are mathematical formulas or fundamental concepts applied using standard hardware without a significant, inventive technological improvement; furthermore, the feature pyramid network is not performing a claimed function that is a significant, inventive technological improvement). Claim 16 recites the same limitations; therefore, the same rationale is applicable.

Regarding claim 10: the additional limitations do not integrate the mental process into a practical application or add significantly more to the mental process. The limitation(s): wherein: the parameters of the encoder and the decoder having been trained based, at least in part, on a gradient of a loss function comprising a reconstruction loss component; and the reconstruction loss component is based, at least in part, on a comparison of content signals in the one or more training sets and the reconstruction of content signals in the output state of the decoder (the functions performed by training an encoder and decoder on a gradient of a loss function are mathematical formulas or fundamental concepts applied using standard hardware without a significant, inventive technological improvement).
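Claim 10 above and claim 11 (discussed next) recite training encoder and decoder parameters on the gradient of a loss that combines a reconstruction component with a contrastive component computed from a cross-correlation of projections of encoded views. As an editorial aid only, here is a minimal PyTorch sketch of one loss of that shape; the Barlow Twins-style normalization, the weighting constants, and all function names are illustrative assumptions, not the applicant's disclosed training objective.

```python
# Illustrative sketch of a composite loss per claims 10-11: a reconstruction
# term plus a contrastive term built from the cross-correlation of
# projections of two encoded views. Weighting and normalization are assumed.
import torch
import torch.nn.functional as F

def reconstruction_loss(signal, reconstruction):
    # Claim 10: compare training-set content signals against the decoder's
    # reconstruction (mean-squared error is one common choice, assumed here).
    return F.mse_loss(reconstruction, signal)

def cross_correlation_loss(z_a, z_b, off_diag_weight=5e-3):
    # Claim 11: cross-correlate projections of two encoded views of the same
    # training signal; pull the diagonal toward 1 (invariance) and the
    # off-diagonal toward 0 (redundancy reduction), Barlow Twins style.
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-6)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-6)
    c = (z_a.T @ z_b) / z_a.size(0)                # (D, D) cross-correlation
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    off_diag = (c - torch.diag_embed(torch.diagonal(c))).pow(2).sum()
    return on_diag + off_diag_weight * off_diag

def total_loss(signal, reconstruction, z_view1, z_view2, lam=1.0):
    # The gradient of this scalar is what trains encoder/decoder parameters.
    return reconstruction_loss(signal, reconstruction) + \
           lam * cross_correlation_loss(z_view1, z_view2)

# Example shapes: a batch of 8 signals, 64-dim projections of two views.
x, x_hat = torch.randn(8, 256), torch.randn(8, 256)
za, zb = torch.randn(8, 64), torch.randn(8, 64)
loss = total_loss(x, x_hat, za, zb)
print(loss.item())
```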
Regarding claim 11: the additional limitations do not integrate the mental process into a practical application or add significantly more to the mental process. The limitation(s): wherein: loss function further comprises a contrastive loss component; and the contrastive loss component is determined based, at least in part, on: application of an instance of the decoder to multiple distinct views of a training set content signal to provide multiple encoded views; and computation of a cross-correlation of projections of at least two of the encoded views (the functions performed by a loss function with a contrastive loss component are mathematical formulas or fundamental concepts applied using standard hardware without a significant, inventive technological improvement).

Regarding claim 12: the additional limitations do not integrate the mental process into a practical application or add significantly more to the mental process. The limitation(s): and further comprising executing an extractor to provide the samples of the content signal based, at least in part, on the electronic document, and wherein the input tensor having been further populated with one or more intermediate states of the extractor (the functions performed by extracting samples and using tensors are mathematical formulas or fundamental concepts applied using standard hardware without a significant, inventive technological improvement).

Regarding claim 14: the additional limitations do not integrate the mental process into a practical application or add significantly more to the mental process. The limitation(s): wherein parameters of the encoder and decoder are further trained based, at least in part, on a gradient of at least a localization loss term and a classification loss term, the localization loss term and the classification loss term have been computed based on the output tensor and labeled data sets (the functions performed by training an encoder and decoder on a gradient of a localization loss term and a classification loss term and using tensors are mathematical formulas or fundamental concepts applied using standard hardware without a significant, inventive technological improvement).

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action: A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-6, 9-14 and 17-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Haghighi et al. (hereinafter Haghighi) (US 12,277,687).

Regarding claim 1: Haghighi discloses an encoder configured to transform samples of a content signal obtained from an electronic document retrieved from a memory (Fig. 1A, Original 113 and Transformed 114) to provide an embedded state ("Semantic Genesis explicitly benefits from the deep semantic features enriched by self-discovering and self-classifying anatomical patterns embedded in medical images, and thus contrasts with any other existing 3D models pre-trained by either self-supervision or full supervision." col. 12, ln. 20-24), the embedded state comprising encoded samples (Fig. 1A, encoder 110) and tokens (visual words) associating the encoded samples with positional references ("For simplicity and clarity, there are depicted here four coordinates in X-ray images as an example, specifically coordinate AR2-4 at element 103, coordinate AR1-3 at element 104, coordinate RPA at element 105, and coordinate LV at element 106. However, a different quantity of coordinates is permissible and expected. The input to the model as shown here is a transformed anatomical pattern crop 114, and the model is trained to classify the pseudo label and to recover the original crop 113, depicted here as the 'restored' crop at element 115. In such a way, the model aims to acquire semantics-enriched representation, producing more powerful application-specific target models." col. 6, ln. 8-20) in the content signal (Fig. 8A); a decoder (Fig. 1A, decoder 111) configured to transform the embedded state to provide a reconstruction of at least a portion of the content signal ("So as to permit the Semantic Genesis to restore 115 the transformed anatomical patterns 114, processing computes an L2 distance between the original pattern 113 and the reconstructed pattern via the loss function L_rec = (1/N) Σ_{i=1}^{N} ‖X_i − X′_i‖_2, where N denotes the batch size, X and X′ represent the ground truth (original anatomical pattern 113) and the reconstructed prediction, respectively." col. 9, ln. 23-35); and one or more first neural networks (Fig. 1A, Trained Encoder-Decoder Structure 114) to receive an input tensor (vector) populated with an intermediate state of the decoder ("Self-discovery of anatomical patterns: According to another embodiment, processing begins by building a set of anatomical patterns from medical images, as illustrated at FIG. 1A, via the self-discovery sub-component at element 109. An auto-encoder network is first trained with training data to extract deep features of each patient scan, which learns an identical mapping from scan to itself. Once trained, a latent representation vector from the auto-encoder may be used as an indicator of each patient. In such an embodiment, one patient is randomly anchored as a reference 101 and further processing then searches for the nearest neighbors (refer to element 107) to the randomly anchored patient through the entire dataset by computing an L2 distance of the latent representation vectors, resulting in a set of similar patients in appearance." col. 7, ln. 39-53), the one or more first neural networks (Fig. 1A, Trained Encoder-Decoder Structure 114) to be configured to provide an output tensor comprising indications of detections of one or more features in the content signal based, at least in part, on the input tensor ("In such an embodiment, one patient is randomly anchored as a reference 101 and further processing then searches for the nearest neighbors (refer to element 107) to the randomly anchored patient through the entire dataset by computing an L2 distance of the latent representation vectors, resulting in a set of similar patients in appearance." col. 7, ln. 48-53).
Regarding claim 2: Haghighi satisfies all the elements of claim 1. Haghighi further discloses wherein the decoder comprises one or more second neural networks ("As is further depicted by FIG. 1A at the self-classification sub-component depicted at element 117, the restoration branch encodes the input transformed anatomical pattern into a latent space and decodes back to the original resolution, with an aim to recover the original anatomical pattern 113 from the transformed one 114, resulting in the restored pattern 115." col. 9, ln. 23-35), and wherein the input tensor is populated with one or more intermediate states ("In such an embodiment, one patient is randomly anchored as a reference 101..." col. 7, ln. 48-53, quoted above) of the one or more second neural networks ("As is further depicted by FIG. 1A..." col. 9, ln. 23-35, quoted above).

Regarding claim 3: Haghighi satisfies all the elements of claim 1. Haghighi further discloses wherein the encoder comprises one or more third neural networks to extract the samples of the content signal (Fig. 1A, Self-classification of anatomical patterns categorical cross-entropy loss 116), wherein the input tensor is further populated with an intermediate state ("Self-discovery of anatomical patterns: According to another embodiment..." col. 7, ln. 39-53, quoted above) of the one or more third neural networks (Fig. 1A, Self-classification of anatomical patterns categorical cross-entropy loss 116).

Regarding claim 4: Haghighi satisfies all the elements of claim 1. Haghighi further discloses wherein: the decoder comprises one or more second neural networks ("As is further depicted by FIG. 1A..." col. 9, ln. 23-35, quoted above), and the input tensor is populated with one or more intermediate states ("In such an embodiment..." col. 7, ln. 48-53, quoted above) of the one or more second neural networks ("As is further depicted by FIG. 1A..." col. 9, ln. 23-35, quoted above); the encoder comprises one or more third neural networks to extract the samples of the content signal (Fig. 1A, Self-classification of anatomical patterns categorical cross-entropy loss 116), the input tensor is further populated with an intermediate state ("In such an embodiment..." col. 7, ln. 48-53, quoted above) of the one or more third neural networks (Fig. 1A, Self-classification of anatomical patterns categorical cross-entropy loss 116); and the one or more second neural networks and the one or more third neural networks (Fig. 1A, Self-classification of anatomical patterns categorical cross-entropy loss 116) comprise weights to be applied at activation functions ("According to another embodiment of method 600, the training comprises applying a multi-task loss function defined on each transformed anatomical pattern as L = λ_cls·L_cls + λ_rec·L_rec, where λ_cls and λ_rec regulate the weights of classification and reconstruction losses, respectively." col. 15, ln. 59-63) of the one or more second neural networks ("As is further depicted by FIG. 1A..." col. 9, ln. 23-35, quoted above) and the one or more third neural networks (Fig. 1A, Self-classification of anatomical patterns categorical cross-entropy loss 116) that are trained to reconstruct the content signal at an output of the decoder (Fig. 1A, Original 113 transformed to 114 and reconstructed (col. 9, ln. 23-35)).

Regarding claim 5: Haghighi satisfies all the elements of claim 4. Haghighi further discloses wherein: the content signal comprises one or more images (Fig. 1A); and the one or more first neural networks (Fig. 1A, Trained Encoder-Decoder Structure 114) comprise weights to be applied at activation functions ("According to another embodiment of method 600..." col. 15, ln. 59-63, quoted above) of the one or more first neural networks (Fig. 1A, Trained Encoder-Decoder Structure 114) to provide inferences (predictions) of classifications ("According to another embodiment of method 800, the system comprises an encoder-decoder network with a classification head at the end of the encoder; wherein a self-classification branch of the network encodes the input visual word into a latent space followed by a sequence of fully-connected (fc) layers; and wherein the classification branch predicts the Visual word ID associated with the visual word." col. 18, ln. 58-64) and locations of objects in the one or more images ("For simplicity and clarity, there are depicted here four coordinates in X-ray images as an example..." col. 6, ln. 8-20, quoted above).

Regarding claim 6: Haghighi satisfies all the elements of claim 1. Haghighi further discloses wherein: the content signal comprises one or more images (Fig. 1A); and the output tensor comprises indications of classifications and locations of objects in at least one of the one or more images ("In such an embodiment..." col. 7, ln. 48-53, quoted above).

Regarding claim 9: The structural elements of apparatus claim 1 perform all of the steps of method claim 9. Thus, claim 9 is rejected for the same reasons discussed in the rejection of claim 1.

Regarding claim 10: Haghighi satisfies all the elements of claim 9. Haghighi further discloses wherein: the parameters of the encoder and the decoder having been trained based, at least in part, on a gradient of a loss function comprising a reconstruction loss component ("So as to permit the Semantic Genesis to restore 115 the transformed anatomical patterns 114..." col. 9, ln. 23-35, quoted above); and the reconstruction loss component is based, at least in part, on a comparison of content signals in the one or more training sets and the reconstruction of content signals in the output state of the decoder (Fig. 1A).

Regarding claim 11: Haghighi satisfies all the elements of claim 10. Haghighi further discloses wherein: loss function further comprises a contrastive loss component (Fig. 1A, Self-restoration of anatomical patterns with L2 norm loss 117); and the contrastive loss component is determined based, at least in part, on: application of an instance of the decoder to multiple distinct views of a training set content signal to provide multiple encoded views (Fig. 1A, Self-discovery of anatomical patterns 109); and computation of a cross-correlation of projections of at least two of the encoded views (Fig. 1A, Self-classification of anatomical patterns with categorical cross-entropy loss 116).

Regarding claim 12: Haghighi satisfies all the elements of claim 9. Haghighi further discloses and further comprising executing an extractor to provide the samples of the content signal based, at least in part, on the electronic document ("Further processing crops anatomical patterns from random yet fixed coordinates, and assigns the pseudo labels to the cropped anatomical patterns according to their coordinates. For instance, the top nearest neighbors of the reference patient are measured via their deep latent features that are extracted using an auto-encoder model 108 (e.g., refer to element 107 corresponding to boxed images in the middle row to the right of the reference patient's boxed image at element 101)." col. 5, ln. 66 through col. 6, ln. 7), and wherein the input tensor having been further populated with one or more intermediate states ("Self-discovery of anatomical patterns: According to another embodiment..." col. 7, ln. 39-53, quoted above) of the extractor (Fig. 1A, auto-encoder model 108).

Regarding claim 13: Haghighi satisfies all the elements of claim 9. Haghighi further discloses wherein the output tensor comprises classifications and locations of objects detected in the content signal ("In such an embodiment..." col. 7, ln. 48-53, quoted above).

Regarding claim 14: Haghighi satisfies all the elements of claim 13.
Haghighi further discloses wherein parameters of the encoder and decoder are further trained based, at least in part, on a gradient of at least a localization loss term and a classification loss term ("So as to permit the Semantic Genesis to restore 115 the transformed anatomical patterns 114..." col. 9, ln. 23-35, quoted above), the localization loss term and the classification loss term have been computed based on the output tensor ("In such an embodiment..." col. 7, ln. 48-53, quoted above) and labeled data sets (Fig. 1A).

Regarding claim 17: Haghighi satisfies all the elements of claim 9. The structural elements of apparatus claim 4 perform all of the steps of method claim 17. Thus, claim 17 is rejected for the same reasons discussed in the rejection of claim 4.

Regarding claim 18: Haghighi satisfies all the elements of claim 17. The structural elements of apparatus claim 5 perform all of the steps of method claim 18. Thus, claim 18 is rejected for the same reasons discussed in the rejection of claim 5.

Regarding claim 19: Arguments analogous to those stated in the rejection of claim 1 are applicable. A non-transitory storage medium comprising computer-readable instructions stored thereon is inherently taught as evidenced by Fig. 1A and the various memories described therein.

Regarding claim 20: Haghighi satisfies all the elements of claim 19. Arguments analogous to those stated in the rejection of claim 4 are applicable.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 7 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Haghighi in view of Kobayashi (US 2019/0057504 A1).

Regarding claim 7: Haghighi satisfies all the elements of claim 1. Haghighi further discloses the one or more first neural networks (Fig. 1A, Trained Encoder-Decoder Structure 114). Haghighi fails to specifically address that the one or more first neural networks comprise a cascade mask region-based convolutional neural network. Kobayashi discloses a cascade mask region-based convolutional neural network ("The mode according to the second modification can be implemented, for example, by providing an output element for each pixel area of the medical image in a classifying member Nb in the CNN (also referred to as regional convolutional neural network (R-CNN))." par. 90). It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to include a cascade mask region-based convolutional neural network in order to make it easier to distinguish an area to be noticed in a medical image, as taught by Kobayashi (par. 93).

Regarding claim 15: Haghighi satisfies all the elements of claim 9. The structural elements of apparatus claim 7 perform all of the steps of method claim 15. Thus, claim 15 is rejected for the same reasons discussed in the rejection of claim 7.

Claims 8 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Haghighi in view of Kobayashi and further in view of Yan et al. (hereinafter Yan) (US 2021/0224603 A1).

Regarding claim 8: Haghighi in view of Kobayashi satisfy all the elements of claim 7. Haghighi further discloses the one or more first neural networks (Fig. 1A, Trained Encoder-Decoder Structure 114). Haghighi in view of Kobayashi fails to specifically address that the one or more first neural networks comprise a feature pyramid network. Yan discloses a feature pyramid network ("Referring back to FIG. 2, Step S220 is to process the medical image to predict lesion proposals and generate cropped feature maps corresponding to the lesion proposals. In certain embodiments, prior to performing organ-specific detections, an overall neural network processing may be performed to extract features from the input image and generate lesion proposals. As shown in FIG. 6A, in certain embodiments, the overall processing may include: processing the input image with a 2.5-dimensional (2.5D) feature pyramid network (FPN) to generate a feature map, processing the generated feature map with a region proposal network (RPN) to predict lesion proposals, and applying a region-of-interest alignment (RoIAlign) layer to generate a cropped feature map for each lesion proposal." par. 56). It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Haghighi in view of Kobayashi to include a feature pyramid network in order to generate a feature map for proposals, as taught by Yan (par. 56).

Regarding claim 16: Haghighi in view of Kobayashi satisfy all the elements of claim 15. The structural elements of apparatus claim 8 perform all of the steps of method claim 16. Thus, claim 16 is rejected for the same reasons discussed in the rejection of claim 8.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHARLOTTE M BAKER, whose telephone number is (571) 272-7459. The examiner can normally be reached Mon - Fri 8:00-5:00. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, JENNIFER MEHMOOD can be reached at (571)272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /CHARLOTTE M BAKER/Primary Examiner, Art Unit 2664
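Editorial context for the §103 combination above: claims 7-8 and 15-16 add a cascade mask region-based convolutional neural network and a feature pyramid network to the claimed detection path. For a generic point of reference only, torchvision ships the related (non-cascade) base model, Mask R-CNN over an FPN backbone; this sketch is neither the claimed combination nor the cited art.

```python
# Generic reference only: Mask R-CNN with a feature pyramid network backbone
# from torchvision. Claims 7-8 recite a *cascade* mask R-CNN, which
# torchvision does not ship; this shows the related base model.
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

model = maskrcnn_resnet50_fpn(weights=None, num_classes=91)
model.eval()

images = [torch.rand(3, 480, 640)]       # one dummy RGB image in [0, 1)
with torch.no_grad():
    outputs = model(images)               # list of per-image dicts
print(outputs[0].keys())                  # boxes, labels, scores, masks
```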

Prosecution Timeline

Sep 26, 2023
Application Filed
Dec 04, 2025
Non-Final Rejection — §101, §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602905
A Computer Software Module Arrangement, a Circuitry Arrangement, an Arrangement and a Method for Improved Object Detection Adapting the Detection through Shifting the Image
2y 5m to grant · Granted Apr 14, 2026
Patent 12585654
Dynamic Vision System for Robot Fleet Management
2y 5m to grant · Granted Mar 24, 2026
Patent 12579900
UAV PERCEPTION VALIDATION BASED UPON A SEMANTIC AGL ESTIMATE
2y 5m to grant · Granted Mar 17, 2026
Patent 12548331
TECHNIQUES TO PERFORM TRAJECTORY PREDICTIONS
2y 5m to grant · Granted Feb 10, 2026
Patent 12543924
MEDICAL SUPPORT SYSTEM, MEDICAL SUPPORT DEVICE, AND MEDICAL SUPPORT METHOD
2y 5m to grant · Granted Feb 10, 2026
Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 93%
With Interview: 93% (-0.2%)
Median Time to Grant: 2y 2m
PTA Risk: Low
Based on 1067 resolved cases by this examiner. Grant probability derived from career allow rate.
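The note above states the derivation; it reproduces directly from the career counts shown on this page. A quick sketch; treating the -0.2% interview lift as an absolute subtraction is an assumption about how the dashboard combines the two figures.

```python
# Minimal sketch reproducing the dashboard's stated derivation:
# grant probability is taken from the examiner's career allow rate.
granted, resolved = 991, 1067              # counts from Examiner Intelligence
allow_rate = granted / resolved            # 0.9288... -> displayed as 93%

# Assumption: the -0.2% interview lift is applied as an absolute delta.
with_interview = allow_rate - 0.002
print(f"allow rate: {allow_rate:.1%}")          # 92.9%
print(f"with interview: {with_interview:.1%}")  # 92.7% (rounds to 93%)
```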
