Prosecution Insights
Last updated: April 19, 2026
Application No. 18/402,785

SELF-ATTENTION IN FREQUENCY DOMAIN FOR IMAGE SEGMENTATION

Non-Final OA — §101, §103
Filed: Jan 03, 2024
Examiner: SHIFERAW, HENOK ASRES
Art Unit: 2676
Tech Center: 2600 — Communications
Assignee: International Business Machines Corporation
OA Round: 1 (Non-Final)

Grant Probability: 90% (Favorable)
Expected OA Rounds: 1-2
Estimated Time to Grant: 1y 10m
Grant Probability with Interview: 91%

Examiner Intelligence

Career Allow Rate: 90% (above average; 518 granted / 578 resolved; +27.6% vs TC avg)
Interview Lift: +1.5% (minimal, ~2% lift) across resolved cases with interview
Avg Prosecution: 1y 10m (fast prosecutor)
Career History: 597 total applications across all art units; 19 currently pending

Statute-Specific Performance

§101: 12.3% (-27.7% vs TC avg)
§103: 72.7% (+32.7% vs TC avg)
§102: 6.2% (-33.8% vs TC avg)
§112: 4.0% (-36.0% vs TC avg)

Based on career data from 578 resolved cases; TC averages are estimates.

Office Action

Rejections: §101, §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first-inventor-to-file provisions of the AIA.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 01/03/2024 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the IDS is being considered by the examiner.

Claim Objections

Claims 1, 8, and 15 are objected to because of the following informalities:
- In claim 1, line 5, "the frequency domain" should read "a frequency domain"
- In claim 8, line 7, "the frequency domain" should read "a frequency domain"
- In claim 15, line 10, "the frequency domain" should read "a frequency domain"
Appropriate correction is required.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1–20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Under their broadest reasonable interpretation, the limitations cover a mental process (a concept that can be performed in the human mind, including an observation, evaluation, judgment, or opinion) as well as mathematical concepts and calculations. Independent claims 1, 8, and 15 recite a computer-implemented method, a computer program product, and a computer system, respectively. 
This judicial exception is not integrated into a practical application because the steps do not add meaningful limitations showing that the exception is applied to a particular technological problem. The claims do not include additional elements sufficient to amount to significantly more than the judicial exception, because the steps of the claimed invention can be performed mentally and no features in the claims would preclude them from being performed as such, except for generic computer elements recited at a high level of generality (i.e., processor, memory).

According to the USPTO guidelines, a claim is directed to non-statutory subject matter if:
STEP 1: the claim does not fall within one of the four statutory categories of invention (process, machine, manufacture, or composition of matter); or
STEP 2: the claim recites a judicial exception, e.g., an abstract idea, without reciting additional elements that amount to significantly more than the judicial exception, as determined using the following analysis:
STEP 2A (PRONG 1): Does the claim recite an abstract idea, law of nature, or natural phenomenon?
STEP 2A (PRONG 2): Does the claim recite additional elements that integrate the judicial exception into a practical application?
STEP 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?

Using this two-step inquiry, independent claims 1, 8, and 15 are directed to an abstract idea, as shown below.

STEP 1: Do the claims fall within one of the statutory categories? YES. Independent claims 1 and 15 are directed to a process and a system, respectively.

STEP 2A (PRONG 1): Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea? YES, the claims are directed toward a mental process (i.e., an abstract idea: mathematical calculations). 
With regard to STEP 2A (PRONG 1), the guidelines provide three groupings of subject matter that are considered abstract ideas:
- Mathematical concepts: mathematical relationships, mathematical formulas or equations, mathematical calculations;
- Certain methods of organizing human activity: fundamental economic principles or practices (including hedging, insurance, mitigating risk); commercial or legal interactions (including agreements in the form of contracts; legal obligations; advertising, marketing or sales activities or behaviors; business relations); managing personal behavior or relationships or interactions between people (including social activities, teaching, and following rules or instructions); and
- Mental processes: concepts that can practicably be performed in the human mind (including an observation, evaluation, judgment, opinion).

Independent claims 1, 8, and 15 comprise mathematical calculations that can practicably be performed in the human mind (or by generic computers or components configured to perform the method) and are, therefore, an abstract idea. 
Regarding independent claims 1, 8, and 15, the limitations recite: "inputting the image file into a deep learning model, wherein the deep learning model includes multiple blocks, each block of the multiple blocks including a Hartley transform, mixing of features in the frequency domain with a set of learnable parameters to produce new features, and an inverse of the Hartley transform" (claim 1) and "program instructions to input the image file into a deep learning model, wherein the deep learning model includes multiple blocks, each block of the multiple blocks including a Hartley transform, mixing of features in the frequency domain with a set of learnable parameters to produce new features, and an inverse of the Hartley transform" (claims 8 and 15) (mathematical concepts: mathematical relationships, mathematical formulas or equations, mathematical calculations). These limitations, as drafted, recite a simple process that, under the broadest reasonable interpretation, covers performance of the limitations in the mind or by a human. The Examiner notes that under MPEP 2106.04(a)(2)(III), the courts consider a mental process (thinking) that "can be performed in the human mind, or by a human using a pen and paper" to be an abstract idea. CyberSource Corp. v. Retail Decisions, Inc., 654 F.3d 1366, 1372, 99 USPQ2d 1690, 1695 (Fed. Cir. 2011). As the Federal Circuit explained, "methods which can be performed mentally, or which are the equivalent of human mental work, are unpatentable abstract ideas, the 'basic tools of scientific and technological work' that are open to all." 654 F.3d at 1371, 99 USPQ2d at 1694 (citing Gottschalk v. Benson, 409 U.S. 63, 175 USPQ 673 (1972)). See also Mayo Collaborative Servs. v. Prometheus Labs., Inc., 566 U.S. 66, 71, 101 USPQ2d 1961, 1965 (2012) ("'[M]ental processes[] and abstract intellectual concepts are not patentable, as they are the basic tools of scientific and technological work'" (quoting Benson, 409 U.S. at 67, 175 USPQ at 675)); Parker v. Flook, 437 U.S. 584, 589, 198 USPQ 193, 197 (1978) (same). As such, a person could mentally calculate the Hartley transform to determine new frequency features. The mere nominal recitation that the various steps are executed by a computer program product or a computer system does not take the limitations out of the mental-process grouping. Thus, the claims recite a mental process.

STEP 2A (PRONG 2): Does the claim recite additional elements that integrate the judicial exception into a practical application? NO, the claims do not recite additional elements that integrate the judicial exception into a practical application. With regard to STEP 2A (PRONG 2), the guidelines provide the following exemplary considerations that are indicative that an additional element (or combination of elements) may have integrated the judicial exception into a practical application:
- an additional element reflects an improvement in the functioning of a computer, or an improvement to other technology or technical field;
- an additional element applies or uses a judicial exception to effect a particular treatment or prophylaxis for a disease or medical condition;
- an additional element implements a judicial exception with, or uses a judicial exception in conjunction with, a particular machine or manufacture that is integral to the claim;
- an additional element effects a transformation or reduction of a particular article to a different state or thing; and
- an additional element applies or uses the judicial exception in some other meaningful way beyond generally linking the use of the judicial exception to a particular technological environment, such that the claim as a whole is more than a drafting effort designed to monopolize the exception. 
While the guidelines state that the exemplary considerations are not an exhaustive list and that there may be other examples of integrating the exception into a practical application, the guidelines also list examples in which a judicial exception has not been integrated into a practical application:
- an additional element merely recites the words "apply it" (or an equivalent) with the judicial exception, merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea;
- an additional element adds insignificant extra-solution activity to the judicial exception; and
- an additional element does no more than generally link the use of a judicial exception to a particular technological environment or field of use.

Independent claims 1, 8, and 15 do not recite any of the exemplary considerations that are indicative of an abstract idea having been integrated into a practical application. Independent claims 1, 8, and 15 disclose "accessing an image file" (claim 1), "outputting another image file containing segmentation results of the accessed image file" (claim 1), a computer program product including computer readable storage media (claims 8 and 15), a processor (claim 15), "program instructions to accessing an image file" (claims 8 and 15), and "program instructions to output another image file containing segmentation results of the accessed image file" (claims 8 and 15), which are generic computer components and/or insignificant pre/post-solution activity that do not add a meaningful limitation to the abstract idea, because they amount to simply implementing the abstract idea in a method and system. These limitations are recited at a high level of generality ("a deep learning model"), i.e., as a general action or change taken based on the results of the acquiring step, and amount to mere post-solution actions, a form of insignificant extra-solution activity. 
Further, the additional elements are claimed generically and operate in their ordinary capacity, such that they do not use the judicial exception in a manner that imposes a meaningful limit on it. Accordingly, even in combination, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.

STEP 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? NO, the claims do not recite additional elements that amount to significantly more than the judicial exception. With regard to STEP 2B, the guidelines specify that the pre-guideline procedure is still in effect: examiners should continue to consider whether an additional element or combination of elements:
- adds a specific limitation or combination of limitations that are not well-understood, routine, conventional activity in the field, which is indicative that an inventive concept may be present; or
- simply appends well-understood, routine, conventional activities previously known to the industry, specified at a high level of generality, to the judicial exception, which is indicative that an inventive concept may not be present.

Independent claims 1, 8, and 15 do not recite any additional elements that are not well-understood, routine, or conventional. The use of generic computer elements is a routine, well-understood, and conventional process performed by computers. 
Thus, since independent claims 1, 8, and 15 (a) are directed toward an abstract idea, (b) do not recite additional elements that integrate the judicial exception into a practical application, and (c) do not recite additional elements that amount to significantly more than the judicial exception, independent claims 1, 8, and 15 are not eligible subject matter under 35 U.S.C. 101.

Regarding claims 2–7, 9–14, and 16–20: the additional limitations do not integrate the mental process into a practical application or add significantly more to it. The limitations "wherein the mixing of features are performed at each frequency by learnable parameters to produce new feature" (claims 2, 9, and 16), "mixing the new features of different frequencies by self-attention of Transformers to produce another set of new features" (claims 3 and 10), "wherein the set of learnable parameters is shared by different frequencies" (claims 4 and 11), "improving convergence and accuracy by using residual connections and deep supervision" (claims 5 and 12), "wherein the deep learning model includes a convolutional layer sequentially after an input layer for input downsampling" (claims 6 and 13), and "an output transposed convolutional layer for output upsampling" (claims 7 and 14) are mental processes that include insignificant pre/post-solution activity of generating data and generic computers or components configured to perform the abstract idea.

Claims 8–14 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. Claim 8 is drawn to "a computer program product," but neither the claim nor the disclosure limits the medium to statutory embodiments. 
According to MPEP 2106.03, "for all categories except process claims, the eligible subject matter must exist in some physical or tangible form." Under the broadest reasonable interpretation in light of the specification, the claim merely recites a "computer program product," which could be software per se. Therefore, claim 8 as a whole is directed toward software per se and does not fall within a statutory category. Claims 9–14 are rejected due to their dependency.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1–2, 4, 8–9, 11, 15–16, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (Wang, Wenxuan, et al., "FreMAE: Fourier Transform Meets Masked Autoencoders for Medical Image Segmentation," arXiv preprint arXiv:2304.10864 (2023)) (hereafter, "Wang") in view of Shi et al. (CN 113850304 B) (hereafter, "Shi"). 
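For reference, the discrete Hartley transform recited in the claims has the standard textbook definition below (not quoted from the application or the references); note that it is real-valued and, up to a 1/N factor, its own inverse:

```latex
H(k) = \sum_{n=0}^{N-1} x(n)\,\operatorname{cas}\!\left(\frac{2\pi k n}{N}\right),
\qquad \operatorname{cas}(\theta) = \cos\theta + \sin\theta,
\qquad x(n) = \frac{1}{N}\sum_{k=0}^{N-1} H(k)\,\operatorname{cas}\!\left(\frac{2\pi k n}{N}\right).
```

Because the forward and inverse transforms share the same kernel, a single transform routine can serve both directions in a claimed block.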
Regarding claim 1, Wang discloses a computer-implemented method [the proposed method is implemented in PyTorch and trained with two NVIDIA GeForce RTX 3090 GPUs; pg. 6, left to right column, Implementation Details, first paragraph] comprising:

accessing an image file [given an input medical image slice X ∈ R^(C×H×W) with a spatial resolution of H × W and C channels (# of modalities), the proposed foreground masking strategy is first employed on the original image slice to generate the masked image; pg. 3–4, right column to left column, Overall Architecture, first paragraph];

inputting the image file into a deep learning model [Figure 3; the generic encoder (i.e., according to different pre-training requirements, both CNN and Transformer encoders can be integrated into the framework) takes the masked image as input; pg. 4, left column, Overall Architecture, first paragraph], wherein the deep learning model includes multiple blocks [Figure 3; for the fused feature of each semantic level, an FMB (the examiner interprets an FMB to be a block) is applied respectively to learn its recessive information in the frequency domain; pg. 4, Figure 3 caption of section Overall Architecture], [each block of the multiple blocks including a Hartley transform], mixing of features in the frequency domain with a set of learnable parameters [W and b are both learnable parameters; pg. 5, right column, Multi-stage Supervision Scheme, second paragraph] to produce new features [Figure 3; the aggregated feature representations at the lowest stage and highest stage will be mapped to the frequency domain through the introduced frequency mapping block (as illustrated in Fig. 3), which are followed by the low-pass and high-pass filters to get the corresponding high-pass and low-pass prediction spectrum; pg. 5, right column, Multi-stage Supervision Scheme, second paragraph], [and an inverse of the Hartley transform]; and

outputting another image file containing segmentation results of the accessed image file [Figures 4–5; the skin lesion segmentation results on the ISIC 2018 dataset are presented in Fig. 4; pg. 8, left column, Segmentation Results, first paragraph].

[Figure 3 of the Wang reference, showing the overall architecture with multiple FMBs (Frequency Mapping Blocks).]

Wang fails to explicitly disclose [inputting the image file into a deep learning model, wherein the deep learning model includes multiple blocks], each block of the multiple blocks including a Hartley transform, [mixing of features in the frequency domain with a set of learnable parameters to produce new features], and an inverse of the Hartley transform.

However, Shi discloses [inputting the image file into a deep learning model, wherein the deep learning model includes multiple blocks], each block of the multiple blocks including a Hartley transform [Figure 1; the DHT differential pooling algorithm includes: input features, discrete Hartley transformation ... the DHT difference pooling module stores the DHT difference pooling algorithm, which can convert the feature relationship obtained in step 3 from the spatial domain to the frequency domain after discrete Hartley transformation; paras. 0054, 0052], [mixing of features in the frequency domain with a set of learnable parameters to produce new features], and an inverse of the Hartley transform [the DHT differential pooling algorithm includes the following processes: input features, discrete Hartley transformation, centering, clipping target size, and Hartley inverse transformation ... finally, convert it from the frequency domain to the spatial domain through inverse transformation; paras. 0054, 0052]. 
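As a technical aside, the claimed block (Hartley transform, mixing in the frequency domain with learnable parameters, inverse Hartley transform) can be sketched as follows. This is an illustrative reconstruction from the claim language only, not code from the application or the cited references; the function names and parameter shapes are hypothetical. It uses the identity DHT(x) = Re(FFT(x)) - Im(FFT(x)) and the fact that the DHT is, up to a normalization factor, its own inverse.

```python
import numpy as np

def dht2(x):
    """2-D discrete Hartley transform via the identity DHT = Re(FFT) - Im(FFT)."""
    F = np.fft.fft2(x)
    return F.real - F.imag

def idht2(X):
    """Inverse 2-D DHT: the DHT is self-inverse up to a 1/(H*W) normalization."""
    h, w = X.shape[-2:]
    return dht2(X) / (h * w)

def hartley_mixing_block(x, W, b):
    """One claim-style block: DHT -> mix with learnable params -> inverse DHT.

    W and b stand in for the claimed learnable frequency-domain parameters
    (hypothetical shapes: elementwise, same spatial shape as x)."""
    X = dht2(x)       # features -> frequency domain
    X = W * X + b     # mix features at each frequency with learnable parameters
    return idht2(X)   # new features back in the spatial domain

# Sanity check: with W = 1 and b = 0 the block reduces to a numerical no-op.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
y = hartley_mixing_block(x, W=np.ones((8, 8)), b=np.zeros((8, 8)))
print(np.allclose(x, y))  # True
```

In a trained model, W and b would be learned per frequency (or shared across frequencies, as claims 4, 11, and 18 recite), which is the form of the FMB equations the examiner cites from Wang.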
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Wang to incorporate the teachings of Shi in order to reduce the loss from direct pooling and increase segmentation accuracy, as recognized by Shi. Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Shi with Wang to obtain the invention as specified in claim 1.

Regarding claim 2, which incorporates claim 1, Wang discloses wherein the mixing of features is performed at each frequency [Figures 2–3; the high-frequency components and low-frequency counterparts, respectively, are acquired by applying the corresponding high/low-pass filters on the whole Fourier spectrum; pg. 2, left column, Figure 2 caption of section 1. Introduction ... high-level and low-level information of an image distribute in different frequency bands of the Fourier spectrum, so the low-pass and high-pass Fourier spectra are separately taken advantage of; pg. 5, left column, Multi-stage Supervision Scheme, second paragraph] to produce new features [the aggregated feature representations at the lowest stage and highest stage will be mapped to the frequency domain through the introduced frequency mapping block (as illustrated in Fig. 3), which are followed by the low-pass and high-pass filters to get the corresponding high-pass and low-pass prediction spectrum; pg. 5, right column, Multi-stage Supervision Scheme, second paragraph].

Regarding claim 4, which incorporates claim 1, Wang discloses wherein the set of learnable parameters [W and b are both learnable parameters; pg. 5, right column, Multi-stage Supervision Scheme, second paragraph] is shared by different frequencies [Figure 3 & Equations 8–9; the frequency mapping block (FMB) consists of a 2D-DFT, a Frequency Domain Perception (FDP), and a 2D-IDFT, which can be calculated as: Plow = IDFT(W ⊙ DFT(Alow) + b), Phigh = IDFT(W ⊙ DFT(Ahigh) + b) ... the aggregated feature representations at the lowest stage and highest stage will be mapped to the frequency domain through the introduced frequency mapping block (as illustrated in Fig. 3), which are followed by the low-pass and high-pass filters to get the corresponding high-pass and low-pass prediction spectrum; pg. 5, right column, Multi-stage Supervision Scheme, second paragraph].

Regarding claim 8, Wang discloses a computer program product comprising: one or more computer readable storage media and program instructions stored on the one or more computer readable storage media [the proposed method is implemented in PyTorch and trained with two NVIDIA GeForce RTX 3090 GPUs; pg. 6, left to right column, Implementation Details, first paragraph], the program instructions comprising:

program instructions to accessing an image file [given an input medical image slice X ∈ R^(C×H×W) with a spatial resolution of H × W and C channels (# of modalities), the proposed foreground masking strategy is first employed on the original image slice to generate the masked image; pg. 3–4, right column to left column, Overall Architecture, first paragraph];

program instructions to input the image file into a deep learning model [Figure 3; the generic encoder (i.e., according to different pre-training requirements, both CNN and Transformer encoders can be integrated into the framework) takes the masked image as input; pg. 4, left column, Overall Architecture, first paragraph], wherein the deep learning model includes multiple blocks [Figure 3; for the fused feature of each semantic level, an FMB (the examiner interprets an FMB to be a block) is applied respectively to learn its recessive information in the frequency domain; pg. 4, Figure 3 caption of section Overall Architecture], [each block of the multiple blocks including a Hartley transform], mixing of features in the frequency domain with a set of learnable parameters [W and b are both learnable parameters; pg. 5, right column, Multi-stage Supervision Scheme, second paragraph] to produce new features [Figure 3; the aggregated feature representations at the lowest stage and highest stage will be mapped to the frequency domain through the introduced frequency mapping block (as illustrated in Fig. 3), which are followed by the low-pass and high-pass filters to get the corresponding high-pass and low-pass prediction spectrum; pg. 5, right column, Multi-stage Supervision Scheme, second paragraph], [and an inverse of the Hartley transform]; and

program instructions to output another image file containing segmentation results of the accessed image file [Figures 4–5; the skin lesion segmentation results on the ISIC 2018 dataset are presented in Fig. 4; pg. 8, left column, Segmentation Results, first paragraph].

Wang fails to explicitly disclose [program instructions to input the image file into a deep learning model, wherein the deep learning model includes multiple blocks], each block of the multiple blocks including a Hartley transform, [mixing of features in the frequency domain with a set of learnable parameters to produce new features], and an inverse of the Hartley transform. 
However, Shi discloses [program instructions to input the image file into a deep learning model, wherein the deep learning model includes multiple blocks], each block of the multiple blocks including a Hartley transform [Figure 1; the DHT differential pooling algorithm includes: input features, discrete Hartley transformation ... the DHT difference pooling module stores the DHT difference pooling algorithm, which can convert the feature relationship obtained in step 3 from the spatial domain to the frequency domain after discrete Hartley transformation; paras. 0054, 0052], [mixing of features in the frequency domain with a set of learnable parameters to produce new features], and an inverse of the Hartley transform [the DHT differential pooling algorithm includes the following processes: input features, discrete Hartley transformation, centering, clipping target size, and Hartley inverse transformation ... finally, convert it from the frequency domain to the spatial domain through inverse transformation; paras. 0054, 0052].

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Wang to incorporate the teachings of Shi in order to reduce the loss from direct pooling and increase segmentation accuracy, as recognized by Shi. Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Shi with Wang to obtain the invention as specified in claim 8.

Regarding claim 9 (drawn to a computer program product), the proposed combination of Wang in view of Shi explained in the rejection of method claim 2 renders obvious the steps of computer program product claim 9, because these steps occur in the operation of the method as discussed above. 
Thus, the arguments presented above for claim 2 are equally applicable to claim 9.

Regarding claim 11 (drawn to a computer program product), the proposed combination of Wang in view of Shi explained in the rejection of method claim 4 renders obvious the steps of computer program product claim 11, because these steps occur in the operation of the method as discussed above. Thus, the arguments presented above for claim 4 are equally applicable to claim 11.

Regarding claim 15, Wang discloses a computer system comprising: one or more computer processors; one or more computer readable storage media [the proposed method is implemented in PyTorch and trained with two NVIDIA GeForce RTX 3090 GPUs; pg. 6, left to right column, Implementation Details, first paragraph]; and program instructions stored on the one or more computer readable storage media for execution by at least one of the one or more computer processors, the program instructions comprising:

program instructions to accessing an image file [given an input medical image slice X ∈ R^(C×H×W) with a spatial resolution of H × W and C channels (# of modalities), the proposed foreground masking strategy is first employed on the original image slice to generate the masked image; pg. 3–4, right column to left column, Overall Architecture, first paragraph];

program instructions to input the image file into a deep learning model [Figure 3; the generic encoder (i.e., according to different pre-training requirements, both CNN and Transformer encoders can be integrated into the framework) takes the masked image as input; pg. 4, left column, Overall Architecture, first paragraph], wherein the deep learning model includes multiple blocks [Figure 3; for the fused feature of each semantic level, an FMB (the examiner interprets an FMB to be a block) is applied respectively to learn its recessive information in the frequency domain; pg. 4, Figure 3 caption of section Overall Architecture], [each block of the multiple blocks including a Hartley transform], mixing of features in the frequency domain with a set of learnable parameters [W and b are both learnable parameters; pg. 5, right column, Multi-stage Supervision Scheme, second paragraph] to produce new features [Figure 3; the aggregated feature representations at the lowest stage and highest stage will be mapped to the frequency domain through the introduced frequency mapping block (as illustrated in Fig. 3), which are followed by the low-pass and high-pass filters to get the corresponding high-pass and low-pass prediction spectrum; pg. 5, right column, Multi-stage Supervision Scheme, second paragraph], [and an inverse of the Hartley transform]; and

program instructions to output another image file containing segmentation results of the accessed image file [Figures 4–5; the skin lesion segmentation results on the ISIC 2018 dataset are presented in Fig. 4; pg. 8, left column, Segmentation Results, first paragraph].

Wang fails to explicitly disclose [program instructions to input the image file into a deep learning model, wherein the deep learning model includes multiple blocks], each block of the multiple blocks including a Hartley transform, [mixing of features in the frequency domain with a set of learnable parameters to produce new features], and an inverse of the Hartley transform.

However, Shi discloses [program instructions to input the image file into a deep learning model, wherein the deep learning model includes multiple blocks], each block of the multiple blocks including a Hartley transform [Figure 1; the DHT differential pooling algorithm includes: input features, discrete Hartley transformation ... the DHT difference pooling module stores the DHT difference pooling algorithm, which can convert the feature relationship obtained in step 3 from the spatial domain to the frequency domain after discrete Hartley transformation; paras. 0054, 0052], [mixing of features in the frequency domain with a set of learnable parameters to produce new features], and an inverse of the Hartley transform [the DHT differential pooling algorithm includes the following processes: input features, discrete Hartley transformation, centering, clipping target size, and Hartley inverse transformation ... finally, convert it from the frequency domain to the spatial domain through inverse transformation; paras. 0054, 0052].

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Wang to incorporate the teachings of Shi in order to reduce the loss from direct pooling and increase segmentation accuracy, as recognized by Shi. Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Shi with Wang to obtain the invention as specified in claim 15.

Regarding claim 16 (drawn to a computer system), the proposed combination of Wang in view of Shi explained in the rejection of method claim 2 renders obvious the steps of system claim 16, because these steps occur in the operation of the method as discussed above. Thus, the arguments presented above for claim 2 are equally applicable to claim 16.

Regarding claim 18 (drawn to a computer system), the proposed combination of Wang in view of Shi explained in the rejection of method claim 4 renders obvious the steps of system claim 18, because these steps occur in the operation of the method as discussed above. Thus, the arguments presented above for claim 4 are equally applicable to claim 18. 
Claims 3, 10, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Wang ("Fremae: Fourier transform meets masked autoencoders for medical image segmentation") in view of Shi (CN 113850304 B), as applied above, and further in view of Zhang et al. (Zhang, Fengyu, Ashkan Panahi, and Guangjun Gao. "FsaNet: Frequency self-attention for semantic segmentation." IEEE Transactions on Image Processing 32 (2023): 4757-4772), disclosed in IDS (hereafter, "Zhang").

Regarding claim 3, which incorporates claim 2, neither Wang nor Shi appears to explicitly disclose mixing the new features of different frequencies by self-attention of Transformers. However, Zhang discloses mixing the new features of different frequencies by self-attention of Transformers [Figure 1; reconstructs the spatial features from the processed frequency coefficients (expanding dimensionality), after a key process called frequency self-attention, pg. 4, right column, III. Methodology, first paragraph] to produce another set of new features [Figure 2-3; note that in the green path, the input of frequency self-attention f′ ∈ RC×k^2 is equal to the row-wise expansion of f in the red path. The input X is also reshaped to X′. These facts define Linear operation 1 as a map between X′ and f′, consisting of reshape and 2D-DCT operations, pg. 4760, right column, Linear Operations 1 and 2, first paragraph ... Linear operation 2 consists of a series of operations related to G, which transform the frequency self-attention output o′ ∈ RC×k^2, pg. 4, right column, Linear Operations 1 and 2, first paragraph]. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Wang in view of Shi and incorporate the teachings of Zhang to preserve edge details, as recognized by Zhang.
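As cited, Zhang's frequency self-attention runs attention over a small set of low-frequency DCT coefficients and then reconstructs spatial features from the attended coefficients. The sketch below simplifies that idea to a 1-D DCT, a single head, and no residual path; all function and parameter names are illustrative assumptions, not Zhang's notation:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix; rows are frequency atoms, low frequency first."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    m[0] /= np.sqrt(2.0)
    return m

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def frequency_self_attention(x, n_low, Wq, Wk, Wv):
    """Attend over channels using only the n_low lowest DCT coefficients.

    x: (C, N) flattened spatial features; returns (C, N) reconstructed features.
    """
    D = dct_matrix(x.shape[-1])
    f = x @ D[:n_low].T                            # (C, n_low) low-frequency coefficients
    q, k, v = f @ Wq, f @ Wk, f @ Wv               # linear projections in frequency domain
    att = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (C, C) channel-mixing attention map
    o = att @ v                                    # (C, n_low) mixed coefficients
    return o @ D[:n_low]                           # inverse DCT; high frequencies zeroed

out = frequency_self_attention(np.random.randn(8, 64), 16,
                               np.eye(16), np.eye(16), np.eye(16))
assert out.shape == (8, 64)
```

Restricting attention to the `n_low` lowest coefficients is what makes the operation cheap relative to full spatial self-attention: the attention map is computed over a `k^2`-sized (here `n_low`-sized) representation rather than all pixels.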
Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Zhang with Wang and Shi to obtain the invention as specified in claim 3.

Regarding claim 10 (drawn to a computer program product), the proposed combination of Wang in view of Shi and further in view of Zhang explained in the rejection of method claim 3 renders obvious the steps of computer program product claim 10, because these steps occur in the operation of the method as discussed above. Thus, the arguments presented above for claim 3 are equally applicable to claim 10.

Regarding claim 17 (drawn to a computer system), the proposed combination of Wang in view of Shi and further in view of Zhang explained in the rejection of method claim 3 renders obvious the steps of system claim 17, because these steps occur in the operation of the method as discussed above. Thus, the arguments presented above for claim 3 are equally applicable to claim 17.

Claims 5, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Wang ("Fremae: Fourier transform meets masked autoencoders for medical image segmentation") in view of Shi et al. (CN 113850304 B), as applied above, and further in view of Liu et al. (Liu, Guoqi, et al. "FTMF-net: A Fourier transform-multiscale feature fusion network for segmentation of small polyp objects." IEEE Transactions on Instrumentation and Measurement 72 (2023): 1-15) (hereafter, "Liu").

Regarding claim 5, which incorporates claim 1, neither Wang nor Shi appears to explicitly disclose improving convergence and accuracy by using residual connections and deep supervision.
However, Liu teaches improving convergence and accuracy by using residual connections [we use the residual connection strategy to further optimize the network, pg. 7, left column, B. Multiscale Feature Fusion Module, first paragraph] and deep supervision [as an auxiliary classifier, deep supervision optimizes the gradient update process by supervising the lower three layers. It can prevent the problem of vanishing gradients and slow convergence. To fully train low-level features, we apply the deep supervision strategy on the global feature segmentation map k and three feature maps {k1,k2,k3}, pg. 7, right column, C. Loss Function, second paragraph]. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Wang in view of Shi and incorporate the teachings of Liu to prevent vanishing gradients and optimize the network, as recognized by Liu. Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Liu with Wang and Shi to obtain the invention as specified in claim 5.

Regarding claim 12 (drawn to a computer program product), the proposed combination of Wang in view of Shi and further in view of Liu explained in the rejection of method claim 5 renders obvious the steps of computer program product claim 12, because these steps occur in the operation of the method as discussed above. Thus, the arguments presented above for claim 5 are equally applicable to claim 12.

Regarding claim 19 (drawn to a computer system), the proposed combination of Wang in view of Shi and further in view of Liu explained in the rejection of method claim 5 renders obvious the steps of system claim 19, because these steps occur in the operation of the method as discussed above.
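The deep-supervision strategy Liu describes adds auxiliary losses on lower-stage prediction maps alongside the loss on the final segmentation map, so gradients reach early layers directly. A toy NumPy sketch follows; binary cross-entropy and the 0.4 auxiliary weight are my assumptions, not values from Liu:

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Pixel-wise binary cross-entropy on sigmoid probabilities."""
    p = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(target * np.log(p) + (1 - target) * np.log(1 - p)))

def deep_supervision_loss(main_pred, aux_preds, target, aux_weight=0.4):
    """Total loss = loss on the final map + weighted losses on lower-stage maps.

    Each auxiliary map is assumed already upsampled to the target resolution.
    """
    loss = bce(main_pred, target)
    for aux in aux_preds:
        loss += aux_weight * bce(aux, target)
    return loss
```

Because every stage receives its own loss term, early layers get a direct gradient signal instead of one propagated through the whole decoder, which is the vanishing-gradient benefit the rejection cites.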
Thus, the arguments presented above for claim 5 are equally applicable to claim 19.

Claims 6–7, 13–14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Wang ("Fremae: Fourier transform meets masked autoencoders for medical image segmentation") in view of Shi (CN 113850304 B), as applied above, and further in view of Huang (US 2024/0029271 A1) (hereafter, "Huang").

Regarding claim 6, which incorporates claim 1, neither Wang nor Shi appears to explicitly disclose wherein the deep learning model includes a convolutional layer sequentially after an input layer for input downsampling. However, Huang teaches wherein the deep learning model includes a convolutional layer [each encoder down-sampling block 502, 504, 506, and 508 ... may include at least two convolution (e.g., neural network) layers, para 0082] sequentially after an input layer for input downsampling [Figure 5; the U-Net model architecture 500 includes multiple encoder down-sampling blocks 502, 504, 506, and 508 ... the encoder down-sampling blocks 502, 504, 506 and 508 are connected sequentially, with the encoder block 502 arranged to receive an input ... the encoder blocks 502, 504, 506, and 508 may include a down-sampling layer, para 0076, 0082]. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Wang in view of Shi and incorporate the teachings of Huang for better information recovery, as recognized by Huang. Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Huang with Wang and Shi to obtain the invention as specified in claim 6.
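The cited U-Net pairing downsamples with convolutional encoder blocks and upsamples with a transposed-convolution decoder layer. The from-scratch NumPy sketch below shows the two operations for a single channel, with no padding or bias; the kernel and shapes are illustrative, not taken from Huang:

```python
import numpy as np

def conv2d_downsample(x, kernel, stride=2):
    """Valid 2D convolution with stride; stride=2 halves the spatial resolution."""
    kh, kw = kernel.shape
    h = (x.shape[0] - kh) // stride + 1
    w = (x.shape[1] - kw) // stride + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

def transposed_conv2d_upsample(x, kernel, stride=2):
    """Transposed convolution: scatter-add stride-spaced, input-scaled kernel copies."""
    kh, kw = kernel.shape
    h = (x.shape[0] - 1) * stride + kh
    w = (x.shape[1] - 1) * stride + kw
    out = np.zeros((h, w))
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i * stride:i * stride + kh, j * stride:j * stride + kw] += x[i, j] * kernel
    return out

img = np.random.randn(16, 16)
k = np.ones((2, 2)) / 4.0                  # 2x2 averaging kernel (illustrative)
down = conv2d_downsample(img, k)           # (8, 8): input downsampling
up = transposed_conv2d_upsample(down, k)   # (16, 16): output upsampling
assert down.shape == (8, 8) and up.shape == (16, 16)
```

With stride equal to the kernel size, the transposed convolution's kernel copies do not overlap, so each output pixel receives exactly one contribution; smaller strides produce the overlapping-add pattern used in learned upsampling layers.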
Regarding claim 7, which incorporates claim 6, neither Wang nor Shi appears to explicitly disclose an output transposed convolutional layer for output upsampling. However, Huang teaches an output transposed convolutional layer [each decoder up-sampling block 512, 514, 516 and 518, may include at least two convolution (e.g., neural network) layers, para 0082] for output upsampling [the decoder blocks 512, 514, 516 and 518 may include an up-sampling layer (e.g., a transposed convolution up-sampling layer) ... the U-Net model architecture 500 is configured to generate an output 522, para 0078]. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Wang in view of Shi and incorporate the teachings of Huang for better information recovery, as recognized by Huang. Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Huang with Wang and Shi to obtain the invention as specified in claim 7.

Regarding claim 13 (drawn to a computer program product), the proposed combination of Wang in view of Shi and further in view of Huang explained in the rejection of method claim 6 renders obvious the steps of computer program product claim 13, because these steps occur in the operation of the method as discussed above. Thus, the arguments presented above for claim 6 are equally applicable to claim 13.

Regarding claim 14 (drawn to a computer program product), the proposed combination of Wang in view of Shi and further in view of Huang explained in the rejection of method claim 7 renders obvious the steps of computer program product claim 14, because these steps occur in the operation of the method as discussed above.
Thus, the arguments presented above for claim 7 are equally applicable to claim 14.

Regarding claim 20, which incorporates claim 15, neither Wang nor Shi appears to explicitly disclose wherein the deep learning model includes a convolutional layer sequentially after an input layer for input downsampling, and an output transposed convolutional layer for output upsampling. However, Huang teaches wherein the deep learning model includes a convolutional layer [each encoder down-sampling block 502, 504, 506, and 508 ... may include at least two convolution (e.g., neural network) layers, para 0082] sequentially after an input layer for input downsampling [Figure 5; the U-Net model architecture 500 includes multiple encoder down-sampling blocks 502, 504, 506, and 508 ... the encoder down-sampling blocks 502, 504, 506 and 508 are connected sequentially, with the encoder block 502 arranged to receive an input ... the encoder blocks 502, 504, 506, and 508 may include a down-sampling layer, para 0076, 0082], and an output transposed convolutional layer [each decoder up-sampling block 512, 514, 516 and 518, may include at least two convolution (e.g., neural network) layers, para 0082] for output upsampling [the decoder blocks 512, 514, 516 and 518 may include an up-sampling layer (e.g., a transposed convolution up-sampling layer) ... the U-Net model architecture 500 is configured to generate an output 522, para 0078]. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Wang in view of Shi and incorporate the teachings of Huang for better information recovery, as recognized by Huang. Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Therefore, it would have been obvious to combine Huang with Wang and Shi to obtain the invention as specified in claim 20.

Conclusion

The art made of record and not relied upon is considered pertinent to applicant's disclosure:

"From Spatial to Frequency Domain: A Pure Frequency Domain FDNet Model for the Classification of Remote Sensing Images" to Wang Wei et al. discloses a frequency domain deep learning network (FDNet) that uses the frequency domain and performs feature extraction with self-attention.

"Boundary-Aware Spatial and Frequency Dual-Domain Transformer for Remote Sensing Urban Images Segmentation" to Zhang Jie et al. discloses a boundary-aware spatial and frequency dual-domain transformer that includes a dual-domain mixer (DualM), in which the spatial-domain branch combines depthwise convolution and an attention mechanism to extract local and global features while the frequency-domain branch uses a fast Fourier transform (FFT) to extract image-size features.

"Diff-SFCT: A Diffusion Model with Spatial-Frequency Cross Transformer for Medical Image Segmentation" to Jiang et al. discloses a medical image segmentation framework, Diff-SFCT, that uses a backbone network combining a convolutional neural network (CNN) and a Transformer to extract semantic features from images, and that includes a Spatial-Frequency Attention Module (SFAM) in the convolutional block.

WO 2023/128790 A1 to Dylov et al. discloses an image segmentation pipeline including an artificial agent based on reinforcement learning that obtains an optimal frequency domain.

US 10,290,107 B1 to Casas et al. discloses a transform domain regression convolutional neural network for image segmentation.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to TOLUWANI MARY-JANE IJASEUN, whose telephone number is (571) 270-1877. The examiner can normally be reached Monday - Friday, 7:30 AM - 4 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Henok Shiferaw, can be reached at (571) 272-4637. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/TOLUWANI MARY-JANE IJASEUN/
Examiner, Art Unit 2676

/Henok Shiferaw/
Supervisory Patent Examiner, Art Unit 2676

Prosecution Timeline

Jan 03, 2024
Application Filed
Feb 17, 2026
Non-Final Rejection — §101, §103
Apr 10, 2026
Interview Requested

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12597117
METHOD, PROGRAM, APPARATUS, AND SYSTEM FOR ABNORMALITY DETECTION SUCH AS FOR DETERMINING WHETHER A PLURALITY OF CONTAINERS TO BE STACKED ON A PALLET IS NORMAL OR ABNORMAL
2y 5m to grant Granted Apr 07, 2026
Patent 12555231
DETECTING ISCHEMIC STROKE MIMIC USING DEEP LEARNING-BASED ANALYSIS OF MEDICAL IMAGES
2y 5m to grant Granted Feb 17, 2026
Patent 12536796
REMOTE SOIL AND VEGETATION PROPERTIES DETERMINATION METHOD AND SYSTEM
2y 5m to grant Granted Jan 27, 2026
Patent 12525056
METHOD AND DEVICE FOR MULTI-DNN-BASED FACE RECOGNITION USING PARALLEL-PROCESSING PIPELINES
2y 5m to grant Granted Jan 13, 2026
Patent 12499506
INFERENCE MODEL CONSTRUCTION METHOD, INFERENCE MODEL CONSTRUCTION DEVICE, RECORDING MEDIUM, CONFIGURATION DEVICE, AND CONFIGURATION METHOD
2y 5m to grant Granted Dec 16, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
90%
Grant Probability
91%
With Interview (+1.5%)
1y 10m
Median Time to Grant
Low
PTA Risk
Based on 578 resolved cases by this examiner. Grant probability derived from career allow rate.
