DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 10-19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. When reviewing independent claim 10, and based upon consideration of all of the relevant factors with respect to the claim as a whole, Claims 10-19 are held to claim an abstract idea without reciting elements that amount to significantly more than the abstract idea, and are therefore rejected as ineligible subject matter under 35 U.S.C. 101. The Examiner will analyze Claim 10; similar rationale applies to dependent Claims 11-19. The rationale, under MPEP § 2106, for this finding is explained below:
The claimed invention (1) must be directed to one of the four statutory categories, and (2) must not be wholly directed to subject matter encompassing a judicially recognized exception, as defined below. The following two step analysis is used to evaluate these criteria.
Step 1: Is the claim directed to one of the four patent-eligible subject matter categories: process, machine, manufacture, or composition of matter?
When examining the claim under 35 U.S.C. 101, the Examiner finds that Claim 10 is directed to a process (a method using a U-shaped network), which is one of the four statutory categories.
Step 2a, Prong 1: Does the claim wholly embrace a judicially recognized exception, which includes laws of nature, physical phenomena, and abstract ideas, or is it a particular practical application of a judicial exception?
The Examiner finds that a judicial exception applies, since the Claim 10 limitations of “projecting an input image; processing the projected ; projecting the last decoder ; processing ; feeding based on element-wise addition using a summation node associated with the skip connection” are directed to an abstract idea. The claim is related to a mathematical concept performed by a computer. If the claim recites a judicial exception (i.e., an abstract idea enumerated in MPEP § 2106.04(a), a law of nature, or a natural phenomenon), the claim requires further analysis in Prong Two.
Step 2a, Prong 2: Does the claim recite additional elements that integrate the judicial exception into a practical application?
The Examiner finds that Claims 10-19 do not recite additional elements, or a combination of additional elements, that integrate the judicial exception into a practical application, since the claims merely add words equivalent to “apply it” with instructions to implement the abstract idea on a computer (see MPEP § 2106.05(f)), add insignificant extra-solution activity to the judicial exception (see MPEP § 2106.05(g)), or generally link the use of the judicial exception to a particular technological environment or field of use (see MPEP § 2106.05(h)). See MPEP § 2106.04(a). Because a judicial exception is not eligible subject matter, Bilski, 561 U.S. at 601, 95 USPQ2d at 1005-06 (quoting Chakrabarty, 447 U.S. at 309, 206 USPQ at 197 (1980)), if there are no additional claim elements besides the judicial exception, or if the additional claim elements merely recite another judicial exception, that is insufficient to integrate the judicial exception into a practical application. See, e.g., RecogniCorp, LLC v. Nintendo Co., 855 F.3d 1322, 1327, 122 USPQ2d 1377 (Fed. Cir. 2017) ("Adding one abstract idea (math) to another abstract idea (encoding and decoding) does not render the claim non-abstract"); Genetic Techs. v. Merial LLC, 818 F.3d 1369, 1376, 118 USPQ2d 1541, 1546 (Fed. Cir. 2016) (eligibility "cannot be furnished by the unpatentable law of nature (or natural phenomenon or abstract idea) itself."). For a claim reciting a judicial exception to be eligible, the additional elements (if any) in the claim must "transform the nature of the claim" into a patent-eligible application of the judicial exception, Alice Corp., 573 U.S. at 217, 110 USPQ2d at 1981, either at Prong Two or in Step 2B. If there are no additional elements in the claim, then it cannot be eligible.
In such a case, after making the appropriate rejection (see MPEP § 2106.07 for more information on formulating a rejection for lack of eligibility), it is a best practice for the examiner to recommend an amendment, if possible, that would resolve eligibility of the claim.
Step 2b: If the claim does not integrate the judicial exception into a practical application, the Examiner must determine whether the claim recites additional elements that amount to significantly more than the judicial exception.
The Examiner finds that the claims do not recite additional elements that amount to significantly more than the judicial exception.
Furthermore, the generic computer components recited as performing generic computer functions that are well-understood, routine, and conventional activities amount to no more than implementing the abstract idea with a computerized system.
Claims 11-19, which depend from the independent claim, include all the limitations of the independent claim.
Thus, Claims 11-19 recite the same abstract idea and are therefore not drawn to eligible subject matter, as they are directed to the abstract idea without significantly more.
Accordingly, Claims 10-19 are rejected under 35 U.S.C. 101.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Dekel et al (US 2024/0419382) in view of Ikonin et al (US 2023/0353764).
As to claim 1, Dekel et al teaches an apparatus for image restoration, the apparatus comprising: one or more memories storing instructions; and one or more processors configured to execute the instructions to at least:
operate on an input image (operate generator 63; figure 6, with images 62a, 62b) using a U-shaped network (U-shaped network, paragraph [0124]) to produce an output image (generate output image 64, figure 6),
wherein an encoder of the U-shaped network contributes to a decoder of the U-shaped network (the encoder-decoder attention, paragraph [0202]; note that each encoder consists of two major components: a self-attention mechanism and a feed-forward neural network. The self-attention mechanism accepts input encodings from the previous encoder and weighs their relevance to each other to generate output encodings. The feed-forward neural network further processes each output encoding individually, paragraph [0201]) using a skip connection based on element-wise addition (A skip architecture is defined, that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations, paragraph [0125]),
the encoder is a first instance of a transformer block (the Transformer Encoder 44 may include one or more layers, paragraph [0207]; the original Transformer model architecture uses an encoder-decoder architecture, paragraph [019]), the decoder is a second instance of the transformer block, and the transformer block comprises (Each of the layers may comprise 3×3 Convolutions, followed by BatchNorm, and LeakyReLU activation. Any ANN herein may comprise, may use, or may be based on, a method, scheme or architecture such as U-Net, which is based on the fully convolutional network to supplement a usual contracting network by successive layers, where pooling operations are replaced by upsampling operators. These layers increase the resolution of the output, and a successive convolutional layer can then learn to assemble a precise output based on this information, paragraph [0340]). While Dekel teaches the limitations above, Dekel fails to teach “a pooling input mixer followed by a first channel scaler, and use a multi-layer perceptron followed by a second channel scaler.” Specifically, Ikonin et al teaches that the input layer is the layer to which the input (such as a portion of an image as shown in FIG. 1) is provided for processing. The hidden layers of a CNN typically consist of a series of convolutional layers that convolve with a multiplication or other dot product. The result of a layer is one or more feature maps (f.maps in FIG. 1), sometimes also referred to as channels. There may be a subsampling involved in some or all of the layers. As a consequence, the feature maps may become smaller, as illustrated in FIG. 1.
The activation function in a CNN is usually a ReLU (Rectified Linear Unit) layer, and is subsequently followed by additional convolutions such as pooling layers, fully connected layers and normalization layers, referred to as hidden layers because their inputs and outputs are masked by the activation function and final convolution (paragraph [0096]). Moreover, Ikonin teaches that convolutional neural networks are biologically inspired variants of multilayer perceptrons that are specifically designed to emulate the behavior of a visual cortex. These models mitigate the challenges posed by the MLP architecture by exploiting the strong spatially local correlation present in natural images. The convolutional layer is the core building block of a CNN. The layer’s parameters consist of a set of learnable filters (the above-mentioned kernels), which have a small receptive field, but extend through the full depth of the input volume (paragraph [0099]). Additionally, Ikonin teaches in figure 9 that the encoder (e.g., signs 911-920) and decoder (e.g., signs 940-951) have the same number of down sampling and up sampling layers correspondingly; the nearest neighbor method may be used for up sampling, and average pooling may be used for down sampling. The shape and size of the pooling layers are aligned with the scale factor of the up sampling layers. In some other possible implementations, another method of pooling can be used, e.g., max pooling (paragraph [0335]). It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to use the pooling input in combination with the scaling as taught by Ikonin in order to enhance the tracking performance. Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
As to claim 2, Ikonin et al teaches the apparatus of claim 1, wherein the pooling input mixer does not comprise learnable parameters (convolution with a kernel with identical, fixed weights and a stride of the size of the kernel corresponds to an average pooling operation. However, the stride of a convolution used in the present embodiment may be different from the kernel size and the weights may be different. In an example, the kernel weights may be such that certain features in the input feature map may be enhanced or distinguished from each other. Furthermore, the weights of the kernel may be learnable or learned beforehand, paragraph [0205]).
As to claim 3, Ikonin et al teaches the apparatus of claim 1, wherein an input shape of the pooling input mixer is equal to an output shape of the pooling input mixer (FIG. 9, the encoder (e.g., signs 911-920) and decoder (e.g., signs 940-951) have the same number of down sampling and up sampling layers correspondingly; the nearest neighbor method may be used for up sampling, and average pooling may be used for down sampling. The shape and size of the pooling layers are aligned with the scale factor of the up sampling layers. In some other possible implementations, another method of pooling can be used, e.g., max pooling, paragraph [0335]).
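For illustration of the claim 2 and claim 3 limitations, the following is a minimal sketch of a parameter-free pooling input mixer whose output shape equals its input shape. The function name and window size are hypothetical and are not drawn from the claims or the cited references:

```python
# Hypothetical sketch (not from the cited references): a parameter-free
# "pooling input mixer" that average-pools over a sliding window with
# edge clamping, so the output shape equals the input shape.

def pooling_input_mixer(features, window=3):
    """Average-pool a 1-D feature sequence; no learnable parameters."""
    half = window // 2
    n = len(features)
    mixed = []
    for i in range(n):
        # Clamp the window to the sequence edges (same-padding behavior).
        lo, hi = max(0, i - half), min(n, i + half + 1)
        mixed.append(sum(features[lo:hi]) / (hi - lo))
    return mixed

x = [1.0, 2.0, 3.0, 4.0]
y = pooling_input_mixer(x)
assert len(y) == len(x)  # input shape equals output shape
```

Because the operation is a fixed average, it carries no learnable parameters, consistent with the claim 2 limitation.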
As to claim 4, Dekel teaches the apparatus of claim 1, wherein the U-shaped network does not apply concatenation to the skip connection thereby reducing latency of the apparatus by avoiding a dimensionality expansion (The contracting path is a typical convolutional network that consists of repeated application of convolutions, each followed by a rectified linear unit (ReLU) and a max pooling operation. During the contraction, the spatial information is reduced while feature information is increased, paragraph [0124]).
As to claim 5, Dekel teaches the apparatus of claim 1, wherein the pooling input mixer is preceded in the transformer block by a batch normalization, and the batch normalization is configured to be foldable into a preceding linear transformation (Each of the layers may comprise 3×3 Convolutions, followed by BatchNorm, and LeakyReLU activation. The encoder's channels dimensions are [3.fwdarw.16.fwdarw.32.fwdarw.64.fwdarw.128.fwdarw.128], while the decoder follows a reversed order. In each level of the encoder, an additional 1×1 Convolution layer is added and concatenate the output features to the corresponding level of the decoder. Lastly, a 1×1 Convolution layer is added followed by Sigmoid activation to get the final RGB output. Any ANN herein, may comprise, may use, or may be based on, a method, scheme or architecture such as U-Net, which is based on the fully convolutional network to supplement a usual contracting network by successive layers, where pooling operations are replaced by upsampling operators. These layers increase the resolution of the output, and a successive convolutional layer can then learn to assemble a precise output based on this information, paragraph [0340]).
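The batch-normalization folding referenced in claim 5 can be illustrated as follows. This is a one-channel sketch under assumed inference-time statistics (all values below are hypothetical, not taken from the references): BN(w·x + b) equals w′·x + b′ with w′ = w·γ/√(σ² + ε) and b′ = (b − μ)·γ/√(σ² + ε) + β, so the normalization disappears into the preceding linear transformation at inference time.

```python
import math

# Illustrative sketch: folding a batch normalization (with fixed
# inference-time statistics mean/var and affine parameters gamma/beta)
# into the preceding linear transform y = w*x + b.

def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Return (w', b') such that w'*x + b' == BN(w*x + b) at inference."""
    scale = gamma / math.sqrt(var + eps)
    return w * scale, (b - mean) * scale + beta

# Hypothetical example values.
w, b = 2.0, 0.5
gamma, beta, mean, var = 1.5, -0.2, 0.3, 4.0
wf, bf = fold_bn(w, b, gamma, beta, mean, var)

x = 1.7
bn_out = gamma * ((w * x + b) - mean) / math.sqrt(var + 1e-5) + beta
folded_out = wf * x + bf
assert abs(bn_out - folded_out) < 1e-9  # folded form matches BN(linear(x))
```

The same algebra applies per channel when w is a convolution kernel, which is why a batch normalization following a linear transformation can be removed at inference with no change in output.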
As to claim 6, Dekel teaches the apparatus of claim 1, wherein the transformer block comprises a first stage and a second stage, the first stage comprises a first batch normalization, the pooling input mixer, the first channel scaler, and a first summation node, wherein the first summation node operates on an input to the transformer block and an output of the first channel scaler, and the second stage comprises a second batch normalization, the multi-layer perceptron followed by the second channel scaler, and a second summation node, wherein the second summation node operates on an output of the first summation node and an output of the second channel scaler to produce an output of the transformer block (The generator 52 may be based on, may include, or may use, an Artificial Neural Network (ANN) model or architecture. In one example, a U-Net architecture, with a 5-layer encoder and a symmetrical decoder. Each of the layers may comprise 3×3 Convolutions, followed by BatchNorm, and LeakyReLU activation. Any ANN herein may comprise, may use, or may be based on, a method, scheme or architecture such as U-Net, which is based on the fully convolutional network to supplement a usual contracting network by successive layers, where pooling operations are replaced by upsampling operators. These layers increase the resolution of the output, and a successive convolutional layer can then learn to assemble a precise output based on this information, paragraph [0340]).
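The two-stage block structure recited in claim 6 can be sketched as follows. The normalization, mixer, scaler, and MLP internals below are simplified stand-ins (hypothetical, not from Dekel or Ikonin); the sketch only illustrates the dataflow through the two summation nodes:

```python
# Hypothetical sketch of the two-stage block of claim 6. Each stage is
# norm -> sublayer -> per-channel scale, followed by a residual summation.

def norm(x):  # stand-in for batch normalization over one feature vector
    m = sum(x) / len(x)
    v = sum((t - m) ** 2 for t in x) / len(x)
    return [(t - m) / (v + 1e-5) ** 0.5 for t in x]

def pooling_mixer(x):  # parameter-free neighbor averaging
    n = len(x)
    return [sum(x[max(0, i - 1):min(n, i + 2)]) /
            (min(n, i + 2) - max(0, i - 1)) for i in range(n)]

def channel_scale(x, s):  # per-channel learnable scale
    return [t * si for t, si in zip(x, s)]

def mlp(x, gain=2.0):  # stand-in for a per-channel MLP
    return [max(0.0, t) * gain for t in x]

def transformer_block(x, scale1, scale2):
    # Stage 1: BN -> pooling input mixer -> first channel scaler;
    # first summation node adds the block input to the scaler output.
    h = [a + c for a, c in zip(x, channel_scale(pooling_mixer(norm(x)), scale1))]
    # Stage 2: BN -> MLP -> second channel scaler; second summation node
    # adds the first summation output to the second scaler output.
    return [a + c for a, c in zip(h, channel_scale(mlp(norm(h)), scale2))]

out = transformer_block([0.5, -1.0, 2.0, 0.0], [1.0] * 4, [1.0] * 4)
assert len(out) == 4  # element-wise additions preserve the shape
```

Because both summation nodes use element-wise addition rather than concatenation, the feature dimensionality is unchanged through the block, which is the property relied on in claims 4 and 7.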
As to claim 7, Dekel teaches the apparatus of claim 1, wherein a latency of the apparatus is reduced by achieving a batch normalization which is latency-favorable on GPU hardware, a memory footprint is reduced by avoiding concatenation at the skip connection, and the avoiding concatenation allows a computation of a layer output to have a reduced latency by avoiding an increase in array dimensions (an efficient network architecture and a set of two hyper-parameters in order to build very small, low latency models that can be easily matched to the design requirements for mobile and embedded vision applications, and describes the MobileNet architecture and two hyper-parameters width multiplier and resolution multiplier to define smaller and more efficient MobileNets, paragraph [0121-0123]).
As to claim 8, Dekel teaches the apparatus of claim 1, wherein a first decoder at a same level of the U-shaped network as a first encoder is identical to the first encoder other than learned parameters (Like the first encoder, the first decoder takes positional information and embeddings of the output sequence as its input, rather than encodings. The transformer must not use the current or future output to predict an output, so the output sequence must be partially masked to prevent this reverse information flow. This allows for autoregressive text generation. For all attention heads, attention cannot be placed on following tokens. The last decoder is followed by a final linear transformation and softmax layer, to produce the output probabilities over the vocabulary, paragraph [0202]).
As to claim 9, Dekel teaches the smartphone (Any apparatus herein, which may be any of the systems, devices, modules, or functionalities described herein, may be integrated with a smartphone, paragraph [0389]) comprising: a camera app; a user interface; and an application specific integrated circuit, wherein the application specific integrated circuit (an Application Programming Interface (API), defined as an intermediary software serving as the interface allowing the interaction and data sharing between an application software and the application platform, across which few or all services are provided, and commonly used to expose or use a specific software functionality, while protecting the rest of the application, paragraph [0390]) is configured to at least: receive an input image from the camera app (operate generator 63; figure 6, with images 62a, 62b), operate on the input image using a U-shaped network (U-shaped network, paragraph [0124]) to produce an output image (generate output image 64, figure 6), wherein an encoder of the U-shaped network contributes to a decoder of the U-shaped network (the encoder-decoder attention, paragraph [0202]; note that each encoder consists of two major components: a self-attention mechanism and a feed-forward neural network. The self-attention mechanism accepts input encodings from the previous encoder and weighs their relevance to each other to generate output encodings. The feed-forward neural network further processes each output encoding individually, paragraph [0201]) using a skip connection based on element-wise addition (a skip architecture is defined, that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations, paragraph [0125]), wherein the encoder is a first instance of a transformer (the Transformer Encoder 44 may include one or more layers, paragraph [0207]; the original Transformer model architecture uses an encoder-decoder architecture, paragraph [019]), the decoder is a second instance of the transformer (Each of the layers may comprise 3×3 Convolutions, followed by BatchNorm, and LeakyReLU activation. Any ANN herein may comprise, may use, or may be based on, a method, scheme or architecture such as U-Net, which is based on the fully convolutional network to supplement a usual contracting network by successive layers, where pooling operations are replaced by upsampling operators. These layers increase the resolution of the output, and a successive convolutional layer can then learn to assemble a precise output based on this information, paragraph [0340]), and transmit the output image to the camera app for display on the user interface (A sub flow chart 85c is executed by a separate device, that may be a server, such as the server 24, that received the first image as part of the “Receive Image” step 81a and the second image as part of the “Receive Image” step 82a. The received images are processed by a processor that executes the software or firmware of the generator 52 as part of the “Train/Operate Generator” step 63 for generating the output image as part of the “Generate Output Image” step 64. The produced output image 53 is sent, as part of a “Send Image” step 83, to a device that includes the display 54, where it is received as part of a “Receive Image” step 83a for displaying on the display 54. The device that houses the display 54 executes a sub flow chart 85d, that includes receiving of the output image 53 as part of the “Receive Image” step 83a, and displaying the received output image 53 on the display 54 as part of the “Display Output Image” step 65, paragraph [0364]). While Dekel teaches the limitations above, Dekel fails to teach “the transformer comprises a pooling input mixer followed by a first channel scaler, and a multi-layer perceptron followed by a second channel scaler.” Specifically, Ikonin et al teaches that the input layer is the layer to which the input (such as a portion of an image as shown in FIG. 1) is provided for processing. The hidden layers of a CNN typically consist of a series of convolutional layers that convolve with a multiplication or other dot product. The result of a layer is one or more feature maps (f.maps in FIG. 1), sometimes also referred to as channels. There may be a subsampling involved in some or all of the layers. As a consequence, the feature maps may become smaller, as illustrated in FIG. 1. The activation function in a CNN is usually a ReLU (Rectified Linear Unit) layer, and is subsequently followed by additional convolutions such as pooling layers, fully connected layers and normalization layers, referred to as hidden layers because their inputs and outputs are masked by the activation function and final convolution (paragraph [0096]). Moreover, Ikonin teaches that convolutional neural networks are biologically inspired variants of multilayer perceptrons that are specifically designed to emulate the behavior of a visual cortex. These models mitigate the challenges posed by the MLP architecture by exploiting the strong spatially local correlation present in natural images. The convolutional layer is the core building block of a CNN.
The layer’s parameters consist of a set of learnable filters (the above-mentioned kernels), which have a small receptive field, but extend through the full depth of the input volume (paragraph [0099]). Additionally, Ikonin teaches in figure 9 that the encoder (e.g., signs 911-920) and decoder (e.g., signs 940-951) have the same number of down sampling and up sampling layers correspondingly; the nearest neighbor method may be used for up sampling, and average pooling may be used for down sampling. The shape and size of the pooling layers are aligned with the scale factor of the up sampling layers. In some other possible implementations, another method of pooling can be used, e.g., max pooling (paragraph [0335]). It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to use the pooling input in combination with the scaling as taught by Ikonin in order to enhance the tracking performance. Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
The limitations of claims 10-20 have been addressed in claims 1-8 above.
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NANCY BITAR whose telephone number is (571) 270-1041. The examiner can normally be reached Monday-Friday from 8:00 a.m. to 5:00 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mrs. Jennifer Mehmood can be reached at 571-272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
NANCY BITAR
Examiner
Art Unit 2664
/NANCY BITAR/Primary Examiner, Art Unit 2664