DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on December 26, 2025 has been entered.
Response to Arguments
Applicant’s arguments and amendments in the Amendment with RCE filed December 26, 2025 (herein “Amendment”), with respect to the objections to claims 26, 38 and 42, and claims depending therefrom, have been fully considered and are persuasive. The objections to claims 26, 38 and 42, and claims depending therefrom have been withdrawn.
Applicant’s arguments and amendments in the Amendment, with respect to the rejections of claims 26, 38 and 42, and claims depending therefrom under 35 U.S.C. 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, new grounds of rejection are made in view of Deng et al., “RFBNet: Deep Multimodal Networks with Residual Fusion Blocks for RGB-D Semantic Segmentation,” arXiv:1907.00135v2 [cs.CV], https://doi.org/10.48550/arXiv.1907.00135, September 16, 2019.
Claim Objections
Claim 37 is objected to because of the following informalities: the claim as presented in the Amendment does not represent the status of the claim prior to the Amendment. Specifically, claim 37 no longer recites the word “processors” but also does not indicate that the word had been deleted by way of a strikethrough. Accordingly, Applicant is advised to correct this claim in subsequent filings to clarify that the word “processors” was not intended to be removed. For the purposes of examination, the claim has been examined as if “processors” was not removed from the claim. Appropriate correction is required.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 26, 28–31, 37–40, 42–43, and 48 are rejected under 35 U.S.C. 103 as being unpatentable over Dolz et al., "HyperDense-Net: A Hyper-Densely Connected CNN for Multi-Modal Image Segmentation," in IEEE Transactions on Medical Imaging, vol. 38, no. 5, pp. 1116-1126, May 2019, doi: 10.1109/TMI.2018.2878669 (herein “Dolz”) in view of Deng et al., “RFBNet: Deep Multimodal Networks with Residual Fusion Blocks for RGB-D Semantic Segmentation,” arXiv:1907.00135v2 [cs.CV], https://doi.org/10.48550/arXiv.1907.00135, September 16, 2019 (herein “Deng”).
Regarding claim 26, with deficiencies noted in square brackets [], Dolz teaches a system comprising: [a memory to store] a first input volume corresponding to a first input image and a second input volume corresponding to a second input image, the first input image having a first modality and the second input image having a second modality (Dolz pages 1117–1118, Fig. 2, each imaging modality has a path, where page 1119, section C teaches that sub-volumes are used as input), the first and second modalities include different visual information for a same scene (Dolz page 1117 and 1120 (section III), Fig. 2, MRI T1 and MRI T2 of infant brain tissues are the two different modalities used in the HyperDenseNet network, where fig. 2 illustrates the input volumes as appearing differently, and thus having different visual information, as well as it being well-known that T1 and T2 are different weightings providing different visual information);
[instructions; and]
[one or more processors to be programmed by the instructions to]:
cause a shared convolutional layer (Dolz fig. 2, page 1117, convolutional layers from both respective modality branches sharing deep connections) and [a first batch normalization layer] to generate a first output volume based on at least a portion of the first input volume (Dolz fig. 2, page 1118, section A, volume MR-T1 input to the top path including the top conv_1 layer, a CNN layer, where page 1120 table II teaches the output of conv_1 to be of output size 25 x 25 x 25, thus a volume);
cause the shared convolutional layer (Dolz fig. 2, page 1117, convolutional layers from both respective modality branches sharing deep connections) and [a second batch normalization layer] to generate a second output volume based on the second input volume (Dolz fig. 2, page 1118, section A, volume MR-T2 input to the bottom path including the bottom conv_1 layer, a CNN layer, where page 1120 table II teaches the output of conv_1 to be of output size 25 x 25 x 25, thus a volume), the shared convolutional layer to employ [the same convolutional weights when] generating the first and second output volumes, (Dolz fig. 2, page 1117, convolutional layers feature parameters common to the first and second modalities because these layers process the features from both modalities) [the first batch normalization layer to include first batch parameters associated with the first modality, the second batch normalization layer including second batch parameters associated with the second modality, the second batch parameters different from the first batch parameters]; and
output a plurality of features corresponding to the first and second output volumes for the scene (Dolz page 1117, Fig. 2, black arrows in the network diagram representing connections to feature maps, and where the output of conv_1, in both the top and bottom branches, is shown by a black arrow).
Dolz does not explicitly teach instructions or a memory to store, or one or more processors coupled to the memory, the one or more processors to execute the instructions. Further, Dolz does not explicitly teach a first and second batch normalization layer, or the first batch normalization layer including first batch parameters associated with the first modality and the second batch normalization layer including second batch parameters associated with the second modality, the second batch parameters different from the first batch parameters.
Still further, Dolz does not explicitly teach a shared convolutional layer to employ the same weights when generating the first and second output.
Deng teaches a memory to store, instructions and one or more processors to be programmed by the instructions to (Deng page 4, section IV (B), models implemented using Tensorflow (instructions to) using a single 1080Ti GPU (processor), where a person having ordinary skill in the art understands a GPU to include memory that stores instructions).
Deng further teaches a first batch normalization layer and a second batch normalization layer, the first batch normalization layer including first batch parameters associated with the first modality and the second batch normalization layer including second batch parameters associated with the second modality, the second batch parameters different from the first batch parameters (Deng page 3, figs. 2 and 3, section C, Residual Fusion Block, in each residual fusion block, there are RU branches respective to each modality, within each RU branch is a BN (batch normalization) processing respective to each modality (batch parameters associated respectively to the first and second modalities, thus also being different from each other)).
Deng still further teaches a shared convolutional layer to employ the same convolutional weights when generating the first and second output (Deng page 3, fig. 3, section C, residual Fusion Block (RFB) section, the framework of the RFB including a shared layer labeled GFU where both modalities input their respective data and including a same convolutional layer with 1x1, C/2 parameters).
Therefore, taking the teachings of Dolz and Deng together as a whole, it would have been obvious to a person having ordinary skill in the art (herein “PHOSITA”) before the effective filing date of the claimed invention to have modified the network system of Dolz to include the processor and memory of Deng, as well as to include batch normalization layers respective to each modality and the shared convolutional layer as taught in Deng, at least because doing so would improve performance for image segmentation. See Deng Abstract, and page 6.
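By way of general illustration of the arrangement discussed with respect to claim 26 (a convolutional layer whose weights are shared across modalities, paired with modality-specific batch normalization parameters), a minimal numpy sketch follows; all shapes, names, and parameter values are hypothetical and are not drawn from Dolz or Deng:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared 1x1 convolution weights: a single kernel used for BOTH modalities.
C_in, C_out = 3, 4
W = rng.standard_normal((C_out, C_in))  # shared convolutional weights

# Modality-specific batch normalization parameters (hypothetical values).
bn_params = {
    "modality_1": {"gamma": np.ones(C_out), "beta": np.zeros(C_out)},
    "modality_2": {"gamma": 2.0 * np.ones(C_out), "beta": 0.5 * np.ones(C_out)},
}

def shared_conv_bn(x, modality):
    """Apply the shared 1x1 convolution, then the modality's own batch norm."""
    # x: (H, W, C_in) -> y: (H, W, C_out); a 1x1 conv is a per-pixel matmul.
    y = x @ W.T
    mean = y.mean(axis=(0, 1))  # per-channel statistics
    var = y.var(axis=(0, 1))
    p = bn_params[modality]
    return p["gamma"] * (y - mean) / np.sqrt(var + 1e-5) + p["beta"]

x1 = rng.standard_normal((8, 8, C_in))  # e.g. a first-modality sub-volume slice
x2 = rng.standard_normal((8, 8, C_in))  # e.g. a second-modality sub-volume slice
out1 = shared_conv_bn(x1, "modality_1")
out2 = shared_conv_bn(x2, "modality_2")
```

The sketch makes the distinction concrete: `W` is identical for both calls, while the normalization statistics and the `gamma`/`beta` parameters differ per modality.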
Regarding claims 28, 39 and 43, with claim 28 as exemplary, Dolz as modified above by Deng teaches wherein at least one of the one or more processors is to – claim 28, and comprising instructions to cause the device to – claim 43. Dolz further teaches to provide bidirectional fusion for the first and second modalities (given that the present Specification discloses that bidirectional fusion is when features are crossed over from one modality to another, Dolz teaches in fig. 2, and pages 1118–1119, that features from each modality network path are linked (crossed over as shown in fig. 2)):
combine first features from the first output volume and second features from the second output volume or a third input volume corresponding to the second modality to generate a first fused input volume corresponding to the first modality for input to a second convolutional layer (Dolz fig. 2, pages 1118–1119, outputs of the Conv_2 layer (first features) of a first modality MR-T1 path is input to another convolutional layer Conv_3 along with (combined) the outputs from the Conv_1 layer (second features) in the second modality MR-T2 path as shown by black arrows in fig. 2 – where the examiner has circled in the two feature sets being combined in the below reproduced fig. 2, and where Conv_3 corresponds to the claimed “second convolutional layer”:
[Examiner-annotated reproduction of Dolz fig. 2 (media_image1.png, greyscale) not rendered in this text version]
); and
combine third features from the second output volume and fourth features from the first output volume or a fourth output volume corresponding to the first modality to generate a second fused input volume corresponding to the second modality for input to the second convolutional layer (Dolz fig. 2, pages 1118–1119, outputs of the Conv_1 layer (fourth features) of a first modality MR-T1 path is input to the convolutional layer Conv_3 along with (combined) the outputs from the Conv_1 layer (third features) in the other modality MR-T2 path as shown by black arrows in fig. 2 – where the examiner has circled in the two feature sets being combined in the below reproduced fig. 2, and where Conv_3 corresponds to the claimed “second convolutional layer”:
[Examiner-annotated reproduction of Dolz fig. 2 (media_image2.png, greyscale) not rendered in this text version]
).
Regarding claims 29 and 40, with claim 29 as exemplary, Dolz teaches wherein the first features comprise first feature maps from the first output volume and the second features comprise second feature maps from the second output volume, the first fused input volume comprising a concatenation of the first and second feature maps (Dolz pages 1118–1119, feature outputs are concatenated as given in equation 4, including the first and second feature maps from each respective volume).
Regarding claim 30, Dolz teaches wherein the first output volume comprises a first set of feature maps in an order, the second output volume comprises a second set of feature maps in the order, and the first fused input volume comprises the first and second feature maps in the order (Dolz pages 1118–1119, concatenation as given in equation 4, including a particular order for shuffling and interleaving).
Regarding claim 31, Dolz teaches wherein the third features comprise third feature maps from the second output volume and the fourth features comprises fourth feature maps from the first output volume, the second fused input volume comprising a concatenation of the third and fourth feature maps in the order (Dolz pages 1118–1119, concatenation as given in equation 4, including a particular order for shuffling and interleaving for each layer shown in fig. 2 (including the above disclosed layers outputting the third and fourth features which are feature maps)).
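As a general illustration of the concatenation-style fusion described with respect to claims 29 through 31, the following minimal numpy sketch concatenates feature maps from two modalities channel-wise while preserving their order; shapes and values are hypothetical and are not drawn from the references:

```python
import numpy as np

# Two output volumes of feature maps, one per modality (hypothetical shapes).
H, W = 4, 4
first_maps = [np.full((H, W), i) for i in range(3)]        # first modality
second_maps = [np.full((H, W), 10 + i) for i in range(3)]  # second modality

# Fused input volume: channel-wise concatenation, preserving map order.
fused = np.stack(first_maps + second_maps, axis=-1)  # shape (H, W, 6)
```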
Regarding claim 37, Dolz as modified by Deng above teaches wherein at least one of the one or more processors is to. Dolz further teaches implement the plurality of features in a visual analysis application to generate visual analysis image outputs corresponding to the scene (Dolz page 1120–1122, proposed HyperDenseNet architecture evaluated on brain tissue segmentation, where fig. 6 illustrates the visual analysis results of the brain (scene)).
Regarding claims 38 and 42, with deficiencies noted in square brackets [], and significant differences between claims 38 and 42 noted in curly brackets {}, with claim 38 as exemplary, Dolz teaches {a method comprising: - claim 38 (Dolz Abstract, image processing through a disclosed neural network) / [at least one machine readable storage device comprising a plurality of instructions to cause a device to:] – claim 42}
accessing a first input volume corresponding to a first input image having a first modality and a second input volume corresponding to a second input image having a second modality (Dolz pages 1117–1118, Fig. 2, each imaging modality has a path, where page 1119, section C teaches that sub-volumes are used as input (accessing)), the first and second modalities including different visual information for a same scene (Dolz page 1117 and 1120 (section III), Fig. 2, MRI T1 and MRI T2 of infant brain tissues are the two different modalities used in the HyperDenseNet network, where fig. 2 illustrates the input volumes as appearing differently, and thus having different visual information, as well as it being well-known that T1 and T2 are different weightings providing different visual information);
causing a shared convolutional layer (Dolz fig. 2, page 1117, convolutional layers from both respective modality branches sharing deep connections) and [a first batch normalization layer] to the first input volume to generate a first output volume based on at least a portion of the first input volume (Dolz fig. 2, page 1118, section A, volume MR-T1 input to the top path including the top conv_1 layer, a CNN layer, where page 1120 table II teaches the output of conv_1 to be of output size 25 x 25 x 25, thus a volume);
causing the shared convolutional layer (Dolz fig. 2, page 1117, convolutional layers from both respective modality branches sharing deep connections) and [a second batch normalization layer] to generate a second output volume based on the second input volume (Dolz fig. 2, page 1118, section A, volume MR-T2 input to the bottom path including the bottom conv_1 layer, a CNN layer, where page 1120 table II teaches the output of conv_1 to be of output size 25 x 25 x 25, thus a volume), the shared convolutional layer to employ [the {same – claim 38 / identical – claim 42} convolutional weights when] generating the first and second output volumes, (Dolz fig. 2, page 1117, convolutional layers feature parameters common to the first and second modalities because these layers process the features from both modalities) [the first batch normalization layer including first batch parameters associated with the first modality and the second batch normalization layer including second batch parameters associated with the second modality, the second batch parameters different from the first batch parameters]; and
outputting a plurality of features corresponding to the first and second output volumes for the scene (Dolz page 1117, Fig. 2, black arrows in the network diagram representing connections to feature maps, and where the output of conv_1, in both the top and bottom branches, is shown by a black arrow).
Dolz does not explicitly teach at least one machine readable storage device comprising a plurality of instructions to cause a device to. Further, Dolz does not explicitly teach a first and second batch normalization layer or the first batch normalization layer including first batch parameters associated with the first modality and the second batch normalization layer including second batch parameters associated with the second modality, the second batch parameters different from the first batch parameters.
Still further, Dolz does not explicitly teach a shared convolutional layer to employ the same weights when generating the first and second output.
Deng teaches at least one machine readable storage device comprising a plurality of instructions to cause a device to (Deng page 4, section IV (B), models implemented using Tensorflow (instructions to) using a single 1080Ti GPU processor, where a person having ordinary skill in the art understands a GPU to include memory/a machine readable storage device that stores instructions).
Deng further teaches a first batch normalization layer and a second batch normalization layer, the first batch normalization layer including first batch parameters associated with the first modality and the second batch normalization layer including second batch parameters associated with the second modality, the second batch parameters different from the first batch parameters (Deng page 3, figs. 2 and 3, section C, Residual Fusion Block, in each residual fusion block, there are RU branches respective to each modality, within each RU branch is a BN (batch normalization) processing respective to each modality (batch parameters associated respectively to the first and second modalities, thus also being different from each other)).
Deng still further teaches a shared convolutional layer to employ the same/identical convolutional weights when generating the first and second output (Deng page 3, fig. 3, section C, residual Fusion Block (RFB) section, the framework of the RFB including a shared layer labeled GFU where both modalities input their respective data and including a same convolutional layer with 1x1, C/2 parameters).
Therefore, taking the teachings of Dolz and Deng together as a whole, it would have been obvious to a person having ordinary skill in the art (herein “PHOSITA”) before the effective filing date of the claimed invention to have modified the network system of Dolz to include the processor and memory of Deng, as well as to include batch normalization layers respective to each modality and the shared convolutional layer as taught in Deng, at least because doing so would improve performance for image segmentation. See Deng Abstract, and page 6.
Regarding claim 48, Dolz teaches the first input volume and the second input volume (Dolz pages 1117–1118, Fig. 2, each imaging modality has a path, where page 1119, section C teaches that sub-volumes are used as input), but does not explicitly teach the remaining limitations, which Deng teaches, wherein: the shared convolutional layer is trained as a single convolutional layer for both the first modality and the second modality (Deng pages 4–5, section B, Implementation Details, multimodal models are trained separately from the unimodal models, where fig. 3, page 3, illustrates the modality specific streams/branches, and the cross-modal/multimodal model as the middle branch, and illustrating the single shared Conv 1x1, C/2 layer), first model parameters for the first batch normalization layer are trained for the first modality and second model parameters for the second batch normalization layer are trained for the second modality (Deng pages 4–5, section B, Implementation Details, multimodal models are trained separately from the unimodal models, where fig. 3, page 3, illustrates individual BN blocks within the top branch unimodal model for one modality, and the individual BN blocks within the bottom branch unimodal model for the second modality); and
the first batch normalization layer is a first layer applied to the at least the portion of the first input after the shared convolutional layer and the second batch normalization layer is a first layer applied to the at least the portion of the second input after the second input volume (Deng page 3, fig. 3, section C, residual fusion block, the input into the BN blocks in the top unimodal stream including a downstream output from the shared convolutional layer, and the input into the BN blocks in the bottom unimodal stream including a downstream output from the shared convolutional layer as shown below (annotations in black with arrows added):
[Examiner-annotated reproduction of Deng fig. 3 (media_image3.png, greyscale) not rendered in this text version]
).
Therefore, taking the teachings of Dolz and Deng together as a whole, it would have been obvious to a PHOSITA before the effective filing date of the claimed invention to have modified the network system of Dolz to include the batch normalization layers respective to each modality and the shared convolutional layer, in the processing sequence taught in Deng, at least because doing so would improve performance for image segmentation. See Deng Abstract, and page 6.
Claim 27 is rejected under 35 U.S.C. 103 as being unpatentable over Dolz in view of Deng, as set forth above regarding claim 26 from which claim 27 depends, further in view of Ioffe et al., “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,” Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 2015. JMLR: W&CP volume 37 (herein “Ioffe”).
Regarding claim 27, while the combination of Dolz modified by Deng as disclosed above teaches the first batch normalization layer and the second batch normalization layer, Dolz as modified by Deng does not explicitly teach that the first batch normalization layer comprises first batch normalization layer parameters and the second batch normalization layer comprises second batch normalization layer parameters trained separately from the first activation parameters.
Ioffe teaches the first batch normalization layer comprises first batch normalization layer parameters and the second batch normalization layer comprises second batch normalization layer parameters trained separately from the first activation parameters (Ioffe section 3.2, including algorithm 2, which teaches in pseudocode the individual (separately) training of batch normalization networks (N networks) including the optimization of parameters).
Therefore, taking the teachings of Dolz as modified above by Deng and Ioffe together as a whole, it would have been obvious to a PHOSITA before the effective filing date of the claimed invention to have modified the network system of Dolz as modified by Deng to include the individual training of batch normalization networks as taught by Ioffe, at least because doing so would enable higher learning rates. See Ioffe section 3.3.
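For general context on the batch normalization transform taught by Ioffe, the following minimal numpy sketch normalizes activations over a mini-batch and applies learned scale and shift parameters, with two layers holding separately trained parameter sets; the parameter values are hypothetical:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch normalization transform: normalize over the mini-batch,
    then scale and shift with the layer's own learned parameters."""
    mu = x.mean(axis=0)   # mini-batch mean, per feature
    var = x.var(axis=0)   # mini-batch variance, per feature
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Two layers with separately trained parameter sets (hypothetical values).
x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y1 = batch_norm(x, gamma=np.array([1.0, 1.0]), beta=np.array([0.0, 0.0]))
y2 = batch_norm(x, gamma=np.array([2.0, 2.0]), beta=np.array([1.0, 1.0]))
```

Because `gamma` and `beta` belong to each layer individually, they can be optimized separately per layer, which is the character of the separate training relied upon above.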
Claims 32–35 and 41 are rejected under 35 U.S.C. 103 as being unpatentable over Dolz in view of Deng, as set forth above regarding the claims from which these claims depend, further in view of Laine et al., US Patent Application Publication No. US 2020/0242739 A1 (herein “Laine”).
Regarding claims 32, and 41, with claim 32 as exemplary, Dolz teaches wherein the first fused input volume comprises sums of first feature maps from the first output volume (Dolz page 1119, output of a layer being the concatenation (sums) of feature maps from each image modality path). Dolz and Deng do not, but Laine teaches and pixel-wise shifted versions of second feature maps from the second output volume or the third input volume (Laine ¶¶43, 141–142, 145, shifting of pixels after network branch, where a network branch 212 is a convolution layer that includes n feature maps).
Therefore, taking the teachings of Dolz as modified above by Deng and Laine together as a whole, it would have been obvious to a PHOSITA before the effective filing date of the claimed invention to have modified the network system of Dolz as modified by Deng to include the pixel shifting as taught by Laine, at least because doing so would provide for de-noising of input data (Laine ¶¶4, 9).
Regarding claim 33, Dolz as modified by Deng do not explicitly teach, but Laine teaches wherein the pixel-wise shifted versions of the second feature maps comprise a third feature map shifted in a horizontal direction and a fourth feature map shifted in a vertical direction (Laine ¶141, pixel shifting (after the network branch, thus on the feature maps) on one branch are shifted left or right (horizontal) while pixel shifting on another branch are shifted up or down (vertical)).
Therefore, taking the teachings of Dolz as modified above by Deng and Laine together as a whole, it would have been obvious to a PHOSITA before the effective filing date of the claimed invention to have modified the network system of Dolz as modified by Deng to include the pixel shifting as taught by Laine, at least because doing so would provide for de-noising of input data (Laine ¶¶4, 9).
Regarding claim 34, Dolz teaches wherein the second fused input volume comprises sums of third feature maps from the second output volume (Dolz page 1119, output of a layer being the concatenation (sums) of feature maps from each image modality path). Dolz and Deng do not, but Laine teaches and pixel-wise shifted versions of fourth feature maps from the first output volume or a fourth input volume (Laine ¶¶43, 141–142, 145, shifting of pixels after network branch, where a network branch 212 is a convolution layer that includes n feature maps).
Therefore, taking the teachings of Dolz as modified above by Deng and Laine together as a whole, it would have been obvious to a PHOSITA before the effective filing date of the claimed invention to have modified the network system of Dolz as modified by Deng to include the pixel shifting as taught by Laine, at least because doing so would provide for de-noising of input data (Laine ¶¶4, 9).
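As a general illustration of the pixel-wise shifting described with respect to claims 32 through 34, the following minimal numpy sketch shifts one hypothetical feature map horizontally and another vertically, then sums a shifted map with an unshifted one; wrap-around edge handling is assumed for brevity, and nothing here is drawn from Laine:

```python
import numpy as np

fmap = np.arange(16, dtype=float).reshape(4, 4)  # a hypothetical feature map

# Pixel-wise shifts: one map shifted horizontally, another vertically
# (np.roll wraps at the edges; zero padding is an equally valid choice).
shift_right = np.roll(fmap, 1, axis=1)  # horizontal shift by one pixel
shift_down = np.roll(fmap, 1, axis=0)   # vertical shift by one pixel

# A fused volume entry: sum of an unshifted map and a shifted map.
fused = fmap + shift_right
```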
Regarding claim 35, Dolz as modified by Deng teaches wherein at least one of the one or more processors is to. Dolz further teaches shuffle feature maps of the first and second output volumes to generate a third input volume corresponding to the first modality, the third input volume comprising first feature maps from the first output volume and second feature maps from the second output volume (Dolz page 1119, fig. 2, shuffling and interleaving feature map elements from the CNN layers from each modality path per equation 4);
add the fourth input volume and a fifth input volume to generate a sixth input volume for input to a second convolutional layer (Dolz page 1119, fig. 2, outputs from layers in different streams (modalities) are concatenated (add) per equation (3), where the network shown in fig. 2 illustrates at least four (and contemplates additional) stages of convolutional layering including outputs of one convolutional layer to be input into another convolutional layer of the other modality path).
Dolz as modified by Deng does not explicitly teach, but Laine teaches pixel-wise shift at least a subset of the first and second feature maps to generate a fourth input volume (Laine ¶¶43, 141–142, 145, shifting of pixels after network branch, where a network branch 212 is a convolution layer that includes n feature maps).
Therefore, taking the teachings of Dolz as modified above by Deng and Laine together as a whole, it would have been obvious to a PHOSITA before the effective filing date of the claimed invention to have modified the network system of Dolz as modified by Deng to include the pixel shifting as taught by Laine, at least because doing so would provide for de-noising of input data (Laine ¶¶4, 9).
Claims 46–47 are rejected under 35 U.S.C. 103 as being unpatentable over Dolz in view of Deng, as set forth above regarding claim 26 from which claim 46 depends, further in view of Wang et al., “Deep Multimodal Fusion by Channel Exchanging,” arXiv:2011.05005v1 [cs.CV], November 10, 2020, https://doi.org/10.48550/arXiv.2011.05005 (herein “Wang2”).
Regarding claim 46, Dolz as modified by Deng teaches the first and second output volumes (Dolz fig. 2, pages 1117–1118, section A, volume MR-T1 input to the top path and volume MR-T2 input to the bottom path) but does not explicitly teach, but Wang2 teaches wherein at least one of the one or more processors is to perform bidirectional fusion of the first and second output (Wang2 fig. 2, page 4, as shown channel exchanging from each modality is bidirectional, i.e., from modality 1 to modality 2 and vice versa).
Therefore, taking the teachings of Dolz as modified by Deng and Wang2 together as a whole, it would have been obvious to a PHOSITA before the effective filing date of the claimed invention to have modified the multi-modal fusion of Dolz to include the bidirectional fusion disclosed in Wang2 at least because doing so would provide a more compact and thus efficient processing network. See Wang2 Abstract.
Regarding claim 47, Dolz as modified by Deng teaches the first and second output volumes (Dolz fig. 2, pages 1117–1118, section A, volume MR-T1 input to the top path and volume MR-T2 input to the bottom path) but does not explicitly teach, but Wang2 teaches wherein at least one of the one or more processors is to perform the bidirectional fusion of the first and second output by: performing cross-modality channel shuffling (CMCS) that exchanges non-overlapping channel subsets between the first and second output (Wang2 page 3, section 3, channel exchanging networks that exchange (shuffle) channels for message fusion, where fig. 2, page 4 illustrates that the channel exchanging is done one way per subset, thus non-overlapping), and performing modality-specific pixel shifting (MSPS) that shifts grouped feature maps by one pixel in different spatial directions and adds the shifted feature maps into the output of the other modality (Wang2, figure 2, page 4, for each modality the channels comprised of grouped feature maps are shifted in height (shown in fig. 2 as being shifted up, in the height axis) and then exchanged (adds) the shifted channel to the other modality).
Therefore, taking the teachings of Dolz as modified by Deng and Wang2 together as a whole, it would have been obvious to a PHOSITA before the effective filing date of the claimed invention to have modified the multi-modal fusion of Dolz to include the bidirectional fusion disclosed in Wang2 at least because doing so would provide a more compact and thus efficient processing network. See Wang2 Abstract.
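As a general illustration of the cross-modality channel shuffling and modality-specific pixel shifting described with respect to claim 47, the following minimal numpy sketch exchanges non-overlapping channel subsets between two modalities and adds a one-pixel-shifted channel group into the other modality's output; shapes and values are hypothetical and not drawn from Wang2:

```python
import numpy as np

C = 4
a = np.zeros((C, 3, 3))  # modality-1 channels (hypothetical constant values)
b = np.ones((C, 3, 3))   # modality-2 channels

# Cross-modality channel shuffling: exchange non-overlapping channel subsets.
a_new, b_new = a.copy(), b.copy()
a_new[:C // 2] = b[:C // 2]  # first half flows modality 2 -> modality 1
b_new[C // 2:] = a[C // 2:]  # second half flows modality 1 -> modality 2

# Modality-specific pixel shifting: shift the grouped feature maps by one
# pixel along a spatial axis, then add into the other modality's output.
shifted_a = np.roll(a_new, 1, axis=1)  # shift along the height axis
b_new = b_new + shifted_a
```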
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M KOETH whose telephone number is (571)272-5908. The examiner can normally be reached Monday-Thursday, 09:00-17:00, Friday 09:00-13:00, EDT/EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vincent Rudolph can be reached at 571-272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
MICHELLE M. KOETH
Primary Examiner
Art Unit 2671
/MICHELLE M KOETH/Primary Examiner, Art Unit 2671