DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on December 26, 2025 has been entered.
Response to Arguments
Applicant’s arguments and amendments in the Amendment with RCE filed December 26, 2025 (herein “Amendment”), with respect to the objections to claims 26, 38 and 42, and claims depending therefrom, have been fully considered and are persuasive. The objections to claims 26, 38 and 42, and claims depending therefrom have been withdrawn.
Applicant’s arguments and amendments in the Amendment, with respect to the rejections of claims 26, 38 and 42, and claims depending therefrom under 35 U.S.C. 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, new grounds of rejection are made in view of Deng et al., “RFBNet: Deep Multimodal Networks with Residual Fusion Blocks for RGB-D Semantic Segmentation,” arXiv:1907.00135v2 [cs.CV], https://doi.org/10.48550/arXiv.1907.00135, September 16, 2019.
Claim Objections
Claim 37 is objected to because of the following informalities: the claim as presented in the Amendment does not represent the status of the claim prior to the Amendment. Specifically, claim 37 no longer recites the word “processors” but also does not indicate that the word had been deleted by way of a strikethrough. Accordingly, Applicant is advised to correct this claim in subsequent filings to clarify that the word “processors” was not intended to be removed. For the purposes of examination, the claim has been examined as if “processors” was not removed from the claim. Appropriate correction is required.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 26, 28–31, 37–40, 42–43, and 48 are rejected under 35 U.S.C. 103 as being unpatentable over Dolz et al., "HyperDense-Net: A Hyper-Densely Connected CNN for Multi-Modal Image Segmentation," in IEEE Transactions on Medical Imaging, vol. 38, no. 5, pp. 1116-1126, May 2019, doi: 10.1109/TMI.2018.2878669 (herein “Dolz”) in view of Deng et al., “RFBNet: Deep Multimodal Networks with Residual Fusion Blocks for RGB-D Semantic Segmentation,” arXiv:1907.00135v2 [cs.CV], https://doi.org/10.48550/arXiv.1907.00135, September 16, 2019 (herein “Deng”).
Regarding claim 26, with deficiencies noted in square brackets [], Dolz teaches a system comprising: [a memory to store] a first input volume corresponding to a first input image and a second input volume corresponding to a second input image, the first input image having a first modality and the second input image having a second modality (Dolz pages 1117–1118, Fig. 2, each imaging modality has a path, where page 1119, section C teaches that sub-volumes are used as input), the first and second modalities include different visual information for a same scene (Dolz page 1117 and 1120 (section III), Fig. 2, MRI T1 and MRI T2 of infant brain tissues are the two different modalities used in the HyperDenseNet network, where fig. 2 illustrates the input volumes as appearing differently, and thus having different visual information, as well as it being well-known that T1 and T2 are different weightings providing different visual information);
[instructions; and]
[one or more processors to be programmed by the instructions to]:
cause a shared convolutional layer (Dolz fig. 2, page 1117, convolutional layers from both respective modality branches sharing deep connections) and [a first batch normalization layer] to generate a first output volume based on at least a portion of the first input volume (Dolz fig. 2, page 1118, section A, volume MR-T1 input to the top path including the top conv_1 layer, a CNN layer, where page 1120 table II teaches the output of conv_1 to be of output size 25 x 25 x 25, thus a volume);
cause the shared convolutional layer (Dolz fig. 2, page 1117, convolutional layers from both respective modality branches sharing deep connections) and [a second batch normalization layer] to generate a second output volume based on the second input volume (Dolz fig. 2, page 1118, section A, volume MR-T2 input to the bottom path including the bottom conv_1 layer, a CNN layer, where page 1120 table II teaches the output of conv_1 to be of output size 25 x 25 x 25, thus a volume), the shared convolutional layer to employ [the same convolutional weights when] generating the first and second output volumes, (Dolz fig. 2, page 1117, convolutional layers feature parameters common to the first and second modalities because these layers process the features from both modalities) [the first batch normalization layer to include first batch parameters associated with the first modality, the second batch normalization layer including second batch parameters associated with the second modality, the second batch parameters different from the first batch parameters]; and
output a plurality of features corresponding to the first and second output volumes for the scene (Dolz page 1117, Fig. 2, black arrows in the network diagram representing connections to feature maps, and where the output of conv_1, in both the top and bottom branches, is shown by a black arrow).
Dolz does not explicitly teach instructions or a memory to store, or one or more processors coupled to the memory, the one or more processors to execute the instructions. Further, Dolz does not explicitly teach a first and second batch normalization layer, or the first batch normalization layer including first batch parameters associated with the first modality and the second batch normalization layer including second batch parameters associated with the second modality, the second batch parameters different from the first batch parameters.
Still further, Dolz does not explicitly teach a shared convolutional layer to employ the same weights when generating the first and second output.
Deng teaches a memory to store, instructions and one or more processors to be programmed by the instructions to (Deng page 4, section IV (B), models implemented using Tensorflow (instructions to) using a single 1080Ti GPU (processor), where a person having ordinary skill in the art understands a GPU to include memory that stores instructions).
Deng further teaches a first batch normalization layer and a second batch normalization layer, the first batch normalization layer including first batch parameters associated with the first modality and the second batch normalization layer including second batch parameters associated with the second modality, the second batch parameters different from the first batch parameters (Deng page 3, figs. 2 and 3, section C, Residual Fusion Block, in each residual fusion block, there are RU branches respective to each modality, within each RU branch is a BN (batch normalization) processing respective to each modality (batch parameters associated respectively to the first and second modalities, thus also being different from each other)).
Deng still further teaches a shared convolutional layer to employ the same convolutional weights when generating the first and second output (Deng page 3, fig. 3, section C, residual Fusion Block (RFB) section, the framework of the RFB including a shared layer labeled GFU where both modalities input their respective data and including a same convolutional layer with 1x1, C/2 parameters).
Therefore, taking the teachings of Dolz and Deng together as a whole, it would have been obvious to a person having ordinary skill in the art (herein “PHOSITA”) before the effective filing date of the claimed invention to have modified the network system of Dolz to include the processor and memory of Deng, as well as to include batch normalization layers respective to each modality and the shared convolutional layer as taught in Deng, at least because doing so would improve performance for image segmentation. See Deng Abstract, and page 6.
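By way of general illustration of the arrangement discussed with respect to claim 26 (a convolutional layer whose weights are shared across modalities, paired with modality-specific batch normalization parameters), a minimal numpy sketch follows; all shapes, names, and parameter values are hypothetical and are not drawn from Dolz or Deng:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared 1x1 convolution weights: a single kernel used for BOTH modalities.
C_in, C_out = 3, 4
W = rng.standard_normal((C_out, C_in))  # shared convolutional weights

# Modality-specific batch normalization parameters (hypothetical values).
bn_params = {
    "modality_1": {"gamma": np.ones(C_out), "beta": np.zeros(C_out)},
    "modality_2": {"gamma": 2.0 * np.ones(C_out), "beta": 0.5 * np.ones(C_out)},
}

def shared_conv_bn(x, modality):
    """Apply the shared 1x1 convolution, then the modality's own batch norm."""
    # x: (H, W, C_in) -> y: (H, W, C_out); a 1x1 conv is a per-pixel matmul.
    y = x @ W.T
    mean = y.mean(axis=(0, 1))  # per-channel statistics
    var = y.var(axis=(0, 1))
    p = bn_params[modality]
    return p["gamma"] * (y - mean) / np.sqrt(var + 1e-5) + p["beta"]

x1 = rng.standard_normal((8, 8, C_in))  # e.g. a first-modality sub-volume slice
x2 = rng.standard_normal((8, 8, C_in))  # e.g. a second-modality sub-volume slice
out1 = shared_conv_bn(x1, "modality_1")
out2 = shared_conv_bn(x2, "modality_2")
```

The sketch makes the distinction concrete: `W` is identical for both calls, while the normalization statistics and the `gamma`/`beta` parameters differ per modality.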
Regarding claims 28, 39 and 43, with claim 28 as exemplary, Dolz as modified above by Deng teaches wherein at least one of the one or more processors is to – claim 28, and comprising instructions to cause the device to – claim 43. Dolz further teaches to provide bidirectional fusion for the first and second modalities (given that the present Specification discloses that bidirectional fusion is when features are crossed over from one modality to another, Dolz teaches in fig. 2, and pages 1118–1119, that features from each modality network path are linked (crossed over as shown in fig. 2)):
combine first features from the first output volume and second features from the second output volume or a third input volume corresponding to the second modality to generate a first fused input volume corresponding to the first modality for input to a second convolutional layer (Dolz fig. 2, pages 1118–1119, outputs of the Conv_2 layer (first features) of a first modality MR-T1 path is input to another convolutional layer Conv_3 along with (combined) the outputs from the Conv_1 layer (second features) in the second modality MR-T2 path as shown by black arrows in fig. 2 – where the examiner has circled in the two feature sets being combined in the below reproduced fig. 2, and where Conv_3 corresponds to the claimed “second convolutional layer”:
[Examiner-annotated reproduction of Dolz fig. 2 (media_image1.png, greyscale) not rendered in this text version]
); and
combine third features from the second output volume and fourth features from the first output volume or a fourth output volume corresponding to the first modality to generate a second fused input volume corresponding to the second modality for input to the second convolutional layer (Dolz fig. 2, pages 1118–1119, outputs of the Conv_1 layer (fourth features) of a first modality MR-T1 path is input to the convolutional layer Conv_3 along with (combined) the outputs from the Conv_1 layer (third features) in the other modality MR-T2 path as shown by black arrows in fig. 2 – where the examiner has circled in the two feature sets being combined in the below reproduced fig. 2, and where Conv_3 corresponds to the claimed “second convolutional layer”:
[Examiner-annotated reproduction of Dolz fig. 2 (media_image2.png, greyscale) not rendered in this text version]
).
Regarding claims 29 and 40, with claim 29 as exemplary, Dolz teaches wherein the first features comprise first feature maps from the first output volume and the second features comprise second feature maps from the second output volume, the first fused input volume comprising a concatenation of the first and second feature maps (Dolz pages 1118–1119, feature outputs are concatenated as given in equation 4, including the first and second feature maps from each respective volume).
Regarding claim 30, Dolz teaches wherein the first output volume comprises a first set of feature maps in an order, the second output volume comprises a second set of feature maps in the order, and the first fused input volume comprises the first and second feature maps in the order (Dolz pages 1118–1119, concatenation as given in equation 4, including a particular order for shuffling and interleaving).
Regarding claim 31, Dolz teaches wherein the third features comprise third feature maps from the second output volume and the fourth features comprises fourth feature maps from the first output volume, the second fused input volume comprising a concatenation of the third and fourth feature maps in the order (Dolz pages 1118–1119, concatenation as given in equation 4, including a particular order for shuffling and interleaving for each layer shown in fig. 2 (including the above disclosed layers outputting the third and fourth features which are feature maps)).
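As a general illustration of the concatenation-style fusion described with respect to claims 29 through 31, the following minimal numpy sketch concatenates feature maps from two modalities channel-wise while preserving their order; shapes and values are hypothetical and are not drawn from the references:

```python
import numpy as np

# Two output volumes of feature maps, one per modality (hypothetical shapes).
H, W = 4, 4
first_maps = [np.full((H, W), i) for i in range(3)]        # first modality
second_maps = [np.full((H, W), 10 + i) for i in range(3)]  # second modality

# Fused input volume: channel-wise concatenation, preserving map order.
fused = np.stack(first_maps + second_maps, axis=-1)  # shape (H, W, 6)
```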
Regarding claim 37, Dolz as modified by Deng above teaches wherein at least one of the one or more processors is to. Dolz further teaches implement the plurality of features in a visual analysis application to generate visual analysis image outputs corresponding to the scene (Dolz page 1120–1122, proposed HyperDenseNet architecture evaluated on brain tissue segmentation, where fig. 6 illustrates the visual analysis results of the brain (scene)).
Regarding claims 38 and 42, with deficiencies noted in square brackets [], and significant differences between claims 38 and 42 noted in curly brackets {}, with claim 38 as exemplary, Dolz teaches {a method comprising: - claim 38 (Dolz Abstract, image processing through a disclosed neural network) / [at least one machine readable storage device comprising a plurality of instructions to cause a device to:] – claim 42}
accessing a first input volume corresponding to a first input image having a first modality and a second input volume corresponding to a second input image having a second modality (Dolz pages 1117–1118, Fig. 2, each imaging modality has a path, where page 1119, section C teaches that sub-volumes are used as input (accessing)), the first and second modalities including different visual information for a same scene (Dolz page 1117 and 1120 (section III), Fig. 2, MRI T1 and MRI T2 of infant brain tissues are the two different modalities used in the HyperDenseNet network, where fig. 2 illustrates the input volumes as appearing differently, and thus having different visual information, as well as it being well-known that T1 and T2 are different weightings providing different visual information);
causing a shared convolutional layer (Dolz fig. 2, page 1117, convolutional layers from both respective modality branches sharing deep connections) and [a first batch normalization layer] to the first input volume to generate a first output volume based on at least a portion of the first input volume (Dolz fig. 2, page 1118, section A, volume MR-T1 input to the top path including the top conv_1 layer, a CNN layer, where page 1120 table II teaches the output of conv_1 to be of output size 25 x 25 x 25, thus a volume);
causing the shared convolutional layer (Dolz fig. 2, page 1117, convolutional layers from both respective modality branches sharing deep connections) and [a second batch normalization layer] to generate a second output volume based on the second input volume (Dolz fig. 2, page 1118, section A, volume MR-T2 input to the bottom path including the bottom conv_1 layer, a CNN layer, where page 1120 table II teaches the output of conv_1 to be of output size 25 x 25 x 25, thus a volume), the shared convolutional layer to employ [the {same – claim 38 / identical – claim 42} convolutional weights when] generating the first and second output volumes, (Dolz fig. 2, page 1117, convolutional layers feature parameters common to the first and second modalities because these layers process the features from both modalities) [the first batch normalization layer including first batch parameters associated with the first modality and the second batch normalization layer including second batch parameters associated with the second modality, the second batch parameters different from the first batch parameters]; and
outputting a plurality of features corresponding to the first and second output volumes for the scene (Dolz page 1117, Fig. 2, black arrows in the network diagram representing connections to feature maps, and where the output of conv_1, in both the top and bottom branches, is shown by a black arrow).
Dolz does not explicitly teach at least one machine readable storage device comprising a plurality of instructions to cause a device to. Further, Dolz does not explicitly teach a first and second batch normalization layer or the first batch normalization layer including first batch parameters associated with the first modality and the second batch normalization layer including second batch parameters associated with the second modality, the second batch parameters different from the first batch parameters.
Still further, Dolz does not explicitly teach a shared convolutional layer to employ the same weights when generating the first and second output.
Deng teaches at least one machine readable storage device comprising a plurality of instructions to cause a device to (Deng page 4, section IV (B), models implemented using Tensorflow (instructions to) using a single 1080Ti GPU processor, where a person having ordinary skill in the art understands a GPU to include memory/a machine readable storage device that stores instructions).
Deng further teaches a first batch normalization layer and a second batch normalization layer, the first batch normalization layer including first batch parameters associated with the first modality and the second batch normalization layer including second batch parameters associated with the second modality, the second batch parameters different from the first batch parameters (Deng page 3, figs. 2 and 3, section C, Residual Fusion Block, in each residual fusion block, there are RU branches respective to each modality, within each RU branch is a BN (batch normalization) processing respective to each modality (batch parameters associated respectively to the first and second modalities, thus also being different from each other)).
Deng still further teaches a shared convolutional layer to employ the same/identical convolutional weights when generating the first and second output (Deng page 3, fig. 3, section C, residual Fusion Block (RFB) section, the framework of the RFB including a shared layer labeled GFU where both modalities input their respective data and including a same convolutional layer with 1x1, C/2 parameters).
Therefore, taking the teachings of Dolz and Deng together as a whole, it would have been obvious to a person having ordinary skill in the art (herein “PHOSITA”) before the effective filing date of the claimed invention to have modified the network system of Dolz to include the processor and memory of Deng, as well as to include batch normalization layers respective to each modality and the shared convolutional layer as taught in Deng, at least because doing so would improve performance for image segmentation. See Deng Abstract, and page 6.
Regarding claim 48, Dolz teaches the first input volume and the second input volume (Dolz pages 1117–1118, Fig. 2, each imaging modality has a path, where page 1119, section C teaches that sub-volumes are used as input), but does not explicitly teach the remaining limitations, which Deng teaches, wherein: the shared convolutional layer is trained as a single convolutional layer for both the first modality and the second modality (Deng pages 4–5, section B, Implementation Details, multimodal models are trained separately from the unimodal models, where fig. 3, page 3, illustrates the modality specific streams/branches, and the cross-modal/multimodal model as the middle branch, and illustrating the single shared Conv 1x1, C/2 layer), first model parameters for the first batch normalization layer are trained for the first modality and second model parameters for the second batch normalization layer are trained for the second modality (Deng pages 4–5, section B, Implementation Details, multimodal models are trained separately from the unimodal models, where fig. 3, page 3, illustrates individual BN blocks within the top branch unimodal model for one modality, and the individual BN blocks within the bottom branch unimodal model for the second modality); and
the first batch normalization layer is a first layer applied to the at least the portion of the first input after the shared convolutional layer and the second batch normalization layer is a first layer applied to the at least the portion of the second input after the second input volume (Deng page 3, fig. 3, section C, residual fusion block, the input into the BN blocks in the top unimodal stream including a downstream output from the shared convolutional layer, and the input into the BN blocks in the bottom unimodal stream including a downstream output from the shared convolutional layer as shown below (annotations in black with arrows added):
[Examiner-annotated reproduction of Deng fig. 3 (media_image3.png, greyscale) not rendered in this text version]
).
Therefore, taking the teachings of Dolz and Deng together as a whole, it would have been obvious to a PHOSITA before the effective filing date of the claimed invention to have modified the network system of Dolz to include the batch normalization layers respective to each modality and the shared convolutional layer, in the processing sequence taught in Deng, at least because doing so would improve performance for image segmentation. See Deng Abstract, and page 6.
Claim 27 is rejected under 35 U.S.C. 103 as being unpatentable over Dolz in view of Deng, as set forth above regarding claim 26 from which claim 27 depends, further in view of Ioffe et al., “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,” Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 2015. JMLR: W&CP volume 37 (herein “Ioffe”).
Regarding claim 27, while the combination of Dolz modified by Deng as disclosed above teaches the first batch normalization layer and the second batch normalization layer, Dolz as modified by Deng does not explicitly teach that the first batch normalization layer comprises first batch normalization layer parameters and the second batch normalization layer comprises second batch normalization layer parameters trained separately from the first activation parameters.
Ioffe teaches the first batch normalization layer comprises first batch normalization layer parameters and the second batch normalization layer comprises second batch normalization layer parameters trained separately from the first activation parameters (Ioffe section 3.2, including algorithm 2, which teaches in pseudocode the individual (separately) training of batch normalization networks (N networks) including the optimization of parameters).
Therefore, taking the teachings of Dolz as modified above by Deng and Ioffe together as a whole, it would have been obvious to a PHOSITA before the effective filing date of the claimed invention to have modified the network system of Dolz as modified by Deng to include the individual training of batch normalization networks as taught by Ioffe, at least because doing so would enable higher learning rates. See Ioffe section 3.3.
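For general context on the batch normalization transform taught by Ioffe, the following minimal numpy sketch normalizes activations over a mini-batch and applies learned scale and shift parameters, with two layers holding separately trained parameter sets; the parameter values are hypothetical:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch normalization transform: normalize over the mini-batch,
    then scale and shift with the layer's own learned parameters."""
    mu = x.mean(axis=0)   # mini-batch mean, per feature
    var = x.var(axis=0)   # mini-batch variance, per feature
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Two layers with separately trained parameter sets (hypothetical values).
x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y1 = batch_norm(x, gamma=np.array([1.0, 1.0]), beta=np.array([0.0, 0.0]))
y2 = batch_norm(x, gamma=np.array([2.0, 2.0]), beta=np.array([1.0, 1.0]))
```

Because `gamma` and `beta` belong to each layer individually, they can be optimized separately per layer, which is the character of the separate training relied upon above.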
Claims 32–35 and 41 are rejected under 35 U.S.C. 103 as being unpatentable over Dolz in view of Deng, as set forth above regarding the claims from which these claims depend, further in view of Laine et al., US Patent Application Publication No. US 2020/0242739 A1 (herein “Laine”).
Regarding claims 32, and 41, with claim 32 as exemplary, Dolz teaches wherein the first fused input volume comprises sums of first feature maps from the first output volume (Dolz page 1119, output of a layer being the concatenation (sums) of feature maps from each image modality path). Dolz and Deng do not, but Laine teaches and pixel-wise shifted versions of second feature maps from the second output volume or the third input volume (Laine ¶¶43, 141–142, 145, shifting of pixels after network branch, where a network branch 212 is a convolution layer that includes n feature maps).
Therefore, taking the teachings of Dolz as modified above by Deng and Laine together as a whole, it would have been obvious to a PHOSITA before the effective filing date of the claimed invention to have modified the network system of Dolz as modified by Deng to include the pixel shifting as taught by Laine, at least because doing so would provide for de-noising of input data (Laine ¶¶4, 9).
Regarding claim 33, Dolz as modified by Deng do not explicitly teach, but Laine teaches wherein the pixel-wise shifted versions of the second feature maps comprise a third feature map shifted in a horizontal direction and a fourth feature map shifted in a vertical direction (Laine ¶141, pixel shifting (after the network branch, thus on the feature maps) on one branch are shifted left or right (horizontal) while pixel shifting on another branch are shifted up or down (vertical)).
Therefore, taking the teachings of Dolz as modified above by Deng and Laine together as a whole, it would have been obvious to a PHOSITA before the effective filing date of the claimed invention to have modified the network system of Dolz as modified by Deng to include the pixel shifting as taught by Laine, at least because doing so would provide for de-noising of input data (Laine ¶¶4, 9).
Regarding claim 34, Dolz teaches wherein the second fused input volume comprises sums of third feature maps from the second output volume (Dolz page 1119, output of a layer being the concatenation (sums) of feature maps from each image modality path). Dolz and Deng do not, but Laine teaches and pixel-wise shifted versions of fourth feature maps from the first output volume or a fourth input volume (Laine ¶¶43, 141–142, 145, shifting of pixels after network branch, where a network branch 212 is a convolution layer that includes n feature maps).
Therefore, taking the teachings of Dolz as modified above by Deng and Laine together as a whole, it would have been obvious to a PHOSITA before the effective filing date of the claimed invention to have modified the network system of Dolz as modified by Deng to include the pixel shifting as taught by Laine, at least because doing so would provide for de-noising of input data (Laine ¶¶4, 9).
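As a general illustration of the pixel-wise shifting described with respect to claims 32 through 34, the following minimal numpy sketch shifts one hypothetical feature map horizontally and another vertically, then sums a shifted map with an unshifted one; wrap-around edge handling is assumed for brevity, and nothing here is drawn from Laine:

```python
import numpy as np

fmap = np.arange(16, dtype=float).reshape(4, 4)  # a hypothetical feature map

# Pixel-wise shifts: one map shifted horizontally, another vertically
# (np.roll wraps at the edges; zero padding is an equally valid choice).
shift_right = np.roll(fmap, 1, axis=1)  # horizontal shift by one pixel
shift_down = np.roll(fmap, 1, axis=0)   # vertical shift by one pixel

# A fused volume entry: sum of an unshifted map and a shifted map.
fused = fmap + shift_right
```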
Regarding claim 35, Dolz as modified by Deng teaches wherein at least one of the one or more processors is to. Dolz further teaches shuffle feature maps of the first and second output volumes to generate a third input volume corresponding to the first modality, the third input volume comprising first feature maps from the first output volume and second feature maps from the second output volume (Dolz page 1119, fig. 2, shuffling and interleaving feature map elements from the CNN layers from each modality path per equation 4);
add the fourth input volume and a fifth input volume to generate a sixth input volume for input to a second convolutional layer (Dolz page 1119, fig. 2, outputs from layers in different streams (modalities) are concatenated (add) per equation (3), where the network shown in fig. 2 illustrates at least four (and contemplates additional) stages of convolutional layering including outputs of one convolutional layer to be input into another convolutional layer of the other modality path).
Dolz as modified by Deng does not explicitly teach, but Laine teaches pixel-wise shift at least a subset of the first and second feature maps to generate a fourth input volume (Laine ¶¶43, 141–142, 145, shifting of pixels after network branch, where a network branch 212 is a convolution layer that includes n feature maps).
Therefore, taking the teachings of Dolz as modified above by Deng and Laine together as a whole, it would have been obvious to a PHOSITA before the effective filing date of the claimed invention to have modified the network system of Dolz as modified by Deng to include the pixel shifting as taught by Laine, at least because doing so would provide for de-noising of input data (Laine ¶¶4, 9).
Claims 46–47 are rejected under 35 U.S.C. 103 as being unpatentable over Dolz in view of Deng, as set forth above regarding claim 26 from which claim 46 depends, further in view of Wang et al., “Deep Multimodal Fusion by Channel Exchanging,” arXiv:2011.05005v1 [cs.CV], November 10, 2020, https://doi.org/10.48550/arXiv.2011.05005 (herein “Wang2”).
Regarding claim 46, Dolz as modified by Deng teaches the first and second output volumes (Dolz fig. 2, pages 1117–1118, section A, volume MR-T1 input to the top path and volume MR-T2 input to the bottom path) but does not explicitly teach, but Wang2 teaches wherein at least one of the one or more processors is to perform bidirectional fusion of the first and second output (Wang2 fig. 2, page 4, as shown channel exchanging from each modality is bidirectional, i.e., from modality 1 to modality 2 and vice versa).
Therefore, taking the teachings of Dolz as modified by Deng and Wang2 together as a whole, it would have been obvious to a PHOSITA before the effective filing date of the claimed invention to have modified the multi-modal fusion of Dolz to include the bidirectional fusion disclosed in Wang2 at least because doing so would provide a more compact and thus efficient processing network. See Wang2 Abstract.
Regarding claim 47, Dolz as modified by Deng teaches the first and second output volumes (Dolz fig. 2, pages 1117–1118, section A, volume MR-T1 input to the top path and volume MR-T2 input to the bottom path) but does not explicitly teach, but Wang2 teaches wherein at least one of the one or more processors is to perform the bidirectional fusion of the first and second output by: performing cross-modality channel shuffling (CMCS) that exchanges non-overlapping channel subsets between the first and second output (Wang2 page 3, section 3, channel exchanging networks that exchange (shuffle) channels for message fusion, where fig. 2, page 4 illustrates that the channel exchanging is done one way per subset, thus non-overlapping), and performing modality-specific pixel shifting (MSPS) that shifts grouped feature maps by one pixel in different spatial directions and adds the shifted feature maps into the output of the other modality (Wang2, figure 2, page 4, for each modality the channels comprised of grouped feature maps are shifted in height (shown in fig. 2 as being shifted up, in the height axis) and then exchanged (adds) the shifted channel to the other modality).
Therefore, taking the teachings of Dolz as modified by Deng and Wang2 together as a whole, it would have been obvious to a PHOSITA before the effective filing date of the claimed invention to have modified the multi-modal fusion of Dolz to include the bidirectional fusion disclosed in Wang2 at least because doing so would provide a more compact and thus efficient processing network. See Wang2 Abstract.
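As a general illustration of the cross-modality channel shuffling and modality-specific pixel shifting described with respect to claim 47, the following minimal numpy sketch exchanges non-overlapping channel subsets between two modalities and adds a one-pixel-shifted channel group into the other modality's output; shapes and values are hypothetical and not drawn from Wang2:

```python
import numpy as np

C = 4
a = np.zeros((C, 3, 3))  # modality-1 channels (hypothetical constant values)
b = np.ones((C, 3, 3))   # modality-2 channels

# Cross-modality channel shuffling: exchange non-overlapping channel subsets.
a_new, b_new = a.copy(), b.copy()
a_new[:C // 2] = b[:C // 2]  # first half flows modality 2 -> modality 1
b_new[C // 2:] = a[C // 2:]  # second half flows modality 1 -> modality 2

# Modality-specific pixel shifting: shift the grouped feature maps by one
# pixel along a spatial axis, then add into the other modality's output.
shifted_a = np.roll(a_new, 1, axis=1)  # shift along the height axis
b_new = b_new + shifted_a
```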
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M KOETH whose telephone number is (571)272-5908. The examiner can normally be reached Monday-Thursday, 09:00-17:00, Friday 09:00-13:00, EDT/EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vincent Rudolph can be reached at 571-272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
MICHELLE M. KOETH
Primary Examiner
Art Unit 2671
/MICHELLE M KOETH/Primary Examiner, Art Unit 2671