Prosecution Insights
Last updated: April 19, 2026
Application No. 18/406,910

METHOD, APPARATUS, ELECTRONIC DEVICE AND STORAGE MEDIUM FOR MULTI-MODAL DATA PROCESSING

Non-Final OA (§102, §103)
Filed
Jan 08, 2024
Examiner
NAKHJAVAN, SHERVIN K
Art Unit
2672
Tech Center
2600 — Communications
Assignee
BEIJING ZITIAO NETWORK TECHNOLOGY CO., LTD.
OA Round
1 (Non-Final)
Grant Probability: 88% (Favorable)
OA Rounds: 1-2
To Grant: 2y 7m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 88% (544 granted / 616 resolved; +26.3% vs TC avg), above average
Interview Lift: +10.9% (moderate) among resolved cases with interview
Avg Prosecution: 2y 7m typical timeline; 23 currently pending
Total Applications: 639 across all art units
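The headline figures above are simple ratios. The sketch below assumes the 99% "with interview" number is the career allow rate plus the additive interview lift; that additive combination is an assumption about how the dashboard computes it, not documented here:

```python
# Figures from the Examiner Intelligence card above.
granted, resolved = 544, 616

career_allow_rate = granted / resolved  # career allow rate as a fraction
interview_lift = 0.109                  # reported lift for cases with an interview

# Assumed additive combination, capped at 100%.
with_interview = min(career_allow_rate + interview_lift, 1.0)

print(f"Career allow rate: {career_allow_rate:.1%}")  # 88.3%
print(f"With interview:    {with_interview:.1%}")     # 99.2%
```

Rounded to whole percentages, these reproduce the 88% and 99% shown on the card.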

Statute-Specific Performance

§101: 12.3% (-27.7% vs TC avg)
§103: 36.4% (-3.6% vs TC avg)
§102: 25.3% (-14.7% vs TC avg)
§112: 14.6% (-25.4% vs TC avg)
Deltas are measured against the Tech Center average estimate • Based on career data from 616 resolved cases
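The four statute deltas are internally consistent with a single baseline: adding each delta back to the examiner's rate recovers the same 40.0% Tech Center average estimate in every row. A quick arithmetic check, using only the numbers from the table above:

```python
# (examiner rate %, delta vs TC avg %) per statute, from the table above.
stats = {
    "§101": (12.3, -27.7),
    "§103": (36.4, -3.6),
    "§102": (25.3, -14.7),
    "§112": (14.6, -25.4),
}

# Implied TC average = examiner rate - delta; all four rows should agree.
implied = {s: round(rate - delta, 1) for s, (rate, delta) in stats.items()}
print(implied)  # every statute implies the same 40.0% baseline
assert all(v == 40.0 for v in implied.values())
```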

Office Action

§102 §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 9, 10, 11, 17, 19 and 20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by US 20240233334 A1 to Xia et al. (hereinafter 'Xia').
Regarding claim 1, Xia discloses a method for multi-modal data processing (Para [0005], wherein a multi-modal data retrieval method), comprising:

acquiring data of original modality (Para [0028], wherein in step 101, target retrieval data is inputted into a first feature extraction network corresponding to a modality of the target retrieval data to acquire a data feature of the target retrieval data); and

processing the data of the original modality by a target processing model to determine data of target modality corresponding to the data of the original modality (Para [0030] and [0031], wherein in step 103, retrieval is performed based on the target retrieval feature; the modality of the target retrieval data may be any modality, and to-be-retrieved data may also be any modality; the modality may include, for example, a text modality, an image modality, and a video modality, etc.);

wherein the target processing model comprises a multi-modal pre-trained sub-model and a multi-modal feature correction sub-model (Para [0043], FIG. 3 shows a multi-modal retrieval network model including the first feature extraction network and the second feature extraction network); and

a training process of the target processing model comprises training the multi-modal feature correction sub-model with parameters of the multi-modal pre-training sub-model fixed (Para [0042], wherein in step 203, a first loss value is determined based on a difference between the obtained retrieval features corresponding to the two or more pieces of first sample data having different modalities, and the first feature extraction networks and the second feature extraction networks corresponding to the modalities are adjusted, inherently as corrected, based on the first loss value).
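The training scheme at issue in claim 1, updating only a correction sub-model while the pre-trained sub-model's parameters stay fixed, is the familiar "frozen backbone" pattern. Below is a toy sketch of that pattern in plain Python; the model, loss, and numbers are invented for illustration and are not taken from Xia or the application:

```python
# Frozen pre-trained stage: a fixed transform whose parameter never updates.
PRETRAINED_W = 2.0  # frozen weight (hypothetical value)

def pretrained(x):
    return PRETRAINED_W * x

def model(x, corr_w):
    # Trainable correction stage applied on top of the frozen stage.
    return corr_w * pretrained(x)

# Toy data: we want model(x) == 3*x, so the ideal correction weight is 1.5.
data = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0)]

corr_w = 0.5  # initial correction weight (hypothetical)
lr = 0.01
for _ in range(500):
    # Gradient of squared error w.r.t. corr_w only; PRETRAINED_W is untouched.
    grad = sum(2 * (model(x, corr_w) - y) * pretrained(x) for x, y in data)
    corr_w -= lr * grad

print(round(corr_w, 3))  # 1.5 (correction learned)
print(PRETRAINED_W)      # 2.0 (pre-trained weight unchanged)
```

The point of the pattern is visible in the loop: the gradient step touches only `corr_w`, so the pre-trained parameter is identical before and after training.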
Regarding claim 9, Xia discloses wherein the data of original modality comprises any one of the following types of data: voice type, video type, text type, or image type (Para [0031], wherein the modality of the target retrieval data may be any modality, and to-be-retrieved data may also be any modality; the modality may include, for example, a text modality, an image modality, and a video modality, etc.).

Regarding claim 10, Xia discloses wherein the multi-modal pre-trained sub-model comprises a Transformer model (Para [0065], wherein in one possible implementation, the second feature extraction network is a Transformer model network).

Regarding claim 11, Xia discloses an electronic device, comprising: one or more processors; a storage device for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement acts (Para [0076], wherein as shown in FIG. 9, the electronic device 900 may include a processing unit (for example, a central processing unit and a graphics processing unit) 901 that may perform various appropriate actions and processing based on programs stored in a read-only memory (ROM) 902 or programs loaded from a storage unit 908 into a random access memory (RAM) 903) comprising: Please refer to the corresponding method claim 1 above for further teachings.

Regarding claims 17 and 19, please refer to the corresponding method claims 9 and 10 above for further teachings.

Regarding claim 20, Xia discloses a non-transitory storage medium comprising computer-executable instructions which, when executed by a computer processor, are configured to perform acts (Para [0078], wherein the process described above with reference to the flow diagrams may be implemented as a computer software program; for example, the embodiments of the present disclosure include a computer program product, and the computer program product includes a computer program carried on a non-transitory computer-readable medium) comprising: Please refer to the corresponding method claim 1 above for further teachings.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Xia in view of US 11,023,523 B2 to Hauptmann et al. (hereinafter 'Hauptmann').

Regarding claim 7, Xia does not specifically disclose wherein the target processing model is applied to at least one of the following tasks: a video-based text indexing task, a text-based video indexing task, a video-based text generation task, a text-based video generation task, or a video question answering task.
Hauptmann discloses at least a video-based text indexing task (column 1, line 63 through column 2, line 2, wherein the system includes an indexing engine for automatically indexing data representing the audio-visual recording, with the data being indexed in association with the one or more adjusted weights for the one or more semantic features, respectively . . . the semantic features comprise one or more of a visual feature, a textual feature, or an audio feature).

Xia and Hauptmann are combinable because they both disclose image feature extraction. Therefore, before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the video-based indexing of Hauptmann's method with Xia's in order to enable quick retrieval of an audio-visual recording 52 (column 6, lines 7-8).

Allowable Subject Matter

Claims 2-6, 8, 12-16 and 18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

The following is a statement of reasons for the indication of allowable subject matter. The prior art or the prior art of record, specifically Xia and CN 114398505 A to Huang, does not disclose: . . . . training the video feature correction branch and the text feature correction branch based on the data of the target modality and the label data corresponding to the sample data, of claims 2 and 12 combined with other features and elements of the claims. Claims 3, 4, 8, 13, 14 and 18 depend from an allowable base claim and are thus allowable themselves.

Xia and CN 115238130 A to Wang et al., does not disclose: . . . . wherein the multi-modal feature correction sub-model further comprises a cross-modal interaction branch; wherein an inter-modal shared parameter, which is acquired by the cross-modal interaction branch during a training process of the multi-modal feature correction sub-model, is used for aligning cross features for data of different modalities, of claims 5 and 15 combined with other features and elements of the claim. Claims 6 and 16 depend from an allowable base claim and are thus allowable themselves.

Contact Information

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHERVIN K NAKHJAVAN, whose telephone number is (571) 272-5731. The examiner can normally be reached Monday-Friday, 9:00-12:00 PST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Sue Lefkowitz, can be reached at (571) 272-3638. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SHERVIN K NAKHJAVAN/ Primary Examiner, Art Unit 2672

Prosecution Timeline

Jan 08, 2024
Application Filed
Dec 27, 2025
Non-Final Rejection — §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602766
METHOD, APPARATUS, DEVICE, MEDIUM AND PRODUCT FOR DETECTING ALIGNMENT OF BATTERY ELECTRODE PLATES
Granted Apr 14, 2026 (2y 5m to grant)
Patent 12597159
SYSTEM, INFORMATION PROCESSING APPARATUS, METHOD, AND COMPUTER-READABLE MEDIUM
Granted Apr 07, 2026 (2y 5m to grant)
Patent 12592313
ANALYZING SURGICAL VIDEOS TO IDENTIFY A BILLING CODING MISMATCH
Granted Mar 31, 2026 (2y 5m to grant)
Patent 12579671
MINIATURIZED PHASE CALIBRATION APPARATUS FOR TIME-OF-FLIGHT DEPTH CAMERA
Granted Mar 17, 2026 (2y 5m to grant)
Patent 12561791
METHOD TO CALIBRATE, PREDICT, AND CONTROL STOCHASTIC DEFECTS IN EUV LITHOGRAPHY
Granted Feb 24, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 88%
With Interview: 99% (+10.9%)
Median Time to Grant: 2y 7m
PTA Risk: Low
Based on 616 resolved cases by this examiner. Grant probability derived from career allow rate.
