Last updated: April 19, 2026

Application No. 17/502,385

METHOD FOR TRAINING CROSS-MODAL RETRIEVAL MODEL, ELECTRONIC DEVICE AND STORAGE MEDIUM

Final Rejection §101

Filed: Oct 15, 2021

Examiner: CHIUSANO, ANDREW TSUTOMU

Art Unit: 2144

Tech Center: 2100 (Computer Architecture & Software)

Assignee: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.

OA Round: 4 (Final)

This examiner grants 55% of resolved cases, rising to 83% with an interview (+28.0% interview lift). A telephonic interview to clarify the technical implementation could significantly improve the outcome.

Based on 392 resolved cases, 2023–2026

Examiner Intelligence

CHIUSANO, ANDREW TSUTOMU

Grants 55% of resolved cases

Career Allow Rate

217 granted / 392 resolved

At TC average

Strong +28.0% interview lift (resolved cases with an interview vs. without)

Typical timeline

3y 2m

Avg Prosecution

22 currently pending

Career history

414

Total Applications

across all art units

Statute-Specific Performance

§101: 12.7% (-27.3% vs TC avg)

§103: 57.4% (+17.4% vs TC avg)

§102: 10.7% (-29.3% vs TC avg)

§112: 13.6% (-26.4% vs TC avg)

Tech Center averages are estimates. Based on career data from 392 resolved cases.

Office Action

§101

DETAILED ACTION
This Office Action is sent in response to Applicant’s Communication received 12/11/2025 for application number 17/502,385. 
Claims 1, 6-9, 12, 17-20 are pending.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1, 6-9, 12, 17-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Independent claims 1, 12, and 20 recite (for representative claim 1):
A computer-implemented method for training a cross-modal retrieval model, wherein the cross-modal retrieval model is used by a cross-modal retrieval system, wherein a cross-modal retrieval is used for a retrieval of data of one modal using data of another modal, the method comprising:
determining a similarity of a cross-modal sample pair according to the cross-modal sample pair, the cross-modal sample pair comprising a sample of a first modal and a sample of a second modal, and the first modal being different from the second modal, wherein the first modal is a text, and the second modal is a video, and the cross-modal sample pair comprises a positive sample pair and a negative sample pair, the positive sample pair comprises an anchor sample and a positive sample, the negative sample pair comprises the anchor sample and a negative sample, the anchor sample has the first modal, and the positive sample and the negative sample have the second modal, and wherein the anchor sample is a text in a sample set, the positive sample is a video related to the text in the sample set, and the negative sample is a randomly selected video which is related or not related to the text in the sample set;
determining a soft margin based on the similarity, and determining a soft margin loss function based on the soft margin, wherein the soft margin is a non-fixed value; and
determining a total loss function based on the soft margin loss function, and training a cross-modal retrieval model according to the total loss function, wherein the cross-modal retrieval model is used by a cross-modal retrieval system configured to receive a text input by a user, determine a video matched with the text using the cross-modal retrieval model, and feed the matched video back to the user,
wherein the determining a soft margin based on the similarity comprises: calculating a distance between similarity of the positive sample pair and similarity of the negative sample pair to obtain a similarity distance; and normalizing the similarity distance to obtain a normalized similarity distance, and determining the normalized similarity distance as the soft margin, and
wherein the similarity distance comprises a similarity distance in the first modal and a similarity distance in the second modal, the similarity distance in the first modal being obtained by processing the sample pair in the first modal using a semantic representation model in the first modal and the similarity distance in the second modal being obtained by processing the sample pair in the second modal using a semantic representation model in the second modal;
the determining a soft margin based on the similarity and determining a soft margin loss function based on the soft margin comprises:
determining a soft margin in the first modal based on the similarity distance in the first modal, and calculating a contrastive loss function in the first modal based on the soft margin in the first modal;
determining a soft margin in the second modal based on the similarity distance in the second modal, and calculating a contrastive loss function in the second modal based on the soft margin in the second modal; and
calculating the soft margin loss function according to the contrastive loss function in the first modal and the contrastive loss function in the second modal.
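For orientation, the calculations recited in representative claim 1 can be sketched roughly as follows. This is an illustrative reading only, not the applicant's implementation: the claim does not name the distance, the normalization, or the form of the contrastive loss, so the subtraction, the sigmoid, and the hinge-style loss below are all assumptions, as are every function and parameter name. The per-modal similarities are taken as precomputed numbers standing in for the outputs of the semantic representation models.

```python
import math

def soft_margin(sim_pos: float, sim_neg: float) -> float:
    """Soft margin for one modal: normalized similarity distance.

    Assumption: the "distance" is a plain difference and the
    normalization is a sigmoid; the claim specifies neither.
    """
    similarity_distance = sim_pos - sim_neg
    return 1.0 / (1.0 + math.exp(-similarity_distance))

def contrastive_loss(margin: float, model_sim_pos: float, model_sim_neg: float) -> float:
    """Hinge-style loss (an assumed form) using a non-fixed margin."""
    return max(0.0, margin - model_sim_pos + model_sim_neg)

def soft_margin_loss(model_sim_pos, model_sim_neg, text_sims, video_sims):
    """Sum of the per-modal contrastive losses.

    text_sims / video_sims: (positive-pair similarity, negative-pair
    similarity) as scored in the text modal and the video modal.
    model_sim_pos / model_sim_neg: the retrieval model's own
    text-to-video similarities for the positive and negative pairs.
    """
    loss = 0.0
    for sim_pos, sim_neg in (text_sims, video_sims):
        margin = soft_margin(sim_pos, sim_neg)
        loss += contrastive_loss(margin, model_sim_pos, model_sim_neg)
    return loss

# Hypothetical numbers: the negative video is clearly unrelated in the
# text modal (large margin) but looks similar in the video modal.
example = soft_margin_loss(0.7, 0.4, text_sims=(0.9, 0.1), video_sims=(0.8, 0.8))
```

The point of the non-fixed margin in this reading is that a randomly selected negative that happens to be related to the anchor yields a small similarity distance and therefore a smaller margin, so it is pushed away less aggressively than an unrelated negative.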

(Step 2A, prong 1) The underlined portions of the claim recite an abstract idea, specifically a series of mathematical calculations. Here, although the claims do not use mathematical symbols, "[w]ords used in a claim operating on data to solve a problem can serve the same purpose as a formula." In re Grams, 888 F.2d 835, 837 and n.1, 12 USPQ2d 1824, 1826 and n.1 (Fed. Cir. 1989). Also see MPEP 2106.04(a)(2). The underlined steps above explicitly recite and require mathematical calculations.  
(Step 2A, prong 2) This judicial exception is not integrated into a practical application. The claims recite the additional elements of [a] generic computer components (a computer in claim 1, generic computer components in claim 12, computer-readable medium in claim 20), and [b] that cross-modal retrieval is used for retrieving data of one modal using data of another modal, and that the cross-modal retrieval model is used by a cross-modal retrieval system for cross-modal retrieval, the system configured to receive text from a user, and find a matching video. Additional elements [a] are mere instructions to apply the exception as they merely add generic computer components after the fact to the abstract idea. See MPEP 2106.05(f). Additional element [b] is a field-of-use limitation. Specifically, the phrasing of the additional limitation confines the mathematical calculations to an intended use in a cross-modal retrieval system that is capable of receiving text from a user and matching it to a video, but does not positively recite that the cross-modal retrieval is performed. In other words, this additional element only limits the field in which the mathematical calculations are intended to be used but does not actually require any cross-modal retrieval. This is analogous to the field of use limitation in Parker v. Flook, in which a mathematical calculation was limited to use in a process comprising the catalytic chemical conversion of hydrocarbons. See Parker v. Flook, 437 U.S. 584, 586, 198 USPQ 193, 196 (1978). See MPEP 2106.05(h). Even when the additional limitations are considered in the claim as a whole, the additional elements do not integrate the abstract idea into a practical application because they are mere instructions to apply the mathematical calculations on generic computer components and a field of use for the mathematical calculations.
(Step 2B) The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because additional element [a] is a mere instruction to apply the exception, as it merely adds generic computer components after the fact to the abstract idea (as explained above), and additional element [b] is a field of use limitation (as explained above). Even when both additional elements are considered in combination with the abstract idea and in the claim as a whole, the additional elements do not amount to significantly more than the judicial exception because they only add mere instructions to apply the exception and field of use limitations to the mathematical calculations. Taken together, the additional limitations specify that the mathematical calculations are performed on generic computing components and are intended for use in the field of cross-modal retrieval.
The Examiner further notes that when considering if a claim is patent eligible because it is an improvement to the functioning of a computer, the Examiner must consider (1) if the specification discloses a technical improvement, (2) if the claim reflects the improvement, and (3) if more than just the judicial exception alone provides the technical improvement. See MPEP 2106.05(a).
First, the specification does disclose a technical improvement of an improved cross-modal retrieval model, which allows a user to retrieve a video using text (the specification does not explicitly state the improvement, but a person having ordinary skill in the art would recognize the disclosed technical details could provide more accurate cross-modal retrieval). 
However, for (3), the judicial exception alone provides the technical improvement. The claim only requires a set of mathematical calculations that are used in part to train the model. Also, as discussed in MPEP 2106.05(a), the technical improvement must come from the additional elements, or the additional elements in combination with the judicial exception. However, the additional elements in the claims only recite generic computer components to run the calculations and field of use limitations for the calculations; adding a generic computer post-hoc to an abstract idea and limiting mathematical calculations to an intended use does not amount to significantly more than the abstract idea itself or provide a technical improvement in the functioning of a computer.
With respect to dependent claims 6-9 and 17-19, each of these claims adds additional mathematical calculations to the mathematical calculations in the independent claims. Therefore, they only add to the recited abstract idea from the independent claims.

Response to Arguments
Applicant's arguments filed 12/11/2025 have been fully considered but they are not persuasive. In particular, Applicant argues that the claim requires that the model is used by a cross-modal retrieval system and the model receives text input by a user to determine a video matched with the text, and therefore the invention improves the performance of cross-modal retrieval. The Examiner respectfully disagrees. Although the claimed mathematical calculations are an improvement over prior art mathematical calculations for cross-modal retrieval, mathematical calculations alone are not patentable subject matter. “…[T]he judicial exception alone cannot provide the improvement,” to the functioning of a computer or other technology. MPEP § 2106.05(a). The improvement can be provided by additional elements in combination with the recited judicial exception, but the additional elements in the claim beyond the mathematical calculations do not provide an inventive concept.
For additional elements, Applicant points to the limitations of (in the preamble), “a computer-implemented method for training a cross-modal retrieval model which is used by a cross-modal retrieval system, wherein a cross-modal retrieval is used for a retrieval of data of one modal using data of another modal,” and, “training a cross-modal retrieval model according to the total loss function, wherein the cross-modal retrieval model is configured to receive a text input by a user, determine a video matched with the text using the cross-modal retrieval model, and feed the matched video back to the user.” The preamble limitation just generally states what cross-modal retrieval is (and does not require any cross-modal retrieval, nor does it tie into any later limitation of the claim). The second limitation states that the trained model must be capable of cross-modal text to video retrieval. Importantly, however, this limitation does not require any cross-modal retrieval to be performed for a user: the claims only require a set of mathematical calculations to be performed on a computer. The calculated model must be “configured to” perform text-to-video cross-modal retrieval, but the claim does not ever positively state that improved cross-modal retrieval is performed. In other words, this additional limitation is limiting the technical field of calculations to a text-to-video model, but improved text-to-video retrieval using the trained model is not required by the claim. Therefore, these additional elements do not provide an improvement to the functioning of a computer sufficient to integrate the mathematical calculations into a practical application, nor do they amount to significantly more than the mathematical calculations themselves.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
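The reply windows described in the preceding paragraph reduce to simple date arithmetic, sketched here for illustration. The mailing date is an assumption: it is taken from the Mar 24, 2026 entry in the prosecution timeline, which may not be the official mailing date. The helper `add_months` is a hypothetical name, and the sketch ignores any weekend or holiday adjustment and the advisory-action exception, so it is not a substitute for docketing.

```python
import calendar
from datetime import date

def add_months(d: date, months: int) -> date:
    """Return the same day-of-month `months` later, clamped to month end."""
    index = d.month - 1 + months
    year = d.year + index // 12
    month = index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))

# Assumption: mailing date equals the timeline date of the current action.
mailing_date = date(2026, 3, 24)

two_month_mark = add_months(mailing_date, 2)        # first reply by here triggers the advisory-action rule
shortened_period_end = add_months(mailing_date, 3)  # THREE-MONTH shortened statutory period
statutory_limit = add_months(mailing_date, 6)       # SIX-MONTH outer limit

print(two_month_mark, shortened_period_end, statutory_limit)
```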
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Andrew T. Chiusano whose telephone number is (571)272-5231. The examiner can normally be reached M-F, 10am-6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tamara Kyle, can be reached at 571-272-4241. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ANDREW T CHIUSANO/
Primary Examiner, Art Unit 2144

Prosecution Timeline

Oct 15, 2021: Application Filed

Feb 22, 2025: Non-Final Rejection (§101)

Apr 15, 2025: Response Filed

Jul 26, 2025: Final Rejection (§101)

Aug 27, 2025: Response after Non-Final Action

Sep 24, 2025: Request for Continued Examination

Sep 26, 2025: Response after Non-Final Action

Sep 30, 2025: Non-Final Rejection (§101)

Dec 11, 2025: Response Filed

Mar 24, 2026: Final Rejection (§101, current)

Precedent Cases

Applications granted by this same examiner with similar technology

17/541,823, Patent 12596767: ACTIVE LEARNING DRIFT ANALYSIS AND TRAINING (2y 5m to grant, granted Apr 07, 2026)

17/488,261, Patent 12591771: DYNAMIC QUANTIZATION FOR ENERGY EFFICIENT DEEP LEARNING (2y 5m to grant, granted Mar 31, 2026)

18/147,340, Patent 12561045: CONTENT-BASED MENUS FOR TABBED USER INTERFACE (2y 5m to grant, granted Feb 24, 2026)

17/185,045, Patent 12547927: DETECTING ASSOCIATED EVENTS (2y 5m to grant, granted Feb 10, 2026)

17/153,282, Patent 12541686: METHOD AND APPARATUS WITH NEURAL ARCHITECTURE SEARCH BASED ON HARDWARE PERFORMANCE (2y 5m to grant, granted Feb 03, 2026)

Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

5-6

Expected OA Rounds

55%

Grant Probability

83%

With Interview (+28.0%)

3y 2m

Median Time to Grant

High

PTA Risk

Based on 392 resolved cases by this examiner. Grant probability derived from career allow rate.
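
The headline figures can be reproduced from the counts given on this page, as in the short calculation below. It assumes the "+28.0%" interview lift is additive percentage points on top of the career allow rate, which is how the 55% and 83% figures appear to relate; the page does not state the method, and the variable names are illustrative.

```python
granted = 217    # from "217 granted / 392 resolved"
resolved = 392
interview_lift_points = 28.0  # assumption: additive percentage points

career_allow_rate = 100.0 * granted / resolved
with_interview = career_allow_rate + interview_lift_points

# Rounded to whole percentages, as displayed on the page.
print(f"Career allow rate: {round(career_allow_rate)}%")
print(f"With interview:    {round(with_interview)}%")
```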