Last updated: May 04, 2026

Application No. 17/697,667

SYSTEMS AND PROCESSES FOR NATURAL LANGUAGE PROCESSING

Final Rejection §103

Filed

Mar 17, 2022

Examiner

CHAVEZ, RODRIGO A

Art Unit

2658

Tech Center

2600 — Communications

Assignee

Smarsh Inc.

OA Round

4 (Final)

This examiner grants 51% of resolved cases, with a +38.1% interview lift. A telephonic interview to clarify the technical implementation could significantly improve the outcome.

Based on 231 resolved cases, 2023–2026

Examiner Intelligence

CHAVEZ, RODRIGO A

Grants 51% of resolved cases

Career Allowance Rate

117 granted / 231 resolved

-11.4% vs TC avg

Strong +38% interview lift

+38.1%

Interview Lift

resolved cases with interview

Typical timeline

3y 3m

Avg Prosecution

20 currently pending

Career history

251

Total Applications

across all art units

Statute-Specific Performance

§101

16.2%

-23.8% vs TC avg

§103

53.6%

+13.6% vs TC avg

§102

20.7%

-19.3% vs TC avg

§112

5.6%

-34.4% vs TC avg

Tech Center averages are estimates. Based on career data from 231 resolved cases.

Office Action

§103

DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments, see Remarks, filed 07/23/2025, with respect to the rejection of claims 1-5 and 7-20 under 35 U.S.C. 101 have been fully considered and are persuasive.  The rejection of claims 1-5 and 7-20 under 35 U.S.C. 101 has been withdrawn. 

Applicant’s arguments with respect to claim(s) 1-5 and 7-20 have been considered but are moot because of the new ground of rejection under 35 U.S.C. 103 in view of Singh and Higbee for claims 1-5, 7-15, 17, 18 and 20; and Singh, Higbee and Kim for claims 16 and 19.
	
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 7-15, 17, 18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Singh (US PG Pub 20210397610) in view of Higbee (US PG Pub 20210021612).	
	As per claims 1, 9 and 15, Singh discloses:
A method performed by one or more computing devices (Singh; Fig. 1; p. 0038), system and non-transitory computer-readable medium comprising:
a memory (Singh; Fig. 1; p. 0038 - The server device 120 may comprise a computing device with one or more central processing units and/or graphical processing units and a memory resource); and
at least one computing device in communication with the memory (Singh; Fig. 1; p. 0038 - The server device 120 may comprise a computing device with one or more central processing units and/or graphical processing units and a memory resource), the at least one computing device being configured to:
receive a plurality of first communication data items (Singh; Fig. 4, item 405; p. 0053 - At block 405, training data is obtained. The training data comprises query data samples. The query data samples may comprise at least data derived from text data representing a query; see also p. 0060), wherein the plurality of first communication data items comprises a plurality of historical communications (Singh; p. 0090 – Other dataset filtering may include removing short responses with a common non-useful pattern, removing responses indicating errors or invalid queries, and removing custom domains such as dictionary or translation requests. In tests, a process of grouping query transcriptions based on clustered text responses is first performed (e.g., clustered vector representations of the response with identical or very close matches) followed by clustering of the vector representations of the queries. The latter clustering may be based on one or more of domain, written response cluster and query content; p. 0076);
generate a cluster based on the plurality of first communication data items (Singh; Fig. 4, item 415; p. 0054-0055 - vector representations of the query data samples are obtained… vector representations… are clustered; see also p. 0061);
intercept a plurality of second communication data items communicated between a first computing device and at least one second computing device (Singh; p. 0040 - The server device 120 in FIG. 1 is configured to receive query data 140 from one or more of the client devices 110. The query data 140 may comprise text data representing a query placed by a user, e.g. either by voice following speech-to-text processing or by text entry; see also p. 0039);
generate metadata for each of the plurality of second communication data items (Singh; p. 0040 - The query data 140 may also have associated metadata such as conversation state, user location information, and other user profile information);
generate at least one vector based on the plurality of second communication data items and the metadata generated for each of the plurality of the second communication data items (Singh; p. 0054-0055 - vector representations of the query data samples are obtained…);
determine a similarity score between the at least one vector and the cluster (Singh; p. 0076 - In one case, this may be combined with a cosine similarity score generated by comparing vector representations of the source and target data samples).

Singh, however, fails to specifically disclose wherein the plurality of historical communications are associated with a violation of a compliance rule, wherein the cluster is associated with the violation of the compliance rule, wherein the similarity score indicates the violation of the compliance rule; wherein the similarity score is based on at least one metadata criteria associated with the cluster; in response to the similarity score meeting a predefined threshold: determining that at least one of the plurality of second communication data items violates the compliance rule; generating additional metadata for the at least one of the plurality of second communication data items based on the cluster; and storing the additional metadata associated with the at least one of the plurality of second communication data items.

Higbee does teach wherein the plurality of historical communications are associated with a violation of a compliance rule, wherein the cluster is associated with the violation of the compliance rule, wherein the similarity score indicates the violation of the compliance rule (Higbee; p. 0158 - Messages can be clustered based on the application of rules to messages that have been reported as suspicious (potentially violating compliance rules)… Similarities can be identified based on application of YARA rules to messages… In some embodiments, the system can use a plagiarism detection system, n-gram analysis, or comparable system to identify similar phishing stories, flag corresponding messages as suspicious, and cluster messages so identified as embodying a similar phishing story);
wherein the similarity score is based on at least one metadata criteria associated with the cluster (Higbee; p. 0170 - Fuzzy hashing and string similarity algorithms can be used to dynamically cluster related phishing emails through the use of Phishing Similarity Indicators (PSI) (metadata));
in response to the similarity score meeting a predefined threshold: determine that at least one of the plurality of second communication data items violates the compliance rule (Higbee; p. 0160 - The cluster module may perform a cluster operation to group similar messages, as described above. For example, one such cluster operation may be based on the average distance of the incoming message to all messages in each cluster, wherein a message may be assigned to at least one cluster if the average distance is below some threshold);
generate additional metadata for the at least one of the plurality of second communication data items based on the cluster (Higbee; p. 0310 - The rules module can also develop rules based upon reported files and extracted information from the reported messages. This feature can work in combination with the interdiction module. As a message meets specific reporting thresholds, the rules module can be automatically implemented or an administrator can implement the rules upon review. This can include extraction of header information, content information or any other information that the management console module is capable of extracting. The extraction can be automatic upon meeting a specific threshold, such as number of people reporting the same message or reporting user reputation score above a threshold. The system can then aggregate the similar characteristics or pattern matching to develop rules (generate additional metadata). These can include if specific headers are identified, attachments, links, message content or any other element that malware and virus scanning programs detect);
and storing the additional metadata associated with the at least one of the plurality of second communication data items (Higbee; p. 0312-313 – rules are stored in the library and can be shared community rules).

Therefore, it would have been obvious to one of ordinary skill in the art to modify the method, system and non-transitory computer-readable medium of Singh to include wherein the plurality of historical communications are associated with a violation of a compliance rule, wherein the cluster is associated with the violation of the compliance rule, wherein the similarity score indicates the violation of the compliance rule; wherein the similarity score is based on at least one metadata criteria associated with the cluster; in response to the similarity score meeting a predefined threshold: determining that at least one of the plurality of second communication data items violates the compliance rule; generating additional metadata for the at least one of the plurality of second communication data items based on the cluster; and storing the additional metadata associated with the at least one of the plurality of second communication data items, as taught by Higbee, because when users identify a suspicious message, it would be advantageous to rapidly remove that message from all other user accounts that have received that same or a similar suspicious message, and to rapidly restore that message if the message is subsequently determined to be benign (Higbee; p. 0006).
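The threshold test cited from Higbee (p. 0160), assigning an incoming message to a cluster when its average distance to the cluster's members is below a threshold, can be illustrated with a short sketch. This is not code from Higbee, Singh, or the application. The function names, the toy two-dimensional vectors, the cluster labels, the choice of cosine distance, and the threshold value are all assumptions made for illustration, and the vectors are assumed to be non-zero.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def average_distance(vector, members):
    """Average distance from one vector to every member of a cluster."""
    return sum(cosine_distance(vector, m) for m in members) / len(members)


def assign_clusters(vector, clusters, threshold):
    """Return the names of all clusters whose average distance is below threshold."""
    return [
        name
        for name, members in clusters.items()
        if average_distance(vector, members) < threshold
    ]


# Hypothetical clusters built from previously reviewed messages.
clusters = {
    "flagged": [[1.0, 0.0], [0.9, 0.1]],
    "benign": [[0.0, 1.0], [0.1, 0.9]],
}

incoming = [0.95, 0.05]
matches = assign_clusters(incoming, clusters, threshold=0.1)

# Hypothetical "additional metadata" record kept alongside the message.
record = {"vector": incoming, "matched_clusters": matches}
```

In this toy data the incoming vector lies close to the "flagged" members and far from the "benign" ones, so only the first cluster is matched. A production system would likely operate on vectors with hundreds of dimensions, as Singh (p. 0055) notes.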

	As per claim 2, Singh in view of Higbee discloses:	The method of claim 1, wherein intercepting the plurality of second communication data items comprises intercepting communication data at a network appliance (Singh; p. 0039 - The network interface 122 is configured to receive data from the client devices 110 over the network 130).

	As per claim 3, Singh in view of Higbee discloses:	The method of claim 1, further comprising: retrieving at least one compliance rule associated with the cluster (Singh; p. 0092 - the mapping is conditional upon a weighted average of the model prediction score and the similarity score being above a threshold); and applying the at least one compliance rule to determine whether the similarity score meets the predefined threshold (Singh; p. 0092 - the mapping is conditional upon a weighted average of the model prediction score and the similarity score being above a threshold; see also p. 0095).
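The condition quoted from Singh (p. 0092), a weighted average of a model prediction score and a similarity score compared against a threshold, can be written out as a small sketch. The weight, the threshold, and the function names below are hypothetical; the cited passage as quoted here does not state specific values.

```python
def combined_score(prediction_score, similarity_score, weight=0.5):
    """Weighted average of two scores; `weight` applies to the prediction score."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * prediction_score + (1.0 - weight) * similarity_score


def passes_threshold(prediction_score, similarity_score, threshold, weight=0.5):
    """True when the weighted average is above the threshold."""
    return combined_score(prediction_score, similarity_score, weight) > threshold


# Hypothetical scores: equal weighting gives (0.8 + 0.6) / 2 = 0.7.
accepted = passes_threshold(0.8, 0.6, threshold=0.65)
rejected = passes_threshold(0.2, 0.6, threshold=0.65)
```

Whether such a thresholded score amounts to "retrieving" and "applying" a compliance rule, as claim 3 recites, is the legal question; the sketch only shows the arithmetic the citation describes.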

	As per claim 4, Singh in view of Higbee discloses:	The method of claim 1, wherein determining the similarity score comprises determining a distance between the at least one vector and the cluster (Singh; p. 0055 - Clustering may comprise evaluating a distance metric within vector space, such as a cosine similarity distance, and grouping neighboring points within vector space based on the distance metric; see also p. 0076).

	As per claim 5, Singh in view of Higbee discloses:	The method of claim 4, wherein the distance comprises a plurality of dimensions (Singh; p. 0055 - it should be noted that clustering is typically performed in vector spaces with hundreds of dimensions (e.g., corresponding to hundreds of vector elements)—this is typically difficult to visualize in a world of 2 or 3 dimensions).

	As per claim 7, Singh in view of Higbee discloses:	The method of claim 1, wherein generating the cluster comprises: generating a plurality of vectors individually associated with the plurality of first communication data items (Singh; p. 0061 - the text data pairs 512, 514 are converted into corresponding vector pairs 522, 524…); and defining a shape comprising the plurality of vectors (Singh; p. 0054-0055 - vector representations of the query data samples are obtained… vector representations… are clustered).

	As per claim 8, Singh in view of Higbee discloses:	The method of claim 1, wherein generating the cluster comprises: generating a plurality of vectors individually associated with the plurality of first communication data items (Singh; p. 0061 - the text data pairs 512, 514 are converted into corresponding vector pairs 522, 524…); computing a centroid of the plurality of vectors (Singh; p. 0065 - In a subsequent iteration of clustering, clusters 601 and 603 may be combined (e.g., based on a distance between centroids or averages of each cluster and a second predefined distance threshold)); and defining the cluster based on a predetermined distance from the centroid (Singh; p. 0065 - In a subsequent iteration of clustering, clusters 601 and 603 may be combined (e.g., based on a distance between centroids or averages of each cluster and a second predefined distance threshold)).

	As per claim 10, Singh in view of Higbee discloses:	The system of claim 9, wherein the at least one computing device is further configured to: generate a plurality of vectors individually corresponding to the plurality of first data items; and generate the cluster based on the plurality of vectors (Singh; Fig. 4, item 415; p. 0054-0055 - vector representations of the query data samples are obtained… vector representations… are clustered; see also p. 0061).
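Claim 8 recites computing a centroid and defining the cluster by a predetermined distance from it, while the cited Singh passage (p. 0065) describes combining clusters based on the distance between their centroids. The two operations are related but distinct, and a minimal sketch makes the difference visible. Euclidean distance, the function names, and the sample values are assumptions made for illustration; neither the claim nor the citation specifies them.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def in_cluster(vector, center, radius):
    """Claim-8 style test: the cluster is everything within `radius` of the centroid."""
    return euclidean(vector, center) <= radius


def should_merge(cluster_a, cluster_b, merge_threshold):
    """Singh p. 0065 style test: merge two clusters whose centroids are close."""
    return euclidean(centroid(cluster_a), centroid(cluster_b)) <= merge_threshold


center = centroid([[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]])  # component means: [1.0, 1.0]
inside = in_cluster([1.0, 2.0], center, radius=1.5)
outside = in_cluster([4.0, 4.0], center, radius=1.5)
merge = should_merge([[0.0, 0.0], [2.0, 0.0]], [[1.0, 1.0], [3.0, 1.0]], 1.5)
```

The first test fixes a boundary around one centroid; the second compares two centroids to decide whether to combine clusters.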

	As per claim 11, Singh in view of Higbee discloses:	The system of claim 9, wherein the at least one computing device is further configured to cause a user interface to be rendered on a display, the user interface comprising a cluster visualization of the cluster (Singh; Fig. p. 0037 - the user may enter text via an onscreen keyboard or another user input device; see also p. 0040 - The response data 142 may comprise text data and/or data derived from a response representation that is used to generate one or more of a text response, a voice response (e.g., via a text-to-speech system) and a visual response (e.g., via one or more display devices)).

	As per claim 12, Singh in view of Higbee discloses:	The system of claim 11, wherein the at least one computing device is further configured to: receive an input via the user interface to adjust a size of the cluster (Singh; p. 0088 - The following are some examples of hyperparameters that may commonly be selected and/or optimized as hyperparameters of the machine learning model: word embedding size for the input and output); determine an updated similarity score between the at least one vector and the adjusted cluster (Singh; p. 0076 - In one case, this may be combined with a cosine similarity score generated by comparing vector representations of the source and target data samples); and identify at least one different one of the plurality of second communication data items for review based at least in part on the updated similarity score (Singh; p. 0076 - filtering may be performed based on a confidence or prediction score that is output by the machine learning system as part of standard operation).

	As per claim 13, Singh in view of Higbee discloses:	The system of claim 9, wherein the plurality of first communication data items comprises a plurality of textual strings (Singh; p. 0040 - The query data 140 may comprise text data representing a query placed by a user, e.g. either by voice following speech-to-text processing or by text entry).

	As per claim 14, Singh in view of Higbee discloses:	The system of claim 9, wherein the plurality of second communication data items comprises data from at least one of: a text message, an email, an instant message, and a phone call sent from the first computing device to at least one second computing device (Singh; p. 0040 - The query data 140 may comprise text data representing a query placed by a user, e.g. either by voice following speech-to-text processing or by text entry).

	As per claim 17, Singh in view of Higbee discloses:	The non-transitory computer-readable medium of claim 15, wherein the compliance rule comprises at least one first rule when the first computing device is within a geofence when the plurality of second communication data items were communicated and at least one second rule differing from the at least one first rule when the first computing device is outside of the geofence when the plurality of second communication data items were communicated (Singh; p. 0040 - The query data 140 may also have associated metadata such as… user location information…).

	As per claim 18, Singh in view of Higbee discloses:	The non-transitory computer-readable medium of claim 15, wherein the program further causes the at least one computing device to: receive a plurality of third communication data items (Singh; p. 0039); tuning the cluster based on the plurality of third communication data items to generate an updated cluster (Singh; p. 0092 – grouping operation may be repeated multiple times); determine an updated similarity score between the plurality of vectors and the updated cluster (Singh; p. 0076 - In one case, this may be combined with a cosine similarity score generated by comparing vector representations of the source and target data samples); and identify at least one different ones of the plurality of second communication data items for review by applying the at least one compliance rule based on the updated similarity score (Singh; p. 0092 - the mapping is conditional upon a weighted average of the model prediction score and the similarity score being above a threshold).

	As per claim 20, Singh in view of Higbee discloses:	The non-transitory computer-readable medium of claim 15, wherein the program further causes the at least one computing device to identify a plurality of additional communication data items for review based on a similarity to the at least one of the plurality of second communication data items identified for review (Singh; p. 0076 - filtering may be performed based on a confidence or prediction score that is output by the machine learning system as part of standard operation… this may comprise a probability value that in other comparative uses indicates a level of confidence for each possible output sequence (e.g., the confidence in each of a set of translations when used for machine translation)).

	Claims 16 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Singh in view of Higbee and further in view of Kim (US PG Pub 20200387676).
	As per claim 16, Singh in view of Higbee discloses:	The non-transitory computer-readable medium of claim 15, upon which claim 16 depends.

Singh in view of Higbee, however, fails to disclose wherein the program further causes the at least one computing device to: determine a first language corresponding to a first one of the plurality of second data items; determine a second language corresponding to a second one of the plurality of second data items; generate a first vector corresponding to the first one of the plurality of second data items using a first algorithm corresponding to the first language; and generate a second vector corresponding to the second one of the plurality of second data items using a second algorithm corresponding to the second language, wherein the plurality of vectors comprise the first vector and the second vector.

Kim does teach wherein the program further causes the at least one computing device to: determine a first language corresponding to a first one of the plurality of second data items (Kim; p. 0035 – distinguishing first language of first speaker); determine a second language corresponding to a second one of the plurality of second data items (Kim; p. 0035 – distinguishing second language of second speaker); generate a first vector corresponding to the first one of the plurality of second data items using a first algorithm corresponding to the first language (Kim; p. 0069 – context vector); and generate a second vector corresponding to the second one of the plurality of second data items using a second algorithm corresponding to the second language, wherein the plurality of vectors comprise the first vector and the second vector (Kim; p. 0069 – context vector; see also p. 0072).

Therefore, it would have been obvious to one of ordinary skill in the art to modify the non-transitory computer-readable medium of Singh in view of Higbee to include wherein the program further causes the at least one computing device to: determine a first language corresponding to a first one of the plurality of second data items; determine a second language corresponding to a second one of the plurality of second data items; generate a first vector corresponding to the first one of the plurality of second data items using a first algorithm corresponding to the first language; and generate a second vector corresponding to the second one of the plurality of second data items using a second algorithm corresponding to the second language, wherein the plurality of vectors comprise the first vector and the second vector, as taught by Kim, because by repeatedly learning an optimal result among translation results of a sentence, a natural translation result may be obtained (Kim; p. 0006).

	As per claim 19, Singh in view of Higbee discloses:	The non-transitory computer-readable medium of claim 15, wherein the program further causes the at least one computing device to: capture an audio file (Singh; p. 0040 - The query data 140 may comprise text data representing a query placed by a user, e.g. either by voice following speech-to-text processing or by text entry); and analyze the audio file using a speech to text algorithm to generate a textual string, wherein the plurality of second data items comprises the textual string (Singh; p. 0040 - The query data 140 may comprise text data representing a query placed by a user, e.g. either by voice following speech-to-text processing or by text entry).

Singh in view of Higbee, however, fails to disclose wherein the audio file corresponds to a phone call between the first computing device and the at least one second computing device.

Kim does teach wherein the audio file corresponds to a phone call between the first computing device and the at least one second computing device (Kim; Fig. 2C, items 200-5 and 200-6; p. 0060).

Therefore, it would have been obvious to one of ordinary skill in the art to modify the non-transitory computer-readable medium of Singh in view of Higbee to include wherein the audio file corresponds to a phone call between the first computing device and the at least one second computing device, as taught by Kim, because by repeatedly learning an optimal result among translation results of a sentence, a natural translation result may be obtained (Kim; p. 0006).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The prior art made of record and not relied upon includes:
Yeturu (US Patent 10108695 B1), which discloses that digital content may be processed to determine a set of containers in the content. Each container may correspond to a particular text element of the digital content such as a line of text on a page of a digital content file. Container data indicating values of base content properties for each container may be obtained. Derived content properties may be determined from the base content properties and values of the derived content properties may be determined for each container. Multiple iterations of a clustering algorithm may be executed, where each iteration involves grouping the containers into a set of clusters by applying a particular distance function to the values of a particular set of base and/or derived properties for each container. The distance function and set of properties utilized at each iteration may be configurable to obtain clusters that can be associated with particular semantic classifiers (Yeturu; Abstract).
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
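The reply-period rules in the paragraph above reduce to date arithmetic, sketched below as a reading aid. This is a simplified illustration and not legal advice: the function names and the sample dates are hypothetical, and the sketch ignores weekends, federal holidays, and extension fees.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Add calendar months, clamping to the last day of a shorter month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def shortened_period_end(mailing, first_reply=None, advisory_mailed=None):
    """End of the shortened statutory period as described in the paragraph above.

    Default: three months from mailing. If a first reply was filed within two
    months and the advisory action was mailed after the three-month date, the
    period ends on the advisory mailing date. Never later than six months.
    """
    three_months = add_months(mailing, 3)
    six_months = add_months(mailing, 6)
    end = three_months
    if first_reply is not None and advisory_mailed is not None:
        if first_reply <= add_months(mailing, 2) and advisory_mailed > three_months:
            end = advisory_mailed
    return min(end, six_months)


# Hypothetical mailing date, chosen only for illustration.
mailing = date(2025, 1, 15)
default_end = shortened_period_end(mailing)
late_advisory_end = shortened_period_end(
    mailing, first_reply=date(2025, 3, 1), advisory_mailed=date(2025, 5, 2)
)
```

With these sample dates the default period ends three calendar months after mailing, and the late advisory action moves the end of the period to the advisory mailing date.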
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to Rodrigo A Chavez whose telephone number is (571)270-0139. The examiner can normally be reached Monday - Friday 9-6 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/RODRIGO A CHAVEZ/Examiner, Art Unit 2658


/RICHEMOND DORVIL/Supervisory Patent Examiner, Art Unit 2658

Prosecution Timeline

Jun 28, 2024

Response Filed

Jul 13, 2024

Final Rejection — §103

Jan 21, 2025

Request for Continued Examination

Jan 24, 2025

Response after Non-Final Action

Mar 22, 2025

Non-Final Rejection — §103

Jul 23, 2025

Response Filed

Oct 17, 2025

Final Rejection — §103

Apr 24, 2026

Response after Non-Final Action

Precedent Cases

Applications granted by this same examiner with similar technology

18/175,355

Patent 12597430

MULTI-CHANNEL SIGNAL GENERATOR, AUDIO ENCODER AND RELATED METHODS RELYING ON A MIXING NOISE SIGNAL

3y 1m to grant Granted Apr 07, 2026

17/579,750

Patent 12579984

DATA AUGMENTATION SYSTEM AND METHOD FOR MULTI-MICROPHONE SYSTEMS

4y 1m to grant Granted Mar 17, 2026

17/513,419

Patent 12541653

ENTERPRISE COGNITIVE SOLUTIONS LOCK-IN AVOIDANCE

4y 3m to grant Granted Feb 03, 2026

17/532,315

Patent 12542136

DYNAMICALLY CONFIGURING A WARM WORD BUTTON WITH ASSISTANT COMMANDS

4y 2m to grant Granted Feb 03, 2026

17/450,015

Patent 12531077

METHOD AND APPARATUS IN AUDIO PROCESSING

4y 3m to grant Granted Jan 20, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

5-6

Expected OA Rounds

51%

Grant Probability

89%

With Interview (+38.1%)

3y 3m (~0m remaining)

Median Time to Grant

High

PTA Risk

Based on 231 resolved cases by this examiner. Grant probability derived from career allowance rate.
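The projection figures on this page can be reproduced from the counts it reports. The sketch below assumes that the interview lift is added to the rounded allowance rate as percentage points. The page does not state this, although the assumption is consistent with the 51% and 89% figures shown above.

```python
# Counts reported on this page.
granted = 117
resolved = 231

# Interview lift reported on this page, read here as percentage points (assumption).
interview_lift_points = 38.1

# Career allowance rate as a whole-number percentage.
allowance_rate_pct = round(100 * granted / resolved)

# Projected grant probability with an interview, under the additive assumption.
with_interview_pct = round(allowance_rate_pct + interview_lift_points)
```

These are career-wide averages for the examiner, so they describe the examiner's history rather than the merits of this application.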