Last updated: May 29, 2026

Application No. 19/004,267

method of improving text vectorization using depth-first search and radix trees

Non-Final OA §101§103§112

Filed

Dec 28, 2024

Priority

Jan 08, 2024 — provisional 63/618,776

Examiner

LE, UYEN T

Art Unit

2156

Tech Center

2100 — Computer Architecture & Software

Assignee

Verses Technologies Usa Inc.

OA Round

1 (Non-Final)

Interview Optional

Interview lift (+9.7%) is below the 15.0% threshold. A written response is recommended.

Based on 797 resolved cases, 2023–2026

Examiner Intelligence

LE, UYEN T

Grants 84% — above average

Career Allowance Rate

669 granted / 797 resolved

+28.9% vs TC avg

Interview Lift: +9.7% (moderate), comparing resolved cases with and without an interview

Typical timeline

2y 8m

Avg Prosecution

18 currently pending

Career history

826

Total Applications

across all art units

Statute-Specific Performance

§101

4.3%

-35.7% vs TC avg

§103

52.9%

+12.9% vs TC avg

§102

7.8%

-32.2% vs TC avg

§112

9.9%

-30.1% vs TC avg

Based on career data from 797 resolved cases. Tech Center averages are estimates.

Office Action

§101 §103 §112

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-4 are pending.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-4 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Claim 1, lines 6-10, is unclear. What does applicant intend to mean by “as the longest string is not found and the next shorter string needs to be selected until the longest available string is located”? What is a string made of, and what makes up “the longest string”? Note also that “the longest string” and “the next shorter string” lack antecedent basis.
Furthermore, according to lines 4-5, each word, part of a word, sentence, or part of a sentence is associated with a vector; which vector, then, is the vector associated with the longest string?
The last three lines of claim 1 are unclear. It seems each word, part of a word, sentence, or part of a sentence is already associated with a vector per lines 4-5. Therefore, it is not clear how a “final vector” is created and what is “being vectorized”.
Art rejection is applied to claims 1-4 as best understood in light of the rejection under 35 U.S.C. 112(b) discussed above.
Claim Objections
Claims 2-4 are objected to because of the following informalities: “a method of claim 1” should be “the method of claim 1” to correctly refer back to parent claim 1.
Appropriate correction is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-4 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Claim 1 subject matter eligibility analysis: 
Step 1: Claim 1 recites a method and thus appears to be directed to a process, which is one of the four statutory categories of invention.
Step 2A Prong 1: The claim recites “searching the words, parts of words, sentences, or parts of sentences … until the longest available string is located …” These operations are processes that, under the broadest reasonable interpretation, cover performance of the limitations by a human mind or with the aid of pen and paper. That is, other than reciting a “computer program”, nothing in the claim element precludes the operations from practically being performed by a human mind with the aid of pen and paper. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind, then it falls within the “Mental Processes” grouping of abstract ideas (concepts performed in the human mind, including an observation, evaluation, judgment, and opinion). The mere nominal recitation of “wherein the storing, searching and creating is done using a computer program” does not take the claim limitation out of the mental processes grouping, since the computer program is mere instructions to apply the exception using a generic computer component. Thus, the claim recites a mental process.
Step 2A Prong 2: The judicial exception is not integrated into a practical application. Claim 1 recites the additional element “creating a final vector representing the words, parts of words, sentences or parts of sentences being vectorized…”. The creating step amounts to mere data manipulation considered insignificant extra-solution activity because it does not impose any meaningful limits on practicing the abstract idea (see MPEP 2106.05(g)). The recitation of creating a final vector does not integrate the mental process into a practical application, does not improve any technology or technical field, does not apply the judicial exception with or by use of a particular machine, does not add a specific limitation other than what is well-understood, routine, conventional activity in the field, does not add unconventional steps that confine the claim to a particular useful application, and does not include other meaningful limitations beyond linking the use of the judicial exception to a particular technological environment.
Step 2B: Claim 1 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional element “storing data about words, parts of words, sentences, or parts of sentences in a database…” is recognized by the courts as a well-understood, routine, and conventional activity when claimed in a merely generic manner (see MPEP 2106.05(d)(II)(iv)).
Thus claim 1 is rejected under 35 USC 101 as being an abstract idea without significantly more.
Claim 2 merely further describes the database, which is considered insignificant extra-solution activity (MPEP 2106.05(g)).
Claim 3 merely recites searching for the section containing the longest string, which is considered insignificant extra-solution activity (MPEP 2106.05(g)).
Claim 4 merely further describes searching from left to right and combining found vectors in order, which is considered insignificant extra-solution activity (MPEP 2106.05(g)).
As discussed above, although the dependent claims are more detailed than their parent claim, none amounts to significantly more than the abstract idea. No claim is eligible.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4 are rejected under 35 U.S.C. 103 as being unpatentable over Morris et al. (US 20090171953 A1) in view of Lai (US 10380210 B1).
Regarding claim 1, Morris substantially teaches a method of achieving improvement in the vectorization of text data, comprising, 
storing data about words, parts of words, sentences, or parts of sentences in a database, wherein the database defines a discrete architecture representing the data about words, parts of words, sentences, or parts of sentences as a graph and associating each word, part of a word, sentence, or part of a sentence with a vector (see Morris [0006]: In various embodiments, techniques for recognizing patterns within a string are provided. More specifically, and in an embodiment, a method is presented for recognizing multiple characters within a string. Initially, characters from words are organized into a hierarchy. The words are housed in a dictionary and each first character of a particular word appears as a node within the hierarchy. Moreover, each leaf node of the hierarchy represents a particular one of the words. Next, a target string is subsequently received. Each target character of the string is iterated and an attempt is made to assemble a substring of the target characters, which match to a particular leaf node of the hierarchy within the dictionary. Finally, multiple target words from the target string are identified and in response matched substrings found.), and
searching the words, parts of words, sentences, or parts of sentences in the database by running iteratively from top to bottom through the database and moving back up the database as the longest string is not found and the next shorter string needs to be selected, until the longest available string is located (see Morris [0024]: At 130, the word recognition service iterates each target character of the target string and attempts to assemble substrings of the target characters that match to a particular leaf node of the hierarchy or a intermediate node that is a word. An intermediate node may be functionally equivalent to a leaf node by including an annotation indicating that although it is not technically a leaf node, it is a recognized word of the dictionary and should therefore be considered to be a logical leaf node.), and
wherein the vector associated with the longest string is used as the vector for the word, parts of words, sentences, or parts of sentences being searched (see the following paragraphs of Morris:
[0026] For example, at 131, the word recognition service considers each target character that is presenting being processed as a potential start to a new word within the dictionary. Each target character is persisted (its location is retained during processing so it is retrievable) as long as it continues to match a potential word within the dictionary. The target character is released once it is determined that it does not match or does not form part of any word within the dictionary.
[0029] At 133, the first pointer is advanced to a next character in the target string just past the current target character when a match within the dictionary is not found with the current target character.
[0030] At 134, the word recognition service advances the first pointer to a location within the target string beyond a length of the substring when the substring is a match to a particular word in the dictionary unless the location is beyond a total length of the target string, which indicates that the iteration has ceased and the end of the target string is reached.).
The difference is that Morris does not specifically show:
creating a final vector representing the words, parts of words, sentences, or parts of sentences being vectorized. 
However, it is customary in the art to represent words by vectors, as shown by Lai (see Lai col. 10, lines 42-49: The algorithm may therefore map the token, based on its context to a vector representing the word. This vector may be set so that one or more dimensions may be represented and placed into condensed vector space. In some embodiments, this vector may be set to 100 dimensions. These dimensions may represent the vector to calculate the Euclidean distance between any two words, which may further be used to measure the similarity between any two words). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include creating a final vector as claimed while implementing the method of Morris, for the benefit of measuring similarity between words by their vector representations as shown by Lai;
wherein the storing, searching, and creating is done using a computer program (see Morris [0023] In still another situation, any program or service may receive instructions or even command parameters from a mistyped string supplied by a user. Here, the program or service submits the string to the word recognition service for purposes of having the proper commands or instructions parsed from the string before assuming the commands or instructions are in error. It is apparent that there are a variety of useful situations in which a string may be submit by an entity (user, automated program, automated service, etc.) such that the word recognition service receives that target string for processing at 120.).
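The claim 1 elements mapped above (a hierarchy of stored strings, a vector attached to each stored string, and a search for the longest available string) can be illustrated with a minimal sketch. It is not the applicant's implementation and not code from Morris or Lai. The names `Node`, `insert`, and `longest_match`, and the two-dimensional vectors, are hypothetical. It uses a plain character trie rather than a compressed radix tree, and it models "moving back up" by remembering the deepest node that carried a vector instead of literally backtracking.

```python
from typing import Dict, List, Optional, Tuple


class Node:
    """One node of the hierarchy; `vector` is set only where a stored string ends."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.vector: Optional[List[float]] = None


def insert(root: Node, text: str, vector: List[float]) -> None:
    """Store `text` character by character and attach `vector` at its last node."""
    node = root
    for ch in text:
        node = node.children.setdefault(ch, Node())
    node.vector = vector


def longest_match(root: Node, text: str, start: int = 0) -> Tuple[str, Optional[List[float]]]:
    """Walk down from the root and return the longest stored string found, with its vector."""
    node = root
    best_end, best_vector = start, None
    for i in range(start, len(text)):
        node = node.children.get(text[i])
        if node is None:
            break  # no deeper path: fall back to the best match seen so far
        if node.vector is not None:
            best_end, best_vector = i + 1, node.vector
    return text[start:best_end], best_vector


root = Node()
insert(root, "base", [1.0, 0.0])      # illustrative vectors, not real embeddings
insert(root, "baseball", [0.0, 1.0])

print(longest_match(root, "baseballs"))  # ('baseball', [0.0, 1.0])
print(longest_match(root, "basement"))   # ('base', [1.0, 0.0])
print(longest_match(root, "bat"))        # ('', None)
```

In this sketch the lookup for "basement" descends past "base", fails at "m", and falls back to the vector stored for "base", which is one way to read "the next shorter string needs to be selected".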

Regarding claim 2, Morris/Lai further teaches a method of claim 1 wherein the graph database is structured in sections that either contain single nodes or groups of individual nodes representing string elements  and  wherein each section possesses a leaf containing a vector numbers associated with the section (see Morris [0024] At 130, the word recognition service iterates each target character of the target string and attempts to assemble substrings of the target characters that match to a particular leaf node of the hierarchy or a intermediate node that is a word. An intermediate node may be functionally equivalent to a leaf node by including an annotation indicating that although it is not technically a leaf node, it is a recognized word of the dictionary and should therefore be considered to be a logical leaf node.), and wherein the sections are organized in a top-down relationship, from the section containing the smallest set of nodes, to the section containing the largest set of nodes (Morris [0054] In an embodiment, the hierarchically organized dictionary of words 301 is organized as a tree data structure where leaf nodes of the tree data structure represents words defined within the dictionary. Intermediate nodes within the tree data structure may also include annotations to make them appear as logical leaf nodes when the dictionary is being traversed. Examples and details of this particular approach was discussed above in detail with reference to the method 100 of the FIG. 1.).

Regarding claim 3, Morris/Lai further teaches a method of claim 1 wherein the vectorization involves searching through the graph database for the section containing the longest string that matches the words, parts of words, sentences, or parts of sentences being vectorized (Morris [0036] One now appreciates how a dictionary can be hierarchically organized. A string is received and parsed and each target character processed by traversing the dictionary for matches. Examples of this were provided above as well as some example pseudo code. This approach permits rapid detection of multiple words included in a single string. Other various enhancements, alterations, and perspectives of this approach are now discussed in greater detail with reference to the FIGS. 2-4., Lai col.10 lines 57-62: Using the vector set within the neural network, additional examples of interchangeable words may be identified from context. For example, server 110 may identify interchangeable words 205 such as king and queen (king <-> queen), bike and bicycle (bike <-> bicycle), and house and home (house <-> home).).
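The Lai passage quoted in the rejection of claim 1 describes using Euclidean distance between word vectors to measure similarity, which is what lets interchangeable words such as bike and bicycle be identified. A minimal sketch of that comparison follows. The three-dimensional vectors are invented for illustration (Lai mentions 100 dimensions in some embodiments), and `euclidean_distance` is a hypothetical helper name.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Made-up vectors: "bike" and "bicycle" are placed close together on purpose.
bike = [0.9, 0.1, 0.3]
bicycle = [0.8, 0.2, 0.3]
house = [0.1, 0.9, 0.7]

# A smaller distance is read as greater similarity.
print(euclidean_distance(bike, bicycle) < euclidean_distance(bike, house))  # True
```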

Regarding claim 4, Morris/Lai further teaches a method of claim 1 wherein the text to be vectorized is searched from left to right, and the final vector includes combining each iteratively found vector in the order in which it was found in the database (Morris [0032] This is done by parsing the string "AABASEBALLHAT" and concurrently traversing a hierarchically organized dictionary. So, to start a first pointer is placed on the first A of the string; obviously a match is found in the dictionary on A as there are many potential words that start with A. The next A is acquired and it too is retained, since at least one word "aardvark" includes two successive "a's." The next character B is not found while traversing the dictionary hierarchy, since there are no words that start or end with "AAB." In this case, the first pointer is advanced to the second A. This results in no match again and the first pointer is advanced to B. Here, the first pointer stays on B until the H is reached and then it is advanced from B for a length of 8 characters (length of the word BASEBALL) and the first pointer now points at H. This continues until HAT is found. The process results in two words found in the string BASEBALL and HAT. This was done without taking every conceivable permutation substrings.).
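The "AABASEBALLHAT" walk-through quoted from Morris, together with the claim 4 language about combining found vectors in order, can be sketched as a left-to-right scan. This is an illustration under stated assumptions, not Morris's pseudo code: the dictionary contents and vector values are invented, a flat Python dict stands in for the hierarchically organized dictionary, and "combining" is assumed to mean concatenation, which the office action does not specify.

```python
from typing import Dict, List, Tuple

# Hypothetical dictionary mapping each known word to a made-up vector.
VECTORS: Dict[str, List[float]] = {
    "AARDVARK": [0.1, 0.2],
    "BASE": [0.3, 0.4],
    "BASEBALL": [0.5, 0.6],
    "HAT": [0.7, 0.8],
}
MAX_LEN = max(len(word) for word in VECTORS)


def segment(text: str) -> Tuple[List[str], List[float]]:
    """Scan left to right, taking the longest known word at each position."""
    words: List[str] = []
    final_vector: List[float] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then each next shorter one.
        for length in range(min(MAX_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in VECTORS:
                words.append(candidate)
                final_vector.extend(VECTORS[candidate])  # keep discovery order
                i += length  # jump past the matched word
                break
        else:
            i += 1  # no word starts here: advance the pointer by one character
    return words, final_vector


words, final_vector = segment("AABASEBALLHAT")
print(words)         # ['BASEBALL', 'HAT']
print(final_vector)  # [0.5, 0.6, 0.7, 0.8]
```

As in the quoted passage, the two leading A characters are skipped because no stored word matches there, and BASEBALL is preferred over the shorter BASE.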
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Zhong et al (US 20210286989 A1) teach a computer-implemented method that can include an architecture of machine learning sub-models that performs the global task of translating unstructured and semi-structured inputs into numerical representations that can be recognized and manipulated by a content-analysis (CA) sub-model without relying on brute force analysis. Embodiments of the invention achieve these results by separating the global task into auxiliary tasks and assigning each sub-model to at least one of the auxiliary tasks. The auxiliary tasks can include parsing the unstructured or semi-structured inputs into format types (e.g., lists, tables, figures, text, etc. of a PDF document), extracting features of the parsed document, and performing a computer-based CA on the extracted features. The sub-models are trained in stages and in groups, wherein both the stages and the groupings are based on the complexity of the sub-model's assigned task.
Li et al (CN 113010670 A) teach an account number information clustering method, a detection method, a device, and a storage medium. The account name information is processed by word segmentation to obtain a plurality of target words; vectorization is then performed on the target words to obtain a plurality of word vectors, and the word vectors are weighted and summed according to the word type of the target words to obtain the account name vector. Incremental clustering is then performed on the account name vector according to the central vector of the account name vector and the historical account vector set to obtain the clustering result set. Because the application first performs word segmentation and vectorization and then weights and sums according to the word type, with the part-of-speech type determined, the difference between the corresponding word vectors can be highlighted even if the account name information includes many random words, so that the account name information is identified and clustered more accurately. Therefore, the invention can be widely applied to natural language processing technology.
Zhang et al (WO 2021068339 A1) teach an application related to artificial intelligence technology, and a text classification method, comprising: preprocessing original text data to obtain a text vector; performing label matching on the text vector to obtain a text vector with a label and a text vector without a label; inputting the text vector with the label into a BERT model to obtain a word vector feature; according to the word vector feature, training the text vector without the label by using a convolutional neural network model to obtain a text vector with a virtual label; and performing multi-label classification on the text vector with the label and the text vector with the virtual label by using a random forest model to obtain a text classification result. The present application further provides a text classification device and a computer readable storage medium. According to the present application, an accurate and efficient text classification function can be realized.
Cho (US 6240213 B1) teaches in a data compression system for compressing an input stream of characters into a compressed stream of codewords by employing a dictionary, wherein the dictionary stores a plurality of entries of characters, each entry being identified by a unique codeword, the input stream of characters is parsed into parsed strings and the unique codeword identifying each parsed string is transmitted. In the meantime, the dictionary is updated with N new entries of characters, wherein all of the N new entries include an unmatched character which is appended to the parsed string so that a new codeword is assigned to each of the N new entries, N being an integer equal to or larger than 0.

Tan, Bin, and Fuchun Peng. "Unsupervised query segmentation using generative language models and wikipedia." Proceedings of the 17th international conference on World Wide Web. 2008.
ABSTRACT In this paper, we propose a novel unsupervised approach to query segmentation, an important task in Web search. We use a generative query model to recover a query’s underlying concepts that compose its original segmented form. The model’s parameters are estimated using an expectation-maximization (EM) algorithm, optimizing the minimum description length objective function on a partial corpus that is specific to the query. To augment this unsupervised learning, we incorporate evidence from Wikipedia. Experiments show that our approach dramatically improves performance over the traditional approach that is based on mutual information, and produces comparable results with a supervised method. In particular, the basic generative language model contributes a 7.4% improvement over the mutual information based method (measured by segment F1 on the Intersection test set). EM optimization further improves the performance by 14.3%. Additional knowledge from Wikipedia provides another improvement of 24.3%, adding up to a total of 46% improvement (from 0.530 to 0.774). 

Lorenzo, Mario J. Classifying Relations using Recurrent Neural Network with Ontological-Concept Embedding. Diss. Nova Southeastern University, 2020.
Abstract: Relation extraction and classification represents a fundamental and challenging aspect of Natural Language Processing (NLP) research which depends on other tasks such as entity detection and word sense disambiguation. Traditional relation extraction methods based on pattern-matching using regular expressions grammars and lexico-syntactic pattern rules suffer from several drawbacks including the labor involved in handcrafting and maintaining large number of rules that are difficult to reuse. Current research has focused on using Neural Networks to help improve the accuracy of relation extraction tasks using a specific type of Recurrent Neural Network (RNN). A promising approach for relation classification uses an RNN that incorporates an ontology-based concept embedding layer in addition to word embeddings. This dissertation presents several improvements to this approach by addressing its main limitations. First, several different types of semantic relationships between concepts are incorporated into the model; prior work has only considered is-a hierarchical relationships. Secondly, a significantly larger vocabulary of concepts is used. Thirdly, an improved method for concept matching was devised. The results of adding these improvements to two state-of-the-art baseline models demonstrated an improvement to accuracy when evaluated on benchmark data used in prior studies. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to UYEN T LE whose telephone number is (571)272-4021. The examiner can normally be reached M-F 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ajay M Bhatia can be reached at 5712723906. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/UYEN T LE/
Primary Examiner, Art Unit 2156
13 December 2025


Prosecution Timeline

Dec 28, 2024

Application Filed

Dec 23, 2025

Non-Final Rejection mailed — §101, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

18/078,453

Patent 12639588

AUTOMONOUS DIGITAL TWIN GENERATION USING EDGE-NODES

3y 5m to grant Granted May 26, 2026

18/143,789

Patent 12639271

DATA STORAGE METHOD AND DEVICE FOR DATA STORAGE

3y 0m to grant Granted May 26, 2026

19/074,467

Patent 12639321

ELECTRONIC DEVICE AND APPLICATION SEARCH METHOD THEREOF

1y 2m to grant Granted May 26, 2026

18/220,493

Patent 12608415

METHODS AND SYSTEMS FOR PERSONALIZED SCREEN CONTENT OPTIMIZATION

2y 9m to grant Granted Apr 21, 2026

18/337,891

Patent 12608350

AI-POWERED CONCEPT-DRIVEN VISUALIZATION AUTHORING

2y 10m to grant Granted Apr 21, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

1-2

Expected OA Rounds

84%

Grant Probability

94%

With Interview (+9.7%)

2y 8m (~1y 3m remaining)

Median Time to Grant

Low

PTA Risk

Based on 797 resolved cases by this examiner. Grant probability derived from career allowance rate.
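The projection figures can be cross-checked against the counts reported on this page. The sketch below assumes that the career allowance rate is granted cases divided by resolved cases and that the "With Interview" figure is that rate plus the interview lift in percentage points. The page does not state either formula, so both are assumptions.

```python
granted, resolved = 669, 797   # counts reported under Career Allowance Rate
interview_lift_points = 9.7    # assumption: the lift is additive percentage points

allowance_rate = 100 * granted / resolved
with_interview = allowance_rate + interview_lift_points

print(f"{allowance_rate:.1f}%")   # 83.9%
print(f"{with_interview:.1f}%")   # 93.6%
print(round(allowance_rate), round(with_interview))  # 84 94
```

Under these assumptions the rounded values agree with the 84% and 94% shown above.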