Prosecution Insights
Last updated: April 19, 2026

Application No. 18/601,601
Innovative Recombination Recommendation System for Corporate Knowledge Base and Method Thereof

Office Action: Non-Final (§101, §102, §103, §112)
Filed: Mar 11, 2024
Examiner: SITTNER, MATTHEW T
Art Unit: 3629
Tech Center: 3600 — Transportation & Electronic Commerce
Assignee: Inventec Corporation
OA Round: 1 (Non-Final)

Grant Probability: 58% (Moderate)
Estimated OA Rounds: 1-2
Estimated Time to Grant: 3y 1m
Grant Probability with Interview: 99%
Examiner Intelligence

Career Allow Rate: 58% (grants 58% of resolved cases; 512 granted / 890 resolved; +5.5% vs Tech Center average)
Interview Lift: strong, +56.2% (allowance rate of resolved cases with an interview vs. without)
Typical Timeline: 3y 1m average prosecution; 32 applications currently pending
Career History: 922 total applications across all art units
Statute-Specific Performance

Statute   Rate     vs TC avg
§101      33.2%    -6.8%
§103      33.0%    -7.0%
§102      13.1%    -26.9%
§112      16.0%    -24.0%

Tech Center averages are estimates. Based on career data from 890 resolved cases.

Office Action

Rejections at issue: §101, §102, §103, §112
DETAILED ACTION

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on XXXXXXXXXXXXXX has been entered.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims

Claims X are canceled. Claims X are new. Claims 1-10 are pending and have been examined. This action is in reply to the papers filed on 03/11/2024 (effective filing date 12/06/2023).

Information Disclosure Statement

No Information Disclosure Statement has been filed. The information disclosure statement(s) submitted on xxxxxxxx has/have been considered by the Examiner and made of record in the application file.

Amendment

The present Office Action is based upon the original patent application filed on xxx, as modified by the amendment filed on xxx.

Reasons For Allowance

Prior-Art Rejection Withdrawn. Claims xxx are allowed. The closest prior art (see PTO-892, Notice of References Cited) does not teach the claimed invention. The closest prior art (xxx) teaches the features as discussed in the Non-final Rejection (xxxx); however, the cited references do not teach at least the following combination of features and/or elements: determining, at a second time after associating the information corresponding to the first loyalty card with the logged location, that a second user computing device is located within a specified distance of the logged location using a second positioning system of the second user computing device; in response to determining that the second user computing device is located within the specified distance of the logged location of the first user computing device at the first time of detecting: retrieving information corresponding to a second loyalty card, the second loyalty card being associated with the merchant and the second user computing device; and displaying, by the second user computing device, data describing the second loyalty card.

Claim Rejections - 35 USC § 101 - Withdrawn

Per Applicant's amendments and arguments, and considering new guidance in the MPEP, the rejections are withdrawn. Specifically, in Applicant's Remarks (dated 03/14/2017, pgs. 8-11), Applicant traverses the 35 USC § 101 rejections, arguing that the amended claims recite new limitations that are not abstract, amount to significantly more, are directed to a practical application, etc. For example, Applicant argues.... In support of these arguments, Applicant cites the following cases (i.e., Alice Corp. v. CLS Bank Int'l, SRI Int'l, Inc. v. Cisco Systems, Inc., Ultramercial, Inc. v. Hulu, LLC, Berkheimer, Core Wireless, McRO, Enfish, Bascom, DDR, etc.).

Claim Rejections - 35 USC § 101

35 U.S.C. § 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-10 are rejected under 35 U.S.C. § 101 as being directed to non-statutory subject matter because the claimed invention is directed to an abstract idea without significantly more.
The claims recite a method and system for implementing an innovative recombination recommendation system for a corporate knowledge base. Claim 6 recites:

An innovative recombination recommendation method for a corporate knowledge base, comprising:
linking the corporate knowledge base to a server-end device through network, wherein the corporate knowledge base stores pieces of patent raw data, wherein each of the pieces of patent raw data corresponds to at least one mathematical vector, wherein the server-end device comprises a non-transitory computer-readable storage medium storing computer readable instructions, and a hardware processor executing the computer readable instructions to make the server-end device execute:
receiving an innovation summary, splitting the innovation summary into phrases, and vectorizing each of the phrases to generate a phrase vector, by the server-end device;
transmitting the phrase vectors to the corporate knowledge base, comparing the phrase vectors with one of the mathematical vectors one by one, and calculating a first vector distance between the phrase vectors and one of the mathematical vectors, by the server-end device;
selecting a mathematical vector from the mathematical vectors with the first vector distance exceeding a threshold value, and the patent raw data corresponding to the selected mathematical vector, by the server-end device; and
recombining the phrases, vectorizing the recombined phrases to generate a combination vector, calculating a second vector distance between the combination vector and the mathematical vector of the selected patent raw data, and when the calculated second vector distance does not exceed the threshold value, generating an evaluation pass message based on the recombined phrase, by the server-end device.
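For orientation, the data flow recited in claim 6 (split a summary into phrases, vectorize, compare against stored patent vectors, select by a first distance threshold, then recombine and re-test against a second distance) can be sketched in a few lines of Python. This is a hypothetical illustration only, not the applicant's disclosed implementation: the sentence-transformers embedding model, the comma-based phrase splitting, the cosine distance metric, the 0.5 threshold, and the pairwise recombination strategy are all assumptions.

```python
# Hypothetical sketch of the claim 6 pipeline. Model, splitting rule, metric,
# threshold, and recombination strategy are illustrative assumptions.
from itertools import permutations

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def recommend(summary: str, kb_vectors: dict[str, np.ndarray], threshold: float = 0.5):
    # Split the innovation summary into phrases and vectorize each phrase.
    phrases = [p.strip() for p in summary.split(",") if p.strip()]
    phrase_vecs = model.encode(phrases)

    # Compare the phrase vectors with each stored patent vector one by one;
    # select patent raw data whose first vector distance exceeds the threshold.
    # (Averaging over phrases is an assumption; the claim leaves this open.)
    selected = {
        doc_id: doc_vec
        for doc_id, doc_vec in kb_vectors.items()
        if np.mean([cosine_distance(v, doc_vec) for v in phrase_vecs]) > threshold
    }

    # Recombine the phrases, vectorize each recombination, and emit an
    # "evaluation pass" when the second vector distance stays within the threshold.
    passes = []
    for combo in permutations(phrases, 2):
        combo_vec = model.encode(" ".join(combo))
        for doc_id, doc_vec in selected.items():
            if cosine_distance(combo_vec, doc_vec) <= threshold:
                passes.append({"phrases": combo, "patent": doc_id, "result": "pass"})
    return passes
```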
The claims are rejected according to the 2019 Revised Patent Subject Matter Eligibility Guidance (Federal Register, Vol. 84, No. 5, pp. 50-57 (Jan. 7, 2019)).

Step 1: Does the Claim Fall within a Statutory Category?

Yes. Claims 6-10 recite a method and, therefore, are directed to the statutory class of a process. Claims 1-5 recite a system/apparatus and, therefore, are directed to the statutory class of a machine.

Step 2A, Prong One: Is a Judicial Exception Recited?

Yes. The following identifies, under the broadest reasonable interpretation, the specific limitations of claim 6 that recite an abstract idea, along with the additional elements relevant to the analysis in Step 2A, Prong Two, and Step 2B.

Preamble ("An innovative recombination recommendation method for a corporate knowledge base, comprising:"): No additional elements are positively claimed.

Limitation: "linking the corporate knowledge base to a server-end device through network, wherein the corporate knowledge base stores pieces of patent raw data, wherein each of the pieces of patent raw data corresponds to at least one mathematical vector, wherein the server-end device comprises a non-transitory computer-readable storage medium storing computer readable instructions, and a hardware processor executing the computer readable instructions to make the server-end device execute." But for the server-end device and/or hardware processor, this limitation is directed to processing and/or communicating known information to facilitate implementing an innovative recombination recommendation system for a corporate knowledge base, which may be categorized as any of the following: a mathematical concept (mathematical relationships, mathematical formulas or equations, mathematical calculations) and/or a certain method of organizing human activity, namely fundamental economic principles or practices (including hedging, insurance, mitigating risk) and/or commercial or legal interactions (including agreements in the form of contracts; legal obligations; advertising, marketing or sales activities or behaviors; business relations). Additional elements: the network, the server-end device with its non-transitory computer-readable storage medium, and the hardware processor.

Limitation: "receiving an innovation summary, splitting the innovation summary into phrases, and vectorizing each of the phrases to generate a phrase vector, by the server-end device." But for the server-end device and/or hardware processor, this limitation is directed to the same abstract idea and falls under the same categories identified above. Additional element: the server-end device.

Limitation: "transmitting the phrase vectors to the corporate knowledge base, comparing the phrase vectors with one of the mathematical vectors one by one, and calculating a first vector distance between the phrase vectors and one of the mathematical vectors, by the server-end device." Same analysis and categories as above. Additional element: the server-end device.

Limitation: "selecting a mathematical vector from the mathematical vectors with the first vector distance exceeding a threshold value, and the patent raw data corresponding to the selected mathematical vector, by the server-end device." Same analysis and categories as above. Additional element: the server-end device.

Limitation: "recombining the phrases, vectorizing the recombined phrases to generate a combination vector, calculating a second vector distance between the combination vector and the mathematical vector of the selected patent raw data, and when the calculated second vector distance does not exceed the threshold value, generating an evaluation pass message based on the recombined phrase, by the server-end device." Same analysis and categories as above. Additional element: the server-end device.

As shown above, under Step 2A, Prong One, the claims recite a judicial exception (an abstract idea). The claims are directed to the abstract idea of implementing an innovative recombination recommendation system for a corporate knowledge base, which, pursuant to MPEP 2106.04, is aptly categorized as a mathematical concept and/or a method of organizing human activity. Therefore, under Step 2A, Prong One, the claims recite a judicial exception.

Next, the claims recite additional functional elements that are associated with the judicial exception, including a knowledge base for storing information. The Examiner understands these limitations to be insignificant extra-solution activity. See Accenture Global Servs., GmbH v. Guidewire Software, Inc., 728 F.3d 1336, 108 U.S.P.Q.2d 1173 (Fed. Cir. 2013), citing Diamond v. Diehr, 450 U.S. 175, 191-192 (1981) ("[I]nsignificant post-solution activity will not transform an unpatentable principle into a patentable process.").

The claims also recite additional technical elements, including a "processor" to execute the method/system and a "non-transitory computer-readable storage medium" for storing executable instructions. These limitations are recited at a high level of generality and appear to be nothing more than generic computer components. Claims that amount to nothing more than an instruction to apply the abstract idea using a generic computer do not render an abstract idea eligible. Alice Corp., 134 S. Ct. at 2358, 110 USPQ2d at 1983; see also id. at 2359, 110 USPQ2d at 1984.

Step 2A, Prong Two: Is the Abstract Idea Integrated into a Practical Application?

No. The judicial exception is not integrated into a practical application. The additional elements listed above that relate to computing components are recited at a high level of generality (i.e., as generic components performing generic computer functions such as communicating, receiving, processing, analyzing, and outputting/displaying data), such that they amount to no more than mere instructions to apply the exception using generic computing components. Simply implementing the abstract idea on a generic computer is not a practical application of the abstract idea. Additionally, the claims do not purport to improve the functioning of the computer itself. There is no technological problem that the claimed invention solves; rather, the computer system is invoked merely as a tool. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Therefore, these claims are directed to an abstract idea.
Furthermore, looking at the elements individually and in combination, under Step 2A, Prong Two, the claims as a whole do not integrate the judicial exception into a practical application because they fail to: improve the functioning of a computer or a technical field; apply the judicial exception in the treatment or prophylaxis of a disease; apply the judicial exception with a particular machine; effect a transformation or reduction of a particular article to a different state or thing; or apply the judicial exception beyond generally linking its use to a particular technological environment. Rather, the claims merely use a computer as a tool to perform the abstract idea(s), and/or add insignificant extra-solution activity to the judicial exception, and/or generally link the use of the judicial exception to a particular technological environment.

Step 2B: Does the Claim Provide an Inventive Concept?

No. Under Step 2B, the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements, when considered both individually and as an ordered combination, do not amount to significantly more than the abstract idea. Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. Simply put, as noted above, there is no indication that the combination of elements improves the functioning of a computer (or any other technology), and their collective functions merely provide conventional computer implementation. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements relating to computing components amount to no more than applying the exception using generic computing components, and mere instructions to apply an exception using a generic computing component cannot provide an inventive concept. Furthermore, the broadest reasonable interpretation of the claimed computer components (i.e., the additional elements) includes any generic computing components that are capable of being programmed to communicate, receive, send, process, analyze, output, or display data.

Additionally, pursuant to the requirement under Berkheimer, the following citations are provided to demonstrate that the additional elements, identified as extra-solution activity, amount to activities that are well-understood, routine, and conventional. See MPEP 2106.05(d).
- Capturing an image (code) with an RFID reader: Ritter, US Patent No. 7,734,507 (Col. 3, Lines 56-67); "RFID: Riding on the Chip" by Pat Russo, Frozen Food Age (New York), Dec. 2003, Vol. 52, Issue 5, p. S22.
- Receiving or transmitting data over a network: Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362; OIP Techs., Inc. v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015) (sending messages over a network); buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014).
- Storing and retrieving information in memory: Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93.
- Outputting/presenting data to a user: Mayo, 566 U.S. at 79, 101 USPQ2d at 1968; OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93; MPEP 2106.05(g)(3).
- Using a machine learning model to determine user segment characteristics for an ad campaign: https://whites.agency/blog/how-to-use-machine-learning-for-customer-segmentation/.

Thus, taken alone and in combination, the additional elements do not amount to significantly more than the above-identified judicial exception (the abstract idea), and the claims are ineligible under 35 USC 101. Independent system claim 1 also recites the identified abstract idea, with the additional elements of a processor and a storage medium, which are generic computer components and thus not significantly more, for the same reasons and rationale above. Dependent claims 2-5 and 7-10 further describe the abstract idea. The additional elements of the dependent claims fail to integrate the abstract idea into a practical application and do not amount to significantly more than the abstract idea. Thus, as the dependent claims remain directed to a judicial exception, and as their additional elements do not amount to significantly more, the dependent claims are not patent eligible. As such, the claims are not patent eligible.

Invention Could Be Performed Manually

It is conceivable that the invention could be performed manually, without the aid of a machine and/or computer. For example, Applicant claims storing data, receiving a summary and splitting the summary into phrases, vectorizing the phrases, transmitting information, recombining phrases, etc. Each of these features could be performed manually and/or with the aid of a simple generic computer to facilitate the transmission of data. See also Leapfrog Enterprises, Inc. v. Fisher-Price, Inc. and In re Venner, which stand for the concept that automating a manual activity and/or applying modern electronics to older mechanical devices to accomplish the same result is not sufficient to distinguish over the prior art. Here, Applicant is merely claiming computers to facilitate and/or automate functions which used to be commonly performed by a human. Leapfrog Enterprises, Inc. v. Fisher-Price, Inc., 485 F.3d 1157, 82 USPQ2d 1687 (Fed. Cir. 2007) ("[A]pplying modern electronics to older mechanical devices has been commonplace in recent years..."). The combination is thus the adaptation of an old idea or invention using newer technology that is commonly available and understood in the art. In In re Venner, 262 F.2d 91, 95, 120 USPQ 193, 194 (CCPA 1958), the court held that broadly providing an automatic or mechanical means to replace a manual activity which accomplished the same result is not sufficient to distinguish over the prior art. See MPEP 2144.04(III), Automating a Manual Activity, which, together with In re Venner, further provides motivation for using technology, hardware, a computer, or a server to automate a manual activity.

Therefore, the Office finds no improvements to another technology or field, no improvements to the function of the computer itself, and no meaningful limitations beyond generally linking the use of an abstract idea to a particular technological environment. Therefore, based on the two-part Alice Corp. analysis, there are no limitations in any of the claims that transform the exception (i.e., the abstract idea) into a patent-eligible application.
Claim Rejections - Not an Ordered Combination

None of the limitations, considered as an ordered combination, provide eligibility, because taken as a whole, the claims simply instruct the practitioner to implement the abstract idea with routine, conventional activity.

Claim Rejections - Preemption

Allowing the claims, as presently claimed, would preempt others from implementing an innovative recombination recommendation system for a corporate knowledge base. Furthermore, the claim language recites only the abstract idea of performing this method; there are no concrete steps articulating a particular way in which this idea is implemented or describing how it is performed.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Iyer et al. (US 2024/0054290) in view of Semenov (US 2024/0143632).

Application 18/601,601, Claim 1: Iyer teaches an innovative recombination recommendation system for a corporate knowledge base (Iyer [0088 - knowledge repositories] A network communicator 535 may be provided to connect the computer system to a network and in turn to other devices connected to the network including other clients, servers, data stores, and interfaces, for example.
A network communicator 535 may include, for example, a network adapter such as a LAN adapter or a wireless adapter. The computer system may include a data sources interface 540 to access the data source 545. The data source 545 may be an information resource. As an example, a database of exceptions and rules may be provided as the data source 545. Moreover, knowledge repositories and curated data may be other examples of the data source 545.), comprising:

a corporate knowledge base (Iyer [0083 - idea repository] The input may be received by the processor 102 from one or more databases/repositories. The input includes identified trends, idea trends mapping, cohorts, idea repository, engagement data, user and expert data, and the like. The processor 102 may display on the dashboard top trends and cohorts, user engagement dashboard, and dashboard for expert engagement. Further, the processor 102 may output a semantic knowledge network of trends, ideas, innovators, experts, and user demography of the innovators corresponding to the emerging innovation trend. Further, the processor 102 may illustrate the relationship between entities such as trends, ideas, innovators, experts, and user demography, to enable thought seeding. [0078 - innovation dataset] The base data for the processor 102 may include user profile data, ideas (innovation dataset), teams (themes mapping (AI+human validated)), and expert data. For example, the innovation may be organized from corporate incubators, innovation centers of excellence, innovation ecosystems, innovation hubs, innovation labs, open innovation, innovation governance committees, and the like. The ideas may be generated from expert networks for innovation, idea management tools, hackathons, internal pitch events, idea challenges, innovation workshops, and the like. Further, the idea may be evaluated and experimented with a lean start-up, design thinking. Further, the idea may be improving the innovation process and culture using an AI-driven innovation, International Organization for Standardization (ISO), innovation culture hacks, innovation training, and the like.),

configured to store pieces of patent raw data (Iyer [0058 - patent, non-patent literature] At step 342, the method includes determining candidate expression. The expressions are a subset of the n-grams. At step 344, the method includes mapping contextual keywords and ideas. At step 346, the method includes validating the best candidate expressions based on the idea similarity model and historical keywords dataset from a database. For each idea, the keywords extracted may be validated against the historical keyword dataset using the idea similarity model <<Actual document number>>. The actual document number can be an Identification (ID) number or application/publication ID of, but not limited to, a patent, non-patent literature, a technical paper, a white paper, an article, a power point presentation, a note, and the like. For example, the keyword extraction engine 106 outputs extracted keywords, for example, Robotic Process Automation (RPA), an automation BOTs for some ideas. The extracted keywords may be validated and compared to the idea similarity model, which matches and adds words which are in proximity such as Robotic Process Automation (RPA) in operations, and intelligent operations. After obtaining the ranked list of the n-grams based on how relevant those are to the source document, the next step is to re-rank them based on Maximal margin relevance or Max sum strategy. Here the distance between n-gram to the source document is minimized, however at the same time maximizes the distance with other candidate n-grams. This ensures that no output is similar meaning n-grams as probable keywords in final set, hence there is diversity. Hence after the final step of re-ranking, when a user searches for 'intelligent operations,' in turn, the results may also include the ideas related to RPA or BOTs.),
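The Maximal Marginal Relevance re-ranking described in the passage just quoted can be reconstructed as follows. This is an illustrative sketch of the general MMR technique, not code from the Iyer reference; the diversity weighting and the cosine similarity metric are assumptions.

```python
# Illustrative Maximal Marginal Relevance (MMR) re-ranking: keep candidates
# close to the source document while penalizing similarity to candidates
# already selected. Not code from the Iyer reference.
import numpy as np


def cosine_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def mmr_rerank(doc_vec, candidates, top_n=5, diversity=0.5):
    """candidates: list of (ngram, vector) pairs; returns top_n diverse n-grams."""
    selected = []
    while len(selected) < min(top_n, len(candidates)):
        best_i, best_score = None, -np.inf
        for i, (ngram, vec) in enumerate(candidates):
            if i in selected:
                continue
            relevance = cosine_sim(vec, doc_vec)  # minimize distance to source doc
            redundancy = max(
                (cosine_sim(vec, candidates[j][1]) for j in selected), default=0.0
            )  # maximize distance to already-chosen candidates
            score = (1 - diversity) * relevance - diversity * redundancy
            if score > best_score:
                best_i, best_score = i, score
        selected.append(best_i)
    return [candidates[i][0] for i in selected]
```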
wherein each of the pieces of patent raw data corresponds to at least one mathematical vector (Iyer [0003 - extracts a context-based keyword from an innovation dataset by transforming the innovation dataset into a vector] An embodiment of the present disclosure includes a system; the system extracts a context-based keyword from an innovation dataset by transforming the innovation dataset into a vector. The innovation dataset includes data corresponding to an innovation. Further, the system searches semantically relevant keywords for the extracted context-based keyword, by extracting an entity and a key phrase from the extracted context-based keyword. The entities correspond to named entity recognition in the innovation dataset. Furthermore, the system clusters the vector, by identifying frequent keywords in the semantically relevant keywords to obtain cluster centroids of the frequent keywords. Thereafter, the system determines weighted keywords in each cluster using the obtained cluster centroids, and classifies the weighted keywords to identify emerging innovation trends relevant to the innovation in the innovation dataset. [0032 - convert the extracted context-based keywords to high-dimensional vectors; calculate the semantic distance between high-dimensional vectors and the innovation dataset] In an example embodiment, the processor 102 may cause the keyword extraction engine 106 to extract a context-based keyword from an innovation dataset by transforming the innovation dataset into a vector. In an example embodiment, the keyword extraction engine 106 may also identify language in the innovation dataset. In an example embodiment, the innovation dataset includes data corresponding to an innovation. In an example embodiment, for extracting the context-based keywords from the innovation dataset, the processor 102 may extract n-grams from the innovation dataset, wherein the n-grams correspond to a sequence of n-consecutive tokens in a string of the innovation dataset. In an example embodiment, the processor 102 may rank the n-grams based on a frequency of the extracted n-grams in the innovation dataset. In an example embodiment, the processor 102 may determine a similarity of each ranked n-gram to the innovation dataset, using a cosine similarity technique, and extract context-based keywords for the similar n-grams. In an example embodiment, the processor 102 may convert the extracted context-based keywords to high-dimensional vectors. In an example embodiment, the processor 102 may calculate the semantic distance between the high-dimensional vectors and the innovation dataset. In an example embodiment, the processor 102 may validate the context-based keywords with a historical keyword dataset. [0034 - cluster the vector by identifying frequent keywords to obtain cluster centroids] In an example embodiment, the processor 102 may cause the clustering engine 110 to cluster the vector, by identifying frequent keywords in the semantically relevant keywords to obtain cluster centroids of the frequent keywords. In an embodiment, the clustering may be performed using, but not limited to, an agglomerative hierarchical clustering technique, a K-means clustering technique, and the like. [0056 - Count Vectorizer model ranks the n-grams based on frequency; n-grams converted to high-dimensional vectors using the BERT model] For example, the process encompasses three steps. First, the method includes extracting, by the processor 102, n-grams from the underlying text corpus for extracting keywords. N-grams depict the sequence of n-consecutive tokens in a string. For example, "digital bank experience" is a tri-gram of 3 consecutive words, "sustainability" is a uni-gram, while "carbon emission" is a bi-gram of two consecutive words. The processor 102 may use the Count Vectorizer model to obtain a list of candidate n-grams. The Count Vectorizer model ranks the n-grams based on the corresponding frequency in the original document. All the n-grams extracted from the previous step may be converted to respective high-dimensional vectors using the BERT model. The next step is to calculate the semantic distance between each n-gram and the original text document. The more the similarity, the more relevant and representative the keyword is to the source document. A semantic distance between items may be based on the likeness of item meaning or semantic content as opposed to lexicographical similarity. The semantic distance is determined to minimize the distance between each n-gram keyword vector and document vectors. From the generated matrix, top-n keywords may be extracted which have minimum distance from the document vectors.);
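The three-step process quoted from Iyer [0056] (enumerate candidate n-grams with a count vectorizer, convert them to high-dimensional vectors, and keep the top-n candidates with minimum semantic distance to the document vector) corresponds roughly to the following sketch. The specific libraries and model are assumptions; the reference names only a Count Vectorizer model and BERT.

```python
# Rough sketch of the three-step keyword extraction described at Iyer [0056].
# Library and model choices are assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.feature_extraction.text import CountVectorizer

model = SentenceTransformer("all-MiniLM-L6-v2")


def extract_keywords(document: str, top_n: int = 5) -> list[str]:
    # Step 1: candidate uni-, bi-, and tri-grams from the document.
    vectorizer = CountVectorizer(ngram_range=(1, 3), stop_words="english")
    vectorizer.fit([document])
    candidates = list(vectorizer.get_feature_names_out())

    # Step 2: convert candidates and the document to high-dimensional vectors.
    doc_vec = model.encode(document)
    cand_vecs = model.encode(candidates)

    # Step 3: keep the top-n candidates with minimum cosine distance
    # to the document vector (most representative of the source text).
    dists = [
        1.0 - float(np.dot(v, doc_vec) / (np.linalg.norm(v) * np.linalg.norm(doc_vec)))
        for v in cand_vecs
    ]
    order = np.argsort(dists)
    return [candidates[i] for i in order[:top_n]]
```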
and a server-end device, linked to the corporate knowledge base through network (Iyer [0005 - knowledge network] Further, the system provides innovation insights, and relationships to create a semantic knowledge network for a thought seeding. The semantic knowledge network includes at least one of the emerging innovation trends, multiple innovations, innovators, experts, and a demography of the innovators associated with the emerging innovation trends. [0027 - innovation dataset; create a semantic knowledge network] Additionally, the system recommends at least one of content, a team, cohorts, and experts relevant to the emerging innovation trends relevant to the innovation in the innovation dataset. Further, the system creates a cohort or a private channel comprising team members relevant to the recommendation for reusing the innovation in the innovation dataset. Further, the cohort or a private channel may be created to interact with other innovators, to drive collaborations between innovators, and inspire other innovators. Further, the system provides innovation insights, and relationships to create a semantic knowledge network for a thought seeding. The semantic knowledge network includes at least one of emerging innovation trends, multiple innovations, innovators, experts, and a demography of the innovators associated with the emerging innovation trends. [0078 - base data for the processor 102; quoted above]),

wherein the server-end device comprises:

a non-transitory computer-readable storage medium, configured to store computer readable instructions (Iyer [0086 - instructions on the computer-readable storage medium 510] The instructions on the computer-readable storage medium 510 are read and stored in storage 515 or in random access memory (RAM). The storage 515 may provide a space for keeping static data where at least some instructions could be stored for later execution. The stored instructions may be further compiled to generate other representations of the instructions and dynamically stored in the RAM such as RAM 520. The processor 505 may read instructions from the RAM 520 and perform actions as instructed.); and

a hardware processor, electrically connected to the non-transitory computer-readable storage medium, and configured to execute computer readable instructions to make the server-end device execute (Iyer [0031 - hardware device including the processor 102 executing machine-readable program instructions] The system 100 may be a hardware device including the processor 102 executing machine-readable program instructions/processor-executable instructions, to perform deep technology innovation management by cross-pollinating innovations dataset. Execution of the machine-readable program instructions by the processor 102 may enable the proposed system 100 to enable deep technology innovation management by cross-pollinating innovations dataset. The "hardware" may comprise a combination of discrete components, an integrated circuit, an application-specific integrated circuit, a field-programmable gate array, a digital signal processor, or other suitable hardware. The "software" may comprise one or more objects, agents, threads, lines of code, subroutines, separate software applications, two or more lines of code, or other suitable software structures operating in one or more software applications or on one or more processors. The processor 102 may include, for example, microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuits, and/or any devices that manipulate data or signals based on operational instructions. Among other capabilities, processor 102 may fetch and execute computer-readable instructions in a memory operationally coupled with system 100 for performing tasks such as data processing, input/output processing, keyword extraction, and/or any other functions.):

receiving an innovation summary, splitting the innovation summary into phrases (Iyer [0054 - transforming phrases and documents to vectors that capture corresponding meaning] For example, the innovation dataset may be abstract structured, and unstructured, with multiple dimensions having free text data, with multiple languages, acronyms, and emojis acting as noise. Hence, a Python® script may be used to perform noise removal, and text pre-processing using multiple Natural Language Processing (NLP) techniques. The dataset may include multiple languages. A language detection algorithm may be used to detect language and transform the text data into the English language. Considering the length of each document, the processor 102 may execute keyword extraction engine 106 to perform context-based keyword extraction and then use the document/word embedding on top of the extracted keyword for the semantic search. The context-based keyword extraction may be performed using BERT, which is a bi-directional transformer model that allows transforming phrases and documents to vectors that capture corresponding meaning. The KeyBERT is a keyword extraction library that leverages BERT embeddings to obtain keywords that represent the underlying text document. The KeyBERT may be an unsupervised extractive way of obtaining keywords from a given text. [0062 - stemming and tokenization in which the text is split into smaller units] In another example, 'retaining,' 'amplifying,' 'upgrading,' and the like. The POS entities may be detected to capture important and relevant entities such as noun, singular (NN), proper noun singular (NNP) (tags for singular nouns), noun, plural (NNS), proper noun plural (NNPS) (tags for plural nouns), and adjective (JJ). For example, retaining nouns or entities such as 'metaverse,' 'digital twin,' 'quantum computing.' Further, stemming and tokenization may be performed, in which the text is split into smaller units. Stemming may be the process of reducing a word to its word stem that affixes to suffixes and prefixes. Punctuations and special characters from the text may be removed.),
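The POS filtering, stemming, and tokenization described at Iyer [0062] might look like the following NLTK-based sketch. The retained tag set comes from the quoted passage; the library choice, resource names (which vary by NLTK version), and regex cleanup are assumptions.

```python
# Illustrative preprocessing per the quoted passage: strip punctuation,
# tokenize, keep noun/adjective POS tags (NN, NNP, NNS, NNPS, JJ), and stem.
# An assumption-laden sketch, not the reference's actual code.
import re

import nltk
from nltk.stem import PorterStemmer

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

KEEP_TAGS = {"NN", "NNP", "NNS", "NNPS", "JJ"}
stemmer = PorterStemmer()


def candidate_tokens(text: str) -> list[str]:
    text = re.sub(r"[^\w\s-]", " ", text)   # remove punctuation and special characters
    tokens = nltk.word_tokenize(text)       # split the text into smaller units
    tagged = nltk.pos_tag(tokens)           # part-of-speech tag each token
    kept = [tok for tok, tag in tagged if tag in KEEP_TAGS]
    return [stemmer.stem(tok.lower()) for tok in kept]  # reduce words to stems


print(candidate_tokens("Retaining nouns such as metaverse, digital twin, quantum computing."))
```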
and vectorizing each of the phrases to generate a phrase vector (Iyer [0033 - vectorize the extracted entity and key phrase for searching semantically relevant keywords] In an example embodiment, the processor 102 may cause the search engine 108 to search semantically relevant keywords for the extracted context-based keyword, by extracting an entity and a key phrase from the extracted context-based keyword. In an example embodiment, the entities may correspond to named entity recognition in the innovation dataset. In an example embodiment, for searching semantically relevant keywords, the processor 102 may pre-process the extracted context-based keywords. The pre-processing comprises at least one of a noise removal process, a tokenization process, a stemming process, a lemmatization process, and a normalization process. In an example embodiment, the processor 102 may extract the entity and the key phrase from the pre-processed context-based keywords. In an example embodiment, the processor 102 may vectorize the extracted entity and key phrase for searching semantically relevant keywords. [0054 - transforming phrases and documents to vectors; quoted above] [0063 - vectorizing the extracted entity and the key phrase; tuning the vector model] At step 356, the method includes extracting an entity, by the processor 102, using the preprocessed data. At step 358, the method includes extracting, by the processor 102, key phrases using the preprocessed data. At step 360, the method includes vectorizing, by the processor 102, the extracted entity and the key phrase. While training the vector model, the model may be tuned using multiple hyperparameters, such as changing the number of iterations (epochs) over the corpus data; "vector_size", an input giving the dimensionality of the feature vector; and the training algorithm, for which 'Distributed Memory' (PV-DM) may be used or, otherwise, distributed Bag of Words (PV-DBOW) may be employed. A Hierarchical Softmax (HS) may be used for model training, and "DM_mean" decides the sum of the context word vectors.);
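The vector-model hyperparameters enumerated at the end of [0063] (iterations/epochs over the corpus, vector_size, the PV-DM vs. PV-DBOW training algorithm, hierarchical softmax, and dm_mean) map directly onto gensim's Doc2Vec, as sketched below; the toy corpus and the parameter values are assumptions.

```python
# Paragraph-vector training with the hyperparameters named at [0063].
# Toy corpus and values are illustrative assumptions.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [
    TaggedDocument(words=["robotic", "process", "automation"], tags=["idea_0"]),
    TaggedDocument(words=["intelligent", "operations", "bots"], tags=["idea_1"]),
]

model = Doc2Vec(
    documents=corpus,
    vector_size=100,  # dimensionality of the feature vectors
    dm=1,             # 1 = Distributed Memory (PV-DM); 0 = PV-DBOW
    hs=1,             # hierarchical softmax for model training
    dm_mean=1,        # average (rather than sum) the context word vectors
    epochs=20,        # iterations over the corpus data
)

phrase_vec = model.infer_vector(["automation", "bots"])  # vectorize a new phrase
```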
transmitting the phrase vectors to the corporate knowledge base (Iyer [0054 - transforming phrases and documents to vectors; quoted above] [0040 - keyword extraction engine 106 receives the innovation dataset from an idea repository] At step 202, the processor 102 may execute the keyword extraction engine 106 to receive input data such as an innovation dataset from an idea repository. At step 204, keywords from the innovation dataset may be extracted using keyword extraction based Bidirectional Encoder Representations from Transformers (KeyBERT) for extracting keywords that represent an underlying text document associated with the innovation dataset, and detecting the language of the innovation dataset using, for example, spaCy language detector®.),
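The KeyBERT extraction referenced in [0040] and [0054] looks roughly as follows in KeyBERT's published API; the document text and the parameter values are illustrative assumptions, not settings from the Iyer reference.

```python
# BERT-based keyword extraction with KeyBERT, with MMR re-ranking enabled.
from keybert import KeyBERT

kw_model = KeyBERT()  # wraps a sentence-transformers model by default

doc = "An idea summary about robotic process automation for intelligent operations."
keywords = kw_model.extract_keywords(
    doc,
    keyphrase_ngram_range=(1, 3),  # uni-grams through tri-grams
    stop_words="english",
    use_mmr=True,                  # Maximal Marginal Relevance re-ranking for diversity
    diversity=0.5,
    top_n=5,
)
print(keywords)  # list of (keyphrase, similarity) pairs
```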
Hence, a Python® script may be used to perform noise removal, and text pre-processing using multiple Natural Language (NLP) techniques. The dataset may include multiple languages. A language detection algorithm may be used to detect language and transform the text data into the English language. Considering the length of each document, the processor 102 may execute keyword extraction engine 106 to perform context-based keyword extraction and then use the document/word embedding on top of the extracted keyword for the semantic search. The context-based keyword extraction may be performed using BERT, which is a bi-directional transformer model that allows transforming phrases and documents to vectors that captures corresponding meaning. The KeyBERT is a keyword extraction library that leverages BERT embeddings to obtain keywords that represent the underlying text document. The KeyBERT may be an unsupervised extractive way of obtaining keywords from a given text.), one by one, and calculating a plurality of first vector distances between the phrase vectors and each of the mathematical vectors (Iyer et al. 2024/0054290 [0032 - processor 102 may calculate the semantic distance between high-dimensional vectors and the innovation dataset…] In an example embodiment, the processor 102 may cause the keyword extraction engine 106 to extract a context-based keyword from an innovation dataset by transforming the innovation dataset into a vector. In an example embodiment, cause the keyword extraction engine 106 may also identify language in the innovation dataset. In an example embodiment, the innovation dataset includes data corresponding to an innovation. In an example embodiment, for extracting the context-based keywords from the innovation dataset, the processor 102 may extract n-grams from the innovation dataset, wherein the n-grams correspond to a sequence of n-consecutive tokens in a string of the innovation dataset. In an example embodiment, the processor 102 may rank the n-grams based on a frequency of the extracted n-grams in the innovation dataset. In an example embodiment, the processor 102 may determine a similarity of each ranked n-grams to the innovation dataset, using a cosine similarity technique, and extracting context-based keywords for the similar n-grams. In an example embodiment, the processor 102 may convert the extracted context-based keywords to high-dimensional vectors. In an example embodiment, the processor 102 may calculate the semantic distance between high-dimensional vectors and the innovation dataset. In an example embodiment, the processor 102 may validate the context-based keywords with a historical keyword dataset. [0056 - calculate the semantic distance between each n-gram and the original text document. … semantic distance between items may be based on the likeness of item meaning or semantic content … semantic distance is determined to minimize the distance between each n-gram keyword vector and document vectors … top-n keywords may be extracted which has minimum distance from the document vectors] For example, the process encompasses three steps. First, the method includes extracting, by the processor 102, n-grams from the underlying text corpus for extracting keywords. N-grams depict the sequence of n-consecutive tokens in a string. For example, “digital bank experience” is a tri-gram of 3 consecutive words, “sustainability” is a uni-gram while “carbon emission” is a bi-gram of two consecutive words. 
The processor 102 may use the Count Vectorizer model to obtain a list of candidate n-grams. The Count Vectorizer model ranks the n-grams based on the corresponding frequency in the original document. All the n-grams extracted from the previous step may be converted to respective high-dimensional vectors using the is BERT model. The next step is to calculate the semantic distance between each n-gram and the original text document. The more the similarity, the more relevant and representative the keyword is to the source document. A semantic distance between items may be based on the likeness of item meaning or semantic content as opposed to lexicographical similarity. The semantic distance is determined to minimize the distance between each n-gram keyword vector and document vectors. From the generated matrix, top-n keywords may be extracted which has minimum distance from the document vectors.); when one of the first vector distances exceeds a threshold value (Iyer et al. 2024/0054290 [0032 - calculate the semantic distance between] In an example embodiment, the processor 102 may cause the keyword extraction engine 106 to extract a context-based keyword from an innovation dataset by transforming the innovation dataset into a vector. In an example embodiment, cause the keyword extraction engine 106 may also identify language in the innovation dataset. In an example embodiment, the innovation dataset includes data corresponding to an innovation. In an example embodiment, for extracting the context-based keywords from the innovation dataset, the processor 102 may extract n-grams from the innovation dataset, wherein the n-grams correspond to a sequence of n-consecutive tokens in a string of the innovation dataset. In an example embodiment, the processor 102 may rank the n-grams based on a frequency of the extracted n-grams in the innovation dataset. In an example embodiment, the processor 102 may determine a similarity of each ranked n-grams to the innovation dataset, using a cosine similarity technique, and extracting context-based keywords for the similar n-grams. In an example embodiment, the processor 102 may convert the extracted context-based keywords to high-dimensional vectors. In an example embodiment, the processor 102 may calculate the semantic distance between high-dimensional vectors and the innovation dataset. In an example embodiment, the processor 102 may validate the context-based keywords with a historical keyword dataset.), selecting a piece of the patent raw data corresponding to the one of the first vector distances (Iyer et al. 2024/0054290 [0032 - innovation dataset includes data corresponding to an innovation] In an example embodiment, the processor 102 may cause the keyword extraction engine 106 to extract a context-based keyword from an innovation dataset by transforming the innovation dataset into a vector. In an example embodiment, cause the keyword extraction engine 106 may also identify language in the innovation dataset. In an example embodiment, the innovation dataset includes data corresponding to an innovation. In an example embodiment, for extracting the context-based keywords from the innovation dataset, the processor 102 may extract n-grams from the innovation dataset, wherein the n-grams correspond to a sequence of n-consecutive tokens in a string of the innovation dataset. In an example embodiment, the processor 102 may rank the n-grams based on a frequency of the extracted n-grams in the innovation dataset. 
when one of the first vector distances exceeds a threshold value (Iyer et al. 2024/0054290 [0032 - calculate the semantic distance between], as reproduced above), selecting a piece of the patent raw data corresponding to the one of the first vector distances (Iyer et al. 2024/0054290 [0032 - innovation dataset includes data corresponding to an innovation], as reproduced above); and recombining the phrases, vectorizing the recombined phrases to generate a combination vector (Iyer et al. 2024/0054290 [0033 - vectorize the extracted entity and key phrase] In an example embodiment, the processor 102 may cause the search engine 108 to search semantically relevant keywords for the extracted context-based keyword, by extracting an entity and a key phrase from the extracted context-based keyword. In an example embodiment, the entities may correspond to named entity recognition in the innovation dataset. In an example embodiment, for searching semantically relevant keywords, the processor 102 may pre-process the extracted context-based keywords. The pre-processing comprises at least one of a noise removal process, a tokenization process, a stemming process, a lemmatization process, and a normalization process. In an example embodiment, the processor 102 may extract the entity and the key phrase from the pre-processed context-based keywords. In an example embodiment, the processor 102 may vectorize the extracted entity and key phrase for searching semantically relevant keywords. [0056 - Count Vectorizer model ranks the n-grams based on the corresponding frequency in the original document], as reproduced above), calculating a second vector distance between the combination vector and the mathematical vector of the selected piece of patent raw data (Iyer et al.
2024/0054290 [0032 - calculate the semantic distance between high-dimensional vectors and the innovation dataset], as reproduced above), and when the calculated second vector distance does not exceed the threshold value, generating an evaluation pass message based on the recombined phrases (Iyer et al. 2024/0054290 [0081 - processor 102 may post a proactive message channel by calling the graph API to provide context to the users and help them] For example, an application and contest admins can create channels or cohorts from the theme/trends cloud shown in the application. The following implementation helps to generate these channels in the applications. Step 1: create “Team”; the application provides an option to create a cohort for trends identified. The user is provided with the option to create a cohort. The processor 102 may check for the existence of a team for the contest in the one or more different databases; if the team does not exist, the application may create a team by using the graph API. The user may define a template by using the graph API to provision the team by adding contest owners and co-owners to the team as team “owners.” Step 2: creating a cohort/private channel on demand: once the team is created, the processor 102 may dynamically create a channel using the graph API. The channel may be provisioned with the “searched keyword” name. Step 3: add members to the channel. The processor 102 may include custom code to add contest owners and co-owners as owners of the channel, and the submitter and team members will be added as members of the created channel. Step 4: post a proactive message in a channel; once the success flag is returned from the application, the processor 102 may post a proactive message to the channel by calling the graph API to provide context to the users and help them enable the collaboration journey. The message may be received as a nudge and activity feed in teams or channels.). Iyer et al.
2024/0054290 may not expressly disclose the “phrase combining/recombining” and “threshold” features, however, Semenov 2024/0143632 teaches recombining the phrases (Semenov 2024/0143632 [0053 - combination of characters] In some implementations, the processing logic, at operation 206, can perform one or more of the following types of search: (i) exact word search; (ii) prefix search (i.e., searching all words in the index for which the searched word is a prefix); (iii) a fuzzy word search; (iv) a fuzzy prefix search (i.e., determination of all words in the index for which the searched word is a prefix using fuzzy comparisons); a (v) wildcard search (i.e., where a template pattern is set containing ordinary characters as well as a “?” (e.g., any character indication) or a “*” (e.g., an indication of any combination of characters, including empty characters/blanks)). [0058 - various possible combinations of words] For example, initially, for a particular entry in a record in the database, various possible combinations of words can be generated from the words of documents according to given rules (e.g., words immediately to the right can be x). Next, a set of rules is generated based on the types of information that can be contained in the entry on the database and the corresponding field in the document (i.e., each rule in the set of rules depends on the type of information that the entry/field can contain). Rules can be set by taking into account, for example, expert experience (e.g., knowledge) and/or static data sampling. For example, one such rule can be that an address can consist of 1-10 words, which are distributed from 1-5 uninterrupted character string per line, in accordance with the relevant expert/statistical data about the address field from the field nomenclature (e.g., product, total, vendor, etc.). [0065 - combinations of words, phrases, sentences can be represented/encoded by the vector embeddings] Individual words or meaningful combinations of words, phrases, sentences can be represented/encoded by the vector embeddings (i.e., vectors of fixed length, for example, 10-12 digits), for example, through Word2vec or other suitable trained methods. Thus, vector embeddings can be generated for each stable combinations of words on the document. The embeddings of each of the stable combinations on the document can be collected and stored as another array (e.g., Array2).), vectorizing the recombined phrases to generate a combination vector (Semenov 2024/0143632 [0065 - combinations of words, phrases, sentences can be represented/encoded by the vector embeddings] Individual words or meaningful combinations of words, phrases, sentences can be represented/encoded by the vector embeddings (i.e., vectors of fixed length, for example, 10-12 digits), for example, through Word2vec or other suitable trained methods. Thus, vector embeddings can be generated for each stable combinations of words on the document. 
The embeddings of each of the stable combinations on the document can be collected and stored as another array (e.g., Array2).), calculating a second vector distance between the combination vector and the mathematical vector of the selected piece of patent raw data (Semenov 2024/0143632 [0068 - determine that pairs of embeddings for which the calculated value of a function (e.g., distance) satisfies a criterion (e.g., the calculated distance between which is below a threshold value) are “close” to each other while those for which the calculated value of a function (e.g., distance) does not satisfy a criterion (e.g., the calculated distance between which is above a threshold value)] To determine a measure of similarity or correspondence to one another between the embeddings, the processing logic can evaluate the distance or another function in vector space of the respective encodings. For example, the correspondence between encodings can be evaluated by calculating a cosine or Euclidean measure of a distance between them in vector space. A calculation is made of how close each of the respective vector encodings are to one another. In some implementations, the processing logic can determine that pairs of embeddings for which the calculated value of a function (e.g., distance) satisfies a criterion (e.g., the calculated distance between which is below a threshold value) are “close” to each other while those for which the calculated value of a function (e.g., distance) does not satisfy a criterion (e.g., the calculated distance between which is above a threshold value) are “far” from each other. Subsequently, the processing logic can select close embeddings (i.e., encoded vectorized representations) as potential proposed correspondence associations for each particular field. Thus, when all distances between the embeddings stored in Array1 for a particular field class and embeddings in Array2 are calculated, the processing logic can determine stable combinations of embeddings (i.e., phrases/words on the documents with the maximum proximity/distance measure). If several embeddings (i.e., encodings of words/phrases of a document) are evaluated to be close to each other, then they can be considered to be a stable combination and therefore can be considered similar to one another. [0054 - distance equation to measure the difference between a regular expression and potential text matches. The difference or similarity (i.e., in terms of a percentage) between the regular expression pattern and the matched text can be expressed as a “confidence score” (e.g., a percentage)] A fuzzy word search, as used in this disclosure, can refer to a simplified version of the TRS system mechanism or as a separate mechanism or component of the TRS system. A fuzzy regular expression, Fuzzy RegEx, can use the Levenshtein distance equation to measure the difference between a regular expression and potential text matches. The difference or similarity (i.e., in terms of a percentage) between the regular expression pattern and the matched text can be expressed as a “confidence score” (e.g., a percentage). If the resulting confidence score is above a pre-determined specified threshold score (e.g., 90%), the matched expression pair can be kept for later use as a proposed correspondence association. If the resulting confidence score is below the threshold, the matched expression pair can be discarded. 
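Pausing the citation briefly: the Levenshtein-based “confidence score” Semenov describes in [0054] is straightforward to sketch. The pattern/text pair and the 90% cutoff below are illustrative assumptions; the quoted support resumes at [0060]:

```python
# Minimal sketch of the Levenshtein-based "confidence score" in Semenov
# [0054]; the pattern/text pair and the 90% cutoff are illustrative.
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def confidence(pattern: str, text: str) -> float:
    # Difference expressed as a percentage-style similarity score.
    return 1.0 - levenshtein(pattern, text) / max(len(pattern), len(text), 1)

score = confidence("invoice total", "invoice totol")
# Keep the matched pair as a proposed correspondence association only if
# the score clears the pre-determined threshold (e.g., 90%).
print(round(score, 3), "keep" if score >= 0.90 else "discard")
```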
[0060 - one or more chains (i.e., sets) of proposed correspondence associations (i.e., each containing values of locations, confidence, length, and/or other attributes)] After the generation of proposed correspondence associations, the set of generated proposed correspondence associations can be fed as an input to another portion of the processing logic operating as a “proposed correspondence association enumerator”. Consequently, the processing logic, can assign a weight (i.e., a coefficient reflecting the probability that the entry corresponds to a particular region on the document) to each of the proposed correspondence associations. In some implementations, this weight, can reflect the measure or the corresponding degree of association between the entry of the record and the respective item of information. In the same or other implementations this weight, can reflect the measure or the corresponding degree of association between the character string and the respective the region on the document where item of information referenced by the entry is located. As the proposed correspondence association enumerator iterates through each proposed correspondence association in the set of proposed correspondence associations, it can calculate an aggregate sum of coefficients (e.g., weights) of the proposed correspondence associations of the fields and respective database record elements. The enumerator can calculate the aggregate value of the coefficients (i.e., indications of a degree of association) of for each of the proposed correspondence associations generated for each of the fields. Thus, a one or more chains (i.e., sets) of proposed correspondence associations (i.e., each containing values of locations, confidence, length, and/or other attributes) for each of the fields in the document is formed. For example, the processing logic can generate a proposed correspondence association indicating that character string Word-1 is associated with the information in document region R1 with a weight of 0.95 (indicating a probability of 95%) and that Word-2 is associated with the information in document region R1 with a weight of 0.65 (indicating a probability of 65%) and is associated with the information in document region R2 with 60% probability. Rather than deciding that Word-2 should be associated with field R1 (according to the higher probability), the processing logic can analyze two chains of hypothesis: (1) Word-1 is associated with the information in document region R1 and Word-2 is associated with the information in document region R1, and (2) Word-1 is associated with the information in document region R1 and Word-2 is associated with the information in document region R2. Because different entries in the database should not correspond to the same area in the document, the aggregate weight of the proposed correspondence associations of chain (2) is higher in total. The processing logic then determine that it is more likely that Word-1 and Word-2 are respectively associated with information in different document regions than that they are respectively associated with information in the same document regions. Consequently, the processing logic can calculate a function that selects a certain maximum value from the values in the various chains. In some implementations, the processing logic can calculate a function that maximizes an aggregate degree of association of a set of degrees of association. 
For example, the proposed correspondence association indicating that character string Word-2 corresponds to the information in document region R2 can be maximize an aggregate sum of weights of the proposed correspondence associations within the chains of proposed correspondence associations. In another example, a chain of proposed correspondence associations that leaves some entries uncorrelated with any document areas may be disfavored compared with a chain that associates at least one document area to each of the entries in the database. Thus, as a result of the operation of the enumerator, chains that contain proposed correspondence associations associated with the character strings in particular fields on the document can be selected based on the calculated function.), and when the calculated second vector distance does not exceed the threshold value (Semenov 2024/0143632 [0061 - criterion can be satisfied if a value (e.g., a weight value indicating a probability of a proposed correspondence association being valid) equals or exceeds a pre-determined threshold value or a minimum value, while in other implementations, the criterion can be satisfied if the value is equal to or less than a pre-determined threshold value or a maximum value] After the proposed correspondence associations enumerator has processed each character string, the processing logic can select, among the one or more proposed correspondence associations, those proposed correspondence associations that collectively or in the aggregate indicate a certain degree of association (i.e., collectively have an aggregate measure of correspondence) that satisfies a criterion. In several implementations, the processing logic can select, among the corresponding degrees of associations that have been determined, a set of corresponding degrees of association whose aggregate degree of association satisfies a criterion. In some implementation the degree of association can be indicated by the weight coefficient and collectively the aggregate weight values can represent the degree of association for a set of multiple proposed correspondence associations. In some implementations, the criterion can be satisfied if a value (e.g., a weight value indicating a probability of a proposed correspondence association being valid) equals or exceeds a pre-determined threshold value or a minimum value, while in other implementations, the criterion can be satisfied if the value is equal to or less than a pre-determined threshold value or a maximum value. For example, the criterion can be satisfied by a value of a measure of probability that the proposed correspondence association is valid exceeding a threshold probability (e.g., 65%). In another example, the criterion can be satisfied if a maximal aggregate weight coefficient value of a set of proposed correspondence associations is obtained by a particular combination or number of proposed correspondence associations forming the set. Accordingly, in some examples, the processing logic can select the proposed correspondence association with the maximum weight or the proposed correspondence associations of a chain that has a maximum aggregate weight value for all the proposed correspondence association in the chain. Thus, in such implementations, the processing logic would select the set of corresponding degrees of association whose aggregate degree of association is maximized or otherwise satisfies a criterion among the corresponding degrees of associations that have been determined. 
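The chain enumeration in [0060]-[0061] amounts to choosing, for each character string, a document region such that no region is reused and the aggregate weight of the chain is maximal. A toy sketch follows; the weights are taken from the Word-1/Word-2 example in the quoted passage, and the code itself is an assumption:

```python
# Toy sketch of the chain enumeration in Semenov [0060]-[0061]: choose one
# document region per character string, never reusing a region, so that the
# aggregate weight of the chain is maximal.
from itertools import product

weights = {  # weight ~ probability the proposed association is valid
    ("Word-1", "R1"): 0.95,
    ("Word-2", "R1"): 0.65,
    ("Word-2", "R2"): 0.60,
}
words, regions = ["Word-1", "Word-2"], ["R1", "R2"]

best_chain, best_score = None, -1.0
for assignment in product(regions, repeat=len(words)):
    if len(set(assignment)) < len(assignment):
        continue  # different entries should not share one document region
    score = sum(weights.get((w, r), 0.0) for w, r in zip(words, assignment))
    if score > best_score:
        best_chain, best_score = list(zip(words, assignment)), score

print(best_chain, best_score)  # (Word-1, R1), (Word-2, R2) -> 1.55
```

Consistent with the quoted example, the chain assigning Word-2 to R2 wins (0.95 + 0.60 = 1.55) even though Word-2's individually strongest association is with R1.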
Accordingly, a collection of words (i.e., from among all of the words in the document) that are associated with each of the given document fields according to their respective nomenclature can be generated. After the word set is generated, the processing logic can identify the field in the document by determining the proposed correspondence association with the highest weight (e.g., aggregate confidence value). As a result, a potential region in the image can be identified as a field after the trained generator is tested. [0068 - calculated value of a function (e.g., distance) satisfies a criterion (e.g., the calculated distance between which is below a threshold value) are “close” to each other while those for which the calculated value of a function (e.g., distance) does not satisfy a criterion (e.g., the calculated distance between which is above a threshold value) are “far” from each other], as reproduced above), generating an evaluation pass message based on the recombined phrases (Semenov 2024/0143632 [0065 - combinations of words, phrases, sentences], as reproduced above). Before the effective filing date of the claimed invention, it would have been obvious for one of ordinary skill in the art to have modified Iyer et al. 2024/0054290 to include the features as taught by Semenov 2024/0143632.
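Taken together, the claimed flow onto which the combination is being read can be summarized in a short sketch. Everything here (the embedding model, the cosine-distance metric, the 0.6 threshold, and the sample text) is assumed for illustration; neither reference is asserted to implement it this way:

```python
# Hypothetical end-to-end sketch of the claimed flow: vectorize phrases,
# flag patent records whose first vector distance exceeds a threshold, then
# recombine the phrases and generate a pass message if the second distance
# stays within the threshold. Model, metric, and threshold are assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
THRESHOLD = 0.6  # illustrative cosine-distance cutoff

def cos_dist(a, b):
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

phrases = ["splitting an innovation summary into phrases",
           "semantic search over a corporate patent corpus"]
patent_raw = ["A method of ranking search results by keyword frequency."]

phrase_vecs = model.encode(phrases)
for doc, doc_vec in zip(patent_raw, model.encode(patent_raw)):
    first = [cos_dist(p, doc_vec) for p in phrase_vecs]   # first distances
    if any(d > THRESHOLD for d in first):                 # select the record
        combo_vec = model.encode(" ".join(phrases))       # recombined phrases
        if cos_dist(combo_vec, doc_vec) <= THRESHOLD:     # second distance
            print("evaluation pass:", " ".join(phrases))
```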
One of ordinary skill in the art would have been motivated to do so in order to apply well-known tools and features to an innovative recombination recommendation system for a corporate knowledge base, which would be expected to improve user experience, maximize profits, and optimize revenue. 18/601,601 – Claim 6. Iyer et al. 2024/0054290 further teaches An innovative recombination recommendation method for a corporate knowledge base (Iyer et al. 2024/0054290 [0088 - knowledge repositories] A network communicator 535 may be provided to connect the computer system to a network and in turn to other devices connected to the network including other clients, servers, data stores, and interfaces, for example. A network communicator 535 may include, for example, a network adapter such as a LAN adapter or a wireless adapter. The computer system may include a data sources interface 540 to access the data source 545. The data source 545 may be an information resource. As an example, a database of exceptions and rules may be provided as the data source 545. Moreover, knowledge repositories and curated data may be other examples of the data source 545.), comprising: linking the corporate knowledge base to a server-end device through network (Iyer et al. 2024/0054290 [0083 - idea repository] The input may be received by the processor 102 from one or more databases/repositories. The input includes identified trends, idea trends mapping, cohorts, idea repository, engagement data, user and expert data, and the like. The processor 102 may display on the dashboard top trends and cohorts, user engagement dashboard, and dashboard for expert engagement. Further, the processor 102 may output a semantic knowledge network of trends, ideas, innovators, experts, and user demography of the innovators corresponding to the emerging innovation trend. Further, the processor 102 may illustrate the relationship between entities such as trends, ideas, innovators, experts, and user demography, to enable thought seeding. [0078 - innovation dataset] The base data for the processor 102 may include user profile data, ideas (innovation dataset), teams (themes mapping (AI+human validated)), and expert data. For example, the innovation may be organized from corporate incubators, innovation centers of excellence, innovation ecosystems, innovation hubs, innovation labs, open innovation, innovation governance committees, and the like. The ideas may be generated from expert networks for innovation, idea management tools, hackathons, internal pitch events, idea challenges, innovation workshops, and the like. Further, the idea may be evaluated and experimented with a lean start-up, design thinking. Further, the idea may be improving the innovation process and culture using an AI-driven innovation, International Organization for Standardization (ISO), innovation culture hacks, innovation training, and the like. [Figs. 1 and 2; 0040 - processor 102 may execute the keyword extraction engine 106 to receive input data such as an innovation dataset from an idea repository] At step 202, the processor 102 may execute the keyword extraction engine 106 to receive input data such as an innovation dataset from an idea repository.
At step 204, keywords from the innovation dataset may be extracted using keyword extraction based on Bidirectional Encoder Representations from Transformers (KeyBERT) for extracting keywords that represent an underlying text document associated with the innovation dataset, and detecting the language of the innovation dataset using, for example, the spaCy language detector®. [0083 - received by the processor 102 from one or more databases/repositories], as reproduced above. [0088 - network communicator 535 may be provided to connect the computer system to a network and in turn to other devices connected to the network including other clients, servers, data stores, and interfaces], as reproduced above.), wherein the corporate knowledge base stores pieces of patent raw data (Iyer et al. 2024/0054290 [0058 - patent, non-patent literature] At step 342, the method includes determining candidate expressions. The expressions are a subset of the n-grams. At step 344, the method includes mapping contextual keywords and ideas. At step 346, the method includes validating the best candidate expressions based on the idea similarity model and historical keywords dataset from a database. For each idea, the keywords extracted may be validated against the historical keyword dataset using the idea similarity model <<Actual document number>>. The actual document number can be an Identification (ID) number or application/publication ID of, but not limited to, a patent, non-patent literature, a technical paper, a white paper, an article, a power point presentation, a note, and the like. For example, the keyword extraction engine 106 outputs extracted keywords, for example, Robotic Process Automation (RPA) and automation BOTs for some ideas. The extracted keywords may be validated and compared to the idea similarity model, which matches and adds words which are in proximity such as Robotic Process Automation (RPA) in operations, and intelligent operations. After obtaining the ranked list of the n-grams based on how relevant those are to the source document, the next step is to re-rank them based on Maximal margin relevance or Max sum strategy.
Here the distance between each n-gram and the source document is minimized while, at the same time, the distance to the other candidate n-grams is maximized. This ensures that similar-meaning n-grams are not output as probable keywords in the final set, hence there is diversity. Hence, after the final step of re-ranking, when a user searches for ‘intelligent operations,’ in turn, the results may also include the ideas related to RPA or BOTs.), wherein each of the pieces of patent raw data corresponds to at least one mathematical vector (Iyer et al. 2024/0054290 [0003 - extracts a context-based keyword from an innovation dataset by transforming the innovation dataset into a vector] An embodiment of the present disclosure includes a system; the system extracts a context-based keyword from an innovation dataset by transforming the innovation dataset into a vector. The innovation dataset includes data corresponding to an innovation. Further, the system searches semantically relevant keywords for the extracted context-based keyword, by extracting an entity and a key phrase from the extracted context-based keyword. The entities correspond to named entity recognition in the innovation dataset. Furthermore, the system clusters the vector, by identifying frequent keywords in the semantically relevant keywords to obtain cluster centroids of the frequent keywords. Thereafter, the system determines weighted keywords in each cluster using the obtained cluster centroids, and classifies the weighted keywords to identify emerging innovation trends relevant to the innovation in the innovation dataset. [0032 - convert the extracted context-based keywords to high-dimensional vectors. In an example embodiment, the processor 102 may calculate the semantic distance between high-dimensional vectors and the innovation dataset], as reproduced above.
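Briefly interrupting the citation: the Maximal Marginal Relevance re-ranking described in [0058] above rewards closeness to the document while penalizing closeness to keywords already selected. A minimal sketch follows; the toy 2-D vectors and the 0.5 trade-off weight are assumptions:

```python
# Sketch of the Maximal Marginal Relevance re-ranking described in [0058]:
# reward relevance to the document, penalize redundancy with keywords
# already selected.
import numpy as np

def cos_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def mmr(doc_vec, cand_vecs, k=2, lam=0.5):
    selected, remaining = [], list(range(len(cand_vecs)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cos_sim(cand_vecs[i], doc_vec)
            redundancy = max((cos_sim(cand_vecs[i], cand_vecs[j])
                              for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

doc = np.array([1.0, 0.2])
cands = np.array([[0.9, 0.1], [0.92, 0.12], [0.3, 0.9]])  # 0 and 1 similar
print(mmr(doc, cands))  # [1, 2]: keeps diversity, skips the near-duplicate
```

With the trade-off weight at 0.5, the sketch selects index 1 and then the semantically distinct index 2, skipping the near-duplicate index 0, which is the diversity effect the quotation describes.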
[0034 - cause the clustering engine 110 to cluster the vector, by identifying frequent keywords in the semantically relevant keywords to obtain cluster centroids of the frequent keywords] In an example embodiment, the processor 102 may cause the clustering engine 110 to cluster the vector, by identifying frequent keywords in the semantically relevant keywords to obtain cluster centroids of the frequent keywords. In an embodiment, the clustering may be performed using, but not limited to, an agglomerative hierarchical clustering technique, a K-means clustering technique, and the like. [0056 - Count Vectorizer model ranks the n-grams based on the corresponding frequency in the original document … the n-grams extracted from the previous step may be converted to respective high-dimensional vectors using the BERT model], as reproduced above.), wherein the server-end device comprises a non-transitory computer-readable storage medium storing computer readable instructions (Iyer et al. 2024/0054290 [0086 - instructions on the computer-readable storage medium 510 are read and stored…] The instructions on the computer-readable storage medium 510 are read and stored in storage 515 or in random access memory (RAM). The storage 515 may provide a space for keeping static data where at least some instructions could be stored for later execution. The stored instructions may be further compiled to generate other representations of the instructions and dynamically stored in the RAM such as RAM 520. The processor 505 may read instructions from the RAM 520 and perform actions as instructed.), and a hardware processor executing the computer readable instructions to make the server-end device execute (Iyer et al. 2024/0054290 [0031 - system 100 may be a hardware device including the processor 102 executing machine-readable program instructions/processor-executable instructions…] The system 100 may be a hardware device including the processor 102 executing machine-readable program instructions/processor-executable instructions, to perform deep technology innovation management by cross-pollinating innovations dataset.
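As an aside on the clustering step quoted in [0034] above (the [0031] quotation continues after this sketch): obtaining cluster centroids from keyword vectors is a standard K-means operation, with agglomerative hierarchical clustering as the stated alternative. The toy vectors below are assumptions:

```python
# Sketch of the clustering step in [0034]: cluster keyword vectors to
# obtain cluster centroids (K-means here; agglomerative hierarchical
# clustering is the stated alternative).
import numpy as np
from sklearn.cluster import KMeans

keyword_vecs = np.array([
    [0.90, 0.10], [0.85, 0.15],   # e.g., automation-related keyword vectors
    [0.10, 0.95], [0.20, 0.90],   # e.g., sustainability-related vectors
])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(keyword_vecs)
print(km.labels_)           # cluster membership of each keyword
print(km.cluster_centers_)  # centroids used to derive weighted keywords
```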
Execution of the machine-readable program instructions by the processor 102 may enable the proposed system 100 to enable deep technology innovation management by cross-pollinating innovations dataset. The “hardware” may comprise a combination of discrete components, an integrated circuit, an application-specific integrated circuit, a field-programmable gate array, a digital signal processor, or other suitable hardware. The “software” may comprise one or more objects, agents, threads, lines of code, subroutines, separate software applications, two or more lines of code, or other suitable software structures operating in one or more software applications or on one or more processors. The processor 102 may include, for example, microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuits, and/or any devices that manipulate data or signals based on operational instructions. Among other capabilities, processor 102 may fetch and execute computer-readable instructions in a memory operationally coupled with system 100 for performing tasks such as data processing, input/output processing, keyword extraction, and/or any other functions. Any reference to a task in the present disclosure may refer to an operation being or that may be performed on data.): receiving an innovation summary, splitting the innovation summary into phrases (Iyer et al. 2024/0054290 [0054 - innovation dataset may be abstract structured, and unstructured, with multiple dimensions having free text data, … the processor 102 may execute keyword extraction engine 106 to perform context-based keyword extraction … that allows transforming phrases and documents to vectors that capture corresponding meaning…] For example, the innovation dataset may be abstract structured and unstructured, with multiple dimensions having free text data, with multiple languages, acronyms, and emojis acting as noise; the remainder of [0054] is reproduced above. [0062 - stemming and tokenization in which the text is split into smaller units] In another example, ‘retaining,’ ‘amplifying,’ ‘upgrading,’ and the like. The POS entities may be detected to capture important and relevant entities such as a noun, singular (NN), proper noun singular (NNP) (tags for singular nouns), noun, plural (NNS), proper noun plural (NNPS) (tags for plural nouns), adjective (JJ). For example, retaining nouns or entities such as ‘metaverse,’ ‘digital twin,’ ‘quantum computing.’ Further, stemming and tokenization may be performed, in which the text is split into smaller units.
Stemming may be the process of reducing a word to its word stem by removing affixes such as suffixes and prefixes. Punctuation and special characters may be removed from the text.), and vectorizing each of the phrases to generate a phrase vector (Iyer et al. 2024/0054290 [0033 - processor 102 may vectorize the extracted entity and key phrase for searching semantically relevant keywords], as reproduced above. [0054 - transforming phrases and documents to vectors that capture corresponding meaning], as reproduced above. [0063 – extracting … key phrases using the preprocessed data … includes vectorizing … the extracted entity and the key phrase … training the vector model, the model may be tuned…] At step 356, the method includes extracting an entity, by the processor 102, using the preprocessed data. At step 358, the method includes extracting, by the processor 102, key phrases using the preprocessed data. At step 360, the method includes vectorizing, by the processor 102, the extracted entity and the key phrase. While training the vector model, the model may be tuned using multiple hyperparameters: the number of iterations (epochs) over the corpus data may be changed; “vector_size” may be an input for the dimensionality of the feature vector; and ‘Distributed Memory’ (PV-DM) may be used as the training algorithm. Otherwise, distributed Bag of Words (PV-DBOW) may be employed.
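The hyperparameters named in [0063] and in the sentence that follows this sketch (epochs, vector_size, PV-DM versus PV-DBOW, hierarchical softmax, DM_mean) correspond to the gensim Doc2Vec API; a minimal sketch under that assumption, with a toy two-document corpus:

```python
# Sketch of the paragraph-vector training described in [0063]; gensim's
# Doc2Vec exposes hyperparameters matching the ones the passage names.
# The two-document corpus is a toy assumption.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [
    TaggedDocument(words=["digital", "bank", "experience"], tags=[0]),
    TaggedDocument(words=["carbon", "emission", "reduction"], tags=[1]),
]
model = Doc2Vec(
    corpus,
    dm=1,            # 1 = Distributed Memory (PV-DM); 0 = PV-DBOW
    vector_size=50,  # dimensionality of the feature vector
    hs=1,            # hierarchical softmax for model training
    dm_mean=1,       # average rather than sum the context word vectors
    epochs=40,       # iterations over the corpus data
    min_count=1,
)
vec = model.infer_vector(["digital", "banking"])  # vectorize a new phrase
```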
A Hierarchical Softmax (HS) may be used for model training, “DM_mean” which decides the sum of the context word vectors.), by the server-end device; transmitting the phrase vectors to the corporate knowledge base (Iyer et al. 2024/0054290 [0054 - a bi-directional transformer model that allows transforming phrases and documents to vectors…] For example, the innovation dataset may be abstract structured, and unstructured, with multiple dimensions having free text data, with multiple languages, acronyms, and emojis acting as noise. Hence, a Python® script may be used to perform noise removal, and text pre-processing using multiple Natural Language (NLP) techniques. The dataset may include multiple languages. A language detection algorithm may be used to detect language and transform the text data into the English language. Considering the length of each document, the processor 102 may execute keyword extraction engine 106 to perform context-based keyword extraction and then use the document/word embedding on top of the extracted keyword for the semantic search. The context-based keyword extraction may be performed using BERT, which is a bi-directional transformer model that allows transforming phrases and documents to vectors that captures corresponding meaning. The KeyBERT is a keyword extraction library that leverages BERT embeddings to obtain keywords that represent the underlying text document. The KeyBERT may be an unsupervised extractive way of obtaining keywords from a given text. [0040 - keyword extraction engine 106 to receive input data such as an innovation dataset from an idea repository] At step 202, the processor 102 may execute the keyword extraction engine 106 to receive input data such as an innovation dataset from an idea repository. At step 204, keywords from the innovation dataset may be extracted using keyword extraction based Bidirectional Encoder Representations from Transformers (KeyBERT) for extracting keywords that represent an underlying text document associated with the innovation dataset, and detecting the language of the innovation dataset using, for example, spaCy language detector®.), comparing the phrase vectors with one of the mathematical vectors one by one (Iyer et al. 2024/0054290 [0058 - extracted keywords may be validated and compared to the idea similarity model…] At step 342, the method includes determining candidate expression. The expressions are a subset of the n-grams. At step 344, the method includes mapping contextual keywords and ideas. At step 346, the method includes validating the best candidate expressions based on the idea similarity model and historical keywords dataset from a database. For each idea, the keywords extracted may be validated against the historical keyword dataset using the idea similarity model <<Actual document number>>. The actual document number can be a Identification (ID) number or application/publication ID of, but not limited to, a patent, non-patent literature, a technical paper, a white paper, an article, a power point presentation, a note, and the like. For example, the keyword extraction engine 106 outputs extracted keywords, for example, Robotic Process Automation (RPA), an automation BOTs for some ideas. The extracted keywords may be validated and compared to the idea similarity model, which matches and adds words which are in proximity such as Robotic Process Automation (RPA) in operations, and intelligent operations. 
After obtaining the ranked list of the n-grams based on how relevant those are to the source document, the next step is to re-rank them based on Maximal margin relevance or Max sum strategy. Here the distance between n-gram to the source document is minimized, however at the same time maximizes the distance with other candidate n-grams. This ensures that no output is similar meaning n-grams as probable keywords in final set, hence there is diversity. Hence after the final step of re-ranking, when a user searches for ‘intelligent operations,’ in turn, the results may also include the ideas related to RPA or BOTs. [0054 - transformer model that allows transforming phrases and documents to vectors…] For example, the innovation dataset may be abstract structured, and unstructured, with multiple dimensions having free text data, with multiple languages, acronyms, and emojis acting as noise. Hence, a Python® script may be used to perform noise removal, and text pre-processing using multiple Natural Language (NLP) techniques. The dataset may include multiple languages. A language detection algorithm may be used to detect language and transform the text data into the English language. Considering the length of each document, the processor 102 may execute keyword extraction engine 106 to perform context-based keyword extraction and then use the document/word embedding on top of the extracted keyword for the semantic search. The context-based keyword extraction may be performed using BERT, which is a bi-directional transformer model that allows transforming phrases and documents to vectors that captures corresponding meaning. The KeyBERT is a keyword extraction library that leverages BERT embeddings to obtain keywords that represent the underlying text document. The KeyBERT may be an unsupervised extractive way of obtaining keywords from a given text.), and calculating a first vector distance between the phrase vectors and one of the mathematical vectors, by the server-end device (Iyer et al. 2024/0054290 [0032 - processor 102 may calculate the semantic distance between high-dimensional vectors and the innovation dataset…] In an example embodiment, the processor 102 may cause the keyword extraction engine 106 to extract a context-based keyword from an innovation dataset by transforming the innovation dataset into a vector. In an example embodiment, cause the keyword extraction engine 106 may also identify language in the innovation dataset. In an example embodiment, the innovation dataset includes data corresponding to an innovation. In an example embodiment, for extracting the context-based keywords from the innovation dataset, the processor 102 may extract n-grams from the innovation dataset, wherein the n-grams correspond to a sequence of n-consecutive tokens in a string of the innovation dataset. In an example embodiment, the processor 102 may rank the n-grams based on a frequency of the extracted n-grams in the innovation dataset. In an example embodiment, the processor 102 may determine a similarity of each ranked n-grams to the innovation dataset, using a cosine similarity technique, and extracting context-based keywords for the similar n-grams. In an example embodiment, the processor 102 may convert the extracted context-based keywords to high-dimensional vectors. In an example embodiment, the processor 102 may calculate the semantic distance between high-dimensional vectors and the innovation dataset. 
In an example embodiment, the processor 102 may validate the context-based keywords with a historical keyword dataset. [0056 - calculate the semantic distance between each n-gram and the original text document. … semantic distance between items may be based on the likeness of item meaning or semantic content … semantic distance is determined to minimize the distance between each n-gram keyword vector and document vectors … top-n keywords may be extracted which has minimum distance from the document vectors] For example, the process encompasses three steps. First, the method includes extracting, by the processor 102, n-grams from the underlying text corpus for extracting keywords. N-grams depict the sequence of n-consecutive tokens in a string. For example, “digital bank experience” is a tri-gram of 3 consecutive words, “sustainability” is a uni-gram while “carbon emission” is a bi-gram of two consecutive words. The processor 102 may use the Count Vectorizer model to obtain a list of candidate n-grams. The Count Vectorizer model ranks the n-grams based on the corresponding frequency in the original document. All the n-grams extracted from the previous step may be converted to respective high-dimensional vectors using the is BERT model. The next step is to calculate the semantic distance between each n-gram and the original text document. The more the similarity, the more relevant and representative the keyword is to the source document. A semantic distance between items may be based on the likeness of item meaning or semantic content as opposed to lexicographical similarity. The semantic distance is determined to minimize the distance between each n-gram keyword vector and document vectors. From the generated matrix, top-n keywords may be extracted which has minimum distance from the document vectors.); selecting a mathematical vector from the mathematical vectors with the first vector distance exceeding a threshold value, and the patent raw data corresponding to the selected mathematical vector, by the server-end device (Iyer et al. 2024/0054290 [0032 - innovation dataset includes data corresponding to an innovation] In an example embodiment, the processor 102 may cause the keyword extraction engine 106 to extract a context-based keyword from an innovation dataset by transforming the innovation dataset into a vector. In an example embodiment, cause the keyword extraction engine 106 may also identify language in the innovation dataset. In an example embodiment, the innovation dataset includes data corresponding to an innovation. In an example embodiment, for extracting the context-based keywords from the innovation dataset, the processor 102 may extract n-grams from the innovation dataset, wherein the n-grams correspond to a sequence of n-consecutive tokens in a string of the innovation dataset. In an example embodiment, the processor 102 may rank the n-grams based on a frequency of the extracted n-grams in the innovation dataset. In an example embodiment, the processor 102 may determine a similarity of each ranked n-grams to the innovation dataset, using a cosine similarity technique, and extracting context-based keywords for the similar n-grams. In an example embodiment, the processor 102 may convert the extracted context-based keywords to high-dimensional vectors. In an example embodiment, the processor 102 may calculate the semantic distance between high-dimensional vectors and the innovation dataset. 
In an example embodiment, the processor 102 may validate the context-based keywords with a historical keyword dataset.); and recombining the phrases, vectorizing the recombined phrases to generate a combination vector (Iyer et al. 2024/0054290 [0033 - vectorize the extracted entity and key phrase] In an example embodiment, the processor 102 may cause the search engine 108 to search semantically relevant keywords for the extracted context-based keyword, by extracting an entity and a key phrase from the extracted a context-based keyword. In an example embodiment, the entities may correspond to named entity recognition in the innovation dataset. In an example embodiment, for searching semantically relevant keywords, the processor 102 may pre-process the extracted context-based keywords. The pre-processing comprises at least one of a noise removal process, a tokenization process, a stemming process, a lemmatization process, and a normalization process. In an example embodiment, the processor 102 may extract the entity and the key phrase from the pre-processed context-based keywords. In an example embodiment, the processor 102 may vectorize the extracted entity and key phrase for searching semantically relevant keywords. [0056 - Count Vectorizer model ranks the n-grams based on the corresponding frequency in the original document] For example, the process encompasses three steps. First, the method includes extracting, by the processor 102, n-grams from the underlying text corpus for extracting keywords. N-grams depict the sequence of n-consecutive tokens in a string. For example, “digital bank experience” is a tri-gram of 3 consecutive words, “sustainability” is a uni-gram while “carbon emission” is a bi-gram of two consecutive words. The processor 102 may use the Count Vectorizer model to obtain a list of candidate n-grams. The Count Vectorizer model ranks the n-grams based on the corresponding frequency in the original document. All the n-grams extracted from the previous step may be converted to respective high-dimensional vectors using the is BERT model. The next step is to calculate the semantic distance between each n-gram and the original text document. The more the similarity, the more relevant and representative the keyword is to the source document. A semantic distance between items may be based on the likeness of item meaning or semantic content as opposed to lexicographical similarity. The semantic distance is determined to minimize the distance between each n-gram keyword vector and document vectors. From the generated matrix, top-n keywords may be extracted which has minimum distance from the document vectors.), calculating a second vector distance between the combination vector and the mathematical vector of the selected patent raw data (Iyer et al. 2024/0054290 [0032 - calculate the semantic distance between high-dimensional vectors and the innovation dataset] In an example embodiment, the processor 102 may cause the keyword extraction engine 106 to extract a context-based keyword from an innovation dataset by transforming the innovation dataset into a vector. In an example embodiment, cause the keyword extraction engine 106 may also identify language in the innovation dataset. In an example embodiment, the innovation dataset includes data corresponding to an innovation. 
In an example embodiment, for extracting the context-based keywords from the innovation dataset, the processor 102 may extract n-grams from the innovation dataset, wherein the n-grams correspond to a sequence of n consecutive tokens in a string of the innovation dataset. In an example embodiment, the processor 102 may rank the n-grams based on a frequency of the extracted n-grams in the innovation dataset. In an example embodiment, the processor 102 may determine a similarity of each ranked n-gram to the innovation dataset, using a cosine similarity technique, and extract context-based keywords for the similar n-grams. In an example embodiment, the processor 102 may convert the extracted context-based keywords to high-dimensional vectors. In an example embodiment, the processor 102 may calculate the semantic distance between the high-dimensional vectors and the innovation dataset. In an example embodiment, the processor 102 may validate the context-based keywords with a historical keyword dataset.), and when the calculated second vector distance does not exceed the threshold value, generating an evaluation pass message based on the recombined phrase, by the server-end device (Iyer et al. 2024/0054290 [0081 - processor 102 may post a proactive message to the channel by calling the graph API to provide context to the users and help them] For example, an application and contest admins can create channels or cohorts from the theme/trends cloud shown in the application. The following implementation helps to generate these channels in the applications. Step 1: create “Team”: the application provides an option to create a cohort for identified trends. The user is provided with the option to create a cohort. The processor 102 may check for the existence of a team for the contest in the one or more different databases; if the team does not exist, the application may create a team by using the graph API. The user may define a template by using the graph API to provision the team by adding contest owners and co-owners to the team as team “owners.” Step 2: create a cohort/private channel on demand: once the team is created, the processor 102 may dynamically create a channel using the graph API. The channel may be provisioned with the “searched keyword” name. Step 3: add members to the channel: the processor 102 may include custom code to add contest owners and co-owners as owners of the channel, and the submitter and team members will be added as members of the created channel. Step 4: post a proactive message in the channel: once the success flag is returned from the application, the processor 102 may post a proactive message to the channel by calling the graph API to provide context to the users and help them enable the collaboration journey. The message may be received as a nudge and activity feed in teams or channels.).
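As a concrete aside on the pipeline mapped above, here is a hedged sketch in Python of the quoted three-step flow: candidate n-grams via scikit-learn's CountVectorizer, vectorization, and cosine-distance ranking against the whole document. This is not the reference's actual code; embed() is a toy stand-in for the BERT-style encoder named in the excerpt, and the sample document and names are invented for illustration.

```python
# Hedged sketch (not the reference's actual code) of the quoted three-step
# pipeline: candidate n-grams via scikit-learn's CountVectorizer, vectorization,
# then cosine-distance ranking against the whole document. embed() is a toy
# stand-in for the BERT-style encoder named above, so the sketch runs offline.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

def embed(text, dim=64):
    # Deterministic toy embedding; a real system would call a BERT encoder here.
    rng = np.random.default_rng(abs(hash(text)) % (2 ** 32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def extract_keywords(document, top_n=5):
    # Step 1: candidate uni-/bi-/tri-grams, counted by frequency.
    candidates = CountVectorizer(ngram_range=(1, 3)).fit([document]).get_feature_names_out()
    # Step 2: convert each candidate and the document itself to vectors.
    doc_vec = embed(document)
    # Step 3: rank by cosine distance to the document; keep the closest top-n.
    return sorted(candidates, key=lambda ng: 1.0 - embed(ng) @ doc_vec)[:top_n]

print(extract_keywords("digital bank experience improves sustainability and carbon emission reporting"))
```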
Iyer et al. 2024/0054290 may not expressly disclose the “phrase combining/recombining” and “threshold” features; however, Semenov 2024/0143632 teaches recombining the phrases (Semenov 2024/0143632 [0053 - combination of characters] In some implementations, the processing logic, at operation 206, can perform one or more of the following types of search: (i) exact word search; (ii) prefix search (i.e., searching all words in the index for which the searched word is a prefix); (iii) a fuzzy word search; (iv) a fuzzy prefix search (i.e., determination of all words in the index for which the searched word is a prefix using fuzzy comparisons); (v) a wildcard search (i.e., where a template pattern is set containing ordinary characters as well as a “?” (e.g., any character indication) or a “*” (e.g., an indication of any combination of characters, including empty characters/blanks)). [0058 - various possible combinations of words] For example, initially, for a particular entry in a record in the database, various possible combinations of words can be generated from the words of documents according to given rules (e.g., words immediately to the right can be x). Next, a set of rules is generated based on the types of information that can be contained in the entry on the database and the corresponding field in the document (i.e., each rule in the set of rules depends on the type of information that the entry/field can contain). Rules can be set by taking into account, for example, expert experience (e.g., knowledge) and/or static data sampling. For example, one such rule can be that an address can consist of 1-10 words, which are distributed across 1-5 uninterrupted character strings per line, in accordance with the relevant expert/statistical data about the address field from the field nomenclature (e.g., product, total, vendor, etc.). [0065 - combinations of words, phrases, sentences can be represented/encoded by the vector embeddings] Individual words or meaningful combinations of words, phrases, sentences can be represented/encoded by the vector embeddings (i.e., vectors of fixed length, for example, 10-12 digits), for example, through Word2vec or other suitable trained methods. Thus, vector embeddings can be generated for each stable combination of words on the document. The embeddings of each of the stable combinations on the document can be collected and stored as another array (e.g., Array2).), vectorizing the recombined phrases to generate a combination vector (Semenov 2024/0143632 [0065 - combinations of words, phrases, sentences can be represented/encoded by the vector embeddings] Individual words or meaningful combinations of words, phrases, sentences can be represented/encoded by the vector embeddings (i.e., vectors of fixed length, for example, 10-12 digits), for example, through Word2vec or other suitable trained methods. Thus, vector embeddings can be generated for each stable combination of words on the document.
The embeddings of each of the stable combinations on the document can be collected and stored as another array (e.g., Array2).), calculating a second vector distance between the combination vector and the mathematical vector of the selected piece of patent raw data (Semenov 2024/0143632 [0068 - determine that pairs of embeddings for which the calculated value of a function (e.g., distance) satisfies a criterion (e.g., the calculated distance between which is below a threshold value) are “close” to each other while those for which the calculated value of a function (e.g., distance) does not satisfy a criterion (e.g., the calculated distance between which is above a threshold value)] To determine a measure of similarity or correspondence to one another between the embeddings, the processing logic can evaluate the distance or another function in vector space of the respective encodings. For example, the correspondence between encodings can be evaluated by calculating a cosine or Euclidean measure of a distance between them in vector space. A calculation is made of how close each of the respective vector encodings are to one another. In some implementations, the processing logic can determine that pairs of embeddings for which the calculated value of a function (e.g., distance) satisfies a criterion (e.g., the calculated distance between which is below a threshold value) are “close” to each other while those for which the calculated value of a function (e.g., distance) does not satisfy a criterion (e.g., the calculated distance between which is above a threshold value) are “far” from each other. Subsequently, the processing logic can select close embeddings (i.e., encoded vectorized representations) as potential proposed correspondence associations for each particular field. Thus, when all distances between the embeddings stored in Array1 for a particular field class and embeddings in Array2 are calculated, the processing logic can determine stable combinations of embeddings (i.e., phrases/words on the documents with the maximum proximity/distance measure). If several embeddings (i.e., encodings of words/phrases of a document) are evaluated to be close to each other, then they can be considered to be a stable combination and therefore can be considered similar to one another. [0054 - distance equation to measure the difference between a regular expression and potential text matches. The difference or similarity (i.e., in terms of a percentage) between the regular expression pattern and the matched text can be expressed as a “confidence score” (e.g., a percentage)] A fuzzy word search, as used in this disclosure, can refer to a simplified version of the TRS system mechanism or as a separate mechanism or component of the TRS system. A fuzzy regular expression, Fuzzy RegEx, can use the Levenshtein distance equation to measure the difference between a regular expression and potential text matches. The difference or similarity (i.e., in terms of a percentage) between the regular expression pattern and the matched text can be expressed as a “confidence score” (e.g., a percentage). If the resulting confidence score is above a pre-determined specified threshold score (e.g., 90%), the matched expression pair can be kept for later use as a proposed correspondence association. If the resulting confidence score is below the threshold, the matched expression pair can be discarded. 
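Before turning to the next quoted passage, the Levenshtein-based "confidence score" in the [0054] excerpt above can be illustrated with a short, self-contained sketch. The 90% cutoff is the threshold given in the quote; the function names and sample strings are hypothetical.

```python
# Self-contained illustration of the Levenshtein-based "confidence score" from
# the [0054] excerpt above. The 90% cutoff is the threshold given in the quote;
# the function names and sample strings are hypothetical.
def levenshtein(a, b):
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def confidence(pattern, text):
    # Similarity expressed as a percentage of the longer string's length.
    longest = max(len(pattern), len(text)) or 1
    return 100.0 * (1 - levenshtein(pattern, text) / longest)

score = confidence("invoice total", "invoice totaI")  # one OCR-style character error
print(f"{score:.1f}% -> {'keep' if score >= 90.0 else 'discard'}")
```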
[0060 - one or more chains (i.e., sets) of proposed correspondence associations (i.e., each containing values of locations, confidence, length, and/or other attributes)] After the generation of proposed correspondence associations, the set of generated proposed correspondence associations can be fed as an input to another portion of the processing logic operating as a “proposed correspondence association enumerator”. Consequently, the processing logic can assign a weight (i.e., a coefficient reflecting the probability that the entry corresponds to a particular region on the document) to each of the proposed correspondence associations. In some implementations, this weight can reflect the measure or the corresponding degree of association between the entry of the record and the respective item of information. In the same or other implementations, this weight can reflect the measure or the corresponding degree of association between the character string and the respective region on the document where the item of information referenced by the entry is located. As the proposed correspondence association enumerator iterates through each proposed correspondence association in the set of proposed correspondence associations, it can calculate an aggregate sum of coefficients (e.g., weights) of the proposed correspondence associations of the fields and respective database record elements. The enumerator can calculate the aggregate value of the coefficients (i.e., indications of a degree of association) for each of the proposed correspondence associations generated for each of the fields. Thus, one or more chains (i.e., sets) of proposed correspondence associations (i.e., each containing values of locations, confidence, length, and/or other attributes) for each of the fields in the document are formed. For example, the processing logic can generate a proposed correspondence association indicating that character string Word-1 is associated with the information in document region R1 with a weight of 0.95 (indicating a probability of 95%) and that Word-2 is associated with the information in document region R1 with a weight of 0.65 (indicating a probability of 65%) and is associated with the information in document region R2 with 60% probability. Rather than deciding that Word-2 should be associated with field R1 (according to the higher probability), the processing logic can analyze two chains of hypotheses: (1) Word-1 is associated with the information in document region R1 and Word-2 is associated with the information in document region R1, and (2) Word-1 is associated with the information in document region R1 and Word-2 is associated with the information in document region R2. Because different entries in the database should not correspond to the same area in the document, the aggregate weight of the proposed correspondence associations of chain (2) is higher in total. The processing logic then determines that it is more likely that Word-1 and Word-2 are respectively associated with information in different document regions than that they are respectively associated with information in the same document regions. Consequently, the processing logic can calculate a function that selects a certain maximum value from the values in the various chains. In some implementations, the processing logic can calculate a function that maximizes an aggregate degree of association of a set of degrees of association.
For example, the proposed correspondence association indicating that character string Word-2 corresponds to the information in document region R2 can be selected to maximize an aggregate sum of weights of the proposed correspondence associations within the chains of proposed correspondence associations. In another example, a chain of proposed correspondence associations that leaves some entries uncorrelated with any document areas may be disfavored compared with a chain that associates at least one document area with each of the entries in the database. Thus, as a result of the operation of the enumerator, chains that contain proposed correspondence associations associated with the character strings in particular fields on the document can be selected based on the calculated function.), and when the calculated second vector distance does not exceed the threshold value (Semenov 2024/0143632 [0061 - criterion can be satisfied if a value (e.g., a weight value indicating a probability of a proposed correspondence association being valid) equals or exceeds a pre-determined threshold value or a minimum value, while in other implementations, the criterion can be satisfied if the value is equal to or less than a pre-determined threshold value or a maximum value] After the proposed correspondence associations enumerator has processed each character string, the processing logic can select, among the one or more proposed correspondence associations, those proposed correspondence associations that collectively or in the aggregate indicate a certain degree of association (i.e., collectively have an aggregate measure of correspondence) that satisfies a criterion. In several implementations, the processing logic can select, among the corresponding degrees of associations that have been determined, a set of corresponding degrees of association whose aggregate degree of association satisfies a criterion. In some implementations, the degree of association can be indicated by the weight coefficient, and collectively the aggregate weight values can represent the degree of association for a set of multiple proposed correspondence associations. In some implementations, the criterion can be satisfied if a value (e.g., a weight value indicating a probability of a proposed correspondence association being valid) equals or exceeds a pre-determined threshold value or a minimum value, while in other implementations, the criterion can be satisfied if the value is equal to or less than a pre-determined threshold value or a maximum value. For example, the criterion can be satisfied by a value of a measure of probability that the proposed correspondence association is valid exceeding a threshold probability (e.g., 65%). In another example, the criterion can be satisfied if a maximal aggregate weight coefficient value of a set of proposed correspondence associations is obtained by a particular combination or number of proposed correspondence associations forming the set. Accordingly, in some examples, the processing logic can select the proposed correspondence association with the maximum weight or the proposed correspondence associations of a chain that has a maximum aggregate weight value for all the proposed correspondence associations in the chain. Thus, in such implementations, the processing logic would select the set of corresponding degrees of association whose aggregate degree of association is maximized or otherwise satisfies a criterion among the corresponding degrees of associations that have been determined.
Accordingly, a collection of words (i.e., from among all of the words in the document) that are associated with each of the given document fields according to their respective nomenclature can be generated. After the word set is generated, the processing logic can identify the field in the document by determining the proposed correspondence association with the highest weight (e.g., aggregate confidence value). As a result, a potential region in the image can be identified as a field after the trained generator is tested. [0068 - calculated value of a function (e.g., distance) satisfies a criterion (e.g., the calculated distance between which is below a threshold value) are “close” to each other while those for which the calculated value of a function (e.g., distance) does not satisfy a criterion (e.g., the calculated distance between which is above a threshold value) are “far” from each other] To determine a measure of similarity or correspondence to one another between the embeddings, the processing logic can evaluate the distance or another function in vector space of the respective encodings. For example, the correspondence between encodings can be evaluated by calculating a cosine or Euclidean measure of a distance between them in vector space. A calculation is made of how close each of the respective vector encodings are to one another. In some implementations, the processing logic can determine that pairs of embeddings for which the calculated value of a function (e.g., distance) satisfies a criterion (e.g., the calculated distance between which is below a threshold value) are “close” to each other while those for which the calculated value of a function (e.g., distance) does not satisfy a criterion (e.g., the calculated distance between which is above a threshold value) are “far” from each other. Subsequently, the processing logic can select close embeddings (i.e., encoded vectorized representations) as potential proposed correspondence associations for each particular field. Thus, when all distances between the embeddings stored in Array1 for a particular field class and embeddings in Array2 are calculated, the processing logic can determine stable combinations of embeddings (i.e., phrases/words on the documents with the maximum proximity/distance measure). If several embeddings (i.e., encodings of words/phrases of a document) are evaluated to be close to each other, then they can be considered to be a stable combination and therefore can be considered similar to one another.), generating an evaluation pass message based on the recombined phrases (Semenov 2024/0143632 [0065 - combinations of words, phrases, sentences] Individual words or meaningful combinations of words, phrases, sentences can be represented/encoded by the vector embeddings (i.e., vectors of fixed length, for example, 10-12 digits), for example, through Word2vec or other suitable trained methods. Thus, vector embeddings can be generated for each stable combination of words on the document. The embeddings of each of the stable combinations on the document can be collected and stored as another array (e.g., Array2).). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified Iyer et al. 2024/0054290 to include the features as taught by Semenov 2024/0143632. One of ordinary skill in the art would have been motivated to do so to implement well-known tools and features useful for implementing an innovative recombination recommendation system for a corporate knowledge base, which should prove to improve user experience, maximize profits, and optimize revenue.
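For clarity, the recombine, re-vectorize, and threshold-check loop that the combined references are asserted to render obvious can be sketched as follows. The cosine-distance measure, the 0.35 threshold, the sample phrases, and the embed() helper (the same toy encoder stand-in used earlier) are all illustrative assumptions, not the claim's required implementation.

```python
# Hedged sketch of the recombine -> re-vectorize -> threshold loop described
# above. The distance measure, threshold, phrases, and embed() stand-in are
# illustrative assumptions only.
import numpy as np
from itertools import combinations

def embed(text, dim=64):
    # Same toy stand-in for a real encoder as in the earlier sketch.
    rng = np.random.default_rng(abs(hash(text)) % (2 ** 32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def evaluate_recombinations(phrases, selected_vec, threshold=0.35):
    # Emit an "evaluation pass" message whenever a recombined phrase's vector
    # lies within the threshold distance of the selected patent vector.
    messages = []
    for a, b in combinations(phrases, 2):
        combo = f"{a} {b}"                             # recombined phrase
        distance = 1.0 - embed(combo) @ selected_vec   # second vector distance
        if distance <= threshold:                      # "does not exceed" the threshold
            messages.append(f"evaluation pass: '{combo}' (distance {distance:.2f})")
    return messages

selected_vec = embed("selected patent abstract text")
print(evaluate_recombinations(["digital twin", "carbon emission", "quantum computing"], selected_vec))
```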
Claim 6 has limitations similar to those of Claim 1 and is therefore REJECTED under the same rationale as Claim 1.

Claims 2 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over: Iyer et al. 2024/0054290; in view of Semenov 2024/0143632; in further view of Kupershmidt et al. 2007/0162411.

18/601,601 – Claim 2. Iyer et al. 2024/0054290 further teaches The innovative recombination recommendation system for a corporate knowledge base according to claim 1, wherein the server-end device performs labelling (Iyer et al. 2024/0054290 [0060 – tagging interpreted as labelling] At step 352, the method includes receiving, by the processor 102, a corpus of innovation dataset, extracted keywords, ideas feedback, idea implementation data, and short-listed ideas. For example, Python code combining multiple NLP techniques may be used to detect synonyms and entities, Part of Speech (POS) tagging, using seed words, multi-language detection, pre-processing, and contextual keyword extraction.) by at least one of an automatic manner and a manual manner, and splits the innovation summary into one or more feature phrases (Iyer et al. 2024/0054290 [0062 - stemming and tokenization in which the text is split into smaller units] In another example, ‘retaining,’ ‘amplifying,’ ‘upgrading,’ and the like. The POS entities may be detected to capture important and relevant entities such as a noun, singular (NN), proper noun singular (NNP) (tags for singular nouns), noun, plural (NNS), proper noun plural (NNPS) (tags for plural nouns), adjective (JJ). For example, retaining nouns or entities such as ‘metaverse,’ ‘digital twin,’ ‘quantum computing.’ Further, stemming and tokenization may be performed, in which the text is split into smaller units. Stemming may be the process of reducing a word to its word stem by removing affixes such as suffixes and prefixes. Punctuation and special characters may be removed from the text. [0091 - extracting an entity and a key phrase from the extracted context-based keyword] At block 604, the method 600 may include searching, by the processor 102, semantically relevant keywords for the extracted context-based keyword, by extracting an entity and a key phrase from the extracted context-based keyword. The entities correspond to named entity recognition in the innovation dataset. [Fig. 1; 0030 - system 100 may be implemented by way of a single device or a combination of multiple devices that may be operatively connected or networked together. The system 100 may be implemented in hardware or a suitable combination of hardware and software. The system 100 includes a processor 102, and a memory 104. The memory 104 may include a plurality of processing engines] FIG. 1 illustrates a system 100 for deep technology innovation management by cross-pollinating innovations dataset, according to an example embodiment of the present disclosure. The system 100 may be implemented by way of a single device or a combination of multiple devices that may be operatively connected or networked together. The system 100 may be implemented in hardware or a suitable combination of hardware and software.
The system 100 includes a processor 102 and a memory 104. The memory 104 may include a plurality of processing engines. The processing engines may include, but are not limited to, a keyword extraction engine 106, a search engine 108, a clustering engine 110, a trend-spotting engine 112, a Graphical User Interface (GUI) engine 114, and the like.). Iyer et al. 2024/0054290 may not expressly disclose the “automatic labelling” features; however, Kupershmidt et al. 2007/0162411 teaches this (Kupershmidt et al. 2007/0162411 [0106 - Tagging may be performed automatically or manually. Automatic tagging automatically extracts key concepts for imported data]). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified Iyer et al. 2024/0054290 to include the features as taught by Kupershmidt et al. 2007/0162411. One of ordinary skill in the art would have been motivated to do so to implement well-known tools and features useful for implementing an innovative recombination recommendation system for a corporate knowledge base, which should prove to improve user experience, maximize profits, and optimize revenue.

18/601,601 – Claim 7. The innovative recombination recommendation method for a corporate knowledge base according to claim 6, wherein the server-end device performs labelling by at least one of an automatic manner and a manual manner, and splits the innovation summary into one or more feature phrases. Claim 7 has limitations similar to those of Claim 2 and is therefore REJECTED under the same rationale as Claim 2.

Claims 3 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over: Iyer et al. 2024/0054290; in view of Semenov 2024/0143632; in further view of Kupershmidt et al. 2007/0162411; in view of Van Dusen et al. 2014/0075004.

18/601,601 – Claim 3. Iyer et al. 2024/0054290 further teaches The innovative recombination recommendation system for a corporate knowledge base according to claim 2, wherein the server-end device executes one of a clustering algorithm and an association rule algorithm to analyze the feature phrases, and generates at least one recombined technology combination based on an analysis result (Iyer et al. 2024/0054290 [0007 - extracting an entity and a key phrase from … the method includes clustering the vector…] Another embodiment of the present disclosure may include a method; the method includes extracting a context-based keyword from an innovation dataset by transforming the innovation dataset into a vector. The innovation dataset comprises data corresponding to an innovation. Further, the method includes searching semantically relevant keywords for the extracted context-based keyword, by extracting an entity and a key phrase from the extracted context-based keyword. The entities correspond to named entity recognition in the innovation dataset. Furthermore, the method includes clustering the vector, by identifying frequent keywords in the semantically relevant keywords to obtain cluster centroids of the frequent keywords. Further, the method includes determining weighted keywords in each cluster using the obtained cluster centroids, and classifying the weighted keywords to identify emerging innovation trends relevant to the innovation in the innovation dataset. [0072 – clustering algorithm]).
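As an illustration of the clustering branch of claim 3, one plausible reading is sketched below: cluster the feature-phrase vectors and propose cross-cluster pairs as candidate recombined technology combinations. KMeans is real scikit-learn; the phrases, the cluster count, and the embed() stand-in are assumptions made for the example, not the claimed system's disclosed implementation.

```python
# Illustration only: cluster feature-phrase vectors, then treat pairs drawn
# from different clusters as candidate "recombined technology combinations".
# The phrases, cluster count, and embed() stand-in are assumptions.
import numpy as np
from itertools import combinations
from sklearn.cluster import KMeans

def embed(text, dim=64):
    # Same toy stand-in for a real encoder as in the earlier sketches.
    rng = np.random.default_rng(abs(hash(text)) % (2 ** 32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def recombine_by_cluster(phrases, n_clusters=2):
    X = np.stack([embed(p) for p in phrases])
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X).labels_
    # Pairs drawn from different clusters are the candidate recombinations.
    return [f"{a} + {b}" for (a, la), (b, lb)
            in combinations(zip(phrases, labels.tolist()), 2) if la != lb]

print(recombine_by_cluster(["digital twin", "metaverse", "carbon emission", "quantum computing"]))
```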
Iyer et al. 2024/0054290 may not expressly disclose the “clustering algorithm and an association rule algorithm” features; however, Van Dusen et al. 2014/0075004 teaches a clustering algorithm (Van Dusen et al. 2014/0075004 [0164; 5847; 5850; 5851; 5853]) and association algorithm/rules ([6390; 1417]). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified Iyer et al. 2024/0054290 to include the features as taught by Van Dusen et al. 2014/0075004. One of ordinary skill in the art would have been motivated to do so to implement well-known tools and features useful for implementing an innovative recombination recommendation system for a corporate knowledge base, which should prove to improve user experience, maximize profits, and optimize revenue.

18/601,601 – Claim 8. The innovative recombination recommendation method for a corporate knowledge base according to claim 7, wherein the server-end device executes one of a clustering algorithm and an association rule algorithm to analyze the feature phrases, and generates at least one recombined technology combination based on an analysis result. Claim 8 has limitations similar to those of Claim 3 and is therefore REJECTED under the same rationale as Claim 3.

No Prior-Art Rejection / Potentially Allowable Claims 4, 5, 9, and 10 cannot be rejected with prior art. Individual claimed features are taught in the prior art; however, the unique combination of features and elements is not taught by the prior art without hindsight reasoning. These claims are objected to as being dependent upon a rejected base claim, but might be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

18/601,601 – Claim 4. The innovative recombination recommendation system for a corporate knowledge base according to claim 3, wherein the server-end device vectors the at least one recombined technology combination to generate a recombination vector, calculates a third vector distance between the recombination vector and one of the mathematical vectors, and loads the patent raw data with the third vector distance exceeding the threshold value from the corporate knowledge base, as a prior art reference.

18/601,601 – Claim 5. The innovative recombination recommendation system for a corporate knowledge base according to claim 4, wherein the server-end device performs an association analysis based on the loaded patent raw data, and the association analysis comprises an analysis for at least one of the recombined technology combination, a patent citation, joint inventors, and joint applicants.

18/601,601 – Claim 9. The innovative recombination recommendation method for a corporate knowledge base according to claim 8, wherein the server-end device vectors the recombined technology combination to generate a recombination vector, calculates a third vector distance between the recombination vector and one of the mathematical vectors, and loads the patent raw data with the third vector distance exceeding the threshold value from the corporate knowledge base, as a prior art reference.

18/601,601 – Claim 10. The innovative recombination recommendation method for a corporate knowledge base according to claim 9, wherein the server-end device performs an association analysis based on the loaded patent raw data, and the association analysis comprises an analysis for at least one of the recombined technology combination, a patent citation, joint inventors, and joint applicants.

Examiner’s Response to Arguments Per Applicants’ amendments/arguments, the rejections are withdrawn.
Applicant's arguments have been considered but are moot in view of the new ground(s) of rejection. Applicants’ amendments have necessitated the new grounds of rejection noted above.

Examiner’s Response: Claim Rejections – 35 USC §112 Per Applicants’ amendments/arguments, the rejections are withdrawn. Applicant's arguments have been considered but are moot in view of the new ground(s) of rejection. Applicants’ amendments have necessitated the new grounds of rejection noted above.

Examiner’s Response: Claim Rejections – 35 USC §101 Per Applicants’ amendments/arguments, the rejections are withdrawn. See notes above for additional reasoning and rationale for dropping the 35 USC 101 rejection, including Applicant’s amendments, arguments, lack of an abstract idea, and practical integration. Applicant's arguments have been considered but are moot in view of the new ground(s) of rejection. Applicants’ amendments have necessitated the new grounds of rejection noted above. Regarding Claims 1-15, on page(s) 6-12 of Applicant’s Remarks (dated 12/27/2016), Applicants traverse the 35 USC §101 rejections arguing the following:

Examiner’s Response: Claim Rejections – 35 USC § 102 / § 103 Per Applicants’ amendments/arguments, the rejections are withdrawn. See notes above for additional reasoning and rationale for dropping the prior-art rejection, including Applicant’s amendments and arguments and the unique combination of features and elements not taught by the prior art without hindsight reasoning. Applicant's arguments have been considered but are moot in view of the new ground(s) of rejection. Applicants’ amendments have necessitated the new grounds of rejection noted above. Regarding Claim X, on page(s) 8-9 of Applicant’s Remarks / After Final Amendments (dated 07/15/2011), Applicant(s) argues that the cited reference(s) (Ellis and Vandermolen) fail to teach, describe, or suggest the amended features. Specifically, Applicant(s) argues that the cited reference(s) do not teach, describe, or suggest the following: . With respect, Applicant’s arguments are deemed unpersuasive and the amended feature(s) remain rejected as follows.

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion

PERTINENT PRIOR ART – Patent Literature The following prior art made of record is considered pertinent to applicant's disclosure: MOMO et al. 2024/0012979 [0089 - vectorizing a graph that is a search result (e.g., a graph representing the shortest path between a plurality of designated words or phrases)]; Chiang et al. 2019/0146590 [0007 - machine learning algorithm based on the raw data sets and the standard action labels corresponding to the raw data sets to build a feature vector creation model and a classifier model].

PERTINENT PRIOR ART – Non-Patent Literature (NPL) The following NPL prior art made of record is considered pertinent to applicant's disclosure: Yan, B. and Luo, J. (2017), Measuring technological distance for patent mapping. Journal of the Association for Information Science and Technology, 68: 423-437. https://doi.org/10.1002/asi.23664
THIS ACTION IS MADE FINAL. Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Contact Information Any inquiry concerning this communication or earlier communications from the examiner should be directed to MATTHEW T. SITTNER, whose telephone number is (571) 270-7137 and whose email is matthew.sittner@uspto.gov. The examiner can normally be reached Monday-Friday, 8:00am - 5:00pm (Mountain Time Zone). Please schedule interview requests via email: matthew.sittner@uspto.gov. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sarah M. Monfeldt, can be reached at (571) 270-1833. Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /MATTHEW T SITTNER/ Primary Examiner, Art Unit 3629

Prosecution Timeline

Mar 11, 2024
Application Filed
Feb 25, 2026
Non-Final Rejection — §101, §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12596996
SYSTEMS AND METHODS FOR PROVIDING DYNAMIC REPRESENTATION OF ASSETS IN A FACILITY
2y 5m to grant Granted Apr 07, 2026
Patent 12591843
SCALABLE AND EFFICIENT PACKAGE DELIVERY USING TRANSPORTER FLEET
2y 5m to grant Granted Mar 31, 2026
Patent 12572962
CUSTOMER SERVING ASSISTANCE APPARATUS, CUSTOMER SERVING ASSISTANCE METHOD, AND NON-TRANSITORY COMPUTER-READABLE STORAGE MEDIUM
2y 5m to grant Granted Mar 10, 2026
Patent 12572992
SYSTEMS AND METHODS FOR AUTOMATED BUILDING CODE CONFORMANCE
2y 5m to grant Granted Mar 10, 2026
Patent 12565335
DETERMINING PART UTILIZATION BY MACHINES
2y 5m to grant Granted Mar 03, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

1-2
Expected OA Rounds
58%
Grant Probability
99%
With Interview (+56.2%)
3y 1m
Median Time to Grant
Low
PTA Risk
Based on 890 resolved cases by this examiner. Grant probability derived from career allow rate.
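For readers curious how figures like these could be reproduced, the sketch below shows one plausible computation of an interview lift from resolved-case counts. The methodology here is an assumption, not something the report discloses, and all four counts are illustrative placeholders rather than real data.

```python
# Assumed methodology, not disclosed by the report: interview "lift" taken as
# the relative increase in allow rate for resolved cases with an examiner
# interview versus without. All counts below are illustrative placeholders.
granted_without, resolved_without = 300, 520   # resolved cases with no interview
granted_with, resolved_with = 212, 230         # resolved cases with an interview

rate_without = granted_without / resolved_without
rate_with = granted_with / resolved_with
lift = (rate_with - rate_without) / rate_without
print(f"without: {rate_without:.0%}, with: {rate_with:.0%}, lift: {lift:+.1%}")
```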
