DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
2. The Amendment filed on February 26, 2026 has been entered. Claims 1, 10, 13 and 16 have been amended and claims 9 and 17 have been cancelled. Claims 1-8, 10-16 and 18-20 are currently pending.
Response to Arguments
35 U.S.C. §103
3. Applicant's arguments, see Remarks pp. 9-15, filed February 26, 2026, with respect to the rejections of claims 1 and 13 under 35 U.S.C. §103 have been fully considered and are persuasive.
The crux of Applicant's arguments is that the amendments to independent claims 1 and 13 are not taught, either singly or in combination, by the art of record.
Examiner respectfully agrees.
However, upon further consideration, new grounds of rejection necessitated by Applicant's amendments are made in view of Traviesco (United States Patent Publication Number 2016/0357733), hereinafter Traviesco, and Xiaojun Huang (United States Patent Publication Number 2020/0311779), hereinafter Huang.
4. Applicant's arguments, see Remarks pp. 15-17, filed February 26, 2026, with respect to the rejection of claim 16 under 35 U.S.C. §103 have been fully considered but they are not persuasive.
Applicant argues that URLs and HTTP documents are distinct, in that URLs locate documents via their links whereas HTTP documents merely contain the links.
Examiner respectfully disagrees and notes that, under the broadest reasonable interpretation of Applicant's claimed invention in light of the Specification, URL (Uniform Resource Locator) and HTTP (Hypertext Transfer Protocol) are both fundamental, interdependent components of web browsing that enable client-server communication.
Claim Rejections – 35 U.S.C. §103
5. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
6. The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
a. Determining the scope and contents of the prior art
b. Ascertaining the differences between the prior art and the claims at issue
c. Resolving the level of ordinary skill in the pertinent art
d. Considering objective evidence present in the application indicating obviousness or nonobviousness
Claims 1, 5, 6 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Ganti et al. (United States Patent Publication Number 2020/0273052), hereinafter Ganti, in view of Holger Schwedes et al. (United States Patent Publication Number 2003/0018617), hereinafter Schwedes, in view of Traviesco (United States Patent Publication Number 2016/0357733), hereinafter Traviesco, and in further view of Xiaojun Huang (United States Patent Publication Number 2020/0311779), hereinafter Huang.
Regarding claim 1, Ganti teaches a system (cognitive system [0062]; see also cloud system [0034]) for implementing uniform resource locator ("URL") embeddings for aligning parallel documents that are corresponding web pages in different languages, the system (cognitive system [0062]; see also cloud system [0034]) comprising: a processing system (processing unit [0063]); and memory (memory 430 [0063]) coupled to (in communication with [0063]) the processing system (processing unit [0063]), the memory (memory 430 [0063]) comprising computer executable instructions (one or more executable instructions [0103]) that, when executed by (being executed by [0047]) the processing system (processing unit [0063]), causes the system (cognitive system [0062]; see also cloud system [0034]) to perform operations (perform particular tasks [0047]) comprising: calculating (calculations may be performed [0065]), using an artificial intelligence ("AI") model (artificial neural networks [0076]), URL embeddings for each URL among a plurality of URLs (neural network embedding for fuzzy matching, the URL names of similar internet/web pages [0072]) to produce a plurality of URL embeddings (vector embeddings computed on the sequence neighborhood and content of a URL [0072]) for host portions (content of a URL [0072]) of the plurality of URLs (URL names of internet/web pages [0072]); computing vector similarities (cosine distance between vectors [0072]) between first URL embeddings corresponding to a first subset of web documents and second URL embeddings corresponding to second subset of web documents (The semantic similarity between internet/web pages may be learned using vector embeddings computed on the sequence neighborhood and content of a URL [0072]).
Ganti does not fully disclose the plurality of URLs corresponding to web documents in at least two different languages; in a source language; in a target language; identifying parallel document candidates based on the computed vector similarities, the parallel document candidates including parallel documents that are corresponding web documents in at least two different languages including the source language and the target language; and selecting a set of parallel URLs from a set of candidate parallel URLs corresponding to the identified parallel document candidates, by using a weighted bipartite matching algorithm, wherein the weighted bipartite matching algorithm comprises a competitive linking algorithm that causes matching and linking of nodes corresponding to the set of parallel URLs and that prevents linked nodes from being subsequently linked with other nodes corresponding to other candidate parallel URLs from the set of candidate parallel URLs.
Schwedes teaches the plurality of URLs corresponding to web documents in at least two different languages (The non-text components of the Web pages, e.g., hyperlinks and URLs, contain information that may be useful in clustering and classifying Web pages, especially for similar pages that contain many images but little text, are compiled in different languages, [0030]); in a source language (German document D5 [0031]); in a target language (English document D6 [0031]); based on the computed vector similarities (Fig. 3 document vectors 301-303 of the exemplary documents weighted using a TFIDF weighting technique, in the computation of the document similarities. [0025]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ganti to incorporate the teachings of Schwedes whereby the plurality of URLs correspond to web documents in at least two different languages, in a source language and in a target language, based on the computed vector similarities. By doing so, the hyperlink(s) and URL for each page can be charted into the enhanced document vector model along with text components. Schwedes [0030]
Traviesco teaches identifying parallel document candidates, the parallel document candidates including parallel documents that are corresponding web documents in at least two different languages including the source language and the target language (FIG. 3 is a block diagram illustrating an exemplary system architecture of a web site presented in two languages, according to one embodiment of the present teaching. The web site shown in FIG. 3 may be presented in a first language, such as English, and a second language, such as Spanish. [0048]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ganti in view of Schwedes to incorporate the teachings of Traviesco wherein parallel document candidates are identified, the parallel document candidates including parallel documents that are corresponding web documents in at least two different languages including the source language and the target language. By doing so, a translation server 300 may be situated apart from, and exist independently of, the web server 112. The translation server 300 may embody the main functions of the present teaching, including the provision of a web site in a secondary language, such as Spanish. The translation server 300 may provide the secondary language components of a base web site, which is provided by web server 112, without requiring integration with the base web site or re-configuring or re-engineering of the web server 112. Traviesco [0050]
Huang teaches selecting a set of parallel URLs from a set of candidate parallel URLs (the system may receive, via the GUI, a user selection to display a plurality of hyperlinks. For example, the system may receive a user input (e.g., from a button, keyboard, mouse, pen, touchscreen, or other pointing device) selecting link 211 ("See Other Sellers") in the interface display of FIG. 2B (Single Display Page (SDP)) [0052]) corresponding to the identified parallel document candidates (Fig. 2A search result page that includes one or more search results satisfying a search request along with interactive user interface elements [0013]; Fig. 2B sample Single Display Page (SDP) that includes a product and information about the product along with interactive user interface elements [0014]; Fig. 2C sample Seller Listing that includes information about sellers of the product within the marketplace along with interactive user interface elements [0015]) such as “identified parallel document candidates” by using a weighted bipartite matching algorithm (In step 340, the transaction data is used to construct a first bipartite graph between the buyers and sellers. An exemplary bipartite graph is shown in FIG. 5A. As shown in FIG. 5A, for example, the set of sellers may be represented in the bipartite graph by one or more nodes 1-4 and the set of buyers may be represented by one or more nodes 5-9. [0061]) such as “weighted bipartite matching algorithm” wherein the weighted bipartite matching algorithm comprises a competitive linking algorithm that causes matching and linking of nodes corresponding to the set of parallel URLs and that prevents linked nodes from being subsequently linked with other nodes corresponding to other candidate parallel URLs from the set of candidate parallel URLs (ABS., constructing a first bipartite graph between the one or more nodes of the first set and one or more nodes of a second set, the first set and the second set being mutually exclusive … The first-ranked hyperlink is automatically moved to a first position on the GUI and the second and subsequently ranked hyperlinks are automatically moved to second and subsequent positions, respectively, on the GUI.) (Fig. 3B (340) Construct a first bipartite graph between the buyers and sellers [0061]; (350) Weight edges of the bipartite graph [0062]) (For example, the hyperlink 225a corresponding to a first-ranked seller may be moved to a first position on the GUI (Step 380). [0073]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ganti in view of Schwedes and Traviesco to incorporate the teachings of Huang wherein a set of parallel URLs is selected from a set of candidate parallel URLs corresponding to the identified parallel document candidates by using a weighted bipartite matching algorithm, wherein the weighted bipartite matching algorithm comprises a competitive linking algorithm that causes matching and linking of nodes corresponding to the set of parallel URLs and that prevents linked nodes from being subsequently linked with other nodes corresponding to other candidate parallel URLs from the set of candidate parallel URLs.
By doing so, determining a rank includes constructing a first bipartite graph between the one or more nodes of the first set and one or more nodes of a second set, the first set and the second set being mutually exclusive … The first-ranked hyperlink is automatically moved to a first position on the GUI and the second and subsequently ranked hyperlinks are automatically moved to second and subsequent positions, respectively, on the GUI. Huang [0008]
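For illustration only, the following minimal Python sketch shows one way the competitive linking limitation recited in claim 1 can be read; it is not drawn from Ganti, Schwedes, Traviesco or Huang. The function name `competitive_link` and the example URLs and scores are hypothetical. Candidate URL pairs are visited in descending order of weight, and a pair is linked only if neither of its nodes has already been linked, so that a linked node cannot subsequently be linked with a node of another candidate pair.

```python
from typing import Dict, List, Tuple


def competitive_link(scores: Dict[Tuple[str, str], float]) -> List[Tuple[str, str]]:
    """Greedy competitive linking over a weighted bipartite graph (illustrative sketch).

    `scores` maps hypothetical (source_url, target_url) candidate pairs to a weight.
    """
    linked_sources = set()
    linked_targets = set()
    links = []
    # Highest weight first; ties are broken by the pair itself so the result is deterministic.
    for (src, tgt), _weight in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
        if src in linked_sources or tgt in linked_targets:
            continue  # an endpoint is already linked, so this candidate pair is dropped
        links.append((src, tgt))
        linked_sources.add(src)
        linked_targets.add(tgt)
    return links


scores = {
    ("example.com/en/a", "example.com/es/a"): 0.90,
    ("example.com/en/a", "example.com/es/b"): 0.80,
    ("example.com/en/b", "example.com/es/b"): 0.70,
    ("example.com/en/b", "example.com/es/a"): 0.95,
}
print(competitive_link(scores))
```

With these hypothetical scores, the pair (example.com/en/b, example.com/es/a) is linked first, which prevents example.com/es/a from later being linked with example.com/en/a even though that pair scored 0.90; example.com/en/a is instead linked with example.com/es/b.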
Regarding claim 5, Ganti in view of Schwedes, Traviesco and Huang teaches the system of claim 1.
Ganti as modified teaches based on (based on [0023]) the corresponding (corresponding [0026]) one of the second URL embeddings (vector embeddings computed on the sequence neighborhood and content of a URL [0072]).
Ganti as modified does not fully disclose wherein the plurality of URLs comprises a first subset of URLs corresponding to the first subset of web documents in the source language and a second subset of URLs corresponding to the second subset of web documents in the target language, wherein the operations further comprise: partitioning, using a clustering algorithm, the first subset of URLs into a plurality of clusters; determining a vector distance between each URL of the second subset of URLs and a centroid of two or more clusters of the plurality of clusters, based on the corresponding one of the second URL embeddings; and for each URL of the second subset of URLs, identifying a cluster among the two or more clusters that is most relevant to said URL based on the determined vector distance for said URL, and assigning said URL to the identified cluster.
Schwedes teaches wherein the plurality of URLs (URLs (Uniform Resource Locators) of Web "pages." [0027]) comprises a first subset of URLs ("mehr dazu: <a href="link.html">dort</A>" (German document D5) [0031]) corresponding to the first subset of web documents (hypertext documents [0031]) in the source language (German [0031]) and a second subset of URLs ("you find more info <a href="link.html">here</A>" (English document D4)) corresponding to the second subset of web documents (hypertext documents [0031]) in the target language (English [0031]), wherein the operations (operations [0038]) further comprise: partitioning (partitioned [0032]), using a clustering algorithm (clustering algorithm [0032]), the first subset of URLs ("mehr dazu: <a href="link.html">dort</A>" (German document D5) [0031]) into a plurality of clusters (cluster documents into different groups (block 625) [0032]); determining a vector distance (set of points in a multidimensional vector space [0017]) such as “vector distance” between each URL of the second subset of URLs ("you find more info <a href="link.html">here</A>" (English document D4)) and a centroid (documents judged to be similar due to their cosine or cartesian distance [0017]) such as “centroid” of two or more clusters of the plurality of clusters (Fig. 2, (201) D1, (202) D2 and (203) D3 cluster of documents [0018]); and for each URL of the second subset of URLs ("you find more info <a href="link.html">here</A>" (English document D4) [0031]), identifying (identifies [0032]) a cluster among the two or more clusters (any one of Fig. 2, (201) D1, (202) D2 and (203) D3 cluster of documents [0018]) that is most relevant (relevant [0014]) to said URL ("you find more info <a href="link.html">here</A>" (English document D4)) based on (based on [0014], [0029]) the determined vector distance (set of points in a multidimensional vector space [0017]) such as “vector distance” for said URL ("you find more info <a href="link.html">here</A>" (English document D4)), and assigning (grouped together [0017]) such as “assign” said URL ("you find more info <a href="link.html">here</A>" (English document D4)) to the identified cluster (any one of Fig. 2, (201) D1, (202) D2 and (203) D3 cluster of documents [0018]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ganti in view of Traviesco and Huang to incorporate the teachings of Schwedes wherein the plurality of URLs comprises a first subset of URLs corresponding to the first subset of web documents in the source language and a second subset of URLs corresponding to the second subset of web documents in the target language, wherein the operations further comprise: partitioning, using a clustering algorithm, the first subset of URLs into a plurality of clusters; determining a vector distance between each URL of the second subset of URLs and a centroid of two or more clusters of the plurality of clusters; identifying a cluster among the two or more clusters that is most relevant to said URL based on the determined vector distance for said URL; and assigning said URL to the identified cluster. By doing so, the non-text components of the Web pages, e.g., hyperlinks and URLs, contain information that may be useful in clustering and classifying Web pages, especially for similar pages that contain many images but little text, are compiled in different languages, and/or include synonyms or homonyms. Schwedes [0030].
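As a hypothetical illustration of the claim 5 limitations, not taken from any reference of record, second-subset (target-language) URL embeddings can each be assigned to the cluster whose centroid is nearest. This sketch assumes Euclidean distance via `math.dist` (Python 3.8+) as the "vector distance"; a cosine distance would be an equally reasonable reading. The function names, cluster contents and URLs are invented for the example.

```python
import math
from typing import Dict, List, Sequence


def centroid(points: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of the embeddings in one cluster."""
    if not points:
        raise ValueError("a cluster needs at least one point")
    return [sum(p[i] for p in points) / len(points) for i in range(len(points[0]))]


def assign_to_nearest_cluster(
    target_embeddings: Dict[str, Sequence[float]],
    clusters: Dict[int, List[Sequence[float]]],
) -> Dict[str, int]:
    """Map each target-language URL to the id of the cluster with the closest centroid."""
    centroids = {cid: centroid(pts) for cid, pts in clusters.items()}
    assignment = {}
    for url, emb in target_embeddings.items():
        # math.dist returns the Euclidean distance between two equal-length points.
        assignment[url] = min(centroids, key=lambda cid: math.dist(emb, centroids[cid]))
    return assignment


clusters = {0: [[0.0, 0.0], [0.0, 2.0]], 1: [[10.0, 10.0], [10.0, 12.0]]}
targets = {"example.com/es/x": [1.0, 1.0], "example.com/es/y": [9.0, 10.0]}
print(assign_to_nearest_cluster(targets, clusters))
```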
Regarding claim 6, Ganti in view of Schwedes, Traviesco and Huang teaches the system of claim 5.
Ganti as modified does not fully disclose wherein the clustering algorithm comprises a k-means clustering algorithm.
Schwedes teaches wherein the clustering algorithm (clustering algorithm [0032]) comprises a k-means clustering algorithm (k-means [0036]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ganti in view of Traviesco and Huang to incorporate the teachings of Schwedes wherein the clustering algorithm comprises a k-means clustering algorithm. By doing so, the transparent integration of the additional document non-text components makes the enhanced document vector model compatible with clustering algorithms typically used with "text only" document vector models without modification. Schwedes [0036]
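For context on the k-means limitation of claim 6, the following is a minimal sketch of Lloyd's k-means iteration, offered as an illustration only and not as the implementation described by Schwedes. It assumes Euclidean distance and, for determinism, initial centroids taken from the first k points; practical implementations typically use random or k-means++ initialization. The input points are hypothetical two-dimensional stand-ins for URL embeddings.

```python
import math
from typing import List, Sequence, Tuple


def kmeans(
    points: List[Sequence[float]], k: int, max_iter: int = 100
) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's k-means; returns (centroids, labels)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic initialization for illustration: the first k points.
    centroids = [list(map(float, p)) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: label each point with its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


centroids, labels = kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)
print(centroids, labels)
```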
Regarding claim 7, Ganti in view of Schwedes, Traviesco and Huang teaches the system of claim 5.
Ganti as modified further teaches the plurality of URL embeddings (vector embeddings computed on the sequence neighborhood and content of a URL [0072]).
Ganti as modified does not fully disclose wherein computing the vector similarities is performed based on cosine similarity calculations.
Schwedes teaches wherein computing the vector similarities is performed based on cosine similarity calculations (Similarity between two documents typically is measured by the cosine of the angle between their vectors, [0017])
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ganti in view of Traviesco and Huang to incorporate the teachings of Schwedes wherein computing the vector similarities is performed based on cosine similarity calculations. By doing so, documents judged to be similar by this measure are grouped together by the clustering algorithm used by the enhanced document vector module 135. Schwedes [0017]
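The cosine similarity calculation recited in claim 7 can be illustrated with the following minimal sketch, which is not drawn from Ganti or Schwedes. The toy two-dimensional vectors stand in for URL embeddings, which in practice would be produced by a trained model; `similarity_matrix` is a hypothetical helper that scores every first-subset embedding against every second-subset embedding.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 means same direction)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def similarity_matrix(
    first: List[Sequence[float]], second: List[Sequence[float]]
) -> List[List[float]]:
    """Row i, column j holds the similarity of first[i] and second[j]."""
    return [[cosine_similarity(f, s) for s in second] for f in first]


matrix = similarity_matrix([[1.0, 0.0], [1.0, 2.0]], [[0.0, 1.0], [2.0, 4.0]])
print(matrix)
```

Orthogonal vectors score 0.0 and parallel vectors score 1.0 (up to floating-point rounding), which is why the measure is insensitive to embedding magnitude.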
Claims 2 and 3 are rejected under 35 U.S.C. 103 as being unpatentable over Ganti et al. (United States Patent Publication Number 2020/0273052), hereinafter Ganti, in view of Holger Schwedes et al. (United States Patent Publication Number 2003/0018617), hereinafter Schwedes, in view of Traviesco (United States Patent Publication Number 2016/0357733), hereinafter Traviesco, and Xiaojun Huang (United States Patent Publication Number 2020/0311779), hereinafter Huang, and in further view of Fan et al. (United States Patent Number 11153293), hereinafter Fan.
Regarding claim 2, Ganti in view of Schwedes, Traviesco and Huang teaches the system of claim 1.
Ganti as modified does not fully disclose wherein the plurality of URLs are URLs that have been classified into URLs for a single network domain from a collection of URLs, for a plurality of network domains, that is contained in a metadata index of a pre-collected archive of web documents.
Fan teaches wherein the plurality of URLs (hyperlinks Col 16 ln 20-25) such as “plurality of URLs” are URLs (hyperlinks Col 16 ln 20-25) such as “URLs” that have been classified (identity linking policies Col 16 ln 20-25) such as “classified” into URLs (hyperlinks Col 16 ln 20-25) for a single network domain (specific domain Col 16 ln 30-35) from a collection of URLs (external application hyperlinks Col 16 ln 20-25), for a plurality of network domains (a list of domains which are allowed to display link previews within the group-based communication system and domains for which authentication information can be shared Col 16 ln 25-30), that is contained in a metadata index (list Col 8 ln 55-60) such as “metadata index” of a pre-collected archive (list of known preview domains Col 8 ln 10-15) such as “pre-collected archive” of web documents (web resources Col 8 ln 55-60).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ganti in view of Schwedes, Traviesco and Huang to incorporate the teachings of Fan wherein the plurality of URLs are URLs that have been classified into URLs for a single network domain from a collection of URLs, for a plurality of network domains, that is contained in a metadata index of a pre-collected archive of web documents. By doing so, a single developer may employ a plurality of different domains, and each of the domains may be defined by a single domain name or a wildcard domain name associated with a plurality of subdomains. Fan Col 17 ln 8-12
Regarding claim 3, Ganti in view of Schwedes, Traviesco, Huang and Fan teaches the system of claim 2.
Ganti as modified further teaches identifying (identifying [0069]) the first URL embeddings (neural network embeddings for URL names of different web pages [0072]) such as “first URL embeddings” corresponding (corresponding [0055]) to a first subset of web documents (web pages for Acme product 2.0 [0072]); and identifying the second URL embeddings (neural network embeddings for URL names of different web pages [0072]) such as “second URL embeddings” corresponding (corresponding [0055]) to second subset of web documents (web pages for Acme product 2.2 [0072]).
Ganti as modified does not fully disclose wherein the metadata index includes a language identification token for a URL of at least a subset of the web documents in the pre-collected archive, wherein the operations further comprise: based on the language identification token,
Schwedes teaches in a source language (German document D5 [0031]); in a target language (English document D6 [0031]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ganti in view of Traviesco, Huang and Fan to incorporate the teachings of Schwedes with respect to documents in a source language and in a target language. By doing so, the non-text components for the enhanced document vectors may provide information for determining the similarity between documents that text components may not supply, especially for documents containing many images but little text, which are compiled in different languages, or use synonyms and/or homonyms. Schwedes [0006].
Fan teaches wherein the metadata index (list Col 8 ln 55-60) such as “metadata index” includes a language identification token (a token Col 14 ln 18) for a URL (hyperlink Col 14 ln 49) of at least a subset of the web documents (web resource Col 14 ln 63) in the pre-collected archive (list of known preview domains Col 8 ln 10-15) such as “pre-collected archive”, wherein the operations (operations Col 6 ln 32) further comprise: based on (based on Col 8 ln 9) the language identification token (a token Col 14 ln 18).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ganti in view of Traviesco, Huang and Schwedes to incorporate the teachings of Fan wherein the metadata index includes a language identification token for a URL of at least a subset of the web documents in the pre-collected archive, wherein the operations further comprise: based on the language identification token. By doing so, the group-based communication system interface 300 may determine that a hyperlink has been added within the message composer 306 and, in response to detecting the hyperlink 308, may display a preview notification 310 indicating to the user that a preview associated with the hyperlink 308 is available. Fan Col 8 ln 3-8
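As a hypothetical illustration of the claim 3 limitation, a metadata index that records a language identification token per URL can be used to split URLs into source-language and target-language subsets before their embeddings are compared. The index layout (a list of (url, token) pairs), the function name and the token values "en" and "es" are assumptions for this sketch only and are not taken from Fan or the Specification.

```python
from typing import List, Tuple


def split_by_language(
    index: List[Tuple[str, str]], source_lang: str, target_lang: str
) -> Tuple[List[str], List[str]]:
    """Split (url, language_token) index entries into source- and target-language URL lists."""
    source_urls = [url for url, token in index if token == source_lang]
    target_urls = [url for url, token in index if token == target_lang]
    # URLs tagged with any other language are simply left out of both subsets.
    return source_urls, target_urls


index = [
    ("example.com/en/a", "en"),
    ("example.com/es/a", "es"),
    ("example.com/fr/a", "fr"),
    ("example.com/en/b", "en"),
]
print(split_by_language(index, "en", "es"))
```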
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Ganti et al. (United States Patent Publication Number 2020/0273052), hereinafter Ganti, in view of Holger Schwedes et al. (United States Patent Publication Number 2003/0018617), hereinafter Schwedes, in view of Traviesco (United States Patent Publication Number 2016/0357733), hereinafter Traviesco, and Xiaojun Huang (United States Patent Publication Number 2020/0311779), hereinafter Huang, in view of Fan et al. (United States Patent Number 11153293), hereinafter Fan, and in further view of Lee (United States Patent Publication Number 2024/0354504), hereinafter Lee.
Regarding claim 4, Ganti in view of Schwedes, Traviesco, Huang and Fan teaches the system of claim 2.
Ganti as modified further teaches wherein computing vector similarities comprises computing vector similarities between the first URL embeddings corresponding to web documents and each of the two or more second URL embeddings each corresponding to web documents (The semantic similarity between internet/web pages may be learned using vector embeddings computed on the sequence neighborhood and content of a URL [0072]).
Ganti as modified does not fully disclose wherein each URL is further embedded with a language identification token indicating to which of three or more languages the web documents associated with said URL corresponds, wherein the second URL embeddings comprise two or more second URL embeddings correspond to web documents in two or more target languages, in the source language; in one of the two or more target languages.
Lee teaches wherein each URL (Fig. 2, (210), (218) various websites [0016]) such as “URL” is further embedded (text embeddings [0026]) with a language identification token (numbered tokens [0023], [0026], [0028]) indicating to which of three or more languages (written in Hebrew, Arabic, Chinese, … Japanese, etc. [0032]) the web documents (Fig. 2, webpages [0016]) associated with said URL (Fig. 2, (210), (218) various websites [0016]) such as “URL” corresponds, wherein the second URL embeddings comprise two or more second URL embeddings (ABS., “supertoken” embeddings; embeddings [0030]) correspond to web documents (Fig. 2, webpages [0016]) in two or more target languages (written in Hebrew, Arabic, Chinese, … Japanese, etc. [0032]), in the source language (any one of Hebrew, Arabic, Chinese, … Japanese, etc. [0032]); in one of the two or more target languages (any one or two of Hebrew, Arabic, Chinese, … Japanese, etc. [0032]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ganti in view of Schwedes, Traviesco, Huang and Fan to incorporate the teachings of Lee wherein each URL is further embedded with a language identification token indicating to which of three or more languages the web documents associated with said URL corresponds, wherein the second URL embeddings comprise two or more second URL embeddings correspond to web documents in two or more target languages, in the source language; in one of the two or more target languages.
Claims 8 and 10-12 are rejected under 35 U.S.C. 103 as being unpatentable over Ganti et al. (United States Patent Publication Number 2020/0273052), hereinafter Ganti, in view of Holger Schwedes et al. (United States Patent Publication Number 2003/0018617), hereinafter Schwedes, in view of Traviesco (United States Patent Publication Number 2016/0357733), hereinafter Traviesco, and Xiaojun Huang (United States Patent Publication Number 2020/0311779), hereinafter Huang, and in further view of Ravikumar et al. (United States Patent Publication Number 2010/0049709), hereinafter Ravikumar.
Regarding claim 8, Ganti in view of Schwedes, Traviesco and Huang teaches the system of claim 1.
Ganti as modified does not fully disclose wherein the operations further comprise: selecting a set of parallel URLs from a set of candidate parallel URLs corresponding to the identified parallel document candidates, by using a weighted bipartite matching algorithm; and filtering out other candidate parallel URLs among the set of candidate parallel URLs.
Ravikumar teaches filtering out (by taking the top three most common text instances of each source from the context I(b) [0056]) other candidate parallel URLs (quicklink URLs that had unusable titles [0055]) among the set of candidate parallel URLs (quicklink URLs of a set of 4000 of the most accessed websites were selected [0054]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ganti in view of Schwedes, Traviesco and Huang to incorporate the teachings of Ravikumar wherein the operations further comprise filtering out other candidate parallel URLs among the set of candidate parallel URLs. By doing so, the URLs labeled in this dataset are biased towards frequently navigated web pages within the website. Ravikumar [0055]
Regarding claim 10, Ganti in view of Schwedes, Traviesco, Huang and Ravikumar teaches the system of claim 8.
Ganti as modified does not fully disclose wherein the competitive linking algorithm uses weights that represent a quality of the set of parallel URLs.
Ravikumar teaches wherein the competitive linking algorithm (a Ranking SVM (Support Vector Machine) method [0031]) such as “competitive linking algorithm” uses weights (default source weights [0068]) that represent a quality (produce more accurate models [0068]) of the set of parallel URLs (quicklink URLs of a set of 1430 [0054])
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ganti in view of Schwedes, Traviesco and Huang to incorporate the teachings of Ravikumar wherein the competitive linking algorithm uses weights that represent a quality of the set of parallel URLs. By doing so, as more data becomes available, the full formulation learns source specific weights and starts performing better. Ravikumar [0068]
Regarding claim 11, Ganti in view of Schwedes, Traviesco, Huang and Ravikumar teaches the system of claim 10.
Ganti as modified does not fully disclose wherein the weights are calculated as a margin score based on a highest scoring URL pair and a plurality of other high-scoring URL pairs.
Ravikumar teaches wherein the weights (specific weights [0031]) are calculated as a margin score based on a highest scoring (highest ranked candidate [0032]) such as “highest scoring” URL pair (Associated with every pair (w, s) of a web page w and a source s ∈ S is a (possibly empty) set I(w, s) = {(t1, x1), ..., (tm, xm)}, where each tuple (ti, xi) represents a text instance with the corresponding weight [0032]) and a plurality of other high-scoring (and a candidate link title set Ic(w) [0032]) such as “plurality of other high-scoring” URL pairs (Associated with every pair (w, s) of a web page w and a source s ∈ S is a (possibly empty) set I(w, s) = {(t1, x1), ..., (tm, xm)}, where each tuple (ti, xi) represents a text instance with the corresponding weight [0032]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ganti in view of Schwedes, Traviesco and Huang to incorporate the teachings of Ravikumar wherein the weights are calculated as a margin score based on a highest scoring URL pair and a plurality of other high-scoring URL pairs. By doing so, the relevance of each source is assessed and accounted for while selecting the best link title. Ravikumar [0030]
Regarding claim 12 Ganti in view of Schwedes, Traviesco and Huang teaches the system of claim 1,
Ganti as modified does not fully disclose wherein the operations further comprise: extracting at least one of parallel document text or parallel sentences from web documents corresponding to the selected set of parallel URLs; and training a translation model of a machine language translation system among two or more languages using the extracted at least one of parallel document text or parallel sentences.
Schwedes teaches among two or more languages
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ganti in view of Traviesco and Huang to incorporate the teachings of Schwedes wherein among two or more languages. By doing so compilation may occur in different languages. Schwedes [0006]
Ravikumar teaches wherein the operations (operations [0018]) further comprise: extracting (extract [0032]) at least one of parallel document text or parallel sentences (pick important sentences and phrases, [0027]) from web documents (web page as the source [0027]) corresponding to (corresponding to [0035]) the selected set of parallel URLs; (quicklink URLs of a set of 1430 [0054]) and training (training [0031]) a translation model of a machine language translation system (based on statistical translation techniques, and uses probabilistic model-based methods [0027]) using the extracted (extracted [0033]) at least one of parallel document text or parallel sentences (important sentences and phrases, [0027])
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ganti in view of Schwedes, Traviesco and Huang to incorporate the teachings of Ravikumar wherein the operations further comprise: extracting at least one of parallel document text or parallel sentences from web documents corresponding to the selected set of parallel URLs; and training a translation model of a machine language translation system using the extracted at least one of parallel document text or parallel sentences. By doing so the method generates a plurality of candidates for the link title from the different sources. Ravikumar [0008]
Claims 13 – 15 are rejected under 35 U.S.C. 103 as being unpatentable over Ganti et al. (United States Patent Publication Number 20200273052), in view of Holger Schwedes et al. (United States Patent Publication Number 20030018617), hereinafter referred to as Schwedes, in view of Xiaojun Huang (United States Patent Publication Number 20200311779), hereinafter Huang, in view of Fan et al. (United States Patent Number 11153293), hereinafter Fan, and in further view of Ravikumar et al. (United States Patent Publication Number 20100049709), hereinafter Ravikumar.
Regarding claim 13 Ganti teaches a computer-implemented method (Fig. 9 method [0013]) for implementing uniform resource locator ("URL") embeddings for aligning parallel documents that are corresponding web pages in different languages, the computer-implemented method (Fig. 9 method [0013]) comprising: calculating, (calculations may be performed [0065]) using an artificial intelligence ("AI") model, (artificial neural networks [0076]) URL embeddings for each URL (vector embeddings computed on the sequence neighborhood and content of a URL [0072]) among a plurality of URLs (URL names of similar internet/web pages [0072]) by the URL embeddings of the one or more URLs (vector embeddings computed on the sequence neighborhood and content of a URL [0072]) analyzing the URL embeddings for the one or more URLs (neural network embedding for fuzzy matching, the URL names of similar internet/web pages [0072]) that have been assigned to the cluster; (identified and grouped into a first group. [0081]) being associated with parallel documents (discriminatory sequence patterns that are similar to each other [0092]) such as “parallel document”
Ganti does not fully disclose that have been classified into one domain among a plurality of domains, the plurality of URLs comprising a first subset of URLs corresponding to a first subset of web documents in a source language and a second subset of URLs corresponding to a second subset of web documents in a target language; partitioning, using a clustering algorithm, the first subset of URLs into a plurality of clusters; assigning one or more URLs among the second subset of URLs into a cluster of the plurality of clusters, based on closeness of the points represented to a centroid of the cluster; that are corresponding web documents in at least two different languages; identifying a set of candidate parallel URLs by selecting a set of parallel URLs from the identified set of candidate parallel URLs, by using a weighted bipartite matching algorithm; the set of parallel URLs extracting at least one of document text or parallel sentences from web documents corresponding to the parallel URLs; and training a machine translation model with the extracted at least one of document text or parallel sentences; wherein the weighted bipartite matching algorithm comprises a competitive linking algorithm that causes matching and linking of nodes corresponding to the set of parallel URLs and that prevents linked nodes from being subsequently linked with other nodes corresponding to other candidate parallel URLs from the set of candidate parallel URLs.
Schwedes teaches the plurality of URLs (URLs (Uniform Resource Locators) of Web "pages." [0027]) comprising a first subset of URLs ("mehr dazu: <a href="link.html">dort</A>" (German document D5); [0031]) corresponding to (expressed in [0031]) such as “corresponding to” a first subset of web documents (hypertext documents [0031]) in a source language (German [0031]) and a second subset of URLs ("you find more info <a href="link.html">here</A>" (English document D4);) corresponding to (expressed in [0031]) such as “corresponding to” a second subset of web documents (hypertext documents [0031]) in a target language; (English [0031]) partitioning, (partitioned [0032]) using a clustering algorithm, (clustering algorithm [0032]) the first subset of URLs ("mehr dazu: <a href="link.html">dort</A>" (German document D5); [0031]) into a plurality of clusters; (cluster documents into different groups (block 625). [0032]) assigning (grouped together [0017]) such as “assign” one or more URLs ("you find more info <a href="link.html">here</A>" (English document D4); [0031]) ("mehr dazu: <a href="link.html">dort</A>" (German document D5); [0031]) among the second subset of URLs ("you find more info <a href="link.html">here</A>" (English document D4); [0031]) into a cluster of the plurality of clusters, (cluster documents into different groups (block 625). [0032]) based on closeness of the points represented (represented as a set of points in a multi-dimensional vector space [0017]) to a centroid of the cluster; (documents judged to be similar due to their cosine or cartesian distance [0017]) that are corresponding web documents in at least two different languages; (The non-text components of the Web pages, e.g., hyperlinks and URLs, contain information that may be useful in clustering and classifying Web pages, especially for similar pages that contain many images but little text, are compiled in different languages, [0030])
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ganti to incorporate the teachings of Schwedes wherein the plurality of URLs comprising a first subset of URLs corresponding to a first subset of web documents in a source language and a second subset of URLs corresponding to a second subset of web documents in a target language; partitioning, using a clustering algorithm, the first subset of URLs into a plurality of clusters; assigning one or more URLs among the second subset of URLs into a cluster of the plurality of clusters, based on closeness of the points represented to a centroid of the cluster; that are corresponding web documents in at least two different languages. By doing so the non-text components of the Web pages, e.g., hyperlinks and URLs, contain information that may be useful in clustering and classifying Web pages, especially for similar pages that contain many images but little text, are compiled in different languages, and/or include synonyms or homonyms. Schwedes [0030].
Huang teaches by using a weighted bipartite matching algorithm (ABS., first bipartite graph are weighted according to a first criterium) (In Step 360, a weighted HITS algorithm is performed on the aggregated bipartite graph (FIG. 5D) [0071]) wherein the weighted bipartite matching algorithm comprises a competitive linking algorithm that causes matching and linking of nodes corresponding to the set of parallel URLs and that prevents linked nodes from being subsequently linked with other nodes corresponding to other candidate parallel URLs from the set of candidate parallel URLs (ABS., constructing a first bipartite graph between the one or more nodes of the first set and one or more nodes of a second set, the first set and the second set being mutually exclusive … The first-ranked hyperlink is automatically moved to a first position on the GUI and the second and subsequently ranked hyperlinks are automatically moved to second and subsequent positions, respectively, on the GUI.) (Fig. 3B (340) Construct a first bipartite graph between the buyers and sellers [0061], (350) Weight edges of the bipartite graph [0062]) (For example, the hyperlink 225a corresponding to a first-ranked seller may be moved to a first position on the GUI (Step 380). [0073]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ganti in view of Schwedes and Traviesco to incorporate the teachings of Huang wherein, by using a weighted bipartite matching algorithm, the weighted bipartite matching algorithm comprises a competitive linking algorithm that causes matching and linking of nodes corresponding to the set of parallel URLs and that prevents linked nodes from being subsequently linked with other nodes corresponding to other candidate parallel URLs from the set of candidate parallel URLs.
By doing so, determining a rank includes constructing a first bipartite graph between the one or more nodes of the first set and one or more nodes of a second set, the first set and the second set being mutually exclusive … The first-ranked hyperlink is automatically moved to a first position on the GUI and the second and subsequently ranked hyperlinks are automatically moved to second and subsequent positions, respectively, on the GUI. Huang [0008]
Fan teaches that have been classified (identity linking policies Col 16 ln 20 – 25) such as “classified” into one domain (specific domain Col 16 ln 30 – 35) among a plurality of domains, (a list of domains which are allowed to display link previews within the group-based communication system and domains for which authentication information can be shared Col 16 ln 25 - 30)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ganti in view of Schwedes and Huang to incorporate the teachings of Fan wherein that have been classified into one domain among a plurality of domains. By doing so the group-based communication system interface 300 may determine that a hyperlink has been added within the message composer 306 and in response to detecting the hyperlink 308, may display a preview notification 310 indicating to the user that a preview associated with the hyperlink 308 is available. Fan Col 8 ln 3 - 8
Ravikumar teaches identifying (identify [0022]) a set of candidate parallel URLs (quicklink URLs of a set of 4000 of the most accessed websites were selected [0054]) by selecting (selecting [0030]) a set of parallel URLs (quicklink URLs of a set of 1430 [0054]) from the identified set of candidate parallel URLs, (quicklink URLs of a set of 4000 of the most accessed websites were selected [0054]) the set of parallel URLs (quicklink URLs of a set of 1430 [0054]) extracting (extract [0032]) at least one of document text or parallel sentences (pick important sentences and phrases, [0027]) from web documents (web page as the source [0027]) corresponding to (corresponding to [0035]) the parallel URLs; (quicklink URLs of a set of 1430 [0054]) and training (training [0031]) a machine translation model (based on statistical translation techniques, and uses probabilistic model-based methods [0027]) with the extracted (extract [0032]) at least one of document text or parallel sentences. (important sentences and phrases, [0027])
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ganti in view of Schwedes, Huang and Fan to incorporate the teachings of Ravikumar wherein identifying a set of candidate parallel URLs by selecting a set of parallel URLs from the identified set of candidate parallel URLs, the set of parallel URLs extracting at least one of document text or parallel sentences from web documents corresponding to the parallel URLs; and training a machine translation model with the extracted at least one of document text or parallel sentences. By doing so the relevance of each source is assessed and accounted for while selecting the best link title. Ravikumar [0030]
Regarding claim 14 Ganti in view of Schwedes, Huang, Fan and Ravikumar teaches the computer-implemented method of claim 13,
Ganti as modified further teaches wherein: the AI model comprises an encoder model of a pre-trained neural machine translation ("NMT") model; (artificial neural networks, [0076]) such as “encoder model of a pre-trained neural machine translation ("NMT") model”
Ganti as modified does not fully disclose the plurality of URLs are URLs that have been classified into URLs for a single network domain from a collection of URLs, for a plurality of network domains, that is contained in a metadata index of a pre-collected archive of web documents; the metadata index includes a language identification token for a URL of at least a subset of the web documents in the pre-collected archive, and identifying the set of candidate parallel URLs comprises identifying based on the language identification token for the at least a subset of the web documents; and the clustering algorithm comprises k-means clustering algorithm.
Schwedes teaches and the clustering algorithm (clustering algorithm [0032]) comprises k-means clustering algorithm (k-means [0036])
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ganti in view of Huang, Fan and Ravikumar to incorporate the teachings of Schwedes wherein and the clustering algorithm comprises k-means clustering algorithm. By doing so the non-text components of the Web pages, e.g., hyperlinks and URLs, contain information that may be useful in clustering and classifying Web pages, especially for similar pages that contain many images but little text, are compiled in different languages, and/or include synonyms or homonyms. Schwedes [0030].
Fan teaches the plurality of URLs (hyperlinks Col 16 ln 20 – 25) such as “plurality of URLs” are URLs (hyperlinks Col 16 ln 20 – 25) such as “plurality of URLs” that have been classified (identity linking policies Col 16 ln 20 – 25) such as “classified” into URLs (hyperlinks Col 16 ln 20 – 25) for a single network domain (specific domain Col 16 ln 30 – 35) from a collection of URLs, (external application hyperlinks Col 16 ln 20 - 25) for a plurality of network domains, (a list of domains which are allowed to display link previews within the group-based communication system and domains for which authentication information can be shared Col 16 ln 25 - 30) that is contained in a metadata index (list Col 8 ln 55 – 60) such as “metadata index” of a pre-collected archive (list of known preview domains Col 8 ln 10 – 15) such as “pre-collected archive” of web documents; (web resources Col 8 ln 55 – 60) the metadata index (list Col 8 ln 55 – 60) such as “metadata index” includes a language identification token (a token Col 14 ln 18) for a URL (hyperlinks Col 16 ln 20 – 25) such as “plurality of URLs” of at least a subset of the web documents (web resources Col 8 ln 55 – 60) in the pre-collected archive, (list of known preview domains Col 8 ln 10 – 15) such as “pre-collected archive” comprises identifying (identifying Col 14 ln 11) based on (based on Col 14 ln 24) the language identification token (a token Col 14 ln 18) for the at least a subset of the web documents; (web resources Col 8 ln 55 – 60)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ganti in view of Schwedes, Huang and Ravikumar to incorporate the teachings of Fan wherein the plurality of URLs are URLs that have been classified into URLs for a single network domain from a collection of URLs, for a plurality of network domains, that is contained in a metadata index of a pre-collected archive of web documents; the metadata index includes a language identification token for a URL of at least a subset of the web documents in the pre-collected archive, comprises identifying based on the language identification token for the at least a subset of the web documents. By doing so the group-based communication system interface 300 may determine that a hyperlink has been added within the message composer 306 and in response to detecting the hyperlink 308, may display a preview notification 310 indicating to the user that a preview associated with the hyperlink 308 is available. Fan Col 8 ln 3 – 8
Ravikumar teaches and identifying the set of candidate parallel URLs (quicklink URLs of a set of 4000 of the most accessed websites were selected [0054])
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ganti in view of Schwedes, Huang and Fan to incorporate the teachings of Ravikumar wherein and identifying the set of candidate parallel URLs. By doing so the relevance of each source is assessed and accounted for while selecting the best link title. Ravikumar [0030]
Regarding claim 15 Ganti in view of Schwedes, Huang, Fan and Ravikumar teaches the computer-implemented method of claim 13,
Ganti as modified does not fully disclose further comprising: applying a linking algorithm to the identified set of candidate parallel URLs to apply an estimated relevance weight value to each URL in the identified set of candidate parallel URLs; and filtering out least relevant URLs from the identified set of candidate parallel URLs based on the estimated relevance weight value that is applied to each URL; wherein selecting the set of parallel URLs comprises selecting remaining URLs after filtering.
Ravikumar teaches applying a linking algorithm (a Ranking SVM (Support Vector Machine) method [0031]) such as “linking algorithm” to the identified set of candidate parallel URLs (quicklink URLs of a set of 4000 of the most accessed websites were selected [0054]) to apply an estimated relevance weight value (weights [0031]) to each URL (quicklink URL [0054]) in the identified set of candidate parallel URLs; (quicklink URLs of a set of 4000 of the most accessed websites were selected [0054]) and filtering out (for each website a quicklink selection algorithm picked salient URLs that people often select as navigation destinations. [0054]) least relevant URLs (About 17 percent of these web pages had unusable titles, and were thrown out [0055]) such as “filtering” from the identified set of candidate parallel URLs (quicklink URLs of a set of 4000 of the most accessed websites were selected [0054]) based on (based on [0038]) the estimated relevance weight value (weights [0031]) that is applied to (applied [0029]) each URL; (quicklink URL [0054]) wherein selecting (selecting [0030]) the set of parallel URLs (quicklink URLs of a set of 4000 of the most accessed websites were selected [0054]) comprises selecting (selecting [0030]) remaining URLs (quicklink URLs of a set of 1430 [0054]) after filtering (About 17 percent of these web pages had unusable titles, and were thrown out [0055]) such as “filtering”
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ganti in view of Schwedes, Huang and Fan to incorporate the teachings of Ravikumar wherein applying a linking algorithm to the identified set of candidate parallel URLs to apply an estimated relevance weight value to each URL in the identified set of candidate parallel URLs; and filtering out least relevant URLs from the identified set of candidate parallel URLs based on the estimated relevance weight value that is applied to each URL; wherein selecting the set of parallel URLs comprises selecting remaining URLs after filtering. By doing so these URLs were then shown to three human judges who manually constructed titles that suitably addressed the content of the URLs in the context of the homepage. In this manner, 2,187 unique titles were constructed for 1,430 URLs. Ravikumar [0054]
Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Ganti et al. (United States Patent Publication Number 20200273052), in view of Lee et al. (United States Patent Publication Number 2024/0354504), hereinafter Lee, in view of Traviesco (United States Patent Publication Number 2016/0357733), hereinafter Traviesco, and in further view of Holger Schwedes et al. (United States Patent Publication Number 20030018617), hereinafter referred to as Schwedes.
Regarding claim 16 Ganti teaches a system, (Fig. 1, (12) system [0048]) comprising: a processing system; (Fig. 1, (16) processing unit [0048]) and memory (Fig. 1, (28) system memory [0048]) coupled to (couples [0048]) the processing system, (Fig. 1, (16) processing unit [0048]) the memory (Fig. 1, (28) system memory [0048]) comprising computer executable instructions (computer system-executable instructions, such as program modules, [0047]) that, when executed (being executed [0047]) by the processing system, (Fig. 1, (16) processing unit [0048]) causes the system (Fig. 1, (12) system [0048]) to perform operations (operations [0061]) comprising:
training, (training algorithm [0080]) using one or more sets of parallel uniform resource locators ("URLs") (URL names of similar internet/web pages [0072]) such as “sets of parallel uniform resource locators ("URLs")” as training data, (two sets of data (e.g., “D1” and “D2”) [0080]) such as “training data” an artificial intelligence ("AI") model (artificial neural networks, [0076]) to calculate (neural network embedding for fuzzy matching, the URL names [0072]) URL embeddings (vector embeddings computed on the sequence neighborhood and content of a URL. [0072]) for each URL, (URL names [0072]) wherein URL embeddings (vector embeddings computed on the sequence neighborhood and content of a URL. [0072]) of parallel URLs (Satisfied/successful customers paths (e.g., happy customer paths) [0072]) such as “parallel URLs” are closer in value compared with URL embeddings (vector embeddings computed on the sequence neighborhood and content of a URL. [0072]) of non-parallel URLs; (unsatisfied/unsuccessful paths (e.g., unhappy customer paths) [0072]) such as “non-parallel URLs” applying a loss function (A distance algorithm (e.g., DTW) may be applied [0072]) to the calculated URL embeddings (vector embeddings computed on the sequence neighborhood and content of a URL. [0072]) for each URL (a URL [0072]) among the plurality of URLs (URL names [0072]) to cause similar (discriminatory pattern identified for (satisfied/successful customers) [0023]) such as “similar” URL embeddings (vector embeddings computed on the sequence neighborhood and content of a URL. [0072]) to converge (Each of the discriminatory patterns may be grouped similar to each other [0023]) while causing dissimilar (discriminatory pattern identified for (unsatisfied/unsuccessful customers). [0023]) such as “dissimilar” URL embeddings (vector embeddings computed on the sequence neighborhood and content of a URL. [0072]) to diverge; (Each of the discriminatory patterns may be grouped similar to each other [0023]) either for a set number of cycles (steps are repeated [0084]) or until the determined level of effectiveness exceeds a threshold level of effectiveness; and augmenting the training data (two sets of data (e.g., “D1” and “D2”) [0080]) such as “training data” with the constructed one or more sets of synthetic parallel URLs (URL names [0072]) such as “one or more sets of synthetic parallel URLs”.
Ganti does not fully disclose constructing one or more sets of synthetic parallel URLs, determining a level of effectiveness of the AI model in identifying parallel URLs that correspond to web documents in at least two different languages; and repeating the processes of training the AI model to calculate URL embeddings for each URL, applying the loss function, and determining the level of effectiveness of the AI model; constructing one or more sets of synthetic parallel URLs, wherein each set of synthetic parallel URLs comprises a first synthetic URL that is a pseudo-URL constructed from a first sentence in a first language and a second synthetic URL that is a pseudo-URL constructed from a second sentence in a second language,
Lee teaches determining a level of effectiveness (the potential for the model to reach false conclusions based on improperly serialized tokens may be effectively and efficiently managed by adjusting local attention scores based on differences between predicted and actual values of the order and distance between the attender and attendee tokens [0059]) of the AI model (ABS., sequence model) (Fig. 3 structure-aware sequence model [0009], [0015], [0021], [0067], [0074]) in identifying parallel URLs (various websites [0016]) that correspond to web documents (web pages [0016]) in at least two different languages; (Chinese, Japanese, Hebrew, Arabic, [0032]) and repeating the processes of training the AI model (ABS., sequence model) (Fig. 3 structure-aware sequence model [0009], [0015], [0021], [0067], [0074]) to calculate URL embeddings for each URL, (Fig. 2, (210) website 1; (218) website 2 [0016]) applying the loss function, (similar order- and distance-based adjustment paradigm [0054]) such as “loss function” and determining the level of effectiveness (the potential for the model to reach false conclusions based on improperly serialized tokens may be effectively and efficiently managed by adjusting local attention scores based on differences between predicted and actual values of the order and distance between the attender and attendee tokens [0059]) of the AI model (ABS., sequence model) (Fig. 3 structure-aware sequence model [0009], [0015], [0021], [0067], [0074])
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ganti to incorporate the teachings of Lee wherein determining a level of effectiveness of the AI model in identifying parallel URLs that correspond to web documents in at least two different languages; and repeating the processes of training the AI model to calculate URL embeddings for each URL, applying the loss function, and determining the level of effectiveness of the AI model. By doing so supertokens may be used to improve transformers that are not otherwise configured to modify local attention scores based on the order and distance between attender and attendee supertokens. Lee [0034]
Traviesco teaches are associated with parallel documents (The web site shown in FIG. 3 may be presented in a first language, such as English, and a second language, such as Spanish. [0048]) such as “parallel document candidates” constructing one or more sets of synthetic parallel URLs, wherein each set of synthetic parallel URLs comprises a first synthetic URL that is a pseudo-URL constructed from a first sentence in a first language and a second synthetic URL that is a pseudo-URL constructed from a second sentence in a second language, (http://www.abcwidgets.com/site/olspage.jsp?skuid=9276286&productCategoryId=abcat0101001&type=product&id=1218073534751&session=12345 [0104]) (optimized URLs in the second language are generated that map to customer web sites. For example, on a Spanish site the above ABC Widgets SONY BRAVIA 46" LCD HDTV original URL can be translated into: http://espanol.abcwidgets.com/televisiones-sony-braviaxbr-clase-46-1080p-240hz-lcd-hdtv-kdl-46xbr9/?session=12345 [0106]) the second sentence being a translation in the second language (espanol.abcwidgets.com/televisiones-sony-braviaxbr-clase-46-1080p-240hz-lcd-hdtv-kdl-46xbr9 [0106]) of the first sentence; (SONY'S product BRAVIA 46" LCD HDTV [0104])
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ganti in view of Lee to incorporate the teachings of Traviesco wherein are associated with parallel documents; constructing one or more sets of synthetic parallel URLs, wherein each set of synthetic parallel URLs comprises a first synthetic URL that is a pseudo-URL constructed from a first sentence in a first language and a second synthetic URL that is a pseudo-URL constructed from a second sentence in a second language, the second sentence being a translation in the second language of the first sentence. By doing so once the secondary language components have been established by the translation server 300, they are automatically kept synchronized with the English language components of the base web site. Traviesco [0051]
Schwedes teaches constructing one or more sets of synthetic parallel URLs, (respectively, for the following hypertext documents: "you find more info <a href="link.html">here</A>" (English document D4); "mehr dazu: <a href="link.html">dort</A>" (German document D5); [0031])
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ganti in view of Lee and Traviesco to incorporate the teachings of Schwedes wherein the operations further comprise: constructing one or more sets of synthetic parallel URLs, wherein each set of synthetic parallel URLs. By doing so the non-text components of the documents may be integrated transparently into the enhanced documents vectors, making the enhanced documents vector model compatible with clustering algorithms typically used with "text only" document vector models without modification. Schwedes [0006]
Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Ganti et al. (United States Patent Publication Number 20200273052), in view of Lee (United States Patent Publication Number 2024/0354504), hereinafter Lee, in view of Traviesco (United States Patent Publication Number 2016/0357733), hereinafter Traviesco, in view of Holger Schwedes et al. (United States Patent Publication Number 20030018617), hereinafter referred to as Schwedes, and in further view of Yih et al. (United States Patent Publication Number 20110219012), hereinafter Yih.
Regarding claim 18 Ganti in view of Lee, Traviesco and Schwedes teaches the system of claim 16,
Ganti as modified does not teach wherein the operations further comprise: finetuning the Al model by maximizing a margin score between cosine similarity values of correct URL sets and cosine similarity values of incorrect URL sets.
Yih teaches wherein the operations (tuning operations [0042]) further comprise: finetuning (tuning [0042]) the Al model (learned model 116 [0023]) by maximizing (optimize [0060]) such as “maximize” a margin score (similarity score close to the lael [0055]) such as “margin score” between cosine similarity values (cosine similarity measure [0028]) of correct URL sets (URL representing the name of the document wherein a term is a substring of the URL thus having a value of 1; Title, where the term is a part of the Title representing a value of 1, [0021]) such as “correct URL sets” and cosine similarity values(cosine similarity measure [0028]) of incorrect URL sets. (URL representing the name of the document wherein a term is not a substring of the URL thus having a value of 0; Title, where the term is not a part of the Title representing a value of 0, [0021]) such as “incorrect URL sets”
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ganti in view of Lee, Traviesco and Schwedes to incorporate the teachings of Yih wherein the operations further comprise: finetuning the AI model by maximizing a margin score between cosine similarity values of correct URL sets and cosine similarity values of incorrect URL sets. By doing so, when processed through the learned model 116, results 122 such as a similarity score or ranking against other feature vectors representing other objects may be used as desired. Yih [0023].
Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Ganti et al. (United States Patent Publication Number 20200273052), in view of Lee (United States Patent Publication Number 2024/0354504) hereinafter Lee, in view of Traviesco (United States Patent Publication Number 2016/0357733) hereinafter Traviesco, in view of Holger Schwedes et al. (United States Patent Publication Number 20030018617), hereinafter referred to as Schwedes, and in further view of Chen et al. (United States Patent Publication Number 2022/0129727) hereinafter Chen.
Regarding claim 19, Ganti in view of Lee, Traviesco and Schwedes teaches the system of claim 16.
Ganti as modified does not teach wherein the operations further comprise: down-sampling, in the training data, sets of parallel URLs whose URLs differ only by language identifiers, while up-sampling the training data on sets of parallel URLs whose URLs contain parallel words or parallel phrases.
Chen teaches wherein the operations (operations [0047]) further comprise: down-sampling (second training sample with a second, lower model score is given lower weighting value [0050]) such as “down-sampling” in the training data (training dataset [0049]), sets of parallel URLs whose URLs differ only by language identifiers (one or more attributes of the training dataset [0048]) such as “sets of parallel URLs whose URLs differ only by language identifiers”, while up-sampling (first training sample with a first model score is given a higher weighting value [0050]) such as “up-sampling” the training data (training dataset [0049]) on sets of parallel URLs whose URLs contain parallel words or parallel phrases (one or more attributes of the training dataset [0048]) such as “sets of parallel URLs whose URLs contain parallel words or parallel phrases”.
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ganti in view of Lee, Traviesco and Schwedes to incorporate the teachings of Chen wherein the operations further comprise: down-sampling, in the training data, sets of parallel URLs whose URLs differ only by language identifiers, while up-sampling the training data on sets of parallel URLs whose URLs contain parallel words or parallel phrases. By doing so, in training a classification model based on a training dataset with such a distribution during a second training phase, the disclosed techniques are operable to generate a classification model that is more accurate in an upper portion of the model score distribution, which may be particularly advantageous when the classification model is used to classify elements for which the decision threshold is in the upper portion of the distribution. Chen [0046].
Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Ganti et al. (United States Patent Publication Number 20200273052), in view of Lee (United States Patent Publication Number 2024/0354504) hereinafter Lee, in view of Traviesco (United States Patent Publication Number 2016/0357733) hereinafter Traviesco, in view of Holger Schwedes et al. (United States Patent Publication Number 20030018617), hereinafter referred to as Schwedes, and in further view of Kelly (United States Patent Publication Number 2024/0378247) hereinafter Kelly.
Regarding claim 20, Ganti in view of Lee, Traviesco and Schwedes teaches the system of claim 16.
Ganti as modified does not teach wherein determining the level of effectiveness of the AI comprises: comparing URL pairs corresponding to converged similar URL embeddings with corresponding ground truth URL pairs.
Kelly teaches wherein determining the level of effectiveness (maximum effectiveness [0179]) of the AI (artificial intelligence techniques [0085], [0454]) comprises: comparing (comparing [0561]) URL (hyperlink URLs [0091]) pairs (source pairs [0559]) corresponding to converged similar (content features that are similar as indicated by citation patterns with one or more other content features [0580]) URL (hyperlink URLs [0091]) embeddings (embeddings [0488]) with corresponding ground truth (ground truth of actual links [0653]) URL (hyperlink URLs [0091]) pairs (domain pairs [0559]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ganti in view of Lee, Traviesco and Schwedes to incorporate the teachings of Kelly wherein determining the level of effectiveness of the AI comprises: comparing URL pairs corresponding to converged similar URL embeddings with corresponding ground truth URL pairs. By doing so, knowledge graph embeddings may be generated using the one or more data structures, where generating the plurality of clusters may comprise performing dimension reduction on the knowledge graph embeddings to yield reduced dimension data. Kelly [0012].
Conclusion
7. Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
8. Any inquiry concerning this communication or earlier communications from the examiner should be directed to Kweku Halm whose telephone number is (469) 295-9144. The examiner can normally be reached on 9:00AM - 5:30PM Mon - Thur. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Sanjiv Shah, can be reached on (571) 272-4098. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/KWEKU WILLIAM HALM/Examiner, Art Unit 2166
/SANJIV SHAH/Supervisory Patent Examiner, Art Unit 2166