DETAILED ACTION
Receipt of Applicant’s Amendment, filed September 19, 2025, is acknowledged.
Claims 21, 24, 29, and 37 were amended.
Claims 1-20, 22-23, 25-28, 30-31, 33-36, and 38-39 were cancelled.
Claims 41-43 were newly added.
Claims 21, 24, 29, 32, 37, 40, and 41-43 are pending in this office action.
Please note that the claim listing has an apparent typo, listing claim 24 as canceled and as amended. As this has been identified as a typo, claim 24 has been interpreted as being amended as presented.
Claim Interpretation
Please note that claim 37 recites “one or more computer readable storage medium”, which has been interpreted in view of Paragraph [0130], which recites “A computer readable storage medium, as used herein, is not be construed as being transitory signals per se”. As such, the claim may not reasonably be interpreted as a signal per se.
Also note that the program instructions (in claim 37) are explicitly recited as being executed by the processor as recited in the claims (and therefore 35 U.S.C. 112(f), or pre-AIA 35 U.S.C. 112, sixth paragraph, is not invoked).
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 21, 24, 29, 32, 37, 40, and 41-43 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
With regard to claims 21, 29, and 37, claim 21 recites “[g] comparing the search query with the keyword indexes of the enhanced index table;
[h] responsive to matching a keyword in the search query with a plurality of keyword indexes of the enhanced index table, identifying a target code page type used to encode the search query;” Claims 29 and 37 appear to recite substantially similar language. There is insufficient antecedent basis for this limitation in the claim.
The recitation of “a keyword” and “a plurality of keyword indexes” lacks antecedent basis. The claim has previously recited “keywords” and “keyword indexes”. It is unclear if applicant is referring to one of these previously recited elements or attempting to define a new element. Furthermore, the distinction between the keywords of the query and the keywords of the document is not clear, as both are merely referred to as ‘keywords’.
The claims recite the comparing and the matching as two distinct steps. Yet one of ordinary skill in the art would understand the plain meaning of the comparing step and the matching step as reciting the same operation. This interpretation appears consistent with the instant specification: the determination that the index information “matches” the search query is made by comparing the index information with the search query, meaning that the comparing and the matching are in fact a single operation. The use of distinct labels to refer to the same claim element creates a lack of antecedent basis and makes it unclear how many comparisons/matches are being performed by the claimed system.
For examination purposes this claim limitation has been construed to mean --[g] comparing target keywords with the keywords, wherein the search query includes the target keywords; [h] responsive to the comparison identifying a match, identifying a target code page type used to encode the search query;--. Please note that ‘the keywords’ are recited in steps [d] and [e] as being included in the enhanced index table; this limitation need not be repeated.
Claim Objections
Claims 21, 24, 29, 32, 37, 40, and 41-43 are objected to because of the following informalities. Appropriate correction is required.
With regard to claims 21, 29 and 37, claim 21 recites “[d] the set of indexing information including keywords… [e] keyword indexes… [g] the keyword indexes”. Claims 29 and 37 appear to recite substantially similar language and are objected to based upon the same rationale.
One of ordinary skill in the art would recognize a set of indexing information including keywords as keyword indexes. The use of distinct labels for what functionally appears to be the same element raises an antecedent basis issue. It is unclear if applicant is attempting to define a new element or referring to a previously recited element. For examination purposes the keyword indexes have been construed as referring to the keywords within the set of indexing information.
With regard to claims 21, 29 and 37, claim 21 recites “[e] storing the set of indexing information, including a code page identifier for the first code page type, in an enhanced index table according to the index request, the enhanced index table including keyword indexes for a plurality of documents, including the stored set of indexing information for the document”. Claims 29 and 37 appear to recite substantially similar language and are objected to based upon the same rationale.
This claim limitation appears to recite the same requirement twice. The claim recites storing the set of indexing information (herein referred to as element 1)… in an enhanced index table (herein referred to as element 2). The claim then recites that the enhanced index table (element 2) includes the stored set of indexing information (element 1). Effectively, the claims recite storing element 1 in element 2, wherein element 2 includes element 1. Repeating the same requirement multiple times within the claims merely serves to make reading of the claims more complicated and increases the difficulty in understanding the scope of the claimed device.
It is unclear if a distinction between the limitations was intended. This limitation follows limitation [d], which recites “the set of indexing information including keywords from the document”. One of ordinary skill in the art would recognize that should the set of indexing information (element 1) include keywords (element 3), and be stored in the enhanced index table (element 2), then the enhanced index table (element 2) would thereby include said keywords (element 3) because the keywords (element 3) are part of the indexing information (element 1). It is unnecessary and problematic to repeat claim limitations, as it serves only to blur the meaning of the claims.
For examination purposes this claim limitation has been construed to mean --[d]… the set of indexing information including keywords from the document; [e] storing the set of indexing information, including a code page identifier for the first code page type, in an enhanced index table according to the index request--.
Claim interpretation for examination purposes, in view of the above objections: Taken in combination, the above two issues render the structure of the claimed device unclear. In particular, the relationship between the enhanced index table, the set of indexing information, and the keywords is not clear. For examination purposes these limitations, when taken together, have been understood to mean that --the enhanced index table includes the set of indexing information, the set of indexing information includes a code page identifier and the keywords--.
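For illustration only, the construed relationship between the three elements can be sketched as a data structure. The following Python sketch is an assumption made for clarity and is not drawn from the claims, the specification, or the cited references; the class and field names (IndexingInformation, EnhancedIndexTable, code_page_id, document_id) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IndexingInformation:
    """Set of indexing information for one document (hypothetical layout)."""
    document_id: str
    code_page_id: str      # code page identifier, e.g. "ISO-8859-1"
    keywords: List[str]    # keywords from the document


@dataclass
class EnhancedIndexTable:
    """Table that includes one set of indexing information per document."""
    entries: Dict[str, IndexingInformation] = field(default_factory=dict)

    def store(self, info: IndexingInformation) -> None:
        # Storing the set of indexing information places its keywords and
        # code page identifier in the table; nothing is stored separately.
        self.entries[info.document_id] = info

    def keywords_for(self, document_id: str) -> List[str]:
        return self.entries[document_id].keywords


table = EnhancedIndexTable()
table.store(IndexingInformation("doc-1", "ISO-8859-1", ["index", "codepage"]))
```

Under this reading, the keywords are reachable through the table only because they are part of the stored set of indexing information, which is why a separate recitation that the table includes them appears redundant.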
With regard to claim 41, the claim recites “determining a relevance degree between the document and the search query by comparing the decoded current code page and the decoded target code page.”
This claim depends from claim 21, which recites “[i] determining, based in part on code page tracking, respective relevance degrees between the search query and the plurality of keyword indexes”. The distinction between the relevance degree and the respective relevance degrees is unclear. One of ordinary skill in the art may reasonably read claim 41 as further defining the respective relevance degree calculation of claim 21. Yet the claim uses distinct language, suggesting that the claim element is intended to be distinct.
It is unclear how many comparisons are being claimed. The parent claim (claim 21) recites a comparison in steps [g] and [i], and a further comparison is now recited in claim 41. It is unclear if the comparison in claim 41 is a distinct operation or is attempting to further define one of the previously recited comparisons. It is suggested that the comparisons be given distinct labels, for example, first and second.
For examination purposes this claim limitation has been construed to mean --wherein the second comparison (referring to the comparison in step [i]) compares the decoded current code page and the decoded target code page to determine the respective relevance degree--.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 21, 24, 29, 32, 37, and 40-43 are rejected under 35 U.S.C. 103 as being unpatentable over Fan [2017/0329839] in view of Swen [2006/0117002] and Liu [2013/0275403].
With regard to claim 21, Fan teaches a computer-implemented method for code page tracking (Fan, ¶29 “The information may be saved in one container, but with different codepages”) for search query response (Fan, ¶52 “That is, in response to a search request, a plurality of documents are presented as a search result list”), the computer-implemented method comprising:
[a] receiving an index request as the indexing request being obtained by the system (Fan, ¶34 “The code point identifier 202 is configured to, in response to an indexing request for a document… the indexing request for a document may be input from an external device”) for a document as the document (Id), wherein a plurality of document characters as the characters of the document (Fan, ¶2 “A codepage may be a table of values that describes the characters of a document”) are encoded in a current code page as the codepage is the table that describes the characters of the document, one of the ‘different codepages’ (Id; ¶29 “The two attachments may have different codepages from that of the email body. When the email body and its attachments are parsed to create an index, it is possible to create a wrong index if there is no correct codepage indication for each of the individual email parts, including the email body and the attachments”) and wherein the current code page is a table defining a plurality of code points (Fan, ¶2 “A codepage may be a table of values that describes the characters of a document”; ¶29 “The two attachments may have different codepages from that of the email body. When the email body and its attachments are parsed to create an index, it is possible to create a wrong index if there is no correct codepage indication for each of the individual email parts, including the email body and the attachments”) and wherein the plurality of code points are specific sequences of bits (Fan, Figure 3, see the example code points; ¶41 “In the Parser 1, the codepage ISO8859-1 may be used to interpret the above unknown code points to obtain a first set of converted code points (Unicode) as shown by reference numerals 301, that is, characters A, B, C, D, E, F, G, H, I, and J.”) used to represent specific characters (Fan, ¶28 “a code point or code position is any of numerical values that make up a code space. Many code points represent single characters”) or [[
[b] collecting character encoding data for the document as known code points (Fan, ¶35 “if a known codepage (charset) is provided (i.e., a corresponding character set for interpreting the code point is known), then it is referred to as "known code point."”) and the unknown code points after conversion (Fan, ¶41 “In the Parser 1, the codepage ISO8859-1 may be used to interpret the above unknown code points to obtain a first set of converted code points (Unicode) as shown by reference numerals 301, that is, characters A, B, C, D, E, F, G, H, I, and J.”), including, where available, [[ as default codepage (Fan, ¶29 “One scenario is that for the plain text with unknown codepages, if it is assumed that a plain text are encoded with a default codepage, then an index built for the plain text is also based on the default codepage, it may be misunderstood because the full text index is inaccurate and insufficient.”);
[c] identifying the current code page for the document as identifying the code point for the document (Fan, ¶34 “document, identify the unknown code points for a document”; ¶35 “The known code points can be transmitted directly to the code point parser 208 from the code point identifier 202 for generating one set of converted code points for indexing in a posting list repository”) based on the character encoding data (Fan, ¶28 “a code point or code position is any of numerical values that make up a code space. Many code points represent single characters”), the current code page as one of different codepages (Fan, ¶29 “The information may be saved in one container, but with different codepages”) being a first code page type (Fan, ¶38 “The codepages according to at least one embodiment of the present disclosure could be different character sets (charsets), such as, but not limited to, charset ISO8859-1, charset GB18030, charset ISO8859-15, charset Windows-1252, GB2312, etc. The codepages according to at least one embodiment of the present disclosure are not limited to the first codepage and the second codepage, but may include more codepages. Here, the type and quantity of the codepages can be determined by one skilled in the art according to the actual needs, and the quantity of codepages may not limit the scope of the present disclosure”);
[d] generating a set of indexing information as obtaining the file frequency, location, and other information for the document (Fan, ¶46 “in the indexing engine 210 (FIG. 2), an association may be built between the converted code points and the file in accordance with frequency, location, and other information that the converted code points appear in the file, as well as information on the association stored in the posting-list repository for building an index. The detailed implementation of the indexing engine 210 (FIG. 2) may be well known by those skilled in the art, so the detailed implementation is omitted here.”) for the document as the document (Id), based on the character encoding data (Fan, ¶41 “In the Parser 1, the codepage ISO8859-1 may be used to interpret the above unknown code points to obtain a first set of converted code points (Unicode) as shown by reference numerals 301, that is, characters A, B, C, D, E, F, G, H, I, and J.”) of the document and the first code page type as the codepage ISO8859-1 (Id), the set of indexing information including keywords from the document as the keywords in the index (Fan, ¶51 “for example, when a user enters a keyword in the search engine and clicks a button such as "Search" the search engine traverses the posting-list repository, from which an index matching the keywords entered by the user is found, and associated documents are found.”);
[e] storing the set of indexing information as obtaining the file frequency, location, and other information for the document (Fan, ¶46; FIG. 2 210), including a code page identifier for the first code page type as the codepage ISO8859-1 (Fan, ¶41), in an enhanced index table (Fan, ¶45 “the indexing engine 210 (FIG. 2) may be configured to build an index based on the converted point codes output”) according to the index request (Fan, ¶34 “The code point identifier 202 is configured to, in response to an indexing request for a document… the indexing request for a document may be input from an external device”), the enhanced index table as the index (Fan, ¶45) [[as the keywords in the index (Fan, ¶51 “for example, when a user enters a keyword in the search engine and clicks a button such as "Search" the search engine traverses the posting-list repository, from which an index matching the keywords entered by the user is found, and associated documents are found.”), including the stored set of indexing information for the document (Fan, ¶46; FIG. 2 210);
[f] receiving a search query (Fan, ¶51 “When a search engine receives a search request, for example, when a user enters a keyword in the search engine and clicks a button such as "Search" the search engine traverses the posting-list repository, from which an index matching the keywords entered by the user is found, and associated documents are found.”) ;
[g] comparing the search query as matching keywords entered by the user (Id) with the keyword indexes as the index matching the keywords (Id) of the enhanced index table (Fan, ¶45 “the indexing engine 210 (FIG. 2)”);
[h] responsive to matching a keyword in the search query with a plurality of keyword indexes of the enhanced index table as the resultant documents are sorted based on the weight of the code points (Fan, ¶52 “The search engine sorts all resultant documents based on the search conditions stored in the knowledge base”; ¶59 “The method further includes a step in which a weight assigned to a set of converted code points.”), [[
[i] determining, based in part on code page tracking (Fan, ¶29 “The information may be saved in one container, but with different codepages”), respective relevance degrees (Fan, ¶59 “The method further includes a step in which a weight assigned to a set of converted code points.”) between the search query (Fan, ¶51) and the plurality of keyword indexes as the index matching the keywords (Id) by [[ as the codepage ISO8859-1 (Fan, ¶41) associated with the plurality of keyword indexes as the keywords in the index (Fan, ¶51) in the enhanced index table as the index (Fan, ¶45); and
[j] responsive to the search query, returning a ranked set of documents based on the respective relevance degrees of the plurality of keyword indexes and corresponding documents, including the document (Fan, ¶60 “The method may further include a step in which the searched documents based on the redundant indexes are presented to the user in descending order of the weights of the redundant indexes. A display 24 (FIG. 1) could be used to present the resulted documents to the user in a manner of displaying information on its screen.”).
Fan does not explicitly teach [a] wherein the plurality of code points … used to represent … or words; [e] the enhanced index table including keyword indexes…
Swen teaches [a] wherein the plurality of code points … used to represent … or words (Swen, ¶20 “In many cases the inverted index may include more information on the frequency, occurrence positions and text formats of each keyword in each document.”);
[e] storing the set of indexing information, including a code page identifier for the first code page type as text format of each keyword (Swen, ¶20 “In many cases the inverted index may include more information on the frequency, occurrence positions and text formats of each keyword in each document.”)… the enhanced index table as the inverted index (Id) including keyword indexes for a plurality of documents as each keyword in each document (Swen, ¶20 “Large-scale document retrieval systems generally use inverted indexes, i.e., indexes that record for each keyword (called an index keyword) a list of documents that contain that keyword. … An inverted index consists of many inverted lists, each of which corresponds to an index keyword. In many cases the inverted index may include more information on the frequency, occurrence positions and text formats of each keyword in each document.”), including the stored set of indexing information for the document as the text formats (Id);
It would have been obvious to one of ordinary skill in the art to which said subject matter pertains, before the effective filing date of the claimed invention, to have implemented the indexing device taught by Fan to generate an index storing the inverted indexing information as well as the format for the keyword (e.g. the codepage information) within a single structure, as it yields the predictable results of enabling a user to search for and rank documents based on their format (e.g. the specified clustering technique) (Swen, ¶22 “the search result clustering method of the present invention uses some particular pre-retrieval processing on the documents and their inverted index to facilitate more efficient techniques for determining and ranking the clusters of result documents”).
Fan does not explicitly teach [b] historical code page data… [h] identifying a target code page type used to encode the search query… [i] by comparing the target code page type with corresponding.
Liu teaches [b] historical code page data as historical character code point (Liu, ¶24 “The illustrative embodiments construct a historic character code point table and use the table to identify multi-code point characters”) …
[h] responsive to matching a keyword in the search query …, identifying a target code page type used to encode the search query as identifying the code points of the characters in the query (Liu, ¶46 “query modification logic 310 analyzes user query 324 to identify all of the characters in user query 324 and their associated encoding code point based on the current encoding of search engine 312.”);
[i] determining… respective relevance degrees … by comparing the target code page type as determining if the code point exists within historic code point table (Liu, ¶46 “query modification logic 310 identifies whether any of the associated code points exist within historic code point table 318”) and comparing the character image (Liu, ¶42 “For characters that only exist in the PUA, character image matching may be used to determine the code points from different vendors. For example, for a PUA character defined by AIX®, an image recognition program or other method may take the image of the character from AIX® and compare with PUA characters defined by Oracle® Solaris one by one in order to find out the code point for the character in Oracle® Solaris.”) with corresponding code page types as the code points in the historical table (Liu, ¶46) and the PUA characters defined by Oracle Solaris (Liu, ¶42) associated with the plurality of keyword indexes in the enhanced index table as the code point table (Liu, ¶24 “The illustrative embodiments construct a historic character code point table and use the table to identify multi-code point characters”);
It would have been obvious to one of ordinary skill in the art to which said subject matter pertains, before the effective filing date of the claimed invention, to have implemented the search engine taught by the proposed combination to utilize the stored code point information in the index to find documents containing both matching keywords and the code page of the query string, as taught by Liu, as it yields the predictable results of ensuring that the system is able not only to find documents containing the content the user is looking for, but also to find documents containing characters in a particular format (Liu, ¶12, ¶24, ¶48).
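For illustration only, the reason the code page associated with an indexed keyword bears on whether it matches a query can be sketched as follows. This Python sketch is an assumption made for clarity and does not reproduce the method of Fan, Swen, or Liu; the helper name decode_under and the sample bytes are hypothetical, and the sketch relies only on Python's built-in codecs.

```python
from typing import Dict, List


def decode_under(raw: bytes, code_pages: List[str]) -> Dict[str, str]:
    """Decode the same byte sequence under each candidate code page."""
    return {cp: raw.decode(cp) for cp in code_pages}


# Hypothetical stored bytes for one keyword; 0xE9 is interpreted
# differently depending on the code page used to read it.
raw_keyword = b"caf\xe9"
decoded = decode_under(raw_keyword, ["iso-8859-1", "cp852"])

# A query keyword expressed in Unicode ("caf" followed by U+00E9).
query = "caf\u00e9"

# The keyword matches the query only under the code page that was
# actually used to encode the stored bytes.
matches = {cp: text == query for cp, text in decoded.items()}
```

In this sketch the stored bytes match the query when read as ISO-8859-1 but not when read as IBM852 (cp852), which is the kind of mismatch that recording a code page identifier alongside the keywords is meant to avoid.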
With regard to claims 24, 32 and 40 the proposed combination further teaches generating enhanced indexing information as the extended index (Swen, ¶45 “inverted index data structure that is extended with the keyword associated clustering information for each of the indexed documents… a pointer to the list of records that include the information of occurrence positions and text formats of keyword word_id in document doc_id, which is denoted by position_list_ptr in the diagram”), wherein the enhanced indexing information is when some of the code points as the determined format (Fan, ¶31 “"IS0-8859-12" is a codepage used to describe code points of the Pdf.pdf document, so the code points of the Pdf.pdf document are referred to as "known code points,"”; ¶34 “document, identify the unknown code points for a document”; ¶35 “The known code points can be transmitted directly to the code point parser 208 from the code point identifier 202 for generating one set of converted code points for indexing in a posting list repository”; Swen, ¶45 “inverted index data structure that is extended with the keyword associated clustering information for each of the indexed documents… a pointer to the list of records that include the information of occurrence positions and text formats of keyword word_id in document doc_id, which is denoted by position_list_ptr in the diagram”) are defined by reserving one or more bytes see the 10 length array for the Unicode (Fan, Figure 3, see 301-304) reserved for use as the bytes that make up specifically Unicode (Fan, Figure 3, see 301-304), resulting in a plurality of reserved fields as the plurality of fields in the extended index (Id).
With regard to claim 41, the proposed combination further teaches decoding the current code page as converting the code points (Fan, ¶46 “in the indexing engine 210 (FIG. 2), an association may be built between the converted code points and the file in accordance with frequency, location, and other information that the converted code points appear in the file, as well as information on the association stored in the posting-list repository for building an index. The detailed implementation of the indexing engine 210 (FIG. 2) may be well known by those skilled in the art, so the detailed implementation is omitted here.”; Please note that ‘decoding’ has been read as identifying the code page information when viewed in light of Paragraph [100] of the original specification) and the target code page (Liu, ¶47 “Query modification logic 310 then creates numerous code point sets for user query 324 that combines selected ones of the identified code points”); and
determining a relevance degree as a weight, e.g. a matching degree (Fan, ¶52 “That is, if the index for searching is the redundant index built in accordance with the embodiments of this disclosure, the weight of the redundant index will be computed based on a matching degree of the keyword with the associated documents, such as the location and frequency of the keyword appearing in the documents.”) between the document as the associated documents (Id) and the search query as the keyword (Id) by comparing as matching degree (Id) the decoded current code page as converting the code points (Fan, ¶46 “in the indexing engine 210 (FIG. 2), an association may be built between the converted code points and the file in accordance with frequency, location, and other information that the converted code points appear in the file, as well as information on the association stored in the posting-list repository for building an index. The detailed implementation of the indexing engine 210 (FIG. 2) may be well known by those skilled in the art, so the detailed implementation is omitted here.”) and the decoded target code page (Liu, ¶47 “Query modification logic 310 then creates numerous code point sets for user query 324 that combines selected ones of the identified code points”).
With regard to claim 42, the proposed combination further teaches wherein the code page type (Fan, ¶38 “The codepages according to at least one embodiment of the present disclosure could be different character sets (charsets), such as, but not limited to, charset ISO8859-1, charset GB18030, charset ISO8859-15, charset Windows-1252, GB2312, etc.”) is a member of the group consisting of (Please note this claim language has been identified as reciting a Markush type grouping of alternatives, MPEP 2117):
a) Windows-1250 as Windows-1252 is substantially equivalent (Id),
b) UCS-4,
c) ISO-8859-1 as ISO8859-1 (Id),
d) UTF-7,
e) UTF-32,
f) IBM852, and
g) GB18030 as GB18030 (Id).
With regard to claim 43 the proposed combination further teaches wherein determining respective relevance degrees is based on weights assigned to the corresponding documents according to the associated code page types in the enhanced index table as a weight, e.g. a matching degree (Fan, ¶52 “That is, if the index for searching is the redundant index built in accordance with the embodiments of this disclosure, the weight of the redundant index will be computed based on a matching degree of the keyword with the associated documents, such as the location and frequency of the keyword appearing in the documents.”).
With regard to claim 29, Fan teaches a computer system for code page tracking (Fan, ¶29 “The information may be saved in one container, but with different codepages”) for search query response (Fan, ¶52 “That is, in response to a search request, a plurality of documents are presented as a search result list”), the computer system comprising:
one or more computer processors (Fan, Figure 1 16 “Processing Unit”; ¶33 “the code point identifier 202 may correspond to a first program instruction run by the processor unit 16 (FIG. 1),”);
one or more computer readable storage devices (Fan, ¶12 “The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.”); and
program instructions stored on the computer readable storage devices for execution by at least one of the one or more computer processors (Id), the program instructions comprising:
[a] Receive an index request as the indexing request being obtained by the system (Fan, ¶34 “The code point identifier 202 is configured to, in response to an indexing request for a document… the indexing request for a document may be input from an external device”) for a document as the document (Id), wherein a plurality of document characters as the characters of the document (Fan, ¶2 “A codepage may be a table of values that describes the characters of a document”) are encoded in a current code page as the codepage is the table that describes the characters of the document, one of the ‘different codepages’ (Id; ¶29 “The two attachments may have different codepages from that of the email body. When the email body and its attachments are parsed to create an index, it is possible to create a wrong index if there is no correct codepage indication for each of the individual email parts, including the email body and the attachments”) and wherein the current code page is a table defining a plurality of code points (Fan, ¶2 “A codepage may be a table of values that describes the characters of a document”; ¶29 “The two attachments may have different codepages from that of the email body. When the email body and its attachments are parsed to create an index, it is possible to create a wrong index if there is no correct codepage indication for each of the individual email parts, including the email body and the attachments”) and wherein the plurality of code points are specific sequences of bits (Figure 3, see the example code points; ¶41 “In the Parser 1, the codepage ISO8859-l may be used to interpret the above unknown code points to obtain a first set of converted code points (Unicode) as shown by reference numerals 301, that is, characters A, B, C, D, E, F, G, H, I, and J.”) used to represent specific characters (Fan, ¶28 “a code point or code position is any of numerical values that make up a code space. Many code points represent single characters”) or [[
[b] collect character encoding data for the document as known code points (Fan, ¶35 “if a known codepage (charset) is provided (i.e., a corresponding character set for interpreting the code point is known), then it is referred to as "known code point."”) and the unknown code points after conversion (Fan, ¶41 “In the Parser 1, the codepage ISO8859-1 may be used to interpret the above unknown code points to obtain a first set of converted code points (Unicode) as shown by reference numerals 301, that is, characters A, B, C, D, E, F, G, H, I, and J.”), including, where available, [[ as default codepage (Fan, ¶29 “One scenario is that for the plain text with unknown codepages, if it is assumed that a plain text are encoded with a default codepage, then an index built for the plain text is also based on the default codepage, it may be misunderstood because the full text index is inaccurate and insufficient.”);
[c] identify the current code page for the document as identifying the code point for the document (Fan, ¶34 “document, identify the unknown code points for a document”; ¶35 “The known code points can be transmitted directly to the code point parser 208 from the code point identifier 202 for generating one set of converted code points for indexing in a posting list repository”) based on the character encoding data (Fan, ¶28 “a code point or code position is any of numerical values that make up a code space. Many code points represent single characters”), the current code page as one of different codepages (Fan, ¶29 “The information may be saved in one container, but with different codepages”) being a first code page type (Fan, ¶38 “The codepages according to at least one embodiment of the present disclosure could be different character sets (charsets), such as, but not limited to, charset ISO8859-1, charset GB18030, charset ISO8859-15, charset Windows-1252, GB2312, etc. The codepages according to at least one embodiment of the present disclosure are not limited to the first codepage and the second codepage, but may include more codepages. Here, the type and quantity of the codepages can be determined by one skilled in the art according to the actual needs, and the quantity of codepages may not limit the scope of the present disclosure”);
[d] generate a set of indexing information as obtaining the file frequency, location, and other information for the document (Fan, ¶46 “in the indexing engine 210 (FIG. 2), an association may be built between the converted code points and the file in accordance with frequency, location, and other information that the converted code points appear in the file, as well as information on the association stored in the posting-list repository for building an index. The detailed implementation of the indexing engine 210 (FIG. 2) may be well known by those skilled in the art, so the detailed implementation is omitted here.”) for the document as the document (Id), based on the character encoding data (Fan, ¶41 “In the Parser 1, the codepage ISO8859-1 may be used to interpret the above unknown code points to obtain a first set of converted code points (Unicode) as shown by reference numerals 301, that is, characters A, B, C, D, E, F, G, H, I, and J.”) of the document and the first code page type as the codepage ISO8859-1 (Id), the set of indexing information including keywords from the document as the keywords in the index (¶51 “for example, when a user enters a keyword in the search engine and clicks a button such as "Search" the search engine traverses the posting-list repository, from which an index matching the keywords entered by the user is found, and associated documents are found.”);
[e] store the set of indexing information as obtaining the file frequency, location, and other information for the document (Fan, ¶46; FIG. 2 210), including a code page identifier for the first code page type as the codepage ISO8859-1 (Fan, ¶41), in an enhanced index table (Fan, ¶45 “the indexing engine 210 (FIG. 2) may be configured to build an index based on the converted point codes output”) according to the index request (Fan, ¶34 “The code point identifier 202 is configured to, in response to an indexing request for a document… the indexing request for a document may be input from an external device”), the enhanced index table as the index (Fan ¶45) [[as the keywords in the index (¶51 “for example, when a user enters a keyword in the search engine and clicks a button such as "Search" the search engine traverses the posting-list repository, from which an index matching the keywords entered by the user is found, and associated documents are found.”), including the stored set of indexing information for the document (Fan, ¶46; FIG. 2 210);
[f] receive a search query (Fan, ¶51 “When a search engine receives a search request, for example, when a user enters a keyword in the search engine and clicks a button such as "Search" the search engine traverses the posting-list repository, from which an index matching the keywords entered by the user is found, and associated documents are found.”);
[g] compare the search query as matching keywords entered by the user (Id) with the keyword indexes as the index matching the keywords (Id) of the enhanced index table (Fan, ¶45 “the indexing engine 210 (FIG. 2)”);
[h] responsive to matching a keyword in the search query with a plurality of keyword indexes of the enhanced index table as the resultant documents are sorted based on the weight of the code points (Fan, ¶52 “The search engine sorts all resultant documents based on the search conditions stored in the knowledge base”; ¶59 “The method further includes a step in which a weight assigned to a set of converted code points.”), [[
[i] determine, based in part on code page tracking (Fan, ¶29 “The information may be saved in one container, but with different codepages”), respective relevance degrees (¶59 “The method further includes a step in which a weight assigned to a set of converted code points.”) between the search query (Fan, ¶51) and the plurality of keyword indexes as the index matching the keywords (Id) by [[ as the codepage ISO8859-1 (Fan, ¶41) associated with the plurality of keyword indexes as the keywords in the index (¶51) in the enhanced index table as the index (Fan ¶45); and
[j] responsive to the search query, return a ranked set of documents based on the respective relevance degrees of the plurality of keyword indexes and corresponding documents, including the document (Fan, ¶60 “The method may further include a step in which the searched documents based on the redundant indexes are presented to the user in descending order of the weights of the redundant indexes. A display 24 (FIG. 1) could be used to present the resulted documents to the user in a manner of displaying information on its screen.”).
Fan does not explicitly teach [a] wherein the plurality of code points … used to represent … or words; [e] the enhanced index table including keyword indexes…
Swen teaches [a] wherein the plurality of code points … used to represent … or words (Swen, ¶20 “In many cases the inverted index may include more information on the frequency, occurrence positions and text formats of each keyword in each document.”);
[e] store the set of indexing information, including a code page identifier for the first code page type as text format of each keyword (Swen, ¶20 “In many cases the inverted index may include more information on the frequency, occurrence positions and text formats of each keyword in each document.”)… the enhanced index table as the inverted index (Id) including keyword indexes for a plurality of documents as each keyword in each document (Swen, ¶20 “Large-scale document retrieval systems generally use inverted indexes, i.e., indexes that record for each keyword (called an index keyword) a list of documents that contain that keyword. … An inverted index consists of many inverted lists, each of which corresponds to an index keyword. In many cases the inverted index may include more information on the frequency, occurrence positions and text formats of each keyword in each document.”), including the stored set of indexing information for the document as the text formats (Id);
It would have been obvious to one of ordinary skill in the art to which said subject matter pertains at the time the invention was filed to have implemented the indexing device taught by Fan to generate an index storing the inverted indexing information as well as the format for the keyword (e.g. the codepage information) within a single structure, as it yields the predictable results of enabling a user to search for and rank documents based on their format (e.g. the specified clustering technique) (Swen, ¶22 “the search result clustering method of the present invention uses some particular pre-retrieval processing on the documents and their inverted index to facilitate more efficient techniques for determining and ranking the clusters of result documents”).
Fan does not explicitly teach [b] historical code page data… [h] identify a target code page type used to encode the search query… [i] by comparing the target code page type with corresponding code page types.
Liu teaches [b] historical code page data as historical character code point (Liu, ¶24 “The illustrative embodiments construct a historic character code point table and use the table to identify multi-code point characters”) …
[h] responsive to matching a keyword in the search query …, identify a target code page type used to encode the search query as identifying the code points of the characters in the query (Liu, ¶46 “query modification logic 310 analyzes user query 324 to identify all of the characters in user query 324 and their associated encoding code point based on the current encoding of search engine 312.”);
[i] determining… respective relevance degrees … by comparing the target code page type as determining if the code point exists within historic code point table (Liu, ¶46 “query modification logic 310 identifies whether any of the associated code points exist within historic code point table 318”) and comparing the character image (Liu, ¶42 “For characters that only exist in the PUA, character image matching may be used to determine the code points from different vendors. For example, for a PUA character defined by AIX®, an image recognition program or other method may take the image of the character from AIX® and compare with PUA characters defined by Oracle® Solaris one by one in order to find out the code point for the character in Oracle® Solaris.”) with corresponding code page types as the code points in the historical table (Liu, ¶46) and the PUA characters defined by Oracle Solaris (Liu, ¶42) associated with the plurality of keyword indexes in the enhanced index table as the code point table (¶24 “The illustrative embodiments construct a historic character code point table and use the table to identify multi-code point characters”);
It would have been obvious to one of ordinary skill to which said subject matter pertains at the time the invention was filed to have implemented the search engine taught by the proposed combination to utilize the stored code point information in the index to find documents containing both matching keywords and the code page of the query string as taught by Liu as it yields the predictable results of ensuring that the system is able to not only find documents containing the content the user is looking for, but that the system is able to find documents containing characters in a particular format (Liu, ¶12, ¶24, ¶48).
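For illustration only, the query-time behavior described in the proposed combination (matching a keyword, comparing the code page of the query against the code page identifiers stored with the keyword indexes, and returning a ranked set of documents) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Fan, Swen, Liu, or the application: the names `enhanced_index`, `add_posting`, `rank`, and `match_bonus`, the document identifiers, and the scoring rule (frequency plus a fixed bonus for a matching code page) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical enhanced index: keyword -> list of (doc_id, code_page_id, frequency).
enhanced_index = defaultdict(list)


def add_posting(keyword, doc_id, code_page_id, frequency):
    """Store one posting together with its code page identifier."""
    enhanced_index[keyword].append((doc_id, code_page_id, frequency))


def rank(keyword, target_code_page, match_bonus=1.0):
    """Return document ids ordered by an illustrative relevance degree.

    Assumed scoring rule: keyword frequency, plus a fixed bonus when the
    stored code page identifier equals the code page of the query.
    """
    scored = []
    for doc_id, code_page_id, frequency in enhanced_index.get(keyword, []):
        score = float(frequency)
        if code_page_id == target_code_page:
            score += match_bonus
        scored.append((score, doc_id))
    # Highest score first; ties broken by document id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


add_posting("index", "doc_a", "ISO8859-1", 2)
add_posting("index", "doc_b", "GB18030", 2)
add_posting("index", "doc_c", "GB18030", 1)

ranked = rank("index", "GB18030")
print(ranked)
```

In this sketch a document whose stored code page identifier matches the query's code page is promoted over an otherwise equal document, which is one possible reading of determining relevance degrees by comparing code page types.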
With regard to claim 37, Fan teaches A computer program product for code page indexing, the computer program product comprising:
one or more computer readable storage medium and program instructions stored on at least one of the one or more computer readable storage medium, the program instructions executable by the one or more computer processors (Fan, ¶12 “The computer program product may include a computer readable storage medium ( or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.”) and further comprising:
[a] program instructions to (Fan, ¶12) receive an index request as the indexing request being obtained by the system (Fan, ¶34 “The code point identifier 202 is configured to, in response to an indexing request for a document… the indexing request for a document may be input from an external device”) for a document as the document (Id), wherein a plurality of document characters as the characters of the document (Fan, ¶2 “A codepage may be a table of values that describes the characters of a document”) are encoded in a current code page as the codepage is the table that describes the characters of the document, one of the ‘different codepages’ (Id; ¶29 “The two attachments may have different codepages from that of the email body. When the email body and its attachments are parsed to create an index, it is possible to create a wrong index if there is no correct codepage indication for each of the individual email parts, including the email body and the attachments”) and wherein the current code page is a table defining a plurality of code points (Fan, ¶2 “A codepage may be a table of values that describes the characters of a document”; ¶29 “The two attachments may have different codepages from that of the email body. When the email body and its attachments are parsed to create an index, it is possible to create a wrong index if there is no correct codepage indication for each of the individual email parts, including the email body and the attachments”) and wherein the plurality of code points are specific sequences of bits (Figure 3, see the example code points; ¶41 “In the Parser 1, the codepage ISO8859-1 may be used to interpret the above unknown code points to obtain a first set of converted code points (Unicode) as shown by reference numerals 301, that is, characters A, B, C, D, E, F, G, H, I, and J.”) used to represent specific characters (Fan, ¶28 “a code point or code position is any of numerical values that make up a code space. Many code points represent single characters”) or [[
[b] program instructions to (Fan, ¶12) collect character encoding data for the document as known code points (Fan, ¶35 “if a known codepage (charset) is provided (i.e., a corresponding character set for interpreting the code point is known), then it is referred to as "known code point."”) and the unknown code points after conversion (Fan, ¶41 “In the Parser 1, the codepage ISO8859-1 may be used to interpret the above unknown code points to obtain a first set of converted code points (Unicode) as shown by reference numerals 301, that is, characters A, B, C, D, E, F, G, H, I, and J.”), including, where available, [[ as default codepage (Fan, ¶29 “One scenario is that for the plain text with unknown codepages, if it is assumed that a plain text are encoded with a default codepage, then an index built for the plain text is also based on the default codepage, it may be misunderstood because the full text index is inaccurate and insufficient.”);
[c] program instructions to (Fan, ¶12) identify the current code page for the document as identifying the code point for the document (Fan, ¶34 “document, identify the unknown code points for a document”; ¶35 “The known code points can be transmitted directly to the code point parser 208 from the code point identifier 202 for generating one set of converted code points for indexing in a posting list repository”) based on the character encoding data (Fan, ¶28 “a code point or code position is any of numerical values that make up a code space. Many code points represent single characters”), the current code page as one of different codepages (Fan, ¶29 “The information may be saved in one container, but with different codepages”) being a first code page type (Fan, ¶38 “The codepages according to at least one embodiment of the present disclosure could be different character sets (charsets), such as, but not limited to, charset ISO8859-1, charset GB18030, charset ISO8859-15, charset Windows-1252, GB2312, etc. The codepages according to at least one embodiment of the present disclosure are not limited to the first codepage and the second codepage, but may include more codepages. Here, the type and quantity of the codepages can be determined by one skilled in the art according to the actual needs, and the quantity of codepages may not limit the scope of the present disclosure”);
[d] program instructions to (Fan, ¶12) generate a set of indexing information as obtaining the file frequency, location, and other information for the document (Fan, ¶46 “in the indexing engine 210 (FIG. 2), an association may be built between the converted code points and the file in accordance with frequency, location, and other information that the converted code points appear in the file, as well as information on the association stored in the posting-list repository for building an index. The detailed implementation of the indexing engine 210 (FIG. 2) may be well known by those skilled in the art, so the detailed implementation is omitted here.”) for the document as the document (Id), based on the character encoding data (Fan, ¶41 “In the Parser 1, the codepage ISO8859-1 may be used to interpret the above unknown code points to obtain a first set of converted code points (Unicode) as shown by reference numerals 301, that is, characters A, B, C, D, E, F, G, H, I, and J.”) of the document and the first code page type as the codepage ISO8859-1 (Id), the set of indexing information including keywords from the document as the keywords in the index (¶51 “for example, when a user enters a keyword in the search engine and clicks a button such as "Search" the search engine traverses the posting-list repository, from which an index matching the keywords entered by the user is found, and associated documents are found.”);
[e] program instructions to (Fan, ¶12) store the set of indexing information as obtaining the file frequency, location, and other information for the document (Fan, ¶46; FIG. 2 210), including a code page identifier for the first code page type as the codepage ISO8859-1 (Fan, ¶41), in an enhanced index table (Fan, ¶45 “the indexing engine 210 (FIG. 2) may be configured to build an index based on the converted point codes output”) according to the index request (Fan, ¶34 “The code point identifier 202 is configured to, in response to an indexing request for a document… the indexing request for a document may be input from an external device”), the enhanced index table as the index (Fan ¶45) [[as the keywords in the index (¶51 “for example, when a user enters a keyword in the search engine and clicks a button such as "Search" the search engine traverses the posting-list repository, from which an index matching the keywords entered by the user is found, and associated documents are found.”), including the stored set of indexing information for the document (Fan, ¶46; FIG. 2 210);
[f] program instructions to (Fan, ¶12) receive a search query (Fan, ¶51 “When a search engine receives a search request, for example, when a user enters a keyword in the search engine and clicks a button such as "Search" the search engine traverses the posting-list repository, from which an index matching the keywords entered by the user is found, and associated documents are found.”);
[g] program instructions to (Fan, ¶12) compare the search query as matching keywords entered by the user (Id) with the keyword indexes as the index matching the keywords (Id) of the enhanced index table (Fan, ¶45 “the indexing engine 210 (FIG. 2)”);
[h] responsive to matching a keyword in the search query with a plurality of keyword indexes of the enhanced index table as the resultant documents are sorted based on the weight of the code points (Fan, ¶52 “The search engine sorts all resultant documents based on the search conditions stored in the knowledge base”; ¶59 “The method further includes a step in which a weight assigned to a set of converted code points.”), program instructions to (Fan, ¶12) [[
[i] program instructions to (Fan, ¶12) determine, based in part on code page tracking (Fan, ¶29 “The information may be saved in one container, but with different codepages”), respective relevance degrees (¶59 “The method further includes a step in which a weight assigned to a set of converted code points.”) between the search query (Fan, ¶51) and the plurality of keyword indexes as the index matching the keywords (Id) by [[ as the codepage ISO8859-1 (Fan, ¶41) associated with the plurality of keyword indexes as the keywords in the index (¶51) in the enhanced index table as the index (Fan ¶45); and
[j] program instructions to (Fan, ¶12), responsive to the search query, return a ranked set of documents based on the respective relevance degrees of the plurality of keyword indexes and corresponding documents, including the document (Fan, ¶60 “The method may further include a step in which the searched documents based on the redundant indexes are presented to the user in descending order of the weights of the redundant indexes. A display 24 (FIG. 1) could be used to present the resulted documents to the user in a manner of displaying information on its screen.”).
Fan does not explicitly teach [a] wherein the plurality of code points … used to represent … or words; [e] the enhanced index table including keyword indexes…
Swen teaches [a] wherein the plurality of code points … used to represent … or words (Swen, ¶20 “In many cases the inverted index may include more information on the frequency, occurrence positions and text formats of each keyword in each document.”);
[e] store the set of indexing information, including a code page identifier for the first code page type as text format of each keyword (Swen, ¶20 “In many cases the inverted index may include more information on the frequency, occurrence positions and text formats of each keyword in each document.”)… the enhanced index table as the inverted index (Id) including keyword indexes for a plurality of documents as each keyword in each document (Swen, ¶20 “Large-scale document retrieval systems generally use inverted indexes, i.e., indexes that record for each keyword (called an index keyword) a list of documents that contain that keyword. … An inverted index consists of many inverted lists, each of which corresponds to an index keyword. In many cases the inverted index may include more information on the frequency, occurrence positions and text formats of each keyword in each document.”), including the stored set of indexing information for the document as the text formats (Id);
It would have been obvious to one of ordinary skill in the art to which said subject matter pertains at the time the invention was filed to have implemented the indexing device taught by Fan to generate an index storing the inverted indexing information as well as the format for the keyword (e.g. the codepage information) within a single structure, as it yields the predictable results of enabling a user to search for and rank documents based on their format (e.g. the specified clustering technique) (Swen, ¶22 “the search result clustering method of the present invention uses some particular pre-retrieval processing on the documents and their inverted index to facilitate more efficient techniques for determining and ranking the clusters of result documents”).
Fan does not explicitly teach [b] historical code page data… [h] identify a target code page type used to encode the search query… [i] by comparing the target code page type with corresponding code page types.
Liu teaches [b] historical code page data as historical character code point (Liu, ¶24 “The illustrative embodiments construct a historic character code point table and use the table to identify multi-code point characters”) …
[h] responsive to matching a keyword in the search query …, identify a target code page type used to encode the search query as identifying the code points of the characters in the query (Liu, ¶46 “query modification logic 310 analyzes user query 324 to identify all of the characters in user query 324 and their associated encoding code point based on the current encoding of search engine 312.”);
[i] determining… respective relevance degrees … by comparing the target code page type as determining if the code point exists within historic code point table (Liu, ¶46 “query modification logic 310 identifies whether any of the associated code points exist within historic code point table 318”) and comparing the character image (Liu, ¶42 “For characters that only exist in the PUA, character image matching may be used to determine the code points from different vendors. For example, for a PUA character defined by AIX®, an image recognition program or other method may take the image of the character from AIX® and compare with PUA characters defined by Oracle® Solaris one by one in order to find out the code point for the character in Oracle® Solaris.”) with corresponding code page types as the code points in the historical table (Liu, ¶46) and the PUA characters defined by Oracle Solaris (Liu, ¶42) associated with the plurality of keyword indexes in the enhanced index table as the code point table (¶24 “The illustrative embodiments construct a historic character code point table and use the table to identify multi-code point characters”);
It would have been obvious to one of ordinary skill to which said subject matter pertains at the time the invention was filed to have implemented the search engine taught by the proposed combination to utilize the stored code point information in the index to find documents containing both matching keywords and the code page of the query string as taught by Liu as it yields the predictable results of ensuring that the system is able to not only find documents containing the content the user is looking for, but that the system is able to find documents containing characters in a particular format (Liu, ¶12, ¶24, ¶48).
Response to Arguments
Applicant's arguments filed September 19, 2025 have been fully considered but they are not persuasive.
Regarding the argued distinction between code page indexing and code page tracking, applicant argues that code page tracking is “more than code page indexing in that the tracking of the code page provides for both indexing keywords and accurate search query responses”.
In response, one of ordinary skill in the art would recognize that the act of indexing the code page, e.g. storing the code page in the index, functions to ‘track’ the code pages. The distinction Applicant argues appears to fall within the broadest reasonable interpretation of both terms (code page tracking and code page indexing). One of ordinary skill in the art would identify code page tracking and code page indexing as the same thing. The discussion in Paragraph 56 is a discussion of standard keyword indexing (e.g. when the code page is not stored in the index; see the language “Generally, to index different document”, and please note that the specification does not say ‘code page indexing’ here). The standard index (e.g. a keyword-only index) is distinct from ‘code page indexing’, which one of ordinary skill in the art would recognize, under the plain meaning of the term, as requiring code page information to be stored in the index, thereby ‘tracking’ the code page. In Applicant’s arguments, Applicant details that “tracking” is storing the code page types in the index (See Page 10 “The code page types are tracked for various keyword indexes of the search documents so that a plurality of code page types may pertain to a single keyword in the enhanced index table”). The act of storing the code page in the index is what builds the ‘code page index’. It is code page indexing. Therefore, within Applicant’s own arguments, “code page tracking” appears to be the act of indexing the code page (e.g. building the code page index).
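The interpretation above can be illustrated with a short sketch contrasting a keyword-only index with an index that also stores a code page identifier for each keyword posting. This is an illustration under stated assumptions only and does not represent the disclosed embodiment or any cited reference: the sample documents, the names `keyword_only_index`, `code_page_index`, and `tracked_code_pages`, and the dictionary layout are hypothetical.

```python
from collections import defaultdict

# Hypothetical documents: (doc_id, code page identifier, keywords).
documents = [
    ("doc_a", "ISO8859-1", ["codepage", "index"]),
    ("doc_b", "GB18030", ["index"]),
]

# Standard keyword-only index: keyword -> set of document ids.
keyword_only_index = defaultdict(set)
# Index that also stores the code page: keyword -> {doc_id: code_page_id}.
code_page_index = defaultdict(dict)

for doc_id, code_page_id, keywords in documents:
    for keyword in keywords:
        keyword_only_index[keyword].add(doc_id)
        code_page_index[keyword][doc_id] = code_page_id


def tracked_code_pages(keyword):
    """Code page identifiers recoverable for a keyword from the stored index."""
    return sorted(set(code_page_index.get(keyword, {}).values()))


print(tracked_code_pages("index"))
```

In this sketch the code page types associated with a single keyword can be read back solely because they were stored in the index at indexing time; the keyword-only index carries no such information.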
All of Applicant’s arguments appear to be directed to the newly presented claim amendments and are therefore addressed in the claim mapping above.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMANDA WILLIS whose telephone number is (571)270-7691. The examiner can normally be reached Monday-Friday 8am-2pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ajay Bhatia, can be reached at 571-272-3906. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/AMANDA L WILLIS/Primary Examiner, Art Unit 2156