DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
As to independent claims 1, 8, and 15:
At Step 1:
The claims are directed to a “medium”, “method”, and “apparatus” and thus directed to a statutory category.
At Step 2A, Prong One:
The claims recite the following limitations directed to an abstract idea:
“specifying a terminal subject based on a parent-child relationship of subjects that correspond to a plurality of tags used in a document” as drafted recites a mental process. One can mentally evaluate or judge a category (i.e. terminal subject) based on relationships of categories (i.e. subjects) corresponding to words/terms (i.e. tags) in a document.
“calculating a vector of a tag that corresponds to the terminal subject based on each word included in definition information set for the terminal subject and a word vector dictionary that defines a vector of each word” as drafted recites a mathematical concept. Specifically, organizing information and manipulating information through mathematical correlations, Digitech Image Techs., LLC v. Electronics for Imaging, Inc., 758 F.3d 1344, 1350, 111 USPQ2d 1717, 1721 (Fed. Cir. 2014). (See MPEP 2106.04(a)(2)(I)(A) “iv”). Applicant’s specification teaches converting words into numerical vectors (e.g. via mathematical algorithms), called word embeddings, to capture their semantic meaning (See [0021]-[0027] and [0036]-[0045]).
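For illustration only, the two limitations quoted above can be sketched as follows. The subject names, definitions, and two-dimensional word vectors are hypothetical, and averaging the definition word vectors is an assumed combination rule; the quoted claim language does not specify one.

```python
from typing import Dict, List


def terminal_subjects(children: Dict[str, List[str]]) -> List[str]:
    """Return the subjects that have no children (leaves of the parent-child tree)."""
    all_subjects = set(children)
    for kids in children.values():
        all_subjects.update(kids)
    return sorted(s for s in all_subjects if not children.get(s))


def tag_vector(definition: str, word_vectors: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of the definition words found in the dictionary.

    Averaging is an assumption made for this sketch only.
    """
    found = [word_vectors[w] for w in definition.lower().split() if w in word_vectors]
    if not found:
        raise ValueError("no definition word is in the word vector dictionary")
    dim = len(found[0])
    return [sum(v[i] for v in found) / len(found) for i in range(dim)]


# Hypothetical parent-child relationship, word vector dictionary, and definitions.
children = {"Assets": ["Cash", "Inventory"], "Cash": [], "Inventory": []}
word_vectors = {"money": [1.0, 0.0], "held": [0.0, 1.0], "goods": [0.5, 0.5]}
definitions = {"Cash": "money held", "Inventory": "goods held"}

leaves = terminal_subjects(children)
leaf_vectors = {s: tag_vector(definitions[s], word_vectors) for s in leaves}
```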
At Step 2A, Prong Two:
The claims recite the following additional elements:
That the medium, method, and apparatus are performed by a “computer” and “in a computer-readable recording medium”, which is a high-level recitation of generic computer components and represents mere instructions to apply the exception on a computer as in MPEP 2106.05(f), which does not provide integration into a practical application.
Viewing the additional limitations together and the claim as a whole, nothing provides integration into a practical application.
At Step 2B:
The conclusions for the mere implementation using a computer are carried over and do not provide significantly more.
Looking at the claims as a whole does not change this conclusion and the claim is ineligible.
As to dependent claims 2-7, 9-14, and 16-20:
At Step 1:
The claims are directed to a “medium”, “method”, and “apparatus” and thus directed to a statutory category.
At Step 2A, Prong One:
The claims recite the following limitations directed to an abstract idea:
“calculating a vector of a tag that corresponds to a subject other than the terminal subject based on a parent-child relationship between a vector that corresponds to the terminal subject and the subject other than the terminal subject” as drafted recites a mathematical concept. Specifically, organizing information and manipulating information through mathematical correlations, Digitech Image Techs., LLC v. Electronics for Imaging, Inc., 758 F.3d 1344, 1350, 111 USPQ2d 1717, 1721 (Fed. Cir. 2014). (See MPEP 2106.04(a)(2)(I)(A) “iv”). Applicant’s specification teaches converting words into numerical vectors (e.g. via mathematical algorithms), called word embeddings, to capture their semantic meaning (See [0021]-[0027] and [0036]-[0045]).
“registering a relationship between the tag and the vector of the tag in a tag vector dictionary” as drafted recites a mental process. One can mentally evaluate or judge a relationship between words in a document and words in a dictionary.
“calculating a vector of the document based on the word vector dictionary and the tag vector dictionary” as drafted recites a mathematical concept. Specifically, organizing information and manipulating information through mathematical correlations, Digitech Image Techs., LLC v. Electronics for Imaging, Inc., 758 F.3d 1344, 1350, 111 USPQ2d 1717, 1721 (Fed. Cir. 2014). (See MPEP 2106.04(a)(2)(I)(A) “iv”). Applicant’s specification teaches converting words into numerical vectors (e.g. via mathematical algorithms), called word embeddings, to capture their semantic meaning (See [0021]-[0027] and [0036]-[0045]).
“calculating the vector of the tag that corresponds to the subject other than the terminal subject preferentially calculates the vector of the subject for which all vectors of the subjects included in the parent-child relationship of the subject other than the terminal subject are calculated” as drafted recites a mathematical concept. Specifically, organizing information and manipulating information through mathematical correlations, Digitech Image Techs., LLC v. Electronics for Imaging, Inc., 758 F.3d 1344, 1350, 111 USPQ2d 1717, 1721 (Fed. Cir. 2014). (See MPEP 2106.04(a)(2)(I)(A) “iv”). Applicant’s specification teaches converting words into numerical vectors (e.g. via mathematical algorithms), called word embeddings, to capture their semantic meaning (See [0021]-[0027] and [0036]-[0045]).
“generating an index in which the vector of the document is associated with a registration position of the document” as drafted recites a mathematical concept. Specifically, organizing information and manipulating information through mathematical correlations, Digitech Image Techs., LLC v. Electronics for Imaging, Inc., 758 F.3d 1344, 1350, 111 USPQ2d 1717, 1721 (Fed. Cir. 2014). (See MPEP 2106.04(a)(2)(I)(A) “iv”).
“the parent-child relationship includes a calculation relationship of the subjects that correspond to the plurality of tags, and the calculation relationship derives a value of a subject using a value of the terminal subject among the subjects that correspond to the plurality of tags” as drafted recites a mathematical concept. Specifically, organizing information and manipulating information through mathematical correlations, Digitech Image Techs., LLC v. Electronics for Imaging, Inc., 758 F.3d 1344, 1350, 111 USPQ2d 1717, 1721 (Fed. Cir. 2014). (See MPEP 2106.04(a)(2)(I)(A) “iv”). Applicant’s specification teaches converting words into numerical vectors (e.g. via mathematical algorithms), called word embeddings, to capture their semantic meaning (See [0021]-[0027] and [0036]-[0045]).
“specifying the terminal subject specifies the terminal subject based on the calculation relationship of the subjects defined in taxonomy of an extensible business reporting language (XBRL) document” as drafted recites a mental process. One can mentally evaluate or judge a category (i.e. terminal subject) based on relationships of categories (i.e. subjects) corresponding to words/terms (i.e. tags) in a document.
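For illustration only, the ordering recited in the "preferentially calculates" limitation above can be sketched as a bottom-up pass over the parent-child relationship: a non-terminal subject is calculated only once the vectors of all of its children are available. The subject names and vectors are hypothetical, and summing the child vectors is an assumed rule; the quoted claim language does not specify one.

```python
from typing import Dict, List, Tuple


def propagate(
    children: Dict[str, List[str]],
    leaf_vectors: Dict[str, List[float]],
) -> Tuple[Dict[str, List[float]], List[str]]:
    """Compute vectors of non-terminal subjects from their children, bottom-up.

    Returns the full vector table and the order in which parents were calculated.
    """
    vectors = dict(leaf_vectors)
    pending = [s for s in children if s not in vectors]
    order: List[str] = []
    while pending:
        # A subject is ready only when every one of its children has a vector.
        ready = [s for s in pending
                 if children[s] and all(c in vectors for c in children[s])]
        if not ready:
            raise ValueError("cycle or missing terminal vector")
        for s in ready:
            kids = children[s]
            dim = len(vectors[kids[0]])
            # Assumed rule: the parent vector is the element-wise sum of its children.
            vectors[s] = [sum(vectors[c][i] for c in kids) for i in range(dim)]
            order.append(s)
        pending = [s for s in pending if s not in vectors]
    return vectors, order


# Hypothetical hierarchy: Assets -> (CurrentAssets -> Cash, Inventory), Land.
children = {"Assets": ["CurrentAssets", "Land"], "CurrentAssets": ["Cash", "Inventory"]}
leaf_vectors = {"Cash": [0.5, 0.5], "Inventory": [0.25, 0.75], "Land": [1.0, 0.0]}

vectors, order = propagate(children, leaf_vectors)
```

In this sketch "CurrentAssets" is calculated before "Assets" because only its children are all terminal subjects at the start.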
At Step 2A, Prong Two:
The claims recite the following additional elements:
That the medium, method, and apparatus are performed by a “computer” and “in a computer-readable recording medium”, which is a high-level recitation of generic computer components and represents mere instructions to apply the exception on a computer as in MPEP 2106.05(f), which does not provide integration into a practical application.
“when a search query is received, searching for a document that corresponds to the search query based on a vector of the search query and the index” as drafted recites insignificant extra-solution activity. This limitation is recited as retrieval/receiving of data (i.e. mere data gathering).
Viewing the additional limitations together and the claim as a whole, nothing provides integration into a practical application.
At Step 2B:
The conclusions for the mere implementation using a computer are carried over and do not provide significantly more.
With respect to the “searching” identified as insignificant extra-solution activity in Step 2A Prong 2, when re-evaluated at Step 2B, this limitation is well-understood, routine, and conventional and remains insignificant extra-solution activity. See MPEP 2106.05(d)(II) “i. Receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 610, 118 USPQ2d 1744, 1745 (Fed. Cir. 2016) (using a telephone for image transmission); OIP Techs., Inc., v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015) (sending messages over a network); buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014) (computer receives and sends information over a network); but see DDR Holdings, LLC v. Hotels.com, L.P., 773 F.3d 1245, 1258, 113 USPQ2d 1097, 1106 (Fed. Cir. 2014) ("Unlike the claims in Ultramercial, the claims at issue here specify how interactions with the Internet are manipulated to yield a desired result‐‐a result that overrides the routine and conventional sequence of events ordinarily triggered by the click of a hyperlink." (emphasis added));” and “iv. Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93.”
Looking at the claims as a whole does not change this conclusion and the claim is ineligible.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claim(s) 1-4, 6, 8-11, 13, 15-18, and 20 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Gattiker et al (US 20150134666 A1).
As to claims 1, 8, and 15, Gattiker teaches an information processing method, apparatus, and medium for causing a computer to perform a process comprising:
specifying a terminal subject based on a parent-child relationship of subjects that correspond to a plurality of tags used in a document (Gattiker [0019] and [0034] disclose that multiple dictionaries, each having a corresponding subject, contain entries corresponding to descriptive terms (i.e. tags) that are associated with the subject, i.e., the terms (i.e. tags) that ordinarily occur in association with the subject in written documents. Gattiker further discloses hierarchical classification (i.e. parent-child relationship), where the hierarchy resembles an ontology. Terms (i.e. tags) are organized in a downward branching tree, in which each branch represents a different hierarchical sub-classification. Leaf nodes at the bottom of the tree have strong affinity (similarity) to each other (siblings), less affinity to nodes above the leaf nodes (parents), and even less affinity to other leaf nodes not under the same parent (cousins). Gattiker Figure 5C shows that the terms Crawdad and Worm (i.e. ) have a high affinity value with respect to each other, lower affinity values with respect to their parent term Bait, and still lower affinity values with respect to cousin terms Weights and Line.); and
calculating a vector of a tag that corresponds to the terminal subject based on each word included in definition information set for the terminal subject and a word vector dictionary that defines a vector of each word (Gattiker [0035] and [0076] disclose tagging documents prior to processing search queries by associating each document with classifications, i.e., the per-subject dictionaries, so that once a subject or subjects of a query is discovered, the appropriate documents can be retrieved. The document terms are matched with the dictionary entries. If a term matches a term in one of the dictionaries (decision 43), then the dictionary name or subject (or other suitable identifier for the dictionary), and optionally the SDP score and/or term, is added to the collection (step 45). Gattiker further discloses that dictionary-matching may be performed by finding all terms (i.e. tags) that exist in both the document and a dictionary of interest, generating a document-occurrence vector (i.e. vector of a tag) with one entry per common term, and generating a dictionary-vector (i.e. word vector dictionary) with one entry per common term in the same order as the vector above).
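For illustration only, the dictionary-matching step as characterized above (common terms, a document-occurrence vector, and a dictionary-vector in the same term order) can be sketched as follows. The dictionary contents and weights are hypothetical, and the sketch stops at building the two aligned vectors; it does not reproduce any scoring defined in Gattiker.

```python
from collections import Counter
from typing import Dict, List, Tuple


def aligned_vectors(
    document_terms: List[str],
    dictionary: Dict[str, float],
) -> Tuple[List[str], List[int], List[float]]:
    """Build a document-occurrence vector and a dictionary vector over common terms."""
    counts = Counter(document_terms)
    # Terms present in both the document and the dictionary, in a fixed order.
    common = sorted(t for t in counts if t in dictionary)
    doc_vec = [counts[t] for t in common]        # one entry per common term
    dict_vec = [dictionary[t] for t in common]   # same order as doc_vec
    return common, doc_vec, dict_vec


# Hypothetical per-subject dictionary (term -> weight) and document terms.
fishing = {"bait": 0.9, "worm": 0.7, "line": 0.6}
doc = ["worm", "bait", "worm", "garden"]

common, doc_vec, dict_vec = aligned_vectors(doc, fishing)
```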
As to claims 2, 9, and 16, Gattiker teaches calculating a vector of a tag that corresponds to a subject other than the terminal subject based on a parent-child relationship between a vector that corresponds to the terminal subject and the subject other than the terminal subject (Gattiker [0040]-[0041] disclose that, for the search terms "Crawdad" and "Worm", hierarchical retrieval would return the subject Fishing and an additional subject, Gardening. The Fishing dictionary appears first in the order because the Fishing dictionary is the best match to the search terms. Documents tagged with the Fishing dictionary are then ordered from highest-to-lowest match score against the Fishing dictionary. The Gardening dictionary appears next, and again the documents are ordered from highest-to-lowest match score against the Gardening dictionary. NOTE: It is the position of the Examiner that the highest scored subject (i.e. Fishing) is the terminal subject and the lowest scored subject (i.e. Gardening) is the subject other than the terminal subject. For the vector calculation corresponding to subjects, please see [0076].).
As to claims 3, 10, and 17, Gattiker teaches registering a relationship between the tag and the vector of the tag in a tag vector dictionary, and calculating a vector of the document based on the word vector dictionary and the tag vector dictionary (Gattiker [0035] and [0076] disclose tagging documents prior to processing search queries by associating each document with classifications, i.e., the per-subject dictionaries, so that once a subject or subjects of a query is discovered, the appropriate documents can be retrieved. The document terms are matched with the dictionary entries. If a term matches a term in one of the dictionaries (decision 43), then the dictionary name or subject (or other suitable identifier for the dictionary), and optionally the SDP score and/or term, is added to the collection (step 45). Gattiker further discloses that dictionary-matching may be performed by finding all terms (i.e. tags) that exist in both the document and a dictionary of interest, generating a document-occurrence vector (i.e. vector of a tag) with one entry per common term, and generating a dictionary-vector (i.e. word vector dictionary) with one entry per common term in the same order as the vector above).
As to claims 4, 11, and 18, Gattiker teaches calculating the vector of the tag that corresponds to the subject other than the terminal subject preferentially calculates the vector of the subject for which all vectors of the subjects included in the parent-child relationship of the subject other than the terminal subject are calculated (Gattiker [0040]-[0041] disclose that, for the search terms "Crawdad" and "Worm", hierarchical retrieval would return the subject Fishing and an additional subject, Gardening. The Fishing dictionary appears first in the order because the Fishing dictionary is the best match to the search terms. Documents tagged with the Fishing dictionary are then ordered from highest-to-lowest match score against the Fishing dictionary. The Gardening dictionary appears next, and again the documents are ordered from highest-to-lowest match score against the Gardening dictionary. NOTE: It is the position of the Examiner that the highest scored subject (i.e. Fishing) is the terminal subject and the lowest scored subject (i.e. Gardening) is the subject other than the terminal subject. For the vector calculation corresponding to subjects, please see [0076].).
As to claims 6, 13, and 20, Gattiker teaches the parent-child relationship includes a calculation relationship of the subjects that correspond to the plurality of tags, and the calculation relationship derives a value of a subject using a value of the terminal subject among the subjects that correspond to the plurality of tags (Gattiker [0034] discloses generating and storing information describing the frequency of occurrence of terms in proximity to other terms, the average distance (in words) between pairs of terms in each document, or other indicators of affinity between the terms (i.e. value). The statistics of term proximities (i.e. value) can be used to determine a distance between terms, which in turn may be used to determine which terms are grouped together at each level of the hierarchy. Terms that are adjacent most frequently, while not frequently appearing adjacent to other terms, can be collected to form groups or clusters, which then are placed in the lowest-level (bottom) row of the hierarchy. Terms that occur less frequently proximate the terms in a group, but occur equally frequently proximate the group and other groups, are placed at a next higher level in the hierarchy, and so forth. The classification process continues until the most generic term that is, on average, equally related to each of the highest sub-classifications is placed at the highest level of the hierarchy. The most generic term can be used as a descriptor of the subject of the dictionary).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 5, 12, and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Gattiker et al (US 20150134666 A1) in view of Hoehne et al (US 11281928 B1).
As to claims 5, 12, and 19, Gattiker fails to teach generating an index in which the vector of the document is associated with a registration position of the document, and when a search query is received, searching for a document that corresponds to the search query based on a vector of the search query and the index.
However, Hoehne teaches generating an index in which the vector of the document is associated with a registration position of the document, and when a search query is received, searching for a document that corresponds to the search query based on a vector of the search query and the index (Hoehne column 1, line 59 through column 2, line 7 discloses querying document terms and identifying target data from documents. Hoehne column 11, lines 41-55 discloses generating a character grid for the document 120 using the character and position information. Generating the character grid may include replacing characters of document 120 with an index value. Hoehne further discloses utilizing a dictionary to map a character to an index value. In some embodiments, the index value may be a vector. Document processing system 110A may generate the vector using model techniques such as, for example, Word2vec. Generating index values for the characters allows document processing system 110A to compile the character grid having index values for the characters contained within.).
Before the effective filing date, it would have been obvious to one of ordinary skill in the art to modify the teachings of Gattiker to incorporate the querying for document terms and identifying target data from documents using a character grid with position information, as taught by Hoehne, for the purpose of increasing the speed and relevancy with which desired data is retrieved by efficiently analyzing documents.
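For illustration only, the character grid described above (characters replaced by index values at their positions by means of a dictionary) can be sketched as follows. The grid size, the character-to-index dictionary, and the use of 0 for empty or unknown cells are hypothetical choices for this sketch; scalar index values are used here, whereas the cited passage notes that the index value may instead be a vector.

```python
from typing import Dict, List, Tuple


def build_character_grid(
    chars: List[Tuple[str, int, int]],
    height: int,
    width: int,
    char_to_index: Dict[str, int],
) -> List[List[int]]:
    """Place each character's index value at its (row, column) position."""
    # 0 marks empty cells and characters missing from the dictionary (assumption).
    grid = [[0] * width for _ in range(height)]
    for ch, row, col in chars:
        grid[row][col] = char_to_index.get(ch, 0)
    return grid


# Hypothetical recognized characters with their positions.
chars = [("A", 0, 0), ("B", 0, 1), ("A", 1, 2)]
char_to_index = {"A": 1, "B": 2}

grid = build_character_grid(chars, height=2, width=3, char_to_index=char_to_index)
```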
Claim(s) 7 and 14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Gattiker et al (US 20150134666 A1) in view of Mandelstein et al (US 20130031117 A1).
As to claims 7 and 14, Gattiker fails to teach the specifying the terminal subject specifies the terminal subject based on the calculation relationship of the subjects defined in taxonomy of an extensible business reporting language (XBRL) document.
However, Mandelstein teaches the specifying the terminal subject specifies the terminal subject based on the calculation relationship of the subjects defined in taxonomy of an extensible business reporting language (XBRL) document (Mandelstein [0012] discloses mapping data from a source data model (e.g., a data warehouse) to a target data model (e.g., eXtensible Business Reporting Language (XBRL)) used to file annual reports with a financial authority. Mandelstein [0056] and [0057] further disclose that, when the target ontology matches the ontological fingerprint of the data object as determined at step 502, concept matching is performed based on instances of the target data model (e.g., sample XBRL documents of a domain) and the value partition set of the data object (e.g., values of the table column in data warehouse system 120) at step 504. In particular, the instances of the target data are modeled as vectors (e.g., document vectors using a conventional term frequency--inverse document frequency (tf-idf) technique, where document terms are assigned a weight that is a statistical measure used to evaluate the importance of a word). The vectors are compared to the data values in the data object, and a conventional cosine distance measure is employed to compute a similarity value between the vectors and the data values. The similarity values are compared to one or more thresholds to determine the presence of a match for mapping (e.g., the similarity value for the vectors and data values may exceed or be below the thresholds, the quantity of vector terms and data values considered similar may be compared to thresholds to determine a match, etc.). When the vectors are sufficiently similar to the data values to provide a mapping as determined at step 506, the mapping is verified using a data type comparison, where the data types of the ontological fingerprint (e.g., Domain (r)) of the data object and concept in the target (e.g., XBRL) ontology are compared. This enables the target data model (e.g., XBRL) to comply with semantic requirements.).
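For illustration only, the tf-idf and cosine comparison summarized above can be sketched as follows. The token lists, the smoothed idf formula, and the 0.5 threshold are assumptions made for this sketch and are not taken from Mandelstein; every token list is assumed to be non-empty.

```python
import math
from collections import Counter
from typing import List, Tuple


def tfidf_vectors(docs: List[List[str]]) -> Tuple[List[str], List[List[float]]]:
    """Return the vocabulary and one tf-idf vector per token list."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    vocab = sorted(df)
    # Smoothed idf so that terms present in every document keep a non-zero weight.
    idf = {t: math.log(n / df[t]) + 1.0 for t in vocab}
    vecs = []
    for d in docs:
        tf = Counter(d)
        vecs.append([tf[t] / len(d) * idf[t] for t in vocab])
    return vocab, vecs


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical data: a target-model instance, matching column values, unrelated values.
docs = [["net", "income", "usd"], ["net", "income", "usd"], ["street", "address"]]
THRESHOLD = 0.5  # hypothetical match threshold

vocab, vecs = tfidf_vectors(docs)
sim_match = cosine_similarity(vecs[0], vecs[1])
sim_other = cosine_similarity(vecs[0], vecs[2])
is_match = sim_match >= THRESHOLD
```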
Before the effective filing date, it would have been obvious to one of ordinary skill in the art to modify the teachings of Gattiker to incorporate the mapping of data from a source data model (e.g., a data warehouse) to a target data model (e.g., eXtensible Business Reporting Language (XBRL)) used to file annual reports with a financial authority, as taught by Mandelstein, for the purpose of increasing the speed and relevancy with which desired data is retrieved by efficiently extracting data based on proper mapping.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
CHEN et al (US 20210357775 A1) - A method and system for mapping labels of documents is described. A training set including a plurality of documents and at least one map can be retrieved. Each document can include a plurality of labels, and the at least one map can represent associations between the labels of one document and another document in the set. Each document (or group of documents) in the set can include certain features. These features can relate to the labels in the documents. Each label can correspond to one or more data points (or datasets) in each document. In one example embodiment, the map can be generated based on the features extracted from each document.
Goodman et al (US 11087070 B1) - Disclosed are systems and methods for XBRL tag suggestion and validation. In some embodiments, the method includes the steps of: receiving an XBRL document associated with one or more assigned XBRL tags; analyzing the XBRL document using a trained machine learning model to generate one or more suggested XBRL tags and determine one or more corresponding confidence values; comparing the one or more assigned XBRL tags with the one or more suggested XBRL tags to generate comparison results; and determining a tag confidence value associated with each assigned XBRL tag of the one or more assigned XBRL tags based on the comparison results.
Yount et al (US 20170052931 A1) - A method of performing XBRL extension taxonomy concept replacement includes analyzing, by a processor, an XBRL document having XBRL tags to identify an XBRL extension taxonomy concept of an XBRL extension taxonomy that is superfluous in comparison with an XBRL base taxonomy concept for an XBRL base taxonomy upon which the XBRL extension taxonomy is based. The processor is configured to identify an extension extended linkrole in the XBRL extension taxonomy that includes the identified XBRL extension taxonomy concept, determine a base extended linkrole in the XBRL base taxonomy that matches the extension extended linkrole, determine an XBRL base taxonomy concept in the base extended linkrole that matches the identified XBRL extension taxonomy concept, and replace the identified XBRL extension taxonomy concept with the XBRL base taxonomy concept in the base extended linkrole.
Malik et al (US 20120278336 A1) - Systems and techniques are disclosed for representing information included in unstructured text documents into a structured format. The systems and techniques identify events and information associated with the events in unstructured documents, classify the identified events and information, and represent the identified events and information in a structured format based on a computed classification score. The systems and techniques may also assign a confidence score to identified events, compare the confidence score associated with events to a confidence score associated with a trained confidence model, and represent the identified events and information associated with the events in a structured format based on the comparison.
Block et al (US 6947947 B2) - A method for adding labels to data, for example XML compliant or XBRL compliant labels, includes a) identifying data in an electronically represented file, b) selecting labels that correspond to text strings in the identified data, based on a list associating labels with text strings, and c) adding the selected labels into the electronically represented file to label the text strings and elements in the identified data associated with the text strings. The labels include information about the data and are defined in one or more taxonomies. When the list does not associate a label with the text string, a user can be prompted to select a label corresponding to a text string in the identified data. The association indicated by the user's selection, can then be added to the list associating labels with text strings.
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JARED M BIBBEE whose telephone number is (571)270-1054. The examiner can normally be reached Monday-Thursday 8AM-6PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, APU MOFIZ, can be reached at (571)272-4080. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JARED M BIBBEE/Primary Examiner, Art Unit 2161