DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Amendments
This Office Action is in response to the amendment filed on December 30, 2025.
Claims 1, 6, and 7 have been amended.
Claims 2, 4-5, and 8 have been cancelled.
No new claims have been added.
The objections and rejections from the prior correspondence that are not restated herein are withdrawn.
Response to Arguments
Applicant's arguments filed on December 30, 2025 have been fully considered.
Applicant's arguments regarding the 35 U.S.C. 101 rejections of the previous office action have been fully considered but are not persuasive. Applicant argues:
“Applicant's approach of using a semantic similarity matrix between embedding vectors of targets, integrated with a portfolio optimization under gain constraints, is not abstract but unconventional, non-routine, and not well understood. It is respectfully submitted that for several reasons the rejection under 35 U.S.C. § 101 does not apply.
First, claim 1 is clearly directed to a unique ordered combination of technical elements that provides a neural networked, superior computer-based tool that employs techniques utilizing hardware for computing a return as a difference between observed prices before and after a specific period that optimizes the portfolio under any constraint requiring a designated gain to be achieved within a defined period. The method of claim 1 does so more effectively and more efficiently than any generic computer could ever accomplish and in real-time.
Second, claim 1 provides a dynamic optimization in which a portfolio vector is determined by minimizing risk subject to a constraint that a given gain (E[b,e]) must be achieved during a specified period ([b,e]) (see support in paragraphs [0080]-[0083]). This is a novel and unconventional approach, and is not in any way abstract.
Thirdly, the unique ordered combination of technical elements recited in claim 1 both integrate the abstract idea into a practical application, individually or in combination, and is sufficient to amount to significantly more than the judicial exception. These claim elements, when considered alone and in combination, are not directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Moreover, the above-mentioned structural elements and process steps cannot be accomplished with just any computer.
As such, the claims integrate any abstract idea into a practical application, as these steps represent a particular link (rather than a general link) between technology and abstraction, referring to MPEP § 2106.05(e). Thus, the claims are therefore not directed to the abstract idea.
Here, the claims as amended herein provide a specific ordered, combination of elements that amount to meaningful limitations that transform the claims into patent-eligible subject matter. The added unique technical limitations discussed above provide a unique ordered, combination of elements that are not well-understood, non-routine, and unconventional.
This claim further characterizes the operation of the hardware and is a definite improvement to the technology, and thus is not in any way abstract. Therefore, it is submitted that independent claim 1 is directed to patent eligible subject matter under 35 U.S.C. § 101.”
Examiner respectfully disagrees. It should be noted that claim limitations that remain within the realm of abstract ideas, such as optimizing a portfolio vector representation, do not integrate the judicial exception into a practical application. Furthermore, an unconventional approach of using specific mathematical elements for optimizing a vector does not address subject matter eligibility. Examiner notes that unconventional approaches such as specific ordering of mathematical elements or operations may help applicants in distinguishing claimed inventions from prior art, which is a non-obviousness issue. Regarding claim 1, the claim is directed to, per remarks “utilizing hardware for computing a return as a difference between observed prices before and after a specific period that optimizes the portfolio under any constraint requiring a designated gain to be achieved within a defined period.” The computed return is a difference of prices (i.e., numerical values) to help make financial decisions for optimizing portfolios. The process for computing the return is directed to mathematical calculations (e.g., using semantic similarity matrix and observed price data for computing an optimized allocation vector), optimizing the portfolio is directed to mathematical concepts, and the additional elements recited in the claims simply apply the judicial exceptions recited in the claims using generic computer components (i.e., at least one processor; at least one a vector arithmetic circuit including, but not limited to, a GPU, an FPGA, or an ASIC). Even if the method computes the optimization for a portfolio more efficiently than any other known method implemented in generic computer components, the improvement occurs in the computation of the return for optimizing the portfolio, and thus the additional elements do not provide an improvement to computer functionality or technological field, nor integrate the abstract idea into a practical application. 
Moreover, the additional elements do not amount to significantly more than the judicial exception. Claim 1 explicitly states that the method may utilize a GPU, FPGA, or an ASIC to achieve a neural network that executes the steps of the claimed method. GPUs, FPGAs, and ASICs are generic computer components.
Applicant further argues:
“Thus, it is submitted that dependent claim 3, and analogous independent method claim 6, as well as analogous independent non-transitory storage medium claim 7, are also directed to patent eligible subject matter under 35 U.S.C. § 101.
The claims, as a whole, are not directed to an abstract idea and integrate a practical application and recite additional elements which constitute significantly more than the abstract idea. Therefore, claims 1, 3, 6, and 7 are patent eligible under the Alice/Mayo analysis.”
Examiner respectfully disagrees. Claim 3 recites further limitations that do not integrate the abstract idea into a practical application, do not provide improvements to technology, and do not amount to significantly more than the judicial exception, as shown in detail in the 101 rejections below. Furthermore, independent claims 6 and 7 recite the same steps as independent claim 1 and add additional limitations that likewise do not integrate the abstract idea, provide improvements to technology, or amount to significantly more than the judicial exception. For this reason, claims 6 and 7 are rejected in a similar manner as independent claim 1, as shown in detail in the 101 rejections below.
Applicant further argues:
“The Examiner's rationale rests on knowledge of the Applicant's disclosure. None of the references suggests using:
target-specific embedding vectors learned from NLP text input,
computation of a semantic similarity matrix between targets, and
integration of that matrix into a constrained portfolio optimization.
The proposed combination is only apparent when starting from the Applicant's claims and
working backward, which is a classic form of hindsight reconstruction. Therefore, the Examiner's reasoning lacks the required articulated rationale to be an abstract idea without significantly more.”
Examiner respectfully disagrees. It should be noted that the concept of impermissible hindsight pertains to 35 USC 103 obviousness analysis and not to Alice/Mayo analysis for subject matter eligibility. As described in detail above and in the 101 rejections section, the claims are directed to an abstract idea. Furthermore, the additional elements recited in the claim do not integrate the judicial exception into a practical application, do not provide improvements to technology, nor amount to significantly more than the judicial exception.
Applicant's arguments regarding the 35 U.S.C. 103 rejections of the previous office action have been fully considered but are not persuasive. Applicant argues:
“However, independent claim 1, and analogous independent claims 6 and 7, have been amended herein with features that are neither taught, suggested, nor disclosed […]. The Examiner's combination of the five cited references (Hu, Qi, Miller, Guosheng, and Markowitz) is improper for the following reasons.
Hu relates to predicting stock price trends from news using an NLP-based hybrid attention network.
Qi relates to classifying Chinese news headlines via semantic enhancement and multi-level label embeddings.
Miller concerns key-value memory networks for reading documents and answering questions.
Guosheng performs stock clustering based solely on statistical properties of price series (e.g., modularity optimization, Sharpe ratios), without using NLP or embeddings.
Markowitz is the foundational modern portfolio theory reference, which optimizes a portfolio using covariance of returns, not semantic similarity matrices or embeddings.
These references address entirely different problem domains and employ disparate technical approaches. There is no teaching or suggestion in the cited art that would lead a POSITA to combine:
NLP-based extraction of word-level and context-level embedding vectors (Hu, Qi, Miller),
price-series clustering unrelated to NLP (Guosheng), and
classical covariance-based portfolio optimization (Markowitz),
to arrive at Applicant's approach of using a semantic similarity matrix between embedding vectors of targets, integrated with a portfolio optimization under gain constraints.
While the Examiner may regard the claimed "risk calculation based on the similarity matrix and the portfolio vector" as analogous to the variance of portfolio return in Markowitz, the two models are fundamentally different. Markowitz discloses a static optimization model that minimizes the variance of expected returns (μ_i) using a covariance matrix derived from historical return data. However, Markowitz neither computes the return as a difference between observed prices before and after a specific period, nor optimizes the portfolio under any constraint requiring a designated gain to be achieved within a defined period. Accordingly, Markowitz neither teaches nor suggests determining, for any arbitrary period, a portfolio vector constrained to achieve a desired gain, nor calculating risk based on a semantic similarity matrix obtained from text-based embeddings.
The Examiner's proposed combination of Hu, Qi, Miller, Guosheng, and Markowitz therefore still depends on impermissible hindsight, since none of the cited references, alone or in combination, discloses or suggests these distinctive features of the present invention. Thus, the invention according to amended independent claim 1, and analogous independent claims 6 and 7, is novel and non-obvious. Therefore, amended independent claim 1, and analogous independent claims 6 and 7, obviate the rejections, and thus dependent claim 3 obviates the rejections and is in allowable form. The Applicant respectfully submits that with the amendments to claims 1, 6, and 7, the claim rejections should be obviated and/or rendered moot.”
Examiner respectfully disagrees. Regarding the QI reference, the reference explicitly teaches learning deep characteristics of Chinese news texts such as in finance (see QI [pg. 5, Table I], [pg. 6, Fig. 2], and [pg. 3, Table III]). The claim language does not restrict the types of text the method can process, and thus Chinese finance news texts, or financial news in other languages, can be used to learn deep characteristics of finance texts. Regarding MILLER, the reference teaches a method for answering questions from text by using a knowledge base (i.e., information storage). A POSITA having the teachings of HU, which uses natural language processing methods to process news for predicting trends and constructing a portfolio, would see the benefit of using QI's deep characteristic learning for finance news articles, and of using MILLER's assignment of relevance probabilities to look up relevant information for the prediction questions posed by HU for predicting stock price trends. Also, HU [pg. 263, section 3.3 Effective and Efficient Learning] explicitly teaches that “News cannot always provide an informative indication of the stock trend, especially when there exists only an insufficient number of news about specific stocks in a period.” Additionally, HU [pg. 269, section 6. Conclusion and Future Work] lays the groundwork for bringing in GUOSHENG to further improve HU's portfolio construction strategy. Specifically, HU's conclusion section teaches “In the future, beyond modeling the sequence of news related to one stock, we plan to further leverage the relationship between news related to different individual stocks, according to their industrial connections in real world. Moreover, we will investigate how to integrate the news-oriented approach with technical analysis for more accurate stock trend prediction.” Therefore, a POSITA can be motivated by the GUOSHENG reference to further leverage the relationship between news related to different individual stocks.
GUOSHENG [Abstract & Introduction] proposes a portfolio construction strategy and optimization to address selection and weighting of assets to be held in the portfolio. GUOSHENG [pg. 2708, sections 2.2 and 3.2] teaches computing the cosine similarity between learned feature vector representations of stocks, where the learned features capture nonlinear dynamics and semantic information. Under BRI, these cosine similarities correspond to the claimed semantic similarity matrix. Regarding MARKOWITZ, the reference is brought in to specifically teach:
optimize, for a given gain and a given period, a portfolio vector representing an allocation to the plurality of targets by minimizing a risk based […] the prices of the targets in the given period under a constraint condition that the portfolio vector achieves the given gain in the given period. (MARKOWITZ [pg. 81] teaches: “Let X_i be the percentage of the investor's assets (i.e., a portfolio vector representing an allocation to the plurality of targets) which are allocated to the i-th security.” MARKOWITZ [pg. 82] teaches: “The E-V rule states that the investor would (or should) want to select one of those portfolios which give rise (i.e., optimize) to the (E, V) combinations indicated as efficient in the figure; i.e., those with minimum V (i.e., by minimizing risk based on […] the prices of the targets) for given E (i.e., for a given gain) or more and maximum E for given V or less. […] The investor, being informed of what (E, V) combinations were attainable, could state which he desired. We could then find the portfolio which gave this desired combination.” MARKOWITZ [pg. 91] teaches: “To use the E-V rule in the selection of securities we must have procedures for finding reasonable μ_i and σ_ij. These procedures, I believe, should combine statistical techniques and the judgment of practical men. […] Using this revised set of μ_i and σ_ij, the set of efficient E, V combinations could be computed, the investor could select the combination he preferred, and the portfolio which gave rise to this E, V combination could be found. One suggestion as to tentative μ_i, σ_ij is to use the observed μ_i, σ_ij for some period of the past (i.e., for a given period).” MARKOWITZ [pg. 85] teaches: “The efficient line begins at the attainable point with minimum variance (in this case on the ab line).” Examiner’s note: Under broadest reasonable interpretation, “under a constraint condition that the portfolio vector achieves the given gain in the given period” can be interpreted as the investor selecting the preferred combination of (E, V) using observations μ_i and σ_ij of some past period that meets the investor’s desired combination for highest returns based on the portfolio diversification (i.e., price of targets in the given period).)
Analogous independent claims 6 and 7 are rejected in a similar manner as independent claim 1. The rejection of claim 3 is maintained as shown in the previous Office Action. As noted, the references of HU, GUOSHENG, and MARKOWITZ are all related to the finance domain. Therefore, the references of HU, GUOSHENG, QI, MILLER, and MARKOWITZ provide sufficient motivation for a POSITA to combine them to improve portfolio selection and performance, as shown in detail in the 103 rejections below.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1, 3, and 6-7 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: Claims 1, 3, and 7 are directed to a machine or an article of manufacture. Claim 6 is directed to a method.
With respect to claims 1, 6, and 7:
2A Prong 1: The claim(s) recite(s) an abstract idea. Specifically:
optimizing a portfolio (Mathematical concepts – Optimizing a portfolio involves mathematical calculations (see paragraphs [0076]-[0083]) – see MPEP § 2106.04(a)(2)(I))
determining statuses of the plurality of targets at each of dates and times from the past date and time to the base date and time by: extracting a word level feature vector and a context level feature vector from each text released at each of dates and times; (Mathematical concept – extracting a word level feature vector and a context level feature vector involves mathematical calculations (see paragraph [0041]) – see MPEP § 2106.04(a)(2)(I))
determining a weight for the each text, based on an inner product of the word level feature vector extracted from each text and each of the plurality of embedding vectors; (Mathematical concept – determining a weight based on an inner product of vectors involves mathematical calculations – see MPEP § 2106.04(a)(2)(I))
multiplying the context level feature vector extracted from the each text by the determined weight of the each text and taking a sum; and (Mathematical concept – multiplying vectors and taking a sum involves mathematical calculations– see MPEP § 2106.04(a)(2)(I))
compute/computing, based on similarities between the plurality of embedding vectors included in the trained model, a semantic similarity matrix between pairs of targets in the plurality of targets; and (Mathematical concept – computing a similarity matrix between a plurality of embedding vectors involves mathematical calculations such as the cosine similarity (see paragraph [0081]) – see MPEP § 2106.04(a)(2)(I))
optimize/optimizing, for a given gain and a given period, a portfolio vector representing an allocation to the plurality of targets by minimizing a risk based on the semantic similarity matrix and the prices of the targets in the given period under a constraint condition that the portfolio vector achieves the given gain in the given period. (Mathematical concepts – Optimizing a portfolio vector involves mathematical calculations (see paragraphs [0076]-[0083]) – see MPEP § 2106.04(a)(2)(I))
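For illustration only, the kind of constrained minimum-risk optimization characterized in the limitation above can be sketched as follows. This is a minimal sketch under assumed inputs (a similarity matrix standing in for the risk term, per-target returns over the period, and a designated gain), not the claimed implementation:

```python
import numpy as np
from scipy.optimize import minimize

def optimize_portfolio(similarity, returns, target_gain):
    """Minimize w' S w subject to achieving a designated gain.

    similarity: (n, n) symmetric matrix used as the risk term.
    returns: (n,) per-target returns over the given period.
    target_gain: gain the portfolio vector w must achieve.
    """
    n = len(returns)
    constraints = [
        # Portfolio must achieve the designated gain in the period.
        {"type": "eq", "fun": lambda w: w @ returns - target_gain},
        # Allocations must sum to one.
        {"type": "eq", "fun": lambda w: w.sum() - 1.0},
    ]
    result = minimize(
        lambda w: w @ similarity @ w,          # risk to minimize
        x0=np.full(n, 1.0 / n),                # start from equal weights
        bounds=[(0.0, 1.0)] * n,               # long-only allocations
        constraints=constraints,
        method="SLSQP",
    )
    return result.x
```

The equality constraint on the gain is what distinguishes this formulation from an unconstrained variance minimization; the long-only bounds are an added assumption for the sketch.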
Because the claim limitations, under their broadest reasonable interpretation, are directed to mathematical calculations and relationships as identified above, they fall within the mathematical concepts grouping of abstract ideas. Accordingly, the claim “recites” an abstract idea.
2A Prong 2: The additional elements recited in the claim(s) do not integrate the abstract idea into a practical application, individually or in combination.
Additional elements:
(Claim 1) An information processing device for (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
(Claim 1) at least one processor, (Mere recitation of a generic computer component – see MPEP § 2106.05(b)(I))
(Claim 6) A method of controlling an information processing device […] the information processing device comprising […] (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
(Claim 7) A non-transitory computer-readable information recording medium storing a program […] the program causing a computer comprising […] (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
[…] at least one a vector arithmetic circuit including, but not limited to, a GPU, an FPGA, or an ASIC to achieve a neural network that […] (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
accepts a set of texts released at each of a plurality of dates and times from a past date and time before a base date and time to the base date and time as input, (Mere data gathering – Adding insignificant extra-solution activity of mere data gathering to the judicial exception – see MPEP § 2106.05(g).)
outputs a classification indicating whether a price of each of a plurality of targets has increased or decreased since a date and time immediately before the base date and time until the base date and time (Adding insignificant extra-solution activity to the judicial exception – see MPEP § 2106.05(g).)
includes, in a model, a plurality of embedding vectors in which features of the plurality of targets are respectively embedded, (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
(Claim 1) a memory to store the plurality of embedding vectors, and instructions to be executed by said at least one processor (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
(Claim 1) wherein, the instruction causes said at least one processor to train the model by: […] (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
(Claim 6) the method causing the information processing device to execute processing of: training the model by: […] (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
(Claim 7) the program causing the computer to execute processing of: training the model by: […] (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.
2B: The claim(s) do(es) not include additional elements that are sufficient to amount to significantly more than the judicial exception.
Additional elements:
(Claim 1) An information processing device for (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
(Claim 1) at least one processor, (Mere recitation of a generic computer component – see MPEP § 2106.05(b)(I))
(Claim 6) A method of controlling an information processing device […] the information processing device comprising […] (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
(Claim 7) A non-transitory computer-readable information recording medium storing a program […] the program causing a computer comprising […] (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
[…] at least one a vector arithmetic circuit including, but not limited to, a GPU, an FPGA, or an ASIC to achieve a neural network that […] (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
accepts a set of texts released at each of a plurality of dates and times from a past date and time before a base date and time to the base date and time as input, (Simply appending well-understood, routine, conventional activities previously known to the industry, specified at a high level of generality, to the judicial exception (WURC) – see MPEP § 2106.05(d)(II)(i) – Receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information).)
outputs a classification indicating whether a price of each of a plurality of targets has increased or decreased since a date and time immediately before the base date and time until the base date and time (Simply appending well-understood, routine, conventional activities previously known to the industry, specified at a high level of generality, to the judicial exception (WURC) – see MPEP § 2106.05(d)(II)(i) – Receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information).)
includes, in a model, a plurality of embedding vectors in which features of the plurality of targets are respectively embedded, (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
(Claim 1) a memory to store the plurality of embedding vectors, and instructions to be executed by said at least one processor (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
(Claim 1) wherein, the instruction causes said at least one processor to train the model by: […] (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
(Claim 6) the method causing the information processing device to execute processing of: training the model by: […] (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
(Claim 7) the program causing the computer to execute processing of: training the model by: […] (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
With respect to claim 3:
2A Prong 2: The additional elements recited in the claim do not integrate the abstract idea into a practical application, individually or in combination.
Additional elements:
wherein the training device includes a bidirectional gated recurrent unit (Bi-GRU) and a multilayer perceptron (MLP). (Generally linking the use of a judicial exception to a particular technological environment or field of use – see MPEP § 2106.05(h).)
2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
Additional elements:
wherein the training device includes a bidirectional gated recurrent unit (Bi-GRU) and a multilayer perceptron (MLP). (Generally linking the use of a judicial exception to a particular technological environment or field of use – see MPEP § 2106.05(h).)
Therefore, the claim is ineligible.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 3, and 6-7 are rejected under 35 U.S.C. 103 as being unpatentable over HU ("Listening to Chaotic Whispers: A Deep Learning Framework for News-oriented Stock Trend Prediction") in view of GUOSHENG ("Deep Stock Representation Learning: From Candlestick Charts To Investment Decisions"), QI ("Semantic Enhancement and Multi-level Label Embedding for Chinese News Headline Classification"), MILLER ("Key-Value Memory Networks for Directly Reading Documents"), and MARKOWITZ ("Portfolio Selection"), hereafter HU, GUOSHENG, QI, MILLER, and MARKOWITZ respectively.
Regarding Claim 1:
HU teaches:
An information processing device for optimizing a portfolio, comprising: (HU [pg. 268, section 5.4] teaches: "Based on these scores, a straightforward portfolio construction strategy called top-K selects K stocks with the highest scores to construct a new portfolio for the next trading day.")
[…] a neural network that, accepts a set of texts released at each of a plurality of dates and times from a past date and time before a base date and time to the base date and time as input, (HU [pg. 262, section I. Introduction] teaches: "we design a Hybrid Attention Networks (HAN) to predict the stock trend based on the sequence of recent related news." HU [pg. 264, section 4.1 Problem Statement] teaches: "For a given date t and a given stock s, we can calculate its rise percent by: Rise_Percent(t) = (Open_Price(t+1) - Open_Price(t)) / Open_Price(t)".)
outputs a classification indicating whether a price of each of a plurality of targets has increased or decreased since a date and time immediately before the base date and time until the base date and time and (HU [pg. 265, section 4.2 Hybrid Attention Networks] teaches: "Trend Prediction: The final discriminative network is a standard Multi-layer Perceptron (MLP), which takes V as input and produces the three-class classification of the future stock trend." HU [pg. 264, section I. Introduction] teaches: "The stock trend prediction task can be formulated as follows: given the length of a time sequence N, the stock s and date t, the goal is to use the news corpus sequence from time t-N to t-1, denoted as [C(t-N), C(t-N+1), ..., C(t-1)], to predict the class of Rise_Percent(t), i.e. DOWN, UP, or PRESERVE. Note that each news corpus C(i) contains a set of news with the size of L, C(i) = [n(i,1), n(i,2), …, n(i,L)], denoting L related news on date i.")
determining statuses of the plurality of targets at each of dates and times from the past date and time to the base date and time by: (HU [pg. 266, section 5.1 Experimental Setup] teaches: “we collected 1,271,442 economic news between 2014 and 2017. […] For each stock, we then aggregate all the news in a certain date to construct the daily news corpus.” HU [pg. 265, section 4.2 Hybrid Attention Networks] teaches: "[…] we calculate the overall corpus vector d_t as a weighted sum of each news vector respectively, and use this vector to represent all news information for date t. Thus, we get a temporal sequence of corpus vector D = {d_i}, i ∈ [1, N]. […] To encode the temporal sequence of corpus vectors, we adopt Gated Recurrent Units (GRU). […] At date t, the GRU computes the news state h_t by linearly interpolating the previous state h_{t-1} and the current updated state h̃_t […]. The current updated state h̃_t is computed by non-linearly combining the corpus vector input for this time-stamp and the previous state […]. Therefore, we can get the latent vector for each date t through GRU.” Examiner’s note: under BRI, “the plurality of targets” can be interpreted as the stocks for which the news corpus is constructed, and “determining statuses of the plurality of targets” can be interpreted as V, which is the result of sequential modeling and temporal attention calculation based on the computed news state h_t at date t based on a previous state h_{t-1} for each date t (i.e., at each of dates and times from the past date and time to the base date and time). Additionally, paragraph [0024] of the present application defines "dates and times" as "As a unit of date and time, an appropriate unit, such as 1 day, 12 hours, 1 hour, and 30 minutes, can be employed,". Furthermore, paragraph [0003] of the present application states: "Studies for predicting whether the price of a target at a date and time t (for example, tomorrow) has increased or decreased with respect to the price of the target at a date and time t-1 immediately before the date and time t (for example, today), based on texts released at respective dates and times t-1, t-2, t-3, … before the date and time t, using deep learning and a neural net have been conducted (Non Patent Literature 1)", Non Patent Literature 1 being the HU reference. Therefore, HU's date t satisfies the conditions for determining the status of a target at a past date and time.)
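For illustration, the GRU interpolation HU describes can be sketched as follows (the state vectors and gate values here are invented for the example; HU's actual gates are learned parameters):

```python
import numpy as np

def gru_state(h_prev: np.ndarray, h_tilde: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Linear interpolation between the previous state h_{t-1} and the
    candidate state h̃_t, gated element-wise by the update gate z_t."""
    return (1.0 - z) * h_prev + z * h_tilde

# Toy example: with z = 0 the previous state is carried over unchanged;
# with z = 1 it is fully replaced by the candidate state.
h_prev = np.array([0.2, -0.4])
h_tilde = np.array([1.0, 1.0])
z = np.array([0.5, 0.5])
h_t = gru_state(h_prev, h_tilde, z)  # element-wise midpoint of the two states
```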
extracting a word level feature vector […] from each text released at each of dates and times; (HU [pg. 264, section 4.2 Hybrid Attention Networks] teaches: “For each i-th news (i.e., from each text) in news corpus C_t of date t, we use a word embedding layer to calculate the embedded vector for each word (i.e., extracting a word level feature vector) and then average all the words’ vectors to construct a news vector n_{ti}.”)
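For illustration, HU's word-level step (embed each word, then average) can be sketched as follows (the toy embedding table is invented; HU uses a learned embedding layer):

```python
import numpy as np

# Hypothetical word-embedding table (word -> vector); a real system would
# look these up in a trained embedding layer.
EMBED = {
    "stock": np.array([1.0, 0.0]),
    "rises": np.array([0.0, 1.0]),
}

def news_vector(words: list[str]) -> np.ndarray:
    """Average the embedded word vectors of one news text into a single
    news vector n_{ti}, as in HU's word-level step."""
    vecs = np.stack([EMBED[w] for w in words])
    return vecs.mean(axis=0)

n = news_vector(["stock", "rises"])  # average of the two word vectors
```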
classifying the targets into the classifications by the determined statuses; (HU [pg. 265, section 4.2 Hybrid Attention Networks] teaches: "Then we use β to calculate the weighted sum V , so that it can incorporate the sequential news context information with temporal attention, and will be used for classification. Trend Prediction: The final discriminative network is a standard Multi-layer Perceptron (MLP), which takes V as input and produces the three-class classification of the future stock trend.")
HU is not relied upon for teaching:
at least one processor, at least one a vector arithmetic circuit including, but not limited to, a GPU, an FPGA, or an ASIC to achieve a neural network [...]
includes, in a model, a plurality of embedding vectors in which features of the plurality of targets are respectively embedded, and
a memory to store the plurality of embedding vectors, and instructions to be executed by said at least one processor, wherein, the instruction causes said at least one processor to train the model by:
extracting […] a context level feature vector […]
determining a weight for the each text, based on an inner product of the word level feature vector extracted from each text and each of the plurality of embedding vectors; and
multiplying the context level feature vector extracted from the each text by the determined weight of the each text and taking a sum; and
compute, based on similarities between the plurality of embedding vectors included in the trained model, a semantic similarity matrix between pairs of targets in the plurality of targets; and
optimize, for a given gain and a given period, a portfolio vector representing an allocation to the plurality of targets by minimizing a risk based on the semantic similarity matrix and the prices of the targets in the given period under a constraint condition that the portfolio vector achieves the given gain in the given period.
However, GUOSHENG teaches: includes, in a model, a plurality of embedding vectors in which features of the plurality of targets are respectively embedded (GUOSHENG [pg. 2707, Fig. 2] teaches: "The 512D feature following average pooling provides our representation for clustering and portfolio construction." GUOSHENG [pg. 2707, section 2.1 Deep Feature Learning with CAEs] teaches: "[...] this single 512D vector encodes the 20-day 4-channel price history of the stock, and will provide the representation for further processing [...]". GUOSHENG [pg. 2708, section 3.1 Dataset and Settings] teaches: "We use all the stocks in FTSE 100 from 4th Jan 2000 to 14th May 2017." Examiner's note: under BRI, "a plurality of embedding vectors in which features of the plurality of targets are respectively embedded" can be interpreted as the 512D feature for representing one stock for each of the stocks used during processing. Furthermore, “a model” can be interpreted as the CAE that encodes the stocks as embedded representations.)
compute, based on similarities between the plurality of embedding vectors included in the trained model, a semantic similarity matrix between pairs of targets in the plurality of targets; (GUOSHENG [pg. 2706, section 1 Introduction] teaches: "To solve the aforementioned problems, we propose to use deep learning (DL) features for stock similarity measurement instead of raw time series." GUOSHENG [pg. 2707, section 2.1 Convolutional Autoencoder] teaches: "Thus this single 512D vector encodes the 20-day 4-channel price history of the stock, and will provide the representation for further processing (clustering and portfolio construction)." GUOSHENG [pg. 2708, section 2.2 Clustering] teaches: "We next aim to provide a clustering method for diversified – and hence low risk – portfolio selection. [...] To solve these problems, we introduce the network modularity method [18] to find the cluster structure of the stocks, where each stock is set as one node and the link between each pair of stocks is set as the cosine similarity calculated by our learned CAE features.” GUOSHENG [pg. 2708, section 3.2 Quantitative Results] teaches: “This illustrates the efficacy of our CAE and learned feature for capturing semantic information about stocks.” Examiner's note: under BRI, "a semantic similarity matrix" can be interpreted as the calculated cosine similarity for each pair of stocks, which results in a plurality of cosine similarity values as the links for each pair of stocks. All of the cosine similarity values linking the pairs of stocks constitute a matrix of cosine similarities capturing semantic information learned by CAE features. The pairs of stocks are represented as the 512D vector which contains encoded historical information of the stock.)
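For illustration, GUOSHENG's pairwise cosine-similarity links form exactly such a matrix; a minimal sketch (the toy 2-D embeddings stand in for GUOSHENG's 512D CAE features):

```python
import numpy as np

def cosine_similarity_matrix(E: np.ndarray) -> np.ndarray:
    """Given an (n_stocks, dim) matrix of embedding vectors, return the
    (n_stocks, n_stocks) matrix of pairwise cosine similarities."""
    norms = np.linalg.norm(E, axis=1, keepdims=True)
    U = E / norms        # unit-normalize each embedding
    return U @ U.T       # S[i, j] = cos(E[i], E[j])

# Toy embeddings: stocks 0 and 1 are orthogonal; stock 2 sits between them.
E = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
S = cosine_similarity_matrix(E)
```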
optimize, for […] a given period, a portfolio […] representing an allocation to the plurality of targets by minimizing a risk based on the semantic similarity matrix and the prices of the targets in the given period […] (GUOSHENG [pg. 2707, section 2 Methodology] teaches: "Finally, we perform portfolio construction by choosing stocks with the best performance measured by Sharpe ratio [19] from each cluster (i.e., and the prices of the targets in the given period)." GUOSHENG [pg. 2709, section 3.3 Quantitative Results] teaches: “We perform backtesting to compare our full portfolio (i.e., a portfolio […] representing an allocation to the plurality of targets) optimisation strategy against the market benchmark (FTSE 100 Index). […] We evaluate features and clustering methods over a long term period (4K trading days) (i.e., for […] a given period).” GUOSHENG [pg. 2707, section ] teaches: “In the next clustering step, we aim to segment the market into diverse sectors in a data-driven way. This is important to provide risk reduction (i.e., minimizing a risk) by selecting a well diversified portfolio [16, 17].” GUOSHENG [pg. 2708, section 2.2 Clustering] teaches: "We next aim to provide a clustering method for diversified – and hence low risk – portfolio selection. [...] To solve these problems, we introduce the network modularity method [18] to find the cluster structure of the stocks, where each stock is set as one node and the link between each pair of stocks is set as the cosine similarity calculated by our learned CAE features (i.e., based on the semantic similarity matrix).” GUOSHENG [pg. 2708, section 3.2 Quantitative Results] teaches: “This illustrates the efficacy of our CAE and learned feature for capturing semantic information about stocks.” GUOSHENG [pg. 2708, section 2.3 Portfolio Construction and Backtesting] teaches: “Given the learned stock clustering (market segmentation), we construct a complete portfolio by picking diverse yet high-return stocks, and evaluate the result.”)
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of HU and GUOSHENG before them, to include GUOSHENG’s similarity calculation in HU’s news-oriented stock prediction framework. One would have been motivated to make such a combination in order to improve portfolio construction based on stock similarity features (GUOSHENG [pg. 2709, section 4 Conclusions]).
HU in view of GUOSHENG is not relied upon for teaching:
at least one processor, at least one a vector arithmetic circuit including, but not limited to, a GPU, an FPGA, or an ASIC to achieve a neural network [...]
a memory to store the plurality of embedding vectors, and instructions to be executed by said at least one processor, wherein, the instruction causes said at least one processor to train the model by:
extracting […] a context level feature vector […]
determining a weight for the each text, based on an inner product of the word level feature vector extracted from each text and each of the plurality of embedding vectors; and
multiplying the context level feature vector extracted from the each text by the determined weight of the each text and taking a sum; and
optimize, for a given gain and a given period, a portfolio vector representing an allocation to the plurality of targets by minimizing a risk based […] the prices of the targets in the given period under a constraint condition that the portfolio vector achieves the given gain in the given period.
However, QI teaches: at least one processor, at least one a vector arithmetic circuit including, but not limited to, a GPU, an FPGA, or an ASIC to achieve a neural network [...] (QI [pg. 5, section C. Implementation Details] teaches: “all experiments in the paper are completed on the open source framework PyTorch on Linux CentOS 7.6.1810 system with NVIDIA TITAN Xp GPU (12G graphics memory).” QI [pg. 4, section D. Model Training] teaches: “Due to the classification of news headline is discrete, we employ a multi-label cross entropy as the loss function to calculate the whole loss: […]. Cross entropy is used to describe the difference between the predicted value and the real value of the model, and the stochastic gradient descent algorithm is used to optimize and adjust the model parameters (i.e., to achieve a neural network).”)
a memory to store the plurality of embedding vectors, and instructions to be executed by said at least one processor, wherein, the instruction causes said at least one processor to train the model by: (QI [pg. 5, section C. Implementation Details] teaches: “all experiments in the paper are completed on the open source framework PyTorch on Linux CentOS 7.6.1810 system with NVIDIA TITAN Xp GPU (12G graphics memory).” QI [pg. 4, section D. Model Training] teaches: “Due to the classification of news headline is discrete, we employ a multi-label cross entropy as the loss function to calculate the whole loss: […]. Cross entropy is used to describe the difference between the predicted value and the real value of the model, and the stochastic gradient descent algorithm is used to optimize and adjust the model parameters (i.e., to train the model).” Examiner’s note: QI [pg. 2, section B. Label Embedding] teaches embedding vectors, which are implemented using the framework and system described in QI [pg. 5, section C].)
extracting […] a context level feature vector […] (QI [pg. 2, section B. Label Embedding] teaches: "In this paper, we attempt to construct a multi-level label embedding strategy to represent as a word-level vector and sentence-level text for news classification. [...] In order to represent more text features and make the sentences have both sequence features and local features, we aggregate the representation results of Bi-GRU and multi-scale CNN together to form the final sentence representation (i.e., context level feature vector) […]". QI [pg. 8, section V. Conclusion] teaches: "Moreover, a joint model of the bidirectional GRU and multiscale CNN is designed as the extractor of sentence (i.e., extracting) features to expand sentence semantics from multi-dimensions." Examiner’s note: Under broadest reasonable interpretation, a context level feature vector can be interpreted as the final sentence representation.)
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of HU, GUOSHENG, and QI before them, to include QI’s GPU system, memory, and sentence representation calculation in HU and GUOSHENG’s news-oriented stock prediction framework. One would have been motivated to make such a combination in order to improve the performance of news headline (and text content) classification (QI [pg. 7, section V. Conclusion]).
HU in view of GUOSHENG and QI is not relied upon for teaching:
determining a weight for the each text, based on an inner product of the word level feature vector extracted from each text and each of the plurality of embedding vectors; and
multiplying the context level feature vector extracted from the each text by the determined weight of the each text and taking a sum; and
optimize, for a given gain and a given period, a portfolio vector representing an allocation to the plurality of targets by minimizing a risk based […] the prices of the targets in the given period under a constraint condition that the portfolio vector achieves the given gain in the given period.
However, MILLER teaches: determining a weight for the each text, based on an inner product of the […] feature vector extracted from each text and each of the plurality of embedding vectors; and (MILLER [pg. 3, section 3.1 Model Description] teaches: "Key Addressing: during addressing, each candidate memory is assigned a relevance probability by comparing the question to each key: p_{h_i} = Softmax(AΦ_X(x) · AΦ_K(k_{h_i}))" Examiner's note: The "determining a weight for the each text" can be interpreted as the result of the softmax function, which is based on an inner product of AΦ_X(x) as HU/QI’s v_enhance word-level embedding vector and AΦ_K(k_{h_i}) as the 512D feature vector that encodes information for a stock. The softmax function assigns a relevance probability (i.e., weight) to each inner product of the vectors.)
multiplying the […] feature vector extracted from the each text by the determined weight of the each text and taking a sum; and (MILLER [pg. 3, section 3.1 Model Description] teaches: "Value Reading: in the final reading step, the values of the memories are read by taking their weighted sum using the addressing probabilities, and the vector o is returned: o = Σ_i p_{h_i} AΦ_V(v_{h_i})" Examiner’s note: under BRI, the “determined weight” can be interpreted as the addressing probabilities p_{h_i}. The “taking a sum” can be interpreted as the resulting o vector.)
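For illustration, MILLER's key addressing and value reading together amount to a single softmax-attention step; a minimal sketch (the query, key, and value vectors are invented for the example):

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def read_memory(query: np.ndarray, keys: np.ndarray, values: np.ndarray) -> np.ndarray:
    """Key addressing: p_i = Softmax(query · key_i).
    Value reading: o = sum_i p_i * value_i."""
    p = softmax(keys @ query)   # relevance probability per memory slot
    return p @ values           # weighted sum of the value vectors

query = np.array([1.0, 0.0])
keys = np.array([[1.0, 0.0],
                 [0.0, 1.0]])
values = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
o = read_memory(query, keys, values)  # leans toward values[0], whose key matches the query
```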
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of HU, GUOSHENG, QI, and MILLER before them, to include MILLER’s softmax function for determining relevance probability and weighted sum using the probabilities in HU/GUOSHENG/QI’s news-oriented stock prediction framework. One would have been motivated to make such a combination in order to improve information extraction from documents to answer questions related to the documents (e.g., stock price prediction) (MILLER [pg. 1, Abstract]).
HU in view of GUOSHENG, QI, and MILLER is not relied upon for teaching, but MARKOWITZ teaches: optimize, for a given gain and a given period, a portfolio vector representing an allocation to the plurality of targets by minimizing a risk based […] the prices of the targets in the given period under a constraint condition that the portfolio vector achieves the given gain in the given period. (MARKOWITZ [pg. 81] teaches: “Let X_i be the percentage of the investor's assets (i.e., a portfolio vector representing an allocation to the plurality of targets) which are allocated to the i-th security.” MARKOWITZ [pg. 82] teaches: “The E-V rule states that the investor would (or should) want to select one of those portfolios which give rise (i.e., optimize) to the (E, V) combinations indicated as efficient in the figure; i.e., those with minimum V (i.e., by minimizing risk based on […] the prices of the targets) for given E (i.e., for a given gain) or more and maximum E for given V or less. […] The investor, being informed of what (E, V) combinations were attainable, could state which he desired. We could then find the portfolio which gave this desired combination.” MARKOWITZ [pg. 91] teaches: “To use the E-V rule in the selection of securities we must have procedures for finding reasonable μ_i and σ_ij. These procedures, I believe, should combine statistical techniques and the judgment of practical men. […] Using this revised set of μ_i and σ_ij, the set of efficient E, V combinations could be computed, the investor could select the combination he preferred, and the portfolio which gave rise to this E, V combination could be found. One suggestion as to tentative μ_i, σ_ij is to use the observed μ_i, σ_ij for some period of the past (i.e., for a given period).” MARKOWITZ [pg. 85] teaches: “The efficient line begins at the attainable point with minimum variance (in this case on the ab-line).” Examiner’s note: Under broadest reasonable interpretation, “under a constraint condition that the portfolio vector achieves the given gain in the given period” can be interpreted as the investor selecting the preferred combination of (E, V) using observations μ_i and σ_ij of some past period that meets the investor’s desired combination for highest returns based on the portfolio diversification (i.e., price of targets in the given period).)
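For illustration, the equality-constrained E-V problem (minimize portfolio variance w'Σw subject to the expected return μ'w equaling a given gain and the weights summing to one) has a closed-form Lagrangian solution; a minimal sketch with invented two-asset numbers:

```python
import numpy as np

def min_variance_portfolio(mu: np.ndarray, Sigma: np.ndarray, gain: float) -> np.ndarray:
    """Minimize w' Sigma w subject to mu'w = gain and sum(w) = 1
    (equality-constrained Markowitz E-V problem, via Lagrange multipliers):
    w = Sigma^{-1} A (A' Sigma^{-1} A)^{-1} [gain, 1]', with A = [mu, 1]."""
    n = len(mu)
    A = np.column_stack([mu, np.ones(n)])   # constraint matrix [mu, 1]
    Sinv_A = np.linalg.solve(Sigma, A)      # Sigma^{-1} A
    lam = np.linalg.solve(A.T @ Sinv_A, np.array([gain, 1.0]))
    return Sinv_A @ lam

mu = np.array([0.05, 0.10])                   # expected returns per asset
Sigma = np.array([[0.04, 0.00],
                  [0.00, 0.09]])              # covariance (risk) matrix
w = min_variance_portfolio(mu, Sigma, gain=0.08)
```

With two assets the two constraints pin down the allocation uniquely; with more assets the same formula picks the minimum-variance allocation among all that achieve the target gain.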
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of HU, GUOSHENG, QI, MILLER, and MARKOWITZ before them, to include MARKOWITZ’ E-V computation in HU, GUOSHENG, QI, and MILLER’s news-oriented stock prediction framework. One would have been motivated to make such a combination in order to find a combination that meets the desires of an investor (MARKOWITZ [pg. 82]).
Regarding Claim 3:
HU in view of GUOSHENG, QI, MILLER, and MARKOWITZ teaches the elements of claim 1 as outlined above. HU further teaches:
The information processing device according to claim 1, wherein the training device includes a bidirectional gated recurrent unit (Bi-GRU) and a multilayer perceptron (MLP). (HU [pg. 265, section 4.2 Hybrid Attention Networks] teaches: "Afterwards, these corpus vectors are encoded by a bi-directional Gated Recurrent Units (GRU)." HU [pg. 265, section 4.2 Hybrid Attention Networks] teaches: "Trend Prediction: The final discriminative network is a standard Multi-layer Perceptron (MLP), which takes V as input and produces the three-class classification of the future stock trend.")
Regarding Claim 6:
The claim recites similar limitations as corresponding claim 1 and is rejected for similar reasons as claim 1 using similar teachings and rationale.
Regarding Claim 7:
HU teaches:
[…] execute processing for optimizing a portfolio, (HU [pg. 268, section 5.4] teaches: "Based on these scores, a straightforward portfolio construction strategy called top-K selects K stocks with the highest scores to construct a new portfolio for the next trading day." HU [pg. 262, section I. Introduction] teaches: "we design a Hybrid Attention Networks (HAN) to predict the stock trend based on the sequence of recent related news." HU [pg. 264, section 4.1 Problem Statement] teaches: "For a given date t and a given stock s, we can calculate its rise percent by: Rise_Percent(t) = (Open_Price(t+1) - Open_Price(t)) / Open_Price(t)")
[…] a neural network that, accepts a set of texts released at each of a plurality of dates and times from a past date and time before a base date and time to the base date and time as input, (HU [pg. 262, section I. Introduction] teaches: "we design a Hybrid Attention Networks (HAN) to predict the stock trend based on the sequence of recent related news." HU [pg. 264, section 4.1 Problem Statement] teaches: "For a given date t and a given stock s, we can calculate its rise percent by: Rise_Percent(t) = (Open_Price(t+1) - Open_Price(t)) / Open_Price(t)")
outputs a classification indicating whether a price of each of a plurality of targets has increased or decreased since a date and time immediately before the base date and time until the base date and time (HU [pg. 265, section 4.2 Hybrid Attention Networks] teaches: "Trend Prediction: The final discriminative network is a standard Multi-layer Perceptron (MLP), which takes V as input and produces the three-class classification of the future stock trend.” HU [pg. 262, section I. Introduction] teaches: “The stock trend prediction task can be formulated as follows: given the length of a time sequence N, the stock s and date t, the goal is to use the news corpus sequence from time t-N to t-1, denoted as [C_{t-N}, C_{t-N+1}, ..., C_{t-1}], to predict the class of Rise_Percent(t), i.e. DOWN, UP, or PRESERVE. Note that each news corpus C_i contains a set of news with the size of L, C_i = [n_{i1}, n_{i2}, …, n_{iL}], denoting L related news on date i.”)
determining statuses of the plurality of targets at each of dates and times from the past date and time to the base date and time by: (HU [pg. 266, section 5.1 Experimental Setup] teaches: “we collected 1,271,442 economic news between 2014 and 2017. […] For each stock, we then aggregate all the news in a certain date to construct the daily news corpus.” HU [pg. 265, section 4.2 Hybrid Attention Networks] teaches: "[…] we calculate the overall corpus vector d_t as a weighted sum of each news vector respectively, and use this vector to represent all news information for date t. Thus, we get a temporal sequence of corpus vector D = {d_i}, i ∈ [1, N]. […] To encode the temporal sequence of corpus vectors, we adopt Gated Recurrent Units (GRU). […] At date t, the GRU computes the news state h_t by linearly interpolating the previous state h_{t-1} and the current updated state h̃_t […]. The current updated state h̃_t is computed by non-linearly combining the corpus vector input for this time-stamp and the previous state […]. Therefore, we can get the latent vector for each date t through GRU.” Examiner’s note: under BRI, “the plurality of targets” can be interpreted as the stocks for which the news corpus is constructed, and “determining statuses of the plurality of targets” can be interpreted as V, which is the result of sequential modeling and temporal attention calculation based on the computed news state h_t at date t based on a previous state h_{t-1} for each date t (i.e., at each of dates and times from the past date and time to the base date and time). Additionally, paragraph [0024] of the present application defines "dates and times" as "As a unit of date and time, an appropriate unit, such as 1 day, 12 hours, 1 hour, and 30 minutes, can be employed,". Furthermore, paragraph [0003] of the present application states: "Studies for predicting whether the price of a target at a date and time t (for example, tomorrow) has increased or decreased with respect to the price of the target at a date and time t-1 immediately before the date and time t (for example, today), based on texts released at respective dates and times t-1, t-2, t-3, … before the date and time t, using deep learning and a neural net have been conducted (Non Patent Literature 1)", Non Patent Literature 1 being the HU reference. Therefore, HU's date t satisfies the conditions for determining the status of a target at a past date and time.)
extracting a word level feature vector […] from each text released at each of dates and times; (HU [pg. 264, section 4.2 Hybrid Attention Networks] teaches: “For each i-th news (i.e., from each text) in news corpus C_t of date t, we use a word embedding layer to calculate the embedded vector for each word (i.e., extracting a word level feature vector) and then average all the words’ vectors to construct a news vector n_{ti}.”)
classifying the targets into the classifications by the determined statuses; (HU [pg. 265, section 4.2 Hybrid Attention Networks] teaches: "Then we use β to calculate the weighted sum V , so that it can incorporate the sequential news context information with temporal attention, and will be used for classification. Trend Prediction: The final discriminative network is a standard Multi-layer Perceptron (MLP), which takes V as input and produces the three-class classification of the future stock trend.")
HU is not relied upon for teaching:
A non-transitory computer-readable information recording medium storing a program causing a computer to execute processing […], the program causing a computer comprising at least one vector arithmetic circuit including, but not limited to, a GPU, an FPGA, or an ASIC to achieve a neural network; […] the program causing the computer to execute processing of […]
includes, in a model, a plurality of embedding vectors in which features of the plurality of targets are respectively embedded
extracting […] a context level feature vector […]
determining a weight for the each text, based on an inner product of the word level feature vector extracted from each text and each of the plurality of embedding vectors; and
multiplying the context level feature vector extracted from the each text by the determined weight of the each text and taking a sum; and
computing, based on similarities between the plurality of embedding vectors included in the trained model, a semantic similarity matrix between pairs of targets in the plurality of targets; and
optimizing, for a given gain and a given period, a portfolio vector representing an allocation to the plurality of targets by minimizing a risk based on the semantic similarity matrix and the prices of the targets in the given period under a constraint condition that the portfolio vector achieves the given gain in the given period.
However, GUOSHENG teaches: includes, in a model, a plurality of embedding vectors in which features of the plurality of targets are respectively embedded (GUOSHENG [pg. 2707, Fig. 2] teaches: "The 512D feature following average pooling provides our representation for clustering and portfolio construction." GUOSHENG [pg. 2707, section 2.1 Deep Feature Learning with CAEs] teaches: "[...] this single 512D vector encodes the 20-day 4-channel price history of the stock, and will provide the representation for further processing [...]". GUOSHENG [pg. 2708, section 3.1 Dataset and Settings] teaches: "We use all the stocks in FTSE 100 from 4th Jan 2000 to 14th May 2017." Examiner's note: under BRI, "a plurality of embedding vectors in which features of the plurality of targets are respectively embedded" can be interpreted as the 512D feature for representing one stock for each of the stocks used during processing. Furthermore, “a model” can be interpreted as the CAE that encodes the stocks as embedded representations.)
computing, based on similarities between the plurality of embedding vectors included in the trained model, a semantic similarity matrix between pairs of targets in the plurality of targets; (GUOSHENG [pg. 2706, section 1 Introduction] teaches: "To solve the aforementioned problems, we propose to use deep learning (DL) features for stock similarity measurement instead of raw time series." GUOSHENG [pg. 2707, section 2.1 Convolutional Autoencoder] teaches: "Thus this single 512D vector encodes the 20-day 4-channel price history of the stock, and will provide the representation for further processing (clustering and portfolio construction)." GUOSHENG [pg. 2708, section 2.2 Clustering] teaches: "We next aim to provide a clustering method for diversified – and hence low risk – portfolio selection. [...] To solve these problems, we introduce the network modularity method [18] to find the cluster structure of the stocks, where each stock is set as one node and the link between each pair of stocks is set as the cosine similarity calculated by our learned CAE features.” GUOSHENG [pg. 2708, section 3.2 Quantitative Results] teaches: “This illustrates the efficacy of our CAE and learned feature for capturing semantic information about stocks.” Examiner's note: under BRI, "a semantic similarity matrix" can be interpreted as the calculated cosine similarity for each pair of stocks, which results in a plurality of cosine similarity values as the links for each pair of stocks. All of the cosine similarity values linking the pairs of stocks constitute a matrix of cosine similarities capturing semantic information learned by CAE features. The pairs of stocks are represented as the 512D vector which contains encoded historical information of the stock.)
optimizing, for […] a given period, a portfolio […] representing an allocation to the plurality of targets by minimizing a risk based on the semantic similarity matrix and the prices of the targets in the given period […] (GUOSHENG [pg. 2707, section 2 Methodology] teaches: "Finally, we perform portfolio construction by choosing stocks with the best performance measured by Sharpe ratio [19] from each cluster (i.e., and the prices of the targets in the given period)." GUOSHENG [pg. 2709, section 3.3 Quantitative Results] teaches: "We perform backtesting to compare our full portfolio (i.e., a portfolio […] representing an allocation to the plurality of targets) optimisation strategy against the market benchmark (FTSE 100 Index). […] We evaluate features and clustering methods over a long term period (4K trading days) (i.e., for […] a given period)." GUOSHENG [pg. 2707, section ] teaches: "In the next clustering step, we aim to segment the market into diverse sectors in a data-driven way. This is important to provide risk reduction (i.e., minimizing a risk) by selecting a well diversified portfolio [16, 17]." GUOSHENG [pg. 2708, section 2.2 Clustering] teaches: "We next aim to provide a clustering method for diversified – and hence low risk – portfolio selection. [...] To solve these problems, we introduce the network modularity method [18] to find the cluster structure of the stocks, where each stock is set as one node and the link between each pair of stocks is set as the cosine similarity calculated by our learned CAE features (i.e., based on the semantic similarity matrix)." GUOSHENG [pg. 2708, section 3.2 Quantitative Results] teaches: "This illustrates the efficacy of our CAE and learned feature for capturing semantic information about stocks." GUOSHENG [pg. 2708, section 2.3 Portfolio Construction and Backtesting] teaches: "Given the learned stock clustering (market segmentation), we construct a complete portfolio by picking diverse yet high-return stocks, and evaluate the result.")
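The risk-minimisation concept mapped above can be illustrated with a standard minimum-variance allocation (a generic sketch under assumed data, not GUOSHENG's cluster-based construction; the 250-day, 4-target return sample is hypothetical):

```python
import numpy as np

# Hypothetical daily returns for 4 targets over a 250-day period.
rng = np.random.default_rng(1)
R = rng.normal(size=(250, 4)) * 0.01
Sigma = np.cov(R, rowvar=False)  # covariance = risk model for the period

# Closed-form minimum-variance weights (fully invested, shorts allowed):
# w ∝ Σ⁻¹·1, normalised so the allocation sums to 1.
ones = np.ones(Sigma.shape[0])
w = np.linalg.solve(Sigma, ones)
w /= w.sum()
```

By construction, the variance wᵀΣw of this allocation is no larger than that of any other fully invested portfolio over the same period, e.g. the equal-weight portfolio.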
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of HU and GUOSHENG before them, to include GUOSHENG's similarity calculation in HU's news-oriented stock prediction framework. One would have been motivated to make such a combination in order to improve portfolio construction based on stock similarity features (GUOSHENG [pg. 2709, section 4 Conclusions]).
HU in view of GUOSHENG is not relied upon for teaching:
A non-transitory computer-readable information recording medium storing a program causing a computer to execute processing […], the program causing a computer comprising at least one vector arithmetic circuit including, but not limited to, a GPU, an FPGA, or an ASIC to achieve a neural network; […] the program causing the computer to execute processing of […]
extracting […] a context level feature vector […]
determining a weight for the each text, based on an inner product of the word level feature vector extracted from each text and each of the plurality of embedding vectors; and
multiplying the context level feature vector extracted from the each text by the determined weight of the each text and taking a sum; and
optimize, for a given gain and a given period, a portfolio vector representing an allocation to the plurality of targets by minimizing a risk based […] the prices of the targets in the given period under a constraint condition that the portfolio vector achieves the given gain in the given period.
However, QI teaches: A non-transitory computer-readable information recording medium storing a program causing a computer to execute processing […], the program causing a computer comprising at least one vector arithmetic circuit including, but not limited to, a GPU, an FPGA, or an ASIC to achieve a neural network; […] the program causing the computer to execute processing of […] (QI [pg. 5, section C. Implementation Details] teaches: “all experiments in the paper are completed on the open source framework PyTorch on Linux CentOS 7.6.1810 system with NVIDIA TITAN Xp GPU (12G graphics memory).” QI [pg. 4, section D. Model Training] teaches: “Due to the classification of news headline is discrete, we employ a multi-label cross entropy as the loss function to calculate the whole loss: […]. Cross entropy is used to describe the difference between the predicted value and the real value of the model, and the stochastic gradient descent algorithm is used to optimize and adjust the model parameters (i.e., to achieve a neural network).”)
extracting […] a context level feature vector […] (QI [pg. 2, section B. Label Embedding] teaches: "In this paper, we attempt to construct a multi-level label embedding strategy to represent as a word-level vector and sentence-level text for news classification. [...] In order to represent more text features and make the sentences have both sequence features and local features, we aggregate the representation results of Bi-GRU and multi-scale CNN together to form the final sentence representation (i.e., context level feature vector) […]". QI [pg. 8, section V. Conclusion] teaches: "Moreover, a joint model of the bidirectional GRU and multiscale CNN is designed as the extractor of sentence (i.e., extracting) features to expand sentence semantics from multi-dimensions." Examiner’s note: Under broadest reasonable interpretation, a context level feature vector can be interpreted as the final sentence representation.)
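As a loose sketch of the aggregation QI describes (the mean and max poolings below are hypothetical placeholders standing in for the Bi-GRU sequence summary and the multi-scale CNN local summary, respectively; the 12-token, 50-dimension shapes are assumptions):

```python
import numpy as np

# Hypothetical word vectors for one sentence: 12 tokens, 50-D each.
tokens = np.random.default_rng(2).normal(size=(12, 50))

seq_feat = tokens.mean(axis=0)    # placeholder for a Bi-GRU sequence summary
local_feat = tokens.max(axis=0)   # placeholder for multi-scale CNN pooling

# Aggregating both summaries yields the final sentence representation,
# i.e., the "context level feature vector" under the examiner's mapping.
sentence_vec = np.concatenate([seq_feat, local_feat])
```

The point of the sketch is only the structure: two extractors run over the same sentence and their outputs are combined into one final vector.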
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of HU, GUOSHENG, and QI before them, to include QI's GPU system, memory, and final sentence representation calculation in HU and GUOSHENG's news-oriented stock prediction framework. One would have been motivated to make such a combination in order to improve the performance of news headline (and text content) classification (QI [pg. 7, section V. Conclusion]).
HU in view of GUOSHENG and QI is not relied upon for teaching:
determining a weight for the each text, based on an inner product of the word level feature vector extracted from each text and each of the plurality of embedding vectors; and
multiplying the context level feature vector extracted from the each text by the determined weight of the each text and taking a sum; and
optimize, for a given gain and a given period, a portfolio vector representing an allocation to the plurality of targets by minimizing a risk based […] the prices of the targets in the given period under a constraint condition that the portfolio vector achieves the given gain in the given period.
However, MILLER teaches: determining a weight for the each text, based on an inner product of the […] feature vector extracted from each text and each of the plurality of embedding vectors; and (MILLER [pg. 3, section 3.1 Model Description] teaches: "Key Addressing: during addressing, each candidate memory is assigned a relevance probability by comparing the question to each key: p_hi = Softmax(AΦ_X(x) ⋅ AΦ_K(k_hi))." Examiner's note: The "determining a weight for the each text" can be interpreted as the result of the softmax function, which is based on an inner product of AΦ_X(x) as HU/QI's v^enhance word-level embedding vector and AΦ_K(k_hi) as the 512D feature vector that encodes information for a stock. The softmax function assigns a relevance probability (i.e., weight) to each inner product of the vectors.)
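MILLER's key-addressing step can be sketched as follows (a generic illustration of softmax over inner products, with hypothetical 16-D vectors and 5 memories; q and K stand in for AΦ_X(x) and the keys AΦ_K(k_hi)):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(3)
q = rng.normal(size=16)        # text/question embedding, AΦ_X(x)
K = rng.normal(size=(5, 16))   # one key embedding per memory, AΦ_K(k_hi)

# K @ q computes the inner product of q with every key; softmax turns
# those scores into relevance probabilities (i.e., weights) p_hi.
p = softmax(K @ q)
```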
multiplying the […] feature vector extracted from the each text by the determined weight of the each text and taking a sum; and (MILLER [pg. 3, section 3.1 Model Description] teaches: "Value Reading: in the final reading step, the values of the memories are read by taking their weighted sum using the addressing probabilities, and the vector o is returned: o = Σ_i p_hi AΦ_V(v_hi)." Examiner's note: under BRI, the "determined weight" can be interpreted as the addressing probabilities p_hi. The "taking a sum" can be interpreted as the resulting o vector.)
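MILLER's value-reading step is a probability-weighted sum of the value vectors, which can be sketched as (hypothetical 5-memory, 16-D shapes; p and V stand in for the addressing probabilities p_hi and the values AΦ_V(v_hi)):

```python
import numpy as np

rng = np.random.default_rng(4)
p = rng.random(5)
p /= p.sum()                   # addressing probabilities from key addressing
V = rng.normal(size=(5, 16))   # one value embedding per memory, AΦ_V(v_hi)

# o = Σ_i p_hi · AΦ_V(v_hi): each value vector is multiplied by its
# weight and the results are summed into a single output vector.
o = p @ V
```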
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of HU, GUOSHENG, QI, and MILLER before them, to include MILLER's softmax function for determining relevance probability, and weighted sum using those probabilities, in HU/GUOSHENG/QI's news-oriented stock prediction framework. One would have been motivated to make such a combination in order to improve information extraction from documents to answer questions related to the documents (e.g., stock price prediction) (MILLER [pg. 1, Abstract]).
HU in view of GUOSHENG, QI, and MILLER is not relied upon for teaching, but MARKOWITZ teaches: optimizing, for a given gain and a given period, a portfolio vector representing an allocation to the plurality of targets by minimizing a risk based […] the prices of the targets in the given period under a constraint condition that the portfolio vector achieves the given gain in the given period. (MARKOWITZ [pg. 81] teaches: "Let X_i be the percentage of the investor's assets (i.e., a portfolio vector representing an allocation to the plurality of targets) which are allocated to the i-th security." MARKOWITZ [pg. 82] teaches: "The E-V rule states that the investor would (or should) want to select one of those portfolios which give rise (i.e., optimize) to the (E, V) combinations indicated as efficient in the figure; i.e., those with minimum V (i.e., by minimizing risk based on […] the prices of the targets) for given E (i.e., for a given gain) or more and maximum E for given V or less. […] The investor, being informed of what (E, V) combinations were attainable, could state which he desired. We could then find the portfolio which gave this desired combination." MARKOWITZ [pg. 91] teaches: "To use the E-V rule in the selection of securities we must have procedures for finding reasonable μ_i and σ_ij. These procedures, I believe, should combine statistical techniques and the judgment of practical men. […] Using this revised set of μ_i and σ_ij, the set of efficient E, V combinations could be computed, the investor could select the combination he preferred, and the portfolio which gave rise to this E, V combination could be found. One suggestion as to tentative μ_i, σ_ij is to use the observed μ_i, σ_ij for some period of the past (i.e., for a given period)." MARKOWITZ [pg. 85] teaches: "The efficient line begins at the attainable point with minimum variance (in this case on the ab line)." Examiner's note: Under broadest reasonable interpretation, under a constraint condition that the portfolio vector achieves the given gain in the given period can be interpreted as the investor selecting the preferred combination of (E, V) using observations μ_i and σ_ij of some past period that meets the investor's desired combination for highest returns based on the portfolio diversification (i.e., price of targets in the given period).)
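The mapped E-V selection, i.e. minimum variance V for a given expected gain E, can be sketched as a constrained mean-variance problem solved via its KKT linear system (a generic textbook formulation under hypothetical return data, not a reproduction of MARKOWITZ's computation; the 250-day, 4-security sample and the chosen target are assumptions):

```python
import numpy as np

# Hypothetical observed returns for 4 securities over a past period.
rng = np.random.default_rng(5)
R = rng.normal(loc=0.0005, scale=0.01, size=(250, 4))
mu = R.mean(axis=0)              # observed μ_i for the period
Sigma = np.cov(R, rowvar=False)  # observed σ_ij for the period
target = mu.mean()               # the "given gain" E to be achieved

# KKT system for: min wᵀΣw  s.t.  Σ_i w_i = 1  and  μᵀw = target.
n = len(mu)
A = np.zeros((n + 2, n + 2))
A[:n, :n] = 2 * Sigma
A[:n, n] = A[n, :n] = 1.0        # multiplier for the budget constraint
A[:n, n + 1] = A[n + 1, :n] = mu  # multiplier for the gain constraint
b = np.zeros(n + 2)
b[n] = 1.0
b[n + 1] = target
w = np.linalg.solve(A, b)[:n]    # portfolio vector X_i
```

The solution w is the minimum-variance allocation among all fully invested portfolios whose expected return equals the given gain.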
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of HU, GUOSHENG, QI, MILLER, and MARKOWITZ before them, to include MARKOWITZ's E-V computation in HU, GUOSHENG, QI, and MILLER's news-oriented stock prediction framework. One would have been motivated to make such a combination in order to find a combination that meets the desires of an investor (MARKOWITZ [pg. 82]).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
GORBATOVKSY (US 20060271466 A1) relates to portfolio optimization by maximizing expected return, such as achieving a desired return within the investor's maximum time.
Applicant's amendment necessitated the new ground(s) of rejection presented in
this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP
§ 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37
CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE
MONTHS from the mailing date of this action. In the event a first reply is filed within TWO
MONTHS of the mailing date of this final action and the advisory action is not mailed until after
the end of the THREE-MONTH shortened statutory period, then the shortened statutory period
will expire on the date the advisory action is mailed, and any nonprovisional extension fee
(37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the
advisory action. In no event, however, will the statutory period for reply expire later than SIX
MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Alvaro S Laham Bauzo whose telephone number is (571)272-5650. The examiner can normally be reached Mon-Fri 7:30 AM - 11:00 AM | 1:00 PM - 5:30 PM ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Usmaan Saeed, can be reached at (571) 272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/A.S.L./Examiner, Art Unit 2146
/USMAAN SAEED/Supervisory Patent Examiner, Art Unit 2146