Prosecution Insights
Last updated: April 19, 2026
Application No. 18/199,611

SYSTEMS AND METHODS FOR SELF-SUPERVISED TIME-SERIES REPRESENTATION LEARNING

Non-Final OA: §101, §103, §112
Filed
May 19, 2023
Examiner
BOSTWICK, SIDNEY VINCENT
Art Unit
2124
Tech Center
2100 — Computer Architecture & Software
Assignee
Royal Bank Of Canada
OA Round
1 (Non-Final)
Grant Probability: 52% (Moderate)
Expected OA Rounds: 1-2
Time to Grant: 4y 7m
With Interview: 90%

Examiner Intelligence

Career Allow Rate: 52% (71 granted / 136 resolved; -2.8% vs TC avg)
Interview Lift: +38.2% (strong) for resolved cases with interview
Avg Prosecution: 4y 7m (typical timeline); 68 currently pending
Total Applications: 204 across all art units (career history)

Statute-Specific Performance

§101: 24.4% (-15.6% vs TC avg)
§103: 40.9% (+0.9% vs TC avg)
§102: 12.0% (-28.0% vs TC avg)
§112: 21.9% (-18.1% vs TC avg)
Tech Center averages are estimates. Based on career data from 136 resolved cases.

Office Action

Rejections: §101, §103, §112
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Detailed Action

This action is in response to the claims filed 5/19/2023: Claims 1-13 are pending. Claim 1 is independent.

Specification

The following title is suggested: "Systems and methods for self-supervised time-series representation learning".

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-13 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Regarding claim 1, "the student encoder" lacks antecedent basis. "A student encoder" is recommended.

Regarding claim 1, "the teacher encoder" lacks antecedent basis. "A teacher encoder" is recommended.

Regarding claim 11, "the plurality of anchor sequences comprise previous subsequences" is indefinite. It would be unclear to one of ordinary skill in the art what the previous subsequences are previous relative to. In the interest of further examination this is interpreted as simply a compound pronoun.

The remaining claims are rejected with respect to their dependence on the rejected claims.

Claim Rejections - 35 USC § 101

35 U.S.C.
101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-13 are rejected under 35 USC § 101 because the claimed invention is directed to non-statutory subject matter.

Regarding Claim 1: Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Step 1 Analysis: Claim 1 is directed to a method, which is a process, one of the statutory categories.

Step 2A Prong One Analysis: Claim 1 under its broadest reasonable interpretation is a series of mental processes. For example, but for the generic computer components language, the limitations in the context of this claim encompass machine learning processing, including the following:

- determining a temporal loss based on teacher temporal similarities between representations at different temporal locations within a teacher representation of an input time-series and student temporal similarities between representations at different temporal locations within a student representation of the input time-series (observation, evaluation, and judgement);
- determining an instance loss based on teacher instance similarities between representations at common temporal locations within the teacher representation and a plurality of anchor representations and student instance similarities between representations at common temporal locations within the student representation and the plurality of anchor representations (observation, evaluation, and judgement);
- updating the student encoder based on the temporal loss and instance loss (observation, evaluation, and judgement);
- updating the teacher encoder as a moving average of the student encoder (observation, evaluation, and judgement).

Therefore, claim 1 recites an abstract idea which is a judicial
exception.

Step 2A Prong Two Analysis: Claim 1 recites additional elements "generating a neural network that provides a universal time-series representation". However, these additional features are computer components recited at a high level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component. An additional element that merely recites the words "apply it" (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application (see MPEP 2106.05(f)). Therefore, claim 1 is directed to a judicial exception.

Step 2B Analysis: Claim 1 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 1 amount to no more than mere instructions to apply the judicial exception using a generic computer component.

For the reasons above, claim 1 is rejected as being directed to non-patentable subject matter under §101. This rejection applies equally to independent claims 7 and 8, which recite a system and a computer program product, respectively, as well as to dependent claims 2-6.
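For orientation only: the four limitations the examiner enumerates for claim 1 correspond to a standard self-distillation training step. A minimal numpy sketch under stated assumptions (encoder outputs are given as plain per-timestep vectors; every name here is hypothetical, and this is not the applicant's implementation):

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two representation vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def softmax(x):
    # Convert raw similarities into a probability distribution.
    x = np.asarray(x)
    e = np.exp(x - x.max())
    return e / e.sum()

def kl(p, q):
    # Kullback-Leibler divergence between two discrete distributions.
    p, q = np.asarray(p), np.asarray(q)
    return float(np.sum(p * np.log((p + 1e-8) / (q + 1e-8))))

def temporal_loss(teacher_reps, student_reps):
    # Compare each temporal location's representation to the representations
    # at the other temporal locations, within the teacher and within the
    # student, then sum KL divergences over all temporal positions
    # (cf. claims 1 and 5-7).
    T = len(teacher_reps)
    loss = 0.0
    for t in range(T):
        t_sims = softmax([cosine_sim(teacher_reps[t], teacher_reps[u])
                          for u in range(T) if u != t])
        s_sims = softmax([cosine_sim(student_reps[t], student_reps[u])
                          for u in range(T) if u != t])
        loss += kl(t_sims, s_sims)
    return loss

def ema_update(teacher_params, student_params, momentum=0.99):
    # Update the teacher encoder as a moving average of the student encoder.
    return momentum * teacher_params + (1 - momentum) * student_params
```

The claimed instance loss is analogous: instead of comparing a timestep's representation to other timesteps of the same sequence, it is compared to representations of the anchor sequences at common temporal locations.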
The additional limitations of the dependent claims are addressed briefly below:

Dependent claim 2 recites additional observation, evaluation, and judgement: "applying a first augmented subsequence of the input time-series to a teacher encoder to generate the teacher representation of the input time-series; and applying a second augmented subsequence of the input time-series to a student encoder to generate the student representation of the input time-series".

Dependent claim 3 recites additional observation, evaluation, and judgement: "the first augmented subsequence is generated by applying a first augmentation to a first sampled subsequence of the input time series and the second augmented subsequence is generated by applying a second augmentation to a second sampled subsequence of the input time series, wherein the first and second sampled subsequences have a minimum overlap."

Dependent claim 4 recites additional observation, evaluation, and judgement: "the first augmentation and the second augmentation have the same number of timestamps".

Dependent claim 5 recites additional observation, evaluation, and judgement: "determining the teacher temporal similarities by: comparing a representation of the teacher representation at a particular temporal location to representations of the teacher representation at other temporal locations".
Dependent claim 6 recites additional observation, evaluation, and judgement: "determining the student temporal similarities by: comparing a representation of the student representation at the particular temporal location to representations of the student representation at other temporal locations".

Dependent claim 7 recites additional mathematical calculations and relationships: "the temporal loss is determined by summing Kullback-Leibler divergences between the teacher temporal similarities and the student temporal similarities over all temporal position".

Dependent claim 8 recites additional observation, evaluation, and judgement: "determining the teacher instance similarities by: comparing a representation of the teacher representation at a first temporal location to representations of a plurality of anchor sequences at the first temporal location".

Dependent claim 9 recites additional observation, evaluation, and judgement: "determining the student instance similarities by: comparing a representation of the student representation at a second temporal location to representations of the plurality of anchor sequences at the second temporal location".

Dependent claim 10 recites additional observation, evaluation, and judgement: "the instance loss is determined by summing Kullback-Leibler divergences between the teacher instance similarities and the student instance similarities over all temporal position".

Dependent claim 11 recites additional observation, evaluation, and judgement: "the plurality of anchor sequences comprise previous subsequences used to generate the teacher representations or the student representation".

Dependent claim 12 recites additional instructions to apply the judicial exception using generic computer components: "A neural network trained according to the method of claim 1".

Dependent claim 13 recites additional instructions to apply the judicial exception using generic computer components: "A non-transitory computer readable memory storing instructions,
which when executed by a processor of a system configure the system to perform the method of claim 1".

Therefore, when considering the elements separately and in combination, they do not add significantly more to the inventive concept. Accordingly, claims 1-13 are rejected under 35 U.S.C. § 101.

Regarding claim 12: claim 12 is directed towards non-statutory subject matter, "software per se". Claim 12 is directed towards a neural network, which in view of the instant specification and what is known in the art is interpreted as software ([¶0028] "The CPU 104 performs arithmetic calculations and control functions to execute software stored in a non-transitory internal memory"). Therefore, claim 12 is rejected as software per se.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2.
Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 2, and 5-13 are rejected under 35 U.S.C. §103 as being unpatentable over the combination of Tejankar ("ISD: Self-Supervised Learning by Iterative Similarity Distillation", 2021) and Gong ("KDCTime: Knowledge Distillation with Calibration on InceptionTime for Time-series Classification", 2021).

[FIG. 1 of Tejankar]

Regarding claim 1, Tejankar teaches:

A method of generating a neural network ([p. 3] "our ResNet50 experiments");

determining a [temporal] loss based on teacher [temporal] similarities between representations at different [temporal] locations within a teacher representation of an input time-series ([p. 2 §2] "we pick a random query image and a bunch of random other images that we call anchor points. We augment those images and feed them to the teacher model to get their embeddings [...] We calculate the similarity of the query point compared to the anchor points in the teacher's embedding space [...] we calculate the similarity of the query embedding t(qt) compared to all anchor points, divide by a temperature, and convert to a probability distribution using a SoftMax operator"; [p. 3] "Finally, we optimize the student only by minimizing the following loss". The KL divergence loss is interpreted as an instance loss which is explicitly based on teacher instance similarities between representations at all locations within the teacher representation and all anchor representations.);

and student [temporal] similarities between representations at different [temporal] locations within a student representation of the input time-series ([p. 3] "Then, we calculate a similar probability distribution for the student's query embedding to get ps(i) [...]
Finally, we optimize the student only by minimizing the following loss");

determining an instance loss based on teacher instance similarities between representations at common [temporal] locations within the teacher representation and a plurality of anchor representations ([p. 2 §2] and [p. 3], as quoted above. The KL divergence loss is interpreted as an instance loss which is explicitly based on teacher instance similarities between representations at all locations within the teacher representation and all anchor representations.) and student instance similarities between representations at common [temporal] locations within the student representation and the plurality of anchor representations ([p. 3] "Then, we calculate a similar probability distribution for the student's query embedding to get ps(i) [...] Finally, we optimize the student only by minimizing the following loss");

updating the student encoder based on the temporal loss and instance loss ([p. 2] "We update the student based on KL divergence loss"); and

updating the teacher encoder as a moving average of the student encoder ([p. 2] "update the teacher as running average of the student"; see also FIG. 1, Moving Average).

However, Tejankar does not explicitly teach: a universal time-series representation; temporal and instance loss; or updating the student encoder based on the temporal loss.

Gong, in the same field of endeavor, teaches a universal time-series representation ([p. 1 Abstract] "Time-series classification approaches based on deep neural networks"); temporal and instance loss ([p. 5] "L_KD(y^h, ŷ, y^t_τ, ŷ_τ) = (1 - ε) L_CE(y^h, ŷ) + ε τ² L_KL(y^t_τ, ŷ_τ)", with the KD loss interpreted as the temporal loss and L_KL interpreted as the instance loss); and updating the student encoder based on the temporal loss ([p. 5] "L_KL(y^t_τ, ŷ_τ) representing the loss of soft labels and L_CE(y^h, ŷ) representing the loss of hard labels are incorporated into a whole for training the student model", with the KD loss interpreted as the temporal loss).

Tejankar as well as Gong are directed towards student-teacher knowledge distillation of ResNet models. Therefore, Tejankar and Gong are analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Tejankar with the teachings of Gong by using the KL loss term of Tejankar in the combined loss formula of Gong for temporal classification. Gong explicitly uses an analogous ResNet architecture, further reinforcing the obviousness. Tejankar's loss term satisfies the claim requirements for both an instance loss and a temporal loss, such that combining the KL term into a combined loss term for temporal classification in Gong, and treating the combined loss term as the temporal loss and the specific KL loss term as the instance loss, is an obvious combination. Gong provides additional motivation for the combination ([p. 2] "KDCTime simultaneously improves the accuracy and reduces the inference time of it with an acceptable training time overhead. As a conclusion, the performance of KDCTime is promising"). This motivation for combination also applies to the remaining claims which depend on this combination.
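The Tejankar mechanism quoted throughout this rejection, similarity of a query embedding to anchor embeddings, divided by a temperature and converted to a probability distribution with a softmax, with the student trained to match the teacher's distribution under a KL loss, can be sketched as follows (illustrative numpy only; the temperature value and function names are assumptions, not the reference's code):

```python
import numpy as np

def anchor_distribution(query, anchors, temperature=0.02):
    # Similarity of the query embedding to every anchor embedding,
    # divided by a temperature and converted to a probability
    # distribution using a softmax (cf. Tejankar, p. 2 §2).
    q = query / np.linalg.norm(query)
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    logits = (a @ q) / temperature
    e = np.exp(logits - logits.max())
    return e / e.sum()

def isd_loss(teacher_query, student_query, anchors):
    # The student is optimized to match the teacher's anchor
    # distribution: L = KL(p_t || p_s) (cf. Tejankar, p. 3).
    p_t = anchor_distribution(teacher_query, anchors)
    p_s = anchor_distribution(student_query, anchors)
    return float(np.sum(p_t * np.log((p_t + 1e-8) / (p_s + 1e-8))))
```

When teacher and student embed the query identically, the two distributions coincide and the loss is zero, which is the fixed point the distillation drives toward.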
Regarding claim 2, the combination of Tejankar and Gong teaches The method of claim 1, further comprising:

applying a first augmented subsequence of the input time-series to a teacher encoder to generate the teacher representation of the input time-series (Gong [p. 3] "A dataset D is a pair of sets, including a set of time-series X = {x_1, x_2, ..., x_M} and a set of true labels Y^h = {y^h_1, y^h_2, ..., y^h_M} respectively, where each time-series x_i corresponds to a true label y^h [...] An InceptionTime model is treated as a function F ∈ F mapping an input x into an output z"; [p. 4] "the predicted labels from the teacher model can be regarded as the knowledge learned by it, denoted yt"); and

applying a second augmented subsequence of the input time-series to a student encoder to generate the student representation of the input time-series (Gong [p. 4] "predicted labels yt from the teacher model and y^ from the student model").

Regarding claim 5, the combination of Tejankar and Gong teaches The method of claim 2, further comprising determining the teacher temporal similarities by: comparing a representation of the teacher representation at a particular temporal location to representations of the teacher representation at other temporal locations (Tejankar [p. 2 §2] and [p. 3], as quoted above for claim 1. Tejankar explicitly computes teacher and student similarity distributions by comparing the query embedding "to all anchor points", converting those similarities into distributions, and then minimizing "L = KL(pt||ps)". Gong supplies the time-series and the use of teacher/student on that input.).

Regarding claim 6, the combination of Tejankar and Gong teaches The method of claim 5, further comprising determining the student temporal similarities by: comparing a representation of the student representation at the particular temporal location to representations of the student representation at other temporal locations (Tejankar [p. 2 §2] and [p. 3], as quoted above for claim 1. Gong supplies the time-series and the use of teacher/student on that input.).

Regarding claim 7, the combination of Tejankar and Gong teaches The method of claim 6, wherein the temporal loss is determined by summing Kullback-Leibler divergences between the teacher temporal similarities and the student temporal similarities over all temporal position (Tejankar [p. 2 §2] and [p. 3] "Finally, we optimize the student only by minimizing the following loss L = KL(pt||ps)", as quoted above for claim 1. Gong then supplies the time-series and explicitly sums the KL divergence over all teacher and student temporal positions to get the KD loss term ([p. 5] "L_KD(y^h, ŷ, y^t_τ, ŷ_τ) = (1 - ε) L_CE(y^h, ŷ) + ε τ² L_KL(y^t_τ, ŷ_τ)").).

Regarding claim 8, the combination of Tejankar and Gong teaches The method of claim 2, further comprising determining the teacher instance similarities by: comparing a representation of the teacher representation at a first temporal location to representations of a plurality of anchor sequences at the first temporal location (Tejankar [p. 2 §2] and [p. 3], as quoted above for claim 1.).
Regarding claim 9, the combination of Tejankar and Gong teaches The method of claim 8, further comprising determining the student instance similarities by: comparing a representation of the student representation at a second temporal location to representations of the plurality of anchor sequences at the second temporal location (Tejankar [p. 2 §2] and [p. 3], as quoted above for claim 1.).

Regarding claim 10, the combination of Tejankar and Gong teaches The method of claim 9, wherein the instance loss is determined by summing Kullback-Leibler divergences between the teacher instance similarities and the student instance similarities over all temporal position (Tejankar [p. 2 §2] and [p. 3] "Finally, we optimize the student only by minimizing the following loss L = KL(pt||ps)", as quoted above for claim 1. Tejankar explicitly computes teacher and student similarity distributions by comparing the query embedding "to all anchor points", converting those similarities into distributions, and then minimizing "L = KL(pt||ps)". Gong then supplies the time-series and explicitly sums the KL divergence over all teacher and student temporal positions to get the KD loss term ([p. 5] "L_KD(y^h, ŷ, y^t_τ, ŷ_τ) = (1 - ε) L_CE(y^h, ŷ) + ε τ² L_KL(y^t_τ, ŷ_τ)").).

Regarding claim 11, the combination of Tejankar and Gong teaches The method of claim 10, wherein the plurality of anchor sequences comprise previous subsequences used to generate the teacher representations or the student representation (Tejankar [p. 2] "We capture the similarity of the query to the anchor points in the teacher's embedding space and transfer that knowledge to the student"; see FIG. 1. Previous anchor points are interpreted as previous subsequences.).

Regarding claim 12, the combination of Tejankar and Gong teaches A neural network trained according to the method of claim 1 (Gong [Abstract] "Time-series classification approaches based on deep neural networks [...] we first propose Label Smoothing for InceptionTime (LSTime), which adopts the information of soft labels compared to just hard labels. Next, instead of manually adjusting soft labels by LSTime, Knowledge Distillation for InceptionTime (KDTime) is proposed in order to automatically generate soft labels by the teacher model". InceptionTime is a time-series neural network based on the Inception neural network.).

Regarding claim 13, the combination of Tejankar and Gong teaches A non-transitory computer readable memory storing instructions, which when executed by a processor of a system configure the system to perform the method of claim 1 (Gong [p. 7] "Our experiments are conducted on a computer equipped with an Intel Core i9-11900 CPU at 2.50GHz, 32GB memory, and a NVIDIA GeForce RTX 3090 GPU. The operating system is Windows 10. Additionally, the development environment is Anaconda 4.10.3 with Python 3.8.8 and Pytorch 1.9.0.").

Claims 3 and 4 are rejected under 35 U.S.C.
§103 as being unpatentable over the combination of Tejankar and Gong in further view of Choi ("MULTI-TASK SELF-SUPERVISED TIME-SERIES REPRESENTATION LEARNING", 2023).

Regarding claim 3, the combination of Tejankar and Gong teaches The method of claim 2. However, the combination of Tejankar and Gong doesn't explicitly teach wherein the first augmented subsequence is generated by applying a first augmentation to a first sampled subsequence of the input time series and the second augmented subsequence is generated by applying a second augmentation to a second sampled subsequence of the input time series, wherein the first and second sampled subsequences have a minimum overlap.

Choi, in the same field of endeavor, teaches the first augmented subsequence is generated by applying a first augmentation to a first sampled subsequence of the input time series and the second augmented subsequence is generated by applying a second augmentation to a second sampled subsequence of the input time series ([p. 6] "Given input segments X_i,a1:b1 and X_i,a2:b2, the shared encoder maps them to timestamp-level representations R_i,a1:b1 = {r_i,a1, ..., r_i,b1} and R_i,a2:b2 = {r_i,a2, ..., r_i,b2}, where [a2, b1] is an overlapping time region. In the shared encoder, the timestamp masking module generates augmented context views and makes representations at the same timestamp have differing context information. Based on these timestamp-level representations, timestamp-wise and instance-wise contrasting are performed to enforce contextual consistency. Both contrasting methods treat the representations at the same timestamp from two augmented views as positives. Specifically, r_i,t and r_i,t, which are representations at timestamp t from X_i,a1:b1 and X_i,a2:b2, are a positive pair. In contrast, different negative samples are chosen depending on the goal of the two contrasting methods. First, timestamp-wise contrasting seeks to make the shared encoder learn discriminative representations over time"), wherein the first and second sampled subsequences have a minimum overlap ([p. 2] "treats the same timestamps in two overlapping time segments as positive pairs"; [p. 4] "In both contrastive learnings, representations at the same timestamp in two augmented segments were considered positive pairs. In contrast, negative samples were represented differently in both augmented segments for the first contrastive learning, while representations at the same timestamp in other instances").

The combination of Tejankar and Gong as well as Choi are directed towards time-series representation learning. Therefore, the combination of Tejankar and Gong as well as Choi are analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Tejankar and Gong with the teachings of Choi by using a time-series augmentation strategy (Choi [p. 3] "to improve the quality of the learned representations"). This motivation for combination also applies to the remaining claims which depend on this combination.

Regarding claim 4, the combination of Tejankar, Gong, and Choi teaches The method of claim 3, wherein the first augmentation and the second augmentation have the same number of timestamps (Choi [p. 2] "treats the same timestamps in two overlapping time segments as positive pairs"; [p. 4] "In both contrastive learnings, representations at the same timestamp in two augmented segments were considered positive pairs. In contrast, negative samples were represented differently in both augmented segments for the first contrastive learning, while representations at the same timestamp in other instances").

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Ma (US20230141610A1) is directed towards time-series student-teacher knowledge distillation using teacher and student representation similarities, moving average for updates, and a KL divergence loss function.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571) 272-4720. The examiner can normally be reached M-F 7:30am-5:00pm EST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Miranda Huang, can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SIDNEY VINCENT BOSTWICK/
Examiner, Art Unit 2124
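The overlap-sampling strategy at issue in claims 3 and 4 (taught via Choi) can be sketched as follows, a minimal numpy sketch under assumed parameter names, not Choi's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_starts(T, length, min_overlap):
    # Two start offsets whose equal-length windows overlap by at least
    # `min_overlap` timestamps (cf. claims 3-4: two sampled subsequences
    # with a minimum overlap). Constraining |a1 - a2| <= length - min_overlap
    # guarantees the overlap length - |a1 - a2| >= min_overlap.
    a1 = int(rng.integers(0, T - length + 1))
    lo = max(0, a1 - (length - min_overlap))
    hi = min(T - length, a1 + (length - min_overlap))
    a2 = int(rng.integers(lo, hi + 1))
    return a1, a2

def jitter(segment, sigma=0.1):
    # A hypothetical first/second augmentation: additive Gaussian noise.
    return segment + rng.normal(0.0, sigma, size=segment.shape)
```

Claim 4's "same number of timestamps" corresponds to both windows being sliced with the same `length` before augmentation.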

Prosecution Timeline

May 19, 2023
Application Filed
Mar 12, 2026
Non-Final Rejection — §101, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12561604
SYSTEM AND METHOD FOR ITERATIVE DATA CLUSTERING USING MACHINE LEARNING
2y 5m to grant Granted Feb 24, 2026
Patent 12547878
Highly Efficient Convolutional Neural Networks
2y 5m to grant Granted Feb 10, 2026
Patent 12536426
Smooth Continuous Piecewise Constructed Activation Functions
2y 5m to grant Granted Jan 27, 2026
Patent 12518143
FEEDFORWARD GENERATIVE NEURAL NETWORKS
2y 5m to grant Granted Jan 06, 2026
Patent 12505340
STASH BALANCING IN MODEL PARALLELISM
2y 5m to grant Granted Dec 23, 2025
Based on the 5 most recent grants; study what changed to get past this examiner.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 52%
With Interview: 90% (+38.2%)
Median Time to Grant: 4y 7m
PTA Risk: Low
Based on 136 resolved cases by this examiner. Grant probability derived from career allow rate.
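The "with interview" figure appears to be the career allow rate plus the interview lift treated as additive percentage points; a quick arithmetic check (this formula is an assumption inferred from the displayed numbers, not documented by the dashboard):

```python
# Career allow rate from the examiner stats above: 71 granted of 136 resolved.
granted, resolved = 71, 136
career_allow = 100 * granted / resolved   # about 52.2, displayed as 52%

# Interview lift displayed above, in percentage points.
interview_lift = 38.2

with_interview = career_allow + interview_lift
print(round(career_allow), round(with_interview))  # 52 90
```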
