Prosecution Insights
Last updated: April 19, 2026
Application No. 18/531,976

Autoencoder with Non-Uniform Unrolling Recursion

Non-Final OA §103
Filed: Dec 07, 2023
Examiner: BEZUAYEHU, SOLOMON G
Art Unit: 2674
Tech Center: 2600 — Communications
Assignee: Mitsubishi Electric Research Laboratories Inc.
OA Round: 1 (Non-Final)
Grant Probability: 75% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 3y 4m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 75% (464 granted / 618 resolved), +13.1% vs TC avg (above average)
Interview Lift: +30.9% (strong), based on resolved cases with interview
Typical Timeline: 3y 4m avg prosecution, 30 applications currently pending
Career History: 648 total applications across all art units
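The headline figures in this panel follow from the raw counts by simple arithmetic. A minimal sketch is below; the relative-lift combination in `with_interview` is an assumption about how the tool derives its with-interview number, which the page does not state.

```python
def allow_rate(granted: int, resolved: int) -> float:
    """Career allow rate as a fraction of resolved cases."""
    return granted / resolved

def with_interview(base_rate: float, relative_lift: float) -> float:
    """Apply a relative interview lift to the base rate, capped at 100%."""
    return min(base_rate * (1.0 + relative_lift), 1.0)

base = allow_rate(464, 618)            # 464 granted of 618 resolved
boosted = with_interview(base, 0.309)  # +30.9% relative interview lift

print(f"Career allow rate: {base:.1%}")     # ≈ 75.1%
print(f"With interview:    {boosted:.1%}")  # ≈ 98.3%; the page reports 99%, so its exact method may differ
```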

Statute-Specific Performance

§101: 16.0% (-24.0% vs TC avg)
§103: 49.7% (+9.7% vs TC avg)
§102: 13.4% (-26.6% vs TC avg)
§112: 11.7% (-28.3% vs TC avg)
Tech Center averages are estimates • Based on career data from 618 resolved cases

Office Action

§103
DETAILED ACTION

Allowable Subject Matter

Claim 12 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 14, 15, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Bhorkar (Pub. No. US 2020/0244969) in view of YANG et al. (Pub. No. US 2021/0089863).

Regarding claim 1, Bhorkar teaches a non-uniform video encoder system, comprising: at least one processor [Abstract and para. 3]; and a memory having instructions stored thereon that, when executed by the at least one processor [Abstract and para. 3], cause the non-uniform video encoder system to: receive a sequence of video frames of a video of a scene [Para. 22 “server 116 may obtain a sequence of frames of a video” and “server 116 may perform similar operations with respect to additional frames in the same or additional sequences of the video to create additional encoding blocks (e.g., for other scenes).”]; transform the sequence of video frames into series input data (difference vector) indicative of an evolution of the scene in time, space, or both [Para. 3 “the processing system may then generate a first difference vector comprising a difference between a latent space representation of the second frame and a latent space representation of the first frame in response to detecting the correlation between the visual properties, where the latent space representation of the first frame and the latent space representation of the second frame are generated via an autoencoder, and store the first difference vector in a first encoding block”]; partition the series input data into a sequence of non-uniform segments (encoding blocks) indicative of changes in the evolution of the scene [Para. 46 “step 430 may segregate frames for different encoding blocks based upon scene changes detected via MSE and/or another technique”; Para. 50 “the third frame may be the first frame in a next encoding block (the second encoding block) after the first encoding block. For instance, the third frame may be detected to be the beginning of a new scene”. The reference does not explicitly use the exact term “non-uniform”; however, it is clear that the blocks are built until a stopping condition occurs (loss of correlation/scene boundaries). Because the scene boundaries occur at content-dependent times, the block duration is not fixed (non-uniform)]; encode each segment (encoding block) in the sequence of non-uniform segments by an encoder of an autoencoder architecture (autoencoder) [Para. 3 “the present disclosure describes a method, computer-readable medium, and device for creating an encoding block in accordance with latent space representations of video frames generated via an autoencoder”]; and output (transmit) the encoding of the series input data [Para. 51 “The processing system may transmit the first encoding block”].
However, Bhorkar does not explicitly teach encode each segment in the sequence of non-uniform segments by an encoder of an autoencoder architecture with non-uniform unrolling recursion to produce multi-depth encoding of the series input data, wherein, to encode a current segment at a current iteration to produce a current encoding, the non-uniform unrolling recursion combines the current segment with a previous encoding produced at a previous iteration and encodes the combination with the encoder; and output the multi-depth encoding.

YANG teaches encode each segment in the sequence of non-uniform segments by an encoder of an autoencoder architecture with non-uniform unrolling recursion (clockwork recurrent neural network) to produce multi-depth encoding (multi-time compression) of the series input data, wherein, to encode a current segment (next sequential input x_{t+1}) at a current iteration to produce a current encoding, the non-uniform unrolling recursion combines (fed back) the current segment with a previous encoding (discretized code z_t) produced at a previous iteration and encodes the combination with the encoder [Para. 60 “In a fifth autoencoder architecture 610 the discretized code z.sub.t is fed back to the encoder for processing of the next sequential input x.sub.{t+1}, and in a sixth autoencoder architecture 612 the reconstructed output {circumflex over (x)}.sub.t is fed back to the encoder for processing of the next sequential input x.sub.t+1.”; Para. 64 “For example, some implementations may include a clockwork recurrent neural network” and “The clockwork recurrent neural networks enable further reduced bit-rate (e.g., higher compression) by having codes that different time-scales, providing multi-time compression (e.g., latent variables with a time-scale hierarchy)”]; and output the multi-depth encoding (multi-time compression) [Para. 64, quoted above].

It would have been obvious to one of ordinary skill in the art, before the effective filing date, to incorporate the teaching of YANG into the system/method of Bhorkar to achieve improved neural-network-based temporal compression using recurrent feedback and multi-time-scale codes for video encoding blocks.

Regarding claims 2 and 15, Bhorkar teaches wherein the changes in the evolution of the scene are identified by one or a combination of: an event (scene boundaries) detected in the scene, a change in a coloration pattern in the scene, a change in captions describing the scene, a change in results of a classification of the scene, an anomaly detected in the scene, an acoustic event detected in the scene, and an event associated with a camera capturing the evolution of the scene with the sequence of the video frames [Para. 49 “Optional step 460 may include determining that the MSE between the third frame and the preceding frame is greater than the threshold mentioned above in connection with step 430 and/or may comprise the detection of a scene boundary in accordance with any number of scene/boundary detection algorithms”].

Claims 14 and 20 are rejected for the same reasons as claim 1 above. Furthermore, Bhorkar teaches a processor, controller, and computer-readable medium to perform the claim limitations [Abstract and para. 33].

Claims 3 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Bhorkar (Pub. No. US 2020/0244969) in view of YANG et al. (Pub. No. US 2021/0089863) further in view of Adeli-Mosabbeb et al. (Pub. No. US 2021/0103742).

Regarding claims 3 and 16, Bhorkar in view of YANG does not explicitly teach the claim limitations.
However, Adeli teaches wherein the multi-depth (two layers) encoding of the series input data forms a spatio-temporal scene graph having nodes (graph node) representing one or multiple objects in the scene, wherein the current segment (frame) for the current iteration includes a portion of the scene graph, wherein the previous encoding produced at the previous iteration forms a super node (context node), and wherein the processor is configured to connect the super node with at least one node in the portion of the scene graph to produce the combination (concatenated vector) encoded by the encoder at the current iteration [Para. 18 “the technology disclosed herein generates a pedestrian-centric dynamic scene graph encoded with the spatiotemporal information between pedestrians and objects” and “The pedestrian node is connected to all other instance nodes (e.g., each other pedestrian node and the object nodes) as well as a context node, which aggregates all the contextual visual information”; Para. 36 “The pedestrian-centric star graph 104a-x discussed above is constructed on each frame” and “a first GRU (also referred to herein as a pedestrian GRU) may be used to connect each pedestrian node temporally across frames, with the output of the pedestrian GRU at time t serving as the input pedestrian node at time t+1, while in other embodiments an LSTM network of a Quasi RNN (QRNN) network may be used (neither shown in FIG. 1).”; Para. 37 “Leveraging the spatiotemporal context provided through the graph node representation, embodiments of the technology perform two layers of graph convolution on each observed frame, where the features for the pedestrian node and the context node are hidden states of the corresponding convolutional model utilized for prediction during the prediction stage (stage 4).”; Para. 39 “the concatenated vector 108 serves as an input to the designated prediction GRU 110, which outputs a prediction regarding the future behavior of the pedestrian.”; and Para. 38].

It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Bhorkar in view of YANG to include the feature as taught by Adeli, because the modification enables the system to achieve efficient, error-resilient methods for video compression and transmission.

Claims 4 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Bhorkar (Pub. No. US 2020/0244969) in view of YANG et al. (Pub. No. US 2021/0089863) in view of Adeli-Mosabbeb et al. (Pub. No. US 2021/0103742) and further in view of Cherian (Patent No. US 11,582,485).

Regarding claims 4 and 17, Bhorkar in view of YANG further in view of Adeli does not explicitly teach the claim limitations. However, Cherian teaches wherein the spatio-temporal scene graph includes nodes representing one or multiple static objects and one or multiple dynamic objects in the scene [Col. 4 lines 32-36], wherein an appearance and a location of each of the static objects in the scene are represented by properties of a single node of the spatio-temporal scene graph [claim 1 “wherein an appearance and a location of each of the static objects in the scene are represented by properties of a single node of the spatio-temporal scene graph”], and wherein each of the dynamic objects in the scene is represented by properties of multiple nodes of the spatio-temporal scene graph describing an appearance, a location, and a motion of each of the dynamic objects at different instances of time [Col. 4 lines 16-20].
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Bhorkar in view of YANG further in view of Adeli to include the feature as taught by Cherian, because the modification improves video compression efficiency and scene fidelity by converting video frames into a spatio-temporal scene graph that separately models static and dynamic objects across time for more scene-aware encoding.

Claims 5, 6, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Bhorkar (Pub. No. US 2020/0244969) in view of YANG et al. (Pub. No. US 2021/0089863) further in view of Han et al. (Pub. No. US 2020/0126538).

Regarding claims 5 and 18, Bhorkar in view of YANG does not explicitly teach the claim limitations. However, Han teaches wherein the processor is configured to submit the multi-depth encoding (output encodings) of the series input data to a downstream neural network (attender neural network/decoder neural network) to perform a task [Para. 7 and 11]. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Bhorkar in view of YANG to include the feature as taught by Han, because the modification improves long-form speech recognition by segmenting audio into overlapping chunks and merging the per-segment sequence-to-sequence outputs into a single, more accurate transcription.

Regarding claim 6, Bhorkar in view of YANG does not explicitly teach the claim limitations.
However, Han teaches wherein the scene includes an audio scene (audio data) including speech utterance data (long-form utterance) having multiple sentences, and wherein the downstream neural network (decoder neural network) is configured to perform a speech processing task (transcribing) in response to submitting the multi-depth encoding (output encoding) of the series input data of the audio scene to the downstream neural network (attender neural network) [Para. 7 and 11]. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Bhorkar in view of YANG to include the feature as taught by Han, because the modification improves long-form speech recognition by segmenting audio into overlapping chunks and merging the per-segment sequence-to-sequence outputs into a single, more accurate transcription.

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Bhorkar (Pub. No. US 2020/0244969) in view of YANG et al. (Pub. No. US 2021/0089863) further in view of Cherian et al. (Pub. No. US 2025/0187198).

Regarding claim 7, Bhorkar in view of YANG does not explicitly teach the claim limitation. However, Cherian teaches wherein the processor is configured to submit (transmit) the multi-depth encoding of the series input data to a downstream neural network to perform a navigation task (task navigation) [Para. 133-134]. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Bhorkar in view of YANG to include the feature as taught by Cherian, because the modification improves efficient downstream navigation and other time-series tasks by compactly encoding evolving scene data into semantically meaningful multi-depth representations that reduce computational burden while preserving information for neural network decision making.

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Bhorkar (Pub. No. US 2020/0244969) in view of YANG et al. (Pub. No. US 2021/0089863) in view of Han et al. (Pub. No. US 2020/0126538) and further in view of Cherian et al. (Pub. No. US 2025/0187198).

Regarding claim 19, Bhorkar in view of YANG further in view of Han does not explicitly teach the claim limitation. However, Cherian teaches wherein the processor is configured to submit (transmit) the multi-depth encoding of the series input data to a downstream neural network to perform a navigation task (task navigation) [Para. 133-134]. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Bhorkar in view of YANG further in view of Han to include the feature as taught by Cherian, for the same reasons stated for claim 7 above.

Claims 8-10 are rejected under 35 U.S.C. 103 as being unpatentable over Bhorkar (Pub. No. US 2020/0244969) in view of YANG et al. (Pub. No. US 2021/0089863) further in view of Narayanan et al. (Pub. No. US 2023/0132280).

Regarding claim 8, Bhorkar in view of YANG does not explicitly teach the claim limitation. However, Narayanan teaches a navigation system including a neural network configured to generate a navigation command (navigational commands) based on the multi-depth encoding of the series input data [Para. 53 and 61].
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Bhorkar in view of YANG to include the feature as taught by Narayanan, because the modification improves indoor robot navigation and object transport tasks by using learned topological scene representations and navigation policies to help a robot find objects, plan routes around clutter and occlusions, and deliver the objects to goal locations more effectively.

Regarding claim 9, Bhorkar in view of YANG does not explicitly teach the claim limitation. However, Narayanan teaches wherein the scene includes observing objects in a room of a building by the robot moving within the building, and an end of the scene is detected when the robot exits the room [Para. 3, 18, and 43]. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Bhorkar in view of YANG to include this feature as taught by Narayanan, for the same reasons stated for claim 8 above.

Regarding claim 10, Bhorkar in view of YANG does not explicitly teach the claim limitation. However, Narayanan teaches wherein the processor is configured to execute a scene decoder configured to generate a navigation plan, the navigation plan including computer-executable instructions that cause the robot to reach a target object in a scene previously encoded by the autoencoder [Para. 53 and Para. 42 “When the agent is within a threshold distance of the target object or goal zone, based on object closeness score 216, block 222 performs the pickup or drop action”]. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Bhorkar in view of YANG to include this feature as taught by Narayanan, for the same reasons stated for claim 8 above.

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Bhorkar (Pub. No. US 2020/0244969) in view of YANG et al. (Pub. No. US 2021/0089863) further in view of HAJIMIRSADEGHI et al. (Pub. No. US 2020/0076841).

Regarding claim 11, Bhorkar in view of YANG does not explicitly teach the claim limitation. However, HAJIMIRSADEGHI teaches wherein the processor (computer) is configured to execute a supernode graph embeddings (graph embedding) (SuGE) algorithm to perform the non-uniform unrolling recursion to encode the series input data (sequence of related log messages) into a super node corresponding to the multi-depth encoding of the series input data [Para. 99 and 253]. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Bhorkar in view of YANG to include the feature as taught by HAJIMIRSADEGHI, because the modification improves sequential log-data analysis by generating context-aware graph/recurrent embeddings that better capture relationships across related messages for more accurate anomaly detection.

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Bhorkar (Pub. No. US 2020/0244969) in view of YANG et al. (Pub. No. US 2021/0089863) further in view of AYUSH et al. (Pub. No. US 2023/0316379).

Regarding claim 13, Bhorkar in view of YANG does not explicitly teach the claim limitation.
However, AYUSH teaches wherein the autoencoder (type-conditioned graph autoencoder) is a graph autoencoder [Para. 4]. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Bhorkar in view of YANG to include the feature as taught by AYUSH, because the modification improves bundle recommendation accuracy by modeling item-item relationships with a type-conditioned graph autoencoder to better predict visual compatibility.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SOLOMON G BEZUAYEHU, whose telephone number is (571) 270-7452. The examiner can normally be reached Monday-Friday, 10 AM-7 PM. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, O’Neal Mistry, can be reached at 313-446-4912. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-0101 (in USA or Canada) or 571-272-1000.

/SOLOMON G BEZUAYEHU/
Primary Examiner, Art Unit 2666

/ONEAL R MISTRY/
Supervisory Patent Examiner, Art Unit 2674
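The §103 rejection above turns on the claimed "non-uniform unrolling recursion": at each iteration, the current segment is combined with the encoding produced at the previous iteration, and the combination is passed through the encoder again. The loop structure can be sketched as follows; the function names, the concatenation-style combine step, and the toy encoder are illustrative assumptions, not the applicant's actual implementation.

```python
from typing import Callable, List, Sequence

def unrolled_encode(
    segments: Sequence[list],               # non-uniform segments of the series input data
    encoder: Callable[[list], list],        # stand-in for the autoencoder's encoder
    combine: Callable[[list, list], list],  # how a segment and the previous encoding merge
) -> List[list]:
    """Non-uniform unrolling recursion as recited in claim 1: each segment
    is combined with the previous iteration's encoding, and the combination
    is re-encoded; the per-iteration codes form the multi-depth encoding."""
    encodings: List[list] = []
    previous: list = []
    for segment in segments:
        current = encoder(combine(segment, previous))
        encodings.append(current)  # multi-depth encoding accumulates here
        previous = current
    return encodings

# Toy run: segments of unequal length (non-uniform), concatenation as the
# combine step, and an "encoder" that keeps a fixed-size code.
multi_depth = unrolled_encode(
    segments=[[1, 2], [3], [4, 5, 6]],
    encoder=lambda x: x[-2:],
    combine=lambda seg, prev: prev + seg,
)
print(multi_depth)  # → [[1, 2], [2, 3], [5, 6]]
```

Each code depends on all earlier segments through the feedback of `previous`, which is the property the Office Action maps to YANG's feedback of the discretized code z_t into the encoder for input x_{t+1}.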

Prosecution Timeline

Dec 07, 2023
Application Filed
Mar 31, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602717: APPARATUS, METHOD, AND COMPUTER-READABLE STORAGE MEDIUM FOR CONTEXTUALIZED EQUIPMENT RECOMMENDATION (granted Apr 14, 2026; 2y 5m to grant)
Patent 12602946: DOCUMENT CLASSIFICATION USING UNSUPERVISED TEXT ANALYSIS WITH CONCEPT EXTRACTION (granted Apr 14, 2026; 2y 5m to grant)
Patent 12591350: TECHNIQUES FOR POSITIONING SPEAKERS WITHIN A VENUE (granted Mar 31, 2026; 2y 5m to grant)
Patent 12586355: ROAD AND INFRASTRUCTURE ANALYSIS TOOL (granted Mar 24, 2026; 2y 5m to grant)
Patent 12561852: Cross-Modal Contrastive Learning for Text-to-Image Generation based on Machine Learning Models (granted Feb 24, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 75%
With Interview: 99% (+30.9%)
Median Time to Grant: 3y 4m
PTA Risk: Low

Based on 618 resolved cases by this examiner. Grant probability derived from career allow rate.
