DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1 – 3, 5, 6, 9, 11, 12, 14, 15, 17 – 19, 21, 22, 24, 25, 27, 28, 30, 32 – 35, 37 – 39, 41, 43, 44, 46, 49 – 51 and 53 are pending in this application.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 07/22/2024, 05/01/2025 and 07/16/2025 were filed in compliance with the provisions of 37 CFR 1.97 and 1.98. Accordingly, the information disclosure statements are being considered by the examiner.
Applicants have not provided an explanation of the relevance of some of the cited document(s); the relevance of those documents is discussed below.
Sachdev et al. (U.S. PreGrant Publication No. 2019/0297276 A1) teaches an endoscopy video feature enhancement platform (100) that has a video capture device configured to capture an input endoscopy video stream received by an input video interface. A detection module applies a previously trained detection model to the endoscopy video stream and generates detection of a region of possible abnormality in the endoscopy video stream. An abnormality identification module detects a type of the possible abnormality detected by the detection module and overlays a visual border around the detected region to generate an augmented endoscopy video stream. A video augmentation module overlays a visual indicator of the type of the possible abnormality in the detected region over a relevant portion of the augmented endoscopy video stream. A video output device outputs the augmented endoscopy video stream to an output video interface.
Setegn et al. (U.S. PreGrant Publication No. 2020/0258627 A1) disclose a set of hardware and software tools employed to rapidly rule out coronary artery disease in patients who present with chest pain to, for example, the emergency room or observation clinical decision units.
Schwartz et al. (U.S. PreGrant Publication No. 2019/0046007 A1) teaches an endoscopy system configured to place video recording in an active mode; receive a voice command to start recording; receive a voice command to take a note; and convert audio into a text note.
Poltaretskyi et al. (U.S. PreGrant Publication No. 2019/0380792 A1) provides an example method that includes displaying, via a visualization device and overlaid on a portion of an anatomy of a patient viewable via the visualization device, a virtual model of the portion of the anatomy obtained from a virtual surgical plan for an orthopedic joint repair surgical procedure to attach a prosthetic to the anatomy; and displaying, via the visualization device and overlaid on the portion of the anatomy, a virtual guide that guides at least one of preparation of the anatomy for attachment of the prosthetic or attachment of the prosthetic to the anatomy.
Kim et al. (U.S. Patent No. 10,102,444 B2) provide an object recognition method and apparatus which determine an object of interest included in a recognition target image using a trained machine learning model and determine an area in which the object of interest is located in the recognition target image. The object recognition method based on weakly supervised learning, performed by an object recognition apparatus, includes extracting a plurality of feature maps from a training target image given classification results of objects of interest, generating an activation map for each of the objects of interest by accumulating the feature maps, calculating a representative value of each of the objects of interest by aggregating activation values included in a corresponding activation map, determining an error by comparing classification results determined using the representative value of each of the objects of interest with the given classification results and updating a CNN-based object recognition model by back-propagating the error.
Ma et al. (U.S. Patent No. 10,682,108 B1) disclose methods, systems, and computer readable media for deriving a three-dimensional (3D) surface from colonoscopic video. According to one method for deriving a 3D surface from colonoscopic video, the method comprises: performing video frame preprocessing to identify a plurality of keyframes of a colonoscopic video, wherein the video frame preprocessing includes informative frame selection and keyframe selection; generating, using a recurrent neural network and direct sparse odometry, camera poses and depth maps for the keyframes; and fusing, using SurfelMeshing and the camera poses, the depth maps into a three-dimensional (3D) surface of a colon portion, wherein the 3D surface indicates at least one region of the colon portion that was not visualized.
Pizer et al. (U.S. Patent No. 10,733,745 B2) disclose methods, systems, and computer readable media for deriving a three-dimensional (3D) textured surface from endoscopic video. According to one method for deriving a 3D textured surface from endoscopic video, the method comprises: performing video frame preprocessing to identify a plurality of video frames of an endoscopic video, wherein the video frame preprocessing includes informative frame selection, specularity removal, and key-frame selection; generating, using a neural network or a shape-from-motion-and-shading (SfMS) approach, a 3D textured surface from the plurality of video frames; and optionally registering the 3D textured surface to at least one CT image.
Byun et al. (U.S. Patent No. 12,182,999 B2) teach a system and method for assisting colonoscopic image diagnosis based on artificial intelligence that include a processor configured to analyze each video frame of a colonoscopic image using at least one medical image analysis algorithm and detect a finding suspected of being a lesion in the video frame. The processor calculates the coordinates of the location of the finding suspected of being a lesion. The processor generates display information, including whether the finding suspected of being a lesion is present and the coordinates of the location of the finding suspected of being a lesion.
Smith et al. (U.S. Patent No. 11,830,607 B2) disclose a system for facilitating image finding analysis that includes one or more processors and one or more hardware storage devices storing instructions that are executable by the one or more processors to configure the system to perform acts such as (i) presenting an image on a user interface, the image being one of a plurality of images provided on the user interface in a navigable format, (ii) obtaining a voice annotation for the image, the voice annotation being based on a voice signal of a user, and (iii) binding the voice annotation to at least one aspect of the image, wherein the binding modifies metadata of the image based on the voice annotation.
Sullivan et al. (U.S. PreGrant Publication No. 2020/0035350 A1) disclose a method for processing one or more histological images captured by a medical imaging device. In this method, the histological image is received, and target regions, each of which corresponds to a candidate type of tissue, are identified based on a predictive model associating one or more sample histological images with one or more sample target histological images. One or more display characteristics associated with the identified at least one target histological image are applied to the histological image.
Sonntag (U.S. PreGrant Publication No. 2013/0249783 A1) is related to a method and system for annotating image regions with specific concepts based on multimodal user input. The system (10) comprises an identification unit (11) for the identification of a region of interest on a multidimensional image; an automatic speech recognition unit (12) for recognizing speech input in a natural language; a natural language understanding unit (13) which interprets the speech input in the context of a specific application domain; a fusion unit (14) which combines the multimodal user input from the identification unit (11) and the natural language understanding unit (13); and an annotation unit (15) which annotates the result of the natural language understanding unit (13) on the image regions and optionally provides user feedback about the annotation process. Thus, the system advantageously facilitates a user's task to annotate specific image regions with standardized key concepts based on multimodal speech-based user input.
Specification
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1 - 3, 5, 6, 9, 11, 12, 14, 15, 19, 21, 22, 24, 25, 30, 31, 33 - 35, 37 - 39, 41, 43, 44, 46, 51 and 53 are rejected under 35 U.S.C. 103 as being unpatentable over Innanje et al. (U.S. PreGrant Publication No. 2021/0090736 A1, hereinafter ‘Innanje’) in view of Paik et al. (U.S. Patent No. 11,894,114 B2, hereinafter ‘Paik’).
With respect to claim 1, Innanje teaches a system for analyzing medical image data for a medical procedure (e.g., a system for examining medical image data for a medical procedure, ¶0036 - ¶0038), wherein the system comprises: a non-transitory computer-readable medium (e.g., a non-transitory computer readable medium, ¶0019) having stored thereon program instructions for analyzing medical image data for the medical procedure (e.g., which includes a set of instructions for anomaly detection for a medical procedure, ¶0019); and at least one processor that, when executing the program instructions (e.g., at least a processor 210 configured to control function(s), ¶0051 - ¶0052), is configured to: receive at least one image from a series of images (e.g., acquire one or more images of different modalities or acquire an image relating to at least one part of a subject and perform treatment on the at least one part of the subject, etc., ¶0041); determine when there is at least one object of interest (OOI) in the at least one image and, when there is at least one OOI (e.g., determine when there is at least one area or object of interest (AOI or OOI) in the acquired images and, when there is at least one OOI, ¶0042, ¶0059 - ¶0060, Fig. 6), determine a classification for the at least one OOI, where both determinations are performed using at least one machine learning model (e.g., perform a classification for the AOI/OOI, where both processes are performed using a machine learning model, ¶0038, ¶0082, ¶0102); display the at least one image and any determined OOIs to a user on a display using a bounding box during the medical procedure (e.g., present/display at least one determined OOI, via a display, using a bounding box during the medical procedure, ¶0082, ¶0107); receive an input audio signal including speech from the user during the medical procedure and recognize the speech (e.g., receive audio data that represents a voice to be recognized, ¶0087); when the speech is recognized as a comment on the at least one image during the medical procedure, convert the speech into at least one text string using a speech-to-text conversion algorithm (e.g., convert the audio data into text using the voice recognition technique, ¶0087); and match the at least one text string with the at least one image for which the speech from the user was provided (e.g., determine whether an anomaly regarding the medical procedure exists based on the text data and/or the one or more images in the image data, ¶0087); but fails to teach: a terminology correction algorithm when converting said speech into at least a text string using a speech-to-text conversion algorithm; and generate at least one annotated image in which the at least one text string is linked to the corresponding at least one image during the medical procedure.
However, in the same field of endeavor of converting speech to text, the mentioned claimed limitations are well-known in the art as evidenced by Paik. In particular, Paik teaches: when the speech is recognized as a comment on the at least one image during the medical procedure, convert the speech into at least one text string using a speech-to-text conversion algorithm and a terminology correction algorithm (Paik: e.g., when a doctor/physician says (speaks) a word during a surgical procedure, convert said word into text using a speech-to-text conversion algorithm and find corresponding sentence(s) to be inserted into a medical report, Col 4 (line 50) to Col 5 (line 62); Col 6 (lines 14 – 31), Col 35 (lines 12 – 38), Fig. 1, Fig. 3); and generate at least one annotated image in which the at least one text string is linked to the corresponding at least one image during the medical procedure (e.g., an annotated image along with at least one text string is generated during medical practice, Col 2 (lines 45 – 52); Col 4 (lines 14 – 49); Col 33 (lines 50 – 55); Col 62 (lines 3 – 15), Fig. 2).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Innanje as taught by Paik, since Paik suggests within Col 4 (line 50) to Col 5 (line 62); Col 6 (lines 14 – 31), Col 35 (lines 12 – 38), Fig. 1, Fig. 3 that such a modification of having a terminology correction algorithm would prevent a user/physician from having to say two words, thereby decreasing errors and/or increasing the efficiency of results during medical procedures.
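Purely as an illustration of the kind of speech-annotation flow recited in claim 1 (and not as a representation of the applicant's, Innanje's, or Paik's actual implementation), a minimal Python sketch is shown below. All identifiers, the toy terminology table, and the data structures are hypothetical.

```python
# Illustrative sketch only: pairing a recognized speech comment with the image
# frame for which it was spoken, applying a toy terminology correction, and
# producing an "annotated image" record. All names are invented for illustration.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Frame:
    index: int
    pixels: bytes                                           # placeholder for image data
    ooi_boxes: List[tuple] = field(default_factory=list)    # (x, y, w, h, label)

@dataclass
class AnnotatedImage:
    frame_index: int
    text: str

TERMINOLOGY = {"polip": "polyp", "adenomia": "adenoma"}     # toy correction table

def correct_terminology(text: str) -> str:
    # Stand-in for a terminology-correction step applied after speech-to-text.
    return " ".join(TERMINOLOGY.get(w.lower(), w) for w in text.split())

def annotate(frame: Frame, recognized_speech: Optional[str]) -> Optional[AnnotatedImage]:
    # Link the corrected text string to the frame for which the speech was provided.
    if not recognized_speech:
        return None
    return AnnotatedImage(frame_index=frame.index, text=correct_terminology(recognized_speech))

if __name__ == "__main__":
    frame = Frame(index=42, pixels=b"", ooi_boxes=[(10, 20, 64, 64, "suspicious")])
    print(annotate(frame, "small polip at the hepatic flexure"))
```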
With respect to claim 2, Innanje in view of Paik teaches the system of claim 1, wherein the at least one processor is further configured to, when the speech is recognized as a request for at least one reference image with OOIs that have been classified with the same classification as the at least one OOI during the medical procedure, display the at least one reference image and receive input from the user that either confirms or dismisses the classification of the at least one OOI to update the at least one machine learning model (e.g., the speech or audio data, received as an input, is detected as a request for an image with an OOI that has been classified during the medical/surgical procedure; the image is presented in order to allow a user to respond with feedback related to an anomaly, ¶0036, ¶0061 and ¶0087).
With respect to claim 3, Innanje in view of Paik teaches the system of claim 1, wherein the at least one processor is further configured to, when the at least one OOI is classified as being suspicious, receive input from the user indicating a user classification for the at least one image with the undetermined OOI; and/or wherein the at least one processor is further configured to automatically generate a report that includes the at least one annotated image (e.g., at least one OOI is classified as an anomaly, input is received as feedback, and/or a notification is presented/displayed that includes the representation that an anomaly exists, ¶0038, ¶0061, ¶0074, ¶0082 - ¶0085).
With respect to claim 5, Innanje in view of Paik teaches the system of claim 1, wherein the at least one processor is further configured to, for a given OOI in a given image: identify bounding box coordinates for a bounding box that is associated with the given OOI in the given image; calculate a confidence score based on a probability distribution of the classification for the given OOI; overlay the bounding box on the at least one image at the bounding box coordinates when the confidence score is higher than a confidence threshold; and upon receiving confirmation from the user during the medical procedure, overlay custom vocabulary on the at least one image (e.g., these correspond to the bounding box(es) referring to at least a portion of the detected object of interest in the image data, ¶0082, ¶0107; a score of each of the plurality of regions indicating the probability that each of the plurality of regions includes the at least one of the one or more objects of interest, ¶0016, ¶0080 - ¶0081, ¶0100; if the trained machine learning model determines that an anomaly score of the specific region is greater than an anomaly threshold, the trained machine learning model may determine that the detection result of the specific region is positive and/or designate a positive label “1” for the specific region; otherwise, the trained machine learning model may determine that the predicted result of the specific region is negative and/or designate a negative label “0” for the specific region, ¶0080; and provide the feedback to “accept or skip”, ¶0082 - ¶0085).
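Purely for illustration of the confidence-thresholded overlay recited in claim 5 (not the cited references' actual code), a minimal sketch follows; the 0.5 threshold, array sizes, and function names are hypothetical.

```python
# Illustrative sketch only: compute a confidence score from a class probability
# distribution and overlay a bounding box only when the score exceeds a threshold.
import numpy as np

def confidence_score(class_probs: np.ndarray) -> float:
    # Confidence taken as the highest class probability in the normalized distribution.
    probs = class_probs / class_probs.sum()
    return float(probs.max())

def maybe_overlay(image: np.ndarray, box: tuple, class_probs: np.ndarray,
                  threshold: float = 0.5) -> np.ndarray:
    # Draw a 1-pixel rectangle border at the box coordinates if confident enough.
    out = image.copy()
    if confidence_score(class_probs) < threshold:
        return out
    x, y, w, h = box
    out[y, x:x + w] = 255          # top edge
    out[y + h - 1, x:x + w] = 255  # bottom edge
    out[y:y + h, x] = 255          # left edge
    out[y:y + h, x + w - 1] = 255  # right edge
    return out

img = np.zeros((128, 128), dtype=np.uint8)
overlaid = maybe_overlay(img, (16, 16, 32, 32), np.array([0.1, 0.8, 0.1]))
print(confidence_score(np.array([0.1, 0.8, 0.1])))  # 0.8
```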
With respect to claim 6, Innanje in view of Paik teaches the system of claim 1, wherein: (a) the at least one processor is configured to determine the classification for the OOI by: applying a convolutional neural network (CNN) to the OOI by performing convolutional, activation, and pooling operations to generate a matrix; generating a feature vector by processing the matrix using the convolutional, activation, and pooling operations; and performing the classification of the OOI based on the feature vector (e.g., a convolutional neural network is applied to the object of interest (OOI) by performing convolutional operations, machine learning models, and pooling layers involving a matrix reduction, ¶0093 - ¶0094; the one or more image features extracted and/or output by the object detection model may also be referred to as a feature map or vector, ¶0080, ¶0094); (b) the at least one processor is further configured to overlay a timestamp and time-stamped documentation of at least one procedural occurrence on the corresponding at least one image when generating the at least one annotated image during the medical procedure (e.g., timestamps are also recorded during the medical procedure, ¶0073, ¶0107); and/or (c) the at least one processor is further configured to indicate the confidence score on the at least one image in real time on a display during the medical procedure (e.g., providing weighting coefficient(s) or score(s) in real time in/during the medical procedure, ¶0080 - ¶0081, ¶0094, ¶0100 with ¶0036, ¶0040, ¶0059, ¶0074).
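Purely for illustration of the convolution/activation/pooling-then-classify structure recited in claim 6 (not the claimed or cited models), a minimal sketch is shown below; the layer sizes, channel counts, and class count are hypothetical.

```python
# Illustrative sketch only: a tiny CNN that applies convolution, activation, and
# pooling to produce a feature map, flattens it into a feature vector, and
# classifies the OOI from that vector.
import torch
import torch.nn as nn

class TinyOOIClassifier(nn.Module):
    def __init__(self, num_classes: int = 4):
        super().__init__()
        self.features = nn.Sequential(               # convolution + activation + pooling
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(16 * 16 * 16, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feature_map = self.features(x)                # matrix / feature map
        feature_vec = feature_map.flatten(1)          # feature vector
        return self.classifier(feature_vec)           # class scores

patch = torch.randn(1, 3, 64, 64)                     # a 64x64 crop around an OOI
logits = TinyOOIClassifier()(patch)
print(logits.softmax(dim=1))
```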
With respect to claim 9, Innanje in view of Paik teaches the system of claim 1, wherein the at least one processor is configured to receive the input audio during the medical procedure by: initiating receipt of an audio stream for the input audio from the user upon detection of a first user action that includes: pausing a display of the series of images; taking a snapshot of a given image in the series of images; or providing an initial voice command; and ending receipt of the audio stream upon detection of a second user action that includes: remaining silent for a pre-determined length; pressing a designated button; or providing a final voice command; and/or the at least one processor is further configured to store the series of images when receiving the input audio during the medical procedure, thereby designating the at least one image to receive annotation data for generating a corresponding at least one annotated image (e.g., image data coupled with the audio data collected during the medical procedure are stored, including a representation or label, ¶0042, ¶0087, ¶0107).
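Purely for illustration of the start/stop audio-capture behavior recited in claim 9 (not any cited reference's implementation), a minimal sketch is shown below; the event names and class are hypothetical.

```python
# Illustrative sketch only: a tiny state machine that starts receipt of the audio
# stream on a first user action and ends it on a second user action.
START_EVENTS = {"pause_display", "take_snapshot", "initial_voice_command"}
END_EVENTS = {"silence_timeout", "designated_button", "final_voice_command"}

class AudioCapture:
    def __init__(self):
        self.recording = False
        self.buffer = []

    def on_event(self, event: str) -> None:
        if not self.recording and event in START_EVENTS:
            self.recording = True      # initiate receipt of the audio stream
        elif self.recording and event in END_EVENTS:
            self.recording = False     # end receipt of the audio stream

    def on_audio_chunk(self, chunk: bytes) -> None:
        if self.recording:
            self.buffer.append(chunk)  # audio kept while images are designated for annotation

cap = AudioCapture()
cap.on_event("take_snapshot")
cap.on_audio_chunk(b"\x00\x01")
cap.on_event("silence_timeout")
print(cap.recording, len(cap.buffer))  # False 1
```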
With respect to claim 11, Innanje in view of Paik teaches the system of claim 4, wherein the at least one processor is further configured to generate a report for the medical procedure by: capturing a set of patient information data generated during the medical procedure to be added to the report; loading a subset of the series of images that includes the at least one annotated image or the at least one OOI identified by the bounding box; and combining the set of patient information data with the subset of the series of images that includes the at least one annotated image into the report (e.g., it refers to an activity or series of actions intended to achieve a result in the delivery of healthcare, for example, directed at or performed on a subject (e.g., a patient) to measure, diagnose and/or treat the subject; exemplary medical procedures may include an immediate test, a diagnostic test, a treatment procedure, an autopsy, etc., ¶0040 - ¶0042, ¶0072 and ¶0078).
With respect to claim 12, Innanje in view of Paik teaches the system of claim 1, wherein the at least one processor is further configured to perform training of the at least one machine learning model by: applying an encoder to at least one training image to generate at least one feature vector for a training OOI in the at least one training image; selecting a class for the training OOI by applying the at least one feature vector to the at least one machine learning model; and reconstructing, using a decoder, a labeled training image by associating the at least one feature vector with the at least one training image and the selected class with which to train the at least one machine learning model (under the examiner's approach: segmentation is well known to be considered an encoder that generates at least a feature representation for a training OOI from a training sample, ¶0010, ¶0018, ¶0065 - ¶0067, ¶0099; positive/negative result(s) are defined, upon classification, for the training OOI by applying at least the feature representation to a trained machine learning model, ¶0059, ¶0067; from the defined positive/negative result(s), a labeled training image is constructed, ¶0059 - ¶0066, ¶0079 - ¶0080, ¶0089 - ¶0093).
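Purely for illustration of the encoder / classifier / decoder training arrangement recited in claim 12 (not the applicant's or Innanje's actual model), a toy training step is sketched below; the architecture, class indices, and loss weighting are hypothetical.

```python
# Illustrative sketch only: encode a training image into a feature vector, select
# a class from that vector, and reconstruct a labeled training image with a decoder.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(4), nn.Flatten())        # -> 8*4*4 features
classifier = nn.Linear(8 * 4 * 4, 4)                                  # 4 tissue classes
decoder = nn.Sequential(nn.Linear(8 * 4 * 4, 32 * 32), nn.Sigmoid())  # reconstruct image

opt = torch.optim.Adam(list(encoder.parameters()) +
                       list(classifier.parameters()) +
                       list(decoder.parameters()), lr=1e-3)

image = torch.rand(1, 1, 32, 32)          # a training image containing an OOI
label = torch.tensor([2])                 # e.g., index of a "suspicious tissue" class

feat = encoder(image)                     # feature vector for the training OOI
class_logits = classifier(feat)           # select a class from the feature vector
recon = decoder(feat).view(1, 1, 32, 32)  # reconstruct a labeled training image

loss = nn.functional.cross_entropy(class_logits, label) \
     + nn.functional.mse_loss(recon, image)
loss.backward()
opt.step()
print(float(loss))
```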
With respect to claim 14, Innanje in view of Paik teaches the system of claim 12, wherein the class is a healthy tissue class, an unhealthy tissue class, a suspicious tissue class, or an unfocused tissue class and wherein the at least one processor is further configured to: train the at least one machine learning model using training datasets that include labeled training images, unlabelled training images, or a mix of labelled and unlabelled training images, the images including examples categorized by healthy tissue, unhealthy tissue, suspicious tissue, and unfocused tissue (e.g., the trained machine learning model simply classifies one or more objects present in the inputted image data into two categories including a positive category and a negative category; the positive category can be a “class” indicating a presence of anomaly, and the negative category can be another “class” indicating no presence of anomaly, ¶0082, ¶0102; an anomaly is a disease of tissue, ¶0040).
With respect to claim 15, Innanje in view of Paik teaches the system of claim 12, wherein the at least one processor is further configured to train the at least one machine learning model by using supervised learning, unsupervised learning, or semi-supervised learning; and/or the training datasets further include subcategories for each of the unhealthy tissue and the suspicious tissue (e.g., supervised learning, semi-supervised learning, and/or others, ¶0091).
With respect to claim 19, Innanje in view of Paik teaches the system of claim 1, wherein Paik teaches that the at least one processor is further configured to train the speech-to-text conversion algorithm using a speech dataset, the speech dataset comprising ground truth text and audio data for the ground truth text, to compare new audio data to the speech dataset to identify a match with the ground truth text; and/or wherein the speech-to-text conversion algorithm maps the at least one OOI to one of a plurality of OOI medical terms (Paik: e.g., in order to monitor and improve performance of machine learning-based computer vision algorithms and NLP algorithms and systems, the computational system can be tied to a user interface system allowing for active learning. Active learning is a process whereby ground-truth user interaction data is fed back into the algorithms' training set, helping the AI algorithm to learn from data gathered in the real world reflecting the algorithms' performance “in the wild.” Accordingly, described here is a system for gathering that ground truth feedback from users, as well as an NLP-based system which detects incongruence or inconsistency between a user's explicit interaction with the outputs of the AI models (through the described UI components) and the words they dictate into the diagnostic report. This system ensures that the highest-fidelity ground truth data is fed back into the algorithms for training. A non-limiting flow chart of this process is illustrated in FIG. 8. The flow chart shows the medical image being analyzed by the AI algorithm to generate findings (e.g., AI-assisted findings) for insertion into a medical report. Next, the AI findings are displayed to the user, such as a radiologist, who can choose to amend or accept the finding. Both decisions provide ground truth data that can be used to further train the algorithm to improve future performance. The amended finding may then be accepted. Once the finding has been accepted, the billing system may then accept the charge corresponding to the finding, Col 35 (lines 12 – 38)).
With respect to claim 21, Innanje in view of Paik teaches the system of claim 1, wherein the medical image data is obtained from one or more endoscopy procedures, one or more MRI scans, one or more CT scans, one or more X-rays, one or more ultrasonographs, one or more nuclear medicine images, or one or more histology images (e.g., refer to ¶0041 for more details, which provides some examples of medical equipment for obtaining medical images, such as CT, X-ray, MRI scan, etc.; thereby well known in the art).
With respect to claims 22, 24, 25, 30 and 31, these are rejected for similar reasons as those described in connection with claims 1, 14, 15, 19 and 21, respectively.
With respect to claim 33, this is a method claim corresponding to the system claim 1. Therefore, this is rejected for the same reasons as the system claim 1.
With respect to claims 34, 35 and 37, these are method claims corresponding to the system claims 2, 3 and 5, respectively. Therefore, these are rejected for the same reasons as the system claims 2, 3 and 5, respectively.
With respect to claims 38 and 39, these are method claims corresponding to the system claim 6. Therefore, these are rejected for the same reasons as the system claim 6 (note that claims 38 and 39 are literally similar to claim 6).
With respect to claims 41 and 43, these are method claims corresponding to the system claims 9 and 11, respectively. Therefore, these are rejected for the same reasons as the system claims 9 and 11, respectively.
With respect to claim 44, this is a method claim corresponding to the system claims 12 and 13. Therefore, this is rejected for the same reasons as the system claims 12 and 13.
With respect to claims 46, 51 and 53, these are method claims corresponding to the system claims 14, 19 and 21, respectively. Therefore, these are rejected for the same reasons as the system claims 14, 19 and 21, respectively.
Claims 17, 27 and 49 are rejected under 35 U.S.C. 103 as being unpatentable over Innanje in view of Paik and further in view of Gernand et al. (U.S. PreGrant Publication No. 2021/0056691 A1, hereinafter ‘Gernand’).
With respect to claim 17, Innanje in view of Paik teaches the system of claim 12, but neither of them teaches wherein the at least one processor is further configured to create the at least one machine learning model by: receiving training images as input to the encoder; projecting the training images, using the encoder, into features that are part of a feature space; mapping the features, using a classifier, to a set of target classes; identifying morphological characteristics of the training images to generate a new training dataset, the new training dataset having data linking parameters to the training images; and determining whether there is one or more mapped classes or no mapped classes based on the morphological characteristics.
However, in the same field of endeavor of medical data, receiving training data, classifying and labeling, the aforementioned claim limitations are well-known in the art as evidenced by Gernand. In general, Gernand teaches wherein the at least one processor is further configured to create the at least one machine learning model by:
receiving training images as input to the encoder (e.g., receiving training images toward an encoder, abstract, ¶0004 with ¶0085, Fig. 3);
projecting the training images, using the encoder, into features that are part of a feature space (e.g., projecting the received training images, via the encoder, into features, ¶0048 - ¶0049, ¶0089);
mapping the features, using a classifier, to a set of target classes (e.g., mapping the features, using a classifier, to a set of target classes, ¶0050, ¶0077, ¶0089);
identifying morphological characteristics of the training images to generate a new training dataset, the new training dataset having data linking parameters to the training images (e.g., involving morphological characterization to generate a new dataset, the new dataset includes parameters to the received training images, ¶0005, ¶0042 - ¶0043, ¶0059, ¶0079, Fig. 10B); and
determining whether there is one or more mapped classes or no mapped classes based on the morphological characteristics (e.g., determining whether there are one or more mapped classes or no mapped classes based on the involved morphological characterization, abstract, ¶0042 - ¶0043, ¶0079, ¶0128, Fig. 13).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Innanje in view of Paik as taught by Gernand, since Gernand suggests within ¶0005, ¶0042 - ¶0043, ¶0059, ¶0079, Fig. 10B that such a modification of identifying morphological characteristics and determining whether there are mapped classes or no mapped classes based on the morphological characteristics would enable physicians/clinicians to explore the possible futures of a system based on a combination of hypotheses related to the system's components or variables, and/or to analyze image(s) by observing the physical features, forms, and structures of an organism (such as shape, size, color, and internal anatomy), or even the structure of words, used for identification, classification, and understanding evolutionary adaptations, in contrast with physiology (function) in biology and word parts in linguistics, in order to improve feature analysis tasks.
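Purely for illustration of the claim 17 sequence (project to features, map to classes, derive morphological characteristics, build a parameter-linked dataset), a toy sketch follows; the "encoder," "classifier," thresholds, and class names are hypothetical and not drawn from Gernand or the claims.

```python
# Illustrative sketch only: toy feature projection, class mapping, and simple
# morphological characteristics (area and bounding aspect ratio) used to build a
# new training dataset with parameters linked to each image, and to flag cases
# with no mapped class.
import numpy as np

CLASSES = ["healthy", "unhealthy", "suspicious", "unfocused"]

def encode(image: np.ndarray) -> np.ndarray:
    # Toy "encoder": mean intensity and intensity variance as a 2-D feature.
    return np.array([image.mean(), image.var()])

def classify(feature: np.ndarray) -> str:
    # Toy classifier mapping the feature to one of the target classes.
    return CLASSES[int(feature[0] * len(CLASSES)) % len(CLASSES)]

def morphology(image: np.ndarray, thresh: float = 0.5) -> dict:
    # Morphological characteristics of the bright region: area and aspect ratio.
    mask = image > thresh
    if not mask.any():
        return {"area": 0, "aspect_ratio": 0.0}
    rows, cols = np.nonzero(mask)
    h = rows.max() - rows.min() + 1
    w = cols.max() - cols.min() + 1
    return {"area": int(mask.sum()), "aspect_ratio": float(w / h)}

rng = np.random.default_rng(0)
images = [rng.random((32, 32)) for _ in range(3)]
new_dataset = [{"image": img,
                "class": classify(encode(img)),
                "params": morphology(img)} for img in images]
# "No mapped class" decision based on the morphological characteristics:
for entry in new_dataset:
    if entry["params"]["area"] == 0:
        entry["class"] = None
print([(e["class"], e["params"]["area"]) for e in new_dataset])
```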
With respect to claim 27, it is rejected for similar reasons as those described in connection with claim 17.
With respect to claim 49, this is a method claim corresponding to the system claim 17. Therefore, this is rejected for the same reasons as the system claim 17.
Claims 18, 28 and 50 are rejected under 35 U.S.C. 103 as being unpatentable over Innanje in view of Paik and Gernand and further in view of Arditi (U.S. PreGrant Publication No. 2019/0147331 A1, hereinafter ‘Arditi’).
With respect to claim 18, Innanje in view of Paik and further in view of Gernand teaches the system of claim 17, but neither of them teaches wherein the at least one processor is further configured to determine the classification for the at least one OOI by: receiving one or more of the features as input to the decoder; mapping the one of the features over an unlabeled data set using a deconvolutional neural network; and reconstructing a new training image from the one of the features using the decoder to train the at least one machine learning model.
However, in the same field of endeavor of neural networks, the technology of the aforementioned claim limitations is well-known in the art as evidenced by Arditi. In general, despite the cited reference's context of vehicles and roads, Arditi technologically teaches wherein the at least one processor is further configured to determine the classification for the at least one OOI by: receiving one or more of the features as input to the decoder (e.g., receiving encoded data toward a decoder, ¶0018, ¶0027 and ¶0030);
mapping the one of the features over an unlabeled data set using a deconvolutional neural network (e.g., generate a map data over an unlabeled training data using a deconvolutional neural network, ¶0020, ¶0028 - ¶0030, ¶0038); and
reconstructing a new training image from the one of the features using the decoder to train the at least one machine learning model (e.g., updating training data from the generated map data using the decoder to train a machine-learning model, ¶0021 - ¶0023, ¶0028 - ¶0030, ¶0035).
Hence the prior art includes each element claimed, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of an actual combination of the elements in a single prior art reference. Arditi and Gernand are combinable because they are from the same field of endeavor of applying neural network(s); both references teach generating feature map(s).
Therefore, one of ordinary skill in the art could have combined the elements as claimed by known methods, and that in combination, each element merely performs the same function as it does separately.
The results of the combination would have been predictable and resulted in modifying the invention of Innanje in view of Paik and further in view of Gernand to include a deconvolutional neural network.
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to use the deconvolutional neural network when generating a feature map. The suggestion/motivation for doing so is that deconvolution can inverse or reverse the process made by a first neural network, as disclosed by Arditi.
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Arditi into the system of Innanje in view of Paik and further in view of Gernand, since Arditi suggests within ¶0020 - ¶0023, ¶0028 - ¶0030, ¶0035 that such a design of mapping feature(s) using the deconvolutional neural network would reverse the process performed by a convolutional neural network in order to effectively up-sample low-resolution feature maps, thereby providing high-resolution outputs and/or improving training result(s) obtained using unlabeled training data.
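Purely for illustration of the transposed ("deconvolutional") decoding discussed for claim 18 (not Arditi's actual network), a minimal sketch is shown below; the layer sizes are hypothetical and chosen only so the shapes line up.

```python
# Illustrative sketch only: a decoder built from transposed convolutions that
# up-samples a low-resolution feature map back to image resolution.
import torch
import torch.nn as nn

# Feature map as it might come out of a convolutional encoder: 16 channels, 8x8.
feature_map = torch.randn(1, 16, 8, 8)

decoder = nn.Sequential(
    nn.ConvTranspose2d(16, 8, kernel_size=2, stride=2),  # 8x8 -> 16x16
    nn.ReLU(),
    nn.ConvTranspose2d(8, 1, kernel_size=2, stride=2),   # 16x16 -> 32x32
    nn.Sigmoid(),
)

reconstructed = decoder(feature_map)   # a new 32x32 "training image"
print(reconstructed.shape)             # torch.Size([1, 1, 32, 32])
```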
With respect to claim 28, it is rejected for similar reasons as those described in connection with claim 18.
With respect to claim 50, this is a method claim corresponding to the system claim 18. Therefore, this is rejected for the same reasons as the system claim 18.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Li (U.S. PG Publication No. 2020/0320135 A1) generally teaches a function that converts currently input voice information from a user into text information. Because the user has not finished speaking, the text information obtained through conversion is incomplete, and even a syntax error may exist. In this case, an error correction module may provide a segment of complete text, without a syntax error, that is corrected based on the current text content, to return the text and provide it for the user to select and use.
Cherepanov et al. (U.S. PG Publication No. 2021/0158796 A1) teach that a suitable speech-to-text technique can be used to provide voice recognition of the voice input and convert it into a recognized text output. The speech-to-text technique can include the use of an acoustic model that identifies phonemes or other linguistic units from the audio of the voice input and a language model that assigns probabilities to particular words or sequences of words. In some implementations, the speech-to-text technique can correct or compensate for errors in the voice input, e.g., based on spelling and/or grammar rules. The recognition output is provided to the user device, for example, for display in a user interface. The recognition output can be displayed, for example, to indicate the system's recognition of the voice input. The user can then examine the presented recognition output to determine whether the system correctly recognized the voice input. For example, the voice input “baroque pictures” may be recognized as [rock pictures]. Here the word “baroque” was misrecognized as “rock”.
Mahjouri et al. (U.S. PG Publication No. 2022/0328173 A1) teach a head-mounted, camera-based system configured to collect medical data (e.g., video, images, etc.) during a medical procedure and extract parameters from the collected medical data; when a clinician or doctor speaks, the spoken words or voice are converted into text, including finding a corresponding keyword using AI. Once converted, the text is annotated along with the medical data for a report, result or documentation.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JUAN M GUILLERMETY whose telephone number is (571) 270-3481. The examiner can normally be reached 9:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Benny Q TIEU, can be reached at 571-272-7490. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JUAN M GUILLERMETY/Primary Examiner, Art Unit 2682