Prosecution Insights
Last updated: April 19, 2026
Application No. 18/767,937

ENDOSCOPE SYSTEM, MEDICAL INFORMATION PROCESSING METHOD, AND MEDICAL INFORMATION PROCESSING PROGRAM

Status: Non-Final OA (§102, §103)
Filed: Jul 09, 2024
Examiner: VILLENA, MARK
Art Unit: 2658
Tech Center: 2600 — Communications
Assignee: Fujifilm Corporation
OA Round: 1 (Non-Final)

Grant Probability: 70% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 3y 10m
With Interview: 85%

Examiner Intelligence

Grants 70% — above average

Career Allow Rate: 70% (334 granted / 478 resolved; +7.9% vs TC avg; arithmetic sketched below)
Interview Lift: +15.5% higher allow rate on resolved cases with an interview (a strong lift)
Avg Prosecution: 3y 10m (typical timeline; 22 applications currently pending)
Total Applications: 500 across all art units (career history)
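
The figures above are simple ratios over the examiner's resolved cases. Here is a minimal sketch of the arithmetic in Python (the counts 334/478, the +7.9% delta, and the +15.5% lift come from this page; everything else, including the back-computed Tech Center average, is inferred rather than stated):

```python
# Career allow rate: granted cases over all resolved cases.
# Counts are from the dashboard above; the rest is back-computed.
granted, resolved = 334, 478
allow_rate = granted / resolved                 # 0.6987... -> shown as "70%"
print(f"Career allow rate: {allow_rate:.1%}")   # 69.9%

# The "+7.9% vs TC avg" delta implies a Tech Center average near 62%:
implied_tc_avg = allow_rate - 0.079
print(f"Implied TC average: {implied_tc_avg:.1%}")  # ~62.0%

# Interview lift is the allow-rate gap between resolved cases that had an
# examiner interview and those that did not. The subgroup counts behind
# the +15.5% figure are not given on this page, so only the headline
# lift can be echoed here.
interview_lift = 0.155
print(f"Interview lift: {interview_lift:+.1%}")     # +15.5%
```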

Statute-Specific Performance

§101: 13.7% (-26.3% vs TC avg)
§103: 51.5% (+11.5% vs TC avg)
§102: 20.4% (-19.6% vs TC avg)
§112: 5.0% (-35.0% vs TC avg)
Comparisons are against a Tech Center average estimate • Based on career data from 478 resolved cases (see the sketch below)
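
Each "vs TC avg" delta is simply the examiner's per-statute rate minus the Tech Center average estimate. Notably, all four displayed deltas are consistent with a uniform 40% estimate (e.g., 13.7% + 26.3% = 40%). The sketch below reproduces the table under that assumption; the 40% figure is inferred from the deltas, not stated on this page.

```python
# Per-statute rates for this examiner, taken from the table above.
examiner_rates = {"§101": 0.137, "§103": 0.515, "§102": 0.204, "§112": 0.050}

# Tech Center average estimate. A uniform 40% reproduces every displayed
# delta exactly; this value is inferred, since the page only labels the
# comparison baseline an "estimate".
TC_AVG_ESTIMATE = 0.40

for statute, rate in examiner_rates.items():
    delta = rate - TC_AVG_ESTIMATE
    print(f"{statute}: {rate:.1%} ({delta:+.1%} vs TC avg)")
```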

Office Action

Rejections: §102, §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority

Receipt is acknowledged of papers submitted under 35 U.S.C. 119(a)-(d), which papers have been placed of record in the file.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 09/06/2024 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Drawings

The drawings were submitted on 07/09/2024. These drawings have been reviewed and are accepted by the examiner.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claim(s) 1-11, 14-15, and 17-18 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Horiuchi et al. (US 20190354176 A1).

Regarding claims 1, 17, and 18, Horiuchi teaches: “An endoscope system” (par. 0039; ‘endoscopic system’) comprising: “a speech recognition device configured to receive input of speech and perform speech recognition” (par. 0037; ‘The user's voice data that is input from an outer side is generated by a voice input unit such as a microphone (not illustrated). For example, in a case where a keyword input from an outer side represents “cancer”, “bleeding”, and the like, and the corresponding importance index is “10” and “8” to each, the setting unit 11 sets a period (section or time) in which the keyword occurs to the important period by using known voice pattern matching or the like.’); “an endoscope configured to acquire a medical image of a subject” (par. 0039; ‘In addition, in the use aspect of the endoscopic system or the optical microscope, when performing recording as a moving image, gaze detection data and an image that is recorded or presented simultaneously with detection of the gaze are used to generate mapping data of the field of view.’); and “processor” (abstract; ‘An information processing apparatus includes a processor comprising hardware, the processor being configured to execute…’), wherein the processor is configured to: “cause the endoscope to capture time-series medical images of the subject” (par. 0050; ‘image data’; par. 0062; ‘As illustrated in FIG. 5, the display controller 15 causes the display unit 20 to display a gaze mapping image P1 in which the gaze mapping data generated by the generation unit 13 is superimposed on an image corresponding to image data.’); “detect delimiters of results of the speech recognition during capturing of the time-series medical images” (par. 0051; utterance period; see also par. 0053, 0055); and “group and record, in a recording apparatus, the results of the speech recognition, the results being obtained during a period from detection of one of the delimiters until detection of another one of the delimiters corresponding to the one of the delimiters at a time later than a time at which the one of the delimiters is detected” (par. 0037; ‘Note that, the setting unit 11 may set the important period to include time before and after the period in which the keyword occurs, for example, approximately one second or two seconds.’; par. 0064; ‘According to the above-described first embodiment, with respect to the gaze data that is correlated with the same time axis as in the voice data, the analysis unit 12 allocates the corresponding gaze period corresponding to an index allocated to the keyword of the important words to a period corresponding to the important period of the voice data which is set by the setting unit 11 to synchronize the voice data and the gaze data, and records the voice data and the gaze data in the recording unit 14.’; par. 0065; ‘In addition, in the first embodiment, the generation unit 13 generates the gaze mapping data in which the corresponding gaze period analyzed by the analysis unit 12 and coordinate information of the corresponding gaze period are correlated with an image corresponding to image data that is input from an outer side, and thus a user can intuitively understand an important position on the image.’).

Regarding claim 2 (dep. on claim 1), Horiuchi further teaches: “wherein the processor is configured to cause a display device to display item information indicating an item to be subjected to speech recognition and a result of the speech recognition corresponding to the item information if the speech recognition is started” (par. 0063; ‘Note that, in FIG. 6, the display controller 15 may cause the display unit 20 to display textual information (for example, the message Q1 and the message Q2) obtained by converting voice data that is uttered by a user in a period (time) of each corresponding gaze period by using a known character conversion technology in the vicinity of records M11 to M15, or in a state of being superimposed on the records.’).

Regarding claim 3 (dep. on claim 2), Horiuchi further teaches: “wherein the processor is configured to record, in the recording apparatus, the results of the speech recognition corresponding to one set of pieces of the item information as one group” (par. 0063; ‘Note that, in FIG. 6, the display controller 15 may cause the display unit 20 to display textual information (for example, the message Q1 and the message Q2) obtained by converting voice data that is uttered by a user in a period (time) of each corresponding gaze period by using a known character conversion technology in the vicinity of records M11 to M15, or in a state of being superimposed on the records.’).

Regarding claim 4 (dep. on claim 2), Horiuchi further teaches: “continue to display the item information and the result of the speech recognition from detection of the one of the delimiters until detection of the other one of the delimiters” (par. 0107; ‘In addition, in a case where the user U1 operates the operating unit 37 and selects any one of the records M11 to M15, for example, the record M14 is selected, the display controller 323 highlights the record M14 on the display unit 20, and highlights textual information corresponding to time of the record M14, for example, the icon B4 on the display unit 20 (for example, a frame is highlighted or is displayed with a bold line). According to this, the user U1 can intuitively understand important voice content and a gazing area, and can intuitively understand content at the time of utterance.’); and “change a display manner of the item information and the result of the speech recognition on the display device if the other one of the delimiters is detected” (par. 0107; ‘In addition, the display controller 323 causes the display unit 20 to display icons B1 to B5 in which textual information and time at which the textual information is uttered are correlated.’).

Regarding claim 5 (dep. on claim 2), Horiuchi further teaches: “wherein the processor is configured to cause the display device to display the item information and the result of the speech recognition in real time” (par. 0063; ‘Note that, in FIG. 6, the display controller 15 may cause the display unit 20 to display textual information (for example, the message Q1 and the message Q2) obtained by converting voice data that is uttered by a user in a period (time) of each corresponding gaze period by using a known character conversion technology in the vicinity of records M11 to M15, or in a state of being superimposed on the records.’).

Regarding claim 6 (dep. on claim 2), Horiuchi further teaches: “wherein the item information includes at least one of diagnosis, findings, treatment, or hemostasis” (par. 0037; ‘For example, in a case where a keyword input from an outer side represents “cancer”, “bleeding”, and the like, and the corresponding importance index is “10” and “8” to each, the setting unit 11 sets a period (section or time) in which the keyword occurs to the important period by using known voice pattern matching or the like.’).

Regarding claim 7 (dep. on claim 1), Horiuchi further teaches: “detect the one of the delimiters as a start delimiter of grouping and detect the other one of the delimiters as an end delimiter of the grouping” (par. 0064; ‘According to the above-described first embodiment, with respect to the gaze data that is correlated with the same time axis as in the voice data, the analysis unit 12 allocates the corresponding gaze period corresponding to an index allocated to the keyword of the important words to a period corresponding to the important period of the voice data which is set by the setting unit 11 to synchronize the voice data and the gaze data, and records the voice data and the gaze data in the recording unit 14.’).

Regarding claim 8 (dep. on claim 7), Horiuchi further teaches: “group the results of the speech recognition during a period from detection of the end delimiter until re-detection of the end delimiter at a time later than a time at which the end delimiter is detected” (par. 0064; ‘According to the above-described first embodiment, with respect to the gaze data that is correlated with the same time axis as in the voice data, the analysis unit 12 allocates the corresponding gaze period corresponding to an index allocated to the keyword of the important words to a period corresponding to the important period of the voice data which is set by the setting unit 11 to synchronize the voice data and the gaze data, and records the voice data and the gaze data in the recording unit 14.’).

Regarding claim 9 (dep. on claim 7), Horiuchi further teaches: “detect, as the end delimiter, at least one of an end of detection of a specific subject in the medical image, speech input of a first specific word/phrase to the speech recognition device, continuation of a non-input state of speech input to the speech recognition device for a determined time or more, completion of speech input to all items to be subjected to speech recognition, completion of speech input to a specific item among the items to be subjected to speech recognition, acquisition of information indicating that an insertion length and/or an insertion shape of the endoscope has changed by a determined value or more, or a start or stop of an operation by a user of the endoscope system via an operating device” (par. 0064; ‘According to the above-described first embodiment, with respect to the gaze data that is correlated with the same time axis as in the voice data, the analysis unit 12 allocates the corresponding gaze period corresponding to an index allocated to the keyword of the important words to a period corresponding to the important period of the voice data which is set by the setting unit 11 to synchronize the voice data and the gaze data, and records the voice data and the gaze data in the recording unit 14.’).

Regarding claim 10 (dep. on claim 7), Horiuchi further teaches: “detect, as the start delimiter, at least one of a start of detection of a specific subject in the medical image, speech input of a second specific word/phrase to the speech recognition device, input by a user of the endoscope system via an operating device, a start of a discrimination mode for the specific subject, a start of output of a discrimination result for the specific subject, or a start of a measurement mode for the specific subject” (par. 0053; ‘As illustrated in FIG. 3, the setting unit 11 uses voice pattern matching that is known with respect to the voice data, and in a case where a keyword of important words input from an outer side is “cancer”, a period before and after an utterance period (utterance time) of the voice data in which the “cancer” occurs is set as an important period D1 in which the degree of importance is highest.’).

Regarding claim 11 (dep. on claim 9), Horiuchi further teaches: “determine at least one of a lesion, a candidate lesion region, a landmark, or a post-treatment region as the specific subject” (par. 0176; ‘That is, the analysis unit 40 performs processing so that the greater important operation content such as enlargement observation and treatment countermeasure with respect to a lesion is, the higher the rank of the corresponding gaze period is.’).

Regarding claim 14 (dep. on claim 1), Horiuchi further teaches: “cause an image selected from the medical images captured by the endoscope during a period from detection of the one of the delimiters until detection of the other one of the delimiters to be grouped and recorded together with the results of the speech recognition” (par. 0099; ‘In this case, the display controller 323 causes the display unit 20 to display an image corresponding to image data that is selected in accordance with an operation of the operating unit 37.’; par. 0106; ‘Records M11 to M15 corresponding to gaze areas of a gaze based on the rank of the corresponding gaze period, and a trajectory K1 of the gaze are superimposed on the gaze mapping image P3, and textual information of the voice data that is uttered at timing of the corresponding gaze period is correlated with the gaze mapping image P3.’).

Regarding claim 15 (dep. on claim 1), Horiuchi further teaches: “cause an image selected from frame images constituting the time-series medical images and/or an image selected from captured images captured separately from the time-series medical images to be grouped and recorded together with the results of the speech recognition” (par. 0099; ‘In this case, the display controller 323 causes the display unit 20 to display an image corresponding to image data that is selected in accordance with an operation of the operating unit 37.’; par. 0106; ‘Records M11 to M15 corresponding to gaze areas of a gaze based on the rank of the corresponding gaze period, and a trajectory K1 of the gaze are superimposed on the gaze mapping image P3, and textual information of the voice data that is uttered at timing of the corresponding gaze period is correlated with the gaze mapping image P3.’).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 12, 13, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Horiuchi in view of Horiuchi et al. (US 20210297635 A1), hereinafter referred to as Horiuchi 2.

Regarding claim 12 (dep. on claim 9), Horiuchi does not expressly teach machine learning, as in: “recognize the specific subject by using an image recognizer generated by machine learning.” Horiuchi 2 teaches: “recognize the specific subject by using an image recognizer generated by machine learning” (par. 0053; ‘Specifically, the similar-region extracting unit 15b calculates feature data based on tissue characteristic such as a tint and a shape of the region of interest and extracts, from the entire observation image, as the similar region, a region where a difference from feature data of the region of interest is equal to or smaller than a predetermined threshold. The similar-region extracting unit 15b may extract, with machine learning using a convolutional neural network (CNN), a region similar to the region of interest from the observation image as the similar region.’). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Horiuchi’s information processing apparatus by incorporating Horiuchi 2’s similar-region extracting unit in order to recognize a specific subject by using an image recognizer generated by machine learning. The combination allows a user to input a region of interest in a hands-free state (Horiuchi 2: par. 0003).

Regarding claim 13 (dep. on claim 8), the combination of Horiuchi in view of Horiuchi 2 further teaches: “cause an output device to output a message for encouraging speech input for the medical image if the start delimiter is detected” (par. 0119; ‘Note that, besides the input of the speech, a speaker or the like that can output speech may be provided and a speech output function may be provided in the speech input unit 31.’).

Regarding claim 16 (dep. on claim 1), the Examiner takes official notice of the feature of two different displays, as in “wherein the processor is configured to cause a display device to display the time-series medical images and a different display device to display the results of the speech recognition.” One of ordinary skill in the art would find it obvious to utilize more than one visual output/display as a design choice.

Conclusion

Other pertinent prior art is cited on the PTO-892 for the applicant's consideration.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARK VILLENA, whose telephone number is (571) 270-3191. The examiner can normally be reached 10 am - 6 pm EST, Monday through Friday.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MARK VILLENA
Examiner, Art Unit 2658

/MARK VILLENA/
Examiner, Art Unit 2658

Prosecution Timeline

Jul 09, 2024: Application Filed
Mar 07, 2026: Non-Final Rejection, §102 and §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12591407: ROBUST VOICE ACTIVITY DETECTOR SYSTEM FOR USE WITH AN EARPHONE
Granted Mar 31, 2026 (2y 5m to grant)

Patent 12592232: SYSTEMS, METHODS, AND APPARATUSES FOR DETECTING AI MASKING USING PERSISTENT RESPONSE TESTING IN AN ELECTRONIC ENVIRONMENT
Granted Mar 31, 2026 (2y 5m to grant)

Patent 12586581: ELECTRONIC DEVICE CONTROL METHOD AND APPARATUS
Granted Mar 24, 2026 (2y 5m to grant)

Patent 12578922: Natural Language Processing Platform For Automated Event Analysis, Translation, and Transcription Verification
Granted Mar 17, 2026 (2y 5m to grant)

Patent 12573394: ESTIMATION METHOD, RECORDING MEDIUM, AND ESTIMATION DEVICE
Granted Mar 10, 2026 (2y 5m to grant)
Based on the examiner's 5 most recent grants; study what changed in each case to get past this examiner.


Prosecution Projections

1-2
Expected OA Rounds
70%
Grant Probability
85%
With Interview (+15.5%)
3y 10m
Median Time to Grant
Low
PTA Risk
Based on 478 resolved cases by this examiner. Grant probability derived from career allow rate.
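
The with-interview projection appears to combine the career allow rate with the interview lift. A minimal sketch, assuming a simple additive combination capped at 100% (the combination rule and cap are assumptions; the 70% base, +15.5% lift, and 85% result are from this page):

```python
# Combine the base grant probability with the interview lift.
# Additive combination and the 100% cap are assumptions, not a documented
# methodology; the inputs and the 85% output match the page above.
base_grant_probability = 334 / 478        # ~69.9%, displayed as 70%
interview_lift = 0.155                    # "+15.5% Interview Lift"

with_interview = min(base_grant_probability + interview_lift, 1.0)
print(f"Grant probability: {base_grant_probability:.0%}")   # 70%
print(f"With interview:    {with_interview:.0%}")           # 85%
```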
