Prosecution Insights
Last updated: April 19, 2026
Application No. 18/433,497

TEXT-TO-SPEECH DEVICE, METHOD OF CONTROLLING TEXT-TO-SPEECH DEVICE, AND COMPUTER-READABLE STORAGE MEDIUM

Final Rejection (§101, §103)
Filed: Feb 06, 2024
Examiner: LOWEN, NICHOLAS DANIEL
Art Unit: 2653
Tech Center: 2600 (Communications)
Assignee: JVCKENWOOD Corporation
OA Round: 2 (Final)
Grant Probability: 62% (Moderate)
OA Rounds: 3-4
To Grant: 2y 7m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 62% (5 granted / 8 resolved; +0.5% vs TC avg)
Interview Lift: strong, +75.0% in resolved cases with interview
Avg Prosecution: 2y 7m (typical timeline)
Currently Pending: 23
Total Applications: 31 (career history, across all art units)

Statute-Specific Performance

§101: 36.3% (-3.7% vs TC avg)
§103: 42.0% (+2.0% vs TC avg)
§102: 17.2% (-22.8% vs TC avg)
§112: 3.2% (-36.8% vs TC avg)
Tech Center averages are estimates • Based on career data from 8 resolved cases

Office Action

§101 §103
DETAILED ACTION

This communication is in response to the Application filed on 2/6/2024. Claims 1-5 are pending and have been examined. This action is made FINAL.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority

Acknowledgment is made of applicant's claim for foreign priority under 35 U.S.C. 119(a)-(d). A certified copy of priority Application No. JP2023-016802, filed on 2/7/2023, has been received.

Response to Arguments

Regarding the claim interpretation under 35 U.S.C. 112(f) for claim 1, the applicant asserts that amended claim 1 does not recite the terms "means" or "step," and does recite claim terms that a person of ordinary skill in the art would recognize as sufficiently definite structure to perform the recited function or operation. Examiner agrees that the "means" or "step" language from the original claim set has been rewritten; however, a new 112(f) issue has been introduced in the form of "an imaging unit that is configured to" in claim 1. Details on this claim interpretation can be found below.

Regarding the rejections under 35 U.S.C. 101 for claims 1-5, with respect to Step 2A, Prong 1, the applicant asserts that a person, in his or her mind, even with the aid of pen and paper, could not practically perform the respective functions of an imaging unit that is configured to capture a video of a region around a user to acquire video data including the video, and a positional information sensor that is configured to measure positional information indicating a position of the user.
Further, a person, in his or her mind, even with the aid of pen and paper, could not practically perform the operations of setting degrees of priority for the extracted pieces of text information based on a distance to a position of each piece of text information in the video data from a center of the video data; and converting the extracted pieces of text information into voice to cause a speaker to output the voice, in descending order of the set degrees of priority, as recited in amended claim 1.

Examiner respectfully disagrees. The act of capturing video represents pre-solution activity within the claim language, as it is merely a data-gathering step prior to the method beginning; such limitations cannot be relied upon to separate the method from a mental process. This also applies to measuring positional information of a user. A human is capable of identifying priorities for extracted text information. A video is just a continuous stream of images, and the human mind is capable of looking at an image and reading/prioritizing text that appears closest to the middle of the image. A human is capable of reading said text aloud. The speaker is merely a component for post-solution activity, as it performs a final presentment step for the data in the method.

Applicant further asserts that amended claim 1 involves technologically improved computer-related and communication-related devices. This is so because amended claim 1 comprises a specific text-to-speech device that comprises at least one processor that is configured to execute the computer executable instructions to perform operations, comprising: ... setting degrees of priority for the extracted pieces of text information based on a distance to a position of each piece of text information in the video data from a center of the video data; and converting the extracted pieces of text information into voice to cause a speaker to output the voice, in descending order of the set degrees of priority.
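The disputed prioritization operation of claim 1 can be illustrated with a short sketch. This is a hypothetical illustration only: the function name, the (text, x, y) detection format, and the frame size are invented for the example and do not reflect the applicant's actual implementation.

```python
import math

def prioritize_by_center(frame_w, frame_h, detections):
    """Order extracted text items by distance from the frame center.

    detections: list of (text, x, y) tuples, where (x, y) is the position
    of the text in the video frame. Items closest to the center receive
    the highest degree of priority and are returned first.
    """
    cx, cy = frame_w / 2.0, frame_h / 2.0
    return [
        text
        for text, x, y in sorted(
            detections, key=lambda d: math.hypot(d[1] - cx, d[2] - cy)
        )
    ]

# Text nearest the center of a 640x480 frame would be read aloud first.
order = prioritize_by_center(
    640, 480,
    [("EXIT TO STREET", 600, 40), ("Menu", 320, 250), ("Sale", 100, 400)],
)
print(order)  # ['Menu', 'Sale', 'EXIT TO STREET']
```

Reading each item aloud in the returned order corresponds to the claim's "descending order of the set degrees of priority."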
In contrast, existing methods, systems, and devices at least do not possess the above-identified features of amended claim 1. This is demonstrated, in part, by the fact that, when considering these elements of amended claim 1 and the other elements of amended claim 1 (as recited above) in an ordered combination, the references of record (e.g., including Wexler et al. (U.S. Patent No. 9,911,361)) do not disclose, teach, or suggest each and every element of the claimed subject matter of amended claim 1.

Examiner respectfully disagrees. Wexler et al. cannot be solely relied upon to show that this is an improvement to the technology. The analysis under 35 U.S.C. 101 is separate from the analysis under 35 U.S.C. 102 or 103. Furthermore, the amendments to the claim language necessitated a further prior art search, and new references have been used to teach the claim limitations.

Applicant further asserts that the technological improvements of amended claim 1 (e.g., the technological improvements to computer-related and communication-related technologies of amended claim 1) provide enhancements over existing computer-related and communication-related technologies, including providing enhanced techniques that can assist people, such as people suffering from impaired vision, in part by enabling pieces of text information to be read aloud in descending order of relevance to a category of a location where a user is positioned, by setting a higher degree of priority to a piece of text information having higher relevance to the category; thus, the text-to-speech device is capable of reading aloud pieces of text from an image in an appropriate sequence, wherein the degree of priority can be set based at least in part on the position of the extracted piece of text in the video data.

Examiner respectfully disagrees. An improvement alone is not enough to overcome the 35 U.S.C.
101 rejections if the improved aspects are deemed to be mental processes applied via generic computer components. As stated previously, the analysis under 35 U.S.C. 101 is separate from the analysis under 35 U.S.C. 102 or 103. Furthermore, the newly updated 35 U.S.C. 103 rejections below detail prior art examples of such improvements already existing.

With respect to Step 2A, Prong 2, the applicant asserts that the cited art, including Wexler et al., does not disclose, teach, or suggest such elements of amended claim 1; that amended claim 1, as a whole, when considering such elements along with the other elements of amended claim 1, integrates the alleged abstract idea into a practical application; that such additional elements of amended claim 1 amount to significantly more than the alleged abstract idea itself; and that the technological improvements of amended claim 1 (e.g., the technological improvements to computer-related and communication-related technologies of claim 1) provide enhancements over existing computer-related and communication-related technologies, including providing enhanced techniques that can assist people, such as people suffering from impaired vision, in part by enabling pieces of text information to be read aloud in descending order of relevance to a category of a location where a user is positioned, by setting a higher degree of priority to a piece of text information having higher relevance to the category, such that the text-to-speech device is capable of reading aloud pieces of text from an image in an appropriate sequence, wherein the degree of priority can be set based at least in part on the position of the extracted piece of text in the video data.

Examiner respectfully disagrees. As stated above, the current claim language does not sufficiently separate the method from a mental process, nor does it clearly demonstrate an improvement to the technology.
As currently written, the additional components are generic/general-purpose computer components and/or components merely used for extra-solution activity.

Regarding the rejections under 35 U.S.C. 102 for claims 1-5, the applicant asserts that Wexler et al. at least fails to disclose setting degrees of priority for the extracted pieces of text information based on a distance to a position of each piece of text information in the video data from a center of the video data, as recited in amended claim 1. Furthermore, Wexler et al. fails to disclose the features of amended claim 2. Lastly, Wexler et al. fails to disclose the aspects of amended claim 3. Examiner states that these arguments are considered moot in view of an updated prior art search necessitated by the amendments to the claims. The claims have received new grounds of rejection under 35 U.S.C. 103. Details on the updated rejections can be found below.

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.
The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:

(A) the claim limitation uses the term "means" or "step" or a term used as a substitute for "means" that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;

(B) the term "means" or "step" or the generic placeholder is modified by functional language, typically, but not always, linked by the transition word "for" (e.g., "means for") or another linking word or phrase, such as "configured to" or "so that"; and

(C) the term "means" or "step" or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word "means" (or "step") in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.

Absence of the word "means" (or "step") in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C.
112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material, or acts to entirely perform the recited function.

Claim limitations in this application that use the word "means" (or "step") are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word "means" (or "step") are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

This application includes one or more claim limitations that do not use the word "means," but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function, and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) are: "an imaging unit that is configured to" in claim 1.

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C.
112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-5 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Claims 1, 4, and 5 recite: A text-to-speech device, comprising: [an imaging unit] that is configured to capture a video of a region around a user to acquire video data including the video; a [positional information sensor] that is configured to measure positional information indicating a position of the user; [a memory] that is configured to store computer executable instructions; and at least one [processor] that is configured to execute the computer executable instructions to perform operations comprising: identifying a category of a location where the user is positioned, based on the positional information and map information related to an area including the position of the user indicated by the positional information; extracting pieces of text information representing pieces of text included in the video data, based on the video data; setting degrees of priority for the extracted pieces of text information based on a distance to a position of each piece of text information in the video data from a center of the video data; and converting the extracted pieces of text information into voice to cause a [speaker] to output the voice, in descending order of the set degrees of priority.

The limitations in these claims, as drafted, are a process that, under broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components.
The human mind is capable of looking at a photo/video and reading aloud text found in the video in some sort of priority order. Furthermore, the human mind is capable of guiding another person by reading aloud any text it finds around them in some sort of priority order. For example, suppose someone with a vision impairment shows a picture to their friend and asks them to read the signs posted in the picture. The friend would be able to use context information from the picture to tell where their friend was; for this example, say they were at a specific restaurant. The friend could look at the menu within the picture and read the menu aloud so their friend with the vision impairment knows what it said. Furthermore, when reading the menu, the friend could choose to read the items in the middle of the menu first if they wanted to. The text in the middle of the menu, or of any other image, could be read first, as oftentimes things of larger importance are in the center of menus, pictures, or videos.

If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the "Mental Processes" grouping of abstract ideas. Accordingly, the claims recite an abstract idea.

This judicial exception is not integrated into a practical application. In particular, claims 1, 4, and 5 recite an imaging unit, a positional information sensor, a memory, a processor, and a speaker. These components fall under the category of extra-solution activity. The imaging unit is considered pre-solution activity, as it merely acts as a data-gathering step meant to obtain the image data necessary to the operation of the system. The imaging unit is described on Page 20 of the specification as a general-purpose camera or device with optical elements.
The positional information sensor is considered pre-solution activity, as it merely acts as a data-gathering step meant to obtain the position data necessary to the operation of the system. The positional information sensor is described in Page 21, Paragraph 2 of the specification as a general-purpose GPS. The memory is considered a general-purpose computer component by which the method is applied. The memory is described in Page 7, Paragraph 2 of the specification with a generic description of the component. The processor is considered a general-purpose computer component by which the method is applied. The processor is described in Page 8, Paragraph 2 of the specification with a generic description of the component. The speaker is considered post-solution activity, as it is merely a presentment step for the audio generated by the system. The speaker is described in Page 22, Paragraph 2 of the specification with a generic description of the component.

Claim 5 specifically lists the additional component of a non-transitory computer-readable storage medium. The non-transitory computer-readable storage medium is considered a general-purpose computer component by which the method is applied. The storage medium is described on Page 32 of the specification, in the first paragraph, using general-purpose examples of the component.

Accordingly, the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claims are not patent eligible.

Claim 2 recites wherein the setting of the degrees of priority for the extracted pieces of text information comprises calculating a score indicating a degree of priority based on whether or not the extracted piece of text information is emitting light.
The limitation in this claim, as drafted, is a process that, under broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. The human mind is capable of prioritizing the order in which it reads things aloud based on the light level or contrast level. For example, when identifying words in a picture, a human might choose to ignore words with a low contrast level because they are less confident in what those words are. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the "Mental Processes" grouping of abstract ideas. Accordingly, the claim recites an abstract idea.

This judicial exception is not integrated into a practical application. The claim does not recite any additional elements that were not present in the independent claim. Accordingly, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.

Claim 3 recites wherein the setting of the degrees of priority for the extracted pieces of text information comprises, when a distance between a position at the time of extraction of a piece of text information extracted earlier and a position at the time of disappearance of the piece of text information is not in a predetermined range, deleting the piece of text information extracted earlier from a text-to-speech list. The limitation in this claim, as drafted, is a process that, under broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. The human mind is capable of choosing not to read text that is no longer in view.
For example, if guiding a visually impaired person through a city, one would choose not to bother reading signs they have already passed in favor of informing them of what is ahead. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the "Mental Processes" grouping of abstract ideas. Accordingly, the claim recites an abstract idea.

This judicial exception is not integrated into a practical application. The claim does not recite any additional elements that were not present in the independent claim. Accordingly, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1 and 3-5 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent No. US 9911361 B2 (Wexler et al.) in view of U.S. Patent No. US 11163378 B2 (Gralewicz).

Regarding Claims 1, 4, and 5, Wexler et al. teaches A text-to-speech device, comprising: (Also, the processor may be configured to cause the text to be read aloud from the level break associated with the trigger.) (Column 2, Lines 43-45). Claim 4 poses an alternative limitation of A method of controlling a text-to-speech device, comprising: (this disclosure relates to devices and methods for providing information to a user by processing images captured from the environment of the user.) (Column 1, Lines 20-23). Claim 5 poses an alternative limitation of A non-transitory computer-readable storage medium storing a program causing a computer to execute (More specifically, embodiments consistent with the present disclosure may provide an apparatus, a method, and a software product stored on a non-transitory computer readable medium for recognizing text on a curved surface.) (Column 124, Lines 34-38).

an imaging unit that is configured to capture a video of a region around a user to acquire video data including the video; (In addition, sensory unit 120 may include an image sensor (not shown in FIG. 1) for capturing real-time image data of the field-of-view of user 100. The term "image data" includes any form of data retrieved from optical signals in the near-infrared, infrared, visible, and ultraviolet spectrums.
The image data may be used to form video clips and/or photographs.) (Column 8, Lines 34-40). The device contains an image sensor to capture data around the user, which can include video clips.

a positional information sensor that is configured to measure positional information indicating a position of the user; (In some embodiments, the context information may comprise a geographical location based on GPS coordinates.) (Column 42, Lines 25-27). The system uses GPS coordinates to help provide context to objects detected in the image.

a memory that is configured to store computer executable instructions; (The instructions executed by processor 540 may, for example, be pre-loaded into a memory integrated with or embedded into processor 540 or may be stored in a separate memory (e.g., memory 520).) (Col. 12, Lines 20-23).

and at least one processor that is configured to execute the computer executable instructions to perform operations comprising: (The instructions executed by processor 540 may, for example, be pre-loaded into a memory integrated with or embedded into processor 540 or may be stored in a separate memory (e.g., memory 520).) (Col. 12, Lines 20-23).

identifying a category of a location where the user is positioned, based on the positional information and map information related to an area including the position of the user indicated by the positional information; (By way of example, as illustrated in FIG. 34, a captured image 3400 may include an image of an exit door 3410, textual information corresponding to an exit sign 3420 (e.g., "EXIT TO STREET") … In an embodiment, upon execution of the identified system command, apparatus 110 may access a positioning system (e.g., a GPS unit) to obtain a current position of apparatus 110, and access a mapping system to identify a street onto which exit door 3410 leads.) (Column 53, Lines 4-20). Mapping information can be used in tandem with a GPS location to provide more precise context to the user's location.
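The "identifying a category of a location" element combines a measured position with map information. A minimal sketch of that idea follows, using an invented point-of-interest table and a simple nearest-neighbor lookup; the coordinates, categories, radius threshold, and function name are all assumptions for illustration, not the claimed implementation:

```python
# Hypothetical map information: known venues with coordinates and categories.
MAP_POIS = [
    ((35.6581, 139.7017), "restaurant"),
    ((35.6590, 139.7005), "train station"),
]

def identify_location_category(lat, lon, pois, radius=0.0015):
    """Return the category of the nearest mapped venue within `radius`
    degrees of the measured position, or None if no venue is close enough."""
    best_category, best_dist = None, radius
    for (plat, plon), category in pois:
        dist = ((lat - plat) ** 2 + (lon - plon) ** 2) ** 0.5
        if dist < best_dist:
            best_category, best_dist = category, dist
    return best_category

# A GPS fix just outside the first venue resolves to the "restaurant" category.
print(identify_location_category(35.6582, 139.7016, MAP_POIS))  # restaurant
```

The resolved category could then feed the relevance-based prioritization the applicant describes (e.g., reading menu text first when the user is at a restaurant).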
extracting pieces of text information representing pieces of text included in the video data, based on the video data; (In certain embodiments, at least one of the actions includes a text-based action performed on textual information located within the captured image data. By way of example, such text-based actions include, but are not limited to, optical character recognition (OCR) processes performed on portions of the textual information associated with the detected trigger, processes that generate audible representations of machine-readable text identified and retrieved from the textual information by the OCR processes, and various processes that summarize, process, translate, or store portions of the textual information and/or the machine-readable text.) (Column 78, Lines 5-16). The system can identify text in images and use OCR techniques to extract the text.

and converting the extracted pieces of text information into voice to cause a speaker to output the voice, in descending order of the set degrees of priority (After processor 540 finishes the OCR process on text portion 716, process 800 proceeds to step 808, in which processor 540 initiates audible presentation of the recognized text of text portion 716. Processor 540 may execute software instructions stored in audible presentation module 620 to initiate the audible reading of the recognized text to user 100. For example, processor 540 may generate audible reading signals (e.g., analog and/or digital signals) based on the recognized text of text portion 716 using a text-to-voice algorithm stored in audible presentation module 620.) (Column 21, Lines 55-64). (In some aspects, "contextual information" may include any information having a direct or indirect relationship with textual or non-textual information disposed within a field-of-view of sensory unit 120 of apparatus 110.
By way of example, contextual information consistent with the disclosed embodiments may include, but is not limited to, a time or a location at which apparatus 110 captured a portion of textual and/or non-textual information, information identifying a type of document associated with captured image data (e.g., a newspaper, magazine, or web page), information indicative of one or more user preferences for an audible presentation of textual information, a location of a user, and any additional or alternate contextual information appropriate to the user, the textual information, and apparatus 110.) (Column 133, Lines 49-62). (Further, in some aspects, "contextual rules" consistent with the disclosed embodiments may associate elements of contextual information with elements of textual information to be excluded from an audible presentation of the textual information.) (Column 133, Lines 63-67). (Additionally or alternatively, contextual rules consistent with the disclosed embodiments may specify a presentation order for corresponding portions of textual information, and additionally or alternatively, may specify that one or more portions of the textual information are prioritized during audible presentation.) (Column 134, Lines 16-21). Text-to-voice algorithms are used to output the text found in the images. Contextual information is provided for pieces of text. From the contextual information, contextual rules are made which can assign priority to pieces of text before audio presentation.

Wexler et al.
does not explicitly teach: and setting degrees of priority for the extracted pieces of text information based on a distance to a position of each piece of text information in the video data from a center of the video data;

However, Gralewicz teaches and setting degrees of priority for the extracted pieces of text information based on a distance to a position of each piece of text information in the video data from a center of the video data; (In addition, the electronic device 100 may include an optical character recognition (OCR) module, and may perform the OCR function by using the OCR module. The OCR function is for reading characters by using light, which is a function of recognizing at least one text included in an image.) (Col. 4, Lines 59-67). (For example, the electronic device 100 may set the highest priority for text recognized at a location closest to a center point of the image from among the recognized texts.) (Col. 5, Lines 10-12). Gralewicz teaches a system that recognizes text within images and prioritizes the text based on its position relative to the center of the image.

It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention of the instant application, to modify the video text-to-speech device as taught by Wexler et al. to also include distance from the center of the image as a priority measurement for presentation, as taught by Gralewicz. This would have been an obvious improvement: optical character recognition can become cumbersome when large amounts of text are in an image, and this allows the system to prioritize what it deems most important (Gralewicz, Col. 1, Lines 25-29).

Regarding Claim 3, Wexler et al. in view of Gralewicz teaches the device of claim 1. Furthermore, Wexler et al.
teaches wherein the setting of the degrees of priority for the extracted pieces of text information comprises, when a distance between a position at the time of extraction of a piece of text information extracted earlier and a position at the time of disappearance of the piece of text information is not in a predetermined range, deleting the piece of text information extracted earlier, from a text-to-speech list:

(One way apparatus 110 can assist persons who have low vision is by identifying relevant objects in an environment. For example, in some embodiments, processor 540 may execute one or more computer algorithms and/or signal-processing techniques to find objects relevant to user 100 in image data captured by sensory unit 120. The term “object” refers to any physical object, person, text, or surroundings in an environment.) (Col. 14, Lines 52-59).

(If tracking module 6920 determines that the object should continue to be tracked (step 7190—NO), process 7100 may return to step 7140, where tracking module 6920 may continue to monitor real-time images to follow the object. In this way, if the object changes states (for the first time and/or a subsequent time), apparatus 110 may provide feedback to the user. Process 7100 may continue through iterations of tracking an object and providing feedback each time it leaves a field-of-view and/or changes states, as long as tracking module 6920 determines that the object should continue to be tracked. However, if tracking module 6920 determines that tracking of the object should stop (step 7190—YES), apparatus 110 may stop performing processes associated with the object and process 7100 may end. For example, apparatus 110 may be configured to stop tracking an object after a predetermined number of changes in the state of the object has occurred, after a predetermined period of time, after a predetermined period of time without any change to the state of an object, etc.) (Col. 112, Lines 21-39).

Wexler et al.
teaches stopping the tracking of an object (text), and thus lowering or removing its priority, if it has left the field of view for a specified period of time.

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over US Patent 9,911,361 B2 (Wexler et al.) in view of US Patent 11,163,378 B2 (Gralewicz) and further in view of US Patent 10,691,984 B2 (Loginov et al.).

Regarding Claim 2, Wexler et al. in view of Gralewicz teaches the device of claim 1. While Wexler et al. in view of Gralewicz does teach prioritizing the order of a text-to-speech operation based on a plurality of factors, they do not explicitly teach: wherein the setting of the degrees of priority for the extracted pieces of text information comprises calculating a score indicating a degree of priority based on whether or not the extracted piece of text information is emitting light.

However, Loginov et al. teaches wherein the setting of the degrees of priority for the extracted pieces of text information comprises calculating a score indicating a degree of priority based on whether or not the extracted piece of text information is emitting light: (Broadly speaking, the error level R is calculated as a proportion of signal to noise, where signal is represented as contrast of contour pixels and the noise is represented as contrast of the background pixels.) (Col. 16, Lines 28-31). (In some embodiments of the present technology, the image noise determinator 114 executes a histogram-based analysis for determining the contrast value C.sub.i for the given block defined in step 303. More specifically, in some embodiments of the present technology, the image noise determinator 114 generates a brightness histogram.
Using the brightness histogram, the image noise determinator 114 determines a minimum and a maximum value of the brightness value such that 0.1% of all pixels of a given block have brightness lesser than the minimum value and 0.1% of all pixels of the given block have brightness higher than the maximum.) (Col. 14, Lines 45-57). (At step 406 the OCR suitability classifier 116 determines if the digital image 340 is suitable for OCR processing, based on the OCR suitability parameter. In some embodiments of the present technology the OCR suitability classifier 116 compares the so-determined OCR suitability parameter to a pre-determined threshold.) (Col. 20, Lines 36-41). (At step 408, in response to the OCR suitability parameter being below the pre-determined threshold (the “NO” branch of step 406), the OCR suitability classifier 116 executes causing the user electronic device 102 to re-acquire the digital image 240.) (Col. 20, Lines 50-54).

Loginov et al. determines if images or sections of images are suitable for OCR by using a variety of metrics to determine noise. An error level is calculated to determine if OCR processing should be done, as can be seen in Fig. 3. A level of light or contrast is used in this calculation. If an image is deemed unsuitable for OCR, it is not used. A flowchart of this process can be seen in Fig. 4.

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention of the instant application to modify the video text-to-speech device as taught by Wexler et al. in view of Gralewicz to also prioritize text based on a light level or contrast level, as taught by Loginov et al. This would have been an obvious improvement because a lower contrast level can result in a higher quality OCR processing output (Loginov et al., Col. 3, Lines 53-67).

Conclusion

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICHOLAS DANIEL LOWEN whose telephone number is (571) 272-5828. The examiner can normally be reached Mon-Fri 8:00am - 4:00pm.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Paras D Shah, can be reached at (571) 270-1650. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format.
For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/NICHOLAS D LOWEN/
Examiner, Art Unit 2653

/Paras D Shah/
Supervisory Patent Examiner, Art Unit 2653

03/23/2026
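The two prioritization mechanisms at issue in the combination above, ranking recognized text by its distance from the frame center (Gralewicz) and deleting a text-to-speech entry whose position drifts outside a predetermined range between extraction and disappearance (claim 3), can be sketched as follows. This is an illustrative reconstruction only: the function names, tuple layouts, and pixel coordinates are hypothetical and do not appear in the claims or in either reference.

```python
import math

def prioritize_by_center(boxes, frame_w, frame_h):
    """Order OCR-extracted text spans by distance from the frame center,
    closest (highest priority) first.

    `boxes` is a list of (text, x, y) tuples, where (x, y) is the
    center of the recognized text region in pixel coordinates.
    """
    cx, cy = frame_w / 2, frame_h / 2
    ranked = sorted(boxes, key=lambda b: math.hypot(b[1] - cx, b[2] - cy))
    return [text for text, _x, _y in ranked]

def prune_tts_list(entries, max_travel):
    """Keep a text-to-speech entry only if the distance between its
    position at extraction and its position at disappearance is within
    `max_travel` pixels; otherwise delete it (the claim-3 style rule).

    `entries` is a list of (text, (x0, y0), (x1, y1)) tuples.
    """
    return [text for text, (x0, y0), (x1, y1) in entries
            if math.hypot(x1 - x0, y1 - y0) <= max_travel]
```

For a 640x480 frame, text centered at (320, 240) would rank ahead of text near a corner, and an entry that moved roughly 360 pixels between extraction and disappearance would be deleted under a `max_travel` of 100.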

Prosecution Timeline

Feb 06, 2024
Application Filed
Sep 23, 2025
Non-Final Rejection — §101, §103
Jan 08, 2026
Response Filed
Mar 23, 2026
Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12592224
INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND COMPUTER PROGRAM PRODUCT
2y 5m to grant Granted Mar 31, 2026
Patent 12511494
SYSTEMS AND METHODS FOR FINETUNING WITH LEARNED HIDDEN REPRESENTATIONS OF PARAMETER CHANGES
2y 5m to grant Granted Dec 30, 2025
Study what changed to get past this examiner. Based on 2 most recent grants.

Prosecution Projections

3-4
Expected OA Rounds
62%
Grant Probability
99%
With Interview (+75.0%)
2y 7m
Median Time to Grant
Moderate
PTA Risk
Based on 8 resolved cases by this examiner. Grant probability derived from career allow rate.
