Prosecution Insights
Last updated: April 19, 2026
Application No. 18/748,175

METHODS AND SYSTEMS FOR IMPLEMENTING MULTI-CHANNEL SERVICE PLATFORMS OVER AUDIO-BASED COMMUNICATION CHANNELS

Status: Non-Final OA (§103)
Filed: Jun 20, 2024
Examiner: SAUNDERS JR, JOSEPH
Art Unit: 2692
Tech Center: 2600 — Communications
Assignee: Polyview Health Inc.
OA Round: 1 (Non-Final)

Grant Probability: 73% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 9m
With Interview: 93%

Examiner Intelligence

Career Allow Rate: 73% (538 granted / 740 resolved), +10.7% vs TC avg (above average)
Interview Lift: +20.6% higher allow rate among resolved cases with an interview (strong)
Typical Timeline: 2y 9m average prosecution; 27 applications currently pending
Career History: 767 total applications across all art units
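The derived figures above follow directly from the raw counts in this report. Below is a minimal sketch (variable names are mine, not the platform's) showing how the 73% career allow rate, the estimated Tech Center baseline, and the 93% with-interview projection relate:

```python
# Illustrative only: reproduces the derived examiner metrics shown above
# from the raw numbers in this report. Names are mine, not the platform's.

GRANTED = 538           # per "538 granted / 740 resolved"
RESOLVED = 740
INTERVIEW_LIFT = 0.206  # stated lift in allow rate for cases with an interview
VS_TC_AVG = 0.107       # stated margin over the Tech Center average

career_allow_rate = GRANTED / RESOLVED                         # 0.727 -> displayed as 73%
tc_average_estimate = career_allow_rate - VS_TC_AVG            # roughly 0.62
with_interview_estimate = career_allow_rate + INTERVIEW_LIFT   # roughly 0.93 -> the 93% figure

print(f"Career allow rate:        {career_allow_rate:.1%}")
print(f"TC average (estimate):    {tc_average_estimate:.1%}")
print(f"Projected with interview: {with_interview_estimate:.1%}")
```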

Statute-Specific Performance

§101: 5.1% (-34.9% vs TC avg)
§103: 40.0% (+0.0% vs TC avg)
§102: 29.6% (-10.4% vs TC avg)
§112: 14.6% (-25.4% vs TC avg)
Tech Center averages are estimates • Based on career data from 740 resolved cases
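Each row above pairs the examiner's own rate with its delta against the Tech Center average. As a quick sanity check, here is a small sketch (names are mine, not the platform's) that backs the implied TC baseline out of those two numbers:

```python
# Illustrative sanity check: implied TC baseline = examiner rate minus stated delta.
examiner_rate = {"§101": 0.051, "§103": 0.400, "§102": 0.296, "§112": 0.146}
delta_vs_tc   = {"§101": -0.349, "§103": 0.000, "§102": -0.104, "§112": -0.254}

for statute, rate in examiner_rate.items():
    implied_tc_baseline = rate - delta_vs_tc[statute]  # works out to 40.0% for each row here
    print(f"{statute}: examiner {rate:.1%} vs implied TC baseline {implied_tc_baseline:.1%}")
```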

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This Office action is based on the communications filed June 20, 2024. Claims 1 – 20 are currently pending and considered below.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on February 24, 2025 and the IDS submitted on October 22, 2024 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Objections

Claims 2, 3, 9, 10, 16, and 17 are objected to because of the following informalities: the aforementioned claims recite “where in” instead of “wherein”. Appropriate correction is required.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 5 – 8, 12 – 15, 19, and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wohlert et al. (US 2015/0120293 A1), hereinafter Wohlert, in view of Shevchenko et al. (US 10,922,483 B1), hereinafter Shevchenko.

Claim 1: Wohlert discloses a computer-implemented method comprising: intercepting a first audio segment over an audio channel of a communication session between a first user device and a second user device, wherein the first audio segment is transmitted by the second user device (see at least, “One embodiment of the subject disclosure includes a system having a memory to store executable instructions and a processor coupled with the memory. The processor, responsive to executing the executable instructions, can perform operations including receiving user speech captured at a second end user device during a communication session between the second end user device and a first end user device,” Wohlert [0014], “FIG. 1 depicts an illustrative embodiment of a system 100 that can utilize a multimedia accessibility platform 110 (hereinafter server 110) to facilitate a communication session between a first end user 101 utilizing an end user device 120 and a second end user 102 utilizing another end user device 120. The end user devices 120 can be various types of devices including smart phones, mobile devices, laptop computers, desktop computers, landline telephones, cordless telephones, set top boxes and/or any other communication device capable of engaging in a communication session to exchange or otherwise communicate voice, video and/or data.
Platform 110 is described as a server, but it should be understood that the platform 110 can be implemented using any number of computing devices (e.g., a single server in a centralized system or multiple server in a distributed environment), any type of computing devices (e.g., a service provider server or a customer computing device), and/or any configuration of the computing device(s) (e.g., a server farm where one or more servers are in a master/slave arrangement with one or more other servers or a combination of service provider devices and customer equipment performing the multimedia accessibility platform functions),” Wohlert [0019]); identifying user profile data associated with the first user device (see at least, “The processor can access a second profile for a second user of the second end user device and can access a first profile for a first user of the first end user device. The processor can detect at least one of an undesirable speech trait associated with the second user according to the second profile or an impairment associated with the first user according to the first profile. The processor can apply speech recognition to the user speech responsive to the detecting of at least one of the undesirable speech trait or the impairment. The processor can identify an unclear word in the user speech based on the speech recognition,” Wohlert [0014], “Server 110 can determine accessibility requirements or desires of one or both the users 101, 102,” Wohlert [0020]); generating, based on the user profile data, a second audio segment, wherein the second audio segment is contextually related to the first audio segment (see at least, “The processor can adjust the user speech to generate adjusted user speech by replacing at least a portion of the unclear word with replacement audio content,” Wohlert [0014], “Server 110 can execute various processing functions ( e.g., text, audio, video and so forth) to implement accessibility adaptation. The accessibility adaptation can include adjustment of the multimedia content, adjusting the presentation of the multimedia content or otherwise making adjustments associated with the presentation of the multimedia content to facilitate the accessibility by the user to the content,” Wohlert [0020]); and transmitting, to the first user device, the second audio segment over the audio channel of the communication session (see at least, “The processor can provide the adjusted user speech to the first end user device during the communication session,” Wohlert [0014], “In one or more embodiments, adjustments to the content (e.g., user speech, music, graphics and so forth) can be made and provided to the end user device in a timely manner as part of the communication session so that any conversation or communication exchange is not disrupted,” Wohlert [0020]), wherein when received by the first user device, the second audio segment is presented over a portion of the first audio segment (see at least, “One embodiment of the subject disclosure includes a computer-readable storage device comprising computer instructions which, responsive to being executed by a processor of a first end user device, causes the processor to perform operations including receiving adjusted user speech from a system that includes an application server during a communication session between the first end user device and a second end user device. 
The adjusted user speech can be generated from a modification of user speech captured at the second end user device during the communication session, where the modification is responsive to a detection of at least one of an impairment of a first user of the first end user device or an undesirable speech trait of a second user of the second end user device, and where the modification includes identifying an unclear word utilizing speech recognition and replacing a portion of the unclear word with replacement audio content without replacing a remainder of the unclear word. The processor can present the adjusted user speech at the first end user device,” Wohlert [0015], “As one example, the server 110 can monitor for and identify unclear words during a communication session and replace all or a portion of the unclear word with audio content (e.g., synthesized and/or recorded speech) as part of the communication session. The replacement of the unclear word or a portion thereof can be performed in a timely fashion so that the communication session is uninterrupted,” Wohlert [0021]). Wohlert does not disclose using a machine-learning model. However, Shevchenko discloses in regards to a similar communication assistance involving adjusting user speech in a communication session (see at least, “In embodiments, a method of electronic communication assistance may include: intercepting a electronic communication at an artificial intelligence assistant computing facility, wherein the electronic communication was transmitted from a first electronic identifier associated with a first user to a second electronic identifier associated with a second user, the electronic communication comprising a communication content and comprising or associated with the first electronic identifier associated with the first user and the second electronic identifier associated with the second user; encoding the electronic communication for processing creating an encoded electronic communication; retrieving from a communication profile database a first communication profile for the first user using the first electronic identifier, wherein the first communication profile comprises a first user communication attribute; retrieving from the communication profile database a second communication profile for the second user using the second electronic identifier, wherein the second communication profile comprises a second user communication attribute that identifies a receiving communication preference; processing the encoded electronic communication with a processor to generate a modified electronic communication that is a modified version of the electronic communication, wherein the processor uses at least one of the communication content, the first user communication attribute, or the second user communication attribute to process the encoded electronic communication; and transmitting the modified electronic communication to the second electronic identifier,” Shevchenko Column 22 Lines 26 – 54) and further discloses using a machine-learning model (see at least, “The processor may generate the modified electronic communication derived at least in part from representations of previous electronic communications from a plurality of user profiles stored in the communication profile database that are similar to at least one of the first communication profile or the second communication profile. 
The processor may be trained on large-scale data mixed with prior communication and effective communications from the plurality of user profiles. The processor may use at least one of a machine learning model, deep learning model, or statistical learning model for generating the modified electronic communication,” Shevchenko Column 23 Lines 6 – 17). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the machine learning model of Shevchenko in the invention of Wohlert thereby allowing for the advantage of “providing an artificial intelligent assistant to increase the effectiveness of communications,” Shevchenko Column 1 Lines 17 – 18, in the invention of Wohlert. Claim 5: Wohlert and Shevchenko disclose the computer-implemented method of claim 1, wherein the second audio segment includes a definition of a word included in the first audio segment (see at least, “Transformations may include rewriting messages or documents using vocabulary and readability (e.g., splitting long sentences; replacing long or rare words with shorter, more common synonyms; replacing idioms with literal/universally understandable equivalent phrases, and the like) to a level that the user would understand, depending on their language proficiency level and background. Methods may include using manually and/or automatically curated dictionaries and reference sources as part of rule or statistical approaches,” Shevchenko Column 68 Lines 51 – 60). Claim 6: Wohlert and Shevchenko disclose the computer-implemented method of claim 1, wherein the second audio segment includes an explanation of a word or phrase included in the first audio segment (see at least, “Transformations may include adding missing context, such as explaining terms, abbreviations, slang, idioms, and the like, that can be unfamiliar to the user,” Shevchenko Column 69 Lines 22 – 24). Claim 7: Wohlert and Shevchenko disclose the computer-implemented method of claim 1, wherein a filter of the audio channel prevents the second audio segment from being received by the second user device (see at least, “In embodiments, the processor may generate the modified electronic communication by removing or replacing language from the electronic communication based at least in part on the second user communication attribute. The removed or replaced language may be offensive or abusive language,” Shevchenko Column 22 Lines 55 – 60). Claims 8 and 12 – 14 are directed to a system comprising: one or more processors and a non-transitory computer-readable medium storing instructions that when executed by the one or more processors cause the one or more processors to perform operations substantially similar in scope to claims 1 and 5 – 7, respectively, and therefore are rejected for the same reasons (see also at least, “FIG. 15 depicts an exemplary diagrammatic representation of a machine in the form of a computer system 1500 within which a set of instructions, when executed, may cause the machine to perform any one or more of the methods describe above. One or more instances of the machine can operate, for example, as the server 110, 1230, 1317 and other devices of FIGS. 1-9 and 11-14 in order to perform accessibility adjustments. In some embodiments, the machine may be connected ( e.g., using a network 1526) to other machines. 
In a networked deployment, the machine may operate in the capacity of a server or a client user machine in server-client user network environment, or as a peer machine in a peer-to-peer (or distributed) network environment,” Wohlert [0109], “The disk drive unit 1516 may include a tangible computer-readable storage medium 1522 on which is stored one or more sets of instructions (e.g., software 1524) embodying any one or more of the methods or functions described herein, including those methods illustrated above. The instructions 1524 may also reside, completely or at least partially, within the main memory 1504, the static memory 1506, and/or within the processor 1502 during execution thereof by the computer system 1500. The main memory 1504 and the processor 1502 also may constitute tangible computer-readable storage media,” Wohlert [0112]). Claims 15, 19, and 20 are directed to a non-transitory computer-readable medium storing instructions that when executed by one or more processors cause the one or more processors to perform operations substantially similar in scope to claims 1, 5, and 6, respectively, and therefore are rejected for the same reasons (see also at least, “The disk drive unit 1516 may include a tangible computer-readable storage medium 1522 on which is stored one or more sets of instructions (e.g., software 1524) embodying any one or more of the methods or functions described herein, including those methods illustrated above. The instructions 1524 may also reside, completely or at least partially, within the main memory 1504, the static memory 1506, and/or within the processor 1502 during execution thereof by the computer system 1500. The main memory 1504 and the processor 1502 also may constitute tangible computer-readable storage media,” Wohlert [0112]). Claim(s) 2 – 4, 9 – 11, and 16 – 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wohlert and Shevchenko in view of Aue et al. (US 2015/0347399 A1), hereinafter Aue. Claim 2: Wohlert and Shevchenko disclose the computer-implemented method of claim 1, but do not disclose where in the second audio segment is configured to be presented at an offset from the first audio segment. However, Aue discloses in regards to a similar invention for assisting with communication wherein the second audio segment is configured to be presented at an offset from the first audio segment (see at least, “In such embodiments, at the top level is the "bot," which appears to users of the chat system just as a regular human network member would. The bot intercepts audio stream(s) from all the users who speak its source language ( e.g. 104a ), and passes them on to a speech-to-text translation system (audio translator 404). The output of the speech-to-text translation system is target language text. 
The bot then communicates the target language information to the target language user(s) 104b,” Aue [0059], “The text can also be passed to a text-to-speech component (text-to-speech converter 410), which renders the target language text as an audio signal which can either replace the speaker's original audio signal or else be mixed with it,” Aue [0064], “Translation can either be turn-based (the Bot waits until the user pauses or indicates in some other way that their utterance is complete, like, say, clicking button, then communicates the target language information) or simultaneous – that is, substantially contemporaneous with the source speech (the Bot begins to communicate the target language information the moment it has enough text to produce semantically and syntactically coherent output). The former uses Voice Activation Detection to determine when to commence translating a preceding portion of speech (translation being per interval of detected speech activity); the latter uses voice activation detection and an automatic segmentation component (being performed, for each interval of detected speech activity, on a per segment of that interval, which may have one or more segments). As will be appreciated, components for performing such functions are readily available. In the turn-based scenario the use of a bot acting as a third party virtual translator in the call would aid the users by framing them in a common real world scenario with a translator (such as one might have in a courtroom); simultaneous translation is analogous to a human simultaneous interpreter (e.g. such as one encounters in the European Parliament or the UN). Thus, both provide an intuitive translation experience for the target user(s),” Aue [0065]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the aforementioned features of Aue in the invention of Wohlert and Shevchenko thereby aiding comprehension (see at least, “At step S514, the synthetic audio is supplied to the mixer 412 where it is mixed with Alice's original audio (comprising her original, natural speech) to generate a mixed audio stream comprising both the synthetic translated speech in the target language and the original natural speech in the source language, which is transmitted to Bob via the network 106 (S516) for outputting via the audio output device(s) of his user device as part of the call. Bob can thus gauge Alice's tone etc. from the natural speech ( even if he doesn't understand it), whilst grasping the meaning from the synthetic speech resulting in a more natural communication. That is, the system can also transmit Alice's untranslated audio as well as the translated audio. Further, even when the target user does not understand the source language, there is still information to be gleaned there from e.g. intonation (they may be able to tell whether the source speaker is asking a question, for instance),” Aue [0076]). Claim 3: Wohlert and Shevchenko disclose the computer-implemented method of claim 1, but does not disclose where in the second audio segment is configured to be presented at a different volume than the first audio segment. However, Aue discloses in regards to a similar invention for assisting with communication the second audio segment is configured to be presented at a different volume than the first audio segment (see at least, “Alternatively, the automatic translation may be performed on a per-word or per several word basis and e.g. 
outputted whilst Alice's speech is still ongoing and being heard by Bob e.g. as subtitles displayed on Bob's device and/or as audio played out over the top of Alice's natural speech (e.g. with the volume of Alice's speech reduced relative to the audible translation). This may result in a more responsive user experience for Bob as the translation is generated in near-real-time (e.g. with a less than approx. 2 second response time). The two can also be combined; for instance the intermediate results of the (translated) speech recognition system may be displayed on screen, enabling them to be edited as the best hypothesis changes as the sentence goes on, and the translation of the best hypothesis then translated into audio (see below),” Aue [0038], “In such embodiments, at the top level is the "bot," which appears to users of the chat system just as a regular human network member would. The bot intercepts audio stream(s) from all the users who speak its source language ( e.g. 104a ), and passes them on to a speech-to-text translation system (audio translator 404). The output of the speech-to-text translation system is target language text. The bot then communicates the target language information to the target language user(s) 104b,” Aue [0059], “The text can also be passed to a text-to-speech component (text-to-speech converter 410), which renders the target language text as an audio signal which can either replace the speaker's original audio signal or else be mixed with it,” Aue [0064], “Translation can either be turn-based (the Bot waits until the user pauses or indicates in some other way that their utterance is complete, like, say, clicking button, then communicates the target language information) or simultaneous – that is, substantially contemporaneous with the source speech (the Bot begins to communicate the target language information the moment it has enough text to produce semantically and syntactically coherent output). The former uses Voice Activation Detection to determine when to commence translating a preceding portion of speech (translation being per interval of detected speech activity); the latter uses voice activation detection and an automatic segmentation component (being performed, for each interval of detected speech activity, on a per segment of that interval, which may have one or more segments). As will be appreciated, components for performing such functions are readily available. In the turn-based scenario the use of a bot acting as a third party virtual translator in the call would aid the users by framing them in a common real world scenario with a translator (such as one might have in a courtroom); simultaneous translation is analogous to a human simultaneous interpreter (e.g. such as one encounters in the European Parliament or the UN). Thus, both provide an intuitive translation experience for the target user(s),” Aue [0065]).). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the aforementioned features of Aue in the invention of Wohlert and Shevchenko thereby aiding comprehension (see at least, “At step S514, the synthetic audio is supplied to the mixer 412 where it is mixed with Alice's original audio (comprising her original, natural speech) to generate a mixed audio stream comprising both the synthetic translated speech in the target language and the original natural speech in the source language, which is transmitted to Bob via the network 106 (S516) for outputting via the audio output device(s) of his user device as part of the call. Bob can thus gauge Alice's tone etc. from the natural speech (even if he doesn't understand it), whilst grasping the meaning from the synthetic speech resulting in a more natural communication. That is, the system can also transmit Alice's untranslated audio as well as the translated audio. Further, even when the target user does not understand the source language, there is still information to be gleaned there from e.g. intonation (they may be able to tell whether the source speaker is asking a question, for instance),” Aue [0076]). Claim 4: Wohlert and Shevchenko disclose the computer-implemented method of claim 1, but do not disclose wherein the first audio segment includes a set of words spoken by a user of the second user device in a first language, and wherein the second audio segment includes a translation of the set of words in a second language. However, Aue discloses in regards to a similar invention for assisting with communication wherein the first audio segment includes a set of words spoken by a user of the second user device in a first language, and wherein the second audio segment includes a translation of the set of words in a second language (see at least, “In such embodiments, at the top level is the "bot," which appears to users of the chat system just as a regular human network member would. The bot intercepts audio stream(s) from all the users who speak its source language ( e.g. 104a ), and passes them on to a speech-to-text translation system (audio translator 404). The output of the speech-to-text translation system is target language text. The bot then communicates the target language information to the target language user(s) 104b,” Aue [0059], “The text can also be passed to a text-to-speech component (text-to-speech converter 410), which renders the target language text as an audio signal which can either replace the speaker's original audio signal or else be mixed with it,” Aue [0064], “Translation can either be turn-based (the Bot waits until the user pauses or indicates in some other way that their utterance is complete, like, say, clicking button, then communicates the target language information) or simultaneous – that is, substantially contemporaneous with the source speech (the Bot begins to communicate the target language information the moment it has enough text to produce semantically and syntactically coherent output). The former uses Voice Activation Detection to determine when to commence translating a preceding portion of speech (translation being per interval of detected speech activity); the latter uses voice activation detection and an automatic segmentation component (being performed, for each interval of detected speech activity, on a per segment of that interval, which may have one or more segments). 
As will be appreciated, components for performing such functions are readily available. In the turn-based scenario the use of a bot acting as a third party virtual translator in the call would aid the users by framing them in a common real world scenario with a translator (such as one might have in a courtroom); simultaneous translation is analogous to a human simultaneous interpreter (e.g. such as one encounters in the European Parliament or the UN). Thus, both provide an intuitive translation experience for the target user(s),” Aue [0065]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the aforementioned features of Aue in the invention of Wohlert and Shevchenko thereby aiding comprehension (see at least, “At step S514, the synthetic audio is supplied to the mixer 412 where it is mixed with Alice's original audio (comprising her original, natural speech) to generate a mixed audio stream comprising both the synthetic translated speech in the target language and the original natural speech in the source language, which is transmitted to Bob via the network 106 (S516) for outputting via the audio output device(s) of his user device as part of the call. Bob can thus gauge Alice's tone etc. from the natural speech ( even if he doesn't understand it), whilst grasping the meaning from the synthetic speech resulting in a more natural communication. That is, the system can also transmit Alice's untranslated audio as well as the translated audio. Further, even when the target user does not understand the source language, there is still information to be gleaned there from e.g. intonation (they may be able to tell whether the source speaker is asking a question, for instance),” Aue [0076]). Claims 9 – 11 are substantially similar in scope to claims 2 – 4, respectively, and therefore are rejected for the same reasons. Claims 16 – 18 are substantially similar in scope to claims 2 – 4, respectively, and therefore are rejected for the same reasons. Conclusion Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOSEPH SAUNDERS whose telephone number is (571)270-1063. The examiner can normally be reached Monday-Thursday, 9:00 a.m. - 4 p.m., EST. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Carolyn R Edwards can be reached at (571)270-7136. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 
/JOSEPH SAUNDERS JR/
Primary Examiner, Art Unit 2692
/CAROLYN R EDWARDS/
Supervisory Patent Examiner, Art Unit 2692
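For orientation, the sketch below restates the claim 1 mapping in the rejection: Wohlert is cited for intercepting the first audio segment, reading the receiving user's profile, and transmitting an adjusted segment over the same channel; Shevchenko is cited for generating that segment with a machine-learning model; and the overlay step echoes the mixing idea the rejection draws from Aue for the dependent claims. All function names, data shapes, and the stand-in "model" are hypothetical illustrations, not code from the application or the cited references.

```python
# Hypothetical sketch of the claim 1 pipeline as the rejection maps it to the art.
from dataclasses import dataclass

@dataclass
class AudioSegment:
    samples: list[float]
    text: str  # recognized speech, used by the toy "model" below

def intercept_first_segment(session: dict) -> AudioSegment:
    """Capture the first audio segment sent by the second user device (mapped to Wohlert [0014])."""
    return session["audio_channel"].pop(0)

def load_profile(first_user_device_id: str) -> dict:
    """Identify profile data associated with the first user device (mapped to Wohlert [0014], [0020])."""
    return {"device": first_user_device_id, "preference": "plain-language explanations"}

def generate_second_segment(first: AudioSegment, profile: dict) -> AudioSegment:
    """Generate a contextually related second segment; the rejection cites Shevchenko
    for doing this with a machine-learning model. The string below is a stand-in for that model."""
    explanation = f"In plain terms: {first.text}"
    return AudioSegment(samples=[0.0] * len(first.samples), text=explanation)

def present_over_portion(first: AudioSegment, second: AudioSegment, session: dict) -> None:
    """Transmit the second segment so it is presented over a portion of the first
    (cf. Aue's mixing of synthetic speech with the original audio for claims 2-3)."""
    overlap = min(len(first.samples), len(second.samples))
    mixed = [0.5 * a + 0.5 * b for a, b in zip(first.samples, second.samples)]
    session["to_first_user_device"].append(mixed + first.samples[overlap:])

# Toy walk-through
session = {"audio_channel": [AudioSegment([0.1, 0.2, 0.3], "an unfamiliar term")],
           "to_first_user_device": []}
first = intercept_first_segment(session)
second = generate_second_segment(first, load_profile("first-user-device"))
present_over_portion(first, second, session)
```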

Prosecution Timeline

Jun 20, 2024: Application Filed
Jan 10, 2026: Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12596883
Audio Analysis for Text Generation
Granted Apr 07, 2026 • 2y 5m to grant
Patent 12598420
AUDIO DEVICE WITH ELECTROSTATIC DISCHARGE PROTECTION
Granted Apr 07, 2026 • 2y 5m to grant
Patent 12593190
User Experience Localizing Binaural Sound During a Telephone Call
Granted Mar 31, 2026 • 2y 5m to grant
Patent 12585425
Light-function audio parameters
Granted Mar 24, 2026 • 2y 5m to grant
Patent 12585422
DATA PROCESSING METHOD OF PROCESSING MULTITRACK AUDIO DATA AND DATA PROCESSING APPARATUS
Granted Mar 24, 2026 • 2y 5m to grant
Study what changed to get past this examiner. Based on 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.
Powered by AI — typically takes 5-10 seconds

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 73%
With Interview: 93% (+20.6%)
Median Time to Grant: 2y 9m
PTA Risk: Low
Based on 740 resolved cases by this examiner. Grant probability derived from career allow rate.
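As a small worked example (illustrative only; python-dateutil is a third-party package), the median-time-to-grant figure can be projected onto the calendar from the Jun 20, 2024 filing date shown above:

```python
from datetime import date
from dateutil.relativedelta import relativedelta  # third-party: pip install python-dateutil

filing_date = date(2024, 6, 20)                      # "Filed: Jun 20, 2024"
median_to_grant = relativedelta(years=2, months=9)   # "Median Time to Grant: 2y 9m"
print(filing_date + median_to_grant)                 # 2027-03-20, a rough calendar estimate only
```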
