DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant's claim for foreign priority based on an application filed in India on 01/12/2024. It is noted, however, that applicant has not filed a certified copy of the IN02441002556 application as required by 37 CFR 1.55.
An attempt by the Office to electronically retrieve, under the priority document exchange program, the foreign application 02441002556 to which priority is claimed FAILED on 06/12/2025. Therefore, there is currently no certified copy in the file.
Response to Arguments
Applicant's arguments filed December 31, 2025 have been fully considered but they are not persuasive.
The Applicant argues that neither Lim et al. nor Park, alone or in combination, teaches or suggests the feature “wherein the one or more secondary events are contextually related to the primary event.” The Examiner respectfully disagrees. Park discloses that the streaming service apparatus 200 divides the execution screen of the content into a main area and an auxiliary area according to the contents of the content. The main area and the auxiliary area are set in consideration of the intention of the content creator (paragraphs [0023] and [0033]). The main area corresponds to an essential part of the content that is needed to grasp the content. For example, if the content is a movie or a TV drama, the main area may be the person speaking the current dialogue, or a person or an object that is in focus. Where the content is a streaming game, the user's character, the object selected by the user, and the target for proceeding with the game can be the main areas (paragraph [0024]). The subarea corresponds to a portion of the content that is less important than the main area; for example, if the content is a movie or a TV drama, it may be an auxiliary person or simple background not related to the flow of the drama (paragraph [0025]). The primary event and the secondary events are determined based on the essential parts of the scene that help the viewer understand the content, for example, a person's gaze in the scene or the dialogue being spoken. Therefore, Park teaches that the one or more secondary events are contextually related to the primary event. Once that concept is combined with the teachings of Lim et al., all relevant regions of interest are further enhanced accordingly. Therefore, the combination of Lim et al. and Park meets all the claimed limitations, and the rejection is maintained.
The Examiner suggests that adding the contextual inputs from the user found in the Applicant's invention would help distinguish the claims from the previously cited prior art references.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 2, 8-10, 12, 13, 16-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Lim et al. (U.S. Patent Application Publication 2020/0210766) in view of Park (KR20180025367A).
Regarding claim 1, Lim et al. discloses an aspect ratio based method for displaying a video, the aspect ratio based method comprising: identifying a primary region, within each of video frames of a video, based on an analysis of the video frames (paragraph [0087] – when information on an area of interest is acquired from a learning network model, the processor 120 may retarget an input image frame by applying the first conversion weight to pixels corresponding to the area of interest, and applying the second conversion weight to pixels corresponding to the remaining area (or an area of non-interest) – the retargeting information may include the aspect ratio of the input image frame and the aspect ratio of the output image frame); determining a first aspect ratio in which the video is displayed on at least one of an electronic device or one or more applications (paragraph [0087] – when information on an area of interest is acquired from a learning network model, the processor 120 may retarget an input image frame by applying the first conversion weight to pixels corresponding to the area of interest, and applying the second conversion weight to pixels corresponding to the remaining area (or an area of non-interest) – the retargeting information may include the aspect ratio of the input image frame and the aspect ratio of the output image frame); predicting, using an Artificial Intelligence (AI) model, positions of the primary region, based on the determined first aspect ratio (paragraph [0087] – when information on an area of interest is acquired from a learning network model, the processor 120 may retarget an input image frame by applying the first conversion weight to pixels corresponding to the area of interest, and applying the second conversion weight to pixels corresponding to the remaining area (or an area of non-interest) – the retargeting information may include the aspect ratio of the input image frame and the aspect ratio of the output image frame); 
obtaining frames matching the determined first aspect ratio and having the predicted positions of the primary region (paragraph [0087] – when information on an area of interest is acquired from a learning network model, the processor 120 may retarget an input image frame by applying the first conversion weight to pixels corresponding to the area of interest, and applying the second conversion weight to pixels corresponding to the remaining area (or an area of non-interest) – the retargeting information may include the aspect ratio of the input image frame and the aspect ratio of the output image frame); and displaying the video having the obtained frames and the determined first aspect ratio (paragraph [0057] – the memory 110 stores an instruction that controls the processor 120 to acquire an output image frame based on information on an area of interest acquired by applying an input image frame on a learning network model – here, the learning network model may be a model that is trained to acquire information on an area of interest in an input image frame; paragraph [0087] – when information on an area of interest is acquired from a learning network model, the processor 120 may retarget an input image frame by applying the first conversion weight to pixels corresponding to the area of interest, and applying the second conversion weight to pixels corresponding to the remaining area (or an area of non-interest) – the retargeting information may include the aspect ratio of the input image frame and the aspect ratio of the output image frame). However, Lim et al. fails to disclose identifying a primary event in a primary region and one or more secondary events in one or more secondary regions, within each of video frames of a video, based on an analysis of the video frames, wherein the one or more secondary events are contextually related to the primary event; and obtaining a semantic relationship between identified primary event and the one or more secondary events.
Referring to the Park reference, Park discloses a method for displaying a video, the method comprising: identifying a primary event in a primary region and one or more secondary events in one or more secondary regions, within each of video frames of a video, based on an analysis of the video frames, wherein the one or more secondary events are contextually related to the primary event (paragraph [0023] – the streaming service apparatus 200 divides the execution screen of the content into the main area and the auxiliary area according to the contents of the contents – the main area and the auxiliary area are set in consideration of the intention of the content creator; paragraph [0024] - the main domain corresponds to an essential part of the contents of the contents to grasp the contents of the contents - for example, if the content is a movie or a TV drama, it may be a person who is talking about the present dialogue, or a person or an object that is focused - in the case where the content is a streaming game, the user's character, the object selected by the user, and the target for proceeding the game can be the main areas; paragraph [0025] - the subarea corresponds to a portion of content that is less important than the main region of the content - for example, if the content is a movie or a TV drama, it may be an auxiliary person, simple background not related to the drama flow; paragraph [0033] – when the content is executed, the streaming device 200 divides the generated execution screen by executing the content (S204) – at this time, the streaming service apparatus 200 can divide the execution screen based on the contents of the contents. 
The location, size, and shape of the image are also set based on the contents of the content – the execution screen is divided into a main area and a sub area according to the contents of the contents – the main area and the auxiliary area of the content can be arbitrarily set by the content creator or the administrator of the streamlining service device 200 - in the case where the content is a movie or a TV drama, it is also possible to recognize a face of the actor or a close-up object by using a screen recognition technology and set the corresponding part as a main area – when the content is a game, the main area and the auxiliary area can be divided by referring to the game data); and obtaining a semantic relationship between identified primary event and the one or more secondary events (paragraph [0023] – the streaming service apparatus 200 divides the execution screen of the content into the main area and the auxiliary area according to the contents of the contents – the main area and the auxiliary area are set in consideration of the intention of the content creator; paragraph [0024] - the main domain corresponds to an essential part of the contents of the contents to grasp the contents of the contents - for example, if the content is a movie or a TV drama, it may be a person who is talking about the present dialogue, or a person or an object that is focused - in the case where the content is a streaming game, the user's character, the object selected by the user, and the target for proceeding the game can be the main areas; paragraph [0025] - the subarea corresponds to a portion of content that is less important than the main region of the content - for example, if the content is a movie or a TV drama, it may be an auxiliary person, simple background not related to the drama flow; paragraph [0033] – when the content is executed, the streaming device 200 divides the generated execution screen by executing the content (S204) – at this time, the streaming 
service apparatus 200 can divide the execution screen based on the contents of the contents. The location, size, and shape of the image are also set based on the contents of the content – the execution screen is divided into a main area and a sub area according to the contents of the contents – the main area and the auxiliary area of the content can be arbitrarily set by the content creator or the administrator of the streamlining service device 200 - in the case where the content is a movie or a TV drama, it is also possible to recognize a face of the actor or a close-up object by using a screen recognition technology and set the corresponding part as a main area – when the content is a game, the main area and the auxiliary area can be divided by referring to the game data).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have identified a primary event in a primary region and one or more secondary events in one or more secondary regions, within each of video frames of a video, based on an analysis of the video frames, wherein the one or more secondary events are contextually related to the primary event; and to have obtained a semantic relationship between the identified primary event and the one or more secondary events, as disclosed by Park, in the method disclosed by Lim et al. in order to ensure that the regions of interest are further enhanced.
Regarding claim 2, Lim et al. in view of Park discloses all of the limitations as previously discussed with respect to claim 1 including that the aspect ratio based method further comprises identifying the primary event and the one or more secondary events, based on an analysis of at least one of an audio of the video frames, the video frames, and a plurality of multi-modal contextual inputs (Lim et al.: paragraph [0057] – the memory 110 stores an instruction that controls the processor 120 to acquire an output image frame based on information on an area of interest acquired by applying an input image frame on a learning network model – here, the learning network model may be a model that is trained to acquire information on an area of interest in an input image frame; Park: paragraph [0023] – the streaming service apparatus 200 divides the execution screen of the content into the main area and the auxiliary area according to the contents of the contents – the main area and the auxiliary area are set in consideration of the intention of the content creator; paragraph [0024] - the main domain corresponds to an essential part of the contents of the contents to grasp the contents of the contents - for example, if the content is a movie or a TV drama, it may be a person who is talking about the present dialogue, or a person or an object that is focused - in the case where the content is a streaming game, the user's character, the object selected by the user, and the target for proceeding the game can be the main areas; paragraph [0025] - the subarea corresponds to a portion of content that is less important than the main region of the content - for example, if the content is a movie or a TV drama, it may be an auxiliary person, simple background not related to the drama flow).
Regarding claim 8, Lim et al. in view of Park discloses all of the limitations as previously discussed with respect to claim 1 including that wherein obtaining the semantic relationship between the identified primary event and the one or more secondary events comprises: identifying at least one of one or more objects, one or more faces, an orientation of a head of one or more users, gaze angles of the one or more users in the primary event in the primary region and in the one or more secondary events in the one or more secondary regions based on a performance of the analysis of the video; and obtaining, based on a result of the detection, the semantic relationship between each of the primary event and the one or more secondary events with respect to a plurality of semantic relationships parameters, wherein the plurality of semantic relationship parameters comprise proximity of the identified one or more objects and the one or more faces with respect to a camera, the gaze angles of the one or more users, a pixel displacement in the primary region and the one or more secondary regions, a visual similarity in the primary event, the visual similarity in the one or more secondary regions (Lim et al.: paragraph [0057] – the memory 110 stores an instruction that controls the processor 120 to acquire an output image frame based on information on an area of interest acquired by applying an input image frame on a learning network model – here, the learning network model may be a model that is trained to acquire information on an area of interest in an input image frame; Park: paragraph [0023] – the streaming service apparatus 200 divides the execution screen of the content into the main area and the auxiliary area according to the contents of the contents – the main area and the auxiliary area are set in consideration of the intention of the content creator; paragraph [0024] - the main domain corresponds to an essential part of the contents of the contents to grasp the 
contents of the contents - for example, if the content is a movie or a TV drama, it may be a person who is talking about the present dialogue, or a person or an object that is focused - in the case where the content is a streaming game, the user's character, the object selected by the user, and the target for proceeding the game can be the main areas; paragraph [0025] - the subarea corresponds to a portion of content that is less important than the main region of the content - for example, if the content is a movie or a TV drama, it may be an auxiliary person, simple background not related to the drama flow; paragraph [0033] – when the content is executed, the streaming device 200 divides the generated execution screen by executing the content (S204) – at this time, the streaming service apparatus 200 can divide the execution screen based on the contents of the contents. The location, size, and shape of the image are also set based on the contents of the content – the execution screen is divided into a main area and a sub area according to the contents of the contents – the main area and the auxiliary area of the content can be arbitrarily set by the content creator or the administrator of the streamlining service device 200 - in the case where the content is a movie or a TV drama, it is also possible to recognize a face of the actor or a close-up object by using a screen recognition technology and set the corresponding part as a main area – when the content is a game, the main area and the auxiliary area can be divided by referring to the game data).
Regarding claim 9, Lim et al. in view of Park discloses all of the limitations as previously discussed with respect to claim 1 including that the first aspect ratio is determined based on a second aspect ratio of at least one of the display of the device for displaying the video or the one or more applications (Lim et al.: paragraph [0005] – provided are an image processing apparatus that is capable of acquiring an output image while minimizing image distortion through detection of an area of interest and adjusting an aspect ratio of an input image, and an image processing method thereof; paragraph [0087] – the retargeting information may include the aspect ratio of the input image frame and the aspect ratio of the output image frame).
Regarding claim 10, Lim et al. in view of Park discloses all of the limitations as previously discussed with respect to claim 1 including that wherein obtaining the frames comprises: obtaining background features of the primary event and each of the one or more secondary events based on the semantic relationship and an assigned second priority; determining a plurality of aesthetic effects for the primary event and each of the one or more secondary events based on the background features and an event score from the semantic relationship; and obtaining frames matching with the determined first aspect ratio and having the predicted positions of the primary event and the one or more secondary events along with the determined plurality of aesthetic effects (Lim et al.: paragraph [0057] – the memory 110 stores an instruction that controls the processor 120 to acquire an output image frame based on information on an area of interest acquired by applying an input image frame on a learning network model – here, the learning network model may be a model that is trained to acquire information on an area of interest in an input image frame; Park: paragraph [0023] – the streaming service apparatus 200 divides the execution screen of the content into the main area and the auxiliary area according to the contents of the contents – the main area and the auxiliary area are set in consideration of the intention of the content creator; paragraph [0024] - the main domain corresponds to an essential part of the contents of the contents to grasp the contents of the contents - for example, if the content is a movie or a TV drama, it may be a person who is talking about the present dialogue, or a person or an object that is focused - in the case where the content is a streaming game, the user's character, the object selected by the user, and the target for proceeding the game can be the main areas; paragraph [0025] - the subarea corresponds to a portion of content that is less important 
than the main region of the content - for example, if the content is a movie or a TV drama, it may be an auxiliary person, simple background not related to the drama flow; paragraph [0028] – the streaming service apparatus 200 may not only divide the execution screen of the content into the main area and the auxiliary area but also divide the main figure in stages and set the encoding quality in stages according to the steps of the main figure; paragraph [0033] – when the content is executed, the streaming device 200 divides the generated execution screen by executing the content (S204) – at this time, the streaming service apparatus 200 can divide the execution screen based on the contents of the contents. The location, size, and shape of the image are also set based on the contents of the content – the execution screen is divided into a main area and a sub area according to the contents of the contents – the main area and the auxiliary area of the content can be arbitrarily set by the content creator or the administrator of the streamlining service device 200 - in the case where the content is a movie or a TV drama, it is also possible to recognize a face of the actor or a close-up object by using a screen recognition technology and set the corresponding part as a main area – when the content is a game, the main area and the auxiliary area can be divided by referring to the game data; plurality of aesthetic effects – aspect ratio and quality/resolution).
Regarding claim 12, Lim et al. discloses an aspect ratio based an electronic device for displaying a video, the aspect ratio based the electronic device comprising one or more processors configured to: identify a primary region within each video frames of a video based on an analysis of the video frames (paragraph [0087] – when information on an area of interest is acquired from a learning network model, the processor 120 may retarget an input image frame by applying the first conversion weight to pixels corresponding to the area of interest, and applying the second conversion weight to pixels corresponding to the remaining area (or an area of non-interest) – the retargeting information may include the aspect ratio of the input image frame and the aspect ratio of the output image frame); determine a first aspect ratio in which the video is to be displayed on at least one of an electronic device or one or more applications (paragraph [0087] – when information on an area of interest is acquired from a learning network model, the processor 120 may retarget an input image frame by applying the first conversion weight to pixels corresponding to the area of interest, and applying the second conversion weight to pixels corresponding to the remaining area (or an area of non-interest) – the retargeting information may include the aspect ratio of the input image frame and the aspect ratio of the output image frame); predict, using an Artificial Intelligence (AI) model, positions of the primary region, based on the determined first aspect ratio (paragraph [0087] – when information on an area of interest is acquired from a learning network model, the processor 120 may retarget an input image frame by applying the first conversion weight to pixels corresponding to the area of interest, and applying the second conversion weight to pixels corresponding to the remaining area (or an area of non-interest) – the retargeting information may include the aspect ratio of the input image 
frame and the aspect ratio of the output image frame); obtain frames matching the determined first aspect ratio and having the predicted positions of the primary region (paragraph [0087] – when information on an area of interest is acquired from a learning network model, the processor 120 may retarget an input image frame by applying the first conversion weight to pixels corresponding to the area of interest, and applying the second conversion weight to pixels corresponding to the remaining area (or an area of non-interest) – the retargeting information may include the aspect ratio of the input image frame and the aspect ratio of the output image frame); and displaying the video with the obtained frames with the determined first aspect ratio (paragraph [0057] – the memory 110 stores an instruction that controls the processor 120 to acquire an output image frame based on information on an area of interest acquired by applying an input image frame on a learning network model – here, the learning network model may be a model that is trained to acquire information on an area of interest in an input image frame; paragraph [0087] – when information on an area of interest is acquired from a learning network model, the processor 120 may retarget an input image frame by applying the first conversion weight to pixels corresponding to the area of interest, and applying the second conversion weight to pixels corresponding to the remaining area (or an area of non-interest) – the retargeting information may include the aspect ratio of the input image frame and the aspect ratio of the output image frame). However, Lim et al. 
fails to disclose identifying a primary event in a primary region and one or more secondary events in one or more secondary regions, within each of video frames of a video, based on an analysis of the video frames, wherein the one or more secondary events are contextually related to the primary event; and obtaining a semantic relationship between identified primary event and the one or more secondary events.
Referring to the Park reference, Park discloses an electronic device for displaying a video, the electronic device comprising one or more processors configured to: identifying a primary event in a primary region and one or more secondary events in one or more secondary regions, within each of video frames of a video, based on an analysis of the video frames, wherein the one or more secondary events are contextually related to the primary event (paragraph [0023] – the streaming service apparatus 200 divides the execution screen of the content into the main area and the auxiliary area according to the contents of the contents – the main area and the auxiliary area are set in consideration of the intention of the content creator; paragraph [0024] - the main domain corresponds to an essential part of the contents of the contents to grasp the contents of the contents - for example, if the content is a movie or a TV drama, it may be a person who is talking about the present dialogue, or a person or an object that is focused - in the case where the content is a streaming game, the user's character, the object selected by the user, and the target for proceeding the game can be the main areas; paragraph [0025] - the subarea corresponds to a portion of content that is less important than the main region of the content - for example, if the content is a movie or a TV drama, it may be an auxiliary person, simple background not related to the drama flow; paragraph [0033] – when the content is executed, the streaming device 200 divides the generated execution screen by executing the content (S204) – at this time, the streaming service apparatus 200 can divide the execution screen based on the contents of the contents. 
The location, size, and shape of the image are also set based on the contents of the content – the execution screen is divided into a main area and a sub area according to the contents of the contents – the main area and the auxiliary area of the content can be arbitrarily set by the content creator or the administrator of the streamlining service device 200 - in the case where the content is a movie or a TV drama, it is also possible to recognize a face of the actor or a close-up object by using a screen recognition technology and set the corresponding part as a main area – when the content is a game, the main area and the auxiliary area can be divided by referring to the game data); and obtaining a semantic relationship between identified primary event and the one or more secondary events (paragraph [0023] – the streaming service apparatus 200 divides the execution screen of the content into the main area and the auxiliary area according to the contents of the contents – the main area and the auxiliary area are set in consideration of the intention of the content creator; paragraph [0024] - the main domain corresponds to an essential part of the contents of the contents to grasp the contents of the contents - for example, if the content is a movie or a TV drama, it may be a person who is talking about the present dialogue, or a person or an object that is focused - in the case where the content is a streaming game, the user's character, the object selected by the user, and the target for proceeding the game can be the main areas; paragraph [0025] - the subarea corresponds to a portion of content that is less important than the main region of the content - for example, if the content is a movie or a TV drama, it may be an auxiliary person, simple background not related to the drama flow; paragraph [0033] – when the content is executed, the streaming device 200 divides the generated execution screen by executing the content (S204) – at this time, the streaming 
service apparatus 200 can divide the execution screen based on the contents of the contents. The location, size, and shape of the image are also set based on the contents of the content – the execution screen is divided into a main area and a sub area according to the contents of the contents – the main area and the auxiliary area of the content can be arbitrarily set by the content creator or the administrator of the streamlining service device 200 - in the case where the content is a movie or a TV drama, it is also possible to recognize a face of the actor or a close-up object by using a screen recognition technology and set the corresponding part as a main area – when the content is a game, the main area and the auxiliary area can be divided by referring to the game data).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have identified a primary event in a primary region and one or more secondary events in one or more secondary regions, within each of video frames of a video, based on an analysis of the video frames, wherein the one or more secondary events are contextually related to the primary event; and to have obtained a semantic relationship between the identified primary event and the one or more secondary events, as disclosed by Park, in the device disclosed by Lim et al. in order to ensure that the regions of interest are further enhanced.
Regarding claim 13, Lim et al. in view of Park discloses all of the limitations as previously discussed with respect to claim 12 including that the primary event and the one or more secondary events are identified based on an analysis of at least one of an audio of the video frames, the video frames and a plurality of multi-modal contextual inputs (Lim et al.: paragraph [0057] – the memory 110 stores an instruction that controls the processor 120 to acquire an output image frame based on information on an area of interest acquired by applying an input image frame on a learning network model – here, the learning network model may be a model that is trained to acquire information on an area of interest in an input image frame; Park: paragraph [0023] – the streaming service apparatus 200 divides the execution screen of the content into the main area and the auxiliary area according to the contents of the contents – the main area and the auxiliary area are set in consideration of the intention of the content creator; paragraph [0024] - the main domain corresponds to an essential part of the contents of the contents to grasp the contents of the contents - for example, if the content is a movie or a TV drama, it may be a person who is talking about the present dialogue, or a person or an object that is focused - in the case where the content is a streaming game, the user's character, the object selected by the user, and the target for proceeding the game can be the main areas; paragraph [0025] - the subarea corresponds to a portion of content that is less important than the main region of the content - for example, if the content is a movie or a TV drama, it may be an auxiliary person, simple background not related to the drama flow).
Regarding claim 16, Lim et al. in view of Park discloses all of the limitations as previously discussed with respect to claim 12 including that, to obtain the semantic relationship between the identified primary event and the one or more secondary events, the one or more processors are configured to: identify at least one of one or more objects, one or more faces, orientation of a head of one or more users, gaze angles of the one or more users in the primary event in the primary region and in the one or more secondary events in the one or more secondary regions based on a performance of the analysis of the video; and obtain, based on a result of the detection, the semantic relationship between each of the primary event and the one or more secondary events with respect to a plurality of semantic relationships parameters, wherein the plurality of semantic relationship parameters comprise proximity of the identified one or more objects and the one or more faces with respect to a camera, the gaze angles of the one or more users, a pixel displacement in the primary region and the one or more secondary regions, a visual similarity in the primary event, the visual similarity in the one or more secondary regions (Lim et al.: paragraph [0057] – the memory 110 stores an instruction that controls the processor 120 to acquire an output image frame based on information on an area of interest acquired by applying an input image frame on a learning network model – here, the learning network model may be a model that is trained to acquire information on an area of interest in an input image frame; Park: paragraph [0023] – the streaming service apparatus 200 divides the execution screen of the content into the main area and the auxiliary area according to the contents of the contents – the main area and the auxiliary area are set in consideration of the intention of the content creator; paragraph [0024] - the main domain corresponds to an essential part of the contents of the contents to grasp the contents of the contents - for example, if the content is a movie or a TV drama, it may be a person who is talking about the present dialogue, or a person or an object that is focused - in the case where the content is a streaming game, the user's character, the object selected by the user, and the target for proceeding the game can be the main areas; paragraph [0025] - the subarea corresponds to a portion of content that is less important than the main region of the content - for example, if the content is a movie or a TV drama, it may be an auxiliary person, simple background not related to the drama flow; paragraph [0033] – when the content is executed, the streaming device 200 divides the generated execution screen by executing the content (S204) – at this time, the streaming service apparatus 200 can divide the execution screen based on the contents of the contents. The location, size, and shape of the image are also set based on the contents of the content – the execution screen is divided into a main area and a sub area according to the contents of the contents – the main area and the auxiliary area of the content can be arbitrarily set by the content creator or the administrator of the streaming service device 200 - in the case where the content is a movie or a TV drama, it is also possible to recognize a face of the actor or a close-up object by using a screen recognition technology and set the corresponding part as a main area – when the content is a game, the main area and the auxiliary area can be divided by referring to the game data).
Regarding claim 17, Lim et al. in view of Park discloses all of the limitations as previously discussed with respect to claim 12 including that the first aspect ratio is determined based on a second aspect ratio of at least one of the display of the device for displaying the video or the one or more applications (Lim et al.: paragraph [0005] – provided are an image processing apparatus that is capable of acquiring an output image while minimizing image distortion through detection of an area of interest and adjusting an aspect ratio of an input image, and an image processing method thereof; paragraph [0087] – the retargeting information may include the aspect ratio of the input image frame and the aspect ratio of the output image frame).
Regarding claim 18, Lim et al. in view of Park discloses all of the limitations as previously discussed with respect to claim 12 including that, to obtain the frames, the one or more processors are configured to: obtain background features of the primary event and each of the one or more secondary events based on the semantic relationship and an assigned second priority; determine a plurality of aesthetic effects for the primary event and each of the one or more secondary events based on the background features and an event score from the semantic relationship; and obtain frames matching with the determined first aspect ratio and having the predicted positions of the primary event and the one or more secondary events along with the determined plurality of aesthetic effects (Lim et al.: paragraph [0057] – the memory 110 stores an instruction that controls the processor 120 to acquire an output image frame based on information on an area of interest acquired by applying an input image frame on a learning network model – here, the learning network model may be a model that is trained to acquire information on an area of interest in an input image frame; Park: paragraph [0023] – the streaming service apparatus 200 divides the execution screen of the content into the main area and the auxiliary area according to the contents of the contents – the main area and the auxiliary area are set in consideration of the intention of the content creator; paragraph [0024] - the main domain corresponds to an essential part of the contents of the contents to grasp the contents of the contents - for example, if the content is a movie or a TV drama, it may be a person who is talking about the present dialogue, or a person or an object that is focused - in the case where the content is a streaming game, the user's character, the object selected by the user, and the target for proceeding the game can be the main areas; paragraph [0025] - the subarea corresponds to a portion of content that is less important than the main region of the content - for example, if the content is a movie or a TV drama, it may be an auxiliary person, simple background not related to the drama flow; paragraph [0028] – the streaming service apparatus 200 may not only divide the execution screen of the content into the main area and the auxiliary area but also divide the main figure in stages and set the encoding quality in stages according to the steps of the main figure; paragraph [0033] – when the content is executed, the streaming device 200 divides the generated execution screen by executing the content (S204) – at this time, the streaming service apparatus 200 can divide the execution screen based on the contents of the contents. The location, size, and shape of the image are also set based on the contents of the content – the execution screen is divided into a main area and a sub area according to the contents of the contents – the main area and the auxiliary area of the content can be arbitrarily set by the content creator or the administrator of the streaming service device 200 - in the case where the content is a movie or a TV drama, it is also possible to recognize a face of the actor or a close-up object by using a screen recognition technology and set the corresponding part as a main area – when the content is a game, the main area and the auxiliary area can be divided by referring to the game data; plurality of aesthetic effects – aspect ratio and quality/resolution).
Regarding claim 20, Lim et al. discloses a non-transitory computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to: identify a primary region, within each of video frames of a video, based on an analysis of the video frames (paragraph [0087] – when information on an area of interest is acquired from a learning network model, the processor 120 may retarget an input image frame by applying the first conversion weight to pixels corresponding to the area of interest, and applying the second conversion weight to pixels corresponding to the remaining area (or an area of non-interest) – the retargeting information may include the aspect ratio of the input image frame and the aspect ratio of the output image frame); determine a first aspect ratio in which the video is displayed on at least one of an electronic device or one or more applications (paragraph [0087] – when information on an area of interest is acquired from a learning network model, the processor 120 may retarget an input image frame by applying the first conversion weight to pixels corresponding to the area of interest, and applying the second conversion weight to pixels corresponding to the remaining area (or an area of non-interest) – the retargeting information may include the aspect ratio of the input image frame and the aspect ratio of the output image frame); predict, using an Artificial Intelligence (AI) model, positions of the primary region, based on the determined first aspect ratio (paragraph [0087] – when information on an area of interest is acquired from a learning network model, the processor 120 may retarget an input image frame by applying the first conversion weight to pixels corresponding to the area of interest, and applying the second conversion weight to pixels corresponding to the remaining area (or an area of non-interest) – the retargeting information may include the aspect ratio of the input image frame and the aspect ratio of the output image frame); obtain frames matching the determined first aspect ratio and having the predicted positions of the primary region (paragraph [0087] – when information on an area of interest is acquired from a learning network model, the processor 120 may retarget an input image frame by applying the first conversion weight to pixels corresponding to the area of interest, and applying the second conversion weight to pixels corresponding to the remaining area (or an area of non-interest) – the retargeting information may include the aspect ratio of the input image frame and the aspect ratio of the output image frame); and display the video having the obtained frames and the determined first aspect ratio (paragraph [0057] – the memory 110 stores an instruction that controls the processor 120 to acquire an output image frame based on information on an area of interest acquired by applying an input image frame on a learning network model – here, the learning network model may be a model that is trained to acquire information on an area of interest in an input image frame; paragraph [0087] – when information on an area of interest is acquired from a learning network model, the processor 120 may retarget an input image frame by applying the first conversion weight to pixels corresponding to the area of interest, and applying the second conversion weight to pixels corresponding to the remaining area (or an area of non-interest) – the retargeting information may include the aspect ratio of the input image frame and the aspect ratio of the output image frame). However, Lim et al. fails to disclose identifying a primary event in a primary region and one or more secondary events in one or more secondary regions, within each of video frames of a video, based on an analysis of the video frames, wherein the one or more secondary events are contextually related to the primary event; and obtaining a semantic relationship between identified primary event and the one or more secondary events.
Referring to the Park reference, Park discloses an electronic device for displaying a video, the electronic device comprising one or more processors configured to: identifying a primary event in a primary region and one or more secondary events in one or more secondary regions, within each of video frames of a video, based on an analysis of the video frames, wherein the one or more secondary events are contextually related to the primary event (paragraph [0023] – the streaming service apparatus 200 divides the execution screen of the content into the main area and the auxiliary area according to the contents of the contents – the main area and the auxiliary area are set in consideration of the intention of the content creator; paragraph [0024] - the main domain corresponds to an essential part of the contents of the contents to grasp the contents of the contents - for example, if the content is a movie or a TV drama, it may be a person who is talking about the present dialogue, or a person or an object that is focused - in the case where the content is a streaming game, the user's character, the object selected by the user, and the target for proceeding the game can be the main areas; paragraph [0025] - the subarea corresponds to a portion of content that is less important than the main region of the content - for example, if the content is a movie or a TV drama, it may be an auxiliary person, simple background not related to the drama flow; paragraph [0033] – when the content is executed, the streaming device 200 divides the generated execution screen by executing the content (S204) – at this time, the streaming service apparatus 200 can divide the execution screen based on the contents of the contents. The location, size, and shape of the image are also set based on the contents of the content – the execution screen is divided into a main area and a sub area according to the contents of the contents – the main area and the auxiliary area of the content can be arbitrarily set by the content creator or the administrator of the streaming service device 200 - in the case where the content is a movie or a TV drama, it is also possible to recognize a face of the actor or a close-up object by using a screen recognition technology and set the corresponding part as a main area – when the content is a game, the main area and the auxiliary area can be divided by referring to the game data); and obtaining a semantic relationship between identified primary event and the one or more secondary events (paragraph [0023] – the streaming service apparatus 200 divides the execution screen of the content into the main area and the auxiliary area according to the contents of the contents – the main area and the auxiliary area are set in consideration of the intention of the content creator; paragraph [0024] - the main domain corresponds to an essential part of the contents of the contents to grasp the contents of the contents - for example, if the content is a movie or a TV drama, it may be a person who is talking about the present dialogue, or a person or an object that is focused - in the case where the content is a streaming game, the user's character, the object selected by the user, and the target for proceeding the game can be the main areas; paragraph [0025] - the subarea corresponds to a portion of content that is less important than the main region of the content - for example, if the content is a movie or a TV drama, it may be an auxiliary person, simple background not related to the drama flow; paragraph [0033] – when the content is executed, the streaming device 200 divides the generated execution screen by executing the content (S204) – at this time, the streaming service apparatus 200 can divide the execution screen based on the contents of the contents. The location, size, and shape of the image are also set based on the contents of the content – the execution screen is divided into a main area and a sub area according to the contents of the contents – the main area and the auxiliary area of the content can be arbitrarily set by the content creator or the administrator of the streaming service device 200 - in the case where the content is a movie or a TV drama, it is also possible to recognize a face of the actor or a close-up object by using a screen recognition technology and set the corresponding part as a main area – when the content is a game, the main area and the auxiliary area can be divided by referring to the game data).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have identified a primary event in a primary region and one or more secondary events in one or more secondary regions, within each of video frames of a video, based on an analysis of the video frames, wherein the one or more secondary events are contextually related to the primary event; and obtained a semantic relationship between identified primary event and the one or more secondary events as disclosed by Park in the device disclosed by Lim et al. in order to ensure that the regions of interest are further enhanced.
Claims 11 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Lim et al. in view of Park as applied to claims 1 and 12 above, and further in view of Yoon et al. (U.S. Patent Application Publication 2020/0027226).
Regarding claim 11, Lim et al. in view of Park discloses all of the limitations as previously discussed with respect to claims 1 and 10, but fails to disclose that the plurality of aesthetic effects comprises at least one of a depth effect, a pose change effect, a luminance effect, a lightning effect, and an audio effect with respect to the primary region.
Referring to the Yoon et al. reference, Yoon et al. discloses a method for displaying a video, the method comprising: wherein the plurality of aesthetic effects comprises at least one of a depth effect, a pose change effect, a luminance effect, a lightning effect, and an audio effect with respect to the primary region (paragraph [0077] – the applying module 230 may focus an object within a predetermined area from where the touch is sensed while applying an image effect (e.g., blurring) to at least one object other than the object in the previewed image – the applying module 230 may apply the image effect (e.g., blurring) to the object using the respective depth information corresponding to the at least one object – the image effect may include adjusting one of blur, color, brightness, mosaic, and resolution; claims 1-3 and 6).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have had the plurality of aesthetic effects comprise at least one of a depth effect, a pose change effect, a luminance effect, a lightning effect, and an audio effect with respect to the primary region as disclosed by Yoon et al. in the method disclosed by Lim et al. in view of Park in order to improve the overall look of the image.
Regarding claim 19, Lim et al. in view of Park discloses all of the limitations as previously discussed with respect to claims 12 and 18, but fails to disclose that the plurality of aesthetic effects comprises at least one of a depth effect, a pose change effect, a luminance effect, a lightning effect, and an audio effect with respect to the primary region.
Referring to the Yoon et al. reference, Yoon et al. discloses an electronic device for displaying a video, the electronic device comprising one or more processors configured to: wherein the plurality of aesthetic effects comprises at least one of a depth effect, a pose change effect, a luminance effect, a lightning effect, and an audio effect with respect to the primary region (paragraph [0077] – the applying module 230 may focus an object within a predetermined area from where the touch is sensed while applying an image effect (e.g., blurring) to at least one object other than the object in the previewed image – the applying module 230 may apply the image effect (e.g., blurring) to the object using the respective depth information corresponding to the at least one object – the image effect may include adjusting one of blur, color, brightness, mosaic, and resolution; claims 1-3 and 6).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have had the plurality of aesthetic effects comprise at least one of a depth effect, a pose change effect, a luminance effect, a lightning effect, and an audio effect with respect to the primary region as disclosed by Yoon et al. in the device disclosed by Lim et al. in view of Park in order to improve the overall look of the image.
Allowable Subject Matter
Claims 3-7, 14, and 15 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter: Prior art, either alone or in combination, fails to teach or fairly suggest in combination with all of the other elements claimed:
The aspect ratio based method further comprises: obtaining the video frames and the plurality of multi-modal contextual inputs from at least one of a user input or one or more applications in the electronic device; and performing the analysis of the video frames, wherein the analysis of the video frame comprises: determining a depth map based on a RedGreenBlue-Depth (RGBD) data or a RedGreenBlue (RGB) data in the obtained video frames; identifying key corners for each of the video frames based on the determined RGBD data or the RGB data; estimating a depth-aware optical flow comprising one or more flow points respective of each of the video frames, based on the key corners and the depth map; classifying similar depth-aware optical flows, using curve matching techniques, into one or more categories, wherein the one or more categories respectively correspond to one or more flow clusters; determining a first category among the one or more categories having a highest cardinality, wherein the highest cardinality corresponds to a highest number of optical flows in a cluster among the one or more clusters; obtaining one or more convex hull points, encompassing the one or more flow points in each of the one or more clusters and the first category; and determining one or more bounding boxes enclosing each of the obtained one or more convex hull points, wherein the one or more bounding boxes comprise the primary region and the one or more secondary regions (dependent claim 3, which depends from claims 1 and 2; claims 4-7 depend from claim 3).
wherein the one or more processors are configured to: obtain video frames and the plurality of multi-modal contextual inputs from at least one of a user input or one or more application in a device; and perform the analysis of the video frame, wherein the analysis of the video frame comprises: determine a depth map based on a RedGreenBlue-Depth (RGBD) data or a RedGreenBlue (RGB) data in the obtained video frames; identify key corners for each of the video frames based on the determined RGBD data or the RGB data; estimate a depth-aware optical flow including one or more flow points respective of each of the video frames based on the key corners and the depth map; classify similar depth-aware optical flows, using curve matching techniques, into one or more categories, wherein the one or more categories corresponds to one or more flow clusters; determine a first category among the one or more categories having a highest cardinality, wherein the highest cardinality refers to a highest number of optical flows in a cluster among the one or more clusters; obtain one or more convex hull points, encompassing the one or more flow points in each of the one or more clusters and the first category; and determine one or more bounding boxes enclosing each of the obtained one or more convex hull points, wherein the one or more bounding boxes comprise the primary region and the one or more secondary regions (dependent claim 14, which depends from claims 12 and 13; claim 15 depends from claim 14).
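For illustration only, the final steps recited above (selecting the category with the highest cardinality, obtaining convex hull points for each cluster, and enclosing them in bounding boxes) can be sketched as follows. This is a minimal sketch and not the applicant's implementation: it assumes the depth-aware optical flows have already been estimated and clustered, and the names `regions_from_clusters`, `convex_hull`, and `bounding_box`, the input layout, and the choice of Andrew's monotone chain for the hull are all hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Flow = List[Point]                       # one optical flow: its flow points
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def convex_hull(points: List[Point]) -> List[Point]:
    """Convex hull via Andrew's monotone chain (an assumed choice of algorithm)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        # Positive when o -> a -> b turns counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop the last point of each chain; it repeats the start of the other.
    return lower[:-1] + upper[:-1]


def bounding_box(hull: List[Point]) -> Box:
    """Axis-aligned box enclosing the hull points."""
    xs = [x for x, _ in hull]
    ys = [y for _, y in hull]
    return (min(xs), min(ys), max(xs), max(ys))


def regions_from_clusters(
    clusters: Dict[str, List[Flow]],
) -> Tuple[str, Box, Dict[str, Box]]:
    """Return (primary label, primary box, secondary boxes by label).

    Assumption: the category holding the most flows (highest cardinality)
    is treated as the primary region; all other categories are secondary.
    """
    primary = max(clusters, key=lambda label: len(clusters[label]))
    boxes = {
        label: bounding_box(convex_hull([p for flow in flows for p in flow]))
        for label, flows in clusters.items()
    }
    secondary = {label: box for label, box in boxes.items() if label != primary}
    return primary, boxes[primary], secondary


clusters = {
    "a": [[(0, 0), (1, 1)], [(2, 0), (1, 3)], [(1, 1), (1, 2)]],  # 3 flows
    "b": [[(10, 10), (12, 11)]],                                  # 1 flow
}
primary, primary_box, secondary = regions_from_clusters(clusters)
```

In this toy input, category "a" holds three flows and category "b" holds one, so "a" is taken as the primary region. The depth map, key corner detection, optical flow estimation, and curve matching steps are outside the scope of the sketch.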
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
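The date arithmetic in the preceding paragraph can be sketched as follows. This is a minimal sketch under stated assumptions, not an official calculator: the function names are hypothetical, the mailing date in the example is hypothetical, month addition is assumed to clamp to the last day of a shorter month, and no adjustment is made for periods ending on a weekend or Federal holiday.

```python
import calendar
from datetime import date
from typing import Optional, Tuple


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping to the last day of a shorter month (assumption)."""
    total = d.month - 1 + months
    year = d.year + total // 12
    month = total % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def reply_deadlines(
    mailing: date,
    first_reply: Optional[date] = None,
    advisory_mailed: Optional[date] = None,
) -> Tuple[date, date]:
    """Return (end of shortened statutory period, six-month statutory limit)."""
    ssp_end = add_months(mailing, 3)        # THREE MONTHS from mailing
    statutory_max = add_months(mailing, 6)  # never later than SIX MONTHS
    # If the first reply came within TWO MONTHS and the advisory action was
    # mailed after the three-month period ended, the shortened period
    # expires on the advisory action's mailing date instead.
    if (
        first_reply is not None
        and advisory_mailed is not None
        and first_reply <= add_months(mailing, 2)
        and advisory_mailed > ssp_end
    ):
        ssp_end = advisory_mailed
    return min(ssp_end, statutory_max), statutory_max


# Hypothetical mailing date, used only to exercise the arithmetic.
ssp_end, statutory_max = reply_deadlines(date(2026, 5, 4))
```

Extension fees under 37 CFR 1.136(a) would then be measured from the first returned date, and no reply can be timely after the second.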
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HEATHER R JONES whose telephone number is (571)272-7368. The examiner can normally be reached Mon. - Fri.: 9:00am - 5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, William Vaughn, can be reached at (571)272-3922. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/HEATHER R JONES/Primary Examiner, Art Unit 2481
April 30, 2026