DETAILED ACTION
In response to the communication filed on 26 January 2026, claims 1, 16, 30, 31, and 33 are amended. Claims 3-5 and 29 are canceled. Claims 1-2, 6-28, and 30-34 are pending.
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments, see “Section § 103 Rejection,” filed 26 January 2026, have been fully considered but they are not persuasive.
APPLICANT’S ARGUMENT: Applicant argues that with particular regard to claim 23, upon further review of Gonzalez, while it does mention "the client device includes a capture device ... and thus possesses the capability of navigating throughout its environment and acquiring a series of time-related images of a video" (Office Action, page 14), it does not mention a "proprietary video environment" as mentioned in claim 23. Further, paragraph [0047] of the present application states: "the library of short-form videos is part of a proprietary video environment. The proprietary video environment can be a subscription service, a walled garden portal, or another suitable proprietary video environment." Thus, the limitations of claim 23 are not taught or suggested by Gonzalez or the other cited references.
EXAMINER’S RESPONSE: Examiner has carefully considered the argument but respectfully disagrees. According to MPEP § 2145(VI), “Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims”. Here, the current claim language does not recite the argued details of the proprietary video environment being a subscription service or a walled garden portal. Hence, these argued interpretations cannot be read into the claims. Under the broadest reasonable interpretation in light of the specification, as it would be understood by a person of ordinary skill in the art, “proprietary video environment” may be reasonably interpreted as a capture device and its environment, which is taught by Gonzalez in [0034]. As a result, the argument is not persuasive.
The remaining arguments are directed to the newly added limitations and are addressed in the rejections below.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 16-17 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 16 recites “wherein generating is based on the obtaining”. Independent claim 1 also recites “obtaining… a text-based query”, “obtains a synthesized image”, and “the short-form video is obtained”. Therefore, it is not clear which specific “obtaining” limitation from claim 1 “the obtaining” in claim 16 refers to. Is “the obtaining” in claim 16 referring to “obtaining… a text-based query”, “obtains a synthesized image”, or “the short-form video is obtained”? This makes the claim indefinite. For the purpose of applying prior art, “wherein generating is based on the obtaining” has been interpreted as “the obtaining, from the user, the text-based query”.
Claim 17 is also rejected since it inherits this deficiency from claim 16.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2, 6, 9-13, 16-18, 22-24, 27, 30-32 and 34 are rejected under 35 U.S.C. 103 as being unpatentable over Gonzalez-Banos et al. (US 2020/0335134 A1, hereinafter “Gonzalez”) in view of Surya et al. (US 10,713,821 B1, hereinafter “Surya”) further in view of Barnett et al. (US 2016/0381111 A1, hereinafter “Barnett”).
Regarding claim 1, Gonzalez teaches
A computer-implemented method for searching comprising: (see Gonzalez, [0139] “A system and method for decomposing a video to salient fragments and synthesizing a video composition based on the salient fragments”; [0042] “may search the database of salient fragments upon receiving a query about the video from a user”).
accessing a library of short-form videos, wherein short-form videos include… (see Gonzalez, [0039] “A salient fragment includes multiple time-related frames of the video, where each frame of the salient fragment at a time instant includes a particular region that is slightly different and is connected in a certain continuity”; [0045] “creates a database to store salient fragments… The dynamic retrieval of salient fragments ensures the dynamic generation of a video composition, for example, different sets of salient fragments may be retrieved to generate different video compositions responsive to a single query”; [0036] “The video composition is a summarization of the video… the video composition is shorter than the original video in time length”; [0052] “the data storage 243 may store an original video, non-salient portions of the video, salient fragments of the video”; [0121] “to retrieve, from the database of the plurality of salient fragments, a set of salient fragments based on the query” – salient fragments are interpreted as short-form videos).
obtaining, from a user, a text-based query for a short-form video based on a textual description; (see Gonzalez, [0042] “may search the database of salient fragments upon receiving a query about the video from a user”; [0043] “may receive a query related to a salient fragment of a first person in a video”; [0089] “queries the database of salient fragments to retrieve all fragments related to the queried salient fragment… there are many other types of queries or combinations of queries… queries based on tags, keywords, metadata… the query module 209 communicates with the synthesis module 211 to retrieve a set of salient fragments based on the query for synthesizing a video composition”).
… the text-based query (see Gonzalez, [0089] “queries the database of salient fragments to retrieve all fragments related to the queried salient fragment… there are many other types of queries or combinations of queries… queries based on tags, keywords, metadata… the query module 209 communicates with the synthesis module 211 to retrieve a set of salient fragments based on the query for synthesizing a video composition”).
searching the library of short-form videos for the short-form video that corresponds to an image or a frame… (see Gonzalez, [0042] “may search the database of salient fragments upon receiving a query about the video from a user and generate a video composition as the query result for display to the user”; [0090] “extracts a plurality of salient fragments from a surveillance video of an airport, and the indexer 207 stores and indexes the salient fragments in a database. When a user (e.g., an analyst) selects a portion of an image/frame 602 of the surveillance video, it triggers the query module 209 to generate a query fragment 604…generates a query fragment for querying the database to retrieve related fragments”) wherein the searching is an image-based search,… (see Gonzalez, [0042] “may search the database of salient fragments upon receiving a query about the video from a user and generate a video composition as the query result for display to the user”; [0090] “extracts a plurality of salient fragments from a surveillance video of an airport, and the indexer 207 stores and indexes the salient fragments in a database. When a user (e.g., an analyst) selects a portion of an image/frame 602 of the surveillance video, it triggers the query module 209 to generate a query fragment 604…generates a query fragment for querying the database to retrieve related fragments”) query fragment is input for the image-based search, and wherein the short-form video is obtained from a result of the image-based search; and (see Gonzalez, [0117] “responsive to a query of image 602 of fragment 31… the analysis module 205 determines a set of salient fragments that are related to fragment 31, and transmits the set of salient fragments to the synthesis module 211 to generate a video composition… The resulting video composition generated by the synthesis module 211 based on these salient fragments and non-salient portions”; [0090]-[0091] “a query fragment generated based on a user input. The salience module 203 extracts a plurality of salient fragments from a surveillance video of an airport, and the indexer 207 stores and indexes the salient fragments in a database. When a user (e.g., an analyst) selects a portion of an image/frame 602 of the surveillance video, it triggers the query module 209 to generate a query fragment 604. The query fragment 604 includes the frame 602, and is also referred to as fragment 31 according to its index… generates a query fragment for querying the database to retrieve related fragments”; [0126] “the synthesis module 211 retrieves the fragments that are connected to the query fragment in the graph, and uses the retrieved fragments to generate the video composition”).
presenting, to the user, the short-form video from the library that corresponds to the query (see Gonzalez, [0042] “may search the database of salient fragments upon receiving a query about the video from a user and generate a video composition as the query result for display to the user”).
Gonzalez does not explicitly teach videos include livestreams, livestream replays, influencer videos, and product and service promotion videos; synthesizing an image, to create a synthesized image, wherein the synthesizing is performed by inputting the text-based query into a synthesizing engine, wherein the synthesizing engine obtains a synthesized image from a generative model; searching the synthesized image from the generative model, wherein the searching is an image-based search, wherein the synthesized image from the generative model is input for an image-based search.
However, Surya discloses synthetic image data and teaches
synthesizing an image, to create a synthesized image, wherein the synthesizing is performed by inputting the input text into a synthesizing engine, wherein the synthesizing engine obtains a synthesized image from a generative model; (see Surya, [col 2 lines 24-60] “a user may initially perform a search using the search query "blue pants." Initially, the machine learning models described herein may generate a synthetic image of blue denim jeans… may then generate an image of blue capri pants that are more narrowly tapered relative to what was previously shown in the synthetic image data… to generate photorealistic synthetic image data representing any desired object(s)… the text-to-image synthesis machine learning systems described herein may include a stage-I generative adversarial network (GAN) and a stage-II GAN that may be used to iteratively generate images representative of input text as the input text is modified over time during a session (e.g., during a search session). The text-to-image machine learning systems may leverage recurrent neural networks (RNNs) to model sequences of data to generate image data that represents the subject matter described by text descriptions as the text descriptions are input and modified over time”; Figs. 4-5; [col 11 lines 17-19] “illustrates three synthetic images---408, 410, and 412-that were generated in response to the query 402 using the stage-I and stage-II GANs” – 408 has been interpreted as synthesized image; [col 3 lines 28-31] “program the at least one processor to perform the various techniques described herein. Additionally, memory 103 may store one or more of the machine learning models described herein such as the GANs”).
… wherein the synthesized image train various machine learning models (see Surya, [col 2 lines 40-43] “Various machine learning models described herein may be used to generate photorealistic synthetic image data representing any desired object(s), depending on the training data sets used to train the various machine learning models”) from the generative model (see Surya, [col 2 lines 24-60] “a user may initially perform a search using the search query "blue pants." Initially, the machine learning models described herein may generate a synthetic image of blue denim jeans… may then generate an image of blue capri pants that are more narrowly tapered relative to what was previously shown in the synthetic image data… to generate photorealistic synthetic image data representing any desired object(s)… the text-to-image synthesis machine learning systems described herein may include a stage-I generative adversarial network (GAN) and a stage-II GAN that may be used to iteratively generate images representative of input text as the input text is modified over time during a session (e.g., during a search session). The text-to-image machine learning systems may leverage recurrent neural networks (RNNs) to model sequences of data to generate image data that represents the subject matter described by text descriptions as the text descriptions are input and modified over time”; Figs. 4-5; [col 11 lines 17-19] “illustrates three synthetic images---408, 410, and 412-that were generated in response to the query 402 using the stage-I and stage-II GANs” – 408 has been interpreted as synthesized image; [col 3 lines 28-31] “program the at least one processor to perform the various techniques described herein. Additionally, memory 103 may store one or more of the machine learning models described herein such as the GANs”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the functionality of generative model, synthesizing images to generate synthesized images, GAN, evaluating information and search corresponding to specific criteria as being disclosed and taught by Surya, in the system taught by Gonzalez to yield the predictable results of improving the quality of image data (see Surya, [col 6 lines 13-35] “depicts a stage-II GAN that may be used in conjunction with the stage-I GAN described in FIG. 1 to improve the quality of the image data generated by the stage-I GAN. For example, the stage-I GAN may output image data at a resolution of 64x64 pixels, while the stage-II GAN may output image data at a resolution of 256x256 pixels… may be effective to improve the image quality of the low resolution images I1" generated using the stage-I GAN network”).
The proposed combination of Gonzalez and Surya does not explicitly teach videos include livestreams, livestream replays, influencer videos, and product and service promotion videos.
However, Barnett discloses media streams and teaches
videos include livestreams, (see Barnett, [0050] “provide a media stream using the capturing client device 105 (e.g., provide a stream of live digital video)”; [0031] “a media stream (e.g., a video media stream)”) livestream replays, (see Barnett, [0042] “replay a media segment from a media stream”; [0031] “a media stream (e.g., a video media stream)”) influencer videos, and (see Barnett, [0127] “Based on recognizing an influencer within a media stream, the media presentation system 102 may modify one or more characteristics (or generate one or more characteristics) to indicate that the media stream includes the influencer”; [0031] “a media stream (e.g., a video media stream)”) product and service promotion videos; (see Barnett, [0128] “may include a sponsored media stream… within a media stream presentation… the sponsored media stream may be presented in connection with a brand, such as the "SPRITE front row media stream”; [0193] “the media stream may display a brand (e.g., a logo) indicating that the media stream is sponsored by a specific entity or company”; [0031] “a media stream (e.g., a video media stream)”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the functionality of livestreams, livestream replays, influencer videos, product and service promotion as being disclosed and taught by Barnett, in the system taught by the proposed combination of Gonzalez and Surya to yield the predictable results of improving the video quality of a media stream based on the characteristics of the media on the media stream (see Barnett, [0013] “based on the media characteristics on the media stream, the systems and methods can apply production edits to the media stream to improve the quality of the media stream. For example, the systems and methods can improve the video quality of a media stream (e.g., correct for shakiness of the media stream, or remove long portions of video that do not include any action)”).
Claims 30 and 31 incorporate substantively all the limitations of claim 1 in computer-readable medium form (see Gonzalez, [0145] “computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system”; [0146] “executing program code can include at least one processor”) and system form (see Gonzalez, [0145]-[0146] “a computer-usable or computer readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device… A data processing system suitable for storing and/or executing program code can include at least one processor coupled directly or indirectly to memory elements through a system bus”), and are rejected under the same rationale.
Regarding claim 2, the proposed combination of Gonzalez, Surya and Barnett teaches
wherein the generative model is based on a generative adversarial network (GAN), (see Surya, [col 3 lines 3-10] “The generator of the stage-I GAN may generate a low-resolution image with the basic contour and color of the object… the stage-II generator up-samples the generated image and adds finer details including texture, stylistic details, and/or color gradients producing a more realistic high-resolution image”; [col 2 lines 50-55] “the text-to-image synthesis machine learning systems described herein may include a stage-I generative adversarial network (GAN) and a stage-II GAN that may be used to iteratively generate images representative of input text as the input text is modified over time during a session”). The motivation for the proposed combination is maintained.
Regarding claim 6, the proposed combination of Gonzalez, Surya and Barnett teaches
wherein the synthesizing includes (see Surya, [col 2 lines 24-60] “a user may initially perform a search using the search query "blue pants." Initially, the machine learning models described herein may generate a synthetic image of blue denim jeans… may then generate an image of blue capri pants that are more narrowly tapered relative to what was previously shown in the synthetic image data… to generate photorealistic synthetic image data representing any desired object(s)… the text-to-image synthesis machine learning systems described herein may include a stage-I generative adversarial network (GAN) and a stage-II GAN that may be used to iteratively generate images representative of input text as the input text is modified over time during a session (e.g., during a search session). The text-to-image machine learning systems may leverage recurrent neural networks (RNNs) to model sequences of data to generate image data that represents the subject matter described by text descriptions as the text descriptions are input and modified over time”; Fig. 5) creating a second synthesized image based on the generative model (see Surya, [col 11 lines 17-19]; Fig. 4; “illustrates three synthetic images---408, 410, and 412-that were generated in response to the query 402 using the stage-I and stage-II GANs” – 410 has been interpreted as the second synthesized image; [col 2 lines 50-55] “the text-to-image synthesis machine learning systems described herein may include a stage-I generative adversarial network (GAN) and a stage-II GAN that may be used to iteratively generate images representative of input text as the input text is modified over time during a session”). The motivation for the proposed combination is maintained.
Regarding claim 9, the proposed combination of Gonzalez, Surya and Barnett teaches
further comprising evaluating (see Surya, [col 11 lines 10-14] “a user may have searched a database (e.g., a fashion database) using an initial search string query… may thereafter modify the text string search query in order to provide search results that are more narrowly tailored to the user's interest”; [col lines 38-40] “In text string 404, the user has modified the input search query by including the term "Petite" in a text string modification”) results of searches based on (see Gonzalez, [0042] “may search the database of salient fragments upon receiving a query about the video from a user and generate a video composition as the query result for display to the user”) the synthesized image and the second synthesized image (see Surya, [col 11 lines 17-19] “illustrates three synthetic images---408, 410, and 412-that were generated in response to the query 402 using the stage-I and stage-II GANs” – 408 has been interpreted as the synthesized image and 410 has been interpreted as the second synthesized image). The motivation for the proposed combination is maintained.
Regarding claim 10, the proposed combination of Gonzalez, Surya and Barnett teaches
wherein the presenting is based on (see Gonzalez, [0042] “may search the database of salient fragments upon receiving a query about the video from a user and generate a video composition as the query result for display to the user”) the evaluating (see Surya, [col 11 lines 10-14] “a user may have searched a database (e.g., a fashion database) using an initial search string query… may thereafter modify the text string search query in order to provide search results that are more narrowly tailored to the user's interest”; [col lines 38-40] “In text string 404, the user has modified the input search query by including the term "Petite" in a text string modification”). The motivation for the proposed combination is maintained.
Regarding claim 11, the proposed combination of Gonzalez, Surya and Barnett teaches
wherein the searching includes (see Gonzalez, [0042] “may search the database of salient fragments upon receiving a query about the video from a user and generate a video composition as the query result for display to the user”) correspondence to (see Surya, [col 1 lines 66-67] “refine the search to narrow down the results and/ or to confine the search results to a particular area of interest”) the second synthesized image (see Surya, [col 11 lines 17-19]; Fig. 4; “illustrates three synthetic images---408, 410, and 412-that were generated in response to the query 402 using the stage-I and stage-II GANs” – 410 has been interpreted as the second synthesized image). The motivation for the proposed combination is maintained.
Regarding claim 12, the proposed combination of Gonzalez, Surya and Barnett teaches
wherein the synthesized image and the second synthesized image (see Surya, [col 2 lines 24-60] “a user may initially perform a search using the search query "blue pants." Initially, the machine learning models described herein may generate a synthetic image of blue denim jeans… may then generate an image of blue capri pants that are more narrowly tapered relative to what was previously shown in the synthetic image data… to generate photorealistic synthetic image data representing any desired object(s)… the text-to-image synthesis machine learning systems described herein may include a stage-I generative adversarial network (GAN) and a stage-II GAN that may be used to iteratively generate images representative of input text as the input text is modified over time during a session (e.g., during a search session). The text-to-image machine learning systems may leverage recurrent neural networks (RNNs) to model sequences of data to generate image data that represents the subject matter described by text descriptions as the text descriptions are input and modified over time”; Figs. 4-5; [col 11 lines 17-19] “illustrates three synthetic images---408, 410, and 412-that were generated in response to the query 402 using the stage-I and stage-II GANs” – 408 has been interpreted as synthesized image and 410 has been interpreted as second synthesized image) form a sequence (see Gonzalez, [0031] “captures images from a scene and combines the time-sequenced images to a video”; [0095] “when the synthesis module 211 receives a query of a time interval, the synthesis module 211 communicates with the query module 209 to retrieve all fragments that occur within the time interval, and generate a video composition based on these fragments and start stop times associated with the fragments”) based on the textual description (see Gonzalez, [0089] “the query is a time interval. For example, the query module 209 generates a query for querying all fragments within a time interval based on user input”). The motivation for the proposed combination is maintained.
Regarding claim 13, the proposed combination of Gonzalez, Surya and Barnett teaches
wherein the sequence represents a video sequence (see Gonzalez, [0031] “captures images from a scene and combines the time-sequenced images to a video”; [0095] “when the synthesis module 211 receives a query of a time interval, the synthesis module 211 communicates with the query module 209 to retrieve all fragments that occur within the time interval, and generate a video composition based on these fragments and start stop times associated with the fragments”). The motivation for the proposed combination is maintained.
Regarding claim 16, the proposed combination of Gonzalez, Surya and Barnett teaches
wherein the generative model (see Surya, [col 2 lines 51-55] “may include a stage-I generative adversarial network (GAN) and a stage-II GAN that may be used to iteratively generate images representative of input text as the input text is modified over time during a session (e.g., during a search session)”) creates a synthesized video, wherein generating is based (see Gonzalez, [0029] “for decomposing a video stream into salient fragments and synthesizing a video composition based on the salient fragments”) on the obtaining (see Gonzalez, [0042] “may search the database of salient fragments upon receiving a query about the video from a user”; [0043] “may receive a query related to a salient fragment of a first person in a video”; [0089] “queries the database of salient fragments to retrieve all fragments related to the queried salient fragment… there are many other types of queries or combinations of queries… queries based on tags, keywords, metadata… the query module 209 communicates with the synthesis module 211 to retrieve a set of salient fragments based on the query for synthesizing a video composition”). The motivation for the proposed combination is maintained.
Regarding claim 17, the proposed combination of Gonzalez, Surya and Barnett teaches
wherein the searching is based on correspondence to (see Gonzalez, [0042] “may search the database of salient fragments upon receiving a query about the video from a user and generate a video composition as the query result for display to the user”; [0090] “extracts a plurality of salient fragments from a surveillance video of an airport, and the indexer 207 stores and indexes the salient fragments in a database. When a user (e.g., an analyst) selects a portion of an image/frame 602 of the surveillance video, it triggers the query module 209 to generate a query fragment 604…generates a query fragment for querying the database to retrieve related fragments”) the synthesized video resulting from the synthesizing (see Gonzalez, [0117]-[0118] “responsive to a query of image 602 of fragment 31 shown in FIG. 6, the analysis module 205 determines a set of salient fragments that are related to fragment 31, and transmits the set of salient fragments to the synthesis module 211 to generate a video composition… images 1102, 1104, 1106, 1108, and 1110 depicted in the upper part are taken from the original surveillance video, while images 1102a, 1104a, 1106a, 1108a, and 1110a depicted in the lower part are taken from the video composition synthesized based on the surveillance video”).
Regarding claim 18, the proposed combination of Gonzalez, Surya and Barnett teaches
wherein the library further comprises (see Gonzalez, [0086] “and stores the index along with the segments and corresponding fragments in the database”; [0033] “a database of a video sharing website”) stored images (see Surya, [col 9 lines 22-24] “Different portions of the storage element 302, for example, may be used for program instructions for execution by the processing element 304, storage of images”). The motivation for the proposed combination is maintained.
Regarding claim 22, the proposed combination of Gonzalez, Surya and Barnett teaches
wherein the library further comprises (see Gonzalez, [0033] “a database of a video sharing website”) image frames from videos (see Gonzalez, [0090] “stores and indexes the salient fragments in a database. When a user (e.g., an analyst) selects a portion of an image/frame 602 of the surveillance video, it triggers the query module 209 to generate a query fragment 604. The query fragment 604 includes the frame 602, and is also referred to as fragment 31 according to its index”).
Regarding claim 23, the proposed combination of Gonzalez, Surya and Barnett teaches
wherein the library of short-form videos is part of (see Gonzalez, [0039] “A salient fragment includes multiple time-related frames of the video, where each frame of the salient fragment at a time instant includes a particular region that is slightly different and is connected in a certain continuity”; [0045] “creates a database to store salient fragments… The dynamic retrieval of salient fragments ensures the dynamic generation of a video composition, for example, different sets of salient fragments may be retrieved to generate different video compositions responsive to a single query”; [0036] “The video composition is a summarization of the video… the video composition is shorter than the original video in time length”; [0052] “the data storage 243 may store an original video, non-salient portions of the video, salient fragments of the video”; [0121] “to retrieve, from the database of the plurality of salient fragments, a set of salient fragments based on the query” – salient fragments are interpreted as short-form videos) a proprietary video environment (see Gonzalez, [0034] “the client device 115 includes a capture device… and thus possesses the capability of navigating throughout its environment and acquiring a series of time-related images of a video”).
Regarding claim 24, the proposed combination of Gonzalez, Surya and Barnett teaches
wherein the user comprises a system (see Gonzalez, [0035] “The client device 115 receives and sends data to and from a user accessing the client device 115”).
Regarding claim 27, the proposed combination of Gonzalez, Surya and Barnett teaches
wherein the presenting includes (see Gonzalez, [0042] “may search the database of salient fragments upon receiving a query about the video from a user and generate a video composition as the query result for display to the user”) images from frames (see Gonzalez, [0090] “stores and indexes the salient fragments in a database. When a user (e.g., an analyst) selects a portion of an image/frame 602 of the surveillance video, it triggers the query module 209 to generate a query fragment 604. The query fragment 604 includes the frame 602, and is also referred to as fragment 31 according to its index”) of the presented short-form video from the library of short-form videos that match the query (see Gonzalez, [0042] “may search the database of salient fragments upon receiving a query about the video from a user and generate a video composition as the query result for display to the user”).
Regarding claim 32, the proposed combination of Gonzalez, Surya and Barnett teaches
wherein the synthesized image and the second synthesized image comprise (see Surya, [col 11 lines 17-19]; Fig. 4; “illustrates three synthetic images---408, 410, and 412-that were generated in response to the query 402 using the stage-I and stage-II GANs” – 408 has been interpreted as synthesized image and 410 has been interpreted as the second synthesized image; [col 2 lines 50-55] “the text-to-image synthesis machine learning systems described herein may include a stage-I generative adversarial network (GAN) and a stage-II GAN that may be used to iteratively generate images representative of input text as the input text is modified over time during a session”) a sequence having a temporal relationship (see Gonzalez, [0057] “Each frame of the salient fragment at a time instant includes a particular region that is slightly different and is connected in a certain continuity. For example, a salient fragment may include three time-related frames of an activity of waving hands. The first frame shows that a man is raising a hand to a first position. The second frame shows that the man is waving the hand at the first position. The third frame shows that the man is lowering the hand to a second position. A single salient fragment does not necessarily include a dramatic change of the particular region. That is, a fragment represents a sequence of small and/or steady changes in activity”). The motivation for the proposed combination is maintained.
Regarding claim 34, the proposed combination of Gonzalez, Surya and Barnett teaches
wherein the image frames comprise (see Gonzalez, [0090] “stores and indexes the salient fragments in a database. When a user (e.g., an analyst) selects a portion of an image/frame 602 of the surveillance video, it triggers the query module 209 to generate a query fragment 604. The query fragment 604 includes the frame 602, and is also referred to as fragment 31 according to its index”) intra-coded frames retrieved from the videos at periodic intervals, and (see Gonzalez, [0039] “A salient fragment includes multiple time-related frames of the video, where each frame of the salient fragment at a time instant includes a particular region that is slightly different and is connected in a certain continuity”) wherein the searching the library of short-form videos for the short-form video (see Gonzalez, [0039] “A salient fragment includes multiple time-related frames of the video, where each frame of the salient fragment at a time instant includes a particular region that is slightly different and is connected in a certain continuity”; [0045] “creates a database to store salient fragments… The dynamic retrieval of salient fragments ensures the dynamic generation of a video composition, for example, different sets of salient fragments may be retrieved to generate different video compositions responsive to a single query”; [0036] “The video composition is a summarization of the video… the video composition is shorter than the original video in time length”; [0052] “the data storage 243 may store an original video, non-salient portions of the video, salient fragments of the video”; [0121] “to retrieve, from the database of the plurality of salient fragments, a set of salient fragments based on the query” – salient fragments are interpreted as short-form videos) salient fragment that corresponds to frames of the video (see Gonzalez, [0056] “A salient fragment of the video is a subset of frames of the video”; [0093] “The video composition emphasizes the information of the original video corresponding to the retrieved salient fragments”; [0070] “Each salient fragment sequence includes a series of images/frames of a salient object”) the synthesized image comprises (see Surya, [col 11 lines 17-19] “illustrates three synthetic images---408, 410, and 412-that were generated in response to the query 402 using the stage-I and stage-II GANs” – 408 has been interpreted as the synthesized image and 410 has been interpreted as the second synthesized image) searching the image frames (see Gonzalez, [0090]-[0091] “stores and indexes the salient fragments in a database. When a user (e.g., an analyst) selects a portion of an image/frame 602 of the surveillance video, it triggers the query module 209 to generate a query fragment 604. The query fragment 604 includes the frame 602, and is also referred to as fragment 31 according to its index… generates a query fragment for querying the database to retrieve related fragments”). The motivation for the proposed combination is maintained.
Claims 7-8 and 33 are rejected under 35 U.S.C. 103 as being unpatentable over Gonzalez, Surya and Barnett in view of Yamamoto et al. (US 2016/0132724 A1, hereinafter “Yamamoto”).
Regarding claim 7, the proposed combination of Gonzalez, Surya and Barnett teaches
the synthesized image and the second synthesized image (see Surya, [col 11 lines 17-19] “illustrates three synthetic images---408, 410, and 412-that were generated in response to the query 402 using the stage-I and stage-II GANs” – 408 has been interpreted as the synthesized image and 410 has been interpreted as the second synthesized image).
The proposed combination of Gonzalez, Surya and Barnett does not explicitly teach further comprising selecting, by the user, between the synthesized image and the second synthesized image.
However, Yamamoto discloses creating synthesized images and teaches
further comprising selecting, by the user, between the first and the second synthesized images (see Yamamoto, [0051] “The synthesized image selection unit 41 displays a dialog box 44 on the display 10, and accepts input of synthesized image selection information. FIG. 3 shows an example of the dialog box 44 the synthesized image selection unit 41 presents on the display 10. The synthesized image selection information specifies whether to use the first synthesized image I1 or the second synthesized image I2”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the functionality of selection of synthesized image, as being disclosed and taught by Yamamoto, in the system taught by the proposed combination of Gonzalez, Surya and Barnett to yield the predictable results of effectively generating corrected images (see Yamamoto, [0039] “The control device 7 can create two types of synthesized images, a first synthesized image I1 (see FIG. 8A) and a second synthesized image I2 (see FIG. 9B). The first synthesized image I1 is created by generating a first corrected image that reduces the luminance of the first image G1… generates one of the synthesized images based on synthesized image selection information previously input by the operator”).
Regarding claim 8, the proposed combination of Gonzalez, Surya, Barnett and Yamamoto teaches
wherein the searching is based on (see Gonzalez, [0042] “may search the database of salient fragments upon receiving a query about the video from a user and generate a video composition as the query result for display to the user”; [0090] “extracts a plurality of salient fragments from a surveillance video of an airport, and the indexer 207 stores and indexes the salient fragments in a database. When a user (e.g., an analyst) selects a portion of an image/frame 602 of the surveillance video, it triggers the query module 209 to generate a query fragment 604…generates a query fragment for querying the database to retrieve related fragments”) the selecting (see Yamamoto, [0051] “The synthesized image selection unit 41 displays a dialog box 44 on the display 10, and accepts input of synthesized image selection information. FIG. 3 shows an example of the dialog box 44 the synthesized image selection unit 41 presents on the display 10. The synthesized image selection information specifies whether to use the first synthesized image I1 or the second synthesized image I2”). The motivation for the proposed combination is maintained.
Regarding claim 33, the proposed combination of Gonzalez, Surya and Barnett teaches
wherein the system is configured to: (see Gonzalez, [0146] “A data processing system”).
generate a plurality of synthesized candidate images; and (see Surya, [col 11 lines 15-19] “the user may initially input the search query 402-"women's blue pants." FIG. 4 illustrates three synthetic images---408, 410, and 412-that were generated in response to the query 402 using the stage-I and stage-II GANs” – 408, 410 and 412 are interpreted as candidate images).
… the plurality of synthesized candidate images,… (see Surya, [col 11 lines 15-19] “the user may initially input the search query 402-"women's blue pants." FIG. 4 illustrates three synthetic images---408, 410, and 412-that were generated in response to the query 402 using the stage-I and stage-II GANs”) query image is input for the image-based search (see Gonzalez, [0042] “may search the database of salient fragments upon receiving a query about the video from a user and generate a video composition as the query result for display to the user”; [0117] “responsive to a query of image 602 of fragment 31 shown in FIG. 6, the analysis module 205 determines a set of salient fragments that are related to fragment 31, and transmits the set of salient fragments to the synthesis module 211 to generate a video composition. The set of salient fragments selected by the analysis module 205 from all the salient fragments includes fragments 26, 28, 30, 31, 32, and 33 shown in FIGS. 7 and 8”).
The proposed combination of Gonzalez, Surya and Barnett does not explicitly teach obtain a user selection for an image from the plurality of synthesized candidate images wherein the user selection is input for the image-based search.
However, Yamamoto discloses creating synthesized images and teaches
obtain a user selection for an image from synthesized image... wherein the user selection is input into the control device (see Yamamoto, [0051] “The synthesized image selection unit 41 displays a dialog box 44 on the display 10, and accepts input of synthesized image selection information. FIG. 3 shows an example of the dialog box 44 the synthesized image selection unit 41 presents on the display 10. The synthesized image selection information specifies whether to use the first synthesized image I1 or the second synthesized image I2… A pulldown menu 44a for selecting the first synthesized image I1 or second synthesized image I2 is provided in the dialog box 44… synthesized image selection information specifying the first synthesized image I1 as the synthesized image is input to the control device 7… from the pull down menu 44a, synthesized image selection information specifying the second synthesized image I2 as the synthesized image is input to the control device 7”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the functionality of selection of synthesized image, as being disclosed and taught by Yamamoto, in the system taught by the proposed combination of Gonzalez, Surya and Barnett to yield the predictable results of effectively generating corrected images (see Yamamoto, [0039] “The control device 7 can create two types of synthesized images, a first synthesized image I1 (see FIG. 8A) and a second synthesized image I2 (see FIG. 9B). The first synthesized image I1 is created by generating a first corrected image that reduces the luminance of the first image G1… generates one of the synthesized images based on synthesized image selection information previously input by the operator”).
Claims 14-15, 20-21, 26 and 28 are rejected under 35 U.S.C. 103 as being unpatentable over Gonzalez, Surya and Barnett in view of Hohwald et al. (US 11,176,189 B1, hereinafter “Hohwald”).
Regarding claim 14, the proposed combination of Gonzalez, Surya and Barnett teaches
…the presented short-form video (see Gonzalez, [0042] “may search the database of salient fragments upon receiving a query about the video from a user and generate a video composition as the query result for display to the user”).
The proposed combination of Gonzalez, Surya and Barnett does not explicitly teach further comprising improving search results, wherein the improving is based on a second search iteration with the presented video.
However, Hohwald discloses image search results and teaches
further comprising improving search results, wherein the improving is based on a second search iteration with image (see Hohwald, [col 4 line 46 to col 5 line 8] “refers to a query refinement term that indicates a visual feature and/or object present in a corresponding search result image and is used as a form of relevant feedback input… provides for the user to indicate through the interface what results are desirable and what are not. An arbitrary number of the initial search results can be tagged as being "good" or "bad" results for the image. Once a set of initial results are tagged, the IR system then incorporates that feedback to present a new, improved set of results, potentially iterating in successive manner, and each time presenting a set of search results that is more relevant to what the user was searching for”; [col 3 lines 18-25] “receiving a first set of search results based on the image search query, in which the first set of search results includes first images associated with the first search term from a collection of images. The method includes receiving a user interface control with each of the first images, in which the user interface control provides one or more facets for the image, and the one or more facets prompt a user to provide feedback with respect to the image”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the functionality of search results based on second iterations, image corresponding to the query, accuracy of searching, search results including a number and sorting results as being disclosed and taught by Hohwald, in the system taught by the proposed combination of Gonzalez, Surya and Barnett to yield the predictable results of providing improved search results (see Hohwald, [col 22 lines 54-59] “Once a set of initial results are tagged, the processor 236, using the image search engine 242, then incorporates that feedback to present a new, improved set of results, potentially iterating in successive manner, and each time presenting a set of search results that is more relevant with what the user was searching for”).
Regarding claim 15, the proposed combination of Gonzalez, Surya, Barnett and Hohwald teaches
wherein the second search iteration includes (see Hohwald, [col 4 line 46 to col 5 line 8] “refers to a query refinement term that indicates a visual feature and/or object present in a corresponding search result image and is used as a form of relevant feedback input… provides for the user to indicate through the interface what results are desirable and what are not. An arbitrary number of the initial search results can be tagged as being "good" or "bad" results for the image. Once a set of initial results are tagged, the IR system then incorporates that feedback to present a new, improved set of results, potentially iterating in successive manner, and each time presenting a set of search results that is more relevant to what the user was searching for”; [col 3 lines 18-25] “receiving a first set of search results based on the image search query, in which the first set of search results includes first images associated with the first search term from a collection of images. The method includes receiving a user interface control with each of the first images, in which the user interface control provides one or more facets for the image, and the one or more facets prompt a user to provide feedback with respect to the image”) a second synthesized image based on the generative model (see Surya, [col 11 lines 17-19]; Fig. 4; “illustrates three synthetic images---408, 410, and 412-that were generated in response to the query 402 using the stage-I and stage-II GANs” – 410 has been interpreted as the second synthesized image; [col 2 lines 50-55] “the text-to-image synthesis machine learning systems described herein may include a stage-I generative adversarial network (GAN) and a stage-II GAN that may be used to iteratively generate images representative of input text as the input text is modified over time during a session”). The motivation for the proposed combination is maintained.
Regarding claim 20, the proposed combination of Gonzalez, Surya and Barnett teaches
wherein the searching (see Gonzalez, [0139] “A system and method for decomposing a video to salient fragments and synthesizing a video composition based on the salient fragments”; [0042] “may search the database of salient fragments upon receiving a query about the video from a user”).
The proposed combination of Gonzalez, Surya and Barnett does not explicitly teach wherein the searching identifies one or more images that correspond to the query.
However, Hohwald discloses image search results and teaches
search identifies one or more images that correspond to the query (see Hohwald, [col 5 lines 54-56] “by considering a search engine system using an object classifier for classifying salient objects in images using query refinements of search queries”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the functionality of search results based on second iterations, image corresponding to the query, accuracy of searching, search results including a number and sorting results as being disclosed and taught by Hohwald, in the system taught by the proposed combination of Gonzalez, Surya and Barnett to yield the predictable results of providing improved search results (see Hohwald, [col 22 lines 54-59] “Once a set of initial results are tagged, the processor 236, using the image search engine 242, then incorporates that feedback to present a new, improved set of results, potentially iterating in successive manner, and each time presenting a set of search results that is more relevant with what the user was searching for”).
Regarding claim 21, the proposed combination of Gonzalez, Surya, Barnett and Hohwald teaches
further comprising presenting, to the user, (see Gonzalez, [0042] “may search the database of salient fragments upon receiving a query about the video from a user and generate a video composition as the query result for display to the user”) the one or more images that correspond to the query (see Hohwald, [col 22 lines 4-7] “a user interface 800 for initiating an image search via an application 222 and presenting image search results responsive to a text-based image search query”). The motivation for the proposed combination is maintained.
Regarding claim 26, the proposed combination of Gonzalez, Surya and Barnett teaches
… short-form videos to be presented to the user (see Gonzalez, [0042] “may search the database of salient fragments upon receiving a query about the video from a user and generate a video composition as the query result for display to the user”).
The proposed combination of Gonzalez, Surya and Barnett does not explicitly teach wherein search results include a number of short-form videos to be presented to the user.
However, Hohwald discloses image search results and teaches
wherein search results include a number of search results (see Hohwald, [col 16 lines 42-43] “determines a predetermined number of top search results”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the functionality of search results based on second iterations, image corresponding to the query, accuracy of searching, search results including a number and sorting results as being disclosed and taught by Hohwald, in the system taught by the proposed combination of Gonzalez, Surya and Barnett to yield the predictable results of providing improved search results (see Hohwald, [col 22 lines 54-59] “Once a set of initial results are tagged, the processor 236, using the image search engine 242, then incorporates that feedback to present a new, improved set of results, potentially iterating in successive manner, and each time presenting a set of search results that is more relevant with what the user was searching for”).
Regarding claim 28, the proposed combination of Gonzalez, Surya and Barnett teaches
… the searching (see Gonzalez, [0139] “A system and method for decomposing a video to salient fragments and synthesizing a video composition based on the salient fragments”; [0042] “may search the database of salient fragments upon receiving a query about the video from a user”).
The proposed combination of Gonzalez, Surya and Barnett does not explicitly teach further comprising sorting results of the searching.
However, Hohwald discloses image search results and teaches
further comprising sorting results of search (see Hohwald, [col 22 lines 11-18] “The user interface 800 includes search controls such as sorting and filtering… the user interface 800 includes a control to sort by a ranking such as popularity… includes a control to filter by the image orientation and/or image type. The user interface 800 may include other search controls to refine the listing of images within the scope of the given search query”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the functionality of search results based on second iterations, image corresponding to the query, accuracy of searching, search results including a number and sorting results as being disclosed and taught by Hohwald, in the system taught by the proposed combination of Gonzalez, Surya and Barnett to yield the predictable results of providing improved search results (see Hohwald, [col 22 lines 54-59] “Once a set of initial results are tagged, the processor 236, using the image search engine 242, then incorporates that feedback to present a new, improved set of results, potentially iterating in successive manner, and each time presenting a set of search results that is more relevant with what the user was searching for”).
Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Gonzalez, Surya and Barnett in view of Stavely et al. (US 2004/0095396 A1, hereinafter “Stavely”).
Regarding claim 19, the proposed combination of Gonzalez, Surya and Barnett teaches
wherein the images (see Surya, [col 9 lines 22-24] “Different portions of the storage element 302, for example, may be used for program instructions for execution by the processing element 304, storage of images”) within the library of short-form videos include (see Gonzalez, [0039] “A salient fragment includes multiple time-related frames of the video, where each frame of the salient fragment at a time instant includes a particular region that is slightly different and is connected in a certain continuity”; [0045] “creates a database to store salient fragments… The dynamic retrieval of salient fragments ensures the dynamic generation of a video composition, for example, different sets of salient fragments may be retrieved to generate different video compositions responsive to a single query”; [0036] “The video composition is a summarization of the video… the video composition is shorter than the original video in time length”; [0052] “the data storage 243 may store an original video, non-salient portions of the video, salient fragments of the video”; [0121] “to retrieve, from the database of the plurality of salient fragments, a set of salient fragments based on the query” – salient fragments are interpreted as short-form videos).
The proposed combination of Gonzalez, Surya and Barnett does not explicitly teach that the images include line drawings, icons, or emojis.
However, Stavely discloses video thumbnails and teaches
a video thumbnail that is used as an icon (see Stavely, [0031] “The video thumbnail comprises a relatively short, low resolution, animated video thumbnail that is used as an icon of a digital video file”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the functionality of video icons as disclosed and taught by Stavely, in the system taught by the proposed combination of Gonzalez, Surya and Barnett to yield the predictable results of efficiently reminding a user of the key aspect of the digital video file (see Stavely, [0032] “An exemplary video thumbnail comprises a plurality (sequence or series) of preferred digital video frames of the digital video file. The sequence or series of preferred digital video frames are selected to easily remind a user of the key aspect of the digital video file”).
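For illustration only (the frame-selection criterion below is a hypothetical stand-in for Stavely’s “preferred” frames and is not part of the record), assembling a short animated thumbnail from a digital video file could reduce to picking a small sequence of frame indices:

    # Hypothetical sketch: choose a short sequence of frames to serve as an
    # animated thumbnail (icon) for a digital video file.
    def select_thumbnail_frames(total_frames, count=5):
        # Pick `count` evenly spaced frame indices as a stand-in for "preferred" frames.
        if total_frames <= count:
            return list(range(total_frames))
        step = total_frames // count
        return [i * step for i in range(count)]

    print(select_thumbnail_frames(total_frames=300, count=5))  # [0, 60, 120, 180, 240]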
Claim 25 is rejected under 35 U.S.C. 103 as being unpatentable over Gonzalez, Surya and Barnett in view of Megiddo et al. (US 2003/0126117 A1, hereinafter “Megiddo”).
Regarding claim 25, the proposed combination of Gonzalez, Surya and Barnett teaches
wherein the query includes… (see Gonzalez, [0042] “may search the database of salient fragments upon receiving a query about the video from a user”; [0043] “may receive a query related to a salient fragment of a first person in a video”; [0089] “queries the database of salient fragments to retrieve all fragments related to the queried salient fragment… there are many other types of queries or combinations of queries… queries based on tags, keywords, metadata”).
The proposed combination of Gonzalez, Surya and Barnett does not explicitly teach an accuracy level of the searching, wherein the accuracy level is based on input query tokenization.
However, Megiddo discloses tokens and teaches
an accuracy level of the searching, wherein the accuracy level is based on input query tokenization (see Megiddo, [0052]-[0053] “the step of identifying a token in the search query responsive to the step of receiving, wherein related expressions are assigned to the token. The search engine performs this step by looking for the predetermined identifier that identifies the predetermined keywords or phrases as a token… If the token cannot be found in the index, the search engine may provide the user interface device 102 with feedback related to the accuracy of the token. Such feedback may include an error message, a list of similar tokens, a definition of the token, examples of tokens, etcetera, in an attempt to assist a person operating the user interface device 102”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the functionality of tokens as disclosed and taught by Megiddo, in the system taught by the proposed combination of Gonzalez, Surya and Barnett to yield the predictable results of providing a powerful way to increase the effectiveness of creating search queries to retrieve greater numbers of relevant documents, and to avoid retrieving irrelevant documents (see Megiddo, [0028] “the tokens provide a powerful way to increase the effectiveness of creating search queries to retrieve greater numbers of relevant documents, and to avoid retrieving irrelevant documents”).
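For illustration only (the “#” token identifier and the index contents are hypothetical; not part of the record), Megiddo-style token identification with accuracy feedback for unrecognized tokens could proceed as follows:

    # Hypothetical sketch: identify tokens in a query by a predetermined identifier,
    # look them up in an index, and return accuracy feedback when a token is unknown.
    import difflib

    TOKEN_INDEX = {"#person": ["fragment-1", "fragment-7"], "#vehicle": ["fragment-3"]}

    def search_with_tokens(query):
        hits, feedback = [], []
        for word in query.split():
            if not word.startswith("#"):   # only words with the identifier are tokens
                continue
            if word in TOKEN_INDEX:
                hits.extend(TOKEN_INDEX[word])
            else:                          # token not in index: provide accuracy feedback
                similar = difflib.get_close_matches(word, list(TOKEN_INDEX))
                feedback.append(f"Unknown token {word}; similar tokens: {similar}")
        return hits, feedback

    print(search_with_tokens("#person walking #vehical"))
    # (['fragment-1', 'fragment-7'], ["Unknown token #vehical; similar tokens: ['#vehicle']"])

The misspelled token draws a list of similar known tokens rather than silently returning nothing, analogous to the feedback Megiddo describes in [0052]-[0053].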
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to VAISHALI SHAH whose telephone number is (571)272-8532. The examiner can normally be reached Monday - Friday (7:30 AM to 4:00 PM).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, AJAY BHATIA can be reached at (571)272-3906. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/VAISHALI SHAH/Primary Examiner, Art Unit 2156