DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments filed 02/04/2026 have been fully considered but they are not persuasive.
On page 7, Applicant argues that,
Bliss, Hartley and Jung do not disclose or suggest the features of Claim 1. The rejection acknowledges that Bliss and Hartley do not teach the feature “wherein the association file is generated as a subtitle file of the main video, and the bookmark entry is recorded in a form of text strings in the association file,” but relies on Jung for this feature. In particular, the rejection alleges the following on pages 11-12 of the Office Action:
Jung discloses an association file is generated as a subtitle file of a main video, and a bookmark entry is recorded in a form of text strings in the association file (Fig. 2; page 3 - an association file is generated as a caption file using GPS data received from the GPS receiver 10 recording longitude and latitude coordinates and time as text string).
One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to incorporate the teachings of Jung into the
method taught by Bliss and Hartley to conveniently provide detailed location information regarding the landmarks while the main video is played back to enhance viewer's viewing experience.
However, when properly considered in view of their respective disclosures, technical fields, and operative functions, these references do not teach or suggest the claimed invention, either individually or in combination.
In response, Examiner respectfully disagrees and provides the reasons below by responding to each of the arguments that Applicant further made.
On pages 7-8, Applicant further argues that,
“1. Bliss Does Not Teach or Suggest Metadata-Based Location Matching as Claimed
The rejection alleges that Bliss discloses the feature of "scanning through the plurality of multimedia files to find one file from among the plurality of multimedia files that has metadata corresponding to one of the locations in the geolocation file," citing paragraphs [0066] and [0186]-[0187] of Bliss (see page 7 of the Office Action). Respectfully, these passages do not disclose the claimed feature.
Bliss is directed to a video "snipping" and social interaction system in which "objects" (e.g., comments, tags, people, or locations) are associated with "marked moments" in a video timeline (see Abstract and paragraphs [0008], [0012], [0065], and [0066] of Bliss). As described in Bliss, marked moments are represented using Video Snip Field (VSF) structures that include a timestamp and a representation of an associated object (see Fig. 3A and paragraph [0065] of Bliss). However, Bliss does not disclose scanning a plurality of stored multimedia files based on metadata that encodes locations, nor does it disclose that such metadata corresponds to locations recorded in a geolocation file generated for a main video.
More specifically, Bliss does not describe metadata of other multimedia files that records capture locations, nor does it describe generating or enriching metadata of multimedia files with location information. The cited portions of Bliss describe associating marked moments of a video with objects in a conceptual or social context, rather than processing multimedia files to extract landmarks, determining locations from those landmarks, and recording those locations into metadata for subsequent scanning and matching. Accordingly, the rejection's characterization of "markings" or "objects" in Bliss as "metadata corresponding to locations" is not supported by the disclosure of Bliss.
By contrast, Claim 1 expressly recites "by the processor, for a subset of the plurality of multimedia files that do not have any geolocation information, performing a landmark detection process on the subset of the plurality of multimedia files to determine locations of landmarks present in the subset of the plurality of multimedia files," and further recites "by the processor, recording the locations thus determined in metadata of the subset of the plurality of multimedia files." Bliss contains no teaching or suggestion of performing landmark detection on multimedia files, no disclosure of determining geographic locations from visual content, and no disclosure of recording such determined locations into metadata of stored multimedia files. Accordingly, Bliss does not teach or suggest the features of Claim 1.”
In response, Examiner respectfully disagrees and submits that, in at least [0179]-[0181], Bliss discloses the following:
[0179] In another embodiment, the location information is discoverable. For instance, the location information is associated with the device capturing the source digital video, in one embodiment. That is, the location information comprises geographic information associated with the device at the instance in time that the source digital video is being taken. More specifically, geographic information determined by the capturing device, and designating the geographic position of the capturing device when capturing the source digital video, may be read and imprinted as meta data to the source digital video. As such, geographic information is included and associated with the source digital video.
[0180] In addition, the location information that is discoverable may pertain to an object captured within the source digital video. Using the previous example of a scenic movie, images and/or objects within images may be recognizable and associated with geographic information. For instance, the video snipping server may recognize certain objects captured within the source digital video and deliver location information suggestions through the marking interface returned back to the creator user.
[0181] In one embodiment, individual frames or images, or a small set of frames or images, within the source digital video is associated with location information. For instance, the individual or set of frames may be associated with meta data indicating the geographic position of the capturing device when capturing the image and/or frame, in one instance. In another instance, the geographic position is associated with an object captured within one of the images of the source digital video. As previously described, the location information pertaining to the individual or set of frames may be user defined or discoverable.
(emphases added)
Examiner respectfully submits that, according to the teachings of Bliss above, there are at least two types of video files: 1) video files associated with metadata storing location information of the capturing device at the time the device recorded the video data, and 2) video files that do not have this metadata. Thus, the set of video files of type 2) clearly is a subset of the plurality of multimedia files that do not have any geolocation information.
For this subset of video files, Bliss clearly teaches that the processor performs recognition of objects within the video frames and determines their location information.
For all video files, the location information is recorded as metadata at least in a field of the VSF as shown in Figs. 3A-3B.
The system then uses this information in a VSF of a main video file to identify those video files that include an object located at the same geolocation, as described at least in [0186] as follows:
[0186] As such, although the source digital video 1300 includes separately taken videos, each pertaining to different vacation locations, the markings within the source digital video help give the video relevance. That information included in the markings is searchable and can be grouped together with other videos having similar object associations. For instance a viewer searching for movies with images taken at Bullfrog Marina in Lake Powell will discover the marked video snip including the third marked moment 1360, regardless of the superfluous inclusion of the beach images, and Las Vegas images.
(emphasis added)
As such, Applicant’s arguments are not persuasive.
On pages 8-9, Applicant argues that Hartley and Jung do not disclose the same features and thus do not cure the deficiencies of Bliss.
In response, Examiner respectfully submits that the arguments are moot in view of the discussion of Bliss above.
On pages 9-10, Applicant argues that,
“4. No Motivation to Combine and No Reasonable Expectation of Success
The rejection asserts that it would have been obvious to incorporate Jung's subtitle file format into the systems of Bliss and Hartley (pages 9-10 of the Office Action). However, even assuming arguendo that such a combination were made, the resulting system would still fail to meet the features of Claim 1.
Bliss addresses social annotation of video moments, Hartley addresses synchronization of multi-channel video streams, and Jung addresses GPS-based advertisement monitoring. These references solve different technical problems in different technical fields. None of the references recognizes the problem addressed by Applicants-namely, that personal images and videos often lack geolocation metadata, and that landmark detection can be used to infer and record such location information for subsequent content matching and aggregation (see paragraphs [0024] and [0037] of the present specification).
Moreover, Claim 1 recites a specific technical solution, namely, performing landmark detection on multimedia files lacking geolocation information, recording detected landmark locations into metadata, and then scanning multimedia files based on that metadata to find matched files. There is no teaching, suggestion, or motivation in Bliss, Hartley, or Jung to perform such landmark detection or metadata enrichment, nor would a person of ordinary skill in the art have had a reasonable expectation of success in arriving at the claimed invention by combining these disparate disclosures.
5. Additional Technical Effects and Advantages
As described, e.g., at paragraph [0037] of the specification, by performing landmark detection and recording detected locations into metadata, the claimed method enables, e.g., advanced content aggregation based on landmarks. Multimedia files can be grouped and recommended based on shared or related landmarks (e.g., different landmarks within the same city), allowing users to play through video clips and image slideshows associated with a common geographic context. These technical effects are neither taught nor suggested by the cited references and further support the non-obviousness of the claimed invention.
For at least the reasons set forth above, the cited references-either alone or in combination-do not teach or suggest the features of Claim 1. Accordingly, Claim 1 and its dependent claims are patentably distinguishable over the cited references. Withdrawal of the rejection is requested.”
In response, Examiner respectfully disagrees and submits that Bliss teaches, at least in [0180], that for video files lacking geolocation metadata, the system performs recognition of objects in the frames and determines location information. Such location information is recorded as metadata in a field of a VSF structure, which can be used for subsequent content matching and aggregation (searching for other video files that include the same object for display).
Hartley and Jung are relied upon only to disclose details such as using an association file as a subtitle file of a main video file to associate matched files, and how to switch playback between the matched video files and a main video file in the association. These features are relevant to managing and reproducing the files taught by Bliss effectively, in a manner that enhances the viewer’s experience, as described in the Office Action.
As such, Applicant’s arguments are not persuasive.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-7, 11, and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Bliss et al. (US 2011/0158605 A1 – hereinafter Bliss), Hartley et al. (US 2013/0188923 A1 – hereinafter Hartley), and Jung et al. (KR 10-2006-0033296 – hereinafter Jung; references to machine-translated copy attached).
Regarding claim 1, Bliss discloses a method of displaying multimedia content related to a location appearing in a video, the method to be implemented by an electronic system that includes a display unit, a processor, and a memory unit storing a main video to be played on the display unit and a plurality of multimedia files, the method comprising: by the processor, generating, based on the main video, a geolocation file that records timestamps and locations related to the main video and respectively corresponding to the timestamps (Figs. 3A-3B; Fig. 12; [0065]-[0066]; [0070] – a file storing VSF’s, each comprising a timestamp and a corresponding representation of an object, which is a geo-location as described at least in [0066], [0070], and [0170]-[0186]); by the processor, for a subset of the plurality of multimedia files that do not have any geolocation information, performing a landmark detection process on the subset of the plurality of multimedia files to determine locations of landmarks present in the subset of the plurality of multimedia files ([0180] – for those video files that do not have any geolocation information, i.e. those files with which the location information is not associated with the device capturing the source digital video as described at least in [0179] or [0181], performing a landmark detection process by the processor to recognize and associate an object within the frames of the video with geographic information); by the processor, recording the locations thus determined in metadata of the subset of the plurality of multimedia files ([0180] – recording the determined location information as metadata of the corresponding video files at least in a field in a VSF as shown in Figs.
3A-3B); by the processor, scanning through the plurality of multimedia files to find one file from among the plurality of multimedia files that has metadata corresponding to one of the locations in the geolocation file, and making the file serve as a matched file ([0066]; [0186]-[0187] – searching other videos or movies having metadata, which is information in markings of the videos or movies, corresponding to the locations in the geolocation file of the digital video comprising the objects as described in [0066]); by the processor, associating the matched file with the main video ([0066]; [0186] – connecting the marked moment of the main video to the matched file, which is one of the other videos); by the processor, playing the main video on the display unit ([0068]; [0070]; [0093]-[0094] – playing the marked video on the display unit).
However, Bliss does not disclose associating, by the processor, the matched file with the main video by generating an association file for the main video, the association file recording a bookmark entry that indicates a file path to the matched file and one of the timestamps which corresponds to said one of the locations in the geolocation file; displaying, by the processor, a thumbnail of the matched file on the display unit when the main video is at a time of the one of the timestamps, and displaying the matched file on the display unit when the thumbnail is selected, wherein in associating the matched file with the main video, the association file is a companion file of the main video, is stored in a same directory as the main video, and has a same file name as the main video but a different file extension from the main video, wherein the association file is generated as a subtitle file of the main video, and the bookmark entry is recorded in a form of text strings in the association file.
Hartley discloses associating, by a processor, a matched file with a main video by generating an association file for the main video, the association file recording a bookmark entry that indicates a file path to the matched file and one of the timestamps which corresponds to one of the locations (Fig. 3B; [0032]-[0033] – an association file recording references to the files and timestamps synchronized to location information); displaying, by the processor, a thumbnail of the matched file on the display unit when the main video is at a time of the one of the timestamps, and displaying the matched file on the display unit when the thumbnail is selected ([0034]; Fig. 1 – when the main video, i.e. the active video, is at a current time, thumbnails of related files are displayed and, when a thumbnail is selected, a corresponding video segment of a related file is displayed), wherein in associating the matched file with the main video, the association file is a companion file of the main video (Figs. 3A-3B).
One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to incorporate the teachings of Hartley into the method taught by Bliss to enhance the playback interface of the video content so that portions of matched files could have been easily recognized and selected for playback. Further, in view of such a combination of Bliss and Hartley, one skilled in the art would have recognized that the locations in the association file taught by Hartley would have corresponded to the locations in the geolocation file taught by Bliss.
However, Bliss and Hartley do not explicitly disclose the association file is stored in a same directory as the main video, and has a same file name as the main video but a different file extension from the main video.
Official Notice is taken that storing a companion file in a same directory as a main file, and naming the companion file with a same file name as the main file but a different file extension from the main file is conventional and well known in the art.
One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to incorporate storing and naming a companion file as described above into the method of Bliss and Hartley in order to easily recognize and locate the files.
However, Bliss and Hartley in view of the taken Official Notice do not disclose the association file is generated as a subtitle file of the main video, and the bookmark entry is recorded in a form of text strings in the association file.
Jung discloses an association file is generated as a subtitle file of a main video, and a bookmark entry is recorded in a form of text strings in the association file (Fig. 2; page 3 – an association file is generated as a caption file using GPS data received from the GPS receiver 10 recording longitude and latitude coordinates and time as text string).
One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to incorporate the teachings of Jung into the method taught by Bliss and Hartley to conveniently provide detailed location information regarding the landmarks while the main video is played back, to enhance the viewer’s viewing experience.
Regarding claim 2, Bliss also discloses that in scanning through the plurality of multimedia files to find the one file, the metadata of the matched file records a location that matches said one of the locations recorded in the geolocation file ([0066]; [0186]-[0187] – searching other videos or movies having metadata, which is information in markings of the videos or movies, having the same locations as those in the geolocation file of the digital video comprising the objects as described in [0066]).
Regarding claim 3, Bliss also discloses wherein the location recorded in the metadata of the matched file indicates a location where the matched file was generated ([0066] – a location where the video was taken and thus generated).
Regarding claim 4, Bliss also discloses the location recorded in the metadata of the matched file indicates a location of a landmark that is detected in the matched file ([0185]-[0186] – landmarks such as a place name for a casino in Las Vegas, or Bullfrog Marina in Lake Powell, etc.).
Regarding claim 5, Bliss also discloses the location recorded in the metadata of the matched file is within a specific distance from the one of the locations recorded in the geolocation file ([0066]; [0186]-[0187]).
Regarding claim 6, Bliss also discloses the main video including different video parts, metadata of the main video storing plural pieces of geolocation information each corresponding to a respective one of the different video parts, and each indicating a timestamp of the video part in the main video and a location where the video part was recorded ([0070]); wherein in generating a geolocation file that records timestamps and locations, the geolocation file records the timestamps and the locations indicated by the plural pieces of geolocation information ([0070] – timestamps and corresponding representations of objects, which are geo-locations as described at least in [0066], [0070], and [0170]-[0186]).
Regarding claim 7, Bliss also discloses prior to generating a geolocation file, further comprising: by the processor, determining landmarks appearing in the main video by performing the landmark detection process on the main video in order to determine the locations related to the main video ([0170]-[0186]).
Regarding claim 11, Bliss also discloses in associating the matched file with the main video, the file path indicated by the bookmark entry recorded in the association file represents a Uniform Resource Identifier (URI) of the matched file ([0070]).
Regarding claim 14, Bliss also discloses based on the association file, displaying a bookmark indicator on a video progress bar of the main video, the bookmark indicator corresponding to the bookmark entry recorded in the association file and being located at a position of the video progress bar that corresponds to the timestamp indicated by the bookmark entry (Fig. 8; Fig. 11B).
Regarding claim 15, see the teachings of Bliss and Hartley as discussed in claim 1 above, in which Hartley also discloses, prior to displaying a thumbnail of the matched file, further comprising: by the processor, accessing the matched file based on the file path indicated by the bookmark entry recorded in the association file, and generating the thumbnail of the matched file (Fig. 3B; [0032]-[0034] – prior to displaying a thumbnail of a matched file, accessing the matched file based on the references to the files and timestamps synchronized to location information in the association file).
The motivation for incorporating the teachings of Hartley into the method of Bliss has been discussed in claim 1 above.
Claims 8-10 are rejected under 35 U.S.C. 103 as being unpatentable over Bliss, Hartley, and Jung as applied to claims 1-7, 11, and 14-15 above, and further in view of Oguz et al. (US 2003/0142750 A1 – hereinafter Oguz).
Regarding claim 8, see the teachings of Bliss, Hartley, and Jung as discussed in claim 7 above. Bliss also discloses the main video including a sequence of video frames that respectively correspond to different timestamps ([0076]), wherein performing the landmark detection process includes: extracting frames from the main video and making at least a part of the frames thus extracted serve as key frames for the main video ([0010]; [0180] – at least a first frame that corresponds to the moment, i.e. when the moment starts); and with respect to each of the key frames, detecting a landmark in the key frame to obtain a detection result that indicates a location of the landmark represented in a set of longitude and latitude coordinates (Figs. 3A-3B; Fig. 12; [0063]-[0066]; [0070] – detecting a representation of an object, which is a location of the landmark represented as a geo-location as described at least in [0066], [0070], and [0170]-[0186], in the first frame, i.e. at the marked time).
However, Bliss, Hartley, and Jung do not disclose that the sequence of video frames is composed of I-frames, P-frames and B-frames, that the extracted frames are the I-frames from the main video, or that at least a part of the I-frames thus extracted serve as key frames for the main video.
Oguz discloses a sequence of video frames composed of I-frames, P-frames and B-frames ([0005] – an MPEG video stream comprises I-frames, P-frames and B-frames); extracted frames being the I-frames from the main video (Fig. 1; [0013]-[0014] – I-frames are extracted to detect a scene change); and making at least a part of the I-frames thus extracted serve as key frames for the main video ([0045] – a current I-frame is used as a key frame serving as a reference image for scene changes).
One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to incorporate the teachings of Oguz into the method taught by Bliss, Hartley, and Jung to facilitate the detection of the landmark, since decoding those frames is faster because each I-frame contains a complete description of a single video frame (image or picture) independent of any other frame.
Regarding claim 9, see the teachings of Bliss, Hartley, Jung, and Oguz as discussed in claim 8 above, in which Oguz also discloses that in making at least a part of the I-frames thus extracted serve as key frames, the processor selects scene-changing frames from among the I-frames using a scene filter, and makes the scene-changing frames thus selected serve as the key frames ([0013]-[0014]; [0045]).
The motivation for incorporating the teachings of Oguz into the method of Bliss, Hartley, and Jung has been discussed in claim 8 above.
Regarding claim 10, Bliss in view of Oguz also discloses that in generating a geolocation file that records timestamps and locations, the timestamps of the key frames and the locations of the landmarks thus detected in the key frames are recorded in the geolocation file (Figs. 3A-3B; Fig. 12; [0063]-[0066]; [0070] – detecting a representation of an object, which is a location of the landmark represented as a geo-location as described at least in [0066], [0070], and [0170]-[0186], in the first frame, i.e. at the marked time – in view of Oguz disclosing that the first frame of a scene change, thus denoting the moment being detected for the first time, is a key frame). The motivation for incorporating the teachings of Oguz into the method of Bliss, Hartley, and Jung has been discussed in claim 8 above.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HUNG Q DANG whose telephone number is (571)270-1116. The examiner can normally be reached IFT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Thai Q Tran can be reached on 571-272-7382. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/HUNG Q DANG/Primary Examiner, Art Unit 2484