Prosecution Insights
Last updated: April 19, 2026
Application No. 18/766,022

USER BEHAVIOR RECOGNITION METHOD AND APPARATUS BASED ON SCREEN RECORDING DATA, AND READABLE STORAGE MEDIUM

Status: Non-Final Office Action (§103)
Filed: Jul 08, 2024
Examiner: TRAN, LOI H
Art Unit: 2484
Tech Center: 2400 — Computer Networks
Assignee: Ghawar Digital Co. Ltd.
OA Round: 1 (Non-Final)
Grant Probability: 64% (Moderate)
Expected OA Rounds: 1-2
Time to Grant: 2y 10m
Grant Probability With Interview: 88%

Examiner Intelligence

Career Allow Rate: 64% (394 granted / 611 resolved; +6.5% vs Tech Center average)
Interview Lift: +23.6% higher allowance across resolved cases with an interview (strong)
Typical Timeline: 2y 10m average prosecution; 25 applications currently pending
Career History: 636 total applications across all art units
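The headline numbers above are simple ratios of the career counts. A quick, hypothetical recomputation (variable names and the derivation of the 88% figure are illustrative assumptions, not the dashboard's documented formula):

```python
# Illustrative arithmetic behind the dashboard cards above; names are
# hypothetical and the derivation is an assumption, not a published formula.
granted, resolved = 394, 611
allow_rate = granted / resolved               # 0.6448... -> shown as "64%"
interview_lift = 0.236                        # the "+23.6%" interview lift card
with_interview = allow_rate + interview_lift  # 0.8808... -> shown as "88%"

print(f"career allow rate: {allow_rate:.1%}")      # 64.5%
print(f"with interview:    {with_interview:.1%}")  # 88.1%
```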

Statute-Specific Performance

§101: 6.3% (-33.7% vs TC avg)
§103: 54.9% (+14.9% vs TC avg)
§102: 14.8% (-25.2% vs TC avg)
§112: 12.5% (-27.5% vs TC avg)
Comparisons are against the Tech Center average estimate; based on career data from 611 resolved cases.

Office Action

§103
DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 4, 6, 16, 17, and 20 are rejected under AIA 35 U.S.C. 103 as being unpatentable over Lu et al. (English translation of Chinese Publication CN110851148, 02-2020) in view of Verkasalo et al. (US Publication 2015/0220814).

Regarding claim 1, Lu discloses a user behavior recognition method based on screen recording data (Lu, para. 0002, an analysis system and a method for analyzing user behavior data based on intelligent image recognition), comprising: extracting key frames of image from a plurality of frames of image in screen recording data to acquire a plurality of key frames of image (Lu, para. 0036, acquiring video stream data frame by frame into keyframes; selecting the key frames that need to be recognized, and sending them to the AI image recognition module for image recognition; paras. 0031 and 0041, capturing, storing, and uploading valuable screen data generated by users during use); performing picture classification on the plurality of key frames of image based on the feature information of the plurality of key frames of image to acquire classification information of the plurality of key frames of image (Lu, paras. 0077-0079, extracting the unique physical features of the acquired video stream data, performing differential analysis and comparison with the detection model set in the storage server, selecting useful video stream data, and then breaking down the keyframes in the first method to obtain image recognition keyframes before sending them to the AI image recognition module for image recognition; paras. 0063 and 0078, the AI image recognition module compares the image recognition keyframe data processed by the video stream data processing module with the trained classifier models in the data storage module; obtaining, based on the comparison results, the recognition parameter results and the corresponding classification type); and traversing the classification information of the plurality of key frames of image in the screen recording data, and acquiring a user behavior recognition result based on an association among classification information of a plurality of consecutive key frames of image (Lu, paras. 0077-0079, extracting the unique physical features of the acquired video stream data, performing differential analysis and comparison with the detection model set in the storage server, selecting useful video stream data, and then breaking down the keyframes in the first method to obtain image recognition keyframes before sending the plurality of keyframes to the AI image recognition module for image recognition; classifying and storing the acquired and identified image data; obtaining complete consumer behavior).

Lu does not explicitly disclose, but Verkasalo discloses: performing data analysis on each of the plurality of key frames of image to extract feature information of each of the plurality of key frames of image (Verkasalo, para. 0181, the overview of the entire image content analysis process is presented in FIG. 18, which may be used to extract the features, or "fingerprints", of any incoming input screenshot image, i.e., each image frame of a plurality of image frames, for subsequent matching with a library of features for recognition); performing picture classification on the plurality of key frames of image comprises performing picture classification on each of the plurality of key frames of image based on the feature information of each of the plurality of key frames of image to acquire classification information of each of the plurality of key frames of image, wherein the classification information characterizes an operation action performed by a user in the key frame of image (Verkasalo, para. 0065, with reference to items 408a,b of fig. 4, while the device executes other actions and the user potentially interacts therewith, the device is configured to optionally periodically capture screen images, with on-device logic to reconstruct them, including identification of certain screen/display view areas and/or provision of a set of compressed characteristics vectors (one could call it the "DNA of the property/service/app") describing the identity of the content in the desired detail level, including category; paras. 0010-0013, recognizing and interpreting the events of media exposure, the content and target of such exposure, its duration, and even other characteristics as experienced by the user; collecting and validating e.g. visual data on user-initiated actions with digital devices, recognizing contextual factors (e.g. whether the user is outside or inside, i.e. the user's relative and/or absolute location context), and even recognizing and keeping track of various external objects and events, like the fact that the user saw or provided attention to a piece of outdoor advertising for a duration of 4 seconds in a given location; user-initiated actions may be tracked and analyzed as to how people execute and complete transactions, like payments or purchases, or other similar events on their digital devices; visual information about such events can be collected and eventually the type and content of such activities can be retrieved and interpreted).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Verkasalo's features into Lu's invention for effectively extracting feature information of a plurality of image frames.

Regarding claim 4, Lu-Verkasalo discloses the method according to claim 1, wherein said performing data analysis on each of the plurality of key frames of image to extract feature information of each of the plurality of key frames of image comprises: in response to the feature information comprising text feature information, performing optical character recognition on each of the plurality of key frames of image to extract the text feature information in each of the plurality of key frames of image; and/or in response to the feature information comprising target feature information, performing target detection on each of the plurality of key frames of image to extract the target feature information in each of the plurality of key frames of image (Verkasalo, para. 0102, extracting, from images, not only patterns but also high-level information about the whole scene; such information may incorporate retrieving the text and the icons, or other graphics, presented on the screen, in order to understand what the screen is displaying to the viewer, automatically and without intrusion; such information is useful in describing various kinds of device-user interaction, and may be of high technical and commercial value to many companies and across varied sectors; para. 0130, the text part may be deciphered using optical character recognition). The motivation to combine the references and obviousness arguments are the same as claim 1.
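For orientation before the dependent-claim rejections, here is a minimal sketch of the method recited in claim 1 (with the OCR feature extraction of claim 4): key frames in, per-frame features out, per-frame classification into operation actions, then a traversal that reads a behavior from consecutive classifications. The KeyFrame type, the toy classifier, and all labels are hypothetical illustrations, not taken from the application or the cited references.

```python
from dataclasses import dataclass

@dataclass
class KeyFrame:
    timestamp: float
    text: str           # stand-in for OCR'd text features (claim 4)

def classify(frame: KeyFrame) -> str:
    """Toy classifier mapping per-frame features to an operation action."""
    if "pay" in frame.text.lower():
        return "payment_screen"
    if "confirm" in frame.text.lower():
        return "confirm_dialog"
    return "browsing"

def recognize_behavior(key_frames):
    """Traverse classifications of consecutive key frames and map a known
    ordering to a behavior label (the final step of claim 1)."""
    labels = [classify(f) for f in key_frames]
    # Association among consecutive key frames: a payment screen followed
    # by a confirmation dialog is read here as a completed purchase.
    for a, b in zip(labels, labels[1:]):
        if (a, b) == ("payment_screen", "confirm_dialog"):
            return "user_completed_purchase"
    return None

frames = [KeyFrame(0.0, "Home"), KeyFrame(1.5, "Pay now"),
          KeyFrame(2.0, "Confirm order")]
print(recognize_behavior(frames))   # -> user_completed_purchase
```

The claims leave the per-frame classifier open; any model that emits an operation-action label per key frame would slot into classify().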
Regarding claim 6, Lu-Verkasalo discloses the method according to claim 1, wherein said acquiring the user behavior recognition result based on the association among the classification information of the plurality of consecutive key frames of image comprises: the association comprising a dependence relationship in a time dimension of user operation actions represented by the classification information; and determining the user behavior recognition result based on the dependence relationship in the time dimension of the user operation actions represented by the classification information of the plurality of consecutive key frames of image (Verkasalo, para. 0065 and paras. 0010-0013, quoted in full in the rejection of claim 1 above). The motivation to combine the references and obviousness arguments are the same as claim 1.

Claims 16-17 and 20 are rejected for the same reasons set forth in claims 1 and 4. Lu-Verkasalo further discloses processor(s), memory module(s), and a computer readable medium (see Lu, para. 0011, processing server, data storage medium; Verkasalo, para. 0009, non-transitory medium).

Claims 2 and 18 are rejected under AIA 35 U.S.C. 103 as being unpatentable over Lu-Verkasalo, as applied to claims 1 and 17 above, in view of Siniavine (US Publication 2024/0089460).

Regarding claim 2, Lu-Verkasalo discloses the method according to claim 1. Lu-Verkasalo does not explicitly disclose, but Siniavine discloses, wherein said extracting the key frames of image from the plurality of frames of image in the screen recording data comprises: comparing picture information of adjacent frames of image in the screen recording data; and in response to a picture information change of the adjacent frames of image having a proportion greater than a preset change threshold, taking a latter frame of image in the adjacent frames of image as a key frame of image (Siniavine, para. 0024, server 105 includes a scene change detector 230 to identify scene changes between consecutive frames of the rendered video stream; the scene change detector compares, based on a default threshold, a first frame 236 to a second frame 237 immediately following the first frame 236; if the scene change detector 230 determines that the first frame 236 and the second frame 237 are insufficiently correlated, the scene change detector 230 identifies a scene change, and the second frame can obviously be determined to be a key frame). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Siniavine's features into Lu-Verkasalo's invention for effectively selecting key frames in a sequence of video frames. Claim 18 is rejected for the same reasons set forth in claim 2.

Claims 3 and 19 are rejected under AIA 35 U.S.C. 103 as being unpatentable over Lu-Verkasalo, as applied to claims 1 and 17 above, in view of Li et al. (US Publication 2025/0022279).

Regarding claim 3, Lu-Verkasalo discloses the method according to claim 1. Lu-Verkasalo does not explicitly disclose, but Li discloses, wherein following acquiring the plurality of key frames of image, the method further comprises: determining whether a time interval between adjacent key frames of image is longer than a preset shortest time interval; and in response to the time interval between the adjacent key frames of image being not longer than the preset shortest time interval, deleting a latter frame of image in the adjacent key frames of image (Li, para. 0005, the time interval between the next keyframe and the previous keyframe can be determined according to preset thresholds). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Li's features into Lu-Verkasalo's invention for effectively selecting key frames in a sequence of video frames. Claim 19 is rejected for the same reasons set forth in claim 3.
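Read together, claims 2 and 3 amount to a change-threshold key frame selector followed by a minimum-interval filter. A hedged sketch, assuming grayscale frame arrays and illustrative thresholds (nothing here is taken from Siniavine or Li):

```python
import numpy as np

def extract_key_frames(frames, times, change_threshold=0.2):
    """Claim 2 sketch: keep the latter of two adjacent frames when the
    proportion of changed pixels exceeds a preset threshold."""
    keys = []
    for i in range(1, len(frames)):
        prev = frames[i - 1].astype(float)
        curr = frames[i].astype(float)
        changed = np.mean(np.abs(curr - prev) > 0)   # proportion changed
        if changed > change_threshold:
            keys.append((times[i], frames[i]))       # latter frame is key
    return keys

def enforce_min_interval(keys, min_interval=0.5):
    """Claim 3 sketch: delete the latter key frame when adjacent key
    frames are not farther apart than the preset shortest interval."""
    kept = []
    for t, f in keys:
        if kept and t - kept[-1][0] <= min_interval:
            continue
        kept.append((t, f))
    return kept

frames = [np.zeros((4, 4)), np.ones((4, 4)), np.ones((4, 4))]
keys = enforce_min_interval(extract_key_frames(frames, [0.0, 0.5, 1.0]))
print([t for t, _ in keys])   # -> [0.5]; the unchanged third frame is skipped
```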
Claim 5 is rejected under AIA 35 U.S.C. 103 as being unpatentable over Lu-Verkasalo, as applied to claim 4 above, in view of Shanbhag et al. (US Publication 2018/0253883).

Regarding claim 5, Lu-Verkasalo discloses the method according to claim 4. Lu-Verkasalo does not explicitly disclose, but Shanbhag discloses, for adjacent key frames of image, calculating a position offset of the text feature information in the adjacent key frames of image; and in response to the position offset being less than a preset offset threshold, deleting a latter frame of image in the adjacent key frames of image (Shanbhag, para. 0063, at 1206, method 1200 may include calculating a set of keyframes having respective keyframe positions along the animation path; text processing component 28 may calculate the set of keyframes 32, each having a corresponding keyframe position 34, along the animation path 30; keyframe positions 34 may include the animation starting point 16, the animation end point 18, and locations on the animation path 30 where the geometry changes non-linearly for the geometric animation data; for example, keyframes 32 may be computed at the animation starting point 16 and end point 18; in addition, keyframes 32 may be computed when one or more new regions of influence 33 are encountered by animation path 30 (e.g., the intersection of animation path 30 with the region of influence 33), when a peak value of a region of influence 33 is crossed by animation path 30, or when animation path 30 moves from a negative to positive half of a plane on a particular axis coordinate; for example, keyframe positions 34 may be determined by evaluating intersection points between the animation path 30 and any instance regions of influence 33 along the animation path 30; in addition, text processing component 28 may calculate a minimum number of keyframes 32 from the starting point 16 on the animation path 30 to an end point 18 of the animation path 30; the minimum number of keyframes 32 may be computed by evaluating the intersection points with instance regions of influence 33 along the animation path 30, and may be a smallest number of keyframes 32 that may be used to animate glyphs 15 across the entire animation path 30, for example, based on non-linear changes in geometry defined by each region of influence 33). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Shanbhag's features into Lu-Verkasalo's invention for effectively selecting key frames in a sequence of video frames.
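Claim 5's filter can be pictured the same way, comparing where the OCR'd text sits in adjacent key frames. A sketch under the assumption that each frame is reduced to the center of its recognized text block; the coordinates and threshold are illustrative:

```python
import math

def dedupe_by_text_offset(key_frames, offset_threshold=10.0):
    """key_frames: list of (timestamp, (x, y)), where (x, y) is the center
    of the recognized text block in that frame. Claim 5 sketch: if the text
    barely moved between adjacent key frames, drop the latter frame."""
    kept = [key_frames[0]]
    for t, (x, y) in key_frames[1:]:
        px, py = kept[-1][1]
        if math.hypot(x - px, y - py) < offset_threshold:
            continue            # offset below threshold: delete latter frame
        kept.append((t, (x, y)))
    return kept

kf = [(0.0, (10, 10)), (0.4, (12, 11)), (0.9, (80, 40))]
print(dedupe_by_text_offset(kf))
# -> [(0.0, (10, 10)), (0.9, (80, 40))]; the 0.4 s frame barely moved
```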
Claims 7-9 and 14-15 are rejected under AIA 35 U.S.C. 103 as being unpatentable over Lu-Verkasalo, as applied to claims 6 and 7 above, in view of Han (English translation of Chinese Publication CN105869008, 08-2016).

Regarding claim 7, Lu-Verkasalo discloses the method according to claim 6, and said determining the user behavior recognition result based on the dependence relationship in the time dimension of the user operation actions represented by the classification information of the plurality of consecutive key frames of image comprises: determining whether the classification information of the plurality of consecutive key frames of image matches a preset behavior sequence; and in response to the classification information of the plurality of consecutive key frames of image matching the preset behavior sequence, acquiring the user behavior recognition result based on the user behavior represented by the preset behavior sequence (Verkasalo, para. 0065 and paras. 0010-0013, quoted in full in the rejection of claim 1 above).

Lu-Verkasalo does not explicitly disclose, but Han discloses, wherein the preset behavior sequence is used to represent the dependence relationship of the user operation actions in the time dimension, and comprises a plurality of classification information indicating a plurality of consecutive operation actions corresponding to a user behavior (Han, para. 0049, behavioral information includes all user activity information on video websites, mainly historical behavioral information related to video resource operations, such as the type of video watched, whether the user paid to watch or download the video, the resolution of the video used, the viewing time period, and whether comments were posted after watching; by collecting a user's historical behavioral data and using big data analytics to organize and summarize the collected data, it is possible to determine the user's preference for different types of videos and their video viewing habits; this is used to determine the user's level of demand for different types of advertisements; the correspondence between videos and advertisements is determined by a pre-set strategy, and system administrators can modify the strategy according to the actual situation; for example, if a user frequently watches children's animation on a video website, then that user is more likely to be interested in advertisements for children's products, while if a user always watches videos featuring celebrities or variety shows, then that user is more receptive to advertisements featuring those celebrities). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Han's features into Lu-Verkasalo's invention for effectively classifying video content of user's interest.
Regarding claim 8, Lu-Verkasalo-Han discloses the method according to claim 7, wherein said determining whether the classification information of the plurality of consecutive key frames of image matches the preset behavior sequence comprises: acquiring a preset behavior sequence associated with the traversed classification information of an i-th key frame of image, where i is a positive integer greater than or equal to 1; while traversing to an (i+1)th key frame of image, acquiring a preset behavior sequence associated with the classification information of the (i+1)th key frame of image, and determining whether the classification information of the (i+1)th key frame of image matches the preset behavior sequence associated with the i-th key frame of image; in response to the classification information of the (i+1)th key frame of image matching the preset behavior sequence associated with the i-th key frame of image, retaining the preset behavior sequence associated with the i-th key frame of image, and continuing to match the classification information of the (i+2)th key frame of image with the preset behavior sequence associated with the i-th key frame of image, until the classification information of the plurality of consecutive key frames of image matches the preset behavior sequence associated with the i-th key frame of image, to acquire the user behavior recognition result; and in response to the classification information of the (i+1)th key frame of image not matching the preset behavior sequence associated with the i-th key frame of image, dropping the preset behavior sequence associated with the i-th key frame of image (Verkasalo, para. 0065 and paras. 0010-0013, and Han, para. 0049, quoted in full in the rejections of claims 1 and 7 above; it is obvious for one skilled in the art to continue comparing the user's behavior in subsequent frames against the preset behavior of an initial frame, to retain the preset behavior when the comparison results match it, which strongly indicates the user's attention to the viewed frames, and to drop the preset behavior when they do not match, as the user's attention has diminished). The motivation to combine the references and obviousness arguments are the same as claim 7.

Regarding claim 9, Lu-Verkasalo-Han discloses the method according to claim 8, wherein said acquiring the preset behavior sequence associated with the traversed classification information of the i-th key frame of image comprises: matching the classification information of the i-th key frame of image with first classification information in each of multiple preset behavior sequences, and taking the matched preset behavior sequence as the preset behavior sequence associated with the classification information of the i-th key frame of image (Verkasalo, para. 0065 and paras. 0010-0013, and Han, para. 0049, quoted in full above; it is an obvious design option for one skilled in the art to match the classification information of the i-th key frame of image with the first classification information in each of multiple preset behavior sequences, and to take the matched preset behavior sequence as the preset behavior sequence associated with the classification information of the i-th key frame of image). The motivation to combine the references and obviousness arguments are the same as claim 7.

Regarding claim 14, Lu-Verkasalo-Han discloses the method according to claim 7, wherein the user behavior recognition result comprises a duration of the user behavior and/or auxiliary information of the user behavior, the duration of the user behavior is a sum of durations of the plurality of consecutive key frames of image, and the auxiliary information of the user behavior is acquired in the following manner: locating candidate key frames of image from the plurality of consecutive key frames of image based on an operation action of to-be-acquired auxiliary information in the preset behavior sequence and the classification information of the plurality of consecutive key frames of image, and acquiring the auxiliary information based on feature information of the candidate key frames of image (Verkasalo, para. 0065 and paras. 0010-0013, and Han, para. 0049, quoted in full above). The motivation to combine the references and obviousness arguments are the same as claim 7.

Regarding claim 15, Lu-Verkasalo-Han discloses the method according to claim 7, wherein the preset behavior sequence is acquired in the following manner: acquiring an application identifier corresponding to the screen recording data from log information of the screen recording data; and acquiring the preset behavior sequence associated with the application identifier (Han, para. 0059, the user behavior information can be obtained by analyzing and filtering the user's operation logs on the website using an ID as known in the art; after obtaining user viewing behavior information, the corresponding advertisement is searched; since the advertisements are categorized with different tags before going live, the tags in the advertisements viewed by the user can be extracted, deduplicated, and marked as tags for the advertisement category that the user needs; the tag information is the advertisement category information that the user needs). The motivation to combine the references and obviousness arguments are the same as claim 7.
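Claims 8 and 9 together describe a running match of frame classifications against a library of preset behavior sequences: a candidate sequence opens whenever a frame's label equals a sequence's first entry (claim 9), surviving candidates advance on each subsequent match, and any mismatch drops the candidate (claim 8). A hedged sketch; the sequence library and labels are invented for illustration and are not from the application or the references:

```python
# Hypothetical library of preset behavior sequences (claim 7's "preset
# behavior sequence"): ordered operation-action labels per behavior.
PRESET_SEQUENCES = {
    "completed_purchase": ["cart_screen", "payment_screen", "confirm_dialog"],
    "video_session":      ["player_open", "playing"],
}

def match_behaviors(labels):
    results, candidates = [], []   # candidates: (behavior, next index to match)
    for label in labels:
        survivors = []
        for behavior, idx in candidates:
            seq = PRESET_SEQUENCES[behavior]
            if label == seq[idx]:                 # (i+1)th frame matches: retain
                if idx + 1 == len(seq):
                    results.append(behavior)      # whole sequence matched
                else:
                    survivors.append((behavior, idx + 1))
            # else: mismatch -> candidate dropped (claim 8's final branch)
        candidates = survivors
        # Claim 9: open sequences whose *first* entry matches this frame.
        for behavior, seq in PRESET_SEQUENCES.items():
            if label == seq[0]:
                if len(seq) == 1:
                    results.append(behavior)
                else:
                    candidates.append((behavior, 1))
    return results

print(match_behaviors(["cart_screen", "payment_screen", "confirm_dialog"]))
# -> ['completed_purchase']
```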
Claims 10 and 11 are rejected under AIA 35 U.S.C. 103 as being unpatentable over Lu-Verkasalo, as applied to claim 1 above, in view of Park (US Publication 2018/0144194).

Regarding claim 10, Lu-Verkasalo discloses the method according to claim 1. Lu-Verkasalo does not explicitly disclose, but Park discloses, wherein prior to said traversing the classification information of the plurality of key frames of image in the screen recording data, the method further comprises: filtering the plurality of key frames of image based on the classification information, and taking the key frames of image after the filtering as traversed key frames of image (Park, paras. 0139-0148, the system can provide classification on portions of the video, and filter out the portion according to the classification information). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Park's features into Lu-Verkasalo's invention for effectively selecting key frames in a sequence of video frames.

Regarding claim 11, Lu-Verkasalo-Park discloses the method according to claim 10, wherein said filtering the plurality of key frames of image based on the classification information comprises: determining whether there are multiple consecutive key frames of image with the same classification information according to a time sequence of the plurality of key frames of image; and in response to there being multiple consecutive key frames of image with the same classification information, retaining one key frame of image among the multiple consecutive key frames of image with the same classification information (Park, paras. 0139-0148, the system can provide classification on a time sequence portion of the video, and filter out the portion according to the classification information; it is obvious for one skilled in the art to determine whether there are multiple consecutive key frames of image with the same classification information according to a time sequence of the plurality of key frames of image and, in response, to retain one key frame of image among them). The motivation to combine the references and obviousness arguments are the same as claim 10.

Claims 12 and 13 are rejected under AIA 35 U.S.C. 103 as being unpatentable over Lu-Verkasalo-Han, as applied to claim 8 above, in view of Park (US Publication 2018/0144194).

Regarding claim 12, Lu-Verkasalo-Han discloses the method according to claim 8. Lu-Verkasalo-Han does not explicitly disclose, but Park discloses, wherein prior to said traversing the classification information of the plurality of key frames of image in the screen recording data, the method further comprises: filtering the plurality of key frames of image based on the classification information, and taking the key frames of image after the filtering as traversed key frames of image (Park, paras. 0139-0148, the system can provide classification on portions of the video, and filter out the portion according to the classification information). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Park's features into Lu-Verkasalo-Han's invention for effectively selecting key frames in a sequence of video frames.

Regarding claim 13, Lu-Verkasalo-Han-Park discloses the method according to claim 12, wherein said filtering the plurality of key frames of image based on the classification information comprises: determining whether there are multiple consecutive key frames of image with the same classification information according to a time sequence of the plurality of key frames of image; and in response to there being multiple consecutive key frames of image with the same classification information, retaining one key frame of image among the multiple consecutive key frames of image with the same classification information (Park, paras. 0139-0148, the system can provide classification on a time sequence portion of the video, and filter out the portion according to the classification information; it is obvious for one skilled in the art to determine whether there are multiple consecutive key frames of image with the same classification information according to a time sequence and, in response, to retain one key frame of image among them). The motivation to combine the references and obviousness arguments are the same as claim 12.
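The pre-filter of claims 10-13 collapses runs of consecutive key frames that share a classification down to a single representative before the sequence traversal runs. A minimal sketch using itertools.groupby; the labels are illustrative:

```python
from itertools import groupby

def filter_consecutive_duplicates(classified_frames):
    """classified_frames: list of (timestamp, label) sorted by time.
    Keep one key frame per run of identical consecutive classifications."""
    return [next(group)                      # first frame of each run
            for _, group in groupby(classified_frames, key=lambda p: p[1])]

frames = [(0.0, "browsing"), (0.5, "browsing"), (1.0, "payment_screen")]
print(filter_consecutive_duplicates(frames))
# -> [(0.0, 'browsing'), (1.0, 'payment_screen')]
```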
Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to LOI H TRAN, whose telephone number is (571) 270-5645. The examiner can normally be reached 8:00 AM-5:00 PM PST, with the first Friday of each biweek off.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, THAI TRAN, can be reached at 571-272-7382. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/LOI H TRAN/
Primary Examiner, Art Unit 2484

Prosecution Timeline

Jul 08, 2024: Application Filed
Nov 25, 2025: Examiner Interview (Telephonic)
Nov 29, 2025: Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12598366: CONTENT DATA PROCESSING METHOD AND CONTENT DATA PROCESSING APPARATUS
Granted Apr 07, 2026 (2y 5m to grant)
Patent 12593112: METHOD, DEVICE, AND COMPUTER PROGRAM FOR ENCAPSULATING REGION ANNOTATIONS IN MEDIA TRACKS
Granted Mar 31, 2026 (2y 5m to grant)
Patent 12592261: VIDEO EDITING METHOD AND APPARATUS, AND DEVICE AND STORAGE MEDIUM
Granted Mar 31, 2026 (2y 5m to grant)
Patent 12576798: CAMERA SYSTEM AND ASSISTANCE SYSTEM FOR A VEHICLE AND A METHOD FOR OPERATING A CAMERA SYSTEM
Granted Mar 17, 2026 (2y 5m to grant)
Patent 12579810: SYSTEM AND METHOD FOR AUTOMATIC EVENTS IDENTIFICATION ON VIDEO
Granted Mar 17, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 64% (88% with interview, +23.6%)
Median Time to Grant: 2y 10m
PTA Risk: Low
Based on 611 resolved cases by this examiner. Grant probability derived from career allow rate.
