Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1, 11, and 15 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-6, 9, 11-12, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Horsman et al. (US 20200035021 A1) in view of Abdrashitov et al. (US 20190259214 A1).
Re claim 1, Horsman discloses a video encoding method, comprising:
receiving a video sequence (Horsman: Fig. 5, step 502);
identifying character elements in the video sequence (Horsman: Fig. 5, step 504; paragraph [0074], The objects of interest can be calculated from a single 2D image and can correspond to points like the eyes, ears, neck, shoulder, hips, etc.);
extracting character key point data from the identified character elements (Horsman: Fig. 5, steps 506 and 508; paragraph [0074], At 440, objects of interest can be used to determine Joints 442);
fitting the extracted character key point data with a standard character model to obtain character fitting parameters (Horsman: paragraph [0074], Since the Objects of Interest 408 can be calculated from a single 2D image, more accurate locations can be determined by calculating the best fit from multiple camera angles using a non-linear least squares minimization with a loss function that includes the confidence associated with each of the residuals and by accounting for rigidity of the distance between key points across frames);
encoding the character key point data to form encoded character key point data (Horsman: paragraph [0080], The encoding can comprise encoding a mesh stream comprising surface shape information and a texture stream comprising surface albedo information.); and
encoding the character fitting parameters to form encoded character fitting parameters (Horsman: paragraph [0080], The encoding can comprise encoding a mesh stream comprising surface shape information and a texture stream comprising surface albedo information).
Horsman does not specifically disclose that the standard character model is a virtual character model, the virtual character model is a computer-generated virtual character image, and the received video sequence is different from the computer-generated virtual character image.
However, Abdrashitov discloses that FIG. 2 depicts an example of a frame of input video content that is transformed into a corresponding frame of stylized video content using the system of FIG. 1 (Abdrashitov: paragraph [0010]). Abdrashitov also discloses that FIG. 6 depicts an example of stylized facial landmark data from stylized video content generated by the process of FIG. 3 from the input facial landmark data of FIG. 4 (Abdrashitov: paragraph [0014]). A stylization engine applies one or more stylization filters to video content depicting a face (Abdrashitov: paragraph [0020]). These stylization filters modify the trajectories traveled, over the duration of the video, by key points defining a feature (e.g., a point that partially defines a mouth) and thereby provide an intentionally exaggerated or distorted version of the feature's movement (Abdrashitov: paragraph [0020]). The stylization engine 104 performs one or more operations that generate stylized video content 122 from the input video content 120 (Abdrashitov: paragraph [0027]). In FIG. 2, frame 202 depicts a facial expression that includes a smiling and open mouth, slightly widened eyes, and slightly raised eyebrows, while frame 204, which is a stylized version of frame 202, depicts this facial expression with exaggerations in the smile (e.g., by widening the smile) and the eyebrows (e.g., by sharpening the arch of the raised eyebrows) (Abdrashitov: paragraph [0027]). The stylization filters 108 generate stylized facial landmark data, which modifies the facial expressions from the original facial landmark data such that facial expressions are exaggerated, distorted, or otherwise made more expressive in the stylized video content 122 as compared to the input video content 120 (Abdrashitov: paragraph [0029]).
Since Horsman and Abdrashitov relate to modification of video data based on key point extraction, one of ordinary skill in the art before the effective filing date would have found it obvious to combine the stylization of Abdrashitov with the system of Horsman in order to provide improved computer animation (Abdrashitov: paragraph [0023]).
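For illustration only, and not as part of any cited reference's disclosure, the following is a minimal sketch of the kind of confidence-weighted, non-linear least-squares key point fitting described in the Horsman passage quoted above with respect to claim 1. The camera model, function names, and data shapes are assumptions introduced solely for this example.

```python
# Minimal illustrative sketch (assumptions only; not from Horsman, Abdrashitov, or Jin):
# fit the 3D location of one key point from 2D detections in multiple camera views
# using a confidence-weighted non-linear least-squares minimization.
import numpy as np
from scipy.optimize import least_squares

def project(P, X):
    """Project 3D point X with a hypothetical 3x4 projection matrix P to 2D pixels."""
    x = P @ np.append(X, 1.0)  # homogeneous projection
    return x[:2] / x[2]

def residuals(X, cameras, detections, confidences):
    """Confidence-weighted reprojection errors stacked over all camera views."""
    return np.concatenate([
        c * (project(P, d2) - d2) if False else c * (project(P, X) - d2)
        for P, d2, c in zip(cameras, detections, confidences)
    ])

def fit_keypoint(cameras, detections, confidences, x0=None):
    """Best-fit 3D key point location from several 2D detections."""
    if x0 is None:
        x0 = np.zeros(3)
    result = least_squares(residuals, x0, args=(cameras, detections, confidences))
    return result.x
```

In this sketch, each residual is scaled by the detection confidence, so views in which the key point was detected with low confidence contribute less to the best-fit location; extending the loss to penalize changes in inter-key-point distances across frames (the rigidity term mentioned in the quoted passage) would be a further assumption not shown here.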
Re claim 2, Horsman discloses that the extracted character key point data comprises human body key point data and human face key point data (Horsman: paragraph [0074]).
Re claim 3, Horsman discloses that
the human body key point data comprises human skeleton key point data (Horsman: paragraphs [0071] and [0074]), and
the human face key point data comprises data about eyes, a nose bridge shape, and a mouth shape of a human face (Horsman: paragraph [0074]).
Re claim 4, Horsman discloses that the fitting of the extracted character key point data with the standard character model to obtain the character fitting parameters comprises:
fitting the human body key point data with a standard human body model to obtain human body fitting parameters (Horsman: paragraphs [0079]-[0080], stabilized mesh);
and/or fitting the human face key point data with a standard human face model to obtain human face fitting parameters.
Re claim 5, Horsman discloses that the encoding of the character key point data to form the encoded character key point data comprises:
encoding the human body key point data (Horsman: paragraph [0080], The encoding can comprise encoding a mesh stream comprising surface shape information and a texture stream comprising surface albedo information); and
encoding the human face key point data (Horsman: paragraph [0080], The encoding can comprise encoding a mesh stream comprising surface shape information and a texture stream comprising surface albedo information).
Re claim 6, Horsman discloses that the encoding of the character fitting parameters to form the encoded character fitting parameters comprises:
encoding the human body fitting parameters (Horsman: paragraph [0080], The encoding can comprise encoding a mesh stream comprising surface shape information and a texture stream comprising surface albedo information); and
encoding the human face fitting parameters (Horsman: paragraph [0080], The encoding can comprise encoding a mesh stream comprising surface shape information and a texture stream comprising surface albedo information).
Re claim 9, Horsman discloses forming a structured information code stream according to the encoded character key point data and the encoded character fitting parameters (Horsman: paragraph [0080], The encoding can comprise encoding a mesh stream comprising surface shape information and a texture stream comprising surface albedo information).
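As a purely illustrative sketch under an assumed, hypothetical byte layout (the cited references do not specify such a format), the following shows one way separately encoded character key point data and encoded character fitting parameters could be combined into, and recovered from, a single structured information code stream.

```python
# Hypothetical, length-prefixed layout (an assumption for illustration only):
# [4-byte key point length][4-byte fitting length][key point bytes][fitting bytes]
import struct

def form_structured_stream(encoded_keypoints: bytes, encoded_fitting: bytes) -> bytes:
    """Concatenate the two encoded payloads behind a small header of lengths."""
    header = struct.pack("<II", len(encoded_keypoints), len(encoded_fitting))
    return header + encoded_keypoints + encoded_fitting

def parse_structured_stream(stream: bytes):
    """Recover the encoded key point data and fitting parameters from the stream."""
    kp_len, fit_len = struct.unpack_from("<II", stream, 0)
    offset = struct.calcsize("<II")
    encoded_keypoints = stream[offset:offset + kp_len]
    encoded_fitting = stream[offset + kp_len:offset + kp_len + fit_len]
    return encoded_keypoints, encoded_fitting
```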
Re claim 11, Horsman discloses a video decoding method, comprising:
receiving a structured information code stream (Horsman: paragraph [0125], a client 2302 can transfer an encoded file, (e.g., encoded media item), to server 2304. Server 2304 can store the file, decode the file, or transmit the file to another client 2302),
wherein the structured information code stream comprises at least encoded character key point data and encoded character fitting parameters (Horsman: paragraph [0080], The encoding can comprise encoding a mesh stream comprising surface shape information and a texture stream comprising surface albedo information),
decoding the encoded character fitting parameters to form a personalized character model (Horsman: paragraph [0125], a client 2302 can transfer an encoded file, (e.g., encoded media item), to server 2304. Server 2304 can store the file, decode the file, or transmit the file to another client 2302; paragraph [0080], The encoding can comprise encoding a mesh stream comprising surface shape information and a texture stream comprising surface albedo information; paragraph [0062]); and
decoding the encoded character key point data to form a character image of a corresponding frame (Horsman: paragraph [0125], a client 2302 can transfer an encoded file, (e.g., encoded media item), to server 2304. Server 2304 can store the file, decode the file, or transmit the file to another client 2302; paragraph [0080], The encoding can comprise encoding a mesh stream comprising surface shape information and a texture stream comprising surface albedo information; paragraph [0062]).
Horsman does not specifically disclose that the encoded character key point data is obtained based on encoding of character key point data extracted from a video sequence, the encoded character fitting parameters are obtained based on fitting of the character key point data with a standard character model, the standard character model is a virtual character model, the virtual character model is a computer-generated virtual character image, and the video sequence is different from the computer-generated virtual character image.
However, Abdrashitov discloses that FIG. 2 depicts an example of a frame of input video content that is transformed into a corresponding frame of stylized video content using the system of FIG. 1 (Abdrashitov: paragraph [0010]). Abdrashitov also discloses that FIG. 6 depicts an example of stylized facial landmark data from stylized video content generated by the process of FIG. 3 from the input facial landmark data of FIG. 4 (Abdrashitov: paragraph [0014]). A stylization engine applies one or more stylization filters to video content depicting a face (Abdrashitov: paragraph [0020]). These stylization filters modify the trajectories traveled, over the duration of the video, by key points defining a feature (e.g., a point that partially defines a mouth) and thereby provide an intentionally exaggerated or distorted version of the feature's movement (Abdrashitov: paragraph [0020]). The stylization engine 104 performs one or more operations that generate stylized video content 122 from the input video content 120 (Abdrashitov: paragraph [0027]). In FIG. 2, frame 202 depicts a facial expression that includes a smiling and open mouth, slightly widened eyes, and slightly raised eyebrows, while frame 204, which is a stylized version of frame 202, depicts this facial expression with exaggerations in the smile (e.g., by widening the smile) and the eyebrows (e.g., by sharpening the arch of the raised eyebrows) (Abdrashitov: paragraph [0027]). The stylization filters 108 generate stylized facial landmark data, which modifies the facial expressions from the original facial landmark data such that facial expressions are exaggerated, distorted, or otherwise made more expressive in the stylized video content 122 as compared to the input video content 120 (Abdrashitov: paragraph [0029]).
Since Horsman and Abdrashitov relate to modification of video data based on key point extraction, one of ordinary skill in the art before the effective filing date would have found it obvious to combine the stylization of Abdrashitov with the system of Horsman in order to provide improved computer animation (Abdrashitov: paragraph [0023]).
Re claim 12, Horsman discloses that
the encoded character fitting parameters comprise encoded human body fitting parameters and encoded human face fitting parameters (Horsman: paragraph [0080], The encoding can comprise encoding a mesh stream comprising surface shape information and a texture stream comprising surface albedo information);
the decoding of the encoded character fitting parameters to form the personalized character model comprises:
decoding the encoded human body fitting parameters to form a personalized human body model (Horsman: paragraph [0125], a client 2302 can transfer an encoded file, (e.g., encoded media item), to server 2304. Server 2304 can store the file, decode the file, or transmit the file to another client 2302; paragraph [0080], The encoding can comprise encoding a mesh stream comprising surface shape information and a texture stream comprising surface albedo information; paragraph [0062]); and
decoding the encoded human face fitting parameters to form a personalized human face model (Horsman: paragraph [0125], a client 2302 can transfer an encoded file, (e.g., encoded media item), to server 2304. Server 2304 can store the file, decode the file, or transmit the file to another client 2302; paragraph [0080], The encoding can comprise encoding a mesh stream comprising surface shape information and a texture stream comprising surface albedo information; paragraph [0062]),
the encoded character key point data comprises encoded human body key point data and encoded human face key point data, and
the decoding of the encoded character key point data to form the character image of the corresponding frame comprises:
decoding the encoded human body key point data to form a human body image of the corresponding frame (Horsman: paragraph [0125], a client 2302 can transfer an encoded file, (e.g., encoded media item), to server 2304. Server 2304 can store the file, decode the file, or transmit the file to another client 2302; paragraph [0080], The encoding can comprise encoding a mesh stream comprising surface shape information and a texture stream comprising surface albedo information; paragraph [0062]); and
decoding the encoded human face key point data to form a human face image of the corresponding frame (Horsman: paragraph [0125], a client 2302 can transfer an encoded file, (e.g., encoded media item), to server 2304. Server 2304 can store the file, decode the file, or transmit the file to another client 2302; paragraph [0080], The encoding can comprise encoding a mesh stream comprising surface shape information and a texture stream comprising surface albedo information; paragraph [0062]).
Re claim 15, Horsman discloses a method for transmitting a video sequence, comprising:
encoding the video sequence, wherein the encoding of the video sequence comprises:
receiving a video sequence (Horsman: Fig. 5, step 502);
identifying character elements in the received video sequence (Horsman: Fig. 5, step 504; paragraph [0074], The objects of interest can be calculated from a single 2D image and can correspond to points like the eyes, ears, neck, shoulder, hips, etc.);
extracting character key point data from the identified character elements (Horsman: Fig. 5, steps 506 and 508; paragraph [0074], At 440, objects of interest can be used to determine Joints 442);
fitting the extracted character key point data with a standard character model to obtain character fitting parameters (Horsman: paragraph [0074], Since the Objects of Interest 408 can be calculated from a single 2D image, more accurate locations can be determined by calculating the best fit from multiple camera angles using a non-linear least squares minimization with a loss function that includes the confidence associated with each of the residuals and by accounting for rigidity of the distance between key points across frames);
encoding the extracted character key point data to form encoded character key point data (Horsman: paragraph [0080], The encoding can comprise encoding a mesh stream comprising surface shape information and a texture stream comprising surface albedo information.);
encoding the obtained character fitting parameters to form encoded character fitting parameters (Horsman: paragraph [0080], The encoding can comprise encoding a mesh stream comprising surface shape information and a texture stream comprising surface albedo information); and
forming a structured information code stream according to the encoded character key point data and the encoded character fitting parameters (Horsman: paragraph [0080], The encoding can comprise encoding a mesh stream comprising surface shape information and a texture stream comprising surface albedo information); and
decoding the structured information code stream, wherein the decoding of the structured information code stream comprises:
receiving the formed structured information code stream (Horsman: paragraph [0125], a client 2302 can transfer an encoded file, (e.g., encoded media item), to server 2304. Server 2304 can store the file, decode the file, or transmit the file to another client 2302),
wherein the structured information code stream comprises at least encoded character key point data and encoded character fitting parameters (Horsman: paragraph [0080], The encoding can comprise encoding a mesh stream comprising surface shape information and a texture stream comprising surface albedo information);
decoding the encoded character fitting parameters to form a personalized character model (Horsman: paragraph [0125], a client 2302 can transfer an encoded file, (e.g., encoded media item), to server 2304. Server 2304 can store the file, decode the file, or transmit the file to another client 2302; paragraph [0080], The encoding can comprise encoding a mesh stream comprising surface shape information and a texture stream comprising surface albedo information; paragraph [0062]); and
decoding the encoded character key point data to form a character image of a corresponding frame (Horsman: paragraph [0125], a client 2302 can transfer an encoded file, (e.g., encoded media item), to server 2304. Server 2304 can store the file, decode the file, or transmit the file to another client 2302; paragraph [0080], The encoding can comprise encoding a mesh stream comprising surface shape information and a texture stream comprising surface albedo information; paragraph [0062]).
Horsman does not specifically disclose that the standard character model is a virtual character model, the virtual character model is a computer-generated virtual character image, and the received video sequence is different from the computer-generated virtual character image.
However, Abdrashitov discloses that FIG. 2 depicts an example of a frame of input video content that is transformed into a corresponding frame of stylized video content using the system of FIG. 1 (Abdrashitov: paragraph [0010]). Abdrashitov also discloses that FIG. 6 depicts an example of stylized facial landmark data from stylized video content generated by the process of FIG. 3 from the input facial landmark data of FIG. 4 (Abdrashitov: paragraph [0014]). A stylization engine applies one or more stylization filters to video content depicting a face (Abdrashitov: paragraph [0020]). These stylization filters modify the trajectories traveled, over the duration of the video, by key points defining a feature (e.g., a point that partially defines a mouth) and thereby provide an intentionally exaggerated or distorted version of the feature's movement (Abdrashitov: paragraph [0020]). The stylization engine 104 performs one or more operations that generate stylized video content 122 from the input video content 120 (Abdrashitov: paragraph [0027]). In FIG. 2, frame 202 depicts a facial expression that includes a smiling and open mouth, slightly widened eyes, and slightly raised eyebrows, while frame 204, which is a stylized version of frame 202, depicts this facial expression with exaggerations in the smile (e.g., by widening the smile) and the eyebrows (e.g., by sharpening the arch of the raised eyebrows) (Abdrashitov: paragraph [0027]). The stylization filters 108 generate stylized facial landmark data, which modifies the facial expressions from the original facial landmark data such that facial expressions are exaggerated, distorted, or otherwise made more expressive in the stylized video content 122 as compared to the input video content 120 (Abdrashitov: paragraph [0029]).
Since Horsman and Abdrashitov relate to modification of video data based on key point extraction, one of ordinary skill in the art before the effective filing date would have found it obvious to combine the stylization of Abdrashitov with the system of Horsman in order to provide improved computer animation (Abdrashitov: paragraph [0023]).
Claim(s) 7-8, 10, and 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Horsman et al. (US 20200035021 A1) in view of Abdrashitov et al. (US 20190259214 A1), and further in view of Jin (US 20220360707 A1).
Re claim 7, Horsman does not specifically disclose identifying background elements in the video sequence; performing background modeling according to the identified background elements to obtain background modeling data; and encoding the obtained background modeling data to form encoded background modeling data.
However, Jin discloses that, after dividing the preview image G1 into the foreground image and the background image, the electronic device could obtain a plurality of predetermined background images. And then, the electronic device could determine whether the plurality of background images have a predetermined background image matching the background image (Jin: paragraph [0199]). When determining that there is one predetermined background image matching the background image, the electronic device could obtain the plurality of the candidate key point sets corresponding to the predetermined background image and take them as the plurality of the candidate key point sets corresponding to the background image (Jin: paragraph [0199]). In this embodiment, when the plurality of candidate key point sets corresponding to the background image are obtained, the electronic device could determine the human body type of the human body in the photographing scene (Jin: paragraph [0202]). And then, the electronic device could determine at least one of the plurality of candidate key point sets that corresponds to the human body type as the composition key point set corresponding to the photographing scene (Jin: paragraph [0202]).
Since Horsman and Jin relate to key point detection, one of ordinary skill in the art before the effective filing date would have found it obvious to combine the background detection of Jin with the system of Horsman in order to facilitate image quality improvement (Jin: paragraph [0004]).
Re claim 8, Horsman does not specifically disclose that the performing of the background modeling according to the identified background elements to obtain the background modeling data comprises: complementing a blocked area of a background based on a plurality of video frames of the received video sequence.
However, Jin discloses that the electronic device could use the key to look up a plurality of candidate key point sets and a plurality of composition boundary frames in the composition database and determine the plurality of candidate key point sets and the plurality of composition boundary frames as a plurality of candidate key point sets and a plurality of composition boundary frames corresponding to the background image (Jin: paragraph [0108]).
Since Horsman and Jin relate to key point detection, one of ordinary skill in the art before the effective filing date would have found it obvious to combine the background detection of Jin with the system of Horsman in order to facilitate image quality improvement (Jin: paragraph [0004]).
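The following is a minimal illustrative sketch of the claimed concept of complementing a blocked background area from a plurality of video frames; it is an assumption offered for clarity only and is not a disclosure of Horsman, Abdrashitov, or Jin. The per-pixel median approach, array shapes, and function name are hypothetical.

```python
# Illustrative assumption (not from the cited references): fill in background
# pixels blocked by a foreground character by taking the per-pixel median of
# the pixels that are visible across multiple frames of the video sequence.
import numpy as np

def complement_background(frames, foreground_masks):
    """frames: list of HxWx3 uint8 arrays; foreground_masks: list of HxW bool
    arrays (True where the character blocks the background in that frame)."""
    stack = np.stack(frames).astype(np.float64)           # F x H x W x 3
    masks = np.stack(foreground_masks)[..., None]         # F x H x W x 1
    stack[np.broadcast_to(masks, stack.shape)] = np.nan   # discard blocked pixels
    background = np.nanmedian(stack, axis=0)              # median of visible pixels
    return np.nan_to_num(background).astype(np.uint8)     # never-visible pixels -> 0
```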
Re claim 10, Horsman does not specifically disclose forming a structured information code stream according to the encoded character key point data, the encoded character fitting parameters, and the encoded background modeling data.
However, Jin discloses that, after dividing the preview image G1 into the foreground image and the background image, the electronic device could obtain a plurality of predetermined background images. And then, the electronic device could determine whether the plurality of background images have a predetermined background image matching the background image (Jin: paragraph [0199]). When determining that there is one predetermined background image matching the background image, the electronic device could obtain the plurality of the candidate key point sets corresponding to the predetermined background image and take them as the plurality of the candidate key point sets corresponding to the background image (Jin: paragraph [0199]). In this embodiment, when the plurality of candidate key point sets corresponding to the background image are obtained, the electronic device could determine the human body type of the human body in the photographing scene (Jin: paragraph [0202]). And then, the electronic device could determine at least one of the plurality of candidate key point sets that corresponds to the human body type as the composition key point set corresponding to the photographing scene (Jin: paragraph [0202]).
Since Horsman and Jin relate to key point detection, one of ordinary skill in the art before the effective filing date would have found it obvious to combine the background detection of Jin with the system of Horsman in order to facilitate image quality improvement (Jin: paragraph [0004]).
Re claim 13, Horsman does not specifically disclose that the structured information code stream further comprises encoded background modeling data, and the video decoding method further comprises: decoding the encoded background modeling data to generate background modeling data; and generating a background image of the corresponding frame according to the background modeling data.
However, Jin discloses that the electronic device could have different candidate key point sets with reasonable compositions (Jin: paragraph [0079]). The candidate key point set could correspond to the background image (Jin: paragraph [0079]). In an embodiment, the electronic device could divide the preview image of the photographing scene into a foreground image and a background image (Jin: paragraph [0080]). And then, the electronic device could obtain a plurality of candidate key point sets corresponding to the background image and determine one of the plurality of candidate key point sets as the composition key point set corresponding to the photographing scene (Jin: paragraph [0080]).
Since Horsman and Jin relate to key point detection, one of ordinary skill in the art before the effective filing date would have found it obvious to combine the background detection of Jin with the system of Horsman in order to facilitate image quality improvement (Jin: paragraph [0004]).
Re claim 14, Horsman does not specifically disclose synthesizing the video sequence according to the personalized character model, the character image of the corresponding frame, and the background image of the corresponding frame.
However, Jin discloses that the electronic device could have different candidate key point sets with reasonable compositions (Jin: paragraph [0079]). The candidate key point set could correspond to the background image (Jin: paragraph [0079]). In an embodiment, the electronic device could divide the preview image of the photographing scene into a foreground image and a background image (Jin: paragraph [0080]). And then, the electronic device could obtain a plurality of candidate key point sets corresponding to the background image and determine one of the plurality of candidate key point sets as the composition key point set corresponding to the photographing scene (Jin: paragraph [0080]).
Since Horsman and Jin relate to key point detection, one of ordinary skill in the art before the effective filing date would have found it obvious to combine the background detection of Jin with the system of Horsman in order to facilitate image quality improvement (Jin: paragraph [0004]).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Contact
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHRISTOPHER G FINDLEY whose telephone number is (571)270-1199. The examiner can normally be reached Monday-Friday 9AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Chris Kelley, can be reached at (571)272-7331. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CHRISTOPHER G FINDLEY/Primary Examiner, Art Unit 2482