DETAILED ACTION
Notice of AIA Status
The present application is being examined under the first inventor to file provisions of the AIA.
Response to Amendments
Applicant’s arguments, see Remarks filed 01/16/2026, with respect to the claim objections, the 112(f) interpretation of claims 10-19, and the 112(a) and 112(b) rejections have been fully considered and are persuasive in view of the amendments; accordingly, those objections, interpretations, and rejections have been withdrawn.
Please note that, due to the amendments, claims 12 and 14 are no longer objected to because the 112(f) interpretation was overcome.
Response to Arguments
Applicant’s arguments, see Remarks filed 01/16/2026, with respect to the claim objections and the 112(f) interpretation of claims 1-9 have been fully considered but are not persuasive.
Applicant’s arguments, see Remarks filed 01/16/2026, with respect to claims 1-20 have been fully considered but are moot because the arguments do not apply to the combination of references used in the current rejection.
Claim Objections
Claims 1 and 11-18 are objected to because of the following informalities:
In Claim 1, Line 2, the term “an image capture device that generates” can be changed to “ ” to correct typographical/grammar issues, avoid clarity issues, and prevent interpretation under 35 U.S.C. 112(f). Note that a 112(f) interpretation is not a rejection but an interpretation that clearly shows what the Office understands a black-box structure to be.
In Claim 1, Line 7, the term “a set of instructions to:” should be changed to “a set of instructions that cause the processor to:” to correct typographical/grammar issues, avoid clarity issues, prevent interpretation under 35 U.S.C. 112(f), and enhance patent quality. Appropriate correction is required.
In Claim 11, Line 1, the term “The computer-implemented method of claim 10,” should be changed to “The computer-implemented method of claim 10, by the one or more processors” to correct typographical/grammar issues, avoid clarity issues, prevent interpretation under 35 U.S.C. 112(f), and enhance patent quality. Appropriate correction is required.
In Claim 12, Line 1, the term “The computer-implemented method of claim 10,” should be changed to “The computer-implemented method of claim 10, by the one or more processors” to correct typographical/grammar issues, avoid clarity issues, prevent interpretation under 35 U.S.C. 112(f), and enhance patent quality. Appropriate correction is required.
In Claim 13, Line 1, the term “The computer-implemented method of claim 10,” should be changed to “The computer-implemented method of claim 10, by the one or more processors” to correct typographical/grammar issues, avoid clarity issues, prevent interpretation under 35 U.S.C. 112(f), and enhance patent quality. Appropriate correction is required.
In Claim 14, Line 1, the term “The computer-implemented method of claim 10,” should be changed to “The computer-implemented method of claim 10, by the one or more processors” to correct typographical/grammar issues, avoid clarity issues, prevent interpretation under 35 U.S.C. 112(f), and enhance patent quality. Appropriate correction is required.
In Claim 15, Line 1, the term “The computer-implemented method of claim 10,” should be changed to “The computer-implemented method of claim 10, by the one or more processors” to correct typographical/grammar issues, avoid clarity issues, prevent interpretation under 35 U.S.C. 112(f), and enhance patent quality. Appropriate correction is required.
In Claim 16, Line 1, the term “The computer-implemented method of claim 10,” should be changed to “The computer-implemented method of claim 10, by the one or more processors” to correct typographical/grammar issues, avoid clarity issues, prevent interpretation under 35 U.S.C. 112(f), and enhance patent quality. Appropriate correction is required.
In Claim 17, Line 1, the term “The computer-implemented method of claim 16,” should be changed to “The computer-implemented method of claim 16, by the one or more processors” to correct typographical/grammar issues, avoid clarity issues, prevent interpretation under 35 U.S.C. 112(f), and enhance patent quality. Appropriate correction is required.
In Claim 18, Line 1, the term “The computer-implemented method of claim 10,” should be changed to “The computer-implemented method of claim 10, by the one or more processors” to correct typographical/grammar issues, avoid clarity issues, prevent interpretation under 35 U.S.C. 112(f), and enhance patent quality. Appropriate correction is required.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
Claims 1, 3, 5, and 7-9 recite limitations that use generic placeholders coupled with functional language, without reciting sufficient structure to perform the recited functions, and therefore invoke 35 U.S.C. 112(f):
Claim 1 recites the limitation “image capture device configured to…” [Line 2].
Claim 1 recites the limitation “image processing model to…” [Line 9].
Claim 3 recites the limitation “regression model to…” [Line 2].
Claim 5 recites the limitation “a regression model to…” [Line 6].
Claim 7 recites the limitation “image processing model is configured to…” [Line 1].
Claim 8 recites the limitation “image processing model is configured to…” [Line 1].
Claim 9 recites the limitation “image capture device is configured to…” [Line 1].
Because this/these claim limitation(s) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
After a careful analysis, as set forth above, and a careful review of the specification, the following corresponding structure has been identified for the limitations of claims 1, 3, 5, and 7-9:
“image capture device” (Fig. 2, #22a and 22b; Paragraph [0041] – “the input/output subsystem 6 can include an image capture device operative to obtain computer-readable image data from a selected portion of a physical environment. The image capture device can include any suitable image capture device, such as, for example, a charge-coupled device (CCD), an electron-multiplying charge-coupled device (EMCCD), a complementary metal-oxide-semiconductor (CMOS) device, a back-illuminated CMOS, and/or any other suitable image capture device. The image capture device can be configured to obtain still images and/or continuous images.” Thus, the limitation has sufficient corresponding structure, i.e., any kind of camera).
“image processing model” (Fig. 1, #24; Paragraph [0058] – “The trained image processing models can be configured to perform one or more computer vision processing tasks. For example, in some embodiments, one or more trained image processing models are configured to perform image recognition (e.g., detecting persons within image data), object localization (e.g., identifying a location of an object or person within the image data), object tracking (e.g., determining a moving or changing position of a detected object or person within the image data), and/or any other suitable image processing task. The trained image processing models can include any suitable image processing model, such as, for example, deep learning model or framework such as Convolutional Neural Networks (CNNs), Region-Based CNNs (R-CNNs), Faster R-CNN with Region Proposal Networks (RPN), Mask R-CNNs, You Only Look Once (YOLO) models, You Only Learn One Representation (YOLOR) models, and/or other suitable deep learning models. Although embodiments are discussed herein including deep learning models, it will be appreciated that any suitable machine learning framework configured to image processing tasks can be used.” Thus, the limitation has sufficient corresponding structure, i.e., any suitable machine learning model that performs image processing).
“regression model” (Paragraph [0112] – the regression model is understood as the trained regression model of Specification paragraph [0112]: “generalization of the at least one engagement metric 270 can be performed by a trained model, such as, for example, a trained regression model. For example, in some embodiments, a trained regression model can be configured to apply an auto regression defined by:

$$y_t = \sum_{k=0}^{p_1} \alpha_k T_{t-k} + \sum_{k=0}^{p_2} \beta_k C_{t-k} + \sum_{k=1}^{p_3} \gamma_k \hat{y}_{t-k} + \epsilon_t$$

where $y_t$ is the predicted time series representative of the at least one engagement metric 270 for an additional location, $T$ is a selected time-series feature (e.g., transactions, units sold, etc.), $C$ is a nearest cluster medoid time series (e.g., the engagement metric 270 represented as a time series for the selected representative environment), and $\alpha_k$, $\beta_k$, $\gamma_k$ are regression parameters generated during iterative training of the model. In some embodiments, the trained regression model is generated by an iterative training process based on actual and predicted time-series values for the at least one engagement metric 270.” Thus, the limitation has sufficient corresponding structure, i.e., an auto-regressor defined by the equation above, with the recited inputs and trained regression parameters).
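For illustration only, the following minimal Python sketch evaluates one step of an auto-regression of the quoted form; all orders, coefficients, and series values are hypothetical and are not drawn from the specification.

# Illustrative sketch: one prediction step of
# y_t = sum_k alpha_k*T_{t-k} + sum_k beta_k*C_{t-k} + sum_k gamma_k*yhat_{t-k} + eps_t.
def autoregress_step(T, C, y_hat, alpha, beta, gamma, eps=0.0):
    """Predict y_t from trailing windows of T, C, and prior predictions y_hat."""
    p1, p2, p3 = len(alpha) - 1, len(beta) - 1, len(gamma)
    y_t = sum(alpha[k] * T[-1 - k] for k in range(p1 + 1))          # feature terms
    y_t += sum(beta[k] * C[-1 - k] for k in range(p2 + 1))          # medoid terms
    y_t += sum(gamma[k - 1] * y_hat[-k] for k in range(1, p3 + 1))  # lagged predictions
    return y_t + eps

# Hypothetical inputs: transactions T, medoid series C, prior predictions y_hat.
T, C, y_hat = [120.0, 135.0, 150.0], [110.0, 125.0, 140.0], [118.0, 130.0]
print(autoregress_step(T, C, y_hat, alpha=[0.4, 0.1], beta=[0.3], gamma=[0.2, 0.05]))  # -> 147.4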
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 6-7, 9-10, 16, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Jayaraman et al. (US 20240119822 A1), hereafter referenced as Jayaraman, in view of Saurabh et al. (US 11367083 B1), hereafter referenced as Saurabh, Smith et al. (US 20180322660 A1), hereafter referenced as Smith, and Choi et al. (US 20230016304 A1), hereafter referenced as Choi.
Regarding claim 1, Jayaraman explicitly teaches a system (Fig. 2, Paragraph [0035]- Jayaraman discloses referring to FIG. 2, the system 200 may be one or more controllers, servers, and/or computers located in a security panel or part of a central computing system of a retail environment such as the retail environment 100 (referred to above in FIG. 1)),
a non-transitory memory (Fig. 2, Paragraph [0038]- Jayaraman discloses the memory 208 may include random access memory (RAM), read-only memory (ROM), hard drive storage, temporary storage, non-volatile memory, flash memory, optical memory, or any other suitable memory for storing software objects and/or computer instructions.);
and a processor communicatively coupled to the non-transitory memory (Fig. 2, Paragraph [0038]- Jayaraman discloses the memory 208 may include random access memory (RAM), read-only memory (ROM), hard drive storage, temporary storage, non-volatile memory, flash memory, optical memory, or any other suitable memory for storing software objects and/or computer instructions.),
wherein the processor is configured to read a set of instructions to: receive the image data (Fig. 2, Paragraph [0039]- Jayaraman discloses the system 200 is shown to be in communication with the one or more camera modules 220. In one example, the camera modules 220 may include an image sensor, video cameras, still image cameras, CCTV cameras, image and video processing systems for monitoring an environment (such as retail environment 100 referred above in FIG. 1). The camera modules 220 may provide data feed comprising one or more images, or video streams pertaining to the environment.);
implement an image processing model that generates a person count (Fig. 2, Paragraph [0055]- Jayaraman discloses the transaction data may include information such as, but not limited to, person count, group count, number of transactions, average transaction size, and sales amount (e.g., 38$ sale during 9 AM to 10 AM) during the predetermined time interval.) and dwell time (Fig. 2, Paragraph [0056]- Jayaraman discloses the system 200 is shown to include a conversion tracker 216 that may be configured to communicate with the transaction data receiver 214 and the dwell time detector 212. Further in Fig. 7, Paragraph [0078]- Jayaraman discloses dwell time component 715 may use person detection algorithms (e.g., computer vision and/or machine learning) to determine that a person has exited the environment by arriving at an exit point and not being detectable in a subsequent frame.),
Jayaraman fails to explicitly teach comprising: an image capture device that generates image data including an area of interest within a physical environment containing at least one engagement feature; and generate an engagement metric based on the person count and the dwell time, wherein the engagement metric is representative of engagement with the at least one engagement feature.
However, Saurabh explicitly teaches comprising: an image capture device to generate image data including an area of interest within a physical environment containing at least one engagement feature (Fig. 1, Column 12 Lines [60-65]- Saurabh discloses the plurality of means for capturing images 100 are set up in a way to cover a portion of the physical space in the vicinity of digital signage under consideration. The plurality of means for capturing images 100 may be installed in a sample of locations to build a representative sample.);
and generate an engagement metric based on the person count and the dwell time (Fig. 1, Column 11 Lines [63-67]- Saurabh discloses engagement index, defined as the percentage of audience members that notice the screen and engage with it. An audience member is said to engage with the digital signage if he or she looks toward the screen for more than a given duration.),
wherein the engagement metric is representative of engagement with the at least one engagement feature (Fig. 1, Column 11 Lines [63-67]- Saurabh discloses engagement index, defined as the percentage of audience members that notice the screen and engage with it. An audience member is said to engage with the digital signage if he or she looks toward the screen for more than a given duration. (wherein the signage is the engagement feature)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Jayaraman, of having a system with a non-transitory memory and a processor communicatively coupled to the non-transitory memory, wherein the processor is configured to read a set of instructions to receive the image data and implement an image processing model to generate a person count, with the teachings of Saurabh of comprising: an image capture device to generate image data including an area of interest within a physical environment containing at least one engagement feature; and generating an engagement metric based on the person count and the dwell time, wherein the engagement metric is representative of engagement with the at least one engagement feature.
That is, Jayaraman’s dwell-time detection system would be modified to comprise an image capture device to generate image data including an area of interest within a physical environment containing at least one engagement feature, and to generate an engagement metric based on the person count and the dwell time, wherein the engagement metric is representative of engagement with the at least one engagement feature.
The motivation behind the modification would have been to allow more data to be obtained, since Jayaraman and Saurabh both describe systems that calculate dwell times and person counts. Jayaraman’s system improves staff engagement and sales, while Saurabh’s system provides a way to calculate the effectiveness of engagement. Please see Jayaraman et al. (US 20240119822 A1), Paragraph [0059], and Saurabh et al. (US 11367083 B1), Column 8, Lines 9-23.
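For illustration only, a minimal Python sketch of an engagement metric in the spirit of Saurabh’s engagement index, computed from a person count and per-person dwell times near the engagement feature; the threshold and names are hypothetical.

# Illustrative sketch: fraction of counted persons whose dwell time in the
# area of interest exceeds a (hypothetical) engagement threshold.
def engagement_metric(dwell_times_s, min_dwell_s=5.0):
    person_count = len(dwell_times_s)   # one dwell time per counted person
    if person_count == 0:
        return 0.0
    engaged = sum(1 for t in dwell_times_s if t >= min_dwell_s)
    return engaged / person_count

print(engagement_metric([2.0, 7.5, 12.0, 1.0]))  # -> 0.5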
Jayaraman in view of Saurabh fails to explicitly teach generate model input image data including a plurality of cropped images by applying a zoom-in crop process to the image data; wherein the image processing model receives the model input image data as an input.
However, Smith explicitly teaches generate model input image data including a plurality of cropped images by applying a zoom-in crop process to the image data (Fig. 3, Paragraph [0047]- Smith discloses four images at a second magnification covering a region the size of subsection 302 may be obtained of the section 300. Sixteen images at a third magnification covering a region the size of subsection 304 may be obtained of the section 300. Thus, a plurality of images of the same region, at different magnifications may be obtained.);
wherein the image processing model receives the model input image data as an input (Fig. 6, Paragraph [0060]- Smith discloses the method 600 begins and a lab receives 602 a sample. An imaging system 102 digitizes 604 the sample into one or more images at one or more magnifications. A classification system processes 606 the digital image using a machine learning prediction model to classify or detect one or more particulates or materials of the sample.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Jayaraman in view of Saurabh, of having a system with a non-transitory memory and a processor communicatively coupled to the non-transitory memory, wherein the processor is configured to read a set of instructions to receive the image data and implement an image processing model to generate a person count, with the teachings of Smith of generating model input image data including a plurality of cropped images by applying a zoom-in crop process to the image data, wherein the image processing model receives the model input image data as an input.
That is, Jayaraman’s dwell-time detection system would be modified to generate model input image data including a plurality of cropped images by applying a zoom-in crop process to the image data, wherein the image processing model receives the model input image data as an input.
The motivation behind the modification would have been to allow greater classification accuracy, since Jayaraman and Smith both describe systems that track objects in images. Jayaraman’s system improves staff engagement and sales, while Smith’s system provides improved performance and accuracy. Please see Jayaraman et al. (US 20240119822 A1), Paragraph [0059], and Smith et al. (US 20180322660 A1), Paragraph [0062].
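For illustration only, a minimal Python sketch of one plausible zoom-in crop process that tiles a frame into cropped sub-images for model input; the grid size is hypothetical and this is not asserted to be Smith’s actual method.

# Illustrative sketch: split an HxWxC frame into grid*grid cropped sub-images,
# each covering a smaller region (a "zoomed-in" view when upscaled).
import numpy as np

def zoom_in_crops(frame, grid=2):
    h, w = frame.shape[:2]
    th, tw = h // grid, w // grid
    return [frame[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
            for r in range(grid) for c in range(grid)]

frame = np.zeros((480, 640, 3), dtype=np.uint8)   # stand-in camera frame
crops = zoom_in_crops(frame)
print(len(crops), crops[0].shape)                 # -> 4 (240, 320, 3)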
Jayaraman in view of Saurabh and Smith fails to explicitly teach a rotation process to the image data, that rotates the plurality of cropped images to a common orientation.
However, newly cited prior art Choi et al. (US 20230016304 A1) explicitly teaches a rotation process to the image data that rotates the plurality of cropped images to a common orientation (Fig. 5A, Paragraph [0045] – Choi discloses the transformations and rotations applied to the regions of interest 302-308 in the fisheye image 300 in order to form the transformed regions 502-508 can provide for the appropriate scales and rotations of pixel data so that all images in the collage image 500 can have the same or substantially the same orientation (even if at different scales).).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Jayaraman in view of Saurabh and Smith, of having a non-transitory memory and a processor communicatively coupled to the non-transitory memory, wherein the processor is configured to read a set of instructions to receive the image data and implement an image processing model to generate a person count, with the teachings of Choi of applying a rotation process to the image data that rotates the plurality of cropped images to a common orientation.
That is, Jayaraman’s dwell-time detection system would be modified to apply a rotation process to the image data that rotates the plurality of cropped images to a common orientation.
The motivation behind the modification would have been to allow simpler classification by rotating images upright, since Jayaraman and Choi both describe systems that detect objects in images. Jayaraman’s system improves staff engagement and sales, while Choi’s system provides a simple way to classify objects. Please see Jayaraman et al. (US 20240119822 A1), Paragraph [0059], and Choi et al. (US 20230016304 A1), Paragraph [0045].
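For illustration only, a minimal Python sketch of rotating cropped images to a common orientation in the spirit of Choi; the per-crop angles are hypothetical (in a fisheye arrangement they would depend on each crop’s position in the frame).

# Illustrative sketch: rotate each crop by a multiple of 90 degrees so all
# crops share a common (upright) orientation before model input.
import numpy as np

def rotate_to_common(crops, angles_deg):
    return [np.rot90(crop, k=int(angle // 90) % 4)
            for crop, angle in zip(crops, angles_deg)]

crops = [np.zeros((240, 320, 3), dtype=np.uint8) for _ in range(4)]
upright = rotate_to_common(crops, angles_deg=[0, 90, 180, 270])
print([c.shape[:2] for c in upright])  # mixed shapes reflect the applied rotations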
Regarding claim 6, Jayaraman in view of Saurabh, Smith, and Choi teaches the system of claim 1. Jayaraman in view of Smith and Choi fails to explicitly teach wherein the engagement metric is a time series metric.
However, Saurabh explicitly teaches wherein the engagement metric is a time series metric (Fig. 1, Column 11 Lines [21-26]- Saurabh discloses another way to approximate the average number of impressions of each type is to divide the ad spot into a finite number of equal time periods, measure the average number of impressions for the time periods, and then calculate the average of the average number of impressions across all time periods.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Jayaraman in view of Saurabh, Smith, and Choi, of having a system with a non-transitory memory and a processor communicatively coupled to the non-transitory memory, wherein the processor is configured to read a set of instructions to receive the image data and implement an image processing model to generate a person count, with the teachings of Saurabh wherein the engagement metric is a time series metric.
That is, Jayaraman’s dwell-time detection system would be modified such that the engagement metric is a time series metric.
The motivation behind the modification would have been to allow more data to be obtained, since Jayaraman and Saurabh both describe systems that calculate dwell times and person counts. Jayaraman’s system improves staff engagement and sales, while Saurabh’s system provides a way to calculate the effectiveness of engagement. Please see Jayaraman et al. (US 20240119822 A1), Paragraph [0059], and Saurabh et al. (US 11367083 B1), Column 8, Lines 9-23.
Regarding claim 7, Jayaraman in view of Saurabh, Smith, and Choi teaches the system of claim 1. Jayaraman further teaches wherein the image processing model is configured to: generate one or more bounding boxes corresponding to one or more persons within the image data (Fig. 1, Paragraph [0031] – Jayaraman discloses persons 106 depicted in camera frame 104 may enter and exit the retail environment 100 through a gateway 102 (e.g., an entry and egress point such as a door). FIG. 1 is shown to have a gateway having the entrance and exit ways adjacent to each other, however the entrance and exit ways may be at different points. Furthermore, the retail environment 100 may have more than one gateway to provide ease of access. (wherein Fig. 1 shows the persons 106 in bounding boxes));
determine a trajectory estimate for each of the one or more bounding boxes (Fig. 8, Paragraph [0094]- Jayaraman discloses a subsequent frame may depict a person in a portion of the frame where an entry point is located (e.g., bottom left quadrant of a frame). In a second frame, the same person may be detected in a different portion of the frame. The different portion may be away from the entry point (e.g., top right quadrant of the frame). Based on this trajectory, dwell time component 715 may determine that the person has moved through the entry point and into the environment.);
identify an entry event (Fig. 8, Paragraph [0094]- Jayaraman discloses based on this trajectory, dwell time component 715 may determine that the person has moved through the entry point and into the environment.) and an exit event for each of the one or more bounding boxes based on the trajectory estimate (Fig. 8, Paragraph [0098]- Jayaraman discloses based on this trajectory, dwell time component 715 may determine that the person has moved through the exit point and out of the environment.);
and output processed image data based on the entry event and exit event for each of the one or more bounding boxes (Fig. 3A and 3B, Paragraph [0043]- Jayaraman discloses the output of dwell time detector 212 is further shown in FIG. 3A. FIG. 3A shows an example 300 of retail store data showing each hour of the day (e.g., predetermined time intervals, number of exits during each hour of the day and an average time (minutes) spent by customers inside the retail store). As referred above, the average time spent may be computed using the FIFO ordering technique.),
wherein the person count is a count of entry events (Fig. 1, Paragraph [0010]- Jayaraman discloses the instructions are further executable to add, into a first-in-first-out (FIFO) queue, a respective identifier of a respective person entering the environment and an entry time for each of the plurality of indications.),
and wherein the dwell time for each of the one or more bounding boxes is a time difference between the entry event and exit event for each of the one or more bounding boxes (Fig. 8, Paragraph [0079]- Jayaraman discloses at block 808, the method 800 includes calculating an estimated dwell time of the respective person based on a difference in the entry time and the exit time.).
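For illustration only, a minimal Python sketch of FIFO-style entry/exit accounting in the spirit of Jayaraman’s technique as mapped above, where the person count is a count of entry events and each dwell time is an exit/entry time difference; the event encoding is hypothetical.

# Illustrative sketch: entries enqueue a timestamp, exits dequeue the oldest,
# and dwell time is the difference between exit time and entry time.
from collections import deque

def process_events(events):
    queue, person_count, dwell_times = deque(), 0, []
    for kind, ts in events:                 # events: ('entry'|'exit', seconds)
        if kind == "entry":
            person_count += 1               # person count as a count of entry events
            queue.append(ts)
        elif kind == "exit" and queue:
            dwell_times.append(ts - queue.popleft())
    return person_count, dwell_times

print(process_events([("entry", 0.0), ("entry", 5.0), ("exit", 60.0), ("exit", 95.0)]))
# -> (2, [60.0, 90.0])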
Regarding claim 9, Jayaraman in view of Saurabh, Smith, and Choi teaches the system of claim 1. Jayaraman further teaches wherein the image capture device is configured to generate the image data at a predetermined interval (Fig. 2, Paragraph [0041] – Jayaraman discloses the traffic data collector 210 may analyze the timestamped data feed to determine traffic data such as number of entries and number of exits from the environment during a predetermined time interval such as during each hour of a day.),
and wherein the image data includes a predetermined length (Fig. 2, Paragraph [0041]- Jayaraman discloses the traffic data collector 210 may analyze the timestamped data feed to determine traffic data such as number of entries and number of exits from the environment during a predetermined time interval such as during each hour of a day. (wherein the length is 1 hour)).
Regarding claim 10, Jayaraman explicitly teaches a computer-implemented method by one or more processors (Fig. 6, Paragraph [0060]- Jayaraman discloses now referring to FIG. 6, a flowchart illustrating method 600 for determining customer dwell time, is shown in accordance with exemplary aspects of the present disclosure. In some aspects, the method 600 is performed by the system 200.),
the method comprising: receiving image data from an image capture device (Fig. 2, Paragraph [0039]- Jayaraman discloses the system 200 is shown to be in communication with the one or more camera modules 220. In one example, the camera modules 220 may include an image sensor, video cameras, still image cameras, CCTV cameras, image and video processing systems for monitoring an environment (such as retail environment 100 referred above in FIG. 1). The camera modules 220 may provide data feed comprising one or more images, or video streams pertaining to the environment.),
implementing an image processing model that generates a person count (Fig. 2, Paragraph [0055]- Jayaraman discloses the transaction data may include information such as, but not limited to, person count, group count, number of transactions, average transaction size, and sales amount (e.g., 38$ sale during 9 AM to 10 AM) during the predetermined time interval.) and dwell time (Fig. 2, Paragraph [0056]- Jayaraman discloses the system 200 is shown to include a conversion tracker 216 that may be configured to communicate with the transaction data receiver 214 and the dwell time detector 212. Further in Fig. 7, Paragraph [0078]- Jayaraman discloses dwell time component 715 may use person detection algorithms (e.g., computer vision and/or machine learning) to determine that a person has exited the environment by arriving at an exit point and not being detectable in a subsequent frame.),
Jayaraman fails to explicitly teach wherein the image data includes an area of interest within a physical environment containing at least one engagement feature; and generating an engagement metric based on the person count and the dwell time, wherein the engagement metric is representative of engagement with the at least one engagement feature.
However, Saurabh explicitly teaches wherein the image data includes an area of interest within a physical environment containing at least one engagement feature (Fig. 1, Column 12 Lines [60-65]- Saurabh discloses the plurality of means for capturing images 100 are set up in a way to cover a portion of the physical space in the vicinity of digital signage under consideration. The plurality of means for capturing images 100 may be installed in a sample of locations to build a representative sample.);
and generating an engagement metric based on the person count and the dwell time (Fig. 1, Column 11 Lines [63-67]- Saurabh discloses engagement index, defined as the percentage of audience members that notice the screen and engage with it. An audience member is said to engage with the digital signage if he or she looks toward the screen for more than a given duration.),
wherein the engagement metric is representative of engagement with the at least one engagement feature (Fig. 1, Column 11 Lines [63-67]- Saurabh discloses engagement index, defined as the percentage of audience members that notice the screen and engage with it. An audience member is said to engage with the digital signage if he or she looks toward the screen for more than a given duration. (wherein the signage is the engagement feature)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Jayaraman, of a computer-implemented method comprising receiving image data from an image capture device and implementing an image processing model to generate a person count, with the teachings of Saurabh wherein the image data includes an area of interest within a physical environment containing at least one engagement feature; and generating an engagement metric based on the person count and the dwell time, wherein the engagement metric is representative of engagement with the at least one engagement feature.
That is, Jayaraman’s dwell-time detection method would be modified such that the image data includes an area of interest within a physical environment containing at least one engagement feature, and such that an engagement metric is generated based on the person count and the dwell time, wherein the engagement metric is representative of engagement with the at least one engagement feature.
The motivation behind the modification would have been to allow more data to be obtained, since Jayaraman and Saurabh both describe systems that calculate dwell times and person counts. Jayaraman’s system improves staff engagement and sales, while Saurabh’s system provides a way to calculate the effectiveness of engagement. Please see Jayaraman et al. (US 20240119822 A1), Paragraph [0059], and Saurabh et al. (US 11367083 B1), Column 8, Lines 9-23.
Jayaraman in view of Saurabh fails to explicitly teach generating model input image data including a plurality of cropped images by applying a zoom-in crop process to the image data; wherein the image processing model receives the model input image data as an input.
However, Smith explicitly teaches generating model input image data including a plurality of cropped images by applying a zoom-in crop process to the image data (Fig. 3, Paragraph [0057]- Smith discloses four images at a second magnification covering a region the size of subsection 302 may be obtained of the section 300. Sixteen images at a third magnification covering a region the size of subsection 304 may be obtained of the section 300. Thus, a plurality of images of the same region, at different magnifications may be obtained.);
wherein the image processing model receives the model input image data as an input (Fig. 6, Paragraph [0060]- Smith discloses the method 600 begins and a lab receives 602 a sample. An imaging system 102 digitizes 604 the sample into one or more images at one or more magnifications. A classification system processes 606 the digital image using a machine learning prediction model to classify or detect one or more particulates or materials of the sample.);
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Jayaraman in view of Saurabh, of a computer-implemented method comprising receiving image data from an image capture device and implementing an image processing model to generate a person count, with the teachings of Smith of generating model input image data including a plurality of cropped images by applying a zoom-in crop process to the image data, wherein the image processing model receives the model input image data as an input.
That is, Jayaraman’s dwell-time detection method would be modified to generate model input image data including a plurality of cropped images by applying a zoom-in crop process to the image data, wherein the image processing model receives the model input image data as an input.
The motivation behind the modification would have been to allow greater classification accuracy, since Jayaraman and Smith both describe systems that track objects in images. Jayaraman’s system improves staff engagement and sales, while Smith’s system provides improved performance and accuracy. Please see Jayaraman et al. (US 20240119822 A1), Paragraph [0059], and Smith et al. (US 20180322660 A1), Paragraph [0062].
Jayaraman in view of Saurabh and Smith fails to explicitly teach a rotation process to the image data, that rotates the plurality of cropped images to a common orientation.
However, Choi explicitly teaches a rotation process to the image data that rotates the plurality of cropped images to a common orientation (Fig. 5A, Paragraph [0045] – Choi discloses the transformations and rotations applied to the regions of interest 302-308 in the fisheye image 300 in order to form the transformed regions 502-508 can provide for the appropriate scales and rotations of pixel data so that all images in the collage image 500 can have the same or substantially the same orientation (even if at different scales).).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Jayaraman in view of Saurabh and Smith, of a computer-implemented method comprising receiving image data from an image capture device and implementing an image processing model to generate a person count, with the teachings of Choi of applying a rotation process to the image data that rotates the plurality of cropped images to a common orientation.
That is, Jayaraman’s dwell-time detection method would be modified to apply a rotation process to the image data that rotates the plurality of cropped images to a common orientation.
The motivation behind the modification would have been to allow simpler classification by rotating images upright, since Jayaraman and Choi both describe systems that detect objects in images. Jayaraman’s system improves staff engagement and sales, while Choi’s system provides a simple way to classify objects. Please see Jayaraman et al. (US 20240119822 A1), Paragraph [0059], and Choi et al. (US 20230016304 A1), Paragraph [0045].
Regarding claim 16, Jayaraman in view of Saurabh, Smith, and Choi teaches the computer-implemented method of claim 10. Jayaraman further teaches wherein the image processing model: generates one or more bounding boxes corresponding to one or more persons within the image data (Fig. 1, Paragraph [0031] – Jayaraman discloses persons 106 depicted in camera frame 104 may enter and exit the retail environment 100 through a gateway 102 (e.g., an entry and egress point such as a door). FIG. 1 is shown to have a gateway having the entrance and exit ways adjacent to each other, however the entrance and exit ways may be at different points. Furthermore, the retail environment 100 may have more than one gateway to provide ease of access. (wherein Fig. 1 shows the persons 106 in bounding boxes));
determines a trajectory estimate for each of the one or more bounding boxes (Fig. 8, Paragraph [0094]- Jayaraman discloses a subsequent frame may depict a person in a portion of the frame where an entry point is located (e.g., bottom left quadrant of a frame). In a second frame, the same person may be detected in a different portion of the frame. The different portion may be away from the entry point (e.g., top right quadrant of the frame). Based on this trajectory, dwell time component 715 may determine that the person has moved through the entry point and into the environment.);
identifies an entry event (Fig. 8, Paragraph [0094]- Jayaraman discloses based on this trajectory, dwell time component 715 may determine that the person has moved through the entry point and into the environment.) and an exit event for each of the one or more bounding boxes based on the trajectory estimate (Fig. 8, Paragraph [0098]- Jayaraman discloses based on this trajectory, dwell time component 715 may determine that the person has moved through the exit point and out of the environment.);
and outputs processed image data based on the entry event and exit event for each of the one or more bounding boxes (Fig. 3A and 3B, Paragraph [0043]- Jayaraman discloses the output of dwell time detector 212 is further shown in FIG. 3A. FIG. 3A shows an example 300 of retail store data showing each hour of the day (e.g., predetermined time intervals, number of exits during each hour of the day and an average time (minutes) spent by customers inside the retail store). As referred above, the average time spent may be computed using the FIFO ordering technique.),
wherein the person count is a count of entry events (Fig. 1, Paragraph [0010]- Jayaraman discloses the instructions are further executable to add, into a first-in-first-out (FIFO) queue, a respective identifier of a respective person entering the environment and an entry time for each of the plurality of indications.),
and wherein the dwell time for each of the one or more bounding boxes is a time difference between the entry event and exit event for each of the one or more bounding boxes (Fig. 8, Paragraph [0079]- Jayaraman discloses at block 808, the method 800 includes calculating an estimated dwell time of the respective person based on a difference in the entry time and the exit time.).
Regarding claim 18, Jayaraman in view of Saurabh, Smith, and Choi teaches the computer-implemented method of claim 10. Jayaraman further teaches wherein the image capture device generates the image data at a predetermined interval (Fig. 2, Paragraph [0041] – Jayaraman discloses one or more filters may be applied to the traffic data for pre-processing the traffic data. For example, the traffic data may be filtered for normal operating hours. For example, normal operating hours may indicate operating hours of a retail store such as between 9 AM to 8 PM. (wherein the interval is the operating hours)),
and wherein the image data includes a predetermined length (Fig. 2, Paragraph [0041]- Jayaraman discloses the traffic data collector 210 may analyze the timestamped data feed to determine traffic data such as number of entries and number of exits from the environment during a predetermined time interval such as during each hour of a day. (wherein the length is 1 hour)).
Claims 2 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Jayaraman et al. (US 20240119822 A1), hereafter referenced as Jayaraman, in view of Saurabh et al. (US 11367083 B1), hereafter referenced as Saurabh, Smith et al. (US 20180322660 A1), hereafter referenced as Smith, Choi et al. (US 20230016304 A1), hereafter referenced as Choi, and Qian et al. (US 20220083782 A1), hereafter referenced as Qian.
Regarding claim 2, Jayaraman in view of Saurabh, Smith, and Choi teaches the system of claim 1. Jayaraman in view of Saurabh, Smith, and Choi fails to explicitly teach wherein the model input image data is generated by applying frame differencing to the plurality of cropped images.
However, Qian explicitly teaches wherein the model input image data is generated by applying frame differencing to the plurality of cropped images (Fig. 3, Paragraph [0068]- Qian discloses the image cropping module 308 is configured to generate one or more cropped frame difference images with respect to candidate packages identified in frame difference images output by the frame differencing module 306. For example, the image cropping module 306 receives data corresponding to the frame difference image generated by the frame differencing module 306 and generates cropped frame difference images.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Jayaraman in view of Saurabh, Smith, and Choi, of having a system with a non-transitory memory and a processor communicatively coupled to the non-transitory memory, wherein the processor is configured to read a set of instructions to receive the image data and implement an image processing model to generate a person count, with the teachings of Qian wherein the model input image data is generated by applying frame differencing to the plurality of cropped images.
That is, Jayaraman’s dwell-time detection system would be modified such that the model input image data is generated by applying frame differencing to the plurality of cropped images.
The motivation behind the modification would have been to allow better motion detection, since Jayaraman and Qian both describe systems that track objects in images. Jayaraman’s system improves staff engagement and sales, while Qian’s system provides an improved form of motion detection. Please see Jayaraman et al. (US 20240119822 A1), Paragraph [0059], and Qian et al. (US 20220083782 A1), Paragraph [0067].
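For illustration only, a minimal Python sketch of frame differencing over a pair of same-sized cropped frames; the threshold is hypothetical and this is not asserted to be Qian’s actual implementation.

# Illustrative sketch: absolute pixel difference between consecutive crops,
# thresholded into a binary motion mask.
import numpy as np

def frame_difference(prev_crop, curr_crop, thresh=25):
    diff = np.abs(curr_crop.astype(np.int16) - prev_crop.astype(np.int16))
    return (diff.max(axis=-1) > thresh).astype(np.uint8)  # 1 where motion occurred

prev_c = np.zeros((240, 320, 3), dtype=np.uint8)
curr_c = prev_c.copy()
curr_c[100:120, 150:170] = 200                 # simulated moving region
print(frame_difference(prev_c, curr_c).sum())  # -> 400 changed pixels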
Regarding claim 11, Jayaraman in view of Saurabh, Smith, and Choi teaches the computer-implemented method of claim 10. Jayaraman in view of Saurabh, Smith, and Choi fails to explicitly teach wherein the model input image data is generated by applying frame differencing to the plurality of cropped images.
However, Qian explicitly teaches wherein the model input image data is generated by applying frame differencing to the plurality of cropped images (Fig. 3, Paragraph [0068]- Qian discloses the image cropping module 308 is configured to generate one or more cropped frame difference images with respect to candidate packages identified in frame difference images output by the frame differencing module 306. For example, the image cropping module 306 receives data corresponding to the frame difference image generated by the frame differencing module 306 and generates cropped frame difference images.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Jayaraman in view of Saurabh, Smith, and Choi, of a computer-implemented method comprising receiving image data from an image capture device and implementing an image processing model to generate a person count, with the teachings of Qian wherein the model input image data is generated by applying frame differencing to the plurality of cropped images.
That is, Jayaraman’s dwell-time detection method would be modified such that the model input image data is generated by applying frame differencing to the plurality of cropped images.
The motivation behind the modification would have been to allow better motion detection, since Jayaraman and Qian both describe systems that track objects in images. Jayaraman’s system improves staff engagement and sales, while Qian’s system provides an improved form of motion detection. Please see Jayaraman et al. (US 20240119822 A1), Paragraph [0059], and Qian et al. (US 20220083782 A1), Paragraph [0067].
Claims 8 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Jayaraman et al. (US 20240119822 A1), hereafter referenced as Jayaraman, in view of Saurabh et al. (US 11367083 B1), hereafter referenced as Saurabh, Smith et al. (US 20180322660 A1), hereafter referenced as Smith, Choi et al. (US 20230016304 A1), hereafter referenced as Choi, and Liu et al. (US 20210229292 A1), hereafter referenced as Liu.
Regarding claim 8, Jayaraman in view of Saurabh, Smith, and Choi teaches the system of claim 7. Jayaraman in view of Saurabh, Smith, and Choi fails to explicitly teach wherein the image processing model is configured to apply a region of interest process prior to generating the one or more bounding boxes.
However, Liu explicitly teaches wherein the image processing model is configured to apply a region of interest process prior to generating the one or more bounding boxes (Fig. 6A, Paragraph [0053]- Liu discloses before the bounding box predictions are made, a mask is produced for the visual area of the object, as is shown by the yellow mask in the adjacent image.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Jayaraman in view of Saurabh, Smith, and Choi, of having a system with a non-transitory memory and a processor communicatively coupled to the non-transitory memory, wherein the processor is configured to read a set of instructions to receive the image data and implement an image processing model to generate a person count, with the teachings of Liu wherein the image processing model is configured to apply a region of interest process prior to generating the one or more bounding boxes.
That is, Jayaraman’s dwell-time detection system would be modified such that the image processing model is configured to apply a region of interest process prior to generating the one or more bounding boxes.
The motivation behind the modification would have been to allow more accurate and efficient predictions, since Jayaraman and Liu both describe systems that track objects in images. Jayaraman’s system improves staff engagement and sales, while Liu’s system provides an improvement to the accuracy and efficiency of movement predictions. Please see Jayaraman et al. (US 20240119822 A1), Paragraph [0059], and Liu et al. (US 20210229292 A1), Paragraph [0029].
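For illustration only, a minimal Python sketch of applying a region-of-interest process prior to detection, simplified to a rectangular mask; the coordinates are hypothetical and this is not asserted to be Liu’s actual masking.

# Illustrative sketch: zero out pixels outside the ROI so that bounding boxes
# are subsequently generated only from the region of interest.
import numpy as np

def apply_roi(frame, roi):
    x0, y0, x1, y1 = roi                  # ROI rectangle in pixel coordinates
    mask = np.zeros(frame.shape[:2], dtype=bool)
    mask[y0:y1, x0:x1] = True
    out = frame.copy()
    out[~mask] = 0
    return out

frame = np.full((480, 640, 3), 128, dtype=np.uint8)
masked = apply_roi(frame, (100, 50, 540, 430))
print(masked[0, 0].tolist(), masked[60, 110].tolist())  # -> [0, 0, 0] [128, 128, 128]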
Regarding claim 17, Jayaraman in view of Saurabh, Smith, and Choi teaches the computer-implemented method of claim 16. Jayaraman in view of Saurabh, Smith, and Choi fails to explicitly teach wherein the image processing model applies a region of interest process prior to generating the one or more bounding boxes.
However, Liu explicitly teaches wherein the image processing model applies a region of interest process prior to generating the one or more bounding boxes (Fig. 6A, Paragraph [0053]- Liu discloses before the bounding box predictions are made, a mask is produced for the visual area of the object, as is shown by the yellow mask in the adjacent image.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Jayaraman in view of Saurabh, Smith, and Choi, of a computer-implemented method comprising receiving image data from an image capture device and implementing an image processing model to generate a person count, with the teachings of Liu wherein the image processing model applies a region of interest process prior to generating the one or more bounding boxes.
That is, Jayaraman’s dwell-time detection method would be modified such that the image processing model applies a region of interest process prior to generating the one or more bounding boxes.
The motivation behind the modification would have been to allow more accurate and efficient predictions, since Jayaraman and Liu both describe systems that track objects in images. Jayaraman’s system improves staff engagement and sales, while Liu’s system provides an improvement to the accuracy and efficiency of movement predictions. Please see Jayaraman et al. (US 20240119822 A1), Paragraph [0059], and Liu et al. (US 20210229292 A1), Paragraph [0029].
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Jayaraman et al. (US 20240119822 A1), hereafter referenced as Jayaraman, in view of Saurabh et al. (US 11367083 B1), hereafter referenced as Saurabh, Smith et al. (US 20180322660 A1), hereafter referenced as Smith, Choi et al. (US 20230016304 A1), hereafter referenced as Choi, and Chembakassery Rajendran et al. (US 20230267742 A1), hereafter referenced as Chembakassery Rajendran.
Regarding claim 12, Jayaraman in view of Saurabh, Smith, and Choi teaches the computer-implemented method of claim 10. Jayaraman in view of Saurabh, Smith, and Choi fails to explicitly teach implementing a trained regression model to generate a predicted person count, wherein the trained regression model receives the person count as an input.
However, Chembakassery Rajendran explicitly teaches implementing a trained regression model to generate a predicted person count (Fig. 6, Paragraph [0054] – Chembakassery Rajendran discloses the image is fed into a regression model to generate a second result for the image that is indicative of a regression count of people detected in the one or more regions of the image. An estimate of people detected in the image is obtained based on the first result and the second result using one or more rules.),
wherein the trained regression model receives the person count as an input (Fig. 6, Paragraph [0054]- Chembakassery Rajendran discloses a method 600 for counting number of people in an image includes feeding an image into an object counting model to generate a first result for the image that is indicative of a number of human heads detected in one or more regions of the image. The image is fed into a regression model to generate a second result for the image that is indicative of a regression count of people detected in the one or more regions of the image. An estimate of people detected in the image is obtained based on the first result and the second result using one or more rules.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Jayaraman in view of Saurabh, Smith, and Choi, of a computer-implemented method comprising receiving image data from an image capture device and implementing an image processing model to generate a person count, with the teachings of Chembakassery Rajendran of implementing a trained regression model to generate a predicted person count, wherein the trained regression model receives the person count as an input.
That is, Jayaraman’s dwell-time detection method would be modified to implement a trained regression model to generate a predicted person count, wherein the trained regression model receives the person count as an input.
The motivation behind the modification would have been to allow more accurate counting, since Jayaraman and Chembakassery Rajendran both describe systems that determine person counts. Jayaraman’s system improves staff engagement and sales, while Chembakassery Rajendran’s system provides a way to improve the performance and accuracy of people counting. Please see Jayaraman et al. (US 20240119822 A1), Paragraph [0059], and Chembakassery Rajendran et al. (US 20230267742 A1), Paragraph [0032].
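For illustration only, a minimal Python sketch of reconciling a detection-based person count with a regression-model count via a simple rule, in the spirit of the two-result approach mapped above; the rule and tolerance are hypothetical.

# Illustrative sketch: prefer the detector count unless the regression count
# diverges widely, in which case fall back to the rounded mean.
def estimate_people(detector_count, regression_count, tol=0.25):
    if detector_count == 0:
        return round(regression_count)
    if abs(regression_count - detector_count) / detector_count <= tol:
        return detector_count
    return round((detector_count + regression_count) / 2)

print(estimate_people(10, 11.2))  # -> 10 (within tolerance)
print(estimate_people(10, 18.0))  # -> 14 (divergent; averaged)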
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Jayaraman et al. (US 20240119822 A1) hereafter referenced as Jayaraman in view of Saurabh et al. (US 11367083 B1) hereafter referenced as Saurabh, Smith et al. (US 20180322660 A1) hereafter referenced as Smith, Choi et al. (US 20230016304 A1) hereafter referenced as Choi, and Zaman et al. (US 20230081797 A1) hereafter referenced as Zaman.
Regarding claim 14, Jayaraman in view of Saurabh, Smith, and Choi teaches the computer-implemented method of claim 10. However, Jayaraman in view of Saurabh, Smith, and Choi fails to explicitly teach wherein the physical environment is a selected one of a plurality of physical environments, the computer-implemented method comprising: generating a plurality of clusters, wherein a selected one of the plurality of clusters includes the selected one of the plurality of physical environments and at least one additional physical environment; implementing a trained regression model to generate an estimated engagement metric for each of the additional physical environments, wherein the trained regression model is generated by training data including one or more features of each of the plurality of physical environments, and wherein the regression model receives the engagement metric as an input.
However, Zaman explicitly teaches wherein the physical environment is a selected one of a plurality of physical environments (Fig. 1, Paragraph [0062]- Zaman discloses the geographical location 108 includes multiple stores 102 such as store A, store B and store C as shown in FIG. 1.),
the computer-implemented method comprising: generating a plurality of clusters, wherein a selected one of the plurality of clusters includes the selected one of the plurality of physical environments and at least one additional physical environment (Fig. 30 Paragraph [0144]- Zaman discloses the DBAS/DSAI Engine may take an unsupervised learning/hierarchical clustering approach. The unsupervised learning may be used to identify areas where a lot of commercial POIs are in close proximity taking that as a sign of a marketplace, and as a consequence, mark the said location as an economic zone or a potential store location.);
implementing a trained regression model to generate an estimated engagement metric for each of the additional physical environments (Fig. 1, Paragraph [0062]- Zaman discloses that each store, such as the store A, the store B, and the store C, may be of different sizes and types. The retail management system takes into account multiple factors to predict one or more parameters for retail management, such as optimum inventory and new store location, based on different parameters associated with demographics, consumer behavior, commercial activity, competition, etc.),
wherein the trained regression model is generated by training data including one or more features of each of the plurality of physical environments (Fig. 29, Paragraph [0142]- Zaman discloses that during the validation process, a set of data points related to sales information for certain existing stores is kept aside and not provided to the machine learning model. After the model is trained, the model is validated to calculate the store's success probability. Thereafter, the model is validated using the validation data to find its proximity to the real world. An 80-20 split is made at random to get training and validation data for each catchment radius.),
and wherein the regression model receives the engagement metric as an input (Fig. 3, Paragraph [0068]- Zaman discloses the predictive analysis module 322 may provide prediction based on the input received from one or more sources such as but not limited to statistical module 318, business logic module 320, artificial intelligence module 308 and other modules. Further in Fig. 3, Paragraph [0070]- Zaman discloses the artificial intelligence module 308 may implement data analytics algorithms in the data analytics module 314. Additionally, the data analytics module 314 may be associated with rule-based module 312 to evaluate different parameters relevant for analysis of different parameters associated with retail management. The artificial intelligence module 308 may further include the machine learning module 310 that implements machine learning algorithms for supervised and unsupervised learning. The machine learning module is interfaced with the rule based module 312 and the data analytics module 314 and act in unison for data analysis and training of algorithms.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Jayaraman in view of Saurabh, Smith, and Choi, of a computer-implemented method comprising receiving image data from an image capture device and implementing an image processing model to generate a person count, with the teachings of Zaman wherein the physical environment is a selected one of a plurality of physical environments, the computer-implemented method comprising: generating a plurality of clusters, wherein a selected one of the plurality of clusters includes the selected one of the plurality of physical environments and at least one additional physical environment; implementing a trained regression model to generate an estimated engagement metric for each of the additional physical environments, wherein the trained regression model is generated by training data including one or more features of each of the plurality of physical environments, and wherein the regression model receives the engagement metric as an input.
The combination results in Jayaraman’s dwell-time detection system wherein the physical environment is a selected one of a plurality of physical environments, the computer-implemented method comprising: generating a plurality of clusters, wherein a selected one of the plurality of clusters includes the selected one of the plurality of physical environments and at least one additional physical environment; implementing a trained regression model to generate an estimated engagement metric for each of the additional physical environments, wherein the trained regression model is generated by training data including one or more features of each of the plurality of physical environments, and wherein the regression model receives the engagement metric as an input.
The motivation behind the modification would have been to allow for better interpretation of data, since both Jayaraman and Zaman are systems that determine information about a store. Jayaraman’s system provides improved staff engagement and sales, while Zaman’s system provides a way to improve the interpretation and accuracy of the data. Please see Jayaraman et al. (US 20240119822 A1), Paragraph [0059] and Zaman et al. (US 20230081797 A1), Paragraph [0020].
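For illustration only, the clustering-plus-regression pattern paraphrased from Zaman can be sketched as follows; the store features, cluster count, and model choices are invented for the example. Note that this sketch uses the known engagement metrics as regression targets, which is one hedged reading of the claim’s “receives the engagement metric as an input.”

```python
# Cluster stores (physical environments) by their features, then train a
# regression model within the selected store's cluster to estimate an
# engagement metric for the additional environments in that cluster.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Rows: stores; columns: illustrative features (size, foot traffic, income index).
store_features = np.array([[1200, 450, 0.70],
                           [1150, 430, 0.68],
                           [300, 90, 0.40],
                           [320, 100, 0.42]])
engagement = np.array([0.81, 0.78, 0.35, 0.33])  # known engagement metrics

# Generate a plurality of clusters over the physical environments.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(store_features)

# Select the cluster containing store 0 and its additional environments.
selected = clusters == clusters[0]

# Train on the selected cluster's features and engagement metrics, then
# estimate the metric for each environment in the cluster.
model = LinearRegression().fit(store_features[selected], engagement[selected])
estimates = model.predict(store_features[selected])
print(dict(zip(np.flatnonzero(selected).tolist(), estimates)))
```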
Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Jayaraman et al. (US 20240119822 A1) hereafter referenced as Jayaraman in view of Saurabh et al. (US 11367083 B1) hereafter referenced as Saurabh, Smith et al. (US 20180322660 A1) hereafter referenced as Smith, Choi et al. (US 20230016304 A1) hereafter referenced as Choi, and Green et al. (US 20210110193 A1) hereafter referenced as Green.
Regarding claim 15, Jayaraman in view of Saurabh, Smith, and Choi teaches the computer-implemented method of claim 10. However, Jayaraman in view of Smith and Choi fails to explicitly teach a time series metric.
However, Saurabh explicitly teaches a time series metric (Fig. 1, Column 11, Lines [21-26]- Saurabh discloses that another way to approximate the average number of impressions of each type is to divide the ad spot into a finite number of equal time periods, measure the average number of impressions for the time periods, and then calculate the average of the average number of impressions across all time periods.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Jayaraman in view of Smith and Choi, of a computer-implemented method comprising receiving image data from an image capture device and implementing an image processing model to generate a person count, with the teachings of Saurabh of a time series metric.
The combination results in Jayaraman’s dwell-time detection system wherein the engagement metric is a time series metric.
The motivation behind the modification would have been to allow for more data to be obtained, since both Jayaraman and Saurabh are systems that calculate dwell time and person counts. Jayaraman’s system provides improved staff engagement and sales, while Saurabh’s system provides a way to calculate the effectiveness of engagement. Please see Jayaraman et al. (US 20240119822 A1), Paragraph [0059] and Saurabh et al. (US 11367083 B1), Column 8, Lines [9-23].
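For illustration only, the averaging scheme paraphrased from Saurabh works out as follows on invented sample numbers: the ad spot is divided into equal time periods, the average impressions are measured per period, and those per-period averages are averaged.

```python
# Per-period impression samples for an ad spot divided into three periods.
impressions_per_period = [[4, 6, 5], [7, 8], [3, 2, 4, 3]]

# Average impressions within each time period.
period_averages = [sum(p) / len(p) for p in impressions_per_period]

# Average of the per-period averages across all time periods.
overall_average = sum(period_averages) / len(period_averages)
print(period_averages, overall_average)  # [5.0, 7.5, 3.0] 5.166...
```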
Jayaraman in view of Saurabh, Smith, and Choi further fails to explicitly teach wherein the engagement metric is an aggregated engagement score.
However, Green explicitly teaches wherein the engagement metric is an aggregated engagement score (Fig. 3, Paragraph [0062]- Green discloses that upon periodically executing the engagement classifier over a time period, the system can then aggregate the engagement conditions into the set of engagement conditions for the workstation and calculate a workstation engagement rate based on the set of engagement conditions.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Jayaraman in view of Saurabh and Smith, of a computer-implemented method comprising receiving image data from an image capture device and implementing an image processing model to generate a person count, with the teachings of Green wherein the engagement metric is an aggregated engagement score.
The combination results in Jayaraman’s dwell-time detection system wherein the engagement metric is an aggregated engagement score.
The motivation behind the modification would have been to allow for more accurate data to be obtained, since both Jayaraman and Green are systems that use object detection and classification. Jayaraman’s system provides improved staff engagement and sales, while Green’s system provides a way to improve the accuracy of the system. Please see Jayaraman et al. (US 20240119822 A1), Paragraph [0059] and Green et al. (US 20210110193 A1), Paragraph [0014].
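For illustration only, the aggregation paraphrased from Green can be sketched as follows; the periodic classifier outputs are invented booleans standing in for the vision system’s engagement conditions.

```python
# Engagement conditions from periodically executing an engagement
# classifier over a time period (True = workstation engaged).
engagement_conditions = [True, True, False, True, False, True, True, True]

# Aggregated engagement score: fraction of sampled intervals classified
# as engaged, i.e., the workstation engagement rate.
engagement_rate = sum(engagement_conditions) / len(engagement_conditions)
print(f"workstation engagement rate: {engagement_rate:.2f}")  # 0.75
```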
Claims 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Jayaraman et al. (US 20240119822 A1) hereafter referenced as Jayaraman in view of Saurabh et al. (US 11367083 B1) hereafter referenced as Saurabh and Kim et al. (US 20210042530 A1) hereafter referenced as Kim.
Regarding claim 19, Jayaraman explicitly teaches a computer-implemented method by one or more processors (Fig. 6, Paragraph [0060]- Jayaraman discloses, now referring to FIG. 6, a flowchart illustrating method 600 for determining customer dwell time, in accordance with exemplary aspects of the present disclosure; in some aspects, the method 600 is performed by the system 200.).
However, Jayaraman fails to explicitly teach the method comprising: receiving a first training dataset including image data representative of an area of interest within a physical environment containing at least one engagement feature.
However, Saurabh explicitly teaches the method comprising: receiving a first training dataset including image data representative of an area of interest within a physical environment containing at least one engagement feature (Fig. 1, Column 12, Lines [60-65]- Saurabh discloses that the plurality of means for capturing images 100 are set up in a way to cover a portion of the physical space in the vicinity of digital signage under consideration. The plurality of means for capturing images 100 may be installed in a sample of locations to build a representative sample.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Jayaraman, of a computer-implemented method generating output image data including the image data and the one or more bounding boxes, with the teachings of Saurabh of the method comprising: receiving a first training dataset including image data representative of an area of interest within a physical environment containing at least one engagement feature.
The combination results in Jayaraman’s dwell-time detection system wherein the method comprises receiving a first training dataset including image data representative of an area of interest within a physical environment containing at least one engagement feature.
The motivation behind the modification would have been to allow for more data to be obtained, since both Jayaraman and Saurabh are systems that calculate dwell time and person counts. Jayaraman’s system provides improved staff engagement and sales, while Saurabh’s system provides a way to calculate the effectiveness of engagement. Please see Jayaraman et al. (US 20240119822 A1), Paragraph [0059] and Saurabh et al. (US 11367083 B1), Column 8, Lines [9-23].
Jayaraman in view of Saurabh fails to explicitly teach iteratively training a model framework that generates an intermediate model based on the first training dataset, wherein the model framework identifies one or more bounding boxes representative of one or more individuals within the area of interest of the image data, wherein the model framework is configured to receive the image data as an input; generating output image data using the intermediate model, wherein the output image data includes the image data and the one or more bounding boxes; receiving a second training dataset, wherein the second training dataset includes modified output image data comprising the output image data with one or more bounding box corrections applied; and iteratively training the intermediate model, the training generating a trained image processing model based on the second training dataset.
However, Kim explicitly teaches iteratively training a model framework that generates an intermediate model based on the first training dataset (Fig. 1, Paragraph [0066]- Kim discloses that, regarding direct transferability, at the beginning of annotating a new set of images, the AI model used in the annotation pipeline would have been trained with a different dataset.),
wherein the model framework identifies one or more bounding boxes representative of one or more individuals within the area of interest of the image data (Fig. 1, Paragraph [0068]- Kim discloses the purpose of the AI is to make bounding box boundaries as precise as possible, meaning that there are as few as possible pixels between the object extreme points and the bounding box),
wherein the model framework is configured to receive the image data as an input (Fig. 1, Paragraph [0030]- Kim discloses the server 102 further includes an image data input source 130 for the receipt of image data 132.);
generating output image data using the intermediate model, wherein the output image data includes the image data and the one or more bounding boxes (Fig. 1, Paragraph [0056]- Kim discloses an approach may be utilized that takes an image patch as input and estimates a tight bounding box around the main object in the image patch as output.),
receiving a second training dataset (Fig. 1, Paragraph [0055]- Kim discloses the weak label, original machine predicted annotation, and human corrected annotation may be saved to the annotation system 100 to retrain the machine online or offline.),
wherein the second training dataset includes modified output image data comprising the output image data with one or more bounding box corrections applied (Fig. 1, Paragraph [0055]- Kim discloses the weak label, original machine predicted annotation, and human corrected annotation may be saved to the annotation system 100 to retrain the machine online or offline.);
and iteratively training the intermediate model, the training generating a trained image processing model based on the second training dataset (Fig. 1, Paragraph [0067]- Kim discloses that a common practice in data annotation is to re-train or finetune computer vision algorithms used in the system after a portion of the data is annotated.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Jayaraman in view of Saurabh, of a computer-implemented method generating output image data including the image data and the one or more bounding boxes, with the teachings of Kim of iteratively training a model framework that generates an intermediate model based on the first training dataset, wherein the model framework identifies one or more bounding boxes representative of one or more individuals within the area of interest of the image data, wherein the model framework is configured to receive the image data as an input; generating output image data using the intermediate model, wherein the output image data includes the image data and the one or more bounding boxes; receiving a second training dataset, wherein the second training dataset includes modified output image data comprising the output image data with one or more bounding box corrections applied; and iteratively training the intermediate model, the training generating a trained image processing model based on the second training dataset.
The combination results in Jayaraman’s dwell-time detection system iteratively training a model framework that generates an intermediate model based on the first training dataset, wherein the model framework identifies one or more bounding boxes representative of one or more individuals within the area of interest of the image data, wherein the model framework is configured to receive the image data as an input; generating output image data using the intermediate model, wherein the output image data includes the image data and the one or more bounding boxes; receiving a second training dataset, wherein the second training dataset includes modified output image data comprising the output image data with one or more bounding box corrections applied; and iteratively training the intermediate model, the training generating a trained image processing model based on the second training dataset.
The motivation behind the modification would have been to allow for a more efficient and accurate system, since both Jayaraman and Kim are systems that use object detection and tracking. Jayaraman’s system provides improved staff engagement and sales, while Kim’s system provides a way to improve the accuracy and efficiency of the system. Please see Jayaraman et al. (US 20240119822 A1), Paragraph [0059] and Kim et al. (US 20210042530 A1), Paragraphs [0023]-[0025].
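For illustration only, the annotate-correct-retrain loop paraphrased from Kim can be sketched as follows; every name is a placeholder, since the reference describes the pipeline only at a high level.

```python
# Train an intermediate model, have it predict bounding boxes, apply human
# bounding-box corrections to form a second training dataset, and retrain.
def iterative_training(model_framework, first_dataset, images, correct_fn,
                       rounds: int = 3):
    # Intermediate model trained on the first training dataset.
    intermediate = model_framework.train(first_dataset)
    for _ in range(rounds):
        # Output image data: each image plus its predicted bounding boxes.
        outputs = [(img, intermediate.predict_boxes(img)) for img in images]
        # Second training dataset: outputs with corrections applied.
        second_dataset = [(img, correct_fn(img, boxes))
                          for img, boxes in outputs]
        # Retrain (fine-tune) on the corrected annotations.
        intermediate = model_framework.finetune(intermediate, second_dataset)
    return intermediate  # the trained image processing model
```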
Regarding claim 21, Jayaraman in view of Saurabh, Smith, and Choi teaches the system of claim 1. However, Jayaraman in view of Saurabh, Smith, and Choi fails to explicitly teach wherein the zoom-in crop process comprises dividing the image data into one or more portions that contain partial sections of the image data and scaling the one or more portions to a similar size.
However, newly cited prior art Zhang et al. (US 20160360104 A1), hereafter referenced as Zhang, explicitly teaches wherein the zoom-in crop process comprises dividing the image data into one or more portions that contain partial sections of the image data and scaling the one or more portions to a similar size (Fig. 6, Paragraph [0132]- Zhang discloses that, in some configurations, an affine transform may be applied to align the fisheye images projected onto the common plane. For example, the images on the common plane may be translated and/or scaled to match the overlapping regions. Additionally or alternatively, a projective transform and/or image morphing between fisheye images may be used.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Jayaraman in view of Saurabh, Smith, and Choi, of a system having a non-transitory memory and a processor communicatively coupled to the non-transitory memory, wherein the processor is configured to read a set of instructions to receive the image data and implement an image processing model to generate a person count, with the teachings of Zhang wherein the zoom-in crop process comprises dividing the image data into one or more portions that contain partial sections of the image data and scaling the one or more portions to a similar size.
The combination results in Jayaraman’s dwell-time detection system wherein the zoom-in crop process comprises dividing the image data into one or more portions that contain partial sections of the image data and scaling the one or more portions to a similar size.
The motivation behind the modification would have been to reduce alignment error when combining image portions, since both Jayaraman and Zhang are systems that use cameras for object detection. Jayaraman’s system provides improved staff engagement and sales, while Zhang’s system provides a reduction in alignment error. Please see Jayaraman et al. (US 20240119822 A1), Paragraph [0059] and Zhang et al. (US 20160360104 A1), Paragraph [0135].
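For illustration only, a zoom-in crop process of the kind recited in claim 21 can be sketched as follows; the grid layout, target size, and use of OpenCV are assumptions for the example.

```python
# Divide the image data into portions containing partial sections of the
# image, then scale every portion to a similar (common) size.
import cv2
import numpy as np

def zoom_in_crop(image: np.ndarray, grid: int = 2,
                 target: tuple = (640, 640)) -> list:
    h, w = image.shape[:2]
    tile_h, tile_w = h // grid, w // grid
    portions = []
    for r in range(grid):
        for c in range(grid):
            # Each portion is a partial section of the image data.
            tile = image[r * tile_h:(r + 1) * tile_h,
                         c * tile_w:(c + 1) * tile_w]
            # Scale the portion to the common target size.
            portions.append(cv2.resize(tile, target))
    return portions
```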
Allowable Subject Matter
Claims 3-5 and 13 are objected to as being dependent upon a rejected base claim (claims 1 and 10, respectively), but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for indication of allowable subject matter:
Regarding claim 3, the prior art fails to explicitly teach wherein the processor is configured to implement a trained regression model to generate a predicted person count, wherein the trained regression model receives the person count as an input, wherein the regression model has a structure associated with it which is an auto-regressor defined by $y_t=\sum_{k=0}^{p_1}\alpha_k T_{t-k}+\sum_{k=0}^{p_2}\beta_k C_{t-k}+\sum_{k=1}^{p_3}\gamma_k \hat{y}_{t-k}+\epsilon_t$, where $y_t$ is the predicted time series representative of the at least one engagement metric 270 for an additional location, $T$ is a selected time-series feature (e.g., transactions, units sold, etc.), $C$ is the nearest cluster medoid time series (e.g., the engagement metric 270 represented as a time series for the selected representative environment), and $\alpha_k$, $\beta_k$, and $\gamma_k$ are regression parameters generated during iterative training of the model, based on the functional language, since the term “regression model to generate” is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, as claimed in claim 3.
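For illustration only, the recited auto-regressor can be evaluated as in the following sketch; the sample series and coefficient values are invented, since per the claim the parameters are generated during iterative training of the model.

```python
# Point prediction y_t = sum_{k=0..p1} alpha_k*T_{t-k}
#                      + sum_{k=0..p2} beta_k*C_{t-k}
#                      + sum_{k=1..p3} gamma_k*y_hat_{t-k}   (epsilon_t omitted)
def predict_step(t, T, C, y_hat, alpha, beta, gamma):
    y = sum(alpha[k] * T[t - k] for k in range(len(alpha)) if t - k >= 0)
    y += sum(beta[k] * C[t - k] for k in range(len(beta)) if t - k >= 0)
    y += sum(gamma[k - 1] * y_hat[t - k]
             for k in range(1, len(gamma) + 1) if t - k >= 0)
    return y

T = [10, 12, 11, 13]          # selected time-series feature (e.g., transactions)
C = [0.50, 0.60, 0.55, 0.65]  # nearest cluster medoid time series
y_hat = [0.40, 0.45, 0.50]    # previously predicted engagement values
print(predict_step(3, T, C, y_hat, alpha=[0.02, 0.01], beta=[0.3], gamma=[0.5]))
```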
Regarding claim 4, the prior art fails to explicitly teach wherein the engagement metric is generated as a weighted sum of the dwell time over a minimum time slot repetition of the engagement feature, for each dwell time greater than a dwell time cutoff, and scaled by a scaling factor, as claimed in claim 4.
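For illustration only, one hedged reading of this weighted-sum limitation is sketched below; the exact weighting is an interpretation of the claim language, not the applicant’s disclosed formula.

```python
# Sum the dwell times exceeding a cutoff, weight each by the minimum
# time-slot repetition of the engagement feature, and apply a scaling factor.
def engagement_metric(dwell_times, min_slot_repetition: float,
                      cutoff: float, scale: float) -> float:
    qualifying = (d for d in dwell_times if d > cutoff)
    return scale * sum(d / min_slot_repetition for d in qualifying)

print(engagement_metric([5.0, 30.0, 65.0, 120.0],
                        min_slot_repetition=15.0, cutoff=10.0, scale=0.1))
```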
Regarding claim 5, the prior art fails to explicitly teach implementing a trained regression model to generate an estimated engagement metric for each of the additional physical environments, wherein the trained regression model is generated by training data including one or more features of each of the plurality of physical environments, and wherein the trained regression model receives the engagement metric as an input, wherein the regression model has a structure associated with it which is an auto-regressor defined by $y_t=\sum_{k=0}^{p_1}\alpha_k T_{t-k}+\sum_{k=0}^{p_2}\beta_k C_{t-k}+\sum_{k=1}^{p_3}\gamma_k \hat{y}_{t-k}+\epsilon_t$, where $y_t$ is the predicted time series representative of the at least one engagement metric 270 for an additional location, $T$ is a selected time-series feature (e.g., transactions, units sold, etc.), $C$ is the nearest cluster medoid time series (e.g., the engagement metric 270 represented as a time series for the selected representative environment), and $\alpha_k$, $\beta_k$, and $\gamma_k$ are regression parameters generated during iterative training of the model, based on the functional language, since the term “regression model to generate” is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, as claimed in claim 5.
Regarding claim 13, the prior art fails to explicitly teach wherein the engagement metric is generated as a weighted sum of the dwell time over a minimum time slot repetition of the engagement feature, for each dwell time greater than a dwell time cutoff, and scaled by a scaling factor, as claimed in claim 13.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure, as listed below.
Milne et al. (US 20230196096 A1)- Techniques that facilitate the development and/or modification of an automated visual inspection (AVI) system that implements deep learning are described herein. Some aspects facilitate the generation of a large and diverse training image library, such as by digitally modifying images of real-world containers, and/or generating synthetic container images using a deep generative model. Other aspects decrease the use of processing resources for training, and/or making inferences with, neural networks in an AVI system, such as by automatically reducing the pixel sizes of training images (e.g., by down-sampling and/or selectively cropping container images). Still other aspects facilitate the testing or qualification of an AVI neural network by automatically analyzing a heatmap or bounding box generated by the neural network. Various other techniques are also described herein. Please see Fig. 1 and the Abstract.
WALLIN et al. (US 20200117918 A1)- A method performed by a vehicle system for handling images of the surroundings of a vehicle. An image of the surroundings of the vehicle is obtained. The image is obtained from at least one side image capturing device mounted in or on the vehicle, and the image capturing device comprises a fisheye camera lens. At least a part of the distortions in the image is corrected to obtain a corrected image. The corrected image is rotationally transformed using a first rotational transformation to obtain a first transformed image. The corrected image is rotationally transformed using a second rotational transformation to obtain a second transformed image. The first and second transformed images are consecutive or adjacent images. Please see Fig. 1 and the Abstract.
Yan et al. (US 10796402 B2)- A system and method for fisheye image processing is disclosed. A particular embodiment can be configured to: receive fisheye image data from at least one fisheye lens camera associated with an autonomous vehicle, the fisheye image data representing at least one fisheye image frame; partition the fisheye image frame into a plurality of image portions representing portions of the fisheye image frame; warp each of the plurality of image portions to map an arc of a camera projected view into a line corresponding to a mapped target view, the mapped target view being generally orthogonal to a line between a camera center and a center of the arc of the camera projected view; combine the plurality of warped image portions to form a combined resulting fisheye image data set representing recovered or distortion-reduced fisheye image data corresponding to the fisheye image frame; generate auto-calibration data representing a correspondence between pixels in the at least one fisheye image frame and corresponding pixels in the combined resulting fisheye image data set; and provide the combined resulting fisheye image data set as an output for other autonomous vehicle subsystems. Please see Fig. 1 and the Abstract.
Cheatle et al. (US 7305146 B2)- A method of correcting the tilt, or rotation, of a casually captured image is described. Having corrected the rotation of the original image, the image is cropped by determining a crop boundary by applying one or more rules of composition to the image. The resulting image is more satisfactorily composed compared with prior art methods. Please see Fig. 1 and the Abstract.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LUCIUS C.G. ALLEN whose telephone number is (703)756-5987. The examiner can normally be reached Mon - Fri 8-5pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chineyere Wills-Burns can be reached at (571)272-9752. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/LUCIUS CAMERON GREEN ALLEN/Examiner, Art Unit 2673
/CHINEYERE WILLS-BURNS/Supervisory Patent Examiner, Art Unit 2673