Last updated: May 29, 2026
Application No. 18/635,337
LEARNING DEVICE, PREDICTION DEVICE, LEARNING PREDICTION DEVICE, NON-TRANSITORY COMPUTER-READABLE MEDIUM, LEARNING METHOD, PREDICTION METHOD, AND LEARNING PREDICTION METHOD

Final Rejection (§101, §103)
Filed: Apr 15, 2024
Priority: Oct 21, 2021 (continuation of PCTJP2021038860)
Examiner: AFRIFA-KYEI, ANTHONY D
Art Unit: 2686
Tech Center: 2600 (Communications)
Assignee: Mitsubishi Electric Corporation
OA Round: 2 (Final)
Interview: Optional

An interview has already been conducted in this application's prosecution history. This examiner has a 65% grant rate with a +13.6% interview lift. Since an interview has already been tried, a written response with narrowed claims based on precedent claim evolution patterns is recommended.
Based on 549 resolved cases, 2023–2026
Examiner Intelligence

AFRIFA-KYEI, ANTHONY D
Grants 65% of resolved cases
Career allowance rate: 355 granted / 549 resolved (+2.7% vs. TC average)
Interview lift: +13.6% (moderate) for resolved cases with an interview vs. without
Typical timeline: 2y 11m average prosecution; 22 currently pending
Career history: 585 total applications across all art units
Statute-Specific Performance

§101: 0.4% (-39.6% vs. TC average)
§103: 94.7% (+54.7% vs. TC average)
§102: 1.6% (-38.4% vs. TC average)
§112: 1.1% (-38.9% vs. TC average)
Based on career data from 549 resolved cases; Tech Center (TC) averages are estimates.
Office Action

§101 §103
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Status of Claims
In the amendment filed on January 29, 2026, claims 1, 3, and 5-20 have been amended; no claims have been cancelled and no new claims have been added. Therefore, claims 1-20 are pending for examination.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claims recite a mental process (observation, evaluation, and judgment) that can be performed in the human mind. This judicial exception is not integrated into a practical application because the claims are directed to mental processes without anything significantly more. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because a human can organize and perform the mental process. The analysis follows.
Claim 1 recites, “A learning device comprising: a processor to execute a program; and a memory to store past congestion-area data indicating congestion-related information of a station including a past congestion level of each of m areas of the station (where m is an integer of two or more), past sensor data indicating values detected in the past by one or more sensors installed in n areas (where n is an integer of one or more and n<m) out of the m areas, and the program which, when executed by the processor, performs processes of, generating, by the processor, a first model by using the values indicated by the past sensor data as input data and using the congestion-related information indicated by the past congestion-area data as correct data, the first model being a learning model for predicting, from values detected by the one or more sensors, congestion-related information of a time point at which the one or more sensors perform detection; and generating, by the processor, a second model by using the congestion-related information of a first time point indicated by the past congestion-area data as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from the congestion-related information, the second time point being a time point after the first time point, acquiring, by the processor, present values detected by the one or more sensors of the station; using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection; using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level of any of the m areas; and displaying, by the processor, to users at the station the predicted congestion-related information.”
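The two-stage arrangement recited in claim 1 can be illustrated with a short sketch. This is a hypothetical illustration, not the applicant's disclosed implementation: it assumes both learning models are multi-output linear regressions fit by ridge-regularized least squares, represents congestion-related information as a list of per-area congestion levels, and uses synthetic data (m = 3 areas, one sensor in area 0). The names fit, predict, and forecast are illustrative only.

```python
"""Hypothetical sketch of the two-stage pipeline recited in claim 1.

Assumptions (not from the application): both learning models are linear
least-squares regressions, and the history below is synthetic.
"""
from typing import List, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b (b may have several columns) by Gauss-Jordan elimination."""
    n = len(a)
    aug = [row[:] + rhs[:] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit(xs: Matrix, ys: Matrix, ridge: float = 1e-6) -> Matrix:
    """Least-squares weights (bias row first) with a small ridge term for stability."""
    a = [[1.0] + list(x) for x in xs]
    d, k = len(a[0]), len(ys[0])
    ata = [[sum(r[i] * r[j] for r in a) + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    aty = [[sum(r[i] * y[j] for r, y in zip(a, ys)) for j in range(k)]
           for i in range(d)]
    return solve(ata, aty)


def predict(w: Matrix, x: List[float]) -> List[float]:
    """Apply fitted weights to one input vector."""
    a = [1.0] + list(x)
    return [sum(a[i] * w[i][j] for i in range(len(a))) for j in range(len(w[0]))]


# Synthetic history (assumed): m = 3 areas, one sensor (n = 1) installed in area 0.
congestion: Matrix = []  # past congestion-area data: m levels per time point
sensor: Matrix = []      # past sensor data: n values per time point
level = 0.0
for _ in range(30):
    congestion.append([level, 2.0 * level + 1.0, 0.5 * level])
    sensor.append([10.0 * level + 5.0])
    level = 0.8 * level + 1.0  # assumed crowd dynamics between time points

# First model: sensor values at t -> congestion-related information at the same t.
model1 = fit(sensor, congestion)
# Second model: congestion-related information at t1 -> congestion levels at t2 = t1 + 1.
model2 = fit(congestion[:-1], congestion[1:])


def forecast(present_values: List[float]) -> Tuple[List[float], List[float]]:
    """Estimate all m areas from n sensors, then predict their next-step levels."""
    estimated = predict(model1, present_values)
    return estimated, predict(model2, estimated)


if __name__ == "__main__":
    estimated, future = forecast([25.0])
    print("estimated now:", [round(v, 3) for v in estimated])
    print("predicted next:", [round(v, 3) for v in future])
```

With a present sensor reading of 25.0, the first model estimates the congestion of all three areas (close to [2.0, 5.0, 1.0] on this synthetic data, including the two unsensored areas), and the second model maps that estimate to the next time step (close to [2.6, 6.2, 1.3]).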

Step 2A Prong One: the claim is an abstract idea of nature or natural phenomenon because of the recited step of, “generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from the congestion-related information, the second time point being a time point after the first time point”. Despite the recitation of the processor generating both the first and the second models, the claim language ultimately uses only generic computer components (processors) and generic machine learning. No specific technical problem is solved, and no technical innovation is provided. The claims use multiple instances of functional language. The claim does not explain why two separate models are necessary or what technical benefit the two-stage approach provides.

Step 2A Prong Two: the claimed invention remains an abstract idea because the two “non-abstract” elements, i.e., the processor to execute a program and the memory to store past congestion-area data indicating congestion-related information and past sensor data detected in the past by one or more sensors installed in different areas, do not substantially tie the abstract idea to anything. These elements only define the generation of the first traffic model and fail to tie the second generated traffic model to the prediction step, thereby failing to provide anything significantly more. The invention is thus simply the gathering/acquiring of information from sensors and computing devices, where generating a formulaic model for predicting traffic based on the gathered information may be done within the human mind.

Claim 2 recites “wherein, the past congestion-area data indicates the congestion-related information of respective time points, and the past sensor data indicates the values of the respective time points.” These elements fail to tie the second generated traffic model to the prediction step. They also fail to link or connect the first model and the second model to provide anything significantly more. They therefore do not cure the invention of the natural phenomenon, that is, the gathering/acquiring of information from sensors and computing devices and the generating of a formulaic model for predicting traffic based on the gathered information within the human mind.
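Claim 2 ties both data sets to “respective time points.” One hedged reading is that the two series share time keys, so training pairs for each model can be formed by time alignment: the same time point for the first model, and a first time point paired with a later second time point for the second model. The sketch below assumes integer timestamps in minutes and a fixed prediction horizon; both, and the function names, are illustrative assumptions rather than anything recited in the claims.

```python
from typing import Dict, List, Tuple

Series = Dict[int, List[float]]  # timestamp (assumed minutes) -> values
Pair = Tuple[List[float], List[float]]  # (input data, correct data)


def first_model_pairs(sensor_by_t: Series, congestion_by_t: Series) -> List[Pair]:
    """Input = sensor values at t; correct data = congestion info at the same t."""
    common = sorted(set(sensor_by_t) & set(congestion_by_t))
    return [(sensor_by_t[t], congestion_by_t[t]) for t in common]


def second_model_pairs(congestion_by_t: Series, horizon: int) -> List[Pair]:
    """Input = congestion info at t1; correct data = congestion info at t2 = t1 + horizon."""
    return [(congestion_by_t[t], congestion_by_t[t + horizon])
            for t in sorted(congestion_by_t) if t + horizon in congestion_by_t]


# Hypothetical history: one sensor, three areas, records every 5 minutes.
sensor_by_t = {0: [12.0], 5: [15.0], 10: [9.0]}
congestion_by_t = {0: [1.0, 2.0, 0.0], 5: [2.0, 3.0, 1.0],
                   10: [1.0, 1.0, 0.0], 15: [0.0, 1.0, 0.0]}
```

Here first_model_pairs yields three pairs (t = 0, 5, 10), because t = 15 has no sensor reading, and second_model_pairs(congestion_by_t, 5) yields three pairs (0 to 5, 5 to 10, 10 to 15).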


Claim 3 recites, “A learning device comprising: a processor to execute a program; and a memory to store past congestion-area data indicating congestion-related information of a station including a past congestion level of each of m areas of the station (where m is an integer of two or more), past sensor data indicating values detected in the past by one or more sensors installed in n areas (where n is an integer of one or more and n<m) out of the m areas, and the program which, when executed by the processor, performs processes of, generating, by the processor, a first model by using the congestion-related information indicated by the past congestion-area data as input data and using the values indicated by the past sensor data as correct data, the first model being a learning model for predicting, from the congestion-related information, values to be detected by the one or more sensors at a time point at which the congestion-related information is acquired; using, by the processor, the first model to predict, from the congestion-related information at a first time point indicated by the past congestion-area data, values to be detected by the one or more sensors at the first time point; and generating, by the processor a second model by using the predicted values as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from values detected by the one or more sensors, the second time point being a time point after the first time point; acquiring, by the processor, present values detected by the one or more sensors of the station; using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection; using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level of any of the m areas; and displaying, by the processor, to users at the station the predicted congestion-related information.”
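Claim 3 reverses the direction of the first model: it maps congestion-related information to the sensor values expected at the same time point, and the second model is then trained on those predicted sensor values. The following is a minimal sketch of that training order under hypothetical assumptions (linear least-squares models, synthetic data with m = 3 areas and one sensor, illustrative names). At prediction time the sketch applies the second model directly to acquired sensor values, following the wording of claim 8; it is not the applicant's implementation.

```python
"""Hypothetical sketch of the training order recited in claim 3.

Assumptions (not from the application): linear least-squares models,
synthetic history, illustrative function names.
"""
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b (b may have several columns) by Gauss-Jordan elimination."""
    n = len(a)
    aug = [row[:] + rhs[:] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit(xs: Matrix, ys: Matrix, ridge: float = 1e-6) -> Matrix:
    """Least-squares weights (bias row first) with a small ridge term for stability."""
    a = [[1.0] + list(x) for x in xs]
    d, k = len(a[0]), len(ys[0])
    ata = [[sum(r[i] * r[j] for r in a) + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    aty = [[sum(r[i] * y[j] for r, y in zip(a, ys)) for j in range(k)]
           for i in range(d)]
    return solve(ata, aty)


def predict(w: Matrix, x: List[float]) -> List[float]:
    """Apply fitted weights to one input vector."""
    a = [1.0] + list(x)
    return [sum(a[i] * w[i][j] for i in range(len(a))) for j in range(len(w[0]))]


# Synthetic history (assumed): m = 3 areas, one sensor (n = 1) installed in area 0.
congestion: Matrix = []
sensor: Matrix = []
level = 0.0
for _ in range(30):
    congestion.append([level, 2.0 * level + 1.0, 0.5 * level])
    sensor.append([10.0 * level + 5.0])
    level = 0.8 * level + 1.0  # assumed crowd dynamics between time points

# First model: congestion-related information at t -> sensor values at the same t.
model1 = fit(congestion, sensor)
# Predicted sensor values at each first time point t1 ...
pseudo_sensor = [predict(model1, c) for c in congestion[:-1]]
# ... used as input data, with congestion levels at t2 = t1 + 1 as correct data.
model2 = fit(pseudo_sensor, congestion[1:])


def forecast_from_sensors(present_values: List[float]) -> List[float]:
    """Future congestion level of each of the m areas from present sensor values."""
    return predict(model2, present_values)


if __name__ == "__main__":
    print([round(v, 3) for v in forecast_from_sensors([25.0])])
```

On this synthetic data a present reading of 25.0 yields a next-step forecast close to [2.6, 6.2, 1.3], the same result as the forward arrangement, because the assumed history is noise-free.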

Step 2A Prong One: the claim is an abstract idea of nature or natural phenomenon because of the recited step of, “generating a second model by using the predicted values as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from values detected by the one or more sensors, the second time point being a time point after the first time point”. Despite the recitation of the processor generating both the first and the second models, the claim language ultimately uses only generic computer components (processors) and generic machine learning. No specific technical problem is solved, and no technical innovation is provided. The claims use multiple instances of functional language. The claim does not explain why two separate models are necessary or what technical benefit the two-stage approach provides.

Although certain elements of the first model are tied to sensor data and processor(s) for future traffic prediction purposes, the claim fails to disclose the source of the second model's generation. Even though the data used in the second model is sourced from more sensors, the claim amounts to simply acquiring/gathering information from sensors or computing devices and thereafter generating, within the human mind, a formulaic model for predicting traffic based on the gathered information.

Step 2A Prong Two: the claimed invention remains an abstract idea because the two “non-abstract” elements, i.e., the processor to execute a program and the memory to store past congestion-area data indicating congestion-related information and past sensor data detected in the past by one or more sensors installed in different areas, do not substantially tie the abstract idea to anything and fail to tie the second generated traffic model to the prediction step. The invention is thus simply the gathering/acquiring of information from sensors and computing devices and the generating of a formulaic model for predicting traffic based on the gathered information within the human mind.

Claim 4 recites “wherein, the past congestion-area data indicates the congestion-related information of respective time points, and the past sensor data indicates the values of the respective time points.” These elements fail to tie the second generated traffic model to the prediction step. They also fail to link or connect the first model and the second model to provide anything significantly more. They therefore do not cure the invention of the natural phenomenon, that is, the gathering/acquiring of information from sensors and computing devices and the generating of a formulaic model for predicting traffic based on the gathered information within the human mind.

Claim 5 recites, “A prediction device comprising: a processor to execute a program; and a memory to store the program which, when executed by the processor, performs processes of, acquiring, by the processor, values detected by one or more sensors installed in n areas of a station (where n is an integer of one or more, and n<m) out of m areas of the station (where m is an integer of two or more); using, by the processor, a first model to predict, from the acquired values, congestion-related information of the station of a time point at which the one or more sensors perform detection, the first model being a learning model for predicting, from values to be detected by the one or more sensors, the congestion-related information of the time point at which the one or more sensors perform detection, the first model being generated by using values indicated by past sensor data indicating the values detected in the past by the one or more sensors installed in the n areas (where n is an integer of one or more and n<m) out of the m areas as input data and using the congestion-related information indicated by past congestion-area data indicating the congestion-related information including a past congestion level of each of the m areas as correct data; and using, by the processor, a second model to predict, from the predicted congestion-related information, a future congestion level of any of the m areas, the second model being a learning model for predicting the future congestion level from the congestion-related information, the second model being generated by using the congestion-related information of a first time point indicated by the past congestion-area data as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second time point being a time point after the first time point; acquiring, by the processor, present values detected by the one or more sensors of the station; using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection; using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level of any of the m areas; and displaying, by the processor, to users at the station the predicted congestion-related information.”

Step 2A Prong One: the claim is an abstract idea of nature or natural phenomenon because of the recited step of, “using a first model to predict, from the acquired values, congestion-related information of a time point at which the one or more sensors perform detection, the first model being a learning model for predicting, from values to be detected by the one or more sensors, the congestion-related information of the time point at which the one or more sensors perform detection, the first model being generated by using values indicated by past sensor data indicating the values detected in the past by the one or more sensors installed in n areas (where n is an integer of one or more and n<m) out of the m areas as input data and using the congestion-related information indicated by past congestion-area data indicating the congestion-related information including a past congestion level of each of the m areas as correct data”. Here, the claim language fails to establish what generates the first prediction model and only establishes a processor to execute a program and a memory to store the program which, when executed by the processor, performs processes of acquiring values detected by one or more sensors installed in n areas (where n is an integer of one or more, and n<m) out of m areas (where m is an integer of two or more).
Furthermore, the claim recites “using a second model to predict, from the predicted congestion-related information, a future congestion level of any of the m areas, the second model being a learning model for predicting a future congestion level from the congestion-related information, the second model being generated by using the congestion-related information of a first time point indicated by the past congestion-area data as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second time point being a time point after the first time point.” Despite the recitation of the processor generating both the first and the second models, the claim language ultimately uses only generic computer components (processors) and generic machine learning. No specific technical problem is solved, and no technical innovation is provided. The claims use multiple instances of functional language. The claim does not explain why two separate models are necessary or what technical benefit the two-stage approach provides.


Step 2A Prong Two: the claimed invention remains an abstract idea because the two “non-abstract” elements, i.e., the processor to execute a program and the memory to store past congestion-area data indicating congestion-related information and past sensor data detected in the past by one or more sensors installed in different areas, do not substantially tie the abstract idea to anything and fail to tie the second generated traffic model to the prediction step. They also fail to link or connect the first model and the second model to any processor or computing means so as to provide anything significantly more. The invention is thus simply the gathering/acquiring of information from sensors and computing devices and the generating of a formulaic model for predicting traffic based on the gathered information within the human mind.

Claim 6 recites, “A prediction device comprising: a processor to execute a program; and a memory to store the program which, when executed by the processor, performs processes of, acquiring, by the processor, values detected by one or more sensors installed in n areas of a station (where n is an integer of one or more, and n<m) out of m areas of the station (where m is an integer of two or more); and using, by the processor, a second model to predict a future congestion level of any of the m areas from the acquired values, the a first model being a learning model for predicting, from congestion-related information of the station, values to be detected by the one or more sensors at a time point at which the congestion-related information is acquired, the first model being generated by using, as input data, the congestion-related information indicated by past congestion-area data indicating the congestion-related information including a past congestion level of each of the m areas and by using, as correct data, values indicated by past sensor data indicating values detected by the one or more sensors in the past, the second model being a learning model for predicting a future congestion level from the values detected by the one or more sensors, the second model being generated by using, as input data, predicted values obtained by using the first model to predict the values to be detected by the one or more sensors at a first time point, from congestion-related information of the first time point indicated by the past congestion-area data and by using, as correct data, a congestion level in the congestion-related information of a second time point indicated by the past congestion-area data, the second time point being a time point after the first time point; acquiring, by the processor, present values detected by the one or more sensors of the station; using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection; using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level of any of the m areas; and displaying, by the processor, to users at the station the predicted congestion-related information.”

Step 2A Prong One: the claim is an abstract idea of nature or natural phenomenon because of the recited step of, “using a second model to predict a future congestion level of any of the m areas from the acquired values, the first model being a learning model for predicting, from congestion-related information, values to be detected by the one or more sensors at a time point at which the congestion-related information is acquired”. Here, the claim language fails to establish what generates the second prediction model and only establishes a processor to execute a program and a memory to store the program which, when executed by the processor, performs processes of acquiring values detected by one or more sensors installed in n areas (where n is an integer of one or more, and n<m) out of m areas (where m is an integer of two or more).
Furthermore, the claim recites “using a first model to predict the values to be detected by the one or more sensors at a first time point, from congestion-related information of the first time point indicated by the past congestion-area data and by using, as correct data.” Despite the recitation of the processor generating both the first and the second models, the claim language ultimately uses only generic computer components (processors) and generic machine learning. No specific technical problem is solved, and no technical innovation is provided. The claims use multiple instances of functional language. The claim does not explain why two separate models are necessary or what technical benefit the two-stage approach provides.


Step 2A Prong Two: the claimed invention remains an abstract idea because the two “non-abstract” elements, i.e., the processor to execute a program and the memory to store past congestion-area data indicating congestion-related information and past sensor data detected in the past by one or more sensors installed in different areas, do not substantially tie the abstract idea to anything and fail to tie the second generated traffic model to the prediction step. They also fail to link or connect the first model and the second model using any processor or any computing means so as to provide anything significantly more. The invention is thus simply the gathering/acquiring of information from sensors and computing devices and the generating of a formulaic model for predicting traffic based on the gathered information within the human mind.

Claim 7 recites, “A learning prediction device comprising: a processor to execute a program; and a memory to store past congestion-area data indicating congestion-related information of a station including a past congestion level of each of m areas of the station (where m is an integer of two or more), past sensor data indicating values detected in the past by one or more sensors installed in n areas (where n is an integer of one or more and n<m) out of the m areas, and the program which, when executed by the processor, performs processes of, generating, by the processor, a first model by using the values indicated by the past sensor data as input data and using the congestion-related information indicated by the past congestion-area data as correct data, the first model being a learning model for predicting, from values detected by the one or more sensors, congestion-related information of a time point at which the one or more sensors perform detection; generating, by the processor, a second model by using the congestion-related information of a first time point indicated by the past congestion-area data as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from the congestion-related information, the second time point being a time point after the first time point; acquiring, by the processor, present values detected by the one or more sensors of the station; using, by the processor, the first model to predict, from the acquired values, the congestion-related information of a time point at which the one or more sensors perform detection; and using, by the processor, the second model to predict, from the predicted congestion-related information, a future congestion level of any of the m areas; and displaying, by the processor, to users at the station the predicted congestion-related information.”

Step 2A Prong One: the claim is an abstract idea of nature or natural phenomenon because of the recited step of, “generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from the congestion-related information, the second time point being a time point after the first time point”. Despite the recitation of the processor generating both the first and the second models, the claim language ultimately uses only generic computer components (processors) and generic machine learning. No specific technical problem is solved, and no technical innovation is provided. The claims use multiple instances of functional language. The claim does not explain why two separate models are necessary or what technical benefit the two-stage approach provides.

Although certain elements of the first and second models are tied to sensor data and processor(s) for future traffic prediction purposes, how and by what the second model is generated is not disclosed. This leaves open the possibility that the predictive step is a natural phenomenon performed by the human mind: simply acquiring/gathering information from sensors or computing devices and thereafter generating, within the human mind, a formulaic model for predicting traffic based on the gathered information.

Step 2A Prong Two: the claimed invention remains an abstract idea because the two “non-abstract” elements, i.e., the processor to execute a program and the memory to store past congestion-area data indicating congestion-related information and past sensor data detected in the past by one or more sensors installed in different areas, do not substantially tie the abstract idea to anything. These elements only define the generation of the first traffic model and fail to tie the second generated traffic model to the prediction step. The invention is thus simply the gathering/acquiring of information from sensors and computing devices and the generating of a formulaic model for predicting traffic based on the gathered information within the human mind.

Claim 8 recites, “A learning prediction device comprising: a processor to execute a program; and a memory to store past congestion-area data indicating congestion-related information including a past congestion level of each of m areas of a station (where m is an integer of two or more), past sensor data indicating values detected in the past by one or more sensors installed in n areas of the station (where n is an integer of one or more and n<m) out of the m areas, and the program which, when executed by the processor, performs processes of, generating, by the processor, a first model by using the congestion-related information of the station indicated by the past congestion-area data as input data and using the values indicated by the past sensor data as correct data, the first model being a learning model for predicting, from the congestion-related information, values to be detected by the one or more sensors at a time point at which the congestion-related information is acquired; using, by the processor, the first model to predict, from the congestion-related information at a first time point indicated by the past congestion-area data, values to be detected by the one or more sensors at the first time point; generating, by the processor, a second model by using the predicted values as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from the values detected by the one or more sensors, the second time point being a time point after the first time point; acquiring, by the processor, the values detected by the one or more sensors; and using, by the processor, the second model to predict, from the acquired values, a future congestion level of any of the m areas; and displaying, by the processor, to users at the station the future congestion level of any of the m areas.”

Step 2A Prong One: the claim is an abstract idea of nature or natural phenomenon because of the recited step of, “generating a second model by using the predicted values as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from the values detected by the one or more sensors, the second time point being a time point after the first time point”. Despite the recitation of the processor generating both the first and the second models, the claim language ultimately uses only generic computer components (processors) and generic machine learning. No specific technical problem is solved, and no technical innovation is provided. The claims use multiple instances of functional language. The claim does not explain why two separate models are necessary or what technical benefit the two-stage approach provides.
Although certain elements of the first and second models are tied to sensor data and processor(s) for future traffic prediction purposes, how and by what the second model is generated is not disclosed. This leaves open the possibility that the predictive step is a natural phenomenon performed by the human mind: simply acquiring/gathering information from sensors or computing devices and thereafter generating, within the human mind, a formulaic model for predicting traffic based on the gathered information.

Step 2A Prong Two: the claimed invention remains an abstract idea because the two “non-abstract” elements, i.e., the processor to execute a program and the memory to store past congestion-area data indicating congestion-related information and past sensor data detected in the past by one or more sensors installed in different areas, do not substantially tie the abstract idea to anything. These elements only define the generation of the first traffic model and fail to tie the second generated traffic model to the prediction step. The invention is thus simply the gathering/acquiring of information from sensors and computing devices and the generating of a formulaic model for predicting traffic based on the gathered information within the human mind.

Claim 9 recites, “A non-transitory computer-readable medium that stores therein a program that causes a computer to execute processes of: storing, by the computer, past congestion-area data indicating congestion-related information of a station including a past congestion level of each of m areas of the station (where m is an integer of two or more); storing, by the computer, past sensor data indicating values detected in the past by one or more sensors installed in n areas (where n is an integer of one or more and n<m) out of the m areas; generating, by the computer, a first model by using the values indicated by the past sensor data as input data and using the congestion-related information indicated by the past congestion-area data as correct data, the first model being a learning model for predicting, from values detected by the one or more sensors, congestion-related information of a time point at which the one or more sensors perform detection; and generating, by the computer, a second model by using the congestion-related information of a first time point indicated by the past congestion-area data as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from the congestion-related information, the second time point being a time point after the first time point. acquiring, by the computer, present values detected by the one or more sensors of the station; using, by the computer, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection; using, by the computer, the second model to predict, from the predicted congestion-related information, the future congestion level of any of the m areas; and displaying, by the computer, to users at the station the predicted congestion-related information.”
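The data flow recited in claim 9 (and, in substantially the same form, in claims 11, 13, and 15) may be illustrated by the following minimal Python sketch. The sketch is hypothetical and is not taken from the applicant's specification: ordinary least-squares regression with NumPy is assumed as a stand-in for the unspecified “learning model,” and the function names (`fit_linear`, `apply_linear`, `train_two_stage`, `predict_future`), the array shapes, and the one-step offset `lag` between the first and second time points are all assumptions. Because the claim does not specify how either model is generated, any other generic regressor could be substituted without changing the recited steps.

```python
import numpy as np


def fit_linear(X, Y):
    """Least-squares fit of Y from [X, 1]; an assumed stand-in for the claimed "learning model"."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    # rcond=1e-10 treats near-zero singular values as zero, so collinear inputs are tolerated.
    W, *_ = np.linalg.lstsq(X1, Y, rcond=1e-10)
    return W


def apply_linear(W, X):
    """Apply a fitted linear model to each row of X."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    return X1 @ W


def train_two_stage(past_sensor, past_congestion, lag=1):
    """past_sensor: (T, n) values from sensors in n of the m areas.
    past_congestion: (T, m) congestion levels of all m areas, with n < m."""
    # First model: sensor values at time t -> congestion-related information at the same time t
    # (input data = past sensor data, correct data = past congestion-area data).
    first = fit_linear(past_sensor, past_congestion)
    # Second model: congestion information at a first time point t
    # -> congestion level at a later second time point t + lag.
    second = fit_linear(past_congestion[:-lag], past_congestion[lag:])
    return first, second


def predict_future(first, second, present_sensor):
    """Chain the models: present sensor values -> present congestion -> future congestion."""
    present = np.atleast_2d(np.asarray(present_sensor, dtype=float))
    estimated_now = apply_linear(first, present)  # congestion at the detection time point
    return apply_linear(second, estimated_now)    # future congestion level of each area


if __name__ == "__main__":
    # Hypothetical data: one sensor (n = 1) and three areas (m = 3).
    s = np.arange(20, dtype=float).reshape(-1, 1)
    congestion = np.hstack([s, 2 * s, 3 * s + 1])
    first, second = train_two_stage(s, congestion)
    print(np.round(predict_future(first, second, [10.0]), 6))
```

Under this hypothetical data, a present sensor value of 10 yields predicted future congestion levels of approximately 11, 22, and 34 for the three areas.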

Step 2A Prong One: the claim is directed to an abstract idea or natural phenomenon because it recites the steps of “generating a first model by using the values indicated by the past sensor data as input data and using the congestion-related information indicated by the past congestion-area data as correct data, the first model being a learning model for predicting, from values detected by the one or more sensors, congestion-related information of a time point at which the one or more sensors perform detection; and generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from the congestion-related information, the second time point being a time point after the first time point.”
Despite the recitation of the processor generating both the first and the second models, the claim language ultimately uses generic computer components (processors) and generic machine learning. The claim does not solve a specific technical problem, does not recite a technical innovation, and relies on multiple instances of functional language. It also does not explain why two separate models are necessary or what technical benefit the two-stage approach provides.
Although certain elements of the first and second models are tied to sensor data for the purpose of predicting future traffic, the claim does not specify how, or by what, the first and second models are generated. This leaves open the possibility that the predictive step is performed in the human mind by simply acquiring/gathering information from sensors or computing devices and then generating, within the human mind, a formulaic model for predicting traffic based on the gathered information.

Step 2A Prong Two: the claimed invention remains directed to an abstract idea because the two additional (“non-abstract”) elements, i.e., the processor to execute a program and the memory to store past congestion-area data indicating congestion-related information and past sensor data detected in the past by one or more sensors installed in different areas, do not substantially tie the abstract idea to anything; they only define how the data is derived. The claims do not address what actually generates the first and second models, nor do they address the prediction step, and therefore fail to add anything significantly more. The invention is simply the gathering of information from sensors and computing devices and the generation, within the human mind, of a formulaic model for predicting traffic based on the gathered information.

Claim 10 recites, “A non-transitory computer-readable medium that stores therein a program that causes a computer to execute processes of: storing, by the computer, past congestion-area data indicating congestion-related information of a station including a past congestion level of each of m areas of the station (where m is an integer of two or more); storing, by the computer, past sensor data indicating values detected in the past by one or more sensors installed in n areas of the station (where n is an integer of one or more and n<m) out of the m areas; generating, by the computer, a first model by using the congestion-related information indicated by the past congestion-area data as input data and using the values indicated by the past sensor data as correct data, the first model being a learning model for predicting, from the congestion-related information, values to be detected by the one or more sensors at a time point at which the congestion-related information is acquired; using, by the computer, the first model to predict, from the congestion-related information at a first time point indicated by the past congestion-area data, values to be detected by the one or more sensors at the first time point; and generating, by the computer a second model by using the predicted values as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from values detected by the one or more sensors, the second time point being a time point after the first time point; acquiring, by the computer, present values detected by the one or more sensors of the station; using, by the computer, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection; using, by the computer, the second model to predict, from the predicted congestion-related information, the future congestion level of any of the m areas; and displaying, by the computer, to users at the station the predicted congestion-related information.”

Step 2A Prong One: the claim is directed to an abstract idea or natural phenomenon because it recites the steps of “generating a first model by using the congestion-related information indicated by the past congestion-area data as input data and using the values indicated by the past sensor data as correct data, the first model being a learning model for predicting, from the congestion-related information, values to be detected by the one or more sensors at a time point at which the congestion-related information is acquired; using the first model to predict, from the congestion-related information at a first time point indicated by the past congestion-area data, values to be detected by the one or more sensors at the first time point; and generating a second model by using the predicted values as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from values detected by the one or more sensors, the second time point being a time point after the first time point.”
Despite the recitation of the processor generating both the first and the second models, the claim language ultimately uses generic computer components (processors) and generic machine learning. The claim does not solve a specific technical problem, does not recite a technical innovation, and relies on multiple instances of functional language. It also does not explain why two separate models are necessary or what technical benefit the two-stage approach provides.
Although certain elements of the first and second models are tied to sensor data for the purpose of predicting future traffic, the claim does not specify how, or by what, the first and second models are generated. This leaves open the possibility that the predictive step is performed in the human mind by simply acquiring/gathering information from sensors or computing devices and then generating, within the human mind, a formulaic model for predicting traffic based on the gathered information.

Step 2A Prong Two: the claimed invention remains directed to an abstract idea because the two additional (“non-abstract”) elements, i.e., the processor to execute a program and the memory to store past congestion-area data indicating congestion-related information and past sensor data detected in the past by one or more sensors installed in different areas, do not substantially tie the abstract idea to anything; they only define how the data is derived. The claims do not address what actually generates the first and second models, nor do they address the prediction step, and therefore fail to add anything significantly more. The invention is simply the gathering of information from sensors and computing devices and the generation, within the human mind, of a formulaic model for predicting traffic based on the gathered information.

Claim 11 recites, “A non-transitory computer-readable medium that stores therein a program that causes a computer to execute processes of: acquiring, by the computer, values detected by one or more sensors installed in n areas of a station (where n is an integer of one or more, and n<m) out of m areas of the station (where m is an integer of two or more); using, by the computer, a first model to predict, from the acquired values, congestion-related information of the station of a time point at which the one or more sensors perform detection, the first model being a learning model for predicting, from values to be detected by the one or more sensors, the congestion-related information of the time point at which the one or more sensors perform detection, the first model being generated by using values indicated by past sensor data indicating the values detected in the past by the one or more sensors installed in the n areas (where n is an integer of one or more and n<m) out of the m areas as input data and using the congestion-related information indicated by past congestion-area data indicating the congestion-related information including a past congestion level of each of the m areas as correct data; and using, by the computer, a second model to predict, from the predicted congestion-related information, a future congestion level of any of the m areas, the second model being a learning model for predicting the future congestion level from the congestion-related information, the second model being generated by using the congestion-related information of a first time point indicated by the past congestion-area data as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second time point being a time point after the first time point; and displaying, by the computer, to users at the station the predicted congestion-related information.”

Step 2A Prong One: the claim is directed to an abstract idea or natural phenomenon because it recites the steps of “using a first model to predict, from the acquired values, congestion-related information of a time point at which the one or more sensors perform detection, the first model being a learning model for predicting, from values to be detected by the one or more sensors, the congestion-related information of the time point at which the one or more sensors perform detection, the first model being generated by using values indicated by past sensor data indicating the values detected in the past by the one or more sensors installed in n areas (where n is an integer of one or more and n<m) out of the m areas as input data and using the congestion-related information indicated by past congestion-area data indicating the congestion-related information including a past congestion level of each of the m areas as correct data” and “using a second model to predict, from the predicted congestion-related information, a future congestion level of any of the m areas, the second model being a learning model for predicting a future congestion level from the congestion-related information, the second model being generated by using the congestion-related information of a first time point indicated by the past congestion-area data as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second time point being a time point after the first time point.”
Despite the recitation of the processor generating both the first and the second models, the claim language ultimately uses generic computer components (processors) and generic machine learning. The claim does not solve a specific technical problem, does not recite a technical innovation, and relies on multiple instances of functional language. It also does not explain why two separate models are necessary or what technical benefit the two-stage approach provides.
Although certain elements of the first and second models are tied to sensor data for the purpose of predicting future traffic, the claim does not specify how, or by what, the first and second models are generated. This leaves open the possibility that the predictive step is performed in the human mind by simply acquiring/gathering information from sensors or computing devices and then generating, within the human mind, a formulaic model for predicting traffic based on the gathered information.

Step 2A Prong Two: the claimed invention remains directed to an abstract idea because the two additional (“non-abstract”) elements, i.e., the processor to execute a program and the memory to store past congestion-area data indicating congestion-related information and past sensor data detected in the past by one or more sensors installed in different areas, do not substantially tie the abstract idea to anything; they only define how the data is derived. The claims do not address what actually generates the first and second models, nor do they address the prediction step, and therefore fail to add anything significantly more. The invention is simply the gathering of information from sensors and computing devices and the generation, within the human mind, of a formulaic model for predicting traffic based on the gathered information.

Claim 12 recites, “A non-transitory computer-readable medium that stores therein a program that causes a computer to execute processes of: acquiring, by the computer, values detected by one or more sensors installed in n areas of a station (where n is an integer of one or more, and n<m) out of m areas of the station (where m is an integer of two or more); and using, by the computer, a second model to predict a future congestion level of any of the m areas from the acquired values, the first model being a learning model for predicting, from congestion-related information of the station, values to be detected by the one or more sensors at a time point at which the congestion-related information is acquired, the first model being generated by using, as input data, the congestion-related information indicated by past congestion-area data indicating the congestion-related information including a past congestion level of each of the m areas and by using, as correct data, values indicated by past sensor data indicating values detected by the one or more sensors in the past, the second model being a learning model for predicting the future congestion level from the values detected by the one or more sensors, the second model being generated by using, as input data, predicted values obtained by using the first model to predict the values to be detected by the one or more sensors at a first time point, from congestion-related information of the first time point indicated by the past congestion-area data and by using, as correct data, a congestion level in the congestion-related information of a second time point indicated by the past congestion-area data, the second time point being a time point after the first time point; and displaying, by the computer, to users at the station the predicted congestion-related information.”

Step 2A Prong One: the claim is directed to an abstract idea or natural phenomenon because it recites the step of “using a second model to predict a future congestion level of any of the m areas from the acquired values, the first model being a learning model for predicting, from congestion-related information, values to be detected by the one or more sensors at a time point at which the congestion-related information is acquired, the first model being generated by using, as input data, the congestion-related information indicated by past congestion-area data indicating the congestion-related information including a past congestion level of each of the m areas and by using, as correct data, values indicated by past sensor data indicating values detected by the one or more sensors in the past, the second model being a learning model for predicting a future congestion level from the values detected by the one or more sensors, the second model being generated by using, as input data, predicted values obtained by using a first model to predict the values to be detected by the one or more sensors at a first time point.”
Despite the recitation of the processor generating both the first and the second models, the claim language ultimately uses generic computer components (processors) and generic machine learning. The claim does not solve a specific technical problem, does not recite a technical innovation, and relies on multiple instances of functional language. It also does not explain why two separate models are necessary or what technical benefit the two-stage approach provides.
Although certain elements of the first and second models are tied to sensor data for the purpose of predicting future traffic, the claim does not specify how, or by what, the first and second models are generated. This leaves open the possibility that the predictive step is performed in the human mind by simply acquiring/gathering information from sensors or computing devices and then generating, within the human mind, a formulaic model for predicting traffic based on the gathered information.

Step 2A Prong Two: the claimed invention remains directed to an abstract idea because the two additional (“non-abstract”) elements, i.e., the non-transitory computer-readable medium storing a program that causes a computer to execute processes, and the process of acquiring values detected by one or more sensors installed in n areas (where n is an integer of one or more, and n<m) out of m areas (where m is an integer of two or more), do not substantially tie the abstract idea to anything; they only define how the data is derived. The claims do not address what actually generates the first and second models, nor do they address the prediction step, and therefore fail to add anything significantly more. The invention is simply the gathering of information from sensors and computing devices and the generation, within the human mind, of a formulaic model for predicting traffic based on the gathered information.


Claim 13 recites, “A non-transitory computer-readable medium that stores therein a program that causes a computer to execute processes of: storing, by the computer, past congestion-area data indicating congestion-related information of a station including a past congestion level of each of m areas of the station (where m is an integer of two or more); storing, by the computer, past sensor data indicating values detected in the past by one or more sensors installed in n areas of the station (where n is an integer of one or more and n<m) out of the m areas; generating, by the computer, a first model by using the values indicated by the past sensor data as input data and using the congestion-related information indicated by the past congestion-area data as correct data, the first model being a learning model for predicting, from values detected by the one or more sensors, congestion-related information of a time point at which the one or more sensors perform detection; generating, by the computer, a second model by using the congestion-related information of a first time point indicated by the past congestion-area data as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from the congestion-related information, the second time point being a time point after the first time point; acquiring, by the computer, the values detected by the one or more sensors; using, by the computer, the first model to predict, from the acquired values, the congestion-related information of a time point at which the one or more sensors perform detection; and using, by the computer, the second model to predict, from the predicted congestion-related information, the future congestion level of any of the m areas: and displaying, by the computer, to users at the station the predicted congestion-related information.”

Step 2A Prong One: the claim is directed to an abstract idea or natural phenomenon because it recites the steps of “generating a first model by using the values indicated by the past sensor data as input data and using the congestion-related information indicated by the past congestion-area data as correct data, the first model being a learning model for predicting, from values detected by the one or more sensors, congestion-related information of a time point at which the one or more sensors perform detection; generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data.”
Despite the recitation of the processor generating both the first and the second models, the claim language ultimately uses generic computer components (processors) and generic machine learning. The claim does not solve a specific technical problem, does not recite a technical innovation, and relies on multiple instances of functional language. It also does not explain why two separate models are necessary or what technical benefit the two-stage approach provides.
Although certain elements of the first and second models are tied to sensor data for the purpose of predicting future traffic, the claim does not specify how, or by what, the first and second models are generated. This leaves open the possibility that the predictive step is performed in the human mind by simply acquiring/gathering information from sensors or computing devices and then generating, within the human mind, a formulaic model for predicting traffic based on the gathered information.

Step 2A Prong Two: the claimed invention remains directed to an abstract idea because the two additional (“non-abstract”) elements, i.e., the steps, executed by a computer from a program stored on a non-transitory computer-readable medium, of storing past congestion-area data indicating congestion-related information including a past congestion level of each of m areas (where m is an integer of two or more) and storing past sensor data indicating values detected in the past by one or more sensors installed in n areas (where n is an integer of one or more and n<m) out of the m areas, do not substantially tie the abstract idea to anything; they only define how the data is derived. The claims do not address what actually generates the first and second models, nor do they address the prediction step, and therefore fail to add anything significantly more. The invention is simply the gathering of information from sensors and computing devices and the generation, within the human mind, of a formulaic model for predicting traffic based on the gathered information.

Claim 14 recites, “A non-transitory computer-readable medium that stores therein a program that causes a computer to execute processes of: storing, by the computer, past congestion-area data indicating congestion-related information including a past congestion level of each of m areas of a station (where m is an integer of two or more); storing, by the computer, past sensor data indicating values detected in the past by one or more sensors installed in n areas of the station (where n is an integer of one or more and n<m) out of the m areas; generating, by the computer, a first model by using the congestion-related information of the station indicated by the past congestion-area data as input data and using the values indicated by the past sensor data as correct data, the first model being a learning model for predicting, from the congestion-related information, values to be detected by the one or more sensors at a time point at which the congestion-related information is acquired; using, by the computer, the first model to predict, from the congestion-related information at a first time point indicated by the past congestion-area data, values to be detected by the one or more sensors at the first time point; generating, by the computer, a second model by using the predicted values as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from the values detected by the one or more sensors, the second time point being a time point after the first time point; acquiring, by the computer, the values detected by the one or more sensors; and using, by the computer, the second model to predict, from the acquired values, the future congestion level of any of the m areas; and displaying, by the computer, to users at the station the future congestion level of any of the m areas.”
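Claim 14 (and, in substantially the same form, the training steps of claims 10 and 16 and the prediction step of claim 12) reverses the direction of the first model: the first model maps congestion-related information to sensor values, and its predictions for the first time points become the input data for the second model. The following minimal Python sketch illustrates that flow. It is hypothetical and is not taken from the applicant's specification: ordinary least-squares regression with NumPy is assumed as the “learning model,” and the function names (`fit_linear`, `apply_linear`, `train_reverse_two_stage`, `predict_future_from_sensors`), the array shapes, and the one-step offset `lag` are assumptions. At prediction time, the sketch follows claim 14 in applying the second model directly to the acquired sensor values.

```python
import numpy as np


def fit_linear(X, Y):
    """Least-squares fit of Y from [X, 1]; an assumed stand-in for the claimed "learning model"."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    # rcond=1e-10 treats near-zero singular values as zero, so collinear inputs are tolerated.
    W, *_ = np.linalg.lstsq(X1, Y, rcond=1e-10)
    return W


def apply_linear(W, X):
    """Apply a fitted linear model to each row of X."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    return X1 @ W


def train_reverse_two_stage(past_sensor, past_congestion, lag=1):
    """past_sensor: (T, n) sensor values; past_congestion: (T, m) congestion levels, n < m."""
    # First model: congestion information at time t -> sensor values at the same time t
    # (input data = past congestion-area data, correct data = past sensor data).
    first = fit_linear(past_congestion, past_sensor)
    # Use the first model to predict sensor values at each first time point t.
    predicted_sensor = apply_linear(first, past_congestion[:-lag])
    # Second model: predicted sensor values at t -> congestion level at the second time point t + lag.
    second = fit_linear(predicted_sensor, past_congestion[lag:])
    return first, second


def predict_future_from_sensors(second, present_sensor):
    """Prediction as recited in claim 14: acquired sensor values -> future congestion level per area."""
    present = np.atleast_2d(np.asarray(present_sensor, dtype=float))
    return apply_linear(second, present)


if __name__ == "__main__":
    # Hypothetical data: one sensor (n = 1) and three areas (m = 3).
    s = np.arange(20, dtype=float).reshape(-1, 1)
    congestion = np.hstack([s, 2 * s, 3 * s + 1])
    first, second = train_reverse_two_stage(s, congestion)
    print(np.round(predict_future_from_sensors(second, [10.0]), 6))
```

With this hypothetical data, an acquired sensor value of 10 yields predicted future congestion levels of approximately 11, 22, and 34 for the three areas; the first model is used only during training, to produce the input data for the second model.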

Step 2A Prong One: the claim is directed to an abstract idea or natural phenomenon because it recites the steps of “generating a first model by using the congestion-related information indicated by the past congestion-area data as input data and using the values indicated by the past sensor data as correct data, the first model being a learning model for predicting, from the congestion-related information, values to be detected by the one or more sensors at a time point at which the congestion-related information is acquired; using the first model to predict, from the congestion-related information at a first time point indicated by the past congestion-area data, values to be detected by the one or more sensors at the first time point; generating a second model by using the predicted values as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data.”
Despite the recitation of the processor generating both the first and the second models, the claim language ultimately uses generic computer components (processors) and generic machine learning. The claim does not solve a specific technical problem, does not recite a technical innovation, and relies on multiple instances of functional language. It also does not explain why two separate models are necessary or what technical benefit the two-stage approach provides.
Although certain elements of the first and second models are tied to sensor data for the purpose of predicting future traffic, the claim does not specify how, or by what, the first and second models are generated. This leaves open the possibility that the predictive step is performed in the human mind by simply acquiring/gathering information from sensors or computing devices and then generating, within the human mind, a formulaic model for predicting traffic based on the gathered information.

Step 2A Prong Two: the claimed invention remains directed to an abstract idea because the two additional (“non-abstract”) elements, i.e., the steps, executed by a computer from a program stored on a non-transitory computer-readable medium, of storing past congestion-area data indicating congestion-related information including a past congestion level of each of m areas (where m is an integer of two or more) and storing past sensor data indicating values detected in the past by one or more sensors installed in n areas (where n is an integer of one or more and n<m) out of the m areas, do not substantially tie the abstract idea to anything; they only define how the data is derived. The claims do not address what actually generates the first and second models, nor do they address the prediction step, and therefore fail to add anything significantly more. The invention is simply the gathering of information from sensors and computing devices and the generation, within the human mind, of a formulaic model for predicting traffic based on the gathered information.

Claim 15 recites, “A learning method comprising: generating, by a processor, a first model that is a learning model for predicting, from values detected by one or more sensors of a station, congestion-related information of the station of a time point at which the one or more sensors perform detection, the first model being generated by using values indicated by past sensor data indicating the values detected in the past by the one or more sensors installed in n areas of the station (where n is an integer of one or more and n<m) out of m areas of the station (where m is an integer of two or more) as input data and using the congestion-related information indicated by past congestion-area data indicating the congestion-related information including a past congestion level of each of the m areas as correct data; and generating, by the processor, a second model that is a learning model for predicting a future congestion level from the congestion-related information, the second model being generated by using the congestion-related information of a first time point indicated by the past congestion-area data as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second time point being a time point after the first time point. acquiring, by the processor, present values detected by the one or more sensors at the station; using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection; using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level of any of the m areas; and displaying, by the processor, to users at the station the predicted congestion-related information.”

Step 2A Prong One: the claim is directed to an abstract idea or natural phenomenon because it recites “the first model being generated by using values indicated by past sensor data indicating the values detected in the past by the one or more sensors installed in n areas (where n is an integer of one or more and n<m) out of m areas (where m is an integer of two or more) as input data and using the congestion-related information indicated by past congestion-area data indicating the congestion-related information including a past congestion level of each of the m areas as correct data; and generating a second model that is a learning model for predicting a future congestion level from the congestion-related information, the second model being generated by using the congestion-related information of a first time point indicated by the past congestion-area data as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data.”
Despite the recitation of the processor generating both the first and the second models, the claim language ultimately uses generic computer components (processors) and generic machine learning. The claim does not solve a specific technical problem, does not recite a technical innovation, and relies on multiple instances of functional language. It also does not explain why two separate models are necessary or what technical benefit the two-stage approach provides.
Although certain elements of the first and second models are tied to sensor data for the purpose of predicting future traffic, the claim does not specify how, or by what, the first and second models are generated. This leaves open the possibility that the predictive step is performed in the human mind by simply acquiring/gathering information from sensors or computing devices and then generating, within the human mind, a formulaic model for predicting traffic based on the gathered information.

Step 2A Prong Two: the claimed invention remains an abstract idea because The two “non-abstract” idea elements do not tie the abstract idea to anything substantially, i.e. a learning method comprising: generating a first model that is a learning model for predicting, from values detected by one or more sensors, congestion-related information of a time point at which the one or more sensors perform detection, the first model being generated by using values indicated by past sensor data indicating the values detected in the past by the one or more sensors installed in n areas (where n is an integer of one or more and n<m) out of m areas (where m is an integer of two or more) as input data and using the congestion-related information indicated by past congestion-area data indicating the congestion-related information including a past congestion level of each of the m areas as correct data; and generating a second model that is a learning model for predicting a future congestion level from the congestion-related information, the second model being generated by using the congestion-related information of a first time point indicated by the past congestion-area data as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, only define the how the data is derived. What actually generates the first and second models is not addressed in the claims, nor the prediction step.  Thereby, failing to add anything significantly more.  The invention is simply a gathering acquiring of information from sensors and computing devices and generating a formulaic model for predicting traffic based on the gathered information within the human mind

Claim 16 recites, “A learning method comprising: generating, by a processor, a first model by using, as input data, the congestion-related information of a station indicated by past congestion-area data indicating the congestion-related information including a past congestion level of each of m areas of the station (where m is an integer of two or more) and using, as correct data, values indicated by past sensor data indicating the values detected in the past by the one or more sensors installed in n areas of the station (where n is an integer of one or more and n<m) out of the m areas, the first model being a learning model for predicting, from congestion-related information, values to be detected by one or more sensors of a time point at which the congestion-related information is acquired; using, by the processor, the first model to predict, from the congestion-related information at a first time point indicated by the past congestion-area data, values to be detected by the one or more sensors at the first time point; and generating a second model by using the predicted values as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from values detected by the one or more sensors, the second time point being a time point after the first time point.
acquiring, by the processor, present values detected by the one or more sensors at the station; using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection; using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level of any of the m areas; and displaying, by the processor, to users at the station the predicted congestion-related information”

Step 2A Prong One: the claim recites an abstract idea or natural phenomenon in the step of “generating a first model by using, as input data, the congestion-related information indicated by past congestion-area data indicating the congestion-related information including a past congestion level of each of m areas (where m is an integer of two or more) and using, as correct data, values indicated by past sensor data indicating the values detected in the past by the one or more sensors installed in n areas (where n is an integer of one or more and n<m) out of the m areas, the first model being a learning model for predicting, from congestion-related information, values to be detected by one or more sensors of a time point at which the congestion-related information is acquired; using the first model to predict, from the congestion-related information at a first time point indicated by the past congestion-area data, values to be detected by the one or more sensors at the first time point; and generating a second model by using the predicted values as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data”.
Despite the recitation of a processor generating the first and second models, the claim language ultimately uses generic computer components (a processor) and generic machine learning. No specific technical problem is solved, and there is no technical innovation. The claims use multiple instances of functional language, and the claim does not explain why two separate models are necessary or what technical benefit the two-stage approach provides.
Although certain elements of the first and second models are tied to sensor data for the purpose of predicting future traffic, the claim does not disclose how the first and second models are generated or what generates them. This leaves open the possibility that the predictive step is performed in the human mind: information is simply acquired or gathered from sensors or computing devices, and a formulaic model for predicting traffic is then generated mentally based on the gathered information.

Step 2A Prong Two: the claimed invention remains directed to an abstract idea because the two “non-abstract” elements do not tie the abstract idea to anything substantial. These elements, i.e., generating a first model by using, as input data, the congestion-related information indicated by past congestion-area data indicating the congestion-related information including a past congestion level of each of m areas (where m is an integer of two or more) and using, as correct data, values indicated by past sensor data indicating the values detected in the past by the one or more sensors installed in n areas (where n is an integer of one or more and n<m) out of the m areas, the first model being a learning model for predicting, from congestion-related information, values to be detected by one or more sensors of a time point at which the congestion-related information is acquired; using the first model to predict, from the congestion-related information at a first time point indicated by the past congestion-area data, values to be detected by the one or more sensors at the first time point; and generating a second model by using the predicted values as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, only define how the data is derived. The claims address neither what actually generates the first and second models nor the prediction step, and therefore fail to add anything significantly more. The invention simply gathers information from sensors and computing devices and generates, within the human mind, a formulaic model for predicting traffic based on the gathered information.

Claim 17 recites, “A prediction method comprising: acquiring, by a processor, values detected by one or more sensors installed in n areas of a station (where n is an integer of one or more, and n<m) out of m areas of the station (where m is an integer of two or more); using, by the processor, a first model to predict, from the acquired values, congestion-related information of a time point at which the one or more sensors perform detection, the first model being a learning model for predicting, from values to be detected by the one or more sensors, the congestion-related information of the station of the time point at which the one or more sensors perform detection, the first model being generated by using values indicated by past sensor data indicating the values detected in the past by the one or more sensors installed in the n areas (where n is an integer of one or more and n<m) out of the m areas as input data and using the congestion-related information indicated by past congestion-area data indicating the congestion-related information including a past congestion level of each of the m areas as correct data; and using, by the processor, a second model to predict, from the predicted congestion-related information, a future congestion level of any of the m areas, the second model being a learning model for predicting a future congestion level from the congestion-related information, the second model being generated by using the congestion-related information of a first time point indicated by the past congestion-area data as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second time point being a time point after the first time point; and displaying, by the processor, to users at the station the predicted congestion-related information.”

Step 2A Prong One: the claim recites an abstract idea or natural phenomenon in the steps of “using a first model to predict, from the acquired values, congestion-related information of a time point at which the one or more sensors perform detection, the first model being a learning model for predicting, from values to be detected by the one or more sensors, the congestion-related information of the time point at which the one or more sensors perform detection, the first model being generated by using values indicated by past sensor data indicating the values detected in the past by the one or more sensors installed in n areas (where n is an integer of one or more and n<m)” and “using a second model to predict, from the predicted congestion-related information, a future congestion level of any of the m areas, the second model being a learning model for predicting a future congestion level from the congestion-related information, the second model being generated by using the congestion-related information of a first time point indicated by the past congestion-area data as input data”.
Despite the recitation of a processor generating the first and second models, the claim language ultimately uses generic computer components (a processor) and generic machine learning. No specific technical problem is solved, and there is no technical innovation. The claims use multiple instances of functional language, and the claim does not explain why two separate models are necessary or what technical benefit the two-stage approach provides.
Although certain elements of the first and second models are tied to sensor data for the purpose of predicting future traffic, the claim does not disclose how the first and second models are generated or what generates them. This leaves open the possibility that the predictive step is performed in the human mind: information is simply acquired or gathered from sensors or computing devices, and a formulaic model for predicting traffic is then generated mentally based on the gathered information.

Step 2A Prong Two: the claimed invention remains directed to an abstract idea because the two “non-abstract” elements do not tie the abstract idea to anything substantial. These elements, i.e., the first model being generated by using values indicated by past sensor data indicating the values detected in the past by the one or more sensors installed in n areas (where n is an integer of one or more and n<m) out of the m areas, and the second model being generated by using the congestion-related information of a first time point indicated by the past congestion-area data as input data, only define how the data is derived. The claims address neither what actually generates the first and second models nor the prediction step, and therefore fail to add anything significantly more. The invention simply gathers information from sensors and computing devices and generates, within the human mind, a formulaic model for predicting traffic based on the gathered information.
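For illustration only, the two-stage data flow recited in claim 17 (sensor values of n areas, then present congestion-related information of all m areas, then a future congestion level) can be sketched as follows. Because the claims do not specify the type of learning model, this Python sketch assumes a hypothetical 1-nearest-neighbour model; the names `fit`, `train_two_stage`, and `predict_future_congestion`, the toy data (one sensor, so n = 1, and two areas, so m = 2), and the one-step gap between the first and second time points are all illustrative assumptions, not the applicant's disclosed implementation.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


@dataclass
class NearestNeighborModel:
    """Hypothetical stand-in for the unspecified 'learning model'."""
    inputs: List[Vector]
    targets: List[Vector]

    def predict(self, x: Vector) -> Vector:
        # Return the correct data paired with the closest stored input.
        best = min(range(len(self.inputs)), key=lambda i: dist(self.inputs[i], x))
        return self.targets[best]


def fit(inputs: Sequence[Vector], targets: Sequence[Vector]) -> NearestNeighborModel:
    """Train on aligned (input data, correct data) pairs."""
    if not inputs or len(inputs) != len(targets):
        raise ValueError("input data and correct data must be non-empty and aligned")
    return NearestNeighborModel(list(inputs), list(targets))


def train_two_stage(past_sensor: Sequence[Vector], past_congestion: Sequence[Vector], horizon: int = 1):
    if horizon < 1:
        raise ValueError("the second time point must come after the first")
    # First model: sensor values (n areas) at time t -> congestion levels (m areas) at t.
    first = fit(past_sensor, past_congestion)
    # Second model: congestion at a first time point t -> congestion at t + horizon.
    second = fit(past_congestion[:-horizon], past_congestion[horizon:])
    return first, second


def predict_future_congestion(first, second, present_sensor: Vector) -> Vector:
    present_congestion = first.predict(present_sensor)  # estimate all m areas from n sensors
    return second.predict(present_congestion)           # then step forward in time


# Toy history: one sensor reading per time step, congestion levels for two areas.
past_sensor = [(10.0,), (40.0,), (80.0,)]
past_congestion = [(0.1, 0.2), (0.4, 0.5), (0.8, 0.9)]
first, second = train_two_stage(past_sensor, past_congestion)
print(predict_future_congestion(first, second, (38.0,)))  # -> (0.8, 0.9)
```

Any regressor could replace the nearest-neighbour stand-in without changing the data flow, which is the point the Prong One analysis makes about the claims leaving the model type open.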

Claim 18 recites, “A prediction method comprising: acquiring, by the processor, values detected by one or more sensors installed in n areas of a station (where n is an integer of one or more, and n<m) out of m areas of the station (where m is an integer of two or more); and using, by the processor, a second model to predict a future congestion level of any of the m areas from the acquired values, the first model being a learning model for predicting, from congestion-related information of the station, values to be detected by the one or more sensors at a time point at which the congestion-related information is acquired, the first model being generated by using, as input data, the congestion-related information indicated by past congestion-area data indicating the congestion-related information including a past congestion level of each of the m areas and by using, as correct data, values indicated by past sensor data indicating values detected by the one or more sensors in the past, the second model being a learning model for predicting the future congestion level from the values detected by the one or more sensors, the second model being generated by using, as input data, predicted values obtained by using the first model to predict the values to be detected by the one or more sensors at a first time point, from congestion-related information of the first time point indicated by the past congestion-area data and by using, as correct data, a congestion level in the congestion-related information of a second time point indicated by the past congestion-area data, the second time point being a time point after the first time point; and displaying, by the processor, to users at the station the future congestion level.”

Step 2A Prong One: the claim recites an abstract idea or natural phenomenon in the step of “using a second model to predict a future congestion level of any of the m areas from the acquired values, the first model being a learning model for predicting, from congestion-related information, values to be detected by the one or more sensors at a time point at which the congestion-related information is acquired, the first model being generated by using, as input data, the congestion-related information indicated by past congestion-area data indicating the congestion-related information including a past congestion level of each of the m areas and by using, as correct data, values indicated by past sensor data indicating values detected by the one or more sensors in the past, the second model being a learning model for predicting a future congestion level from the values detected by the one or more sensors, the second model being generated by using, as input data, predicted values obtained by using a first model to predict the values to be detected by the one or more sensors at a first time point”.
Despite the recitation of a processor generating the first and second models, the claim language ultimately uses generic computer components (a processor) and generic machine learning. No specific technical problem is solved, and there is no technical innovation. The claims use multiple instances of functional language, and the claim does not explain why two separate models are necessary or what technical benefit the two-stage approach provides.
Although certain elements of the first and second models are tied to sensor data for the purpose of predicting future traffic, the claim does not disclose how the first and second models are generated or what generates them. This leaves open the possibility that the predictive step is performed in the human mind: information is simply acquired or gathered from sensors or computing devices, and a formulaic model for predicting traffic is then generated mentally based on the gathered information.

Step 2A Prong Two: the claimed invention remains directed to an abstract idea because the two “non-abstract” elements do not tie the abstract idea to anything substantial. These elements, i.e., acquiring values detected by one or more sensors installed in n areas (where n is an integer of one or more, and n<m) out of m areas (where m is an integer of two or more), only define how the data is derived. The claims address neither what actually generates the first and second models nor the prediction step, and therefore fail to add anything significantly more. The invention simply gathers information from sensors and computing devices and generates, within the human mind, a formulaic model for predicting traffic based on the gathered information.
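Claim 18 (like claims 16 and 20) reverses the direction of the first model: it maps congestion-related information to expected sensor values, and the second model is trained on those predicted sensor values rather than on measured ones. A sketch under the same assumptions as above (a hypothetical 1-nearest-neighbour model standing in for the unspecified learning model; hypothetical names `fit` and `train_reverse_two_stage`; toy data with n = 1 and m = 2; a one-step gap between the first and second time points) is shown below for illustration only.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


@dataclass
class NearestNeighborModel:
    """Hypothetical stand-in for the unspecified 'learning model'."""
    inputs: List[Vector]
    targets: List[Vector]

    def predict(self, x: Vector) -> Vector:
        # Return the correct data paired with the closest stored input.
        best = min(range(len(self.inputs)), key=lambda i: dist(self.inputs[i], x))
        return self.targets[best]


def fit(inputs: Sequence[Vector], targets: Sequence[Vector]) -> NearestNeighborModel:
    if not inputs or len(inputs) != len(targets):
        raise ValueError("input data and correct data must be non-empty and aligned")
    return NearestNeighborModel(list(inputs), list(targets))


def train_reverse_two_stage(past_congestion: Sequence[Vector], past_sensor: Sequence[Vector], horizon: int = 1):
    if horizon < 1:
        raise ValueError("the second time point must come after the first")
    # First model: congestion info (m areas) at time t -> sensor values (n areas) at t.
    first = fit(past_congestion, past_sensor)
    # Predicted sensor values for each first time point t ...
    predicted_sensor = [first.predict(c) for c in past_congestion[:-horizon]]
    # ... become the second model's input data; congestion at t + horizon is its correct data.
    second = fit(predicted_sensor, past_congestion[horizon:])
    return first, second


past_congestion = [(0.1, 0.2), (0.4, 0.5), (0.8, 0.9)]
past_sensor = [(10.0,), (40.0,), (80.0,)]
first, second = train_reverse_two_stage(past_congestion, past_sensor)
# At prediction time only the second model is applied, directly to present sensor values.
print(second.predict((38.0,)))  # -> (0.8, 0.9)
```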

Claim 19 recites, “A learning prediction method comprising: generating, by a processor, a first model that is a learning model for predicting, from values detected by one or more sensors of a station, congestion-related information of the station of a time point at which the one or more sensors perform detection, the first model being generated by using values indicated by past sensor data indicating the values detected in the past by the one or more sensors installed in n areas of the station (where n is an integer of one or more and n<m) out of m areas of the station (where m is an integer of two or more) as input data and using the congestion-related information indicated by past congestion-area data indicating the congestion-related information including a past congestion level of each of the m areas as correct data; generating, by the processor, a second model that is a learning model for predicting a future congestion level from the congestion-related information, the second model being generated by using the congestion-related information of a first time point indicated by the past congestion-area data as input data and using the congestion level in the congestion-related information of a second time point as correct data, the second time point being a time point after the first time point; an acquiring unit configured to acquiring, by the processor, values detected by the one or more sensors; using, by the processor, the first model to predict, from the acquired values, the congestion-related information of a time point at which the one or more sensors perform detection; and using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level of any of the m areas; and displaying, by the processor, to users at the station the predicted congestion-related information”

Step 2A Prong One: the claim recites an abstract idea or natural phenomenon in the step of “the first model being generated by using values indicated by past sensor data indicating the values detected in the past by the one or more sensors installed in n areas (where n is an integer of one or more and n<m) out of m areas (where m is an integer of two or more) as input data and using the congestion-related information indicated by past congestion-area data indicating the congestion-related information including a past congestion level of each of the m areas as correct data; generating a second model that is a learning model for predicting a future congestion level from the congestion-related information, the second model being generated by using the congestion-related information of a first time point indicated by the past congestion-area data”.
Despite the recitation of a processor generating the first and second models, the claim language ultimately uses generic computer components (a processor) and generic machine learning. No specific technical problem is solved, and there is no technical innovation. The claims use multiple instances of functional language, and the claim does not explain why two separate models are necessary or what technical benefit the two-stage approach provides.
Although certain elements of the first and second models are tied to sensor data for the purpose of predicting future traffic, the claim does not disclose how the first and second models are generated or what generates them. This leaves open the possibility that the predictive step is performed in the human mind: information is simply acquired or gathered from sensors or computing devices, and a formulaic model for predicting traffic is then generated mentally based on the gathered information.

Step 2A Prong Two: the claimed invention remains directed to an abstract idea because the two “non-abstract” elements do not tie the abstract idea to anything substantial. These elements, i.e., using values indicated by past sensor data indicating the values detected in the past by the one or more sensors installed in n areas (where n is an integer of one or more and n<m) out of m areas (where m is an integer of two or more) as input data and using the congestion-related information indicated by past congestion-area data, only define how the data is derived. The claims address neither what actually generates the first and second models nor the prediction step, and therefore fail to add anything significantly more. The invention simply gathers information from sensors and computing devices and generates, within the human mind, a formulaic model for predicting traffic based on the gathered information.

Claim 20 recites, “A learning prediction method comprising: generating, by the processor, a first model that is a learning model for predicting, from congestion-related information of a station, values to be detected by one or more sensors of a time point at which the congestion-related information is acquired, the first model being generated by using, as input data, the congestion-related information indicated by past congestion-area data indicating the congestion-related information including a past congestion level of each of m areas of the station (where m is an integer of two or more) and using, as correct data, values indicated by past sensor data indicating the values detected in the past by the one or more sensors installed in n areas of the station (where n is an integer of one or more and n<m) out of the m areas; using, by the processor, the first model to predict, from the congestion-related information at a first time point indicated by the past congestion-area data, values to be detected by the one or more sensors at the first time point; generating, by the processor, a second model that is a learning model for predicting a future congestion level from values detected by the one or more sensors, the second model being generated by using the predicted values as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second time point being a time point after the first time point; an acquiring unit configured to acquiring, by the processor, values detected by the one or more sensors; and using, by the processor, the second model to predict, from the acquired values, the future congestion level of any of the m areas; and displaying, by the processor, to users at the station the future congestion level of any of the m areas.”

Step 2A Prong One: the claim recites an abstract idea or natural phenomenon in the step of “generating a first model that is a learning model for predicting, from congestion-related information, values to be detected by one or more sensors of a time point at which the congestion-related information is acquired, the first model being generated by using, as input data”, as well as, “.
Despite the recitation of a processor generating the first and second models, the claim language ultimately uses generic computer components (a processor) and generic machine learning. No specific technical problem is solved, and there is no technical innovation. The claims use multiple instances of functional language, and the claim does not explain why two separate models are necessary or what technical benefit the two-stage approach provides.
Although certain elements of the first and second models are tied to sensor data for the purpose of predicting future traffic, the claim does not disclose how the first and second models are generated or what generates them. This leaves open the possibility that the predictive step is performed in the human mind: information is simply acquired or gathered from sensors or computing devices, and a formulaic model for predicting traffic is then generated mentally based on the gathered information.

Step 2A Prong Two: the claimed invention remains directed to an abstract idea because the two “non-abstract” elements do not tie the abstract idea to anything substantial. These elements, i.e., a learning prediction method comprising: generating a first model that is a learning model for predicting, from congestion-related information, values to be detected by one or more sensors of a time point at which the congestion-related information is acquired, only define how the data is derived. The claims address neither what actually generates the first and second models nor the prediction step, and therefore fail to add anything significantly more. The invention simply gathers information from sensors and computing devices and generates, within the human mind, a formulaic model for predicting traffic based on the gathered information.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, and 5-20 are rejected under 35 U.S.C. 103 as being unpatentable over Guo et al. (US 20220414450 A1) in view of Huang (CN 112053550B).

Regarding claim 1, Guo teaches a learning device comprising: a processor to execute a program; and a memory to store past congestion-area data indicating congestion-related information including a past congestion level of each of m areas (where m is an integer of two or more), past sensor data indicating values detected in the past by one or more sensors installed in n areas (where n is an integer of one or more and n<m) out of the m areas (Paragraphs 9, 11, 17, 60, 63):
It is one object of some embodiments to provide a distributed machine learning based vehicular traffic prediction method that can accurately predict short time, middle time and long time traffic for both city roads and highways. Additionally, it is another object of some embodiments to make optimal route planning for vehicles to minimize traffic congestion and travel time by using the accurate traffic prediction.[P-9]
A learning server coordinates distributed learning among a set of data collectors. A learning server first designs and distributes the traffic models such as neural networks to the set of data collectors for the first round of the distributed training Each data collector then trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. After certain iterations of training process, each data collector sends the trained traffic models to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models. Upon completion of the model aggregation, the learning server re-distributes the aggregated traffic models to data collectors for the second round of training process. This process of training and aggregation continues until the robust traffic models are built.[P-11]
According to some embodiments of the present invention, a distributed machine learning based traffic prediction method can be provided for predicting traffic of roads. In this case, the distributed machine learning based traffic prediction method may a computer-implemented distributed machine learning based traffic prediction method for predicting traffic of roads. The method may include distributing global multi-task traffic models by a learning server to learning agents to locally train the traffic models; uploading locally trained global multi-task traffic models to the learning server, wherein the locally trained global multi-task traffic models have been trained by the learning agents; updating the global multi-task traffic models by the learning server using the locally trained global multi-task traffic models uploaded from the learning agents; generating a time-dependent global traffic map by the learning server using the updated global multi-task traffic models; and distributing the time-dependent global traffic map to each of vehicles traveling on the roads.[P-17]
FIG. 3A illustrates the traffic prediction architecture 300 using distributed multi-task learning techniques according to some embodiments of the current invention, where an infrastructure device acts as learning server 115 to coordinate a set of data collectors 110, which serve as learning agents. In the FIG. 3A, there are N data collectors, data collector 1 is a toll station, data collector 2 is a loop detector, and data collector N is a camera. The learning server 115 designs M traffic models 310 such that each traffic model is targeted for a specific task, e.g., a model for rush hour traffic and another traffic model for off hour traffic, a model for city traffic and another model for freeway traffic. Therefore, these traffic models are different but correlated because traffic at one location can impact on traffic at other locations and traffic at one can impact on traffic at another time. Accordingly, the learning server 115 defines data clustering rules 320 to divide data into M clusters 330. The learning server 115 then distributes the rules 320 and traffic models 310 to data collectors 110. Correspondingly, each data collector 110 divides its data into M clusters 330. It is possible that some of data clusters are empty. In that case, the data collector will not train corresponding traffic model. In the FIG. 3A, data collector 1 has all data clusters, data collector 2 does not have cluster-M data and, and data collector N does not have cluster-1 data. As a result, data collector 1 will train all traffic models, data collector 2 does not train traffic model M, and data collector N does not train traffic model 1. After clustering its data, each data collector 110 starts the first round of the distributed training. Each data collector 110 trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server.
After certain iterations of training, each data collector sends the locally trained traffic models 340 to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models 350. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. For route planning application, the traffic speed is predicted. Finally, data collectors send their local traffic predictions 360 to the learning server to build a global traffic prediction model, i.e., time-dependent graph, that is then distributed to vehicles 120 for route planning.[P-60]
The learning server 115 may be referred to as a distributed machine learning based traffic predication server. The learning server 115 is configured to provide map information with respect to traffic predictions to vehicles 120 traveling on a road network. The learning server 115 may include one or more processors 121, a memory 140, a memory unit/storage 200 configured to store traffic prediction neural networks 132, traffic data cluster rules 134, global time-dependent map 135, trained multi-task traffic models 136, global map of road network 137, an input interface (or transceiver) 150 configured to communicate with the learning agent 110 via the communication network 112 and update model parameters of the traffic prediction neural networks 132. The trained multi-task traffic models 136 may be the traffic models 310 shown in FIG. 3A.[P-63]
Guo establishes M clusters representing different traffic models attributed to different traffic areas. For example, Guo illustrates that one area may be freeway traffic, which would be labeled Cluster 1, and another area may be city traffic (Cluster 2), thereby satisfying the condition that M is two or more. Furthermore, Guo teaches data collectors/sensors, designated data collectors 1 through N, each of which may be a toll station, a loop detector, or a camera. Accordingly, depending on the area for which the traffic/cluster model is generated, a desired number of sensors/detectors may be set such that the ratio of data collectors to cluster models satisfies N<M.
Furthermore, Guo teaches the program which, when executed by the processor, performs processes of generating a first model by using the values indicated by the past sensor data as input data and using the congestion-related information indicated by the past congestion-area data as correct data (Paragraph 48).
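For illustration only, the FIG. 3A workflow described in Guo (each collector clusters its data by a rule, trains only the models whose clusters are non-empty, and the learning server aggregates over repeated rounds) can be sketched in Python as follows. The two-cluster rush-hour rule, the linear stand-in models, the gradient-descent training, and the sample-count-weighted averaging are assumptions of this sketch; the quoted passages of Guo do not specify them.

```python
from dataclasses import dataclass

M = 2  # number of traffic models / data clusters (assumed: rush hour, off hour)


def cluster_rule(hour):
    """Hypothetical clustering rule: cluster 0 = rush hour, cluster 1 = off hour."""
    return 0 if hour in (7, 8, 17, 18) else 1


@dataclass
class LinearModel:
    """Toy stand-in for one traffic model: y = w * x + b."""
    w: float = 0.0
    b: float = 0.0


def local_train(model, samples, lr=0.05, epochs=200):
    """Gradient descent on mean squared error, using only this collector's own samples."""
    w, b, n = model.w, model.b, len(samples)
    for _ in range(epochs):
        gw = sum(2 * (w * x + b - y) * x for x, y in samples) / n
        gb = sum(2 * (w * x + b - y) for x, y in samples) / n
        w, b = w - lr * gw, b - lr * gb
    return LinearModel(w, b)


def training_round(global_models, collectors):
    """One round: local training on non-empty clusters, then weighted averaging at the server."""
    updates = [[] for _ in range(M)]
    for data in collectors:  # raw data never leaves the collector; only models are returned
        clusters = [[] for _ in range(M)]
        for hour, x, y in data:
            clusters[cluster_rule(hour)].append((x, y))
        for m, samples in enumerate(clusters):
            if samples:  # an empty cluster means this collector skips model m
                updates[m].append((local_train(global_models[m], samples), len(samples)))
    aggregated = []
    for m in range(M):
        total = sum(count for _, count in updates[m])
        if total == 0:
            aggregated.append(global_models[m])  # no collector trained model m this round
            continue
        w = sum(mod.w * count for mod, count in updates[m]) / total
        b = sum(mod.b * count for mod, count in updates[m]) / total
        aggregated.append(LinearModel(w, b))
    return aggregated


# Hypothetical (hour, x, y) samples; rush hour follows y = -2x + 5, off hour follows y = -x + 6.
collectors = [
    [(8, 0, 5), (8, 1, 3), (8, 2, 1), (12, 0, 6), (12, 1, 5), (12, 2, 4)],  # collector 1: both clusters
    [(7, 0, 5), (17, 1, 3), (18, 2, 1)],  # collector 2: no off-hour cluster
    [(10, 0, 6), (14, 2, 4), (22, 1, 5)],  # collector N: no rush-hour cluster
]
models = [LinearModel(), LinearModel()]
for _ in range(5):  # repeated training/aggregation rounds
    models = training_round(models, collectors)
```

Because collector 2 has no off-hour data and collector N has no rush-hour data, each contributes to only one model per round, mirroring the empty-cluster case described for FIG. 3A. Averaging weighted by sample count is one common aggregation choice, not one Guo is shown to require.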
To reap all the aforementioned benefits, the traffic prediction must process the real-time and historical traffic data and observations collected by data collectors and mobile devices. For example, the inductive loop can measure the travel speed by reading the inductance changes over time and such data can be used for the traffic speed prediction. In addition, the wide use of mobile devices (e.g., on-board global position systems and phones) enables the mobility data to be crowdsourced from the general public, further facilitating the unprecedented traffic data collection. Such emerging big data can substantially augment the data availability in terms of the coverage and fidelity and significantly boost the data-driven traffic prediction. The prior art on the traffic prediction can be mainly grouped into two categories. The first category focus on using parametric approaches, such as autoregressive integrated moving average (ARIMA) model and Kalman filtering models. When dealing with the traffic only presenting regular variations, e.g., recurrent traffic congestion occurred in morning and evening rush hour, the parametric approaches can achieve promising prediction results. However, due to the stochastic and nonlinear nature of the road traffic, the traffic predictions of using the parametric approaches can deviate from the actual values especially in the abrupt traffic. Hence, instead of fitting the traffic data into a mathematical model as done by the parametric approach, an alternative way is using the nonparametric approaches where the machine learning (ML) based method is used. For example, a stacked autoencoder model can be used to learn the generic traffic flow features for the predictions. The long short-term memory (LSTM) recurrent neural network (RNN) can be used to predict the traffic flow, speed and occupancy, based on the data collected by the data collector and its upstream and downstream data collectors. 
Along with the use of RNN, the convolution neural network (CNN) can also be utilized to capture the latent traffic evolution patterns within the underlying road network.[P-48]

Here, we see the integration of real-time traffic data with past sensed traffic data and congestion-related traffic information, which are aggregated to enable the prediction of future traffic patterns.
Guo teaches the first model being a learning model for predicting, from values detected by the one or more sensors, congestion-related information of a time point at which the one or more sensors perform detection (Paragraph 60; Figure 3A).
 FIG. 3A illustrates the traffic prediction architecture 300 using distributed multi-task learning techniques according to some embodiments of the current invention, where an infrastructure device acts as learning server 115 to coordinate a set of data collectors 110, which serve as learning agents. In the FIG. 3A, there are N data collectors, data collector 1 is a toll station, data collector 2 is a loop detector, and data collector N is a camera. The learning server 115 designs M traffic models 310 such that each traffic model is targeted for a specific task, e.g., a model for rush hour traffic and another traffic model for off hour traffic, a model for city traffic and another model for freeway traffic. Therefore, these traffic models are different but correlated because traffic at one location can impact on traffic at other locations and traffic at one can impact on traffic at another time. Accordingly, the learning server 115 defines data clustering rules 320 to divide data into M clusters 330. The learning server 115 then distributes the rules 320 and traffic models 310 to data collectors 110. Correspondingly, each data collector 110 divides its data into M clusters 330. It is possible that some of data clusters are empty. In that case, the data collector will not train corresponding traffic model. In the FIG. 3A, data collector 1 has all data clusters, data collector 2 does not have cluster-M data and, and data collector N does not have cluster-1 data. As a result, data collector 1 will train all traffic models, data collector 2 does not train traffic model M, and data collector N does not train traffic model 1. After clustering its data, each data collector 110 starts the first round of the distributed training. Each data collector 110 trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. 
After certain iterations of training, each data collector sends the locally trained traffic models 340 to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models 350. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. For route planning application, the traffic speed is predicted. Finally, data collectors send their local traffic predictions 360 to the learning server to build a global traffic prediction model, i.e., time-dependent graph, that is then distributed to vehicles 120 for route planning[P-60]
Here, P-60 and Fig. 3A illustrate and explain that several different traffic models are generated, one of which (the first model) is a learning model for predicting, from values detected by the one or more sensors (e.g., the toll station), congestion-related information of a time point (e.g., rush hour) at which the one or more sensors perform detection.
Guo does not expressly teach generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from the congestion-related information, the second time point being a time point after the first time point.
However, P-60 teaches multiple traffic models, where, for example, one model (the first model) is a learning model for predicting, from values detected by the one or more sensors (toll station), congestion-related information of a time point (rush hour) at which the one or more sensors perform detection. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc.[P-60]
Hence, the next round of learning entails generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data (the previous aggregated traffic model) as input data and using the congestion level in the congestion-related information of a second time point (a different time point in rush hour) indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from the congestion-related information, the second time point being a time point after the first time point (a different time in rush hour after the previous time in the preceding learning model).
Therefore, although Guo does not expressly recite generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data as input data, Guo teaches the learning server regenerating a traffic model (a second model) after previously generated model(s), using previous traffic-related data together with real-time traffic-related data. Accordingly, it would have been obvious to one of ordinary skill in the art that Guo’s teaching reads on the applicant’s second generated model, because substituting Guo’s teaching for the applicant’s teaching would yield the same results, in order to enable a more efficient and robust traffic prediction model.
However, Guo fails to teach acquiring, by the processor, present values detected by the one or more sensors of the station; using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection; using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level of any of the m areas; and displaying, by the processor, to users at the station the predicted congestion-related information.
Huang, on the other hand, teaches acquiring, by the processor, present values detected by the one or more sensors of the station; and using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection (Page 3, Paragraph 2; Page 5, Paragraphs 5-10).
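The claimed data pairing can be made concrete with a minimal Python sketch: the first model's training pairs match sensor values with congestion-related information at the same time point, the second model's pairs match congestion-related information at a first time point with congestion levels at a later second time point, and prediction chains the two. The nearest-neighbor stand-in models, the sample values, and the one-step horizon are hypothetical; neither the claim language nor Guo specifies a learning algorithm at this level.

```python
from math import dist

# Hypothetical past data: n = 1 sensor area, m = 3 areas (n < m); keys are time indices.
past_sensor = {0: (10,), 1: (40,), 2: (80,), 3: (60,), 4: (20,), 5: (5,)}
past_congestion = {0: (0, 0, 0), 1: (1, 0, 1), 2: (2, 1, 2), 3: (2, 2, 1), 4: (1, 1, 0), 5: (0, 0, 0)}
HORIZON = 1  # assumed: second time point = first time point + HORIZON


def first_model_pairs(sensor, congestion):
    """Input: sensor values at t; correct data: congestion-related information at the SAME t."""
    return [(sensor[t], congestion[t]) for t in sorted(sensor) if t in congestion]


def second_model_pairs(congestion, horizon):
    """Input: congestion-related information at t1; correct data: congestion at t2 = t1 + horizon."""
    return [(congestion[t1], congestion[t1 + horizon])
            for t1 in sorted(congestion) if t1 + horizon in congestion]


def fit_nearest_neighbor(pairs):
    """Toy stand-in for a learning model: return the correct data of the closest training input."""
    return lambda x: min(pairs, key=lambda p: dist(p[0], x))[1]


first = fit_nearest_neighbor(first_model_pairs(past_sensor, past_congestion))
second = fit_nearest_neighbor(second_model_pairs(past_congestion, HORIZON))


def predict_future(present_values):
    """Present sensor values -> present congestion info (first model) -> future levels (second model)."""
    return second(first(present_values))
```

For example, `predict_future((75,))` first maps the reading to the congestion pattern recorded at time 2 and then returns the pattern that followed it, `(2, 2, 1)`, one level for each of the m = 3 areas.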
Preferably, the monitoring terminal comprises a traffic monitoring terminal for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal for counting the pedestrian density at a certain time in a certain time, and a conflict monitoring terminal for counting the pedestrian conflict degree in a certain area and an abnormal monitoring terminal for identifying the pedestrian abnormal behaviour in a certain area.[Pg 3, P-2]
Further, the monitoring terminal 2 comprises a traffic monitoring terminal 201 for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal 202 for counting a certain area pedestrian density at a certain time, a conflict monitoring terminal 203 for counting pedestrian conflict degree in a certain area and an abnormal monitoring terminal 204 for identifying pedestrian abnormal behavior in a certain area, each terminal is composed of a sensor, an embedded processing chip, a processing program and a communication module.[Pg 5, P-5]
the sensor of the flow monitoring terminal 201 is a binocular video camera embedded processing chip and flow statistics method according to the video data obtained by the sensor, counting the pedestrian number in and out of the specified area and calculating the flow rate (unit: The cumulative flow rate (unit: 1 %) and the cumulative flow rate (unit: human), then the traffic data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-6]
the sensor of the density monitoring terminal 202 is a wide-angle video camera embedded processing chip and density statistical method according to the video data obtained by the sensor, counting the pedestrian distribution of the specified area and calculating the density (unit of each point after discrete area of the area: the density data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-7]
the sensor of the conflict monitoring terminal 203 is high video camera, embedded processing chip and conflict calculation algorithm according to the video data obtained by the sensor, counting the appointed area pedestrian speed direction and size change rate, relative direction pedestrian number, local density and other parameters, and according to the direction and size change rate of the pedestrian speed in the specified area, relative direction pedestrian number, local density parameter calculating conflict coefficient (dimensionless parameter), then the conflict coefficient is uploaded to the management control terminal 1 through the communication module[Pg 5, P-8]
the sensor of the abnormal monitoring terminal 204 is high video camera, embedded processing chip and the abnormal detection algorithm according to the video data obtained by the sensor, detecting whether there is abnormal behaviour in the specified area, such as a fight, terrorist attack, fast escape, crowding, then the type code of the abnormal behaviour is uploaded to the management control terminal 1 through the communication module. [Pg 5, P-9]
Further, the guide terminal 3 comprises an acoustic guide terminal 301 for conveying sound instruction, for area division, route guidance and information display of the optical guide terminal 302, for issuing the mobile phone message of the information issuing terminal 304. The area separation terminal 303 for the dynamic adjustment of the pedestrian and vehicular traffic area and the guide terminal 305 for order maintenance and manual assistance 305. [Pg 5, P-10]
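Huang’s density monitoring terminal 202 is described as counting the pedestrian distribution of a specified area and computing a density over a discretized area (the unit is garbled in the quoted translation). A minimal Python sketch of that computation follows. It assumes pedestrian positions have already been extracted from the video (the detection step is not shown), a square grid, and persons per square meter as the unit; the function name `grid_density` is hypothetical.

```python
from collections import Counter


def grid_density(positions, width, height, cell):
    """Persons per square meter in each cell of a width x height area split into cell x cell squares."""
    cols, rows = int(width // cell), int(height // cell)
    counts = Counter()
    for x, y in positions:
        if 0 <= x < width and 0 <= y < height:  # ignore detections outside the specified area
            col = min(int(x // cell), cols - 1)
            row = min(int(y // cell), rows - 1)
            counts[(col, row)] += 1
    return {(c, r): counts[(c, r)] / (cell * cell) for c in range(cols) for r in range(rows)}


# Hypothetical positions (meters) in a 4 m x 4 m area divided into 2 m x 2 m cells.
density = grid_density([(0.5, 0.5), (1.0, 1.5), (3.0, 0.5), (3.5, 3.5)], width=4.0, height=4.0, cell=2.0)
```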

Huang further teaches using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level (Page 3, Paragraphs 3, 4, and 9; Page 3, Paragraph 9 through Page 4, Paragraph 1; Page 6, Paragraphs 2, 3, 7, and 8).
Preferably, the management control terminal comprises a region modeling module, a distribution estimation module connected with the region modeling module signal, a distribution prediction module connected with the distribution estimation module signal, and a risk evaluation module connected with the distribution prediction module and the distribution estimation module signal, a simulation and optimization module connected with the risk evaluation module signal and a guide control module connected with the simulation and optimization module signal. [Pg 3, P-3]
Preferably, the simulation and optimization module comprises a personnel evacuation model and personnel flow model, when the risk assessment result of the risk assessment module is low comfort level personnel flow module, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of simulating pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and flow inducing scheme [Pg 3, P-4]
the distribution prediction module according to the history data of the distribution estimation result, establishing model based on machine learning algorithm to predict the future period of pedestrian distribution parameter, and uploading data continuous learning model parameter according to the future monitoring terminal.[Pg 3, P-9]
Preferably, the risk evaluation module signal is connected with a conflict monitoring terminal and an abnormal monitoring terminal, and according to the distribution prediction module, the output result of the distribution estimation module and conflict monitoring terminal, the uploading data of the abnormal monitoring terminal evaluates the possibility of the danger of congestion and trampling on the comfort level region. [Pg 3, P-9-Pg 4, P-1]
Further, the management control terminal 1 comprises a region modeling module 101, and the region modeling module 101 signal connection of the distribution estimation module 102, and the distribution prediction module 102 signal connection distribution prediction module 103. a risk evaluation module 104 connected with the distribution prediction module 103 and the distribution estimation module 102, a simulation and optimization module 105 connected with the risk evaluation module 104 signal and a guide control module 106 connected with the simulation and optimization module 105 signal.[Pg 6, P-2]
Further, the simulation and optimization module 105 comprises a personnel evacuation model and personnel flow model, when the risk evaluation result of the risk evaluation module 104 is low comfort level personnel flow module, management control terminal 1 sends attention area low comfort level alarm, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of the simulated pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and a flow inducing scheme, after the management staff 4 of the auditing by the guide control module 106 sends the inducing information to the guide terminal 3;[Pg 6, P-3]
the distribution prediction module 103 according to the history data of the distribution estimation result, based on the machine learning algorithm establishing model prediction future period of pedestrian distribution parameter, and according to the future monitoring terminal 2 uploading data continuous learning model parameter.[Pg 6, P-7]
Further, the risk evaluation module 104 signal connected with the conflict monitoring terminal 203 and the abnormal monitoring terminal 204, and according to the distribution prediction module 103, distribution estimation module 102 the output result and conflict monitoring terminal 203. The uploading data of the abnormal monitoring terminal 204 evaluates the possibility of the comfort level of congestion and trampling on the region of interest and the occurrence of congestion and trample.[Pg 6, P-8]
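Huang does not name the machine learning algorithm used by distribution prediction module 103, stating only that the model is built from the history of distribution estimation results and that its parameters continue to be learned from data uploaded by monitoring terminal 2. As a stand-in chosen for this sketch only, a one-step autoregressive predictor fitted by least squares, kept current with running sums so that each new upload updates the fit, illustrates that pattern. The class name `OnlineAR1` and the history values are hypothetical.

```python
class OnlineAR1:
    """Predicts the next-period value as a * current + b; the least-squares fit is kept via running sums."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, current, nxt):
        """Add one (value at t, value at t + 1) pair, e.g., from newly uploaded monitoring data."""
        self.n += 1
        self.sx += current
        self.sy += nxt
        self.sxx += current * current
        self.sxy += current * nxt

    def coefficients(self):
        """Closed-form least-squares slope and intercept from the running sums."""
        denom = self.n * self.sxx - self.sx * self.sx
        if denom == 0:
            raise ValueError("need at least two distinct input values")
        a = (self.n * self.sxy - self.sx * self.sy) / denom
        b = (self.sy - a * self.sx) / self.n
        return a, b

    def predict(self, current):
        a, b = self.coefficients()
        return a * current + b


history = [0.0, 1.0, 1.5, 1.75]  # hypothetical past density estimates (persons per square meter)
model = OnlineAR1()
for prev, nxt in zip(history, history[1:]):
    model.update(prev, nxt)
model.update(1.75, 1.875)  # continued learning from a newly uploaded observation
```

Maintaining running sums rather than refitting from the full history keeps each update constant-time, which suits the continuous learning from uploaded data that Huang describes.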

Here, Huang teaches acquiring data from one or more sensors at a specific area of interest (station) and using a module corresponding to the first model to predict, from the sensor data, congestion information relating to the one or more sensors that detect the congestion. Furthermore, Huang uses modules for capturing and evaluating congestion data, corresponding to the second prediction model/deep learning model, that analyze time-series data in more depth to predict, more accurately, the parameters affecting future traffic congestion. When Huang is taken in combination with Guo’s teaching of the learning server regenerating a traffic model (a second model) after previously generated model(s), using previous traffic-related data together with real-time traffic-related data, one of ordinary skill in the art may then be able to determine the future congestion level of any of the specific (m) areas of the various stations and thereafter display, by the processor, the predicted congestion-related information to users at the station.
It would have been obvious to one of ordinary skill in the art to combine Guo’s teaching with Huang’s teaching in order to obtain a more accurate prediction model for determining congestion/traffic activities and behaviors within a given area.
Regarding claim 3, Guo teaches a learning device comprising: a processor to execute a program; and a memory to store past congestion-area data indicating congestion-related information including a past congestion level of each of m areas (where m is an integer of two or more), past sensor data indicating values detected in the past by one or more sensors installed in n areas (where n is an integer of one or more and n<m) out of the m areas (Paragraphs 9, 11, 17, 60, and 63).
It is one object of some embodiments to provide a distributed machine learning based vehicular traffic prediction method that can accurately predict short time, middle time and long time traffic for both city roads and highways. Additionally, it is another object of some embodiments to make optimal route planning for vehicles to minimize traffic congestion and travel time by using the accurate traffic prediction.[P-9]
A learning server coordinates distributed learning among a set of data collectors. A learning server first designs and distributes the traffic models such as neural networks to the set of data collectors for the first round of the distributed training Each data collector then trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. After certain iterations of training process, each data collector sends the trained traffic models to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models. Upon completion of the model aggregation, the learning server re-distributes the aggregated traffic models to data collectors for the second round of training process. This process of training and aggregation continues until the robust traffic models are built.[P-11]
According to some embodiments of the present invention, a distributed machine learning based traffic prediction method can be provided for predicting traffic of roads. In this case, the distributed machine learning based traffic prediction method may a computer-implemented distributed machine learning based traffic prediction method for predicting traffic of roads. The method may include distributing global multi-task traffic models by a learning server to learning agents to locally train the traffic models; uploading locally trained global multi-task traffic models to the learning server, wherein the locally trained global multi-task traffic models have been trained by the learning agents; updating the global multi-task traffic models by the learning server using the locally trained global multi-task traffic models uploaded from the learning agents; generating a time-dependent global traffic map by the learning server using the updated global multi-task traffic models; and distributing the time-dependent global traffic map to each of vehicles traveling on the roads.[P-17]
 FIG. 3A illustrates the traffic prediction architecture 300 using distributed multi-task learning techniques according to some embodiments of the current invention, where an infrastructure device acts as learning server 115 to coordinate a set of data collectors 110, which serve as learning agents. In the FIG. 3A, there are N data collectors, data collector 1 is a toll station, data collector 2 is a loop detector, and data collector N is a camera. The learning server 115 designs M traffic models 310 such that each traffic model is targeted for a specific task, e.g., a model for rush hour traffic and another traffic model for off hour traffic, a model for city traffic and another model for freeway traffic. Therefore, these traffic models are different but correlated because traffic at one location can impact on traffic at other locations and traffic at one can impact on traffic at another time. Accordingly, the learning server 115 defines data clustering rules 320 to divide data into M clusters 330. The learning server 115 then distributes the rules 320 and traffic models 310 to data collectors 110. Correspondingly, each data collector 110 divides its data into M clusters 330. It is possible that some of data clusters are empty. In that case, the data collector will not train corresponding traffic model. In the FIG. 3A, data collector 1 has all data clusters, data collector 2 does not have cluster-M data and, and data collector N does not have cluster-1 data. As a result, data collector 1 will train all traffic models, data collector 2 does not train traffic model M, and data collector N does not train traffic model 1. After clustering its data, each data collector 110 starts the first round of the distributed training. Each data collector 110 trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. 
After certain iterations of training, each data collector sends the locally trained traffic models 340 to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models 350. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. For route planning application, the traffic speed is predicted. Finally, data collectors send their local traffic predictions 360 to the learning server to build a global traffic prediction model, i.e., time-dependent graph, that is then distributed to vehicles 120 for route planning[P-60]
The learning server 115 may be referred to as a distributed machine learning based traffic predication server. The learning server 115 is configured to provide map information with respect to traffic predictions to vehicles 120 traveling on a road network. The learning server 115 may include one or more processors 121, a memory 140, a memory unit/storage 200 configured to store traffic prediction neural networks 132, traffic data cluster rules 134, global time-dependent map 135, trained multi-task traffic models 136, global map of road network 137, an input interface (or transceiver) 150 configured to communicate with the learning agent 110 via the communication network 112 and update model parameters of the traffic prediction neural networks 132. The trained multi-task traffic models 136 may be the traffic models 310 shown in FIG. 3A.[P-63]
Guo establishes M clusters representing different traffic models attributed to different traffic areas. For example, Guo illustrates that one area may be freeway traffic, which would be labeled Cluster 1, and another area may be city traffic (Cluster 2), thereby satisfying the condition that M is two or more. Furthermore, Guo teaches data collectors/sensors, designated data collectors 1 through N, each of which may be a toll station, a loop detector, or a camera. Accordingly, depending on the area for which the traffic/cluster model is generated, a desired number of sensors/detectors may be set such that the ratio of data collectors to cluster models satisfies N<M.
Furthermore, Guo teaches the program which, when executed by the processor, performs processes of generating a first model by using the congestion-related information indicated by the past congestion-area data as input data and using the values indicated by the past sensor data as correct data (Paragraph 48).
To reap all the aforementioned benefits, the traffic prediction must process the real-time and historical traffic data and observations collected by data collectors and mobile devices. For example, the inductive loop can measure the travel speed by reading the inductance changes over time and such data can be used for the traffic speed prediction. In addition, the wide use of mobile devices (e.g., on-board global position systems and phones) enables the mobility data to be crowdsourced from the general public, further facilitating the unprecedented traffic data collection. Such emerging big data can substantially augment the data availability in terms of the coverage and fidelity and significantly boost the data-driven traffic prediction. The prior art on the traffic prediction can be mainly grouped into two categories. The first category focus on using parametric approaches, such as autoregressive integrated moving average (ARIMA) model and Kalman filtering models. When dealing with the traffic only presenting regular variations, e.g., recurrent traffic congestion occurred in morning and evening rush hour, the parametric approaches can achieve promising prediction results. However, due to the stochastic and nonlinear nature of the road traffic, the traffic predictions of using the parametric approaches can deviate from the actual values especially in the abrupt traffic. Hence, instead of fitting the traffic data into a mathematical model as done by the parametric approach, an alternative way is using the nonparametric approaches where the machine learning (ML) based method is used. For example, a stacked autoencoder model can be used to learn the generic traffic flow features for the predictions. The long short-term memory (LSTM) recurrent neural network (RNN) can be used to predict the traffic flow, speed and occupancy, based on the data collected by the data collector and its upstream and downstream data collectors. 
Along with the use of RNN, the convolution neural network (CNN) can also be utilized to capture the latent traffic evolution patterns within the underlying road network.[P-48]

Here, we see the integration of real-time traffic data with past sensed traffic data and congestion-related traffic information, which are aggregated to enable the prediction of future traffic patterns.
Guo teaches the first model being a learning model for predicting, from the congestion-related information, values to be detected by the one or more sensors at a time point at which the congestion-related information is acquired; and using the first model to predict, from the congestion-related information at a first time point indicated by the past congestion-area data, values to be detected by the one or more sensors at the first time point (Paragraph 60; Figure 3A).
 FIG. 3A illustrates the traffic prediction architecture 300 using distributed multi-task learning techniques according to some embodiments of the current invention, where an infrastructure device acts as learning server 115 to coordinate a set of data collectors 110, which serve as learning agents. In the FIG. 3A, there are N data collectors, data collector 1 is a toll station, data collector 2 is a loop detector, and data collector N is a camera. The learning server 115 designs M traffic models 310 such that each traffic model is targeted for a specific task, e.g., a model for rush hour traffic and another traffic model for off hour traffic, a model for city traffic and another model for freeway traffic. Therefore, these traffic models are different but correlated because traffic at one location can impact on traffic at other locations and traffic at one can impact on traffic at another time. Accordingly, the learning server 115 defines data clustering rules 320 to divide data into M clusters 330. The learning server 115 then distributes the rules 320 and traffic models 310 to data collectors 110. Correspondingly, each data collector 110 divides its data into M clusters 330. It is possible that some of data clusters are empty. In that case, the data collector will not train corresponding traffic model. In the FIG. 3A, data collector 1 has all data clusters, data collector 2 does not have cluster-M data and, and data collector N does not have cluster-1 data. As a result, data collector 1 will train all traffic models, data collector 2 does not train traffic model M, and data collector N does not train traffic model 1. After clustering its data, each data collector 110 starts the first round of the distributed training. Each data collector 110 trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. 
After certain iterations of training, each data collector sends the locally trained traffic models 340 to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models 350. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. For route planning application, the traffic speed is predicted. Finally, data collectors send their local traffic predictions 360 to the learning server to build a global traffic prediction model, i.e., time-dependent graph, that is then distributed to vehicles 120 for route planning[P-60]
Here, P-60 and Fig. 3A illustrate and explain that several different traffic models are generated, where one model (the first model) is a learning model for predicting, from values detected by the one or more sensors (toll station), congestion-related information of a time point (rush hour) at which the one or more sensors perform detection.
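For illustration only, the round-based train-and-aggregate loop described in P-60 can be sketched in Python as below. This is not Guo's implementation: the sketch assumes each of the M traffic models is a single scalar parameter (e.g., a mean speed), that local training is a few gradient steps on squared error, and that the learning server aggregates by a sample-weighted average (FedAvg-style), which Guo does not specify. All names and data are hypothetical. As in FIG. 3A, a data collector whose cluster is empty does not train the corresponding model.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: collector name -> cluster index -> observed speeds.
CollectorData = Dict[str, Dict[int, List[float]]]


def local_train(w: float, samples: List[float], lr: float = 0.5, steps: int = 5) -> float:
    """Toy local training: gradient descent on 0.5 * mean((w - x)^2)."""
    target = sum(samples) / len(samples)
    for _ in range(steps):
        w -= lr * (w - target)  # the gradient of the loss is (w - mean(x))
    return w


def federated_rounds(collectors: CollectorData, num_clusters: int, rounds: int = 10) -> List[float]:
    """Distribute M models, train locally on non-empty clusters, aggregate, repeat."""
    global_models = [0.0] * num_clusters
    for _ in range(rounds):
        for m in range(num_clusters):
            updates: List[Tuple[float, int]] = []  # (locally trained model, sample count)
            for data in collectors.values():
                samples = data.get(m, [])
                if not samples:  # empty cluster: this collector does not train model m
                    continue
                updates.append((local_train(global_models[m], samples), len(samples)))
            if updates:
                # Sample-weighted average (an assumption; Guo does not state the rule).
                total = sum(n for _, n in updates)
                global_models[m] = sum(w * n for w, n in updates) / total
    return global_models


collectors = {
    "toll_station": {0: [30.0, 34.0], 1: [80.0]},
    "loop_detector": {0: [26.0]},  # no cluster-1 data
    "camera": {1: [90.0, 94.0]},   # no cluster-0 data
}
models = federated_rounds(collectors, num_clusters=2)
print([round(w, 3) for w in models])  # [30.0, 88.0]
```

Each global model converges to the sample-weighted mean of the data held for its cluster across all collectors, even though no collector shares its raw data with the others or with the server.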
Guo does not verbatim teach generating a second model by using the predicted values as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from values detected by the one or more sensors, the second time point being a time point after the first time point.
However, P-60 teaches multiple traffic models where, for example, one model (the first model) is a learning model for predicting, from values detected by the one or more sensors (toll station), congestion-related information of a time point (rush hour) at which the one or more sensors perform detection. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. [P-60]
Hence, the next round of learning entails generating a second model by using the predicted values (from the previously aggregated traffic model used to predict traffic congestion) as input data and using the congestion level in the congestion-related information of a second time point (a different time point in rush hour) indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from values detected by one or more sensors, the second time point being a time point after the first time point (a later time in rush hour than the time used in the preceding learning model).
Therefore, although Guo’s teaching does not verbatim recite generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data as input data, Guo teaches the learning server regenerating a traffic model (a second model) after previously generated model(s), using previous traffic-related data together with real-time traffic-related data. Accordingly, it would have been obvious to one of ordinary skill in the art that Guo’s teaching reads on the applicant’s second generated model, since substituting Guo’s teaching for the applicant’s teaching yields the same results, in order to enable a more efficient and robust traffic prediction model.
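As an illustration of how the training data for the second model are arranged under the reading above, the following Python sketch pairs the first model's predicted values at a first time point t1 (input data) with the recorded congestion level at a later second time point t2 = t1 + horizon (correct data). The dictionaries, the fixed horizon, and the toy first model are assumptions made for illustration; neither Guo nor the claims specify them.

```python
from typing import Callable, Dict, List, Sequence, Tuple

FirstModel = Callable[[Sequence[float]], List[float]]


def build_second_model_pairs(
    congestion_info: Dict[int, Sequence[float]],  # time -> past congestion-related information
    congestion_level: Dict[int, float],           # time -> past congestion level of the target area
    first_model: FirstModel,                      # predicts sensor values from congestion info
    horizon: int,                                 # t2 - t1, must be positive
) -> List[Tuple[List[float], float]]:
    """Return (input data, correct data) pairs for training the second model."""
    if horizon <= 0:
        raise ValueError("the second time point must come after the first")
    pairs = []
    for t1 in sorted(congestion_info):
        t2 = t1 + horizon
        if t2 not in congestion_level:  # no recorded outcome for the later time point
            continue
        predicted_values = first_model(congestion_info[t1])
        pairs.append((predicted_values, congestion_level[t2]))
    return pairs


def toy_first_model(info: Sequence[float]) -> List[float]:
    """Hypothetical stand-in: scale congestion info into sensor-value units."""
    return [2.0 * x for x in info]


pairs = build_second_model_pairs(
    congestion_info={0: [1.0, 0.5], 1: [2.0, 1.0], 2: [3.0, 1.5]},
    congestion_level={0: 0.1, 1: 0.4, 2: 0.9},
    first_model=toy_first_model,
    horizon=1,
)
print(pairs)  # [([2.0, 1.0], 0.4), ([4.0, 2.0], 0.9)]
```

The last time point is dropped because no congestion level is recorded one step after it, so it cannot supply correct data.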
Guo, however, fails to teach acquiring, by the processor, present values detected by the one or more sensors of the station; using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection; using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level of any of the m areas; and displaying, by the processor, to users at the station the predicted congestion-related information.
In contrast, Huang teaches acquiring, by the processor, present values detected by the one or more sensors of the station; and using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection (Page 3, Paragraph 2; Page 5, Paragraphs 5-10)
Preferably, the monitoring terminal comprises a traffic monitoring terminal for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal for counting the pedestrian density at a certain time in a certain time, and a conflict monitoring terminal for counting the pedestrian conflict degree in a certain area and an abnormal monitoring terminal for identifying the pedestrian abnormal behaviour in a certain area.[Pg 3, P-2]
Further, the monitoring terminal 2 comprises a traffic monitoring terminal 201 for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal 202 for counting a certain area pedestrian density at a certain time, a conflict monitoring terminal 203 for counting pedestrian conflict degree in a certain area and an abnormal monitoring terminal 204 for identifying pedestrian abnormal behavior in a certain area, each terminal is composed of a sensor, an embedded processing chip, a processing program and a communication module.[Pg 5, P-5]
the sensor of the flow monitoring terminal 201 is a binocular video camera embedded processing chip and flow statistics method according to the video data obtained by the sensor, counting the pedestrian number in and out of the specified area and calculating the flow rate (unit: The cumulative flow rate (unit: 1 %) and the cumulative flow rate (unit: human), then the traffic data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-6]
the sensor of the density monitoring terminal 202 is a wide-angle video camera embedded processing chip and density statistical method according to the video data obtained by the sensor, counting the pedestrian distribution of the specified area and calculating the density (unit of each point after discrete area of the area: the density data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-7]
the sensor of the conflict monitoring terminal 203 is high video camera, embedded processing chip and conflict calculation algorithm according to the video data obtained by the sensor, counting the appointed area pedestrian speed direction and size change rate, relative direction pedestrian number, local density and other parameters, and according to the direction and size change rate of the pedestrian speed in the specified area, relative direction pedestrian number, local density parameter calculating conflict coefficient (dimensionless parameter), then the conflict coefficient is uploaded to the management control terminal 1 through the communication module[Pg 5, P-8]
the sensor of the abnormal monitoring terminal 204 is high video camera, embedded processing chip and the abnormal detection algorithm according to the video data obtained by the sensor, detecting whether there is abnormal behaviour in the specified area, such as a fight, terrorist attack, fast escape, crowding, then the type code of the abnormal behaviour is uploaded to the management control terminal 1 through the communication module. [Pg 5, P-9]
Further, the guide terminal 3 comprises an acoustic guide terminal 301 for conveying sound instruction, for area division, route guidance and information display of the optical guide terminal 302, for issuing the mobile phone message of the information issuing terminal 304. The area separation terminal 303 for the dynamic adjustment of the pedestrian and vehicular traffic area and the guide terminal 305 for order maintenance and manual assistance 305. [Pg 5, P-10]
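The two counting steps quoted above (flow monitoring terminal 201 counting pedestrians in and out of a specified area, and density monitoring terminal 202 computing a density at each point after the area is discretized) can be illustrated with the following Python sketch. It assumes the video processing has already produced crossing events and pedestrian positions in metres; the event labels, grid size, and function names are hypothetical and are not taken from Huang.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def in_out_counts(crossings: Iterable[str]) -> Tuple[int, int]:
    """Count boundary crossings labelled 'in' or 'out' for one monitored area."""
    counts = Counter(crossings)
    return counts["in"], counts["out"]


def grid_density(
    positions: Iterable[Tuple[float, float]], width: float, height: float, cell: float
) -> List[List[float]]:
    """Pedestrians per square metre in each cell of a width x height area."""
    if width % cell or height % cell:
        raise ValueError("width and height must be whole multiples of the cell size")
    cols, rows = int(width // cell), int(height // cell)
    counts = [[0] * cols for _ in range(rows)]
    for x, y in positions:
        if 0 <= x < width and 0 <= y < height:  # ignore detections outside the area
            counts[int(y // cell)][int(x // cell)] += 1
    return [[n / (cell * cell) for n in row] for row in counts]


print(in_out_counts(["in", "in", "out", "in"]))  # (3, 1)
print(grid_density([(0.5, 0.5), (1.0, 0.2), (2.5, 1.5)], width=4, height=2, cell=2))
# [[0.5, 0.25]]
```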

Huang further teaches using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level (Page 3, Paragraphs 3, 4, 9; Page 3, Paragraph 9-Page 4, Paragraph 1; Page 6, Paragraphs 2, 3, 7, 8)
Preferably, the management control terminal comprises a region modeling module, a distribution estimation module connected with the region modeling module signal, a distribution prediction module connected with the distribution estimation module signal, and a risk evaluation module connected with the distribution prediction module and the distribution estimation module signal, a simulation and optimization module connected with the risk evaluation module signal and a guide control module connected with the simulation and optimization module signal. [Pg 3, P-3]
Preferably, the simulation and optimization module comprises a personnel evacuation model and personnel flow model, when the risk assessment result of the risk assessment module is low comfort level personnel flow module, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of simulating pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and flow inducing scheme [Pg 3, P-4]
the distribution prediction module according to the history data of the distribution estimation result, establishing model based on machine learning algorithm to predict the future period of pedestrian distribution parameter, and uploading data continuous learning model parameter according to the future monitoring terminal.[Pg 3, P-9]
Preferably, the risk evaluation module signal is connected with a conflict monitoring terminal and an abnormal monitoring terminal, and according to the distribution prediction module, the output result of the distribution estimation module and conflict monitoring terminal, the uploading data of the abnormal monitoring terminal evaluates the possibility of the danger of congestion and trampling on the comfort level region. [Pg 3, P-9-Pg 4, P-1]
Further, the management control terminal 1 comprises a region modeling module 101, and the region modeling module 101 signal connection of the distribution estimation module 102, and the distribution prediction module 102 signal connection distribution prediction module 103. a risk evaluation module 104 connected with the distribution prediction module 103 and the distribution estimation module 102, a simulation and optimization module 105 connected with the risk evaluation module 104 signal and a guide control module 106 connected with the simulation and optimization module 105 signal.[Pg 6, P-2]
Further, the simulation and optimization module 105 comprises a personnel evacuation model and personnel flow model, when the risk evaluation result of the risk evaluation module 104 is low comfort level personnel flow module, management control terminal 1 sends attention area low comfort level alarm, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of the simulated pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and a flow inducing scheme, after the management staff 4 of the auditing by the guide control module 106 sends the inducing information to the guide terminal 3;[Pg 6, P-3]
the distribution prediction module 103 according to the history data of the distribution estimation result, based on the machine learning algorithm establishing model prediction future period of pedestrian distribution parameter, and according to the future monitoring terminal 2 uploading data continuous learning model parameter.[Pg 6, P-7]
Further, the risk evaluation module 104 signal connected with the conflict monitoring terminal 203 and the abnormal monitoring terminal 204, and according to the distribution prediction module 103, distribution estimation module 102 the output result and conflict monitoring terminal 203. The uploading data of the abnormal monitoring terminal 204 evaluates the possibility of the comfort level of congestion and trampling on the region of interest and the occurrence of congestion and trample.[Pg 6, P-8]
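The distribution prediction module 103 (a learned model for future pedestrian distribution whose parameters keep being updated from newly uploaded terminal data) and the risk evaluation module 104 can be sketched in Python as follows. Huang does not name its machine learning algorithm; this sketch substitutes an online least-mean-squares (LMS) linear predictor over the last few density readings, and the alarm thresholds in `low_comfort_alarm` are hypothetical values chosen only for illustration.

```python
from typing import List, Optional


class OnlineDensityPredictor:
    """Predicts next-period density from the last `lags` readings; LMS-updated."""

    def __init__(self, lags: int = 2, lr: float = 0.01) -> None:
        self.lags = lags
        self.lr = lr
        self.w = [0.0] * lags
        self.b = 0.0
        self.history: List[float] = []

    def predict(self) -> float:
        if len(self.history) < self.lags:  # not enough data yet: persist the last value
            return self.history[-1] if self.history else 0.0
        x = self.history[-self.lags:]
        return self.b + sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, observed: float) -> None:
        """Continuous learning: correct the weights using the newly uploaded reading."""
        if len(self.history) >= self.lags:
            x = self.history[-self.lags:]
            err = observed - self.predict()
            self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]
            self.b += self.lr * err
        self.history.append(observed)


def low_comfort_alarm(
    predicted_density: float,
    conflict_coefficient: float,
    abnormal_code: Optional[str],
    density_limit: float = 2.0,   # persons per square metre (hypothetical)
    conflict_limit: float = 0.7,  # dimensionless (hypothetical)
) -> bool:
    """Flag congestion/trampling risk from prediction, conflict and abnormal data."""
    return (
        predicted_density > density_limit
        or conflict_coefficient > conflict_limit
        or abnormal_code is not None
    )


model = OnlineDensityPredictor()
for _ in range(300):
    model.update(1.5)  # a steady stream of identical density readings
print(round(model.predict(), 3))  # 1.5
print(low_comfort_alarm(model.predict(), 0.3, None))  # False
```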


Here, we see Huang teaching data from one or more sensors at a specific area of interest (station), using a module synonymous with the first model to predict, from the sensor data, congestion information relating to the one or more sensors that detect the congestion. Furthermore, Huang uses a module for capturing and evaluating congestion data, synonymous with the second prediction model/deep learning model, to analyze time-series data in more depth and to predict more accurately the parameters affecting future traffic congestion. When this is taken in combination with Guo’s teaching of the learning server regenerating a traffic model (a second model) after previously generated model(s), using previous traffic-related data together with real-time traffic-related data, one of ordinary skill in the art may then be able to determine the future congestion level of any of the specific (m) areas of the various stations and thereafter display, by the processor, to users at the station the predicted congestion-related information.
It would have been obvious to one of ordinary skill in the art to combine Guo’s teaching with Huang’s teaching in order to have a more accurate prediction model to determine congestion/traffic activities and behaviors within a given area.
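For illustration, the prediction flow for which the combination is relied upon (present sensor values, then the first model, then the second model, then display to users at the station) can be sketched in Python as below. Both models are hypothetical stand-ins passed in as functions, and `print` stands in for a station display; the sketch only shows how the output of the first model becomes the input of the second.

```python
from typing import Callable, Dict, List, Sequence

FirstModel = Callable[[Sequence[float]], List[float]]
SecondModel = Callable[[Sequence[float]], Dict[str, float]]


def predict_and_display(
    present_values: Sequence[float], first_model: FirstModel, second_model: SecondModel
) -> List[str]:
    """Chain the two models and render the future congestion level of each area."""
    current_info = first_model(present_values)  # congestion info at the detection time
    future_levels = second_model(current_info)  # future congestion level per area
    lines = [
        f"{area}: predicted congestion {level:.0%}"
        for area, level in sorted(future_levels.items())
    ]
    for line in lines:
        print(line)  # stand-in for a display at the station
    return lines


def toy_first_model(values: Sequence[float]) -> List[float]:
    """Hypothetical: sensor head counts -> occupancy ratio per sensed area."""
    return [v / 100.0 for v in values]


def toy_second_model(info: Sequence[float]) -> Dict[str, float]:
    """Hypothetical: assume congestion rises 20% by the next period, capped at 100%."""
    return {f"area_{i}": min(1.0, 1.2 * x) for i, x in enumerate(info)}


shown = predict_and_display([50, 75], toy_first_model, toy_second_model)
# area_0: predicted congestion 60%
# area_1: predicted congestion 90%
```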

With regard to claim 5, Guo teaches a prediction device comprising: a processor to execute a program; and a memory to store the program which, when executed by the processor, performs processes of acquiring values detected by one or more sensors installed in n areas (where n is an integer of one or more, and n<m) out of m areas (where m is an integer of two or more) (Paragraphs 9, 11, 17, 60, 63)
It is one object of some embodiments to provide a distributed machine learning based vehicular traffic prediction method that can accurately predict short time, middle time and long time traffic for both city roads and highways. Additionally, it is another object of some embodiments to make optimal route planning for vehicles to minimize traffic congestion and travel time by using the accurate traffic prediction.[P-9]
A learning server coordinates distributed learning among a set of data collectors. A learning server first designs and distributes the traffic models such as neural networks to the set of data collectors for the first round of the distributed training Each data collector then trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. After certain iterations of training process, each data collector sends the trained traffic models to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models. Upon completion of the model aggregation, the learning server re-distributes the aggregated traffic models to data collectors for the second round of training process. This process of training and aggregation continues until the robust traffic models are built.[P-11]
According to some embodiments of the present invention, a distributed machine learning based traffic prediction method can be provided for predicting traffic of roads. In this case, the distributed machine learning based traffic prediction method may a computer-implemented distributed machine learning based traffic prediction method for predicting traffic of roads. The method may include distributing global multi-task traffic models by a learning server to learning agents to locally train the traffic models; uploading locally trained global multi-task traffic models to the learning server, wherein the locally trained global multi-task traffic models have been trained by the learning agents; updating the global multi-task traffic models by the learning server using the locally trained global multi-task traffic models uploaded from the learning agents; generating a time-dependent global traffic map by the learning server using the updated global multi-task traffic models; and distributing the time-dependent global traffic map to each of vehicles traveling on the roads.[P-17]
 FIG. 3A illustrates the traffic prediction architecture 300 using distributed multi-task learning techniques according to some embodiments of the current invention, where an infrastructure device acts as learning server 115 to coordinate a set of data collectors 110, which serve as learning agents. In the FIG. 3A, there are N data collectors, data collector 1 is a toll station, data collector 2 is a loop detector, and data collector N is a camera. The learning server 115 designs M traffic models 310 such that each traffic model is targeted for a specific task, e.g., a model for rush hour traffic and another traffic model for off hour traffic, a model for city traffic and another model for freeway traffic. Therefore, these traffic models are different but correlated because traffic at one location can impact on traffic at other locations and traffic at one can impact on traffic at another time. Accordingly, the learning server 115 defines data clustering rules 320 to divide data into M clusters 330. The learning server 115 then distributes the rules 320 and traffic models 310 to data collectors 110. Correspondingly, each data collector 110 divides its data into M clusters 330. It is possible that some of data clusters are empty. In that case, the data collector will not train corresponding traffic model. In the FIG. 3A, data collector 1 has all data clusters, data collector 2 does not have cluster-M data and, and data collector N does not have cluster-1 data. As a result, data collector 1 will train all traffic models, data collector 2 does not train traffic model M, and data collector N does not train traffic model 1. After clustering its data, each data collector 110 starts the first round of the distributed training. Each data collector 110 trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. 
After certain iterations of training, each data collector sends the locally trained traffic models 340 to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models 350. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. For route planning application, the traffic speed is predicted. Finally, data collectors send their local traffic predictions 360 to the learning server to build a global traffic prediction model, i.e., time-dependent graph, that is then distributed to vehicles 120 for route planning[P-60]
The learning server 115 may be referred to as a distributed machine learning based traffic predication server. The learning server 115 is configured to provide map information with respect to traffic predictions to vehicles 120 traveling on a road network. The learning server 115 may include one or more processors 121, a memory 140, a memory unit/storage 200 configured to store traffic prediction neural networks 132, traffic data cluster rules 134, global time-dependent map 135, trained multi-task traffic models 136, global map of road network 137, an input interface (or transceiver) 150 configured to communicate with the learning agent 110 via the communication network 112 and update model parameters of the traffic prediction neural networks 132. The trained multi-task traffic models 136 may be the traffic models 310 shown in FIG. 3A.[P-63]
Guo establishes M clusters to represent different traffic models attributed to different traffic areas, where Guo illustrates that one area may be freeway traffic, which would be labeled Cluster 1, and another area may be city traffic (Cluster 2), thereby satisfying the condition of M being two or more. Furthermore, Guo teaches data collectors/sensors, established as N data collectors. The data collector(s) may be a toll station, a loop detector, or a camera. In that case, depending on the area for which the traffic/cluster model is generated, a desired number of sensors/detectors may be set such that the desired ratio N<M (data collectors to cluster models) is met.
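To make the mapping concrete, the following Python sketch shows one way data clustering rules such as those in P-60 could divide a collector's records into M = 4 clusters (rush hour or off hour, crossed with city or freeway) and determine which traffic models that collector trains. The rush-hour window, record fields, and function names are hypothetical; Guo does not specify the rules.

```python
from typing import Dict, Iterable, List, NamedTuple


class Record(NamedTuple):
    hour: int        # hour of day, 0-23
    road_type: str   # "city" or "freeway"
    speed: float     # observed speed


RUSH_HOURS = set(range(7, 10)) | set(range(17, 20))  # hypothetical rush-hour window
ROAD_TYPES = ("city", "freeway")
NUM_CLUSTERS = 2 * len(ROAD_TYPES)  # M = 4


def cluster_id(record: Record) -> int:
    """Clustering rule: 0-1 = rush hour (city, freeway), 2-3 = off hour."""
    period = 0 if record.hour in RUSH_HOURS else 1
    return period * len(ROAD_TYPES) + ROAD_TYPES.index(record.road_type)


def split_into_clusters(records: Iterable[Record]) -> Dict[int, List[Record]]:
    clusters: Dict[int, List[Record]] = {m: [] for m in range(NUM_CLUSTERS)}
    for record in records:
        clusters[cluster_id(record)].append(record)
    return clusters


def models_to_train(clusters: Dict[int, List[Record]]) -> List[int]:
    """A collector trains only the models whose cluster holds data."""
    return [m for m, records in clusters.items() if records]


toll_station = [Record(8, "freeway", 40.0), Record(13, "city", 35.0), Record(18, "freeway", 38.0)]
print(models_to_train(split_into_clusters(toll_station)))  # [1, 2]
```

With three collectors (toll station, loop detector, camera) and four clusters in this example, N = 3 < M = 4.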
Guo also teaches using a first model to predict, from the acquired values, congestion-related information of a time point at which the one or more sensors perform detection (Paragraph 48)
To reap all the aforementioned benefits, the traffic prediction must process the real-time and historical traffic data and observations collected by data collectors and mobile devices. For example, the inductive loop can measure the travel speed by reading the inductance changes over time and such data can be used for the traffic speed prediction. In addition, the wide use of mobile devices (e.g., on-board global position systems and phones) enables the mobility data to be crowdsourced from the general public, further facilitating the unprecedented traffic data collection. Such emerging big data can substantially augment the data availability in terms of the coverage and fidelity and significantly boost the data-driven traffic prediction. The prior art on the traffic prediction can be mainly grouped into two categories. The first category focus on using parametric approaches, such as autoregressive integrated moving average (ARIMA) model and Kalman filtering models. When dealing with the traffic only presenting regular variations, e.g., recurrent traffic congestion occurred in morning and evening rush hour, the parametric approaches can achieve promising prediction results. However, due to the stochastic and nonlinear nature of the road traffic, the traffic predictions of using the parametric approaches can deviate from the actual values especially in the abrupt traffic. Hence, instead of fitting the traffic data into a mathematical model as done by the parametric approach, an alternative way is using the nonparametric approaches where the machine learning (ML) based method is used. For example, a stacked autoencoder model can be used to learn the generic traffic flow features for the predictions. The long short-term memory (LSTM) recurrent neural network (RNN) can be used to predict the traffic flow, speed and occupancy, based on the data collected by the data collector and its upstream and downstream data collectors. 
Along with the use of RNN, the convolution neural network (CNN) can also be utilized to capture the latent traffic evolution patterns within the underlying road network.[P-48]

Here, we see real-time traffic data integrated with past sensed traffic data, with the congestion-related traffic information being aggregated to enable the prediction of future traffic patterns.
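As a point of reference for the parametric approaches mentioned in P-48, the following Python sketch fits a first-order autoregressive model, x[t] = c + phi * x[t-1] (ARIMA(1,0,0), the simplest member of the ARIMA family), to a history of observations by ordinary least squares and forecasts ahead. It is an illustrative stand-in, not Guo's method, and the data are synthetic.

```python
from typing import List, Sequence, Tuple


def fit_ar1(series: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    x, y = series[:-1], series[1:]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("series has no variation to fit")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx
    return mean_y - phi * mean_x, phi


def forecast(series: Sequence[float], steps: int) -> List[float]:
    """Roll the fitted recursion forward from the latest observation."""
    c, phi = fit_ar1(series)
    out, last = [], series[-1]
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out


speeds = [0.0, 10.0, 15.0, 17.5, 18.75]  # synthetic: generated by x[t] = 10 + 0.5 * x[t-1]
c, phi = fit_ar1(speeds)
print(c, phi)               # 10.0 0.5
print(forecast(speeds, 2))  # [19.375, 19.6875]
```

A model of this kind captures regular, recurrent variation well, which is consistent with P-48's observation that parametric approaches can deviate from actual values when traffic changes abruptly.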
Guo then teaches the first model being a learning model for predicting, from values to be detected by the one or more sensors, the congestion-related information of the time point at which the one or more sensors perform detection, the first model being generated by using values indicated by past sensor data indicating the values detected in the past by the one or more sensors installed in n areas (where n is an integer of one or more and n<m) out of the m areas as input data and using the congestion-related information indicated by past congestion-area data indicating the congestion-related information including a past congestion level of each of the m areas as correct data (Paragraph 60; Figure 3A)
 FIG. 3A illustrates the traffic prediction architecture 300 using distributed multi-task learning techniques according to some embodiments of the current invention, where an infrastructure device acts as learning server 115 to coordinate a set of data collectors 110, which serve as learning agents. In the FIG. 3A, there are N data collectors, data collector 1 is a toll station, data collector 2 is a loop detector, and data collector N is a camera. The learning server 115 designs M traffic models 310 such that each traffic model is targeted for a specific task, e.g., a model for rush hour traffic and another traffic model for off hour traffic, a model for city traffic and another model for freeway traffic. Therefore, these traffic models are different but correlated because traffic at one location can impact on traffic at other locations and traffic at one can impact on traffic at another time. Accordingly, the learning server 115 defines data clustering rules 320 to divide data into M clusters 330. The learning server 115 then distributes the rules 320 and traffic models 310 to data collectors 110. Correspondingly, each data collector 110 divides its data into M clusters 330. It is possible that some of data clusters are empty. In that case, the data collector will not train corresponding traffic model. In the FIG. 3A, data collector 1 has all data clusters, data collector 2 does not have cluster-M data and, and data collector N does not have cluster-1 data. As a result, data collector 1 will train all traffic models, data collector 2 does not train traffic model M, and data collector N does not train traffic model 1. After clustering its data, each data collector 110 starts the first round of the distributed training. Each data collector 110 trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. 
After certain iterations of training, each data collector sends the locally trained traffic models 340 to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models 350. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. For route planning application, the traffic speed is predicted. Finally, data collectors send their local traffic predictions 360 to the learning server to build a global traffic prediction model, i.e., time-dependent graph, that is then distributed to vehicles 120 for route planning[P-60]

Here, P-60 and Fig. 3A illustrate and explain that several different traffic models are generated, where one model (the first model) is a learning model for predicting, from values detected by the one or more sensors (toll station), congestion-related information of a time point (rush hour) at which the one or more sensors perform detection.
This is used in conjunction with the established M clusters representative of different traffic models attributed to different traffic areas, where one area may be freeway traffic, which would be labeled Cluster 1, and another area may be city traffic (Cluster 2), thereby satisfying the condition of M being two or more. Furthermore, Guo teaches data collectors/sensors, established as N data collectors. The data collector(s) may be a toll station, a loop detector, or a camera. In that case, depending on the area for which the traffic/cluster model is generated, a desired number of sensors/detectors may be set such that the desired ratio N<M (data collectors to cluster models) is met.
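Under the mapping above, the first model of claim 5 takes sensor values from n sensed areas as input data and is fitted to the past congestion levels of all m areas as correct data, so it can also estimate congestion in areas that have no sensor. The following is a minimal Python sketch, assuming a linear model trained by batch gradient descent on mean squared error (neither Guo nor the claims specify the model form), with synthetic data for n = 1 sensed area and m = 2 areas.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def train_first_model(
    sensor_rows: Sequence[Sequence[float]],      # past sensor data: n values per time point
    congestion_rows: Sequence[Sequence[float]],  # past congestion levels: m values per time point
    lr: float = 0.1,
    epochs: int = 2000,
) -> Tuple[Matrix, List[float]]:
    """Fit y_j ~ b_j + sum_k W[j][k] * x_k for every area j by gradient descent."""
    n, m, count = len(sensor_rows[0]), len(congestion_rows[0]), len(sensor_rows)
    weights = [[0.0] * n for _ in range(m)]
    bias = [0.0] * m
    for _ in range(epochs):
        for j in range(m):
            grad_w, grad_b = [0.0] * n, 0.0
            for x, y in zip(sensor_rows, congestion_rows):
                err = bias[j] + sum(w * xk for w, xk in zip(weights[j], x)) - y[j]
                grad_w = [g + err * xk / count for g, xk in zip(grad_w, x)]
                grad_b += err / count
            weights[j] = [w - lr * g for w, g in zip(weights[j], grad_w)]
            bias[j] -= lr * grad_b
    return weights, bias


def predict_first_model(weights: Matrix, bias: List[float], x: Sequence[float]) -> List[float]:
    """Estimate the congestion level of all m areas from n sensor values."""
    return [b + sum(w * xk for w, xk in zip(row, x)) for row, b in zip(weights, bias)]


# Synthetic data: area A has a sensor, area B does not (B = 0.5 * A + 0.2).
sensors = [[0.0], [0.5], [1.0]]
congestion = [[0.0, 0.2], [0.5, 0.45], [1.0, 0.7]]
W, b = train_first_model(sensors, congestion)
print([round(v, 3) for v in predict_first_model(W, b, [0.8])])  # [0.8, 0.6]
```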
Guo does not verbatim teach using a second model to predict, from the predicted congestion-related information, a future congestion level of any of the m areas, the second model being a learning model for predicting a future congestion level from the congestion-related information, the second model being generated by using the congestion-related information of a first time point indicated by the past congestion-area data as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second time point being a time point after the first time point.
However, P-60 teaches multiple traffic models where, for example, one model (the first model) is a learning model for predicting, from values detected by the one or more sensors (toll station), congestion-related information of a time point (rush hour) at which the one or more sensors perform detection. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. [P-60]
Hence, the next round of learning entails generating a second model by using the predicted values (from the previously aggregated traffic model used to predict traffic congestion) as input data and using the congestion level in the congestion-related information of a second time point (a different time point in rush hour) indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from values detected by one or more sensors, the second time point being a time point after the first time point (a later time in rush hour than the time used in the preceding learning model).
Therefore, although Guo’s teaching does not verbatim recite generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data as input data, Guo teaches the learning server regenerating a traffic model (a second model) after previously generated model(s), using previous traffic-related data together with real-time traffic-related data. Accordingly, it would have been obvious to one of ordinary skill in the art that Guo’s teaching reads on the applicant’s second generated model, since substituting Guo’s teaching for the applicant’s teaching yields the same results, in order to enable a more efficient and robust traffic prediction model.
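For claim 5, the second model's training data pair the congestion-related information of a first time point with the congestion level of a target area at a later second time point, both taken from the past congestion-area data. The Python sketch below builds those pairs and, purely for illustration, uses a 1-nearest-neighbour lookup as the learning model; the model choice, fixed horizon, and data are assumptions, not anything disclosed by Guo or recited in the claims.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[List[float], float]


def second_model_pairs(history: Dict[int, Sequence[float]], area: int, horizon: int) -> List[Pair]:
    """history maps time -> congestion levels of all m areas (past congestion-area data)."""
    return [
        (list(history[t1]), history[t1 + horizon][area])  # (input at t1, correct level at t2)
        for t1 in sorted(history)
        if t1 + horizon in history
    ]


def nearest_neighbour_predict(pairs: List[Pair], current: Sequence[float]) -> float:
    """Return the later congestion level that followed the most similar past state."""

    def sq_dist(pair: Pair) -> float:
        return sum((a - b) ** 2 for a, b in zip(pair[0], current))

    return min(pairs, key=sq_dist)[1]


history = {0: [0.2, 0.1], 1: [0.5, 0.3], 2: [0.9, 0.6], 3: [0.4, 0.2]}
pairs = second_model_pairs(history, area=1, horizon=1)
print(nearest_neighbour_predict(pairs, [0.45, 0.25]))  # 0.6
```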
Guo, however, fails to teach acquiring, by the processor, present values detected by the one or more sensors of the station; using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection; using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level of any of the m areas; and displaying, by the processor, to users at the station the predicted congestion-related information.
Whereas, Huang teaches acquiring, by the processor, present values detected by the one or more sensors of the station: using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection, (Page 3, Paragraph 2; Page 5, Paragraphs 5-10, )
Preferably, the monitoring terminal comprises a traffic monitoring terminal for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal for counting the pedestrian density at a certain time in a certain time, and a conflict monitoring terminal for counting the pedestrian conflict degree in a certain area and an abnormal monitoring terminal for identifying the pedestrian abnormal behaviour in a certain area.[Pg 3, P-2]
Further, the monitoring terminal 2 comprises a traffic monitoring terminal 201 for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal 202 for counting a certain area pedestrian density at a certain time, a conflict monitoring terminal 203 for counting pedestrian conflict degree in a certain area and an abnormal monitoring terminal 204 for identifying pedestrian abnormal behavior in a certain area, each terminal is composed of a sensor, an embedded processing chip, a processing program and a communication module.[Pg 5, P-5]
the sensor of the flow monitoring terminal 201 is a binocular video camera embedded processing chip and flow statistics method according to the video data obtained by the sensor, counting the pedestrian number in and out of the specified area and calculating the flow rate (unit: The cumulative flow rate (unit: 1 %) and the cumulative flow rate (unit: human), then the traffic data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-6]
the sensor of the density monitoring terminal 202 is a wide-angle video camera embedded processing chip and density statistical method according to the video data obtained by the sensor, counting the pedestrian distribution of the specified area and calculating the density (unit of each point after discrete area of the area: the density data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-7]
the sensor of the conflict monitoring terminal 203 is high video camera, embedded processing chip and conflict calculation algorithm according to the video data obtained by the sensor, counting the appointed area pedestrian speed direction and size change rate, relative direction pedestrian number, local density and other parameters, and according to the direction and size change rate of the pedestrian speed in the specified area, relative direction pedestrian number, local density parameter calculating conflict coefficient (dimensionless parameter), then the conflict coefficient is uploaded to the management control terminal 1 through the communication module[Pg 5, P-8]
the sensor of the abnormal monitoring terminal 204 is high video camera, embedded processing chip and the abnormal detection algorithm according to the video data obtained by the sensor, detecting whether there is abnormal behaviour in the specified area, such as a fight, terrorist attack, fast escape, crowding, then the type code of the abnormal behaviour is uploaded to the management control terminal 1 through the communication module. [Pg 5, P-9]
Further, the guide terminal 3 comprises an acoustic guide terminal 301 for conveying sound instruction, for area division, route guidance and information display of the optical guide terminal 302, for issuing the mobile phone message of the information issuing terminal 304. The area separation terminal 303 for the dynamic adjustment of the pedestrian and vehicular traffic area and the guide terminal 305 for order maintenance and manual assistance 305. [Pg 5, P-10]

Huang further teaches using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level (Page 3, Paragraphs 3, 4, 9; Page 3, Paragraph 9-Page 4, Paragraph 1; Page 6, Paragraphs 2, 3, 7, 8)
Preferably, the management control terminal comprises a region modeling module, a distribution estimation module connected with the region modeling module signal, a distribution prediction module connected with the distribution estimation module signal, and a risk evaluation module connected with the distribution prediction module and the distribution estimation module signal, a simulation and optimization module connected with the risk evaluation module signal and a guide control module connected with the simulation and optimization module signal. [Pg 3, P-3]
Preferably, the simulation and optimization module comprises a personnel evacuation model and personnel flow model, when the risk assessment result of the risk assessment module is low comfort level personnel flow module, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of simulating pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and flow inducing scheme [Pg 3, P-4]
the distribution prediction module according to the history data of the distribution estimation result, establishing model based on machine learning algorithm to predict the future period of pedestrian distribution parameter, and uploading data continuous learning model parameter according to the future monitoring terminal.[Pg 3, P-9]
Preferably, the risk evaluation module signal is connected with a conflict monitoring terminal and an abnormal monitoring terminal, and according to the distribution prediction module, the output result of the distribution estimation module and conflict monitoring terminal, the uploading data of the abnormal monitoring terminal evaluates the possibility of the danger of congestion and trampling on the comfort level region. [Pg 3, P-9-Pg 4, P-1]
Further, the management control terminal 1 comprises a region modeling module 101, and the region modeling module 101 signal connection of the distribution estimation module 102, and the distribution prediction module 102 signal connection distribution prediction module 103. a risk evaluation module 104 connected with the distribution prediction module 103 and the distribution estimation module 102, a simulation and optimization module 105 connected with the risk evaluation module 104 signal and a guide control module 106 connected with the simulation and optimization module 105 signal.[Pg 6, P-2]
Further, the simulation and optimization module 105 comprises a personnel evacuation model and personnel flow model, when the risk evaluation result of the risk evaluation module 104 is low comfort level personnel flow module, management control terminal 1 sends attention area low comfort level alarm, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of the simulated pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and a flow inducing scheme, after the management staff 4 of the auditing by the guide control module 106 sends the inducing information to the guide terminal 3;[Pg 6, P-3]
the distribution prediction module 103 according to the history data of the distribution estimation result, based on the machine learning algorithm establishing model prediction future period of pedestrian distribution parameter, and according to the future monitoring terminal 2 uploading data continuous learning model parameter.[Pg 6, P-7]
Further, the risk evaluation module 104 signal connected with the conflict monitoring terminal 203 and the abnormal monitoring terminal 204, and according to the distribution prediction module 103, distribution estimation module 102 the output result and conflict monitoring terminal 203. The uploading data of the abnormal monitoring terminal 204 evaluates the possibility of the comfort level of congestion and trampling on the region of interest and the occurrence of congestion and trample.[Pg 6, P-8]

Here, we see Huang teaching data from one or more sensors at a specific area of interest (station), and using a module synonymous with the first model to predict, from the sensor data, congestion information relating to the one or more sensors that detect the congestion. Furthermore, Huang uses a module for capturing and evaluating congestion data, synonymous with the second prediction model/deep learning model, to analyze time-series data in more depth and to predict more accurate parameters relating the current traffic/congestion to future traffic congestion. When taken in combination with Guo’s teaching of the learning server regenerating a traffic model (a second model) after previously generated model(s), using previous traffic-related data together with real-time traffic-related data, one of ordinary skill in the art may then be able to determine the future congestion level of any of the specific (m) areas of the various stations and thereafter display, by the processor, the predicted congestion-related information to users at the station.
It would have been obvious to one of ordinary skill in the art to combine Guo’s teaching with Huang’s teaching in order to have a more accurate prediction model to determine congestion/traffic activities and behaviors within a given area.
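For reference, the prediction flow recited in the limitations discussed above (acquiring present sensor values, using the first model to predict the congestion-related information at the detection time, using the second model to predict the future congestion level, and displaying the result) can be sketched as a simple chain of two models. This is a minimal, non-authoritative Python sketch: the type aliases, the function names `predict_future_congestion` and `format_for_display`, and the toy models in the usage note are hypothetical and are not drawn from the claims, Guo, or Huang.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical types: a vector of present sensor readings, and a map of
# congestion-related information (area id -> congestion level).
SensorValues = Sequence[float]
CongestionInfo = Dict[str, float]


def predict_future_congestion(
    read_sensors: Callable[[], SensorValues],
    first_model: Callable[[SensorValues], CongestionInfo],
    second_model: Callable[[CongestionInfo], CongestionInfo],
) -> Tuple[CongestionInfo, CongestionInfo]:
    """Chain the two models: present sensor values -> congestion-related
    information at the detection time -> future congestion level per area."""
    present = read_sensors()          # acquire present values from the sensors
    current = first_model(present)    # first model: congestion info at detection time
    future = second_model(current)    # second model: future congestion level per area
    return current, future


def format_for_display(future: CongestionInfo) -> List[str]:
    """Hypothetical display step: one line per area, sorted by area id."""
    return [f"{area}: {level:.2f}" for area, level in sorted(future.items())]
```

For example, with toy models `lambda v: {"A": v[0] / 10, "B": v[1] / 10}` and `lambda c: {k: x * 1.5 for k, x in c.items()}`, present readings `[10.0, 20.0]` yield current levels `{"A": 1.0, "B": 2.0}` and future levels `{"A": 1.5, "B": 3.0}`.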

With regard to claim 6, Guo teaches a prediction device comprising: a processor to execute a program; and a memory to store the program which, when executed by the processor, performs processes of acquiring values detected by one or more sensors installed in n areas (where n is an integer of one or more, and n<m) out of m areas (where m is an integer of two or more) (Paragraphs 9, 11, 17, 60, 63)
It is one object of some embodiments to provide a distributed machine learning based vehicular traffic prediction method that can accurately predict short time, middle time and long time traffic for both city roads and highways. Additionally, it is another object of some embodiments to make optimal route planning for vehicles to minimize traffic congestion and travel time by using the accurate traffic prediction.[P-9]
A learning server coordinates distributed learning among a set of data collectors. A learning server first designs and distributes the traffic models such as neural networks to the set of data collectors for the first round of the distributed training Each data collector then trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. After certain iterations of training process, each data collector sends the trained traffic models to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models. Upon completion of the model aggregation, the learning server re-distributes the aggregated traffic models to data collectors for the second round of training process. This process of training and aggregation continues until the robust traffic models are built.[P-11]
According to some embodiments of the present invention, a distributed machine learning based traffic prediction method can be provided for predicting traffic of roads. In this case, the distributed machine learning based traffic prediction method may a computer-implemented distributed machine learning based traffic prediction method for predicting traffic of roads. The method may include distributing global multi-task traffic models by a learning server to learning agents to locally train the traffic models; uploading locally trained global multi-task traffic models to the learning server, wherein the locally trained global multi-task traffic models have been trained by the learning agents; updating the global multi-task traffic models by the learning server using the locally trained global multi-task traffic models uploaded from the learning agents; generating a time-dependent global traffic map by the learning server using the updated global multi-task traffic models; and distributing the time-dependent global traffic map to each of vehicles traveling on the roads.[P-17]
 FIG. 3A illustrates the traffic prediction architecture 300 using distributed multi-task learning techniques according to some embodiments of the current invention, where an infrastructure device acts as learning server 115 to coordinate a set of data collectors 110, which serve as learning agents. In the FIG. 3A, there are N data collectors, data collector 1 is a toll station, data collector 2 is a loop detector, and data collector N is a camera. The learning server 115 designs M traffic models 310 such that each traffic model is targeted for a specific task, e.g., a model for rush hour traffic and another traffic model for off hour traffic, a model for city traffic and another model for freeway traffic. Therefore, these traffic models are different but correlated because traffic at one location can impact on traffic at other locations and traffic at one can impact on traffic at another time. Accordingly, the learning server 115 defines data clustering rules 320 to divide data into M clusters 330. The learning server 115 then distributes the rules 320 and traffic models 310 to data collectors 110. Correspondingly, each data collector 110 divides its data into M clusters 330. It is possible that some of data clusters are empty. In that case, the data collector will not train corresponding traffic model. In the FIG. 3A, data collector 1 has all data clusters, data collector 2 does not have cluster-M data and, and data collector N does not have cluster-1 data. As a result, data collector 1 will train all traffic models, data collector 2 does not train traffic model M, and data collector N does not train traffic model 1. After clustering its data, each data collector 110 starts the first round of the distributed training. Each data collector 110 trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. 
After certain iterations of training, each data collector sends the locally trained traffic models 340 to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models 350. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. For route planning application, the traffic speed is predicted. Finally, data collectors send their local traffic predictions 360 to the learning server to build a global traffic prediction model, i.e., time-dependent graph, that is then distributed to vehicles 120 for route planning[P-60]
The learning server 115 may be referred to as a distributed machine learning based traffic predication server. The learning server 115 is configured to provide map information with respect to traffic predictions to vehicles 120 traveling on a road network. The learning server 115 may include one or more processors 121, a memory 140, a memory unit/storage 200 configured to store traffic prediction neural networks 132, traffic data cluster rules 134, global time-dependent map 135, trained multi-task traffic models 136, global map of road network 137, an input interface (or transceiver) 150 configured to communicate with the learning agent 110 via the communication network 112 and update model parameters of the traffic prediction neural networks 132. The trained multi-task traffic models 136 may be the traffic models 310 shown in FIG. 3A.[P-63]

Guo establishes M clusters representing different traffic models attributed to different traffic areas; for example, Guo illustrates that one area may be freeway traffic, which would be labeled Cluster 1, and another area may be city traffic (Cluster 2), thereby satisfying the condition of m being two or more. Furthermore, Guo teaches data collectors/sensors, numbered data collector 1 through data collector N, each of which may be a toll station, a loop detector, or a camera. Accordingly, depending on the area for which the traffic/cluster model is generated, a desired number of sensors/detectors may be set such that the desired ratio of data collectors to cluster models, N<M, is met.
Guo then teaches the first model being generated by using, as input data, the congestion-related information indicated by past congestion-area data indicating the congestion-related information including a past congestion level of each of the m areas and by using, as correct data, values indicated by past sensor data indicating values detected by the one or more sensors in the past (Paragraph 48)
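The distributed multi-task training quoted above from Guo P-60 (shared data clustering rules dividing each collector's data into M clusters, local training of only the models whose cluster is non-empty, and server-side aggregation followed by redistribution) can be sketched as one training round. This is a minimal, non-authoritative Python sketch under stated assumptions: each traffic model is reduced to a hypothetical one-feature linear model trained by gradient descent, and aggregation is assumed to be a sample-count-weighted average of the returned parameters (a FedAvg-style choice). Guo states only that the server aggregates the received models; the function names are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical simplification: each traffic model is a one-feature linear model
# y = w * x + b, and each data sample is an (x, y) pair.
Sample = Tuple[float, float]
Model = Tuple[float, float]


def local_train(model: Model, data: List[Sample], lr: float = 0.01, epochs: int = 50) -> Model:
    """Collector-side training: gradient descent on mean squared error,
    using only this collector's data for one cluster."""
    w, b = model
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def training_round(
    global_models: List[Model],
    collectors: List[List[Sample]],
    cluster_rule: Callable[[Sample], int],
) -> List[Model]:
    """One round: each collector divides its data into M clusters with the shared
    rule and trains only the models whose cluster is non-empty; the server then
    aggregates the returned models into new global models."""
    m = len(global_models)
    updates: List[List[Tuple[Model, int]]] = [[] for _ in range(m)]
    for data in collectors:
        clusters: List[List[Sample]] = [[] for _ in range(m)]
        for sample in data:
            clusters[cluster_rule(sample)].append(sample)
        for k, cluster in enumerate(clusters):
            if cluster:  # empty cluster: this collector does not train model k
                updates[k].append((local_train(global_models[k], cluster), len(cluster)))

    aggregated: List[Model] = []
    for k, received in enumerate(updates):
        if not received:  # no collector trained model k this round; keep it unchanged
            aggregated.append(global_models[k])
            continue
        # ASSUMPTION: sample-count-weighted average of parameters (FedAvg-style).
        total = sum(count for _, count in received)
        w = sum(mw * count for (mw, _), count in received) / total
        b = sum(mb * count for (_, mb), count in received) / total
        aggregated.append((w, b))
    return aggregated
```

Calling `training_round` again with the returned models corresponds to the redistribution and further rounds of training described in P-60; a model for which no collector holds data is carried over unchanged, mirroring Guo's statement that a collector with an empty cluster does not train the corresponding model.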
To reap all the aforementioned benefits, the traffic prediction must process the real-time and historical traffic data and observations collected by data collectors and mobile devices. For example, the inductive loop can measure the travel speed by reading the inductance changes over time and such data can be used for the traffic speed prediction. In addition, the wide use of mobile devices (e.g., on-board global position systems and phones) enables the mobility data to be crowdsourced from the general public, further facilitating the unprecedented traffic data collection. Such emerging big data can substantially augment the data availability in terms of the coverage and fidelity and significantly boost the data-driven traffic prediction. The prior art on the traffic prediction can be mainly grouped into two categories. The first category focus on using parametric approaches, such as autoregressive integrated moving average (ARIMA) model and Kalman filtering models. When dealing with the traffic only presenting regular variations, e.g., recurrent traffic congestion occurred in morning and evening rush hour, the parametric approaches can achieve promising prediction results. However, due to the stochastic and nonlinear nature of the road traffic, the traffic predictions of using the parametric approaches can deviate from the actual values especially in the abrupt traffic. Hence, instead of fitting the traffic data into a mathematical model as done by the parametric approach, an alternative way is using the nonparametric approaches where the machine learning (ML) based method is used. For example, a stacked autoencoder model can be used to learn the generic traffic flow features for the predictions. The long short-term memory (LSTM) recurrent neural network (RNN) can be used to predict the traffic flow, speed and occupancy, based on the data collected by the data collector and its upstream and downstream data collectors. 
Along with the use of RNN, the convolution neural network (CNN) can also be utilized to capture the latent traffic evolution patterns within the underlying road network.[P-48]

Here, we see real-time traffic data integrated with past sensed traffic data and congestion-related traffic information, which are aggregated to enable the prediction of future traffic patterns.
Guo teaches the first model being a learning model for predicting, from values detected by the one or more sensors, congestion-related information of a time point at which the one or more sensors perform detection (Paragraph 60; Figure 3A)
 FIG. 3A illustrates the traffic prediction architecture 300 using distributed multi-task learning techniques according to some embodiments of the current invention, where an infrastructure device acts as learning server 115 to coordinate a set of data collectors 110, which serve as learning agents. In the FIG. 3A, there are N data collectors, data collector 1 is a toll station, data collector 2 is a loop detector, and data collector N is a camera. The learning server 115 designs M traffic models 310 such that each traffic model is targeted for a specific task, e.g., a model for rush hour traffic and another traffic model for off hour traffic, a model for city traffic and another model for freeway traffic. Therefore, these traffic models are different but correlated because traffic at one location can impact on traffic at other locations and traffic at one can impact on traffic at another time. Accordingly, the learning server 115 defines data clustering rules 320 to divide data into M clusters 330. The learning server 115 then distributes the rules 320 and traffic models 310 to data collectors 110. Correspondingly, each data collector 110 divides its data into M clusters 330. It is possible that some of data clusters are empty. In that case, the data collector will not train corresponding traffic model. In the FIG. 3A, data collector 1 has all data clusters, data collector 2 does not have cluster-M data and, and data collector N does not have cluster-1 data. As a result, data collector 1 will train all traffic models, data collector 2 does not train traffic model M, and data collector N does not train traffic model 1. After clustering its data, each data collector 110 starts the first round of the distributed training. Each data collector 110 trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. 
After certain iterations of training, each data collector sends the locally trained traffic models 340 to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models 350. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. For route planning application, the traffic speed is predicted. Finally, data collectors send their local traffic predictions 360 to the learning server to build a global traffic prediction model, i.e., time-dependent graph, that is then distributed to vehicles 120 for route planning[P-60]
Here, P-60 and Fig. 3A illustrate and explain that several different traffic models are generated, where one (first) model is a learning model for predicting, from values detected by the one or more sensors (toll station), congestion-related information of a time point (rush hour) at which the one or more sensors perform detection.
Guo does not verbatim teach using a second model to predict a future congestion level of any of the m areas from the acquired values, the first model being a learning model for predicting, from congestion-related information, values to be detected by the one or more sensors at a time point at which the congestion-related information is acquired, the second model being a learning model for predicting a future congestion level from the values detected by the one or more sensors, the second model being generated by using, as input data, predicted values obtained by using a first model to predict the values to be detected by the one or more sensors at a first time point, from congestion-related information of the first time point indicated by the past congestion-area data and by using, as correct data, a congestion level in the congestion-related information of a second time point indicated by the past congestion-area data, the second time point being a time point after the first time point.
However, as taught in P-60, Guo discloses multiple traffic models where, for example, one (first) model is a learning model for predicting, from values detected by the one or more sensors (toll station), congestion-related information of a time point (rush hour) at which the one or more sensors perform detection. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc.[P-60]
Hence, the next round of learning entails generating a second model by using the predicted values (the previous aggregated traffic model used to predict traffic congestion/the first traffic model) as input data and using the congestion level in the congestion-related information of a second time point (a different time point in rush hour) indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from values detected by one or more sensors, the second time point being a time point after the first time point (a different time in rush hour, after the time used in the preceding learning model).
Therefore, although Guo’s teaching does not verbatim recite generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data as input data, Guo teaches the learning server regenerating a traffic model (a second model) after previously generated model(s), using previous traffic-related data together with real-time traffic-related data. It is therefore obvious to one of ordinary skill in the art that Guo’s teaching reads on the applicant’s second generated model, since substituting Guo’s teaching for the applicant’s teaching would yield the exact same results while enabling a more efficient and robust traffic prediction model.
Guo, however, fails to teach acquiring, by the processor, present values detected by the one or more sensors of the station; using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection; using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level of any of the m areas; and displaying, by the processor, to users at the station the predicted congestion-related information.
Huang, in contrast, teaches acquiring, by the processor, present values detected by the one or more sensors of the station; using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection (Page 3, Paragraph 2; Page 5, Paragraphs 5-10)
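The training arrangement recited for the claim 6 second model can be sketched as follows. Its input data are the sensor values that the first model predicts from the congestion-related information of a first time point, and its correct data are the congestion levels of a second, later time point in the past congestion-area data. This is a minimal, non-authoritative Python sketch. The function names, the `horizon` parameter (the gap between the first and second time points, in time steps), and the single-sensor least-squares fit standing in for the learning model are hypothetical and are not taken from the application, Guo, or Huang.

```python
from typing import Callable, Dict, List, Sequence, Tuple

CongestionInfo = Dict[str, float]   # hypothetical: area id -> congestion level (m areas)
SensorVector = List[float]          # hypothetical: values of sensors in the n areas


def build_second_model_pairs(
    first_model: Callable[[CongestionInfo], SensorVector],
    history: Sequence[CongestionInfo],  # past congestion-area data, one entry per time step
    horizon: int,                       # hypothetical gap between first and second time points
    target_area: str,
) -> List[Tuple[SensorVector, float]]:
    """Input data: sensor values predicted by the first model from the congestion
    info at t1. Correct data: congestion level of target_area at t2 = t1 + horizon."""
    if horizon < 1:
        raise ValueError("the second time point must come after the first")
    pairs = []
    for t1 in range(len(history) - horizon):
        x = first_model(history[t1])            # predicted sensor values at t1
        y = history[t1 + horizon][target_area]  # congestion level at t2
        pairs.append((x, y))
    return pairs


def fit_single_sensor_linear(
    pairs: List[Tuple[SensorVector, float]],
) -> Callable[[SensorVector], float]:
    """Stand-in learning model for n = 1: ordinary least squares y = w * x + b."""
    xs = [x[0] for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w = sxy / sxx
    b = my - w * mx
    return lambda values: w * values[0] + b
```

For example, suppose area A's congestion level is t and area B's is 2t at time step t (t = 0 to 5), and a hypothetical first model predicts one sensor value equal to 10 times area A's level. With `horizon=1` and `target_area="B"`, the fitted second model is y = 0.2x + 2, so a reading of 30.0 predicts a future level of 8.0 for area B. Here one sensor (n = 1) covers two areas (m = 2), consistent with n<m.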
Preferably, the monitoring terminal comprises a traffic monitoring terminal for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal for counting the pedestrian density at a certain time in a certain time, and a conflict monitoring terminal for counting the pedestrian conflict degree in a certain area and an abnormal monitoring terminal for identifying the pedestrian abnormal behaviour in a certain area.[Pg 3, P-2]
Further, the monitoring terminal 2 comprises a traffic monitoring terminal 201 for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal 202 for counting a certain area pedestrian density at a certain time, a conflict monitoring terminal 203 for counting pedestrian conflict degree in a certain area and an abnormal monitoring terminal 204 for identifying pedestrian abnormal behavior in a certain area, each terminal is composed of a sensor, an embedded processing chip, a processing program and a communication module.[Pg 5, P-5]
the sensor of the flow monitoring terminal 201 is a binocular video camera embedded processing chip and flow statistics method according to the video data obtained by the sensor, counting the pedestrian number in and out of the specified area and calculating the flow rate (unit: The cumulative flow rate (unit: 1 %) and the cumulative flow rate (unit: human), then the traffic data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-6]
the sensor of the density monitoring terminal 202 is a wide-angle video camera embedded processing chip and density statistical method according to the video data obtained by the sensor, counting the pedestrian distribution of the specified area and calculating the density (unit of each point after discrete area of the area: the density data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-7]
the sensor of the conflict monitoring terminal 203 is high video camera, embedded processing chip and conflict calculation algorithm according to the video data obtained by the sensor, counting the appointed area pedestrian speed direction and size change rate, relative direction pedestrian number, local density and other parameters, and according to the direction and size change rate of the pedestrian speed in the specified area, relative direction pedestrian number, local density parameter calculating conflict coefficient (dimensionless parameter), then the conflict coefficient is uploaded to the management control terminal 1 through the communication module[Pg 5, P-8]
the sensor of the abnormal monitoring terminal 204 is high video camera, embedded processing chip and the abnormal detection algorithm according to the video data obtained by the sensor, detecting whether there is abnormal behaviour in the specified area, such as a fight, terrorist attack, fast escape, crowding, then the type code of the abnormal behaviour is uploaded to the management control terminal 1 through the communication module. [Pg 5, P-9]
Further, the guide terminal 3 comprises an acoustic guide terminal 301 for conveying sound instruction, for area division, route guidance and information display of the optical guide terminal 302, for issuing the mobile phone message of the information issuing terminal 304. The area separation terminal 303 for the dynamic adjustment of the pedestrian and vehicular traffic area and the guide terminal 305 for order maintenance and manual assistance 305. [Pg 5, P-10]

Huang further teaches using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level (Page 3, Paragraphs 3, 4, 9; Page 3, Paragraph 9-Page 4, Paragraph 1; Page 6, Paragraphs 2, 3, 7, 8)
Preferably, the management control terminal comprises a region modeling module, a distribution estimation module connected with the region modeling module signal, a distribution prediction module connected with the distribution estimation module signal, and a risk evaluation module connected with the distribution prediction module and the distribution estimation module signal, a simulation and optimization module connected with the risk evaluation module signal and a guide control module connected with the simulation and optimization module signal. [Pg 3, P-3]
Preferably, the simulation and optimization module comprises a personnel evacuation model and personnel flow model, when the risk assessment result of the risk assessment module is low comfort level personnel flow module, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of simulating pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and flow inducing scheme [Pg 3, P-4]
the distribution prediction module according to the history data of the distribution estimation result, establishing model based on machine learning algorithm to predict the future period of pedestrian distribution parameter, and uploading data continuous learning model parameter according to the future monitoring terminal.[Pg 3, P-9]
Preferably, the risk evaluation module signal is connected with a conflict monitoring terminal and an abnormal monitoring terminal, and according to the distribution prediction module, the output result of the distribution estimation module and conflict monitoring terminal, the uploading data of the abnormal monitoring terminal evaluates the possibility of the danger of congestion and trampling on the comfort level region. [Pg 3, P-9-Pg 4, P-1]
Further, the management control terminal 1 comprises a region modeling module 101, and the region modeling module 101 signal connection of the distribution estimation module 102, and the distribution prediction module 102 signal connection distribution prediction module 103. a risk evaluation module 104 connected with the distribution prediction module 103 and the distribution estimation module 102, a simulation and optimization module 105 connected with the risk evaluation module 104 signal and a guide control module 106 connected with the simulation and optimization module 105 signal.[Pg 6, P-2]
Further, the simulation and optimization module 105 comprises a personnel evacuation model and personnel flow model, when the risk evaluation result of the risk evaluation module 104 is low comfort level personnel flow module, management control terminal 1 sends attention area low comfort level alarm, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of the simulated pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and a flow inducing scheme, after the management staff 4 of the auditing by the guide control module 106 sends the inducing information to the guide terminal 3;[Pg 6, P-3]
the distribution prediction module 103 according to the history data of the distribution estimation result, based on the machine learning algorithm establishing model prediction future period of pedestrian distribution parameter, and according to the future monitoring terminal 2 uploading data continuous learning model parameter.[Pg 6, P-7]
Further, the risk evaluation module 104 signal connected with the conflict monitoring terminal 203 and the abnormal monitoring terminal 204, and according to the distribution prediction module 103, distribution estimation module 102 the output result and conflict monitoring terminal 203. The uploading data of the abnormal monitoring terminal 204 evaluates the possibility of the comfort level of congestion and trampling on the region of interest and the occurrence of congestion and trample.[Pg 6, P-8]


Here, Huang teaches acquiring data from one or more sensors at a specific area of interest (a station) and using a module corresponding to the first model to predict, from the sensor data, congestion information relating to the one or more sensors that detect the congestion. Furthermore, Huang uses modules for capturing and evaluating congestion data, corresponding to the second prediction model (deep learning model), to analyze time-series data in more depth and to predict more accurately the parameters that affect future traffic congestion. When Huang is taken in combination with Guo’s teaching of a learning server that regenerates a traffic model (a second model) after previously generated model(s), using previous traffic-related data together with real-time traffic-related data, one of ordinary skill in the art could then determine the future congestion level of any of the specific (m) areas of the various stations and thereafter display, by the processor, the predicted congestion-related information to users at the station.
It would have been obvious to one of ordinary skill in the art to combine Guo’s teaching with Huang’s teaching in order to obtain a more accurate prediction model for determining congestion/traffic activities and behaviors within a given area.
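For illustration only, the estimation, prediction, and risk-evaluation chain that is read on the claimed first and second models can be sketched in Python. This is a minimal sketch under stated assumptions, not Huang's implementation: Huang does not disclose which machine learning algorithm the distribution prediction module uses, so an ordinary least-squares trend stands in for it, and the function names and the comfort threshold `COMFORT_LIMIT` are hypothetical.

```python
from statistics import mean

# Hypothetical comfort threshold in persons per square metre (not taken from Huang).
COMFORT_LIMIT = 2.0


def estimate_density(person_count: int, area_m2: float) -> float:
    """Estimation step: present pedestrian density of one area from a sensor count."""
    return person_count / area_m2


def fit_trend(history: list[float]) -> tuple[float, float]:
    """Least-squares line through (t, density) for t = 0, 1, 2, ...; needs >= 2 points.

    Stand-in for Huang's unspecified machine learning model (an assumption).
    """
    ts = list(range(len(history)))
    t_bar, y_bar = mean(ts), mean(history)
    sxx = sum((t - t_bar) ** 2 for t in ts)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, history))
    slope = sxy / sxx
    return slope, y_bar - slope * t_bar


def predict_density(history: list[float], steps_ahead: int) -> float:
    """Prediction step: extrapolate the fitted trend past the last sample."""
    slope, intercept = fit_trend(history)
    return intercept + slope * (len(history) - 1 + steps_ahead)


def evaluate_risk(predicted_density: float) -> str:
    """Risk-evaluation step: flag an area whose predicted density exceeds the limit."""
    return "low comfort" if predicted_density > COMFORT_LIMIT else "acceptable"


# Four past counts in a hypothetical 40 m^2 area, predicted two intervals ahead.
history = [estimate_density(c, 40.0) for c in (50, 60, 70, 80)]
print(evaluate_risk(predict_density(history, steps_ahead=2)))
```

The chain makes the mapping concrete: the first stage produces present congestion information from sensor counts, and the second stage consumes a history of that information to produce a future congestion estimate.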

Regarding claim 7, Guo teaches a learning prediction device comprising: a processor to execute a program; and a memory to store past congestion-area data indicating congestion-related information including a past congestion level of each of m areas (where m is an integer of two or more), past sensor data indicating values detected in the past by one or more sensors installed in n areas (where n is an integer of one or more and n<m) out of the m areas (Paragraphs 9, 11, 17, 60, 63).
It is one object of some embodiments to provide a distributed machine learning based vehicular traffic prediction method that can accurately predict short time, middle time and long time traffic for both city roads and highways. Additionally, it is another object of some embodiments to make optimal route planning for vehicles to minimize traffic congestion and travel time by using the accurate traffic prediction.[P-9]
A learning server coordinates distributed learning among a set of data collectors. A learning server first designs and distributes the traffic models such as neural networks to the set of data collectors for the first round of the distributed training Each data collector then trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. After certain iterations of training process, each data collector sends the trained traffic models to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models. Upon completion of the model aggregation, the learning server re-distributes the aggregated traffic models to data collectors for the second round of training process. This process of training and aggregation continues until the robust traffic models are built.[P-11]
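The train-and-aggregate cycle described in P-11 can be illustrated with a minimal Python sketch. The linear model, the learning rate, and the sample-count weighting (the federated averaging scheme) are assumptions made for illustration; Guo states only that the learning server "aggregates" the returned models. Each data collector trains on its own data, and only model parameters reach the server.

```python
# Hypothetical model: a 1-D linear predictor y = w * x + b, stored as [w, b].
Model = list[float]
Dataset = list[tuple[float, float]]


def local_train(model: Model, data: Dataset, lr: float = 0.01, epochs: int = 20) -> Model:
    """Data collector: full-batch gradient descent on squared error, using only local data."""
    w, b = model
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w, b = w - lr * grad_w, b - lr * grad_b
    return [w, b]


def aggregate(models: list[Model], sizes: list[int]) -> Model:
    """Learning server: parameter average weighted by each collector's sample count."""
    total = sum(sizes)
    return [sum(m[i] * s for m, s in zip(models, sizes)) / total for i in range(len(models[0]))]


def federated_training(global_model: Model, collectors: list[Dataset], rounds: int) -> Model:
    """Repeat distribute -> local training -> aggregation for a fixed number of rounds."""
    for _ in range(rounds):
        local_models = [local_train(global_model, data) for data in collectors]
        global_model = aggregate(local_models, [len(data) for data in collectors])
    return global_model
```

A fixed round count replaces Guo's open-ended "until the robust traffic models are built"; a stopping test on the change in parameters between rounds would be a natural alternative.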
According to some embodiments of the present invention, a distributed machine learning based traffic prediction method can be provided for predicting traffic of roads. In this case, the distributed machine learning based traffic prediction method may a computer-implemented distributed machine learning based traffic prediction method for predicting traffic of roads. The method may include distributing global multi-task traffic models by a learning server to learning agents to locally train the traffic models; uploading locally trained global multi-task traffic models to the learning server, wherein the locally trained global multi-task traffic models have been trained by the learning agents; updating the global multi-task traffic models by the learning server using the locally trained global multi-task traffic models uploaded from the learning agents; generating a time-dependent global traffic map by the learning server using the updated global multi-task traffic models; and distributing the time-dependent global traffic map to each of vehicles traveling on the roads.[P-17]
 FIG. 3A illustrates the traffic prediction architecture 300 using distributed multi-task learning techniques according to some embodiments of the current invention, where an infrastructure device acts as learning server 115 to coordinate a set of data collectors 110, which serve as learning agents. In the FIG. 3A, there are N data collectors, data collector 1 is a toll station, data collector 2 is a loop detector, and data collector N is a camera. The learning server 115 designs M traffic models 310 such that each traffic model is targeted for a specific task, e.g., a model for rush hour traffic and another traffic model for off hour traffic, a model for city traffic and another model for freeway traffic. Therefore, these traffic models are different but correlated because traffic at one location can impact on traffic at other locations and traffic at one can impact on traffic at another time. Accordingly, the learning server 115 defines data clustering rules 320 to divide data into M clusters 330. The learning server 115 then distributes the rules 320 and traffic models 310 to data collectors 110. Correspondingly, each data collector 110 divides its data into M clusters 330. It is possible that some of data clusters are empty. In that case, the data collector will not train corresponding traffic model. In the FIG. 3A, data collector 1 has all data clusters, data collector 2 does not have cluster-M data and, and data collector N does not have cluster-1 data. As a result, data collector 1 will train all traffic models, data collector 2 does not train traffic model M, and data collector N does not train traffic model 1. After clustering its data, each data collector 110 starts the first round of the distributed training. Each data collector 110 trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. 
After certain iterations of training, each data collector sends the locally trained traffic models 340 to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models 350. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. For route planning application, the traffic speed is predicted. Finally, data collectors send their local traffic predictions 360 to the learning server to build a global traffic prediction model, i.e., time-dependent graph, that is then distributed to vehicles 120 for route planning[P-60]
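The clustering step of FIG. 3A, in which each data collector divides its data into M clusters and trains only the models whose clusters are non-empty, can be sketched as follows. The rush-hour rule, the record fields `hour` and `speed`, and the use of a sample-weighted mean speed as the "model" are hypothetical simplifications, not Guo's traffic models.

```python
from collections import defaultdict
from typing import Callable

Record = dict[str, float]  # hypothetical fields: "hour" (0-23) and "speed" (km/h)


def rush_hour_rule(record: Record) -> str:
    """Hypothetical clustering rule: 07:00-09:00 and 17:00-19:00 count as rush hour."""
    return "rush" if int(record["hour"]) in (7, 8, 17, 18) else "off-hour"


def local_updates(records: list[Record], rule: Callable[[Record], str]) -> dict[str, tuple[float, int]]:
    """Data collector: cluster its own data, then 'train' (here: average the speed)
    only the models whose clusters are non-empty; empty clusters never appear."""
    clusters: dict[str, list[Record]] = defaultdict(list)
    for record in records:
        clusters[rule(record)].append(record)
    return {name: (sum(r["speed"] for r in rs) / len(rs), len(rs)) for name, rs in clusters.items()}


def aggregate_per_model(updates: list[dict[str, tuple[float, int]]]) -> dict[str, float]:
    """Learning server: combine each model only from the collectors that trained it."""
    weighted: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for update in updates:
        for name, (value, n) in update.items():
            weighted[name] += value * n
            counts[name] += n
    return {name: weighted[name] / counts[name] for name in weighted}
```

A collector whose data contains no off-hour records simply returns no off-hour update, mirroring Guo's statement that "data collector 2 does not train traffic model M."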
The learning server 115 may be referred to as a distributed machine learning based traffic predication server. The learning server 115 is configured to provide map information with respect to traffic predictions to vehicles 120 traveling on a road network. The learning server 115 may include one or more processors 121, a memory 140, a memory unit/storage 200 configured to store traffic prediction neural networks 132, traffic data cluster rules 134, global time-dependent map 135, trained multi-task traffic models 136, global map of road network 137, an input interface (or transceiver) 150 configured to communicate with the learning agent 110 via the communication network 112 and update model parameters of the traffic prediction neural networks 132. The trained multi-task traffic models 136 may be the traffic models 310 shown in FIG. 3A.[P-63]
Guo establishes M clusters representing different traffic models attributed to different traffic areas; as Guo illustrates, one area may be freeway traffic, which would be labeled Cluster 1, and another area may be city traffic (Cluster 2), thereby satisfying the condition that m is two or more. Furthermore, Guo teaches N data collectors (sensors), each of which may be a toll station, a loop detector, or a camera. Depending on the area for which the traffic/cluster model is generated, a desired number of sensors/detectors may be set such that the ratio of data collectors to cluster models satisfies N<M, corresponding to the claimed n<m.
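The claimed area relationship (m of two or more, n of one or more, n<m, and the n sensor areas lying among the m areas) can be expressed as a simple check. This hypothetical helper is illustrative only; it shows that the areas left without sensors are those whose congestion level must be predicted rather than measured.

```python
def unmonitored_areas(all_areas: set[str], sensor_areas: set[str]) -> set[str]:
    """Hypothetical helper: check the claimed constraints on m and n, then return
    the m - n areas that have no sensor installed."""
    m, n = len(all_areas), len(sensor_areas)
    if m < 2:
        raise ValueError("m must be an integer of two or more")
    if n < 1:
        raise ValueError("n must be an integer of one or more")
    if not sensor_areas <= all_areas:
        raise ValueError("the n sensor areas must be among the m areas")
    if n >= m:
        raise ValueError("n must be less than m")
    return all_areas - sensor_areas
```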

Furthermore, Guo teaches the program which, when executed by the processor, performs processes of generating a first model by using the values indicated by the past sensor data as input data and using the congestion-related information indicated by the past congestion-area data as correct data (Paragraph 48).
To reap all the aforementioned benefits, the traffic prediction must process the real-time and historical traffic data and observations collected by data collectors and mobile devices. For example, the inductive loop can measure the travel speed by reading the inductance changes over time and such data can be used for the traffic speed prediction. In addition, the wide use of mobile devices (e.g., on-board global position systems and phones) enables the mobility data to be crowdsourced from the general public, further facilitating the unprecedented traffic data collection. Such emerging big data can substantially augment the data availability in terms of the coverage and fidelity and significantly boost the data-driven traffic prediction. The prior art on the traffic prediction can be mainly grouped into two categories. The first category focus on using parametric approaches, such as autoregressive integrated moving average (ARIMA) model and Kalman filtering models. When dealing with the traffic only presenting regular variations, e.g., recurrent traffic congestion occurred in morning and evening rush hour, the parametric approaches can achieve promising prediction results. However, due to the stochastic and nonlinear nature of the road traffic, the traffic predictions of using the parametric approaches can deviate from the actual values especially in the abrupt traffic. Hence, instead of fitting the traffic data into a mathematical model as done by the parametric approach, an alternative way is using the nonparametric approaches where the machine learning (ML) based method is used. For example, a stacked autoencoder model can be used to learn the generic traffic flow features for the predictions. The long short-term memory (LSTM) recurrent neural network (RNN) can be used to predict the traffic flow, speed and occupancy, based on the data collected by the data collector and its upstream and downstream data collectors. 
Along with the use of RNN, the convolution neural network (CNN) can also be utilized to capture the latent traffic evolution patterns within the underlying road network.[P-48]

Here, Guo describes integrating real-time traffic data with past sensed traffic data and congestion-related traffic information, which are aggregated to enable the prediction of future traffic patterns.
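The claimed training arrangement for the first model, with past sensor values as input data and the congestion-related information recorded at the same time point as correct data, can be sketched as below. The dictionary layout keyed by time point and the single-feature least-squares fit are assumptions chosen for brevity; neither Guo nor the claim specifies them.

```python
def first_model_pairs(sensor_values: dict[int, float],
                      congestion_info: dict[int, float]) -> list[tuple[float, float]]:
    """Pair each past sensor value (input data) with the congestion-related information
    recorded at the SAME time point (correct data); unmatched time points are skipped."""
    return [(sensor_values[t], congestion_info[t]) for t in sorted(sensor_values) if t in congestion_info]


def fit_line(pairs: list[tuple[float, float]]) -> tuple[float, float]:
    """Single-feature least squares returning (slope, intercept); needs two distinct inputs."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar
```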
Guo teaches the first model being a learning model for predicting, from values detected by the one or more sensors, congestion-related information of a time point at which the one or more sensors perform detection (Paragraph 60; Figure 3A)
 FIG. 3A illustrates the traffic prediction architecture 300 using distributed multi-task learning techniques according to some embodiments of the current invention, where an infrastructure device acts as learning server 115 to coordinate a set of data collectors 110, which serve as learning agents. In the FIG. 3A, there are N data collectors, data collector 1 is a toll station, data collector 2 is a loop detector, and data collector N is a camera. The learning server 115 designs M traffic models 310 such that each traffic model is targeted for a specific task, e.g., a model for rush hour traffic and another traffic model for off hour traffic, a model for city traffic and another model for freeway traffic. Therefore, these traffic models are different but correlated because traffic at one location can impact on traffic at other locations and traffic at one can impact on traffic at another time. Accordingly, the learning server 115 defines data clustering rules 320 to divide data into M clusters 330. The learning server 115 then distributes the rules 320 and traffic models 310 to data collectors 110. Correspondingly, each data collector 110 divides its data into M clusters 330. It is possible that some of data clusters are empty. In that case, the data collector will not train corresponding traffic model. In the FIG. 3A, data collector 1 has all data clusters, data collector 2 does not have cluster-M data and, and data collector N does not have cluster-1 data. As a result, data collector 1 will train all traffic models, data collector 2 does not train traffic model M, and data collector N does not train traffic model 1. After clustering its data, each data collector 110 starts the first round of the distributed training. Each data collector 110 trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. 
After certain iterations of training, each data collector sends the locally trained traffic models 340 to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models 350. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. For route planning application, the traffic speed is predicted. Finally, data collectors send their local traffic predictions 360 to the learning server to build a global traffic prediction model, i.e., time-dependent graph, that is then distributed to vehicles 120 for route planning[P-60]
Here, P-60 and Fig. 3A illustrate and explain that several different traffic models are generated, where one (first) model is a learning model for predicting, from values detected by the one or more sensors (e.g., a toll station), congestion-related information of a time point (e.g., rush hour) at which the one or more sensors perform detection.
Guo does not expressly teach generating a second model by using the predicted values as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from the values detected by the one or more sensors, the second time point being a time point after the first time point; acquiring the values detected by the one or more sensors; and using the second model to predict, from the acquired values, a future congestion level of any of the m areas.
However, P-60 teaches multiple traffic models where, for example, one (first) model is a learning model for predicting, from values detected by the one or more sensors (e.g., a toll station), congestion-related information of a time point (e.g., rush hour) at which the one or more sensors perform detection. "Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc." [P-60]
Hence, the next round of learning entails generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data (the previous aggregated traffic model) as input data and using the congestion level in the congestion-related information of a second time point (a different time point in rush hour) indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from the congestion-related information, the second time point being a time point after the first time point (a different time in rush hour after the time used in the preceding learning model), for different areas (a toll station or camera(s) at a different location). The regenerated (second) model is thus derived from updated and previously predicted congestion-related information and is used to predict a future congestion level of any of the m areas.
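The distinguishing feature of the claimed second model is the time shift between its input data (first time point) and its correct data (a second, later time point). A minimal sketch of how such training pairs could be assembled follows; the function name and the integer `horizon` parameter are hypothetical and are not drawn from Guo or Huang.

```python
def second_model_pairs(info: dict[int, float], level: dict[int, float],
                       horizon: int) -> list[tuple[float, float]]:
    """Hypothetical pair builder.

    Input data: congestion-related information at a first time point t.
    Correct data: congestion level at the second time point t + horizon.
    """
    if horizon <= 0:
        # horizon == 0 would reproduce the same-time pairing of the first model.
        raise ValueError("the second time point must come after the first time point")
    return [(info[t], level[t + horizon]) for t in sorted(info) if t + horizon in level]
```

Rejecting a non-positive horizon keeps the sketch faithful to the claim language requiring the second time point to be after the first.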
Therefore, although Guo does not expressly recite generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data as input data, Guo teaches a learning server that regenerates a traffic model (a second model) after previously generated model(s), using previous traffic-related data together with real-time traffic-related data. It would have been obvious to one of ordinary skill in the art that Guo’s teaching reads on the applicant’s second generated model, by substituting Guo’s teaching for the applicant’s teaching to yield the same results, in order to enable a more efficient and robust traffic prediction model.
However, Guo fails to teach displaying, by the processor, to users at the station the future congestion level of any of the m areas.
Guo further fails to teach acquiring, by the processor, present values detected by the one or more sensors of the station; using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection; using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level of any of the m areas; and displaying, by the processor, to users at the station the predicted congestion-related information.
Huang, however, teaches acquiring, by the processor, present values detected by the one or more sensors of the station; and using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection (Page 3, Paragraph 2; Page 5, Paragraphs 5-10).
Preferably, the monitoring terminal comprises a traffic monitoring terminal for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal for counting the pedestrian density at a certain time in a certain time, and a conflict monitoring terminal for counting the pedestrian conflict degree in a certain area and an abnormal monitoring terminal for identifying the pedestrian abnormal behaviour in a certain area.[Pg 3, P-2]
Further, the monitoring terminal 2 comprises a traffic monitoring terminal 201 for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal 202 for counting a certain area pedestrian density at a certain time, a conflict monitoring terminal 203 for counting pedestrian conflict degree in a certain area and an abnormal monitoring terminal 204 for identifying pedestrian abnormal behavior in a certain area, each terminal is composed of a sensor, an embedded processing chip, a processing program and a communication module.[Pg 5, P-5]
the sensor of the flow monitoring terminal 201 is a binocular video camera embedded processing chip and flow statistics method according to the video data obtained by the sensor, counting the pedestrian number in and out of the specified area and calculating the flow rate (unit: The cumulative flow rate (unit: 1 %) and the cumulative flow rate (unit: human), then the traffic data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-6]
the sensor of the density monitoring terminal 202 is a wide-angle video camera embedded processing chip and density statistical method according to the video data obtained by the sensor, counting the pedestrian distribution of the specified area and calculating the density (unit of each point after discrete area of the area: the density data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-7]
the sensor of the conflict monitoring terminal 203 is high video camera, embedded processing chip and conflict calculation algorithm according to the video data obtained by the sensor, counting the appointed area pedestrian speed direction and size change rate, relative direction pedestrian number, local density and other parameters, and according to the direction and size change rate of the pedestrian speed in the specified area, relative direction pedestrian number, local density parameter calculating conflict coefficient (dimensionless parameter), then the conflict coefficient is uploaded to the management control terminal 1 through the communication module[Pg 5, P-8]
the sensor of the abnormal monitoring terminal 204 is high video camera, embedded processing chip and the abnormal detection algorithm according to the video data obtained by the sensor, detecting whether there is abnormal behaviour in the specified area, such as a fight, terrorist attack, fast escape, crowding, then the type code of the abnormal behaviour is uploaded to the management control terminal 1 through the communication module. [Pg 5, P-9]
Further, the guide terminal 3 comprises an acoustic guide terminal 301 for conveying sound instruction, for area division, route guidance and information display of the optical guide terminal 302, for issuing the mobile phone message of the information issuing terminal 304. The area separation terminal 303 for the dynamic adjustment of the pedestrian and vehicular traffic area and the guide terminal 305 for order maintenance and manual assistance 305. [Pg 5, P-10]

Huang further teaches using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level (Page 3, Paragraphs 3, 4, 9; Page 3, Paragraph 9-Page 4, Paragraph 1; Page 6, Paragraphs 2, 3, 7, 8).
Preferably, the management control terminal comprises a region modeling module, a distribution estimation module connected with the region modeling module signal, a distribution prediction module connected with the distribution estimation module signal, and a risk evaluation module connected with the distribution prediction module and the distribution estimation module signal, a simulation and optimization module connected with the risk evaluation module signal and a guide control module connected with the simulation and optimization module signal. [Pg 3, P-3]
Preferably, the simulation and optimization module comprises a personnel evacuation model and personnel flow model, when the risk assessment result of the risk assessment module is low comfort level personnel flow module, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of simulating pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and flow inducing scheme [Pg 3, P-4]
the distribution prediction module according to the history data of the distribution estimation result, establishing model based on machine learning algorithm to predict the future period of pedestrian distribution parameter, and uploading data continuous learning model parameter according to the future monitoring terminal.[Pg 3, P-9]
Preferably, the risk evaluation module signal is connected with a conflict monitoring terminal and an abnormal monitoring terminal, and according to the distribution prediction module, the output result of the distribution estimation module and conflict monitoring terminal, the uploading data of the abnormal monitoring terminal evaluates the possibility of the danger of congestion and trampling on the comfort level region. [Pg 3, P-9-Pg 4, P-1]
Further, the management control terminal 1 comprises a region modeling module 101, and the region modeling module 101 signal connection of the distribution estimation module 102, and the distribution prediction module 102 signal connection distribution prediction module 103. a risk evaluation module 104 connected with the distribution prediction module 103 and the distribution estimation module 102, a simulation and optimization module 105 connected with the risk evaluation module 104 signal and a guide control module 106 connected with the simulation and optimization module 105 signal.[Pg 6, P-2]
Further, the simulation and optimization module 105 comprises a personnel evacuation model and personnel flow model, when the risk evaluation result of the risk evaluation module 104 is low comfort level personnel flow module, management control terminal 1 sends attention area low comfort level alarm, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of the simulated pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and a flow inducing scheme, after the management staff 4 of the auditing by the guide control module 106 sends the inducing information to the guide terminal 3;[Pg 6, P-3]
the distribution prediction module 103 according to the history data of the distribution estimation result, based on the machine learning algorithm establishing model prediction future period of pedestrian distribution parameter, and according to the future monitoring terminal 2 uploading data continuous learning model parameter.[Pg 6, P-7]
Further, the risk evaluation module 104 signal connected with the conflict monitoring terminal 203 and the abnormal monitoring terminal 204, and according to the distribution prediction module 103, distribution estimation module 102 the output result and conflict monitoring terminal 203. The uploading data of the abnormal monitoring terminal 204 evaluates the possibility of the comfort level of congestion and trampling on the region of interest and the occurrence of congestion and trample.[Pg 6, P-8]


Here, Huang teaches acquiring data from one or more sensors at a specific area of interest (a station) and using a module corresponding to the first model to predict, from the sensor data, congestion information relating to the one or more sensors that detect the congestion. Furthermore, Huang uses modules for capturing and evaluating congestion data, corresponding to the second prediction model (deep learning model), to analyze time-series data in more depth and to predict more accurately the parameters that affect future traffic congestion.
Huang further teaches route guidance and information display by the optical guide terminal 302 and the issuing of mobile phone messages by the information issuing terminal 304, thereby displaying congestion result data as well as predicted congestion data.
When Huang is taken in combination with Guo’s teaching of a learning server that regenerates a traffic model (a second model) after previously generated model(s), using previous traffic-related data together with real-time traffic-related data, one of ordinary skill in the art could then determine the future congestion level of any of the specific (m) areas of the various stations and thereafter display, by the processor, the predicted congestion-related information to users at the station.
It would have been obvious to one of ordinary skill in the art to combine Guo’s teaching with Huang’s teaching in order to obtain a more accurate prediction model for determining congestion/traffic activities and behaviors within a given area.
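The claimed run-time sequence of claim 7 (acquire present sensor values, apply the first model, apply the second model to the first model's output, and display the result at the station) can be sketched with the two trained models passed in as callables. The type aliases, the function name, and the text display format are hypothetical; any concrete models would be stand-ins.

```python
from typing import Callable

SensorValues = dict[str, float]    # sensor area -> present detected value
CongestionInfo = dict[str, float]  # area -> congestion-related information at detection time
FutureLevels = dict[str, int]      # area -> predicted future congestion level


def predict_and_display(
    present_values: SensorValues,
    first_model: Callable[[SensorValues], CongestionInfo],
    second_model: Callable[[CongestionInfo], FutureLevels],
) -> list[str]:
    """Hypothetical pipeline: run both models in sequence and return the lines
    that would be shown to users at the station."""
    present_info = first_model(present_values)  # congestion info at the detection time point
    future_levels = second_model(present_info)  # future level for any of the m areas
    return [f"Area {area}: predicted congestion level {level}"
            for area, level in sorted(future_levels.items())]
```

Because the second model receives only the first model's output, areas without sensors (C, say) can still receive a predicted level, which is the point of the n<m arrangement.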

Regarding claim 8, Guo teaches a learning prediction device comprising: a processor to execute a program; and a memory to store past congestion-area data indicating congestion-related information including a past congestion level of each of m areas (where m is an integer of two or more), past sensor data indicating values detected in the past by one or more sensors installed in n areas (where n is an integer of one or more and n<m) out of the m areas (Paragraphs 9, 11, 17, 60, 63).
It is one object of some embodiments to provide a distributed machine learning based vehicular traffic prediction method that can accurately predict short time, middle time and long time traffic for both city roads and highways. Additionally, it is another object of some embodiments to make optimal route planning for vehicles to minimize traffic congestion and travel time by using the accurate traffic prediction.[P-9]
A learning server coordinates distributed learning among a set of data collectors. A learning server first designs and distributes the traffic models such as neural networks to the set of data collectors for the first round of the distributed training Each data collector then trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. After certain iterations of training process, each data collector sends the trained traffic models to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models. Upon completion of the model aggregation, the learning server re-distributes the aggregated traffic models to data collectors for the second round of training process. This process of training and aggregation continues until the robust traffic models are built.[P-11]
According to some embodiments of the present invention, a distributed machine learning based traffic prediction method can be provided for predicting traffic of roads. In this case, the distributed machine learning based traffic prediction method may a computer-implemented distributed machine learning based traffic prediction method for predicting traffic of roads. The method may include distributing global multi-task traffic models by a learning server to learning agents to locally train the traffic models; uploading locally trained global multi-task traffic models to the learning server, wherein the locally trained global multi-task traffic models have been trained by the learning agents; updating the global multi-task traffic models by the learning server using the locally trained global multi-task traffic models uploaded from the learning agents; generating a time-dependent global traffic map by the learning server using the updated global multi-task traffic models; and distributing the time-dependent global traffic map to each of vehicles traveling on the roads.[P-17]
 FIG. 3A illustrates the traffic prediction architecture 300 using distributed multi-task learning techniques according to some embodiments of the current invention, where an infrastructure device acts as learning server 115 to coordinate a set of data collectors 110, which serve as learning agents. In the FIG. 3A, there are N data collectors, data collector 1 is a toll station, data collector 2 is a loop detector, and data collector N is a camera. The learning server 115 designs M traffic models 310 such that each traffic model is targeted for a specific task, e.g., a model for rush hour traffic and another traffic model for off hour traffic, a model for city traffic and another model for freeway traffic. Therefore, these traffic models are different but correlated because traffic at one location can impact on traffic at other locations and traffic at one can impact on traffic at another time. Accordingly, the learning server 115 defines data clustering rules 320 to divide data into M clusters 330. The learning server 115 then distributes the rules 320 and traffic models 310 to data collectors 110. Correspondingly, each data collector 110 divides its data into M clusters 330. It is possible that some of data clusters are empty. In that case, the data collector will not train corresponding traffic model. In the FIG. 3A, data collector 1 has all data clusters, data collector 2 does not have cluster-M data and, and data collector N does not have cluster-1 data. As a result, data collector 1 will train all traffic models, data collector 2 does not train traffic model M, and data collector N does not train traffic model 1. After clustering its data, each data collector 110 starts the first round of the distributed training. Each data collector 110 trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. 
After certain iterations of training, each data collector sends the locally trained traffic models 340 to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models 350. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. For route planning application, the traffic speed is predicted. Finally, data collectors send their local traffic predictions 360 to the learning server to build a global traffic prediction model, i.e., time-dependent graph, that is then distributed to vehicles 120 for route planning[P-60]
The learning server 115 may be referred to as a distributed machine learning based traffic predication server. The learning server 115 is configured to provide map information with respect to traffic predictions to vehicles 120 traveling on a road network. The learning server 115 may include one or more processors 121, a memory 140, a memory unit/storage 200 configured to store traffic prediction neural networks 132, traffic data cluster rules 134, global time-dependent map 135, trained multi-task traffic models 136, global map of road network 137, an input interface (or transceiver) 150 configured to communicate with the learning agent 110 via the communication network 112 and update model parameters of the traffic prediction neural networks 132. The trained multi-task traffic models 136 may be the traffic models 310 shown in FIG. 3A.[P-63]
Guo establishes M clusters to represent different traffic models attributed to different traffic areas; for example, Guo illustrates that one area may be freeway traffic, which would be labeled Cluster 1, and another area may be city traffic (Cluster 2), thereby satisfying the condition of M being two or more. Furthermore, Guo teaches data collectors (sensors), designated data collectors 1 through N, each of which may be a toll station, a loop detector, or a camera. Accordingly, depending on the area for which the traffic (cluster) model is generated, a desired number of sensors/detectors may be set such that the desired relationship N<M (data collectors to cluster models) is met.
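The distributed multi-task training described in Guo P-11 and P-60 (the learning server distributes M traffic models and clustering rules; each data collector clusters its own data, trains only the models whose clusters are non-empty, and returns them for aggregation) may be sketched as follows. This is a minimal illustration under stated assumptions, not Guo's implementation: each traffic model is reduced to a single hypothetical parameter (a predicted speed), local training is plain gradient descent on squared error, and aggregation is assumed to be a sample-count-weighted average, since Guo states only that the server "aggregates" the received models. The names `local_train`, `federated_round`, `rush_hour_rule`, and the sample format are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical sample format: (context label, observed speed).
Sample = Tuple[str, float]


def local_train(param: float, values: List[float], lr: float = 0.5, steps: int = 20) -> float:
    """Gradient descent on the mean squared error of a constant predictor.

    Stands in for a data collector training one traffic model on its own data.
    """
    for _ in range(steps):
        grad = sum(param - v for v in values) / len(values)
        param -= lr * grad
    return param


def federated_round(
    global_params: Dict[int, float],
    collectors: List[List[Sample]],
    rule: Callable[[Sample], int],
) -> Dict[int, float]:
    """One round of local training at every collector, then server-side aggregation."""
    weighted_sums = {k: 0.0 for k in global_params}
    counts = {k: 0 for k in global_params}
    for data in collectors:
        # Each collector divides its own data into clusters using the server's rule.
        clusters: Dict[int, List[float]] = {}
        for sample in data:
            clusters.setdefault(rule(sample), []).append(sample[1])
        # Only models with a non-empty cluster are trained; raw data never leaves the collector.
        for k, values in clusters.items():
            trained = local_train(global_params[k], values)
            weighted_sums[k] += trained * len(values)
            counts[k] += len(values)
    # Assumed aggregation: sample-weighted average; an untrained model keeps its previous value.
    return {
        k: weighted_sums[k] / counts[k] if counts[k] else global_params[k]
        for k in global_params
    }


def rush_hour_rule(sample: Sample) -> int:
    """Hypothetical clustering rule: cluster 0 = rush hour, cluster 1 = off hour."""
    return 0 if sample[0] == "rush" else 1


if __name__ == "__main__":
    collector_1 = [("rush", 20.0), ("rush", 22.0), ("off", 60.0)]  # has both clusters
    collector_2 = [("rush", 24.0)]                                   # has no off-hour data
    params = federated_round({0: 0.0, 1: 0.0}, [collector_1, collector_2], rush_hour_rule)
    print({k: round(v, 3) for k, v in params.items()})
```

Repeating `federated_round` with the returned parameters corresponds to Guo's "second round of training," continued until the models are considered robust.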
Guo then teaches the program which, when executed by the processor, performs processes of generating a first model by using the congestion-related information indicated by the past congestion-area data as input data and using the values indicated by the past sensor data as correct data (Paragraph 48)
To reap all the aforementioned benefits, the traffic prediction must process the real-time and historical traffic data and observations collected by data collectors and mobile devices. For example, the inductive loop can measure the travel speed by reading the inductance changes over time and such data can be used for the traffic speed prediction. In addition, the wide use of mobile devices (e.g., on-board global position systems and phones) enables the mobility data to be crowdsourced from the general public, further facilitating the unprecedented traffic data collection. Such emerging big data can substantially augment the data availability in terms of the coverage and fidelity and significantly boost the data-driven traffic prediction. The prior art on the traffic prediction can be mainly grouped into two categories. The first category focus on using parametric approaches, such as autoregressive integrated moving average (ARIMA) model and Kalman filtering models. When dealing with the traffic only presenting regular variations, e.g., recurrent traffic congestion occurred in morning and evening rush hour, the parametric approaches can achieve promising prediction results. However, due to the stochastic and nonlinear nature of the road traffic, the traffic predictions of using the parametric approaches can deviate from the actual values especially in the abrupt traffic. Hence, instead of fitting the traffic data into a mathematical model as done by the parametric approach, an alternative way is using the nonparametric approaches where the machine learning (ML) based method is used. For example, a stacked autoencoder model can be used to learn the generic traffic flow features for the predictions. The long short-term memory (LSTM) recurrent neural network (RNN) can be used to predict the traffic flow, speed and occupancy, based on the data collected by the data collector and its upstream and downstream data collectors. 
Along with the use of RNN, the convolution neural network (CNN) can also be utilized to capture the latent traffic evolution patterns within the underlying road network.[P-48]

Here, we see the integration of real-time traffic data with past sensed traffic data, where congestion-related traffic information is aggregated to enable the prediction of future traffic patterns.
Guo teaches the first model being a learning model for predicting, from the congestion-related information, values to be detected by the one or more sensors at a time point at which the congestion-related information is acquired; using the first model to predict, from the congestion-related information at a first time point indicated by the past congestion-area data, values to be detected by the one or more sensors at the first time point (Paragraph 60; Figure 3A)
 FIG. 3A illustrates the traffic prediction architecture 300 using distributed multi-task learning techniques according to some embodiments of the current invention, where an infrastructure device acts as learning server 115 to coordinate a set of data collectors 110, which serve as learning agents. In the FIG. 3A, there are N data collectors, data collector 1 is a toll station, data collector 2 is a loop detector, and data collector N is a camera. The learning server 115 designs M traffic models 310 such that each traffic model is targeted for a specific task, e.g., a model for rush hour traffic and another traffic model for off hour traffic, a model for city traffic and another model for freeway traffic. Therefore, these traffic models are different but correlated because traffic at one location can impact on traffic at other locations and traffic at one can impact on traffic at another time. Accordingly, the learning server 115 defines data clustering rules 320 to divide data into M clusters 330. The learning server 115 then distributes the rules 320 and traffic models 310 to data collectors 110. Correspondingly, each data collector 110 divides its data into M clusters 330. It is possible that some of data clusters are empty. In that case, the data collector will not train corresponding traffic model. In the FIG. 3A, data collector 1 has all data clusters, data collector 2 does not have cluster-M data and, and data collector N does not have cluster-1 data. As a result, data collector 1 will train all traffic models, data collector 2 does not train traffic model M, and data collector N does not train traffic model 1. After clustering its data, each data collector 110 starts the first round of the distributed training. Each data collector 110 trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. 
After certain iterations of training, each data collector sends the locally trained traffic models 340 to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models 350. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. For route planning application, the traffic speed is predicted. Finally, data collectors send their local traffic predictions 360 to the learning server to build a global traffic prediction model, i.e., time-dependent graph, that is then distributed to vehicles 120 for route planning[P-60]
Here, P-60 and Fig. 3A illustrate and explain that several different traffic models are generated, where one (first) model is a learning model for predicting, from values detected by the one or more sensors (toll station), congestion-related information of a time point (rush hour) at which the one or more sensors perform detection.
Guo does not teach verbatim generating a second model by using the predicted values as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from the values detected by the one or more sensors, the second time point being a time point after the first time point; acquiring the values detected by the one or more sensors; and using the second model to predict, from the acquired values, a future congestion level of any of the m areas.
However, P-60 teaches multiple traffic models where, for example, one (first) model is a learning model for predicting, from values detected by the one or more sensors (toll station), congestion-related information of a time point (rush hour) at which the one or more sensors perform detection. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc.[P-60]
Hence, the next round of learning entails generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data (the previously aggregated traffic model) as input data and using the congestion level in the congestion-related information of a second time point (a different time point in rush hour) indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from the congestion-related information, the second time point being a time point after the first time point (a different time in rush hour after the previous time in the preceding learning model), and different areas (toll station or camera(s) at different locations). The regenerated (second) model is thus derived from updated and previously predicted congestion-related information and is used to predict a future congestion level of any of the m areas.
Therefore, although Guo does not recite verbatim generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data as input data, Guo teaches the learning server regenerating a traffic model (a second model) after previously generated model(s) using previous traffic-related data together with real-time traffic-related data. It would therefore have been obvious to one of ordinary skill in the art that Guo's teaching reads on the applicant's second generated model, since substituting Guo's teaching for the applicant's would yield the same results, in order to enable a more efficient and robust traffic prediction model.
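For concreteness, the two-stage learning and prediction flow recited in the claim 8 limitations quoted above (first model: congestion-related information to sensor values at the same time point; second model: predicted sensor values at a first time point to the congestion level at a later second time point; prediction: present sensor values to a future congestion level) may be sketched as follows. Neither the claim nor the references specify a model type; linear least-squares regression with a very small ridge term is assumed here purely for illustration, and all function names and data are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_linear(inputs: Sequence[Sequence[float]], targets: Sequence[Sequence[float]],
               ridge: float = 1e-9) -> Matrix:
    """Fit one linear model (weights + bias) per output via ridge-regularized normal equations."""
    feats = [list(x) + [1.0] for x in inputs]
    d = len(feats[0])
    gram = [[sum(f[i] * f[j] for f in feats) + (ridge if i == j else 0.0) for j in range(d)]
            for i in range(d)]
    weights = []
    for k in range(len(targets[0])):
        rhs = [sum(f[i] * y[k] for f, y in zip(feats, targets)) for i in range(d)]
        weights.append(_solve(gram, rhs))
    return weights


def predict(weights: Matrix, x: Sequence[float]) -> List[float]:
    feat = list(x) + [1.0]
    return [sum(w * f for w, f in zip(row, feat)) for row in weights]


def train_two_stage(past_congestion: Sequence[Sequence[float]],
                    past_sensors: Sequence[Sequence[float]]) -> Tuple[Matrix, Matrix]:
    """past_congestion[t]: congestion levels of the m areas at time t;
    past_sensors[t]: values of the n sensors at the same time t."""
    # First model: congestion-related information -> sensor values at the same time point.
    first = fit_linear(past_congestion, past_sensors)
    # Predicted sensor values at each first time point t ...
    predicted = [predict(first, c) for c in past_congestion[:-1]]
    # ... paired with the congestion levels at the later second time point t + 1.
    second = fit_linear(predicted, past_congestion[1:])
    return first, second


if __name__ == "__main__":
    # Hypothetical data: m = 2 areas, n = 1 sensor installed in area 0.
    congestion = [[float(t), float(t * t)] for t in range(5)]
    sensors = [[float(t)] for t in range(5)]
    first, second = train_two_stage(congestion, sensors)
    present_values = [2.0]  # present value acquired from the sensor
    print(predict(second, present_values))  # future congestion levels of both areas
```

The second model is trained on the first model's outputs rather than on measured sensor values, which is what distinguishes the recited structure from a single model trained directly on sensor history.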
Guo, however, fails to teach displaying, by the processor, to users at the station the future congestion level of any of the m areas.
Guo also fails to teach acquiring, by the processor, present values detected by the one or more sensors of the station; using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection; using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level of any of the m areas; and displaying, by the processor, to users at the station the predicted congestion-related information.
Huang, on the other hand, teaches acquiring, by the processor, present values detected by the one or more sensors of the station; using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection (Page 3, Paragraph 2; Page 5, Paragraphs 5-10)
Preferably, the monitoring terminal comprises a traffic monitoring terminal for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal for counting the pedestrian density at a certain time in a certain time, and a conflict monitoring terminal for counting the pedestrian conflict degree in a certain area and an abnormal monitoring terminal for identifying the pedestrian abnormal behaviour in a certain area.[Pg 3, P-2]
Further, the monitoring terminal 2 comprises a traffic monitoring terminal 201 for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal 202 for counting a certain area pedestrian density at a certain time, a conflict monitoring terminal 203 for counting pedestrian conflict degree in a certain area and an abnormal monitoring terminal 204 for identifying pedestrian abnormal behavior in a certain area, each terminal is composed of a sensor, an embedded processing chip, a processing program and a communication module.[Pg 5, P-5]
the sensor of the flow monitoring terminal 201 is a binocular video camera embedded processing chip and flow statistics method according to the video data obtained by the sensor, counting the pedestrian number in and out of the specified area and calculating the flow rate (unit: The cumulative flow rate (unit: 1 %) and the cumulative flow rate (unit: human), then the traffic data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-6]
the sensor of the density monitoring terminal 202 is a wide-angle video camera embedded processing chip and density statistical method according to the video data obtained by the sensor, counting the pedestrian distribution of the specified area and calculating the density (unit of each point after discrete area of the area: the density data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-7]
the sensor of the conflict monitoring terminal 203 is high video camera, embedded processing chip and conflict calculation algorithm according to the video data obtained by the sensor, counting the appointed area pedestrian speed direction and size change rate, relative direction pedestrian number, local density and other parameters, and according to the direction and size change rate of the pedestrian speed in the specified area, relative direction pedestrian number, local density parameter calculating conflict coefficient (dimensionless parameter), then the conflict coefficient is uploaded to the management control terminal 1 through the communication module[Pg 5, P-8]
the sensor of the abnormal monitoring terminal 204 is high video camera, embedded processing chip and the abnormal detection algorithm according to the video data obtained by the sensor, detecting whether there is abnormal behaviour in the specified area, such as a fight, terrorist attack, fast escape, crowding, then the type code of the abnormal behaviour is uploaded to the management control terminal 1 through the communication module. [Pg 5, P-9]
Further, the guide terminal 3 comprises an acoustic guide terminal 301 for conveying sound instruction, for area division, route guidance and information display of the optical guide terminal 302, for issuing the mobile phone message of the information issuing terminal 304. The area separation terminal 303 for the dynamic adjustment of the pedestrian and vehicular traffic area and the guide terminal 305 for order maintenance and manual assistance 305. [Pg 5, P-10]

Huang further teaches using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level (Page 3, Paragraphs 3, 4, 9; Page 3, Paragraph 9-Page 4, Paragraph 1; Page 6, Paragraphs 2, 3, 7, 8)
Preferably, the management control terminal comprises a region modeling module, a distribution estimation module connected with the region modeling module signal, a distribution prediction module connected with the distribution estimation module signal, and a risk evaluation module connected with the distribution prediction module and the distribution estimation module signal, a simulation and optimization module connected with the risk evaluation module signal and a guide control module connected with the simulation and optimization module signal. [Pg 3, P-3]
Preferably, the simulation and optimization module comprises a personnel evacuation model and personnel flow model, when the risk assessment result of the risk assessment module is low comfort level personnel flow module, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of simulating pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and flow inducing scheme [Pg 3, P-4]
the distribution prediction module according to the history data of the distribution estimation result, establishing model based on machine learning algorithm to predict the future period of pedestrian distribution parameter, and uploading data continuous learning model parameter according to the future monitoring terminal.[Pg 3, P-9]
Preferably, the risk evaluation module signal is connected with a conflict monitoring terminal and an abnormal monitoring terminal, and according to the distribution prediction module, the output result of the distribution estimation module and conflict monitoring terminal, the uploading data of the abnormal monitoring terminal evaluates the possibility of the danger of congestion and trampling on the comfort level region. [Pg 3, P-9-Pg 4, P-1]
Further, the management control terminal 1 comprises a region modeling module 101, and the region modeling module 101 signal connection of the distribution estimation module 102, and the distribution prediction module 102 signal connection distribution prediction module 103. a risk evaluation module 104 connected with the distribution prediction module 103 and the distribution estimation module 102, a simulation and optimization module 105 connected with the risk evaluation module 104 signal and a guide control module 106 connected with the simulation and optimization module 105 signal.[Pg 6, P-2]
Further, the simulation and optimization module 105 comprises a personnel evacuation model and personnel flow model, when the risk evaluation result of the risk evaluation module 104 is low comfort level personnel flow module, management control terminal 1 sends attention area low comfort level alarm, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of the simulated pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and a flow inducing scheme, after the management staff 4 of the auditing by the guide control module 106 sends the inducing information to the guide terminal 3;[Pg 6, P-3]
the distribution prediction module 103 according to the history data of the distribution estimation result, based on the machine learning algorithm establishing model prediction future period of pedestrian distribution parameter, and according to the future monitoring terminal 2 uploading data continuous learning model parameter.[Pg 6, P-7]
Further, the risk evaluation module 104 signal connected with the conflict monitoring terminal 203 and the abnormal monitoring terminal 204, and according to the distribution prediction module 103, distribution estimation module 102 the output result and conflict monitoring terminal 203. The uploading data of the abnormal monitoring terminal 204 evaluates the possibility of the comfort level of congestion and trampling on the region of interest and the occurrence of congestion and trample.[Pg 6, P-8]
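Huang's management control chain quoted above (distribution estimation from monitoring-terminal uploads, a distribution prediction model built from the history of estimation results and continuously re-learned as new data is uploaded, and a risk evaluation that raises a low-comfort-level alarm) may be sketched as below. Huang does not name the machine learning algorithm or any thresholds; a one-step linear autoregressive fit that is refit on every upload and a density threshold of 2.0 pedestrians per square metre are assumptions made only for illustration, and all class and function names are hypothetical.

```python
from typing import List


def estimate_density(pedestrian_count: int, area_m2: float) -> float:
    """Distribution estimation: pedestrians per square metre in a monitored area."""
    return pedestrian_count / area_m2


class DistributionPredictor:
    """Hypothetical stand-in for Huang's distribution prediction module.

    Fits next = slope * current + intercept on the history of estimates and
    refits every time a monitoring terminal uploads a new estimate.
    """

    def __init__(self) -> None:
        self.history: List[float] = []
        self.slope, self.intercept = 1.0, 0.0  # persistence forecast until enough data

    def upload(self, density: float) -> None:
        self.history.append(density)
        if len(self.history) >= 3:
            self._refit()

    def _refit(self) -> None:
        xs, ys = self.history[:-1], self.history[1:]
        mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
        var_x = sum((x - mean_x) ** 2 for x in xs)
        if var_x > 0:  # keep the previous fit if the history is flat
            cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
            self.slope = cov / var_x
            self.intercept = mean_y - self.slope * mean_x

    def predict_next(self) -> float:
        return self.slope * self.history[-1] + self.intercept


def low_comfort_alarm(predicted_density: float, threshold: float = 2.0) -> bool:
    """Risk evaluation: alarm when the predicted density exceeds a hypothetical threshold."""
    return predicted_density > threshold


if __name__ == "__main__":
    predictor = DistributionPredictor()
    for count in (10, 15, 20, 25):  # hypothetical uploads for a 10 m^2 area
        predictor.upload(estimate_density(count, 10.0))
    forecast = predictor.predict_next()
    print(forecast, low_comfort_alarm(forecast))  # prints 3.0 True
```

In this sketch the predictor's output is the quantity that would be passed to the guide terminal for display, mirroring how the examiner maps Huang's prediction and guidance modules onto the claimed second model and display step.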


Here, we see Huang teaching data from one or more sensors at a specific area of interest (station), and a module, corresponding to the first model, that predicts from the sensor data the congestion-related information relating to the one or more sensors that detect the congestion. Furthermore, Huang uses a module for capturing and evaluating congestion data, corresponding to the second prediction model (deep learning model), that analyzes time-series data in more depth to predict more accurately the parameters affecting future traffic congestion.
Huang then teaches route guidance and information display by the optical guide terminal 302 and the issuing of mobile phone messages by the information issuing terminal 304, thereby displaying congestion result data as well as predicted congestion data.
When taken in combination with Guo's teaching of the learning server regenerating a traffic model (a second model) after previously generated model(s) using previous traffic-related data together with real-time traffic-related data, one of ordinary skill in the art may then be able to determine the future congestion level of any of the specific (m) areas of the various stations and thereafter to display, by the processor, the predicted congestion-related information to users at the station.
It would have been obvious to one of ordinary skill in the art to combine Guo’s teaching with Huang’s teaching in order to have a more accurate prediction model to determine congestion/traffic activities and behaviors within a given area.

Regarding claim 9, Guo teaches a non-transitory computer-readable medium that stores therein a program that causes a computer to execute processes of: storing past congestion-area data indicating congestion-related information including a past congestion level of each of m areas (where m is an integer of two or more); storing past sensor data indicating values detected in the past by one or more sensors installed in n areas (where n is an integer of one or more and n<m) out of the m areas (Paragraphs 9, 11, 17, 60, 63)
It is one object of some embodiments to provide a distributed machine learning based vehicular traffic prediction method that can accurately predict short time, middle time and long time traffic for both city roads and highways. Additionally, it is another object of some embodiments to make optimal route planning for vehicles to minimize traffic congestion and travel time by using the accurate traffic prediction.[P-9]
A learning server coordinates distributed learning among a set of data collectors. A learning server first designs and distributes the traffic models such as neural networks to the set of data collectors for the first round of the distributed training Each data collector then trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. After certain iterations of training process, each data collector sends the trained traffic models to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models. Upon completion of the model aggregation, the learning server re-distributes the aggregated traffic models to data collectors for the second round of training process. This process of training and aggregation continues until the robust traffic models are built.[P-11]
According to some embodiments of the present invention, a distributed machine learning based traffic prediction method can be provided for predicting traffic of roads. In this case, the distributed machine learning based traffic prediction method may a computer-implemented distributed machine learning based traffic prediction method for predicting traffic of roads. The method may include distributing global multi-task traffic models by a learning server to learning agents to locally train the traffic models; uploading locally trained global multi-task traffic models to the learning server, wherein the locally trained global multi-task traffic models have been trained by the learning agents; updating the global multi-task traffic models by the learning server using the locally trained global multi-task traffic models uploaded from the learning agents; generating a time-dependent global traffic map by the learning server using the updated global multi-task traffic models; and distributing the time-dependent global traffic map to each of vehicles traveling on the roads.[P-17]
 FIG. 3A illustrates the traffic prediction architecture 300 using distributed multi-task learning techniques according to some embodiments of the current invention, where an infrastructure device acts as learning server 115 to coordinate a set of data collectors 110, which serve as learning agents. In the FIG. 3A, there are N data collectors, data collector 1 is a toll station, data collector 2 is a loop detector, and data collector N is a camera. The learning server 115 designs M traffic models 310 such that each traffic model is targeted for a specific task, e.g., a model for rush hour traffic and another traffic model for off hour traffic, a model for city traffic and another model for freeway traffic. Therefore, these traffic models are different but correlated because traffic at one location can impact on traffic at other locations and traffic at one can impact on traffic at another time. Accordingly, the learning server 115 defines data clustering rules 320 to divide data into M clusters 330. The learning server 115 then distributes the rules 320 and traffic models 310 to data collectors 110. Correspondingly, each data collector 110 divides its data into M clusters 330. It is possible that some of data clusters are empty. In that case, the data collector will not train corresponding traffic model. In the FIG. 3A, data collector 1 has all data clusters, data collector 2 does not have cluster-M data and, and data collector N does not have cluster-1 data. As a result, data collector 1 will train all traffic models, data collector 2 does not train traffic model M, and data collector N does not train traffic model 1. After clustering its data, each data collector 110 starts the first round of the distributed training. Each data collector 110 trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. 
After certain iterations of training, each data collector sends the locally trained traffic models 340 to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models 350. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. For route planning application, the traffic speed is predicted. Finally, data collectors send their local traffic predictions 360 to the learning server to build a global traffic prediction model, i.e., time-dependent graph, that is then distributed to vehicles 120 for route planning[P-60]
The learning server 115 may be referred to as a distributed machine learning based traffic predication server. The learning server 115 is configured to provide map information with respect to traffic predictions to vehicles 120 traveling on a road network. The learning server 115 may include one or more processors 121, a memory 140, a memory unit/storage 200 configured to store traffic prediction neural networks 132, traffic data cluster rules 134, global time-dependent map 135, trained multi-task traffic models 136, global map of road network 137, an input interface (or transceiver) 150 configured to communicate with the learning agent 110 via the communication network 112 and update model parameters of the traffic prediction neural networks 132. The trained multi-task traffic models 136 may be the traffic models 310 shown in FIG. 3A.[P-63]
Guo establishes M clusters to represent different traffic models attributed to different traffic areas. For example, one area may be freeway traffic, which would be labeled Cluster 1, and another area may be city traffic (Cluster 2), thereby satisfying the condition of M being two or more. Furthermore, Guo teaches N data collectors/sensors, which may be a toll station, a loop detector, or a camera. Accordingly, depending on the area for which the traffic/cluster model is generated, a desired number of sensors/detectors may be set such that the desired relationship N<M (i.e., data collectors to cluster models) is met.
Guo then teaches generating a first model by using the values indicated by the past sensor data as input data and using the congestion-related information indicated by the past congestion-area data as correct data (Paragraph 48).
To reap all the aforementioned benefits, the traffic prediction must process the real-time and historical traffic data and observations collected by data collectors and mobile devices. For example, the inductive loop can measure the travel speed by reading the inductance changes over time and such data can be used for the traffic speed prediction. In addition, the wide use of mobile devices (e.g., on-board global position systems and phones) enables the mobility data to be crowdsourced from the general public, further facilitating the unprecedented traffic data collection. Such emerging big data can substantially augment the data availability in terms of the coverage and fidelity and significantly boost the data-driven traffic prediction. The prior art on the traffic prediction can be mainly grouped into two categories. The first category focus on using parametric approaches, such as autoregressive integrated moving average (ARIMA) model and Kalman filtering models. When dealing with the traffic only presenting regular variations, e.g., recurrent traffic congestion occurred in morning and evening rush hour, the parametric approaches can achieve promising prediction results. However, due to the stochastic and nonlinear nature of the road traffic, the traffic predictions of using the parametric approaches can deviate from the actual values especially in the abrupt traffic. Hence, instead of fitting the traffic data into a mathematical model as done by the parametric approach, an alternative way is using the nonparametric approaches where the machine learning (ML) based method is used. For example, a stacked autoencoder model can be used to learn the generic traffic flow features for the predictions. The long short-term memory (LSTM) recurrent neural network (RNN) can be used to predict the traffic flow, speed and occupancy, based on the data collected by the data collector and its upstream and downstream data collectors. 
Along with the use of RNN, the convolution neural network (CNN) can also be utilized to capture the latent traffic evolution patterns within the underlying road network.[P-48]

Here, Guo integrates real-time traffic data with past sensed traffic data, and this congestion-related traffic information is aggregated to enable the prediction of future traffic patterns.
Guo then teaches the first model being a learning model for predicting, from values detected by the one or more sensors, congestion-related information of a time point at which the one or more sensors perform detection (Paragraph 60; Figure 3A).
 FIG. 3A illustrates the traffic prediction architecture 300 using distributed multi-task learning techniques according to some embodiments of the current invention, where an infrastructure device acts as learning server 115 to coordinate a set of data collectors 110, which serve as learning agents. In the FIG. 3A, there are N data collectors, data collector 1 is a toll station, data collector 2 is a loop detector, and data collector N is a camera. The learning server 115 designs M traffic models 310 such that each traffic model is targeted for a specific task, e.g., a model for rush hour traffic and another traffic model for off hour traffic, a model for city traffic and another model for freeway traffic. Therefore, these traffic models are different but correlated because traffic at one location can impact on traffic at other locations and traffic at one can impact on traffic at another time. Accordingly, the learning server 115 defines data clustering rules 320 to divide data into M clusters 330. The learning server 115 then distributes the rules 320 and traffic models 310 to data collectors 110. Correspondingly, each data collector 110 divides its data into M clusters 330. It is possible that some of data clusters are empty. In that case, the data collector will not train corresponding traffic model. In the FIG. 3A, data collector 1 has all data clusters, data collector 2 does not have cluster-M data and, and data collector N does not have cluster-1 data. As a result, data collector 1 will train all traffic models, data collector 2 does not train traffic model M, and data collector N does not train traffic model 1. After clustering its data, each data collector 110 starts the first round of the distributed training. Each data collector 110 trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. 
After certain iterations of training, each data collector sends the locally trained traffic models 340 to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models 350. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. For route planning application, the traffic speed is predicted. Finally, data collectors send their local traffic predictions 360 to the learning server to build a global traffic prediction model, i.e., time-dependent graph, that is then distributed to vehicles 120 for route planning[P-60]
Here, P-60 and Fig. 3A illustrate and explain that several different traffic models are generated, where one model (the first model) is a learning model for predicting, from values detected by the one or more sensors (toll station), congestion-related information of a time point (rush hour) at which the one or more sensors perform detection.
Guo does not teach verbatim generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from the congestion-related information, the second time point being a time point after the first time point.
However, P-60 teaches multiple traffic models where, for example, one model (the first model) is a learning model for predicting, from values detected by the one or more sensors (toll station), congestion-related information of a time point (rush hour) at which the one or more sensors perform detection. P-60 further states: "Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc." [P-60]
Hence, the next round of learning entails generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data (the previously aggregated traffic model) as input data and using the congestion level in the congestion-related information of a second time point (a different time point in rush hour) indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from the congestion-related information, and the second time point being a time point after the first time point (a later time in rush hour than the time used in the preceding learning model) and in different areas (a toll station or camera(s) at different locations).
Therefore, although Guo does not verbatim recite generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data as input data, Guo teaches the learning server regenerating a traffic model (a second model) after previously generated model(s), using previous traffic-related data together with real-time traffic-related data. It would thus have been obvious to one of ordinary skill in the art that Guo's teaching reads on the applicant's second generated model, since substituting Guo's teaching for the applicant's teaching yields the same results, enabling a more efficient and robust traffic prediction model.
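To make the data pairing recited for the first and second models concrete, the sketch below builds both sets of training pairs from hypothetical time-indexed logs. It is an illustrative assumption, not the applicant's or Guo's code. Time points are assumed to be integer steps, and each area's congestion-related information is assumed to be a dict that includes a `"level"` key. The second time point is assumed to be the first time point plus a fixed, hypothetical `horizon`.

```python
from typing import Dict, List, Tuple

AreaInfo = Dict[str, float]                # congestion-related info of one area; must include "level"
SensorLog = Dict[int, List[float]]         # t -> values detected by the n sensors at t
CongestionLog = Dict[int, List[AreaInfo]]  # t -> congestion-related info of each of the m areas


def first_model_pairs(sensors: SensorLog,
                      congestion: CongestionLog) -> List[Tuple[List[float], List[AreaInfo]]]:
    """Input: past sensor values at t; correct data: congestion-related info at the same t."""
    return [(sensors[t], congestion[t]) for t in sorted(sensors) if t in congestion]


def second_model_pairs(congestion: CongestionLog,
                       horizon: int = 1) -> List[Tuple[List[AreaInfo], List[float]]]:
    """Input: congestion-related info at t1; correct data: congestion levels at t2 = t1 + horizon."""
    pairs = []
    for t1 in sorted(congestion):
        t2 = t1 + horizon  # the second time point is after the first time point
        if t2 in congestion:
            pairs.append((congestion[t1], [area["level"] for area in congestion[t2]]))
    return pairs
```

With three time steps of logs, `first_model_pairs` yields three same-time pairs, while `second_model_pairs` with `horizon=1` yields two pairs, because the last time point has no later target.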
However, Guo fails to teach acquiring, by the computer, present values detected by the one or more sensors of the station; using, by the computer, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection; using, by the computer, the second model to predict, from the predicted congestion-related information, the future congestion level of any of the m areas; and displaying, by the computer, to users at the station the predicted congestion-related information.
Huang, on the other hand, teaches acquiring, by the computer, present values detected by the one or more sensors of the station; and using, by the computer, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection (Page 3, Paragraph 2; Page 5, Paragraphs 5-10).
Preferably, the monitoring terminal comprises a traffic monitoring terminal for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal for counting the pedestrian density at a certain time in a certain time, and a conflict monitoring terminal for counting the pedestrian conflict degree in a certain area and an abnormal monitoring terminal for identifying the pedestrian abnormal behaviour in a certain area.[Pg 3, P-2]
Further, the monitoring terminal 2 comprises a traffic monitoring terminal 201 for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal 202 for counting a certain area pedestrian density at a certain time, a conflict monitoring terminal 203 for counting pedestrian conflict degree in a certain area and an abnormal monitoring terminal 204 for identifying pedestrian abnormal behavior in a certain area, each terminal is composed of a sensor, an embedded processing chip, a processing program and a communication module.[Pg 5, P-5]
the sensor of the flow monitoring terminal 201 is a binocular video camera embedded processing chip and flow statistics method according to the video data obtained by the sensor, counting the pedestrian number in and out of the specified area and calculating the flow rate (unit: The cumulative flow rate (unit: 1 %) and the cumulative flow rate (unit: human), then the traffic data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-6]
the sensor of the density monitoring terminal 202 is a wide-angle video camera embedded processing chip and density statistical method according to the video data obtained by the sensor, counting the pedestrian distribution of the specified area and calculating the density (unit of each point after discrete area of the area: the density data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-7]
the sensor of the conflict monitoring terminal 203 is high video camera, embedded processing chip and conflict calculation algorithm according to the video data obtained by the sensor, counting the appointed area pedestrian speed direction and size change rate, relative direction pedestrian number, local density and other parameters, and according to the direction and size change rate of the pedestrian speed in the specified area, relative direction pedestrian number, local density parameter calculating conflict coefficient (dimensionless parameter), then the conflict coefficient is uploaded to the management control terminal 1 through the communication module[Pg 5, P-8]
the sensor of the abnormal monitoring terminal 204 is high video camera, embedded processing chip and the abnormal detection algorithm according to the video data obtained by the sensor, detecting whether there is abnormal behaviour in the specified area, such as a fight, terrorist attack, fast escape, crowding, then the type code of the abnormal behaviour is uploaded to the management control terminal 1 through the communication module. [Pg 5, P-9]
Further, the guide terminal 3 comprises an acoustic guide terminal 301 for conveying sound instruction, for area division, route guidance and information display of the optical guide terminal 302, for issuing the mobile phone message of the information issuing terminal 304. The area separation terminal 303 for the dynamic adjustment of the pedestrian and vehicular traffic area and the guide terminal 305 for order maintenance and manual assistance 305. [Pg 5, P-10]

Huang further teaches using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level (Page 3, Paragraphs 3, 4, 9; Page 3, Paragraph 9 to Page 4, Paragraph 1; Page 6, Paragraphs 2, 3, 7, 8).
Preferably, the management control terminal comprises a region modeling module, a distribution estimation module connected with the region modeling module signal, a distribution prediction module connected with the distribution estimation module signal, and a risk evaluation module connected with the distribution prediction module and the distribution estimation module signal, a simulation and optimization module connected with the risk evaluation module signal and a guide control module connected with the simulation and optimization module signal. [Pg 3, P-3]
Preferably, the simulation and optimization module comprises a personnel evacuation model and personnel flow model, when the risk assessment result of the risk assessment module is low comfort level personnel flow module, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of simulating pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and flow inducing scheme [Pg 3, P-4]
the distribution prediction module according to the history data of the distribution estimation result, establishing model based on machine learning algorithm to predict the future period of pedestrian distribution parameter, and uploading data continuous learning model parameter according to the future monitoring terminal.[Pg 3, P-9]
Preferably, the risk evaluation module signal is connected with a conflict monitoring terminal and an abnormal monitoring terminal, and according to the distribution prediction module, the output result of the distribution estimation module and conflict monitoring terminal, the uploading data of the abnormal monitoring terminal evaluates the possibility of the danger of congestion and trampling on the comfort level region. [Pg 3, P-9-Pg 4, P-1]
Further, the management control terminal 1 comprises a region modeling module 101, and the region modeling module 101 signal connection of the distribution estimation module 102, and the distribution prediction module 102 signal connection distribution prediction module 103. a risk evaluation module 104 connected with the distribution prediction module 103 and the distribution estimation module 102, a simulation and optimization module 105 connected with the risk evaluation module 104 signal and a guide control module 106 connected with the simulation and optimization module 105 signal.[Pg 6, P-2]
Further, the simulation and optimization module 105 comprises a personnel evacuation model and personnel flow model, when the risk evaluation result of the risk evaluation module 104 is low comfort level personnel flow module, management control terminal 1 sends attention area low comfort level alarm, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of the simulated pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and a flow inducing scheme, after the management staff 4 of the auditing by the guide control module 106 sends the inducing information to the guide terminal 3;[Pg 6, P-3]
the distribution prediction module 103 according to the history data of the distribution estimation result, based on the machine learning algorithm establishing model prediction future period of pedestrian distribution parameter, and according to the future monitoring terminal 2 uploading data continuous learning model parameter.[Pg 6, P-7]
Further, the risk evaluation module 104 signal connected with the conflict monitoring terminal 203 and the abnormal monitoring terminal 204, and according to the distribution prediction module 103, distribution estimation module 102 the output result and conflict monitoring terminal 203. The uploading data of the abnormal monitoring terminal 204 evaluates the possibility of the comfort level of congestion and trampling on the region of interest and the occurrence of congestion and trample.[Pg 6, P-8]
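Huang describes the distribution prediction module only as a machine-learning model fitted to historical distribution estimates and then continuously updated with data uploaded by the monitoring terminals. The sketch below assumes, purely for illustration, a one-step linear autoregressive predictor of a single density value. It is updated by stochastic gradient descent each time a new observation arrives. The class name, learning rate, and model form are hypothetical and are not taken from Huang.

```python
from typing import List, Tuple


class OnlineDensityPredictor:
    """Hypothetical one-step predictor: next_density = a * current_density + b."""

    def __init__(self, lr: float = 0.02) -> None:
        self.a = 0.0
        self.b = 0.0
        self.lr = lr

    def predict(self, current: float) -> float:
        return self.a * current + self.b

    def update(self, current: float, observed_next: float) -> None:
        """One SGD step on squared error once the next density has been uploaded."""
        error = self.predict(current) - observed_next
        self.a -= self.lr * 2 * error * current
        self.b -= self.lr * 2 * error


# Fit on toy noise-free historical (current, next) pairs; new pairs would be fed to update() as they arrive.
history: List[Tuple[float, float]] = [(d, 0.5 * d + 1.0) for d in (0.0, 1.0, 2.0, 3.0)]
model = OnlineDensityPredictor()
for _ in range(2000):
    for current, nxt in history:
        model.update(current, nxt)
```

Because `update` processes one observation at a time, the same method covers both the initial fit on history and the continuous learning from later uploads.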


Here, Huang teaches acquiring data from one or more sensors at a specific area of interest (station) and using a module synonymous with the first model to predict, from the sensor data, congestion information relating to the one or more sensors that detect the congestion. Furthermore, Huang uses a module for capturing and evaluating congestion data, synonymous with the second prediction model/deep learning model, to analyze time-series data in more depth and to predict more accurately the parameters affecting future traffic congestion. When Huang is taken in combination with Guo's teaching of the learning server regenerating a traffic model (a second model) after previously generated model(s), using previous traffic-related data together with real-time traffic-related data, one of ordinary skill in the art may then be able to determine the future congestion level of any of the specific (m) areas of the various stations and thereafter display, by the processor, the predicted congestion-related information to users at the station.
It would have been obvious to one of ordinary skill in the art to combine Guo's teaching with Huang's teaching in order to obtain a more accurate prediction model for determining congestion/traffic activities and behaviors within a given area.
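The prediction steps recited in the claim run in this order: present sensor values, then the first model, then the second model, then display. They can be sketched as the chain below. Both models are passed in as plain callables. The stub models, area names, and display format are hypothetical placeholders chosen only to show the data flow with n = 1 sensor and m = 2 areas.

```python
from typing import Callable, List, Tuple

FirstModel = Callable[[List[float]], List[float]]   # present sensor values -> congestion info per area
SecondModel = Callable[[List[float]], List[float]]  # congestion info per area -> future congestion level per area


def predict_and_display(present_values: List[float],
                        first_model: FirstModel,
                        second_model: SecondModel,
                        area_names: List[str]) -> Tuple[List[float], List[float], List[str]]:
    current = first_model(present_values)  # congestion info at the detection time point
    future = second_model(current)         # future congestion level of each of the m areas
    messages = [f"{name}: now {c:.2f}, predicted {f:.2f}"
                for name, c, f in zip(area_names, current, future)]
    return current, future, messages


def first_stub(values: List[float]) -> List[float]:
    """Hypothetical first model: one sensor value mapped to two areas."""
    return [values[0] / 100, values[0] / 200]


def second_stub(congestion: List[float]) -> List[float]:
    """Hypothetical second model: scaled and capped congestion level."""
    return [min(1.0, x * 1.5) for x in congestion]


current, future, messages = predict_and_display([80.0], first_stub, second_stub, ["Platform A", "Platform B"])
for msg in messages:
    print(msg)  # stands in for the display shown to users at the station
```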
 In regard to claim 10, Guo teaches a non-transitory computer-readable medium that stores therein a program that causes a computer to execute processes of: storing past congestion-area data indicating congestion-related information including a past congestion level of each of m areas (where m is an integer of two or more); storing past sensor data indicating values detected in the past by one or more sensors installed in n areas (where n is an integer of one or more and n<m) out of the m areas (Paragraphs 9, 11, 17, 60, 63)
It is one object of some embodiments to provide a distributed machine learning based vehicular traffic prediction method that can accurately predict short time, middle time and long time traffic for both city roads and highways. Additionally, it is another object of some embodiments to make optimal route planning for vehicles to minimize traffic congestion and travel time by using the accurate traffic prediction.[P-9]
A learning server coordinates distributed learning among a set of data collectors. A learning server first designs and distributes the traffic models such as neural networks to the set of data collectors for the first round of the distributed training Each data collector then trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. After certain iterations of training process, each data collector sends the trained traffic models to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models. Upon completion of the model aggregation, the learning server re-distributes the aggregated traffic models to data collectors for the second round of training process. This process of training and aggregation continues until the robust traffic models are built.[P-11]
According to some embodiments of the present invention, a distributed machine learning based traffic prediction method can be provided for predicting traffic of roads. In this case, the distributed machine learning based traffic prediction method may a computer-implemented distributed machine learning based traffic prediction method for predicting traffic of roads. The method may include distributing global multi-task traffic models by a learning server to learning agents to locally train the traffic models; uploading locally trained global multi-task traffic models to the learning server, wherein the locally trained global multi-task traffic models have been trained by the learning agents; updating the global multi-task traffic models by the learning server using the locally trained global multi-task traffic models uploaded from the learning agents; generating a time-dependent global traffic map by the learning server using the updated global multi-task traffic models; and distributing the time-dependent global traffic map to each of vehicles traveling on the roads.[P-17]
 FIG. 3A illustrates the traffic prediction architecture 300 using distributed multi-task learning techniques according to some embodiments of the current invention, where an infrastructure device acts as learning server 115 to coordinate a set of data collectors 110, which serve as learning agents. In the FIG. 3A, there are N data collectors, data collector 1 is a toll station, data collector 2 is a loop detector, and data collector N is a camera. The learning server 115 designs M traffic models 310 such that each traffic model is targeted for a specific task, e.g., a model for rush hour traffic and another traffic model for off hour traffic, a model for city traffic and another model for freeway traffic. Therefore, these traffic models are different but correlated because traffic at one location can impact on traffic at other locations and traffic at one can impact on traffic at another time. Accordingly, the learning server 115 defines data clustering rules 320 to divide data into M clusters 330. The learning server 115 then distributes the rules 320 and traffic models 310 to data collectors 110. Correspondingly, each data collector 110 divides its data into M clusters 330. It is possible that some of data clusters are empty. In that case, the data collector will not train corresponding traffic model. In the FIG. 3A, data collector 1 has all data clusters, data collector 2 does not have cluster-M data and, and data collector N does not have cluster-1 data. As a result, data collector 1 will train all traffic models, data collector 2 does not train traffic model M, and data collector N does not train traffic model 1. After clustering its data, each data collector 110 starts the first round of the distributed training. Each data collector 110 trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. 
After certain iterations of training, each data collector sends the locally trained traffic models 340 to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models 350. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. For route planning application, the traffic speed is predicted. Finally, data collectors send their local traffic predictions 360 to the learning server to build a global traffic prediction model, i.e., time-dependent graph, that is then distributed to vehicles 120 for route planning[P-60]
The learning server 115 may be referred to as a distributed machine learning based traffic predication server. The learning server 115 is configured to provide map information with respect to traffic predictions to vehicles 120 traveling on a road network. The learning server 115 may include one or more processors 121, a memory 140, a memory unit/storage 200 configured to store traffic prediction neural networks 132, traffic data cluster rules 134, global time-dependent map 135, trained multi-task traffic models 136, global map of road network 137, an input interface (or transceiver) 150 configured to communicate with the learning agent 110 via the communication network 112 and update model parameters of the traffic prediction neural networks 132. The trained multi-task traffic models 136 may be the traffic models 310 shown in FIG. 3A.[P-63]
Guo establishes M clusters to represent different traffic models attributed to different traffic areas. For example, one area may be freeway traffic, which would be labeled Cluster 1, and another area may be city traffic (Cluster 2), thereby satisfying the condition of M being two or more. Furthermore, Guo teaches N data collectors/sensors, which may be a toll station, a loop detector, or a camera. Accordingly, depending on the area for which the traffic/cluster model is generated, a desired number of sensors/detectors may be set such that the desired relationship N<M (i.e., data collectors to cluster models) is met.
Guo then teaches generating a first model by using the congestion-related information indicated by the past congestion-area data as input data and using the values indicated by the past sensor data as correct data (Paragraph 48).
To reap all the aforementioned benefits, the traffic prediction must process the real-time and historical traffic data and observations collected by data collectors and mobile devices. For example, the inductive loop can measure the travel speed by reading the inductance changes over time and such data can be used for the traffic speed prediction. In addition, the wide use of mobile devices (e.g., on-board global position systems and phones) enables the mobility data to be crowdsourced from the general public, further facilitating the unprecedented traffic data collection. Such emerging big data can substantially augment the data availability in terms of the coverage and fidelity and significantly boost the data-driven traffic prediction. The prior art on the traffic prediction can be mainly grouped into two categories. The first category focus on using parametric approaches, such as autoregressive integrated moving average (ARIMA) model and Kalman filtering models. When dealing with the traffic only presenting regular variations, e.g., recurrent traffic congestion occurred in morning and evening rush hour, the parametric approaches can achieve promising prediction results. However, due to the stochastic and nonlinear nature of the road traffic, the traffic predictions of using the parametric approaches can deviate from the actual values especially in the abrupt traffic. Hence, instead of fitting the traffic data into a mathematical model as done by the parametric approach, an alternative way is using the nonparametric approaches where the machine learning (ML) based method is used. For example, a stacked autoencoder model can be used to learn the generic traffic flow features for the predictions. The long short-term memory (LSTM) recurrent neural network (RNN) can be used to predict the traffic flow, speed and occupancy, based on the data collected by the data collector and its upstream and downstream data collectors. 
Along with the use of RNN, the convolution neural network (CNN) can also be utilized to capture the latent traffic evolution patterns within the underlying road network.[P-48]

Here, Guo integrates real-time traffic data with past sensed traffic data, and this congestion-related traffic information is aggregated to enable the prediction of future traffic patterns.
Guo then teaches the first model being a learning model for predicting, from the congestion-related information, values to be detected by the one or more sensors at a time point at which the congestion-related information is acquired, and using the first model to predict, from the congestion-related information at a first time point indicated by the past congestion-area data, values to be detected by the one or more sensors at the first time point (Paragraph 60; Figure 3A).
 FIG. 3A illustrates the traffic prediction architecture 300 using distributed multi-task learning techniques according to some embodiments of the current invention, where an infrastructure device acts as learning server 115 to coordinate a set of data collectors 110, which serve as learning agents. In the FIG. 3A, there are N data collectors, data collector 1 is a toll station, data collector 2 is a loop detector, and data collector N is a camera. The learning server 115 designs M traffic models 310 such that each traffic model is targeted for a specific task, e.g., a model for rush hour traffic and another traffic model for off hour traffic, a model for city traffic and another model for freeway traffic. Therefore, these traffic models are different but correlated because traffic at one location can impact on traffic at other locations and traffic at one can impact on traffic at another time. Accordingly, the learning server 115 defines data clustering rules 320 to divide data into M clusters 330. The learning server 115 then distributes the rules 320 and traffic models 310 to data collectors 110. Correspondingly, each data collector 110 divides its data into M clusters 330. It is possible that some of data clusters are empty. In that case, the data collector will not train corresponding traffic model. In the FIG. 3A, data collector 1 has all data clusters, data collector 2 does not have cluster-M data and, and data collector N does not have cluster-1 data. As a result, data collector 1 will train all traffic models, data collector 2 does not train traffic model M, and data collector N does not train traffic model 1. After clustering its data, each data collector 110 starts the first round of the distributed training. Each data collector 110 trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. 
After certain iterations of training, each data collector sends the locally trained traffic models 340 to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models 350. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. For route planning application, the traffic speed is predicted. Finally, data collectors send their local traffic predictions 360 to the learning server to build a global traffic prediction model, i.e., time-dependent graph, that is then distributed to vehicles 120 for route planning[P-60]
Here, P-60 and Fig. 3A illustrate and explain that several different traffic models are generated, where one (first) model is a learning model for predicting, from values detected by the one or more sensors (toll station), congestion-related information of a time point (rush hour) at which the one or more sensors perform detection.
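The clustered train-and-aggregate loop quoted above from Guo P-60 can be sketched as follows. This is a minimal sketch under stated assumptions, not Guo's implementation: each traffic model is reduced to a hypothetical one-variable linear model, and the clustering rule (rush hour vs. off hour), the sample-count weighting used during aggregation, and the names `local_train`, `federated_round`, and `cluster_rule` are all assumed for illustration. It shows the features relied upon above. Collectors train only the models for their non-empty clusters, raw data never leaves a collector, and the server redistributes the aggregated models for the next round.

```python
from typing import Callable, Dict, List, Tuple

Model = Tuple[float, float]         # (w, b) for the illustrative model y = w * x + b
Sample = Tuple[int, float, float]   # (hour of day, normalized feature x, target y)
RUSH_HOURS = {7, 8, 9, 17, 18, 19}  # assumed clustering rule: rush hour vs. off hour


def cluster_rule(sample: Sample) -> int:
    """Map a sample to cluster 1 (rush hour) or cluster 0 (off hour)."""
    return 1 if sample[0] in RUSH_HOURS else 0


def local_train(model: Model, data: List[Sample], lr: float = 0.1, epochs: int = 50) -> Model:
    """Gradient descent on mean squared error using only this collector's data."""
    w, b = model
    n = len(data)
    for _ in range(epochs):
        residuals = [(w * x + b - y, x) for _, x, y in data]
        grad_w = sum(2 * e * x for e, x in residuals) / n
        grad_b = sum(2 * e for e, _ in residuals) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def federated_round(global_models: Dict[int, Model],
                    collectors: List[List[Sample]],
                    rule: Callable[[Sample], int]) -> Dict[int, Model]:
    """One round: local training per non-empty cluster, then sample-weighted averaging."""
    received: Dict[int, List[Tuple[Model, int]]] = {k: [] for k in global_models}
    for data in collectors:
        clusters: Dict[int, List[Sample]] = {k: [] for k in global_models}
        for sample in data:
            clusters[rule(sample)].append(sample)
        for k, subset in clusters.items():
            if subset:  # an empty cluster means this collector does not train model k
                received[k].append((local_train(global_models[k], subset), len(subset)))
    aggregated: Dict[int, Model] = {}
    for k, updates in received.items():
        if not updates:  # no collector trained model k; keep the previous version
            aggregated[k] = global_models[k]
            continue
        total = sum(n for _, n in updates)
        aggregated[k] = (sum(m[0] * n for m, n in updates) / total,
                         sum(m[1] * n for m, n in updates) / total)
    return aggregated


def samples(hour: int, w: float, b: float) -> List[Sample]:
    """Synthetic, noise-free samples lying on y = w * x + b (illustration only)."""
    return [(hour, x, w * x + b) for x in (-1.0, 0.0, 1.0)]


collectors = [
    samples(12, 2.0, 1.0) + samples(8, -3.0, 4.0),  # has both clusters
    samples(13, 2.0, 1.0),                          # off-hour data only
    samples(18, -3.0, 4.0),                         # rush-hour data only
]
models: Dict[int, Model] = {0: (0.0, 0.0), 1: (0.0, 0.0)}
for _ in range(3):  # redistribute and retrain for several rounds
    models = federated_round(models, collectors, cluster_rule)
print(models)
```

Averaging the local updates per cluster, rather than pooling raw data, mirrors the passage's statement that each collector trains "without sharing its data with other data collectors and learning server."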
Guo fails to verbatim teach generating a second model by using the predicted values as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from values detected by the one or more sensors, the second time point being a time point after the first time point.  
However, P-60 teaches multiple traffic models, where, for example, one (first) model is a learning model for predicting, from values detected by the one or more sensors (toll station), congestion-related information of a time point (rush hour) at which the one or more sensors perform detection. "Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc." [P-60]
Hence, the next round of learning entails generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data (the previous aggregated traffic model) as input data and using the congestion level in the congestion-related information of a second time point (a different time point in rush hour) indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from the congestion-related information, the second time point being a time point after the first time point (a different time in rush hour after the previous time in the preceding learning model), and different areas (a toll station, or camera(s) at different locations).
Therefore, although Guo's teaching does not verbatim recite generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data as input data, Guo teaches the learning server regenerating a traffic model (a second model) after previously generated model(s) by using previous traffic-related data together with real-time traffic-related data. It would therefore have been obvious to one of ordinary skill in the art that Guo's teaching reads on the applicant's second generated model, by substituting Guo's teaching for the applicant's teaching to yield the same results, in order to enable a more efficient and robust traffic prediction model.
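For clarity of the claim construction discussed above, the claimed second-model training data (congestion-related information at a first time point as input data, and the congestion level at a later second time point as correct data) can be sketched as below. This is a hypothetical illustration of the claim language only, not the applicant's or Guo's implementation; the per-area linear model, the fixed `horizon`, the sample history, and the function names are assumptions.

```python
from typing import List, Tuple

History = List[List[float]]  # history[t][j]: congestion level of area j at time step t


def make_pairs(history: History, horizon: int = 1) -> List[Tuple[List[float], List[float]]]:
    """Pair congestion info at a first time point t (input data) with the
    congestion levels at the later second time point t + horizon (correct data)."""
    if horizon < 1:
        raise ValueError("the second time point must come after the first time point")
    return [(history[t], history[t + horizon]) for t in range(len(history) - horizon)]


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a * x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # constant input: fall back to predicting the mean
        return 0.0, my
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def train_second_model(history: History, horizon: int = 1) -> List[Tuple[float, float]]:
    """One (a, b) pair per area; simplifying assumption: each area uses only its own past level."""
    pairs = make_pairs(history, horizon)
    m = len(history[0])
    return [fit_line([p[0][j] for p in pairs], [p[1][j] for p in pairs]) for j in range(m)]


# Hypothetical past congestion-area data for m = 2 areas over 5 time steps (percent).
past = [[10.0, 50.0], [20.0, 45.0], [30.0, 40.0], [40.0, 35.0], [50.0, 30.0]]
second_model = train_second_model(past)
print(second_model)  # area 0: next = current + 10; area 1: next = current - 5
```

The only structural requirement the sketch enforces is the one recited in the claim: the correct data is always drawn from a time point strictly after the time point of the input data.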
Guo, however, fails to teach acquiring, by the computer, present values detected by the one or more sensors of the station; using, by the computer, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection; using, by the computer, the second model to predict, from the predicted congestion-related information, the future congestion level of any of the m areas; and displaying, by the computer, to users at the station the predicted congestion-related information.
In contrast, Huang teaches acquiring, by the computer, present values detected by the one or more sensors of the station; and using, by the computer, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection (Page 3, Paragraph 2; Page 5, Paragraphs 5-10).
Preferably, the monitoring terminal comprises a traffic monitoring terminal for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal for counting the pedestrian density at a certain time in a certain time, and a conflict monitoring terminal for counting the pedestrian conflict degree in a certain area and an abnormal monitoring terminal for identifying the pedestrian abnormal behaviour in a certain area.[Pg 3, P-2]
Further, the monitoring terminal 2 comprises a traffic monitoring terminal 201 for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal 202 for counting a certain area pedestrian density at a certain time, a conflict monitoring terminal 203 for counting pedestrian conflict degree in a certain area and an abnormal monitoring terminal 204 for identifying pedestrian abnormal behavior in a certain area, each terminal is composed of a sensor, an embedded processing chip, a processing program and a communication module.[Pg 5, P-5]
the sensor of the flow monitoring terminal 201 is a binocular video camera embedded processing chip and flow statistics method according to the video data obtained by the sensor, counting the pedestrian number in and out of the specified area and calculating the flow rate (unit: The cumulative flow rate (unit: 1 %) and the cumulative flow rate (unit: human), then the traffic data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-6]
the sensor of the density monitoring terminal 202 is a wide-angle video camera embedded processing chip and density statistical method according to the video data obtained by the sensor, counting the pedestrian distribution of the specified area and calculating the density (unit of each point after discrete area of the area: the density data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-7]
the sensor of the conflict monitoring terminal 203 is high video camera, embedded processing chip and conflict calculation algorithm according to the video data obtained by the sensor, counting the appointed area pedestrian speed direction and size change rate, relative direction pedestrian number, local density and other parameters, and according to the direction and size change rate of the pedestrian speed in the specified area, relative direction pedestrian number, local density parameter calculating conflict coefficient (dimensionless parameter), then the conflict coefficient is uploaded to the management control terminal 1 through the communication module[Pg 5, P-8]
the sensor of the abnormal monitoring terminal 204 is high video camera, embedded processing chip and the abnormal detection algorithm according to the video data obtained by the sensor, detecting whether there is abnormal behaviour in the specified area, such as a fight, terrorist attack, fast escape, crowding, then the type code of the abnormal behaviour is uploaded to the management control terminal 1 through the communication module. [Pg 5, P-9]
Further, the guide terminal 3 comprises an acoustic guide terminal 301 for conveying sound instruction, for area division, route guidance and information display of the optical guide terminal 302, for issuing the mobile phone message of the information issuing terminal 304. The area separation terminal 303 for the dynamic adjustment of the pedestrian and vehicular traffic area and the guide terminal 305 for order maintenance and manual assistance 305. [Pg 5, P-10]
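The per-terminal quantities described in Huang (pedestrians counted in and out of a specified area, and density over a discretized area) can be sketched as follows. The translated passages do not state the units unambiguously, so flow in persons per minute and density in persons per square metre are assumptions, as are the names `FlowReading`, `flow_rate_per_min`, `net_occupancy`, and `cell_densities` and the sample counts.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FlowReading:
    entered: int        # pedestrians counted entering the area during the interval
    exited: int         # pedestrians counted leaving the area during the interval
    interval_s: float   # length of the counting interval in seconds


def flow_rate_per_min(reading: FlowReading) -> float:
    """Pedestrians crossing the area boundary (in plus out) per minute."""
    return (reading.entered + reading.exited) * 60.0 / reading.interval_s


def net_occupancy(readings: List[FlowReading], initial: int = 0) -> int:
    """Pedestrians currently inside the area: running total of entries minus exits."""
    return initial + sum(r.entered - r.exited for r in readings)


def cell_densities(counts: List[List[int]], cell_area_m2: float) -> List[List[float]]:
    """Persons per square metre for each cell of the discretized area."""
    return [[c / cell_area_m2 for c in row] for row in counts]


readings = [FlowReading(30, 10, 60.0), FlowReading(20, 25, 60.0)]
print(flow_rate_per_min(readings[0]))          # 40 crossings in one minute
print(net_occupancy(readings))                 # (30 - 10) + (20 - 25)
print(cell_densities([[4, 8], [0, 2]], 4.0))   # 4 m^2 cells
```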

Huang further teaches using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level (Page 3, Paragraphs 3, 4, and 9; Page 3, Paragraph 9 through Page 4, Paragraph 1; Page 6, Paragraphs 2, 3, 7, and 8).
Preferably, the management control terminal comprises a region modeling module, a distribution estimation module connected with the region modeling module signal, a distribution prediction module connected with the distribution estimation module signal, and a risk evaluation module connected with the distribution prediction module and the distribution estimation module signal, a simulation and optimization module connected with the risk evaluation module signal and a guide control module connected with the simulation and optimization module signal. [Pg 3, P-3]
Preferably, the simulation and optimization module comprises a personnel evacuation model and personnel flow model, when the risk assessment result of the risk assessment module is low comfort level personnel flow module, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of simulating pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and flow inducing scheme [Pg 3, P-4]
the distribution prediction module according to the history data of the distribution estimation result, establishing model based on machine learning algorithm to predict the future period of pedestrian distribution parameter, and uploading data continuous learning model parameter according to the future monitoring terminal.[Pg 3, P-9]
Preferably, the risk evaluation module signal is connected with a conflict monitoring terminal and an abnormal monitoring terminal, and according to the distribution prediction module, the output result of the distribution estimation module and conflict monitoring terminal, the uploading data of the abnormal monitoring terminal evaluates the possibility of the danger of congestion and trampling on the comfort level region. [Pg 3, P-9-Pg 4, P-1]
Further, the management control terminal 1 comprises a region modeling module 101, and the region modeling module 101 signal connection of the distribution estimation module 102, and the distribution prediction module 102 signal connection distribution prediction module 103. a risk evaluation module 104 connected with the distribution prediction module 103 and the distribution estimation module 102, a simulation and optimization module 105 connected with the risk evaluation module 104 signal and a guide control module 106 connected with the simulation and optimization module 105 signal.[Pg 6, P-2]
Further, the simulation and optimization module 105 comprises a personnel evacuation model and personnel flow model, when the risk evaluation result of the risk evaluation module 104 is low comfort level personnel flow module, management control terminal 1 sends attention area low comfort level alarm, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of the simulated pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and a flow inducing scheme, after the management staff 4 of the auditing by the guide control module 106 sends the inducing information to the guide terminal 3;[Pg 6, P-3]
the distribution prediction module 103 according to the history data of the distribution estimation result, based on the machine learning algorithm establishing model prediction future period of pedestrian distribution parameter, and according to the future monitoring terminal 2 uploading data continuous learning model parameter.[Pg 6, P-7]
Further, the risk evaluation module 104 signal connected with the conflict monitoring terminal 203 and the abnormal monitoring terminal 204, and according to the distribution prediction module 103, distribution estimation module 102 the output result and conflict monitoring terminal 203. The uploading data of the abnormal monitoring terminal 204 evaluates the possibility of the comfort level of congestion and trampling on the region of interest and the occurrence of congestion and trample.[Pg 6, P-8]
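Huang's distribution prediction module 103 is described only as a machine-learning model built from the history of distribution estimates, whose parameters are continuously updated with data newly uploaded by the monitoring terminals. As one hedged illustration of that pattern, the sketch below swaps in a simple autoregressive predictor trained online by least-mean-squares; Huang does not name this algorithm, and the class name, `order`, `lr`, and the input stream are hypothetical. Inputs are assumed to be normalized (for example, densities on the order of 1) so that the fixed step size stays stable.

```python
from collections import deque
from typing import Deque, List


class OnlineARPredictor:
    """Predicts the next value of a scalar series (e.g., an area's pedestrian
    density) from its last `order` values and updates its weights online by
    least-mean-squares each time a new observation is uploaded."""

    def __init__(self, order: int = 2, lr: float = 0.05):
        self.order = order
        self.lr = lr
        self.weights: List[float] = [0.0] * order
        self.bias = 0.0
        self.window: Deque[float] = deque(maxlen=order)  # oldest value first

    def predict(self) -> float:
        if len(self.window) < self.order:  # not enough history yet: persistence forecast
            return self.window[-1] if self.window else 0.0
        return self.bias + sum(w * x for w, x in zip(self.weights, self.window))

    def update(self, observed: float) -> None:
        """Learn from the newly uploaded value, then slide the window forward."""
        if len(self.window) == self.order:
            error = observed - self.predict()
            self.weights = [w + self.lr * error * x for w, x in zip(self.weights, self.window)]
            self.bias += self.lr * error
        self.window.append(observed)


predictor = OnlineARPredictor(order=2, lr=0.1)
for density in [1.0] * 60:  # hypothetical stream of uploaded density values
    predictor.update(density)
print(round(predictor.predict(), 6))
```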

Here, we see Huang teaching data from one or more sensors at a specific area of interest (station), and using a module synonymous with the first model to predict, from the sensor data, congestion information relating to the one or more sensors that detect the congestion. Furthermore, Huang uses a module for capturing and evaluating congestion data, synonymous with the second prediction model/deep learning model, to analyze time-series data in more depth and to more accurately predict the parameters affecting future traffic congestion. When Huang is taken in combination with Guo's teaching of the learning server regenerating a traffic model (a second model) after previously generated model(s), using previous traffic-related data together with real-time traffic-related data, one of ordinary skill in the art may then be able to determine the future congestion level of any of the specific (m) areas of the various stations and thereafter display, by the processor, the predicted congestion-related information to users at the station.
It would have been obvious to one of ordinary skill in the art to combine Guo’s teaching with Huang’s teaching in order to have a more accurate prediction model to determine congestion/traffic activities and behaviors within a given area.
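The chained use of the two models at prediction time, as mapped above (present sensor values into the first model, its predicted congestion of the m areas into the second model, and the resulting future congestion level shown to users at the station), can be sketched as below. All coefficients, area names, readings, and function names are hypothetical, and simple linear models stand in for whatever learning models are actually used.

```python
from typing import List, Sequence, Tuple


def first_model(sensor_values: Sequence[float],
                weights: List[List[float]], biases: List[float]) -> List[float]:
    """Hypothetical linear first model: n present sensor values -> congestion of all m areas."""
    return [b + sum(w * s for w, s in zip(row, sensor_values)) for row, b in zip(weights, biases)]


def second_model(congestion: Sequence[float], coeffs: List[Tuple[float, float]]) -> List[float]:
    """Hypothetical per-area second model: predicted congestion -> future congestion level."""
    return [a * c + b for c, (a, b) in zip(congestion, coeffs)]


def display_lines(area_names: Sequence[str], levels: Sequence[float]) -> List[str]:
    """Text for a station display board; the format is illustrative."""
    return [f"{name}: predicted congestion {level:.0f}%" for name, level in zip(area_names, levels)]


areas = ["Platform 1", "Concourse"]   # m = 2 areas
present = [100.0]                     # n = 1 sensor, e.g. a ticket-gate count
now = first_model(present, weights=[[0.5], [0.25]], biases=[0.0, 5.0])
future = second_model(now, coeffs=[(1.0, 10.0), (1.0, -5.0)])
for line in display_lines(areas, future):
    print(line)
```

Note that the second model never sees the raw sensor values; it consumes only the first model's output, which is the chaining recited in the claim.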

With regard to claim 11, Guo teaches a non-transitory computer-readable medium that stores therein a program that causes a computer to execute processes of: acquiring values detected by one or more sensors installed in n areas (where n is an integer of one or more, and n<m) out of m areas (where m is an integer of two or more) (Paragraphs 9, 11, 17, 60, 63).
It is one object of some embodiments to provide a distributed machine learning based vehicular traffic prediction method that can accurately predict short time, middle time and long time traffic for both city roads and highways. Additionally, it is another object of some embodiments to make optimal route planning for vehicles to minimize traffic congestion and travel time by using the accurate traffic prediction.[P-9]
A learning server coordinates distributed learning among a set of data collectors. A learning server first designs and distributes the traffic models such as neural networks to the set of data collectors for the first round of the distributed training Each data collector then trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. After certain iterations of training process, each data collector sends the trained traffic models to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models. Upon completion of the model aggregation, the learning server re-distributes the aggregated traffic models to data collectors for the second round of training process. This process of training and aggregation continues until the robust traffic models are built.[P-11]
According to some embodiments of the present invention, a distributed machine learning based traffic prediction method can be provided for predicting traffic of roads. In this case, the distributed machine learning based traffic prediction method may a computer-implemented distributed machine learning based traffic prediction method for predicting traffic of roads. The method may include distributing global multi-task traffic models by a learning server to learning agents to locally train the traffic models; uploading locally trained global multi-task traffic models to the learning server, wherein the locally trained global multi-task traffic models have been trained by the learning agents; updating the global multi-task traffic models by the learning server using the locally trained global multi-task traffic models uploaded from the learning agents; generating a time-dependent global traffic map by the learning server using the updated global multi-task traffic models; and distributing the time-dependent global traffic map to each of vehicles traveling on the roads.[P-17]
 FIG. 3A illustrates the traffic prediction architecture 300 using distributed multi-task learning techniques according to some embodiments of the current invention, where an infrastructure device acts as learning server 115 to coordinate a set of data collectors 110, which serve as learning agents. In the FIG. 3A, there are N data collectors, data collector 1 is a toll station, data collector 2 is a loop detector, and data collector N is a camera. The learning server 115 designs M traffic models 310 such that each traffic model is targeted for a specific task, e.g., a model for rush hour traffic and another traffic model for off hour traffic, a model for city traffic and another model for freeway traffic. Therefore, these traffic models are different but correlated because traffic at one location can impact on traffic at other locations and traffic at one can impact on traffic at another time. Accordingly, the learning server 115 defines data clustering rules 320 to divide data into M clusters 330. The learning server 115 then distributes the rules 320 and traffic models 310 to data collectors 110. Correspondingly, each data collector 110 divides its data into M clusters 330. It is possible that some of data clusters are empty. In that case, the data collector will not train corresponding traffic model. In the FIG. 3A, data collector 1 has all data clusters, data collector 2 does not have cluster-M data and, and data collector N does not have cluster-1 data. As a result, data collector 1 will train all traffic models, data collector 2 does not train traffic model M, and data collector N does not train traffic model 1. After clustering its data, each data collector 110 starts the first round of the distributed training. Each data collector 110 trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. 
After certain iterations of training, each data collector sends the locally trained traffic models 340 to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models 350. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. For route planning application, the traffic speed is predicted. Finally, data collectors send their local traffic predictions 360 to the learning server to build a global traffic prediction model, i.e., time-dependent graph, that is then distributed to vehicles 120 for route planning[P-60]
The learning server 115 may be referred to as a distributed machine learning based traffic predication server. The learning server 115 is configured to provide map information with respect to traffic predictions to vehicles 120 traveling on a road network. The learning server 115 may include one or more processors 121, a memory 140, a memory unit/storage 200 configured to store traffic prediction neural networks 132, traffic data cluster rules 134, global time-dependent map 135, trained multi-task traffic models 136, global map of road network 137, an input interface (or transceiver) 150 configured to communicate with the learning agent 110 via the communication network 112 and update model parameters of the traffic prediction neural networks 132. The trained multi-task traffic models 136 may be the traffic models 310 shown in FIG. 3A.[P-63]
Guo establishes M clusters to represent different traffic models attributed to different traffic areas, where Guo illustrates that one area may be freeway traffic, which would be labeled Cluster 1, and another area may be city traffic (Cluster 2), thereby satisfying the condition of M being two or more. Furthermore, Guo teaches N data collectors/sensors (data collector 1 through data collector N). Each data collector may be a toll station, a loop detector, or a camera. Accordingly, depending on the area for which the traffic/cluster model is generated, a desired number of sensors/detectors may be set such that the desired ratio N<M (i.e., data collectors : cluster models) is met.
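The claimed relationship between sensor-equipped areas and all areas (n sensor areas out of m areas, with m of two or more and 1 ≤ n < m) can be expressed as a simple validity check. The practical effect of the constraint is that at least one area has no sensor, so its congestion must be inferred by the first model. The function name and area names below are hypothetical.

```python
from typing import List, Sequence


def unsensored_areas(all_areas: Sequence[str], sensor_areas: Sequence[str]) -> List[str]:
    """Check the claimed constraints (m >= 2, 1 <= n < m, sensor areas drawn from
    the m areas) and return the areas whose congestion must be inferred."""
    m = len(set(all_areas))
    sensored = set(sensor_areas)
    n = len(sensored)
    if m < 2:
        raise ValueError("m must be an integer of two or more")
    if not 1 <= n < m:
        raise ValueError("n must satisfy 1 <= n < m")
    unknown = sensored - set(all_areas)
    if unknown:
        raise ValueError(f"sensor areas not among the m areas: {sorted(unknown)}")
    return [a for a in all_areas if a not in sensored]


areas = ["ticket gate", "concourse", "platform 1", "platform 2"]  # m = 4
print(unsensored_areas(areas, ["ticket gate"]))                    # n = 1
```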
Guo then teaches using a first model to predict, from the acquired values, congestion-related information of a time point at which the one or more sensors perform detection, the first model being a learning model for predicting, from values to be detected by the one or more sensors, the congestion-related information of the time point at which the one or more sensors perform detection (Paragraph 48).
To reap all the aforementioned benefits, the traffic prediction must process the real-time and historical traffic data and observations collected by data collectors and mobile devices. For example, the inductive loop can measure the travel speed by reading the inductance changes over time and such data can be used for the traffic speed prediction. In addition, the wide use of mobile devices (e.g., on-board global position systems and phones) enables the mobility data to be crowdsourced from the general public, further facilitating the unprecedented traffic data collection. Such emerging big data can substantially augment the data availability in terms of the coverage and fidelity and significantly boost the data-driven traffic prediction. The prior art on the traffic prediction can be mainly grouped into two categories. The first category focus on using parametric approaches, such as autoregressive integrated moving average (ARIMA) model and Kalman filtering models. When dealing with the traffic only presenting regular variations, e.g., recurrent traffic congestion occurred in morning and evening rush hour, the parametric approaches can achieve promising prediction results. However, due to the stochastic and nonlinear nature of the road traffic, the traffic predictions of using the parametric approaches can deviate from the actual values especially in the abrupt traffic. Hence, instead of fitting the traffic data into a mathematical model as done by the parametric approach, an alternative way is using the nonparametric approaches where the machine learning (ML) based method is used. For example, a stacked autoencoder model can be used to learn the generic traffic flow features for the predictions. The long short-term memory (LSTM) recurrent neural network (RNN) can be used to predict the traffic flow, speed and occupancy, based on the data collected by the data collector and its upstream and downstream data collectors. 
Along with the use of RNN, the convolution neural network (CNN) can also be utilized to capture the latent traffic evolution patterns within the underlying road network.[P-48]

Here, we see the integration of real-time traffic data with past sensed traffic data, with the congestion-related traffic information being aggregated to enable the prediction of future traffic patterns.
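Guo P-48 names Kalman filtering as one of the parametric approaches to traffic prediction. A minimal scalar Kalman filter with a random-walk state model, applied to speed readings such as those from an inductive loop, is sketched below. The state model, the noise variances `q` and `r`, the initial values, and the readings are assumptions chosen for illustration, not values from Guo.

```python
from typing import List, Tuple


def kalman_random_walk(measurements: List[float], q: float, r: float,
                       x0: float = 0.0, p0: float = 1e3) -> Tuple[List[float], float]:
    """Scalar Kalman filter for x_t = x_{t-1} + w_t (variance q), z_t = x_t + v_t (variance r).
    Returns the filtered estimates and the one-step-ahead forecast."""
    x, p = x0, p0
    estimates: List[float] = []
    for z in measurements:
        p = p + q              # predict: the state carries over, uncertainty grows by q
        k = p / (p + r)        # Kalman gain
        x = x + k * (z - x)    # update: move the estimate toward the measurement
        p = (1.0 - k) * p      # reduced uncertainty after the update
        estimates.append(x)
    # Under a random-walk model, the best next-step forecast is the latest estimate.
    return estimates, x


speeds_kmh = [60.0] * 20   # hypothetical steady loop-detector speed readings
filtered, forecast = kalman_random_walk(speeds_kmh, q=0.01, r=1.0)
print(round(forecast, 2))
```

As P-48 itself observes, a fixed parametric model of this kind handles regular variation well but can lag abrupt changes, which is the motivation Guo gives for machine-learning approaches.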
Guo also teaches the first model being generated by using values indicated by past sensor data indicating the values detected in the past by the one or more sensors installed in n areas (where n is an integer of one or more and n<m) out of the m areas as input data and using the congestion-related information indicated by past congestion-area data indicating the congestion-related information including a past congestion level of each of the m areas as correct data (Paragraph 60; Figure 3A).
 FIG. 3A illustrates the traffic prediction architecture 300 using distributed multi-task learning techniques according to some embodiments of the current invention, where an infrastructure device acts as learning server 115 to coordinate a set of data collectors 110, which serve as learning agents. In the FIG. 3A, there are N data collectors, data collector 1 is a toll station, data collector 2 is a loop detector, and data collector N is a camera. The learning server 115 designs M traffic models 310 such that each traffic model is targeted for a specific task, e.g., a model for rush hour traffic and another traffic model for off hour traffic, a model for city traffic and another model for freeway traffic. Therefore, these traffic models are different but correlated because traffic at one location can impact on traffic at other locations and traffic at one can impact on traffic at another time. Accordingly, the learning server 115 defines data clustering rules 320 to divide data into M clusters 330. The learning server 115 then distributes the rules 320 and traffic models 310 to data collectors 110. Correspondingly, each data collector 110 divides its data into M clusters 330. It is possible that some of data clusters are empty. In that case, the data collector will not train corresponding traffic model. In the FIG. 3A, data collector 1 has all data clusters, data collector 2 does not have cluster-M data and, and data collector N does not have cluster-1 data. As a result, data collector 1 will train all traffic models, data collector 2 does not train traffic model M, and data collector N does not train traffic model 1. After clustering its data, each data collector 110 starts the first round of the distributed training. Each data collector 110 trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. 
After certain iterations of training, each data collector sends the locally trained traffic models 340 to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models 350. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. For route planning application, the traffic speed is predicted. Finally, data collectors send their local traffic predictions 360 to the learning server to build a global traffic prediction model, i.e., time-dependent graph, that is then distributed to vehicles 120 for route planning[P-60]
Here, P-60 and Fig. 3A illustrate and explain that several different traffic models are generated, where one (first) model is a learning model for predicting, from values detected by the one or more sensors (toll station), congestion-related information of a time point (rush hour) at which the one or more sensors perform detection.
Guo fails to teach verbatim using a second model to predict, from the predicted congestion-related information, a future congestion level of any of the m areas, the second model being a learning model for predicting a future congestion level from the congestion-related information, the second model being generated by using the congestion-related information of a first time point indicated by the past congestion-area data as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second time point being a time point after the first time point.  
However, P-60 teaches multiple traffic models, where, for example, one (first) model is a learning model for predicting, from values detected by the one or more sensors (toll station), congestion-related information of a time point (rush hour) at which the one or more sensors perform detection. "Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc." [P-60]
Hence, the next round of learning entails generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data (the previous aggregated traffic model) as input data and using the congestion level in the congestion-related information of a second time point (a different time point in rush hour) indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from the congestion-related information, the second time point being a time point after the first time point (a different time in rush hour after the previous time in the preceding learning model), and different areas (a toll station, or camera(s) at different locations). The regenerated (second) model is essentially derived from updated and previously predicted congestion-related information, and is used to predict a future congestion level of any of the m areas.

Therefore, although Guo's teaching does not verbatim recite generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data as input data, Guo teaches the learning server regenerating a traffic model (a second model) after previously generated model(s) by using previous traffic-related data together with real-time traffic-related data. It would therefore have been obvious to one of ordinary skill in the art that Guo's teaching reads on the applicant's second generated model, by substituting Guo's teaching for the applicant's teaching to yield the same results, in order to enable a more efficient and robust traffic prediction model.
Guo, however, fails to teach displaying, by the processor, to users at the station the future congestion level of any of the m areas.
Guo, however, fails to teach acquiring, by the processor, present values detected by the one or more sensors of the station; using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection; using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level of any of the m areas; and displaying, by the processor, to users at the station the predicted congestion-related information.
In contrast, Huang teaches acquiring, by the processor, present values detected by the one or more sensors of the station; and using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection (Page 3, Paragraph 2; Page 5, Paragraphs 5-10).
Preferably, the monitoring terminal comprises a traffic monitoring terminal for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal for counting the pedestrian density at a certain time in a certain time, and a conflict monitoring terminal for counting the pedestrian conflict degree in a certain area and an abnormal monitoring terminal for identifying the pedestrian abnormal behaviour in a certain area.[Pg 3, P-2]
Further, the monitoring terminal 2 comprises a traffic monitoring terminal 201 for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal 202 for counting a certain area pedestrian density at a certain time, a conflict monitoring terminal 203 for counting pedestrian conflict degree in a certain area and an abnormal monitoring terminal 204 for identifying pedestrian abnormal behavior in a certain area, each terminal is composed of a sensor, an embedded processing chip, a processing program and a communication module.[Pg 5, P-5]
the sensor of the flow monitoring terminal 201 is a binocular video camera embedded processing chip and flow statistics method according to the video data obtained by the sensor, counting the pedestrian number in and out of the specified area and calculating the flow rate (unit: The cumulative flow rate (unit: 1 %) and the cumulative flow rate (unit: human), then the traffic data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-6]
the sensor of the density monitoring terminal 202 is a wide-angle video camera embedded processing chip and density statistical method according to the video data obtained by the sensor, counting the pedestrian distribution of the specified area and calculating the density (unit of each point after discrete area of the area: the density data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-7]
the sensor of the conflict monitoring terminal 203 is high video camera, embedded processing chip and conflict calculation algorithm according to the video data obtained by the sensor, counting the appointed area pedestrian speed direction and size change rate, relative direction pedestrian number, local density and other parameters, and according to the direction and size change rate of the pedestrian speed in the specified area, relative direction pedestrian number, local density parameter calculating conflict coefficient (dimensionless parameter), then the conflict coefficient is uploaded to the management control terminal 1 through the communication module[Pg 5, P-8]
the sensor of the abnormal monitoring terminal 204 is high video camera, embedded processing chip and the abnormal detection algorithm according to the video data obtained by the sensor, detecting whether there is abnormal behaviour in the specified area, such as a fight, terrorist attack, fast escape, crowding, then the type code of the abnormal behaviour is uploaded to the management control terminal 1 through the communication module. [Pg 5, P-9]
Further, the guide terminal 3 comprises an acoustic guide terminal 301 for conveying sound instruction, for area division, route guidance and information display of the optical guide terminal 302, for issuing the mobile phone message of the information issuing terminal 304. The area separation terminal 303 for the dynamic adjustment of the pedestrian and vehicular traffic area and the guide terminal 305 for order maintenance and manual assistance 305. [Pg 5, P-10]
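For illustration only, the flow and density quantities that Huang's monitoring terminals compute from sensor data can be sketched as follows. The translated passage leaves the units garbled, so the units chosen here (pedestrians per minute, pedestrians per square meter) and the input format (timestamped +1/-1 crossing events from a camera-based counter) are assumptions, not Huang's disclosed method.

```python
from typing import List, Tuple


def flow_and_density(
    crossings: List[Tuple[float, int]],  # (time in s, +1 entering / -1 leaving)
    window_s: float,                     # length of the counting window in seconds
    people_in_area: int,                 # pedestrians counted in the area right now
    area_m2: float,                      # size of the monitored area in m^2
) -> Tuple[float, int, float]:
    """Return (inflow in pedestrians/min, net cumulative count, density in pedestrians/m^2)."""
    entries = sum(1 for _, direction in crossings if direction > 0)
    inflow_per_min = entries * 60.0 / window_s
    net_count = sum(direction for _, direction in crossings)
    density = people_in_area / area_m2
    return inflow_per_min, net_count, density


print(flow_and_density([(1.0, 1), (5.0, 1), (12.0, -1), (30.0, 1)], 60.0, 50, 25.0))
# (3.0, 2, 2.0)
```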

Huang further teaches using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level (Page 3, Paragraphs 3, 4, 9; Page 3, Paragraph 9-Page 4, Paragraph 1; Page 6, Paragraphs 2, 3, 7, 8):
Preferably, the management control terminal comprises a region modeling module, a distribution estimation module connected with the region modeling module signal, a distribution prediction module connected with the distribution estimation module signal, and a risk evaluation module connected with the distribution prediction module and the distribution estimation module signal, a simulation and optimization module connected with the risk evaluation module signal and a guide control module connected with the simulation and optimization module signal. [Pg 3, P-3]
Preferably, the simulation and optimization module comprises a personnel evacuation model and personnel flow model, when the risk assessment result of the risk assessment module is low comfort level personnel flow module, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of simulating pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and flow inducing scheme [Pg 3, P-4]
the distribution prediction module according to the history data of the distribution estimation result, establishing model based on machine learning algorithm to predict the future period of pedestrian distribution parameter, and uploading data continuous learning model parameter according to the future monitoring terminal.[Pg 3, P-9]
Preferably, the risk evaluation module signal is connected with a conflict monitoring terminal and an abnormal monitoring terminal, and according to the distribution prediction module, the output result of the distribution estimation module and conflict monitoring terminal, the uploading data of the abnormal monitoring terminal evaluates the possibility of the danger of congestion and trampling on the comfort level region. [Pg 3, P-9-Pg 4, P-1]
Further, the management control terminal 1 comprises a region modeling module 101, and the region modeling module 101 signal connection of the distribution estimation module 102, and the distribution prediction module 102 signal connection distribution prediction module 103. a risk evaluation module 104 connected with the distribution prediction module 103 and the distribution estimation module 102, a simulation and optimization module 105 connected with the risk evaluation module 104 signal and a guide control module 106 connected with the simulation and optimization module 105 signal.[Pg 6, P-2]
Further, the simulation and optimization module 105 comprises a personnel evacuation model and personnel flow model, when the risk evaluation result of the risk evaluation module 104 is low comfort level personnel flow module, management control terminal 1 sends attention area low comfort level alarm, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of the simulated pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and a flow inducing scheme, after the management staff 4 of the auditing by the guide control module 106 sends the inducing information to the guide terminal 3;[Pg 6, P-3]
the distribution prediction module 103 according to the history data of the distribution estimation result, based on the machine learning algorithm establishing model prediction future period of pedestrian distribution parameter, and according to the future monitoring terminal 2 uploading data continuous learning model parameter.[Pg 6, P-7]
Further, the risk evaluation module 104 signal connected with the conflict monitoring terminal 203 and the abnormal monitoring terminal 204, and according to the distribution prediction module 103, distribution estimation module 102 the output result and conflict monitoring terminal 203. The uploading data of the abnormal monitoring terminal 204 evaluates the possibility of the comfort level of congestion and trampling on the region of interest and the occurrence of congestion and trample.[Pg 6, P-8]
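For illustration only, the claimed sequence of acquiring present sensor values, applying the first model, applying the second model, and displaying the result can be sketched as a simple pipeline. The callables `acquire`, `first_model`, `second_model`, and `display` are hypothetical placeholders; Huang does not disclose code, and the toy lambdas below stand in for trained models.

```python
from typing import Callable, List

SensorValues = List[float]    # present values from the n sensor-equipped areas
CongestionInfo = List[float]  # one value per area among the m areas


def predict_and_display(
    acquire: Callable[[], SensorValues],
    first_model: Callable[[SensorValues], CongestionInfo],
    second_model: Callable[[CongestionInfo], CongestionInfo],
    display: Callable[[str], None],
) -> CongestionInfo:
    present = acquire()                          # acquire present sensor values
    current_info = first_model(present)          # congestion info at detection time
    future_levels = second_model(current_info)   # future congestion level per area
    for area, level in enumerate(future_levels):
        display(f"area {area}: predicted congestion {level:.2f}")
    return future_levels


# Toy stand-ins for the trained models (hypothetical, for illustration only).
messages: List[str] = []
result = predict_and_display(
    acquire=lambda: [10.0, 20.0],
    first_model=lambda v: [v[0] / 40, v[1] / 40, (v[0] + v[1]) / 80],
    second_model=lambda c: [min(1.0, 2 * x) for x in c],
    display=messages.append,
)
print(messages)
```

Here two sensored areas (n = 2) feed predictions for three areas (m = 3), mirroring the n < m relation recited in the claims.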


Here, Huang teaches data from one or more sensors at a specific area of interest (the station) and a module synonymous with the first model that predicts, from the sensor data, congestion information relating to the one or more sensors that detect the congestion. Furthermore, Huang uses a module for capturing and evaluating congestion data, synonymous with the second prediction model (deep learning model), to analyze time-series data in more depth and to predict more accurate parameters affecting future traffic congestion.
Huang then teaches route guidance and information display via the optical guide terminal 302, and issuing mobile phone messages via the information issuing terminal 304, thereby displaying congestion result data as well as predicted congestion data.
When taken in combination with Guo's teaching of a learning server that regenerates a traffic model (a second model) from previously generated model(s) using previous traffic-related data together with real-time traffic-related data, one of ordinary skill in the art would be able to determine the future congestion level of any of the specific (m) areas of the various stations and thereafter display, by the processor, to users at the station the predicted congestion-related information.
It would have been obvious to one of ordinary skill in the art to combine Guo’s teaching with Huang’s teaching in order to have a more accurate prediction model to determine congestion/traffic activities and behaviors within a given area.

With regard to claim 12, Guo teaches a non-transitory computer-readable medium that stores therein a program that causes a computer to execute processes of: acquiring values detected by one or more sensors installed in n areas (where n is an integer of one or more, and n<m) out of m areas (where m is an integer of two or more) (Paragraphs 9, 11, 17, 60, 63):
It is one object of some embodiments to provide a distributed machine learning based vehicular traffic prediction method that can accurately predict short time, middle time and long time traffic for both city roads and highways. Additionally, it is another object of some embodiments to make optimal route planning for vehicles to minimize traffic congestion and travel time by using the accurate traffic prediction.[P-9]
A learning server coordinates distributed learning among a set of data collectors. A learning server first designs and distributes the traffic models such as neural networks to the set of data collectors for the first round of the distributed training Each data collector then trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. After certain iterations of training process, each data collector sends the trained traffic models to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models. Upon completion of the model aggregation, the learning server re-distributes the aggregated traffic models to data collectors for the second round of training process. This process of training and aggregation continues until the robust traffic models are built.[P-11]
According to some embodiments of the present invention, a distributed machine learning based traffic prediction method can be provided for predicting traffic of roads. In this case, the distributed machine learning based traffic prediction method may a computer-implemented distributed machine learning based traffic prediction method for predicting traffic of roads. The method may include distributing global multi-task traffic models by a learning server to learning agents to locally train the traffic models; uploading locally trained global multi-task traffic models to the learning server, wherein the locally trained global multi-task traffic models have been trained by the learning agents; updating the global multi-task traffic models by the learning server using the locally trained global multi-task traffic models uploaded from the learning agents; generating a time-dependent global traffic map by the learning server using the updated global multi-task traffic models; and distributing the time-dependent global traffic map to each of vehicles traveling on the roads.[P-17]
 FIG. 3A illustrates the traffic prediction architecture 300 using distributed multi-task learning techniques according to some embodiments of the current invention, where an infrastructure device acts as learning server 115 to coordinate a set of data collectors 110, which serve as learning agents. In the FIG. 3A, there are N data collectors, data collector 1 is a toll station, data collector 2 is a loop detector, and data collector N is a camera. The learning server 115 designs M traffic models 310 such that each traffic model is targeted for a specific task, e.g., a model for rush hour traffic and another traffic model for off hour traffic, a model for city traffic and another model for freeway traffic. Therefore, these traffic models are different but correlated because traffic at one location can impact on traffic at other locations and traffic at one can impact on traffic at another time. Accordingly, the learning server 115 defines data clustering rules 320 to divide data into M clusters 330. The learning server 115 then distributes the rules 320 and traffic models 310 to data collectors 110. Correspondingly, each data collector 110 divides its data into M clusters 330. It is possible that some of data clusters are empty. In that case, the data collector will not train corresponding traffic model. In the FIG. 3A, data collector 1 has all data clusters, data collector 2 does not have cluster-M data and, and data collector N does not have cluster-1 data. As a result, data collector 1 will train all traffic models, data collector 2 does not train traffic model M, and data collector N does not train traffic model 1. After clustering its data, each data collector 110 starts the first round of the distributed training. Each data collector 110 trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. 
After certain iterations of training, each data collector sends the locally trained traffic models 340 to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models 350. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. For route planning application, the traffic speed is predicted. Finally, data collectors send their local traffic predictions 360 to the learning server to build a global traffic prediction model, i.e., time-dependent graph, that is then distributed to vehicles 120 for route planning[P-60]
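For illustration only, the train-aggregate-redistribute cycle quoted above can be sketched in Python. Guo states that the learning server "aggregates" the locally trained models but the quoted passage does not specify how; plain averaging (as in federated averaging) is an assumption here, as are the one-parameter linear model, the gradient-descent settings, and the toy data. The sketch does follow the quoted behavior that a data collector with an empty cluster does not train the corresponding model and that raw data never leaves the collector.

```python
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (feature x, observed traffic y), hypothetical


def local_train(w: float, data: List[Sample], lr: float = 0.01, steps: int = 20) -> float:
    """Gradient descent on squared error for a one-parameter model y ~ w * x (assumption)."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def distributed_rounds(
    collectors: List[Dict[int, List[Sample]]],  # per collector: cluster id -> local data
    num_clusters: int,
    rounds: int,
) -> List[float]:
    global_models = [0.0] * num_clusters        # server designs M initial models
    for _ in range(rounds):
        uploads: Dict[int, List[float]] = {k: [] for k in range(num_clusters)}
        for clusters in collectors:               # data never leaves the collector
            for k, data in clusters.items():
                if data:                          # empty cluster -> model k not trained
                    uploads[k].append(local_train(global_models[k], data))
        for k, ws in uploads.items():             # server aggregates (here: plain average)
            if ws:
                global_models[k] = sum(ws) / len(ws)
    return global_models


collectors = [
    {0: [(1.0, 2.0), (2.0, 4.0)], 1: [(1.0, 3.0)]},  # e.g. a toll station
    {0: [(3.0, 6.0)]},                               # e.g. a loop detector (no cluster-1 data)
    {0: [], 1: [(2.0, 6.0)]},                        # e.g. a camera (empty cluster 0)
]
models = distributed_rounds(collectors, num_clusters=2, rounds=20)
print([round(w, 3) for w in models])  # approximately [2.0, 3.0]
```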
The learning server 115 may be referred to as a distributed machine learning based traffic predication server. The learning server 115 is configured to provide map information with respect to traffic predictions to vehicles 120 traveling on a road network. The learning server 115 may include one or more processors 121, a memory 140, a memory unit/storage 200 configured to store traffic prediction neural networks 132, traffic data cluster rules 134, global time-dependent map 135, trained multi-task traffic models 136, global map of road network 137, an input interface (or transceiver) 150 configured to communicate with the learning agent 110 via the communication network 112 and update model parameters of the traffic prediction neural networks 132. The trained multi-task traffic models 136 may be the traffic models 310 shown in FIG. 3A.[P-63]

Guo establishes M clusters to represent different traffic models attributed to different traffic areas. For example, Guo illustrates that one area may be freeway traffic, which would be labeled Cluster 1, and another area may be city traffic (Cluster 2), thereby satisfying the condition of m being two or more. Furthermore, Guo teaches data collectors (sensors), designated data collectors 1 through N, which may be a toll station, a loop detector, or a camera. Accordingly, depending on the area for which the traffic (cluster) model is generated, a desired number of sensors/detectors may be set such that the desired relation N<M between data collectors and cluster models is met.
Guo also teaches the first model being generated by using, as input data, the congestion-related information indicated by past congestion-area data indicating the congestion-related information including a past congestion level of each of the m areas, and by using, as correct data (Paragraph 48):
To reap all the aforementioned benefits, the traffic prediction must process the real-time and historical traffic data and observations collected by data collectors and mobile devices. For example, the inductive loop can measure the travel speed by reading the inductance changes over time and such data can be used for the traffic speed prediction. In addition, the wide use of mobile devices (e.g., on-board global position systems and phones) enables the mobility data to be crowdsourced from the general public, further facilitating the unprecedented traffic data collection. Such emerging big data can substantially augment the data availability in terms of the coverage and fidelity and significantly boost the data-driven traffic prediction. The prior art on the traffic prediction can be mainly grouped into two categories. The first category focus on using parametric approaches, such as autoregressive integrated moving average (ARIMA) model and Kalman filtering models. When dealing with the traffic only presenting regular variations, e.g., recurrent traffic congestion occurred in morning and evening rush hour, the parametric approaches can achieve promising prediction results. However, due to the stochastic and nonlinear nature of the road traffic, the traffic predictions of using the parametric approaches can deviate from the actual values especially in the abrupt traffic. Hence, instead of fitting the traffic data into a mathematical model as done by the parametric approach, an alternative way is using the nonparametric approaches where the machine learning (ML) based method is used. For example, a stacked autoencoder model can be used to learn the generic traffic flow features for the predictions. The long short-term memory (LSTM) recurrent neural network (RNN) can be used to predict the traffic flow, speed and occupancy, based on the data collected by the data collector and its upstream and downstream data collectors. 
Along with the use of RNN, the convolution neural network (CNN) can also be utilized to capture the latent traffic evolution patterns within the underlying road network.[P-48]

Here, Guo integrates real-time traffic data with past sensed traffic data, and this congestion-related traffic information is aggregated to enable the prediction of future traffic patterns.
Guo teaches the first model being a learning model for predicting, from values detected by the one or more sensors, congestion-related information of a time point at which the one or more sensors perform detection (Paragraph 60; Figure 3A):
 FIG. 3A illustrates the traffic prediction architecture 300 using distributed multi-task learning techniques according to some embodiments of the current invention, where an infrastructure device acts as learning server 115 to coordinate a set of data collectors 110, which serve as learning agents. In the FIG. 3A, there are N data collectors, data collector 1 is a toll station, data collector 2 is a loop detector, and data collector N is a camera. The learning server 115 designs M traffic models 310 such that each traffic model is targeted for a specific task, e.g., a model for rush hour traffic and another traffic model for off hour traffic, a model for city traffic and another model for freeway traffic. Therefore, these traffic models are different but correlated because traffic at one location can impact on traffic at other locations and traffic at one can impact on traffic at another time. Accordingly, the learning server 115 defines data clustering rules 320 to divide data into M clusters 330. The learning server 115 then distributes the rules 320 and traffic models 310 to data collectors 110. Correspondingly, each data collector 110 divides its data into M clusters 330. It is possible that some of data clusters are empty. In that case, the data collector will not train corresponding traffic model. In the FIG. 3A, data collector 1 has all data clusters, data collector 2 does not have cluster-M data and, and data collector N does not have cluster-1 data. As a result, data collector 1 will train all traffic models, data collector 2 does not train traffic model M, and data collector N does not train traffic model 1. After clustering its data, each data collector 110 starts the first round of the distributed training. Each data collector 110 trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. 
After certain iterations of training, each data collector sends the locally trained traffic models 340 to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models 350. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. For route planning application, the traffic speed is predicted. Finally, data collectors send their local traffic predictions 360 to the learning server to build a global traffic prediction model, i.e., time-dependent graph, that is then distributed to vehicles 120 for route planning[P-60]
Here, P-60 and Fig. 3A illustrate and explain that several different traffic models are generated, where one (first) model is a learning model for predicting, from values detected by the one or more sensors (toll station), congestion-related information of a time point (rush hour) at which the one or more sensors perform detection.
Guo does not verbatim teach using a second model to predict a future congestion level of any of the m areas from the acquired values, the first model being a learning model for predicting, from congestion-related information, values to be detected by the one or more sensors at a time point at which the congestion-related information is acquired, the second model being generated by using, as input data, predicted values obtained by using a first model to predict the values to be detected by the one or more sensors at a first time point, from congestion-related information of the first time point indicated by the past congestion-area data, and by using, as correct data, a congestion level in the congestion-related information of a second time point indicated by the past congestion-area data, the second time point being a time point after the first time point.
However, P-60 teaches multiple traffic models, where, for example, one (first) model is a learning model for predicting, from values detected by the one or more sensors (toll station), congestion-related information of a time point (rush hour) at which the one or more sensors perform detection. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. [P-60]
Hence, the next round of learning entails generating a second model by using the predicted values (the previous aggregated traffic model used to predict traffic congestion, i.e., the first traffic model) as input data, and by using the congestion level in the congestion-related information of a second time point (a different time point in rush hour) indicated by the past congestion-area data as correct data. The second model is a learning model for predicting a future congestion level from values detected by one or more sensors, where the second time point is a time point after the first time point (a different time in rush hour after the previous time in the preceding learning model).
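For illustration only, the claim 12 variant, in which the second model's input data are the sensor values predicted by the first model at the first time point rather than the congestion-related information itself, can be sketched as follows. The list-per-time-step layout, the fixed `horizon`, and the toy `first_model` (a single sensor observing area 0) are hypothetical assumptions, not taken from Guo or the applicant.

```python
from typing import Callable, List, Sequence, Tuple


def build_pairs_from_predicted_values(
    infos: Sequence[List[float]],   # congestion-related info per time step (m areas)
    levels: Sequence[List[float]],  # congestion level per time step (m areas)
    first_model: Callable[[List[float]], List[float]],  # info -> predicted sensor values
    horizon: int = 1,
) -> List[Tuple[List[float], List[float]]]:
    if horizon <= 0:
        raise ValueError("the second time point must come after the first")
    if len(infos) != len(levels):
        raise ValueError("infos and levels must cover the same time steps")
    pairs = []
    for t1 in range(len(infos) - horizon):
        predicted_values = first_model(infos[t1])               # values the sensors would detect at t1
        pairs.append((predicted_values, levels[t1 + horizon]))  # correct data at t2
    return pairs


infos = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
levels = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
pairs = build_pairs_from_predicted_values(infos, levels, first_model=lambda info: [10 * info[0]])
print(pairs)  # [([0.0], [0.3, 0.4]), ([10.0], [0.5, 0.6])]
```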
Therefore, although Guo does not verbatim recite generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data as input data, Guo teaches a learning server that regenerates a traffic model (a second model) from previously generated model(s) using previous traffic-related data together with real-time traffic-related data. It would therefore be obvious to one of ordinary skill in the art that Guo's teaching reads on the applicant's second generated model, by substituting Guo's teaching for the applicant's teaching to yield the same results, in order to enable a more efficient and robust traffic prediction model.
Guo, however, fails to teach displaying, by the processor, to users at the station the future congestion level of any of the m areas.
Guo also fails to teach acquiring, by the processor, present values detected by the one or more sensors of the station; using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection; using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level of any of the m areas; and displaying, by the processor, to users at the station the predicted congestion-related information.
Huang, however, teaches acquiring, by the processor, present values detected by the one or more sensors of the station; and using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection (Page 3, Paragraph 2; Page 5, Paragraphs 5-10):
Preferably, the monitoring terminal comprises a traffic monitoring terminal for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal for counting the pedestrian density at a certain time in a certain time, and a conflict monitoring terminal for counting the pedestrian conflict degree in a certain area and an abnormal monitoring terminal for identifying the pedestrian abnormal behaviour in a certain area.[Pg 3, P-2]
Further, the monitoring terminal 2 comprises a traffic monitoring terminal 201 for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal 202 for counting a certain area pedestrian density at a certain time, a conflict monitoring terminal 203 for counting pedestrian conflict degree in a certain area and an abnormal monitoring terminal 204 for identifying pedestrian abnormal behavior in a certain area, each terminal is composed of a sensor, an embedded processing chip, a processing program and a communication module.[Pg 5, P-5]
the sensor of the flow monitoring terminal 201 is a binocular video camera embedded processing chip and flow statistics method according to the video data obtained by the sensor, counting the pedestrian number in and out of the specified area and calculating the flow rate (unit: The cumulative flow rate (unit: 1 %) and the cumulative flow rate (unit: human), then the traffic data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-6]
the sensor of the density monitoring terminal 202 is a wide-angle video camera embedded processing chip and density statistical method according to the video data obtained by the sensor, counting the pedestrian distribution of the specified area and calculating the density (unit of each point after discrete area of the area: the density data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-7]
the sensor of the conflict monitoring terminal 203 is high video camera, embedded processing chip and conflict calculation algorithm according to the video data obtained by the sensor, counting the appointed area pedestrian speed direction and size change rate, relative direction pedestrian number, local density and other parameters, and according to the direction and size change rate of the pedestrian speed in the specified area, relative direction pedestrian number, local density parameter calculating conflict coefficient (dimensionless parameter), then the conflict coefficient is uploaded to the management control terminal 1 through the communication module[Pg 5, P-8]
the sensor of the abnormal monitoring terminal 204 is high video camera, embedded processing chip and the abnormal detection algorithm according to the video data obtained by the sensor, detecting whether there is abnormal behaviour in the specified area, such as a fight, terrorist attack, fast escape, crowding, then the type code of the abnormal behaviour is uploaded to the management control terminal 1 through the communication module. [Pg 5, P-9]
Further, the guide terminal 3 comprises an acoustic guide terminal 301 for conveying sound instruction, for area division, route guidance and information display of the optical guide terminal 302, for issuing the mobile phone message of the information issuing terminal 304. The area separation terminal 303 for the dynamic adjustment of the pedestrian and vehicular traffic area and the guide terminal 305 for order maintenance and manual assistance 305. [Pg 5, P-10]

Huang further teaches using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level (Page 3, Paragraphs 3, 4, 9; Page 3, Paragraph 9-Page 4, Paragraph 1; Page 6, Paragraphs 2, 3, 7, 8):
Preferably, the management control terminal comprises a region modeling module, a distribution estimation module connected with the region modeling module signal, a distribution prediction module connected with the distribution estimation module signal, and a risk evaluation module connected with the distribution prediction module and the distribution estimation module signal, a simulation and optimization module connected with the risk evaluation module signal and a guide control module connected with the simulation and optimization module signal. [Pg 3, P-3]
Preferably, the simulation and optimization module comprises a personnel evacuation model and personnel flow model, when the risk assessment result of the risk assessment module is low comfort level personnel flow module, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of simulating pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and flow inducing scheme [Pg 3, P-4]
the distribution prediction module according to the history data of the distribution estimation result, establishing model based on machine learning algorithm to predict the future period of pedestrian distribution parameter, and uploading data continuous learning model parameter according to the future monitoring terminal.[Pg 3, P-9]
Preferably, the risk evaluation module signal is connected with a conflict monitoring terminal and an abnormal monitoring terminal, and according to the distribution prediction module, the output result of the distribution estimation module and conflict monitoring terminal, the uploading data of the abnormal monitoring terminal evaluates the possibility of the danger of congestion and trampling on the comfort level region. [Pg 3, P-9-Pg 4, P-1]
Further, the management control terminal 1 comprises a region modeling module 101, and the region modeling module 101 signal connection of the distribution estimation module 102, and the distribution prediction module 102 signal connection distribution prediction module 103. a risk evaluation module 104 connected with the distribution prediction module 103 and the distribution estimation module 102, a simulation and optimization module 105 connected with the risk evaluation module 104 signal and a guide control module 106 connected with the simulation and optimization module 105 signal.[Pg 6, P-2]
Further, the simulation and optimization module 105 comprises a personnel evacuation model and personnel flow model, when the risk evaluation result of the risk evaluation module 104 is low comfort level personnel flow module, management control terminal 1 sends attention area low comfort level alarm, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of the simulated pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and a flow inducing scheme, after the management staff 4 of the auditing by the guide control module 106 sends the inducing information to the guide terminal 3;[Pg 6, P-3]
the distribution prediction module 103 according to the history data of the distribution estimation result, based on the machine learning algorithm establishing model prediction future period of pedestrian distribution parameter, and according to the future monitoring terminal 2 uploading data continuous learning model parameter.[Pg 6, P-7]
Further, the risk evaluation module 104 signal connected with the conflict monitoring terminal 203 and the abnormal monitoring terminal 204, and according to the distribution prediction module 103, distribution estimation module 102 the output result and conflict monitoring terminal 203. The uploading data of the abnormal monitoring terminal 204 evaluates the possibility of the comfort level of congestion and trampling on the region of interest and the occurrence of congestion and trample.[Pg 6, P-8]
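Huang does not disclose which machine learning algorithm the distribution prediction module uses, or how its model parameters are continuously learned from newly uploaded monitoring data. Purely to illustrate that concept, the following minimal Python sketch assumes a hypothetical one-step linear predictor d[t+1] ≈ a * d[t] + b for a single pedestrian-distribution parameter (e.g., the density of one area), whose parameters receive one stochastic-gradient update per uploaded observation. The class name, learning rate, and update rule are assumptions, not Huang's method.

```python
from typing import Optional


class OnlineDistributionPredictor:
    """Hypothetical one-step predictor d[t+1] ~ a * d[t] + b for one
    distribution parameter (e.g., pedestrian density of one area).
    Assumption only: Huang does not specify the learning algorithm."""

    def __init__(self, lr: float = 0.05) -> None:
        self.a = 0.0
        self.b = 0.0
        self.lr = lr
        self.last: Optional[float] = None  # most recent uploaded observation

    def update(self, value: float) -> None:
        """Continuous learning: one gradient step on the squared error of
        predicting `value` from the previous observation."""
        if self.last is not None:
            err = self.a * self.last + self.b - value
            self.a -= self.lr * err * self.last
            self.b -= self.lr * err
        self.last = value

    def predict_next(self) -> float:
        """Predict the parameter for the next time period."""
        if self.last is None:
            raise ValueError("no observations uploaded yet")
        return self.a * self.last + self.b
```

In this sketch, the history of distribution-estimation results would first be replayed through `update`, and each later upload from a monitoring terminal would call `update` again, so the model parameters keep adapting without retraining from scratch.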


Here, we see Huang teaching data from one or more sensors at a specific area of interest (station) and using a module synonymous with the first model to predict, from the sensor data, congestion information relating to the one or more sensors that detect the congestion. Furthermore, Huang uses a module for capturing and evaluating congestion data, synonymous with the second prediction model/deep learning model, to analyze time-series data in more depth and to predict more accurate parameters affecting future traffic congestion.
Huang then teaches route guidance and information display by the optical guide terminal 302 and issuing of mobile phone messages by the information issuing terminal 304, thereby displaying congestion result data as well as predicted congestion data.
When taken in combination with Guo’s teaching of the learning server regenerating a traffic model (a second model) after previously generated model(s), using previous traffic-related data together with real-time traffic-related data, one of ordinary skill in the art may then be able to determine the future congestion level of any of the specific (m) areas of the various stations and thereafter display, by the processor, to users at the station the predicted congestion-related information.
It would have been obvious to one of ordinary skill in the art to combine Guo’s teaching with Huang’s teaching in order to have a more accurate prediction model to determine congestion/traffic activities and behaviors within a given area.

In regards to claim 13, Guo teaches a non-transitory computer-readable medium that stores therein a program that causes a computer to execute processes of: storing past congestion-area data indicating congestion-related information including a past congestion level of each of m areas (where m is an integer of two or more); storing past sensor data indicating values detected in the past by one or more sensors installed in n areas (where n is an integer of one or more and n<m) out of the m areas (Paragraphs 9, 11, 17, 60, 63)
It is one object of some embodiments to provide a distributed machine learning based vehicular traffic prediction method that can accurately predict short time, middle time and long time traffic for both city roads and highways. Additionally, it is another object of some embodiments to make optimal route planning for vehicles to minimize traffic congestion and travel time by using the accurate traffic prediction.[P-9]
A learning server coordinates distributed learning among a set of data collectors. A learning server first designs and distributes the traffic models such as neural networks to the set of data collectors for the first round of the distributed training Each data collector then trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. After certain iterations of training process, each data collector sends the trained traffic models to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models. Upon completion of the model aggregation, the learning server re-distributes the aggregated traffic models to data collectors for the second round of training process. This process of training and aggregation continues until the robust traffic models are built.[P-11]
According to some embodiments of the present invention, a distributed machine learning based traffic prediction method can be provided for predicting traffic of roads. In this case, the distributed machine learning based traffic prediction method may a computer-implemented distributed machine learning based traffic prediction method for predicting traffic of roads. The method may include distributing global multi-task traffic models by a learning server to learning agents to locally train the traffic models; uploading locally trained global multi-task traffic models to the learning server, wherein the locally trained global multi-task traffic models have been trained by the learning agents; updating the global multi-task traffic models by the learning server using the locally trained global multi-task traffic models uploaded from the learning agents; generating a time-dependent global traffic map by the learning server using the updated global multi-task traffic models; and distributing the time-dependent global traffic map to each of vehicles traveling on the roads.[P-17]
 FIG. 3A illustrates the traffic prediction architecture 300 using distributed multi-task learning techniques according to some embodiments of the current invention, where an infrastructure device acts as learning server 115 to coordinate a set of data collectors 110, which serve as learning agents. In the FIG. 3A, there are N data collectors, data collector 1 is a toll station, data collector 2 is a loop detector, and data collector N is a camera. The learning server 115 designs M traffic models 310 such that each traffic model is targeted for a specific task, e.g., a model for rush hour traffic and another traffic model for off hour traffic, a model for city traffic and another model for freeway traffic. Therefore, these traffic models are different but correlated because traffic at one location can impact on traffic at other locations and traffic at one can impact on traffic at another time. Accordingly, the learning server 115 defines data clustering rules 320 to divide data into M clusters 330. The learning server 115 then distributes the rules 320 and traffic models 310 to data collectors 110. Correspondingly, each data collector 110 divides its data into M clusters 330. It is possible that some of data clusters are empty. In that case, the data collector will not train corresponding traffic model. In the FIG. 3A, data collector 1 has all data clusters, data collector 2 does not have cluster-M data and, and data collector N does not have cluster-1 data. As a result, data collector 1 will train all traffic models, data collector 2 does not train traffic model M, and data collector N does not train traffic model 1. After clustering its data, each data collector 110 starts the first round of the distributed training. Each data collector 110 trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. 
After certain iterations of training, each data collector sends the locally trained traffic models 340 to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models 350. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. For route planning application, the traffic speed is predicted. Finally, data collectors send their local traffic predictions 360 to the learning server to build a global traffic prediction model, i.e., time-dependent graph, that is then distributed to vehicles 120 for route planning[P-60]
The learning server 115 may be referred to as a distributed machine learning based traffic predication server. The learning server 115 is configured to provide map information with respect to traffic predictions to vehicles 120 traveling on a road network. The learning server 115 may include one or more processors 121, a memory 140, a memory unit/storage 200 configured to store traffic prediction neural networks 132, traffic data cluster rules 134, global time-dependent map 135, trained multi-task traffic models 136, global map of road network 137, an input interface (or transceiver) 150 configured to communicate with the learning agent 110 via the communication network 112 and update model parameters of the traffic prediction neural networks 132. The trained multi-task traffic models 136 may be the traffic models 310 shown in FIG. 3A.[P-63]

Guo establishes M clusters representing different traffic models attributed to different traffic areas; for example, Guo illustrates that one area may be freeway traffic, which would be labeled Cluster 1, and another area may be city traffic (Cluster 2), thereby satisfying the condition that m be two or more. Furthermore, Guo teaches data collectors/sensors, established as data collectors 1 through N, each of which may be a toll station, a loop detector, or a camera. Accordingly, depending on the area for which the traffic/cluster model is generated, a desired number of sensors/detectors may be set such that the claimed relationship n < m (data collectors to cluster models) is met.
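The clustering step of FIG. 3A, in which each data collector divides its own data into M clusters and does not train the model for an empty cluster, can be sketched as follows. This is a minimal Python illustration assuming M = 2 and a hypothetical rule that places rush-hour records in cluster 0 and off-hour records in cluster 1; the record layout, the rush-hour set, and the function names are assumptions, not Guo's clustering rules 320.

```python
from typing import Callable, Dict, List, Set

Record = Dict[str, float]  # hypothetical record layout, e.g. {"hour": 8, "speed": 41.5}
Rule = Callable[[Record], int]

RUSH_HOURS = {7, 8, 9, 17, 18, 19}  # assumed rush-hour definition


def rush_hour_rule(record: Record) -> int:
    """Hypothetical clustering rule (M = 2): 0 = rush hour, 1 = off hour."""
    return 0 if int(record["hour"]) in RUSH_HOURS else 1


def split_into_clusters(records: List[Record], rule: Rule, m: int) -> List[List[Record]]:
    """Divide one collector's own data into m clusters using the distributed rule."""
    clusters: List[List[Record]] = [[] for _ in range(m)]
    for record in records:
        clusters[rule(record)].append(record)
    return clusters


def models_to_train(records: List[Record], rule: Rule, m: int) -> Set[int]:
    """A collector trains traffic model k only if its cluster k is non-empty."""
    clusters = split_into_clusters(records, rule, m)
    return {k for k, cluster in enumerate(clusters) if cluster}
```

For example, a collector holding records at hours 8 and 14 would train both models, while a collector holding only off-hour data would train model 1 alone, mirroring how, in FIG. 3A, a collector lacking a cluster skips the corresponding traffic model.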
Guo then teaches generating a first model by using the values indicated by the past sensor data as input data and using the congestion-related information indicated by the past congestion-area data as correct data (Paragraph 48)
To reap all the aforementioned benefits, the traffic prediction must process the real-time and historical traffic data and observations collected by data collectors and mobile devices. For example, the inductive loop can measure the travel speed by reading the inductance changes over time and such data can be used for the traffic speed prediction. In addition, the wide use of mobile devices (e.g., on-board global position systems and phones) enables the mobility data to be crowdsourced from the general public, further facilitating the unprecedented traffic data collection. Such emerging big data can substantially augment the data availability in terms of the coverage and fidelity and significantly boost the data-driven traffic prediction. The prior art on the traffic prediction can be mainly grouped into two categories. The first category focus on using parametric approaches, such as autoregressive integrated moving average (ARIMA) model and Kalman filtering models. When dealing with the traffic only presenting regular variations, e.g., recurrent traffic congestion occurred in morning and evening rush hour, the parametric approaches can achieve promising prediction results. However, due to the stochastic and nonlinear nature of the road traffic, the traffic predictions of using the parametric approaches can deviate from the actual values especially in the abrupt traffic. Hence, instead of fitting the traffic data into a mathematical model as done by the parametric approach, an alternative way is using the nonparametric approaches where the machine learning (ML) based method is used. For example, a stacked autoencoder model can be used to learn the generic traffic flow features for the predictions. The long short-term memory (LSTM) recurrent neural network (RNN) can be used to predict the traffic flow, speed and occupancy, based on the data collected by the data collector and its upstream and downstream data collectors. 
Along with the use of RNN, the convolution neural network (CNN) can also be utilized to capture the latent traffic evolution patterns within the underlying road network.[P-48]

Here, we see real-time traffic data integrated with past sensed traffic data and congestion-related traffic information, which are aggregated to enable the prediction of future traffic patterns.
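The first-model training data recited in the claim (past sensor values as input data, congestion-related information of the same time point as correct data) can be illustrated by pairing the two stored datasets on their shared timestamps. Below is a minimal Python sketch assuming hypothetical dictionaries keyed by an integer time index, holding sensor values from the n sensor areas and one congestion value per area for all m areas; the data layouts and the function name are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical layouts (assumptions):
#   past_sensor[t]     -> values detected at time t by the sensors in the n areas
#   past_congestion[t] -> congestion-related information of all m areas at time t
SensorData = Dict[int, List[float]]
CongestionData = Dict[int, List[float]]


def first_model_pairs(past_sensor: SensorData,
                      past_congestion: CongestionData) -> List[Tuple[List[float], List[float]]]:
    """Pair sensor values (input data) with the congestion-related information
    of the SAME time point (correct data); timestamps missing from either
    dataset are skipped."""
    return [(past_sensor[t], past_congestion[t])
            for t in sorted(past_sensor) if t in past_congestion]
```

With n = 1 sensor area and m = 3 areas, each pair maps a one-element input to a three-element target, which is how values detected in fewer areas would be used to estimate congestion in more areas.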
Guo teaches the first model being a learning model for predicting, from values detected by the one or more sensors, congestion-related information of a time point at which the one or more sensors perform detection (Paragraph 60; Figure 3A)
 FIG. 3A illustrates the traffic prediction architecture 300 using distributed multi-task learning techniques according to some embodiments of the current invention, where an infrastructure device acts as learning server 115 to coordinate a set of data collectors 110, which serve as learning agents. In the FIG. 3A, there are N data collectors, data collector 1 is a toll station, data collector 2 is a loop detector, and data collector N is a camera. The learning server 115 designs M traffic models 310 such that each traffic model is targeted for a specific task, e.g., a model for rush hour traffic and another traffic model for off hour traffic, a model for city traffic and another model for freeway traffic. Therefore, these traffic models are different but correlated because traffic at one location can impact on traffic at other locations and traffic at one can impact on traffic at another time. Accordingly, the learning server 115 defines data clustering rules 320 to divide data into M clusters 330. The learning server 115 then distributes the rules 320 and traffic models 310 to data collectors 110. Correspondingly, each data collector 110 divides its data into M clusters 330. It is possible that some of data clusters are empty. In that case, the data collector will not train corresponding traffic model. In the FIG. 3A, data collector 1 has all data clusters, data collector 2 does not have cluster-M data and, and data collector N does not have cluster-1 data. As a result, data collector 1 will train all traffic models, data collector 2 does not train traffic model M, and data collector N does not train traffic model 1. After clustering its data, each data collector 110 starts the first round of the distributed training. Each data collector 110 trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. 
After certain iterations of training, each data collector sends the locally trained traffic models 340 to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models 350. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. For route planning application, the traffic speed is predicted. Finally, data collectors send their local traffic predictions 360 to the learning server to build a global traffic prediction model, i.e., time-dependent graph, that is then distributed to vehicles 120 for route planning[P-60]
Here, P-60 and Fig. 3A illustrate and explain that several different traffic models are generated, where one (first) model is a learning model for predicting, from values detected by the one or more sensors (toll station), congestion-related information of a time point (rush hour) at which the one or more sensors perform detection.
Guo does not expressly teach generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from the congestion-related information, the second time point being a time point after the first time point; acquiring the values detected by the one or more sensors; using the first model to predict, from the acquired values, the congestion-related information of a time point at which the one or more sensors perform detection; and using the second model to predict, from the predicted congestion-related information, a future congestion level of any of the m areas.
However, P-60 teaches multiple traffic models where, for example, one (first) model is a learning model for predicting, from values detected by the one or more sensors (toll station), congestion-related information of a time point (rush hour) at which the one or more sensors perform detection. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc.[P-60]
Hence, the next round of learning entails generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data (the previous aggregated traffic model) as input data and using the congestion level in the congestion-related information of a second time point (a different time point in rush hour) indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from the congestion-related information, and the second time point being a time point after the first time point (a different time in rush hour after the previous time in the preceding learning model) and in different areas (toll station or camera(s) at a different location). The regenerated (second) model is essentially derived from updated and previously predicted congestion-related information and is used to predict a future congestion level of any of the m areas.
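Unlike the first model, the claimed second model is trained on a time shift: congestion-related information of a first time point as input data, and the congestion level of a later, second time point as correct data. Below is a minimal Python sketch of building such pairs, assuming a hypothetical layout in which each time point's congestion-related information is a dict containing a "level" entry (one congestion level per area) and a fixed, assumed prediction horizon; the names are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical layout (assumption): past_congestion[t] = {"level": [...], ...other fields}
CongestionInfo = Dict[str, List[float]]


def second_model_pairs(past_congestion: Dict[int, CongestionInfo],
                       horizon: int) -> List[Tuple[CongestionInfo, List[float]]]:
    """Input data: congestion-related information at a first time point t.
    Correct data: congestion level at the second time point t + horizon (> t)."""
    if horizon <= 0:
        raise ValueError("the second time point must be after the first")
    return [(past_congestion[t], past_congestion[t + horizon]["level"])
            for t in sorted(past_congestion) if t + horizon in past_congestion]
```

The guard on `horizon` encodes the claim's requirement that the second time point follow the first.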
Therefore, though Guo does not expressly recite generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data as input data, Guo teaches the learning server regenerating a traffic model (a second model) after previously generated model(s), using previous traffic-related data together with real-time traffic-related data. It would have been obvious to one of ordinary skill in the art that Guo’s teaching reads on the applicant’s generated second model, because substituting Guo’s teaching for the applicant’s yields the same results, enabling a more efficient and robust traffic prediction model.
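The training-and-aggregation rounds on which this reasoning relies (Guo, P-11 and P-60) can be sketched as a simple federated-averaging loop. This is a minimal Python illustration under stated assumptions: each traffic model is a hypothetical one-feature linear model [w0, w1], local training is plain gradient descent, the server aggregates by an unweighted element-wise mean, and a fixed number of rounds stands in for Guo's "until the robust traffic models are built". Guo does not specify any of these choices.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # hypothetical (sensor value x, observed traffic y)


def local_train(weights: List[float], data: List[Sample],
                lr: float = 0.1, epochs: int = 50) -> List[float]:
    """Assumed local step: gradient descent on y ~ w0 + w1 * x using only this
    collector's own data (the raw data never leaves the collector)."""
    w0, w1 = weights
    n = len(data)
    for _ in range(epochs):
        g0 = sum(w0 + w1 * x - y for x, y in data) / n
        g1 = sum((w0 + w1 * x - y) * x for x, y in data) / n
        w0 -= lr * g0
        w1 -= lr * g1
    return [w0, w1]


def aggregate(models: List[List[float]]) -> List[float]:
    """Assumed aggregation rule: element-wise mean of the uploaded parameters."""
    return [sum(ws) / len(models) for ws in zip(*models)]


def distributed_training(collectors: List[List[Sample]], rounds: int = 20) -> List[float]:
    """Server distributes a model, collectors train locally, server aggregates
    and redistributes; repeated for a fixed number of rounds."""
    global_model = [0.0, 0.0]
    for _ in range(rounds):
        local_models = [local_train(global_model, data) for data in collectors]
        global_model = aggregate(local_models)
    return global_model
```

Only parameter vectors reach `aggregate`; each collector's samples stay inside `local_train`, consistent with Guo's statement that data collectors do not share their data with other collectors or the learning server.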
However, Guo fails to teach displaying, by the processor, to users at the station the future congestion level of any of the m areas.
Guo also fails to teach acquiring, by the processor, present values detected by the one or more sensors of the station; using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection; using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level of any of the m areas; and displaying, by the processor, to users at the station the predicted congestion-related information.
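The claimed prediction steps chain the two models: present sensor values go into the first model, and its output (the congestion-related information at the detection time point) goes into the second model. Below is a minimal Python sketch assuming a hypothetical 1-nearest-neighbour lookup over stored training pairs as a stand-in for both learning models; neither the claims nor the cited references prescribe this technique, and the function names are illustrative.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Model = Callable[[Sequence[float]], Vector]


def nearest_neighbor_model(pairs: List[Tuple[Vector, Vector]]) -> Model:
    """Stand-in learning model: return the target of the closest stored input
    (squared Euclidean distance)."""
    def predict(x: Sequence[float]) -> Vector:
        _, target = min(pairs, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))
        return target
    return predict


def predict_future_congestion(present_values: Sequence[float],
                              first_model: Model, second_model: Model) -> Vector:
    """Stage 1: present sensor values -> congestion-related information now.
    Stage 2: that predicted information -> future congestion level of the m areas."""
    info_now = first_model(present_values)
    return second_model(info_now)
```

Any trained regressor with the same call signature could replace the nearest-neighbour stand-in; the point of the sketch is only the order of the two stages.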
In contrast, Huang teaches acquiring, by the processor, present values detected by the one or more sensors of the station; and using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection (Page 3, Paragraph 2; Page 5, Paragraphs 5-10)
Preferably, the monitoring terminal comprises a traffic monitoring terminal for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal for counting the pedestrian density at a certain time in a certain time, and a conflict monitoring terminal for counting the pedestrian conflict degree in a certain area and an abnormal monitoring terminal for identifying the pedestrian abnormal behaviour in a certain area.[Pg 3, P-2]
Further, the monitoring terminal 2 comprises a traffic monitoring terminal 201 for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal 202 for counting a certain area pedestrian density at a certain time, a conflict monitoring terminal 203 for counting pedestrian conflict degree in a certain area and an abnormal monitoring terminal 204 for identifying pedestrian abnormal behavior in a certain area, each terminal is composed of a sensor, an embedded processing chip, a processing program and a communication module.[Pg 5, P-5]
the sensor of the flow monitoring terminal 201 is a binocular video camera embedded processing chip and flow statistics method according to the video data obtained by the sensor, counting the pedestrian number in and out of the specified area and calculating the flow rate (unit: The cumulative flow rate (unit: 1 %) and the cumulative flow rate (unit: human), then the traffic data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-6]
the sensor of the density monitoring terminal 202 is a wide-angle video camera embedded processing chip and density statistical method according to the video data obtained by the sensor, counting the pedestrian distribution of the specified area and calculating the density (unit of each point after discrete area of the area: the density data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-7]
the sensor of the conflict monitoring terminal 203 is high video camera, embedded processing chip and conflict calculation algorithm according to the video data obtained by the sensor, counting the appointed area pedestrian speed direction and size change rate, relative direction pedestrian number, local density and other parameters, and according to the direction and size change rate of the pedestrian speed in the specified area, relative direction pedestrian number, local density parameter calculating conflict coefficient (dimensionless parameter), then the conflict coefficient is uploaded to the management control terminal 1 through the communication module[Pg 5, P-8]
the sensor of the abnormal monitoring terminal 204 is high video camera, embedded processing chip and the abnormal detection algorithm according to the video data obtained by the sensor, detecting whether there is abnormal behaviour in the specified area, such as a fight, terrorist attack, fast escape, crowding, then the type code of the abnormal behaviour is uploaded to the management control terminal 1 through the communication module. [Pg 5, P-9]
Further, the guide terminal 3 comprises an acoustic guide terminal 301 for conveying sound instruction, for area division, route guidance and information display of the optical guide terminal 302, for issuing the mobile phone message of the information issuing terminal 304. The area separation terminal 303 for the dynamic adjustment of the pedestrian and vehicular traffic area and the guide terminal 305 for order maintenance and manual assistance 305. [Pg 5, P-10]

Huang further teaches using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level (Page 3, Paragraphs 3, 4, 9; Page 3, Paragraph 9-Page 4, Paragraph 1; Page 6, Paragraphs 2, 3, 7, 8)
Preferably, the management control terminal comprises a region modeling module, a distribution estimation module connected with the region modeling module signal, a distribution prediction module connected with the distribution estimation module signal, and a risk evaluation module connected with the distribution prediction module and the distribution estimation module signal, a simulation and optimization module connected with the risk evaluation module signal and a guide control module connected with the simulation and optimization module signal. [Pg 3, P-3]
Preferably, the simulation and optimization module comprises a personnel evacuation model and personnel flow model, when the risk assessment result of the risk assessment module is low comfort level personnel flow module, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of simulating pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and flow inducing scheme [Pg 3, P-4]
the distribution prediction module according to the history data of the distribution estimation result, establishing model based on machine learning algorithm to predict the future period of pedestrian distribution parameter, and uploading data continuous learning model parameter according to the future monitoring terminal.[Pg 3, P-9]
Preferably, the risk evaluation module signal is connected with a conflict monitoring terminal and an abnormal monitoring terminal, and according to the distribution prediction module, the output result of the distribution estimation module and conflict monitoring terminal, the uploading data of the abnormal monitoring terminal evaluates the possibility of the danger of congestion and trampling on the comfort level region. [Pg 3, P-9-Pg 4, P-1]
Further, the management control terminal 1 comprises a region modeling module 101, and the region modeling module 101 signal connection of the distribution estimation module 102, and the distribution prediction module 102 signal connection distribution prediction module 103. a risk evaluation module 104 connected with the distribution prediction module 103 and the distribution estimation module 102, a simulation and optimization module 105 connected with the risk evaluation module 104 signal and a guide control module 106 connected with the simulation and optimization module 105 signal.[Pg 6, P-2]
Further, the simulation and optimization module 105 comprises a personnel evacuation model and personnel flow model, when the risk evaluation result of the risk evaluation module 104 is low comfort level personnel flow module, management control terminal 1 sends attention area low comfort level alarm, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of the simulated pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and a flow inducing scheme, after the management staff 4 of the auditing by the guide control module 106 sends the inducing information to the guide terminal 3;[Pg 6, P-3]
the distribution prediction module 103 according to the history data of the distribution estimation result, based on the machine learning algorithm establishing model prediction future period of pedestrian distribution parameter, and according to the future monitoring terminal 2 uploading data continuous learning model parameter.[Pg 6, P-7]
Further, the risk evaluation module 104 signal connected with the conflict monitoring terminal 203 and the abnormal monitoring terminal 204, and according to the distribution prediction module 103, distribution estimation module 102 the output result and conflict monitoring terminal 203. The uploading data of the abnormal monitoring terminal 204 evaluates the possibility of the comfort level of congestion and trampling on the region of interest and the occurrence of congestion and trample.[Pg 6, P-8]


Here, we see Huang teaching data from one or more sensors at a specific area of interest (station) and using a module synonymous with the first model to predict, from the sensor data, congestion information relating to the one or more sensors that detect the congestion. Furthermore, Huang uses a module for capturing and evaluating congestion data, synonymous with the second prediction model/deep learning model, to analyze time-series data in more depth and to predict more accurate parameters affecting future traffic congestion.
Huang then teaches route guidance and information display by the optical guide terminal 302 and issuing of mobile phone messages by the information issuing terminal 304, thereby displaying congestion result data as well as predicted congestion data.
When taken in combination with Guo’s teaching of the learning server regenerating a traffic model (a second model) after previously generated model(s), using previous traffic-related data together with real-time traffic-related data, one of ordinary skill in the art may then be able to determine the future congestion level of any of the specific (m) areas of the various stations and thereafter display, by the processor, to users at the station the predicted congestion-related information.
It would have been obvious to one of ordinary skill in the art to combine Guo’s teaching with Huang’s teaching in order to have a more accurate prediction model to determine congestion/traffic activities and behaviors within a given area.

In regards to claim 14, Guo teaches a non-transitory computer-readable medium that stores therein a program that causes a computer to execute processes of: storing past congestion-area data indicating congestion-related information including a past congestion level of each of m areas (where m is an integer of two or more); storing past sensor data indicating values detected in the past by one or more sensors installed in n areas (where n is an integer of one or more and n<m) out of the m areas (Paragraphs 9, 11, 17, 60, 63)
It is one object of some embodiments to provide a distributed machine learning based vehicular traffic prediction method that can accurately predict short time, middle time and long time traffic for both city roads and highways. Additionally, it is another object of some embodiments to make optimal route planning for vehicles to minimize traffic congestion and travel time by using the accurate traffic prediction.[P-9]
A learning server coordinates distributed learning among a set of data collectors. A learning server first designs and distributes the traffic models such as neural networks to the set of data collectors for the first round of the distributed training Each data collector then trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. After certain iterations of training process, each data collector sends the trained traffic models to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models. Upon completion of the model aggregation, the learning server re-distributes the aggregated traffic models to data collectors for the second round of training process. This process of training and aggregation continues until the robust traffic models are built.[P-11]
According to some embodiments of the present invention, a distributed machine learning based traffic prediction method can be provided for predicting traffic of roads. In this case, the distributed machine learning based traffic prediction method may a computer-implemented distributed machine learning based traffic prediction method for predicting traffic of roads. The method may include distributing global multi-task traffic models by a learning server to learning agents to locally train the traffic models; uploading locally trained global multi-task traffic models to the learning server, wherein the locally trained global multi-task traffic models have been trained by the learning agents; updating the global multi-task traffic models by the learning server using the locally trained global multi-task traffic models uploaded from the learning agents; generating a time-dependent global traffic map by the learning server using the updated global multi-task traffic models; and distributing the time-dependent global traffic map to each of vehicles traveling on the roads.[P-17]
 FIG. 3A illustrates the traffic prediction architecture 300 using distributed multi-task learning techniques according to some embodiments of the current invention, where an infrastructure device acts as learning server 115 to coordinate a set of data collectors 110, which serve as learning agents. In the FIG. 3A, there are N data collectors, data collector 1 is a toll station, data collector 2 is a loop detector, and data collector N is a camera. The learning server 115 designs M traffic models 310 such that each traffic model is targeted for a specific task, e.g., a model for rush hour traffic and another traffic model for off hour traffic, a model for city traffic and another model for freeway traffic. Therefore, these traffic models are different but correlated because traffic at one location can impact on traffic at other locations and traffic at one can impact on traffic at another time. Accordingly, the learning server 115 defines data clustering rules 320 to divide data into M clusters 330. The learning server 115 then distributes the rules 320 and traffic models 310 to data collectors 110. Correspondingly, each data collector 110 divides its data into M clusters 330. It is possible that some of data clusters are empty. In that case, the data collector will not train corresponding traffic model. In the FIG. 3A, data collector 1 has all data clusters, data collector 2 does not have cluster-M data and, and data collector N does not have cluster-1 data. As a result, data collector 1 will train all traffic models, data collector 2 does not train traffic model M, and data collector N does not train traffic model 1. After clustering its data, each data collector 110 starts the first round of the distributed training. Each data collector 110 trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. 
After certain iterations of training, each data collector sends the locally trained traffic models 340 to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models 350. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. For route planning application, the traffic speed is predicted. Finally, data collectors send their local traffic predictions 360 to the learning server to build a global traffic prediction model, i.e., time-dependent graph, that is then distributed to vehicles 120 for route planning[P-60]
The learning server 115 may be referred to as a distributed machine learning based traffic predication server. The learning server 115 is configured to provide map information with respect to traffic predictions to vehicles 120 traveling on a road network. The learning server 115 may include one or more processors 121, a memory 140, a memory unit/storage 200 configured to store traffic prediction neural networks 132, traffic data cluster rules 134, global time-dependent map 135, trained multi-task traffic models 136, global map of road network 137, an input interface (or transceiver) 150 configured to communicate with the learning agent 110 via the communication network 112 and update model parameters of the traffic prediction neural networks 132. The trained multi-task traffic models 136 may be the traffic models 310 shown in FIG. 3A.[P-63]
Guo establishes M clusters representing different traffic models attributed to different traffic areas; for example, one area may be freeway traffic, which would be labeled Cluster 1, and another area may be city traffic (Cluster 2), thereby satisfying the condition that m is two or more. Furthermore, Guo teaches data collectors/sensors, numbered 1 through N. A data collector may be a toll station, a loop detector, or a camera. Accordingly, depending on the area for which the traffic/cluster model is generated, a desired number of sensors/detectors may be set such that the relationship N<M (data collectors : cluster models) is met.
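For illustration only, the training-and-aggregation loop quoted above from Guo (P-11, P-60), with M cluster-specific models, N data collectors that skip any model whose cluster is empty, and a server that aggregates the locally trained models over repeated rounds, may be sketched as follows. Guo does not specify the model form or the aggregation rule; the one-parameter model fitted by gradient descent and the sample-weighted averaging (FedAvg-style) are assumptions, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def local_train(param: float, data: Sequence[float], lr: float = 0.1, steps: int = 20) -> float:
    """Hypothetical local training: fit one scalar (e.g. mean traffic speed)
    to the collector's own data by gradient descent on mean squared error."""
    for _ in range(steps):
        grad = sum(2.0 * (param - y) for y in data) / len(data)
        param -= lr * grad
    return param


def federated_rounds(
    collectors: List[Dict[int, List[float]]], num_models: int, rounds: int
) -> List[float]:
    """collectors[k][c] holds data collector k's samples for cluster c.

    A missing or empty cluster means that collector does not train model c.
    """
    global_models = [0.0] * num_models  # initial M models designed by the server
    for _ in range(rounds):
        for c in range(num_models):
            # Each collector trains on its own data only; raw data never leaves it.
            updates: List[Tuple[float, int]] = []
            for data in collectors:
                cluster = data.get(c, [])
                if not cluster:
                    continue  # empty cluster: skip training model c
                updates.append((local_train(global_models[c], cluster), len(cluster)))
            if updates:
                # Assumed aggregation rule: sample-weighted average (FedAvg-style).
                total = sum(n for _, n in updates)
                global_models[c] = sum(p * n for p, n in updates) / total
    return global_models
```

Under these assumptions, each model converges toward the sample-weighted mean of the data held by the collectors that actually have its cluster, while a model whose cluster is empty at every collector keeps its initial value.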
Guo teaches generating a first model by using the congestion-related information indicated by the past congestion-area data as input data and using the values indicated by the past sensor data as correct data (Paragraph 48)
To reap all the aforementioned benefits, the traffic prediction must process the real-time and historical traffic data and observations collected by data collectors and mobile devices. For example, the inductive loop can measure the travel speed by reading the inductance changes over time and such data can be used for the traffic speed prediction. In addition, the wide use of mobile devices (e.g., on-board global position systems and phones) enables the mobility data to be crowdsourced from the general public, further facilitating the unprecedented traffic data collection. Such emerging big data can substantially augment the data availability in terms of the coverage and fidelity and significantly boost the data-driven traffic prediction. The prior art on the traffic prediction can be mainly grouped into two categories. The first category focus on using parametric approaches, such as autoregressive integrated moving average (ARIMA) model and Kalman filtering models. When dealing with the traffic only presenting regular variations, e.g., recurrent traffic congestion occurred in morning and evening rush hour, the parametric approaches can achieve promising prediction results. However, due to the stochastic and nonlinear nature of the road traffic, the traffic predictions of using the parametric approaches can deviate from the actual values especially in the abrupt traffic. Hence, instead of fitting the traffic data into a mathematical model as done by the parametric approach, an alternative way is using the nonparametric approaches where the machine learning (ML) based method is used. For example, a stacked autoencoder model can be used to learn the generic traffic flow features for the predictions. The long short-term memory (LSTM) recurrent neural network (RNN) can be used to predict the traffic flow, speed and occupancy, based on the data collected by the data collector and its upstream and downstream data collectors. 
Along with the use of RNN, the convolution neural network (CNN) can also be utilized to capture the latent traffic evolution patterns within the underlying road network.[P-48]

Here, Guo integrates real-time traffic data with past sensed traffic data, and the congestion-related traffic information is aggregated to enable the prediction of future traffic patterns.
Guo teaches the first model being a learning model for predicting, from the congestion-related information, values to be detected by the one or more sensors at a time point at which the congestion-related information is acquired; using the first model to predict, from the congestion-related information at a first time point indicated by the past congestion-area data, values to be detected by the one or more sensors at the first time point (Paragraph 60; Figure 3A)
 FIG. 3A illustrates the traffic prediction architecture 300 using distributed multi-task learning techniques according to some embodiments of the current invention, where an infrastructure device acts as learning server 115 to coordinate a set of data collectors 110, which serve as learning agents. In the FIG. 3A, there are N data collectors, data collector 1 is a toll station, data collector 2 is a loop detector, and data collector N is a camera. The learning server 115 designs M traffic models 310 such that each traffic model is targeted for a specific task, e.g., a model for rush hour traffic and another traffic model for off hour traffic, a model for city traffic and another model for freeway traffic. Therefore, these traffic models are different but correlated because traffic at one location can impact on traffic at other locations and traffic at one can impact on traffic at another time. Accordingly, the learning server 115 defines data clustering rules 320 to divide data into M clusters 330. The learning server 115 then distributes the rules 320 and traffic models 310 to data collectors 110. Correspondingly, each data collector 110 divides its data into M clusters 330. It is possible that some of data clusters are empty. In that case, the data collector will not train corresponding traffic model. In the FIG. 3A, data collector 1 has all data clusters, data collector 2 does not have cluster-M data and, and data collector N does not have cluster-1 data. As a result, data collector 1 will train all traffic models, data collector 2 does not train traffic model M, and data collector N does not train traffic model 1. After clustering its data, each data collector 110 starts the first round of the distributed training. Each data collector 110 trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. 
After certain iterations of training, each data collector sends the locally trained traffic models 340 to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models 350. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. For route planning application, the traffic speed is predicted. Finally, data collectors send their local traffic predictions 360 to the learning server to build a global traffic prediction model, i.e., time-dependent graph, that is then distributed to vehicles 120 for route planning[P-60]
Here, P-60 and Fig. 3A illustrate and explain that several different traffic models are generated, one of which (the first model) is a learning model for predicting, from values detected by the one or more sensors (e.g., a toll station), congestion-related information of a time point (e.g., rush hour) at which the one or more sensors perform detection.
Guo does not expressly teach generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from the congestion-related information, the second time point being a time point after the first time point; acquiring the values detected by the one or more sensors; using the first model to predict, from the acquired values, the congestion-related information of a time point at which the one or more sensors perform detection; and using the second model to predict, from the predicted congestion-related information, a future congestion level of any of the m areas.
However, P-60 teaches multiple traffic models where, for example, one model (the first model) is a learning model for predicting, from values detected by the one or more sensors (e.g., a toll station), congestion-related information of a time point (e.g., rush hour) at which the one or more sensors perform detection. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc.[P-60]
Hence, the next round of learning entails generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data (the previously aggregated traffic model) as input data and using the congestion level in the congestion-related information of a second time point (a different time point in rush hour) indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from the congestion-related information, the second time point being a time point after the first time point (a later time in rush hour than the time used in the preceding learning model), and covering different areas (a toll station, or camera(s) at different locations). The regenerated (second) model is thus derived from updated and previously predicted congestion-related information and is used to predict a future congestion level of any of the m areas.
Therefore, although Guo does not expressly recite generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data as input data, Guo teaches a learning server that regenerates a traffic model (a second model) from previously generated model(s), using previous traffic-related data together with real-time traffic-related data. It would therefore have been obvious to one of ordinary skill in the art that Guo’s teaching reads on the applicant’s second generated model, by substituting Guo’s teaching for the applicant’s teaching to yield the same results, in order to enable a more efficient and robust traffic prediction model.
However, Guo fails to teach displaying, by the processor, to users at the station the future congestion level of any of the m areas.
Guo also fails to teach acquiring, by the processor, present values detected by the one or more sensors of the station; using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection; using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level of any of the m areas; and displaying, by the processor, to users at the station the predicted congestion-related information.
Huang, however, teaches acquiring, by the processor, present values detected by the one or more sensors of the station; and using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection (Page 3, Paragraph 2; Page 5, Paragraphs 5-10).
Preferably, the monitoring terminal comprises a traffic monitoring terminal for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal for counting the pedestrian density at a certain time in a certain time, and a conflict monitoring terminal for counting the pedestrian conflict degree in a certain area and an abnormal monitoring terminal for identifying the pedestrian abnormal behaviour in a certain area.[Pg 3, P-2]
Further, the monitoring terminal 2 comprises a traffic monitoring terminal 201 for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal 202 for counting a certain area pedestrian density at a certain time, a conflict monitoring terminal 203 for counting pedestrian conflict degree in a certain area and an abnormal monitoring terminal 204 for identifying pedestrian abnormal behavior in a certain area, each terminal is composed of a sensor, an embedded processing chip, a processing program and a communication module.[Pg 5, P-5]
the sensor of the flow monitoring terminal 201 is a binocular video camera embedded processing chip and flow statistics method according to the video data obtained by the sensor, counting the pedestrian number in and out of the specified area and calculating the flow rate (unit: The cumulative flow rate (unit: 1 %) and the cumulative flow rate (unit: human), then the traffic data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-6]
the sensor of the density monitoring terminal 202 is a wide-angle video camera embedded processing chip and density statistical method according to the video data obtained by the sensor, counting the pedestrian distribution of the specified area and calculating the density (unit of each point after discrete area of the area: the density data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-7]
the sensor of the conflict monitoring terminal 203 is high video camera, embedded processing chip and conflict calculation algorithm according to the video data obtained by the sensor, counting the appointed area pedestrian speed direction and size change rate, relative direction pedestrian number, local density and other parameters, and according to the direction and size change rate of the pedestrian speed in the specified area, relative direction pedestrian number, local density parameter calculating conflict coefficient (dimensionless parameter), then the conflict coefficient is uploaded to the management control terminal 1 through the communication module[Pg 5, P-8]
the sensor of the abnormal monitoring terminal 204 is high video camera, embedded processing chip and the abnormal detection algorithm according to the video data obtained by the sensor, detecting whether there is abnormal behaviour in the specified area, such as a fight, terrorist attack, fast escape, crowding, then the type code of the abnormal behaviour is uploaded to the management control terminal 1 through the communication module. [Pg 5, P-9]
Further, the guide terminal 3 comprises an acoustic guide terminal 301 for conveying sound instruction, for area division, route guidance and information display of the optical guide terminal 302, for issuing the mobile phone message of the information issuing terminal 304. The area separation terminal 303 for the dynamic adjustment of the pedestrian and vehicular traffic area and the guide terminal 305 for order maintenance and manual assistance 305. [Pg 5, P-10]

Huang further teaches using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level (Page 3, Paragraphs 3, 4, 9; Page 3, Paragraph 9 to Page 4, Paragraph 1; Page 6, Paragraphs 2, 3, 7, 8).
Preferably, the management control terminal comprises a region modeling module, a distribution estimation module connected with the region modeling module signal, a distribution prediction module connected with the distribution estimation module signal, and a risk evaluation module connected with the distribution prediction module and the distribution estimation module signal, a simulation and optimization module connected with the risk evaluation module signal and a guide control module connected with the simulation and optimization module signal. [Pg 3, P-3]
Preferably, the simulation and optimization module comprises a personnel evacuation model and personnel flow model, when the risk assessment result of the risk assessment module is low comfort level personnel flow module, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of simulating pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and flow inducing scheme [Pg 3, P-4]
the distribution prediction module according to the history data of the distribution estimation result, establishing model based on machine learning algorithm to predict the future period of pedestrian distribution parameter, and uploading data continuous learning model parameter according to the future monitoring terminal.[Pg 3, P-9]
Preferably, the risk evaluation module signal is connected with a conflict monitoring terminal and an abnormal monitoring terminal, and according to the distribution prediction module, the output result of the distribution estimation module and conflict monitoring terminal, the uploading data of the abnormal monitoring terminal evaluates the possibility of the danger of congestion and trampling on the comfort level region. [Pg 3, P-9-Pg 4, P-1]
Further, the management control terminal 1 comprises a region modeling module 101, and the region modeling module 101 signal connection of the distribution estimation module 102, and the distribution prediction module 102 signal connection distribution prediction module 103. a risk evaluation module 104 connected with the distribution prediction module 103 and the distribution estimation module 102, a simulation and optimization module 105 connected with the risk evaluation module 104 signal and a guide control module 106 connected with the simulation and optimization module 105 signal.[Pg 6, P-2]
Further, the simulation and optimization module 105 comprises a personnel evacuation model and personnel flow model, when the risk evaluation result of the risk evaluation module 104 is low comfort level personnel flow module, management control terminal 1 sends attention area low comfort level alarm, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of the simulated pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and a flow inducing scheme, after the management staff 4 of the auditing by the guide control module 106 sends the inducing information to the guide terminal 3;[Pg 6, P-3]
the distribution prediction module 103 according to the history data of the distribution estimation result, based on the machine learning algorithm establishing model prediction future period of pedestrian distribution parameter, and according to the future monitoring terminal 2 uploading data continuous learning model parameter.[Pg 6, P-7]
Further, the risk evaluation module 104 signal connected with the conflict monitoring terminal 203 and the abnormal monitoring terminal 204, and according to the distribution prediction module 103, distribution estimation module 102 the output result and conflict monitoring terminal 203. The uploading data of the abnormal monitoring terminal 204 evaluates the possibility of the comfort level of congestion and trampling on the region of interest and the occurrence of congestion and trample.[Pg 6, P-8]
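Huang's distribution prediction module is described as building a machine-learning model from the history of distribution-estimation results and continuously updating its parameters as the monitoring terminals upload new data (Pg 3, P-9; Pg 6, P-7). Huang does not identify the algorithm. As a hedged illustration only, the sketch below assumes an online ordinary least-squares fit of a one-step-ahead linear predictor (next value ≈ a × current value + b), refitted in closed form from running sums each time a new observation pair arrives; the class name and the linear form are hypothetical.

```python
class OnlineLinearPredictor:
    """Hypothetical continual-learning predictor: next value ~ a * current + b,
    refitted in closed form from running least-squares sums as data arrives."""

    def __init__(self) -> None:
        self.n = 0
        self.sx = 0.0   # sum of current values
        self.sy = 0.0   # sum of following values
        self.sxx = 0.0  # sum of current^2
        self.sxy = 0.0  # sum of current * following

    def update(self, current: float, following: float) -> None:
        # Fold one (value at t, value at t+1) pair into the running sums.
        self.n += 1
        self.sx += current
        self.sy += following
        self.sxx += current * current
        self.sxy += current * following

    def coefficients(self) -> tuple:
        if self.n == 0:
            raise ValueError("no data yet")
        denom = self.n * self.sxx - self.sx * self.sx
        if denom == 0:
            # All inputs identical: fall back to predicting the mean follow-up value.
            return 0.0, self.sy / self.n
        a = (self.n * self.sxy - self.sx * self.sy) / denom
        b = (self.sy - a * self.sx) / self.n
        return a, b

    def predict(self, current: float) -> float:
        a, b = self.coefficients()
        return a * current + b
```

Because only five running sums are stored, each newly uploaded observation updates the model in constant time without retaining the full history.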


Here, Huang teaches acquiring data from one or more sensors at a specific area of interest (a station) and using a module corresponding to the first model to predict, from the sensor data, congestion-related information relating to the one or more sensors that detect the congestion. Furthermore, Huang uses a module for capturing and evaluating congestion data, corresponding to the second prediction model (deep learning model), to analyze time-series data in greater depth and to predict more accurate parameters affecting future traffic congestion.
Huang further teaches route guidance and information display via the optical guide terminal 302, and issuing mobile phone messages via the information issuing terminal 304, thereby displaying congestion result data as well as predicted congestion data.
When Huang is taken in combination with Guo’s teaching of a learning server that regenerates a traffic model (a second model) from previously generated model(s), using previous traffic-related data together with real-time traffic-related data, one of ordinary skill in the art may then be able to determine the future congestion level of any of the m areas of the various stations and thereafter display, by the processor, the predicted congestion-related information to users at the station.
It would have been obvious to one of ordinary skill in the art to combine Guo’s teaching with Huang’s teaching in order to obtain a more accurate prediction model for determining congestion/traffic activity and behavior within a given area.
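For illustration only, the two-model chain at issue in claims 14 and 15 (acquire present sensor values from the n areas, estimate present congestion-related information for all m areas with the first model, then predict a later congestion level with the second model) may be sketched as follows. The sketch follows the training arrangement recited for claim 15, with sensor values as input and congestion-related information as correct data for the first model, and congestion information at a first time point as input and at the next (second) time point as correct data for the second model. It assumes a 1-nearest-neighbor regressor as the learning model for both stages and time-aligned samples (index t refers to the same time in both data sets, and consecutive indices are consecutive time points); the class and function names are hypothetical and are not taken from the application, Guo, or Huang.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


class NearestNeighborModel:
    """1-nearest-neighbor regressor: a hypothetical stand-in for a learning model."""

    def __init__(self) -> None:
        self._inputs: List[Vector] = []
        self._targets: List[Vector] = []

    def fit(self, inputs: Sequence[Vector], targets: Sequence[Vector]) -> "NearestNeighborModel":
        if len(inputs) != len(targets) or not inputs:
            raise ValueError("need equal, non-empty input/target lists")
        self._inputs = [tuple(x) for x in inputs]
        self._targets = [tuple(y) for y in targets]
        return self

    def predict(self, x: Sequence[float]) -> Vector:
        # Return the target paired with the closest stored input (Euclidean distance).
        best = min(range(len(self._inputs)), key=lambda i: math.dist(self._inputs[i], x))
        return self._targets[best]


def train_two_stage(past_sensor: Sequence[Vector], past_congestion: Sequence[Vector]):
    """past_sensor[t]: values of the n sensors at time t;
    past_congestion[t]: congestion levels of the m areas at time t (n < m)."""
    # First model: sensor values at t -> congestion of all m areas at the same t.
    first = NearestNeighborModel().fit(past_sensor, past_congestion)
    # Second model: congestion at t (first time point) -> congestion at t + 1 (second time point).
    second = NearestNeighborModel().fit(past_congestion[:-1], past_congestion[1:])
    return first, second


def predict_future(first: NearestNeighborModel, second: NearestNeighborModel,
                   present_sensor: Sequence[float]) -> Vector:
    # Chain: present sensor values -> estimated present congestion -> future congestion.
    return second.predict(first.predict(present_sensor))
```

With one sensor (n = 1) and three areas (m = 3), for example, the first model fills in congestion for areas that have no sensor, and the second model then extrapolates from that estimate, which is the dependency between the two models that the rejection relies on.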

Regarding claim 15, Guo teaches a learning method comprising: generating a first model that is a learning model for predicting, from values detected by one or more sensors, congestion-related information of a time point at which the one or more sensors perform detection, the first model being generated by using values indicated by past sensor data indicating the values detected in the past by the one or more sensors installed in n areas (where n is an integer of one or more and n<m) out of m areas (where m is an integer of two or more) as input data and using the congestion-related information indicated by past congestion-area data indicating the congestion-related information including a past congestion level of each of the m areas as correct data (Paragraphs 9, 11, 17, 48, 60, 63)
It is one object of some embodiments to provide a distributed machine learning based vehicular traffic prediction method that can accurately predict short time, middle time and long time traffic for both city roads and highways. Additionally, it is another object of some embodiments to make optimal route planning for vehicles to minimize traffic congestion and travel time by using the accurate traffic prediction.[P-9]
A learning server coordinates distributed learning among a set of data collectors. A learning server first designs and distributes the traffic models such as neural networks to the set of data collectors for the first round of the distributed training Each data collector then trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. After certain iterations of training process, each data collector sends the trained traffic models to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models. Upon completion of the model aggregation, the learning server re-distributes the aggregated traffic models to data collectors for the second round of training process. This process of training and aggregation continues until the robust traffic models are built.[P-11]
According to some embodiments of the present invention, a distributed machine learning based traffic prediction method can be provided for predicting traffic of roads. In this case, the distributed machine learning based traffic prediction method may a computer-implemented distributed machine learning based traffic prediction method for predicting traffic of roads. The method may include distributing global multi-task traffic models by a learning server to learning agents to locally train the traffic models; uploading locally trained global multi-task traffic models to the learning server, wherein the locally trained global multi-task traffic models have been trained by the learning agents; updating the global multi-task traffic models by the learning server using the locally trained global multi-task traffic models uploaded from the learning agents; generating a time-dependent global traffic map by the learning server using the updated global multi-task traffic models; and distributing the time-dependent global traffic map to each of vehicles traveling on the roads.[P-17]
To reap all the aforementioned benefits, the traffic prediction must process the real-time and historical traffic data and observations collected by data collectors and mobile devices. For example, the inductive loop can measure the travel speed by reading the inductance changes over time and such data can be used for the traffic speed prediction. In addition, the wide use of mobile devices (e.g., on-board global position systems and phones) enables the mobility data to be crowdsourced from the general public, further facilitating the unprecedented traffic data collection. Such emerging big data can substantially augment the data availability in terms of the coverage and fidelity and significantly boost the data-driven traffic prediction. The prior art on the traffic prediction can be mainly grouped into two categories. The first category focus on using parametric approaches, such as autoregressive integrated moving average (ARIMA) model and Kalman filtering models. When dealing with the traffic only presenting regular variations, e.g., recurrent traffic congestion occurred in morning and evening rush hour, the parametric approaches can achieve promising prediction results. However, due to the stochastic and nonlinear nature of the road traffic, the traffic predictions of using the parametric approaches can deviate from the actual values especially in the abrupt traffic. Hence, instead of fitting the traffic data into a mathematical model as done by the parametric approach, an alternative way is using the nonparametric approaches where the machine learning (ML) based method is used. For example, a stacked autoencoder model can be used to learn the generic traffic flow features for the predictions. The long short-term memory (LSTM) recurrent neural network (RNN) can be used to predict the traffic flow, speed and occupancy, based on the data collected by the data collector and its upstream and downstream data collectors. 
Along with the use of RNN, the convolution neural network (CNN) can also be utilized to capture the latent traffic evolution patterns within the underlying road network.[P-48]
 FIG. 3A illustrates the traffic prediction architecture 300 using distributed multi-task learning techniques according to some embodiments of the current invention, where an infrastructure device acts as learning server 115 to coordinate a set of data collectors 110, which serve as learning agents. In the FIG. 3A, there are N data collectors, data collector 1 is a toll station, data collector 2 is a loop detector, and data collector N is a camera. The learning server 115 designs M traffic models 310 such that each traffic model is targeted for a specific task, e.g., a model for rush hour traffic and another traffic model for off hour traffic, a model for city traffic and another model for freeway traffic. Therefore, these traffic models are different but correlated because traffic at one location can impact on traffic at other locations and traffic at one can impact on traffic at another time. Accordingly, the learning server 115 defines data clustering rules 320 to divide data into M clusters 330. The learning server 115 then distributes the rules 320 and traffic models 310 to data collectors 110. Correspondingly, each data collector 110 divides its data into M clusters 330. It is possible that some of data clusters are empty. In that case, the data collector will not train corresponding traffic model. In the FIG. 3A, data collector 1 has all data clusters, data collector 2 does not have cluster-M data and, and data collector N does not have cluster-1 data. As a result, data collector 1 will train all traffic models, data collector 2 does not train traffic model M, and data collector N does not train traffic model 1. After clustering its data, each data collector 110 starts the first round of the distributed training. Each data collector 110 trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. 
After certain iterations of training, each data collector sends the locally trained traffic models 340 to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models 350. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. For route planning application, the traffic speed is predicted. Finally, data collectors send their local traffic predictions 360 to the learning server to build a global traffic prediction model, i.e., time-dependent graph, that is then distributed to vehicles 120 for route planning[P-60]
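As an illustration only, the train-and-aggregate loop quoted from Guo P-60 can be sketched in a few lines. The sketch rests on assumptions that Guo does not state. Each of the M traffic models is reduced to a linear least-squares model. Local training is plain gradient descent on a collector's own cluster data. The server aggregates by a sample-weighted average, a FedAvg-style rule. The names `local_train`, `federated_round`, and `make_cluster` are hypothetical. The sketch does reflect the quoted behavior that a collector with an empty cluster does not train the corresponding model.

```python
import numpy as np

def local_train(w, x, y, lr=0.1, steps=20):
    """Gradient descent on a least-squares loss using only this collector's data."""
    w = w.copy()
    for _ in range(steps):
        grad = x.T @ (x @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_round(global_models, collectors):
    """One distributed round: local training per model, then weighted averaging.

    collectors: list of dicts mapping cluster index k -> (x, y); a missing key
    means that collector's cluster k is empty, so it skips model k.
    """
    new_models = []
    for k, w_k in enumerate(global_models):
        updates, counts = [], []
        for data in collectors:
            if k not in data:
                continue  # empty cluster: this collector does not train model k
            x, y = data[k]
            updates.append(local_train(w_k, x, y))
            counts.append(len(y))
        # Sample-weighted average; keep the previous model if no collector had data.
        new_models.append(np.average(updates, axis=0, weights=counts) if updates else w_k)
    return new_models

# Usage: M = 2 models (e.g. rush hour vs off hour), 3 collectors, noise-free toy data.
rng = np.random.default_rng(0)
true_w = [np.array([1.0, 2.0]), np.array([-1.0, 0.5])]

def make_cluster(k, n=50):
    x = rng.normal(size=(n, 2))
    return x, x @ true_w[k]

collectors = [
    {0: make_cluster(0), 1: make_cluster(1)},  # collector 1: all clusters
    {0: make_cluster(0)},                      # collector 2: no cluster-M data
    {1: make_cluster(1)},                      # collector N: no cluster-1 data
]
models = [np.zeros(2), np.zeros(2)]
for _ in range(30):
    models = federated_round(models, collectors)
```

Weighting by sample count is one common choice. A plain unweighted mean would also fit the quoted description.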
The learning server 115 may be referred to as a distributed machine learning based traffic predication server. The learning server 115 is configured to provide map information with respect to traffic predictions to vehicles 120 traveling on a road network. The learning server 115 may include one or more processors 121, a memory 140, a memory unit/storage 200 configured to store traffic prediction neural networks 132, traffic data cluster rules 134, global time-dependent map 135, trained multi-task traffic models 136, global map of road network 137, an input interface (or transceiver) 150 configured to communicate with the learning agent 110 via the communication network 112 and update model parameters of the traffic prediction neural networks 132. The trained multi-task traffic models 136 may be the traffic models 310 shown in FIG. 3A.[P-63]
Guo establishes M clusters representing different traffic models attributed to different traffic areas. Guo illustrates that one area may be freeway traffic, which would be labeled Cluster 1, and another area may be city traffic (Cluster 2), thereby satisfying the condition of M being two or more. Furthermore, Guo teaches data collectors/sensors numbered 1 through N. The data collector(s) may be a toll station, a loop detector, or a camera. Accordingly, depending on the area for which the traffic/cluster model is generated, a desired number of sensors/detectors may be set such that the desired ratio of data collectors to cluster models, N < M, is met.
In P-48, we see the integration of real-time traffic data with past sensed traffic data, i.e., congestion-related traffic information that is aggregated to enable the prediction of future traffic patterns.
P-60 and Fig. 3A also illustrate and explain that several different traffic models are generated. One (first) model is a learning model for predicting, from values detected by the one or more sensors (toll station), congestion-related information of a time point (rush hour) at which the one or more sensors perform detection.
Guo fails to verbatim teach generating a second model that is a learning model for predicting a future congestion level from the congestion-related information, the second model being generated by using the congestion-related information of a first time point indicated by the past congestion-area data as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second time point being a time point after the first time point.  
However, P-60 teaches multiple traffic models. For example, one (first) model is a learning model for predicting, from values detected by the one or more sensors (toll station), congestion-related information of a time point (rush hour) at which the one or more sensors perform detection. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc.[P-60]
Hence, the next round of learning entails generating a second model. The second model is generated by using, as input data, the congestion-related information of a first time point indicated by the past congestion-area data (the previous aggregated traffic model). It is further generated by using, as correct data, the congestion level in the congestion-related information of a second time point (a different time point in rush hour) indicated by the past congestion-area data. The second model is a learning model for predicting a future congestion level from the congestion-related information. The second time point is a time point after the first time point (a different time in rush hour after the previous time in the preceding learning model).
Therefore, Guo does not verbatim recite generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data as input data. Guo does, however, teach the learning server regenerating a traffic model (a second model) after previously generated model(s) by using previous traffic-related data together with real-time traffic-related data. Accordingly, it would have been obvious to one of ordinary skill in the art that Guo's teaching reads on the applicant's second generated model. Substituting Guo's teaching for the applicant's teaching would yield the same results, with the motivation of enabling a more efficient and robust traffic prediction model.
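For clarity, the construction of the second model's training data recited in the claim and discussed above can be sketched as follows. Congestion-related information at a first time point t serves as input data. The congestion level at a later second time point t + h serves as correct data. Several elements are assumptions made for illustration and are not taken from Guo or from the application: the array layout (one row per time point, one column per area), the horizon `h`, the least-squares stand-in for the learning model, and the function names.

```python
import numpy as np

def make_second_model_pairs(congestion_info, congestion_level, h):
    """Pair info at time t (input data) with the congestion level at t + h (correct data).

    Both arrays have shape (T, m): T time points, m areas.
    """
    if h < 1:
        raise ValueError("the second time point must come after the first")
    return congestion_info[:-h], congestion_level[h:]

def fit_linear(x, y):
    """Least-squares fit with a bias column; a stand-in for any learning model."""
    a = np.hstack([x, np.ones((len(x), 1))])
    coef, *_ = np.linalg.lstsq(a, y, rcond=None)
    return coef

def predict_future_level(coef, info_now):
    """Predict the congestion level h steps ahead for every area."""
    return np.append(info_now, 1.0) @ coef

# Usage with toy data in which level(t + 1) = 0.5 * info(t) + 1 for m = 2 areas.
rng = np.random.default_rng(1)
info = rng.uniform(0, 1, size=(20, 2))
level = np.zeros_like(info)
level[1:] = 0.5 * info[:-1] + 1.0
x, y = make_second_model_pairs(info, level, h=1)
coef = fit_linear(x, y)
print(predict_future_level(coef, np.array([2.0, 4.0])))  # approx. [2.0, 3.0]
```

The key point the sketch makes explicit is the time shift: each input row is paired with a label taken h rows later from the same past congestion-area record.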
Guo, however, fails to teach acquiring, by the processor, present values detected by the one or more sensors of the station; using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection; using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level of any of the m areas; and displaying, by the processor, to users at the station the predicted congestion-related information.
Huang, on the other hand, teaches acquiring, by the processor, present values detected by the one or more sensors of the station, and using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection (Page 3, Paragraph 2; Page 5, Paragraphs 5-10):
Preferably, the monitoring terminal comprises a traffic monitoring terminal for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal for counting the pedestrian density at a certain time in a certain time, and a conflict monitoring terminal for counting the pedestrian conflict degree in a certain area and an abnormal monitoring terminal for identifying the pedestrian abnormal behaviour in a certain area.[Pg 3, P-2]
Further, the monitoring terminal 2 comprises a traffic monitoring terminal 201 for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal 202 for counting a certain area pedestrian density at a certain time, a conflict monitoring terminal 203 for counting pedestrian conflict degree in a certain area and an abnormal monitoring terminal 204 for identifying pedestrian abnormal behavior in a certain area, each terminal is composed of a sensor, an embedded processing chip, a processing program and a communication module.[Pg 5, P-5]
the sensor of the flow monitoring terminal 201 is a binocular video camera embedded processing chip and flow statistics method according to the video data obtained by the sensor, counting the pedestrian number in and out of the specified area and calculating the flow rate (unit: The cumulative flow rate (unit: 1 %) and the cumulative flow rate (unit: human), then the traffic data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-6]
the sensor of the density monitoring terminal 202 is a wide-angle video camera embedded processing chip and density statistical method according to the video data obtained by the sensor, counting the pedestrian distribution of the specified area and calculating the density (unit of each point after discrete area of the area: the density data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-7]
the sensor of the conflict monitoring terminal 203 is high video camera, embedded processing chip and conflict calculation algorithm according to the video data obtained by the sensor, counting the appointed area pedestrian speed direction and size change rate, relative direction pedestrian number, local density and other parameters, and according to the direction and size change rate of the pedestrian speed in the specified area, relative direction pedestrian number, local density parameter calculating conflict coefficient (dimensionless parameter), then the conflict coefficient is uploaded to the management control terminal 1 through the communication module[Pg 5, P-8]
the sensor of the abnormal monitoring terminal 204 is high video camera, embedded processing chip and the abnormal detection algorithm according to the video data obtained by the sensor, detecting whether there is abnormal behaviour in the specified area, such as a fight, terrorist attack, fast escape, crowding, then the type code of the abnormal behaviour is uploaded to the management control terminal 1 through the communication module. [Pg 5, P-9]
Further, the guide terminal 3 comprises an acoustic guide terminal 301 for conveying sound instruction, for area division, route guidance and information display of the optical guide terminal 302, for issuing the mobile phone message of the information issuing terminal 304. The area separation terminal 303 for the dynamic adjustment of the pedestrian and vehicular traffic area and the guide terminal 305 for order maintenance and manual assistance 305. [Pg 5, P-10]
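The density monitoring terminal 202 quoted above counts the pedestrian distribution over a discretized area and computes a density for each point. The translated text is garbled as to the exact units. A minimal sketch follows, assuming the area is a rectangle divided into square cells and that density means pedestrians per square metre of cell. The function name, the rectangular-grid layout, and the units are assumptions and are not details disclosed by Huang.

```python
import numpy as np

def cell_density(positions, width, height, cell_size):
    """Pedestrians per square metre in each cell of a width x height (m) area.

    positions: array of shape (k, 2) holding detected (x, y) coordinates in metres.
    Returns an array of shape (nx, ny), indexed [x_cell, y_cell].
    """
    nx = int(np.ceil(width / cell_size))
    ny = int(np.ceil(height / cell_size))
    counts, _, _ = np.histogram2d(
        positions[:, 0], positions[:, 1],
        bins=[nx, ny],
        range=[[0, nx * cell_size], [0, ny * cell_size]],
    )
    return counts / cell_size**2

# Usage: three pedestrians in a 4 m x 2 m area with 1 m cells.
people = np.array([[0.5, 0.5], [0.6, 0.4], [3.5, 1.5]])
density = cell_density(people, width=4.0, height=2.0, cell_size=1.0)
```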

Huang further teaches using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level (Page 3, Paragraphs 3, 4, 9; Page 3, Paragraph 9 to Page 4, Paragraph 1; Page 6, Paragraphs 2, 3, 7, 8):
Preferably, the management control terminal comprises a region modeling module, a distribution estimation module connected with the region modeling module signal, a distribution prediction module connected with the distribution estimation module signal, and a risk evaluation module connected with the distribution prediction module and the distribution estimation module signal, a simulation and optimization module connected with the risk evaluation module signal and a guide control module connected with the simulation and optimization module signal. [Pg 3, P-3]
Preferably, the simulation and optimization module comprises a personnel evacuation model and personnel flow model, when the risk assessment result of the risk assessment module is low comfort level personnel flow module, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of simulating pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and flow inducing scheme [Pg 3, P-4]
the distribution prediction module according to the history data of the distribution estimation result, establishing model based on machine learning algorithm to predict the future period of pedestrian distribution parameter, and uploading data continuous learning model parameter according to the future monitoring terminal.[Pg 3, P-9]
Preferably, the risk evaluation module signal is connected with a conflict monitoring terminal and an abnormal monitoring terminal, and according to the distribution prediction module, the output result of the distribution estimation module and conflict monitoring terminal, the uploading data of the abnormal monitoring terminal evaluates the possibility of the danger of congestion and trampling on the comfort level region. [Pg 3, P-9-Pg 4, P-1]
Further, the management control terminal 1 comprises a region modeling module 101, and the region modeling module 101 signal connection of the distribution estimation module 102, and the distribution prediction module 102 signal connection distribution prediction module 103. a risk evaluation module 104 connected with the distribution prediction module 103 and the distribution estimation module 102, a simulation and optimization module 105 connected with the risk evaluation module 104 signal and a guide control module 106 connected with the simulation and optimization module 105 signal.[Pg 6, P-2]
Further, the simulation and optimization module 105 comprises a personnel evacuation model and personnel flow model, when the risk evaluation result of the risk evaluation module 104 is low comfort level personnel flow module, management control terminal 1 sends attention area low comfort level alarm, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of the simulated pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and a flow inducing scheme, after the management staff 4 of the auditing by the guide control module 106 sends the inducing information to the guide terminal 3;[Pg 6, P-3]
the distribution prediction module 103 according to the history data of the distribution estimation result, based on the machine learning algorithm establishing model prediction future period of pedestrian distribution parameter, and according to the future monitoring terminal 2 uploading data continuous learning model parameter.[Pg 6, P-7]
Further, the risk evaluation module 104 signal connected with the conflict monitoring terminal 203 and the abnormal monitoring terminal 204, and according to the distribution prediction module 103, distribution estimation module 102 the output result and conflict monitoring terminal 203. The uploading data of the abnormal monitoring terminal 204 evaluates the possibility of the comfort level of congestion and trampling on the region of interest and the occurrence of congestion and trample.[Pg 6, P-8]
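Huang describes the distribution prediction module 103 only at a high level. It builds a model "based on the machine learning algorithm" from the history of distribution estimates and keeps learning from later uploads by monitoring terminal 2. The following is a hypothetical sketch of that kind of continual learning. It assumes an autoregressive linear predictor updated by one stochastic-gradient step per uploaded value. Huang does not disclose the model type, and the class name, order, and learning rate are illustrative only.

```python
from collections import deque

import numpy as np

class OnlineDistributionPredictor:
    """AR(p) linear predictor of one distribution parameter, refined online."""

    def __init__(self, order=3, lr=0.01):
        self.order = order
        self.lr = lr
        self.w = np.zeros(order + 1)        # p lag weights plus a bias term
        self.history = deque(maxlen=order)  # most recent p observations

    def _features(self):
        return np.append(np.array(self.history), 1.0)

    def predict_next(self):
        """Predict the next value, or None until p observations have been seen."""
        if len(self.history) < self.order:
            return None
        return float(self._features() @ self.w)

    def update(self, observed):
        """Take one SGD step on the squared error for the newly uploaded value."""
        if len(self.history) == self.order:
            x = self._features()
            error = x @ self.w - observed
            self.w -= self.lr * error * x
        self.history.append(observed)

# Usage: a toy parameter that alternates between 1 and 3.
predictor = OnlineDistributionPredictor(order=3, lr=0.01)
for t in range(2000):
    predictor.update(1.0 if t % 2 == 0 else 3.0)
next_value = predictor.predict_next()  # close to 1.0, the next value in the pattern
```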

Here, Huang teaches acquiring data from one or more sensors at a specific area of interest (station). Huang also teaches using a module corresponding to the first model to predict, from the sensor data, congestion information relating to the one or more sensors that detect the congestion. Furthermore, Huang uses a module for capturing and evaluating congestion data, corresponding to the second prediction model/deep learning model. That module analyzes time-series data in more depth to predict more accurately the parameters that affect future traffic congestion. Guo teaches the learning server regenerating a traffic model (a second model) after previously generated model(s) by using previous traffic-related data together with real-time traffic-related data. When Huang is taken in combination with that teaching, one of ordinary skill in the art would be able to determine the future congestion level of any of the specific (m) areas of the various stations. One of ordinary skill would thereafter be able to display, by the processor, the predicted congestion-related information to users at the station.
It would have been obvious to one of ordinary skill in the art to combine Guo’s teaching with Huang’s teaching in order to have a more accurate prediction model to determine congestion/traffic activities and behaviors within a given area.
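The claimed inference sequence discussed above can be summarized in a short sketch: acquire present sensor values, apply the first model, apply the second model, and display the result to users at the station. The sketch is meant only to make the data flow between the two models explicit. The linear stand-in models, the class and function names, the area names, and the text-line display format are all hypothetical. Neither reference nor the claims specify them.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class LinearModel:
    """Stand-in learning model computing W @ x + b."""
    W: np.ndarray
    b: np.ndarray

    def predict(self, x):
        return self.W @ x + self.b

def predict_and_format(sensor_values, first_model, second_model, area_names):
    """n present sensor values -> congestion info of m areas -> future congestion levels."""
    congestion_now = first_model.predict(sensor_values)       # at the detection time point
    congestion_future = second_model.predict(congestion_now)  # at a later time point
    return "\n".join(
        f"{name}: now {now:.2f}, predicted {future:.2f}"
        for name, now, future in zip(area_names, congestion_now, congestion_future)
    )

# Usage: n = 1 sensor, m = 3 areas (n < m); the text is what a station display might show.
first = LinearModel(W=np.array([[1.0], [0.5], [2.0]]), b=np.zeros(3))
second = LinearModel(W=0.5 * np.eye(3), b=np.full(3, 0.1))
print(predict_and_format(np.array([0.4]), first, second, ["Platform A", "Concourse B", "Gate C"]))
```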

In regard to claim 16, Guo teaches a learning method comprising: generating a first model by using, as input data, the congestion-related information indicated by past congestion-area data indicating the congestion-related information including a past congestion level of each of m areas (where m is an integer of two or more), and using, as correct data, values indicated by past sensor data indicating the values detected in the past by the one or more sensors installed in n areas (where n is an integer of one or more and n<m) out of the m areas, the first model being a learning model for predicting, from congestion-related information, values to be detected by one or more sensors of a time point at which the congestion-related information is acquired; using the first model to predict, from the congestion-related information at a first time point indicated by the past congestion-area data, values to be detected by the one or more sensors at the first time point (Paragraphs 9, 11, 17, 48, 60, 63)
It is one object of some embodiments to provide a distributed machine learning based vehicular traffic prediction method that can accurately predict short time, middle time and long time traffic for both city roads and highways. Additionally, it is another object of some embodiments to make optimal route planning for vehicles to minimize traffic congestion and travel time by using the accurate traffic prediction.[P-9]
A learning server coordinates distributed learning among a set of data collectors. A learning server first designs and distributes the traffic models such as neural networks to the set of data collectors for the first round of the distributed training Each data collector then trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. After certain iterations of training process, each data collector sends the trained traffic models to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models. Upon completion of the model aggregation, the learning server re-distributes the aggregated traffic models to data collectors for the second round of training process. This process of training and aggregation continues until the robust traffic models are built.[P-11]
According to some embodiments of the present invention, a distributed machine learning based traffic prediction method can be provided for predicting traffic of roads. In this case, the distributed machine learning based traffic prediction method may a computer-implemented distributed machine learning based traffic prediction method for predicting traffic of roads. The method may include distributing global multi-task traffic models by a learning server to learning agents to locally train the traffic models; uploading locally trained global multi-task traffic models to the learning server, wherein the locally trained global multi-task traffic models have been trained by the learning agents; updating the global multi-task traffic models by the learning server using the locally trained global multi-task traffic models uploaded from the learning agents; generating a time-dependent global traffic map by the learning server using the updated global multi-task traffic models; and distributing the time-dependent global traffic map to each of vehicles traveling on the roads.[P-17]
To reap all the aforementioned benefits, the traffic prediction must process the real-time and historical traffic data and observations collected by data collectors and mobile devices. For example, the inductive loop can measure the travel speed by reading the inductance changes over time and such data can be used for the traffic speed prediction. In addition, the wide use of mobile devices (e.g., on-board global position systems and phones) enables the mobility data to be crowdsourced from the general public, further facilitating the unprecedented traffic data collection. Such emerging big data can substantially augment the data availability in terms of the coverage and fidelity and significantly boost the data-driven traffic prediction. The prior art on the traffic prediction can be mainly grouped into two categories. The first category focus on using parametric approaches, such as autoregressive integrated moving average (ARIMA) model and Kalman filtering models. When dealing with the traffic only presenting regular variations, e.g., recurrent traffic congestion occurred in morning and evening rush hour, the parametric approaches can achieve promising prediction results. However, due to the stochastic and nonlinear nature of the road traffic, the traffic predictions of using the parametric approaches can deviate from the actual values especially in the abrupt traffic. Hence, instead of fitting the traffic data into a mathematical model as done by the parametric approach, an alternative way is using the nonparametric approaches where the machine learning (ML) based method is used. For example, a stacked autoencoder model can be used to learn the generic traffic flow features for the predictions. The long short-term memory (LSTM) recurrent neural network (RNN) can be used to predict the traffic flow, speed and occupancy, based on the data collected by the data collector and its upstream and downstream data collectors. 
Along with the use of RNN, the convolution neural network (CNN) can also be utilized to capture the latent traffic evolution patterns within the underlying road network.[P-48]
 FIG. 3A illustrates the traffic prediction architecture 300 using distributed multi-task learning techniques according to some embodiments of the current invention, where an infrastructure device acts as learning server 115 to coordinate a set of data collectors 110, which serve as learning agents. In the FIG. 3A, there are N data collectors, data collector 1 is a toll station, data collector 2 is a loop detector, and data collector N is a camera. The learning server 115 designs M traffic models 310 such that each traffic model is targeted for a specific task, e.g., a model for rush hour traffic and another traffic model for off hour traffic, a model for city traffic and another model for freeway traffic. Therefore, these traffic models are different but correlated because traffic at one location can impact on traffic at other locations and traffic at one can impact on traffic at another time. Accordingly, the learning server 115 defines data clustering rules 320 to divide data into M clusters 330. The learning server 115 then distributes the rules 320 and traffic models 310 to data collectors 110. Correspondingly, each data collector 110 divides its data into M clusters 330. It is possible that some of data clusters are empty. In that case, the data collector will not train corresponding traffic model. In the FIG. 3A, data collector 1 has all data clusters, data collector 2 does not have cluster-M data and, and data collector N does not have cluster-1 data. As a result, data collector 1 will train all traffic models, data collector 2 does not train traffic model M, and data collector N does not train traffic model 1. After clustering its data, each data collector 110 starts the first round of the distributed training. Each data collector 110 trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. 
After certain iterations of training, each data collector sends the locally trained traffic models 340 to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models 350. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. For route planning application, the traffic speed is predicted. Finally, data collectors send their local traffic predictions 360 to the learning server to build a global traffic prediction model, i.e., time-dependent graph, that is then distributed to vehicles 120 for route planning[P-60]
The learning server 115 may be referred to as a distributed machine learning based traffic predication server. The learning server 115 is configured to provide map information with respect to traffic predictions to vehicles 120 traveling on a road network. The learning server 115 may include one or more processors 121, a memory 140, a memory unit/storage 200 configured to store traffic prediction neural networks 132, traffic data cluster rules 134, global time-dependent map 135, trained multi-task traffic models 136, global map of road network 137, an input interface (or transceiver) 150 configured to communicate with the learning agent 110 via the communication network 112 and update model parameters of the traffic prediction neural networks 132. The trained multi-task traffic models 136 may be the traffic models 310 shown in FIG. 3A.[P-63]
Guo establishes M clusters representing different traffic models attributed to different traffic areas. Guo illustrates that one area may be freeway traffic, which would be labeled Cluster 1, and another area may be city traffic (Cluster 2), thereby satisfying the condition of M being two or more. Furthermore, Guo teaches data collectors/sensors numbered 1 through N. The data collector(s) may be a toll station, a loop detector, or a camera. Accordingly, depending on the area for which the traffic/cluster model is generated, a desired number of sensors/detectors may be set such that the desired ratio of data collectors to cluster models, N < M, is met.
In P-48, we see the integration of real-time traffic data with past sensed traffic data, i.e., congestion-related traffic information that is aggregated to enable the prediction of future traffic patterns.
P-60 and Fig. 3A also illustrate and explain that several different traffic models are generated. One (first) model is a learning model for predicting, from values detected by the one or more sensors (toll station), congestion-related information of a time point (rush hour) at which the one or more sensors perform detection.
Guo fails to verbatim teach generating a second model that is a learning model for predicting a future congestion level from the congestion-related information, the second model being generated by using the congestion-related information of a first time point indicated by the past congestion-area data as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second time point being a time point after the first time point.  
However, P-60 teaches multiple traffic models. For example, one (first) model is a learning model for predicting, from values detected by the one or more sensors (toll station), congestion-related information of a time point (rush hour) at which the one or more sensors perform detection. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc.[P-60]
Hence, the next round of learning entails generating a second model. The second model is generated by using, as input data, the congestion-related information of a first time point indicated by the past congestion-area data (the previous aggregated traffic model). It is further generated by using, as correct data, the congestion level in the congestion-related information of a second time point (a different time point in rush hour) indicated by the past congestion-area data. The second model is a learning model for predicting a future congestion level from the congestion-related information. The second time point is a time point after the first time point (a different time in rush hour after the previous time in the preceding learning model).
Therefore, Guo does not verbatim recite generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data as input data. Guo does, however, teach the learning server regenerating a traffic model (a second model) after previously generated model(s) by using previous traffic-related data together with real-time traffic-related data. Accordingly, it would have been obvious to one of ordinary skill in the art that Guo's teaching reads on the applicant's second generated model. Substituting Guo's teaching for the applicant's teaching would yield the same results, with the motivation of enabling a more efficient and robust traffic prediction model.
Guo, however, fails to teach acquiring, by the processor, present values detected by the one or more sensors of the station; using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection; using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level of any of the m areas; and displaying, by the processor, to users at the station the predicted congestion-related information.
Whereas, Huang teaches acquiring, by the processor, present values detected by the one or more sensors of the station: using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection, (Page 3, Paragraph 2; Page 5, Paragraphs 5-10, )
Preferably, the monitoring terminal comprises a traffic monitoring terminal for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal for counting the pedestrian density at a certain time in a certain time, and a conflict monitoring terminal for counting the pedestrian conflict degree in a certain area and an abnormal monitoring terminal for identifying the pedestrian abnormal behaviour in a certain area.[Pg 3, P-2]
Further, the monitoring terminal 2 comprises a traffic monitoring terminal 201 for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal 202 for counting a certain area pedestrian density at a certain time, a conflict monitoring terminal 203 for counting pedestrian conflict degree in a certain area and an abnormal monitoring terminal 204 for identifying pedestrian abnormal behavior in a certain area, each terminal is composed of a sensor, an embedded processing chip, a processing program and a communication module.[Pg 5, P-5]
the sensor of the flow monitoring terminal 201 is a binocular video camera embedded processing chip and flow statistics method according to the video data obtained by the sensor, counting the pedestrian number in and out of the specified area and calculating the flow rate (unit: The cumulative flow rate (unit: 1 %) and the cumulative flow rate (unit: human), then the traffic data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-6]
the sensor of the density monitoring terminal 202 is a wide-angle video camera embedded processing chip and density statistical method according to the video data obtained by the sensor, counting the pedestrian distribution of the specified area and calculating the density (unit of each point after discrete area of the area: the density data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-7]
the sensor of the conflict monitoring terminal 203 is high video camera, embedded processing chip and conflict calculation algorithm according to the video data obtained by the sensor, counting the appointed area pedestrian speed direction and size change rate, relative direction pedestrian number, local density and other parameters, and according to the direction and size change rate of the pedestrian speed in the specified area, relative direction pedestrian number, local density parameter calculating conflict coefficient (dimensionless parameter), then the conflict coefficient is uploaded to the management control terminal 1 through the communication module[Pg 5, P-8]
the sensor of the abnormal monitoring terminal 204 is high video camera, embedded processing chip and the abnormal detection algorithm according to the video data obtained by the sensor, detecting whether there is abnormal behaviour in the specified area, such as a fight, terrorist attack, fast escape, crowding, then the type code of the abnormal behaviour is uploaded to the management control terminal 1 through the communication module. [Pg 5, P-9]
Further, the guide terminal 3 comprises an acoustic guide terminal 301 for conveying sound instruction, for area division, route guidance and information display of the optical guide terminal 302, for issuing the mobile phone message of the information issuing terminal 304. The area separation terminal 303 for the dynamic adjustment of the pedestrian and vehicular traffic area and the guide terminal 305 for order maintenance and manual assistance 305. [Pg 5, P-10]
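The per-cell density calculation attributed to Huang's density monitoring terminal 202 (discretizing the specified area and computing a density for each point) can be illustrated with the minimal Python sketch below. The unit is garbled in the translated reference, so pedestrians per square meter is assumed; pedestrian positions are assumed to have already been extracted from the video data, and `grid_density` and `cell_size_m` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def grid_density(
    positions: Iterable[Tuple[float, float]], cell_size_m: float
) -> Dict[Tuple[int, int], float]:
    """Discretize the monitored area into square cells of side cell_size_m (meters)
    and return the assumed density (pedestrians per square meter) of each occupied cell."""
    if cell_size_m <= 0:
        raise ValueError("cell_size_m must be positive")
    # Map each detected pedestrian to the index of the cell containing it.
    counts = Counter(
        (int(x // cell_size_m), int(y // cell_size_m)) for x, y in positions
    )
    cell_area = cell_size_m * cell_size_m
    return {cell: n / cell_area for cell, n in counts.items()}


# Four detected pedestrians, 2 m x 2 m cells (hypothetical positions in meters).
density = grid_density([(0.5, 0.5), (1.5, 0.2), (0.1, 1.9), (3.0, 3.0)], 2.0)
# {(0, 0): 0.75, (1, 1): 0.25}
```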

Huang further teaches using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level (Page 3, Paragraphs 3, 4, 9; Page 3, Paragraph 9-Page 4, Paragraph 1; Page 6, Paragraphs 2, 3, 7, 8).
Preferably, the management control terminal comprises a region modeling module, a distribution estimation module connected with the region modeling module signal, a distribution prediction module connected with the distribution estimation module signal, and a risk evaluation module connected with the distribution prediction module and the distribution estimation module signal, a simulation and optimization module connected with the risk evaluation module signal and a guide control module connected with the simulation and optimization module signal. [Pg 3, P-3]
Preferably, the simulation and optimization module comprises a personnel evacuation model and personnel flow model, when the risk assessment result of the risk assessment module is low comfort level personnel flow module, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of simulating pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and flow inducing scheme [Pg 3, P-4]
the distribution prediction module according to the history data of the distribution estimation result, establishing model based on machine learning algorithm to predict the future period of pedestrian distribution parameter, and uploading data continuous learning model parameter according to the future monitoring terminal.[Pg 3, P-9]
Preferably, the risk evaluation module signal is connected with a conflict monitoring terminal and an abnormal monitoring terminal, and according to the distribution prediction module, the output result of the distribution estimation module and conflict monitoring terminal, the uploading data of the abnormal monitoring terminal evaluates the possibility of the danger of congestion and trampling on the comfort level region. [Pg 3, P-9-Pg 4, P-1]
Further, the management control terminal 1 comprises a region modeling module 101, and the region modeling module 101 signal connection of the distribution estimation module 102, and the distribution prediction module 102 signal connection distribution prediction module 103. a risk evaluation module 104 connected with the distribution prediction module 103 and the distribution estimation module 102, a simulation and optimization module 105 connected with the risk evaluation module 104 signal and a guide control module 106 connected with the simulation and optimization module 105 signal.[Pg 6, P-2]
Further, the simulation and optimization module 105 comprises a personnel evacuation model and personnel flow model, when the risk evaluation result of the risk evaluation module 104 is low comfort level personnel flow module, management control terminal 1 sends attention area low comfort level alarm, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of the simulated pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and a flow inducing scheme, after the management staff 4 of the auditing by the guide control module 106 sends the inducing information to the guide terminal 3;[Pg 6, P-3]
the distribution prediction module 103 according to the history data of the distribution estimation result, based on the machine learning algorithm establishing model prediction future period of pedestrian distribution parameter, and according to the future monitoring terminal 2 uploading data continuous learning model parameter.[Pg 6, P-7]
Further, the risk evaluation module 104 signal connected with the conflict monitoring terminal 203 and the abnormal monitoring terminal 204, and according to the distribution prediction module 103, distribution estimation module 102 the output result and conflict monitoring terminal 203. The uploading data of the abnormal monitoring terminal 204 evaluates the possibility of the comfort level of congestion and trampling on the region of interest and the occurrence of congestion and trample.[Pg 6, P-8]
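Huang's distribution prediction module is described as building a model from the history of distribution estimates and continuously updating its parameters with newly uploaded monitoring data, without identifying the machine learning algorithm. The Python sketch below uses a one-step linear (AR(1)-style) predictor refit from running sums as a stand-in for that unspecified algorithm; the class `OnlineAR1` and its methods are hypothetical and serve only to show what incremental re-learning from uploaded data can look like.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class OnlineAR1:
    """Hypothetical stand-in learner: predicts the next period's value as
    a * x_t + b and refits a, b from running sums each time a new value is
    uploaded, so the full history never needs to be stored."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0
    last: Optional[float] = None

    def update(self, value: float) -> None:
        # Each new upload forms one (previous value, current value) training pair.
        if self.last is not None:
            x, y = self.last, value
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y
        self.last = value

    def coefficients(self) -> Tuple[float, float]:
        if self.n == 0:
            return 1.0, 0.0  # no pairs yet: fall back to persistence
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            return 0.0, self.sy / self.n  # constant inputs: predict the mean
        a = (self.n * self.sxy - self.sx * self.sy) / denom
        return a, (self.sy - a * self.sx) / self.n

    def predict_next(self) -> float:
        if self.last is None:
            raise ValueError("no data uploaded yet")
        a, b = self.coefficients()
        return a * self.last + b


model = OnlineAR1()
for value in [1.0, 2.0, 3.0, 4.0]:  # hypothetical per-period densities for one area
    model.update(value)
print(model.predict_next())  # 5.0
```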

Here, Huang teaches acquiring data from one or more sensors at a specific area of interest (station) and using a module synonymous with the first model to predict, from the sensor data, congestion information relating to the one or more sensors that detect the congestion. Furthermore, Huang uses a module for capturing and evaluating congestion data, synonymous with the second prediction model (deep learning model), to analyze time-series data in more depth and to predict more accurately the parameters affecting future traffic congestion. When taken in combination with Guo's teaching of the learning server regenerating a traffic model (a second model) after previously generated model(s), using previous traffic-related data with real-time traffic-related data, one of ordinary skill in the art may then be able to determine the future congestion level of any of the specific (m) areas of the various stations and thereafter display, by the processor, the predicted congestion-related information to users at the station.
It would have been obvious to one of ordinary skill in the art to combine Guo’s teaching with Huang’s teaching in order to have a more accurate prediction model to determine congestion/traffic activities and behaviors within a given area.

In regard to claim 17, Guo teaches a prediction method comprising: acquiring values detected by one or more sensors installed in n areas (where n is an integer of one or more, and n<m) out of m areas (where m is an integer of two or more); using a first model to predict, from the acquired values, congestion-related information of a time point at which the one or more sensors perform detection, the first model being a learning model for predicting, from values to be detected by the one or more sensors, the congestion-related information of the time point at which the one or more sensors perform detection, the first model being generated by using values indicated by past sensor data indicating the values detected in the past by the one or more sensors installed in n areas (where n is an integer of one or more and n<m) out of the m areas as input data and using the congestion-related information indicated by past congestion-area data indicating the congestion-related information including a past congestion level of each of the m areas as correct data (Paragraphs 9, 11, 17, 48, 60, 63).
It is one object of some embodiments to provide a distributed machine learning based vehicular traffic prediction method that can accurately predict short time, middle time and long time traffic for both city roads and highways. Additionally, it is another object of some embodiments to make optimal route planning for vehicles to minimize traffic congestion and travel time by using the accurate traffic prediction.[P-9]
A learning server coordinates distributed learning among a set of data collectors. A learning server first designs and distributes the traffic models such as neural networks to the set of data collectors for the first round of the distributed training Each data collector then trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. After certain iterations of training process, each data collector sends the trained traffic models to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models. Upon completion of the model aggregation, the learning server re-distributes the aggregated traffic models to data collectors for the second round of training process. This process of training and aggregation continues until the robust traffic models are built.[P-11]
According to some embodiments of the present invention, a distributed machine learning based traffic prediction method can be provided for predicting traffic of roads. In this case, the distributed machine learning based traffic prediction method may a computer-implemented distributed machine learning based traffic prediction method for predicting traffic of roads. The method may include distributing global multi-task traffic models by a learning server to learning agents to locally train the traffic models; uploading locally trained global multi-task traffic models to the learning server, wherein the locally trained global multi-task traffic models have been trained by the learning agents; updating the global multi-task traffic models by the learning server using the locally trained global multi-task traffic models uploaded from the learning agents; generating a time-dependent global traffic map by the learning server using the updated global multi-task traffic models; and distributing the time-dependent global traffic map to each of vehicles traveling on the roads.[P-17]
To reap all the aforementioned benefits, the traffic prediction must process the real-time and historical traffic data and observations collected by data collectors and mobile devices. For example, the inductive loop can measure the travel speed by reading the inductance changes over time and such data can be used for the traffic speed prediction. In addition, the wide use of mobile devices (e.g., on-board global position systems and phones) enables the mobility data to be crowdsourced from the general public, further facilitating the unprecedented traffic data collection. Such emerging big data can substantially augment the data availability in terms of the coverage and fidelity and significantly boost the data-driven traffic prediction. The prior art on the traffic prediction can be mainly grouped into two categories. The first category focus on using parametric approaches, such as autoregressive integrated moving average (ARIMA) model and Kalman filtering models. When dealing with the traffic only presenting regular variations, e.g., recurrent traffic congestion occurred in morning and evening rush hour, the parametric approaches can achieve promising prediction results. However, due to the stochastic and nonlinear nature of the road traffic, the traffic predictions of using the parametric approaches can deviate from the actual values especially in the abrupt traffic. Hence, instead of fitting the traffic data into a mathematical model as done by the parametric approach, an alternative way is using the nonparametric approaches where the machine learning (ML) based method is used. For example, a stacked autoencoder model can be used to learn the generic traffic flow features for the predictions. The long short-term memory (LSTM) recurrent neural network (RNN) can be used to predict the traffic flow, speed and occupancy, based on the data collected by the data collector and its upstream and downstream data collectors. 
Along with the use of RNN, the convolution neural network (CNN) can also be utilized to capture the latent traffic evolution patterns within the underlying road network.[P-48]

 FIG. 3A illustrates the traffic prediction architecture 300 using distributed multi-task learning techniques according to some embodiments of the current invention, where an infrastructure device acts as learning server 115 to coordinate a set of data collectors 110, which serve as learning agents. In the FIG. 3A, there are N data collectors, data collector 1 is a toll station, data collector 2 is a loop detector, and data collector N is a camera. The learning server 115 designs M traffic models 310 such that each traffic model is targeted for a specific task, e.g., a model for rush hour traffic and another traffic model for off hour traffic, a model for city traffic and another model for freeway traffic. Therefore, these traffic models are different but correlated because traffic at one location can impact on traffic at other locations and traffic at one can impact on traffic at another time. Accordingly, the learning server 115 defines data clustering rules 320 to divide data into M clusters 330. The learning server 115 then distributes the rules 320 and traffic models 310 to data collectors 110. Correspondingly, each data collector 110 divides its data into M clusters 330. It is possible that some of data clusters are empty. In that case, the data collector will not train corresponding traffic model. In the FIG. 3A, data collector 1 has all data clusters, data collector 2 does not have cluster-M data and, and data collector N does not have cluster-1 data. As a result, data collector 1 will train all traffic models, data collector 2 does not train traffic model M, and data collector N does not train traffic model 1. After clustering its data, each data collector 110 starts the first round of the distributed training. Each data collector 110 trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. 
After certain iterations of training, each data collector sends the locally trained traffic models 340 to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models 350. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. For route planning application, the traffic speed is predicted. Finally, data collectors send their local traffic predictions 360 to the learning server to build a global traffic prediction model, i.e., time-dependent graph, that is then distributed to vehicles 120 for route planning[P-60]
The learning server 115 may be referred to as a distributed machine learning based traffic predication server. The learning server 115 is configured to provide map information with respect to traffic predictions to vehicles 120 traveling on a road network. The learning server 115 may include one or more processors 121, a memory 140, a memory unit/storage 200 configured to store traffic prediction neural networks 132, traffic data cluster rules 134, global time-dependent map 135, trained multi-task traffic models 136, global map of road network 137, an input interface (or transceiver) 150 configured to communicate with the learning agent 110 via the communication network 112 and update model parameters of the traffic prediction neural networks 132. The trained multi-task traffic models 136 may be the traffic models 310 shown in FIG. 3A.[P-63]
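The cluster-then-aggregate round described in P-60 (clustering rules distributed by the learning server, local training only on non-empty clusters, and server-side aggregation) can be sketched in Python as below. Guo does not state how the models are aggregated, so a sample-weighted average in the style of federated averaging is assumed, and each model is reduced to a single mean-speed parameter; the record format, the rush-hour rule, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record format: (hour of day, observed speed).
Sample = Tuple[int, float]
RUSH_HOURS = {7, 8, 17, 18}  # hypothetical clustering-rule parameter


def cluster_rule(hour: int) -> str:
    """Hypothetical rule dividing data into a rush-hour cluster and an off-hour cluster."""
    return "rush" if hour in RUSH_HOURS else "off"


def local_train(samples: List[Sample]) -> Dict[str, Tuple[float, int]]:
    """Data-collector side: cluster local data and fit one toy model (mean speed)
    per non-empty cluster; an empty cluster yields no model, so it is not trained."""
    buckets: Dict[str, List[float]] = defaultdict(list)
    for hour, speed in samples:
        buckets[cluster_rule(hour)].append(speed)
    return {name: (sum(v) / len(v), len(v)) for name, v in buckets.items()}


def aggregate(local_models: List[Dict[str, Tuple[float, int]]]) -> Dict[str, float]:
    """Learning-server side: sample-weighted average of each model over the
    collectors that actually trained it (assumed aggregation rule)."""
    weighted_sum: Dict[str, float] = defaultdict(float)
    count: Dict[str, int] = defaultdict(int)
    for models in local_models:
        for name, (param, n) in models.items():
            weighted_sum[name] += param * n
            count[name] += n
    return {name: weighted_sum[name] / count[name] for name in weighted_sum}


toll_station = [(8, 20.0), (8, 30.0), (13, 60.0)]
loop_detector = [(17, 40.0)]       # no off-hour data
camera = [(12, 50.0), (14, 70.0)]  # no rush-hour data
global_models = aggregate([local_train(d) for d in (toll_station, loop_detector, camera)])
print(global_models)  # {'rush': 30.0, 'off': 60.0}
```

In Guo's description this round is repeated until robust models are built; the sketch shows a single round only.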
Guo establishes M clusters to represent different traffic models attributed to different traffic areas; Guo illustrates that one area may be freeway traffic, which would be labeled Cluster 1, and another area may be city traffic (Cluster 2), thereby satisfying the condition of M being two or more. Furthermore, Guo teaches N data collectors/sensors, each of which may be a toll station, a loop detector, or a camera. In that case, depending on the area generating the traffic/cluster model, a desired number of sensors/detectors may be set such that the desired ratio N<M (data collectors to cluster models) is met.
P-48 describes the integration of real-time traffic data with past sensed traffic data, i.e., congestion-related traffic information that is aggregated to enable the prediction of future traffic patterns.
P-60 and FIG. 3A also illustrate and explain that several different traffic models are generated, where one (first) model is a learning model for predicting, from values detected by the one or more sensors (toll station), congestion-related information of a time point (rush hour) at which the one or more sensors perform detection.
Guo fails to teach verbatim using a second model to predict, from the predicted congestion-related information, a future congestion level of any of the m areas, the second model being a learning model for predicting a future congestion level from the congestion-related information, the second model being generated by using the congestion-related information of a first time point indicated by the past congestion-area data as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second time point being a time point after the first time point.
However, as taught in P-60, Guo generates multiple traffic models; for example, one (first) model is a learning model for predicting, from values detected by the one or more sensors (toll station), congestion-related information of a time point (rush hour) at which the one or more sensors perform detection. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc.[P-60]
Hence, the next round of learning entails generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data (the previous aggregated traffic model) as input data and using the congestion level in the congestion-related information of a second time point (a different time point in rush hour) indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from the congestion-related information, and the second time point being a time point after the first time point (a different time in rush hour after the previous time in the preceding learning model).
Therefore, although Guo does not verbatim recite generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data as input data, Guo teaches the learning server regenerating a traffic model (a second model) after previously generated model(s), using previous traffic-related data together with real-time traffic-related data. It would have been obvious to one of ordinary skill in the art that Guo's teaching reads on the applicant's second generated model, since substituting Guo's teaching for the applicant's teaching yields the same results, in order to enable a more efficient and robust traffic prediction model.
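The two-stage prediction recited in claim 17 (a first model mapping values from sensors in n areas to congestion-related information for all m areas, then a second model mapping that information to a later congestion level) can be sketched in Python as follows. The model types are assumptions chosen only to keep the example self-contained: a 1-nearest-neighbour lookup stands in for the first model and a per-area least-squares line stands in for the second. Neither is asserted to be what the application, Guo, or Huang discloses, and all function names and data are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def fit_first_model(
    past_sensor: Sequence[Vector], past_congestion: Sequence[Vector]
) -> Callable[[Vector], Vector]:
    """First model (stand-in): 1-nearest-neighbour map from n sensor values to the
    congestion-related information of all m areas at the same time point."""
    pairs = list(zip(past_sensor, past_congestion))

    def predict(sensor_values: Vector) -> Vector:
        def sq_dist(pair: Tuple[Vector, Vector]) -> float:
            return sum((a - b) ** 2 for a, b in zip(pair[0], sensor_values))
        return list(min(pairs, key=sq_dist)[1])

    return predict


def fit_second_model(
    past_congestion: Sequence[Vector], horizon: int = 1
) -> Callable[[Vector], Vector]:
    """Second model (stand-in): per-area least-squares line
    level[t + horizon] ~ a * level[t] + b."""
    if horizon < 1 or len(past_congestion) <= horizon:
        raise ValueError("need horizon >= 1 and more time points than horizon")
    coeffs = []
    for area in range(len(past_congestion[0])):
        xs = [row[area] for row in past_congestion[:-horizon]]  # first time points
        ys = [row[area] for row in past_congestion[horizon:]]   # later time points
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        var = sum((x - mx) ** 2 for x in xs)
        a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
        coeffs.append((a, my - a * mx))

    def predict(congestion_now: Vector) -> Vector:
        return [a * c + b for (a, b), c in zip(coeffs, congestion_now)]

    return predict


# n = 1 sensor, m = 3 areas, four past time points (hypothetical data).
past_sensor = [[10.0], [20.0], [30.0], [40.0]]
past_congestion = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0], [4.0, 5.0, 6.0]]
first = fit_first_model(past_sensor, past_congestion)
second = fit_second_model(past_congestion)
now = first([21.0])     # [2.0, 3.0, 4.0]
future = second(now)    # [3.0, 4.0, 5.0]
```

With one sensor (n = 1) and three areas (m = 3), the first model returns the stored area levels whose sensor reading is closest to the present reading, and the second model extrapolates those levels one time step ahead.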
Guo, however, fails to teach displaying, by the processor, to users at the station the future congestion level of any of the m areas.
Guo further fails to teach acquiring, by the processor, present values detected by the one or more sensors of the station; using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection; using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level of any of the m areas; and displaying, by the processor, to users at the station the predicted congestion-related information.
Huang, on the other hand, teaches acquiring, by the processor, present values detected by the one or more sensors of the station; and using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection (Page 3, Paragraph 2; Page 5, Paragraphs 5-10).
Preferably, the monitoring terminal comprises a traffic monitoring terminal for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal for counting the pedestrian density at a certain time in a certain time, and a conflict monitoring terminal for counting the pedestrian conflict degree in a certain area and an abnormal monitoring terminal for identifying the pedestrian abnormal behaviour in a certain area.[Pg 3, P-2]
Further, the monitoring terminal 2 comprises a traffic monitoring terminal 201 for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal 202 for counting a certain area pedestrian density at a certain time, a conflict monitoring terminal 203 for counting pedestrian conflict degree in a certain area and an abnormal monitoring terminal 204 for identifying pedestrian abnormal behavior in a certain area, each terminal is composed of a sensor, an embedded processing chip, a processing program and a communication module.[Pg 5, P-5]
the sensor of the flow monitoring terminal 201 is a binocular video camera embedded processing chip and flow statistics method according to the video data obtained by the sensor, counting the pedestrian number in and out of the specified area and calculating the flow rate (unit: The cumulative flow rate (unit: 1 %) and the cumulative flow rate (unit: human), then the traffic data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-6]
the sensor of the density monitoring terminal 202 is a wide-angle video camera embedded processing chip and density statistical method according to the video data obtained by the sensor, counting the pedestrian distribution of the specified area and calculating the density (unit of each point after discrete area of the area: the density data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-7]
the sensor of the conflict monitoring terminal 203 is high video camera, embedded processing chip and conflict calculation algorithm according to the video data obtained by the sensor, counting the appointed area pedestrian speed direction and size change rate, relative direction pedestrian number, local density and other parameters, and according to the direction and size change rate of the pedestrian speed in the specified area, relative direction pedestrian number, local density parameter calculating conflict coefficient (dimensionless parameter), then the conflict coefficient is uploaded to the management control terminal 1 through the communication module[Pg 5, P-8]
the sensor of the abnormal monitoring terminal 204 is high video camera, embedded processing chip and the abnormal detection algorithm according to the video data obtained by the sensor, detecting whether there is abnormal behaviour in the specified area, such as a fight, terrorist attack, fast escape, crowding, then the type code of the abnormal behaviour is uploaded to the management control terminal 1 through the communication module. [Pg 5, P-9]
Further, the guide terminal 3 comprises an acoustic guide terminal 301 for conveying sound instruction, for area division, route guidance and information display of the optical guide terminal 302, for issuing the mobile phone message of the information issuing terminal 304. The area separation terminal 303 for the dynamic adjustment of the pedestrian and vehicular traffic area and the guide terminal 305 for order maintenance and manual assistance 305. [Pg 5, P-10]

Huang further teaches using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level (Page 3, Paragraphs 3, 4, 9; Page 3, Paragraph 9-Page 4, Paragraph 1; Page 6, Paragraphs 2, 3, 7, 8).
Preferably, the management control terminal comprises a region modeling module, a distribution estimation module connected with the region modeling module signal, a distribution prediction module connected with the distribution estimation module signal, and a risk evaluation module connected with the distribution prediction module and the distribution estimation module signal, a simulation and optimization module connected with the risk evaluation module signal and a guide control module connected with the simulation and optimization module signal. [Pg 3, P-3]
Preferably, the simulation and optimization module comprises a personnel evacuation model and personnel flow model, when the risk assessment result of the risk assessment module is low comfort level personnel flow module, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of simulating pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and flow inducing scheme [Pg 3, P-4]
the distribution prediction module according to the history data of the distribution estimation result, establishing model based on machine learning algorithm to predict the future period of pedestrian distribution parameter, and uploading data continuous learning model parameter according to the future monitoring terminal.[Pg 3, P-9]
Preferably, the risk evaluation module signal is connected with a conflict monitoring terminal and an abnormal monitoring terminal, and according to the distribution prediction module, the output result of the distribution estimation module and conflict monitoring terminal, the uploading data of the abnormal monitoring terminal evaluates the possibility of the danger of congestion and trampling on the comfort level region. [Pg 3, P-9-Pg 4, P-1]
Further, the management control terminal 1 comprises a region modeling module 101, and the region modeling module 101 signal connection of the distribution estimation module 102, and the distribution prediction module 102 signal connection distribution prediction module 103. a risk evaluation module 104 connected with the distribution prediction module 103 and the distribution estimation module 102, a simulation and optimization module 105 connected with the risk evaluation module 104 signal and a guide control module 106 connected with the simulation and optimization module 105 signal.[Pg 6, P-2]
Further, the simulation and optimization module 105 comprises a personnel evacuation model and personnel flow model, when the risk evaluation result of the risk evaluation module 104 is low comfort level personnel flow module, management control terminal 1 sends attention area low comfort level alarm, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of the simulated pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and a flow inducing scheme, after the management staff 4 of the auditing by the guide control module 106 sends the inducing information to the guide terminal 3;[Pg 6, P-3]
the distribution prediction module 103 according to the history data of the distribution estimation result, based on the machine learning algorithm establishing model prediction future period of pedestrian distribution parameter, and according to the future monitoring terminal 2 uploading data continuous learning model parameter.[Pg 6, P-7]
Further, the risk evaluation module 104 signal connected with the conflict monitoring terminal 203 and the abnormal monitoring terminal 204, and according to the distribution prediction module 103, distribution estimation module 102 the output result and conflict monitoring terminal 203. The uploading data of the abnormal monitoring terminal 204 evaluates the possibility of the comfort level of congestion and trampling on the region of interest and the occurrence of congestion and trample.[Pg 6, P-8]


Here, Huang teaches acquiring data from one or more sensors at a specific area of interest (a station) and using a module corresponding to the first model to predict, from the sensor data, congestion information relating to the one or more sensors that detect the congestion. Furthermore, Huang uses a module for capturing and evaluating congestion data, corresponding to the second prediction model/deep learning model, to analyze time-series data in more depth and to predict more accurate parameters relating to future traffic congestion.
Huang then teaches route guidance and information display by the optical guide terminal 302, and issuing mobile phone messages by the information issuing terminal 304, thereby displaying congestion result data as well as predicted congestion data.
When Huang is taken in combination with Guo’s teaching of a learning server that regenerates a traffic model (a second model) after previously generated model(s), using previous traffic-related data together with real-time traffic-related data, one of ordinary skill in the art could then determine the future congestion level of any of the specific (m) areas of the various stations and thereafter display, by the processor, the predicted congestion-related information to users at the station.
It would have been obvious to one of ordinary skill in the art to combine Guo’s teaching with Huang’s teaching in order to have a more accurate prediction model to determine congestion/traffic activities and behaviors within a given area.

Regarding claim 18, Guo teaches a prediction method comprising: acquiring values detected by one or more sensors installed in n areas (where n is an integer of one or more, and n<m) out of m areas (where m is an integer of two or more) (Paragraphs 9, 11, 17, 60, 63).
It is one object of some embodiments to provide a distributed machine learning based vehicular traffic prediction method that can accurately predict short time, middle time and long time traffic for both city roads and highways. Additionally, it is another object of some embodiments to make optimal route planning for vehicles to minimize traffic congestion and travel time by using the accurate traffic prediction.[P-9]
A learning server coordinates distributed learning among a set of data collectors. A learning server first designs and distributes the traffic models such as neural networks to the set of data collectors for the first round of the distributed training Each data collector then trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. After certain iterations of training process, each data collector sends the trained traffic models to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models. Upon completion of the model aggregation, the learning server re-distributes the aggregated traffic models to data collectors for the second round of training process. This process of training and aggregation continues until the robust traffic models are built.[P-11]
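For illustration only, the train-locally, aggregate, and redistribute cycle described in Guo P-11 can be sketched as follows. The one-feature linear model, the gradient-descent settings, and the sample-weighted averaging used as the aggregation step are assumptions; the quoted passage does not specify the model form or the aggregation rule. The point shown is that each collector's data stays local and only model parameters travel to and from the server.

```python
# Hypothetical sketch of the distributed training loop described in Guo P-11.
# Assumptions (not from Guo): a one-feature linear model y = w*x + b, plain
# gradient descent for local training, and sample-weighted averaging (FedAvg-style)
# as the server's aggregation rule.
from typing import List, Tuple

Model = Tuple[float, float]  # (w, b)
Dataset = List[Tuple[float, float]]  # (x, y) pairs held privately by one collector


def local_train(model: Model, data: Dataset, lr: float = 0.01, steps: int = 50) -> Model:
    """Train the received model on one collector's own data; the data never leaves here."""
    w, b = model
    n = len(data)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def aggregate(models: List[Model], sizes: List[int]) -> Model:
    """Combine uploaded models into one global model, weighted by local sample count."""
    total = sum(sizes)
    w = sum(m[0] * s for m, s in zip(models, sizes)) / total
    b = sum(m[1] * s for m, s in zip(models, sizes)) / total
    return (w, b)


def federated_rounds(collectors: List[Dataset], rounds: int = 50) -> Model:
    """Repeat local training and server aggregation for a fixed number of rounds."""
    global_model: Model = (0.0, 0.0)  # initial model designed and distributed by the server
    for _ in range(rounds):
        local_models = [local_train(global_model, data) for data in collectors]
        global_model = aggregate(local_models, [len(d) for d in collectors])
    return global_model
```

A fixed round count stands in for Guo's "until the robust traffic models are built"; a convergence test on the change in parameters between rounds would be an equally plausible stopping rule.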
According to some embodiments of the present invention, a distributed machine learning based traffic prediction method can be provided for predicting traffic of roads. In this case, the distributed machine learning based traffic prediction method may a computer-implemented distributed machine learning based traffic prediction method for predicting traffic of roads. The method may include distributing global multi-task traffic models by a learning server to learning agents to locally train the traffic models; uploading locally trained global multi-task traffic models to the learning server, wherein the locally trained global multi-task traffic models have been trained by the learning agents; updating the global multi-task traffic models by the learning server using the locally trained global multi-task traffic models uploaded from the learning agents; generating a time-dependent global traffic map by the learning server using the updated global multi-task traffic models; and distributing the time-dependent global traffic map to each of vehicles traveling on the roads.[P-17]
 FIG. 3A illustrates the traffic prediction architecture 300 using distributed multi-task learning techniques according to some embodiments of the current invention, where an infrastructure device acts as learning server 115 to coordinate a set of data collectors 110, which serve as learning agents. In the FIG. 3A, there are N data collectors, data collector 1 is a toll station, data collector 2 is a loop detector, and data collector N is a camera. The learning server 115 designs M traffic models 310 such that each traffic model is targeted for a specific task, e.g., a model for rush hour traffic and another traffic model for off hour traffic, a model for city traffic and another model for freeway traffic. Therefore, these traffic models are different but correlated because traffic at one location can impact on traffic at other locations and traffic at one can impact on traffic at another time. Accordingly, the learning server 115 defines data clustering rules 320 to divide data into M clusters 330. The learning server 115 then distributes the rules 320 and traffic models 310 to data collectors 110. Correspondingly, each data collector 110 divides its data into M clusters 330. It is possible that some of data clusters are empty. In that case, the data collector will not train corresponding traffic model. In the FIG. 3A, data collector 1 has all data clusters, data collector 2 does not have cluster-M data and, and data collector N does not have cluster-1 data. As a result, data collector 1 will train all traffic models, data collector 2 does not train traffic model M, and data collector N does not train traffic model 1. After clustering its data, each data collector 110 starts the first round of the distributed training. Each data collector 110 trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. 
After certain iterations of training, each data collector sends the locally trained traffic models 340 to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models 350. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. For route planning application, the traffic speed is predicted. Finally, data collectors send their local traffic predictions 360 to the learning server to build a global traffic prediction model, i.e., time-dependent graph, that is then distributed to vehicles 120 for route planning[P-60]
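The clustering step of FIG. 3A (each collector divides its data into M clusters using the server's rules and skips training any model whose cluster is empty) can be sketched as below. The hour-of-day rule and its boundaries are hypothetical; Guo names rush-hour and off-hour traffic only as example tasks and does not define the clustering rules 320.

```python
# Hypothetical sketch of Guo's per-collector clustering (P-60, FIG. 3A).
# Assumption: the server's clustering rule splits records by hour of day into
# M = 2 tasks (rush hour vs off hour); the hour ranges are illustrative only.
from typing import Callable, Dict, List, Set

Record = Dict[str, float]  # e.g. {"hour": 8, "speed": 35.0}
ClusterRule = Callable[[Record], int]


def rush_hour_rule(record: Record) -> int:
    """Return 0 for the rush-hour model, 1 for the off-hour model."""
    hour = record["hour"]
    return 0 if (7 <= hour < 10 or 16 <= hour < 19) else 1


def cluster_data(records: List[Record], rule: ClusterRule, m: int) -> List[List[Record]]:
    """Divide a collector's local records into M clusters using the distributed rule."""
    clusters: List[List[Record]] = [[] for _ in range(m)]
    for record in records:
        clusters[rule(record)].append(record)
    return clusters


def models_to_train(records: List[Record], rule: ClusterRule, m: int) -> Set[int]:
    """A collector trains model k only if its cluster k is non-empty."""
    return {k for k, cluster in enumerate(cluster_data(records, rule, m)) if cluster}
```

For example, a collector holding only midday records would train only the off-hour model, mirroring how data collector N in FIG. 3A does not train traffic model 1.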
The learning server 115 may be referred to as a distributed machine learning based traffic predication server. The learning server 115 is configured to provide map information with respect to traffic predictions to vehicles 120 traveling on a road network. The learning server 115 may include one or more processors 121, a memory 140, a memory unit/storage 200 configured to store traffic prediction neural networks 132, traffic data cluster rules 134, global time-dependent map 135, trained multi-task traffic models 136, global map of road network 137, an input interface (or transceiver) 150 configured to communicate with the learning agent 110 via the communication network 112 and update model parameters of the traffic prediction neural networks 132. The trained multi-task traffic models 136 may be the traffic models 310 shown in FIG. 3A.[P-63]
Guo establishes M clusters representing different traffic models attributed to different traffic areas; for example, Guo illustrates that one area may be freeway traffic, which would be labeled Cluster 1, and another area may be city traffic (Cluster 2), thereby satisfying the condition of m being two or more. Furthermore, Guo teaches data collectors/sensors, designated data collectors 1 through N, any of which may be a toll station, a loop detector, or a camera. Accordingly, depending on the area for which the traffic/cluster model is generated, a desired number of sensors/detectors may be set such that the desired relationship N<M (data collectors : cluster models) is met.
Guo teaches generating a first model by using the congestion-related information indicated by the past congestion-area data as input data and using the values indicated by the past sensor data as correct data (Paragraph 48)
To reap all the aforementioned benefits, the traffic prediction must process the real-time and historical traffic data and observations collected by data collectors and mobile devices. For example, the inductive loop can measure the travel speed by reading the inductance changes over time and such data can be used for the traffic speed prediction. In addition, the wide use of mobile devices (e.g., on-board global position systems and phones) enables the mobility data to be crowdsourced from the general public, further facilitating the unprecedented traffic data collection. Such emerging big data can substantially augment the data availability in terms of the coverage and fidelity and significantly boost the data-driven traffic prediction. The prior art on the traffic prediction can be mainly grouped into two categories. The first category focus on using parametric approaches, such as autoregressive integrated moving average (ARIMA) model and Kalman filtering models. When dealing with the traffic only presenting regular variations, e.g., recurrent traffic congestion occurred in morning and evening rush hour, the parametric approaches can achieve promising prediction results. However, due to the stochastic and nonlinear nature of the road traffic, the traffic predictions of using the parametric approaches can deviate from the actual values especially in the abrupt traffic. Hence, instead of fitting the traffic data into a mathematical model as done by the parametric approach, an alternative way is using the nonparametric approaches where the machine learning (ML) based method is used. For example, a stacked autoencoder model can be used to learn the generic traffic flow features for the predictions. The long short-term memory (LSTM) recurrent neural network (RNN) can be used to predict the traffic flow, speed and occupancy, based on the data collected by the data collector and its upstream and downstream data collectors. 
Along with the use of RNN, the convolution neural network (CNN) can also be utilized to capture the latent traffic evolution patterns within the underlying road network.[P-48]

Here, Guo integrates real-time traffic data with past sensed traffic data, and the congestion-related traffic information is aggregated to enable prediction of future traffic patterns.
Guo teaches the first model being a learning model for predicting, from the congestion-related information, values to be detected by the one or more sensors at a time point at which the congestion-related information is acquired; using the first model to predict, from the congestion-related information at a first time point indicated by the past congestion-area data, values to be detected by the one or more sensors at the first time point (Paragraph 60; Figure 3A)
 FIG. 3A illustrates the traffic prediction architecture 300 using distributed multi-task learning techniques according to some embodiments of the current invention, where an infrastructure device acts as learning server 115 to coordinate a set of data collectors 110, which serve as learning agents. In the FIG. 3A, there are N data collectors, data collector 1 is a toll station, data collector 2 is a loop detector, and data collector N is a camera. The learning server 115 designs M traffic models 310 such that each traffic model is targeted for a specific task, e.g., a model for rush hour traffic and another traffic model for off hour traffic, a model for city traffic and another model for freeway traffic. Therefore, these traffic models are different but correlated because traffic at one location can impact on traffic at other locations and traffic at one can impact on traffic at another time. Accordingly, the learning server 115 defines data clustering rules 320 to divide data into M clusters 330. The learning server 115 then distributes the rules 320 and traffic models 310 to data collectors 110. Correspondingly, each data collector 110 divides its data into M clusters 330. It is possible that some of data clusters are empty. In that case, the data collector will not train corresponding traffic model. In the FIG. 3A, data collector 1 has all data clusters, data collector 2 does not have cluster-M data and, and data collector N does not have cluster-1 data. As a result, data collector 1 will train all traffic models, data collector 2 does not train traffic model M, and data collector N does not train traffic model 1. After clustering its data, each data collector 110 starts the first round of the distributed training. Each data collector 110 trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. 
After certain iterations of training, each data collector sends the locally trained traffic models 340 to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models 350. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. For route planning application, the traffic speed is predicted. Finally, data collectors send their local traffic predictions 360 to the learning server to build a global traffic prediction model, i.e., time-dependent graph, that is then distributed to vehicles 120 for route planning[P-60]
Here, P-60 and Fig. 3A illustrate and explain that several different traffic models are generated, where one (first) model is a learning model for predicting, from values detected by the one or more sensors (e.g., the toll station), congestion-related information of a time point (e.g., rush hour) at which the one or more sensors perform detection.
Guo does not expressly teach using a second model to predict a future congestion level of any of the m areas from the acquired values, the second model being generated by using, as input data, predicted values obtained by using a first model to predict the values to be detected by the one or more sensors at a first time point, from congestion-related information of the first time point indicated by the past congestion-area data, and by using, as correct data, a congestion level in the congestion-related information of a second time point indicated by the past congestion-area data, the second time point being a time point after the first time point.
However, P-60 teaches multiple traffic models where, for example, one (first) model is a learning model for predicting, from values detected by the one or more sensors (toll station), congestion-related information of a time point (rush hour) at which the one or more sensors perform detection. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc.[P-60]
Hence, the next round of learning entails generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data (the previously aggregated traffic model) as input data and using the congestion level in the congestion-related information of a second time point (a different time point in rush hour) indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from the congestion-related information, and the second time point being a time point after the first time point (a different time in rush hour after the previous time in the preceding learning model).
Therefore, although Guo does not expressly recite generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data as input data, Guo teaches a learning server that regenerates a traffic model (a second model) after previously generated model(s), using previous traffic-related data together with real-time traffic-related data. It would therefore have been obvious to one of ordinary skill in the art that Guo’s teaching reads on the applicant’s second generated model, because substituting Guo’s teaching for the applicant’s would yield the same results and enable a more efficient and robust traffic prediction model.
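To make the second-model limitation recited above concrete, the construction of its training pairs can be sketched as follows: the input is the first model's predicted sensor values at a first time point, and the correct data is a congestion level at a later second time point. The data layout (a time-ordered list of per-area congestion levels), the fixed step offset between the two time points, and all function names are hypothetical and are not taken from Guo, Huang, or the application.

```python
# Hypothetical sketch of building training pairs for the claimed second model.
# Assumptions: past congestion-area data is a time-ordered list of {area: level}
# dicts, the first model is any callable mapping that dict to predicted sensor
# values, and the second time point is a fixed number of steps after the first.
from typing import Callable, Dict, List, Sequence, Tuple

CongestionInfo = Dict[str, float]  # congestion level for each of the m areas
SensorValues = List[float]  # values for the sensors in the n instrumented areas
FirstModel = Callable[[CongestionInfo], SensorValues]


def build_second_model_pairs(
    past_congestion: Sequence[CongestionInfo],
    first_model: FirstModel,
    target_area: str,
    horizon: int = 1,
) -> List[Tuple[SensorValues, float]]:
    """Input: first-model predictions at t1. Correct data: congestion level at t2 = t1 + horizon."""
    pairs: List[Tuple[SensorValues, float]] = []
    for t1 in range(len(past_congestion) - horizon):
        t2 = t1 + horizon  # the second time point is after the first
        predicted_sensors = first_model(past_congestion[t1])
        future_level = past_congestion[t2][target_area]
        pairs.append((predicted_sensors, future_level))
    return pairs
```

Any regressor could then be fitted to these pairs; the sketch stops at the pair construction because that is where the claimed ordering of the first and second time points matters.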
However, Guo fails to teach displaying, by the processor, the future congestion level of any of the m areas to users at the station.
Guo also fails to teach acquiring, by the processor, present values detected by the one or more sensors of the station; using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection; using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level of any of the m areas; and displaying, by the processor, the predicted congestion-related information to users at the station.
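The prediction-time sequence of steps recited in the preceding limitation can be sketched as a simple data flow: present sensor values, then the first model, then the second model, then the station display. The sketch assumes each component is a plain callable; none of the names below come from Guo, Huang, or the application, and the sketch says nothing about how either model is implemented.

```python
# Hypothetical sketch of the recited prediction-time data flow.
# Assumptions: the sensors, both trained models and the station display are
# represented as plain callables; all names are illustrative only.
from typing import Callable, Dict, List

SensorValues = List[float]
CongestionInfo = Dict[str, float]


def predict_and_display(
    read_present_values: Callable[[], SensorValues],
    first_model: Callable[[SensorValues], CongestionInfo],
    second_model: Callable[[CongestionInfo], Dict[str, float]],
    display: Callable[[Dict[str, float]], None],
) -> Dict[str, float]:
    present = read_present_values()  # acquire present sensor values
    current_info = first_model(present)  # congestion info at the detection time
    future_levels = second_model(current_info)  # future congestion level per area
    display(future_levels)  # show the prediction to users at the station
    return future_levels
```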
Huang, however, teaches acquiring, by the processor, present values detected by the one or more sensors of the station; using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection (Page 3, Paragraph 2; Page 5, Paragraphs 5-10).
Preferably, the monitoring terminal comprises a traffic monitoring terminal for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal for counting the pedestrian density at a certain time in a certain time, and a conflict monitoring terminal for counting the pedestrian conflict degree in a certain area and an abnormal monitoring terminal for identifying the pedestrian abnormal behaviour in a certain area.[Pg 3, P-2]
Further, the monitoring terminal 2 comprises a traffic monitoring terminal 201 for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal 202 for counting a certain area pedestrian density at a certain time, a conflict monitoring terminal 203 for counting pedestrian conflict degree in a certain area and an abnormal monitoring terminal 204 for identifying pedestrian abnormal behavior in a certain area, each terminal is composed of a sensor, an embedded processing chip, a processing program and a communication module.[Pg 5, P-5]
the sensor of the flow monitoring terminal 201 is a binocular video camera embedded processing chip and flow statistics method according to the video data obtained by the sensor, counting the pedestrian number in and out of the specified area and calculating the flow rate (unit: The cumulative flow rate (unit: 1 %) and the cumulative flow rate (unit: human), then the traffic data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-6]
the sensor of the density monitoring terminal 202 is a wide-angle video camera embedded processing chip and density statistical method according to the video data obtained by the sensor, counting the pedestrian distribution of the specified area and calculating the density (unit of each point after discrete area of the area: the density data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-7]
the sensor of the conflict monitoring terminal 203 is high video camera, embedded processing chip and conflict calculation algorithm according to the video data obtained by the sensor, counting the appointed area pedestrian speed direction and size change rate, relative direction pedestrian number, local density and other parameters, and according to the direction and size change rate of the pedestrian speed in the specified area, relative direction pedestrian number, local density parameter calculating conflict coefficient (dimensionless parameter), then the conflict coefficient is uploaded to the management control terminal 1 through the communication module[Pg 5, P-8]
the sensor of the abnormal monitoring terminal 204 is high video camera, embedded processing chip and the abnormal detection algorithm according to the video data obtained by the sensor, detecting whether there is abnormal behaviour in the specified area, such as a fight, terrorist attack, fast escape, crowding, then the type code of the abnormal behaviour is uploaded to the management control terminal 1 through the communication module. [Pg 5, P-9]
Further, the guide terminal 3 comprises an acoustic guide terminal 301 for conveying sound instruction, for area division, route guidance and information display of the optical guide terminal 302, for issuing the mobile phone message of the information issuing terminal 304. The area separation terminal 303 for the dynamic adjustment of the pedestrian and vehicular traffic area and the guide terminal 305 for order maintenance and manual assistance 305. [Pg 5, P-10]
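The statistics that Huang attributes to the flow monitoring terminal 201 (counting pedestrians in and out of a specified area and calculating a flow rate and a cumulative count) and the density monitoring terminal 202 (density at each point of a discretized area) can be sketched as below. The event format, metric units, and square-cell discretization are assumptions; the machine-translated passages at Pg 5, P-6 and P-7 do not state the units or the discretization clearly.

```python
# Hypothetical sketch of the statistics attributed to Huang's terminals 201 and 202.
# Assumptions (not from Huang): crossings arrive as (timestamp_s, "in"/"out") events,
# positions are (x, y) in metres, and the area is discretised into square cells.
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def flow_statistics(events: List[Tuple[float, str]], window_s: float) -> Dict[str, float]:
    """Count entries and exits in a time window and derive a flow rate (persons per minute)."""
    ins = sum(1 for _, direction in events if direction == "in")
    outs = sum(1 for _, direction in events if direction == "out")
    return {
        "in": ins,
        "out": outs,
        "flow_rate_per_min": (ins + outs) / (window_s / 60.0),
        "net_inside": ins - outs,  # cumulative change in occupancy over the window
    }


def cell_densities(
    positions: Iterable[Tuple[float, float]], cell_size_m: float
) -> Dict[Tuple[int, int], float]:
    """Density (persons per square metre) for each occupied cell of the discretised area."""
    counts = Counter((int(x // cell_size_m), int(y // cell_size_m)) for x, y in positions)
    cell_area = cell_size_m * cell_size_m
    return {cell: n / cell_area for cell, n in counts.items()}
```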

Huang further teaches using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level (Page 3, Paragraphs 3, 4, 9; Page 3, Paragraph 9-Page 4, Paragraph 1; Page 6, Paragraphs 2, 3, 7, 8).
Preferably, the management control terminal comprises a region modeling module, a distribution estimation module connected with the region modeling module signal, a distribution prediction module connected with the distribution estimation module signal, and a risk evaluation module connected with the distribution prediction module and the distribution estimation module signal, a simulation and optimization module connected with the risk evaluation module signal and a guide control module connected with the simulation and optimization module signal. [Pg 3, P-3]
Preferably, the simulation and optimization module comprises a personnel evacuation model and personnel flow model, when the risk assessment result of the risk assessment module is low comfort level personnel flow module, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of simulating pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and flow inducing scheme [Pg 3, P-4]
the distribution prediction module according to the history data of the distribution estimation result, establishing model based on machine learning algorithm to predict the future period of pedestrian distribution parameter, and uploading data continuous learning model parameter according to the future monitoring terminal.[Pg 3, P-9]
Preferably, the risk evaluation module signal is connected with a conflict monitoring terminal and an abnormal monitoring terminal, and according to the distribution prediction module, the output result of the distribution estimation module and conflict monitoring terminal, the uploading data of the abnormal monitoring terminal evaluates the possibility of the danger of congestion and trampling on the comfort level region. [Pg 3, P-9-Pg 4, P-1]
Further, the management control terminal 1 comprises a region modeling module 101, and the region modeling module 101 signal connection of the distribution estimation module 102, and the distribution prediction module 102 signal connection distribution prediction module 103. a risk evaluation module 104 connected with the distribution prediction module 103 and the distribution estimation module 102, a simulation and optimization module 105 connected with the risk evaluation module 104 signal and a guide control module 106 connected with the simulation and optimization module 105 signal.[Pg 6, P-2]
Further, the simulation and optimization module 105 comprises a personnel evacuation model and personnel flow model, when the risk evaluation result of the risk evaluation module 104 is low comfort level personnel flow module, management control terminal 1 sends attention area low comfort level alarm, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of the simulated pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and a flow inducing scheme, after the management staff 4 of the auditing by the guide control module 106 sends the inducing information to the guide terminal 3;[Pg 6, P-3]
the distribution prediction module 103 according to the history data of the distribution estimation result, based on the machine learning algorithm establishing model prediction future period of pedestrian distribution parameter, and according to the future monitoring terminal 2 uploading data continuous learning model parameter.[Pg 6, P-7]
Further, the risk evaluation module 104 signal connected with the conflict monitoring terminal 203 and the abnormal monitoring terminal 204, and according to the distribution prediction module 103, distribution estimation module 102 the output result and conflict monitoring terminal 203. The uploading data of the abnormal monitoring terminal 204 evaluates the possibility of the comfort level of congestion and trampling on the region of interest and the occurrence of congestion and trample.[Pg 6, P-8]


Here, Huang teaches acquiring data from one or more sensors at a specific area of interest (a station) and using a module corresponding to the first model to predict, from the sensor data, congestion information relating to the one or more sensors that detect the congestion. Furthermore, Huang uses a module for capturing and evaluating congestion data, corresponding to the second prediction model/deep learning model, to analyze time-series data in more depth and to predict more accurate parameters relating to future traffic congestion.
Huang then teaches route guidance and information display by the optical guide terminal 302, and issuing mobile phone messages by the information issuing terminal 304, thereby displaying congestion result data as well as predicted congestion data.
When Huang is taken in combination with Guo’s teaching of a learning server that regenerates a traffic model (a second model) after previously generated model(s), using previous traffic-related data together with real-time traffic-related data, one of ordinary skill in the art could then determine the future congestion level of any of the specific (m) areas of the various stations and thereafter display, by the processor, the predicted congestion-related information to users at the station.
It would have been obvious to one of ordinary skill in the art to combine Guo’s teaching with Huang’s teaching in order to have a more accurate prediction model to determine congestion/traffic activities and behaviors within a given area.

Regarding claim 19, Guo teaches a learning prediction method comprising: generating a first model that is a learning model for predicting, from values detected by one or more sensors, congestion-related information of a time point at which the one or more sensors perform detection, the first model being generated by using values indicated by past sensor data indicating the values detected in the past by the one or more sensors installed in n areas (where n is an integer of one or more and n<m) out of m areas (where m is an integer of two or more) as input data and using the congestion-related information indicated by past congestion-area data indicating the congestion-related information including a past congestion level of each of the m areas as correct data (Paragraphs 9, 11, 17, 48, 60, 63).
It is one object of some embodiments to provide a distributed machine learning based vehicular traffic prediction method that can accurately predict short time, middle time and long time traffic for both city roads and highways. Additionally, it is another object of some embodiments to make optimal route planning for vehicles to minimize traffic congestion and travel time by using the accurate traffic prediction.[P-9]
A learning server coordinates distributed learning among a set of data collectors. A learning server first designs and distributes the traffic models such as neural networks to the set of data collectors for the first round of the distributed training Each data collector then trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. After certain iterations of training process, each data collector sends the trained traffic models to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models. Upon completion of the model aggregation, the learning server re-distributes the aggregated traffic models to data collectors for the second round of training process. This process of training and aggregation continues until the robust traffic models are built.[P-11]
According to some embodiments of the present invention, a distributed machine learning based traffic prediction method can be provided for predicting traffic of roads. In this case, the distributed machine learning based traffic prediction method may a computer-implemented distributed machine learning based traffic prediction method for predicting traffic of roads. The method may include distributing global multi-task traffic models by a learning server to learning agents to locally train the traffic models; uploading locally trained global multi-task traffic models to the learning server, wherein the locally trained global multi-task traffic models have been trained by the learning agents; updating the global multi-task traffic models by the learning server using the locally trained global multi-task traffic models uploaded from the learning agents; generating a time-dependent global traffic map by the learning server using the updated global multi-task traffic models; and distributing the time-dependent global traffic map to each of vehicles traveling on the roads.[P-17]
To reap all the aforementioned benefits, the traffic prediction must process the real-time and historical traffic data and observations collected by data collectors and mobile devices. For example, the inductive loop can measure the travel speed by reading the inductance changes over time and such data can be used for the traffic speed prediction. In addition, the wide use of mobile devices (e.g., on-board global position systems and phones) enables the mobility data to be crowdsourced from the general public, further facilitating the unprecedented traffic data collection. Such emerging big data can substantially augment the data availability in terms of the coverage and fidelity and significantly boost the data-driven traffic prediction. The prior art on the traffic prediction can be mainly grouped into two categories. The first category focus on using parametric approaches, such as autoregressive integrated moving average (ARIMA) model and Kalman filtering models. When dealing with the traffic only presenting regular variations, e.g., recurrent traffic congestion occurred in morning and evening rush hour, the parametric approaches can achieve promising prediction results. However, due to the stochastic and nonlinear nature of the road traffic, the traffic predictions of using the parametric approaches can deviate from the actual values especially in the abrupt traffic. Hence, instead of fitting the traffic data into a mathematical model as done by the parametric approach, an alternative way is using the nonparametric approaches where the machine learning (ML) based method is used. For example, a stacked autoencoder model can be used to learn the generic traffic flow features for the predictions. The long short-term memory (LSTM) recurrent neural network (RNN) can be used to predict the traffic flow, speed and occupancy, based on the data collected by the data collector and its upstream and downstream data collectors. 
Along with the use of RNN, the convolution neural network (CNN) can also be utilized to capture the latent traffic evolution patterns within the underlying road network.[P-48]
 FIG. 3A illustrates the traffic prediction architecture 300 using distributed multi-task learning techniques according to some embodiments of the current invention, where an infrastructure device acts as learning server 115 to coordinate a set of data collectors 110, which serve as learning agents. In the FIG. 3A, there are N data collectors, data collector 1 is a toll station, data collector 2 is a loop detector, and data collector N is a camera. The learning server 115 designs M traffic models 310 such that each traffic model is targeted for a specific task, e.g., a model for rush hour traffic and another traffic model for off hour traffic, a model for city traffic and another model for freeway traffic. Therefore, these traffic models are different but correlated because traffic at one location can impact on traffic at other locations and traffic at one can impact on traffic at another time. Accordingly, the learning server 115 defines data clustering rules 320 to divide data into M clusters 330. The learning server 115 then distributes the rules 320 and traffic models 310 to data collectors 110. Correspondingly, each data collector 110 divides its data into M clusters 330. It is possible that some of data clusters are empty. In that case, the data collector will not train corresponding traffic model. In the FIG. 3A, data collector 1 has all data clusters, data collector 2 does not have cluster-M data and, and data collector N does not have cluster-1 data. As a result, data collector 1 will train all traffic models, data collector 2 does not train traffic model M, and data collector N does not train traffic model 1. After clustering its data, each data collector 110 starts the first round of the distributed training. Each data collector 110 trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. 
After certain iterations of training, each data collector sends the locally trained traffic models 340 to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models 350. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. For route planning application, the traffic speed is predicted. Finally, data collectors send their local traffic predictions 360 to the learning server to build a global traffic prediction model, i.e., time-dependent graph, that is then distributed to vehicles 120 for route planning[P-60]
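For illustration only, the train-and-aggregate cycle quoted from Guo's P-60 can be sketched as follows. This is a minimal sketch under stated assumptions, not Guo's implementation:

- Each of the M traffic models is reduced to a hypothetical one-parameter linear model (y ≈ w·x).
- Local training is plain gradient descent.
- Aggregation is a sample-weighted average of the locally trained parameters. This is a FedAvg-style rule that Guo does not specify.
- The clustering rule, function names, and data layout are likewise hypothetical.

As in FIG. 3A, a collector whose cluster k is empty does not train model k.

```python
from typing import Callable, List, Tuple

Sample = Tuple[float, float]  # (feature x, observed traffic value y); hypothetical layout


def cluster_data(data: List[Sample], rule: Callable[[Sample], int], m: int) -> List[List[Sample]]:
    """Divide one collector's data into M clusters using the server's clustering rule."""
    clusters: List[List[Sample]] = [[] for _ in range(m)]
    for sample in data:
        clusters[rule(sample)].append(sample)
    return clusters


def local_train(w: float, samples: List[Sample], lr: float = 0.01, epochs: int = 50) -> float:
    """Hypothetical local training: gradient descent on the squared error of y ~ w * x."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
        w -= lr * grad
    return w


def training_round(global_w: List[float], collectors: List[List[Sample]],
                   rule: Callable[[Sample], int], m: int) -> List[float]:
    """One round: each simulated collector trains locally, then the server aggregates."""
    weighted_sum = [0.0] * m
    n_samples = [0] * m
    for data in collectors:  # only trained parameters are combined below, never raw data
        for k, samples in enumerate(cluster_data(data, rule, m)):
            if not samples:  # empty cluster: this collector does not train model k
                continue
            weighted_sum[k] += local_train(global_w[k], samples) * len(samples)
            n_samples[k] += len(samples)
    # Assumed FedAvg-style aggregation; a model no collector trained keeps its previous value.
    return [weighted_sum[k] / n_samples[k] if n_samples[k] else global_w[k] for k in range(m)]
```

Calling `training_round` repeatedly mirrors the "process of training and aggregation continues" language. In this sketch, a model whose cluster is empty at every collector is carried forward unchanged.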
The learning server 115 may be referred to as a distributed machine learning based traffic predication server. The learning server 115 is configured to provide map information with respect to traffic predictions to vehicles 120 traveling on a road network. The learning server 115 may include one or more processors 121, a memory 140, a memory unit/storage 200 configured to store traffic prediction neural networks 132, traffic data cluster rules 134, global time-dependent map 135, trained multi-task traffic models 136, global map of road network 137, an input interface (or transceiver) 150 configured to communicate with the learning agent 110 via the communication network 112 and update model parameters of the traffic prediction neural networks 132. The trained multi-task traffic models 136 may be the traffic models 310 shown in FIG. 3A.[P-63]
Guo establishes M clusters representing different traffic models attributed to different traffic areas. For example, Guo illustrates that one area may be freeway traffic, which would be labeled Cluster 1, and another area may be city traffic (Cluster 2), thereby satisfying the condition that m is two or more. Furthermore, Guo teaches data collectors/sensors, designated data collectors 1 through N, each of which may be a toll station, a loop detector, or a camera. Accordingly, depending on the area for which the traffic/cluster model is generated, a desired number of sensors/detectors may be set such that the desired relationship N<M (data collectors : cluster models) is met.
In P-48, we see the integration of real-time traffic data with past sensed traffic data, i.e., congestion-related traffic information that is aggregated to enable the prediction of future traffic patterns.
Also, P-60 and FIG. 3A illustrate and explain that several different traffic models are generated, one (first) model being a learning model for predicting, from values detected by the one or more sensors (toll station), congestion-related information of a time point (rush hour) at which the one or more sensors perform detection.
Guo fails to verbatim teach generating a second model that is a learning model for predicting a future congestion level from the congestion-related information, the second model being generated by using the congestion-related information of a first time point indicated by the past congestion-area data as input data and using the congestion level in the congestion-related information of a second time point as correct data, the second time point being a time point after the first time point; an acquiring unit configured to acquire values detected by the one or more sensors; using the first model to predict, from the acquired values, the congestion-related information of a time point at which the one or more sensors perform detection; and using the second model to predict, from the predicted congestion-related information, a future congestion level of any of the m areas.  
However, P-60 teaches multiple traffic models, where, for example, one (first) model is a learning model for predicting, from values detected by the one or more sensors (toll station), congestion-related information of a time point (rush hour) at which the one or more sensors perform detection. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc.[P-60]
Hence, the next round of learning entails generating a second model as follows:
- Input data: the congestion-related information of a first time point indicated by the past congestion-area data (the previous aggregated traffic model).
- Correct data: the congestion level in the congestion-related information of a second time point (a different time point in rush hour) indicated by the past congestion-area data.
- Purpose: the second model is a learning model for predicting a future congestion level from the congestion-related information.
- Timing and areas: the second time point is after the first time point (a different time in rush hour after the previous time in the preceding learning model) and may relate to different areas (toll station, or camera(s) at a different location).

The regenerated/second model is essentially derived from updated and previously predicted congestion-related information, and is used to predict a future congestion level of any of the m areas.
Therefore, Guo does not verbatim recite generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data as input data. However, Guo teaches the learning server regenerating a traffic model (a second model) after previously generated model(s), using previous traffic-related data together with real-time traffic-related data. It would have been obvious to one of ordinary skill in the art that Guo’s teaching reads on the applicant’s second generated model. Substituting Guo’s teaching for the applicant’s yields the same results and enables a more efficient and robust traffic prediction model.
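The claimed arrangement for training the second model can be illustrated with a minimal sketch. It uses congestion-related information at a first time point as input data, and the congestion level at a later second time point as correct data. The following are assumptions for illustration only:

- the function name;
- the list-of-lists layout of the past congestion-area data (one row per time step, one column per area);
- a fixed horizon measured in time steps.

```python
from typing import List, Sequence, Tuple


def make_second_model_pairs(
    congestion_history: Sequence[Sequence[float]],  # [time step][area]: past congestion levels
    horizon: int,                                   # steps from the first to the second time point
) -> Tuple[List[List[float]], List[List[float]]]:
    """Pair congestion-related info at time t (input data) with the congestion
    levels at the later time t + horizon (correct data)."""
    if horizon < 1:
        raise ValueError("the second time point must come after the first (horizon >= 1)")
    inputs: List[List[float]] = []
    targets: List[List[float]] = []
    for t in range(len(congestion_history) - horizon):
        inputs.append(list(congestion_history[t]))
        targets.append(list(congestion_history[t + horizon]))
    return inputs, targets
```

Any supervised learner could then be fit on `(inputs, targets)`. The pairing itself is what enforces that the correct data always lies after the input data in time.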
Guo, however, fails to teach displaying, by the processor, to users at the station the future congestion level of any of the m areas.
Guo also fails to teach acquiring, by the processor, present values detected by the one or more sensors of the station; using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection; using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level of any of the m areas; and displaying, by the processor, to users at the station the predicted congestion-related information.
Huang, however, teaches acquiring, by the processor, present values detected by the one or more sensors of the station; using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection (Page 3, Paragraph 2; Page 5, Paragraphs 5-10):
Preferably, the monitoring terminal comprises a traffic monitoring terminal for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal for counting the pedestrian density at a certain time in a certain time, and a conflict monitoring terminal for counting the pedestrian conflict degree in a certain area and an abnormal monitoring terminal for identifying the pedestrian abnormal behaviour in a certain area.[Pg 3, P-2]
Further, the monitoring terminal 2 comprises a traffic monitoring terminal 201 for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal 202 for counting a certain area pedestrian density at a certain time, a conflict monitoring terminal 203 for counting pedestrian conflict degree in a certain area and an abnormal monitoring terminal 204 for identifying pedestrian abnormal behavior in a certain area, each terminal is composed of a sensor, an embedded processing chip, a processing program and a communication module.[Pg 5, P-5]
the sensor of the flow monitoring terminal 201 is a binocular video camera embedded processing chip and flow statistics method according to the video data obtained by the sensor, counting the pedestrian number in and out of the specified area and calculating the flow rate (unit: The cumulative flow rate (unit: 1 %) and the cumulative flow rate (unit: human), then the traffic data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-6]
the sensor of the density monitoring terminal 202 is a wide-angle video camera embedded processing chip and density statistical method according to the video data obtained by the sensor, counting the pedestrian distribution of the specified area and calculating the density (unit of each point after discrete area of the area: the density data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-7]
the sensor of the conflict monitoring terminal 203 is high video camera, embedded processing chip and conflict calculation algorithm according to the video data obtained by the sensor, counting the appointed area pedestrian speed direction and size change rate, relative direction pedestrian number, local density and other parameters, and according to the direction and size change rate of the pedestrian speed in the specified area, relative direction pedestrian number, local density parameter calculating conflict coefficient (dimensionless parameter), then the conflict coefficient is uploaded to the management control terminal 1 through the communication module[Pg 5, P-8]
the sensor of the abnormal monitoring terminal 204 is high video camera, embedded processing chip and the abnormal detection algorithm according to the video data obtained by the sensor, detecting whether there is abnormal behaviour in the specified area, such as a fight, terrorist attack, fast escape, crowding, then the type code of the abnormal behaviour is uploaded to the management control terminal 1 through the communication module. [Pg 5, P-9]
Further, the guide terminal 3 comprises an acoustic guide terminal 301 for conveying sound instruction, for area division, route guidance and information display of the optical guide terminal 302, for issuing the mobile phone message of the information issuing terminal 304. The area separation terminal 303 for the dynamic adjustment of the pedestrian and vehicular traffic area and the guide terminal 305 for order maintenance and manual assistance 305. [Pg 5, P-10]
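As a rough illustration of the density monitoring terminal 202 quoted above, per-cell density over a discretized area might be computed as below. Huang's stated unit is garbled in the translation. This hypothetical sketch therefore assumes:

- positions in metres;
- square grid cells;
- an area whose width and height are multiples of the cell size;
- density expressed in persons per square metre.

It does not reproduce Huang's density statistical method.

```python
from typing import Dict, Iterable, Tuple


def grid_density(positions: Iterable[Tuple[float, float]], width: float, height: float,
                 cell: float) -> Dict[Tuple[int, int], float]:
    """Density (persons per square metre) of each occupied grid cell of a width x height area."""
    counts: Dict[Tuple[int, int], int] = {}
    for x, y in positions:
        if 0 <= x < width and 0 <= y < height:  # ignore detections outside the specified area
            key = (int(x // cell), int(y // cell))
            counts[key] = counts.get(key, 0) + 1
    return {key: n / (cell * cell) for key, n in counts.items()}
```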

Huang further teaches using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level (Page 3, Paragraphs 3, 4, 9; Page 3, Paragraph 9 to Page 4, Paragraph 1; Page 6, Paragraphs 2, 3, 7, 8):
Preferably, the management control terminal comprises a region modeling module, a distribution estimation module connected with the region modeling module signal, a distribution prediction module connected with the distribution estimation module signal, and a risk evaluation module connected with the distribution prediction module and the distribution estimation module signal, a simulation and optimization module connected with the risk evaluation module signal and a guide control module connected with the simulation and optimization module signal. [Pg 3, P-3]
Preferably, the simulation and optimization module comprises a personnel evacuation model and personnel flow model, when the risk assessment result of the risk assessment module is low comfort level personnel flow module, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of simulating pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and flow inducing scheme [Pg 3, P-4]
the distribution prediction module according to the history data of the distribution estimation result, establishing model based on machine learning algorithm to predict the future period of pedestrian distribution parameter, and uploading data continuous learning model parameter according to the future monitoring terminal.[Pg 3, P-9]
Preferably, the risk evaluation module signal is connected with a conflict monitoring terminal and an abnormal monitoring terminal, and according to the distribution prediction module, the output result of the distribution estimation module and conflict monitoring terminal, the uploading data of the abnormal monitoring terminal evaluates the possibility of the danger of congestion and trampling on the comfort level region. [Pg 3, P-9-Pg 4, P-1]
Further, the management control terminal 1 comprises a region modeling module 101, and the region modeling module 101 signal connection of the distribution estimation module 102, and the distribution prediction module 102 signal connection distribution prediction module 103. a risk evaluation module 104 connected with the distribution prediction module 103 and the distribution estimation module 102, a simulation and optimization module 105 connected with the risk evaluation module 104 signal and a guide control module 106 connected with the simulation and optimization module 105 signal.[Pg 6, P-2]
Further, the simulation and optimization module 105 comprises a personnel evacuation model and personnel flow model, when the risk evaluation result of the risk evaluation module 104 is low comfort level personnel flow module, management control terminal 1 sends attention area low comfort level alarm, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of the simulated pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and a flow inducing scheme, after the management staff 4 of the auditing by the guide control module 106 sends the inducing information to the guide terminal 3;[Pg 6, P-3]
the distribution prediction module 103 according to the history data of the distribution estimation result, based on the machine learning algorithm establishing model prediction future period of pedestrian distribution parameter, and according to the future monitoring terminal 2 uploading data continuous learning model parameter.[Pg 6, P-7]
Further, the risk evaluation module 104 signal connected with the conflict monitoring terminal 203 and the abnormal monitoring terminal 204, and according to the distribution prediction module 103, distribution estimation module 102 the output result and conflict monitoring terminal 203. The uploading data of the abnormal monitoring terminal 204 evaluates the possibility of the comfort level of congestion and trampling on the region of interest and the occurrence of congestion and trample.[Pg 6, P-8]


Here, we see Huang teaching the acquisition of data from one or more sensors at a specific area of interest (station), and using a module synonymous with the first model to predict, from the sensor data, congestion information relating to the one or more sensors that detect the congestion. Furthermore, Huang uses a module for capturing and evaluating congestion data, synonymous with the second prediction model/deep learning model. This module analyzes time-series data in more depth and predicts more accurate parameters relating to future traffic congestion.
Huang then teaches route guidance and information display via the optical guide terminal 302, and issuing mobile phone messages via the information issuing terminal 304, thereby displaying congestion result data as well as predicted congestion data.
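The two-stage chain relied on above can be sketched as function composition:

1. Sensor values are mapped by a first model to present congestion-related information.
2. That information is mapped by a second model to a future congestion level.
3. The result is displayed to users.

The models here are hypothetical placeholders, not Huang's modules or the applicant's trained models. The area names, the 0.0 to 1.0 congestion scale, and the scaling factors are invented for illustration.

```python
from typing import Callable, Dict, List, Sequence

SensorValues = Sequence[float]
CongestionInfo = Dict[str, float]  # area name -> congestion level (assumed 0.0 to 1.0 scale)


def predict_future_congestion(
    present_values: SensorValues,
    first_model: Callable[[SensorValues], CongestionInfo],
    second_model: Callable[[CongestionInfo], CongestionInfo],
) -> CongestionInfo:
    """Sensors -> present congestion-related info -> future congestion level."""
    present = first_model(present_values)  # congestion at the time the sensors detect
    return second_model(present)           # congestion predicted for a later time


def display_lines(levels: CongestionInfo) -> List[str]:
    """Text a station display might show, one line per area, sorted by area name."""
    return [f"{area}: predicted congestion {level:.0%}" for area, level in sorted(levels.items())]


# Hypothetical placeholder models: one sensor covering two areas (n = 1 < m = 2).
def example_first_model(values: SensorValues) -> CongestionInfo:
    return {"platform": values[0] / 100, "concourse": values[0] / 200}


def example_second_model(present: CongestionInfo) -> CongestionInfo:
    return {area: min(1.0, level * 1.2) for area, level in present.items()}
```

With these placeholders, a sensor reading of 50 gives present levels of 0.5 (platform) and 0.25 (concourse), and predicted levels of 0.6 and 0.3.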
Huang's teaching may be taken in combination with Guo’s teaching of the learning server regenerating a traffic model (a second model) after previously generated model(s), using previous traffic-related data together with real-time traffic-related data. One of ordinary skill in the art may then be able to determine the future congestion level of any of the specific (m) areas of the various stations. The processor may thereafter display to users at the station the predicted congestion-related information.
It would have been obvious to one of ordinary skill in the art to combine Guo’s teaching with Huang’s teaching in order to have a more accurate prediction model to determine congestion/traffic activities and behaviors within a given area.
In regard to claim 20, Guo teaches a learning prediction method comprising the following steps (Paragraphs 9, 11, 17, 48, 60, 63):
- generating a first model that is a learning model for predicting, from congestion-related information, values to be detected by one or more sensors of a time point at which the congestion-related information is acquired;
- the first model being generated by using, as input data, the congestion-related information indicated by past congestion-area data indicating the congestion-related information including a past congestion level of each of m areas (where m is an integer of two or more);
- and using, as correct data, values indicated by past sensor data indicating the values detected in the past by the one or more sensors installed in n areas (where n is an integer of one or more and n<m) out of the m areas;
- using the first model to predict, from the congestion-related information at a first time point indicated by the past congestion-area data, values to be detected by the one or more sensors at the first time point.
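For claim 20, the first model runs in the opposite direction. It maps congestion-related information of the m areas to the values detected by sensors installed in only n < m of those areas. A minimal sketch follows. It uses k-nearest-neighbour regression purely as a stand-in learning model, since the claim does not specify a model type. The function names, the value of k, and the data layout are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def knn_first_model(
    past_congestion: Sequence[Sequence[float]],  # input data: [sample][m areas]
    past_sensor: Sequence[Sequence[float]],      # correct data: [sample][n sensor values], n < m
    k: int = 2,
) -> Callable[[Sequence[float]], List[float]]:
    """Return a predictor mapping an m-area congestion vector to n sensor values."""
    m = len(past_congestion[0])
    n = len(past_sensor[0])
    if not n < m:
        raise ValueError("claim 20 requires sensors in fewer areas than there are areas (n < m)")

    def predict(congestion: Sequence[float]) -> List[float]:
        # indices of the k past samples whose congestion vectors are closest (Euclidean)
        nearest = sorted(range(len(past_congestion)),
                         key=lambda i: math.dist(congestion, past_congestion[i]))[:k]
        # average the sensor values recorded for those samples, per sensor
        return [sum(past_sensor[i][j] for i in nearest) / len(nearest) for j in range(n)]

    return predict
```

Applying the returned predictor to the congestion-related information at the first time point yields the predicted sensor values at that time point.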
It is one object of some embodiments to provide a distributed machine learning based vehicular traffic prediction method that can accurately predict short time, middle time and long time traffic for both city roads and highways. Additionally, it is another object of some embodiments to make optimal route planning for vehicles to minimize traffic congestion and travel time by using the accurate traffic prediction.[P-9]
A learning server coordinates distributed learning among a set of data collectors. A learning server first designs and distributes the traffic models such as neural networks to the set of data collectors for the first round of the distributed training Each data collector then trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. After certain iterations of training process, each data collector sends the trained traffic models to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models. Upon completion of the model aggregation, the learning server re-distributes the aggregated traffic models to data collectors for the second round of training process. This process of training and aggregation continues until the robust traffic models are built.[P-11]
According to some embodiments of the present invention, a distributed machine learning based traffic prediction method can be provided for predicting traffic of roads. In this case, the distributed machine learning based traffic prediction method may a computer-implemented distributed machine learning based traffic prediction method for predicting traffic of roads. The method may include distributing global multi-task traffic models by a learning server to learning agents to locally train the traffic models; uploading locally trained global multi-task traffic models to the learning server, wherein the locally trained global multi-task traffic models have been trained by the learning agents; updating the global multi-task traffic models by the learning server using the locally trained global multi-task traffic models uploaded from the learning agents; generating a time-dependent global traffic map by the learning server using the updated global multi-task traffic models; and distributing the time-dependent global traffic map to each of vehicles traveling on the roads.[P-17]
To reap all the aforementioned benefits, the traffic prediction must process the real-time and historical traffic data and observations collected by data collectors and mobile devices. For example, the inductive loop can measure the travel speed by reading the inductance changes over time and such data can be used for the traffic speed prediction. In addition, the wide use of mobile devices (e.g., on-board global position systems and phones) enables the mobility data to be crowdsourced from the general public, further facilitating the unprecedented traffic data collection. Such emerging big data can substantially augment the data availability in terms of the coverage and fidelity and significantly boost the data-driven traffic prediction. The prior art on the traffic prediction can be mainly grouped into two categories. The first category focus on using parametric approaches, such as autoregressive integrated moving average (ARIMA) model and Kalman filtering models. When dealing with the traffic only presenting regular variations, e.g., recurrent traffic congestion occurred in morning and evening rush hour, the parametric approaches can achieve promising prediction results. However, due to the stochastic and nonlinear nature of the road traffic, the traffic predictions of using the parametric approaches can deviate from the actual values especially in the abrupt traffic. Hence, instead of fitting the traffic data into a mathematical model as done by the parametric approach, an alternative way is using the nonparametric approaches where the machine learning (ML) based method is used. For example, a stacked autoencoder model can be used to learn the generic traffic flow features for the predictions. The long short-term memory (LSTM) recurrent neural network (RNN) can be used to predict the traffic flow, speed and occupancy, based on the data collected by the data collector and its upstream and downstream data collectors. 
Along with the use of RNN, the convolution neural network (CNN) can also be utilized to capture the latent traffic evolution patterns within the underlying road network.[P-48]

 FIG. 3A illustrates the traffic prediction architecture 300 using distributed multi-task learning techniques according to some embodiments of the current invention, where an infrastructure device acts as learning server 115 to coordinate a set of data collectors 110, which serve as learning agents. In the FIG. 3A, there are N data collectors, data collector 1 is a toll station, data collector 2 is a loop detector, and data collector N is a camera. The learning server 115 designs M traffic models 310 such that each traffic model is targeted for a specific task, e.g., a model for rush hour traffic and another traffic model for off hour traffic, a model for city traffic and another model for freeway traffic. Therefore, these traffic models are different but correlated because traffic at one location can impact on traffic at other locations and traffic at one can impact on traffic at another time. Accordingly, the learning server 115 defines data clustering rules 320 to divide data into M clusters 330. The learning server 115 then distributes the rules 320 and traffic models 310 to data collectors 110. Correspondingly, each data collector 110 divides its data into M clusters 330. It is possible that some of data clusters are empty. In that case, the data collector will not train corresponding traffic model. In the FIG. 3A, data collector 1 has all data clusters, data collector 2 does not have cluster-M data and, and data collector N does not have cluster-1 data. As a result, data collector 1 will train all traffic models, data collector 2 does not train traffic model M, and data collector N does not train traffic model 1. After clustering its data, each data collector 110 starts the first round of the distributed training. Each data collector 110 trains the received traffic models independently by using its own data without sharing its data with other data collectors and learning server. 
After certain iterations of training, each data collector sends the locally trained traffic models 340 to the learning server, which then aggregates the received traffic models from all data collectors to generate the common global traffic models 350. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc. For route planning application, the traffic speed is predicted. Finally, data collectors send their local traffic predictions 360 to the learning server to build a global traffic prediction model, i.e., time-dependent graph, that is then distributed to vehicles 120 for route planning[P-60]
The learning server 115 may be referred to as a distributed machine learning based traffic predication server. The learning server 115 is configured to provide map information with respect to traffic predictions to vehicles 120 traveling on a road network. The learning server 115 may include one or more processors 121, a memory 140, a memory unit/storage 200 configured to store traffic prediction neural networks 132, traffic data cluster rules 134, global time-dependent map 135, trained multi-task traffic models 136, global map of road network 137, an input interface (or transceiver) 150 configured to communicate with the learning agent 110 via the communication network 112 and update model parameters of the traffic prediction neural networks 132. The trained multi-task traffic models 136 may be the traffic models 310 shown in FIG. 3A.[P-63]
Guo establishes M clusters representing different traffic models attributed to different traffic areas. For example, Guo illustrates that one area may be freeway traffic, which would be labeled Cluster 1, and another area may be city traffic (Cluster 2), thereby satisfying the condition that m is two or more. Furthermore, Guo teaches data collectors/sensors, designated data collectors 1 through N, each of which may be a toll station, a loop detector, or a camera. Accordingly, depending on the area for which the traffic/cluster model is generated, a desired number of sensors/detectors may be set such that the desired relationship N<M (data collectors : cluster models) is met.
In P-48, we see the integration of real-time traffic data with past sensed traffic data, i.e., congestion-related traffic information that is aggregated to enable the prediction of future traffic patterns.
Also, P-60 and FIG. 3A illustrate and explain that several different traffic models are generated, one (first) model being a learning model for predicting, from values detected by the one or more sensors (toll station), congestion-related information of a time point (rush hour) at which the one or more sensors perform detection.
Guo fails to verbatim teach generating a second model that is a learning model for predicting a future congestion level from values detected by the one or more sensors, the second model being generated by using the predicted values as input data and using the congestion level in the congestion-related information of a second time point indicated by the past congestion-area data as correct data, the second time point being a time point after the first time point; an acquiring unit configured to acquire values detected by the one or more sensors; and using the second model to predict, from the acquired values, a future congestion level of any of the m areas.
However, P-60 teaches multiple traffic models, where, for example, one (first) model is a learning model for predicting, from values detected by the one or more sensors (toll station), congestion-related information of a time point (rush hour) at which the one or more sensors perform detection. Upon completion of the model aggregation, the learning server 115 re-distributes the aggregated traffic models 350 to data collectors for the second round of training. This process of training and aggregation continues until the robust traffic models are built. Once the final traffic models are built, the learning server 115 distributes the models to data collectors 110 for traffic prediction. The predicted traffic depends on applications and can be velocity, traffic flow, number of specific vehicles, etc.[P-60]
Hence the next round of learning, entails generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data (the previous aggregated traffic model) as input data and using the congestion level in the congestion-related information of a second time point (a different time point in rush hour) indicated by the past congestion-area data as correct data, the second model being a learning model for predicting a future congestion level from the congestion-related information, the second time point being a time point after the first time point( a different time in rush hour after the previous time in the preceding learning model) and different areas (toll station, or camera(s) at different location).  The regenerated/second model essentially derived from updated and previous predicted congestion-related information, to using predict a future congestion level of any of the m areas.  
Therefore, although Guo does not verbatim recite generating a second model by using the congestion-related information of a first time point indicated by the past congestion-area data as input data, Guo teaches a learning server that regenerates a traffic model (a second model) from previously generated model(s), using previous traffic-related data together with real-time traffic-related data. It would have been obvious to one of ordinary skill in the art that Guo’s teaching reads on the applicant’s second generated model, since substituting Guo’s teaching for the applicant’s teaching would yield the same results, in order to enable a more efficient and robust traffic prediction model.
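As a hedged illustration of the two-stage arrangement recited in the claim language above (not the applicant’s or Guo’s actual implementation), the sketch below trains a first model that maps sensor values at a time point to congestion-related information at that same time point, then trains a second model that takes the first model’s predicted values as input data and the congestion level at a later time point as correct data. Ordinary least-squares lines stand in for the unspecified learning models, a single area stands in for the m areas, and all function names are hypothetical.

```python
from typing import Sequence, Tuple

Line = Tuple[float, float]  # (slope, intercept)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Line:
    """Ordinary least squares for y = a*x + b; a stand-in for any learning model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def predict(model: Line, x: float) -> float:
    a, b = model
    return a * x + b


def train_two_stage(sensor: Sequence[float], congestion: Sequence[float],
                    lag: int) -> Tuple[Line, Line]:
    """sensor[t] and congestion[t] are past data for one area at time step t."""
    if lag < 1:
        raise ValueError("the second time point must come after the first")
    # First model: sensor values -> congestion-related information at the same time point.
    first = fit_line(sensor, congestion)
    # Second model: first-model predictions at t (input data)
    # -> congestion level at t + lag (correct data).
    predicted = [predict(first, s) for s in sensor[:-lag]]
    second = fit_line(predicted, congestion[lag:])
    return first, second


def forecast(first: Line, second: Line, present_value: float) -> float:
    """Present sensor value -> current congestion estimate -> future congestion level."""
    return predict(second, predict(first, present_value))
```

With past data in which the congestion level equals 2 × sensor value + 1 and `lag=1`, `forecast(first, second, 5.0)` returns 13.0: the first model estimates the current level as 11, and the second model maps that estimate to the level one step later.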
Guo, however, fails to teach displaying, by the processor, to users at the station the future congestion level of any of the m areas.
Guo also fails to teach acquiring, by the processor, present values detected by the one or more sensors of the station; using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection; using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level of any of the m areas; and displaying, by the processor, to users at the station the predicted congestion-related information.
In contrast, Huang teaches acquiring, by the processor, present values detected by the one or more sensors of the station; and using, by the processor, the first model to predict, from the acquired present values, the congestion-related information of a time point at which the one or more sensors perform detection (Page 3, Paragraph 2; Page 5, Paragraphs 5-10):
Preferably, the monitoring terminal comprises a traffic monitoring terminal for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal for counting the pedestrian density at a certain time in a certain time, and a conflict monitoring terminal for counting the pedestrian conflict degree in a certain area and an abnormal monitoring terminal for identifying the pedestrian abnormal behaviour in a certain area.[Pg 3, P-2]
Further, the monitoring terminal 2 comprises a traffic monitoring terminal 201 for counting the pedestrian number in a certain area of a certain time, a density monitoring terminal 202 for counting a certain area pedestrian density at a certain time, a conflict monitoring terminal 203 for counting pedestrian conflict degree in a certain area and an abnormal monitoring terminal 204 for identifying pedestrian abnormal behavior in a certain area, each terminal is composed of a sensor, an embedded processing chip, a processing program and a communication module.[Pg 5, P-5]
the sensor of the flow monitoring terminal 201 is a binocular video camera embedded processing chip and flow statistics method according to the video data obtained by the sensor, counting the pedestrian number in and out of the specified area and calculating the flow rate (unit: The cumulative flow rate (unit: 1 %) and the cumulative flow rate (unit: human), then the traffic data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-6]
the sensor of the density monitoring terminal 202 is a wide-angle video camera embedded processing chip and density statistical method according to the video data obtained by the sensor, counting the pedestrian distribution of the specified area and calculating the density (unit of each point after discrete area of the area: the density data is uploaded to the management control terminal 1 through the communication module [Pg 5, P-7]
the sensor of the conflict monitoring terminal 203 is high video camera, embedded processing chip and conflict calculation algorithm according to the video data obtained by the sensor, counting the appointed area pedestrian speed direction and size change rate, relative direction pedestrian number, local density and other parameters, and according to the direction and size change rate of the pedestrian speed in the specified area, relative direction pedestrian number, local density parameter calculating conflict coefficient (dimensionless parameter), then the conflict coefficient is uploaded to the management control terminal 1 through the communication module[Pg 5, P-8]
the sensor of the abnormal monitoring terminal 204 is high video camera, embedded processing chip and the abnormal detection algorithm according to the video data obtained by the sensor, detecting whether there is abnormal behaviour in the specified area, such as a fight, terrorist attack, fast escape, crowding, then the type code of the abnormal behaviour is uploaded to the management control terminal 1 through the communication module. [Pg 5, P-9]
Further, the guide terminal 3 comprises an acoustic guide terminal 301 for conveying sound instruction, for area division, route guidance and information display of the optical guide terminal 302, for issuing the mobile phone message of the information issuing terminal 304. The area separation terminal 303 for the dynamic adjustment of the pedestrian and vehicular traffic area and the guide terminal 305 for order maintenance and manual assistance 305. [Pg 5, P-10]

Huang further teaches using, by the processor, the second model to predict, from the predicted congestion-related information, the future congestion level (Page 3, Paragraphs 3, 4, 9; Page 3, Paragraph 9-Page 4, Paragraph 1; Page 6, Paragraphs 2, 3, 7, 8):
Preferably, the management control terminal comprises a region modeling module, a distribution estimation module connected with the region modeling module signal, a distribution prediction module connected with the distribution estimation module signal, and a risk evaluation module connected with the distribution prediction module and the distribution estimation module signal, a simulation and optimization module connected with the risk evaluation module signal and a guide control module connected with the simulation and optimization module signal. [Pg 3, P-3]
Preferably, the simulation and optimization module comprises a personnel evacuation model and personnel flow model, when the risk assessment result of the risk assessment module is low comfort level personnel flow module, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of simulating pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and flow inducing scheme [Pg 3, P-4]
the distribution prediction module according to the history data of the distribution estimation result, establishing model based on machine learning algorithm to predict the future period of pedestrian distribution parameter, and uploading data continuous learning model parameter according to the future monitoring terminal.[Pg 3, P-9]
Preferably, the risk evaluation module signal is connected with a conflict monitoring terminal and an abnormal monitoring terminal, and according to the distribution prediction module, the output result of the distribution estimation module and conflict monitoring terminal, the uploading data of the abnormal monitoring terminal evaluates the possibility of the danger of congestion and trampling on the comfort level region. [Pg 3, P-9-Pg 4, P-1]
Further, the management control terminal 1 comprises a region modeling module 101, and the region modeling module 101 signal connection of the distribution estimation module 102, and the distribution prediction module 102 signal connection distribution prediction module 103. a risk evaluation module 104 connected with the distribution prediction module 103 and the distribution estimation module 102, a simulation and optimization module 105 connected with the risk evaluation module 104 signal and a guide control module 106 connected with the simulation and optimization module 105 signal.[Pg 6, P-2]
Further, the simulation and optimization module 105 comprises a personnel evacuation model and personnel flow model, when the risk evaluation result of the risk evaluation module 104 is low comfort level personnel flow module, management control terminal 1 sends attention area low comfort level alarm, according to pedestrian flow parameter, the function of different area node and the lowest comfort level of the simulated pedestrian flow process, outputting the predicted pedestrian flow parameter and distribution parameter of each node in the future period of time, and a flow inducing scheme, after the management staff 4 of the auditing by the guide control module 106 sends the inducing information to the guide terminal 3;[Pg 6, P-3]
the distribution prediction module 103 according to the history data of the distribution estimation result, based on the machine learning algorithm establishing model prediction future period of pedestrian distribution parameter, and according to the future monitoring terminal 2 uploading data continuous learning model parameter.[Pg 6, P-7]
Further, the risk evaluation module 104 signal connected with the conflict monitoring terminal 203 and the abnormal monitoring terminal 204, and according to the distribution prediction module 103, distribution estimation module 102 the output result and conflict monitoring terminal 203. The uploading data of the abnormal monitoring terminal 204 evaluates the possibility of the comfort level of congestion and trampling on the region of interest and the occurrence of congestion and trample.[Pg 6, P-8]
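Huang’s distribution prediction module, as quoted above, builds a model from the history of distribution estimates and continues learning its parameters from data later uploaded by the monitoring terminals. The sketch below illustrates that kind of continuous learning with a one-step forecaster for a single area’s pedestrian density. The quoted passages do not identify the machine learning algorithm, so the autoregressive form, the incremental least-squares update, and the class name are hypothetical assumptions.

```python
from typing import Optional


class OnlineDistributionForecaster:
    """Hypothetical one-step forecaster y[t+1] ~ a*y[t] + b for one area's
    pedestrian density, refit by incremental least squares as readings arrive."""

    def __init__(self) -> None:
        self.n = 0  # number of (y[t], y[t+1]) pairs seen so far
        self.sx = self.sy = self.sxx = self.sxy = 0.0
        self.last: Optional[float] = None

    def update(self, density: float) -> None:
        """Fold a newly uploaded reading into the running sums."""
        if self.last is not None:
            x, y = self.last, density
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y
        self.last = density

    def forecast(self) -> Optional[float]:
        """Predict next-period density; fall back to the last reading if underdetermined."""
        if self.n < 2:
            return self.last
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            return self.last
        a = (self.n * self.sxy - self.sx * self.sy) / denom
        b = (self.sy - a * self.sx) / self.n
        return a * self.last + b
```

Keeping only running sums lets the parameters be refit after every upload without storing the full history, which is one simple way to realize "continuous learning" of model parameters.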


Here, Huang teaches acquiring data from one or more sensors at a specific area of interest (station) and using a module synonymous with the first model to predict, from the sensor data, congestion information relating to the one or more sensors that detect the congestion. Furthermore, Huang uses a module for capturing and evaluating congestion data, synonymous with the second prediction model (deep learning model), to analyze time-series data in more depth and to predict more accurate parameters affecting future traffic congestion.
Huang then teaches route guidance and information display by the optical guide terminal 302, and issuing mobile phone messages by the information issuing terminal 304, thereby displaying congestion result data as well as predicted congestion data.
When Huang is taken in combination with Guo’s teaching of a learning server that regenerates a traffic model (a second model) from previously generated model(s), using previous traffic-related data together with real-time traffic-related data, one of ordinary skill in the art may then be able to determine the future congestion level of any of the specific (m) areas of the various stations and thereafter display, by the processor, the predicted congestion-related information to users at the station.
It would have been obvious to one of ordinary skill in the art to combine Guo’s teaching with Huang’s teaching in order to provide a more accurate prediction model for determining congestion/traffic activities and behaviors within a given area.

Claims 2 and 4 are rejected under 35 U.S.C. 103 as being unpatentable over Guo et al. (US 20220414450 A1) in view of Huang (CN 112053550B), as applied above to claims 1 and 3, and in further view of Downs et al. (US 20070208492 A1).

Regarding claim 2, Guo as modified fails to explicitly teach that the past congestion-area data indicates the congestion-related information of respective time points, and that the past sensor data indicates the values of the respective time points.
Downs, on the other hand, teaches that past congestion-area data indicates the congestion-related information of respective time points, and that the past sensor data indicates the values of the respective time points (Paragraph 89):
Various embodiments may further utilize various input information and provide various output information for the predictive models used to make future traffic conditions predictions. In some embodiments, inputs to the predictive models related to date and time information include the following variables: Marketid (an identifier for a geographic region); DateTimeUtc (the time of day in Universal Time); DateTimeLocal (the time of day in local time); DateTimeKey, DateDayOfWeekLocal (the day of the week); DateMonthLocal (the month of the year); DateDayLocal; DateHourLocal (the hour of the day); DatePeriod15MinutesLocal (the 15 minute interval of the day); and HolidayLocal (whether the day is a holiday). In some embodiments, inputs to the predictive models related to current and past traffic conditions information include the following variables: RoadSegmentId (an identifier for a particular road segment); SpeedX (the current reported speed of traffic on road segment X); BlackStartLocalX (the length of time that black traffic congestion level conditions have been reported for road segment X); PercentBlackX (the percentage of sensors or other data sources associated with road segment X that are reporting black traffic congestion level conditions); PercentBlackX-N, where X is a particular road segment and N is a member of {15, 30, 45, 60} and where the value corresponds to the percentage of a road segment X (e.g., percent of sensors associated with the road segment) for which black traffic conditions were reported N minutes ago; RawColorX (the current color corresponding to a level of traffic congestion on road segment X); RawColorX-N, where X is a particular road segment and N is a member of {15, 30, 45, 60}, and where the value is a color corresponding to a level of traffic congestion on road segment X N minutes ago; SinceBlackX (the length of time since black traffic congestion levels have been reported for road segment X); HealthX; and AbnormalityX. 
In some embodiments, inputs to the predictive models related to weather conditions information include the following variables: Temperature (current temperature); WindDirection (current wind direction); WindSpeed (current wind speed); SkyCover (current level of cloud or haze); PresentWeather (current weather state); and RainNHour, where N is a member of {1, 3, 6, 24} and represents precipitation accumulation in the previous N hour(s); and Metarld. In some embodiments, inputs to the predictive models related to event and school schedules information include the following variables: EventVenueId (a venue identifier); EventScheduleId (a schedule identifier); DateDayLocal (the day of a given event); StartHourLocal (the start hour of a given event); EventTypeId (an event type identifier); EventVenueId (a venue identifier); SchoolLocationId (a school location identifier); and IsSchoolDay (whether or not the current day is a school day).[P-89]
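To make concrete how Downs’ inputs pair congestion data with the time points at which it was reported, the sketch below assembles one feature row for a road segment, including the lagged PercentBlackX-N values for N in {15, 30, 45, 60}. Downs lists the variable names but not how they are computed, so the dictionary layout, the 15-minute timestamp grid, the weekday encoding, and the helper name `build_features` are hypothetical assumptions.

```python
from datetime import datetime, timedelta
from typing import Dict, Mapping

LAGS_MINUTES = (15, 30, 45, 60)  # N in {15, 30, 45, 60}, as listed in Downs P-89


def build_features(segment: str, now: datetime,
                   percent_black: Mapping[datetime, float]) -> Dict[str, float]:
    """Assemble a feature row for one road segment at time `now`.

    percent_black maps timestamps on a 15-minute grid to the percentage of the
    segment's sensors reporting black congestion at that time (assumed layout).
    """
    row: Dict[str, float] = {
        f"PercentBlack{segment}": percent_black[now],
        "DateHourLocal": now.hour,
        "DatePeriod15MinutesLocal": (now.hour * 60 + now.minute) // 15,
        "DateDayOfWeekLocal": now.weekday(),  # Monday = 0 (assumed encoding)
    }
    # Lagged congestion values: PercentBlackX-N, as reported N minutes ago.
    for n in LAGS_MINUTES:
        row[f"PercentBlack{segment}-{n}"] = percent_black[now - timedelta(minutes=n)]
    return row
```

Each lagged entry ties a past congestion value to its own time point, which is the property the rejection of claims 2 and 4 relies on.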
Here, Downs teaches traffic prediction and route planning using a traffic prediction model, in which past congestion-related information of respective time points, and past sensor data indicating the values of the respective time points, are taken into account, among other parameters, for predictive purposes.
Therefore, it would have been obvious to one of ordinary skill in the art to combine the teaching of Downs with the teaching of Guo as modified in order to provide a more effective method for evaluating sensor data more accurately for the purpose of traffic prediction and route planning.

Regarding claim 4, Guo as modified fails to teach that the past congestion-area data indicates the congestion-related information of respective time points, and that the past sensor data indicates the values of the respective time points.
Downs, on the other hand, teaches that past congestion-area data indicates the congestion-related information of respective time points, and that the past sensor data indicates the values of the respective time points (Paragraph 89):
Various embodiments may further utilize various input information and provide various output information for the predictive models used to make future traffic conditions predictions. In some embodiments, inputs to the predictive models related to date and time information include the following variables: Marketid (an identifier for a geographic region); DateTimeUtc (the time of day in Universal Time); DateTimeLocal (the time of day in local time); DateTimeKey, DateDayOfWeekLocal (the day of the week); DateMonthLocal (the month of the year); DateDayLocal; DateHourLocal (the hour of the day); DatePeriod15MinutesLocal (the 15 minute interval of the day); and HolidayLocal (whether the day is a holiday). In some embodiments, inputs to the predictive models related to current and past traffic conditions information include the following variables: RoadSegmentId (an identifier for a particular road segment); SpeedX (the current reported speed of traffic on road segment X); BlackStartLocalX (the length of time that black traffic congestion level conditions have been reported for road segment X); PercentBlackX (the percentage of sensors or other data sources associated with road segment X that are reporting black traffic congestion level conditions); PercentBlackX-N, where X is a particular road segment and N is a member of {15, 30, 45, 60} and where the value corresponds to the percentage of a road segment X (e.g., percent of sensors associated with the road segment) for which black traffic conditions were reported N minutes ago; RawColorX (the current color corresponding to a level of traffic congestion on road segment X); RawColorX-N, where X is a particular road segment and N is a member of {15, 30, 45, 60}, and where the value is a color corresponding to a level of traffic congestion on road segment X N minutes ago; SinceBlackX (the length of time since black traffic congestion levels have been reported for road segment X); HealthX; and AbnormalityX. 
In some embodiments, inputs to the predictive models related to weather conditions information include the following variables: Temperature (current temperature); WindDirection (current wind direction); WindSpeed (current wind speed); SkyCover (current level of cloud or haze); PresentWeather (current weather state); and RainNHour, where N is a member of {1, 3, 6, 24} and represents precipitation accumulation in the previous N hour(s); and Metarld. In some embodiments, inputs to the predictive models related to event and school schedules information include the following variables: EventVenueId (a venue identifier); EventScheduleId (a schedule identifier); DateDayLocal (the day of a given event); StartHourLocal (the start hour of a given event); EventTypeId (an event type identifier); EventVenueId (a venue identifier); SchoolLocationId (a school location identifier); and IsSchoolDay (whether or not the current day is a school day).[P-89]
Here, Downs teaches traffic prediction and route planning using a traffic prediction model, in which past congestion-related information of respective time points, and past sensor data indicating the values of the respective time points, are taken into account, among other parameters, for predictive purposes.
Therefore, it would have been obvious to one of ordinary skill in the art to combine the teaching of Downs with the teaching of Guo as modified in order to provide a more effective method for evaluating sensor data more accurately for the purpose of traffic prediction and route planning.

Response to Arguments
The examiner acknowledges the amendments to the independent claims that further narrow the limitations, and has addressed them above under new grounds of rejection. Furthermore, the examiner encourages the applicant to contact the examiner regarding recommendations for overcoming the § 101 rejection.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANTHONY D AFRIFA-KYEI whose telephone number is (571)270-7826. The examiner can normally be reached Monday-Friday 10am-7pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, BRIAN ZIMMERMAN, can be reached at 571-272-3059. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/ANTHONY D AFRIFA-KYEI/Examiner, Art Unit 2686

/BRIAN A ZIMMERMAN/Supervisory Patent Examiner, Art Unit 2686
Prosecution Timeline

Apr 15, 2024: Application Filed
Nov 03, 2025: Non-Final Rejection mailed (§101, §103)
Dec 30, 2025: Interview Requested
Jan 16, 2026: Applicant Interview (Telephonic)
Jan 16, 2026: Examiner Interview Summary
Jan 29, 2026: Response Filed
Apr 14, 2026: Final Rejection mailed (§101, §103)
May 19, 2026: Interview Requested
Precedent Cases

Applications granted by this same examiner with similar technology

18/405,480 (Patent 12638301): DATABASE GENERATION METHOD, DATABASE GENERATION DEVICE, NON-TRANSITORY COMPUTER READABLE STORAGE MEDIUM, DATA ANALYSIS METHOD, AND DATA ANALYSIS DEVICE. Granted May 26, 2026 (2y 4m to grant).
17/710,558 (Patent 12616393): WEARABLE DEVICE WITH STATUS DISPLAY. Granted May 05, 2026 (4y 1m to grant).
18/757,748 (Patent 12614451): LIGHT EMITTING DIODES AND DIODE ARRAYS FOR SMART RING VISUAL OUTPUT. Granted Apr 28, 2026 (1y 10m to grant).
18/509,740 (Patent 12605977): DEVICE, SERVER, AND METHOD FOR CONTROLLING VEHICLE. Granted Apr 21, 2026 (2y 5m to grant).
17/927,407 (Patent 12582360): MEANS TO ACCURATELY PREDICT, ALARM AND HENCE AVOID SPORT INJURIES AND METHODS THEREOF. Granted Mar 24, 2026 (3y 4m to grant).
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 65%
With Interview (+13.6%): 78%
Median Time to Grant: 2y 11m (~10m remaining)
PTA Risk: Moderate
Based on 549 resolved cases by this examiner. Grant probability derived from career allowance rate.