Microscopic Estimation of Freeway Vehicle Positions From the Behavior of Connected Vehicles

Given the current connected vehicles program in the United States, as well as other similar initiatives in vehicular networking, it is highly likely that vehicles will soon wirelessly transmit status data, such as speed and position, to nearby vehicles and infrastructure. This will drastically impact the way traffic is managed, allowing for more responsive traffic signals, better traffic information, and more accurate travel time prediction. Research suggests that to begin experiencing these benefits, at least 20% of vehicles must communicate, with benefits increasing with higher penetration rates. Because of bandwidth limitations and a possible slow deployment of the technology, only a portion of vehicles on the roadway will participate initially. Fortunately, the behavior of these communicating vehicles may be used to estimate the locations of nearby noncommunicating vehicles, thereby artificially augmenting the penetration rate and producing greater benefits. We propose an algorithm to predict the locations of individual noncommunicating vehicles based on the behaviors of nearby communicating vehicles by comparing a communicating vehicle's acceleration with its expected acceleration as predicted by a car-following model. Based on analysis from field data, the algorithm is able to predict the locations of 30% of vehicles with 9-m accuracy in the same lane, with only 10% of vehicles communicating. Similar improvements were found at other initial penetration rates of less than 80%. Because the algorithm relies on vehicle interactions, estimates were accurate only during or downstream of congestion. The proposed algorithm was merged with an existing ramp metering algorithm and was able to significantly improve its performance at low connected vehicle penetration rates and maintain performance at high penetration rates.


INTRODUCTION
In 2010, American drivers wasted 4.8 billion hours and 1.9 billion gallons of fuel due to traffic congestion (Schrank, Lomax, & Eisele, 2011). Several strategies have been proposed to reduce congestion, including dynamic signal timing, ramp metering, and dynamic speed limits. These strategies often rely on historical data supplemented with real-time point detection such as in-pavement loop or video detectors. Because point detection cannot cover the entire roadway, these data are often aggregated over time and space. The high installation and maintenance costs of point detection also prevent wide-scale deployment. Even when detectors are deployed, they are often spread far apart and the conditions between detectors must be estimated.
Most modern vehicles are heavily instrumented, and at a minimum are aware of their locations via global positioning systems (GPS), and their headings and speeds through other onboard sensors. Research suggests GPS may achieve lane-level accuracy in the near future (Du & Barth, 2008; He & Head, 2010), with accuracies of 1.5 m 95% of the time (Popovic & Bai, 2011). This information can then be wirelessly transmitted to other vehicles or roadside listening devices. In the United States, the communication between vehicles and the roadside infrastructure is often referred to as "connected vehicles" (CV).
As methods for collecting mobile vehicle data are defined and implemented, researchers have proposed mobility applications that utilize these new mobile data to improve traffic flow and reduce congestion. Several new algorithms have been developed that, rather than estimate vehicle trajectories from loop detectors or historical data, rely on the locations and speeds of individual vehicles. A ramp metering scheme that is based on detecting platoons of vehicles in the mainline rather than aggregated density measurements is an example of an application that requires the locations of individual vehicles (Park, 2008). Many of these applications function most effectively when all or the majority of vehicles are equipped with sensing technology (Smith et al., 2010a). Simply put, these applications are not designed for detector or historical data only, but instead require knowledge of individual vehicle locations. Some proposed applications include adaptive traffic signal control (Datesh, Scherer, & Smith, 2011; He, Head, & Ding, 2012; Lee & Park, 2012; Priemer & Friedrich, 2009; Smith et al., 2010b), ramp metering (Park, 2008), and dynamic gap-out (Agbolosu-Amison et al., 2012), with others in development.
46 N. J. GOODALL ET AL.
Table 1 (surviving fragments): 10-50%; Intersection analysis (sources 7, 8, 9), 10-30%.
Note. Sources: 1 Smith et al., 2010a; 2 He et al., 2012; 3 Priemer & Friedrich, 2009; 4 Barria & Thajchayapong, 2011; 5 Rim, Oh, Kang, & Kim, 2011; 6 Li, Zou, Bu, & Zhang, 2008; 7 Ban, Herring, Hao, & Bayen, 2009; 8 Ban et al., 2011; 9 Hao, Ban, Bennett, Ji, & Sun, 2012.
The deployment of mobile sensors among roadway users will not be instantaneous. Bandwidth shortages and battery life restrict the use of smart phones, and the John A. Volpe National Transportation Systems Center estimated that only 50% of vehicles will have connected vehicle communications capabilities 9 years after the program's initiation (John A. Volpe National Transportation Systems Center, 2008). In any scenario, there will likely be a transition period when only a portion of vehicles are equipped. The developers of mobility applications have been careful to study the effect of low connected vehicle penetration rates on the application's performance, testing their applications across a wide range of penetration rates. Vehicles that are not participating as connected vehicles are referred to as unequipped vehicles, and their locations are ignored in most applications. Unsurprisingly, mobility applications produce greater benefits with higher equipped vehicle penetration rates, and most require a minimum percentage of participating vehicles to experience any benefits at all. Table 1 shows a summary of the minimum required percentage of equipped vehicles found in several mobility applications.
We expect that with a reasonable approximation of the locations of unequipped vehicles, the performance of mobility applications that require individual vehicle locations will be significantly improved. We propose a method to estimate the individual locations of unequipped vehicles on a freeway by analyzing the behavior of connected vehicles. By estimating the locations of these unequipped vehicles, the "effective" penetration rate of equipped vehicles is increased, theoretically improving the performance of connected vehicle applications. To measure the algorithm's potential benefits, an existing connected vehicle application, ramp metering, is then tested both with and without the location estimation algorithm.

BACKGROUND
Predicting the locations of individual vehicles based on wirelessly transmitted data is a recent area of interest. Preliminary work focused on estimating freeway travel states rather than individual vehicle locations. The earliest work was based on vehicle location and travel time information as determined from cell tower signal triangulation (Bargera, 2007; Sanwal & Walrand, 1995; Westerman, Litjens, & Linnartz, 1996). Later work focused on the much more accurate, although sparsely collected, GPS data (Ferman, Blumenfeld, & Dai, 2005; Krause, Horvitz, Kansal, & Zhao, 2008). Mobile sensor data were eventually integrated with point detection data, and were used to estimate vehicle travel time by Nanthawichit et al. (2003). Herrera and Bayen (2010) used Kalman filtering techniques and Newtonian relaxation to integrate point detection and mobile sensors into a high-resolution traffic state estimation of a freeway. Their algorithms were evaluated using both empirical ground-truth freeway data (Federal Highway Administration, 2010) and actual in-vehicle cell phones with GPS receivers (Herrera et al., 2008). They were able to estimate vehicle densities of multilane 120-ft segments with a root mean square error (RMSE) of 1.78-2.44 vehicles depending on assumptions, but did not estimate individual vehicle locations.
One technique has been proposed to estimate vehicle locations based on their travel times through a section. Ban et al. (2011) used the reported travel times of a portion of vehicles traveling through an intersection to calculate their individual delays. This information was then used to determine the amount of time each vehicle was in the signal's queue. Using this information, Ban et al. could estimate the arrival rate at the intersection and, by assuming uniform flow rate and constant discharge rate, could estimate the total length of the queue with only 30% of vehicles reporting their locations.
Building on this technique, Sun and Ban (2011) attempted to estimate the precise trajectories of the unequipped vehicles that make up the total queue, whose length was estimated using the technique described earlier. However, their method assumed that the number of unequipped vehicles arriving between equipped vehicles is known, implying the use of an upstream detector. Later work was able to relax this assumption without a significant difference in performance. The behavior of these unequipped vehicles did not follow a car-following model, but instead showed immediate speed changes from free flow to stopped and back again. No techniques have yet been proposed to estimate individual vehicle locations based on the behavior of equipped vehicles beyond queuing analysis.

DESCRIPTION OF THE ALGORITHM
To develop a microscopic estimation of vehicle locations and speeds, the proposed algorithms determine when an equipped vehicle is behaving differently than would be expected based on the locations and speeds of vehicles directly ahead. This requires a definition of expected vehicle behavior. Car-following models, which attempt to predict the behavior of individual vehicles, are used here as an approximation of a vehicle's expected behavior. For this study, the Wiedemann model is used as the car-following model. The Wiedemann model is a psychophysical model that estimates the thresholds for a driver's decision to accelerate or decelerate based on drivers' perceptions of changes in relative velocity. The model uses four stages: free driving, following, approaching, and braking/emergency (Wiedemann, 1974). A vehicle's current stage is based on its change in headway and relative velocity to the leading vehicle. The Wiedemann model has gained acceptance as the basis for the microscopic simulation software VISSIM (Planung Transport Verkehr [PTV], 2011).
To avoid overfitting the model to the evaluation data set, the Wiedemann model as applied here uses the calibration parameters based on empirical freeway data as proposed by Wiedemann and Reiter (1992). Some model parameters were found in the original paper, while others were extrapolated from charts by Olstam and Tapani (2004). For the proposed algorithm, the Wiedemann model is also limited to in-lane car-following; lane-changing decisions are not modeled.
To estimate the locations of unequipped vehicles, first the locations, speeds, and accelerations of all vehicles at a given time t are used to populate a virtual roadway. The expected positions of all vehicles are updated to time t + 1 based on the Wiedemann model, considering all known equipped vehicles as well as the estimated positions of unequipped vehicles at time t. The positions of all vehicles, equipped and estimated, are checked against two criteria. First, if an estimated vehicle is found to overlap with an equipped vehicle, the estimated vehicle is erased from the rolling estimation. Second, if an equipped vehicle is found to have an acceleration that is less than expected by a predetermined threshold, it is assumed that the equipped vehicle is following the Wiedemann model and is reacting to a previously undetected unequipped vehicle, referred to here as the estimated vehicle. Refer to Figure 1 for a description of the algorithm's parameters.
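The two checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Vehicle` record, the function names, and the convention that `front` is the position of the front bumper are hypothetical, and the expected acceleration is taken as an input rather than computed from the Wiedemann model. The 0.2 g threshold is the value the article reports using in its tests.

```python
from dataclasses import dataclass

G = 9.81                      # gravitational acceleration, m/s^2
THRESHOLD = 0.2 * G           # 0.2 g, the value used in the article's tests


@dataclass
class Vehicle:
    """Hypothetical record; `front` is the front-bumper position in meters."""
    lane: int
    front: float
    speed: float
    accel: float
    length: float = 4.75


def overlaps(a: Vehicle, b: Vehicle) -> bool:
    """True when two vehicles in the same lane occupy overlapping road space."""
    if a.lane != b.lane:
        return False
    return (a.front - a.length) < b.front and (b.front - b.length) < a.front


def prune_estimates(equipped, estimated):
    """First criterion: drop any estimated vehicle that overlaps an equipped one."""
    return [e for e in estimated if not any(overlaps(e, c) for c in equipped)]


def should_insert_estimate(actual_accel, expected_accel, threshold=THRESHOLD):
    """Second criterion: actual acceleration falls short of the expected
    acceleration by at least the threshold (the >= comparison is an assumption)."""
    return (expected_accel - actual_accel) >= threshold
```

For example, an equipped vehicle holding a constant speed (zero acceleration) while the car-following model expects 2.0 m/s^2 would trigger an insertion, whereas an expected acceleration of 1.0 m/s^2 would not.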
The speed of the estimated vehicle can be predicted from an empirical relationship between lead vehicle speed, following vehicle speed, and following vehicle acceleration. The relationship is defined in Eq. 1.
Figure 1. Parameters used in the location estimation algorithm.
In Eq. 1, $v_{n-1}$ is the speed of the estimated lead vehicle, $v_n$ and $b_n$ are the speed and acceleration of the equipped following vehicle, and $\lambda$ is a calibration factor. Using linear regression on the Next-Generation Simulation data set of empirical vehicle trajectories described in the Evaluation section, $\lambda = 0.162$ when using m/s and m/s$^2$ for speed and acceleration respectively, with $R^2 = .831$. This value was used in all evaluations, regardless of network.
In the Wiedemann model, there are three regimes where a following vehicle reacts to a lead vehicle: following, closing, and emergency. The equipped vehicles were assumed to be in the closing regime if traveling faster than the estimated vehicle, and in the following regime if traveling slower than the lead vehicle. Therefore, if $v_{n-1} < v_n$, then the closing regime was used to determine the unequipped vehicle's position. The formula for the acceleration of vehicle $n$ is defined in Eq. 2. In Eq. 2, ABX is the desired minimum space headway at low speed differences, $\Delta v$ is the difference in speed between vehicle $n$ (the equipped following vehicle) and the lead vehicle $n-1$, $\Delta x$ is the space headway between vehicle $n$ and vehicle $n-1$, and $b_{n-1}$ is the acceleration of the lead vehicle $n-1$. A more detailed description of the model can be found in the literature (Olstam & Tapani, 2004; Wiedemann & Reiter, 1992).
Because the actual acceleration $b_n$ of vehicle $n$ is known, Eq. 2 can be rearranged to determine the space headway, thereby predicting the location of the leading vehicle $n-1$. The lead vehicle's speed was estimated from Eq. 1, and $\Delta v$ can thereby be determined from Eq. 3. The leading vehicle is assumed to have an acceleration of zero and a standard vehicle length of 4.75 m. Equation 4 demonstrates this rearrangement of Wiedemann and Reiter's closing acceleration equation with the new assumptions.
If the lead vehicle's speed is estimated to be greater than or equal to the following vehicle's speed, that is, $a_n \geq 0$, then the following vehicle is assumed to be in the following regime. The space headway is simply the desired minimum space headway ABX as defined in Eq. 5.
intelligent transportation systems vol. 20 no. 1 2016
By estimating the unequipped vehicle's speed using Eq. 1, and assuming its acceleration is zero, the headway between the two vehicles is quickly calculated, and the lead vehicle is inserted at the appropriate location and speed. The new vehicle's location can be found using Eq. 6.
The new vehicle is an estimate of the position of an unequipped vehicle. The estimated vehicle is inserted into the rolling estimation of the traffic network, and continues to move forward and interact with other equipped and unequipped vehicles according to the Wiedemann car-following model, although it never changes lanes. The estimated vehicle is removed from the simulation when an equipped vehicle no longer reacts to it and overlaps its position.
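The headway calculation can be illustrated with a short Python sketch. The equations themselves are not reproduced in this text, so the sketch assumes the closing-regime form commonly attributed to Wiedemann in Olstam and Tapani (2004), $b_n = \tfrac{1}{2}(\Delta v)^2 / (ABX - (\Delta x - L_{n-1})) + b_{n-1}$, where $L_{n-1}$ is the lead vehicle length. That form, the function names, and the example ABX value are assumptions for illustration and may differ from the authors' Eqs. 2 through 6.

```python
VEHICLE_LENGTH_M = 4.75  # standard lead-vehicle length stated in the text


def closing_headway(abx, dv, b_n, b_lead=0.0, lead_length=VEHICLE_LENGTH_M):
    """Space headway (m) obtained by solving the ASSUMED closing equation
        b_n = 0.5 * dv**2 / (abx - (dx - lead_length)) + b_lead
    for dx. The lead acceleration defaults to zero, as the article assumes.
    """
    if b_n >= b_lead:
        raise ValueError("closing regime needs b_n below the lead acceleration")
    return abx + lead_length - 0.5 * dv ** 2 / (b_n - b_lead)


def following_headway(abx):
    """Following regime: the text sets the space headway equal to ABX."""
    return abx


def lead_position(follower_position, headway):
    """Place the estimated lead vehicle one headway ahead of the follower
    (a plain reading of the insertion step, not a quoted equation)."""
    return follower_position + headway


# Hypothetical numbers: ABX = 10 m, closing speed 4 m/s, follower braking at 2 m/s^2.
dx = closing_headway(abx=10.0, dv=4.0, b_n=-2.0)
x_lead = lead_position(follower_position=250.0, headway=dx)
```

Substituting the resulting headway back into the assumed closing equation returns the follower's observed deceleration, which is a quick way to check the rearrangement.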
The acceleration difference threshold that initiates an estimated vehicle insertion is a critical value in the analysis, and for all testing in this article, the value was set to 0.2 g (1.96 m/s$^2$). This value was chosen based on the threshold of 0.5 g for determining a potential incident in naturalistic driving studies (Dingus et al., 2006). Also, the car-following model used here has a maximum acceleration of 0.36 g (3.5 m/s$^2$) when a vehicle is at standstill, with this maximum rate of acceleration inversely related to the vehicle's speed. Therefore, any equipped vehicle traveling at less than 17 m/s with no acceleration, and that is predicted to have the maximum acceleration, will insert a single vehicle estimate. In this way, the algorithm is effective at low-speed synchronized traffic flow and queuing, even when vehicles are not decelerating.
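The speed limit quoted above can be checked with a small calculation. The text gives only the standstill value (3.5 m/s^2) and says the maximum acceleration falls as speed rises; the sketch below assumes a linear decline that reaches zero at 40 m/s. Both the linear shape and the 40 m/s figure are assumptions made for illustration, not values taken from the article.

```python
def max_acceleration(speed, a0=3.5, v_zero=40.0):
    """ASSUMED linear decline from a0 at standstill to zero at v_zero (m/s)."""
    return max(0.0, a0 * (1.0 - speed / v_zero))


def speed_where_max_accel_equals(target, a0=3.5, v_zero=40.0):
    """Speed at which the assumed maximum acceleration equals `target`."""
    return v_zero * (1.0 - target / a0)


# Below this speed, a vehicle with zero actual acceleration and an expected
# acceleration equal to the maximum falls short by at least 0.2 g (1.96 m/s^2).
critical_speed = speed_where_max_accel_equals(1.96)
```

Under these assumptions the critical speed comes out near 17.6 m/s, which is roughly in line with the 17 m/s figure given in the text.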
Finally, the algorithm assumes that equipped vehicles report their lane, location, speed (which if not reported directly can be determined from the difference in location since the last transmission), and instantaneous acceleration. The vehicles are assumed to report once per second, as this allows for the periodic message drops expected in a 10-Hz transmission rate in a connected vehicle environment (Society of Automotive Engineers, 2009).

EVALUATION
The proposed algorithm was tested using vehicle trajectories from the Next-Generation Simulation (NGSIM) project, a field-collected data set of vehicle movements along several corridors in the United States (Federal Highway Administration, 2010). Vehicle movements were collected from video recordings, and then extracted via specialized software. This study used the data set collected from a 500-m section of Interstate 80 in Emeryville, CA, between 5:00 p.m. and 5:30 p.m. on April 13, 2005. The roadway has five lanes in the northbound direction, along with one on-ramp and a weave area. The activity on the ramps and merge lanes is not analyzed, only the behavior of vehicles traveling in the through lanes. Figure 2 shows  vehicle ten times per second, they are considered ground-truth data.
A portion of the vehicles in the data set were randomly assigned as connected vehicles. Their movements were pulled from the full data set, and all other vehicle records were removed from the evaluation set.
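A minimal sketch of this sampling step is shown below. The record layout (a list of dictionaries with a `vehicle_id` key), the function names, and the choice to draw an exact fraction of vehicles rather than flip a coin per vehicle are all assumptions; the article does not describe how the random assignment was implemented.

```python
import random


def assign_connected(vehicle_ids, penetration_rate, seed=None):
    """Randomly pick a fixed share of the distinct vehicle IDs as connected."""
    rng = random.Random(seed)          # seeded for repeatable experiments
    ids = sorted(set(vehicle_ids))
    count = round(penetration_rate * len(ids))
    return set(rng.sample(ids, count))


def keep_connected_records(records, connected_ids):
    """Keep only trajectory records that belong to connected vehicles."""
    return [r for r in records if r["vehicle_id"] in connected_ids]


# Hypothetical trajectory records: 100 vehicles, two time steps each.
records = [{"vehicle_id": v, "t": t} for v in range(100) for t in (0, 1)]
connected = assign_connected((r["vehicle_id"] for r in records), 0.25, seed=1)
evaluation_set = keep_connected_records(records, connected)
```

Sampling by vehicle ID rather than by record keeps each connected vehicle's full trajectory intact, which the algorithm needs in order to compare successive accelerations.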
Because the location estimation algorithm requires vehicle interactions to produce any estimates, it requires not only warmup time, but also warmup space. Figure 3 shows estimated vehicle densities for the final 220 m of the freeway segment after a 60-second initialization at various penetration rates. The left side shows only equipped vehicles, while the right shows the densities of both equipped vehicles and estimated vehicle locations. Densities are much more accurate downstream of congestion (vehicles travel bottom to top in the figures), as seen at 25% penetration rates. The algorithm occasionally misses traffic phenomena at low penetration rates, such as the wide moving jam between 5:18 and 5:20, which only becomes visible at 50% penetration rate using the location estimation algorithm. At higher penetration rates, because there are so few unequipped vehicles in the network, the algorithm often overestimates densities, as seen at 100% penetration rate and, to a lesser extent, at 70% penetration. In spite of its shortcomings, the algorithm provides a remarkable improvement over equipped vehicles alone at low and mid penetration rates near congestion.
The characteristics of the location estimation algorithm can be demonstrated by analyzing vehicle trajectories. Figure 4 shows the trajectories of equipped, unequipped, and estimated unequipped vehicles over a small portion of the I-80 data set at a 25% penetration rate. In the figure, estimated vehicles are often initially placed near an unequipped vehicle. However, because the locations of all unequipped vehicles have not been estimated, there is often little traffic nearby with which to interact. As a result, the vehicle accelerates to free-flow speed until it encounters an estimated or equipped vehicle, essentially "driving" itself into position. Estimated vehicles are continually inserted behind the original equipped vehicle, and continue to drive themselves into place. This occurs several times in Figure 4 between 1140 and 1160 seconds. As a result, vehicle speeds as estimated by the algorithm are often higher than in actuality; for example, at 25% penetration rate the average speed of estimated vehicles was 27% higher than speeds of unequipped vehicles.
Figure caption (fragment): Accuracy is improved at low penetration rates, but densities are often overestimated at high penetration rates.
Measuring the performance of the algorithm is a challenge. Normally the difference between observed and estimated values can be measured and averaged, which requires a one-to-one relationship between estimates and observations. The location estimation algorithm, by contrast, often has a different number of estimates than observations. To provide some understanding of the algorithm's performance, we introduce a new metric called the effective penetration rate, $PR_{eff}$. To calculate effective penetration rate, the individual estimates and observations within the same lane and time are sorted into exclusive pairs based on nearest neighbor. Only one observation for any time and lane may be matched with a single estimation at the same time and lane, and vice versa. The one-dimensional distances between these two values are recorded as the minimum absolute distance between any single estimated location and observed location. This procedure is defined in the iterative process, performed from $n = 1$ to $\#O_{t,l}$, as described in Eqs. 7 through 11. $O$ is the set of all observations (locations of unequipped vehicles) and $X$ is the set of errors for each vehicle location estimation in the set $E$ for each lane $l$ in the set of all lanes $L$ and each time interval $t$ in the set $T$. In each iteration $n$, variables $i$ and $j$ represent the observation and estimation, respectively, with the smallest relative error in the set, as shown in Eq. 7. These individual records are removed from the set in Eqs. 8 through 11, using sets $A$ and $B$ as placeholders.
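The pairing procedure for a single lane and time step can be sketched as a greedy nearest-neighbor match. This is an interpretation of the prose description rather than a transcription of Eqs. 7 through 11; the function name is hypothetical, and stopping as soon as either set is exhausted is an assumption about how unmatched records are handled.

```python
def pair_errors(observations, estimates):
    """Greedy exclusive matching of observed and estimated positions (m)
    for one lane and one time step. Returns the matched absolute errors."""
    obs = list(observations)
    est = list(estimates)
    errors = []
    while obs and est:
        # Find the observation/estimate pair with the smallest distance.
        i, j = min(
            ((i, j) for i in range(len(obs)) for j in range(len(est))),
            key=lambda ij: abs(obs[ij[0]] - est[ij[1]]),
        )
        errors.append(abs(obs[i] - est[j]))
        # Remove both records so each is matched at most once.
        obs.pop(i)
        est.pop(j)
    return errors


# Two unequipped vehicles observed, three positions estimated.
matched = pair_errors([10.0, 50.0], [12.0, 49.0, 200.0])
```

In this example the closest pair (50 m and 49 m) is matched first, then 10 m with 12 m, and the estimate at 200 m is left unmatched.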
To calculate effective penetration rate, each location estimation error in $X$ that is less than or equal to the threshold $\rho$ is classified as an accurate measurement, and each greater than the threshold is classified as inaccurate. The metric essentially defines all equipped vehicles as accurate measurements, and furthermore ensures that any inaccurate measurement cancels out an accurate measurement. The effective penetration rate for a given accuracy $\rho$ is defined in Eq. 12. In Eq. 12, $S$ is the set of all equipped vehicles, $E$ is the set of all estimated positions, $O$ is the set of all unequipped positions, $X$ is the set of all location errors, and $\rho$ is the acceptable distance error threshold. Table 2 shows the effective penetration rates of the I-80 data at various accuracy levels and actual penetration rates. At very high accuracy levels of 3 m or less, the algorithm generates more inaccurate than accurate estimates. Additionally, at actual penetration rates above 80%, there are few unequipped vehicles to detect and therefore the algorithm performs poorly. The algorithm is most effective at penetration rates of 70% or less, with minimum accuracy rates of 5-10 m.
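One possible reading of the metric is sketched below. Eq. 12 is not reproduced in this text, so the formula used here is an assumption built from the prose: equipped vehicles count as accurate, each inaccurate estimate cancels one accurate one, and the result is divided by the total number of vehicle positions (equipped plus unequipped). The function name and the treatment of unmatched estimates (they are simply not counted) are likewise assumptions and may differ from the authors' definition.

```python
def effective_penetration_rate(n_equipped, n_unequipped, errors, rho):
    """ASSUMED form: (equipped + accurate - inaccurate) / (equipped + unequipped).

    errors: matched location errors in meters; rho: accuracy threshold in meters.
    """
    total = n_equipped + n_unequipped
    if total == 0:
        raise ValueError("no vehicle positions to evaluate")
    accurate = sum(1 for x in errors if x <= rho)
    inaccurate = len(errors) - accurate
    return (n_equipped + accurate - inaccurate) / total


# Hypothetical counts: 10 equipped and 90 unequipped positions, four matched
# estimates with errors of 1, 2, 3, and 20 m, and a 9-m accuracy threshold.
pr_eff = effective_penetration_rate(10, 90, [1.0, 2.0, 3.0, 20.0], rho=9.0)
```

With no estimates at all, this form reduces to the actual penetration rate, and a batch of mostly inaccurate estimates pulls the value below it, which matches the behavior the text describes for very strict accuracy levels.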

Ramp Metering Application
Used alone, the location estimation algorithm is useful for detecting highway conditions and providing an estimate of densities in low- or no-detection segments. However, by providing estimates of individual vehicle locations, the algorithm should also be able to improve the performance of some connected vehicle applications at low penetration rates. To test this theory, the location estimation algorithm was applied to a connected vehicle ramp metering algorithm called the GAP algorithm (Park, 2008).

Description of the GAP Algorithm
The GAP algorithm analyzes the speeds, accelerations, and locations of mainline vehicles upstream of a freeway on-ramp to predict future gaps in the right-most lane at the merge area in the near future. On-ramp vehicles are queued at a ramp signal until a gap is predicted on the mainline, at which point one or several vehicles are given a green signal and released onto the mainline. Although traditional ramp metering algorithms release vehicles at a fixed rate over a set time period, the GAP algorithm releases vehicles at irregular rates based on the prediction of gaps in mainline traffic.
The GAP algorithm relies on two calculations: the position of the on-ramp vehicle, and the positions of the mainline vehicles. The on-ramp vehicle is expected to follow the Wiedemann car-following model. An on-ramp vehicle, upon receiving a green phase, is expected to hold its current speed for one second as the driver reacts to the signal, and then to accelerate at a rate described in Eq. 13. In Eq. 13, $a_t$ is acceleration in meters per second per second, and $v_{t-1}$ is speed in meters per second at the previous time step. The GAP algorithm then predicts the positions of vehicles on the mainline in the right lane nearest the on-ramp. Vehicle positions are predicted using the fundamental equation as shown in Eq. 14. In Eq. 14, $x_t$ and $x_0$ are the vehicle positions at time $t$ and the initial time of measurement, respectively. In the GAP algorithm, at each time step the vehicle on the on-ramp next in line at the meter has its position projected several time steps into the future. The positions of vehicles in the mainline right lane are also projected. If there exists a gap at the on-ramp vehicle's position before the on-ramp vehicle is projected to reach the end of the merge area, then the on-ramp vehicle receives a green signal. If no gap is detected, then the ramp meter is set to red.
Vehicle positions are recalculated every second, and a green signal's minimum duration is 2 seconds. To prevent backups, the ramp meter is set to green if there are any stopped vehicles in the last 50 m of the on-ramp.

Evaluation of GAP Algorithm With and Without Location Estimation
The GAP algorithm was originally developed for a 100% connected vehicle (CV) environment. At low CV penetration rates, the algorithm frequently predicts gaps where there are none. The GAP algorithm was tested at CV penetration rates of 10, 25, 50, 75, and 100%, both with and without the location estimation algorithm. The algorithms were also tested against a fixed-time metering strategy designed for the expected flow rate at the ramp of one vehicle every 6 seconds. The test network consisted of a two-lane freeway with a single-lane on-ramp carrying a volume of 600 vehicles per hour. Table 3 shows the results of the analysis.
The location estimation algorithm demonstrates a statistically significant effect on the performance of the GAP algorithm at low and medium penetration rates. At a 10% penetration rate, the performance of the GAP algorithm experienced improved speed in the merge area distance traveled, with no measurable effect in any other area. The location estimation algorithm's effect was more noticeable at 25% and 50% penetration rates, with several metrics either within or near a 10% significance level. As expected, the location estimation algorithm produces more inaccurate than accurate estimates at high penetration rates, and no measurable benefit was found at 75% or 100% penetration rate. It is worth noting that the location estimation algorithm did not reduce the benefits of the GAP algorithm at high penetration rates, even when producing many inaccurate vehicle location estimates. This suggests that the location estimation algorithm can remain active even during periods of high connected vehicle penetration rates without degrading the performance of the GAP algorithm.
Note (Table 3). In all cases, n = 10, and bold values represent difference at p < .10.

CONCLUSIONS AND FUTURE RESEARCH
The introduction of individual vehicle location data through connected vehicles will allow the development of new types of mobility applications. Research has shown that with higher rates of connected vehicles, these applications produce higher benefits. An algorithm to estimate the locations of unequipped vehicles in a connected vehicle freeway environment was introduced here as a way to both estimate freeway vehicle positions and improve the performance of connected vehicle applications at low penetration rates. The algorithm compares the behavior of connected vehicles against the expected behavior predicted from a car-following model, in this case the Wiedemann model. Any actual accelerations lower than the expected accelerations by a predetermined threshold (0.2 g as tested here) insert an estimated vehicle into a rolling simulation of the network. This vehicle moves as predicted by the car-following model without changing lanes until it is overlapped by a connected vehicle and deleted from the set of estimates.
A new metric, the effective penetration rate, was introduced to measure the performance of the algorithm. Analysis shows that the algorithm is able to make more accurate than inaccurate estimates of vehicle locations when penetration rates are less than or equal to 80%, and when the required accuracy level is 5 m or greater in one dimension, that is, within the same lane. Because the algorithm analyzes vehicle interaction in order to make predictions, some congestion upstream of the study area is required for correct estimates. Furthermore, because estimated vehicles move according to the Wiedemann model, and because the algorithm's model must self-populate, many estimated vehicles have no other vehicles with which to interact, and accelerate to free-flow speed. This causes the location estimation algorithm to overestimate speeds.
The location estimation algorithm was tested in the GAP ramp metering algorithm using data from connected vehicles. The GAP algorithm allows on-ramp vehicles onto the freeway when gaps in the merge lane are predicted at a future time of merge. By using the location estimation, the performance of the GAP algorithm was improved significantly at low equipped vehicle penetration rates. Most benefits were experienced between 10% and 50% connected vehicle penetration rates. At higher penetration rates, the location estimation algorithm neither improved nor degraded the performance of the GAP algorithm, and was able to improve upon a static fixed-time ramp metering strategy. Statistical significance of the improvement of the GAP algorithm was difficult to measure, as ramp metering often produces only minor improvements in simulation.
Future research should investigate specific factors in a vehicle's behavior that may improve or reduce the accuracy of a location estimate. For example, certain parts of the roadway such as weaving areas may be known for unusual accelerations that do not indicate a leading vehicle. Also, the life span of an estimated vehicle may be related to the quality of the estimate; for example, an estimated vehicle that can "survive" longer in the simulation without being overlapped by an equipped vehicle may indicate greater confidence in the estimate.