Analysis of Forecasting Techniques for the Growth of Vehicular Population in Nepal

Forecasting the vehicular population entails predicting future values by analyzing past trends and considering explanatory variables such as economic and demographic factors. This is especially vital in Nepal, where accurate prediction of future vehicular growth is essential for achieving sustainable transportation systems. This study employs three distinct forecasting methods: trend line analysis, econometric analysis, and time series analysis. The methods were evaluated for their accuracy in predicting Nepal's vehicular population. The time series analysis, specifically the ARIMA model, proved more accurate than the traditional trend line and econometric approaches, which makes it the preferred method for vehicular population forecasting and a reliable foundation for future planning and policy implementation. The forecasted vehicular population of Nepal is 8,914,793 for 2030 AD, 14,828,426 for 2040 AD, and 22,038,012 for 2050 AD. These predictions give transportation planners a basis for implementing new projects effectively, allocating resources well, and achieving transportation sustainability.


Introduction
In recent decades, the growth of the vehicular population has become increasingly problematic because the existing road network cannot handle the demand, resulting in social and economic inconvenience. Knowledge of future traffic flow is essential in planning, implementing, and developing a transportation system. It also helps in its operation, management, and control [1]. Traffic forecasting is the process of predicting how vehicles will move on a particular road or network of roads in the future. To do this, historical traffic data are analyzed and factors such as population growth, land use patterns, and economic conditions are considered. Traffic forecasting supports the planning of transportation infrastructure, the improvement of traffic flow, and the reduction of congestion. It also plays a crucial role in transportation policy and decision-making by informing stakeholders of the potential impacts of different transportation projects. Furthermore, forecasting is necessary for conducting economic analyses related to transportation projects [2]. The growth of the vehicular population has become quite challenging in the metropolitan cities of Nepal, whose urban streets face congestion problems. Forecasting vehicle numbers is a prime concern for infrastructure planners and engineers because accurate estimation of vehicular growth in a region is essential for transportation planning, implementation of traffic rules and regulations, pavement design, environmental assessment, and the design of other road infrastructure elements. Developments in automobile technology and the rise in vehicles on the streets have made traffic management quite challenging [3]. This paper forecasts the vehicular population of Nepal based on the past trend of vehicular growth. The three methods adopted are Trend Line Analysis, Econometric Analysis, and Time Series Analysis. The relative accuracy of these methods is an open question for estimators and researchers. Based on the accuracy achieved, the most suitable method is selected for predicting Nepal's future vehicular population.

Problem Statement and Objective of the Study
Transportation infrastructure projects like highways, bridges, and tunnels are designed to accommodate a certain level of traffic volume. Accurately forecasting the expected growth in vehicular population is crucial for determining the appropriate size and capacity of these projects. Inaccurate forecasts lead to overestimation or underestimation of traffic volume, which results in over- or under-design of the infrastructure and ultimately in wasted resources. This study has the following two objectives:
• To evaluate the effective forecasting technique for the growth of Nepal's vehicular population.
• To predict the anticipated figure of vehicular population for the target year.

Literature Review
In the context of Nepal, past research has mainly concentrated on short-term forecasting [4], and the traffic volume levels of the Kathmandu Ring Road have been predicted using a multiplicative decomposition forecasting model. Frequently used forecasting models such as ARIMA and SARIMA require extensive traffic data collection, which may not be feasible for predicting short-term traffic volume. In the approximate nonparametric regression approach, traffic forecasting is treated as a process of predicting a dynamic variable [5].
For this reason, several approaches may be adopted for traffic forecasting, depending on the situation at hand. Although various traffic volume forecasting methods exist, three of the most relevant were chosen for comparative analysis in this study because of data availability constraints.
Research conducted in India demonstrated the application of three different traffic forecasting analyses, i.e., trend line analysis, econometric analysis, and time series analysis, to data from the past 25 years, and forecast the traffic volume ten years ahead. The authors considered vehicular population, per capita income, gross national product, and population. The paper concluded that econometric and time series analyses are more accurate than trend line analysis. The study suggested that trend line analysis may be suitable for long-term forecasting [6]. The econometric model has vital significance in the transportation sector. A study of an air passenger demand model shows a strong statistical correlation between GDP, tourist arrivals, and passenger air transport demand [7].
The research paper titled "Growth Rate of Motor Vehicles in India - Impact of Demographic and Economic Development" explores the factors influencing the rapid growth of motor vehicles in India. The study finds that economic development, population growth, and urbanization contribute to increasing motor vehicle ownership. The rise in personal vehicles, driven by economic factors such as rising incomes, has led to concerns regarding environmental impacts and strain on urban transport infrastructure [8].
Another study addresses the challenge of requiring a large amount of data when using the ARIMA model by introducing a seasonal ARIMA (SARIMA) model, which can be used with limited data. The study focused on traffic flow in a specific roadway section in India, and the researchers developed a SARIMA model to forecast the traffic volume data. The model's predictions were compared with the actual data, and the resulting mean absolute percentage error (MAPE) fell within an acceptable range [9].

Methodology
The methodology for the study was developed as a series of steps drawn from the literature reviewed above. It addresses the main issues that confront traffic forecasting and aims to minimize errors that might distort the evaluation outcome.
The period selected for the collection of data is 1989-2019. The necessary data for this study are Vehicle Population, Population, Gross National Product, Gross Domestic Product, Per Capita Income, and Labor Force. The prediction for Nepal's vehicular population has been made for the target years 2030, 2040, and 2050 AD. These predictions provide anticipated vehicle numbers for these specific years and can serve as valuable inputs for transportation planners when implementing transportation projects.
The variables used in this analysis are vehicular population, PCI (Per Capita Income), GNP (Gross National Product), Population, and GDP per capita. Trend line analysis was performed using two variables, i.e., vehicular population and Gross National Product [6]. For the econometric analysis, population and per capita income were selected as independent variables. This choice is based on the understanding that an increase in population is likely to contribute to the growth of the vehicular population: as the overall population expands, there tends to be a corresponding rise in private vehicle ownership. Per capita income is included as an independent variable because rising individual income levels are expected to boost people's purchasing power, and individuals with higher incomes are more likely to acquire vehicles. These variables are therefore considered essential factors in predicting and understanding the trends in the vehicular population in Nepal. The data were collected from sources including the CEIC global database, the Ministry of Finance, the World Bank, Macrotrends, and the Economic Survey 79/80, as in Table 2.

Data Analysis
In this process, three techniques were used and their results compared: Trend Line Analysis, Econometric Analysis, and Time Series Analysis.

Trend Line Analysis
Trend line analysis, also known as trend analysis or trend forecasting, is a statistical technique used in various fields, including finance, economics, and data analysis, to identify and analyze patterns or trends in data over time. It involves plotting data points on a graph and fitting a straight line or curve to the data points to make predictions. The analysis here assumes a linear relationship between the country's Gross National Product (GNP) and the total vehicular population (T) [6]. The data used in this analysis are for the years 1990-2009, i.e., 20 years, and the predictions are made for 2014 to 2019. The equation resulting from the regression analysis can be expressed in
2) shows that Per Capita Income has more effect on traffic demand than the country's population. This is because higher per capita income often correlates with increased purchasing power, which can lead to higher demand for personal vehicles. As people's incomes rise, they are more likely to afford and desire private transportation options. Tables 1 and 2 show some important indicators defining the model's fitness. The Durbin-Watson statistic ranges from 0 to 4. A value towards 0 indicates positive autocorrelation, a value towards 4 indicates negative autocorrelation, and a value near 2 indicates no autocorrelation.
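As a minimal sketch of the two ideas in this section, the following Python fragment fits a straight line to log-transformed values by ordinary least squares and computes the Durbin-Watson statistic on the residuals. The log-log form is assumed here only because the study plots Log(T) against Log(GNP); the function names `fit_line` and `durbin_watson` and the numbers are hypothetical and are not the study's data or its fitted equation.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def durbin_watson(residuals):
    """Sum of squared successive differences divided by sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical illustrative series (NOT the study's data).
gnp = [100.0, 110.0, 125.0, 140.0, 160.0, 185.0, 210.0, 240.0]
vehicles = [50.0, 56.0, 66.0, 75.0, 90.0, 108.0, 125.0, 150.0]

# Fit Log(T) = a + b * Log(GNP) and inspect the residuals.
log_gnp = [math.log10(g) for g in gnp]
log_t = [math.log10(v) for v in vehicles]
a, b = fit_line(log_gnp, log_t)
residuals = [yt - (a + b * xt) for xt, yt in zip(log_gnp, log_t)]
dw = durbin_watson(residuals)
print(f"intercept={a:.3f} slope={b:.3f} Durbin-Watson={dw:.3f}")
```

Residuals that alternate in sign push the statistic toward 4, while residuals that stay on one side of zero for long runs push it toward 0, which is the pattern read as positive autocorrelation.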

Table 3. Analysis Model Summary
Table 4. Coefficient and Significance Level
The Durbin-Watson value in our case is 0.877, which lies towards 0 and therefore indicates positive autocorrelation; positive autocorrelation is also seen much more often in practice than negative autocorrelation. Positive autocorrelation indicates that the increase observed in a time interval leads to a proportionate increase in the lagged time interval. The significance value is 0.000 for PCI and 0.000 for population, both of which are less than 0.05. We therefore reject the null hypothesis and accept the alternative that the independent variables explain the variation in the dependent variable well. The F-change value is 4207.787; the F statistic ranges from zero to positive infinity, depending on the degrees of freedom, and a larger F-change indicates a more substantial improvement in model fit. The histogram (Fig. 4) is a frequency plot obtained by placing the data in regularly spaced cells and plotting each cell frequency against the center of the cell. This graph is used to verify that the residuals are normally distributed, as the regression model assumes. Because of the small number of observations, a perfectly bell-shaped histogram has not been obtained here, but the plot suggests a normal distribution of residuals. Hence, the error terms can be said to be normally distributed. The "N" condition of the linear regression model is that the error terms are normally distributed. A normal probability plot of the residuals is a scatter plot with the observed cumulative probability of the residuals on the x-axis and the expected cumulative probability under the normal distribution on the y-axis, as shown in Fig. 3. The relationship between the theoretical percentiles and the sample percentiles is approximately linear. Therefore, the normal probability plot of the residuals suggests that the error terms are indeed normally distributed.
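The construction of a normal P-P plot can be sketched as follows. This is an illustrative fragment, not the procedure of the statistical package used in the study: the plotting position (i - 0.5) / n is one common convention assumed here, the function name `pp_plot_points` is hypothetical, and the residuals are made-up values.

```python
from statistics import NormalDist, mean, stdev

def pp_plot_points(residuals):
    """Return (observed, expected) cumulative probabilities for a normal P-P plot."""
    n = len(residuals)
    mu, sigma = mean(residuals), stdev(residuals)
    ordered = sorted(residuals)
    # Observed cumulative probability of the i-th smallest residual
    # (plotting position (i - 0.5) / n is an assumed convention).
    observed = [(i - 0.5) / n for i in range(1, n + 1)]
    # Expected cumulative probability if the residuals were normal
    # with the sample mean and standard deviation.
    normal = NormalDist(mu, sigma)
    expected = [normal.cdf(r) for r in ordered]
    return observed, expected

# Hypothetical residuals (NOT the study's data).
sample_residuals = [-1.2, -0.6, -0.3, 0.0, 0.2, 0.5, 1.4]
observed, expected = pp_plot_points(sample_residuals)
for obs, exp in zip(observed, expected):
    print(f"observed={obs:.3f} expected={exp:.3f}")
```

Points that lie close to the 45-degree line (observed roughly equal to expected) are what the text describes as an approximately linear relationship.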

Time Series Analysis
Time series analysis involves studying and understanding data points collected sequentially over time. This type of data often exhibits patterns, trends, seasonality, and other temporal dependencies not present in cross-sectional data. As Box and Jenkins (1976) suggested, ideally at least 30 observations are required to perform an appropriate Time Series Analysis [16]. The Box and Jenkins methodology has been adopted, and the analysis has been done using the Auto-Regressive Integrated Moving Average (ARIMA) approach [17]. The analysis has been performed in STATA. ARIMA combines three key components: AutoRegressive (AR), Integrated (I), and Moving Average (MA). Each component addresses a specific aspect of the time series data. Combining these components, the ARIMA model is expressed as ARIMA(p, d, q), where:
• 'p' represents the order of the AutoRegressive component (number of past values to consider).
• 'd' represents the number of differencing operations applied to achieve stationarity.
• 'q' represents the order of the Moving Average component (number of past residuals to consider).
The time series must be stationary, an assumption confirmed by the Dickey-Fuller test and the data trend. Figure 5 shows that the mean and covariance of the series do not depend on time, and the p-value is 0.0048, which is less than 0.05, as shown in Figure 6. Hence, the series becomes stationary after differencing twice, which gives the order of 'd' as 2. Autocorrelation (ACF) measures the correlation between a time series and its lagged values. In contrast, partial autocorrelation (PACF) measures the correlation between a time series and its lagged values, excluding the influence of intermediate lags. PACF and ACF plots help determine the order of the ARIMA model's AR (AutoRegressive, p) and MA (Moving Average, q) components.
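The role of the differencing order 'd' can be illustrated with a short sketch. The series below is a made-up quadratic trend, chosen only because it needs exactly two rounds of differencing to lose its trend; it is not the vehicular data, and the helper name `difference` is hypothetical. The Dickey-Fuller test itself is not reproduced here.

```python
def difference(series, d=1):
    """Apply first differencing d times: y[t] - y[t-1]."""
    out = list(series)
    for _ in range(d):
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out

# Hypothetical series with a quadratic trend (NOT the study's data).
series = [t * t + 3 * t + 10 for t in range(8)]

once = difference(series, d=1)   # still has a linear trend
twice = difference(series, d=2)  # trend removed; values are constant

print(series)
print(once)
print(twice)
```

Each round of differencing shortens the series by one observation, which is one reason a sufficiently long record is needed before fitting an ARIMA model with d = 2.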
In a multiple regression model, the variable of interest is forecasted by constructing a linear combination of predictors. In an autoregression model, the variable of interest is forecasted by constructing a linear combination of past values of the variable itself. The term autoregression indicates a regression of the variable against itself. In the partial autocorrelation graph, five lags exceed the confidence band, which means the order of the autoregressive component is five (p = 5). Similarly, in the autocorrelation graph, no lags exceed the confidence band, which means the order of the moving average component is zero (q = 0). Hence, the order (p, d, q) is (5, 2, 0), which gives ten candidate ARIMA models when p is varied from 5 down to 1 and d from 2 down to 1. From Table 3, checking all selection criteria, the model ARIMA(5,2,0) has the minimum sigma squared, the maximum log-likelihood, and the lowest RMSE, and so meets the largest number of selection criteria (Box & Jenkins, 1976). This model therefore appears to be the best fit among the candidate forecasting models. There are two primary approaches to forecasting using an ARIMA model: static and dynamic. In static forecasting, also referred to as one-step-ahead forecasting, predictions are made solely for the immediately following period, using the model trained on historical data and the actual observed values up to that point. Dynamic forecasting, on the other hand, makes predictions over a longer time horizon by feeding earlier forecasts back in as inputs. Dynamic forecasting is more intricate but better suited to capturing extended trends and patterns, whereas static forecasting offers simpler, immediate predictions. In this case, dynamic forecasting is used because of the longer time horizon.
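The difference between static and dynamic forecasting, and the enumeration of the ten candidate orders, can be sketched as follows. This is a simplified illustration on a plain autoregressive model with made-up coefficients, not the fitted ARIMA(5,2,0) model and not STATA's implementation; the function names and data are hypothetical.

```python
def ar_predict(history, coeffs, const=0.0):
    """One-step AR prediction from the most recent len(coeffs) values."""
    lags = history[-len(coeffs):][::-1]  # most recent value first
    return const + sum(c * v for c, v in zip(coeffs, lags))

def static_forecast(observed, coeffs, start, const=0.0):
    """One-step-ahead forecasts: each uses actual observations up to t-1."""
    return [ar_predict(observed[:t], coeffs, const)
            for t in range(start, len(observed))]

def dynamic_forecast(observed, coeffs, start, steps, const=0.0):
    """Multi-step forecasts: after `start`, predictions are fed back as inputs."""
    history = list(observed[:start])
    out = []
    for _ in range(steps):
        nxt = ar_predict(history, coeffs, const)
        out.append(nxt)
        history.append(nxt)
    return out

# Hypothetical AR(1) coefficients and data (NOT the study's model or data).
observed = [2.0, 4.0, 6.0, 5.0]
static = static_forecast(observed, coeffs=[0.5], start=2, const=1.0)
dynamic = dynamic_forecast(observed, coeffs=[0.5], start=2, steps=2, const=1.0)
print(static)   # uses the observed value 6.0 for the second forecast
print(dynamic)  # uses its own first forecast for the second forecast

# The ten candidate orders: p from 1 to 5, d from 1 to 2, q fixed at 0.
candidates = [(p, d, 0) for p in range(1, 6) for d in (1, 2)]
print(len(candidates))
```

The two forecast paths agree on the first step and diverge afterwards, because only the dynamic path lets forecast errors carry forward into later steps.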

Results and Discussion
The analysis compares the three methods adopted in this study to predict the growth of Nepal's vehicular population, and forecasts the vehicular population for the target years 2030, 2040, and 2050 AD using the most effective technique. The input data used for analysis are from 1990 to 2009 (20 years), and predictions were made for the years 2014 to 2019 for the comparative study, as shown in Table 5. The error percentages measure the deviation between the predicted values and the actual values. A negative error indicates an underestimation, while a positive error indicates an overestimation. The magnitude of the error indicates the extent of the deviation, with larger values representing larger discrepancies.
In trend line analysis, the graphical plot of Log(T) versus Log(GNP) reveals a linear relationship between vehicular population and GNP, which is represented as a straight line. This suggests that the vehicular population grows linearly as GNP increases. However, a significant discrepancy emerges between the actual and predicted values when this model is applied to predict the vehicular population from 2014 to 2019. To provide context, the Department of Transport Management data indicate that half of Nepal's total vehicles were registered after 2014. This rapid, exponential increase in vehicular population after 2014 is not captured by the assumed linear model, leading to inaccurate predictions for 2014 to 2019. The econometric model, by contrast, shows a satisfactory level of accuracy, with errors below 33%. The econometric model uses population and per capita income as independent variables. It shows that the growth of Nepal's vehicular population depends on the country's population and income level. The results of the ARIMA(5,2,0) model applied to forecast Nepal's vehicular population are promising, exhibiting a tight error range of -1% to +3%. This indicates high accuracy and precision in predicting future vehicular population trends. Such reliable predictions within this narrow percentage range can aid policy formulation, urban planning, and informed decision-making in Nepal's transportation sector.
Until 2018, the predicted values from all three methods exhibited a satisfactory level of consistency. However, a significant shift in prediction accuracy became evident in 2019. This sudden increase in prediction errors in 2019 can be attributed, at least in part, to the outbreak of the COVID-19 pandemic. The pandemic introduced unprecedented disruptions to global economies, supply chains, and socio-economic conditions. The resulting lockdowns, travel restrictions, and shifts in consumer behavior led to a decrease in vehicle registration in this particular year. Overall, econometric analysis shows a satisfactory level of prediction compared to trend line analysis. This indicates that the prediction of vehicular growth depends more on the economic and demographic indicators of population and per capita income than on the single indicator GNP. Across trend line analysis, econometric analysis, and time series analysis, a clear pattern emerges: the time series analysis, specifically the ARIMA(5,2,0) model, is the most accurate. The error margin of up to 3% achieved by the ARIMA model is distinctly better than the alternatives. This outcome underscores the robustness of the ARIMA model's predictive capability, making it the preferred method for accurate vehicular population forecasting.
The prediction of Nepal's vehicular population gives anticipated figures of 8,914,793 for 2030 AD, 14,828,426 for 2040 AD, and 22,038,012 for 2050 AD. These projections offer planners and policymakers critical inputs for strategic decision-making and urban development. By accurately estimating the future vehicular population, authorities can anticipate the associated demands on transportation infrastructure, energy resources, and the environment. These figures enable planners to allocate resources efficiently, design sustainable transportation systems, and implement policies that cater to the evolving needs of a growing vehicular population.
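The error percentage used in the comparison can be written as a short sketch, following the sign convention stated above (negative for underestimation, positive for overestimation). The function names and the numbers are hypothetical and are not values from the study's tables; the mean absolute percentage error (MAPE) is included because it is the summary measure mentioned in the literature review.

```python
def percent_error(predicted, actual):
    """Signed error in percent: negative means the model underestimated."""
    return (predicted - actual) / actual * 100.0

def mape(predicted_values, actual_values):
    """Mean absolute percentage error over paired predictions and actuals."""
    errors = [abs(percent_error(p, a))
              for p, a in zip(predicted_values, actual_values)]
    return sum(errors) / len(errors)

# Hypothetical values (NOT the study's results).
actual = [100.0, 200.0, 400.0]
predicted = [103.0, 198.0, 400.0]

for p, a in zip(predicted, actual):
    print(f"predicted={p} actual={a} error={percent_error(p, a):+.2f}%")
print(f"MAPE={mape(predicted, actual):.2f}%")
```

Because the signed errors can cancel when averaged, the absolute value is taken before averaging in `mape`.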

Conclusion
The study's results have shown that the conventional Trend Line Analysis often leads to a significant overestimation of future traffic volumes. It is crucial to improve the accuracy of these estimations to ensure the efficient allocation of limited resources such as land, labor, and funds, especially in developing nations. This research suggests that adopting more rational, reliable, and advanced analytical methods, such as Econometric Analysis and Time Series Analysis, can yield more credible forecasts. Econometric modeling proves to be superior in capturing the impact of two critical factors: the growth in the overall number of users and their purchasing power to access new transportation developments. This stands in contrast to the traditional Trend Line Analysis, which relies solely on a country's total productivity level (e.g., Gross National Product) for estimating prospective users. Notably, Time Series Analysis has a well-established track record in short-term forecasting within finance and economics and warrants exploration in transportation engineering. Further investigation can determine its accuracy within different time frames. Research findings indicate that Time Series Analysis is particularly practical for short-term forecasts. When comparing the Trend Line, Econometric, and Time Series approaches, Time Series modeling displayed significantly lower errors. The potential of Time Series Analysis for long-term forecasting, given advancements in data availability and technology, is promising. It aligns with the findings of other researchers and could substantially contribute to more precise traffic forecasting in the future.

Figure 1. Flowchart of Research Methodology
3.1 Data Collection
For vehicle registration, the Vehicle & Transport Management Act, 2049 (1992) and the Vehicle & Transport Management Rule, 2054 (1997) of Nepal classify vehicles into the following 5 main categories based on size and capacity, as in Table 1.

Figure 2. Plot between Log(T) and Log(GNP)
4.2 Econometric Analysis
An econometric model is a statistical framework used to analyze and quantify the relationships among economic variables. It combines economic theory, statistical methods, and real-world data to understand and predict economic phenomena. Traffic growth is often linked to specific financial and demographic factors [15]. These factors include population size, per capita income, and per capita net national product. In this study, population and per capita income are the predictor variables. For the analysis, the data were chosen for 21 years (1989-2009) and the estimation is done from

Figure 3. Normal P-P plot of regression standardized residual

Figure 7. PACF and ACF of Vehicular Data
Ten diverse models have been developed, and the best-fit model is selected using the following model selection criteria:
• Significance of ARIMA components
• Akaike Information Criteria (AIC) and Bayesian Information Criteria (BIC) should be minimum
• Maximum Likelihood: A bigger value is better
• Sigma Square: Estimate of the error variance (Smaller value is better)
• Least RMSE (Root Mean Square Error)
• White Noise closer to 1
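Two of the criteria above, AIC and BIC, follow directly from a model's log-likelihood, and the sketch below shows how a minimum-AIC choice could be made. It uses the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L; the candidate names, log-likelihoods, parameter counts, and sample size are all hypothetical and are not taken from the study's diagnostic table.

```python
import math

def aic(loglik, k):
    """Akaike Information Criterion: 2k - 2 ln L (smaller is better)."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian Information Criterion: k ln n - 2 ln L (smaller is better)."""
    return k * math.log(n) - 2 * loglik

# Hypothetical candidates: name -> (log-likelihood, number of parameters).
candidate_fits = {
    "candidate_1": (-120.0, 2),
    "candidate_2": (-105.0, 6),
}
n_obs = 28  # hypothetical number of observations after differencing

scores = {
    name: (aic(ll, k), bic(ll, k, n_obs))
    for name, (ll, k) in candidate_fits.items()
}
best_by_aic = min(scores, key=lambda name: scores[name][0])
for name, (a_val, b_val) in scores.items():
    print(f"{name}: AIC={a_val:.2f} BIC={b_val:.2f}")
print("lowest AIC:", best_by_aic)
```

Both criteria reward a higher log-likelihood and penalize additional parameters, with BIC applying the heavier penalty once the sample has more than about seven observations.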

Table 5. Diagnostic Checking and Selection of Model

Table 7. Results Obtained from Different Methods for the Years 2014 to 2019