这是一篇来自澳大利亚统计局 (https://www.abs.gov.au/) 的关于时间序列的一些基本概念的文章。 尝试用中英对照看看。
原文在此–《Time Series Analysis: The Basics》。
什么是时间序列?WHAT IS A TIME SERIES?
A time series is a collection of observations of well-defined data items obtained through repeated measurements over time. For example, measuring the value of retail sales each month of the year would comprise a time series. This is because sales revenue is well defined, and consistently measured at equally spaced intervals. Data collected irregularly or only once are not time series. |
时间序列是通过一段时间内反复测量而获得的对明确界定的数据项目的观察结果的集合。例如,测量一年中每个月的零售额就构成了一个时间序列。这是因为销售收入是明确定义的,并以相等的间隔持续测量。不定期或只收集一次的数据不属于时间序列。 |
An observed time series can be decomposed into three components: the trend (long term direction), the seasonal (systematic, calendar related movements) and the irregular (unsystematic, short term fluctuations). |
观察到的时间序列可以分解为三个部分:趋势(长期方向)、季节(系统性的、与日历有关的运动)和不规则(非系统性的、短期波动)。 |
什么是存量和流量序列?WHAT ARE STOCK AND FLOW SERIES?
Time series can be classified into two different types: stock and flow. |
时间序列可以分为两种不同的类型:存量和流量。 |
A stock series is a measure of certain attributes at a point in time and can be thought of as “stocktakes”. For example, the Monthly Labour Force Survey is a stock measure because it takes stock of whether a person was employed in the reference week. |
存量系列是对某一时间点上某些属性的衡量,可以被视为 “存量”。例如,劳动力月度调查就是一种存量测量,因为它测量的是一个人在参考周内是否就业。 |
Flow series are series which are a measure of activity over a given period. For example, surveys of Retail Trade activity. Manufacturing is also a flow measure because a certain amount is produced each day, and then these amounts are summed to give a total value for production for a given reporting period. |
流动系列是衡量某一特定时期活动的系列。例如,零售贸易活动调查。制造业也是一种流量计量,因为每天生产一定数量的产品,然后将这些数量相加,得出某一报告期的生产总值。 |
The main difference between a stock and a flow series is that flow series can contain effects related to the calendar (trading day effects). Both types of series can still be seasonally adjusted using the same seasonal adjustment process. |
存量系列和流量系列的主要区别在于流量系列可以包含与日历有关的效应(交易日效应)。这两类序列仍可使用相同的季节性调整程序进行季节性调整。 |
什么是季节效应?WHAT ARE SEASONAL EFFECTS?
A seasonal effect is a systematic and calendar related effect. Some examples include the sharp escalation in most Retail series which occurs around December in response to the Christmas period, or an increase in water consumption in summer due to warmer weather. Other seasonal effects include trading day effects (the number of working or trading days in a given month differs from year to year which will impact upon the level of activity in that month) and moving holiday (the timing of holidays such as Easter varies, so the effects of the holiday will be experienced in different periods each year). |
季节性效应是一种系统的、与日历有关的效应。一些例子包括大多数零售系列在12月前后因圣诞节而急剧上升,或夏季因天气变暖而用水量增加。其他季节性效应包括交易日效应(特定月份的工作日或交易日数量每年不同,这将影响该月份的活动水平)和移动假日(复活节等假日的时间不同,因此每年的不同时期会受到假日的影响)。 |
什么是季节调整,为什么需要做季节调整?WHAT IS SEASONAL ADJUSTMENT AND WHY DO WE NEED IT?
Seasonal adjustment is the process of estimating and then removing from a time series influences that are systematic and calendar related. Observed data needs to be seasonally adjusted as seasonal effects can conceal both the true underlying movement in the series, as well as certain non-seasonal characteristics which may be of interest to analysts. |
季节性调整是估计然后从时间序列中消除与日历有关的系统性影响的过程。观测到的数据需要进行季节性调整,因为季节性影响可以掩盖序列中真正的基本变动,以及分析人员可能感兴趣的某些非季节性特征。 |
为什么我们不能比较每年同一时期的原始数据?WHY CAN’T WE JUST COMPARE ORIGINAL DATA FROM THE SAME PERIOD IN EACH YEAR?
A comparison of original data from the same period in each year does not completely remove all seasonal effects. Certain holidays such as Easter and Chinese New Year fall in different periods in each year, hence they will distort observations. Also, year to year values will be biased by any changes in seasonal patterns that occur over time. For example, consider a comparison between two consecutive March months i.e. compare the level of the original series observed in March for 2000 and 2001. This comparison ignores the moving holiday effect of Easter. Easter occurs in April for most years but if Easter falls in March, the level of activity can vary greatly for that month for some series. This distorts the original estimates. A comparison of these two months will not reflect the underlying pattern of the data. The comparison also ignores trading day effects. If the two consecutive months of March have different composition of trading days, it might reflect different levels of activity in original terms even though the underlying level of activity is unchanged. In a similar way, any changes to seasonal patterns might also be ignored. The original estimates also contains the influence of the irregular component. If the magnitude of the irregular component of a series is strong compared with the magnitude of the trend component, the underlying direction of the series can be distorted. |
比较每年同一时期的原始数据并不能完全消除所有季节性影响。某些节日,如复活节和春节,在每年的不同时期,因此它们会扭曲观察结果。此外,随着时间的推移,季节性的变化也会使每年的数值出现偏差。举例来说,可考慮比較連續兩個三月的數字,即比較二零零零年及二零零一年三月所觀察到的原始數列的水平。這個比較忽略了復活節的移動假期效應。在大部分年份,復活節是在四月發生,但如果復活節是在三月,則某些數列在該月的活動水平會有很大的差異。这就扭曲了原来的估计。对这两个月进行比较将无法反映数据的基本模式。比较还忽略了交易日的影响。 如果3月份连续两个月的交易日构成不同,那么即使基本的活动水平没有变化,也可能反映出原有的不同活动水平。同样,任何季节性模式的变化也可能被忽略。原先的估計亦包含不規則成分的影響。如果一个系列的不规则成分的幅度比趋势成分的幅度大,则该系列的基本方向可能被扭曲。 |
However, the major disadvantage of comparing year to year original data, is lack of precision and time delays in the identification of turning points in a series. Turning points occur when the direction of underlying level of the series changes, for example when a consistently decreasing series begins to rise steadily. If we compare year apart data in the original series, we may miss turning points occurring during the year. For example, if March 2001 has a higher original estimate than March 2000, by comparing these year apart values, we might conclude that the level of activity has increased during the year. However, the series might have increased up to September 2000 and then started to decrease steadily. |
然而,比较年与年的原始数据的主要缺点是,在确定序列中的转折点时缺乏精确性和时间延迟。转折点发生在数列基本水平的方向发生变化时,例如当一个持续下降的数列开始稳步上升时。如果我们比较原序列中相隔一年的数据,我们可能会错过一年内发生的转折点。例如,如果2001年3月的原始估计值比2000年3月高,通过比较这些相隔一年的数值,我们可能会得出结论,认为活动水平在这一年中增加了。然而,该系列可能在2000年9月之前有所上升,然后开始稳步下降。 |
何时不适宜进行季节性调整?WHEN IS SEASONAL ADJUSTMENT INAPPROPRIATE?
When a time series is dominated by the trend or irregular components, it is nearly impossible to identify and remove what little seasonality is present. Hence seasonally adjusting a non-seasonal series is impractical and will often introduce an artificial seasonal element. |
当一个时间序列由趋势或不规则成分主导时,几乎不可能识别和消除存在的少量季节性因素。因此,对非季节性序列进行季节性调整是不切实际的,往往会引入人为的季节性因素。 |
什么是季节性?WHAT IS SEASONALITY?
The seasonal component consists of effects that are reasonably stable with respect to timing, direction and magnitude. It arises from systematic, calendar related influences such as: |
季节性成分包括在时间、方向和幅度方面相当稳定的影响。它来自于系统性的、与日历相关的影响,例如: |
Natural Conditions |
自然条件 |
weather fluctuations that are representative of the season (uncharacteristic weather patterns such as snow in summer would be considered irregular influences) |
季节带来的气象波动(夏季下雪等不寻常的天气模式将被视为非正常影响) |
Business and Administrative procedures |
商业和行政程序 |
start and end of the school term |
开学和放假 |
Social and Cultural behaviour |
社会和文化行为 |
Christmas |
圣诞节 |
It also includes calendar related systematic effects that are not stable in their annual timing or are caused by variations in the calendar from year to year, such as: |
它还包括与日历有关的系统效应,这些效应在年度时间上并不稳定,或由每年的日历变化引起,例如: |
Trading Day Effects |
交易日效应 |
the number of occurrences of each of the day of the week in a given month will differ from year to year- There were 4 weekends in March in 2000, but 5 weekends in March of 2002 |
各年某月星期数的不同- 2000年3月有4个周末,但2002年3月有5个周末。 |
Moving Holiday Effects,holidays which occur each year, but whose exact timing shifts - Easter, Chinese New Year |
移动假日效应,每年都有但年年时间不同的假日- 复活节、春节 |
我们如何识别季节性?HOW DO WE IDENTIFY SEASONALITY?
Seasonality in a time series can be identified by regularly spaced peaks and troughs which have a consistent direction and approximately the same magnitude every year, relative to the trend. The following diagram depicts a strongly seasonal series. There is an obvious large seasonal increase in December retail sales in New South Wales due to Christmas shopping. In this example, the magnitude of the seasonal component increases over time, as does the trend. |
一个时间序列的季节性可以通过有规律的高峰和低谷来识别,这些高峰和低谷相对于趋势而言,每年都有一个一致的方向和大致相同的幅度。下图描述了一个强烈的季节性序列。由于圣诞购物,新南威尔士州12月的零售额有明显的季节性大幅增长。在这个例子中,季节性成分的幅度随着时间的推移而增加,趋势也是如此。 |
Figure 1: Monthly Retail Sales in New South Wales (NSW) Retail Department Stores |
图1:新南威尔士州的每月零售额 零售百货公司(图见原文链接,此处不附图,下同) |
什么是不规则项?WHAT IS AN IRREGULAR?
The irregular component (sometimes also known as the residual) is what remains after the seasonal and trend components of a time series have been estimated and removed. It results from short term fluctuations in the series which are neither systematic nor predictable. In a highly irregular series, these fluctuations can dominate movements, which will mask the trend and seasonality. The following graph is of a highly irregular time series: |
不规则成分(有时也称为残差)是在估计和去除时间序列的季节性和趋势性成分后留下的。它是序列中既不系统也不可预测的短期波动造成的。在一个高度不规则的序列中,这些波动可能会主导运动,从而掩盖了趋势和季节性。下图是一个高度不规则的时间序列。 |
Figure 2: Monthly Value of Building Approvals, Australian Capital Territory (ACT) |
图2:澳大利亚首都直辖区建筑审批的月价值 |
什么是趋势?WHAT IS THE TREND?
The ABS trend is defined as the ‘long term’ movement in a time series without calendar related and irregular effects, and is a reflection of the underlying level. It is the result of influences such as population growth, price inflation and general economic changes. The following graph depicts a series in which there is an obvious upward trend over time: |
ABS趋势被定义为时间序列中没有日历相关和不规则影响的 “长期”运动,是基本水平的反映。它是人口增长、价格上涨和总体经济变化等影响因素的结果。下图描述了一个随时间推移有明显上升趋势的序列。 |
Figure 3: Quarterly Gross Domestic Product |
图3.季度国内生产总值 季度国内生产总值 |
时间序列分解的基本模型有哪些?WHAT ARE THE UNDERLYING MODELS USED TO DECOMPOSE THE OBSERVED TIME SERIES?
Decomposition models are typically additive or multiplicative, but can also take other forms such as pseudo-additive. |
分解模型通常为加法或乘法,但也可以采取其他形式,如伪加法。 |
Additive Decomposition |
加法分解 |
In some time series, the amplitude of both the seasonal and irregular variations do not change as the level of the trend rises or falls. In such cases, an additive model is appropriate. |
在某些时间序列中,季节性变化和不规则变化的幅度并不随着趋势水平的上升或下降而变化。在这种情况下,适合采用加法模型。 |
In the additive model, the observed time series ($ O_t $) is considered to be the sum of three independent components: the seasonal \(S_t\), the trend \(T_t\) and the irregular \(I_t\). |
在加法模型中,观测到的时间序列(\(O_t\))被认为是三个独立成分的总和:季节性\(S_t\)、趋势\(T_t\)和不规则\(I_t\)。 |
\[观察序列=趋势+季节性+不规则项\]
\[O_t=T_t+S_t+I_t\]
Each of the three components has the same units as the original series. The seasonally adjusted series is obtained by estimating and removing the seasonal effects from the original time series. The estimated seasonal component is denoted by The seasonally adjusted estimates can be expressed by: |
三个组成部分中的每一个都具有与原始序列相同的单位。季节调整后的序列是通过估计和去除原始时间序列的季节性影响而得到的。估计的季节性成分用下列公式表示: |
\[季节调整后的序列=观察序列-季节性 = 趋势+不规则项\]
\[SA_t=O_t-\hat S_t =T_t+I_t\]
The following figure depicts a typically additive series. The underlying level of the series fluctuates but the magnitude of the seasonal spikes remains approximately stable. |
下图描述了一个典型的加法系列。该系列的基本水平有所波动,但季节性峰值的幅度大致保持稳定。 |
Figure 4: General Government and Other Current Transfers to Other Sectors |
图4:一般政府和其他经常性转移到其他部门的资金 |
Multiplicative Decomposition |
乘法分解 |
In many time series, the amplitude of both the seasonal and irregular variations increase as the level of the trend rises. In this situation, a multiplicative model is usually appropriate. |
在许多时间序列中,随着趋势水平的上升,季节性变化和不规则变化的幅度都会增加。在这种情况下,通常适合采用乘法模型。 |
In the multiplicative model, the original time series is expressed as the product of trend, seasonal and irregular components. |
在乘法模型中,原时间序列表示为趋势性、季节性和不规则成分的乘积。 |
\[观察序列=趋势*季节性*不规则项\]
\[O_t=T_t*S_t*I_t\]
The seasonally adjusted data then becomes: |
经季节调整后的数据就变成了。 |
\[季节调整后的序列=\frac{观察序列}{季节性} = 趋势*不规则项\]
\[SA_t=\frac{O_t}{\hat S_t} =T_t*I_t\]
Under this model, the trend has the same units as the original series, but the seasonal and irregular components are unitless factors, distributed around 1. |
在这种模式下,趋势的单位与原序列相同,但季节性和不规则成分是无单位因素,分布在1左右。 |
Most of the series analysed by the ABS show characteristics of a multiplicative model. As the underlying level of the series changes, the magnitude of the seasonal fluctuations varies as well. |
澳大利亚统计局分析的大多数系列都显示出乘数模型的特征。随着系列的基本水平的变化,季节性波动的幅度也在变化。 |
Figure 5: Monthly NSW ANZ Job Advertisements |
图5:新南威尔士州澳新银行每月的招聘广告 |
Pseudo-Additive Decomposition |
伪加法分解 |
The multiplicative model cannot be used when the original time series contains very small or zero values. This is because it is not possible to divide a number by zero. In these cases, a pseudo additive model combining the elements of both the additive and multiplicative models is used. This model assumes that seasonal and irregular variations are both dependent on the level of the trend but independent of each other. |
当原始时间序列包含非常小或零的值时,不能使用乘法模型。这是因为不可能将一个数字除以零。在这些情况下,可以使用结合了加法和乘法模型要素的伪加法模型。这种模型假定季节性和不规则的变化都取决于趋势的水平,但相互独立。 |
The original data can be expressed in the following form: |
原始数据可以用以下形式表示。 |
\[O_t=T_t+T_t*(S_t-1)+T_t*(I_t-1)=T_t*(S_t+I_t-1)\]
The pseudo-additive model continues the convention of the multiplicative model to have both the seasonal factor \(S_t\) and the irregular factor \(I_t\) centred around one. Therefore we need to subtract one from \(S_t\) and \(I_t\) to ensure that the terms \(T_t\) x (\(S_t\) - 1) and \(T_t\) x (\(I_t\) - 1) are centred around zero. These terms can be interpreted as the additive seasonal and additive irregular components respectively and because they are centred around zero the original data \(O_t\) will be centred around the trend values \(T_t\). |
伪加法模型延续了乘法模型的惯例,季节系数St和不规则系数It都以1为中心。因此,我们需要从\(S_t\) 和\(I_t\)中减去1,以确保项\(T_t\) x (\(S_t\) - 1)和\(T_t\) x (\(I_t\) - 1)以零为中心。这些项可以分别解释为季节性和不规则成分的加法,由于它们的中心值为零,所以原始数据\(O_t\)将以趋势值\(T_t\) 为中心。 |
The seasonally adjusted estimate is defined to be: |
经季节性调整后的估计数被定义为: |
\[SA_t=O_t-\hat T_t *(\hat S_t-1) =T_t*I_t\]
where \(\hat T_t\) and \(\hat S_t\) are the trend and seasonal component estimates. In the pseudo-additive model, the trend has the same units as the original series, but the seasonal and irregular components are unitless factors, distributed around 1. |
其中\(\hat T_t\) 和 \(\hat S_t\)为趋势和季节性成分估计值。在伪加法模型中,趋势的单位与原序列相同,但季节性和不规则成分是无单位因素,分布在1左右。 |
An example of series that requires a pseudo-additive decomposition model is shown below. This model is used as cereal crops are only produced during certain months, with crop production being virtually zero for one quarter each year. |
下面是一个需要伪加法分解模型的系列实例。使用该模型的原因是,谷物作物只在某些月份生产,每年有一个季度的作物产量几乎为零。 |
Figure 6: Quarterly Gross Value for the Production of Cereal Crops |
图6:谷类作物生产的季度总产值。 |
Example: Shiskin Decomposition |
例子。Shiskin分解 |
The Shiskin decomposition gives graphs of the original series, seasonally adjusted series, trend series, residual (irregular) factors and the between month (seasonal) and within month (trading day) factors that are combined to form the combined adjustment factors. The residual (irregular) factors are found by dividing the seasonally adjusted series by the trend series. Figure 7 shows a Shiskin decomposition for the Australian Retail series. |
Shiskin分解法给出了原始序列、季节性调整序列、趋势序列、残差(不规则)因子以及月间(季节性)和月内(交易日)因子的图形,这些因子组合在一起形成综合调整因子。残差(不规则)因子由季节调整序列除以趋势序列得出。图7是澳大利亚零售系列的Shiskin分解图。 |
Figure 7: Shiskin decomposition for Australian Total Retail Turnover, May 1990 to May 2000 |
图7:1990年5月至2000年5月澳大利亚零售业总营业额的Shiskin分解图。 |
我如何知道使用哪种分解模型?HOW DO I KNOW WHICH DECOMPOSITION MODEL TO USE?
To choose an appropriate decomposition model, the time series analyst will examine a graph of the original series and try a range of models, selecting the one which yields the most stable seasonal component. If the magnitude of the seasonal component is relatively constant regardless of changes in the trend, an additive model is suitable. If it varies with changes in the trend, a multiplicative model is the most likely candidate. However if the series contains values close or equal to zero, and the magnitude of seasonal component appears to be dependent upon the trend level, then pseudo-additive model is most appropriate. |
为了选择一个合适的分解模型,时间序列分析者将研究原始序列的图形,并尝试一系列模型,选择产生最稳定的季节性成分的模型。如果无论趋势如何变化,季节性成分的幅度都相对稳定,则适合采用加法模型。如果它随着趋势的变化而变化,那么乘法模型是最可能的候选模型。但是,如果序列中的数值接近或等于零,而且季节性成分的大小似乎取决于趋势水平,那么伪加法模型是最合适的。 |
WHAT IS A SEASONAL AND IRREGULAR (SI) CHART?
Once the trend component is estimated, it can be removed from the original data, leaving behind the combined seasonal and irregular components or SIs. A seasonal and irregular or SI chart graphically presents the SI’s for particular months or quarters in the series span. |
一旦趋势成分被估计出来,就可以从原始数据中删除,留下季节性和不规则成分或SI的组合。季节性和不规则成分或SI图表以图形方式显示了序列跨度中特定月份或季度的SI。 |
The following graph is an SI chart for a monthly series, using a multiplicative decomposition model. |
下图是一个月度系列的SI图,采用的是乘法分解模型。 |
Figure 8: Seasonal and Irregular (SI) Chart - Value of Building Approvals, ACT |
图8:季节性和不规则(SI)图 – – 澳大利亚首都直辖区建筑审批的价值。 |
The points represent the SIs obtained from the time series, while the solid line shows the seasonal component. The seasonal component is calculated by smoothing the SI’s, to remove irregular influences. |
这些点代表从时间序列中获得的SI,而实线则显示季节性成分。季节性成分是通过平滑SI计算的,以消除不规则的影响。 |
SI charts are useful in determining whether short-term movements are caused by seasonal or irregular influences. In the graph above, the SIs can be seen to fluctuate erratically, which indicates the time series under analysis is dominated by its irregular component. |
SI图有助于确定短期变动是由季节性还是不规则影响引起的。在上图中,可以看到SIs不规则地波动,这表明所分析的时间序列是由其不规则成分主导的。 |
SI charts are also used to identify seasonal breaks, moving holiday patterns and extreme values in a time series. |
SI图表还可用于识别季节性休息、移动假日模式和时间序列中的极端值。 |