Time Series Requirement (Version 1.1)
The intent of this document is to define a list of requirements that facilitate the creation, storage, and transformation of time series. First, some terminology:
Requirements

(R-1) At a given index in a time series, all values for all variables are for the same moment in time.

(R-2) The range of the index can be either the intersection or the union of all timestamps of all variables. When doing an intersection (Padding=Disable), the user can assume that all variables have a defined value for each index.
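The two index ranges described in (R-2) can be sketched as follows. This is only an illustration under stated assumptions, not the product's interface: the function names `build_index` and `align`, the `padding` flag, and the dict-based representation of a variable are all hypothetical. NaN is used as the pad value for doubles.

```python
import math

def build_index(variables, padding):
    """Return the sorted index shared by all variables.

    variables: dict mapping a variable name to a {timestamp: value} dict
    (a hypothetical representation chosen for this sketch).
    padding=True  -> union of all timestamps (gaps become pad values).
    padding=False -> intersection (every variable is defined at every index).
    """
    timestamp_sets = [set(values) for values in variables.values()]
    if not timestamp_sets:
        return []
    if padding:
        merged = set.union(*timestamp_sets)
    else:
        merged = set.intersection(*timestamp_sets)
    return sorted(merged)

def align(variables, padding):
    """Return (index, columns) where each column holds one value per index."""
    index = build_index(variables, padding)
    columns = {
        name: [values.get(ts, math.nan) for ts in index]  # NaN is the pad value
        for name, values in variables.items()
    }
    return index, columns

# Made-up data: "b" covers moments 1..3, "c" covers moments 2..4.
data = {
    "b": {1: 10.0, 2: 11.0, 3: 12.0},
    "c": {2: 20.0, 3: 21.0, 4: 22.0},
}
union_index, union_cols = align(data, padding=True)
inter_index, inter_cols = align(data, padding=False)
```

With the union, "b" is padded at the end and "c" at the beginning; with the intersection, only the moments covered by both variables remain and no pad value is needed.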
Figure 1 - Padding Attribute

In this picture, the green background represents the range of the index. Each black block represents the portion of the variable for which values have been defined by the user or generated by a function. The red blocks represent the portion of the index for which the variable value will be a "pad value". Notice that pad values are found only at the beginning or end of a variable, never in the middle. When two variables overlap, they are always synchronized; see (R-5). For a time series of double, the "pad value" is defined as NaN (Not a Number). Pad values are never used by built-in functions acting on the variable, e.g. mathematical operations, statistics, and technical analysis functions.

(R-3) For each index, there must be a boolean (true/false) to verify whether a given value is a "pad value" or not.

(R-4) An interface could be added to allow time series of a user-defined type. The user will be required to provide methods for interpolation, time compression, time expansion, creation, copy, deletion, and serialization of the user-defined type.

(R-5) A key requirement is the ability to process and compare time series that are perfectly synchronized. Two types of synchronization are supported:

(R-6) The following rules apply to both synchronization types:
(R-7) The default interpolation method is 'oldest neighbor'. Alternative methods are: the more recent neighbor value, a fixed value, linear interpolation, or the average of the two neighbors.

(R-8) Interpolation logic can be user defined.

(R-9) A variable can be removed from a time series.

(R-10) It shall be possible to:
(R-11) It shall be possible to define a time axis that covers a "natural period", such as one year or one month. It shall also be possible to have a time axis that provides only ordering, not a real time reference: the first moment is "0", the next moment is "1", and so on.
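As an illustration of the interpolation methods listed in (R-7), the sketch below estimates a value between two known samples. It rests on assumptions that the requirement does not spell out: "oldest neighbor" is read as the nearest earlier sample and "more recent neighbor" as the nearest later sample. The function name `interpolate` and its parameters are hypothetical.

```python
import bisect

def interpolate(timestamps, values, t, method="oldest", fixed=0.0):
    """Estimate the value at time t from sorted (timestamps, values).

    timestamps must be numeric and sorted in ascending order, and t must
    lie within the range they cover.
    """
    if not timestamps or t < timestamps[0] or t > timestamps[-1]:
        raise ValueError("t is outside the range of the series")
    pos = bisect.bisect_left(timestamps, t)
    if timestamps[pos] == t:
        return values[pos]              # exact hit, nothing to interpolate
    t0, t1 = timestamps[pos - 1], timestamps[pos]
    v0, v1 = values[pos - 1], values[pos]
    if method == "oldest":              # assumed: nearest earlier sample
        return v0
    if method == "recent":              # assumed: nearest later sample
        return v1
    if method == "fixed":               # user-supplied constant
        return fixed
    if method == "linear":              # straight line between the neighbors
        return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    if method == "average":             # mean of the two neighbors
        return (v0 + v1) / 2
    raise ValueError(f"unknown interpolation method: {method}")

# Made-up samples at moments 0 and 10; the value at moment 4 is missing.
ts, vals = [0, 10], [1.0, 3.0]
estimates = {m: interpolate(ts, vals, 4, m) for m in
             ("oldest", "recent", "fixed", "linear", "average")}
```

A user-defined interpolation, as allowed by (R-8), could be supported in such a design by accepting a callable in place of the method name.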
*SQL lower limit is 1/1/1753. Operations involving SQL should fail gracefully with a "date overflow" exception.

(R-12) When creating variables from user-provided data, the range of the timestamps must not conflict with the Timebase attribute. Example: if the time series has a period of 1 day, only variables built with intraday data can be added. An error/exception shall be reported when there is an attempt to add data overlapping two days.

(R-13) It should be possible to perform mathematical operations between time series variables. The operation is performed individually for each index. Example: a = b + c, where a, b and c are time series variables. Each value of 'a' is the sum of the corresponding values of 'b' and 'c' at the same moment in time.

(R-15) The design should provide a means to create new variables by iterating through time series variables. This iterator is intended to facilitate (among other things) the creation of new user-defined technical analysis functions.

(R-16) The user shall be able to create a copy of a time series and all its variables.

(R-17) Because time series can become quite large objects, it is expected that they can be manipulated by reference.

(R-18) The user shall be able to build a variable from a simple array of floating point values.

(R-19) Timestamps for a time series can be defined either with an array of DateTime values (the type is platform dependent) or by specifying a start timestamp and the period for subsequent indexes.

(R-20) The user should be able to build time series from comma-separated value ASCII files and SQL databases.

(R-21) Time series should be serializable to and from an XML format to facilitate network transmission or save/load from storage.

(R-22) On iteration, the values are always delivered in chronological order.

(R-23) Operations between time series with different Timebase or periodicity should fail.

(R-24) Support for univariate and multivariate time series functions.
Univariate time series consist of single observations recorded sequentially.

(R-25) Each variable in a time series should be identified by a unique name. The user should be able to rename variables.

(R-26) A valid variable name is a sequence of one or more printable ASCII characters. At least one character must be something other than a space. Names are not case sensitive: "Ab" is the same variable as "aB". The user-specified case shall be preserved. The following printable characters are allowed (the first character is space and the last character is ~):
(R-27) Variable names are sorted in ASCII order, except that uppercase letters are considered equivalent to lowercase and leading spaces are ignored.

(R-28) Among the variables of a time series, the defaults are selected by the sorting algorithm specified by (R-27).

(R-29) The user should be able to express concisely the creation of a new time series by enumerating a subset of the variables of an existing time series.

(R-30) The user should be able to express concisely the creation of a new time series built with variables in a specific order. The variable names in the new time series are unspecified. The user can only assume that the variable names are in the proper sorted order.

(R-31) The variable names "Open", "High", "Low", "Close", and "Volume" are expected to express historical market data. Upon addition of these variables, the time series shall report an error/exception if any of the following rules is broken:
(R-32) The user should be able to iterate over the list of variable names of a time series.

(R-33) A value representing a period (like a price bar) should have a timestamp that represents the start of the period. Examples: the first 5 minutes of trading starting at 9 AM will have a timestamp of 9:00 AM, and the second value will have a timestamp of 9:05 AM. Yearly data will have a timestamp of January 1st, 00:00:00.

(R-34) The user should be able to access a value by using a timestamp (instead of an index). The method should allow retrieving the value at an exact timestamp, at the oldest neighbor, or at the more recent neighbor. A Boolean should allow verifying first whether the value exists.

(R-35) When the timebase is changed such that some fields are lost, it is not expected that a revert will be possible, e.g. a change from OneYear to OneHour cannot be reverted, but a further change to OneSecond is allowed. This means that once a Timebase is changed to "None", all the time information is lost and the Timebase parameter can no longer be changed.

(R-36) The user shall be able to attach zero or more user-defined objects to a value of a variable, a variable, a specific timestamp, or a time series. The object(s) shall be accessible by name. Do not confuse these user-defined objects with the user-defined type for values (also shown below).
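The timestamp-based access of (R-34) could look like the sketch below, which relies on the chronological ordering of the timestamps to do a binary search. The name `find_index` and the mode strings are hypothetical, and the reading of "oldest neighbor" as the latest timestamp at or before the query (and "more recent neighbor" as the earliest at or after it) is an assumption. Returning `None` plays the role of the existence check.

```python
import bisect
from datetime import datetime

def find_index(timestamps, when, mode="exact"):
    """Return the index matching `when`, or None if no value exists.

    timestamps must be sorted in chronological order.
    mode="exact"  -> only an identical timestamp matches.
    mode="oldest" -> assumed: latest timestamp at or before `when`.
    mode="recent" -> assumed: earliest timestamp at or after `when`.
    """
    if mode == "exact":
        pos = bisect.bisect_left(timestamps, when)
        found = pos < len(timestamps) and timestamps[pos] == when
        return pos if found else None
    if mode == "oldest":
        pos = bisect.bisect_right(timestamps, when) - 1
        return pos if pos >= 0 else None
    if mode == "recent":
        pos = bisect.bisect_left(timestamps, when)
        return pos if pos < len(timestamps) else None
    raise ValueError(f"unknown mode: {mode}")

# Made-up 5-minute bars, each stamped with the start of its period (R-33).
bars = [
    datetime(2024, 1, 2, 9, 0),
    datetime(2024, 1, 2, 9, 5),
    datetime(2024, 1, 2, 9, 10),
]
closes = [100.0, 101.5, 101.0]
query = datetime(2024, 1, 2, 9, 7)

exact = find_index(bars, query, "exact")      # no bar starts at 9:07
older = find_index(bars, query, "oldest")     # the 9:05 bar
newer = find_index(bars, query, "recent")     # the 9:10 bar
```

The caller checks the result against `None` before reading `closes[index]`, which keeps the lookup and the existence test in a single call.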
Figure 2 - Attached object versus user defined type of value

(R-37) The user shall be able to create a new variable where each value is the result of an operation performed on all values aligned on the same index. The expected operations are Min, Max, Average, and Summation.
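A minimal sketch of the aligned operations in (R-37) follows. It assumes that variables are already aligned lists of doubles with NaN as the pad value, and that pad values are skipped, in line with the rule that built-in functions never use them. The function name `aligned_operation` and the behavior when every input is a pad value are assumptions.

```python
import math

def aligned_operation(columns, operation):
    """Combine equally long columns index by index into one new column.

    Pad values (NaN) are skipped; if every input is a pad value at an
    index, the result at that index is also a pad value (an assumption).
    """
    reducers = {
        "min": min,
        "max": max,
        "sum": sum,
        "average": lambda xs: sum(xs) / len(xs),
    }
    reduce_values = reducers[operation]
    result = []
    for row in zip(*columns):           # one row per index
        defined = [v for v in row if not math.isnan(v)]
        result.append(reduce_values(defined) if defined else math.nan)
    return result

# Made-up aligned variables; the last index holds only pad values.
a = [1.0, 4.0, math.nan]
b = [3.0, 2.0, math.nan]
c = [5.0, math.nan, math.nan]

lowest = aligned_operation([a, b, c], "min")
highest = aligned_operation([a, b, c], "max")
total = aligned_operation([a, b, c], "sum")
mean = aligned_operation([a, b, c], "average")
```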
Figure 3 - Aligned Operation

(R-38) Time series shall provide a way to specify the periodicity of the timestamps. The periodicity indicates the amount of time covered by each value. Remember that time series are not necessarily continuous, so the periodicity cannot be derived from a delta of the timestamps. The periodicity is an optional attribute; when it is not specified, none of the period compression and expansion operations can be performed. The following periodicities must be supported:
Figure 4 - Supported Periodicity

(R-39) A method should be provided to create a new time series with Open, High, Low, Close variables from one existing variable. The Open, High, Low, Close should represent the compression or expansion of the variable at the user-specified period. Compression by summation should also be supported, e.g. compression for Volume. The existing variable must be part of a time series with a specified periodicity (see R-38).

(R-40) A method should be provided to create a new time series with Open, High, Low, Close, Volume, Open Interest variables from an existing time series with the same variables. The variables will be compressed or expanded to any period that is an integer multiple of the existing period. Example: 3-minute OHLCV can be compressed to 30-minute OHLCV, but not to 5-minute OHLCV. The existing variables must be part of a time series with a specified periodicity (see R-38).

(R-41) The user shall be able to do in-memory data compression of a time series. The user shall control the compressed/uncompressed state. It is not expected that the time series data is accessible while compressed.

(R-42) The time series shall be serializable to/from storage while compressed. It is desirable, but not mandatory, that the compressed binary stream is portable.

(R-43) The user shall be able to create a compressed/uncompressed copy of a time series.

(R-44) The user can build a new variable by defining a function that will be called for each index. The parameter of the function is an array of all the corresponding aligned values. The function returns a value for the same index in the new variable.

(R-45) The user can choose whether or not to be notified by an exception (or error code) when an operation generates an empty variable because of a lack of sufficient data from the input variables.
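The period compression of (R-40) can be sketched as below, including the integer-multiple rule (3 minutes to 30 minutes is accepted, 3 minutes to 5 minutes is rejected). The function name `compress_ohlcv`, the tuple layout, and the assumption that the bars are contiguous and start on a target-period boundary are all hypothetical simplifications; Open Interest is left out for brevity.

```python
def compress_ohlcv(bars, source_minutes, target_minutes):
    """Compress OHLCV bars to a longer period.

    bars: list of (open, high, low, close, volume) tuples in chronological
    order, assumed contiguous and starting on a target-period boundary.
    A trailing incomplete group is compressed as a partial bar.
    """
    if source_minutes <= 0 or target_minutes <= 0:
        raise ValueError("periods must be positive")
    if target_minutes % source_minutes != 0:
        raise ValueError(
            "target period must be an integer multiple of the source period"
        )
    group_size = target_minutes // source_minutes
    compressed = []
    for start in range(0, len(bars), group_size):
        group = bars[start:start + group_size]
        compressed.append((
            group[0][0],                    # open of the first bar
            max(bar[1] for bar in group),   # highest high
            min(bar[2] for bar in group),   # lowest low
            group[-1][3],                   # close of the last bar
            sum(bar[4] for bar in group),   # volume compressed by summation
        ))
    return compressed

# Made-up 3-minute bars compressed into 6-minute bars.
three_minute = [
    (10.0, 12.0, 9.0, 11.0, 100),
    (11.0, 13.0, 10.0, 12.0, 150),
    (12.0, 12.5, 11.0, 11.5, 80),
    (11.5, 14.0, 11.0, 13.0, 120),
]
six_minute = compress_ohlcv(three_minute, 3, 6)
```

Calling `compress_ohlcv(three_minute, 3, 5)` raises `ValueError`, mirroring the 3-minute to 5-minute case that the requirement excludes.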
(R-46) The time series should have two attributes reflecting the following characteristics:

Sampling Uniformity (Read Only): A time series is uniform whenever one of the following is true:

Sample Type (Read and Write): A continuous sample covers all possible measurements between two moments (the "Period").
Figure 5 - Sampling and Sample Characteristic

For financial market data, the time series will typically be a non-uniform sampling of continuous samples.
| # | Goal | |
|---|---|---|
| UC-1 | A user wants to implement a new technical analysis indicator defined as follows: | |
| UC-2 | The user needs to calculate the correlation between the typical price of the IBM and MSFT market data. The user also wishes to display on the console the timestamp, the typical price, and the closes of both stocks. Each line will represent one price bar. | |
| UC-3 | The user wishes to do technical analysis on a series of price bars built from plain old arrays. The timestamps are irrelevant to the user, but the data is still expected to be easily aligned with the input arrays. | |
| UC-4 | The user wants to use 10 years of December contracts for wheat and calculate the average daily close for every moment up to expiration. | |
| UC-5 | Implement Tushar S. Chande's VIDYA as published in Stock&Commodities V10:3. The corresponding EasyLanguage is: | |
| UC-6 | The user wants to iterate over daily data of IBM. The user logic requires for each day: | |
| UC-7 | This is an example of logic using multiple time frames. The user wishes to trigger a buy when a 1-minute moving average crossover occurs (periods 3 and 8 will be used). Furthermore, a buy can occur only if the Dow Jones Relative Strength Index over a period of 5 price bars is greater than 70. The time frame of the RSI is 3 minutes. The user's data source has only 1-minute price bars. The user will check for a new buy signal every minute during trading hours by calling a function. | |
| UC-8 | Implement the equivalent of Wealth-Lab synchronization logic explained here. | |
| UC-9 | The user wishes to: - Add Sell/Buy arrows represented by a "graphical object". - Add annotations to some values of variables. - Have a string to comment each variable. - Have a timestamp to remember when a time series was created. The user wants all this added data to be saved on local storage and restored later. | |
| UC-10 | The user wants to verify the hypothesis that the 10-minute simple moving average of the price at 11:30 AM is lower than the open. This is verified for a basket of securities that do not necessarily have the exact same trading hours. | |
| UC-11 | The user wants to measure the typical 5-day momentum for every trading day of the year. The user is curious to see whether perceptible differences are observable prior to some holidays. The user has up to 20 years of daily data for 8000 stocks. Not all stocks were traded for the whole 20-year period; still, their data is expected to be used. | |
| UC-12 | The user wishes to calculate the bid/ask spread and make a 10-minute moving average of it. If, on the last known data, the spread exceeds the average, a buy signal is generated. The user uses tick data. | |
| UC-13 | The user wishes to align 10 years' worth of S&P500 market data on the lunar cycle. | |
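To make one of the use cases concrete, UC-2 can be sketched in plain Python as shown below. Nothing here is the product's interface: the helper names are hypothetical, the bars are made-up numbers rather than real IBM or MSFT quotes, and the typical price is assumed to follow its conventional definition, the mean of high, low, and close. The two series are assumed to be already synchronized on the same timestamps.

```python
import math

def typical_price(high, low, close):
    """Conventional typical price: mean of high, low and close (assumed)."""
    return [(h + l + c) / 3 for h, l, c in zip(high, low, close)]

def correlation(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up, already synchronized daily bars (not real market data).
timestamps = ["2024-01-02", "2024-01-03", "2024-01-04"]
ibm = {"High": [162.0, 161.0, 163.0],
       "Low": [158.0, 159.0, 160.0],
       "Close": [160.0, 160.0, 162.0]}
msft = {"High": [376.0, 373.0, 371.0],
        "Low": [370.0, 368.0, 366.0],
        "Close": [372.0, 371.0, 367.0]}

ibm_typ = typical_price(ibm["High"], ibm["Low"], ibm["Close"])
msft_typ = typical_price(msft["High"], msft["Low"], msft["Close"])
r = correlation(ibm_typ, msft_typ)

# One console line per price bar: timestamp, typical prices, closes.
for ts, ti, tm, ci, cm in zip(timestamps, ibm_typ, msft_typ,
                              ibm["Close"], msft["Close"]):
    print(f"{ts} {ti:.2f} {tm:.2f} {ci:.2f} {cm:.2f}")
print(f"correlation: {r:.4f}")
```

In a time series library that meets the requirements above, the synchronization step and the per-index arithmetic would be handled by the library rather than by explicit loops.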
The following table cross-references the requirements, the use cases, and the interface of a commercial product (not yet released).
| # | Interface | Use Case | Release Version |
|---|---|---|---|
| R-1 | Variable [] operator. | ||
| R-2 | Variable [] operator to access floating point representation and Values object to access value object. | UC-4 | |
| R-3 | Variable [] operator returns double. Value properties: IsDefined, AsDouble and AsObject. | ||
| R-4 | <<TO BE DONE>> | ||
| R-5 | Timeseries.SyncMode and Timeseries.Sync static method. | ||
| R-6 | Variable Interpolation and FixValue properties. | ||
| R-7 | <<TO BE DONE>> | ||
| R-8 | Timeseries Delete | ||
| R-9 | The following functions show how different input/output combinations are handled: - Sma has one input and one output. - TypPrice has multiple inputs and one output. - Macd has one input and multiple outputs. - Aroon has multiple inputs and multiple outputs. | ||
| R-10 | Values are changed with the Variable method setValue. It replaces an existing value at the same index/timestamp and inserts/appends/prepends new values. The [ ] operator is the equivalent of setValue(index,double). Deletion is done with the Timeseries method delValue(). | ||
| R-11 | <<TO BE DONE>> | ||
| R-12 | <<TO BE DONE>> | ||
| R-13 | Arithmetic operators between variables. Also, implicit cast of a timeseries to its default variable. Operators between a variable and a floating-point value are also supported. | ||
| R-14 | Same interface as R-13 | ||
| R-15 | <<TO BE DONE>> | ||
| R-16 | Timeseries Clone() | ||
| R-17 | On .NET, Timeseries is a reference type. | ||
| R-18 | <<TO BE DONE>> | ||
| R-19 | <<TO BE DONE>> | ||
| R-20 | <<TO BE DONE>> | ||
| R-21 | For .NET, attribute Serializable is set for Timeseries and all its referred objects. | ||
| R-22 | No interface needed. | ||
| R-23 | No interface needed. | ||
| R-24 | Member functions of Timeseries object are either multivariate or univariate. Member functions of variables are always univariate. | ||
| R-25 | Timeseries Rename, Delete, SetVar and GetVar methods. The Timeseries [] operator is equivalent to SetVar and GetVar. | ||
| R-26 | No interface needed. | ||
| R-27 | No interface needed. | ||
| R-28 | Timeseries Default property allows referring directly to the default variable. | ||
| R-29 | Timeseries Select() method. | ||
| R-30 | Timeseries SelectInOrder() method. | ||
| R-31 | A Timeseries always has the Open, High, Low, Close and Volume variables defined. | ||
| R-32 | <<TO BE DONE>> | ||
| R-33 | No interface needed. | ||
| R-34 | <<TO BE DONE>> | ||
| R-35 | Variable Timestamps field. | ||
| R-36 | <<TO BE DONE>> | ||
| R-37 | <<TO BE DONE>> | ||
| R-38 | Timeseries Periodicity property. | ||
| R-39 | Variable CreateBar() and ConvertPeriod() methods. The Timeseries Periodicity property can be changed to modify all the variables at once. | ||
| R-40 | Timeseries ConvertBarPeriod() method. | ||
| R-41 | Timeseries DataCompression property. | ||
| R-42 | Same interface as R-21 | ||
| R-43 | Timeseries Clone() | ||
| R-44 | <<TO BE DONE>> | ||
| R-45 | Timeseries AllowEmptyVariables property. | ||