Technical Analysis Library - OO Design Documentation


Time Series Requirement (Version 1.1)


The intent of this document is to define a list of requirements to facilitate the creation, storage and transformation of time series.

This document proposes a set of requirements, some use cases and, in places, a hint of a desirable interface.

These requirements are actively used in the context of a commercial product (not yet released). This requirement document is donated to the community in case someone is interested in implementing an equivalent free/open-source version.

First, some terminology:

 Time series: Agglomeration of one or more variables that vary over time.
   Example: A technical analysis chart typically displays a time series composed of the daily values of the open, high, low, close and volume.
 Variable: A symbol that can hold different values at different times.
   Example: The close price is a variable.
 Timestamps: Define the moments in time.
 Index: An offset within a time series. The index 0 represents the first moment in time, the index 1 is the next moment and so on.
 Value: A variable at a given time.
   Example: $1.12 is the value for the open variable on January 3rd, 2005.
 Time Compression: Agglomeration of multiple values into one, e.g. weekly to yearly.
 Time Expansion: Transforming one value into multiple values equally distant in time, e.g. yearly to weekly.
 Data Compression: Lossless encoding/decoding of data/objects for the purpose of minimizing their size for storage and/or network transmission.
 Interpolation: Estimation of new values BETWEEN existing values.
 Extrapolation: Extending a time series into the future (or past) using existing values. Say you have the values A and B: you can find the slope and calculate the NEXT value C. To avoid artificially creating future and past data, TA-Lib objects never perform extrapolations.


Requirements

(R-1) At a given index in a time series all values for all variables are for the same moment in time.

(R-2) The range of the index can be either the intersection or the union of all timestamps of all variables.

When doing intersection (Padding=Disable), the user can assume that all variables have a defined value for each index.

When doing union (Padding=Enable), a default filler is needed when there is no defined value. This filler is called the "pad value".
 

Figure 1 - Padding Attribute

In this picture, the green background represents the range of the index. Each black block represents the portion of the variable for which values have been defined by the user or generated by a function. The red blocks represent the portion of the index for which the variable value will be a "pad value". Notice that pad values are found either at the beginning or at the end of the variable, never in the middle. When two variables overlap, they are always synchronized. See (R-5).

For a time series of double, the "pad value" is defined as NaN (Not A Number). Pad values are never used by built-in functions acting on the variable, e.g. mathematical operations, statistics and technical analysis functions.

(R-3) For each index, there must be a boolean (true/false) to verify if a given value is a "pad value" or not.
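
As an illustration of (R-2) and (R-3), here is a minimal Python sketch. It assumes a variable is stored as a plain dictionary mapping timestamps to values; the names `build_index`, `value_at` and `is_pad` are hypothetical and are not part of any TA-Lib interface.

```python
import math

def build_index(variables, padding):
    """Return the sorted index range of a set of variables.

    variables: dict of name -> {timestamp: value} (an assumed layout).
    padding=True gives the union of all timestamps, False the intersection.
    """
    stamp_sets = [set(v) for v in variables.values()]
    if not stamp_sets:
        return []
    if padding:
        stamps = set().union(*stamp_sets)
    else:
        stamps = set.intersection(*stamp_sets)
    return sorted(stamps)

def value_at(variable, stamp):
    """Return the defined value, or NaN (the pad value) when there is none."""
    return variable.get(stamp, math.nan)

def is_pad(variable, stamp):
    """Boolean test in the spirit of (R-3): True when the value is a pad value."""
    return stamp not in variable

close = {1: 10.0, 2: 10.5, 3: 11.0}
sma = {2: 10.25, 3: 10.75}          # starts later, like a moving average output
variables = {"Close": close, "Sma": sma}
union_index = build_index(variables, padding=True)
inter_index = build_index(variables, padding=False)
```

With padding enabled, `sma` reports a pad value at timestamp 1; with padding disabled, timestamp 1 is simply not part of the index.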

(R-4) An interface could be added to allow time series of a user defined type. The user would be required to provide methods for interpolation, time compression, time expansion, creation, copy, deletion and serialization of the user defined type.

(R-5) A key requirement is the ability to process and compare time series that are perfectly synchronized. Two types of synchronization are supported:

Time series Synchronization
The user can use an operator to synchronize at once all the variables of a given time series with another time series.

Example:
   Timeseries a, b;
   ...
   a.SyncTo( b, ActionOnMissing.Interpolate, ActionOnExtra.Delete );
   b.SyncTo( a, ActionOnMissing.Exception, ActionOnExtra.Ignore );

Time series 'a' is modified first: missing values in 'a' are interpolated and extra values in 'a' are deleted. Time series 'b' is modified second: a missing value in 'b' would cause an exception, and extra values are ignored (no action).

List of ActionOnMissing: Ignore, Interpolate (default), Exception
List of ActionOnExtra: Ignore (default), Delete, Exception
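
The behavior of the SyncTo call above can be sketched in Python as follows. This is only an illustration under stated assumptions: a time series is reduced to a single dictionary of timestamp to value, the function name `sync_to` and the enum spellings are hypothetical, and interpolation copies the nearest older value (one reading of the 'oldest neighbor' default).

```python
from enum import Enum

class ActionOnMissing(Enum):
    IGNORE = 1
    INTERPOLATE = 2
    EXCEPTION = 3

class ActionOnExtra(Enum):
    IGNORE = 1
    DELETE = 2
    EXCEPTION = 3

def sync_to(series, reference,
            on_missing=ActionOnMissing.INTERPOLATE,
            on_extra=ActionOnExtra.IGNORE):
    """Synchronize `series` in place to the timestamps of `reference`.

    Only the common range of timestamps is considered. All checks are done
    before any change, so `series` is untouched when an exception is raised.
    """
    if not series or not reference:
        return
    lo = max(min(series), min(reference))
    hi = min(max(series), max(reference))
    missing = sorted(t for t in reference if lo <= t <= hi and t not in series)
    extra = sorted(t for t in series if lo <= t <= hi and t not in reference)
    if missing and on_missing is ActionOnMissing.EXCEPTION:
        raise ValueError(f"missing timestamps: {missing}")
    if extra and on_extra is ActionOnExtra.EXCEPTION:
        raise ValueError(f"extra timestamps: {extra}")
    # Interpolation is performed before deletion.
    if on_missing is ActionOnMissing.INTERPOLATE:
        known = sorted(series)
        for t in missing:
            older = max(k for k in known if k < t)  # nearest older timestamp
            series[t] = series[older]
    if on_extra is ActionOnExtra.DELETE:
        for t in extra:
            del series[t]

a = {1: 1.0, 2: 2.0, 4: 4.0, 5: 5.0}
b = {1: 10.0, 3: 30.0, 5: 50.0}
sync_to(a, b, ActionOnMissing.INTERPOLATE, ActionOnExtra.DELETE)
sync_to(b, a, ActionOnMissing.EXCEPTION, ActionOnExtra.IGNORE)
```

After the first call, 'a' holds exactly the timestamps of 'b', so the second call finds nothing missing and raises no exception.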

Auto Synchronization
New variables can be added to a time series. The new variable might not have the same number of values as the existing variables. The new variable might come with a different set of timestamps. These discrepancies are either detected or resolved at the time the addition is made. The added variable is NOT modified. The existing variables in the time series are the ones being synchronized.

Example 1:
 Timeseries a, b, c;
 ...
 a["Var1"] = b.Sma(12);
 a["Var2"] = c.Sma(12);

Var1 will be synchronized to Var2 because Var2 is being added.

Example 2:
  a["Var2"].AddAt( "10/01/01", 112.2 );

When an element is added to Var2, all other variables in 'a' are synchronized to 'Var2'.

API example of how the auto synchronization behavior is defined:
  a.AutoSyncOnMissing = AutoSyncOnMissing.Interpolate;
  a.AutoSyncOnExtra = AutoSyncOnExtra.Delete;

List of AutoSyncOnMissing: Interpolate (default), Exception
List of AutoSyncOnExtra: Delete (default), Exception

All variables in a time series must have the same periodicity (when defined). An attempt to add a variable of a different periodicity will fail.

(R-6) The following rules apply to both synchronization types:

  1. Synchronization is performed only within the intersection of the timestamps of all variables (also called the "common range").

  2. "Exception" means a .NET/Java exception is thrown. All variables are restored to their previous valid state on exception.

  3. When "Delete" is used, the extra values are permanently lost.

  4. When "Ignore" is used, the resulting time series might not be synchronized. This is why auto synchronization does not allow the "Ignore" option: it must either complete or fail.

  5. No synchronization takes place when the Timebase is "None".

  6. When interpolation and deletion are both requested for the same variable, the interpolation operation is always performed first.

(R-7) The default interpolation method is 'oldest neighbor'. Alternative methods are: the more recent neighbor value, a fixed value, linear interpolation, or the average of the two neighbors.
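
The interpolation methods listed in (R-7) can be sketched as a single Python function. The function name, the method strings and the argument layout are assumptions made for illustration; 'oldest neighbor' is read here as the existing value just before the missing timestamp. Passing a callable as `method` hints at how user defined logic could be plugged in.

```python
def interpolate(t, t_old, v_old, t_new, v_new, method="oldest", fixed=0.0):
    """Estimate the value at time t, with t_old < t < t_new.

    (t_old, v_old) is the older neighbor, (t_new, v_new) the more recent one.
    """
    if callable(method):                 # user defined interpolation logic
        return method(t, t_old, v_old, t_new, v_new)
    if method == "oldest":
        return v_old
    if method == "recent":
        return v_new
    if method == "fixed":
        return fixed
    if method == "average":
        return (v_old + v_new) / 2.0
    if method == "linear":
        fraction = (t - t_old) / (t_new - t_old)
        return v_old + fraction * (v_new - v_old)
    raise ValueError(f"unknown interpolation method: {method}")
```

For example, between the values 10.0 at time 1 and 20.0 at time 5, the linear estimate at time 2 is 12.5 while the average is 15.0.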

(R-8) Interpolation logic can be user defined.

(R-9) A variable can be removed from a time series.

(R-10) It shall be possible to:

  1. Modify the value of a variable at a given index/timestamp.

  2. Insert one or many values before/after existing ones. Insertion can trigger an interpolation in other variables or an error (depending on the synchronization settings).

  3. Delete one or many values at a given index/timestamp. Deletion can trigger an interpolation in other variables or an error (depending on the synchronization settings).

(R-11) It shall be possible to define time axis that covers a "natural period", such as one year, one month etc.

This makes it possible to, say, align multiple variables that each span a year even if they were not sampled in the SAME year.

It shall also be possible to have a time axis that provides only ordering, not a real time reference. The first moment is "0", the next moment is "1" and so on...

The "Timebase" attribute controls the type of timestamps and the "natural period":

 Timebase    Fields                               Minimum Possible Timestamp    Maximum Possible Timestamp
 Infinite    month/day/year hour:min:seconds.ms   01/01/0001 00:00:00.000 *     12/31/9999 23:59:59.999
 OneYear     month/day hour:min:sec.ms            01/01 00:00:00.000            12/31 23:59:59.999
 OneMonth    day hour:min:seconds.ms              01 00:00:00.000               31 23:59:59.999
 OneDay      hour:min:seconds.ms                  00:00:00.000                  23:59:59.999
 OneHour     min:seconds.ms                       00:00.000                     59:59.999
 OneMinute   seconds.ms                           00.000                        59.999
 OneSecond   ms                                   000                           999
 None        (unsigned integer)                   0                             2^32-1

*SQL lower limit is 1/1/1753. Operations involving SQL should fail gracefully with a "date overflow" exception.

"Infinite" means all fields of a timestamp are used for synchronization and there is no boundary (besides the platform limits). All other settings use either a subset of the timestamp fields or none at all.

As an example, a OneMinute Timebase will use only the second and millisecond fields. The data will then be coerced into the boundary of a minute.
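
One possible way to picture this coercion is to reduce each timestamp to the tuple of fields kept by the Timebase. The sketch below is an assumption, not the product's implementation: the dictionary `TIMEBASE_FIELDS` and the function `coerce` are hypothetical names, and the "None" Timebase is left out because it carries no time fields.

```python
from datetime import datetime

# Fields kept by each Timebase, most significant first.
TIMEBASE_FIELDS = {
    "Infinite": ("year", "month", "day", "hour", "minute", "second", "ms"),
    "OneYear": ("month", "day", "hour", "minute", "second", "ms"),
    "OneMonth": ("day", "hour", "minute", "second", "ms"),
    "OneDay": ("hour", "minute", "second", "ms"),
    "OneHour": ("minute", "second", "ms"),
    "OneMinute": ("second", "ms"),
    "OneSecond": ("ms",),
}

def coerce(stamp, timebase):
    """Reduce a datetime to the tuple of fields used for synchronization."""
    def field(name):
        if name == "ms":
            return stamp.microsecond // 1000  # datetime stores microseconds
        return getattr(stamp, name)
    return tuple(field(name) for name in TIMEBASE_FIELDS[timebase])

a = datetime(2004, 3, 1, 9, 30, 15, 250000)
b = datetime(2005, 3, 1, 9, 30, 15, 250000)
same_moment_of_year = coerce(a, "OneYear") == coerce(b, "OneYear")
```

Two samples taken on March 1st of different years compare as equal under OneYear but stay distinct under Infinite.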

When the Timebase is "None", new variables are aligned to the beginning or end of an existing variable (aligned left or right, for visual people). Adding a new variable with an offset will also be possible. The date/time information is lost and the time series is then considered uniform. No synchronization will ever take place, but alignment of the output of built-in functions is still done.

(R-12) When creating variables from user provided data, the range of the timestamps must not conflict with the Timebase attribute. Example: If the time series has a period of 1 day, only variables built with intraday data can be added. An error/exception shall be reported when there is an attempt to add data overlapping two days.

(R-13) It should be possible to perform mathematical operations between time series variables. The operation is performed individually for each index. Example: a = b + c where a, b and c are time series variables. Each value of 'a' is the sum of the corresponding values of 'b' and 'c' at the same moment in time.

(R-14) Mathematical operations between variables of distinct time series shall be allowed as long as they share a common range of timestamps. A missing timestamp/value shall cause an error.
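
A minimal sketch of (R-13) and (R-14), assuming each variable is a dictionary of timestamp to value. The function name `add` and the use of `ValueError` are illustrative choices only.

```python
def add(b, c):
    """Return b + c computed index by index over the common range.

    Raises ValueError when there is no common range or when a timestamp
    inside the common range is missing from one of the variables.
    """
    if not b or not c:
        raise ValueError("empty variable")
    lo = max(min(b), min(c))
    hi = min(max(b), max(c))
    if lo > hi:
        raise ValueError("no common range of timestamps")
    b_stamps = [t for t in sorted(b) if lo <= t <= hi]
    c_stamps = [t for t in sorted(c) if lo <= t <= hi]
    if b_stamps != c_stamps:
        raise ValueError("missing timestamps in the common range")
    return {t: b[t] + c[t] for t in b_stamps}

b = {1: 1.0, 2: 2.0, 3: 3.0}
c = {2: 10.0, 3: 20.0, 4: 30.0}
a = add(b, c)
```

Here the common range is timestamps 2 to 3, so 'a' has two values; a hole at timestamp 2 in either input would raise an error instead.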

(R-15) The design should provide a means to create new variables by iterating through the variables of a time series. This iterator is intended to facilitate (among other things) the creation of new user defined technical analysis functions.

(R-16) User shall be able to create a copy of a time series and all its variables.

(R-17) Because time series can become quite large objects, it is expected that they can be manipulated by reference.

(R-18) User shall be able to build a variable with simply an array of floating point values.

(R-19) Timestamps for a time series can be defined with either an array of DateTime values (type is platform dependent) or by specifying a start timestamp and the period between subsequent indexes.
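
The second option of (R-19), combined with the plain array of (R-18), could look like the following sketch. The function names are hypothetical.

```python
from datetime import datetime, timedelta

def make_timestamps(start, period, count):
    """Generate `count` timestamps: start, start + period, start + 2*period, ..."""
    return [start + i * period for i in range(count)]

def make_variable(values, start, period):
    """Pair a plain array of floating point values with generated timestamps."""
    stamps = make_timestamps(start, period, len(values))
    return list(zip(stamps, values))

closes = [1.12, 1.15, 1.11]
variable = make_variable(closes, datetime(2005, 1, 3, 9, 0), timedelta(minutes=5))
```

The three values are stamped 9:00, 9:05 and 9:10 on January 3rd, 2005.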

(R-20) User should be able to build time series from comma separated value ASCII files and SQL database.

(R-21) Time series should be serializable to and from an XML format to facilitate network transmission or save/load from storage.

(R-22) On iteration, the values are always delivered in chronological order.

(R-23) Operations between time series with different Timebase or periodicity should fail.

(R-24) Support for Univariate and Multivariate time series functions.

Univariate time series consists of single observations recorded sequentially.
Multivariate time series consists of a vector of observations recorded sequentially.

The time series object supports both concepts by allowing containment of zero, one or more variables that are always kept aligned.

Univariate functions can be applied to any single variable. Because time series have the notion of selecting a single "default variable", univariate operations can also be applied to a time series object regardless of the number of variables it contains.

Multivariate functions can be applied only to time series. They can never be applied to a single variable object.

(R-25) Each variable in a time series should be identified with a unique name. The user should be able to rename variables.

(R-26) A valid variable name is a sequence of one or more printable ASCII characters. At least one character must be something other than a space. Names are not case sensitive: "Ab" is the same variable as "aB". The case specified by the user shall be preserved. The allowed characters are the printable ASCII characters, from space (code 32) through ~ (code 126):


(Source Wikipedia)

(R-27) Variable names are sorted in ASCII order, with the exceptions that uppercase letters are considered equivalent to lowercase letters and leading spaces are ignored.
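
The naming and sorting rules of (R-26) and (R-27) can be sketched as below. Folding names to lowercase (rather than uppercase) before comparing is an assumption, since the requirement does not say which case is used for ordering; the function names are hypothetical.

```python
def is_valid_name(name):
    """Printable ASCII only (codes 32 to 126), with at least one non-space."""
    if not name or name.strip(" ") == "":
        return False
    return all(32 <= ord(ch) <= 126 for ch in name)

def sort_key(name):
    """Key for sorting names: leading spaces ignored, case folded to lower."""
    return name.lstrip(" ").lower()

names = ["close", " Open", "High", "low"]
ordered = sorted(names, key=sort_key)
```

The user specified spelling is preserved in `ordered`; only the comparison ignores case and leading spaces.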

(R-28) Among the variables of a time series, the defaults are selected by the sorting algorithm specified by (R-27).

(R-29) User should be able to express concisely the creation of a new time series by enumerating a subset of variables of an existing time series.

(R-30) User should be able to express concisely the creation of a new time series built with variables in a specific order. The variable names in the new time series are unspecified. The user can only assume that the variable names are in the proper sorted order.

(R-31) Variable names "Open", "High", "Low", "Close", "Volume" are expected to express historical market data. Upon addition of these variables, the time series shall report an error/exception if any of the following rules is broken:

  1. Volume >= 0.0

  2. High >= Low

  3. High >= Open >= Low

  4. High >= Close >= Low
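
The four rules of (R-31) translate directly into a validation routine. The sketch below checks a single price bar; the function name `check_bar` and the use of `ValueError` are assumptions.

```python
def check_bar(open_, high, low, close, volume):
    """Raise ValueError if the bar breaks one of the four rules of (R-31)."""
    if not volume >= 0.0:
        raise ValueError("Volume must be >= 0.0")
    if not high >= low:
        raise ValueError("High must be >= Low")
    if not high >= open_ >= low:
        raise ValueError("Open must be between Low and High")
    if not high >= close >= low:
        raise ValueError("Close must be between Low and High")

check_bar(1.12, 1.20, 1.10, 1.15, 1000.0)   # a valid bar, no exception
```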

(R-32) User should be able to iterate over the list of variable names of a time series.

(R-33) A value representing a period (like a price bar) should have a timestamp that represents the start of the period. Examples: the first 5 minutes of trading starting at 9AM will have a timestamp of 9:00AM. The second value will be 9:05AM. Yearly data will have a timestamp of January 1st, 00:00:00.

(R-34) User should be able to access a value by using a timestamp (instead of an index). A method should allow retrieving the value at an exact timestamp, at the oldest neighbor or at the more recent neighbor. A Boolean should allow the user to verify first whether the value exists.
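
A lookup of this kind can be built on a binary search over sorted timestamps. The sketch below uses Python's `bisect` module; the class and method names are hypothetical, and it assumes that a neighbor lookup returns the exact value when the timestamp exists.

```python
from bisect import bisect_left, bisect_right

class Variable:
    def __init__(self, stamps, values):
        self.stamps = list(stamps)   # must be sorted in chronological order
        self.values = list(values)

    def exists(self, stamp):
        """True when a value is defined at exactly this timestamp."""
        i = bisect_left(self.stamps, stamp)
        return i < len(self.stamps) and self.stamps[i] == stamp

    def at(self, stamp, mode="exact"):
        """Return the value at `stamp` ('exact'), or at its 'older'
        or 'newer' neighbor when there is no exact match."""
        if mode == "exact":
            if not self.exists(stamp):
                raise KeyError(stamp)
            return self.values[bisect_left(self.stamps, stamp)]
        if mode == "older":
            i = bisect_right(self.stamps, stamp) - 1   # last stamp <= requested
            if i < 0:
                raise KeyError(stamp)
            return self.values[i]
        if mode == "newer":
            i = bisect_left(self.stamps, stamp)        # first stamp >= requested
            if i == len(self.stamps):
                raise KeyError(stamp)
            return self.values[i]
        raise ValueError(f"unknown mode: {mode}")

v = Variable([10, 20, 30], [1.0, 2.0, 3.0])
```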

(R-35) When the Timebase is changed such that some fields are lost, it is not expected that reverting will be possible, e.g. a change from OneYear to OneHour cannot be reverted, but a further change to OneSecond is allowed.

That means that once the Timebase is changed to "None", all the time information is lost and it is no longer possible to change the Timebase parameter.

(R-36) The user shall be able to attach zero or more user defined objects to a value of a variable, a variable, a specific timestamp or a time series. The object(s) shall be accessible by name. Do not confuse these user defined objects with the user defined type for values (also shown below).

Figure 2 - Attached object versus user defined type of value

(R-37) The user shall be able to create a new variable where each value is the result of an operation performed on all values aligned on the same index. The expected operations are Min, Max, Average and Summation.


Figure 3 - Aligned Operation
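
A minimal sketch of the aligned operation, assuming the variables are already synchronized and stored as equal-length lists. The function name `aligned` and the operation strings are hypothetical.

```python
def aligned(variables, operation):
    """Apply an operation across all values that share the same index.

    variables: list of equal-length lists, one per variable.
    operation: "min", "max", "sum" or "average".
    """
    operations = {
        "min": min,
        "max": max,
        "sum": sum,
        "average": lambda column: sum(column) / len(column),
    }
    op = operations[operation]
    return [op(column) for column in zip(*variables)]

var1 = [1.0, 2.0, 3.0]
var2 = [3.0, 2.0, 1.0]
highest = aligned([var1, var2], "max")
```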

(R-38) Time series shall provide a way to specify the periodicity of the timestamps. The periodicity indicates the amount of time that is covered by each value. Remember that time series are not necessarily continuous, so the periodicity can't be derived from a delta of the timestamps. The periodicity is an optional attribute; when it is not specified, none of the period compression and expansion operations can be performed. The following periodicities must be supported:

Seconds: 1,2,3,4,5,6,10,12,15,20,30
Minutes: 1,2,3,4,5,6,10,12,15,20,30
Hours: 1,2,3,4,6,8,12
Daily, Weekly, Quarterly and Yearly

Figure 4 - Supported Periodicity

(R-39) Method should be provided to create a new time series with Open, High, Low, Close variables from one existing variable. The Open, High, Low, Close should represent the compression or expansion of the variable at the user specified period. Compression by summation should also be supported e.g. compression for Volume. The existing variables must be part of a time series with a specified periodicity (see R-38).

(R-40) A method should be provided to create a new time series with Open, High, Low, Close, Volume, Open Interest variables from an existing time series with the same variables. The variables will be compressed or expanded to any period which is an integer multiple of the existing period. Example: 3-minute OHLCV can be compressed to 30-minute OHLCV, but not to 5-minute OHLCV. The existing variables must be part of a time series with a specified periodicity (see R-38).
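
Compression of price bars can be sketched as follows. The sketch assumes the bars are contiguous (no gaps) and groups them by position, which a real implementation working on timestamps would not do; the function names are hypothetical and Open Interest is left out.

```python
def can_convert(source_period, target_period):
    """True when the target period is an integer multiple of the source."""
    return target_period >= source_period and target_period % source_period == 0

def compress(bars, factor):
    """Merge every `factor` consecutive (open, high, low, close, volume) bars.

    Open is the first open, High the maximum, Low the minimum, Close the
    last close and Volume the summation. A shorter final group is kept.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    merged = []
    for i in range(0, len(bars), factor):
        group = bars[i:i + factor]
        merged.append((
            group[0][0],
            max(bar[1] for bar in group),
            min(bar[2] for bar in group),
            group[-1][3],
            sum(bar[4] for bar in group),
        ))
    return merged

one_minute = [
    (1.0, 2.0, 0.5, 1.5, 100.0),
    (1.5, 3.0, 1.0, 2.5, 200.0),
    (2.5, 2.6, 2.0, 2.2, 50.0),
]
three_minute = compress(one_minute, 3)
```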

(R-41) User shall be able to do in-memory data compression of a time series. The user shall control the compress/uncompress state. It is not expected that the time series data is accessible while compressed.

(R-42) The time series shall be serializable to/from storage while compressed. It is desirable, but not mandatory that the compressed binary stream is portable.

(R-43) User shall be able to create a compressed/uncompressed copy of a time series.

(R-44) The user can build a new variable by defining a function that will be called for each index. The parameter of the function is an array of all the corresponding aligned values. The function returns a value for the same index in the new variable.

(R-45) The user can choose whether or not to be notified by an exception (or error code) when an operation generates an empty variable because of a lack of sufficient data in the input variables.

(R-46) The time series should have two attributes reflecting the following characteristics:

Sampling Uniformity (Read Only)

A time series is uniform whenever one of the following is true:
   - timestamps are at equal intervals
   - there are only one or two timestamps.
   - the Timebase is "None".

Examples of uniform time series:
    - Sampling spanning over 8 hours with a value for every minute.
    - Daily measurement of temperature for a complete year.

Examples of non-uniform time series:
    - Daily price data, but only for weekdays.
    - A time series of monthly data with one month missing.
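
The uniformity test described above can be sketched as a comparison of the deltas between consecutive timestamps. The function name is hypothetical, and treating an empty series as uniform is an assumption.

```python
from datetime import date

def is_uniform(stamps, timebase="Infinite"):
    """True when the sorted timestamps are at equal intervals, when there
    are at most two of them, or when the Timebase is "None"."""
    if timebase == "None" or len(stamps) <= 2:
        return True
    deltas = {later - earlier for earlier, later in zip(stamps, stamps[1:])}
    return len(deltas) == 1

every_day = [date(2005, 1, d) for d in (3, 4, 5, 6, 7)]
weekdays_only = [date(2005, 1, 6), date(2005, 1, 7), date(2005, 1, 10)]  # skips a weekend
```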

Sample Type (Read and Write)

A continuous sample covers all possible measurements between two moments (the "Period").
Example: the timestamp is the beginning of the price bars that embed all transactions up to the beginning of the next price bar.

A discrete sample is a single observation taken at a single moment.
Example: the timestamp is the moment that a measurement of the room temperature was sampled.

Figure 5 - Sampling and Sample Characteristic

For financial market data, the time series will typically be a non-uniform sampling of continuous samples.


Use Cases

#  Goal
UC-1 A user wants to implement a new technical analysis indicator defined as follows:
  For each price bar
    if( today open > yesterday close )
       newIndicator = (today open + yesterday close)/2;
    else
       newIndicator = yesterday newIndicator;
The output of the new indicator is expected to start only at the first occurrence of today's open being greater than yesterday's close.
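
For reference, the UC-1 logic can be written in Python over plain lists. This is only a sketch of the expected result, not of the time series interface: NaN is used as the pad value before the first output, and the function name is hypothetical.

```python
import math

def new_indicator(opens, closes):
    """Compute the UC-1 indicator; NaN until the first bar whose open is
    greater than the previous close."""
    output = [math.nan] * len(opens)
    previous = math.nan
    for i in range(1, len(opens)):
        if opens[i] > closes[i - 1]:
            previous = (opens[i] + closes[i - 1]) / 2.0
        output[i] = previous          # otherwise carry yesterday's value
    return output

opens = [10.0, 9.0, 12.0, 11.0]
closes = [10.0, 11.0, 11.5, 12.0]
result = new_indicator(opens, closes)
```

In this example the first output appears at index 2, where the open (12.0) exceeds the previous close (11.0).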
UC-2

User needs to calculate the correlation between the typical prices of the IBM and MSFT market data. The user also wishes to display on the console the timestamp, the typical price and the close of both stocks. Each line will represent one price bar.

UC-3

The user wishes to do technical analysis on a series of price bars built from plain old arrays. The timestamps are irrelevant to the user, but the output data is still expected to be easily aligned with the input arrays.

UC-4

User wants to use 10 years of December contract for wheat and calculate the average daily close for every moment up to expiration.

UC-5 Implement Tushar S. Chande's VIDYA as published in Stocks & Commodities V10:3. The corresponding EasyLanguage is:
User Function
Inputs:Length(NumericSimple), Smooth(NumericSimple);
Vars: Up(0), Dn(0), UpSum(0), DnSum(0), AbsCMO(0), SC(0);
Up=IFF(Close>Close[1], Close-Close[1],0);
Dn=IFF(Close<Close[1],AbsValue(Close-Close[1]),0);
UpSum=Summation(Up,Length);
DnSum=Summation(Dn,Length);
If UpSum+DnSum >0 then
AbsCMO=AbsValue((UpSum-DnSum)/(UpSum+DnSum));
SC= 2/(Smooth+1);
If Currentbar=Length then VIDYA=close;
If Currentbar>Length then
VIDYA=(SC*AbsCMO*Close)+((1-(SC*AbsCMO))*VIDYA[1]);

Indicator
Inputs: Length(9), Smooth(12),PCT(1);
Value1=VIDYA(Length,Smooth);
If Value1>0 then begin
    Plot1(Value1,"VIDYA");
    Plot2(Value1*((100+PCT)/100),"UpBand");
    Plot3(Value1*((100-PCT)/100),"DnBand");
End;
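
As a reading aid, the EasyLanguage user function above can be transcribed to Python over a plain list of closes. This is a hedged sketch, not a reference implementation: the first bar is given an up/down move of zero because it has no previous close, bars before `length` are left as NaN, and the plotting part (the bands) is omitted.

```python
import math

def vidya(closes, length, smooth):
    """Transcription of the VIDYA user function for a list of closes."""
    sc = 2.0 / (smooth + 1)
    ups, dns = [0.0], [0.0]               # no previous close on the first bar
    for prev, cur in zip(closes, closes[1:]):
        ups.append(max(cur - prev, 0.0))
        dns.append(max(prev - cur, 0.0))
    output = [math.nan] * len(closes)
    abs_cmo = 0.0
    for i, close in enumerate(closes):
        start = max(0, i - length + 1)
        up_sum = sum(ups[start:i + 1])    # Summation(Up, Length)
        dn_sum = sum(dns[start:i + 1])    # Summation(Dn, Length)
        if up_sum + dn_sum > 0:
            abs_cmo = abs((up_sum - dn_sum) / (up_sum + dn_sum))
        current_bar = i + 1               # EasyLanguage bars are 1-based
        if current_bar == length:
            output[i] = close
        elif current_bar > length:
            k = sc * abs_cmo
            output[i] = k * close + (1.0 - k) * output[i - 1]
    return output

rising = vidya([1.0, 2.0, 3.0, 4.0, 5.0], length=2, smooth=1)
```

With a steadily rising series the absolute CMO is 1, and with `smooth=1` the smoothing constant is also 1, so the output simply tracks the close from the seed bar onward.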
UC-6

User wants to iterate over the daily data of IBM. The user logic requires for each day:
   - the last earnings value.
   - the date of the next earnings.
   - the corresponding 6-week exponential moving average of the close.

UC-7 This is an example of logic using multiple time frames.

User wishes to trigger a buy when a 1-minute moving average crossover occurs (periods 3 and 8 will be used). Furthermore, a buy can occur only if the Dow Jones Relative Strength Index over a period of 5 price bars is greater than 70. The time frame of the RSI is 3 minutes. The user's data source has only 1-minute price bars. The user will check for a new buy signal every minute during trading hours by calling a function.
UC-8 Implement the equivalent of Wealth-Lab synchronization logic explained here.
UC-9 User wishes to:
  - Add Sell/Buy arrows represented by "graphical object".
  - Add annotation to some values of variables.
  - Have a string to comment each variable.
  - Have a time stamp to remember when a time series was created.
The user wants all these added data to be saved on local storage and be restored later.
UC-10 User wants to verify the hypothesis that the 10-minute simple moving average of the price at 11:30AM is lower than the open. This is verified for a basket of securities that do not necessarily have the exact same trading hours.
UC-11 User wants to measure the typical 5-day momentum for every trading day of the year. The user is curious to see if perceptible differences are observable prior to some holidays. The user has up to 20 years of daily data for 8000 stocks. Not all stocks were traded for the whole 20-year period; still, their data is expected to be used.
UC-12 User wishes to calculate the spread between bid and ask and make a 10-minute moving average of it. If, on the last known data, the spread exceeds the average, a buy signal is generated. The user uses tick data.
UC-13 User wishes to align 10 years' worth of S&P500 market data on the lunar cycle.

Requirement Traceability

The following table helps to cross-reference the requirements, the use cases and the interface of a commercial product (not yet released).

#     Interface                                                                Use Case  Release Version
R-1   Variable [] operator.
R-2   Variable [] operator to access floating point representation and         UC-4
      Values object to access value object.
R-3   Variable [] operator returns a double.
      Value Property: IsDefined, AsDouble and AsObject
R-4   <<TO BE DONE>>
R-5   Timeseries.SyncMode and Timeseries.Sync static method.
R-6   Variable Interpolation and FixValue properties.
R-7   <<TO BE DONE>>
R-8   Timeseries Delete
R-9   See the following functions for a variety of ways functions with
      different input/output are handled:
      - Sma has one input and one output.
      - TypPrice has multiple inputs and one output.
      - Macd has one input and multiple outputs.
      - Aroon has multiple inputs and multiple outputs.
R-10  Change of value is done with the Variable method setValue. This will
      replace an existing value at the same index/timestamp and
      insert/append/prepend new values.
      The [ ] operator will be the equivalent of setValue(index,double).
      Deletion is done with the Timeseries method delValue().
R-11  <<TO BE DONE>>
R-12  <<TO BE DONE>>
R-13  Arithmetic operators between variables. Also, implicit cast of a
      timeseries to its default variable. Operators between a variable and
      a floating point value are also supported.
R-14  Same interface as R-13
R-15  <<TO BE DONE>>
R-16  Timeseries Clone()
R-17  On .NET, Timeseries is a reference type.
R-18  <<TO BE DONE>>
R-19  <<TO BE DONE>>
R-20  <<TO BE DONE>>
R-21  For .NET, attribute Serializable is set for Timeseries and all its
      referred objects.
R-22  No interface needed.
R-23  No interface needed.
R-24  Member functions of the Timeseries object are either multivariate or
      univariate.
      Member functions of variables are always univariate.
R-25  Timeseries Rename, Delete, SetVar and GetVar methods.
      Timeseries [] operator is equivalent to SetVar and GetVar.
R-26  No interface needed.
R-27  No interface needed.
R-28  Timeseries Default property allows direct reference to the default
      variable.
R-29  Timeseries Select() method.
R-30  Timeseries SelectInOrder() method.
R-31  Timeseries have Open, High, Low, Close, Volume variables always
      defined.
R-32  <<TO BE DONE>>
R-33  No interface needed.
R-34  <<TO BE DONE>>
R-35  Variable Timestamps field.
R-36  <<TO BE DONE>>
R-37  <<TO BE DONE>>
R-38  Timeseries Periodicity property.
R-39  Variable CreateBar() and ConvertPeriod() methods.
      Timeseries Periodicity property can be changed to modify all the
      variables at once.
R-40  Timeseries ConvertBarPeriod() method.
R-41  Timeseries DataCompression property.
R-42  Same interface as R-21
R-43  Timeseries Clone()
R-44  <<TO BE DONE>>
R-45  Timeseries AllowEmptyVariables property.
    

Copyright© 2007 TicTacTec LLC. All Rights Reserved. Last Update: 04/15/08