VTL has a string concatenation operator, ||, while STATA, like many programming languages, uses +. This makes it difficult to translate from STATA to VTL, because the translator has no way of knowing when a + should be translated into ||: it cannot tell the names of numerical variables from the names of alphanumerical variables.
generate newvar = oldvar1 + oldvar2
Should the expression be translated to oldvar1 + oldvar2 or oldvar1 || oldvar2?
The best solution would be to use + as the concatenation operator in VTL.
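To make the ambiguity concrete, here is a minimal Python sketch of what a translator would need in order to pick the right operator. The function translate_plus and the var_types lookup are hypothetical names made up for this illustration; the point is only that the decision cannot be made without type information that the STATA source does not carry.

```python
from typing import Dict


def translate_plus(left: str, right: str, var_types: Dict[str, str]) -> str:
    """Translate the STATA expression `left + right` into VTL.

    var_types is a hypothetical lookup from variable name to either
    "string" or "number". Without it the operator cannot be chosen.
    """
    left_type = var_types.get(left)
    right_type = var_types.get(right)
    if left_type is None or right_type is None:
        # This is the situation described above: the translator
        # sees only names and has no idea what they hold.
        raise ValueError("variable type unknown, cannot choose between + and ||")
    if left_type == "string" and right_type == "string":
        return f"{left} || {right}"
    if left_type == "number" and right_type == "number":
        return f"{left} + {right}"
    raise TypeError("mixed string and numeric operands")
```

With var_types = {"oldvar1": "string", "oldvar2": "string"} the call translate_plus("oldvar1", "oldvar2", var_types) gives "oldvar1 || oldvar2"; with an empty lookup it fails, which is the translator's actual position.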
"Boolean expression" in filter
The VTL spec (v1.1, line 5408) states that filters must be "boolean expressions". But what exactly is a boolean expression? In STATA, there is no real difference between a boolean and a numerical expression.
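As an illustration of the STATA side, the following Python sketch mimics how a numeric value behaves when used as a condition: anything nonzero counts as true, and a missing value is nonzero too. The function name stata_if is hypothetical, and using None for a STATA missing value is an assumption of this sketch.

```python
from typing import Optional


def stata_if(value: Optional[float]) -> bool:
    """Mimic how a STATA condition reads a numeric value."""
    if value is None:
        # None stands in for a STATA missing value (assumption of this
        # sketch). Missing is nonzero in STATA, so it counts as true.
        return True
    # Any nonzero number is true, zero is false.
    return value != 0
```

A translator would have to decide how such a numeric condition maps onto whatever VTL accepts as a "boolean expression".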
Variable wildcards and ranges
In STATA, some commands allow wildcards in variable names and/or ranges of variables. I can't find any support for this in VTL. The wildcards and ranges can't be expanded unless the translator knows about all variables, including the generated ones (and their order). Many egen functions, as well as reshape, rely on wildcards.
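A rough Python sketch of the expansion shows why the full, ordered variable list is needed. The function expand_varlist is a hypothetical helper, and it only covers the * and ? wildcards and the first-last range form; other STATA varlist features are left out.

```python
from fnmatch import fnmatchcase
from typing import List


def expand_varlist(pattern: str, variables: List[str]) -> List[str]:
    """Expand one STATA varlist token against the known variables.

    `variables` must hold every variable in dataset order, including
    generated ones; that is exactly what the translator is missing.
    """
    if "-" in pattern:
        # Range form: every variable from `first` to `last`, in dataset order.
        first, last = pattern.split("-", 1)
        start, end = variables.index(first), variables.index(last)
        if start > end:
            raise ValueError(f"{first} comes after {last} in the dataset")
        return variables[start:end + 1]
    # Wildcard form: * matches any run of characters, ? matches one.
    return [name for name in variables if fnmatchcase(name, pattern)]
```

For the ordered list ["id", "inc1", "inc2", "age", "inc10"], the token inc* expands to inc1, inc2 and inc10, while inc1-age expands to inc1, inc2 and age. Both results change if the variable order or the set of generated variables changes.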
STATA has support for macros; in loops they are even mandatory. Macros are tricky, because they can appear anywhere in the program and must be expanded by a preprocessor before the parser can deal with the program, just like in C/C++. The trickiest part is when macros are referenced within other macro references.
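A minimal Python sketch of such a preprocessor step, assuming STATA's local macro syntax (a backtick, the macro name, then an apostrophe), could expand the innermost references first and repeat until nothing changes. The names expand_macros, MACRO_REF and max_passes are hypothetical. The sketch rescans the whole text after each pass, which is a simplification and may not match STATA's own expansion rules in every case.

```python
import re
from typing import Dict

# Matches a reference such as `i' that has no nested reference inside it.
MACRO_REF = re.compile(r"`([A-Za-z0-9_]+)'")


def expand_macros(text: str, macros: Dict[str, str], max_passes: int = 50) -> str:
    """Expand local macro references, innermost first."""
    for _ in range(max_passes):
        # Undefined macros expand to the empty string, as in STATA.
        expanded = MACRO_REF.sub(lambda m: macros.get(m.group(1), ""), text)
        if expanded == text:
            return expanded  # nothing left to expand
        text = expanded
    raise RuntimeError("macro expansion did not terminate")
```

With macros = {"i": "2", "var2": "income"}, the text "generate x = `var`i''" first becomes "generate x = `var2'" and then "generate x = income". This only works when all macro values are known before parsing.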
Loops are difficult to expand because the number of iterations is not known in advance. If the exit condition depends on information from within the dataset, it cannot be evaluated.
STATA has 27 missing values: sysmiss (represented by the character .) and 26 user missing values (represented by .a through .z). They can sometimes be referred to collectively as missing. My understanding is that VTL has only one missing value: null.
For VTL to be compatible with STATA, this must be addressed. As I see it, there are two options, both with pros and cons:
- VTL could mimic STATA's fixed user missing values.
  - Pro: Easy to translate from STATA and SAS(?) to VTL
  - Con: Not a good way to deal with missing values in the first place.
- Allow any numerical value to be tagged as missing in variable metadata.
  - Pro: More flexible, compatible with other formats like SPSS
  - Con: A STATA value like .a cannot be translated to a numerical value unless there is a known, unused value available. This would require access to the actual data.
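The con of the second option can be sketched in Python as follows. The function assign_missing_codes and the starting value -9 are hypothetical choices for this illustration; what matters is that the function needs the observed values of the variable as input, which a pure source-to-source translator does not have.

```python
from typing import Dict, Iterable, List


def assign_missing_codes(
    stata_codes: List[str], observed: Iterable[float], start: int = -9
) -> Dict[str, int]:
    """Pick an unused numeric value for each STATA user missing code."""
    used = set(observed)  # requires access to the actual data
    mapping: Dict[str, int] = {}
    candidate = start
    for code in stata_codes:
        # Walk downwards until a value is found that the data does not use.
        while candidate in used:
            candidate -= 1
        mapping[code] = candidate
        used.add(candidate)
    return mapping
```

For the codes .a and .b and observed values 1, 5, -9 and 20, this picks -10 and -11, because -9 is already taken by real data.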