SQL transform is one of the key transform that we use in DI Jobs. In SAS DI Studio there is no transform for data step specifically for merging various data sets that the SAS developer are so used to. So for all the merge/join operations we use the standard SQL transform that is available in SAS DI.
SQL transform generates at the back end proc sql code. There are lot of advantages when we use the sql transform instead of the user written which should be avoided to the extent possible.
Some of the advantages that I can think right over my head are.
1) The metadata adaptation of the transform is good. So even in case of the any change in the underlying data set the SQL transform automatically adjusts.
2) There is option in SQL transform to generate code for explicit sql pass through.
3) Sub queries and the functionality of the case statement is good so we can maintain the code in a better fashion.
4) Specification of the types of joins and the filter conditions conditions is good.
5) And finally as with any visual tools the automatic code generation by the transform save times lot of precious coding and maintenance time for the developers.
As the great person has said that perfection of the tool only comes when the user knows all the limitations that it keeps on him.
With my experience I have faced the following limitations which lead me to use user written code even though I wanted to avoid it.
1) Problem with explicit sql pass through when DBMS column size is more than 32 chars.
As known column width in the data sets is limited in SAS to 32 chars. But the column width can be more than 32 chars in DBMS such as SQL Server, so when we want to do explicit sql pass through in Such a case, but interestingly the implicit sql pass through does not fail. Which means that the SQL has cognisance of the full name of the column but some how the transform developer never envisioned such a problem will come :( . Hope this will be fixed in future versions of SAS DI.
2) Problem with readbuff option.
SAS provides with various performance improvement options. One of such option is readbuff. The readbuff options drastically improves the I/O performance by increasing the buffer that is available by reading more than 1 (default) row into buffer.
By default if we dont set the readbuff option on the individual datasets SAS generates the code with does the implicit sql pass through of the join. But when I keep the readbuff option the SAS tries to pull both the data sets used in the join to the SAS layer using the readbuff option and then do the join at the SAS layer. There is no option for use to specify in the SQL transform to apply readbuff directly onto the SQL join output i.e., to apply readbuff option on the result set of the implicit sql pass through.
The readbuff option can also be applied at the library level but I have not tried it as I didnt have the rights to modified the library registration.
I think that irrespective of where the option applied on the library level helps or not it is my opinion that DI must have a place where I could have provided the option on the result set of implicit sql pass through.
3) Problem with Subquery in CASE statement.
SQL transform has limitation that it cannot have a subquery with in case statement, which is possible in the proc sql.
4) Whenever a new change happens to the dataflow of the DI jobs all the columns are erroneously propagated from the source to the target.
It is very irritating for the developer to correct the column mapping again and again whenever there is a change in the job. Though this problem is not specific to the SQL transform but I wanted to specify this because the tool is over intelligent to do what we are not saying also. This may result is erroneous data flow big time pain for the developers as they will need to check and correct the data mapping for each and every transform.
5) Vertical string concatenation is not possible in proc sql.
Suppose there is a data set
A
x y
1 a
1 b
1 c
2 d
2 e
3 a
data set B
x y_str
1 a,b,c
2 d,e
3 a
This operation is not possible in proc sql, so we have to write user written code ( data step ) to do the job.
---
SQL transform generates at the back end proc sql code. There are lot of advantages when we use the sql transform instead of the user written which should be avoided to the extent possible.
Some of the advantages that I can think right over my head are.
1) The metadata adaptation of the transform is good. So even in case of the any change in the underlying data set the SQL transform automatically adjusts.
2) There is option in SQL transform to generate code for explicit sql pass through.
3) Sub queries and the functionality of the case statement is good so we can maintain the code in a better fashion.
4) Specification of the types of joins and the filter conditions conditions is good.
5) And finally as with any visual tools the automatic code generation by the transform save times lot of precious coding and maintenance time for the developers.
As the great person has said that perfection of the tool only comes when the user knows all the limitations that it keeps on him.
With my experience I have faced the following limitations which lead me to use user written code even though I wanted to avoid it.
1) Problem with explicit sql pass through when DBMS column size is more than 32 chars.
As known column width in the data sets is limited in SAS to 32 chars. But the column width can be more than 32 chars in DBMS such as SQL Server, so when we want to do explicit sql pass through in Such a case, but interestingly the implicit sql pass through does not fail. Which means that the SQL has cognisance of the full name of the column but some how the transform developer never envisioned such a problem will come :( . Hope this will be fixed in future versions of SAS DI.
2) Problem with readbuff option.
SAS provides with various performance improvement options. One of such option is readbuff. The readbuff options drastically improves the I/O performance by increasing the buffer that is available by reading more than 1 (default) row into buffer.
By default if we dont set the readbuff option on the individual datasets SAS generates the code with does the implicit sql pass through of the join. But when I keep the readbuff option the SAS tries to pull both the data sets used in the join to the SAS layer using the readbuff option and then do the join at the SAS layer. There is no option for use to specify in the SQL transform to apply readbuff directly onto the SQL join output i.e., to apply readbuff option on the result set of the implicit sql pass through.
The readbuff option can also be applied at the library level but I have not tried it as I didnt have the rights to modified the library registration.
I think that irrespective of where the option applied on the library level helps or not it is my opinion that DI must have a place where I could have provided the option on the result set of implicit sql pass through.
3) Problem with Subquery in CASE statement.
SQL transform has limitation that it cannot have a subquery with in case statement, which is possible in the proc sql.
4) Whenever a new change happens to the dataflow of the DI jobs all the columns are erroneously propagated from the source to the target.
It is very irritating for the developer to correct the column mapping again and again whenever there is a change in the job. Though this problem is not specific to the SQL transform but I wanted to specify this because the tool is over intelligent to do what we are not saying also. This may result is erroneous data flow big time pain for the developers as they will need to check and correct the data mapping for each and every transform.
5) Vertical string concatenation is not possible in proc sql.
Suppose there is a data set
A
x y
1 a
1 b
1 c
2 d
2 e
3 a
data set B
x y_str
1 a,b,c
2 d,e
3 a
This operation is not possible in proc sql, so we have to write user written code ( data step ) to do the job.
---