Sunday, February 19, 2012

How to build a procedure that returns different numbers of columns as a result based on a

/*
Subject: How to build a procedure that returns different
numbers of columns as a result based on a parameter.

You can copy/paste this whole post in SQL Query Analyzer
or Management Studio and run it once you've made sure
there is no harmful code.

Currently we have several stored procedures which final
result is a select with several joins that returns many
columns (150 in one case, maybe around 50 the average).

We have analyzed our application and found out that most
of the time not all the columns are used. We have
identified 3 different sets of columns needed in
different parts of the application.

Let's identify and name these sets as:
1- simple set, return the employee list for example
2- common set, return the employee information (which
include the simple set)
3- extended set, return the employee information (which
inlude the common set which itself includes the simple
set) + additional information from other tables, maybe
even some SUM aggregates and so on (I don't know for
example, how much sales the employee did so far).

So the bigger sets contain the smaller ones. Please keep
reading all the way to the bottom to better understand
technically what we are trying.

Here is a code sample of how our current procedures
work. Please note that the passing parameter we can either
pass a Unique Identifier (PK) to retrieve a single record,
or if we pass for example -1 or NULL we retrieve all the
employee records.
*/
create table a ( apk int primary key, af1 int, af2 int, af3 int, af4
int, af5 int, af6 int)
create table b ( bpk int primary key, bf1 int, bf2 int, bf3 int, bf4
int, bf5 int, bf6 int)
create table c ( cpk int primary key, cf1 int, cf2 int, cf3 int, cf4
int, cf5 int, cf6 int)
create table d ( dpk int primary key, df1 int, df2 int, df3 int, df4
int, df5 int, df6 int)

insert a values (1,1111,1112,1113,1114,1115,1116)
insert a values (2,1211,1212,1213,1214,1215,1216)
insert a values (3,1311,1312,1313,1314,1315,1316)
insert a values (4,1411,1412,1413,1431,1415,1416)
insert a values (5,1511,1512,1513,1514,1515,1516)
insert a values (6,1611,1612,1613,1614,1615,1616)

insert b values (1,2111,2112,2113,2114,2115,2116)
insert b values (2,2211,2212,2213,2214,2215,2216)
insert b values (3,2311,2312,2313,2314,2315,2316)
insert b values (4,2411,2412,2413,2431,2415,2416)
insert b values (5,2511,2512,2513,2514,2515,2516)
insert b values (6,2611,2612,2613,2614,2615,2616)

insert c values (1,3111,3112,3113,3114,3115,3116)
insert c values (2,3211,3212,3213,3214,3215,3216)
insert c values (3,3311,3312,3313,3314,3315,3316)
insert c values (4,3411,3412,3413,3431,3415,3416)
insert c values (5,3511,3512,3513,3514,3515,3516)
insert c values (6,3611,3612,3613,3614,3615,3616)

insert d values (1,4111,4112,4113,4114,4115,4116)
insert d values (2,4211,4212,4213,4214,4215,4216)
insert d values (3,4311,4312,4313,4314,4315,4316)
insert d values (4,4411,4412,4413,4431,4415,4416)
insert d values (5,4511,4512,4513,4514,4515,4516)
insert d values (6,4611,4612,4613,4614,4615,4616)

go

create procedure original_proc @.pk int as

if @.pk = -1
set @.pk = null

select
a.af1, a.af2, a.af3, a.af4, b.bf1, b.bf2, b.bf3, b.bf4, c.cf1, c.cf2,
c.cf3, c.cf4, d.df1, d.df2, d.df3, d.df4
from
a
join b on a.apk = b.bpk
join c on b.bpk = c.cpk
join d on c.cpk = d.dpk
where
a.apk = ISNULL(@.pk, a.apk)

go

exec original_proc 1
go

/*
Currently the above SP is a single SP that is basically
returning ALL possible needed data. However most of the
time we might need to call and retrieve a simple employee
list.

So we thought about modifying the stored procedure by
adding an extra parameter that will indicate which set
of columns to return.

For modifying the stored procedure in order to get a
variable name of columns returned and avoiding
repeating code, we built 4 objects: the stored
procedure being called, one table function and 2 views.
One table function so that we are able to pass a parameter.
The views since they do not accept parameters they are
always joined at least with the inline table function.

The stored procedure generates in its body a dynamic
SQL statement, where it queries the table function and
the views, depending which set is required. Here is a
code sample of our current design (you need to run the
previous code in order for this to work).
*/

create function _1_set(@.pk int)
returns table
as return
(
select a.apk, a.af1, a.af2, a.af3, a.af4, b.bf1, b.bf2
from a
join b on a.apk = b.bpk
where a.apk = ISNULL(@.pk, a.apk)
)

go

create view _2_set as
select b.bpk, b.bf3, b.bf4, c.cf1, c.cf2
from b
join c on b.bpk = c.cpk

go

create view _3_set as
select c.cpk, c.cf3, c.cf4, d.df1, d.df2, d.df3, d.df4
from c
join d on c.cpk = d.dpk

go

create procedure new_proc @.pk int, @.set int as
declare @.sql nvarchar(4000)

if @.pk = -1
set @.pk = null

set @.sql = 'select * from _1_set(@.pk) fs '

if @.set 1
set @.sql = @.sql + 'join _2_set ss on fs.apk = ss.bpk '

if @.set 2
set @.sql = @.sql + 'join _3_set ts on ss.bpk = ts.cpk '

exec sp_executesql @.sql, N'@.pk int', @.pk

go

exec new_proc 1, 3
go

/*
For executing the new procedure, we pass parameter 1
for the smaller set, 2 for the medium size set or 3
for the complete set.

For example when we want to retrieve the common set
we pass the Unique Identifier of the employee to the
SP and then we pass the type of set we want to use
as the second parameter (1 for simple set, 2 for
common set and 3 for extended set).

The SP has the IF and dynamic SQL to add more JOINs.

We would like to know what you think of this approach
and if you know a simpler way of doing it.

For cleaning up the test objects run the following code.
*/
drop procedure original_proc
drop procedure new_proc
drop function _1_set
drop view _2_set
drop view _3_set
drop table a
drop table b
drop table c
drop table d

As always I would appreciate any feedback, opinion,
comments, ideas and suggestions.

Thank youserge (sergea@.nospam.ehmail.com) writes:

Quote:

Originally Posted by

select
a.af1, a.af2, a.af3, a.af4, b.bf1, b.bf2, b.bf3, b.bf4, c.cf1, c.cf2,
c.cf3, c.cf4, d.df1, d.df2, d.df3, d.df4
from
a
join b on a.apk = b.bpk
join c on b.bpk = c.cpk
join d on c.cpk = d.dpk
where
a.apk = ISNULL(@.pk, a.apk)


Beware that this query will always scan the table a, even if you provide
a value in @.pk. If the table is modest in size this may not be an issue
to you, but I like to point it out.

Quote:

Originally Posted by

For modifying the stored procedure in order to get a
variable name of columns returned and avoiding
repeating code, we built 4 objects: the stored
procedure being called, one table function and 2 views.
One table function so that we are able to pass a parameter.
The views since they do not accept parameters they are
always joined at least with the inline table function.
>
The stored procedure generates in its body a dynamic
SQL statement, where it queries the table function and
the views, depending which set is required. Here is a
code sample of our current design (you need to run the
previous code in order for this to work).


Personally, I find adding one function and two views being bit too much.
I would probably only show it all into variables in the stored procedure
that builds the dynamic SQL.

Whether to use dynamic SQL or not for the task... well, on the one hand
you don't have to repeat code, on the other hand there may be issues
with giving users permissions. (Then again, there may be not.) One
alternative would be insert the big result set into a temp table of
table variable, and then use static SQL to select from the temp
table. But obviously bouncing the data an extra time has a performance
cost.

The ideal solution would be to use a preprocessor and define the logic
that needs to be repeated as a macro. But most people don't use any
preprocessor with their SQL. (We do in our shop.)

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx|||On Thu, 23 Nov 2006 13:29:02 -0500, serge wrote:

(snip)

Quote:

Originally Posted by

>We would like to know what you think of this approach
>and if you know a simpler way of doing it.


Hi Serge,

First: thanks for posting the necessary SQL statements to recreate your
situation on our computers.

To your question: if you need to return different sets of columns under
different circumstances, I'd use different stored procedures for each
result set. The use of dynamic SQL as in your example has some downsides
and some risks that you should avoid if at all possible.

See http://www.sommarskog.se/dynamic_sql.html
--
Hugo Kornelis, SQL Server MVP|||Beware that this query will always scan the table a, even if you provide

Quote:

Originally Posted by

a value in @.pk. If the table is modest in size this may not be an issue
to you, but I like to point it out.


Actually we realized that and we already moved the WHERE
logic in the SP using dynamic SQL.

Quote:

Originally Posted by

Personally, I find adding one function and two views being bit too much.
I would probably only show it all into variables in the stored procedure
that builds the dynamic SQL.


Quote:

Originally Posted by

Whether to use dynamic SQL or not for the task... well, on the one hand
you don't have to repeat code, on the other hand there may be issues
with giving users permissions. (Then again, there may be not.) One
alternative would be insert the big result set into a temp table of
table variable, and then use static SQL to select from the temp
table. But obviously bouncing the data an extra time has a performance
cost.


Two reasons why we chose not to use a single SP with dynamic SQL:
1- If we only use dynamic SQL, we are passing the limit for storing text
and we don't want to use EXEC calls to handle this.
2- It is possible that we end up re-using the Inline Table Function in
other SPs.

Quote:

Originally Posted by

The ideal solution would be to use a preprocessor and define the logic
that needs to be repeated as a macro. But most people don't use any
preprocessor with their SQL. (We do in our shop.)


In our situation we can not.

Thanks Erland. At least we now know what we are using is not bad.|||To your question: if you need to return different sets of columns under

Quote:

Originally Posted by

different circumstances, I'd use different stored procedures for each
result set.


This was one of the options we considered before going with the method
we want to use.

Quote:

Originally Posted by

The use of dynamic SQL as in your example has some downsides
and some risks that you should avoid if at all possible.
>
See http://www.sommarskog.se/dynamic_sql.html


We will continue to consult this great article as we continue
working on this project.

Thanks Hugo.|||> How to build a procedure that returns different numbers of columns as a result based on a parameter. <<

Don't program like this; it is slow, very dangrous, a bitch to
maintain and violates some of basic ideas of RDBMS. Go back to the
foundations of good SQL programming:

What is a query? A statement that returns a table.
What is a table? A set of entities of the same kind.
What is an entity? A fixed set of known attributes.

What you have is a "magically jellyfish" that changes shape

Quote:

Originally Posted by

Quote:

Originally Posted by

>We have analyzed our application and found out that most of the time not all the columns are used. <<


SQL is not an application language; it is a data retrieval language.
Period.

Quote:

Originally Posted by

Quote:

Originally Posted by

>So the bigger sets contain the smaller ones. <<


That is why we use VIEWs most of the time, but this might be better
done with three separate queries.

Quote:

Originally Posted by

Quote:

Originally Posted by

>Please keep reading all the way to the bottom to better understand technically what we are trying. <<


Well, you start by confusing rows and records.

Quote:

Originally Posted by

Quote:

Originally Posted by

> Please note that the passing parameter we can either pass a Unique Identifier (PK) to retrieve a single record [sic], or if we pass for example -1 or NULL we retrieve all the employee records [sic]. <<


Did you ever have a software engineering course? Probably not, or you
would remember the time spent on coupling and cohesion and why
overloaded parameters are bad. Which is it - minus one or a NULL?

Quote:

Originally Posted by

Quote:

Originally Posted by

> So we thought about modifying the stored procedure by adding an extra parameter that will indicate which set of columns to return. <<


Since the correct form of a procedure is "<verb><object>", what would
you name your module? It returns multiple kinds of things, in
violation of cohesion. I call such things "Britney Spears, Automobiles
and Squid" procedures.

Quote:

Originally Posted by

Quote:

Originally Posted by

>For modifying the stored procedure in order to get a variable name of columns returned and avoiding repeating code, we built 4 objects: the stored procedure being called, one table function and 2 views ... dynamic SQL statement, where it queries the table function and the views, depending which set is required.<<


When we violate the basics of SE, the code gets ugly and
un-maintainable pretty fast!

Quote:

Originally Posted by

Quote:

Originally Posted by

>We would like to know what you think of this approach and if you know a simpler way of doing it. <<


Write one VIEW or SP for each clearly defined result set. Do not try
to cram everything into one SQL module. And please read some Yourdon,
DeMarco, Myers, etc. and the basics of structured programming. Your
SQL is a nightmare not because of a tricky SQL problem, but because you
do not know how to design a correct program.|||As always I would appreciate any feedback, opinion,

Quote:

Originally Posted by

comments, ideas and suggestions.


I was wondering maybe you can use a table where you actually store the
sql with different sets. For example insert table values (3, 'select
extended set from sometable')
Then you just execute the sql if parameter is 3.
There can be other solutions too. For example, you can pass the actual
column names to the procedure and select only those columns etc. or
pass a number and parse it to column names etc.

serge wrote:

Quote:

Originally Posted by

/*
Subject: How to build a procedure that returns different
numbers of columns as a result based on a parameter.
>
You can copy/paste this whole post in SQL Query Analyzer
or Management Studio and run it once you've made sure
there is no harmful code.
>
Currently we have several stored procedures which final
result is a select with several joins that returns many
columns (150 in one case, maybe around 50 the average).
>
We have analyzed our application and found out that most
of the time not all the columns are used. We have
identified 3 different sets of columns needed in
different parts of the application.
>
Let's identify and name these sets as:
1- simple set, return the employee list for example
2- common set, return the employee information (which
include the simple set)
3- extended set, return the employee information (which
inlude the common set which itself includes the simple
set) + additional information from other tables, maybe
even some SUM aggregates and so on (I don't know for
example, how much sales the employee did so far).
>
So the bigger sets contain the smaller ones. Please keep
reading all the way to the bottom to better understand
technically what we are trying.
>
Here is a code sample of how our current procedures
work. Please note that the passing parameter we can either
pass a Unique Identifier (PK) to retrieve a single record,
or if we pass for example -1 or NULL we retrieve all the
employee records.
*/
create table a ( apk int primary key, af1 int, af2 int, af3 int, af4
int, af5 int, af6 int)
create table b ( bpk int primary key, bf1 int, bf2 int, bf3 int, bf4
int, bf5 int, bf6 int)
create table c ( cpk int primary key, cf1 int, cf2 int, cf3 int, cf4
int, cf5 int, cf6 int)
create table d ( dpk int primary key, df1 int, df2 int, df3 int, df4
int, df5 int, df6 int)
>
insert a values (1,1111,1112,1113,1114,1115,1116)
insert a values (2,1211,1212,1213,1214,1215,1216)
insert a values (3,1311,1312,1313,1314,1315,1316)
insert a values (4,1411,1412,1413,1431,1415,1416)
insert a values (5,1511,1512,1513,1514,1515,1516)
insert a values (6,1611,1612,1613,1614,1615,1616)
>
insert b values (1,2111,2112,2113,2114,2115,2116)
insert b values (2,2211,2212,2213,2214,2215,2216)
insert b values (3,2311,2312,2313,2314,2315,2316)
insert b values (4,2411,2412,2413,2431,2415,2416)
insert b values (5,2511,2512,2513,2514,2515,2516)
insert b values (6,2611,2612,2613,2614,2615,2616)
>
insert c values (1,3111,3112,3113,3114,3115,3116)
insert c values (2,3211,3212,3213,3214,3215,3216)
insert c values (3,3311,3312,3313,3314,3315,3316)
insert c values (4,3411,3412,3413,3431,3415,3416)
insert c values (5,3511,3512,3513,3514,3515,3516)
insert c values (6,3611,3612,3613,3614,3615,3616)
>
insert d values (1,4111,4112,4113,4114,4115,4116)
insert d values (2,4211,4212,4213,4214,4215,4216)
insert d values (3,4311,4312,4313,4314,4315,4316)
insert d values (4,4411,4412,4413,4431,4415,4416)
insert d values (5,4511,4512,4513,4514,4515,4516)
insert d values (6,4611,4612,4613,4614,4615,4616)
>
go
>
create procedure original_proc @.pk int as
>
if @.pk = -1
set @.pk = null
>
select
a.af1, a.af2, a.af3, a.af4, b.bf1, b.bf2, b.bf3, b.bf4, c.cf1, c.cf2,
c.cf3, c.cf4, d.df1, d.df2, d.df3, d.df4
from
a
join b on a.apk = b.bpk
join c on b.bpk = c.cpk
join d on c.cpk = d.dpk
where
a.apk = ISNULL(@.pk, a.apk)
>
go
>
exec original_proc 1
go
>
/*
Currently the above SP is a single SP that is basically
returning ALL possible needed data. However most of the
time we might need to call and retrieve a simple employee
list.
>
So we thought about modifying the stored procedure by
adding an extra parameter that will indicate which set
of columns to return.
>
For modifying the stored procedure in order to get a
variable name of columns returned and avoiding
repeating code, we built 4 objects: the stored
procedure being called, one table function and 2 views.
One table function so that we are able to pass a parameter.
The views since they do not accept parameters they are
always joined at least with the inline table function.
>
The stored procedure generates in its body a dynamic
SQL statement, where it queries the table function and
the views, depending which set is required. Here is a
code sample of our current design (you need to run the
previous code in order for this to work).
*/
>
create function _1_set(@.pk int)
returns table
as return
(
select a.apk, a.af1, a.af2, a.af3, a.af4, b.bf1, b.bf2
from a
join b on a.apk = b.bpk
where a.apk = ISNULL(@.pk, a.apk)
)
>
go
>
create view _2_set as
select b.bpk, b.bf3, b.bf4, c.cf1, c.cf2
from b
join c on b.bpk = c.cpk
>
go
>
create view _3_set as
select c.cpk, c.cf3, c.cf4, d.df1, d.df2, d.df3, d.df4
from c
join d on c.cpk = d.dpk
>
go
>
create procedure new_proc @.pk int, @.set int as
declare @.sql nvarchar(4000)
>
if @.pk = -1
set @.pk = null
>
set @.sql = 'select * from _1_set(@.pk) fs '
>
if @.set 1
set @.sql = @.sql + 'join _2_set ss on fs.apk = ss.bpk '
>
if @.set 2
set @.sql = @.sql + 'join _3_set ts on ss.bpk = ts.cpk '
>
exec sp_executesql @.sql, N'@.pk int', @.pk
>
go
>
exec new_proc 1, 3
go
>
/*
For executing the new procedure, we pass parameter 1
for the smaller set, 2 for the medium size set or 3
for the complete set.
>
For example when we want to retrieve the common set
we pass the Unique Identifier of the employee to the
SP and then we pass the type of set we want to use
as the second parameter (1 for simple set, 2 for
common set and 3 for extended set).
>
The SP has the IF and dynamic SQL to add more JOINs.
>
We would like to know what you think of this approach
and if you know a simpler way of doing it.
>
For cleaning up the test objects run the following code.
*/
drop procedure original_proc
drop procedure new_proc
drop function _1_set
drop view _2_set
drop view _3_set
drop table a
drop table b
drop table c
drop table d
>
>
As always I would appreciate any feedback, opinion,
comments, ideas and suggestions.
>
Thank you

|||None of these were good suggestions.

Quote:

Originally Posted by

Quote:

Originally Posted by

>.. you can use a table where you actually store the sql with different sets. For example insert table values (3, 'select extended set from sometable') <<


SQL is not meant to be used this way -- it stores data and not code.
Why not use a procedure instead?

Quote:

Originally Posted by

Quote:

Originally Posted by

>.. you can pass the actual column names to the procedure and select only those columns etc. or pass a number and parse it to column names etc. <<


You cannot pass column names in T-SQL amd most programming languages.
And you should not do this anyway. You want strong cohesion in a
module (basic Software Engineering, duh!) which means you have a spec
that tells you exactly what the output will be. This "I don't know
what is happening until some random user decides at run time" or
"Briteny Spears, Squid or Automobiles" approach is just plain wrong.|||serge (sergea@.nospam.ehmail.com) writes:

Quote:

Originally Posted by

Two reasons why we chose not to use a single SP with dynamic SQL:
1- If we only use dynamic SQL, we are passing the limit for storing
text and we don't want to use EXEC calls to handle this.


Ah, I see! A workaround or a kludge whichever you want to call it.

On SQL 2005, there is nvarchar(MAX), so there would be limits you could
run into.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx|||Hugo Kornelis (hugo@.perFact.REMOVETHIS.info.INVALID) writes:

Quote:

Originally Posted by

To your question: if you need to return different sets of columns under
different circumstances, I'd use different stored procedures for each
result set. The use of dynamic SQL as in your example has some downsides
and some risks that you should avoid if at all possible.


Personnally, I agree with the idea of keeping it all in the same
procedure. Splitting it up over several, increases the risk that the
the branches one day deviates from each other when they shouldn't.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

No comments:

Post a Comment