Monday, March 30, 2015

Get ALL the Triggers

Every once in a while, we perform upgrades and updates to one of the main applications we use company-wide, and when we do, it drops all our custom triggers. Why, I'm not sure, but it does. It's a little annoying when the developers don't have backups of the triggers they've created, so here's something I run to get all the triggers in a database. Just set Management Studio to save the results to a file and then press F5.

SELECT
 OBJECT_NAME(asm.object_id) AS trigger_name
 ,asm.[definition]
FROM
 sys.all_sql_modules asm
WHERE
 asm.object_id IN (SELECT object_id FROM sys.triggers)

Yes, that's all it is. Comment out the first column (and comma) if you don't want the name of the trigger.
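
If you want a bit more context in the output, something along these lines should also work - treat it as a sketch, since I haven't needed it myself. sys.triggers gives you the parent table and whether the trigger is disabled:

SELECT
 tr.[name] AS trigger_name
 ,OBJECT_NAME(tr.parent_id) AS parent_table -- NULL for database-level DDL triggers
 ,tr.is_disabled
 ,sm.[definition]
FROM
 sys.triggers tr
 INNER JOIN sys.sql_modules sm ON tr.object_id = sm.object_id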

Tuesday, March 17, 2015

Data Types and Primary Databases Matter

Yes, they do.

I have just finished going through a painful analysis of a view that queries 12 tables using nothing but LEFT OUTER JOINs, where 8 of the 12 tables average around 500,000 rows each. Lots of scans right there. For the life of me though, I couldn't get SELECT * FROM view_name WHERE ID = 123456 to return in under 10 seconds. I'd worked out the appropriate covering indexes (as well as deleting 5GB of indexes that had never been used), rewritten what I could without changing the results, and tried different disks, but to no avail.

Then, just on a whim, I decided to put quotation marks around the ID. Now, the ID is stored in an INT column, but the view is joined to another query (yes, I know) in which the ID column has become NVARCHAR(38). After the quotation marks were added, the row was returned instantly, and all my Index Scans turned into Index Seeks.

The issue: when SQL Server decides an implicit data type conversion is required on a column, that conversion has to be applied to every row before the comparison can be evaluated, so the predicate can no longer be answered with an index seek. Pass the value in the column's own type and the conversion disappears, along with the scans. This meant that rather than the required query taking 40 minutes, it now takes 5 minutes. Not as fast as I'd like, but a lot better.
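
To make that concrete, here's a minimal sketch of the difference, using the made-up view_name from above (where the view's ID column ends up as NVARCHAR(38)):

-- Implicit conversion: the NVARCHAR column is converted to INT for every row,
-- so the best the optimizer can do is scan
SELECT * FROM view_name WHERE ID = 123456

-- Literal matches the column's type: no conversion on the column, so an index seek is possible
SELECT * FROM view_name WHERE ID = N'123456'
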
Another performance increase I found was running the full query against the "secondary" database with the overly large view, rather than the "primary" database that is intended to be queried. Both are on the same server, but on another hunch, I just changed the query to point to Database B and reference Database A instead. No idea why, but it works.
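
In case that wording is unclear, all I mean is something like this (database names made up):

-- Same query, but connected to Database B and referencing the objects in Database A
USE DatabaseB;
GO
SELECT * FROM DatabaseA.dbo.view_name WHERE ID = N'123456';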

So uh... yeah.

Sunday, March 1, 2015

Workaround: Blank datasets for tables in SSRS used for report layout

So, we meet again. I guess I've been busy working, but I'm glad I write down some of the more obscure things I end up doing, because I referred to one of my own posts while doing some work last week, so I guess there's that.


When you're creating a report in Reporting Services and need to use a table for layout (hey, sometimes it works), just create a blank dataset (I called it Blank), and then set the DataSetName for the table to the blank dataset.


I've never heard of anyone doing this, but I've needed to assign datasets to tables that don't return data before, and pointing them at a blank dataset at least stops a real query from being executed during runtime.


Edit: Sorry, I forgot to add one thing. In the blank dataset, add the query "SELECT 1". That will prevent any errors stating that the query cannot be executed.

Sunday, August 24, 2014

Searching all columns in all tables in a database

I've been asked about this before, and I've always said it's possible but takes too much effort. However, I recently found a need to search for a value that may be present in any column of a given database. At present, the script searches any column with a text-based data type (gleaned from the system_type_id column in sys.columns), and of course it could be changed to search for a numeric value (or anything you want, for that matter).

Here's the code. It's got some comments in it. You don't need to change anything except the search string; then press F5. For every table containing matching data, it prints out an indented SELECT statement you can run to find the rows (there's a commented-out PRINT if you also want to see the tables that had no matches). I may end up changing this to do something else, but this worked for me.

declare @schemaname sysname
declare @tablename sysname
declare @searchstringvalue varchar(100)

declare @columns nvarchar(max)
declare @sql nvarchar(max)

declare @rows int

set @searchstringvalue = 'search string' -- Put search string here (double up any single quotes)
set @columns = '' -- Set a blank value because no one likes NULLs

-- This cursor gets the user tables in the database that have text-based columns
declare CurTables cursor fast_forward for
select
 SCHEMA_NAME(st.schema_id)
 ,st.[name]
from
 sys.columns sc
 inner join sys.tables st on sc.object_id = st.object_id
where
 sc.system_type_id in (175, 239, 99, 231, 35, 167, 241) -- char, nchar, ntext, nvarchar, text, varchar, xml
 and sc.max_length > 21
group by
 st.schema_id
 ,st.[name]
order by 1, 2

open CurTables

fetch next from CurTables into @schemaname, @tablename

while @@fetch_status = 0
begin
 -- Build a LIKE predicate for every text-based column in this table
 set @columns = CAST((select
  '[' + sc.[name] + ']' + ' LIKE ''%' + @searchstringvalue + '%'' OR '
 from
  sys.columns sc
  inner join sys.tables st on sc.object_id = st.object_id
 where
  st.[name] = @tablename
  and SCHEMA_NAME(st.schema_id) = @schemaname
  and sc.system_type_id in (175, 239, 99, 231, 35, 167, 241)
  and sc.max_length > 21
 order by 1
 for xml path ('')) AS VARCHAR(MAX))
 
 -- Remove ' OR ' from the end of the string
 set @columns = SUBSTRING(@columns, 1, LEN(@columns) - 3)
  
 set @sql = 'SELECT @numrows = COUNT(1) FROM ' + QUOTENAME(@schemaname) + '.' + QUOTENAME(@tablename) + ' WHERE ' + @columns
 
 exec sp_executesql @sql, N'@numrows INT OUTPUT', @numrows = @rows OUTPUT; 
 
 if @rows > 0
 begin
  --print CHAR(9) + CHAR(9) + 'Table name: ' + @tablename + ' - ' + CAST(@rows as VARCHAR(10)) + ' row(s) returned.'
  print CHAR(9) + CHAR(9) + 'SELECT * FROM ' + QUOTENAME(@schemaname) + '.' + QUOTENAME(@tablename) + ' WHERE ' + @columns
 end
 /*
 else
 begin
  print 'Table name: ' + @tablename + ' - ' + CAST(@rows as VARCHAR(10)) + ' row(s) returned.'
 end
 */
 
 set @columns = ''
 
 fetch next from CurTables into @schemaname, @tablename
end

close CurTables
deallocate CurTables

Wednesday, July 2, 2014

Missing parameter values in Reporting Services Linked Reports

If, one day, you're creating a linked report, and you specify a default value for one (or more) of the parameters, you might receive an error stating 'The parameter [name] is missing a value' when you test that the report is working.

No matter what you try, nothing works until you decide to check the XML in the Parameter column in the table ReportServer.dbo.Catalog. You'll need to remove this particular string:

<DynamicValidValues>True</DynamicValidValues>

This element effectively causes the report to discard the default value you've specified for the parameter, hence the error message. Just copy the current XML out of that field, remove the string above, and update the row with the modified XML.
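
If you want to find the linked reports that contain the offending element before you start editing, something like this should do it. The usual warnings about touching the ReportServer database directly apply, and I believe Type = 4 indicates a linked report, but double-check that against your version:

SELECT
 [Path]
 ,[Name]
 ,CAST([Parameter] AS NVARCHAR(MAX)) AS ParameterXml
FROM
 ReportServer.dbo.[Catalog]
WHERE
 [Type] = 4 -- linked reports, as far as I can tell
 AND CAST([Parameter] AS NVARCHAR(MAX)) LIKE '%<DynamicValidValues>True</DynamicValidValues>%'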

Fun times.

Monday, June 30, 2014

Finding the biggest tables in a database

There will always be times as a Database Administrator or Developer when you need to know where the bulk of the data lies in any given database. There are probably commercial tools that will provide you with that information, but there's no need to use those. Below is a very simple script I wrote that will give you what you need, ordered by row count from largest to smallest.

-- Temp table matching the result set of sp_spaceused
CREATE TABLE #tabledata
(
 tablename VARCHAR(255)
 ,numrows INT
 ,reserved VARCHAR(127)
 ,data VARCHAR(127)
 ,index_size VARCHAR(127)
 ,unused VARCHAR(127)
)

-- sp_MSforeachtable is undocumented; it runs the command once per user table, substituting the table name for ?
EXEC sp_MSforeachtable 'INSERT INTO #tabledata EXEC sp_spaceused ''?'''

SELECT * FROM #tabledata ORDER BY numrows DESC

DROP TABLE #tabledata

This script can come in handy if you want to appease data-hungry managers' appetites with low-level database growth figures: store the output in a data warehouse somewhere on a daily basis, write some reports showing where the major growth is, and then go back to the developers and say "Hey, 75% of the database is audit records - stop doing so much crap", or some similarly tactful assertion.
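
If you do go down that path, capturing the history is a one-liner. This is only a sketch: the DBAWarehouse database and TableGrowthHistory table are made-up names, and the INSERT needs to run before the DROP TABLE at the end of the script above.

-- Hypothetical history table: a capture date plus the sp_spaceused columns
INSERT INTO DBAWarehouse.dbo.TableGrowthHistory
 (capture_date, tablename, numrows, reserved, data, index_size, unused)
SELECT
 GETDATE(), tablename, numrows, reserved, data, index_size, unused
FROM
 #tabledata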

Monday, May 12, 2014

A Rant: Bad Database Design

Sometimes, there's just no excuse for bad database design. Design that shuns any best practices, past and present (or even future), dives headlong into requiring far more CPU time than is ever necessary, causes data file sizes to grow at an exponential rate, and generally leaves users blaming the DBA or the IT department when things are slow.

While I won't mention the names of the company or the product, I need to at least divulge some of the particulars about the table design to illustrate some of its terrible aspects - perhaps I'll be as specifically unspecific as possible.

The offending table, which gives rise to all other offending tables, relates to Employee data. In this particular table there is an Employee ID column, which holds an all-numeric identifier. Debates will rage over whether a fixed-length identifier containing only numeric characters should be stored in an INT or a VARCHAR column, but this design uses CHAR. I hear cries of "Page splits! Query performance! Data file usage!" levelled at VARCHAR column types. Of course, the space for a CHAR column is allocated in full for each row, so there's no need to shuffle data around when a value grows, as there can be with VARCHAR types. However, with the speed of commercial-grade hard drives and CPUs these days, with appropriate data file growth and fill factor settings, and with CHAR used only where the data length is consistent, I feel this becomes a moot point.

Back to the design. So, we've got a CHAR type for the Employee ID, but this ID isn't used as the primary key. There's a column in the table referred to as a surrogate key, which stores VARBINARY data and has been made the primary key; yet this VARBINARY column only stores the employee ID that's already held as CHAR data. Digging further, the tables that contain employee data store the same Employee ID, plus additional data separated by a space, in a VARBINARY column. Now SQL Server or the application has to convert from one data type to another for every row in the main table, and on the child tables it also has to use LEFT() to retrieve the Employee ID from the surrogate column.
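
To illustrate the sort of thing this forces on every query (all of these table and column names are invented, and I'm assuming an 8-character ID purely for the example):

SELECT
 e.EmployeeID
FROM
 dbo.EmployeeChild ec
 INNER JOIN dbo.Employee e
  ON e.EmployeeID = LEFT(CAST(ec.SurrogateKey AS VARCHAR(50)), 8) -- unpack the VARBINARY on every row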

I've thought about this for a while, and I can't really see why this has been done (or at the very least, continued through three major upgrades and perhaps 20 minor revisions). It requires more than double the storage for the same data (going against typical normalisation forms), and requires greater CPU time to get the data in and out of particular formats.

Then there are the aforementioned upgrades to the database and application themselves. Each major upgrade has resulted in a large increase in database size - the most recent one being 9GB. I know, I know - that's not a lot considering the size of hard drives these days. However, we don't spend money on appropriately sized SANs, we only have one SAN for the entire company (a few thousand people), and I wasn't asked how big I thought it should be, so I've been given just 1.3TB in total for data, logs/TempDB and backups across three SQL instances. That compounds the problem of arbitrarily increasing the size of the database during an upgrade (which generally involves increasing the size of CHAR and VARBINARY columns by a few characters each time).

In these instances, VARCHAR columns are a lot better than CHAR: simply increasing the maximum size of a VARCHAR column doesn't cause the data files to grow, because the existing values don't need to be padded out with extra spaces.

My biggest gripe with this particular database design is how foreign keys, lookup values and other frequently accessed pieces of data are stored. In some tables (not all, so there's a lack of consistency in the design), frequently accessed data is stored in a long VARBINARY column which, in a previous iteration of the application, was CHAR but was changed for some unknown reason. Within this VARBINARY column, character data is stored in either tab- or space-separated format, so now you need to CAST() or CONVERT() the data and then use SUBSTRING() to pull out the particular piece you want. I've no idea why, but it takes a LOT longer now to run reports over this database than it did previously.

These points sound like basic things that don't really matter, but having watched this database grow 2000% over four years, and having watched changes made to it that have severely affected performance, I just wanted to keep bad design near the front of people's minds so they can perhaps remember to prevent it where possible.