Tuesday, March 17, 2015

Data Types and Primary Databases Matter

Yes, they do.

I have just finished going through a painful analysis of a view that queries 12 tables using nothing but LEFT OUTER JOINs, with an average of 500,000 rows in 8 of the 12 tables. Lots of scans right there. For the life of me, though, I couldn't get SELECT * FROM view_name WHERE ID = 123456 to return in under 10 seconds. I'd worked out the appropriate covering indexes (and deleted 5GB of indexes that had never been used), rewritten what I could without changing the results, and tried different disks, but to no avail.

Then, just on a whim, I decided to put quotation marks around the ID. Now, the ID starts out as an INT column, but the view is joined to another query (yes, I know) where the ID column has become NVARCHAR(38). After the quotation marks were added, the row was returned instantly, and all my Index Scans turned into Index Seeks.

The issue: if SQL Server determines that an implicit data type conversion is required, it takes much longer than an explicit conversion (because with an explicit conversion it doesn't have to work out the data types first), and here the conversion was being applied to the NVARCHAR column on every row, which is what was turning my seeks into scans. This means that rather than the overall query taking 40 minutes, it now takes 5 minutes. Not as fast as I'd like, but a lot better.
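
To illustrate with a simplified sketch (view_name stands in for the real view, where ID ends up as NVARCHAR(38)):

-- Comparing the NVARCHAR ID column to an INT literal converts the column on every row: Index Scans
SELECT * FROM view_name WHERE ID = 123456

-- Passing the value as a string pushes the conversion onto the literal instead: Index Seeks
SELECT * FROM view_name WHERE ID = '123456'
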
Another performance increase I found was running the full query against the "secondary" database with the overly large view, rather than the "primary" database that is intended to be queried. Both are on the same server, but on another hunch, I just changed the query to point to Database B and reference Database A instead. No idea why, but it works.

So uh... yeah.

Sunday, March 1, 2015

Workaround: Blank datasets for tables in SSRS used for report layout

So, we meet again. I guess I'm busy working, but I'm glad I write down some of the more obscure things I end up doing, because I referred to one of my own posts while doing some work last week, so, I guess there's that.


When you're creating a report in Reporting Services and need to use a table for layout (hey, sometimes it works), just create a blank dataset (I called it Blank), and then set the DataSetName for the table to the blank dataset.


I've never heard of anyone else doing this, but I've needed to assign datasets to tables that don't return data before, and pointing them at a blank dataset at least prevents an unnecessary query from being executed at runtime.


Edit: Sorry, I forgot to add one thing. In the blank dataset, add the query "SELECT 1". That will prevent any errors stating that the query cannot be executed.

Sunday, August 24, 2014

Searching all columns in all tables in a database

I've been asked this before by people, and I've always said it's possible, but takes too much effort. However, I found a need to search for a value that may be present in any column in any given database. At present, it will search any column with a text-based data type (gleaned from the system_type_id column in sys.columns), and of course could be changed to search for a numeric value (or anything you want for that matter).

Here's the code. It's got some comments in it. You don't need to change anything except the search string and then press F5. It will print out a list of what it's doing, and it will indent any tables that have matching data to make it easier to find them. I may end up changing this to do something else, but this worked for me.

declare @schemaname sysname
declare @tablename sysname
declare @searchstringvalue nvarchar(100)

declare @columns nvarchar(max)
declare @sql nvarchar(max)

declare @rows int

set @searchstringvalue = 'SEARCH FOR ME!!!' -- Put search string here
set @columns = '' -- Set a blank value because no one likes NULLs

-- Escape any single quotes so they don't break the dynamic SQL
set @searchstringvalue = REPLACE(@searchstringvalue, '''', '''''')

-- This cursor gets the user tables in the database that have text-based columns
-- (175 = char, 239 = nchar, 99 = ntext, 231 = nvarchar/sysname, 35 = text, 167 = varchar, 241 = xml)
declare CurTables cursor fast_forward for
select
 ss.[name]
 ,st.[name]
from
 sys.columns sc
 inner join sys.tables st on sc.object_id = st.object_id
 inner join sys.schemas ss on st.schema_id = ss.schema_id
where
 sc.system_type_id in (175, 239, 99, 231, 35, 167, 241)
 and st.[type] = 'U'
 and sc.max_length > 21 -- skip short columns (this also skips MAX, text, ntext and xml columns)
group by
 ss.[name]
 ,st.[name]
order by 1, 2

open CurTables

fetch next from CurTables into @schemaname, @tablename

while @@fetch_status = 0
begin
 -- Build a "column = 'value' OR column = 'value' ..." predicate covering every matching column
 set @columns = CAST((select
  QUOTENAME(sc.[name]) + ' = ''' + @searchstringvalue + ''' OR '
 from
  sys.columns sc
  inner join sys.tables st on sc.object_id = st.object_id
  inner join sys.schemas ss on st.schema_id = ss.schema_id
 where
  ss.[name] = @schemaname
  and st.[name] = @tablename
  and sc.system_type_id in (175, 239, 99, 231, 35, 167, 241)
  and st.[type] = 'U'
  and sc.max_length > 21
 order by sc.[name]
 for xml path ('')) AS NVARCHAR(MAX))
 
 -- Remove ' OR ' from the end of the string
 set @columns = SUBSTRING(@columns, 1, LEN(@columns) - 3)
  
 set @sql = 'SELECT @numrows = COUNT(1) FROM ' + QUOTENAME(@schemaname) + '.' + QUOTENAME(@tablename) + ' WHERE ' + @columns
 
 exec sp_executesql @sql, N'@numrows INT OUTPUT', @numrows = @rows OUTPUT; 
 
 if @rows > 0
 begin
  -- Indent tables that contain matching data so they're easier to spot
  print CHAR(9) + CHAR(9) + 'Table name: ' + @schemaname + '.' + @tablename + ' - ' + CAST(@rows as VARCHAR(10)) + ' row(s) returned.'
 end
 else
 begin
  print 'Table name: ' + @schemaname + '.' + @tablename + ' - ' + CAST(@rows as VARCHAR(10)) + ' row(s) returned.'
 end
  
 set @columns = ''
 
 fetch next from CurTables into @schemaname, @tablename
end

close CurTables
deallocate CurTables

Wednesday, July 2, 2014

Missing parameter values in Reporting Services Linked Reports

If, one day, you're creating a linked report, and you specify a default value for one (or more) of the parameters, you might receive an error stating 'The parameter [name] is missing a value' when you test that the report is working.

No matter what you try, nothing works until you decide to check the XML in the Parameter column in the table ReportServer.dbo.Catalog. You'll need to remove this particular string:

<DynamicValidValues>True</DynamicValidValues>

This element causes the report to discard the default value you've specified for the parameter, hence the error message. Just copy the current XML string from that field, remove that string, and update the row with the XML you've just modified.
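
For reference, this is roughly the process in T-SQL (a sketch only - the report path is made up, and it's worth backing up the row before updating it):

-- Find the linked report and inspect its parameter XML
SELECT Name, Path, CAST(CAST(Parameter AS NVARCHAR(MAX)) AS XML) AS ParameterXml
FROM ReportServer.dbo.Catalog
WHERE Path = '/Some Folder/My Linked Report'

-- Paste the modified XML (with <DynamicValidValues>True</DynamicValidValues> removed) back in
UPDATE ReportServer.dbo.Catalog
SET Parameter = '<Parameters>...the modified XML...</Parameters>'
WHERE Path = '/Some Folder/My Linked Report'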

Fun times.

Monday, June 30, 2014

Finding the biggest tables in a database

There will always be times as a Database Administrator or Developer when you need to know where the bulk of the data lies in any given database. There are probably tools out there that will provide that information, but there's no need for them. Below is a very simple script I wrote that will give you what you need, ordered from largest to smallest.

CREATE TABLE #tabledata
(
 tablename VARCHAR(255)
 ,numrows BIGINT
 ,reserved VARCHAR(127)
 ,data VARCHAR(127)
 ,index_size VARCHAR(127)
 ,unused VARCHAR(127)
)

-- sp_MSforeachtable is undocumented, but it runs sp_spaceused against every user table
EXEC sp_MSforeachtable 'INSERT INTO #tabledata EXEC sp_spaceused ''?'''

-- sp_spaceused reports sizes as strings like '1234 KB', so strip the unit to sort by reserved size
SELECT * FROM #tabledata ORDER BY CAST(REPLACE(reserved, ' KB', '') AS BIGINT) DESC

DROP TABLE #tabledata

This script can come in handy if you want to appease data-hungry managers' appetites with low-level database growth figures: store the output in a data warehouse somewhere on a daily basis, write some reports showing where the major growth is, and then go back to the developers and say "Hey, 75% of the database is audit records - stop doing so much crap", or some similarly tactful assertion.
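
If you do go down that path, something like this could be the starting point (a rough sketch - dbo.TableGrowthHistory is a name I've made up, and the INSERT needs to run before the DROP TABLE #tabledata line above):

CREATE TABLE dbo.TableGrowthHistory
(
 capture_date DATE NOT NULL
 ,tablename VARCHAR(255) NOT NULL
 ,numrows BIGINT NULL
 ,reserved_kb BIGINT NULL
)

-- Run while #tabledata still exists (i.e. before the DROP TABLE above)
INSERT INTO dbo.TableGrowthHistory (capture_date, tablename, numrows, reserved_kb)
SELECT
 CAST(GETDATE() AS DATE)
 ,tablename
 ,numrows
 ,CAST(REPLACE(reserved, ' KB', '') AS BIGINT)
FROM #tabledata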

Monday, May 12, 2014

A Rant: Bad Database Design

Sometimes, there's just no excuse for bad database design. Design that shuns any best practices, past and present (or even future), and dives headlong into a design that requires far more CPU time than is ever necessary, causes data file sizes to grow at an exponential rate, and generally leaves users blaming a DBA or an IT department in general when things are slow.

While I won't mention the names of the company or the product, I need to at least divulge some of the particulars about the table design to illustrate some of its terrible aspects - perhaps I'll be as specifically unspecific as possible.

The offending table which gives rise to all other offending tables relates to Employee data. In this particular table, there is an Employee ID column which contains the employee ID, which is an all-numeric ID. Debates will rage whether a fixed-length identifier which contains only numeric characters should be stored in an INT or a VARCHAR column, but this design uses CHAR. I hear cries of "Page splits! Query Performance! Data file usage!" for VARCHAR column types. Of course, data in a CHAR column is automatically allocated for each row, so there's no need to shuffle data around to add a new value as there may be required for VARCHAR types. However, with the speed of commercial grade hard drives and CPUs these days, with appropriate data file growth and fill factor settings, and use of CHAR data types only when average data length is consistent, I feel this becomes a moot point.

Back to the design. So, we've got a CHAR type for the Employee ID. However, this ID isn't used as the primary key. There's a column in the table that is referred to as a surrogate key column, which stores VARBINARY data and has been set as the primary key. However, this VARBINARY column only stores the employee ID which is already stored as CHAR data. Digging further, tables that contain employee data store the same Employee ID, plus additional data separated by a space, in a VARBINARY column. Now SQL Server or the application needs to convert from one data type to another for each row in the main table, and on child tables it also needs to use LEFT() to retrieve the Employee ID from the surrogate column.

I've thought about this for a while, and I can't really see why this has been done (or at the very least, continued through three major upgrades and perhaps 20 minor revisions). It requires more than double the storage for the same data (going against typical normalisation forms), and requires greater CPU time to get the data in and out of particular formats.

Then, there are the aforementioned upgrades to the database and application themselves. Each major upgrade has resulted in a large increase in database size - the most recent one being 9GB. I know, I know - that's not a lot considering the size of hard drives these days. However, we don't spend money on appropriately sized SANs, we only have one SAN for the entire company (a few thousand people), and I wasn't asked how big I thought the SAN should be, so I've been given just 1.3TB in total for data, logs/TempDB and backups across three SQL instances. That makes the habit of arbitrarily increasing the size of the database during an upgrade (generally by adding a few characters to the CHAR and VARBINARY columns each time) all the more painful.

In these instances, VARCHAR columns are a lot better than CHAR: simply altering the maximum size of a VARCHAR column won't cause data file growth, because there's no need to pad every existing row with extra spaces.

My biggest gripe with this particular database design is how foreign keys, lookup values and other frequently accessed pieces of data are stored. In some tables (not all, so there's a lack of consistency in design), often accessed data is stored in a long VARBINARY data type which in a previous iteration of the application was CHAR, but was changed for some unknown reason. Within this VARBINARY column, character data is stored in either TAB or SPACE separated format, so now you need to CAST() or CONVERT() the data, and then use SUBSTRING() to get the particular data you want. I've no idea why, but it takes a LOT longer now to run reports over this database than it did previously.
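
To give a contrived idea of what just reading a value back out looks like under this design (the table and column names here are made up, not the vendor's):

-- Cast the VARBINARY back to character data, then carve out the first space-separated piece
SELECT
 SUBSTRING(CAST(PackedData AS VARCHAR(200)), 1,
  CHARINDEX(' ', CAST(PackedData AS VARCHAR(200)) + ' ') - 1) AS EmployeeID
FROM dbo.SomeChildTable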

These points sound like basic things that don't really matter, but having watched this database grow 2000% over four years, and watched changes made to it which have severely affected performance, I just wanted to at least keep bad design near the front of people's minds so they can perhaps remember to prevent it where possible.

Sunday, May 4, 2014

Audit Tables, Clustered Indexes, GUIDs

After a weekend full of application and database updates, upgrades and maintenance, I have refreshed some things, and learned others.

Firstly: Don't use a GUID as an ID for an audit table. If it REALLY is just an audit table, INT is completely acceptable. If you want to create 58,494,241,735.507 rows per second, every second, for 5 years, and never truncate the table, and can store that much data, then use BIGINT, which is still only half the size of UNIQUEIDENTIFIER. Incidentally, at that rate, you're creating 35.91 petabytes of BIGINT key values every day. You could probably do this if you were somehow able to harness all the storage in the world.
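
In other words, something like this is plenty (a minimal sketch - the table and column names are just illustrative):

CREATE TABLE dbo.AuditLog
(
 AuditID BIGINT IDENTITY(1,1) NOT NULL  -- 8 bytes, versus 16 for UNIQUEIDENTIFIER
 ,AuditDate DATETIME2(0) NOT NULL CONSTRAINT DF_AuditLog_AuditDate DEFAULT (SYSDATETIME())
 ,AuditData NVARCHAR(MAX) NULL
 ,CONSTRAINT PK_AuditLog PRIMARY KEY CLUSTERED (AuditID)
)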

Second: Don't leave the fill factor for the unique clustered index at 5%, because it wastes a TON of space. For a table that had 148,000,000 rows, sp_spaceused reported 35GB of data and 15GB of indexes. However, this is VERY misleading, because with the fill factor set to 5% on the clustered index, most of that space was nearly empty pages, and with index fragmentation at 100% there was a massive performance hit.

Third: Reorganise your index every once in a while, especially if there's lots of inserts/updates/deletes. This is a given, but when you get sidetracked with non-DBA work for six months, and the developers say they've got it under control, spend a little bit of time outside normal hours to make sure the index has an adequate fill factor, and isn't too fragmented.
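
A minimal sketch of that check-and-fix routine (the table and index names are placeholders, carried over from the sketch above):

-- Check how fragmented the clustered index is
SELECT index_id, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.AuditLog'), NULL, NULL, 'LIMITED')

-- Light touch-up
ALTER INDEX PK_AuditLog ON dbo.AuditLog REORGANIZE

-- Or a full rebuild to reset the fill factor at the same time
ALTER INDEX PK_AuditLog ON dbo.AuditLog REBUILD WITH (FILLFACTOR = 90)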

Fourth: If you've got one or more tables that are heavily used, but the data stored doesn't form part of what is essential business data (again: audit data, transactional data to be synced with an ESB like Neuron), create another file group and put the table(s) there. It will make a MASSIVE difference to performance. There's less unnecessary data to go through if you're running reports against the live database (should be warehoused where possible of course), and the applications that use the database will run quicker. Everyone will be happy.

For those of you who are curious about moving tables between filegroups, and you find Books Online to be too tedious to go through, this article on StackOverflow.com gives a concise overview of the process: https://stackoverflow.com/questions/2438019/how-i-can-move-table-to-another-filegroup
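
For the impatient, the gist of the approach described there looks like this (a sketch only - the database, file, table and index names are all placeholders):

-- Add a filegroup and a data file to hold the heavily used tables
ALTER DATABASE MyDb ADD FILEGROUP HEAVY_FG
ALTER DATABASE MyDb ADD FILE
(
 NAME = 'MyDb_Heavy'
 ,FILENAME = 'D:\Data\MyDb_Heavy.ndf'
 ,SIZE = 1024MB
) TO FILEGROUP HEAVY_FG

-- Rebuilding the clustered index on the new filegroup moves the table's data with it
CREATE UNIQUE CLUSTERED INDEX IX_TransactionQueue_ID
ON dbo.TransactionQueue (TransactionID)
WITH (DROP_EXISTING = ON)
ON HEAVY_FG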