Sunday, August 24, 2014

Searching all columns in all tables in a database

I've been asked about this before, and I've always said it's possible but takes too much effort. However, I recently found myself needing to search for a value that could be present in any column of any table in a given database. At present, it will search any column with a character-based data type (identified via the system_type_id column in sys.columns), and of course it could be changed to search for a numeric value (or anything else you want, for that matter).

Here's the code, with some comments in it. You don't need to change anything except the search string - set it and press F5. It will print out a list of what it's doing, and it will indent any tables that have matching data to make them easier to find. I may end up changing this to do something else, but it worked for me.

declare @tablename sysname
declare @searchstringvalue varchar(30)

declare @columns nvarchar(max) -- needs to hold the whole WHERE clause for wide tables
declare @sql nvarchar(max)

declare @rows int

set @searchstringvalue = 'SEARCH FOR ME!!!' -- Put search string here
set @columns = '' -- Set a blank value because no one likes NULLs

-- This cursor gets the user tables in the database that have character columns
-- (char, nchar, varchar, nvarchar); text, ntext and xml are left out because
-- they can't be used with the = operator
declare CurTables cursor fast_forward for
select
 st.[name]
from
 sys.columns sc
 inner join sys.tables st on sc.object_id = st.object_id
where
 sc.system_type_id in (175, 239, 167, 231)
 and sc.max_length > 21 -- only columns long enough to hold the search string
group by
 st.[name]
order by 1

open CurTables

fetch next from CurTables into @tablename

while @@fetch_status = 0
begin
 -- Build the WHERE clause: one predicate per matching column, quoting the
 -- column name and escaping any single quotes in the search string
 set @columns = CAST((select
  QUOTENAME(sc.[name]) + ' = ''' + REPLACE(@searchstringvalue, '''', '''''') + ''' OR '
 from
  sys.columns sc
  inner join sys.tables st on sc.object_id = st.object_id
 where
  st.[name] = @tablename
  and sc.system_type_id in (175, 239, 167, 231)
  and sc.max_length > 21
 order by 1
 for xml path ('')) AS NVARCHAR(MAX))
 
 -- Remove the trailing ' OR' from the end of the string (LEN ignores the trailing space)
 set @columns = SUBSTRING(@columns, 1, LEN(@columns) - 3)
  
 set @sql = 'SELECT @numrows = COUNT(1) FROM ' + QUOTENAME(@tablename) + ' WHERE ' + @columns
 
 exec sp_executesql @sql, N'@numrows INT OUTPUT', @numrows = @rows OUTPUT; 
 
 if @rows > 0
 begin
  print CHAR(9) + CHAR(9) + 'Table name: ' + @tablename + ' - ' + CAST(@rows as VARCHAR(10)) + ' row(s) returned.'
 end
 else
 begin
  print 'Table name: ' + @tablename + ' - ' + CAST(@rows as VARCHAR(10)) + ' row(s) returned.'
 end
  
 set @columns = ''
 
 fetch next from CurTables into @tablename
end

close CurTables
deallocate CurTables

Wednesday, July 2, 2014

Missing parameter values in Reporting Services Linked Reports

If, one day, you're creating a linked report, and you specify a default value for one (or more) of the parameters, you might receive an error stating 'The parameter [name] is missing a value' when you test that the report is working.

No matter what you try, nothing works until you decide to check the XML in the Parameter column in the table ReportServer.dbo.Catalog. You'll need to remove this particular string:

<DynamicValidValues>True</DynamicValidValues>

This element causes the report to discard the default value you've specified for the parameter, hence the error message. Just copy the current XML string out of that field, remove the offending element, and update the field with the XML you've just modified.
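
If you want to do it in T-SQL, here's a rough sketch - the report name is a placeholder, and since editing the ReportServer catalog directly is unsupported, take a backup of the database first:

SELECT ItemID, [Path], [Name], CAST(Parameter AS NVARCHAR(MAX)) AS ParameterXml
FROM ReportServer.dbo.[Catalog]
WHERE [Name] = 'My Linked Report'

-- Paste the XML back in with <DynamicValidValues>True</DynamicValidValues> removed
UPDATE ReportServer.dbo.[Catalog]
SET Parameter = N'...the modified XML from above...'
WHERE [Name] = 'My Linked Report'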

Fun times.

Monday, June 30, 2014

Finding the biggest tables in a database

There will always be times as a Database Administrator or Developer when you need to know where the bulk of the data lies in any given database. There are tools around that will provide you with that information, but there's no need to use them. Below is a very simple script I wrote that will give you what you need, ordered from largest to smallest by row count.

CREATE TABLE #tabledata
(
 tablename VARCHAR(255)
 ,numrows BIGINT -- BIGINT so tables with more than ~2.1 billion rows don't overflow
 ,reserved VARCHAR(127)
 ,data VARCHAR(127)
 ,index_size VARCHAR(127)
 ,unused VARCHAR(127)
)

EXEC sp_MSforeachtable 'INSERT INTO #tabledata EXEC sp_spaceused ''?'''

SELECT * FROM #tabledata ORDER BY numrows DESC
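
-- Optional variant (a sketch, not part of the original script): sp_spaceused returns
-- sizes as strings such as '1234 KB', so strip the suffix to order by reserved space
-- rather than by row count
SELECT * FROM #tabledata ORDER BY CAST(REPLACE(reserved, ' KB', '') AS BIGINT) DESC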

DROP TABLE #tabledata

This script can come in handy if you want to appease data-hungry managers' appetites with low-level database growth figures: store the output in a data warehouse somewhere on a daily basis, write some reports showing where the major growth is, and then go back to the developers and say "Hey, 75% of the database is audit records - stop doing so much crap", or some similarly tactful assertion.
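Here's a rough sketch of that idea, assuming a hypothetical history table called dbo.TableGrowthHistory - the INSERT would slot in before the DROP TABLE step above, run from a daily SQL Agent job or similar:

CREATE TABLE dbo.TableGrowthHistory
(
 capture_date DATE
 ,tablename VARCHAR(255)
 ,numrows BIGINT
 ,reserved_kb BIGINT
)

INSERT INTO dbo.TableGrowthHistory (capture_date, tablename, numrows, reserved_kb)
SELECT CAST(GETDATE() AS DATE), tablename, numrows, CAST(REPLACE(reserved, ' KB', '') AS BIGINT)
FROM #tabledata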

Monday, May 12, 2014

A Rant: Bad Database Design

Sometimes there's just no excuse for bad database design - design that shuns best practices past and present (or even future), dives headlong into choices that require far more CPU time than is ever necessary, causes data file sizes to grow at an exponential rate, and generally leaves users blaming a DBA or an IT department in general when things are slow.

While I won't mention the names of the company or the product, I need to at least divulge some of the particulars about the table design to illustrate some of its terrible aspects - perhaps I'll be as specifically unspecific as possible.

The offending table, which gives rise to all other offending tables, relates to employee data. In this particular table there is an Employee ID column, which contains an all-numeric employee ID. Debates will rage over whether a fixed-length, numeric-only identifier should be stored in an INT or a VARCHAR column, but this design uses CHAR. I hear cries of "Page splits! Query performance! Data file usage!" against VARCHAR column types. Of course, space for a CHAR column is allocated in full for every row, so there's no need to shuffle data around when a value changes, as may be required with VARCHAR. However, with the speed of commercial-grade hard drives and CPUs these days, with appropriate data file growth and fill factor settings, and with CHAR used only where the average data length is consistent, I feel this becomes a moot point.

Back to the design. So, we've got a CHAR type for the Employee ID. However, this ID isn't used as the primary key. There's a column in the table referred to as a surrogate key column, which stores VARBINARY data and has been set as the primary key - yet this VARBINARY column only stores the same employee ID that is already stored as CHAR data. Digging further, the child tables that hold employee data store that same Employee ID, plus additional data separated by a space, in a VARBINARY column. So SQL Server (or the application) needs to convert from one data type to another for each row in the main table, and on the child tables it also needs LEFT() to extract the Employee ID from the surrogate column.

I've thought about this for a while, and I can't really see why this has been done (or at the very least, continued through three major upgrades and perhaps 20 minor revisions). It requires more than double the storage for the same data (going against typical normalisation forms), and requires greater CPU time to get the data in and out of particular formats.

Then there are the aforementioned upgrades to the database and application themselves. Each major upgrade has resulted in a large increase in database size - the most recent being 9GB. I know, I know - that's not a lot considering the size of hard drives these days. However, we don't spend money on appropriately sized SANs, we only have one SAN for the entire company (a few thousand people), and I wasn't asked how big I thought it should be, so I've been given just 1.3TB in total for data, logs/TempDB and backups across three SQL instances. That compounds the problem of arbitrarily increasing the size of the database during an upgrade (which generally involves increasing the size of the CHAR and VARBINARY columns by a few characters each time).

In these instances, VARCHAR columns are a lot better than CHAR: simply increasing the maximum size of a VARCHAR column is a metadata-only change and won't cause the data files to grow, because existing rows don't have to be padded out with extra spaces.
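As an illustration (the table and column names are hypothetical), widening a VARCHAR like this is a metadata-only change, whereas widening a CHAR(8) to CHAR(10) would rewrite every row with extra padding:

ALTER TABLE dbo.Employee ALTER COLUMN EmployeeId VARCHAR(12) NOT NULL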

My biggest gripe with this particular database design is how foreign keys, lookup values and other frequently accessed pieces of data are stored. In some tables (not all, so there's a lack of consistency in the design), often-accessed data is stored in a long VARBINARY column which, in a previous iteration of the application, was CHAR but was changed for some unknown reason. Within this VARBINARY column, character data is stored in either TAB- or SPACE-separated format, so you now need to CAST() or CONVERT() the data and then use SUBSTRING() to get at the particular piece you want. I've no idea why, but it takes a LOT longer now to run reports over this database than it did previously.
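To give a feel for it, here's the sort of query this design forces on you (all names here are hypothetical, not taken from the actual product) - the packed VARBINARY has to be converted back to character data and then carved up with LEFT()/SUBSTRING() before you can join or filter on it:

SELECT
 LEFT(CAST(cd.PackedKey AS VARCHAR(64)), 8) AS EmployeeId -- first 8 characters
 ,SUBSTRING(CAST(cd.PackedKey AS VARCHAR(64)), 10, 6) AS CostCentre -- data after the separator
FROM dbo.ChildData cd
INNER JOIN dbo.Employee e ON e.EmployeeId = LEFT(CAST(cd.PackedKey AS VARCHAR(64)), 8)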

These points sound like basic things that don't really matter, but having watched this database grow 2000% over four years, and having watched changes made to it that have severely affected performance, I just wanted to keep bad design near the front of people's minds so they can perhaps remember to prevent it where possible.

Sunday, May 4, 2014

Audit Tables, Clustered Indexes, GUIDs

After a weekend full of application and database updates, upgrades and maintenance, I have refreshed some things, and learned others.

Firstly: Don't use a GUID as an ID for an audit table. If it REALLY is just an audit table, INT is completely acceptable. If you want to create 58,494,241,735.507 rows per second, every second, for 5 years, and never truncate the table, and can store that much data, then use BIGINT, which is still only half the size of a UNIQUEIDENTIFIER. Incidentally, at that rate you're creating 35.91 petabytes of key values every day. You could probably do this if you were somehow able to harness all the storage in the world.
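For anyone wondering where those figures come from, this is just the arithmetic: BIGINT's maximum value spread over five 365-day years, then 8 bytes per BIGINT key per day.

SELECT
 CAST(9223372036854775807 AS DECIMAL(38,6)) / (5 * 365 * 86400) AS rows_per_second -- ~58,494,241,735.5
 ,CAST(9223372036854775807 AS DECIMAL(38,6)) / (5 * 365 * 86400) * 86400 * 8 / POWER(CAST(2 AS FLOAT), 50) AS petabytes_per_day -- ~35.91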

Second: Don't leave the fill factor for the unique clustered index at 5%, because it wastes a TON of space. For a table that had 148,000,000 rows, sp_spaceused reported 35GB of data and 15GB of indexes. However, this is VERY misleading: with the fill factor set to 5% on the clustered index, most of that space was nearly empty pages, and with index fragmentation at 100%, there was a massive performance hit.

Third: Reorganise your index every once in a while, especially if there are lots of inserts/updates/deletes. This is a given, but when you get sidetracked with non-DBA work for six months, and the developers say they've got it under control, spend a little bit of time outside normal hours to make sure the index has an adequate fill factor and isn't too fragmented.
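Here's a minimal sketch of that check and fix, assuming a hypothetical audit table dbo.AuditLog with a clustered primary key called PK_AuditLog:

-- Check fragmentation and page counts first
SELECT index_id, avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.AuditLog'), NULL, NULL, 'LIMITED')

-- A rebuild lets you set a sensible fill factor at the same time
ALTER INDEX PK_AuditLog ON dbo.AuditLog REBUILD WITH (FILLFACTOR = 90)

-- For lightly fragmented indexes, a reorganise is cheaper and runs online
ALTER INDEX PK_AuditLog ON dbo.AuditLog REORGANIZE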

Fourth: If you've got one or more tables that are heavily used, but the data stored doesn't form part of what is essential business data (again: audit data, transactional data to be synced with an ESB like Neuron), create another file group and put the table(s) there. It will make a MASSIVE difference to performance. There's less unnecessary data to go through if you're running reports against the live database (should be warehoused where possible of course), and the applications that use the database will run quicker. Everyone will be happy.

For those of you who are curious about moving tables between filegroups, and you find Books Online to be too tedious to go through, this article on StackOverflow.com gives a concise overview of the process: https://stackoverflow.com/questions/2438019/how-i-can-move-table-to-another-filegroup
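For reference, here's a rough sketch of the usual approach (the database, filegroup, file path, table and index names below are all hypothetical): add a filegroup and a file, then rebuild the clustered index onto it, which moves the table's data with it.

ALTER DATABASE MyDb ADD FILEGROUP AuditFG

ALTER DATABASE MyDb ADD FILE
(
 NAME = 'MyDb_Audit'
 ,FILENAME = 'D:\Data\MyDb_Audit.ndf'
 ,SIZE = 10GB
) TO FILEGROUP AuditFG

-- Rebuilding the clustered index (here it backs the primary key) on the new filegroup moves the data pages
CREATE UNIQUE CLUSTERED INDEX PK_AuditLog
 ON dbo.AuditLog (AuditLogId)
 WITH (DROP_EXISTING = ON)
 ON AuditFG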

Wednesday, December 18, 2013

The best laid plans of mice and men...

It would be nice if people didn't do this...

Tuesday, October 29, 2013

Quirks

I've found two minor quirks within SQL Server today that, if I had hair, would have caused me to tear it (the hair) out.

1. SQL Server returns "*" when casting/converting an integer to varchar and the target length is too short. These two lines of code show what I mean:

SELECT CAST(123 AS VARCHAR(3)) -- Returns 123
SELECT CAST(123 AS VARCHAR(2)) -- Returns *

This caused me issues because I (unfortunately) have to create a massive dynamic SQL statement to do some ridiculous data transformation from a flat file data source to an XLS file, which is then imported into another database. Yes, I understand the best way to do it would be to insert directly into the necessary tables, but the vendor doesn't want us doing that.

2. You need the same number of numbered columns in your Excel spreadsheet destination as you are selecting from the database.

As I mentioned above, I need to output data to an XLS file, and each column that I am placing data in has to be numbered. So if I select 61 columns of data from SQL, there have to be columns numbered 1 to 61 on the first row of the spreadsheet - 60 or 62 causes errors.