SQLServer


Saturday, March 24, 2012

Indexing Strategies for OLTP Databases.

Let me start by saying that I am not a DBA, I am a BI developer.  I work with some really bright DBAs and they can tell you much more about this topic than I can, and probably in great depth, with many terms like PERFMON, DMVs and querying a bunch of those sys tables.  To this point in my career I haven't had the exposure or need to think about many of the DBA tasks.  I mostly worry about building packages, cubes and reports.  That just isn't going to cut it anymore.  As I've progressed in my career as a consultant, I've realized that clients are going to expect more and more from me, and rightfully so.  As a junior developer I could lean on the more senior guys to help carry me when things outside of my knowledge base came up, but now I need to become one of those senior guys, and that means <gulp> moving outside of my comfort zone and diving into topics that are in areas other than strictly Business Intelligence.  So for my first foray into the land of the DBA I decided to look into indexes, and more specifically indexing strategies in OLTP systems, since my only real exposure to them is reading from them when loading data into a data warehouse, not writing to them.

Now some of this might seem pretty basic to you, but this is how I think about them.  Online Transactional Processing (OLTP) systems are primarily used for data entry. Since their primary function is to input data, and usually high volumes of data over an extended period of time, the primary performance concern for an OLTP system is how fast it can input the data. Now that is oversimplifying it a bit, because it's not like every new insert is coming from a new customer ordering a new item.  Most likely it's a returning customer ordering an item that has been ordered by other customers before.  Obviously we aren't going to put duplicate data inside our OLTP system (3NF and all that goodness), so there will have to be some reads performed on the database. This is where indexes come in.  You want to create indexes that will speed up the reads you are doing on your OLTP system, to enable you to do faster inserts, updates and deletes.  The problem is that indexes can actually slow down those inserts, updates and deletes.  The trick then is to find the "sweet spot" where the indexes speed up the overall process, not just the reads on your queries. As you can imagine there is no single way to do it, no one "right" answer.  I did however find some tips out there to help fledgling DBAs and other wannabes like me keep from making fools of ourselves when the cool kids (DBAs) come behind and look at the indexes we've implemented.  Here are some of the highlights, and I've included links to them.
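To make that trade-off concrete, here is a minimal T-SQL sketch of the kind of index I mean. The Orders table, its columns, and the lookup query are all made up for illustration; the point is simply a narrow nonclustered index that supports the read that happens before each write, while the clustered primary key carries the inserts.

-- Hypothetical OLTP table: inserts arrive constantly, keyed by OrderID.
CREATE TABLE dbo.Orders
(
    OrderID    INT IDENTITY(1,1) NOT NULL,
    CustomerID INT               NOT NULL,
    ProductID  INT               NOT NULL,
    OrderDate  DATETIME          NOT NULL,
    Quantity   INT               NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID)
);

-- Narrow nonclustered index to support the common read path
-- ("has this customer ordered this item before?").
-- Every index like this adds a little cost to every insert, update and delete,
-- which is exactly the "sweet spot" trade-off described above.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_ProductID
    ON dbo.Orders (CustomerID, ProductID)
    INCLUDE (OrderDate);

-- The kind of read the index is meant to speed up:
-- SELECT TOP (1) OrderDate
-- FROM dbo.Orders
-- WHERE CustomerID = @CustomerID AND ProductID = @ProductID
-- ORDER BY OrderDate DESC;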

Well, I hope this has helped someone other than me. One thing I have learned since I started blogging is that if I can explain something in writing then I have a pretty good grasp of the concept, and I can always fall back on the blog post I wrote as a refresher (convenient!).

Monday, June 13, 2011

Creating My First Development Environment for SharePoint 2010 Part 1: The Dilemma

The first time I started developing PerformancePoint solutions inside of SharePoint 2010 for a client, I knew that to get beyond proficient and into the realm of expert I was going to have to spend a lot of my own time creating reports, KPIs, scorecards and dashboards.  To do that I needed a SharePoint environment to call my own.  I successfully installed SharePoint 2010 on my laptop running Windows 7 using the steps outlined here.  But it was slow going; I only have 3 GB of RAM on my laptop. Since I was traveling so much for work, it just had to do.

Then I was asked to tech edit SharePoint Business Intelligence 24-Hour Trainer, and suddenly my laptop just wasn't going to cut it anymore. For one thing, I was going to have to uninstall and reinstall a couple of times just to properly edit the book.  For another, I have a more powerful machine at home (quad core and up to 8 GB of RAM) and I wanted to install Windows Server 2008 R2 on it. But it's my family's computer, it was still running Vista, and I didn't want to wipe all the programs I had installed on it.

While I was talking about my problem with Mark Stacey (@MarkGStacey), he suggested that I create a virtual hard disk (VHD) for setting up the Windows Server 2008 R2 environment. Even though I trust Mark's opinion, I wanted to understand a little more before I ran off and tried to implement a VHD. The first thing I wanted to know is: what are the advantages of using a VHD? The most significant advantage, in my opinion, is not having to edit your current partitions. Another is that you are not limited in the number of VHD files on your computer, except by the amount of storage you have. You can also create incremental VHD files, with one based on another.

So now that I had a possible solution, it was time to try to implement it.  Next time I'll go over the steps I took to set up the VHD. Until next time, GO GIANTS

Friday, June 10, 2011

Task Factory Trim Plus Transform

I'm beginning to realize that I really like to do things the easy way.  Of course I always want things done right, but the easier the better, which is why I am always thankful that I get to use the Pragmatic Works Software Suite while I am developing SSIS packages.  Task Factory is very helpful in the mission to make my life easier.  It has 26 tasks, and it's growing with every release.

One of the consistent problems that I run into while developing ETL packages is cleansing the data.  Sometimes strange or unwanted characters creep into the data, or leading and trailing spaces need to be removed before landing the data in a new table.  Task Factory has a transform that can help you quickly get rid of those unwanted characters or spaces, called the Trim Plus Transform.

The steps for configuring this transform are quick and easy.

Connect your source data to the transform.

[Screenshot: the source connected to the Trim Plus Transform in the data flow]

Double click to open up the transform editor.

[Screenshot: the Trim Plus Transform editor]

Now you can select what actions you want to perform on each column. There are several actions to choose from in the drop-down:

  1. Trim all leading and trailing white-space characters

  2. Trim all leading white-space characters

  3. Trim all trailing white-space characters

  4. Trim specified characters from start and end

  5. Trim specified characters from start

  6. Trim specified characters from end

You can specify which characters you want to trim in the last three actions. Certain characters are difficult to type for one reason or another, but the transform is built to handle them using an escape sequence (e.g. Tab [\t], CarriageReturn [\r], LineFeed [\n], VerticalTab [\v], SingleQuote [\'], DoubleQuotes [\"], NullCharacter [\0]).
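If it helps to think of it in T-SQL terms, the first three actions behave roughly like LTRIM and RTRIM. This is just a rough comparison, not what the transform does internally, and the DirtyColumn and SourceTable names are made up; note also that T-SQL's LTRIM/RTRIM only strip spaces, while the transform's white-space options also cover characters like tabs.

-- Rough T-SQL counterparts of actions 1-3 (white-space trimming).
SELECT LTRIM(RTRIM(DirtyColumn)) AS TrimmedBoth,     -- action 1
       LTRIM(DirtyColumn)        AS TrimmedLeading,  -- action 2
       RTRIM(DirtyColumn)        AS TrimmedTrailing  -- action 3
FROM dbo.SourceTable;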

[Screenshot: the trim actions and characters configured in the transform editor]

You can also specify whether you want to replace the column or add a new column as the output action.

If you look at the figure above, you can see that on the FeetInches column I am choosing to get rid of the extra double quote ("). The data looks like this going in:

[Screenshot: the FeetInches data before the transform runs]

Then, using \" to indicate the " as the character I want to remove, I run the package, and here is my result:

[Screenshot: the FeetInches data after the transform runs]

As you can see, it removed the unwanted characters, and now my data is ready to land in my table.
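For comparison, if you were doing this particular cleanup in T-SQL instead of in the data flow, a REPLACE would get you most of the way there. This is only a sketch with a made-up staging table name, and unlike the transform's "trim from start and end" options it removes the quote everywhere in the string, not just at the ends.

-- Strip the embedded double quote from the FeetInches values.
UPDATE dbo.StagingHeights
SET    FeetInches = REPLACE(FeetInches, '"', '')
WHERE  FeetInches LIKE '%"%';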

If you get a chance, take a look at Task Factory; it has 26 transforms designed to assist your development.

Well, that's all I have time for today, since it's Friday. Have a good weekend and good luck in your development adventures.

Task Factory Address Parser Transform

Have you ever received a list of customers or clients that you want to use for a mailing, but when you received it the address fields were not even close to being usable? Well then I have some great news: Task Factory has a transform that can do this work for you, with some very impressive results. The Task Factory Address Parse Transform will take your address data from its current form and separate it out into the standard Address Line 1, Address Line 2, City, State, Zip and Country fields. On top of that, it is very simple to use.
Take a look at this set of addresses:
[Screenshot: the sample address data]
The time it would take you to parse out the addresses from a few thousand rows could be better spent doing… well, just about anything else (sleeping would be my choice, or maybe watching SportsCenter). Using the Address Parse Transform will help free up that time. Here is how you configure it:
[Screenshot: the Address Parse Transform connected to the source data]
[Screenshot: selecting the address column in the transform editor]
Once you have connected your source data to the Task Factory Address Parse Transform, you need to tell it what the address column or columns are. That's right, it doesn't need to be in one column; it can be in multiple columns and the transform will still separate the address into the standard fields. This first example will use one column. Select the appropriate column from the drop-down.
Once that is done, select the Address Quality tab.
[Screenshot: the Address Quality tab]
Once parsed, the addresses are assigned a quality rating by the Address Parser Transform for two sections of the address:
· Address Line 1 and Address Line 2
· City, State and Zip
There are three levels of quality that can be assigned to each of the two sections of an address:
· High
  § For address line 1 and address line 2, high quality indicates that the address is correct.
  § For city, state and zip, high quality indicates that the city, state and zip are correct.
· Medium
  § For address line 1 and address line 2, medium quality means the address is complete but is missing a part of the address, like an apartment number or street suffix.
  § For city, state and zip, medium quality means one of the city, state or zip code is missing.
· Low
  § For address line 1 and address line 2, low quality means the address is not verifiable.
  § For city, state and zip, low quality means more than one of the city, state or zip code is missing.
When the Address Parse Transform runs, it will parse out the data and assign the level of quality based on whether it can find a valid address and a valid city, state and zip, according to the rules above. You can either include all of the addresses as part of the output from the transform, or you can choose "Include only high quality parsed addresses." By selecting "Include only high quality parsed addresses," the output named "Parsed Address Output" will contain only addresses that are considered high quality. You can then output the other rows to a separate table that will require more analysis, or simply disregard them altogether.
clip_image010
You can also specify some null handling here as well.
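As a rough T-SQL illustration of that routing idea (this is not the transform's own logic, and the table and column names here, such as ParsedAddresses and AddressLine1Quality, are made up), you might land the high-quality rows in your mailing table and park everything else for review:

-- High-quality rows go straight to the mailing table.
INSERT INTO dbo.MailingAddresses (AddressLine1, AddressLine2, City, State, Zip, Country)
SELECT AddressLine1, AddressLine2, City, State, Zip, Country
FROM   dbo.ParsedAddresses
WHERE  AddressLine1Quality = 'High'
  AND  CityStateZipQuality = 'High';

-- Everything else goes to a review table for more analysis.
INSERT INTO dbo.AddressesNeedingReview (AddressLine1, AddressLine2, City, State, Zip, Country)
SELECT AddressLine1, AddressLine2, City, State, Zip, Country
FROM   dbo.ParsedAddresses
WHERE  AddressLine1Quality <> 'High'
   OR  CityStateZipQuality <> 'High';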
On the Parsed Output tab, you can specify how to handle the newly parsed data. You can:
§ Add New Column
§ Replace Column
You can also specify the data type and length of each column, as well as the code page. Finally, you can specify the casing of the data.
[Screenshot: the Parsed Output tab]
Now just hook it up to your destination and watch the Task Factory Address Parser Transform go to work.
[Screenshot: the completed data flow with the destination]
If you get a chance, take a look at Task Factory; it has 26 transforms designed to assist your development. Good luck to you all.