Discover SQL Server Filegroup Separation Secrets: What and Why?

The default settings in SQL Server are not always the best. Such is the case when we are working with new user databases; usually we get a single data (*.mdf) and transaction log (*.ldf) file. The data file resides within the PRIMARY file group; the only one we have so far, and it will store all of our databases objects, system objects, user tables, user stored procedures and all other objects. In some cases this default configuration may be good enough for us, but let us cover why would we prefer a different configuration in our production environment.

Working with multiple files and filegroups allows us to achieve higher throughput and to be more flexible when designing our environment.

An added benefit is that additional system PFS, GAM and SGAM pages can be latched within the memory avoiding contention, although most commonly it affects the TempDB system database. In addition the possibility exists to spread our data files between numerous disk drives achieving higher IO operations.

Utilizing the approach for using multiple SQL Server FILEGROUP, we have the possibility to split our data between multiple files within the same Filegroup, even when using table partitioning. In addition to that, we can use the additional filegroups to optimize our database throughput and achieve more flexible backup and restore process.

At a minimum, every SQL Server database has two operating system files: a data file and a log file. Data files contain data and objects such as tables, indexes, stored procedures, and views. Log files contain the information that is required to recover all transactions in the database. Data files can be grouped together in filegroups for allocation and administration purposes.

Database Files

SQL Server databases have three types of files, as shown in the following table.

FileDescription
PrimaryContains startup information for the database and points to the other files in the database. Every database has one primary data file. The recommended file name extension for primary data files is .mdf.
SecondaryOptional user-defined data files. Data can be spread across multiple disks by putting each file on a different disk drive. The recommended file name extension for secondary data files is .ndf.
Transaction LogThe log holds information used to recover the database. There must be at least one log file for each database. The recommended file name extension for transaction logs is .ldf.

For example, a simple database named Sales has one primary file that contains all data and objects and a log file that contains the transaction log information. A more complex database named Orders can be created that includes one primary file and five secondary files. The data and objects within the database spread across all six files, and the four log files contain the transaction log information.

By default, the data and transaction logs are put on the same drive and path to handle single-disk systems. This choice may not be optimal for production environments. We recommend that you put data and log files on separate disks.

Data File Pages

Pages in a SQL Server data file are numbered sequentially, starting with zero (0) for the first page in the file. Each file in a database has a unique file ID number. To uniquely identify a page in a database, both the file ID and the page number are required. A file header page is the first page that contains information about the attributes of the file. Several of the other pages at the start of the file also contain system information, such as allocation maps. One of the system pages stored in both the primary data file and the first log file is a database boot page that contains information about the attributes of the database.

SQL Server Filegroup

  • The filegroup contains the primary data file and any secondary files that aren’t put into other filegroups.
  • User-defined filegroups can be created to group data files together for administrative, data allocation, and placement purposes.

For example: Data1.ndfData2.ndf, and Data3.ndf, can be created on three disk drives, respectively, and assigned to the filegroup fgroup1. A table can then be created specifically on the filegroup fgroup1. Queries for data from the table will be spread across the three disks; it will improve performance. The same performance improvement can be accomplished by using a single file created on a RAID (redundant array of independent disks) stripe set. However, files and filegroups let us easily add new files to new disks.

A filegroup is a logical structure to group objects in a database. Don’t confuse filegroups with actual files (.mdf, .ddf, .ndf, .ldf, etc.). We can have multiple filegroups per database. One filegroup will be the primary, and all system tables are stored on it. Then, we add additional filegroups. We can specify one filegroup as the default, and objects not specifically assigned to a filegroup will exist in the default. In a filegroup, we can have multiple files.

When a new SQL Server database is created, the primary file group is created and the primary data file is included in the primary file group. Also, the primary group is marked as the default group. As a result, every newly created user object is automatically placed inside the primary file group (more specifically, inside the files in the primary file group). If we want your user objects (Tables/Views/Stored Procedures/Functions, and others) to be created in a secondary data file, then:

  • Create a new file group and mark that file group as Default.
  • Create a new data file (.ndf file) and set the file group of this data file to the new file group that you just created.

After doing this, all subsequent objects we create in the database are going to be created inside the file(s) in the secondary file group.
Please note that, Transaction log files are not included in any file group.

Default (Primary) Filegroup

When objects are created in the database without specifying which filegroup they belong to, they are assigned to the default filegroup. At any time, exactly one filegroup is designated as the default filegroup. The files in the default filegroup must be large enough to hold any new objects not allocated to other filegroups.

The PRIMARY filegroup is the default filegroup unless it is changed by using the ALTER DATABASE statement. Allocation for the system objects and tables remains within the PRIMARY filegroup, not the new default filegroup.

File and Filegroup Fill Strategy

Filegroups use a proportional fill strategy across all the files within each filegroup. As data is written to the filegroup, the SQL Server Database Engine writes an amount proportional to the free space in the file to each file within the filegroup, instead of writing all the data to the first file until full. It then writes to the next file. For example, if file f1 has 100 MB free and file f2 has 200 MB free, one extent is given from file f1, two extents from file f2, and so on. In this way, both files become full at about the same time, and simple striping is achieved.

For example, a filegroup is made up of three files, all set to automatically grow. When space in all the files in the filegroup is exhausted, only the first file is expanded. When the first file is full and no more data can be written to the filegroup, the second file is expanded. When the second file is full and no more data can be written to the filegroup, the third file is expanded. If the third file becomes full and no more data can be written to the filegroup, the first file is expanded again, and so on.

sql server filegroup

Key indicators to move to multiple filegroups

  • When disk queuing is causing application and user experience issues
    • If this is the case, consider leveraging additional disk drives with new filegroups housing IO intensive tables
  • When particular tables are 10% or more of the database
    • If this is the case, consider moving these particularly large tables to separate filegroups on separate underlying disk drives
    • Depending on the table size in proportion to the remainder of the tables, consider building a filegroup for individual table(s)
  • When non clustered index and data space are equal on large tables
    • If this is the case, consider splitting the data and clustered index from the non-clustered indexes
  • When an almost equal percentage of read-only and read-write data exist in the database
    • If this is the case, consider splitting the read-only data in a separate filegroup as the read-write data
  • When insufficient time is available to perform database maintenance
    • If this is the case, consider splitting the large tables into separate filegroups on different underlying disks and perform maintenance in parallel
  • When the business or application will be changing significantly and the data is going to grow at a much higher rate
    • If this is the case, consider working with the users to understand the potential growth.

The slowest part of a DB system is the disk drives. Eliminating bottlenecks at the disk level will improve performance. When data is being looked up and an index is used, the index is first looked up and then the corresponding data is fetched. If both the index and data are on the same disks, then there is some contention happening. Whereas, if the data were on a different (physical) disk, then there is faster IO happening, thereby increasing performance. The main part to note is that the data or index are on separate physical disks or LUNs.

Why Should I Create Multiple Filegroups?

There are two primary reasons for creating filegroups: performance and recovery.

Filegroups that contain files created on specific disks can alleviate disk performance issues. For example, we may have one very large table in our database with a lot of read and write activity – an orders table, perhaps. We can create a filegroup, create a file in the filegroup, and then move a table to the filegroup by moving the clustered index. If the file is created on a disk separate from other files, we are going to have better performance. This is similar to the logic behind separating data and log files in a database. Performance improves when we spread files across multiple disks because we have multiple heads reading and writing, rather than one doing all the work.

Filegroups can be backed up and restored separately as well. This can enable faster object recovery in the case of a disaster. It can also help the administration of large databases.

When Separating doesn’t make sense

We need to understand exactly why a given filegroup approach might be advantageous so that we can choose what is best for the situation at hand.  For example, I/O Performance of high transaction OLTP databases is best achieved by balancing the heavy random I/O workload over as many disks as possible because this maximizes the disk transfers/sec rate.  I don’t think segregating data and index files is appropriate for OLTP applications because it artificially caps the maximum I/O rate and can actually introduce an unnecessary I/O bottleneck. In contrast, a single filegroup for both data and indexes distributes I/O proportionally according to the size of the underlying files with less complexity and administration than the separate filegroup approach. 
Single user-defined filegroup strategy works well for both OLTP and mixed workloads where sequential I/O cannot be accurately predicted. If we have databases containing large amounts of historical data, we can use filegroups to place infrequently accessed older data on read-only filegroups with the files on relatively slow inexpensive storage. Often-queried recent data can then be stored on the more expensive fast storage devices via a read-write filegroup. 
The key here is that the additional maintenance and administration complexity is justified by the hardware savings. Similarly, large reporting databases can benefit from a customized filegroup strategy to facilitate backup/recovery or improve performance by isolating random and sequential I/O for specialized (and predictable) workloads.

SQL Server File group best practices

When you have a small or moderate sized database, then the default file/ file group organization that gets created while creating the database may be enough for you. But, when your database has a tendency to grow larger (say, over 1000 MB) in size, you can (and should) do a little tweaking in the file/file group organizations in the database to enhance database performance. Here are some of the best practices you can follow:

  • The primary file group must be totally separate, and should be left to have only system objects, and no user defined object should be created on the primary file group. Also, the primary file group should not be set as the default file group. Separating the system objects from other user objects will increase performance and enhance the ability to access tables in cases of serious data failures.
  • If there are N physical disk drives available in the system, then try to create N files per file group and put each one in a separate disk. This will allow distributing disk I/O load over multiple disks, and will increase performance.
  • For frequently accessed tables containing indexes, put the tables and the indexes in separate file groups. This would enable reading the index and table data faster.
  • For frequently accessed table containing Text or Image columns, create a separate file group and put the text, next, and image columns in that file group on different physical disks, and put the tables in a different file group. This would enable faster data retrieval from the table with queries that don’t contain text or image columns.
  • Put the transaction log file on a different physical disk that is not used by the data files. The logging operation (Transaction log writing operation) is more write-intensive, and hence it is important to have the log on the disk that has good I/O performance.
  • Consider assigning “Read only” tables into a file group that is marked as “Read only”. This would enable faster data retrieval from these read only tables. Similarly, assign “Write only” tables in a different file group to allow for faster updates.
  • Do not let SQL Server fire the “Auto grow” feature too often because it is a costly operation. Set an “Auto grow” increment value so that the database size is increased less frequently (say, once per week). Similarly, do not use the “Auto shrink” feature for the same reason. Disable it, and either shrink the database size manually, or use a scheduled operation that runs in a timed interval (say, once a month).

Recommendations

Recommendations when working with files and filegroups:

  • Most databases will work well with a single data file and a single transaction log file.
  • If you use multiple data files, create a second filegroup for the additional file and make that filegroup the default filegroup. In this way, the primary file will contain only system tables and objects.
  • To maximize performance, create files or filegroups on different available disks as possible. Put objects that compete heavily for space in different filegroups.
  • Use filegroups to enable placement of objects on specific physical disks.
  • Put different tables used in the same join queries in different filegroups. This step will improve performance, because of parallel disk I/O searching for joined data.
  • Put heavily accessed tables and the non-clustered indexes that belong to those tables on different filegroups. Using different filegroups will improve performance, because of parallel I/O if the files are located on different physical disks.
  • Don’t put the transaction log file(s) on the same physical disk that has the other files and filegroups.