A SQL Server filegroup is a logical container for database files. It lets the administrator organize data files and allocate them to different drives. Filegroups can store tables, indexes, and large object data, and they can be used to manage database growth and improve performance by placing frequently accessed files on faster drives. Filegroups offer flexible and powerful storage management for a SQL Server database.
The default settings in SQL Server are not always the best. Such is the case when we create a new user database: by default we get a single data file (*.mdf) and a single transaction log file (*.ldf). The data file resides in the PRIMARY filegroup, the only one we have so far, and it will store all of our database objects: system objects, user tables, user stored procedures, and everything else. In some cases this default configuration may be good enough, but let us cover why we might prefer a different configuration in a production environment.
Working with multiple files and filegroups allows us to achieve higher throughput and to be more flexible when designing our environment.
Benefits of filegroup separation
An added benefit of multiple data files is that there are more PFS, GAM, and SGAM system pages, which reduces latch contention on these allocation pages; such contention most commonly affects the TempDB system database. In addition, we can spread our data files across numerous disk drives to achieve a higher rate of I/O operations.
By using multiple SQL Server filegroups, we can split our data between multiple files within the same filegroup, whether or not we use table partitioning. In addition, we can use extra filegroups to optimize our database throughput and achieve a more flexible backup and restore process.
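As a rough sketch of what spreading files across drives looks like, the following adds two data files on separate drives to the PRIMARY filegroup. The database name SalesDB, paths, and sizes are assumptions for illustration only:

```sql
-- Assumed names: SalesDB is the database; E:\ and F:\ are separate physical drives.
ALTER DATABASE SalesDB
ADD FILE
    (NAME = SalesDB_Data2, FILENAME = 'E:\SQLData\SalesDB_Data2.ndf', SIZE = 512MB, FILEGROWTH = 256MB),
    (NAME = SalesDB_Data3, FILENAME = 'F:\SQLData\SalesDB_Data3.ndf', SIZE = 512MB, FILEGROWTH = 256MB)
TO FILEGROUP [PRIMARY];
```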
At a minimum, every SQL Server database has two operating system files: a data file and a log file. Data files contain data and objects such as tables, indexes, stored procedures, and views. Log files contain the information that is required to recover all transactions in the database. Data files can be grouped together in filegroups for allocation and administration purposes.
Database Files
SQL Server databases have three types of files, as shown in the following table.
| File | Description |
|---|---|
| Primary | Contains startup information for the database and points to the other files in the database. Every database has one primary data file. The recommended file name extension for primary data files is .mdf. |
| Secondary | Optional user-defined data files. Data can be spread across multiple disks by putting each file on a different disk drive. The recommended file name extension for secondary data files is .ndf. |
| Transaction Log | The log holds information used to recover the database. There must be at least one log file for each database. The recommended file name extension for transaction logs is .ldf. |
For example, a simple database named Sales has one primary file that contains all data and objects and a log file that contains the transaction log information. A more complex database named Orders can be created that includes one primary file, five secondary files, and four log files. The data and objects within the database are spread across all six data files, and the four log files contain the transaction log information.
By default, the data and transaction logs are put on the same drive and path to handle single-disk systems. This choice may not be optimal for production environments. We recommend that you put data and log files on separate disks.
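A minimal sketch of creating the Sales database from the example above with the data file and log file on separate drives; the paths and sizes are illustrative assumptions:

```sql
-- Data file on D:, log file on E:, assumed to be separate physical disks.
CREATE DATABASE Sales
ON PRIMARY
    (NAME = Sales_Data, FILENAME = 'D:\SQLData\Sales.mdf', SIZE = 512MB, FILEGROWTH = 256MB)
LOG ON
    (NAME = Sales_Log, FILENAME = 'E:\SQLLogs\Sales.ldf', SIZE = 256MB, FILEGROWTH = 128MB);
```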
Data File Pages
Pages in a SQL Server data file are numbered sequentially, starting with zero (0) for the first page in the file. Each file in a database has a unique file ID number. To uniquely identify a page in a database, both the file ID and the page number are required. The first page in each file is a file header page that contains information about the attributes of the file. Several of the other pages at the start of the file also contain system information, such as allocation maps. One of the system pages stored in both the primary data file and the first log file is a database boot page that contains information about the attributes of the database.
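The file IDs and physical files of the current database can be inspected with the sys.database_files catalog view, for example:

```sql
-- file_id 1 is the primary data file; file_id 2 is the first log file.
SELECT file_id, type_desc, name, physical_name,
       size * 8 / 1024 AS size_mb   -- size is reported in 8 KB pages
FROM sys.database_files;
```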
Filegroup
- The primary filegroup contains the primary data file and any secondary files that aren’t put into other filegroups.
- User-defined filegroups can be created to group data files together for administrative, data allocation, and placement purposes.
For example, three files (Data1.ndf, Data2.ndf, and Data3.ndf) can be created on three disk drives, respectively, and assigned to the filegroup fgroup1. A table can then be created specifically on the filegroup fgroup1. Queries for data from the table will be spread across the three disks, which improves performance. The same performance improvement can be accomplished by using a single file created on a RAID (redundant array of independent disks) stripe set. However, files and filegroups let us easily add new files to new disks.
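A sketch of the fgroup1 example in T-SQL, assuming a database named Orders, illustrative drive letters, and a hypothetical table dbo.OrderDetail:

```sql
-- Create the filegroup and add one file per physical drive.
ALTER DATABASE Orders ADD FILEGROUP fgroup1;

ALTER DATABASE Orders
ADD FILE
    (NAME = Data1, FILENAME = 'E:\SQLData\Data1.ndf'),
    (NAME = Data2, FILENAME = 'F:\SQLData\Data2.ndf'),
    (NAME = Data3, FILENAME = 'G:\SQLData\Data3.ndf')
TO FILEGROUP fgroup1;

-- The ON clause places the table's data on fgroup1.
CREATE TABLE dbo.OrderDetail
(
    OrderID   int NOT NULL,
    ProductID int NOT NULL
)
ON fgroup1;
```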
A filegroup is a logical structure to group objects in a database. Don’t confuse filegroups with actual files (.mdf, .ndf, .ldf, etc.). We can have multiple filegroups per database. One filegroup will be the primary, and all system tables are stored on it. Then, we add additional filegroups. We can specify one filegroup as the default, and objects not specifically assigned to a filegroup will exist in the default. In a filegroup, we can have multiple files.
SQL Server Filegroup
SQL Server creates a primary filegroup and a primary data file when a new database is created. The primary filegroup is set as the default filegroup, so user objects, like tables and stored procedures, are placed in the primary filegroup by default. To put user objects in a secondary data file, follow these steps:
- Create a new file group and mark that file group as Default.
- Create a new data file (.ndf file) and set the file group of this data file to the new file group that you just created.
After doing this, all subsequent objects we create in the database are going to be created inside the file(s) in the secondary file group.
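The two steps above might look like this in T-SQL; the database, file, and filegroup names are placeholders:

```sql
-- Step 1: create the new filegroup.
ALTER DATABASE SalesDB ADD FILEGROUP UserData;

-- Step 2: add a secondary data file (.ndf) to the new filegroup.
ALTER DATABASE SalesDB
ADD FILE (NAME = SalesDB_User1, FILENAME = 'E:\SQLData\SalesDB_User1.ndf', SIZE = 512MB)
TO FILEGROUP UserData;

-- Mark the new filegroup as the default for subsequently created objects.
ALTER DATABASE SalesDB MODIFY FILEGROUP UserData DEFAULT;
```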
Please note that transaction log files are not part of any filegroup.
Default (Primary) Filegroup
When creating objects in a database, you can specify which filegroup they belong to. If you don’t, they’ll be assigned to the default filegroup. One filegroup in each database is always designated as the default. The files in the default filegroup must be large enough to hold any new objects not allocated to other filegroups.
The PRIMARY filegroup is the default filegroup unless it is changed by using the ALTER DATABASE statement. Allocation for the system objects and tables remains within the PRIMARY filegroup, not the new default filegroup.
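To verify which filegroup is currently the default, we can query the sys.filegroups catalog view from within the database:

```sql
-- Exactly one filegroup per database has is_default = 1.
SELECT name, is_default
FROM sys.filegroups
WHERE is_default = 1;
```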
File and Filegroup Fill Strategy
The SQL Server Database Engine writes data to the files within a filegroup in proportion to the free space in each file. For example, if file f1 has 100 MB free and file f2 has 200 MB free, one extent is allocated from file f1 for every two extents allocated from file f2. This way, both files become full at about the same time. This is called proportional fill.
For example, suppose a filegroup is made up of three files, all set to grow automatically. When the space in all the files is exhausted, the first file is expanded. When the first file is full and no more data can be written to it, the second file is expanded. When the second file is full, the third file is expanded. When the third file is full, the first file is expanded again, and so on, in a round-robin fashion.
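To see how full each data file currently is (and therefore how proportional fill will distribute new extents), something like the following can be run in the database:

```sql
-- FILEPROPERTY reports used space in 8 KB pages; divide to get MB.
SELECT name,
       size * 8 / 1024                            AS size_mb,
       FILEPROPERTY(name, 'SpaceUsed') * 8 / 1024 AS used_mb
FROM sys.database_files
WHERE type_desc = 'ROWS';   -- data files only
```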
Key indicators to move to multiple filegroups
- When disk queuing is causing application and user experience issues
  - If this is the case, consider leveraging additional disk drives with new filegroups housing the I/O-intensive tables
- When particular tables are 10% or more of the database
  - If this is the case, consider moving these particularly large tables to separate filegroups on separate underlying disk drives
  - Depending on the table size in proportion to the remainder of the tables, consider building a filegroup for individual tables
- When non-clustered index space and data space are roughly equal on large tables
  - If this is the case, consider splitting the data and clustered index from the non-clustered indexes
- When an almost equal percentage of read-only and read-write data exists in the database
  - If this is the case, consider splitting the read-only data into a separate filegroup from the read-write data
- When insufficient time is available to perform database maintenance
  - If this is the case, consider splitting the large tables into separate filegroups on different underlying disks and performing maintenance in parallel
- When the business or application will be changing significantly and the data is going to grow at a much higher rate
  - If this is the case, consider working with the users to understand the potential growth.
Eliminating bottlenecks
The slowest part of a database system is the disk drives, so eliminating bottlenecks at the disk level will improve performance. When data is looked up through an index, the index is read first and then the corresponding data is fetched. If both the index and the data are on the same disks, there is contention; if the data is on a different physical disk, the I/O can proceed in parallel, increasing performance. The main point to note is that the data and the indexes should be on separate physical disks or LUNs.
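For example, a non-clustered index can be placed on its own filegroup at creation time. Here the filegroup FG_Indexes and the table and column names are assumptions:

```sql
-- The trailing ON clause places the index on a separate filegroup,
-- which ideally sits on a different physical disk than the table data.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON dbo.Orders (CustomerID)
ON FG_Indexes;
```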
Why Should I Create Multiple Filegroups?
There are two primary reasons for creating filegroups: performance and recovery.
Filegroups that contain files created on specific disks can alleviate disk performance issues. For example, we may have one very large table in our database with a lot of read and write activity – an orders table, perhaps. We can create a filegroup, create a file in the filegroup, and then move a table to the filegroup by moving the clustered index. If the file is created on a disk separate from other files, we are going to have better performance. This is similar to the logic behind separating data and log files in a database. Performance improves when we spread files across multiple disks because we have multiple heads reading and writing, rather than one doing all the work.
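A minimal sketch of that move, assuming a table dbo.Orders, an existing clustered index named IX_Orders_OrderID, and a destination filegroup FG_Orders (all placeholder names):

```sql
-- DROP_EXISTING rebuilds the index (and with it the table's data pages)
-- on the target filegroup in one operation; the index name must match
-- the existing clustered index.
CREATE UNIQUE CLUSTERED INDEX IX_Orders_OrderID
ON dbo.Orders (OrderID)
WITH (DROP_EXISTING = ON)
ON FG_Orders;
```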
Filegroups can be backed up and restored separately as well. This can enable faster object recovery in the case of a disaster. It can also help the administration of large databases.
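For example, a single filegroup can be backed up on its own; the database, filegroup, and path below are placeholders:

```sql
-- Back up only the files in the FG_Orders filegroup.
BACKUP DATABASE SalesDB
FILEGROUP = 'FG_Orders'
TO DISK = 'E:\Backup\SalesDB_FG_Orders.bak';
```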
When Separating doesn’t make sense
We need to understand exactly why a given filegroup approach might be advantageous so that we can choose what is best for the situation at hand. For example, the I/O performance of high-transaction OLTP databases is best achieved by balancing the heavy random I/O workload over as many disks as possible, because this maximizes the disk transfers/sec rate. I don’t think segregating data and index files is appropriate for OLTP applications, because it artificially caps the maximum I/O rate and can actually introduce an unnecessary I/O bottleneck. In contrast, a single filegroup for both data and indexes distributes I/O proportionally according to the size of the underlying files, with less complexity and administration than the separate-filegroup approach.
A single user-defined filegroup strategy works well for both OLTP and mixed workloads where sequential I/O cannot be accurately predicted. If we have databases containing large amounts of historical data, we can use filegroups to place infrequently accessed older data on read-only filegroups with the files on relatively slow, inexpensive storage. Often-queried recent data can then be stored on the more expensive, fast storage devices via a read-write filegroup.
The key here is that the additional maintenance and administration complexity is justified by the hardware savings. Similarly, large reporting databases can benefit from a customized filegroup strategy to facilitate backup/recovery or improve performance by isolating random and sequential I/O for specialized (and predictable) workloads.
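Marking a historical-data filegroup read-only might look like this; FG_History is a placeholder, and the database needs exclusive access while the setting is changed:

```sql
-- Once read-only, data in FG_History cannot be modified,
-- which also allows it to be excluded from routine backups.
ALTER DATABASE SalesDB MODIFY FILEGROUP FG_History READ_ONLY;
```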
SQL Server File group best practices
If you have a small or moderate database, the default file/filegroup organization should work fine. But if your database grows larger than 1000 MB, you should optimize the file/filegroup organization to improve performance. Here are some best practices to follow:
- The primary filegroup should be kept totally separate: it should contain only system objects, no user-defined object should be created in it, and it should not be set as the default filegroup. Separating the system objects from the user objects increases performance and improves the ability to access tables in cases of serious data failures.
- If there are N physical disk drives available in the system, try to create N files per filegroup and put each one on a separate disk. This distributes the disk I/O load over multiple disks and increases performance.
- For frequently accessed tables containing indexes, put the tables and the indexes in separate filegroups. This enables the index and the table data to be read faster.
- Consider assigning “read only” tables to a filegroup that is marked “read only”. This enables faster data retrieval from these tables. Similarly, assign “write only” tables to a different filegroup to allow for faster updates.
Other things to consider:
- For frequently accessed tables with text, ntext, or image columns, create a separate filegroup for those large-object columns and put its files on different physical disks, and put the tables themselves in a different filegroup. This makes it faster to retrieve data with queries that don’t reference the text, ntext, or image columns.
- Put the transaction log file on a physical disk that is not used by the data files. The logging operation is write-intensive, so it’s important to have a disk with good I/O performance for the log.
Last but not least…
Avoid letting SQL Server fire the “Auto grow” feature too often, because it is a costly operation. Set an “Auto grow” increment value so that the database size is increased less frequently (say, once per week). For the same reason, don’t use the “Auto shrink” feature: disable it, and either shrink the database size manually or use a scheduled operation that runs at a timed interval (say, once a month).
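Setting a fixed autogrow increment on a data file might look like this; the database and logical file names, and the increment, are illustrative:

```sql
-- Grow in fixed 512 MB steps instead of the default percentage growth.
ALTER DATABASE SalesDB
MODIFY FILE (NAME = SalesDB_Data, FILEGROWTH = 512MB);
```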
Recommendations
Recommendations when working with files and filegroups:
- Most databases will work well with a single data file and a single transaction log file.
- If you use multiple data files, create a second filegroup for the additional file and make that filegroup the default filegroup. In this way, the primary file will contain only system tables and objects.
- To maximize performance, create files or filegroups on as many different available disks as possible. Put objects that compete heavily for space in different filegroups.
- Use filegroups to enable placement of objects on specific physical disks.
- Put tables used in the same join queries in different filegroups. This will improve performance because of parallel disk I/O when searching for joined data.
- Put heavily accessed tables and the non-clustered indexes that belong to those tables on different filegroups. Using different filegroups will improve performance, because of parallel I/O if the files are located on different physical disks.
- Don’t put the transaction log file(s) on the same physical disk that has the other files and filegroups.