MDEV-14992 BACKUP SERVER #4817
I plan to rebase this once #5070 has been merged up to the The ultimate merge target is While rebasing, I will write a description based on the commit message of 4769a43, but mentioning actual MDEVs for the outstanding work. Soon after the rebase, we can include #5140 so that this can be tested more conveniently.
The following SQL statements will be introduced:

    BACKUP SERVER TO '/path/to/directory' [ 1 CONCURRENT ];
    BACKUP SERVER WITH [ 1 CONCURRENT ] 'command';

In place of the 1, any positive number of threads may be specified. For the first variant, '/path/to' must exist and '/path/to/directory' must not exist; that is where the backup will be written.

For the second variant, 'command' must be the name of a script or command that will be executed in a child process. The standard input of that command will be in a format that is compatible with GNU tar --format=oldgnu (and also with the BSD tar variants that are part of Microsoft Windows and Apple macOS). The command is expected to optionally compress and encrypt the stream and redirect it to a file on a local or a remote server. BACKUP SERVER WITH will append an additional argument, a positive base-ten number in ASCII, starting with 1, to identify the current thread. In this way, each concurrent stream can write a separate file.

The backup or the first stream will contain a file backup.cnf, which includes parameters needed for restoring the backup. Currently, these are innodb_log_recovery_start and innodb_log_recovery_target. If innodb_log_recovery_target>0, InnoDB will be in read-only mode, not allowing any writes to persistent files other than via the log application.

To restore a streaming backup made with BACKUP SERVER WITH, an empty directory needs to be created and all streams extracted there using the standard tar utility of the operating system, optionally after undoing any encryption or compression that had been added by the backup command. Then the backup is prepared, or the MariaDB server is started, on the extracted directory, as if the BACKUP SERVER TO statement had been used.

Note: The parameter innodb_log_recovery_start in backup.cnf is STRICTLY NECESSARY TO AVOID CORRUPTION! By default, InnoDB crash recovery starts from the latest available log checkpoint. However, for restoring a backup, recovery must start from the checkpoint that was the latest when the backup was started. Starting recovery from a possible later checkpoint will result in a corrupted database!

The following will be implemented separately:

- MDEV-39061 mariadb-backup compatible wrapper script for BACKUP SERVER
- MDEV-40163 Partial backup and restore
- MDEV-39091 Back up ENGINE=RocksDB
- MDEV-39092 Less blocking backup of ENGINE=Aria

The implementation introduces a basic driver Sql_cmd_backup, storage engine interfaces, and basic copying of the storage engines InnoDB, Aria, MyISAM, MERGE (MyISAM), Archive, CSV.

- backup_target: A structured data type to represent a target directory. On Microsoft Windows, we must use directory paths because there is no variant of CopyFileEx() that would work on file handles.
- backup_sink: Wraps a per-thread output stream as well as storage engine specific context.
- handlerton::backup_start(), handlerton::backup_end(): Invoked at the start or end of a backup phase, in the thread that executes a BACKUP SERVER statement.
- handlerton::backup_step(): A backup step that can be invoked from multiple threads concurrently, between the execution of the corresponding handlerton::backup_start() and handlerton::backup_end() of the same phase.
- copy_entire_file(): A file copying service for POSIX systems.
- copy_file(): A partial or sparse file-copying service for all systems.
- backup_stream_append(): Equivalent to copy_file(), but appending to a stream. On Linux, this uses sendfile(2), which assumes that the source data will not be changed before the data has been consumed from the pipe.
- backup_stream_append_async(): A variant of backup_stream_append() where the source file region is guaranteed to be immutable after the call returns. We must not use Linux sendfile(2) for copying data files that may be modified in place, because it could introduce a race condition between a page write and a child process that is concurrently reading the data from the pipe.
- InnoDB_backup::context: Backup context, attached to backup_sink so that the context can continue to exist between the time a BACKUP SERVER releases all locks and another BACKUP SERVER starts executing, with innodb_backup pointing to the new backup, while the old backup is still being finished.
- fil_space_t::write_or_backup: Keep track of in-flight page writes and pending backup operations. We must not allow them concurrently, because that could lead to torn pages in the backup.
- fil_space_t::backup_end: The first page number that is not being backed up (by default 0, to indicate that no backup is in progress).
- fil_space_t::BACKUP_BATCH_SIZE: The number of preceding pages that will be covered by fil_space_t::backup_end. This is the unit of "page range locking" during InnoDB backup.
- log_sys.backup: Whether BACKUP SERVER is in progress. The purpose of this is to make BACKUP SERVER prevent the concurrent execution of SET GLOBAL innodb_log_archive=OFF or SET GLOBAL innodb_log_file_size when innodb_log_archive=OFF.
- log_sys.archived_checkpoint: Keep track of the earliest available checkpoint, corresponding to log_sys.archived_lsn. This reflects SET GLOBAL innodb_log_recovery_start (which is settable now), for incremental backup.
- buf_flush_list_space(): Check for concurrent backup before writing each page. This is inefficient, but this function may be invoked from multiple threads concurrently, and it cannot be changed easily, especially for fil_crypt_thread().
- fil_system.have_all_spaces: Whether all tablespace metadata is guaranteed to be known. To speed up startup, InnoDB does not normally open all tablespace files.
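To make the contract of the 'command' in BACKUP SERVER WITH more concrete, here is a minimal sketch of the two pieces such a command needs: deriving a per-stream file name from the appended thread number, and passing the tar stream through unchanged. The names stream_file_name(), copy_stream(), check_example() and the "prefix.N.tar" naming scheme are hypothetical and not part of this pull request; a real command would typically also compress or encrypt the stream.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: build a per-stream file name from the last
// command-line argument, which is described as a positive base-ten
// thread number starting with 1.
std::string stream_file_name(const std::string &prefix, const std::string &arg)
{
  if (arg.empty() || arg.find_first_not_of("0123456789") != std::string::npos)
    throw std::invalid_argument("thread number must consist of decimal digits");
  const unsigned long n= std::stoul(arg);
  if (n == 0)
    throw std::invalid_argument("thread numbers start at 1");
  return prefix + "." + std::to_string(n) + ".tar";
}

// Copy the incoming tar stream to the output unchanged.
// Returns the number of bytes copied.
std::uint64_t copy_stream(std::istream &in, std::ostream &out)
{
  char buf[65536];
  std::uint64_t total= 0;
  // read() fails on the final short block, but gcount() still reports
  // how many bytes were obtained, so that block is written as well.
  while (in.read(buf, sizeof buf) || in.gcount() > 0)
  {
    out.write(buf, in.gcount());
    total+= static_cast<std::uint64_t>(in.gcount());
  }
  return total;
}

// In-memory demonstration; a real command would pass std::cin and a file.
void check_example()
{
  std::istringstream in("tar bytes");
  std::ostringstream out;
  const std::uint64_t copied= copy_stream(in, out);
  assert(copied == 9);
  assert(out.str() == "tar bytes");
  (void) copied;
}
```

In an actual command, main() would call stream_file_name() on argv[argc - 1], open that file in binary mode, and call copy_stream(std::cin, file). With 4 CONCURRENT, this naming scheme would produce four separate archives, all of which must be extracted into the same empty directory when restoring.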
    const uint32_t end{start + fil_space_t::BACKUP_BATCH_SIZE};
    backup_batch_start(node->space, end);
    /* TODO: avoid copying freed page ranges */
    err= copy_file(node->handle, f, start * uint64_t{page_size},
                   std::min(end, file_size) * uint64_t{page_size});
    backup_batch_stop(node->space);
If this is a ROW_FORMAT=COMPRESSED table, then the file may be 1024, 2048, or 3072 bytes shorter than calculated, and the copying could fail. This API, as well as the one in stream(), must be refactored so that we will know how much was actually copied. The reason for the short file is that fil_space_extend_must_retry() will only extend files to integer multiples of 4096 bytes.
In stream() we must pad with field_ref_zero so that the file size will match what was written to the header. The last page will be recovered from the redo log.
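The arithmetic behind this can be sketched as follows. This is only an illustration under stated assumptions: the physical file is taken to be the logical size rounded down to a multiple of 4096 bytes, the last two arguments of copy_file() are taken to be a start and an end byte offset, and the names byte_range, batch_bytes(), shortfall(), copy_plan and plan_copy() are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumption from the comment above: files grow only in 4096-byte units.
constexpr std::uint64_t EXTEND_UNIT= 4096;

struct byte_range { std::uint64_t begin, end; };

// Byte range requested for one batch of pages:
// [start, min(start + batch, file_pages)), scaled by the page size.
byte_range batch_bytes(std::uint32_t start, std::uint32_t batch,
                       std::uint32_t file_pages, std::uint32_t page_size)
{
  const std::uint64_t end=
    std::min<std::uint64_t>(std::uint64_t{start} + batch, file_pages);
  return {start * std::uint64_t{page_size}, end * std::uint64_t{page_size}};
}

// Bytes of the logical size that are missing from the physical file,
// assuming the file size was rounded down to a multiple of EXTEND_UNIT.
std::uint64_t shortfall(std::uint64_t logical_bytes)
{
  return logical_bytes % EXTEND_UNIT;
}

// How many bytes of the range can be read, and how many zero bytes
// would have to be appended so that the total matches the request.
struct copy_plan { std::uint64_t readable, zero_pad; };

copy_plan plan_copy(byte_range r, std::uint64_t physical_size)
{
  // Requires r.begin <= r.end, which holds when start <= file_pages.
  const std::uint64_t stop= std::clamp(physical_size, r.begin, r.end);
  return {stop - r.begin, r.end - stop};
}
```

For example, a file of 7 pages of 1024 bytes has a logical size of 7168 bytes. Under the rounding assumption the physical file holds 4096 bytes, so plan_copy() reports 4096 readable bytes and 3072 bytes of zero padding. With a page size of 4096 bytes or more, shortfall() is always 0.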
Note: We don’t currently keep track of the file size or the allocated file size as of the checkpoint when the backup started. If we did that, we could copy even less. That could be an even more elegant fix of this. I think we would create sparse files that match the current file size.