Skip to content

MDEV-39092 Copy Aria data and logs as part of backup#4971

Open
mariadb-andrzejjarzabek wants to merge 14 commits into
MariaDB:MDEV-14992from
mariadb-andrzejjarzabek:MDEV-39092
Open

MDEV-39092 Copy Aria data and logs as part of backup#4971
mariadb-andrzejjarzabek wants to merge 14 commits into
MariaDB:MDEV-14992from
mariadb-andrzejjarzabek:MDEV-39092

Conversation

@mariadb-andrzejjarzabek

@mariadb-andrzejjarzabek mariadb-andrzejjarzabek commented Apr 22, 2026

Copy link
Copy Markdown
Contributor

An interim solution with some room for optimization:

  • All DDL is blcoked while Aria is being backed up
  • All table caches purged when Aria backup starts (including for non-Aria tables)
  • Writes to non-transactional, but not to transactional tables are blocked when table files are being backed up
  • All commits blocked when Aria log files are being backed up

@CLAassistant

CLAassistant commented Apr 22, 2026

Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Comment thread include/my_backup.h Outdated
Comment thread include/my_backup.h Outdated
Comment thread include/my_backup.h Outdated
Comment thread mysys/CMakeLists.txt Outdated
Comment thread mysys/my_backup.cc Outdated
Comment thread sql/handler.h Outdated
Comment thread sql/sql_backup.cc Outdated
Comment thread storage/maria/ma_backup.cc Outdated
Comment thread storage/maria/ma_backup.cc Outdated
Comment thread storage/maria/ma_backup.cc Outdated
Comment thread sql/sql_backup.cc Outdated
@mariadb-andrzejjarzabek mariadb-andrzejjarzabek force-pushed the MDEV-39092 branch 2 times, most recently from 8565956 to 04e3bc2 Compare May 21, 2026 10:53
@mariadb-andrzejjarzabek mariadb-andrzejjarzabek force-pushed the MDEV-39092 branch 2 times, most recently from 14ca552 to c9429eb Compare May 29, 2026 09:47
@mariadb-andrzejjarzabek mariadb-andrzejjarzabek force-pushed the MDEV-39092 branch 2 times, most recently from d35cd47 to 824afeb Compare June 1, 2026 15:15
@mariadb-andrzejjarzabek mariadb-andrzejjarzabek marked this pull request as ready for review June 1, 2026 15:16
@mariadb-andrzejjarzabek mariadb-andrzejjarzabek marked this pull request as draft June 1, 2026 15:17
Comment thread sql/handler.h Outdated
Comment thread sql/handler.h Outdated
Comment thread sql/sql_backup.cc Outdated
Comment thread storage/maria/ma_backup.cc Outdated
Comment on lines 217 to 220
backup_target target;
#ifndef _WIN32
const int datadir_fd;
#endif

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there a declaration of datadir_fd in the first place? It turns out that it is never being assigned, only passed to openat(2). This seems to rely on zero-initialization and some undefined behaviour. For example, in Linux, AT_FDCWD is defined as -100.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed later that this data member is being initialized from a call to open(…, O_DIRECTORY) in the constructor. I think that this is bad practice; we should allow the singleton object to be initialized statically.

It is unclear when if ever the directory handle would be closed. I suspect that we would hold the handle open until the server is shut down.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datadir_fd is initialized in the Aria_backup constructor.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a reasonable trade-off to have a descriptor permanently opened for backups for the purpose of just one plugin? Should each plugin that performs the backup keep an open descriptor to the data directory? When adding another bit of functionality (not backup) to any part of the server or a plugin, with its own class/module/translation unit, which needs to open files in the data directory, should it also maintain a file descriptor to the data directory? There may be a case for defining an API for the server to maintain a single descriptor and make it available for other code, but given that there isn't one, I would argue that opening a descriptor for the duration of the backup and closing it on backup end. A backup isn't a fine-grained operation which we expect to be performed multiple times per second (or even minute), but we expect that many hours may pass between successive backups. And every backup will typically open hundreds or thousands of files, so saving the time and I/O of one open/close is not significant and it's not clear that maintaining an additional open descriptor the whole time the server is running is is balanced out by that.

In any case I propose putting that discussion off for a possible future improvement, where we could discuss defining an API to make a data directory descriptor available to the whole server.

Comment thread storage/maria/ma_backup.cc Outdated
Comment thread storage/maria/ma_backup.cc Outdated
Comment thread storage/maria/ma_backup.cc Outdated
Comment thread storage/maria/ma_backup.cc Outdated
Comment thread storage/maria/ma_backup.cc Outdated
Comment thread mysql-test/main/backup_server_restore.result Outdated
@mariadb-andrzejjarzabek mariadb-andrzejjarzabek force-pushed the MDEV-39092 branch 4 times, most recently from 0de6368 to 8b1c81b Compare June 12, 2026 06:11
Comment thread storage/maria/ma_backup.cc Outdated
Comment on lines +621 to +625
case BACKUP_PHASE_NO_DML_NON_TRANS:
/* FIXME: Would be better to selectively purge only the tables we need. */
tc_purge();
tdc_purge(true);
return 0;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is missing a reference to a ticket where this performance problem will be fixed. As far as I understand, we only need such cleanup for ENGINE=Aria and ENGINE=MyISAM. We don’t want unnecessary disruption of ENGINE=InnoDB. The BACKUP_PHASE_NO_COMMIT that will be executed a couple of steps after this one will be disruptive enough.

@mariadb-andrzejjarzabek mariadb-andrzejjarzabek marked this pull request as ready for review June 17, 2026 08:44
This introduces a basic driver Sql_cmd_backup, storage engine interfaces,
and basic copying of InnoDB, Aria, and MyISAM files.

For streaming, we aim to generate streams that are compatible with GNU
tar --format=oldgnu, which is also supported by the built-in BSD tar
on FreeBSD, macOS as well as Microsoft Windows. That is, to extract
files from a backup stream, you can use the standard tar utility of
the operating system, instead of anything nonstandard like xbstream or
mbstream.

TODO: Support partial backup and restore.  This should be done by
implementing some configuration parameters as well as a predicate
that checks if a file name pattern should be included.

TODO: Back up ENGINE=RocksDB.

TODO: Refactor the crude first implementation of Aria_backup that is
based on the work of Andrzej Jarzabek, to copy as many files as
concurrently as possible while holding the minimum amount of locks.

---

backup_target: A structured data type to represent a target directory.
On Microsoft Windows, we must use directory paths because there is
no variant of CopyFileEx() that would work on file handles.

backup_sink: Wraps a per-thread output stream as well as storage engine
specific context.

handlerton::backup_start(), handlerton::backup_end(): Invoked at the
start or end of a backup phase, in the thread that executes a
BACKUP SERVER statement.

handlerton::backup_step(): A backup step that can be invoked from
multiple threads concurrently, between the execution of the corresponding
handlerton::backup_start() and handlerton::backup_end() of the same
phase.

copy_entire_file(): A file copying service for POSIX systems.

copy_file(): A sparse file-copying service for all systems.

backup_stream_append_async(): A variant of backup_stream_append()
where the source file region is guaranteed to be immutable after the
call returns. We must not use Linux sendfile(2) for copying data files
that may be modified in place, because it could introduce a race
condition between a page write that runs concurrently with a child process
that is reading the data from the pipe.

InnoDB_backup::context: Backup context, attached to backup_sink
so that context can continue to exist between the time a
BACKUP SERVER releases all locks and another BACKUP SERVER starts
executing, with innodb_backup pointing to the new backup, while
the old backup is still being finished.

fil_space_t::write_or_backup: Keep track of in-flight page writes and
pending backup operation. We must not allow them concurrently, because
that could lead into torn pages in the backup.

fil_space_t::backup_end: The first page number that is not being backed up
(by default 0, to indicate that no backup is in progress).

fil_space_t::BACKUP_BATCH_SIZE: The number of preceding pages that will be
covered by fil_space_t::backup_end. This is the unit of "page range locking"
during InnoDB backup.

log_sys.backup: Whether BACKUP SERVER is in progress. The purpose of this
is to make BACKUP SERVER prevent the concurrent execution of
SET GLOBAL innodb_log_archive=OFF or SET GLOBAL innodb_log_file_size
when innodb_log_archive=OFF.

log_sys.archived_checkpoint: Keep track of the earliest available
checkpoint, corresponding to log_sys.archived_lsn. This reflects
SET GLOBAL innodb_log_recovery_start (which is settable now), for
incremental backup.

buf_flush_list_space(): Check for concurrent backup before writing each
page. This is inefficient, but this function may be invoked from multiple
threads concurrently, and it cannot be changed easily, especially for
fil_crypt_thread().
dr-m added 4 commits June 26, 2026 09:22
Eliminate the buggy Source_dir that broke error propagation.
Also, do not misidentify Linux block devices or sockets as directories.
InnoDB_backup::context::de_hardlink(): Avoid copying garbage at the start
of the first file.
fil_system.have_all_spaces: A flag that indicates whether all
tablespace metadata is present in fil_system.
Comment thread sql/sql_backup.cc Outdated
Comment on lines +455 to +462
static my_bool copy_misc_files(const backup_target *target,
const backup_sink *sink)
{
Dir_scan datadir(".", MYF(MY_WANT_STAT));
if (datadir.is_error())
return true;
std::unordered_set<std::string> ensured_dirs;
int error= datadir.for_each([target, sink, &ensured_dirs](const fileinfo &fi)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I pointed out in this comment in #4817, executing potentially failing operations in a constructor is not a good design. This must be refactored in the style of fffd4b6. Inline iteration (range for using st_::span or std::span) should be more efficient than lambda functions, which cannot always be inlined by compilers. In this function we even have two nested lambda function iterations, on two different Dir_scan.

Comment on lines +25 to 28
struct backup_sink;

/** A payload chunk in a sparse file that is being streamed */
struct backup_chunk

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the point of adding a forward declaration right before an actual declaration?

Comment on lines +230 to +249
/* RAII wrapper for my_dir() */
class Dir_scan
{
public:
explicit Dir_scan(const char* path, myf flags) noexcept
{
dir_info= my_dir(path, flags);
if (!dir_info)
{
my_error(ER_CANT_READ_DIR, MYF(0), path, my_errno);
}
}
~Dir_scan() noexcept
{
my_dirend(dir_info);
}
bool is_error() const noexcept
{
return !dir_info;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, do not execute potentially failing operations in a constructor.

Why is the destructor potentially invoking my_dirend(nullptr)?

What purpose does explicit have on a constructor that takes more than 1 parameter?

MY_DIR *dir_info {nullptr};
};

std::string build_path(const char *base_path, const char *filename) noexcept;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to refactor my_dir() so that it can return a directory handle, on which openat may be invoked. We generally try to avoid operations on std::string because there already are enough issues with heap fragmentation.

Comment thread sql/sql_backup.cc
Comment on lines +785 to +788
if (phase == BACKUP_PHASE_NO_DDL)
fail= copy_misc_files(&target_phase->target, &target_phase->sink);
if (fail)
break;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unnecessarily testing fail when the first if does not hold.

Comment thread sql/sql_backup.cc
const backup_sink *sink)
{
Dir_scan datadir(".", MYF(MY_WANT_STAT));
if (datadir.is_error())

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the embedded server library libmariadbd, the data directory is not necessarily the current directory.

Comment thread sql/sql_backup.cc
Comment on lines +434 to +441
static bool is_misc_file(const char* filename)
{
size_t filename_len= strlen(filename);
if (filename_len < ext_len)
return false;
const char *file_ext = filename + filename_len - ext_len;
return match_misc_ext(file_ext) || is_db_opt(filename, filename_len);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 4769a43, Aria_backup::is_db_file() is using strrchr and strcmp for these, and it is omitting a . prefix from the suffix string that are being compared to. This feels unnecessarily complicated. Can you demonstrate that this approach results in smaller or more efficient code on at least one ISA?

Comment on lines +396 to +403
#if 1 // FIXME: invoke these only for Aria, MyISAM, CSV but not others
case BACKUP_PHASE_NO_DML_NON_TRANS:
/* FIXME: Would be better to selectively purge only the tables we need.
To be fixed in MDEV-39987. */
tc_purge();
tdc_purge(true);
break;
#endif

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only partly addressing my previous comment about this. MDEV-39987 does not make it clear that this performance issue mainly affects other storage engines, such as ENGINE=InnoDB or ENGINE=RocksDB, which do not need any table handles to be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

4 participants