Redis AOF Flush Strategy Analysis – 运维那点事

Redis AOF flush strategy

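Redis can persist data with AOF (append-only file) to guard against data loss. How often the AOF is flushed to disk is controlled by the appendfsync parameter, which has three possible values: always, everysec and no. The default is everysec.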
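To make the three settings concrete, here is a minimal C sketch that maps an appendfsync value to a policy constant. The constant names follow the naming used in the Redis source, but their numeric values, AOF_FSYNC_INVALID, and the helper parse_appendfsync() are assumptions for illustration, not Redis's actual config parser.

```c
#include <ctype.h>
#include <stddef.h>

/* Policy values: names follow the Redis naming, the numbers are
 * assumptions made for this sketch. */
enum aof_fsync_policy {
    AOF_FSYNC_INVALID = -1,
    AOF_FSYNC_NO = 0,       /* never fsync explicitly; the OS flushes when it wants */
    AOF_FSYNC_ALWAYS = 1,   /* fsync after every write of the AOF buffer */
    AOF_FSYNC_EVERYSEC = 2  /* fsync about once per second (the default) */
};

/* Case-insensitive comparison of two NUL-terminated strings. */
static int equals_ignore_case(const char *a, const char *b) {
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return 0;
        a++;
        b++;
    }
    return *a == *b;   /* equal only if both strings ended together */
}

/* Hypothetical helper: map an appendfsync value to a policy constant. */
enum aof_fsync_policy parse_appendfsync(const char *value) {
    if (value == NULL) return AOF_FSYNC_INVALID;
    if (equals_ignore_case(value, "always")) return AOF_FSYNC_ALWAYS;
    if (equals_ignore_case(value, "everysec")) return AOF_FSYNC_EVERYSEC;
    if (equals_ignore_case(value, "no")) return AOF_FSYNC_NO;
    return AOF_FSYNC_INVALID;
}
```

For example, the redis.conf line `appendfsync everysec` corresponds to `parse_appendfsync("everysec") == AOF_FSYNC_EVERYSEC` in this sketch.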

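Below, we dissect the AOF flush strategy by reading the source code.

On every iteration of its event loop, before it waits for and processes the next batch of events, Redis calls beforeSleep: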

void aeMain(aeEventLoop *eventLoop) {
  eventLoop->stop = 0;
  while (!eventLoop->stop) {
    if (eventLoop->beforesleep != NULL)
      eventLoop->beforesleep(eventLoop);
    aeProcessEvents(eventLoop, AE_ALL_EVENTS);
  }
}

/* This function gets called every time Redis is entering the
 * main loop of the event driven library, that is, before to sleep
 * for ready file descriptors. */
void beforeSleep(struct aeEventLoop *eventLoop) {
  /* ... */
  /* Write the AOF buffer on disk */
  flushAppendOnlyFile(0);
  /* ... */
}
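The control flow of aeMain() and beforeSleep() can be modelled in a few lines of self-contained C. Everything here (struct mini_loop, process_events(), the in-memory aof_buf and aof_file, the fixed iteration count) is a hypothetical simplification, not the real ae.c API: command handlers only append to a buffer, and the buffer reaches the "file" only when the before-sleep hook runs.

```c
#include <string.h>

/* Hypothetical, simplified event loop modelled on aeMain(). */
struct mini_loop {
    int stop;
    void (*beforesleep)(struct mini_loop *loop);
    int iterations_left;   /* stands in for "run until stopped" */
    char aof_buf[256];     /* pending AOF data, like server.aof_buf */
    char aof_file[1024];   /* stands in for the AOF file on disk */
};

/* Stand-in for a command handler: it only appends to the AOF buffer. */
static void feed_command(struct mini_loop *loop, const char *cmd) {
    strncat(loop->aof_buf, cmd, sizeof(loop->aof_buf) - strlen(loop->aof_buf) - 1);
}

/* Stand-in for flushAppendOnlyFile(0): move the buffer into the "file". */
static void flush_aof_buffer(struct mini_loop *loop) {
    if (loop->aof_buf[0] == '\0') return;   /* nothing to write */
    strncat(loop->aof_file, loop->aof_buf,
            sizeof(loop->aof_file) - strlen(loop->aof_file) - 1);
    loop->aof_buf[0] = '\0';
}

/* The hook run before every event-processing pass. */
static void before_sleep(struct mini_loop *loop) {
    flush_aof_buffer(loop);   /* "Write the AOF buffer on disk" */
}

/* Stand-in for aeProcessEvents(): each pass handles one command. */
static void process_events(struct mini_loop *loop) {
    feed_command(loop, "SET k v\n");
    if (--loop->iterations_left <= 0) loop->stop = 1;
}

void mini_init(struct mini_loop *loop, int iterations) {
    memset(loop, 0, sizeof(*loop));
    loop->beforesleep = before_sleep;
    loop->iterations_left = iterations;
}

void mini_main(struct mini_loop *loop) {
    loop->stop = 0;
    while (!loop->stop) {
        if (loop->beforesleep != NULL)
            loop->beforesleep(loop);
        process_events(loop);
    }
}
```

With three iterations, the first two commands end up in aof_file while the last one stays in aof_buf, because the loop stops before the hook runs again. In the same way, data appended to the AOF buffer only reaches the file at the next flush.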


In the Redis beforeSleep() excerpt, flushAppendOnlyFile(int force) is the function that does the actual work:

src/aof.c

void flushAppendOnlyFile(int force) {
    ssize_t nwritten;
    int sync_in_progress = 0;
    mstime_t latency;

    if (sdslen(server.aof_buf) == 0) return;

    if (server.aof_fsync == AOF_FSYNC_EVERYSEC)
        sync_in_progress = bioPendingJobsOfType(BIO_AOF_FSYNC) != 0;

    if (server.aof_fsync == AOF_FSYNC_EVERYSEC && !force) {
        /* With this append fsync policy we do background fsyncing.
         * If the fsync is still in progress we can try to delay
         * the write for a couple of seconds. */
        if (sync_in_progress) {
            if (server.aof_flush_postponed_start == 0) {
                /* No previous write postponing, remember that we are
                 * postponing the flush and return. */
                server.aof_flush_postponed_start = server.unixtime;
                return;
            } else if (server.unixtime - server.aof_flush_postponed_start < 2) {
                /* We were already waiting for fsync to finish, but for less
                 * than two seconds this is still ok. Postpone again. */
                return;
            }
            /* Otherwise fall through, and go write since we can't wait
             * over two seconds. */
            server.aof_delayed_fsync++;
            serverLog(LL_NOTICE,"Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.");
        }
    }

    /* ... remainder of the function omitted ... */
}


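With AOF_FSYNC_ALWAYS, Redis calls aof_fsync synchronously in the main thread; on Linux, aof_fsync is fdatasync. With AOF_FSYNC_EVERYSEC, Redis calls aof_background_fsync, which creates a job that is handled by a background bio thread. With no, Redis does not fsync after writing and leaves flushing to the operating system.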

/* Define aof_fsync to fdatasync() in Linux and fsync() for all the rest */
#ifdef __linux__
#define aof_fsync fdatasync
#else
#define aof_fsync fsync
#endif

/* Starts a background task that performs fsync() against the specified
 * file descriptor (the one of the AOF file) in another thread. */
void aof_background_fsync(int fd) {
    /* REDIS_BIO_AOF_FSYNC is the older name of the BIO_AOF_FSYNC job type
     * used in the flushAppendOnlyFile() excerpt above. */
    bioCreateBackgroundJob(REDIS_BIO_AOF_FSYNC,(void*)(long)fd,NULL,NULL);
}
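The per-policy dispatch described above can be sketched as follows. This is a hypothetical model, not Redis code: struct fsync_state, after_aof_write() and the POLICY_* values are made up for illustration, and instead of calling fdatasync() or creating a bio job it only counts how often each would happen. It also assumes, as the everysec name implies, that at most one background fsync is scheduled per second and none while a previous one is still pending.

```c
#include <time.h>

/* Illustrative policy values (assumed, not the Redis constants). */
enum { POLICY_NO, POLICY_ALWAYS, POLICY_EVERYSEC };

/* Hypothetical bookkeeping that replaces the real syscalls and bio jobs. */
struct fsync_state {
    int policy;
    time_t last_fsync;      /* second in which the last fsync was scheduled */
    int sync_in_progress;   /* is a background fsync job still pending? */
    int inline_fsyncs;      /* times aof_fsync() would run in the main thread */
    int background_jobs;    /* times aof_background_fsync() would be called */
};

/* Called after the AOF buffer has been written with write(2). */
void after_aof_write(struct fsync_state *st, time_t now) {
    if (st->policy == POLICY_ALWAYS) {
        /* Synchronous: the main thread waits for fdatasync() to return. */
        st->inline_fsyncs++;
        st->last_fsync = now;
    } else if (st->policy == POLICY_EVERYSEC && now > st->last_fsync) {
        /* At most once per second, hand the fsync to a background thread,
         * unless a previous background fsync is still queued or running. */
        if (!st->sync_in_progress) st->background_jobs++;
        st->last_fsync = now;
    }
    /* POLICY_NO: no explicit fsync; the kernel flushes on its own schedule. */
}
```

Two writes in the same second under everysec schedule only one background job, while always pays for an fsync on every write in the main thread.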


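The everysec policy works as follows. Redis checks whether a background fsync job is still in progress. If one is, it looks at how long the write has already been postponed: under 2 seconds, it skips the write and returns, keeping the data in its AOF buffer; at 2 seconds or more, it performs the write anyway.

In that last case, write(2) may block until the background fsync finishes. Redis then logs the following line and increments the aof_delayed_fsync counter reported by INFO: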

[5750] 12 Aug 09:56:17.057 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
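The postpone-or-write decision can be isolated into a small pure function, which makes the 2-second rule easy to test. This sketch mirrors the flushAppendOnlyFile() excerpt for the everysec policy only; the enum, struct flush_state and decide_flush() are hypothetical names, time is passed in as whole seconds in place of server.unixtime, and 0 is used as the "nothing postponed" sentinel just as in the excerpt.

```c
#include <time.h>

enum flush_decision { FLUSH_POSTPONE, FLUSH_WRITE, FLUSH_WRITE_DELAYED };

struct flush_state {
    time_t postponed_start;   /* 0 = no write currently postponed */
    long delayed_fsync;       /* mirrors INFO's aof_delayed_fsync counter */
};

/* Decide whether an everysec flush should write now or keep waiting for a
 * background fsync that is still running. 'now' is in seconds. */
enum flush_decision decide_flush(struct flush_state *st, time_t now,
                                 int sync_in_progress, int force) {
    if (sync_in_progress && !force) {
        if (st->postponed_start == 0) {
            /* First time we see a busy fsync: remember when, then wait. */
            st->postponed_start = now;
            return FLUSH_POSTPONE;
        } else if (now - st->postponed_start < 2) {
            /* Still inside the 2-second grace period: keep waiting. */
            return FLUSH_POSTPONE;
        }
        /* Waited 2 seconds or more: write anyway; write(2) may now block
         * until the background fsync finishes. */
        st->delayed_fsync++;
        st->postponed_start = 0;   /* the write happens, so reset */
        return FLUSH_WRITE_DELAYED;
    }
    st->postponed_start = 0;       /* normal write (or forced write) */
    return FLUSH_WRITE;
}
```

Calling it at seconds 100, 101 and 102 while the fsync stays busy yields postpone, postpone, then a delayed write that bumps delayed_fsync to 1.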

The detailed logic is as follows:

/*
 * When the fsync policy is set to 'everysec' we may delay the flush if there
 * is still an fsync() going on in the background thread, since for instance
 * on Linux write(2) will be blocked by the background fsync anyway.
 * When this happens we remember that there is some aof buffer to be
 * flushed ASAP, and will try to do that in the serverCron() function.
 *
 * However if force is set to 1 we'll write regardless of the background
 * fsync.
 *
 * But if the write has already been postponed for 2 seconds and the fsync
 * is still running, the write is performed anyway and may block until the
 * fsync finishes.
 */
    /* With this append fsync policy we do background fsyncing.
     * If the fsync is still in progress we can try to delay
     * the write for a couple of seconds. */
    if (sync_in_progress) {
        if (server.aof_flush_postponed_start == 0) {
            /* No previous write postponing, remember that we are
             * postponing the flush and return. */
            server.aof_flush_postponed_start = server.unixtime;
            return;
        } else if (server.unixtime - server.aof_flush_postponed_start < 2) {
            /* We were already waiting for fsync to finish, but for less
             * than two seconds this is still ok. Postpone again. */
            return;
        }
        /* Otherwise fall through, and go write since we can't wait
         * over two seconds. */
        server.aof_delayed_fsync++;
        serverLog(LL_NOTICE,"Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.");
    }


The aof_pending_bio_fsync field in INFO reports how many background fsync jobs are still pending; its value comes from:

/* Return the number of pending jobs of the specified type. */
bioPendingJobsOfType(REDIS_BIO_AOF_FSYNC)
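The pending-job count can be pictured as a per-type counter that goes up when a job is queued and down when a worker finishes it. This is a single-threaded simplification with hypothetical names (JOB_*, create_job(), finish_job(), pending_jobs_of_type()); the real bio.c additionally keeps per-type job lists, protects them with mutexes, and runs the jobs in dedicated threads.

```c
/* Hypothetical job types, one counter each. */
enum { JOB_CLOSE_FILE, JOB_AOF_FSYNC, JOB_TYPES };

/* Number of queued-but-unfinished jobs per type (zero-initialized). */
static unsigned long long pending[JOB_TYPES];

/* Queue a job; a real implementation would also append it to a list
 * and wake the worker thread. */
void create_job(int type) { pending[type]++; }

/* Called once the worker has finished one job of this type. */
void finish_job(int type) {
    if (pending[type] > 0) pending[type]--;
}

/* Counterpart of bioPendingJobsOfType(): non-zero for JOB_AOF_FSYNC means
 * a background fsync is queued or running, i.e. sync_in_progress. */
unsigned long long pending_jobs_of_type(int type) { return pending[type]; }
```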

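serverCron checks server.aof_flush_postponed_start, and if a flush has been postponed, it calls flushAppendOnlyFile again. Only the case described above (a write postponed for 2 seconds while the fsync is still running) can block; in all other cases the call returns quickly: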

/* AOF postponed flush: Try at every cron cycle if the slow fsync
 * completed. */
if (server.aof_flush_postponed_start) flushAppendOnlyFile(0);
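To see how the retry plays out over time, here is a small simulation under stated assumptions: one flush attempt per simulated second (in Redis, serverCron runs several times per second by default, so real retries are more frequent), and a background fsync that stays busy until a given second. The function simulate_postponed_flush() and struct sim_result are hypothetical; the postpone rule is the one from the excerpts above.

```c
#include <time.h>

/* Outcome of simulating a single pending AOF write. */
struct sim_result {
    int postponed_attempts;   /* attempts that returned without writing */
    time_t written_at;        /* second at which write(2) was issued */
    int delayed;              /* 1 if the write ignored a still-busy fsync */
};

/* Try to flush once per second starting at 'start'; the background fsync
 * is busy for every second strictly before 'fsync_done_at'. */
struct sim_result simulate_postponed_flush(time_t start, time_t fsync_done_at) {
    struct sim_result r = { 0, 0, 0 };
    time_t postponed_start = 0;               /* 0 = nothing postponed */
    for (time_t now = start; ; now++) {
        int sync_in_progress = now < fsync_done_at;
        if (sync_in_progress) {
            if (postponed_start == 0) {
                postponed_start = now;        /* start the 2 s grace period */
                r.postponed_attempts++;
                continue;
            }
            if (now - postponed_start < 2) {  /* still inside the grace period */
                r.postponed_attempts++;
                continue;
            }
            r.delayed = 1;                    /* aof_delayed_fsync would grow */
        }
        r.written_at = now;                   /* write(2) happens on this attempt */
        return r;
    }
}
```

With a fsync that finishes after one second, the write goes out on the next attempt; with a 10-second fsync, the write is forced at the 2-second mark and counted as delayed, which is when the "Asynchronous AOF fsync is taking too long" line is logged.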

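Redis AOF failure analysis

Symptom: while Redis was running an AOF rewrite with about 15 GB of data in memory, the whole server appeared dead. No command could be executed, not even the PING a slave sends to its master or redis-cli INFO. The log may contain an error like this: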

Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.

Below we look at why this error occurs.

Root cause analysis:

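1) The official Redis documentation's detailed analysis of latency caused by I/O had already predicted exactly this problem, but we overlooked it at first.

2) To keep things simple, Redis processes requests in a single thread.

3) With AOF persistence enabled, after processing events Redis calls write(2) (in beforeSleep) to push the changes into the kernel buffer. If write(2) blocks at that point, Redis cannot process the next event.

4) On Linux, if fdatasync(2) is flushing the same file's kernel buffers to the physical disk, or a system-wide sync is running, write(2) can block, and because Redis has a single request thread, the whole server blocks with it.

5) If the system's I/O is busy, for example because another application is writing to disk, or because Redis itself is running an AOF rewrite or an RDB snapshot (these write to a separate temporary file, and although each file is written sequentially, switching between the two files increases disk seek time), fdatasync(2) may take a long time to complete, blocking write(2) and with it the whole of Redis.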

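6) To see exactly how long each fdatasync(2) takes, you can run "strace -f -p <pid of redis-server> -T -e trace=fdatasync" (-f also traces the background bio thread, which issues the fdatasync under everysec). Note that strace noticeably slows down the traced process.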

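7) Redis has a self-protection mechanism: when it detects that an fdatasync(2) is still running on the file, it does not call write(2) yet and keeps the data in its own AOF buffer (server.aof_buf), so that it is not blocked. But if this has already lasted two seconds, it goes ahead and calls write(2) anyway, even though Redis may then block. At that point the dreaded log line is printed: "Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis." Afterwards, redis-cli INFO shows that aof_delayed_fsync has been incremented by 1.

8) Therefore, the most precise statement about possible data loss with appendfsync set to everysec is: if an fdatasync has been running for a long time and Redis crashes at that moment, up to two seconds of data may be missing from the file. If fdatasync runs normally, a Redis crash loses nothing that was already written; data loss of less than one second happens only if the operating system itself crashes.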
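As a lighter-weight complement to strace, the sketch below times a single write(2) plus fdatasync(2) on a scratch file, choosing fdatasync on Linux and fsync elsewhere in the same way as the aof_fsync macro. The function time_fdatasync_ms(), the idea of probing a temporary file instead of the AOF itself, and the example path are assumptions; one measurement only hints at disk latency, so repeat it while an AOF rewrite or RDB save is running and place the file on the same disk as the AOF.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Same choice as Redis's aof_fsync: fdatasync() on Linux, fsync() elsewhere. */
#ifdef __linux__
#define probe_fsync fdatasync
#else
#define probe_fsync fsync
#endif

/* Hypothetical probe: write 'len' bytes to a fresh temporary file created
 * from 'path_template' (must end in XXXXXX), sync it, and return the sync
 * time in milliseconds, or -1.0 on error. The file is removed afterwards. */
double time_fdatasync_ms(char *path_template, size_t len) {
    int fd = mkstemp(path_template);
    if (fd == -1) return -1.0;

    char buf[4096];
    memset(buf, 'x', sizeof(buf));
    double result = -1.0;
    size_t left = len;
    int ok = 1;
    while (ok && left > 0) {
        size_t chunk = left < sizeof(buf) ? left : sizeof(buf);
        ssize_t n = write(fd, buf, chunk);   /* lands in the kernel buffer */
        if (n <= 0) ok = 0;
        else left -= (size_t)n;
    }

    if (ok) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        if (probe_fsync(fd) == 0) {          /* force the data to the disk */
            clock_gettime(CLOCK_MONOTONIC, &t1);
            result = (double)(t1.tv_sec - t0.tv_sec) * 1000.0 +
                     (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
        }
    }
    close(fd);
    unlink(path_template);
    return result;
}
```

For example, `char tmpl[] = "/var/lib/redis/aof_probe_XXXXXX";` followed by `time_fdatasync_ms(tmpl, 64 * 1024)` would probe the disk under a (hypothetical) Redis data directory; values climbing toward the seconds range indicate the situation described above.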
