Redis AOF刷新策略
Redis支持使用aof来进行持久化,防止数据丢失,aof的刷新策略通过参数appendfsync控制,有三个值:always、everysec、no,默认是everysec。
下面从源码的角度剖析一下aof的刷新策略。
每次redis进入event循环准备执行这个event时,会调用beforeSleep方法。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
void aeMain(aeEventLoop *eventLoop) { eventLoop->stop = 0; while (!eventLoop->stop) { if (eventLoop->beforesleep != NULL) eventLoop->beforesleep(eventLoop); aeProcessEvents(eventLoop, AE_ALL_EVENTS); } } /* This function gets called every time Redis is entering the * main loop of the event driven library, that is, before to sleep * for ready file descriptors. */ void beforeSleep(struct aeEventLoop *eventLoop) { ...... /* Write the AOF buffer on disk */ flushAppendOnlyFile(0); ...... } |
上面的代码中的flushAppendOnlyFile(int force)进行实际的执行。
src/aof.c
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
void flushAppendOnlyFile(int force) { ssize_t nwritten; int sync_in_progress = 0; mstime_t latency; if (sdslen(server.aof_buf) == 0) return; if (server.aof_fsync == AOF_FSYNC_EVERYSEC) sync_in_progress = bioPendingJobsOfType(BIO_AOF_FSYNC) != 0; if (server.aof_fsync == AOF_FSYNC_EVERYSEC && !force) { /* With this append fsync policy we do background fsyncing. * If the fsync is still in progress we can try to delay * the write for a couple of seconds. */ if (sync_in_progress) { if (server.aof_flush_postponed_start == 0) { /* No previous write postponing, remember that we are * postponing the flush and return. */ server.aof_flush_postponed_start = server.unixtime; return; } else if (server.unixtime - server.aof_flush_postponed_start < 2) { /* We were already waiting for fsync to finish, but for less * than two seconds this is still ok. Postpone again. */ return; } /* Otherwise fall trough, and go write since we can't wait * over two seconds. */ server.aof_delayed_fsync++; serverLog(LL_NOTICE,"Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis."); } } |
AOF_FSYNC_ALWAYS会调用aof_fsync进行同步写入,而aof_fsync在Linux下就是fdatasync,AOF_FSYNC_EVERYSEC会调用aof_background_fsync,而aof_background_fsync会创建一个任务交给后台的bio线程进行处理。
1 2 3 4 5 6 7 8 9 10 11 |
/* Define aof_fsync to fdatasync() in Linux and fsync() for all the rest */ #ifdef __linux__ #define aof_fsync fdatasync #else #define aof_fsync fsync #endif /* Starts a background task that performs fsync() against the specified * file descriptor (the one of the AOF file) in another thread. */ void aof_background_fsync(int fd) { bioCreateBackgroundJob(REDIS_BIO_AOF_FSYNC,(void*)(long)fd,NULL,NULL); } |
其中everysec是通过下面的逻辑来进行的,检测后台是否fsync任务在进行,如果有的话,判断上次的fsync距离现在的时间,如果大于2s, 则阻塞, 否则直接进行后台队列。
如果上一次的fsync执行了2s多,则会阻塞执行,直到写入成功,这个时候日志中会记录下面一条记录,并且增加info中对应的aof_delayed_fsync值
[5750] 12 Aug 09:56:17.057 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
详细逻辑如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 |
/* * When the fsync policy is set to 'everysec' we may delay the flush if there * is still an fsync() going on in the background thread, since for instance * on Linux write(2) will be blocked by the background fsync anyway. * When this happens we remember that there is some aof buffer to be * flushed ASAP, and will try to do that in the serverCron() function. * * However if force is set to 1 we'll write regardless of the background * fsync. * * 但是如果上一次的fsync执行了2s多,则会阻塞执行,直到写入成功 */ /* With this append fsync policy we do background fsyncing. * If the fsync is still in progress we can try to delay * the write for a couple of seconds. */ if (sync_in_progress) { if (server.aof_flush_postponed_start == 0) { /* No previous write postponinig, remember that we are * postponing the flush and return. */ server.aof_flush_postponed_start = server.unixtime; return; } else if (server.unixtime - server.aof_flush_postponed_start < 2) { /* We were already waiting for fsync to finish, but for less * than two seconds this is still ok. Postpone again. */ return; } /* Otherwise fall trough, and go write since we can't wait * over two seconds. */ |
aof_pending_bio_fsync
/* Return the number of pending jobs of the specified type. */
bioPendingJobsOfType(REDIS_BIO_AOF_FSYNC),
serverCron中检查server.aof_flush_postponed_start,如果有的话,就追加一次flush,但是只有在上面的情况下会导致阻塞,其他情况下都会很快返回;
/* AOF postponed flush: Try at every cron cycle if the slow fsync
* completed. */
if (server.aof_flush_postponed_start) flushAppendOnlyFile(0);
Redis AOF故障分析
现象描述:当AOF rewrite 15G大小的内存时,Redis整个死掉的样子,所有指令甚至包括slave发到master的ping,redis-cli info都不能被执行。可能会在日志中看到此类错误:
Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
下面就分析一下出现这个错误出现的原因?
原因分析:
1)官方文档,由IO产生的Latency详细分析, 已经预言了悲剧的发生,但一开始没留意。
2)Redis为求简单,采用了单请求处理线程结构。
3)打开AOF持久化功能后, Redis处理完每个事件后会调用write(2)将变化写入kernel的buffer,如果此时write(2)被阻塞,Redis就不能处理下一个事件。
4)Linux规定执行write(2)时,如果对同一个文件正在执行fdatasync(2)将kernel buffer写入物理磁盘,或者有system wide sync在执行,write(2)会被block住,整个Redis被block住。
5)如果系统IO繁忙,比如有别的应用在写盘,或者Redis自己在AOF rewrite或RDB snapshot(虽然此时写入的是另一个临时文件,虽然各自都在连续写,但两个文件间的切换使得磁盘磁头的寻道时间加长),就可能导致fdatasync(2)迟迟未能完成从而block住write(2),block住整个Redis。
6)为了更清晰的看到fdatasync(2)的执行时长,可以使用”strace -p (pid of redis server) -T -e -f trace=fdatasync”,但会影响系统性能。
7)Redis提供了一个自救的方式,当发现文件有在执行fdatasync(2)时,就先不调用write(2),只存在cache里,免得被block。但如果已经超过两秒都还是这个样子,则会硬着头皮执行write(2),即使redis会被block住。此时那句要命的log会打印:“Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.” 之后用redis-cli INFO可以看到aof_delayed_fsync的值被加1。
8)因此,对于fsync设为everysec时丢失数据的可能性的最严谨说法是:如果有fdatasync在长时间的执行,此时redis意外关闭会造成文件里不多于两秒的数据丢失。如果fdatasync运行正常,redis意外关闭没有影响,只有当操作系统crash时才会造成少于1秒的数据丢失。