fix(apisix-standalone): system time rollback caused sync failure #393
Description
Note
This is a speculative fix. We cannot truly verify that the issue exists, but it is theoretically possible.
The APISIX Standalone backend uses a timestamp as both conf_version and the modified index. As required by the APISIX Admin API (for standalone), this value must be numeric and monotonically increasing.
Previously, we used Date.now() to get the latest millisecond timestamp from Node.js, which is obtained by libuv at a lower level via a system call. If the host's clock jumps backward, for example because of regular time synchronisation by NTP or a similar component, Date.now() might return a value smaller than one it returned earlier. In short, this violates the rule that the configuration version must be monotonically increasing.
This problem is actually quite tricky, because it is difficult to establish a timestamp that is reliably monotonically increasing and also remains valid across a restart. Fortunately, this is not impossible; we combine two mechanisms to build a timestamp source that should be more reliable.
For a single ADC server process, we capture a start timestamp at startup, store it as a static value in memory, and never change it. We then use the Web Performance API (performance.now()) to get the number of milliseconds elapsed since the Node.js process started. Adding the two together yields a timestamp that only ever increases within that process.
Next, we must consider what happens if a time rollback occurs while the ADC is restarting.
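Before turning to the restart case, the in-process scheme above can be illustrated with a minimal TypeScript sketch. The names startedAtMs, startedAtPerf and monotonicNowMs are hypothetical and not taken from the ADC source; the sketch also subtracts the performance.now() reading taken at capture time, which is an assumption made here so that the result lines up with the wall clock at startup.

```typescript
// Hypothetical helper for illustration; not the actual ADC implementation.
// `performance` is available as a global in Node.js 16 and later.

// Wall-clock time captured once at startup and never re-read afterwards.
const startedAtMs: number = Date.now();
// Monotonic reading taken at the same moment, used as the zero point.
const startedAtPerf: number = performance.now();

// Returns a millisecond timestamp that does not go backwards within this
// process: performance.now() is monotonic and ignores system clock changes.
function monotonicNowMs(): number {
  return Math.floor(startedAtMs + (performance.now() - startedAtPerf));
}

console.log(monotonicNowMs());
```

Because the wall clock is read only once, later NTP adjustments cannot affect values produced within the same process. They can still affect the start timestamp of the next process, which is the case discussed next.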
When the ADC performs its first sync after startup to build the cache, it extracts the maximum configuration version from the configuration pulled from APISIX and caches it. If a new timestamp that the ADC wants to use is smaller than the version already applied in APISIX, we take the larger of the two and increment it instead. This ensures that the Admin API can accept configuration change requests. Once the clock has advanced past the old version number, the sync continues with the standard process.
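The fallback described above can be sketched roughly as follows. ConfVersionAllocator, seed and next are hypothetical names chosen for illustration, and the sketch assumes that the cached maximum is also raised by every version the ADC hands out; the actual ADC code may differ.

```typescript
// Hypothetical class for illustration; not the actual ADC implementation.
class ConfVersionAllocator {
  // Highest version known so far: seeded from APISIX, then raised on each allocation.
  private latest = 0;

  // Called after the first sync with the versions found in the pulled configuration.
  seed(remoteVersions: number[]): void {
    for (const v of remoteVersions) {
      if (v > this.latest) this.latest = v;
    }
  }

  // Uses the candidate timestamp when it is ahead of everything seen so far;
  // otherwise falls back to the cached maximum plus one.
  next(candidateMs: number): number {
    this.latest = candidateMs > this.latest ? candidateMs : this.latest + 1;
    return this.latest;
  }
}

const alloc = new ConfVersionAllocator();
alloc.seed([1700000005000, 1700000001000]);
console.log(alloc.next(1700000002000)); // 1700000005001 (clock is behind APISIX)
console.log(alloc.next(1700000009000)); // 1700000009000 (clock has caught up)
```

Either way, the returned value is strictly greater than any version seen before.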
Strictly speaking, APISIX itself does not enforce an "incrementing" constraint on the value of conf_version. It merely compares the new value with the old one to decide whether to execute actions such as rebuilding the routing tree or invalidating caches. The constraint comes from the Admin API: as one of the first APISIX releases supporting this pattern, imposing stricter limitations helps us understand the boundaries and uncover potential issues. Should we deem the constraint no longer essential, we may remove it from the Admin API in the future.
Checklist