Clock skew or timezone issue registering new clients on pangolin vpn. #286

Closed
opened 2026-04-05 17:05:08 +02:00 by MrUnknownDE · 0 comments
Owner

Originally created by @devicemanager on 2/2/2026

Describe the Bug

My Pangolin server and my Pangolin client are in different timezones. The client itself runs, but registering fails. This is likely an effect of the difference in TZ settings and perhaps a timestamp calculation. I have only had Copilot analyze it; I have not looked at the code myself. The olm service runs in a Proxmox LXC and sends requests to Pangolin remotely over a link with considerable latency (quite a few ms). It previously worked with earlier release versions, but I don't have those version numbers.

Thanks — I searched the repository for the log strings and where hole-punch timestamps are produced/used, and I inspected the key handlers. Below are the concrete findings (with quoted code lines), my analysis about the timezone/timestamp issue you mentioned, and clear remediation options plus a minimal patch you can apply.

What I ran

  • Loaded the repository search ability and ran code searches for:
    • "last hole punch is too old"
    • "Public key mismatch"
    • "lastHolePunch"
    • functions that update hole-punch timestamps

Note: search results may be incomplete (limited to 10 results). You can view more results in the GitHub UI:

  • Code search for the log string: https://github.com/fosrl/pangolin/search?q=%22last+hole+punch+is+too+old%22&type=code
  • General repo search page: https://github.com/fosrl/pangolin/search

Key files I found (relevant snippets)

  1. Olm register logic (where the "last hole punch is too old" register-skip check and the "Public key mismatch..." log appear), from server/routers/olm/handleOlmRegisterMessage.ts:
    logger.debug(
        `Olm client ID: ${client.clientId}, Public Key: ${publicKey}, Relay: ${relay}`
    );

    if (!publicKey) {
        logger.warn("Public key not provided");
        return;
    }

    if (client.pubKey !== publicKey || client.archived) {
        logger.info(
            "Public key mismatch. Updating public key and clearing session info..."
        );
        // Update the client's public key
        await db
            .update(clients)
            .set({
                pubKey: publicKey,
                archived: false
            })
            .where(eq(clients.clientId, client.clientId));

        // set isRelay to false for all of the client's sites to reset the connection metadata
        await db
            .update(clientSitesAssociationsCache)
            .set({
                isRelayed: relay == true
            })
            .where(eq(clientSitesAssociationsCache.clientId, client.clientId));
    }

    ...

    if (now - (client.lastHolePunch || 0) > 5 && sitesCount > 0) {
        logger.warn(
            "Client last hole punch is too old and we have sites to send; skipping this register"
        );
        return;
    }
  2. The hole-punch update function that writes lastHolePunch into the clients table (this is where the server currently uses the timestamp provided by the client), from server/routers/gerbil/updateHolePunch.ts:
        const [updatedClient] = await db
            .update(clients)
            .set({
                lastHolePunch: timestamp
            })
            .where(eq(clients.clientId, olm.clientId))
            .returning();
  3. Newt get-config logic that also checks lastHolePunch for sites, from server/routers/newt/handleGetConfigMessage.ts:
    const now = new Date().getTime() / 1000;

    ...

    if (existingSite.lastHolePunch && now - existingSite.lastHolePunch > 5) {
        logger.warn(
            `handleGetConfigMessage: Site ${existingSite.siteId} last hole punch is too old, skipping`
        );
        return;
    }

What these snippets show (summarized)

  • The register handler (olm) uses a server-side "now" (new Date().getTime() / 1000) and compares it to client.lastHolePunch with a very small threshold (5 seconds).
  • The hole-punch handler (updateHolePunch) writes the timestamp value that it receives from the client into lastHolePunch without converting it or replacing it by server time.
  • That means the server is trusting the client-provided timestamp (and comparing it to a server-derived "now"), so any client clock skew, timezone conversions, or unit mismatches (seconds vs ms) can cause the server to classify the hole-punch as "too old" even though the client just sent it.

Why timezone (and clock) differences can cause the failure

  • Date.getTime() / Date.now() returns epoch milliseconds, which are UTC-based and independent of the machine's timezone setting; the code converts these to seconds in some places (divide by 1000).
  • If the client sends a timestamp in a different unit (ms vs seconds) or a timestamp derived from the client's local clock (non-UTC or unsynchronized), the numeric value stored in lastHolePunch may not match server expectations.
  • Because the server compares server-now against client-supplied timestamp and uses a tiny threshold (5 seconds), even small clock skew (NTP not running on client, timezone conversions, or unit mismatch) will make the server treat the hole punch as stale and skip the register.
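To make these failure modes concrete, here is a small standalone sketch (invented values, not project code) of how a 5-second freshness window behaves under a unit mismatch and under clock skew:

```typescript
// Standalone sketch of the freshness check quoted above; WINDOW_SECONDS
// mirrors the "> 5" comparison, everything else is illustrative.
const WINDOW_SECONDS = 5;

function isFresh(serverNowSeconds: number, lastHolePunch: number): boolean {
    return serverNowSeconds - lastHolePunch <= WINDOW_SECONDS;
}

const serverNow = Math.floor(Date.now() / 1000); // epoch seconds (UTC)

// Unit mismatch: a client that sends epoch *milliseconds* stores a value
// ~1000x larger than expected, so the delta goes hugely negative and the
// check passes for the wrong reason (and would fail if the units were
// flipped the other way).
console.log(isFresh(serverNow, Date.now())); // true, but only by accident

// Clock skew: a client whose clock runs 60 s behind (no NTP) produces a
// timestamp that already looks 60 s old the instant it arrives.
console.log(isFresh(serverNow, serverNow - 60)); // false: treated as stale
```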

Immediate recommendations (short)

  1. Treat timestamps as server-authoritative:

    • When a hole punch request arrives, set lastHolePunch to the server time (Math.floor(Date.now() / 1000)) instead of trusting client-provided timestamp.
    • If you need the client-reported time for debugging, store it in a separate column (e.g., lastHolePunchClientTs) but do not use it for freshness checks.
  2. Normalize units and use UTC epoch seconds consistently:

    • Decide on a single unit (I recommend epoch seconds as integer) across the codebase and convert any incoming timestamps to that unit explicitly.
    • Ensure comparisons use the same unit (both are seconds).
  3. Increase the acceptance window / add tolerance:

    • The current 5-second window is very tight. If you keep client-supplied timestamps, increase to a reasonably larger window (e.g., 30s) or use an adaptive window that considers likely network delays.
  4. Prefer server-now for freshness checks:

    • Change code that updates lastHolePunch to set a server timestamp and leave detection logic as-is.
  5. Instrument and monitor:

    • Log client-reported timestamp, server now, and the resulting delta in debug mode to confirm fixes.
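Recommendations 1 and 2 can be sketched as follows, under the assumption that incoming values are either epoch seconds or epoch milliseconds; the helper name toEpochSeconds is invented for illustration and does not exist in the codebase:

```typescript
// Hypothetical normalization helper: coerce an incoming timestamp to
// integer epoch seconds. Assumption: any value above 1e12 is milliseconds
// (epoch seconds stay below 1e12 until roughly the year 33658).
function toEpochSeconds(ts: number): number {
    return ts > 1e12 ? Math.floor(ts / 1000) : Math.floor(ts);
}

// Server-authoritative freshness: ignore the client value for the check
// itself and store the server clock instead.
function serverNowSeconds(): number {
    return Math.floor(Date.now() / 1000);
}

console.log(toEpochSeconds(1700000000));    // 1700000000 (already seconds)
console.log(toEpochSeconds(1700000000000)); // 1700000000 (was milliseconds)
```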

Minimal concrete patch (recommended change)

  • Replace the write of the client-supplied timestamp in updateAndGenerateEndpointDestinations / updateHolePunch with server time. Example change:
// BEFORE
const [updatedClient] = await db
  .update(clients)
  .set({
    lastHolePunch: timestamp
  })
  .where(eq(clients.clientId, olm.clientId))
  .returning();

// AFTER (use server time in seconds)
const serverNow = Math.floor(Date.now() / 1000);
const [updatedClient] = await db
  .update(clients)
  .set({
    lastHolePunch: serverNow
  })
  .where(eq(clients.clientId, olm.clientId))
  .returning();

Notes on the patch

  • This makes the server the single source of truth for hole-punch freshness and avoids timezone/clock/units mistakes coming from the client.
  • If you still want to record the client-supplied timestamp, add a new column (e.g., lastHolePunchClient) and store it there for debugging, but do not base freshness checks on it.

Additional / optional adjustments

  • Increase the register accept window from 5s to something that tolerates modest clock skew or network latency (for example 30s). Example location: handleOlmRegisterMessage.ts and newt/handleGetConfigMessage.ts where the checks use "> 5".
  • Add debug logging in updateAndGenerateEndpointDestinations to print serverNow, client timestamp, and formatted endpoint for easier troubleshooting.
  • Ensure clients send timestamps in a documented unit (if still used) and run NTP on client devices.
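The suggested debug logging could look like the sketch below; `logger` is stubbed here with console output, whereas in the real handler it would be the existing logger import, and `clientTimestamp` stands in for the value received from the client:

```typescript
// Stub logger for this standalone sketch; replace with the handler's
// actual logger in real code.
const logger = { debug: (msg: string) => console.log(msg) };

// Log server time, client-reported time, and their delta, then return
// the delta so callers (or tests) can inspect it.
function logHolePunchDelta(clientTimestamp: number): number {
    const serverNow = Math.floor(Date.now() / 1000);
    const delta = serverNow - clientTimestamp;
    logger.debug(
        `holePunch: serverNow=${serverNow} clientTs=${clientTimestamp} delta=${delta}s`
    );
    return delta;
}

// A client timestamp 7 seconds in the past yields a delta of about 7.
logHolePunchDelta(Math.floor(Date.now() / 1000) - 7);
```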

Suggested next steps I can do for you

  • I can open a PR patching updateHolePunch/updateAndGenerateEndpointDestinations to write server time and add debug logs showing serverNow vs client timestamp.
  • I can also change the check threshold (e.g., from 5 to 30 seconds) and add log lines that print the times/deltas so you can observe whether your clients are now accepted.
  • If you want, I can produce the exact patch (diff) ready to apply.

Which would you like me to do next?

  • Create the PR/patch to set lastHolePunch from server time (and optionally add client timestamp column + logging), or
  • Only increase the acceptance threshold and add logging, or
  • Just produce a ready-to-apply patch file here for you to review.

Environment

  • OS Type & Version: (e.g., Ubuntu 22.04)
  • Pangolin Version:
  • Gerbil Version:
  • Traefik Version:
  • Newt Version:
  • Olm Version: (if applicable)

To Reproduce

Have different TZ settings for the client and the server: run Pangolin on the US continent, and the Pangolin client and olm on the EU continent. In addition, the Pangolin client needs to run on a client machine with TZ set to CET or equivalent.

Expected Behavior

Registering the new client machine appears to work, but the status never reaches the connected state, and the Pangolin docker compose log shows:
2026-02-02T17:15:19+00:00 [info]: Public key mismatch. Updating public key and clearing session info...
2026-02-02T17:15:19+00:00 [warn]: Client last hole punch is too old and we have sites to send; skipping this register
2026-02-02T17:15:20+00:00 [info]: Handling register olm message!
2026-02-02T17:15:20+00:00 [warn]: Client last hole punch is too old and we have sites to send; skipping this register
2026-02-02T17:15:21+00:00 [info]: Handling register olm message!
2026-02-02T17:15:21+00:00 [warn]: Client last hole punch is too old and we have sites to send; skipping this register
This message repeats 10 times
...
In the Pangolin client on macOS, it shows the three dots and "Registering".

Reference: github/pangolin#286