Skip to content

Conversation

@brandonpayton
Copy link
Member

Motivation for the change, related issues

In order to fix native file locking in Windows, we are moving native file locking into workers.

More details coming...

Implementation details

TBD

Testing Instructions (or ideally a Blueprint)

TBD

@brandonpayton
Copy link
Member Author

We can do this with a FileLockManagerForPosix and a FileLockManagerForWindows and fall back to the FileLockManagerForNode if the native locking API is not available.

I've working on the implementation for both. The main things that require care are:

  • Treat zero-length ranges as extending to the end of the file
  • Release locks when a related file descriptor is closed
  • Release locks when the process exits (or in this case, when a PHP request is completed)
  • Implement fcntl() semantics
    • Release fcntl() locks when any file descriptor for the locked file is closed by the locking process.
    • Locked ranges can be unlocked or merged piece by piece. In contrast, Windows locking requires that an unlock range corresponds exactly to the locked range.

For Posix, we can keep it fairly simple for fcntl() by keeping track of which files a process has locked via fcntl() and then unlocking the entire range via fcntl() when locks need to be released.

For Windows, implementing fcntl() semantics is more complicated. We'll have to maintain a collection of which ranges are locked per file in order to be able to unlock those ranges. If a caller wants to unlock part of a range, we'll have to unlock the entire range and then obtain locks for the remaining portions of the original locked range. For shared locks, we can obtain the new ranges before releasing the original range, but for exclusive ranges, we'll have to release the original range before attempting to obtain locks on the remaining ranges. (According to a Google answer about whether Windows allows overlapping exclusive locks by the same process)

The good news is that we are already tracking locked ranges in the FileLockManagerForNode. The work for the Windows locks shouldn't be that different.

cc @adamziel

@brandonpayton brandonpayton force-pushed the playground-cli/move-native-locking-into-workers branch from 60a6f0b to 6023b7d Compare December 9, 2025 18:10
@brandonpayton
Copy link
Member Author

I roughed out native FileLockManager's for POSIX and Windows, but they are yet untested. Tomorrow, I plan to start by adapting native locking tests and testing these new classes.

@adamziel
Copy link
Collaborator

The pre-requisites for this one seem to be mostly in place. The CLI spawn handler now creates a new OS process for any spawned PHP subprocess. The request handler still uses multiple PHP instances, but can be tuned down by adding maxPhpInstances: 1, to every bootWordPress* call in worker-thread-v*.ts files – which we could do in this PR.

In #3014, I'm exploring a CI stress test to confirm multiple workers are indeed used for handling concurrent requests.

@brandonpayton
Copy link
Member Author

I made the FileLockManager test suite declarations reusable so it could test FileLockManagerForPosix and FileLockManagerForWindows.... except I forgot we need to run multiple workers to properly test the native locking, so I'm reworking the tests to do that.

It shouldn't be too hard. Without additional work, Vitest doesn't really support tests creating workers based on TS modules, but I think we might just require Node.js v22 for these tests and create workers using TS type stripping flags and our custom unbuilt module loader. The main thing we want to test is that all the file lock managers function properly across workers.

Note: I renamed FileLockManagerForNode to FileLockManagerInMemory because it no longer implements any native locking and completely relies on in-memory lock tracking.

@adamziel
Copy link
Collaborator

adamziel commented Dec 10, 2025

@brandonpayton we have a test that creates a new Worker, a search for new Worker should surface it. tl;dr it uses execArgs to strip types and use our custom loader. Or we could ditch vitest for that one as you've said, anything goes.

@brandonpayton
Copy link
Member Author

brandonpayton commented Dec 10, 2025

@brandonpayton we have a test that creates a new Worker, a search for new Worker should surface it. tl;dr it uses execArgs to strip types and use our custom loader.

@adamziel I remember something like that too but didn't find an example in files with *.spec.*. It might be in another kind of test... Regardless, this is what I locally did with one test for this PR. Now, it just needs to be used across tests. Comlink continues to be a huge help, especially the sync remote API stuff (because it doesn't require making a sync interface Promised).

Or we could ditch vitest for that one as you've said, anything goes.

👍 thanks

@brandonpayton
Copy link
Member Author

brandonpayton commented Dec 11, 2025

@adamziel My mental model here was incorrect. I'd been implementing the FileLockManagerForPosix and the FileLockManagerForWindows as if the individual workers were actually separate processes. If they were separate processes, native OS locking would naturally enforce locks between the different workers.

But workers from the Node.js worker_threads module are just threads. I think that has been clear to me in the past, so I'm not sure how I became mistaken in this case.

Because all workers share the same native OS process ID, the native OS locking facilities cannot protect the workers from themselves because all workers appear as part of the same process.

I think what this means is that the platform-specific FileLockManagers have to wrap a FileLockManagerInMemory that continues to be shared across workers. The platform-specific lock managers can still run within each worker, but in order to avoid corruption due to php-wasm workers within the same process, they'll have to coordinate with the FileLockManagerInMemory.

I think the following may work for each native lock manager:

  • Attempt to obtain a requested lock from the in-memory lock manager. If that fails, deny the lock.
  • If the in-memory lock is obtained, request a corresponding native lock. Wait until the lock is granted.
  • If the native lock is denied, release the corresponding in-memory lock.

There are a bunch of fcntl() subtleties this does not respect (e.g., unlocking, upgrading, or downgrading only part of a locked range). But for the sake of SQLite, it looks like SQLite should only require adding and removing whole-and-exact ranges as as mentioned above. (The unixLock() function is implemented here.)

@brandonpayton
Copy link
Member Author

NOTE: I tested lockFileExSync() from the fs-ext-extra-prebuilt package in Windows, and lockFileExSync() and unlockFileExSync() do not appear to have a return value, even though the types say they return a number. The functions do not return a number but rather throw errors to indicate failure.

@adamziel
Copy link
Collaborator

adamziel commented Dec 11, 2025

@brandonpayton I'm not convinced going through a central file log manager is the way to go. the native OS mechanics depends on separate PIDs for or enforcing logs. Let's turn every stone we have and try to find a way to use separate PIDs in here before shifting the direction. For example, could an alternative technique of spawning workers do the trick?

This came out of an LLM, it's untested but seems promising. Note fork does not actually use fork(2) but seems to spawn a separate process. Who picked that name?!

The child_process.fork() method is a special case of child_process.spawn()
Unlike the fork(2) POSIX system call, child_process.fork() does not clone the current process.

IPC messaging seems available on all OSes:

// parent.ts
import { fork, ChildProcess } from "node:child_process";
import { resolve } from "node:path";

const childPath = resolve(__dirname, "child.js"); // compiled JS

const child: ChildProcess = fork(process.argv[0], [
	...process.execArgv,
	...process.argv.slice(1),
	// somehow replace the current script path with child.ts
], {
  stdio: ["inherit", "inherit", "inherit", "ipc"], // "ipc" is the important part
});

console.log("Parent PID:", process.pid, "Child PID:", child.pid);

child.on("message", (msg: unknown) => {
  console.log("Parent got message from child:", msg);
});

child.on("exit", (code) => {
  console.log("Child exited with code", code);
});

// send something to the child
child.send({ type: "hello", payload: "from parent" });
// child.ts
// child.ts

console.log("Child PID:", process.pid);

// type-safe-ish helper
type IPCProcess = NodeJS.Process & {
  send?: (message: unknown) => void;
};

const ipcProcess = process as IPCProcess;

process.on("message", (msg: unknown) => {
  console.log("Child got message from parent:", msg);

  // reply
  ipcProcess.send?.({
    type: "reply",
    payload: "got it",
  });
});

@brandonpayton
Copy link
Member Author

brandonpayton commented Dec 11, 2025

@adamziel thanks for the encouragement. Separate processes are preferable, and I will explore that direction first.

I'd looked at the built-in child_process and cluster packages yesterday but landed on the shared lock manager. We discussed using the in-memory lock manager as a fallback in cases where the native locking is unavailable. But what do think about just requiring native locking now that we have prebuilt binaries for common platforms? Based on what you've done with prebuilds, would anyone with a less common platform still be able to build fs-ext-extra-prebuild via node-gyp if they had the right tools?

@adamziel
Copy link
Collaborator

Yeah let's just require native locks 👍 we can always add more prebuilds if anyone struggles.

I am not sure if the IPC module supports synchronous communication. But even if it doesn't, then we can do a mixture of techniques; for example, spin a worker with which we are able to exchange messages synchronously. And then use that worker to communicate asynchronously with a process.

@brandonpayton
Copy link
Member Author

@adamziel OK, cool, that simplifies things a bit. Independent worker processes will still have access to synchronous native file locking APIs used by the system call overrides.

Do you know of any other reason we need synchronous communication between independent worker processes and Playground CLI?

This currently seems like a really good direction to me.

One open question I have is what this change would mean for XDebug support. It actually might not matter at all since XDebug creates the network connection to the debugger, not the other way around.

@mho22, we are exploring moving php-wasm workers into separate processes for Playground CLI. Do you have any concerns about how that might affect XDebug support

@adamziel
Copy link
Collaborator

As for other reasons for sync communication - that's mostly to avoid dealing with asyncify. I use it in an upcoming gethostbyname PR. Is it troublesome to keep around?

@mho22
Copy link
Collaborator

mho22 commented Dec 11, 2025

@brandonpayton I don't think something will change for Xdebug.

@brandonpayton
Copy link
Member Author

As for other reasons for sync communication - that's mostly to avoid dealing with asyncify. I use it in an upcoming gethostbyname PR. Is it troublesome to keep around?

I don't think it will be troublesome. It seems like a good idea to make each php-wasm worker process look like this:

  • Single php-wasm worker process
    • Main thread (setup and asynchronous operations)
      • Worker thread that runs php-wasm and can block, waiting for the main thread to perform async operations.

Even before considering blocking for gethostbyname(), it seemed like a good idea to keep the main thread unblocked since Node.js is designed around an async event loop.

I'm working on this today.

@brandonpayton
Copy link
Member Author

brandonpayton commented Dec 12, 2025

@adamziel, as we discussed elsewhere, I'm testing the native locking approach via a multi-process unit test setup to validate this direction before doing the work to move Playground CLI from workers to child processes.

There is an initial comlink adapter working for IPC between parent and child Node.js processes. It is partially AI-generated and may need some work if we want to use it elsewhere, but it is good enough to run the tests. There are some failures for the POSIX file lock manager tests. I plan to fix those first to confirm the test set is good and then run the same test suite against the Windows file lock manager.

Planning to continue this in the morning.

@brandonpayton
Copy link
Member Author

The relevant test files are under packages/php-wasm/node/src/test/.

@brandonpayton
Copy link
Member Author

The POSIX file lock manager tests are running well locally. There seems to be a path-related issue breaking the Windows tests. Planning to work on this tomorrow and see how the file lock manager is looking on Windows.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants