-
Notifications
You must be signed in to change notification settings - Fork 374
[CLI] Move native file locking into workers #2997
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: php-wasm/node/testable-syscall-overrides
Are you sure you want to change the base?
[CLI] Move native file locking into workers #2997
Conversation
…Windows FileLockManager tests
|
We can do this with a FileLockManagerForPosix and a FileLockManagerForWindows and fall back to the FileLockManagerForNode if the native locking API is not available. I've working on the implementation for both. The main things that require care are:
For Posix, we can keep it fairly simple for fcntl() by keeping track of which files a process has locked via fcntl() and then unlocking the entire range via fcntl() when locks need to be released. For Windows, implementing fcntl() semantics is more complicated. We'll have to maintain a collection of which ranges are locked per file in order to be able to unlock those ranges. If a caller wants to unlock part of a range, we'll have to unlock the entire range and then obtain locks for the remaining portions of the original locked range. For shared locks, we can obtain the new ranges before releasing the original range, but for exclusive ranges, we'll have to release the original range before attempting to obtain locks on the remaining ranges. (According to a Google answer about whether Windows allows overlapping exclusive locks by the same process) The good news is that we are already tracking locked ranges in the FileLockManagerForNode. The work for the Windows locks shouldn't be that different. cc @adamziel |
60a6f0b to
6023b7d
Compare
…ire remaining range
|
I roughed out native FileLockManager's for POSIX and Windows, but they are yet untested. Tomorrow, I plan to start by adapting native locking tests and testing these new classes. |
|
The pre-requisites for this one seem to be mostly in place. The CLI spawn handler now creates a new OS process for any spawned PHP subprocess. The request handler still uses multiple PHP instances, but can be tuned down by adding In #3014, I'm exploring a CI stress test to confirm multiple workers are indeed used for handling concurrent requests. |
|
I made the FileLockManager test suite declarations reusable so it could test FileLockManagerForPosix and FileLockManagerForWindows.... except I forgot we need to run multiple workers to properly test the native locking, so I'm reworking the tests to do that. It shouldn't be too hard. Without additional work, Vitest doesn't really support tests creating workers based on TS modules, but I think we might just require Node.js v22 for these tests and create workers using TS type stripping flags and our custom unbuilt module loader. The main thing we want to test is that all the file lock managers function properly across workers. Note: I renamed FileLockManagerForNode to FileLockManagerInMemory because it no longer implements any native locking and completely relies on in-memory lock tracking. |
|
@brandonpayton we have a test that creates a new Worker, a search for |
@adamziel I remember something like that too but didn't find an example in files with
👍 thanks |
|
@adamziel My mental model here was incorrect. I'd been implementing the FileLockManagerForPosix and the FileLockManagerForWindows as if the individual workers were actually separate processes. If they were separate processes, native OS locking would naturally enforce locks between the different workers. But workers from the Node.js worker_threads module are just threads. I think that has been clear to me in the past, so I'm not sure how I became mistaken in this case. Because all workers share the same native OS process ID, the native OS locking facilities cannot protect the workers from themselves because all workers appear as part of the same process. I think what this means is that the platform-specific FileLockManagers have to wrap a FileLockManagerInMemory that continues to be shared across workers. The platform-specific lock managers can still run within each worker, but in order to avoid corruption due to php-wasm workers within the same process, they'll have to coordinate with the FileLockManagerInMemory. I think the following may work for each native lock manager:
There are a bunch of fcntl() subtleties this does not respect (e.g., unlocking, upgrading, or downgrading only part of a locked range). But for the sake of SQLite, it looks like SQLite should only require adding and removing whole-and-exact ranges as as mentioned above. (The unixLock() function is implemented here.) |
|
NOTE: I tested lockFileExSync() from the fs-ext-extra-prebuilt package in Windows, and lockFileExSync() and unlockFileExSync() do not appear to have a return value, even though the types say they return a number. The functions do not return a number but rather throw errors to indicate failure. |
|
@brandonpayton I'm not convinced going through a central file log manager is the way to go. the native OS mechanics depends on separate PIDs for or enforcing logs. Let's turn every stone we have and try to find a way to use separate PIDs in here before shifting the direction. For example, could an alternative technique of spawning workers do the trick? This came out of an LLM, it's untested but seems promising. Note
IPC messaging seems available on all OSes: // parent.ts
import { fork, ChildProcess } from "node:child_process";
import { resolve } from "node:path";
const childPath = resolve(__dirname, "child.js"); // compiled JS
const child: ChildProcess = fork(process.argv[0], [
...process.execArgv,
...process.argv.slice(1),
// somehow replace the current script path with child.ts
], {
stdio: ["inherit", "inherit", "inherit", "ipc"], // "ipc" is the important part
});
console.log("Parent PID:", process.pid, "Child PID:", child.pid);
child.on("message", (msg: unknown) => {
console.log("Parent got message from child:", msg);
});
child.on("exit", (code) => {
console.log("Child exited with code", code);
});
// send something to the child
child.send({ type: "hello", payload: "from parent" });// child.ts
// child.ts
console.log("Child PID:", process.pid);
// type-safe-ish helper
type IPCProcess = NodeJS.Process & {
send?: (message: unknown) => void;
};
const ipcProcess = process as IPCProcess;
process.on("message", (msg: unknown) => {
console.log("Child got message from parent:", msg);
// reply
ipcProcess.send?.({
type: "reply",
payload: "got it",
});
}); |
|
@adamziel thanks for the encouragement. Separate processes are preferable, and I will explore that direction first. I'd looked at the built-in child_process and cluster packages yesterday but landed on the shared lock manager. We discussed using the in-memory lock manager as a fallback in cases where the native locking is unavailable. But what do think about just requiring native locking now that we have prebuilt binaries for common platforms? Based on what you've done with prebuilds, would anyone with a less common platform still be able to build fs-ext-extra-prebuild via node-gyp if they had the right tools? |
|
Yeah let's just require native locks 👍 we can always add more prebuilds if anyone struggles. I am not sure if the IPC module supports synchronous communication. But even if it doesn't, then we can do a mixture of techniques; for example, spin a worker with which we are able to exchange messages synchronously. And then use that worker to communicate asynchronously with a process. |
|
@adamziel OK, cool, that simplifies things a bit. Independent worker processes will still have access to synchronous native file locking APIs used by the system call overrides. Do you know of any other reason we need synchronous communication between independent worker processes and Playground CLI? This currently seems like a really good direction to me. One open question I have is what this change would mean for XDebug support. It actually might not matter at all since XDebug creates the network connection to the debugger, not the other way around. @mho22, we are exploring moving php-wasm workers into separate processes for Playground CLI. Do you have any concerns about how that might affect XDebug support |
|
As for other reasons for sync communication - that's mostly to avoid dealing with asyncify. I use it in an upcoming |
|
@brandonpayton I don't think something will change for Xdebug. |
I don't think it will be troublesome. It seems like a good idea to make each php-wasm worker process look like this:
Even before considering blocking for gethostbyname(), it seemed like a good idea to keep the main thread unblocked since Node.js is designed around an async event loop. I'm working on this today. |
|
@adamziel, as we discussed elsewhere, I'm testing the native locking approach via a multi-process unit test setup to validate this direction before doing the work to move Playground CLI from workers to child processes. There is an initial comlink adapter working for IPC between parent and child Node.js processes. It is partially AI-generated and may need some work if we want to use it elsewhere, but it is good enough to run the tests. There are some failures for the POSIX file lock manager tests. I plan to fix those first to confirm the test set is good and then run the same test suite against the Windows file lock manager. Planning to continue this in the morning. |
|
The relevant test files are under |
…nd-cli/move-native-locking-into-workers
…nd-cli/move-native-locking-into-workers
|
The POSIX file lock manager tests are running well locally. There seems to be a path-related issue breaking the Windows tests. Planning to work on this tomorrow and see how the file lock manager is looking on Windows. |
Motivation for the change, related issues
In order to fix native file locking in Windows, we are moving native file locking into workers.
More details coming...
Implementation details
TBD
Testing Instructions (or ideally a Blueprint)
TBD