Description
I am currently running killMS on a server with only 32 CPU cores and 128 GB of memory, using the following command:
kMS.py --MSName $msfile --FieldID 0 --SolverType KAFCA --PolMode Scalar --BaseImageName image_DI_Clustered.DeeperDeconv --dt 5 --NCPU 30 --OutSolsName DD0 --NChanSols 5 --InCol CORRECTED_DATA --TChunk 0.2 --BeamModel FITS --FITSParAngleIncDeg 0.5 --FITSFile=$BEAMfits --CenterNorm 1 --FITSFeed xy --FITSFeedSwap 1 --ApplyPJones 1 --FlipVisibilityHands 1 --NChanBeamPerMS 2
When using TChunk=0.2, I encountered the following error:
slurmstepd: error: Detected 7 oom-kill event(s) in StepId=86546498.batch. Some of your processes may have been killed by the cgroup out-of-memory handler.
To reduce memory usage, I then tried TChunk=0.1. This setting worked until the final time chunk [4.30, 4.39] (the total observation time for this target is 4.38 hours), at which point the following error occurred:
- 23:44:39 - ClassVisServer | Reading next data chunk in [ 4.30, 4.39] hours (column CORRECTED_DATA)
- 23:44:39 - ClassMS | Reading rows [2653500 -> 2723040]
- 23:44:42 - ClassMS | Reading uvw_dt column
- 23:44:43 - ClassMS | Data has only two polarisation, adapting shape
- 23:44:45 - ClassMS | Flagging the zeros-weighted visibilities
- 23:44:51 - ClassMS | Increase in flag fraction: 0.022015
- 23:44:54 - ClassVisServer | Channels are equidistant, can go fast
- 23:44:54 - ClassVisServer | Flagging baselines with w > 5.539633 km
- 23:44:54 - ClassVisServer | w-Flagged 0.0% of the data
- 23:45:04 - ClassVisServer | Estimating Beam directions at the center of the individual facets areas
- 23:45:04 - ClassFITSBeam | Using station-independent E Jones for the array
- 23:45:04 - ClassFITSBeam | polarization basis specified by FITSFeed parameter: xx xy yx yy
- 23:45:04 - ClassFITSBeam | swapping feeds as per FITSFeedSwap setting
- 23:26:32 - ClassFITSBeam | All stations: beam patterns /meerkat_pb/meerkat_pb_jones_cube_97channels_yy_re.fits /meerkat_pb/meerkat_pb_jones_cube_97channels_yy_im.fits already in memory
- 23:26:32 - ClassFITSBeam | All stations: beam patterns /meerkat_pb/meerkat_pb_jones_cube_97channels_yx_re.fits /meerkat_pb/meerkat_pb_jones_cube_97channels_yx_im.fits already in memory
- 23:26:32 - ClassFITSBeam | All stations: beam patterns /meerkat_pb/meerkat_pb_jones_cube_97channels_xy_re.fits /meerkat_pb/meerkat_pb_jones_cube_97channels_xy_im.fits already in memory
- 23:26:32 - ClassFITSBeam | All stations: beam patterns /meerkat_pb/meerkat_pb_jones_cube_97channels_xx_re.fits /meerkat_pb/meerkat_pb_jones_cube_97channels_xx_im.fits already in memory
- 23:45:04 - ClassFITSBeam | computing beam sample times for 69540 timeslots
- 23:45:04 - ClassFITSBeam | DtBeamMin=5.00 min results in 1 samples
- 23:45:04 - ClassFITSBeam | FITSParAngleIncrement=0.50 deg results in 1 samples
- 23:45:04 - ClassVisServer | Update FITS beam in 190 dirs, 2 times, 2 freqs ...
- 23:45:04 - ClassVisServer | .... done Update beam
- 23:45:04 - ClassJonesDomains | Building VisToJones time mapping...
- 23:45:04 - ClassJonesDomains | Building VisToJones freq mapping...
- 23:45:36 - ClassWirtingerSolver | DT=306.321903, dt=300.000000, nt=2.000000
Traceback (most recent call last):
File "/public/home/danhu/.local/ddc-env/bin/kMS.py", line 8, in <module>
sys.exit(kms_main())
File "/public/home/danhu/.local/ddc-env/lib/python3.10/site-packages/killMS/__main__.py", line 3, in kms_main
kMS.driver()
File "/public/home/danhu/.local/ddc-env/lib/python3.10/site-packages/killMS/kMS.py", line 1335, in driver
main(OP=OP,MSName=MSName)
File "/public/home/danhu/.local/ddc-env/lib/python3.10/site-packages/killMS/kMS.py", line 758, in main
Solver.doNextTimeSolve_Parallel(Parallel=True)
File "/public/home/danhu/.local/ddc-env/lib/python3.10/site-packages/killMS/Wirtinger/ClassWirtingerSolver.py", line 898, in doNextTimeSolve_Parallel
Res=self.setNextData()
File "/public/home/danhu/.local/ddc-env/lib/python3.10/site-packages/killMS/Wirtinger/ClassWirtingerSolver.py", line 532, in setNextData
self.AppendGToSolArray()
File "/public/home/danhu/.local/ddc-env/lib/python3.10/site-packages/killMS/Wirtinger/ClassWirtingerSolver.py", line 1252, in AppendGToSolArray
self.SolsArray_t0[self.iCurrentSol]=t0
IndexError: index 79 is out of bounds for axis 0 with size 79
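As a rough check of the numbers in the log, the sketch below works through the chunk and solution-interval arithmetic. It assumes that nt is obtained as ceil(DT / dt), which reproduces the nt=2 reported above but is not confirmed from the killMS source; the helper names are hypothetical. It does not attempt to explain the array size of 79, since the log does not show how that array is allocated.

```python
import math

def n_chunks(total_hours, tchunk_hours):
    """Number of time chunks needed to cover the observation."""
    return math.ceil(total_hours / tchunk_hours)

def n_intervals(chunk_seconds, dt_seconds):
    """Solution intervals in one chunk, ASSUMING nt = ceil(DT / dt)."""
    return math.ceil(chunk_seconds / dt_seconds)

TOTAL_HOURS = 4.38        # total observation time quoted above
DT_SECONDS = 5 * 60.0     # --dt 5 (minutes)

# Last chunk: DT=306.321903 s in the log, dt=300 s
nt_last = n_intervals(306.321903, DT_SECONDS)

# Nominal values for --TChunk 0.1 (hours)
chunks_01 = n_chunks(TOTAL_HOURS, 0.1)
per_chunk_01 = n_intervals(0.1 * 3600, DT_SECONDS)

print("nt in last chunk:", nt_last)                   # 2, as in the log
print("chunks at TChunk=0.1:", chunks_01)             # 44
print("intervals per nominal chunk:", per_chunk_01)   # 2
```

With TChunk=0.1 each chunk (6 minutes) is only slightly longer than dt (5 minutes), so under this assumption every chunk is split into two solution intervals. The traceback itself only shows that the array had 79 slots (valid indices 0 to 78) and the solver tried to write an 80th entry.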
I would like to ask:
- Are there any conflicting or improper parameter settings in the command above?
- How is the index 79 determined?
- Additionally, when this error occurred, the task was not terminated automatically. Instead, it became unresponsive and remained stuck on the server.
Any insights or suggestions would be greatly appreciated.
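Regarding the stuck job, one possible stopgap is to launch the command from a small wrapper that enforces a wall-clock limit and terminates the whole process group, so that worker processes left behind after the traceback do not keep the job alive. This is a generic POSIX-only sketch, not a killMS feature; `run_with_timeout` is a hypothetical helper and the time limit is a placeholder.

```python
import os
import signal
import subprocess

def run_with_timeout(cmd, timeout_s, grace_s=10):
    """Run cmd in its own session; on timeout, kill its whole process group.

    Returns the exit code (negative signal number if it was killed).
    """
    # start_new_session=True makes the child a group leader (pgid == pid),
    # so signalling the group also reaches any worker processes it spawned.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGTERM)      # ask politely first
        try:
            return proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            os.killpg(proc.pid, signal.SIGKILL)  # then force
            return proc.wait()

# Example (placeholder arguments, not run here):
# rc = run_with_timeout(["kMS.py", "--MSName", "my.ms", "--TChunk", "0.1"],
#                       timeout_s=6 * 3600)
```

A scheduler time limit achieves much the same thing; the wrapper is only useful when the job should fail quickly instead of occupying the node until that limit is reached.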