Feature #1670

create mdrun option checking mini-tool

Added by Szilárd Páll almost 5 years ago. Updated over 3 years ago.

Status: New
Priority: Normal
Assignee: -
Category: core library
Target version: -
Difficulty: uncategorized

Description

One of the most annoying things that happens to most people is finding that an mdrun job submitted to a cluster crashed after a long wait in the queue because of silly mistakes like invalid command line options or missing input files.

To avoid this, it would be extremely useful to have a small tool that could be called before job submission to do some basic input validation. Command line arguments could be validated using the same option parsing code that mdrun uses; input files could initially just be checked for existence, or opened for reading or read/write access.
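As a rough illustration of the file check described above, here is a minimal Python sketch. It is not GROMACS code: the function name check_input_files and its need_write parameter are hypothetical, and it only tests existence and access permissions without opening or parsing anything.

```python
import os
from typing import Iterable, List


def check_input_files(paths: Iterable[str], need_write: bool = False) -> List[str]:
    """Return a list of problems; an empty list means every file passed.

    Hypothetical helper: it only checks that each path is an existing
    regular file with read (and optionally write) permission.
    """
    problems = []
    for path in paths:
        if not os.path.isfile(path):
            problems.append(f"{path}: file does not exist")
            continue
        # Request read access, plus write access when r/w is required.
        mode = os.R_OK | (os.W_OK if need_write else 0)
        if not os.access(path, mode):
            problems.append(f"{path}: insufficient permissions")
    return problems
```

A job script could call such a helper on the login node and refuse to submit when the returned list is not empty.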


Related issues

Related to GROMACS - Task #1170: mdlib reorganization (New)

History

#1 Updated by Mark Abraham almost 5 years ago

  • Related to Task #1170: mdlib reorganization added

#2 Updated by Mark Abraham almost 5 years ago

This would indeed be useful. However, there's a bunch of files that are only handled in their modules (probably pull code and others), some algorithms can't work without MPI (e.g. multi-sim) and the utility has to run on the login node. People would also like a tool that can check whether the number of ranks will be suitable for DD-PME, or GPU settings. All these issues are manageable, but there's a lot of code in the way of doing all of them, so we'd have to pick carefully.

An ideal implementation might be to run all the way to the first MD step with some kind of mock hardware environment, but we are a very long way from that.

#3 Updated by Szilárd Páll almost 5 years ago

Mark Abraham wrote:

This would indeed be useful. However, there's a bunch of files that are only handled in their modules (probably pull code and others), some algorithms can't work without MPI (e.g. multi-sim) and the utility has to run on the login node. People would also like a tool that can check whether the number of ranks will be suitable for DD-PME, or GPU settings. All these issues are manageable, but there's a lot of code in the way of doing all of them, so we'd have to pick carefully.

All those things you mention would be very useful, but their usefulness in no way speaks against starting with something simple. For instance, doing only command line parsing/validation and checking the existence of input files would catch the majority of the really irritating mistakes (like a missing or mistyped argument, a duplicate option, a missing input file, etc.).
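To make the "duplicate option" case concrete, a minimal Python sketch follows. It is an illustration only and does not reflect how mdrun's own parser works: the helper names are hypothetical, and it assumes that option flags start with a dash followed by a letter, so that negative numbers such as -1 are treated as values.

```python
from collections import Counter
from typing import List


def _is_flag(token: str) -> bool:
    """Assumption: a flag is a dash followed by a letter (so '-1' is a value)."""
    return len(token) > 1 and token[0] == "-" and token[1].isalpha()


def find_duplicate_options(argv: List[str]) -> List[str]:
    """Return flags that occur more than once, in first-seen order."""
    counts = Counter(tok for tok in argv if _is_flag(tok))
    return [flag for flag, n in counts.items() if n > 1]
```

For example, the argument list for mdrun -npme 3 -npme 4 would report -npme as a duplicate, while -nt -1 would not be flagged.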

Also, I'm well aware that the tool would need to run on the login node, which in many cases will require a separate build, but that's again something we can't do much about except document, and it does not diminish the usefulness of such a tool either.

#4 Updated by Mark Abraham almost 5 years ago

Szilárd Páll wrote:

Mark Abraham wrote:

This would indeed be useful. However, there's a bunch of files that are only handled in their modules (probably pull code and others), some algorithms can't work without MPI (e.g. multi-sim) and the utility has to run on the login node. People would also like a tool that can check whether the number of ranks will be suitable for DD-PME, or GPU settings. All these issues are manageable, but there's a lot of code in the way of doing all of them, so we'd have to pick carefully.

All those things you mention would be very useful, but their usefulness does in no way speak against starting with something simple.

Agreed, my point was mostly that we must recognize that it won't be able to do many of the things one would expect of a "command-line checker", simply because much logic is deep enough in mdrun that the early command-line parsing doesn't even know whether a file's existence is relevant. Conditional compilation starts already in gmx_mdrun(), so we probably couldn't support any file-based checking with -multi.

I think the only simple thing to do is to have an mdrun option (e.g. gmx mdrun -check) that always stops after parse_common_args() in gmx_mdrun(). I just don't know what useful things that would do, because (for example) I don't know whether it has checked for the existence of files that were named. I don't even know whether duplicate flags like mdrun -npme 3 -npme 4 are caught as an error (there, or elsewhere). mdrun -nt 4 -ntmpi 4 -ntomp 4 won't be caught there (nothing hardware-related can be caught there). Probably mdrun -s topol-with-typo could be caught, but I don't remember how much reading of the filesystem is done by that parsing function.

For instance, doing only command line parsing/validation and checking the existence of input files would catch the majority of the really irritating mistakes (like a missing or mistyped argument, a duplicate option, a missing input file, etc.).

I think these can be caught already with gmx mdrun -nsteps 0, but it's ugly because it will touch the filesystem in some cases. I think not touching the filesystem is pretty much an essential requirement of such a checker, which puts a fairly hard limit on where it has to exit - which is pretty soon after parse_common_args().

Do you have some specific examples of errors you've seen that we can try?

Also, I'm well aware that the tool would need to run on the login node, which in many cases will require a separate build, but that's again something we can't do much about except document, and it does not diminish the usefulness of such a tool either.

Should gmx mdrun-checker -nt -1 be an error or not? It is OK with thread-MPI and not with real MPI.

#5 Updated by Szilárd Páll almost 5 years ago

Mark Abraham wrote:

Szilárd Páll wrote:

Mark Abraham wrote:

This would indeed be useful. However, there's a bunch of files that are only handled in their modules (probably pull code and others), some algorithms can't work without MPI (e.g. multi-sim) and the utility has to run on the login node. People would also like a tool that can check whether the number of ranks will be suitable for DD-PME, or GPU settings. All these issues are manageable, but there's a lot of code in the way of doing all of them, so we'd have to pick carefully.

All those things you mention would be very useful, but their usefulness does in no way speak against starting with something simple.

Agreed, my point was mostly that we must recognize that it won't be able to do many of the things one would expect of a "command-line checker", simply because much logic is deep enough in mdrun that the early command-line parsing doesn't even know whether a file's existence is relevant. Conditional compilation starts already in gmx_mdrun(), so we probably couldn't support any file-based checking with -multi.

I think the only simple thing to do is to have an mdrun option (e.g. gmx mdrun -check) that always stops after parse_common_args() in gmx_mdrun(). I just don't know what useful things that would do, because (for example) I don't know whether it has checked for the existence of files that were named. I don't even know whether duplicate flags like mdrun -npme 3 -npme 4 are caught as an error (there, or elsewhere). mdrun -nt 4 -ntmpi 4 -ntomp 4 won't be caught there (nothing hardware-related can be caught there). Probably mdrun -s topol-with-typo could be caught, but I don't remember how much reading of the filesystem is done by that parsing function.

Duplicate command line arguments are checked for and should be caught early. Checking the existence of a few files is rather straightforward and little code, so again, a first version would not need to reuse the very code that does all checks on input files before opening it.

For instance, doing only command line parsing/validation and checking the existence of input files would catch the majority of the really irritating mistakes (like a missing or mistyped argument, a duplicate option, a missing input file, etc.).

I think these can be caught already with gmx mdrun -nsteps 0, but it's ugly because it will touch the filesystem in some cases. I think not touching the filesystem is pretty much an essential requirement of such a checker,

I agree.

which puts a fairly hard limit on where it has to exit - which is pretty soon after parse_common_args().

Do you have some specific examples of errors you've seen that we can try?

Well, I would be happy with a checker that does only the command line checking currently done (plus perhaps slightly more thorough type checking in the option parser, e.g. nsteps<-2 could be caught already by an isValidMdrunOptionValue(option, val) method) and makes sure that my required inputs (e.g. topol.tpr and index.ndx) exist.
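The isValidMdrunOptionValue(option, val) idea could look roughly like the Python sketch below. This is only an illustration under stated assumptions: the rule table is hypothetical, the only rule it contains is the nsteps bound mentioned above (values below -2 are rejected), and options without a rule are accepted unchecked.

```python
from typing import Callable, Dict, Optional


def _as_int(raw: str) -> Optional[int]:
    """Parse an integer, returning None when the text is not an integer."""
    try:
        return int(raw)
    except ValueError:
        return None


def _nsteps_ok(raw: str) -> bool:
    # Bound taken from the discussion: nsteps < -2 should be rejected.
    value = _as_int(raw)
    return value is not None and value >= -2


# Hypothetical rule table; a real checker would need one entry per option.
_RULES: Dict[str, Callable[[str], bool]] = {"-nsteps": _nsteps_ok}


def is_valid_mdrun_option_value(option: str, raw_value: str) -> bool:
    """Return False only when a known rule rejects the value."""
    rule = _RULES.get(option)
    return True if rule is None else rule(raw_value)
```

With this sketch, -nsteps -3 and -nsteps abc are rejected, while -nsteps -2 and any option without a rule pass.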

Also, I'm well aware that the tool would need to run on the login node, which in many cases will require a separate build, but that's again something we can't do much about except document, and it does not diminish the usefulness of such a tool either.

Should gmx mdrun-checker -nt -1 be an error or not? It is OK with thread-MPI and not with real MPI.

Hmmm, that sounds like a detail that can be decided later.

#6 Updated by Teemu Murtola almost 5 years ago

Szilárd Páll wrote:

Mark Abraham wrote:

I think the only simple thing to do is to have an mdrun option (e.g. gmx mdrun -check) that always stops after parse_common_args() in gmx_mdrun(). I just don't know what useful things that would do, because (for example) I don't know whether it has checked for the existence of files that were named. I don't even know whether duplicate flags like mdrun -npme 3 -npme 4 are caught as an error (there, or elsewhere). mdrun -nt 4 -ntmpi 4 -ntomp 4 won't be caught there (nothing hardware-related can be caught there). Probably mdrun -s topol-with-typo could be caught, but I don't remember how much reading of the filesystem is done by that parsing function.

Duplicate command line arguments are checked for and should be caught early. Checking the existence of a few files is rather straightforward and little code, so again, a first version would not need to reuse the very code that does all checks on input files before opening it.

If you just want to check for the existence of the input files, then parse_common_args() in master (but not in earlier versions) already does that, except for -multi or -multidir, since those are just too difficult to implement with the parse_common_args() interface. Converting gmx_mdrun() into a C++ class, as is now done for gmx insert-molecules, would probably make this much more feasible.

#7 Updated by Mark Abraham over 3 years ago

  • Target version deleted (future)
