Project

General

Profile

Task #2770

change branch maintenance policy

Added by Mark Abraham 8 months ago. Updated 8 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
Difficulty:
uncategorized
Close

Description

Several issues have been found in the shell code, and some test cases that use it. Neither Paul nor I know how to make a merge commit from release-2018 into release-2019. These are always complicated (because there are usually multiple changes and code has sometimes been refactored), and there has to be changes to regression tests also it can take large amounts of human and Jenkins time.

Erik and I think we need a change of policy regarding fixing on old branches. If someone thinks an issue is serious enough to fix on an old branch, then we need the onus to be on them and those who agree with them to make the fix and arrange for it to be done in multiple branches. The easiest way to implement that is if we by default do all fixes in the most recent release branch. If someone wants to backport a suitably serious fix, then they can do that, and it will be clear up front whether code refactoring is going to be a problem for doing that fix. Because the recent release branch has all the fixes, nothing needs to be merged. The semantics of all release branches are clear - they have the latest believed good version of the code.

In the short term, I think we need to revert the release-2018 shell fix https://gerrit.gromacs.org/#/c/8676/ until someone understands how to do a good job of it that will also work in release-2019 branch. I don't have time to look into it until at least January.

That the shell and normal-mode code is hard to understand and refactor is a separate problem. If it needs to be tightly coupled to force calculation then we need someone to step up and invest the time into decoupling it. For example, rerun, minimize and tpi have all been implemented in their own integrator loop, and shells could do likewise. Now there is not an additional layer of functions wrapping do_force to understand and maintain.

Associated revisions

Revision d5972f77 (diff)
Added by Paul Bauer 8 months ago

Fix energy history reading

The energy history could be read as a nullptr from a checkpoint file,
leading to issues when trying to restart a simulation. Fixed the logic
issue and added an assertion to catch it in the future.

Also removed redundant comparisons to nullptr.

Separate commit on 2018 as per #2770.

Refs #2781

Change-Id: Ic584dc92c110065c1650cc1ab0d7ff0a8960fb3a

History

#1 Updated by Berk Hess 8 months ago

I don't understand what the actual suggested change is here. We already changed our policy about a year ago to fixing things first in the latest release branch and then merging them to master and to older releases, if necessary. The text here does not seem to suggest a change to that policy. The question in the particular issue mentioned here is what is our latest release, which I would say is 2018, as 2019 is in beta.

If things changes are too hard to merge, we should fix them separately in release and master. Or is the issue that the whole branch can not be merged anymore?

#2 Updated by Erik Lindahl 8 months ago

I think what we failed to consider last year is the 2-3 month period that happens after branching off the new release, but before the actual sharp release has happened. During this period we effective have three branches: Master, the beta (about to become the new release), and the previous release.

Merging is not a free lunch, but it takes time and effort for somebody to check the merging works as intended (previously mostly Mark, and now increasingly also Paul), and in particular in this frame where we are all very busy and there are a ton of fixes being made in the beta branch that then have to be merged into master, I can see how it adds to the complexity and work overhead to also have to work on merges from the previous release at the same time - and then I would prioritize the time of the people working on the release, since they have a deadline to meet.

So, since we already have a clear threshold that the second we branch off the release (into the beta) we no longer do new development there, to me it makes sense that this branch also becomes the main fix target at the same time to avoid double branch merging, even though the formal release is still a little bit away.

All important bugfixes can of course still go in stable branch too, but I think it a reasonable balance that we backport from that date to distribute the work between all of us, rather than hoping that the person doing the merging can fix it :-)

#3 Updated by David van der Spoel 8 months ago

So to get this completely straight, during the beta stages the policy should then change to
- fix in beta
- backport to stable release if deemed necessary.
Is that correct?

#4 Updated by Mark Abraham 8 months ago

  • Description updated (diff)

@Berk My proposal is that we should do our fixes on one branch (the most recent release-20xx branch). If something is severe enough to backport, then someone can go do that (e.g. by a cherry-pick). But once they've done that, then both release branches are fixed, and that is the end of it. There is nothing to merge. At some point we stop ever doing those cherry picks for a given branch, and that is also fine.

The objective is to get rid of the policy of fixing the bug in the oldest supported branch (which has a problematic definition during this release phase), because you now have the risk that we can't get a merge working in time to get out the next patch/beta release.

This is even worse when we still have the regressiontests in another repo. I will not be available to discuss feature or performance work in 2019 until we have agreed on who will support me to work on moving all the test coverage into the source repo, port it to the verlet scheme, and found a way to get it running with reasonable coverage. This is something we all need, and I won't tolerate the appearance of an underclass of developers who take care of the dirty work. So I expect everyone to offer that support.

@David Yes that is the gist

#5 Updated by Mark Abraham 8 months ago

We do still need to merge into master, however.

#6 Updated by Erik Lindahl 8 months ago

Yeah, but merging to master usually doesn't have any deadline/urgency.

I'll look into moving some stuff during the hackathon tomorrow. In particular, since we haven't seen the group kernel tests catch any real bugs the last year I think we can disable them.

#7 Updated by David van der Spoel 8 months ago

I totally agree with Mark's comment that every developer should help with the dirty work. However as we all know it can be rather overwhelming to start developing and testing gromacs.

Would it make sense to update the website for this purpose, for instance http://www.gromacs.org/Developer_Zone/Building_and_Testing ?

Is there anything that can be done to make the output from Jenkins more comprehensible?

#8 Updated by Erik Lindahl 8 months ago

Reading up in detail on the tools we use, studying the config files used, and then documenting that properly for others is of course one area where help would always be welcome.

Personally I hope to be able to finalize both a ceph storage and a more flexible (redundant) hardware infrastructure for the testing in december (with some help from Stefan), but given the usual packed schedule that might not be until the X-mas break.

#9 Updated by Mark Abraham 8 months ago

Erik Lindahl wrote:

Yeah, but merging to master usually doesn't have any deadline/urgency.

I'll look into moving some stuff during the hackathon tomorrow. In particular, since we haven't seen the group kernel tests catch any real bugs the last year I think we can disable them.

I de-abandoned https://gerrit.gromacs.org/#/c/4849/ and https://gerrit.gromacs.org/#/c/4856/

The real work is working out how to annotate test cases with when it makes sense to run them (number of GPUs, duties, ranks, threads) and then matching the capabilities with how the test was required to run (so that we can have clarity that we do test particular cases like a separate PME rank using a GPU) with what the node can actually do.

#10 Updated by Mark Abraham 8 months ago

David van der Spoel wrote:

I totally agree with Mark's comment that every developer should help with the dirty work. However as we all know it can be rather overwhelming to start developing and testing gromacs.

Indeed. Of course, I am looking at the more experienced people here.

Would it make sense to update the website for this purpose, for instance http://www.gromacs.org/Developer_Zone/Building_and_Testing ?

Much of that website content is long superseded (see http://jenkins.gromacs.org/job/Documentation_Nightly_master/javadoc/dev-manual/index.html which itself is due for some ease-of-reading reorganization), but since I have to exist running from fire to fire supporting legacy code and little used algorithms, strategic things never happen.

Is there anything that can be done to make the output from Jenkins more comprehensible?

Possibly, depends what exactly you mean. Getting access log files and stdout from the regression suite is one problem. I'd like to get rid of the way we have a job trigger a matrix job, but I don't have a plan for how to do that. Possibly fancier groovy and a single pipeline build. But the current pipeline build (e.g. the release builds) have their own issues with discoverability of reasons for failures. When we have time to transition to a container-based implementation, then it will be possible to give developers a link to the container that ran the tests and has all the output as if you'd run it on your local machine... and perhaps you can tweak the code, recompile because the compiler is in the docker image, and fix it locally.

#11 Updated by Gerrit Code Review Bot 8 months ago

Gerrit received a related patchset '1' for Issue #2770.
Uploader: Paul Bauer ()
Change-Id: gromacs~release-2018~Ic584dc92c110065c1650cc1ab0d7ff0a8960fb3a
Gerrit URL: https://gerrit.gromacs.org/8760

Also available in: Atom PDF